unsprawl.spatial

Spatial analysis module for MRT accessibility scoring.

This module provides geospatial analysis capabilities for computing MRT accessibility scores using KDTree-based nearest-neighbor queries.

Attributes

Classes

TransportScorer

Compute MRT accessibility scores using spatial nearest-neighbor queries.

Module Contents

KDTree = None
class TransportScorer(stations_df=None, cache_dir=None)

Compute MRT accessibility scores using spatial nearest-neighbor queries.

This scorer loads a catalog of MRT station coordinates, strictly excluding all LRT stations using a regex filter ‘^(BP|S[WE]|P[WE])’. The pattern matches the line codes for Bukit Panjang (BP), Sengkang (SW/SE), and Punggol (PW/PE) LRT loops, ensuring that only heavy-rail stations are retained.

A KDTree (from scikit-learn) answers the nearest-neighbor query for every record in one vectorized call, so thousands of listings are scored without a per-row Python loop.
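A minimal sketch of that vectorized lookup, assuming station and query coordinates are already (n, 2) NumPy arrays. `nearest_indices` is a hypothetical helper, not part of unsprawl. The `except ImportError` branch mirrors the module-level `KDTree = None` attribute, but whether TransportScorer falls back to brute force like this is an assumption.

```python
import numpy as np

try:
    from sklearn.neighbors import KDTree  # optional dependency
except ImportError:  # mirrors a module-level `KDTree = None` fallback
    KDTree = None


def nearest_indices(station_xy, query_xy):
    """Return (index, Euclidean distance) of the nearest station for each query row.

    Hypothetical helper for illustration only.
    """
    station_xy = np.asarray(station_xy, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    if KDTree is not None:
        tree = KDTree(station_xy)              # default metric: Euclidean
        dist, idx = tree.query(query_xy, k=1)  # both arrays have shape (n, 1)
        return idx[:, 0], dist[:, 0]
    # Brute-force fallback: full (n_queries, n_stations) matrix, still loop-free.
    diff = query_xy[:, None, :] - station_xy[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    idx = d.argmin(axis=1)
    return idx, d[np.arange(len(query_xy)), idx]


idx, dist = nearest_indices([[0.0, 0.0], [10.0, 10.0]], [[1.0, 1.0], [9.0, 8.0]])
```

Euclidean distance on raw degrees is not a ground distance. Near Singapore's latitude (about 1.3°N), however, cos(lat) is within 0.1% of 1, so the station nearest in degree space is almost always the nearest on the ground; the exact distance can then be computed separately with a haversine formula.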

Accessibility score definition

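score = max(0, 10 - (dist_km * 2)), where dist_km is the great-circle (haversine) distance in kilometers from the HDB listing coordinate to the nearest MRT station in the filtered catalog. The score therefore drops by 2 points per kilometer and bottoms out at 0 for listings 5 km or more from a station.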
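The formula is linear with a floor at zero. A small vectorized sketch makes the breakpoints concrete; the function name `accessibility_score` is hypothetical.

```python
import numpy as np


def accessibility_score(dist_km):
    """score = max(0, 10 - dist_km * 2), applied element-wise (hypothetical helper)."""
    dist_km = np.asarray(dist_km, dtype=float)
    return np.maximum(0.0, 10.0 - dist_km * 2.0)


scores = accessibility_score([0.0, 1.5, 2.5, 5.0, 7.0])
# → scores 10, 7, 5, 0, 0: 1.5 km costs 3 points, and anything at 5 km or beyond is clipped to 0
```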

logger
_stations: DataFrame | None = None
_tree: KDTree | None = None
_cache_dir
static _exclude_lrt(df)

Exclude LRT stations using strict regex on line codes.

Excludes station rows whose line_code matches ‘^(BP|S[WE]|P[WE])’. Column expectations:

  • name: station name (str)

  • line_code: string line code such as ‘NS’, ‘EW’, ‘DT’, ‘CC’, ‘BP’, ‘SW’

  • lat, lon: numeric coordinates in degrees
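A pandas sketch of this filter, assuming the column layout listed above. `exclude_lrt` and the sample rows are illustrative placeholders, not unsprawl code or real station data; how the actual method treats missing line codes is not documented, so keeping them is an assumption.

```python
import pandas as pd

# LRT line-code prefixes: BP (Bukit Panjang), SW/SE (Sengkang), PW/PE (Punggol).
LRT_PATTERN = r"^(BP|S[WE]|P[WE])"


def exclude_lrt(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose line_code starts with an LRT prefix (hypothetical helper)."""
    # str.match anchors at the start of each string; na=False treats a
    # missing line_code as "not LRT", so such rows are kept.
    is_lrt = df["line_code"].str.match(LRT_PATTERN, na=False)
    return df.loc[~is_lrt].reset_index(drop=True)


stations = pd.DataFrame({
    "name": ["Station A", "Station B", "Station C", "Station D", "Station E", "Station F"],
    "line_code": ["NS", "BP", "EW", "SE", "PW", "DT"],
    "lat": [1.30, 1.38, 1.31, 1.39, 1.41, 1.33],
    "lon": [103.80, 103.77, 103.90, 103.89, 103.90, 103.85],
})
heavy_rail = exclude_lrt(stations)  # keeps the NS, EW and DT rows
```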

_cache_paths(tag)
clear_cache()

Delete cached stations and KDTree files in the cache directory.

_try_load_cache(tag)
_save_cache(tag)
load_stations(stations_df)
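The cache helpers above are listed without descriptions. The following is a hypothetical sketch of how a tag-keyed cache of stations plus tree could round-trip through pickle files; the file names `stations_{tag}.pkl` and `kdtree_{tag}.pkl`, the pickle format, and every function here are assumptions, not unsprawl's actual layout. Only load pickles you wrote yourself, since unpickling untrusted files can execute code.

```python
import pickle
from pathlib import Path


def cache_paths(cache_dir, tag):
    """Hypothetical layout: one pickle for stations, one for the tree, per tag."""
    base = Path(cache_dir)
    return base / f"stations_{tag}.pkl", base / f"kdtree_{tag}.pkl"


def save_cache(cache_dir, tag, stations, tree):
    stations_path, tree_path = cache_paths(cache_dir, tag)
    stations_path.parent.mkdir(parents=True, exist_ok=True)
    with open(stations_path, "wb") as f:
        pickle.dump(stations, f)
    with open(tree_path, "wb") as f:
        pickle.dump(tree, f)


def try_load_cache(cache_dir, tag):
    """Return (stations, tree), or None if either cached file is missing."""
    stations_path, tree_path = cache_paths(cache_dir, tag)
    if not (stations_path.exists() and tree_path.exists()):
        return None
    with open(stations_path, "rb") as f:
        stations = pickle.load(f)
    with open(tree_path, "rb") as f:
        tree = pickle.load(f)
    return stations, tree


def clear_cache(cache_dir):
    """Delete every cached stations/tree pickle in cache_dir; return the count removed."""
    removed = 0
    for pattern in ("stations_*.pkl", "kdtree_*.pkl"):
        for path in Path(cache_dir).glob(pattern):
            path.unlink()
            removed += 1
    return removed
```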

Load station catalog, exclude LRT, and build KDTree index.

Parameters:

stations_df (pd.DataFrame) – DataFrame with columns: [‘name’, ‘line_code’, ‘lat’, ‘lon’].

load_stations_geojson(path)

Load MRT stations from an LTA Exit GeoJSON file and build a KDTree.

The GeoJSON is expected to be a FeatureCollection where each feature is a station exit with properties containing station information. This loader will:

  • Extract station name and line code from common property keys.

  • Preserve robust fallback logic for station name parsing across GeoJSON variants (STATION_NA / STN_NAME / STN_NAM / NAME / etc.).

  • Strictly exclude LRT using the regex ‘^(BP|S[WE]|P[WE])’ on line codes when available, and additionally filter out any stations with ‘LRT’ in the name as a safety fallback.

  • Build a KDTree over exit coordinates (lon, lat). Using exits rather than a single point per station measures distance to the actual pedestrian access points.
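A standard-library sketch of the parsing steps listed above. The name keys come from this page; the line-code keys (`LINE_CODE`, `LINE`) are hypothetical, because the property that carries the line code in the LTA file is not documented here. `parse_exits` is an illustrative helper, not the loader itself, and it skips non-Point geometries by assumption.

```python
import json
import re

LRT_PATTERN = re.compile(r"^(BP|S[WE]|P[WE])")
# Name keys listed in the docs; the first non-empty one wins.
NAME_KEYS = ("STATION_NA", "STN_NAME", "STN_NAM", "NAME")
# Hypothetical: the real line-code property key is not documented.
LINE_KEYS = ("LINE_CODE", "LINE")


def first_value(props, keys):
    """Return the first non-empty property among keys, stripped, or None."""
    for key in keys:
        value = props.get(key)
        if value:
            return str(value).strip()
    return None


def parse_exits(geojson_text):
    """Return (name, line_code, lon, lat) for each heavy-rail exit feature."""
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection.get("features", []):
        geom = feature.get("geometry") or {}
        if geom.get("type") != "Point":
            continue
        lon, lat = geom["coordinates"][:2]  # GeoJSON order is [lon, lat]
        props = feature.get("properties") or {}
        name = first_value(props, NAME_KEYS)
        if name is None:
            continue
        line_code = first_value(props, LINE_KEYS)
        if line_code and LRT_PATTERN.match(line_code):
            continue  # strict line-code filter
        if "LRT" in name:
            continue  # safety fallback on the station name
        rows.append((name, line_code, float(lon), float(lat)))
    return rows
```

The (lon, lat) pairs in the returned rows are what a KDTree would then be built over.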

Parameters:

path (str) – Path to the GeoJSON file.

static _haversine_meters(latlon1, latlon2)

Compute haversine distance in meters between arrays of points.

Parameters:
  • latlon1 (np.ndarray) – Array of shape (n, 2) with columns [lat_rad, lon_rad] in radians.

  • latlon2 (np.ndarray) – Array of shape (n, 2) with columns [lat_rad, lon_rad] in radians.
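A NumPy sketch of the row-wise haversine formula under the documented input convention ([lat_rad, lon_rad] in radians). The 6,371,000 m mean Earth radius is an assumption; the radius unsprawl uses is not stated, and `haversine_meters` is an illustrative stand-in for the private method.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean Earth radius in meters


def haversine_meters(latlon1, latlon2):
    """Row-wise great-circle distance for (n, 2) arrays of [lat_rad, lon_rad]."""
    latlon1 = np.asarray(latlon1, dtype=float)
    latlon2 = np.asarray(latlon2, dtype=float)
    lat1, lon1 = latlon1[:, 0], latlon1[:, 1]
    lat2, lon2 = latlon2[:, 0], latlon2[:, 1]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against a drifting marginally outside [0, 1] from rounding.
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


d = haversine_meters(np.radians([[0.0, 0.0], [1.3, 103.8]]),
                     np.radians([[0.0, 1.0], [1.3, 103.8]]))
```

With this radius, one degree of longitude at the equator comes out at about 111.2 km, and identical points give exactly 0.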

calculate_accessibility_score(df)

Annotate DataFrame with nearest MRT and accessibility score.

Adds columns:

  • Nearest_MRT: name of the nearest heavy-rail MRT station

  • Dist_m: distance to the nearest station in meters

  • Accessibility_Score: score = max(0, 10 - (dist_km * 2)), where dist_km = Dist_m / 1000

Expectations: Input df must have ‘lat’ and ‘lon’ columns (degrees).
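An end-to-end sketch of the three output columns. It deliberately swaps the KDTree for a brute-force haversine distance matrix, so it shows what the columns mean rather than how the class computes them. `annotate_accessibility`, the station frame layout (`name`, `lat`, `lon`) and the Earth radius are assumptions.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # assumption: mean Earth radius in meters


def annotate_accessibility(df, stations):
    """Add Nearest_MRT, Dist_m and Accessibility_Score (brute-force sketch)."""
    missing = {"lat", "lon"} - set(df.columns)
    if missing:
        raise ValueError(f"df is missing columns: {sorted(missing)}")
    # Listings become a column vector, stations a row vector, so the
    # haversine terms broadcast to an (n_listings, n_stations) matrix.
    lat1 = np.radians(df["lat"].to_numpy(dtype=float))[:, None]
    lon1 = np.radians(df["lon"].to_numpy(dtype=float))[:, None]
    lat2 = np.radians(stations["lat"].to_numpy(dtype=float))[None, :]
    lon2 = np.radians(stations["lon"].to_numpy(dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_m = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    nearest = dist_m.argmin(axis=1)  # index of the closest station per listing
    out = df.copy()
    out["Nearest_MRT"] = stations["name"].to_numpy()[nearest]
    out["Dist_m"] = dist_m[np.arange(len(df)), nearest]
    out["Accessibility_Score"] = np.maximum(0.0, 10.0 - (out["Dist_m"] / 1000.0) * 2.0)
    return out
```

The matrix costs n_listings × n_stations floats, which is fine for a few thousand listings against a few hundred exits; for larger inputs the KDTree approach described for this class avoids materializing it.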