unsprawl.spatial

Spatial analysis module for MRT accessibility scoring.

This module provides geospatial analysis capabilities for computing MRT accessibility scores using KDTree-based nearest-neighbor queries.

Attributes

KDTree

Classes

TransportScorer

Compute MRT accessibility scores using spatial nearest-neighbor queries.

Module Contents

KDTree = None

class TransportScorer(stations_df=None, cache_dir=None)

Compute MRT accessibility scores using spatial nearest-neighbor queries.

This scorer loads a catalog of MRT station coordinates, strictly excluding all LRT stations using a regex filter ‘^(BP|S[WE]|P[WE])’. The pattern matches the line codes for Bukit Panjang (BP), Sengkang (SW/SE), and Punggol (PW/PE) LRT loops, ensuring that only heavy rail stations are retained.
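As a minimal sketch of how the prefix filter behaves, the snippet below applies the same pattern with the standard `re` module. The helper name `is_lrt` and the sample codes are illustrative assumptions, not part of the module.

```python
import re

# Pattern quoted in the description above, anchored at the start of the code.
LRT_PATTERN = re.compile(r"^(BP|S[WE]|P[WE])")


def is_lrt(line_code: str) -> bool:
    """Return True when a line code starts with one of the LRT prefixes."""
    return LRT_PATTERN.match(line_code) is not None


# Sample codes for illustration only (hypothetical input).
codes = ["NS1", "EW24", "BP6", "SW3", "SE5", "PW1", "PE7", "DT35", "CC29"]

# Keep only the codes that do not start with an LRT prefix.
heavy_rail = [code for code in codes if not is_lrt(code)]
print(heavy_rail)  # ['NS1', 'EW24', 'DT35', 'CC29']
```

Because the pattern is anchored with `^`, it only tests the start of the code, so a code such as `NS` or `EW` is never matched by accident.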

A KDTree (from scikit-learn) performs the nearest-neighbor lookup for all listings in a single vectorized query, so thousands of records are processed without a Python-level loop.

Accessibility score definition

score = max(0, 10 - (dist_km * 2)) where dist_km is the Euclidean distance in kilometers from the HDB listing coordinate to the nearest MRT station in the filtered catalog.
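The formula can be checked with a small standalone function. This is a sketch of the stated definition only; the function name `accessibility_score` is an assumption and is not the module's API.

```python
def accessibility_score(dist_km: float) -> float:
    """Score from the definition above: 10 at the station, minus 2 per km, floored at 0."""
    return max(0.0, 10.0 - dist_km * 2.0)


# The score falls linearly and reaches 0 at 5 km.
for dist_km in (0.0, 0.5, 2.5, 5.0, 8.0):
    print(dist_km, accessibility_score(dist_km))
```

Under this definition any listing 5 km or farther from the nearest station scores 0.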

logger

_stations: DataFrame | None = None

_tree: KDTree | None = None

_cache_dir

static _exclude_lrt(df)

Exclude LRT stations using strict regex on line codes.

Excludes station rows whose line_code matches ‘^(BP|S[WE]|P[WE])’.

Column expectations:

- name: station name (str)
- line_code: string line code such as ‘NS’, ‘EW’, ‘DT’, ‘CC’, ‘BP’, ‘SW’
- lat, lon: numeric coordinates in degrees

_cache_paths(tag)

clear_cache()

Delete cached stations and KDTree files in the cache directory.

_try_load_cache(tag)

_save_cache(tag)

load_stations(stations_df)

Load station catalog, exclude LRT, and build KDTree index.

Parameters: stations_df (pd.DataFrame) – DataFrame with columns: [‘name’, ‘line_code’, ‘lat’, ‘lon’].

load_stations_geojson(path)

Load MRT stations from an LTA Exit GeoJSON file and build KDTree.

The GeoJSON is expected to be a FeatureCollection where each feature is a station exit with properties containing station information. This loader will:

Extract station name and line code from common property keys.
Fall back across the station-name property keys found in different GeoJSON variants (STATION_NA / STN_NAME / STN_NAM / NAME / etc.).
Strictly exclude LRT using the regex ‘^(BP|S[WE]|P[WE])’ on line codes when available, and additionally filter out any stations with ‘LRT’ in the name as a safety fallback.
Build a KDTree over exit coordinates (lon, lat). Using exits provides accurate pedestrian access points for distance calculations.

Parameters: path (str) – Path to the GeoJSON file.
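The loading steps listed above can be sketched with the standard `json` module. This is an illustration under assumptions, not the module's implementation: the name keys come from the description, while `LINE_KEYS`, `first_present`, `parse_exits`, and the sample features are hypothetical. The sketch stops at the filtered exit list and does not build the KDTree.

```python
import json
import re

# Station-name keys named in the description, tried in order.
NAME_KEYS = ("STATION_NA", "STN_NAME", "STN_NAM", "NAME")
# Hypothetical line-code key; real files may use a different property name.
LINE_KEYS = ("LINE_CODE",)
LRT_PATTERN = re.compile(r"^(BP|S[WE]|P[WE])")


def first_present(props, keys):
    """Return the first non-empty property value among keys, or None."""
    for key in keys:
        value = props.get(key)
        if value:
            return str(value).strip()
    return None


def parse_exits(geojson_text):
    """Extract non-LRT station exits as dicts with name, line_code, lat, lon."""
    collection = json.loads(geojson_text)
    exits = []
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        name = first_present(props, NAME_KEYS)
        if name is None:
            continue
        line_code = first_present(props, LINE_KEYS)
        # Primary filter: regex on the line code, when one is available.
        if line_code and LRT_PATTERN.match(line_code):
            continue
        # Safety fallback: drop anything with 'LRT' in the station name.
        if "LRT" in name.upper():
            continue
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is [lon, lat]
        exits.append({"name": name, "line_code": line_code, "lat": lat, "lon": lon})
    return exits


def _feature(props, lon, lat):
    return {"type": "Feature", "properties": props,
            "geometry": {"type": "Point", "coordinates": [lon, lat]}}


# Made-up sample data covering each branch of the filter.
sample = json.dumps({"type": "FeatureCollection", "features": [
    _feature({"STATION_NA": "ALPHA MRT STATION", "LINE_CODE": "NS"}, 103.80, 1.30),
    _feature({"STN_NAME": "BETA LRT STATION"}, 103.81, 1.31),
    _feature({"NAME": "GAMMA STATION", "LINE_CODE": "BP"}, 103.82, 1.32),
    _feature({"STN_NAM": "DELTA MRT STATION"}, 103.83, 1.33),
]})

exits = parse_exits(sample)
print([e["name"] for e in exits])  # ['ALPHA MRT STATION', 'DELTA MRT STATION']
```

Applying the name check after the line-code check means a feature with no line code is still dropped when its name marks it as LRT.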

static _haversine_meters(latlon1, latlon2)

Compute haversine distance in meters between arrays of points.

Parameters:

latlon1 (np.ndarray) – Array of shape (n, 2) with columns [lat_rad, lon_rad] in radians.
latlon2 (np.ndarray) – Array of shape (n, 2) with columns [lat_rad, lon_rad] in radians.
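For reference, the haversine formula for a single pair of points can be sketched with the standard `math` module. The method documented here operates on arrays; this scalar version, the function name `haversine_meters`, and the Earth radius constant are assumptions for illustration.

```python
import math

# Mean Earth radius in meters (assumed; the module's constant may differ).
EARTH_RADIUS_M = 6_371_000.0


def haversine_meters(lat1_rad, lon1_rad, lat2_rad, lon2_rad):
    """Great-circle distance in meters between two points given in radians."""
    dlat = lat2_rad - lat1_rad
    dlon = lon2_rad - lon1_rad
    a = (math.sin(dlat / 2.0) ** 2
         + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(dlon / 2.0) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before sqrt/asin.
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


# One degree of latitude along a meridian is roughly 111.19 km with this radius.
one_degree = haversine_meters(math.radians(1.0), math.radians(103.8),
                              math.radians(2.0), math.radians(103.8))
print(round(one_degree, 2))
```

Inputs are in radians, matching the `[lat_rad, lon_rad]` column layout described above, so coordinates in degrees must be converted first.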

calculate_accessibility_score(df)

Annotate DataFrame with nearest MRT and accessibility score.

Adds columns:

- Nearest_MRT: name of nearest heavy-rail MRT station
- Dist_m: distance to nearest station in meters
- Accessibility_Score: score = max(0, 10 - (dist_km * 2))

Expectations: Input df must have ‘lat’ and ‘lon’ columns (degrees).
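The three output columns can be illustrated end to end with plain dictionaries. This sketch swaps in a brute-force nearest-station search for the KDTree query and assumes great-circle distance, so it shows the shape of the result rather than the module's implementation; `dist_m`, `annotate`, and the sample coordinates are hypothetical.

```python
import math


def dist_m(lat1, lon1, lat2, lon2):
    """Haversine distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(min(1.0, a)))


def annotate(rows, stations):
    """Return copies of rows with Nearest_MRT, Dist_m and Accessibility_Score added."""
    annotated = []
    for row in rows:
        # Brute-force search; a KDTree replaces this loop at scale.
        nearest = min(
            stations,
            key=lambda s: dist_m(row["lat"], row["lon"], s["lat"], s["lon"]),
        )
        d = dist_m(row["lat"], row["lon"], nearest["lat"], nearest["lon"])
        annotated.append({
            **row,
            "Nearest_MRT": nearest["name"],
            "Dist_m": d,
            "Accessibility_Score": max(0.0, 10.0 - (d / 1000.0) * 2.0),
        })
    return annotated


# Made-up stations and listings, coordinates in degrees.
stations = [
    {"name": "Station A", "lat": 1.30, "lon": 103.80},
    {"name": "Station B", "lat": 1.40, "lon": 103.90},
]
rows = [{"lat": 1.30, "lon": 103.80}, {"lat": 1.39, "lon": 103.90}]

annotated = annotate(rows, stations)
for row in annotated:
    print(row["Nearest_MRT"], round(row["Dist_m"], 1), round(row["Accessibility_Score"], 2))
```

The first listing sits on Station A and scores 10; the second is about 1.1 km from Station B and scores just under 7.8.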