unsprawl

Unsprawl - A hardware-accelerated compound AI system for high-fidelity urban simulation and autonomous infrastructure resilience.

This package provides:

  1. Universal Modular Design (UMD): Region-agnostic schemas (Entity, Asset, Agent) and a dynamic Provider→Adapter→Loader architecture.

  2. Legacy Singapore Valuation Pipeline: A complete CLI + programmatic API for lease-adjusted property valuation (backwards-compatible).

Quick Start (Global Platform - New UMD API)

>>> from unsprawl import Region, UniversalLoader
>>>
>>> loader = UniversalLoader()
>>> assets = loader.load(Region.SG)  # Dynamic dispatch to SGAdapter
>>> print(assets[0].asset_type, assets[0].floor_area_sqm)

Quick Start (Legacy SG Pipeline - Backwards Compatible)

>>> from unsprawl import UnsprawlApp
>>>
>>> app = UnsprawlApp()
>>> results = app.process(
...     input_path="resale.csv",
...     town="PUNGGOL",
...     budget=600000,
...     top_n=10
... )
>>> print(results)

Main Classes (UMD Architecture)

  • Entity, Asset, Agent : Universal simulation schemas (Pydantic)

  • Region : Nested namespace for region codes (e.g., Region.SG, Region.US.CA.SF)

  • UniversalLoader : Dynamic dispatcher that routes region nodes to country adapters

  • GovSGProvider : Network-first data fetcher for Singapore (cached under ~/.unsprawl/data)

  • SGAdapter : Normalizes SG datasets into universal Asset objects

Legacy Classes (Singapore Valuation - v0 compat)

  • UnsprawlApp : High-level orchestrator for the complete SG valuation pipeline

  • HDBLoader : Load and normalize HDB resale CSV data

  • FeatureEngineer : Parse remaining lease and compute price efficiency

  • LeaseDepreciationModel : Bala’s Curve implementation for lease depreciation

  • ValuationEngine : Compute group-wise z-scores and valuation scores

  • TransportScorer : Calculate MRT accessibility scores

  • ReportGenerator : Filter, rank, and format results

  • Schema : Column name definitions for the pipeline

Classes

UnsprawlApp

Application orchestrator wiring the pipeline and providing both programmatic and CLI access.

Agent

A dynamic actor (Commuter, Bus, Car).

Asset

A static economic unit (Building, Park, Transit Station).

Entity

Universal base class for the Unsprawl simulation.

HDBLoader

Load and normalize HDB resale CSV data.

Schema

Canonical column names expected by the pipeline.

UniversalLoader

Dynamic dispatcher that loads Assets for a given Region node.

FeatureEngineer

Engineer features required for valuation.

LeaseDepreciationModel

Non-linear lease depreciation model (Bala's Curve Approximation).

ValuationEngine

Compute group-wise Z-Scores, growth potential, and a final valuation score.

ReportGenerator

Filter, rank, and render a clean buy list table to console.

TransportScorer

Compute MRT accessibility scores using spatial nearest-neighbor queries.

Functions

main([argv])

Entry point that dispatches to Typer app.

ensure_lat_lon_from_town_centroids(df, *[, town_col, ...])

Ensure DataFrame has numeric lat and lon columns.

configure_logging(verbosity)

Configure root logger formatting and level.

Package Contents

class UnsprawlApp(schema=None, transport_cache_dir=None)[source]

Application orchestrator wiring the pipeline and providing both programmatic and CLI access.

This class can be used directly as a Python module or via the CLI. For programmatic usage, use the process() method with explicit parameters. For CLI usage, use the run() method with parsed arguments.

Example (Module Usage)

>>> app = UnsprawlApp()
>>> results = app.process(
...     input_path="resale.csv",
...     town="PUNGGOL",
...     budget=600000,
...     top_n=10
... )
>>> print(results.head())

Example (With MRT Accessibility - Default)

>>> app = UnsprawlApp()
>>> results = app.process(
...     input_path="resale.csv",
...     town="BISHAN"
... )

Example (Custom MRT Catalog)

>>> results = app.process(
...     input_path="resale.csv",
...     mrt_catalog="stations.geojson",
...     town="BISHAN"
... )

Initialize the valuation engine with optional custom schema and cache directory.

Parameters:
  • schema (Schema | None) – Custom schema definition. If None, uses default Schema().

  • transport_cache_dir (Optional[str]) – Directory for caching transport KDTree data. If None, uses default .cache_transport.

schema
loader
fe
engine
transport
reporter
logger
_data: DataFrame | None = None
load_data(input_path)[source]

Load HDB resale data from CSV file.

Parameters:

input_path (str) – Path to the HDB resale CSV file.

Returns:

Loaded and normalized DataFrame.

Return type:

pd.DataFrame

Raises:
  • FileNotFoundError – If input_path does not exist.
process(input_path=None, data=None, mrt_catalog=None, clear_transport_cache=False, group_by=None, enable_accessibility_adjust=True, town=None, town_like=None, budget=None, flat_type=None, flat_type_like=None, flat_model=None, flat_model_like=None, storey_min=None, storey_max=None, area_min=None, area_max=None, lease_min=None, lease_max=None, top_n=10, return_full=False)[source]

Process HDB resale data and return filtered, scored results.

This is the main programmatic entry point for using the valuation engine as a module.

Parameters:
  • input_path (Optional[str]) – Path to HDB resale CSV. Required if data is not provided.

  • data (Optional[pd.DataFrame]) – Pre-loaded DataFrame. If provided, input_path is ignored.

  • mrt_catalog (Optional[str]) – Path to MRT stations GeoJSON or CSV for transport scoring.

  • clear_transport_cache (bool) – Whether to clear transport cache before processing.

  • group_by (Optional[List[str]]) – Columns to group by for peer comparison z-scores. Defaults to [town, flat_type].

  • enable_accessibility_adjust (bool) – Whether to adjust price efficiency based on MRT accessibility. Default True.

  • town (Optional[str]) – Exact town filter (case-insensitive).

  • town_like (Optional[str]) – Partial town match (substring).

  • budget (Optional[float]) – Maximum resale price.

  • flat_type (Optional[str]) – Exact flat type filter.

  • flat_type_like (Optional[str]) – Partial flat type match.

  • flat_model (Optional[str]) – Exact flat model filter.

  • flat_model_like (Optional[str]) – Partial flat model match.

  • storey_min (Optional[int]) – Minimum storey number.

  • storey_max (Optional[int]) – Maximum storey number.

  • area_min (Optional[float]) – Minimum floor area (sqm).

  • area_max (Optional[float]) – Maximum floor area (sqm).

  • lease_min (Optional[float]) – Minimum remaining lease (years).

  • lease_max (Optional[float]) – Maximum remaining lease (years).

  • top_n (int) – Number of top results to return. Default 10.

  • return_full (bool) – If True, return all filtered results instead of just top_n.

Returns:

Filtered and scored results, sorted by valuation_score descending.

Return type:

pd.DataFrame

Raises:
  • ValueError – If neither input_path nor data is provided and the default dataset path is not available.

  • FileNotFoundError – If input_path does not exist.

Examples

>>> app = UnsprawlApp()
>>> results = app.process(
...     input_path="resale.csv",
...     town="PUNGGOL",
...     budget=600000,
...     top_n=5
... )
>>> print(f"Found {len(results)} undervalued properties")
render_report(data=None, town=None, town_like=None, budget=None, flat_type=None, flat_type_like=None, flat_model=None, flat_model_like=None, storey_min=None, storey_max=None, area_min=None, area_max=None, lease_min=None, lease_max=None, top_n=10)[source]

Render a formatted string report from processed data.

Parameters:
  • data (Optional[pd.DataFrame]) – Pre-processed DataFrame with scores. If None, uses internally stored data.

  • top_n (int) – Number of results to include in report.

Notes

This method accepts the same filter arguments as process().

Returns:

Formatted table string ready for console output.

Return type:

str

render_rich_table(df, title='🏠 Top Undervalued Residential Properties')[source]

Render a Rich table from results DataFrame.

Parameters:
  • df (pd.DataFrame) – Results DataFrame with valuation scores.

  • title (str) – Table title.

Returns:

Formatted Rich table ready for console output.

Return type:

rich.table.Table

main(argv=None)[source]

Entry point that dispatches to Typer app.

Keeps return code semantics for tests, and supports legacy calls without a subcommand by defaulting to the valuate command when argv starts with flags.

class Agent(/, **data)[source]

Bases: Entity

A dynamic actor (Commuter, Bus, Car).

Agents flow through the city graph / continuous space depending on the simulation backend.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

velocity: tuple[float, float] = (0.0, 0.0)
goal: LatLon
state: Literal['idle', 'moving', 'stuck'] = 'idle'
class Asset(/, **data)[source]

Bases: Entity

A static economic unit (Building, Park, Transit Station).

This replaces the legacy Singapore-specific concept of “HDB Flat” with a generic container that can represent any asset class across any region.

The physics engine treats local_metadata as an opaque payload.

asset_type: Literal['residential', 'commercial', 'transport']
floor_area_sqm: float
lease_remaining_years: float
valuation_currency: str = 'USD'
predicted_valuation: float = 0.0
local_metadata: dict[str, Any] = None
class Entity(/, **data)[source]

Bases: pydantic.BaseModel

Universal base class for the Unsprawl simulation.

Everything in the simulation (static or moving) is an Entity.

Notes

Coordinate ordering is strictly (lat, lon) across the entire platform. Adapters must normalize any source data into this convention at the boundary.

id: str
location: LatLon
classmethod _validate_location_lat_lon(value)[source]

Validate that location is (lat, lon) in sane numeric ranges.

This validator is not meant to be geo-precise; it is a defensive check that:

  • enforces the ordering contract at runtime

  • catches common adapter mistakes early (swapped lon/lat)
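The ordering check described above can be sketched as a plain function (not the actual Pydantic validator, whose internals are not shown here): a swapped (lon, lat) pair from a country away from the equator typically produces an out-of-range latitude and is caught immediately.

```python
def check_lat_lon(value):
    """Defensive (lat, lon) ordering check; illustrative sketch only."""
    lat, lon = value
    if not (-90.0 <= lat <= 90.0):
        # A latitude outside [-90, 90] is the classic symptom of a
        # swapped (lon, lat) tuple coming from an adapter.
        raise ValueError(f"latitude {lat} out of range; is the tuple (lon, lat)?")
    if not (-180.0 <= lon <= 180.0):
        raise ValueError(f"longitude {lon} out of range")
    return (float(lat), float(lon))
```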

Region
ensure_lat_lon_from_town_centroids(df, *, town_col='town', lat_col='lat', lon_col='lon')[source]

Ensure DataFrame has numeric lat and lon columns.

If lat and lon already exist, they are coerced to numeric and left as-is. If one or both are missing, they are inferred from town using TOWN_CENTROIDS.

Unknown towns will remain NaN.
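The centroid-fallback behavior can be sketched as below. The centroid table here is a made-up example (the real function uses the package-level TOWN_CENTROIDS mapping, and its coordinates may differ), and fill_lat_lon is a hypothetical stand-in name.

```python
import pandas as pd

# Illustrative centroid table; NOT the real TOWN_CENTROIDS values.
CENTROIDS = {"PUNGGOL": (1.4051, 103.9025)}

def fill_lat_lon(df, town_col="town", lat_col="lat", lon_col="lon"):
    out = df.copy()
    if lat_col not in out or lon_col not in out:
        # Infer missing coordinates from the town centroid; unknown towns -> NaN.
        coords = out[town_col].map(CENTROIDS)
        out[lat_col] = coords.map(lambda c: c[0] if isinstance(c, tuple) else float("nan"))
        out[lon_col] = coords.map(lambda c: c[1] if isinstance(c, tuple) else float("nan"))
    # Existing columns are coerced to numeric, as documented.
    out[lat_col] = pd.to_numeric(out[lat_col], errors="coerce")
    out[lon_col] = pd.to_numeric(out[lon_col], errors="coerce")
    return out
```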

class HDBLoader(schema=None)[source]

Load and normalize HDB resale CSV data.

The loader focuses on robust file I/O and schema normalization. It lowercases and strips column names to mitigate schema drift and attempts to coerce core numeric columns into numeric dtype with proper NA handling.

schema
logger
load(path)[source]

Load CSV into a pandas DataFrame with normalized column names.

Parameters:

path (str) – Path to the CSV file.

Returns:

DataFrame with normalized columns and raw types preserved where possible.

Return type:

pd.DataFrame

Raises:
  • FileNotFoundError – If path does not exist.
class Schema[source]

Canonical column names expected by the pipeline.

This class centralizes schema expectations while allowing flexible mapping from real-world datasets where names may vary slightly in case or spacing.

town: str = 'town'
flat_type: str = 'flat_type'
resale_price: str = 'resale_price'
floor_area: str = 'floor_area_sqm'
remaining_lease_raw: str = 'remaining_lease'
remaining_lease_years: str = 'remaining_lease_years'
price_efficiency: str = 'price_efficiency'
z_price_efficiency: str = 'z_price_efficiency'
valuation_score: str = 'valuation_score'
class UniversalLoader[source]

Dynamic dispatcher that loads Assets for a given Region node.

load(region_node)[source]

Load assets for a region.

Parameters:

region_node – A Region.* node from unsprawl.core.regions.

Returns:

The normalized assets for this region.

Return type:

list[Asset]

class FeatureEngineer(schema=None, use_lease_depreciation=True, depreciation_model=None)[source]

Engineer features required for valuation.

Responsibilities

  • Parse remaining lease strings of the form “85 years 3 months” into a float in units of years (e.g., 85.25) with robust handling of edge cases.

  • Compute price efficiency as: resale_price / (floor_area_sqm * remaining_lease_years)

  • Apply non-linear lease depreciation adjustment via LeaseDepreciationModel

Mathematical Notes

Price efficiency penalizes larger prices per effective area-year. By dividing price by both floor area (sqm) and remaining lease (years), the metric naturally adjusts for lease decay. The non-linear depreciation model further refines this by accounting for the accelerating loss of value as lease expiry approaches.
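The base metric can be sketched with pandas (illustrative data only; column names follow the default Schema). Holding price and area fixed, the shorter lease yields a higher, i.e. worse, efficiency value.

```python
import pandas as pd

# Two identical flats except for remaining lease.
df = pd.DataFrame({
    "resale_price": [500_000.0, 500_000.0],
    "floor_area_sqm": [90.0, 90.0],
    "remaining_lease_years": [90.0, 60.0],
})

# price per effective area-year
df["price_efficiency"] = df["resale_price"] / (
    df["floor_area_sqm"] * df["remaining_lease_years"]
)
```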

Initialize FeatureEngineer with optional lease depreciation model.

Parameters:
  • schema (Schema | None) – Schema definition for column names.

  • use_lease_depreciation (bool) – Whether to apply non-linear lease depreciation adjustment (default: True).

  • depreciation_model (LeaseDepreciationModel | None) – Custom depreciation model. If None and use_lease_depreciation=True, creates default LeaseDepreciationModel.

_LEASE_YEARS_RE
_LEASE_MONTHS_RE
schema
logger
use_lease_depreciation = True
depreciation_model: LeaseDepreciationModel | None
_parse_lease_text(text)[source]

Parse a remaining lease string into float years.

Examples

  • “85 years 3 months” -> 85.25

  • “99 years” -> 99.0

  • “8 months” -> 0.666…

  • “less than 1 year” -> 0.5 (conservative placeholder)
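The parsing rules listed above can be reproduced with a small sketch. The regexes here are assumptions; the real implementation uses the private _LEASE_YEARS_RE / _LEASE_MONTHS_RE patterns, which may differ.

```python
import re

_YEARS = re.compile(r"(\d+)\s*year", re.IGNORECASE)
_MONTHS = re.compile(r"(\d+)\s*month", re.IGNORECASE)

def parse_lease_text(text):
    """Parse '85 years 3 months'-style strings into float years; sketch only."""
    if text is None:
        return None
    s = str(text).strip().lower()
    if "less than 1 year" in s:
        return 0.5  # conservative placeholder, as documented
    y = _YEARS.search(s)
    m = _MONTHS.search(s)
    if not y and not m:
        return None  # parsing failed
    years = float(y.group(1)) if y else 0.0
    months = float(m.group(1)) if m else 0.0
    return years + months / 12.0
```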

Parameters:

text (str | float | int | None) – Raw value from the dataset.

Returns:

Parsed years as float, or None if parsing fails.

Return type:

Optional[float]

_infer_remaining_lease_from_commence(df, assumed_lease_years=99.0)[source]

Infer remaining lease (years) from lease_commence_date and month columns.

Mathematics

remaining_years = assumed_lease_years - ((year + month/12) - lease_commence_year) where (year, month) come from the transaction month string “YYYY-MM”.

Values are clipped to [0, assumed_lease_years]. Non-parsable rows yield NaN.
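The stated formula can be transcribed directly, assuming month strings of the form "YYYY-MM" and an integer lease commencement year (infer_remaining_years is a hypothetical name, not the private method itself):

```python
def infer_remaining_years(month, lease_commence_year, assumed_lease_years=99.0):
    """remaining = assumed - ((year + month/12) - commence_year), clipped to [0, assumed]."""
    year_str, month_str = month.split("-")
    elapsed = (int(year_str) + int(month_str) / 12.0) - lease_commence_year
    remaining = assumed_lease_years - elapsed
    return min(max(remaining, 0.0), assumed_lease_years)
```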

parse_remaining_lease(df)[source]

Add a remaining_lease_years float column to the DataFrame.

The method parses the canonical remaining_lease column if present. If a numeric-looking remaining_lease_years column already exists, it is respected. If both are missing, it falls back to inferring from (lease_commence_date, month) assuming a 99-year lease. All parsing errors coerce to NaN.

Parameters:

df (pd.DataFrame) – Input dataframe.

Returns:

DataFrame with an added/updated remaining_lease_years column.

Return type:

pd.DataFrame

compute_price_efficiency(df)[source]

Compute price efficiency metric with optional non-linear lease depreciation.

Formula (Base)

price_efficiency = resale_price / (floor_area_sqm * remaining_lease_years)

Formula (With Depreciation Adjustment)

price_efficiency_adjusted = base_efficiency / depreciation_factor(remaining_lease)

where depreciation_factor ∈ [0, 1] computed via Bala’s Curve.

Interpretation

Lower values indicate better cost per area-year. The non-linear depreciation adjustment increases the effective price for properties with shorter leases, reflecting the accelerating loss of market value as lease expiry approaches and yielding a more market-realistic valuation.

class LeaseDepreciationModel(max_lease=99.0, decay_rate=3.0, steepness=2.5)[source]

Non-linear lease depreciation model (Bala’s Curve Approximation).

This model implements an economically rigorous depreciation curve for HDB leases, recognizing that a 99-year lease does not depreciate linearly. The value holds well for the first 30-40 years and then accelerates downward as lease expiry approaches.

Mathematical Model

The depreciation factor is computed using a sigmoid-like curve:

factor = exp(-k * ((99 - remaining) / 99)^n)

Where:

  • remaining: years of lease remaining

  • k: decay rate parameter (default: 3.0)

  • n: curve steepness (default: 2.5)

This produces:

  • Factor ≈ 1.0 for remaining > 80 years (minimal depreciation)

  • Factor ≈ 0.8-0.9 for remaining = 50-80 years (moderate depreciation)

  • Factor ≈ 0.3-0.7 for remaining = 20-50 years (accelerating depreciation)

  • Factor ≈ 0.0-0.2 for remaining < 20 years (severe depreciation)
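A minimal sketch of the curve with the documented defaults (k = 3.0, n = 2.5); the real class additionally exposes a configurable max_lease:

```python
import math

def depreciation_factor(remaining, max_lease=99.0, k=3.0, n=2.5):
    """factor = exp(-k * ((max_lease - remaining) / max_lease)^n), in (0, 1]."""
    age_fraction = (max_lease - remaining) / max_lease
    return math.exp(-k * age_fraction ** n)
```

With these defaults the factor stays near 1.0 for fresh leases and collapses only in the final decades, matching the bands above.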

References

This approximates the observed market behavior described in academic literature on HDB lease decay, including Bala’s studies on Singapore public housing valuation.

Initialize the lease depreciation model.

Parameters:
  • max_lease (float) – Maximum lease period in years (default: 99.0 for HDB).

  • decay_rate (float) – Controls overall depreciation intensity (higher = more aggressive decay).

  • steepness (float) – Controls curve shape (higher = sharper decline near end of lease).

max_lease = 99.0
decay_rate = 3.0
steepness = 2.5
logger
compute_depreciation_factor(remaining_years)[source]

Compute non-linear depreciation factor for given remaining lease years.

Parameters:

remaining_years (pd.Series | float) – Remaining lease in years (can be Series or scalar).

Returns:

Depreciation factor between 0 and 1, where 1 = no depreciation.

Return type:

pd.Series | float

adjust_price_efficiency(base_efficiency, remaining_years)[source]

Adjust price efficiency using non-linear lease depreciation.

The adjusted efficiency accounts for the non-linear loss of value over time. Lower depreciation factors increase the effective price per area-year, making properties with shorter leases appear more expensive on a value-adjusted basis.

Parameters:
  • base_efficiency (pd.Series) – Base price efficiency (price / (area * remaining_years)).

  • remaining_years (pd.Series) – Remaining lease years for each property.

Returns:

Lease-adjusted price efficiency.

Return type:

pd.Series

class ValuationEngine(schema=None)[source]

Compute group-wise Z-Scores, growth potential, and a final valuation score.

Methodology

  1. Compute Z-Score of price_efficiency within groups defined by configurable grouping keys (default: (town, flat_type)). The Z-Score is defined as:

    z = (x - mu) / sigma

    where x is the observation’s price_efficiency, mu is the group mean, and sigma is the group standard deviation. If sigma == 0 or NaN, z is set to 0.

  2. Define Valuation_Score = -Z_Price_Efficiency so that higher scores indicate better (cheaper-than-peers) properties.

  3. Compute a Growth_Potential metric based on price-per-sqm (PSM) vs the town average:

    • Deep Value (High Growth): Unit PSM < 0.85 × Town Avg PSM

    • Fair Value (Moderate Growth): 0.85 × Town Avg PSM ≤ Unit PSM < 1.0 × Town Avg PSM

    • Premium (Low Growth): Unit PSM ≥ 1.0 × Town Avg PSM

This civic value metric identifies properties trading significantly below their peer average, suggesting potential for price appreciation or representing exceptional value for money.
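Steps 1-2 can be sketched with pandas transforms; groupwise_valuation_score is a hypothetical helper name, but the zero/NaN-sigma handling follows the definition above.

```python
import pandas as pd

def groupwise_valuation_score(df, value_col="price_efficiency",
                              group_cols=("town", "flat_type")):
    grouped = df.groupby(list(group_cols))[value_col]
    mu = grouped.transform("mean")
    sigma = grouped.transform("std")
    z = (df[value_col] - mu) / sigma
    # sigma == 0 or NaN (e.g. singleton groups) -> z = 0, as documented
    z = z.where(sigma > 0, 0.0).fillna(0.0)
    out = df.copy()
    out["z_price_efficiency"] = z
    out["valuation_score"] = -z  # higher = cheaper than peers
    return out
```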

schema
logger
_groupwise_zscore(series, groups)[source]

Compute group-wise Z-Score with robust handling of zero std.

Parameters:
  • series (pd.Series) – Numeric series to standardize.

  • groups (pd.Series) – Group labels of same length as series.

Returns:

Group-wise z-scores with NaN-safe handling; zeros where std is 0 or NaN.

Return type:

pd.Series

_compute_growth_potential(df)[source]

Compute future appreciation potential based on price-per-sqm vs town average.

This civic finance heuristic identifies “deep value” properties trading significantly below their peer group average, which may indicate:

  1. Undervaluation relative to the neighborhood

  2. Higher potential for price appreciation

  3. Exceptional value-for-money opportunities

The metric uses vectorized pandas operations for performance.

Parameters:

df (pd.DataFrame) – Input DataFrame with resale_price, floor_area_sqm, town, and flat_type.

Returns:

DataFrame with added columns:

  • price_per_sqm: Unit price per square meter

  • town_avg_psm: Average PSM for the (town, flat_type) peer group

  • psm_ratio: Unit PSM / Town Avg PSM

  • growth_potential: Categorical score (High/Moderate/Low)

Return type:

pd.DataFrame
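The categorization can be vectorized with np.select, as sketched below (growth_potential here is a hypothetical stand-in for the private method, using the documented 0.85 / 1.0 thresholds):

```python
import numpy as np
import pandas as pd

def growth_potential(df):
    out = df.copy()
    out["price_per_sqm"] = out["resale_price"] / out["floor_area_sqm"]
    out["town_avg_psm"] = (
        out.groupby(["town", "flat_type"])["price_per_sqm"].transform("mean")
    )
    out["psm_ratio"] = out["price_per_sqm"] / out["town_avg_psm"]
    # Conditions are evaluated in order: < 0.85 -> High, < 1.0 -> Moderate, else Low.
    out["growth_potential"] = np.select(
        [out["psm_ratio"] < 0.85, out["psm_ratio"] < 1.0],
        ["High", "Moderate"],
        default="Low",
    )
    return out
```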

score(df, group_by=None)[source]

Add Z-Score, Valuation Score, and Growth Potential columns to the DataFrame.

Adds the following columns:

  • z_price_efficiency: group-wise Z-Score of price_efficiency within selected groups

  • valuation_score: -z_price_efficiency, so higher is more undervalued

  • price_per_sqm: Price per square meter

  • town_avg_psm: Average PSM for the peer group (town, flat_type)

  • psm_ratio: Unit PSM / Town Average PSM

  • growth_potential: Categorical (High/Moderate/Low) appreciation potential

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing required columns.

  • group_by (Optional[List[str]]) – Column names to define peer groups. Defaults to [town, flat_type].

Returns:

DataFrame with added score columns.

Return type:

pd.DataFrame

class ReportGenerator(schema=None)[source]

Filter, rank, and render a clean buy list table to console.

Filtering

  • Optional exact/partial town filter.

  • Optional budget filter for maximum resale price.

  • Extended filters: flat_model, flat_type (exact or partial), storey_min/max, area_min/max, lease_min/max.

Ranking

  • Sort by valuation_score descending (highest implies most undervalued), with ties broken by lowest price_efficiency and then lowest resale_price.

  • Display the top N results (default: 10).

Rendering

  • Human-friendly table using pandas’ built-in formatting.

schema
logger
static _parse_storey_range(sr)[source]

Parse HDB storey range strings like “07 TO 09” into (min, max).

Non-parsable inputs return (None, None).
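A sketch of such a parser is below; the regex is an assumption based on the documented "07 TO 09" format, not the actual private implementation.

```python
import re

_STOREY_RE = re.compile(r"^\s*(\d+)\s*TO\s*(\d+)\s*$", re.IGNORECASE)

def parse_storey_range(sr):
    """Parse '07 TO 09' into (7, 9); anything else -> (None, None)."""
    if not isinstance(sr, str):
        return (None, None)
    m = _STOREY_RE.match(sr)
    if not m:
        return (None, None)
    return (int(m.group(1)), int(m.group(2)))
```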

_apply_filters(df, town=None, town_like=None, budget=None, flat_type=None, flat_type_like=None, flat_model=None, flat_model_like=None, storey_min=None, storey_max=None, area_min=None, area_max=None, lease_min=None, lease_max=None)[source]

Apply user-specified filters to DataFrame.

Parameters:
  • df (pd.DataFrame) – The scored dataset.

  • town (Optional[str]) – Town name for exact case-insensitive filtering.

  • town_like (Optional[str]) – Substring case-insensitive match for town.

  • budget (Optional[float]) – Maximum resale price.

  • flat_type (Optional[str]) – Exact match filter for flat_type (case-insensitive).

  • flat_type_like (Optional[str]) – Substring match for flat_type.

  • flat_model (Optional[str]) – Exact match filter for flat_model.

  • flat_model_like (Optional[str]) – Substring match for flat_model.

  • storey_min, storey_max (Optional[int]) – Min/max storey filter; rows match if their parsed storey_range overlaps [storey_min, storey_max].

  • area_min, area_max (Optional[float]) – Floor area filters.

  • lease_min, lease_max (Optional[float]) – Remaining lease (years) filters.

generate_dataframe(df, town=None, town_like=None, budget=None, flat_type=None, flat_type_like=None, flat_model=None, flat_model_like=None, storey_min=None, storey_max=None, area_min=None, area_max=None, lease_min=None, lease_max=None, top_n=10, full=False)[source]

Produce the filtered, sorted DataFrame for display/export.

If full is True, returns all rows after sorting; otherwise, returns the top_n rows.

render(df, town=None, town_like=None, budget=None, flat_type=None, flat_type_like=None, flat_model=None, flat_model_like=None, storey_min=None, storey_max=None, area_min=None, area_max=None, lease_min=None, lease_max=None, top_n=10)[source]

Generate the formatted table for the buy list.

The table prioritizes the most undervalued units by sorting on valuation_score desc, breaking ties by price_efficiency asc and resale_price asc.

class TransportScorer(stations_df=None, cache_dir=None)[source]

Compute MRT accessibility scores using spatial nearest-neighbor queries.

This scorer loads a catalog of MRT station coordinates, strictly excluding all LRT stations using a regex filter ‘^(BP|S[WE]|P[WE])’. The pattern matches the line codes for Bukit Panjang (BP), Sengkang (SW/SE), and Punggol (PW/PE) LRT loops, ensuring that only heavy rail stations are retained.

A KDTree (from scikit-learn) provides vectorized nearest-neighbor queries across thousands of records without Python loops.

Accessibility score definition

score = max(0, 10 - (dist_km * 2)) where dist_km is the great-circle (haversine) distance in kilometers from the HDB listing coordinate to the nearest MRT station in the filtered catalog.
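The score definition transcribes directly to code: 10 points at the station, minus 2 points per kilometer, floored at 0 (so anything beyond 5 km scores 0).

```python
def accessibility_score(dist_km):
    """MRT accessibility score; dist_km is distance to the nearest station."""
    return max(0.0, 10.0 - dist_km * 2.0)
```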

logger
_stations: DataFrame | None = None
_tree: KDTree | None = None
_cache_dir
static _exclude_lrt(df)[source]

Exclude LRT stations using strict regex on line codes.

Excludes station rows whose line_code matches ‘^(BP|S[WE]|P[WE])’. Column expectations:

  • name: station name (str)

  • line_code: string line code such as ‘NS’, ‘EW’, ‘DT’, ‘CC’, ‘BP’, ‘SW’

  • lat, lon: numeric coordinates in degrees

_cache_paths(tag)[source]
clear_cache()[source]

Delete cached stations and KDTree files in the cache directory.

_try_load_cache(tag)[source]
_save_cache(tag)[source]
load_stations(stations_df)[source]

Load station catalog, exclude LRT, and build KDTree index.

Parameters:

stations_df (pd.DataFrame) – DataFrame with columns: [‘name’, ‘line_code’, ‘lat’, ‘lon’].

load_stations_geojson(path)[source]

Load MRT stations from an LTA Exit GeoJSON file and build KDTree.

The GeoJSON is expected to be a FeatureCollection where each feature is a station exit with properties containing station information. This loader will:

  • Extract station name and line code from common property keys.

  • Preserve robust fallback logic for station name parsing across GeoJSON variants (STATION_NA / STN_NAME / STN_NAM / NAME / etc.).

  • Strictly exclude LRT using the regex ‘^(BP|S[WE]|P[WE])’ on line codes when available, and additionally filter out any stations with ‘LRT’ in the name as a safety fallback.

  • Build a KDTree over exit coordinates (lon, lat). Using exits provides accurate pedestrian access points for distance calculations.

Parameters:

path (str) – Path to the GeoJSON file.

static _haversine_meters(latlon1, latlon2)[source]

Compute haversine distance in meters between arrays of points.

Parameters:
  • latlon1 (np.ndarray) – Array of shape (n, 2) with columns [lat_rad, lon_rad] in radians.

  • latlon2 (np.ndarray) – Array of shape (n, 2) with columns [lat_rad, lon_rad] in radians.
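A vectorized haversine matching the documented signature can be sketched as follows (haversine_meters is a stand-in name for the private static method; the Earth radius constant is an assumption):

```python
import numpy as np

def haversine_meters(latlon1, latlon2, earth_radius_m=6_371_000.0):
    """Haversine distance in meters; inputs are (n, 2) arrays of [lat_rad, lon_rad]."""
    lat1, lon1 = latlon1[:, 0], latlon1[:, 1]
    lat2, lon2 = latlon2[:, 0], latlon2[:, 1]
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 2.0 * earth_radius_m * np.arcsin(np.sqrt(a))
```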

calculate_accessibility_score(df)[source]

Annotate DataFrame with nearest MRT and accessibility score.

Adds columns:

  • Nearest_MRT: name of the nearest heavy-rail MRT station

  • Dist_m: distance to the nearest station in meters

  • Accessibility_Score: max(0, 10 - (dist_km * 2))

Expectations: Input df must have ‘lat’ and ‘lon’ columns (degrees).

__version__: str = '0.0.1'
configure_logging(verbosity)[source]

Configure root logger formatting and level.

Parameters:

verbosity (int) – Verbosity level from CLI:

  • 0: WARNING

  • 1: INFO

  • 2+: DEBUG