unsprawl.loader

Data loading module for Unsprawl.

This module handles CSV ingestion and schema normalization for HDB resale data.

Classes

Schema

Canonical column names expected by the pipeline.

HDBLoader

Load and normalize HDB resale CSV data.

Module Contents

class Schema

Canonical column names expected by the pipeline.

This class centralizes schema expectations while allowing flexible mapping from real-world datasets where names may vary slightly in case or spacing.

town: str = 'town'
flat_type: str = 'flat_type'
resale_price: str = 'resale_price'
floor_area: str = 'floor_area_sqm'
remaining_lease_raw: str = 'remaining_lease'
remaining_lease_years: str = 'remaining_lease_years'
price_efficiency: str = 'price_efficiency'
z_price_efficiency: str = 'z_price_efficiency'
valuation_score: str = 'valuation_score'
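
The attributes above can serve as a single source of truth when checking an incoming dataset. The following is a minimal sketch, not the library's implementation: `SchemaSketch` is a hypothetical stand-in that mirrors a few of the documented attribute names and defaults (modeling it as a frozen dataclass is an assumption), and `missing_columns` is an illustrative helper that does not exist in Unsprawl.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaSketch:
    """Hypothetical stand-in mirroring a few documented Schema attributes."""

    town: str = "town"
    flat_type: str = "flat_type"
    resale_price: str = "resale_price"
    floor_area: str = "floor_area_sqm"
    remaining_lease_raw: str = "remaining_lease"


def missing_columns(columns, schema):
    """Return the canonical names from `schema` that are absent from `columns`.

    Which columns count as required is an assumption made for this sketch.
    """
    required = [
        schema.town,
        schema.flat_type,
        schema.resale_price,
        schema.floor_area,
    ]
    present = set(columns)
    return [name for name in required if name not in present]


schema = SchemaSketch()
# The incoming header lacks the floor-area column.
missing = missing_columns(["town", "flat_type", "resale_price"], schema)
```

Referring to `schema.floor_area` rather than the literal `'floor_area_sqm'` keeps downstream code stable if the canonical name ever changes.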

class HDBLoader(schema=None)

Load and normalize HDB resale CSV data.

The loader focuses on robust file I/O and schema normalization. It lowercases and strips column names to mitigate schema drift, and attempts to coerce core numeric columns to a numeric dtype with proper NA handling.
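
The two normalization steps described above can be approximated with plain pandas. This is a sketch under stated assumptions rather than the loader's actual code: `normalize_columns` and `coerce_numeric` are hypothetical helper names, the sample rows are made up, and mapping unparseable values to NaN through `errors="coerce"` is only one plausible reading of "proper NA handling".

```python
import io

import pandas as pd


def normalize_columns(df):
    """Return a copy of `df` with lowercased, whitespace-stripped column names."""
    out = df.copy()
    out.columns = [str(c).strip().lower() for c in out.columns]
    return out


def coerce_numeric(df, columns):
    """Coerce the named columns to a numeric dtype; unparseable values become NaN."""
    out = df.copy()
    for col in columns:
        if col in out.columns:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Made-up sample with drifting header case/spacing and one bad price value.
raw = io.StringIO(
    " Town ,Resale_Price,FLOOR_AREA_SQM\n"
    "BEDOK,450000,92\n"
    "YISHUN,unknown,67\n"
)
df = pd.read_csv(raw)
df = coerce_numeric(normalize_columns(df), ["resale_price", "floor_area_sqm"])
```

After these steps the header reads `town`, `resale_price`, `floor_area_sqm`, and the unparseable price is stored as NaN instead of forcing the whole column to a string dtype.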

schema
logger

load(path)

Load CSV into a pandas DataFrame with normalized column names.

Parameters:

path (str) – Path to the CSV file.

Returns:

DataFrame with normalized columns and raw types preserved where possible.

Return type:

pd.DataFrame

Raises: