unsprawl.models

Financial modeling module for HDB valuation.

This module contains the core valuation logic, including:

  • Bala’s Curve implementation for non-linear lease depreciation

  • Feature engineering for price efficiency metrics

  • Valuation scoring with growth potential analysis

Classes

LeaseDepreciationModel

Non-linear lease depreciation model (Bala's Curve Approximation).

FeatureEngineer

Engineer features required for valuation.

ValuationEngine

Compute group-wise Z-Scores, growth potential, and a final valuation score.

Module Contents

class LeaseDepreciationModel(max_lease=99.0, decay_rate=3.0, steepness=2.5)

Non-linear lease depreciation model (Bala’s Curve Approximation).

This model implements a non-linear depreciation curve for HDB leases, reflecting the observation that a 99-year lease does not lose value linearly. Under the default parameters the factor stays above about 0.85 for the first 30 years of the lease, then declines at an accelerating rate as lease expiry approaches.

Mathematical Model

The depreciation factor is computed using a sigmoid-like curve:

factor = exp(-k * ((L - remaining) / L)^n)

Where:

  • remaining: years of lease remaining

  • L: maximum lease period, max_lease (default: 99.0)

  • k: decay rate parameter, decay_rate (default: 3.0)

  • n: curve steepness, steepness (default: 2.5)

With the default parameters, this produces approximately:

  • Factor ≈ 0.95-1.0 for remaining ≥ 80 years (minimal depreciation)

  • Factor ≈ 0.60-0.95 for remaining = 50-80 years (moderate depreciation)

  • Factor ≈ 0.18-0.60 for remaining = 20-50 years (accelerating depreciation)

  • Factor ≈ 0.05-0.18 for remaining < 20 years (severe depreciation; the floor at expiry is exp(-k) ≈ 0.05, never exactly 0)
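Evaluating the formula at the band boundaries with the default parameters (k = 3, n = 2.5, a 99-year lease) gives the following rounded values, which is a quick way to sanity-check any chosen decay_rate and steepness:

```latex
\begin{aligned}
f(r)  &= \exp\!\left(-k\left(\frac{99 - r}{99}\right)^{n}\right), \quad k = 3,\; n = 2.5 \\
f(80) &= \exp\!\left(-3\,(19/99)^{2.5}\right) \approx \exp(-0.048) \approx 0.95 \\
f(50) &= \exp\!\left(-3\,(49/99)^{2.5}\right) \approx \exp(-0.517) \approx 0.60 \\
f(20) &= \exp\!\left(-3\,(79/99)^{2.5}\right) \approx \exp(-1.706) \approx 0.18 \\
f(0)  &= \exp(-3) \approx 0.05
\end{aligned}
```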

References

This approximates the observed market behavior described in academic literature on HDB lease decay, including Bala’s studies on Singapore public housing valuation.

Initialize the lease depreciation model.

Parameters:
  • max_lease (float) – Maximum lease period in years (default: 99.0 for HDB).

  • decay_rate (float) – Controls overall depreciation intensity (higher = more aggressive decay).

  • steepness (float) – Controls curve shape (higher = sharper decline near end of lease).

max_lease = 99.0
decay_rate = 3.0
steepness = 2.5
logger
compute_depreciation_factor(remaining_years)

Compute non-linear depreciation factor for given remaining lease years.

Parameters:

remaining_years (pd.Series | float) – Remaining lease in years (can be Series or scalar).

Returns:

Depreciation factor between 0 and 1, where 1 = no depreciation.

Return type:

pd.Series | float
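A minimal sketch of the factor computation, assuming the documented formula is applied directly. Clipping inputs to [0, max_lease] is an assumption, not documented behavior, and the function name depreciation_factor is hypothetical. Like compute_depreciation_factor, it accepts either a scalar or a pd.Series:

```python
import numpy as np
import pandas as pd


def depreciation_factor(remaining_years, max_lease=99.0, decay_rate=3.0, steepness=2.5):
    """Sketch of the Bala's Curve approximation: exp(-k * ((L - r) / L) ** n)."""
    if isinstance(remaining_years, pd.Series):
        # Assumed clipping keeps the base of the power inside [0, 1].
        r = remaining_years.astype(float).clip(lower=0.0, upper=max_lease)
        elapsed_frac = (max_lease - r) / max_lease
        return np.exp(-decay_rate * elapsed_frac ** steepness)  # returns a Series
    r = min(max(float(remaining_years), 0.0), max_lease)
    elapsed_frac = (max_lease - r) / max_lease
    return float(np.exp(-decay_rate * elapsed_frac ** steepness))
```

For example, depreciation_factor(50.0) returns a plain float of about 0.60, while depreciation_factor(pd.Series([90.0, 45.0])) returns a Series aligned with its input.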

adjust_price_efficiency(base_efficiency, remaining_years)

Adjust price efficiency using non-linear lease depreciation.

The adjusted efficiency accounts for the non-linear loss of value over time. Lower depreciation factors increase the effective price per area-year, making properties with shorter leases appear more expensive on a value-adjusted basis.

Parameters:
  • base_efficiency (pd.Series) – Base price efficiency (price / (area * remaining_years)).

  • remaining_years (pd.Series) – Remaining lease years for each property.

Returns:

Lease-adjusted price efficiency.

Return type:

pd.Series

class FeatureEngineer(schema=None, use_lease_depreciation=True, depreciation_model=None)

Engineer features required for valuation.

Responsibilities

  • Parse remaining lease strings of the form “85 years 3 months” into a float in units of years (e.g., 85.25), handling edge cases such as month-only strings and “less than 1 year”.

  • Compute price efficiency as: resale_price / (floor_area_sqm * remaining_lease_years)

  • Apply non-linear lease depreciation adjustment via LeaseDepreciationModel

Mathematical Notes

Price efficiency is the price paid per effective area-year, so lower values are better. Dividing price by both floor area (sqm) and remaining lease (years) puts flats of different sizes and tenures on a common footing, which amounts to a linear lease adjustment. The non-linear depreciation model refines this by accounting for the accelerating loss of value as lease expiry approaches.

Initialize FeatureEngineer with optional lease depreciation model.

Parameters:
  • schema (Schema | None) – Schema definition for column names.

  • use_lease_depreciation (bool) – Whether to apply non-linear lease depreciation adjustment (default: True).

  • depreciation_model (LeaseDepreciationModel | None) – Custom depreciation model. If None and use_lease_depreciation=True, a default LeaseDepreciationModel is created.

_LEASE_YEARS_RE
_LEASE_MONTHS_RE
schema
logger
use_lease_depreciation = True
depreciation_model: LeaseDepreciationModel | None
_parse_lease_text(text)

Parse a remaining lease string into float years.

Examples

  • “85 years 3 months” -> 85.25

  • “99 years” -> 99.0

  • “8 months” -> 0.666…

  • “less than 1 year” -> 0.5 (conservative placeholder)

Parameters:

text (str | float | int | None) – Raw value from the dataset.

Returns:

Parsed years as float, or None if parsing fails.

Return type:

Optional[float]
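The parsing rules in the examples can be sketched with two regular expressions. The patterns _YEARS_RE and _MONTHS_RE and the function parse_lease_text are hypothetical stand-ins, since the module's actual _LEASE_YEARS_RE and _LEASE_MONTHS_RE are not shown here. Passing numeric inputs through unchanged (and mapping NaN to None) is also an assumption:

```python
import math
import re
from typing import Optional

# Hypothetical patterns; the real _LEASE_YEARS_RE / _LEASE_MONTHS_RE may differ.
_YEARS_RE = re.compile(r"(\d+)\s*years?", re.IGNORECASE)
_MONTHS_RE = re.compile(r"(\d+)\s*months?", re.IGNORECASE)


def parse_lease_text(text) -> Optional[float]:
    """Sketch: parse '85 years 3 months' style strings into float years."""
    if text is None:
        return None
    if isinstance(text, (int, float)):
        # Assumed: numeric values are already in years; NaN means unparsable.
        return None if math.isnan(float(text)) else float(text)
    s = str(text).strip().lower()
    if "less than 1 year" in s:
        # Checked first, because _YEARS_RE would otherwise match "1 year".
        return 0.5  # conservative placeholder, as documented
    years = _YEARS_RE.search(s)
    months = _MONTHS_RE.search(s)
    if years is None and months is None:
        return None
    total = 0.0
    if years:
        total += int(years.group(1))
    if months:
        total += int(months.group(1)) / 12.0
    return total
```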

_infer_remaining_lease_from_commence(df, assumed_lease_years=99.0)

Infer remaining lease (years) from lease_commence_date and month columns.

Mathematics

remaining_years = assumed_lease_years - ((year + month / 12) - lease_commence_year)

where (year, month) come from the transaction month string “YYYY-MM”.

Values are clipped to [0, assumed_lease_years]. Non-parsable rows yield NaN.
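A vectorized sketch of this inference, assuming the columns are named month and lease_commence_date as described above. The function name infer_remaining_lease is hypothetical, and using pd.to_datetime and pd.to_numeric with errors="coerce" is one way (not necessarily the module's) to turn non-parsable rows into NaN:

```python
import pandas as pd


def infer_remaining_lease(df: pd.DataFrame, assumed_lease_years: float = 99.0) -> pd.Series:
    """Sketch: assumed_lease_years - ((year + month / 12) - lease_commence_year)."""
    # "YYYY-MM" -> timestamp; unparsable strings become NaT, which yields NaN below.
    ts = pd.to_datetime(df["month"], format="%Y-%m", errors="coerce")
    commence = pd.to_numeric(df["lease_commence_date"], errors="coerce")
    elapsed = (ts.dt.year + ts.dt.month / 12.0) - commence
    remaining = assumed_lease_years - elapsed
    # Clip to [0, assumed_lease_years]; NaN passes through unchanged.
    return remaining.clip(lower=0.0, upper=assumed_lease_years)


txns = pd.DataFrame({
    "month": ["2017-01", "bad", "2020-12"],
    "lease_commence_date": [1979, 1990, 1900],
})
remaining = infer_remaining_lease(txns)
```

The third row shows the clipping: 121 elapsed years on an assumed 99-year lease gives 0 rather than a negative value.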

parse_remaining_lease(df)

Add a remaining_lease_years float column to the DataFrame.

The method parses the canonical remaining_lease column when it is present. If a numeric-looking remaining_lease_years column already exists, its values are respected. If remaining_lease is missing, the method falls back to inferring the lease from (lease_commence_date, month), assuming a 99-year lease. All parsing errors are coerced to NaN.

Parameters:

df (pd.DataFrame) – Input dataframe.

Returns:

DataFrame with an added/updated remaining_lease_years column.

Return type:

pd.DataFrame

compute_price_efficiency(df)

Compute price efficiency metric with optional non-linear lease depreciation.

Formula (Base)

price_efficiency = resale_price / (floor_area_sqm * remaining_lease_years)

Formula (With Depreciation Adjustment)

price_efficiency_adjusted = base_efficiency / depreciation_factor(remaining_lease_years)

where depreciation_factor ∈ (0, 1] is computed by LeaseDepreciationModel (Bala’s Curve approximation).

Interpretation

Lower values indicate a better cost per area-year. Because the depreciation factor is at most 1, the adjustment never lowers the efficiency value and raises it most for short leases, reflecting the accelerating loss of market value as lease expiry approaches. As a result, two flats with the same base efficiency but different remaining leases no longer receive the same score.
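Both formulas can be sketched in a few vectorized lines. This assumes the column names used above and inlines the Bala’s Curve factor with the documented defaults; treating non-positive area-years as missing is an assumption, and price_efficiency is a hypothetical helper name rather than the method itself:

```python
import numpy as np
import pandas as pd


def price_efficiency(df, max_lease=99.0, decay_rate=3.0, steepness=2.5, adjust=True):
    """Sketch: resale_price / (floor_area_sqm * remaining_lease_years), optionally / factor."""
    area_years = df["floor_area_sqm"] * df["remaining_lease_years"]
    # Assumed: zero or negative area-years are treated as missing to avoid division by zero.
    base = df["resale_price"] / area_years.where(area_years > 0)
    if not adjust:
        return base
    r = df["remaining_lease_years"].clip(lower=0.0, upper=max_lease)
    factor = np.exp(-decay_rate * ((max_lease - r) / max_lease) ** steepness)
    return base / factor


flats = pd.DataFrame({
    "resale_price": [500_000.0, 500_000.0],
    "floor_area_sqm": [100.0, 100.0],
    "remaining_lease_years": [90.0, 45.0],
})
base = price_efficiency(flats, adjust=False)
adjusted = price_efficiency(flats)
```

With these inputs, the adjustment changes the 90-year flat by under 1% but roughly doubles the base value for the 45-year flat.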

class ValuationEngine(schema=None)

Compute group-wise Z-Scores, growth potential, and a final valuation score.

Methodology

  1. Compute Z-Score of price_efficiency within groups defined by configurable grouping keys (default: (town, flat_type)). The Z-Score is defined as:

    z = (x - mu) / sigma

    where x is the observation’s price_efficiency, mu is the group mean, and sigma is the group standard deviation. If sigma == 0 or NaN, z is set to 0.

  2. Define valuation_score = -z_price_efficiency so that higher scores indicate better (cheaper-than-peers) properties.

  3. Compute the growth_potential metric from price per sqm (PSM) relative to the average PSM of the (town, flat_type) peer group, i.e. psm_ratio = Unit PSM / Town Avg PSM:

    • Deep Value (High Growth): psm_ratio < 0.85

    • Fair Value (Moderate Growth): 0.85 ≤ psm_ratio < 1.0

    • Premium (Low Growth): psm_ratio ≥ 1.0

This civic value metric flags properties trading well below their peer average, which may indicate potential for price appreciation or exceptional value for money.

schema
logger
_groupwise_zscore(series, groups)

Compute group-wise Z-Score with robust handling of zero std.

Parameters:
  • series (pd.Series) – Numeric series to standardize.

  • groups (pd.Series) – Group labels of same length as series.

Returns:

Group-wise z-scores with NaN-safe handling; zeros where std is 0 or NaN.

Return type:

pd.Series
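A sketch of the group-wise Z-Score and the derived valuation score using groupby().transform. It assumes the sample standard deviation (pandas' default ddof=1), which the documentation does not specify, and builds a single composite (town, flat_type) key by string concatenation purely for illustration. groupwise_zscore is a hypothetical name:

```python
import pandas as pd


def groupwise_zscore(series: pd.Series, groups: pd.Series) -> pd.Series:
    """Sketch: z = (x - group mean) / group std, with z = 0 where std is 0 or NaN."""
    grouped = series.groupby(groups)
    mu = grouped.transform("mean")
    sigma = grouped.transform("std")  # sample std (ddof=1): an assumption
    z = (series - mu) / sigma
    # NaN > 0 is False, so this also zeroes single-member groups (std = NaN).
    return z.where(sigma > 0, 0.0)


df = pd.DataFrame({
    "town": ["TOWN_A", "TOWN_A", "TOWN_A", "TOWN_B"],
    "flat_type": ["4 ROOM"] * 4,
    "price_efficiency": [10.0, 20.0, 30.0, 15.0],
})
# Composite (town, flat_type) key, for illustration only.
keys = df["town"] + "|" + df["flat_type"]
df["z_price_efficiency"] = groupwise_zscore(df["price_efficiency"], keys)
df["valuation_score"] = -df["z_price_efficiency"]
```

transform broadcasts each group statistic back to the original row index, so the result aligns with the input without a merge.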

_compute_growth_potential(df)

Estimate future appreciation potential from price per sqm relative to the (town, flat_type) peer average.

This civic finance heuristic identifies “deep value” properties trading significantly below their peer group average, which may indicate:

  1. Undervaluation relative to the neighborhood

  2. Higher potential for price appreciation

  3. Exceptional value-for-money opportunities

The metric uses vectorized pandas operations for performance.

Parameters:

df (pd.DataFrame) – Input DataFrame with resale_price, floor_area_sqm, town, and flat_type.

Returns:

DataFrame with added columns:

  • price_per_sqm: Unit price per square meter

  • town_avg_psm: Average PSM for (town, flat_type) peer group

  • psm_ratio: Unit PSM / Town Avg PSM

  • growth_potential: Categorical score (High/Moderate/Low)

Return type:

pd.DataFrame
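A vectorized sketch of this computation using groupby().transform and pd.cut. The half-open bins follow the 0.85 and 1.0 thresholds in the class description; the function name compute_growth_potential is hypothetical, and leaving rows with a missing ratio as NaN is an assumption:

```python
import numpy as np
import pandas as pd


def compute_growth_potential(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch: add price_per_sqm, town_avg_psm, psm_ratio and growth_potential."""
    out = df.copy()
    out["price_per_sqm"] = out["resale_price"] / out["floor_area_sqm"]
    # Peer-group mean PSM, broadcast back to every row of its group.
    out["town_avg_psm"] = out.groupby(["town", "flat_type"])["price_per_sqm"].transform("mean")
    out["psm_ratio"] = out["price_per_sqm"] / out["town_avg_psm"]
    # right=False gives half-open bins: [-inf, 0.85), [0.85, 1.0), [1.0, inf).
    out["growth_potential"] = pd.cut(
        out["psm_ratio"],
        bins=[-np.inf, 0.85, 1.0, np.inf],
        labels=["High", "Moderate", "Low"],
        right=False,
    )
    return out


sales = pd.DataFrame({
    "town": ["TOWN_A"] * 3,
    "flat_type": ["4 ROOM"] * 3,
    "resale_price": [400_000.0, 480_000.0, 620_000.0],
    "floor_area_sqm": [100.0, 100.0, 100.0],
})
result = compute_growth_potential(sales)
```

pd.cut with right=False makes each bin include its lower edge, so a ratio of exactly 0.85 lands in Moderate and exactly 1.0 lands in Low, matching the ≤ and < boundaries.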

score(df, group_by=None)

Add Z-Score, Valuation Score, and Growth Potential columns to the DataFrame.

Adds the following columns:

  • z_price_efficiency: group-wise Z-Score of price_efficiency within selected groups

  • valuation_score: -z_price_efficiency, so higher is more undervalued

  • price_per_sqm: Price per square meter

  • town_avg_psm: Average PSM for peer group (town, flat_type)

  • psm_ratio: Unit PSM / Town Average PSM

  • growth_potential: Categorical (High/Moderate/Low) appreciation potential

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing required columns.

  • group_by (Optional[List[str]]) – Column names to define peer groups. Defaults to [town, flat_type].

Returns:

DataFrame with added score columns.

Return type:

pd.DataFrame