Roadmap

This roadmap outlines the strategic development path for Unsprawl. Our focus is on increasing financial rigor, optimizing geospatial query latency, and enhancing the “Shadow Valuation” capabilities.

In addition, the project now follows stricter release governance:

Version Control Gate: versions must remain consistent across the project, and monotonic version bumps are enforced on version-specific release branches (e.g., vX.Y.Z).
Hook Integrity: contributors must not bypass pre-commit checks.
Release Integrity: releases are expected to be created as signed tags.
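Here is a minimal sketch of the monotonic-bump rule. It assumes plain `X.Y.Z` versions, optionally prefixed with `v`, with no pre-release or build suffixes, and compares parsed integer tuples. The function names are hypothetical; this is not the project's actual pre-commit hook.

```python
import re

# Accepts "X.Y.Z" or "vX.Y.Z"; pre-release/build suffixes are deliberately rejected.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(text: str) -> tuple:
    """Parse a plain version string into an (major, minor, patch) integer tuple."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a plain X.Y.Z version: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_monotonic_bump(previous: str, candidate: str) -> bool:
    """True only if candidate is strictly newer than previous."""
    return parse_version(candidate) > parse_version(previous)


if __name__ == "__main__":
    print(is_monotonic_bump("0.1.0", "0.1.1"))   # True
    print(is_monotonic_bump("v0.2.0", "0.1.9"))  # False
    print(is_monotonic_bump("0.1.0", "0.1.0"))   # False: equal is not a bump
```

Comparing integer tuples rather than raw strings keeps `0.10.0` ordered after `0.9.0`.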

v0.1.0: Project Advect (Warp ABM) — Primary Roadmap

Focus: GPU Simulation, Urban Agent Dynamics, and Benchmarkable Systems Performance

MVP

✅ SoA agent update kernel (flow-field sampling + Euler integration)
✅ NumPy interop for visualization pipelines
✅ Divergence-free procedural flow fields (curl/stream-function generator)
✅ advect CLI demo subcommand (export positions + perf stats)
✅ Local benchmark harness (python tools/benchmark_advect.py ...)
The Sentinel kernel: a perpetually active kernel that uses Gemini’s reasoning abilities to control custom, Holodeck-style sandboxed Advect simulations and generate insights.
The Refinery kernel: uses DuckDB with Apache Arrow as a declarative transformation layer, together with Gemini’s reasoning abilities, to synthesize new parsers and data sanitizers for newly uploaded government datasets.
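The following is a hedged CPU reference in NumPy for three of the MVP items. It is not the Warp code itself. It shows a stream function ψ whose curl gives a divergence-free velocity grid, structure-of-arrays (SoA) positions held as separate `px`/`py` arrays, and an explicit Euler update that samples the field at the nearest grid node. The sinusoidal ψ, the clamp-at-boundary rule, and the function names are illustrative assumptions.

```python
import numpy as np


def curl_flow_field(n: int, size: float = 1.0, modes: int = 2):
    """Return (u, v, h): a divergence-free velocity grid built from a stream function.

    Grids are indexed [iy, ix]; u = dpsi/dy and v = -dpsi/dx, so div(u, v) = 0.
    """
    coords = np.linspace(0.0, size, n)
    X, Y = np.meshgrid(coords, coords)      # X varies along axis 1, Y along axis 0
    k = 2.0 * np.pi * modes / size
    psi = np.sin(k * X) * np.sin(k * Y)     # illustrative stream function
    h = coords[1] - coords[0]
    dpsi_dy, dpsi_dx = np.gradient(psi, h)  # axis 0 -> y, axis 1 -> x
    return dpsi_dy, -dpsi_dx, h


def euler_step(px, py, u, v, h, dt):
    """One explicit Euler step on SoA positions, sampling the nearest grid node."""
    n = u.shape[0]
    ix = np.clip(np.rint(px / h).astype(np.int64), 0, n - 1)
    iy = np.clip(np.rint(py / h).astype(np.int64), 0, n - 1)
    size = h * (n - 1)
    px_new = np.clip(px + dt * u[iy, ix], 0.0, size)  # clamp agents to the domain
    py_new = np.clip(py + dt * v[iy, ix], 0.0, size)
    return px_new, py_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u, v, h = curl_flow_field(n=64)
    px = rng.uniform(0.0, 1.0, 10_000)  # SoA: one array per coordinate
    py = rng.uniform(0.0, 1.0, 10_000)
    for _ in range(100):
        px, py = euler_step(px, py, u, v, h, dt=1e-3)
    div = np.gradient(u, h, axis=1) + np.gradient(v, h, axis=0)
    print(f"max |div| on the grid: {np.abs(div).max():.2e}")
```

Since `u = ∂ψ/∂y` and `v = −∂ψ/∂x`, the divergence `∂u/∂x + ∂v/∂y` cancels. With `np.gradient`, the two difference operators act on different axes and commute, so the discrete divergence printed at the end is at rounding-error level.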

v0.1.0 checklist (grouped by TODO category)

`TODO(optimization)` — complexity + data structures

[ ] Uniform-grid spatial hashing (cell lists) for expected O(N) neighbor search
[ ] Bilinear flow-field sampling (reduce aliasing, improve stability)
[ ] Profiling + bandwidth audit (identify hot kernels and memory bottlenecks)
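Both `TODO(optimization)` items can be prototyped on the CPU first. The sketch below covers both:

- **Uniform-grid cell list.** Agents are sorted by cell id, and `np.searchsorted` gives each cell's start/end offsets. A radius query then scans only the 3×3 block of cells around an agent. At bounded density this is expected O(1) work per agent, so O(N) overall.
- **`sample_bilinear`.** Interpolates a field stored as `field[iy, ix]` with node spacing `h`.

It assumes `cell_size >= radius` and non-negative coordinates. All names are hypothetical. A Warp port would move the Python loops into kernels.

```python
import numpy as np


def build_cell_list(px, py, cell_size, n_cells):
    """Sort agents by cell id; return (order, start, end) offsets for every cell."""
    cx = np.clip(np.floor(px / cell_size).astype(np.int64), 0, n_cells - 1)
    cy = np.clip(np.floor(py / cell_size).astype(np.int64), 0, n_cells - 1)
    cell_id = cy * n_cells + cx
    order = np.argsort(cell_id, kind="stable")
    sorted_ids = cell_id[order]
    cells = np.arange(n_cells * n_cells)
    start = np.searchsorted(sorted_ids, cells, side="left")
    end = np.searchsorted(sorted_ids, cells, side="right")
    return order, start, end


def neighbors_within(i, px, py, radius, cell_size, n_cells, order, start, end):
    """Agents j != i closer than radius, scanning only the 3x3 cells around agent i.

    Correct only when cell_size >= radius.
    """
    cx = int(np.clip(np.floor(px[i] / cell_size), 0, n_cells - 1))
    cy = int(np.clip(np.floor(py[i] / cell_size), 0, n_cells - 1))
    found = []
    for y in range(max(cy - 1, 0), min(cy + 2, n_cells)):
        for x in range(max(cx - 1, 0), min(cx + 2, n_cells)):
            c = y * n_cells + x
            for j in order[start[c]:end[c]]:
                dx, dy = px[j] - px[i], py[j] - py[i]
                if j != i and dx * dx + dy * dy < radius * radius:
                    found.append(int(j))
    return sorted(found)


def sample_bilinear(field, px, py, h):
    """Bilinearly interpolate field[iy, ix] (node spacing h) at positions (px, py)."""
    ny, nx = field.shape
    gx = np.clip(px / h, 0.0, nx - 1 - 1e-9)  # stay inside the last cell
    gy = np.clip(py / h, 0.0, ny - 1 - 1e-9)
    x0 = np.floor(gx).astype(np.int64)
    y0 = np.floor(gy).astype(np.int64)
    tx, ty = gx - x0, gy - y0
    return ((1 - tx) * (1 - ty) * field[y0, x0]
            + tx * (1 - ty) * field[y0, x0 + 1]
            + (1 - tx) * ty * field[y0 + 1, x0]
            + tx * ty * field[y0 + 1, x0 + 1])
```

Bilinear sampling reproduces linear fields exactly, which is why it removes the stair-step aliasing of nearest-node lookups.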

`TODO(gpu)` — minimize host↔device overhead

[ ] On-device reductions for visualization (downsample, density map, histograms)
[ ] On-device flow-field updates (procedural and/or learned fields)
[ ] Streaming time-series outputs (chunked Arrow/Parquet) without full-frame downloads
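On-device reduction means downloading a small aggregate instead of every agent position. The NumPy sketch below is a hedged reference for what a density-map kernel would compute. It is not the project's code, and `density_map` is a hypothetical name. Positions are binned into a `bins × bins` grid with a single `np.bincount`.

```python
import numpy as np


def density_map(px, py, extent, bins):
    """Reduce N agent positions to a bins x bins count grid over [0, extent)^2.

    Returns counts indexed [iy, ix]; positions outside the extent are clamped
    into the border cells.
    """
    ix = np.clip(np.floor(px / extent * bins).astype(np.int64), 0, bins - 1)
    iy = np.clip(np.floor(py / extent * bins).astype(np.int64), 0, bins - 1)
    counts = np.bincount(iy * bins + ix, minlength=bins * bins)
    return counts.reshape(bins, bins)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    px, py = rng.uniform(0.0, 1.0, (2, 1_000_000))
    grid = density_map(px, py, extent=1.0, bins=64)
    print(grid.shape, grid.sum())
```

At 64 × 64 bins the host receives 4,096 counts per frame, regardless of the agent count.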

`TODO(research)` — calibrated proxies + better datasets

[ ] Replace straight-line distance proxies with calibrated impedance (L1/anisotropic/learned)
[ ] Dataset-driven flow fields (OD matrices, station attractiveness potentials, travel-time gradients)
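One way to read "calibrated impedance" is as a family of distance functions whose parameters are fitted to observed travel times. The sketch below shows straight-line, L1, and anisotropic distances, where `M` is a symmetric positive-definite 2×2 matrix. The names are hypothetical, and the metric-tensor form is an assumption rather than a stated design.

```python
import numpy as np


def euclidean(dx, dy):
    """Straight-line distance: the current proxy."""
    return np.hypot(dx, dy)


def manhattan(dx, dy):
    """L1 distance: a crude stand-in for grid-like street networks."""
    return np.abs(dx) + np.abs(dy)


def anisotropic(dx, dy, metric):
    """sqrt(d^T M d) for a symmetric positive-definite 2x2 metric M.

    M = identity recovers Euclidean distance; scaling or rotating M makes
    movement cheaper in some directions than others.
    """
    d = np.stack([np.asarray(dx, float), np.asarray(dy, float)], axis=-1)
    return np.sqrt(np.einsum("...i,ij,...j->...", d, np.asarray(metric, float), d))
```

A learned variant would fit the entries of `M` against measured walking or transit times, or replace the function entirely.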

`TODO(limitations)` — relax simplifying assumptions

[ ] Demographic-conditioned walking speeds and mobility constraints
[ ] Behavior heterogeneity (agent types) once datasets are available

Plans¶

The “Commuter Graph” Update¶

Focus: Systems Optimization & Latency Reduction

Hybrid Commuter Graph: Implementation of O(1) Station-to-CBD lookup tables.
- Pre-computed travel times from all 140+ MRT/LRT stations to 5 economic cores (Raffles Place, Jurong East, Changi Business Park, Woodlands Regional Centre, Punggol Digital District).
- Removes the need for real-time API calls to routing engines, enabling sub-millisecond “Time-to-Core” calculations.
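The lookup table is small. 140+ stations × 5 cores is on the order of 700 entries, so it fits in a plain dictionary keyed by `(station, core)`. This hedged sketch assumes travel times are computed offline by a routing engine. The function names are hypothetical, and any station names used with it are placeholders.

```python
CORES = (
    "Raffles Place",
    "Jurong East",
    "Changi Business Park",
    "Woodlands Regional Centre",
    "Punggol Digital District",
)


def build_time_to_core(rows):
    """rows: (station, core, minutes) triples precomputed offline by a routing engine."""
    return {(station, core): float(minutes) for station, core, minutes in rows}


def time_to_core(table, station, core):
    """Average O(1) dictionary lookup; no routing call at query time."""
    return table[(station, core)]


def nearest_core(table, station):
    """Closest economic core; scans at most len(CORES) entries, so still O(1)."""
    options = [(table[(station, c)], c) for c in CORES if (station, c) in table]
    minutes, core = min(options)
    return core, minutes
```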
Isochrone Approximations: Generation of 30-min and 45-min accessibility polygons using pre-aggregated transport nodes.
- Integration with the OneMap API or precomputed OSRM datasets for walking-time-based accessibility scoring.
- Replaces Euclidean distance heuristics with actual pedestrian network routing.
- Storage-efficient GeoJSON polygon caching for ~130 MRT stations × 3 walking-time thresholds (5/10/15 minutes).
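Network-based catchments replace straight-line radii with shortest walking times over a pedestrian graph. Assume such a graph is available as `(node, node, minutes)` edges, for example derived from OSRM or OneMap outputs (the loading step is not shown). A Dijkstra search truncated at the largest threshold then yields the node sets for all three bands in one pass. Turning each node set into a GeoJSON polygon, for example via a hull, is a separate step and out of scope here. The names are hypothetical.

```python
import heapq
from collections import defaultdict


def walking_isochrones(edges, source, thresholds=(5.0, 10.0, 15.0)):
    """Bucket nodes reachable from `source` by walking-time threshold (minutes).

    edges: iterable of (u, v, minutes) for an undirected pedestrian graph.
    Returns {threshold: set_of_nodes} from one Dijkstra run cut off at max(thresholds).
    """
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))
        graph[v].append((u, w))
    cutoff = max(thresholds)
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > best.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already settled
        for nxt, w in graph[node]:
            nd = d + w
            if nd <= cutoff and nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return {t: {n for n, d in best.items() if d <= t} for t in thresholds}
```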

The “Shadow Valuation” Engine¶

Focus: Market Efficiency & Comparative Analysis

Comparative Market Analysis (CMA):
- New ComparablesEngine class to identify “Nearest Neighbor” transactions (same town, ±5 years lease, ±10% size).
- Automatic “Overpriced/Underpriced” flagging relative to immediate peer transactions (radius < 400m).
- K-Nearest Neighbors (KNN) spatial clustering with configurable similarity metrics.
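Here is a minimal sketch of the comparables filter and the over/underpriced flag. It makes three assumptions: "±5 years lease" means remaining lease, size means floor area in square metres, and distance is great-circle (haversine). `Transaction`, `comparables`, and `flag_price` are hypothetical stand-ins for what `ComparablesEngine` might expose. The ±5% tolerance band around the peer median price per square metre is an arbitrary illustrative choice.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from statistics import median


@dataclass
class Transaction:
    town: str
    remaining_lease_years: float
    floor_area_sqm: float
    price: float
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth."""
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = p2 - p1, radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * 6_371_000.0 * asin(sqrt(a))


def comparables(subject, pool, lease_tol=5.0, size_tol=0.10, radius_m=400.0):
    """Same town, remaining lease within lease_tol, area within size_tol, inside radius."""
    return [
        t for t in pool
        if t is not subject
        and t.town == subject.town
        and abs(t.remaining_lease_years - subject.remaining_lease_years) <= lease_tol
        and abs(t.floor_area_sqm - subject.floor_area_sqm) <= size_tol * subject.floor_area_sqm
        and haversine_m(subject.lat, subject.lon, t.lat, t.lon) < radius_m
    ]


def flag_price(subject, comps, band=0.05):
    """Compare the subject's price per sqm with the median of its comparables."""
    if not comps:
        return "insufficient comparables"
    peer_psm = median(t.price / t.floor_area_sqm for t in comps)
    ratio = (subject.price / subject.floor_area_sqm) / peer_psm
    if ratio > 1 + band:
        return "overpriced"
    if ratio < 1 - band:
        return "underpriced"
    return "in line"
```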
Market Regime Detection:
- Time-series analysis to detect “Seller’s Markets” vs “Buyer’s Markets” based on 3-month vs 12-month moving averages of price per square metre (PSM).
- Rolling volatility calculations and trend strength indicators.
- Seasonal adjustment for local holidays and property cooling measure effects.
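The regime rule can be sketched as a moving-average crossover on a monthly PSM series. This sketch labels a month a seller's market when the 3-month average sits above the 12-month average, and a buyer's market when it sits below. That is an assumed reading of the bullet above, not a confirmed spec, and the names are hypothetical.

```python
import numpy as np


def moving_average(x, window):
    """Trailing simple moving average; the first window-1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    if len(x) >= window:
        c = np.cumsum(np.insert(x, 0, 0.0))
        out[window - 1:] = (c[window:] - c[:-window]) / window
    return out


def market_regime(monthly_psm, short=3, long=12):
    """Per month: 'seller' if short MA > long MA, 'buyer' if below,
    'neutral' if equal, None until both averages exist."""
    fast = moving_average(monthly_psm, short)
    slow = moving_average(monthly_psm, long)
    labels = []
    for f, s in zip(fast, slow):
        if np.isnan(f) or np.isnan(s):
            labels.append(None)
        elif f > s:
            labels.append("seller")
        elif f < s:
            labels.append("buyer")
        else:
            labels.append("neutral")
    return labels
```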

Visualization & UX¶

Focus: Accessibility & Reporting

Enhanced “Rich” Terminal UI:
- Dashboard-style console output with ASCII charts and sparklines.
- Color-coded valuation heatmaps and distribution visualizations.
- Interactive property comparison tables with drill-down capabilities.
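Sparklines need no charting library: each value maps to one of eight Unicode block characters. This dependency-free sketch (hypothetical name) returns a plain string, which can be placed in a Rich table cell like any other text.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map numbers onto eight block characters (minimum -> lowest, maximum -> highest)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)  # flat series: avoid dividing by zero
    scale = (len(BARS) - 1) / (hi - lo)
    return "".join(BARS[round((v - lo) * scale)] for v in values)
```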
PDF Report Generation:
- Automated generation of “Investment Memos” for specific properties.
- Bala’s Curve depreciation visualizations with property-specific overlays.
- Transport accessibility maps and comparable transaction analysis.
- Export-ready charts using matplotlib/plotly for static reports.

Advanced Analytics & Prediction¶

Focus: Temporal Modeling & Forecasting

Lease Decay Scenarios:
- Monte Carlo simulations for remaining lease value projections.
- Sensitivity analysis: impact of HDB policy changes (VERS, LBS) on valuations.
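The roadmap does not fix a valuation model for these projections. As a hedged illustration only, the sketch below swaps in a textbook annuity (present-value) model of leasehold value. This is not Bala's Curve, which is a published table rather than this formula. Under it, a lease with n years left at discount rate r is worth `(1 − (1+r)^−n) / (1 − (1+r)^−99)` of a fresh 99-year lease. The Monte Carlo draws r from a hypothetical normal prior; the default mean and spread are placeholders, not calibrated values.

```python
import numpy as np


def lease_value_fraction(remaining_years, rate, full_lease=99.0):
    """Annuity-style value of the remaining lease relative to a fresh full lease."""
    return (1.0 - (1.0 + rate) ** -remaining_years) / (1.0 - (1.0 + rate) ** -full_lease)


def simulate_lease_decay(remaining_years, horizon_years, n_paths=10_000,
                         rate_mean=0.03, rate_sd=0.01, seed=0):
    """5th/50th/95th percentiles of value retained after `horizon_years` of decay.

    Only the discount rate is uncertain here; remaining_years must be > 0.
    """
    rng = np.random.default_rng(seed)
    rates = np.clip(rng.normal(rate_mean, rate_sd, n_paths), 1e-4, None)
    later = max(remaining_years - horizon_years, 0.0)
    retained = lease_value_fraction(later, rates) / lease_value_fraction(remaining_years, rates)
    return np.percentile(retained, [5, 50, 95])
```

A policy sensitivity run would re-evaluate the same simulation under altered inputs, for example a different remaining lease or horizon.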
Predictive Modeling (Experimental):
- Time-series forecasting for town-level price trends (SARIMA, Prophet).
- Feature importance analysis: which factors drive undervaluation most strongly?
- Ensemble models combining Bala’s Curve with machine learning for refined scoring.

Long-Term Research Goals¶

URA Master Plan Overlay: Integrating future zoning and “Reserve Site” data to value potential appreciation.
- Geospatial intersection with URA’s Master Plan 2019/2024 datasets.
- Proximity scoring for planned MRT extensions (Cross Island Line, Jurong Region Line).
- Development Charge (DC) rate awareness for areas with high redevelopment potential.
School Proximity Scoring: Geodesic scoring for “Primary School Priority Registration” zones (1km/2km radius).
- Integration with MOE school location data and PSLE ranking proxies.
- Distance-weighted scoring for top-tier primary schools.
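Here is a hedged sketch of the band classification and a distance-weighted score. It uses the haversine great-circle formula on a spherical Earth, which approximates a true ellipsoidal geodesic to within about 0.5%. Distances are point-to-point between coordinates, and the official 1 km / 2 km determination may be computed differently. The band weights and the per-school `tier_weight` input are hypothetical placeholders.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean radius; spherical, not ellipsoidal


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = p2 - p1, radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def school_band(home, school):
    """Classify a home-to-school distance into the 1 km / 2 km bands."""
    d = haversine_m(home[0], home[1], school[0], school[1])
    if d <= 1_000:
        return "within 1 km"
    if d <= 2_000:
        return "1-2 km"
    return "beyond 2 km"


# Hypothetical band weights for a distance-weighted score.
BAND_WEIGHT = {"within 1 km": 1.0, "1-2 km": 0.5, "beyond 2 km": 0.0}


def school_score(home, schools):
    """Sum of tier_weight * band_weight over (lat, lon, tier_weight) schools."""
    return sum(w * BAND_WEIGHT[school_band(home, (lat, lon))] for lat, lon, w in schools)
```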
Rental Yield Estimation:
- Integration with URA rental transaction data (where available).
- Yield calculator: estimated annual rent (monthly rent × 12) ÷ purchase price for investment analysis.
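Gross rental yield is conventionally quoted per year, so monthly rent is annualized before dividing by the price. Here is a minimal sketch (hypothetical name) that ignores costs, vacancy, and taxes:

```python
def gross_rental_yield(monthly_rent, purchase_price):
    """Annual gross yield: (monthly rent x 12) / price, before costs and vacancy."""
    if purchase_price <= 0:
        raise ValueError("purchase_price must be positive")
    return monthly_rent * 12 / purchase_price
```

For example, a monthly rent of 2,500 on a purchase price of 600,000 gives 30,000 / 600,000 = 5% gross.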
Carbon Footprint & Sustainability Metrics:
- Integration of Green Mark certification data (BCA’s green building scheme) for HDB developments.
- Proximity to green spaces (Parks & Waterbodies dataset from data.gov.sg).
- Solar panel potential and energy efficiency scoring.

Comprehensive Gemini Integration (Hackathon Demo)¶

Integrate Gemini’s flagship capabilities end-to-end.

GeminiProvider v1 (text + tool calling):
- [ ] unsprawl agent run --region SG executes a full loop: fetch → normalize → score → explain.
- [ ] Tool calling: provider fetch, loader dispatch, export report.
- [ ] Deterministic replay logs (inputs/outputs) for judging.
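Deterministic replay mostly comes down to recording every tool call's name, arguments, and result in an append-only log. The sketch below assumes the model's function-call output has already been parsed into a tool name and a JSON-serializable `args` dict. It makes no Gemini SDK calls; `ReplayLog`, `dispatch`, `replay`, and any tool names are hypothetical. Each JSON Lines entry carries a SHA-256 of its canonical (key-sorted) JSON, so edited logs are detected on replay.

```python
import hashlib
import json


class ReplayLog:
    """Append-only JSON Lines record of tool calls for deterministic replay."""

    def __init__(self, path):
        self.path = path

    def record(self, tool, args, result):
        entry = {"tool": tool, "args": args, "result": result}
        payload = json.dumps(entry, sort_keys=True)  # canonical form that gets hashed
        entry["sha256"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry, sort_keys=True) + "\n")


def dispatch(tools, log, name, args):
    """Run the named tool with keyword args and log inputs and output."""
    result = tools[name](**args)
    log.record(name, args, result)
    return result


def replay(path):
    """Load logged entries, verifying each entry's hash."""
    entries = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            entry = json.loads(line)
            digest = entry.pop("sha256")
            payload = json.dumps(entry, sort_keys=True)
            if hashlib.sha256(payload.encode("utf-8")).hexdigest() != digest:
                raise ValueError("replay log entry was modified")
            entries.append(entry)
    return entries
```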
Multimodal Senses (image/video):
- [ ] Accept a traffic GIF/video clip and extract congestion descriptors for the simulation dashboard.
- [ ] Demonstrate “vision → structured signal” transformation.
Reflex + Reasoning split:
- [ ] Fast reflex model for low-latency classification tasks.
- [ ] Pro model for longer chain-of-thought planning (hidden) + verifiable tool outputs.

Community & Integration¶

RESTful API Service:
- FastAPI-based microservice for programmatic access.
- Rate-limited public endpoints for researchers and civic tech projects.
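Rate limiting for public endpoints is commonly done with a token bucket. Each client has a bucket that refills at a fixed rate up to a capacity, and each request spends a token. This framework-agnostic sketch (hypothetical names) takes an injectable clock so it can be tested. Wiring it into FastAPI, for example with one bucket per API key, is left as an assumption.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1.0):
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```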
Web Dashboard (Streamlit/Gradio):
- No-code interface for non-technical users.
- Interactive maps with property highlighting and filtering.
Data Pipeline Automation:
- Scheduled fetching of latest data.gov.sg resale transactions (monthly).
- Automated model retraining and cache invalidation on new data.

Technical Debt & Infrastructure¶

Release & Versioning Governance¶

Strict version alignment: keep versions uniform across pyproject.toml, unsprawl/utils.py, and docs/source/conf.py.
Monotonic version bumps (release branches): enforced via pre-commit on version-specific release branches (e.g., vX.Y.Z).
Signed releases: use signed annotated tags for releases.
Docs quality gate: docs should build cleanly (ideally with sphinx-build -n -W, which enables nitpicky mode and turns warnings into errors) before releases are tagged.
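The alignment rule can be checked by extracting one version string from each of the three files and requiring them to match. This is a hedged sketch, and the regular expressions are assumptions, not confirmed conventions:

- `version = "..."` in pyproject.toml
- `__version__ = "..."` in unsprawl/utils.py
- `release = "..."` in docs/source/conf.py

The pyproject.toml pattern takes the first `version =` line, which may not be the `[project]` table's in every file. `find_versions` and `check_aligned` are hypothetical names.

```python
import re
from pathlib import Path

# Assumed declaration patterns; the real files may declare the version differently.
VERSION_PATTERNS = {
    "pyproject.toml": r'^version\s*=\s*"([^"]+)"',
    "unsprawl/utils.py": r'^__version__\s*=\s*["\']([^"\']+)["\']',
    "docs/source/conf.py": r'^release\s*=\s*["\']([^"\']+)["\']',
}


def find_versions(root):
    """Map each file to the first version string its pattern matches (or None)."""
    found = {}
    for rel, pattern in VERSION_PATTERNS.items():
        text = Path(root, rel).read_text(encoding="utf-8")
        match = re.search(pattern, text, flags=re.MULTILINE)
        found[rel] = match.group(1) if match else None
    return found


def check_aligned(root):
    """Return the shared version, or exit non-zero (as a hook would) on any mismatch."""
    versions = find_versions(root)
    distinct = set(versions.values())
    if None in distinct or len(distinct) != 1:
        raise SystemExit(f"version mismatch: {versions}")
    return distinct.pop()
```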

Performance Optimization¶

Vectorization: Full numpy/pandas optimization for O(n) scaling to 1M+ transactions.
Parallel Processing: Multiprocessing support for transport scoring across large datasets.
Database Backend: Optional PostgreSQL/PostGIS support for enterprise deployments.

Testing & Quality¶

100% Test Coverage: Comprehensive unit and integration test suites.
Property-Based Testing: Hypothesis integration for edge case discovery.
Benchmark Suite: Performance regression testing for core algorithms.

Documentation¶

Interactive Tutorials: Jupyter notebooks for common analysis workflows.
Video Walkthroughs: Screencasts demonstrating CLI and module usage.
Case Studies: Real-world examples of undervalued property discoveries.

This roadmap is subject to change based on community feedback, data availability, and emerging best practices in quantitative real estate analysis.