unsprawl.fetch

Data fetching utilities for downloading HDB and MRT datasets.

This module handles downloading datasets from Data.gov.sg APIs and provides fallback synthetic data generation when official sources are unavailable.

Attributes

Functions

validate_hdb_schema(path)

Validate that a CSV file has the expected HDB resale schema.

validate_mrt_schema(path)

Validate that a GeoJSON or CSV file has the expected MRT schema.

prompt_user_download(dataset_name)

Prompt the user for permission to download a dataset.

api_get_download_url(dataset_id[, verbose])

Call the Data.gov.sg initiate-download endpoint to obtain the temporary S3 URL.

generate_synthetic_hdb(path, limit)

Generate a synthetic HDB resale dataset for testing.

generate_synthetic_mrt(path)

Generate synthetic MRT GeoJSON for testing.

download_file(url, dest_path)

Stream the file from the S3 URL to disk.

fetch_hdb_data(limit, out_dir, filename[, verbose])

Fetch the HDB resale dataset using the official Data.gov.sg 'initiate-download' API.

fetch_mrt_data(mrt_out_path[, verbose])

Fetch the MRT GeoJSON dataset.

ensure_hdb_dataset(path[, verbose])

Ensure the HDB dataset exists and has a valid schema, downloading it if necessary.

ensure_mrt_dataset(path[, verbose])

Ensure the MRT dataset exists and has a valid schema, downloading it if necessary.

Module Contents

DATASET_IDS
DEFAULT_HDB_PATH
DEFAULT_MRT_PATH
EXPECTED_HDB_COLUMNS
EXPECTED_MRT_COLUMNS

validate_hdb_schema(path)

Validate that a CSV file has the expected HDB resale schema.

Parameters:

path (str) – Path to the CSV file to validate.

Returns:

True if the file exists and has the expected columns, False otherwise.

Return type:

bool
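A minimal sketch of the kind of header check this function performs, assuming the schema test means "every expected column name appears in the CSV header". ASSUMED_HDB_COLUMNS and header_has_columns are hypothetical names; the column set is modelled on the public HDB resale CSV and may not match the module's EXPECTED_HDB_COLUMNS.

```python
import csv
import os

# Hypothetical column set modelled on the public HDB resale CSV;
# the module's EXPECTED_HDB_COLUMNS may differ.
ASSUMED_HDB_COLUMNS = {
    "month", "town", "flat_type", "block", "street_name", "storey_range",
    "floor_area_sqm", "flat_model", "lease_commence_date", "resale_price",
}


def header_has_columns(path: str, expected: set) -> bool:
    """Return True if `path` is an existing CSV whose header contains `expected`."""
    if not os.path.isfile(path):
        return False
    # utf-8-sig drops a leading byte-order mark if the file has one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), None)  # None for an empty file
    if header is None:
        return False
    return expected.issubset(name.strip() for name in header)
```

Checking for a subset rather than exact equality tolerates extra columns; whether the real function is this lenient is not stated on this page.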

validate_mrt_schema(path)

Validate that a GeoJSON or CSV file has the expected MRT schema.

Parameters:

path (str) – Path to the GeoJSON or CSV file to validate.

Returns:

True if the file exists and has the expected structure, False otherwise.

Return type:

bool
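A sketch of a structural MRT check under two assumptions: a GeoJSON file passes if it is a FeatureCollection with a features list, and a CSV passes if its header contains a hypothetical column set (ASSUMED_MRT_COLUMNS). The function name looks_like_mrt_file is illustrative; the real validate_mrt_schema may also inspect feature properties.

```python
import csv
import json
import os

# Hypothetical CSV columns; the module's EXPECTED_MRT_COLUMNS may differ.
ASSUMED_MRT_COLUMNS = {"station_name", "lat", "lon"}


def looks_like_mrt_file(path: str) -> bool:
    """Structural check: a GeoJSON FeatureCollection, or a CSV with the assumed columns."""
    if not os.path.isfile(path):
        return False
    if path.lower().endswith((".geojson", ".json")):
        try:
            with open(path, encoding="utf-8") as f:
                data = json.load(f)
        except (OSError, ValueError):  # unreadable file, bad encoding, or invalid JSON
            return False
        return (
            isinstance(data, dict)
            and data.get("type") == "FeatureCollection"
            and isinstance(data.get("features"), list)
        )
    with open(path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), None)
    return header is not None and ASSUMED_MRT_COLUMNS.issubset(
        name.strip() for name in header
    )
```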

prompt_user_download(dataset_name)

Prompt the user for permission to download a dataset.

Parameters:

dataset_name (str) – Name of the dataset to download (“HDB resale data” or “MRT stations data”).

Returns:

True if user approves download, False otherwise.

Return type:

bool
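One way to implement such a prompt, sketched with a hypothetical ask_yes_no helper. The prompt text is invented, and input_fn is an assumed injection point that lets tests supply answers without a terminal. Only an explicit "y" or "yes" counts as approval, and end-of-input (as in a non-interactive run) is treated as a refusal.

```python
from typing import Callable


def ask_yes_no(dataset_name: str, input_fn: Callable[[str], str] = input) -> bool:
    """Ask before downloading; only an explicit yes counts as approval."""
    try:
        answer = input_fn(f"{dataset_name} was not found. Download it now? [y/N] ")
    except EOFError:  # stdin closed, e.g. a non-interactive run
        return False
    return answer.strip().lower() in {"y", "yes"}
```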

api_get_download_url(dataset_id, verbose=0)

Call the Data.gov.sg initiate-download endpoint to obtain the temporary S3 URL.
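A sketch of calling such an endpoint with only the standard library. The endpoint template ASSUMED_INITIATE_URL, the use of a plain GET, and the response shape ({"data": {"url": ...}}) are all assumptions for illustration and are not confirmed by this page; check the current Data.gov.sg API documentation before relying on any of them. Parsing lives in a separate extract_download_url function so it can be tested without network access.

```python
import json
import urllib.request
from typing import Optional

# ASSUMPTION: endpoint template, HTTP method and response shape are illustrative;
# confirm them against the current Data.gov.sg API documentation.
ASSUMED_INITIATE_URL = (
    "https://api-open.data.gov.sg/v1/public/api/datasets/{dataset_id}/initiate-download"
)


def extract_download_url(payload: dict) -> Optional[str]:
    """Return payload["data"]["url"] if it is a string, else None (assumed shape)."""
    data = payload.get("data")
    if isinstance(data, dict) and isinstance(data.get("url"), str):
        return data["url"]
    return None


def get_download_url(dataset_id: str, timeout: float = 30.0) -> Optional[str]:
    """Request a temporary download URL for `dataset_id` (needs network access)."""
    request = urllib.request.Request(
        ASSUMED_INITIATE_URL.format(dataset_id=dataset_id),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return extract_download_url(payload)
```

Returning None when no URL is present leaves the caller free to fall back to synthetic data.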

generate_synthetic_hdb(path, limit)

Generate a synthetic HDB resale dataset for testing.
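A sketch of a reproducible synthetic generator. The five columns, the town and flat-type pools, and the price formula are hypothetical stand-ins; the module's own generator may produce different columns and distributions. A seeded random.Random instance keeps the output identical across runs, which is what a test fixture usually needs.

```python
import csv
import random

# Hypothetical value pools; not taken from the module's generator.
TOWNS = ["ANG MO KIO", "BEDOK", "TAMPINES", "WOODLANDS"]
FLAT_TYPES = ["3 ROOM", "4 ROOM", "5 ROOM"]
FIELDS = ["month", "town", "flat_type", "floor_area_sqm", "resale_price"]


def write_synthetic_hdb(path: str, limit: int, seed: int = 0) -> int:
    """Write `limit` random resale rows to `path`; return the number of rows written."""
    rng = random.Random(seed)  # seeded so repeated runs give identical files
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for _ in range(limit):
            area = rng.randint(60, 130)  # square metres
            writer.writerow({
                "month": f"2023-{rng.randint(1, 12):02d}",
                "town": rng.choice(TOWNS),
                "flat_type": rng.choice(FLAT_TYPES),
                "floor_area_sqm": area,
                "resale_price": area * rng.randint(4000, 7000),  # invented price per sqm
            })
    return limit
```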

generate_synthetic_mrt(path)

Generate synthetic MRT GeoJSON for testing.
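A similar sketch for MRT data: it writes a GeoJSON FeatureCollection of Point features scattered over a rough Singapore bounding box. The station names, the name property, and the n_stations and seed parameters are hypothetical. GeoJSON (RFC 7946) orders coordinates as [longitude, latitude], which is easy to get backwards.

```python
import json
import random


def write_synthetic_mrt(path: str, n_stations: int = 10, seed: int = 0) -> dict:
    """Write a FeatureCollection of random Point stations to `path` and return it."""
    rng = random.Random(seed)  # reproducible fixture
    features = []
    for i in range(n_stations):
        lon = rng.uniform(103.6, 104.0)  # rough Singapore longitude span
        lat = rng.uniform(1.25, 1.45)    # rough Singapore latitude span
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": f"SYNTHETIC STATION {i + 1}"},
        })
    collection = {"type": "FeatureCollection", "features": features}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(collection, f)
    return collection
```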

download_file(url, dest_path)

Stream the file from the S3 URL to disk.
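A streaming download sketch using urllib.request and shutil.copyfileobj, which copies in fixed-size chunks instead of loading the whole file into memory. Writing to a temporary .part file in the destination directory and then calling os.replace is an added assumption: it means an interrupted download never leaves a truncated file that a later schema check might accept. The name stream_to_disk and the chunk_size parameter are illustrative.

```python
import os
import shutil
import tempfile
import urllib.request


def stream_to_disk(url: str, dest_path: str, chunk_size: int = 1 << 16) -> str:
    """Stream `url` to `dest_path` in chunks, replacing the destination atomically."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)
    # Temporary file in the same directory so os.replace stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        # Open the temp file first so its descriptor is closed even if urlopen fails.
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, out, chunk_size)
        os.replace(tmp_path, dest_path)
    except BaseException:
        os.unlink(tmp_path)  # never leave a partial download behind
        raise
    return dest_path
```

Because urlopen also accepts file:// URLs, the function can be exercised locally, for example with pathlib.Path(src).as_uri().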

fetch_hdb_data(limit, out_dir, filename, verbose=0)

Fetch the HDB resale dataset using the official Data.gov.sg ‘initiate-download’ API.
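The module overview says synthetic data is generated when official sources are unavailable. A hypothetical fetch_with_fallback sketches that control flow with the network and generation steps passed in as callables, so it runs without network access. Which errors trigger the fallback is an assumption: catching OSError covers urllib's URLError and HTTPError as well as file-system failures.

```python
import os
from typing import Callable, Optional, Tuple


def fetch_with_fallback(
    out_dir: str,
    filename: str,
    get_url: Callable[[], Optional[str]],
    download: Callable[[str, str], object],
    synthesize: Callable[[str], object],
) -> Tuple[str, bool]:
    """Try the official source, else write synthetic data.

    Returns (path, used_official_source).
    """
    os.makedirs(out_dir, exist_ok=True)
    dest = os.path.join(out_dir, filename)
    try:
        url = get_url()
        if url:
            download(url, dest)
            return dest, True
    except OSError:  # includes urllib.error.URLError and HTTPError
        pass
    synthesize(dest)  # official source unavailable: fall back
    return dest, False
```

In a wrapper shaped like fetch_hdb_data, limit would presumably be bound into the synthesize callable (for example with functools.partial), but how the real function uses limit is not documented here.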

fetch_mrt_data(mrt_out_path, verbose=0)

Fetch the MRT GeoJSON dataset.

ensure_hdb_dataset(path, verbose=0)

Ensure the HDB dataset exists and has a valid schema, downloading it if necessary.

Parameters:
  • path (str) – Path where the HDB dataset should exist.

  • verbose (int) – Verbosity level for logging.

Returns:

True if the dataset is available and valid; False if the user declined the download or the download failed.

Return type:

bool
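The documented contract (validate, ask, download, and report failure as False) can be sketched as a generic ensure_dataset helper. The function and its callable parameters are hypothetical stand-ins for a schema validator, a prompt such as prompt_user_download, and a fetcher; the same shape would fit ensure_mrt_dataset. Re-validating after the download catches a fetch that completed but wrote unusable data.

```python
from typing import Callable


def ensure_dataset(
    path: str,
    dataset_name: str,
    is_valid: Callable[[str], bool],
    ask: Callable[[str], bool],
    fetch: Callable[[str], object],
) -> bool:
    """Return True once `path` holds a valid dataset; False if declined or failed."""
    if is_valid(path):
        return True          # already present and well-formed: no prompt
    if not ask(dataset_name):
        return False         # user declined the download
    try:
        fetch(path)
    except OSError:
        return False         # download or write failed
    return is_valid(path)    # re-check what was actually written
```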

ensure_mrt_dataset(path, verbose=0)

Ensure the MRT dataset exists and has a valid schema, downloading it if necessary.

Parameters:
  • path (str) – Path where the MRT dataset should exist.

  • verbose (int) – Verbosity level for logging.

Returns:

True if the dataset is available and valid; False if the user declined the download or the download failed.

Return type:

bool