unsprawl.refinery

Autonomous ETL Agent using Gemini Thinking Mode + Code Execution.

This module implements the RefineryAgent, which ingests raw CSV data, uses Gemini’s thinking capabilities to reason about how the raw columns map onto a target schema, and generates Python transformation code that it runs via Code Execution.
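For illustration, the kind of transformation script the agent might generate can be sketched with the standard library alone. The raw column names (`Street Address`, `Sale Price (SGD)`, `Floor Area`), the `COLUMN_MAP` and `CONVERTERS` dictionaries, and the cleaning rules are all hypothetical; the real script is produced by Gemini for each input file and will differ.

```python
import csv
import io
from typing import Any

# Hypothetical mapping from raw CSV headers to target schema fields,
# of the kind the model might infer while reasoning about the data.
COLUMN_MAP = {
    "Street Address": "address",
    "Sale Price (SGD)": "price",
    "Floor Area": "area_sqm",
}


def _to_float(value: str) -> float:
    """Parse numbers such as '1,250,000' or ' 95.5 ' into floats."""
    return float(value.replace(",", "").strip())


# Per-field converters; fields not listed here are kept as stripped strings.
CONVERTERS = {"price": _to_float, "area_sqm": _to_float}


def transform(raw_csv: str) -> list[dict[str, Any]]:
    """Rename columns per COLUMN_MAP and coerce values to the target types."""
    records: list[dict[str, Any]] = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        record: dict[str, Any] = {}
        for raw_name, field in COLUMN_MAP.items():
            convert = CONVERTERS.get(field, str.strip)
            record[field] = convert(row[raw_name])
        records.append(record)
    return records
```

Columns not named in the mapping are simply dropped, so the output dicts carry only the target fields.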

Example

>>> import asyncio
>>> from unsprawl.refinery import RefineryAgent
>>> from pydantic import BaseModel
>>>
>>> class PropertyRecord(BaseModel):
...     address: str
...     price: float
...     area_sqm: float
...
>>> agent = RefineryAgent()
>>> records = asyncio.run(agent.ingest(
...     "https://example.com/raw_data.csv",
...     PropertyRecord,
... ))

Attributes

logger
T

Classes

RefineryAgent

Autonomous ETL agent that transforms raw CSV to structured Pydantic models.

Module Contents

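logger

Module-level logger.

T

Type variable for the target Pydantic model class, used in the signature of ingest_and_validate.

class RefineryAgent(model='gemini-2.5-flash', api_key=None, thinking_budget=8192)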

Autonomous ETL agent that transforms raw CSV to structured Pydantic models.

Uses Gemini’s Thinking Mode for deep reasoning about column mappings and Code Execution to run the generated transformation scripts.

Parameters:
  • model (str) – The Gemini model to use. Defaults to gemini-2.5-flash for thinking support.

  • api_key (str | None) – Google API key. If None, reads from GOOGLE_API_KEY environment variable.

  • thinking_budget (int) – Maximum number of tokens the model may spend on thinking before answering. Higher values allow deeper reasoning at the cost of more latency and token usage.

Notes

  • Requires the google-genai SDK (imported as google.genai), NOT the legacy google-generativeai package (google.generativeai)

  • Code Execution runs in a sandboxed environment

model = 'gemini-2.5-flash'
thinking_budget = 8192
client
_schema_to_prompt(schema_cls)

Convert a Pydantic model to a prompt-friendly schema description.

async ingest(file_url, target_schema)

Ingest the CSV file at file_url and transform its rows to match target_schema.

Parameters:
  • file_url (str) – URL to the raw CSV file to ingest.

  • target_schema (type[BaseModel]) – Pydantic model defining the target data structure.

Returns:

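List of records as plain dicts keyed by the target schema’s field names. The records are not Pydantic instances; use ingest_and_validate to obtain validated models.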

Return type:

list[dict[str, Any]]

Raises:

RuntimeError – If code execution fails or returns invalid data.
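The RuntimeError contract implies a check on whatever the sandbox returns. A minimal stdlib sketch, assuming the generated script prints the records to stdout as a single JSON array; `parse_execution_output` is hypothetical and not part of the documented API.

```python
import json
from typing import Any


def parse_execution_output(stdout: str) -> list[dict[str, Any]]:
    """Decode sandbox stdout into records, raising RuntimeError on bad data."""
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Code execution did not return valid JSON: {exc}") from exc
    # The documented return type is list[dict[str, Any]], so enforce that shape.
    if not isinstance(payload, list) or not all(isinstance(r, dict) for r in payload):
        raise RuntimeError("Code execution output is not a JSON array of objects")
    return payload
```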

async ingest_and_validate(file_url, target_schema)

Ingest and validate records against the Pydantic schema.

Parameters:
  • file_url (str) – URL to the raw CSV file to ingest.

  • target_schema (type[T]) – Pydantic model defining the target data structure.

Returns:

List of validated Pydantic model instances.

Return type:

list[T]
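Given the documented return types, ingest_and_validate presumably layers Pydantic validation over the dicts that ingest returns. A sketch of that step, assuming Pydantic v2, that `T` is a type variable bound to `BaseModel`, and that an invalid row should raise rather than be skipped; `validate_records` is hypothetical.

```python
from typing import Any, TypeVar

from pydantic import BaseModel

# Assumed definition of the module-level T: any Pydantic model class.
T = TypeVar("T", bound=BaseModel)


def validate_records(records: list[dict[str, Any]], target_schema: type[T]) -> list[T]:
    """Turn raw dicts into model instances; raises pydantic.ValidationError on bad rows."""
    return [target_schema.model_validate(record) for record in records]
```

Because `T` is bound to `BaseModel`, a type checker infers `list[PropertyRecord]` when `PropertyRecord` is passed as the schema.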

convert_cityjson_to_deckgl(input_path, output_path)

Convert the CityJSON file at input_path to deck.gl-compatible GeoJSON and write it to output_path.
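How the conversion works is not documented. One plausible approach is sketched below under stated assumptions: integer vertices are decoded with the file’s `transform` (real = int × scale + translate), only `MultiSurface` and `Solid` geometries are handled (for a `Solid`, only its exterior shell), all geometries of a CityObject are merged into one GeoJSON `MultiPolygon` with 3D coordinates (so files with several LoDs per object would need filtering), and no reprojection is done, so coordinates stay in the file’s CRS even though GeoJSON and most deck.gl layers expect WGS84 longitude/latitude. `cityjson_to_geojson` is hypothetical and works on in-memory dicts rather than paths.

```python
from typing import Any


def cityjson_to_geojson(city: dict[str, Any]) -> dict[str, Any]:
    """Convert CityObjects to a GeoJSON FeatureCollection of 3D MultiPolygons."""
    transform = city.get("transform", {"scale": [1, 1, 1], "translate": [0, 0, 0]})
    scale, translate = transform["scale"], transform["translate"]
    # Decode compressed integer vertices: real = int * scale + translate.
    vertices = [
        [v[i] * scale[i] + translate[i] for i in range(3)] for v in city["vertices"]
    ]

    def ring_coords(ring: list[int]) -> list[list[float]]:
        coords = [vertices[i] for i in ring]
        # CityJSON rings do not repeat the first vertex; GeoJSON rings must be closed.
        return coords + [list(coords[0])]

    features = []
    for obj_id, obj in city.get("CityObjects", {}).items():
        polygons = []
        for geom in obj.get("geometry", []):
            if geom["type"] == "MultiSurface":
                surfaces = geom["boundaries"]
            elif geom["type"] == "Solid":
                surfaces = geom["boundaries"][0]  # exterior shell only
            else:
                continue  # other geometry types are skipped in this sketch
            # Each surface is a list of rings: exterior first, then any holes.
            polygons.extend([ring_coords(r) for r in surface] for surface in surfaces)
        if polygons:
            features.append({
                "type": "Feature",
                "id": obj_id,
                "properties": {"type": obj.get("type"), **obj.get("attributes", {})},
                "geometry": {"type": "MultiPolygon", "coordinates": polygons},
            })
    return {"type": "FeatureCollection", "features": features}
```

A file-based wrapper matching the documented (input_path, output_path) signature would `json.load` the input, pass the dict through this function, and `json.dump` the result to the output path.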