unsprawl.refinery
Autonomous ETL Agent using Gemini Thinking Mode + Code Execution.
This module implements the RefineryAgent that ingests raw CSV data, uses Gemini’s thinking capabilities to reason about schema mappings, and generates Python transformation code via Code Execution.
Example
>>> from unsprawl.refinery import RefineryAgent
>>> from pydantic import BaseModel
>>>
>>> class PropertyRecord(BaseModel):
...     address: str
...     price: float
...     area_sqm: float
>>>
>>> agent = RefineryAgent()
>>> records = await agent.ingest(
...     "https://example.com/raw_data.csv",
...     PropertyRecord,
... )
Attributes
logger
T
Classes
RefineryAgent | Autonomous ETL agent that transforms raw CSV to structured Pydantic models.
Module Contents
- logger
- T
- class RefineryAgent(model='gemini-2.5-flash', api_key=None, thinking_budget=8192)
Autonomous ETL agent that transforms raw CSV to structured Pydantic models.
Uses Gemini’s Thinking Mode for deep reasoning about column mappings and Code Execution to run the generated transformation scripts.
- Parameters:
model (str) – The Gemini model to use. Defaults to gemini-2.5-flash, which supports thinking.
api_key (str | None) – Google API key. If None, the key is read from the GOOGLE_API_KEY environment variable.
thinking_budget (int) – Token budget for thinking. Higher values allow deeper reasoning.
Notes
Requires the google-genai SDK (imported as google.genai), not the legacy google.generativeai package.
Code Execution runs in a sandboxed environment.
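The api_key fallback described in the parameters above can be sketched as follows. This is a minimal illustration, not the module's actual code: resolve_api_key is a hypothetical helper, and raising ValueError when no key is available is an assumption. The source only states that GOOGLE_API_KEY is read when api_key is None.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else use the environment."""
    if api_key is not None:
        return api_key
    env_key = os.environ.get("GOOGLE_API_KEY")
    if not env_key:
        # Assumption: fail early. The real agent may handle this differently.
        raise ValueError("No API key given and GOOGLE_API_KEY is not set")
    return env_key
```

An explicit argument takes precedence, so a caller can override the environment for a single agent instance.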
- model = 'gemini-2.5-flash'
- thinking_budget = 8192
- client
- _schema_to_prompt(schema_cls)
Convert a Pydantic model to a prompt-friendly schema description.
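One plausible way to turn a Pydantic model into a prompt-friendly description is sketched below. It assumes Pydantic v2 (model_fields and is_required). The function name schema_to_prompt and the output format are illustrative and may not match what _schema_to_prompt actually produces.

```python
from pydantic import BaseModel


def schema_to_prompt(schema_cls: type[BaseModel]) -> str:
    """Illustrative stand-in for _schema_to_prompt: one line per field."""
    lines = [f"Target schema: {schema_cls.__name__}"]
    for name, field in schema_cls.model_fields.items():
        # field.annotation holds the declared type, e.g. str or float.
        type_name = getattr(field.annotation, "__name__", str(field.annotation))
        required = "required" if field.is_required() else "optional"
        lines.append(f"- {name}: {type_name} ({required})")
    return "\n".join(lines)


class PropertyRecord(BaseModel):
    address: str
    price: float
    area_sqm: float


prompt = schema_to_prompt(PropertyRecord)
print(prompt)
```

A plain-text listing like this keeps the field names and types visible to the model without requiring it to parse a full JSON Schema.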
- async ingest(file_url, target_schema)
Ingest a CSV URL and transform it to match the target schema.
- Parameters:
file_url (str) – URL to the raw CSV file to ingest.
target_schema (type[BaseModel]) – Pydantic model defining the target data structure.
- Returns:
List of records conforming to the target schema.
- Return type:
- Raises:
RuntimeError – If code execution fails or returns invalid data.
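Because ingest is a coroutine that may raise RuntimeError, a caller in synchronous code needs an event loop and an error path. The sketch below shows one such pattern. StubAgent is a hypothetical stand-in for RefineryAgent so the example runs without network access or an API key, and ingest_or_empty is an illustrative helper, not part of the module.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for RefineryAgent; it always fails like a bad run."""

    async def ingest(self, file_url, target_schema):
        raise RuntimeError("code execution failed")


async def ingest_or_empty(agent, file_url, target_schema):
    """Return the ingested records, or an empty list if ingestion fails."""
    try:
        return await agent.ingest(file_url, target_schema)
    except RuntimeError as exc:
        # Documented failure mode: code execution failed or returned invalid data.
        print(f"ingest failed: {exc}")
        return []


# dict is only a placeholder schema here; a real call would pass a Pydantic model.
records = asyncio.run(
    ingest_or_empty(StubAgent(), "https://example.com/raw_data.csv", dict)
)
```

Whether to swallow the error, retry, or re-raise depends on the pipeline; returning an empty list is just one option.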
- async ingest_and_validate(file_url, target_schema)
Ingest and validate records against the Pydantic schema.
- Parameters:
file_url (str) – URL to the raw CSV file to ingest.
target_schema (type[T]) – Pydantic model defining the target data structure.
- Returns:
List of validated Pydantic model instances.
- Return type:
list[T]
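The validation step can be pictured as mapping raw rows onto the target model. The sketch below assumes Pydantic v2 (model_validate), assumes the transformation yields a list of dicts, and assumes T is a TypeVar bound to BaseModel. validate_rows is a hypothetical helper, not the module's implementation.

```python
from typing import TypeVar

from pydantic import BaseModel, ValidationError

# Assumption: the module's T is a TypeVar bound to BaseModel.
T = TypeVar("T", bound=BaseModel)


def validate_rows(rows: list[dict], target_schema: type[T]) -> list[T]:
    """Validate each raw row; a bad row raises pydantic.ValidationError."""
    return [target_schema.model_validate(row) for row in rows]


class PropertyRecord(BaseModel):
    address: str
    price: float
    area_sqm: float


# The numeric string for price is coerced to float by Pydantic's default lax mode.
rows = [{"address": "1 Example Road", "price": "450000", "area_sqm": 92.5}]
records = validate_rows(rows, PropertyRecord)

try:
    validate_rows([{"address": "missing fields"}], PropertyRecord)
except ValidationError as exc:
    print(f"rejected row: {exc.error_count()} validation errors")
```

Returning model instances rather than dicts gives callers typed attribute access, which is what the list[T] return type implies.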