refinery
Autonomous ETL Agent using Gemini Thinking Mode + Code Execution.
This module implements the RefineryActor, which ingests raw CSV data, uses Gemini’s thinking capabilities to reason about schema mappings, and generates Python transformation code that it runs through local sandbox execution.
Attributes
Classes
RefineryActor | Production-grade Ingestion Engine running as a Ray Actor.
Module Contents
- logger
- T
- class ResourceGuard(max_cpu=85.0, max_ram=85.0)
- max_cpu = 85.0
- max_ram = 85.0
- check_health()
System 1 Safety Check: Is the host dying?
- get_docker_limits()
Containment Field: Prevent OOM kills on the host.
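The page documents only the signatures of `ResourceGuard`, so the following is a minimal sketch of how such a guard could work, not the module's implementation. The class name `ResourceGuardSketch`, the injected `read_cpu`/`read_ram` samplers, the arguments to `get_docker_limits`, and the returned keys are all assumptions made for illustration. The keys are named after the `mem_limit` and `nano_cpus` options of the Docker SDK for Python, but whether the real method returns them is not stated here.

```python
from typing import Callable, Dict


class ResourceGuardSketch:
    """Hypothetical stand-in for ResourceGuard (illustration only)."""

    def __init__(
        self,
        max_cpu: float = 85.0,  # percent, same default as the documented signature
        max_ram: float = 85.0,  # percent
        read_cpu: Callable[[], float] = lambda: 0.0,  # assumed sampler, returns percent
        read_ram: Callable[[], float] = lambda: 0.0,  # assumed sampler, returns percent
    ) -> None:
        self.max_cpu = max_cpu
        self.max_ram = max_ram
        # Samplers are injected so the sketch has no third-party dependency;
        # a real guard would read these values from the host.
        self._read_cpu = read_cpu
        self._read_ram = read_ram

    def check_health(self) -> bool:
        """Return True while both readings are at or below their ceilings."""
        return self._read_cpu() <= self.max_cpu and self._read_ram() <= self.max_ram

    def get_docker_limits(self, total_ram_bytes: int, cpu_count: int) -> Dict[str, int]:
        """Cap a container at the same fraction of the host that the guard tolerates."""
        return {
            # Memory ceiling in bytes, so the container is stopped before the host runs out.
            "mem_limit": int(total_ram_bytes * self.max_ram / 100),
            # CPU ceiling in units of 1e-9 CPUs.
            "nano_cpus": int(cpu_count * self.max_cpu / 100 * 1_000_000_000),
        }
```

Injecting the samplers keeps the threshold logic testable without touching the host; swapping in real readings does not change `check_health()`.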
- class RefineryActor(model='gemini-2.5-flash', api_key=None, thinking_budget=8192, kafka_bootstrap_servers='localhost:9092', mock_mode=False)
Production-grade Ingestion Engine running as a Ray Actor.
Features:
  - Off-main-thread execution via Ray
  - “Thought Loop” observability with Logfire
  - “Pulse” event publishing to Redpanda
  - “Patience” via Postgres Job Queue
  - “Latent Space” via Docker Sandbox
- model = 'gemini-2.5-flash'
- thinking_budget = 8192
- mock_mode = False
- kafka_available = False
- guard
- docker = None
- _schema_to_prompt(schema_cls)
- _publish_pulse(event_type, payload)
- async maintenance_loop()
Background heartbeat to process queued jobs.
- async _execute_in_sandbox(script)
Run Python script in Docker Sandbox.
- async ingest(file_url, target_schema, dataset_name='unknown_dataset')
Ingest URL -> Plan -> Check Health -> Execute/Queue.
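The "Plan -> Check Health -> Execute/Queue" flow of `ingest()` and the queue-draining role of `maintenance_loop()` can be pictured roughly as below. This is a sketch under stated assumptions: `ingest_sketch` and `drain_once` are hypothetical names, the planner, health check, and executor are injected callables standing in for the Gemini call, `ResourceGuard.check_health()`, and the Docker sandbox, and an in-memory `deque` replaces the Postgres job queue.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Deque, Dict, List


async def ingest_sketch(
    file_url: str,
    plan: Callable[[str], Awaitable[str]],     # assumed: URL -> transformation script
    is_healthy: Callable[[], bool],            # assumed: host health check
    execute: Callable[[str], Awaitable[str]],  # assumed: runs a script, returns its output
    queue: Deque[str],                         # assumed: stand-in for the job queue
) -> Dict[str, str]:
    # Plan: turn the source URL into a transformation script.
    script = await plan(file_url)
    # Check health: defer the work when the host is under pressure.
    if not is_healthy():
        queue.append(script)
        return {"status": "queued"}
    # Execute: the host has headroom, so run the script right away.
    output = await execute(script)
    return {"status": "executed", "output": output}


async def drain_once(
    is_healthy: Callable[[], bool],
    execute: Callable[[str], Awaitable[str]],
    queue: Deque[str],
) -> List[str]:
    """One pass of a maintenance loop: run queued scripts while the host stays healthy."""
    results: List[str] = []
    while queue and is_healthy():
        results.append(await execute(queue.popleft()))
    return results
```

Re-checking health before each queued job means a drain pass stops as soon as the host becomes loaded again, leaving the remaining jobs for a later pass.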
- convert_cityjson_to_deckgl(input_path, output_path)
Convert CityJSON to DeckGL-compatible GeoJSON.
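The core of a CityJSON-to-GeoJSON conversion might look like the sketch below. It is not the body of `convert_cityjson_to_deckgl`: the function name `cityjson_to_geojson_sketch` is hypothetical, it works on already-parsed dictionaries instead of file paths, it handles only `MultiSurface` geometries, and it does no coordinate reprojection, so the output is only directly usable where the source coordinates already suit the viewer.

```python
from typing import Any, Dict, List


def cityjson_to_geojson_sketch(cityjson: Dict[str, Any]) -> Dict[str, Any]:
    """Convert MultiSurface geometries of a parsed CityJSON document to GeoJSON polygons."""
    transform = cityjson.get("transform", {})
    scale = transform.get("scale", [1.0, 1.0, 1.0])
    translate = transform.get("translate", [0.0, 0.0, 0.0])

    # CityJSON stores compressed vertices: real = stored * scale + translate.
    vertices: List[List[float]] = [
        [v[i] * scale[i] + translate[i] for i in range(3)]
        for v in cityjson.get("vertices", [])
    ]

    features: List[Dict[str, Any]] = []
    for object_id, city_object in cityjson.get("CityObjects", {}).items():
        for geometry in city_object.get("geometry", []):
            if geometry.get("type") != "MultiSurface":
                continue  # other geometry types are out of scope for this sketch
            # MultiSurface boundaries: surfaces -> rings -> vertex indices.
            for surface in geometry.get("boundaries", []):
                rings = []
                for ring in surface:
                    coords = [vertices[index] for index in ring]
                    if coords:
                        # GeoJSON rings repeat the first position at the end.
                        coords.append(coords[0])
                    rings.append(coords)
                features.append(
                    {
                        "type": "Feature",
                        "properties": {"id": object_id, "type": city_object.get("type")},
                        "geometry": {"type": "Polygon", "coordinates": rings},
                    }
                )
    return {"type": "FeatureCollection", "features": features}
```

A path-based wrapper would read the input with `json.load`, call this function, and write the result with `json.dump`.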