refinery

Autonomous ETL Agent using Gemini Thinking Mode + Code Execution.

This module implements the RefineryActor, which ingests raw CSV data, uses Gemini’s thinking capabilities to reason about schema mappings, and generates Python transformation code that it then executes in a local sandbox.
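Reasoning about a schema mapping starts with describing the target schema to the model. Below is a minimal sketch of that step. It assumes the target schema is a class with typed fields. The module's `T` attribute and `_schema_to_prompt(schema_cls)` hint at this, but the source does not confirm it. A stdlib dataclass stands in for whatever schema library the module actually uses, and `schema_to_prompt` and `Reading` are hypothetical names.

```python
from dataclasses import dataclass, fields


def schema_to_prompt(schema_cls: type) -> str:
    """Render a dataclass schema as plain-text instructions for an LLM (hypothetical helper)."""
    lines = [f"Target schema: {schema_cls.__name__}"]
    for field in fields(schema_cls):
        # Prefer the annotation's __name__ (e.g. 'str'); otherwise use its string form.
        type_name = getattr(field.type, "__name__", str(field.type))
        lines.append(f"- {field.name}: {type_name}")
    lines.append("Write a Python function that maps each raw CSV row to this schema.")
    return "\n".join(lines)


@dataclass
class Reading:
    """Example target schema (hypothetical)."""

    station_id: str
    temperature_c: float


print(schema_to_prompt(Reading))
```

A field-by-field listing gives the model explicit column names and types to map raw CSV headers onto. A real implementation would likely also include constraints and example rows.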

Attributes

Classes

ResourceGuard

RefineryActor

Production-grade Ingestion Engine running as a Ray Actor.

Module Contents

logger
T
class ResourceGuard(max_cpu=85.0, max_ram=85.0)
max_cpu = 85.0
max_ram = 85.0
check_health()

System 1 safety check: is the host running out of CPU or RAM?
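The threshold check can be sketched as follows, assuming `check_health` compares current CPU and RAM usage (in percent) against `max_cpu` and `max_ram`. `ResourceGuardSketch` is hypothetical. The readers are injected so the logic can run without touching the host. In a real deployment they might be `psutil.cpu_percent` and `lambda: psutil.virtual_memory().percent`, but that is an assumption, not something the source states. The strict `<` comparison is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ResourceGuardSketch:
    """Hypothetical stand-in for ResourceGuard with injectable usage readers."""

    read_cpu: Callable[[], float]  # returns host CPU usage in percent
    read_ram: Callable[[], float]  # returns host RAM usage in percent
    max_cpu: float = 85.0
    max_ram: float = 85.0

    def check_health(self) -> bool:
        # Healthy only if both readings are strictly below their thresholds.
        return self.read_cpu() < self.max_cpu and self.read_ram() < self.max_ram


guard = ResourceGuardSketch(read_cpu=lambda: 40.0, read_ram=lambda: 92.5)
print(guard.check_health())  # False: RAM usage is above max_ram
```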

get_docker_limits()

Containment field: resource limits for the Docker sandbox that prevent OOM kills on the host.
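One plausible shape for these limits is sketched below, under stated assumptions. The source does not say how `get_docker_limits` computes its values or what it returns. This hypothetical `docker_limits_sketch` caps the container at a fixed fraction of host RAM and CPUs, and the 0.5 default is invented for illustration. The key names match real docker-py `containers.run` keyword arguments: `mem_limit` in bytes and `nano_cpus` in units of 1e-9 CPUs.

```python
def docker_limits_sketch(total_ram_bytes: int, cpu_count: int, fraction: float = 0.5) -> dict[str, int]:
    """Hypothetical: cap a sandbox container at `fraction` of the host's RAM and CPUs."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return {
        # docker-py accepts mem_limit as an integer number of bytes ...
        "mem_limit": int(total_ram_bytes * fraction),
        # ... and nano_cpus as a CPU quota in units of 1e-9 CPUs.
        "nano_cpus": int(cpu_count * fraction * 1_000_000_000),
    }


limits = docker_limits_sketch(total_ram_bytes=8 * 1024**3, cpu_count=4)
print(limits)  # {'mem_limit': 4294967296, 'nano_cpus': 2000000000}
```

A dict like this can be unpacked into docker-py as `client.containers.run(image, command, **limits)`. The container then hits its own memory ceiling and is killed inside the sandbox, instead of pushing the host into an OOM kill.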

class RefineryActor(model='gemini-2.5-flash', api_key=None, thinking_budget=8192, kafka_bootstrap_servers='localhost:9092', mock_mode=False)

Production-grade Ingestion Engine running as a Ray Actor.

Features:
- off-main-thread execution via Ray
- “Thought Loop” observability with Logfire
- “Pulse” event publishing to Redpanda
- “Patience” via Postgres Job Queue
- “Latent Space” via Docker Sandbox

model = 'gemini-2.5-flash'
thinking_budget = 8192
mock_mode = False
kafka_available = False
guard
docker = None
_schema_to_prompt(schema_cls)
_publish_pulse(event_type, payload)
async maintenance_loop()

Background heartbeat to process queued jobs.
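A heartbeat of this kind can be sketched with asyncio. The sketch assumes each tick drains queued jobs only while the host reports healthy. The source's queue lives in Postgres. Here an in-memory `asyncio.Queue` stands in for it, and `maintenance_loop_sketch`, `is_healthy`, `run_job`, `interval`, and `max_ticks` are hypothetical names. `max_ticks` exists only so the demo terminates.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def maintenance_loop_sketch(
    queue: "asyncio.Queue[str]",
    is_healthy: Callable[[], bool],
    run_job: Callable[[str], Awaitable[None]],
    interval: float = 5.0,
    max_ticks: Optional[int] = None,
) -> None:
    """Hypothetical heartbeat: each tick, drain queued jobs while the host stays healthy."""
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        ticks += 1
        # Re-check health before every job so a resource spike pauses the drain mid-batch.
        while not queue.empty() and is_healthy():
            job = queue.get_nowait()
            try:
                await run_job(job)
            finally:
                queue.task_done()
        await asyncio.sleep(interval)  # yield to other tasks until the next heartbeat


async def demo() -> list[str]:
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    for job in ("job-1", "job-2"):
        queue.put_nowait(job)
    done: list[str] = []

    async def run_job(job: str) -> None:
        done.append(job)

    await maintenance_loop_sketch(queue, lambda: True, run_job, interval=0, max_ticks=1)
    return done


print(asyncio.run(demo()))  # ['job-1', 'job-2']
```

Checking health per job rather than per tick means a long backlog cannot keep loading an already stressed host.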

async _execute_in_sandbox(script)

Run a Python script in the Docker sandbox.
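The async run-with-timeout pattern behind this method can be sketched with a child Python process. This is a swapped-in technique: a plain subprocess provides none of Docker's isolation. It only shows the capture, timeout, and cleanup logic. `run_script_sketch` and its `timeout` parameter are hypothetical.

```python
import asyncio
import sys


async def run_script_sketch(script: str, timeout: float = 30.0) -> tuple[int, str, str]:
    """Run `script` in a child Python process; return (exit code, stdout, stderr).

    Hypothetical stand-in: unlike Docker, a plain subprocess is NOT a sandbox.
    """
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-I", "-c", script,  # -I: isolated mode (ignores PYTHON* env vars, user site-packages)
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()  # runaway script: terminate it and reap the process
        await proc.wait()
        raise
    return proc.returncode, out.decode(), err.decode()


code, stdout, stderr = asyncio.run(run_script_sketch("print(6 * 7)"))
print(code, stdout.strip())  # 0 42
```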

async ingest(file_url, target_schema, dataset_name='unknown_dataset')

Ingest URL -> Plan -> Check Health -> Execute/Queue.
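The Plan, Check Health, Execute/Queue sequence can be sketched as plain control flow. The stages are injected callables, and `ingest_sketch`, `plan`, `is_healthy`, `execute`, and `enqueue` are hypothetical. The source does not say what is queued, so this sketch assumes both the URL and the generated script are.

```python
import asyncio
from typing import Awaitable, Callable


async def ingest_sketch(
    file_url: str,
    plan: Callable[[str], Awaitable[str]],
    is_healthy: Callable[[], bool],
    execute: Callable[[str], Awaitable[str]],
    enqueue: Callable[[str, str], Awaitable[None]],
) -> dict:
    """Hypothetical control flow: plan, check health, then execute now or defer."""
    script = await plan(file_url)  # e.g. ask the model for a transformation script
    if is_healthy():
        return {"status": "executed", "output": await execute(script)}
    await enqueue(file_url, script)  # host under pressure: defer to the job queue
    return {"status": "queued"}


async def demo(healthy: bool) -> dict:
    async def plan(url: str) -> str:
        return f"print('rows from {url}')"

    async def execute(script: str) -> str:
        return "ok"

    async def enqueue(url: str, script: str) -> None:
        pass

    return await ingest_sketch("https://example.com/raw.csv", plan, lambda: healthy, execute, enqueue)


print(asyncio.run(demo(True)))   # {'status': 'executed', 'output': 'ok'}
print(asyncio.run(demo(False)))  # {'status': 'queued'}
```

Planning before the health check means the model call is never wasted. A deferred job already carries its script when the maintenance loop picks it up.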

convert_cityjson_to_deckgl(input_path, output_path)

Convert CityJSON to DeckGL-compatible GeoJSON.
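A conversion of this kind is sketched below. It follows the CityJSON layout: integer `vertices` are decoded with the optional `transform` (real = v * scale + translate), and surface `boundaries` are turned into GeoJSON `MultiPolygon` rings, closed by repeating the first vertex. `cityjson_to_geojson` is hypothetical and not the module's implementation. The sketch merges all geometries (every LoD) of an object and skips point and line geometries. It also does no reprojection, so projected CityJSON coordinates would likely need converting to longitude/latitude before a default deck.gl `GeoJsonLayer` can place them.

```python
from typing import Any


def _decode_vertices(cj: dict[str, Any]) -> list[list[float]]:
    """Apply the optional CityJSON transform (scale, translate) to the integer vertices."""
    verts = cj.get("vertices", [])
    tf = cj.get("transform")
    if not tf:
        return [[float(c) for c in v] for v in verts]
    scale, translate = tf["scale"], tf["translate"]
    return [[v[i] * scale[i] + translate[i] for i in range(3)] for v in verts]


def _surfaces(geom: dict[str, Any]) -> list[list[list[int]]]:
    """Flatten a CityJSON geometry into a list of surfaces (each a list of rings)."""
    kind, b = geom["type"], geom["boundaries"]
    if kind in ("MultiSurface", "CompositeSurface"):
        return b
    if kind == "Solid":  # solid -> shells -> surfaces
        return [s for shell in b for s in shell]
    if kind in ("MultiSolid", "CompositeSolid"):
        return [s for solid in b for shell in solid for s in shell]
    return []  # points and lines are skipped in this sketch


def cityjson_to_geojson(cj: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: one GeoJSON MultiPolygon Feature per CityObject with surfaces."""
    verts = _decode_vertices(cj)
    features = []
    for obj_id, obj in cj.get("CityObjects", {}).items():
        polygons = []
        for geom in obj.get("geometry", []):
            for surface in _surfaces(geom):
                rings = []
                for ring in surface:
                    coords = [verts[i] for i in ring]
                    coords.append(list(coords[0]))  # GeoJSON rings must be closed
                    rings.append(coords)
                polygons.append(rings)
        if polygons:
            features.append({
                "type": "Feature",
                "id": obj_id,
                "properties": {"type": obj.get("type"), **obj.get("attributes", {})},
                "geometry": {"type": "MultiPolygon", "coordinates": polygons},
            })
    return {"type": "FeatureCollection", "features": features}


sample = {
    "type": "CityJSON",
    "transform": {"scale": [0.5, 0.5, 1.0], "translate": [10.0, 20.0, 0.0]},
    "vertices": [[0, 0, 0], [2, 0, 0], [2, 2, 0]],
    "CityObjects": {"b1": {"type": "Building", "attributes": {"height": 3.0},
                           "geometry": [{"type": "MultiSurface", "lod": "1", "boundaries": [[[0, 1, 2]]]}]}},
}
fc = cityjson_to_geojson(sample)
print(fc["features"][0]["geometry"]["coordinates"][0][0])
# [[10.0, 20.0, 0.0], [11.0, 20.0, 0.0], [11.0, 21.0, 0.0], [10.0, 20.0, 0.0]]
```

A file-based wrapper would only add `json.load` on the input path and `json.dump` of the returned FeatureCollection to the output path.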