
# Installation

LakeLogic is designed to be lightweight — install only what you need and scale up anytime.

> **Analogy:** Think of LakeLogic like a Swiss Army knife. The base package gives you the blade (the Polars engine). Extras snap on tools — Spark for big data, DuckDB for analytics, notifications for alerting, and database connectors for CDC.


## Quick Install

```shell
# Recommended — fast, conflict-free
uv pip install "lakelogic[polars]"

# Or with pip
pip install "lakelogic[polars]"
```

That's it. You're ready to run your first contract.
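To confirm which optional engines actually landed in your environment, a quick standard-library probe works. This is an illustrative sketch, not part of LakeLogic itself; it assumes the engine extras pull in the usual PyPI import names `polars`, `duckdb`, and `pyspark`.

```python
import importlib.util

# Map each extra to the top-level module its engine installs
# (assumed import names for Polars, DuckDB, and PySpark).
ENGINE_MODULES = {"polars": "polars", "duckdb": "duckdb", "spark": "pyspark"}

def installed_engines(modules=ENGINE_MODULES):
    """Return {extra: bool} — True if the engine's module is importable."""
    return {extra: importlib.util.find_spec(mod) is not None
            for extra, mod in modules.items()}

for extra, present in installed_engines().items():
    print(f"[{extra}]: {'installed' if present else 'missing'}")
```

`find_spec` only checks importability; it never imports the (potentially heavy) engine, so the probe stays fast.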


## Choose Your Extras

### Engines — How You Process Data

Pick the engine that matches your workload:

| Extra | What You Get | Best For |
| --- | --- | --- |
| `[polars]` | Polars + Delta Lake + Excel/XML readers | Start here — fast local processing, notebooks, CI/CD |
| `[spark]` | PySpark | Petabyte-scale Lakehouse jobs on Databricks |
| `[duckdb]` | DuckDB + Pandas + PyArrow + Delta | Analytical SQL workloads, fast local queries |
| `[engines]` | All of the above | Want everything — switch engines freely |

### Data Sources — Where You Read From

| Extra | What You Get | Best For |
| --- | --- | --- |
| `[databases]` | SQL Server, PostgreSQL, MySQL, MongoDB | CDC ingestion from operational databases |
| `[azuresql]` | SQL Server + Azure AD auth | Azure SQL Database / Managed Instance |
| `[postgresql]` | PostgreSQL + Azure AD auth | PostgreSQL CDC |
| `[mysql]` | MySQL connector | MySQL CDC · 🔜 connector wrapper coming soon |
| `[mongodb]` | MongoDB connector | Document store ingestion · 🔜 connector wrapper coming soon |
| `[api]` | REST API client | Pulling data from HTTP APIs |
| `[sftp]` | SFTP/SSH client | File-based ingestion from remote servers |
| `[dlt]` | dlt + PyArrow | Declarative API ingestion · 🔜 contract integration coming soon |

### Streaming — Real-Time Processing

| Extra | What You Get | Best For |
| --- | --- | --- |
| `[streaming]` | Bytewax + Pathway + Kafka + SSE + WebSocket | Full real-time stack |
| `[bytewax]` | Bytewax (Rust-based stream processor) | High-performance streaming |
| `[pathway]` | Pathway (real-time SQL transforms) | SQL-first streaming |
| `[kafka]` | Apache Kafka client | Kafka-based event pipelines |
| `[sse]` | Server-Sent Events client | Wikimedia, live feeds |
| `[websocket]` | WebSocket client | Coinbase, Binance, live APIs |

### Cloud & Warehouses — Where You Deploy

| Extra | What You Get | Best For |
| --- | --- | --- |
| `[delta]` | Delta Lake + Azure + AWS + GCP storage | Spark-free Delta table reads/writes |
| `[azure]` | Azure AD + Key Vault + Blob Storage + Databricks SDK | Azure-native deployments |
| `[snowflake]` | Snowflake connector | Run contracts directly in Snowflake |
| `[bigquery]` | BigQuery client | Run contracts directly in BigQuery |
| `[cloud]` | All cloud providers + messaging + warehouses | Multi-cloud deployments |

### Notifications & Secrets — Who Gets Alerted

| Extra | What You Get | Best For |
| --- | --- | --- |
| `[notifications]` | Apprise + Jinja2 + Key Vault + Secrets Manager + HVAC | Slack, Teams, Email, PagerDuty alerts |
| `[notify]` | Apprise + HashiCorp Vault | Lightweight notification setup |
| `[azure_messaging]` | Azure Service Bus + Event Grid | Azure-native event routing |
| `[aws_messaging]` | AWS SQS/SNS via Boto3 | AWS-native event routing |
| `[gcp_messaging]` | Google Cloud Pub/Sub | GCP-native event routing |

### AI & Profiling — Smart Contract Generation

| Extra | What You Get | Best For |
| --- | --- | --- |
| `[ai]` | OpenAI + Anthropic + Google GenAI | AI-powered contract bootstrap (`lakelogic bootstrap --ai`) |
| `[pii]` | DataProfiler + Presidio | Automatic PII detection, profiling, and anonymization |

### Bundles — Pre-Packaged Combinations

| Extra | What You Get | Best For |
| --- | --- | --- |
| `[all]` | Base install (backwards compat) | Minimal footprint |
| `[enterprise]` | Spark + PII + Bytewax + notebooks | Full-stack enterprise deployment |
| `[cli]` | Typer CLI framework | `lakelogic` command-line tool |
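If you ever need to see which extras an installed distribution declares without consulting the docs, the standard library exposes the `Provides-Extra` metadata via `importlib.metadata` (Python 3.8+). A minimal sketch — it queries `pip` as a stand-in, since `lakelogic` may not be installed where you run this:

```python
from importlib import metadata

def declared_extras(dist_name):
    """List the extras a distribution declares, or None if not installed."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None  # distribution not installed in this environment
    # Provides-Extra is absent when a package declares no extras
    return meta.get_all("Provides-Extra") or []

# After installing, declared_extras("lakelogic") shows the available extras
print(declared_extras("pip"))
```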

## Engine Selection Tips

> **Rule of thumb:** Start with `[polars]`. If you hit scale limits, switch to `[spark]`. If you want SQL-native analytics, try `[duckdb]`. Your contracts work on all three — zero code changes.

| Scenario | Recommended Engine |
| --- | --- |
| Local development, notebooks, CI/CD | Polars — fastest startup, smallest footprint |
| Databricks, Unity Catalog, petabyte-scale | Spark — distributed, Delta-native |
| Analytical queries, fast local SQL | DuckDB — SQL-first, great for ad-hoc work |
| Don't know yet | Polars — you can always switch later |
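The rule of thumb above can be encoded as a simple fallback in your own glue code. This is an illustrative helper, not a LakeLogic API: it returns the first engine whose package is importable, in cheapest-first order, using the assumed import names `polars`, `duckdb`, and `pyspark`.

```python
import importlib.util

# Cheapest-first order, per the scenario table above
PREFERENCE = ("polars", "duckdb", "pyspark")

def pick_engine(preferred=PREFERENCE):
    """Return the first importable engine module name, or raise with a hint."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    raise RuntimeError(
        "No engine installed; try: uv pip install 'lakelogic[polars]'"
    )
```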

After installing, warm your environment for Delta Lake support:

```shell
lakelogic setup-oss
```

## Developer Installation

If you want to contribute to LakeLogic:

```shell
git clone https://github.com/lakelogic/LakeLogic.git
cd LakeLogic
uv sync
uv run pytest
```

**Requirements:** Python 3.9+ · Windows, macOS, or Linux
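If you want to enforce the Python requirement programmatically — say, in a CI bootstrap script — a tiny check suffices. A sketch, assuming only the 3.9+ floor stated above:

```python
import sys

def check_python(min_version=(3, 9)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version

if not check_python():
    raise SystemExit(f"Python 3.9+ required, found {sys.version.split()[0]}")
```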