Installation Guide
LakeLogic is designed to be lightweight. Install only what you need and scale up anytime.
1. Using uv (Recommended)
uv is the fastest way to install and manage LakeLogic.
# Install everything (recommended for testing)
uv pip install "lakelogic[all]"
# Or install only what you need
uv pip install "lakelogic[polars]"
uv pip install "lakelogic[spark]"
2. Using pip
If you prefer standard pip, replace uv pip install with pip install in the commands above. The extras syntax is identical.
Installation Options (Extras)
Recommended
| Extra | What it includes | Use case |
|---|---|---|
| [all] | Engines + Delta + notifications + databases + streaming | Most users start here: fast, conflict-free. |
| [polars] | Polars engine + Delta | High-speed local processing. |
| [pandas] | Pandas + DuckDB | For data science teams. |
| [duckdb] | DuckDB native | Fast analytical SQL in-memory. |
Platform-Specific
| Extra | What it includes | Use case |
|---|---|---|
| [spark] | PySpark | Large-scale Lakehouse jobs. |
| [snowflake] | Snowflake connector | Run contracts directly in Snowflake. |
| [bigquery] | BigQuery client | Run contracts directly in BigQuery. |
| [cloud] | All GCP + Azure messaging + Snowflake + BigQuery | Multi-cloud deployments. |
Specialist
| Extra | What it includes | Use case |
|---|---|---|
| [notifications] | Apprise + vault clients | Alerts via Slack, Teams, Email, etc. |
| [profiling] | DataProfiler + Presidio | Schema profiling + PII detection. |
| [enterprise] | Spark + profiling + Bytewax + notebook | Everything (may have numpy conflicts). |
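Because each extra pulls in a different set of packages, it can be handy to confirm which engines are importable in the current environment. The sketch below uses only the standard library. The helper name available_engines and the mapping from engine names to import names (polars, pandas, duckdb, pyspark) are assumptions for illustration, not part of LakeLogic's API.

```python
from importlib.util import find_spec

# Assumed mapping of engine name -> top-level import name; adjust as needed.
ENGINE_MODULES = {
    "polars": "polars",
    "pandas": "pandas",
    "duckdb": "duckdb",
    "spark": "pyspark",
}


def available_engines(modules=ENGINE_MODULES):
    """Return {engine: True/False} depending on whether its module can be found."""
    return {name: find_spec(module) is not None for name, module in modules.items()}


if __name__ == "__main__":
    for engine, present in available_engines().items():
        print(f"{engine:<8} {'installed' if present else 'missing'}")
```

An engine reported as missing usually means the matching extra has not been installed in this environment.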
Engine Selection Tips
Selecting the right engine depends on your data source and operating system:
- Polars (Default): Best for remote URLs (https://), local files (CSV/Parquet), and high-speed local processing.
- Spark: Best for large-scale Lakehouse jobs and distributed data. Note: Spark on Windows does not natively support reading directly from https:// URLs for CSV/Parquet. Download remote files locally first if you must use Spark on Windows.
- DuckDB: Best for complex SQL analysis and native Iceberg/Delta support on local machines.
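The tips above can be condensed into a small decision helper. This is only a sketch: suggest_engine and its parameters are hypothetical and not part of LakeLogic. It simply encodes the guidance in this section, including the Spark-on-Windows caveat for https:// sources.

```python
import sys


def suggest_engine(source, large_scale=False, complex_sql=False, platform=sys.platform):
    """Suggest an engine name and an optional note, following the tips above.

    Hypothetical helper for illustration; not a LakeLogic API.
    """
    is_remote = source.lower().startswith("https://")
    if large_scale:
        if is_remote and platform.startswith("win"):
            # Spark on Windows cannot read https:// CSV/Parquet directly.
            return "spark", "download the remote file locally first"
        return "spark", None
    if complex_sql:
        return "duckdb", None
    # Polars is the default for remote URLs and local CSV/Parquet files.
    return "polars", None


if __name__ == "__main__":
    print(suggest_engine("https://example.com/data.parquet"))
    print(suggest_engine("https://example.com/data.csv", large_scale=True, platform="win32"))
```

The ordering of the checks is a design choice made for this sketch: scale is considered first, then SQL complexity, with Polars as the fallback because the guide names it as the default.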
Materialization notes:
- Spark engine: Supports append, overwrite, merge, and scd2 strategies natively. Uses distributed DataFrame operations for merge/SCD2, avoiding driver memory bottlenecks at scale. Delta Lake MERGE INTO is used when available.
- DuckDB engine: Full support for local and cloud data lakes. Supports iceberg and delta formats natively.
- Polars engine: Supports delta format natively via deltalake.
- Pandas engine: Full materialization support.
Install [duckdb] or [polars] for high-performance OSS processing. After installing, it is recommended to "warm" your environment for modern formats:
This command pre-installs the necessary DuckDB extensions (Iceberg, Delta, Cloud Drivers) so they are available offline and during runtime.
Developer Installation
If you want to contribute to LakeLogic:
- Clone the repo:
- Sync with uv:
- Run tests:
Clean Developer Install (Recommended for Windows/Jupyter)
To install in editable mode while suppressing warnings about script paths and avoiding dependency conflicts (e.g., NumPy version mismatches with pandas-ta):
Requirements
- Python: 3.9 or higher.
- OS: Windows, macOS, or Linux.
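A quick way to confirm that the interpreter meets the minimum version before installing is sketched below. meets_minimum is an illustrative helper, not something LakeLogic ships.

```python
import sys

MINIMUM = (3, 9)  # from the requirement above


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if the given (major, minor, ...) version satisfies the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```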
Windows Developer Notes
Python 3.13 Venv Launcher
On some Windows machines with Python 3.13, creating a virtual environment with
python -m venv may log:
Unable to copy 'C:\Program Files\Python313\Lib\venv\scripts\nt\venvlauncher.exe' to '.venv\Scripts\python.exe'
This can cause pip to install native extension wheels for the wrong Python
ABI (e.g., cp312 wheels into a cp313 environment).
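To check whether a freshly created environment is affected, it can help to print which interpreter and ABI tag the environment actually runs. The following is a minimal sketch using only the standard library; the helper name interpreter_report is an assumption for illustration.

```python
import sys
import sysconfig


def interpreter_report():
    """Collect details that can reveal a venv/ABI mismatch."""
    return {
        "executable": sys.executable,
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # For example "cpython-313" on CPython 3.13.
        "cache_tag": sys.implementation.cache_tag,
        # Suffix for compiled extension modules; encodes the ABI on CPython.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        # True when running inside a virtual environment.
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key:<11} {value}")
```

Run it with the environment's own Python executable. If the reported version is not the one you expected, the environment was probably created from the wrong interpreter and should be recreated.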
Recommended fix: create the environment through the py launcher explicitly (for example, py -3.13 -m venv .venv), which resolves the stub correctly.
Making lakelogic Available Bare
After pip install lakelogic (outside a venv), the CLI scripts land in the
Python user scripts directory but that directory may not be on PATH. Add it
once:
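As a rough sketch, the user scripts directory can be located with the standard library and compared against PATH. The helper names are illustrative. The scheme name built from os.name is the usual one on Windows and Linux; some macOS framework builds use a different scheme, so treat the result as a starting point.

```python
import os
import sysconfig


def user_scripts_dir():
    """Return the per-user scripts directory for this interpreter."""
    # "nt_user" on Windows, "posix_user" on Linux and most other Unix builds.
    return sysconfig.get_path("scripts", f"{os.name}_user")


def on_path(directory, path_value=None):
    """Return True if directory is one of the entries in PATH."""
    path_value = os.environ.get("PATH", "") if path_value is None else path_value
    target = os.path.normcase(os.path.normpath(directory))
    entries = [e for e in path_value.split(os.pathsep) if e]
    return any(os.path.normcase(os.path.normpath(e)) == target for e in entries)


if __name__ == "__main__":
    scripts = user_scripts_dir()
    print(scripts)
    print("on PATH" if on_path(scripts) else "not on PATH: add the directory above")
```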
Open a new terminal; lakelogic will then be available without a full path.
Console Encoding
LakeLogic automatically reconfigures stdout/stderr to UTF-8 on Windows at
import time. You do not need to set PYTHONIOENCODING=utf-8 in your shell or
CI scripts. This is handled inside the package itself.
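For readers curious what such a reconfiguration can look like, here is a minimal sketch built on the standard library's TextIOWrapper.reconfigure. It illustrates the general technique under stated assumptions and is not LakeLogic's actual implementation; ensure_utf8_streams is a hypothetical name.

```python
import sys


def ensure_utf8_streams(platform=sys.platform):
    """Switch stdout/stderr to UTF-8 on Windows; return the streams changed."""
    changed = []
    if not platform.startswith("win"):
        return changed  # other platforms are left untouched
    for name in ("stdout", "stderr"):
        stream = getattr(sys, name, None)
        # Streams replaced by tools (or None under pythonw) may lack reconfigure().
        if stream is not None and hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8", errors="replace")
            changed.append(name)
    return changed


if __name__ == "__main__":
    ensure_utf8_streams()
    print(sys.stdout.encoding)
```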