CLI Reference

The LakeLogic CLI is the high-efficiency entry point for enforcing your data contracts. It is designed for Speed-to-Production and Engine Portability.

Strategic Value

  • Developer Productivity: Bootstrap production-ready contracts from raw data in seconds.
  • Infrastructure Optionality: Use the --engine flag to swap between Polars (local speed) and Spark (cluster scale) with zero code changes.
  • Audit Readiness: Every execution generates a run summary for instant reconciliation.

Invoking the CLI

Run lakelogic with no arguments to display the full help page:

lakelogic
 Usage: lakelogic [OPTIONS] COMMAND [ARGS]...

 LakeLogic — Consistent Data Contracts across engines.
 ...

┌─ Contract Execution ──────────────────────────────────────────────────────┐
│ run        Run a data contract against a source file.                     │
│ bootstrap  Bootstrap contracts and registry from a landing zone.          │
└───────────────────────────────────────────────────────────────────────────┘
┌─ Data Tooling ────────────────────────────────────────────────────────────┐
│ generate   Generate synthetic data from a contract definition.            │
│ import-dbt Import dbt schema.yml / sources.yml -> LakeLogic contract YAML.│
└───────────────────────────────────────────────────────────────────────────┘
┌─ Environment Setup ───────────────────────────────────────────────────────┐
│ setup-oss  Pre-install DuckDB extensions & check OSS dependencies.        │
└───────────────────────────────────────────────────────────────────────────┘
┌─ Help ────────────────────────────────────────────────────────────────────┐
│ help       Show contextual help for LakeLogic commands.                   │
└───────────────────────────────────────────────────────────────────────────┘

Commands are grouped into logical sections, following the Databricks CLI convention. Use lakelogic [COMMAND] --help for full option details on any command.


Command Groups

Contract Execution

lakelogic run

Validates a source dataset against a contract and (optionally) materializes clean output.

lakelogic run \
  --contract contract.yaml \
  --source data.csv

Key options:

| Flag | Short | Description |
|------|-------|-------------|
| --contract | -c | Path to the YAML contract |
| --source | -s | Input file (CSV / Parquet) or table name for warehouse engines |
| --engine | -e | Engine: polars, pandas, duckdb, spark, snowflake, bigquery |
| --stage | | Apply stage overrides from the contract's stages block (e.g., bronze, silver) |
| --output-good | | Save good records to CSV / Parquet |
| --output-bad | | Save quarantined records to CSV / Parquet |
| --output-format | | Output format: csv or parquet (defaults to csv, or inferred from the output file extension) |
| --materialize / --no-materialize | | Write good data to the contract materialization target |
| --materialize-target | | Override the materialization target path |
| --verbose | -v | Enable debug logging |
| --trace | | Display a step-by-step execution trace in the terminal |

Spark note: --output-good/--output-bad are written with the Spark writer and produce a directory of part files — standard Spark behaviour.

Examples:

# Polars (default — fastest local)
lakelogic run --contract orders.yaml --source data/orders.csv

# DuckDB with quarantine output
lakelogic run --engine duckdb --contract orders.yaml \
  --source data/orders.parquet \
  --output-good good.parquet --output-bad quarantine.parquet \
  --output-format parquet

# Snowflake (table-only)
lakelogic run --engine snowflake --contract contract.yaml \
  --source table:ANALYTICS.SILVER.CUSTOMERS

# Materialise clean records and print a full trace
lakelogic run --contract orders.yaml --source data/orders.csv \
  --materialize --trace

# Apply a stage override
lakelogic run --contract pipeline.yaml --source data.csv --stage silver
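To make the --stage and --materialize examples above concrete, here is a hypothetical contract sketch. The field names, rule keys, and the exact shape of the stages and materialization blocks are illustrative assumptions, not the authoritative schema; consult a bootstrapped contract for the real layout.

```yaml
# orders.yaml — illustrative sketch only; key names are assumed
name: orders
fields:
  - name: order_id
    type: string
    nullable: false
  - name: amount
    type: float
stages:
  bronze: {}              # no overrides at the bronze layer
  silver:
    rules:
      amount: { min: 0 }  # tightened check applied with --stage silver
materialization:
  target: data/clean/orders.parquet  # overridden by --materialize-target
```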

lakelogic bootstrap

Scans a landing zone directory, infers schema from sample files, and generates:

  • A ready-to-use contract YAML per entity
  • A _registry.yaml that maps entities to their contracts

This is the governance accelerator for day-one compliance: every entity in the landing zone gets a contract and a registry entry before any pipeline code is written.

lakelogic bootstrap \
  --landing data/landing \
  --output-dir contracts/ \
  --registry contracts/_registry.yaml \
  --format csv \
  --pattern "*.csv"

Key options:

| Flag | Description |
|------|-------------|
| --landing | Landing zone root path |
| --output-dir | Directory to write generated contracts |
| --registry | Output path for the registry YAML |
| --format | Input file format: csv, parquet, json |
| --pattern | File glob pattern (default *.csv) |
| --layer | Layer prefix for dataset names (default bronze) |
| --sample-rows | Rows to sample for schema inference (default 1000) |
| --sync | Sync an existing registry with new landing data |
| --sync-update-schema | Add new columns to existing contracts |
| --sync-overwrite | Overwrite existing contracts entirely |
| --profile | Generate a DataProfiler report per entity |
| --detect-pii | Detect PII using Presidio and tag fields |
| --suggest-rules | Auto-suggest quality rules from the data profile |
| --profile-output-dir | Directory for profile JSON reports |
| --pii-sample-size | Sample values per column for PII detection (default 50) |

Sync mode — align an existing registry as new data arrives:

lakelogic bootstrap \
  --landing data/landing \
  --output-dir contracts/ \
  --registry contracts/_registry.yaml \
  --sync --sync-update-schema

PII + Profile + rule suggestion in one pass:

lakelogic bootstrap \
  --landing data/landing \
  --output-dir contracts/ \
  --registry contracts/_registry.yaml \
  --profile --detect-pii --suggest-rules \
  --profile-output-dir reports/

Requires lakelogic[profiling] for --profile and --detect-pii.
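For orientation, a generated _registry.yaml might look roughly like the sketch below. The entity names and key layout here are illustrative assumptions; the bootstrapped file is the authoritative reference.

```yaml
# _registry.yaml — illustrative sketch; actual key names may differ
datasets:
  bronze.orders:
    contract: contracts/orders.yaml
    source: data/landing/orders/*.csv
    format: csv
  bronze.customers:
    contract: contracts/customers.yaml
    source: data/landing/customers/*.csv
    format: csv
```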


Data Tooling

lakelogic generate

Generates synthetic test data from a contract definition. Respects field types, nullability, accepted_values, and range constraints. Use --invalid-ratio to inject intentionally bad rows for validating your quarantine pipeline.

lakelogic generate --contract orders.yaml --rows 1000 --output sample.parquet

Key options:

| Flag | Short | Description |
|------|-------|-------------|
| --contract | -c | Path to the contract YAML file |
| --rows | -n | Number of rows to generate (default 100) |
| --output | -o | Output file path (CSV / Parquet / JSON) |
| --format | -f | Output format: parquet, csv, json (default parquet) |
| --engine | -e | DataFrame engine: polars, pandas (default polars) |
| --invalid-ratio | | Fraction of rows that intentionally break rules (0.0–1.0) |
| --seed | | Random seed for reproducibility |
| --preview | | Rows to print to the console (default 5; 0 = silent) |

Examples:

# Generate 1,000 clean rows as Parquet
lakelogic generate --contract orders.yaml --rows 1000 --output sample.parquet

# Inject 10% bad rows to test quarantine logic
lakelogic generate --contract orders.yaml --rows 500 \
  --invalid-ratio 0.1 --format csv --output orders_with_errors.csv

# 200 rows with Pandas, print 10 preview rows, reproducible
lakelogic generate --contract orders.yaml \
  --rows 200 --engine pandas --seed 42 --preview 10

# Dry-run without saving — just print to console
lakelogic generate --contract orders.yaml --rows 50 --preview 50
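The generator keys off constraint declarations in the contract. A hypothetical fragment (key names assumed, not authoritative) showing the kinds of constraints it respects:

```yaml
# Illustrative contract fragment — key names are assumptions
fields:
  - name: status
    type: string
    accepted_values: [open, shipped, cancelled]  # generator samples from this set
  - name: quantity
    type: int
    nullable: false
    range: { min: 1, max: 100 }                  # values drawn within this range
```

With --invalid-ratio set, the stated fraction of rows deliberately violates constraints like these so quarantine logic can be exercised.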

lakelogic import-dbt

Imports a dbt schema.yml or sources.yml file and converts its model definitions into LakeLogic contract YAMLs. Eliminates duplicate schema maintenance across dbt and LakeLogic.

lakelogic import-dbt \
  --schema models/schema.yml \
  --output contracts/

Key options:

| Flag | Short | Description |
|------|-------|-------------|
| --schema | | Path to the dbt schema.yml or sources.yml file |
| --model | -m | Import a single model by name; omit to import all models |
| --source-name | | dbt source name (for sources.yml files) |
| --source-table | | dbt source table name (for sources.yml files) |
| --output | -o | Output path: a .yaml file for a single contract, or a directory for batch import |
| --overwrite / --no-overwrite | | Overwrite existing contracts (default: skip) |
| --dry-run | | Print generated YAML to the console without writing files |
| --verbose | -v | Verbose output |

Examples:

# Import a single dbt model
lakelogic import-dbt \
  --schema models/schema.yml \
  --model customers \
  --output contracts/

# Import all models in a schema file
lakelogic import-dbt \
  --schema models/schema.yml \
  --output contracts/

# Import a dbt source table
lakelogic import-dbt \
  --schema models/sources.yml \
  --source-name raw --source-table orders \
  --output contracts/

# Dry-run — preview the generated YAML without writing
lakelogic import-dbt \
  --schema models/schema.yml \
  --model customers \
  --dry-run
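For reference, the importer reads standard dbt schema files such as the one below. Column-level not_null, unique, and accepted_values tests map naturally onto contract rules, though the exact mapping is LakeLogic-specific.

```yaml
# models/schema.yml — standard dbt properties format
version: 2
models:
  - name: customers
    description: Customer dimension
    columns:
      - name: customer_id
        description: Primary key
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: [active, churned]
```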

Environment Setup

lakelogic setup-oss

Pre-installs DuckDB extensions (Iceberg, Delta, cloud drivers) and checks all OSS dependencies so they are available offline and at job runtime — critical for air-gapped or ephemeral compute environments.

lakelogic setup-oss

Run this once after installing lakelogic[duckdb] or lakelogic[polars]. It verifies deltalake is installed and warms DuckDB's extension cache.


Help

lakelogic help

Prints short usage guidance and examples in the terminal.

lakelogic help
lakelogic help driver
lakelogic help bootstrap

For full option lists, use the --help flag on any command:

lakelogic run --help
lakelogic generate --help
lakelogic import-dbt --help

Pipeline Driver

The registry-driven driver is exposed as the separate entry point lakelogic-driver. It orchestrates Bronze → Silver → Gold pipelines from a _registry.yaml.

lakelogic-driver \
  --registry examples/insurance_elt/contracts/insurance/_registry.yaml \
  --reference-registry examples/insurance_elt/contracts/shared/reference/_registry.yaml \
  --gold-registry examples/insurance_elt/contracts/insurance/warehouse/_registry.yaml \
  --layers reference,bronze,silver,gold \
  --window last_success

See Driver Reference for the full option list.


Windows Notes

Console Encoding

LakeLogic automatically reconfigures stdout and stderr to UTF-8 on Windows at startup. This means:

  • No need to set PYTHONIOENCODING=utf-8 manually
  • Rich panel borders, Unicode arrows, and emoji in help text render correctly
  • Works in cmd.exe, PowerShell, and Windows Terminal alike

This is handled inside the package and requires no external wrapper scripts.
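This relies on the stream-reconfiguration API added in Python 3.7. A minimal sketch of the general technique (not LakeLogic's actual code), demonstrated on a standalone wrapper rather than the real console streams:

```python
import io

def force_utf8(stream):
    """Switch a text stream to UTF-8 in place, if the stream supports it.

    io.TextIOWrapper.reconfigure() exists on Python 3.7+; streams that have
    been replaced by objects without reconfigure() pass through unchanged.
    """
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream

# Demonstrate on an in-memory wrapper that starts out as cp1252:
buf = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
force_utf8(buf)
print(buf.encoding)  # utf-8
```

In a real CLI this would be applied to sys.stdout and sys.stderr at process startup, before any Rich output is emitted.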

Making lakelogic Available on PATH

After pip install lakelogic, the lakelogic.exe script lands in Python's user scripts directory. If your shell cannot find it, add that directory to your PATH once:

:: For Python 3.13 (adjust version as needed)
setx PATH "%USERPROFILE%\AppData\Roaming\Python\Python313\Scripts;%PATH%"

Open a new terminal and lakelogic will be available directly. (Caution: setx writes the current expanded value of %PATH% into your user PATH and truncates values longer than 1024 characters; if your PATH is long, edit it through the Environment Variables dialog instead.)

Developer installs: When working from a cloned repo with pip install -e ., the same scripts directory is used. To isolate dependencies in a virtual environment, see the Developer Installation guide.