LakeLogic Bootstrap
The bootstrap command generates starter contracts + a registry by scanning a landing zone. It accelerates onboarding for new systems when no contracts exist yet.
When to Use It
- New system onboarding with only raw landing files available.
- Legacy data dump where schemas are unknown.
- Rapid POC to validate whether LakeLogic fits a data source.
- Database onboarding — generate contracts from an existing catalog, schema, or warehouse.
Bootstrap Sources
| Source | Command | What It Does |
|---|---|---|
| Landing zone (files) | --landing data/ |
Scans CSV/JSON/Parquet files, infers schemas |
| Unity Catalog | --catalog catalog.schema |
Reads INFORMATION_SCHEMA from Databricks |
| Snowflake | --database ANALYTICS.PUBLIC |
Introspects Snowflake schema |
| PostgreSQL | --database postgresql://... |
Reads information_schema.columns |
| BigQuery | --database project.dataset |
Reads BigQuery dataset metadata |
| DuckDB | --database duckdb:///path.db |
Reads DuckDB information schema |
File-Based Bootstrap
Basic Usage
lakelogic bootstrap \
--landing data/landing/new_system \
--output-dir contracts/new_system \
--registry contracts/new_system/_registry.yaml \
--format csv \
--pattern "*.csv"
Profiling and PII Detection
Generate profiles and detect PII during bootstrap:
lakelogic bootstrap \
--landing data/landing/new_system \
--output-dir contracts/new_system \
--registry contracts/new_system/_registry.yaml \
--format csv \
--pattern "*.csv" \
--profile \
--detect-pii \
--suggest-rules
Sync Mode (Keep Contracts in Sync)
If new files or entities appear later, you can sync the registry without re-creating everything:
lakelogic bootstrap \
--landing data/landing/new_system \
--output-dir contracts/new_system \
--registry contracts/new_system/_registry.yaml \
--format csv \
--pattern "*.csv" \
--sync
Update Existing Schemas
lakelogic bootstrap \
--landing data/landing/new_system \
--output-dir contracts/new_system \
--registry contracts/new_system/_registry.yaml \
--format csv \
--pattern "*.csv" \
--sync \
--sync-update-schema
Overwrite Existing Contracts
lakelogic bootstrap \
--landing data/landing/new_system \
--output-dir contracts/new_system \
--registry contracts/new_system/_registry.yaml \
--format csv \
--pattern "*.csv" \
--sync \
--sync-overwrite
Database & Catalog Bootstrap
Generate contracts directly from an existing database schema. LakeLogic introspects the catalog, reads column metadata (types, nullability, comments), and produces one contract per table — no data files required.
Unity Catalog (Databricks)
lakelogic bootstrap \
--catalog retail_prod.marketing \
--output-dir contracts/marketing \
--registry contracts/marketing/_system.yaml \
--detect-pii \
--suggest-rules \
--ai
This connects to your Databricks workspace via the configured Spark session and:
- Lists all tables in
retail_prod.marketing - Reads column types, nullability, and column comments from
INFORMATION_SCHEMA - Generates one contract per table with proper type mappings
- Optionally runs AI to suggest quality rules from column names and comments
- Creates a
_system.yamlregistry listing all discovered entities
Snowflake
lakelogic bootstrap \
--database "snowflake://user@account/ANALYTICS/PUBLIC" \
--output-dir contracts/analytics \
--registry contracts/analytics/_system.yaml \
--detect-pii
PostgreSQL / MySQL
lakelogic bootstrap \
--database "postgresql://user:pass@host:5432/mydb?schema=public" \
--output-dir contracts/mydb \
--registry contracts/mydb/_system.yaml
DuckDB (Local)
lakelogic bootstrap \
--database "duckdb:///path/to/analytics.db" \
--output-dir contracts/analytics \
--registry contracts/analytics/_system.yaml
What Catalog Bootstrap Generates
For each table, the generated contract includes:
- Schema: All columns with correct types mapped to LakeLogic types (
string,long,double,timestamp, etc.) - Required fields: Columns with
NOT NULLconstraints marked asrequired: true - Descriptions: Column comments imported as
description:fields - PII detection: Columns matching PII patterns (email, phone, SSN) annotated with
pii: true - Primary keys: Extracted from catalog constraints and set as
primary_key - Suggested rules: AI-generated quality rules when
--aiis enabled
Example Output
# Auto-generated by: lakelogic bootstrap --catalog retail_prod.marketing
version: 1.0.0
info:
title: "Silver Marketing Sessions"
table_name: "silver_google_analytics_sessions"
target_layer: "silver"
domain: "marketing"
system: "google_analytics"
model:
fields:
- name: "session_id"
type: "string"
required: true
description: "Unique session identifier from GA4"
- name: "user_id"
type: "string"
pii: true
description: "Pseudonymous user ID"
- name: "session_start"
type: "timestamp"
required: true
- name: "page_views"
type: "integer"
- name: "bounce_rate"
type: "double"
primary_key: ["session_id"]
quality:
row_rules:
- not_null: "session_id"
- not_null: "session_start"
- sql: "page_views >= 0"
dataset_rules:
- unique: "session_id"
What It Generates
- One contract per detected entity
- A
_registry.yamlwith all entities enabled - Bronze defaults (schema + quarantine + materialization)
Entity Discovery Rules
- Folder per entity (preferred)
landing/customers/*.csv-> entitycustomers-
landing/orders/*.csv-> entityorders -
File prefix fallback
customers_2026-02-05.csv-> entitycustomersorders_2026-02-05.csv-> entityorders
Real World Examples
Example 1 - New CRM Feed
Landing zone:
Bootstrap:
lakelogic bootstrap \
--landing landing/crm \
--output-dir contracts/crm \
--registry contracts/crm/_registry.yaml \
--format csv \
--pattern "*.csv"
bronze_customers.yaml, bronze_contacts.yaml, and a registry for the CRM system.
Result:
- bronze_customers.yaml
- bronze_contacts.yaml
- _registry.yaml with both entities enabled
Example 2 - Vendor Drop (Parquet)
Landing zone:
Bootstrap:
lakelogic bootstrap \
--landing landing/vendor \
--output-dir contracts/vendor \
--registry contracts/vendor/_registry.yaml \
--format parquet \
--pattern "*.parquet"
Example 3 - One-off Data Dump
Landing zone:
Bootstrap:
lakelogic bootstrap \
--landing landing/audit \
--output-dir contracts/audit \
--registry contracts/audit/_registry.yaml \
--format csv \
--pattern "*.csv"
Notes and Next Steps
- Bootstrap contracts are starter templates. You can add quality rules, transformations, and lineage.
- For production, attach policy packs or shared rules for consistent governance.
- You can immediately run with:
Common Pitfalls
- Mixed schemas in one folder: If a landing folder contains multiple schemas, the inferred contract may be too narrow. Consider splitting by entity or file prefix.
- Inconsistent file naming: Bootstrap relies on folder names or file prefixes to identify entities. Standardize naming to avoid merging unrelated datasets.
- Non-representative samples: Schema inference uses sample rows. If early files are atypical, inferred types may be wrong.
- Nested JSON: JSON with nested structures may require manual adjustments after bootstrap.
Requirements
For profiling and PII detection, install the optional dependencies: