LakeLogic Bootstrap
The bootstrap command scans a landing zone and generates starter contracts and a registry. It speeds up onboarding for new systems that have no contracts yet.
When to Use It
- New system onboarding with only raw landing files available.
- Legacy data dump where schemas are unknown.
- Rapid POC to validate whether LakeLogic fits a data source.
Basic Usage
lakelogic bootstrap \
--landing data/landing/new_system \
--output-dir contracts/new_system \
--registry contracts/new_system/_registry.yaml \
--format csv \
--pattern "*.csv"
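When several systems share the same layout, the basic command can be wrapped in a small loop. The sketch below is only an illustration: run_bootstrap and the DRY_RUN variable are hypothetical names, the system names are placeholders, and the wrapper uses only the flags shown above. It defaults to a dry run that prints each command instead of executing it.

```shell
#!/bin/sh
# run_bootstrap is a hypothetical wrapper, not part of LakeLogic.
# DRY_RUN is an assumption of this sketch: when it is non-empty the command
# is printed with echo; set DRY_RUN to an empty value to execute it.
DRY_RUN="${DRY_RUN-1}"

run_bootstrap() {
  system="$1"
  ${DRY_RUN:+echo} lakelogic bootstrap \
    --landing "data/landing/$system" \
    --output-dir "contracts/$system" \
    --registry "contracts/$system/_registry.yaml" \
    --format csv \
    --pattern "*.csv"
}

# Placeholder system names; replace with your own landing folders.
for system in crm billing; do
  run_bootstrap "$system"
done
```

Printing first makes it easy to review the paths for each system before any contracts are written.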
Profiling and PII Detection
Generate profiles and detect PII during bootstrap:
lakelogic bootstrap \
--landing data/landing/new_system \
--output-dir contracts/new_system \
--registry contracts/new_system/_registry.yaml \
--format csv \
--pattern "*.csv" \
--profile \
--detect-pii \
--suggest-rules
Sync Mode (Keep Contracts in Sync)
If new files or entities appear later, you can sync the registry without re-creating everything:
lakelogic bootstrap \
--landing data/landing/new_system \
--output-dir contracts/new_system \
--registry contracts/new_system/_registry.yaml \
--format csv \
--pattern "*.csv" \
--sync
Update Existing Schemas
lakelogic bootstrap \
--landing data/landing/new_system \
--output-dir contracts/new_system \
--registry contracts/new_system/_registry.yaml \
--format csv \
--pattern "*.csv" \
--sync \
--sync-update-schema
Overwrite Existing Contracts
lakelogic bootstrap \
--landing data/landing/new_system \
--output-dir contracts/new_system \
--registry contracts/new_system/_registry.yaml \
--format csv \
--pattern "*.csv" \
--sync \
--sync-overwrite
What It Generates
- One contract per detected entity
- A _registry.yaml with all entities enabled
- Bronze defaults (schema + quarantine + materialization)
Entity Discovery Rules
- Folder per entity (preferred)
  - landing/customers/*.csv -> entity customers
  - landing/orders/*.csv -> entity orders
- File prefix fallback
  - customers_2026-02-05.csv -> entity customers
  - orders_2026-02-05.csv -> entity orders
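The two discovery rules can be sketched as a small shell function. This is an illustration of the rules as described here, not LakeLogic's actual implementation: entity_for is a hypothetical helper, and the assumption that the file prefix is whatever precedes a trailing _YYYY-MM-DD date is inferred only from the examples above.

```shell
#!/bin/sh
# entity_for is a hypothetical helper, not part of LakeLogic.
# Usage: entity_for <landing-root> <file-path>
# The body runs in a subshell ( ... ) so its variables stay local.
entity_for() (
  root="${1%/}"
  rel="${2#"$root"/}"                 # path relative to the landing root
  case "$rel" in
    */*)
      # Rule 1: folder per entity -> first folder under the landing root
      printf '%s\n' "${rel%%/*}"
      ;;
    *)
      # Rule 2: file prefix fallback -> name without extension, minus a
      # trailing _YYYY-MM-DD date (assumed format, see lead-in)
      name="${rel%.*}"
      case "$name" in
        *_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
          printf '%s\n' "${name%_????-??-??}"
          ;;
        *)
          printf '%s\n' "$name"
          ;;
      esac
      ;;
  esac
)

entity_for landing landing/customers/part-001.csv
entity_for landing landing/orders_2026-02-05.csv
```

Running a helper like this over a landing zone before bootstrapping gives a quick preview of which entities the naming scheme implies.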
Real World Examples
Example 1 - New CRM Feed
Landing zone:
Bootstrap:
lakelogic bootstrap \
--landing landing/crm \
--output-dir contracts/crm \
--registry contracts/crm/_registry.yaml \
--format csv \
--pattern "*.csv"
Result for the CRM system:
- bronze_customers.yaml
- bronze_contacts.yaml
- _registry.yaml with both entities enabled
Example 2 - Vendor Drop (Parquet)
Landing zone:
Bootstrap:
lakelogic bootstrap \
--landing landing/vendor \
--output-dir contracts/vendor \
--registry contracts/vendor/_registry.yaml \
--format parquet \
--pattern "*.parquet"
Example 3 - One-off Data Dump
Landing zone:
Bootstrap:
lakelogic bootstrap \
--landing landing/audit \
--output-dir contracts/audit \
--registry contracts/audit/_registry.yaml \
--format csv \
--pattern "*.csv"
Notes and Next Steps
- Bootstrap contracts are starter templates. You can add quality rules, transformations, and lineage.
- For production, attach policy packs or shared rules for consistent governance.
- You can immediately run with:
Common Pitfalls
- Mixed schemas in one folder: If a landing folder contains multiple schemas, the inferred contract may be too narrow. Consider splitting by entity or file prefix.
- Inconsistent file naming: Bootstrap relies on folder names or file prefixes to identify entities. Standardize naming to avoid merging unrelated datasets.
- Non-representative samples: Schema inference uses sample rows. If early files are atypical, inferred types may be wrong.
- Nested JSON: JSON with nested structures may require manual adjustments after bootstrap.
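The mixed-schema pitfall can be checked before bootstrapping. The sketch below assumes CSV files whose first line is a header row; count_header_variants is a hypothetical helper, not a LakeLogic command. It counts how many distinct header rows appear in one folder, so any result above 1 suggests the folder mixes schemas.

```shell
#!/bin/sh
# count_header_variants is a hypothetical helper, not part of LakeLogic.
# Usage: count_header_variants <folder>
# Prints the number of distinct first lines across the folder's *.csv files.
count_header_variants() (
  for f in "$1"/*.csv; do
    [ -f "$f" ] || continue           # skip the literal pattern when no file matches
    head -n 1 "$f" | tr -d '\r'       # header row, ignoring CRLF line endings
  done | sort -u | wc -l | tr -d ' '
)

# Example (path taken from the folder-per-entity layout):
# count_header_variants landing/customers
```

This compares header text only, so files with the same columns in a different order are counted as different variants.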
Requirements
For profiling and PII detection, install the optional dependencies: