Tutorial: Medallion Architecture
Learn the Bronze → Silver pipeline pattern used in modern lakehouses.
What You'll Learn
- Stages - Define different processing behaviors per layer
- Bronze Layer - Ingest raw data with minimal rules
- Silver Layer - Apply full quality validation
- Materialization - Write output to Parquet files
Files
The example files are located at:
examples/02_core_patterns/medallion_architecture/
├── contract.yaml                 # Multi-stage contract
├── data/
│   └── crm_export.csv            # Raw CRM data
└── quickstart_tutorial.ipynb     # Interactive notebook
The Pattern
Raw CSV → [Bronze Stage] → bronze.parquet → [Silver Stage] → silver.parquet
                 ↓                                 ↓
           Minimal rules                    Full validation
        Capture everything                   Quality gates
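The flow above can be sketched in plain Python to show why Bronze rarely rejects anything while Silver is expected to. This is only an illustration of the pattern, not how lakelogic implements it: `run_stage`, the `Row` and `Rule` aliases, and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Rule = Callable[[Row], bool]


def run_stage(rows: List[Row], rules: List[Rule]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (good, bad); a row is good only if every rule passes."""
    good: List[Row] = []
    bad: List[Row] = []
    for row in rows:
        if all(rule(row) for rule in rules):
            good.append(row)
        else:
            bad.append(row)
    return good, bad


# Made-up sample data standing in for the raw CSV.
raw = [
    {"customer_id": 1, "total_spend": 120.0},
    {"customer_id": 2, "total_spend": -5.0},
]

# Bronze: no rules, so every row is captured as-is.
bronze_good, bronze_bad = run_stage(raw, rules=[])

# Silver: reads the Bronze output and applies a quality gate.
silver_good, silver_bad = run_stage(
    bronze_good, rules=[lambda row: row["total_spend"] >= 0]
)
```

With an empty rule list, `all(...)` is true for every row, so Bronze passes everything through; the negative-spend row only drops out at Silver.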
Contract Breakdown
# Bronze: Ingest everything, minimal validation
stages:
  bronze:
    source:
      type: raw_landing
      path: "data/crm_*.csv"
    materialization:
      strategy: overwrite
      target_path: "data/bronze/bronze_customers.parquet"
    quality:
      row_rules: []  # No rules at Bronze - capture all raw data

# Silver: Full validation (default stage)
source:
  type: bronze_layer
  path: "data/bronze/bronze_customers.parquet"
quality:
  row_rules:
    - regex_match:
        field: email
        pattern: "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
    - name: positive_spend
      sql: "total_spend >= 0"
materialization:
  strategy: overwrite
  target_path: "data/silver/silver_customers.parquet"
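To make the two Silver rules concrete, here is a rough Python equivalent of what they check for a single row. It is a sketch for intuition only and does not reflect how lakelogic evaluates rules internally. The helper `failed_rules` and its rule labels are hypothetical, and treating a missing `total_spend` as a failure is an assumption (SQL would evaluate `NULL >= 0` to NULL, and how the library handles that is not stated here).

```python
import re

# Same pattern as the contract, written as a Python raw string
# (the YAML "\\." becomes a single escaped dot).
EMAIL_RE = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")


def failed_rules(row: dict) -> list:
    """Hypothetical helper: list the Silver rules this row violates."""
    failures = []

    # regex_match on the email field
    email = row.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        failures.append("regex_match(email)")

    # positive_spend: total_spend >= 0
    # Assumption: a missing value counts as a failure.
    spend = row.get("total_spend")
    if spend is None or spend < 0:
        failures.append("positive_spend")

    return failures


print(failed_rules({"email": "ada@example.com", "total_spend": 10}))
print(failed_rules({"email": "not-an-email", "total_spend": -1}))
```

The first row passes both checks and yields an empty list; the second fails both, which is the kind of row the Silver stage would be expected to quarantine.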
Run It
Full Pipeline (Bronze then Silver)
from lakelogic import DataProcessor
# Run Bronze stage
proc = DataProcessor(contract="contract.yaml", stage="bronze")
bronze_good, bronze_bad = proc.run("data/crm_export.csv")
# Run Silver stage (reads Bronze output)
proc = DataProcessor(contract="contract.yaml") # default = silver
silver_good, silver_bad = proc.run()
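The Bronze call above passes one concrete file, while the contract declares the Bronze source as the glob `data/crm_*.csv`. As a quick sanity check that the two agree, the sketch below uses the standard-library `fnmatch`. It assumes the contract path follows ordinary shell-style glob semantics, which the tutorial does not confirm; `matches_bronze_source` is a hypothetical helper and not part of lakelogic.

```python
from fnmatch import fnmatch

# Glob copied from the Bronze `source.path` in the contract.
BRONZE_SOURCE_GLOB = "data/crm_*.csv"


def matches_bronze_source(path: str, pattern: str = BRONZE_SOURCE_GLOB) -> bool:
    """Hypothetical helper: True if `path` fits the contract's source glob."""
    return fnmatch(path, pattern)


print(matches_bronze_source("data/crm_export.csv"))
print(matches_bronze_source("data/orders.csv"))
```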
Interactive Notebook
Open examples/02_core_patterns/medallion_architecture/quickstart_tutorial.ipynb for a guided walkthrough.
Key Concepts
| Concept | Bronze | Silver |
|---|---|---|
| Purpose | Capture raw data | Validate & clean |
| Quality Rules | None or minimal | Full validation |
| Schema | Loose | Strict |
| Failures | Rare | Expected (quarantine) |
Next Steps
- Notifications & Secrets - Configure alerts
- Dedup & Survivorship - Handle duplicates