Patterns
Patterns are reusable recipes for common data engineering problems. Each pattern includes:
- A working contract YAML
- Sample data you can run locally
- A README explaining when and how to use it
Note: Some patterns use SQL window functions (e.g., ROW_NUMBER) and are best run with the DuckDB or Spark engine.
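To make the window-function note concrete, here is a minimal sketch of the kind of ROW_NUMBER query a dedup step typically relies on. It is an illustration only, not the contract used in examples/03_patterns/dedup_survivorship/. The table, column names, and sample rows are hypothetical. It runs on Python's built-in sqlite3 module rather than DuckDB or Spark so that it needs no extra install, and it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions).

```python
import sqlite3

# Hypothetical sample data: several versions of the same customer record.
rows = [
    (1, "alice@old.example", "2024-01-01"),
    (1, "alice@new.example", "2024-03-15"),
    (2, "bob@example.com", "2024-02-10"),
    (2, "bob@example.com", "2024-02-10"),  # exact duplicate
]


def latest_per_key(records):
    """Keep one row per customer_id, preferring the most recent updated_at."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER, email TEXT, updated_at TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    query = """
        SELECT customer_id, email, updated_at
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id
                       ORDER BY updated_at DESC
                   ) AS rn
            FROM customers
        )
        WHERE rn = 1          -- rank 1 is the surviving row for each key
        ORDER BY customer_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


survivors = latest_per_key(rows)
print(survivors)
```

The same SELECT should carry over to DuckDB or Spark SQL with little or no change, since PARTITION BY and ORDER BY inside OVER are standard SQL.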
Available Patterns
| Pattern | What It Solves | Location |
|---|---|---|
| Bronze Quality Gate | Reject bad data at ingestion | examples/03_patterns/bronze_quality_gate/ |
| Dedup & Survivorship | Handle duplicate records | examples/03_patterns/dedup_survivorship/ |
| SCD2 Dimension | Track historical changes | examples/03_patterns/scd2_dimension/ |
| Late Arriving Reprocess | Safe partition backfill | examples/03_patterns/late_arriving_reprocess/ |
| External Python Logic | Custom Python/notebook hooks | examples/03_patterns/external_python_logic/ |
When to Use Each Pattern
| Problem | Pattern |
|---|---|
| Reject garbage at ingestion | Bronze Quality Gate |
| Multiple records per key | Dedup & Survivorship |
| Track historical changes | SCD2 Dimension |
| Backfill without data loss | Late Arriving Reprocess |
| Complex business logic | External Python Logic |
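As a rough illustration of the first row in the table above, a quality gate can be thought of as a set of named checks that split incoming rows into accepted and rejected sets. The sketch below is plain Python, not the contract YAML shipped with the pattern. The rule names, field names, and the `_failed_rules` column are all hypothetical.

```python
from typing import Callable

# Hypothetical rules: rule name -> predicate that returns True when a row passes.
RULES: dict[str, Callable[[dict], bool]] = {
    "order_id_not_null": lambda r: r.get("order_id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def quality_gate(rows, rules=RULES):
    """Split rows into (accepted, rejected); rejected rows record which rules failed."""
    accepted, rejected = [], []
    for row in rows:
        failures = [name for name, check in rules.items() if not check(row)]
        if failures:
            # Keep the bad row and the reasons, so it can be quarantined and inspected.
            rejected.append({**row, "_failed_rules": failures})
        else:
            accepted.append(row)
    return accepted, rejected


# Hypothetical incoming batch.
batch = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": None, "amount": 5.0},
    {"order_id": 3, "amount": -2},
]
accepted, rejected = quality_gate(batch)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Recording the failed rule names next to each rejected row is one possible design choice. It keeps rejected data available for review instead of dropping it silently.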
See Also
- Tutorials - Start with the basics first
- Production Examples - Complete end-to-end pipeline