LakeLogic Contract Reference
What Is LakeLogic?
The short version: You describe your data in YAML. LakeLogic builds the pipeline.
Think of LakeLogic like a building blueprint system for data. Just as an architect draws blueprints that builders follow to construct a house, with exact specifications for every room, door, and electrical outlet, you write YAML contracts that LakeLogic follows to build data pipelines.
No construction crew (developers writing Spark code) needed. The blueprint IS the building instruction.
The Three Levels of Configuration
LakeLogic organizes your data estate like a company org chart:
```
🏢 Domain (Marketing, Sales, Finance)
│     "Who owns this data?"
│     → _domain.yaml: ownership, SLOs, contacts, alerts
│
└── 🏭 System (Google Analytics, Salesforce, SAP)
    │     "Where does this data come from?"
    │     → _system.yaml: storage, environments, settings
    │
    └── 📊 Data Product (events, customers, orders)
              "What does this specific table look like?"
              → entity_v1.0.yaml: schema, quality rules, transforms
```
Analogy: A domain is like a department (Marketing). A system is like a tool that department uses (Google Analytics). A data product is like a specific report from that tool (website sessions).
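To make the three levels concrete, here is a minimal sketch of what the three files might contain. The field names below (`owner`, `slo`, `storage`, and so on) are illustrative assumptions, not LakeLogic's actual schema; consult the Domain, System, and Data Product pages for the real keys.

```yaml
# _domain.yaml: one per department (hypothetical field names)
domain: marketing
owner: marketing-data-team
slo:
  freshness_hours: 24

# _system.yaml: one per source tool inside the domain
system: google_analytics
environments:
  prod:
    storage: azure

# entity_v1.0.yaml: one per table the system produces
entity: website_sessions
version: "1.0"
schema:
  - name: session_id
    type: string
```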
How Inheritance Works
Settings flow downward: configure once at the top, override only where needed.

```
Domain sets "alert the on-call team on failures"
    ↓ inherited by all systems in the domain
System sets "use Azure storage in production"
    ↓ inherited by all contracts in the system
Contract sets "this specific table needs SCD2 history tracking"
```
This means you write less YAML and get consistent governance across hundreds of data products.
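The inheritance chain above could look like this in YAML. Again, the key names are a sketch under assumed field names, not LakeLogic's confirmed schema; the point is that each level only declares what it adds or overrides.

```yaml
# _domain.yaml: applies to every system and contract in the domain
alerts:
  on_failure: notify-oncall

# _system.yaml: inherits the alert rule, adds storage for all its contracts
storage:
  prod: azure

# entity_v1.0.yaml: inherits alerts and storage, overrides only one thing
materialization:
  history: scd2
```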
Which Page Do I Need?
| I'm a... | I want to... | Go to |
|---|---|---|
| Data Lead | Set up domain ownership & SLOs | Domain Config |
| Platform Engineer | Configure storage & cloud environments | System Config |
| Data Engineer | Create my first data contract | Data Product Contracts |
| Data Engineer | Ingest files or tables | Ingestion |
| Data Engineer | Choose how to track "what's new" | Watermark Strategies |
| Data Engineer | Add transforms (rename, join, SQL) | Transformations |
| Data Engineer | Add quality checks | Quality |
| Data Engineer | Configure how data is written | Materialization |
| Data Modeller | Build SCD2 dimensions or fact tables | Dimensional Modeling |
| Data Engineer | Define schema, PII masking | Schema & Model |
| Data Lead | Set up alerting | Notifications |
| Compliance | Add GDPR / EU AI Act metadata | Compliance |
| Anyone | Understand versioning | Versioning |
| Power User | See everything in one file | Complete Template |
Why Data Mesh?
LakeLogic implements all four principles from ThoughtWorks' Data Mesh framework:
| Principle | What It Means (Plain English) | How LakeLogic Enables This |
|---|---|---|
| Domain Ownership | The people closest to the data own it | _domain.yaml names the owner, their contacts, and cost centre |
| Data as a Product | Treat each dataset like a product with quality guarantees | Each contract declares schema, quality rules, and SLOs |
| Self-Serve Platform | Give teams tools so they don't wait on a central team | Write YAML → run pipeline. No tickets, no handoffs |
| Federated Governance | Consistent rules without a bottleneck | Domain-level SLOs inherited automatically by every table |
Bottom line: LakeLogic lets each team own their data, while the platform ensures nobody ships garbage to production.