LakeLogic Contract Reference

What Is LakeLogic?

The short version: You describe your data in YAML. LakeLogic builds the pipeline.

Think of LakeLogic as a building blueprint system for data. Just as an architect draws blueprints that builders follow to construct a house, with exact specifications for every room, door, and electrical outlet, you write YAML contracts that LakeLogic follows to build data pipelines.

No construction crew (developers writing Spark code) needed. The blueprint IS the building instruction.
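To make "the blueprint IS the building instruction" concrete, here is a minimal sketch of what a contract could look like. Every key below is illustrative, not the actual LakeLogic schema; the real reference lives in the Data Product Contracts page.

```yaml
# Hypothetical minimal data product contract (illustrative keys only)
entity: orders
version: "1.0"
schema:
  columns:
    - name: order_id
      type: string
    - name: amount
      type: decimal(18,2)
quality:
  - rule: not_null      # pipeline fails if order_id is ever null
    column: order_id
materialization:
  mode: incremental      # only process new data on each run
```

From a file like this, LakeLogic generates the ingestion, validation, and write steps; no Spark code is written by hand.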


The Three Levels of Configuration

LakeLogic organizes your data estate like a company org chart:

๐Ÿข Domain (Marketing, Sales, Finance)
โ”‚   "Who owns this data?"
โ”‚   โ†’ _domain.yaml โ€” ownership, SLOs, contacts, alerts
โ”‚
โ”œโ”€โ”€ ๐Ÿญ System (Google Analytics, Salesforce, SAP)
โ”‚   "Where does this data come from?"
โ”‚   โ†’ _system.yaml โ€” storage, environments, settings
โ”‚
โ””โ”€โ”€ ๐Ÿ“„ Data Product (events, customers, orders)
    "What does this specific table look like?"
    โ†’ entity_v1.0.yaml โ€” schema, quality rules, transforms

Analogy: A domain is like a department (Marketing). A system is like a tool that department uses (Google Analytics). A data product is like a specific report from that tool (website sessions).
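The three levels map to three files on disk. The layout below is an illustrative sketch built from the Marketing / Google Analytics / sessions analogy; the field names are hypothetical, and the exact structure is defined in the config pages linked below.

```yaml
# marketing/_domain.yaml - hypothetical domain config (illustrative keys)
domain: marketing
owner: marketing-data-team@example.com
slo:
  freshness_hours: 24
alerts:
  on_failure: on-call-team
```

```yaml
# marketing/google_analytics/_system.yaml - hypothetical system config
system: google_analytics
storage:
  prod: abfss://lake@example.dfs.core.windows.net/marketing/ga
```

```yaml
# marketing/google_analytics/sessions_v1.0.yaml - hypothetical data product
entity: sessions
schema:
  columns:
    - {name: session_id, type: string}
    - {name: started_at, type: timestamp}
```

Each file only declares what is specific to its level; everything else flows down via inheritance, described next.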

How Inheritance Works

Settings flow downward. Configure once at the top and override only where needed:

Domain sets "alert the on-call team on failures"
  ↓ inherited by all systems in the domain
System sets "use Azure storage in production"
  ↓ inherited by all contracts in the system
Contract sets "this specific table needs SCD2 history tracking"

This means you write less YAML and get consistent governance across hundreds of data products.
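As a sketch of what an override could look like (the keys are hypothetical, not the actual contract schema): the domain declares a default once, and a single contract overrides only the one setting it needs.

```yaml
# _domain.yaml - hypothetical: this default reaches every table in the domain
defaults:
  alerts:
    on_failure: on-call-team
```

```yaml
# entity_v1.0.yaml - hypothetical: inherits the alert default unchanged,
# overrides only the history-tracking behaviour for this one table
materialization:
  history: scd2
```

The contract never repeats the alerting block; if the domain default changes, every inheriting table picks it up automatically.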


Which Page Do I Need?

| I'm a... | I want to... | Go to |
|---|---|---|
| Data Lead | Set up domain ownership & SLOs | Domain Config |
| Platform Engineer | Configure storage & cloud environments | System Config |
| Data Engineer | Create my first data contract | Data Product Contracts |
| Data Engineer | Ingest files or tables | Ingestion |
| Data Engineer | Choose how to track "what's new" | Watermark Strategies |
| Data Engineer | Add transforms (rename, join, SQL) | Transformations |
| Data Engineer | Add quality checks | Quality |
| Data Engineer | Configure how data is written | Materialization |
| Data Modeller | Build SCD2 dimensions or fact tables | Dimensional Modeling |
| Data Engineer | Define schema, PII masking | Schema & Model |
| Data Lead | Set up alerting | Notifications |
| Compliance | Add GDPR / EU AI Act metadata | Compliance |
| Anyone | Understand versioning | Versioning |
| Power User | See everything in one file | Complete Template |

Why Data Mesh?

LakeLogic implements all four principles from ThoughtWorks' Data Mesh framework:

| Principle | What It Means (Plain English) | How LakeLogic Enables This |
|---|---|---|
| Domain Ownership | The people closest to the data own it | _domain.yaml names the owner, their contacts, and cost centre |
| Data as a Product | Treat each dataset like a product with quality guarantees | Each contract declares schema, quality rules, and SLOs |
| Self-Serve Platform | Give teams tools so they don't wait on a central team | Write YAML → run pipeline. No tickets, no handoffs |
| Federated Governance | Consistent rules without a bottleneck | Domain-level SLOs inherited automatically by every table |

Bottom line: LakeLogic lets each team own their data, while the platform ensures nobody ships garbage to production.