Medallion Architecture

Last updated Feb 23, 2025

The Databricks medallion architecture is less an architecture than an approach or pattern, with three data stages: bronze, silver, and gold.


Image Source by Jorrit Sandbrink

Data flows through the layers from dirty to clean, normalized to denormalized, and granular to aggregated. The gold layer often represents the final stage of this transformation.
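The progression described above can be sketched in a few lines of plain Python. This is only an illustration under assumed data: the `bronze` records and the helper names `to_silver` and `to_gold` are made up for this note and are not part of any Databricks API.

```python
from collections import defaultdict

# Hypothetical raw events as they might land in bronze:
# untyped strings, a duplicate, and one unparseable amount.
bronze = [
    {"order_id": "1", "country": " de ", "amount": "10.50"},
    {"order_id": "1", "country": " de ", "amount": "10.50"},
    {"order_id": "2", "country": "US", "amount": "7.25"},
    {"order_id": "3", "country": "us", "amount": "not-a-number"},
    {"order_id": "4", "country": "DE", "amount": "4.50"},
]


def to_silver(rows):
    """Dirty to clean: enforce types, normalize values, drop duplicates and bad rows."""
    seen, cleaned = set(), []
    for row in rows:
        try:
            record = {
                "order_id": int(row["order_id"]),
                "country": row["country"].strip().upper(),
                "amount": float(row["amount"]),
            }
        except (KeyError, ValueError):
            continue  # skip rows that cannot be parsed
        if record["order_id"] in seen:
            continue  # skip duplicates
        seen.add(record["order_id"])
        cleaned.append(record)
    return cleaned


def to_gold(rows):
    """Granular to aggregated: total order amount per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Bronze keeps everything as it arrived, silver holds one typed row per order, and gold holds one row per country, ready for reporting.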

# Benefits and Considerations

# Historical Context

The Medallion Architecture, created and announced by Databricks, can be seen as an evolution of the Classical Architecture of Data Warehouse, with its layers such as stage -> cleansing -> core -> mart, but optimized for data lakes and modern data processing needs.

Simon Whiteley has another great overview that combines the two and argues that every company has different requirements and, therefore, different layers. A medallion stage does not have to be a single layer; as the image shows, each stage can contain several:

Image source from Behind the Hype - The Medallion Architecture Doesn’t Work - YouTube

graph LR
    %% Input sources
    B[Batch] --> Bronze
    S[Streaming] --> Bronze

    %% Main flow
    subgraph Bronze[Bronze Layer]
        B1[Raw Integration<br/>Landing zone<br/>No schema needed]
    end

    subgraph Silver[Silver Layer]
        S1[Filtered, Cleaned<br/>Augmented<br/>Define & evolve schema]
    end

    subgraph Gold[Gold Layer]
        G1[Business-oriented<br/>Denormalized<br/>Clean data delivery]
    end

    subgraph Platinum[Platinum Layer]
        P1[Semantic Layer<br/>Aggregated<br/>Sub-seconds]
    end

    %% Connections between layers
    Bronze -->|cleaning| Silver
    Silver -->|organize| Gold
    Gold -->|curate| Platinum

    %% Output connections
    Platinum --> Excel[Excel]
    Platinum --> BI[BI]
    Platinum --> ML[ML/AI]
    Platinum --> Apps[Data Apps]

    %% Styling
    classDef default fill:#f9f9f9,stroke:#333,stroke-width:1px
    classDef platinum fill:#5f9ea0,color:white
    class Platinum platinum

# Implementation

Databricks provides tools like Delta Live Tables (DLT) that allow users to build data pipelines with Bronze, Silver, and Gold tables using minimal code. These pipelines can be built on Apache Spark Structured Streaming for real-time data processing.
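The declarative idea, where each table is a function that names its upstream table, can be illustrated without Databricks. The sketch below is a plain-Python stand-in and not the DLT or Spark API: `Pipeline`, `table`, `materialize`, and the sample events are hypothetical names and data invented for this note. It runs on a small in-memory batch and leaves out streaming entirely.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rows = List[dict]


class Pipeline:
    """Hypothetical mini-framework: each table is a function with one upstream table."""

    def __init__(self) -> None:
        # table name -> (upstream table name or None, transformation)
        self._tables: Dict[str, Tuple[Optional[str], Callable[[Rows], Rows]]] = {}

    def table(self, source: Optional[str] = None):
        """Register the decorated function as a table fed by `source`."""
        def register(fn: Callable[[Rows], Rows]):
            self._tables[fn.__name__] = (source, fn)
            return fn
        return register

    def materialize(self, name: str, raw: Rows) -> Rows:
        """Build `name` by first building everything upstream of it."""
        source, fn = self._tables[name]
        upstream = raw if source is None else self.materialize(source, raw)
        return fn(upstream)


pipeline = Pipeline()


@pipeline.table()
def bronze(raw: Rows) -> Rows:
    return list(raw)  # land the data unchanged


@pipeline.table(source="bronze")
def silver(rows: Rows) -> Rows:
    return [r for r in rows if r.get("user")]  # drop events without a user


@pipeline.table(source="silver")
def gold(rows: Rows) -> Rows:
    counts: Dict[str, int] = {}
    for r in rows:
        counts[r["user"]] = counts.get(r["user"], 0) + 1
    return [{"user": u, "events": n} for u, n in sorted(counts.items())]


events = [
    {"user": "ana", "action": "click"},
    {"user": None, "action": "view"},
    {"user": "ana", "action": "buy"},
    {"user": "bo", "action": "view"},
]
result = pipeline.materialize("gold", events)
print(result)
```

Asking for `gold` is enough: the pipeline works out that it needs `silver`, which needs `bronze`. That is the part a managed tool takes over, along with scheduling, incremental processing, and storage.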

# Variations

As with the Classical Architecture of Data Warehouse, Medallion Architectures can vary in their layers.


Origin: Iceberg + Spark + Trino + Dagster: modern, open-source data stack demo, by ZD (Dev Genius, Jul 2022)
References: Trivadis Data Warehouse Layers.
Created 2022-08-16