# Classical Architecture of a Data Warehouse
This is the classical data warehouse architecture I learned at the beginning of my career, and to this day I like to model my data warehouses this way.
Going from Staging → Cleansing → Core → Marts:

From *Data Warehouse Blueprints: Business Intelligence in der Praxis* by Dani Schnider, Claus Jordan, Peter Welker, and Joachim Wehner (Trivadis), September 2016. ^437a59
# Other Views - All the same?
I'm almost convinced it doesn't matter which model or playbook you follow, as long as you apply it consistently and everyone follows the same one. Having dbt's project structure, the Kimball DW/BI Lifecycle Methodology, or another documented approach as the shared goal can make everything much simpler.
Below are a few more views worth following.
# Medallion Architecture
Other views, e.g. from the Medallion Architecture:

Image source: Behind the Hype - The Medallion Architecture Doesn’t Work - YouTube.
A modern take with a Semantic Layer or OLAP Cubes, almost like the old days in the classical architecture above:

# Azure
Same architecture, different names. In the Azure world, the recommendation is raw → conformed (cleaned) → enriched → curated, based on Data lake zones and containers - Cloud Adoption Framework | Microsoft Learn. Thanks to Dominik Hron.
# dbt
dbt uses staging → intermediate → marts; see their definitions in How we structure our dbt projects.
# Example with Orchestration and Modeling tools
What is your go-to modeling architecture? I like the one below; for complex projects, I’d add a cleansing layer between staging and core.
# Modern Data Architecture and Orchestration
The example includes orchestration, modeling, and data ingestion tools, as well as data modeling techniques such as SCD2 and dimensional modeling (Kimball):
```mermaid
flowchart TB
subgraph DWH_Architecture["Modern Data Stack Architecture"]
direction LR
subgraph DataFlow["Data Flow"]
direction LR
subgraph Source["Data Source"]
DB[(Source Postgres)]
TableA[Table_A]
TableAHistory[Table_A_History]
WALLogs[WAL-Logs]
DB --- TableA
DB --- TableAHistory
DB --- WALLogs
end
subgraph Integration["Data Integration (dlt)"]
Arrow["Arrow (in-memory)"]
Norm["normalization schema"]
Load["load"]
LoadTables["Adding Load-Tables"]
Arrow --> Norm
Norm --> Load
Load --> LoadTables
end
subgraph Warehouse["Data Warehouse"]
direction LR
subgraph LZ["staging (stg)"]
Address["Address\n...\nload_date"]
AddressHist["Address_History\n...\nload_date"]
Car["Car\n...\nload_date"]
CarHist["Car_History\n...\nload_date"]
end
subgraph Core["core (historization)"]
AddressCore["Address\nvalid_from\nvalid_to"]
CarCore["Car\nvalid_from\nvalid_to"]
TheoryExam["Theory_Exam\ncreated_at"]
LicensePlate["License_Plate\nvalid_from\nvalid_to"]
MergeSCD2["Merge SCD2"]
end
subgraph Mart["data mart (Star Schema)"]
direction LR
subgraph Facts["Fact Tables"]
CarFact["fact_car\n- metrics\n- foreign keys"]
TheoryFact["fact_theory_exam\n- metrics\n- foreign keys"]
end
subgraph Dims["Dimension Tables"]
DimAddress["dim_address\n- attributes\n- surrogate key"]
DimLicense["dim_license_plate\n- attributes\n- surrogate key"]
DimDate["dim_date\n- date attributes\n- surrogate key"]
end
end
end
subgraph Tools["Tools & Orchestration"]
direction TB
subgraph Orchestration["Orchestration"]
LoadDag["Load DAG"]
ModelingDag["Modeling DAG"]
LoadDag --> ModelingDag
end
subgraph Modeling["dbt: Data Modeling"]
LZModels["Landing Zone Models"]
CoreModels["Core Models"]
MartModels["Mart Models"]
LZModels --> CoreModels
CoreModels --> MartModels
end
end
end
end
%% Main data flow connections
Source --> Integration
Integration --> LZ
Address --> AddressHist
Car --> CarHist
LZ --> MergeSCD2
MergeSCD2 --> AddressCore
MergeSCD2 --> CarCore
MergeSCD2 --> TheoryExam
MergeSCD2 --> LicensePlate
AddressCore --> CarFact
AddressCore --> TheoryFact
CarCore --> CarFact
LicensePlate --> CarFact
TheoryExam --> TheoryFact
%% Dimension relationships
CarFact --- DimAddress
CarFact --- DimLicense
CarFact --- DimDate
TheoryFact --- DimAddress
TheoryFact --- DimDate
%% Data Modeling Flow
LZ -.->|"dbt run"| LZModels
Core -.->|"dbt run"| CoreModels
Mart -.->|"dbt run"| MartModels
%% Orchestration
LoadDag -.->|"triggers"| Integration
ModelingDag -.->|"triggers"| Modeling
%%classDef source fill:#1a202c,color:#ffffff
%%classDef integration fill:#2d3748,color:#ffffff
%%classDef warehouse fill:#2a4365,color:#ffffff
%%classDef core fill:#2c5282,color:#ffffff
%%classDef mart fill:#2b6cb0,color:#ffffff
%%classDef modeling fill:#1a365d,color:#ffffff
%%classDef orchestration fill:#233876,color:#ffffff
class Source source
class Integration integration
class Warehouse warehouse
class Core core
class Mart mart
class Modeling modeling
class Orchestration orchestration
```
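The "Merge SCD2" step in the diagram can be sketched in plain Python. This is a minimal illustration, not production code: the `merge_scd2` helper, the in-memory row dicts, and the open-ended `valid_to` convention are all assumptions of this sketch, not part of any specific tool.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # convention for "currently valid" rows

def merge_scd2(core_rows, staging_rows, key, load_date):
    """Merge a staging batch into an SCD2 core table (toy sketch).

    core_rows:    list of dicts with valid_from/valid_to columns
    staging_rows: list of dicts (latest delivery, one row per key)
    key:          name of the business-key column
    """
    incoming = {row[key]: row for row in staging_rows}
    merged = []
    for row in core_rows:
        if row["valid_to"] == OPEN_END and row[key] in incoming:
            new = incoming[row[key]]
            changed = any(row[col] != new[col] for col in new if col != key)
            if changed:
                # close the currently valid version of this key
                merged.append({**row, "valid_to": load_date})
                continue
            # unchanged: keep the open row, drop the staging duplicate
            incoming.pop(row[key])
        merged.append(row)
    # open a new version for changed or brand-new keys
    for new in incoming.values():
        merged.append({**new, "valid_from": load_date, "valid_to": OPEN_END})
    return merged

core = [{"id": 1, "city": "Bern",
         "valid_from": date(2020, 1, 1), "valid_to": OPEN_END}]
batch = [{"id": 1, "city": "Zurich"}, {"id": 2, "city": "Basel"}]
result = merge_scd2(core, batch, key="id", load_date=date(2024, 1, 1))
# the Bern version is closed; Zurich and Basel become the current versions
```

In a real warehouse this is typically a `MERGE` statement or a dbt snapshot rather than row-by-row Python, but the validity-window logic is the same.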

# Layers
In this overview, we’ll delve into each layer of a complete Data Warehouse (DWH) architecture and explore why this modeling approach is effective:
# Staging Area
Staging Area: This initial layer serves as the landing point for data from various source systems.
- Here, data is stored in its original form, as delivered.
- The structure of the staging tables mirrors the source system interfaces, with no inter-table relationships.
- Each table holds data from the latest delivery, which is cleared before the next batch.
- Think of it like a grocery store’s loading dock, where suppliers (source systems) drop off goods (data) for temporary storage.
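The truncate-and-load behavior described above can be sketched as follows; the `load_staging` helper, the table names, and the `load_date` column are illustrative assumptions:

```python
from datetime import date

def load_staging(staging, delivery, load_date):
    """Truncate-and-load sketch: the staging table keeps only the
    latest delivery, each row stamped with its load date."""
    staging.clear()  # the previous batch is cleared before the new one
    for row in delivery:
        staging.append({**row, "load_date": load_date})
    return staging

# previous batch is discarded, only the latest delivery remains
stg_address = [{"id": 7, "city": "Old", "load_date": date(2023, 1, 1)}]
load_staging(stg_address, [{"id": 7, "city": "Bern"}], date(2024, 5, 1))
```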
# Cleansing Area
Cleansing Area: Prior to integration into the Core, data undergoes cleaning in the Cleansing Area.
- This involves filtering out, correcting, or supplementing faulty data.
- Data from different sources is transformed and unified.
- Similar to the Staging Area, only the most recent data batch is kept here.
- A grocery store analogy would be the preparation area where goods are made ready for sale, undergoing quality control and labeling.
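The three cleansing activities (filtering out, supplementing, and unifying) can be sketched like this; the specific rules, columns, and the `country_map` lookup are made-up examples, not rules from the book:

```python
def cleanse(rows, country_map):
    """Cleansing sketch: drop rows missing the business key,
    default a missing field, and unify country codes across sources."""
    cleaned = []
    for row in rows:
        if row.get("id") is None:  # filter out faulty rows
            continue
        row = dict(row)
        row["city"] = row.get("city") or "UNKNOWN"  # supplement missing data
        # unify: map source-specific country names onto one code set
        row["country"] = country_map.get(row["country"], row["country"])
        cleaned.append(row)
    return cleaned

raw = [{"id": 1, "city": "Bern", "country": "Switzerland"},
       {"id": None, "city": "?", "country": "CH"},   # faulty: no key
       {"id": 2, "city": "", "country": "CH"}]
clean = cleanse(raw, {"Switzerland": "CH"})
# two rows survive, both with the unified country code "CH"
```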
# Core
Core: Data from various sources converges in the Core, having passed through the Staging and Cleansing areas, and is stored long-term, often for years.
- The Core’s main function is integrating data from diverse sources and organizing it thematically.
- It’s the go-to source for Data Marts and should generally not be accessed directly by users.
# Data Mart
Data Mart: Marts store subsets of Core data, optimized for user queries.
- Each Mart caters to specific applications or user groups, simplifying query complexity and enhancing system usability.
- These can be likened to specialized market stalls in a grocery store, each offering a curated selection of goods.
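The point of the star schema in a mart is that queries resolve a fact's surrogate keys against small dimension tables. A toy sketch (the `star_join` helper and the `date_sk` surrogate-key naming are assumptions for illustration):

```python
def star_join(fact_rows, dim, dim_key, prefix):
    """Resolve a fact's surrogate key against one dimension table."""
    lookup = {d[dim_key]: d for d in dim}  # surrogate key -> dimension row
    out = []
    for f in fact_rows:
        d = lookup[f[dim_key]]
        enriched = dict(f)
        for col, val in d.items():
            if col != dim_key:
                enriched[f"{prefix}_{col}"] = val  # prefix dimension attributes
        out.append(enriched)
    return out

dim_date = [{"date_sk": 20240101, "year": 2024, "month": 1}]
fact_car = [{"car_count": 3, "date_sk": 20240101}]
report = star_join(fact_car, dim_date, "date_sk", "date")
# → [{"car_count": 3, "date_sk": 20240101, "date_year": 2024, "date_month": 1}]
```

In SQL this is simply a join from the fact table to each dimension on the surrogate key; the sketch just makes the key-lookup mechanics explicit.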
# Metadata
Metadata: The foundation of the DWH system, metadata is essential for its smooth operation.
- Business metadata includes descriptions of attributes, drill paths, and aggregation rules for front-end applications.
- Technical metadata covers data structures, mapping rules, and ETL control parameters.
- Operational metadata encompasses log tables, error messages, ETL process logs, and more, essentially forming the DWH’s infrastructure.
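A tiny sketch of operational metadata, the kind of run log the last bullet describes; the record layout and names are invented for illustration:

```python
from datetime import datetime, timezone

def log_etl_run(run_log, job, status, rows_loaded):
    """Append one ETL run record to an operational-metadata log table."""
    run_log.append({
        "job": job,
        "status": status,
        "rows_loaded": rows_loaded,
        "logged_at": datetime.now(timezone.utc),
    })

etl_log = []
log_etl_run(etl_log, "stg_address_load", "success", 1250)
```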
While not every Data Warehouse adheres strictly to this structure, with some areas merged or renamed, the essential concept is to segment the system for task specialization. This segmentation facilitates data cleaning, integration, historization, and query handling, simplifying the transformation processes between layers.
Origin:
References: Medallion Architecture, One Big Table (OBT), Data Modeling
Created 2023-04-27