Search

Search IconIcon to open search

Classical Architecture of Data Warehouse

Last updated Dec 6, 2024

This is the classical data warehouse architecture I learned at the beginning of my career. And to this day, I like to model my data warehouse this way.


From Data Warehouse Blueprints: Business Intelligence in der Praxis : Schnider, Dani, Jordan, Claus, Welker, Peter, Wehner, Joachim: Amazon.de: Bücher, September 2016, created Trivadis. ^437a59

Other views, e.g. from the Medallion Architecture:

Image source from Behind the Hype - The Medallion Architecture Doesn’t Work - YouTube

A modern take with a Semantic Layer or OLAP Cubes, almost like the old days in the above classical architecture:

# Example with Orchestration and Modeling tools

What is your go-to modeling architecture? I like the one below; if it’s complex, I’d like to add cleansing between staging and core.

The example contains tools such as orchestration, modeling, data ingestion and Data Modeling Techniques such as SCD2 and Dimensional Modeling (Kimball):

flowchart TB
    subgraph DWH_Architecture["Modern Data Stack Architecture"]
        direction LR

        subgraph DataFlow["Data Flow"]
            direction LR
            subgraph Source["Data Source"]
                DB[(Source Postgres)]
                TableA[Table_A]
                TableAHistory[Table_A_History]
                WALLogs[WAL-Logs]
                
                DB --- TableA
                DB --- TableAHistory
                DB --- WALLogs
            end

            subgraph Integration["Data Integration (dlt)"]
                Arrow["Arrow (in-memory)"]
                Norm["normalization schema"]
                Load["load"]
                LoadTables["Adding Load-Tables"]
                
                Arrow --> Norm
                Norm --> Load
                Load --> LoadTables
            end

            subgraph Warehouse["Data Warehouse"]
                direction LR
                subgraph LZ["staging (stg)"]
                    Address["Address\n...\nload_date"]
                    AddressHist["Address_History\n...\nload_date"]
                    Car["Car\n...\nload_date"]
                    CarHist["Car_History\n...\nload_date"]
                end

                subgraph Core["core (historization)"]
                    AddressCore["Address\nvalid_from\nvalid_to"]
                    CarCore["Car\nvalid_from\nvalid_to"]
                    TheoryExam["Theory_Exam\ncreated_at"]
                    LicensePlate["License_Plate\nvalid_from\nvalid_to"]
                    MergeSCD2["Merge SCD2"]
                end

                subgraph Mart["data mart (Star Schema)"]
                    direction LR
                    subgraph Facts["Fact Tables"]
                        CarFact["fact_car\n- metrics\n- foreign keys"]
                        TheoryFact["fact_theory_exam\n- metrics\n- foreign keys"]
                    end
                    subgraph Dims["Dimension Tables"]
                        DimAddress["dim_address\n- attributes\n- surrogate key"]
                        DimLicense["dim_license_plate\n- attributes\n- surrogate key"]
                        DimDate["dim_date\n- date attributes\n- surrogate key"]
                    end
                end
            end
        subgraph Tools["Tools & Orchestration"]
            direction TB
            subgraph Orchestration["Dagster: Orchestration"]
                LoadDag["Load DAG"]
                ModelingDag["Modeling DAG"]
                LoadDag --> ModelingDag
            end
            subgraph Modeling["dbt: Data Modeling"]
                LZModels["Landing Zone Models"]
                CoreModels["Core Models"]
                MartModels["Mart Models"]
                LZModels --> CoreModels
                CoreModels --> MartModels
            end
        end
        end
    end

    %% Main data flow connections
    Source --> Integration
    Integration --> LZ
    Address --> AddressHist
    Car --> CarHist
    LZ --> MergeSCD2
    MergeSCD2 --> AddressCore
    MergeSCD2 --> CarCore
    MergeSCD2 --> TheoryExam
    MergeSCD2 --> LicensePlate
    AddressCore --> CarFact
    AddressCore --> TheoryFact
    CarCore --> CarFact
    LicensePlate --> CarFact
    TheoryExam --> TheoryFact

    %% Dimension relationships
    CarFact --- DimAddress
    CarFact --- DimLicense
    CarFact --- DimDate
    TheoryFact --- DimAddress
    TheoryFact --- DimDate

    %% Data Modeling Flow
    LZ -.->|"dbt run"| LZModels
    Core -.->|"dbt run"| CoreModels
    Mart -.->|"dbt run"| MartModels

    %% Orchestration
    LoadDag -.->|"triggers"| Integration
    ModelingDag -.->|"triggers"| Modeling


    %%classDef source fill:#1a202c,color:#ffffff      %% Dark blue-gray with white text
    %%classDef integration fill:#2d3748,color:#ffffff %% Slightly lighter blue-gray
    %%classDef warehouse fill:#2a4365,color:#ffffff   %% Navy blue
    %%classDef core fill:#2c5282,color:#ffffff        %% Medium blue
    %%classDef mart fill:#2b6cb0,color:#ffffff        %% Light blue
    %%classDef modeling fill:#1a365d,color:#ffffff    %% Dark blue
    %%classDef orchestration fill:#233876,color:#ffffff %% Royal blue

    class Source source
    class Integration integration
    class Warehouse warehouse
    class Core core
    class Mart mart
    class Modeling modeling
    class Orchestration orchestration

# Layers

In this overview, we’ll delve into each layer of a complete Data Warehouse (DWH) architecture and explore why this modeling approach is effective:

# Staging Area

Staging Area: This initial layer serves as the landing point for data from various source systems.

# Cleansing Area

Cleansing Area: Prior to integration into the Core, data undergoes cleaning in the Cleansing Area.

# Core

Core: Data from various sources converges in the Core, having passed through the Staging and Cleansing areas, and is stored long-term, often for years.

# Data Mart

Data Mart: Marts store subsets of Core data, optimized for user queries.

# Metatdata

Metadata: The foundation of the DWH system, metadata, is essential for its smooth operation.

While not every Data Warehouse adheres strictly to this structure, with some areas merged or renamed, the essential concept is to segment the system for task specialization. This segmentation facilitates data cleaning, integration, historization, and query handling, simplifying the transformation processes between layers.


Origin:
References: Medallion Architecture
Created 2023-04-27