Classical Architecture of Data Warehouse
This is the classical data warehouse architecture I learned at the beginning of my career. And to this day, I like to model my data warehouse this way.
In this overview, we’ll delve into each layer of a complete Data Warehouse (DWH) architecture and explore why this modeling approach is effective:
# Staging Area
Staging Area: This initial layer serves as the landing point for data from various source systems.
- Here, data is stored in its original form, as delivered.
- The structure of the staging tables mirrors the source system interfaces, with no inter-table relationships.
- Each table holds data from the latest delivery, which is cleared before the next batch.
- Think of it like a grocery store’s loading dock, where suppliers (source systems) drop off goods (data) for temporary storage.
# Cleansing Area
Cleansing Area: Prior to integration into the Core, data undergoes cleaning in the Cleansing Area.
- This involves filtering out, correcting, or supplementing faulty data.
- Data from different sources is transformed and unified.
- Similar to the Staging Area, only the most recent data batch is kept here.
- A grocery store analogy would be the preparation area where goods are made ready for sale, undergoing quality control and labeling.
Core: Data from various sources converges in the Core, having passed through the Staging and Cleansing areas, and is stored long-term, often for years.
- The Core’s main function is integrating data from diverse sources and organizing it thematically.
- It’s the go-to source for Data Marts and should generally not be accessed directly by users.
# Data Mart
Data Mart: Marts store subsets of Core data, optimized for user queries.
- Each Mart caters to specific applications or user groups, simplifying query complexity and enhancing system usability.
- These can be likened to specialized market stalls in a grocery store, each offering a curated selection of goods.
Metadata: The foundation of the DWH system, metadata, is essential for its smooth operation.
- Business metadata includes descriptions of attributes, drill paths, and aggregation rules for front-end applications.
- Technical metadata covers data structures, mapping rules, and ETL control parameters.
- Operational metadata encompasses log tables, error messages, ETL process logs, and more, essentially forming the DWH’s infrastructure.
While not every Data Warehouse adheres strictly to this structure, with some areas merged or renamed, the essential concept is to segment the system for task specialization. This segmentation facilitates data cleaning, integration, historization, and query handling, simplifying the transformation processes between layers.