Data Vault
A data vault is a Data Modeling approach for building data warehouses for enterprise-scale analytics. It uses three types of entities: hubs, links, and satellites.
Hubs represent core business concepts, identified by their business keys. Links represent relationships between hubs. Satellites store the descriptive attributes of hubs and links, together with the history of how those attributes change.
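The sketch below shows one hub, one link, and one satellite as Python record types, to make the three entity types concrete. The Customer/Order entities and the column names (`hk_*`, `load_dts`, `record_source`, `hash_diff`) are hypothetical conventions chosen for illustration. They are not a prescribed standard.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class HubCustomer:
    """Hub: one row per unique business key (hypothetical Customer concept)."""
    hk_customer: str      # hash of the business key
    customer_id: str      # business key as delivered by the source
    load_dts: datetime    # when the key was first loaded
    record_source: str    # which system the key came from

@dataclass(frozen=True)
class HubOrder:
    """Hub: one row per unique order number (hypothetical Order concept)."""
    hk_order: str
    order_id: str
    load_dts: datetime
    record_source: str

@dataclass(frozen=True)
class LinkCustomerOrder:
    """Link: one row per unique relationship between the two hubs."""
    hk_customer_order: str  # hash of the combined business keys
    hk_customer: str        # reference to HubCustomer
    hk_order: str           # reference to HubOrder
    load_dts: datetime
    record_source: str

@dataclass(frozen=True)
class SatCustomerDetails:
    """Satellite: descriptive attributes of a customer, one row per version."""
    hk_customer: str      # parent hub key
    load_dts: datetime    # part of the key, so every version is kept
    hash_diff: str        # hash of the attributes, used to detect changes
    name: str
    email: str
    record_source: str
```

Hubs and links hold only keys and load metadata. Everything descriptive lives in satellites, so changes to descriptive attributes never touch the hub or link rows.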
# Features
The Data Vault methodology is a flexible approach to managing Big Data and the changing set of sources that feed a Data Warehouse. Recently, there has been a significant shift towards using Data Vaults as governed Data Lakes. This shift addresses several key challenges in Data Warehousing:
- Adapting to changing business environments
- Handling massive data sets
- Reducing the complexities of Data Warehouse design
- Enhancing accessibility for business users by modeling close to the business domain
- Allowing seamless integration of new data sources without affecting the existing architecture
Every model is assembled from the same few patterns (hubs, links, and satellites). As a result, Data Warehouses built this way are easier to design, build, populate, and modify. The same regularity makes Data Warehouse Automation particularly beneficial.
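Data Warehouse Automation tools generally generate the repetitive Data Vault structures from metadata. The sketch below illustrates that idea by deriving hub DDL from a list of business keys. It assumes a hypothetical `hub_ddl` helper, a hypothetical naming convention (`hub_<name>`, `hk_<name>`, `load_dts`, `record_source`), and generic SQL types. It is not modeled on any particular tool.

```python
def hub_ddl(hub_name: str, business_keys: list[str]) -> str:
    """Generate CREATE TABLE DDL for a hub from metadata (hypothetical conventions)."""
    columns = [f"hk_{hub_name} CHAR(32) NOT NULL"]  # MD5 hash key as 32 hex characters
    columns += [f"{bk} VARCHAR(255) NOT NULL" for bk in business_keys]
    columns += ["load_dts TIMESTAMP NOT NULL", "record_source VARCHAR(100) NOT NULL"]
    body = ",\n    ".join(columns + [f"PRIMARY KEY (hk_{hub_name})"])
    return f"CREATE TABLE hub_{hub_name} (\n    {body}\n);"

if __name__ == "__main__":
    print(hub_ddl("customer", ["customer_id"]))
```

Similar generators for links and satellites follow the same pattern. Only the metadata changes from table to table, not the template.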
# Why Data Vault 2.0?
Data Vault 2.0 is positioned as a prescriptive, industry-standard methodology for turning raw data into actionable information that supports business outcomes. It offers a repeatable recipe for transforming raw data into the information the business finds most valuable.
Video about “Behind the Hype: Should you ever build a Data Vault in a Lakehouse?”
Data Vault is a write-optimized approach, as opposed to a snowflake schema, which is optimized for querying. (Video link)
# When to Use
- Managing numerous disparate data sources
- Accommodating frequent schema changes (DDL) in source OLTP databases
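A common way for a Data Vault to absorb a new source, or a new column in an existing source, is to attach an additional satellite to the existing hub rather than altering existing tables. The minimal sketch below works over table metadata only. It assumes a hypothetical `add_satellite` helper and the column conventions shown, with a hypothetical `loyalty_tier` column standing in for the schema change.

```python
# Hypothetical table metadata: table name -> ordered column list.
vault = {
    "hub_customer": ["hk_customer", "customer_id", "load_dts", "record_source"],
    "sat_customer_crm": ["hk_customer", "load_dts", "hash_diff",
                         "name", "email", "record_source"],
}

def add_satellite(vault: dict[str, list[str]], hub: str, sat_name: str,
                  attributes: list[str]) -> dict[str, list[str]]:
    """Return a new model with an extra satellite attached to an existing hub.

    Existing hubs, links, and satellites are left exactly as they were.
    """
    if hub not in vault:
        raise KeyError(f"unknown hub: {hub}")
    if sat_name in vault:
        raise ValueError(f"table already exists: {sat_name}")
    hub_key = vault[hub][0]  # in this sketch, the first hub column is the hash key
    new_sat = [hub_key, "load_dts", "hash_diff", *attributes, "record_source"]
    return {**vault, sat_name: new_sat}  # shallow copy; the original dict is untouched

# The source starts delivering a loyalty_tier column: model it as a new satellite.
vault_v2 = add_satellite(vault, "hub_customer", "sat_customer_loyalty", ["loyalty_tier"])
```

Because the hub and the existing satellite are unchanged, the load jobs and queries that already depend on them keep working.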
# Layers
- Landing Zone (LZN)
- Raw Data Vault (RDV)
- Business Data Vault (BDV)
- Universal Data Model (UDM)
# Difference between 1.0 and 2.0
Data Vault 1.0, introduced by Dan Linstedt in the early 2000s, established the core principles:
- Hub, Link, and Satellite structure
- Business keys in Hubs
- Relationships captured in Links
- Descriptive data in Satellites
- Focus on historical tracking and auditability
Data Vault 2.0, released around 2013, built upon 1.0 by adding:
- Integration with big data platforms and NoSQL databases
- Support for unstructured and semi-structured data
- Hash keys replacing sequence-based surrogate keys, improving load performance
- More emphasis on parallel loading and scalability
- Incorporation of virtualization concepts
- Methodologies for handling real-time data streams
- Introduction of point-in-time and bridge tables as first-class citizens
- More formal governance and documentation requirements
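In Data Vault 2.0, hash keys are computed deterministically from business keys. Hubs, links, and satellites can therefore be loaded in parallel, with no lookup of a sequence-generated surrogate key. The minimal sketch below assumes MD5, trimming and upper-casing as normalization, and `||` as the delimiter between composite key parts. Actual conventions vary between teams.

```python
import hashlib

def hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Compute a hash key from one or more business key parts.

    Normalization (trim + upper-case) and the '||' delimiter are assumed
    conventions for this sketch; each team standardizes its own rules.
    """
    normalized = delimiter.join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()

hub_hk = hash_key("C001")           # hub key for customer C001
link_hk = hash_key("C001", "O-42")  # link key for customer C001 placing order O-42
```

Any load job that sees the same business key computes the same hash key. This is what removes the dependency between hub loads and link or satellite loads.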
# Dan Linstedt vs. Hans Hultgren
Dan Linstedt is the original creator of the Data Vault methodology. His approach tends to be more focused on:
- Technical implementation details
- Performance optimization
- Strict adherence to core Data Vault principles
- Enterprise scalability
- Integration with modern data platforms
Hans Hultgren has been a significant contributor to Data Vault evolution, with his approach emphasizing:
- Business alignment and modeling practices
- Practical implementation guidance
- More flexible interpretation of some Data Vault rules
- Focus on teaching and making concepts accessible
- Integration with agile methodologies
# Raw vs. Business Vault
Raw Vault is the first layer where data is loaded from source systems, following strict Data Vault modeling principles:
- It maintains full history and auditability of source data
- Data is stored in its original form without business transformations
- Uses Hubs (unique business keys), Links (relationships), and Satellites (descriptive attributes)
- Focuses on capturing and preserving source data exactly as received
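Raw Vault satellites are typically loaded insert-only. A new row is added only when a business key's descriptive attributes differ from its latest stored version. The difference is detected by comparing a hash of the attributes, often called a hash diff. The minimal in-memory sketch below assumes the hypothetical row shape `{"hk", "load_dts", "hash_diff", "attrs", "record_source"}`. Attribute values are stored exactly as received, and trimming is applied only when computing the hash.

```python
import hashlib
from datetime import datetime

def hash_diff(attributes: dict[str, str]) -> str:
    """Hash of the descriptive attributes, taken in a fixed (sorted) column order."""
    payload = "||".join(str(attributes[col]).strip() for col in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def load_satellite(satellite: list[dict], staged: list[dict],
                   load_dts: datetime, source: str) -> int:
    """Insert-only satellite load; existing rows are never updated or deleted.

    staged rows have the shape {"hk": ..., "attrs": {...}}.
    Returns the number of rows inserted.
    """
    # Find the latest stored version per hash key.
    latest: dict[str, dict] = {}
    for row in satellite:
        current = latest.get(row["hk"])
        if current is None or row["load_dts"] > current["load_dts"]:
            latest[row["hk"]] = row

    inserted = 0
    for rec in staged:
        diff = hash_diff(rec["attrs"])
        prev = latest.get(rec["hk"])
        if prev is None or prev["hash_diff"] != diff:  # new key or changed attributes
            new_row = {"hk": rec["hk"], "load_dts": load_dts, "hash_diff": diff,
                       "attrs": dict(rec["attrs"]), "record_source": source}
            satellite.append(new_row)
            latest[rec["hk"]] = new_row
            inserted += 1
    return inserted
```

Reloading unchanged data inserts nothing. A changed attribute adds a new version alongside the old one, which preserves the full history and auditability described above.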
Business Vault serves as a transformation layer that:
- Can exist as a logical data model (for example, views) rather than as physical database tables
- Contains derived business rules and calculations
- Implements data quality rules and business definitions
- May combine data from multiple Raw Vault entities
- Creates business-friendly views and structures
- Can include Point-in-Time (PIT) and Bridge tables for easier querying
- Sometimes implements slowly changing dimensions (SCD) logic
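A Point-in-Time (PIT) table records which version of each satellite was valid for each hub key and snapshot date. This lets downstream queries join satellites with simple equality predicates. The minimal sketch below assumes satellites are given as hypothetical `(hash_key, load_dts)` pairs. In practice a PIT table would normally be built in SQL over the warehouse tables.

```python
from datetime import datetime

def build_pit(hub_keys: list[str],
              satellites: dict[str, list[tuple[str, datetime]]],
              snapshots: list[datetime]) -> list[dict]:
    """Build PIT rows: one per (hub key, snapshot date).

    Each row gets one column per satellite holding the load_dts of the
    version valid at the snapshot, or None if no version existed yet.
    """
    pit = []
    for hk in hub_keys:
        for snap in snapshots:
            row = {"hk": hk, "snapshot_dts": snap}
            for name, rows in satellites.items():
                # Latest satellite version loaded at or before the snapshot.
                valid = [ts for key, ts in rows if key == hk and ts <= snap]
                row[f"{name}_load_dts"] = max(valid) if valid else None
            pit.append(row)
    return pit
```

A query for a given snapshot then joins each satellite on `(hk, load_dts)` using the values stored in the PIT row, instead of computing "latest version before date X" in every query.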
Origin: Data Modeling Techniques
References: Dimensional Modeling