🧠 Second Brain


Data Modeling

Last updated Nov 21, 2024

Data modeling has changed; when I started (~20 years ago), choosing between Inmon and Kimball was common.

Today, in data engineering, data modeling means creating a structured representation of your organization’s data. Often illustrated visually, this representation helps you understand the relationships, constraints, and patterns within the data, and serves as a blueprint for designing data systems that deliver business value, such as data warehouses, data lakes, or any analytics solution.

In its most straightforward form, data modeling is how we design the flow of our data so that it moves efficiently and in a structured way, with good data quality and as little redundancy as possible.

Abstract

Data modeling and design are key to creating usable data systems. Design happens at different levels: source database design, data integration, ETL processes, data warehouse schema creation, data lake structuring, the presentation layer of BI tools, and feature engineering for machine learning or AI.

More detailed design involves moving from a Logical Data Model to a Physical Data Model: defining tables with dimensions and facts, partitioning and indices, relationships, constraints, and so on. The architect also needs to be aware of redundancy and denormalization, the grain of each table, and optimizing the data for the queries of dashboards, ad-hoc notebooks, or data apps, while ensuring data quality throughout the process.
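To make the logical-to-physical step concrete, here is a minimal sketch of a physical star-schema model using Python's built-in sqlite3 as a stand-in warehouse. The table and column names (`dim_product`, `fct_sales`, etc.) are illustrative, not from any real system; the point is the declared grain, the fact/dimension split, and the constraints.

```python
import sqlite3

# In-memory database standing in for a warehouse.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Dimension table: descriptive attributes at product grain.
cur.execute("""
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category TEXT NOT NULL
    )
""")

# Fact table: one row per product per day (the declared grain),
# with a foreign-key relationship and NOT NULL constraints.
cur.execute("""
    CREATE TABLE fct_sales (
        date_key TEXT NOT NULL,
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        units_sold INTEGER NOT NULL,
        revenue REAL NOT NULL
    )
""")

cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
cur.execute("INSERT INTO fct_sales VALUES ('2024-01-01', 1, 3, 29.97)")

# A typical dashboard query: aggregate the fact, label via the dimension.
row = cur.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fct_sales f JOIN dim_product p USING (product_key)
    GROUP BY p.category
""").fetchone()
print(row)
```

In a real warehouse you would additionally decide on partitioning and indices per table; SQLite is used here only because it ships with Python.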

# Different Levels

What do you think about different levels of modeling? Generally, when I started (20 years ago), choosing between Inmon and Kimball was common. But today, there are so many layers, levels, and approaches. Have you found a good way of separating or naming the different “levels” (I’m still unsure about the term “levels”) to clarify what is meant? Below is a list of what I think so far (I also wrote extensively about it, in case of interest).

LinkedIn Post and Discussion, X/Twitter and dbt Slack. Links (from the post): Data Model Matrix.

# Different Data Modeling Techniques

See Data Modeling Techniques

# (Design) Patterns

Common approaches are well explained here:


# Data Modeling is changing

Data modeling is as much about Data Engineering Architecture as it is about modeling the data itself. Therefore, besides the links below, you can find many approaches and common architectures in Data Engineering Architecture.

It’s becoming more about language than about modeling itself, as Shane Gibson says on Making Data Modeling Accessible. For example, a data scientist speaks in Wide Tables, while a data engineer talks about facts and dimensions; this is what I call the different levels of data modeling.
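The wide-table vs. facts-and-dimensions vocabulary can be sketched with plain Python dicts. This is a toy illustration with made-up names (`dim_product`, `fact_sale`, the `widen` helper): the same sale expressed in the narrow, normalized shape a data engineer maintains, and in the denormalized wide row a data scientist might consume.

```python
# Star-schema style: a small dimension plus a narrow fact row.
dim_product = {101: {"name": "Widget", "category": "Hardware"}}
fact_sale = {"date": "2024-01-01", "product_key": 101, "revenue": 29.97}

def widen(fact, dims):
    """Join dimension attributes onto the fact row (denormalization)."""
    attrs = dims[fact["product_key"]]
    return {**{k: v for k, v in fact.items() if k != "product_key"}, **attrs}

# "Wide table" style: dimension attributes flattened onto the row.
wide_row = widen(fact_sale, dim_product)
print(wide_row)
# {'date': '2024-01-01', 'revenue': 29.97, 'name': 'Widget', 'category': 'Hardware'}
```

Both shapes describe the same data; they differ in who they are optimized for, which is exactly the "different levels" point.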

See more on Data Modeling is changing.

# Data Modeling Tools

See Data Modeling Tools.

# Integrating data modeling into data platform tools

Mostly, modeling happens outside the data platform (the Logical Data Model). I’d argue you should model on paper, in Excalidraw, or in sophisticated modeling tools (see more on Data Modeling: Architecture Pattern, Tools and the Future (Part 3)) before integrating it into any tool.

The next layer (the Physical Data Model) would be dbt, the tool I’d use to implement the modeled architecture. It’s SQL, which everybody understands, and you get documentation out of the box. Integrated into Dagster, you get a high-level data-flow model from tables to data assets.

For more on logical vs. physical models, see Conceptual, Logical to physical Data Models.

Tweet

# Data Modeling Languages

See Data Modeling Languages, or also on Data Model Engines.

# Data Modeling Frameworks

See Data Modeling Frameworks

# Differences to Dimensional Modeling

See Data Modeling – The Unsung Hero of Data Engineering- Modeling Techniques and Data Architecture Patterns (Part 2).

There is more than dimensional modeling:

# Data Modeling part of Data Engineering?

Data modeling, especially Dimensional Modeling with its definition of facts and dimensions, is a big part of a data engineer’s job, IMO. You need to ask vital questions to optimize for data consumers: Do users want to drill down into the different products? Is daily or monthly granularity enough? Keywords: Granularity and rollup.
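The granularity question can be sketched in a few lines of Python (the data is invented for illustration): rolling daily-grain rows up to monthly grain gives fewer rows and faster queries, but once rolled up you can no longer answer day-level questions from the result.

```python
from collections import defaultdict

# Daily-grain sales rows (toy data): (date, product, units).
daily = [
    ("2024-01-01", "Widget", 10),
    ("2024-01-15", "Widget", 5),
    ("2024-02-03", "Widget", 7),
    ("2024-01-02", "Gadget", 2),
]

# Roll up to monthly grain: coarser granularity, fewer rows.
monthly = defaultdict(int)
for date, product, units in daily:
    month = date[:7]  # "YYYY-MM"
    monthly[(month, product)] += units

print(dict(monthly))
# {('2024-01', 'Widget'): 15, ('2024-02', 'Widget'): 7, ('2024-01', 'Gadget'): 2}
```

Choosing the grain up front is the key modeling decision here: you can always roll a fine grain up, but you cannot drill a coarse grain back down.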

It also makes you consider the Big-O implications of how often you touch and transfer data. I’d recommend the classic The Data Warehouse Toolkit by Ralph Kimball, which introduced many of these concepts and is still applicable today. Mostly it’s not done at the beginning, but as soon as you grow, you wish you had done more :)



# Follow-Up Blog Series

I wrote an extensive three-part series about data modeling; check it out on my blog:

  1. An Introduction to Data Modeling
  2. Modeling Approaches and Techniques
  3. Architecture Pattern, Tools and the Future

Origin:
References: enterprise architecture modeling
Created 2022-09-24