🧠 Second Brain


Search IconIcon to open search

One Big Table (OBT)

Last updated Mar 13, 2024

And extension of Kimball, suitable for long-term analysis. Essentially a denormalized table, akin to what is found in a Data Mart. And also synonym with a Wide Table.

Data Modeling in the Modern Data Stack - YouTube

On Normalization vs Denormalization, One Big Table is very much on the right with most denormalization.

The OBT technique is nothing new; at the start of my career, we built data marts with materialized views and used them in similar ways to OBT today.

They are very similar to initially discussed MVs and to data martsWide TablesSuper TableSnapshotting, or just any denormalized table.

# History

The term “One Big Table” (OBT) doesn’t seem to have a single point of origin. This means it either came up from the natural language to make one big table, therefore also the close connection to existing terms, or someone tried to create a new marketing term. I initially found the first appearance in 2021 and later in 2022.

But it might have come from BigQuery with its insufficient joining capabilities. Google suggests people put everything in one big table[^note1]:

“February 2028, Josh Andrews, dbt Slack
I haven’t looked at Bigquery in a while. Still, when I spoke to the Google team a couple of years ago about building a “true” data warehouse with facts and dimensions and asked for their recommendations about data modeling, they said essentially, “Just denormalize everything and put it in one big table”. Also see what Bora Beran has to say, Bigtable Product Lead at Google. 

From there, it might have been a common term used in the community for the denormalized or wide table or even “Super Table”. On the other hand, Super Table has been announced and will probably be introduced here in 2022. If you have more information about the origin, I’d be happy to hear from you.

# Pros and Cons

One Big Table refers to a facts table that includes most of its dimensional attributes as additional columns.

# Pros:

  1. Simplifies querying by reducing the need for multiple joins.
  2. Enhances query response time, notably with column encoding and without joins.

# Cons:

  1. Challenges arise in managing changing dimensions within OBT.
  2. Decisions between using the latest dimension values or vs. point-in-time values.
  3. Adding new dimensional attributes requires backfilling (not if using Apache Iceberg, etc)
    Source: Joseph M. on LinkedIn

Reply by Jonathan Neo:

Regarding one of the cons: 

Need help managing changing dimensions in OBT. 

If you’re creating an OBT on top of a Dimensional Model and you have created a fact and dimension with SCD Type 2 handling, then the OBT will inherit SCD Type 2 properties. 

This is why I prefer to build OBT models on top of dimensional models as it facilitates D.R.Y patterns, and reuse of dimension tables. 

Personally, I reserve OBT for tools that require a flat dataset (e.g. a CSV export, or a BI tool that does not support relationships like Preset). If the BI tool supports relationships (e.g. Power BI, Tableau), then I would directly consume the dimensional model.

# My Comments

The One Big Table approach, or OBT, represents a shift towards wide, denormalized models. The core ideas here are that storage is inexpensive, allowing for broad tables, and modern cloud databases have such robust computational capabilities that they can efficiently handle these extensive models. This approach streamlines the process, moving data directly from staging to the final reporting mart without intermediate layers.

However, this efficiency can come at a cost. Complex queries and extensive business logic can drive up computational expenses, particularly in large-scale enterprises. Additionally, this model can lead to data redundancy and an over-reliance on source systems, limiting flexibility in data modeling. It’s crucial to strategize the modeling of each layer to avoid data overload and maintain control over your pipelines.

In summary, while OBT offers quick setup and operational simplicity, it requires careful consideration of computational costs and strategic data modeling to avoid scalability issues and data management complexities.

Explore more in Data Modeling and Materialized Views vs. One Big Table (OBT) vs. dbt tables vs. Traditional OLAP vs. DWA - 📖 Data Engineering Design Patterns (DEDP).

# References

Origin: Data Modeling in the Modern Data Stack - YouTube, Star Schema vs. OBT for Data Warehouse Performance | Blog | Fivetran
Created 2022-10-14