🧠 Second Brain
Entity-Centric Data Modeling (ECM)
Entity-centric data modeling (ECM) is a term introduced by Maxime Beauchemin. It is a new approach to data modeling for analytics that elevates the core idea of an “entity” (e.g. user, customer, product, business unit, ad campaign) to the very top of the model.
Essentially, the entity-centric model pushes dimensional modeling further by enriching dimensions with metrics and data structures, combining Dimensional Modeling with Feature Engineering.
Max recommends Snapshotting over SCD2 for several reasons (see RW Introducing Entity-Centric Data Modeling for Analytics), mainly simplicity: the tradeoff is more storage, but dimensions are usually small. With ECM, he also wants to add metrics directly to the dimensions. For example, weekly active users can be queried from the customers dimension with `visits.7d >= 3`. Columns like `visits.7d` are updated on each snapshot, which makes them easy to query.
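A minimal sketch of that idea, using Python's built-in `sqlite3`. The table names, the sample data, and the `build_snapshot` and `weekly_active` helpers are assumptions made for illustration and do not come from the article. The column is spelled `visits_7d` here because a dot in a column name would need quoting in SQL.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (customer_id INTEGER, visit_date TEXT);
CREATE TABLE customer_snapshot (
    ds          TEXT,     -- snapshot date: one full copy of the dimension per day
    customer_id INTEGER,
    name        TEXT,
    visits_7d   INTEGER,  -- time-bound metric, recomputed on each snapshot
    PRIMARY KEY (ds, customer_id)
);
""")

# Hypothetical sample data.
customers = {1: "alice", 2: "bob"}
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, "2023-04-10"), (1, "2023-04-12"), (1, "2023-04-15"),
     (2, "2023-04-03"), (2, "2023-04-14")],
)

def build_snapshot(conn, ds):
    """Write one dimension snapshot for date `ds`, including the 7-day visit count."""
    window_start = (date.fromisoformat(ds) - timedelta(days=6)).isoformat()
    for customer_id, name in customers.items():
        (visit_count,) = conn.execute(
            "SELECT COUNT(*) FROM visits "
            "WHERE customer_id = ? AND visit_date BETWEEN ? AND ?",
            (customer_id, window_start, ds),
        ).fetchone()
        conn.execute(
            "INSERT INTO customer_snapshot VALUES (?, ?, ?, ?)",
            (ds, customer_id, name, visit_count),
        )

def weekly_active(conn, ds):
    """Point-in-time query: customers with at least 3 visits in the 7 days up to `ds`."""
    rows = conn.execute(
        "SELECT name FROM customer_snapshot "
        "WHERE ds = ? AND visits_7d >= 3 ORDER BY name",
        (ds,),
    )
    return [row[0] for row in rows]

for ds in ("2023-04-16", "2023-04-17"):
    build_snapshot(conn, ds)

print(weekly_active(conn, "2023-04-16"))  # ['alice']
print(weekly_active(conn, "2023-04-17"))  # []
```

Because every snapshot keeps its own value of the metric, asking the same question for a different day only changes the `ds` filter, with no SCD2-style validity ranges to join against.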
It’s not a Semantic Layer. Max wrote about this in “The Case for Dataset-Centric Visualization”, the idea being that visualizations/BI tools work best with simple tabular datasets as opposed to complex semantic layers or sets of more normalized datasets.
# Highlights from the Article
Key Points from RW Introducing Entity-Centric Data Modeling for Analytics by Max:
- Anchors on entities and brings metrics into dimensions
- Assumes familiarity with dimensional modeling, feature engineering, normalization/denormalization, and analytics
- ECM aligns with people’s mental model of data and tabular datasets
- Enriching important entities with metrics and data structures
- ECM addresses multi-factual analysis of entities
- Simplifies complex queries for segmentation, cohort creation, and complex classification
- Entity-centric model enriched with metrics and data structures
- Feature engineering involves denormalizing facets of an entity
- Techniques: time-bound metrics, dimensional snapshot, complex data structures
- Snapshotting dimensions for easier management, point-in-time querying, and time-series analysis
- Entity-centric datasets allow for intuitive querying and clear guarantees
- Challenges: circular dependencies in Directed Acyclic Graphs (DAGs), wide tables
- Solutions: vertical partitioning, logical vertical partitioning, using views
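
The last two points can be sketched as follows. This is one possible reading of vertical partitioning combined with views, not the article's implementation: the wide entity table is split into narrower tables by column group, and a view stitches them back together for consumers. All table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each narrow table holds one group of columns and could be built by its own task.
CREATE TABLE customer_core (
    ds TEXT, customer_id INTEGER, name TEXT, country TEXT,
    PRIMARY KEY (ds, customer_id)
);
CREATE TABLE customer_engagement (
    ds TEXT, customer_id INTEGER, visits_7d INTEGER, visits_28d INTEGER,
    PRIMARY KEY (ds, customer_id)
);

-- The view presents a single wide entity table to BI tools and analysts.
CREATE VIEW customer AS
SELECT c.ds, c.customer_id, c.name, c.country, e.visits_7d, e.visits_28d
FROM customer_core AS c
LEFT JOIN customer_engagement AS e
  ON e.ds = c.ds AND e.customer_id = c.customer_id;
""")

# Hypothetical sample data: bob has no engagement row for this snapshot.
conn.executemany(
    "INSERT INTO customer_core VALUES (?, ?, ?, ?)",
    [("2023-04-17", 1, "alice", "CH"), ("2023-04-17", 2, "bob", "US")],
)
conn.executemany(
    "INSERT INTO customer_engagement VALUES (?, ?, ?, ?)",
    [("2023-04-17", 1, 4, 9)],
)

rows = conn.execute(
    "SELECT name, visits_7d FROM customer WHERE ds = ? ORDER BY customer_id",
    ("2023-04-17",),
).fetchall()
print(rows)  # [('alice', 4), ('bob', None)]
```

The `LEFT JOIN` keeps every entity visible even when one column group has not been populated yet. Whether such a split also removes a given circular dependency depends on how the pipeline tasks are arranged.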
Origin: Introducing Entity-Centric Data Modeling for Analytics
Created 2023-04-17