Entity-Centric Data Modeling (ECM)
Entity-centric data modeling (ECM) is a term introduced by Maxime Beauchemin. It is a new approach to data modeling for analytics that elevates the core idea of an “entity” (e.g. user, customer, product, business unit, ad campaign) to the very top of the model.
Max recommends snapshotting over SCD2 for several reasons (see Introducing Entity-Centric Data Modeling for Analytics), mainly simplicity: the tradeoff is extra storage, but dimensions are usually small. With ECM, he also wants to add metrics directly to the dimensions. For example, a `weekly_active_users` segment of customers can be queried with `visits.7d >= 3`. Columns like `visits.7d` are updated on each snapshot, which makes them easy to query.
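To make the idea concrete, here is a minimal sketch of a snapshotted customer dimension enriched with a `visits.7d` column, using Python's built-in `sqlite3`. The table names (`customer`, `visit`, `customer_snapshot`), the `ds` snapshot-date column, the sample rows, and the helper functions are all assumptions made for illustration; they are not taken from Max's article.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical sample data, for illustration only.
CUSTOMERS = [(1, "Ada"), (2, "Grace"), (3, "Linus")]
VISITS = [
    (1, "2024-01-01"), (1, "2024-01-03"), (1, "2024-01-05"),
    (2, "2024-01-04"),
    (3, "2023-12-20"), (3, "2024-01-02"), (3, "2024-01-06"), (3, "2024-01-07"),
]


def make_demo_db():
    """Create an in-memory database with a customer dimension and a visit fact."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE visit (customer_id INTEGER, visit_date TEXT);
        -- One full copy of the dimension per snapshot date (ds).
        CREATE TABLE customer_snapshot (
            ds TEXT, customer_id INTEGER, name TEXT, "visits.7d" INTEGER,
            PRIMARY KEY (ds, customer_id)
        );
        """
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO visit VALUES (?, ?)", VISITS)
    return conn


def build_snapshot(conn, ds):
    """Append one snapshot of the dimension, enriched with a 7-day visit count."""
    # The window covers the snapshot date and the six days before it.
    window_start = (date.fromisoformat(ds) - timedelta(days=6)).isoformat()
    conn.execute(
        """
        INSERT INTO customer_snapshot (ds, customer_id, name, "visits.7d")
        SELECT ?, c.customer_id, c.name, COUNT(v.visit_date)
        FROM customer c
        LEFT JOIN visit v
          ON v.customer_id = c.customer_id
         AND v.visit_date BETWEEN ? AND ?
        GROUP BY c.customer_id, c.name
        """,
        (ds, window_start, ds),
    )


def weekly_active_users(conn, ds):
    """Return names of customers with at least 3 visits in the 7 days up to ds."""
    rows = conn.execute(
        'SELECT name FROM customer_snapshot '
        'WHERE ds = ? AND "visits.7d" >= 3 ORDER BY name',
        (ds,),
    )
    return [name for (name,) in rows]


conn = make_demo_db()
for ds in ("2024-01-05", "2024-01-07"):
    build_snapshot(conn, ds)

print(weekly_active_users(conn, "2024-01-05"))  # ['Ada']
print(weekly_active_users(conn, "2024-01-07"))  # ['Ada', 'Linus']
```

Because every snapshot carries its own `visits.7d`, the segment filter needs no join against the fact table, and changing `ds` gives a point-in-time answer.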
ECM is not a Semantic Layer. Max wrote about this in “The Case for Dataset-Centric Visualization”, the idea being that visualizations/BI tools work best with simple tabular datasets as opposed to complex semantic layers or sets of more normalized datasets.
# Highlights from the Article
Key Points from Introducing Entity-Centric Data Modeling for Analytics by Max:
- Anchors on entities and brings metrics into dimensions
- Assumes familiarity with dimensional modeling, feature engineering, normalization/denormalization, and analytics
- ECM aligns with people’s mental model of data and tabular datasets
- Enriching important entities with metrics and data structures
- ECM addresses multi-factual analysis of entities
- Simplifies complex queries for segmentation, cohort creation, and complex classification
- Entity-centric model enriched with metrics and data structures
- Feature engineering involves denormalizing facets of an entity
- Techniques: time-bound metrics, dimensional snapshot, complex data structures
- Snapshotting dimensions for easier management, point-in-time querying, and time-series analysis
- Entity-centric datasets allow for intuitive querying and clear guarantees
- Challenges: circular dependencies in Directed Acyclic Graphs (DAGs), wide tables
- Solutions: vertical partitioning, logical vertical partitioning, using views
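One way to picture the wide-table remedy from the last two bullets is to keep each group of metrics in its own narrow table that shares the entity key, and to expose them together through a view. The sketch below assumes that reading of "vertical partitioning" and "using views". The table names, the `ds` snapshot-date column, the `revenue.28d` metric, and the sample values are hypothetical.

```python
import sqlite3


def make_partitioned_db():
    """Build narrow per-topic tables plus a view that presents them as one entity."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Core attributes: one row per customer per snapshot date (ds).
        CREATE TABLE customer_core (
            ds TEXT, customer_id INTEGER, name TEXT,
            PRIMARY KEY (ds, customer_id)
        );
        -- Engagement metrics, which a separate pipeline task could maintain.
        CREATE TABLE customer_engagement (
            ds TEXT, customer_id INTEGER, "visits.7d" INTEGER,
            PRIMARY KEY (ds, customer_id)
        );
        -- Revenue metrics (hypothetical), kept in their own partition.
        CREATE TABLE customer_revenue (
            ds TEXT, customer_id INTEGER, "revenue.28d" REAL,
            PRIMARY KEY (ds, customer_id)
        );
        -- The view joins the partitions back into a single wide entity table.
        CREATE VIEW customer AS
        SELECT c.ds, c.customer_id, c.name,
               e."visits.7d"   AS "visits.7d",
               r."revenue.28d" AS "revenue.28d"
        FROM customer_core c
        LEFT JOIN customer_engagement e
          ON e.ds = c.ds AND e.customer_id = c.customer_id
        LEFT JOIN customer_revenue r
          ON r.ds = c.ds AND r.customer_id = c.customer_id;
        """
    )
    ds = "2024-01-07"
    conn.executemany(
        "INSERT INTO customer_core VALUES (?, ?, ?)",
        [(ds, 1, "Ada"), (ds, 2, "Grace"), (ds, 3, "Linus")],
    )
    conn.executemany(
        "INSERT INTO customer_engagement VALUES (?, ?, ?)",
        [(ds, 1, 3), (ds, 2, 1), (ds, 3, 4)],
    )
    conn.executemany(
        "INSERT INTO customer_revenue VALUES (?, ?, ?)",
        [(ds, 1, 250.0), (ds, 2, 500.0), (ds, 3, 40.0)],
    )
    return conn


def active_high_value(conn, ds):
    """Segment on metrics from two partitions through the single view."""
    rows = conn.execute(
        'SELECT name FROM customer '
        'WHERE ds = ? AND "visits.7d" >= 3 AND "revenue.28d" > 100 '
        'ORDER BY name',
        (ds,),
    )
    return [name for (name,) in rows]


conn = make_partitioned_db()
print(active_high_value(conn, "2024-01-07"))  # ['Ada']
```

Queries still see one `customer` dataset, while each partition can be built and refreshed on its own schedule.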
Origin: Introducing Entity-Centric Data Modeling for Analytics