Semantic Layer

Last updated by Simon Späti

A Semantic Layer acts as an intermediary, translating complex data into understandable business concepts for users. It bridges the gap between raw data in databases (like sales data with various attributes) and actionable insights (like revenue per store or popular brands). This layer helps business users access and interpret data using familiar terms, without needing deep technical knowledge.

The Semantic Layer serves as a translator between various data presentation layers (Business Intelligence, Notebooks, data apps) and data sources. It integrates data sources, models metrics, and connects with data consumers, translating metrics into languages like SQL, REST, or GraphQL.

A Semantic Layer defines key business metrics (like “active” users or “paying” customers) once company-wide, eliminating inconsistencies across different tools. This centralization and standardization of definitions ensure uniform understanding and reporting across the organization.
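To make "define once, translate to SQL" concrete, here is a minimal sketch. The `Metric` class, the `compile_sql` helper, and the `events` table are hypothetical names invented for this illustration, and SQLite stands in for whatever database sits underneath; real semantic layers are far richer.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One business metric, defined once (hypothetical structure)."""
    name: str
    table: str
    expression: str        # SQL aggregate expression
    where: str = "1 = 1"   # row filter that encodes the business rule


# Single, company-wide definition of an "active" user (assumed rule: has logged in).
ACTIVE_USERS = Metric(
    name="active_users",
    table="events",
    expression="COUNT(DISTINCT user_id)",
    where="event_type = 'login'",
)


def compile_sql(metric: Metric, group_by: tuple = ()) -> str:
    """Translate a metric plus optional dimensions into a SQL query."""
    dims = ", ".join(group_by)
    select = f"{dims}, " if dims else ""
    group = f" GROUP BY {dims} ORDER BY {dims}" if dims else ""
    return (
        f"SELECT {select}{metric.expression} AS {metric.name} "
        f"FROM {metric.table} WHERE {metric.where}{group}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "login", "CH"), (1, "login", "CH"), (2, "login", "DE"), (3, "view", "DE")],
)

# Two consumers ask different questions but share the same definition.
total = conn.execute(compile_sql(ACTIVE_USERS)).fetchall()
by_country = conn.execute(compile_sql(ACTIVE_USERS, ("country",))).fetchall()
print(total, by_country)
```

Both queries reuse the same filter and aggregate, so a dashboard and a notebook cannot drift apart on what "active" means.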

The Metrics Layer, a subset of the Semantic Layer, was first introduced by SAP BO in 1991, and the Kimball Group discussed the semantic layer concept in 2013. The Rise of the Semantic Layer provides more on this history.

Headless BI vs. Semantic Layer

Headless BI, often used interchangeably with a Semantic Layer, can be considered a practical implementation of the latter. The term’s origins are traced to a LinkedIn Comment.

Insight from Maxime Beauchemin (Podcast)

The Semantic Layer is like a restaurant menu: you know what you’re ordering, but not how it’s made. This layer maps metrics to physical tables and can range from minimal modifications (thin layer) to encompassing transformation logic atop physical tables (thick layer).
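The thin versus thick distinction can be sketched as follows. This is my own illustration rather than anything from the podcast: the table names, the `THIN` and `THICK` dictionaries, and the use of SQLite are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (store TEXT, quantity INTEGER, unit_price REAL);
    INSERT INTO orders VALUES ('Bern', 2, 10.0), ('Bern', 1, 5.0), ('Basel', 3, 4.0);

    -- Physical, pre-aggregated table built upstream (what a thin layer maps to).
    CREATE TABLE store_revenue AS
        SELECT store, SUM(quantity * unit_price) AS revenue
        FROM orders GROUP BY store;
""")

# Thin layer: the metric is little more than a name for an existing column.
THIN = {"revenue": "SELECT store, revenue FROM store_revenue ORDER BY store"}

# Thick layer: the metric carries the transformation logic atop the raw table.
THICK = {
    "revenue": (
        "SELECT store, SUM(quantity * unit_price) AS revenue "
        "FROM orders GROUP BY store ORDER BY store"
    )
}

thin_result = conn.execute(THIN["revenue"]).fetchall()
thick_result = conn.execute(THICK["revenue"]).fetchall()
print(thin_result == thick_result)
```

Either way the consumer only orders "revenue" from the menu; the difference is whether the cooking happens in the pipeline or in the layer itself.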

# Semantic Layer Definition

My definition goes something along the lines of:

A semantic layer acts as an intermediary, translating complex data into understandable user business concepts. It bridges the gap between raw data in databases (such as sales data with various attributes) and actionable insights (such as revenue per store or popular brands). This layer helps business users access and interpret data using familiar terms without needing deep technical knowledge. (Source: Universal Semantic Layer: Capabilities, Integrations, and Enterprise Benefits)

Or take Julian Hyde's definition, from his work on how SQL could express ‘measures’ to bridge the gap:

A semantic layer, also known as a metrics layer, lies between business users and the database, and lets those users compose queries in the concepts that they understand. It also governs access to the data, manages data transformations, and can tune the database by defining materializations.
Like many new ideas, the semantic layer is a distillation and evolution of many old ideas, such as query languages, multidimensional OLAP, and query federation. (Source: Building a semantic/metrics layer using Apache Calcite)

See also Data Engineering Whitepapers

There’s also now a paper: https://arxiv.org/pdf/2406.00251

# Why a Semantic Layer

In my previous company, we developed an Analytics API, similar to what Cube does, but with orchestration as a key component. We switched from SSAS to Druid, a Modern OLAP solution, to handle diverse business metrics queries from Tableau, Notebooks, and our Web App. Because Druid could not store queries the way SSAS does, we built this Semantic Layer, or Analytics API, ourselves. The Semantic Layer’s real advantage lies in its ability to define and automate metrics as code, offering a universal, open-source, or open-standard approach to handling business metrics.

More of my thoughts are elaborated in Building an Analytics API with GraphQL and in The Rise of the Semantic Layer: Metrics On-The-Fly. More Slack Convo.

# Example of How Semantics Are Defined

See Semantic Layer Measure Definition Examples.

# Different Types of Semantic Layers

From What Is a Semantic Layer? | GoodData:

  • Semantic layer in a data warehouse: The main purpose of a data warehouse is to provide a centralized data source for the whole organization. It is designed to be a single source of truth for different departments, user groups, and use cases. The structure of data in the warehouse can be complex and technical, which makes it difficult for users to access the information they need. As a result, business users often extract portions of this data into BI tools, creating localized semantic layers that can contribute to semantic layer spread.
  • Semantic layer within data pipelines: When constructing data pipelines (the process of adding data from various sources to a data warehouse), data engineers input a semantic layer in the code. This layer helps to name and organize the different parts of the data models, such as tables and attributes.
  • Semantic layer in Business Intelligence (BI) and data analytics: This type of semantic layer defines business concepts and the relationships between them. It also defines metrics and calculations that can be used for analysis and reporting through different users and user groups for specific business use cases.
  • Universal semantic layer: There is a connection between raw data and the different tools for users to analyze their data (such as BI and AI/ML tools, management tools, and business applications). A universal semantic layer doesn’t focus on a specific business use case and needs to cover company-wide requirements.

# History

The Evolution of the Semantic Layer (with MDM and dbt/Jinja included for context):

  • 1991: SAP BusinessObjects Universe and BI semantic layer
  • 1997: SSAS and MDX, whose logical modeling layer defines business metrics and dimensions in a structured way
  • 2008: Master Data Management (MDM), with MDS from Microsoft in 2008
    • Business entities: MDM focuses on core business entities (customers, products, locations), while semantic layers typically focus on metrics and dimensions.
    • Single Source of Truth: MDM provides it for master data records, the semantic layer (SL) for metrics
    • Data Governance: Both approaches involve managing and governing data definitions
  • 2013: Kimball discussed the concept of a semantic layer in #158 Making Sense of the Semantic Layer
  • 2016: Maturing BI tools such as Tableau, TARGIT, Power BI, and Apache Superset ship an integrated semantic layer, each with its own metrics layer definition
  • 2018: Jinja templates and dbt eroding the transformation layer into a semantic layer
    • Not by definition, but the declarative SQL definitions in dbt, which define your whole DWH, are in a way an early semantic layer. In a way, they create a single source of truth (although potentially many different ones 😅).
    • If you think about what the old SAP BO Universe was, it was a logical model of SQL definitions. In a way, dbt definitions are the same. You do not have the visual designer, unless you run dbt docs. That’s at least my thought and how it relates to the history overall.
    • I think the line is fine. You can define measures and dimensions in dbt as SQL and add stuff with Jinja, but it is maybe too much of a stretch to call it semantics. BUT, it is declarative with SQL :) Full Discussion
  • 2019: Looker and LookML popularized as the first real semantic layer
  • 2022: Modern Semantic Layer, Metric Layer or Headless BI tools such as MetriQL, MetricFlow, Minerva, dbt arose with the explosion of data tools (BI tools, notebooks, spreadsheets, machine learning models, data apps, reverse ETL, …)

More on this in The Rise of the Semantic Layer | ssp.sh

# Semantic Layer Tools

See also Modern Semantic Layers and Business Intelligence, Semantic Layer, Modern OLAP, Data Virtualization - 📖 Data Engineering Design Patterns (DEDP).

# Choosing

Burak Karakan said on LinkedIn:

I have done this, and it was just fine for simple use cases.

I keep the macros in .sql files as the source of truth and push them to GitHub. Then, I use Lorenzo Mangani’s webmacro extension to load them into DuckDB memory at runtime.

The benefit of Semantic Models being just SQL files is that you can have static data and can implement your JOIN logic or measures in a native SQL way. In my opinion, this is superior to any YAML/Python approach since it removes the middleware, which makes things less transparent and customizable.

Then we have the duckdb-fastapi adapter to expose metrics in a REST API for applications. For integration with BI tools and SQL clients, we expose DuckDB via the Snowflake wire protocol, which enables any tool to access metrics via SQL.
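The "semantic models are just SQL files" idea can be sketched roughly like this. Note the substitutions: I use Python's built-in SQLite with views as a stand-in for DuckDB macros, a temporary directory instead of a GitHub repo, and none of the extensions or adapters mentioned above. The file name, table, and `load_semantic_models` helper are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_semantic_models(conn: sqlite3.Connection, model_dir: str) -> list:
    """Run every .sql file in model_dir against conn; return the loaded model names."""
    loaded = []
    for sql_file in sorted(Path(model_dir).glob("*.sql")):
        conn.executescript(sql_file.read_text())
        loaded.append(sql_file.stem)
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    # In practice this file would live in version control as the source of truth.
    Path(tmp, "revenue_per_store.sql").write_text(
        "CREATE VIEW revenue_per_store AS "
        "SELECT store, SUM(amount) AS revenue FROM sales GROUP BY store;"
    )

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("Bern", 20.0), ("Bern", 5.0), ("Basel", 12.0)],
    )

    loaded = load_semantic_models(conn, tmp)
    rows = conn.execute("SELECT * FROM revenue_per_store ORDER BY store").fetchall()

print(loaded, rows)
```

The measure logic stays readable as plain SQL, and any client that can run a `SELECT` against the database can consume it without a YAML or Python middleware in between.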

# When not to Use a Semantic Layer

Don’t use a semantic layer when you start out. One user argued that it will slow down innovation.

True, a (modern) semantic layer is usually not something you start with, unless you use it as a federated, universal API for your distributed backends (multiple DBs, source systems, etc.). The more common entry point is a larger organization where KPIs are not defined consistently across teams, or where people don’t know where to get the “correct” definition.

But when you start out, use SQL, MVs, dbt, and persist all the tables. Or build your semantics in an OLAP cube or BI tool; if you only have one, no need to add an extra tool.

In Why Semantic Layers Matter — and How to Build One with DuckDB, I explain that the simplest and most straightforward reasons are:

  • You’re just getting started with analytics and only have one consumer, meaning you only have one way of showcasing analytics data, for example, a BI tool, notebooks, or a web app, but not multiple ways of presenting data. This means you don’t apply calculated logic in different places.
  • You don’t have extensive business logic that you query ad hoc; you have simple counts, SUMs, or averages.
  • You preprocess all your metrics as SQL transformations into physical tables, meaning your downstream analytics tools get all metrics preprocessed and aggregated, and filtering is fast enough.

# Use Cases of Semantic Layer


From the great talk by Brian Bickell from Cube: Semantic Layer Deep Dive w/ Brian Bickell (Cube)

# Similar to MVC?

Idea
Could the Semantic Layer be likened to the MVC (Model View Controller) model? In Cube (OLAP), views are created similar to DB views, with the model and component handling the rest. This is akin to Convergent Evolution between Semantic Layer and MVC.

Wrote more about this on Exploring the Semantic Layer Through the Lens of MVC - Cube Blog.

# Knowledge Graphs, LLMs with Semantic Layer

# Why not Define Measures within SQL?

This is what Julian Hyde brought up in his talk Extending SQL for analytics, similar to what MDX Studio did to SSAS.
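A small sketch of the problem that measures are meant to solve, as I understand it. This is my own illustration using SQLite, not code or syntax from the talk, and the `orders` table and names are made up: a plain view fixes the grain of a ratio, so re-aggregating it gives the wrong answer, whereas a measure keeps the formula and re-evaluates it at the requested grain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (store TEXT, revenue INTEGER, cost INTEGER);
    INSERT INTO orders VALUES ('A', 200, 100), ('A', 100, 50), ('B', 100, 100);

    -- A plain view fixes the grain: one margin value per store.
    CREATE VIEW store_margin AS
        SELECT store, (SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue) AS margin
        FROM orders GROUP BY store;
""")

# Re-aggregating the view averages two ratios and ignores store size.
avg_of_margins = conn.execute("SELECT AVG(margin) FROM store_margin").fetchone()[0]

# A measure is the formula itself, evaluated at whatever grain the query asks for.
MARGIN_MEASURE = "(SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue)"
overall_margin = conn.execute(f"SELECT {MARGIN_MEASURE} FROM orders").fetchone()[0]

print(avg_of_margins, overall_margin)
```

Here the string constant plays the role of the measure; the argument for putting measures into SQL is that the database would carry that formula with the table instead of every client pasting it around.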

# How Does it Compare to a Data Warehouse Automation Tool?

See BiGenius-X vs dbt semantic layer etc. and Data Warehouse Automation and my book:

# Semantic Data Model Layering

Building Blocks: Advanced Semantic Data Model Layering - YouTube

# Other Resources

# Pedram Navid

From Pedram Navid’s LinkedIn post (#dbt #metrics): While dbt is building a metrics layer, the question still remains whether a metrics layer outside of BI will ever gain wide enough adoption. Jacob Matson rightly points out that Looker, ThoughtSpot, Power BI, and Transform all have a metrics layer tightly integrated within BI, and they are good enough.

The challenges dbt has are that its implementation is pretty bad (no one wants to write Jinja templates in YAML), it lacks features critical for it to be useful (like joins), and it’s not clear that we need widespread access to metrics across tools outside of BI.

# Michael Driscoll

Replied: The metrics layer may not actually need to be a layer; it could get baked into the SQL standard. (Extending SQL for analytics)

Databases could implement it. And every BI tool could query metrics directly.

Metrics layers are just aggregate expressions with some metadata.

Let them live in SQL.

If DuckDB Labs can introduce ‘GROUP BY ALL’ and proclaim that ‘FROM foo;’ is valid SQL, surely they could bring us aggregate awareness too.

# Artyom Keydunov & Pavel Tiunov - Cube

Semantic Layer and its relation to MVC (Model View Controller) pattern, popularized in Ruby On Rails.

The concept of a Semantic Layer shares similarities with the MVC (Model-View-Controller) model, particularly as popularized by Ruby on Rails through its Active Record pattern. In a conversation with Artyom Keydunov & Pavel Tiunov from Cube.dev, Artyom drew parallels between the two:

  • Just as the MVC model focuses on decoupling data, the Cube Semantic Layer serves a similar purpose. In Cube, this decoupling is evident in how they create views, akin to database views, while the model and component handle the rest.
  • The Semantic Layer can be likened to the Active Record in Ruby on Rails. Active Record is an implementation of the ORM (Object-Relational Mapping) pattern, which abstracts and simplifies database interactions. Similarly, the Semantic Layer abstracts complex data structures, making them more understandable and accessible to business users.
  • At its core, the Semantic Layer represents the Logical Data Model in Data Modeling, serving as an intermediary between raw data and its representation to end-users.

This comparison underscores the universality of certain design patterns across different domains and technologies. Whether it’s web development with Ruby on Rails or data engineering with tools like Cube, the principles of abstraction, simplification, and decoupling remain consistent.

# Abhi Sivasailam: LookML is Still the Best SL?

LookML was the GOAT and every other “modern” semantic layer spec is a step backwards. How is this possible? Abhi Sivasailam

I believe if you only need to integrate with one BI tool, as LookML did, it’s much easier than for open-source semantic layers. Those must integrate not only with multiple BI tools, but also with notebooks, web apps, and even Excel. But in terms of syntax and implementation, LookML made many good decisions.

To me, the perfect “not so modern” semantic layer was the SAP BO universe. Still one of the best implementations to this day, but again, it still only had to support one BI tool.

# More Perspectives

Explore more about the Semantic Layer:

I wrote a deep dive into The Rise of the Semantic Layer | ssp.sh, in case you want to know more.


Origin: Metrics Layer
References: The Rise of the Semantic Layer, Modern Semantic Layer, Ontology
Created 2022-09-29