🧠 Second Brain

Search

Search IconIcon to open search

Extending SQL for Analytics

Last updated Mar 10, 2025

SQL is the origin of data. And it was extended over the years. We have geospatial capabilities and window functions and whatnot throughout the year, so why not have analytics semantics added, too?

# Cubing and Metrics

My feedback from Julian Hyde talk:

Syntax /Semantic

He is saying that SQL should be used for defining Measures, but in the way, MDX does with not repeating itself, it is more concise. But he forgot the fact, this is only possible because you did the work in SSAS and defined the relations and hierarchy, meaning you need another layer like a Semantic Layer. If not, you need to define everything inside SQL, which will be a mess.

I used templating with BiGenius and dbt, to know that anything on top of SQL gives you non-maintainable spaghetti code. Maybe SQLGlot can solve some of these? ^00ad31

Think like formatting (TO_DATE, and left joins with repeated joins), will always be difficult without a declarative definition outside SQL. But wait, SQL is declarative, why isn’t that possible? Also related to The Rise of the Declarative Data Stack.


The have views, Cube also have this context of Cube, see RW Introducing Views for Defining and Managing Metrics - Cube Blog


the planner used window functions to create that SQL query.

grain:

visible vs none visible - i remember that from MDX syntaxt

Semantic Models/Semantic Layer vs Databases:

# Pipe into SQL

Maybe related, or similar, also coming from Google, Pipe Syntax In SQL.

# Semantic Model

From the talk: About Semantic Models:

If SQL is the universal language of data, why do we author our most important data applications (metrics, analytics, business intelligence) in languages other than SQL? Multidimensional databases and languages such as MDX, DAX and Tableau LOD solve these problems but introduce others: they require specialized knowledge, complicate the data pipeline and don’t integrate well. Is it possible to define and query business intelligence models in SQL?

Apache Calcite has extended SQL to support metrics (which we call ‘measures’), filter context, and analytic expressions. With these concepts you can define data models (which we call Analytic Views) that contain metrics, use them in queries, and define new metrics in queries.

In this talk by the original developer of Apache Calcite, we describe the SQL syntax extensions for metrics, and how to use them for cross-dimensional calculations such as period-over-period, percent-of-total, non-additive and semi-additive Measures. We describe how we got around fundamental limitations in SQL Semantics, and approaches for optimizing queries that use metrics.

# Talk Summary (AI)

By Claude based on transcript:

  1. Evolution of SQL: SQL has grown significantly since its inception, with extensions added for various purposes like geospatial awareness and document storage. However, analytics capabilities have remained separate from SQL databases for a long time.
  2. Limitations of current BI tools: Business Intelligence (BI) tools often sit on top of SQL databases, providing features like semantic models, visualizations, and governance. However, this separation leads to issues such as data lag, maintenance efforts for multiple BI tools, and the need for complex SQL queries for analytical tasks.
  3. Proposed SQL extensions: The speaker suggests extending SQL to incorporate analytics capabilities directly into the language. This includes:
    • Adding measures to SQL tables
    • Allowing queries to return measures (expressions) instead of just values
    • Introducing new syntax like ‘AS MEASURE’, ‘AGGREGATE’, and the ‘@’ operator for cross-dimensional calculations
  4. Benefits of the proposed extensions:
    • More concise and readable queries for complex analytical tasks
    • Ability to reuse calculations and build richer semantic models within SQL
    • Improved performance by moving processing closer to the data
    • Elimination of data lag issues
  5. Redefinition of measures: The speaker proposes a broader definition of measures, encompassing not just aggregations but any column that can be evaluated in any context, including complex calculations and even forecasts.
  6. Semantic Modelss in databases: The speaker argues for incorporating semantic models directly into SQL databases, allowing for centralized definitions of important calculations and metrics across different BI tools and queries.
  7. SQL as an API: The speaker envisions SQL becoming an API for data, allowing different data systems to communicate using a higher-level language that understands analytics concepts.
  8. Backwards compatibility: The proposed changes aim to extend SQL capabilities without breaking existing functionality, allowing users to adopt these features gradually.
    1. Relates to my article about Analytics API
  9. Future of analytics: By bringing analytics capabilities directly into SQL, the speaker hopes to make complex analytical queries as concise and intuitive as natural language questions, bridging the gap between technical SQL and business user needs.

These proposed extensions to SQL aim to make analytics more integrated, efficient, and accessible within database systems, potentially transforming how we approach data analysis and business intelligence.

# Use cases

# Further Reading


Origin: Cubing and Metrics in SQL, Oh My! - YouTube, Julian Hyde
References: Metrics Layer, Metric Trees,
Created 2024-08-27