Cube
Formerly known as Cube.js, now simply Cube on Cube.dev. Cube is a Semantic Layer that provides OLAP cube capabilities and also serves as an Analytics API, with data access over SQL, REST, and GraphQL out of the box.
# Cube Store (Cache Layer)
In Episode 2: Headless BI with Pavel Tiunov - The Analytics Everywhere Podcast | Podcast on Spotify, Pavel Tiunov mentioned that initially, they experimented with MySQL or Postgres for the
Consequently, they developed their own solution using DataFusion and custom code. Their blog post details this journey:
Historically, pre-aggregations were either stored alongside source data in a database (e.g., PostgreSQL or MySQL) or in a custom-provisioned instance of the same databases for read-only or cost-ineffective data sources (e.g., AWS Athena or BigQuery). Typically, these would be asynchronously refreshed by a dedicated worker instance. Although this was a viable solution, the pre-aggregation database often became a scalability bottleneck for the analytical API.
This breakthrough in the OLAP Cache Layer is quite remarkable. A particularly insightful article on how Cube achieves sub-second query times and manages compute is RW Cube on Latency and Caching.
Further details can be found in Cube Store.
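To make the pre-aggregation idea from the quote above concrete, here is a minimal sketch using Python's built-in `sqlite3`. It is not how Cube Store works internally; the `orders` table, its columns, and the helper names are all invented for illustration. It only shows the general pattern: a refresh job materializes a small rollup, and analytical queries read the rollup instead of the raw rows.

```python
import sqlite3

# Hypothetical raw fact table; table and column names are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, day TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (status, day, amount) VALUES (?, ?, ?)",
    [
        ("completed", "2024-01-01", 10.0),
        ("completed", "2024-01-01", 15.0),
        ("shipped", "2024-01-01", 7.5),
        ("completed", "2024-01-02", 20.0),
    ],
)


def refresh_rollup(conn: sqlite3.Connection) -> None:
    """Rebuild the rollup from raw data (the job a refresh worker would run)."""
    conn.execute("DROP TABLE IF EXISTS orders_rollup")
    conn.execute(
        """
        CREATE TABLE orders_rollup AS
        SELECT status, day, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM orders
        GROUP BY status, day
        """
    )


def totals_by_status(conn: sqlite3.Connection) -> dict:
    """Answer an analytical query from the rollup instead of scanning raw rows."""
    rows = conn.execute(
        """
        SELECT status, SUM(order_count), SUM(total_amount)
        FROM orders_rollup
        GROUP BY status
        ORDER BY status
        """
    ).fetchall()
    return {status: (count, total) for status, count, total in rows}


refresh_rollup(conn)
result = totals_by_status(conn)
print(result)
```

Because counts and sums are additive, the coarser query can be answered by re-aggregating the rollup rows. Measures that are not additive (distinct counts, for example) would need a different treatment, which this sketch does not cover.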
# DuckDB Is Now Built In
Since DuckDB is an in-process OLAP DBMS, the integration with Cube implies bundling DuckDB together with Cube. Indeed, now every Cube Core or Cube Cloud deployment comes with a built-in and instantly available DuckDB which has the HTTPFS extension installed and loaded by default. With this extension, DuckDB can directly query Parquet, CSV, and JSON files over HTTPS, including files on S3-compatible object storage servers like AWS S3, Google Cloud Storage, and Cloudflare R2.
The idea:
Now, you can go from a quick analysis in a local DuckDB instance accessible to you and you only to a governed metric in a semantic layer accessible across the whole organization in a single step.
More in Introducing DuckDB and MotherDuck integrations - Cube Blog.
# Integrations / Collaborations
Cube has integrated with various other metrics layers like dbt (see an example here: Combining dbt Metrics with API, Caching, and Access Control - Cube Blog). Their focus extends to the OLAP Cache Layer, Security, and Data Governance.
# API
From Why API-Based Data Access is Essential for Modern Data Management - Cube Blog:
- SQL API - Delivers data over the Postgres-compatible protocol to BI tools.
- REST API - Delivers data over the HTTP protocol to embedded analytics applications.
- GraphQL API - Delivers data over the HTTP protocol to GraphQL-enabled data applications.
- MDX API - Provides a native interface for Microsoft Excel connections via the XMLA standard.
- AI API - Provides a standard interface for interacting with large language models (LLMs) as a turnkey solution for text-to-semantic layer queries.
Image from Why API-Based Data Access is Essential for Modern Data Management.
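As a rough illustration of the REST API from the list above, the sketch below builds (but does not send) a request to the `/cubejs-api/v1/load` endpoint using only the standard library. The host, the token placeholder, and the `orders` cube with its members are assumptions made up for this example; a real deployment will have its own URL, JWT, and data model.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Assumptions for this sketch: host, token, and cube/member names are placeholders.
API_URL = "http://localhost:4000/cubejs-api/v1/load"
API_TOKEN = "<your-jwt>"


def build_load_request(query: dict, api_url: str = API_URL, token: str = API_TOKEN) -> Request:
    """Encode a Cube-style JSON query into a GET request for the load endpoint."""
    params = urlencode({"query": json.dumps(query)})
    return Request(f"{api_url}?{params}", headers={"Authorization": token}, method="GET")


query = {
    "measures": ["orders.count"],
    "dimensions": ["orders.status"],
    "timeDimensions": [
        {
            "dimension": "orders.created_at",
            "dateRange": "last 30 days",
            "granularity": "day",
        }
    ],
    "limit": 100,
}

request = build_load_request(query)
print(request.full_url)

# Sanity check: the query survives URL encoding unchanged.
decoded = json.loads(parse_qs(urlparse(request.full_url).query)["query"][0])
print(decoded == query)

# To send it against a running Cube deployment (not executed here):
# from urllib.request import urlopen
# with urlopen(request) as response:
#     rows = json.load(response)["data"]
```

The same JSON query shape (measures, dimensions, time dimensions) is what makes the API layer independent of the underlying warehouse: the client names semantic-layer members, not tables.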
# Visual Designer
A visual designer for modeling data in the Logical Data Model, similar to what SAP BusinessObjects (BO) offered with the SAP Universe:
Introducing DuckDB and MotherDuck integrations - Cube Blog
# Embedded Analytics with MotherDuck
An Elegant Data Stack for Embedded Analytics
# Dashboards
Their initial dashboard integration was with Superset, but Cube now works with many other tools.
# Caching Layer
Cube replaced Redis with their bespoke solution, Cube Store. Details in RW Replacing Redis With Cube Store - Cube Blog.
A workflow illustrating idempotent query execution with caching is depicted here:
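As a rough companion to that workflow, here is a minimal sketch of idempotent query execution with an in-memory cache. It is not Cube's implementation: `QueryCache`, `fake_warehouse`, and the query shape are hypothetical, and real systems also need expiry or refresh logic, which is left out here. The point is only that logically identical queries map to the same cache key and hit the data source once.

```python
import hashlib
import json
from typing import Any, Callable


class QueryCache:
    """In-memory cache keyed by a hash of the normalized query (hypothetical helper)."""

    def __init__(self, execute: Callable[[dict], Any]) -> None:
        self._execute = execute
        self._results: dict = {}
        self.executions = 0  # how often the underlying source was actually hit

    @staticmethod
    def cache_key(query: dict) -> str:
        # Sorting keys makes logically identical queries hash to the same key.
        normalized = json.dumps(query, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def load(self, query: dict) -> Any:
        key = self.cache_key(query)
        if key not in self._results:
            self.executions += 1
            self._results[key] = self._execute(query)
        return self._results[key]


def fake_warehouse(query: dict) -> list:
    """Stand-in for an expensive warehouse call; returns dummy rows."""
    return [{"measure": name, "value": 42} for name in query["measures"]]


cache = QueryCache(fake_warehouse)
first = cache.load({"measures": ["orders.count"], "limit": 10})
second = cache.load({"limit": 10, "measures": ["orders.count"]})  # same query, different key order
print(cache.executions)
```

Repeating the same query returns the stored result without re-executing it, so the operation is safe to retry.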
# Introducing Views in Cube
Views were introduced on 2022-10-12, as detailed in this article: RW Introducing Views for Defining and Managing Metrics - Cube Blog.
This new feature allows combining cubes, like companies and users, to create an interface showing active users.
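For intuition only, the companies-and-users example can be mimicked with a plain SQL view in `sqlite3`. This is an analogy, not Cube's view syntax (Cube views are declared in the data model, not as database views), and the tables, columns, and the `active_users` name are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical source tables standing in for two cubes.
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        company_id INTEGER REFERENCES companies(id),
        is_active INTEGER
    );
    INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO users VALUES (1, 1, 1), (2, 1, 0), (3, 2, 1), (4, 2, 1);

    -- The view exposes one curated interface on top of both tables.
    CREATE VIEW active_users AS
    SELECT c.name AS company, COUNT(*) AS active_user_count
    FROM users AS u
    JOIN companies AS c ON c.id = u.company_id
    WHERE u.is_active = 1
    GROUP BY c.name;
    """
)

rows = conn.execute(
    "SELECT company, active_user_count FROM active_users ORDER BY company"
).fetchall()
print(rows)
```

Consumers query `active_users` without knowing how the join or the filter is defined, which is the contract-like quality that makes the view concept appealing.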
# My Quick Take
The concept of a view layer is fascinating, reminiscent of my days creating Data Marts with Views in Oracle.
To be honest, I’m still exploring the nuances of Cube. My initial assumption was that Cube would function like data marts, akin to a singular view. However, the visualizations suggest a more intricate interface design. The idea of establishing contracts and schemas is intriguing and logical. This aspect of Cube certainly piques my interest for deeper exploration.
On a side note, I was pleasantly surprised to discover that metrics can be defined using JavaScript in Cube. While Python would have been my preference, JavaScript is a suitable choice given its integration with the frontend.
# Semantic Catalog
Cube announced a Semantic Catalog on 2024-06-25.
# Active Record
Active Record for Data Analytics
Origin:
References: Cube - Wiki, Cube Dev Inc
Created 2022-04-10