🧠 Second Brain

Cube

Last updated Nov 18, 2024

Formerly known as Cube.js, now simply Cube on Cube.dev. Cube is a Semantic Layer that offers OLAP-cube-style capabilities and also works as an analytics API, serving data over SQL, REST, and GraphQL out of the box.

# Cube Store (Cache Layer)

In Episode 2: Headless BI with Pavel Tiunov - The Analytics Everywhere Podcast | Podcast on Spotify, Pavel Tiunov mentioned that initially, they experimented with MySQL or Postgres for the

Consequently, they developed their own solution using DataFusion and custom code. Their blog post details this journey:

Historically, pre-aggregations were either stored alongside source data in a database (e.g., PostgreSQL or MySQL) or in a custom-provisioned instance of the same databases for read-only or cost-ineffective data sources (e.g., AWS Athena or BigQuery). Typically, these would be asynchronously refreshed by a dedicated worker instance. Although this was a viable solution, the pre-aggregation database often became a scalability bottleneck for the analytical API.

This breakthrough in the OLAP Cache Layer is quite remarkable. A particularly insightful article on how Cube achieves sub-second query times and manages compute is RW Cube on Latency and Caching.

Further details can be found in Cube Store.

# DuckDB Is Now Built In

Since DuckDB is an in-process OLAP DBMS, the integration with Cube implies bundling DuckDB together with Cube. Indeed, now every Cube Core or Cube Cloud deployment comes with a built-in and instantly available DuckDB which has the HTTPFS extension installed and loaded by default. With this extension, DuckDB can directly query Parquet, CSV, and JSON files over HTTPS, including files on S3-compatible object storage servers like AWS S3, Google Cloud Storage, and Cloudflare R2.


The idea:
Now, you can go from a quick analysis in a local DuckDB instance accessible to you and you only to a governed metric in a semantic layer accessible across the whole organization in a single step.
More in Introducing DuckDB and MotherDuck integrations - Cube Blog.
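
To make the HTTPFS part concrete, here is a minimal sketch in Python that only builds the DuckDB SQL for reading a remote Parquet file; it does not run anything. `read_parquet` and the `INSTALL httpfs` / `LOAD httpfs` statements are standard DuckDB, while the helper name `parquet_over_https_sql` and the file URL are made up for illustration.

```python
def parquet_over_https_sql(url: str, limit: int = 10) -> str:
    """Build a DuckDB SELECT that reads a remote Parquet file via httpfs."""
    if not url.startswith("https://"):
        raise ValueError("expected an https:// URL")
    # Double any single quote so the URL stays a valid SQL string literal.
    literal = url.replace("'", "''")
    return f"SELECT * FROM read_parquet('{literal}') LIMIT {int(limit)};"


# Outside Cube, a plain DuckDB session may need the extension loaded first.
SETUP = ["INSTALL httpfs;", "LOAD httpfs;"]

# Hypothetical file location, for illustration only.
sql = parquet_over_https_sql("https://example.com/data/orders.parquet", limit=5)
print("\n".join(SETUP + [sql]))
```

Inside a Cube deployment the setup statements should not be needed, since the quote above says the extension is installed and loaded by default.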

# Integrations / Collaborations

Cube has integrated with various other metrics layers like dbt (see an example here: Combining dbt Metrics with API, Caching, and Access Control - Cube Blog). Their focus extends to the OLAP Cache Layer, Security, and Data Governance.

# API


Image from Why API-Based Data Access is Essential for Modern Data Management.
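
As a rough illustration of the REST flavor of the API, the sketch below builds the URL for a request to the `load` endpoint, where the query is passed as JSON in a `query` parameter. The host, port, and the `orders` cube with its members are assumptions for the example, and a real deployment would normally also expect a token in the `Authorization` header.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse


def build_load_url(base_url: str, query: dict) -> str:
    """Return the GET URL for the REST `load` endpoint with the query JSON-encoded."""
    # Compact, key-sorted JSON keeps the URL short and deterministic.
    payload = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return f"{base_url.rstrip('/')}/cubejs-api/v1/load?{urlencode({'query': payload})}"


# Hypothetical cube and member names, for illustration only.
example_query = {
    "measures": ["orders.count"],
    "dimensions": ["orders.status"],
    "limit": 10,
}

url = build_load_url("http://localhost:4000", example_query)
print(url)

# Round-trip check: the query survives URL encoding unchanged.
decoded = json.loads(parse_qs(urlparse(url).query)["query"][0])
assert decoded == example_query
```

The same query shape (measures, dimensions, limit) is what a frontend or BI tool would send; only the transport differs between REST, GraphQL, and SQL.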

# Visual Designer

A visual designer for modeling data in the Logical Data Model, similar to what SAP BusinessObjects (BO) had with the SAP Universe:

Introducing DuckDB and MotherDuck integrations - Cube Blog

# Embedded Analytics with MotherDuck

An Elegant Data Stack for Embedded Analytics

# Dashboards

The initial dashboard integration was with Superset; many other tools are now supported.

# Caching Layer

Cube replaced Redis with their bespoke solution, Cube Store. Details in RW Replacing Redis With Cube Store - Cube Blog.

A workflow illustrating idempotent query execution with caching is depicted here:
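
The general idea behind idempotent query execution with a cache can be sketched in Python as follows. This is a generic illustration, not Cube's actual implementation: the `QueryCache` class, the hashing scheme, and the fake backend are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class QueryCache:
    """Run each distinct query once and serve repeats from memory."""

    def __init__(self, execute: Callable[[dict], Any]) -> None:
        self._execute = execute
        self._results: Dict[str, Any] = {}

    @staticmethod
    def key(query: dict) -> str:
        # Sorting keys makes logically equal queries hash to the same key.
        canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(self, query: dict) -> Any:
        k = self.key(query)
        if k not in self._results:
            # Cache miss: hit the backend once and remember the result.
            self._results[k] = self._execute(query)
        return self._results[k]


calls = []


def fake_backend(query: dict) -> dict:
    # Stand-in for a slow warehouse query; records each real execution.
    calls.append(query)
    return {"executions": len(calls)}


cache = QueryCache(fake_backend)
first = cache.run({"measures": ["orders.count"], "limit": 10})
second = cache.run({"limit": 10, "measures": ["orders.count"]})  # same query, different key order
assert first == second and len(calls) == 1
```

Because the cache key depends only on the query content, sending the same query again is safe and cheap, which is what makes the execution idempotent from the caller's point of view. A production cache would also need expiry or refresh rules, which this sketch leaves out.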

# Introducing Views in Cube

Introduced on 2022-10-12, as detailed in this article: RW Introducing Views for Defining and Managing Metrics - Cube Blog.

This new feature allows combining cubes, like companies and users, to create an interface showing active users.

# My Quick Take

The concept of a view layer is fascinating, reminiscent of my days creating Data Marts with Views in Oracle.

To be honest, I’m still exploring the nuances of Cube. My initial assumption was that Cube would function like data marts, akin to a singular view. However, the visualizations suggest a more intricate interface design. The idea of establishing contracts and schemas is intriguing and logical. This aspect of Cube certainly piques my interest for deeper exploration.

On a side note, I was pleasantly surprised to discover that metrics can be defined using JavaScript in Cube. While Python would have been my preference, JavaScript is a suitable choice given its integration with the frontend.

# Semantic Catalog

Cube announced a Semantic Catalog on 2024-06-25.

# Active Record

Active Record for Data Analytics


Origin:
References: Cube - Wiki, Cube Dev Inc
Created 2022-04-10