Declarative Data Stack

Last updated Feb 16, 2025

Declarative Data Stack is a term introduced in the article The Rise of the Declarative Data Stack by Mike Driscoll and myself.

# What Is a Declarative Data Stack?

A declarative data stack is a set of tools whose configurations, taken together, can be thought of as a single function, such as `run_stack(serve(transform(ingest)))`, that can recreate the entire data stack.
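To make the function analogy concrete, here is a minimal Python sketch. It assumes toy in-memory stand-ins for each layer: `ingest`, `transform`, `serve`, `run_stack`, and the `STACK` config are hypothetical names for illustration, not any real tool's API. The point is that the whole stack is recreated from one config object passed through composed functions.

```python
# Hypothetical declarative config for every layer: plain data, no execution logic.
STACK = {
    "ingest": {"rows": [{"id": 1, "amount": 20}, {"id": 2, "amount": -5}, {"id": 3, "amount": 7}]},
    "transform": {"min_amount": 0},
    "serve": {"metric": "total_amount"},
}


def ingest(cfg: dict) -> list[dict]:
    """Toy stand-in for an ingestion tool: returns the rows the config describes."""
    return [dict(row) for row in cfg["rows"]]


def transform(rows: list[dict], cfg: dict) -> list[dict]:
    """Toy stand-in for a transformation tool: drops rows below a threshold."""
    return [row for row in rows if row["amount"] >= cfg["min_amount"]]


def serve(rows: list[dict], cfg: dict) -> dict:
    """Toy stand-in for a serving/BI layer: exposes one aggregated metric."""
    return {cfg["metric"]: sum(row["amount"] for row in rows)}


def run_stack(stack: dict) -> dict:
    """Recreate the whole stack from its config alone: serve(transform(ingest))."""
    return serve(transform(ingest(stack["ingest"]), stack["transform"]), stack["serve"])


print(run_stack(STACK))  # {'total_amount': 27}
```

Because nothing outside `STACK` influences the result, changing the stack means changing the config, and rerunning `run_stack` reproduces it.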

Instead of having one framework per piece, we want multiple tools combined into a single declarative data stack: like the Modern Data Stack, but integrated the way Kubernetes integrates all infrastructure into a single declarative deployment, typically described in YAML.
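Kubernetes-style integration means declaring the desired state of the whole stack in one place and letting an engine work out how to get there. Below is a minimal Python sketch of that reconcile idea, assuming a hypothetical flat spec keyed by `layer/asset`; the asset names and `kind` values are illustrative placeholders, not real products or a real manifest format.

```python
# Hypothetical single spec for the whole stack, in the spirit of a Kubernetes manifest.
DESIRED = {
    "ingest/orders": {"kind": "ingestion", "schedule": "hourly"},
    "transform/daily_revenue": {"kind": "sql_model", "depends_on": ["ingest/orders"]},
    "serve/revenue_dashboard": {"kind": "dashboard", "depends_on": ["transform/daily_revenue"]},
}


def plan(desired: dict, current: dict) -> list[tuple[str, str]]:
    """Diff desired vs. current state and list the actions needed to converge."""
    actions = []
    for name in sorted(desired.keys() | current.keys()):
        if name not in current:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != current[name]:
            actions.append(("update", name))
    return actions


current = {
    "ingest/orders": {"kind": "ingestion", "schedule": "daily"},
    "serve/old_report": {"kind": "dashboard"},
}
for action in plan(DESIRED, current):
    print(action)
# ('update', 'ingest/orders')
# ('delete', 'serve/old_report')
# ('create', 'serve/revenue_dashboard')
# ('create', 'transform/daily_revenue')
```

Once the engine has applied these actions, the current state equals `DESIRED` and planning again returns an empty list, which is what makes reapplying the same spec safe.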

We focus on the end-to-end Data Engineering Lifecycle, from ingestion to visualization. But what does combining it with "declarative" mean? Think of Functional Data Engineering: it gives us confident reproducibility with few side effects (ideally none), and it relies on idempotency, so we can rerun a function to recover and reinstate a particular state with confidence, or roll back to a specific version.
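The idempotency idea from Functional Data Engineering can be sketched in a few lines of Python. This assumes a toy in-memory "warehouse" (a dict of tables to partitions); `load_partition` and `append_rows` are hypothetical helpers. Each run overwrites a whole partition instead of appending, so rerunning yields the same state, and keeping immutable snapshots turns rollback into a lookup.

```python
from copy import deepcopy

Warehouse = dict  # hypothetical shape: {table: {partition: [rows]}}


def load_partition(warehouse: Warehouse, table: str, partition: str, rows: list[dict]) -> Warehouse:
    """Pure and idempotent: returns a NEW warehouse with the partition overwritten."""
    new_state = deepcopy(warehouse)  # never mutate the input state
    new_state.setdefault(table, {})[partition] = list(rows)
    return new_state


def append_rows(warehouse: Warehouse, table: str, partition: str, rows: list[dict]) -> Warehouse:
    """Contrast: appending is NOT idempotent, since a rerun duplicates data."""
    new_state = deepcopy(warehouse)
    new_state.setdefault(table, {}).setdefault(partition, []).extend(rows)
    return new_state


rows = [{"order_id": 1, "amount": 20}]
versions = [{}]  # immutable snapshots, one per successful run

v1 = load_partition(versions[-1], "orders", "2025-01-01", rows)
versions.append(v1)

# Rerunning the same load after a failure reinstates exactly the same state.
assert load_partition(v1, "orders", "2025-01-01", rows) == v1
# Appending twice would double-count the order.
assert append_rows(append_rows({}, "orders", "2025-01-01", rows), "orders", "2025-01-01", rows) != v1

versions.append(load_partition(v1, "orders", "2025-01-02", [{"order_id": 2, "amount": 7}]))
rollback = versions[1]  # rolling back is just picking an earlier snapshot
```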

More on The Rise of the Declarative Data Stack.

Other Naming

Tobiko calls it an Integrated Data Stack.

Dagster talks about the Impedance Mismatch and data-asset-oriented orchestration: The Data Engineering Impedance Mismatch | Dagster Blog

# Tools

# Closed-Source DDSEs

Let’s start with closed source. One key point to note: most closed-source tools have implemented most of what we’ve discussed here in one way or another. Because they’ve built one big monolith, this is relatively straightforward and the natural thing to do.

With an open-source approach that integrates numerous tools, this is more challenging and not immediately obvious. Let’s now look at tools that have successfully implemented such features.

Usually, the problem with closed-source software is that it is structured as a monolith, coupling transformation logic with persisted database tables while keeping the underlying code hidden.

# Open-Source DDSEs

But even more interesting are the open-source tools I found[^1]: they are fantastic and built in the open. Not all of them may be true declarative data stacks by this definition, but they all build on top of other tools and integrate them declaratively.

SQL Compilers

SQLGlot would be a good integration for parsing SQL without running it, similar to SDF's integration with DataFusion.

Existing Templating
Beyond tools, existing templating engines can solve some of these integration problems: Jinja templates, Go’s template package, biGENIUS template modules, Apache Velocity, Liquid, and many others.
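A minimal sketch of the templating approach: generate repetitive SQL from a declarative list of sources. To stay dependency-free it uses Python's standard-library `string.Template` as a stand-in for Jinja or Go templates; the `SOURCES` config and the `stg_` view naming are hypothetical.

```python
from string import Template

# Hypothetical declarative config: one entry per source table to stage.
SOURCES = [
    {"name": "orders", "schema": "raw", "key": "order_id"},
    {"name": "customers", "schema": "raw", "key": "customer_id"},
]

STAGING_TEMPLATE = Template(
    "CREATE OR REPLACE VIEW stg_${name} AS\n"
    "SELECT DISTINCT * FROM ${schema}.${name}\n"
    "WHERE ${key} IS NOT NULL;"
)


def render_staging(sources: list[dict]) -> list[str]:
    """Render one staging view per declared source."""
    return [STAGING_TEMPLATE.substitute(src) for src in sources]


for statement in render_staging(SOURCES):
    print(statement)
```

Adding a new source then means adding one config entry rather than hand-writing another SQL file.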

# Alternatives

# What’s the Difference from a Composable Data System?

Composable Data Stacks (or Composable Data Systems) sound very similar, as do Multi-Engine Data Stacks. Are they all the same idea with different wording?

# Testing

We need Deterministic Simulation Testing for DDS, as libSQL does for its SQLite rewrite.
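Deterministic Simulation Testing runs the system in a simulated environment where every source of nondeterminism (failures, timing, randomness) is driven by a single seed, so any failing run can be replayed exactly. Here is a toy Python sketch of that idea for a pipeline with injected faults; `run_pipeline`, the 30% fault rate, and the retry limit are assumptions for illustration. Real DST setups also simulate I/O, clocks, and scheduling.

```python
import random

STEPS = 5          # hypothetical pipeline length
FAULT_RATE = 0.3   # assumed probability that an attempt hits an injected fault
MAX_ATTEMPTS = 3   # assumed retry limit per step


def run_pipeline(seed: int) -> list[str]:
    """Simulate one pipeline run; every fault decision comes from one seeded RNG."""
    rng = random.Random(seed)  # the only source of nondeterminism
    trace = []
    for step in range(STEPS):
        for attempt in range(1, MAX_ATTEMPTS + 1):
            if rng.random() < FAULT_RATE:  # injected fault, e.g. a flaky source or timeout
                trace.append(f"step {step}: attempt {attempt} failed")
            else:
                trace.append(f"step {step}: ok")
                break
        else:  # no break: retries exhausted
            trace.append(f"step {step}: gave up")
            return trace
    return trace


def completed(trace: list[str]) -> bool:
    """Invariant under test: the last step finished successfully."""
    return trace[-1] == f"step {STEPS - 1}: ok"


# Explore many seeds; each violating seed is a reproducible bug report.
failing_seeds = [seed for seed in range(1000) if not completed(run_pipeline(seed))]

# Replaying a failing seed yields exactly the same trace.
if failing_seeds:
    first = failing_seeds[0]
    assert run_pipeline(first) == run_pipeline(first)
    print(f"seed {first} violates the invariant:", run_pipeline(first))
```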

# Declarative Data Stack Engine

The engine is an important part, which I cover in more detail in Designing a Declarative Data Stack: From Theory to Practice | ssp.sh. It is similar to how Markdown can be the code, while HackMD, GDocs, and others are the engines that run it.

Key distinctions: part-3-example-implementation-declarative-data-stack

Docker and Kubernetes are other examples of engines. There are many more; see part-3-example-implementation-declarative-data-stack.
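The split between code and engine can be illustrated with a tiny Python sketch, assuming a hypothetical spec format and two toy engines: the spec is only data, and each engine decides what running it means, much as the same Markdown file can be rendered by different editors. None of these names come from a real tool.

```python
# The "code": a hypothetical declarative spec, just data (like a Markdown file).
SPEC = [
    {"step": "ingest", "source": "orders.csv"},
    {"step": "transform", "model": "daily_revenue"},
    {"step": "serve", "dashboard": "revenue"},
]


class DryRunEngine:
    """Interprets the spec by describing it, like a previewer."""

    def run(self, spec: list[dict]) -> list[str]:
        return [f"would run {item['step']}" for item in spec]


class LocalEngine:
    """Interprets the same spec by 'executing' it with toy handlers."""

    handlers = {
        "ingest": lambda item: f"ingested {item['source']}",
        "transform": lambda item: f"built {item['model']}",
        "serve": lambda item: f"published {item['dashboard']}",
    }

    def run(self, spec: list[dict]) -> list[str]:
        return [self.handlers[item["step"]](item) for item in spec]


for engine in (DryRunEngine(), LocalEngine()):
    print(type(engine).__name__, engine.run(SPEC))
```

Swapping the engine leaves the spec untouched, which is the property that lets the same declarative definition run locally, in a container, or on a cluster.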

# Further Reading


Origin: Rill | The Rise of the Declarative Data Stack, The Rise of the Declarative Data Stack - Rill
References:
Created 2024-10-17