Declarative Data Stack
Declarative Data Stack is a term introduced in the article The Rise of the Declarative Data Stack by Mike Driscoll and myself.
# What Is a Declarative Data Stack?
A declarative data stack is a set of tools whose configuration, taken together, can be thought of as a single function, such as run_stack(serve(transform(ingest()))),
that can recreate the entire data stack.
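As a rough sketch of that single-function idea (the stage names and data shapes below are illustrative, not any specific tool's API):

```python
# Minimal sketch of a declarative stack as function composition.
# Each stage is a pure function; the whole stack is one expression.

def ingest() -> list[dict]:
    # Pull raw records from a source (hard-coded here for illustration).
    return [{"user": "a", "amount": 10}, {"user": "b", "amount": 5}]

def transform(rows: list[dict]) -> list[dict]:
    # Pure transformation: no side effects, same input -> same output.
    return [{**r, "amount_usd": r["amount"] * 2} for r in rows]

def serve(rows: list[dict]) -> dict:
    # Expose an aggregate that a dashboard could read.
    return {"total_usd": sum(r["amount_usd"] for r in rows)}

def run_stack(result: dict) -> dict:
    # Outermost call; in a real stack this would deploy or serve the result.
    return result

print(run_stack(serve(transform(ingest()))))  # {'total_usd': 30}
```

Because every stage is a pure function of its inputs, running the expression again recreates the same stack state.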
Instead of having one framework for one piece, we want a combination of multiple tools combined into a single declarative data stack. Like the Modern Data Stack, but integrated the way Kubernetes integrates all infrastructure into a single deployment, like YAML.
We focus on the end-to-end Data Engineering Lifecycle, from ingestion to visualization. But what does the combination with declarative mean? Think of Functional Data Engineering, which leaves us in a place of confident reproducibility with few side effects (ideally none) and uses idempotency to restart functions in order to recover and reinstate a particular state with conviction, or to roll back to a specific version.
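A toy illustration of that idempotency property (the partition store below is hypothetical, standing in for a warehouse table): re-running the same build for the same partition overwrites it rather than appending, so recovery is just running it again.

```python
# Sketch of idempotent, functional-style partition writes.
# `store` stands in for a warehouse; keys are partition dates.

def build_partition(day: str, raw: list[int]) -> list[int]:
    # Pure function of its inputs: same (day, raw) -> same output.
    return sorted(x * 2 for x in raw)

def write_partition(store: dict, day: str, rows: list[int]) -> None:
    # Overwrite, never append: re-runs converge to the same state.
    store[day] = rows

store: dict[str, list[int]] = {}
raw = [3, 1, 2]
for _ in range(3):  # simulate retries after failures
    write_partition(store, "2024-10-17", build_partition("2024-10-17", raw))

print(store)  # {'2024-10-17': [2, 4, 6]} no matter how many times it ran
```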
More on The Rise of the Declarative Data Stack.
Other Naming
Tobiko calls it an Integrated Data Stack.
Dagster talks about Impedance Mismatch, and Data Asset oriented orchestration: The Data Engineering Impedance Mismatch | Dagster Blog
# Tools
# Close-Sourced DDSEs
Let’s start with closed source. One key point to note: most of what we’ve discussed here is something that most closed-source tools have implemented in one way or another. Because they’ve built one big monolith, this is relatively straightforward and the natural thing to do.
This can be more challenging and not immediately obvious with an open-source approach and numerous integration tools. Let’s now look at tools that have successfully implemented such features.
- Ascend: The platform automates up to 90% of repetitive data tasks using their DataAware Automation Engine.
- Palantir Foundry: One of the first lakehouse implementations before the term was coined. Enables real-time collaboration between data, analytics, and operational teams through a common logic data lake layer.
- Find more on Closed-Source Data Platforms and a fantastic read on composable data stacks on a new frontier by Voltron Data.
- Y42
- Starlake by Hayssam Saleh
Usually, the problem with closed-source software is that it is structured as a monolith, combining transformation logic with persisted database tables while keeping the underlying code unknown.
# Open-Source DDSEs
But even more interesting are the open-source tools I found[^1] - they are fantastic and built in the open. Not all of them may be truly declarative data stacks by our definition, but they all build on top of other tools and integrate them declaratively.
- DataForge: Write functional transformation pipelines by leveraging software engineering principles. It focuses on transformation and does not include a visualization tool[^3].
- Dashtool: A Lakehouse build tool that builds Iceberg tables from declarative SQL statements and generates Kubernetes workflows to keep these tables up-to-date. It handles Ingestion, Transformation, and Orchestration. Written in Rust and uses Datafusion.
- BoilingData: A local-first data processing native application designed for rapid data pipeline development. Enables data engineers to build and test pipelines quickly using tools like DuckDB, dbt, and dlt.
- HelloDATA BE: An enterprise data platform built on open-source tools based on the modern data stack. It uses state-of-the-art tools such as dbt for data modeling with SQL, Airflow to run and orchestrate tasks, Superset to visualize the BI dashboards, and JupyterHub for data science tasks. It includes multi-tenancy and full authentication and authorization, handled through a single web portal.
- SDF: Similar to DataForge, built on Rust and Datafusion. Tries to be the Typescript for SQL, creating faster development cycles and reliable results with a powerful compiler.
- SQLMesh: An efficient data transformation and modeling framework that has compiler capabilities built-in with SQLGlot, a Python SQL parser and a transpiler.
- GitHub Actions: This is the simplest version of building a declarative data stack. A deploy.yaml script could be a simple DDS config. GitHub also has an engine that converts and runs it on Docker runners. So, in a way, it’s another engine implementation, and maybe we could base some configs on that.
- Datacoves: The platform helps enterprises solve data analytics challenges with managed dbt, Airflow, and VS Code, adopting best practices. This approach avoids negotiating multiple SaaS contracts and reduces consulting costs without compromising data security.
- Datadex: Serverless and local-first Open Data Platform.
- BigFunctions: A framework to build a governed catalog of powerful BigQuery functions, SQL first approach. Ingesting, advanced data transforms, and serving data on a data app with a single SQL query.
- Hoptimator
- Bruin
- IaSQL: Cloud infrastructure as data in PostgreSQL
SQL Compilers
SQLGlot would be a good integration to parse SQL without executing it, similar to SDF’s integration with Datafusion.
Existing Templating
Beyond dedicated tools, templating engines can solve part of this: Jinja templates, GoLang’s template package, biGENIUS template modules, Apache Velocity, Liquid, and many others.
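As a minimal, stdlib-only illustration of the templating idea (using Python's `string.Template` rather than any of the engines above; the spec fields are made up): a small declarative spec plus a template is enough to generate runnable SQL.

```python
from string import Template

# A declarative spec (the "what") plus a template (the "how to render")
# yields SQL without hand-writing each query.
spec = {"table": "orders", "metric": "amount", "grain": "order_date"}

sql_template = Template(
    "SELECT $grain, SUM($metric) AS total_$metric\n"
    "FROM $table\n"
    "GROUP BY $grain"
)

print(sql_template.substitute(spec))
# SELECT order_date, SUM(amount) AS total_amount
# FROM orders
# GROUP BY order_date
```

Swapping the spec regenerates the query for a different table or metric, which is the core move behind most of the templating engines listed above.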
# Alternatives
# What’s the difference between a Composable Data System
Composable Data Stacks (or Systems) sound very similar, as do Multi-Engine Data Stacks. Are they all the same thing, just with different wording?
# Testing
We need Deterministic Simulation Testing for a DDS, like libSQL does for its SQLite rewrite.
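A toy sketch of the idea (not libSQL's actual harness): drive the system and every injected failure from a single seeded PRNG, so any failing run can be replayed exactly from its seed.

```python
import random

def run_pipeline(seed: int) -> list[str]:
    # All nondeterminism flows from one seeded PRNG, so a failure
    # found with seed N can be replayed exactly by passing seed N again.
    rng = random.Random(seed)
    log = []
    for step in ["ingest", "transform", "serve"]:
        if rng.random() < 0.2:  # injected simulated fault
            log.append(f"{step}:retry")
        log.append(f"{step}:ok")  # step eventually succeeds
    return log

# The same seed always produces the identical trace.
assert run_pipeline(42) == run_pipeline(42)
```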
# Declarative Data Stack ENGINE
The engine is an important part, which I go into in more detail in Designing a Declarative Data Stack: From Theory to Practice | ssp.sh. It’s similar to how Markdown can be the code, while HackMD, GDocs, and others are engines that run it.
Key distinctions: part-3-example-implementation-declarative-data-stack
Docker and Kubernetes are other engines. There are many more, see part-3-example-implementation-declarative-data-stack.
# Further Reading
- My Series
- From Data Engineer to YAML Engineer
- with Modal: Building a cost-effective analytics stack with Modal, dlt, and dbt | Modal Blog
Origin:
Rill | The Rise of the Declarative Data Stack
Created 2024-10-17