Dbt

Last updated by Simon Späti · Created:

dbt (data build tool) lets you define macros and integrate functionality beyond SQL’s native capabilities for advanced use cases. Macros, written in Jinja, are pieces of code that can be reused multiple times.
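To make the macro idea concrete, here is a rough Python analogy (dbt itself writes macros in Jinja, not Python): a macro is a parameterized snippet that renders to SQL text wherever it is called, so the expression lives in one place. The macro name `cents_to_dollars`, the columns, and the table are hypothetical examples.

```python
# Roughly equivalent Jinja macro in a dbt project (hypothetical example):
#   {% macro cents_to_dollars(column, scale=2) %}
#       round({{ column }} / 100.0, {{ scale }})
#   {% endmacro %}


def cents_to_dollars(column: str, scale: int = 2) -> str:
    """Hypothetical 'macro': returns a SQL expression as text, not a computed value."""
    return f"round({column} / 100.0, {scale})"


def render_model(table: str) -> str:
    """Render a model that reuses the same macro for two columns."""
    return (
        "select order_id,\n"
        f"       {cents_to_dollars('amount_cents')} as amount,\n"
        f"       {cents_to_dollars('tax_cents', scale=3)} as tax\n"
        f"from {table}"
    )


print(render_model("raw_orders"))
```

Changing the rounding rule now means editing one function instead of every query that converts cents.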

dbt is a pivotal component of the Modern Data Stack. It handles the transformation (T) step of ETL, mainly using SQL. For Python-based tasks, integration with Data Orchestrators like Dagster is necessary.

Explore the dbt Roadmap 2022 to understand their vision for integrating dbt and Python.

# Why Dbt? Data Modeling with SQL

dbt is a small database toolset that has gained immense popularity and is the de facto standard for working with SQL. Why, you might ask? Besides Python, SQL is the most used language among data engineers: it is declarative, its basics are easy to learn, and many business analysts or people working with Excel or similar tools already know a little of it.

The declarative approach is handy as you only define the what: you determine which columns you want in the SELECT and which table to query in the FROM clause. You can do more advanced things with WHERE, GROUP BY, etc., but you do not need to care about the how. You do not need to know in which database, partition, segment, or storage the data lives, nor whether an index makes sense to use. All of it is handled by the query optimizer of Postgres (or any other database supporting SQL).
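A small illustration of that division of labor, using SQLite from Python’s standard library instead of Postgres: the query text never changes, yet once an index exists the optimizer decides on its own to use it. The table, index, and data are made up, and the exact wording of the plan output varies between SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    # EXPLAIN QUERY PLAN rows have four columns; the last one is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, customer text, amount real)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [(i, f"c{i % 10}", i * 1.5) for i in range(1000)],
)

# The query only states WHAT we want, never HOW to find it.
sql = "select order_id, amount from orders where customer = 'c3'"

plan_before = query_plan(conn, sql)
conn.execute("create index idx_orders_customer on orders (customer)")
plan_after = query_plan(conn, sql)

print(plan_before)  # a full table scan
print(plan_after)   # an index search, chosen by the optimizer
```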

Caution: putting imperative code into dbt

If we start doing SCD2 and put the how and logic of SCD2 into the models, we start to diverge from the declarative approach. What we want instead is a template that can be reused across models, separating the technical implementation from the business logic.
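A minimal sketch of that separation, assuming a hypothetical in-memory helper rather than dbt’s own mechanism (dbt’s built-in answer for SCD2 is snapshots): the business side only declares the key and the tracked columns in `Scd2Config`, while `apply_scd2` owns the technical logic of closing old versions and opening new ones. The column names `valid_from`, `valid_to`, and `is_current` are assumptions, and deletes in the source are ignored for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional

Row = dict[str, Any]


@dataclass
class Scd2Config:
    """Business logic: what identifies an entity and which changes matter."""
    key: str
    tracked: list[str]


def apply_scd2(history: list[Row], snapshot: list[Row], cfg: Scd2Config, as_of: date) -> list[Row]:
    """Reusable technical logic for SCD Type 2 (hypothetical helper, not a dbt API)."""
    result = [dict(r) for r in history]  # copy so the input history is not mutated
    current = {r[cfg.key]: r for r in result if r["is_current"]}
    for row in snapshot:
        old: Optional[Row] = current.get(row[cfg.key])
        if old is not None and all(old[c] == row[c] for c in cfg.tracked):
            continue  # no tracked change: keep the current version open
        if old is not None:
            old["valid_to"] = as_of  # close the previous version
            old["is_current"] = False
        result.append({**row, "valid_from": as_of, "valid_to": None, "is_current": True})
    return result


# Usage: the same helper serves any model; only the config changes.
cfg = Scd2Config(key="customer_id", tracked=["city"])
h1 = apply_scd2([], [{"customer_id": 1, "city": "Bern"}], cfg, date(2024, 1, 1))
h2 = apply_scd2(h1, [{"customer_id": 1, "city": "Zurich"}], cfg, date(2024, 6, 1))
h3 = apply_scd2(h2, [{"customer_id": 1, "city": "Zurich"}], cfg, date(2024, 7, 1))
for version in h3:
    print(version)
```

A second model with a different key or tracked columns would reuse `apply_scd2` unchanged, which is the point of keeping the how out of the model.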

# Downside of SQL

But let’s face it: SQL also has its downsides. If you have worked extensively with SQL, you know the spaghetti code that usually results. The root issue is reusability: there is no variable we can set and reuse across SQL statements. If you are familiar with them, you can achieve a better structure with CTEs (common table expressions), which let you define specific queries as a named block to reuse later. But this only works within one single query, and is mostly handy when the query is already long.
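A small SQLite example via Python’s `sqlite3` module (table and data are made up): the CTE `customer_totals` is defined once and referenced twice, which removes duplication inside the statement, but a follow-up query cannot see it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (customer text, amount real)")
conn.executemany("insert into orders values (?, ?)",
                 [("a", 10.0), ("a", 30.0), ("b", 5.0), ("c", 55.0)])

# The CTE is defined once and used twice, but only inside this one statement.
sql = """
with customer_totals as (
    select customer, sum(amount) as total
    from orders
    group by customer
)
select customer, total
from customer_totals
where total > (select avg(total) from customer_totals)
order by customer
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('a', 40.0), ('c', 55.0)]

# A separate query cannot reuse the CTE: it no longer exists.
try:
    conn.execute("select * from customer_totals")
    cte_visible_later = True
except sqlite3.OperationalError:
    cte_visible_later = False
```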

But what if you’d like to define your facts and dimensions as separate queries and reuse them in other queries? You’d need to decouple the queries by persisting each result to disk and using that table in the FROM clause of the following query. But if we change something in a query, or even rename it, we won’t notice in the dependent queries. And we need to find out ourselves which queries depend on each other: there is no Data Lineage or dependency graph.

Staying organized with plain SQL takes a lot of work. The database itself offers little help either, as it only executes declarative queries: you need to figure out yourself how to store them in git and how to run them.

# The Power of dbt

That’s where dbt comes into play. dbt lets you declare these dependencies within SQL, using the `ref()` function. You can declaratively build on each query, and you’ll get errors if you change one model but not the models depending on it. You get a lineage graph, unit tests, and more. It’s like having an assistant that helps you do your job: dbt stitches software engineering practices on top of SQL engineering.
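To show what `ref()` buys you, here is a toy re-implementation of the idea in Python, not dbt’s actual parser: it extracts `{{ ref('...') }}` calls with a regular expression, builds the dependency graph, derives a run order with the standard library’s `graphlib` (Python 3.9+), and fails loudly when a model references something that no longer exists. The model names and SQL are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('name') }} or {{ ref("name") }}; real dbt parses Jinja properly.
REF = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")

# Hypothetical project: model name -> model SQL.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "fct_orders": "select * from {{ ref('stg_orders') }} join {{ ref('stg_customers') }} using (customer_id)",
    "customer_ltv": "select customer_id, sum(amount) from {{ ref('fct_orders') }} group by 1",
}


def build_dag(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model to the set of models it references via ref()."""
    dag = {}
    for name, sql in models.items():
        parents = set(REF.findall(sql))
        missing = parents - set(models)
        if missing:
            raise ValueError(f"{name} references unknown model(s): {sorted(missing)}")
        dag[name] = parents
    return dag


# Parents always come before their children in the run order.
order = list(TopologicalSorter(build_dag(models)).static_order())
print(order)

# Renaming an upstream model without updating its dependents is caught up front.
renamed = {("stg_orders_v2" if k == "stg_orders" else k): v for k, v in models.items()}
try:
    build_dag(renamed)
    rename_caught = False
except ValueError as exc:
    rename_caught = True
    print(exc)  # fct_orders references unknown model(s): ['stg_orders']
```

The same graph that yields the run order is what a lineage view draws.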

Because building models becomes so easy, the danger to be aware of is ending up with thousands of tables. Even though dbt’s compilation step catches lots of errors up front, good data modeling techniques are essential to succeed.

# History

dbt started at a small consultancy firm. It originated at RJMetrics in 2016 but really took off in 2018, when Fishtown Analytics promoted it more publicly and the market was more ready for it. Fishtown Analytics later renamed itself dbt Labs.

dbt Labs raised a $150M Series C.

# Fork

# Acquisition by Fivetran

dbt Labs was acquired by Fivetran on 2025-10-13; see the post on Data Engineering Acquisitions.

The new direction presented at the dbt Coalesce conference is openness in the sense of open standards, such as Apache Iceberg. MetricFlow has also been re-opened after having been closed by dbt; it was already open source before, under Transform.co.

More on Wrap-up of Day 1 of dbt Coalesce. Takeaways, ideas and opinion about what’s going on in data ecosystem ✨ This is my first dbt Coalesce in-person after following the event remotely for the last 3… | Christophe Blefari.

[[open-source vs ]]

# Acquiring

See Data Engineering Acquisitions for the 2025-01-14 SDF acquisition.

# Technical Features

# Dbt Fusion

Since acquiring SDF, dbt Labs has added new features, but in a separate dbt version that is no longer free to use.

# State-aware Orchestration

A new feature announced on 2025-10-15 at the dbt Coalesce conference is state-aware orchestration. It looks the same as what Dagster has offered for years, which I call data-aware orchestration.
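A toy sketch of the idea behind state-aware (or data-aware) orchestration, assuming each model stores a fingerprint of its SQL plus the versions of its inputs from the last run: a model is rebuilt only when that fingerprint changes, and a rebuild propagates downstream. This is my own simplification, not how dbt or Dagster implement it; `plan_run`, the fingerprint scheme, and the source version markers are hypothetical.

```python
import hashlib
from graphlib import TopologicalSorter


def fingerprint(*parts: str) -> str:
    """Stable hash of a model's SQL plus the versions of its inputs."""
    return hashlib.sha256("\x1f".join(parts).encode()).hexdigest()


def plan_run(dag: dict[str, set[str]], code: dict[str, str],
             source_versions: dict[str, str],
             last_state: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return (models to rebuild, new state).

    dag: model -> parents (other models or raw sources)
    code: model -> SQL text (sources have no entry)
    source_versions: source -> version marker, e.g. a last-loaded timestamp
    last_state: model -> fingerprint recorded by the previous run
    """
    versions = dict(source_versions)
    to_build: list[str] = []
    new_state: dict[str, str] = {}
    for node in TopologicalSorter(dag).static_order():
        if node not in code:
            continue  # a raw source: its version comes from source_versions
        inputs = [f"{p}={versions[p]}" for p in sorted(dag[node])]
        fp = fingerprint(code[node], *inputs)
        new_state[node] = fp
        versions[node] = fp  # downstream models see this model's version
        if last_state.get(node) != fp:
            to_build.append(node)
    return to_build, new_state


dag = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "fct_orders": {"stg_orders", "stg_customers"},
}
code = {name: f"select ... -- {name}" for name in dag}

run1, state = plan_run(dag, code, {"raw_orders": "t1", "raw_customers": "t1"}, {})
run2, state = plan_run(dag, code, {"raw_orders": "t2", "raw_customers": "t1"}, state)
run3, state = plan_run(dag, code, {"raw_orders": "t2", "raw_customers": "t1"}, state)
print(run1)  # all three models (first run)
print(run2)  # only the orders branch: stg_orders and fct_orders
print(run3)  # []
```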

More thoughts of mine:

My thought too, and I’m happy to see dbt catching up (though in the closed-source dbt). If it’s the same as data-aware, which I assume, it’s something Dagster has done for a while. I wrote about it 3 years ago: https://www.ssp.sh/blog/data-orchestration-trends/ (Data Orchestration Trends: The Shift From Data Pipelines to Data Products)

# Templates

# Dbt Tests

dbt tests
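dbt’s generic data tests such as `not_null` and `unique` work by compiling to a query that selects the failing rows; a test passes when that query returns nothing. A minimal sketch of that mechanism with SQLite, where the helper names and SQL are my own approximation rather than dbt’s compiled output, and the table and data are made up.

```python
import sqlite3


def not_null_sql(table: str, column: str) -> str:
    # Failing rows: any row where the column is NULL.
    return f"select * from {table} where {column} is null"


def unique_sql(table: str, column: str) -> str:
    # Failing rows: any value that appears more than once.
    return (f"select {column}, count(*) as n from {table} "
            f"where {column} is not null group by {column} having count(*) > 1")


def run_tests(conn: sqlite3.Connection, tests: dict[str, str]) -> dict[str, bool]:
    """Run each test query; a test passes if it returns zero rows."""
    results = {}
    for name, sql in tests.items():
        failures = conn.execute(sql).fetchall()
        results[name] = not failures
        status = "PASS" if not failures else f"FAIL ({len(failures)} rows)"
        print(f"{name}: {status}")
    return results


conn = sqlite3.connect(":memory:")
conn.execute("create table customers (customer_id integer, email text)")
conn.executemany("insert into customers values (?, ?)",
                 [(1, "a@example.com"), (2, None), (2, "b@example.com")])

results = run_tests(conn, {
    "not_null_customers_customer_id": not_null_sql("customers", "customer_id"),
    "unique_customers_customer_id": unique_sql("customers", "customer_id"),
    "not_null_customers_email": not_null_sql("customers", "email"),
})
```

Returning failing rows instead of a boolean is what makes such tests easy to debug: the failures can be inspected directly.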

# Dbt Commands

dbt commands
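Many dbt commands (for example `dbt run`, `dbt test`, `dbt build`) accept graph selectors such as `--select fct_orders+` (the model and everything downstream) or `--select +fct_orders` (the model and everything upstream). A simplified Python sketch of how those two operators resolve against a dependency graph; the real selector syntax supports much more (depth limits, tags, paths, state), and the project below is hypothetical.

```python
def _walk(start: str, edges: dict[str, set[str]]) -> set[str]:
    """All nodes reachable from start by following edges (start excluded)."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        for nxt in edges.get(stack.pop(), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def select(selector: str, parents: dict[str, set[str]]) -> set[str]:
    """Resolve a simplified selector: 'm', '+m', 'm+' or '+m+'."""
    children: dict[str, set[str]] = {}
    for node, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(node)
    name = selector.strip("+")
    if name not in parents:
        raise KeyError(f"unknown model: {name}")
    selected = {name}
    if selector.startswith("+"):
        selected |= _walk(name, parents)   # upstream: ancestors
    if selector.endswith("+"):
        selected |= _walk(name, children)  # downstream: descendants
    return selected


# Hypothetical project: model -> models it ref()s.
parents = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "customer_ltv": {"fct_orders"},
}

print(sorted(select("fct_orders+", parents)))  # ['customer_ltv', 'fct_orders']
print(sorted(select("+fct_orders", parents)))  # ['fct_orders', 'stg_customers', 'stg_orders']
```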

# Dbt DELETES

dbt DELETES
Execute SQL Delete Statement in DBT | by Uyen Huynh | Medium
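One place where deletes show up in dbt without writing them by hand is the `delete+insert` incremental strategy (supported on some adapters): target rows whose `unique_key` appears in the new batch are deleted, then the batch is inserted. A minimal sketch of those two statements in SQLite via Python; the table names, columns, and data are hypothetical, and dbt generates the real SQL through adapter-specific macros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table fct_orders (order_id integer, status text)")
conn.executemany("insert into fct_orders values (?, ?)",
                 [(1, "placed"), (2, "placed"), (3, "shipped")])

# New batch produced by the model's incremental query (hypothetical data).
conn.execute("create temp table fct_orders__tmp (order_id integer, status text)")
conn.executemany("insert into fct_orders__tmp values (?, ?)",
                 [(2, "shipped"), (4, "placed")])

with conn:  # commit both statements together, or roll back on error
    # 1) delete target rows whose unique key (order_id) appears in the new batch
    conn.execute("delete from fct_orders where order_id in (select order_id from fct_orders__tmp)")
    # 2) insert the whole batch
    conn.execute("insert into fct_orders select order_id, status from fct_orders__tmp")

rows = conn.execute("select order_id, status from fct_orders order by order_id").fetchall()
print(rows)  # [(1, 'placed'), (2, 'shipped'), (3, 'shipped'), (4, 'placed')]
```

Order 2 is replaced rather than duplicated, order 4 is new, and untouched rows stay as they were.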

# Roadmap

Detailed in dbt Roadmap 2022.

# Future with the Fivetran + dbt Merger

From the

AI-Summary based on [AMA] We’re dbt Labs, ask us anything! on Reddit:

Key discussion points suggested for the AMA covered updates on dbt Core 1.11 and its future roadmap, recent advancements in AI and agentic analytics (mentioning MCP server, ADE bench, and dbt agent skills), and the status of Fusion with a query about its general availability.

Humorous questions were also included, such as the origin of a "nodes_to_a_grecian_urn" reference in their docs site and whether the team gets goosebumps when someone capitalizes “dbt”.

In the comments, dbt Labs addressed concerns about their long-term strategy, especially in light of the Fivetran merger and the balance between dbt Cloud’s advanced features and dbt Core’s open-source nature. dbt-Jason affirmed commitment to dbt Core as the open-source standard, promising that new commercial features wouldn’t come at the expense of core functionality. Grace Goheen further elaborated on their dedication to dbt Core and the open-source community, highlighting recent releases and efforts to improve responsiveness, including re-licensing MetricFlow and parts of Fusion under Apache 2. She also announced new team members joining the dbt Core team to bolster open-source activity.

  • User Sentiment: The sentiment is mixed, with a strong undercurrent of skepticism and direct criticism regarding the impact of the Fivetran merger, perceived abandonment of dbt Core for ELv2-licensed dbt Fusion, and high pricing of dbt Cloud. However, there are also positive comments about the product’s usefulness and specific feature requests. Some users expressed confusion and distrust about the overall product direction.
  • Answer to SQLMesh: The provided text does not directly answer the specific question about consolidation or standardization efforts between dbt and SQLMesh following the Tobiko Data acquisition. Responses from dbt Labs generally refer to the ongoing Fivetran merger and a commitment to keeping dbt Core an open-source standard, without detailing SQLMesh’s future.

Inspiring/Surprising AI summary:

  • Transparency on Open Source Commitment: dbt Labs directly addresses user concerns about the balance between dbt Core (open-source) and commercial products like dbt Cloud and Fusion. They explicitly state their commitment to dbt Core as the open-source standard and highlight re-licensing efforts (MetricFlow, parts of Fusion under Apache 2), which is a notable level of transparency in a public forum, especially amidst a merger.
  • Humor and Informal Tone: The inclusion of whimsical questions (e.g., about “nodes_to_a_grecian_urn” and capitalization of “dbt”) and CEO Tristan Handy’s comment about the “spiciest conversation around dbt materializations” adds a surprising, informal, and engaging element that fosters community interaction.
  • Deep Technical Engagement: The discussions delve into highly specific technical topics such as requests for an “intermediate” table materialization strategy and the “write audit publish” pattern, indicating a willingness to engage with advanced user-driven feature requests rather than just high-level marketing points.
  • AI Vision: The strong emphasis on AI and agentic analytics (MCP server, ADE bench, dbt agent skills) showcases dbt Labs’ forward-looking strategy and belief in dbt’s evolving role in the AI era.
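The write-audit-publish pattern mentioned above can be sketched in a few lines: write the new data to a staging table, run audit queries against it, and only publish (swap it into the production name) if every audit passes. A minimal sketch with SQLite from Python’s standard library; the table names, the three audit rules, and the rename-based swap are assumptions for illustration, and warehouses usually rely on their own mechanisms for atomic swaps.

```python
import sqlite3

AUDITS = {  # hypothetical rules; each query returns the number of offending rows
    "order_id_not_null": "select count(*) from orders__staging where order_id is null",
    "amount_not_negative": "select count(*) from orders__staging where amount < 0",
    "order_id_unique": "select count(*) - count(distinct order_id) from orders__staging",
}


def write_audit_publish(conn: sqlite3.Connection, rows: list[tuple[int, float]]) -> bool:
    """Return True if the batch was published, False if an audit blocked it."""
    # WRITE: build the new version next to production, never in place.
    conn.execute("drop table if exists orders__staging")
    conn.execute("create table orders__staging (order_id integer, amount real)")
    conn.executemany("insert into orders__staging values (?, ?)", rows)

    # AUDIT: any offending row blocks the publish; production stays untouched.
    failed = [name for name, sql in AUDITS.items() if conn.execute(sql).fetchone()[0] > 0]
    if failed:
        print("audit failed:", failed)
        conn.execute("drop table orders__staging")
        return False

    # PUBLISH: swap staging into the production name inside one transaction.
    conn.execute("begin")
    try:
        conn.execute("drop table if exists orders")
        conn.execute("alter table orders__staging rename to orders")
        conn.execute("commit")
    except sqlite3.Error:
        conn.execute("rollback")
        raise
    return True


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; transactions are explicit
ok_first = write_audit_publish(conn, [(1, 9.5), (2, 20.0)])
ok_second = write_audit_publish(conn, [(3, -1.0)])  # negative amount fails an audit
published = conn.execute("select order_id, amount from orders order by order_id").fetchall()
print(ok_first, ok_second, published)  # True False [(1, 9.5), (2, 20.0)]
```

Consumers of `orders` never see the bad batch: it is rejected while still in staging.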

# Alternatives

# Further Readings


Origin: