SQLMesh

Don’t hack custom scripts or use half-baked tools. SQLMesh ensures accurate and efficient data pipelines with the most complete DataOps solution for transformation, testing, and collaboration

Similar to dbt but tries to understand more of the semantics of the SQLs, where dbt is just stitching together blobs of SQL, SQLMesh tries to understand the SQL statements more. That allows them to do the translation from Duckdb-SQL to BigQuery-SQL, because of that awareness.

Blog about it: Tobiko Data - Efficient Development with the SQLMesh Browser UI

From GitHub - TobikoData/sqlmesh: Efficient data transformation and modeling framework that is backwards compatible with dbt.

# Docs

twitter.com/Captaintobs/status/1653486832590684161

# Plans (Environments)

SQLMesh concepts with plans that apply to different environments (prod, dev) are elegant. Even fetchdf is integrated into the CLI.

Also, on the right, you see SQLMesh auto-detecting the new columns as non-breaking and simply applying the (virtual) changes y. x.com

# Virtual Environments

Check more at Virtual Data Environments.

# They have Column-Level Lineage

Tobiko Data - Column level lineage for dbt

Also, comments can be lineaged, with just adding a comment on previous column.

# Installation on top of dbt

Using SQLMesh on dbt Projects for Column Level Lineage 🚀-

# Semantic Understanding

SQLMesh actually understands the SQL you write and improves developer productivity by finding issues at compile time. Built-in column-level lineage provides a deeper understanding of your data model and transpilation makes it easy to run your SQL across multiple engines.

More on Semantic SQL and Semantic Models.

# Using SQLGlot expressions

SQLMesh automatically parses strings returned by Python macro functions into SQLGlot expressions so they can be incorporated into the model query’s semantic representation. Functions can also return SQLGlot expressions directly.

For example, consider a macro function that uses the BETWEEN operator in the predicate of a WHERE clause. A function returning the predicate as a string might look like this, where the function arguments are substituted into a Python f-string:


`from sqlmesh import macro @macro() def between_where(evaluator, column_name, low_val, high_val): return f"{column_name} BETWEEN {low_val} AND {high_val}"`

The function could then be called in a query:


`SELECT a FROM table WHERE @between_where(a, 1, 3)`

And it would render to:


`SELECT a FROM table WHERE a BETWEEN 1 and 3`

Alternatively, the function could return a SQLGLot expression equivalent to that string by using SQLGlot’s expression methods for building semantic representations:


`from sqlmesh import macro @macro() def between_where(evaluator, column, low_val, high_val): return column.between(low_val, high_val)`

The methods are available because the column argument is parsed as a SQLGlot Column expression when the macro function is executed.

Column expressions are sub-classes of the Condition class, so they have builder methods like between and like.

See more at SQLMesh macros - SQLMesh.

# Integrations

# Features

A thoughtful write-up from Reddit: SQLMesh versus dbt Core:

If you want my honest opinion, these are not the main reasons to use SQLMesh (some of them are indirectly, but a lot of them don’t matter that much, especially for smaller teams / data), but there are a lot of other great reasons outside of this:

Python macros (better than jinja + they are testable and type checkable)

Compilation is more than just basic ref and jinja parsing but actually validates the SQL, reducing the likelihood of invalid SQL making it out of dev and into production (you should have a staging environment etc… ideally, but we’ve all worked in some nasty situations where we don’t for some reason + even with that, it’s nice to be able to push out a change without having as much fear regarding stupid mistakes)

Breaking changes and potential needs for backfills are made obvious to users in cases where it may not be so obvious (e.g., changing value of a column or adding a duplicate column name in table ‘a’ upstream and having a “SELECT a., b.” in a downstream model - this would not get caught in DBT)

CTEs can be unit tested (this is awesome for models that are large and have a lot of complex SQL) + if you choose to use mostly or only ‘standard’ (sqlglot convertible) SQL, then for many cases you can actually run almost all of your tests using DuckDB rather than hitting the warehouse

Native table diffing between prod and dev (extremely convenient and extremely valuable for understanding the effects of your change between prod and dev environment). It even works between different engines / warehouses which is awesome for migrations (never got the chance to use it, but I certainly would have liked to have had it for my recent one from redshift to snowflake)

Native batching for backfills - DBT recently added something like this, but it’s not as robust, and I can attest to having fought with DBT in this in the past when needing to backfill a model for 1+ year and wanting to pick the bucket size to be like 7 days at a time and not one mega query of 2 years backfill in one go, which may not be possible (especially in things like bigquery where you can’t just increase the warehouse size for a backfill)

Native understanding of partitioning. Just makes life easy when working with something like Dagster, especially if you are dealing with lots of different types of partitioning (especially intraday). It’s handling of backfills and making sure that models have ran completely means you can just run it and know that once it is done, that everything is up to date and complete (rather than having to do a bunch of extra hacky checks all the time, which are just a lot of mental overhead and boilerplate which are often more error prone in my experience)

SQLMesh also seems en route to start supporting multi-engine repos for things that have the same catalog, which means that potentially if you had an iceberg catalog perse, then you could have a traditional warehouse + Trino for example which could open up huge number of possibilities for federated queries (this is very new and still needs to be ironed out, but something like this hasn’t even begun to be brought up seriously on the DBT core side to my understanding)

# History of SQLMesh

! 2025-01-15: Tobiko - Tobiko Acquires Quary: Evolving the Gold Standard for Data Engineers - acquisition of Quary
- see Data Engineering Acquisitions
2025-09-03: Acquired by Fivetran: Fivetran Acquires Tobiko Data to Power the Next Generation of Advanced, AI-Ready Data Transformation | Press | Fivetran ^32b5a1

# Joining Linux Foundation

2026-03-12: SQLMesh joined the Linux Foundation, see PR

Comment on Reddit to speculate what’s the future of SQLMesh:

It is a curious move indeed, in a way I feel sad to see a great competitor go through a path like this.

I guess it was kinda obvious that Fivetran would not end up running both dbt and swlmesh as part of their product, and it does seem like sqlmesh was used as leverage for the dbt acquisition by Fivetran. Fivetran could have invested further into sqlmesh to make it a bigger and stronger competitor to dbt, and they chose not to.

The way I read the situation now is that sqlmesh will be developed outside the Fivetran umbrella. It could be utilized to get a bit more friendly vibe than Fivetran itself since the community don’t seem to be a big fan of them, and try to get in anywhere while they can. Another alternative could be that Fivetran leadership had to make a decision on what to prioritize and they might have picked dbt over sqlmesh. I have also noticed some important members of the team leaving Fivetran recently, which signals a similar pattern.

Regardless, I have utmost respect for Toby and the team for what they have built, they have definitely pushed the space forward and contributed great ideas. Looking forward to seeing what they’ll do next. Sqlmesh joined linux foundation . What it means : r/dataengineering

# Alternatives

# dbt vs SQLMesh

Just some notes and comments regarding dbt (and dbt fusion) vs. SQLMesh as discussed:

dbt just announced a new closed-source version of it, called dbt Fusion (i have some notes here). Basically dbt core (the open source version) will probably be unmaintained over time by dbt as all new features go into the dbt fusion engine, which needs a paid dbt cloud.
dbt fusion is similar to SQLMesh, Rust based, super fast and does compiling features such as checking data types, or data loads ahead of runtime where as the dbt core does not have these features and you have to run the pipeline to get the errors (more expensive, takes longer)
Due to that, SQLMesh has a great momentum. They also created and use SQLGlot (open-source transpiler) that can read almost any SQL syntax and translate in the others.
SQLMesh: Unless you use the cloud version of them, it’s very CLI heavy, which is good if you want to run it in Kestra or embedded, but less if you want to have people working on it. But they column-level lineage, plans for environments and semantic understanding that is quite sophisticated.

I have used dbt much more than SQLMesh, and dbt is for sure the market leader. But if you start out, it’s a really good idea to check out SQLMesh too, as you are doing.

Origin: dlt and Data Engineering Podcast: Eliminate The Overhead In Your Data Integration With The Open Source dlt Library