Data Engineering Acquisitions
Consolidation in the data engineering market is happening quickly: tools from the Modern Data Stack are being unified into bigger data platforms. This note gives an overview of the latest acquisitions across data engineering.
Below is the acquisition overview from 2022 to today.
gantt
title Data Engineering Acquisitions Timeline 2022-2025
dateFormat YYYY-MM
axisFormat %b %Y
section Fivetran + dbt
dbt → MetricFlow :milestone, 2023-02, 0d
dbt → SDF :milestone, 2025-01, 0d
Fivetran → Census :milestone, 2025-05, 0d
Fivetran → SQLMesh :milestone, 2025-09, 0d
Fivetran → dbt :milestone, 2025-10, 0d
section Databricks
Okera :milestone, 2023-05, 0d
MosaicML :milestone, 2023-06, 0d
Arcion :milestone, 2023-10, 0d
Tabular :milestone, 2024-06, 0d
Neon :milestone, 2025-05, 0d
section Snowflake
Streamlit :milestone, 2022-03, 0d
Applica :milestone, 2022-08, 0d
SnowConvert :milestone, 2023-01, 0d
LeapYear :milestone, 2023-02, 0d
Neeva :milestone, 2023-05, 0d
TruEra :milestone, 2024-05, 0d
Datavolo :milestone, 2024-11, 0d
Crunchy Data :milestone, 2025-06, 0d
Select Star :milestone, 2025-11, 0d
section Confluent
Immerok :milestone, 2023-01, 0d
WarpStream :milestone, 2024-09, 0d
section Qlik
Talend :milestone, 2023-01, 0d
Kyndi :milestone, 2024-01, 0d
Upsolver :milestone, 2025-01, 0d
section Data Quality & Observability
IBM → Databand.ai :milestone, 2022-06, 0d
Bigeye → Data Advantage Group :milestone, 2023-06, 0d
Datadog → Metaplane :milestone, 2025-04, 0d
Soda → nannyML :milestone, 2025-06, 0d
section Analytics & BI
Alteryx → Trifacta :milestone, 2022-01, 0d
Hex → Hashboard :milestone, 2025-04, 0d
Coalesce.io → CastorDoc :milestone, 2025-03, 0d
section Streaming & Real-time
Cloudflare → Arroyo :milestone, 2025-04, 0d
Redis → Decodable :milestone, 2025-09, 0d
section Database & Infrastructure
NetApp → Instaclustr :milestone, 2022-04, 0d
Vector Capital → SingleStore :milestone, 2025-09, 0d
^4d105f
# Timeline
The timelines list each year’s acquisitions from 2022 onward, newest first. After that, we discuss related topics such as Bundling vs. Unbundling and the general state of the data engineering ecosystem.
# 2025
- 2025-11-24: Snowflake → Select Star
- 2025-11-18: lakeFS → DVC ^c38a54
- Announcement: A Celebration of Shared Vision: lakeFS + DVC
- 2025-11-04: ClickHouse → LibreChat
- Announcement: ClickHouse welcomes LibreChat
- 2025-10-13: Fivetran → dbt ^746334
- 2025-09-10: Vector Capital Management → SingleStore ^b9f630
- 2025-09-04: Redis → Decodable
- Announcement: Redis to acquire Decodable
- 2025-09-03: Fivetran → SQLMesh
- Announcement: Acquisition (SQLMesh)
- 2025-06-13: Soda → nannyML
- Announcement: Announcement
- 2025-06-02: Snowflake → Crunchy Data ^193bc3
- Announcement: Announcement
- Deal Value: The deal is estimated to be worth around $250 million. Delivering the Most Enterprise-Ready Postgres, Built for Snowflake
- 2025-05-14: Databricks → Neon ^3cf8ff
- Announcement: Announcement
- Deal Value: Databricks bought Neon for a reported $1 billion in equity (source)
- 2025-05-01: Fivetran → Census
- Announcement: Fivetran Signs Agreement to Acquire Census
- Blog: Why Fivetran and Census are joining forces
- 2025-04-30: Hex → Hashboard
- Announcement: Hex Acquires Hashboard
- Blog: Hex has acquired Hashboard
- 2025-04-23: Datadog → Metaplane
- Announcement: Datadog acquires Metaplane
- Blog: Metaplane by Datadog
- 2025-04-10: Cloudflare → Arroyo
- Announcement: Arroyo is joining Cloudflare | Arroyo blog
- Comment: To build R2 SQL
- 2025-03-19: Coalesce.io → CastorDoc
- Announcement: Coalesce Expands Data Platform With CastorDoc Acquisition
- Blog: Building the Future of Data Together: Why Coalesce Acquired CastorDoc
- 2025-01-14: Qlik → Upsolver
- Announcement: Qlik Acquires Upsolver
- 2025-01-14: dbt Labs → SDF
- Announcement: dbt Labs Acquires SDF Labs
- Blog: dbt Labs acquires SDF Labs to advance analytics engineering
# 2024
- 2024-11-20: Snowflake → Datavolo
- Announcement: Snowflake Agrees to Acquire Open Data Integration Platform Datavolo
- Blog: Snowflake snaps up data management company Datavolo
- Deal Value: Approximately $110M (primarily stock, remainder cash)
- Comment: Powered by Apache NiFi, provides single platform for automating and managing both structured and unstructured data flows
- 2024-09-09: Confluent → WarpStream
- Announcement: Confluent Acquires WarpStream to Advance Next-Gen BYOC Data Streaming
- Blog: Confluent Acquires WarpStream
- Deal Value: $220M
- 2024-06-04: Databricks → Tabular (Iceberg)
- Announcement: Databricks Agrees to Acquire Tabular
- Blog: Databricks + Tabular
- 2024-05-22: Snowflake → TruEra
- Announcement: Snowflake Acquires TruEra to Bring LLM & ML Observability to Data Cloud
- Blog: Snowflake acquires TruEra to deliver LLM observability inside Data Cloud
- Comment: Provides tools to test, debug, and monitor ML models and LLM apps in production
- 2024-03-19: Databricks → Lilac AI
- 2024-01-30: Databricks → Einblick
- 2024-01-18: Qlik → Kyndi
# 2023
- 2023-10-23: Databricks → Arcion
- Announcement: After $43B valuation, Databricks acquires data replication startup Arcion for $100M
- Deal Value: $100M
- Comment: Data replication
- 2023-06: Databricks → MosaicML
- Announcement: Databricks Completes Acquisition of MosaicML
- Deal Value: $1.3B
- Comment: One of the biggest AI deals of 2023
- 2023-06-22: Bigeye → Data Advantage Group
- 2023-05-24: Snowflake → Neeva
- Announcement: Snowflake Acquires Neeva to Accelerate Search in the Data Cloud Through Generative AI
- Blog: Snowflake acquires Neeva to bring intelligent search to its cloud data management solution
- Deal Value: Approximately $150-185M
- Comment: Founded by former Google executives; founder Sridhar Ramaswamy became Snowflake CEO in February 2024
- 2023-05-03: Databricks → Okera
- Announcement: Welcome, Okera: Adopting an AI-Centric Approach to Governance
- Blog: A new chapter in our journey: Okera is joining the Databricks family
- Comment: Okera co-founder and CEO Nong Li, creator of Apache Parquet, joined Databricks
- 2023-02-08: dbt → MetricFlow (Transform.co)
- Announcement: dbt Labs Signs Definitive Agreement to Acquire Transform, Accelerating development of the dbt Semantic Layer (updated by dbt)
- Press: Dbt Labs acquires Transform, adding semantic tools to its data analytics platform
- GitHub MetricFlow: The Future of the MetricFlow Project
- 2023-02-07: Snowflake → LeapYear Technologies
- Announcement: Snowflake to Acquire LeapYear
- Deal Value: $62M
- Comment: Differential privacy platform to enhance data clean room capabilities
- 2023-01-19: Snowflake → SnowConvert (from Mobilize.Net)
- Announcement: Snowflake Announces Intent to Acquire Mobilize.Net’s SnowConvert
- Deal Value: $76.3M (approximately $11.6M net of cash acquired)
- 2023-01-06: Confluent → Immerok
- Announcement: Confluent Announces Intent to Acquire Immerok to Accelerate Development of Cloud Native Apache Flink
- Deal Value: Approximately $54.9M
- Comment: Founded by original creators of Apache Flink
- 2023-01-05: Qlik → Talend
- Announcement: Qlik Acquires Talend
- Comment: Both companies owned by Thoma Bravo; Talend was acquired by Thoma Bravo for $2.4B in 2021
# 2022
- 2022-08: Snowflake → Applica
- Blog: Snowflake blog
- Comment: Specializes in decoding and automating complex documents based on deep learning
- 2022-06-27: IBM → Databand.ai
- Announcement: IBM Aims to Capture Growing Market Opportunity for Data Observability with Databand.ai Acquisition
- Deal Value: Estimated ~$150M
- Comment: Data observability platform; validated data observability as critical category in modern data stack
- 2022-04-07: NetApp → Instaclustr
- Announcement: NetApp acquires Instaclustr to deliver open source databases as a service
- Comment: Managed open source databases as a service (PostgreSQL, Kafka, OpenSearch, Cassandra)
- 2022-04: Talend → Gamma Soft
- Comment: Cloud-based data discovery platform based in France, founded 1983
- 2022-03-02: Snowflake → Streamlit
- Announcement: Snowflake acquires Streamlit for $800M
- Deal Value: $800M
- Comment: Popular open source project for building data-based applications
- 2022-01-06: Alteryx → Trifacta
- Announcement: Alteryx Announces Acquisition of Trifacta
- Deal Value: $400M cash + $75M retention pool (RSUs)
- Comment: Cloud-based data preparation platform
# Bundling vs. Unbundling
# Once Called, Software is Eating the World
It was once said that “software is eating the world”. Now the pendulum seems to be swinging back toward more unified and integrated data platforms.
I think the best way for OSS products to survive is to embrace the «Declarative Data Stack» approach, where integration happens with a single configuration file. By integrating with multiple tools, you gain the best of both worlds: a combination of integrated and open-source capabilities, along with end-to-end analytics.
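As a rough illustration of the single-configuration idea, here is a minimal Python sketch. The stack layout, the component names, and the `example-*` tool names are all hypothetical and do not reflect the configuration format of any real tool. The only point is that one declarative definition can describe every component, and the run order can be derived from it instead of being wired by hand.

```python
from graphlib import TopologicalSorter

# Hypothetical stack definition: every key and value here is illustrative,
# not the configuration format of any real tool.
STACK = {
    "ingestion": {"tool": "example-ingest", "depends_on": []},
    "transformation": {"tool": "example-transform", "depends_on": ["ingestion"]},
    "semantic_layer": {"tool": "example-metrics", "depends_on": ["transformation"]},
    "dashboard": {"tool": "example-bi", "depends_on": ["semantic_layer"]},
}


def execution_order(stack: dict) -> list[str]:
    """Derive the run order of the components from their declared dependencies."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(spec["depends_on"]) for name, spec in stack.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in execution_order(STACK):
        print(f"{step}: {STACK[step]['tool']}")
```

Swapping one tool for another then means changing a single entry in the definition, which is what makes the integrated-but-open combination plausible.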
Making money from open source is hard, but I still hope that many will pursue this path. When deciding on a tool, I will always pick the open-source one. To me, it builds trust, and because it is shared as a gift, free for anyone to use, it makes me want to support it more.
Consider the Framework Laptops. They’re fully repairable, with every part replaceable, allowing you to swap out the screen or even the motherboard later on. This is something I want to support. Same with OSS. I believe the strategy shouldn’t be to cash out on OSS, but rather to treat it as a sign of valuing your customers by giving them a gift.
The revenue stream should be independent of the OSS project, if at all possible, so that there is a clear distinction and the open-source project is not confused with the company down the road. That is much easier said than done, but there are still many great data engineering companies doing exactly that. I hope it stays this way.
What do you think: should a vendor still maintain an open-source product, or focus on making money so it can sustain itself over time?
# Fivetran + dbt
# Table Formats Market Updates
See Open Table Formats (Market Updates).
# AI Acquisition
Separate noteworthy AI acquisitions related to data engineering.
- OpenAI reportedly agreed to acquire Codeium, the maker of the Windsurf AI IDE (a VS Code fork), for $3 billion; the deal was later reported to have fallen through
# OSS vs. Closed-Source
Open-Source vs Closed-Source Data Engineering
# Further Reads
Origin: Data Engineering
References: Earning Money with the Open-Source Model—Making Gifts
Created 2025-05-02