Data Engineering: Trends and Predictions (2022-2026) 🔮

Last updated by Simon Späti

Here I’ll list my predictions and trends, mainly for myself and to keep a record of what’s happened in the past.

It helps me, and potentially you, to see that the higher-level changes are not that big. We still need Data Engineering Fundamentals, and most things predicted or hyped for next year might just swing back a year or two later. So don’t chase the Hype Cycle of Data Engineering; observe the market and decide based on your own assessment for yourself and your company. The Data Engineering Toolkit I wrote is still relevant for most data work, as it covers fundamentals rather than just new shiny tools.

For example, in 2024, I am predicting many of the same things as the year before. I guess things are just moving more slowly. Being slower can also save a lot of money, as you don’t need to upgrade every hype cycle.

Preliminary thoughts below

Again, the notes below are somewhat preliminary and mostly random updates for myself. Also, what are your predictions?

# 2026 Predictions

Preliminary thoughts on the field as of 2025-11-27:
I believe in 2025, AI was more slop than anything else. Obviously, we need to distinguish: tools like Claude Code are great helpers, but every layer above is mostly just a wrapper. In 2026, I predict and hope we get less hype and more actual value. AI is a tool, not the selling criterion, so no longer putting it everywhere in the name or sales pitch will help the whole industry. People will realise a little more that it’s not the silver bullet everyone thought it was.

On that note, we’ll see more guardrails and ways to manage AI agents, as in the old days with Master Data Management, where people approved and stewarded data before it was allowed into production. The problem is that agentic generations are so easy and fast that the verification process is very hard to keep up with.

# 2025 Observations

A couple of preliminary thoughts on the field as of 2025-11-27: More DevOps is one change I saw; see more in The State of DevOps in Data Engineering.

Another one is Small Data Stacks powered by CLIs like Rill, dbc, dlt, and DuckDB. These tiny binaries or CLIs can plug and play into any stack and are super powerful. It’s a theme I saw more of in 2025 on one side, whereas on the other side we also see more unified data platforms.

The pendulum swung back again between Bundling vs Unbundling: with the recent Data Engineering Acquisitions, especially around Fivetran, Databricks, and Snowflake, which seem to buy everything they can, it’s currently shifting towards a consolidated Data Platform. With DevOps trends, the demand for a simpler data platform that just works out of the box has risen as well. Instead of talking to 4-5 vendors, you get it all from one, like back in the days when we used SAP, Oracle, and Microsoft.

To be continued..

# 2024 Random thoughts

Data engineering is still going strong. Stronger than ever, I’d say, especially since every industry focuses on data. AI won’t take our jobs; quite the opposite, as there will be more chaos, and people who know how to model the data and its flow, understand the business requirements, and can deliver high-quality insights will always be needed.

What will change, though, is how we learn and how we use the data. Once we’ve centralized, cleaned, and automated the data, we can do cool stuff with advanced technology. The presentation will be more “fancy,” hopefully more insightful, and easier to understand. Throughout my career, a key task has always been to present the data understandably. Because no matter how fancy your pipeline, tooling, or even your profound insights, if the presentation is not up to it or the data quality is terrible, no one cares.


On the other hand, I’m super stoked about how far we’ve come tooling-wise. I still vividly remember the times when I was creating the same ETL pipeline, either with PL/SQL or T-SQL or sometimes in bash, at every company all over again. Sometimes, I still feel we’re stuck in the same loop, building the same things to this day. But zooming out, it’s clear that open source has come a long way.

I can spin up Airbyte and have a full-blown ingestion tool, I can use Dagster for orchestration that has all the necessary functions baked in (and much better than any one of us could build alone), and I can choose any open-source BI tool to visualize the data. All for “free”.

This shortcut is mind-blowing to me. The fun part starts for us engineers when we want to bring these tools together. But if you are mindful, this is a much better starting point than starting from scratch at every company, where the integration had to be built as well. And this time, we can use modern languages like Python or Rust instead of bash :).

LinkedIn Post, Tweet

# 2024 Predictions

Predictions as of 2024-01-19.

  • We’ll be back to the fundamentals and patterns we used over the years in data engineering, either with Data Modeling or other topics around the Data Engineering Lifecycle with security and data governance (including the fancy new name, data contracts). Software engineering practices applied to data engineering will increase even more.
  • The Open/Modern Data Stack will be more widely used, especially in Europe and non-US countries. But more parts of the stack will die due to economic challenges.
  • Data Lake Table Formats will be used more heavily as prominent vendors integrate them into their platforms (Snowflake with Iceberg Tables, and Microsoft with the Delta format powering Microsoft Fabric), creating Open Standards for the ecosystem.
  • Many people will use more data due to the AI hype as they notice they need regularly updated, good-quality data to make valuable predictions or support any generative AI use case.
  • The declarative trend continues, with more tools from the modern data stack betting on it. For example, Kestra is a full-fledged YAML orchestrator, Rill Developer is BI as code, dlt is data integration as code, and many more introduce such models; interestingly, many of them use DuckDB under the hood.
  • GDPR is still ongoing, especially with the Google Analytics 4 change. Many companies have changed to fully anonymized analytics, which is good. I have used GoatCounter for years.
  • The Rust hype in data engineering is less loud, but many rewrites are underway, and the completed ones dominate the market. Ruff is a linter that is 10-100x faster than the defaults, Polars is the fastest data frame out there, many times faster than Pandas, and so on.
  • The Semantic Layer has yet to find its place in the business intelligence world, but the concept will spread more due to its leading forces like Cube and dbt.
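
To make the declarative trend from the list above a bit more concrete, here is a minimal sketch in plain Python. It is not how Kestra, Rill, or dlt work internally; the `PIPELINE` spec, the `TRANSFORMS` registry, and `run_pipeline` are hypothetical names made up for illustration. The only point is the split between what should happen (a plain data structure, which in a real tool would typically live in YAML) and how it runs (a small generic engine).

```python
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical registry of reusable step implementations (the "how").
TRANSFORMS: dict[str, Callable[..., list[Row]]] = {
    "filter_min": lambda rows, column, minimum: [
        r for r in rows if r[column] >= minimum
    ],
    "rename": lambda rows, old, new: [
        {(new if k == old else k): v for k, v in r.items()} for r in rows
    ],
    "sort_by": lambda rows, column: sorted(rows, key=lambda r: r[column]),
}

# Declarative spec (the "what"): only step names and parameters, no logic.
PIPELINE: list[dict[str, Any]] = [
    {"step": "filter_min", "column": "amount", "minimum": 10},
    {"step": "rename", "old": "amount", "new": "amount_usd"},
    {"step": "sort_by", "column": "amount_usd"},
]


def run_pipeline(rows: list[Row], spec: list[dict[str, Any]]) -> list[Row]:
    """Apply each declared step in order and return the transformed rows."""
    for step in spec:
        # Everything except the step name is passed on as keyword arguments.
        params = {k: v for k, v in step.items() if k != "step"}
        rows = TRANSFORMS[step["step"]](rows, **params)
    return rows


# Made-up sample data for illustration.
orders = [
    {"id": 1, "amount": 25},
    {"id": 2, "amount": 5},
    {"id": 3, "amount": 12},
]
result = run_pipeline(orders, PIPELINE)
print(result)
```

Changing the pipeline then means editing the spec, not the engine, which is roughly why such definitions are easy to version, review, and generate.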

A recent poll by Eckerson Group showcased that 85% of respondents believe we need more data engineers in 2024, not less. Data democratization and AI initiatives make data engineering one of the busiest jobs in tech - even as GenAI boosts productivity. Oliver Molander, LinkedIn

# 2023 Predictions and Anticipations

Anticipations:

  • Data modeling comes back with the explosion of the MDS: people started to create a mess, and data modeling, and modeling in general, will help on all levels.
  • Also, people in enterprises cannot grasp how to use the MDS.
  • AI and generative AI with ChatGPT. It still needs to find its way into data, but many are focusing on it to make sense of it (more hype atm than anything else).
  • The year of bundling of the MDS. Startups are seeing layoffs and getting bundled (Transform merged into dbt, layoffs across the MDS stack).

Some more predictions/anticipations of mine 🔮:

  • I agree that Rust is becoming more mainstream as a data engineering language
  • Spark will compete with the Rust-based Ballista/Arrow/DataFusion
  • Modern Data Stack will be renamed and be more known outside of the data bubble
  • Declarative orchestration will be an acknowledged key component
  • Semantic Layers will gain adoption with dbt and Cube
  • DuckDB will be the new standard when working with data
  • Open standards will be a key to all of the above and will consolidate MDS into a few key components

Predictions of Zach Wilson on “My bold 5 year predictions about #dataengineering”:

  • Streaming data eng jobs account for 15-20% of all data eng jobs, but pay the most
  • Rust becomes a mainstream data engineering language
  • Spark starts looking like Hive does now
  • Data engineers will need to grow broadly into either #dataanalytics or #softwareengineering to stay competitive
  • We’ll start seeing blockchain data engineer roles which require a firm understanding of smart contracts, distributed compute, and #machinelearning.

As of 2022-10-16 (Twitter/LinkedIn):

  • Declarative approach everywhere (from Kubernetes, where we have infrastructure as code, to orchestration as code and integration as code with our low-code; it goes into every discipline).
  • Same underlying approach with the rise of the Semantic Layer (basically a declarative approach for metrics).
  • Metadata trends are constantly growing, with tools for data cataloging, data lineage, and data discovery.
  • Rust will be the future of performance-intensive applications in data. It will most probably be used the way Spark is today.
  • Vectorized in-process databases such as DuckDB are here for small data. And newer vector databases such as Pinecone, Qdrant, etc. are supporting the AI wave behind the curtains. AI is always a data game, and not the other way around.
  • With regulations like GDPR and CCPA, privacy and governance are rising in every small to big company.
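
The idea of a semantic layer as “a declarative approach for metrics” from the list above can be sketched in a few lines of Python. This is an assumption-heavy illustration, not how Cube or dbt implement it: the `Metric` class, the `to_sql` helper, and the `orders` table with its `amount`, `status`, and `country` columns are all hypothetical. The metric is defined once as data, and every query is generated from that single definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical metric definition: the one place where 'revenue' is defined."""
    name: str
    table: str
    expression: str                 # SQL aggregate, e.g. "SUM(amount)"
    filters: tuple[str, ...] = ()   # optional WHERE conditions


def to_sql(metric: Metric, group_by: tuple[str, ...] = ()) -> str:
    """Compile a metric plus the requested dimensions into a SQL string."""
    select_cols = [*group_by, f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {metric.table}"
    if metric.filters:
        sql += " WHERE " + " AND ".join(metric.filters)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


# Made-up example metric; table and column names are assumptions.
revenue = Metric(
    name="revenue",
    table="orders",
    expression="SUM(amount)",
    filters=("status = 'paid'",),
)

by_country = to_sql(revenue, group_by=("country",))
total = to_sql(revenue)
print(by_country)
print(total)
```

Both queries share the same definition of revenue, so a dashboard and an ad-hoc analysis cannot silently disagree on what “paid revenue” means.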

As of 2023-05-19. Tweet, LinkedIn and Reddit Discussion.

# Further Reads


Origin: Benjamin Rogojan on LinkedIn: FACT! 2022 is coming to an end. What is the state of data infra? | 24 comments
References: 12 Things You Need to Know to Become a Better Data Engineer in 2023 | Airbyte
Created 2022-11-16