SELECT Insights - Bundling with Microsoft Fabric and Orchestration (#2)

Last updated Feb 9, 2024

Hello friends,

A quick note before we start: I changed the name of this newsletter to “SELECT Insights” and changed the structure slightly. I plan to follow this structure for the upcoming editions. If you want to know more, I have more details on newsletter.ssp.sh. On another exciting side note, I also moved the domain from sspaeti.com to ssp.sh. I hope you like it; more on the history in this Tweet. But now, let’s get into this month’s newsletter.

This week’s SELECT Topic is Data Orchestrators. Data Orchestration is a topic we must recognize, given the recent events and trends in the data ecosystem.

# SELECT Orchestration - Dead or Alive? [4 min read]

I kick things off with a thought-provoking topic, ranging from the intricacies of data engineering to the fascinating charm of everyday life experiences.

This is interesting from two standpoints. On one side, Dagster, through their parent company Elementl, raised $33 million in Series B funding. On the other, Microsoft announced its new product with a big bang: Microsoft Fabric. Why is this interesting, you might ask? Because Microsoft is bundling the whole Azure data stack into a single SaaS service, essentially bundling it into Power BI.

Dagster, meanwhile, is bundling the Open Data Stack into a single control plane.

# The Data Orchestration Challenge

Data orchestration is increasingly vital as the hub of the Data Engineering Lifecycle. It combines the core aspects of data integration, transformation, and analytics in a cohesive, manageable workflow. The demand for more sophisticated orchestration tools is growing, as highlighted in the great summary of data orchestration articles I recently read.
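To make the "hub" idea a bit more concrete, here is a minimal sketch of what an orchestrator does at its core: it knows the steps and their dependencies, and runs them in a valid order. This is not how Dagster, Airflow, or Fabric implement it. The task names, the sample records, and the `run_pipeline` helper are all made up for illustration, and the ordering relies only on Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks for illustration; each one reads from and writes to
# a shared context dict that stands in for real storage.
def ingest(ctx):
    # Stand-in for data integration: pretend these rows came from a source system.
    ctx["raw"] = [{"order_id": 1, "amount": 40}, {"order_id": 2, "amount": 60}]

def transform(ctx):
    # Stand-in for transformation: keep only rows with a positive amount.
    ctx["clean"] = [row for row in ctx["raw"] if row["amount"] > 0]

def analyze(ctx):
    # Stand-in for analytics: compute a single metric.
    ctx["total"] = sum(row["amount"] for row in ctx["clean"])

TASKS = {"ingest": ingest, "transform": transform, "analyze": analyze}

# Each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "ingest": set(),
    "transform": {"ingest"},
    "analyze": {"transform"},
}

def run_pipeline(tasks, dependencies):
    """Run every task once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx

order, ctx = run_pipeline(TASKS, DEPENDENCIES)
print(order)         # ['ingest', 'transform', 'analyze']
print(ctx["total"])  # 100
```

Real orchestrators add scheduling, retries, observability, and much more on top, but the dependency graph in one place is the part that gets lost when orchestration is spread across many tools.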

However, I’ve observed a concerning trend: the unbundling of the orchestrator across various tools. Instead of allowing the orchestrator to do its job effectively, we are moving away from this idea and fragmenting the process across different tools and platforms. This is not the direction we should be heading.

There is a lot of hype right now, but there is good news too. Microsoft, for instance, offers excellent no-code, closed-source data platform solutions to kickstart data engineering tasks. They are also betting strongly on open standards, with the open-source data lake table format Delta.io and Spark for computation.

What they still need to include, though, is an open-source orchestrator. They have some closed-source solutions, but these remind me more of the bad old days of SSIS, with many inefficiencies.

On the flip side, the Open Data Stack, which integrates the core needs of the data engineering lifecycle, could be our trustworthy, bundled solution.

# Is the Orchestrator Dead or Alive?

This brings me to the Symposium: Is the orchestrator dead or alive? This engaging series, initiated by Stephen Bailey, invites authors from different fields to discuss the role of the orchestrator in today’s data ecosystem. Some takeaways:

  1. The need for speed and simplicity: Current data orchestrators need to be faster. They need to onboard use cases more quickly and justify displacing managed services or running them through the orchestrator.
  2. Data ingestion: This is a process that an orchestrator must own.
  3. Actions, not meta-narratives: GitHub Actions thrives by focusing on running things rather than on what it should be.
  4. Integration: The most helpful orchestrator is one plugged into everything.
  5. Control vs. chaos: The orchestrator embraces chaos, but of a certain kind: ordered chaos, not wasteful chaos.

Finally, I want to address the contention that Data Engineering is a transitional job. Data engineering is not a job but a field. It’s too broad to be confined to one role. Collaboration between BI engineers, DBAs, DevOps, data scientists, and more is necessary to leverage data effectively. After all, as we know, everyone needs data.

Feel free to dive deeper into the data orchestration discussion at Stephen Bailey’s Symposium. Also, check out the latest orchestration comparison by Christophe Blefari, which covers Airflow alternatives: Mage and Kestra, and Prefect and Dagster.

# UPDATE Engineering - Latest Updates in Data Engineering Tools and Techniques [5 min read]

In this bustling realm of data engineering, let’s take a look at the recent updates that caught my attention:

# JOIN Perspectives [2 min read]

Where nerdy pursuits like blogging, neovim, dotfiles, and coding intersect with life’s subtle nuances and diverse worldviews.

# FETCH Socials - Conversations Stirring Up The Digital World [1 min read]

This is the space where I share intriguing conversations, trending topics, and powerful ideas from around the social media landscape.

# SCAN Books - Through The Lens of Written Papers [1 min read]

Every book opens up a new world of insights and perspectives. Here, I’ll share some of my recent reads across a spectrum of topics. Let’s explore these new horizons together!

# Thanks!

If you are still reading, thanks so much. I hope you enjoyed this update! It turned out longer than I expected (as always). Let me know what you want more or less of; I'm happy to take feedback to improve the topics and style to your liking.

Until next time, happy reading and exploring!
Simon


Republished: Substack
Created 2023-05-25