🧠 Second Brain

SELECT Insights - What the Heck Is the Open Data Stack?

Last updated Feb 9, 2024

Hello friends,

This week’s SELECT topic is the Open Data Stack. I have written about it before, but I’d like to share my view on how it differs from the Modern Data Stack and what I see behind the term.

# SELECT Open Data Stack - Shaping the Data Engineering Landscape with Open Standards [4 min read]

I kick things off with a thought-provoking topic, ranging from the intricacies of data engineering to the fascinating charm of everyday life experiences.

As data engineering evolves, it’s crucial to understand the potential of open-source solutions that adhere to open standards, providing an integrated and adaptable framework known as the Open Data Stack. I find this term a better fit than ‘Modern Data Stack,’ which has garnered some negativity and created confusion within the industry. So, what exactly is the Open Data Stack, and why should we focus on it?

The Open Data Stack addresses all aspects of the Data Engineering Lifecycle. While it shares the same goal as the Modern Data Stack, the Open Data Stack provides better tool integration due to its open nature, resulting in greater usability for data practitioners. The key word here is open, which implies that the tools or frameworks employed are either open-source or compliant with open standards. This definition also leaves room for tools like Dremio, a data lakehouse platform. While Dremio itself isn’t open source, it builds on open standards like Apache Iceberg and Apache Arrow, which lets larger organizations integrate it without vendor lock-in.

Some data practitioners propose alternative names such as ‘ngods’ (new generation open-source data stack), ‘DataStack 2.0’, or ‘DAD Stack’. Nonetheless, the essence remains the same: better, more integrated tooling, used by more people inside every company, that genuinely understands the data it operates on. This contemporary data stack will differ significantly from its predecessors.

It’s important to note the core distinction behind pairs like ‘old data stack vs. modern data stack’, ‘monolith vs. microservices’, or ‘orchestration vs. choreography’. These terms illustrate the ongoing shift from monolithic, bundled data solutions to a more microservices-driven, unbundled approach. This shift enables data pipelines to operate as ‘microservices on steroids,’ improving scalability and alignment across various code services.
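To make the ‘orchestration vs. choreography’ distinction a bit more tangible, here is a minimal sketch in plain Python. Everything in it is hypothetical and for illustration only: the `extract`, `transform`, and `load` functions, the `EventBus` class, and the topic names are my own stand-ins, not the API of any real orchestrator or message broker.

```python
from collections import defaultdict
from typing import Callable


# Three toy pipeline steps (hypothetical stand-ins for real tools).
def extract() -> list[int]:
    return [1, 2, 3]


def transform(rows: list[int]) -> list[int]:
    return [r * 10 for r in rows]


def load(rows: list[int], sink: list[int]) -> None:
    sink.extend(rows)


# Orchestration: one central function knows every step and its order.
def orchestrate(sink: list[int]) -> None:
    rows = extract()
    rows = transform(rows)
    load(rows, sink)


# Choreography: steps react to events; no step knows the full pipeline.
class EventBus:
    """Tiny in-process publish/subscribe bus (assumption, not a real broker)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload) -> None:
        for handler in self._handlers[topic]:
            handler(payload)


def choreograph(sink: list[int]) -> None:
    bus = EventBus()
    # Each subscriber only knows which event it listens to and which it emits.
    bus.subscribe("extracted", lambda rows: bus.publish("transformed", transform(rows)))
    bus.subscribe("transformed", lambda rows: load(rows, sink))
    bus.publish("extracted", extract())


orchestrated: list[int] = []
choreographed: list[int] = []
orchestrate(orchestrated)
choreograph(choreographed)
```

Both variants end up with the same rows in their sink. The difference is where the knowledge of the pipeline lives: in one central function, or spread across independent subscribers that can be added or swapped without touching the others.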

The Open Data Stack is accessible to, and maintained by, all of its users, fostering an environment where companies can leverage existing, tested solutions instead of re-implementing the critical parts of each data stack component. The ‘open’ aspect makes this stack embeddable with various tools, unlike closed-source services. This level of integration allows you to easily incorporate tools like Airbyte, dbt, Dagster, Superset, and more into your services.
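One way to picture this embeddability is as components that agree on a small shared interface, so any of them can be swapped without rewriting the rest. The sketch below is only an analogy in plain Python: the `Source` and `Transformer` protocols and the `InMemorySource`, `DropNulls`, and `UppercaseNames` classes are hypothetical names I made up, and none of this reflects the actual APIs of Airbyte, dbt, Dagster, or Superset.

```python
from typing import Protocol

Row = dict[str, object]


class Source(Protocol):
    """The 'open standard' for ingestion in this sketch: anything with read()."""

    def read(self) -> list[Row]: ...


class Transformer(Protocol):
    """The 'open standard' for transformation: anything with apply()."""

    def apply(self, rows: list[Row]) -> list[Row]: ...


class InMemorySource:
    """Hypothetical stand-in for an ingestion tool."""

    def __init__(self, rows: list[Row]) -> None:
        self._rows = rows

    def read(self) -> list[Row]:
        return list(self._rows)


class DropNulls:
    """Removes rows that contain a None value."""

    def apply(self, rows: list[Row]) -> list[Row]:
        return [r for r in rows if all(v is not None for v in r.values())]


class UppercaseNames:
    """Upper-cases the 'name' field of every row."""

    def apply(self, rows: list[Row]) -> list[Row]:
        return [{**r, "name": str(r["name"]).upper()} for r in rows]


def run_pipeline(source: Source, steps: list[Transformer]) -> list[Row]:
    # The pipeline depends only on the shared interfaces, never on a
    # concrete component, so each piece can be replaced independently.
    rows = source.read()
    for step in steps:
        rows = step.apply(rows)
    return rows


source = InMemorySource([
    {"name": "ada", "score": 3},
    {"name": "bob", "score": None},
])
result = run_pipeline(source, [DropNulls(), UppercaseNames()])
```

Replacing `InMemorySource` with another class that offers `read()` requires no change to `run_pipeline`, which is the property the ‘open’ in Open Data Stack is meant to give you at the level of whole tools.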

You might wonder which ones to adopt in a world with over 100 tools. This guide introduces the Open Data Stack, emphasizing the advantages of reusing and building upon existing solutions. With the open data stack approach, you no longer need to write custom code for each step of the data engineering lifecycle. Instead, the Open Data Stack allows you to quickly address common challenges, ranging from data extraction and visualization to monitoring and scaling. Consequently, the Open Data Stack ensures that you’re not reinventing the wheel but building upon proven foundations to expedite and streamline your data engineering processes.

Whether you’re just starting to delve into the field or seeking to enhance your existing knowledge, the Open Data Stack provides robust, flexible solutions to help you master data engineering. Check out the GitHub repo Open-Data-Stack to see the Open Data Stack in action and start your journey toward a more effective and efficient data-driven operation.

From the keynote The End of the Road for the Modern Data Stack You Know by dbt Labs:

Better, more integrated tooling, used by more humans inside of every company, that actually understands the data that it is operating on.

This modern data stack—if we still want to call it that!—will be unrecognizable to its former self.

Check more on Open Data Stack, Data Engineering Blog Tags and Bundling vs Unbundling- Monolith Data vs Microservices.

# UPDATE Engineering - Latest Updates in Data Engineering Tools and Techniques [5 min read]

In this bustling realm of data engineering, let’s take a look at the recent updates that caught my attention:

# AI Specific

# JOIN Perspectives [2 min read]

Where nerdy pursuits like blogging, Neovim, dotfiles, and coding intersect with life’s subtle nuances and diverse worldviews.

# FETCH Socials - Conversations Stirring Up The Digital World [1 min read]

This is the space where I share intriguing conversations, trending topics, and powerful ideas from around the social media landscape.

In this month’s social fetch, here are some posts that caught my attention:

# SCAN Books - Through The Lens of Written Papers [1 min read]

Every book opens up a new world of insights and perspectives. Here, I’ll share some of my recent reads across a spectrum of topics. Let’s explore these new horizons together!

# Thanks!

If you’ve made it this far, thank you for reading! Your thoughts and feedback are invaluable to me, so don’t hesitate to let me know what you’d like to see more or less of. I’m always open to suggestions on how to improve both the topics covered and the style of this newsletter.

Please note, for this particular edition of the newsletter, I’ve activated analytics. This is to ensure that my emails are reaching your inbox, especially since I’ve recently made changes to the name and sender. Rest assured, this is a one-time measure, and I will disable analytics for subsequent newsletters. Your privacy is important to me.

Until next time, happy reading and exploring!

Best,
Simon


Republished: Substack
Created 2023-07-16