SELECT Insights - Practical Data Engineering: A Hands-On Real-Estate Project Guide
Hello friends,
This week’s SELECT topic is my updated GitHub repo. I spent many days upgrading it from ancient versions of Dagster, Delta Lake, and MinIO to the latest ones.
I am also changing the format from a link-heavy newsletter to a more written one. I find stacks of link lists less attractive, and writing motivates me to write to you all more. I’ll add the links I encounter to a dedicated page in the Newsletter Link Collection, e.g., the collection from the last issue of this newsletter.
# SELECT Practical Data Engineering Project [2 min read]
Usually, I start with a thought-provoking topic, ranging from the intricacies of data engineering to the fascinating charm of everyday life experiences.
I updated my Practical Data Engineering project on GitHub✨.
Three years ago, I published a hands-on data engineering project that showcased the complexity of the data engineering landscape. Today, I’m announcing a significant update to this venture on GitHub.
What fascinated me was that despite the data engineering space moving extremely fast, the core of my project, powered by carefully chosen tools from the Open Data Stack, remains relevant to this day. This project is my most searched blog post on Google, which motivated me to update it.
Please check the guide on GitHub: Practical Data Engineering: A Hands-On Real-Estate Project Guide. I also added a short YouTube video that shows how to install and navigate the project.
# What’s New?
- Project Enhancements: The upgrade was substantial, with over 2,070 lines of code added or revised. I mainly upgraded the tools to their latest versions and made the project run again, mostly adapting to the latest Dagster, Delta Lake, and MinIO changes.
- Goodbye, Apache Spark (locally): I’ve abandoned Spark for local development due to its setup complexities and integration issues with Delta Lake. Instead, I’ve embraced delta-rs, a more straightforward and efficient path to managing Delta Tables in Python.
Future plans include integrating Rill Developer for local data analysis fun, upgrading the Jupyter notebooks to delta-rs, and adding DuckDB or Polars. Please feel free to open a PR if you wish to update or improve any part of the project. Your insights, feedback, and contributions are highly welcome.
This project started as a learning project for myself, but it’s also a good community project for everyone to learn and experience multiple facets of data engineering. It offers insights into the application of modern data stacks and showcases how to navigate the rapidly evolving landscape of data engineering. I hope you enjoy it.
More on the GitHub Project:
# UPDATE Engineering - Latest Updates in Data Engineering Tools and Techniques [1 min read]
In this bustling realm of data engineering, let’s look at the recent updates that caught my attention.
There were a few discussions about whether the Modern Data Stack is dead. I always find these discussions fascinating, as I live in Europe, where these terms are only slowly getting known and recognized. Also, updating my three-year-old project, where I used many tools from the Open Data Stack, made me realize that the space is not changing as fast as we tend to believe when following data people on Twitter or the latest on LinkedIn.
The other recurring theme is going back to the roots, which I am a big fan of. While writing my book, I keep exploring what has stood the test of the Lindy Effect: the longer something has been around, the higher the chance it will be around for the same amount of time again. That is somewhat calming when we constantly think or read about the latest thing.
Check all the links in the Newsletter Link Collection.
# JOIN Perspectives [3 min read]
Where nerdy pursuits like blogging, Neovim, dotfiles, and coding intersect with life’s subtle nuances and diverse worldviews.
Digital Minimalism with Tiago Forte and The Minimalists. I recommend this talk, as they go deep on what matters. Here are some thoughts of mine on Digital Minimalism. A similar talk that also inspired me was Jacob’s: staying true to himself, listening to his feelings, and following his instincts early on. He declined Quincy Jones and created his own (pathless) path. Intriguing listen.
Modal is the tech built by Erik Bernhardsson, one of the early developers at Spotify. Instead of using Kubernetes, he built an infrastructure that easily integrates into the Python workflow. Read the Tweet with some insights into Modal and why they built it.
This is an interesting article by Simon Willison about scraping data with a database (SQLite) and Git. Also see Simon’s earlier article on Git scraping: track changes over time by scraping to a Git repository.
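To make the Git scraping idea concrete, here is a minimal sketch of the loop as I understand it: take a fetched snapshot, normalize it so diffs stay small, and commit only when something changed. It is not Simon Willison’s code. The function names (`normalize_snapshot`, `write_if_changed`, `commit_snapshot`) are hypothetical, and I assume the scraped data is JSON and that the target folder is already a Git repository.

```python
import json
import subprocess
from pathlib import Path


def normalize_snapshot(raw_json: str) -> str:
    """Pretty-print JSON with sorted keys so Git diffs show only real changes."""
    return json.dumps(json.loads(raw_json), indent=2, sort_keys=True) + "\n"


def write_if_changed(path: Path, content: str) -> bool:
    """Write content to path; return True only if the file was new or different."""
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False
    path.write_text(content, encoding="utf-8")
    return True


def commit_snapshot(repo_dir: Path, file_name: str, message: str) -> None:
    """Stage and commit one file. Assumes repo_dir is an existing Git repository."""
    subprocess.run(["git", "add", file_name], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
```

Run on a schedule (cron or a CI job), you would fetch the source, pass the body through `normalize_snapshot`, and call `commit_snapshot` only when `write_if_changed` returns `True`. The Git history then becomes the change log of the data.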
File over app is the principle that a note-taking app shouldn’t be bigger than its format. It’s an homage to Markdown, where tools like Obsidian use plain-text files to store your data, which will outlive any app. If you are interested in this content, please also check out the essay on local first. I also recommend the podcast with Martin Kleppmann on local-first.fm.
Also, check out my little Video on Vim with Obsidian. Or the daily doses of today’s graph.
The Final Chapter of My First Startup is a fascinating and human story behind a VC startup.
This is a story about a different kind of developer than the ones we meet all the time, the so-called 99% of developers: the Dark Matter Developers.
After writing many CVs in Word and Adobe InDesign to make them fancy, I moved back to Word and other formats. I recently moved my CV to Markdown and online. This helps me focus on the content and always be up-to-date. Instead of sending a static PDF, I can send a link that includes not only my CV but all the work I have done online, which I update continuously.
# FETCH Socials - Conversations Stirring Up The Digital World [3 min read]
This is where I share intriguing conversations, trending topics, and powerful ideas from around the social media landscape.
In this month’s social fetch, here are some posts that caught my attention:
My data modeling articles got many views on Christophe Blefari’s excellent data news Tweet.
Here are some new feature images for my second brain or book cover. Don’t worry; I asked my sister, who is a professional designer, to make an actual cover, not like the toy AI-generated ones 😉.
Don’t jump on the latest shiny new Personal Knowledge Management (PKM) system. Use what exists, e.g., Obsidian and vim :).
Many have asked me how to get started with data engineering. I suggest solving a problem or something you are passionate about with an actual project. I have collected a list of projects if you need help—get inspired and choose according to your skills. Tweet, Open-Source Data Engineering Projects.
Some discussions about “Influencers” and why they succeed with writing. On that very topic, there are some no-fluff, not clickbaity data influencers.
Will Airflow become obsolete in the coming years? My opinion is yes. Whenever I need to use it, it forces me to write the wrong code, chunk everything into one single DAG, and bundle technical and business logic together. I might be more sensitive, as I have used better tools for many years now, but Airflow is still the default for many.
Probably something we are all guilty of here and there: Data Engineers overcomplicate things 🙃.
# SCAN Books - Through The Lens of Written Papers [3 min read]
Every book opens up a new world of insights and perspectives. Here, I’ll share some of my recent reads across various topics. Let’s explore these new horizons together!
Slow Productivity by Cal Newport: This is a fantastic book so far. It showcases why we should quit the race of everyday life and slow down for a compounding effect instead of overnight success. The same is true for money: investing long-term instead of gambling in a casino.
He describes pseudo-productivity, which is the norm these days: instead of being truly productive, we try to be as busy as possible to show that we are doing something. The more chats, emails, and artifacts we ship, the more productive we seem from the outside. This happens because there is no system to measure productivity. I enjoyed all his books on deep work and digital minimalism, and this book is no different.
I also read the Elon Musk Biography. It’s very long, but I loved every page (or word as I listened to it on Audible).
It is inspiring. It shows how an exceptional, hard-working Musk is doing everything for humanity in an unhealthy way. He is also, to an extent, sick. He has dark sides that come out sometimes, which are also a sign of his mental state, which he got from his dad and from his hard work and pressure.
He is thriving in chaos; whenever there is a calm time, he does something new, as he can’t stand the status quo. It’s also inspiring how he has a particular way of working to make things successful. But in doing that, he passed some laws and expected people to work and spend all their lives in the company. If, on a Saturday, there were no people, he would get angry and order people to the site. He often asks for unreasonable timelines to push people. He is highly product-oriented to the point that he does no marketing, as he says you can’t do marketing for a lousy product, and a good product will speak for itself.
Also, How Will You Measure Your Life? by Clayton M. Christensen is a good book if you want to look beyond just work and think about how you’ll measure your life. This book was reassuring and strengthened many things I already knew.
It reminded me that finding your principles is the key. Follow them and align your life with them so you have a happy life. It was also helpful for me as a dad with a family, to pass on the same principles and values to my kids. Be intentional about your values. It’s hard to find them. They won’t be sent to you; you need to make them. But be aware that it is a process, not an event.
Also, the biggest reason for happiness is relationships. Try to have a good circle of friends and family. The book distinguishes between emergent and deliberate life strategies, similar to how Paul Millerd’s Pathless Path contrasts with the Default Path.
# Thanks!
If you’ve made it this far, thank you for reading! Your thoughts and feedback are invaluable to me, so don’t hesitate to tell me what you’d like to see more or less of. I’m always open to suggestions on improving both the topics covered and the style of this newsletter.
Until next time, happy reading and exploring.
—Simon
Republished:
Substack
Created 2023-07-16