Newsletter Link Collection
# SELECT Insights - Practical Data Engineering Guide
# Tools & Frameworks
- Why - Data Developer Platform
- DuckDB ADBC - Zero-Copy data transfer via Arrow Database Connectivity - DuckDB
- Data Version Control in R with lakeFS
- Titan Framework by teej: Tweet
- Airflow’s new CLI: GitHub - kaxil/airflowctl: A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects
- Dagster Pipes: [RFC] Dagster Pipes (previously ext) · dagster-io/dagster · Discussion #16319 · GitHub
- Introducing Python for configuration - Cube Blog
# Databases & Storage
- Reading Very Large Postgres tables - Top 4 Lessons We Learned | Airbyte
- Comparison: Delta, Hudi, Iceberg — A Benchmark Compilation | by Kyle Weller | Aug, 2023 | Medium
- The battle between Hudi and Iceberg:
  - Pro Iceberg: Iceberg and Hudi ACID Guarantees by Tabular
  - Pro Hudi: On “Iceberg and Hudi ACID Guarantees” by OneHouse
# Data Modeling & Warehousing
- After the Modern Data Stack: Welcome back, Data platforms
- Illustration on modeling techniques: Data Modeling in the Modern Data Stack | Towards Dev
- Data Engineering Best Practices - #1. Data flow & Code · Start Data Engineering
- Using a multi-engine data stack, e.g. DuckDB and Snowflake.
- On how important modeling is, see the Meta post The future of the data engineer — Part I | by Analytics at Meta | Apr, 2023 | Medium, followed up with Data engineering at Meta: High-Level Overview of the internal tech stack | by Analytics at Meta | Oct, 2023 | Medium.
- Answering “Why did the KPI change?” using decomposition
# Orchestration & Pipelines
- Using Dagster with git tooling: Dagster + lakeFS: How to Troubleshoot and Reproduce Data
- Data pipeline orchestrators - the emerging force in the MDS?
- Orchestrating unstructured data pipeline with Dagster and dlt
- Analytics Data Stacks for Growth-Stage Businesses using Cube, Dagster and Preset
- Abstracting the Pipelines for Analysts with a YAML DSL, a common requirement.
- Dagster, dbt, duckdb as new local MDS
# Data Integration
- Challenges in data integration: Why data integration will never be fully solved, and what Fivetran, Airbyte, Singer, dlt and CloudQuery do about it | Kestra
- Stop Reinventing Orchestration: Embedded ELT in the Orchestrator | Dagster Blog
# Podcasts & Videos
- Great Podcasts:
  - Dagster and orchestration: Data Engineering Podcast: An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem
  - With Maxime Beauchemin about orchestration: Data Engineering Podcast: Reduce Friction In Your Business Analytics Through Entity Centric Data Modeling
  - dlt to ease data integration: Data Engineering Podcast: Eliminate The Overhead In Your Data Integration With The Open Source dlt Library
- Video on DataFusion and Apache Arrow: D3L2: Discussing Rust, Ballista, Ray SQL, Data Fusion with Andy Grove
# Tutorials & Best Practices
- Ultimate dbt Jinja Functions Cheat Sheet - Datacoves
- Tobiko Data - Efficient Development with the SQLMesh Browser UI
- Factory Patterns in Python
- The Evolution of a Data Platform, part 2
# Misc
- For the love of the game - by winnie and its presentation: DuckDBT: Not a database or a dbt adapter but a secret third thing – DuckCon #3 (San Francisco) - YouTube by Josh Wills
- Critical view: Quo vadis, Data Open source - by timo dechau 🕹🛠
- Example of open data stack: YouTube
- Everyone Wants a Piece of the Pie, Nobody Wants to Bake
- The state of open source and rise of AI in 2023
- Programming language popularity on Reddit
- Why (some data people) Love Rust? - by Daniel Beach
- Announcing DuckDB 0.9.0 - DuckDB
- Scrape & Analyze Football Data with Kestra, Malloy and DuckDB | by Benoit Pimpaud | Medium
- Data Engineering with the Open Source Modern Data Stack (From MDS Fest ‘23) - YouTube
- dbt tests: How to write fewer and better data tests?
- Kaxil Naik on LinkedIn: #apacheairflow #cli #airflow2 #opensource | 54 comments
- Data engineers in Europe : r/dataengineering
- A Comprehensive Guide to MLOps | Saturn Cloud Blog
Origin: Newsletter
Created 2024-03-18