Data Engineering Vault
Welcome to the Data Engineering Vault, an integral part of my larger Second Brain. This curated network of data engineering knowledge is designed to facilitate exploration, discovery, and deep learning in the field of data engineering. Here, you'll find a rich ecosystem of 1000+ interconnected terms and concepts, each serving as a gateway to deeper insights. Functioning like a Digital Garden for data engineering, this network allows you to organically explore and connect ideas.
Key Topics & Concepts
As you navigate through the concepts, you'll uncover hidden relationships, expanding your understanding and providing a unique, immersive learning experience whether you're a seasoned data engineer or just starting your journey.
Data Engineering Foundations
Data Engineering Concepts, Data Engineering Lifecycle, Data Modeling, Dimensional Modeling, Inmon vs Kimball, Data Engineering Architecture, Normalization, Data Assets, Entity-Centric Data Modeling (ECM), Semantic Layer, Data Lineage, The Role of a Data Engineer
Modern Data Infrastructure
DuckDB, Data Lake Table Format, Delta Lake, Apache Iceberg, Data Lakehouse, Open Table Format, Modern OLAP Systems, Apache Arrow, DataFusion, Microsoft Fabric, Cloud Data Warehouses, Open Data Stack, Declarative Data Stack, Modern Data Stack, Data Lake File Formats, Metrics Layer
Data Transformation & Processing
SQL, dbt, Python, Functional Data Engineering, Extending SQL for Analytics, ETL vs ELT, Stream Processing, Push-downs, Declarative vs Imperative, Semantic SQL, Data Orchestrators, Kestra, Dagster, Apache Airflow, SQL Query Engine
Modern Analytics Approaches
Open-Source Data Engineering Projects, Metrics, Pivot Table, Semantic Models, Data Governance, Data Mesh, Data Catalog, Master Data Management, Reverse ETL, Data Warehouse Automation, Medallion Architecture, Analytics API, Personalized API, Traditional OLAP Cube Replacements
Specialized Data Technologies
Data Contracts, Data Product, Change Data Capture (CDC), Snapshotting, Slowly Changing Dimension, Time Travel, ACID Transactions, Schema Evolution, Schema Drift, Software-Defined Asset, Data Integration CLI tools, Cube, VertiPaq, Idempotency
Data engineering is a term that has shifted over the years. The roles of Database Administrator (DBA), ETL Developer, and Business Intelligence Specialist merged with software engineering into the Data Engineer, a title that emerged with the growth of data.
It’s still not well defined. The latest book, Fundamentals of Data Engineering (Joe Reis, Matt Housley), attempts a definition and probably does it best as of today, so the picture is getting clearer. Besides several boot camps, universities are also starting to offer degrees in data engineering, as they did for data science before. Let’s start by defining what data engineering is.
# What is Data Engineering
Data engineering is the less famous sibling of data science. Data science is growing like there is no tomorrow, and so is data engineering, though it is talked about much less. Compared to existing roles, a data engineer combines a software engineer and a business intelligence engineer, with added big data abilities such as the Hadoop ecosystem, streaming, and computation at scale.
Businesses create more reporting artifacts, and with more data that needs to be collected, cleaned, and updated in near real-time, complexity is expanding daily. As a result, more programmatic skills are required, similar to software engineering. The emerging language at the moment is Python (more in The Tool Language, Python), which is used in engineering with tools such as Apache Airflow, Dagster, and other Data Orchestrators, and in data science with its powerful libraries. As a BI engineer today, you use SQL for almost everything, except when using external data from an FTP server, for example; for that you would use bash and PowerShell in the nightly batch jobs. But this is no longer sufficient, and because developing and maintaining all these requirements and rules (called pipelines) becomes a full-time job, data engineering is needed.
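To make the idea of a pipeline concrete, here is a minimal sketch in Python of the collect, clean, and update steps described above. Everything in it is an assumption for illustration: the `RAW_CSV` sample stands in for a file fetched from an external source, the `orders` table and the cleaning rules are hypothetical, and an in-memory SQLite database replaces a real warehouse. It uses only the standard library.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, standing in for a file pulled from an external source.
RAW_CSV = """order_id,customer,amount
1, Alice ,10.50
2,Bob,
3,carol,7.25
"""


def extract(text):
    """Collect: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: trim and normalize names, drop rows that have no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # illustrative rule: an order without an amount is skipped
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Update: write the rows; INSERT OR REPLACE keeps re-runs idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn, text=RAW_CSV):
    """Chain the three steps, as a nightly batch job might."""
    load(conn, clean(extract(text)))


conn = sqlite3.connect(":memory:")
run_pipeline(conn)
run_pipeline(conn)  # a second run replaces the same keys instead of duplicating rows
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

In practice, a data orchestrator would schedule and monitor steps like these rather than a single script, but the shape of the work stays the same: small functions with explicit rules that can be re-run safely.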
# Evolution of Data Engineering
- The history and state of data engineering; the state of data engineering
- Data Engineering, the future of Data Warehousing? | ssp.sh
- Business Intelligence meets Data Engineering with Emerging Technologies | ssp.sh
- The Evolution of The Data Engineer: A Look at The Past, Present & Future
# Getting Started with Data Engineering
Here are additional resources that can further enhance your understanding of data engineering. Whether you’re just starting out or looking to deepen your expertise, these resources are handpicked for their clarity, depth, and practical insights.
# Must-Read Articles
Begin your journey with the “holy trinity” from Maxime Beauchemin, defining the essence of data engineering:
- The Rise of the Data Engineer
- The Downfall of the Data Engineer
- Functional Data Engineering — a modern paradigm for batch data processing
# Community and Learning
Don’t miss out on these foundational reads and thought leaders in the field:
- Books of Data Engineering – A selection of essential reads for every data engineer.
- People of Data Engineering – Learn from the pioneers and current leaders shaping the data engineering landscape.
- Data Engineering Glossaries & Handbooks – Glossaries and handbooks that explain the complex terms of data engineering.
- RSS feeds for Data Engineering – My list of the best data engineering blogs as RSS feeds.
- Data Engineering Whitepapers – Whitepapers that define the foundation of data engineering.
- Data Engineering Blogs and Newsletters – Insightful articles, podcasts, and newsletters on data engineering.
- Data Engineering YouTube – Popular YouTube channels focusing on data engineering.
- Learning Data Engineering – More resources such as hands-on projects, courses, and boot camps to start learning.
Feel free to explore, learn, and contribute to this ever-growing field. Your journey in data engineering is just beginning.
Origin:
Data Engineering, the future of Data Warehousing?
References:
Created: 2021-10-11