🧠 Second Brain


Data Engineering Vault: A Second Brain Knowledge Network

Last updated Nov 10, 2024

Welcome to the Data Engineering Vault, an integral part of my larger Second Brain. This curated network of data engineering knowledge is designed to facilitate exploration, discovery, and deep learning in the field of data engineering. Here, you’ll find a rich ecosystem of 1000+ interconnected terms and concepts, each serving as a gateway to deeper insights.

Functioning like a Digital Garden for data engineering, this network allows you to organically explore and connect ideas. As you navigate through the concepts, you’ll uncover hidden relationships, expanding your understanding and providing a unique, immersive learning experience whether you’re a seasoned data engineer or just starting your journey.

I invite you to explore the entry topics and notes below, discover new connections in the graph or backlinks, or search for a data engineering term or concept. Please enjoy.

Topic Notes
Data Engineering Concepts: Classical Architecture of Data Warehouse • Data Engineering Architecture • Data Engineering Lifecycle • Data Modeling • Anchor Modeling • Bitemporal Modeling • Entity-Centric Data Modeling (ECM) • Dimensional Modeling • Focal Modeling • Conceptual Data Model • Logical Data Model • Physical Data Model • Inmon vs Kimball • Normalization • Relational Model • Data Virtualization • Data Contracts
Data Engineering Tools and Technologies: SQL • Python • Apache Arrow • Ballista (Arrow) • DataFusion • DBT • Data Lake • Data Lakehouse • Data Warehouse • ETL • ELT • Hadoop • Kubernetes • OLAP • OLTP • Storage Layer • Data Lake File Formats • Data Lake Table Format • Data Modeling Tools • Data Modeling Frameworks • DuckDB • Stream Processing
Data Engineering Practices: Data Engineering Approaches • Data Modeling • Open-Source Data Engineering Projects • Data Governance • Data Integration CLI tools • Data Orchestrators • Data Product • Data Warehouse Automation • Master Data Management • Semantic Layer • Personalized API • SQL IDEs • Data Catalog • ELT • ETL • EtLT • ELTP • Software-Defined Asset • Data Contracts
Modern Data Engineering: Cloud Data Warehouses • Closed-Source Data Platforms • Data Lake Table Format • Modern Data Stack • Modern OLAP Systems • Open Data Stack • Serviced Cloud and Analytics • Snapshotting • The Role of a Data Engineer • Declarative vs Imperative • Notebooks • Reverse ETL
Data Engineering Management and Analysis: BI Tools • Bus Matrix • Business Intelligence • Data Assets • Data Engineer Job Description and Skills • Data Engineer vs Software Engineer • KPI • Managed Data Stacks • Metrics • Super Table • Metrics Layer
Data Engineering Design and Development: Functional Data Engineering • SSAS • Stackable • Traditional OLAP Cube Replacements • Traditional OLAP Cubes • Delta Load a Data Warehouse
Additional Data Engineering Resources
Data Engineering Design Patterns Book 📖.
Data Engineering Glossaries • People of Data Engineering • Books of Data Engineering • Data Engineering Podcasts • RSS feeds for Data Engineering • Data Engineering Whitepapers

Stay Updated

If you find the content of my Data Engineering Glossary valuable and want to stay updated, consider subscribing to my Newsletter, following the RSS feed, or seeing the latest updates below in recent notes.


# Definition of Data Engineering

Data engineering is a term that has shifted over the years. The roles of Database Administrator (DBA), ETL Developer, and Business Intelligence Specialist merged with software engineering, and with the growth of data, the Data Engineer became a title of its own.

It’s still not well defined. The latest book, Fundamentals of Data Engineering (Joe Reis, Matt Housley), attempts a definition and probably does it best as of today, so it’s getting clearer. Besides several boot camps, universities are also starting to offer degrees in data engineering, as they did for data science before. Let’s start by defining what data engineering is.

# What is Data Engineering

Data engineering is the less famous sibling of data science. Data science is growing rapidly, and so is data engineering, though it gets much less attention. Compared to existing roles, a data engineer is a software engineer plus a business intelligence engineer, with added big data skills such as the Hadoop ecosystem, streaming, and computation at scale.

Businesses create ever more reporting artifacts, and as more data needs to be collected, cleaned, and updated in near real time, complexity grows daily. As a result, more programming skills are required, similar to software engineering. The emerging language at the moment is Python (see The Tool Language, Python), which is used in engineering with tools such as Apache Airflow, Dagster, and other Data Orchestrators, and in data science with its powerful libraries. As a BI engineer today, you use SQL for almost everything, except when working with external data, for example from an FTP server; for that you would use bash and PowerShell in the nightly batch jobs. But this is no longer sufficient, and because developing and maintaining all these requirements and rules (called pipelines) has become a full-time job, data engineering is needed.
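To make the word "pipeline" a bit more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not how any particular orchestrator works: the `RAW_CSV` sample, the `orders` table, and the `extract`, `clean`, `load`, and `run_pipeline` functions are all hypothetical names, and the inline CSV text stands in for a file you would normally fetch from an FTP server.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, standing in for a file fetched from an FTP server.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(raw: str) -> list[dict]:
    """Parse the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def clean(rows: list[dict]) -> list[tuple]:
    """Apply one example rule: drop rows without an amount, then cast types."""
    return [
        (int(row["order_id"]), row["customer"], float(row["amount"]))
        for row in rows
        if row["amount"].strip()
    ]


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Write the cleaned rows into an (assumed) orders table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent when the job is re-run.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn: sqlite3.Connection, raw: str = RAW_CSV) -> int:
    """Run extract, clean, and load in order; return the number of rows loaded."""
    rows = clean(extract(raw))
    load(conn, rows)
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    loaded = run_pipeline(connection)
    total = connection.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    print(f"loaded {loaded} rows, total amount {total}")
```

Each function here is one "rule" in the sense above. In practice, an orchestrator such as Apache Airflow or Dagster would schedule steps like these, track their dependencies, and retry them on failure, which is where the full-time work comes from.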

# Evolution of Data Engineering

# Getting Started with Data Engineering

Here are additional resources that can further enhance your understanding of data engineering. Whether you’re just starting out or looking to deepen your expertise, these resources are handpicked for their clarity, depth, and practical insights.

# Must-Read Articles

Begin your journey with the “holy trinity” from Maxime Beauchemin, defining the essence of data engineering:

# Community and Learning

Don’t miss out on these foundational reads and thought leaders in the field:

Feel free to explore, learn, and contribute to this ever-growing field. Your journey in data engineering is just beginning.


Origin: Data Engineering, the future of Data Warehousing?
References:
Created: 2021-10-11