Data Engineering Whitepapers
- Data Lakehouse: Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
- Data Catalog: Ground: A Data Context Service
- Apache Spark: Spark: Cluster Computing with Working Sets
- Data Engineering Architecture: The Google File System
- Streaming: The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
- Google File System (GFS): The Google File System
- MapReduce: MapReduce: Simplified Data Processing on Large Clusters
- Data Warehousing: Dremel: Interactive Analysis of Web-Scale Datasets
- Data Mesh: How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
- DuckDB: MotherDuck: DuckDB in the cloud and in the client
- Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age
- Unnesting Arbitrary Queries: unnesting arbitrary subqueries in SQL.
Find more useful content on Data Engineering.
Created 2024-01-05