Data Engineering Whitepapers
A curated list of influential whitepapers in the field of data engineering.
# Data Lakehouse
Data Lakehouse combines the best of data warehouses and data lakes into a single architecture.
- Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics ^6a7f75
- Building a Database on S3 (2008) - Before Open Table Formats
# Distributed Systems & Storage
Foundational papers on distributed systems that power modern data infrastructure at scale.
- Google File System (GFS): The Google File System ^fdbf43
- Data Engineering Architecture: The Google File System (2003)
- MapReduce: MapReduce: Simplified Data Processing on Large Clusters
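The MapReduce model above can be sketched in a few lines: a map function emits key/value pairs, a shuffle groups them by key, and a reduce function aggregates each group. A minimal single-process word-count sketch, not Google's distributed implementation:

```python
from collections import defaultdict

def map_phase(doc):
    # Map: emit (word, 1) for every word in the input document.
    for word in doc.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the values for one key.
    return key, sum(values)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = (pair for doc in docs for pair in map_phase(doc))
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts["the"])  # 3
```

In the real system, map and reduce tasks run on different machines and the shuffle moves data over the network; the programming model, however, is exactly this small.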
# Data Warehousing & OLAP
Data warehouses and OLAP systems optimized for analytical queries over large datasets.
- Data Warehousing: Dremel: Interactive Analysis of Web-Scale Datasets
- Dremel Encoding: Dremel: Interactive Analysis of Web-Scale Datasets (2011), the nested-record encoding used in Apache Parquet and Apache Drill (which was seeded into Apache Arrow). ^51fc46
- DWA: DWA Whitepapers
- Redshift Files: Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet ^2fbdb2
- OLAP - ClickHouse: Lightning Fast Analytics for Everyone
- The Snowflake Elastic Data Warehouse: “The mission was to build an enterprise-ready data warehousing solution for the cloud. The result is the Snowflake Elastic Data Warehouse, or “Snowflake” for short.”
- Scuba: Diving into Data at Facebook, a paper about Meta Scuba (2013) ^d3683f
- Followed up with Kraken, Meta’s Next-generation Realtime Monitoring and Analytics Platform (2022): historically, Meta Scuba has favored system availability, along with speed and freshness of results, over data completeness and durability. While these choices allowed Scuba to grow from terabyte scale to petabyte scale and continue onboarding a variety of use cases, they also came at the operational cost of dealing with incomplete data and managing data loss. ^e7bf40
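The Dremel encoding mentioned above shreds nested records into flat columns by storing a repetition level and a definition level alongside each value. A toy sketch for a single invented path `names[].url` (an assumed example schema, not Parquet's actual code):

```python
def shred(docs):
    """Toy Dremel-style shredding of the path names[].url into a flat
    column of (value, repetition_level, definition_level) triples.
    Max repetition level is 1 (one repeated field on the path); max
    definition level is 2 (names present -> 1, url also present -> 2)."""
    column = []
    for doc in docs:
        names = doc.get("names", [])
        if not names:
            # Nothing defined on the path: one NULL with r=0, d=0.
            column.append((None, 0, 0))
            continue
        for i, name in enumerate(names):
            r = 0 if i == 0 else 1      # 1 = repetition within names[]
            url = name.get("url")
            d = 2 if url is not None else 1
            column.append((url, r, d))
    return column

docs = [
    {"names": [{"url": "http://A"}, {}, {"url": "http://B"}]},
    {"names": []},
]
print(shred(docs))
# [('http://A', 0, 2), (None, 1, 1), ('http://B', 1, 2), (None, 0, 0)]
```

The levels are what make lossless reassembly of the nested records possible from purely columnar storage.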
# Processing Engines
Data processing frameworks for batch processing and streaming computation.
- Apache Spark: Spark: Cluster Computing with Working Sets
- Streaming: The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
- Vectorized Engine: Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask ^db4178
- Theseus: A Distributed and Scalable GPU-Accelerated Query Processing Platform Optimized for Efficient Data Movement, built by Voltron Data ^932b35
- End-to-End Declarative Data Analytics: Co-designing Engines, Interfaces, and Cloud Infrastructure: An early prototype by ETH shows how rethinking the interface between the engine and the cloud platform enables elastic data-dependent parallel execution over data lakes, automatic caching, and opens new research directions for cloud analytics. ^b7bd10
- Watermarks: MillWheel: Fault-Tolerant Stream Processing at Internet Scale, by Google. MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework’s fault-tolerance guarantees. ^287948
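The watermark idea above can be sketched with a tumbling window that fires once the watermark passes its end. Here the watermark is approximated as the maximum event time seen so far, a common simplification rather than MillWheel's exact low-watermark computation:

```python
def run(events, window_size):
    """Sketch of watermark-triggered tumbling windows over out-of-order
    events, given as (event_time, value) pairs in arrival order."""
    windows = {}                 # window_start -> buffered values
    watermark = float("-inf")
    results = []
    for t, v in events:
        start = (t // window_size) * window_size
        windows.setdefault(start, []).append(v)
        watermark = max(watermark, t)  # simplified watermark heuristic
        # Fire every window whose end the watermark has passed.
        for s in sorted(w for w in windows if w + window_size <= watermark):
            results.append((s, sorted(windows.pop(s))))
    # Flush whatever remains at end of input.
    for s in sorted(windows):
        results.append((s, sorted(windows.pop(s))))
    return results

out = run([(1, "a"), (3, "b"), (2, "c"), (7, "d"), (5, "e")], window_size=4)
print(out)  # [(0, ['a', 'b', 'c']), (4, ['d', 'e'])]
```

Note how the late-arriving event at time 2 still lands in window [0, 4) because the watermark had not yet passed 4; that trade-off between completeness and latency is the core of the Dataflow/MillWheel line of work.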
# DuckDB
DuckDB gets its own category as a single-file OLAP database.
- MotherDuck: DuckDB in the cloud and in the client - A paper that introduces the 1-5-Tier Architecture. ^9f7136
- Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age
- Unnesting Arbitrary Queries: how to unnest (decorrelate) subqueries in SQL.
- Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB: This paper introduces FlockMTL, an extension for DuckDB that deeply integrates Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) capabilities into database management systems. See an implementation example
- Don’t Hold My Data Hostage – A Case For Client Protocol Redesign: shows that transferring a large amount of data from a database to a client program is a surprisingly expensive operation, and compares the client protocols of different systems. ^caa24f
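The subquery-unnesting paper in this list is about rewriting correlated subqueries into joins, so the inner query runs once instead of once per outer row. A minimal illustration using Python's built-in sqlite3 (the table and both queries are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders(id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'alice', 10), (2, 'alice', 30), (3, 'bob', 20), (4, 'bob', 40);
""")

# Correlated subquery: conceptually re-evaluated once per outer row.
correlated = """
    SELECT id FROM orders o
    WHERE amount > (SELECT AVG(amount) FROM orders i
                    WHERE i.customer = o.customer)
    ORDER BY id
"""

# Unnested form: the subquery becomes a grouped join, evaluated once.
unnested = """
    SELECT o.id
    FROM orders o
    JOIN (SELECT customer, AVG(amount) AS avg_amount
          FROM orders GROUP BY customer) a
      ON a.customer = o.customer
    WHERE o.amount > a.avg_amount
    ORDER BY o.id
"""

assert con.execute(correlated).fetchall() == con.execute(unnested).fetchall()
print(con.execute(unnested).fetchall())  # [(2,), (4,)]
```

The paper's contribution is an algorithm that performs this kind of decorrelation mechanically for arbitrary queries; DuckDB implements it in its optimizer.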
# SQL
All about SQL, the domain-specific language to query databases and more.
- Spanner: Becoming a SQL System: Google Spanner evolved from a distributed key-value store into a full SQL database system ^786591A
- SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL ^663863
- What Goes Around Comes Around by Michael Stonebraker and Joseph Hellerstein: a summary of 35 years of data model proposals, grouped into 9 different eras. The paper examines the proposals of each era and shows that there are only a few basic data modeling ideas, and most have been around a long time. ^37c66e
- Semantic Data Modeling, Graph Query, and SQL, Together at Last? by Jeff Shute, Colin Zheng, and Romit Kudtarkar, CIDR 2026 (to appear): semantic data models express high-level business concepts and metrics, capturing the business logic needed to query a database correctly. ^e1b169
# Relational Model
Relational databases organize data into tables with rows and columns, pioneered by Edgar F. Codd.
- A Relational Model of Data for Large Shared Data Banks, 1970 by Edgar F. Codd ^8d1ede
- C-Store: A Column-oriented DBMS (2005): organizing data by columns rather than rows. ^fe90b8
- MonetDB/X100: Hyper-Pipelining Query Execution: the beginning of vectorized query execution, from the MonetDB team ^4e54dd
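The MonetDB/X100 idea above, vectorized execution, replaces tuple-at-a-time interpretation with primitives that each process a whole vector of values. A pure-Python sketch of the contrast (real engines use columnar arrays and tight compiled loops, not Python lists):

```python
# Tuple-at-a-time: the expression is re-interpreted for every row.
def scalar_filter_sum(rows):
    total = 0
    for price, qty in rows:
        if qty > 2:              # interpretation overhead paid per tuple
            total += price * qty
    return total

# Vectorized: each primitive processes a whole vector (batch) of values,
# amortizing interpretation overhead across the batch.
def vectorized_filter_sum(prices, qtys, vector_size=1024):
    total = 0
    for i in range(0, len(prices), vector_size):
        p, q = prices[i:i + vector_size], qtys[i:i + vector_size]
        mask = [x > 2 for x in q]                # selection primitive
        prods = [a * b for a, b in zip(p, q)]    # multiplication primitive
        total += sum(v for v, keep in zip(prods, mask) if keep)  # aggregate
    return total

rows = [(10, 1), (5, 3), (2, 4)]
prices, qtys = map(list, zip(*rows))
assert scalar_filter_sum(rows) == vectorized_filter_sum(prices, qtys)  # 23
```

The batch size (X100 called these vectors, typically around a thousand values) is chosen so a vector fits in CPU cache while still amortizing per-operator overhead.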
# NoSQL
NoSQL databases trade relational guarantees for horizontal scalability and flexible schemas.
- Bigtable: A Distributed Storage System for Structured Data, 2006
- Dynamo: Amazon’s Highly Available Key-value Store, 2007
# Schema Evolution
Schema Evolution addresses how databases handle changes to data structures over time.
- 1995: A survey of schema versioning issues for database systems by John F. Roddick ^eb6505
- 2012: Automating the database schema evolution process
# Data Architecture & Governance
Patterns for organizing data assets, governance, and discovery across the enterprise.
- Data Mesh: How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
- Data Catalog: Ground: A Data Context Service
- Semantic Layer: Measures in SQL by Julian Hyde and John Fremlin. Related: Extending SQL for analytics ^f00ba4
- Analytics Development Lifecycle (ADLC): The Analytics Development Lifecycle (ADLC) by dbt
# Git for Data
Git for Data applies version-control concepts to datasets and data pipelines.
- Reproducible data science over data lakes: Replayable data pipelines with Bauplan and Nessie ^a13d97
- Git for Data Paper by XetData: Proposes a system that extends Git to efficiently handle terabyte-scale machine learning datasets through content-defined chunking and deduplication. ^2e514f
- Building a Correct-by-Design Lakehouse: Bauplan’s paper on Git for Data and building a lakehouse, covering Data Contracts, Versioning, and Transactional Pipelines for Humans and Agents
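The content-defined chunking mentioned in the XetData paper places chunk boundaries where a hash of the content matches a pattern, so boundaries depend on content rather than byte offsets and survive insertions. A toy sketch with a deliberately naive hash (real systems use Rabin fingerprints or gear hashing over a sliding window):

```python
def cdc_chunks(data: bytes, mask_bits=6, min_size=8):
    """Toy content-defined chunking: declare a boundary wherever the low
    `mask_bits` bits of a running hash are zero, giving an expected chunk
    size of about 2**mask_bits bytes. The hash here restarts per chunk
    and is not a true sliding-window fingerprint."""
    mask = (1 << mask_bits) - 1
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * 31 + b) % (1 << 32)
        if i - start + 1 >= min_size and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

data = bytes(range(200)) * 3
chunks = cdc_chunks(data)
assert b"".join(chunks) == data          # lossless reassembly
```

Because identical content regions produce identical chunks regardless of where they sit in the file, deduplication across dataset versions falls out naturally: store each chunk once, keyed by its hash.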
# Data Visualization
Data visualization is essential in data engineering and BI.
- M4: A Visualization-Oriented Time Series Data Aggregation: visual analysis of high-volume time series data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. By SAP Dresden and Volker Markl (TU Berlin). ^d2b91d
Not a Whitepaper, but Related
- [[Grammar of Graphics]]: The Grammar of Graphics approach, popularized by tools like ggplot2 in R and Vega-Lite in JavaScript, is considered declarative.
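The M4 aggregation above keeps only the first, last, minimum, and maximum point per pixel column, which is enough to render a visually faithful line chart from far fewer points. A minimal sketch (function and parameter names are mine, not the paper's):

```python
def m4(points, width):
    """M4 downsampling sketch: split the time domain into `width` pixel
    buckets and keep the first, last, min, and max point of each bucket.
    `points` must be (time, value) pairs sorted by time."""
    if not points:
        return []
    t0, t1 = points[0][0], points[-1][0]
    span = (t1 - t0) or 1
    buckets = {}
    for t, v in points:
        px = min(int((t - t0) * width / span), width - 1)
        b = buckets.setdefault(px, {"first": (t, v), "min": (t, v),
                                    "max": (t, v), "last": (t, v)})
        b["last"] = (t, v)
        if v < b["min"][1]:
            b["min"] = (t, v)
        if v > b["max"][1]:
            b["max"] = (t, v)
    out = set()
    for b in buckets.values():
        out.update(b.values())          # at most 4 points per bucket
    return sorted(out)

points = [(t, (t * 7) % 13) for t in range(100)]
reduced = m4(points, width=10)
assert len(reduced) <= 4 * 10 and set(reduced) <= set(points)
```

The key insight of the paper is that min/max alone are not enough: first and last per bucket are needed to connect adjacent pixel columns with the correct line segments.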
# Database Extensibility & Research
Academic and industry research on database design, extensibility, and human-data interaction.
- Anarchy in the Database: A Survey and Evaluation of Database Management System Extensibility
- HILDA: Human-in-the-Loop Data Analysis: A Personal Perspective ^678997
- Bluesky: The AT Protocol: Usable Decentralized Social Media with Martin Kleppmann ^039de7
# AI Related Papers
# Other Lists
- Schedule - CMU 15-721 :: Advanced Database Systems (Spring 2023): see the papers linked to each presentation.
- Data Engineer Handbook Whitepapers
Origin: Data Engineering Vault
