History of General Architecture in Data

Last updated by Simon Späti

KEYNOTE: Hannes Mühleisen - Data Architecture Turned Upside Down at PyData Amsterdam 2025.

Below are some of my preliminary notes. Note that the talk is heavily focused on DuckDB, but it still has its relevance and largely aligns with the Small Data Manifesto and the small data revolution.

# Changes of Architecture from 1985 to 2015

Part of what motivated Hannes when he started DuckDB, or earlier as a professor, was seeing Oracle's pricing and what it costs.

According to Hannes Mühleisen, not much changed from 1985 to 2015, other than the separation of compute and storage.

# Clients Are Strong

Later, Hannes wrote a paper arguing that client-side data transfer, like the small lines above, is the bottleneck, and that this should be faster:

Paper: Data Engineering Whitepapers (DBMS X is Oracle)

The dotted line there is netcat, i.e. pure network transfer.
The change, or revolution, came from Wes McKinney with Pandas, whom Hannes credited for it.
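
To make the "clients are strong" idea a bit more concrete, here is a minimal sketch of running the query engine inside the client process, so results never have to cross a network protocol. It is not from the talk: Python's built-in `sqlite3` is used only as a stand-in for an in-process engine such as DuckDB, and the `trips` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3


def total_by_city(rows):
    """Aggregate fares per city with an engine running inside this process."""
    # ":memory:" keeps the database in the client's own memory:
    # no server, no socket, no client protocol to serialize rows through.
    con = sqlite3.connect(":memory:")
    try:
        # Hypothetical table and columns, for illustration only.
        con.execute("CREATE TABLE trips (city TEXT, fare REAL)")
        con.executemany("INSERT INTO trips VALUES (?, ?)", rows)
        return con.execute(
            "SELECT city, SUM(fare) FROM trips GROUP BY city ORDER BY city"
        ).fetchall()
    finally:
        con.close()


# Made-up sample data.
rows = [("Amsterdam", 12.5), ("Utrecht", 8.0), ("Amsterdam", 7.5)]
print(total_by_city(rows))
```

The point of the sketch is only the shape: data and engine live in the same process, so the transfer cost the paper measured disappears.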

# Lakehouse

Later, the Lakehouse came along.


deletes

Open Table Formats

# DuckLake


So we needed DuckLake. See more on DuckLake

# 2025

Advantage: more users, more compute 🤯

# Use Cases with DuckDB

My notes illustrated as part of my Obsidian vault


Origin: Data Engineering Architectures Overview, KEYNOTE: Hannes Mühleisen - Data Architecture Turned Upside Down | PyData Amsterdam 2025 - YouTube
References: Classical Architecture of Data Warehouse
Created 2025-11-04