DuckLake is a DuckDB Labs approach to simplifying Lakehouse architectures: all metadata lives in a standard SQL database (e.g., DuckDB or PostgreSQL) instead of complex file-based systems, while the data itself is still stored in open file formats like Parquet.
The goal: make it more reliable, faster, and easier to manage.
# Architecture
It’s not a DuckDB thing per se, but an open standard that tries to remove the overhead of Open Table Format features such as ACID transactions and multi-writes by writing the metadata not to files, but to an actual database.
See the image below: what used to be separate metadata files plus an Open Catalog is now all written by DuckLake to a relational database, which natively provides the features Open Table Formats had to rebuild on top of files:

DuckLake’s architecture: Just a database and some Parquet files | Image from The DuckLake Manifesto: SQL as a Lakehouse Format – DuckLake
Important to know
There is the open standard “DuckLake”, which combines the metadata with the catalog, and there is the implementation, the “ducklake” DuckDB extension (MIT-licensed).
# Replacement of OTF?
Does it replace Open Table Formats? No. It is an alternative to Apache Iceberg + Polaris Catalog or Delta Lake + Unity Catalog. This means the catalog and database-like features are in one unit, which is the part that “simplifies”.
# Separation between Compute and Storage
With DuckLake we have the same advantages of Compute and Storage Separation as with OTF:
- Hannes notes that Snowflake and BigQuery were “the pioneers” of separating storage and compute
- They use the same architecture as DuckLake but with proprietary, closed systems
- DuckLake uses “the exact same design that snowflake and bigquery have been using for a long time already successfully with the one difference that on the bottom we have the open format parquet”
DuckLake’s innovation is the three-way separation: storage, compute, AND metadata can all scale independently.
# How Does the Compute Model Work?
According to Introducing DuckLake - YouTube:
- “Everyone runs a DuckDB instance” - Instead of a traditional client-server model, each user/application runs their own compute node
- Fully distributed compute: “You can have an arbitrary amount of DuckDB instances” that all connect to the same metadata server and storage
- Local compute nodes: The compute runs “on your laptop, your co-workers can have it running on their laptop, you can have services like EC2 instances, lambda functions, wherever you can run DuckDB”
- No resource scheduling problems: Eliminates traditional Data Warehouse issues of “fairness and resource scheduling”
The fundamental shift is from “client-server” to “compute-everywhere”: instead of scaling up a central server and connecting clients to it, you scale out compute nodes that coordinate through lightweight shared metadata.
This eliminates the traditional data warehouse bottlenecks while maintaining full transactional consistency across the distributed system. The clients are essentially the compute nodes.
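To make this concrete, here is a minimal sketch of the “compute-everywhere” idea with a shared PostgreSQL catalog and an S3 bucket. The `ducklake:postgres:` connection string follows the pattern from the extension docs; host, database, bucket, and table names are made up:

```sql
-- Session A: DuckDB on your laptop
INSTALL ducklake; LOAD ducklake;
ATTACH 'ducklake:postgres:dbname=lake_catalog host=catalog.internal' AS lake
    (DATA_PATH 's3://my-bucket/lake/');
CREATE TABLE IF NOT EXISTS lake.events (id INTEGER, action VARCHAR);
INSERT INTO lake.events VALUES (1, 'click');

-- Session B: another DuckDB instance (EC2, Lambda, a co-worker's laptop)
-- attaches to the same catalog and storage and sees session A's committed rows
ATTACH 'ducklake:postgres:dbname=lake_catalog host=catalog.internal' AS lake
    (DATA_PATH 's3://my-bucket/lake/');
FROM lake.events;
```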
# Features
Some of the features are covered below; there are many more.
# Zero-Trust Data Lakes (Encryption)
Automatic encryption with key separation: you simply specify `encrypted: true`; there is no need to manage encryption keys yourself. DuckLake automatically generates a unique encryption key for every Parquet file. These keys are stored in the metadata database, which sits in a different trust zone than your file storage.
DuckLake only writes Parquet files to blob storage (Parquet has built-in encryption support), and all metadata stays in the separate database, so there are no unencrypted metadata files on blob storage.
This is a huge innovation and unblocks many use cases where you aren’t allowed to store sensitive data with a public cloud provider.
Real-World Architecture Possibilities
You could have the following setup:
- Private metadata server: Run a local PostgreSQL server in your organization that you trust
- Public blob storage: Store encrypted files on untrusted S3 buckets
- CDN distribution: You could even expose encrypted files through Cloudflare caching “because the only thing that you need to keep private is the access to your metadata storage”
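A minimal sketch of that zero-trust setup, assuming the extension's `ENCRYPTED` ATTACH option and a `ducklake:postgres:` connection string; the host, database, bucket, and table names below are placeholders, and the exact option spelling may differ per DuckLake version:

```sql
-- Metadata in a private Postgres you trust; encrypted Parquet on public/untrusted S3.
-- (Sketch: option spelling and DSN details may differ, check the DuckLake docs.)
INSTALL ducklake; LOAD ducklake;
ATTACH 'ducklake:postgres:dbname=lake_catalog host=10.0.0.5 user=lake' AS secure_lake
    (DATA_PATH 's3://public-bucket/lake/', ENCRYPTED);

-- Per-file keys are generated automatically and stored only in the Postgres catalog;
-- anyone listing the bucket sees only encrypted Parquet files.
CREATE TABLE secure_lake.patients (id INTEGER, name VARCHAR);
INSERT INTO secure_lake.patients VALUES (1, 'Alice');
```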
Hive or Iceberg catalogs couldn’t do that
Hive cannot encrypt because it relies on partition information being visible in directory names, and Iceberg/Delta cannot encrypt because they write unencrypted Avro and JSON metadata files to the blob store alongside the data.
# Articles
Here are the two announcement articles by DuckDB, on their new site DuckLake.select and on Duckdb.org.
# Podcast: Introducing DuckLake
Announcement podcast with Hannes Mühleisen and Mark Raasveldt. Key takeaways:
- Introducing DuckLake, a new project from the creators of DuckDB to address challenges in multiplayer database setups and data lake architectures.
- DuckDB has gained popularity for single-node analytics, but faces limitations in multi-user environments due to database file locking.
- Existing data lake formats like Apache Iceberg and Delta Lake attempt to solve update problems in data lakes, but involve complexity and questionable aesthetics.
- DuckLake proposes a new approach by leveraging a database system (e.g., PostgreSQL, DuckDB) to manage metadata for data lakes, enabling transactional consistency and multi-table operations.
- Principles of DuckLake: simplicity, scalability, and speed.
- Key features include arbitrary query support, multi-schema, multi-table transactions, time travel, schema evolution, views, partitioning, data inlining, and encryption support.
- You can keep your encryption keys in the DuckLake catalog in a secure place, even locally, and leave the data encrypted on the data lake. So even if it sits on public S3, nobody can access it unless they also have access to the DuckLake catalog.
- This is very valuable for data-sharing capabilities with Open Table Formats.
- DuckLake offers compatibility with existing Open Table Formats
- The DuckLake DuckDB extension implementation supports various metadata storage options (e.g., DuckDB, SQLite, MySQL, PostgreSQL, MotherDuck) and blob storage backends (e.g., S3, Google Cloud Storage, Azure Blob Storage).
- DuckLake is positioned as a solution for multiplayer DuckDB, enabling globally synchronized transactions across distributed instances.
- Next steps include releasing the extension, adding import/export capabilities for Iceberg, and expanding database support.
- Mentioned and shout outs:
# The DuckLake Manifesto & Technical Implementation
Sources: The DuckLake Manifesto: SQL as a Lakehouse Format and DuckLake: SQL as a Lakehouse Format.
TL;DR: DuckLake simplifies Lakehouse architectures by leveraging SQL databases for metadata management, enabling efficient data changes and scalability.
DuckLake re-imagines the Lakehouse architecture by storing data in open formats like Parquet while managing metadata within a SQL database, supporting ACID transactions and referential consistency. Unlike Iceberg and Delta Lake, DuckLake avoids complex file-based metadata systems, streamlining processes with standard SQL. The architecture uses a SQL database for the catalog and table metadata, facilitating schema creation, modification, and data operations through SQL transactions.
The DuckLake format, implemented as a DuckDB extension, supports multi-table transactions, complex types, schema evolution, time travel, transactional DDL, and incremental scans. It integrates with various storage systems (S3, Azure Blob Store, GCS) and SQL databases (PostgreSQL, SQLite, MySQL, MotherDuck). DuckLake allows transparent inlining of small changes directly into the metadata store, minimizing small-file issues. For example, initializing DuckLake involves `ATTACH 'ducklake:metadata.ducklake' AS my_ducklake;`, followed by standard SQL operations like `CREATE TABLE my_ducklake.demo (i INTEGER);` and `INSERT INTO my_ducklake.demo VALUES (42), (43);`.
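Put together, the quick-start from the manifesto looks like this (the first three statements are taken from the source; the final read-back is added for illustration):

```sql
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake;
CREATE TABLE my_ducklake.demo (i INTEGER);
INSERT INTO my_ducklake.demo VALUES (42), (43);
FROM my_ducklake.demo; -- read back the two inserted rows
```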
As Mark Raasveldt and Hannes Mühleisen said, this design allows for low-latency query planning, reduces small file writes, supports many concurrent changes, and enables sub-millisecond writes with support for millions of snapshots, significantly outperforming traditional lakehouse formats in scenarios with frequent, small data changes.
Practical takeaway: By consolidating metadata management into a SQL database, DuckLake offers a more streamlined and scalable solution for lakehouse architectures, particularly beneficial for applications requiring high concurrency and low-latency data modifications.
Data Model of DuckLake
For how the catalog is structured and set up, check out the physical data model DuckLake creates internally in the Data Model of DuckLake section below.
# Hackernews
Check the comments here.
# MotherDuck’s View
Jordan Tigani writes about the limitations of storing metadata in S3 for data lakehouse formats like Iceberg and Delta Lake, highlighting the performance bottlenecks and complexities in implementing ACID operations. DuckLake addresses these issues by storing table and physical metadata in a database of your choosing (e.g., Postgres), while keeping data in an S3-compatible object store as Parquet files.
This design allows for faster metadata lookups using SQL queries (a couple of milliseconds vs. hundreds of milliseconds for cold S3 scans). DuckLake also reduces metadata file compaction, optimizing snapshot management by pointing multiple snapshots to portions of a single Parquet file. According to Jordan, this approach makes DuckLake more portable and easier to implement than Iceberg. MotherDuck is rolling out hosted support for DuckLake, offering fast, cloud-proximate queries and scalable data transformation.
Jordan argues to consider DuckLake for data lake implementations where metadata performance and transactional capabilities are critical, and leverage MotherDuck’s hosted support for enhanced query speed and data management.
Join MotherDuck CEO Jordan Tigani and DuckDB’s Hannes Mühleisen for an in-depth discussion about DuckLake.
My takeaways:
- Avro is a poor fit because it can only be read row-based, so large metadata is not read efficiently. Also, with Iceberg, if you already have Parquet, adding another format just for metadata complicates everything.
- Access and permission patterns can be handled through encryption keys; a potential new feature is per-column encryption keys, which Parquet supports. That means you could have different keys per column and grant access that way. This is not very advanced compared to a database, where you can grant row- and column-level access, but given it is based on distributed files, that would already be great. MotherDuck is probably better positioned to integrate something easier here.
- Open Table Formats make sense cost-wise for data you do not need every day. Instead of storing it in costly data warehouses with dedicated hardware, you can store it on “cheap” distributed S3 storage.
- Iceberg writers are sparse. Not because nobody wants to implement one, but because you basically have to build and integrate database capabilities, as the file format is just a “format” and not the actual engine that runs it. It’s also very hard with Iceberg, as you now have Avro (metadata), Parquet (data, snapshots), and new Open Catalogs for access and listing capabilities. All of it has to be coordinated, ideally concurrently, when multiple writers want to write at the same time.
- This is why DuckLake was created: databases are designed for exactly this kind of atomic, integrity-preserving workload. That’s why DuckLake can support thousands of concurrent updates (e.g., through Postgres), whereas Iceberg is around one per second according to some.
- Updates can be performed transactionally, and small changes can be inlined into the catalog database tables for high-frequency updates.
- Criticism, besides the huge amount of positive feedback: it creates a new standard just when people started to align on Iceberg/Delta/Hudi. But in fact, it’s less a brand-new format than an addition that simplifies things and avoids locking users into a vendor through closed-source catalog implementations.
AI Highlights from YouTube:
- Open data formats like Iceberg and Delta Lake aim to break data silos and facilitate data exchange between systems.
- Hadoop era pain points included unruly data lakes and challenges with updating data.
- Concerns that open table formats are conceptually flawed and complex to implement, particularly writers.
- Duck Lake: Acknowledges the need for a catalog server (database) for metadata management, unlike Iceberg’s initial file-based structure.
- Duck Lake uses a database schema for metadata, including tables for tables, columns, and files, along with a specification and DuckDB extension.
- Duck Lake offers a simpler, more elegant solution with fewer technologies involved, making implementation easier.
- Duck Lake can handle a large number of snapshots without performance issues.
- Duck Lake is agnostic to the metadata database used, allowing flexibility in choosing the storage solution.
- Initial feedback on Duck Lake has been positive, with vendors expressing interest.
- Discussion of access management and encryption strategies for Duck Lake.
- Thoughts on the Iceberg REST API and the tension between database people and developers.
# DuckLake Example (Code) Including Key Features
This example demonstrates DuckLake’s key features:
- Catalog Database: DuckDB (for metadata storage)
- Storage Backend: AWS S3 (for Parquet files)
- Schema Evolution: Adding/dropping columns, type promotion
- Time Travel: Querying historical snapshots
# Prerequisites
```sql
INSTALL ducklake;
INSTALL aws; -- for S3 support
```
# Setup DuckLake with S3 Storage
```sql
-- Configure AWS credentials (if needed)
CREATE SECRET ducklake (
    TYPE s3,
    KEY_ID 'my_secret_key1',            --replace with yours
    SECRET 'my_secret_value1',          --replace with yours
    REGION 'eu-central-1',              --replace with yours
    SCOPE 's3://sspaeti/warehouse-data' --replace with yours
);

--list
FROM duckdb_secrets();

-- Create DuckLake with DuckDB catalog and S3 storage
ATTACH 'ducklake:my_warehouse.ducklake' AS warehouse
    (DATA_PATH 's3://sspaeti/warehouse-data/');
USE warehouse;
```
# Create Initial Table and Insert Data
```sql
-- Create a customer table
CREATE TABLE customers (
    id INTEGER,
    name VARCHAR,
    email VARCHAR,
    created_date DATE
);

-- Insert initial data
INSERT INTO customers VALUES
    (1, 'Alice Johnson', 'alice@email.com', '2024-01-15'),
    (2, 'Bob Smith', 'bob@email.com', '2024-02-20'),
    (3, 'Carol Davis', 'carol@email.com', '2024-03-10');

-- Check current data
FROM customers;
```
# Schema Evolution - Adding Columns
Schema evolution is a great feature:
```sql
-- Add a new column with default value
ALTER TABLE customers ADD COLUMN phone VARCHAR DEFAULT 'Not provided';

-- Add another column for customer status
ALTER TABLE customers ADD COLUMN status VARCHAR DEFAULT 'Active';

-- Verify schema changes
FROM customers;
-- Output shows new columns with default values for existing rows

-- Insert new data with all columns
INSERT INTO customers VALUES
    (4, 'David Wilson', 'david@email.com', '2024-04-05', '555-0123', 'Premium');

FROM customers;
┌───────┬───────────────┬─────────────────┬──────────────┬──────────────┬─────────┐
│  id   │     name      │      email      │ created_date │    phone     │ status  │
│ int32 │    varchar    │     varchar     │     date     │   varchar    │ varchar │
├───────┼───────────────┼─────────────────┼──────────────┼──────────────┼─────────┤
│     1 │ Alice Johnson │ alice@email.com │ 2024-01-15   │ Not provided │ Active  │
│     2 │ Bob Smith     │ bob@email.com   │ 2024-02-20   │ Not provided │ Active  │
│     3 │ Carol Davis   │ carol@email.com │ 2024-03-10   │ Not provided │ Active  │
│     4 │ David Wilson  │ david@email.com │ 2024-04-05   │ 555-0123     │ Premium │
└───────┴───────────────┴─────────────────┴──────────────┴──────────────┴─────────┘
```
# More Data Changes for Time Travel Demo
```sql
-- Update customer status
UPDATE customers SET status = 'VIP' WHERE email = 'alice@email.com';

-- Add more customers
INSERT INTO customers VALUES
    (5, 'Eva Martinez', 'eva@email.com', '2024-05-01', '555-0456', 'Active');

-- Delete a customer (creates another snapshot)
DELETE FROM customers WHERE id = 2;
```
# Schema Evolution - Dropping Columns
```sql
-- Drop the phone column (maybe privacy concerns)
ALTER TABLE customers DROP COLUMN phone;

-- Verify the column is gone
FROM customers;
```
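Type promotion (widening a column's type) is the third schema-evolution capability mentioned above. A hedged sketch, using DuckDB's standard `ALTER TABLE` syntax and assuming DuckLake accepts a plain widening such as INTEGER to BIGINT as a metadata-only change; it is illustrative and not part of the recorded snapshot history below:

```sql
-- Widen id from INTEGER to BIGINT; existing Parquet files should not need rewriting
ALTER TABLE customers ALTER COLUMN id SET DATA TYPE BIGINT;

-- Verify the new column type
DESCRIBE customers;
```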
# Time Travel Queries
Time Travel features:
```sql
-- List all snapshots to see our history
D FROM warehouse.snapshots();
┌─────────────┬────────────────────────────┬────────────────┬─────────────────────────────────────────────────────┐
│ snapshot_id │       snapshot_time        │ schema_version │                       changes                       │
│    int64    │  timestamp with time zone  │     int64      │               map(varchar, varchar[])               │
├─────────────┼────────────────────────────┼────────────────┼─────────────────────────────────────────────────────┤
│           0 │ 2025-05-28 15:19:40.916+02 │              0 │ {schemas_created=[main]}                            │
│           1 │ 2025-05-28 15:20:15.92+02  │              1 │ {tables_created=[main.customers]}                   │
│           2 │ 2025-05-28 15:26:00.582+02 │              2 │ {tables_dropped=[1]}                                │
│           3 │ 2025-05-28 15:26:02.818+02 │              3 │ {tables_created=[main.customers]}                   │
│           4 │ 2025-05-28 15:31:15.704+02 │              3 │ {tables_inserted_into=[2]}                          │
│           5 │ 2025-05-28 15:31:35.534+02 │              4 │ {tables_dropped=[2]}                                │
│           6 │ 2025-05-28 15:31:41.484+02 │              5 │ {tables_created=[main.customers]}                   │
│           7 │ 2025-05-28 15:31:43.698+02 │              5 │ {tables_inserted_into=[3]}                          │
│           8 │ 2025-05-28 15:32:04.017+02 │              6 │ {tables_altered=[3]}                                │
│           9 │ 2025-05-28 15:32:08.438+02 │              7 │ {tables_altered=[3]}                                │
│          10 │ 2025-05-28 15:32:23.051+02 │              7 │ {tables_inserted_into=[3]}                          │
│          11 │ 2025-05-28 15:33:49.549+02 │              7 │ {tables_deleted_from=[3]}                           │
│          12 │ 2025-05-28 15:36:59.068+02 │              7 │ {tables_inserted_into=[3], tables_deleted_from=[3]} │
│          13 │ 2025-05-28 15:37:16.985+02 │              7 │ {tables_inserted_into=[3]}                          │
│          14 │ 2025-05-28 15:37:42.358+02 │              8 │ {tables_altered=[3]}                                │
├─────────────┴────────────────────────────┴────────────────┴─────────────────────────────────────────────────────┤
│ 15 rows                                                                                                4 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
Query version 10, which still includes the phone number column:
```sql
D SELECT * FROM customers AT (VERSION => 10);
┌───────┬───────────────┬─────────────────┬──────────────┬──────────────┬─────────┐
│  id   │     name      │      email      │ created_date │    phone     │ status  │
│ int32 │    varchar    │     varchar     │     date     │   varchar    │ varchar │
├───────┼───────────────┼─────────────────┼──────────────┼──────────────┼─────────┤
│     1 │ Alice Johnson │ alice@email.com │ 2024-01-15   │ Not provided │ Active  │
│     2 │ Bob Smith     │ bob@email.com   │ 2024-02-20   │ Not provided │ Active  │
│     3 │ Carol Davis   │ carol@email.com │ 2024-03-10   │ Not provided │ Active  │
│     4 │ David Wilson  │ david@email.com │ 2024-04-05   │ 555-0123     │ Premium │
└───────┴───────────────┴─────────────────┴──────────────┴──────────────┴─────────┘
```
even though the current version no longer has it:
```sql
D from customers;
┌───────┬───────────────┬─────────────────┬──────────────┬─────────┐
│  id   │     name      │      email      │ created_date │ status  │
│ int32 │    varchar    │     varchar     │     date     │ varchar │
├───────┼───────────────┼─────────────────┼──────────────┼─────────┤
│     3 │ Carol Davis   │ carol@email.com │ 2024-03-10   │ Active  │
│     4 │ David Wilson  │ david@email.com │ 2024-04-05   │ Premium │
│     1 │ Alice Johnson │ alice@email.com │ 2024-01-15   │ VIP     │
│     5 │ Eva Martinez  │ eva@email.com   │ 2024-05-01   │ Active  │
└───────┴───────────────┴─────────────────┴──────────────┴─────────┘
```
This is what the S3 object store looks like:

```sql
-- Query by timestamp (if you know approximate times)
D SELECT * FROM customers AT (TIMESTAMP => '2025-05-28 15:31:43.698+02');
┌───────┬───────────────┬─────────────────┬──────────────┐
│  id   │     name      │      email      │ created_date │
│ int32 │    varchar    │     varchar     │     date     │
├───────┼───────────────┼─────────────────┼──────────────┤
│     1 │ Alice Johnson │ alice@email.com │ 2024-01-15   │
│     2 │ Bob Smith     │ bob@email.com   │ 2024-02-20   │
│     3 │ Carol Davis   │ carol@email.com │ 2024-03-10   │
└───────┴───────────────┴─────────────────┴──────────────┘
```
# Advanced Features
# Change Data Feed
```sql
-- See what changed between snapshots 0 and 14
D SELECT * FROM warehouse.table_changes('customers', 0, 14);
┌─────────────┬───────┬──────────────────┬───────┬───────────────┬─────────────────┬──────────────┬─────────┐
│ snapshot_id │ rowid │   change_type    │  id   │     name      │      email      │ created_date │ status  │
│    int64    │ int64 │     varchar      │ int32 │    varchar    │     varchar     │     date     │ varchar │
├─────────────┼───────┼──────────────────┼───────┼───────────────┼─────────────────┼──────────────┼─────────┤
│           7 │     0 │ insert           │     1 │ Alice Johnson │ alice@email.com │ 2024-01-15   │ Active  │
│           7 │     1 │ insert           │     2 │ Bob Smith     │ bob@email.com   │ 2024-02-20   │ Active  │
│           7 │     2 │ insert           │     3 │ Carol Davis   │ carol@email.com │ 2024-03-10   │ Active  │
│          10 │     3 │ insert           │     4 │ David Wilson  │ david@email.com │ 2024-04-05   │ Premium │
│          12 │     0 │ update_postimage │     1 │ Alice Johnson │ alice@email.com │ 2024-01-15   │ VIP     │
│          13 │     5 │ insert           │     5 │ Eva Martinez  │ eva@email.com   │ 2024-05-01   │ Active  │
│          12 │     0 │ update_preimage  │     1 │ Alice Johnson │ alice@email.com │ 2024-01-15   │ Active  │
│          11 │     1 │ delete           │     2 │ Bob Smith     │ bob@email.com   │ 2024-02-20   │ Active  │
└─────────────┴───────┴──────────────────┴───────┴───────────────┴─────────────────┴──────────────┴─────────┘
```
# Exploring Storage Files
```sql
-- See the Parquet files in S3 (if using local storage, adjust path)
D FROM glob('s3://sspaeti/warehouse-data/*');
┌───────────────────────────────────────────────────────────────────────────────────────────┐
│                                            file                                            │
│                                           varchar                                          │
├───────────────────────────────────────────────────────────────────────────────────────────┤
│ s3://sspaeti/warehouse-data/ducklake-01971717-0b7b-7d25-a85e-1254fe2c1014.parquet          │
│ s3://sspaeti/warehouse-data/ducklake-01971717-78e1-7c42-9d20-1b6387d0476a.parquet          │
│ s3://sspaeti/warehouse-data/ducklake-01971718-128d-7faf-9415-c7e452f29887.parquet          │
│ s3://sspaeti/warehouse-data/ducklake-01971719-653a-7990-8375-f5b5b5732f69-delete.parquet   │
│ s3://sspaeti/warehouse-data/ducklake-0197171c-48be-7bfa-b39d-dc6013a6d3bc.parquet          │
│ s3://sspaeti/warehouse-data/ducklake-0197171c-4c6f-7c74-9970-4583571c429a-delete.parquet   │
│ s3://sspaeti/warehouse-data/ducklake-0197171c-8eba-7a40-8cf7-67fc115ace70.parquet          │
└───────────────────────────────────────────────────────────────────────────────────────────┘
```
```sql
-- Query the actual Parquet files directly
D FROM 's3://sspaeti/warehouse-data/*.parquet' LIMIT 5;
┌───────┬───────────────┬─────────────────┬──────────────┐
│  id   │     name      │      email      │ created_date │
│ int32 │    varchar    │     varchar     │     date     │
├───────┼───────────────┼─────────────────┼──────────────┤
│     1 │ Alice Johnson │ alice@email.com │ 2024-01-15   │
│     2 │ Bob Smith     │ bob@email.com   │ 2024-02-20   │
│     3 │ Carol Davis   │ carol@email.com │ 2024-03-10   │
│     1 │ Alice Johnson │ alice@email.com │ 2024-01-15   │
│     2 │ Bob Smith     │ bob@email.com   │ 2024-02-20   │
└───────┴───────────────┴─────────────────┴──────────────┘
```
# Multiple Table Operations
```sql
-- Create related table for ACID transaction demo
CREATE TABLE orders (
    order_id INTEGER,
    customer_id BIGINT,
    amount DECIMAL(10,2),
    order_date DATE
);

-- Atomic multi-table transaction
BEGIN TRANSACTION;
INSERT INTO orders VALUES (1001, 1, 99.99, '2024-05-28');
UPDATE customers SET status = 'Premium' WHERE status = 'VIP';
COMMIT;
```
Result:
```sql
D from orders;
┌──────────┬─────────────┬───────────────┬────────────┐
│ order_id │ customer_id │    amount     │ order_date │
│  int32   │    int64    │ decimal(10,2) │    date    │
├──────────┼─────────────┼───────────────┼────────────┤
│     1001 │           1 │         99.99 │ 2024-05-28 │
└──────────┴─────────────┴───────────────┴────────────┘
D from customers;
┌───────┬───────────────┬─────────────────┬──────────────┬─────────┐
│  id   │     name      │      email      │ created_date │ status  │
│ int32 │    varchar    │     varchar     │     date     │ varchar │
├───────┼───────────────┼─────────────────┼──────────────┼─────────┤
│     3 │ Carol Davis   │ carol@email.com │ 2024-03-10   │ Active  │
│     4 │ David Wilson  │ david@email.com │ 2024-04-05   │ Premium │
│     5 │ Eva Martinez  │ eva@email.com   │ 2024-05-01   │ Active  │
│     1 │ Alice Johnson │ alice@email.com │ 2024-01-15   │ Premium │
└───────┴───────────────┴─────────────────┴──────────────┴─────────┘
```
# What Does the DuckLake Catalog Look Like?
It created these tables:
```sql
D select table_name, table_type from information_schema.tables where table_catalog = '__ducklake_metadata_warehouse';
┌───────────────────────────────────────┬────────────┐
│              table_name               │ table_type │
│                varchar                │  varchar   │
├───────────────────────────────────────┼────────────┤
│ ducklake_column                       │ BASE TABLE │
│ ducklake_column_tag                   │ BASE TABLE │
│ ducklake_data_file                    │ BASE TABLE │
│ ducklake_delete_file                  │ BASE TABLE │
│ ducklake_files_scheduled_for_deletion │ BASE TABLE │
│ ducklake_file_column_statistics       │ BASE TABLE │
│ ducklake_file_partition_value         │ BASE TABLE │
│ ducklake_inlined_data_tables          │ BASE TABLE │
│ ducklake_metadata                     │ BASE TABLE │
│ ducklake_partition_column             │ BASE TABLE │
│ ducklake_partition_info               │ BASE TABLE │
│ ducklake_schema                       │ BASE TABLE │
│ ducklake_snapshot                     │ BASE TABLE │
│ ducklake_snapshot_changes             │ BASE TABLE │
│ ducklake_table                        │ BASE TABLE │
│ ducklake_table_column_stats           │ BASE TABLE │
│ ducklake_table_stats                  │ BASE TABLE │
│ ducklake_tag                          │ BASE TABLE │
│ ducklake_view                         │ BASE TABLE │
├───────────────────────────────────────┴────────────┤
│ 19 rows                                   2 columns │
└─────────────────────────────────────────────────────┘
```
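Because the catalog is just a set of tables, you can inspect it with plain SQL. A sketch; the column names are taken from the DuckLake data model (see the next section), so double-check them against your version:

```sql
-- Last few snapshots, read straight from the metadata tables
SELECT snapshot_id, snapshot_time, schema_version
FROM __ducklake_metadata_warehouse.ducklake_snapshot
ORDER BY snapshot_id DESC
LIMIT 5;
```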
# Data Model of DuckLake
This is what the data model looks like:

Image from The DuckLake Manifesto: SQL as a Lakehouse Format – DuckLake
# Other DuckLake Implementations
Before the official DuckLake announcement by DuckDB Labs on 2025-05-27, there were other versions of DuckLake.
# Xebia Version
An integration of DuckDB with the open-source Unity Catalog.

Source
The article Ducklake: A journey to integrate DuckDB with Unity Catalog - Xebia walks us through the integration.
# Limitations as of Now
2024-10-22
There are a few limitations to be aware of, but these are evolving fast, and solutions are already on the horizon.
- Unity Catalog Authentication: At the time of initial development we used Unity Catalog 0.1.0. This version did not include Role-Based Access Control (RBAC) yet. Since then, support for this was added in Unity Catalog 0.2.0, making our setup even more powerful. In the upcoming series, we'll be able to demonstrate how to set it up, so stay tuned for that future post.
- Read-Only Delta Support: Currently, DuckDB can only read from Unity Catalog because of its dependency on delta-kernel-rs, which limits operations to read-only. The DuckDB team is actively working on write support, and it's definitely on their roadmap. In the meantime, we've developed a workaround to help you write to Unity Catalog, so you can keep moving forward even before native support arrives.
- MotherDuck Integration: MotherDuck offers a managed DuckDB experience, but unfortunately does not yet support the uc_catalog extension. That doesn't stop us! To get similar notebook integration, we have built a solution using Jupyter notebooks, a web-based tool for interactive computing.
Origin:
x.com
References: DuckHouse
Created 2024-10-22