QuackStore: OLAP Cache Layer for DuckDB

By Simon Späti · 3 min read

A local OLAP cache layer with QuackStore.

Created by Coginiti.

Speed up your data queries by caching remote files locally. The QuackStore extension uses block-based caching to automatically store frequently accessed file portions in a local cache, dramatically reducing load times for repeated queries on the same data.
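To make the idea of block-based caching concrete, here is a minimal Python sketch. It is not QuackStore's implementation: the `BlockCache` class, the tiny `BLOCK_SIZE`, the `fake_fetch` function, and the example URL are all hypothetical and exist only for illustration. It shows how a read is split into fixed-size blocks, how each block is fetched once and then reused, and how the least recently used block is evicted when the cache is full.

```python
from collections import OrderedDict

BLOCK_SIZE = 4  # tiny block size for the demo only (assumption, not QuackStore's value)


class BlockCache:
    """Toy block cache: maps (url, block_index) -> bytes with LRU eviction."""

    def __init__(self, fetch_range, max_blocks):
        self.fetch_range = fetch_range  # callable(url, offset, length) -> bytes
        self.max_blocks = max_blocks
        self.blocks = OrderedDict()     # insertion order doubles as recency order
        self.remote_reads = 0

    def _get_block(self, url, index):
        key = (url, index)
        if key in self.blocks:
            self.blocks.move_to_end(key)      # mark as most recently used
            return self.blocks[key]
        data = self.fetch_range(url, index * BLOCK_SIZE, BLOCK_SIZE)
        self.remote_reads += 1
        self.blocks[key] = data
        if len(self.blocks) > self.max_blocks:
            self.blocks.popitem(last=False)   # evict the least recently used block
        return data

    def read(self, url, offset, length):
        if length <= 0:
            return b""
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        data = b"".join(self._get_block(url, i) for i in range(first, last + 1))
        start = offset - first * BLOCK_SIZE
        return data[start:start + length]


# Hypothetical in-memory "remote" file standing in for an HTTP or S3 object.
REMOTE = {"https://example.com/data.bin": b"abcdefghijklmnop"}


def fake_fetch(url, offset, length):
    return REMOTE[url][offset:offset + length]


cache = BlockCache(fake_fetch, max_blocks=8)
first = cache.read("https://example.com/data.bin", 2, 6)   # touches blocks 0 and 1
reads_after_first = cache.remote_reads
second = cache.read("https://example.com/data.bin", 2, 6)  # served from cached blocks
reads_after_second = cache.remote_reads
```

In this sketch the second read triggers no additional remote fetches, which is the effect a repeated query on the same remote file benefits from.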

# Example

INSTALL quackstore FROM community;
LOAD quackstore;

SET GLOBAL quackstore_cache_path = '/tmp/my_duckdb_cache.bin';
SET GLOBAL quackstore_cache_enabled = true;

.timer on

-- Slow: Downloads every time
SELECT count(*) FROM read_csv('https://noaa-ghcn-pds.s3.amazonaws.com/csv.gz/by_year/2025.csv.gz');

-- Fast: Cached after first download
SUMMARIZE FROM read_csv('quackstore://https://noaa-ghcn-pds.s3.amazonaws.com/csv.gz/by_year/2025.csv.gz');

The result: the first run, before anything is cached, takes 49.366 seconds while the cache is generated:

count_star()
------------
26016543    
Run Time (s): real 49.366 user 51.777825 sys 0.449690

The second run, served from the cache, takes 3.304 seconds:

count_star()
------------
26016543    
Run Time (s): real 3.304 user 7.630344 sys 0.237343
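
As a quick sanity check on the two timings, the arithmetic can be sketched in Python. The variable names are illustrative; the only inputs are the two `real` run times reported in the outputs.

```python
cold_seconds = 49.366  # first run, cache being populated
warm_seconds = 3.304   # second run, served from the local cache

speedup = cold_seconds / warm_seconds
saved = cold_seconds - warm_seconds
share = saved / cold_seconds

print(f"speedup: {speedup:.1f}x, saved: {saved:.3f} s ({share:.0%} less wall-clock time)")
# prints "speedup: 14.9x, saved: 46.062 s (93% less wall-clock time)"
```

These numbers come from a single pair of runs, so they indicate the order of magnitude rather than a benchmark result.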

The cache is 116 MB for this 26-million-row dataset.

Even the SUMMARIZE query:

SUMMARIZE FROM read_csv('quackstore://https://noaa-ghcn-pds.s3.amazonaws.com/csv.gz/by_year/2025.csv.gz');

was faster afterwards, even though this specific query had not been run before. It took only 4.100 seconds on its first run.

# More from GitHub

There are more examples on GitHub:

-- Cache a CSV file from GitHub
SELECT * FROM 'quackstore://https://raw.githubusercontent.com/owner/repo/main/data.csv';

-- Cache a single Parquet file from S3
SELECT * FROM parquet_scan('quackstore://s3://example_bucket/data/file.parquet');

-- Cache whole Iceberg catalog from S3
SELECT * FROM iceberg_scan('quackstore://s3://example_bucket/iceberg/catalog');

-- Cache any web resource
SELECT content FROM read_text('quackstore://https://example.com/file.txt');

# Comparison

# DuckDB DiskCache vs QuackStore

The following compares DuckDB DiskCache (referenced via Peter Boncz’s work and implemented as the cache_httpfs community extension) with QuackStore (by Coginiti).

Both extensions solve the same core problem: accelerating repeated queries on remote files (S3, HTTP/HTTPS, etc.) by caching data blocks on the local disk. However, they approach integration, configuration, and metadata handling differently.

# High-Level Comparison

| Feature | DuckDB DiskCache (via cache_httpfs) | QuackStore (Coginiti) |
| --- | --- | --- |
| Primary Goal | High-performance, tunable caching for remote I/O (S3/HTTP). | Robust, easy-to-use persistent caching for remote files. |
| Usage Method | Transparent / Wrapper: Can wrap existing filesystems or be configured globally. | Explicit Prefix: Requires changing URLs to quackstore://.... |
| Caching Scope | Data Blocks + Metadata + Globs: Caches file lists and headers, speeding up glob patterns (e.g., s3://bucket/*.parquet). | Data Blocks: Primarily caches file content chunks. |
| Granularity | Tunable block size (default usually 1 MB), tunable parallelism. | Fixed block-based caching (1 MB blocks). |
| Persistence | Directory of cache files; supports in-memory and on-disk modes. | Single persistent cache file (or managed set) with corruption detection. |
| Eviction | LRU (Least Recently Used) or timestamp-based. | LRU with automatic corruption recovery. |
| Developer | Community / Research (Peter Boncz, et al.) | Coginiti (Commercial Tool Vendor) |

# When to choose which

| Choose QuackStore if… | Choose DuckDB DiskCache if… |
| --- | --- |
| You want a “safe” and explicit solution where you control exactly which files are cached via the URL. | You need to speed up glob patterns (e.g., *.parquet) and metadata listing, not just data reading. |
| You are working in an environment where cache corruption (disk bit rot) is a concern and you want auto-recovery. | You want advanced tuning (parallelism, specific block sizes) to maximize S3 throughput. |
| You prefer a simple setup provided by a commercial vendor (Coginiti). | You prefer a solution aligned with core DuckDB research and transparent filesystem integration. |
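
Because QuackStore is opt-in per URL, a small helper can decide which files go through the cache. The following Python sketch only builds query strings: the function names `with_cache`, `without_cache`, and `count_rows_sql` are hypothetical and not part of QuackStore, and the only thing taken from the examples on this page is the `quackstore://` prefix itself.

```python
PREFIX = "quackstore://"


def with_cache(url: str) -> str:
    """Route a URL through QuackStore; applying it twice changes nothing."""
    return url if url.startswith(PREFIX) else PREFIX + url


def without_cache(url: str) -> str:
    """Strip the prefix again to read the remote file directly."""
    return url[len(PREFIX):] if url.startswith(PREFIX) else url


def count_rows_sql(url: str, cached: bool = True) -> str:
    """Build a DuckDB query string for a CSV, with or without the cache prefix."""
    target = with_cache(url) if cached else without_cache(url)
    escaped = target.replace("'", "''")  # escape single quotes for a SQL string literal
    return f"SELECT count(*) FROM read_csv('{escaped}');"


print(count_rows_sql("https://example.com/file.csv"))
print(count_rows_sql("quackstore://https://example.com/file.csv", cached=False))
```

Keeping the decision in one place makes it easy to switch a single source between cached and direct reads without touching the rest of a query.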

Origin: DuckDB