Search

Search IconIcon to open search

Kaggle

Last updated by Simon Späti

Kaggle is the world’s largest Data Science community with powerful tools and resources to help you achieve your data science goals.

# Datasets

See datasets in Find good Data Sets or Sources:

!Find good Data Sets or Sources

# Using DuckDB to Read Datasets with Extension

Gaggle is a DuckDB extension for working with Kaggle datasets. TL;DR: You can directly access and query Kaggle datasets with kaggle: as if they were local tables, abstracting away the download, extraction, and caching process.

See how this works with example. First setup API keys with - Go to kaggle.com/settings API section → "Create New Token" and export it:

1
2
export KAGGLE_USERNAME="your_username"
export KAGGLE_KEY="your_api_key"

Then start DuckDB and install the extension:

1
2
install gaggle from community;
load gaggle;

List boston house prices (this dataset here: Boston House Prices):

1
2
3
4
5
6
7
8
-- List files (no recursive parameter in released version)
SELECT * FROM gaggle_ls('vikrishnan/boston-house-prices');
┌─────────────┬───────┬─────────────────────────────────────────────────────────────────────────────────┐
    name      size                                        path                                       
   varchar    int64                                      varchar                                     
├─────────────┼───────┼─────────────────────────────────────────────────────────────────────────────────┤
 housing.csv    0    /home/sspaeti/.cache/gaggle/datasets/vikrishnan/boston-house-prices/housing.csv 
└─────────────┴───────┴─────────────────────────────────────────────────────────────────────────────────┘

Query dataset directly:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT * FROM 'kaggle:vikrishnan/boston-house-prices/housing.csv' LIMIT 5;
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
 0.00632  18.00   2.310  0  0.5380  6.5750  65.20  4.0900   1  296.0  15.30 396.90   4.98  24.00  
                                             varchar                                              
├──────────────────────────────────────────────────────────────────────────────────────────────────┤
  0.02731   0.00   7.070  0  0.4690  6.4210  78.90  4.9671   2  242.0  17.80 396.90   9.14  21.60 
  0.02729   0.00   7.070  0  0.4690  7.1850  61.10  4.9671   2  242.0  17.80 392.83   4.03  34.70 
  0.03237   0.00   2.180  0  0.4580  6.9980  45.80  6.0622   3  222.0  18.70 394.63   2.94  33.40 
  0.06905   0.00   2.180  0  0.4580  7.1470  54.20  6.0622   3  222.0  18.70 396.90   5.33  36.20 
  0.02985   0.00   2.180  0  0.4580  6.4300  58.70  6.0622   3  222.0  18.70 394.12   5.21  28.70 
└──────────────────────────────────────────────────────────────────────────────────────────────────┘

Or download first, then get file path:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
SELECT gaggle_download('vikrishnan/boston-house-prices');
┌─────────────────────────────────────────────────────────────────────┐
          gaggle_download('vikrishnan/boston-house-prices')          
                               varchar                               
├─────────────────────────────────────────────────────────────────────┤
 /home/sspaeti/.cache/gaggle/datasets/vikrishnan/boston-house-prices 

└─────────────────────────────────────────────────────────────────────┘

SELECT gaggle_file_path('vikrishnan/boston-house-prices', 'housing.csv');
┌─────────────────────────────────────────────────────────────────────────────────┐
        gaggle_file_path('vikrishnan/boston-house-prices', 'housing.csv')        
                                     varchar                                     
├─────────────────────────────────────────────────────────────────────────────────┤
 /home/sspaeti/.cache/gaggle/datasets/vikrishnan/boston-house-prices/housing.csv 
└─────────────────────────────────────────────────────────────────────────────────┘

Checking version:

1
2
3
4
5
6
7
SELECT gaggle_version();
┌──────────────────┐
 gaggle_version() 
     varchar      
├──────────────────┤
 0.1.0-alpha.3    
└──────────────────┘

Origin: Kaggle examples
References: Change Data Capture (CDC) - Real-Estate Project, Updating my Open-Source Real-Estate Data Engineering Project
Created 2025-12-02