Search

Search IconIcon to open search

Querying Bluesky with DuckDB and SQL

Last updatedUpdated: by Simon Späti · CreatedCreated: · 4 min read

Querying Bluesky directly with DuckDB and SQL is easily possible because the AT Protocol is publicly available and has a Jetstream that you can read.

# Examples

# Engagement Stats

This is the recent Bluesky Engagement Stats ordered by engagement, including a chart:

See origin on Bluesky post

# Query

# DuckDB Obsidian Extension
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Charting bars with engagement metrics
WITH raw_data AS (
  SELECT * FROM read_json_auto('https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=did:plc:edglm4muiyzty2snc55ysuqx&limit=100')
),
unnested_feed AS (
  SELECT unnest(feed) as post_data FROM raw_data
),
engagement_data AS (
  SELECT 
    RIGHT(post_data.post.uri, 13) as post_uri,
    --post_data.post.uri as post_uri,
    post_data.post.author.handle,
    LEFT(post_data.post.record.text, 50) as post_text,
    post_data.post.record.createdAt as created_at,
    (post_data.post.replyCount + 
     post_data.post.repostCount + 
     post_data.post.likeCount + 
     post_data.post.quoteCount) as total_engagement,
      post_data.post.replyCount as replies,
      post_data.post.repostCount as reposts,
      post_data.post.likeCount as likes,
      post_data.post.quoteCount as quotes,
  FROM unnested_feed
)
SELECT 
  post_uri,
  created_at,
  total_engagement,
  bar(total_engagement, 0, 
      (SELECT MAX(total_engagement) FROM engagement_data), 
      30) as engagement_chart,
  replies, reposts, likes, quotes,
  post_text,

FROM engagement_data
WHERE handle = 'ssp.sh'
ORDER BY total_engagement DESC
LIMIT 10;  
post_uri created_at total_engagement engagement_chart replies reposts likes quotes post_text
3mnct7grlyc25 1780411227949 23 ██████████████████████████████ 1 2 20 0 If you want to read your recent engagement stats f
3mmnticitu22h 1779689970939 19 ████████████████████████▊ 2 0 17 0 Touch some grass, they say. 🇨🇭
3mn2s5af4uk2u 1780135202446.637 18 ███████████████████████▍ 3 1 14 0 The fastest way to learn. Swyx: “Create learning
3mmvm7zvud22z 1779957055219 18 ███████████████████████▍ 2 0 16 0 Running a DuckDB as a server, setting up from scra
3mn7tvavtrs2l 1780308880778 12 ███████████████▋ 2 1 8 1 My second brain just got a lot more powerful. Ins
3mnfozcj5ks2q 1780509806399 11 ██████████████▎ 1 1 7 2 101 concepts every data engineer should know. If
3mmwzgy264t2o 1780005606449.243 10 █████████████ 1 1 8 0 Data lakes solve flexibility. Data warehouses solv
3mn7uh2svps2o 1780309478371 10 █████████████ 0 1 9 0 This wealth of knowledge is free of charge and upd
3mnf62txilk2i 1780491604591.725 10 █████████████ 2 0 8 0 Using DuckDB as a caching layer for sub-second ana
3mnah6tsa632z 1780329603611.699 9 ███████████▋ 1 0 8 0 Note-taking is like saving money with interest. Yo

done with Obsidian DuckDB Extension

See more at DuckDB with Obsidian.

# Live on Page with Workbench

Run the query live on this page:


-- Charting bars with engagement metrics
WITH raw_data AS (
  SELECT * FROM read_json_auto('https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=did:plc:edglm4muiyzty2snc55ysuqx&limit=100')
),
unnested_feed AS (
  SELECT unnest(feed) as post_data FROM raw_data
),
engagement_data AS (
  SELECT 
    RIGHT(post_data.post.uri, 13) as post_uri,
    --post_data.post.uri as post_uri,
    post_data.post.author.handle,
    LEFT(post_data.post.record.text, 50) as post_text,
    post_data.post.record.createdAt as created_at,
    (post_data.post.replyCount + 
     post_data.post.repostCount + 
     post_data.post.likeCount + 
     post_data.post.quoteCount) as total_engagement,
      post_data.post.replyCount as replies,
      post_data.post.repostCount as reposts,
      post_data.post.likeCount as likes,
      post_data.post.quoteCount as quotes,
  FROM unnested_feed
)
SELECT 
  post_uri,
  created_at,
  total_engagement,
  bar(total_engagement, 0, 
      (SELECT MAX(total_engagement) FROM engagement_data), 
      30) as engagement_chart,
  replies, reposts, likes, quotes,
  post_text,

FROM engagement_data
WHERE handle = 'ssp.sh'
ORDER BY total_engagement DESC
LIMIT 30;  

# Reading Posts

Bsky and GitHub Gist:

Queried with Query databases in vim (vim-dadbod) within Neovim.

# Query

Or reading the posts - again, run the query live on this page:


-- Query the API directly and flatten the nested JSON structure
WITH raw_data AS (
  SELECT * FROM read_json_auto('https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=did:plc:edglm4muiyzty2snc55ysuqx&limit=10')
),
unnested_feed AS (
  SELECT unnest(feed) as post_data FROM raw_data
)
SELECT 
  -- Post basics
  post_data.post.uri as post_uri,
  post_data.post.author.handle as author_handle,
  post_data.post.author.displayName as display_name,
  
  -- Post content
  post_data.post.record.text as post_text,
  post_data.post.record.createdAt as created_at,
  
  -- Engagement metrics
  post_data.post.replyCount as replies,
  post_data.post.repostCount as reposts,
  post_data.post.likeCount as likes,
  post_data.post.quoteCount as quotes,
  
  -- Embedded content (if available)
  CASE 
    WHEN post_data.post.embed IS NOT NULL 
    AND post_data.post.embed['$type'] = 'app.bsky.embed.external#view' 
    THEN post_data.post.embed.external.uri
    ELSE NULL
  END as embedded_link,
  
  CASE 
    WHEN post_data.post.embed IS NOT NULL 
    AND post_data.post.embed['$type'] = 'app.bsky.embed.external#view' 
    THEN post_data.post.embed.external.title
    ELSE NULL
  END as embedded_title,
  
  -- Total engagement score
  (post_data.post.replyCount + 
   post_data.post.repostCount + 
   post_data.post.likeCount + 
   post_data.post.quoteCount) as total_engagement
FROM unnested_feed
ORDER BY created_at DESC;

Uploaded to Reading bsky posts with DuckDB example. · GitHub, too.

# Get Unique ID

Get the Unique ID of my User through public API using HTTPS community extension

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
D WITH __input AS (
    SELECT
      http_get('https://public.api.bsky.app/xrpc/com.atproto.identity.resolveHandle?handle=ssp.sh') AS res
  )
  SELECT
    res::json->>'body' as identity_json
  FROM __input;
┌────────────────────────────────────────────┐
               identity_json                
                  varchar                   
├────────────────────────────────────────────┤
 {"did":"did:plc:edglm4muiyzty2snc55ysuqx"} 
└────────────────────────────────────────────┘

Also find a older Git repo with some queries

GitHub - sspaeti/bsky-atproto: Playground for bsky-atproto · GitHub

# Further Reads


Origin: ATProto