Querying Bluesky with DuckDB and SQL
Querying Bluesky directly with DuckDB and SQL is easily possible because the AT Protocol is publicly available and has a Jetstream that you can read.
# Examples
# Engagement Stats
This is the recent Bluesky Engagement Stats ordered by engagement, including a chart:

See
origin on Bluesky post
# Query
# DuckDB Obsidian Extension
|
|
| post_uri | created_at | total_engagement | engagement_chart | replies | reposts | likes | quotes | post_text |
|---|---|---|---|---|---|---|---|---|
| 3mnct7grlyc25 | 1780411227949 | 23 | ██████████████████████████████ | 1 | 2 | 20 | 0 | If you want to read your recent engagement stats f |
| 3mmnticitu22h | 1779689970939 | 19 | ████████████████████████▊ | 2 | 0 | 17 | 0 | Touch some grass, they say. 🇨🇭 |
| 3mn2s5af4uk2u | 1780135202446.637 | 18 | ███████████████████████▍ | 3 | 1 | 14 | 0 | The fastest way to learn. Swyx: “Create learning |
| 3mmvm7zvud22z | 1779957055219 | 18 | ███████████████████████▍ | 2 | 0 | 16 | 0 | Running a DuckDB as a server, setting up from scra |
| 3mn7tvavtrs2l | 1780308880778 | 12 | ███████████████▋ | 2 | 1 | 8 | 1 | My second brain just got a lot more powerful. Ins |
| 3mnfozcj5ks2q | 1780509806399 | 11 | ██████████████▎ | 1 | 1 | 7 | 2 | 101 concepts every data engineer should know. If |
| 3mmwzgy264t2o | 1780005606449.243 | 10 | █████████████ | 1 | 1 | 8 | 0 | Data lakes solve flexibility. Data warehouses solv |
| 3mn7uh2svps2o | 1780309478371 | 10 | █████████████ | 0 | 1 | 9 | 0 | This wealth of knowledge is free of charge and upd |
| 3mnf62txilk2i | 1780491604591.725 | 10 | █████████████ | 2 | 0 | 8 | 0 | Using DuckDB as a caching layer for sub-second ana |
| 3mnah6tsa632z | 1780329603611.699 | 9 | ███████████▋ | 1 | 0 | 8 | 0 | Note-taking is like saving money with interest. Yo |
done with Obsidian DuckDB Extension
See more at DuckDB with Obsidian.
# Live on Page with Workbench
Run the query live on this page:
-- Charting bars with engagement metrics
WITH raw_data AS (
SELECT * FROM read_json_auto('https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=did:plc:edglm4muiyzty2snc55ysuqx&limit=100')
),
unnested_feed AS (
SELECT unnest(feed) as post_data FROM raw_data
),
engagement_data AS (
SELECT
RIGHT(post_data.post.uri, 13) as post_uri,
--post_data.post.uri as post_uri,
post_data.post.author.handle,
LEFT(post_data.post.record.text, 50) as post_text,
post_data.post.record.createdAt as created_at,
(post_data.post.replyCount +
post_data.post.repostCount +
post_data.post.likeCount +
post_data.post.quoteCount) as total_engagement,
post_data.post.replyCount as replies,
post_data.post.repostCount as reposts,
post_data.post.likeCount as likes,
post_data.post.quoteCount as quotes,
FROM unnested_feed
)
SELECT
post_uri,
created_at,
total_engagement,
bar(total_engagement, 0,
(SELECT MAX(total_engagement) FROM engagement_data),
30) as engagement_chart,
replies, reposts, likes, quotes,
post_text,
FROM engagement_data
WHERE handle = 'ssp.sh'
ORDER BY total_engagement DESC
LIMIT 30;
# Reading Posts
Bsky and
GitHub Gist:

Queried with Query databases in vim (vim-dadbod) within Neovim.
# Query
Or reading the posts - again, run the query live on this page:
-- Query the API directly and flatten the nested JSON structure
WITH raw_data AS (
SELECT * FROM read_json_auto('https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed?actor=did:plc:edglm4muiyzty2snc55ysuqx&limit=10')
),
unnested_feed AS (
SELECT unnest(feed) as post_data FROM raw_data
)
SELECT
-- Post basics
post_data.post.uri as post_uri,
post_data.post.author.handle as author_handle,
post_data.post.author.displayName as display_name,
-- Post content
post_data.post.record.text as post_text,
post_data.post.record.createdAt as created_at,
-- Engagement metrics
post_data.post.replyCount as replies,
post_data.post.repostCount as reposts,
post_data.post.likeCount as likes,
post_data.post.quoteCount as quotes,
-- Embedded content (if available)
CASE
WHEN post_data.post.embed IS NOT NULL
AND post_data.post.embed['$type'] = 'app.bsky.embed.external#view'
THEN post_data.post.embed.external.uri
ELSE NULL
END as embedded_link,
CASE
WHEN post_data.post.embed IS NOT NULL
AND post_data.post.embed['$type'] = 'app.bsky.embed.external#view'
THEN post_data.post.embed.external.title
ELSE NULL
END as embedded_title,
-- Total engagement score
(post_data.post.replyCount +
post_data.post.repostCount +
post_data.post.likeCount +
post_data.post.quoteCount) as total_engagement
FROM unnested_feed
ORDER BY created_at DESC;
Uploaded to Reading bsky posts with DuckDB example. · GitHub, too.
# Get Unique ID
Get the Unique ID of my User through public API using HTTPS community extension
|
|
Also find a older Git repo with some queries
GitHub - sspaeti/bsky-atproto: Playground for bsky-atproto · GitHub
# Further Reads
- I wrote about Building with Bluesky: Inside the New Open Social Network
- Run DuckDB in your Website
- DuckDB with Obsidian
Origin: ATProto