Search
Postgres WAL Implementation
How it works see A Guide to Logical Replication and CDC in PostgreSQL with Airbyte - Neon.
A nice implementation with Singer Tap: pipelinewise-tap-postgres/tap_postgres/sync_strategies/logical_replication.py at d3180a193e30b3813ea71d3425b1fe8b2379896c · transferwise/pipelinewise-tap-postgres · GitHub.
# General workflow of WAL
When using logical replication in PostgreSQL, if the stream is closed and then restarted after a delay (like the 60 seconds in your script), you do not lose the changes that occurred during the interval when the stream was closed. Here’s why:
- Persistent Replication Slots: PostgreSQL uses a concept called “replication slots” for logical replication. When you create a replication slot, PostgreSQL keeps track of the changes that need to be sent to that slot. Even if your script (or the consumer of the replication slot) disconnects or stops consuming the WAL changes, PostgreSQL retains these changes.
- Resuming From Last Checkpoint: When you restart the stream, it resumes from the last checkpoint – the last LSN (Log Sequence Number) that was acknowledged by your script using
msg.cursor.send_feedback(flush_lsn=msg.data_start)
. This ensures that all changes that occurred while the stream was closed are still sent to your script once it resumes listening. - Disk Space Consideration: It’s important to note that if a replication slot is not actively consumed (like during your 60-second pause), the WAL segments that contain changes for that slot will be retained on disk and not recycled. This can lead to increased disk space usage on the server if the slot remains inactive for a long time.
- Handling Prolonged Disconnections: In cases where your script might be disconnected for an extended period, it’s crucial to manage the replication slot and ensure it does not lead to excessive disk space usage. You might need to drop and recreate the slot if it becomes too far behind, though this would mean losing some data.
# Activate WAL Log
go to local file postgresql.conf
:
|
|
# Read WAL logs with Python
Simples Beispiel wie man ein Stream lesen kann:
|
|
# Stream json im wal2json format.
Setup database:
|
|
after that we build up the stream with python abobe-script.py
.
# Inserts and Updates
If we have these inserts and updates
|
|
The wal2json
looks like this:
|
|
Origin:
Subscribing to PostgreSql logical replication using python and psycopg2 – dbaStreet – A Database & Cloud Blog
References:
Created 2024-01-18