PostgreSQL → Snowflake: 10M CDC ops in 3m 52s

So we set up a Change Data Capture replication pipeline from PostgreSQL to Snowflake and wanted to benchmark how it performs under a “typical” OLTP load. This article discusses the various characteristics of the benchmark.

Benchmark Details

The most important decision here was to determine how to do a replication benchmark, specifically what constitutes a typical load profile of an OLTP database. We realized that a typical OLTP database has the following characteristics

It has multiple tables.
It has some inserts.
It has somewhat more updates.
It has some deletes.
It has small transactions with lot of commits, mostly single row operations and commits.

Given the above characteristics we decided on doing the following for the benchmark.

We will have 10 tables.
We will have 2 million single row inserts.
We will have 6 million single row updates.
We will have 2 million single row deletes.
We will have auto commit on, causing all the above operations to have corresponding commit.

Table Structure

We will have 10 tables called firenibble 1 to 10.

CREATE TABLE public.firenibble1 (
  f0 bigint NOT NULL,
  f1 bigint DEFAULT NULL,
  f2 bigint DEFAULT NULL,
  f3 int DEFAULT NULL,
  f4 float8 DEFAULT NULL,
  f5 float8 DEFAULT NULL,
  f6 float8 DEFAULT NULL,
  f7 float8 DEFAULT NULL,
  f8 varchar(128) DEFAULT NULL,
  f9 varchar(128) DEFAULT NULL,
  f10 date DEFAULT NULL,
  f11 date DEFAULT NULL,
  f12 date DEFAULT NULL,
  f13 varchar(128) DEFAULT NULL,
  f14 varchar(128) DEFAULT NULL,
  f15 varchar(128) DEFAULT NULL,
  PRIMARY KEY (f0)
) ;

Insert

Insert will essentially be new primary key insert.

insert into firenibble1 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');

Update

Update will be updating of two columns on a primary key value.

update firenibble1 set f2 = 280396830734761092, f9 = 'n6ChQjN9dfxIl6nc6SNgIzSahfeXfAef' where f0 = 19998;

Delete

Delete will be essentially a primary key delete.

delete from firenibble1 where f0 = 19998;

DML Operations Sequence

Then came the question of how do we order the load so that we can rerun the load over and over again and never have to clean up the previous load. So we decided that the load will be such that we will start with 0 rows in all 10 tables, do a bunch of inserts, do a bunch of updates, do a bunch of deletes and end up with 0 rows in all tables. This kept the load and the tables rather clean. So the exact sequence will look like this. I know we could have been more granular mixing the inserts, updates and deletes even more, and as we do more benchmarks we might try that strategy but for starters this looked pretty good. Also we wanted to see how our replication performed with large number of insert only, update only and delete only loads. Also note that there is no relationships between the load and the change logs, in the sense there is no flush binary logs happening anywhere causing the inserts, updates and deletes to cross binary log boundaries.

1 million inserts
3 million updates
1 million deletes
1 million inserts
3 million updates
1 million deletes.

We will also makes sure that all inserts, updates and deletes happen on 10 tables in a sequence so that the operations are granularily staggered over all tables and not contiguous on one table and the next. Which means, the inserts look like this (and so does the updates and deletes).

insert into firenibble1 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');
insert into firenibble2 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');
insert into firenibble3 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');
insert into firenibble4 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');
insert into firenibble5 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');
insert into firenibble6 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');
insert into firenibble7 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');
insert into firenibble8 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');
insert into firenibble9 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');
insert into firenibble10 values (19998, 5963887289447355291, 6644703987735879175, -2094978734, 0.51946, 0.480314, 0.528386, 0.247536, 'Y4Jt79YtKat99KRV8CioT7p6bUVdtsm6', 'hC8SKt3MgvoGkOYJIljgKjdEaoY68YJe', '5354-04-29', '5741-02-18', '9548-08-16', 'K3fiVGzTGeiIB2AfChQDXOAQ7ANZBxEN', 'SETnGb7dxOVQthQtFdH6lbZ6yqVVoXPU', 'fJMS3fPnIJBnsi0CBxkFc2fZDf5gUiSJ');

Benchmark Sequence

A choice we had was to make was do one of the following

Have the wirekite replication running. Run the load and wait for the last event of the load to appear on the target. Measure the time between when the load started and the last event appeared.
Run the load and note down the binlog number and position at which the load started and wait for the load to finish. Now start the wirekite replication from the starting binlog number and position and wait for the last event of the load to appear on the target. Measure the time between when the wirekite replication started and the last event appeared on the target.

We decided to go with option 2. The primary reason being that in option 1 wirekite replication was waiting for the events to finish successfully on the database, and essentially we were not only benchmarking wirekite, but also the database, with the boundaries between the two completely blurred. So we preferred the pre-loaded database with the changes, so that we are only benchmarking wirekite and nothing else.

Machine Configuration

For Postgres we created an n2-standard-16 GCP instance with all the standard configurations - 8 CPU’s and 64 GB RAM in the us-central region. We installed Ubuntu version 24 and Oracle version 19. Our archive log size was 330 MB.

For Snowflake we created a Large Size Warehouse with Standard Type also in the us-central1 region to minimize the network distance.

We have been using the same configurations for all our benchmarks so that we are comparing apples and apples.

Wirekite Run

So how long did it take for wirekite to move 10 million changes — 3 minutes and 52 seconds.

cat wirekite_cdc.log | grep -E 'START|FINISH|ELAPSED'
START: Tue May 13 06:46:30 UTC 2025
FINISH: Tue May 13 06:50:22 UTC 2025
TOTAL ELAPSED TIME: 0 hrs 4 min 5 sec