In short: Debezium reads Oracle's redo logs through LogMiner, publishes every change as a Kafka event, and a JDBC sink connector applies those events to PostgreSQL in real time.
The result is a transparent, replayable replication pipeline you can run for weeks before cutover — useful for any Oracle to PostgreSQL migration that needs zero or near-zero downtime.
This post walks through a working setup end to end, on real Oracle and PostgreSQL servers, with a self-contained banking schema you can paste and reproduce.
Most Oracle to PostgreSQL migration projects fail at the cutover, not at the schema conversion.
The schema work is hard but bounded — you spend a few weeks on it, you fix what the conversion tool cannot translate, and you move on.
The cutover is open-ended.
You have a production Oracle database that processes thousands of transactions per second.
You need to switch the application to PostgreSQL without losing a single row, and ideally without a multi-hour downtime window.
A snapshot-based migration alone — dump, restore, repoint — does not solve this.
The moment you take the snapshot, every new INSERT, UPDATE, and DELETE on Oracle is invisible to PostgreSQL.
You need a tool that captures Oracle's ongoing changes and applies them to PostgreSQL continuously.
Debezium is the open-source standard for that job.
What Debezium Does
Debezium is a change data capture (CDC) framework built on top of Kafka Connect.
For Oracle, it uses the LogMiner adapter — a built-in Oracle feature that reads the redo logs and exposes every committed change as a structured stream.
Debezium polls LogMiner, converts each row change into a Kafka event, and publishes the event to a Kafka topic.
A second connector — the Debezium JDBC sink — subscribes to those topics and applies the changes to a target database.
For Oracle to PostgreSQL replication, the pipeline looks like this:
Oracle (LogMiner) → Debezium source → Kafka → Debezium JDBC sink → PostgreSQL
Both connectors run inside a single Debezium Connect process.
Kafka holds events on topics named after each Oracle table.
You can stop the sink and let events accumulate in Kafka.
You can replay events from a specific offset.
You can run two sinks in parallel — one to PostgreSQL, one to a staging environment for testing.
This is what makes Debezium useful for migration work: you can see exactly what is happening at every stage, and you can intervene at any of them.
The Environment
Three VMs on the same network.
- srv1 (192.168.0.180) runs Oracle 19c Enterprise Edition with a CDB called ORADB and a pluggable database pdb1.
- srvdebezium (192.168.0.230) runs Ubuntu 24.04 with Docker, hosting the Kafka broker and the Debezium Connect container.
- srv2 (192.168.0.181) runs PostgreSQL 18 from the official PGDG repository, hosting the target database bankingdb.
The pipeline replicates a small banking schema from Oracle to PostgreSQL: five tables (branches, employees, customers, accounts, transactions) with foreign-key relationships, mixed data types, and 18 sample rows.
The full DDL and sample data are below — copy-paste reproducible from a clean Oracle install.
Step 1 — Prepare the Oracle Source
LogMiner needs three things turned on before Debezium can read it.
Archivelog mode.
Without it, Oracle overwrites old redo logs once they are no longer needed for instance recovery.
LogMiner needs them on disk to read.
Switching to archivelog mode requires an instance restart.
-- On srv1 as SYSDBA
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
ALTER PLUGGABLE DATABASE pdb1 OPEN;
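A quick sanity check that the switch took effect (V$DATABASE is the standard view for this):
-- Still as SYSDBA
SELECT log_mode FROM v$database;   -- expect ARCHIVELOG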
Supplemental logging.
By default, Oracle writes only the changed columns to the redo log.
Debezium needs full row images to populate the before field in update events.
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
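Both levels can be verified from the same session (these columns exist on V$DATABASE; expect YES for both after the two statements above):
SELECT supplemental_log_data_min, supplemental_log_data_all FROM v$database;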
A capture user.
Debezium connects as a common user (the C## prefix is mandatory in a multitenant database) with read access to the data dictionary and LogMiner views.
Apply every grant — Debezium will fail to start if any one of them is missing.
CREATE USER C##DBZUSER IDENTIFIED BY dbz
DEFAULT TABLESPACE users
QUOTA UNLIMITED ON users
CONTAINER=ALL;
GRANT CREATE SESSION, SET CONTAINER, LOGMINING,
SELECT ANY TRANSACTION, SELECT ANY DICTIONARY,
SELECT_CATALOG_ROLE, EXECUTE_CATALOG_ROLE,
FLASHBACK ANY TABLE, SELECT ANY TABLE,
CREATE TABLE, LOCK ANY TABLE, CREATE SEQUENCE
TO C##DBZUSER CONTAINER=ALL;
GRANT EXECUTE ON DBMS_LOGMNR TO C##DBZUSER CONTAINER=ALL;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##DBZUSER CONTAINER=ALL;
-- Plus SELECT on V_$DATABASE, V_$LOG, V_$LOGFILE, V_$ARCHIVED_LOG,
-- V_$LOGMNR_LOGS, V_$LOGMNR_CONTENTS, V_$TRANSACTION, and a few more.
-- Do not shortcut with GRANT DBA — explicit grants are auditable and reversible.
The complete grant list is in the Debezium 3.5 Oracle connector documentation.
Twenty-eight grants in total.
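If the connector refuses to start with a privilege error, auditing what the capture user actually holds is faster than re-reading the grant script (DBA_SYS_PRIVS, DBA_ROLE_PRIVS, and DBA_TAB_PRIVS are standard dictionary views; run as SYSDBA):
SELECT privilege FROM dba_sys_privs WHERE grantee = 'C##DBZUSER' ORDER BY privilege;
SELECT granted_role FROM dba_role_privs WHERE grantee = 'C##DBZUSER';
SELECT owner, table_name, privilege FROM dba_tab_privs WHERE grantee = 'C##DBZUSER';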
Step 2 — Create the BANKING Schema
The schema below is small enough to paste in 30 seconds and varied enough to exercise the data-type behaviour Debezium gets right and wrong.
-- On srv1, connected to pdb1 as a privileged user
CREATE USER banking IDENTIFIED BY banking
DEFAULT TABLESPACE users
TEMPORARY TABLESPACE temp;
GRANT CONNECT, RESOURCE, CREATE VIEW TO banking;
ALTER USER banking QUOTA UNLIMITED ON users;
Connect as banking and create the five tables:
CREATE TABLE branches (
branch_id NUMBER(6) PRIMARY KEY,
branch_name VARCHAR2(100) NOT NULL,
city VARCHAR2(50),
country VARCHAR2(50),
opened_date DATE
);
CREATE TABLE employees (
employee_id NUMBER(6) PRIMARY KEY,
branch_id NUMBER(6) NOT NULL,
full_name VARCHAR2(100) NOT NULL,
role VARCHAR2(50),
hire_date DATE,
CONSTRAINT fk_emp_branch FOREIGN KEY (branch_id) REFERENCES branches(branch_id)
);
CREATE TABLE customers (
customer_id NUMBER(8) PRIMARY KEY,
full_name VARCHAR2(100) NOT NULL,
email VARCHAR2(100),
country VARCHAR2(50),
created_at TIMESTAMP DEFAULT SYSTIMESTAMP
);
CREATE TABLE accounts (
account_id NUMBER(10) PRIMARY KEY,
customer_id NUMBER(8) NOT NULL,
branch_id NUMBER(6) NOT NULL,
account_type VARCHAR2(20),
balance NUMBER(15,2) DEFAULT 0,
opened_at TIMESTAMP DEFAULT SYSTIMESTAMP,
CONSTRAINT fk_acc_customer FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
CONSTRAINT fk_acc_branch FOREIGN KEY (branch_id) REFERENCES branches(branch_id)
);
CREATE TABLE transactions (
txn_id NUMBER(12) PRIMARY KEY,
account_id NUMBER(10) NOT NULL,
employee_id NUMBER(6),
txn_type VARCHAR2(20),
amount NUMBER(15,2) NOT NULL,
txn_date TIMESTAMP DEFAULT SYSTIMESTAMP,
description VARCHAR2(200),
CONSTRAINT fk_txn_account FOREIGN KEY (account_id) REFERENCES accounts(account_id),
CONSTRAINT fk_txn_employee FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);
Insert a small representative dataset — three branches, four employees, three customers, four accounts, four transactions:
INSERT INTO branches VALUES (1, 'Madrid Centro', 'Madrid', 'Spain', DATE '2010-03-15');
INSERT INTO branches VALUES (2, 'Barcelona Diagonal', 'Barcelona', 'Spain', DATE '2012-06-01');
INSERT INTO branches VALUES (3, 'Lisbon Avenida', 'Lisbon', 'Portugal', DATE '2015-09-10');
INSERT INTO accounts VALUES (10001, 1001, 1, 'CHECKING', 5000.00, SYSTIMESTAMP);
INSERT INTO accounts VALUES (10002, 1001, 1, 'SAVINGS', 25000.00, SYSTIMESTAMP);
-- (additional INSERTs in the lab repository)
COMMIT;
Eighteen rows in total, all the foreign keys satisfied.
This is the source state Debezium will snapshot.
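Before starting any connectors, it is worth recording the source row counts so there is something to diff against PostgreSQL after the snapshot (plain SQL against the BANKING schema just created):
-- As banking on pdb1
SELECT 'branches' AS tbl, COUNT(*) AS n FROM branches
UNION ALL SELECT 'employees', COUNT(*) FROM employees
UNION ALL SELECT 'customers', COUNT(*) FROM customers
UNION ALL SELECT 'accounts', COUNT(*) FROM accounts
UNION ALL SELECT 'transactions', COUNT(*) FROM transactions;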
Step 3 — Start Kafka and Kafka Connect
On srvdebezium, create a Docker network and start Kafka in detached mode so the broker survives terminal closure.
docker network create debezium-net
docker run -d --name kafka --hostname kafka --network debezium-net \
-p 9092:9092 \
-e CLUSTER_ID=$(docker run --rm quay.io/debezium/kafka:3.5 /kafka/bin/kafka-storage.sh random-uuid) \
-e NODE_ID=1 -e NODE_ROLE=combined \
-e KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka:9093 \
-e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093 \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092 \
quay.io/debezium/kafka:3.5
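Before starting Connect, confirm the broker answers; the /kafka/bin path below matches the layout of the Debezium Kafka image used above (an assumption worth checking if you swap images):
docker exec kafka /kafka/bin/kafka-topics.sh --bootstrap-server kafka:9092 --list
An empty list is fine at this point; the Connect internal topics appear in the next step.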
The Debezium Oracle connector ships without the Oracle JDBC driver for licensing reasons.
Download ojdbc11.jar from Maven Central and mount it into Connect at startup:
mkdir -p /home/fernando/oracle-jdbc
curl -L \
https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc11/21.9.0.0/ojdbc11-21.9.0.0.jar \
-o /home/fernando/oracle-jdbc/ojdbc11.jar
docker run -d --name connect --network debezium-net \
-p 8083:8083 \
-e BOOTSTRAP_SERVERS=kafka:9092 \
-e GROUP_ID=1 \
-e CONFIG_STORAGE_TOPIC=my_connect_configs \
-e OFFSET_STORAGE_TOPIC=my_connect_offsets \
-e STATUS_STORAGE_TOPIC=my_connect_statuses \
-v /home/fernando/oracle-jdbc/ojdbc11.jar:/kafka/connect/debezium-connector-oracle/ojdbc11.jar \
quay.io/debezium/connect:3.5
Use -d (detached).
Without it, closing the terminal stops the container — and stops your replication pipeline.
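Once Connect is up, its REST API should list both Debezium plugins (a standard Kafka Connect endpoint; the grep filter just trims the JSON for readability):
curl -s http://localhost:8083/connector-plugins | grep -o '"io.debezium.connector[^"]*"'
Expect io.debezium.connector.oracle.OracleConnector and io.debezium.connector.jdbc.JdbcSinkConnector among the results.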
If you are scoping an Oracle to PostgreSQL migration and want a second opinion on cutover design, I offer a fixed-fee migration assessment →
Step 4 — Register the Oracle Source Connector
The connector configuration tells Debezium which database to read, which schema to capture, and where to store its internal schema history.
curl -i -X POST -H "Content-Type:application/json" \
http://localhost:8083/connectors/ -d '{
"name": "oracle-connector",
"config": {
"connector.class": "io.debezium.connector.oracle.OracleConnector",
"tasks.max": "1",
"database.hostname": "192.168.0.180",
"database.port": "1521",
"database.user": "C##DBZUSER",
"database.password": "dbz",
"database.dbname": "ORADB",
"database.pdb.name": "pdb1",
"topic.prefix": "oracle",
"schema.include.list": "BANKING",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schemahistory.oracle",
"decimal.handling.mode": "string"
}
}'
Three settings worth understanding:
- schema.include.list: "BANKING" captures every table in the BANKING schema. No table.include.list is set, so when a new table appears in BANKING, Debezium picks it up automatically — you'll see this in Step 7.
- topic.prefix: "oracle" controls the Kafka topic naming. Events for BANKING.TRANSACTIONS land on oracle.BANKING.TRANSACTIONS. The PDB name does not appear in the topic name, even though database.pdb.name is set — it is used for filtering only.
- decimal.handling.mode: "string" makes Oracle NUMBER values arrive as readable strings instead of base64-encoded BigDecimal bytes. Convenient for debugging events.
As we will see, this has a downstream consequence on the PostgreSQL side.
The connector immediately starts a snapshot phase: it scans every captured table, publishes every existing row as an event with operation type r (read), and then switches to streaming mode where it tails the redo log.
You can watch progress with docker logs -f connect.
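The connector's status endpoint is the quickest health check (standard Kafka Connect REST API):
curl -s http://localhost:8083/connectors/oracle-connector/status
A healthy connector reports "state":"RUNNING" for both the connector and its single task; a FAILED state includes the Java stack trace, which almost always names the missing grant or the unreachable host.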
Step 5 — Watch CDC Events Live
Open a second terminal on srvdebezium and start a Kafka console consumer to watch events as they arrive.
Pipe through grep to extract just the meaningful fields — the full JSON envelope is dominated by the schema definition, which is fine for processing but unreadable on screen.
docker run -it --rm --name watcher --network debezium-net \
--entrypoint /kafka/bin/kafka-console-consumer.sh \
quay.io/debezium/kafka:3.5 \
--bootstrap-server kafka:9092 \
--topic oracle.BANKING.TRANSACTIONS \
--from-beginning 2>/dev/null \
| grep -oE '"(op|TXN_ID|TXN_TYPE|AMOUNT|DESCRIPTION)":"[^"]*"'
Four snapshot events appear immediately, one per existing transaction, each with "op":"r".
Now generate a live change on Oracle:
INSERT INTO transactions VALUES (100005, 10002, 102, 'DEPOSIT', 5000.00, SYSTIMESTAMP, 'Bonus deposit');
COMMIT;
A new event appears with "op":"c" and the full row in after.
An UPDATE shows "op":"u" with both before and after populated.
A DELETE shows "op":"d" with before populated and after null.
Sub-second latency in steady state.
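Stripped of the schema half of the envelope, the payload of an update event looks roughly like this (an abbreviated illustration, not a verbatim capture; source metadata is trimmed and the values are made up for the example):
{
  "before": { "TXN_ID": "100005", "ACCOUNT_ID": "10002", "TXN_TYPE": "DEPOSIT", "AMOUNT": "5000.00", "DESCRIPTION": "Bonus deposit" },
  "after":  { "TXN_ID": "100005", "ACCOUNT_ID": "10002", "TXN_TYPE": "DEPOSIT", "AMOUNT": "5500.00", "DESCRIPTION": "Bonus deposit" },
  "source": { "connector": "oracle", "table": "TRANSACTIONS", "scn": "..." },
  "op": "u",
  "ts_ms": "..."
}
Note that AMOUNT arrives as a string; that is decimal.handling.mode: "string" at work, and it matters again in the schema.evolution discussion below.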
Step 6 — Set Up the PostgreSQL Sink
On srv2, create the target user, database, and schema, and allow remote connections from srvdebezium.
sudo -u postgres psql -c "CREATE USER banking WITH PASSWORD 'banking';"
sudo -u postgres psql -c "CREATE DATABASE bankingdb OWNER banking;"
sudo -u postgres psql -d bankingdb -c "CREATE SCHEMA banking AUTHORIZATION banking;"
Then register the JDBC sink connector.
A RegexRouter SMT strips the oracle.BANKING. prefix from each topic before it becomes the target table name, so oracle.BANKING.TRANSACTIONS maps to banking.transactions instead of banking.oracle_banking_transactions.
curl -i -X POST -H "Content-Type:application/json" \
http://localhost:8083/connectors/ -d '{
"name": "jdbc-sink-connector",
"config": {
"connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:postgresql://192.168.0.181:5432/bankingdb",
"connection.username": "banking",
"connection.password": "banking",
"insert.mode": "upsert",
"delete.enabled": "true",
"primary.key.mode": "record_key",
"schema.evolution": "basic",
"topics.regex": "oracle\\.BANKING\\..+",
"table.name.format": "banking.${topic}",
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "oracle\\.BANKING\\.(.*)",
"transforms.route.replacement": "$1"
}
}'
Within seconds of registration, the snapshot rows appear in PostgreSQL.
The 18 BANKING rows replicate in under a minute.
Any subsequent change on Oracle propagates to PostgreSQL in seconds.
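A quick row-count comparison against the Oracle counts from Step 2 confirms the snapshot landed (run on srv2; the table names follow the RegexRouter mapping described above):
sudo -u postgres psql -d bankingdb -c "
  SELECT 'accounts' AS tbl, count(*) FROM banking.accounts
  UNION ALL SELECT 'customers', count(*) FROM banking.customers
  UNION ALL SELECT 'transactions', count(*) FROM banking.transactions;"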
Step 7 — Add a New Table Mid-Stream
This is where the pipeline proves itself.
Add a LOANS table to BANKING while replication is running.
CREATE TABLE loans (
loan_id NUMBER(10) PRIMARY KEY,
customer_id NUMBER(8) NOT NULL,
loan_type VARCHAR2(30),
principal NUMBER(15,2) NOT NULL,
interest_rate NUMBER(5,2),
start_date DATE,
status VARCHAR2(20),
CONSTRAINT fk_loan_customer FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
INSERT INTO loans VALUES (1, 1001, 'MORTGAGE', 250000.00, 3.25, DATE '2023-01-15', 'ACTIVE');
INSERT INTO loans VALUES (2, 1002, 'AUTO', 18000.00, 4.50, DATE '2024-06-01', 'ACTIVE');
INSERT INTO loans VALUES (3, 1003, 'PERSONAL', 5000.00, 6.75, DATE '2025-02-10', 'ACTIVE');
COMMIT;
The Oracle connector picks up the new table because it matches BANKING\..*.
A new topic oracle.BANKING.LOANS appears in Kafka.
The JDBC sink picks up that topic because it matches oracle\.BANKING\..+.
A new table banking.loans is created in PostgreSQL with the three rows.
No connector reconfiguration.
No restart.
No redeploy.
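You can confirm both the auto-created table and its contents from srv2 (the column types you will see are exactly the issue discussed in the next section):
sudo -u postgres psql -d bankingdb -c "\d banking.loans"
sudo -u postgres psql -d bankingdb -c "SELECT loan_id, loan_type, principal, status FROM banking.loans;"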
The schema.evolution Tradeoff
schema.evolution: basic is what made the auto-creation of banking.loans work without manual intervention.
It is also what disqualifies this configuration for production.
Run \d banking.accounts on PostgreSQL after replication and the table looks like this:
Table "banking.accounts"
 Column       | Type                           | Nullable | Default
--------------+--------------------------------+----------+---------------
 account_id   | text                           | not null |
 customer_id  | text                           | not null |
 branch_id    | text                           | not null |
 account_type | text                           |          |
 balance      | text                           |          | '0.00'::text
 opened_at    | timestamp(6) without time zone |          |
Every Oracle NUMBER column lands in PostgreSQL as text.
Including balance.
With a default of '0.00'::text.
This is the combined effect of two settings:
- schema.evolution: basic — Debezium auto-creates target tables using the lowest-common-denominator JDBC types from the Kafka event schema.
- decimal.handling.mode: "string" — Oracle NUMBER values arrive as strings in Kafka events, so the auto-created PostgreSQL columns are typed accordingly.
The practical impact appears as soon as you write a query: WHERE customer_id = 1001 fails with operator does not exist: text = integer.
You have to use a string literal: WHERE customer_id = '1001'.
Every existing application query breaks.
The production fix is straightforward but requires preparation:
- Build the PostgreSQL schema beforehand using ora2pg or another conversion tool. The schema gets the right types: NUMERIC(15,2) for amounts, INTEGER or NUMERIC for IDs, proper VARCHAR(n) lengths (a sketch follows this list).
- Disable schema.evolution in the sink connector. Debezium streams data into the existing schema without modifying it.
- Use decimal.handling.mode: "precise" (the default). Values arrive as Java BigDecimal and the JDBC sink correctly maps them to the PostgreSQL NUMERIC columns.
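As an example of the first point, a hand-built target definition for the accounts table would look roughly like this (a sketch of what ora2pg typically produces from the Oracle DDL above; the exact integer widths are a judgment call):
CREATE TABLE banking.accounts (
    account_id    BIGINT PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES banking.customers(customer_id),
    branch_id     INTEGER NOT NULL REFERENCES banking.branches(branch_id),
    account_type  VARCHAR(20),
    balance       NUMERIC(15,2) DEFAULT 0,
    opened_at     TIMESTAMP DEFAULT now()
);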
This is the pattern for any production migration.
The schema evolution feature is for labs and prototypes — useful for proving the pipeline works, dangerous to ship.
The New-Table Discovery Delay
When a new table appears in Oracle while replication is running, there is a small but real delay before it appears in PostgreSQL.
The first SELECT on the new PostgreSQL table can return relation does not exist even though the Kafka topic is already populated.
The sequence is:
- Oracle commits the DDL — the new table appears in the redo log.
- Debezium captures the DDL and creates the Kafka topic. At this point the topic exists, but the sink connector has not seen it yet.
- The INSERTs publish to the new topic.
- The sink connector eventually refreshes its topic metadata. The default metadata.max.age.ms is 5 minutes, though Connect typically refreshes more often. When it does, the sink discovers the new topic, reads the schema from the first event, runs CREATE TABLE on PostgreSQL via schema.evolution: basic, and then applies the events.
- The table now exists on PostgreSQL.
Typical end-to-end discovery delay: 5–30 seconds.
In a real migration, this matters if an application is expected to write to a brand-new Oracle table and read it from PostgreSQL immediately.
Two ways to handle it:
- Lower consumer.override.metadata.max.age.ms in the sink connector config (e.g. to 5 seconds) for faster discovery (a sketch follows this list).
- Pre-create tables in PostgreSQL with the correct types and disable schema.evolution — the production pattern recommended above. Tables exist before any data flows.
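For the first option, the override is one extra key on the sink configuration registered in Step 6. The consumer.override. prefix is a standard Kafka Connect mechanism, but it only takes effect if the worker allows client overrides (connector.client.config.override.policy=All), which is not guaranteed on every deployment, so treat this as a sketch to verify against your worker config:
curl -i -X PUT -H "Content-Type:application/json" \
  http://localhost:8083/connectors/jdbc-sink-connector/config -d '{
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:postgresql://192.168.0.181:5432/bankingdb",
    "connection.username": "banking",
    "connection.password": "banking",
    "insert.mode": "upsert",
    "delete.enabled": "true",
    "primary.key.mode": "record_key",
    "schema.evolution": "basic",
    "topics.regex": "oracle\\.BANKING\\..+",
    "table.name.format": "banking.${topic}",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "oracle\\.BANKING\\.(.*)",
    "transforms.route.replacement": "$1",
    "consumer.override.metadata.max.age.ms": "5000"
  }'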
Production Gotchas
A few things to watch for in production that the lab does not surface.
Tables without primary keys.
Debezium's UPDATE and DELETE events are keyed by the primary key.
Tables without a PK can be snapshotted and INSERTs replicated, but UPDATEs and DELETEs cannot be applied to PostgreSQL.
Audit the source schema for PK-less tables before scoping the migration.
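A quick way to find them on the Oracle side (standard dictionary views, run as the schema owner):
SELECT table_name
FROM   user_tables
WHERE  table_name NOT IN (
         SELECT table_name FROM user_constraints WHERE constraint_type = 'P'
       );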
LogMiner archive log retention.
LogMiner can only mine archive logs that still exist on disk.
If your retention policy deletes archive logs after 24 hours and Debezium falls behind by more than that, the pipeline cannot resume from where it stopped.
Size archive log retention to your worst-case Debezium downtime — multiply your usual retention by 3 for safety.
Container persistence.
Running docker run -it ties the container to the terminal.
Closing the terminal stops the container and breaks the pipeline.
Use -d (detached) for everything that runs longer than a manual test.
Consumer group offsets.
Deleting a sink connector and recreating it with the same name does not reset its position in Kafka.
The new connector picks up at the previous committed offset, skipping any events queued in between.
Reset offsets explicitly with kafka-consumer-groups.sh --reset-offsets --to-earliest before re-registering.
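Kafka Connect names sink consumer groups connect-<connector name>, so for the sink above the reset looks like this (the group name is inferred from that convention; confirm it with --list first, and delete the connector beforehand so the group is inactive):
docker exec kafka /kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092 --list
docker exec kafka /kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
  --group connect-jdbc-sink-connector --reset-offsets --to-earliest --all-topics --execute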
Schema name case.
Oracle schema names are uppercase by default.
The schema.include.list value in the connector config must match exactly — BANKING, not banking.
The sink-side topic names follow the Oracle case, which is why RegexRouter strips an uppercase prefix.
Frequently Asked Questions
Can Debezium migrate the Oracle schema, or only the data?
Only the data.
Debezium captures DML events — INSERT, UPDATE, DELETE — from the Oracle redo log.
Schema definitions, stored procedures, triggers, views, and sequences must be migrated separately, typically with ora2pg.
The schema.evolution: basic feature can auto-create target tables but with simplified types and no constraints — fine for labs, wrong for production.
What is the latency between Oracle and PostgreSQL?
A few seconds in steady state on a healthy LogMiner setup.
Latency grows under heavy write load or if archive log generation outpaces LogMiner's read rate.
Plan for sub-minute lag and monitor it explicitly with kafka-consumer-groups.sh --describe.
Is LogMiner deprecated in Oracle 19c or 21c?
LogMiner is the supported CDC mechanism in Oracle 19c.
Oracle also offers XStream as an alternative, but it requires an Oracle GoldenGate license.
Debezium continues to support both — the LogMiner adapter is the default and works without an additional Oracle license.
Can this run against Oracle Standard Edition?
Yes, with caveats.
LogMiner is available in Standard Edition 2, but supplemental logging and archivelog mode are required, and some optimisations available in Enterprise Edition are not.
Test against the exact edition you have.
Can I use Debezium for one-shot migration without keeping it running?
Yes.
You can run the snapshot phase, wait for it to complete, then stop the connector.
But this is rarely the right call — if you are going to set up Debezium, you may as well let it stream until cutover so you have a tested, lag-free pipeline ready when the application repoints.
In Summary
Debezium turns an Oracle to PostgreSQL migration from a single-shot dump-and-restore into a controllable, observable pipeline that you can run for weeks before cutover.
LogMiner reads Oracle's redo log, Kafka holds the event stream, and the JDBC sink applies changes to PostgreSQL with sub-minute latency.
The result is a real-time replication channel you can validate, replay, and cancel cleanly — the foundation of any zero or near-zero downtime migration.
If you are planning an Oracle to PostgreSQL migration and want help designing the cutover phase or reviewing your CDC architecture, get in touch.
