In short: Debezium reads Oracle's redo logs through LogMiner, publishes every change as a Kafka event, and a JDBC sink connector applies those events to PostgreSQL in real time.
The result is a transparent, replayable replication pipeline you can run for weeks before cutover — useful for any Oracle to PostgreSQL migration that needs zero or near-zero downtime.
This post walks through a working setup end to end, on real Oracle and PostgreSQL servers, with a self-contained banking schema you can paste and reproduce.
Most Oracle to PostgreSQL migration projects fail at the cutover, not at the schema conversion.
The schema work is hard but bounded — you spend a few weeks on it, you fix what the conversion tool cannot translate, and you move on.
The cutover is open-ended.
You have a production Oracle database that processes thousands of transactions per second.
You need to switch the application to PostgreSQL without losing a single row, and ideally without a multi-hour downtime window.
A snapshot-based migration alone — dump, restore, repoint — does not solve this.
The moment you take the snapshot, every new INSERT, UPDATE, and DELETE on Oracle is invisible to PostgreSQL.
You need a tool that captures Oracle's ongoing changes and applies them to PostgreSQL continuously.
Debezium is the open-source standard for that job.
What Debezium Does
Debezium is a change data capture (CDC) framework built on top of Kafka Connect.
For Oracle, it uses the LogMiner adapter — a built-in Oracle feature that reads the redo logs and exposes every committed change as a structured stream.
Debezium polls LogMiner, converts each row change into a Kafka event, and publishes the event to a Kafka topic.
A second connector — the Debezium JDBC sink — subscribes to those topics and applies the changes to a target database.
For Oracle to PostgreSQL replication, the pipeline looks like this:
Oracle (LogMiner) → Debezium source → Kafka → Debezium JDBC sink → PostgreSQL
Both connectors run inside a single Debezium Connect process.
Kafka holds events on topics named after each Oracle table.
You can stop the sink and let events accumulate in Kafka.
You can replay events from a specific offset.
You can run two sinks in parallel — one to PostgreSQL, one to a staging environment for testing.
This is what makes Debezium useful for migration work: you can see exactly what is happening at every stage, and you can intervene at any of them.
The Environment
Three VMs on the same network.
- srv1 (192.168.0.180) runs Oracle 19c Enterprise Edition with a CDB called ORADB and a pluggable database pdb1.
- srvdebezium (192.168.0.230) runs Ubuntu 24.04 with Docker, hosting the Kafka broker and the Debezium Connect container.
- srv2 (192.168.0.181) runs PostgreSQL 18 from the official PGDG repository, hosting the target database bankingdb.
The pipeline replicates a small banking schema from Oracle to PostgreSQL: five tables (branches, employees, customers, accounts, transactions) with foreign-key relationships, mixed data types, and 18 sample rows.
The full DDL and sample data are below — copy-paste reproducible from a clean Oracle install.
Step 1 — Prepare the Oracle Source
LogMiner needs three things turned on before Debezium can read it.
Archivelog mode.
Without it, Oracle overwrites old redo logs once they are no longer needed for instance recovery.
LogMiner needs them on disk to read.
Switching to archivelog mode requires an instance restart.
-- On srv1 as SYSDBA
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
ALTER PLUGGABLE DATABASE pdb1 OPEN;
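A quick sanity check that the switch took effect (V$DATABASE is the standard view for this):
-- Still as SYSDBA
SELECT log_mode FROM v$database;   -- expect ARCHIVELOG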
Supplemental logging.
By default, Oracle writes only the changed columns to the redo log.
Debezium needs full row images to populate the before field in update events.
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
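Both levels can be verified from the same session (these columns exist on V$DATABASE; expect YES for both after the two statements above):
SELECT supplemental_log_data_min, supplemental_log_data_all FROM v$database;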
A capture user.
Debezium connects as a common user (the C## prefix is mandatory in a multitenant database) with read access to the data dictionary and LogMiner views.
Apply every grant — Debezium will fail to start if any one of them is missing.
CREATE USER C##DBZUSER IDENTIFIED BY dbz
DEFAULT TABLESPACE users
QUOTA UNLIMITED ON users
CONTAINER=ALL;
GRANT CREATE SESSION, SET CONTAINER, LOGMINING,
SELECT ANY TRANSACTION, SELECT ANY DICTIONARY,
SELECT_CATALOG_ROLE, EXECUTE_CATALOG_ROLE,
FLASHBACK ANY TABLE, SELECT ANY TABLE,
CREATE TABLE, LOCK ANY TABLE, CREATE SEQUENCE
TO C##DBZUSER CONTAINER=ALL;
GRANT EXECUTE ON DBMS_LOGMNR TO C##DBZUSER CONTAINER=ALL;
GRANT EXECUTE ON DBMS_LOGMNR_D TO C##DBZUSER CONTAINER=ALL;
-- Plus SELECT on V_$DATABASE, V_$LOG, V_$LOGFILE, V_$ARCHIVED_LOG,
-- V_$LOGMNR_LOGS, V_$LOGMNR_CONTENTS, V_$TRANSACTION, and a few more.
-- Do not shortcut with GRANT DBA — explicit grants are auditable and reversible.
The complete grant list is in the Debezium 3.5 Oracle connector documentation.
Twenty-eight grants in total.
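If the connector refuses to start with a privilege error, auditing what the capture user actually holds is faster than re-reading the grant script (DBA_SYS_PRIVS, DBA_ROLE_PRIVS, and DBA_TAB_PRIVS are standard dictionary views; run as SYSDBA):
SELECT privilege FROM dba_sys_privs WHERE grantee = 'C##DBZUSER' ORDER BY privilege;
SELECT granted_role FROM dba_role_privs WHERE grantee = 'C##DBZUSER';
SELECT owner, table_name, privilege FROM dba_tab_privs WHERE grantee = 'C##DBZUSER';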
Step 2 — Create the BANKING Schema
The schema below is small enough to paste in 30 seconds and varied enough to exercise the data-type behaviour Debezium gets right and wrong.
-- On srv1, connected to pdb1 as a privileged user
CREATE USER banking IDENTIFIED BY banking
DEFAULT TABLESPACE users
TEMPORARY TABLESPACE temp;
GRANT CONNECT, RESOURCE, CREATE VIEW TO banking;
ALTER USER banking QUOTA UNLIMITED ON users;
Connect as banking and create the five tables:
CREATE TABLE branches (
branch_id NUMBER(6) PRIMARY KEY,
branch_name VARCHAR2(100) NOT NULL,
city VARCHAR2(50),
country VARCHAR2(50),
opened_date DATE
);
CREATE TABLE employees (
employee_id NUMBER(6) PRIMARY KEY,
branch_id NUMBER(6) NOT NULL,
full_name VARCHAR2(100) NOT NULL,
role VARCHAR2(50),
hire_date DATE,
CONSTRAINT fk_emp_branch FOREIGN KEY (branch_id) REFERENCES branches(branch_id)
);
CREATE TABLE customers (
customer_id NUMBER(8) PRIMARY KEY,
full_name VARCHAR2(100) NOT NULL,
email VARCHAR2(100),
country VARCHAR2(50),
created_at TIMESTAMP DEFAULT SYSTIMESTAMP
);
CREATE TABLE accounts (
account_id NUMBER(10) PRIMARY KEY,
customer_id NUMBER(8) NOT NULL,
branch_id NUMBER(6) NOT NULL,
account_type VARCHAR2(20),
balance NUMBER(15,2) DEFAULT 0,
opened_at TIMESTAMP DEFAULT SYSTIMESTAMP,
CONSTRAINT fk_acc_customer FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
CONSTRAINT fk_acc_branch FOREIGN KEY (branch_id) REFERENCES branches(branch_id)
);
CREATE TABLE transactions (
txn_id NUMBER(12) PRIMARY KEY,
account_id NUMBER(10) NOT NULL,
employee_id NUMBER(6),
txn_type VARCHAR2(20),
amount NUMBER(15,2) NOT NULL,
txn_date TIMESTAMP DEFAULT SYSTIMESTAMP,
description VARCHAR2(200),
CONSTRAINT fk_txn_account FOREIGN KEY (account_id) REFERENCES accounts(account_id),
CONSTRAINT fk_txn_employee FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);
Insert a small representative dataset — three branches, four employees, three customers, four accounts, four transactions:
INSERT INTO branches VALUES (1, 'Madrid Centro', 'Madrid', 'Spain', DATE '2010-03-15');
INSERT INTO branches VALUES (2, 'Barcelona Diagonal', 'Barcelona', 'Spain', DATE '2012-06-01');
INSERT INTO branches VALUES (3, 'Lisbon Avenida', 'Lisbon', 'Portugal', DATE '2015-09-10');
INSERT INTO accounts VALUES (10001, 1001, 1, 'CHECKING', 5000.00, SYSTIMESTAMP);
INSERT INTO accounts VALUES (10002, 1001, 1, 'SAVINGS', 25000.00, SYSTIMESTAMP);
-- (additional INSERTs in the lab repository)
COMMIT;
Eighteen rows in total, all the foreign keys satisfied.
This is the source state Debezium will snapshot.
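Before starting any connectors, it is worth recording the source row counts so there is something to diff against PostgreSQL after the snapshot (plain SQL against the BANKING schema just created):
-- As banking on pdb1
SELECT 'branches' AS tbl, COUNT(*) AS n FROM branches
UNION ALL SELECT 'employees', COUNT(*) FROM employees
UNION ALL SELECT 'customers', COUNT(*) FROM customers
UNION ALL SELECT 'accounts', COUNT(*) FROM accounts
UNION ALL SELECT 'transactions', COUNT(*) FROM transactions;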
Step 3 — Start Kafka and Kafka Connect
On srvdebezium, create a Docker network and start Kafka in detached mode so the broker survives terminal closure.
docker network create debezium-net
docker run -d --name kafka --hostname kafka --network debezium-net \
-p 9092:9092 \
-e CLUSTER_ID=$(docker run --rm quay.io/debezium/kafka:3.5 /kafka/bin/kafka-storage.sh random-uuid) \
-e NODE_ID=1 -e NODE_ROLE=combined \
-e KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka:9093 \
-e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093 \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092 \
quay.io/debezium/kafka:3.5
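Before starting Connect, confirm the broker answers; the /kafka/bin path below matches the layout of the Debezium Kafka image used above (an assumption worth checking if you swap images):
docker exec kafka /kafka/bin/kafka-topics.sh --bootstrap-server kafka:9092 --list
An empty list is fine at this point; the Connect internal topics appear in the next step.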
The Debezium Oracle connector ships without the Oracle JDBC driver for licensing reasons.
Download ojdbc11.jar from Maven Central and mount it into Connect at startup:
mkdir -p /home/fernando/oracle-jdbc
curl -L \
https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc11/21.9.0.0/ojdbc11-21.9.0.0.jar \
-o /home/fernando/oracle-jdbc/ojdbc11.jar
docker run -d --name connect --network debezium-net \
-p 8083:8083 \
-e BOOTSTRAP_SERVERS=kafka:9092 \
-e GROUP_ID=1 \
-e CONFIG_STORAGE_TOPIC=my_connect_configs \
-e OFFSET_STORAGE_TOPIC=my_connect_offsets \
-e STATUS_STORAGE_TOPIC=my_connect_statuses \
-v /home/fernando/oracle-jdbc/ojdbc11.jar:/kafka/connect/debezium-connector-oracle/ojdbc11.jar \
quay.io/debezium/connect:3.5
Use -d (detached).
Without it, closing the terminal stops the container — and stops your replication pipeline.
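Once Connect is up, its REST API should list both Debezium plugins (a standard Kafka Connect endpoint; the grep filter just trims the JSON for readability):
curl -s http://localhost:8083/connector-plugins | grep -o '"io.debezium.connector[^"]*"'
Expect io.debezium.connector.oracle.OracleConnector and io.debezium.connector.jdbc.JdbcSinkConnector among the results.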
If you are scoping an Oracle to PostgreSQL migration and want a second opinion on cutover design, I offer a fixed-fee migration assessment →
Step 4 — Register the Oracle Source Connector
The connector configuration tells Debezium which database to read, which schema to capture, and where to store its internal schema history.
curl -i -X POST -H "Content-Type:application/json" \
http://localhost:8083/connectors/ -d '{
"name": "oracle-connector",
"config": {
"connector.class": "io.debezium.connector.oracle.OracleConnector",
"tasks.max": "1",
"database.hostname": "192.168.0.180",
"database.port": "1521",
"database.user": "C##DBZUSER",
"database.password": "dbz",
"database.dbname": "ORADB",
"database.pdb.name": "pdb1",
"topic.prefix": "oracle",
"schema.include.list": "BANKING",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schemahistory.oracle",
"decimal.handling.mode": "string"
}
}'
Three settings worth understanding:
- schema.include.list: "BANKING" captures every table in the BANKING schema. No table.include.list is set, so when a new table appears in BANKING, Debezium picks it up automatically — you'll see this in Step 7.
- topic.prefix: "oracle" controls the Kafka topic naming. Events for BANKING.TRANSACTIONS land on oracle.BANKING.TRANSACTIONS. The PDB name does not appear in the topic name, even though database.pdb.name is set — it is used for filtering only.
- decimal.handling.mode: "string" makes Oracle NUMBER values arrive as readable strings instead of base64-encoded BigDecimal bytes. Convenient for debugging events.
As we will see, this has a downstream consequence on the PostgreSQL side.
The connector immediately starts a snapshot phase: it scans every captured table, publishes every existing row as an event with operation type r (read), and then switches to streaming mode where it tails the redo log.
You can watch progress with docker logs -f connect.
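The connector's status endpoint is the quickest health check (standard Kafka Connect REST API):
curl -s http://localhost:8083/connectors/oracle-connector/status
A healthy connector reports "state":"RUNNING" for both the connector and its single task; a FAILED state includes the Java stack trace, which almost always names the missing grant or the unreachable host.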
Step 5 — Watch CDC Events Live
Open a second terminal on srvdebezium and start a Kafka console consumer to watch events as they arrive.
Pipe through grep to extract just the meaningful fields — the full JSON envelope is dominated by the schema definition, which is fine for processing but unreadable on screen.
docker run -it --rm --name watcher --network debezium-net \
--entrypoint /kafka/bin/kafka-console-consumer.sh \
quay.io/debezium/kafka:3.5 \
--bootstrap-server kafka:9092 \
--topic oracle.BANKING.TRANSACTIONS \
--from-beginning 2>/dev/null \
| grep -oE '"(op|TXN_ID|TXN_TYPE|AMOUNT|DESCRIPTION)":"[^"]*"'
Four snapshot events appear immediately, one per existing transaction, each with "op":"r".
Now generate a live change on Oracle:
INSERT INTO transactions VALUES (100005, 10002, 102, 'DEPOSIT', 5000.00, SYSTIMESTAMP, 'Bonus deposit');
COMMIT;
A new event appears with "op":"c" and the full row in after.
An UPDATE shows "op":"u" with both before and after populated.
A DELETE shows "op":"d" with before populated and after null.
Sub-second latency in steady state.
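Stripped of the schema half of the envelope, the payload of an update event looks roughly like this (an abbreviated illustration, not a verbatim capture; source metadata is trimmed and the values are made up for the example):
{
  "before": { "TXN_ID": "100005", "ACCOUNT_ID": "10002", "TXN_TYPE": "DEPOSIT", "AMOUNT": "5000.00", "DESCRIPTION": "Bonus deposit" },
  "after":  { "TXN_ID": "100005", "ACCOUNT_ID": "10002", "TXN_TYPE": "DEPOSIT", "AMOUNT": "5500.00", "DESCRIPTION": "Bonus deposit" },
  "source": { "connector": "oracle", "table": "TRANSACTIONS", "scn": "..." },
  "op": "u",
  "ts_ms": "..."
}
Note that AMOUNT arrives as a string; that is decimal.handling.mode: "string" at work, and it matters again in the schema.evolution discussion below.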
Step 6 — Set Up the PostgreSQL Sink
On srv2, create the target user, database, and schema, and allow remote connections from srvdebezium.
sudo -u postgres psql -c "CREATE USER banking WITH PASSWORD 'banking';"
sudo -u postgres psql -c "CREATE DATABASE bankingdb OWNER banking;"
sudo -u postgres psql -d bankingdb -c "CREATE SCHEMA banking AUTHORIZATION banking;"
Then register the JDBC sink connector.
A RegexRouter SMT strips the oracle.BANKING. prefix from each topic before it becomes the target table name, so oracle.BANKING.TRANSACTIONS maps to banking.transactions instead of banking.oracle_banking_transactions.
curl -i -X POST -H "Content-Type:application/json" \
http://localhost:8083/connectors/ -d '{
"name": "jdbc-sink-connector",
"config": {
"connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.url": "jdbc:postgresql://192.168.0.181:5432/bankingdb",
"connection.username": "banking",
"connection.password": "banking",
"insert.mode": "upsert",
"delete.enabled": "true",
"primary.key.mode": "record_key",
"schema.evolution": "basic",
"topics.regex": "oracle\\.BANKING\\..+",
"table.name.format": "banking.${topic}",
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "oracle\\.BANKING\\.(.*)",
"transforms.route.replacement": "$1"
}
}'
Within seconds of registration, the snapshot rows appear in PostgreSQL.
The 18 BANKING rows replicate in under a minute.
Any subsequent change on Oracle propagates to PostgreSQL in seconds.
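A quick row-count comparison against the Oracle counts from Step 2 confirms the snapshot landed (run on srv2; the table names follow the RegexRouter mapping described above):
sudo -u postgres psql -d bankingdb -c "
  SELECT 'accounts' AS tbl, count(*) FROM banking.accounts
  UNION ALL SELECT 'customers', count(*) FROM banking.customers
  UNION ALL SELECT 'transactions', count(*) FROM banking.transactions;"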
Step 7 — Add a New Table Mid-Stream
This is where the pipeline proves itself.
Add a LOANS table to BANKING while replication is running.
CREATE TABLE loans (
loan_id NUMBER(10) PRIMARY KEY,
customer_id NUMBER(8) NOT NULL,
loan_type VARCHAR2(30),
principal NUMBER(15,2) NOT NULL,
interest_rate NUMBER(5,2),
start_date DATE,
status VARCHAR2(20),
CONSTRAINT fk_loan_customer FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
INSERT INTO loans VALUES (1, 1001, 'MORTGAGE', 250000.00, 3.25, DATE '2023-01-15', 'ACTIVE');
INSERT INTO loans VALUES (2, 1002, 'AUTO', 18000.00, 4.50, DATE '2024-06-01', 'ACTIVE');
INSERT INTO loans VALUES (3, 1003, 'PERSONAL', 5000.00, 6.75, DATE '2025-02-10', 'ACTIVE');
COMMIT;
The Oracle connector picks up the new table because it matches BANKING\..*.
A new topic oracle.BANKING.LOANS appears in Kafka.
The JDBC sink picks up that topic because it matches oracle\.BANKING\..+.
A new table banking.loans is created in PostgreSQL with the three rows.
No connector reconfiguration.
No restart.
No redeploy.
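You can confirm both the auto-created table and its contents from srv2 (the column types you will see are exactly the issue discussed in the next section):
sudo -u postgres psql -d bankingdb -c "\d banking.loans"
sudo -u postgres psql -d bankingdb -c "SELECT loan_id, loan_type, principal, status FROM banking.loans;"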
The schema.evolution Tradeoff
schema.evolution: basic is what made the auto-creation of banking.loans work without manual intervention.
It is also what disqualifies this configuration for production.
Run \d banking.accounts on PostgreSQL after replication and the table looks like this:
Table "banking.accounts"
 Column       | Type                           | Nullable | Default
--------------+--------------------------------+----------+---------------
 account_id   | text                           | not null |
 customer_id  | text                           | not null |
 branch_id    | text                           | not null |
 account_type | text                           |          |
 balance      | text                           |          | '0.00'::text
 opened_at    | timestamp(6) without time zone |          |
Every Oracle NUMBER column lands in PostgreSQL as text.
Including balance.
With a default of '0.00'::text.
This is the combined effect of two settings:
- schema.evolution: basic — Debezium auto-creates target tables using the lowest-common-denominator JDBC types from the Kafka event schema.
- decimal.handling.mode: "string" — Oracle NUMBER values arrive as strings in Kafka events, so the auto-created PostgreSQL columns are typed accordingly.
The practical impact appears as soon as you write a query: WHERE customer_id = 1001 fails with operator does not exist: text = integer.
You have to use a string literal: WHERE customer_id = '1001'.
Every existing application query breaks.
The production fix is straightforward but requires preparation:
- Build the PostgreSQL schema beforehand using ora2pg or another conversion tool. The schema gets the right types: NUMERIC(15,2) for amounts, INTEGER or NUMERIC for IDs, proper VARCHAR(n) lengths (a sketch follows this list).
- Disable schema.evolution in the sink connector. Debezium streams data into the existing schema without modifying it.
- Use decimal.handling.mode: "precise" (the default). Values arrive as Java BigDecimal and the JDBC sink correctly maps them to the PostgreSQL NUMERIC columns.
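As an example of the first point, a hand-built target definition for the accounts table would look roughly like this (a sketch of what ora2pg typically produces from the Oracle DDL above; the exact integer widths are a judgment call):
CREATE TABLE banking.accounts (
    account_id    BIGINT PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES banking.customers(customer_id),
    branch_id     INTEGER NOT NULL REFERENCES banking.branches(branch_id),
    account_type  VARCHAR(20),
    balance       NUMERIC(15,2) DEFAULT 0,
    opened_at     TIMESTAMP DEFAULT now()
);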
This is the pattern for any production migration.
The schema evolution feature is for labs and prototypes — useful for proving the pipeline works, dangerous to ship.
The New-Table Discovery Delay
When a new table appears in Oracle while replication is running, there is a small but real delay before it appears in PostgreSQL.
The first SELECT on the new PostgreSQL table can return relation does not exist even though the Kafka topic is already populated.
The sequence is:
- Oracle commits the DDL — the new table appears in the redo log.
- Debezium captures the DDL and creates the Kafka topic. At this point the topic exists, but the sink connector has not seen it yet.
- The INSERTs publish to the new topic.
- The sink connector eventually refreshes its topic metadata. The default metadata.max.age.ms is 5 minutes, though Connect typically refreshes more often. When it does, the sink discovers the new topic, reads the schema from the first event, runs CREATE TABLE on PostgreSQL via schema.evolution: basic, and then applies the events.
- The table now exists on PostgreSQL.
Typical end-to-end discovery delay: 5–30 seconds.
In a real migration, this matters if an application is expected to write to a brand-new Oracle table and read it from PostgreSQL immediately.
Two ways to handle it:
- Lower consumer.override.metadata.max.age.ms in the sink connector config (e.g. to 5 seconds) for faster discovery (a sketch follows this list).
- Pre-create tables in PostgreSQL with the correct types and disable schema.evolution — the production pattern recommended above. Tables exist before any data flows.
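For the first option, the override is one extra key on the sink configuration registered in Step 6. The consumer.override. prefix is a standard Kafka Connect mechanism, but it only takes effect if the worker allows client overrides (connector.client.config.override.policy=All), which is not guaranteed on every deployment, so treat this as a sketch to verify against your worker config:
curl -i -X PUT -H "Content-Type:application/json" \
  http://localhost:8083/connectors/jdbc-sink-connector/config -d '{
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:postgresql://192.168.0.181:5432/bankingdb",
    "connection.username": "banking",
    "connection.password": "banking",
    "insert.mode": "upsert",
    "delete.enabled": "true",
    "primary.key.mode": "record_key",
    "schema.evolution": "basic",
    "topics.regex": "oracle\\.BANKING\\..+",
    "table.name.format": "banking.${topic}",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "oracle\\.BANKING\\.(.*)",
    "transforms.route.replacement": "$1",
    "consumer.override.metadata.max.age.ms": "5000"
  }'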
Production Gotchas
A few things to watch for in production that the lab does not surface.
Tables without primary keys.
Debezium's UPDATE and DELETE events are keyed by the primary key.
Tables without a PK can be snapshotted and INSERTs replicated, but UPDATEs and DELETEs cannot be applied to PostgreSQL.
Audit the source schema for PK-less tables before scoping the migration.
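A quick way to find them on the Oracle side (standard dictionary views, run as the schema owner):
SELECT table_name
FROM   user_tables
WHERE  table_name NOT IN (
         SELECT table_name FROM user_constraints WHERE constraint_type = 'P'
       );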
LogMiner archive log retention.
LogMiner can only mine archive logs that still exist on disk.
If your retention policy deletes archive logs after 24 hours and Debezium falls behind by more than that, the pipeline cannot resume from where it stopped.
Size archive log retention to your worst-case Debezium downtime — multiply your usual retention by 3 for safety.
Container persistence.
Running docker run -it ties the container to the terminal.
Closing the terminal stops the container and breaks the pipeline.
Use -d (detached) for everything that runs longer than a manual test.
Consumer group offsets.
Deleting a sink connector and recreating it with the same name does not reset its position in Kafka.
The new connector picks up at the previous committed offset, skipping any events queued in between.
Reset offsets explicitly with kafka-consumer-groups.sh --reset-offsets --to-earliest before re-registering.
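Kafka Connect names sink consumer groups connect-<connector name>, so for the sink above the reset looks like this (the group name is inferred from that convention; confirm it with --list first, and delete the connector beforehand so the group is inactive):
docker exec kafka /kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092 --list
docker exec kafka /kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
  --group connect-jdbc-sink-connector --reset-offsets --to-earliest --all-topics --execute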
Schema name case.
Oracle schema names are uppercase by default.
The schema.include.list value in the connector config must match exactly — BANKING, not banking.
The sink-side topic names follow the Oracle case, which is why RegexRouter strips an uppercase prefix.
Frequently Asked Questions
Can Debezium migrate the Oracle schema, or only the data?
Only the data.
Debezium captures DML events — INSERT, UPDATE, DELETE — from the Oracle redo log.
Schema definitions, stored procedures, triggers, views, and sequences must be migrated separately, typically with ora2pg.
The schema.evolution: basic feature can auto-create target tables but with simplified types and no constraints — fine for labs, wrong for production.
What is the latency between Oracle and PostgreSQL?
A few seconds in steady state on a healthy LogMiner setup.
Latency grows under heavy write load or if archive log generation outpaces LogMiner's read rate.
Plan for sub-minute lag and monitor it explicitly with kafka-consumer-groups.sh --describe.
Is LogMiner deprecated in Oracle 19c or 21c?
LogMiner is the supported CDC mechanism in Oracle 19c.
Oracle also offers XStream as an alternative, but it requires an Oracle GoldenGate license.
Debezium continues to support both — the LogMiner adapter is the default and works without an additional Oracle license.
Can this run against Oracle Standard Edition?
Yes, with caveats.
LogMiner is available in Standard Edition 2, but supplemental logging and archivelog mode are required, and some optimisations available in Enterprise Edition are not.
Test against the exact edition you have.
Can I use Debezium for one-shot migration without keeping it running?
Yes.
You can run the snapshot phase, wait for it to complete, then stop the connector.
But this is rarely the right call — if you are going to set up Debezium, you may as well let it stream until cutover so you have a tested, lag-free pipeline ready when the application repoints.
In Summary
Debezium turns an Oracle to PostgreSQL migration from a single-shot dump-and-restore into a controllable, observable pipeline that you can run for weeks before cutover.
LogMiner reads Oracle's redo log, Kafka holds the event stream, and the JDBC sink applies changes to PostgreSQL with sub-minute latency.
The result is a real-time replication channel you can validate, replay, and cancel cleanly — the foundation of any zero or near-zero downtime migration.
If you are planning an Oracle to PostgreSQL migration and want help designing the cutover phase or reviewing your CDC architecture, get in touch.
