{"id":6685,"date":"2026-04-08T00:03:00","date_gmt":"2026-04-07T22:03:00","guid":{"rendered":"https:\/\/rootfan.com\/?p=6685"},"modified":"2026-04-08T00:25:38","modified_gmt":"2026-04-07T22:25:38","slug":"alta-disponibilidad-de-postgresql-con-patroni-etcd-haproxy-y-keepalived","status":"publish","type":"post","link":"https:\/\/rootfan.com\/es\/postgresql-high-availability-with-patroni-etcd-haproxy-and-keepalived\/","title":{"rendered":"Alta disponibilidad de PostgreSQL con Patroni, etcd, HAProxy y keepalived"},"content":{"rendered":"<p>PostgreSQL tiene una pila madura de alta disponibilidad de grado de producci\u00f3n que no cuesta nada en licencias y es sencilla de operar una vez que est\u00e1 configurada. <\/p>\n\n\n\n<p>Este laboratorio crea un cl\u00faster de alta disponibilidad de seis nodos utilizando cuatro componentes de c\u00f3digo abierto: <strong>Patroni<\/strong> para la gesti\u00f3n de cl\u00fasteres y la conmutaci\u00f3n por error autom\u00e1tica, <strong>etcd<\/strong> como el almac\u00e9n de consenso distribuido, <strong>HAProxy<\/strong> para balanceo de carga y enrutamiento de conexiones, y <strong>keepalived<\/strong> para una IP virtual flotante que sobreviva a fallos de nodos de HAProxy.<\/p>\n\n\n\n<p>El resultado es un cl\u00faster donde se detecta una falla primaria y se elige una nueva primaria en menos de 30 segundos, sin requerir intervenci\u00f3n manual. <\/p>\n\n\n\n<!--more-->\n\n\n\n<p>Switchovers are clean and zero data loss. <\/p>\n\n\n\n<p>The entire stack is managed through a single CLI (<code>patronictl<\/code>) that makes day-to-day operations \u2014 switchover, failover, reinitialisation, configuration changes \u2014 simple commands rather than multi-step procedures.<\/p>\n\n\n\n<p>This lab covers everything from scratch: TLS certificate generation, etcd cluster formation, Patroni configuration, HAProxy setup, keepalived VIP configuration, and full verification of failover and switchover. 
<\/p>\n\n\n\n<p>Each step is explained with expected output and failure diagnosis so you know exactly what success looks like at every stage.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<div class=\"wp-block-rank-math-toc-block\" id=\"rank-math-toc\"><h2>\u00cdndice<\/h2><nav><ul><li><a href=\"#postgre-sql-high-availability-with-patroni\">PostgreSQL High Availability with Patroni<\/a><\/li><li><a href=\"#1-architecture\">1. Architecture<\/a><\/li><li><a href=\"#2-prerequisites\">2. Prerequisites<\/a><ul><li><a href=\"#step-1-set-the-correct-timezone-on-all-nodes\">Step 1 \u2014 Set the correct timezone on all nodes<\/a><\/li><li><a href=\"#step-2-confirm-required-ports-are-open\">Step 2 \u2014 Confirm required ports are open<\/a><\/li><\/ul><\/li><li><a href=\"#3-postgre-sql-installation\">3. PostgreSQL Installation<\/a><ul><li><a href=\"#step-1-install-postgre-sql-on-all-3-postgre-sql-nodes\">Step 1 \u2014 Install PostgreSQL on all 3 PostgreSQL nodes<\/a><\/li><li><a href=\"#step-2-stop-and-disable-the-postgre-sql-service\">Step 2 \u2014 Stop and disable the PostgreSQL service<\/a><\/li><\/ul><\/li><li><a href=\"#4-etcd-installation\">4. etcd Installation<\/a><ul><li><a href=\"#step-1-install-etcd-on-all-3-postgre-sql-nodes\">Step 1 \u2014 Install etcd on all 3 PostgreSQL nodes<\/a><\/li><li><a href=\"#step-2-create-the-etcd-system-user\">Step 2 \u2014 Create the etcd system user<\/a><\/li><\/ul><\/li><li><a href=\"#5-tls-certificate-generation\">5. 
TLS Certificate Generation<\/a><ul><li><a href=\"#step-1-create-the-working-directory\">Step 1 \u2014 Create the working directory<\/a><\/li><li><a href=\"#step-2-generate-the-certificate-authority\">Step 2 \u2014 Generate the Certificate Authority<\/a><\/li><li><a href=\"#step-3-generate-per-node-etcd-certificates\">Step 3 \u2014 Generate per-node etcd certificates<\/a><\/li><li><a href=\"#step-4-generate-the-postgre-sql-server-certificate\">Step 4 \u2014 Generate the PostgreSQL server certificate<\/a><\/li><li><a href=\"#step-5-distribute-certificates-to-postgres-02-and-postgres-03\">Step 5 \u2014 Distribute certificates to postgres-02 and postgres-03<\/a><\/li><li><a href=\"#step-6-install-certificates-on-each-postgre-sql-node\">Step 6 \u2014 Install certificates on each PostgreSQL node<\/a><\/li><li><a href=\"#step-7-create-the-combined-pem-file-for-patroni\">Step 7 \u2014 Create the combined PEM file for Patroni<\/a><\/li><\/ul><\/li><li><a href=\"#6-etcd-configuration\">6. etcd Configuration<\/a><ul><li><a href=\"#step-1-create-the-etcd-data-directory\">Step 1 \u2014 Create the etcd data directory<\/a><\/li><li><a href=\"#step-2-create-the-etcd-environment-file-on-each-node\">Step 2 \u2014 Create the etcd environment file on each node<\/a><\/li><li><a href=\"#step-3-create-the-etcd-systemd-service-file\">Step 3 \u2014 Create the etcd systemd service file<\/a><\/li><li><a href=\"#step-4-start-etcd-on-all-3-nodes\">Step 4 \u2014 Start etcd on all 3 nodes<\/a><\/li><li><a href=\"#step-5-verify-etcd-cluster-health\">Step 5 \u2014 Verify etcd cluster health<\/a><\/li><\/ul><\/li><li><a href=\"#7-patroni-installation-and-configuration\">7. Patroni Installation and Configuration<\/a><ul><li><a href=\"#step-1-install-patroni\">Step 1 \u2014 Install Patroni<\/a><\/li><li><a href=\"#step-2-create-patroni-yml-on-each-node\">Step 2 \u2014 Create patroni.yml on each node<\/a><\/li><\/ul><\/li><li><a href=\"#8-starting-the-cluster\">8. 
Starting the Cluster<\/a><ul><li><a href=\"#step-1-start-patroni-on-postgres-01-first\">Step 1 \u2014 Start Patroni on postgres-01 first<\/a><\/li><li><a href=\"#step-2-verify-postgres-01-is-the-leader\">Step 2 \u2014 Verify postgres-01 is the leader<\/a><\/li><li><a href=\"#step-3-start-patroni-on-postgres-02-and-postgres-03\">Step 3 \u2014 Start Patroni on postgres-02 and postgres-03<\/a><\/li><li><a href=\"#step-4-change-initial-cluster-state-to-existing-on-all-nodes\">Step 4 \u2014 Change initial-cluster-state to existing on all nodes<\/a><\/li><\/ul><\/li><li><a href=\"#9-ha-proxy-setup\">9. HAProxy Setup<\/a><ul><li><a href=\"#step-1-install-ha-proxy\">Step 1 \u2014 Install HAProxy<\/a><\/li><li><a href=\"#step-2-configure-ha-proxy\">Step 2 \u2014 Configure HAProxy<\/a><\/li><li><a href=\"#step-3-validate-config-and-reload-ha-proxy\">Step 3 \u2014 Validate config and reload HAProxy<\/a><\/li><\/ul><\/li><li><a href=\"#10-keepalived-setup\">10. keepalived Setup<\/a><ul><li><a href=\"#step-1-install-keepalived\">Step 1 \u2014 Install keepalived<\/a><\/li><li><a href=\"#step-2-create-the-ha-proxy-health-check-script\">Step 2 \u2014 Create the HAProxy health check script<\/a><\/li><li><a href=\"#step-3-configure-keepalived\">Step 3 \u2014 Configure keepalived<\/a><\/li><li><a href=\"#step-4-start-keepalived\">Step 4 \u2014 Start keepalived<\/a><\/li><\/ul><\/li><li><a href=\"#11-set-the-postgres-superuser-password\">11. Set the postgres Superuser Password<\/a><\/li><li><a href=\"#12-verify-the-full-stack\">12. Verify the Full Stack<\/a><\/li><li><a href=\"#13-switchover-planned\">13. Switchover (Planned)<\/a><\/li><li><a href=\"#14-failover-unplanned\">14. 
Failover (Unplanned)<\/a><ul><li><a href=\"#simulate-a-primary-failure\">Simulate a primary failure<\/a><\/li><li><a href=\"#manual-failover-use-only-when-automatic-failover-has-not-triggered\">Manual failover \u2014 use only when automatic failover has not triggered<\/a><\/li><\/ul><\/li><li><a href=\"#15-day-to-day-operations\">15. Day-to-Day Operations<\/a><ul><li><a href=\"#check-cluster-status\">Check cluster status<\/a><\/li><li><a href=\"#pause-and-resume-automatic-failover\">Pause and resume automatic failover<\/a><\/li><li><a href=\"#restart-postgre-sql-on-a-node\">Restart PostgreSQL on a node<\/a><\/li><li><a href=\"#reload-configuration\">Reload configuration<\/a><\/li><li><a href=\"#edit-cluster-dcs-configuration\">Edit cluster DCS configuration<\/a><\/li><li><a href=\"#reinitialise-a-failed-standby\">Reinitialise a failed standby<\/a><\/li><\/ul><\/li><li><a href=\"#16-synchronous-mode-zero-data-loss\">16. Synchronous Mode (Zero Data Loss)<\/a><\/li><li><a href=\"#17-monitoring\">17. Monitoring<\/a><ul><li><a href=\"#patroni-rest-api\">Patroni REST API<\/a><\/li><li><a href=\"#replication-lag\">Replication lag<\/a><\/li><\/ul><\/li><li><a href=\"#18-full-reset\">18. 
Full Reset<\/a><ul><li><a href=\"#step-1-stop-patroni-and-etcd-on-all-postgre-sql-nodes\">Step 1 \u2014 Stop Patroni and etcd on all PostgreSQL nodes<\/a><\/li><li><a href=\"#step-2-wipe-etcd-and-postgre-sql-data-directories\">Step 2 \u2014 Wipe etcd and PostgreSQL data directories<\/a><\/li><li><a href=\"#step-3-recreate-directories-with-correct-ownership\">Step 3 \u2014 Recreate directories with correct ownership<\/a><\/li><li><a href=\"#step-4-restore-acl-permissions-on-etcd-certificates\">Step 4 \u2014 Restore ACL permissions on etcd certificates<\/a><\/li><li><a href=\"#step-5-reset-etcd-initial-cluster-state-to-new\">Step 5 \u2014 Reset ETCD_INITIAL_CLUSTER_STATE to &#8220;new&#8221;<\/a><\/li><li><a href=\"#step-6-start-etcd-on-all-nodes\">Step 6 \u2014 Start etcd on all nodes<\/a><\/li><li><a href=\"#step-7-start-patroni-on-all-nodes\">Step 7 \u2014 Start Patroni on all nodes<\/a><\/li><\/ul><\/li><li><a href=\"#19-common-issues\">19. Common Issues<\/a><\/li><li><a href=\"#20-key-commands-reference\">20. Key Commands Reference<\/a><\/li><\/ul><\/nav><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"postgre-sql-high-availability-with-patroni\">PostgreSQL High Availability with Patroni<\/h2>\n\n\n\n<p>Covers Patroni architecture, TLS setup, failover, switchover, and day-to-day operations.<\/p>\n\n\n\n<p><strong>Environment:<\/strong> Six servers total. 
etcd runs on the same nodes as PostgreSQL \u2014 no dedicated etcd servers.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Role<\/th><th>Hostname<\/th><th>IP<\/th><\/tr><\/thead><tbody><tr><td>HAProxy node 1<\/td><td>haproxy-01<\/td><td>192.168.0.200<\/td><\/tr><tr><td>HAProxy node 2<\/td><td>haproxy-02<\/td><td>192.168.0.201<\/td><\/tr><tr><td>HAProxy node 3<\/td><td>haproxy-03<\/td><td>192.168.0.202<\/td><\/tr><tr><td>PostgreSQL + etcd + Patroni 1<\/td><td>postgres-01<\/td><td>192.168.0.203<\/td><\/tr><tr><td>PostgreSQL + etcd + Patroni 2<\/td><td>postgres-02<\/td><td>192.168.0.204<\/td><\/tr><tr><td>PostgreSQL + etcd + Patroni 3<\/td><td>postgres-03<\/td><td>192.168.0.205<\/td><\/tr><tr><td>Virtual IP (VIP)<\/td><td>\u2014<\/td><td>192.168.0.210<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"1-architecture\">1. Architecture<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>          +----------+   +----------+   +----------+\n          |  etcd    |   |  etcd    |   |  etcd    |\n          | Patroni  |   | Patroni  |   | Patroni  |\n          |   +PG    |   |   +PG    |   |   +PG    |\n          | node1    |   | node2    |   | node3    |\n          | PRIMARY  |   | STANDBY  |   | STANDBY  |\n          +----------+   +----------+   +----------+\n                \\              |              \/\n                 \\             |             \/\n          +----------+   +----------+   +----------+\n          | HAProxy  |   | HAProxy  |   | HAProxy  |\n          |  node1   |   |  node2   |   |  node3   |\n          | (MASTER) |   | (BACKUP) |   | (BACKUP) |\n          +----------+   +----------+   +----------+\n                \\              |              \/\n                 \\             |             \/\n              &#91;keepalived VIP: 192.168.0.210:5432]\n                              |\n                        
Applications<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>etcd<\/strong>: distributed key-value store co-located on each PostgreSQL node. Holds cluster state (current leader, member list). Requires an odd number of nodes for quorum \u2014 3 nodes tolerate 1 failure, 5 nodes tolerate 2.<\/li>\n\n\n\n<li><strong>Patroni<\/strong>: daemon on each PostgreSQL node. Manages replication, monitors health, and coordinates failover through etcd.<\/li>\n\n\n\n<li><strong>HAProxy<\/strong>: three dedicated nodes route application connections to the current primary by checking Patroni's REST API.<\/li>\n\n\n\n<li><strong>keepalived<\/strong>: manages the VIP using VRRP. One HAProxy node holds the VIP at a time. If that node fails, the VIP moves to the next HAProxy node automatically.<\/li>\n\n\n\n<li><strong>All communication is TLS-encrypted<\/strong>: etcd peer traffic, etcd client traffic, Patroni REST API, and PostgreSQL connections.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"2-prerequisites\">2. Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-set-the-correct-timezone-on-all-nodes\">Step 1 \u2014 Set the correct timezone on all nodes<\/h3>\n\n\n\n<p>Run on all 6 servers (postgres-01\/02\/03 and haproxy-01\/02\/03). <\/p>\n\n\n\n<p>Servers default to UTC \u2014 set to your local timezone before anything else. 
<\/p>\n\n\n\n<p>Mismatched or wrong timestamps cause confusion in logs and certificate validation.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo timedatectl set-timezone Europe\/Madrid\ntimedatectl\n# Expected: Time zone: Europe\/Madrid (CET\/CEST, +0100\/+0200)\n# NTP service should show: active\n# System clock synchronized: yes<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-confirm-required-ports-are-open\">Step 2 \u2014 Confirm required ports are open<\/h3>\n\n\n\n<p>Before starting, confirm the following ports are open:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Source<\/th><th>Destination<\/th><th>Port<\/th><th>Prop\u00f3sito<\/th><\/tr><\/thead><tbody><tr><td>PostgreSQL nodes<\/td><td>PostgreSQL nodes<\/td><td>2379<\/td><td>etcd client (Patroni \u2192 etcd)<\/td><\/tr><tr><td>PostgreSQL nodes<\/td><td>PostgreSQL nodes<\/td><td>2380<\/td><td>etcd peer communication<\/td><\/tr><tr><td>PostgreSQL nodes<\/td><td>PostgreSQL nodes<\/td><td>5432<\/td><td>PostgreSQL replication<\/td><\/tr><tr><td>PostgreSQL nodes<\/td><td>PostgreSQL nodes<\/td><td>8008<\/td><td>Patroni REST API<\/td><\/tr><tr><td>HAProxy nodes<\/td><td>PostgreSQL nodes<\/td><td>8008<\/td><td>HAProxy health check<\/td><\/tr><tr><td>HAProxy nodes<\/td><td>HAProxy nodes<\/td><td>112\/VRRP<\/td><td>keepalived VIP election<\/td><\/tr><tr><td>Applications<\/td><td>VIP<\/td><td>5432<\/td><td>Client connections<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"3-postgre-sql-installation\">3. 
PostgreSQL Installation<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-install-postgre-sql-on-all-3-postgre-sql-nodes\">Step 1 \u2014 Install PostgreSQL on all 3 PostgreSQL nodes<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo apt update\nsudo apt install -y postgresql-common\n# postgresql-common: provides the PGDG repository setup script\n\nsudo \/usr\/share\/postgresql-common\/pgdg\/apt.postgresql.org.sh\n# This script adds the official PostgreSQL apt repository (postgresql.org)\n# and imports its GPG key \u2014 ensures you get the latest PostgreSQL version,\n# not the older version bundled with Ubuntu\n\nsudo apt update\nsudo apt install -y postgresql-18 postgresql-contrib-18\n# Install version 18 explicitly \u2014 the generic \"postgresql\" meta-package installs Ubuntu's\n# default bundled version (16) in addition to the PGDG version, leaving two versions installed\n# Always specify the version number to avoid this\n# postgresql-contrib-18: additional modules including pg_rewind, which Patroni uses\n# to resync the old primary after a failover without a full base backup<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-stop-and-disable-the-postgre-sql-service\">Step 2 \u2014 Stop and disable the PostgreSQL service<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo systemctl stop postgresql\n# Patroni manages PostgreSQL startup entirely\n# If PostgreSQL is already running when Patroni starts, Patroni will fail with:\n# \"postmaster is already running\"\n\nsudo systemctl disable postgresql\n# Prevents PostgreSQL from starting automatically on boot\n# Patroni's own systemd service starts PostgreSQL when the node joins the cluster<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"4-etcd-installation\">4. 
etcd Installation<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-install-etcd-on-all-3-postgre-sql-nodes\">Step 1 \u2014 Install etcd on all 3 PostgreSQL nodes<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo apt-get install -y wget curl\n\nwget https:\/\/github.com\/etcd-io\/etcd\/releases\/download\/v3.6.10\/etcd-v3.6.10-linux-amd64.tar.gz\n# Download the etcd binary directly from GitHub releases\n# The apt package is often outdated \u2014 always install from the official releases\n# Check https:\/\/github.com\/etcd-io\/etcd\/releases for the latest stable version\n\ntar xvf etcd-v3.6.10-linux-amd64.tar.gz\n# xvf: extract (x), verbose (v), from file (f)\n\nsudo mv etcd-v3.6.10-linux-amd64\/etcd \/usr\/local\/bin\/\nsudo mv etcd-v3.6.10-linux-amd64\/etcdctl \/usr\/local\/bin\/\n# etcd: the etcd server binary\n# etcdctl: the etcd client CLI \u2014 used for health checks and inspecting cluster state\n\n# Verify the installation\netcd --version\n# Expected: etcd Version: 3.6.10\n# If \"etcd: command not found\": \/usr\/local\/bin is not in PATH \u2014 run: export PATH=$PATH:\/usr\/local\/bin\n\netcdctl version\n# Expected: etcdctl version: 3.6.10<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-create-the-etcd-system-user\">Step 2 \u2014 Create the etcd system user<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo useradd --system --home \/var\/lib\/etcd --shell \/bin\/false etcd\n\n# --system: creates a system account with no login shell by default\n# --home \/var\/lib\/etcd: etcd stores its data here\n# --shell \/bin\/false: prevents interactive login \u2014 etcd runs as a daemon only<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"5-tls-certificate-generation\">5. 
TLS Certificate Generation<\/h2>\n\n\n\n<p>All certificates are generated once on postgres-01 and then distributed to the other nodes.<\/p>\n\n\n\n<p>The CA private key (<code>ca.key<\/code>) stays on postgres-01 after distribution is complete \u2014 do not copy it to other nodes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-create-the-working-directory\">Step 1 \u2014 Create the working directory<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nmkdir ~\/certs && cd ~\/certs\n# All certificate files are created here before being copied to each node<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-generate-the-certificate-authority\">Step 2 \u2014 Generate the Certificate Authority<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nopenssl genrsa -out ca.key 2048\n# genrsa: generate an RSA private key; 2048: key length in bits\n\nopenssl req -x509 -new -nodes -key ca.key -subj \"\/CN=etcd-ca\" -days 7300 -out ca.crt\n# req -x509: create a self-signed certificate (not a signing request)\n# -new -nodes: new certificate, no passphrase on the private key\n# -subj \"\/CN=etcd-ca\": the certificate's Common Name \u2014 identifies this as the cluster CA\n# -days 7300: valid for 20 years\n# ca.crt: distributed to every node as the root of trust<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-3-generate-per-node-etcd-certificates\">Step 3 \u2014 Generate per-node etcd certificates<\/h3>\n\n\n\n<p>Each node gets its own certificate with its IP as a Subject Alternative Name (SAN). 
TLS hostname verification requires the server's IP to appear in the SAN \u2014 without it, connections will fail.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01 \u2014 generate all three node certificates here, then distribute\n\n# Certificate for postgres-01 (192.168.0.203)\nopenssl genrsa -out etcd-node1.key 2048\n\ncat &gt; temp.cnf &lt;&lt;EOF\n&#91; req ]\ndistinguished_name = req_distinguished_name\nreq_extensions = v3_req\n&#91; req_distinguished_name ]\n&#91; v3_req ]\nsubjectAltName = @alt_names\n&#91; alt_names ]\nIP.1 = 192.168.0.203\nIP.2 = 127.0.0.1\nEOF\n\n# temp.cnf: OpenSSL config that adds the node IP as a SAN\n# IP.2 = 127.0.0.1: allows etcdctl to connect locally without specifying a remote address\n\nopenssl req -new -key etcd-node1.key -out etcd-node1.csr \\\n  -subj \"\/CN=etcd-node1\" \\\n  -config temp.cnf\n# req -new: generate a certificate signing request (CSR)\n# -subj: the certificate's identity \u2014 CN identifies the node in logs\n\nopenssl x509 -req -in etcd-node1.csr -CA ca.crt -CAkey ca.key \\\n  -CAcreateserial -out etcd-node1.crt -days 7300 \\\n  -sha256 -extensions v3_req -extfile temp.cnf\n# x509 -req: sign the CSR with the CA to produce a certificate\n# -CAcreateserial: creates ca.srl to track serial numbers across certificates\n# -extensions v3_req -extfile temp.cnf: embed the SANs into the signed certificate\n\nopenssl x509 -in etcd-node1.crt -text -noout | grep -A1 \"Subject Alternative Name\"\n# Verify the SAN was embedded \u2014 Expected: IP Address:192.168.0.203, IP Address:127.0.0.1\n# If the SAN is missing: the -extensions and -extfile flags were not applied correctly\n\nrm temp.cnf\n\n# Certificate for postgres-02 (192.168.0.204)\nopenssl genrsa -out etcd-node2.key 2048\n\ncat &gt; temp.cnf &lt;&lt;EOF\n&#91; req ]\ndistinguished_name = req_distinguished_name\nreq_extensions = v3_req\n&#91; req_distinguished_name ]\n&#91; v3_req ]\nsubjectAltName = @alt_names\n&#91; alt_names ]\nIP.1 = 
192.168.0.204\nIP.2 = 127.0.0.1\nEOF\n\nopenssl req -new -key etcd-node2.key -out etcd-node2.csr \\\n  -subj \"\/CN=etcd-node2\" -config temp.cnf\n\nopenssl x509 -req -in etcd-node2.csr -CA ca.crt -CAkey ca.key \\\n  -CAcreateserial -out etcd-node2.crt -days 7300 \\\n  -sha256 -extensions v3_req -extfile temp.cnf\n\nopenssl x509 -in etcd-node2.crt -text -noout | grep -A1 \"Subject Alternative Name\"\n# Expected: IP Address:192.168.0.204, IP Address:127.0.0.1\nrm temp.cnf\n\n\n# Certificate for postgres-03 (192.168.0.205)\nopenssl genrsa -out etcd-node3.key 2048\n\ncat &gt; temp.cnf &lt;&lt;EOF\n&#91; req ]\ndistinguished_name = req_distinguished_name\nreq_extensions = v3_req\n&#91; req_distinguished_name ]\n&#91; v3_req ]\nsubjectAltName = @alt_names\n&#91; alt_names ]\nIP.1 = 192.168.0.205\nIP.2 = 127.0.0.1\nEOF\n\nopenssl req -new -key etcd-node3.key -out etcd-node3.csr \\\n  -subj \"\/CN=etcd-node3\" -config temp.cnf\n\nopenssl x509 -req -in etcd-node3.csr -CA ca.crt -CAkey ca.key \\\n  -CAcreateserial -out etcd-node3.crt -days 7300 \\\n  -sha256 -extensions v3_req -extfile temp.cnf\n\nopenssl x509 -in etcd-node3.crt -text -noout | grep -A1 \"Subject Alternative Name\"\n# Expected: IP Address:192.168.0.205, IP Address:127.0.0.1\nrm temp.cnf<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-4-generate-the-postgre-sql-server-certificate\">Step 4 \u2014 Generate the PostgreSQL server certificate<\/h3>\n\n\n\n<p>One shared certificate covers all PostgreSQL nodes. 
<\/p>\n\n\n\n<p>It is used for both PostgreSQL connections and the Patroni REST API.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nopenssl genrsa -out server.key 2048\n\nopenssl req -new -key server.key -out server.req\n# You will be prompted for certificate details \u2014 the Common Name is not critical\n# since connections are verified by IP SAN, not CN\n# Warning \"No -copy_extensions given\" is harmless \u2014 the server cert does not need SANs\n\nopenssl req -x509 -key server.key -in server.req -out server.crt -days 7300\n# Self-signed server certificate \u2014 signed directly with server.key, not the CA\n# Patroni and PostgreSQL use this certificate to identify themselves to clients<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-5-distribute-certificates-to-postgres-02-and-postgres-03\">Step 5 \u2014 Distribute certificates to postgres-02 and postgres-03<\/h3>\n\n\n\n<p>postgres-01 keeps its own certs in ~\/certs \u2014 no scp needed for it.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nscp ~\/certs\/ca.crt ~\/certs\/etcd-node2.crt ~\/certs\/etcd-node2.key \\\n  ~\/certs\/server.crt ~\/certs\/server.key fernando@192.168.0.204:\/tmp\/\n\nscp ~\/certs\/ca.crt ~\/certs\/etcd-node3.crt ~\/certs\/etcd-node3.key \\\n  ~\/certs\/server.crt ~\/certs\/server.key fernando@192.168.0.205:\/tmp\/<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-6-install-certificates-on-each-postgre-sql-node\">Step 6 \u2014 Install certificates on each PostgreSQL node<\/h3>\n\n\n\n<p>All certificates live in <code>\/etc\/etcd\/certs\/<\/code> on every node. <\/p>\n\n\n\n<p>The directory is owned by <code>etcd:etcd<\/code> so the etcd daemon can read its certs. <\/p>\n\n\n\n<p>En <code>postgres<\/code> user gets read access via ACL so Patroni can connect to etcd.<\/p>\n\n\n\n<p><strong>Important:<\/strong> set file permissions before locking down the directory. 
<\/p>\n\n\n\n<p>After <code>chmod 700<\/code> the shell cannot expand globs inside the directory as a non-root user \u2014 use explicit filenames.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo mkdir -p \/etc\/etcd\/certs\nsudo apt-get install -y acl\n# acl: provides setfacl \u2014 needed to grant postgres user access without changing ownership<\/code><\/pre>\n\n\n\n<p><strong>On postgres-01<\/strong> \u2014 copy from ~\/certs (files were never in \/tmp on this node):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo cp ~\/certs\/ca.crt \/etc\/etcd\/certs\/\nsudo cp ~\/certs\/etcd-node1.crt \/etc\/etcd\/certs\/\nsudo cp ~\/certs\/etcd-node1.key \/etc\/etcd\/certs\/\nsudo cp ~\/certs\/server.crt \/etc\/etcd\/certs\/\nsudo cp ~\/certs\/server.key \/etc\/etcd\/certs\/<\/code><\/pre>\n\n\n\n<p><strong>On postgres-02<\/strong> \u2014 move from \/tmp:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo mv \/tmp\/ca.crt \/etc\/etcd\/certs\/\nsudo mv \/tmp\/etcd-node2.crt \/etc\/etcd\/certs\/\nsudo mv \/tmp\/etcd-node2.key \/etc\/etcd\/certs\/\nsudo mv \/tmp\/server.crt \/etc\/etcd\/certs\/\nsudo mv \/tmp\/server.key \/etc\/etcd\/certs\/<\/code><\/pre>\n\n\n\n<p><strong>On postgres-03<\/strong> \u2014 move from \/tmp:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo mv \/tmp\/ca.crt \/etc\/etcd\/certs\/\nsudo mv \/tmp\/etcd-node3.crt \/etc\/etcd\/certs\/\nsudo mv \/tmp\/etcd-node3.key \/etc\/etcd\/certs\/\nsudo mv \/tmp\/server.crt \/etc\/etcd\/certs\/\nsudo mv \/tmp\/server.key \/etc\/etcd\/certs\/<\/code><\/pre>\n\n\n\n<p><strong>On all three nodes<\/strong> \u2014 set permissions then lock the directory:<\/p>\n\n\n\n<p>etcd certs must be owned by <code>etcd<\/code> \u2014 the etcd daemon runs as this user. <\/p>\n\n\n\n<p>Server certs must be owned by <code>postgres<\/code> \u2014 PostgreSQL enforces that its SSL private key is owned by the database user or root. 
<\/p>\n\n\n\n<p>Utilizando <code>etcd<\/code> ownership will cause PostgreSQL to refuse to start with: &#8220;private key file must be owned by the database user or root&#8221;.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Set file permissions first \u2014 must happen before chmod 700 on the directory\n# After chmod 700, the shell cannot expand globs as a non-root user\n\n# etcd certs: owned by etcd\n# On postgres-01:\nsudo chown etcd:etcd \/etc\/etcd\/certs\/etcd-node1.crt \/etc\/etcd\/certs\/etcd-node1.key \/etc\/etcd\/certs\/ca.crt\nsudo chmod 600 \/etc\/etcd\/certs\/etcd-node1.key\nsudo chmod 644 \/etc\/etcd\/certs\/etcd-node1.crt \/etc\/etcd\/certs\/ca.crt\n\n# On postgres-02:\nsudo chown etcd:etcd \/etc\/etcd\/certs\/etcd-node2.crt \/etc\/etcd\/certs\/etcd-node2.key \/etc\/etcd\/certs\/ca.crt\nsudo chmod 600 \/etc\/etcd\/certs\/etcd-node2.key\nsudo chmod 644 \/etc\/etcd\/certs\/etcd-node2.crt \/etc\/etcd\/certs\/ca.crt\n\n# On postgres-03:\nsudo chown etcd:etcd \/etc\/etcd\/certs\/etcd-node3.crt \/etc\/etcd\/certs\/etcd-node3.key \/etc\/etcd\/certs\/ca.crt\nsudo chmod 600 \/etc\/etcd\/certs\/etcd-node3.key\nsudo chmod 644 \/etc\/etcd\/certs\/etcd-node3.crt \/etc\/etcd\/certs\/ca.crt\n\n# Server certs: owned by postgres (all three nodes \u2014 same files on each)\nsudo chown postgres:postgres \/etc\/etcd\/certs\/server.crt \/etc\/etcd\/certs\/server.key\nsudo chmod 600 \/etc\/etcd\/certs\/server.key\nsudo chmod 644 \/etc\/etcd\/certs\/server.crt\n\n# Lock the directory \u2014 run this last, after all file permissions are set\nsudo chown etcd:etcd \/etc\/etcd\/certs\nsudo chmod 700 \/etc\/etcd\/certs\n\n# Grant the postgres user read access to the directory and all files inside it\n# Patroni needs to read the etcd certs to connect with TLS\nsudo setfacl -R -m u:postgres:rX \/etc\/etcd\/certs\n# -R: apply recursively to all files; rX: read + execute on directories (to traverse)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" 
id=\"step-7-create-the-combined-pem-file-for-patroni\">Step 7 \u2014 Create the combined PEM file for Patroni<\/h3>\n\n\n\n<p>Patroni's <code>restapi.certfile<\/code> expects a single file containing both the certificate and the private key.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo sh -c 'cat \/etc\/etcd\/certs\/server.crt \/etc\/etcd\/certs\/server.key \\\n  &gt; \/etc\/etcd\/certs\/server.pem'\n# Concatenates certificate then key into one file\nsudo chown postgres:postgres \/etc\/etcd\/certs\/server.pem\nsudo chmod 600 \/etc\/etcd\/certs\/server.pem\n# server.pem contains the private key \u2014 PostgreSQL requires 0600 and postgres ownership\n\n# Verify the PEM file is valid\nsudo openssl x509 -in \/etc\/etcd\/certs\/server.pem -text -noout\n# Expected: certificate details including validity dates and subject\n# If \"unable to load certificate\": the PEM file is malformed \u2014 recreate it\n\n# Verify final permissions on all cert files\nsudo ls -la \/etc\/etcd\/certs\/\n# Expected output (postgres-01 shown \u2014 node number differs on 02\/03):\n#   drwx------+  2 etcd     etcd      ca.crt etcd-node1.crt etcd-node1.key server.crt server.key server.pem\n#   -rw-r--r--+  1 etcd     etcd      ca.crt\n#   -rw-r--r--+  1 etcd     etcd      etcd-node1.crt\n#   -rw-------+  1 etcd     etcd      etcd-node1.key\n#   -rw-r--r--+  1 postgres postgres  server.crt\n#   -rw-------+  1 postgres postgres  server.key   \u2190 must be 0600, postgres-owned\n#   -rw-------+  1 postgres postgres  server.pem   \u2190 must be 0600, postgres-owned\n# PostgreSQL will refuse to start if server.key or server.pem is group- or world-readable<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"6-etcd-configuration\">6. 
etcd Configuration<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-create-the-etcd-data-directory\">Step 1 \u2014 Create the etcd data directory<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo mkdir -p \/var\/lib\/etcd\nsudo chown -R etcd:etcd \/var\/lib\/etcd\n# etcd stores its WAL and snapshot data here \u2014 must be owned by the etcd user<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-create-the-etcd-environment-file-on-each-node\">Step 2 \u2014 Create the etcd environment file on each node<\/h3>\n\n\n\n<p>etcd is configured via environment variables loaded by the systemd service. <\/p>\n\n\n\n<p>Only the node-specific values differ between nodes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># \/etc\/etcd\/etcd.env \u2014 postgres-01 (192.168.0.203)\n\nETCD_NAME=\"postgresql-01\"\n# ETCD_NAME: unique identifier for this member within the cluster\n\nETCD_DATA_DIR=\"\/var\/lib\/etcd\"\n# ETCD_DATA_DIR: where etcd stores its WAL and snapshots\n\nETCD_INITIAL_CLUSTER=\"postgresql-01=https:\/\/192.168.0.203:2380,postgresql-02=https:\/\/192.168.0.204:2380,postgresql-03=https:\/\/192.168.0.205:2380\"\n# ETCD_INITIAL_CLUSTER: all members at bootstrap time \u2014 must be identical on all three nodes\n\nETCD_INITIAL_CLUSTER_STATE=\"new\"\n# new: this is a fresh cluster bootstrap\n# Important: change to \"existing\" after the cluster is running (see section 9 step 3)\n\nETCD_INITIAL_CLUSTER_TOKEN=\"etcd-cluster\"\n# ETCD_INITIAL_CLUSTER_TOKEN: prevents nodes from accidentally joining the wrong cluster\n\nETCD_INITIAL_ADVERTISE_PEER_URLS=\"https:\/\/192.168.0.203:2380\"\n# ETCD_INITIAL_ADVERTISE_PEER_URLS: address this node advertises to other etcd members for peer traffic\n\nETCD_LISTEN_PEER_URLS=\"https:\/\/0.0.0.0:2380\"\n# ETCD_LISTEN_PEER_URLS: address etcd listens on for peer connections from other etcd members\n\nETCD_LISTEN_CLIENT_URLS=\"https:\/\/0.0.0.0:2379\"\n# ETCD_LISTEN_CLIENT_URLS: 
address etcd listens on for client connections (Patroni connects here)\n\nETCD_ADVERTISE_CLIENT_URLS=\"https:\/\/192.168.0.203:2379\"\n# ETCD_ADVERTISE_CLIENT_URLS: address this node advertises to clients \u2014 must be reachable from Patroni\n\n# TLS for client connections (Patroni \u2192 etcd)\nETCD_CLIENT_CERT_AUTH=\"true\"\n# ETCD_CLIENT_CERT_AUTH: require clients to present a valid certificate (mutual TLS)\nETCD_TRUSTED_CA_FILE=\"\/etc\/etcd\/certs\/ca.crt\"\n# ETCD_TRUSTED_CA_FILE: CA certificate used to verify client certificates\nETCD_CERT_FILE=\"\/etc\/etcd\/certs\/etcd-node1.crt\"\n# ETCD_CERT_FILE: certificate presented to clients connecting to this node\nETCD_KEY_FILE=\"\/etc\/etcd\/certs\/etcd-node1.key\"\n# ETCD_KEY_FILE: private key for the above certificate\n\n# TLS for peer connections (etcd node \u2194 etcd node)\nETCD_PEER_CLIENT_CERT_AUTH=\"true\"\n# ETCD_PEER_CLIENT_CERT_AUTH: require peer nodes to present a valid certificate\nETCD_PEER_TRUSTED_CA_FILE=\"\/etc\/etcd\/certs\/ca.crt\"\nETCD_PEER_CERT_FILE=\"\/etc\/etcd\/certs\/etcd-node1.crt\"\nETCD_PEER_KEY_FILE=\"\/etc\/etcd\/certs\/etcd-node1.key\"<\/code><\/pre>\n\n\n\n<p>For postgres-02 (192.168.0.204): change <code>ETCD_NAME<\/code> to <code>postgresql-02<\/code>, both IP addresses to <code>192.168.0.204<\/code>, and cert\/key filenames to <code>etcd-node2.crt<\/code> \/ <code>etcd-node2.key<\/code>. 
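<\/p>\n\n\n\n<p>Rather than hand-editing three copies, the node-specific values can be derived from the postgres-01 file. A minimal sketch, assuming the postgres-01 file was first copied to a template path (the path is illustrative): the substitutions are anchored so that the <code>ETCD_INITIAL_CLUSTER<\/code> line, which must keep all three names and addresses, is left untouched.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-02: assumes the postgres-01 env file was copied to \/tmp\/etcd.env.template\nsed \\\n  -e '\/^ETCD_NAME\/s\/postgresql-01\/postgresql-02\/' \\\n  -e '\/ADVERTISE\/s\/192.168.0.203\/192.168.0.204\/' \\\n  -e 's\/etcd-node1\/etcd-node2\/g' \\\n  \/tmp\/etcd.env.template | sudo tee \/etc\/etcd\/etcd.env\n# \/^ETCD_NAME\/ limits the rename to the ETCD_NAME line, so the member list\n# in ETCD_INITIAL_CLUSTER keeps postgresql-01\n# \/ADVERTISE\/ matches only the two *_ADVERTISE_*_URLS lines, which are the\n# only places this node's own IP appears<\/code><\/pre>\n\n\n\n<p>Whichever way the files are produced, diff each result against the listing above before starting etcd. 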
<\/p>\n\n\n\n<p>For postgres-03 (192.168.0.205): change <code>ETCD_NAME<\/code> to <code>postgresql-03<\/code>, both IP addresses to <code>192.168.0.205<\/code>, and cert\/key filenames to <code>etcd-node3.crt<\/code> \/ <code>etcd-node3.key<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-3-create-the-etcd-systemd-service-file\">Step 3 \u2014 Create the etcd systemd service file<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\n# Create \/etc\/systemd\/system\/etcd.service with the following content:<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;Unit]\nDescription=etcd key-value store\nDocumentation=https:\/\/github.com\/etcd-io\/etcd\nAfter=network-online.target\nWants=network-online.target\n\n&#91;Service]\nType=notify\n# Type=notify: systemd waits for etcd to send a readiness signal before marking it as started\nWorkingDirectory=\/var\/lib\/etcd\nEnvironmentFile=\/etc\/etcd\/etcd.env\n# EnvironmentFile: loads all ETCD_* variables from the file created in step 2\nExecStart=\/usr\/local\/bin\/etcd\nRestart=always\n# Restart=always: systemd restarts etcd if it exits for any reason\nRestartSec=10s\nLimitNOFILE=40000\n# LimitNOFILE: raise the open file descriptor limit \u2014 etcd opens many files under load\nUser=etcd\nGroup=etcd\n\n&#91;Install]\nWantedBy=multi-user.target<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-4-start-etcd-on-all-3-nodes\">Step 4 \u2014 Start etcd on all 3 nodes<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo systemctl daemon-reload\n# daemon-reload: required after creating or modifying a systemd unit file\n\nsudo systemctl enable etcd\n# enable: start etcd automatically on boot\n\nsudo systemctl start etcd\n# start: start the etcd service now\n\nsudo systemctl status etcd\n# Expected: Active: active (running)\n# If \"Active: failed\": check journalctl -xeu etcd.service for details\n# Common causes:\n#   - 
cert not found: verify paths in etcd.env match files in \/etc\/etcd\/certs\/\n#   - permission denied on key: run \"sudo chown etcd:etcd \/etc\/etcd\/certs\/*.key\"\n#   - port in use: run \"ss -tlnp | grep 237\" to find what is on ports 2379\/2380<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-5-verify-etcd-cluster-health\">Step 5 \u2014 Verify etcd cluster health<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\n# Note: a bare \"etcdctl endpoint health\" defaults to http:\/\/127.0.0.1:2379 and fails\n# here, because the client port only accepts HTTPS with a client certificate\n# (ETCD_CLIENT_CERT_AUTH=\"true\"), so always pass the TLS flags as shown below\n\n# Full cluster health check with TLS credentials\nsudo etcdctl \\\n  --endpoints=https:\/\/192.168.0.203:2379,https:\/\/192.168.0.204:2379,https:\/\/192.168.0.205:2379 \\\n  --cacert=\/etc\/etcd\/certs\/ca.crt \\\n  --cert=\/etc\/etcd\/certs\/etcd-node1.crt \\\n  --key=\/etc\/etcd\/certs\/etcd-node1.key \\\n  endpoint health\n# --cacert: CA certificate to verify the server certificates\n# --cert \/ --key: client certificate and key for mutual TLS\n# Expected:\n#   https:\/\/192.168.0.203:2379 is healthy: successfully committed proposal: took = 2.3ms\n#   https:\/\/192.168.0.204:2379 is healthy: successfully committed proposal: took = 2.1ms\n#   https:\/\/192.168.0.205:2379 is healthy: successfully committed proposal: took = 2.4ms\n# If any node is unhealthy: check journalctl -u etcd on that node\n# If all nodes unhealthy: check firewall on port 2380 between nodes\n\n# Check leader election \u2014 one node should be the leader\nsudo etcdctl \\\n  --endpoints=https:\/\/192.168.0.203:2379,https:\/\/192.168.0.204:2379,https:\/\/192.168.0.205:2379 \\\n  --cacert=\/etc\/etcd\/certs\/ca.crt \\\n  --cert=\/etc\/etcd\/certs\/etcd-node1.crt \\\n  --key=\/etc\/etcd\/certs\/etcd-node1.key \\\n  endpoint status --write-out=table\n# Expected: a table with one node showing IS LEADER = true\n# If no leader: quorum is not established \u2014 verify all 
three nodes are running<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"7-patroni-installation-and-configuration\">7. Patroni Installation and Configuration<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-install-patroni\">Step 1 \u2014 Install Patroni<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo apt install -y patroni\n# On Ubuntu 22.04+, the apt package includes the etcd v3 client library\n# If your distro's package does not include it: pip install patroni&#91;etcd3]\n# &#91;etcd3]: selects the etcd v3 API backend \u2014 required for etcd 3.5+\n# The older &#91;etcd] flag uses the deprecated v2 HTTP API\n\nsudo mkdir -p \/etc\/patroni\/\n# Patroni reads its configuration from a YAML file in this directory<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-create-patroni-yml-on-each-node\">Step 2 \u2014 Create config.yml on each node<\/h3>\n\n\n\n<p>Only <code>name<\/code>, <code>restapi.connect_address<\/code>, <code>postgresql.connect_address<\/code>, and the etcd cert\/key paths differ between nodes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># \/etc\/patroni\/config.yml \u2014 postgres-01 (192.168.0.203)\n\nscope: postgresql-cluster\n# scope: cluster name \u2014 must be identical on all Patroni nodes\n# Used as the key prefix in etcd to separate multiple Patroni clusters\n\nnamespace: \/service\/\n# namespace: etcd key prefix \u2014 all cluster state is stored under \/service\/postgresql-cluster\/\n\nname: postgresql-01\n# name: unique name for this node within the cluster\n\netcd3:\n  hosts: 192.168.0.203:2379,192.168.0.204:2379,192.168.0.205:2379\n  # etcd3: use the etcd v3 API (gRPC) \u2014 required for etcd 3.5+\n  # The older \"etcd:\" key uses the v2 HTTP API which is deprecated\n  protocol: https\n  # protocol: https \u2014 Patroni connects to etcd over TLS\n  cacert: \/etc\/etcd\/certs\/ca.crt\n 
 # cacert: CA certificate to verify the etcd server certificates\n  cert: \/etc\/etcd\/certs\/etcd-node1.crt\n  # cert: client certificate presented to etcd for mutual TLS\n  key: \/etc\/etcd\/certs\/etcd-node1.key\n  # key: private key for the client certificate\n\nrestapi:\n  listen: 0.0.0.0:8008\n  # listen: Patroni's REST API listens on all interfaces on port 8008\n  # HAProxy queries this API to determine which node is the current primary\n  connect_address: 192.168.0.203:8008\n  # connect_address: the address other nodes use to reach this node's REST API \u2014 must be the actual IP\n  certfile: \/etc\/etcd\/certs\/server.pem\n  # certfile: combined cert+key PEM file \u2014 enables TLS on the REST API\n  # HAProxy uses check-ssl when querying \/primary \u2014 the API must serve a certificate\n\nbootstrap:\n  dcs:\n    ttl: 30\n    # ttl: leader lease duration in seconds\n    # If the primary does not renew within ttl seconds, it is considered failed\n    # and a standby is promoted. 
Lower = faster failover; higher = more tolerance for slow networks.\n    loop_wait: 10\n    # loop_wait: how often Patroni checks cluster health (seconds)\n    retry_timeout: 10\n    # retry_timeout: how long Patroni retries a failed etcd or PostgreSQL operation before giving up\n    maximum_lag_on_failover: 1048576\n    # maximum_lag_on_failover: standbys more than 1MB behind the primary will not be promoted\n    # Prevents promoting a very stale standby that would cause significant data loss\n    postgresql:\n      parameters:\n        ssl: 'on'\n        # ssl: enable TLS on PostgreSQL connections\n        ssl_cert_file: \/etc\/etcd\/certs\/server.crt\n        ssl_key_file: \/etc\/etcd\/certs\/server.key\n      pg_hba:\n        # pg_hba: Patroni writes this pg_hba.conf on bootstrap\n        # hostssl: TLS is required \u2014 plain connections are rejected\n        - hostssl replication replicator 127.0.0.1\/32 md5\n        - hostssl replication replicator 192.168.0.203\/32 md5\n        - hostssl replication replicator 192.168.0.204\/32 md5\n        - hostssl replication replicator 192.168.0.205\/32 md5\n        # Replication connections from all three PostgreSQL nodes\n        - hostssl all all 127.0.0.1\/32 md5\n        - hostssl all all 0.0.0.0\/0 md5\n        # Application connections \u2014 TLS required, password authentication\n\n  initdb:\n    - encoding: UTF8\n    - data-checksums\n    # data-checksums: enables page-level checksums \u2014 required for pg_rewind\n    # Detects corrupted blocks; slight write overhead (typically under 2%)\n\npostgresql:\n  listen: 0.0.0.0:5432\n  connect_address: 192.168.0.203:5432\n  # connect_address: this node's actual IP \u2014 used by standbys to connect for replication\n  data_dir: \/var\/lib\/postgresql\/data\n  # data_dir: PostgreSQL data directory \u2014 Patroni manages this directory entirely\n  bin_dir: \/usr\/lib\/postgresql\/18\/bin\n  # bin_dir: directory containing pg_ctl, pg_basebackup, pg_rewind, initdb\n  # 
Adjust the version number to match your PostgreSQL installation\n  authentication:\n    superuser:\n      username: postgres\n      password: strongpassword\n      # Patroni uses these credentials for internal management connections\n      # Change before production use\n    replication:\n      username: replicator\n      password: replpassword\n      # Patroni creates this role automatically during bootstrap\n      # Change before production use\n  parameters:\n    max_connections: 100\n    shared_buffers: 256MB\n    # shared_buffers: PostgreSQL's main memory cache \u2014 typically 25% of total RAM\n    # 256MB is a conservative default; increase based on available memory\n\ntags:\n  nofailover: false\n  # nofailover: set to true on a node you never want automatically promoted (e.g. a DR standby)\n  noloadbalance: false\n  # noloadbalance: set to true to exclude this node from read replica routing\n  clonefrom: false\n  nosync: false\n\nctl:\n  insecure: true\n  # insecure: skip TLS certificate verification when patronictl calls the Patroni REST API\n  # Required for switchover and failover commands \u2014 without it patronictl fails with an SSL error\n  # Read-only commands (patronictl list) work without this because they use etcd, not the REST API\n  # Do NOT add cacert or certfile here \u2014 a server certfile causes a bad TLS handshake<\/code><\/pre>\n\n\n\n<p>For postgres-02 (192.168.0.204): change <code>name<\/code> to <code>postgresql-02<\/code>, both <code>connect_address<\/code> values to <code>192.168.0.204<\/code>, and etcd cert\/key to <code>etcd-node2.crt<\/code> \/ <code>etcd-node2.key<\/code>. 
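<\/p>\n\n\n\n<p>Patroni ships a configuration validator (available since Patroni 2.0) that catches malformed YAML and common mistakes such as a wrong <code>bin_dir<\/code> before you attempt a bootstrap:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On each node, after writing config.yml\nsudo patroni --validate-config \/etc\/patroni\/config.yml\n# Prints each problem it finds; silence plus exit code 0 means the file passed\necho $?\n# Expected: 0<\/code><\/pre>\n\n\n\n<p>Run it on all three nodes: a typo in a node-specific value such as connect_address only shows up on the node that carries it. 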
<\/p>\n\n\n\n<p>For postgres-03 (192.168.0.205): change <code>name<\/code> to <code>postgresql-03<\/code>, both <code>connect_address<\/code> values to <code>192.168.0.205<\/code>, and etcd cert\/key to <code>etcd-node3.crt<\/code> \/ <code>etcd-node3.key<\/code>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"8-starting-the-cluster\">8. Starting the Cluster<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-start-patroni-on-postgres-01-first\">Step 1 \u2014 Start Patroni on postgres-01 first<\/h3>\n\n\n\n<p>The intended primary must start first. <\/p>\n\n\n\n<p>If a standby starts before a primary exists in etcd, it waits \u2014 it does not cause an error. Starting the primary first avoids a three-way leader election race.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nsudo systemctl enable patroni\n# enable: start Patroni automatically on boot\nsudo systemctl restart patroni\n# Patroni initialises the PostgreSQL data directory (initdb), starts PostgreSQL,\n# acquires the leader lease in etcd, and configures itself as primary\n\njournalctl -u patroni -f\n# -f: follow \u2014 stream new log lines as they appear\n# Watch for: \"promoted self to leader\" \u2014 confirms postgres-01 is the primary\n# Press Ctrl+C to stop following once confirmed\n# If \"could not connect to etcd\": verify etcd is running and TLS certs are correct<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-verify-postgres-01-is-the-leader\">Step 2 \u2014 Verify postgres-01 is the leader<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\n# patronictl reads ca.crt from \/etc\/etcd\/certs\/ \u2014 that directory is mode 700.\n# Run with sudo, or it will fail with an SSL certificate load error.\nsudo patronictl -c \/etc\/patroni\/config.yml list\n# Expected:\n# + Cluster: postgresql-cluster +----+-----------+\n# | Member        | Host              | Role   | State   | TL | Lag 
in MB |\n# +---------------+-------------------+--------+---------+----+-----------+\n# | postgresql-01 | 192.168.0.203    | Leader | running |  1 |           |\n# +---------------+-------------------+--------+---------+----+-----------+\n# If State is \"start failed\": check journalctl -u patroni for the PostgreSQL error\n# If no Leader after 30 seconds: etcd health check (section 6 step 5)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-3-start-patroni-on-postgres-02-and-postgres-03\">Step 3 \u2014 Start Patroni on postgres-02 and postgres-03<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-02 and postgres-03\nsudo systemctl enable patroni && sudo systemctl restart patroni\n# Patroni detects the existing primary in etcd, runs pg_basebackup from postgres-01,\n# and starts PostgreSQL as a streaming standby\n\n# Verify all three nodes\nsudo patronictl -c \/etc\/patroni\/config.yml list\n# Expected:\n# + Cluster: postgresql-cluster -----+----+-----------+\n# | Member        | Host           | Role    | State   | TL | Lag in MB |\n# +---------------+----------------+---------+---------+----+-----------+\n# | postgresql-01 | 192.168.0.203 | Leader  | running |  1 |           |\n# | postgresql-02 | 192.168.0.204 | Replica | running |  1 |         0 |\n# | postgresql-03 | 192.168.0.205 | Replica | running |  1 |         0 |\n# +---------------+----------------+---------+---------+----+-----------+\n# Lag in MB = 0: standbys are fully caught up with the primary\n# If a node shows \"stopped\": check journalctl -u patroni on that node\n# If pg_basebackup failed: verify port 5432 is open between nodes<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-4-change-initial-cluster-state-to-existing-on-all-nodes\">Step 4 \u2014 Change initial-cluster-state to existing on all nodes<\/h3>\n\n\n\n<p>After successful bootstrap, <code>initial-cluster-state<\/code> must be changed from <code>new<\/code> to <code>existing<\/code>. 
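<\/p>\n\n\n\n<p>The edit is one predictable line per node, so it can also be scripted. A sketch with <code>sed<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo sed -i 's\/^ETCD_INITIAL_CLUSTER_STATE=\"new\"\/ETCD_INITIAL_CLUSTER_STATE=\"existing\"\/' \/etc\/etcd\/etcd.env\ngrep ETCD_INITIAL_CLUSTER_STATE \/etc\/etcd\/etcd.env\n# Expected: ETCD_INITIAL_CLUSTER_STATE=\"existing\"<\/code><\/pre>\n\n\n\n<p>A restart of etcd on each node is still required for the change to take effect. 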
<\/p>\n\n\n\n<p>This prevents a node from accidentally bootstrapping a new cluster if it is restarted in isolation later.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\n# Edit \/etc\/etcd\/etcd.env and change:\n#   ETCD_INITIAL_CLUSTER_STATE=\"new\"\n# to:\n#   ETCD_INITIAL_CLUSTER_STATE=\"existing\"\n\nsudo systemctl restart etcd\n# Restart to apply the change\n# Expected: etcd rejoins the existing cluster cleanly\n# Verify: run the endpoint health command from section 6 step 5<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"9-ha-proxy-setup\">9. HAProxy Setup<\/h2>\n\n\n\n<p>HAProxy routes all application connections to the current primary by querying Patroni's REST API. <\/p>\n\n\n\n<p>Only the primary returns HTTP 200 on <code>\/primary<\/code> \u2014 standbys return HTTP 503. <\/p>\n\n\n\n<p>Three HAProxy nodes provide redundancy; keepalived (section 10) moves the VIP between them.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-install-ha-proxy\">Step 1 \u2014 Install HAProxy<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On haproxy-01, haproxy-02, haproxy-03\nsudo apt install -y haproxy\n\nhaproxy -v\n# Expected: HAProxy version 2.x.x\n# If an older version: add the HAProxy apt repository for the latest version<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-configure-ha-proxy\">Step 2 \u2014 Configure HAProxy<\/h3>\n\n\n\n<p>The configuration is identical on all three HAProxy nodes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># \/etc\/haproxy\/haproxy.cfg\n\nfrontend postgres_frontend\n    bind *:5432\n    mode tcp\n    # mode tcp: HAProxy forwards raw TCP \u2014 PostgreSQL is not an HTTP protocol\n    timeout client 30s\n    # timeout client belongs in the frontend only \u2014 HAProxy will warn and ignore it in a backend\n    default_backend postgres_backend\n\nbackend postgres_backend\n    mode tcp\n    
option httpchk GET \/primary\n    # httpchk: HAProxy queries this endpoint on each node's Patroni REST API\n    # Only the current primary returns HTTP 200 on \/primary; standbys return 503\n    # This is how HAProxy knows which node to route write traffic to\n    # (do not also set \"option tcp-check\" \u2014 check methods are exclusive and the last one wins)\n    http-check expect status 200\n    timeout connect 5s\n    # timeout connect and timeout server belong in the backend only\n    timeout server 30s\n    server postgresql-01 192.168.0.203:5432 port 8008 check check-ssl verify none\n    server postgresql-02 192.168.0.204:5432 port 8008 check check-ssl verify none\n    server postgresql-03 192.168.0.205:5432 port 8008 check check-ssl verify none\n    # port 8008: HAProxy checks the Patroni REST API port, not the PostgreSQL port\n    # check-ssl: use TLS when querying the REST API (Patroni REST API has TLS enabled)\n    # verify none: skip certificate CN verification \u2014 acceptable in a private cluster\n    # where nodes are identified by IP<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-3-validate-config-and-reload-ha-proxy\">Step 3 \u2014 Validate config and reload HAProxy<\/h3>\n\n\n\n<p>Always validate the config before reloading. 
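<\/p>\n\n\n\n<p>The backend's health check can be reproduced by hand first; if a server later shows DOWN in the HAProxy logs, this separates a Patroni-side problem from a proxy-side one. <code>-k<\/code> mirrors the <code>verify none<\/code> in the backend, since the REST API certificate is self-signed in this lab:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># From any HAProxy node\ncurl -sk -o \/dev\/null -w '%{http_code}' https:\/\/192.168.0.203:8008\/primary; echo\n# Expected: 200 if postgres-01 is currently the primary, 503 if it is a standby\ncurl -sk -o \/dev\/null -w '%{http_code}' https:\/\/192.168.0.204:8008\/primary; echo\ncurl -sk -o \/dev\/null -w '%{http_code}' https:\/\/192.168.0.205:8008\/primary; echo\n# Exactly one of the three should print 200 at any given time<\/code><\/pre>\n\n\n\n<p>If no node returns 200, fix Patroni before touching HAProxy. 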
<\/p>\n\n\n\n<p>HAProxy will refuse to reload if the config has errors.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On haproxy-01, haproxy-02, haproxy-03\nsudo haproxy -c -f \/etc\/haproxy\/haproxy.cfg\n# -c: check config only, do not start\n# -f: path to config file\n# Expected: Configuration file is valid\n# If you see warnings about timeout client\/server in the wrong section,\n# check that timeout client is in the frontend and timeout connect\/server are in the backend<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># On haproxy-01, haproxy-02, haproxy-03\nsudo systemctl reload haproxy\n# reload: applies the new configuration without dropping existing connections\n# Expected: no output on the terminal; confirm success in the logs below<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo journalctl -u haproxy --since \"1 minute ago\"\n# Expected lines (timestamps will differ):\n#   systemd&#91;1]: Reloading haproxy.service - HAProxy Load Balancer...\n#   systemd&#91;1]: Reloaded haproxy.service - HAProxy Load Balancer.\n# If you see \"Failed\": check sudo haproxy -c output first<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo tail -f \/var\/log\/syslog | grep haproxy\n# Watch the HAProxy logs to confirm it is checking the PostgreSQL nodes\n# Expected: repeated health check lines; one node should show \"UP\" (the primary)\n# If all nodes show \"DOWN\": HAProxy cannot reach port 8008 \u2014 check firewall<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"10-keepalived-setup\">10. keepalived Setup<\/h2>\n\n\n\n<p>keepalived manages the VIP using VRRP. <\/p>\n\n\n\n<p>One HAProxy node holds the VIP (the MASTER). <\/p>\n\n\n\n<p>If the MASTER's health check fails or the node goes down, keepalived on a BACKUP node claims the VIP. 
<\/p>\n\n\n\n<p>Applications always connect to the VIP \u2014 HAProxy node failures are transparent.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-install-keepalived\">Step 1 \u2014 Install keepalived<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On haproxy-01, haproxy-02, haproxy-03\nsudo apt update && sudo apt install -y keepalived<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-create-the-ha-proxy-health-check-script\">Step 2 \u2014 Create the HAProxy health check script<\/h3>\n\n\n\n<p>keepalived runs this script every 2 seconds. <\/p>\n\n\n\n<p>A non-zero exit code tells keepalived this node should not hold the VIP.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On haproxy-01, haproxy-02, haproxy-03\n# Create \/etc\/keepalived\/check_haproxy.sh with the following content:<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/bin\/bash\n\nPORT=5432\n\nif ! pidof haproxy &gt; \/dev\/null; then\n    # pidof haproxy: checks whether the HAProxy process is running\n    echo \"HAProxy is not running\"\n    exit 1\n    # exit 1: non-zero tells keepalived this node is unhealthy \u2014 VIP moves to a BACKUP\nfi\n\nif ! 
ss -ltn | grep -q \":${PORT}\"; then\n    # ss -ltn: list listening TCP sockets; grep checks whether port 5432 is bound\n    echo \"HAProxy is not listening on port ${PORT}\"\n    exit 2\nfi\n\nexit 0\n# exit 0: tells keepalived this node is healthy and should keep (or receive) the VIP<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># On haproxy-01, haproxy-02, haproxy-03\nsudo useradd -r -s \/bin\/false keepalived_script\n# -r: system account; -s \/bin\/false: no interactive login\n# keepalived runs health check scripts as this user when enable_script_security is active\n\nsudo chown keepalived_script:keepalived_script \/etc\/keepalived\/check_haproxy.sh\nsudo chmod 700 \/etc\/keepalived\/check_haproxy.sh\n# 700: owner-only read\/write\/execute (a separate chmod +x is unnecessary)\n# keepalived requires the script not be writable by other users\n# when enable_script_security is active (configured in keepalived.conf below)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-3-configure-keepalived\">Step 3 \u2014 Configure keepalived<\/h3>\n\n\n\n<p>Each node has a different <code>state<\/code> and <code>priority<\/code>. <\/p>\n\n\n\n<p>The MASTER (priority 100) holds the VIP initially. 
<\/p>\n\n\n\n<p>If it fails, the BACKUP with the next highest priority (90) claims the VIP.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># \/etc\/keepalived\/keepalived.conf \u2014 haproxy-01 (MASTER)\n\nglobal_defs {\n    enable_script_security\n    # enable_script_security: prevents keepalived from running scripts owned by root\n    # or writable by anyone \u2014 prevents privilege escalation through health check scripts\n    script_user keepalived_script\n    # script_user: the OS user keepalived uses to execute health check scripts\n}\n\nvrrp_script check_haproxy {\n    script \"\/etc\/keepalived\/check_haproxy.sh\"\n    interval 2\n    # interval: run the health check every 2 seconds\n    fall 3\n    # fall: mark this node as failed after 3 consecutive failing checks (6 seconds total)\n    rise 2\n    # rise: mark this node as recovered after 2 consecutive passing checks (4 seconds)\n}\n\nvrrp_instance VI_1 {\n    state MASTER\n    # MASTER: this node holds the VIP initially on startup\n    interface enp0s3\n    # interface: the network interface where the VIP is assigned\n    # Run \"ip link show\" to find your interface name \u2014 may be ens3, enp0s3, eth0, etc.\n    # In this lab the interface is enp0s3 (VirtualBox default)\n    virtual_router_id 51\n    # virtual_router_id: identifies this VRRP group \u2014 must be identical on all three nodes\n    priority 100\n    # priority: the node with the highest priority wins the VIP election\n    # haproxy-01 wins by default (100 &gt; 90 &gt; 80)\n    advert_int 1\n    # advert_int: send VRRP advertisements every 1 second\n    authentication {\n        auth_type PASS\n        auth_pass changeme123\n        # auth_pass: shared password between all keepalived nodes \u2014 change before production use\n    }\n    virtual_ipaddress {\n        192.168.0.210\n        # The VIP \u2014 applications connect to this address on port 5432\n    }\n    track_script {\n        check_haproxy\n        # track_script: tie this 
VRRP instance to the HAProxy health check\n        # If the script exits non-zero, this node's effective priority drops\n        # below the BACKUPs and the VIP moves\n    }\n}<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># \/etc\/keepalived\/keepalived.conf \u2014 haproxy-02 (BACKUP, second priority)\n# Identical to haproxy-01 except:\n#   state BACKUP\n#   priority 90<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># \/etc\/keepalived\/keepalived.conf \u2014 haproxy-03 (BACKUP, lowest priority)\n# Identical to haproxy-01 except:\n#   state BACKUP\n#   priority 80<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-4-start-keepalived\">Step 4 \u2014 Start keepalived<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On haproxy-01, haproxy-02, haproxy-03\nsudo systemctl enable --now keepalived<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo journalctl -u keepalived -f\n# Watch the keepalived logs \u2014 expected to see VRRP state transitions\n# haproxy-01 should log: \"(VI_1) Entering BACKUP STATE\" then \"(VI_1) Entering MASTER STATE\"\n#   (briefly enters BACKUP on startup, wins election after ~4 seconds due to priority 100)\n# haproxy-02 and haproxy-03 should log: \"(VI_1) Entering BACKUP STATE\"\n# Press Ctrl+C to stop following\n#\n# NOTE: \"Truncating auth_pass to 8 characters\" is a warning, not an error.\n# keepalived only uses the first 8 characters of auth_pass regardless of what you set.\n# This is fine as long as all nodes use the same password string \u2014 they will all truncate identically.<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># Verify the VIP is active on haproxy-01\nping -c 3 192.168.0.210\n# Expected: 3 packets transmitted, 3 received\n# If 0 received: VIP is not assigned to any node \u2014 check keepalived logs on all three HAProxy nodes<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" 
id=\"11-set-the-postgres-superuser-password\">11. Set the postgres Superuser Password<\/h2>\n\n\n\n<p>Patroni manages its own <code>pg_hba.conf<\/code> at <code>\/var\/lib\/postgresql\/data\/pg_hba.conf<\/code> \u2014 not the default package location at <code>\/etc\/postgresql\/18\/main\/pg_hba.conf<\/code>. <\/p>\n\n\n\n<p>Patroni's file contains only <code>hostssl<\/code> entries and no local Unix socket entry. <\/p>\n\n\n\n<p>This means <code>sudo -u postgres psql<\/code> will fail until a local entry is added.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01 \u2014 check Patroni's actual pg_hba.conf\nsudo cat \/var\/lib\/postgresql\/data\/pg_hba.conf\n# Expected: \"Do not edit this file manually! It will be overwritten by Patroni!\"\n# followed by hostssl entries only \u2014 no local socket entry<\/code><\/pre>\n\n\n\n<p>Add a temporary local entry so the postgres OS user can connect via socket:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\necho \"local all postgres peer\" | sudo tee -a \/var\/lib\/postgresql\/data\/pg_hba.conf<\/code><\/pre>\n\n\n\n<p>Reload the config using pg_ctl \u2014 pg_reload_conf() cannot be used because there is no socket connection yet:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nsudo -u postgres \/usr\/lib\/postgresql\/18\/bin\/pg_ctl reload -D \/var\/lib\/postgresql\/data\n# Expected: server signaled<\/code><\/pre>\n\n\n\n<p>Set the password:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nsudo -u postgres psql -c \"ALTER USER postgres PASSWORD 'strongpassword';\"\n# Expected: ALTER ROLE\n# IMPORTANT: this password must match postgresql.authentication.superuser.password in config.yml\n# If they differ, Patroni cannot connect to its local PostgreSQL instance and will report\n# unknown LSN \u2014 the node will appear stuck after a switchover or restart\n# The password is now stored in the database \u2014 it will survive Patroni overwriting 
pg_hba.conf<\/code><\/pre>\n\n\n\n<p>Patroni will overwrite <code>pg_hba.conf<\/code> on its next cycle and remove the local entry \u2014 that is expected and fine. <\/p>\n\n\n\n<p>The password persists in the database catalog regardless.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"12-verify-the-full-stack\">12. Verify the Full Stack<\/h2>\n\n\n\n<p>Run these checks in order after completing all setup steps.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Check 1: etcd cluster is healthy (run on postgres-01)\nsudo etcdctl \\\n  --endpoints=https:\/\/192.168.0.203:2379,https:\/\/192.168.0.204:2379,https:\/\/192.168.0.205:2379 \\\n  --cacert=\/etc\/etcd\/certs\/ca.crt \\\n  --cert=\/etc\/etcd\/certs\/etcd-node1.crt \\\n  --key=\/etc\/etcd\/certs\/etcd-node1.key \\\n  endpoint health\n# Expected: all three nodes report \"is healthy\"\n\n# Check 2: Patroni cluster has a leader and two replicas (run on any PostgreSQL node)\nsudo patronictl -c \/etc\/patroni\/config.yml list\n# Expected: one Leader, two Replica, all State = running, Lag = 0\n\n# Check 3: VIP is responding\nping -c 3 192.168.0.210\n# Expected: 3 packets received\n# If 0 received: VIP not assigned \u2014 check keepalived on all HAProxy nodes\n\n# Check 4: PostgreSQL is reachable through the VIP\npsql -h 192.168.0.210 -U postgres -c \"SELECT inet_server_addr(), pg_is_in_recovery();\"\n# Expected: inet_server_addr = 192.168.0.203 (the primary), pg_is_in_recovery = f\n# pg_is_in_recovery = f: confirms this is the primary (standbys return t)\n# If connection refused: HAProxy is not routing \u2014 check haproxy status and \/primary endpoint\n\n# Check 5: replication is streaming\npsql -h 192.168.0.210 -U postgres -c \"SELECT client_addr, replay_lag FROM pg_stat_replication;\"\n# Expected: two rows (one per standby), replay_lag = 00:00:00 or NULL (caught up)\n# If no rows: standbys are not streaming \u2014 check Patroni logs on standby 
nodes\n\n# Check 6: insert data on primary, read it from both standbys\npsql -h 192.168.0.210 -U postgres -c \"CREATE TABLE test (id serial, val text);\"\npsql -h 192.168.0.210 -U postgres -c \"INSERT INTO test (val) VALUES ('replication works');\"\npsql -h 192.168.0.204 -U postgres -c \"SELECT * FROM test;\"\n# Expected: one row with val = 'replication works'\n# Connects directly to postgres-02 (a standby) to confirm the data replicated\n# If \"relation does not exist\": replication is not running \u2014 check pg_stat_replication\npsql -h 192.168.0.205 -U postgres -c \"SELECT * FROM test;\"\n# Expected: same row \u2014 confirms postgres-03 is also replicating\npsql -h 192.168.0.210 -U postgres -c \"DROP TABLE test;\"\n# Clean up the test table<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"13-switchover-planned\">13. Switchover (Planned)<\/h2>\n\n\n\n<p>A switchover moves the primary role to a specific standby with zero data loss. <\/p>\n\n\n\n<p>Patroni waits for the candidate standby to be fully caught up before promoting.<\/p>\n\n\n\n<p><strong>Prerequisite \u2014 <code>ctl:<\/code> section in config.yml:<\/strong> The <code>ctl:<\/code> section must be present with <code>insecure: true<\/code> before running switchover. <\/p>\n\n\n\n<p>Without it, patronictl cannot authenticate to the Patroni REST API and the switchover will fail with an SSL error. 
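<\/p>\n\n\n\n<p>Under the hood, <code>patronictl<\/code> performs the switchover by calling the REST API itself. The equivalent raw call (shown for illustration only; <code>patronictl<\/code> remains the supported interface) makes that dependency explicit:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Illustration only: what the switchover amounts to at the API level\ncurl -sk -X POST https:\/\/192.168.0.203:8008\/switchover \\\n  -H 'Content-Type: application\/json' \\\n  -d '{\"leader\": \"postgresql-01\", \"candidate\": \"postgresql-02\"}'\n# -k skips certificate verification, matching insecure: true in the ctl section<\/code><\/pre>\n\n\n\n<p>A broken TLS setup breaks exactly this call, which is why the failure surfaces as an SSL error. 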
<\/p>\n\n\n\n<p>Read-only commands like <code>patronictl list<\/code> do not require the REST API (they use etcd) \u2014 so the missing section is not obvious until you attempt a write operation.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Verify the ctl section is present on all PostgreSQL nodes before attempting switchover\nsudo tail -5 \/etc\/patroni\/config.yml\n# Expected:\n#   ctl:\n#     insecure: true\n# If missing: add it and reload patroni (sudo systemctl reload patroni)\n# NOTE: do not add cacert or certfile to the ctl section \u2014 only insecure: true\n# Adding a server certfile causes a bad TLS handshake and switchover still fails<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># On any PostgreSQL node\n# --leader: the current primary being demoted\n# --candidate: the standby being promoted\n# NOTE: older Patroni versions used --master here \u2014 newer versions use --leader\n# (comments are listed up front: a comment between continuation lines breaks the command)\nsudo patronictl -c \/etc\/patroni\/config.yml switchover postgresql-cluster \\\n  --leader postgresql-01 \\\n  --candidate postgresql-02\n# patronictl will show the current topology and ask for confirmation \u2014 press Enter for \"now\", then y\n\n# Patroni performs these steps automatically:\n# 1. Pauses writes on the primary (checkpoint)\n# 2. Waits for the candidate to confirm it has applied all WAL\n# 3. Demotes the current primary to standby\n# 4. Promotes the candidate to primary\n# 5. 
Reconfigures the old primary as a standby using pg_rewind<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo patronictl -c \/etc\/patroni\/config.yml list\n# Expected: postgresql-02 now shows Role = Leader on a new timeline; postgresql-01 shows Role = Replica\n# postgresql-01 may briefly show \"stopped\" \u2014 this is pg_rewind running; wait 10 seconds and recheck\n# If postgresql-01 stays \"stopped\": pg_rewind failed\n#   Fix: sudo patronictl -c \/etc\/patroni\/config.yml reinit postgresql-cluster postgresql-01<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"14-failover-unplanned\">14. Failover (Unplanned)<\/h2>\n\n\n\n<p>When the primary fails, Patroni detects it automatically:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The primary fails to renew its leader lease in etcd within <code>ttl<\/code> seconds (30 by default)<\/li>\n\n\n\n<li>Patroni on the remaining nodes holds an election in etcd<\/li>\n\n\n\n<li>The standby with the least lag that is within <code>maximum_lag_on_failover<\/code> is elected and promoted<\/li>\n\n\n\n<li>Other standbys reattach to the new primary using <code>pg_rewind<\/code><\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"simulate-a-primary-failure\">Simulate a primary failure<\/h3>\n\n\n\n<p>Stop Patroni on the current primary to simulate a crash. <\/p>\n\n\n\n<p>Patroni manages PostgreSQL \u2014 stopping Patroni also stops PostgreSQL on that node.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01 (the current primary)\nsudo systemctl stop patroni\n# This simulates a primary crash \u2014 PostgreSQL stops and the leader lease expires<\/code><\/pre>\n\n\n\n<p>On postgres-02 or postgres-03, watch the automatic failover happen:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-02 or postgres-03\nwatch -n2 \"sudo patronictl -c \/etc\/patroni\/config.yml list\"\n# Expected sequence over ~30 seconds (ttl):\n# 1. 
postgresql-01 disappears or shows \"stopped\"\n# 2. One of the remaining nodes shows Leader on a new timeline\n# 3. The other remaining node shows Replica streaming\n# Press Ctrl+C when the new leader is confirmed<\/code><\/pre>\n\n\n\n<p>Bring postgres-01 back:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nsudo systemctl start patroni<\/code><\/pre>\n\n\n\n<p>Check the cluster \u2014 postgres-01 should rejoin as a replica:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo patronictl -c \/etc\/patroni\/config.yml list\n# Expected: postgres-01 shows Replica streaming, Lag = 0\n# If postgres-01 shows \"start failed\": check sudo journalctl -u patroni -n 30 --no-pager<\/code><\/pre>\n\n\n\n<p>Switch back to postgres-01 as primary when ready:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo patronictl -c \/etc\/patroni\/config.yml switchover postgresql-cluster \\\n  --leader &lt;new-leader&gt; \\\n  --candidate postgresql-01<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"manual-failover-use-only-when-automatic-failover-has-not-triggered\">Manual failover \u2014 use only when automatic failover has not triggered<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo patronictl -c \/etc\/patroni\/config.yml failover postgresql-cluster \\\n  --leader postgresql-01 \\\n  --candidate postgresql-02 \\\n  --force\n  # --force: skip the confirmation prompt\n  # Use only when the primary is confirmed down\n  # Without --force, patronictl prompts you to confirm before proceeding<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"15-day-to-day-operations\">15. 
Day-to-Day Operations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"check-cluster-status\">Check cluster status<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo patronictl -c \/etc\/patroni\/config.yml list\n# Shows all members, roles (Leader\/Replica), state (running\/stopped\/start failed),\n# timeline, and replication lag in MB\n# Run this first whenever diagnosing any cluster issue<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"pause-and-resume-automatic-failover\">Pause and resume automatic failover<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># Pause \u2014 Patroni will not promote any standby while paused\n# Use during planned maintenance to prevent accidental failover\nsudo patronictl -c \/etc\/patroni\/config.yml pause postgresql-cluster\n\n# Resume \u2014 restore automatic failover\nsudo patronictl -c \/etc\/patroni\/config.yml resume postgresql-cluster<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"restart-postgre-sql-on-a-node\">Restart PostgreSQL on a node<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># Always restart PostgreSQL through patronictl \u2014 never directly via systemctl\n# Running \"systemctl restart postgresql\" bypasses Patroni and causes inconsistent state\nsudo patronictl -c \/etc\/patroni\/config.yml restart postgresql-cluster postgresql-02<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"reload-configuration\">Reload configuration<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># Apply postgresql.conf changes across all nodes without a restart\nsudo patronictl -c \/etc\/patroni\/config.yml reload postgresql-cluster<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"edit-cluster-dcs-configuration\">Edit cluster DCS configuration<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># Edit settings stored in etcd (ttl, maximum_lag_on_failover, synchronous_mode, etc.)\nsudo patronictl -c \/etc\/patroni\/config.yml edit-config postgresql-cluster\n# Opens the current configuration in your 
$EDITOR\n# Changes take effect immediately after saving \u2014 no restart required<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"reinitialise-a-failed-standby\">Reinitialise a failed standby<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># If pg_rewind fails after a failover, wipe and rebuild the standby from the primary\nsudo patronictl -c \/etc\/patroni\/config.yml reinit postgresql-cluster postgresql-03\n# Wipes data_dir on postgresql-03 and runs pg_basebackup from the current primary<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"16-synchronous-mode-zero-data-loss\">16. Synchronous Mode (Zero Data Loss)<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo patronictl -c \/etc\/patroni\/config.yml edit-config postgresql-cluster<\/code><\/pre>\n\n\n\n<p>Add or update:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>synchronous_mode: true\n# true: commits on the primary wait for at least one standby to confirm before returning\n# Patroni sets synchronous_standby_names automatically \u2014 no manual configuration needed\n\nsynchronous_mode_strict: false\n# false (default): if no synchronous standby is available, the primary continues writing\n# true: if no synchronous standby is available, the primary stops accepting writes entirely\n#       use true only when zero data loss is mandatory and availability can be sacrificed<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"17-monitoring\">17. 
Monitoring<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"patroni-rest-api\">Patroni REST API<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># Check HTTP status code only \u2014 this is what HAProxy checks internally\ncurl -k -o \/dev\/null -w \"%{http_code}\\n\" https:\/\/192.168.0.203:8008\/primary\n# Expected on the primary: 200\n# Expected on a replica: 503\n# -k: skip certificate verification; -o \/dev\/null: discard body; -w: print status code only\n\ncurl -k -o \/dev\/null -w \"%{http_code}\\n\" https:\/\/192.168.0.203:8008\/replica\n# Expected on a replica: 200\n# Expected on the primary: 503\n\n# Check full JSON response (useful for debugging)\ncurl -k https:\/\/192.168.0.203:8008\/primary\n# Response includes: role, server_version, timeline, replication state of each standby\n\ncurl -k https:\/\/192.168.0.203:8008\/cluster | python3 -m json.tool\n# Full cluster status in formatted JSON\n# Expected: all members listed with roles, state, and lag\n\ncurl -k https:\/\/192.168.0.203:8008\/health\n# Expected: HTTP 200 if the node is running normally<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"replication-lag\">Replication lag<\/h3>\n\n\n\n<p>Connect directly to the primary node \u2014 <code>pg_stat_replication<\/code> only has rows on the primary, not on replicas. 
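<\/p>\n\n\n\n<p>Knowing which node is currently primary can itself be scripted from the REST API checks above. A sketch (the helper name is hypothetical; probe results are fed in as text so the logic runs without a cluster):<\/p>\n\n\n\n

```shell
#!/bin/sh
# Sketch: identify the primary from "host http_code" pairs, where each code
# comes from probing that node's Patroni REST API /primary endpoint:
#   curl -k -s -o /dev/null -w '%{http_code}' https://<host>:8008/primary
pick_primary() {
  # prints the first host whose /primary probe returned 200
  while read -r host code; do
    if [ "$code" = "200" ]; then
      echo "$host"
      return 0
    fi
  done
  return 1
}

# Demo with static probe results (replace with real curl output per node):
printf '192.168.0.203 503\n192.168.0.204 200\n192.168.0.205 503\n' | pick_primary
# prints: 192.168.0.204
```

\n\n\n\n<p>Feeding the printed host to <code>psql -h<\/code> targets the primary without hard-coding its address.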
<\/p>\n\n\n\n<p>HAProxy also routes port 5432 on the VIP to the primary, so either works.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Option 1 \u2014 direct to primary\npsql -h 192.168.0.203 -U postgres -c \"SELECT client_addr, replay_lag, sync_state FROM pg_stat_replication;\"\n\n# Option 2 \u2014 through VIP (HAProxy routes all connections on port 5432 to the primary)\npsql -h 192.168.0.210 -p 5432 -U postgres -c \"SELECT client_addr, replay_lag, sync_state FROM pg_stat_replication;\"<\/code><\/pre>\n\n\n\n<p>Expected output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>  client_addr  | replay_lag | sync_state\n---------------+------------+------------\n 192.168.0.204 |            | async\n 192.168.0.205 |            | async\n(2 rows)<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>One row per connected standby<\/li>\n\n\n\n<li><code>replay_lag = NULL<\/code> (empty): standby is fully caught up \u2014 normal at rest<\/li>\n\n\n\n<li><code>replay_lag<\/code> growing: standby is falling behind \u2014 check network and standby Patroni logs<\/li>\n\n\n\n<li>0 rows: no standbys streaming \u2014 check <code>patronictl list<\/code> to confirm replica states; if replicas show running, check <code>primary_conninfo<\/code> in <code>postgresql.auto.conf<\/code> on each standby<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"18-full-reset\">18. Full Reset<\/h2>\n\n\n\n<p>Use this procedure to rebuild the entire cluster from scratch \u2014 for example after a lab failure, a misconfiguration that cannot be recovered, or to re-run the lab from Section 9. TLS certificates are preserved on all nodes. 
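<\/p>\n\n\n\n<p>Before wiping anything, it is worth confirming the services really are stopped. A small guard sketch (the helper name is hypothetical; on a node the states would come from <code>systemctl is-active<\/code>):<\/p>\n\n\n\n

```shell
#!/bin/sh
# Sketch: refuse to wipe data directories while patroni or etcd is active.
# Takes "name:state" pairs; on a node the states would come from systemd:
#   safe_to_wipe "patroni:$(systemctl is-active patroni)" "etcd:$(systemctl is-active etcd)"
safe_to_wipe() {
  for pair in "$@"; do
    case "$pair" in
      *:active) echo "ABORT: ${pair%%:*} is still running"; return 1 ;;
    esac
  done
  echo "safe to wipe"
}

# Demo with both services stopped:
safe_to_wipe patroni:inactive etcd:inactive
# prints: safe to wipe
```

\n\n\n\n<p>Run the guard on every node immediately before the <code>rm -rf<\/code> in Step 2.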
<\/p>\n\n\n\n<p>Only etcd cluster state and PostgreSQL data are wiped.<\/p>\n\n\n\n<p>Run all commands in this section on <strong>postgres-01, postgres-02, and postgres-03<\/strong> unless labelled otherwise.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-1-stop-patroni-and-etcd-on-all-postgre-sql-nodes\">Step 1 \u2014 Stop Patroni and etcd on all PostgreSQL nodes<\/h3>\n\n\n\n<p>Patroni must stop before etcd so it can cleanly release its leader lock. <\/p>\n\n\n\n<p>If etcd is stopped first, Patroni loses its DCS connection and may hang.<\/p>\n\n\n\n<p>On postgres-01:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nsudo systemctl stop patroni\n# Stops PostgreSQL gracefully via Patroni \u2014 do not use systemctl stop postgresql directly\n\nsudo systemctl stop etcd\n# Stops the etcd member on this node<\/code><\/pre>\n\n\n\n<p>Repeat on postgres-02 and postgres-03 before continuing.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-2-wipe-etcd-and-postgre-sql-data-directories\">Step 2 \u2014 Wipe etcd and PostgreSQL data directories<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo rm -rf \/var\/lib\/etcd\/\n# Removes all etcd WAL and snapshot data \u2014 the etcd cluster will re-bootstrap from scratch\n\nsudo rm -rf \/var\/lib\/postgresql\/data\/\n# Removes all PostgreSQL data files \u2014 Patroni will re-initialise via pg_basebackup on standbys<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-3-recreate-directories-with-correct-ownership\">Step 3 \u2014 Recreate directories with correct ownership<\/h3>\n\n\n\n<p><code>rm -rf<\/code> removes the directory itself, not just its contents. 
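<\/p>\n\n\n\n<p>The point is easy to demonstrate with a throwaway directory (a \/tmp path, not the real data directory):<\/p>\n\n\n\n

```shell
#!/bin/sh
# Demo in /tmp: rm -rf removes the directory itself, so the path must be
# recreated (and re-owned) before a service can write to it again.
d=/tmp/etcd-data-demo
mkdir -p "$d" && touch "$d/member"

rm -rf "$d"
[ -d "$d" ] && echo "still there" || echo "directory gone"
# prints: directory gone

mkdir -p "$d"   # recreate; the real procedure then runs chown etcd:etcd
[ -d "$d" ] && echo "recreated"
# prints: recreated
```

\n\n\n\n<p>The same sequence applies to <code>\/var\/lib\/etcd<\/code> and <code>\/var\/lib\/postgresql\/data<\/code>, with the <code>chown<\/code> commands shown below.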
<\/p>\n\n\n\n<p>The directories must be recreated before etcd and Patroni can write to them.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo mkdir -p \/var\/lib\/etcd\/\nsudo chown etcd:etcd \/var\/lib\/etcd\/\n# etcd runs as the etcd user \u2014 it must own its data directory\n\nsudo mkdir -p \/var\/lib\/postgresql\/data\nsudo chown postgres:postgres \/var\/lib\/postgresql\/data\n# Patroni runs as the postgres user \u2014 it must own the PostgreSQL data directory<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-4-restore-acl-permissions-on-etcd-certificates\">Step 4 \u2014 Restore ACL permissions on etcd certificates<\/h3>\n\n\n\n<p><code>rm -rf<\/code> on the data directory does not affect <code>\/etc\/etcd\/certs\/<\/code>, but if you also wiped the certs directory during troubleshooting, the <code>postgres<\/code> user will have lost read access to the etcd certificates. 
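<\/p>\n\n\n\n<p>A quick way to confirm whether the certificates are readable is to test access as the <code>postgres<\/code> user, e.g. <code>sudo -u postgres test -r \/etc\/etcd\/certs\/ca.crt<\/code>. The sketch below wraps that readability test so it can be exercised against any file (the temp file and helper name are hypothetical):<\/p>\n\n\n\n

```shell
#!/bin/sh
# Sketch: report which files the calling user can read. On a node, run it
# as the postgres user: sudo -u postgres sh check.sh /etc/etcd/certs/*
check_readable() {
  for f in "$@"; do
    if [ -r "$f" ]; then echo "OK       $f"; else echo "NO READ  $f"; fi
  done
}

# Demo with a temp file owned by the current user:
f=/tmp/acl-demo.crt
touch "$f" && chmod 600 "$f"
check_readable "$f"
```

\n\n\n\n<p>If any file reports NO READ for the <code>postgres<\/code> user, apply the <code>setfacl<\/code> commands in this step.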
<\/p>\n\n\n\n<p>Run this step to restore those permissions.<\/p>\n\n\n\n<p>On postgres-01:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nsudo setfacl -m u:postgres:r \/etc\/etcd\/certs\/ca.crt\nsudo setfacl -m u:postgres:r \/etc\/etcd\/certs\/etcd-node1.crt\nsudo setfacl -m u:postgres:r \/etc\/etcd\/certs\/etcd-node1.key\n# Grants the postgres OS user read access to the etcd TLS files\n# Required because Patroni (running as postgres) connects to etcd over TLS<\/code><\/pre>\n\n\n\n<p>On postgres-02:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-02\nsudo setfacl -m u:postgres:r \/etc\/etcd\/certs\/ca.crt\nsudo setfacl -m u:postgres:r \/etc\/etcd\/certs\/etcd-node2.crt\nsudo setfacl -m u:postgres:r \/etc\/etcd\/certs\/etcd-node2.key<\/code><\/pre>\n\n\n\n<p>On postgres-03:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-03\nsudo setfacl -m u:postgres:r \/etc\/etcd\/certs\/ca.crt\nsudo setfacl -m u:postgres:r \/etc\/etcd\/certs\/etcd-node3.crt\nsudo setfacl -m u:postgres:r \/etc\/etcd\/certs\/etcd-node3.key<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-5-reset-etcd-initial-cluster-state-to-new\">Step 5 \u2014 Reset ETCD_INITIAL_CLUSTER_STATE to &#8220;new&#8221;<\/h3>\n\n\n\n<p>After the first bootstrap, <code>etcd.env<\/code> on all nodes was changed to <code>existing<\/code>. 
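<\/p>\n\n\n\n<p>The edit can also be scripted with <code>sed<\/code> rather than done by hand in <code>vi<\/code>. A sketch against a stand-in file (on a node the target is <code>\/etc\/etcd\/etcd.env<\/code>, backed up first as this step shows):<\/p>\n\n\n\n

```shell
#!/bin/sh
# Sketch: flip ETCD_INITIAL_CLUSTER_STATE from "existing" back to "new".
# Demonstrated on a stand-in file; on a node the target is /etc/etcd/etcd.env
# (edit as root, after taking a backup).
f=/tmp/etcd.env.demo
echo 'ETCD_INITIAL_CLUSTER_STATE="existing"' > "$f"

sed -i 's/^ETCD_INITIAL_CLUSTER_STATE="existing"/ETCD_INITIAL_CLUSTER_STATE="new"/' "$f"

cat "$f"
# prints: ETCD_INITIAL_CLUSTER_STATE="new"
```

\n\n\n\n<p>Run the same substitution on all three nodes before starting etcd.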
<\/p>\n\n\n\n<p>For a full reset, it must be set back to <code>new<\/code> so etcd treats this as a fresh cluster formation.<\/p>\n\n\n\n<p>On each node, back up and edit <code>\/etc\/etcd\/etcd.env<\/code>:<\/p>\n\n\n\n<p>On postgres-01:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nsudo cp \/etc\/etcd\/etcd.env \/etc\/etcd\/etcd.env.$(date +%Y%m%d)\nsudo vi \/etc\/etcd\/etcd.env<\/code><\/pre>\n\n\n\n<p>Change:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ETCD_INITIAL_CLUSTER_STATE=\"existing\"<\/code><\/pre>\n\n\n\n<p>To:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ETCD_INITIAL_CLUSTER_STATE=\"new\"<\/code><\/pre>\n\n\n\n<p>Repeat on postgres-02 and postgres-03.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-6-start-etcd-on-all-nodes\">Step 6 \u2014 Start etcd on all nodes<\/h3>\n\n\n\n<p>etcd must be running on all three nodes before Patroni starts, so complete this step on every node before moving on to Step 7.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01, postgres-02, postgres-03\nsudo systemctl start etcd<\/code><\/pre>\n\n\n\n<p>Verify all three members are healthy before starting Patroni:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\nETCDCTL_API=3 etcdctl \\\n  --cacert=\/etc\/etcd\/certs\/ca.crt \\\n  --cert=\/etc\/etcd\/certs\/etcd-node1.crt \\\n  --key=\/etc\/etcd\/certs\/etcd-node1.key \\\n  --endpoints=https:\/\/192.168.0.203:2379,https:\/\/192.168.0.204:2379,https:\/\/192.168.0.205:2379 \\\n  endpoint health\n# Expected: all three endpoints report \"is healthy\"\n# If any node is unhealthy: check journalctl -u etcd on that node before starting Patroni<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"step-7-start-patroni-on-all-nodes\">Step 7 \u2014 Start Patroni on all nodes<\/h3>\n\n\n\n<p>Start Patroni on postgres-01 first. 
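<\/p>\n\n\n\n<p>The subsequent wait for postgres-01 to report Leader can be automated instead of watched by hand. A sketch (the live polling loop is shown commented out so the parsing helper, a hypothetical name, can run without a cluster):<\/p>\n\n\n\n

```shell
#!/bin/sh
# Sketch: block until patronictl reports a Leader before starting replicas.
# On postgres-01 the live loop would be:
#   until sudo patronictl -c /etc/patroni/config.yml list | grep -q '| Leader '; do
#     sleep 5
#   done
has_leader() { grep -q '| Leader '; }

# Demo against a captured patronictl table row:
if printf '| postgresql-01 | 192.168.0.203:5432 | Leader  | running |\n' | has_leader; then
  echo "leader elected - safe to start postgres-02 and postgres-03"
fi
# prints: leader elected - safe to start postgres-02 and postgres-03
```

\n\n\n\n<p>If the loop never exits, check <code>journalctl -u patroni<\/code> on postgres-01 before starting the replicas.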
<\/p>\n\n\n\n<p>Patroni on postgres-01 will bootstrap a new PostgreSQL primary. <\/p>\n\n\n\n<p>Only after postgres-01 is running and shows as Leader should postgres-02 and postgres-03 be started \u2014 they will join as replicas via pg_basebackup.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01 \u2014 start first\nsudo systemctl start patroni<\/code><\/pre>\n\n\n\n<p>Wait until postgres-01 shows as Leader:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\npatronictl -c \/etc\/patroni\/config.yml list\n# Expected: postgresql-01 shows as Leader, state running\n# Wait for this before starting postgres-02 and postgres-03<\/code><\/pre>\n\n\n\n<p>Then start Patroni on the remaining nodes:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-02\nsudo systemctl start patroni<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-03\nsudo systemctl start patroni<\/code><\/pre>\n\n\n\n<p>Final verification:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># On postgres-01\npatronictl -c \/etc\/patroni\/config.yml list\n# Expected:\n# + Cluster: postgresql-cluster ------+---------+---------+----+-----------+\n# | Member        | Host               | Role    | State   | TL | Lag in MB |\n# +---------------+--------------------+---------+---------+----+-----------+\n# | postgresql-01 | 192.168.0.203:5432 | Leader  | running |  1 |           |\n# | postgresql-02 | 192.168.0.204:5432 | Replica | running |  1 |         0 |\n# | postgresql-03 | 192.168.0.205:5432 | Replica | running |  1 |         0 |\n# +---------------+--------------------+---------+---------+----+-----------+\n# TL 1: fresh cluster, timeline resets to 1 on full reset\n# If a node shows \"start failed\": check journalctl -u patroni on that node<\/code><\/pre>\n\n\n\n<p>Continue from Section 9 Step 3 to re-verify the cluster and set <code>ETCD_INITIAL_CLUSTER_STATE=existing<\/code>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 
class=\"wp-block-heading\" id=\"19-common-issues\">19. Common Issues<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Issue<\/th><th>Causa<\/th><th>Fix<\/th><\/tr><\/thead><tbody><tr><td>No leader elected<\/td><td>etcd quorum lost<\/td><td>Restore etcd; check port 2380 between PostgreSQL nodes<\/td><\/tr><tr><td>Patroni cannot connect to etcd<\/td><td>TLS misconfiguration<\/td><td>Verify cacert\/cert\/key paths in config.yml; check setfacl permissions<\/td><\/tr><tr><td>Node stuck in <code>start failed<\/code><\/td><td>PostgreSQL will not start<\/td><td>Check journalctl -u patroni; fix config, then <code>patronictl reinit<\/code><\/td><\/tr><tr><td>Standby not streaming<\/td><td>Wrong credentials or pg_hba<\/td><td>Check primary_conninfo in postgresql.auto.conf; verify replicator in pg_hba<\/td><\/tr><tr><td><code>pg_rewind<\/code> fails after failover<\/td><td><code>wal_log_hints<\/code> not enabled<\/td><td>Enable wal_log_hints = on before cluster init; use <code>patronictl reinit<\/code> as fallback<\/td><\/tr><tr><td>VIP not responding<\/td><td>keepalived not running<\/td><td>Check systemctl status keepalived; check journalctl -u keepalived on all HAProxy nodes<\/td><\/tr><tr><td>HAProxy routing to wrong node<\/td><td>Patroni REST API not reachable<\/td><td>Check port 8008 is open; verify Patroni is running on all PostgreSQL nodes<\/td><\/tr><tr><td>etcd cluster reforms on restart<\/td><td>initial-cluster-state still &#8220;new&#8221;<\/td><td>Change to &#8220;existing&#8221; in etcd.env on all nodes and restart etcd<\/td><\/tr><tr><td>PostgreSQL starts outside Patroni<\/td><td><code>systemctl start postgresql<\/code> run directly<\/td><td>Stop PostgreSQL; restart via <code>patronictl restart<\/code><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"20-key-commands-reference\">20. 
Key Commands Reference<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Command<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td><code>patronictl list<\/code><\/td><td>Show all cluster members, roles, state, and replication lag<\/td><\/tr><tr><td><code>patronictl switchover<\/code><\/td><td>Planned switchover to a specific node \u2014 zero data loss<\/td><\/tr><tr><td><code>patronictl failover --force<\/code><\/td><td>Force failover \u2014 use only when primary is confirmed down<\/td><\/tr><tr><td><code>patronictl pause<\/code><\/td><td>Disable automatic failover \u2014 use during planned maintenance<\/td><\/tr><tr><td><code>patronictl resume<\/code><\/td><td>Re-enable automatic failover<\/td><\/tr><tr><td><code>patronictl restart<\/code><\/td><td>Restart PostgreSQL on a node via Patroni \u2014 never use systemctl directly<\/td><\/tr><tr><td><code>patronictl reload<\/code><\/td><td>Reload postgresql.conf on all cluster nodes<\/td><\/tr><tr><td><code>patronictl reinit<\/code><\/td><td>Wipe and rebuild a standby from the current primary<\/td><\/tr><tr><td><code>patronictl edit-config<\/code><\/td><td>Edit cluster DCS configuration stored in etcd<\/td><\/tr><tr><td><code>curl -k https:\/\/&lt;node&gt;:8008\/primary<\/code><\/td><td>HTTP 200 if this node is currently the primary<\/td><\/tr><tr><td><code>curl -k https:\/\/&lt;node&gt;:8008\/replica<\/code><\/td><td>HTTP 200 if this node is currently a replica<\/td><\/tr><tr><td><code>curl -k https:\/\/&lt;node&gt;:8008\/cluster<\/code><\/td><td>Full cluster status in JSON<\/td><\/tr><tr><td><code>etcdctl endpoint health<\/code><\/td><td>Check health of all etcd cluster members<\/td><\/tr><tr><td><code>ip addr show enp0s3<\/code><\/td><td>Confirm which HAProxy node currently holds the VIP (VirtualBox default interface)<\/td><\/tr><\/tbody><\/table><\/figure>","protected":false},"excerpt":{"rendered":"<p>Step-by-step lab for building a 6-node PostgreSQL HA cluster with Patroni, etcd, 
HAProxy, and keepalived. Covers TLS setup, failover, switchover, monitoring, and day-to-day operations.<\/p>","protected":false},"author":1,"featured_media":6695,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"rank_math_focus_keyword":"","rank_math_title":"","rank_math_description":"","rank_math_robots":"","rank_math_og_title":"","rank_math_og_description":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[126],"tags":[145,144],"class_list":["post-6685","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-postgresql","tag-high-availability","tag-patroni"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/rootfan.com\/wp-content\/uploads\/pexels-photo-36043291-1.jpeg?fit=1880%2C1253&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/posts\/6685","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/comments?post=6685"}],"version-history":[{"count":10,"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/posts\/6685\/revisions"}],"predecessor-version":[{"id":6698,"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/posts\/6685\/revisions\/6698"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/media\/6695"}],"wp:attachment":[{"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/media?parent=6685"}],"wp:term":[{"taxonomy":"category","embeddable":t
rue,"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/categories?post=6685"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rootfan.com\/es\/wp-json\/wp\/v2\/tags?post=6685"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}