PostgreSQL High Availability Setup

This article covers the installation and basic administration of PostgreSQL 17.2 in a Pacemaker cluster on two VirtualBox VMs.

Preparation of the PostgreSQL High Availability setup

To create a PostgreSQL High Availability (HA) system we need a basic Pacemaker cluster consisting of two Red Hat Enterprise Linux 8.10 nodes; the steps to set up these two machines are described in a separate article. On both nodes we install PostgreSQL as described in “Installation and setup of the PostgreSQL 17 server”.
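
If PostgreSQL is not installed yet, the PGDG repository setup on RHEL 8 boils down to roughly the following (a condensed sketch of what that article covers; adjust to your environment):

# as root on both nodes
dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
dnf -qy module disable postgresql
dnf install -y postgresql17-server
/usr/pgsql-17/bin/postgresql-17-setup initdb
systemctl enable --now postgresql-17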

Database configuration

The next step is the configuration of the databases for the PostgreSQL High Availability setup. Run the following commands:

# as root on both nodes
# set password of postgres database user
su - postgres -c "psql -c \"alter user postgres password 'changeme'\";"
# disable automatic startup of postgres
systemctl disable postgresql-17
# stop running database
systemctl stop postgresql-17
# create the archive directory
su - postgres -c 'mkdir /var/lib/pgsql/pg_archive'
# create ssh keys for root and the postgres user
echo ''|ssh-keygen -q -t rsa -N ''
su - postgres -c "echo ''|ssh-keygen -q -t rsa -N ''"

This needs to be done only on the first node:

# as root on the first node
# configure passwordless ssh login for root and postgres
# for root
ssh-keyscan -t rsa lin2 >> ~/.ssh/known_hosts
sshpass -p changeme ssh-copy-id lin2
ssh lin2 'ssh-keyscan -t rsa lin1 >> ~/.ssh/known_hosts'
ssh lin2 sshpass -p changeme ssh-copy-id lin1
# for postgres
su - postgres -c 'ssh-keyscan -t rsa lin2 >> ~postgres/.ssh/known_hosts'
su - postgres -c 'sshpass -p changeme ssh-copy-id lin2'
su - postgres -c "ssh lin2 'ssh-keyscan -t rsa lin1 >> ~postgres/.ssh/known_hosts'"
su - postgres -c 'ssh lin2 sshpass -p changeme ssh-copy-id lin1'
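
# optional: verify that the passwordless logins work, e.g.
ssh lin2 hostname
su - postgres -c 'ssh lin2 hostname'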

# configure database
cat << EOF >> /var/lib/pgsql/17/data/postgresql.conf
# replication settings
listen_addresses = '*'
wal_level = replica
synchronous_commit = on
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/pg_archive/%f'
max_wal_senders = 5
hot_standby = on
restart_after_crash = off
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
hot_standby_feedback = on
EOF

cat << EOF >> /var/lib/pgsql/17/data/pg_hba.conf
# Allow connections from all other IPv4 hosts
host    all             all             0.0.0.0/0               scram-sha-256
# replication
host    replication     postgres        192.168.0.0/16          trust
host    replication     postgres        11.1.1.0/24             trust
EOF

# start postgres
su - postgres -c 'pg_ctl -D /var/lib/pgsql/17/data start'
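
Once the database is up, the appended pg_hba.conf rules can be validated from inside PostgreSQL via the pg_hba_file_rules view; rows with a non-empty error column point to syntax problems:

# as root on the first node
su - postgres -c "psql -c 'select line_number, type, database, user_name, address, auth_method, error from pg_hba_file_rules;'"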

Next we build the standby (slave) database from the master. On the second node:

# as root on the second node
rm -rf /var/lib/pgsql/17/data/*
su - postgres -c 'pg_basebackup -h 192.168.0.1 -U postgres -D /var/lib/pgsql/17/data -X stream -P'
cat << EOF >> /var/lib/pgsql/17/data/postgresql.conf
primary_conninfo = 'host=192.168.0.1 port=5432 user=postgres application_name=node2'
restore_command = 'cp /var/lib/pgsql/pg_archive/%f %p'
recovery_target_timeline = 'latest'
EOF
su - postgres -c 'touch /var/lib/pgsql/17/data/standby.signal;pg_ctl -D /var/lib/pgsql/17/data start'
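
To verify that the second node really runs as a standby, pg_is_in_recovery() must return true there:

# as root on the second node
su - postgres -c "psql -c 'select pg_is_in_recovery();'"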

We can confirm that the replication is running:

# on node 1:
su - postgres -c "psql -c 'select application_name,client_addr,backend_start,backend_xmin,state,sent_lsn,write_lsn,flush_lsn,replay_lsn,write_lag,flush_lag,replay_lag,sync_state,reply_time from pg_stat_replication;'"
Output:
 application_name | client_addr |        backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_state |          reply_time
------------------+-------------+------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+------------+-------------------------------
 node2            | 192.168.0.2 | 2025-02-09 11:40:48.85456+01 |          752 | streaming | 0/C000168 | 0/C000168 | 0/C000168 | 0/C000168  |           |           |            | async      | 2025-02-09 11:50:28.294068+01
(1 row)
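
The standby side offers the matching view pg_stat_wal_receiver, for example:

# on node 2:
su - postgres -c "psql -c 'select status, sender_host, written_lsn, flushed_lsn from pg_stat_wal_receiver;'"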

The next step is the creation of the pacemaker cluster resources. For that we need to stop the databases on both nodes:

# run as root on both nodes:
su - postgres -c 'pg_ctl -D /var/lib/pgsql/17/data stop'

Creation of the cluster resources

To create the cluster resources run the commands below on the first node. You might want to adjust the IP address 11.1.1.184 (it is just a free IP address in my network that will be brought up on the master node).

# as root on the first node:
pcs resource create pgvip ocf:heartbeat:IPaddr2 ip=11.1.1.184 nic=enp0s3 cidr_netmask=24

pcs resource create postgresql ocf:heartbeat:pgsql \
rep_mode=sync \
primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
node_list="lin1 lin2" \
restore_command='cp /var/lib/pgsql/pg_archive/%f "%p"' \
master_ip=11.1.1.184 \
restart_on_promote=true \
pgdata=/var/lib/pgsql/17/data \
pgctl=/usr/pgsql-17/bin/pg_ctl

pcs resource promotable postgresql master-max=1 master-node-max=1 notify=true

pcs constraint colocation add pgvip with Promoted postgresql-clone INFINITY
pcs constraint order promote postgresql-clone then start pgvip symmetrical=false score=INFINITY
pcs constraint order demote  postgresql-clone then stop  pgvip symmetrical=false score=0
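
To double-check that the constraints were created as intended:

# as root on any node
pcs constraint config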

After both nodes have been rebooted, the cluster is ready. This can be verified with:

pcs status --full
Output:

Cluster name: clu
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: lin2 (2) (version 2.1.7-5.2.el8_10-0f7f88312) - partition with quorum
  * Last updated: Fri Feb 7 19:44:54 2025 on lin1
  * Last change: Fri Feb 7 19:42:33 2025 by root via root on lin1
  * 2 nodes configured
  * 4 resource instances configured

Node List:
  * Node lin1 (1): online, feature set 3.19.0
  * Node lin2 (2): online, feature set 3.19.0

Full List of Resources:
  * sbd_fencing (stonith:fence_sbd): Started lin1
  * pgvip (ocf::heartbeat:IPaddr2): Started lin1
  * Clone Set: postgresql-clone [postgresql] (promotable):
    * postgresql (ocf::heartbeat:pgsql): Master lin1
    * postgresql (ocf::heartbeat:pgsql): Slave lin2

Node Attributes:
  * Node: lin1 (1):
    * master-postgresql          : 1000
    * postgresql-data-status     : LATEST
    * postgresql-master-baseline : 00000000040000A0
    * postgresql-status          : PRI
  * Node: lin2 (2):
    * master-postgresql          : 100
    * postgresql-data-status     : STREAMING|SYNC
    * postgresql-status          : HS:sync

Migration Summary:

Tickets:

PCSD Status:
  lin1: Online
  lin2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
  sbd: active/enabled

Terms used:

postgresql-data-status: LATEST
    The node has the latest data; PostgreSQL is the master.
postgresql-data-status: DISCONNECT
    The master can't see the connection of the hot standby (HS), so the state of the HS is set to DISCONNECT.
postgresql-data-status: STREAMING|ASYNC
    The HS streams asynchronously: write transactions on the primary don't wait for confirmation from the HS. The HS can be used for read-only queries.
postgresql-data-status: STREAMING|SYNC
    The HS streams synchronously: write transactions on the primary wait for the HS to confirm that the data has been written to the WAL on the HS. The HS can be used for read-only queries.
postgresql-status: PRI
    PostgreSQL works as the primary (master).
postgresql-status: STOP
    PostgreSQL is stopped.
postgresql-status: HS:alone
    PostgreSQL works as a Hot Standby without a connection to the primary.
postgresql-status: HS:async / HS:sync
    PostgreSQL works as an asynchronous / synchronous Hot Standby.
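
The current values of these attributes can be listed at any time without the full status output:

# show the node attributes maintained by the pgsql resource agent
pcs node attribute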

For more information see the documentation of the pgsql resource agent.

Administration and Troubleshooting

Rebuild the standby database

If there is a failure on the standby database that cannot be resolved otherwise, we can always rebuild the standby database from the current master:

# run as root on the standby db that should be rebuilt
# check if a lock file exists on the standby db and remove it.
rm -fv /var/lib/pgsql/tmp/PGSQL.lock
# this will temporarily remove the node from the cluster and stop postgres
pcs node standby lin2
sleep 2
# remove old data
rm -rf /var/lib/pgsql/17/data/*
# restore data from the current master (lin1/192.168.0.1)
su - postgres -c 'pg_basebackup -h 192.168.0.1 -U postgres -D /var/lib/pgsql/17/data -X stream -P'
# let the standby re-join the cluster and start replicating
pcs node unstandby lin2
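
Afterwards we can confirm on the master that the rebuilt standby is streaming again (sync_state should return to sync once it has caught up):

# on the master:
su - postgres -c "psql -c 'select application_name, state, sync_state from pg_stat_replication;'"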

requested WAL segment has already been removed / waiting for WAL to become available

Error messages in the postgres logfile on a standby database like these

2025-02-09 11:12:52.868 CET [11870] LOG:  started streaming WAL from primary at 0/9000000 on timeline 1
2025-02-09 11:12:52.868 CET [11870] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000009 has already been removed
cp: cannot stat '/var/lib/pgsql/pg_archive/00000002.history': No such file or directory
2025-02-09 11:12:52.871 CET [4217] LOG:  waiting for WAL to become available at 0/9000168
cp: cannot stat '/var/lib/pgsql/pg_archive/000000010000000000000009': No such file or directory

happen, for example, when the primary has already archived a WAL file that is now needed by the standby. To solve it, copy the archived WAL file from the master to the standby:

# on the master:
su - postgres -c 'scp /var/lib/pgsql/pg_archive/000000010000000000000009 lin2:/var/lib/pgsql/pg_archive/'
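
To make this less likely, the primary can keep more WAL around; since PostgreSQL 13 this is controlled by wal_keep_size (a sketch; the 1GB value is an assumption, adjust it to your write volume and disk space):

# on the master: retain at least 1GB of WAL in pg_wal
su - postgres -c "psql -c \"alter system set wal_keep_size = '1GB';\""
su - postgres -c "psql -c 'select pg_reload_conf();'"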

Web UI for cluster monitoring / administration

The pcsd Web UI can be used to monitor and administer the cluster. It is reachable at https://lin1:2224 (any cluster node works). Log in with hacluster / changeme and click on Add Existing to add one of the nodes (e.g. lin1).

Further info

# location of the pgsql resource agent
/usr/lib/ocf/resource.d/heartbeat/pgsql
# get metadata of the resource agent (e.g. version)
export OCF_ROOT=/usr/lib/ocf
$OCF_ROOT/resource.d/heartbeat/pgsql meta-data
# clean the errors of a failed resource
pcs resource cleanup postgresql
# show resource config
pcs resource config postgresql
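# list all parameters supported by the pgsql resource agent
pcs resource describe ocf:heartbeat:pgsql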

Sample output of some commands:

[root@lin1 ~]# pcs resource config postgresql
Resource: postgresql (class=ocf provider=heartbeat type=pgsql)
  Attributes: postgresql-instance_attributes
    master_ip=11.1.1.184
    node_list="lin1 lin2"
    pgctl=/usr/pgsql-17/bin/pg_ctl
    pgdata=/var/lib/pgsql/17/data
    primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5"
    rep_mode=sync
    restart_on_promote=true
    restore_command='cp /var/lib/pgsql/pg_archive/%f "%p"'
  Operations:
    demote: postgresql-demote-interval-0s
      interval=0s
      timeout=120s
    methods: postgresql-methods-interval-0s
      interval=0s
      timeout=5s
    monitor: postgresql-monitor-interval-30s
      interval=30s
      timeout=30s
    monitor: postgresql-monitor-interval-29s
      interval=29s
      timeout=30s
      role=Master
    notify: postgresql-notify-interval-0s
      interval=0s
      timeout=90s
    promote: postgresql-promote-interval-0s
      interval=0s
      timeout=120s
    start: postgresql-start-interval-0s
      interval=0s
      timeout=120s
    stop: postgresql-stop-interval-0s
      interval=0s
      timeout=120s
[root@lin1 ~]#