WAL important in PostgreSQL
What is WAL?
Write-Ahead Logging (WAL) is a method used in PostgreSQL to log changes to the database before those changes are written to the actual database files. This is crucial for maintaining consistency and durability in the database, as it allows PostgreSQL to recover from a crash by replaying these logs and applying all changes up to the last known consistent state.
Why is WAL important in PostgreSQL?
- Data Durability: WAL ensures that all committed transactions are saved, even if the database crashes immediately after a transaction is committed. 
- Crash Recovery: Using the WAL files, PostgreSQL can restore the database to its last consistent state by replaying transactions recorded in these logs. 
- Replication: WAL files are used in replication setups to propagate changes from a primary server to one or more standby servers. This is essential for ensuring that all servers in a replication setup are consistent with each other. 
- Performance: By writing changes to a log before they are applied, the database can write to disk in a more efficient way, reducing the number of writes and improving overall performance. 
Managing WAL in PostgreSQL
- Configuration: PostgreSQL offers several configuration options to manage WAL, including the size of the log segments (wal_segment_size), the retention policy (wal_keep_segments), and the frequency of writes (wal_writer_delay). 
- Archiving: You can set up WAL archiving to save completed WAL files to a specified location. This is useful for long-term storage and point-in-time recovery. 
- Monitoring: Monitoring WAL generation and consumption is important, especially in high-load environments. Tools like pg_stat_replication and viewing the pg_wal directory can help assess the health of the WAL process. 
Best Practices
- Regular Backups: Even with WAL, regular backups are crucial for data safety. 
- Tuning WAL Settings: Depending on your workload, adjusting WAL settings can help balance between performance and the overhead of disk IO. 
- WAL Compression: Enabling WAL compression can reduce disk space usage and IO overhead, especially in systems with high transaction rates. 
Overall, the management of WAL in PostgreSQL is vital for ensuring the stability and efficiency of your databases
Read replicas for Amazon RDS for PostgreSQL
To turn on logical replication for a newly created RDS for PostgreSQL DB instance – Create a new DB custom parameter group and set the static parameter rds.logical_replication to 1.
Then, associate this DB parameter group with the Source DB instance and its physical read replica.
You can use the following query to verify the values for wal_level and rds.logical_replication on the source DB instance and its physical read replica.
Postgres=>SELECT name,setting FROM pg_settings WHERE name IN ('wal_level','rds.logical_replication');
name | setting
-------------------------+---------
rds.logical_replication | on
wal_level | logical
(2 rows)
Inspect logical replication slot state – You can only see the physical replication slot on your source DB instance.
Postgres=>select slot_name, slot_type, confirmed_flush_lsn from pg_replication_slots;
slot_name | slot_type | confirmed_flush_lsn
---------------------------------------------+-----------+---------------------
rds_us_west_2_db_dhqfsmo5wbbjqrn3m6b6ivdhu4 | physical |
(1 row)
However, on your read replica instance, you can see the logical replication slot and the confirmed_flush_lsn value changes as the application actively consumes logical changes
Postgres=>select slot_name, slot_type, confirmed_flush_lsn from pg_replication_slots;
slot_name | slot_type | confirmed_flush_lsn
-----------+-----------+---------------------
testsub | logical | 0/500002F0
(1 row)
Best Practices
- Regular Backups: Even with WAL, regular backups are crucial for data safety. 
- Tuning WAL Settings: Depending on your workload, adjusting WAL settings can help balance between performance and the overhead of disk IO. 
- WAL Compression: Enabling WAL compression can reduce disk space usage and IO overhead, especially in systems with high transaction rates. 
Read replica limitations with PostgreSQL
The following are limitations for PostgreSQL read replicas:
· PostgreSQL read replicas are read-only. Although a read replica isn't a writeable DB instance, you can promote it to become a standalone RDS for PostgreSQL DB instance. However, the process isn't reversible.
· You can't create a read replica from another read replica if your RDS for PostgreSQL DB instance is running a PostgreSQL version earlier than 14.1. RDS for PostgreSQL supports cascading read replicas on RDS for PostgreSQL version 14.1 and higher releases only.
· If you promote a PostgreSQL read replica, it becomes a writable DB instance. It stops receiving write-ahead log (WAL) files from a source DB instance, and it's no longer a read-only instance. You can create new read replicas from the promoted DB instance as you do for any RDS for PostgreSQL DB instance.
Understanding the parameters that control PostgreSQL replication
The following parameters affect the replication process and determine how well read replicas stay up to date with the source DB instance:
max_wal_senders
The max_wal_senders parameter specifies the maximum number of connections that the source DB instance can support at the same time over the streaming replication protocol. The default for RDS for PostgreSQL 13 and higher releases is 20
wal_keep_segments
The wal_keep_segments parameter specifies the number of write-ahead log (WAL) files that the source DB instance keeps in the pg_wal directory. The default setting is 32.
If wal_keep_segments isn't set to a large enough value for your deployment, a read replica can fall so far behind that streaming replication stops.
max_slot_wal_keep_size
The max_slot_wal_keep_size parameter controls the quantity of WAL data that the RDS for PostgreSQL DB instance retains in the pg_wal directory to serve slots. This parameter is used for configurations that use replication slots. The default value for this parameter is -1, meaning that there's no limit to how much WAL data is kept on the source DB instance.
Setting the parameters that control shared memory
The parameters affected are:
· max_connections
· max_worker_processes
· max_wal_senders
· max_prepared_transactions
· max_locks_per_transaction
To avoid RDS reboots of replicas due to insufficient memory, we recommend applying the parameter changes as a rolling reboot to each replica. You must apply the following rules, when you set the parameters:
Check the amount of ReplicaLag between source DB instance and replicas. Replica lag is the amount of time, in seconds, that a read replica lags behind its source DB instance. This metric reports the result of the following query.
SELECT extract(epoch from now() - pg_last_xact_replay_timestamp()) AS "ReplicaLag";
You can view the replication slots on your RDS for PostgreSQL DB instances by querying the pg_replication_slots view, as follows.
postgres=> SELECT * FROM pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
---------------------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+----------The wal_status of reserved value means that the amount of WAL data held by the slot is within the bounds of the max_wal_size parameter. In other words, the replication slot is properly sized. Other possible status values are as follows:
· extended – The slot exceeds the max_wal_size setting, but the WAL data is retained.
· unreserved – The slot no longer has the all required WAL data. Some of it will be removed at the next checkpoint.
· lost – Some required WAL data has been removed. The slot is no longer usable.
The unreserved and lost states of the wal_status are seen only when max_slot_wal_keep_size is non-negative.
rds_us_west_1_db_555555555 | | physical | | | f | t | 13194 | | | 23/D8000060 | | reserved | | f
(1 row)
Terminate the query that causes the read replica lag
Run a query on the primary instance similar to the following to find the process ID (PID) of the query that's running for a long time:
SELECT datname, pid,usename, client_addr, backend_start,xact_start, current_timestamp - xact_start AS xact_runtime, state,backend_xmin FROM pg_stat_activity WHERE state='active';
SELECT now() - state_change as idle_in_transaction_duration, now() - xact_start as xact_duration,* FROM pg_stat_activity WHERE state = 'idle in transaction' AND xact_start is not null ORDER BY 1 DESC;
After identifying the PID of the query, you can choose to end the query.
Run a query on the primary instance similar to the following to terminate the query that's running for a long time:
SELECT pg_terminate_backend(PID);