Postgresql-Vacuum Function
Vacuum
Whenever rows in a PostgreSQL table are updated or deleted, dead rows are left behind. VACUUM gets rid of them so that the space can be reused. If a table doesn’t get vacuumed, it will get bloated
VACUUM [FULL] [FREEZE] [VERBOSE] [table_name ];
PostgreSQL 13 continues to improve the vacuuming system with the introduction of parallelized vacuum for indexes .In addition to these performance benefits, data inserts can now trigger the autovacuum process.
workers specified with PARALLEL option if any which is further limited by max_parallel_maintenance_workers if the size of the index is more than min_parallel_index_scan_size
AUTOVACUUM settings for PostgreSQL can be found in the postgresql.conf file and control when/how the daemon runs
# AUTOVACUUM PARAMETERS
#-----------------------------------------------------------------------------------------------------------#
autovacuum = on # Enable autovacuum subprocess? 'on' # requires track_counts to also be on.
log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and # their durations, > 0 logs only # actions running at least this number # of milliseconds.
autovacuum_max_workers = 3 # max number of autovacuum subprocesses # (change requires restart)
autovacuum_naptime = 1min # time between autovacuum runs
autovacuum_vacuum_threshold = 50 # min number of row updates before vacuum
autovacuum_analyze_threshold = 50 # min number of row updates before analyze
autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum # (change requires restart)
autovacuum_multixact_freeze_max_age = 400000000
# maximum multixact age # before forced vacuum # (change requires restart)
autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for # autovacuum, in milliseconds; # -1 means use vacuum_cost_delay
autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for # autovacuum, -1 means use
# vacuum_cost_limit
#-----------------------------------------------------------------------------------------------------------#
Autovacuum
Autovacuum automates the start of the VACUUM and the ANALYZE commands. It checks for tables with a large number of inserted, updated, or deleted tuples. After this check, it reclaims storage by removing obsolete data or tuples from the PostgreSQL database.
· clean up “dead tuples” left behind after UPDATE or DELETE operations
· update the free space map that keeps track of free space in table blocks
· update the visibility map that is required for index-only scans
· “freeze” table rows so that the transaction ID counter can safely wrap around
· schedule regular ANALYZE runs to keep the table statistics updated
We recommend that you set rds.force_autovacuum_logging_level to LOG. We also recommend that you set log_autovacuum_min_duration to a value from 1000 to 5000
NOTE: Best practices disable the PostgreSQL autovacuum parameter.
#-----------------------------------------------------------------------------------------------------------#
Tuning autovacuum to run faster
· Increase autovacuum_vacuum_cost_limit from its default value of 200 (this is the gentle method)
· Reduce autovacuum_vacuum_cost_delay from its default value of 2 (in older versions: 20!) milliseconds (this is the effective method)
Setting autovacuum_vacuum_cost_delay to 0 will make autovacuum as fast as a manual VACUUM – that is, as fast as possible.
ALTER TABLE busy_table SET (autovacuum_vacuum_cost_delay = 1);
autovacuum_vacuum_insert_threshold
index-only scans on insert-only tables
autovacuum_vacuum_insert_scale_factor
ALTER TABLE mytable SET (autovacuum_vacuum_insert_scale_factor = 0.005);
toast.autovacuum_vacuum_insert_threshold
#-----------------------------------------------------------------------------------------------------------#
Vacuums Speed Up the Cleaning
VACUUM reclaims storage occupied by dead Tuples.
Tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done.
PostgreSQL does have its very own VACUUM, and this vacuum is an essential piece of PostgreSQL maintenance. Vacuuming is part of PostgreSQL's muliversion concurrency control system and, in short, cleans up rows that are no longer visible, e.g. because they were updated or deleted. Also, use autovacuum.
Standard VACUUM simply reclaims space and makes it available for re-use, as an exclusive lock is not obtained. VACUUM can run in parallel with production database operations.
VACUUM FULL VACUUM FULL can reclaim more disk space but runs much more slowly,
Rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an ACCESS EXCLUSIVE lock on each table while it is being processed, and therefore cannot be done in parallel with other use of the table
Anti-wraparound vacuums
For PostgreSQL versions older than v13, you can achieve a similar effect by triggering anti-wraparound vacuum earlier, so that it becomes less disruptive. For example, if you want to vacuum a table every 100000 transactions, you can set this storage parameter:
ALTER TABLE mytable SET ( autovacuum_freeze_max_age = 100000);
If all tables in your database are insert_only, you can reduce the overhead from autovacuum by setting vacuum_freeze_min_age to 0, so that tuples get frozen right when the table is first vacuumed.
Reducing the likelihood of transaction ID wraparound
Adjust parameters for autovacuum if a table continues to trend toward transaction ID wraparound
A sample monitoring strategy might look like this:
Set the autovacuum_freeze_max_age value to 200 million transactions.
If a table reaches 500 million unvacuumed transactions, that triggers a low-severity alarm. This isn't an unreasonable value, but it can indicate that autovacuum isn't keeping up.
If a table ages to 1 billion, this should be treated as an alarm to take action on. In general, you want to keep ages closer to autovacuum_freeze_max_age for performance reasons. We recommend that you investigate using the recommendations that follow.
If a table reaches 1.5 billion unvacuumed transactions, that triggers a high-severity alarm. Depending on how quickly your database uses transaction IDs, this alarm can indicate that the system is running out of time to run autovacuum. In this case, we recommend that you resolve this immediately.
#-----------------------------------------------------------------------------------------------------------#
Parallel VACUUM parameters in RDS for PostgreSQL
PARALLEL option was introduced in PostgreSQL 13 and is disabled by default.
The following are important parameters for parallel vacuuming in RDS for PostgreSQL and Aurora PostgreSQL:
max_worker_processes – Sets the maximum number of concurrent worker processes
min_parallel_index_scan_size – The minimum amount of index data that must be scanned in order for a parallel scan to be considered
max_parallel_maintenance_workers – The maximum number of parallel workers that can be started by a single utility command
You can add these parameter settings in RDS for PostgreSQL using DB parameter groups modify an existing DB parameter group called rds-pg-13, refer to Modifying parameters in a DB parameter group.
https://aws.amazon.com/blogs/database/parallel-vacuuming-in-amazon-rds-for-postgresql-and-amazon-aurora-postgresql/