Postgresql-Vacuum Function

Vacuum


Whenever rows in a PostgreSQL table are updated or deleted, dead rows are left behind. VACUUM gets rid of them so that the space can be reused. If a table doesn’t get vacuumed, it will get bloated 

VACUUM [FULL] [FREEZE] [VERBOSE] [table_name ];

 PostgreSQL 13 continues to improve the vacuuming system with the introduction of parallelized vacuum for indexes .In addition to these performance benefits, data inserts can now trigger the autovacuum process. 

workers specified with PARALLEL option if any which is further limited by max_parallel_maintenance_workers if the size of the index is more than min_parallel_index_scan_size 

AUTOVACUUM settings for PostgreSQL can be found in the postgresql.conf file and control when/how the daemon runs 

# AUTOVACUUM PARAMETERS

#-----------------------------------------------------------------------------------------------------------#

autovacuum = on                         # Enable autovacuum subprocess?  'on'                                                                          # requires track_counts to also be on.

log_autovacuum_min_duration = -1        # -1 disables, 0 logs all actions and                                                                          # their durations, > 0 logs only                                                                              # actions running at least this number # of milliseconds.

autovacuum_max_workers = 3              # max number of autovacuum subprocesses                                                                        # (change requires restart)

autovacuum_naptime = 1min               # time between autovacuum runs

autovacuum_vacuum_threshold = 50        # min number of row updates before vacuum

autovacuum_analyze_threshold = 50       # min number of row updates before analyze

autovacuum_vacuum_scale_factor = 0.2    # fraction of table size before vacuum

autovacuum_analyze_scale_factor = 0.1   # fraction of table size before analyze

autovacuum_freeze_max_age = 200000000   # maximum XID age before forced vacuum                                                                         # (change requires restart)

autovacuum_multixact_freeze_max_age = 400000000 

                                        # maximum multixact age # before forced vacuum                                                                 # (change requires restart)

autovacuum_vacuum_cost_delay = 20ms     # default vacuum cost delay for                                                                               # autovacuum, in milliseconds;    # -1 means use vacuum_cost_delay

autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for # autovacuum, -1 means use

                                       # vacuum_cost_limit

#-----------------------------------------------------------------------------------------------------------#

Autovacuum

Autovacuum automates the start of the VACUUM and the ANALYZE commands. It checks for tables with a large number of inserted, updated, or deleted tuples. After this check, it reclaims storage by removing obsolete data or tuples from the PostgreSQL database.

 

·        clean up “dead tuples” left behind after UPDATE or DELETE operations

·        update the free space map that keeps track of free space in table blocks

·        update the visibility map that is required for index-only scans

·        “freeze” table rows so that the transaction ID counter can safely wrap around

·        schedule regular ANALYZE runs to keep the table statistics updated

 

We recommend that you set rds.force_autovacuum_logging_level to LOG. We also recommend that you set log_autovacuum_min_duration to a value from 1000 to 5000

 NOTE: Best practices disable the PostgreSQL autovacuum parameter.

 #-----------------------------------------------------------------------------------------------------------#


 Tuning autovacuum to run faster

·  Increase autovacuum_vacuum_cost_limit from its default value of 200 (this is the gentle method)

·  Reduce autovacuum_vacuum_cost_delay from its default value of 2 (in older versions: 20!) milliseconds (this is the effective method)

         Setting autovacuum_vacuum_cost_delay to 0 will make autovacuum as fast as a manual VACUUM – that is, as fast as possible.

              ALTER TABLE busy_table SET (autovacuum_vacuum_cost_delay = 1);

 

autovacuum_vacuum_insert_threshold

 

index-only scans on insert-only tables

autovacuum_vacuum_insert_scale_factor

ALTER TABLE mytable SET (autovacuum_vacuum_insert_scale_factor = 0.005);

 toast.autovacuum_vacuum_insert_threshold


#-----------------------------------------------------------------------------------------------------------#

Vacuums Speed Up the Cleaning

VACUUM reclaims storage occupied by dead Tuples.

Tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done.

 

PostgreSQL does have its very own VACUUM, and this vacuum is an essential piece of PostgreSQL maintenance. Vacuuming is part of PostgreSQL's muliversion concurrency control system and, in short, cleans up rows that are no longer visible, e.g. because they were updated or deleted. Also, use autovacuum.

 

Standard VACUUM  simply reclaims space and makes it available for re-use, as an exclusive lock is not obtained. VACUUM can run in parallel with production database operations.

 

VACUUM FULL VACUUM FULL can reclaim more disk space but runs much more slowly,

Rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an ACCESS EXCLUSIVE lock on each table while it is being processed, and therefore cannot be done in parallel with other use of the table 

Anti-wraparound vacuums

For PostgreSQL versions older than v13, you can achieve a similar effect by triggering anti-wraparound vacuum earlier, so that it becomes less disruptive. For example, if you want to vacuum a table every 100000 transactions, you can set this storage parameter:

        ALTER TABLE mytable SET (  autovacuum_freeze_max_age = 100000);

If all tables in your database are insert_only, you can reduce the overhead from autovacuum by setting vacuum_freeze_min_age to 0, so that tuples get frozen right when the table is first vacuumed.

Reducing the likelihood of transaction ID wraparound

Adjust parameters for autovacuum if a table continues to trend toward transaction ID wraparound 

A sample monitoring strategy might look like this: 


#-----------------------------------------------------------------------------------------------------------#

Parallel VACUUM parameters in RDS for PostgreSQL

 PARALLEL option was introduced in PostgreSQL 13 and is disabled by default.  

The following are important parameters for parallel vacuuming in RDS for PostgreSQL and Aurora PostgreSQL:

You can add these parameter settings in RDS for PostgreSQL using DB parameter groups  modify an existing DB parameter group called rds-pg-13, refer to Modifying parameters in a DB parameter group.

https://aws.amazon.com/blogs/database/parallel-vacuuming-in-amazon-rds-for-postgresql-and-amazon-aurora-postgresql/