PostgreSQL Admin
What is multi-version concurrency control?
Multi-version concurrency control (MVCC) lets PostgreSQL avoid unnecessary locking of the database: readers do not block writers and writers do not block readers. Instead of overwriting rows in place, each transaction works against a snapshot of the data, and updated or deleted rows are kept as separate row versions. This removes the waiting that would otherwise occur when one session tries to read content that another session is modifying.
PostgreSQL stores transaction IDs in the xmin and xmax system columns to determine which row version is visible to which query.
Two meanings of xmax
· It stores the transaction ID (“xid”) of the transaction that deleted the tuple.
· It stores row locks on the tuple.
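These system columns can be inspected directly. A minimal sketch, assuming a hypothetical table `accounts` exists:

```sql
-- Show the inserting (xmin) and deleting/locking (xmax) transaction IDs
-- alongside each visible row version. xmax = 0 means the row is live
-- and not deleted or locked by any transaction.
SELECT xmin, xmax, * FROM accounts;

-- Compare with the current transaction's ID:
SELECT txid_current();
```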
#-----------------------------------------------------------------------------------------------------------#
What is Write-Ahead Logging (WAL)?
Write-ahead logging (WAL) is the standard method PostgreSQL uses to ensure data integrity. The rule is simple: changes must be written to the transaction log, and the log flushed to disk, before the changes are applied to the data files. Logging changes first increases the reliability of the database: if the server crashes, the log can be replayed on restart, so recovery resumes from the point where work was interrupted.
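The current WAL position can be inspected from SQL. A small sketch using built-in functions (PostgreSQL 10+ names):

```sql
-- Current write-ahead log write position (LSN)
SELECT pg_current_wal_lsn();

-- Bytes of WAL generated since a reference position
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0');
```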
There are a number of database administration tools for PostgreSQL, including:
phpPgAdmin
psql
pgAdmin
psql is the standard command-line client that ships with PostgreSQL; pgAdmin and phpPgAdmin are front-end administration tools with graphical or web-based interfaces.
#---------------------------------------------------------------------------------
Backup and Restore Databases
This section shows you how to use various PostgreSQL backup and restore tools, including pg_dump, pg_dumpall, psql, pg_restore, and pgAdmin, to back up and restore databases.
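A minimal sketch of a typical backup/restore round trip, assuming a database named `mydb` and a user `postgres` (both hypothetical here):

```shell
# Dump one database in the custom format (compressed, restorable with pg_restore)
pg_dump -U postgres -Fc mydb > mydb.dump

# Dump all databases plus global objects (roles, tablespaces) as plain SQL
pg_dumpall -U postgres > all.sql

# Restore the custom-format dump into a freshly created database
createdb -U postgres mydb_restored
pg_restore -U postgres -d mydb_restored mydb.dump
```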
Tablespace Management
PostgreSQL tablespaces allow you to control how data is stored in the file system. Tablespaces are useful in many cases, such as managing large tables and improving database performance.
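Creating and using a tablespace can be sketched as follows; the directory and table names are hypothetical, and the directory must already exist and be owned by the postgres OS user:

```sql
-- Map a tablespace onto a directory (e.g. a faster disk)
CREATE TABLESPACE fastspace LOCATION '/ssd1/postgresql/data';

-- Place a new table on it
CREATE TABLE sensor_events (id bigint, payload text) TABLESPACE fastspace;

-- Or move an existing table back to the default tablespace
ALTER TABLE sensor_events SET TABLESPACE pg_default;
```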
#-----------------------------------------------------------------------------------------------------------#
Bloat
Database bloat is disk space that was used by a table or index and is available for reuse by the database but has not been reclaimed. Bloat is created when updating tables or indexes.
It is very important to run VACUUM on individual tables after large UPDATE and DELETE operations.
Detecting Bloat
In Greenplum, the gp_toolkit schema provides a ready-made diagnostic view:
SELECT * FROM gp_toolkit.gp_bloat_diag;
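On plain PostgreSQL (where gp_toolkit does not exist), dead-tuple counts from the statistics views give a rough bloat signal:

```sql
-- Tables with the most dead (reusable but unreclaimed) tuples
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```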
Removing Bloat from Database Tables
The VACUUM command adds expired rows to the free space map so that the space can be reused.
It is better to run VACUUM before running ANALYZE: analyzing a severely bloated table can generate poor statistics if the sample contains empty pages, so it is good practice to vacuum a bloated table before analyzing it.
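For a hypothetical bloated table `orders`, the sequence above looks like:

```sql
-- Reclaim expired row versions into the free space map
VACUUM orders;

-- Then refresh planner statistics on the now-compacted data
ANALYZE orders;

-- Or both in one step
VACUUM ANALYZE orders;
```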
#-----------------------------------------------------------------------------------------------------------#
Partitioning Improvements in PostgreSQL 13
Tables can be range-partitioned, list-partitioned, or hash-partitioned.
· List partitioning is limited to a single column.
· Range partitioning can involve multiple columns, and a partitioning "column" can be an expression.
Here are some key partitioning features that made it into PostgreSQL over the years…
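A minimal declarative range-partitioning sketch (table and partition names are hypothetical):

```sql
CREATE TABLE measurements (
    city_id  int  NOT NULL,
    logdate  date NOT NULL,
    reading  numeric
) PARTITION BY RANGE (logdate);

-- One partition per quarter; bounds are inclusive FROM, exclusive TO
CREATE TABLE measurements_2023q1 PARTITION OF measurements
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');

CREATE TABLE measurements_2023q2 PARTITION OF measurements
    FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');
```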
Partitioning
With very big tables, it can be advisable to use partitioning. The advantage here is that several autovacuum workers can process several partitions in parallel, so the partitioned table as a whole is vacuumed faster than a single autovacuum worker could manage.
If you have many partitions, you should increase autovacuum_max_workers, the maximum number of autovacuum workers.
Memory
maintenance_work_mem: Maximum memory for maintenance operations. Larger settings can improve performance for vacuuming and for restoring database dumps.
autovacuum_work_mem: Maximum amount of memory to be used by each autovacuum worker process. It defaults to -1, which means autovacuum workers fall back to maintenance_work_mem.
One of these two parameters therefore sets the maximum amount of memory each autovacuum session can use.
work_mem: Memory to be used by internal sort operations and hash tables before writing to temporary disk files.
The case of long-running autovacuum sessions
Sometimes autovacuum sessions run for a very long time on the same table. This situation usually indicates that maintenance_work_mem (or autovacuum_work_mem) is set too small for the size of the larger tables, forcing vacuum to make multiple passes over the indexes.
Tuning
Example starting values (tune for your own hardware; the low page costs assume fast SSD storage):
seq_page_cost = 0.1
random_page_cost = 0.1
cpu_tuple_cost = 0.03
shared_buffers = 25% of memory
work_mem = 16MB
maintenance_work_mem = 128MB
For reference: https://www.elephantsql.com/blog/what-is-work_mem-in-postgresql.html
Execution Plan
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
ANALYZE: executes the statement and shows execution details
· VERBOSE: verbose output
· COSTS: show plan costs
· BUFFERS: show information about buffers touched by the query
· TIMING: show time spent
· SUMMARY: show totals at the end of the output
· FORMAT { TEXT | XML | JSON | YAML }: output in the selected format
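For example, on a hypothetical `orders` table:

```sql
-- Run the statement and report actual times and buffer usage as JSON
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 42;
```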
Scan types
· Seq Scan: sequential scan of the whole relation
· Parallel Seq Scan: parallel sequential scan of the whole relation
· Index Scan: targeted random I/O (read index + read table)
· Index Only Scan: read only from the index
· Bitmap Index Scan: prepare a map of rows to read from the relation, possibly combining maps from several indexes
· Bitmap Heap Scan: use the map from a Bitmap Index Scan and read rows from the relation; always follows a Bitmap Index Scan
· CTE Scan: read from a Common Table Expression (WITH block)
· Function Scan: read results returned by a function
Operations on nodes
· join: joins data from two nodes using the appropriate join method
· sort: various methods of sorting
· limit: cuts the dataset off
· aggregate: performs aggregation
· hash aggregate: groups data
· unique: removes duplicates from sorted datasets
· gather: gathers data from parallel workers
PgBouncer
PgBouncer is an open-source, lightweight, single-binary connection pooler for PostgreSQL. It can pool connections to one or more databases (on possibly different servers) and serve clients over TCP and Unix domain sockets.
PgBouncer supports three pooling modes when rotating connections:
Session pooling: A server connection is assigned to the client application for the life of the client connection. PgBouncer releases the server connection back into the pool once the client application disconnects. This is the default method.
Transaction pooling: A server connection is assigned to the client application for the duration of a transaction. When PgBouncer detects the completion of the transaction, it releases the server connection back into the pool.
Statement pooling: A server connection is assigned to the client application for each statement. When the statement completes, the server connection is returned into the pool. Multi-statement transactions are not permitted for this mode.
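A minimal pgbouncer.ini sketch selecting transaction pooling (the database name, host, and port are placeholders):

```ini
[databases]
; clients connecting to "mydb" are pooled onto this backend
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; session | transaction | statement
default_pool_size = 20
```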
Loading data into a PostgreSQL DB instance
When loading data into an Amazon RDS PostgreSQL DB instance, you should modify your DB instance settings and your DB parameter group values to allow for the most efficient importing of data into your DB instance.
Modify your DB instance settings to the following:
Disable DB instance backups (set backup_retention to 0)
Disable Multi-AZ
Modify your DB parameter group to include the following settings
Increase the value of the maintenance_work_mem parameter.
Increase the value of the max_wal_size and checkpoint_timeout parameters to reduce the number of writes to the WAL.
Disable the synchronous_commit parameter.
Disable the PostgreSQL autovacuum parameter.
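The parameter-group changes above amount to settings like the following (illustrative values only; restore your normal settings after the load completes):

```ini
maintenance_work_mem = 1GB   # speed up index builds during restore
max_wal_size = 8GB           # fewer forced checkpoints
checkpoint_timeout = 30min   # fewer timed checkpoints
synchronous_commit = off     # do not wait for WAL flush on each commit
autovacuum = off             # re-enable after the load, then VACUUM ANALYZE
```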
What can Cause Database Performance to Degrade?
· Badly written queries with bad joins, logic etc. that take a lot of CPU and memory
· Queries performing full-table-scans on big tables due to improper Indexing
· Bad database maintenance with no proper statistics in place
· Inefficient capacity planning resulting in inadequately dimensioned infrastructure
· Improper logical and physical design
· No connection pooling in place, which causes applications to open a huge number of connections in an uncontrolled manner
Tuning PostgreSQL I/O
· Indexing
· Partitioning
· Checkpoints
· VACUUM, ANALYZE (with FILLFACTOR)
· Other I/O problems
· PostgreSQL I/O on Cloud
· Tools
There are parameters in postgresql.conf that can be configured / tuned to control checkpoint behavior: max_wal_size, min_wal_size, checkpoint_timeout and checkpoint_completion_target. These parameters decide how frequently checkpoints occur and over how much time each checkpoint's writes are spread.
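A typical checkpoint-smoothing sketch for postgresql.conf (illustrative values):

```ini
max_wal_size = 4GB                   # checkpoint forced when this much WAL accumulates
min_wal_size = 1GB                   # recycle WAL files down to this size
checkpoint_timeout = 15min           # at most this long between timed checkpoints
checkpoint_completion_target = 0.9   # spread checkpoint writes over 90% of the interval
```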
------------------------------------------------------------------------------------------------------------------------------------
postgresql.conf
> psql -U postgres
=# show hba_file;
=# show config_file;
https://stackoverflow.com/questions/3602450/where-does-postgresql-store-configuration-conf-files
------------------------------------------------------------------------------------------------------------------------------------