GSoC 2017

With

PostgreSQL

Project Title

Explicitly support predicate locks in index access methods besides B-tree.

Author: Shubham Barai

Project Description

Serializable Isolation Implementation Strategies

Techniques for implementing full serializable isolation have been published and in use in many database products for decades. The primary technique which has been used is Strict Two-Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions. A cycle in a graph of blocking indicates a deadlock, requiring a rollback. Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.

A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008. This technique, known as Serializable Snapshot Isolation (SSI), has many of the advantages of snapshot isolation. In particular, reads don't block anything and writes don't block reads. Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur. It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur. In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.
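The "dangerous structure" idea can be illustrated with a toy model. The sketch below is a hypothetical simplification (the class and function names are invented for illustration): it models only the rule that a transaction with both an inbound and an outbound rw-antidependency is a candidate for rollback, using the classic write-skew anomaly as the driving example.

```python
# Toy model of SSI's dangerous-structure check (simplified illustration,
# not PostgreSQL code). A transaction that has both an in-conflict and an
# out-conflict is a "pivot" and may be rolled back.

class Txn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False   # some concurrent txn read data this txn later wrote
        self.out_conflict = False  # this txn read data some concurrent txn later wrote

def record_rw_conflict(reader, writer):
    """reader read a version that writer subsequently overwrote (rw-antidependency)."""
    reader.out_conflict = True
    writer.in_conflict = True

def must_abort(txn):
    # Both an in- and an out-conflict: possible serialization anomaly.
    return txn.in_conflict and txn.out_conflict

# Classic write skew: T1 reads x then writes y; T2 reads y then writes x.
t1, t2 = Txn("T1"), Txn("T2")
record_rw_conflict(t1, t2)  # T1 read x, T2 wrote x
record_rw_conflict(t2, t1)  # T2 read y, T1 wrote y
print(must_abort(t1), must_abort(t2))  # True True
```

Note that this check can fire even when the execution would have been serializable, which is exactly the source of the false positives mentioned above.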

Predicate Locking

Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with later inserts or with later updates which move data into the selected range. PostgreSQL didn't already have predicate locking, so it needed to be added to support full serializable transactions under either strategy. Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularity (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures. This approach was used in PostgreSQL. Coarse granularity can cause some false positive indications of conflict. The number of false positives can be influenced by plan choice.
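The multi-granularity approach with escalation can be sketched as follows. This is a hypothetical simplification (the class, threshold, and method names are invented): it tracks tuple-level locks per page and promotes them to a single page lock once a transaction holds too many on one page.

```python
# Hypothetical sketch of multi-granularity predicate locks with escalation
# (illustration only; PostgreSQL's real thresholds and structures differ).

from collections import defaultdict

TUPLE_LOCKS_PER_PAGE_LIMIT = 3  # illustrative threshold

class PredicateLockTable:
    def __init__(self):
        self.page_locks = set()              # coarse locks: (rel, page)
        self.tuple_locks = defaultdict(set)  # fine locks: (rel, page) -> {tuple ids}

    def lock_tuple(self, rel, page, tup):
        key = (rel, page)
        if key in self.page_locks:
            return  # already covered by a coarser lock
        self.tuple_locks[key].add(tup)
        if len(self.tuple_locks[key]) > TUPLE_LOCKS_PER_PAGE_LIMIT:
            # Escalate: replace the fine-grained locks with one page lock,
            # bounding the number of entries the lock table must track.
            del self.tuple_locks[key]
            self.page_locks.add(key)

    def covers(self, rel, page, tup):
        return (rel, page) in self.page_locks or tup in self.tuple_locks[(rel, page)]

locks = PredicateLockTable()
for t in range(5):
    locks.lock_tuple("idx", 7, t)
print(("idx", 7) in locks.page_locks)  # True: escalated to a page lock
```

After escalation the page lock also "covers" tuples that were never individually read, which is precisely how coarser granularity produces false-positive conflicts.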

Current Implementation and its limitations

Currently, only B-trees support page-level predicate locking. For other index types, PostgreSQL acquires a relation-level lock, which can lead to unnecessary serialization failures due to rw-dependencies caused by any insert into the index.

Goal of the project

The main task of this project is to support page-level predicate locking for the remaining index types: GiST, SP-GiST, hash, GIN, and RUM.

Benefits of page-level predicate locking

Page-level predicate locking can provide better performance at the serializable isolation level by reducing the number of false positives, which lead to unnecessary serialization failures.

Project Implementation

1. GiST Index

GiST stands for Generalized Search Tree. It is a balanced, tree-structured access method that acts as a base template for implementing arbitrary indexing schemes. B-trees, R-trees, and many other indexing schemes can be implemented with GiST.

The main difference between a B-tree and a GiST index during a search for a target tuple is that in a GiST index we can determine whether there is a match at any level of the index. In GiST, each index entry in an internal node contains a predicate that serves as a search key for all tuples reachable from that node. To insert a tuple into the index, we first check the key representing the target subtree. If it doesn't already cover the key we are inserting, we have to replace it with the union of the old key and the key being inserted. Considering all of this, it is logical to acquire a predicate lock at each index level.

Predicate Locking Strategy

During a scan, we acquire a predicate lock on a page (internal or leaf) at each index level. An index insert at the leaf level can then be trusted to ripple up to all levels and locations where conflicting predicate locks may exist. If a split occurs during insertion, predicate locks are copied from the original page to all new pages generated during the split.
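The GiST strategy above can be sketched with a toy model. This is a hypothetical simplification (page names and functions are invented): a scan locks one page per level along its descent path, and a page split copies the original page's predicate locks to every page produced by the split.

```python
# Simplified model of the GiST predicate-locking strategy (illustration only).

held_locks = set()  # pages predicate-locked by the scanning transaction

def gist_scan_descend(path_pages):
    """Lock each page (internal or leaf) visited on the way down."""
    for page in path_pages:
        held_locks.add(page)

def gist_page_split(old_page, new_pages):
    """On a split, all resulting pages inherit the old page's predicate locks,
    so a later insert into any of them still collides with the scan."""
    if old_page in held_locks:
        held_locks.update(new_pages)

gist_scan_descend(["root", "internal-3", "leaf-9"])
gist_page_split("leaf-9", ["leaf-9", "leaf-12"])
print("leaf-12" in held_locks)  # True: the lock followed the split
```

Locking every level of the descent path is what lets an insert detect a conflict even when it only touches internal pages whose union keys it must expand.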

2. SP-GiST Index

SP-GiST is an abbreviation of space-partitioned GiST. It provides a generalized infrastructure for implementing space-partitioned data structures, such as quad trees, k-d trees, and radix trees.

Logically, an SP-GiST tree is a set of tuples, each of which can be either an inner or a leaf tuple. Each inner tuple contains "nodes", which are (label, pointer) pairs, where the pointer (ItemPointerData) is a pointer to another inner tuple or to the head of a list of leaf tuples. Inner tuples can have different numbers of nodes (children). Branches can be of different depth (actually, there is no control or code to support balancing), which means that the tree is non-balanced. However, leaf and inner tuples cannot be intermixed at the same level: a downlink from a node of an inner tuple leads either to one inner tuple, or to a list of leaf tuples.

Predicate Locking Strategy

When the search traversal algorithm reaches an inner tuple, it chooses a set of nodes through which to continue the traversal. When it reaches a leaf page, it scans the list of leaf tuples to find the ones that match the query. Unlike GiST, SP-GiST cannot determine whether there is a match at the internal levels of the index, so for SP-GiST searches a predicate lock on the leaf pages is enough. If a split occurs during insertion, predicate locks are copied from the original page to any new pages generated during the split.
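A toy model of the SP-GiST strategy, hypothetical and simplified (the page names and functions are invented): only leaf pages are locked, an insert checks for a conflict on the leaf it touches, and a leaf split copies the lock to the new page.

```python
# Simplified model of the SP-GiST predicate-locking strategy (illustration only).

scan_locks = set()  # leaf pages locked by a serializable scan
conflicts = []      # rw-conflicts detected by concurrent inserts

def spgist_scan_leaf(leaf_page):
    scan_locks.add(leaf_page)        # inner tuples are visited but never locked

def spgist_insert(leaf_page):
    if leaf_page in scan_locks:
        conflicts.append(leaf_page)  # concurrent insert into a scanned leaf

def spgist_leaf_split(old_page, new_page):
    if old_page in scan_locks:
        scan_locks.add(new_page)     # lock follows tuples moved by the split

spgist_scan_leaf("leaf-4")
spgist_leaf_split("leaf-4", "leaf-8")
spgist_insert("leaf-8")
print(conflicts)  # ['leaf-8']
```

Copying the lock on split is essential: without it, an insert landing on the freshly created page would silently miss the conflict with the earlier scan.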

3. Hash Index

A hash index consists of two or more "buckets", into which tuples are placed whenever their hash key maps to the bucket number. The key-to-bucket-number mapping is chosen so that the index can be incrementally expanded. When a new bucket is to be added to the index, exactly one existing bucket will need to be "split", with some of its tuples being transferred to the new bucket according to the updated key-to-bucket-number mapping.

Each bucket in the hash index comprises one or more index pages. The bucket's first page (known as the primary page) is permanently assigned to it when the bucket is created. Additional pages, called "overflow pages", are added if the bucket receives too many tuples to fit in the primary bucket page. The pages of a bucket are chained together in a doubly-linked list using fields in the index page special space.
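The incrementally expandable key-to-bucket mapping can be illustrated with a generic linear-hashing sketch (a simplification; the exact masking details differ from PostgreSQL's implementation, and `bucket_for` is an invented helper): keys are assigned by the low bits of their hash, and a hash value whose bucket doesn't exist yet is masked down one level.

```python
# Illustrative linear-hashing style key-to-bucket mapping (not PostgreSQL's
# exact code). Growing the index by one bucket splits exactly one old bucket.

def bucket_for(hashval, nbuckets):
    mask = (1 << nbuckets.bit_length()) - 1
    b = hashval & mask
    if b >= nbuckets:
        b &= mask >> 1  # that bucket doesn't exist yet: drop the top bit
    return b

# Growing from 3 to 4 buckets moves keys only from bucket 1 to new bucket 3.
moved = [h for h in range(32) if bucket_for(h, 3) != bucket_for(h, 4)]
print(all(bucket_for(h, 3) == 1 and bucket_for(h, 4) == 3 for h in moved))  # True
```

This is why a bucket split affects exactly one existing bucket: the new bucket's keys all come from the single bucket whose number differs from it only in the top bit.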

Predicate Locking Strategy

During a scan operation, we acquire a predicate lock on the primary page of the bucket. During an insert operation, we check whether there is a conflicting predicate lock on the primary page of the bucket. In case of a bucket split, the predicate lock is copied from the primary page of the old bucket to the primary page of the new bucket.
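A toy model of the hash-index strategy, hypothetical and simplified (the page names and functions are invented): locks live only on each bucket's primary page, so overflow pages are covered implicitly, and a bucket split copies the lock to the new bucket's primary page.

```python
# Simplified model of the hash-index predicate-locking strategy (illustration only).

bucket_primary_locks = set()

def hash_scan(bucket_primary_page):
    bucket_primary_locks.add(bucket_primary_page)

def hash_insert_conflicts(bucket_primary_page):
    # The conflict check targets the primary page even if the tuple
    # physically lands on an overflow page of the same bucket.
    return bucket_primary_page in bucket_primary_locks

def hash_bucket_split(old_primary, new_primary):
    if old_primary in bucket_primary_locks:
        bucket_primary_locks.add(new_primary)

hash_scan("bucket-2-primary")
hash_bucket_split("bucket-2-primary", "bucket-5-primary")
print(hash_insert_conflicts("bucket-5-primary"))  # True: conflict survives the split
```

Anchoring the lock on the primary page keeps the lock count per bucket constant no matter how many overflow pages the bucket grows.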

4. GIN Index

GIN stands for Generalized Inverted Index. "Generalized" means that the index does not know which operations it accelerates; it instead works with custom strategies defined for specific data types (see "Index Method Strategies" in the PostgreSQL documentation). In that sense, GIN is similar to GiST and differs from B-tree indexes, which have predefined, comparison-based operations.

A GIN index consists of a B-tree constructed over key values, where each key is an element of some indexed item (an element of an array, or a lexeme for tsvector) and where each tuple in a leaf page contains either a pointer to a B-tree over item pointers (a posting tree) or a simple list of item pointers (a posting list) if the list is small enough.

Predicate Locking Strategy

GIN has a feature called fast update, which postpones the insertion of tuples by temporarily storing them in a pending list; the pending list is eventually flushed into the main index structure during vacuum. This creates a problem: even if a scan acquires a page-level predicate lock on the pending list, we cannot detect a rw-conflict, because a new insert simply appends tuples to the pending list. Page-level predicate locking is therefore not possible when fast update is enabled, and in that case a predicate lock on the index relation is required. Otherwise, GIN searches need predicate locks only on leaf pages, whether of the entry tree or of a posting tree. During a non-root page split, the predicate lock is copied from the original page to the new page. If the root (the only node in the relation) is split, the predicate lock is copied from the root page to both new pages (left and right).
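The GIN strategy, including the fast-update fallback, can be sketched as a toy model. This is a hypothetical simplification (the index and page names and functions are invented): with fast update enabled the whole relation is locked, since pending-list appends would evade page locks; otherwise only entry-tree and posting-tree leaf pages are locked.

```python
# Simplified model of the GIN predicate-locking strategy (illustration only).

relation_locks = set()  # coarse fallback when fast update is enabled
page_locks = set()      # entry-tree and posting-tree leaf pages

def gin_scan(index_name, fast_update, leaf_pages):
    if fast_update:
        # Pending-list appends can't be caught by page locks,
        # so the whole index relation must be locked.
        relation_locks.add(index_name)
    else:
        page_locks.update(leaf_pages)

def gin_insert_conflicts(index_name, leaf_page):
    return index_name in relation_locks or leaf_page in page_locks

gin_scan("gin_idx", fast_update=True, leaf_pages=[])
print(gin_insert_conflicts("gin_idx", "pending-list-page"))  # True via relation lock
```

The relation-level fallback restores the pre-project behavior for fast-update indexes, so the page-level benefit applies only when fast update is disabled.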

5. RUM Index

The rum module provides an access method to work with RUM indexes. It is based on the GIN access method. Unlike GIN, RUM doesn't support fast update.

A GIN index allows fast full-text search using the tsvector and tsquery types, but full-text search with a GIN index has several problems:

1. Slow ranking. Ranking requires positional information about lexemes, but a GIN index doesn't store lexeme positions, so after the index scan an additional heap scan is needed to retrieve them.

2. Slow phrase search. This is related to the previous problem: phrase search also requires positional information.

3. Slow ordering by timestamp. A GIN index can't store related information (such as timestamps) alongside lexemes, so an additional heap scan is necessary.

RUM solves these problems by storing additional information in the posting tree, for example positional information about lexemes, or timestamps.

Predicate Locking Strategy

RUM's predicate locking strategy is similar to GIN's, except that since RUM doesn't support fast update, we never need to fall back to a relation-level lock.

Contact

shubhambaraiss@gmail.com

Thanks to PostgreSQL

I would like to thank PostgreSQL for giving me the opportunity to work on this project. I would also like to thank my mentor Andrey Borodin for being really supportive throughout the program. It's been a wonderful experience being a part of the PostgreSQL Community and learning from all of you. I am hoping to contribute to PostgreSQL in the future.