Latches are serialization mechanisms that protect areas of Oracle’s shared memory (the SGA). In simple terms latches prevent two processes from simultaneously updating - and possibly corrupting - the same area of the SGA.
Oracle sessions need to update or read from the SGA for almost all database operations. For instance:
When a session reads a block from disk, it must modify a free block in the buffer cache and adjust the buffer cache LRU (Least Recently Used) chain.
When a session reads a block from the SGA, it will modify the LRU chain.
When a new SQL statement is parsed, it will be added to the library cache within the SGA.
As modifications are made to blocks, entries are placed in the redo buffer.
The database writer periodically writes buffers from the cache to disk (and must update their status from “dirty” to “clean”).
The redo log writer writes entries from the redo buffer to the redo logs.
Latches prevent any of these operations from colliding and possibly corrupting the SGA.
Because the duration of operations against memory is very small (typically in the order of nanoseconds) and the frequency of latch requests very high, the latching mechanism needs to be very light-weight. On most systems, a single machine instruction called “test and set” is used to see if the latch is taken (by looking at a specific memory address) and if not, acquire it (by changing the value in the memory address).
If the latch is already in use, Oracle can assume that it will not be in use for long, so rather than go into a passive wait (e.g., relinquish the CPU and go to sleep) Oracle will retry the operation a number of times before giving up. This algorithm is called acquiring a spin lock and the number of “spins” before sleeping is controlled by the Oracle initialization parameter “_spin_count”.
The first time the session fails to acquire the latch by spinning it will attempt to awaken after a millisecond or so. Subsequent waits will increase in duration and in extreme circumstances may reach 100s of milliseconds. In a system suffering from intense contention for latches, these waits will have a severe impact on response time and throughput.
The latches that most frequently affect performance are those protecting the buffer cache, areas of the shared pool and the redo buffer.
Library cache and shared pool latches: These latches protect the library cache in which sharable SQL is stored. In a well defined application there should be little or no contention for these latches, but in an application that uses literals instead of bind variables (for instance “WHERE surname=’HARRISON’” rather that “WHERE surname=:surname”, library cache contention is common.
Redo copy/redo allocation latches: These latches protect the redo log buffer, which buffers entries made to the redo log. These latches were a significant problem in earlier versions of Oracle, but are rarely encountered today.
Cache buffers chain latches: These latches are held when sessions read or write to buffers in the buffer cache. There are typically a very large number of these latches each of which protects only a handful of blocks. Contention on these latches is typically caused by concurrent access to a very “hot” block and the most common type of such a hot block is an index root or branch block (since any index based query must access the root block).
Oracle’s wait interface makes it relatively easy to detect latch contention and – from 10g onwards – to accurately identify the specific latch involved. In 10 and 11g, each latch has it’s own wait category if waits on the specific latch become significant then we can deduce a latch contention problem.
As often as not, latch contention will be associated with other symptoms that can help diagnose the root cause of the problem. For instance, a high shared pool miss rate and shared pool lock alarm are also current. These alarms are typical of a system in which a high rate of dynamic (non sharable) SQL is causing shared pool or library cache latch contention.
In the “bad old days”, we often had to use ratio based techniques to determine which latch was causing a problem, since Oracle collated all latch waits into a single category. Since we didn’t know what latch was responsible for the greatest amount of waits, we would typically examine “miss” and “sleep” rates. A “miss” occurs when a session cannot immediately obtain a latch. A sleep occurs when the session cannot obtain the latch even when retrying to the value of the parameter “_spin_count”. These values can be found in the V$LATCH table
There are some things we can do within our application design that can reduce contention for latches.
As noted earlier, failure to use bind variables within an application is the major cause of library cache and/or shared pool latch contention. All Oracle applications should make use of bind variables whenever possible.
However, all is not lost if you are unable to modify your application code. You can also try the “CURSOR_SHARING” parameter to cause Oracle to modify SQL on the fly to use bind variables. A setting of FORCE causes all literals to be converted to bind variables. A setting of SIMILAR causes statements to be rewritten only if it would not cause the statements execution plan to vary (which can happen if there are histogram statistics defined on a column referenced in the WHERE clause).
CURSOR_SHARING is one of the few silver bullet parameters that can instantly improve performance. Figure 5 shows how performance changed on my latch constrained system when I changed CURSOR_SHARING to FORCE (using Spotlights Parameters page). On changing the parameter (at 4:10pm) the execution rate more than doubled, latch contention was eliminated and my SQL Area miss rate halved.
Cache buffers chains latch contention is one of the most intractable types of latch contention. There are a couple of things you can do at the application level to reduce the severity of this type of contention.
I always recommend that the application workload be optimized before dealing with contention issues (see for instance SystematicOracletuning.pdf). But in the case of cache buffer chains contention it is particularly important. Cache buffer chains contention occurs most often because of very high logical read rates on a relatively small number of database blocks. A common cause of this phenomenon is SQL which repeatedly and unnecessarily reads the same blocks over and over again. So first, identify the SQL that is associated with the most cache buffer chains latch activity, and see if that SQL should be tuned.
If you are satisfied that the SQL is optimized but you still have a cache buffer chains latch contention problem, try and identify the blocks that are “hot". Metalink note 163424.1 “How to Identify a Hot Block Within The Database” describes how to do this.
Having identified the identity of the hot block, you may find that it is an index root or branch block. If this is the case, there are two application design changes that may help.
Consider partitioning the table and using local indexes. This might allow you to spread the heat amongst multiple indexes (you will probably want to use a hash partition to ensure an even spread of load amongst the partitions).
Consider converting the table to a hash cluster keyed on the columns of the index. This allows the index to be bypassed completely and may also result in some other performance improvements. However, hash clusters are suitable only for tables of relatively static size, and determining an optimal setting for the SIZE and HASHKEYS storage parameters are essential.
If the block is a table block that just happens to be very heavily accessed, then perhaps partitioning can still help by spreading the load across multiple partitions. However, if it actually a single row that is hot, then you may need to review your application design. Alternatively, you can try adjusting the _spin_count parameter (as discussed below).
Latches protect areas of Oracle shared memory from concurrent access in roughly the same way that locks protect data in tables. When a session wants a latch it will repeatedly attempt to obtain the latch until reaching the value of "_spin_count" after which it will sleep and a "latch free" wait will occur. Excessive latch sleeps can create restrictions on throughput and response time.
The two most frequently encountered forms of latch contention in modern Oracle (10g/11g) are:
Library cache/shared pool latch contention. This is usually caused when an application issues high volumes of SQL which are non-sharable due to an absence of bind variables. The CURSOR_SHARING parameter can often be used to alleviate this form of contention.
cache buffer chains contention. This is usually associated with very high logical read rates and "hot" blocks within the database (sometimes index blocks). After tuning SQL to reduce logical IO and eliminate repetitive reads of the same information, partitioning is often a possible solution.
If latch contention is causing serious problems, and the system has some free CPU capacity, adjusting the value of the undocumented parameter _spin_count may be effective in reducing contention. As always, modifying undocumented parameters should be approached with great caution.
Spotlight has a variety of alarms, advice and diagnostic screens for identifying and diagnosing latch contention. This includes a capability of establishing the optimum _spin_count value for a particular workload.