Datapump dump file I/O
DG Broker configuration file I/O
Log file init write
buffer read retry
db file single write
db file sequential - read Single block read
db file scattered - read Multi block read
db file parallel - read Non-contiguous multi block read
read by other session - wait for another session to do the io
select parameter1, parameter2, parameter3 from v$event_name where ...
NAME P1 P2 P3
----------------------- ----------- --------- --------
db file sequential read file# block# blocks
db file scattered read file# block# blocks
db file parallel read files blocks requests
read by other session file# block# class#
db file sequential read
- Top reported wait
- Single block read
- block accessed via
select * from emp where empno=99;
where there is an index on emp(empno)
search buffer cache for block by rowid,
if fail, read block off disk via file# and block#
Note: "sequential" means a sequence as in rowid
db file scattered read
- Multi Block Read
- Full Table Scan
- Index Fast Full Scan
select * from emp;
search buffer cache for block by rowid,
if fail, read block off disk via file# and block# and # of blocks
Note: "scattered" means blocks are scattered throughout the buffer cache (even though they are laid out sequential on disk)
db_file_multiblock_read_count set the target block read size
db file parallel read
see as well
||Process issues multiple single block reads in parallel
- Documentation says only for recovery
- seems to happen for normal read ops
- Async Call – wait for all reads to complete
- Not contiguous multi block read
- Index full or range scan
- Where values in
Select * from emp where empno in (1,2,3,4,5,6);
read by other session
||Multiple sessions reading the same data that requires IO
two users doing a full table scan at the same time
- Block not found in cache
- Read block from disk
- Found other session already reading block from disk
- Wait for the other session to finish IO
This mechanism lets the client bypass the buffer cache for I/O intensive operations.
The disk blocks are written into and read from process private memory.
select parameter1, parameter2, parameter3 from v$event_name
NAME P1 P2 P3
----------------------- ----------- --------- --------
direct path read file number first dba block cnt
direct path read temp file number first dba block cnt
direct path write file number first dba block cnt
direct path write temp file number first dba block cnt
Direct I/O read and write size are controlled by the parameters
See Doc ID: 330818.1 Increasing db_file_multiblock_read_count Results In Slower Performance Due To More & Smaller Direct Path Waits
Note: from Bug 4047116, the value 248 can be used when I/O size is 256Kb
whereas 1008 can be used when I/O size is 1Mb. Both the min and the
max parameter must be set to the same value.
direct path read
Grouping by Query Coordinator:
count (distinct session_id) deg,
decode(session_state, 'WAITING',event,'CPU') event
from v$active_session_history ash,
where o.object_id (+)= ash.CURRENT_OBJ#
and qc_session_id is not null
group by qc_session_id, sql_id, o.object_name,
o.object_type, CURRENT_OBJ#, event, session_state
Order by qc_session_id, sql_id
SQL_ID QSID CNT DEG OBJ OTYPE EVENT
------------- ---- ---- --- ------------ ----- --------------------------
7p3jt75phub2d 144 386 4 WAIT_OBJECTS TABLE PX Deq Credit: send blkd
144 4 3 WAIT_OBJECTS TABLE PX qref latch
144 37 1 WAIT_OBJECTS TABLE db file sequential read
144 3 2 WAIT_OBJECTS TABLE direct path read
144 70 1 WAIT_OBJECTS TABLE CPU
144 21 4 0 PX Deq Credit: send blkd
144 12 4 0 db file sequential read
direct path write
create table as select
Seems to be a bug on 10g where direct path write waits are incorrectly recorded as CPU
"If asynchronous IO is not being used then the IO write request blocks until completed but this dies not show as a wait at the time the IO is issued. The session returns later to pick up the completed IO data but can then show a wait on "direct path write" even though this wait will return immediately. "
11g : There is hidden parameter _direct_path_insert_features, which seems controls direct path behaviour. alter session set "_direct_path_insert_features" = 1 turns off direct path for insert .. values().
direct path read temp
||This event happens when sorting overflows the memory buffers and has to be written to disk.
If it's a problem consider increasing paramater
The explain plan can give estimated sort size requirements and extended row source stats in the explain plan can give the actual sort size usage of the sql statement
select s.sid, --s.osuser, s. process,
and segtype in ('HASH','SORT')
order by blocks desc
SID SQL_ID SEGTYPE MB TABLESPACE
--- ------------- ------- -- ----------
90 gm5s0vc0vzppc SORT 2 TEMP03
105 gm5s0vc0vzppc SORT 2 TEMP03
80 gm5s0vc0vzppc SORT 2 TEMP03
105 gm5s0vc0vzppc HASH 2 TEMP03
102 gm5s0vc0vzppc SORT 2 TEMP03
90 gm5s0vc0vzppc HASH 2 TEMP03
The following information from Jonathan Lewis, Randolf Geist and Joze Senegacnik :
On Oracle starting in 10.2.0.3 to 188.8.131.52, this doesn't work:
alter session set workarea_size_policy = manual;
alter session set sort_area_size = 104857600;
broke in 10.2.0.3
fixed in 184.108.40.206
Workaround for serial connections, just run the command twice:
alter session set sort_area_size = 104857600;
alter session set sort_area_size = 104857600;
But this doesn't work for parallel sessions that don't pick up the manual size at all but use the system default size (64K).You can change the system default, but this will be a one size fits all workaround.
Does not seem to affect HASH_AREA_SIZE. hash_area_size parameter is an integer in bytes so it is might be limited to approx 2G
On 9i the upper limit for sort area size using PGA_AGGREGATE_TARGET for a single session was fixed to 100M unless using undocumented parameters:
In 10gR2 the 100M does not apply anymore when using a PGA_AGGREGATE_TARGET > 1000M.
In 10.2 it is limited to 200M as long as pga_aggregate_target is smaller than 1GB.
When pga_aggregate_target is large than 1GB then _pga_max_size= 20% of pga_aggregate_target .
The pga_aggregate_target now drives in most cases _smm_max_size:
pga_aggregate_target <=500MB, _smm_max_size = 20%* pga_aggregate_target
pga_aggregate_target between 500MB and 1000MB, _smm_max_size = 100MB
pga_aggregate_target >1000MB, _smm_max_size = 10%* pga_aggregate_target
and _smm_max_size in turns now drives _pga_max_size: _pga_max_size = 2 * _smm_max_size
A pga_aggregate_target larger than 1000MB will now allow much higher default thresholds in 10gR2: pga_aggregate_target set to 5GB will allow an _smm_max_size of 500MB (was 100MB before) and _pga_max_size of 1000MB (was 200MB).
direct path write (lob)
CNT AAS EVENT P1 TABLESPACE
---- --- ----------------------- --- ----------
1 .00 db file sequential read 1 SYSTEM
2 .00 db file sequential read 3 SYSAUX
38 .06 db file sequential read 6 SOE
179 .30 db file sequential read 7 SOEINDEX
CNT AAS OBJN OTYPE
---- ------- ------------------------- ---------------
79 .00 52949 ORDER_ITEMS TABLE PARTITION
97 .00 -1
130 .00 53117 ORD_STATUS_IX INDEX
498 .01 53120 CUST_EMAIL_IX INDEX
512 .01 0
1632 .03 53112 ITEM_ORDER_IX INDEX
CNT AAS EVENT OBJ P1 TABLESPACE_NAME
--- --- --------------- ----------------- --- ---------------
1 .02 db file sequent ORDER_ITEMS_PK 7 SOEINDEX
1 .02 db file sequent 0 3 SYSAUX
2 .03 db file scatter WAIT_OBJECTS 1 SYSTEM
2 .03 db file sequent ORDER_ITEMS 6 SOE
3 .05 db file sequent CUST_EMAIL_IX 7 SOEINDEX
4 .07 db file sequent CUST_LNAME_IX 7 SOEINDEX
6 .10 db file sequent ORD_ORDER_DATE_IX 7 SOEINDEX
13 .22 db file sequent ITEM_ORDER_IX 7 SOEINDEX
25 .42 db file sequent 0 2 UNDOTBS1
TCNT SQL_ID CNT AAS OBJN OBJ P1 TABLESPAC
---- ------------- --- --- ----- --------------- -- ----------
30 0yas01u2p9ch4 1 .01 53113 ITEM_PRODUCT_IX 7 SOEINDEX
1 .01 53079 ORDER_ITEMS_UK 7 SOEINDEX
28 .16 53112 ITEM_ORDER_IX 7 SOEINDEX
58 6v6gm0fd1rgrz 4 .02 54533 WAIT_OBJECTS 1 SYSTEM
54 .30 0 0 2 UNDOTBS1
AAS SQL_ID PCT OBJ SUB_OBJ OTYPE EVENT F# TABLESPAC CONTENTS
---- ----------------- ----------- ------- ---------- ---------- -- --------- ---------
.00 f9u2k84v884y7 33 CUSTOMERS SYS_P27 TABLE PART sequentia 1 SYSTEM PERMANENT
33 ORDER_PK INDEX sequentia 4 USERS PERMANENT
33 sequentia 2 UNDOTBS1 UNDO
.01 0tvtamt770hqz 100 TOTO1 TABLE scattered 7 NO_ASSM PERMANENT
.06 75621g9y3xmvd 3 CUSTOMERS SYS_P36 TABLE PART sequentia 4 USERS PERMANENT
3 CUSTOMERS SYS_P25 TABLE PART sequentia 4 USERS PERMANENT
3 CUSTOMERS SYS_P22 TABLE PART sequentia 4 USERS PERMANENT
3 CUSTOMERS SYS_P29 TABLE PART sequentia 4 USERS PERMANENT
3 CUSTOMERS SYS_P21 TABLE PART sequentia 4 USERS PERMANENT
3 CUSTOMERS SYS_P35 TABLE PART sequentia 4 USERS PERMANENT
3 CUSTOMERS SYS_P32 TABLE PART sequentia 4 USERS PERMANENT
3 CUSTOMERS SYS_P27 TABLE PART sequentia 4 USERS PERMANENT
3 CUSTOMERS SYS_P34 TABLE PART sequentia 4 USERS PERMANENT
4 CUSTOMERS SYS_P23 TABLE PART sequentia 4 USERS PERMANENT
5 CUSTOMERS SYS_P33 TABLE PART sequentia 4 USERS PERMANENT
50 CUSTOMERS_PK INDEX sequentia 4 USERS
Missing Object Names
Notice that in some of the analysis the object name or object id is missing? Sometimes the OBJECT ID is -1 or 0 in the ASH data for I/O so the above scripts will be missing the Object Name.
In this case there is a second though more costly way to get the OBJECT NAME and that is by using the FILE# and BLOCK# that is being read. The File# and BLOCK# can be translated into to the OBJECT via a query like
where &file = file_id -- File = P1
and &block between -- Block = P2
block_id and block_id + blocks – 1
This query is expensive and slow. If running this query more than once, it's recommended to create a copy of "DBA_EXTENTS" like
Create table my_extents as select * from DBA_EXTENTS;
Create index my_extents_i on my_extents(FILE_ID,BLOCK_ID);
To give an idea of just how much faster a copy of DBA_EXTENTS is, here an example on my machine (with out even creating an index)
Enter value for file: 3
Enter value for block: 6619
OWNER SEGMENT_NAME SEGMENT_TYPE
--------------- -------------------- ------------------
WMSYS LOG_TAB_PK INDEX
Above I just queried for one block. Now below I will look up over a 150 objects in a faster time !
create table myextents as select * from dba_extents
col own for A7
col segment_name for a40
col segment_type for a15
col cnt for 9999
( event like 'db file s%' or event like 'direct%' )
and (current_obj# in (0,-1) or current_obj# is Null)
and sample_time > sysdate - &v_minutes/(60*24)
and P1 = file_id
and P2 between block_id and block_id + blocks - 1
owner, segment_name, segment_type
CNT OWN SEGMENT_NAME SEGMENT_TYPE
--- ---- ---------------- ------------
11 SYS SMON_SCN_TO_TIME CLUSTER
993 SYS _SYSSMU7$ TYPE2 UNDO
150 rows selected.
On other option is querying X$BH just after the IO has occured. X$BH reveals the contents of which blocks are in the buffer cache, so if the blocks from the recent physical IOs are still in the buffer cache we can query X$BH.
Check average read times per file
Should be between 5-20 ms
ADDM expects I/Os to be under 20ms
The parameter defaults to 10000 micro seconds
change behavior with DBMS_ADVISOR.SET_DEFAULT_TASK_PARAMETER( 'ADDM', 'DBIO_EXPECTED', <new value>)
Check time in
Data in Statspack under “File IO Stats” (8i and up)
DBA_HIST_FILESTATXS for historical (10g)
V$FILEMETRIC_HISTORY for recent (10g)
Check Cache buffer Hit ratio
Check db_cache_advice 9i and higher
Data in Statspack under “Buffer Pool Advisory”
Want to optimize data caching
Tune High IO SQL
Check reads off of UNDO
Average IO times
File IO Stats DB/Inst:labsf03 Snaps: 1-2
Av Mx Av
Av Rd Rd Av Av Buffer BufWt
Tablespace Filename Reads Reads/s (ms) Bkt Blks/Rd Writes Writes/s Waits (ms)
---------- ----------------- ----- ------- ---- ---- ------- ------- -------- ------- ------
SYSTEM /u01/system01.dbf 445 15 0.4 16 1.0 1,157 39 2,744 93.3
USERS /u01/users01.dbf 223 7 0.5 ### 1.0 9,725 324 4 100.0
Recent IO read times
to_char(begin_time,'yyyy-mm-dd hh24:mi') begin_time,
average_read_time *10 avgrd_ms,
average_write_time *10 avgwr_ms,
order by begin_time;
BEGIN_TIME FID AVGRD_MS AVGWR_MS PR PW
---------------- ---- -------- -------- ---------- ----------
2008-01-30 12:25 1 5.2 2.5 100 24
2008-01-30 12:25 5 80.0 20.0 2 2
2008-01-30 12:25 2 36.7 2.0 3 218
2008-01-30 12:25 4 22.8 1.8 89 2754
2008-01-30 12:25 3 22.9 2.6 14 47
Historic IO read times
to_char(begin_interval_time,'yyyy-mm-dd hh24:mi') snap_time,
file#, readtim/nullif(phyrds,0) avg_read_ms, phyrds
where f.snap_id=s.snap_id ;
SNAP_TIME FILE# AVG_READ_MS PHYRDS
---------------- ---------- ----------- ----
2008-01-05 12:00 9 36.67 39
2008-01-05 12:00 10 32.31 39
2008-01-05 13:00 1 11.63 178224
2008-01-05 13:00 2 56.37 2014
2008-01-05 13:00 3 17.38 50668
2008-01-05 13:00 4 9.39 565321
2008-01-05 13:00 5 38.78 41
2008-01-05 13:00 6 28.29 41
2008-01-05 13:00 7 27.44 39
2008-01-05 13:00 8 42.56 39
2008-01-05 13:00 9 36.67 39
2008-01-05 13:00 10 32.31 39
ADDM and Avg Read Times
The parameter - DBIO_EXPECTED - that tells ADDM what is an acceptable
average single block read in micro seconds.
The parameter defaults to 10000 micro seconds and can be changed with
Buffer Cache Hit Ratio
Buffer Cache Hit Ratio (BCHR) has a sordid history and it's tempting to just not mention it. For years the BCHR has been used to identify IO performance bottlenecks despite the fact that it can be completely unreliable. Connor McDonald even wrote a script (http://www.oracledba.co.uk/tips/choose.htm) that allows you to set the BCHR as high as you want without changing the size of the cache. The point being that IO bottlenecks should be identified with IO wait events and not BCHR, but if an IO bottleneck does happen, it is worth knowing if the buffer cache is supporting the IO load sufficiently. The question of whether the Buffer Cache is sufficiently sized can be addressed with Oracle's "DB Cache Advice".
Buffer Pool Advisory
Size for Size Buffers for Read Estimated
P Est (M) Factor Estimate Factor Physical Reads
--- -------- ------ ------------ ------ --------------
D 56 .1 6,986 2.3 58,928
D 112 .2 13,972 1.6 42,043
D 224 .4 27,944 1.0 25,772
D 336 .6 41,916 1.0 25,715
D 448 .8 55,888 1.0 25,715
D 596 1.0 74,351 1.0 25,715
D 728 1.2 90,818 1.0 25,715
D 840 1.4 104,790 1.0 25,715
D 952 1.6 118,762 1.0 25,715
D 1,064 1.8 132,734 1.0 25,715
The Buffer Pool Advisory gives expected physical reads at different cache sizes. The column "Factor" shows size relative to the current. I "Factor" of "1" is the current cache size. We can see in the above case, we can actually drop the cache size to .4 of the current size without significantly impacting the physical IO and on the other hand increasing it to 1.8 the current size won't help us at all. In this case the buffer cache size is probably fine an won't help us address IO performance issues on this particular database.
If the IO response time of the disks is good and if the buffer cache is correctly sized, then at this point it's clear it's time to tune the SQL with the most IO wait time:
select count(*),sql_id, event
where event in ('db file sequential read',
'db file scattered read',
'db file parallel read')
group by sql_id, event
order by count(*);
COUNT(*) SQL_ID EVENT
--------- ------------- ------------------------
10 8hk7xvhua40va db file sequential read
335 3hatpjzrqvfn7 db file sequential read
343 0uuqgjq7k12nf db file sequential read
549 75621g9y3xmvd db file sequential read
1311 0bzhqhhj9mpaa db file sequential read
1523 41zu158rqf4kf db file sequential read
1757 0yas01u2p9ch4 db file sequential read
Unix Buffer Cache
Vendors say that read cache will take care of IOPs but of you have a sustained throughput it will only sustain for so long
Seagate Barracuda 4LP
Seagate Cheetah 73LP
Time to read 32Kb
Seek Time (avg)
Total time for Single I/O
I/O per second (conservative)
IOPs/sec per 100GB
James Morle http://scaleability.com
BAARF - Battle Against any Raid Five
In the chart below, 1 represents good and 5 represents bad.
Raid 5 runs into performance issues because every write takes two reads plus a write. These two reads can conflict with other readers. Also because Raid 5 is often configured with small stripe sizes, say 32K or 64K, a multiblock read can easily span several disks, increasing the chances of concurrency read conflicts.
SAME - Stripe and Mirror Everything
Redo will take a hit being mixed with everything else, so if it's a possibility, I'd recommend putting redo on a separate stripe/mirror set.
Of course with Oracle's ASM and now Exadata machines, much if this discussion will become mute over the coming years.
direct path read
The direct path read wait event has three parameters: file#, first block#, and block count. In Oracle Database 10g, this wait event falls under the User I/O wait class. Keep the following key thoughts in mind when dealing with the direct path read wait event.
These are waits that are associated with direct read operations. An Oracle direct read operation reads data directly into the session’s PGA (Program Global Area), bypassing the SGA. The data in the PGA is not shared with other sessions.
Direct reads may be performed in synchronous or asynchronous mode, depending on the platform and the value of the DISK_ASYNC_IO parameter. The systemwide direct path read wait event statistics can be very misleading when asynchronous I/O is used.
A significant number of direct path read waits is most likely an application issue.
Common Causes, Diagnosis, and Actions
The direct path read waits are driven by SQL statements that perform direct read operations from the temporary or regular tablespaces.
SQL statements with functions that require sorts, such as ORDER BY, GROUP BY, UNION, DISTINCT, and ROLLUP, write sort runs to the temporary tablespace when the input size is larger than the work area in the PGA. The sort runs in the temporary tablespace are subsequently read and merged to provide the final result. An Oracle session waits on the direct path read wait event while the sort runs are being read. SQL statements that employ the merge join, either through a hint or as decided by the optimizer, also require sort.
SQL statements that employ the hash join, either as directed by a hint or the optimizer, flush the hash partitions that don’t fit in memory to the temporary tablespace. The hash partitions in the temporary tablespace are read back into the memory to be probed for rows that match the SQL predicates. An Oracle session waits on the direct path read wait event while the hash partitions are being read.
SQL statements that employ parallel scans also contribute to the systemwide direct path read waits. In a parallel execution, the direct path read waits are associated with the query slaves and not the parent query. The session that runs the parent query waits mostly on the PX Deq: Execute Reply wait event (in Oracle8i Database and above).
As of Oracle 8.1.7, there is a separate direct read wait event for LOB segments: direct path read (lob). This wait event applies to LOBs that are stored as NOCACHE. When LOBs are stored as CACHE, reads and writes go through the buffer cache and waits show up as db file sequential read.
It is highly unlikely that the direct path read wait event will show up as the leading bottleneck within a session even though it may actually be it. The reasons are as follows:
The way Oracle accounts for the waits, as discussed in Chapter 3.
Sessions that perform parallel processing using parallel query do not wait on the direct path read wait event, which normally represents the bulk of the time. The direct path read waits are associated with the query slaves’ sessions that scan the tables. The only time the direct path read wait event shows up within a parent session is when the parent session itself has read activities against the temporary tablespace. This is driven by SQL functions, such as ORDER BY, GROUP BY, and DISTINCT, or hash partitions that spill to disk but not parallel scans.
Therefore you shouldn’t evaluate direct path read waits based on the TOTAL_WAITS or TIME_WAITED in the V$SESSION_EVENT view. Instead, you can find current sessions that perform a lot of direct read operations from the V$SESSTAT view using the following query. The physical reads direct is comprised of the direct reads that are originated by the parent session itself as well as the sum of all direct reads that are originated by the query slaves that the parent session employs. The direct reads that are initiated by query slaves are only reflected in the parent session when the slaves complete their jobs. The downside to this approach is that there is no time element.
select a.name, b.sid, b.value,
round((sysdate - c.logon_time) * 24) hours_connected
from v$statname a, v$sesstat b, v$session c
where b.sid = c.sid
and a.statistic# = b.statistic#
and b.value > 0
and a.name = 'physical reads direct'
order by b.value;
NAME SID VALUE HOURS_CONNECTED
------------------------- ---- ---------- ---------------
physical reads direct 2 41 980
physical reads direct 4 41 980
physical reads direct 5 445186 980
Apart from finding the sessions that perform a lot of direct reads, you must also find out where the sessions are reading from (temporary tablespace or data files), the SQL statements that initiate the waits, and the type of segments being read. The following query gives you the answers. Sessions that read from the temporary tablespace may be reading sort or hash segments. Sessions that read from data files normally belong to parallel query slaves.
1,'SORT', 2,'HASH', 3,'DATA',
null) as segment_type,
from v$session_wait a, dba_data_files b, v$session c, x$ktsso d
where c.saddr = d.ktssoses(+)
and c.serial# = d.ktssosno(+)
and d.inst_id(+) = userenv('instance')
and a.sid = c.sid
and a.p1 = b.file_id
and a.event = 'direct path read'
1,'SORT', 2,'HASH', 3,'DATA',
null) as segment_type,
from v$session_wait a, dba_temp_files b, v$parameter c,
v$session d, x$ktsso e
where d.saddr = e.ktssoses(+)
and d.serial# = e.ktssosno(+)
and e.inst_id(+) = userenv('instance')
and a.sid = d.sid
and b.file_id = a.p1 - c.value
and c.name = 'db_files'
and a.event = 'direct path read'
order by 1,2;
-- File name output is edited to fit page.
EVENT SID HASH_VALUE SEGMENT TABLESPACE_N FILE_NAME
------------------ --- ---------- ------- ------------ -----------------
direct path read 8 511952958 SORT TEMP_BATCH temp_batch_01.dbf
direct path read 9 3138787393 ORDERS orders_01.dbf
direct path read 11 3138787393 ORDERS orders_01.dbf
direct path read 12 3138787393 ORDERS orders_01.dbf
direct path read 14 3138787393 ORDERS orders_01.dbf
If you catch a session reading sort segments from the temporary tablespace, this indicates the SORT_AREA_SIZE (or work area size if you use the PGA_AGGREGATE_TARGET in Oracle9i Database) is not large enough to accommodate a cache sort (or in memory sort). This is fine. It is unrealistic to expect all SQL statements to perform cache sorts. However, you should avoid multipass sorts because they create a lot of I/Os to the temporary tablespace and are very slow. How can you tell if a SQL statement is doing a multipass sort? Well, it is not that easy in versions prior to Oracle9i Database. You have to trace the session with the event 10032 and examine the trace file. However, beginning in Oracle9i Database, you can simply query the V$SQL_WORKAREA or V$SQL_WORKAREA_ACTIVE views with the SQL hash value that performs the sort. For a more in-depth discussion on sort, please review the “If Your Memory Serves You Right” white paper from the International Oracle Users Group (IOUG) 2004 conference proceedings at www.ioug.org.
The goal of tuning in this case is to minimize the number of sorts as a whole and, more specifically, disk sorts. Increasing the SORT_AREA_SIZE (or PGA_AGGREGATE_ TARGET) may help reduce the number of disk sorts, but that’s usually the workaround rather than the cure, unless your SORT_AREA_SIZE is unreasonably small to begin with. You should examine the application and the SQL statements to see if sorts are really necessary. Applications have the tendency to abuse the DISTINCT and UNION functions. Whenever possible use UNION ALL instead of UNION, and where applicable use HASH JOIN instead of SORT MERGE and NESTED LOOPS instead of HASH JOIN. Make sure the optimizer selects the right driving table. Check to see if the composite index’s columns can be rearranged to match the ORDER BY clause to avoid sort entirely. Also, consider automating the SQL work areas using PGA_AGGREGATE_TARGET in Oracle9i Database. Statistically, the automatic memory management delivers a higher percentage of cache sorts.
Be careful when switching from UNION to UNION ALL as this can produce different results depending on the data. The UNION operator returns only distinct rows that appear in either result, while the UNION ALL operator returns all rows.
By default, the HASH_AREA_SIZE is twice the SORT_AREA_SIZE. A larger HASH_AREA_SIZE will influence the optimizer toward the hash joins (full table scan) rather than nested loops operation.
Similarly, if you catch a session reading hash segments from the temporary tablespace, all that tells you is that the HASH_AREA_SIZE (or work area size in case of the PGA_AGGREGATE_TARGET in Oracle9i Database) is not big enough to accommodate the hash table in memory. The solution is similar to the one just mentioned: cure it from the application and SQL tuning before adjusting the HASH_AREA_SIZE (or PGA_AGGREGATE_TARGET). Unless, of course, your HASH_AREA_SIZE is too small to begin with.
If you discover that the direct reads belong to parallel query slaves, you should verify if parallel scans are appropriate for the parent SQL statement and that the degree of parallelism is right. Make sure the query slaves do not saturate your CPUs or disks. Identifying the parent SQL statement can be a bit tricky because the hash value of the parent statement is not the same as the hash value of child statements that the query slaves execute. It was even trickier before the V$PX_SESSION view was introduced in Oracle 8.1.5. Following are two examples, one for versions prior to 8.1.5 and the other for version 8.1.5 and greater, that can help you identify the parent SQL statements when parallel queries are involved:
-- For versions prior to 8.1.5.
-- Note: This query is not able to differentiate parallel query statements
-- that are executed by multiple SYS users as they all share a common
select decode(ownerid,2147483644,'PARENT','CHILD') stmt_level,
where type <> ‘BACKGROUND’
and audsid in (select audsid
group by audsid
having count(*) > 1)
order by audsid, stmt_level desc, sid, username, osuser;
STMT_L AUDSID SID SERIAL# USERNAME OSUSER PROCESS HASH_VALUE SQL_ADDR
------ -------- ---- ------- -------- ------- ------- ---------- --------
PARENT 3086501 20 779 INTREPID cdh8455 16537 3663187692 A0938E54
CHILD 3086501 12 841 INTREPID cdh8455 16544 817802256 A092E1CC
CHILD 3086501 14 2241 INTREPID cdh8455 16546 817802256 A092E1CC
CHILD 3086501 17 3617 INTREPID cdh8455 16540 817802256 A092E1CC
CHILD 3086501 21 370 INTREPID cdh8455 16542 817802256 A092E1CC
The following query applies to version 8.1.5 and higher.
select decode(a.qcserial#, null, 'PARENT', 'CHILD') stmt_level,
from v$px_session a, v$session b
where a.sid = b.sid
order by a.qcsid, stmt_level desc;
STMT_L SID SERIAL# USERNAME OSUSER HASH_VALUE SQL_ADDR DEG REQ_DEG
------ --- ------- -------- ------- ---------- -------- --- -------
PARENT 20 779 INTREPID cdh8455 3663187692 A0938E54
CHILD 17 3617 INTREPID cdh8455 817802256 A092E1CC 4 4
CHILD 21 370 INTREPID cdh8455 817802256 A092E1CC 4 4
CHILD 12 841 INTREPID cdh8455 817802256 A092E1CC 4 4
CHILD 14 2241 INTREPID cdh8455 817802256 A092E1CC 4 4
Initialization Parameters of Interest
The DB_FILE_DIRECT_IO_COUNT initialization parameter can impact the direct path read performance. It sets the maximum I/O buffer size for direct reads and writes operations. Up to Oracle8i Database, the default value on most platforms is 64 blocks. So if the DB_BLOCK_SIZE is 8K, the maximum I/O buffer size for direct reads and writes operations is 512K. This number is further subject to hardware limits.
The DB_FILE_DIRECT_IO_COUNT parameter is hidden in Oracle9i Database, and the value is expressed in bytes instead of blocks. The default value in Oracle9i Database is 1MB. The actual direct I/O size depends on your hardware configuration and limits.
You can discover the actual direct read I/O size in three ways:
Trace the Oracle session that performs direct reads operations using the trace event 10046 at level 8. The P3 parameter indicates the number of blocks read. Based on the following example, the direct path readI/O size is 64K since the block size is 8K. Alternatively, you can query the V$SESSION_WAIT view for the P3 value of the direct path read event.
WAIT #1: nam='direct path read' ela= 4 p1=4 p2=86919 p3=8
WAIT #1: nam='direct path read' ela= 5 p1=4 p2=86927 p3=8
WAIT #1: nam='direct path read' ela= 10 p1=4 p2=86935 p3=8
WAIT #1: nam='direct path read' ela= 39 p1=4 p2=86943 p3=8
WAIT #1: nam='direct path read' ela= 5 p1=4 p2=86951 p3=8
WAIT #1: nam='direct path read' ela= 38 p1=4 p2=86959 p3=8
. . .
Trace the Unix session that performs direct reads or writes operations using the operating system trace facility such as truss, tusc, trace, or strace. The snippet of the truss report from an Oracle9i Database reveals the direct I/O size is 65536 bytes or 64K:
9218/6: kaio(AIONOTIFY, -14712832) = 0
9218/1: kaio(AIOWAIT, 0xFFBECE98) = 1
9218/1: lwp_cond_signal(0xFEB7BFA0) = 0
9218/3: pread64(256, "0602\0\001\0 ~13C19AEE }".., 65536, 0x0FC26000) = 65536
9218/1: lwp_cond_signal(0xFEB69FA0) = 0
9218/4: pread64(256, "0602\0\001\0 ~1BC19AEFE7".., 65536, 0x0FC36000) = 65536
. . .
Enable the debug information for the session that performs direct I/O operations using the trace event 10357 at level 1. Example: alter session set events '10357 trace name context forever, level 1'. The snippet of the trace file is provided here:
Unix process pid: 4375, image: oracle@kccdeds73 (P000)
*** SESSION ID:(9.18) 2004-02-08 21:47:01.908
DBA Range Initialized: length is 1570, start dba is 0100602b
kcblin: lbs=fc86c1cc flag=8 slot_cnt=32 slot_size=65536 state obj=24321224
kcblin: state objects are: Call=243a2210,Current Call=243a2210, Session=24321224
kdContigDbaDrCbk:starting from tsn 5
kdContigDbaDrCbk:starting from rdba 0100602b
kdContigDbaDrCbk:returning 1570 blocks
kcblrs:issuing read on slot : 0
kcbldio:lbs=fc86c1cc slt=fc86408c typ=0 async=1 afn=4 blk=602b cnt=8 buf=fc87fe00
kcblrs:issuing read on slot : 1
kcbldio:lbs=fc86c1cc slt=fc864210 typ=0 async=1 afn=4 blk=6033 cnt=8 buf=fc89fe00
kcblcio: lbs=fc86c1cc slt=fc86408c type=0 afn=4 blk=602b cnt=8 buf=fc87fe00
. . .
In the preceding example, the trace file belongs to query slave #0 (P000). There are 32 I/O slots available for the direct read operation (slot_cnt=32). A slot is a unit of I/O, and each slot is 65536 bytes (slot_size=65536). Asynchronous I/O is enabled during the read operation (async=1). The query slave reads data file #4 (afn=4). The number of blocks read is 8 (cnt=8). Since the block size is 8K, this translates to 65536 bytes.
In this case, the direct I/O slot size prevents the process from achieving the full 1MB, which is the default limit of the _DB_FILE_DIRECT_IO_COUNT parameter. The slot size can be modified by event 10351. The number of slots can be modified by event 10353.
The preceding information gives you a sense of the direct I/O throughput in your system. Don’t simply change the default slot size or the number of direct I/O slots. You need to know your hardware limits before doing so. Besides, you should focus on optimizing the application and SQL statements first.
Lastly, in Oracle8i Database, direct reads can also be enabled for serial scans with the _SERIAL_DIRECT_READ initialization parameter. Earlier releases may do the same by setting event 10355. Doing so will cause data from full scans to be read into the session’s PGA and not shared with other sessions. This can sharply increase memory usage.