Overview
Oracle's cost-based optimizer (COB) uses statistics to calculate the selectivity (the fraction of rows in a table that the SQL statement's predicate chooses) of predicates and to estimate the "cost" of each execution plan. The COB will use the selectivity of a predicate to estimate the cost of a particular access method and to determine the optimal join order.
Statistics are used to quantify the data distribution and storage characteristics of tables, columns, indexes and partitions. The COB uses these statistics to estimate how much I/O and memory are required to execute a SQL statement using a particular execution plan. Statistics are stored in the data dictionary, and they can be exported from one database and imported into another. Situations in where you would want to perform this, might be to transfer production statistics to a test system to simulate the real environment, even though the test system may only have small samples of the data.
In order to give the Oracle cost-based optimizer the most up-to-date information about schema objects (and the best chance for choosing a good execution plan) all application tables and indexes to be accessed must be analyzed. New statistics should be gathered on schema objects that are out of date. After loading or deleting large amounts of data would obviously change the number of rows. Other changes like updating a large amount of rows would not effect the number of rows, but may effect the average row length.
Statistics can be generated with the ANALYZE statement or with the package DBMS_STATS (introduced in Oracle8i). The DBMS_STATS package is great for DBA's in managing database statistics only for use by the COB. The package itself allows the DBA to create, modify, view and delete statistics from a standard, well-defined set of package procedures. The statistics can be gathered on tables, indexes, columns, partitions and schemas, but note that it does not generate statistics for clusters.
DBMS_STATS provides a mechanism for you to view and modify optimizer statistics gathered for database objects.The statistics can reside in two different locations:
The dictionary.
A table created in the user's schema for this purpose.
Only statistics stored in the dictionary itself have an impact on the cost-based optimizer.
When you generate statistics for a table, column, or index, if the data dictionary already contains statistics for the object, then Oracle updates the existing statistics. Oracle also invalidates any currently parsed SQL statements that access the object.
The next time such a statement executes, the optimizer automatically chooses a new execution plan based on the new statistics. Distributed statements issued on remote databases that access the analyzed objects use the new statistics the next time Oracle parses them.
When you associate a statistics type with a column or domain index, Oracle calls the statistics collection method in the statistics type if you analyze the column or domain index.
Analyze vs. DBMS_STATS
The following is a quick overview of the two.
Analyze
The only method available for collecting statistics in Oracle 8.0 and lower.
ANALYZE can only run serially.
ANALYZE cannot overwrite or delete certain types of statistics that where generated by DBMS_STATS.
ANALYZE calculates global statistics for partitioned tables and indexes instead of gathering them directly. This can lead to inaccuracies for some statistics, such as the number of distinct values.
For partitioned tables and indexes, ANALYZE gathers statistics for the individual partitions and then calculates the global statistics from the partition statistics.
For composite partitioning, ANALYZE gathers statistics for the subpartitions and then calculates the partition statistics and global statistics from the subpartition statistics.
ANALYZE can gather additional information that is not used by the optimizer, such as information about chained rows and the structural integrity of indexes, tables, and clusters. DBMS_STATS does not gather this information.
No easy way of knowing which tables or how much data within the tables have changed. The DBA would generally re-analyze all of their tables on a semi-regular basis.
DBMS_STATS
Only available for Oracle 8i and higher.
Statistics can be generated to a statistics table and can then be imported or exported between databases and re-loaded into the data dictionary at any time. This allows the DBA to experiment with various statistics.
DBMS_STATS routines have the option to run via parallel query or operate serially.
Can gather statistics for sub-partitions or partitions.
Certain DDL commands (ie. create index) automatically generate statistics, therefore eliminating the need to generate statistics explicitly after DDL command.
DBMS_STATS does not generate information about chained rows and the structural integrity of segments.
The DBA can set a particular table, a whole schema or the entire database to be automatically monitored when a modification occurs. When enabled, any change (insert, update, delete, direct load, truncate, etc.) that occurs on a table will be tracked in the SGA. This information is incorporated into the data dictionary by the SMON process at a pre-set interval (every 3 hours in Oracle 8.1.x, and every 15 minutes in Oracle 9i). The information collected by this monitoring can be seen in the DBA_TAB_MODIFICATIONS view. Oracle 9i introduced a new function in the DBMS_STATS package called: FLUSH_DATABASE_MONITORING_INFO. The DBA can make use of this function to flush the monitored table data more frequently. Oracle 9i will also automatically call this procedure prior to executing DBMS_STATS for statistics gathering purposes. Note that this function is not included with Oracle 8i.
DBMS_STATS provides a more efficient, scalable solution for statistics gathering and should be used over the traditional ANALYZE command which does not support features such as parallelism and stale statistics collection.
Use of table monitoring in conjunction with DBMS_STATS stale object statistics generation is highly recommended for environments with large, random and/or sporadic data changes. These features allow the database to more efficiently determine which tables should be re-analyzed versus the DBA having to force statistics collection for all tables. Including those that have not changed enough to merit a re-scan)
What gets collected?
Table Statistics
Oracle collects the following statistics for a table. Statistics marked with an asterisk are always computed exactly. Table statistics, including the status of domain indexes, appear in the data dictionary views USER_TABLES, ALL_TABLES, and DBA_TABLES in the columns shown in parentheses.
Number of rows (NUM_ROWS)
* Number of data blocks below the high water mark (that is, the number of data blocks that have been formatted to receive data, regardless whether they currently contain data or are empty) (BLOCKS)
* Number of data blocks allocated to the table that have never been used (EMPTY_BLOCKS)
Average available free space in each data block in bytes (AVG_SPACE)
Number of chained rows. [Not collected by DBMS_STATS] (CHAIN_COUNT)
Average row length, including the row's overhead, in bytes (AVG_ROW_LEN)
Index Statistics
Oracle collects the following statistics for an index. Statistics marked with an asterisk are always computed exactly. For conventional indexes, the statistics appear in the data dictionary views USER_INDEXES, ALL_INDEXES, and DBA_INDEXES in the columns in parentheses.
* Depth of the index from its root block to its leaf blocks (BLEVEL)
Number of leaf blocks (LEAF_BLOCKS)
Number of distinct index values (DISTINCT_KEYS)
Average number of leaf blocks per index value (AVG_LEAF_BLOCKS_PER_KEY)
Average number of data blocks per index value (for an index on a table) (AVG_DATA_BLOCKS_PER_KEY)
Clustering factor (how well ordered the rows are about the indexed values) (CLUSTERING_FACTOR)
Where are the statistics stored?
Statistics are stored into the Oracle Data Dictionary, in tables owned by SYS. Views are created on these tables to retrieve data more easily.
These views are prefixed with DBA_ or ALL_ or USER_. For ease of reading, we will use DBA_% views, but ALL_% views or USER_% views could be used as well.
Conventions Used
- Statistics available only since 8.0.X rdbms release : (*)
- Statistics available only since 8.1.X rdbms release : (**)
- Statistics not available at partition or subpartition level : (G)
- Statistics not available at subpartition level : (GP)
Table level statistics can be retrieved from:
· DBA_ALL_TABLES - (8.X onwards)
· DBA_OBJECT_TABLES - (8.X onwards
· DBA_TABLES - (all versions)
· DBA_TAB_PARTITIONS - (8.X onwards)
· DBA_TAB_SUBPARTITIONS - (8.1 onwards)
Columns to look at are:
NUM_ROWS : Number of rows (always exact even when computed
with ESTIMATE method)
BLOCKS : Number of blocks which have been used even
if they are empty due to delete statements
EMPTY_BLOCKS : Number of empty blocks (these blocks have
never been used)
AVG_SPACE : Average amount of FREE space in bytes in blocks
allocated to the table : Blocks + Empty Blocks
CHAIN_CNT : Number of chained or migrated rows
AVG_ROW_LEN : Average length of rows in bytes
AVG_SPACE_FREELIST_BLOCKS (*)(G) : Average free space of blocks in the freelist
NUM_FREELIST_BLOCKS (*)(G) : Number of blocks in the freelist
SAMPLE_SIZE : Sample defined in ESTIMATE method (0 if COMPUTE)
LAST_ANALYZED : Timestamp of last analysis
GLOBAL_STATS (**) : For partitioned tables, YES means statistics
are collected for the TABLE as a whole
NO means statistics are estimated from statistics
on underlying table partitions or subpartitions
USER_STATS (**) : YES if statistics entered directly by the user
Index level statistics can be retrieved from:
· DBA_INDEXES - (all versions )
· DBA_IND_PARTITIONS - (8.X onwards)
· DBA_IND_SUBPARTITIONS - (8.1 onwards )
Columns to look at are:
BLEVEL : B*Tree level : depth of the index from its root
block to its leaf blocks
LEAF_BLOCKS : Number of leaf blocks
DISTINCT_KEYS : Number of distinct keys
AVG_LEAF_BLOCKS_PER_KEY : Average number of leaf blocks in which each
distinct key appears (1 for a UNIQUE index)
AVG_DATA_BLOCKS_PER_KEY : Average number of data blocks in the table that
are pointed to by a distinct key
CLUSTERING_FACTOR : - if near the number of blocks, then the table is
ordered : index entries in a single leaf block
tend to point to rows in same data block
- if near the number of rows, the table is
randomly ordered : index entries in a single
leaf block are unlikely to point to rows in
same data block
SAMPLE_SIZE : Sample defined in ESTIMATE method (0 if COMPUTE)
LAST_ANALYZED : Timestamp of last analysis
GLOBAL_STATS (**) : For partitioned indexes, YES means statistics
are collected for the INDEX as a whole
NO means statistics are estimated from statistics
on underlying index partitions or subpartitions
USER_STATS (**) : YES if statistics entered directly by the user
PCT_DIRECT_ACCESS (**)(GP) : For secondary indexes on IOTs, percentage of
rows with VALID guess
Column level statistics can be retrieved from:
· DBA_TAB_COLUMNS - (all versions)
· DBA_TAB_COL_STATISTICS - (Version 8.X onwards)
· DBA_PART_COL_STATISTICS - (Version 8.X onwards)
· DBA_SUBPART_COL_STATISTICS - (Version 8.1 onwards)
The last three views extract statistics data from DBA_TAB_COLUMNS.
Columns to look at are:
NUM_DISTINCT : Number of distinct values
LOW_VALUE : Lowest value
LOW_VALUE : Highest value
DENSITY : Density
NUM_NULLS : Number of columns having a NULL value
AVG_COL_LEN : Average length in bytes
NUM_BUCKETS : Number of buckets in histogram for the column
SAMPLE_SIZE : Sample defined in ESTIMATE method (0 if COMPUTE)
LAST_ANALYZED : Timestamp of last analysis
(**)GLOBAL_STATS : For partitioned tables, YES means statistics
are collected for the TABLE as a whole
NO means statistics are estimated from statistics
on underlying table partitions or subpartitions
(**)USER_STATS : YES if statistics entered directly by the user
DBMS_STATS functions and variable definitions
Most of the DBMS_STATS procedures include the three parameters statown, stattab, and statid. These parameters allow you to store statistics in your own tables (outside of the dictionary), which does not affect the optimizer. Therefore, you can maintain and experiment with sets of statistics.
The stattab parameter specifies the name of a table in which to hold statistics, and it is assumed that it resides in the same schema as the object for which statistics are collected (unless the statown parameter is specified). Users may create multiple tables with different stattab identifiers to hold separate sets of statistics.
Additionally, users can maintain different sets of statistics within a single stattab by using the statid parameter, which can help avoid cluttering the user's schema.
For all of the SET or GET procedures, if stattab is not provided (i.e., NULL), then the operation works directly on the dictionary statistics; therefore, users do not need to create these statistics tables if they only plan to modify the dictionary directly. However, if stattab is not NULL, then the SET or GET operation works on the specified user statistics table, and not the dictionary.
Create Stats Table
DBMS_STATS.CREATE_STAT_TABLE (
ownname VARCHAR2,
stattab VARCHAR2,
tblspace VARCHAR2 DEFAULT NULL);
ownname : Name of the schema.
stattab : Name of the table to create. This value should be passed as the stattab parameter to other procedures when the user does not want to modify the dictionary statistics directly.
tblspace : Tablespace in which to create the stat tables. If none is specified, then they are created in the user's default tablespace.
Drop Stats Table
DBMS_STATS.drop_stat_table (
ownname VARCHAR2,
stattab VARCHAR2);
ownname : Name of the schema.
stattab : User stat table identifier.
Gather Schema Stats
DBMS_STATS.gather_schema_stats (
ownname VARCHAR2,
estimate_percent NUMBER DEFAULT NULL,
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT 'FOR ALL COLUMNS SIZE 1',
degree NUMBER DEFAULT NULL,
granularity VARCHAR2 DEFAULT 'DEFAULT',
cascade BOOLEAN DEFAULT FALSE,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
options VARCHAR2 DEFAULT 'GATHER',
objlist OUT ObjectTab,
statown VARCHAR2 DEFAULT NULL);
ownname : Schema to analyze (NULL means current schema).
estimate_percent : Percentage of rows to estimate (NULL means compute): The valid range is [0.000001,100).
block_sample : Whether or not to use random block sampling instead of random row sampling. Random block sampling is more efficient, but if the data is not randomly distributed on disk, then the sample values may be somewhat correlated. Only pertinent when doing an estimate statistics.
method_opt : Method options of the following format (the phrase 'SIZE 1' is required to ensure gathering statistics in parallel and for use with the phrase hidden):
FOR ALL [INDEXED | HIDDEN] COLUMNS [SIZE integer]
This value is passed to all of the individual tables.
degree : Degree of parallelism (NULL means use table default value).
granularity : Granularity of statistics to collect (only pertinent if the table is partitioned).
DEFAULT: Gather global- and partition-level statistics.
SUBPARTITION: Gather subpartition-level statistics.
PARTITION: Gather partition-level statistics.
GLOBAL: Gather global statistics.
ALL: Gather all (subpartition, partition, and global) statistics.
cascade : Gather statistics on the indexes as well.
Index statistics gathering is not parallelized. Using this option is equivalent to running the gather_index_stats procedure on each of the indexes in the schema in addition to gathering table and column statistics.
stattab : User stat table identifier describing where to save the current statistics.
statid : Identifier (optional) to associate with these statistics within stattab.
options : Further specification of which objects to gather statistics for:
GATHER: Gather statistics on all objects in the schema.
GATHER STALE: Gather statistics on stale objects as determined by looking at the *_tab_modifications views. Also, return a list of objects found to be stale.
GATHER EMPTY: Gather statistics on objects which currently have no statistics. also, return a list of objects found to have no statistics.
LIST STALE: Return list of stale objects as determined by looking at the *_tab_modifications views.
LIST EMPTY: Return list of objects which currently have no statistics.
objlist : List of objects found to be stale or empty.
statown : Schema containing stattab (if different than ownname).
Export Schema Stats
DBMS_STATS.export_schema_stats (
ownname VARCHAR2,
stattab VARCHAR2,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL);
ownname : Name of the schema.
stattab : User stat table identifier describing where to store the statistics.
statid : Identifier (optional) to associate with these statistics within stattab.
statown : Schema containing stattab (if different than ownname).
Import Schema Stats
DBMS_STATS.import_schema_stats (
ownname VARCHAR2,
stattab VARCHAR2,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL);
ownname : Name of the schema.
stattab : User stat table identifier describing from where to retrieve the statistics.
statid : Identifier (optional) to associate with these statistics within stattab.
statown : Schema containing stattab (if different than ownname).
Delete Schema Stats
DBMS_STATS.delete_schema_stats (
ownname VARCHAR2,
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL);
ownname : Name of the schema.
stattab : User stat table identifier describing from where to delete the statistics. If stattab is NULL, then the statistics are deleted directly in the dictionary.
statid : Identifier (optional) to associate with these statistics within stattab (Only pertinent if stattab is not NULL).
statown : Schema containing stattab (if different than ownname).