Optimizer Mistakes

 

Distribution

  
Querying values greater than the "clump" can cause a full table scan if there are no histograms: without a histogram the optimizer assumes the values are spread uniformly between the column's low and high values, so a range predicate just above the clump is costed as if it returned most of the table.
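
A minimal sketch of the effect (table and index names are made up, not from the thread):

create table t_clump as
select rownum as id,
       case when rownum <= 99000 then 100 else rownum end as val
from dual connect by level <= 100000;

create index t_clump_ix on t_clump (val);

-- SIZE 1 = no histogram on any column
exec dbms_stats.gather_table_stats(user, 'T_CLUMP', method_opt => 'FOR ALL COLUMNS SIZE 1')

-- Only the low/high values (100 and 100000) are known, so the optimizer
-- assumes a uniform spread and costs "val > 100" as if it returned most
-- of the 100,000 rows, favoring a full table scan, although the
-- predicate really matches only the 1,000 rows above the clump:
select * from t_clump where val > 100;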



Values Out of Bounds


John Kanagaraj
Be aware that bind peeking occurs even when histograms are not present
- changing LO/HI values also causes this, and given the fact that the
right plan was chosen when you flushed this from the shared pool, it
is a probable cause. How are you collecting your stats, and what is
the rate of change in lo/hi values? Investigate this line of thought
and you will probably arrive at the root cause of the problem. We would
appreciate it if you found anything and posted it back to the list.
Stephane Faroult
To add to what John said, some strange things also happen when
values fall OUTSIDE the hi/lo range. I saw something very similar
about a year ago. Touching the code was forbidden, so it ended up with an
outline. Values outside the known range suddenly become super-selective
for the optimizer. Add the fact that a 2-column index is probably
smaller (fewer blocks to inspect) than a 3-column index, a pinch of
clustering factor, and the choice of the "wrong" index may have been
perfectly logical when the query was parsed.

HTH
SF
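
A minimal sketch of the out-of-range effect Stephane describes (names and numbers are assumed for illustration):

create table t_range as
select rownum as id, rpad('x', 100, 'x') as pad
from dual connect by level <= 10000;

create index t_range_ix on t_range (id);

exec dbms_stats.gather_table_stats(user, 'T_RANGE')

-- Column stats record low/high as 1 and 10000. For a predicate beyond
-- the known high value, 10g-era optimizers scale the estimate down
-- linearly with the distance from the range, so the further out of
-- bounds the value, the more "super-selective" it looks:
select * from t_range where id = 12000;   -- estimated rows decay toward 1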


Index with Nullable fields not used

Fast Full Index Scan
When a query selects only columns that are all in an index, the index still cannot be used (e.g. via a fast full index scan) if all of the indexed columns are nullable: rows whose indexed columns are all NULL are not stored in a B-tree index, so the index is not guaranteed to cover every row.
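
A minimal sketch (assumed names):

create table t_nullable (col1 number);            -- nullable column
create index t_nullable_ix on t_nullable (col1);

-- Keys that are entirely NULL are not stored in a B-tree index, so the
-- index is not guaranteed to contain every row; the optimizer must use
-- a full table scan even though only the indexed column is queried:
select col1 from t_nullable;

-- With a NOT NULL constraint every row must appear in the index and an
-- INDEX FAST FULL SCAN becomes a legal (and usually cheaper) choice:
alter table t_nullable modify col1 not null;
select col1 from t_nullable;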


NLS Settings

Martin Berger <martin.a.berger@gmail.com>

create table ue (no number, na varchar2(2));
CREATE UNIQUE INDEX PK_ue ON ue (na);
alter table ue add constraint pk_ue primary key (na);
insert into ue values (58, 'Xx');
insert into ue values (75, 'uu');
commit;
set autotrace on 
alter session set nls_sort='binary';
select /*+ INDEX(UE) */ na, dump(na, 16) d  from ue order by na;
alter session set nls_sort='german';
select /*+ INDEX(UE) */ na, dump(na, 16) d  from ue order by na;

nls_sort='binary':

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     2 |     6 |     1   (0)| 00:00:01 |
|   1 |  INDEX FULL SCAN | PK_UE |     2 |     6 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

nls_sort='german':

---------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     2 |     6 |     2  (50)| 00:00:01 |
|   1 |  SORT ORDER BY     |       |     2 |     6 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN  | PK_UE |     2 |     6 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

A B-tree index stores keys in binary order, so with nls_sort='binary' the INDEX FULL SCAN already returns rows in the requested order. With a linguistic sort such as nls_sort='german' the index order no longer matches the ORDER BY, and the optimizer has to add a SORT ORDER BY step on top of the same index scan.

Small set of high-frequency values in a large set of low-frequency values

David Kurtz
10.2.0.4 on Linux. 166,067 rows, 50,060 distinct values in an indexed column. 50,000 values have 3 rows each; 60 values have over 100 rows each. When I gather stats for all columns size auto I do not get a histogram on this column. I have to force gather_table_stats to create a histogram on this column (I've tried 25 to 254 buckets). However, the optimizer doesn't seem to use the histogram in calculating the cost:

select /*+GATHER_PLAN_STATISTICS*/ count(*) from psroleuser where roleuser = 'HVA_FUNC'

---------------------------------------------------------------------------------------------
| Id  | Operation         | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE   |               |      1 |      1 |      1 |00:00:00.01 |       4 |
|*  2 |   INDEX RANGE SCAN| PS_PSROLEUSER |      1 |      3 |    169 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------

I have another version of this table on a test database, but the volume is slightly smaller (26,000 rows, 3,300 distinct values). That one uses the histogram exactly as I would expect, the estimated number of rows is close to reality, and the execution plan of the complex query changes.
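
(A sketch of the forced histogram gather David refers to; the bucket count is assumed:)

exec dbms_stats.gather_table_stats(user, 'PSROLEUSER', method_opt => 'FOR COLUMNS ROLEUSER SIZE 254')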
Greg Rahn
select /*+ cardinality(psroleuser 169) */ * from psroleuser where roleuser ='HVA_FUNC'
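
Note that the undocumented cardinality hint does not make the optimizer read the histogram; it simply replaces the row estimate for psroleuser with the supplied value (here the actual 169 rows), which is why it works as a targeted workaround.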
Alex Gorbachev
150,000 rows (50K * 3) of non-skewed data, and the rest (16,067 rows) is the skewed data
Uri Shaft
You can only use height-balanced histograms to estimate selectivity that is at least 1% (for 254 buckets) with any degree of accuracy.
Wolfgang Breitling
How about fixing num_distinct and density such that the standard equality selectivity formula gives the correct value of 3 rows for the ~50,000 unpopular values, and creating a custom histogram for the 60 values with more than 100 rows?
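
A sketch of the first half of Wolfgang's suggestion, using dbms_stats.set_column_stats with the table/column names from David's post (the custom histogram for the 60 popular values would additionally need dbms_stats.prepare_column_values):

declare
  l_distcnt number;
  l_density number;
  l_nullcnt number;
  l_srec    dbms_stats.statrec;
  l_avgclen number;
begin
  -- read the current column stats so that only density is changed
  dbms_stats.get_column_stats(
    ownname => user, tabname => 'PSROLEUSER', colname => 'ROLEUSER',
    distcnt => l_distcnt, density => l_density, nullcnt => l_nullcnt,
    srec    => l_srec,    avgclen => l_avgclen);
  -- with num_rows = 166,067 a density of 3/166067 makes the standard
  -- equality estimate (num_rows * density) come out at the desired 3 rows
  dbms_stats.set_column_stats(
    ownname => user, tabname => 'PSROLEUSER', colname => 'ROLEUSER',
    distcnt => l_distcnt, density => 3/166067, nullcnt => l_nullcnt,
    srec    => l_srec,    avgclen => l_avgclen);
end;
/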
