Examples‎ > ‎

Tuning - select with outlier (oracle)

from Richard Foote's example

drop table hist_test;

create table hist_test (
         id1 number, 
         id2 number, 
         id3 number);

insert into hist_test (id1, id2, id3) 
                    select rownum, 
       from dual connect by level <= 1000000;

update hist_test set id1=1000000000 where id1=1000000;

alter table hist_test add primary ky(id1);


  Database doesn't  know that there is only one extreme  outlier  thus the database thinks that 

             select * from hist_test where id1 > 1000000;

  will return the majority of rows based on the bad assumption that the values are evenly distributed between

                1 to 1,000,000,000 

  thus the optimizer thinks the predicate 

               id1  > X

   will return  

                 100% * ( 1 -  (  X / total_number_rows )  = 100% * ( 1 - ( 1,000,000/  1,000,000,000 ) = 99.9 % of the rows
and the optimizer then thinks it's cheaper to do a FULL TABLE SCAN , but that's based on the false assumption  of "Even Distribution"

The actual distribution is all grouped together except for one outlier, so in our case the distribution looks like

The actual result of 

             select * from hist_test where id1 > 1000000;

only returns 1 row out of 1,000,000 which is 

              100% * 1/1000000 or 0.0001%  of the rows

And the optimizer should have accessed the data via an INDEX RANGE SCAN based on correct statistics from histogram