ASH stands for "active session history" and was officially introduced in Oracle 10g; Oracle 11 builds upon that foundation. ASH is a radically new way to gather performance data. "Radical?" you might ask. Yes, radical, because ASH only samples data once a second, and what happens between samples is lost. The lost data bothers many at first, but this unease quickly passes once the power of ASH is understood. ASH provides the information necessary to solve some of the toughest performance problems. Before ASH this information was often too difficult and too expensive to get. Previous performance gathering techniques such as STATSPACK (continued with the even more expensive AWR, whose report is almost the same as STATSPACK's) lacked the information to solve many performance bottlenecks. Session tracing provides much of the same information as ASH but is much more costly and has to be set up before a problem arises, which makes it impractical in many situations. Starting in Oracle 10g, ASH is always running, sampling every second, and its data is saved on disk for a week (configurable), thus providing the data needed to identify and solve a problem that may have lasted only seconds and occurred days ago.
ASH is a way of sampling the state of sessions connected to an Oracle database in order to monitor database load as well as drill down into any performance issues that may arise. ASH is a technology that can be applied to any system with connected users, such as another database like SQL Server, an operating system, or even applications and application servers. Oracle is the first system that I know of to use ASH to collect performance data. By default, ASH on an Oracle database samples once every second and logs some 30 to 60 (depending on version) pieces of information on any session that is active at the time of sampling. One hurdle to accepting ASH as a model had been the question "how fast do we have to sample to get worthwhile data?" There has been a tendency (I admit having felt this) to want to sample super fast, like 10-100 times a second (easily possible with a C program), but experience has shown that most monitoring and problem resolution can be easily accomplished with 1 second sampling. Higher sampling rates serve only rare cases that are better left to tracing. For 99.9% of the time, sampling once a second is sufficient both to show clearly the load on the system and to provide detailed information on system activity that would otherwise be too difficult or prohibitively expensive to collect (due to the load caused by the data collection itself).
ASH is the collection of active session states, sampled every second. An active session is any session that is currently in a call to the database.
The commitment that Oracle made to ASH required a break from the compulsive quest to gather every possible statistic at 100% accuracy, which was a futile and limiting strategy. By letting go of the drive to collect everything, all the time, accurately, Oracle was able to collect more information, more powerfully, and with less overhead. How can less be more? ASH grew from the understanding that the most powerful performance data is obtained not by collecting everything but by collecting the most important information in a particular manner.
Instead of collecting exact statistics, ASH produces a statistical approximation of the statistic counters. Every second, ASH samples the session state of all active sessions as well as the SQL each session is executing. This sampling produces a statistical approximation that is cheaper to collect, and the resulting multidimensional data allows new and previously impossible diagnostics.
Intractable Performance Data Collection Problem
Prior to ASH, Oracle performance collection tried to gather as many statistics as often as possible.
This produced a couple of problems:
2. Overwhelming amount of data
For example, database tools tried to collect statistics on all the Sessions, SQL and Objects in the database.
Session information came from v$session.
Collecting this information amounted to a massive amount of work. For example, on a system with 150 sessions the number of values to collect would be 150 sessions x (800 wait events + 200 statistics) = 150,000 values at every collection! That's just session information. For SQL, there could be tens of thousands of statements, and for each statement we might want to collect a couple dozen statistics. The same goes for objects: there could be thousands of objects, and for each we'd want to collect a dozen statistics. The problem quickly becomes intractable.
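The arithmetic above can be checked with a short sketch. The session, wait event and statistic counts come from the text; the SQL and object counts are illustrative assumptions picked from the ranges mentioned (tens of thousands of statements, thousands of objects), not measurements.

```python
# Values gathered per collection under the "collect everything" approach.
sessions = 150
wait_events = 800
session_statistics = 200
session_values = sessions * (wait_events + session_statistics)

# Illustrative assumptions, not measured figures.
sql_statements = 20_000   # "tens of thousands"
stats_per_sql = 24        # "a couple dozen"
objects = 5_000           # "thousands of objects"
stats_per_object = 12     # "a dozen"

sql_values = sql_statements * stats_per_sql
object_values = objects * stats_per_object
total_values = session_values + sql_values + object_values

print("session values:", session_values)
print("SQL values:    ", sql_values)
print("object values: ", object_values)
print("total values:  ", total_values)
```

Even with these modest assumptions, the SQL statistics outweigh the session statistics several times over, which is why filtering on active sessions matters so much.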
There are optimizations of the collection that allowed some people to doggedly try to collect these kinds of statistics. For example, for Sessions, SQL and Objects we could filter out any statistics that were zero, but any statistic that wasn't zero would still have to be collected because we wouldn't know whether it had changed since the last collection. For sessions, Oracle does have a counter that tells whether the session has been active since we last collected, but SQL and Objects don't have any such counter.
How about other things like v$filestat, etc etc?
The solution? Every second, query v$session, which is an inexpensive view onto the actual C structures in Oracle shared memory. Only collect information for sessions that are active at the time of collection. Collecting only active sessions is a natural filter for everything we want to filter. It not only filters out unwanted sessions, but also serves as a guide to the SQL, Objects and Files we want to collect information on. We gather not only session information and state, but also what SQL the sessions were executing and which files and objects they were accessing, homing in directly on the information of most interest to us.
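The idea can be sketched with a toy sampler. This is only an illustration of "sample the active sessions once a second", not Oracle's implementation: the session list is an in-memory stand-in for v$session, and every field name here is hypothetical.

```python
# Toy stand-in for v$session: one dict per connected session (hypothetical fields).
sessions = [
    {"sid": 1, "active": True,  "state": "CPU",  "sql_id": "sql_a", "object": "ORDERS"},
    {"sid": 2, "active": False, "state": "IDLE", "sql_id": None,    "object": None},
    {"sid": 3, "active": True,  "state": "IO",   "sql_id": "sql_b", "object": "ITEMS"},
    {"sid": 4, "active": False, "state": "IDLE", "sql_id": None,    "object": None},
]

def take_sample(sample_time, sessions):
    """Return one history row per session that is active at sample_time."""
    return [
        {
            "sample_time": sample_time,
            "sid": s["sid"],
            "state": s["state"],
            "sql_id": s["sql_id"],
            "object": s["object"],
        }
        for s in sessions
        if s["active"]  # idle sessions are never recorded
    ]

# Three one-second samples; a real sampler would sleep between them.
history = []
for second in range(3):
    history.extend(take_sample(second, sessions))

print(len(history), "rows for", len(sessions), "sessions over 3 samples")
```

Only the two active sessions ever produce rows, and each row already carries the SQL and object of interest, so no separate sweep over all SQL or all objects is needed.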
Intelligently Collects Data
Old methods:
Sample Every Second
Knowing everything – impossible and overwhelming
Sampling is like taking a motion picture. We miss what happens between frames, but we get enough to know what happened.
If it happens a lot or for long ... it will be captured by ASH.
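A rough way to quantify "a lot or for long" is sketched below. It assumes a fixed one-second sampling interval, events that start at a random moment relative to the sampling clock, and independent occurrences; these are simplifying assumptions for illustration, not properties stated by Oracle.

```python
def capture_probability(duration_s, interval_s=1.0):
    """Chance that at least one sample lands inside a single event."""
    return min(1.0, duration_s / interval_s)

def expected_samples(duration_s, occurrences=1, interval_s=1.0):
    """Average number of samples that land inside the events."""
    return occurrences * duration_s / interval_s

def chance_of_any_capture(duration_s, occurrences, interval_s=1.0):
    """Chance that at least one of many independent events is sampled."""
    p = capture_probability(duration_s, interval_s)
    return 1.0 - (1.0 - p) ** occurrences

# A single 10 ms wait is rarely seen, a 5 second wait always is,
# and a 10 ms wait that happens 1000 times is almost certainly seen.
print(capture_probability(0.01))
print(capture_probability(5))
print(chance_of_any_capture(0.01, occurrences=1000))
print(expected_samples(0.01, occurrences=1000))
```

The expected number of samples is proportional to total time spent, which is exactly why sample counts can stand in for time when ranking what the database is busy with.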
Wait events can be classified into 4 major groups
On Oracle databases the classification is easy thanks to a view called V$EVENT_NAME, which has a field "WAIT_CLASS" that tells the kind of wait.
One problem with Oracle's wait interface when it was introduced in version 7 was that there was no documentation on the wait events and, to make matters worse, many of the wait events were "idle", i.e. they meant that the processes had no work to do and were just waiting. For example, if I run SQL*Plus but don't actually run any queries, then my session in the database will report that it's waiting for "SQL*Net message from client", meaning the session is ready and waiting to execute queries but I'm not giving it any. Once I submit a query, my state changes to either running on the CPU or some actual non-idle wait event, like waiting for IO. But once my query finishes executing and the results are given back to the user in SQL*Plus, the session state in the database goes back to "SQL*Net message from client". The wait event "SQL*Net message from client" is only one of many idle wait events that don't signify bottlenecks, but if I don't know they are idle events, it might look like my database has huge bottlenecks. This confusion between real wait events and idle wait events, together with the lack of documentation, slowed the adoption of the wait interface by DBAs and performance analysts. Luckily it's now easy to get the list of idle wait events and create a list of events to ignore with the query:
If on Oracle 9i or below, you can install STATSPACK, which has a table STATS$IDLE_EVENT that lists the idle events. These are events we can ignore when looking at overall database load. They can be relevant when looking at one session. For example, if a session is supposed to be running and we see lots of idle waits, it's a signal that the application is using the database inefficiently. If the application is inserting thousands of rows but inserting them one at a time, then a lot of time will be spent on communication between the application and the database session. In this case we will see the session spending a lot of time waiting on idle events when we'd expect it to be on CPU. The solution in that case would be to use batch processing. For example:
Batching vs Single row operations
Single row operations, lots of communication, slowest
Send all the info over the wire to the database, one communication
Send all the info over the wire to the database, one communication, bulk insert
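The difference between the approaches can be sketched with a simple cost model. This is an illustrative assumption, not a benchmark: each round trip to the database is given a fixed latency and each row a fixed amount of database work, and the function names and figures are hypothetical.

```python
def row_by_row_cost_ms(rows, round_trip_ms, per_row_work_ms):
    """One network round trip for every inserted row."""
    return rows * (round_trip_ms + per_row_work_ms)

def batched_cost_ms(rows, round_trip_ms, per_row_work_ms, batch_size):
    """One network round trip per batch of rows."""
    batches = -(-rows // batch_size)  # ceiling division
    return batches * round_trip_ms + rows * per_row_work_ms

rows = 10_000
single = row_by_row_cost_ms(rows, round_trip_ms=1.0, per_row_work_ms=0.05)
batched = batched_cost_ms(rows, round_trip_ms=1.0, per_row_work_ms=0.05, batch_size=1000)

print(f"row by row: {single:.0f} ms, of which {rows * 1.0:.0f} ms is communication")
print(f"batched:    {batched:.0f} ms")
```

In the row-by-row case nearly all of the elapsed time is communication, which is what shows up in the database as the session sitting on an idle wait between calls.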
CPU is a special case of "wait events". There is no wait event called "CPU", and when we are on CPU we aren't considered to be waiting but actually working. Working is considered a good thing (assuming the work is worth getting done). We might or might not be doing efficient work, but that's a different discussion. CPU is nevertheless one state that a session can spend its time in, so the CPU state belongs in the list of "wait events" or, better put, "session states". CPU is pivotal in performance analysis because the importance of time waited is relative to how much time we spent on the CPU. If we spent little time both on CPU and waiting, then we weren't doing much. If we spent a lot of time on CPU and little waiting, then the waits don't matter, but if we spent a lot of time waiting and little on CPU, then we have an opportunity to improve throughput and response time.
How do we know if a session is on CPU? Oracle doesn't report any such session state directly but it does give us enough information to deduce it. In V$SESSION (or V$SESSION_WAIT before 10g) these fields let us know
Both of these fields are in V$SESSION starting in 10g. Before 10g V$SESSION_WAIT has to be joined to V$SESSION to get both of these fields.
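One commonly used deduction rule is sketched below. It assumes the two fields are STATUS and WAIT_TIME, with WAIT_TIME equal to 0 while the session is currently waiting and non-zero once the last wait has finished; that choice of fields is an assumption made for this illustration. The sketch also ignores the refinement of treating waits on idle events as idle.

```python
def session_state(status, wait_time):
    """Classify a session from (assumed) STATUS and WAIT_TIME values."""
    if status != "ACTIVE":
        return "IDLE"      # not in a database call
    if wait_time == 0:
        return "WAITING"   # currently inside a wait event
    return "ON CPU"        # active and not waiting, so deduced to be on CPU

# Hypothetical rows: (sid, status, wait_time)
rows = [
    (101, "ACTIVE", 0),
    (102, "ACTIVE", -1),
    (103, "INACTIVE", 0),
    (104, "ACTIVE", 3),
]
for sid, status, wait_time in rows:
    print(sid, session_state(status, wait_time))
```

The point is that "on CPU" is never reported directly; it is what remains for an active session that is not currently in a wait.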
IO is a wait, but it is a wait that all databases have to do at some point in time. Data has to be read off disk, and reading off disk causes IO wait time. Whether IO wait time is acceptable or not really depends on the application, but we can make some general observations, such as that an optimized disk subsystem should be able to deliver reads in 10 ms. If it does not, and the system spends a lot of time waiting for IO, then there is an opportunity to improve throughput. IO waits break down into groups, such as IO done by sorts that overflow memory buffers and are written to disk in the temporary tablespace and read back out again. If we see this happening we can investigate increasing memory sort area sizes. Finally, we can check the db cache advice and see if there would be any benefit in changing the buffer cache size. Once these areas have been checked, it really comes down to investigating the SQL statements and seeing whether they can be optimized, either by changing the SQL or adding structures such as indexes, or whether the application logic or architecture can be optimized.
In general if we see other waits and they are important relative to the amount of CPU time spent then we have a clear opportunity for performance tuning.
Waits are classified by WAIT_CLASS
There are over 800 waits, though almost 75% of them are "Other", i.e. they shouldn't happen in normal circumstances.
Not only does Oracle collect the active sessions, their session state and the SQL they are executing, but Oracle also collects a lot more information and exposes it in a view, V$ACTIVE_SESSION_HISTORY.
Looks like a lot of fields! Let's break it down into bite-sized logical pieces:
Of these fields the basic information is
ASH is multidimensional data that can be grouped in many ways.
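As a minimal sketch of what grouping in many ways means, the same set of samples can be rolled up by SQL or by session with one helper. The sample rows and field names below are hypothetical stand-ins for ASH rows, and the three states (CPU, WAITING, IO) mirror the column headings used in the reports that follow.

```python
from collections import Counter, defaultdict

# Hypothetical samples: one row per active session per second.
samples = [
    {"sid": 10, "sql_id": "sql_a", "state": "CPU"},
    {"sid": 10, "sql_id": "sql_a", "state": "IO"},
    {"sid": 11, "sql_id": "sql_a", "state": "CPU"},
    {"sid": 12, "sql_id": "sql_b", "state": "WAITING"},
    {"sid": 10, "sql_id": "sql_a", "state": "CPU"},
    {"sid": 12, "sql_id": "sql_b", "state": "IO"},
]

def top_by(samples, key):
    """Count samples per state for each value of `key`, busiest first."""
    totals = defaultdict(Counter)
    for row in samples:
        totals[row[key]][row["state"]] += 1
    return sorted(totals.items(), key=lambda item: -sum(item[1].values()))

for dimension in ("sql_id", "sid"):
    print(f"{dimension:>8} {'CPU':>4} {'WAITING':>8} {'IO':>4} {'TOTAL':>6}")
    for value, counts in top_by(samples, dimension):
        print(f"{value!s:>8} {counts['CPU']:>4} {counts['WAITING']:>8} "
              f"{counts['IO']:>4} {sum(counts.values()):>6}")
```

Because every sample represents roughly one second of activity, the totals can be read as approximate seconds of database time per SQL statement or per session.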
Top Waiting Session in last 5 minutes
SQL_ID CPU WAITING IO TOTAL
STATUS SESSION_ID NAME PROGRAM CPU WAITING IO
Note on Active or Waiting
ASH family of tables
Circular Buffer - 1M to 128M (~2% of SGA)
Flushed every hour to disk or when buffer 2/3 full (it protects itself so you can relax)
Avg row around 150 bytes
3600 secs in an hour
~ ½ Meg per Active Session per hour
That’s generally over an hour of ASH
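The sizing figures above can be worked through as follows. The 150 byte row size and one sample per second come from the text; the buffer size and load used in the example call are illustrative assumptions.

```python
AVG_ROW_BYTES = 150
SECONDS_PER_HOUR = 3600

# One row per active session per second: about half a megabyte per hour.
bytes_per_active_session_hour = AVG_ROW_BYTES * SECONDS_PER_HOUR

def hours_of_history(buffer_bytes, avg_active_sessions):
    """Approximate hours of samples that fit in a buffer of the given size."""
    if avg_active_sessions <= 0:
        raise ValueError("avg_active_sessions must be positive")
    return buffer_bytes / (bytes_per_active_session_hour * avg_active_sessions)

print(bytes_per_active_session_hour, "bytes per active session per hour")

# Illustrative assumption: a 16 MB buffer with 10 sessions active on average.
print(round(hours_of_history(16 * 1024 * 1024, avg_active_sessions=10), 1), "hours")
```

The amount of history held in memory depends on how busy the system is, not on how many sessions are connected, since only active sessions produce rows.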
Dumping ASH to flat file
level 5 = # of minutes
loader file rdbms/demo/ashldr.ctl
statistics_level = Typical (default)
10.2 added fields to ASH
Blocking Session !
connect to ALL_PROCEDURES with
How much data does ASH collect?
1 CPU means max 1 Avg Active Session unless there is a bottleneck
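Average Active Sessions can be estimated directly from the number of samples, as in the sketch below. It assumes one sample per active session per second; the `sample_interval_s` parameter is a hypothetical knob for a history that keeps samples at a coarser interval, and the sample counts in the example are made up.

```python
def average_active_sessions(sample_count, elapsed_seconds, sample_interval_s=1):
    """Each sample stands for sample_interval_s seconds of one active session."""
    return sample_count * sample_interval_s / elapsed_seconds

def describe_load(aas, cpu_count):
    """Compare the load against the number of CPUs, as in the rule of thumb above."""
    if aas <= cpu_count:
        return "within CPU capacity"
    return "more active sessions than CPUs: some sessions must be waiting"

# Made-up example: 5400 samples collected over one hour on a 1 CPU machine.
aas = average_active_sessions(5400, elapsed_seconds=3600)
print(aas, describe_load(aas, cpu_count=1))
```

A value above the CPU count does not say what the bottleneck is, only that on average some sessions could not have been on CPU; breaking the samples down by state answers the rest.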
ASH is now at 93 fields in 11gR2, starting from an original 30 in 10gR1
Here is a spreadsheet across 10.1.0, 10.2.0.1, 10.2.0.3, 11.1 and 11.2
for 11gR2 documentation see:
10.1 10.2 10.2.0.3 11.1 11.2
aveact.sql - day of data, 1 line per hour
aveactn.sql - day of data, 1 line per hour, shows the top two waits as well as a graph
aveactf.sql - same as aveact.sql but at a finer frequency (1 hour of data, 1 line per minute)
aveactnf.sql - same as aveactn.sql but at a finer frequency (1 hour of data, 1 line per minute)