I was inspired around 1994/1995 when I saw my first Oracle graphic monitor at EuropeCar, written in a week or so in TCL by a guy named Jan Simon Pendry - a genius coder. There were other graphic monitors out there, such as Patrol, but I hadn't seen them, and generally they were systems monitors rather than performance monitors and not very good at Oracle performance. (One Patrol experience I had even brought the most massive machine of the time, the Digital Alpha 8400, to its knees because it ran inefficient queries!)
I then created my own monitor in TCL, which at the time, in 1995, was a super amazing language for graphic programming. (Sadly TCL/TK never became popular and thus didn't mature, but it is still pretty decent.) At the time, I didn't know what to monitor, so I monitored everything I could and gave it a simple interface. The interface consisted of a menu giving access to the graphs of several hundred Oracle statistics. I quickly homed in on the Oracle wait events, now known as the "wait interface". The "wait interface" was unknown at the time but has since become the de facto standard (logically enough) for Oracle performance monitoring.
I simplified the interface to only show wait events that had values greater than zero (or some low cutoff). As wait events showed up, I would dynamically display a graph for each one.
The wait interface measures time spent inside the database by active Oracle sessions. I quickly found that the time (and not the count) was what mattered, and total time in particular, so I changed the charts to show only time and normalized all the charts to the same maximum Y axis value. That made it quick to see which chart had the highest value and thus the biggest bottleneck.
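As a rough illustration of the filtering and shared Y axis described above, here is a minimal Python sketch. It is not the original TCL code: the function names, the sample event names and the numbers are all made up for the example, and ranking events by total time is just one reasonable reading of "biggest bottleneck".

```python
from typing import Dict, List, Optional

def active_events(series: Dict[str, List[float]],
                  cutoff: float = 0.0) -> Dict[str, List[float]]:
    """Keep only the wait events whose time ever rises above the cutoff."""
    return {name: vals for name, vals in series.items()
            if vals and max(vals) > cutoff}

def shared_y_max(series: Dict[str, List[float]]) -> float:
    """One Y axis maximum for every chart, so heights are comparable."""
    return max((max(vals) for vals in series.values()), default=0.0)

def biggest_bottleneck(series: Dict[str, List[float]]) -> Optional[str]:
    """Event with the most total time waited (assumed ranking rule)."""
    if not series:
        return None
    return max(series, key=lambda name: sum(series[name]))

# Hypothetical data: time waited per sampling interval, in centiseconds.
sample = {
    "log buffer space": [120.0, 340.0, 410.0],
    "db file sequential read": [80.0, 95.0, 60.0],
    "latch free": [0.0, 0.0, 0.0],
}

shown = active_events(sample)   # "latch free" is dropped, it never waits
y_max = shared_y_max(shown)     # every chart is drawn against this value
top = biggest_bottleneck(shown)
```

With a shared maximum, the chart that reaches closest to the top of its frame is the one to look at first.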
In the above graphic it is clear that the highest line is the "log_buffer_space" graph. This means that the database sessions (or users) are spending most of their processing time waiting on access to the log buffer. The log buffer can be increased in size, improving the throughput. (The above graph comes from Oracle 7 - yes, it's an old graph! As of Oracle 10, it's generally more a question of improving write speeds to the log files on disk to clear out the buffer than of making the buffer bigger.)
In November 1999 I ran into Quest's Spotlight, which had one little graph on the wait interface hidden away deep in the product. This unassuming little graph was a revolution to me. It stacked all the waits on top of each other - a super improvement for compacting the data.
I copied the stacking idea, but added two crucial values
Ironically, I proposed the idea to Quest while I was working there, and they rejected it. Since seeing it in Oracle's OEM, where I successfully pitched the idea and got it implemented, Quest has now implemented it too.
Including CPU along with the wait time in a stacked graph is now done by numerous monitoring tools on the market, such as Quest's Performance Analyzer, Confio's Ignite, and Precise's Indepth/I3.
I then decided it would be better to stack the CPU time rather than overlay it as a line.
This stacking of CPU and waits was the first major step in the new tuning paradigm of active sessions, or more precisely "average active sessions", aka AAS, a term coined by my esteemed colleague John Beresniewicz. I came up with the idea of charting sessions when I was working at Oracle in 2003 on OEM, because no one understood the time measurements in the first version of the graph, which showed centiseconds on the Y axis and wall clock time on the X axis. I tried to come up with a more intuitive value, and that value was sessions. The average active session count is the time spent by active sessions in the database divided by the elapsed wall clock time, so it is the same quantity the chart plots, just in a more intuitive unit.
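To make the unit change concrete, here is a small Python sketch of the arithmetic, under my own assumptions about names and inputs (neither function is an Oracle or OEM API). The first function converts database time in centiseconds into average active sessions, and the second gets the same kind of number by averaging sampled counts of active sessions.

```python
from typing import Sequence

def average_active_sessions(db_time_cs: float, elapsed_s: float) -> float:
    """AAS = time spent by active sessions / elapsed wall clock time.

    db_time_cs: CPU plus wait time summed over all sessions, in centiseconds.
    elapsed_s:  length of the interval, in seconds.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (db_time_cs / 100.0) / elapsed_s

def aas_from_samples(active_counts: Sequence[int]) -> float:
    """Same idea from sampling: mean number of sessions seen active."""
    if not active_counts:
        raise ValueError("need at least one sample")
    return sum(active_counts) / len(active_counts)

# Hypothetical interval: 18000 centiseconds of database time in 60 seconds
# is 180 session-seconds per 60 seconds, i.e. 3 sessions active on average.
aas = average_active_sessions(18000, 60)

# Hypothetical one-per-second samples of the active session count.
sampled = aas_from_samples([2, 4, 3])
```

A value of 3 reads naturally as "three sessions were busy in the database on average", which is much easier to explain than a centisecond total.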
In 2003 I mocked this up for Oracle to use in OEM 10g
Here's what OEM 10g looked like before I got to the team, and the way it looked after the mock-up ideas were implemented:
The page on the left is unreadable at this size, but on the one on the right you can still read what the database is doing. (The green is active sessions on CPU and the blue is sessions on IO, so there is about a 50/50 split between time on CPU and time waiting for IO. The green is well below the dashed red line, max CPU, so Oracle is not saturating the CPU on the machine.)
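The reading of the chart above can be expressed as a tiny check. This is only a sketch: the function and the class labels are hypothetical, and it assumes the dashed "max CPU" line corresponds to the number of CPUs on the machine.

```python
from typing import Dict

def read_load_chart(aas_by_class: Dict[str, float], cpu_count: int) -> Dict[str, object]:
    """Summarize a stacked AAS chart at one point in time.

    aas_by_class: average active sessions per class, e.g. CPU and IO.
    cpu_count:    assumed position of the dashed "max CPU" line.
    """
    total = sum(aas_by_class.values())
    cpu = aas_by_class.get("CPU", 0.0)
    return {
        "total_aas": total,
        "cpu_share": cpu / total if total else 0.0,
        "cpu_saturated": cpu >= cpu_count,
    }

# Hypothetical numbers matching the description: half CPU, half IO,
# on a machine with 4 CPUs.
summary = read_load_chart({"CPU": 1.0, "User I/O": 1.0}, cpu_count=4)
```

Here `summary["cpu_share"]` is 0.5 and `summary["cpu_saturated"]` is false, which is the "50/50 split, green well below the red line" situation.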
Attempts were made to compact the data further so that a user didn't have to scroll (the red line indicates the bottom of a typical screen)
but were not deemed important.
I further improved the page in 2004 by adding aggregates of top SQL and top session. The aggregates are done for the area defined by the grey rectangle at the right on the average active session chart (at the top). This grey rectangle is a slider window. You can click and slide it to look at spikes or different areas on the chart.
Now I can see the load on the database and where the load is coming from.
The slider is over a small peak in CPU usage (green) and to the bottom left I can see one SQL statement uses almost all my CPU, and voila the culprit - super easy to see and find.
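The aggregation behind the slider window can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the OEM implementation: the `Sample` record, its field names and the data are hypothetical, and each sample stands for one session seen active at one sampling instant.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple

class Sample(NamedTuple):
    t: float          # sample time, seconds from the start of the chart
    session_id: int   # session that was active
    sql_id: str       # statement it was running
    state: str        # "CPU" or a wait class such as "User I/O"

def top_sql(samples: Iterable[Sample], start: float, end: float,
            n: int = 5) -> List[Tuple[str, int]]:
    """Rank SQL by active-session samples inside the window [start, end)."""
    counts = Counter(s.sql_id for s in samples if start <= s.t < end)
    return counts.most_common(n)

# Hypothetical samples: statement "a" dominates the window from 10 to 20.
history = [
    Sample(5.0, 11, "b", "User I/O"),   # outside the window
    Sample(10.0, 11, "a", "CPU"),
    Sample(11.0, 11, "a", "CPU"),
    Sample(12.0, 12, "b", "User I/O"),
    Sample(13.0, 11, "a", "CPU"),
]

ranking = top_sql(history, start=10.0, end=20.0)
```

Moving the slider just changes `start` and `end`; swapping `sql_id` for `session_id` in the counter gives the top session list instead.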
Suggestions for user interfaces to show the estimated impact of changing the data cache size on session load:
The above mockup was an idea for configuring the size of the buffer cache and seeing the effect it would potentially have on AAS due to more or less IO read time.
In 2006 I left Oracle to work on a lightweight version of active session monitoring at Embarcadero. The active session monitoring in Oracle OEM was a huge improvement, but OEM was still heavy, complicated and slow. I yearned for something small, lightweight and fast!
The project was put on hold just before shipping, so I left to pursue independent consulting.
I wrote my own revised free version of the TCL monitor based on these ideas, but not getting paid had its drawbacks, so in the summer of 2008 I joined Embarcadero full time and am super excited about what we've produced: DB Optimizer!
This time Embarcadero took the project on wholeheartedly, and by July 2008 we had our first version of DB Optimizer out!
With DB Optimizer there is much more access to data and increased flexibility. The selection window on the load chart can be sized to any width from 5 seconds to hours, and it can be filtered for user processes or background processes, not to mention by program, module, action, etc. The Top Activity area below the chart includes not only SQL and Session but also top events on the database and top Object IO. Plus, there is a whole detail section below giving details on any item clicked on in the Top Activity section. It's fast and lightweight, with no database install or agents. All data displayed can be saved into a flat file and emailed to a DBA across the world, who can analyze it without access to the original database, and it works on Oracle, DB2, Sybase and SQL Server.
The same designs have been adopted by other tools in the market such as Quest Performance Analyzer, Embarcadero’s DB Optimizer, Ashviewer, OEMlite, and Lab128.