Errorstack tracefiles are very useful for troubleshooting ORA-600's, crashes, hangs and even just bad performance.
Errorstack dumps are dumped automatically by Oracle when critical errors such as ORA-600 happen. This is when you see an error reported in alert.log and the generated tracefile has a "ksedmp: internal or fatal error" string in its beginning, followed by the error code (usually ORA-7445 or ORA-600 with some parameters).
"ksedmp" means Kernel Service Error DuMP, so everything below that line is the errorstack dump.
Errorstack dumps can also be manually invoked by issuing ORADEBUG ERRORSTACK 3 (when being connected to the target process using ORADEBUG SETOSPID). This can be useful when a session seems to be hung (but without showing a reasonable wait event in V$SESSION_WAIT) or is consuming much more resources than normally and you want to know which exact bind variable values are currently used for executing the SQL.
Errorstack tracefile has a lot of information it, some more and some less useful for practical troubleshooting. The vast amount of data and the binary dumps may scare you away, but I hope to show that there's some well understandable structure in there.
In this article we will go through following techniques:
I will paste the relevant parts of a sample errorstack dump wherever relevant.
Reading the current executing PL/SQL package and PL/SQL source code line number from errorstack trace
If the process which dumped the tracefile was in a PL/SQL call, then the PL/SQL call stack is also dumped (under the PL/SQL Call Stack section). This shows you exactly in which PL/SQL procedure this problem happened and which procedure calls led Oracle to the final procedure where the execution was when the dump was done.
The PL/SQL call stack (bold/italic above) reports 3 columns:
To recap, the above little demo PL/SQL call stack consists of 2 lines only:
You should read a PL/SQL call stack from bottom up:
You see, the line 8 reported above is where the PL/SQL execution currently was (it invoked the recursive DELETE query which was still running).
Simply, the PL/SQL errorstack says exactly where in the PL/SQL code Oracle was executing when this error condition, dump, crash, hang, or whatever was the problem, happened (top line was the code execution currently was and the line(s) below show the "parent" functions which called their child function seen on the above line).
If the stack thing doesn't make sense right now, bear with me, I will write a separate note about it.
I will also explain the "Call Stack Trace" part in another note as it's a longer topic.
Let say this:
In Oracle 11.2, the Real-Time SQL Monitoring feature makes things easier. There's a new column BIND_XML in V$SQL_MONITOR, which contains the bind variable values of queries running long enough. Accessing that view however requires you to have Diagnostics and Tuning pack license. You can read more about this feature here.
But let's get back to the original article, not everyone is running on Oracle 11.2 and not everyone has Diag+Tuning pack licenses.
There's a reason why it's non-trivial to get the bind values - the bind variable values live in the private memory of the processes. So you can't easily go and read the private memory of another process!
That's why we need to use ORADEBUG to connect to the problem process and send a signal to it so it would dump the errorstack, which luckily includes CURSORDUMP, which includes bind variable values!
The red number 8213 in the below example is the OS process ID (V$PROCESS.SPID) of the target, problem session which I want to dump:
Now that target process should have written the errorstack to UDUMP (or BDUMP if its a background process). Note that you probably don't want to connect to background processes with ORADEBUG unless you really know what you're doing!
Of course, when you are diagnosing a crash or ORA-600 (which has already happened), then you don't need to use oradebug, as the crash handler should have dumped the errorstack with cursordump to a tracefile already.
So, lets open the same tracefile (beginning shown only).
When you start scrolling through all the (binary) data which comes below, you may go crazy as there's too much (incomprehensible) data there. So what's the systematic way to reliably navigate to the actual bind variables used?
What I usually do, is search for first occurrence of "Session Cursor Dump". Note that the case matters here.
When I search for it, I will see the following output (again I only paste some here, starting from the Session Cursor Dump I searched for):
Here you go, the Current cursor: 6 says that Oracle was currently executing Cursor#6 in the open cursor array in that session's UGA (the max array size is set by open_cursors parameter).
So, in order to find the bind variables of currently executing query, all I need to do is forward-search for Cursor#6 in this tracefile. Note that the case and the "missing" space between Cursor and #6 is intentional:
And here you go:
I will paste here again few last lines of the last output:
Under the cursor frame dump you see a bunch of lines starting with kxsc, which means Kernel eXecution Shared Cursor. In each line you see a field called siz, which tells the private memory allocation size for that particular allocation type (or what's the internal allocation size, to be correct).
You can just sum all these sizes together to determine a cursors memory usage.
Here are the meanings (which I have guessed) for the interested:
Finally, even though ORADEBUG is a widely used tool by Oracle support - be careful with it (for example, search what Metalink has to say about ORADEBUG DUMP ERRORSTACK) etc. If you are unsure, ask Oracle Support for "blessing" first.
I never connect to critical background processes such as LGWR, DBWR with oradebug unless I'm prepared to that the instance may crash. However these techniques have been very valuable in some cases where other instrumentation is not enough!