An itch.

After a few decades of maintenance programming, it was time for a change. I guess maintenance is where the money is, but it was getting a bit tedious. Poring over the code. Trying to figure out the programmer's intentions. Cursing the writer for the unnecessary complexity. And while doing so, I sometimes wondered why I was so busy while the computer was bone idle. Computers are really of very little help when you're not yet ready to make a coding change. It was time to learn something else as well, but what?

As a Linux user, I had to try some development on Linux. I had to try an open source language on it and see how friendly and reliable that is. I tried Python, recommended by many as powerful and elegant. And, as it turns out, it easily matches the commercial products I was used to. That was the language worked out, but how would one use it? What could be its front end? A web browser on every machine makes a very tempting universal front end. The Python I had learned needed to be used in a web application. It would require more study of HTML and JavaScript, but there would always be the multitude of information and working examples on the web.

What would this web application be about? What knowledge do I have that I could use in it? Do I have a detailed knowledge of dentistry, so that I could write a dental application? No, I don't; I am a programmer. An application about programming then? Well, I have some techniques that I always use when learning about a program: I like to print it out and work on the listing. When it gets convoluted, I start scribbling variable values between the lines. I work those values out or, we're talking really desperate here, I obtain them with a trace in the program. The variables become a bit more meaningful that way. And I like to mark areas off with a highlighter. Really important loops and conditions have their start and end lines colored. Whether a line falls inside or outside an important area becomes immediately obvious. It helps to split the program up into functional areas as well.

That's an application I'll use Python for: CodeInvestigator. A computerized desk-check tool that lets you interact with code. A tool that helps you understand code. A computer version of all these things I do with a program listing.

  • It makes run-time information available without the usual restrictions of a debugger. It is all there, just ask.
  • It shows you what code is executed and what is passed over. It confirms the expectation you have about the code.
  • It shows the block structure by highlighting blocks. What code controls what blocks becomes immediately obvious. Python, with its master stroke of forced indentation, probably does not need that as much as other languages do.

Can an application like this be written? I have never seen anything like this before. Maybe it just can't be done, and I am missing something. Let's see if this idea holds water, or where I am going wrong. Speed and resource usage can be ignored for now. And I would like to know if this idea is universally applicable without relying on specific implementations.

Run-time data.

There is a lot of run-time information to store, and retrieval of it is random, determined only by the part of the program that is being viewed. We need a database for that. Any database will do, but SQLite is chosen because it does not need to be installed separately: it comes with the standard Python install. The run-time data is collected by adding code to the program. This additional code writes run-time information to the database while the program runs. All the changes required for the run-time data collection are in the user program. The language implementation does not need to change. The disadvantage is that the code involved in the tracing can't be traced itself. The code that does the tracing relies on some existing code, and that existing code can't be traced. Something similar happens when you use print in a trace. That only works as long as you don't use print inside a function that implements print. It's a disadvantage that not everything can be traced, but not a big one. CodeInvestigator needs to work on top of an existing layer of 'trusted code' that does not require any tracing and is excluded from it. The database routines are an example of routines that can't be traced. They are in this 'trusted code' layer. CodeInvestigator can't be used for these low-level functions.
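
To make the storage idea concrete, here is a minimal sketch using the sqlite3 module from the standard library. The table layout and the helper names record and lookup are assumptions made for illustration; they are not CodeInvestigator's actual schema or routines.

```python
import sqlite3

# Hypothetical schema: one row per traced value. The column names are
# assumptions for this sketch, not CodeInvestigator's real tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trace (line INTEGER, name TEXT, value TEXT)")


def record(line, name, value):
    """Store the repr of a value seen at a given source line."""
    conn.execute(
        "INSERT INTO trace (line, name, value) VALUES (?, ?, ?)",
        (line, name, repr(value)),
    )


def lookup(line, name):
    """Return every recorded value for one name on one line, in run order."""
    rows = conn.execute(
        "SELECT value FROM trace WHERE line = ? AND name = ? ORDER BY rowid",
        (line, name),
    )
    return [value for (value,) in rows]


# Simulate what the added trace code would do while the program runs.
total = 0
for n in (1, 2, 3):
    total = total + n
    record(3, "total", total)

print(lookup(3, "total"))  # ['1', '3', '6']
```

Note that record and lookup themselves belong to the 'trusted code' layer: if record were traced, it would have to call itself.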

Adding trace code to a program creates a much larger, generated, trace-able program that is run instead. When it is run, it does whatever the original program did, plus the additional tracing. The hard part is the addition of this tracing code. I achieve that by replacing every statement in the program with its tracing equivalent. An assignment, for instance, gets the following treatment:
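
The generated code is not reproduced in this text. As a stand-in, here is a minimal sketch of how an assignment could be wrapped, using Python's ast module. The helper name _trace, the in-memory TRACE_LOG and the restriction to simple name targets are all assumptions made for illustration, not CodeInvestigator's actual generated code.

```python
import ast

TRACE_LOG = []  # stands in for the database in this sketch


def _trace(line, name, value):
    """Record the value, then hand it back so the assignment still works."""
    TRACE_LOG.append((line, name, value))
    return value


class AssignTracer(ast.NodeTransformer):
    """Rewrite `x = expr` into `x = _trace(lineno, 'x', expr)`."""

    def visit_Assign(self, node):
        self.generic_visit(node)
        target = node.targets[0]
        # Only single, simple-name targets are handled in this sketch.
        if len(node.targets) == 1 and isinstance(target, ast.Name):
            node.value = ast.Call(
                func=ast.Name(id="_trace", ctx=ast.Load()),
                args=[
                    ast.Constant(node.lineno),
                    ast.Constant(target.id),
                    node.value,
                ],
                keywords=[],
            )
        return node


def make_traceable(source):
    """Return a code object for the traced version of `source`."""
    tree = AssignTracer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return compile(tree, "<traced>", "exec")


original = "price = 4\nqty = 3\ntotal = price * qty\n"
namespace = {"_trace": _trace}
exec(make_traceable(original), namespace)

print(namespace["total"])  # 12
print(TRACE_LOG)           # [(1, 'price', 4), (2, 'qty', 3), (3, 'total', 12)]
```

The traced program computes the same result as the original; the only difference is the side effect of recording each assigned value together with its line number.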


The user interface.

Ease of use needs to be paramount. I can't expect the user to know anything about the program that is being desk-checked. I can't expect the user to make any code changes before the desk-check. All a user has to do is click a program file in a list of Python programs. Tracing code is added to this program, it is run, and a new run is recorded. All runs are shown in another list and the user starts viewing one by clicking it. The program code is then presented as if in an editor.

Everything happens by clicking on words in the code. CodeInvestigator, if it needs to display anything, displays its information behind and between the affected code in the same window. Lines in the code move apart to make way for whatever CodeInvestigator has to display about the clicked word.

The web server.

A web application requires a web server. So, obviously, I started off with an Apache server, which worked very well. With a bit more effort it could even be set up as a secure server to encrypt all network traffic. The mod_python module was used for all the tedious web tasks like sessions, cookies and dynamic HTML pages. For the casual user, though, this meant too much installation effort, and a simpler setup was needed. Apache had to go, and with it mod_python and all its goodies. Was there a web server that did not require any setup and was available on all platforms? There was: CGIHTTPServer, the server included in the default Python install. It is used to start Python scripts on the server. A Python script can do anything, even produce HTML. So instead of getting HTML from a file via an Apache server, the HTML is generated by a Python script that is started by CGIHTTPServer. Sessions had to go under this server because of the programming effort they would have required. Sessions are needed when two users look at the same run simultaneously, so giving them up for a simpler install means there is no support for multiple users under this server.
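
The principle of a zero-setup server that produces HTML from Python can be sketched with the standard library alone. This sketch deliberately swaps the CGI mechanism for a plain http.server request handler, so it is not the setup described above; render_page, Handler and the page content are hypothetical names and data.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_page(program_name):
    """Generate the HTML in Python instead of reading it from a file."""
    return "<html><body><h1>Runs of %s</h1></body></html>" % program_name


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_page(self.path.lstrip("/")).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


# Port 0 lets the operating system pick any free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
worker = threading.Thread(target=server.handle_request)  # serve one request
worker.start()

client = http.client.HTTPConnection("127.0.0.1", server.server_port)
client.request("GET", "/demo.py")
page = client.getresponse().read().decode("utf-8")
client.close()

worker.join()
server.server_close()
print(page)
```

Nothing has to be installed or configured, which is the property that made the bundled server attractive. Sessions and multiple users are left out here as well.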

The browser.

The interactivity of the code requires animation on the web page. Code lines need to move apart to make way for the run-time details that are displayed in the gap. It creates the illusion that the run-time information already existed under the code. Clicking a word in the code only reveals what was already there. This accentuates the difference between code and run-time information. Fortunately there were a few JavaScript libraries available that I could use for this: Scriptaculous, Prototype and Moo.fx.

It takes too long to fetch run-time information from the database after a word is clicked. The response to a word-click needs to be readily available on the web page. Everything that can be clicked on a page needs to have its response ready. That requires a buffer in JavaScript. And, to save on storage, it only needs to cover words that are currently visible in the window. When the user scrolls down, more words become visible and information is retrieved for these words. Information is removed from the buffer for all the words that have scrolled off the page. Scrolling is a trigger that initiates some activity in the background to update this buffer. Another trigger that initiates the same activity is the selection of another iteration in a loop. A loop can show all its iterations so that the user can select one. Run-time data for the code inside a loop relates to the selected iteration. Selecting another iteration means getting a different set of run-time data for the code inside the loop.
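
The buffering logic can be sketched as follows. The real buffer lives in JavaScript in the browser; this sketch is in Python only to keep one language for the examples, and the class VisibleBuffer, its methods and the fake_fetch function are hypothetical.

```python
class VisibleBuffer:
    """Keep ready-made click responses only for words currently on screen."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: word id -> response (a server request)
        self._ready = {}

    def update(self, visible_ids):
        """Call after a scroll: drop words that left the view, fetch new ones."""
        visible = set(visible_ids)
        for word_id in list(self._ready):
            if word_id not in visible:
                del self._ready[word_id]
        for word_id in visible:
            if word_id not in self._ready:
                self._ready[word_id] = self._fetch(word_id)

    def click(self, word_id):
        """Answer a click from the buffer without contacting the server."""
        return self._ready[word_id]


fetched = []


def fake_fetch(word_id):
    fetched.append(word_id)
    return "value of word %d" % word_id


buf = VisibleBuffer(fake_fetch)
buf.update([1, 2, 3])  # first screenful
buf.update([2, 3, 4])  # the user scrolls down one line

print(buf.click(4))     # value of word 4
print(sorted(fetched))  # [1, 2, 3, 4]
```

Words 2 and 3 are fetched only once even though they are visible after both updates, and word 1 is dropped as soon as it scrolls off. Selecting another loop iteration would call the same update after discarding the entries for the words inside that loop.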

The current iteration of every loop in the program needs to be stored in JavaScript. If the user scrolls down to a previously visited loop and an iteration was selected during that visit, then the same iteration should be shown this time around. The previously selected iterations are passed to the server when run-time data is requested from it.

A browser requests data from the server by sending an XMLHttpRequest. This is the default Ajax way to ask for something from a web server. The server responds by sending XML data, which the browser makes available as responseXML. responseXML is an object just like the browser's DOM tree; it just does not display anything. The response could have been sent as plain text and read from responseText, but responseXML is preferred because it makes it easier to add structure to the response. Browsers are very good at parsing XML, so speed is not an issue.
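
On the server side, such a structured response could be built along these lines. This is a sketch under assumptions: the element names response and word and the id attribute are invented for illustration, and the actual format CodeInvestigator sends is not shown in this text.

```python
import xml.etree.ElementTree as ET


def build_response(values):
    """Serialize {word_id: value} into a small XML document."""
    root = ET.Element("response")
    for word_id, value in values.items():
        word = ET.SubElement(root, "word", id=str(word_id))
        word.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_response({17: "42", 18: "'bob'"})
print(xml_text)

# The receiving side can walk the parsed tree instead of splitting a string.
parsed = ET.fromstring(xml_text)
values = {word.get("id"): word.text for word in parsed.findall("word")}
print(values)
```

The second half stands in for what the browser does with responseXML: each value is found by walking elements, which is the structure that a flat text response would lack.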

There is a lot of scrolling in the code and it's quite hard to find your way back to what you were reading after a short excursion somewhere else. A bar along the right side of the code makes it easier for the user to find their way back. The bar appears on a page when a user scrolls away from the page; it marks the page so the user can scroll back to it. If you don't scroll for over 2 seconds then whatever is on your screen is deemed to be your main interest and it will be marked when scrolling away from it. This period can be changed in CodeInvestigator's settings.

Any web page is stored by Firefox in its Back-Forward cache. It allows Firefox to quickly redraw a page after Back or Forward is pressed without involving the server again. CodeInvestigator uses this cache to quickly switch between the view of the main program and views of function calls. Function calls can be viewed alongside the main program and the user can switch between them. The page saved by Firefox includes its JavaScript and changes to the DOM tree. This is very convenient because user interaction results in changes to JavaScript and changes to the DOM tree. A selected iteration, for instance, is stored in JavaScript, and the value of a variable is shown on the page by inserting HTML in the DOM tree. A restored page shows the previous user interaction and lets the user return to a page and find it as it was left. Alas, the Back-Forward cache isn't limitless and no more than around 5 pages are held. A user can view more than 5 function calls simultaneously and can return to view one whose page isn't held in the cache. If the HTML and JavaScript were requested again from the server, the user would return to a page that wasn't the way it was left. All previous clicks would have disappeared from it. To make it the same as it was left, all user interactions are stored locally in Firefox. When a page is re-sent from the server, the interactions are locally re-applied to it.

Program history.

Parts of the program are likely to be executed more than once. A user is able to view any execution of any part of the program, and CodeInvestigator needs to find it in the database to show it to the user. To achieve this I have split the code up into blocks.

A simple program with nested blocks.

This program consists of two blocks, one contained inside the other. Block one consists of all the lines 100 to 400 inclusive, while block two only has lines 200 and 300. Block one has only one pass through its logic whereas block two has three passes. Every pass in every block is assigned a unique number, and this number uniquely identifies a point in time for a block in the execution history. This unique number has the rather awkward name of workspace in my code. Let's stick with that name.

The workspace during the execution of the simple program.

A workspace applies to all the words in a block at a point in time. Workspace is used as a database key for all the words in a block at that point in time. Workspace can be seen to represent the execution history.

The tracing routines build a tree that records the linkages between workspaces. This tree is stored in the database and holds the program flow from block to block. This tree could potentially grow very large and is therefore broken up by function call. It is used again when values are requested from the database. The tree is then used to translate iteration information into workspace numbers.

The tracing code that tracks a loop adds a workspace for every iteration it performs. When the loop finishes, the program returns to the enclosing code and the current workspace changes back to the workspace of the enclosing code. The workspace mechanism uses a stack to achieve this nesting.
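
The workspace numbering, the stack and the tree of linkages can be sketched together. The class WorkspaceTracker, its methods and the helper iteration_to_workspace are hypothetical names, and the helper assumes the enclosing block contains only one loop, which is simpler than the real thing.

```python
class WorkspaceTracker:
    """Hand out a unique workspace number for every pass through a block."""

    def __init__(self):
        self._next = 1
        self._stack = []  # workspaces of the blocks currently executing
        self.parent = {}  # workspace -> enclosing workspace (the tree)

    def enter(self):
        workspace = self._next
        self._next += 1
        self.parent[workspace] = self._stack[-1] if self._stack else None
        self._stack.append(workspace)
        return workspace

    def leave(self):
        self._stack.pop()

    @property
    def current(self):
        return self._stack[-1]


def iteration_to_workspace(tracker, enclosing, iteration):
    """Translate 'iteration N of the loop inside `enclosing`' to a workspace."""
    children = sorted(w for w, p in tracker.parent.items() if p == enclosing)
    return children[iteration - 1]


tracker = WorkspaceTracker()
seen = []

tracker.enter()               # block one: a single pass
seen.append(tracker.current)
for _ in range(3):            # block two: three passes
    tracker.enter()
    seen.append(tracker.current)
    tracker.leave()
seen.append(tracker.current)  # back in block one
tracker.leave()

print(seen)                                   # [1, 2, 3, 4, 1]
print(tracker.parent)                         # {1: None, 2: 1, 3: 1, 4: 1}
print(iteration_to_workspace(tracker, 1, 2))  # 3
```

This mirrors the simple program with one outer block and a loop of three passes: the outer block keeps workspace 1 throughout, each loop iteration gets its own number, and popping the stack restores the enclosing workspace when the loop ends.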

Not all calls can be clicked in the code.

The __getattr__ call is made when bob.surname is evaluated. That call is not visible and therefore can't be clicked.
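
The listing this refers to is not reproduced here. A hypothetical class that behaves the way the text describes could look like this; the class name Person, the calls list and the returned value are assumptions.

```python
calls = []  # records the hidden calls, for demonstration only


class Person:
    def __init__(self, first):
        self.first = first

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        calls.append(name)
        if name == "surname":
            return "Smith"
        raise AttributeError(name)


bob = Person("Bob")
print(bob.first)    # found normally, __getattr__ is not involved
print(bob.surname)  # not found normally, so __getattr__("surname") runs
print(calls)
```

Nothing in the expression bob.surname names __getattr__, so there is no word on that line that a user could click to reach the function's execution.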

To overcome that, I register indirect calls for each line. A line with indirect calls shows a triangle at the end of the line, and when that is clicked all the indirect calls for the line show in a list. The user clicks one of them to view its execution. This way these calls can be clicked at line level, not at word level.

Clicking words.

Potentially a lot of different words and combinations of words in the first line could be clicked. The words mod, obj and attr could be clicked separately. Another good case could be made for the groups mod.obj and mod.obj.attr. If mod.obj were a click-able object what would be its value? If it consists of all its attributes, am I really interested in seeing all of these? It could hold information that is not even accessed by this program. I therefore decided that only attr in this line is click-able. It is the only attribute in mod.obj or even mod that I want to know the value of right now.

Lambda functions are so compact in their notation that clicking in them is nearly impossible. They are skipped by CodeInvestigator.

Tracing lower levels.

Not all import files can be traced. Lower-level functions are used by the tracing functions and can't be traced themselves. When a file is imported, CodeInvestigator needs to decide whether to generate a trace-able version of the import file and import that instead. The criterion I use is whether the imported file appears in certain directories or not. These directories can be specified in CodeInvestigator's settings, and only files that are imported from these directories are included in the tracing.
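
The directory test could be sketched like this. The function name should_trace and the example paths are hypothetical; how CodeInvestigator actually reads its settings is not shown.

```python
import os


def should_trace(module_path, traced_dirs):
    """True if module_path lies inside one of the configured directories."""
    path = os.path.realpath(module_path)
    for directory in traced_dirs:
        root = os.path.realpath(directory)
        try:
            if os.path.commonpath([path, root]) == root:
                return True
        except ValueError:
            # Raised on Windows when the paths are on different drives.
            continue
    return False


settings = ["/home/me/project"]
print(should_trace("/home/me/project/util/helpers.py", settings))
print(should_trace("/usr/lib/python3/sqlite3/__init__.py", settings))
print(should_trace("/home/me/project2/other.py", settings))
```

Comparing whole path components with os.path.commonpath avoids the pitfall of a plain string prefix test, which would wrongly accept /home/me/project2 as being inside /home/me/project.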

Event based applications.

Event based applications typically use a loop that registers events. If one occurs, a callback function is called that then deals with the event. The event loop, where the actual callback call is initiated, is not visible.

A callback is an indirect call like the __getattr__ call mentioned earlier. All the callbacks are listed as indirect calls of the main loop. Click the triangle next to gtk.main(), for instance, to view these indirect calls and select the callback from that list.

Future plans.

When the trace-able program runs, it spends a lot of its time storing data in the database. There is no need for the program to wait for each store to finish; the store could easily run in parallel. It would be interesting to see whether there is a performance gain if all database stores are buffered and processed by a parallel process. The same performance gain could also be possible when the database resides on a different server.
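
One way to experiment with this idea is a queue that is drained by a separate writer. The sketch below uses a thread rather than a separate process, which is a simplification; the function writer, the table layout and the temporary database file are assumptions. Whether it actually yields a speed-up is exactly the open question and is not measured here.

```python
import os
import queue
import sqlite3
import tempfile
import threading


def writer(db_path, pending):
    """Drain the queue into SQLite; runs in its own thread."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS trace (line INTEGER, value TEXT)")
    while True:
        item = pending.get()
        if item is None:  # sentinel: the traced program has finished
            break
        conn.execute("INSERT INTO trace VALUES (?, ?)", item)
    conn.commit()
    conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "trace.db")
pending = queue.Queue()
worker = threading.Thread(target=writer, args=(db_path, pending))
worker.start()

# The traced program only queues its data and carries on immediately.
for line in range(1, 6):
    pending.put((line, repr(line * line)))

pending.put(None)  # tell the writer to stop
worker.join()      # wait until everything is stored

check = sqlite3.connect(db_path)
stored = check.execute("SELECT COUNT(*) FROM trace").fetchone()[0]
check.close()
print(stored)  # 5
```

The traced program pays only the cost of a queue put per value; all database work happens in the writer, which commits once at the end.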

It could be useful if the user could add speech bubbles to the code. These marks could be comments on the code or helpful remarks about it. Often these insights are lost when the code isn't the focus of attention anymore.

The trace-able program does exactly what the original program does. If the original loops, the trace-able program will also loop, albeit a lot slower since it has a lot more work to do. Any uncaught exceptions raised in the original will also be raised in the trace-able program. CodeInvestigator expects a working, well-behaved program to be used for its investigations. Future versions could be more lenient. That way CodeInvestigator would be more of a development tool than a maintenance tool.