The threading technique : ITC

extracted from Win32Forth Help / Win32forth reference /In-depth...

Win32Forth Help

The Threading Technique

One of the best descriptions of Forth on the web that explained the architecture of Forth systems (in terms even I could understand) was Brad Rodriguez' "Moving Forth" which first appeared in The Computer Journal #59 (January/February 1993). I'm immensely grateful for his permission to modify his original work and publish this set of articles on the internals of Win32Forth. Please take the time to read the originals; they contain much more in-depth information on building Forth systems than is contained here. I've been very selective in the sections that I've used.

I lay claim to little originality. The diagrams I have recreated to add some colour and to allow them to be printed with better resolution. I am also no wordsmith; any errors in here are probably all mine. Be as good to click on the top left icon on these pages if there's anything you spot that's misspelled (British spelling excepted; I'll have no other), technically wrong, unclear or just plain incomprehensible. I really want to know.

"Threaded code" is the hallmark of Forth. A Forth "thread" is just a list of addresses of routines to be executed. You can think of this as a list of subroutine calls, with the CALL instructions removed. Over the years many threading variations have been devised, and which one is best depends upon the CPU and the application. To make a decision, you need to understand how they work, and their tradeoffs.

Indirect Threaded Code (ITC)

This is the classical Forth threading technique, used in fig-Forth, F83 and Win32Forth, and described in most books on Forth. All the other threading schemes are "variations" on this, so you need to understand ITC to appreciate the others.

Let's look at the definition of a Forth word SQUARE :

    : SQUARE  DUP * ;

In a typical ITC Forth this would appear in memory as shown in Figure 1. (The yellow header will be discussed in a future article; it holds housekeeping information used for compilation, and isn't involved in threading. Brown is code; blue and cyan are CFA pointers, explained below).

Assume SQUARE is encountered while executing some other Forth word. Forth's Interpreter Pointer (IP) will be pointing to a cell in memory -- contained within that "other" word -- which contains the address of the word SQUARE . To be precise, that cell contains the address of SQUARE's Code Field, referred to in Win32Forth as the 'cfa' (code field address) or the 'xt' (execution token). The interpreter fetches that address, and then uses it to fetch the contents of SQUARE's Code Field. These contents are yet another address -- the address of a machine language subroutine which performs the word SQUARE . In pseudo-code, this is:

(IP) -> W

fetch memory pointed by IP into "W" register; W now holds address of the Code Field (the cfa)

IP+4 -> IP

advance IP, just like a program counter (assuming 4-byte addresses in the thread)

(W) ->  X

fetch memory pointed by W into "X" register; X now holds address of the machine code

JP (X)

jump to the address in the X register

This illustrates an important but rarely-elucidated principle: the address of the Forth word just entered is kept in W. CODE words don't need this information, but all other kinds of Forth words do.

If SQUARE were written in machine code, this would be the end of the story: that bit of machine code would be executed, and then jump back to the Forth interpreter -- which, since IP was incremented, is pointing to the next word to be executed. This is why the Forth interpreter is usually called NEXT.

But, SQUARE is a high-level "colon" definition -- it holds a "thread", a list of addresses. In order to perform this definition, the Forth interpreter must be re-started at a new location: the Parameter Field of SQUARE . Of course, the interpreter's old location must be saved, to resume the "other" Forth word once SQUARE is finished. This is just like a subroutine call! The machine language action of SQUARE is simply to push the old IP, set IP to a new location, run the interpreter, and when SQUARE is done pop the IP. (As you can see, the IP is the "program counter" of high-level Forth.) In W32F, this is called DOCOL:

PUSH IP

onto the "return address stack"

W+4 -> IP

W still points to the Code Field, so W+4 is the address of the Body

JUMP

to interpreter ("NEXT")

This identical code fragment is used by all high-level (i.e., threaded) Forth definitions! That's why a pointer to this code fragment, not the fragment itself, is included in the Forth definition. Over hundreds of definitions, the savings add up! And this is why it's called Indirect Threading.

The "return from subroutine" is the word EXIT, which gets compiled when Forth sees ';'. EXIT just executes a machine language routine which does the following:

POP IP

from the "return address stack"

JUMP

to interpreter

Walk through a couple of nested Forth definitions, just to assure yourself that this works.

Note the characteristics of ITC: every Forth word has a one-cell Code Field. Colon definitions compile one cell for each word used in the definition. And the Forth interpreter must actually perform a double indirection to get the address of the next machine code to run (first through IP, then through W).

ITC is neither the smallest nor the fastest threading technique. It may be the simplest; although DTC (see Brad Rodriguez papers) is really no more complex. So why are so many Forths indirect-threaded? Mainly because previous Forths, used as models, were indirect-threaded. These days, DTC is becoming more popular.

So when should ITC be used? Of the various techniques, ITC produces the cleanest and most elegant definitions -- nothing but addresses. If you're attuned to such considerations, ITC may appeal to you. If your code fiddles around with the insides of definitions, the simplicity and uniformity of the ITC representation may enhance portability. ITC is the classical Forth model, so it may be preferred for education. Finally, on the x86 family, CALL and RET are more expensive than the double indirection.

There are other techniques; DTC and STC are the common alternatives. For details, please see Brad Rodriguez' paper. .

Document : w32f-arch-itc.htm -- Dec 12 2004 -- alex_mcdonald

Google Sites

Report abuse