Back in 1987 I wrote a user manual for the Occam concurrent programming language for Inmos, and I've always kept up an interest in parallel processing. Occam more or less ceased to be used after Inmos stopped production of the Transputer and then closed down, but there has been a recent revival from an interesting source. A group of teachers and students of Allegheny College in Meadville, Pennsylvania and the University of Kent, UK, have started teaching Occam for robotics control and other embedded systems (www.concurrency.cc). They're using an interpreter they wrote themselves called the Transterpreter (www.transterpreter.org) and they've ported this interpreter to Atmel's Atmega328 processor, which forms the core of the Arduino control board used by many artists and hobbyists (www.arduino.cc), where it runs in a skinny 20KB of flash memory.
As for myself, nowadays I experiment with a simulated system of my own design called PriPar (for Primitive Parallel computers), a software simulation of networks of parallel Turing machines. My PriPar system is implemented as a Ruby class of Turing machine objects, each of which can run its own local program that makes or removes simple marks from a simulated "tape". I chose to employ "half-infinite" tapes with a fixed starting point and then running off (in theory) to infinity on the right: in practice of course PC memory eventually limits their size. And unlike the classic Turing machine, my PriPar machines are able to send messages to one another.
Although each PriPar machine is a full-blown Turing machine, capable in theory of running any sequential program, in practice I discovered a small repertoire of commonly used routines that I employ as a higher-level language for writing programs. These routines resemble those in dataflow languages: "Inject" reads a tape and sends its contents to another machine; "Tally" receives a stream of messages and writes them to a tape; various other routines implement simple integer arithmetic and logic functions.
A Turing Program
Here is the raw Turing Machine code for one of the PriPar primitive operations Multiply:
Multiply = ["1X~2",
Each opcode consists of four parts: for example 2X/3 means "if you are in state 2 and looking at a mark (X), then send a message (/) and proceed to state 3. There are two different instructions to receive messages: ? means wait for a message and write a mark on the tape, while ~ means wait for a message and proceed leaving tape unchanged.
My version 1 machines used a communication protocol based on Occam/CSP where ! means send a message and ? means wait for a message before proceeding - messages were contentless events that either happened or not. This worked well but sacrificed a lot of parallelism while waiting for output. My latest version 2 machines instead send an actual mark, "/" or a blank " " which gets written onto the other's tape, which is far more efficient (compare the programs sequential add.rb and parallel add.rb in the two versions): in effect the message channel becomes a shared tape.
One odd feature of the PriPar system is that rather than avoid deadlock it actually courts and exploits it. The Ruby harness in which a parallel PriPar program is embedded terminates the running program when it detects that all processes are simultaneously waiting - that is, deadlocked. (I even thought about calling them "Deadlock Machines"...) This is a sort of cheat, because each PriPar machine does have a proper halting state 0, and I could have insisted that programs terminate with all processes in state 0. However I've found that the deadlock alternative simplifies programming enormously, at the cost of sometimes producing a spurious result which can be fixed by introducing a synchronisation barrier (see below). The halting problem for parallel programs is a very deep one, and I find this an acceptable compromise.
I've also added Barrier Synchronisation, which enables complex programs to be divided into shorter sections that enforce synchronisation: at the barrier, all processes have to catch up with one another, then the next section is started. This scheme also enables dynamic programming, where the program running on one or more machines gets changed at the barrier - it's even possible to create new machines at the barrier and start them running. You observe the results of a PriPar calculation using the built-in trace function which animates activity on the tapes.
PriPar program syntax
The principal operators used to write PriPar programs are as follows:
• par(<procname>, <tape>, <program>, [<tracelevel>])
Creates a new PriPar machine called <procname>, loads a string <program> that represents a Turing program into it, and assigns a string <tape> representing a tape containing data in the form of marks, eg. "/////////". par only creates and initialises the PriPar machine but does not start it running. The optional <tracelevel> parameter may be 0 (display nothing), 1 (display tape contents) or 2 (display process state and tape contents). Omitting <tracelevel> is equivalent to 0.
A shorthand for describing tapes, which returns a string of <num> marks, so x(4) gives "////".
• <barname> __BARRIER__
Create a synchronisation barrier called <barname>, such that all processes declared with par before the barrier will run to it and wait.
• run_to <barname>
Run all the PriPar machines declared before <barname> until all are waiting.
• tape(<tape>, <proc>)
Return a string containing the contents of <tape> and the number of cycles that <proc> has executed. Used to display final program result, as in "print tape(t1,B)".
Here's a flow diagram for the PriPar program whose source code is in the box at right - it calculates the values of a quadratic expression. Note that two PriPar machines can read and write from the same tape. The source for this program is included among the attachments below as quadratic.rb.
A Sample PriPar program
require 'priparlib.rb' # Program to compute Y=3X^2+2X+5Y = "" X = x(10) # compute it for X=10 # compute 3X^2par("P1", X, Inject) par("P2", X, Multiply)par("P3", x(3), Multiply)
# compute 2Xpar("P4", X, Inject) par("P5", x(2), Multiply) # the constant 5par("P6", x(5), Inject)
# add up the termspar("Out", Y, Tally)
# configure topologyP1.__to__ P2P2.__to__ P3P4.__to__ P5P3.__to__ OutP5.__to__ OutP6.__to__ OutB1 = __BARRIER__run_to B1print tape(Y, P1)
Configuring the topology of a PriPar program
Each named PriPar machine has a method called .__to__ which declares a communication link to another machine, as in A.__to__ B. Machine A can now send messages to machine B by executing a / instruction, and B can receive messages by executing ? or ~ and waiting.
Communication is many-to-one: each machine may only have a link to one other machine, but may receive messages from many other machines, so
where B receives from both A and C is a legal configuration but
where A sends to both B and C is not legal because A can only have one outgoing link.
I have no idea where the limits on PriPar computation are yet. So far I can compute polynomial expressions, perform a crude kind of sort using either a tree of PriPars, or dynamic program changes at each pass, and I can add up terms of an arithmetic series. I can't so far compute the factorial function properly, though I've written a "cheat" version just for timing purposes which uses external Ruby code to put the successive integers into a series of tapes. I think that PriPar might need to be extended to allow each machine to output to multiple others to avoid this cheat: then one machine could set up the tapes for all the others. Multiple communication channels per machine is the next obvious enhancement.