Open Questions and Project Ideas

A Code-Based ELIZA Phylogeny

Goals: The narrow goal of this project is to create a phylogeny -- perhaps specifically a phylogenetic tree -- of ELIZA programs, based primarily upon the source code and perhaps some manually constructed meta-data. A more general goal is to create a methodology for code genealogy that would let one drop in a collection of sources and get out something like a phylogenetic tree, or some other description of which code was based on which other code.

Data: One would first have to gather many versions of ELIZA. We know the three "original" versions: Weizenbaum's MAD-SLIP version, Cosell's Lisp version, and Shrager's BASIC version, but there are many others -- probably hundreds! -- floating around and accessible either directly (e.g., GitHub) or indirectly (e.g., old publications or the Internet Archive). The meta-data for these would have to be carefully curated. It might also be useful to gather crowd-sourced manual translations of parts of ELIZA code into other languages.

Approach: Here is where we need to be a bit creative. There are standard methods of creating phylogenetic trees based upon either genome or protein sequences, or descriptions of organisms (e.g., PHYLIP, MrBayes). If we think of the code as analogous to the sequence, and the meta-data as analogous to the descriptions, a good first step might be to just run these through one of the standard methods. Another approach might be to learn how to measure the distance between parts of source code (for example, by asking programmers to rate pairs of fragments on a numerical "similarity" scale), then apply these measures to the whole collection and use the resulting distances to create the trees (as above). However, code has well-defined semantics, and one could imagine a much more interesting (and difficult!) project of actually comparing the code bases at the semantic level, and then using this to create the phylogeny. This last approach would provide a much richer analysis of what was changed from one version of ELIZA to the next.
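As a rough illustration of the distance-based route described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration rather than a method the project has settled on: the crude regex tokenizer, the use of Jaccard distance over 3-token shingles as the "similarity" measure, the choice of UPGMA (a simple average-linkage clustering method) to build the tree, and the function names `tokens`, `shingles`, `jaccard_distance`, `upgma`, and `build_tree`. The toy "sources" are made-up strings, not real ELIZA code.

```python
import re
from itertools import combinations


def tokens(source):
    """Crude, language-agnostic lexer: lowercased identifiers and integers."""
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\d+", source.lower())


def shingles(source, k=3):
    """Set of overlapping k-token windows ("shingles") from the source text."""
    toks = tokens(source)
    return {tuple(toks[i:i + k]) for i in range(max(len(toks) - k + 1, 1))}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over shingle sets; 0.0 means identical sets."""
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def upgma(names, dist):
    """Average-linkage clustering (UPGMA).

    dist maps frozenset({x, y}) -> distance for every pair of names.
    Returns the tree as nested 2-tuples of names.
    """
    clusters = {name: 1 for name in names}  # cluster label -> leaf count
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


def build_tree(sources):
    """sources: dict mapping a version name -> its source text."""
    names = list(sources)
    dist = {
        frozenset((x, y)): jaccard_distance(sources[x], sources[y])
        for x, y in combinations(names, 2)
    }
    return upgma(names, dist)


if __name__ == "__main__":
    # Hypothetical stand-ins for three program versions (not real ELIZA code).
    toy = {
        "a": "if input matches pattern then print response else print default",
        "b": "if input matches pattern then print reply else print default",
        "c": "loop read line lookup keyword table emit canned answer",
    }
    print(build_tree(toy))
```

Note that a tree produced this way only groups programs by surface similarity; it does not by itself say which program was derived from which, and token overlap will be weak across different programming languages. That is part of why the semantic-level comparison is the more interesting version of the project.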

The Mystery of ELIZA's Teachability

The ELIZA code has a whole "teaching" section, in the sense of the user teaching ELIZA new patterns and responses on the fly. This is explicitly described on the very first page of Weizenbaum's 1966 CACM paper, and is (of course) the whole reason that the program was named ELIZA: "Its name was chosen to emphasize that it may be incrementally improved by its users, since its language abilities may be continually improved by a 'teacher'. Like the Eliza of Pygmalion fame, it can be made to appear even more civilized, the relation of appearance to reality, however, remaining in the domain of the playwright." However, so far as we know, user teachability did not make it into any subsequent clone, and so this important feature was lost to history. (One immediate question is what to make of the phrase "...the relation of appearance to reality, however, remaining in the domain of the playwright"?!)