Implementation

The method for building knowledge bases follows three phases of knowledge management. Knowledge acquisition begins with the expert defining a descriptive model, from which a questionnaire is automatically built. The questionnaire is used to enter a case base, and it generates a global view of each case for visual comparison. Knowledge processing then classifies the cases within a common architecture, i.e. the descriptive model, using induction and case-based reasoning methods. Finally, in the knowledge validation phase, the identification results suggest updates to the cases, and the descriptive model can be revised iteratively.

Below, I illustrate the method as I implemented it on a small part of the marine sponge domain: knowledge of the genus Hyalonema provided by the expert Prof. C. Levi.

Knowledge acquisition:

The descriptive model defines the theoretical objects (concepts) that the expert observes and interprets in this domain of specialty: observable objects are structured as the nodes of a description tree, connected by links of different types (dependencies, specialisations, multi-instantiations).
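The description tree can be pictured with a minimal Python sketch. It is only an illustration, not HyperQuest's actual data model (HyperQuest was written in HyperTalk); the class ObjectNode, the LinkType enumeration and the link types chosen in the example fragment are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class LinkType(Enum):
    # The three kinds of links named in the text
    DEPENDENCY = "dependency"
    SPECIALISATION = "specialisation"
    MULTI_INSTANTIATION = "multi-instantiation"


@dataclass
class ObjectNode:
    """Hypothetical node of a description tree (one theoretical object)."""
    name: str
    children: list = field(default_factory=list)  # (LinkType, ObjectNode) pairs

    def add(self, link: LinkType, child: "ObjectNode") -> "ObjectNode":
        """Attach a child object through a typed link and return the child."""
        self.children.append((link, child))
        return child

    def walk(self, depth: int = 0):
        """Pre-order traversal yielding (depth, node) pairs."""
        yield depth, self
        for _, child in self.children:
            yield from child.walk(depth + 1)


# Illustrative fragment; the link types chosen here are assumptions.
root = ObjectNode("Hyalonema")
body = root.add(LinkType.DEPENDENCY, ObjectNode("body"))
body.add(LinkType.DEPENDENCY, ObjectNode("exhaling face"))
root.add(LinkType.DEPENDENCY, ObjectNode("identification"))

for depth, node in root.walk():
    print("  " * depth + node.name)
```

Storing the link type on each edge, rather than on the node, lets the same object type appear under different kinds of links.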

For each object, the expert defines its characteristics: the attached attributes (or characters), the object status, and its relations with other objects.

Each attribute is also defined with a type (numerical, nominal, classified, etc.), the range of possible values, the number of possible choices, and an associated question.
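An attribute definition with these fields might look like the following sketch. The names Attribute, AttrType and accepts, the example value lists and the numerical bounds are all hypothetical; classified values are checked here against a flat list only, ignoring their hierarchy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AttrType(Enum):
    NUMERICAL = "numerical"
    NOMINAL = "nominal"
    CLASSIFIED = "classified"


@dataclass
class Attribute:
    """Hypothetical attribute definition with the fields listed in the text."""
    name: str
    kind: AttrType
    values: tuple = ()                   # possible values (nominal/classified)
    value_range: Optional[tuple] = None  # (min, max) for numerical attributes
    max_choices: int = 1                 # how many values may be selected
    question: str = ""                   # question shown in the questionnaire

    def accepts(self, answer) -> bool:
        """Check an answer against the type, the range and the number of choices."""
        if self.kind is AttrType.NUMERICAL:
            low, high = self.value_range
            return isinstance(answer, (int, float)) and low <= answer <= high
        chosen = list(answer) if isinstance(answer, (list, tuple, set)) else [answer]
        return 0 < len(chosen) <= self.max_choices and all(v in self.values for v in chosen)


# Illustrative definitions; value lists and bounds are invented for the example.
color = Attribute("color", AttrType.NOMINAL, values=("white", "grey"),
                  question="What is the color of the body?")
length = Attribute("length", AttrType.NUMERICAL, value_range=(0, 1000),
                   question="What is the length of the body?")
```

Keeping the question text inside the attribute definition is what allows a questionnaire to be generated directly from the model.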

Some values are classified, i.e. they can be specialized into more precise sub-values, as is the case for the body shape of Hyalonema sponges.
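A classified value can be handled as a small taxonomy. In this sketch the shape names are invented placeholders, not Prof. Levi's actual terms, and the compatibility rule (two values match when one specializes the other) is an assumption about how such values could be compared.

```python
# Hypothetical value hierarchy: parent value -> more precise sub-values.
SHAPE_TREE = {
    "shape": ["ovoid", "cylindrical"],
    "ovoid": ["narrow ovoid", "broad ovoid"],
}


def parent_map(tree: dict) -> dict:
    """Invert the hierarchy into child -> parent links."""
    return {child: parent for parent, kids in tree.items() for child in kids}


def ancestors(value: str, parents: dict) -> list:
    """Return the chain from a value up to the root, most precise first."""
    chain = [value]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


def compatible(a: str, b: str, parents: dict) -> bool:
    """Two classified values match if one specializes the other."""
    return a in ancestors(b, parents) or b in ancestors(a, parents)


parents = parent_map(SHAPE_TREE)
print(ancestors("broad ovoid", parents))                  # ['broad ovoid', 'ovoid', 'shape']
print(compatible("ovoid", "broad ovoid", parents))        # True
print(compatible("cylindrical", "broad ovoid", parents))  # False
```

This lets an observer answer at a coarse level ("ovoid") when finer distinctions cannot be made on a specimen.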

The domain is represented by the root node of the descriptive model, i.e. the object representing the genus Hyalonema.

The resulting expanded descriptive model of Hyalonema is shown below, together with the target list of species values (the values of the attribute "class" of the object "identification"). In the subsequent description phase of knowledge acquisition, each species can be covered by one or several descriptions of observed specimens, in order to represent the variability within taxa.

Next, a questionnaire matching the descriptive model is built automatically. Any biologist can use it as an observation guide to record observed descriptions, thereby building a case base that is consistent with the descriptive model.
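The automatic construction of the questionnaire amounts to a traversal of the descriptive model. The sketch below assumes a simplified nested-dictionary model and a hypothetical function build_questionnaire; it is not HyperQuest's card-generation code.

```python
# Hypothetical, simplified descriptive model: each object has attributes
# (name -> question text) and component objects.
MODEL = {
    "name": "body",
    "attributes": {"color": "What is the color of the body?"},
    "components": [
        {
            "name": "exhaling face",
            "attributes": {
                "orifices": "Are orifices present on the exhaling face?",
                "central cone": "Is there a central cone?",
            },
            "components": [],
        }
    ],
}


def build_questionnaire(obj: dict, path: tuple = ()) -> list:
    """Walk the model depth-first and emit one question per attribute.

    Each entry keeps the object path so answers stay tied to the model.
    """
    path = path + (obj["name"],)
    questions = [(path, attr, text) for attr, text in obj["attributes"].items()]
    for component in obj["components"]:
        questions.extend(build_questionnaire(component, path))
    return questions


for path, attr, text in build_questionnaire(MODEL):
    print(" > ".join(path), "|", attr, "|", text)
```

Because every question carries its object path, each recorded answer can be stored back at the right place in the model, which keeps the case base consistent with it.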

Each object is represented by a HyperCard card, with buttons for the components of the object and a field listing its attributes. For example, the card for the object "body" of the sponge has two fictitious components: macro constituents and micro elements. Some attributes of the body are also declared with default values; for instance, the body color of Hyalonema defaults to white.

When a specimen is described with the questionnaire, each object card of the descriptive model is instantiated, i.e. a differential copy of the card is made, and the user can enter the observed states of objects and attributes.
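One way to realise a differential copy is a layered mapping: the instance stores only what the observer changes, and every other lookup falls through to the model card and its defaults. This sketch uses Python's collections.ChainMap; the card contents (apart from the white default color) and the function instantiate are hypothetical, and HyperCard's own mechanism was different.

```python
from collections import ChainMap

# Hypothetical model card for the object "body": attribute -> default value
# (None means no default). White is the default body color given in the text.
BODY_CARD = {"color": "white", "shape": None}


def instantiate(card: dict) -> ChainMap:
    """Create a differential copy: a new empty layer over the model card.

    Writes go to the first (instance) mapping only, so the card is never
    modified and the instance stores just the observed differences.
    """
    return ChainMap({}, card)


specimen = instantiate(BODY_CARD)
specimen["shape"] = "ovoid"   # observed state entered by the user (invented value)

print(specimen["color"])      # white  (inherited default)
print(specimen.maps[0])       # {'shape': 'ovoid'}  (the stored differences)
print(BODY_CARD["shape"])     # None   (model card untouched)
```

Storing only differences keeps each case small and makes it easy to see what was actually observed on a specimen.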

For example, on this specimen the exhaling face of the body of the Hyalonema sponge is described as having orifices (+) and no central cone (-), while the observer does not know whether there is a cribled membrane (?).
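The three answer symbols can be encoded as an explicit enumeration so that "unknown" is never confused with "absent". The names State and parse_states are hypothetical; the feature names and symbols are those of the example above.

```python
from enum import Enum


class State(Enum):
    PRESENT = "+"
    ABSENT = "-"
    UNKNOWN = "?"


def parse_states(entries: dict) -> dict:
    """Map questionnaire symbols to states; State(symbol) raises ValueError otherwise."""
    return {feature: State(symbol) for feature, symbol in entries.items()}


exhaling_face = parse_states(
    {"orifices": "+", "central cone": "-", "cribled membrane": "?"}
)

# Only known states should count as evidence during processing.
known = {f: s for f, s in exhaling_face.items() if s is not State.UNKNOWN}
print(sorted(known))
```

Keeping "?" as a distinct state matters later: it is what allows the processing phase to route around unanswered questions instead of treating them as negative observations.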

The expert enters the case base in a reflexive design approach, moving back and forth between what is observable according to the monographs and what is actually observed on the table. Whenever possible, he gives a class name to each case so that supervised learning from examples can be performed in the following processing phase.

As descriptions represent concrete specimens in collections, there can be several cases for one class. The knowledge acquisition phase of the method is therefore iterative, alternating between the definition of the descriptive model and the case base. It is the most important step of the work for delivering a robust knowledge base.

HyperQuest was the HyperCard software I developed to design both descriptive models and multimedia questionnaires for any biological domain, following the descriptive logic defined above.

The strength of this tool was the ease with which user interfaces could be designed and programmed to match one's needs. Indeed, the interface IS the application when it comes to mastering the different tasks of building robust knowledge bases, and in 1993 HyperCard with HyperTalk was the tool of choice for building such interfaces.

Knowledge processing:

In the knowledge processing phase of the method, two different technologies are used depending on the goal to be achieved. For classification, a decision tree is built by inductive learning from examples to characterize the named classes. For identification, a case-based reasoning strategy is used: it dynamically extracts the most discriminating descriptors and produces better identifications than following a single path through a decision tree.
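Inductive learners of this kind typically choose the test at each node of the tree by information gain, the entropy reduction obtained by splitting the cases on an attribute. The sketch below computes it in Python; the function names and the toy cases and class labels are hypothetical, and skipping cases whose value is unknown ("?") is a simplification, not necessarily what Kate did.

```python
from collections import Counter
from math import log2


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(cases, labels, attr) -> float:
    """Entropy reduction obtained by splitting the cases on one attribute.

    Cases whose value is unknown ("?") are left out (a simplification).
    """
    known = [(c[attr], y) for c, y in zip(cases, labels) if c.get(attr, "?") != "?"]
    if not known:
        return 0.0
    gain = entropy([y for _, y in known])
    for value in {v for v, _ in known}:
        subset = [y for v, y in known if v == value]
        gain -= len(subset) / len(known) * entropy(subset)
    return gain


def best_attribute(cases, labels, attrs):
    """Pick the attribute with the highest information gain for a node."""
    return max(attrs, key=lambda a: information_gain(cases, labels, a))


# Toy case base: two cases per class, as several specimens may share a class.
cases = [
    {"orifices": "+", "central cone": "-"},
    {"orifices": "+", "central cone": "+"},
    {"orifices": "-", "central cone": "-"},
    {"orifices": "-", "central cone": "+"},
]
labels = ["A", "A", "B", "B"]
print(best_attribute(cases, labels, ["orifices", "central cone"]))  # orifices
```

Here "orifices" separates the two classes perfectly (gain of 1 bit), whereas "central cone" tells nothing about the class (gain of 0).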

Kate was the software developed by M. Manago in Common Lisp and used to process cases represented in the Casuel language. On the one hand, it extracted the best criterion at each node of a decision tree according to information gain, in order to produce an identification key. On the other hand, it could take into account unknown answers to the questions about these criteria by generating dynamic keys based on the ordered list of criteria.
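The idea of a dynamic key can be sketched as follows: criteria are asked in a precomputed order, an unknown answer simply moves on to the next criterion, and known answers filter the candidate cases. This is an illustration under stated assumptions, not Kate's algorithm; the function dynamic_identify, the ask callback and the toy data are hypothetical, and keeping cases with unrecorded values is a design choice of this sketch.

```python
from typing import Callable


def dynamic_identify(cases, labels, criteria, ask: Callable[[str], str]) -> set:
    """Identify a specimen by asking criteria in a precomputed order.

    `criteria` is assumed to be ranked already (e.g. by information gain).
    An unknown answer ("?") does not block the process: that criterion is
    skipped and the next one is asked. Returns the compatible class names.
    """
    candidates = list(zip(cases, labels))
    for attr in criteria:
        if len({y for _, y in candidates}) <= 1:
            break  # identification reached (or no candidate left)
        answer = ask(attr)
        if answer == "?":
            continue  # unknown: fall back on the next criterion
        # keep matching cases, plus cases where this value was not recorded
        candidates = [(c, y) for c, y in candidates if c.get(attr, "?") in (answer, "?")]
    return {y for _, y in candidates}


cases = [
    {"orifices": "+", "central cone": "-"},
    {"orifices": "+", "central cone": "+"},
    {"orifices": "-", "central cone": "+"},
]
labels = ["A", "B", "C"]
criteria = ["orifices", "central cone"]

answers = {"orifices": "?", "central cone": "-"}
print(dynamic_identify(cases, labels, criteria, lambda a: answers.get(a, "?")))  # {'A'}
```

With a static decision tree, the unknown answer on "orifices" would stop the identification at the root; here the key reorganizes itself around the missing information.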

Knowledge validation:

Nevertheless, inductive learning, as well as repeated use of the questionnaire, remains useful for detecting possible inconsistencies within the case library, thus allowing validation of the descriptive model. The method proposed here gives the expert the ability to update the knowledge base according to the results obtained during classification and/or identification, and thus to improve the descriptive model and case base in an iterative knowledge validation process.
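A typical inconsistency revealed by induction is a pair of cases with identical descriptions but different class names, which no decision tree can separate. The sketch below reports such conflicts; the function find_conflicts and the toy data are hypothetical, and unknown values ("?") are compared literally, which is an assumption.

```python
from collections import defaultdict


def find_conflicts(cases, labels) -> dict:
    """Group cases with identical descriptions and report those whose
    class names disagree."""
    groups = defaultdict(set)
    for case, label in zip(cases, labels):
        key = tuple(sorted(case.items()))  # description as a hashable, order-free key
        groups[key].add(label)
    return {key: names for key, names in groups.items() if len(names) > 1}


cases = [
    {"orifices": "+", "central cone": "-"},
    {"central cone": "-", "orifices": "+"},  # same description, different class
    {"orifices": "-", "central cone": "+"},
]
labels = ["A", "B", "C"]
for description, names in find_conflicts(cases, labels).items():
    print(dict(description), "->", sorted(names))
```

Each reported conflict points the expert either to a description error in a case or to a missing attribute in the descriptive model.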