Multi-file and Multi-module Knowledge Bases

Physically Including Files: the #include Directive

Sometimes we might want to split the knowledge base into separate files and include those files physically into one file.

The reasons can be many:

  • Sharing: the same file may be reused as part of several other files (e.g., a file containing some common definitions).
  • Separation of concerns: the knowledge base may consist of several distinct parts (e.g., students and courses) and it is preferable from the maintenance point of view to not mix them in one file.
  • Organization: the different parts of the knowledge base are maintained by different people and we don't want these different people to simultaneously edit the same file.

To help with such maintenance issues, Ergo provides a preprocessor very similar to the one used in C/C++. In particular, it provides the familiar preprocessing directives like #define, #include, etc. The syntax is highly compatible with the C/C++ preprocessor. For instance:

#define  MIN_GRADE   B
#include "common_defs.ergo"
#include "students.ergo"
#include "courses.ergo"

..... other statements (rules/facts/queries) .....

As a result, all the included files are pulled in to form one unified file.
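As a rough sketch of how a #define interacts with the rest of the host file, suppose the file also contains the fact below. The predicate name min_passing_grade is invented for this illustration and is not part of Ergo or of the files above. Since macro substitution happens before compilation, the compiler sees B wherever the macro name appears:

```ergo
#define MIN_GRADE B

// Hypothetical fact, for illustration only.
// After preprocessing, the compiler sees:  min_passing_grade(B).
min_passing_grade(MIN_GRADE).
```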

When loading such a unified file, Ergo checks if any of the included files is newer than the result of a previous compilation of the unified file. If so, the unified file is recompiled. If not, the load command will just succeed without reloading. Ditto for adding (instead of loading) files that have #include statements.

Unix-style path names (with forward slashes) are understood universally and irrespective of the actual OS under which Ergo runs. So, for portability, it is recommended to use only Unix-style path names relative to the directory of the host file that contains the include-directive (e.g., "../abc/cde.foo").

If one absolutely must use Windows-style path names for some reason, keep in mind that backslashes in path names must be doubled, e.g.,

   #include "..\\foo\\bar.ergo"  

Building Knowledge Bases by Adding Files

We already discussed that when a file is loaded into a module (usually into module main, by default), all the information in that module is wiped out. This is convenient in case we want to initialize a new or reuse an existing module. However, sometimes we might want to accumulate information (both facts and rules) coming from different files and not wipe out what was there before. This is achieved by adding files to modules. The commands for adding are

  • into main:
    • [+file].
    • add{file}.
  • into another module (that module can be main also):
    • [+file>>module].
    • add{file>>module}.

We explain the benefits of loading/adding into modules below.

When building a knowledge base by loading and adding, the first file (usually the largest) is loaded and all the subsequent files are added. The files being loaded or added can #include other files.

Note that loaded rules are faster than added rules. (Not hugely, but non-trivially: sometimes twice as fast.) Therefore, even though the first file can also be added (to an empty module), it is recommended to use loading for as many rules as possible.
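A minimal sketch of this load-then-add pattern, assuming three hypothetical files (students.ergo, courses.ergo, and enrollment.ergo) and a hypothetical module name registrar:

```ergo
// Load the first (usually the largest) file into main.
// This wipes out whatever main contained before.
?- [students].

// Add the remaining files to main without erasing what is already there.
?- [+courses].
?- add{enrollment}.

// The same idea for a module other than main.
?- [students>>registrar].
?- [+courses>>registrar].
```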

Modularizing ErgoAI Knowledge Bases: Concept and Rationale

Another method to split ErgoAI knowledge bases into portions is modularization. Modules are containers of knowledge and are associated with groups of files dynamically, via loading and adding.

Modularization also enables hiding of information. Modularization is orthogonal to the aforesaid file-centric methods (#include, adding to the main module) and should be used together with them.

Reasons for modularization:

  • The aforesaid reasons (sharing, separation, and organization) still apply. However,
    • modularization is used when each module can be viewed as a separate knowledge base in its own right;
    • modules usually include other non-module files (via #include), so modules are a way to organize knowledge at a higher level.

Additional reasons:

  • Proper modularization can greatly boost performance.
  • Different modules interact with each other via strictly controlled interfaces, which reduces unintended interference and the danger of hard-to-find bugs.
  • Modules can be encapsulated, thus further reducing the channels through which they can interact with each other (and reducing the danger of unintended interactions even further).
  • Modules can be debugged separately.

To illustrate the non-interference point, suppose that one file, perm_worker.ergo, defines all kinds of predicates having to do with permanent workers and another, temp_worker.ergo, does the same for temporary workers. Since these files might be written by different people (or the same person at different times that are far apart), they may both use the same predicate, salary, for computing weekly salary:

In perm_worker.ergo:

salary(?Wrkr,?Sal)  :-
        rate(?Wrkr,?R), hours(?Wrkr,?H), 
        sickdays(?Wrkr,?Sick),
        ?Sal \is min(40,?R*?H+?Sick).

In temp_worker.ergo:

salary(?Wrkr,?Sal) :- 
       rate(?Wrkr,?R), hours(?Wrkr,?H), 
       ?Sal \is ?R*?H.

If these two rules are both #include'd in one file or added to the same module then the query

salary(John,?Sal).

(assuming John is some worker) will return two salaries: one as if John were a permanent worker and one as if he were a temporary worker. In general, things can be even more involved. For instance, the hours and the rate predicates can be defined differently for different kinds of workers. In a single-module situation, different files can be hard to reuse because combining them with other files can lead to unexpected results. The development effort also requires a higher degree of coordination to mitigate the most severe consequences of the above problems.

The right solution in such cases is to load the above two files into two different modules. For instance:

[perm_worker>>permwork, temp_worker>>tempwork].

Then we could write the following rule in module main:

salary(?W,?S) :- salary(?W,?S)@permwork ; salary(?W,?S)@tempwork.

The query salary(John,?Sal) will then return a single answer because the information about John will be present only in one module, since John is either a permanent or a temporary worker, but not both. Thus, only one of the calls -- salary(...)@permwork or salary(...)@tempwork -- will return a result.
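To see which module actually supplies the answer, each module can also be queried directly. A small sketch, assuming the two files were loaded into permwork and tempwork as shown above:

```ergo
// Returns an answer only if John's data lives in module permwork.
?- salary(John,?Sal)@permwork.

// Returns an answer only if John's data lives in module tempwork.
?- salary(John,?Sal)@tempwork.
```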

Multi-module Knowledge Bases -- Details

Here we discuss how multi-module knowledge bases are constructed and used.

How are Multi-module Knowledge Bases Constructed?

One module, typically main, assembles the entire system by loading various files into other modules. For instance, a file called top.ergo, containing something like

?-  [file1>>mod1,
     file2>>mod2,
     file2>>mod3
    ].

might be loaded into main. That file will also normally contain one or more embedded queries that will start the ball rolling:

?- query1,
   query2.

Each of the above files may have their own "submodules," which they might load in turn. For instance, file2.ergo might contain its own rules and data plus a loading instruction like

?-  [file2_1>>mod2_1,
     file2_2>>mod2_2
    ].

How Do Modules Refer to Each Other?

Normally literals in the rules, facts, and queries refer to literals in the same module. For instance, in

p(?X) :-  ?Y[name->?X], q(?Y).
q(?X) :-  r(?X,?Y), ?Y[price->10].

all literals refer to the current module, i.e., the module in which the above rules were placed via loading, addition, or insertion. In particular, the literal q(?Y) in the body of the first rule refers to the literal defined by the second rule. However, the whole point of multi-module knowledge bases is to allow literals in the rules/queries of one module to refer to literals defined by the rules in another module. For instance, in

p(?X) :-  ?Y[name->?X], q(?Y), q(?X)@foo.
q(?X) :-  r(?X,?Y), ?Y[price->10].

the literal q(?Y) refers to q(?X) in the head of the second rule in the same module. However, the literal q(?X)@foo refers to the literal q(?X) defined by some rules (not shown) that live in a different module foo. Although both literals use the same predicate name (q), they are in fact completely different predicates and are more than likely defined by very different rules. The @foo part of the literal is called a module reference. A module reference is needed only when referencing a module different from the current one.

Here are the general rules of reference to modules other than the current one:

  • Module references (e.g., @foo) can appear only in rule bodies and queries.
  • Module references are not allowed to appear in rule heads (or fact assertions), because this flies against the very purpose of modularization. Thus, a rule like the following
p(?X)@foo :- ...

will be rejected by the compiler. The purpose of a module is to hold rules (and asserted facts) that define literals that are encapsulated in that module. Putting a module reference in a rule head would thus be tantamount to one module defining information that is supposed to belong to a different module. Breaking encapsulation in this way would defeat the purpose of modules.

If you are tempted to put a module reference @foo in a rule head, then instead arrange to put that rule into the module foo. When doing so, you may need to rewrite that rule, e.g., to add/change module references in that rule's body.

  • A module reference can be a variable, but the variable must be bound by the time the corresponding literal is called. For instance,
p(?X) :-  ?Y[name->?X, module->?Z], q(?Y)@?Z.

Here q(?Y) will be called in a module that will become known only at run time. It is expected that ?Z will get bound to a valid module name after evaluating the preceding literal.

  • Module references can appear as arguments in rule heads, but they must be reified:
Bob(knows, ${student(Bill)@foo}).
  • Modules can change the contents of another module dynamically, subject to the rules of encapsulation -- see below and, especially, the ErgoAI Reasoner User's Manual, Section "Module Encapsulation." This can be achieved by invoking exported predicates and by inserting/deleting facts/rules in that other module, if this is allowed by the encapsulation policy of the module.
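The rules above can be sketched together as follows. The facts, the predicate name pay, and the module names permwork and tempwork are assumptions made for this illustration. It also assumes that both modules define salary and that permwork permits insertion from outside:

```ergo
// Hypothetical facts in the current module: each worker object records
// the name of the module that holds the rest of that worker's data.
w1[name->John, module->permwork].
w2[name->Mary, module->tempwork].

// ?M is bound by the frame literal before salary(...)@?M is called,
// so the module reference is known by the time it is needed.
pay(?Name,?Sal) :- ?W[name->?Name, module->?M], salary(?Name,?Sal)@?M.

// Changing another module dynamically. This succeeds only if the
// encapsulation policy of permwork allows such an update.
?- insert{rate(John,30)@permwork}.
```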

Prolog Modules

The above discussion refers to knowledge bases written in the language of Ergo and to Ergo modules. However, a multi-module knowledge base can also include Prolog (XSB) programs and these programs then reside in XSB modules. XSB has a default module, called usermod, and the other modules may be defined by the XSB system itself or contributed by users. To invoke a predicate that resides in usermod, use the @\prolog (or @\plg) module reference. For instance,

  • writeln(Hello)@\plg /* write a line of output to console, using a Prolog builtin */

As with Ergo module references, Prolog module references can occur only in rule bodies or queries. To refer to an XSB module other than usermod, use \prolog(modname) or \plg(modname). For instance,

  • shell('rm foo.bat')@\prolog(shell) /* performs a shell command, using a Prolog builtin */

There is also a way for Prolog modules to refer to Ergo modules, but this is a bit more complicated. The details are found in the ErgoAI Reasoner User's Manual, Section "Calling Ergo from Prolog."
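As a brief sketch, Prolog module references combine with ordinary Ergo literals in a rule body or a query. The predicate name show_salary is hypothetical, and salary is assumed to be defined in the current module. The second example assumes that XSB's standard basics module, which provides member/2, is available:

```ergo
// Print a worker's salary using the Prolog builtin writeln/1 from usermod.
show_salary(?W) :- salary(?W,?S), writeln(?S)@\plg.

// Call member/2, which resides in the XSB module basics.
?- member(?X,[a,b,c])@\prolog(basics).
```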

Hiding (Encapsulating) Information within Modules

We already discussed the fact that modules are encapsulated in such a way that one module's rules are not allowed to define literals that reside in another module. There is more to it. A module can hide most of the literals (that it defines) from other modules and expose only a few. In this way, other modules will not be able to invoke the hidden literals by mistake. This is normally done in order to prevent hard-to-find errors and force the developers of the other modules to use only interfaces designed by the author(s) of the given module.

Exposing literals is called exporting and is accomplished via the :- export{} directive. The general rules are:

  • If nothing is exported explicitly then it is assumed that everything is exported.
  • Otherwise, only the literals explicitly mentioned in :- export{...} are exported. For instance:
:- export{p(?), ?::?, foo(?,?) >> bar}.

Here only the predicates p(?), foo(?,?), and the subclass relationship ?::? are exported. The predicate p(?) and :: are exported to any module that wishes to use them. In contrast, foo(?,?) is exported only to the module bar. This means that if the above declaration occurs in, say, module abc then trying to invoke foo(...,...)@abc from within any module other than bar will result in an error. Likewise, calling any other (non-exported) predicate in the module abc from within other modules will cause an error.

  • Multiple export statements are allowed, so it is not necessary to cram everything into one statement.
  • Trying to insert (via insert{...}) or delete (via delete{...}) any fact or rule in a module that has an explicit export{...} will result in an error unless such operations are explicitly allowed by the export directive. See the ErgoAI Reasoner User's Manual, Section "Module Encapsulation."
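The export rules can be sketched with a small file that is assumed to be loaded into module abc. The predicate helper and all the facts are invented for this illustration:

```ergo
// Hypothetical contents of a file loaded into module abc.
:- export{p(?)}.
:- export{foo(?,?) >> bar}.

p(?X) :- helper(?X).    // p is visible to every module
helper(1).              // helper is not exported, so other modules cannot call it
foo(1,2).               // foo is visible only to module bar

// Seen from module main:
//   ?- p(?X)@abc.        allowed, since p(?) is exported to everyone
//   ?- helper(?X)@abc.   error, since helper is not exported
//   ?- foo(?X,?Y)@abc.   error, since foo(?,?) is exported only to bar
```

Within abc itself, p can still call helper, because encapsulation restricts only calls coming from other modules.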