4 - Advanced topics

Input, output, Logging

You can use the Python input() function (raw_input() in Python 2) to request data from the user :

pyDatalog.create_terms('X,Y, quacks, input')
quacks(X) <= (input('Does a '+X+' quack ? (Y/N) ')=='Y')

Please note that the prompt string contains a variable, X. Without it, e.g. in X==input('Enter your name : '), the user would be prompted when the clause is declared, not when it is used during query evaluation.
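
The difference between declaration-time and query-time evaluation can be pictured with a toy model. The sketch below is not pyDatalog's implementation: Var, Deferred, lift and fake_input are hypothetical names introduced here, and fake_input stands in for input() so that the example runs without a console. It assumes only the behaviour described above, namely that a call whose arguments contain a logic variable is postponed, while a call with constant arguments runs immediately.

```python
class Var:
    """Hypothetical stand-in for a logic variable such as X."""
    def __init__(self, name):
        self.name = name


class Deferred:
    """A call that is postponed until its variables are bound."""
    def __init__(self, func, args):
        self.func = func
        self.args = args

    def evaluate(self, bindings):
        # Replace each variable by its bound value, then make the call.
        values = [bindings[a.name] if isinstance(a, Var) else a
                  for a in self.args]
        return self.func(*values)


def lift(func):
    """Postpone func when any argument is a variable; otherwise call it now."""
    def wrapper(*args):
        if any(isinstance(a, Var) for a in args):
            return Deferred(func, args)
        return func(*args)
    return wrapper


calls = []  # records every prompt that was actually shown


@lift
def fake_input(prompt, subject=""):
    # Stand-in for input(): records the prompt instead of reading the console.
    calls.append(prompt + str(subject))
    return "Y"


X = Var("X")

# No variable in the arguments: the "prompt" fires at declaration time.
eager = fake_input("Enter your name : ")
declared_calls = list(calls)

# A variable in the arguments: nothing happens until evaluation.
lazy = fake_input("Does it quack ? ", X)
calls_before_query = list(calls)
answer = lazy.evaluate({"X": "duck"})
```

In this model, the first call records its prompt as soon as the line is executed, whereas the second one records nothing until evaluate() supplies a value for X.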

In Python 3, you can use the print() function to show a result on the console:

pyDatalog.create_terms('X, Y, ok, print')
ok(X) <= (0 < X) & (Y==print(X))

In Python 2, print is a statement, not a function. So, you would need to create your own print_() function:

def print_(x):
    print(x)
pyDatalog.create_terms('X, Y, ok, print_')
ok(X) <= (0 < X) & (Y==print_(X))

To get a full trace of pyDatalog's reasoning, add the following instructions before a query:

import logging
from pyDatalog import pyEngine
pyEngine.Logging = True

logging.basicConfig(level=logging.INFO)
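
If the trace is long, it may be easier to read from a file. The sketch below uses only the standard logging module and assumes, as the basicConfig() call above suggests, that pyDatalog emits its trace through that module at INFO level. The file name is hypothetical, and the logging.info() call is a stand-in for a line that the engine would emit during a query.

```python
import logging
import os
import tempfile

# Hypothetical location for the trace file.
trace_path = os.path.join(tempfile.gettempdir(), "pydatalog_trace.log")

# Send INFO-level records from the root logger to the file.
handler = logging.FileHandler(trace_path, mode="w")
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(handler)

# Stand-in for a trace line emitted by the engine during a query.
logging.info("example trace line")

# Detach the handler once the trace is no longer needed.
root.removeHandler(handler)
handler.close()

with open(trace_path) as f:
    trace = f.read()
```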

Thread safety and multi-models

A Python program may start several threads. Each thread should have these statements to initialize pyDatalog :

from pyDatalog import pyDatalog, Logic
Logic() # initializes the pyDatalog engine

Each thread can then define its own set of clauses, and run queries against them. See the ThreadSafe.py example.

Alternatively, threads can share the same set of clauses, by following these steps :

  1. the initial thread defines the set of clauses as usual
  2. the initial thread passes the Logic(True) object to the threads it creates.
  3. the new thread executes Logic(arg), where arg is the Logic(True) object it received from the calling thread.

Please note that, once passed, the set of clauses should not be changed (such changes are not thread safe).
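
The three steps above can be pictured with a toy model built on threading.local. It is not pyDatalog's implementation: Model, new_model(), current_model() and use_model() are hypothetical stand-ins for Logic(), Logic(True) and Logic(arg) respectively, and a plain set of tuples plays the role of the set of clauses.

```python
import threading

_state = threading.local()  # each thread sees its own "current model"


class Model:
    """Hypothetical stand-in for a set of clauses."""
    def __init__(self):
        self.facts = set()


def new_model():
    """Like Logic(): start an empty model in the current thread."""
    _state.model = Model()


def current_model():
    """Like Logic(True): return the current thread's model."""
    return _state.model


def use_model(model):
    """Like Logic(arg): make an existing model current in this thread."""
    _state.model = model


# Step 1: the initial thread defines its facts as usual.
new_model()
current_model().facts.add(("parent", "bill", "John Adams"))

# Step 2: the initial thread passes its model to the thread it creates.
shared = current_model()
results = []


def worker(model):
    # Step 3: the new thread installs the model it received, then only reads it.
    use_model(model)
    results.append(("parent", "bill", "John Adams") in current_model().facts)


thread = threading.Thread(target=worker, args=(shared,))
thread.start()
thread.join()
```

The worker only reads the shared model, in line with the note that the set of clauses should not be changed once it has been passed.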

Finally, a program may switch from one set of clauses to another :

Logic() # creates an empty set of clauses for use in the current thread
# add first set of clauses here
first = Logic(True) # save the current set of clauses in variable 'first'
Logic() # first is not affected by this statement
# define the second set of clauses here
second = Logic(True) # save it for later use
Logic(first) # now use first in the current thread
# queries will now run against the first set of rules

See the Multi-Model.py example for illustration.

Dynamic datalog statements

Some applications need to construct datalog statements dynamically, i.e. at run-time. The following interface can then be used.

A Python program can assert a fact in the in-memory Datalog database using assert_fact(). The first argument is the name of the predicate :

from pyDatalog.pyDatalog import assert_fact, load, ask
# + parent(bill, 'John Adams')
assert_fact('parent', 'bill','John Adams') 

A Python program calls the load() function to add dynamic clauses :

# specify what an ancestor is
load("""
    ancestor(X,Y) <= parent(X,Y)
    ancestor(X,Y) <= parent(X,Z) & ancestor(Z,Y)
""")

It can now query the database of facts :

# prints a set with one element : the ('bill', 'John Adams') tuple
print(ask('parent(bill,X)')) 
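
Because load() takes the clauses as text, they can be assembled with ordinary string handling before being loaded. The helper below is a hypothetical illustration (transitive_closure_clauses is not part of pyDatalog); it only builds the text and leaves the call to load() as a comment.

```python
def transitive_closure_clauses(closure, base):
    """Return the text of two clauses defining `closure` as the
    transitive closure of the binary predicate `base`."""
    return "\n".join([
        "{c}(X,Y) <= {b}(X,Y)".format(c=closure, b=base),
        "{c}(X,Y) <= {b}(X,Z) & {c}(Z,Y)".format(c=closure, b=base),
    ])


program = transitive_closure_clauses("ancestor", "parent")
print(program)
# The resulting text could then be passed to load(program).
```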

Predicate Resolvers written in pure python

A predicate such as p(X,Y) or (a.p[X]==Y) can be resolved using a custom-made Python method instead of logical clauses. Such Python resolvers can be used to implement connectors to non-relational databases, or to improve the speed of specific clauses. A resolver takes precedence over (and replaces) any clauses for that predicate in the pyDatalog knowledge base.

Unprefixed resolvers

An unprefixed resolver is used to resolve an unprefixed predicate. It is defined using the @pyDatalog.predicate() decorator, and its function name is the predicate name followed by the arity (below, p2 resolves the predicate p with 2 arguments):

@pyDatalog.predicate()
def p2(X,Y):
    yield (1,2)
    yield (2,3)
print(pyDatalog.ask('p(1,Y)')) # the result equals set([(1, 2)])

This simple resolver returns all possible results, which pyDatalog further filters to satisfy the query. Often, the resolver will use the values of its X and Y arguments to return a smaller (if possible, exact) set of results. Each argument has an .is_const() method, which returns True if the argument is a constant and False if it is an unbound variable. If it is a constant, its value can be obtained with .id (e.g. X.id).
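
A resolver that narrows its results in this way might look like the sketch below. To keep it runnable without pyDatalog, the @pyDatalog.predicate() decorator is omitted and Arg is a hypothetical stand-in for the argument objects that pyDatalog passes in; it mimics only the .is_const() method and the .id attribute described above. PAIRS is illustrative data.

```python
PAIRS = [(1, 2), (2, 3), (2, 4)]  # illustrative data behind the predicate

_UNBOUND = object()  # marker for "no value", so that None can be a constant


class Arg:
    """Hypothetical stand-in for a resolver argument."""
    def __init__(self, value=_UNBOUND):
        self.id = value

    def is_const(self):
        return self.id is not _UNBOUND


def p2(X, Y):
    """Yield only the (x, y) pairs compatible with the bound arguments."""
    for x, y in PAIRS:
        if X.is_const() and X.id != x:
            continue  # X is bound to a different value
        if Y.is_const() and Y.id != y:
            continue  # Y is bound to a different value
        yield (x, y)


only_x = list(p2(Arg(2), Arg()))    # X bound, Y free
both = list(p2(Arg(2), Arg(4)))     # both bound
neither = list(p2(Arg(), Arg()))    # both free
```

With real data, the same tests on is_const() would typically decide which lookup or database query to run, rather than filtering a list in memory.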

Prefixed resolvers

A prefixed resolver is defined in a class. The following resolver is equivalent to, and replaces, the clause Employee.salary_class[X] = Employee.salary[X]//1000.

class Employee(pyDatalog.Mixin):
    @classmethod
    def _pyD_salary_class2(cls, X, Y):
        if X.is_const():
            # X is bound to an instance: compute its salary class directly
            yield (X.id, X.id.salary//1000)
        else:
            # X is unbound: enumerate all instances of the class
            for X in pyDatalog.metaMixin.__refs__[cls]:
                Y1 = X.salary // 1000
                if not Y.is_const() or Y1==Y.id:
                    yield (X, Y1)

This method will be called to resolve Employee.salary_class(X,Y) and (Employee.salary_class[X]==Y) queries. Each resolver must have the same number of arguments as the arity of its predicate. Note that the arity is appended to the end of the method name.

The list of instances of a class cls is available in pyDatalog.metaMixin.__refs__[cls]. For SQLAlchemy classes, queries can be run on cls.session.

A class can also have (or inherit) a generic resolver, called _pyD_query. It receives the name of the predicate and the list of the predicate's arguments.

# generic resolver
class Employee(pyDatalog.Mixin):
    @classmethod
    def _pyD_query(cls, pred_name, args):
        if pred_name == 'Employee.salary_class':
            if args[0].is_const():
                yield (args[0].id, args[0].id.salary//1000)
            else:
                for X in pyDatalog.metaMixin.__refs__[cls]:
                    Y1 = X.salary // 1000
                    yield (X, Y1)
        else:
            raise AttributeError

Performance

The best performance is obtained with pypy, an implementation of Python with a JIT compiler. Please note that:

  • pypy needs a few seconds to run pyDatalog at top performance;
  • performance of pypy is significantly better when executed from the command line / shell than in debug mode (e.g. in IDE or Eclipse/pyDev);
  • you should not measure performance of pypy with cProfile, as it prevents the JIT compiler from being effective.

A pyDatalog program will never beat the same program written in pure python, but:

    • a pyDatalog program run with pypy will have approximately the same performance as a program interpreted (uncompiled) by CPython;
    • there is a wide class of applications where speed of development is more important than speed of execution;
    • there is a wide class of applications where other software components will be the real bottleneck (e.g. I/O or remote databases);
    • thanks to increasing speed of computers, pyDatalog will run fast enough in many cases;
    • you can first prototype a program in pyDatalog, then rewrite the few performance-critical clauses in pure python, using python resolvers.

Performance tip:

    • The results of the evaluation of clauses in a query are memoized only for the duration of the query. For better performance, combine consecutive queries into one using the "&" operator, or create a clause for the combined result.
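
The effect of per-query memoization can be pictured with a toy model. This is not pyDatalog's engine: run_query() and expensive_clause() are hypothetical names, and a dictionary that lives for a single call plays the role of the memo table.

```python
evaluations = 0  # counts how often the costly clause is really evaluated


def expensive_clause(n):
    """Stand-in for a clause that is costly to evaluate."""
    global evaluations
    evaluations += 1
    return n * n


def run_query(goals):
    """Evaluate a list of goals with a memo table that lasts one query."""
    memo = {}  # discarded when the query returns
    results = []
    for n in goals:
        if n not in memo:
            memo[n] = expensive_clause(n)
        results.append(memo[n])
    return results


# Two consecutive queries: the shared subgoal is evaluated twice.
run_query([12])
run_query([12])
separate = evaluations

# One combined query: the shared subgoal is evaluated once.
evaluations = 0
combined_result = run_query([12, 12])
combined = evaluations
```

In this model, the two separate queries evaluate the costly clause twice, while the combined query evaluates it once and reuses the memoized result.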