Online tutorial 0.13

This interactive online tutorial explains the Datalog language implemented by pyDatalog. Other tutorials in the left menu explain how Datalog can be used within a Python program.

Feel free to experiment with our Datalog console. Just copy and paste individual statement in your personal and persistent IPython console below (use Ctrl-Shift-V to paste, or use Paste in the browser menu)

click to open

(click to open the console in a separate window)

Datalog Tutorial

First, import the pyDatalog module:

from pyDatalog import pyDatalog

Then, declare the pyDatalog atoms (aka symbols) that we are going to use in this tutorial.

pyDatalog.create_atoms('parent,bill,ancestor,descendents,manager,first_remainder, X,Y,Z,N,N1,F,  factorial,odd,even, _split, format_')

The Datalog engine store facts and clauses. The following statement inserts the fact that Bill is the parent of John Adams:

+ parent(bill,'John Adams') # bill is the parent of John Adams

A fact can have zero argument (e.g. + it_rains()). Note that unquoted names must start with lower cases, and that quotes must be used for string with spaces or starting with an uppercase letter. Comments, line continuation and constant strings adhere to the lexical rules of python.

You can now query our database of facts :

print(parent(bill,X)) # X is 'John Adams'

Note that variables in the query start with an upper case (or underscore '_'), as is customary in prolog. A query returns distinct values only (similar to "Select Distinct" in SQL).

Logic clauses make the engine smarter. The following program explains recursively when X is an ancestor of Y : either X is the parent of Y, or X is the parent of a person Z who is an ancestor of Y. Note that <= means "if" (not the arithmetic comparison), and that the "or" is implemented by 2 clauses.

# specify what an ancestor is
ancestor(X,Y) <= parent(X,Y)
ancestor(X,Y) <= parent(X,Z) & ancestor(Z,Y)

print(ancestor(bill, X)) # X is 'John Adams'

It is possible to define literals with a mix of facts and clauses. Literals with the same name but with a different number of arguments, e.g. p(X) and p(X,Y)are considered different : each will have its own separate set of facts and clauses.

A Function is a special type of literal defining a unique value for each combination of the elements between brackets:

# the manager of bill is John Adams. Bill has only one manager.
+ (manager[bill]=='John Adams') 

print(manager[bill]==X) # X is 'John Adams'

If another fact specifies another manager for Bill, it will replace 'John Adams'.

The following example illustrates the use of comparisons.

# use expressions and recursion to evaluate factorials
+ (factorial[1] == 1)
(factorial[N] == F) <= (N > 1) & (F == N*factorial[N-1])

print(factorial[3]==N) # N is 6

Expressions can use the 5 operators (+,-,*,/, //). Note that :

    • equality/comparison predicates must be placed between parenthesis
    • the left hand side of the equality/comparison must be a variable or function
    • the variables on the right hand side of comparisons must be bound (pyDatalog does not solve equations !)

This 8-Queen solution illustrates how to use the 'in' operator for solving discrete constraint problems.

A variable can contain a list or tree (aka nested list).

# (nested) list
first_remainder(X, Y, Z) <= (Y==X[0]) & (Z==X[1:])
print(first_remainder((1,2), Y, Z)) # Y is 1, Z is (2,)

Built-in functions are:

    • (Y==len_(X)) : Y is the length of the list bound to X
    • (Y==range_(X)) : Y is a list containing numbers from 0 to X-1
    • (Y==format_(F, X1, X2, ..)) : Y is the formatting of X1, X2, ... according to the F specifications (see format() in python documentation)

For example:

print((Y==5) & (X==format_('Y is {}', Y))) # X is 'Y is 5'

A clause can specify an aggregate function in a clause head:

# the list of descendents, sorted by their name, and separated by ','

(descendents[X]==concat_(Y, order_by=Y, sep=',')) <= ancestor(X,Y) 

print(descendents[bill]==X) # X is 'John Adams'

The available aggregate functions are:

    • len_ (P[X]==len_(Y)) <= body : P[X] is the count of values of Y (associated to X by the body of the clause)
    • sum_ (P[X]==sum_(Y, for_each=Z)) <= body : P[X] is the sum of Y for each Z. (Z is used to distinguish possibly identical Y values)
    • concat_ (P[X]==concat_(Y, order_by=Z, sep=',')) <= body : same as 'sum' but for string. The strings are sorted by Z, and separated by ','.
    • min_, max_ (P[X]==min_(Y, order_by=Z)) <= body : P[X] is the minimum (or maximum) of Y sorted by Z.
    • rank_ (P[X]==rank_(for_each=Y, order_by=Z)) <= body : P[X] is the sequence number of X in the list of Y values when the list is sorted by Z.
    • running_sum_ (P[X]==running_sum_(N, for_each=Y, order_by=Z)) <= body : P[X] is the sum of the values of N, for each Y that are before or equal to X when Y's are sorted by Z.

The named arguments must be specified in that order. X and the named arguments can be replaced by a list of variables, to represent more complex grouping. Variables in 'order_by' can be preceded by '-' for descending sort order. If the aggregation function does not depend on a variable, use a constant (e.g. P[None] == len_(Y)).

Functions can be defined by a formula:

Employee.salary_class[X] = Employee.salary[X]//1000

Note the "=" (instead of "==") This is equivalent to (function[X] == C) <= (C == expression). There can be only one formula for the function.

You can use lambda functions in expression as well. For example:

# specify how a string can be split, reversibly
_split(X, Y,Z) <= (X == Y+'-'+Z)
_split(X, Y,Z) <= (Y == (lambda X: X.split('-')[0])) & (Z == (lambda X: X.split('-')[1]))
print(_split("a-b",X,Y)) # X is 'a', Y is 'b'

The following example illustrates that the depth of recursion is not limited.

# unlimited depth of recursion
+ even(0)
even(N) <= (N > 0) & odd(N-1)
odd(N) <= (N > 0) & even(N-1)

print(even(2000)) # prints [()], i.e. one valid result with 0 variable

The next tutorial explains how to use Datalog in a Python program.