Documentation of version 8.1
Features
pyDatalog embeds logic programming in python:
- you can assert facts in a datalog knowledge base, and query it.
- you can define attributes of python classes through logic clauses, and use logic queries on python objects (see example below).
- you can use logic clauses to query relational database via SQLAlchemy, the object-relational mapping and data access layer for python.
- you can use an interactive console to run datalog queries
More specifically:
- you can assert logic clauses of the form
head <= body
, where head is a single literal with one or more variables, and body is a list of literals separated by '&' :p(X) <= q(X) & R(X,Y)
- each variable in the head must also appear in the body. The body may include equality and comparison predicates :
N1 == N-1
, orN > 0
, and contain lambda expressions - you can negate a literal in the body:
not(p(X))
- the argument of a literal cannot be another predicate :
p(q(a))
is not allowed - you can define logic functions by a formula:
p[X]=expression
(similar to an assignment) - you can define aggregate functions, e.g.
p[X]=len(Y) <= body
: len, sum, concat, min, max - you can use prefixed literals and functions (e.g.
Employee.name[X]==Y
) to refer to python objects in logic clauses, or to run logic queries in python programs
pyDatalog is fast and lightweight:
- it is based on the SLG resolution algorithm with memoization
- it uses indexes for datalog facts
- it can run on pypy (python with Just-In-Time compiler)
- it has less than 2 K lines of code.
The depth of recursion is not limited by the stack size.
pyDatalog is open-source (LGPL). It has been tested with python 2.7, python 3.2, pypy 1.9, SQLAlchemy 0.7.
Example
Let's define an Employee class, create some objects, and run logic queries on them.
Define python class and business rules
- from pyDatalog import pyDatalog
- class Employee(pyDatalog.Mixin): # --> Employee inherits the pyDatalog capability to use logic clauses
- def __init__(self, name, manager, salary): # method to initialize Employee instances
- super(Employee, self).__init__() # calls the initialization method of the Mixin class
- self.name = name
- self.manager = manager # direct manager of the employee, or None
- self.salary = salary # monthly salary of the employee
- def __repr__(self): # specifies how to display an Employee
- return self.name
- @pyDatalog.program() # indicates that the following method contains pyDatalog clauses
- def _():
- # the salary class N of employee X is computed as a function of his/her salary
- # this statement is a logic equality, not an assignment !
- Employee.salary_class[X] = Employee.salary[X]//1000
- # all the indirect managers Y of employee X are derived from his manager, recursively
- Employee.indirect_manager(X,Y) <= (Employee.manager[X]==Y)
- Employee.indirect_manager(X,Y) <= (Employee.manager[X]==Z) & Employee.indirect_manager(Z,Y)
Create python objects
- John = Employee('John', None, 6800)
- Mary = Employee('Mary', John, 6300)
Query the objects using the datalog engine
The following python statements implicitly use the datalog clauses in the class definition.
Notice the similarity to a pyDatalog query.
- # What is the salary class of Mary ?
- print(Mary.salary_class)
- # who has a salary of 6300 ?
- X = pyDatalog.Variable()
- Employee.salary[X]==6300
- print(X) # prints (Mary,)
- # Who are the employees with a salary class of 6 ?
- Employee.salary_class[X]==6
- print(X) # prints (John, Mary)
- # who are the indirect managers of Mary (recursively) ?
- Employee.indirect_manager(Mary, X)
- print(X) # prints (John,)
This similar example shows how to write a similar program to add business logic to a relational database using SQLAlchemy.
This one is using a pure datalog knowledge base. These modes can be mixed freely.
Tutorial 1 - Datalog
This tutorial explains datalog for use in python programs. You can also run console.py
(in the pyDatalog package) to try the examples (with some adaptations : just enter the datalog statements that are the arguments of pyDatalog.load or pyDatalog.ask)
The datalog engine store facts and clauses. The following statement inserts the fact that Bill is the parent of John Adams:
from pyDatalog import pyDatalog
pyDatalog.load("""
# bill is the parent of John Adams
+ parent(bill,'John Adams')
""")
Note that the unquoted names must start with lower cases, and that quotes must be used for string with spaces or starting with an uppercase letter. Comments, line continuation and constant strings adhere to the lexical rules of python.
A more efficient way to assert the fact is:
# bill is the parent of John Adams
pyDatalog.assert_fact('parent', 'bill','John Adams')
You can now query our database of facts :
# prints a set with one element : the ('bill', 'John Adams') tuple
print(pyDatalog.ask('parent(bill,X)'))
Note that variables in the query start with an upper case, as is customary in prolog.
Logic clauses make the engine smarter. The following program explains recursively when X is an ancestor of Y : either X is the parent of Y, or X is the parent of a person Z who is an ancestor of Y.
# specify what an ancestor is
pyDatalog.load("""
ancestor(X,Y) <= parent(X,Y)
ancestor(X,Y) <= parent(X,Z) & ancestor(Z,Y)
""")
# prints a set with one element: the ('bill', 'John Adams') tuple
print(pyDatalog.ask('ancestor(bill, X)'))
A Function is a special type of clause, with unicity for each element between brackets:
# function
pyDatalog.load("""
# the manager of bill is John Adams. Bill has only one manager.
+ (manager[bill]=='John Adams')
""")
# prints a set with one element : the ('bill', 'John Adams') tuple
print(pyDatalog.ask('manager[bill]==X'))
If another fact specifies another manager for bill, it will replace 'John Adams'.
The following example illustrates the use of comparisons.
# use expressions and recursion to evaluate factorials
pyDatalog.load("""
(factorial[N] == F) <= (N < 2) & (F==1)
(factorial[N] == F) <= (N > 1) & (N1 == N-1) & (F == N*factorial[N1])
""")
# prints a set with one element: (3,6)
print(pyDatalog.ask('factorial[3]==N'))
Expressions can use the 5 operators (+,-,*,/, //). Note that :
- equality/comparison predicates must be placed between parenthesis
- the left hand side of the equality/comparison must be a variable
- the variables on the right hand side must be bound (pyDatalog does not solve equations !)
This N-Queen solution illustrates how to use the 'in' operator for solving discrete constraint problems.
You can use lambda functions in expression as well. For example:
# specify how a string can be split, reversibly
pyDatalog.load("""
split(X, Y,Z) <= (X == Y+'-'+Z)
split(X, Y,Z) <= (Y == (lambda X: X.split('-')[0])) & (Z == (lambda X: X.split('-')[1]))
""")
# prints a set with one tuple : ('a-b', 'a', 'b')
print(pyDatalog.ask('split("a-b",X,Y)'))
A clause can specify an aggregate function in a clause head:
# aggregate function
pyDatalog.load("""
(ancestors[X]==concat(Y, key=Y, sep=',')) <= ancestor(X,Y) # the list of ancestors, sorted by their name, and separated by ','
""")
# prints a set with one element: the ('bill', 'John Adams') tuple
print(pyDatalog.ask('ancestors[bill]==X'))
The available aggregate functions are:
- len
P[X]==len(Y) <= body
: P[X] is the count of values of Y (associated to X by the body of the clause) - sum
P[X]==sum(Y, key=Z) <= body
: P[X] is the sum of Y for each Z. - concat
P[X]==concat(Y, key=Z, sep=',') <= body
: same as 'sum' but for string. The strings are sorted by Z, and separated by ','. - min, max
P[X]==min(Y, key=Z) <= body
: P[X] is the minimum of Y sorted by Z.
X and key can be replaced by a list of variables, to represent more complex grouping. Variables in 'key' can be preceded by '-' for descending sort order.
Functions can be defined by a formula:
# formula
pyDatalog.load("""
Employee.salary_class[X] = Employee.salary[X]//1000
""")
This is equivalent to (function[X] == C) <= (C == expression)
. There can be only one formula for the function.
The following example illustrates that the depth of recursion is not limited.
# unlimited depth of recursion
pyDatalog.load("""
+ even('0')
even(N) <= (N > 0) & (N1==N-1) & odd(N1)
odd(N) <= (N > 0) & (N2==N-1) & even(N2)
""")
# prints a set with one element: the ('2000',) tuple
print(pyDatalog.ask('even(2000)'))
Trace mode shows on the console the sequence of facts established by the datalog engine to resolve a query:
from pyDatalog import pyEngine
pyEngine.Trace = True
Tutorial 2 - Datalog in Python
In this tutorial, we'll show that Python objects can be used in logic fact and clauses, and that clauses can be invoked from simple python statements.
In the examples, we'll use the following class definition of Employee.
from pyDatalog import pyDatalog
class Employee(object):
def __init__(self, name, manager, salary): # method to initialize Employee instances
super(Employee, self).__init__() # calls the initialization method of the Mixin class
self.name = name
self.manager = manager # direct manager of the employee, or None
self.salary = salary # monthly salary of the employee
def __repr__(self): # specifies how to display an Employee
return self.name
John = Employee('John', None, 6800)
Mary = Employee('Mary', John, 6300)
Let's assert the fact that the object Mary has a car:
pyDatalog.assert_fact('has_car', Mary)
print(pyDatalog.ask('has_car(X)')) # prints a set with one element : the (Mary) tuple
This fact is stored in pyDatalog's knowledge base (not in the Employee object). All the principles explained in Tutorial 1 apply to such predicates whose terms are python objects.
Instead of asserting facts in the pyDatalog knowledge base, it is often more efficient for logic predicates to directly refer to the attribute of an object. For example, we may want to have a clause concerning the employee salary, as defined in the Employee class.
This requires 2 things:
- that the class inherits from pyDatalog.Mixin
- that the predicate is a function prefixed by the class name, e.g.
( Employee.salary[X] == Y )
, both in the datalog clause and the query in python.
from pyDatalog import pyDatalog
class Employee(pyDatalog.Mixin): # --> Employee inherits the pyDatalog capability to use logic clauses
<same definition of Employee as above>
John = Employee('John', None, 6800)
Mary = Employee('Mary', John, 6300)
print(pyDatalog.ask('Employee.salary[X]==6300')) # prints a set with 1 element : the (Mary, 6300) tuple
In fact, thanks to the pyDatalog.Mixin, there is a much shorter way to write queries:
# who has a salary of 6300 ?
X = pyDatalog.Variable()
Employee.salary[X]==6300 # notice the similarity to a pyDatalog query
print(X) # prints (Mary,)
If the attribute is None, then the literal is considered false:
# who is the manager of John
Employee.manager[John]==X
print(X) # prints []
Prefixed predicates can appear in the head and body of a logic clause. For example:
pyDatalog.load("""
# all the indirect managers Y of employee X are derived from his manager, recursively
Employee.indirect_manager(X,Y) <= (Employee.manager[X]==Y)
Employee.indirect_manager(X,Y) <= (Employee.manager[X]==Z) & Employee.indirect_manager(Z,Y)
""")
# Who are the employees with a salary class of 6 ?
Employee.salary_class[X]==6
print(X) # prints (John, Mary)
If a logic clause defines an existing python class attribute, the datalog engine will use the logic clause (instead of reading the attribute) to resolve queries on it.
It can be convenient to use aliases for class names, as shown below:
pyDatalog.load("""
E = Employee # defines an alias for Employee
Employee.salary_class[X] = E.salary[X]//1000
""")
The pyDatalog Mixin also simplifies the creation of clauses for a class, using an anonymous function:
class Employee(pyDatalog.Mixin): # --> Employee inherits the pyDatalog capability to use logic clauses
<same definition of Employee as above>
@pyDatalog.program() # indicates that the following method contains pyDatalog clauses
def _():
# the salary class N of employee X is computed as a function of his/her salary
Employee.salary_class[X] = E.salary[X]//1000
The Employee class can be persisted to a relational database using SQLAlchemy, as shown in this example.
Reference
Methods and class
The pyDatalog module has the following methods :
- assert_fact(predicate_name, *terms) : asserts "predicate_name(terms[0], terms[1], ...)"
- load(code) : where code is a string containing a datalog program, as described in the section below. This method is used to add facts and clauses to the datalog database.
- program() : a function decorator that loads the datalog program contained in the decorated function.
- ask(query) : where query is a string containing a literal, i.e. a predicate with variable and/or constant terms. It returns an instance of Answer, or None.
- clear() : removes all facts and clauses from the datalog database.
The Answer class contains the following attributes and methods (the methods may be modified to meet specific needs):
- name : name of the predicate that was queried
- arity : arity of the predicate
- answers : a list of tuples that satisfy the query. The length of each tuple is the same as the arity.
- __eq__(other) : facilitates comparison to another set of tuples
- __str__() : prints the answers
Grammar of a datalog program
In theory, a code string can contain any python code, as defined in the official grammar of Python. However, function and variable names (that are not reserved by python) are considered Datalog symbols, have special meaning, and should appear only in statements that follow this subset of the python grammar :
The terminal symbols in this grammar are defined in BNF as follows :
- simple_predicate ::= [a-fA-F_] [0-9a-fA-F_]*
- constant ::= [a-f] [0-9a-fA-F_]* | python literals
- variable ::= [A-F_] [0-9a-fA-F_]*
Please note:
- "=" defines a logic formula, while "==" appears in a fact, clause or query and must always be surrounded by parenthesis
- an aggregate function can only appear in the head of a clause
- an inequality must be surrounded by parenthesis, and can only appear in the body of a clause
- that the order of literals in a body is significant. In particular, the right hand side of an (in)equality must be bound by a previous literal, and the left hand side of an inequality must also be bound (otherwise, no result is returned. No error is raised).
Aggregate functions:
- len
P[X]==len(Y) <= body
: P[X] is the count of values of Y (associated to X by the body of the clause) - sum
P[X]==sum(Y, key=Z) <= body
: P[X] is the sum of Y for each Z. Z is sometimes necessary to distinguish identical Y values. - concat
P[X]==concat(Y, key=Z, sep=',') <= body
: same as 'sum' but for string. The strings are sorted by Z, and separated by ','. - min, max
P[X]==min(Y, key=Z) <= body
: P[X] is the minimum of Y, sorted by Z.
X and key can be replaced by a list of variables, to represent more complex grouping. Variables in 'key' can be preceded by '-' for descending sort order.
Beware that, when loading a datalog program, a symbol could become a constant. For example,
from pyDatalog import pyDatalog
@pyDatalog.program()
def _():
+ a(i)
for i in range(3):
+ b(i)
print(pyDatalog.ask("a('i')")) # prints a set with 1 element : the ('i',) tuple
print(pyDatalog.ask("b(X)")) # prints a set with 3 elements, each containing one element : 0, 1 or 2
The for
loop assigns an integer to i, which is inserted as a constant in + b(i)
.