pyDatalog embeds logic programming in python:
More specifically:
head <= body
, where head is a single literal with one or more variables, and body is a list of literals separated by '&' : p(X) <= q(X) & R(X,Y)
N1 == N-1
, or N > 0
, and contain lambda expressionsnot(p(X))
p(q(a))
is not allowedp[X]=expression
(similar to an assignment)p[X]=len(Y) <= body
: len, sum, concat, min, maxEmployee.name[X]==Y
) to refer to python objects in logic clauses, or to run logic queries in python programspyDatalog is fast and lightweight:
The depth of recursion is not limited by the stack size.
pyDatalog is open-source (LGPL). It has been tested with python 2.7, python 3.2, pypy 1.9, SQLAlchemy 0.7.
Let's define an Employee class, create some objects, and run logic queries on them.
Define python class and business rules
Create python objects
Query the objects using the datalog engine
The following python statements implicitly use the datalog clauses in the class definition.
Notice the similarity to a pyDatalog query.
This similar example shows how to write a similar program to add business logic to a relational database using SQLAlchemy.
This one is using a pure datalog knowledge base. These modes can be mixed freely.
This tutorial explains datalog for use in python programs. You can also run console.py
(in the pyDatalog package) to try the examples (with some adaptations : just enter the datalog statements that are the arguments of pyDatalog.load or pyDatalog.ask)
The datalog engine store facts and clauses. The following statement inserts the fact that Bill is the parent of John Adams:
from pyDatalog import pyDatalog
pyDatalog.load("""
# bill is the parent of John Adams
+ parent(bill,'John Adams')
""")
Note that the unquoted names must start with lower cases, and that quotes must be used for string with spaces or starting with an uppercase letter. Comments, line continuation and constant strings adhere to the lexical rules of python.
A more efficient way to assert the fact is:
# bill is the parent of John Adams
pyDatalog.assert_fact('parent', 'bill','John Adams')
You can now query our database of facts :
# prints a set with one element : the ('bill', 'John Adams') tuple
print(pyDatalog.ask('parent(bill,X)'))
Note that variables in the query start with an upper case, as is customary in prolog.
Logic clauses make the engine smarter. The following program explains recursively when X is an ancestor of Y : either X is the parent of Y, or X is the parent of a person Z who is an ancestor of Y.
# specify what an ancestor is
pyDatalog.load("""
ancestor(X,Y) <= parent(X,Y)
ancestor(X,Y) <= parent(X,Z) & ancestor(Z,Y)
""")
# prints a set with one element: the ('bill', 'John Adams') tuple
print(pyDatalog.ask('ancestor(bill, X)'))
A Function is a special type of clause, with unicity for each element between brackets:
# function
pyDatalog.load("""
# the manager of bill is John Adams. Bill has only one manager.
+ (manager[bill]=='John Adams')
""")
# prints a set with one element : the ('bill', 'John Adams') tuple
print(pyDatalog.ask('manager[bill]==X'))
If another fact specifies another manager for bill, it will replace 'John Adams'.
The following example illustrates the use of comparisons.
# use expressions and recursion to evaluate factorials
pyDatalog.load("""
(factorial[N] == F) <= (N < 2) & (F==1)
(factorial[N] == F) <= (N > 1) & (N1 == N-1) & (F == N*factorial[N1])
""")
# prints a set with one element: (3,6)
print(pyDatalog.ask('factorial[3]==N'))
Expressions can use the 5 operators (+,-,*,/, //). Note that :
This N-Queen solution illustrates how to use the 'in' operator for solving discrete constraint problems.
You can use lambda functions in expression as well. For example:
# specify how a string can be split, reversibly
pyDatalog.load("""
split(X, Y,Z) <= (X == Y+'-'+Z)
split(X, Y,Z) <= (Y == (lambda X: X.split('-')[0])) & (Z == (lambda X: X.split('-')[1]))
""")
# prints a set with one tuple : ('a-b', 'a', 'b')
print(pyDatalog.ask('split("a-b",X,Y)'))
A clause can specify an aggregate function in a clause head:
# aggregate function
pyDatalog.load("""
(ancestors[X]==concat(Y, key=Y, sep=',')) <= ancestor(X,Y) # the list of ancestors, sorted by their name, and separated by ','
""")
# prints a set with one element: the ('bill', 'John Adams') tuple
print(pyDatalog.ask('ancestors[bill]==X'))
The available aggregate functions are:
P[X]==len(Y) <= body
: P[X] is the count of values of Y (associated to X by the body of the clause)P[X]==sum(Y, key=Z) <= body
: P[X] is the sum of Y for each Z.P[X]==concat(Y, key=Z, sep=',') <= body
: same as 'sum' but for string. The strings are sorted by Z, and separated by ','.P[X]==min(Y, key=Z) <= body
: P[X] is the minimum of Y sorted by Z.X and key can be replaced by a list of variables, to represent more complex grouping. Variables in 'key' can be preceded by '-' for descending sort order.
Functions can be defined by a formula:
# formula
pyDatalog.load("""
Employee.salary_class[X] = Employee.salary[X]//1000
""")
This is equivalent to (function[X] == C) <= (C == expression)
. There can be only one formula for the function.
The following example illustrates that the depth of recursion is not limited.
# unlimited depth of recursion
pyDatalog.load("""
+ even('0')
even(N) <= (N > 0) & (N1==N-1) & odd(N1)
odd(N) <= (N > 0) & (N2==N-1) & even(N2)
""")
# prints a set with one element: the ('2000',) tuple
print(pyDatalog.ask('even(2000)'))
Trace mode shows on the console the sequence of facts established by the datalog engine to resolve a query:
from pyDatalog import pyEngine
pyEngine.Trace = True
In this tutorial, we'll show that Python objects can be used in logic fact and clauses, and that clauses can be invoked from simple python statements.
In the examples, we'll use the following class definition of Employee.
from pyDatalog import pyDatalog
class Employee(object):
def __init__(self, name, manager, salary): # method to initialize Employee instances
super(Employee, self).__init__() # calls the initialization method of the Mixin class
self.name = name
self.manager = manager # direct manager of the employee, or None
self.salary = salary # monthly salary of the employee
def __repr__(self): # specifies how to display an Employee
return self.name
John = Employee('John', None, 6800)
Mary = Employee('Mary', John, 6300)
Let's assert the fact that the object Mary has a car:
pyDatalog.assert_fact('has_car', Mary)
print(pyDatalog.ask('has_car(X)')) # prints a set with one element : the (Mary) tuple
This fact is stored in pyDatalog's knowledge base (not in the Employee object). All the principles explained in Tutorial 1 apply to such predicates whose terms are python objects.
Instead of asserting facts in the pyDatalog knowledge base, it is often more efficient for logic predicates to directly refer to the attribute of an object. For example, we may want to have a clause concerning the employee salary, as defined in the Employee class.
This requires 2 things:
( Employee.salary[X] == Y )
, both in the datalog clause and the query in python.from pyDatalog import pyDatalog
class Employee(pyDatalog.Mixin): # --> Employee inherits the pyDatalog capability to use logic clauses
<same definition of Employee as above>
John = Employee('John', None, 6800)
Mary = Employee('Mary', John, 6300)
print(pyDatalog.ask('Employee.salary[X]==6300')) # prints a set with 1 element : the (Mary, 6300) tuple
In fact, thanks to the pyDatalog.Mixin, there is a much shorter way to write queries:
# who has a salary of 6300 ?
X = pyDatalog.Variable()
Employee.salary[X]==6300 # notice the similarity to a pyDatalog query
print(X) # prints (Mary,)
If the attribute is None, then the literal is considered false:
# who is the manager of John
Employee.manager[John]==X
print(X) # prints []
Prefixed predicates can appear in the head and body of a logic clause. For example:
pyDatalog.load("""
# all the indirect managers Y of employee X are derived from his manager, recursively
Employee.indirect_manager(X,Y) <= (Employee.manager[X]==Y)
Employee.indirect_manager(X,Y) <= (Employee.manager[X]==Z) & Employee.indirect_manager(Z,Y)
""")
# Who are the employees with a salary class of 6 ?
Employee.salary_class[X]==6
print(X) # prints (John, Mary)
If a logic clause defines an existing python class attribute, the datalog engine will use the logic clause (instead of reading the attribute) to resolve queries on it.
It can be convenient to use aliases for class names, as shown below:
pyDatalog.load("""
E = Employee # defines an alias for Employee
Employee.salary_class[X] = E.salary[X]//1000
""")
The pyDatalog Mixin also simplifies the creation of clauses for a class, using an anonymous function:
class Employee(pyDatalog.Mixin): # --> Employee inherits the pyDatalog capability to use logic clauses
<same definition of Employee as above>
@pyDatalog.program() # indicates that the following method contains pyDatalog clauses
def _():
# the salary class N of employee X is computed as a function of his/her salary
Employee.salary_class[X] = E.salary[X]//1000
The Employee class can be persisted to a relational database using SQLAlchemy, as shown in this example.
The pyDatalog module has the following methods :
The Answer class contains the following attributes and methods (the methods may be modified to meet specific needs):
In theory, a code string can contain any python code, as defined in the official grammar of Python. However, function and variable names (that are not reserved by python) are considered Datalog symbols, have special meaning, and should appear only in statements that follow this subset of the python grammar :
The terminal symbols in this grammar are defined in BNF as follows :
Please note:
Aggregate functions:
P[X]==len(Y) <= body
: P[X] is the count of values of Y (associated to X by the body of the clause)P[X]==sum(Y, key=Z) <= body
: P[X] is the sum of Y for each Z. Z is sometimes necessary to distinguish identical Y values.P[X]==concat(Y, key=Z, sep=',') <= body
: same as 'sum' but for string. The strings are sorted by Z, and separated by ','.P[X]==min(Y, key=Z) <= body
: P[X] is the minimum of Y, sorted by Z.X and key can be replaced by a list of variables, to represent more complex grouping. Variables in 'key' can be preceded by '-' for descending sort order.
Beware that, when loading a datalog program, a symbol could become a constant. For example,
from pyDatalog import pyDatalog
@pyDatalog.program()
def _():
+ a(i)
for i in range(3):
+ b(i)
print(pyDatalog.ask("a('i')")) # prints a set with 1 element : the ('i',) tuple
print(pyDatalog.ask("b(X)")) # prints a set with 3 elements, each containing one element : 0, 1 or 2
The for
loop assigns an integer to i, which is inserted as a constant in + b(i)
.