pyDatalog embeds logic programming in python:
More specifically:
head <= body, where head is a single literal with one or more variables, and body is a list of literals separated by '&' : p(X) <= q(X) & R(X,Y)N1 == N-1, or N > 0, and contain lambda expressionsnot(p(X))p(q(a)) is not allowedp[X]=expression (similar to an assignment)p[X]=len(Y) <= body : len, sum, concat, min, maxEmployee.name[X]==Y) to refer to python objects in logic clauses, or to run logic queries in python programspyDatalog is fast and lightweight:
The depth of recursion is not limited by the stack size.
pyDatalog is open-source (LGPL). It has been tested with python 2.7, python 3.2, pypy 1.9, SQLAlchemy 0.7.
Let's define an Employee class, create some objects, and run logic queries on them.
Define python class and business rules
Create python objects
Query the objects using the datalog engine
The following python statements implicitly use the datalog clauses in the class definition.
Notice the similarity to a pyDatalog query.
This similar example shows how to write a similar program to add business logic to a relational database using SQLAlchemy.
This one is using a pure datalog knowledge base. These modes can be mixed freely.
This tutorial explains datalog for use in python programs. You can also run console.py (in the pyDatalog package) to try the examples (with some adaptations : just enter the datalog statements that are the arguments of pyDatalog.load or pyDatalog.ask)
The datalog engine store facts and clauses. The following statement inserts the fact that Bill is the parent of John Adams:
from pyDatalog import pyDatalogpyDatalog.load(""" # bill is the parent of John Adams + parent(bill,'John Adams') """)Note that the unquoted names must start with lower cases, and that quotes must be used for string with spaces or starting with an uppercase letter. Comments, line continuation and constant strings adhere to the lexical rules of python.
A more efficient way to assert the fact is:
# bill is the parent of John AdamspyDatalog.assert_fact('parent', 'bill','John Adams') You can now query our database of facts :
# prints a set with one element : the ('bill', 'John Adams') tupleprint(pyDatalog.ask('parent(bill,X)')) Note that variables in the query start with an upper case, as is customary in prolog.
Logic clauses make the engine smarter. The following program explains recursively when X is an ancestor of Y : either X is the parent of Y, or X is the parent of a person Z who is an ancestor of Y.
# specify what an ancestor ispyDatalog.load(""" ancestor(X,Y) <= parent(X,Y) ancestor(X,Y) <= parent(X,Z) & ancestor(Z,Y)""")# prints a set with one element: the ('bill', 'John Adams') tupleprint(pyDatalog.ask('ancestor(bill, X)')) A Function is a special type of clause, with unicity for each element between brackets:
# functionpyDatalog.load(""" # the manager of bill is John Adams. Bill has only one manager.
+ (manager[bill]=='John Adams') """)# prints a set with one element : the ('bill', 'John Adams') tupleprint(pyDatalog.ask('manager[bill]==X'))If another fact specifies another manager for bill, it will replace 'John Adams'.
The following example illustrates the use of comparisons.
# use expressions and recursion to evaluate factorialspyDatalog.load(""" (factorial[N] == F) <= (N < 2) & (F==1) (factorial[N] == F) <= (N > 1) & (N1 == N-1) & (F == N*factorial[N1])""")# prints a set with one element: (3,6)print(pyDatalog.ask('factorial[3]==N')) Expressions can use the 5 operators (+,-,*,/, //). Note that :
This N-Queen solution illustrates how to use the 'in' operator for solving discrete constraint problems.
You can use lambda functions in expression as well. For example:
# specify how a string can be split, reversiblypyDatalog.load(""" split(X, Y,Z) <= (X == Y+'-'+Z) split(X, Y,Z) <= (Y == (lambda X: X.split('-')[0])) & (Z == (lambda X: X.split('-')[1]))""")# prints a set with one tuple : ('a-b', 'a', 'b')print(pyDatalog.ask('split("a-b",X,Y)')) A clause can specify an aggregate function in a clause head:
# aggregate functionpyDatalog.load(""" (ancestors[X]==concat(Y, key=Y, sep=',')) <= ancestor(X,Y) # the list of ancestors, sorted by their name, and separated by ','""")# prints a set with one element: the ('bill', 'John Adams') tupleprint(pyDatalog.ask('ancestors[bill]==X')) The available aggregate functions are:
P[X]==len(Y) <= body : P[X] is the count of values of Y (associated to X by the body of the clause)P[X]==sum(Y, key=Z) <= body : P[X] is the sum of Y for each Z.P[X]==concat(Y, key=Z, sep=',') <= body : same as 'sum' but for string. The strings are sorted by Z, and separated by ','.P[X]==min(Y, key=Z) <= body : P[X] is the minimum of Y sorted by Z.X and key can be replaced by a list of variables, to represent more complex grouping. Variables in 'key' can be preceded by '-' for descending sort order.
Functions can be defined by a formula:
# formulapyDatalog.load(""" Employee.salary_class[X] = Employee.salary[X]//1000 """)This is equivalent to (function[X] == C) <= (C == expression). There can be only one formula for the function.
The following example illustrates that the depth of recursion is not limited.
# unlimited depth of recursionpyDatalog.load(""" + even('0') even(N) <= (N > 0) & (N1==N-1) & odd(N1) odd(N) <= (N > 0) & (N2==N-1) & even(N2)""")# prints a set with one element: the ('2000',) tupleprint(pyDatalog.ask('even(2000)')) Trace mode shows on the console the sequence of facts established by the datalog engine to resolve a query:
from pyDatalog import pyEnginepyEngine.Trace = TrueIn this tutorial, we'll show that Python objects can be used in logic fact and clauses, and that clauses can be invoked from simple python statements.
In the examples, we'll use the following class definition of Employee.
from pyDatalog import pyDatalogclass Employee(object): def __init__(self, name, manager, salary): # method to initialize Employee instances super(Employee, self).__init__() # calls the initialization method of the Mixin class self.name = name self.manager = manager # direct manager of the employee, or None self.salary = salary # monthly salary of the employee def __repr__(self): # specifies how to display an Employee return self.nameJohn = Employee('John', None, 6800)Mary = Employee('Mary', John, 6300)Let's assert the fact that the object Mary has a car:
pyDatalog.assert_fact('has_car', Mary)print(pyDatalog.ask('has_car(X)')) # prints a set with one element : the (Mary) tupleThis fact is stored in pyDatalog's knowledge base (not in the Employee object). All the principles explained in Tutorial 1 apply to such predicates whose terms are python objects.
Instead of asserting facts in the pyDatalog knowledge base, it is often more efficient for logic predicates to directly refer to the attribute of an object. For example, we may want to have a clause concerning the employee salary, as defined in the Employee class.
This requires 2 things:
( Employee.salary[X] == Y ) , both in the datalog clause and the query in python.from pyDatalog import pyDatalogclass Employee(pyDatalog.Mixin): # --> Employee inherits the pyDatalog capability to use logic clauses <same definition of Employee as above> John = Employee('John', None, 6800)Mary = Employee('Mary', John, 6300)print(pyDatalog.ask('Employee.salary[X]==6300')) # prints a set with 1 element : the (Mary, 6300) tupleIn fact, thanks to the pyDatalog.Mixin, there is a much shorter way to write queries:
# who has a salary of 6300 ?X = pyDatalog.Variable()Employee.salary[X]==6300 # notice the similarity to a pyDatalog queryprint(X) # prints (Mary,)If the attribute is None, then the literal is considered false:
# who is the manager of JohnEmployee.manager[John]==Xprint(X) # prints []Prefixed predicates can appear in the head and body of a logic clause. For example:
pyDatalog.load(""" # all the indirect managers Y of employee X are derived from his manager, recursively Employee.indirect_manager(X,Y) <= (Employee.manager[X]==Y) Employee.indirect_manager(X,Y) <= (Employee.manager[X]==Z) & Employee.indirect_manager(Z,Y) """)# Who are the employees with a salary class of 6 ?Employee.salary_class[X]==6print(X) # prints (John, Mary)If a logic clause defines an existing python class attribute, the datalog engine will use the logic clause (instead of reading the attribute) to resolve queries on it.
It can be convenient to use aliases for class names, as shown below:
pyDatalog.load(""" E = Employee # defines an alias for Employee Employee.salary_class[X] = E.salary[X]//1000 """)The pyDatalog Mixin also simplifies the creation of clauses for a class, using an anonymous function:
class Employee(pyDatalog.Mixin): # --> Employee inherits the pyDatalog capability to use logic clauses <same definition of Employee as above> @pyDatalog.program() # indicates that the following method contains pyDatalog clauses def _(): # the salary class N of employee X is computed as a function of his/her salary Employee.salary_class[X] = E.salary[X]//1000The Employee class can be persisted to a relational database using SQLAlchemy, as shown in this example.
The pyDatalog module has the following methods :
The Answer class contains the following attributes and methods (the methods may be modified to meet specific needs):
In theory, a code string can contain any python code, as defined in the official grammar of Python. However, function and variable names (that are not reserved by python) are considered Datalog symbols, have special meaning, and should appear only in statements that follow this subset of the python grammar :
The terminal symbols in this grammar are defined in BNF as follows :
Please note:
Aggregate functions:
P[X]==len(Y) <= body : P[X] is the count of values of Y (associated to X by the body of the clause)P[X]==sum(Y, key=Z) <= body : P[X] is the sum of Y for each Z. Z is sometimes necessary to distinguish identical Y values.P[X]==concat(Y, key=Z, sep=',') <= body : same as 'sum' but for string. The strings are sorted by Z, and separated by ','.P[X]==min(Y, key=Z) <= body : P[X] is the minimum of Y, sorted by Z.X and key can be replaced by a list of variables, to represent more complex grouping. Variables in 'key' can be preceded by '-' for descending sort order.
Beware that, when loading a datalog program, a symbol could become a constant. For example,
from pyDatalog import pyDatalog@pyDatalog.program()def _(): + a(i) for i in range(3): + b(i) print(pyDatalog.ask("a('i')")) # prints a set with 1 element : the ('i',) tupleprint(pyDatalog.ask("b(X)")) # prints a set with 3 elements, each containing one element : 0, 1 or 2The for loop assigns an integer to i, which is inserted as a constant in + b(i).