Reference 0.12

In-line datalog statements and queries

This functionality has been significantly expanded in 0.12.0, and is now the recommended way to use pyDatalog. See the sample files.

An in-line datalog statement is a Python statement:

    • that follows the syntax of a datalog statement (see grammar below),
    • where datalog constants, variables and unprefixed predicates have already been declared at module level with pyDatalog.create_atoms() (not inside a function or class),
    • and where the prefix of prefixed predicates is the name of a class inheriting from pyDatalog.Mixin.

Similarly, an in-line query is a Python statement that follows the syntax of a body (see grammar below). An in-line query returns a Query object that behaves like a Python list of lists: each element is itself a list containing one value for each variable in the query, in the order in which the variables appear in the query. In addition, the Query object supports the >= X operator, which returns the first value of variable X in the result of the query.

After an in-line query, each variable in the query contains the list of its possible values. Note that the result of the query is computed when it is first needed, not in the statement that defines the query:

p(X)
print(X)  # the p(X) query is resolved here
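The deferred evaluation can be pictured with a small plain-Python sketch. LazyResult and resolve_p are hypothetical names used only for illustration; this is not how pyDatalog implements Query, it only shows a result that is computed on first use and then cached.

```python
class LazyResult:
    """Hypothetical stand-in for a query result that is resolved on first use."""

    def __init__(self, resolver):
        self._resolver = resolver  # zero-argument callable that computes the rows
        self._rows = None          # None means "not resolved yet"

    @property
    def data(self):
        if self._rows is None:
            self._rows = self._resolver()  # first access: resolve and cache
        return self._rows

    def __iter__(self):
        return iter(self.data)


calls = []


def resolve_p():
    """Hypothetical resolver for p(X): records each call, returns one row per answer."""
    calls.append("p(X)")
    return [[1], [2]]


q = LazyResult(resolve_p)  # defining the query does not resolve it
assert calls == []
print(list(q))             # first use resolves it: [[1], [2]]
print(list(q))             # cached: the resolver is not called again
assert calls == ["p(X)"]
```

The cache is what makes "resolved when first needed" cheap to repeat: later reads reuse the rows computed on the first access.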

Grammar of a pyDatalog program

The terminal symbols in this grammar are defined in BNF as follows:

    • simple_predicate ::= [a-zA-Z_] [0-9a-zA-Z_]*
    • constant ::= [a-z] [0-9a-zA-Z_]* | Python literals
    • variable ::= [A-Z_] [0-9a-zA-Z_]*, thus starting with an uppercase letter (or an underscore)
    • Note : words starting with _pyD_ are reserved for pyDatalog
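The terminal symbols can be checked with Python regular expressions. The sketch below is illustrative only: the pattern names and the classify helper are hypothetical, it assumes names use ASCII letters, digits and underscores, and it does not model Python literals as constants.

```python
import re

# Hypothetical regular expressions for the terminal symbols listed above.
# Python literals (numbers, quoted strings, ...) are also constants but are not modelled here.
VARIABLE = re.compile(r"[A-Z_][0-9a-zA-Z_]*")
SIMPLE_PREDICATE = re.compile(r"[a-zA-Z_][0-9a-zA-Z_]*")
CONSTANT = re.compile(r"[a-z][0-9a-zA-Z_]*")
RESERVED_PREFIX = "_pyD_"


def classify(word):
    """Return the set of terminal symbols that `word` matches lexically."""
    if word.startswith(RESERVED_PREFIX):
        return {"reserved"}
    patterns = {
        "variable": VARIABLE,
        "simple_predicate": SIMPLE_PREDICATE,
        "constant": CONSTANT,
    }
    return {kind for kind, pattern in patterns.items() if pattern.fullmatch(word)}


print(classify("X"))       # variable and simple_predicate (set order may vary)
print(classify("parent"))  # simple_predicate and constant
print(classify("_pyD_x"))  # {'reserved'}
```

Because the categories overlap (parent matches both simple_predicate and constant), the spelling of a word alone does not decide its role; its position in the statement does.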

Please note:

    • unary operators (+, -) are not supported in expressions: write 0-X instead of -X.
    • although the order of pyDatalog statements is irrelevant, the order of literals within a body is significant:
      • an expression used as a key of a function must be bound by a previous literal (otherwise no result is returned)
      • the right-hand side of X==expr must be bound by a previous literal (otherwise, no result is returned)
      • the right-hand side of p[X]<expr must be bound (otherwise, no result is returned)
      • the left-hand and right-hand sides of X<expr comparisons must be bound (otherwise, an error is raised)
      • the variables in a negated body must be either bound by a previous literal, or not used later in the body
    • an inequality must be surrounded by parentheses, and can only appear in the body of a clause
    • an aggregate function can only appear in the head of a clause. Note the _ prefix (e.g. _sum), which distinguishes it from the Python built-in function of the same name
    • "=" defines a logic formula, while "==" appears in a fact, clause or query and must always be surrounded by parentheses
    • the head of a clause can only contain constants or variables (not expressions). Each variable in the head must also appear in the body
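Why the order of literals matters can be shown with a toy left-to-right evaluator in plain Python. This is a simplified model assumed for illustration, not pyDatalog's resolution engine: the helpers lit_q, lit_eq and solve and the facts q(1), q(2), q(3) are hypothetical. Each literal transforms a list of variable bindings, and X==expr drops any binding in which the right-hand side is still unbound.

```python
# Toy model, not pyDatalog's engine: a body is a list of literals evaluated
# left to right; each literal maps a list of bindings (dicts) to a new list.
Q_FACTS = [1, 2, 3]  # hypothetical facts q(1), q(2), q(3)


def lit_q(var):
    """Literal q(var): keep or extend each binding with every matching q fact."""
    def step(bindings):
        return [{**b, var: v} for b in bindings for v in Q_FACTS
                if var not in b or b[var] == v]
    return step


def lit_eq(var, expr_vars, expr):
    """Literal (var == expr): evaluated only if all variables of expr are bound."""
    def step(bindings):
        out = []
        for b in bindings:
            if not all(v in b for v in expr_vars):
                continue  # right-hand side unbound: this binding yields no result
            value = expr(*(b[v] for v in expr_vars))
            if var not in b or b[var] == value:
                out.append({**b, var: value})
        return out
    return step


def solve(body):
    bindings = [{}]
    for literal in body:
        bindings = literal(bindings)
    return bindings


# q(Y) & (X == Y+1): Y is bound first, so X can be computed
print(solve([lit_q("Y"), lit_eq("X", ["Y"], lambda y: y + 1)]))
# (X == Y+1) & q(Y): Y is unbound when X == Y+1 is reached, so no result
print(solve([lit_eq("X", ["Y"], lambda y: y + 1), lit_q("Y")]))
```

Swapping the two literals changes the result from three answers to none, which is the behaviour the rules above warn about.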

Aggregate functions:

    • _len (P[X]==_len(Y)) <= body : P[X] is the count of values of Y (associated with X by the body of the clause)
    • _sum (P[X]==_sum(Y, for_each=Z)) <= body : P[X] is the sum of Y for each Z. (Z is used to distinguish possibly identical Y values)
    • _min, _max (P[X]==_min(Y, order_by=Z)) <= body : P[X] is the minimum (or maximum) of Y sorted by Z.
    • concat (P[X]==concat(Y, order_by=Z, sep=',')) <= body : same as _sum, but for strings. The strings are sorted by Z and separated by sep (here ',').
    • rank (P[X]==rank(for_each=Y, order_by=Z)) <= body : P[X] is the sequence number of X in the list of Y values when the list is sorted by Z.
    • running_sum (P[X]==running_sum(N, for_each=Y, order_by=Z)) <= body : P[X] is the sum of the values of N for each Y that is before or equal to X when the Y values are sorted by Z.
    • The named arguments must be specified in the given order. X and the named arguments can be lists of variables (instead of a single variable) to represent more complex groupings. Variables in order_by arguments can be preceded by '-' for descending sort order. If the aggregate does not depend on any key variable, use a constant as the key (e.g. P[None] == _len(Y)).
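The grouping behind _len, _sum, _min and concat can be mimicked in plain Python over the rows a body would produce. The sketch below assumes the body yields hypothetical (X, Y, Z) rows and uses hypothetical helper names (agg_len, agg_sum, agg_min, agg_concat); it illustrates the descriptions above and is not pyDatalog's implementation. Two readings are assumptions: _len counts distinct Y values, and _min(Y, order_by=Z) is the Y of the row with the smallest Z.

```python
from collections import defaultdict

# Hypothetical body results: each row binds (X, Y, Z).
ROWS = {
    ("a", 10, 1),
    ("a", 10, 2),  # same Y as the row above, distinguished by Z
    ("a", 5, 3),
    ("b", 7, 1),
}


def group(rows):
    """Group the (Y, Z) pairs by X, the key of P[X]."""
    groups = defaultdict(list)
    for x, y, z in rows:
        groups[x].append((y, z))
    return groups


def agg_len(rows):
    # P[X] == _len(Y): number of distinct Y values for each X
    return {x: len({y for y, _ in g}) for x, g in group(rows).items()}


def agg_sum(rows):
    # P[X] == _sum(Y, for_each=Z): each distinct (Y, Z) pair counts once
    return {x: sum(y for y, _ in g) for x, g in group(rows).items()}


def agg_min(rows):
    # P[X] == _min(Y, order_by=Z): the Y that comes first when sorted by Z
    return {x: min(g, key=lambda yz: yz[1])[0] for x, g in group(rows).items()}


def agg_concat(rows, sep=","):
    # P[X] == concat(Y, order_by=Z, sep=','): Y values as text, sorted by Z
    return {x: sep.join(str(y) for y, _ in sorted(g, key=lambda yz: yz[1]))
            for x, g in group(rows).items()}


print(agg_len(ROWS))     # a -> 2 (10 and 5), b -> 1
print(agg_sum(ROWS))     # a -> 25 (10 + 10 + 5), b -> 7
print(agg_min(ROWS))     # a -> 10 (Z = 1), b -> 7
print(agg_concat(ROWS))  # a -> '10,10,5', b -> '7'
```

Dropping Z from the rows would merge the two (a, 10) rows and the sum for a would fall to 15, which is why for_each is needed to keep identical Y values apart.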

Methods and classes

The pyDatalog module has the following methods :

    • create_atoms(*args) : adds "logic atoms" to the scope of the caller. create_atoms must be called at module level (not in a function or class definition). It can take any number of arguments: each argument is a string containing the names of the logic atoms to be created, separated by commas. A created logic atom is a pyDatalog.Variable if its name starts with an uppercase letter, and a pyParser.Symbol otherwise. create_atoms also creates symbols for the aggregate functions.
    • assert_fact(predicate_name, *terms) : asserts predicate_name(terms[0], terms[1], ...)
    • retract_fact(predicate_name, *terms) : retracts predicate_name(terms[0], terms[1], ...)
    • load(code) : where code is a string containing a datalog program, as described in the section above. This method can be used to add facts and clauses to the datalog database.
    • program() : a function decorator that loads the datalog program contained in the decorated function.
    • predicate() : a function decorator that declares a custom predicate resolver written in python
    • ask(query) : where query is a string containing one or more literal(s) joined by the & operator. It returns an instance of pyDatalog.Answer, or None.
    • clear() : removes all facts and clauses from the datalog database.
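The calling conventions of assert_fact, retract_fact and clear (a predicate name followed by its terms) can be illustrated with a toy in-memory fact base. ToyFactStore is hypothetical and is not pyDatalog: its ask takes a predicate name and a pattern (None standing for a variable) and returns a set of tuples or None, whereas pyDatalog.ask takes a query string and returns an Answer.

```python
from collections import defaultdict


class ToyFactStore:
    """Hypothetical in-memory fact base mirroring assert_fact, retract_fact, ask and clear.

    A sketch of the call semantics only; not pyDatalog's implementation.
    """

    def __init__(self):
        self.facts = defaultdict(set)  # predicate name -> set of term tuples

    def assert_fact(self, predicate_name, *terms):
        self.facts[predicate_name].add(terms)

    def retract_fact(self, predicate_name, *terms):
        self.facts[predicate_name].discard(terms)

    def ask(self, predicate_name, *pattern):
        # None in the pattern stands for a variable; returns None when nothing matches
        matches = {t for t in self.facts.get(predicate_name, ())
                   if len(t) == len(pattern)
                   and all(p is None or p == v for p, v in zip(pattern, t))}
        return matches or None

    def clear(self):
        self.facts.clear()


store = ToyFactStore()
store.assert_fact("parent", "bill", "john")   # like + parent('bill', 'john')
store.assert_fact("parent", "bill", "mary")
store.retract_fact("parent", "bill", "john")  # removes the fact again
print(store.ask("parent", "bill", None))      # {('bill', 'mary')}
store.clear()
print(store.ask("parent", None, None))        # None
```

Using None as the variable marker keeps the sketch short, at the cost of not being able to query for a literal None value.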

An instance of the pyDatalog.Variable class has the following attributes and methods:

    • data : list of possible values for the variable. Updated after each in-line query.
    • v(self) : returns the first value of the variable, or None
    • the methods inherited from collections.UserList

An instance of the pyParser.Query class is returned by an in-line query and has the following attributes and methods:

    • data : list of the lists of values that satisfy the query
    • __eq__(self, other) : returns True if the result of the query is equal to other, after converting both of them to sets
    • __ge__(self, other) : returns the first value of other, where other is a pyDatalog.Variable appearing in the query
    • the methods inherited from collections.UserList
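The list-like behaviour described for Variable and Query can be approximated with collections.UserList. The sketch below uses hypothetical classes ToyVariable and ToyQuery; it mirrors the attributes listed above (data, v(), __eq__, __ge__) but is not pyDatalog's code. Filling each variable's data with its column of the result follows the description of in-line queries, and is an assumption about how the two objects interact.

```python
from collections import UserList


class ToyVariable(UserList):
    """Hypothetical stand-in for pyDatalog.Variable: a list of possible values."""

    def __init__(self, name):
        super().__init__()
        self.name = name

    def v(self):
        # first value of the variable, or None
        return self.data[0] if self.data else None


class ToyQuery(UserList):
    """Hypothetical stand-in for a query result: a list of lists of values."""

    def __init__(self, variables, rows):
        super().__init__([list(row) for row in rows])
        self.variables = variables  # in order of appearance in the query
        for i, var in enumerate(variables):
            var.data = [row[i] for row in self.data]  # each variable gets its column

    def __eq__(self, other):
        # equal if both contain the same rows, ignoring order and duplicates
        return {tuple(row) for row in self.data} == {tuple(row) for row in other}

    def __ge__(self, var):
        # query >= X: first value of X in the result, or None.
        # Variables are matched by identity, since UserList equality compares contents.
        column = next(i for i, v in enumerate(self.variables) if v is var)
        return self.data[0][column] if self.data else None


X, Y = ToyVariable("X"), ToyVariable("Y")
q = ToyQuery([X, Y], [(1, "a"), (2, "b")])
print(q == [[2, "b"], [1, "a"]])  # True: compared as sets
print(q >= Y)                     # a
print(X.data, X.v())              # [1, 2] 1
```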

An instance of the pyDatalog.Answer class is returned by pyDatalog.ask(query) and has the following attributes and methods:

    • name : name of the predicate that was queried
    • arity : arity of the predicate
    • answers : a list of tuples that satisfy the query. The length of each tuple is the same as the arity.
    • __eq__(other) : facilitates comparison to another set of tuples
    • __str__() : prints the answers

pyEngine has the following attributes for debugging queries:

    • Trace = True : shows derived facts when they are established
    • Debug = True : deeper trace of pyDatalog's reasoning

Note

Beware that, when loading a datalog program, a symbol could become a constant. For example,

from pyDatalog import pyDatalog

@pyDatalog.program()
def _():
    + a(i)
    for i in range(3):
        + b(i)
print(pyDatalog.ask("a('i')"))  # prints a set with 1 element: the ('i',) tuple
print(pyDatalog.ask("b(X)"))  # prints a set of 3 one-element tuples: (0,), (1,) and (2,)

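In + a(i), i has not been assigned yet, so it is treated as a symbol and stored as the constant 'i'. The for loop then assigns an integer to i, which is inserted as a constant in + b(i).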