Thoughts


Want to comment?  Send me an email, or try joining http://groups.google.com/group/planning-list/subscribe.

Ever Try to Break Your Programming Language?

posted Apr 23, 2014, 1:31 PM by William Cushing

Normally, in programming, the computer is right, and you are wrong.  It's a very humbling experience to learn how to program; one is forced to become painfully aware of just how true "to err is human" is.  (To an only slightly lesser extent, math classes, particularly math exams, also teach this meta-lesson.)

But you know what?  Sometimes the computer is wrong.

Here's a fun experiment: try evaluating 2^{2^{100}}.  That's two to the power of two to the power of one hundred.  In a LISP: (expt 2 (expt 2 100)).  In C: pow(2, pow(2, 100)).  In Windows Calculator (scientific mode): 2, x^y, 100, =, MS, 2, x^y, MR, =.

SBCL: Immediately aborts with "Can't represent result of left shift". (expt 3 (expt 3 100)) makes it hang.
Racket: Aborts with "Out of memory" (once it runs out of memory).
Lispworks Personal Edition: Hangs (runs for at least a half hour, then I killed it).
Windows Calculator: Rejects the "program" with "Invalid Input" on the attempt to recall 2^100 into the exponent operand of x^y.
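For the record, here's a sketch of what the C version actually does, assuming IEEE doubles: pow neither aborts nor hangs.  The inner call produces 2^100 exactly (a power of two, well within double's exponent range), and the outer call silently overflows to infinity:

#include <stdio.h>
#include <math.h>

int main(void) {
  double inner = pow(2.0, 100.0);   /* exactly representable: ~1.27e30 */
  double outer = pow(2.0, inner);   /* overflows double: prints "inf"  */
  printf("2^100     = %g\n", inner);
  printf("2^(2^100) = %g\n", outer);
  return 0;
}

Arguably the scariest failure mode of the bunch: no error at all, just inf.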

What would OpenGL do?  What would Postscript do?  (Does Bash do exponentiation?) ...

How many different ways do systems break when you ask them to compute some representation of this astronomical number?

A Hack for Integrating EMACS+SLIME+SBCL+CYGWIN

posted Apr 18, 2014, 3:53 PM by William Cushing   [ updated Apr 18, 2014, 3:56 PM ]

The difficulty with this combination is that SBCL on Windows uses Windows proper to read files, not Cygwin.  Emacs on Windows, at least in my case, is the Cygwin build.  Cygwin and Windows think of the filesystem somewhat differently.  This means Slime won't start, because it doesn't know how to tell the target lisp (SBCL) where to find the code for Slime's back-end (Swank).

The following is an egregiously hacky solution ... enjoy!

Firstly, patch slime.el so that slime-init-command actually uses the half-provided functionality for addressing this problem:

(defun slime-init-command (port-filename _coding-system)
  "Return a string to initialize Lisp."
  (let ((loader (if (file-name-absolute-p slime-backend)
                    slime-backend
                  (concat slime-path slime-backend))))
    ;; Return a single form to avoid problems with buffered input.
    (format "%S\n\n"
            `(progn
               (load ,(slime-to-lisp-filename (expand-file-name loader)) ;; WC: convert to target's notion of filenames
                     :verbose t)
               (funcall (read-from-string "swank-loader:init"))
               (funcall (read-from-string "swank:start-server")
                        ,(slime-to-lisp-filename port-filename)))))) ;; WC: convert to target's notion of filenames

Secondly, edit ~/.emacs to exploit the now working hooks:

;; COMPLETELY NON-PORTABLE
;; (Note: `concat', not the Common Lisp `concatenate', which Emacs Lisp
;; lacks unless cl is loaded.)
(setq slime-to-lisp-filename-function
      (lambda (x)
        (let ((prefix (substring x 0 3)))
          (if (string-equal prefix "/c/")
              (substring x 2)
            (concat "/cygwin64" x)))))
(setq slime-from-lisp-filename-function
      (lambda (y)
        (let ((prefix "C:/cygwin64/"))
          (if (string-equal (substring y 0 (length prefix)) prefix)
              (substring y (- (length prefix) 1))
            (concat "/c/" (substring y 3))))))

Thirdly, set up a symbolic link to make Cygwin references to Windows files (a) prettier, and (b) of the form expected by the hack above:

ln -s /cygdrive/c /c


Some Pointers on using (Binary) Decision Diagrams in Planning

posted Nov 17, 2013, 10:15 PM by William Cushing

Once upon a time, Daniel Bryce and I used the University of Colorado Decision Diagram (CUDD) package/library to greatly speed up his planner, POND, using a cute trick I like to call State Agnostic (Planning) Graphs.

The paper doesn't really delve into all the niggly engineering details.  Getting things done in practice is a lot harder than academic papers make it look.  Some 8-ish years later, here are some vague tips I can still recall at a more engineering-ish level.



Potentially more useful is source code...if you can handle reading source code. 

Fair warning: all source code you can find in the wild is super ugly.  It takes money to make things beautiful; it shouldn't be surprising that it costs money to see beautiful things.


For an example of a lisp-y (Bigloo Scheme, in particular) interface, one could peruse the source code to SRI's Symbolic Analysis Laboratory:
http://sal.csl.sri.com/download.shtml
E.g., src/sal-bdd-fsm.scm is an example of how a schemer chooses to define and document an interface to a particular piece of functionality. 

The source code to POND (Daniel Bryce's planner) is also likely available (at least on request, for academic use), which may be of interest, since both it and, specifically, the CUDD implementation of BDDs are written in C.  Stefan Edelkamp has several implementations of planning with BDDs in various fashions (BDDPlan, Gamer, MIPS), for which source code might also be available (for academics).

-- Important tricks --

There are two particular tricks whose proper justification requires deep understanding of BDDs, but without which one is quite likely to fail to get off the ground.

(1) Make sure the BDD variable order [*] has x' right after (or before, but after is more intuitive) x, for every present-state fluent x and next-state fluent x'.  (A sketch in C follows the notes below.)

*: If reading a theoretical paper, keep in mind that "BDD" in practice always means "OBDD" in theory.  (The non-ordered variant, sometimes written "FBDD" in a theoretical paper, has very little in the way of directly exploitable computational properties; such are only useful as a conceptual stepping stone.)

(If fluents are multi-valued, then one will have to decide whether to interleave the individual bits of the two bit-vectors encoding the fluents, or concatenate them.  Actually, there is also a third choice: expanding the encoding of multi-valued fluents to boolean fluents in the traditional ADL->STRIPS fashion, which electrical engineers refer to as "one-hot encoding".  The result of doing that expansion and then applying the suggested ordering of the boolean fluents would most closely resemble interleaving the bits of the more compact encoding.  Malte Helmert and/or Stefan Edelkamp have deeper insight into representing multi-valued fluents using BDDs.)
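Here's a minimal sketch of trick (1) in C against the CUDD API (the names and the fluent count are mine): Cudd_bddNewVar hands out variable indices in creation order, so creating the variables interleaved yields the desired order for free.

#include "cudd.h"

#define NFLUENTS 32   /* hypothetical fluent count */

/* Interleave present- and next-state variables: each x_i' lands at the
   position immediately after its present-state partner x_i. */
void make_interleaved_vars(DdManager *dd, DdNode **x, DdNode **xp) {
  for (int i = 0; i < NFLUENTS; i++) {
    x[i]  = Cudd_bddNewVar(dd);   /* present-state fluent x_i */
    xp[i] = Cudd_bddNewVar(dd);   /* next-state fluent x_i'   */
  }
}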

(2) Manually break up the transition relation by action, rather than representing it in the mathematically obvious way: i.e., don't compute a giant disjunction (="or") over the descriptions of the actions as decision diagrams.  Keep each action's diagram separate, and explicitly loop over the actions in order to compute with the transition relation.

The direct representation is terribly mathematically tempting (by analogy with, say, using eigenvectors of transition matrices to "cheat" at solving Markov chains)...but every publication on a real implementation that I've read explicitly discusses factoring the transition relation manually.

-- Weak justification --

As I recall, (1) is so that the frame axiom(s) have a concise description (decision diagrams don't encode equality constraints between variables far separated in the variable order very well, when, as is the case for a frame axiom, the variables in between are also relevant).  The lion's share of any action representation in a traditional logic is the piece stipulating that it doesn't change any of the irrelevant things; this is doubly true in application to PDDL-style planning, where actions touch a vanishingly small fraction of ground propositions.  (The symbolic model-checking community tends to come up with "better", hand-crafted, encodings in this regard...but as far as I can tell, even they are careful about defining the default variable order in a frame-axiom friendly manner.)

For our purposes, (2) was primarily so that implementing heuristic search would be straightforward (i.e., we could reuse the existing implementation instead of changing it, too).  It is also possible to push search itself into the symbolic package, as the model-checking folks do, but it is difficult to make that perform well for a variety of reasons (e.g., it is easy to symbolically implement breadth-first search, but rather more challenging to faithfully lift A* to a symbolic computation).  I think especially in a more PDDL-ish style context it is too challenging to make symbolic search work well, because the problem encoding(s) aren't nearly as well-informed (as in symbolic model-checking) by the implementation characteristics. 

The other reason (2) is significant in general is that writing out the loops to apply such "factored transition relations" gives substantially better performance, at only a small implementation cost, compared with the all-BDD way, which is to:
(a) compute the conjunction of a present-states BDD with the transition-relation BDD,
(b) then existentially quantify out the present-state variables (giving one the BDD for the next-states), and
(c) finally, swap the two variable sets (allowing one to iterate the procedure without needing one set of variables for every time step; such quasi-infinite encoding is standard for CNF, but totally infeasible with BDDs, which are easily tricked into exponential size in the number of variables).
With a factored transition relation, one wraps those 3 steps in a loop over actions, collecting the diagrams representing next-states using disjunctions. 
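Here's a minimal sketch of that loop in C against the CUDD API (the names are mine, and all setup is assumed).  Conveniently, CUDD provides Cudd_bddAndAbstract, which performs (a) and (b) in a single pass:

#include "cudd.h"

/* One step of factored image computation: acts[i] is the BDD for
   action i over present variables x and next variables xp, and
   present_cube is the conjunction of the present-state variables. */
DdNode *image(DdManager *dd, DdNode *states,
              DdNode **acts, int nacts, DdNode *present_cube,
              DdNode **x, DdNode **xp, int nvars) {
  DdNode *next = Cudd_ReadLogicZero(dd);
  Cudd_Ref(next);
  for (int i = 0; i < nacts; i++) {
    /* (a)+(b): conjoin with action i, quantify out present vars */
    DdNode *img = Cudd_bddAndAbstract(dd, states, acts[i], present_cube);
    Cudd_Ref(img);
    /* collect the per-action images by disjunction */
    DdNode *tmp = Cudd_bddOr(dd, next, img);
    Cudd_Ref(tmp);
    Cudd_RecursiveDeref(dd, img);
    Cudd_RecursiveDeref(dd, next);
    next = tmp;
  }
  /* (c): rename next-state variables back to present-state ones */
  DdNode *res = Cudd_bddSwapVariables(dd, next, x, xp, nvars);
  Cudd_Ref(res);
  Cudd_RecursiveDeref(dd, next);
  return res;   /* caller owns one reference */
}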

Said that way, it almost sounds as if the all-BDD way would be more efficient, but another way of grokking the situation is to expect that every BDD you touch is exponentially large in the number of relevant variables: and transition relations feature twice as many relevant variables as diagrams just representing a set of states.  A loop over exponentials with half the power is preferable. 

(A deeper point is that the particular disjunction over actions, when computed exactly, has too little BDD-exploitable-regularity to rely upon; we never tried this, but in theory we could have represented our symbolic planning graph computations from the point of view of problem relaxation, in which case the transition relation approach might have fared better than it normally does, since the planning graph relaxation yields far greater opportunities for symbolic exploitation.)

Basically, the performance issue is that BDDs don't really reason about disjunction all that efficiently in the first place, and particularly the disjunction over all actions is too easy to externally implement to pass up. 

As an interesting aside: in theory, conjunction and disjunction are symmetric operations for BDDs.  So there "should" be some similar performance commentary having to do with frequently encountered conjunctions better dealt with outside the BDD package.  ...But I have yet to see anyone find a truly noteworthy exploit of this theoretical fact in a planning context.  If you could somehow make the ideas of "factored planning" work in practice, simultaneously with using BDDs as part of the implementation, you might find yourself staring at a too-big conjunction that could be dealt with by a loop over much smaller BDDs, instead of having the package compute everything in one go.  Menkes van den Briel, for a loose example, spent considerable time optimizing integer linear encodings in order to better exploit the particular properties of that conjunction over factors which arises when pursuing such "horizontal" problem decompositions.  (The characteristics of ILP are more similar to CNF than to BDDs, however...)

Note that if one were using CNF or dDNNF, there would be far less reason to worry about the specifics of representing the transition relation (indeed, Blackbox writes out the transition relation in CNF directly, only optimizing it by weeding out the Graphplan-inferred impossibilities); this is a very BDD-specific performance issue.

(On the point of using BDDs to do everything: Edelkamp describes far more concretely how to get BDDs to faithfully perform A* in a PDDL-style planning context, but, as of the last time I checked (admittedly quite a while ago), his techniques remain restricted to optimal classical planning with uniform-cost actions, because they critically require that heuristics deliver only a small set of integral values.  Indeed, GAMER just ends up using h=0 to great effect.)

-- Every other trick ("don't") --

Besides those two tricks, I wouldn't recommend fiddling with the knobs of CUDD in particular, or BDDs in general.  For example, manually and dynamically changing the variable order so that computations run faster (or take less memory), is a rabbit hole as far as I'm concerned.  (A rabbit hole I find much akin to manually and dynamically tuning GC parameters in a memory-managed language; better to take complete control of allocation, or leave it completely up to the implementation.)  Allowing CUDD to play with its own default dynamic variable ordering schemes is fine enough, of course.

...We (Daniel Bryce and I) more or less treated CUDD as a magical black box for efficiently caching the solutions to arbitrary problems in primarily boolean algebra (Dan went on to make greater use of its ADD, i.e., numeric, functionality).  What I mean by that is that we deliberately limited ourselves to constructing all of our decision diagrams before "the main event", treating the package only as a cute way to effect a time/space tradeoff.  (...Lots of problems in planning require computing the answers to state-dependent boolean queries, and if you have memory to spare, precomputing all/many of them can be a win...)

Practical Note: We never did amortize the time cost of building the decision diagrams across similar problems, but this was only for silly reasons.  In particular, doing so is entirely feasible if one is willing to take a more practical view than what the spirit of the planning competition allows; to illustrate: write out a "domain.analysis_10_1_2" file for the instantiation of the domain with 10 objects of the first type, 1 object of the second type, and 2 objects of the third type.  That (per-static-type object counts) is enough to (easily) identify an isomorphism of the state transition system after grounding to propositional form, so as to reuse decision diagrams (or other analysis) across differing initial states and/or goal formulas.

There are other ways to try and use BDDs "during the main event"; Stefan Edelkamp writes about several attempts.  Personally, I find the performance characteristics of BDDs too difficult to control when building them happens during search.

For reference, two key operations that don't build new BDDs are evaluating at a point (linear in the depth of the BDD, e.g., bounded by the number of variables), and equivalence checking (constant-time, i.e., pointer equivalence).  Very handy operations in practice, those two.  Lots of things efficiently reduce to equivalence checking; satisfiability, for instance, is (non-)equivalence with the all-false BDD.  Negation (= "not") is also constant-time in any non-toy implementation of BDDs (the theoretical facet is complement-edges).

Which means that checking for entailment is also typically very fast (by reduction through conjunction (="and") and negation).  Conjunction isn't free, but if you have in mind a particular entailment check you frequently want to make, you can sometimes ensure that the relevant conjunction happens in preprocessing.
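Sketches of those queries, again in CUDD-flavored C (names mine): satisfiability is a pointer comparison, and entailment costs one conjunction (the negation is free, thanks to complement edges).

#include "cudd.h"

int satisfiable(DdManager *dd, DdNode *f) {
  return f != Cudd_ReadLogicZero(dd);   /* pointer comparison */
}

/* f |= g  iff  f AND (NOT g) is the all-false BDD */
int entails(DdManager *dd, DdNode *f, DdNode *g) {
  DdNode *bad = Cudd_bddAnd(dd, f, Cudd_Not(g));
  Cudd_Ref(bad);
  int result = (bad == Cudd_ReadLogicZero(dd));
  Cudd_RecursiveDeref(dd, bad);
  return result;
}

(CUDD even ships the entailment check directly, as Cudd_bddLeq, which avoids building the conjunction.)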

It is a bit surreal (and super-fun!) to put yourself in a mode of programming where you get to treat every hypothetical NP-complete (sub-)problem as "already solved in preprocessing: presently constant-time/cheap-enough to look up".

You'll have to try it for yourself to really appreciate it, I suspect.

A quick survey of theorem proving systems more (and less) relevant to Prof. Russell's MRS

posted Oct 31, 2013, 1:24 AM by William Cushing

Prof. Stuart Russell implemented one of the most theoretically-capable AI systems ever, called the Meta Reasoning System.  I have no idea if the code is at all available, or if anyone has bothered to try and port it from the lisps it was written for to a modern lisp.  But it sure does have thorough documentation!
ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1080/CS-TR-85-1080.pdf

I was asked about what the modern contenders might be to MRS in the context of implementing a class project.  It makes me wonder...what if the MRS code still ran today?  Are the modern implementations actually any better?  For what value of better?

But anyways...



--------------------
-- Theorem-Proving
--------------------

Overview: http://en.wikipedia.org/wiki/Automated_theorem_proving
Competition: http://www.cs.miami.edu/~tptp/CASC/

The closest things in terms of scope (true negation, open worlds, first-order) and style (embedded in a procedural language) that come immediately to my mind are the following:

Kanren by Oleg Kiselyov and Dan Friedman, implemented in Scheme.  Available as a package (an "egg") for Chicken Scheme, for example.  I have no idea if anything practical has been done with Kanren.
  http://kanren.sourceforge.net/

ACL2, initially by Boyer-Moore (now with many other contributors); implementation: Common Lisp.  This system (and/or its predecessors) has helped make landmark contributions to computer-aided mathematics on quite a few occasions over the past several *decades*.  It is not fully automatic, though ... which is arguably the more practical stance, ultimately...
  http://www.cs.utexas.edu/users/moore/acl2/

Coq is extremely expressive.  It is implemented in OCaml.  Fun fact: Xavier Leroy wrote a provably correct C compiler in Coq, achieving within 20% of the performance of GCC (which has many bugs: not a compiler you want to be using to control airplanes...).  I don't know whether that 20% margin is an entirely fair estimate --- gcc produces wildly different code depending on what flags you pass.  So, grain of salt and all that, but that's the tagline for you.  Like ACL2, it is a proof assistant.
  http://coq.inria.fr/
  http://compcert.inria.fr/

A colleague has used TVLA; I know nothing about it.  Coq is related to HOL/Isabelle/Miranda and a host of other names I also know next to nothing about.

All implementations of Prolog more or less qualify as relevant and practical.  Higher-order, answer-set, and tabling Prologs have nicer theoretical properties with respect to negation.  Programming real systems in Prolog is a giant pain, however.  I highly disrecommend using Prolog as an implementation language.  This is akin to saying: don't write your program *in* SQL (but *do* have your program talk *to* a real database engine).  SQL is just for that fragment of the application that needs to store and retrieve data under ACID guidelines, and compute simple queries over that data -- Prolog is for when the queries get more complicated than SQL can accommodate.
  http://en.wikipedia.org/wiki/Constraint-logic_programming

Embeddings of Prologs in Lisps abound.  Embedding Prolog within a deterministic, procedural, language makes it a 'million' times more practical.

Allegro Common Lisp has an embedded Prolog that can use external databases as "the knowledge base", meaning one has access to SQL-esque volumes of data (terabytes or petabytes if you like), but Prolog-esque expressivity over those bytes.  In particular, transitive closure (the ancestor relation, reachability in a graph, ...) is very, very easy to say in Prolog, and very, very obnoxious (and inefficient) in relational algebra (in theory and in practice).  For that kind of problem, AllegroCache handily outperforms MySQL in code size, runtime, or whatever other metric you care to pick except the "lack of parens" metric.
  http://www.franz.com/products/allegrocache/

Not that MySQL is exactly state-of-the-art in terms of performance...but no SQL engine can handle transitive closure gracefully.  It's a deep theoretical point that distinguishes first-order logic from relational algebra.


----------------
-- Algebra 
----------------

Overview: http://en.wikipedia.org/wiki/Computer_algebra_system
I know of no competition, but here is a conference that is perhaps a good starting point:
  http://www.mathopt.org/?nav=ipco

The Big-Ms are Maxima/Macsyma, Mathematica, Maple, and Matlab.  Their theorem-proving capabilities in first-order logic tend to be much weaker (than the systems that bill themselves as theorem provers or theorem-proving assistants), but they tend to have much better support for linear algebra and real analysis (i.e., "Calculus").  Maxima/Macsyma is implemented in Common Lisp.  The others are commercial, and I don't know their principal implementation languages.

Sage is a meta-mathematics engine that integrates many engines such as Mathematica together, using Python as glue and principal interface.  Python is also the principal implementation language of Sage.


----------------------------------
-- First-Order Probability
----------------------------------

Overview: http://probabilistic-programming.org/wiki/Home
Competition: none yet.
Probabilistic programming is in its infancy; the community is still years away from having a large library of standard benchmarks and a regularly scheduled international competition hosted by an academic conference.  But it'll happen!

Even wider in scope than first-order theorem proving is tying logic and probability theory together...

BLOG is the system we work on.  It is implemented in Java, but the embedding is deep -- the language it offers to the user doesn't require knowing anything about Java. (thankfully!)
  https://sites.google.com/site/bloginference/

HANSEI is an implementation by Oleg Kiselyov, in OCaml.  (Possibly Haskell as well.)  I haven't used it (yet?).  The embedding is shallow; Oleg champions the concept of shallow embedding.
  http://okmij.org/ftp/kakuritu/

Figaro is a usable system.  It is shallowly embedded in Scala, which means you must be able to grok Scala to use Figaro.  That makes it more flexible than BLOG if you happen to be a wizard at Scala...
  https://www.cra.com/commercial-solutions/probabilistic-modeling-services.asp

Infer.net is the Microsoft contender in the arena of probabilistic programming.  C#, presumably, given the .NET branding.

Church is the MIT contender.  It has 4 implementations, of which I've personally only tried the compiler, Bher.  It seems that it can be made practical if you treat it like Python: offload all real computation to C.  From what I gather, they use ZeroMQ to have the Church program talk to C-land.  It is a giant pain to get Bher up and running; I gave up on Windows.  On Linux it was possible, but still a pain.  The instructions are not up to date; I suppose I should have fixed them (it being a wiki and all...): my apologies!

R and S are the industry-standard tools/languages for performing statistical analysis of data.  The core engine of the tool R is implemented in C.  Much of the standard library is implemented in the language R itself, of course.  (It is a deep embedding; the user language offered has very little to do with C.)  That language doesn't *look* like a lisp, but it shares many of the key qualities of lisps that make them powerful in the hands of wizards.
  http://www.r-project.org/


----------------------------
-- Propositional Calculus (SAT)
----------------------------

Overview: http://en.wikipedia.org/wiki/Boolean_satisfiability_problem
Competition: http://www.satcompetition.org/

Post-AI-winter research, for better or worse, is heavily marked by severely limiting scope in order to achieve far better performance.

Especially the SAT sub-community took this to the extreme.  There is a host of systems, constantly changing, all much, much faster than naive implementations at solving boolean satisfiability.  Many of the high-performance solvers for anything more expressive end up compiling down to SAT nowadays, or to something very close: SMT.  All the implementations I'm aware of are in C.  I wouldn't be surprised if they've begun to walk the path of madness (assembly...).

The only way to stay up to date with traditional, CNF-based, SAT is to go check out the competition page.

I've personally used CUDD, the University of Colorado's decision diagram package.  The nice thing about these (BDDs and ADDs) is that the implementations have a convenient API, and the computational properties feature particular performance characteristics (O(1) checking for equivalence...) that are easier to manage effectively within a larger framework than calling out to a general solver of CNF (the normal form that SAT solvers typically accept --- note that equivalence-checking of CNF is more or less *the* NP-complete problem).  CUDD is pretty hard-core C.  That said, it was a breeze to set up and use, relative to understanding the general C rules: thou shalt manage thy resources and check error codes!
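To give the flavor, here's a minimal, complete CUDD session obeying those rules; Cudd_CheckZeroRef at the end is the did-you-manage-thy-resources check:

#include <stdio.h>
#include "cudd.h"

int main(void) {
  DdManager *dd = Cudd_Init(0, 0, CUDD_UNIQUE_SLOTS, CUDD_CACHE_SLOTS, 0);
  DdNode *x = Cudd_bddNewVar(dd);
  DdNode *y = Cudd_bddNewVar(dd);
  DdNode *f = Cudd_bddAnd(dd, x, Cudd_Not(y));
  Cudd_Ref(f);                          /* every node you keep, you reference */
  printf("f satisfiable: %d\n", f != Cudd_ReadLogicZero(dd));
  Cudd_RecursiveDeref(dd, f);           /* ...and release when done */
  printf("leaked nodes: %d\n", Cudd_CheckZeroRef(dd));   /* should be 0 */
  Cudd_Quit(dd);
  return 0;
}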

I've heard good things about DNNF ("decomposable negation normal form"), but I've never tried to see what the implementations are like...


----------------
-- SMT
----------------
Competition: http://smtcomp.sourceforge.net/2012/

SMT solvers don't change the inherent computational properties of SAT, but they do at least offer a far better API than traditional SAT solvers.  Benchmarking them is trickier, because they don't have that nice clean batch interface...more practical also often, unfortunately, means "more difficult to measure/quantify".

Z3 is a big name in SMT; it is Microsoft's entry in the arena.  The implementation is tightly optimized C++.
  http://rise4fun.com/z3

Vampire wins lots of competitions.  Straight up C as I recall.
  http://www.vprover.org/

The E theorem prover, as I understand it, has some unique capabilities for certain kinds of problems that make it a good member of a portfolio of techniques.  C, and some Python (front-end, one presumes).
  http://www4.informatik.tu-muenchen.de/~schulz/E/E.html

Rosette is an implementation (in Racket, sort of a Scheme, but a lot more practical for day-to-day programming tasks) of program analysis+synthesis that compiles to SMT, and can specifically target Z3 or Kodkod (which, in turn, is implemented in Java and compiles to SAT solvers).  Source weakly available; I only mention this because I was very recently perusing her work.  Interesting, but all very prototype-y!
  http://people.csail.mit.edu/emina/
  http://racket-lang.org/



---------------------------------------------
-- Some notes on implementation language
---------------------------------------------

Racket is, incidentally, a very compelling language to work in for implementing tools that perform deep reasoning about languages.  It encourages that kind of thinking, and offers a lot of nice tools for getting feature-rich translators and compilers up and running single-handedly.  That said, such translators will run many times slower than had the implementation been in some lower-level language.  *That* said: if the lower-level implementation is 5 times as big... surely it has at least 5 times as many bugs...

ANTLR is a very nice tool if you want to parse character streams.  It is implemented in Java, and prefers to spit out Java parsers, but this is not an issue, because it can be made to dump the S-expression representation of the parse tree it builds with a minimum of effort.  One can then switch over to any other implementation language, because writing S-expression parsers is *trivial* --- see the sketch below.  (I highly disrecommend implementing complex language processors in Java, especially if you have only a relatively vague idea of what the language will be and of the exact inference to be supported.)
  http://www.antlr.org/
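To back that claim up, here's a sketch of an S-expression reader in C: no error recovery, no string syntax, just atoms and nested lists, yet entirely serviceable for slurping parse trees.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

typedef struct Sexp {
  char *atom;            /* non-NULL for atoms         */
  struct Sexp **kids;    /* NULL-terminated, for lists */
} Sexp;

static const char *p;    /* cursor into the input */

static void skip_ws(void) { while (isspace((unsigned char)*p)) p++; }

static Sexp *read_sexp(void) {
  Sexp *s = calloc(1, sizeof *s);
  skip_ws();
  if (*p == '(') {                           /* a list */
    p++;
    Sexp **kids = NULL;
    size_t n = 0;
    for (skip_ws(); *p && *p != ')'; skip_ws()) {
      kids = realloc(kids, (n + 2) * sizeof *kids);
      kids[n++] = read_sexp();
    }
    if (*p == ')') p++;
    if (!kids) kids = malloc(sizeof *kids);  /* the empty list */
    kids[n] = NULL;
    s->kids = kids;
  } else {                                   /* an atom */
    const char *start = p;
    while (*p && !isspace((unsigned char)*p) && *p != '(' && *p != ')') p++;
    size_t len = p - start;
    s->atom = malloc(len + 1);
    memcpy(s->atom, start, len);
    s->atom[len] = '\0';
  }
  return s;
}

static void print_sexp(const Sexp *s) {
  if (s->atom) { printf("%s", s->atom); return; }
  printf("(");
  for (size_t i = 0; s->kids[i]; i++) {
    if (i) printf(" ");
    print_sexp(s->kids[i]);
  }
  printf(")");
}

int main(void) {
  p = "(add (mul 2 x) 7)";
  print_sexp(read_sexp());
  printf("\n");
  return 0;
}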

Here is a useful litmus test for picking an implementation language: imagine we want to find the point that is midway between l and r.  For simplicity, let's say we can assume 0 <= l < r.
Do you want to continually keep in the back of your mind why (l + r) >>> 1 is correct in Java, but the right C expression is l + ((r - l) >> 1)?  (The parentheses matter: + binds tighter than >> in C.)
In Schemes/Lisps, the equivalent syntax of the obvious mathematical definition *is* a functionally correct implementation: (/ (+ l r) 2) correctly computes the average of l and r.  In those cases where machine-oriented languages would silently return something you weren't expecting (unless you personally *like* to think in modular arithmetic...), the Lisps will be slower, but correct.  Really well-implemented Lisps will be equally fast if they can prove that overflow isn't a possibility.  Performance-paranoid implementations like SBCL will even shout at you that they *couldn't* find said proof, so were forced to compile a slow path; if you care about the speed penalty, you have several options for fixing it.
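Concretely, in C (assuming 32-bit int; the second computation is, strictly speaking, undefined behavior, which is rather the point):

#include <stdio.h>
#include <limits.h>

int main(void) {
  int l = INT_MAX - 1, r = INT_MAX;     /* 0 <= l < r */
  printf("%d\n", l + ((r - l) >> 1));   /* correct: prints INT_MAX - 1 */
  printf("%d\n", (l + r) / 2);          /* l + r overflows: undefined
                                           behavior; typically prints
                                           a large negative number */
  return 0;
}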

Of course, the best language to use is probably the language you already know best.  It takes years to really learn a programming language well enough to be useful in it; in particular, learning the available libraries is key. 

Which is why I took pains to mention the implementation languages of all the tools above.  If you want to try and build off of one of them, my recommendation would be to start with a code base in a language you already know.

Reinventing everything is also a distinct option.  That is what usually transpires.  It is often far simpler to reverse-engineer a system than it is to modify its existing implementation.  Particularly if you want to extend the system with a feature its original architect never dreamed of (or actively opposed!), it is likely simpler to rebuild from the right foundation, rather than attempt to fight against a foundation that doesn't want to be that general.

Note, for example, the numerous systems that compile *to* other systems (e.g., especially SAT), rather than take the implementation and attempt to extend it to (say) first-order (or higher-order!) theorem-proving.



P.S. Indeed, now that I reexamine the original mail more closely, I doubly recommend sticking with a language you already know, i.e., because this is for a class project.  Calling out to something calling itself a Prolog or, if reasonable handling of negation is desired, an Answer Set Prolog is probably your best bet.  Prologs are well understood, and tend to have nice manuals, implementations, examples, file system interoperability, and so on.  SWI-Prolog, for example, runs out-of-the-box on windows (or mac, and presumably linux), no problem.
  http://www.swi-prolog.org/

For AnsProlog, I've used smodels back when it was state-of-the-art.  I suspect it isn't the highest performance implementation anymore, but I also suspect it is among the easiest implementations of AnsProlog to actually use and interact with.
  http://www.tcs.hut.fi/Software/smodels/

All prologs pale in expressiveness to MRS though.  They trade scope for, among other things, speed.

An abstraction penalty benchmark for Schemes

posted Oct 25, 2013, 12:31 AM by William Cushing

The following implements four different ways to get a stricter version of Scheme's builtin semantics for and.  (They consider only #t to be true.)
(define (true? x) (eq? #t x))
(define (strict-and/1 . args) (andmap (curry eq? #t) args))
(define (strict-and/2 . args) (andmap true? args))
(define strict-and/3
        (lambda args
          (cond
            [(null? args) #t]
            [(true? (car args)) (apply strict-and/3 (cdr args))] ; apply: (cdr args) is a list
            [else #f])))
(define strict-and/4
        (case-lambda
          [() #t] ; identity of and, to agree with strict-and/3
          [(x) (true? x)]
          [(x y) (and (true? x) (true? y))]
          ;; unroll as much as desired
          [(x y . z) (and (true? x) (true? y) (apply strict-and/4 z))]))

This can be micro-benchmarked like so (for loops are a Racket-ism; your Scheme might need a repeat or dotimes or something similar...):
(define (test-and/1) (time (for ((x (in-range 10000000))) (strict-and/1 #t #t (if (> x 500) #t #f)))))
(define (test-and/2) (time (for ((x (in-range 10000000))) (strict-and/2 #t #t (if (> x 500) #t #f)))))
(define (test-and/3) (time (for ((x (in-range 10000000))) (strict-and/3 #t #t (if (> x 500) #t #f)))))
(define (test-and/4) (time (for ((x (in-range 10000000))) (strict-and/4 #t #t (if (> x 500) #t #f)))))
(define (direct-test) (test-and/1) (test-and/2) (test-and/3) (test-and/4))

Which, at least in Racket, belies the notion that compilers are anything besides pretty dumb:
;cpu time: 7550 real time: 7553 gc time: 653  ; curry eq? #t
;cpu time: 656 real time: 650 gc time: 139    ; true?
;cpu time: 530 real time: 531 gc time: 155    ; manual loop via variadic lambda over cond destructuring
;cpu time: 234 real time: 240 gc time: 32     ; manual loop via case-lambda destructuring

If I unrolled the loop one more time in strict-and/4, and set up a fairer test harness (guaranteeing a fresh GC state at the start of each test...), the GC time would be zero in the last case; the reason garbage is being generated is that Racket is implicitly cons'ing up a list of the arguments to pass to the variadic procedures.

That implicit list generation is why case-lambda is beating the more idiomatic lambda over cond.


A Problem with Figaro Semantics

posted Oct 18, 2013, 10:26 PM by William Cushing   [ updated Oct 18, 2013, 10:39 PM ]

Background: Probabilistic Programming

A cool way to do probabilistic modeling and inference is to take a (stochastic) procedural description of how worlds get generated, and pass that description as data to an inference routine.

So one can implement sampling from the prior distribution more or less just by implementing a form of eval.  The richer one's procedural language, the harder it is to implement such an evaluation, of course.

Having done so, the fun can really begin.  Instead of evaluating the probabilistic procedural semantics as given, one can deliberately choose to follow a priori unlikely paths, based on, for example, having looked ahead further into the process.  More or less equivalently, such "looking ahead" might actually be more accurately phrased as "learning from experience", i.e., one might choose to evaluate alternative pathways of the program because one is backtracking.

The most important reason to be investigating alternative pathways is to efficiently sample from a posterior distribution.  These come about because one has observed some concrete results of random experiments, and would like to know more information about related quantities. 

For example, a wet umbrella indicates that it is probably raining outside.  (Or, less likely, the sprinklers are running.)
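Here's the whole pipeline as a toy in C (every name is made up): the "probabilistic program" is just a procedure consulting a source of randomness, sampling from the prior is just running it, and a crude posterior falls out of rejection --- keep only the runs in which the umbrella came up wet.

#include <stdio.h>
#include <stdlib.h>

static double uniform01(void) { return rand() / (double)RAND_MAX; }
static int flip(double p) { return uniform01() < p; }

/* the model: is it raining, and is the umbrella wet? */
typedef struct { int raining, umbrella_wet; } World;

static World run_model(void) {
  World w;
  w.raining = flip(0.3);
  w.umbrella_wet = w.raining ? flip(0.9) : flip(0.05);   /* sprinklers... */
  return w;
}

int main(void) {
  int n = 100000, wet = 0, wet_and_raining = 0;
  for (int i = 0; i < n; i++) {
    World w = run_model();           /* prior sampling = evaluation */
    if (w.umbrella_wet) {            /* rejection: keep matching runs */
      wet++;
      if (w.raining) wet_and_raining++;
    }
  }
  if (wet > 0)   /* estimates P(raining | umbrella wet); roughly 0.89 here */
    printf("P(raining | wet) ~= %g\n", (double)wet_and_raining / wet);
  return 0;
}

(Rejection is the conceptually simplest way to investigate those "alternative pathways"; everything fancier exists because rejection wastes nearly all its time once the observations become improbable.)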

As soon as one delves to any depth in this pursuit, one encounters the issue of highly improbable, or downright impossible, events.  There are multiple related concerns, the most obvious being efficiency.  (But my present concern has more to do with usability, as we shall see.)

All time spent computing impossible or improbable scenarios is time completely or mostly wasted.

Indeed, classical theorem-proving systems concern themselves strictly with the task of reasoning efficiently about the downright impossible events.  Many human-centuries' worth of effort have been spent on figuring out how to reason about impossible scenarios efficiently.

Suffice it to say that it is a well-motivated pursuit.

On Figaro

Existing probabilistic programming systems are all deficient in regards to their treatment of improbabilities and impossibilities.  Some, of course, are more deficient than others.  I've blogged on the topic before.

Note: Software gets better over time.  If the year isn't 2013...grain of salt, please.
(Pretty please: grain of salt and a hefty dash of skepticism always?)

Figaro is especially fragile.  During its inference stage --- wherein it is musing hypothetically about possible ways to run a probabilistic program --- it will actually carry out any real world effects that program could potentially execute, no matter how damaging or detrimental, even if the program itself makes the probability of executing those effects vanishingly small or downright 0!

That is, Figaro, in cogitating about the program, will step right past any precautions the program builds in, and speculatively execute the effect anyway!

if (green_light?)
  floor_it!

Of course no sane person would give an effectful program to Figaro.  One is only supposed to give "pure" programs to Figaro.

But the fact that Figaro wouldn't respect the impossibilities/safety-checks that trustworthy programs build in has serious usability and efficiency consequences even when using Figaro properly.  It just takes longer to explain and motivate, so I opened with real-world effects.  They have a certain gravity to them.

Consider the example pseudo-code/model.  Suppose we drink a cup of sanity solution first:

if (green_light?)
  simulate_floor_it!

Figaro can be easily coaxed into wasting infinite time and space carrying out an arbitrarily detailed simulation of what precisely happens when "flooring it" at a red light. 

As a thought experiment it is perhaps interesting, at times, to contemplate the exact reasons why one should obey the law; but certainly the usual case must be that we don't waste time thinking about scenarios that won't happen.  This program fragment obeys the law (more or less), so if we are concerned about collisions, we should invest our time in the scenario where some other driver collides with us.  (The most probable collision in this scenario is a rear-end collision.  As a consequence of that realization, truly defensive drivers know better than to be strictly law-abiding; such drivers check their rear-view mirrors while stopped at intersections, leaving open a slim chance of entering the intersection to avoid a rear-end collision.  If you've ridden a motorcycle for any length of time, you understand.)

Approaching concrete issues: the ranting today is because I've been running into many problems trying to translate BLOG models into Figaro. 

Figaro (especially the version you can download officially, rather than the version now on GitHub) will, on the slightest provocation:
  • blow stack (StackOverflowException)
  • dereference null pointers (NullPointerException)
  • throw NoSuchElementExceptions
  • throw quite a number of other custom exceptions, non-comprehensively including:
    • class InvalidDistributionException extends RuntimeException
    • class NegativeProbabilityException extends InvalidDistributionException
    • class ZeroTotalUnnormalizedProbabilityException extends InvalidDistributionException
    • class LessThanOneTotalProbabilityException extends InvalidDistributionException
    • class InvalidMultinomialIndexException extends RuntimeException
  • in addition to throwing any of the numerous more typical JVM-exceptions (like divide by zero)
If you've been down this rabbit-hole, you understand.  If not, all I can do is wish you maintain your bliss forevermore; I doubt I can properly and concisely explain how much an extensible exception mechanism sucks in practice.  Exceptions are a necessary evil, to be avoided as much as possible.  Throwing an exception is lazy and irresponsible.  If you are going to throw an exception, you should write an entire webpage (manual section, etc.) dedicated to explaining why it is impossible for that particular piece of code to respond correctly, and how exactly higher-level code should go about dealing with this particular wart of reality too ugly to hide at the lower layer.  Or at least include an apology: error("I'm really sorry about this, but I just couldn't figure out what to do here.  If you think you know what should be done in this situation, shoot me a mail: <the coder's email address>.  Some info with a small chance of being useful: " + <stuff>).

Good tools take many inputs and produce few outputs; good tools fix problems, not create them.
  I'm too lazy at the moment to look up the famous person (Alan Perlis? Josh Bloch? Dijkstra?) and the proper form of the quote.  I'm sorry!  Shoot me a mail and I'll fix it: william.cushing@gmail.com.  (Spammers have your email address anyways.)

Still with me for the rest of the war story?

A Close-to-Real Example


Consider, in its notation, the random variable that is the ratio of two other random variables:
Apply(Uniform(0,1),Uniform(0,1),(x:Double,y:Double)=>x/y)

With vanishingly small probability, this will generate a divide by zero exception.  That exception isn't caught anywhere within Figaro.  It brings the whole inference process down, even though Figaro is only supposed to be speculatively executing (i.e., simulating) the program. 

Thinking about dividing by zero shouldn't be fatal!

But, "okay, fine" you say: divide by zero (and exception handling in general) is the classic bread-and-butter for programmers.  It is, arguably, a lion's share of why we actually get paid any money: this is the part of the job we, and everyone else, hates doing.  It is a part of programming that is work, not play.  (In general, programming is great fun: exception-handling, however, is loathsome.  That's why it's a job.)

So just buckle down and do the work, right? 
Why rant?


Remember flooring the gas at the stop light?

val xVar = Uniform(0,1)
val yVar = Uniform(0,1)
val yIsZero = Apply(yVar, (y: Double) => y == 0)
val defaultVar = Constant(null)

Chain(yIsZero, (test: Boolean) =>
  if (test) defaultVar  // y is zero: do NOT divide
  else Apply(xVar, yVar, (x: Double, y: Double) => x/y))


Feeding this, or at least remarkably similar input (too elaborate in construction to go above the cut), to Figaro can still result in an abrupt termination of all inference, due to it actually carrying out the division on the underlying JVM --- despite my painstaking effort to tell it that the JVM considers division by zero a capital offense.

This I cannot fix.  (So the answer to "Why Rant?: Do your job!", is "I can't: upstream needs to first!".)


A Letter to Dr. Avi Pfeffer (foreword)

This webpage/rant was written out of the desire to better document the perspective of a potential user of Figaro, ultimately to help better its design and implementation.  It is part of an ongoing technical discussion regarding an automated translation from the BLOG language to Figaro, which we here at Berkeley are developing as part of a DARPA-sponsored project to push the technology and community forward.  Making Progress Happen!  Pretty cool, eh?

My most recent contribution to that discussion was too long to be comprehended/digested in email format.  So I put it up here to make it potentially easier to read and understand.  (Sometimes the prettier fonts do make things substantially easier to work through...)

Without the proper context, I imagine the contents won't make much sense to a random passerby, but, hey, who knows?  Maybe some other lost soul will find themselves struggling to get VariableElimination working on pseudo-cyclic probabilistic programs... and perhaps find some useful tidbits for grokking Figaro source.


On Fri, Oct 18, 2013 at 11:12 AM, Avi Pfeffer <apfeffer@cra.com> wrote:

Will,

Sorry it’s taken me so long to get into this, I’ve just been reviewing the email thread.

You’re bringing up some deep issues, and I appreciate it. If possible, I’d like to find a way to resolve them by extending Figaro, rather than hacking it in user code, or changing the guts of how Figaro is implemented. I’m thinking that Figaro could provide some generalized stream data structure from which elements could be pulled flexibly on demand. Would that get at the problem?

--Avi

I certainly understand having too little time in the day :)

The following isn't going to help...



As far as the original bug report goes, Figaro 2 addressed the particular issue that my patch addressed, but in a better way (huzzah!). 

However.

More generally speaking, Figaro 2 is still quite difficult to use with cyclic and infinite models.  The assumption that the generative process is a compact encoding of a finite (C)BN is baked into the code in many places; indeed, much of the code assumes that references between Elements are not just acyclic, but moreover tree-structured.  Simply providing a lazy stream of elements isn't going to help much --- that allows an infinity, but only a tree-structuring of references between them.  Which makes it easy to implement and reason about (particularly regarding garbage collection) --- that it is easy to implement and reason about is more or less the dead give-away that it won't help model first-order structures.  First order structures are more general than graphs; assumptions baked into the Figaro code that elements form collections that are restrictions down from Graph are, essentially, show-stoppers. 

Consider the following snippet from the new Universe.scala:
    if (visitStack.contains(elem)) throw new IllegalArgumentException("Cyclic set of elements - cannot walk graph")

That is a bug report (of kind feature enhancement) waiting to happen if we try and apply Figaro beyond the bounds of things statically reducible to CBNs.

That is only one example.  Following is a great litany of problematic lines of code. This is not mere speculation of potential problems; I actively encountered them when working on the translation of the hurricane model.

More constructively, attached is a version of hurricane.scala that captures about 98% (made-up number!) of the true BLOG semantics.  Any help in preserving those semantics while specifying the model in a more compact manner is much appreciated.  It won't work when you run it on your machine, because, as-is, it uses VariableElimination, which uses Values(), which blows up on cyclic descriptions in much the same way that elemGraphBuilder threatens to (but doesn't!).  It runs on my machine because I've patched Values() in a hacky way to force it to work.  (Read on.)

As far as the model goes, it is close enough to the true BLOG semantics that the posterior distributions will be computed correctly for the given parameters, but the lines of code themselves will break if parameters are changed (for example, if we reduce the number of cities to one, then BLOG will still function, but the translation cannot handle that change).  There is also an issue that none of the memo tables should be declared inside of procedures; it doesn't impact this model, but one can imagine situations in which such stack-allocated memo tables get thrown away only to be recreated later for what is conceptually (but not actually) an identical context, with the result that identically distributed (but distinct!) random variables get created rather than identical random variables getting referenced. That will lead to difficult to detect statistical anomalies; e.g., to notice, one might have to compute variances rather than means, and pay very close attention...which is why I wanted something a lot more reliable than ImportanceSampling to work with...

So of particular interest (to you) is that my patched version of Figaro can compute the exact posterior distributions on this model using VariableElimination.

(Your version will blow up with a stack overflow exception.)

(In my hunt to get VariableElimination working on this model, I uncovered many suspicious lines of code, hence the following litany.)


-Will


Don't call virtual methods from constructors!!

Returning to the point "The assumption that the generative process is a compact encoding of a finite (C)BN is baked into the code in many places."...

For example, Element[V]() calls generate() so that .value gets initialized to something properly drawn from its prior distribution (rather than to whatever the default value of the type V is).  This assumes that calling generate() is
  • cheap
  • correct to do before all initialization is complete
neither of which is correct in general; indeed, the latter is almost always wrong, and the former is in particular wrong when encoding infinite models.

From Select.scala:
  private[figaro] lazy val (probs, outcomes) = clauses.unzip // lazy to avoid uninitialized val bug

As a consequence, Select() has to declare its clauses and normalized probabilities as lazy vals.  Declared as plain vals, they would get initialized after their base classes/traits, which is too late, because the base class Element already calls generate(), which needs the normalized probabilities.  By declaring them as lazy vals, one gets the cute benefit of inverting the normal Scala initialization order, which fixes the bug, but in arguably the wrong way (lazy vals aren't free...!).  There wouldn't be a bug in the first place if Element() didn't call generate() as part of its constructor.  This is an old lesson we learned in Java/C++: constructors should never call, directly or indirectly, virtual methods.  While generate() itself is declared final, it calls non-final methods; ergo, the call to it from the body of Element still violates the no-virtual-calls-during-initialization principle, and hence is a bug waiting to happen in every inheritor of Element.

Instead of requiring every concrete inheritor to use lazy val essentially everywhere, it is simpler and better to insist that they all manage their own initialization, i.e., call generate() at the right time. 

An easy way to get Scala to hit you over the head with the requirement is to delete: " = _ " from the line:
  var value : V = _
in Element.scala. 

That is, doing so will get Scala to hit you over the head with thinking about when and how to initialize any concrete inheritor of Element; it will do so by insisting that each concrete inheritor spell out some concrete definition of "value".
There is really only one correct definition:
  var value : V = _
  generate()

which could be made more compact and obvious by having generate() return the generated value:
 var value : V = generate()
but either way, this correct definition ***can't be inherited*** (or shouldn't be, in any case), because *when* it takes place does need to be overridden.


Don't call generate() and friends recursively!!

        (and/or, in Scala, "_" means default, not "null")        


On a related note, Element.scala contains the following crucial comment, which, much like comments the world over, does not, unfortunately, actually constrain the code:
 * Elements also have a current outcome, represented by the value field. Naturally, the
 * generateValue function can refer to the current value of related Elements. However, generateValue
 * is not allowed to call generateValue on another Element. We use the notation generateValue(r | w)
 * to denote the value it produces given randomness r when the current value of related Elements is
 * w.

Apply.scala, Dist.scala, Chain.scala, Select.scala all directly violate this rule.  ReferenceElement, by way of ElementCollection, in the case of multi-valued references, is skating very close to the line because it ends up relying on Values(), which does not technically call generate() or generateValue(), but does the moral equivalent++.
 
For example, Chain.scala clearly violates the contract (I think the braces are only in my version, but otherwise this is the code as it appears on GitHub):
  def generateValue() = {
    if (parent.value == null) {
        parent.generate()
    }
    val resultElem = get(parent.value)
    if (resultElem.value == null) {
        resultElem.generate()
    }
    resultElem.value
  }


Moreover, it violates that contract in a manner which is wrong even if one were to change the full contract of Element to switch to lazy initialization of .value.

That is because "null" is not the name of the default value of an arbitrary type in Scala.  "_" is the name.  So closer to correct code (but still wrong!) would be:
  def generateValue() = {
    if (parent.value == _) {
        parent.generate()
    }
    val resultElem = get(parent.value)
    if (resultElem.value == _) {
        resultElem.generate()
    }
    resultElem.value
  }

This is still wrong, because the default value of a type might very well be a legitimate value. 
 
A correct version --- in the alternative universe where the contract for Element[V] is changed so that .value is initialized lazily --- is:
  def generateValue() = {
    if (!parent.initialized()) {
        parent.generate()
    }
    val resultElem = get(parent.value)
    if (!resultElem.initialized()) {
        resultElem.generate()
    }
    resultElem.value
  }
where initialized() is coded up so as to check a separate --- out-of-band --- flag recording whether or not the Element has been initialized yet.  There is no way to embed "initialized?" within the value type provided by the user --- every representable value, ***even null \in AnyRef***, might be legitimate.  For example, see the attached hurricane model, which tries to use "null" as a value.  That is of course a bug report waiting to happen (it won't be triggered in the case of having 2 cities, though).

We can of course workaround the fact that some parts of Figaro treat the null value specially...but I argue that it is better to fix the logic within Figaro....

Anyways, it is awkward (and potentially slow, because generateValue is in the inner loop) to have all the language code responsible for implementing lazy initialization.  Which is "why" lazy initialization isn't the name of the game for Elements.

So returning to the principal universe wherein Elements are eagerly initialized...a really correct version (maybe...read on...) of Chain.generateValue() is just:
  def generateValue() = {
    get(parent.value).value
  }
Neither null-checking of .value nor initialization checking of the Elements that wrap them is necessary or desirable; null might be a legitimate member of the value type.

(Nothing should go wrong if we pass (wrapped!) nulls through Figaro and back to the user!)

On a related note, the following snippet of code and commentary in Select.scala is completely wrong:
  def generateValue(rand: Randomness) = {
    // This line generates a warning but it is not applicable
      // WC: That's not true --- scala Double really is never null mostly b.c. of
      //   Double2doubleNullConflict(null);
    probs.foreach(prob => if (prob.value == null) prob.generate())
    ...
  }

The comment-not-from-me indicates that the programmer ignored the compiler-generated warning informing us that prob.value == null is always false in the case that the value type is Double.

The compiler is completely correct here --- that test is always false.  It is impossible to get a var/val of scala type Double to hold null.  Attempting to do so, even by reflection, will generate a null pointer exception before this code ever gets a chance to run.

A pseudo-more correct form of the code is again (pseudo-more correct in that it gets the Scala name for the default value of a type correct):
    probs.foreach(prob => if (prob.value == _) prob.generate())
except that the default value "_" of Double (which is 0.0) might, of course, be an entirely legitimate sample result, and just discarding an otherwise valid sample is not an appropriate response. 

More importantly, calling .generate() violates the all-important contract of Element that only algorithms get to call generate(), generateRandomness(), and generateValue() --- otherwise sampling could be biased and/or inefficient.

A substantially more correct form of Select.generateValue() simply deletes that line.  It isn't doing anything (and even the JIT might be deleting it...).


            Or is the issue exception-suppression??                               

 AKA exceptions and nondeterministic evaluation: Don't do it!


What that line (and all the related lines checking for null) seems to be trying to do is to avoid generating the exception that util.sampleMultinomial() (among other low-level utilities) can generate. 

Many more or less valid models might occasionally want to do something like uniformly sample from an empty set.  (That isn't defined, so Figaro generates a runtime exception, which nobody catches.)  Figaro generates such runtime exceptions even when the events in question have measure zero!  This makes Figaro especially awkward to use as a backend, particularly for BLOG models, which assume that measure-zero exceptions can be discarded silently.

That is, I argue, what this code is trying to do is much like what the code in Chain is trying to do --- work around uninitialized/erroneous/illegitimate elements.  Except that, architecturally, it is impossible for .value to tell us that an element is illegitimate.  Element[V] is supposed to be the random thing with value lying in type V, and its member .value is declared as belonging to V itself.  If you want to use value-passing, rather than exception-catching, to smoothly handle the point that the random process generating a random thing may have gone awry, then in Element.scala:
  var value : Option[V]
would be an idiomatically scala way to achieve that goal.  However, that is a big change to the contract of Element: lots of code would be impacted.

In such an alternative universe every inheritor would have to code up generateValue along the lines of (for Apply, say):
  def generateValue() = parent.value match {
      case Some(x) =>
        try { value = Some(fn(x)) }
        catch { case NonFatal(_) => value = None }   // scala.util.control.NonFatal
      case None => value = None
    }

Which is definitely much more obscure than:
  def generateValue() =  (value = fn(parent.value))


(Which is not the present contract-violating version:
  def generateValue() = {
    if (arg1.value == null) arg1.generate()
    fn(arg1.value)
  }
Which is, again, pseudo-more correct if it uses "_" instead of "null" --- but not really, because the default value of a type could very well be a legitimate value of the type.)

All this null-suppression in the new version was added, I suspect, to deal with problems similar to what we are having in translating BLOG models. 

Exceptional events of measure zero should be suppressed/suppressable.

There is presently no clean and easy way to do this. 

If the present pattern of null suppression were extended throughout Figaro, then user-land would at least have a reasonably straightforward way of explicitly suppressing exceptions: use only subtypes of Element[AnyRef], holding out "null" as an error marker.  This has drawbacks --- it isn't idiomatic Scala, and Figaro makes heavy use of Element[AnyVal]s, particularly Element[Double], which is not nullable, just as the compiler says (as discussed above). 

(One could use Element[java.lang.Double], which is nullable, but working with java.lang.Double from Scala is extremely nontrivial (Scala's implicit conversions will constantly be lurking).  In the end, one would end up preferring Element[Option[Double]], and at that point, one would want all of that null-suppression logic to become None-suppression logic...discussed further below.)

Nonetheless, forging ahead on the "null means impossible" line: that stipulation could certainly be added to the contract of Element (which I'm mostly fine with).

But I'd still like to point out that the existing half implementation is wrong!

We shouldn't respond to finding ourselves in a universe of measure 0 by resampling the parent; algorithms are responsible for making sampling decisions. 

The proper protocol within Apply.scala for the enhanced contract in which "null" is a Figaro-distinguished event/value of measure zero is (for example):
  def generateValue() = {
    if (arg1.value == null) null   // not "fn(arg1.generate())"!!
    else fn(arg1.value)
  }

(or in its None form:
  def generateValue() = {
    if (arg1.value.isEmpty) None
    else Some(fn(arg1.value.get))   // i.e., arg1.value.map(fn)
  }
)

I'm mostly okay with this route, though it doesn't really solve the whole problem.

For example:
  Chain(continuous.Uniform(0,1),continuous.Uniform(0,1),(x:Double, y:Double) => x / y)
would still occasionally blow up the system, and the amount of user-land boxing and unboxing of Option is going to get tricky, because you can't access the Option until you unbox the Element holding it, and unboxing Elements in general means invoking Chain, which is doubly awkward.
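
To make the awkwardness concrete, here is a sketch of what user-land None-suppression of that division example would look like (only Constant and the two-argument Chain are assumed from Figaro):
  val x = continuous.Uniform(0, 1)
  val y = continuous.Uniform(0, 1)
  // the result carries None on the measure-zero event y == 0.0:
  val ratio: Element[Option[Double]] =
    Chain(x, (xv: Double) =>
      Chain(y, (yv: Double) =>
        Constant(if (yv == 0.0) None: Option[Double] else Some(xv / yv))))
Every consumer of ratio now has to Chain into it and pattern-match the Option, and anything computed from ratio is itself infected with Option --- the boxing spreads transitively through the model.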

Really, we just want the inference to ignore any part of the search space that generates exceptions. 

"Figure out how to run this program correctly!" 
"Now tell me what fraction of correct runs satisfy Q!"

Suppressing exceptions from user-land is tricky, particularly because Figaro links against scala.Double rather than java.lang.Double for its continuous library.  (java.lang.Double is nullable, so special support for null-suppression would be easier to leverage.)  Note that proper exception-handling is always tricky, but Figaro is a nondeterministic language: we have to catch and handle exceptions in just the right way, so that any backtracking through the model code works correctly.  That is hard to be sure of unless you know how the inference algorithm really works!  So I would argue the proper place for implementing suppression of measure-zero exceptions is within the language+algorithm packages of Figaro.

For example, here is a sketch of a much friendlier version of (Compound) Select, as a utility procedure (it would have to be dressed up with name and collection to fit within the rest of Figaro...)

def Select[U](clauses: Seq[(Element[Double], Element[U])]) = {
  val (probs, outcomes) = clauses.unzip
  val parents = Inject(probs: _*)
  val sumElement = Apply(parents, (xs: Seq[Double]) => (0.0 /: xs)(_ + _))
  /**** crucial difference in behavior here ****/
  val positiveDouble = (n: Double) => n > 0
  sumElement.addCondition(positiveDouble)
  /* the existing version throws a Figaro-custom Scala-level exception from within generateValue;
     users have to dig into the Figaro source code (or stumble across the exception) to learn its name
     (and so be able to catch it), and catching it (and continuing) in just the right way isn't really
     easy to do, at all, mostly because we want to *continue* by having the algorithm backtrack however
     far it takes to allow the random doubles to sum to nonzero probability.
     This "right way" is just what the above .addCondition() does (or should) achieve. */
  /** the following additional addConditions aren't a bad idea: **/
  // probs foreach ((p: Element[Double]) => p.addCondition(positiveDouble))
  /** this is pseudocode: it requires adding a three-argument form of Chain **/
  Chain(parents, sumElement, outcomes, (xs: Seq[Double], sum: Double, zs: Seq[U]) => {
    val ys = xs map (_ / sum)
    /* since we've gone to such lengths, we could consider calling an internal version of
     * selectMultinomial that skips error checking; however, not all reasoning about models
     * guarantees that our program is always evaluated in the way that we imagine... */
    selectMultinomial(ys, zs)
  })
}


(Probabilistic) Programs generate self-referential, unbounded, structures:

...not mere trees

...not mere dags

...not mere graphs

but rather

        ***full blown model structures of first-order logic***


!!!


Or with less overblown exclamation: registerUses() is better...but (perhaps?) still flawed

and more utility: a hack for Values() to extend the reach of VariableElimination



Returning to the original bug report...and coming to my improved version of Values()...

The current version of Figaro technically fixed the issue, because its version of registerUses() is much upgraded. 

However....


My guess is that the implementation went through the stages:
  1. registerUses does a full recursive walk to compute the transitive closure of dependency information when elements get added (as does deregisterUses).  This is inefficient, but certainly correct.
  2. as an "optimization", registerUses attempts to incrementally maintain the transitively closed cross-reference information as elements are added (and deregisterUses does the same for removal).  The implementation is flawed (and my patch doesn't completely fix the problem), primarily because the "right" way of doing this is to separate the storage of the edges implied by transitivity (instead of mutating the original graph by adding the transitive edges to it).  In the case that elements are only added in a stack-like manner (or entirely preallocated), which all existing examples I've seen of using Figaro do, the bug is not exposed.
  3. uses() and usedBy() do the full recursive walk at the moment cross-references are needed.  The result of the walk is cached, for efficiency.  registerUses() (and deregisterUses()) ensures the cache maintains coherency by *completely* clearing the cache whenever elements are added (or removed).  This is more or less a correct implementation, as well as being reasonably efficient.  It might be possible to invalidate less of the cache, but at present this won't buy anything because of the way elemGraphBuilder works.
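
For concreteness, a minimal sketch of strategy 3 --- the names (UseGraph, directUses, usesCache) are mine, and the real code also tracks the usedBy direction:
  import scala.collection.mutable

  class UseGraph[N] {
    private val directUses = mutable.Map.empty[N, mutable.Set[N]]
    private val usesCache  = mutable.Map.empty[N, Set[N]]

    def registerUses(user: N, used: N): Unit = {
      directUses.getOrElseUpdate(user, mutable.Set.empty[N]) += used
      usesCache.clear()   // coarse but coherent: any change invalidates everything
    }

    // the full recursive walk, memoized; assumes the direct-use graph is acyclic
    def uses(user: N): Set[N] = usesCache.get(user) match {
      case Some(cached) => cached
      case None =>
        val direct = directUses.getOrElse(user, mutable.Set.empty[N]).toSet
        val all = direct ++ direct.flatMap(uses)
        usesCache.update(user, all)
        all
    }
  }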

In particular, I am somewhat concerned about the following snippet in elemGraphBuilder:
    if (visitStack.contains(elem)) throw new IllegalArgumentException("Cyclic set of elements - cannot walk graph")

That is worrying, but I am not currently running afoul of it, despite generating arguably cyclic dependencies; Values(), for example, explodes because of the "cycle" present in all attempts at translating the hurricane model.  The fundamental reason that Values() explodes (unlike elemGraphBuilder) is that it actually invokes the Scala procedures attached to Apply and Chain nodes --- without setting up the correct environments.

That is, it invokes a Chain() node on an arbitrary parent value without ensuring that all related elements in the universe have values that are consistent with that parent value --- the way the hurricane model is set up, there are interior Chain nodes whose full parent domains cannot be correctly explored without resampling the grandparents. 

(The preplevel of a city can depend upon the preplevel of another city, if that other city is hit first.  It can never depend on its own preplevel --- if we evaluate the entire probabilistic program from start to finish --- but Values() will walk the "path" through the program that first assumes city 'a is the first city hit, and, later on, makes the contrary assumption that city 'a is not the first city hit, which leads to a cycle, and a Scala exception.) 

Because elemGraphBuilder only explores the dependencies in the present Universe [i.e., it doesn't speculatively call scala procedures with "wrong" arguments] it does not lead to exceptions on the hurricane model.

Expand() tries to rearchitect the walk that Values() performs, to fix this issue (cf.:
  // We keep track of the set of expanded elements so we don't expand the same one twice.
  // It is important to mark an element as expanded before recursively expanding its arguments so that we
  // don't have an infinite recursion with cyclic models.
  private var expanded: Set[Element[_]] = Set()
)

...but the implementation doesn't work, because it still relies on Values to do the grunt work, which will exhibit exactly the behavior that the comment discusses, because Values() doesn't check Expand.expanded().  (Which might be better called expanding, and would probably be more efficient and more correct as a val holding a mutable set, rather than a var holding an immutable Set.)

A two-line hack (with my debugging 'infrastructure' commented out...) to Values.apply() is the heart of getting VariableElimination to work correctly on the attached version of hurricane.scala:
  def apply[T](element: Element[T]): Set[T] = {
    //println("memoValues size: " + memoValues.size);
    memoValues.get(element) match {
      case Some(set) =>
        //if (set.size <= 1)
        //  println("Element: " + element + " has only one possible value...at this point.");
        set.asInstanceOf[Set[T]]
      case None =>
        // in case an element recursively depends on itself,
        // have recursive calls assume that only the current value is possible.
        val set1 = Set[T](element.value)
        memoValues.update(element, set1)
        // so the result of the recursion ought to be some sort of least fixed point
        val set2 = values(element)
        //if (set2.size <= 1)
        //  println("Element: " + element + " still has only one possible value.");
        memoValues.update(element, set2)
        /* Albeit, if we really want to be sure of that, we should do an explicit loop akin to:
         *
         *   while ((set2 = values(element)) != set1) ...
         *
         * or, better yet, use mutable Sets, and have the heart of Values() incrementally
         * update the Sets, as in a classic implementation of dataflow analysis for compilers.
         *
         * Many algorithms *are* best implemented with mutation:
         * dataflow, parsing, shortest paths, transitive closure...
         *
         * Don't believe the functional programming paradigm lies;
         * it is true that lack of mutation makes algorithms easy to analyze.
         * Consider, though, the full implication of "easy to analyze":
         * it means the algorithm isn't achieving anything inherently complicated...
         *
         * Accurately forecasting the possible values for models generated by
         * Turing-complete machinations?  I'd say that is complicated stuff.
         * It deserves an algorithm that is difficult to really understand in any
         * other way than painfully tracing it.
         *
         * All that said, there is one important restriction on mutation that can
         * still be made for this problem: monotonic mutation.
         * (The scala type system won't (afaik) easily express this constraint.)
         * On the plus side, the fact that the heart of Values() carries out a
         * recursive walk whose (if implemented in the 'right' way) 'side' effects
         * only *monotonically* increase the sizes of certain collections makes the
         * termination proof easy and obvious, which is a good thing.
         *
         * Conversely, and still a plus: for this problem (dataflow), the
         * monotonicity restriction does not get in the way.  So it's a good
         * restriction.  Which is quite unlike the no-mutation-at-all restriction
         * ...as that encourages the existing implementation that blows the stack
         * on a cyclic model.
         */
        set2
    }
  }


The architecture in Expand() (and in elemGraphBuilder) is a smarter way to get the job done (as waxed upon philosophically in the comments), but I just wanted something quick and easy that was just good enough to get hurricane.scala to work.  (I'd rather not be hacking Figaro internals --- I'd rather be working out the semantic bugs in how we are translating from BLOG --- but the precise issue I'm investigating would be rather easier to investigate with the option of running an exact algorithm on finitely groundable models with cyclic descriptions...)


 A conclusion at last!


So, to bring a long harangue to a close:

Try it out!  Pretty nifty to get VariableElimination working on a "cyclic" model!

Note that it is perhaps the case that the above two-line change to Values.apply() is not itself sufficient --- I did make other changes to Values(). (One that I can recall: a global search/replace from this(...) to apply(...), because I was concerned that this(...) might be constructing new objects rather than calling this.apply(...), which I now suspect was an unfounded concern).  I kind of *hope* that the above change is not, in fact, sufficient.  Take the attached model as a given: there is a version of Figaro, close to the one you possess, that works on this model.  If you fix it yourself, I rather imagine the fix will be a good deal better than my little hack.

A Rumination on Effective Terminology for (Interleaved) Temporal Planning

posted Sep 20, 2013, 11:25 PM by William Cushing   [ updated Sep 20, 2013, 11:40 PM ]

Some context (...not that the following will make sense to anyone but an expert!):

David E. Smith writes:

[a confused discussion...]

You're answering a different question.  My question is much simpler.  A "total ordering" of a plan is generally interpreted to mean an instance of a partially ordered plan in which the actions are totally ordered. I'm asking if there is a term used to refer to an instance of a partially ordered plan in which the start and end times of the actions are totally ordered, which is a weaker requirement than total ordering of the actions.  The terminological confusion above is precisely what I am trying to avoid.


My full reply is too long for email.  Here it is!



The terminological confusion in temporal planning literature is rampant, especially because "plan" can mean "what the validator sees" or "what the planner thinks about", which then further splits over every kind of planner...

Similarly "state"/"situation" can mean lots of things, among them the notion I rather despise of a "search state" (i.e., either a plan, or a set of plans, depending on whether one is emphasizing the current node, or all of its descendents), which leads to bad code (FD, for example, confuses world states with search states, and the FIXME comment pointing this out has remained  in place for something like 5 years now, because fixing the mistake once deeply committed to is not so straightforward...).

I do not believe there is any standard terminology in temporal planning for the various key distinctions....I don't believe there is any standard consensus on what the key semantics are supposed to be (let alone the terms used to refer to them!) beyond, perhaps, the limits set by TGP.  (And even then I'm not entirely convinced that, say, CPT, TGP, DAEYAHSP, and early versions of LPG are all entirely in agreement!)



The specification defines "happening sequence" and "schedules", but the former term goes unused in the literature and largely in practice.  VAL generates happening-sequences, but it uses a broken algorithm to do so. 

Anyways, "happenings" are not what you are trying to refer to.

"snap actions" is, I think, slightly more well-received than "instantaneous actions". 
But I didn't really like it well enough to use in my dissertation...

(That is because "instantaneous changes" are anathema to a state-space view --- "atomic" is better.  Partial-order mathematics don't care if effects are instantaneous or atomic, because they don't try and answer questions like "what is the value of f at t?", which becomes contradictory when effects are point-size, but perfectly fine as long as either change always takes time, or the domain of time is richer than plain numbers.  Numbers \times {before, after}, for example, is the domain that people implicitly use when they draw step functions (using the open-circle and closed-circle conventions to describe the value at the discontinuity), i.e., Numbers \times {open circle, closed circle} is what such drawings actually consist of...(albeit they /refer/ to Reals, but Reals aren't drawable...).)


My answer

I use "effect-sequence" or "effect-order", depending on whether I mean:
 effect-sequence: S \in { i \in Natural -> e \in Effect } s.t. Dom(S) = {1, 2, ..., k} for some k
 effect-order: O \subseteq Effect \times Effect s.t. O is transitive, anti-symmetric, and irreflexive.

There is, of course, a natural isomorphism between effect-orders that are total and effect-sequences, but they are different objects, and it is not computationally free to extract such a sequence from a given total order. The definition of X-sequence is not controversial, but X-order is.  Some people would prefer:
 effect-order: O \in <Effect, Links> s.t. Links \subseteq Effect \times Effect and O is acyclic.

That allows one to flip the implementation of "transitive" between a time cost and a space cost: the transitive closure can be computed as desired, or saved.  The advantage of *not* saving is that it becomes cheap to determine the longest chain of the total-order, i.e., to extract the "obvious" sequence, relative to being told what the minimal vertex is.  This savings is more noticeable in the case of wanting to extract a chain decomposition of an entire partial order; storing the partial-order with only the minimal links and a designated set of minimal elements makes extracting good chain decompositions much simpler (than if all the transitive edges appear explicitly).
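
For concreteness, a sketch (names mine) of the minimal-links representation, paying the time cost of transitivity only on demand:
  // an order stored as its minimal (non-transitive) links; assumed acyclic
  case class Order[A](links: Set[(A, A)]) {
    def transitiveClosure: Set[(A, A)] = {
      // saturate: add (a, c) whenever (a, b) and (b, c) are present, until nothing changes
      var closed = links
      var changed = true
      while (changed) {
        val next = closed ++ (for ((a, b) <- closed; (b2, c) <- closed if b == b2) yield (a, c))
        changed = next.size != closed.size
        closed = next
      }
      closed
    }
  }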

Anyways...

I use "action-sequence" and "action-order" in the same way, i.e.:
 action-sequence: S \in { i \in Natural -> a \in Action } s.t. Dom(S) = {1, 2, ..., k} for some k.
 action-order: O \subseteq Action \times Action s.t. O is the edge-relation of a directed acyclic graph

Actually, I think I slightly prefer:
 action-order: O \in { x \in Action, y \in Action -> b \in Boolean } s.t. ...
but that is just the functional programmer in me screaming to escape from a world of too-much Java...(and, in fact, many questions about relations are rather harder to compute nicely if all we have is the truth-function representation, so realistically the structural definition is probably superior to the functional definition...).

If I need to be a little bit more formal I distinguish between "(compound) actions" and "(atomic) actions".  But I really don't like overloading the use of "action"; the distinction between the actions proper and the model of their execution is too important to relegate to an adjective. 

So, while it is admittedly an abuse to identify atomic actions with the label "effect"...that is precisely what I do.
This is an abuse, because "effect" does not usually include "precondition", or "duration", or "cost", but in my abusive usage, it often does include especially "precondition", but sometimes also any other properties of an (atomic) action.

Abuse aside, it seems to work quite nicely, when discussing PDDL-style temporal planning, to reserve "action" for the compound things and "effect" for the atomic things. 
(After hundreds of pages, I don't tire of it...)



Allow me to demonstrate (...in less than hundreds of pages!):

I call a partial order on durative-actions/compounds/processes/tasks/dispatchable-entities an "action-order".
Likewise a partial-order on their atomic (i.e., atomic insofar as the provided simulation theory of the dispatchable entities is concerned) constituents I term an "effect-order".

In the case of simple temporal networks I would say "action-STN" versus "effect-STN". 
(Similarly for "action-ILP" versus "effect-ILP".)

Schedules of such get called "action-schedules" versus "effect-schedules" by yours truly.

In the case that every schedule consistent with some simple temporal network is "executable", then I call the network itself an "executable STN".
As in: "The action-STN P  is executable, precisely because every action-schedule S  of P  is executable.".
Likewise for effects.

I always stipulate how any particular adjective lifts to structures of higher type.  That is because they do not all lift in the same way.

For example, it is more natural to lift "optimal" using an existential (rather than universal) quantifier:
  An "optimal action-STN P " is such that some action-schedule consistent with happens to be "optimal".



Returning to your question...

I would call a partial-order on actions possessing the property that all of its schedules are "executable" an "executable action-order".

Likewise an "executable effect-order" is a partial-order on "effects" such that every assignment of concrete times satisfying the constraints of the "order" results in a simulation satisfying all the constraints implied by "executable". 

In other words, an effect-order is executable precisely when all of its schedules are executable.

If, for some reason, it were crucial to distinguish between total-orders and sequences then I might find myself uttering "total action-order" or "action-total-order", versus "total effect-order" and "effect-total-order".

But "action-sequence" and "effect-sequence" read better.

So, for example, an "executable action-sequence" is such that all of its schedules (which are action-schedules) are executable.
Likewise an "executable effect-sequence" is such that all of its schedules (which are effect-schedules) are executable.

Lemma. An "executable action-order", if it exists, never demonstrates required concurrency of actions. 
Pf. That is because partial orders can always be scheduled sequentially, i.e., can be strengthened to total-orders/sequences arbitrarily.
Of course, an executable action-sequence "obviously" demonstrates (or should I say defines?) a lack of required concurrency (of actions).

Lemma. Likewise an "executable effect-order", if it exists, never demonstrates required concurrency of effects.
Pf. Same reason, i.e., partial orders can always be strengthened to total-orders, which are isomorphic to sequences, and an "executable effect-sequence" obviously demonstrates a lack of required concurrency of effects.

(What is "required concurrency of effects"? Implementing a swap of two registers in the microcode of a digital, clocked, general-purpose processing unit...)

Lemma/Def. An executable effect-order, sequential or otherwise, easily witnesses required concurrency of actions.
Pf. This is as simple as viewing the order as an acyclic graph, contracting all the constituents of actions, and checking to see if the result is still acyclic. 

(Such is my preferred definition of "(causally) required concurrency (of actions)".)
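
That test is easy enough to write down directly --- a sketch, with all names (requiredConcurrency, actionOf, isAcyclic) mine:
  // effects are nodes, `order` their edges, and actionOf maps each effect to its action
  def requiredConcurrency[E, A](effects: Set[E],
                                order: Set[(E, E)],
                                actionOf: E => A): Boolean = {
    val actions = effects.map(actionOf)
    // contract: an edge between effects induces an edge between their (distinct) actions
    val contracted = for ((e1, e2) <- order if actionOf(e1) != actionOf(e2))
                     yield (actionOf(e1), actionOf(e2))
    !isAcyclic(actions, contracted)   // a cycle after contraction witnesses required concurrency
  }

  // Kahn-style check: repeatedly strip nodes with no incoming edges
  def isAcyclic[A](nodes: Set[A], edges: Set[(A, A)]): Boolean = {
    var remaining = nodes
    var es = edges
    var progress = true
    while (remaining.nonEmpty && progress) {
      val sources = remaining.filterNot(n => es.exists(_._2 == n))
      progress = sources.nonEmpty
      remaining = remaining -- sources
      es = es.filterNot { case (a, b) => sources(a) || sources(b) }
    }
    remaining.isEmpty
  }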

Elaborating...

If still acyclic, then the given effect-order is just the canonical representation of the action-order induced by the contraction (which sounds very complicated, but amounts to the same thing as saying "The rational value 1/1 is "the same" as the integer 1."), and in particular, fails to demonstrate required concurrency of any kind.

If the contracted result is cyclic, then one has just witnessed "required concurrency (of actions)" of a most-likely rather pernicious kind: assuming that the constraints of the ordering are only those induced by executability considerations, then such result witnesses "causally required concurrency of actions". 

This is pernicious because its potential presence "forces" searching over effect-orders rather than action-orders (and there are exponentially more effect-orders than action-orders...).


STNs and higher types...

Lemma. An executable action-STN easily demonstrates a weak kind of required concurrency of actions. 
Pf. Just impose a deadline.

Lemma.
An executable effect-STN easily demonstrates a weak kind of required concurrency of effects. 
Pf. Set all durations to 1 and impose the tightest possible deadline.

Similar comments apply for action-ILPs and effect-ILPs. 

Whether the concurrency is an issue or not depends on the search strategy of the planner; if the planner is directly searching over STNs, then it won't really matter why, if, or how concurrency is forced. 

From the standpoint of directly searching over proof theories (rather than model theories), the better perspective to take (than emphasizing "required concurrency") is that searching over action-ILPs is more efficient than searching over effect-ILPs. 

But when can it be gotten away with? 

When can we commit to contracting all the pieces of a normally compound search decision into a big macro, without losing completeness?

For the state-space bent, the question can be simplified a bit (imho) into detecting when concurrency between compounds might be "causally required". 

If so ("causally required concurrency of compounds" is plausible), then we need to branch over the compounds as they are. 
If not, we can treat them as atomic, "for the win".

Tightly distinguishing between "causally required concurrency" and "concurrency required for unimportant reasons" is as "simple" as partitioning constraints into "needs to be fed to a nondeterministic-memory-bounded-Turing-machine (such as STRIPS)" and "admits algebraic manipulation".  For an important fragment of PDDL-style temporal planning, the separation is clear, and one has the nice property that the "causal" constraints always form partial-orders.  So only the duration/deadline constraints force one to consider something like STNs rather than partial-orders.

In that case it is worthwhile to separate consideration of the "effect-mutex-order"/"effect-deorder"(/"the causally-induced partial-order on effects", "the threat-free partial-order causal-link structure on effects", ...) from all the rest of the constraints.

That takes us back to the simple test for "concurrency that matters": whether contracting by some preordained scheme (effects into their actions, but one could consider generalizations to HTNs...) produces cycles.


PDDL2.1 Terminology and Semantics

Unfortunately the formal semantics of PDDL2.1 do not permit separately verifying causal and temporal constraints.  Of course, in general, the effects of actions might very well depend on when the action is started, or for how long the action is run; so in general one can't pursue such separate verification anyways.

But special cases matter.

The special case of being able to run a plan through a "STRIPS/ADL calculator", to verify its causal constraints, and an "STN calculator", for the temporal constraints, is an important special case.  PDDL2.1 was designed to permit this, but the formal definitions do not meet the design goal.

In particular, the formal object that one feeds to the STRIPS/ADL calculator is what I would call an (effect-set)-sequence, where the sets are sets of simultaneous effects.  The spec. calls these objects "happening-sequences".  To determine a happening sequence, one must decide which sets of effects will be simultaneous and which will not.  So the temporal information ends up polluting the STRIPS/ADL-side of the equation...
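
In types (a sketch; the names are mine, not the spec's):
  type Effect = String                        // stand-in for a ground STRIPS/ADL effect
  type Happening = Set[Effect]                // a set of effects deemed simultaneous
  type HappeningSequence = Seq[Happening]     // the (effect-set)-sequence fed to the calculator
Deciding how to group effects into Happenings is exactly the temporal decision that pollutes the causal side.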

There is an interesting reason that the formal semantics go this route --- connected with why overall constraints are said to be fully-open, why effects are only allowed at-start and at-end, and views towards compiling ZENO-like formalisms down to simpler ones... --- but in any event the result is unacceptable to approaches to temporal planning coming from a state-space bent.

In other words, both POPF and SAPA (among others) reject "happening-sequences". 

VAL pseudo-rejects "happening-sequences"; it generates such objects, but it generates them using the wrong approach, which is related to it wanting to take a shortcut in evaluating the "non-zero separation rule".  Which is not unrelated to why the state-space bent ends up rejecting happening-sequences altogether...

VAL Bug

If you're interested, consider a plan (an action-schedule) like:
 0.0012: A[1]
 0.00145: B[1]
 0.0022: C[1]
with B and C mutex, but A and B nonmutex. 

VAL will accept the plan, which is wrong.  It does so because it rewrites the action-schedule to:
 0.0012: A[1]
 0.0012: B[1]
 0.0022: C[1]

Subsequently it generates the happening-sequence (well, actually, it makes a slight optimization that exploits a fun corner of STRIPS semantics, but never mind that):
 {A-start, B-start}
 {A-inv, B-inv}
 {C-start}
 {A-inv, B-inv, C-inv}
 {A-end, B-end}
 {C-inv}
 {C-end}
which passes all tests that VAL makes, namely, running the unions of the effect-descriptions through a STRIPS/ADL calculator.

The "correct" (as per the specification) happening-sequence for the original plan, for reference, is:
 {A-start}
 {A-inv}
 {B-start}
 {A-inv, B-inv}
 {C-start}
 {A-inv, B-inv, C-inv}
 {A-end}
 {B-inv, C-inv}
 {B-end}
 {C-inv}
 {C-end}
which would also pass all tests related to happening-sequences, which is all that VAL checks.

However, the specification heavily implies that one must separately and also check a strong-form of the non-zero separation rule, i.e., the \eps-separation rule.  Whether the schedule (contrast with sequence!) passes or not depends on whether A-start and B-start are mutex, and whether B-start and C-start are mutex.  If either pair is mutex, the schedule is in error according to the specification.  If the first pair is nonmutex, but the second pair is mutex, then VAL is tricked into accepting the schedule; scheduling A and B close together hides the mutex between B and C from VAL's blurry vision.  VAL will (rightfully) reject the plan if we instead make A-start and B-start mutex (regardless of whether B-start and C-start are mutex); the "reason" it rewrites the original schedule to move nearby things into simultaneous things is so that it can check the separation rule "more efficiently".  Where this breaks down is that "simultaneous" is transitive, but "separated" is not...
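
The non-transitivity is easy to see numerically --- a sketch, assuming an eps of 0.001 purely for illustration:
  val eps = 0.001
  val (tA, tB, tC) = (0.0012, 0.00145, 0.0022)
  math.abs(tA - tB) < eps   // true:  so VAL snaps B onto A's time...
  math.abs(tB - tC) < eps   // true:  ...hiding that B and C were too close...
  math.abs(tA - tC) < eps   // false: ...since after snapping, B-to-C separation looks like A-to-C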

VHPOP pseudo-accepts "happening-sequences"; it doesn't actually build such objects, but it does implement a sound proof-theory for the formal semantics of PDDL2.1, up to its various syntactic restrictions (boolean fluents, STRIPS-only effects?).  When VAL and VHPOP disagree, but VHPOP doesn't reject the input, trust VHPOP.

LPGP is presumably also sound, but it is awkward enough to use that I haven't tested it as much...which is a lame excuse I suppose, but there it is.

No other temporal planners that I am aware of (besides VHPOP and LPGP) respect the spec. in this regard.

An excellent, but hardly conclusive, test-case is to use your (Dave's) device for forcing two actions to coincide: have each establish+destroy the over-all condition of the other.  Such is supposedly legal (VHPOP and VAL concur), but few planners think so.


Required Concurrency of Effects

If we regard effects as instantaneous, then "required concurrency of effects" can be shortened to "required simultaneity", which is a generally reviled notion.

If we instead regard effects as durative (but atomic), then the proper term is "required synchronization".

All PDDL2.1 implementations view STRIPS/ADL actions ("effects") as durative.  (Which is why I prefer to say "compound action" rather than "durative action".)  They vary in the value of unit time assumed: 1, 0.01, 0.001, 2^-6, 2^-7, or 10^-18; and in the case of Drew McDermott, the duration of effects is one unit along the vertical dimension of a two-dimensional field of time (time is, theoretically speaking at least, hyperreal-valued in OpTop). 

Required synchronization is useful for modeling things like "compare-and-swap" at the level of wires+registers, i.e., at the level of VHDL.  In other words it is useful for abstracting out more detailed explanations in terms of perturbances/tolerances.  So, for example, while it is true that "compare-and-swap" can fail to meet its contract if one designs a processor so large that the non-simultaneous arrival of clock signals matters, or, more likely, if one overclocks the chip to faster than the largest gate delay in front of any memory element on-chip, and while it is true that such calculations *are* ultimately important, so too is it true that the level of abstraction where we assume compare-and-swap "just works" is quite important (hence VHDL...).

Required synchronization is also useful for compiling more expressive languages into PDDL2.1.  There are two main theoretical mechanisms, neither of which actually works in practice (both rely on requiring precise synchronization of the pieces of compiled actions).  Keith Halsey's mechanism requires knowing the value of unit time, which is planner-specific.  (I view planner-specific benchmarks as a no-go.)  Dave Smith's mechanism requires that over-all constraints be enforced in a manner (fully open, as required by the spec.) antithetical to the state-space view of temporal planning; such planners invariably do not permit Dave's mechanism to work. 

Arguably those planners are in error, because they clearly violate the specification, but at the same time, they do so for important efficiency and architectural reasons that can't be overcome (without switching to the partial-order approach).  In other words, there is a kind of temporal planning (all intervals are non-empty and half-open on the right), weaker even than what PDDL 2.1 asks for, that both the state-space view and the partial-order view could readily achieve.  I'd like for that subset/reinterpretation of PDDL 2.1 to be widely supported, because it is still just barely expressive enough to allow ZENO-esque languages to be compiled down (without having to target a planner), without imposing undue performance penalties on non-compiled models.

DrRacket's default implementation of procedure-call is too slow...but fixable

posted Sep 19, 2013, 11:47 PM by William Cushing   [ updated Sep 21, 2013, 2:05 PM ]

I was recently looking at the performance of (Dr)Racket's code-generation, and I was operating under the default assumption that procedures created at run-time are slower than those created at compile-time.

In general one expects that to be the case because there is more room for analysis at "compile-time".  On the other hand, there is the potential that better information is available at run-time for conducting optimizations (more things might be known to be constant, for example).

Update!

The behavior described on this page was quite unusual behavior, so I submitted it as a bug report.  Dynamic compilation really shouldn't produce better results than static compilation ... at least ... not without a really great excuse (like extensive profiling information becomes available, which is certainly not the case with (eval (quote (lambda ...)))).

So, as it turns out, my mistake was in (1) using DrRacket, (2) without investigating deeply enough how to configure it for "production-mode".  Microbenchmarking is an art-form, unfortunately...

Anyways, the next box is the full scoop from a core Racket developer; the blue markup highlights the proper way to setup DrRacket if you are interested in speed testing.

Matthew Flatt diagnoses the problem:

At Fri, 20 Sep 2013 00:32:02 -0400, william.cushing@gmail.com wrote:
> Trivial closures created at "compile-time" can be slower than "the same"
> trivial closures produced at "run-time".

Did you run the program in DrRacket?

Adding `(test)` to the end and running in plain `racket` at the command line, I get

 10 million applications of +/2:
 via eval:
 cpu time: 119 real time: 118 gc time: 0
 via lambda:
 cpu time: 30 real time: 30 gc time: 0


In DrRacket's default mode:

 10 million applications of +/2:
 via eval:
 cpu time: 890 real time: 892 gc time: 0
 via lambda:
 cpu time: 964 real time: 964 gc time: 0


In DrRacket after selecting "No debugging or profiling" in the "Choose Language" dialog:

 10 million applications of +/2:
 via eval:
 cpu time: 133 real time: 139 gc time: 0
 via lambda:
 cpu time: 107 real time: 107 gc time: 0


In DrRacket after further unchecking "Preserve stacktrace":

 10 million applications of +/2:
 via eval:
 cpu time: 123 real time: 123 gc time: 0
 via lambda:
 cpu time: 32 real time: 36 gc time: 0


So, my guess is that you're running your tests within DrRacket. If that's right, you're seeing that DrRacket's debugging instrumentation mostly applies to the static program, while program fragments via `eval` have less instrumentation and therefore less overhead.

For concreteness, consider:
(define twice/l              (lambda (x) (+ x x)))
(define twice/p (eval (quote (lambda (x) (+ x x)))))

Both are Scheme-y ways to define a procedure that takes one input, x, and returns twice its value, that is, the value of (+ x x).

Pedantry

Most people say "function" rather than procedure.  This is quite misleading.  Programming languages are about creating and invoking procedures in order to carry out effects on the real world.  It so happens that programs often need to calculate the values of certain functions in order to decide how to proceed.  In which case they end up invoking procedures to perform such calculations.  The primary effect of such a calculation is that the program determines the value of the function in question, but also some available electrical energy gets converted to ambient heat, some wear and tear occurs on the various electrical components (main memory, registers, the processor, hard disks, ...), and so on...

In particular, one should say "procedure" and "effect" rather than "function" and "side-effect".  But anyways...
The second way is especially Scheme-y.  The first way is exactly equivalent to writing:
(define (twice/l x) (+ x x))
I've written it the longer way to make the connection with the second way clearer.

Namely, if the first way corresponds to the sentence:
 ``Define a procedure to double its input.'';
then the second way is the sentence:
 ``Carry out the directions in this sentence:
    ``Define a procedure to double its input.''.''. 

In other words, from a mathematical point of view, surrounding something in (eval (quote ...)) "does nothing".  However, from an algorithmic point of view, doing so does have an important effect.  It alters when the procedure gets created.  Consequently, it also, potentially, alters who does the creating.

Usually, one expects that creating procedures later on will give worse results. 

As it turns out, in DrRacket (Update: but not the command-line racket), twice/p is somewhat faster than twice/l.  I defined a procedure to call the two versions 10 million times each, and time the results.  Like so:

> (test-+)
10 million applications of +/2:
via eval:
cpu time: 264 real time: 264 gc time: 0
via lambda:
cpu time: 344 real time: 343 gc time: 0


So wrapping the procedure definition in (eval (quote ...)) gave a speedup of about 25%.

Wow!



That seemed interesting enough to wrap up the transformation in a useful way, and write some additional tests.  Like so for the custom syntax:

(define-syntax-rule
  (define-procedure (f x ...) b ...)
  (define f (eval (quote (lambda (x ...) b ...)))))


Which, in (Dr)Racket, needs a magic spell to be cast first:

;; The point of the following magic spell is to make (eval x) behave the same in the REPL and in a module.
;; This can also be achieved using (define-namespace-anchor ...) and (namespace-anchor->namespace ...),
;; instead of using (#%variable-reference), but all are equally nonportable.


(current-namespace (variable-reference->namespace (#%variable-reference)))


The new syntax is a drop-in replacement for using (define ...) to define procedures.  For example, the twice functions can be written this way:

(define           (twice/l x) (+ x x))
(define-procedure (twice/p x) (+ x x))

Which are "just" nicer ways of saying the same things as before.  The advantage is that one can change (define-procedure ...) if the behavior of DrRacket changes such that wrapping things in (eval (quote ...)) stops being beneficial (Update: e.g., if one sets the "right" options, as discussed above).  Like so:

;; Alternative do-nothing definition:
(define-syntax-rule
  (define-procedure (f x ...) b ...)
  (define (f x ...) b ...))



Okay, let's see some more tests:

;; Assignments?
(define a-variable (void))

(define (effect/lamb x)
    (set! a-variable (+ x x))
    a-variable)

(define-procedure (effect/proc x)
    (set! a-variable (+ x x))
    a-variable)



;; Recursion?
(define (fib/lamb n)
    (if (< n 2)
        n
        (+ (fib/lamb (- n 1))
           (fib/lamb (- n 2)))))

(define-procedure (fib/proc n)
    (if (< n 2)
        n
        (+ (fib/proc (- n 1))
           (fib/proc (- n 2)))))



;; Mutual recursion?
(define (even?/lamb x)
    (cond
      [(eq? x 0) #t]
      [(eq? x 1) #f]
      [(< x 0) (even?/lamb (- x))]
      [else (odd?/lamb (- x 1))]))
(define (odd?/lamb x)
    (even?/lamb (- x 1)))


(define-procedure (even?/proc x)
    (cond
      [(eq? x 0) #t]
      [(eq? x 1) #f]
      [(< x 0) (even?/proc (- x))]
      [else (odd?/proc (- x 1))]))
(define-procedure (odd?/proc x)
    (even?/proc (- x 1)))


So what do the timing results look like?  Like so:

Welcome to DrRacket, version 5.2.1 [3m].

Language: racket; memory limit: 128 MB.
> (test)
10 million applications of +/2:
via eval:
cpu time: 272 real time: 275 gc time: 0
via lambda:
cpu time: 332 real time: 334 gc time: 0

10 million applications of set! and +/2:
via eval:
cpu time: 304 real time: 303 gc time: 0
via lambda:
cpu time: 624 real time: 623 gc time: 0

The 35th fibonacci number by recursion:
via eval:
cpu time: 388 real time: 389 gc time: 0
via lambda:
cpu time: 1920 real time: 1921 gc time: 0

Testing parity of 10 million by a silly (i.e., linear-time) algorithm:
via eval:
cpu time: 84 real time: 84 gc time: 0
via lambda:
cpu time: 560 real time: 560 gc time: 0

In every case the wrapping helps, especially as the number of procedure calls increases.  For example, the really bad algorithms for computing fibonacci numbers or testing parity run roughly five to seven times faster under the simple expedient of wrapping the definition in (eval (quote ...)).

The reason, incidentally, that DrRacket performs slowly here seems to be that its source-line tracking imposes some kind of run-time overhead even when not compiling for debug mode.  It may be that the performance bug lies in DrRacket failing to pass the correct optimization flags to the compiler when I hit "run" rather than "debug".  To reach that conclusion, I tried changing the define-procedure macro to use (syntax ...) instead of (quote ...); doing so entirely reversed the beneficial effect observed above.  (Update: the fuller explanation has to do with profiling, debugging, and stacktrace instrumentation, some of which (eval (quote (lambda ...))) happens to strip away, but (eval (syntax (lambda ...))) does not.)

;; This version, which changes quote to syntax, produces much slower procedures:
#;(define-syntax-rule
    (define-procedure (f x ...) b ...)
      (define f (eval (syntax (lambda (x ...) b ...)) )))

A Simple Abstraction Penalty Benchmark: Closures

posted Sep 6, 2013, 4:48 AM by William Cushing

A really clever compiler would be able to figure out that:
  ((lambda (op) (lambda (x y) (op x y))) +)
...means the same thing as the 2-argument version of addition. 

A really aggressive optimizer would, then, inline any calls to such a thing (a closure).

In theory.


In practice: compilers are quite dumb.

Here are five ways of adding two numbers in Racket, deliberately written in a Common Lisp (rather than Scheme) style for reasons that will be obvious shortly:

#lang racket

;; Let's test how good the compiler is at figuring out that I just want to add two numbers...

(define x 0)
;; Racket prefers "for v in-range n" to "dotimes v n"
(define-syntax-rule               
 (dotimes (v n) body ...)
 (for ((v (in-range n))) body ...))

(define (test-direct-call)
  (dotimes (i 20000000)
    (set! x (+ i 3))))

(define (test-indirect-call)
  (dotimes (i 20000000)
    (set! x (apply + i 3 '()))))
;; such use of apply is misuse in a Lisp-1,
;; ...but mirrors what will be necessary for a Lisp-2

(define (test-indirect-call-through-local-binding)
  (let ((bar +))
    (dotimes (i 20000000)
      (set! x (apply bar i 3 '())))))


;; now the fun part: create some closures and call through them
(define (make-lambda op) (lambda (x y) (op x y)))
(define foo (make-lambda +))

(define (test-local-closure)
  (let ((bar (make-lambda +)))
    (dotimes (i 20000000)
      (set! x (apply bar i 3 '())))))

(define (test-nonlocal-closure)
  (dotimes (i 20000000)
    (set! x (apply foo i 3 '()))))

(define (test-closure-via-eval op)
  (let ((bar (make-lambda op)))
    (dotimes (i 20000000)
      (set! x (apply bar i 3 '())))))
       
(define (test)
  (time (test-direct-call))
  (time (test-indirect-call))
  (time (test-indirect-call-through-local-binding))
  (time (test-local-closure))
  (time (test-nonlocal-closure))
  (write "Type in `+', please.  Anything else could be dangerous...")
  (newline)
  (let ((op (eval (read))))
    (time (test-closure-via-eval op)))
  )


On the present version of Racket, the test yields (the times are in milliseconds):
Welcome to DrRacket, version 5.3.2 [3m].
Language: racket [custom]; memory limit: 1024 MB.
> (test)
cpu time: 874 real time: 874 gc time: 0
cpu time: 858 real time: 875 gc time: 0
cpu time: 874 real time: 870 gc time: 0
cpu time: 1388 real time: 1399 gc time: 0
cpu time: 1404 real time: 1411 gc time: 0
"Type in `+', please.  Anything else could be dangerous..."
+
cpu time: 1420 real time: 1417 gc time: 0


Which means that all ways of creating closures are equal, and all suffer a performance penalty relative to more direct calls. 

That is, an abstraction penalty exists in Racket, for closures.  Interestingly, apply (in this funcall-like form) is not at all penalized, in these tests at least.

I half-expected the second form of calling through a closure (test-local-closure) to be optimized away, and/or the first (test-nonlocal-closure), because the closures in these cases are knowable/known at compile-time.  However, the compiler notices the simplicity of neither case.  "Of course" the third case fools the compiler; that's what it exists for.  (Albeit, it is technically feasible to catch even this case, as long as the compiler links itself, and perhaps the original sources as well, into the compiled program.)



What about Lispworks?

;; This is supposed to be the right spell to convince LispWorks to act more like C,
;; ...but it only half-worked for me. 
;; (I had to manually execute the proclaim at the REPL to get the compiler to honor it.)
(eval-when (:compile-toplevel :load-toplevel :execute)
  (proclaim '(optimize (speed 3) (safety 0) (debug 0))))

;; side-effecting variables usually convinces optimizers not to optimize away microbenchmarks...
(defvar x 0)

(defun test1 ()
  (dotimes (i 20000000)
    (setq x (funcall #'+ 2 3))))

(defun test2 ()
  (let ((bar #'+))
    (dotimes (i 20000000)
      (setq x (funcall bar 2 3)))))


(defun make-lambda (op) (lambda (x y) (funcall op x y)))
(defconstant foo (make-lambda #'+))

(defun test3 ()
  (dotimes (i 20000000)
    (setq x (funcall foo 2 3))))

(defun test4 ()
  (let ((bar (make-lambda #'+)))
    (dotimes (i 20000000)
      (setq x (funcall bar 2 3)))))

(defun test ()
  (time (test1))
  (time (test2))
  (time (test3))
  (time (test4))
  )

(test)

Some slightly snipped output from the result of compiling that produced the following timings:

;;; Safety = 0, Speed = 3, Space = 1, Float = 1, Interruptible = 1
;;; Compilation speed = 1, Debug = 0, Fixnum safety = 3
;;; Source level debugging is off
;;; Source file recording is  on
;;; Cross referencing is on

Timing the evaluation of (TEST1)
User time    =        0.031
Timing the evaluation of (TEST2)
User time    =        1.248
Timing the evaluation of (TEST3)
User time    =        1.591
Timing the evaluation of (TEST4)
User time    =        1.606

There appears to be a slight performance penalty for closures (tests 3 and 4) relative to more vanilla procedure objects (test 2).  There is a notable performance penalty relative to a completely obvious call to a builtin (test 1).  Checking the disassembly, the reason for the major discrepancy is that in the first test, the loop is equivalent to just:

(defun test1 ()
  (dotimes (i 20000000)
    (setq x 5)))

That is, the intended call is entirely constant-folded.  Doh!

Changing the 2 to i happened to ensure, this time, that addition actually takes place (checking the disassembly is a good idea...).

(defun test1 ()
  (dotimes (i 20000000)
    (setq x (funcall #'+ i 3))))

I changed all the other tests as well, giving basically the same set of timings, except, of course, for test1:

Timing the evaluation of (TEST1)
User time    =        0.078
Timing the evaluation of (TEST2)
User time    =        1.248
Timing the evaluation of (TEST3)
User time    =        1.622
Timing the evaluation of (TEST4)
User time    =        1.622


There is still a notable discrepancy.  The reason is that all but the first test actually emit a call to the runtime version of funcall, so they are function-calling within a loop, whereas the first test (despite being stated as a call to funcall) is optimized during compilation to a call-free loop.



Conclusion?

So one upshot here is that LispWorks is not quite as clever as Racket when it comes to functional programming; merely hiding the call in a let-bound variable (as in test2) produced significantly inferior code.  Of course, let-bound variables are to be preferred to global-scope constants: that Racket treats global and local scopes pretty much the same is a bonus.

On the flip side...

(1) The Racket compiler is not able to produce nearly as good code for the simplest case.  That is, LispWorks runs 10 times faster (than Racket) on the most directly stated form of the microbenchmark. 

(2) Meanwhile, LispWorks runs only some 10% slower or so (again, relative to Racket) when the computation is hidden behind closures.

Neither implementation does what I really want them to do, which is to inline calls to closures when those closures are actually compile-time constants.  Even better would be to inline such closures at run-time...



Aside:

The amount of Java that it takes to reproduce the precise semantics of the above tests (integer arithmetic only bounded by memory [BigInteger], user chooses what operation to perform [.getClass(...); .getDeclaredMethod(...); .invoke(...)], and so on) is mind-boggling.  The default settings on my Eclipse produce notably worse performance (except for the case of limiting all the calculations to machine integers, which is the easiest form to type out, but not the spirit of the test!).  I did not test the ability of LispWorks and Racket to exploit type declarations --- but they do both have that ability --- and I imagine it works out just as well as Java does.  Exploiting type-information would ideally be an entirely orthogonal issue, except of course that Java insists on types, so fair comparisons are difficult.

That is, my interest here lies in testing the interaction between constant-folding and closures: are constant closures folded? 

The answer, for the state-of-the-art in LISP and Scheme, seems to be "no".  Haskell may very well fare better on that particular point.
