More results

Out-of-distribution generalization

OOD generalization is widely considered a crucial ability for making progress on hard combinatorial optimization problems and automated theorem proving. The INT inequality generator has been specifically designed to benchmark this phenomenon. We verify that Subgoal Search trained on proofs of length 10 generalizes favourably to longer problems; see the figure to the right. We speculate that search is a computational mechanism that delivers OOD generalization.

Analysis of k (subgoal distance) parameter

The subgoals are trained to be k steps ahead. A higher k should make high-level planning easier. However, as k increases, the quality of the generator might drop; thus, the overall effect is uncertain. For Sokoban, increasing k generally helps, but the benefits saturate around k = 4; see Figure 2. We note that higher k also increases the computational cost of the low-level conditional policy search.
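To make the trade-off concrete, the following is a minimal sketch (not the paper's implementation) of the two-level loop on a toy 1-D chain environment: a stand-in generator proposes states k steps ahead, and a stand-in low-level policy must bridge each gap within a budget that grows with k. All names (`generate_subgoal`, `low_level_search`, `GOAL`) are illustrative assumptions.

```python
GOAL = 12  # toy goal state on a 1-D chain of integers

def generate_subgoal(state, k):
    # stands in for the learned generator: propose a state k steps ahead
    return min(state + k, GOAL)

def low_level_search(state, subgoal, budget):
    # stands in for the learned conditional policy: try to reach `subgoal`
    # within `budget` primitive actions (here, +1 moves on the chain)
    path = [state]
    for _ in range(budget):
        if path[-1] == subgoal:
            return path
        path.append(path[-1] + 1)
    return path if path[-1] == subgoal else None

def subgoal_search(start, k):
    budget = 2 * k  # per-subgoal verification cost grows with k
    plan, state = [start], start
    while state != GOAL:
        sub = generate_subgoal(state, k)
        segment = low_level_search(state, sub, budget)
        if segment is None:
            return None  # subgoal unreachable by the policy: discard
        plan += segment[1:]
        state = sub
    return plan

print(subgoal_search(0, k=4))  # reaches 12 via subgoals 4, 8, 12
```

With larger k the high-level loop runs fewer iterations, but each low-level verification call gets a larger budget, which mirrors the cost trade-off noted above.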

Quality of subgoals

The learned subgoal generator is likely to be imperfect (especially on hard problems). We study this on 10 × 10 boards of Sokoban, which are small enough to calculate the true distance dist to the solution using the Dijkstra algorithm. In the figure to the right, we study ∆ := dist(s1) − dist(s2), where s1 is a sampled state and s2 is a subgoal generated from s1. Ideally, the histogram should concentrate at k = 4, the value used in training. We see that in slightly more than 65% of cases the subgoals lead to an improvement. The low-level conditional policy in Algorithm 2 provides additional verification of generated states. We check that in INT and the Rubik's Cube, about 50% of generated subgoals can be reached by this policy (the rest are discarded).
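The measurement above can be sketched as follows: given an exact distance-to-goal oracle (here BFS on a toy graph; the paper uses Dijkstra on 10 × 10 Sokoban), compute ∆ = dist(s1) − dist(s2) over sampled (state, subgoal) pairs and report the fraction with ∆ > 0. The graph and the pair list are illustrative assumptions, not the paper's data.

```python
from collections import deque

def exact_distances(adj, goal):
    # BFS outward from the goal yields each state's true distance to it
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for t in adj[s]:
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist

def improvement_rate(pairs, dist):
    # fraction of generated subgoals that strictly reduce the distance
    deltas = [dist[s1] - dist[s2] for s1, s2 in pairs]
    return sum(d > 0 for d in deltas) / len(deltas)

# toy 5-state chain 0-1-2-3-4 with goal state 4
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
dist = exact_distances(adj, goal=4)
pairs = [(0, 3), (1, 3), (2, 1)]  # last subgoal moves away from the goal
print(improvement_rate(pairs, dist))  # 2 of 3 subgoals improve
```

A histogram of the ∆ values from such pairs is exactly the plot the section describes.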

Value errors

There might be various explanations for the success of our method. One is that Subgoal Search handles errors in value estimates better. Such errors are inevitable when using learned models. Imagine a solution path from the starting state to the goal state. Due to the errors, the value estimates along the path may not be monotonic. Intuitively, this is an undesirable property: it hinders search and thus makes finding the path harder. Now consider a sparse version of the path, e.g., with consecutive states spaced k actions apart, as constructed by Subgoal Search. For this version, the estimates are more monotonic, making the path easier to find. To illustrate this, we measure monotonicity on 8 solution paths found by our algorithm for INT. The probability that the value estimate decreases when moving k steps along the path drops from 0.32 for k = 1 to a mere 0.02 for k = 4.
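The monotonicity measurement can be sketched as below: given value estimates along a solution path, count the fraction of k-step strides toward the goal on which the estimate decreases. The data here is made up for illustration; only the form of the statistic follows the text.

```python
def decrease_rate(values, k):
    # values[i] is the (noisy) value estimate of the i-th state on the path;
    # report the fraction of k-step moves on which the estimate drops
    steps = [(values[i], values[i + k]) for i in range(0, len(values) - k, k)]
    return sum(after < before for before, after in steps) / len(steps)

# noisy estimates along a 9-state path: roughly increasing toward the goal
values = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.55, 0.8, 0.9]
print(decrease_rate(values, k=1))  # single steps hit every noise dip
print(decrease_rate(values, k=4))  # 4-step strides smooth the noise out
```

On this toy path the decrease rate falls from 0.375 at k = 1 to 0.0 at k = 4, the same qualitative effect as the 0.32 → 0.02 drop reported above.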