Real-time heuristic search

Problem Formulation

The distinctive property of agent-centered real-time heuristic search is that an agent must repeatedly plan and execute actions within a constant time interval that is independent of the size of the problem being solved. This restriction severely limits the range of applicable heuristic search algorithms. For instance, static search algorithms such as A* and IDA*, re-planning algorithms such as D*, anytime algorithms such as ARA*, and anytime re-planning algorithms such as AD* cannot guarantee a constant bound on planning time per action. LRTA* can, but its solution quality can be low because it must fill in heuristic depressions.

Search algorithms that produce an entire solution before the agent takes its first action (e.g., A*) lead to action delays that grow with the size of the search graph. Numerous optimizations have been suggested to reduce these delays. Real-time search addresses the problem in a fundamentally different way. Instead of computing a complete, possibly abstract, solution before the first action is taken, real-time search algorithms plan only the first few actions for the agent to take. This is usually done by conducting a lookahead search of fixed depth around the agent's current state and using a heuristic (an estimate of the remaining travel cost) to select the next action. The actions are then executed and the planning-execution cycle repeats. Since most such local searches do not reach the goal state, the agent runs the risk of heading into a dead end or, more generally, selecting suboptimal actions. To address this problem, real-time heuristic search algorithms update (or learn) their heuristic function with experience.
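
To make the cycle concrete, below is a minimal Python sketch of this planning-execution loop for the depth-one case, in the style of LRTA*; the state-space interface (neighbors, cost, h0) is an assumption made for illustration, not a specification of any particular published implementation.

    # A minimal sketch of an LRTA*-style planning-execution loop with
    # depth-one lookahead; neighbors/cost/h0 are an assumed interface.

    def lrta_star(start, goal, neighbors, cost, h0, max_steps=100_000):
        h = {}                      # learned heuristic values
        def H(s):
            return h.get(s, h0(s))  # fall back to the initial heuristic
        s = start
        path = [s]
        for _ in range(max_steps):
            if s == goal:
                return path
            # Lookahead: score each neighbor by f = c(s, s') + h(s').
            best = min(neighbors(s), key=lambda n: cost(s, n) + H(n))
            # Learning: raise h(s) to the best f-value found, which is
            # what gradually fills in a heuristic depression.
            h[s] = max(H(s), cost(s, best) + H(best))
            s = best                # execute the chosen action
            path.append(s)
        return None                 # step budget exhausted

Note that the work done per action (one scan of the current state's neighbors) is constant and independent of the problem size, which is exactly the real-time property described above.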

There were several problems with the existing body of work in the field. First, early algorithms were designed to converge to an optimal solution and therefore explored the search space aggressively. This led to unstable behavior and frequently low solution quality. Second, most algorithms searched and updated their heuristics in the original state space. For practically interesting problems, such spaces tend to be vast, leading to unacceptably long searches. Third, most real-time search algorithms did a constant amount of planning (i.e., lookahead search) per action. As a result, they tended to waste CPU cycles when the heuristic function was fairly accurate and, conversely, did not plan enough when the heuristic was particularly inaccurate. Fourth, they computed the heuristic with respect to a distant global goal state, which can place unrealistic demands on heuristic accuracy. As a net result, real-time heuristic search algorithms have not been widely deployed in applications.

Our Contributions

First, we pioneered the use of automatically built state abstraction in real-time heuristic search and generalized it to the stochastic shortest path problem. Our algorithm (PR LRTS) produced (then) state-of-the-art results in video-game pathfinding. I was the first to observe lookahead pathologies (cases in which deeper search leads to worse solutions on average) in single-agent search. Building on this pathology research, together with my group and colleagues, I then developed a real-time heuristic search algorithm with dynamically controlled lookahead. This work was also a pioneering use of pattern databases in real-time heuristic search. The resulting algorithm (D LRTA*) became the new performance champion in the domain of video-game pathfinding. I then developed the first case-based reasoning real-time heuristic search algorithm (kNN LRTA*) which, similarly to D LRTA*, used a case base of previously solved problems to guide its on-line search. Unlike D LRTA*, the new algorithm had faster off-line pre-computation and was substantially more memory efficient.
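
As a hedged illustration of the case-based idea, the sketch below retrieves the stored record whose start and goal are most similar (under the heuristic) to the current problem and then pursues its subgoals in order; the record format and similarity measure are assumptions for illustration, not the published kNN LRTA* implementation.

    # An illustrative sketch of case retrieval in the spirit of kNN LRTA*;
    # the record fields and similarity measure are assumptions.

    from collections import namedtuple

    Case = namedtuple("Case", ["start", "goal", "subgoals"])

    def retrieve_case(cases, start, goal, h):
        # 1-nearest-neighbor query: the record whose endpoints are most
        # similar to the current problem under the heuristic h.
        return min(cases, key=lambda c: h(start, c.start) + h(goal, c.goal))

    def solve_via_case(case, start, goal, run_agent):
        # Pursue the record's subgoals in order, then the actual goal;
        # run_agent can be any real-time agent (e.g., the LRTA* loop
        # above with its other arguments bound).
        path, current = [], start
        for waypoint in list(case.subgoals) + [goal]:
            path += run_agent(current, waypoint)
            current = waypoint
        return path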


We then realized that, by building the case base in a specific way, we could eliminate heuristic learning at run time, replacing it with simple hill-climbing. The resulting algorithm (HCDPS) was not only simpler on-line, but also sped up the case-base pre-computation as well as the run time, and virtually eliminated state revisits. This was the first solution to the long-standing problem of state revisitation, which had plagued real-time heuristic search from the beginning and precluded many practical applications of such algorithms. HCDPS became the new champion in real-time video-game pathfinding: on video-game maps of over ten million states, HCDPS took under two minutes of pre-computation per map, produced paths only about 3% suboptimal, had a per-move planning time of one microsecond, and achieved an overall execution time two orders of magnitude faster than that of A*. Despite these breakthrough empirical results, the case-base formation in HCDPS covered the state space in an ad hoc fashion. Using the theoretical notion of problem submodularity, my group then investigated optimal ways of covering the state space as well as efficient greedy approximations to them.
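
The on-line core of this approach can be sketched as simple greedy descent toward the next subgoal, under the assumption that the case base is constructed so that each subgoal is hill-climbable from its predecessor; the interface (neighbors, cost, h) below is illustrative.

    # A sketch of greedy hill-climbing toward a subgoal; HCDPS builds its
    # case base so that consecutive subgoals are reachable this way, which
    # is what makes run-time heuristic learning unnecessary.

    def hill_climb(start, subgoal, neighbors, cost, h):
        s, path = start, [start]
        while s != subgoal:
            nxt = min(neighbors(s), key=lambda n: cost(s, n) + h(n, subgoal))
            # If no neighbor improves on the current state, we are in a
            # local minimum and the hill-climbability guarantee is violated.
            if cost(s, nxt) + h(nxt, subgoal) >= h(s, subgoal):
                return None
            s = nxt
            path.append(s)
        return path

Because the loop never updates a heuristic and never backtracks, each state is visited at most once, which is why state revisitation disappears.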

Our team also investigated new paradigms in real-time heuristic search. While most real-time heuristic search algorithms learn the cost to go (the heuristic h), we designed and evaluated an algorithm (TBA*) that learns no heuristic function at all; an algorithm (RIBS) that learns only the cost so far (the g function) and uses it to prune certain search states; and an algorithm (f-LRTA*) that learns both cost functions. While working on the latter two algorithms, we proved several theoretical bounds on the amount of learning needed by any algorithm. While common in other areas of machine learning, such bounds had traditionally been difficult to derive in the field of real-time heuristic search.
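
As a rough illustration of cost-so-far learning, the sketch below maintains lower bounds g on the cost from the start and uses a simplified dominance test to prune states; it is a sketch in the spirit of RIBS, not the algorithm itself, and both rules are illustrative simplifications.

    # A rough sketch of cost-so-far (g) learning with a simplified
    # dominance-based pruning test; both rules are illustrative.

    import math

    def tighten_g(g, s, neighbors, cost, start):
        # g[s] lower-bounds the cost of reaching s from the start; a
        # one-step Bellman update tightens it using the neighbors' bounds.
        if s == start:
            g[s] = 0.0
            return
        via = min(g.get(n, math.inf) + cost(n, s) for n in neighbors(s))
        g[s] = max(g.get(s, 0.0), via)

    def prunable(g, s, neighbors, cost):
        # s can be discarded once every neighbor is provably reachable
        # at least as cheaply without passing through s.
        if s not in g:
            return False  # no bound established for s yet
        return all(g.get(n, math.inf) <= g[s] + cost(s, n)
                   for n in neighbors(s))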