Real-time heuristic search

Problem Formulation

The distinctive property of agent-centered real-time heuristic search is that an agent must repeatedly plan and execute actions within a constant time interval that is independent of the size of the problem being solved. This restriction severely limits the range of applicable heuristic search algorithms. For instance, static search algorithms such as A* and IDA*, re-planning algorithms such as D*, anytime algorithms such as ARA*, and anytime re-planning algorithms such as AD* cannot guarantee a constant bound on planning time per action. LRTA* can, but potentially at the cost of low solution quality due to the need to fill in heuristic depressions.

Search algorithms that produce an entire solution before the agent takes its first action (e.g., A*) lead to increasing action delays as the search graph grows. Numerous optimizations have been suggested to remedy these problems and reduce the delays. Real-time search addresses the problem in a fundamentally different way. Instead of computing a complete, possibly abstract, solution before the first action is to be taken, real-time search algorithms compute (or plan) only the first few actions for the agent to take. This is usually done by conducting a lookahead search of fixed depth around the agent’s current state and using a heuristic (an estimate of the remaining travel cost) to select the next action. The actions are then taken and the planning-execution cycle repeats. Since most such local searches do not reach the goal state, the agent runs the risk of heading into a dead end or, more generally, selecting suboptimal actions. To address this problem, real-time heuristic search algorithms update (or learn) their heuristic function with experience.
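To make the planning-execution cycle concrete, below is a minimal sketch of an LRTA*-style agent with a depth-one lookahead. It is illustrative only: the callbacks `successors` and `h0`, and the dictionary-based storage of learned heuristic values, are assumptions made for the sketch rather than code from any of the algorithms discussed here.

```python
# Minimal LRTA*-style agent with a depth-one lookahead (illustrative sketch).
# Assumptions: `successors(s)` yields (action, next_state, edge_cost) triples,
# `h0(s)` is an admissible initial heuristic, and the learned heuristic values
# fit into a dictionary.

def lrta_star_step(state, h, successors, h0):
    """Plan one action: look one step ahead, update h(state), return the move."""
    best_action, best_state, best_f = None, None, float("inf")
    for action, nxt, cost in successors(state):
        f = cost + h.get(nxt, h0(nxt))  # estimated cost to the goal via nxt
        if f < best_f:
            best_action, best_state, best_f = action, nxt, f
    # Learning rule: raise h(state) to the best one-step estimate.
    h[state] = max(h.get(state, h0(state)), best_f)
    return best_action, best_state

def run_agent(start, goal, successors, h0, max_steps=100_000):
    """Repeat the planning-execution cycle until the goal is reached."""
    h, state, path = {}, start, [start]
    while state != goal and len(path) <= max_steps:
        _, state = lrta_star_step(state, h, successors, h0)
        path.append(state)
    return path if state == goal else None
```

A deeper lookahead replaces the one-step expansion with a bounded search around the current state, but the structure of the cycle stays the same.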

There were several problems with the existing body of work in the field. First, early algorithms were designed to converge to an optimal solution and therefore explored the search space aggressively, which led to unstable search behavior and frequently low solution quality. Second, most algorithms searched and updated their heuristics in the original state space. For problems of practical interest, such spaces tend to be vast, leading to unacceptably long searches. Third, most real-time search algorithms did a constant amount of planning (i.e., lookahead search) per action. As a result, they tended to waste CPU cycles when the heuristic function was fairly accurate and, conversely, did not plan enough when the heuristic was particularly inaccurate. Fourth, they computed the heuristic with respect to a distant global goal state, which can put unrealistic requirements on heuristic accuracy. As a net result, real-time heuristic search algorithms have not been widely deployed in applications.

Our Contributions

First, we pioneered the use of automatically built state abstraction in real-time heuristic search and generalized it to the stochastic shortest path problem. Our algorithm (PR LRTS) produced (then) state-of-the-art results in video-game pathfinding. I was the first to observe lookahead pathologies (cases in which deeper search leads to worse solutions on average) in single-agent search. Building on this pathology research, together with my group and colleagues I then developed a real-time heuristic search algorithm with dynamically controlled lookahead. This work was also a pioneering use of pattern databases in real-time heuristic search. The resulting algorithm (D LRTA*) became the new performance champion in the domain of video-game pathfinding. I then developed the first case-based reasoning real-time heuristic search algorithm (kNN LRTA*) which, similarly to D LRTA*, used a case base of previously solved problems to guide its online search. Unlike D LRTA*, the new algorithm had faster off-line pre-computation and was substantially more memory efficient.
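To illustrate the case-based idea, here is a minimal sketch of case retrieval in the spirit of kNN LRTA*. The similarity measure (heuristic distance between the query's start and goal and a record's start and goal) follows the published intuition; the state representation, record layout, and function names are assumptions made for this sketch.

```python
# Illustrative sketch of case retrieval in the spirit of kNN LRTA*.
# A record stores an off-line solved problem: its start, its goal and a
# compressed solution (a chain of subgoals).

from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[int, int]  # e.g., grid-map coordinates (an assumption)

@dataclass
class Record:
    start: State
    goal: State
    subgoals: List[State]  # compressed off-line solution

def retrieve(case_base: List[Record], start: State, goal: State,
             h: Callable[[State, State], float]) -> Record:
    """Return the record whose start/goal pair is heuristically closest to the query."""
    return min(case_base,
               key=lambda r: h(start, r.start) + h(goal, r.goal))
```

In the full algorithm, several of the most similar records are then checked for hill-climbing reachability from the agent's start and to its goal before one is selected, and the agent follows the chosen record's subgoals one by one.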

We then realized that, by building the case base in a specific way, we could eliminate heuristic learning at run time, replacing it with simple hill climbing. The resulting algorithm (HCDPS) was not only simpler online but also faster, both in case-base pre-computation and at run time, and it virtually eliminated state revisits. This was the first solution to the long-standing problem of state revisitation that had plagued real-time heuristic search from the beginning and had precluded many practical applications of real-time heuristic search algorithms. HCDPS became the new champion in real-time video-game pathfinding: on video-game maps of over ten million states, HCDPS took under two minutes of pre-computation per map, produced paths with a suboptimality of only about 3%, had a per-move planning time of one microsecond, and had an overall execution time two orders of magnitude faster than that of A*. Despite the breakthrough empirical results, the case-base formation of HCDPS covered the state space in an ad hoc fashion. Using the theoretical notion of problem submodularity, my group then investigated optimal ways of covering the state space and efficient greedy approximations to them.
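The online component of this approach reduces to plain greedy hill climbing on the static heuristic between consecutive subgoals: move to the neighbour with the lowest heuristic value and stop if no neighbour improves on the current state. Because the case base is constructed so that each such leg is hill-climbable, no heuristic updates are needed at run time. The sketch below is illustrative only; `successors` and `h` are assumed callbacks, not code from HCDPS.

```python
# Greedy hill climbing toward a target state, as used between consecutive
# subgoals (illustrative sketch; `successors(s)` yields neighbouring states
# and `h(s, t)` is the static heuristic estimate of the cost from s to t).

def hill_climb(state, target, successors, h, max_steps=1_000_000):
    """Follow strictly improving heuristic values toward `target`.

    Returns the path if `target` is reached, or None if a local minimum or
    plateau is hit first (i.e., `target` is not hill-climbable from `state`).
    """
    path = [state]
    for _ in range(max_steps):
        if state == target:
            return path
        nxt = min(successors(state), key=lambda s: h(s, target))
        if h(nxt, target) >= h(state, target):
            return None
        state = nxt
        path.append(state)
    return None
```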

Our team also investigated new paradigms in real-time heuristic search. While most real-time heuristic search algorithms learn the cost to go (the heuristic h), we designed and evaluated an algorithm (TBA*) that learns no heuristic function at all; an algorithm (RIBS) that learns only the cost so far (the g function) and uses it to prune certain search states; and an algorithm (f-LRTA*) that learns both cost functions. While working on the latter two algorithms, we proved several theoretical bounds on the amount of learning needed by any algorithm. While common in other areas of machine learning, such bounds had traditionally been difficult to derive in the field of real-time heuristic search.
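The distinction between the learned quantities can be made concrete with the conceptual updates below: g(s) is a learned lower bound on the cost from the start to s, h(s) a learned lower bound on the cost from s to the goal, and f(s) = g(s) + h(s) can be used to prune states that provably cannot lie on a better solution. These updates mirror the standard definitions rather than the exact rules of RIBS or f-LRTA*; the callbacks and initial estimates (`g0`, `h0`) are illustrative assumptions.

```python
# Conceptual tightening of lower bounds on g and h (not the published rules of
# RIBS or f-LRTA*). `neighbours(s)` yields adjacent states, `cost(a, b)` is the
# edge cost, and `g0`/`h0` are admissible initial estimates (e.g., the heuristic
# distance from the start / to the goal).

def update_g(s, neighbours, g, cost, g0):
    """Tighten the lower bound g(s) for a non-start state using its neighbours."""
    g[s] = max(g.get(s, g0(s)),
               min(g.get(n, g0(n)) + cost(n, s) for n in neighbours(s)))

def update_h(s, neighbours, h, cost, h0):
    """LRTA*-style tightening of the lower bound h(s) for a non-goal state."""
    h[s] = max(h.get(s, h0(s)),
               min(cost(s, n) + h.get(n, h0(n)) for n in neighbours(s)))
```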

Publications

  1. Vadim Bulitko. Effects of Self-knowledge: Once Bitten Twice Shy. Proceedings of the Experimental AI in Games (EXAG) Workshop at the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE). pages 7. 2017.
  2. Devon Sigurdson and Vadim Bulitko. Deep Learning for Real-time Heuristic Search Algorithm Selection. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), in press. 2017.
  3. Mina Abdi Oskouie and Vadim Bulitko. Robustness of Real-time Heuristic Search Algorithms to Read/Write Error in Externally Stored Heuristics. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), in press. 2017.
  4. Carlos Hernandez Ulloa and Adi Botea and Jorge Baier and Vadim Bulitko. Online Bridged Pruning for Real-Time Search with Arbitrary Lookaheads. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), (in press). 2017.
  5. Vadim Bulitko. Per-map Algorithm Selection in Real-time Heuristic Search. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), 2016.
  6. Vadim Bulitko. Searching for Real-time Search Algorithms. In Proceedings of the Annual Symposium on Combinatorial Search (SoCS), pages 121-122.
  7. Vadim Bulitko and Alexander Sampley. Weighted Lateral Learning in Real-time Heuristic Search. In Proceedings of the Annual Symposium on Combinatorial Search (SoCS), pages 10-18.
  8. Vadim Bulitko. Evolving Real-time Heuristic Search Algorithms. In Proceedings of the Fifteenth International Conference on the Synthesis and Simulation of Living Systems (ALIFEXV), pages 108-115.
  9. Nathan Sturtevant and Vadim Bulitko. Scrubbing During Learning In Real-time Heuristic Search. Journal of Artificial Intelligence Research (JAIR), 57:307-343.
  10. Nathan Sturtevant and Vadim Bulitko. 2014. Reaching the Goal in Real-Time Heuristic Search: Scrubbing Behavior is Unavoidable. In Proceedings of the Symposium on Combinatorial Search (SoCS), Prague, Czech Republic. pages 166-174.
  11. Daniel Huntley and Vadim Bulitko. 2013. Search-Space Characterization for Real-time Heuristic Search. CoRR abs/1308.3309
  12. Vadim Bulitko and Chris Rayner and Ramon Lawrence. 2012. On Casebase Formation in Real-time Heuristic Search. In Proceedings of the Eighth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pages 106-111. Stanford, California, October 2012.
  13. Ramon Lawrence and Vadim Bulitko. Database-Driven Real-time Heuristic Search in Video-game Pathfinding. 2012. IEEE Transactions on Computational Intelligence and AI in Games. PP(99). Pages 30. doi: 10.1109/TCIAIG.2012.2230632.
  14. Vadim Bulitko and Yngvi Björnsson and Nathan Sturtevant and Ramon Lawrence. Real-time Heuristic Search for Pathfinding in Video Games. In book: Applied Research in Artificial Intelligence for Computer Games. Springer USA. Pages 1-30. March 2011.
  15. Daniel Huntley and Vadim Bulitko. 2011. Extending the Applications of Recent Real-time Heuristic Search. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Student Abstract and Poster track. San Francisco, California. pages 1792-1793.
  16. Nathan Sturtevant and Vadim Bulitko. 2011. Learning where you are going and from whence you came: h- and g-cost learning in real-time heuristic search. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), pages 365-370. Barcelona, Spain, July 2011.
  17. Vadim Bulitko and Yngvi Björnsson and Ramon Lawrence. Case-Based Subgoaling in Real-Time Heuristic Search for Video Game Pathfinding. 2010. Journal of Artificial Intelligence Research (JAIR), 39: 269 - 300. AAAI Press.
  18. Daniel Huntley. 2011. Performance Analysis of Recent Real-time Heuristic Search Through Search-Space Characterization. M.Sc. thesis. Department of Computing Science. University of Alberta. Edmonton, Alberta.
  19. Ramon Lawrence and Vadim Bulitko. 2010. Taking Learning out of Real-time Heuristic Search for Video-game Pathfinding. In Proceedings of the Twenty-Third Australasian Joint Conference on Artificial Intelligence. Adelaide, Australia, December 2010. Pages 405-414.
  20. Nathan Sturtevant and Vadim Bulitko and Yngvi Björnsson. 2010. On Learning In Agent-Centered Search. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada. Pages 333 - 340.
  21. Valeriy K Bulitko and Vadim Bulitko. 2009. On backtracking in real-time heuristic search. CoRR abs/0912.3228.
  22. Vadim Bulitko and Yngvi Björnsson. 2009. kNN LRTA*: Simple Subgoaling for Real-time Search. In Proceedings of the Artificial Intelligence and Interactive Digital Entertainment conference (AIIDE). Stanford, California. 2-7.
  23. Jieshan Lu. 2009. Learning Multi-agent Pursuit of a Moving Target. M.Sc. thesis. Department of Computing Science. University of Alberta. Edmonton, Alberta.
  24. Yngvi Björnsson and Vadim Bulitko and Nathan Sturtevant. 2009. TBA*: Time-Bounded A*. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). Pasadena, California. 431-436.
  25. Vadim Bulitko and Mitja Luštrek and Jonathan Schaeffer and Yngvi Björnsson and Sverrir Sigmundarson. Dynamic Control in Real-Time Heuristic Search. 2008. Journal of Artificial Intelligence Research (JAIR), 32: 419 - 452. AAAI Press.
  26. Mitja Luštrek and Vadim Bulitko. Thinking Too Much: Pathology in Pathfinding. 2008. In Proceedings of the 18th European Conference on Artificial Intelligence (ECAI), pages 899 - 900. Patras, Greece.
  27. Alejandro Isaza and Jieshan Lu and Vadim Bulitko and Russell Greiner. 2008. A Cover-Based Approach to Multi-Agent Moving Target Pursuit. In Proceedings of the Artificial Intelligence and Interactive Digital Entertainment conference (AIIDE), pages 54 - 59. Stanford, California.
  28. Alejandro Isaza and Csaba Szepesvári and Vadim Bulitko and Russell Greiner. Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstractions. 2008. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI), pages 306 - 314, Helsinki, Finland.
  29. Vadim Bulitko and Yngvi Björnsson and Mitja Luštrek and Jonathan Schaeffer and Sverrir Sigmundarson. Dynamic Control in Path-Planning with Real-Time Heuristic Search. 2007. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), pages 49 - 56, Providence, Rhode Island.
  30. Vadim Bulitko and Nathan Sturtevant and Jieshan Lu and Timothy Yau. Graph Abstraction in Real-time Heuristic Search. 2007. Journal of Artificial Intelligence Research (JAIR), 30:51 - 100.
  31. D. Chris Rayner and Katherine Davison and Vadim Bulitko and Kenneth Anderson and Jieshan Lu. Real-Time Heuristic Search with a Priority Queue. 2007. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2372 - 2377, Hyderabad, India.
  32. Vadim Bulitko and Greg Lee. Learning in Real-Time Search: A Unifying Framework. 2006. Journal of Artificial Intelligence Research (JAIR), 25:119 - 157.
  33. Vadim Bulitko and Nathan Sturtevant and Maryia Kazakevich. Speeding Up Learning in Real-time Search via Automatic State Abstraction. 2005. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 1349 - 1354. Pittsburgh, Pennsylvania.
  34. Vadim Bulitko. 2004. Learning for Adaptive Real-time Search. CoRR cs.AI/0407016.
  35. Vadim Bulitko and Lihong Li and Russell Greiner and Ilya Levner. Lookahead Pathologies for Single Agent Search. 2003. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1531 - 1533. Acapulco, Mexico.
  36. Vadim Bulitko. Lookahead Pathologies and Meta-level Control in Real-time Heuristic Search. 2003. In Proceedings of the 15th Euromicro Conference on Real-Time Systems, pages 13-16. Porto, Portugal.
  37. Ilya Levner and Vadim Bulitko and Omid Madani and Russell Greiner. Performance of Lookahead Control Policies in the Face of Abstractions and Approximations. 2002. In Lecture Notes in Artificial Intelligence (LNAI), Proceedings of the 5th International Symposium on Abstraction, Reformulation and Approximation (SARA), pages 299-308. Springer-Verlag, Berlin, Heidelberg.

Presentations

  1. Searching for Real-time Search Algorithms. Heuristic Search Seminar, Department of Computing Science. University of Alberta. Edmonton, Alberta. April 26, 2016.
  2. Recent Work in Real-time Heuristic Search. Heuristic Search Seminar, Department of Computing Science. University of Alberta. Edmonton, Alberta. October 6, 2015.
  3. Enjoying The Search. Heuristic Search Seminar, Department of Computing Science. University of Alberta. Edmonton, Alberta. March 17, 2015.
  4. Fast Pathfinding through Subgoaling. BioWare. February 27, 2012.
  5. Subgoaling in Real-time Heuristic Search. University of British Columbia Okanagan. December 1, 2011.
  6. Subgoaling in Real-time Heuristic Search. University of Alberta. November 23, 2011.
  7. How to Avoid Learning. University of Alberta. July 20, 2009.
  8. Dynamic Control in Real-Time Heuristic Search. Reykjavik University. September 3, 2008.
  9. Dynamic Control in Real-Time Heuristic Search. University of British Columbia Okanagan. July 28, 2008.
  10. Dynamic Control in Real-Time Heuristic Search. Nara Institute of Science and Technology (NAIST). Nara, Japan. July 18, 2008.
  11. Dynamic Control in Real-Time Heuristic Search. Future University. Hakodate, Japan. July 16, 2008.
  12. State Abstraction in Real-time Heuristic Search. Invited talk at the Seventh International Symposium on Abstraction, Reformulation, and Approximation (SARA). Whistler, British Columbia, July 21, 07.
  13. State Abstraction in Learning Real-time Heuristic Search. Stanford University, California. June 1, 07.
  14. Machine Learning for Pursuit in Computer Games. Université de Montréal, Montreal, Quebec. July 26, 06.
  15. Learning to Pursue. McGill University, Montreal, Quebec. July 25, 06.
  16. Real-time Search and Learning. Institute for Creative Technologies (ICT), Marina del Rey, CA. June 20, 06.
  17. Real-time Search in Game-like Environments. BioWare Corp. Edmonton, Alberta. June 9, 06.
  18. Target Modeling in Real-time Moving Target Pursuit. Reykjavik University. Reykjavik, Iceland. June 7, 06.
  19. Real-time Learning and Search. University of British Columbia, Vancouver, British Columbia. May 19, 06.
  20. Recent Developments in Learning Real-time Search and Their Applications to Real-time Path-finding. The Institut d'Investigació en Intel·ligència Artificial (IIIA), Bellaterra, Spain. August 25, 05.
  21. Speeding Up Learning via Abstraction. Institute for Creative Technologies (ICT), Marina del Rey; University of California Berkeley, Berkeley; University of California, Los Angeles (UCLA); Information Science Institute (ISI), Marina del Rey; University of Southern California (USC), Los Angeles, CA. Feb 05.
  22. Lookahead Pathologies and Meta Reasoning in Real-time Decision Making. University of Southern California (USC), Los Angeles, California. August 26, 03.