# Publications

## Journal Articles

Learning Reward Machines: A Study in Partially Observable Reinforcement Learning. Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, Margarita Castro, Ethan Waldie, and Sheila McIlraith. Artificial Intelligence Journal, 2023.

Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning. Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, and Sheila McIlraith. Journal of Artificial Intelligence Research, Volume 73, 2022.

Uses a form of a finite state automata to specify a reward function in a way that exposes the high-level structure of the task at hand.

Builds upon earlier papers, by improving the formulation of reward machines and experimenting further with the described methods

Probably bounded suboptimal heuristic search. Roni Stern, Gal Dreiman, and Richard Valenzano. Artificial Intelligence, Volume 267, 2019.

Develops a general algorithmic heuristic search framework for finding solutions that satisfy a given suboptimality bound with a given probability.

On polygon numbers of circle graphs and distance hereditary graphs. Lorna Stewart and Richard Valenzano. Discrete Applied Mathematics, Volume 248, 2018.

Shows bounds on the polygon number of circle graphs.

Demonstrates that the polygon number of distance hereditary graphs is equal to the asteroidal number and can be found in linear time.

## Conference Papers

Type-WA*: Using Exploration in Bounded Suboptimal Planning. Eldan Cohen, Richard Valenzano, and Sheila McIlraith. IJCAI 2021.

Shows that random exploration can be successfully incorporated into bounded search by restricting the exploration to the focal list of a focal search algorithm.

Learning Reward Machines for Partially Observable Reinforcement Learning. Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, Margarita Castro, and Sheila McIlraith. NeurIPS 2019.

Demonstrates that reward machines can be learned from experience and used as a memory mechanism in partially observable environments

Also appeared at the OptRL workshop at NeurIPS 2019.

LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning. Alberto Camacho, Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, and Sheila McIlraith. IJCAI 2019.

Shows that a wide variety of formal languages can be used to specify tasks for RL agents, through a conversion to reward machines. Also describes an automated reward shaping method for reward machine-based methods.

Appeared at the KR2ML workshop at NeurIPS 2019.

Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning. Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, and Sheila McIlraith. ICML 2018.

Uses a form of a finite state automata to specify a reward function in a way that exposes the high-level structure of the task at hand. Also describes an algorithm that decomposes the automata and uses off-policy methods to learn policies for the sub-tasks that can also be interleaved to solve the full task.

Experimental details and a formal proof can be found in the supplementary material.

Teaching Multiple Tasks to an RL Agent using LTL. Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, and Sheila McIlraith. AAMAS 2018.

Proposes the use of Linear Temporal Logic for specifying multiple tasks to a Reinforcement Learning agent. The language for task decomposition, off-policy learning of subtasks, and sharing subtask policies across multiple tasks.

Using Advice in Model-Based Reinforcement Learning. Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, and Sheila McIlraith. CCAI 2018.

Considers the use of Linear Temporal Logic for provide advice about how a reinforcement learning agent should behave. Also defines a model-based reinforcement learning algorithm that can deploy such given advice.

An early version of this work first appeared at RLDM 2017

An Analysis and Enhancement of the Gap Heuristic for the Pancake Puzzle. Richard Valenzano and Danniel Sihui Yang. SoCS 2017.

Considers properties of random pancake puzzle problems, provides generators for harder problems, formally analyzes the local search topology of the problem, and uses that topology and a state's dual to perform a heuristic lookahead that leads to a new and stronger heuristic

Full proofs for the local search topology can be found in a technical report.

The code used can be found on Github.

Searching with a Corrupted Heuristic. Levi Lelis, Richard Valenzano, Gabriel Nazar, and Roni Stern. SOCS 2016.

Introduces the problem of dealing with a corrupted heuristic function, and describes several methods for the identification and correction of corrupted heuristics.

On the Completeness of Best-First Search Variants That Use Random Exploration. Richard Valenzano and Fan Xie. AAAI 2016.

Formally analyzes the completeness of best-first search variants that use random exploration, and shows that existing algorithms like epsilon-greedy, diverse best-first search, and type-based search can be guaranteed to solve any inifinite problem in the limit.

Using Metric Temporal Logic to Specify Scheduling Problems. Roy Luo, Richard Valenzano, Yi Li, Christopher Beck, and Sheila McIlraith. KR 2016.

Considers the use of metric temporal logic for the specification of scheduling problems, including complexity results regarding the difficulty of finding schedules to a given specification under different restrictions of the language

A longer version of this paper can be found as a technical report.

Worst-Case Solution Quality Analysis When Not Re-Expanding Nodes in Best-First Search. Richard Valenzano, Nathan Sturtevant, and Jonathan Schaeffer. AAAI 2014.

Formally shows that the suboptimality of any solution found by a best-first search that does not re-expand nodes can be bound by the inconsistency of the heuristic.

A Comparison of Knowledge-Based GBFS Enhancements and Knowledge-Free Exploration. Richard Valenzano, Nathan Sturtevant, Jonathan Schaeffer, and Fan Xie. ICAPS 2014.

Shows the value of adding random exploration to GBFS and compares existing GBFS enhancements to knowledge-free random baselines.

Using Alternative Suboptimality Bounds in Heuristic Search. Richard Valenzano, Shahab Jabbari Arfaee, Jordan Thayer, Roni Stern, and Nathan Sturtevant. ICAPS 2013.

Introduces the notion of alternative bounding paradigms and how one would construct algorithms to satisfy them.

Appeared as a two-page abstract at SOCS 2012

Better Time Constrained Search via Randomization and Postprocessing. Fan Xie, Richard Valenzano, and Martin Müller. ICAPS 2013.

Shows that using restarts and randomization can substantially improve plan quality when used in conjunction with a plan postprocessing system.

Evaluating State-Space Abstractions in Extensive-Form Games. Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. AAMAS 2013.

Describes the abstraction technique used in the Polaris computer poker player, demonstrates the effectiveness of imperfect recall for such abstractions, and considers existing metrics for evaluating the quality of an abstraction in extensive-form games.

ArvandHerd: Parallel Planning with a Portfolio. Richard Valenzano, Hootan Nakhost, Martin Müller, Jonathan Schaeffer, and Nathan Sturtevant. ECAI 2012.

Describes ArvandHerd, the parallel planner which won the multi-core track at IPC 2011, and evaluates the effectiveness the design decisions made in the creation of this planner as they pertain to coverage.

The description of ArvandHerd which appeared in the IPC 2011 Proceedings (and also discusses the design decisions made as pertaining to plan improvement) can be found here.

Simultaneously Searching with Multiple Settings: An Alternative to Parameter Tuning for Suboptimal Single-Agent Search Algorithms. Richard Valenzano, Nathan Sturtevant, Jonathan Schaeffer, Karen Buro, and Akihiro Kishimoto. ICAPS 2010.

Shows that dovetailing over portfolios constructed by using different parameterizations of the same heuristic search algorithm can be used to construct high performance (either for parallel or single-core) heuristic search systems for combinatorial domains and planning.

Appeared as a two-page abstract at SOCS 2010

## Technical Reports

A Formal Characterization of the Local Search Topology of the Pancake Puzzle. Richard Valenzano and Danniel Sihui Yang. CoRR 2017.

Formally proves the statements about the local search topology of the pancake puzzle, as appearing in a SoCS 2017 paper.

ArvandHerd 2014. Richard Valenzano, Hootan Nakhost, Martin Müller, Jonathan Schaeffer, and Nathan Sturtevant. IPC 2014.

Describes ArvandHerd as submitted to IPC 2014, with a focus on how it differs from the version submitted in 2011.

Adding Exploration to Greedy Best-First Search. Richard Valenzano, Nathan R. Sturtevant, and Jonathan Schaeffer. University of Alberta Technical Report TR13-06, 2013.

Shows that simple techniques for introducing exploration into search can substantially improve the coverage of automated planners.

ArvandHerd: Parallel Planning with a Portfolio. Richard Valenzano, Hootan Nakhost, Martin Müller, Jonathan Schaeffer, and Nathan Sturtevant. IPC 2011.

Describes ArvandHerd as submitted to IPC 2011, including the design decisions made for coverage and plan quality.

Arvand: the Art of Random Walks. Hootan Nakhost, Martin Müller, Richard Valenzano, and Fan Xie. IPC 2011.

Describes the Arvand planner which competed in satisficing track at IPC 2011.

## Theses

Design Decisions on Suboptimal Heuristic Search-Based Systems. Richard Valenzano. Ph.D. Thesis. University of Alberta, 2014.

Identifies a set of design decisions that can greatly impact the performance of heuristic search-based planning systems. This includes formally bounding how different design decisions can impact solution quality, demonstrating the value of random exploration, and building a multi-core portfolio that exploits planner variability to different design decisions.

The thesis is mainly based on four different conference publications, one for each main topic: ArvandHerd (ECAI 2012), alternative bounding (ICAPS 2013), node re-expansions (AAAI 2014), and random exploration (ICAPS 2014).

Simultaneously Searching with Multiple Algorithm Settings: An Alternative to Parameter Tuning for Suboptimal Single-Agent Search. Richard Valenzano. M.Sc. Thesis. University of Alberta, 2009.

Looks at using portfolios constructed by using different parameterizations of the same heuristic search algorithm as an alternative to parameter tuning when constructing heuristic search algorithms.

The consideration of the sensitivity to algorithm parameters and the value of dovetailing appeared at ICAPS 2010.