When a robot is tasked by a human user to achieve a goal (for example, turning off a switch), it may cause side effects in the environment (for example, opening or closing doors, moving boxes around, making a carpet dirty). The user may (incorrectly) believe that the robot will not cause side effects while achieving the goal, or may assume it has common sense, as in "of course the robot wouldn't step on my carpet." So if the robot simply optimizes a policy to achieve the specified goal, it may cause side effects that negatively surprise the human.
If the robot does not know any policy that is guaranteed not to negatively surprise the user, how should it query the user to learn more about her preferences and find such a policy? We identify a set-cover structure in the problem and design a greedy query selection algorithm (sketched below).
Shun Zhang, Edmund H. Durfee, and Satinder Singh. Querying to Find a Safe Policy Under Uncertain Safety Constraints in Markov Decision Processes. AAAI Conference on Artificial Intelligence (AAAI), 2020. paper poster
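The greedy heuristic underlying the query selection is the standard one for set cover. Here is a minimal sketch in Python; the names (`universe`, `candidate_queries`) are illustrative, and the paper's actual algorithm, which reasons about which environment features the user may have safety constraints on, is more involved:

```python
def greedy_query_selection(universe, candidate_queries):
    """Pick queries until every element of `universe` is covered.

    universe: set of items that must be covered (e.g., features whose
        safety status the robot still needs to learn).
    candidate_queries: dict mapping each query to the set of items it
        would cover if asked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Greedy step: ask the query covering the most still-uncovered items.
        best = max(candidate_queries,
                   key=lambda q: len(candidate_queries[q] & uncovered))
        if not candidate_queries[best] & uncovered:
            break  # no query helps; the rest of the universe is uncoverable
        chosen.append(best)
        uncovered -= candidate_queries[best]
    return chosen

# Toy example: three queries covering parts of a five-feature universe.
universe = {"door", "box", "carpet", "switch", "vase"}
queries = {
    "q1": {"door", "box"},
    "q2": {"carpet", "vase", "box"},
    "q3": {"switch"},
}
print(greedy_query_selection(universe, queries))  # ['q2', 'q1', 'q3']
```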
Suppose the robot already knows a safe but possibly inefficient policy. How should it query the user so it can improve its policy while remaining safe?
Shun Zhang, Edmund H. Durfee, and Satinder Singh. Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes. International Joint Conference on Artificial Intelligence (IJCAI), 2018. paper slides
When a robot serves a human user and is uncertain about the human's preferences (the reward function), what is the best question to ask to improve its policy? (See the sketch below.)
Shun Zhang, Edmund H. Durfee, and Satinder Singh. Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes. International Conference on Automated Planning and Scheduling (ICAPS), 2017. paper slides
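A minimal sketch of the expected-utility-of-selection idea that this line of work builds on (not the paper's algorithm, whose approximation scheme is more involved): present the user with k candidate policies, assume she picks whichever is best under her true reward, and score a query by its prior-weighted outcome. All names below are illustrative.

```python
import itertools

def best_policy_query(rewards, prior, policies, value, k=2):
    """Return the size-k set of policies with the highest expected utility.

    rewards:  hypotheses about the user's true reward function.
    prior:    probability of each reward hypothesis.
    value:    value(policy, reward) -> expected return of policy under reward.
    The user is assumed to pick the offered policy that is best under the
    true reward, so a query's worth is the prior-weighted value of its
    best member under each hypothesis.
    """
    def expected_utility(query):
        return sum(p * max(value(pi, r) for pi in query)
                   for r, p in zip(rewards, prior))
    return max(itertools.combinations(policies, k), key=expected_utility)

# Toy example: two reward hypotheses, three candidate policies.
values = {("go_left", "r1"): 1.0, ("go_left", "r2"): 0.0,
          ("go_right", "r1"): 0.0, ("go_right", "r2"): 1.0,
          ("stay", "r1"): 0.6, ("stay", "r2"): 0.6}
query = best_policy_query(["r1", "r2"], [0.5, 0.5],
                          ["go_left", "go_right", "stay"],
                          lambda pi, r: values[(pi, r)])
print(query)  # ('go_left', 'go_right'): covers both hypotheses well
```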
How do humans learn complicated tasks so quickly? We use Modular Inverse Reinforcement Learning to interpret how a human reuses knowledge from previously learned simple tasks. (See the sketch below.)
Ruohan Zhang, Shun Zhang, Matthew Tong, Mary Hayhoe, and Dana Ballard. Modeling Sensorimotor Behavior through Modular Inverse Reinforcement Learning with Discount Factors. Journal of Vision, 2017. slides
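The modular decomposition this work builds on can be sketched in a few lines: the agent's overall preference for an action is a weighted sum of per-module Q-values, each module solving a simple subtask. The module weights (and per-module discount factors, omitted here) are what the inverse-learning step recovers from observed behavior; the values below are purely illustrative.

```python
def modular_action(state, modules, weights, actions):
    """Pick the action maximizing the weighted sum of per-module Q-values.

    modules: functions q_i(state, action) -> float, one per simple
             subtask (e.g., avoid obstacles, collect targets, follow path).
    weights: per-module importance, as recovered by modular inverse RL.
    """
    return max(actions,
               key=lambda a: sum(w * q(state, a)
                                 for w, q in zip(weights, modules)))

# Toy example: two modules with hand-set Q-values over actions -1, 0, 1.
avoid = lambda s, a: {-1: 0.0, 0: 1.0, 1: 0.5}[a]    # prefers staying on course
collect = lambda s, a: {-1: 0.8, 0: 0.2, 1: 0.1}[a]  # prefers veering left
print(modular_action(None, [avoid, collect], [0.5, 0.5], [-1, 0, 1]))  # 0
```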
Suppose we want to make a flock move in a certain direction by placing some influencing agents in it. What are the best positions for them?
Katie Genter, Shun Zhang, and Peter Stone. Determining Placements of Influencing Agents in a Flock. Autonomous Agents and Multiagent Systems (AAMAS), 2015. paper slides
Can we find an intersection-management policy better than traffic signals when human-driven, semi-autonomous, and fully autonomous vehicles all share the road?
Tsz-Chiu Au, Shun Zhang, and Peter Stone. Semi-Autonomous Intersection Management (Extended Abstract). Autonomous Agents and Multiagent Systems (AAMAS), 2014. paper slides