Conference Papers, Invited
2022
[FMI18] Takayuki Osogami and Rudy Raymond, "Determinantal Reinforcement Learning with Techniques to Avoid Poor Local Optima". In: Cheng J., Xu D., Saeki O., Shirai T. (eds.) Proceedings of the Forum "Math-for-Industry" 2018. Mathematics for Industry, vol 35. Springer, Singapore. PDF
Abstract Reinforcement learning for controlling multiple collaborative agents is an important task in many real-world applications, such as those involving robots and IoT devices. In most applications, the action of each agent should not only be relevant but also diverse, according to the agent's role in the target task. In our prior work, we proposed using the determinant of a positive semidefinite matrix to approximate the action-value function in reinforcement learning, and designed an algorithm that efficiently learns a matrix representing both the relevance and diversity of actions. The algorithm is parameterized by the effective rank of the matrix, which can be significantly lower than the full rank, hence resulting in efficient learning. Based on a theoretical derivation of our proposed approach, here we show how our algorithm can avoid the poor local optima that often hinder reinforcement learning approaches.
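The relevance-plus-diversity intuition behind the determinantal score can be sketched numerically. The following is a minimal illustration, not the paper's algorithm: it assumes a hypothetical low-rank factorization L = V Vᵀ over actions and scores a joint action by the log-determinant of the corresponding submatrix, so that near-duplicate actions collapse the score while relevant, diverse ones enlarge it.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 6   # actions available across the team of agents
rank = 2        # effective rank, far below n_actions

# Hypothetical low-rank factor: each action gets a `rank`-dimensional
# feature vector (in practice this is the learned quantity).
V = rng.normal(size=(n_actions, rank))
L = V @ V.T     # positive semidefinite kernel over actions

def log_det_score(subset):
    """Log-determinant of the kernel restricted to a chosen action set.

    Geometrically this is (twice the log of) the volume spanned by the
    selected feature vectors: large when actions are both relevant
    (long vectors) and diverse (nearly orthogonal directions).
    """
    sub = L[np.ix_(subset, subset)]
    # Tiny jitter keeps slogdet finite for rank-deficient subsets.
    _, logdet = np.linalg.slogdet(sub + 1e-9 * np.eye(len(subset)))
    return logdet

# A diverse pair of actions spans more volume than a redundant one.
diverse = log_det_score([0, 1])
redundant = log_det_score([0, 0])
print(diverse > redundant)
```

Picking the same action twice makes the submatrix singular, so its determinant vanishes; this is the mechanism by which a determinantal action-value penalizes redundancy.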
2019
[NeurIPS19] Cinjon Resnick, Chao Gao, Görög Márton, Takayuki Osogami, Liang Pang, and Toshihiro Takahashi, "Pommerman & NeurIPS 2018: Multi-agent competition," in The NeurIPS '18 Competition: From Machine Learning to Intelligent Conversations, Pages 11-36, Springer, 2019. PDF
2016
[FMI15] Takayuki Osogami, "Human choice and good choice," in The Role and Importance of Mathematics in Innovation - Proceedings of the Forum "Math-for-Industry" 2015, Springer, 2016. PDF
Abstract Human choice is known to depend on the set of available alternatives in rather complex but systematic ways, and a significant amount of work has gone into choice models for capturing such behavior. Most existing choice models, however, particularly those in the class of random utility models, cannot represent one of the typical phenomena of human choice, known as the attraction effect. Here, we review recent developments in choice models that can be trained to learn the attraction effect and other typical phenomena of human choice from data on the choices people make. We also discuss possible extensions of such work on choice models, which suggest potential directions for future research.
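Why random utility models miss the attraction effect can be seen in a few lines. The sketch below uses the multinomial logit model (a standard random utility model) with hypothetical utility values: adding any new alternative enlarges the normalizing sum, so the probability of every existing alternative can only shrink (the "regularity" property), whereas in human experiments a dominated decoy can raise the target's share.

```python
import math

def logit_probs(utilities):
    """Multinomial logit choice probabilities: exp(u_i) / sum_j exp(u_j)."""
    weights = [math.exp(u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Two alternatives with hypothetical utilities; track the first ("target").
p_target_before = logit_probs([1.0, 1.2])[0]

# Add a decoy with lower utility than the target. Humans may now choose
# the target MORE often (attraction effect); under logit that is impossible,
# because the denominator only grows.
p_target_after = logit_probs([1.0, 1.2, 0.2])[0]

print(p_target_after < p_target_before)  # regularity: probability only shrinks
```

Any model in the random utility class obeys the same regularity inequality, which is exactly the limitation the reviewed choice models are designed to overcome.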
2004
[ALLERTON04] Takayuki Osogami, Mor Harchol-Balter, Alan Scheller-Wolf, and Li Zhang, "Exploring Threshold-based Policies for Load Sharing," The Forty-Second Annual Allerton Conference on Communication, Control, and Computing, pages 1012-1021, Urbana, IL, September 2004. Extended version available as a technical report: PDF
Abstract We consider how to design resource allocation policies that both perform well under predicted environmental conditions and are robust to changes in, or misprediction of, those conditions. We evaluate several common threshold-based allocation policies within a simple model in which there is a clear tradeoff between the conflicting goals of good performance and robustness. We then propose and evaluate a new threshold-based policy, ADT (adaptive dual thresholds), which achieves both goals.
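The flavor of a dual-threshold rule can be sketched in a toy dispatcher. This is an illustrative simplification with made-up parameter values, not the ADT policy from the paper: the offloading threshold on server 1's backlog adapts to server 2's own backlog, so the helper server is borrowed aggressively when idle but sparingly when busy.

```python
def dispatch(n1, n2, t_low=3, t_high=6):
    """Toy dual-threshold load-sharing rule (illustrative thresholds).

    n1, n2: current queue lengths at servers 1 and 2.
    Returns the server (1 or 2) that should take the next server-1 job.
    Server 2 helps only when server 1's backlog exceeds a threshold,
    and that threshold is raised when server 2 is itself busy, so
    helping does not starve server 2's own work.
    """
    threshold = t_low if n2 == 0 else t_high
    return 2 if n1 > threshold else 1

# Helper idle: the low threshold applies, so a moderate backlog offloads.
print(dispatch(4, 0))  # -> 2
# Helper busy: the high threshold applies, so the same backlog stays put.
print(dispatch(4, 5))  # -> 1
```

Using two thresholds instead of one is what buys robustness: a single fixed threshold tuned for the predicted load performs poorly when the helper's own load was mispredicted, while the adaptive rule degrades gracefully.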