My main research interests include multi-agent reinforcement learning, decision making under uncertainty, and game theory. My work has focused on providing decision support to individuals in settings where a large number of self-interested individuals are present. Aggregation systems are one such domain: aggregation companies (e.g., Uber, Lyft, FoodPanda, Deliveroo) aggregate supply (e.g., drivers, delivery personnel) and match demand to supply on a continuous basis. The individuals who provide supply (e.g., taxi drivers, delivery bike riders, or delivery van drivers) earn more by being at the "right" place at the "right" time. My current research focuses on multi-agent reinforcement learning and game-theoretic approaches for learning individual policies in such domains.
Figure 1: Framework of Aggregation System
Figure 1 provides an overview of the aggregation system. Aggregation companies and individuals are mutually dependent: the individuals execute the actions, whereas demand assignment is done by the company. The aggregation company has a full view of the system, so it can act as a central agent, learn policies that maximize the overall payoff, and suggest them to the individuals. However, by optimizing a metric of importance to the central agent, the interests of individuals can be sacrificed. Moreover, being self-interested, the individuals might not be willing to follow the suggestion. Hence, my work focuses on providing learning approaches to the individuals in the presence of a self-interested centralized entity.
Many current algorithms assume the presence of a centralized entity that provides extra information about the environment state or the joint action to the individuals, either during training only or during both training and execution. However, the presence of a self-interested centralized entity acting as an intermediary between the environment and the individual learners adds a new dimension to the learning problem: the level of learning done by the centralized entity. Figure 2 places the different pieces of my work along these two learning dimensions and shows that they cover all combinations: a few of the proposed approaches learn solely from local observations, whereas for the rest, additional learning happens at the central agent's end. A high-level overview of my work follows.
Figure 2: Different levels of learning dimensions covered by my work
Independent Learning from Offline Trajectories of Agents
In this work the individuals learn from real-world Global Positioning System (GPS) trajectories of the other taxis present in an offline data set. This approach is suitable when very few learning agents are present in the environment and the rest of the agents follow stationary policies. Experimental results using real-world data show that a learning agent is able to earn revenue comparable to that earned by the top 10 percentile of real-world taxi drivers.
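To make the setting concrete, below is a minimal Python sketch of independent tabular Q-learning from offline trajectory data. The (zone, action, revenue, next_zone) transition format, the zone-based state space, and all function names are illustrative assumptions, not the exact formulation used in the paper.

```python
from collections import defaultdict

def q_learn_offline(trajectories, alpha=0.1, gamma=0.95, epochs=10):
    """Tabular Q-learning over offline transitions extracted from GPS
    logs. The other drivers are assumed to follow stationary policies,
    so their effect is folded into the observed transitions and a
    single-agent update suffices."""
    Q = defaultdict(float)
    actions = {a for (_, a, _, _) in trajectories}
    for _ in range(epochs):
        for zone, action, revenue, next_zone in trajectories:
            best_next = max(Q[(next_zone, a)] for a in actions)
            td_error = revenue + gamma * best_next - Q[(zone, action)]
            Q[(zone, action)] += alpha * td_error
    return Q

def greedy_policy(Q, zone, actions):
    """Greedy choice (e.g., which zone to cruise to) in a given zone."""
    return max(actions, key=lambda a: Q[(zone, a)])
```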
Exploiting Interaction Anonymity
In this work we exploit the interaction anonymity of the aggregation domain (the payoff of an agent depends on the number of other agents selecting the same action rather than on their identities) and incorporate local count statistics of the other agents into each agent's learning model. More specifically, we predict and control the non-stationarity introduced by the presence of other agents by learning policies that maximize the entropy of the agent population distribution.
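A hedged sketch of the entropy idea: shape each agent's reward with the entropy of the observed agent-count distribution over zones, so that learned policies spread the population out and the environment each agent faces becomes more predictable. The additive shaping form and the weight beta are illustrative assumptions.

```python
import numpy as np

def count_entropy(zone_counts):
    """Shannon entropy of the normalized agent-count distribution;
    zone_counts[z] is the number of agents observed in zone z."""
    p = np.asarray(zone_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return -(p * np.log(p)).sum()

def shaped_reward(local_reward, zone_counts, beta=0.1):
    # A more spread-out (higher-entropy) population distribution is
    # rewarded, which reduces the non-stationarity each agent faces.
    return local_reward + beta * count_entropy(zone_counts)
```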
Learning Equilibrium Policies
In this work we provide a centralized-learning, decentralized-execution algorithm in which the central agent learns from the learned values of the individual agents and provides the extra information only during the learning phase. Building on insights from the non-atomic congestion game model, we derive theoretical properties of equilibria in anonymous domains. Based on these properties, we propose a value-variance-minimization Q-learning approach to learn ε-Nash equilibrium policies.
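The variance-minimization idea can be sketched as follows: at an (approximate) equilibrium of an anonymous, congestion-style game, agents selecting the same action should obtain equal values, so the central agent can penalize the variance of individual values during training. The per-agent penalty form below is an illustrative assumption, not the exact update from the paper.

```python
import numpy as np

def variance_penalty(agent_values, agent_actions):
    """agent_values[i] and agent_actions[i] are the value estimate and
    action of agent i in the current state; returns per-agent penalties
    used only during centralized training."""
    values = np.asarray(agent_values, dtype=float)
    actions = np.asarray(agent_actions)
    penalties = np.zeros_like(values)
    for a in np.unique(actions):
        group = actions == a
        # Penalize squared deviation from the group mean: driving this
        # to zero equalizes the values of agents taking the same action,
        # a hallmark of equilibrium in non-atomic congestion games.
        penalties[group] = (values[group] - values[group].mean()) ** 2
    return penalties
```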
Correlated Learning
We propose a correlated learning method in which the central entity learns directly from the experiences of the individuals. Based on these experiences, the centralized agent learns a policy that optimizes its objective of maximizing social welfare and suggests it to the individuals. Experimental results show that this produces a "win-win" situation in which both the central agent and the individuals receive better payoffs than under the other learning approaches.
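One simple way the central entity might pool experiences, sketched below: it runs Q-learning over the union of individual transitions, treating pooled individual payoffs as a proxy for its social-welfare objective, and broadcasts the greedy action as its suggestion. The data format and function names are assumptions for illustration; the actual correlated-learning update is specified in the paper.

```python
from collections import defaultdict

def central_update(Q, experiences, actions, alpha=0.1, gamma=0.95):
    """experiences: (state, action, reward, next_state) tuples pooled
    from all individuals; Q is a defaultdict(float)."""
    for s, a, r, s2 in experiences:
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

def suggestion(Q, state, actions):
    # Action suggested to every individual observing this state.
    return max(actions, key=lambda a: Q[(state, a)])
```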
Incentive Based Q-Learning
To maximize the overall performance of the system, we propose an incentive-based Q-learning method in which the central agent learns a social-welfare-maximizing policy. Based on insights from mechanism design, the central agent provides incentives to the individuals so that they learn to follow the suggested action.
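The incentive can be sketched as a simple reward-shaping term: the central agent pays a bonus when an individual's action matches the suggested social-welfare action, making compliance individually rational. The additive bonus form below is an illustrative assumption; mechanism-design considerations would dictate how large the bonus must be.

```python
def incentivized_reward(local_reward, action, suggested_action, bonus=1.0):
    """Individual's payoff plus an incentive for following the central
    agent's suggestion; individuals then run ordinary Q-learning on
    this shaped reward."""
    return local_reward + (bonus if action == suggested_action else 0.0)
```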