TTIC Chicago Summer Workshop (Jul 13-15, 2022)

New Models in Online Decision Making for Real-World Applications

The goal is to present recent work on online decision-making settings that go beyond classical bandits and reinforcement learning, especially work tackling challenges that arise in real-world problems, where the possible forms of feedback are vast and complex, and proposing ways to incorporate them into the theoretical problem formulation. Topics include, but are not limited to: RL beyond the classical reward-MDP model, online learning with implicit feedback, multi-criteria objectives, preference-based RL, multiagent games, learning in partially observed systems, batch adaptive RL, distributional RL, and learning with corrupted or delayed feedback. Our overall aim is to showcase interesting work with richer feedback models coming from applications, or with new objectives requiring a new interpretation of existing feedback structures. We also very much welcome new work along these lines.


We invite submissions of posters on all topics related to the above theme of research. Some concrete directions include (but are not limited to):

1. Multiagent RL. In many applications, RL agents do not act in isolation but are expected to share their environment with other learning agents. When multiple agents learn simultaneously in the same environment, many issues arise. Chiefly, the objective of reward maximization may not be enough to capture the set of achievable outcomes; instead, a variety of notions of equilibria have been proposed to understand these scenarios. Understanding the optimal behavior of an agent in a multiagent environment is further complicated by the nature of the interaction between the agents: for example, an ensemble of cooperative agents may require a different algorithm than a pair of competing ones. Online commerce has given rise to a myriad of scenarios where multiple algorithmic agents interact with each other, and with people, forming a rich source of applications for the development of a theory of multiagent RL. This is most evident in the renewed vigor with which the field of econ-ML has come to the fore.


2. Meta learning. The traditional reinforcement learning paradigm considers the problem of solving a single task, whose reward and dynamics are defined in isolation from any other learning objective. The recognition that this simple model fails to capture the intimate relationships between problems encountered in practice has led to new ways of understanding task relatedness, giving rise to the field of meta-RL. Although there has been a flurry of applied algorithmic work, and some theoretical advances have been achieved in the bandit setting, the theoretical understanding of meta-RL is still in its infancy.


3. Offline RL. In the offline RL setting, the algorithm designer is given a dataset collected by running a baseline policy in the environment. Given access to this data alone, without further interaction, we wish to find a good policy (a toy sketch of this setup appears after this list). Recently, there has been a surge of interest in this setting, both in terms of new algorithms (e.g., Xie et al. 2021) and of improving our understanding of the fundamental limits (Foster et al. 2022). It is now well understood that offline RL is hard from a statistical perspective. Nevertheless, it remains to be explored under which realistic assumptions offline RL can be solved efficiently.

4. Beyond reward-based feedback. Standard bandit and RL algorithms assume that feedback is given as an “absolute reward” for each observed state-action pair. However, in real-world systems, the possibilities for collecting data from different types of feedback mechanisms are vast: click-through rates, preference feedback, partial rankings, multi-objective rewards, and trajectory-level feedback, among others. Developing efficient algorithms that exploit these different feedback structures would broaden the set of applications where effective online decision-making algorithms can be deployed.

5. Decision-making with partial observability. Many decision-making problems are partially observed: the algorithm designer is not given full knowledge of the true state of the system and instead only has access to observations that may not suffice to decode it. Despite the ubiquity of this scenario, partially observed decision-making problems remain, to a large extent, under-explored.

6. The reward engineering problem in RL and preference-based learning. A significant obstacle to deploying reinforcement learning algorithms lies in designing the reward function. In many tasks, the reward is manually constructed to align the performance of an RL algorithm with the task’s objective. This often requires substantial effort from domain experts and may still result in sub-optimal system performance. Instead of manually engineering a reward function, it may therefore be beneficial to design algorithms that gather proxy signals for the reward (e.g., user preferences) from which the algorithm designer can construct reward signals whose optimization yields the desired behavior (a toy sketch appears after this list). This enables efficient and automatic adaptation of the algorithm to the unknown reward function.
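
As an illustration of the offline RL setup in item 3, the sketch below runs tabular fitted Q-iteration on a synthetic logged dataset. The state/action counts, the randomly generated dataset, and the discount factor are placeholder assumptions; the point is only the structure of learning a policy from logged data without further interaction, not a recommended algorithm.

    # A minimal offline RL sketch: fitted Q-iteration on a fixed logged dataset.
    # n_states, n_actions, gamma, and the synthetic dataset are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, gamma = 5, 2, 0.9

    # Offline dataset of (state, action, reward, next_state) tuples logged by a
    # baseline (behavior) policy; the learner never interacts with the environment.
    dataset = [(rng.integers(n_states), rng.integers(n_actions),
                rng.random(), rng.integers(n_states)) for _ in range(2000)]

    Q = np.zeros((n_states, n_actions))
    for _ in range(100):                       # fitted Q-iteration sweeps
        targets = np.zeros_like(Q)
        counts = np.zeros_like(Q)
        for s, a, r, s_next in dataset:
            targets[s, a] += r + gamma * Q[s_next].max()   # Bellman backup target
            counts[s, a] += 1
        Q = np.where(counts > 0, targets / np.maximum(counts, 1), Q)  # average the targets

    greedy_policy = Q.argmax(axis=1)           # policy extracted purely from the logged data
    print(greedy_policy)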
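Relatedly, for the preference-based feedback discussed in items 4 and 6, the following sketch fits a linear reward model to pairwise comparisons under a Bradley-Terry link. The feature dimension, synthetic comparisons, and step size are placeholder assumptions, not a prescribed method.

    # A minimal sketch: recovering a reward model from pairwise preference feedback
    # via maximum likelihood under a Bradley-Terry model. All data here is synthetic.
    import numpy as np

    rng = np.random.default_rng(1)
    d, n_pairs = 4, 500
    true_w = rng.normal(size=d)                    # hidden reward parameters

    # Each comparison involves two items (e.g. trajectories) with features x_a, x_b;
    # the label is 1 when the first item is preferred.
    X_a = rng.normal(size=(n_pairs, d))
    X_b = rng.normal(size=(n_pairs, d))
    logits = (X_a - X_b) @ true_w
    labels = (rng.random(n_pairs) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

    w = np.zeros(d)
    for _ in range(500):                           # gradient ascent on the log-likelihood
        p = 1.0 / (1.0 + np.exp(-(X_a - X_b) @ w))      # P(first item preferred)
        grad = (X_a - X_b).T @ (labels - p) / n_pairs
        w += 1.0 * grad

    # Correlation between learned and true reward scores on the compared items.
    print(np.corrcoef(X_a @ w, X_a @ true_w)[0, 1])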

Broadly, the goal of this workshop is to encourage the ML community to rethink the standard decision-making interaction models that are usually assumed. Classical online learning algorithms undoubtedly perform well on simple problems, but they are often unsuitable for many of the learning scenarios that arise in practice.


Deadlines:

Please submit your poster by June 30th, 2022, AOE ("Anywhere On Earth"). Late submissions will not be considered.


Submission portal:

Posters can be submitted through OpenReview.net!


Style and Author Instructions:

The poster should not exceed 20 MB in size and should be in landscape format. We recommend using the AISTATS 2022 template, available here.



Subscribe to our mailing list!

Sponsored by