Reinforcement Learning
In reinforcement learning, an agent learns a policy to maximize the return (i.e., the cumulative reward) by interacting with the environment. However, when the state and action spaces are large, training a policy from scratch often takes a long time. Various studies have been conducted to enhance sample efficiency during training and expedite learning. Sample-efficient training enables the agent to achieve a better policy with the same number of training samples.
Common Tasks: Gymnasium
Keywords: sample efficient training, lifelong training, imitation learning, representation learning
Hopper (Early Training Phase)
Hopper (Sample Efficient Training)
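As a concrete illustration of how an agent maximizes the return (the cumulative reward), the following minimal sketch trains a REINFORCE policy on a Gymnasium task. CartPole-v1 is used here for brevity instead of Hopper, which requires MuJoCo; this is a generic example, not a description of any specific method studied here.

    # Minimal REINFORCE sketch on a Gymnasium task.
    import gymnasium as gym
    import torch
    import torch.nn as nn

    env = gym.make("CartPole-v1")
    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
    gamma = 0.99

    for episode in range(500):
        obs, _ = env.reset()
        log_probs, rewards = [], []
        done = False
        while not done:
            logits = policy(torch.as_tensor(obs, dtype=torch.float32))
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, terminated, truncated, _ = env.step(action.item())
            rewards.append(reward)
            done = terminated or truncated
        # Discounted returns G_t and the policy-gradient loss.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Sample-efficient variants aim to reach a good policy with far fewer of these environment interactions, e.g., by reusing data or learning better representations.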
Robot Training via Imitation Learning
Imitation learning enables an agent to learn a complex policy from expert demonstrations. This method is effective for tasks where the reward signal is sparse or the reward design is not straightforward. In this regard, behavioral cloning has been widely adopted for robot tasks and has demonstrated its effectiveness across a variety of tasks (see the following videos). However, limited generalizability under out-of-distribution (OOD) conditions remains a key challenge.
Common Tasks: Franka Kitchen, pushT, Fetch
Keywords: lifelong training, sample efficient training, multi-modality, representation learning
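The following minimal sketch illustrates behavioral cloning in its simplest form: a policy network is fit to expert (state, action) pairs by supervised regression. The dimensions and the expert_* tensors are placeholders for actual demonstration data (e.g., trajectories collected in Franka Kitchen).

    # Minimal behavioral-cloning sketch: clone expert actions with supervised regression.
    import torch
    import torch.nn as nn

    state_dim, action_dim = 30, 9                    # hypothetical dimensions
    expert_states = torch.randn(1024, state_dim)     # placeholder demonstrations
    expert_actions = torch.randn(1024, action_dim)

    policy = nn.Sequential(
        nn.Linear(state_dim, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, action_dim),
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(expert_states, expert_actions),
        batch_size=64, shuffle=True,
    )

    for epoch in range(10):
        for states, actions in loader:
            loss = nn.functional.mse_loss(policy(states), actions)  # imitate the expert
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

Because such a policy only sees expert states during training, its behavior on OOD states is unconstrained, which is exactly the generalization challenge mentioned above.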
Multi-agent Reinforcement Learning (MARL)
In multi-agent reinforcement learning, learning cooperative policies for agents remains a challenging problem. Moreover, when communication is not allowed among agents, each agent must make decisions based solely on its partial observation, yielding a problem class known as the Dec-POMDP (Decentralized Partially Observable Markov Decision Process). Training multiple agents often suffers from non-stationarity and convergence to poor local optima, and thus typically requires a long training time. To address these limitations, MARL requires a more sophisticated learning strategy.
Keywords: coordination, cooperation, sample efficient training, representation learning
3m (SMAC)
6h_vs_8z (SMAC)
MMM2 (SMAC)
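As a rough illustration of one cooperative learning strategy, the sketch below implements a VDN-style value decomposition: each agent computes a utility from its own partial observation, and the joint Q-value is their sum, trained against the shared team reward. The shapes, the placeholder transition batch, and the omission of target networks are simplifications for brevity.

    # Minimal VDN-style value-decomposition sketch for a Dec-POMDP setting.
    import torch
    import torch.nn as nn

    n_agents, obs_dim, n_actions = 3, 16, 5
    agent_nets = nn.ModuleList([
        nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        for _ in range(n_agents)
    ])
    optimizer = torch.optim.Adam(agent_nets.parameters(), lr=5e-4)
    gamma = 0.99

    # Placeholder batch of transitions (obs, actions, team reward, next obs, done).
    B = 32
    obs      = torch.randn(B, n_agents, obs_dim)
    actions  = torch.randint(0, n_actions, (B, n_agents))
    reward   = torch.randn(B)
    next_obs = torch.randn(B, n_agents, obs_dim)
    done     = torch.zeros(B)

    # Per-agent Q-values for the taken actions, summed into a joint Q.
    q_taken = torch.stack(
        [agent_nets[i](obs[:, i]).gather(1, actions[:, i:i+1]).squeeze(1)
         for i in range(n_agents)], dim=1).sum(dim=1)
    with torch.no_grad():
        q_next = torch.stack(
            [agent_nets[i](next_obs[:, i]).max(dim=1).values
             for i in range(n_agents)], dim=1).sum(dim=1)
        target = reward + gamma * (1 - done) * q_next

    loss = nn.functional.mse_loss(q_taken, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Because the joint value is learned centrally while each agent acts only on its own observation, execution remains fully decentralized.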
Representation Learning for Sample-Efficient Training
Learning concise and informative representations is critical for efficient learning, as they capture the key features needed to complete a given task. Similarly, using semantically similar states during value evaluation in RL or MARL can expedite learning by enabling agents to identify key features more efficiently. There are diverse ways to construct such semantic embeddings.
Semantic Embedding via Latent Encoder
Semantic Embedding via Latent Space Quantization
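The following sketch illustrates one such embedding in the spirit of latent-space quantization: an encoder maps a state to a latent vector, which is snapped to its nearest codebook entry so that semantically similar states share the same discrete code. Dimensions and module names are illustrative, roughly following common VQ-VAE practice.

    # Minimal sketch of a quantized state encoder.
    import torch
    import torch.nn as nn

    state_dim, latent_dim, n_codes = 32, 16, 64
    encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
    codebook = nn.Embedding(n_codes, latent_dim)

    def quantize(state):
        z = encoder(state)                            # continuous latent
        dists = torch.cdist(z, codebook.weight)       # distance to every code
        idx = dists.argmin(dim=1)                     # nearest-code index
        z_q = codebook(idx)
        # Straight-through estimator: gradients flow to the encoder as if
        # quantization were the identity map.
        z_q = z + (z_q - z).detach()
        return z_q, idx

    states = torch.randn(8, state_dim)                # placeholder batch of states
    z_q, codes = quantize(states)
    print(codes)  # equal codes indicate semantically similar states

States that map to the same code can then be treated alike during value evaluation, which is one way such embeddings can accelerate learning.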
(Multi-modal) Large Language Models
Large Language Models (LLMs) have demonstrated remarkable versatility, finding applications in diverse domains such as natural language generation, high-level decision-making, robotic action planning, industrial design tools, and even financial portfolio optimization. Despite the impressive capabilities of these foundation models, fine-tuning open-source variants for specific downstream tasks remains a significant challenge, presenting numerous research opportunities.
Keywords: RLHF, alignment algorithms, representation learning
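As one example of parameter-efficient fine-tuning, the sketch below wraps a frozen linear layer with a LoRA-style low-rank adapter so that only a small number of parameters are updated for a downstream task. It is a generic PyTorch illustration, not a specific library's API, and the layer sizes are arbitrary.

    # Minimal LoRA-style adapter sketch: frozen base weights plus a trainable
    # low-rank update.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False              # freeze pretrained weights
            self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
            self.scale = alpha / rank

        def forward(self, x):
            # Frozen base projection plus the trainable low-rank correction.
            return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale

    layer = LoRALinear(nn.Linear(512, 512))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)  # only the low-rank factors are updated during fine-tuning

Techniques like this make it feasible to adapt large open-source models to downstream tasks without updating all of their weights.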
Robot Learning
Recently, robot training has drawn widespread attention amid the rapid development of AI. In the past, training a model from scratch was feasible. However, because modern models are often trained on internet-scale data to improve generalization, training from scratch has become prohibitively expensive. Instead, the community increasingly relies on so-called foundation models, even in robotics, and fine-tunes them for downstream tasks. Although a fine-tuned model shows some generalization, it inevitably suffers catastrophic forgetting of previously learned tasks. To address this problem, lifelong training has been actively studied (see the sketch at the end of this section).
Common Tasks: Franka Kitchen, pushT, Fetch
Keywords: lifelong training, VLA model, generalization
Training Step = 0
Training Step = 200
Training Step = 400
Training Step = 800
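The sketch below illustrates one common remedy for catastrophic forgetting, an Elastic Weight Consolidation (EWC)-style penalty: after a task is learned, parameters important to it (estimated here by a rough diagonal Fisher term) are anchored, and later fine-tuning pays a quadratic cost for moving them. The model and data are placeholders, and this is only one of many lifelong-training strategies.

    # Minimal EWC-style penalty sketch against catastrophic forgetting.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 4))

    def fisher_diagonal(model, data, targets):
        """Rough diagonal Fisher estimate from squared gradients on the old task."""
        model.zero_grad()
        loss = nn.functional.cross_entropy(model(data), targets)
        loss.backward()
        return {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}

    # After finishing the previous task, store its parameters and Fisher terms.
    old_data, old_targets = torch.randn(128, 10), torch.randint(0, 4, (128,))
    fisher = fisher_diagonal(model, old_data, old_targets)
    anchor = {n: p.detach().clone() for n, p in model.named_parameters()}

    def ewc_penalty(model, lam=100.0):
        # Quadratic cost for moving parameters that mattered on the old task.
        return lam * sum(
            (fisher[n] * (p - anchor[n]) ** 2).sum()
            for n, p in model.named_parameters()
        )

    # During fine-tuning on a new task, add the penalty to the task loss.
    new_data, new_targets = torch.randn(128, 10), torch.randint(0, 4, (128,))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss = nn.functional.cross_entropy(model(new_data), new_targets) + ewc_penalty(model)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()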