Reinforcement Learning
In reinforcement learning, an agent learns a policy to maximize the return (i.e., the cumulative reward) by interacting with the environment. However, when the state and action spaces are large, training a policy from scratch often takes a long time. Various studies have been conducted to enhance sample efficiency during training and expedite learning. Sample-efficient training enables the agent to achieve a better policy with the same number of training samples.
Common Tasks: Gymnasium
Keywords: sample efficient training, lifelong training, imitation learning, representation learning
Hopper (Early Training Phase)
Hopper (Sample Efficient Training)
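As a concrete illustration of how an agent maximizes the return (the cumulative reward), the following minimal sketch trains a REINFORCE policy on a Gymnasium task. CartPole-v1 is used here for brevity instead of Hopper, which requires MuJoCo; this is a generic example, not a description of any specific method studied here.

    # Minimal REINFORCE sketch on a Gymnasium task.
    import gymnasium as gym
    import torch
    import torch.nn as nn

    env = gym.make("CartPole-v1")
    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
    gamma = 0.99

    for episode in range(500):
        obs, _ = env.reset()
        log_probs, rewards = [], []
        done = False
        while not done:
            logits = policy(torch.as_tensor(obs, dtype=torch.float32))
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, terminated, truncated, _ = env.step(action.item())
            rewards.append(reward)
            done = terminated or truncated
        # Discounted returns G_t and the policy-gradient loss.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Sample-efficient variants aim to reach a good policy with far fewer of these environment interactions, e.g., by reusing data or learning better representations.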
Robot Training via Imitation Learning
Imitation learning enables an agent to learn a complex policy from expert demonstrations. This method is effective for tasks where the reward signal is sparse or the reward design is not straightforward. In this regard, behavioral cloning has been widely adopted for robot tasks and has demonstrated its effectiveness across a variety of tasks (see the following videos). However, limited generalizability under out-of-distribution (OOD) conditions remains a key challenge.
Common Tasks: Franka Kitchen, pushT, Fetch
Keywords: lifelong training, sample efficient training, multi-modality, representation learning
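The following minimal sketch illustrates behavioral cloning in its simplest form: a policy network is fit to expert (state, action) pairs by supervised regression. The dimensions and the expert_* tensors are placeholders for actual demonstration data (e.g., trajectories collected in Franka Kitchen).

    # Minimal behavioral-cloning sketch: clone expert actions with supervised regression.
    import torch
    import torch.nn as nn

    state_dim, action_dim = 30, 9                    # hypothetical dimensions
    expert_states = torch.randn(1024, state_dim)     # placeholder demonstrations
    expert_actions = torch.randn(1024, action_dim)

    policy = nn.Sequential(
        nn.Linear(state_dim, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, action_dim),
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(expert_states, expert_actions),
        batch_size=64, shuffle=True,
    )

    for epoch in range(10):
        for states, actions in loader:
            loss = nn.functional.mse_loss(policy(states), actions)  # imitate the expert
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

Because such a policy only sees expert states during training, its behavior on OOD states is unconstrained, which is exactly the generalization challenge mentioned above.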
Multi-agent Reinforcement Learning (MARL)
In multi-agent reinforcement learning, learning cooperative policies for agents remains a challenging problem. Moreover, when communication is not allowed among agents, each agent must make decisions based solely on its partial observation, yielding a problem class known as the Dec-POMDP (Decentralized Partially Observable Markov Decision Process). Training multiple agents often suffers from non-stationarity and convergence to poor local optima, and thus typically requires a long training time. To address these limitations, MARL requires a more sophisticated learning strategy.
Keywords: coordination, cooperation, sample efficient training, representation learning
3m (SMAC)
6h_vs_8z (SMAC)
MMM2 (SMAC)
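As a rough illustration of one cooperative learning strategy, the sketch below implements a VDN-style value decomposition: each agent computes a utility from its own partial observation, and the joint Q-value is their sum, trained against the shared team reward. The shapes, the placeholder transition batch, and the omission of target networks are simplifications for brevity.

    # Minimal VDN-style value-decomposition sketch for a Dec-POMDP setting.
    import torch
    import torch.nn as nn

    n_agents, obs_dim, n_actions = 3, 16, 5
    agent_nets = nn.ModuleList([
        nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        for _ in range(n_agents)
    ])
    optimizer = torch.optim.Adam(agent_nets.parameters(), lr=5e-4)
    gamma = 0.99

    # Placeholder batch of transitions (obs, actions, team reward, next obs, done).
    B = 32
    obs      = torch.randn(B, n_agents, obs_dim)
    actions  = torch.randint(0, n_actions, (B, n_agents))
    reward   = torch.randn(B)
    next_obs = torch.randn(B, n_agents, obs_dim)
    done     = torch.zeros(B)

    # Per-agent Q-values for the taken actions, summed into a joint Q.
    q_taken = torch.stack(
        [agent_nets[i](obs[:, i]).gather(1, actions[:, i:i+1]).squeeze(1)
         for i in range(n_agents)], dim=1).sum(dim=1)
    with torch.no_grad():
        q_next = torch.stack(
            [agent_nets[i](next_obs[:, i]).max(dim=1).values
             for i in range(n_agents)], dim=1).sum(dim=1)
        target = reward + gamma * (1 - done) * q_next

    loss = nn.functional.mse_loss(q_taken, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Because the joint value is learned centrally while each agent acts only on its own observation, execution remains fully decentralized.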
Representation Learning for Sample-Efficient Training
Learning concise and informative representations is critical for efficient learning, as they capture the key features needed to complete a given task. Similarly, using semantically similar states during value evaluation in RL or MARL can expedite learning by enabling agents to identify key features more efficiently. There are diverse ways to construct such semantic embeddings.
Semantic Embedding via Latent Encoder
Semantic Embedding via Latent Space Quantization
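The following sketch illustrates one such embedding in the spirit of latent-space quantization: an encoder maps a state to a latent vector, which is snapped to its nearest codebook entry so that semantically similar states share the same discrete code. Dimensions and module names are illustrative, roughly following common VQ-VAE practice.

    # Minimal sketch of a quantized state encoder.
    import torch
    import torch.nn as nn

    state_dim, latent_dim, n_codes = 32, 16, 64
    encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
    codebook = nn.Embedding(n_codes, latent_dim)

    def quantize(state):
        z = encoder(state)                            # continuous latent
        dists = torch.cdist(z, codebook.weight)       # distance to every code
        idx = dists.argmin(dim=1)                     # nearest-code index
        z_q = codebook(idx)
        # Straight-through estimator: gradients flow to the encoder as if
        # quantization were the identity map.
        z_q = z + (z_q - z).detach()
        return z_q, idx

    states = torch.randn(8, state_dim)                # placeholder batch of states
    z_q, codes = quantize(states)
    print(codes)  # equal codes indicate semantically similar states

States that map to the same code can then be treated alike during value evaluation, which is one way such embeddings can accelerate learning.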
(Multi-modal) Large Language Models
Large Language Models (LLMs) have demonstrated remarkable versatility, finding applications in diverse domains such as natural language generation, high-level decision-making, robotic action planning, industrial design tools, and even financial portfolio optimization. Despite the impressive capabilities of these foundation models, fine-tuning open-source variants for specific downstream tasks remains a significant challenge, presenting numerous research opportunities.
Keywords: RLHF, alignment algorithms, representation learning
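As one example of parameter-efficient fine-tuning, the sketch below wraps a frozen linear layer with a LoRA-style low-rank adapter so that only a small number of parameters are updated for a downstream task. It is a generic PyTorch illustration, not a specific library's API, and the layer sizes are arbitrary.

    # Minimal LoRA-style adapter sketch: frozen base weights plus a trainable
    # low-rank update.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False              # freeze pretrained weights
            self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
            self.scale = alpha / rank

        def forward(self, x):
            # Frozen base projection plus the trainable low-rank correction.
            return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale

    layer = LoRALinear(nn.Linear(512, 512))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)  # only the low-rank factors are updated during fine-tuning

Techniques like this make it feasible to adapt large open-source models to downstream tasks without updating all of their weights.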
Robot Learning
Recently, robot training has drawn widespread attention amid the rapid development of AI. In the past, training a model from scratch was feasible. However, because modern models are often trained on internet-scale data to improve generalization, training from scratch has become prohibitively expensive. Instead, the community increasingly relies on so-called foundation models, even in robotics, and fine-tunes them for downstream tasks. Although a fine-tuned model shows some generalization, it inevitably suffers catastrophic forgetting of previously learned tasks. To address this problem, lifelong training has been actively studied (see the sketch at the end of this section).
Common Tasks: Franka Kitchen, pushT, Fetch
Keywords: lifelong training, VLA model, generalization
Training Step = 0
Training Step = 200
Training Step = 400
Training Step = 800
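The sketch below illustrates one common remedy for catastrophic forgetting, an Elastic Weight Consolidation (EWC)-style penalty: after a task is learned, parameters important to it (estimated here by a rough diagonal Fisher term) are anchored, and later fine-tuning pays a quadratic cost for moving them. The model and data are placeholders, and this is only one of many lifelong-training strategies.

    # Minimal EWC-style penalty sketch against catastrophic forgetting.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 4))

    def fisher_diagonal(model, data, targets):
        """Rough diagonal Fisher estimate from squared gradients on the old task."""
        model.zero_grad()
        loss = nn.functional.cross_entropy(model(data), targets)
        loss.backward()
        return {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}

    # After finishing the previous task, store its parameters and Fisher terms.
    old_data, old_targets = torch.randn(128, 10), torch.randint(0, 4, (128,))
    fisher = fisher_diagonal(model, old_data, old_targets)
    anchor = {n: p.detach().clone() for n, p in model.named_parameters()}

    def ewc_penalty(model, lam=100.0):
        # Quadratic cost for moving parameters that mattered on the old task.
        return lam * sum(
            (fisher[n] * (p - anchor[n]) ** 2).sum()
            for n, p in model.named_parameters()
        )

    # During fine-tuning on a new task, add the penalty to the task loss.
    new_data, new_targets = torch.randn(128, 10), torch.randint(0, 4, (128,))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss = nn.functional.cross_entropy(model(new_data), new_targets) + ewc_penalty(model)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()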