Workshop on Large Language, Multi-modal and Decision Models 

@ Distributed AI (DAI) 2023

 Nanyang Technological University, 50 Nanyang Avenue, Singapore 

November 30, 2023

Mission of the Workshop

Bridging the Divide: Unleashing the Potential of Integrated Large Language, Multi-modal, and Decision Models

The burgeoning field of Large Language Models (LLMs) and Multi-modal Models, built on their ability to assimilate vast amounts of vision and language data, has carved a niche in executing a wide range of downstream tasks across domains such as dialogue systems, autonomous navigation, healthcare, and robotics. However, as these models move into real-world applications, they face new challenges: assimilating external feedback, adapting to new task modalities, long-term reasoning and planning, and grounding actions, areas traditionally handled by sequential decision-making paradigms such as reinforcement learning, planning, and optimal control. Unlike earlier narrowly focused, task-specific models, LLMs come equipped with a broad spectrum of prior knowledge, which promises fertile ground for improving sample efficiency and generalization.

This DAI 2023 Workshop endeavors to foster a collaborative milieu between the Sequential Decision Making, Multi-agent Learning, and Large Language/Multi-modal Models communities. This collaboration seeks to address the lack of action-oriented data in LLM training and to explore novel frameworks that enrich how models understand and interact with humans, other agents, tools, the digital and physical worlds, and other models in decision-making environments.

 Invited Speakers

Zhouhan Lin

Assistant Professor

Shanghai Jiao Tong University

Shiguang Wu

Staff Researcher

Noah’s Ark Lab, Huawei

Yali Du

Assistant Professor

King’s College London


Jian Zhao

Researcher

Polixir.AI


Weirui Ye

Ph.D. Student

Tsinghua University


Longtao Zheng

Ph.D. Student

Nanyang Technological University


Ziyu Wan

Ph.D. Student

Shanghai Jiao Tong University


Workshop Program

9:10-9:15

Yaodong Yang, Peking University

Opening

9:15-10:00

Zhouhan Lin, Shanghai Jiao Tong University

Zhouhan Lin is an assistant professor at the John Hopcroft Center for Computer Science at Shanghai Jiao Tong University (SJTU), where he leads the Language Understanding and Machine Intelligence Algorithms (LUMIA) group, focusing on machine learning and NLP. Before joining SJTU, he was a visiting scientist at Facebook AI Research (FAIR) in Menlo Park, CA, working with Michael Auli. He received his Ph.D. in Computer Science from Mila at the University of Montreal in 2019, where he was fortunate to be supervised by Yoshua Bengio. During his Ph.D., he interned with the Language team at Google AI in New York City and at IBM Watson with Bowen Zhou and Mo Yu in Yorktown Heights, NY, and worked as a part-time student researcher at Microsoft Research in Montreal with Alessandro Sordoni and Adam Trischler. Prior to Mila, he received his B.Sc. (2012) and M.Sc. (2014) degrees from Harbin Institute of Technology.

HuRef: HUman-REadable Fingerprint for Large Language Models

Protecting the copyright of large language models (LLMs) has become crucial due to their resource-intensive training and the carefully designed licenses that accompany them. However, identifying the original base model of an LLM is challenging because its parameters may be altered by fine-tuning or continued pretraining. In this study, we introduce HuRef, a human-readable fingerprint for LLMs that uniquely identifies the base model without exposing model parameters or interfering with training. We first observe that the vector direction of an LLM's parameters remains stable after the model has converged during pretraining, showing negligible perturbation through subsequent training steps, including continued pretraining, supervised fine-tuning (SFT), and RLHF, which makes it a sufficient condition for identifying the base model. Its necessity is validated by continuing to train an LLM with an extra term that drives the parameter direction away from its original value, which damages the model.

However, this direction is vulnerable to simple attacks like dimension permutation or matrix rotation, which change it significantly without affecting performance. To address this, leveraging the Transformer structure, we systematically analyze potential attacks and define three invariant terms that identify an LLM's base model. We make these invariant terms human-readable by mapping them to a Gaussian vector using a convolutional encoder and then converting the vector into a natural image with StyleGAN2. The encoder discriminates between invariants from different base models and ensures Gaussian output through adversarial training, while StyleGAN2 transforms Gaussian vectors into dog images. Consequently, our method generates a dog image as an identity fingerprint for an LLM, where the dog's appearance strongly indicates the LLM's base model: if the LLM is adapted from another base model, the generated dog closely resembles that model's; if the LLM is trained independently from scratch, it exhibits a unique dog image distinct from those of other models. Experimental results across various LLMs demonstrate the effectiveness of our method: the generated dog image remains invariant across different training steps, including SFT, RLHF, and even continued pretraining with an augmented vocabulary in a new language.
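The underlying observation, that a converged base model's parameter direction barely moves under further training, can be checked with a few lines of code. The sketch below is our illustration, not the HuRef implementation: it simply flattens two models' parameters and compares the cosine similarity of the resulting vectors, using toy linear layers as stand-ins for real LLM checkpoints (the actual fingerprint uses attack-invariant terms and renders them as an image).

import torch

def parameter_direction_similarity(state_dict_a, state_dict_b):
    # Cosine similarity between the flattened parameter vectors of two models.
    # Illustrative only: HuRef compares attack-invariant terms, not raw parameters.
    keys = sorted(set(state_dict_a) & set(state_dict_b))
    vec_a = torch.cat([state_dict_a[k].float().flatten() for k in keys])
    vec_b = torch.cat([state_dict_b[k].float().flatten() for k in keys])
    return torch.nn.functional.cosine_similarity(vec_a, vec_b, dim=0).item()

if __name__ == "__main__":
    # Toy stand-ins for a base model and a lightly fine-tuned copy of it.
    base = torch.nn.Linear(512, 512)
    finetuned = torch.nn.Linear(512, 512)
    finetuned.load_state_dict(base.state_dict())
    with torch.no_grad():
        finetuned.weight.add_(0.01 * torch.randn_like(finetuned.weight))

    independent = torch.nn.Linear(512, 512)  # a model "trained" from scratch elsewhere

    print(parameter_direction_similarity(base.state_dict(), finetuned.state_dict()))    # high, close to 1
    print(parameter_direction_similarity(base.state_dict(), independent.state_dict()))  # near zero

High similarity suggests a shared base model; near-zero similarity suggests independent training, which is the signal the fingerprint makes human-readable.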

10:00-10:45

Yali Du, King's College London

Dr Yali Du is a Lecturer (Assistant Professor) in the Department of Informatics at King's College London, where she leads the Cooperative AI Lab. Her research aims to enable machines to exhibit cooperative and responsible behaviour in intelligent decision-making tasks. Her work focuses on reinforcement learning and multi-agent cooperation, with topics such as generalisation, zero-shot coordination, evaluation of human and AI players, and social agency (e.g., human-involved learning, safety, and ethics). She was selected for the AAAI New Faculty Highlights (2023) and named a Rising Star in AI (2023). She has given tutorials on cooperative multi-agent learning at ACML 2022 and AAAI 2023. She serves as an editor for the Journal of Autonomous Agents and Multi-Agent Systems and IEEE Transactions on Artificial Intelligence, and on the organising committee for AAMAS 2023. Her research is supported by the Engineering and Physical Sciences Research Council (UKRI-EPSRC).

Multi-agent Cooperation in Social Context

In the coming years, diverse ecologies of AI systems are envisioned to interact rapidly and in complex ways with each other and with humans. Collaborative industrial robots will work on factory floors alongside labourers, care robots will assist human health workers, and personal AI assistants will help with scheduling, albeit in an elementary way. It is therefore essential to develop AI systems that can effectively and reliably collaborate with humans in various contexts. This gives rise to the agent alignment problem: how do we create agents that behave in accordance with the user's intentions and can adapt to changing circumstances? In this talk, I will provide an overview of the problem and discuss our work on human-centred coordination and alignment, as well as collaboration in social contexts.

10:45-11:15

Coffee Break

11:15-12:00

Shiguang Wu, Noah's Ark Lab, Huawei

Shiguang Wu received his Ph.D. from the Institute of Automation, Chinese Academy of Sciences, and is a researcher at Huawei Noah's Ark Lab. He is mainly responsible for research and deployment in technical directions such as embodied intelligence and foundation models for decision making.

Challenges and Exploration of Large Embodied Intelligent Decision-Making Models

Talk Outline:

1. Research status and trends in intelligent decision making

2. Challenges and difficulties facing large decision-making models

3. Technical explorations of large models for embodied intelligent decision making in robotics

12:00-14:00

Lunch

14:00-14:40

Jian Zhao, Polixir.AI

Jian Zhao is currently the Algorithm Director at Polixir. He obtained his Ph.D. in 2023 from the University of Science and Technology of China. His research interests include Game AI, Reinforcement Learning, and Multi-Agent Systems. He has published over twenty papers in domestic and international academic journals and conferences, and has won awards including the championship of the First Tencent AI Arena Multi-Agent Reinforcement Learning Competition and the RLChina Agent Challenge championships in Summer 2021 and Spring 2022. At Polixir, he is primarily engaged in the practical application of reinforcement learning.

Inspiration Brought to Decision-Making Tasks by Large Language Models

Large language models, a recent hot topic, have achieved versatility across a wide range of tasks. From the perspective of reinforcement learning, however, current LLMs cannot tackle problems of decision intelligence, and reinforcement learning itself remains stuck at the one-model-solves-one-task level, which poses a significant challenge to generalizability. To address this issue, we introduce pre-trained models to decision tasks. We currently focus on multi-agent systems, paying attention to the competitive and cooperative elements among agents. By extracting representations of different tasks, we train a decision model; subsequently, by fine-tuning to learn task-specific representations, new tasks can be swiftly accomplished with the pre-trained model. We have conducted experiments in various environments, yielding promising results.

14:40-15:20

Weirui Ye, Tsinghua University

Weirui Ye received his bachelor's degree from Tsinghua University in 2020. He is now a Ph.D. student at IIIS, Tsinghua University, advised by Prof. Yang Gao. His research interest lies in sample-efficient policy learning, including model-based RL and large models for decision making. His work has been accepted at artificial intelligence conferences including NeurIPS, ICLR, and AAAI.

Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance

In this work, we focus on the concrete form in which to represent embodied foundation priors and propose an intuitive and effective set of priors consisting of a foundation policy, a foundation value, and a foundation success reward. We name this framework Foundation Reinforcement Learning (FRL), since it relies entirely on embodied foundation priors to explore, learn, and reinforce. To verify its effectiveness, we instantiate an actor-critic method assisted by these priors, called Foundation Actor-Critic (FAC). The benefits of FRL are threefold: sample-efficient learning, robustness to noisy priors, and minimal human intervention. It is a novel and powerful learning paradigm and a step towards embodied generalist agents.
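To make the three priors concrete, here is a minimal sketch of how they could enter an actor-critic update. It is our illustration under stated assumptions, not the FAC implementation from the talk: foundation_policy, foundation_value, and foundation_success are hypothetical stand-ins for pretrained foundation models, and the update rule is a generic regularized actor-critic.

import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the three embodied foundation priors described in
# the talk; in practice these would be large pretrained models.
def foundation_policy(obs):       # prior action distribution, as logits
    return torch.zeros(obs.shape[0], 4)

def foundation_value(obs):        # prior estimate of the state value
    return torch.zeros(obs.shape[0], 1)

def foundation_success(obs):      # sparse success reward from a foundation model
    return (obs.sum(dim=-1, keepdim=True) > 0).float()

def fac_style_losses(actor, critic, obs, actions, next_obs, gamma=0.99, kl_coef=0.1):
    # One illustrative update: the success prior supplies the reward, the value
    # prior anchors the critic, and the policy prior regularizes the actor.
    reward = foundation_success(next_obs)
    with torch.no_grad():
        target = reward + gamma * critic(next_obs)
    values = critic(obs)
    critic_loss = F.mse_loss(values, target) + F.mse_loss(values, foundation_value(obs))
    log_probs = F.log_softmax(actor(obs), dim=-1)
    chosen = log_probs.gather(1, actions)                  # actions: LongTensor of shape [batch, 1]
    advantage = (target - values).detach()
    kl = F.kl_div(log_probs, F.softmax(foundation_policy(obs), dim=-1),
                  reduction="batchmean")
    actor_loss = -(chosen * advantage).mean() + kl_coef * kl
    return actor_loss, critic_loss

The intuition the sketch tries to capture is that the success prior replaces hand-crafted rewards, the value prior stabilizes the critic early in training, and the KL term keeps exploration close to the foundation policy while the learned advantage corrects the prior's noise.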

15:20-15:50

Coffee Break

15:50-16:30

Longtao Zheng, Nanyang Technological University

Longtao Zheng is a Ph.D. student in the School of Computer Science and Engineering at Nanyang Technological University (NTU), advised by Prof. Bo An. He received his bachelor's degree in computer science from the University of Science and Technology of China (USTC) in June 2022. Before NTU, he was advised by wonderful mentors: Dr. Yujing Hu at NetEase Fuxi AI Lab, Prof. Jianmin Ji at the USTC Robotics Lab, and Prof. Fei Chiang at McMaster University. His research interests span reinforcement learning and foundation models for decision making.

A Path towards Foundation Agents

Building agents with large language models (LLMs) for computer control is a burgeoning research area, where the agent receives computer states and performs actions to complete complex tasks. In this talk, we will review the transition of agent research in the LLM era and discuss the future development of foundation agents. We will also introduce Synapse, a computer agent featuring three key components: state abstraction, trajectory-as-exemplar prompting, and exemplar memory.
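As a rough illustration of two of those components, the sketch below implements a toy exemplar memory and a trajectory-as-exemplar prompt builder. It is our simplification, not the Synapse codebase: ExemplarMemory, build_prompt, and embed_fn are hypothetical names, and a real system would use an actual text encoder plus the abstracted computer states the talk describes.

import numpy as np

class ExemplarMemory:
    # Store full (abstracted state, action) trajectories keyed by a task
    # embedding, and retrieve the most similar ones as prompt exemplars.
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn          # hypothetical text encoder: str -> np.ndarray
        self.keys, self.trajectories = [], []

    def add(self, task_description, trajectory):
        self.keys.append(self.embed_fn(task_description))
        self.trajectories.append(trajectory)

    def retrieve(self, task_description, k=2):
        query = self.embed_fn(task_description)
        sims = [float(np.dot(query, key) / (np.linalg.norm(query) * np.linalg.norm(key)))
                for key in self.keys]
        top = np.argsort(sims)[::-1][:k]
        return [self.trajectories[i] for i in top]

def build_prompt(task_description, exemplars):
    # Trajectory-as-exemplar prompting: each exemplar is a whole (state, action) sequence.
    lines = []
    for trajectory in exemplars:
        for state, action in trajectory:
            lines.append(f"Observation: {state}\nAction: {action}")
        lines.append("---")
    lines.append(f"Task: {task_description}\nObservation: <current abstracted state>\nAction:")
    return "\n".join(lines)

The point of prompting with whole trajectories rather than isolated state-action pairs is that the LLM sees how earlier observations condition later actions before choosing its next one.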

16:30-17:10

Ziyu Wan, Shanghai Jiao Tong University

Ziyu Wan is a third-year Ph.D. student at Shanghai Jiao Tong University. His research interests lie in large language models, multi-agent reinforcement learning, and large models for decision making. His work has been accepted at NeurIPS, ICLR, and JMLR.

Discovering Tree Search in LLMs

Large language models (LLMs) typically employ sampling or beam search, accompanied by prompts such as Chain-of-Thought (CoT), to boost reasoning and decoding ability. Recent works such as Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment the reasoning capabilities of LLMs by using tree-search algorithms to guide multi-step reasoning. These methods mainly focus on LLMs' reasoning ability during inference and rely heavily on human-designed prompts to activate the LLM as a value function, and thus lack general applicability and scalability. To address these limitations, we present an AlphaZero-like tree-search learning framework for LLMs (termed TS-LLM), systematically showing how tree search with a learned value function can guide LLMs' decoding ability. TS-LLM distinguishes itself in two key ways: (1) leveraging a learned value function, our approach can be applied to different tasks beyond reasoning (such as RLHF alignment) and to LLMs of any size, without prompting advanced, large-scale models; (2) it can guide the LLM's decoding during both inference and training. Empirical evaluations across reasoning, planning, and RLHF alignment tasks validate the effectiveness of TS-LLM, even on trees with a depth of 64.
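For readers unfamiliar with value-guided search over LLM outputs, the sketch below shows the general idea in heavily simplified form. It is not the TS-LLM algorithm (which uses AlphaZero-style MCTS with visit counts, backups, and a trained value network); it is a plain best-first search in which expand_fn and value_fn are hypothetical callables, e.g. an LLM proposing candidate next reasoning steps and a learned scorer for partial solutions.

import heapq

def value_guided_tree_search(root_state, expand_fn, value_fn, is_terminal, max_nodes=256):
    # Best-first search guided by a value function.
    # expand_fn(state) -> list of child states; value_fn(state) -> float score;
    # is_terminal(state) -> bool. All three are user-supplied.
    counter = 0                                    # tie-breaker so states are never compared directly
    frontier = [(-value_fn(root_state), counter, root_state)]
    best = None                                    # (value, terminal_state)
    for _ in range(max_nodes):
        if not frontier:
            break
        neg_value, _, state = heapq.heappop(frontier)
        if is_terminal(state):
            if best is None or -neg_value > best[0]:
                best = (-neg_value, state)
            continue
        for child in expand_fn(state):
            counter += 1
            heapq.heappush(frontier, (-value_fn(child), counter, child))
    return best

# Toy usage: grow a string one character at a time, preferring 'a' over 'b'.
result = value_guided_tree_search(
    root_state="",
    expand_fn=lambda s: [s + "a", s + "b"],
    value_fn=lambda s: s.count("a") - s.count("b"),
    is_terminal=lambda s: len(s) == 4,
)
print(result)   # (4, 'aaaa')

Replacing the hand-written prompt-as-value-function of ToT/RAP with a trained value_fn is the substitution the abstract argues for, which is what lets the search apply beyond reasoning tasks and guide training as well as inference.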

17:10-17:20

Ying Wen, Shanghai Jiao Tong University

Closing

 Organizers

Ying Wen

Shanghai Jiao Tong University

Yaodong Yang

Peking University

Weinan Zhang

Shanghai Jiao Tong University

Muning Wen

Shanghai Jiao Tong University