Dr. Cheems Wang
Researcher from Tsinghua University and AMLab at University of Amsterdam
BIOGRAPHY
My name is Qi (Cheems) Wang (you can call me Cheems). I am currently a Research Fellow at Tsinghua University, working closely with Prof. Xiangyang Ji. I completed my machine learning Ph.D. at the Amsterdam Machine Learning Lab (AMLab), University of Amsterdam. Many thanks to my great Ph.D. supervisors, Prof. Max Welling and Dr. Herke van Hoof, for supporting me throughout the project.
My research focuses on fundamental learning paradigms in large models, such as Meta Learning, Multi-Task Learning, and Reinforcement Learning. The principal goal of my Ph.D. project was to achieve convincing uncertainty quantification and enable skill transfer across tasks for fast deployment. I have published 4 ICML papers, 7 NeurIPS papers, 2 ICLR papers, 2 KDD papers, 1 AAAI paper, and 1 ICCV paper, with X papers under review and XX papers in progress. My Ph.D. thesis "Functional Representation Learning for Uncertainty Quantification and Fast Skill Transfer" is available at the link, together with the defense video at the link. Besides, I hope you will check out my favorite research output since my graduation from the birthplace of VAEs: Model Predictive Task Sampling for Efficient and Robust Adaptation (a small generative model steering large generative models' optimization). Thanks to the respected editor and reviewers; I am actively revising🙂 The code of Model Predictive Task Sampling (MPTS) has been public since January: https://github.com/thu-rllab/MPTS !
More interesting work is ongoing, so please follow my updates through Google Scholar (only selected publications appear there). If you would like to collaborate on publishing interesting papers, feel free to contact me😎.
Follow me: X/Twitter @AlbertW24045555; Zhihu (知乎); WeChat official account (微信公众号) / Xiaohongshu (小红书): SIG-IDM
Team Website: For those interested in intelligent decision-making, feel free to browse our team members' research output at ICML/NeurIPS/ICLR, etc. Our team is supervised by Prof. Xiangyang Ji. https://www.thuidm.com/
Recruiting master's students📯: I am approaching forty, have long been single, and have never chased talent titles🧢. I insist on writing 1-2 first-author papers every year, spend long hours deriving equations and reading papers, and attend to research matters personally🙂↔️. The group is well funded. You are welcome to apply to be my graduate student in the mathematics department of a university in Changsha (the actual work is AI research: Domain Generalization from Math to Computer Science 😐). Students with recommendation-based admission (保研), please email hhq123go@gmail.com; my WeChat: Cheems_QW.
Specific requirements: a solid mathematical foundation (I do not focus on pure mathematics research; applicants only need some familiarity with calculus, linear algebra, and optimization, plus basic programming skills), team spirit, and a willingness to endure hard work. We focus on frontier research in generative AI; the team steadily publishes at top venues such as ICML/NeurIPS/ICLR, and outstanding performers can be recommended to major tech companies~ Research demands passion, enthusiasm, and genuine interest, so please think twice before applying. We pursue frontier AI research rather than paper counts; however, the research intensity is high, which suits students willing to strive and with clear expectations for their future careers~
Highlighted Publications🥳🥳🥳
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?, Yun Qu, Qi Cheems Wang#, Yixiu Mao, Vincent Tao Hu, Xiangyang Ji#. arXiv, in preparation for submission
Model Predictive Task Sampling for Efficient and Robust Adaptation, Qi Cheems Wang*, Zehao Xiao*, Yixiu Mao*, Yun Qu*, Jiayi Shen, Yiqin Lv, Xiangyang Ji. Under Revision
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments, Yun Qu*, Qi Cheems Wang*, Yixiu Mao*, Yiqin Lv, Xiangyang Ji. ICML 2025
Robust Fast Adaptation from Adversarially Explicit Task Distribution Generation, Qi Cheems Wang*, Yiqin Lv*, Yixiu Mao*, Yun Qu, Yi Xu, Xiangyang Ji. KDD-Research Track 2025
Doubly Mild Generalization for Offline Reinforcement Learning, Yixiu Mao, Qi Cheems Wang, Yun Qu, Yuhang Jiang, Xiangyang Ji. NeurIPS 2024
Bridge the Inference Gaps of Neural Processes via Expectation Maximization, Qi Cheems Wang, Marco Federici, Herke van Hoof. ICLR 2023
Model-Based Meta Reinforcement Learning Using Graph Structured Surrogate Models and Amortized Policy Search, Qi Cheems Wang, Herke van Hoof. ICML 2022
Functional Representation Learning for Uncertainty Quantification and Fast Skill Transfer, Qi Cheems Wang. Ph.D. Thesis, University of Amsterdam, 2022
Grants Received
- National Natural Science Foundation of China (Young Scientists Fund): Research on Diverse High-Fidelity Techniques for Neural Process Models, 2024-2026. Grant No. 2306326. Funding: 300K RMB. Status: PI, ongoing.
- National Natural Science Foundation of China (Major Program, sub-project): Cooperation and Game Mechanisms for Heterogeneous Embodied Multi-Agent Systems, 2025-2029. Grant No. 62494509011. Funding: 630K RMB. Status: sub-project PI, ongoing.
- Engineering fund of a ministry: Policy Generation Algorithms Based on Causal Reinforcement Learning, 2023-2025. Funding: 1.85M RMB. Status: PI, completed.
- Innovation science-and-technology special-zone fund of a ministry: Stable Generation of Large Models Based on Generative Simulation Intelligence, 2023-2024. Funding: 500K RMB. Status: PI, completed.
- Foundation-strengthening fund of a ministry (key project, sub-project): Spatio-Temporal Trajectory Learning Based on Generative Simulation Intelligence, 2023-2026. Funding: 950K RMB. Status: sub-project PI, ongoing.
Recent News and Publications
August 15th: I was invited to serve as an Area Chair (Senior PC) for AAMAS 2026. Thanks to the Program Chairs for the nomination🎉 I am happy to accept ☺️!
July 7th: Our paper "Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?" is now public! We successfully extend model predictive task sampling to discrete task spaces and use it to accelerate RL finetuning of large language models! 🥳
May 1st: Our paper "Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments" was accepted to ICML 2025. Congratulations to Yun Qu! If you want to know how to robustify meta-RL and domain randomization without extra agent-environment interactions while accelerating learning, check it out at https://thu-rllab.github.io/PDTS_project_page/.
April 9th: I was invited to serve as an Area Chair for NeurIPS 2025. Thanks to the Program Chairs for the nomination🎉 yet I was unable to take up the role🥹.
January 19th: Our paper "Model Predictive Task Sampling for Efficient and Robust Adaptation" is now public! This is the first attempt to successfully predict the optimization outcome after any-shot adaptation on given tasks; model predictive task sampling (MPTS) significantly improves the adaptation robustness of multimodal foundation models and sequential decision-making while retaining learning efficiency!🥳 It is one of the research outputs I feel proudest of since my graduation from AMLab (birthplace of VAEs)!
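For readers new to the idea, below is a minimal conceptual sketch of a predict-then-sample loop in the spirit of MPTS, under my own simplified assumptions: a learned risk model scores candidate tasks without running adaptation, the tasks predicted to be hardest are selected, and the predictor is refit on the observed outcomes. The names `risk_model`, `adapt_and_eval`, and `sampling_round` are illustrative, not from the official codebase (see the MPTS repository above for the real implementation).

```python
# Illustrative predict-then-sample loop (a simplified sketch, NOT the authors'
# official implementation; see https://github.com/thu-rllab/MPTS for that).

def sampling_round(candidate_tasks, risk_model, adapt_and_eval, batch_size=8):
    # Cheaply score each candidate: the risk model predicts the
    # post-adaptation outcome without actually running adaptation.
    ranked = sorted(candidate_tasks, key=risk_model.predict, reverse=True)
    selected = ranked[:batch_size]  # keep only the tasks predicted hardest
    # Pay the expensive adaptation cost only on the selected tasks.
    outcomes = [(task, adapt_and_eval(task)) for task in selected]
    # Refit the risk predictor on the newly observed (task, risk) pairs,
    # so later rounds steer sampling with better predictions.
    risk_model.update(outcomes)
    return outcomes
```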
January 19th: "DynaPrompt: Dynamic Test-Time Prompt Tuning" was accepted to ICLR 2025. Congratulations ot Zehao and I feel honored to collaborate in this Project!
December 10th: Our paper "Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning" was accepted to AAAI 2025. Congratulations to Yun Qu! Preprint will come soon.
November 17th: Our paper "Robust Fast Adaptation from Adversarially Explicit Task Distribution Generation" got accepted to the KDD 2025 Research Track. Congratulations to all collaborators!
September 26th: Our paper "Doubly Mild Generalization for Offline Reinforcement Learning" got accepted to NeurIPS2024. Preprint will come soon.
September 26th: Our paper "Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression" got accepted to NeurIPS2024. Preprint will come soon.
September 26th: Our paper "GO4Align: Group Optimization for Multi-Task Alignment" got accepted to NeurIPS2024. Camera ready version will come soon. [Paper Link, Code Link]
September 26th: Four papers got accepted to NeurIPS 2024. Congratulations to all the students I co-supervised (Cheems as the second author for ALL 🤪)
July 28th: Our latest work "Robust Fast Adaptation from Adversarially Explicit Task Distribution Generation" is now available! Feel free to read our blog here. This work handles the task distribution shift between meta-training and meta-testing and proposes to generate explicit task distributions in an adversarial way, increasing the robustness of fast adaptation 😄! Code will be released; watch our progress! [Paper Link, Blog Link]
May 1st: The research work "Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation" was accepted to ICML 2024; congratulations to all my awesome collaborators! Here, a surrogate backpropagation algorithm is developed together with a variant of Layer Normalization. This work significantly reduces memory usage on LLaMA and ViT and provides theoretical guarantees. Code will be released soon!
April 2024: For researchers interested in multi-task learning, you cannot miss this wonderful work🥳, "GO4Align: Group Optimization for Multi-Task Alignment". Here, we align the learning progress across tasks to promote the exploitation of beneficial task relatedness and achieve SOTA performance on most existing benchmarks. Code will be released soon!
October 2023: I joined Tsinghua University, working as a researcher in the Department of Automation. Thanks to Prof. Xiangyang Ji for his supervision.
September 22nd 2023: Congratulations to Jiayi Shen! Our manuscript "Episodic Multi-Task Learning with Heterogeneous Neural Processes" was accepted to NeurIPS 2023 as a Spotlight (top 3.06% of all submissions); this work proposes heterogeneous neural processes to exploit task relatedness in multi-task learning. [Paper Link, Code Link]
July 2023: Many thanks to Max and Herke for the nomination, and to Sara, Yang, and Sihang for their recommendations; I am honored to receive the China Computer Federation Multi-Agent Systems Group (CCF-CMAS) Best Dissertation Prize [中国计算机学会-多智能体系统学组优博论文奖].
January 21st 2023: Our manuscript "Bridge the Inference Gaps of Neural Processes via Expectation Maximization" was accepted to ICLR 2023.
October 10th 2022: Very pleased to have received a NeurIPS Scholar Award; thanks to the NeurIPS Foundation.
September 14th 2022: Our paper "Learning Expressive Meta-Representations with Mixture of Expert Neural Processes" was accepted to NeurIPS 2022. Our method applies to few-shot supervised learning and meta reinforcement learning, and it can handle stochastic processes with mixture components. [Paper Link]
July 2022: The draft of my Ph.D. thesis was finished.
July 7th 2022: Delighted to receive an ICML 2022 Participation Grant.
May 2022: Our paper "Model-Based Meta Reinforcement Learning Using Graph Structured Surrogate Models and Amortized Policy Search" was accepted as an ICML 2022 Spotlight, and the final version will be released soon. It seems the log-likelihood of defending my Ph.D. thesis this year has increased!
Here, a GNN-based dynamics model is introduced with superior generalization, and a posterior sampling strategy is used in policy learning without additional policy gradient steps in new environments.
Feel free to access our slides at the link below for a brief introduction to our proposed GSSM and amortized meta model-based policy search. (Note that, as far as we know, this is the first attempt to amortize task-specific policies in meta model-based policy search. The significance is that non-parametric modeling avoids re-training or gradient adaptation of policies in new environments, which is promising for data-efficient fast adaptation.)
[The latest version can be found here: Paper Link, Slides Link]
June 1st, 2020: Our paper entitled "Doubly Stochastic Variational Inference for Neural Processes with Hierarchical Latent Variables" was accepted to ICML 2020. (For supplementary materials, refer to the Link.)
The slides for our ICML presentation are attached at the Link below. We propose a hierarchical neural process to simultaneously identify tasks and capture local correlations in high-dimensional problems.
Education Background
Before joining AMLab, I obtained a Bachelor's degree in Mathematics at Sichuan University (2011~2015), after which I pursued a Master's degree in Management Science at another research institute (2015~2017). During my undergraduate and early graduate years, I read widely on statistical learning theory, convex optimization, probabilistic modeling, and risk management in decision-making. Between November 2018 and April 2019, I worked in CSL at UvA. I remain very grateful to my host, Prof. Peter Sloot, who sincerely supported me at the beginning of my life and research in the Netherlands, especially during my toughest time. At the end of June 2019, I started working at AMLab under the supervision of Prof. Max Welling and Dr. Herke van Hoof.
Research Focus
The sources of uncertainty and the laws of dynamics are central to understanding the complexity of the world, and I am fascinated by novel Bayesian models for learning dynamics and conducting policy optimization in reinforcement learning environments, with statistics and physics as fundamental tools.
Currently, I focus more on meta learning, reinforcement learning, and the efficient optimization of foundation models.
Service Updates
Conference Review: ICML/NeurIPS/ICLR
Journal Review: XXX
In August 2021, I was honored to become a program committee member for the NeurIPS 2021 workshop Ecological Theory of RL (EcoRL 2021).
Since August 2021, I have worked as a teaching assistant in a Master AI course - Reinforcement Learning at UvA.
Between July 2020 and September 2021, I assisted our supervisors at AMLab in organizing the Weekly Seminar, a wonderful platform for research communication.
From August 2020, I worked as a teaching assistant in a Master AI course - Reinforcement Learning at UvA.
From August 2019, I worked as a teaching assistant in a Master AI course - Reinforcement Learning at UvA, mainly in charge of practical sessions and QA sessions.
Prepared Study Materials for MSc Students
At the request of MSc students in my RL practical sessions, I share the slides I prepared for those sessions below.
Note that these slides are adapted from Richard Sutton's book and other open-access online materials.
[session1, session2, session3, session4, session5, session6, session7, session8, session9, session10, session11, session12, to be continued]
Student Supervision
Out-of-Distribution Detection on Time Series Datasets using Bayesian Deep Learning (in collaboration with Fraudio) --> Master Student: Berend Jansen --> Finished (2020.12~2021.07)
Contact Information
Office : Lab42 4.22, Science Park 904, Amsterdam
E-mail : cheemswang@mail.tsinghua.edu.cn or hhq123go@gmail.com
Social Media : Twitter @AlbertW24045555
Google Scholar : Q. Wang