I am a first-year Ph.D. student at Washington University in St. Louis, advised by Prof. Jiaxin Huang.
My research interests lie in multimodal learning and natural language processing, with the goal of enabling intelligent models to understand and generate complex data in open-world scenarios. Recently, my research has focused on (M)LLM efficiency and trustworthiness.
My previous line of multimodal research (PGIM -> RiVEG -> SMNER -> VP-MEL -> D-ARTEMIS) explored how to unleash the potential of vision-language models in complex multimodal scenarios, how to build harmonious interaction and collaboration among multiple models, and how to construct image-text-based knowledge augmentation methods in open-world settings. I also maintain an interest in several vision tasks (AFAN).
I care more about problems worth solving than about confining myself to a specific field. Feel free to contact me and explore possibilities together.
Vision and Language:
Knowledge + (Multimodal) Large Language Model [PGIM (EMNLP'23), RiVEG (ACL'24), VP-MEL (ACL'25), SMNER (IEEE TMM'25), D-ARTEMIS (arXiv'25)]
Image Inpainting and Synthesis [AFAN (IEEE TMM'25)]
[Sep. 2025] Check out our latest research on GUI agents, which builds a deliberative framework inspired by human cognitive cycles.
[Jul. 2025] One paper about Segmented Multimodal Named Entity Recognition (SMNER) is accepted to IEEE Transactions on Multimedia.
[May 2025] One paper is accepted to ACL 2025. We propose a new Visual Prompts Guided Multimodal Entity Linking (VP-MEL) task, along with an accompanying dataset, that aims to balance the status of different modalities in MEL.
[Feb. 2025] Thrilled to join the HINT Lab at WashU — time to launch the next chapter! 🚀
[Jan. 2025] One paper about Blind Image Inpainting is accepted to IEEE Transactions on Multimedia.
[Nov. 2024] Honored to receive three top honors at Tianjin University: the China National Scholarship, the First-Class Academic Excellence Scholarship, and the Outstanding Student Award.
[May 2024] One paper about Grounded Multimodal Named Entity Recognition (GMNER) and Large Language Models is accepted to ACL 2024. See you in Bangkok!
[Oct. 2023] One paper about Multimodal Named Entity Recognition (MNER) is accepted to EMNLP 2023.
Baidu Inc.
Beijing, Apr. 2024 - Jun. 2024
Focused on vision-language models for visual document understanding.
[2025] Outstanding Graduation Thesis of Tianjin University
[2025] Outstanding Graduates of Tianjin University
[2024] China National Scholarship (Top 1%)
[2022-2024] First-class Academic Excellence Scholarship of Tianjin University
[2024] Outstanding Student of Tianjin University
When I’m not in research mode, I enjoy swimming and playing the violin (I’m a fluent sight-reader). Both have been with me for nearly twenty years. My favorite virtuoso is Ray Chen; his interpretations always carry a distinct personal touch. I hope to approach my research in the same way.
I also used to be a fairly capable road cyclist, but I gave up the sport after a few close calls.