I am a first-year Ph.D. student at Washington University in St. Louis, advised by Prof. Jiaxin Huang.
I am interested in multimodal learning and natural language processing, with the goal of improving how well intelligent models understand and generate complex data in open-world scenarios. Recently, my research has focused on the efficiency and trustworthiness of (multimodal) large language models.
My earlier line of multimodal research (PGIM->RiVEG->SMNER->VP-MEL->Balanced-Info-MPRM) explored how to unlock the capabilities of vision-language models in complex multimodal scenarios, how to make multiple models interact and collaborate effectively, and how to build image-text knowledge augmentation methods for open-world scenarios.
I care more about problems worth solving than about staying within a specific field. Feel free to contact me so we can explore possibilities together.
Vision and Language:
Knowledge-Grounded (Multimodal) Large Language Model
-- PGIM (EMNLP'23), RiVEG (ACL'24), SMNER (IEEE TMM'25), VP-MEL (ACL'25), D-ARTEMIS (arXiv'25), RelayLLM (arXiv'26), Balanced-Info-MPRM (arXiv'26)
Image Inpainting and Synthesis
[2025] Outstanding Graduation Thesis of Tianjin University
[2025] Outstanding Graduate of Tianjin University
[2024] China National Scholarship (Top 1%)
[2022-2024] First-class Academic Excellence Scholarship of Tianjin University
[2024] Outstanding Student of Tianjin University
Baidu Inc.
Beijing, Apr. 2024 - Jun. 2024
Focused on vision-language models for visual document understanding.
When I’m not in research mode, I enjoy swimming and playing the violin (I sight-read fluently). Both have been with me for nearly twenty years. My favorite virtuoso is Ray Chen. His interpretations always carry a distinct personal touch, and I hope to approach my research in the same way.
I used to be a capable road cyclist, but I gave up the sport after a few close calls.