Speaker: Dr. Dongfang Liu, Rochester Institute of Technology
Time: 10:00 am - 11:30 am on 11-06-2024 (Wednesday)
Room: E297L, Discovery Park, UNT
Zoom link: Zoom link
Coordinators: Drs. Heng Fan and Yunhe Feng
Abstract:
The advent of foundation models, particularly large language models (LLMs), has revolutionized various fields. However, the conventional approach of training LLMs from scratch, guided by scaling laws, is not only resource-intensive but also produces models with redundant capabilities, creating a bottleneck for further innovation. In this talk, we introduce a novel approach that overcomes these challenges by fusing pre-trained LLMs with different architectures at the logit level. This method bypasses costly retraining while enhancing model performance through architecture-agnostic integration. At the core of our approach is token alignment, which harmonizes token representations produced by different tokenizers. We reformulate this task as a classical optimal transport problem, allowing us to leverage distribution-aware learning for more coherent and interpretable model fusion. This not only yields models with enhanced coherence but also offers deep insights into the alignment process, paving the way for fusing stronger baselines across diverse tokenizers. The method marks a paradigm shift in the pursuit of artificial general intelligence, moving beyond the brute-force scaling of models. Its potential to reduce computational costs while improving model performance holds great promise for socially impactful applications, making this work both valuable and transformative for the research community.
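To make the abstract's two ideas concrete, here is a minimal sketch of logit-level fusion with optimal-transport token alignment. It is not the speaker's implementation: the Sinkhorn solver, the cosine-distance cost over token embeddings, the fusion weight alpha, and all shapes are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the presented method):
# align two tokenizers' vocabularies with entropy-regularized optimal
# transport, then fuse two models at the logit level.
import numpy as np

def sinkhorn(cost, reg=0.05, n_iters=200):
    """Entropy-regularized OT plan between uniform marginals over two vocabularies."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)                      # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]           # transport plan, shape (n, m)

def align_and_fuse(logits_a, logits_b, emb_a, emb_b, alpha=0.5):
    """Map model B's logits into model A's token space, then average at the logit level.

    logits_a: (n,) logits over tokenizer A's vocabulary
    logits_b: (m,) logits over tokenizer B's vocabulary
    emb_a, emb_b: token embeddings used only to define the OT cost (assumption)
    """
    # Cost: cosine distance between token embeddings of the two vocabularies.
    ea = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    eb = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cost = 1.0 - ea @ eb.T

    plan = sinkhorn(cost)                        # soft token-to-token alignment
    # Row-normalize so each A-token receives a convex combination of B-token
    # logits, then fuse the two logit vectors with a simple weighted average.
    mapping = plan / plan.sum(axis=1, keepdims=True)
    logits_b_in_a = mapping @ logits_b
    return alpha * logits_a + (1.0 - alpha) * logits_b_in_a

# Toy usage with random data (vocabulary sizes and embedding dim are made up).
rng = np.random.default_rng(0)
n_vocab_a, n_vocab_b, d = 50, 60, 16
fused = align_and_fuse(
    rng.normal(size=n_vocab_a), rng.normal(size=n_vocab_b),
    rng.normal(size=(n_vocab_a, d)), rng.normal(size=(n_vocab_b, d)),
)
print(fused.shape)  # (50,) -- fused logits in tokenizer A's space
```

The sketch only shows why a distribution-aware alignment (the OT plan) is needed before logits from differently tokenized models can be combined; the talk's actual formulation and solver may differ.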