Research Interests: Foundation models | Transformer++ architecture | Efficient Pre-training | Knowledge Distillation