Research


Foundation models, Transformer++ architectures, efficient pre-training, and knowledge distillation