Research


Foundation models, Transformer++ architectures, efficient pre-training, and knowledge distillation