Speakers

Scaling Track Presentations


Paul Bogdan  (University of Southern California)

Paul Bogdan is an Associate Professor of Electrical and Computer Engineering at the University of Southern California (USC), where he holds the Jack Munushian Early Career Chair. His research focuses on developing novel graph theory, network science, machine learning, and artificial intelligence techniques for describing complex particle systems and their interactions.

Bogdan's work involves collaborating with researchers from various fields, including physics, chemistry, mechanical engineering, materials science, and applied math, to study and design complex systems. He has joined an international team of scientists as part of a new National Science Foundation (NSF) funded Center for Complex Particle Systems (COMPASS), where he will contribute his expertise in machine learning and AI.

Talk: Theoretical Foundations for Artificial Intelligence (AI) Inspired from Understanding Biological Intelligence (BI): Detecting Phase Transitions and Quantifying the Degree of Emergence in Deep Learning 

Darshil Doshi  (University of Maryland)

Darshil is an engineer-turned-physicist-turned-ML-researcher. He is currently a PhD candidate at the University of Maryland, College Park. His research interests focus on scaling and emergent behaviours in deep learning. His recent work on interpretability explores emergent phenomena such as in-context learning, skill composition, and grokking. Prior to this, he worked on signal propagation in deep neural networks. Darshil uses simplified setups to understand interesting aspects of deep learning at scale.

Talk:  Emergence of in-context learning and skill composition

Hailey Schoelkopf (EleutherAI)

Hailey Schoelkopf is a research scientist at EleutherAI, a non-profit AI research lab. Her main research focuses include reproducible and reliable LLM evaluation, and large-scale efficient training and inference of deep learning models. Her work centers on open access to models and tooling for open science, including open-source releases such as the Pythia and Llemma language models and maintenance of software tools such as the LM Evaluation Harness. At the workshop, she will present her recent paper, "Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?", which explores ways to establish a science of rigorous and scaling-predictable evaluations.


Talk:  Designing Predictable Evaluation Processes

Yuhai Tu  (IBM)

Yuhai Tu is a Principal Research Scientist at the IBM T. J. Watson Research Center. He received his PhD in theoretical physics from UCSD in 1991. After serving as a Division Prize Fellow at Caltech from 1991 to 1994, he joined the IBM Watson Research Center, where he served as head of the Theory group from 2003 to 2015. He has been a Fellow of the American Physical Society (APS) since 2004 and served as Chair of the APS Division of Biological Physics (DBIO) in 2017. He is also a Fellow of AAAS.

Yuhai Tu's research interests range from physics and biology to, more recently, machine learning. He has made seminal contributions in statistical physics and biophysics. For his work in theoretical statistical physics, he was awarded (together with John Toner and Tamas Vicsek) the 2020 Lars Onsager Prize, the highest honor in statistical physics from the APS. His recent work focuses on two directions: (1) molecular mechanisms of information processing in cellular biochemical networks; (2) statistical physics of learning in the brain and in artificial neural networks, in particular the dynamics of learning and generalization.


Talk: Statistical Physics of Deep-Learning: On Learning Dynamics and Generalization 


Multilingual Track Presentations


Preslav Nakov  (Mohamed bin Zayed University of Artificial Intelligence)


Preslav Nakov is Professor and Department Chair for NLP at the Mohamed bin Zayed University of Artificial Intelligence. He is part of the core team that developed Jais, the world's best open-source Arabic-centric LLM, as well as part of the LLM360 team at MBZUAI. He received his PhD degree in Computer Science from the University of California at Berkeley, supported by a Fulbright grant.


Talk: Multilinguality Challenges and Datasets for Evaluating LLMs

Neha Sengupta (G42)

Neha leads the NLP Applied Science team at G42 in Abu Dhabi. She joined G42 in September 2018 and has since built several NLP systems, including conversational agents, translation systems, and NLP-based data analytics engines. Her current interests and specialization include Arabic NLP and LLMs for low- to medium-resource languages.

She completed her PhD at the Indian Institute of Technology (IIT) Delhi, India, in 2019, focusing on efficient algorithms for large-scale graphs. Prior to her PhD, she worked at IBM Research Labs in Delhi, India, specializing in smart energy grids and resource scheduling.


Talk: Comparing adaptation and training from scratch for bilingual models


Tatiana Shavrina  (Meta)

Tatiana is passionate about open source and multilingualism in LLMs. An enthusiast of benchmarking methods, she contributed to BLOOM as the lead for interpretability, led the development of the mGPT model, and has contributed to methods for low-resource NLP.


Talk: Towards Full Linguistic Diversity in Language Models

Yishi Xu  (Cerebras Systems)

Yishi Xu is a machine learning researcher at Cerebras Systems, a company known for its innovative AI hardware. At Cerebras, Xu focuses on developing and evaluating multilingual and domain-specific LLMs, leveraging techniques such as scaling laws, continual pretraining, and vocabulary extension to enhance multilingual and domain-specific capabilities.

Rio Yokota  (Tokyo Institute of Technology)

Rio Yokota is a Professor at the Global Scientific Information and Computing Center, Tokyo Institute of Technology. His research interests lie at the intersection of high performance computing, linear algebra, and machine learning. He is the developer of numerous libraries for fast multipole methods (ExaFMM), hierarchical low-rank algorithms (Hatrix), and information matrices in deep learning (ASDL) that scale to the full system on the largest supercomputers today. He has been optimizing algorithms on GPUs since 2006 and was part of a team that received the Gordon Bell Prize in 2009 using the first GPU supercomputer.

Talk: Continual Pre-training of Open-Source Models on Japanese Text