Create and deploy LLMs efficiently!
We would love to hear about your motivation for attending this tutorial! Please fill in this short post-event survey.
Large language models (LLMs) have taken the world by storm, revolutionizing the use of AI in products. While scaling laws demonstrate that larger models yield better results, making them work in production is hard, often due to the latency demands of inference. In this tutorial, we will share optimizations, both algorithmic and systems-related, that help leverage LLMs (both small and large) for recommendation and generative AI use cases at planet scale for the world's largest professional network, LinkedIn.
In the first part of the tutorial, we will discuss state-of-the-art (SOTA) model quantization and pruning techniques, in conjunction with GPU kernel-level optimizations including minimizing memory copies, effectively utilizing shared memory, optimizing thread scheduling, and maximizing parallel efficiency. We will share our own experience inventing and leveraging such techniques, while also covering the latest advancements from other enterprises and the open-source world. Our discussion will cover models ranging in size from 1 billion to 100 billion+ parameters.
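To give a flavor of the quantization techniques covered in the first part, here is a minimal sketch of symmetric per-tensor int8 weight quantization. This is a generic illustration, not the tutorial's actual method; the function names and the toy weight vector are our own.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * q.

    A single scale maps the largest-magnitude weight to 127,
    so each weight is stored in 1 byte instead of 4.
    """
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from int8 codes."""
    return q.astype(np.float32) * scale

# Toy example (illustrative values only).
w = np.array([0.5, -1.27, 0.031, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

Real deployments typically refine this basic recipe with per-channel or per-group scales and calibration data to reduce the rounding error further.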
In the second part of the tutorial, we will discuss the latest advancements in LLM knowledge distillation, which can be used to train very powerful and performant small language models (SLMs). We will also discuss effective instruction tuning and preference alignment techniques that help improve the accuracy and quality of results for generative use cases. Finally, we will discuss actual production use cases at LinkedIn that benefit from these techniques at planet scale.
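As a minimal sketch of the knowledge-distillation objective discussed in this part, the snippet below computes the classic temperature-scaled KL divergence between a teacher's and a student's output distributions (in the style of Hinton et al.). This is a generic illustration with made-up logits, not LinkedIn's production recipe.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled, numerically stable softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor keeps gradient magnitudes comparable across
    temperatures, as in the original distillation formulation.
    """
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student predictions
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1)
    return float(kl.mean() * T * T)

# Toy logits for one token over a 3-way vocabulary (illustrative only).
teacher = np.array([[1.8, 0.7, -1.2]])
student = np.array([[2.0, 0.5, -1.0]])
loss = distillation_loss(student, teacher)
```

In practice this soft-label term is usually blended with the standard cross-entropy loss on ground-truth labels, with the mixing weight tuned per use case.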
Kayhan Behdin (LinkedIn)
Yun Dai (LinkedIn)
Gregory Dexter (LinkedIn)
Aman Gupta (LinkedIn* - Now at Nubank)
Rahul Mazumder (LinkedIn)
Ankan Saha (LinkedIn)
Qingquan Song (LinkedIn)
Shao Tang (LinkedIn)
Sirou Zhu (LinkedIn)
Byron Hsu (LinkedIn* - Now at x.ai)
Intro to the tutorial and applications
Part 1 - Distillation, reasoning and alignment
Part 2 - Efficient Training
Coffee Break
Part 3 - Efficient Inference
Closing thoughts and Q & A
Contact our team at aiaf-optimization [at] linkedin.com