Data Filtering Challenge for Training Edge Language Models
Welcome to the Data Filtering Challenge for Training Edge Language Models!
Introduction and Motivation
The rapid development of language models (LMs) has catalyzed breakthroughs across various domains, including natural language understanding, robotics, and digital human interaction. Compared with general large LMs, which are difficult to deploy on resource-constrained edge devices, edge LMs fine-tuned for target downstream tasks have the potential to achieve both greater efficiency and higher task accuracy. However, this fine-tuning hinges on the availability of high-quality, diverse datasets. The Data Filtering Challenge for Training Edge Language Models seeks to unite academic researchers, industry experts, and AI enthusiasts to develop data filtering techniques that refine datasets driving the next generation of edge LMs.
This challenge invites participants to create data filtering techniques and submit datasets refined by these methods, aiming to significantly enhance the performance of edge LMs on their target downstream tasks. With a focus on improving model accuracy and applicability across crucial domains, participants will have the opportunity to push the frontier of edge LMs and gain recognition within the AI community. For this edition, we are focusing on Low-Rank Adaptation (LoRA), a method that enables the creation of efficient task-specific edge LMs from pre-trained ones using fewer resources, making it ideal for devices such as smartphones and portable robots.
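To make the LoRA setup concrete, below is a minimal sketch of the core idea in PyTorch: the pre-trained weight matrix is frozen, and only a pair of low-rank factors is trained, so the trainable parameter count drops from d_in × d_out to r × (d_in + d_out). The layer sizes, rank, and scaling below are illustrative assumptions, not the challenge's reference implementation.

```python
# A minimal LoRA sketch (illustrative; not the challenge toolkit's code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pre-trained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        d_out, d_in = base.weight.shape
        # Low-rank update: delta_W = B @ A, scaled by alpha / r.
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: wrap a projection of a pre-trained model, then fine-tune on the
# curated dataset; only A and B receive gradients.
layer = LoRALinear(nn.Linear(768, 768), r=8)
x = torch.randn(4, 768)
print(layer(x).shape)  # torch.Size([4, 768])
```

Because only the small factors A and B are updated, a task-specific adapter can be trained and stored at a fraction of the cost of full fine-tuning, which is what makes LoRA attractive for edge deployment.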
Scope of this Challenge
Participants are encouraged to develop and apply data filtering techniques to curate datasets optimized for key use cases in edge LM deployment. These datasets aim to enhance the performance of edge LMs in diverse scenarios, including:
Roleplay in interactive digital environments
Function calling on mobile devices
Robotics for autonomous tasks
Retrieval-augmented generation (RAG) tasks
The goal is to ensure that edge LMs, continuously trained on these curated datasets, demonstrate significant improvements across these use cases. In particular, participants should highlight how these datasets, coupled with LoRA-enhanced models, improve accuracy and performance.
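As a concrete illustration of what a data filtering technique might look like, here is a minimal sketch of a heuristic quality filter and exact-deduplication pass in Python. The thresholds, the "text" field name, and the example record are assumptions for illustration only; the actual challenge toolkit may define its own data format and filtering interface.

```python
# A minimal data-filtering sketch (illustrative thresholds and field names).
import hashlib

def keep(example: dict, min_chars: int = 64, max_chars: int = 8192) -> bool:
    text = example["text"]
    if not (min_chars <= len(text) <= max_chars):
        return False  # drop samples that are too short or too long
    words = text.split()
    if len(set(words)) / max(len(words), 1) < 0.3:
        return False  # drop highly repetitive samples
    return True

def deduplicate(examples: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for ex in examples:
        digest = hashlib.md5(ex["text"].lower().encode()).hexdigest()
        if digest not in seen:  # exact-duplicate removal via content hashing
            seen.add(digest)
            unique.append(ex)
    return unique

raw = [{"text": "Call set_alarm(7, 'am') when the user asks for a wake-up call."}]
filtered = deduplicate([ex for ex in raw if keep(ex)])
```

Submissions would typically combine such heuristic passes with task-aware scoring (e.g., relevance to roleplay, function calling, robotics, or RAG) to select the examples most useful for LoRA fine-tuning.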
More details can be found on the Problem page.
News
12/09/2024: Our website is online!
Challenge Timeline
Website Release: Jan. 24, 2025
Toolkit Release: Jan. 24, 2025
Registration Deadline: Feb. 15, 2025
Submission Deadline: May 31, 2025
Award Notification: Jun. 20, 2025
Awards Ceremony / Workshop: Summer 2025
Awards
Grand Prize: $10,000
Category-Specific Awards: $3,000
Innovation Award: $3,000
Sponsors
Contest Organizers
Shizhe Diao (NVIDIA)
Yonggan Fu (Georgia Institute of Technology)
Xin Dong (NVIDIA)
Peter Belcak (NVIDIA)
Lexington Whalen (Georgia Institute of Technology)
Jan Kautz (NVIDIA)
Yingyan (Celine) Lin (Georgia Institute of Technology)
Pavlo Molchanov (NVIDIA)