Zeroth-Order Machine Learning: Fundamental Principles and Emerging Applications in Foundation Models
AAAI 2024 Tutorial
February 20, 2024 (2:00 pm – 6:00 pm PST)
Room 118
Overview
With the swift progression of artificial intelligence, driven notably by the rise of foundation models (FMs), e.g., large language models (LLMs), a plethora of fresh opportunities and challenges have arisen in the development of the next generation of ML algorithms. While auto-differentiation-based first-order (FO) optimizers, e.g., SGD and Adam, have been the predominant choices for model training and fine-tuning, an increasing number of scenarios have emerged in which obtaining FO gradient information is infeasible or computationally prohibitive. For example, as LLMs continue to scale, they incur significant memory overhead due to back-propagation (BP), and advances in addressing this challenge could also enable technological breakthroughs in related areas, such as on-device training. Similarly, a significant recent challenge is prompt learning for foundation-model-as-a-service, exemplified by platforms like ChatGPT, where FO gradients cannot be obtained directly because the model is accessible only as a black box. Such challenges are also prevalent in numerous applications in AI for science (AI4S), where ML models may interact with non-differentiable or black-box simulators/experiments whose learning objectives admit no analytical form.
In stark contrast, gradient-free zeroth-order (ZO) optimization techniques emerge as a viable approach for LLM fine-tuning, offering exceptional memory efficiency and broad applicability to a variety of black-box problems. These examples underscore the imperative of exploring alternatives to FO-based machine learning algorithms in the current era of FMs. In this tutorial, the following aspects will be covered; a brief sketch of the core ZO gradient estimator follows the topics below.
We will thoroughly assess the latest breakthroughs in the gradient-free learning paradigm, i.e., learning without gradients, which we term zeroth-order machine learning (ZO-ML), and delve into the theoretical and methodological foundations that underpin it.
We will demonstrate how ZO-ML techniques integrate seamlessly with ML/AI applications, bridging the gap between ZO-ML theory and practical implementation. Through this integration, we will showcase how ZO-ML can remove design barriers in today's FM-oriented applications.
We endeavor to nurture a continuous cycle of research and education that encourages synergistic interactions among foundational research, use-inspired research, educational activities, and knowledge transfer between academia and industry. Through this tutorial, we aspire to create a lasting impact on both the academic and industrial communities.
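To make the above concrete, here is a minimal, self-contained sketch of the classical two-point randomized ZO gradient estimator, which replaces back-propagation with function-value queries. It is illustrative only and not part of the tutorial materials; the toy objective `loss`, the smoothing radius `mu`, the query count `q`, and the step size are assumptions chosen for readability.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, q=10, rng=None):
    """Two-point randomized ZO gradient estimate of f at x.

    Averages q central finite differences of f along random Gaussian
    directions u: (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u.
    Only function evaluations are needed; no back-propagation.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(q):
        u = rng.standard_normal(x.shape)
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / q

# Toy "black-box" objective: we may query its value but not its gradient.
def loss(x):
    return float(np.sum((x - 1.0) ** 2))

x = np.zeros(50)
for _ in range(500):
    x -= 0.05 * zo_gradient(loss, x)  # SGD-style update with the ZO estimate

print(f"final loss: {loss(x):.4f}")  # approaches 0 as x nears the all-ones vector
```

Because the perturbation directions can be regenerated from a stored random seed rather than kept in memory, updates of this form can run at roughly inference-level memory, which is the memory-efficiency property for LLM fine-tuning alluded to above.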
Table of Contents
Speakers