ICML 2024 Workshop on
Theoretical Foundations of Foundation Models
Workshop Summary
Recent advances in generative foundation models (FMs) such as large language models (LLMs) and diffusion models have propelled the capabilities of deep neural models to seemingly magical heights. Yet the soaring growth in model size and capability has also led to pressing concerns surrounding such modern AI systems. Scaling up these models significantly increases their energy consumption and deployment cost. Overreliance on AI may perpetuate existing inequalities and widen discrimination against certain groups of people. The gap between our understanding of the internal workings of FMs and their empirical success has also reached an unprecedented level, hindering accountability and transparency.
For decades, theoretical tools from statistics, information theory, and optimization have played a pivotal role in extracting information from unstructured data, and this continues to hold true in the era of neural models, including FMs. Statistical principles have been key to developing rigorous approaches to responsible AI systems, such as privacy and fairness. Information theory, particularly language modeling and compression techniques, underpins the design and capabilities of LLMs. Optimization theory guides the choice of training algorithms for LLMs, such as Adam and second-order methods. Multi-objective learning with proper information divergences has advanced the development of reinforcement learning from human feedback (RLHF), the core technique for language model alignment.
Currently, the rapid pace of FM development has outstripped theoretical investigation, creating a potential gap between theoretical researchers and the challenges surrounding FMs. This workshop provides a platform for bringing together researchers and practitioners from the foundation model and theory communities (including statistics, information theory, optimization, and learning theory) to discuss advances and challenges in addressing these concerns, with a focus on the following three themes:
Efficiency: The training and inference speed and computational costs of FMs hinder their general-purpose and widespread deployment. More efforts are needed to effectively compress, prune, or distill FMs to improve efficiency. Novel tools are also in demand to improve data efficiency in training and fine-tuning. Another emerging direction is how to serve FMs efficiently in light of modern machine learning hardware.
Responsibility: The growing challenges in the responsible use of FMs demand new theoretical studies. Addressing biases in training data, which typically consist of text scraped from publicly available Internet resources, is a largely under-explored area. The new paradigm of pre-training and fine-tuning FMs also requires novel developments in the principles of fairness, privacy, and alignment. How to enforce security and safety when deploying FMs is likewise a new and active area of research.
Principled Foundations: The key to improving the efficiency and responsibility of FMs is uncovering how they process information and make predictions. Despite the widespread use and success of FMs, we lack an understanding of why they are so good at compression/prediction, or of whether other architectures (e.g., state-space models) may be comparable to or even better than transformer-based models. In-context learning and other emergent capabilities of LLMs are also still not well understood.