Today’s world needs orders-of-magnitude more efficient ML to address environmental and energy crises, optimize resource consumption, and improve sustainability. With the end of Moore’s Law and Dennard scaling, we can no longer expect more and faster transistors for the same cost and power budget. This is particularly problematic given the growing data volumes collected by ubiquitous sensors and systems, the ever-larger models we train, and the fact that most ML models have to run on edge devices to minimize latency, preserve privacy, and save energy. The algorithmic efficiency of deep learning therefore becomes essential for achieving the desired speedups, alongside efficient hardware implementations and compiler optimizations for common math operations. ML efficiency is being actively investigated in many research communities. This reading group aims to help onboard young scientists interested in the topic and offers researchers at all levels a platform for open dialog, fostering collaboration and staying up to date with rapid developments in the field of efficient ML. We welcome and discuss fresh research findings published as pre-prints or recently presented at research venues. The list of topics includes, but is not limited to:
Advancing parameter-efficient fine-tuning techniques by exploring singular-vector-guided updates to adapt large-scale pre-trained models to specific downstream tasks. Traditional parameter-efficient fine-tuning (PEFT) methods such as LoRA and DoRA achieve efficiency by introducing additional low-rank or additive updates, but often compromise on performance or require a significant parameter budget; a minimal sketch of a singular-vector-guided update follows the list below.
Key Research Areas:
Parameter Efficiency in Model Fine-Tuning
Comparison and Evaluation of PEFT Techniques
Task-Specific Sparsity Patterns and Performance
Scalability and Adaptation in Large Language Models
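A minimal PyTorch sketch of one possible singular-vector-guided update, assuming an SVFT-style scheme in which the frozen weight's singular vectors stay fixed and only a small vector of spectral coefficients is trained. The class name SingularVectorAdapter and the layer sizes are illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn

class SingularVectorAdapter(nn.Module):
    """Sketch of a singular-vector-guided PEFT layer (SVFT-style assumption):
    the frozen weight W0 is decomposed as U S V^T and only a small trainable
    perturbation of the singular values is updated."""

    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # Freeze the pre-trained weight and cache its singular vectors.
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)    # (out, r)
        self.register_buffer("S", S)    # (r,)
        self.register_buffer("Vh", Vh)  # (r, in)
        # Trainable spectral coefficients -- the only new parameters.
        self.delta = nn.Parameter(torch.zeros_like(S))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight: W = U diag(S + delta) V^T
        W = self.U @ torch.diag(self.S + self.delta) @ self.Vh
        return x @ W.T


# Usage: wrap a frozen linear layer's weight; only `delta` receives gradients.
layer = nn.Linear(128, 64)
adapter = SingularVectorAdapter(layer.weight.detach())
out = adapter(torch.randn(4, 128))
```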
Exploring the potential of memory-augmented Transformers (Memformers) to serve as adaptive optimizers by implementing Linear First-Order Optimization Methods (LFOMs). The Memformer architecture, which uses memory registers to retain past gradient information, has shown promise in matching and even surpassing traditional optimization algorithms such as the conjugate gradient method in specific settings; a toy sketch of the underlying update rule follows the list below.
Key Research Areas:
Leveraging Memory Augmentation for Advanced Optimization
Comparative Performance Against Classical Optimization Techniques
Transformers as Meta-Optimizers
Theoretical Foundations and Convergence Analysis
Efficiency and Practical Scalability
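A toy sketch of the LFOM update rule that such a Memformer is assumed to meta-learn: each step is a linear combination of the current gradient and all stored past gradients. The function lfom_step, the placeholder coefficients, and the quadratic objective are illustrative only; a trained Memformer would produce the coefficients from its memory registers.

```python
import torch

def lfom_step(params, grad, memory, coeffs, lr=1e-2):
    """One step of a Linear First-Order Optimization Method (LFOM):
    the update is a linear combination of all gradients seen so far,
    which is what a Memformer's memory registers are assumed to retain.

    params : current parameter tensor
    grad   : gradient at `params`
    memory : list of past gradient tensors (the "memory registers")
    coeffs : 1-D tensor of combination weights, len(memory) + 1 entries
    """
    memory = memory + [grad]
    # Weighted sum of current and past gradients (classical momentum and
    # conjugate-gradient-style methods are special cases of this form).
    update = sum(c * g for c, g in zip(coeffs, memory))
    return params - lr * update, memory


# Toy quadratic example: f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = torch.randn(5)
mem = []
for t in range(10):
    g = x.clone()                    # gradient of the toy objective
    c = torch.ones(t + 1) / (t + 1)  # placeholder coefficients; a Memformer
                                     # would emit these from its memory
    x, mem = lfom_step(x, g, mem, c)
```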
Advancing symmetry-aware sharpness metrics to improve generalization predictions for Transformer models by leveraging Riemannian geometry. Transformers exhibit unique symmetries, particularly in their attention mechanisms, that distort standard sharpness measures, making it challenging to accurately correlate sharpness with generalization performance. By applying geometric principles, we aim to develop metrics that respect these symmetries, leading to a clearer understanding of model sharpness in high-dimensional architectures; a sketch of the adaptive-sharpness baseline follows the list below.
Key Research Areas:
Developing Symmetry-Invariant Sharpness Measures
Comparative Analysis of Geodesic Sharpness and Adaptive Sharpness
Evaluating Transformer Symmetries in Attention Mechanisms
Potential for Sharpness-Aware Optimization in Training
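For context, a Monte-Carlo sketch of adaptive (elementwise-rescaled) sharpness, the baseline against which the geodesic measure is compared in the list above. The function name, the perturbation radius rho, and the toy model are illustrative assumptions; a geodesic variant would additionally constrain the perturbations according to the attention symmetries.

```python
import torch
import torch.nn as nn

def adaptive_sharpness(model, loss_fn, data, rho=0.05, n_samples=8):
    """Monte-Carlo estimate of adaptive sharpness: perturbations are scaled
    elementwise by |w|, making the measure invariant to per-parameter
    rescaling symmetries. This is the adaptive baseline only, not the
    geodesic measure discussed above."""
    x, y = data
    params = [p for p in model.parameters() if p.requires_grad]
    with torch.no_grad():
        base = loss_fn(model(x), y).item()
        worst = base
        for _ in range(n_samples):
            # Sample a perturbation proportional to parameter magnitude.
            noise = [rho * torch.randn_like(p) * p.abs() for p in params]
            for p, n in zip(params, noise):
                p.add_(n)
            worst = max(worst, loss_fn(model(x), y).item())
            for p, n in zip(params, noise):
                p.sub_(n)
    return worst - base


# Usage on a toy regression model (names and data are illustrative only).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)
print(adaptive_sharpness(model, nn.MSELoss(), (x, y)))
```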
Investigating task arithmetic as an efficient technique for editing pre-trained models, focusing on its capacity to add, combine, or remove task-specific capabilities with minimal interference. This approach leverages linear combinations of fine-tuned weights to achieve multi-task performance efficiently, bypassing the need for extensive retraining when adapting to new tasks. Central to this method is the concept of weight disentanglement, where distinct directions in the model’s weight space correspond to separate tasks, enabling isolated task manipulation; a minimal sketch of the weight-space arithmetic follows the list below.
Key Research Areas:
Developing Task Arithmetic for Efficient Model Adaptation
Investigating Weight Disentanglement Mechanisms
Examining Kernel-Based Approaches to Task Localization
Understanding the Role of Pre-Training in Task Disentanglement
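A minimal sketch of task arithmetic on model state_dicts: a task vector is the elementwise difference between fine-tuned and pre-trained weights, and capabilities are added or negated via a signed linear combination. The helper names and toy tensors are illustrative only; real use would load actual checkpoints.

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """A task vector is the elementwise difference between fine-tuned and
    pre-trained weights (state_dicts keyed by parameter name)."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained: dict, vectors: list, alphas: list) -> dict:
    """Edit the pre-trained model with a linear combination of task vectors:
    positive alphas add a capability, negative alphas remove (negate) one."""
    edited = {k: v.clone() for k, v in pretrained.items()}
    for tv, a in zip(vectors, alphas):
        for k in edited:
            edited[k] += a * tv[k]
    return edited


# Illustrative usage with toy state_dicts.
theta_pre = {"w": torch.zeros(3)}
theta_task_a = {"w": torch.tensor([1.0, 0.0, 0.0])}
theta_task_b = {"w": torch.tensor([0.0, 1.0, 0.0])}

tv_a = task_vector(theta_pre, theta_task_a)
tv_b = task_vector(theta_pre, theta_task_b)

# Add task A, remove task B (negation), each at unit scale.
theta_multi = apply_task_vectors(theta_pre, [tv_a, tv_b], [1.0, -1.0])
```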