Call for Papers
This workshop solicits contributions that bridge the gap between deep learning theory and the modern practice of deep learning, in an effort to build a mathematical theory of machine learning that can both explain and inspire modern practice. We welcome new mathematical analyses that narrow this gap, as well as empirical findings that challenge existing theories and offer avenues for future theoretical investigation.
Full Paper Submission Deadline: Oct 4, 2023, 11:59 pm AOE (Note: this date was originally Oct 2)
Accept/Reject Notification Date: Oct 27, 2023, 11:59 pm AOE
Submission link: https://openreview.net/group?id=NeurIPS.cc/2023/Workshop/M3L
Submission Format:
The reviewing process will be double-blind and all submissions must be anonymized. Please do not include author names, author affiliations, acknowledgments, or any other identifying information in your submission. Submissions and reviews will not be made public. Only accepted papers will be made public.
All submissions must be in PDF format and are required to use the LaTeX style file. Submissions are limited to four content pages, including figures and tables. Camera-ready versions may go up to five content pages.
Unlimited additional pages are allowed for references and supplementary materials. Please include the references and supplementary materials in the same PDF as the main paper.
Dual Submissions: This workshop is non-archival and will not have official proceedings. Workshop submissions can be submitted to other venues. We welcome ongoing and unpublished work, including papers that are under review at the time of submission. However, we do not accept submissions that have already been accepted for publication in other venues with archival proceedings (including NeurIPS 2023 main conference).
This workshop's main areas of focus include but are not limited to:
Reconciling Optimization Theory with Deep Learning Practice
Convergence analysis beyond the stable regime: How do optimization methods minimize training losses despite large learning rates and large gradient noise? How should we understand the Edge of Stability (EoS) phenomenon? What more realistic assumptions on the loss landscape and gradient noise could lead to training algorithms that converge faster in both theory and practice?
Continuous approximations of training trajectories: Can we gain insight into discrete-time gradient dynamics by approximating them with a continuous counterpart, e.g., gradient flow or an SDE (see the sketch after this list)? When is such an approximation valid?
Advanced optimization algorithms: adaptive gradient algorithms, second-order algorithms, distributed training algorithms, etc.
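As a point of reference for the first two items above, here is a minimal sketch in standard notation, assuming \theta denotes the parameters, \eta the learning rate, \Sigma(\theta) the gradient-noise covariance, and \lambda_{\max} the largest eigenvalue of the loss Hessian:

\theta_{k+1} = \theta_k - \eta \nabla L(\theta_k)    (gradient descent)
\frac{d\theta(t)}{dt} = -\nabla L(\theta(t))    (gradient-flow limit)
d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{\eta}\, \Sigma(\theta_t)^{1/2}\, dW_t    (a common SDE approximation of SGD)
\eta < 2 / \lambda_{\max}(\nabla^2 L(\theta))    (classical descent condition for a quadratic loss)

In the Edge of Stability regime, the sharpness \lambda_{\max}(\nabla^2 L) is empirically observed to hover near 2/\eta rather than remaining below it, which the classical analysis does not explain.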
Generalization for Overparametrized Models
Implicit bias: What implicit bias do training algorithms have? How do gradient-based algorithms implicitly pick the solution with good generalization despite the existence of non-generalizing minimizers?
Generalization Measures: What is the relationship between generalization performance and common generalization measures (e.g., sharpness, margin, norms; see the sketch after this list)? Can we prove non-vacuous generalization bounds based on these measures?
Roles of Key Components in Algorithm and Architecture: What are the roles of initialization, learning rate warmup and decay, and normalization layers?
Intriguing Generalization Phenomena: Generalization despite overparameterization, double descent, benign overfitting, grokking, vulnerability to adversarial examples, etc.
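Two of the measures mentioned above can be made concrete as follows (a sketch; \rho is a perturbation radius and f_\theta(x)_y is the network's output score for label y on example (x, y)):

S_\rho(\theta) = \max_{\|\epsilon\|_2 \le \rho} L(\theta + \epsilon) - L(\theta)    (worst-case sharpness within radius \rho)
\gamma(x, y) = f_\theta(x)_y - \max_{y' \neq y} f_\theta(x)_{y'}    (classification margin on (x, y))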
Theory for Foundation Models/Pretrained Models
Pretraining: What do foundation models learn in pretraining that allows for efficient finetuning? How does the choice of dataset/architecture affect this?
Multimodal Representations: How can we learn representations from multimodal data?
Scaling laws: How and why does performance scale with data, compute, and model size (see the sketch after this list)?
Emergent Phenomena: In-context learning capabilities, few-shot reasoning capabilities such as Chain of Thought (CoT), and improved robustness/calibration.
Adaptation of Pretrained Models: Fine-tuning, prompting, in-context learning, instruction-tuning, RLHF, etc.
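For the scaling-laws item above, one commonly used parametric form (a sketch; N is the parameter count, D the number of training tokens, and E, A, B, \alpha, \beta fitted constants) writes the pretraining loss as

L(N, D) \approx E + A / N^{\alpha} + B / D^{\beta},

and fitting such a form to empirical runs is one way the tradeoff between model size and data under a fixed compute budget has been studied.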
Provable Guarantees Beyond Supervised Learning Settings
Deep Reinforcement Learning: How should we analyze the training dynamics of deep reinforcement learning algorithms?
Generative Models: How do different generative modeling methods compare? What do we understand about their complexity and efficiency, and are there fundamental limitations?
Representation Learning and Transfer Learning: What properties of the source and target tasks allow for efficient transfer learning? What types of representations can be learned via self-supervised learning (e.g., contrastive learning; see the sketch after this list)?
Continual Learning: How do we adapt a model to new tasks while preserving its performance on old tasks?
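As one concrete instance of the contrastive objectives mentioned above, an InfoNCE-style loss for a positive pair of representations (z_i, z_j) can be sketched as

\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k \neq i} \exp(\mathrm{sim}(z_i, z_k)/\tau)},

where \mathrm{sim} is a similarity such as cosine similarity, \tau is a temperature, and the sum runs over the other examples in the batch.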