Discrete Optimization and Machine Learning
Sebastian Pokutta (WiSe 2025 / Seminar)
Machine learning methods and discrete optimization methods are both important tools in many real-world applications. In this course we will primarily study the interplay of integer programming and, more broadly, discrete optimization methods with machine learning.
The format of the course is research-oriented and requires active participation from the students. The course will be a mix of student presentations, discussions, and project work. In the first few weeks, in-class projects will be defined.
Prerequisites: Linear Algebra, Analysis, and Discrete Optimization (ADM I/ADM II)
Registration: Together with paper selection after first meeting (seminar outline)
Participation requirements:
Students are expected to individually
Write a report (at least 5 pages, in LaTeX) about the paper and their own contribution.
Send the report to Jannis Halbey
During the semester, present their plans for the final report and presentation in a short 5-minute presentation.
Give a final presentation of their findings (details will be discussed in the first meeting).
Paper/project selection:
Students choose one of the papers below to work on
Up to 2 students can work on the same paper
Assignment is first come, first served
Send paper choice to Jannis Halbey
For fairness reasons, selections are only accepted after 14.10.2025 12:00
Contribution:
Every student is expected to make a contribution to the given topic and not just reproduce the results. The aim is not to obtain groundbreaking new results, but to get an impression of scientific work. Moreover, contributing something yourself requires a very sound understanding of the subject. Contributions can be an extension of the original algorithm, a new theoretical proof, or a comparison with more recent research.
Examples of strong contributions:
COIL: A Deep Architecture for Column Generation
Identify the OptNet layer as a main bottleneck
Examine the Lagrangian dual framework from Ferdinando Fioretto et al. (Lagrangian Duality for Constrained Deep Learning) as an alternative solution
Implement the modification and evaluate it empirically
Sparse Adversarial and Interpretable Attack Framework
Identify the vanilla Frank-Wolfe algorithm as a potential point for improvement (see the first sketch after this list)
Examine the Blended Pairwise Conditional Gradient variant
Implement the modification and evaluate it empirically
PEP: Parameter Ensembles by Perturbation
Compare with state-of-the-art Deep Ensembles (see the second sketch after this list)
Examine advantages and disadvantages of both methods
Implement a combination of both approaches and evaluate it empirically
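
To give a flavor of what such a modification starts from, here is a minimal sketch of a vanilla Frank-Wolfe iteration, the building block that the Blended Pairwise Conditional Gradient variant improves upon. The l1-ball feasible region, the quadratic objective, and all parameter values are illustrative assumptions and are not taken from the listed paper.

import numpy as np

def frank_wolfe_l1_ball(grad, x0, radius=1.0, steps=100):
    """Vanilla Frank-Wolfe over an l1-ball of the given radius (illustrative sketch)."""
    x = x0.copy()
    for t in range(steps):
        g = grad(x)
        # Linear minimization oracle for the l1-ball: the best vertex is a
        # signed coordinate vertex aligned against the largest gradient entry.
        i = int(np.argmax(np.abs(g)))
        v = np.zeros_like(x)
        v[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)            # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * v  # convex combination keeps x feasible
    return x

# Toy usage: minimize ||x - b||^2 over the unit l1-ball.
b = np.array([0.6, -0.2, 0.1])
x_opt = frank_wolfe_l1_ball(lambda x: 2.0 * (x - b), np.zeros(3))
print(x_opt)

Pairwise and blended variants replace the plain convex combination above with steps that also move weight away from previously selected "away" vertices, which is exactly the kind of change a contribution could implement and benchmark.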
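For the PEP example, a second minimal sketch shows a parameter-perturbation ensemble in PyTorch. The Gaussian noise scale, the number of ensemble members, and the softmax output are assumptions for illustration, not the paper's exact recipe.

import copy
import torch

def perturbation_ensemble_predict(model, x, num_members=10, sigma=0.01):
    """Average the predictions of copies of one trained model whose weights are
    perturbed with small Gaussian noise (illustrative PEP-style sketch; sigma and
    num_members are assumed values, not taken from the paper)."""
    preds = []
    for _ in range(num_members):
        member = copy.deepcopy(model)
        with torch.no_grad():
            for p in member.parameters():
                p.add_(sigma * torch.randn_like(p))  # perturb weights in place
        member.eval()
        with torch.no_grad():
            preds.append(torch.softmax(member(x), dim=-1))
    return torch.stack(preds).mean(dim=0)  # ensemble-averaged class probabilities

A deep ensemble, by contrast, trains several models independently from different initializations and averages their predictions in the same way, which is the baseline such a contribution would compare against.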
Timeline:
Seminar outline 13.10.2025 10:00 s.t. (online)
Open paper selection 14.10.2025 12:00 (mail to Jannis Halbey)
5-min presentations 18.11.2025 11:00 s.t. in room Mar 0.017
Report submission deadline 08.02.2026 23:59
Final presentations 09.02.2026 10:00 s.t. at Zuse Institute Berlin
Deep Learning / LLMs
Sun et al.
A Simple and Effective Pruning Approach for Large Language Models
https://arxiv.org/abs/2306.11695
Deep Learning / LLMs
Zimmer et al.
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging
https://arxiv.org/abs/2306.16788
Deep Learning / LLMs
Zimmer et al.
PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs
https://arxiv.org/abs/2312.15230
Deep Learning / LLMs
Jan Hendrik Kirchner, Yining Chen, Harri Edwards, Jan Leike, Nat McAleese, Yuri Burda
Prover-Verifier Games improve legibility of LLM outputs
https://arxiv.org/abs/2407.13692
Deep Learning / LLMs
Frantar et al.
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
https://arxiv.org/abs/2301.00774
Deep Learning / Neural Fields
Dupont, E., Kim, H., Eslami, S. M. A., Rezende, D., & Rosenbaum, D.
From data to functa: Your data point is a function and you can treat it like one
https://arxiv.org/abs/2201.12204
Diffusion Models / Sampling
Richter and Berner
Improved sampling via learned diffusions
https://arxiv.org/abs/2307.01198
Optimization
Okanovic, P., Kwasniewski, G., Labini, P. S., Besta, M., Vella, F., & Hoefler, T.
High Performance Unstructured SpMM Computation Using Tensor Cores
https://arxiv.org/abs/2408.11551
Optimization
David Applegate, Mateo Díaz, Oliver Hinder, Haihao Lu, Miles Lubin, Brendan O'Donoghue, Warren Schudy
PDLP: A Practical First-Order Method for Large-Scale Linear Programming
https://arxiv.org/abs/2501.07018
Optimization
Ye-Chao Liu, Jannis Halbey, Sebastian Pokutta, Sébastien Designolle
A Unified Toolbox for Multipartite Entanglement Certification
https://arxiv.org/abs/2507.17435
Optimization
Matteo Vandelli, Francesco Ferrari, Daniele Dragoni
Parallel splitting method for large-scale quadratic programs
https://arxiv.org/abs/2503.16977
Optimization
A. A. Vyguzov, F. S. Stonyakin
Adaptive Variant of Frank-Wolfe method for Relative Smooth Convex Optimization Problems
https://arxiv.org/abs/2405.12948
Boosting
Kasper Green Larsen, Martin Ritzert
Optimal Weak to Strong Learning