Jatan Shrestha
Santeri Heiskanen
Kari Hepola
Severi Rissanen
Pekka Jääskeläinen
Joni Pajarinen
Multi-objective optimization (MOO) arises in many real-world applications where trade-offs between competing objectives must be carefully balanced. In the offline setting, where only a static dataset is available, the main challenge is generalizing beyond observed data. We introduce Pareto-Conditioned Diffusion (PCD), a novel framework that formulates offline MOO as a conditional sampling problem. By conditioning directly on desired trade-offs, PCD avoids the need for explicit surrogate models. To effectively explore the Pareto front, PCD employs a reweighting strategy that focuses on high-performing samples and a reference-direction mechanism to guide sampling towards novel, promising regions beyond the training data. Experiments on standard offline MOO benchmarks show that PCD achieves highly competitive performance and, importantly, demonstrates greater consistency across diverse tasks than existing offline MOO approaches.
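For a two-objective case, the reference-direction mechanism can be sketched as follows. The function name, the Das-Dennis-style simplex directions, and the `extrapolate` factor are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def reference_direction_targets(Y, n_dirs=5, extrapolate=1.1):
    """Hypothetical sketch: build conditioning targets along evenly spaced
    reference directions, pushed slightly beyond the observed front.
    Y: (n, 2) observed objective values, assuming minimization."""
    # Evenly spaced reference directions on the two-objective simplex
    # (a Das-Dennis-style construction restricted to m = 2).
    w = np.linspace(0.0, 1.0, n_dirs)
    dirs = np.stack([w, 1.0 - w], axis=1)        # (n_dirs, 2)

    # Ideal and nadir points of the observed data.
    ideal = Y.min(axis=0)
    nadir = Y.max(axis=0)

    # Move from the nadir toward the ideal along each direction; a factor
    # > 1 pushes targets slightly beyond the best observed values.
    return nadir - extrapolate * dirs * (nadir - ideal)
```

Directions near a simplex corner emphasize one objective; the extrapolation factor controls how far beyond the training data the conditioning targets lie.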
Training: A conditional diffusion model is trained on a static dataset, using a novel reweighting strategy to emphasize high-quality solutions near the Pareto front.
Sampling: At inference, the model directly generates novel designs conditioned on target objectives. This end-to-end approach sidesteps the need for the explicit surrogate models and separate optimizers required by prior methods.
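As a rough sketch of the training-time reweighting (the exact scheme used by PCD is not reproduced here), one plausible instantiation weights each sample by an exponential of its non-dominated rank, so that points on or near the Pareto front dominate the weighted denoising loss. The names `non_dominated_rank`, `pareto_weights`, and the `temperature` knob are illustrative, assuming minimization:

```python
import numpy as np

def non_dominated_rank(Y):
    """Assign each point its non-dominated front index (0 = Pareto front).
    Y: (n, m) objective values, assuming minimization."""
    n = Y.shape[0]
    ranks = np.full(n, -1)
    remaining = np.arange(n)
    front = 0
    while remaining.size:
        Yr = Y[remaining]
        # Point i is dominated if some j is <= in all objectives and < in one.
        dominated = np.array([
            np.any(np.all(Yr <= Yr[i], axis=1) & np.any(Yr < Yr[i], axis=1))
            for i in range(Yr.shape[0])
        ])
        ranks[remaining[~dominated]] = front
        remaining = remaining[dominated]
        front += 1
    return ranks

def pareto_weights(Y, temperature=1.0):
    """Normalized exponential weights that emphasize near-Pareto samples;
    these would multiply each sample's denoising loss during training."""
    w = np.exp(-non_dominated_rank(Y) / temperature)
    return w / w.sum()
```

Lowering the temperature concentrates nearly all training weight on the first front; raising it recovers uniform weighting.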
Our proposed approach is evaluated using the comprehensive Offline MOO benchmark (Xue et al., 2024). We consider five diverse task categories: 1) Synthetic, 2) Multi-Objective Reinforcement Learning (MORL), 3) Real-World Applications (RE), 4) Scientific Design, and 5) Multi-Objective Neural Architecture Search (MONAS). For benchmark tasks that involve discrete values, we convert them into continuous logits, following the standard practice in prior work (Trabucco et al., 2022; Xue et al., 2024).
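One common recipe for the discrete-to-logits conversion (illustrative; the benchmark's exact transform may differ) is to one-hot encode each categorical value, smooth toward the uniform distribution, and take the log. The `smoothing` parameter and function names below are assumptions:

```python
import numpy as np

def discrete_to_logits(x, num_classes, smoothing=0.6):
    """Map integer-valued designs to continuous logits by smoothing a
    one-hot encoding and taking the log. x: integer array of class ids."""
    onehot = np.eye(num_classes)[x]                       # (..., C)
    probs = smoothing * onehot + (1.0 - smoothing) / num_classes
    return np.log(probs)

def logits_to_discrete(logits):
    """Recover the discrete design by taking the argmax over classes."""
    return np.argmax(logits, axis=-1)
```

The smoothing keeps all logits finite, and the argmax round-trip recovers the original discrete design exactly.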
Average rank (↓) of PCD and baseline methods across five task categories. Ranks are computed from the 100th-percentile Hypervolume (HV). Bold and underlined rows indicate the best and runner-up methods, respectively. PCD achieves the best overall average rank, demonstrating its strong and consistent performance.
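Hypervolume, the metric behind these ranks, measures the objective-space volume dominated by a solution set relative to a reference point. A minimal exact computation for the two-objective minimization case (an assumption for illustration; the benchmark tasks vary in objective count):

```python
import numpy as np

def hypervolume_2d(Y, ref):
    """Exact 2-objective hypervolume (minimization) of solution set Y with
    respect to reference point ref; assumes all points weakly dominate ref."""
    # Sort by the first objective (ascending), breaking ties on the second.
    Y = Y[np.lexsort((Y[:, 1], Y[:, 0]))]
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in Y:
        if f2 < best_f2:  # non-dominated so far: contributes a new rectangle
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv
```

Each non-dominated point adds the rectangle between itself, the reference point, and the previous best second objective; dominated points contribute nothing.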
Left: PCD faithfully reconstructs the original conditioning points.
Right: In this lower-dimensional task, the conditioning is highly effective, and the generated points closely align with and even outperform their conditioning targets.
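One simple, illustrative way to quantify "outperform their conditioning targets" is the fraction of generated designs that match or beat their target in every objective (assuming minimization; the metric and its name are not from the paper):

```python
import numpy as np

def target_attainment(Y_gen, Y_cond):
    """Fraction of generated designs whose objective values are at least as
    good as their conditioning target in every objective (minimization)."""
    hits = np.all(Y_gen <= Y_cond, axis=-1)
    return hits.mean()
```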
@inproceedings{shrestha2026paretoconditioned,
title={Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization},
author={Jatan Shrestha and Santeri Heiskanen and Kari Hepola and Severi Rissanen and Pekka Jääskeläinen and Joni Pajarinen},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=S2Q00li155}
}