Research

Research Funding

Robust Convex Reinforcement Learning. NCCR Automation, Swiss National Science Foundation.

Sustainable Supply Chain. NCCR Automation, Swiss National Science Foundation.

Research Directions

My research interests lie in data-driven decision-making. The methodology draws on stochastic and robust optimization, machine learning, reinforcement learning, and statistics. I study problems arising in language models, supply chain management, revenue management and pricing, meta-learning, and causal inference.

Global Optimality of Structured Nonconvex Optimization and Applications in Operations Models and Machine Learning

Finding global optima for general nonconvex optimization problems is well known to be computationally intractable. Fortunately, many nonconvex problems in areas such as supply chain management, revenue management, causal inference, and reinforcement learning exhibit well-structured properties. For instance, some possess hidden convexity, meaning they can be reformulated as convex problems through a variable transformation. However, these transformations are often unknown or computationally infeasible to identify, making it difficult to solve the convex reformulation to global optimality. Additionally, certain problems, such as optimizing base-stock policies in inventory management, involve dynamic programming with convex recursions yet remain nonconvex in the decision variables; when randomness is introduced, dynamic programming becomes particularly computationally intensive.
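As a stylized illustration of hidden convexity (toy notation of my own, not drawn from any specific paper below): a nonconvex objective that factors through an invertible map becomes convex after a change of variables.

```latex
% Stylized hidden convexity (toy notation): f is nonconvex in x, but factors
% through an invertible map c with a convex outer function g.
\[
  \min_{x \in \mathcal{X}} \; f(x) = g\bigl(c(x)\bigr)
  \quad\Longleftrightarrow\quad
  \min_{y \in c(\mathcal{X})} \; g(y), \qquad y = c(x),
\]
% provided the image c(X) is convex; the global solution is recovered as
% x^* = c^{-1}(y^*), even though f itself is nonconvex in x.
```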

This raises a natural research question: can one efficiently find the global optima of these structured nonconvex optimization problems? We provide a positive answer. For problems with hidden convexity, we design easy-to-implement, globally convergent algorithms that solve the hidden convex problem directly in the original variables. These methods consistently outperform bid-price control benchmarks, achieving higher revenues in passenger and air-cargo network revenue management. The approach also applies to pricing-based network revenue management, inventory management with random supply, capacity, or yield, linear quadratic control, convex reinforcement learning, and certain causal discovery problems. For dynamic problems arising from inventory systems and cash balance problems, we show that policy optimization in finite-horizon Markov decision processes exhibits a benign nonconvex landscape. Consequently, policy gradient methods converge efficiently to global optima despite the nonconvexity. The goal of this line of work is to expand the class of nonconvex problems that can be solved efficiently and to provide solution methods with provable guarantees.
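A minimal numerical sketch of the same phenomenon (a toy problem of my own choosing, not an example from the papers below): plain gradient descent, run directly on a nonconvex objective with hidden convex structure, still reaches the global optimum.

```python
# Toy example: f(x) = x**6 - 2*x**3 is nonconvex in x, but under y = x**3 it
# becomes g(y) = y**2 - 2*y, which is convex with minimizer y* = 1 (so x* = 1).
# Gradient descent in the original variable still finds the global optimum.
def grad_f(x):
    return 6 * x**5 - 6 * x**2   # f'(x) = 6x^2 (x^3 - 1)

x = 2.0                          # start away from the degenerate point x = 0
for _ in range(500):
    x -= 0.01 * grad_f(x)
print(f"x = {x:.4f} (global minimizer: 1.0)")
```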

Zhenyu Wang*, Yifan Hu*, Peter Bühlmann, Zijian Guo.

(α-β) Xin Chen, Yifan Hu, and Minda Zhao

(α-β) Ilyas Fatkhullin, Niao He, Yifan Hu

Major revision at a journal.

(α-β) Xin Chen, Niao He, Yifan Hu, and Zikun Ye

Operations Research, 2024.

Stochastic Optimization and Machine Learning with Biased Oracles

Stochastic gradient descent (SGD) and its variants have become the engine for training modern machine learning and artificial intelligence systems. A key underlying assumption in these methods is the availability of unbiased gradient estimators, typically obtained via backpropagation. However, many optimization and machine learning problems do not have easily accessible unbiased gradient estimators, particularly when considerations such as personalization, robustness, privacy, and communication come into play. In these cases, one often encounters bilevel, min-max, compositional, or multi-stage stochastic optimization problems. For example, fine-tuning large language models (LLMs) involves bilevel reinforcement learning; meta-learning, end-to-end learning, and personalized learning are bilevel optimization problems; robust learning and generative adversarial networks (GANs) are min-max problems; and causal learning can take the form of compositional optimization.
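Schematically, and in my own notation, these problems share the following nested template (conditional stochastic optimization is one representative instance):

```latex
% Schematic template (my notation) for the nested problems above.
\[
  \min_{x} \; F(x)
  \;=\;
  \mathbb{E}_{\xi}\Bigl[\, f_{\xi}\bigl(\mathbb{E}_{\eta \mid \xi}\bigl[\, g_{\eta}(x, \xi) \,\bigr]\bigr) \Bigr].
\]
% The inner expectation sits inside the nonlinear map f_xi, so replacing it
% with a sample average yields a biased gradient estimate: standard unbiased
% SGD arguments no longer apply.
```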

These problems share a common challenge: constructing gradient estimators with small bias often requires a large number of samples or high computational cost. This raises a natural and important question: can we reduce bias in the learning process without incurring excessive sampling or computational costs, while also improving model performance? Through a series of works, I provide an affirmative answer. We design novel gradient-based methods that navigate the delicate tradeoff among bias, variance, and computational cost, addressing the bias issue in most of the applications above. Based on these works, I was invited to write a short review of four streams of stochastic biased gradient methods [Link]. As a milestone, my works characterize the boundary conditions under which bias does not degrade performance, offering insights into how to manage this tradeoff effectively.
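To make the bias-cost tradeoff concrete, here is a minimal sketch on a toy conditional stochastic optimization problem (the objective, the inner batch size m, and all names are my own illustrative choices, not code from the papers below): averaging m inner samples before applying the nonlinear outer derivative gives a gradient estimate whose bias decays at rate O(1/m), so m directly trades sampling cost against bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: F(x) = E_xi[ f( E_eta[ g(x, xi, eta) | xi ] ) ] with
# f(u) = sqrt(1 + u^2), g(x, xi, eta) = x - xi + eta, and xi, eta ~ N(0, 1).
# By symmetry the global minimizer is x = 0.

def f_prime(u):
    return u / np.sqrt(1.0 + u**2)

def biased_grad(x, m):
    """Plug-in gradient estimate using an inner batch of size m.

    Since f' is nonlinear, f'(inner sample average) is a biased estimate of
    f'(E[g | xi]); a Taylor expansion shows the bias decays at rate O(1/m).
    """
    xi = rng.standard_normal()
    u_hat = np.mean(x - xi + rng.standard_normal(m))  # inner sample average
    return f_prime(u_hat)                             # chain rule: dg/dx = 1

x, lr = 3.0, 0.1
for _ in range(2000):
    x -= lr * biased_grad(x, m=32)  # larger m: smaller bias, higher per-step cost
print(f"final x = {x:.3f} (global minimizer: 0)")
```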

Vinzenz Thoma*, Barna Pasztor*, Andreas Krause, Giorgia Ramponi, Yifan Hu

NeurIPS 2024.

Yifan Hu, Jie Wang, Xin Chen, and Niao He. 

Under review at Operations Research. (For the conference version, see NeurIPS 2021 [Link].)

Siqi Zhang*, Yifan Hu*, Liang Zhang, Niao He.  

AISTATS 2024.

Yifan Hu, Jie Wang, Yao Xie, Andreas Krause, Daniel Kuhn

NeurIPS 2023.

Yifan Hu, Xin Chen, and Niao He

NeurIPS 2021.

Yifan Hu*, Siqi Zhang*, Xin Chen, and Niao He

NeurIPS 2020.

Yifan Hu, Xin Chen, and Niao He

SIAM Journal on Optimization 2020.

Design with Side Information: Contextual Optimization

In the data-driven era, decision makers often collect more information than what is directly used in the decision-making process. Effectively leveraging this additional side information to enhance sequential decision-making forms the central theme of this line of work. Specifically, I focus on contextual bilevel reinforcement learning, extending beyond classical settings with only a single follower or only a static optimization problem. I adapt to more realistic scenarios where: (1) multiple followers exist, as in meta-learning, personalization, platform operations, and transportation; (2) followers respond not only to the leader's decisions but also to global uncertainties, as in end-to-end learning, optimization with side information, and causal optimal transport; and (3) followers do not react optimally right away but instead learn the environment set up by the leader. Such problems cover a wide range of design problems, including information design, reward design for reinforcement learning from human feedback in large language models, tax design in economics, transportation network design, imitation learning, and inverse RL. Many of these problems are notoriously hard and computationally intensive, yet often require timely feedback, so efficiency becomes even more critical. A key difficulty is that most existing single-loop bilevel optimization methods do not extend to this setting, while double-loop methods suffer from highly suboptimal complexity. This line of research builds on stochastic optimization with biased oracles and is of independent interest to a broader audience.
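Schematically, and in my own notation, the contextual stochastic bilevel problems described above take the following form, where the leader commits to x before the context is realized and each follower best-responds to both the leader's decision and the context:

```latex
% Schematic contextual stochastic bilevel problem (my notation): the leader
% picks x before observing the context xi; followers best-respond to both.
\[
  \min_{x} \;\; \mathbb{E}_{\xi}\bigl[\, F\bigl(x,\, y^{*}(x, \xi),\, \xi\bigr) \bigr]
  \quad \text{s.t.} \quad
  y^{*}(x, \xi) \in \arg\min_{y} \; G(x, y, \xi).
\]
```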

Vinzenz Thoma*, Barna Pasztor*, Andreas Krause, Giorgia Ramponi, Yifan Hu

NeurIPS 2024.

Yifan Hu, Jie Wang, Yao Xie, Andreas Krause, Daniel Kuhn

NeurIPS 2023.

Large-Scale Causal Inference: An Optimization Perspective

Understanding correlation alone is insufficient for effective decision-making; causality must also be understood. While the statistics and causality communities have developed various methods for causal inference, these methods often face significant computational challenges when applied to large-scale problems, for three reasons: (1) principled approaches for designing simple, single-stage optimization objectives for causal inference problems are lacking, making it difficult to directly apply many deep learning tools; (2) many existing methods rely on closed-form solutions, which do not generalize to neural network approximations; and (3) some causality problems, once formulated as optimization problems, are inherently nonconvex.

The first two challenges require careful mathematical optimization modeling. Leveraging my expertise in stochastic optimization with biased oracles, I demonstrate that many causal inference problems traditionally solved via two-stage procedures or zero-sum games (for instance, instrumental variable regression) can be reformulated as simple single-stage optimization problems. This enables the use of modern deep learning techniques. For the third challenge, the design of the optimization model requires both identifiability and computational tractability. I show that the proposed optimization formulations for some causality problems are not only identifiable but also exhibit benign nonconvexity. Leveraging my expertise in global optimality for nonconvex optimization, I have also designed efficient algorithms to identify causal relationships. My aim for this line of research is to bridge the gap between optimization and causality, providing more versatile optimization tools for modeling and solving causality problems efficiently. I collaborate closely with researchers in statistics and causality to complement our expertise. Two ongoing projects will be released soon: one addressing brute-force search in causal discovery and the other simplifying multi-step procedures in causal inference.
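As a concrete instance of the first two points (my shorthand; the precise formulations appear in the papers below): instrumental variable regression, classically solved by a two-stage procedure or a min-max game, can be written as a single-stage objective over a hypothesis class.

```latex
% Instrumental variable regression in single-stage form (my shorthand):
% instead of two-stage least squares (regress X on Z, then Y on the fitted
% values) or a min-max reformulation, one can minimize directly
\[
  \min_{h \in \mathcal{H}} \;\;
  \mathbb{E}_{Z}\Bigl[\bigl(\mathbb{E}\bigl[\, Y - h(X) \,\big|\, Z \,\bigr]\bigr)^{2}\Bigr],
\]
% a single-stage objective amenable to neural network classes H and to the
% biased stochastic gradient methods discussed above.
```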

Zhenyu Wang*, Yifan Hu*, Peter Bühlmann, Zijian Guo.

Xuxing Chen*, Abhishek Roy*, Yifan Hu, Krishna Balasubramanian. 

NeurIPS 2024.

Robust and Safe Reinforcement Learning and LLMs

Reinforcement learning (RL) has powered significant advances in areas such as autonomous driving, large language models and generative models (e.g., OpenAI o1), protein folding, and clinical trials. As artificial intelligence systems become increasingly involved in everyday life, ensuring that they are robust and reliable is essential. Robust RL and robust reinforcement learning from human feedback (RLHF) are central to building systems that perform well in unpredictable or unseen scenarios while adhering to necessary safety standards. My focus is on enhancing both the robustness and the safety of RL systems, aiming to develop AI that remains dependable and effective even in challenging real-world environments.
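For concreteness, the standard max-min template behind robust RL (schematic, in my own notation): the policy is optimized against the worst-case transition model in an uncertainty set.

```latex
% Standard max-min template behind robust RL (schematic, my notation):
% optimize the worst-case discounted return over an uncertainty set U of
% transition models P.
\[
  \max_{\pi} \; \min_{P \in \mathcal{U}} \;\;
  \mathbb{E}_{\pi, P}\Bigl[\, \sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t) \Bigr].
\]
```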

Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Yifan Hu, Andreas Krause, Ilija Bogunovic

AISTATS 2024. 

Shyam Sundhar Ramesh, Yifan Hu, Iason Chaimalas, Viraj Mehta, Pier Giuseppe Sessa, Haitham Bou Ammar, Ilija Bogunovic

NeurIPS 2024.