Research

Research Funding

Robust Convex Reinforcement Learning. NCCR Automation, Swiss National Science Foundation.

Sustainable Supply Chain. NCCR Automation, Swiss National Science Foundation.

Research Directions

My research interests lie in data-driven decision-making. The methodology draws on stochastic and robust optimization, machine learning, reinforcement learning, and statistics. I study problems arising in language models, supply chain management, revenue management and pricing, meta-learning, and causal inference.

Global Optimality of Structured Nonconvex Optimization and Applications in Operations Models and Machine Learning

Finding global optima for general nonconvex optimization problems is well known to be computationally intractable. Fortunately, many nonconvex problems in areas such as supply chain management, revenue management, causal inference, and reinforcement learning exhibit well-structured properties. For instance, some possess hidden convexity, meaning they can be reformulated as convex problems through a variable transformation. However, these transformations are often unknown or computationally infeasible to identify, making it difficult to solve the convex reformulation to global optimality. Additionally, certain problems, such as optimizing base-stock policies in inventory management, involve dynamic programming with convex recursions yet remain nonconvex in the decision variables; when randomness is introduced, dynamic programming becomes particularly computationally intensive.
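As a stylized illustration of hidden convexity (toy notation of my own, not drawn from any specific paper below): a nonconvex objective that factors through an invertible map becomes convex after a change of variables.

```latex
% Stylized hidden convexity (toy notation): f is nonconvex in x, but factors
% through an invertible map c with a convex outer function g.
\[
  \min_{x \in \mathcal{X}} \; f(x) = g\bigl(c(x)\bigr)
  \quad\Longleftrightarrow\quad
  \min_{y \in c(\mathcal{X})} \; g(y), \qquad y = c(x),
\]
% provided the image c(X) is convex; the global solution is recovered as
% x^* = c^{-1}(y^*), even though f itself is nonconvex in x.
```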

This raises a natural research question: can one efficiently find the global optima of these structured nonconvex optimization problems? We provide a positive answer. For problems with hidden convexity, we design easy-to-implement, globally convergent algorithms that solve the hidden convex problem directly in the original variables. These methods consistently outperform bid-price control benchmarks, achieving higher revenues in passenger and air-cargo network revenue management. The approach also applies to pricing-based network revenue management, inventory management with random supply, capacity, or yield, linear quadratic control, convex reinforcement learning, and certain causal discovery problems. For dynamic problems arising from inventory systems and cash balance problems, we show that policy optimization in finite-horizon Markov decision processes exhibits a benign nonconvex landscape. Consequently, policy gradient methods converge efficiently to global optima despite the nonconvexity. The goal of this line of work is to expand the class of nonconvex problems that can be solved efficiently and to provide solution methods with provable guarantees.
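A minimal numerical sketch of the same phenomenon (a toy problem of my own choosing, not an example from the papers below): plain gradient descent, run directly on a nonconvex objective with hidden convex structure, still reaches the global optimum.

```python
# Toy example: f(x) = x**6 - 2*x**3 is nonconvex in x, but under y = x**3 it
# becomes g(y) = y**2 - 2*y, which is convex with minimizer y* = 1 (so x* = 1).
# Gradient descent in the original variable still finds the global optimum.
def grad_f(x):
    return 6 * x**5 - 6 * x**2   # f'(x) = 6x^2 (x^3 - 1)

x = 2.0                          # start away from the degenerate point x = 0
for _ in range(500):
    x -= 0.01 * grad_f(x)
print(f"x = {x:.4f} (global minimizer: 1.0)")
```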

Zhenyu Wang*, Yifan Hu*, Peter Bühlmann, Zijian Guo.

(α-β) Xin Chen, Yifan Hu, and Minda Zhao

(α-β) Ilyas Fatkhullin, Niao He, Yifan Hu

Major revision at a journal.

(α-β) Xin Chen, Niao He, Yifan Hu, and Zikun Ye

Operations Research, 2024.

Stochastic Optimization and Machine Learning with Biased Oracles

Stochastic gradient descent (SGD) and its variants have become the engine for training modern machine learning and artificial intelligence systems. A key underlying assumption in these methods is the availability of unbiased gradient estimators, typically obtained via backpropagation. However, many optimization and machine learning problems do not have easily accessible unbiased gradient estimators, particularly when considerations such as personalization, robustness, privacy, and communication come into play. In these cases, one often encounters bilevel, min-max, compositional, or multi-stage stochastic optimization problems. For example, fine-tuning large language models (LLMs) involves bilevel reinforcement learning; meta-learning, end-to-end learning, and personalized learning are bilevel optimization problems; robust learning and generative adversarial networks (GANs) are min-max problems; and causal learning can take the form of compositional optimization.
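Schematically, and in my own notation, these problems share the following nested template (conditional stochastic optimization is one representative instance):

```latex
% Schematic template (my notation) for the nested problems above.
\[
  \min_{x} \; F(x)
  \;=\;
  \mathbb{E}_{\xi}\Bigl[\, f_{\xi}\bigl(\mathbb{E}_{\eta \mid \xi}\bigl[\, g_{\eta}(x, \xi) \,\bigr]\bigr) \Bigr].
\]
% The inner expectation sits inside the nonlinear map f_xi, so replacing it
% with a sample average yields a biased gradient estimate: standard unbiased
% SGD arguments no longer apply.
```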

These problems share a common challenge: constructing gradient estimators with small bias often requires a large number of samples or high computational cost. This raises a natural and important question: can we reduce bias in the learning process without incurring excessive sampling or computational costs, while also improving model performance? Through a series of works, I provide an affirmative answer. We design novel gradient-based methods that navigate the delicate tradeoff among bias, variance, and computational cost, addressing the bias issue in most of the applications above. Based on these works, I was invited to write a short review of four streams of stochastic biased gradient methods [Link]. As a milestone, my works characterize the boundary conditions under which bias does not degrade performance, offering insights into how to manage this tradeoff effectively.
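To make the bias-cost tradeoff concrete, here is a minimal sketch on a toy conditional stochastic optimization problem (the objective, the inner batch size m, and all names are my own illustrative choices, not code from the papers below): averaging m inner samples before applying the nonlinear outer derivative gives a gradient estimate whose bias decays at rate O(1/m), so m directly trades sampling cost against bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: F(x) = E_xi[ f( E_eta[ g(x, xi, eta) | xi ] ) ] with
# f(u) = sqrt(1 + u^2), g(x, xi, eta) = x - xi + eta, and xi, eta ~ N(0, 1).
# By symmetry the global minimizer is x = 0.

def f_prime(u):
    return u / np.sqrt(1.0 + u**2)

def biased_grad(x, m):
    """Plug-in gradient estimate using an inner batch of size m.

    Since f' is nonlinear, f'(inner sample average) is a biased estimate of
    f'(E[g | xi]); a Taylor expansion shows the bias decays at rate O(1/m).
    """
    xi = rng.standard_normal()
    u_hat = np.mean(x - xi + rng.standard_normal(m))  # inner sample average
    return f_prime(u_hat)                             # chain rule: dg/dx = 1

x, lr = 3.0, 0.1
for _ in range(2000):
    x -= lr * biased_grad(x, m=32)  # larger m: smaller bias, higher per-step cost
print(f"final x = {x:.3f} (global minimizer: 0)")
```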

Vinzenz Thoma*, Barna Pasztor*, Andreas Krause, Giorgia Ramponi, Yifan Hu

NeurIPS 2024.

Yifan Hu, Jie Wang, Xin Chen, and Niao He. 

Under review at Operations Research. (For the conference version, see NeurIPS 2021 [Link].)

Siqi Zhang*, Yifan Hu*, Liang Zhang, Niao He.  

AISTATS 2024.

Yifan Hu, Jie Wang, Yao Xie, Andreas Krause, Daniel Kuhn

NeurIPS 2023.

Yifan Hu, Xin Chen, and Niao He

NeurIPS 2021.

Yifan Hu*, Siqi Zhang*, Xin Chen, and Niao He

NeurIPS 2020.

Yifan Hu, Xin Chen, and Niao He

SIAM Journal on Optimization 2020.

Design with Side Information: Contextual Optimization

In the data-driven era, decision makers often collect more information than what is directly used in the decision-making process. Effectively leveraging this additional side information to enhance sequential decision-making forms the central theme of this line of work. Specifically, I focus on contextual bilevel reinforcement learning, extending beyond classical settings with only a single follower or only a static optimization problem. I adapt to more realistic scenarios where: (1) multiple followers exist, as in meta-learning, personalization, platform operations, and transportation; (2) followers respond not only to the leader's decisions but also to global uncertainties, as in end-to-end learning, optimization with side information, and causal optimal transport; and (3) followers do not react optimally right away but instead learn the environment set up by the leader. Such problems cover a wide range of design problems, including information design, reward design for reinforcement learning from human feedback in large language models, tax design in economics, transportation network design, imitation learning, and inverse RL. Many of these problems are notoriously hard and computationally intensive, yet often require timely feedback, so efficiency becomes even more critical. A key difficulty is that most existing single-loop bilevel optimization methods do not extend to this setting, while double-loop methods suffer from highly suboptimal complexity. This line of research builds on stochastic optimization with biased oracles and is of independent interest to a broader audience.
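Schematically, and in my own notation, the contextual stochastic bilevel problems described above take the following form, where the leader commits to x before the context is realized and each follower best-responds to both the leader's decision and the context:

```latex
% Schematic contextual stochastic bilevel problem (my notation): the leader
% picks x before observing the context xi; followers best-respond to both.
\[
  \min_{x} \;\; \mathbb{E}_{\xi}\bigl[\, F\bigl(x,\, y^{*}(x, \xi),\, \xi\bigr) \bigr]
  \quad \text{s.t.} \quad
  y^{*}(x, \xi) \in \arg\min_{y} \; G(x, y, \xi).
\]
```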

Vinzenz Thoma*, Barna Pasztor*, Andreas Krause, Giorgia Ramponi, Yifan Hu

NeurIPS 2024.

Yifan Hu, Jie Wang, Yao Xie, Andreas Krause, Daniel Kuhn

NeurIPS 2023.

Large-Scale Causal Inference: An Optimization Perspective

Understanding correlation alone is insufficient for effective decision-making; causality must also be understood. While the statistics and causality communities have developed various methods for causal inference, these methods often face significant computational challenges when applied to large-scale problems, for three reasons: (1) principled approaches for designing simple, single-stage optimization objectives for causal inference problems are lacking, making it difficult to directly apply many deep learning tools; (2) many existing methods rely on closed-form solutions, which do not generalize to neural network approximations; and (3) some causality problems, once formulated as optimization problems, are inherently nonconvex.

The first two challenges require careful mathematical optimization modeling. Leveraging my expertise in stochastic optimization with biased oracles, I demonstrate that many causal inference problems traditionally solved via two-stage procedures or zero-sum games (for instance, instrumental variable regression) can be reformulated as simple single-stage optimization problems. This enables the use of modern deep learning techniques. For the third challenge, the design of the optimization model requires both identifiability and computational tractability. I show that the proposed optimization formulations for some causality problems are not only identifiable but also exhibit benign nonconvexity. Leveraging my expertise in global optimality for nonconvex optimization, I have also designed efficient algorithms to identify causal relationships. My aim for this line of research is to bridge the gap between optimization and causality, providing more versatile optimization tools for modeling and solving causality problems efficiently. I collaborate closely with researchers in statistics and causality to complement our expertise. Two ongoing projects will be released soon: one addressing brute-force search in causal discovery and the other simplifying multi-step procedures in causal inference.
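As a concrete instance of the first two points (my shorthand; the precise formulations appear in the papers below): instrumental variable regression, classically solved by a two-stage procedure or a min-max game, can be written as a single-stage objective over a hypothesis class.

```latex
% Instrumental variable regression in single-stage form (my shorthand):
% instead of two-stage least squares (regress X on Z, then Y on the fitted
% values) or a min-max reformulation, one can minimize directly
\[
  \min_{h \in \mathcal{H}} \;\;
  \mathbb{E}_{Z}\Bigl[\bigl(\mathbb{E}\bigl[\, Y - h(X) \,\big|\, Z \,\bigr]\bigr)^{2}\Bigr],
\]
% a single-stage objective amenable to neural network classes H and to the
% biased stochastic gradient methods discussed above.
```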

Zhenyu Wang*, Yifan Hu*, Peter Bühlmann, Zijian Guo.

Xuxing Chen*, Abhishek Roy*, Yifan Hu, Krishna Balasubramanian. 

NeurIPS 2024.

Robust and Safe Reinforcement Learning and LLMs

Reinforcement learning (RL) has powered significant advances in areas such as autonomous driving, large language models and generative models (e.g., OpenAI o1), protein folding, and clinical trials. As artificial intelligence systems become increasingly involved in everyday life, ensuring that they are robust and reliable is essential. Robust RL and robust reinforcement learning from human feedback (RLHF) are central to building systems that perform well in unpredictable or unseen scenarios while adhering to necessary safety standards. My focus is on enhancing both the robustness and the safety of RL systems, aiming to develop AI that remains dependable and effective even in challenging real-world environments.
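For concreteness, the standard max-min template behind robust RL (schematic, in my own notation): the policy is optimized against the worst-case transition model in an uncertainty set.

```latex
% Standard max-min template behind robust RL (schematic, my notation):
% optimize the worst-case discounted return over an uncertainty set U of
% transition models P.
\[
  \max_{\pi} \; \min_{P \in \mathcal{U}} \;\;
  \mathbb{E}_{\pi, P}\Bigl[\, \sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t) \Bigr].
\]
```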

Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Yifan Hu, Andreas Krause, Ilija Bogunovic

AISTATS 2024. 

Shyam Sundhar Ramesh, Yifan Hu, Iason Chaimalas, Viraj Mehta, Pier Giuseppe Sessa, Haitham Bou Ammar, Ilija Bogunovic

NeurIPS 2024.