Research

Research Funding

Our proposal is supported by NCCR Automation under the Swiss NSF. The grant funds a PhD position for four years.

Research Directions

Solving Nonconvex Optimization to Global Optimality and Applications in Supply Chain and Revenue Management

Many nonconvex problems in network revenue management, inventory control, and reinforcement learning admit a special structure known as hidden convexity, i.e., there exists a convex reformulation via a variable transformation. However, the transformation from the nonconvex problem to the convex reformulation may be unknown or may involve unknown distributions, so it remains hard to solve the convex counterpart directly to global optimality. In this direction, we investigate how to design easy-to-implement, globally convergent algorithms that directly solve the nonconvex problem. Applications in this category include booking limit control in quantity-based network revenue management for airlines, pricing-based network revenue management, inventory systems, and convex reinforcement learning.
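As a toy illustration of hidden convexity (not one of the algorithms developed in this line of work), the sketch below runs plain gradient descent on a one-dimensional objective F(x) = f(c(x)). The convex function f(u) = (u - 1)^2, the map c(x) = x + 0.3 sin(3x), the step size, and the starting points are all hypothetical choices made for this example. Because c is strictly increasing, F has no spurious stationary points even though it is nonconvex in x, and gradient descent never needs to evaluate the inverse of c.

```python
import math

def c(x):
    # Hypothetical change of variables; strictly increasing since c'(x) >= 0.1.
    return x + 0.3 * math.sin(3.0 * x)

def c_prime(x):
    return 1.0 + 0.9 * math.cos(3.0 * x)

def F(x):
    # Nonconvex in x, but convex in the transformed variable u = c(x).
    return (c(x) - 1.0) ** 2

def F_prime(x):
    # Chain rule: F'(x) = f'(c(x)) * c'(x) with f(u) = (u - 1)^2.
    return 2.0 * (c(x) - 1.0) * c_prime(x)

def gradient_descent(x0, step=0.02, iters=50_000):
    # Plain gradient descent applied directly to the nonconvex objective.
    x = x0
    for _ in range(iters):
        x -= step * F_prime(x)
    return x

starts = [-2.0, 2.6, 4.0]
solutions = [gradient_descent(x0) for x0 in starts]
for x0, x in zip(starts, solutions):
    print(f"start {x0:+.1f} -> x = {x:.6f}, F(x) = {F(x):.2e}")
```

In this example every starting point is driven to the same point, where c(x) = 1 and F attains its global minimum of zero. In the problems described above the transformation is typically unknown or depends on unknown distributions, which is what makes the general case harder than this sketch.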


Stochastic Optimization and Machine Learning with Biased Oracles

Stochastic gradient descent has become the engine of modern machine learning and artificial intelligence. However, a wide range of such problems do not have easily accessible unbiased gradient estimators, especially when one also cares about personalization, robustness, and privacy. Examples include distributionally robust optimization, policy gradient methods in reinforcement learning, generative adversarial networks, end-to-end learning, causal optimal transport, personalized learning, meta-learning, and many others. These problems share a common feature: one can construct gradient estimators with small bias, but only by using a large number of samples or incurring high computational costs. A natural question arises: is it really necessary to pay such high costs to reduce the bias in a learning system? In this project, we study the tradeoff between bias, variance, and cost for stochastic optimization and machine learning with biased oracles, aiming to reduce the total sampling and computational costs.
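The bias-cost tradeoff can be seen in a minimal sketch, assuming a hypothetical compositional objective F(x) = f(E[g(x, xi)]) with f(u) = u^3 / 3, g(x, xi) = x + xi, and xi standard normal. The true gradient is x^2, while the plug-in estimator built from m inner samples has expectation x^2 + 1/m, so its bias shrinks only as the per-query sampling cost m grows. The function names and parameter values below are illustrative and not taken from any specific paper.

```python
import random

def biased_gradient(x, m, rng):
    # Plug-in gradient estimator: average m inner samples, then apply f'(u) = u^2.
    xi_bar = sum(rng.gauss(0.0, 1.0) for _ in range(m)) / m
    return (x + xi_bar) ** 2

def empirical_bias(x, m, n_rep, rng):
    # Monte Carlo estimate of E[estimator] minus the true gradient x^2.
    mean_est = sum(biased_gradient(x, m, rng) for _ in range(n_rep)) / n_rep
    return mean_est - x ** 2

rng = random.Random(0)  # fixed seed so that runs are reproducible
x = 1.0
results = {m: empirical_bias(x, m, 20_000, rng) for m in (1, 4, 16, 64)}
for m, bias in results.items():
    print(f"m = {m:2d}: empirical bias = {bias:.4f}, theoretical bias = {1.0 / m:.4f}")
```

Each oracle call costs m samples, so driving the bias below a target epsilon with this estimator requires on the order of 1/epsilon samples per gradient query. This is the kind of cost that the question above asks whether one must pay.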



Decision Making with Side Information/Contextual Optimization 

In this direction, we move beyond the classical stochastic bilevel optimization model to consider settings in which the lower-level problem minimizes a conditional expectation objective. In particular, this covers two scenarios that the classical model does not: 1) when there are multiple followers, such as in meta-learning, personalization, platform operations, and transportation, and 2) when the follower best responds not only to the leader's decision but also to some global uncertainty, such as in end-to-end learning, optimization with side information, and causal optimal transport. We aim to solve all these problems in one unified framework called contextual stochastic bilevel optimization. The challenge is that nearly all existing single-loop stochastic bilevel optimization methods are not applicable, and all double-loop methods have far-from-optimal complexity. Our work focuses on designing algorithms for such problems with optimal theoretical guarantees and efficient implementations.
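One way to write such a problem down is sketched here. The symbols are notation assumed for this illustration: x is the leader's decision, y the follower's decision, \xi the context (a follower index or side information), \eta the remaining randomness, and f and g the upper- and lower-level losses.

```latex
\[
  \min_{x} \; F(x) := \mathbb{E}_{\xi \sim P_{\xi}}
    \bigl[ f\bigl(x, y^{*}(x;\xi); \xi\bigr) \bigr]
  \quad \text{where} \quad
  y^{*}(x;\xi) \in \arg\min_{y} \;
    \mathbb{E}_{\eta \sim P_{\eta \mid \xi}}
    \bigl[ g(x, y; \eta, \xi) \bigr].
\]
```

The lower-level objective is an expectation conditional on \xi, so each realization of the context induces its own lower-level problem and its own best response. In the classical stochastic bilevel model the lower-level problem does not depend on \xi, and there is a single best response y^{*}(x).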


Optimization for Causal Inference

In this line of research, I collaborate with statisticians to address hard open computational problems arising in statistics, such as instrumental variable regression and causal inference.


Robust and Safe Reinforcement Learning and LLMs

Online learning, which includes both (contextual) bandits and reinforcement learning, has revolutionized various applications, such as autonomous driving, large language model generation, protein folding, and clinical trial assignment. As these artificial intelligence systems become more and more involved in our daily lives, we inevitably need to build more robust, more reliable, safer, and more sustainable artificial intelligence systems. To achieve these goals, I focus on safety and robustness in online learning. Robustness aims to make the system behave well even in unseen scenarios, while safety aims to ensure that the system satisfies both explicit and implicit safety constraints.