Zhengling Qi - Research

Publications

Accepted

Sequential Knockoffs for Variable Selection in Reinforcement Learning, with Ma, T., Cai, H., Shi, C., and Laber, E. Journal of the American Statistical Association.
Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments, with Wang, J. and Shi, C. Journal of the American Statistical Association.
Reinforcement Learning with Continuous Actions Under Unmeasured Confounding, with Li, Y., Han, E., Hu, Y., Zhou, W., Cui, Y., and Zhu, R. Journal of the American Statistical Association.

2025

Kuang, Q., Wang, J., Zhou, F., & Qi, Z.^ (2025). Breaking the Order Barrier: Off-Policy Evaluation for Confounded POMDPs. Advances in Neural Information Processing Systems (NeurIPS).
Hong, S., Wang, J., Qi, Z., & Wong, R. K. W. (2025). A Principled Path to Fitted Distributional Evaluation. Advances in Neural Information Processing Systems (NeurIPS). (Spotlight)
Bian, Z., Shi, C., Qi, Z., & Wang, L. (2025). Off-policy Evaluation in Doubly Inhomogeneous Environments. Journal of the American Statistical Association, 120(550), 1102–1114.
Tang, J., Qi, Z., Fang, E., & Shi, C. (2025). Offline Feature-Based Pricing under Censored Demand: A Causal Inference Approach. Manufacturing & Service Operations Management, 27(2), 535–553.
Hong, S., Qi, Z., & Wong, R. K. W. (2025). Distributional Off-policy Evaluation with Bellman Residual Minimization. International Conference on Artificial Intelligence and Statistics (AISTATS).
Qi, Z., Bai, C., Wang, Z., & Wang, L. (2025). Distributional Off-policy Evaluation in Reinforcement Learning. Journal of the American Statistical Association. (Articles in Advance).
Fu, Z., Qi, Z.^, Yang, Z., Wang, Z., & Wang, L. (2025). Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information. Management Science. (Articles in Advance).

2024

Qi, Z.*^, Miao, R.#*, & Zhang, X. (2024). Proximal Learning for Individualized Treatment Regimes Under Unmeasured Confounding. Journal of the American Statistical Association, 119(546), 915–928.
Shi, C.*, Qi, Z.*^, Wang, J., & Zhou, F.^ (2024). Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization. Journal of the American Statistical Association, 119(546), 1147–1160.
Yu, S., Fang, S., Peng, R., Qi, Z., Zhou, F., & Shi, C. (2024). Two-way Deconfounder for Off-policy Evaluation under Unmeasured Confounding. Advances in Neural Information Processing Systems (NeurIPS).
Liu, B., Qi, Z., Zhang, X., & Liu, Y. (2024). Change Point Detection for High-dimensional Linear Models: A General Tail-adaptive Approach. Statistica Sinica. (Accepted).
Wang, J., Qi, Z.^, & Wong, R. K. W. (2024). A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models. International Conference on Machine Learning (ICML).
Hong, M.#, Qi, Z.^, & Xu, Y. (2024). Model-based Reinforcement Learning for Confounded POMDPs. International Conference on Machine Learning (ICML).
Hong, M.#, Qi, Z.^, & Xu, Y. (2024). A Policy Gradient Method for Confounded POMDPs. International Conference on Learning Representations (ICLR).
Zhu, J., Wan, R., Qi, Z., Luo, S., & Shi, C. (2024). Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards. International Conference on Artificial Intelligence and Statistics (AISTATS).

2023

Wang, J.#, Qi, Z.^, & Wong, R. K. W. (2023). Projected State-Action Balancing Weights for Offline Reinforcement Learning. The Annals of Statistics, 51(4), 1639–1665.
Qi, Z., Pang, J.-S., & Liu, Y. (2023). On Robustness of Individualized Decision Rules. Journal of the American Statistical Association, 118(543), 2143–2157.
Yang, H.#, Qi, Z.^, Cui, Y., & Chen, P. (2023). Pessimistic Model Selection for Deep Reinforcement Learning. Conference on Uncertainty in Artificial Intelligence (UAI).
Dong, J.#, Mo, W., Qi, Z.^, Shi, C., Fang, X., & Tarokh, V. (2023). PASTA: Pessimistic Assortment Optimization. International Conference on Machine Learning (ICML).
Zhou, Y.#, Qi, Z., Shi, C., & Li, L. (2023). Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach. International Conference on Artificial Intelligence and Statistics (AISTATS).

2022

Liao, P.*, Qi, Z.*^, Wan, R., Klasnja, P., & Murphy, S. (2022). Batch Policy Learning in Average Reward Markov Decision Processes. The Annals of Statistics, 50(6), 3364–3387.
Qi, Z., Cui, Y., Liu, Y., & Pang, J.-S. (2022). Asymptotic Properties of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization. Mathematics of Operations Research, 47(3), 2034–2064.
Miao, R.#, Qi, Z.^, & Zhang, X. (2022). Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models. Advances in Neural Information Processing Systems (NeurIPS).
Tan, X.#, Qi, Z., Seymour, C., & Tang, L. (2022). RISE: Robust Individualized Decision Learning with Sensitive Variables. Advances in Neural Information Processing Systems (NeurIPS).
Chen, X., & Qi, Z.^ (2022). On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation. International Conference on Machine Learning (ICML).

2021 and Prior

Qi, Z., Cui, Y., Liu, Y., & Pang, J.-S. (2021). Estimation of Individualized Decision Rules Based on An Optimized Covariate-dependent Equivalent of Random Outcomes. SIAM Journal on Optimization, 31(4), 3119–3148.
Mo, W., Qi, Z., & Liu, Y. (2021). Learning Optimal Distributionally Robust Individualized Treatment Rules. Journal of the American Statistical Association, 116(534), 659–674.
Mo, W., Qi, Z., & Liu, Y. (2021). Rejoinder to "Learning Optimal Distributionally Robust Individualized Treatment Rules". Journal of the American Statistical Association, 116(534), 685–689.
Qi, Z., Liu, D., Fu, H., & Liu, Y. (2020). Multi-armed Angle-based Direct Learning for Estimating Optimal Individualized Treatment Rules with Various Outcomes. Journal of the American Statistical Association, 115(530), 678–691.
Zheng, J., Qi, Z., Tan, Y., & Dou, Y. (2019). How Mega is the Mega? Measuring the Spillover Effect of WeChat Using Graphical Models. Information Systems Research, 30(4), 1343–1362.
Qi, Z., & Liu, Y. (2019). Convex Bidirectional Large Margin Classifier. Technometrics, 61(2), 176–186.
Qi, Z., & Liu, Y. (2018). D-learning to Estimate Optimal Individualized Treatment Rules. Electronic Journal of Statistics, 12(2), 3601–3638.
Liang, S., Qi, Z., Qu, S., Zhu, J., Chiu, A. S., Jia, X., & Xu, M. (2016). Scaling of Global Input-output Networks. Physica A: Statistical Mechanics and its Applications, 452, 311–319.

* These authors contributed equally to the manuscript.

# Ph.D. student at the time of submission.

^ Corresponding author.