Zhengling Qi

Selected Manuscripts

  • Sequential knockoffs for variable selection in reinforcement learning with Ma, T., Cai, H., Shi, C. and Laber, E. 

  • STEEL: Singularity-aware Reinforcement Learning with Chen, X., and Wan, R.

  • Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments with Wang, J. and  Shi, C.

  • Robust Batch Policy Learning in Markov Decision Processes with Liao, P.

  • Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes with Fu, Z., Wang, Z., Yang, Z., Xu, Y., and Kosorok, M. R.

  • Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning with Rui Miao, Cong Shi and Lin Lin

  • A New Estimator for Encouragement Design in Field Experiments When the Exclusion Restriction Is Violated with Guangying Chen, Cheng Lu, Tat Y. Chan, Dennis J. Zhang and Industry Collaborators.

  • Distributional Off-policy Evaluation with Bellman Residual Minimization with Sungee Hong and Raymond Wong

  • A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing with Zeyu Bian, Cong Shi and Lan Wang

  • Offline Dynamic Inventory and Pricing Strategy: Addressing Censored and Dependent Demand with Korel Gundem

  • A Principled Path to Fitted Distributional Evaluation with Sungee Hong, Jiayi Wang and Raymond Wong

  • Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning with Korel Gundem, Juncheng Dong, Dennis Zhang and Vahid Tarokh.




Publications


  • Qi, Z., Bai, C., Wang, Z., Wang, L. Distributional off-policy evaluation in reinforcement learning, Journal of the American Statistical Association, to appear.

  • Fu, Z., Qi, Z., Yang, Z., Wang, Z., Wang, L. Offline reinforcement learning for human-guided human-machine interaction with private information, Management Science, to appear.

  • Hong, S., Qi, Z., Wong, R. K. W. Distributional off-policy evaluation with Bellman residual minimization, International Conference on Artificial Intelligence and Statistics (AISTATS), 2025.

  • Tang, J., Qi, Z., Fang, E., Shi, C. Offline Feature-Based Pricing under Censored Demand: A Causal Inference Approach, Manufacturing & Service Operations Management, to appear.

  • Yu, S., Fang, S., Peng, R., Qi, Z., Zhou, F. and Shi, C. (2024). Two-way Deconfounder for Off-policy Evaluation under Unmeasured Confounding, Advances in Neural Information Processing Systems (NeurIPS). [code]

  • Bian, Z., Shi, C., Qi, Z. and Wang, L. (2024+). Off-policy Evaluation in Doubly Inhomogeneous Environments, Journal of the American Statistical Association, accepted. [code]

  • Liu, B., Qi, Z., Zhang, X., and Liu, Y. (2024+). Change point detection for high-dimensional linear models: A general tail-adaptive approach. Statistica Sinica, to appear.

  • Wang, J., Qi, Z., Wong, R. K. W. A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models. International Conference on Machine Learning (ICML) 2024.

  • Hong, M.#, Qi, Z., Xu, Y.  Model-based Reinforcement Learning for Confounded POMDPs. International Conference on Machine Learning (ICML) 2024.

  • Zhu, J., Wan, R., Qi, Z., Luo, S., Shi, C. Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards, AISTATS 2024.

  • Hong, M.#, Qi, Z., Xu, Y.  A Policy Gradient Method for Confounded POMDPs. International Conference on Learning Representations (ICLR) 2024. [JSM Student Paper Award from the ASA Section on Nonparametric Statistics]

  • Shi, C.*, Qi, Z.*, Wang, J., Zhou, F. Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization. Journal of the American Statistical Association, to appear. [code]

  • Wang, J.#, Qi, Z., Wong, R. K. W. Projected State-Action Balancing Weights for Offline Reinforcement Learning. Annals of Statistics, to appear.

  • Yang, H.#, Qi, Z., Cui, Y., Chen, P. (2023) Pessimistic model selection for deep reinforcement learning. Conference on Uncertainty in Artificial Intelligence.

  • Dong, J.#, Mo, W., Qi, Z., Shi, C., Fang, X., and Tarokh, V. PASTA: Pessimistic Assortment Optimization. International Conference on Machine Learning (ICML 2023), accepted.

  • Zhou, Y.#, Qi, Z., Shi, C. and Li, L. (2023). Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach, AISTATS 2023. [code]

  • Qi, Z.*, Miao, R.*#, Zhang, X. Proximal Learning for Individualized Treatment Regimes Under Unmeasured Confounding. Journal of the American Statistical Association, to appear. [code]

  • Liao, P.*, Qi, Z.*, Wan, R., Klasnja, P., and Murphy, S. Batch Policy Learning in Average Reward Markov Decision Processes. Accepted at Annals of Statistics.

  • Miao, R.#, Qi, Z., Zhang, X. Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models. Accepted at Advances in Neural Information Processing Systems (NeurIPS), 2022 [code]

  • Tan, X.#, Qi, Z., Seymour, C., Tang, L. RISE: Robust Individualized Decision Learning with Sensitive Variables. Accepted at Advances in Neural Information Processing Systems (NeurIPS), 2022 [ENAR 2023 Distinguished Student Paper Award][code]

  • Chen, X., Qi, Z. On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation. International Conference on Machine Learning (ICML 2022), Baltimore, MD.

  • Qi, Z., Pang, J.-S., Liu, Y. On Robustness of Individualized Decision Rules. Journal of the American Statistical Association, to appear.

  • Qi, Z., Cui, Y., Liu, Y., Pang, J.-S. Asymptotic Properties of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization.  Accepted at Mathematics of Operations Research.

  • Mo, W.*, Qi, Z.*, Liu, Y. Rejoinder to "Learning Optimal Distributionally Robust Individualized Treatment Rules". Journal of the American Statistical Association, to appear.

  • Mo, W.*, Qi, Z.*, Liu, Y. Learning Optimal Distributionally Robust Individualized Treatment Rules (with discussion). Journal of the American Statistical Association, to appear.

  • Qi, Z., Cui, Y., Liu, Y., Pang, J.-S. Estimation of Individualized Decision Rules Based on An Optimized Covariate-dependent Equivalent of Random Outcomes. To appear at SIAM Journal on Optimization.

  • Zheng, J., Qi, Z., Tan, Y., and Dou, Y. How Mega is the Mega? Measuring the Spillover Effect of WeChat Using Graphical Models. To appear at Information Systems Research.

  • Qi, Z., Liu, D., Fu, H., Liu, Y. (2018+). Multi-armed Angle-based Direct learning for Estimating Optimal Individualized Treatment Rules with Various Outcomes.  Journal of the American Statistical Association.

  • Qi, Z., Liu, Y. (2018+). D-learning to Estimate Optimal Individualized Treatment Rules. Electronic Journal of Statistics.

  • Qi, Z., Liu, Y. (2018+). Convex Bidirectional Large Margin Classifier. Technometrics.

  • Liang, S., Qi, Z., Qu, S., Zhu, J., Chiu, A. S., Jia, X., and Xu, M. (2016). Scaling of global input-output networks. Physica A: Statistical Mechanics and its Applications, 452, 311-319.

* These authors contributed equally to the manuscript.

# Ph.D. student at the time of submission.

