Haruka Kiyohara
清原 明加
hk844 [at] cornell.edu
Hi! I am Haruka Kiyohara (清原 明加).
I am a third-year Ph.D. student at Cornell CS, working on machine learning in (sequential) decision-making scenarios. My particular interests lie in reinforcement learning, counterfactual evaluation and learning, and their applications to real-world scenarios, including recommendation. At Cornell, I am excited to pursue these research goals advised by Prof. Thorsten Joachims and Prof. Sarah Dean. Before coming to Cornell, I received my B.Eng. from Tokyo Institute of Technology with the Excellent Student Award, majoring in Industrial Engineering and Economics. I appreciate the financial support from the Funai Overseas Scholarship (full support) for my first two academic years and from the Quad Fellowship (partial support) for my third year at Cornell.
Links: [CV] [Research Statement] [Google Scholar] [X/Twitter] [LinkedIn] [GitHub] [SpeakerDeck]
A self-introduction in Japanese is available here.
decision making x machine learning
I am motivated to facilitate the practical use of machine learning algorithms in interactive decision-making systems such as recommendation, education, and healthcare.
I am particularly keen on three topics: (1) how to leverage logged data (off-policy evaluation/learning; OPE/L), (2) how to steer systems toward long-term success (dynamics, control, social aspects), and (3) how to build a scalable and adaptable RecSys framework (practical constraints).
Keywords:
・Off-Policy Evaluation (OPE)
・Offline Reinforcement Learning (offline RL)
・Reinforcement Learning for Real Life (RL4RealLife)
・Long-term dynamics in decision-making systems
・Scalability vs adaptability tradeoff of two-tower recommenders
See also: [Research Statement/Statement of Purpose]
[2025.10] I gave a guest lecture at the ML in feedback systems class at Cornell (CS6784) and released the video recording here. The lecture covers an introduction to off-policy evaluation and learning (OPE/L) and my own research related to OPE/L and ML in feedback systems. Check it out!
[2025.10] Our paper "Policy Design for Two-sided Platforms with Participation Dynamics" (ICML'25) has been featured by AIhub. Check it out!
[2025.08] Our paper "Off-Policy Learning for Diversity-aware Candidate Retrieval in Two-stage Decisions" has been accepted to the CONSEQUENCES workshop at RecSys 2025. See you in Prague!
[2025.08] I have been selected as a Quad Fellow for the 2025-2026 academic year (cohort 3, as a Japan fellow). I appreciate their financial support for my third year at Cornell and look forward to interacting with the other fellows!
[2025.07] Our paper "An Off-Policy Learning Approach for Steering Sentence Generation towards Personalization" has been accepted to RecSys 2025. See you in Prague!
[2025.06] I joined the Meta Central Applied Science (CAS) team as a student research intern. See you in San Francisco!
(* equal contribution)
Preprint
Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation.
arXiv preprint, 2023.
[arXiv] [slides] [software]
International Conference Proceedings (refereed)
Haruka Kiyohara, Daniel Yiming Cao, Yuta Saito, Thorsten Joachims.
An Off-Policy Learning Approach for Steering Sentence Generation towards Personalization.
ACM Conference on Recommender Systems (RecSys), 2025. (acceptance rate=19%)
[arXiv] [slides] [code]
Haruka Kiyohara, Fan Yao, Sarah Dean.
Policy Design for Two-sided Platforms with Participation Dynamics.
International Conference on Machine Learning (ICML), 2025. (acceptance rate=26.9%)
[arXiv] [slides] [code] [AIhub interview]
Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation.
International Conference on Learning Representations (ICLR), 2024. (acceptance rate=31%)
[arXiv] [slides] [software] [TokyoTech news]
Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto.
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model.
ACM International Conference on Web Search and Data Mining (WSDM), 2022. (acceptance rate=20.2%) (Best Paper Award Runner-Up)
[paper] [arXiv] [slides] [code]
Workshops
Haruka Kiyohara, Rayhan Khanna, Thorsten Joachims.
Off-Policy Learning for Diversity-aware Candidate Retrieval in Two-stage Decisions.
RecSys Workshop on Causality, Counterfactuals & Sequential Decision-Making (CONSEQUENCES), 2025. (Oral)
ICML Workshop on Scaling up Intervention Models (SIM), 2025.
OfflinePrompts -- main contributor
OfflinePrompts is a Python library for prompt-guided personalized sentence generation. Its distinctive features are (1) connecting off-policy learning (OPL) modules with large language model (LLM)-based sentence generation modules for smooth experimentation, and (2) providing a realistic simulation based on the MovieLens dataset.
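To illustrate the OPL side of feature (1), here is a minimal sketch of learning a prompt-selection policy from logged feedback with an IPS-weighted policy-gradient objective; the data and all names are hypothetical and do not reflect OfflinePrompts' actual API.

```python
import numpy as np

# Hypothetical logged data: which prompt was shown, the observed reward
# (e.g., user feedback on the generated sentence), and the logging policy's
# propensity for that prompt. None of these names come from OfflinePrompts.
n_prompts = 4
logged_prompt = np.array([0, 1, 2, 3, 1, 0, 2])
logged_reward = np.array([0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.5])
logging_pscore = np.full(len(logged_prompt), 1.0 / n_prompts)  # uniform logging policy

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# IPS-weighted policy gradient on the off-policy objective
#   J(theta) = E_log[ (pi_theta(a) / pi_0(a)) * r ]
theta = np.zeros(n_prompts)  # parameters of a softmax prompt-selection policy
for _ in range(200):
    pi = softmax(theta)
    iw = pi[logged_prompt] / logging_pscore  # importance weights
    grad = np.zeros(n_prompts)
    for a, w, r in zip(logged_prompt, iw, logged_reward):
        grad_log_pi = -pi.copy()
        grad_log_pi[a] += 1.0                # d log pi_theta(a) / d theta
        grad += w * r * grad_log_pi
    theta += 0.05 * grad / len(logged_prompt)

print(softmax(theta))  # learned prompt-selection probabilities
```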
SCOPE-RL -- main contributor
SCOPE-RL is an open-source Python software for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and (off-)policy selection (OPS). The library enables an end-to-end implementation of offline RL and OPE with a user-friendly interface, and implements a variety of OPE estimators, including basic estimators (DM, PDIS, and DR), state(-action) marginal IS estimators, and cumulative distribution OPE for risk function estimation, as well as evaluation protocols for OPE.
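As an illustration of what a basic estimator like PDIS computes on logged trajectories, here is a self-contained sketch; it is not SCOPE-RL's actual interface.

```python
import numpy as np

def pdis_estimate(trajectories, gamma=1.0):
    """Per-decision importance sampling (PDIS) estimate of a policy's value.

    Each trajectory is a list of (reward, pi_e_prob, pi_0_prob) tuples, where
    pi_e_prob / pi_0_prob are the evaluation / logging policy's probabilities
    of the logged action at that step.
    """
    values = []
    for traj in trajectories:
        value, cum_iw = 0.0, 1.0
        for t, (reward, pi_e_prob, pi_0_prob) in enumerate(traj):
            cum_iw *= pi_e_prob / pi_0_prob        # importance weight up to step t
            value += (gamma ** t) * cum_iw * reward
        values.append(value)
    return float(np.mean(values))

# toy logged trajectories (illustrative only)
trajectories = [
    [(1.0, 0.9, 0.5), (0.0, 0.2, 0.5), (1.0, 0.7, 0.5)],
    [(0.0, 0.4, 0.5), (1.0, 0.8, 0.5)],
]
print(pdis_estimate(trajectories, gamma=0.95))
```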
Open Bandit Pipeline (OBP) -- developer
OBP is an open-source Python software for bandit algorithms and off-policy evaluation (OPE). The library provides easy-to-use implementations for the evaluation of bandit algorithms and enables fair comparisons of various OPE estimators for research purposes.
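A minimal usage sketch in the style of OBP's quickstart is shown below; exact class and argument names may differ across library versions, so treat it as illustrative rather than authoritative.

```python
# A minimal sketch following the style of OBP's quickstart; argument names may
# differ across versions of the library.
from sklearn.linear_model import LogisticRegression

from obp.dataset import SyntheticBanditDataset
from obp.policy import IPWLearner
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting as IPW

# (1) generate synthetic logged bandit feedback
dataset = SyntheticBanditDataset(n_actions=10, dim_context=5, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# (2) learn an evaluation policy offline from the logged data
eval_policy = IPWLearner(n_actions=dataset.n_actions, base_classifier=LogisticRegression())
eval_policy.fit(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    pscore=bandit_feedback["pscore"],
)
action_dist = eval_policy.predict(context=bandit_feedback["context"])

# (3) estimate the evaluation policy's value via OPE (here, IPW)
ope = OffPolicyEvaluation(bandit_feedback=bandit_feedback, ope_estimators=[IPW()])
print(ope.estimate_policy_values(action_dist=action_dist))
```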
Interpretable Evaluation for Offline Evaluation (pyIEOE) -- contributor
pyIEOE is an open-source Python toolkit for the evaluation of off-policy estimators. The library aids estimator selection by providing an interpretable comparison of estimators' robustness. Its easy-to-follow API also supports reliable experiments in both research and practice.
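The core idea, comparing how an estimator's error is distributed across many randomly drawn evaluation configurations rather than on a single setting, can be sketched as follows; this toy example compares plain and clipped IPS and does not use pyIEOE's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

def ips_squared_error(clip=np.inf, n=2000, n_actions=5):
    """Squared error of a (clipped) IPS estimate under one randomly drawn
    logging/evaluation policy pair; purely synthetic and illustrative."""
    pi_0 = rng.dirichlet(np.ones(n_actions))       # logging policy
    pi_e = rng.dirichlet(np.ones(n_actions))       # evaluation policy
    mean_reward = rng.uniform(size=n_actions)      # E[r | a]
    true_value = float(pi_e @ mean_reward)
    action = rng.choice(n_actions, size=n, p=pi_0)
    reward = rng.binomial(1, mean_reward[action])
    iw = np.minimum(pi_e[action] / pi_0[action], clip)  # (clipped) importance weights
    return (np.mean(iw * reward) - true_value) ** 2

# compare the *distribution* of errors across many random configurations,
# rather than accuracy on a single hand-picked setting
errors = {
    "IPS": [ips_squared_error() for _ in range(100)],
    "clipped IPS (c=5)": [ips_squared_error(clip=5.0) for _ in range(100)],
}
for name, errs in errors.items():
    print(name, "90th percentile squared error:", np.quantile(errs, 0.9))
```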
Study materials
Guest lecture at the ML in feedback systems class at Cornell -- guest lecturer
I gave a guest lecture at the ML in feedback systems class at Cornell (CS6784) and released the video recording in the hyperlink above. The lecture covers an introduction to off-policy evaluation and learning (OPE/L) and my own research related to OPE/L and ML in feedback systems. Other lectures from the series by Prof. Sarah Dean at Cornell are available here.
awesome-offline-rl
awesome-offline-rl is a GitHub repository collecting various study materials related to offline reinforcement learning (offline RL) and off-policy evaluation (OPE). It shares collections of papers, tutorial talks, blogs, open-source software, and related workshop information.