Haruka Kiyohara
清原 明加
hk844 [at] cornell.edu
Hi! I am Haruka Kiyohara (清原 明加).
I am a third-year Ph.D. candidate at Cornell CS, working on machine learning for (sequential) decision-making. My particular interests lie in reinforcement learning, counterfactual evaluation and learning, and their application to real-world scenarios such as recommender systems. At Cornell, I am excited to pursue these research goals advised by Prof. Thorsten Joachims and Prof. Sarah Dean. Before coming to Cornell, I received my B.Eng. from Tokyo Institute of Technology with the Excellent Student Award, majoring in Industrial Engineering and Economics. I am grateful for the financial support from the Funai Overseas Scholarship (full support) for my first two academic years and the Quad Fellowship (partial support) for my third year at Cornell.
Links: [CV] [Research Statement] [Google Scholar] [X/Twitter] [LinkedIn] [GitHub] [SpeakerDeck]
A self-introduction in Japanese is available here.
decision making x machine learning
I am motivated to facilitate the practical use of machine learning algorithms in interactive decision-making systems such as recommendation, education, and healthcare.
I am particularly interested in three topics: (1) how to leverage logged data (off-policy evaluation/learning; OPE/L), (2) how to steer systems toward long-term success (dynamics, control, social aspects), and (3) how to build a scalable and adaptable RecSys framework (practical constraints). A minimal code sketch of topic (1) follows the keyword list below.
Keywords:
・Off-Policy Evaluation (OPE)
・Offline Reinforcement Learning (offline RL)
・Reinforcement Learning for Real Life (RL4RealLife)
・Long-term dynamics in decision-making systems
・Scalability vs adaptability tradeoff of two-tower recommenders
See also: [Research Statement/Statement of Purpose]
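To make topic (1) concrete, here is a minimal sketch of off-policy evaluation with inverse propensity scoring (IPS) on synthetic logged bandit data. All variable names are illustrative; the full-featured libraries I work on (SCOPE-RL, Open Bandit Pipeline) are introduced in the Software section below.

```python
# Minimal illustration of off-policy evaluation (OPE) with inverse propensity
# scoring (IPS) on synthetic logged bandit data (illustrative names only).
import numpy as np

rng = np.random.default_rng(0)
n_rounds, n_actions = 10_000, 5

# Logged data collected by a known uniform-random logging policy.
logging_propensity = np.full(n_actions, 1.0 / n_actions)
actions = rng.integers(n_actions, size=n_rounds)
true_reward = np.linspace(0.1, 0.5, n_actions)   # unknown to the estimator
rewards = rng.binomial(1, true_reward[actions])

# Evaluation policy we would like to assess without deploying it.
scores = rng.normal(size=n_actions)
eval_policy = np.exp(scores) / np.exp(scores).sum()

# IPS: reweight each logged reward by the ratio of policy probabilities.
weights = eval_policy[actions] / logging_propensity[actions]
print("IPS estimate :", np.mean(weights * rewards))
print("ground truth :", np.dot(eval_policy, true_reward))
```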
[2025.11] I have become a Ph.D. candidate (and received an M.S. from Cornell CS)! I look forward to the rest of my Ph.D. journey, and I am grateful for the kind support from my amazing advisors, committee, and collaborators, as well as the financial support from the Funai Overseas Scholarship and the Quad Fellowship.
[2025.10] I gave a guest lecture in the ML in feedback systems class at Cornell (CS6784) and released the video recording here. The lecture gives an introduction to off-policy evaluation and learning (OPE/L) and covers my own research related to OPE/L and ML in feedback systems. Check it out!
[2025.10] Our paper "Policy Design for Two-sided Platforms with Participation Dynamics" (ICML'25) has been featured by AIhub. Check it out!
[2025.08] Our paper "Off-Policy Learning for Diversity-aware Candidate Retrieval in Two-stage Decisions" has been accepted to the CONSEQUENCES workshop at RecSys 2025. See you in Prague!
[2025.08] I have been selected as a Quad Fellow for the 2025-2026 academic year (cohort 3, as a Japan fellow). I am grateful for their financial support during my third year at Cornell and look forward to interacting with the other fellows!
[2025.07] Our paper "An Off-Policy Learning Approach for Steering Sentence Generation towards Personalization" has been accepted to RecSys 2025. See you in Prague!
[2025.06] I joined the Meta Central Applied Science (CAS) team as a student research intern. See you in San Francisco!
(* equal contribution)
Preprint
Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation.
arXiv preprint, 2023.
[arXiv] [slides] [software]
International Conference Proceedings (refereed)
Haruka Kiyohara, Daniel Yiming Cao, Yuta Saito, Thorsten Joachims.
An Off-Policy Learning Approach for Steering Sentence Generation towards Personalization.
ACM Conference on Recommender Systems (RecSys), 2025. (acceptance rate=19%)
[arXiv] [slides] [code]
Haruka Kiyohara, Fan Yao, Sarah Dean.
Policy Design for Two-sided Platforms with Participation Dynamics.
International Conference on Machine Learning (ICML), 2025. (acceptance rate=26.9%)
[arXiv] [slides] [code] [AIhub interview]
Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation.
International Conference on Learning Representations (ICLR), 2024. (acceptance rate=31%)
[arXiv] [slides] [software] [TokyoTech news]
Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto.
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model.
ACM International Conference on Web Search and Data Mining (WSDM), 2022. (acceptance rate=20.2%) (Best Paper Award Runner-Up)
[paper] [arXiv] [slides] [code]
Workshops
Haruka Kiyohara, Rayhan Khanna, Thorsten Joachims.
Off-Policy Learning for Diversity-aware Candidate Retrieval in Two-stage Decisions.
RecSys Workshop on Causality, Counterfactuals & Sequential Decision-Making (CONSEQUENCES), 2025. (Oral)
ICML Workshop on Scaling up Intervention Models (SIM), 2025.
OfflinePrompts -- main contributor
OfflinePrompts is a Python library for prompt-guided personalized sentence generation. Its distinctive features are (1) connecting off-policy learning (OPL) modules with large language model (LLM)-based sentence generation modules for smooth experimentation, and (2) providing a realistic simulation environment based on the MovieLens dataset.
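To give a flavor of the workflow the library targets, here is an illustrative plain-NumPy sketch, under simplified assumptions and not using the OfflinePrompts API: an IPS-weighted policy-gradient update for a softmax prompt-selection policy trained on logged (context, prompt, reward) data.

```python
# Illustrative off-policy learning (OPL) loop for prompt selection; hypothetical
# setup, NOT the OfflinePrompts API. A linear softmax policy over candidate
# prompts is trained with an IPS-weighted reward objective on logged data.
import numpy as np

rng = np.random.default_rng(0)
n_logs, dim, n_prompts = 5_000, 8, 4

contexts = rng.normal(size=(n_logs, dim))                  # user features
logged_prompts = rng.integers(n_prompts, size=n_logs)      # chosen by a uniform logger
logging_propensity = 1.0 / n_prompts
rewards = rng.binomial(1, 0.2 + 0.1 * (logged_prompts == 0))  # hypothetical feedback

theta = np.zeros((dim, n_prompts))                         # softmax policy parameters
for _ in range(200):
    logits = contexts @ theta
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    pi_logged = probs[np.arange(n_logs), logged_prompts]
    # IPS-weighted policy-gradient (REINFORCE-style) step on the logged data.
    weights = (pi_logged / logging_propensity) * rewards
    grad_logits = -probs * weights[:, None]
    grad_logits[np.arange(n_logs), logged_prompts] += weights
    theta += 0.01 * contexts.T @ grad_logits / n_logs
```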
SCOPE-RL -- main contributor
SCOPE-RL is an open-source Python library for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and (off-)policy selection (OPS). The library enables end-to-end implementation of offline RL and OPE and provides a variety of OPE estimators, including basic estimators (DM, PDIS, and DR), state(-action) marginal IS estimators, cumulative distribution OPE for risk function estimation, and evaluation protocols for OPE, all behind a user-friendly interface.
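As a flavor of what one of these estimators computes, here is a plain-NumPy sketch of per-decision importance sampling (PDIS) for trajectory-level logged data; this is illustration only and does not use the SCOPE-RL API.

```python
# Plain-NumPy sketch of the per-decision importance sampling (PDIS) estimator
# (illustration only; not the SCOPE-RL API).
import numpy as np

def pdis_estimate(behavior_probs, eval_probs, rewards, gamma=0.99):
    """behavior_probs, eval_probs, rewards: arrays of shape (n_trajectories, horizon)
    holding pi_b(a_t|s_t), pi_e(a_t|s_t), and r_t for the logged actions."""
    horizon = rewards.shape[1]
    step_ratios = eval_probs / behavior_probs
    cum_ratios = np.cumprod(step_ratios, axis=1)   # importance weight up to each step
    discounts = gamma ** np.arange(horizon)
    # PDIS: the reward at step t is weighted only by the ratios of the first t+1 steps.
    return np.mean(np.sum(discounts * cum_ratios * rewards, axis=1))
```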
Open Bandit Pipeline (OBP) -- developer
OBP is an open-source Python library for bandit algorithms and off-policy evaluation (OPE). The library provides easy-to-use implementations for evaluating bandit algorithms and enables fair comparisons of various OPE estimators for research purposes.
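A typical workflow, sketched from OBP's documented quickstart interface (argument names and defaults may differ slightly across versions): synthesize logged bandit feedback, then estimate the value of a uniform-random evaluation policy with IPS.

```python
# Sketch of a typical OBP workflow based on its documented interface
# (details may vary by version).
import numpy as np
from obp.dataset import SyntheticBanditDataset
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting

dataset = SyntheticBanditDataset(n_actions=10, reward_type="binary", random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# Evaluation policy: uniform-random over actions (shape: n_rounds x n_actions x len_list).
n_rounds, n_actions = bandit_feedback["n_rounds"], dataset.n_actions
action_dist = np.full((n_rounds, n_actions, 1), 1.0 / n_actions)

ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[InverseProbabilityWeighting()],
)
print(ope.estimate_policy_values(action_dist=action_dist))  # e.g. {"ipw": ...}
```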
Interpretable Evaluation for Offline Evaluation (pyIEOE) -- contributor
pyIEOE is an open-source Python toolkit for evaluating off-policy estimators. The library aids estimator selection by providing an interpretable comparison of estimators' robustness. Its easy-to-follow API also supports reliable experimentation in both research and practice.
Study materials
Guest lecture at the ML in feedback systems class at Cornell -- guest lecturer
I gave a guest lecture in the ML in feedback systems class at Cornell (CS6784) and released the video recording via the hyperlink above. The lecture gives an introduction to off-policy evaluation and learning (OPE/L) and covers my own research related to OPE/L and ML in feedback systems. The other lectures in the series, taught by Prof. Sarah Dean at Cornell, are available here.
awesome-offline-rl
awesome-offline-rl is a GitHub repository collecting study materials related to offline reinforcement learning (offline RL) and off-policy evaluation (OPE). It shares collections of papers, tutorial talks, blogs, open-source software, and related workshop information.