Haruka Kiyohara

清原 明加

hk844 [at] cornell.edu

[CV] 

About Me

Hi! I am Haruka Kiyohara (清原 明加).

I am a first-year Ph.D. student at Cornell CS, working on machine learning in (sequential) decision-making scenarios. My particular interest lies in reinforcement learning, counterfactual evaluation and learning, and their applications to real-world scenarios, including recommender systems. At Cornell, I am excited to pursue these research goals advised by Prof. Thorsten Joachims and Prof. Sarah Dean. Before coming to Cornell, I received my B.Eng. from Tokyo Institute of Technology with the Excellent Student Award, majoring in Industrial Engineering and Economics. I am grateful for the financial support from the Funai Overseas Scholarship for my first two academic years at Cornell.


Links: [CV] [Google Scholar] [Twitter] [LinkedIn] [GitHub] [SpeakerDeck]

My self-introduction in Japanese is available here

Research Interest

decision making x machine learning

I am motivated to facilitate the practical use of machine learning algorithms in interactive decision-making systems such as recommendation and healthcare.

Recently, I have been particularly excited to explore (i) how logged data can enable safe and reliable real-world deployment, and (ii) how to achieve desirable properties in decision-making systems, such as fairness and diversity, toward long-term objectives.

Keywords:
・Off-Policy Evaluation (OPE)
・Offline Reinforcement Learning (offline RL)
・Reinforcement Learning for Real Life (RL4RealLife)
・Long-term dynamics in decision-making systems

News

[2024.04] Our paper "Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation" has been featured in the TokyoTech news. Check it out!

[2024.03] Our paper "Prompt Optimization with Logged Bandit Data" has been accepted to the DPFM workshop at ICLR2024. See you in Vienna!

[2024.02] Our paper "Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction" is now on arXiv. Check it out!

[2024.01] Our paper "Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction" has been accepted to WebConf2024. See you in Singapore!

[2024.01] Our paper "Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation" has been accepted to ICLR2024. See you in Vienna!

[2023.12] Our twin papers "Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation" (link) and "SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation" (link) are now on arXiv! We present a new evaluation metric and open-source software (GitHub, PyPI, documentation) for OPE. Feel free to star and fork!

[2023.09] Our paper "Future-Dependent Value-Based Off-Policy Evaluation in POMDPs" has been accepted to NeurIPS2023. See you in New Orleans!

[2023.08] I joined the Cornell CS Ph.D. program. I appreciate the financial support of the Funai Overseas Scholarship for the first two academic years. Looking forward to my new journey at Cornell!

[see the archives of news]

Featured Publications

(* equal contribution)

Preprint

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation.
arXiv preprint, 2023.
[arXiv] [slides] [software]


International Conference Proceedings (refereed)

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito.
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation.
International Conference on Learning Representations (ICLR), 2024. (acceptance rate=31%)
[arXiv] [slides] [software] [TokyoTech news]

Haruka Kiyohara, Masahiro Nomura, Yuta Saito.
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction.
ACM Web Conference (WebConf), 2024. (acceptance rate=20.2%)
[arXiv] [code] [slides]

Masatoshi Uehara*, Haruka Kiyohara*, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun.
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs.
Conference on Neural Information Processing Systems (NeurIPS), 2023. (acceptance rate=26.1%) (Spotlight)
[arXiv] [code]

Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto.
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model.
ACM International Conference on Web Search and Data Mining (WSDM), 2022. (acceptance rate=20.2%) (Best Paper Award Runner-Up)
[paper] [arXiv] [slides]


Workshop papers (refereed)

Haruka Kiyohara, Yuta Saito, Daniel Yiming Cao, Thorsten Joachims.
Prompt Optimization with Logged Bandit Data.
ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models (DPFM), 2024.


[full list of publications]

Resources

SCOPE-RL -- main contributor
SCOPE-RL is an open-source Python library for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and (off-)policy selection (OPS). The library enables an end-to-end implementation of offline RL and OPE with a user-friendly interface. It implements a variety of OPE estimators, including basic estimators (DM, PDIS, and DR), state(-action) marginal IS estimators, and cumulative distribution OPE for risk function estimation, as well as evaluation protocols for OPE.
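As a brief illustration of what these estimators compute (standard notation for illustration only, not tied to SCOPE-RL's specific implementation), per-decision importance sampling (PDIS) estimates the value of an evaluation policy $\pi$ from trajectories logged by a behavior policy $\pi_0$ as

$$
\hat{V}_{\mathrm{PDIS}}(\pi; \mathcal{D}) = \frac{1}{n} \sum_{i=1}^{n} \sum_{t=0}^{T-1} \gamma^{t} \left( \prod_{t'=0}^{t} \frac{\pi(a_{t'}^{(i)} \mid s_{t'}^{(i)})}{\pi_0(a_{t'}^{(i)} \mid s_{t'}^{(i)})} \right) r_{t}^{(i)},
$$

where $\mathcal{D}$ is a logged dataset of $n$ trajectories and $\gamma$ is the discount factor; DM instead relies on a learned value model, and DR combines the two.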

awesome-offline-rl -- maintainer
awesome-offline-rl is a GitHub repository collecting study materials related to offline reinforcement learning (offline RL) and off-policy evaluation (OPE). It shares collections of papers, tutorial talks, blogs, open-source software, and related workshop information.

Open Bandit Pipeline (OBP) -- developer
OBP is an open-source Python library for bandit algorithms and off-policy evaluation (OPE). The library provides easy-to-use implementations for the evaluation of bandit algorithms. It also enables fair comparisons of various OPE estimators for research purposes.
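A minimal sketch of the intended workflow, roughly following the quickstart in the OBP README (dataset sizes and base models below are arbitrary choices for illustration, and the exact API may differ across versions, so please refer to the official documentation):

from sklearn.linear_model import LogisticRegression

from obp.dataset import SyntheticBanditDataset
from obp.policy import IPWLearner
from obp.ope import (
    OffPolicyEvaluation,
    RegressionModel,
    InverseProbabilityWeighting as IPW,
    DirectMethod as DM,
    DoublyRobust as DR,
)

# (1) Generate synthetic logged bandit feedback.
dataset = SyntheticBanditDataset(n_actions=10, reward_type="binary", random_state=12345)
bandit_feedback_train = dataset.obtain_batch_bandit_feedback(n_rounds=10000)
bandit_feedback_test = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# (2) Learn an evaluation policy offline via importance-weighted classification.
eval_policy = IPWLearner(n_actions=dataset.n_actions, base_classifier=LogisticRegression())
eval_policy.fit(
    context=bandit_feedback_train["context"],
    action=bandit_feedback_train["action"],
    reward=bandit_feedback_train["reward"],
    pscore=bandit_feedback_train["pscore"],
)
action_dist = eval_policy.predict(context=bandit_feedback_test["context"])

# (3) Estimate the evaluation policy's value with several OPE estimators.
regression_model = RegressionModel(n_actions=dataset.n_actions, base_model=LogisticRegression())
estimated_rewards = regression_model.fit_predict(
    context=bandit_feedback_test["context"],
    action=bandit_feedback_test["action"],
    reward=bandit_feedback_test["reward"],
)
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback_test,
    ope_estimators=[IPW(), DM(), DR()],
)
estimated_policy_values = ope.estimate_policy_values(
    action_dist=action_dist,
    estimated_rewards_by_reg_model=estimated_rewards,
)
print(estimated_policy_values)  # e.g., {"ipw": ..., "dm": ..., "dr": ...}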

Interpretable Evaluation for Offline Evaluation (pyIEOE) -- contributor
pyIEOE is an open-source Python toolkit for the evaluation of off-policy estimators. The library aids estimator selection by providing an interpretable comparison of estimators' robustness. Its easy-to-follow API also supports reliable experiments in both research and practice.