Abstract
Treatment recommendation is a complex, multi-faceted problem in which clinicians and patients weigh many treatment goals, e.g., optimizing the survival rate, mitigating negative impacts, reducing financial expenses, and avoiding over-treatment. Recently, deep reinforcement learning (RL) approaches have gained popularity for treatment recommendation. In this paper, we investigate preference-based RL approaches for treatment recommendation, where the reward function is itself learned from treatment goals, without requiring either expert demonstrations in advance or human involvement during policy learning. We first present an open simulation platform that models the evolution of two diseases, namely Cancer and Sepsis, and individuals' responses to the received treatment. Second, we systematically examine preference-based RL for treatment recommendation via simulated experiments and observe that the learned policies attain high utility, achieving high survival rates and low side effects, with inferred rewards highly correlated with the treatment goals. We further explore the transferability of inferred reward functions and offer guidelines for agent design, providing insights into achieving the right trade-off among various human objectives with preference-based RL approaches for treatment recommendation in the real world.
[Figure: Illustration of preferences over two treatment trajectories for a patient with Cancer, under two treatment goals (Goal I: Survive; Goal II: Survive with small tumor). Starting from an initial state I, the patient follows a treatment trajectory t and ends in either a survival (S) or death (D) outcome; the final tumor size is also shown for Goal II. One trajectory can be preferred to the other (>) or the two can be incomparable (~).]
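To make the figure's preference relations concrete, the following is a minimal sketch, with hypothetical names and a simplified trajectory representation, of how the two goals could induce a partial order over trajectories: Goal I compares trajectories by survival alone, while Goal II additionally prefers a smaller final tumor among survivors; trajectories that tie on every criterion of a goal are incomparable (~) under that goal. This is an illustration of the caption's ordering, not the paper's implementation.

```python
# Sketch of the preference relations from the figure (hypothetical names).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trajectory:
    survived: bool            # outcome S (True) or D (False)
    final_tumor_size: float   # tumor size at the end of treatment

def prefer_goal_i(a: Trajectory, b: Trajectory) -> Optional[Trajectory]:
    """Goal I (Survive): prefer the surviving trajectory; else incomparable."""
    if a.survived != b.survived:
        return a if a.survived else b
    return None  # both survive or both die: incomparable (~)

def prefer_goal_ii(a: Trajectory, b: Trajectory) -> Optional[Trajectory]:
    """Goal II (Survive with small tumor): survival first, then smaller tumor."""
    winner = prefer_goal_i(a, b)
    if winner is not None:
        return winner
    if a.survived and b.survived and a.final_tumor_size != b.final_tumor_size:
        return a if a.final_tumor_size < b.final_tumor_size else b
    return None  # incomparable (~)

# Example: both patients survive, so Goal I is indifferent (~),
# but Goal II prefers the trajectory ending with the smaller tumor (>).
t1 = Trajectory(survived=True, final_tumor_size=0.2)
t2 = Trajectory(survived=True, final_tumor_size=0.8)
assert prefer_goal_i(t1, t2) is None   # t1 ~ t2 under Goal I
assert prefer_goal_ii(t1, t2) is t1    # t1 > t2 under Goal II
```

Such pairwise preferences over trajectories are the kind of supervision from which a preference-based RL agent can infer a reward function aligned with the chosen treatment goal.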