Human-Centered Reinforcement Learning
for Personalized Coaching in Health

Introduction

Chronic diseases, such as type 2 diabetes (T2D), hypertension, and obesity, place an ever-increasing burden on individuals and society at large [1]. Health coaching, in which human experts use their knowledge and experience to help individuals who wish to improve their health identify appropriate health goals, and to provide practical advice, encouragement, and feedback toward attaining these goals, has emerged as an effective approach to promoting self-management [4,11,14,18]. However, there are not enough coaching professionals to serve the growing population of individuals with chronic diseases, particularly those in medically underserved communities with limited access to traditional healthcare [5,13]. In addition, barriers such as transportation and cost create disparities in access to in-person coaching [5,12].

Conversational agents (CAs) have the potential to overcome these barriers and make health coaching available to more diverse populations. CAs have been used successfully in many health contexts and domains, including health coaching [2]. Most CAs in health have relied on fully-scripted approaches, in which all possible dialog flows are written in advance and users choose from predefined requests and responses [8]. However, these approaches are less suitable for open-ended dialogs, such as coaching dialogs in the context of chronic diseases, where the complexity of dialog structures can quickly become unmanageable and fully-scripted dialogs can be perceived as rigid and repetitive [3,7]. In these contexts, data-driven CAs that use machine learning (ML) to learn appropriate dialog structures have advantages over fully-scripted ones.

One particularly promising data-driven approach is reinforcement learning (RL), an ML technique that learns from interactions and prescribes sequences of actions toward reaching a predetermined goal [3,15]. Given the fluid and flexible structure of coaching and its emphasis on addressing the unique needs of each client, RL is a natural fit for coaching CAs. However, RL-based CAs have several limitations that are particularly important in the context of health coaching. First, previous studies have shown that the optimization algorithms used to train RL-based CAs often produce dialogs that are efficient (short) but perceived as random by human users [16]. There is therefore a need for new approaches to aligning RL-based CAs with human reasoning and expectations. Second, this problem is further exacerbated by the general lack of explainability of the actions chosen by an RL agent, a problem common to many ML methods. New approaches are therefore also needed for generating explanations of RL inferences and recommendations.

In this project, we will develop a new approach to providing health coaching with RL-based CAs, while also addressing the more general challenges of designing human-centered and scalable RL-based CAs described above. We will build upon our prior investigations of CAs for nutritional coaching in T2D [9,10]. In our prior work, we developed T2 Coach, a fully-scripted CA that follows the Brief Action Planning (BAP) coaching protocol [6]. T2 Coach helps individuals set nutritional goals and supports goal attainment via daily dialogs that include reminders, suggestions, and opportunities for reflection. Furthermore, we have investigated new approaches to more directly supporting daily mealtime choices with micro-coaching, which helps individuals determine how well their planned meals align with their specified nutritional goals. Making this determination is a critical need: it often presents a significant barrier to individuals with low nutritional literacy, and the need is particularly acute in medically underserved communities, which suffer from a higher prevalence of T2D [1].

To that end, we have developed a micro-coaching CA that 1) elicits individuals’ meal plans (e.g., “Hi Lena, what do you plan to have for lunch?”), 2) collects answers as free-text responses (e.g., “a salad”), 3) converts these free-form descriptions into a computable form using natural language processing (NLP) and a food ontology, 4) assesses whether these descriptions align with the individual’s previously established nutritional goals using a set of rules, and 5) asks follow-up questions learned with RL to disambiguate cases in which the initial meal description does not contain sufficient information [9].
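The five-step micro-coaching pipeline described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the implementation from [9]: the ontology entries, goal names, and rules are hypothetical; simple keyword lookup stands in for NLP; and a hand-written table of action values stands in for the RL-learned follow-up policy, from which the sketch picks the highest-valued question greedily.

```python
from typing import Optional

# Hypothetical toy ontology standing in for a real food ontology.
# Each food term maps to nutritional attributes; None marks an attribute
# that cannot be determined from the term alone.
ONTOLOGY: dict[str, dict[str, Optional[bool]]] = {
    "salad": {"vegetables": True, "refined_carbs": None},
    "pasta": {"vegetables": False, "refined_carbs": True},
    "croutons": {"refined_carbs": True},
    "greens": {"vegetables": True, "refined_carbs": False},
}

# Hypothetical goal rules: goal -> (attribute, value that satisfies the goal).
GOALS: dict[str, tuple[str, bool]] = {
    "eat_more_vegetables": ("vegetables", True),
    "limit_refined_carbs": ("refined_carbs", False),
}

# Hypothetical action values standing in for an RL-learned policy:
# state (goal, unresolved attribute) -> {candidate follow-up question: value}.
Q_VALUES: dict[tuple[str, str], dict[str, float]] = {
    ("limit_refined_carbs", "refined_carbs"): {
        "Will your salad include croutons, bread, or pasta?": 0.8,
        "How large will the portion be?": 0.2,
    },
}


def parse_meal(
    text: str, known: Optional[dict[str, Optional[bool]]] = None
) -> dict[str, Optional[bool]]:
    """Steps 2-3: map free text to attributes (keyword lookup stands in for NLP)."""
    attributes = dict(known or {})
    for word in text.lower().replace(",", " ").split():
        for attr, value in ONTOLOGY.get(word, {}).items():
            # Keep values already resolved; fill in only unknown ones.
            if attributes.get(attr) is None:
                attributes[attr] = value
    return attributes


def assess(attributes: dict[str, Optional[bool]], goal: str) -> Optional[bool]:
    """Step 4: rule-based check; None means the description is insufficient."""
    attr, target = GOALS[goal]
    value = attributes.get(attr)
    return None if value is None else value == target


def next_question(attributes: dict[str, Optional[bool]], goal: str) -> Optional[str]:
    """Step 5: greedily pick the highest-valued follow-up question, if any."""
    attr, _ = GOALS[goal]
    candidates = Q_VALUES.get((goal, attr), {})
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    goal = "limit_refined_carbs"
    meal = parse_meal("a salad")
    print(assess(meal, goal))         # None: refined carbs still unknown
    print(next_question(meal, goal))  # Will your salad include croutons, bread, or pasta?
    meal = parse_meal("no, just greens", known=meal)
    print(assess(meal, goal))         # True
```

In this sketch, "a salad" cannot be assessed against the goal of limiting refined carbohydrates, so the agent asks the highest-valued follow-up question; the answer fills in the missing attribute and the rule then yields a determination. In an actual RL-based system, the action values would be learned from interactions rather than written by hand.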

An initial evaluation study of this approach found that RL-based dialogs efficiently collected the information needed to determine whether meal plans met the goals. However, the approach required considerable manual engineering of the RL action and reward spaces to align them with each nutritional goal, which limited its generalizability. It also suffered from challenges common to RL-based chatbots: while the RL-learned sequences of follow-up questions were short and effective, they were rated low in quality and perceived as unintuitive. This was further exacerbated by the lack of explanations for both the questions and the final determination, which limited users’ ability to alter their meal choices.

In the proposed work, we will address these limitations in two complementary ways. First, we will develop a more general, data-driven approach to learning appropriate follow-up questions that does not rely on manual engineering of the RL action and state spaces and is suitable for multiple nutritional goals. Second, we will address two more general challenges of RL-based CAs: aligning RL-based dialogs with human reasoning, and generating human-understandable explanations of RL inferences and recommendations.

Intellectual Merit

The intellectual merit of this research is two-fold: it will advance the state of the art in RL-based CAs for health coaching, and it will address general limitations of RL-based CAs that extend beyond the domain of health coaching. Specifically, the project will include:

Broader Impact

We anticipate that this research will be consequential to society at large in several ways. This research can pave the way for new, human-centered approaches to designing health coaching tools for diverse populations. We anticipate that conversational interfaces can lower entry barriers to engaging with technological interventions in health and wellness for diverse communities with different levels of education and experience with technology, and can reduce “intervention-generated inequalities” in health [17]. Furthermore, while we focus on nutrition coaching, new techniques for aligning RL with human reasoning and explaining its inferences and choices to users can extend the applicability of RL to a broader set of problems and domains.

More broadly, this research and educational plan will take important steps toward promoting human-centered approaches to data science, ML, and AI education, which can influence future research in this field. The plan includes the development of a new course on interaction paradigms for intelligent decision support; a summer training program on principles of human-centered AI for students in biomedical informatics and related fields; and a summer training program on human-centered design of data-driven systems for undergraduate and high school students, offered as part of the undergraduate research program at the Department of Biomedical Informatics at Columbia University.

References