Rationale and Objectives


Advances in medical technology have greatly increased the amount of available information (extensive electronic health records documenting patient conditions, diagnostic tests, labs, imaging exams, genomics, proteomics, treatments, outcomes etc.) that is often relevant for clinical decision support. These advances have enormous potential for creating predictive models geared toward improving diagnostic and prognostic accuracy as well as therapy selection. However, a larger amount of available information comes with a larger risk of data overload and suboptimal use of that information: more data does not automatically translate into better diagnosis or treatment selection. Hence, there is a need for clinically motivated predictive models and data mining algorithms that extract the key, actionable information from large amounts of data in order to improve patient outcomes.

Along with this explosion of information, there has recently been a paradigm shift from evidence-based medicine to personalized medicine. Previously, optimal therapy was selected at the population level. For example, if a patient belonged to a homogeneous category such as T2-stage, node-negative, non-metastatic non-small cell lung cancer, the best treatment was chosen on the basis of clinical trials comparing the various medications on that same population, and treatment was historically identical for all members of the cohort. While this approach was developed to exploit the statistical power of a sufficiently large sample of relatively homogeneous patients, it ignores the heterogeneity of individuals within the cohort. It is slowly being replaced by personalized predictive models that use all available information about each patient (exams, demographics, imaging, labs, genomics etc.) to identify the optimal therapy in an individualized manner. This approach can improve outcomes because it exploits more detailed patient information to reduce uncertainty in predicting patient outcomes as a function of treatment.

Personalized predictive modeling finds applications in preventive care, diagnosis, therapy selection and monitoring. For example: a) predicting which patients are at risk of developing hypertension and preventing its onset with appropriate intervention (medications, diet, lifestyle changes etc.); b) improving the early detection of cancer in asymptomatic patients; c) selecting the optimal chemotherapy or radiation dosage, or other therapy parameters, based on patient characteristics. Chemotherapy is expensive, has severe side effects, and often works for fewer than 50% of the patients treated with it; identifying the subset of patients who can benefit from it reduces costs and improves the efficacy of treatment. d) predicting patient response to a given medication and/or treatment: the outcomes of therapy often manifest too late. For instance, the outcomes of chemo-radiation therapy in patients with non-small cell lung cancer may take many months to become apparent. By monitoring surrogate markers, one may be able to predict poor outcomes early and modify the therapy plan. Likewise, by predicting patient response and the adequate dosage for a given medication, adverse drug side effects can be avoided. A good example is the recent work of the International Warfarin Pharmacogenetics Consortium (see references) on estimating the warfarin dose from clinical and pharmacogenetic data.

Need for predictive models

This paradigm shift is enabled by, and depends on, predictive methods that do not draw their statistical power from large homogeneous patient cohorts. Using technologies similar to collaborative filtering, one may be able to combine two pieces of information to predict outcomes with high statistical power: a) detailed data about each patient and b) the similarity of each patient to others in the cohort. For example, recommender systems (such as book recommendation on Amazon) use information about a) the feature descriptors of each book, b) each user's preferences based on their profile and c) similarities in preferences across clusters of users in order to recommend additional items. Similar technologies could be of tremendous benefit in personalized medicine, yet the practical utility of this modeling approach has not been well addressed by either the machine learning or the personalized medicine community. Oncotype DX is a laboratory test that supports therapy selection for breast cancer patients using a predictive model that combines expression scores of different genes measured in a tissue sample. Even this product ignores other information such as imaging and clinical history; incorporating all of this information into the models could further improve patient outcomes. More recently, this direction has gained significant traction, and researchers have begun designing predictive models targeted at specific clinical problems (see references). However, these problems are not yet fully understood, and the models are far from mature.
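To make the collaborative-filtering analogy above concrete, the following is a minimal sketch of one simple neighborhood-based variant, assuming each patient is summarized by a standardized numeric feature vector and a scalar outcome score (higher is better). The predicted outcome of a treatment for a new patient is the similarity-weighted average outcome of the k most similar past patients who received that treatment, and the treatment with the best prediction is recommended. The function names, the cosine similarity measure, the value of k and the toy cohort are all hypothetical illustrations, not data or a method taken from this proposal.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical record: (standardized feature vector, treatment label, observed outcome score).
Record = Tuple[List[float], str, float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_outcome(patient: List[float], cohort: List[Record],
                    treatment: str, k: int = 3) -> float:
    """Similarity-weighted mean outcome of the k most similar past patients given `treatment`."""
    scored = sorted(
        ((cosine_similarity(patient, features), outcome)
         for features, treated_with, outcome in cohort if treated_with == treatment),
        reverse=True,
    )[:k]
    # Keep only positively similar neighbors so that all weights are non-negative.
    neighbors = [(s, y) for s, y in scored if s > 0.0]
    if not neighbors:
        raise ValueError(f"no similar patients received {treatment!r}")
    total = sum(s for s, _ in neighbors)
    return sum(s * y for s, y in neighbors) / total


def recommend(patient: List[float], cohort: List[Record],
              treatments: List[str], k: int = 3) -> Tuple[str, Dict[str, float]]:
    """Return the treatment with the highest predicted outcome, plus all predictions."""
    predictions = {t: predict_outcome(patient, cohort, t, k) for t in treatments}
    best = max(predictions, key=predictions.get)
    return best, predictions


# Toy, made-up cohort purely for illustration (not clinical data).
cohort: List[Record] = [
    ([1.0, 0.5, -0.2], "A", 0.8),
    ([0.9, 0.4, 0.0], "A", 0.7),
    ([-1.0, -0.3, 0.5], "A", 0.2),
    ([1.1, 0.6, -0.1], "B", 0.3),
    ([-0.8, -0.5, 0.4], "B", 0.9),
    ([-1.2, -0.2, 0.6], "B", 0.8),
]
new_patient = [1.0, 0.4, -0.1]
best, predictions = recommend(new_patient, cohort, ["A", "B"])
print(best, predictions)
```

Discarding neighbors with non-positive similarity keeps the weighted average well defined; richer models used in recommender systems, such as matrix factorization, could replace this neighborhood rule.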

Purpose of Workshop

The purpose of this cross-disciplinary workshop is to bring together machine learning and healthcare researchers interested in problems and applications of predictive models in the field of personalized medicine. The goal of the workshop is to bridge the gap between the theory of predictive models and the applications and needs of the healthcare community. The workshop will foster the exchange of ideas, the identification of important and challenging applications, and the discovery of possible synergies. Ideally, this will spur discussion and collaboration between the two disciplines and result in collaborative grant submissions. The emphasis will be on the mathematical and engineering aspects of predictive models and how they relate to practical medical problems.

Although related in a broad sense, the workshop does not directly overlap with the fields of bioinformatics and biostatistics. Predictive modeling for healthcare has been explored by biostatisticians for several decades, but this workshop focuses on substantially different needs and problems that are better addressed by modern machine learning technologies. For example, how should we organize clinical trials to validate the clinical utility of predictive models for personalized therapy selection? The traditional biostatistical approach of running trials on a large cohort of homogeneous patients would not suffice for the new paradigm, and new methods are needed. Bioinformatics, on the other hand, typically deals with the analysis of genomic and proteomic data to answer questions relevant to basic science, such as identifying the sequences in the genome that correspond to genes or identifying gene regulatory networks. This workshop does not focus on issues of basic science; rather, we focus on predictive models that combine all available patient data (including imaging, pathology, labs, genomics etc.) to improve point-of-care decision making.

More recently, as part of the American Recovery and Reinvestment Act (ARRA), the US government set aside significant grant funds for cross-disciplinary research on the use of information technology to improve health outcomes, quality of care and therapy selection.

The workshop program will consist of presentations by invited speakers from both machine learning and personalized medicine fields and by authors of extended abstracts submitted to the workshop. In addition, there will be a slot for a panel discussion to identify important problems, applications and synergies between the two scientific disciplines.