Invited Speakers

Bio: Byron is an assistant professor at Northeastern University in the Khoury College of Computer Sciences. He is the Director of the BS in Data Science program. He also holds an adjunct appointment at Brown University, where he is affiliated with the Center for Evidence Synthesis in Health. His research is in natural language processing and machine learning, with an emphasis on applications in health informatics. More broadly, he is interested in core machine learning and natural language processing issues, e.g., structured and unstructured classification techniques; neural models; semi-supervised learning methods; learning from imbalanced data; and learning from alternative forms of supervision. He tends to be most excited by interdisciplinary research that motivates technical questions by way of interesting applications.

Talk Abstract: Decisions about patient care should be supported by data. But most clinical evidence is stored as text and is therefore not readily accessible. The body of such unstructured evidence is already vast and continues to grow at a breakneck pace. Physicians are overwhelmed by this torrent of data, making it impossible to inform treatment decisions on the basis of all current relevant evidence. NLP methods offer a potential means of helping them make better use of this data to inform treatment decisions, ultimately improving patient care.

In this talk, I will discuss a line of work on designing and implementing NLP tasks, corpora, and models intended to assist physicians and other domain experts in navigating and making sense of the biomedical literature, ultimately to support the practice of evidence-based medicine. I will focus specifically on our recent work on building models for inferring the key results from free-text reports of clinical trials. In addition, I will discuss work on summarization models intended to generate narrative (natural language) summaries of all published evidence pertaining to a particular clinical question, on-demand. I will highlight the NLP challenges that this poses—most notably, ensuring that generated summaries remain factually accurate.

Bio: Hoifung Poon is the Senior Director of Biomedical NLP at Microsoft Research and an affiliated professor at the University of Washington Medical School. He leads Project Hanover, with the overarching goal of structuring medical data for precision medicine. He has given tutorials on this topic at top conferences such as the Association for Computational Linguistics (ACL) and the Association for the Advancement of Artificial Intelligence (AAAI). His research spans a wide range of problems in machine learning and natural language processing (NLP), and his prior work has been recognized with Best Paper Awards from premier venues such as the North American Chapter of the Association for Computational Linguistics (NAACL), Empirical Methods in Natural Language Processing (EMNLP), and Uncertainty in AI (UAI). He received his Ph.D. in Computer Science and Engineering from the University of Washington, specializing in machine learning and NLP.

Talk Abstract: The advent of big data promises to revolutionize medicine by making it more personalized and effective, but big data also presents a grand challenge of information overload. For example, tumor sequencing has become routine in cancer treatment, yet interpreting the genomic data requires painstakingly curating knowledge from a vast biomedical literature, which grows by thousands of papers every day. Electronic medical records contain valuable information that could speed up clinical trial recruitment and drug development, but curating such real-world evidence from clinical notes can take hours for a single patient. Natural language processing (NLP) can play a key role in interpreting big data for precision medicine. In particular, machine reading can help unlock knowledge from text by substantially improving curation efficiency. However, standard supervised methods require labeled examples, which are expensive and time-consuming to produce at scale. In this talk, I'll present Project Hanover, where we overcome the annotation bottleneck by combining deep learning with probabilistic logic, and by exploiting self-supervision from readily available resources such as ontologies and databases. This enables us to extract knowledge from millions of publications, reason efficiently with the resulting knowledge graph by learning neural embeddings of biomedical entities and relations, and apply the extracted knowledge and learned embeddings to support precision oncology.