Mining the biomedical literature might be even harder than we thought...
Dr. David Lipman is the Director of the National Center for Biotechnology Information (NCBI), a major R&D division of the National Library of Medicine within the National Institutes of Health (NIH). He was appointed as NCBI’s first Director in 1989, shortly after Congress created the Center in 1988, and has overseen its growth into one of the most heavily used resources in the world for the search and retrieval of biomedical information, with almost 3 million users each day. NCBI has a leadership role in conducting basic research in computational molecular biology and in storing, annotating and making accessible biomedical information and genetic data emanating from research conducted at NIH and laboratories around the world. Among NCBI’s approximately 100 databases are GenBank (DNA sequences), PubMed (abstracts and citations of published biomedical literature), PubMed Central (full text of biomedical research articles) and dbGaP (Genome-Wide Association Studies and other phenotype and genotype data). A native of Rochester, New York, Dr. Lipman obtained a B.A. in Biology from Brown University in 1976 and an M.D. from the State University of New York at Buffalo in 1980. After medical training, Dr. Lipman joined the Mathematical Research Branch of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) at NIH as a Research Fellow, studying molecular evolution and developing computational tools for sequence comparison. Dr. Lipman is one of the developers of the original BLAST (Basic Local Alignment Search Tool) algorithm for rapidly identifying biological sequences that are similar to a queried sequence. Dr. Lipman is the recipient of numerous awards and is an elected member of the National Academy of Sciences, the Institute of Medicine, the American Academy of Arts and Sciences, and the American College of Medical Informatics.
Probabilistic Approaches for Discovery and Prediction from Clinical Temporal Data
Physiological data are routinely recorded in intensive care, but their use for rapid assessment of illness severity has been limited. The data are high-dimensional, noisy, and rapidly changing; moreover, small changes that occur in a patient's physiology over long periods of time are difficult to detect, yet can lead to catastrophic outcomes. A physician’s ability to recognize complex patterns across these high-dimensional measurements is limited. We propose nonparametric Bayesian methods for discovering informative representations in such continuous time series that aid both exploratory data analysis and feature construction. When applied to data from premature infants in the neonatal ICU (NICU), our model yields novel clinical insights. Based on these insights, we devised the Physiscore, a novel risk prediction score that combines patterns from continuous physiological signals to predict which infants are at risk of developing major complications in the NICU. Using only 3 hours of non-invasive data from birth, Physiscore accurately predicts morbidity in preterm infants. Physiscore performed consistently better than other neonatal scoring systems, including the Apgar, which is the current standard of care, and SNAP, a statistical score that requires multiple invasive tests. This work was published on the cover of Science Translational Medicine (Science's new journal aimed at translational medicine work), and was covered by numerous press sources.
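The abstract does not give the Physiscore model itself (which builds on nonparametric Bayesian representations of the signals). As a toy illustration of the general idea of combining evidence from several physiological signals into a single probabilistic risk score, the sketch below turns per-feature Gaussian likelihood ratios into a posterior probability via logistic (log-odds) combination. All feature names, distribution parameters, and the naive-Bayes-style independence assumption are hypothetical, not the published method.

```python
import math


def signal_evidence(value, mu_healthy, sd_healthy, mu_sick, sd_sick):
    """Log-likelihood ratio of one feature under 'sick' vs. 'healthy'
    Gaussian models (parameters here are purely illustrative)."""
    def log_gauss(x, mu, sd):
        return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2)
    return log_gauss(value, mu_sick, sd_sick) - log_gauss(value, mu_healthy, sd_healthy)


def risk_score(features, params, prior_log_odds=0.0):
    """Combine per-signal evidence (assumed independent, naive-Bayes style)
    into a posterior probability of morbidity via the logistic function."""
    log_odds = prior_log_odds + sum(
        signal_evidence(features[name], *params[name]) for name in params
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical parameters: (mu_healthy, sd_healthy, mu_sick, sd_sick)
params = {"heart_rate_variability": (0.5, 0.1, 0.3, 0.1)}

# Low variability looks more like the 'sick' model, so it scores higher.
low_risk = risk_score({"heart_rate_variability": 0.5}, params)
high_risk = risk_score({"heart_rate_variability": 0.3}, params)
```

The point of the sketch is only the structure: each continuous signal contributes additive evidence on the log-odds scale, so heterogeneous measurements can be fused into one calibrated probability.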
Suchi Saria recently joined Johns Hopkins as an Assistant Professor in the departments of Computer Science and Health Policy & Management. She received her PhD in Computer Science from Stanford University, working with Prof. Daphne Koller. She has won various awards, including a Best Student Paper award, a Best Student Paper finalist award, the Rambus Fellowship, a Microsoft full scholarship, and the National Science Foundation Computing Innovation Fellowship. Her research interests include inference and prediction in heterogeneous, high-dimensional dynamical systems, graphical models, machine learning, and computational healthcare. She develops novel ways to capture and analyze our interactions with the health care system to help identify ways to improve the delivery of care. Her research has been featured on the cover of Science Translational Medicine (AAAS/Science press) and in national and international press outlets including CBS Radio, Science NOW, and France's national newspaper Le Monde.
The TREC Medical Records Track
The Text REtrieval Conference (TREC) (http://trec.nist.gov) is an ongoing workshop series designed to create the infrastructure needed for large-scale evaluation of search and other information access technologies. TREC's Medical Records track, now concluding its second year, focuses on the problem of providing access to the information contained in the free-text notes fields of electronic health records (EHRs). While these "unstructured" fields contain the bulk of the information in an EHR, standard text processing techniques do not work well for these fields due to their specialized vocabulary and elliptical discourse structure. New techniques that support matching on semantic content within EHRs will enhance clinical care and greatly expand the usefulness of electronic records in areas such as medical trials and epidemiological studies.
Ellen Voorhees is a computer scientist in the Information Access Division of the National Institute of Standards and Technology (NIST). Her primary responsibility at NIST is to manage the Text REtrieval Conference (TREC) project. She received a B.Sc. in computer science from the Pennsylvania State University, and M.Sc. and Ph.D. degrees in computer science from Cornell University. Her research interests include information retrieval and natural language processing, especially developing appropriate evaluation schemes to measure system effectiveness.
Lost in Publication: Application of Text Mining to Information Access and Database Curation
The explosion of biomedical information in the past decade or so has created new opportunities for discoveries to improve the treatment and prevention of human diseases. But the large body of knowledge—mostly captured as free text in journal articles—and the interdisciplinary nature of biomedical research also present a grand new challenge: how can scientists and health care professionals find and assimilate all the publications relevant to their research and practice? In the first part of the talk, I will present our research on text mining and its application to improved information access, not only for the worldwide scientific community but also for the rapidly growing number of health consumers on the Internet. Real-world use cases of text mining research in two widely accessed websites (PubMed and PubMed Health) will be demonstrated as showcases. Next, I will present our effort on machine-assisted database curation, with a focus on our recent experience in BioCreative, a community-based worldwide challenge event in biomedical text mining. More specifically, I will discuss various issues (e.g., how to identify user needs) that are often overlooked in building text-mining tools aimed at providing practical benefits to end users (human curators).
Dr. Lu is one of the first Earl Stadtman investigators at the National Institutes of Health, which he joined immediately after earning a PhD in Biomedical Informatics at the University of Colorado School of Medicine. The primary goal of his research is to develop computational methods to better understand the natural language in biomedical text. Currently, his lab focuses on applying its text-mining methods to improving information access, assisting manual database curation, and accelerating drug discovery. Dr. Lu is involved in the organization of several international scientific meetings, including BioCreative, the Pacific Symposium on Biocomputing (PSB), and the International Conference on Healthcare Informatics (ICHI).