Li Yao

I was born and raised in Baotou, China.

I got my Master's degree in Machine Learning and Data Mining from Aalto University, Finland.

In 2017, I obtained my PhD from the University of Montreal, Canada.

Ph.D. supervisors: Prof. Yoshua Bengio and Prof. Aaron Courville

Dissertation: Learning Visual Representations for Image Generation and Video Captioning

External examiner: Prof. Sanja Fidler

Since March 2021, I have been a Senior Machine Learning Scientist at Hyperfine Research. At Hyperfine, we take a truly end-to-end approach to medical imaging with our market-defining devices, a promising path in my quest to make radiology better with AI.

Previously (2017 - 2021), I was Principal Research Scientist at Enlitic in San Francisco, USA, where the solution was solely software-driven, with broad coverage across radiology.

My Github

My LinkedIn

A.I. and Medical Imaging

On the diminishing return of labeling clinical reports

Jean-Baptiste Lamare, Tobi Olatunji, Li Yao

EMNLP Clinical NLP Workshop (accepted, best paper award), 2020 [pdf]

Ample evidence suggests that, on natural language processing (NLP) problems from non-medical domains, better machine learning models can be steadily obtained by training on increasingly larger datasets. Whether the same holds true for medical NLP has so far not been thoroughly investigated. This work shows that this is not always the case. We reveal the somewhat counter-intuitive observation that performant medical NLP models may be obtained with a small amount of labeled data, quite the opposite of the common belief, most likely due to the domain specificity of the problem. We show quantitatively the effect of training data size on a fixed test set composed of two of the largest public chest x-ray radiology report datasets on the task of abnormality classification. The trained models not only make efficient use of the training data, but also outperform the current state-of-the-art rule-based systems by a significant margin.

An empirical study on machine learning model calibration for chest radiograph triage

Li Yao, Tobi Olatunji, Jean-Baptiste Lamare, and Ashwin Jadhav

Tech Report, 2020 [pdf]

Despite ongoing efforts to improve model performance, capturing model uncertainty remains a major challenge in medical imaging, and this important subject has been largely ignored by the community. This work borrows standard model calibration approaches and empirically demonstrates their effectiveness on medical imaging triage with labels automatically extracted by different natural language processing techniques. We show both the strengths and weaknesses of three different calibration methods using two sets of NLP labels. The tests are conducted on human-labelled ground truth. Although all methods yield comparable results, our proposed approach further improves AUCs when paired with a strong NLP model that generates smooth labels.
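For readers unfamiliar with what "calibration" measures, here is a minimal sketch of expected calibration error (ECE) for a binary classifier. The abstract above does not name the metric or the three calibration methods used, so the choice of ECE, the function name, and the equal-width binning are all assumptions made for illustration only.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE sketch (assumed metric, not taken from the report).

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : ground-truth labels, each 0 or 1
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins over [0, 1]; p == 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_prob = sum(p for p, _ in b) / len(b)   # average predicted probability
        frac_pos = sum(y for _, y in b) / len(b)    # observed positive frequency
        # Weight each bin's gap by the share of samples it holds.
        ece += (len(b) / total) * abs(mean_prob - frac_pos)
    return ece
```

A well-calibrated model has a small gap in every bin; a model that predicts 0.9 on cases that are positive only half of the time shows a large one.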

Learning to estimate label uncertainty for automatic radiology report parsing

Tobi Olatunji, Li Yao

Med-NeurIPS (accepted), 2019 [pdf]

Bootstrapping labels from radiology reports has become a scalable alternative for providing inexpensive ground truth for medical imaging. Because of the domain-specific nature of the task, state-of-the-art report labeling tools are predominantly rule-based. These tools, however, typically yield a binary 0 or 1 prediction that indicates the presence or absence of abnormalities. These hard targets are then used as ground truth to train image models downstream, forcing models to express a high degree of certainty even on cases where specificity is low. This could negatively impact the statistical efficiency of image models. We address this issue by training a Bidirectional Long Short-Term Memory Network to augment heuristic-based discrete labels of X-ray reports from all body regions and achieve performance comparable to or better than domain-specific NLP, with additional uncertainty estimates that enable finer downstream image model training.

Analysis of focal loss with noisy labels

Li Yao, Ashwin Jadhav

Workshop contribution for Medical Imaging at NeurIPS, 2019 [pdf]

Despite rapid progress in interpreting perceptual data, machine learning models have had relatively limited success in medical imaging, arguably due to the costly and sometimes ambiguous nature of obtaining reliable medical labels. The recent trend has been to derive weak and often noisy labels from medical reports via automated natural language processing. We demonstrate, both in theory and in experiments, the negative impact of noisy labels on the most commonly used family of cost functions for classification. Furthermore, we propose a simple remedy for label correction that minimizes this impact, enabling more robust model training.
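As a rough illustration of why label noise can interact badly with this kind of loss, the sketch below implements the standard binary focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t), and compares the loss on a confidently predicted example before and after its label is flipped. This is one intuition only, not the paper's analysis or its proposed remedy; the helper `flipped_to_clean_ratio` is a hypothetical name introduced here.

```python
import math

def focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Binary focal loss for predicted positive-class probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p      # probability assigned to the given label
    p_t = min(max(p_t, eps), 1.0)       # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def flipped_to_clean_ratio(p, gamma):
    """Loss when a true-positive label is flipped to 0, relative to the clean loss.

    Hypothetical helper for illustration; gamma = 0 recovers plain cross-entropy.
    """
    return focal_loss(p, 0, gamma) / focal_loss(p, 1, gamma)

if __name__ == "__main__":
    p = 0.9  # model is confident the example is positive
    print("cross-entropy ratio:", flipped_to_clean_ratio(p, gamma=0.0))
    print("focal (gamma=2) ratio:", flipped_to_clean_ratio(p, gamma=2.0))
```

Relative to cross-entropy, the focal ratio is larger by a factor of (p / (1 - p))^gamma, so a mislabeled example that the model predicts confidently receives far more relative weight than its correctly labeled neighbours.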

Caveats in Generating Medical Imaging Labels from Radiology Reports

Tobi Olatunji, Li Yao, Ben Covington, Alexander Rhodes, Anthony Upton

Workshop contribution for Medical Imaging with Deep Learning (MIDL), 2019 [pdf]

Acquiring high-quality annotations in medical imaging is usually a costly process. Automatic label extraction with natural language processing (NLP) has emerged as a promising workaround to bypass the need for expert annotation. Despite the convenience, the limitations of such an approximation have not been carefully examined and are not well understood. With a challenging set of 1,000 chest X-ray studies and their corresponding radiology reports, we show that there exists a surprisingly large discrepancy between what radiologists visually perceive and what they clinically report. Furthermore, with inherently flawed reports as ground truth, the state-of-the-art medical NLP fails to produce high-fidelity labels.

A Strong Baseline for Domain Adaptation and Generalization in Medical Imaging

Li Yao, Jordan Prosky, Ben Covington, Kevin Lyman

Workshop contribution for Medical Imaging with Deep Learning (MIDL), 2019 [pdf]

This work provides a strong baseline for the problem of multi-source multi-target domain adaptation and generalization in medical imaging. Using a diverse collection of ten chest X-ray datasets, we empirically demonstrate the benefits of training medical imaging deep learning models on varied patient populations for generalization to out-of-sample domains.

Efficient and Accurate Abnormality Mining from Radiology Reports with Customized False Positive Reduction

Nithya Attaluri, Ahmed Nasir, Carolynne Powe, Harold Racz, Ben Covington, Li Yao, Jordan Prosky, Eric Poblenz, Tobi Olatunji, Kevin Lyman

arXiv preprint 2018 [pdf]

Obtaining datasets labeled to facilitate model development is a challenge for most machine learning tasks. The difficulty is heightened for medical imaging, where data itself is limited in accessibility and labeling requires costly time and effort by trained medical specialists. Medical imaging studies, however, are often accompanied by a medical report produced by a radiologist, identifying important features on the corresponding scan for other physicians not specifically trained in radiology. We propose a methodology for approximating image-level labels for radiology studies from associated reports using a general purpose language processing tool for medical concept extraction and sentiment analysis, and simple manually crafted heuristics for false positive reduction. Using this approach, we label more than 175,000 Head CT studies for the presence of 33 features indicative of 11 clinically relevant conditions. For 27 of the 30 keywords that yielded positive results (3 had no occurrences), the lower bound of the confidence intervals created to estimate the percentage of accurately labeled reports was above 85%, with the average being above 95%. Though noisier than manual labeling, these results suggest this method to be a viable means of labeling medical images at scale.

Weakly Supervised Medical Diagnosis and Localization from Multiple Resolutions

Li Yao, Jordan Prosky, Eric Poblenz, Ben Covington, Kevin Lyman

arXiv preprint 2018 [pdf]

Diagnostic imaging often requires the simultaneous identification of a multitude of findings of varied size and appearance. Beyond global indication of said findings, the prediction and display of localization information improves trust in and understanding of results when augmenting clinical workflow. Medical training data rarely includes more than global image-level labels as segmentations are time-consuming and expensive to collect. We introduce an approach to managing these practical constraints by applying a novel architecture which learns at multiple resolutions while generating saliency maps with weak supervision. Further, we parameterize the Log-Sum-Exp pooling function with a learnable lower-bounded adaptation (LSE-LBA) to build in a sharpness prior and better handle localizing abnormalities of different sizes using only image-level labels. Applying this approach to interpreting chest x-rays, we set the state of the art on 9 abnormalities in the NIH's CXR14 dataset while generating saliency maps with the highest resolution to date.
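The pooling idea can be sketched in a few lines. Log-Sum-Exp pooling over a saliency map S with sharpness r is (1/r) * log(mean(exp(r * S))); it approaches average pooling as r goes to 0 and max pooling as r grows. The sketch below assumes the lower-bounded adaptation takes the form r = r0 + exp(beta), with beta a learnable scalar and r0 a fixed lower bound; that parameterization, the function names, and the default value of r0 are assumptions for illustration, not details stated in the abstract.

```python
import math

def lse_pool(scores, r):
    """Log-Sum-Exp pooling of a 2D score map (list of rows) with sharpness r > 0."""
    flat = [s for row in scores for s in row]
    m = max(flat)
    # Subtract the max before exponentiating for numerical stability.
    mean_exp = sum(math.exp(r * (s - m)) for s in flat) / len(flat)
    return m + math.log(mean_exp) / r

def lse_lba_pool(scores, beta, r0=0.0):
    """LSE pooling with an assumed lower-bounded sharpness r = r0 + exp(beta).

    beta would be learned during training; since exp(beta) > 0, r never drops to r0 or below.
    """
    r = r0 + math.exp(beta)
    return lse_pool(scores, r)

if __name__ == "__main__":
    saliency = [[0.0, 1.0], [2.0, 3.0]]
    for beta in (-4.0, 0.0, 4.0):
        print(beta, lse_lba_pool(saliency, beta))
```

Small beta values pool close to the mean of the map, which suits large, diffuse findings, while large values pool close to the maximum, which suits small, focal ones.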

Learning to diagnose from scratch by exploiting dependencies among labels

Li Yao, Eric Poblenz, Dmitry Dagunts, Ben Covington, Devon Bernard, Kevin Lyman

arXiv preprint 2017 [pdf]

The field of medical diagnostics contains a wealth of challenges which closely resemble classical machine learning problems; practical constraints, however, complicate the translation of these endpoints naively into classical architectures. Many tasks in radiology, for example, are largely problems of multi-label classification wherein medical images are interpreted to indicate multiple present or suspected pathologies. Clinical settings drive the necessity for high accuracy simultaneously across a multitude of pathological outcomes and greatly limit the utility of tools which consider only a subset. This issue is exacerbated by a general scarcity of training data and maximizes the need to extract clinically relevant features from available samples -- ideally without the use of pre-trained models which may carry forward undesirable biases from tangentially related tasks. We present and evaluate a partial solution to these constraints in using LSTMs to leverage interdependencies among target labels in predicting 14 pathologic patterns from chest x-rays and establish state of the art results on the largest publicly available chest x-ray dataset from the NIH without pre-training. Furthermore, we propose and discuss alternative evaluation metrics and their relevance in clinical practice.

General Machine Learning, NLP, Computer Vision, Deep Learning

Delving Deeper into Convolutional Networks for Learning Video Representations

Nicolas Ballas, Li Yao, Chris Pal, Aaron Courville

International Conference on Learning Representations 2016 [pdf]

We introduced a way to integrate representations from multiple layers of a ConvNet without blowing up the number of parameters. New SOTA in video captioning and competitive results on action recognition without using C3D.

Empirical upper bounds for image and video captioning

Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio

International Conference on Learning Representations 2016 (workshop) [pdf]

Oracle Performance for Visual Captioning

Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio

British Machine Vision Conference (BMVC) 2016 (Oral) [pdf]

In light of recent progress in image and video captioning, this work constructs trainable, model-based performance upper bounds on different datasets.

Describing Videos by Exploiting Temporal Structure

Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville

International Conference on Computer Vision (ICCV) 2015 [pdf] [code]


State-of-the-art results on Youtube2Text for video-to-caption generation, with a 3D ConvNet + LSTM.

GSNs: Generative Stochastic Networks

Guillaume Alain, Yoshua Bengio, Li Yao, Jason Yosinski, Eric Thibodeau-Laufer, Saizheng Zhang, Pascal Vincent

submitted to JMLR [pdf]

A unified framework of GSNs, a new family of generative models in deep learning.

Iterative Neural Autoregressive Distribution Estimator (NADE-k)

T Raiko, L Yao, K Cho, Y Bengio

Neural Information Processing Systems (NIPS) 2014 [pdf] [code]

How we push a Deep Orderless NADE further to reach state-of-the-art log-likelihood on both MNIST and Caltech-101.

On the Equivalence Between Deep NADE and Generative Stochastic Networks

L Yao, S Ozair, K Cho, Y Bengio

European Conference on Machine Learning ECML/PKDD 2014 [pdf]

How is a GSN mathematically equivalent to a Deep Orderless NADE? We find out.

Generalized denoising auto-encoders as generative models

Y Bengio, L Yao, G Alain, P Vincent

Neural Information Processing Systems (NIPS) 2013 [pdf] [code]

You think denoising autoencoders are not generative models? Think again :)

Bounding the test log-likelihood of generative models

Y Bengio, L Yao, G Alain, P Vincent

International Conference on Learning Representations (ICLR) 2013 [pdf]

Generative stochastic networks do not have an analytical solution to the log-likelihood of the data. We show that it is possible to bound it.

Multimodal Transitions for Generative Stochastic Networks

S Ozair, L Yao, Y Bengio

NIPS Deep learning workshop, 2013 [pdf]

Generative stochastic networks made better by using a NADE model in the reconstruction conditional!

Stacked calibration of off-policy policy evaluation for video game matchmaking

E Laufer, RC Ferrari, L Yao, O Delalleau, Y Bengio

IEEE Conference on Computational Intelligence in Games (CIG), 2013 [pdf]

Neural networks in Ubisoft games! See how players are matched together with neural networks.