Dr. Yazhe Li

Contact

yazhe.li14@gmail.com

Qualifications

PhD in Statistics, Imperial College London (2020)

MSc in Statistics, University of California Davis (2015)

BSc in Applied Math, Shanghai University (2014)

Research Interests

Statistical Modeling in Finance, Machine Learning and Consumer Credit Risk

Papers

Yazhe Li, Niall Adams and Tony Bellotti. A Relabeling Approach to Handling the Class Imbalance Problem for Logistic Regression. Journal of Computational and Graphical Statistics (accepted 2021). doi.org/10.1080/10618600.2021.1978470

Abstract: Logistic regression is a standard procedure for real-world classification problems. The challenge of class imbalance arises in two-class classification problems when the minority class is observed much less than the majority class. This characteristic is endemic in many domains. Work by Owen has shown that cluster structure among the minority class may be a specific problem in highly imbalanced logistic regression. In this paper, we propose a novel relabeling approach to handle the class imbalance problem when using logistic regression, which essentially assigns new labels to the minority class observations. An Expectation-Maximization algorithm is formalized to serve as a tool for efficiently computing this relabeling. Modeling on such relabeled data can lead to improved predictive performance. We demonstrate the effectiveness of this approach with detailed experiments on simulated and real data sets.

Yazhe Li, Tony Bellotti, Niall Adams. Issues using logistic regression with class imbalance, with a case study from credit risk modelling. Foundations of Data Science, 2019, 1 (4): 389-417. doi: 10.3934/fods.2019016

Abstract: The class imbalance problem arises in two-class classification problems, when the less frequent (minority) class is observed much less than the majority class. This characteristic is endemic in many problems such as modeling default or fraud detection. Recent work by Owen has shown that, in a theoretical context related to infinite imbalance, logistic regression behaves in such a way that all data in the rare class can be replaced by their mean vector to achieve the same coefficient estimates. We build on Owen's results to show the phenomenon remains true for both weighted and penalized likelihood methods. Such results suggest that problems may occur if there is structure within the rare class that is not captured by the mean vector. We demonstrate this problem and suggest a relabelling solution based on clustering the minority class. In a simulation and a real mortgage dataset, we show that logistic regression is not able to provide the best out-of-sample predictive performance and that an approach that is able to model underlying structure in the minority class is often superior.

Talks

Credit Scoring and Credit Control XV conference, Aug 2017, Edinburgh, UK

Machine learning performance over long time horizons

11th International Conference on Computational and Financial Econometrics, Dec 2017, London, UK

29th European Conference on Operational Research, Jul 2018, Valencia, Spain

A relabeling approach to handling the class imbalance problem in consumer credit risk modeling

Credit Scoring and Credit Control XVI conference, Aug 2019, Edinburgh, UK

Clustering Defaults into Different Groups to Improve Default Model Fit and Predictive Performance

I have also completed two internships, at BAML and CME Group. You can find my CV here.
