Imbalance Trouble in Overparameterized Learning (NeurIPS 2022, 2021)
This presentation combines two of our papers related to learning from imbalanced data.
In the first half, we derive an exact characterization of the geometry of the last-layer features and classifiers of deep nets trained on label-imbalanced data.
In the second half, we present an algorithm that provably improves the generalization of models trained on imbalanced data.
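To make the kind of geometry being characterized concrete, below is a small illustrative probe, not the papers' result: it computes pairwise cosine similarities between globally centered class-mean feature vectors, one simple way to inspect how last-layer geometry shifts under label imbalance. The synthetic features and class counts are placeholder assumptions; in practice the features would come from a trained network.

```python
# Illustrative probe, not the papers' characterization: measure pairwise cosines
# between (globally centered) class-mean feature vectors.  The synthetic "features"
# and the long-tailed class counts below are placeholder assumptions; real features
# would come from the last layer of a trained deep net.
import numpy as np

def class_mean_cosines(features, labels, num_classes):
    """Pairwise cosines between globally centered class-mean feature vectors."""
    means = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
    means -= features.mean(axis=0)                     # subtract the global feature mean
    means /= np.linalg.norm(means, axis=1, keepdims=True)
    return means @ means.T

rng = np.random.default_rng(0)
k, dim = 4, 64
counts = [800, 400, 150, 50]                           # assumed long-tailed class sizes
labels = np.concatenate([np.full(c, i) for i, c in enumerate(counts)])
class_centers = 4.0 * rng.standard_normal((k, dim))
feats = class_centers[labels] + rng.standard_normal((len(labels), dim))
print(np.round(class_mean_cosines(feats, labels, k), 2))
```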
Phase Transitions for One-vs-One and One-vs-All Linear Separability in Multiclass Gaussian Mixtures (ICASSP 2021)
Abstract: We study a fundamental statistical question in multiclass classification: When are the data linearly separable? Unlike binary classification, linear separability in multiclass settings can be defined in different ways. Here, we focus on the so-called one-vs-one (OvO) and one-vs-all (OvA) notions of linear separability. We consider data generated from a Gaussian mixture model (GMM) in a linear high-dimensional asymptotic regime, in which the number of samples and the data dimension grow proportionally. In this setting, we prove that both OvO and OvA separability undergo a sharp phase transition as a function of the overparameterization ratio. We present precise formulae characterizing the phase transitions as functions of the data geometry and the number of classes. Existing results on binary classification follow as special cases of our new formulae. Numerical simulations verify the validity of the asymptotic predictions in finite dimensions.
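As a companion to the abstract, here is a minimal simulation sketch, not the paper's analysis: it draws multiclass GMM data and checks OvA linear separability via a feasibility linear program while sweeping the overparameterization ratio d/n, so the separable/non-separable transition can be observed empirically. The number of classes, the scaling of the class means, and the ratios are arbitrary choices for the demo.

```python
# Minimal simulation sketch (not the paper's analysis): check one-vs-all (OvA) linear
# separability of multiclass GMM data with a feasibility LP, sweeping the ratio d/n.
# The class count, mean scaling, and ratios below are assumptions made for the demo.
import numpy as np
from scipy.optimize import linprog

def ova_separable(X, y, cls):
    """Is there (w, b) with s_i * (w @ x_i + b) >= 1 for all i, where s_i = +1 iff y_i == cls?"""
    s = np.where(y == cls, 1.0, -1.0)
    n, d = X.shape
    A_ub = -s[:, None] * np.hstack([X, np.ones((n, 1))])   # -s_i * [x_i, 1] @ [w, b] <= -1
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=[(None, None)] * (d + 1), method="highs")
    return res.status == 0                                  # 0 = a separating (w, b) exists

rng = np.random.default_rng(0)
n, k = 100, 4
for ratio in [0.25, 0.5, 1.0, 2.0, 4.0]:                    # overparameterization ratio d/n
    d = int(ratio * n)
    means = rng.standard_normal((k, d))
    means /= np.linalg.norm(means, axis=1, keepdims=True)   # unit-norm class means
    y = rng.integers(0, k, size=n)
    X = 2.0 * means[y] + rng.standard_normal((n, d))        # Gaussian mixture samples
    sep = all(ova_separable(X, y, c) for c in range(k))
    print(f"d/n = {ratio:>4}:  OvA separable = {sep}")
```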
Analytic Study of Double Descent in Binary Classification: The Impact of Loss (ISIT 2020)
Abstract: Extensive empirical evidence reveals that, for a wide range of learning methods and datasets, the risk curve exhibits a double-descent (DD) trend as a function of the model size. In a recent paper [Deng, Kammoun, Thrampoulidis, 2019], the authors studied binary linear classification models and showed that the test error of gradient descent (GD) with logistic loss undergoes DD. In this paper, we complement those results by extending them to GD with square loss. We show that the DD phenomenon persists, but we also identify several differences compared to the logistic loss. This emphasizes that crucial features of DD curves (such as their transition threshold and global minima) depend on both the training data and the learning algorithm. We further study the dependence of the DD curves on the size of the training set. As in our earlier work, our results are analytic: we plot the DD curves by first deriving sharp asymptotics for the test error under Gaussian features. Albeit simple, the models permit a principled study of DD features, the outcomes of which theoretically corroborate related empirical findings occurring in more complex learning tasks.
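For intuition, here is a minimal simulation sketch, not the paper's sharp asymptotics: it fits a minimum-norm least-squares (square-loss) classifier on Gaussian features while sweeping the model size d past the interpolation threshold d = n, which produces a double-descent-shaped test-error curve (a peak near d = n followed by a second descent). The data model with decaying ground-truth coefficients and all parameter choices are assumptions made only for this illustration.

```python
# Minimal simulation sketch (not the paper's analytic formulas): test error of the
# minimum-norm least-squares (square-loss) fit on Gaussian features as the model size d
# sweeps past the interpolation threshold d = n, showing a double-descent-shaped curve.
# The sign-label data model, the decaying signal, and all parameters are demo assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, p, n_test, trials = 100, 800, 2000, 10                # samples, total features, test size, repeats
model_sizes = [20, 50, 80, 95, 110, 150, 300, 500, 800]  # number of features d used by the model

w_star = 1.0 / np.arange(1, p + 1)                       # decaying ground-truth coefficients
w_star /= np.linalg.norm(w_star)

for d in model_sizes:
    errs = []
    for _ in range(trials):
        X = rng.standard_normal((n, p))
        y = np.sign(X @ w_star)                          # binary labels from the full model
        w_hat = np.linalg.pinv(X[:, :d]) @ y             # least squares (min-norm when d > n)
        X_te = rng.standard_normal((n_test, p))
        pred = np.sign(X_te[:, :d] @ w_hat)
        errs.append(np.mean(pred != np.sign(X_te @ w_star)))
    print(f"d/n = {d / n:>4.2f}:  test error = {np.mean(errs):.3f}")
```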
Master's thesis: Bounds, Constructions and Implementation of Codes for Distributed Storage (ISIT 2017, USENIX FAST 2018, NCC 2018)