Research Overview

My research develops principled and practical solutions for learning from modern data that are large-scale, corrupted, imbalanced, and insufficiently labeled.

Alternative Deep Methods

Existing deep methods are end-to-end learning systems that stack standardized operators into hierarchical structures, without much data-specific design. Our recent work explores new ways to derive neural architectures and training algorithms from explicit low-dimensional modeling of the data.

  • Learning via Rate Reduction. We construct a neural network by emulating the gradient ascent scheme for optimizing the rate reduction objective (an information-theoretic measure for learning diverse and discriminative representations [NeurIPS'20b]); see the sketch of the objective after this list. All components of this “white box” network have precise optimization, statistical, and geometric interpretations [Arxiv'20] (see [JMLR'21] for a full version). As a particular example, we demonstrate that this network does not suffer from catastrophic forgetting in continual learning [CVPR'21a].

  • Learning via Sparse Coding. We explore sparse convolutional modeling as a building block of deep neural networks, and demonstrate that such networks can be made robust to input perturbations in a principled manner [NeurIPS'22a]; a toy sketch of such a layer is given after this list.

  • Learning via Self-expression. (coming soon)
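
To make the rate reduction bullet concrete, the maximal coding rate reduction (MCR^2) objective of [NeurIPS'20b] scores a set of features by the coding rate of all features minus the average coding rate of each class; the network of [Arxiv'20, JMLR'21] is obtained by unrolling gradient ascent on this objective. The numpy sketch below is only an illustration of the objective itself (not of the network construction), and the function names and default eps are ours.

    import numpy as np

    def coding_rate(Z, eps=0.5):
        # R(Z) = 1/2 * logdet(I + d/(m * eps^2) * Z Z^T), with features as the columns of Z (d x m).
        d, m = Z.shape
        return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (m * eps ** 2)) * Z @ Z.T)[1]

    def rate_reduction(Z, labels, eps=0.5):
        # Delta R = R(Z) - sum_j (m_j / m) * R(Z_j): expand all features jointly while
        # compressing the features of each class; labels is a 1-d array of class indices.
        m = Z.shape[1]
        R_within = sum(
            (np.sum(labels == c) / m) * coding_rate(Z[:, labels == c], eps)
            for c in np.unique(labels)
        )
        return coding_rate(Z, eps) - R_within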
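
For the sparse coding bullet, the building block is, roughly, a layer whose output approximately solves a sparse coding problem, so the representation must reconstruct its input through a learned dictionary. The toy ISTA solver below uses an ordinary (non-convolutional) dictionary D purely for illustration; the networks in [NeurIPS'22a] use convolutional dictionaries and more efficient unrolled solvers, and all names and parameters here are ours.

    import numpy as np

    def soft_threshold(v, tau):
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def sparse_coding_layer(x, D, lam=0.1, n_iter=50):
        # Toy "layer": return z that approximately solves
        #     min_z 0.5 * ||x - D z||^2 + lam * ||z||_1
        # via ISTA; the code z (not the reconstruction D z) is passed to the next layer.
        step = 1.0 / (np.linalg.norm(D, 2) ** 2)   # 1 / Lipschitz constant of the smooth part
        z = np.zeros(D.shape[1])
        for _ in range(n_iter):
            z = soft_threshold(z + step * (D.T @ (x - D @ z)), lam * step)
        return z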

Deep & Over-Parameterized Methods

The contemporary practice of deep learning challenges the wisdom and approaches of classical machine learning. Our work leverages the intrinsic structure of the data to develop theoretical understanding and practical algorithms for modern deep learning (and over-parameterized methods in general).

  • Generalization (Failure of Classical Wisdom). Classical wisdom predicts that the generalization error due to model variance increases monotonically with model size. In contrast, modern practice suggests that the performance of deep models improves with network width. We bridge this discrepancy by establishing the unimodal variance principle for the generalization of over-parameterized neural networks [ICML'20a]; see the bias-variance decomposition after this list. This may provide guidelines for the design of more generalizable deep models.

  • Robustness (Failure of Classical Approaches). Error correction via a robust loss is a canonical approach (see Logan '65, Candes-Tao '05, etc.) with rigorous mathematical justifications and broad applications. Such an approach fails for over-parameterized models, since they easily overfit to the corruptions. We develop a double over-parameterization model that leverages the implicit bias of discrepant learning rates to prevent overfitting, with theoretical guarantees [NeurIPS'20a, ICML'22]; a toy sketch is given after this list.

  • Architecture (Emergence of New Problems). The development of deep learning gives rise, perhaps for the first time in history, to the need to train models with more than one or a few layers. This challenge is currently tackled by a variety of tricks and heuristics in the design of linear, nonlinear, and normalization layers. We argue through strong empirical evidence that the notion of isometry serves as a central principle for training very deep models [ICML'20b]; a generic isometry penalty is sketched after this list. Such a principle may guide the design of new networks with much improved performance [NeurIPS'21a].
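
For the generalization bullet, the statement refers to the classical bias-variance decomposition of the test risk under squared loss, written in LaTeX below; the finding in [ICML'20a] is that the variance term, as a function of network width, is unimodal rather than monotonically increasing, while the bias keeps decreasing.

    \mathbb{E}_{T}\big[(f_{T}(x) - \bar{y}(x))^{2}\big]
      = \underbrace{\big(\mathbb{E}_{T}[f_{T}(x)] - \bar{y}(x)\big)^{2}}_{\mathrm{bias}^{2}}
      + \underbrace{\mathrm{Var}_{T}\big(f_{T}(x)\big)}_{\mathrm{variance}}

Here f_T is the model trained on a random draw of the training set T and \bar{y}(x) is the noise-free target at x.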
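
For the robustness bullet, the following toy numpy sketch illustrates the double over-parameterization idea on a synthetic low-rank-plus-sparse problem: the structured signal is written as a product of factors U V^T, the corruption as a difference of squares g*g - h*h, both started near zero, and plain gradient descent on the unregularized least-squares loss is run with different learning rates for the two groups. The specific problem, hyperparameters, and variable names are our own illustration, not the exact setup or guarantees of [NeurIPS'20a, ICML'22]; the point is only the parameterization and the role of the learning-rate ratio alpha.

    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 50, 3

    # Synthetic data: low-rank signal plus sparse, large corruptions.
    L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    S_true = np.zeros((n, n))
    corrupt = rng.random((n, n)) < 0.1
    S_true[corrupt] = 10.0 * rng.standard_normal(corrupt.sum())
    Y = L_true + S_true

    # Double over-parameterization: L = U V^T with n x n factors (width far larger
    # than the true rank r), S = g*g - h*h, all initialized near zero.
    U = 1e-3 * rng.standard_normal((n, n))
    V = 1e-3 * rng.standard_normal((n, n))
    g = 1e-3 * np.ones((n, n))
    h = 1e-3 * np.ones((n, n))

    eta = 2e-3     # learning rate for the factors U, V
    alpha = 2.0    # (g, h) use alpha * eta; this ratio acts like the trade-off
                   # between fitting the structured signal and the corruption

    for _ in range(3000):
        R = U @ V.T + g * g - h * h - Y          # residual of the least-squares loss
        U, V = U - eta * (R @ V), V - eta * (R.T @ U)
        g, h = g - alpha * eta * (2 * R * g), h + alpha * eta * (2 * R * h)

    print("relative error on L:", np.linalg.norm(U @ V.T - L_true) / np.linalg.norm(L_true))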
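
For the architecture bullet, one simple way to make the isometry idea concrete is to ask each linear (or convolutional) layer W to approximately preserve norms, ||W x|| ~ ||x||, e.g. by adding an orthogonality penalty ||W^T W - I||_F^2 to the training loss. The snippet below shows only that generic penalty; it is not the exact regularizer, initialization, or activation design used in [ICML'20b, NeurIPS'21a].

    import numpy as np

    def isometry_penalty(W):
        # ||W^T W - I||_F^2: zero iff the columns of W are orthonormal, i.e. W acts as an
        # isometry on its input space, so signals and gradients keep their scale
        # when passing through the layer.
        k = W.shape[1]
        G = W.T @ W - np.eye(k)
        return float(np.sum(G * G))

    # Sketch of usage: total_loss = task_loss + gamma * sum(isometry_penalty(W) for W in weights)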

Self-Expressive Methods

Self-expressive methods are based on expressing a data point as a linear combination of the others. They have been widely applied for unsupervised learning of intrinsic low-dimensional structures from high-dimensional data.
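
As a minimal illustration of the self-expression model (in the form popularized by sparse subspace clustering): each point x_j is written as X c_j with c_jj = 0, and an l1 penalty makes c_j select a few other points, ideally from the same low-dimensional subspace; a spectral clustering step on |C| + |C|^T then recovers the clusters. The sketch below uses scikit-learn's Lasso for the per-point problem purely for brevity; it is not the specific formulation or solver of any single paper cited below, and the function name and lam value are ours.

    import numpy as np
    from sklearn.linear_model import Lasso

    def self_expressive_coefficients(X, lam=0.01):
        # X is d x n with data points as columns. For each j, fit c_j by l1-regularized
        # least squares of x_j on the remaining columns (which enforces c_jj = 0),
        # and stack the solutions as the columns of C.
        d, n = X.shape
        C = np.zeros((n, n))
        for j in range(n):
            mask = np.arange(n) != j                    # exclude x_j itself
            model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
            model.fit(X[:, mask], X[:, j])
            C[mask, j] = model.coef_
        return C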

  • Theory. We establish theoretical foundations for the correctness of self-expressive methods, so as to provide a mathematical understanding of the regimes in which such methods are applicable [ICML'15][JSTSP'18][ICCV'19][Preprint'19][ICML'21][ICLR'21].

  • Efficiency. We develop efficient and provably correct algorithms for handling large-scale data by using active-support [CVPR'16a][CVPR'16b], exemplar selection [ECCV'18], divide-and-conquer [Asilomar'16], dropout [CVPR'20], and key-query network [CVPR'21b] techniques. Our method obtains state-of-the-art clustering performance on the MNIST dataset [Project page].

  • Robustness. We improve model robustness by developing techniques for handling gross corruptions [CVPR'17], missing entries [ICCV'19 workshop], imbalanced distributions [ECCV'18][TPAMI'20], etc.

  • Applications. To further enhance the effectiveness of the methods for real applications, we develop joint optimization frameworks that combine self-expressive methods with other learning modules such as deep feature extraction [CVPR'19] and affinity learning [TIP'17].