Machine learning

See below for a list of pre-prints from my research in machine learning:

(1) Why can neural language models solve next-word prediction? A mathematical perspective (pre-print, 2023)

Recently, deep learning has revolutionized the field of natural language processing, with neural language models proving highly effective at next-word prediction. However, a rigorous theoretical explanation for their success has not yet been developed in the context of formal language theory: it is unclear why neural language models can learn the combinatorial rules that govern the next-word prediction task. In this paper, we study a class of formal languages that can be used to model real-world examples of English sentences. We construct neural language models that solve the next-word prediction task in this context with zero error.
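
As a concrete (and purely illustrative) companion to this abstract, the sketch below builds a tiny formal language of template English sentences and enumerates the exact next-word sets that a zero-error predictor must reproduce. The grammar, vocabulary, and the helper next_words are my own assumptions, not the paper's construction.

    # Toy illustration (hypothetical, not the paper's construction): a tiny
    # formal language of English-like sentences, and the exact set of valid
    # next words after each prefix -- the combinatorial rule a zero-error
    # next-word predictor must learn.
    from itertools import product

    # Template grammar: SENTENCE -> DET NOUN VERB DET NOUN
    DET, NOUN, VERB = ["the", "a"], ["cat", "dog"], ["sees", "chases"]
    SLOTS = [DET, NOUN, VERB, DET, NOUN]

    language = {" ".join(words) for words in product(*SLOTS)}

    def next_words(prefix):
        """Valid next words after a prefix (list of tokens), by enumeration."""
        k = len(prefix)
        return {s.split()[k] for s in language
                if s.split()[:k] == prefix and len(s.split()) > k}

    print(next_words(["the", "cat"]))        # {'sees', 'chases'}
    print(next_words(["a", "dog", "sees"]))  # {'the', 'a'}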

(2) On subsampling and inference for multi-parameter persistent homology (ICLR 2022 Workshop on Geometrical and Topological Representation Learning)

In topological data analysis, multi-parameter persistent homology is a framework for extracting topological information from point cloud datasets equipped with multiple filtrations. However, existing algorithms become computationally expensive for large datasets, limiting their applicability. In the single-parameter case, subsampling algorithms have been used to reduce the time complexity of computing persistent homology, and convergence properties of the persistence barcodes have been established in this setting. We extend these results to multi-parameter persistent homology and develop subsampling algorithms that can be used to approximate the fibered barcode in this setting. We conduct experiments on the point cloud dataset ModelNet to demonstrate the efficiency of these algorithms.
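
To make the subsampling idea concrete, here is a minimal single-parameter, degree-0 sketch (not the paper's multi-parameter algorithm): the H0 death times of a Vietoris-Rips filtration coincide with the edge lengths of a minimum spanning tree, so barcodes of small random subsamples can stand in for the barcode of a large cloud. The point cloud, subsample size, and number of subsamples below are arbitrary assumptions.

    # Single-parameter, degree-0 sketch of barcode subsampling (illustrative).
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform

    def h0_barcode(points):
        """H0 death times of the Vietoris-Rips filtration, via an MST."""
        mst = minimum_spanning_tree(squareform(pdist(points)))
        return np.sort(mst.data)  # each MST edge merges two components

    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(2000, 3))  # a large point cloud

    # Approximate: compute barcodes on small random subsamples instead of
    # on the full cloud, then aggregate (here, simply collect them).
    barcodes = [h0_barcode(cloud[rng.choice(len(cloud), 100, replace=False)])
                for _ in range(10)]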

(3) Sparse to dense and back: rethinking the lottery ticket search framework (joint with Thalaiyasingam Ajanthan, Jaeho Lee, and Tongliang Liu)

Neural network pruning is an active field of research, as sparse neural networks lead to speedups during both training and inference. While traditional methods typically prune a dense network after training, recent work has established the existence of sparse networks (known as "lottery tickets") which can reach comparable performance after being trained from scratch. We propose an algorithm, SPADES, that applies ticket search algorithms to sparse networks that have been trained for a small number of epochs. Our key insight is that ticket search is more effective in this context than when applied to a network at initialization. Our experimental results show that SPADES yields improvements over traditional approaches to pruning at initialization on many commonly used architectures, including VGG and ResNet models trained on CIFAR-10, CIFAR-100, and Tiny ImageNet.
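
As a rough, hypothetical sketch of the overall recipe (SPADES itself is not reproduced here), the code below briefly trains a small network and then applies PyTorch's built-in magnitude pruning to the partially trained weights, standing in for the ticket-search step. The architecture, synthetic data, epoch budget, and sparsity level are all placeholder assumptions.

    # Hypothetical sketch: short warm-up training, then prune the partially
    # trained weights (a stand-in for a ticket-search step). Not SPADES itself.
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune
    from torch.utils.data import DataLoader, TensorDataset

    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    # Synthetic stand-in data; a real experiment would use an actual dataset.
    data = TensorDataset(torch.randn(512, 784), torch.randint(0, 10, (512,)))
    loader = DataLoader(data, batch_size=64)

    # Train for a small number of epochs before searching for a ticket.
    for _ in range(2):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    # Magnitude-based pruning of the partially trained weights; the surviving
    # sparse network would then be rewound and retrained.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.9)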