This page describes my research experience at Brandeis University and Tufts University. For industry and public sector research experience, please see Work Experience.
Table of Contents
Investigating Model Selection for LLMs and More
Reinforcement Learning Applied to Neural Network Adaptation
Studying Net2Net, Analyzing Alibaba GPU Traces, and Learning Identity Layers
Continued Study of Automated HPO and DNN Scaling
Study of Resource Adaptive Machine Learning
Alibaba Trace Analysis
Fluid Solver for Incompressible Flow in a Rectangular Domain with Arbitrarily-placed Inclusions
AAU Higher Education Job Postings Visualization Tool
Under the supervision of Professor Fahad Dogar, I studied model selection in the context of Large Language Models (LLMs). In particular, we identified that even on a challenging LLM benchmark, cheaper LLMs can perform just as well as higher-tier LLMs on some questions. With an intelligent model selection strategy, one could identify the cheapest model capable of answering each question, saving the user cost overall. We tried a variety of approaches to designing such a strategy, and our final strategy, given the correct parameters, provides a reasonable cost-performance tradeoff when choosing between GPT-3.5 (low tier, cheap) and GPT-4 (top tier, expensive).
The code and report for this work are contained in a private repository.
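To give a flavor of the approach (our actual implementation lives in the private repository), a cascading selector can query the cheap model first and only escalate when some confidence signal is low. This is a minimal sketch; the `ask` and `confidence` helpers and the threshold are hypothetical stand-ins, not our real components.

```python
# Minimal sketch of a cost-aware model cascade (hypothetical helpers, not our production code).

CHEAP_MODEL = "gpt-3.5-turbo"   # low tier, cheap
STRONG_MODEL = "gpt-4"          # top tier, expensive

def answer_with_cascade(question, ask, confidence, threshold=0.8):
    """Try the cheap model first; escalate to the strong model only when needed.

    `ask(model, question)` returns the model's answer (hypothetical helper).
    `confidence(model, question, answer)` returns a score in [0, 1] estimating
    whether the answer is correct (hypothetical helper).
    """
    answer = ask(CHEAP_MODEL, question)
    if confidence(CHEAP_MODEL, question, answer) >= threshold:
        return answer, CHEAP_MODEL                         # cheap model deemed good enough
    return ask(STRONG_MODEL, question), STRONG_MODEL       # pay for the stronger model
```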
As part of this work, I implemented early versions of our model selection strategy in a serverless LLM proxy service. This has been included in work submitted for publication. More details will be added later.
During this time I also assisted in the writing of When will my ML Job finish? Toward providing Completion Time Estimates through Predictability-Centric Scheduling with Abdullah Bin Faisal, Noah Martin, Dr. Hafiz Mohsin Bashir, and Dr. Fahad Dogar. This paper was accepted at the 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2024). I made the majority of the figures and plots and contributed to the writing of some sections and preparation of the artifacts.
The design of cluster schedulers for deep learning jobs has evolved to take into account neural network training behavior. We believe there is an opportunity for schedulers to adapt model architectures based on resource availability. In this work, we present a reinforcement learning agent that informs a neural network trainer on how to increase the complexity of a DNN during training so as to produce a better final model. This would be used in scenarios where a deep learning job receives extra GPUs during training. Our initial evaluation shows that our agent is able to provide good adaptation policies at different stages of training but not for different starting models.
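As a loose sketch of the setup, the agent can be thought of as mapping a summary of training progress to a discrete adaptation action. The toy epsilon-greedy agent below only illustrates that interface; the state features, actions, reward, and learning algorithm of our actual agent are more involved and are assumptions here.

```python
import random

# Illustrative action space for adapting a DNN mid-training when extra GPUs arrive.
ACTIONS = ["no_op", "widen_layers", "deepen_network"]

class EpsilonGreedyAdapter:
    """Toy epsilon-greedy agent: the interface shape, not our trained agent."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.q = {}  # value estimates keyed by (discretized training progress, action)

    def act(self, progress_bucket):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)                   # explore
        return max(ACTIONS, key=lambda a: self.q.get((progress_bucket, a), 0.0))  # exploit

    def update(self, progress_bucket, action, reward, lr=0.1):
        key = (progress_bucket, action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + lr * (reward - old)             # simple bandit-style update
```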
Under the supervision of Professor Fahad Dogar and PhD student Abdullah Bin Faisal, we continued our study of DNN scaling by experimenting heavily with the Net2Net adaptation technique. We expanded upon the original Net2Net work by Chen et al. and conducted experiments in PyTorch. We were interested in using Net2Net to expand models to take advantage of additional (preemptible) GPU resources.
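To make the widening operation concrete, here is a minimal sketch of the Net2WiderNet transformation (after Chen et al.) for a pair of fully-connected PyTorch layers, assuming a pointwise activation between them. It illustrates the technique rather than reproducing our experimental code, which handled more layer types.

```python
import torch
import torch.nn as nn

def net2wider(fc1: nn.Linear, fc2: nn.Linear, new_width: int):
    """Function-preserving widening of fc1 (Net2WiderNet sketch).

    fc1: Linear(d_in, n), fc2: Linear(n, d_out); returns widened copies
    (Linear(d_in, new_width), Linear(new_width, d_out)) that compute the same
    function as the original pair, assuming a pointwise activation in between.
    """
    n = fc1.out_features
    assert new_width > n
    # Mapping g: new unit j copies old unit g[j]; the first n units map to themselves.
    g = torch.cat([torch.arange(n), torch.randint(0, n, (new_width - n,))])
    counts = torch.bincount(g, minlength=n).float()   # replication factor per old unit

    wide1 = nn.Linear(fc1.in_features, new_width)
    wide2 = nn.Linear(new_width, fc2.out_features)
    with torch.no_grad():
        wide1.weight.copy_(fc1.weight[g])             # copy incoming weights of the source unit
        wide1.bias.copy_(fc1.bias[g])
        # Divide outgoing weights by the replication count so the downstream sums are unchanged.
        wide2.weight.copy_(fc2.weight[:, g] / counts[g])
        wide2.bias.copy_(fc2.bias)
    return wide1, wide2
```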
To find realistic resource allocation patterns, we analyzed the Alibaba GPU dataset, which records GPU usage in Alibaba's DNN training cluster. We found that there are GPU resources to spare at most times in the cluster, motivating the possibility of increasing the complexity of DNN training jobs to give users better models.
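The actual analysis lives in our private repositories, but the flavor of the computation is roughly the following; the tiny stand-in frame and its column names are placeholders, not the real trace schema.

```python
import pandas as pd

# Stand-in data with a placeholder schema: one row per machine per sampling window,
# with used and total GPU counts (the real trace has its own format).
samples = pd.DataFrame({
    "timestamp":  [0, 0, 60, 60, 120, 120],
    "gpus_used":  [6, 2,  7,  5,   3,   1],
    "gpus_total": [8, 8,  8,  8,   8,   8],
})

# Fraction of cluster GPU capacity left idle in each time window.
per_window = samples.groupby("timestamp")[["gpus_used", "gpus_total"]].sum()
per_window["spare_fraction"] = 1 - per_window["gpus_used"] / per_window["gpus_total"]

# Share of windows with at least 20% of cluster GPU capacity to spare.
print((per_window["spare_fraction"] >= 0.2).mean())
```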
We also investigated the process of learning identity layers of different types. By identity layer, we mean a layer L such that, given an input x, L(x) = x. For certain layer types, such as linear layers, these are easy to derive. However, for more complex neural network layers (e.g. convolutional, recurrent), it is harder to formulate such layers. Identity layers are useful for expanding models while preserving performance (e.g. in Net2Net).
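As a small illustration of what we mean, PyTorch's built-in initializers already cover the two simplest cases: `nn.init.eye_` for a linear layer and `nn.init.dirac_` for a plain convolution with matching channel counts. The snippet below is only this easy case; recurrent layers, normalization layers, and stacks that include nonlinearities are where constructing (or learning) an identity becomes harder.

```python
import torch
import torch.nn as nn

# Identity linear layer: weight = I, bias = 0, so L(x) = x exactly.
lin = nn.Linear(64, 64, bias=True)
with torch.no_grad():
    nn.init.eye_(lin.weight)
    lin.bias.zero_()
x = torch.randn(8, 64)
assert torch.allclose(lin(x), x)

# Identity 3x3 convolution via a Dirac initialization: each output channel
# passes through the corresponding input channel unchanged.
conv = nn.Conv2d(16, 16, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    nn.init.dirac_(conv.weight)
img = torch.randn(2, 16, 32, 32)
assert torch.allclose(conv(img), img)
```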
The code and report for this work are contained in private repositories.
Under the supervision of Professor Fahad Dogar and PhD student Abdullah Bin Faisal, we continued our study of automated hyperparameter optimization (HPO) and studied different strategies for DNN scaling.
For automated HPO, we delved into the ASHA algorithm. We had determined from our previous work that it outperformed Bayesian Optimization even in resource-constrained scenarios. We evaluated ASHA under different resource scenarios and compared it to an oracle HPO algorithm. An oracle algorithm is one that (unrealistically) knows how well all of the hyperparameter configurations perform at any given time.
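For context, ASHA is an asynchronous variant of successive halving; a minimal synchronous version of that core loop looks roughly like the sketch below. The `train_and_score` helper and the default budgets are illustrative assumptions, not part of ASHA itself or of our experiments.

```python
def successive_halving(configs, train_and_score, min_budget=1, eta=3, max_budget=81):
    """Synchronous successive halving; ASHA relaxes this loop to run asynchronously.

    `configs` is a list of hyperparameter configurations.
    `train_and_score(config, budget)` trains for `budget` units (e.g. epochs)
    and returns a validation score (hypothetical helper).
    """
    budget = min_budget
    survivors = list(configs)
    while budget <= max_budget and len(survivors) > 1:
        scored = [(train_and_score(c, budget), c) for c in survivors]
        scored.sort(key=lambda sc: sc[0], reverse=True)                    # higher score is better
        survivors = [c for _, c in scored[: max(1, len(scored) // eta)]]   # keep the top 1/eta
        budget *= eta                                                      # give survivors more budget
    return survivors[0]
```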
We transferred our study of DNN scaling to CNNs. We focused on identifying work that provides some guarantees about performance when models are modified during training. We studied strategies that freeze layer weights (e.g. Hydro) and ones that quantize models (e.g. Egeria). Both are strategies for reducing the computational load of models during training, which would be useful for adapting training jobs when resources (GPUs) are reduced.
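As a small illustration of the freezing idea (not Hydro's or Egeria's actual implementation), excluding early stages of a CNN from the backward pass in PyTorch is a matter of turning off gradients for their parameters; the model and which stages to freeze are arbitrary choices here.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(num_classes=10)

# Freeze the first two residual stages: their parameters no longer receive
# gradients, which shrinks the backward pass and the optimizer state.
for stage in (model.layer1, model.layer2):
    for p in stage.parameters():
        p.requires_grad = False

# The optimizer should only see the parameters that are still trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1, momentum=0.9)
```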
The code and report for this work are contained in private repositories.
Under the supervision of Professor Fahad Dogar and PhD student Abdullah Bin Faisal, we investigated opportunities for resource-adaptive deep neural network (DNN) training. DNN training is typically done in clusters where resource availability for an individual training job can fluctuate. We were interested in whether DNN training jobs could be adapted to shrinking or growing availability. We studied this in two scenarios.
Scenario 1: We studied the possibility of individual job adaptation in the context of Graph Neural Networks (GNNs), an active area of research in the ML community. In particular, we were interested in the possibility of modifying a GNN's aggregator function based on resource availability. The aggregator function can vary in complexity from simple aggregation functions (e.g. mean) to DNNs themselves.
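To illustrate what varying the aggregator means, the sketch below contrasts a parameter-free mean aggregator with a learned MLP aggregator over a node's neighbor features. It is plain PyTorch rather than a GNN framework, and the feature dimensions are made up.

```python
import torch
import torch.nn as nn

neighbor_feats = torch.randn(12, 32)   # 12 neighbors, 32-dim features (made-up sizes)

# Cheap aggregator: an element-wise mean over neighbors, no parameters.
mean_agg = neighbor_feats.mean(dim=0)

# Expensive aggregator: transform each neighbor with an MLP, then pool.
mlp = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
mlp_agg = mlp(neighbor_feats).max(dim=0).values

# A resource-adaptive trainer could switch between aggregators of different
# cost as GPU availability shrinks or grows.
```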
Scenario 2: We studied the possibility of adapting automated HPO algorithms based on resource availability. We were interested in whether algorithms that benefit from multiple GPUs (e.g. ASHA) would suffer in comparison to smarter strategies (e.g. Bayesian Optimization) when run on fewer GPUs.
The code and report for this work are contained in a private repository.
We presented this work at the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI) Poster Session.
Linda Zhao and I conducted a visual analysis of the Alibaba trace dataset. This was done under the supervision of Professor Raja Sambasivan and PhD student Darby Huye. The abstract of our report is given below and the full report is available here.
Microservice design of cloud applications has been an area of academic research for some time. This has been motivated by more and more cloud applications being implemented as collections of lightweight microservices run on top of distributed systems. In this work, we seek to obtain a better understanding of Alibaba’s microservice architecture via an analysis of the well-known Alibaba trace dataset. Our work provides some preliminary analysis of the architecture as well as identification of the various problems in the dataset that would need to be addressed in future research on this topic.
This work was used in a follow-up published paper authored by Darby, in which I am acknowledged here.
For my senior honors thesis, I worked with Professor Thomas Fai to develop an algorithm to approximate fluid velocity in varying spatial domains using the Navier-Stokes equations.
The abstract of my thesis is given below and the full thesis is available here.
A template which I prepared for future students to format their theses according to departmental requirements is posted here.
Microfluidic devices are used to process fluids at a small scale. Some of these devices contain physical obstacles that are used to manipulate the flow of fluid through the device. In our work, we implement a Python program for approximating the velocity of an incompressible fluid over a rectangular domain using the Navier-Stokes equations. Our strategy, which is built primarily on common finite difference methods, allows users to easily add obstacles into the domain and observe their effects on fluid flow. In the first half of the thesis, we describe the derivation, implementation, and benchmarking of our strategy. In the latter half, we describe some numerical experiments that we performed with our program to observe the effects of inclusions on both fluid flow and hydraulic resistance. We compare the hydraulic resistance reported by our program with an analytic approximation. Lastly, we briefly mention some possible expansions on our work.
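The full method is described in the thesis; as a flavor of the finite-difference building blocks, here is a sketch of a five-point Laplacian on a rectangular grid with an obstacle mask. The grid sizes, spacing, and inclusion placement are made-up values, and pinning the velocity to zero inside the obstacle is a crude stand-in for the boundary treatment in the actual program.

```python
import numpy as np

nx, ny, h = 64, 32, 0.05            # grid size and spacing (illustrative values)
u = np.random.rand(ny, nx)          # stand-in for one velocity component
solid = np.zeros((ny, nx), bool)    # obstacle mask: True where an inclusion sits
solid[12:20, 24:40] = True          # a rectangular inclusion, for illustration

def laplacian(f, h):
    """Second-order five-point Laplacian on the interior of the grid."""
    lap = np.zeros_like(f)
    lap[1:-1, 1:-1] = (
        f[1:-1, 2:] + f[1:-1, :-2] + f[2:, 1:-1] + f[:-2, 1:-1] - 4 * f[1:-1, 1:-1]
    ) / h**2
    return lap

# Diffusion term of the Navier-Stokes equations, with the velocity zeroed
# inside the obstacle as a rough proxy for the no-slip condition.
u[solid] = 0.0
diffusion = laplacian(u, h)
```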
Under the supervision of Professor Antonella Di Lillo and Dr. Solomon Garber, I worked with two undergraduate students to develop a web application that allows users to study higher-education job postings in a variety of modes. Our work was done in collaboration with Dr. Jessica Liebowitz as part of the AAU PhD Education Initiative. The goal was to create a public online tool for people to use in their search for jobs and careers. Our web application allows users to view job counts and top skills based on career area, job title, job type, and additional fields. Users make a variety of input selections; the server then performs a dynamically constructed SQL query and returns the results to be visualized client-side.
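The server code itself is not public, but the pattern is roughly the one sketched below: build the WHERE clause from whichever filters the user selected while keeping the values parameterized. The table and column names are placeholders, not our actual schema.

```python
def build_job_query(filters):
    """Build a parameterized SQL query from optional user selections.

    `filters` maps placeholder column names (e.g. "career_area", "job_type")
    to the values the user picked; only selected fields become WHERE clauses,
    and values stay parameterized to avoid SQL injection.
    """
    clauses, params = [], []
    for column, value in filters.items():
        clauses.append(f"{column} = ?")   # column names come from a fixed whitelist, not user text
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = (
        "SELECT job_title, COUNT(*) AS n FROM postings "
        f"WHERE {where} GROUP BY job_title ORDER BY n DESC"
    )
    return sql, params

sql, params = build_job_query({"career_area": "Data Science", "job_type": "Full-time"})
print(sql)     # SELECT job_title, COUNT(*) AS n FROM postings WHERE career_area = ? AND ...
print(params)  # ['Data Science', 'Full-time']
```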