I am a PhD student in the Theory and MISL groups at the Paul G. Allen School for Computer Science & Engineering at the University of Washington (UW)I research algorithms for large-scale data science and statistical estimation problems. My dissertation will be on "New Algorithmic Tools for Distributed Similarity Search and Edge Estimation." I expect to graduate in Spring 2018, and I am fortunate to be advised by Paul Beame and to work closely with Luis CezeKarin Strauss, and Sergey Yekhanin. I previously obtained a BS in Computer Science from the Department of Computer Science at the University of Illinois Urbana-Champaign (UIUC).

I just finished interning, for the second time, at Microsoft Research AI (MSR AI), working with both the DNA Storage and the Theory groups on computational challenges associated with storing digital data in synthetic DNA. I have also interned at Cray, where I worked on distributed graph algorithms that have been deployed as part of the Cray Graph Engine. My research and industry projects center around the theme of designing new distributed algorithms for supercomputers or compute clusters with the goal of increasing data throughput. I enjoy finding and exploring connections between modern massive data problems and classical theoretical computer science and combinatorics areas.

Born and raised in Chicago, IL, I appreciate midwestern hospitality, deep dish pizza, and (sometimes) being cold in the winter.

I am currently looking for Industry or Academic Research positions, starting Summer or Fall 2018.

CV (pdf, Apr. 2018)

Research Statement (pdf, Apr. 2018)

Research Overview (see my research statement and publications for more details)

I design, implement, and analyze algorithms for data-driven applications. Throughout my Ph.D. research
and during my industry internships, I have worked on similarity search, clustering, and sublinear query
algorithms, and also on lower bounds for distributed and sequential models. Overall, I am interested in
both theoretical analysis, including complexity lower bounds, and in efficient empirical solutions.

This past year, I have been involved in a DNA data storage endeavor between Microsoft Research and UW.
DNA data storage has emerged as a promising new technology, with orders of magnitude improvement
in information density and durability. I focus on computational challenges in retrieving data from DNA,
and I have developed a new distributed clustering algorithm. This complements my previous theoretical
work on computing a distributed similarity join (finding all pairs of similar vectors in a dataset).

Clustering and similarity joins represent daunting tasks for massive datasets, since algorithms often scale
quadratically with the input size. In DNA data storage, retrieving gigabytes requires clustering billions of
strings, so nearly-linear scaling is imperative. Pairwise similarity problems are also widely encountered in
other areas, such as recommender systems, data deduplication, and multimedia search engines. Better
algorithms have broad impact, connecting people to each other and reducing data center redundancy.

In addition to these projects, I have worked on sublinear query algorithms for graph parameter estimation
and lower bounds for bounded-depth circuits. As a researcher, I seek out questions that have practical
impact and shed new light on classical areas of discrete mathematics and theoretical computer science.