My research aims to develop the theoretical foundations for efficiently processing large-scale data in the form of strings (also known as text or sequences). This type of data is prevalent across many domains and is especially critical in computational biology, where DNA and other biological sequences are represented and analyzed as strings. As these datasets become increasingly large and repetitive, there is a growing need for tools that are efficient in both time and memory. This need motivates the use of succinct (or compressed) data structures, which store information in space close to the theoretical minimum while still supporting fast queries and updates. By combining techniques from string algorithms and succinct data structures, I develop methods for indexing, searching, and analyzing compressed data, including highly repetitive DNA sequences and complex graph-based models such as pan-genome graphs, which capture genetic variation across populations.
My research has been supported primarily by the U.S. National Science Foundation, including through the prestigious NSF CAREER Award in 2022. I currently lead a summer Research Experiences for Undergraduates (REU) Site at NC State University.