Sharma Thankachan

My research aims to develop the theoretical foundations for efficiently processing large-scale data in the form of strings (also known as text or sequences). This type of data is prevalent across many domains and is especially critical in computational biology, where it is used to analyze DNA and other biological sequences. As these datasets become larger and more repetitive, there is an increasing need for tools that are efficient in both time and memory. This need motivates the use of succinct (or compressed) data structures, which store information in space close to the information-theoretic minimum while still supporting fast queries and updates. By combining techniques from string algorithms and succinct data structures, I develop methods for indexing, searching, and analyzing compressed data, including highly repetitive DNA sequences and complex graph-based models such as pan-genome graphs, which capture genetic variation across populations.

My research has been primarily supported by the U.S. National Science Foundation, including the prestigious NSF CAREER Award in 2022. Currently, I lead a summer Research Experiences for Undergraduates (REU) Site at NC State University.

Selected Publications

Below is a list of selected publications. See DBLP or Google Scholar for the complete list.

O. Kulekci, M. Parthasarathi, R. Shah, S. V. Thankachan: Relative Compressed Reverse Suffix Array (STACS 2026)

D. Gibney, J. Huffstutler, M. Parthasarathi, S. V. Thankachan: Repetition Aware Text Indexing for Matching Patterns with Wildcards (ICALP 2025)

D. Gibney, C. Jin, T. Kociumaka, S. V. Thankachan: Near-Optimal Quantum Algorithms for Bounded Edit Distance and Lempel-Ziv Factorization (SODA 2024)

A. Banerjee, D. Gibney, S. V. Thankachan: Longest Common Substring with Gaps and Related Problems (ESA 2024)

A. Ganguly, R. Shah, S. V. Thankachan: Fully Functional Parameterized Suffix Trees in Compact Space (ICALP 2022)

C. Jain, D. Gibney, S. V. Thankachan: Co-linear Chaining with Overlaps and Gap Costs (RECOMB 2022)

D. Gibney, S. V. Thankachan, S. Aluru: The Complexity of Approximate Pattern Matching on de Bruijn Graphs (RECOMB 2022)

A. Ganguly, D. Patel, R. Shah, S. V. Thankachan: LF Successor: Compact Space Indexing for Order-Isomorphic Pattern Matching (ICALP 2021)

D. Gibney, G. Hoppenworth, S. V. Thankachan: Simple Reductions from Formula-SAT to Pattern Matching on Labeled Graphs and Subtree Isomorphism (SOSA 2021)

J. Bentley, D. Gibney, S. V. Thankachan: On the Complexity of BWT-Runs Minimization via Alphabet Reordering (ESA 2020)

G. Hoppenworth, J. Bentley, D. Gibney, S. V. Thankachan: The Fine-Grained Complexity of Median and Center String Problems Under Edit Distance (ESA 2020)

D. Gibney, S. V. Thankachan: On the Hardness and Inapproximability of Recognizing Wheeler Graphs (ESA 2019)

S. V. Thankachan, C. Aluru, S. P. Chockalingam, S. Aluru: Algorithmic Framework for Approximate Matching Under Bounded Edits with Applications to Sequence Analysis (RECOMB 2018)

A. Ganguly, R. Shah, S. V. Thankachan: pBWT: Achieving Succinct Data Structures for Parameterized Pattern Matching and Related Problems (SODA 2017)

S. P. Chockalingam, S. V. Thankachan, S. Aluru: A Parallel Algorithm for Finding All Pairs k-Mismatch Maximal Common Substrings (Supercomputing 2016)

S. Aluru, A. Apostolico, S. V. Thankachan: Efficient Alignment Free Sequence Comparison with Bounded Mismatches (RECOMB 2015)

W. K. Hon, R. Shah, S. V. Thankachan, J. S. Vitter: Space-Efficient Frameworks for Top-k String Retrieval (Journal of the ACM 2014)

R. Shah, C. Sheng, S. V. Thankachan, J. S. Vitter: Top-k Document Retrieval in External Memory (ESA 2013)

M. Patil, S. V. Thankachan, R. Shah, W. K. Hon, J. S. Vitter, S. Chandrasekaran: Inverted Indexes for Phrases and Strings (SIGIR 2011)