Advances in synthesis and sequencing technologies have made DNA macromolecules an attractive medium for digital information storage. We address both theoretical and practical aspects of coding and signal processing for DNA storage, primarily focusing on:
Channel coding: error-correction and constrained coding for DNA data storage
Sequence reconstruction problem for data recovery
Efficient design of binary and non-binary codes in the Levenshtein distance (edit errors, i.e., deletions, insertions, and substitutions) or the Damerau distance (edit errors and adjacent transpositions)
Efficient DNA synthesis and DNA sequencing technologies: Illumina synthesis and Nanopore sequencing
DNA labelling and DNA tagging technologies
Composite DNA
Constrained codes for DNA storage and DNA computing: runlength-limited (RLL) constraint, AT/GC-content constraint, DNA secondary structure avoidance
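As a rough illustration of the runlength-limited and GC-content constraints listed above, the following minimal sketch checks whether a single DNA strand satisfies both. The function names and the thresholds (maximum run of 3, GC-content between 40% and 60%) are assumptions chosen for illustration only, not values taken from our work.

```python
def max_homopolymer_run(strand: str) -> int:
    """Return the length of the longest run of identical consecutive nucleotides."""
    longest = 0
    current = 0
    previous = None
    for base in strand:
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def gc_content(strand: str) -> float:
    """Return the fraction of nucleotides that are G or C (0.0 for an empty strand)."""
    if not strand:
        return 0.0
    return sum(base in "GC" for base in strand) / len(strand)


def satisfies_constraints(strand: str, max_run: int = 3,
                          gc_low: float = 0.4, gc_high: float = 0.6) -> bool:
    """Check the RLL and GC-content constraints; all thresholds are hypothetical."""
    return (max_homopolymer_run(strand) <= max_run
            and gc_low <= gc_content(strand) <= gc_high)
```

For example, `satisfies_constraints("ACGTACGT")` holds under these assumed thresholds, whereas `satisfies_constraints("AAAAACGT")` fails because of the run of five A's.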
Research updates are available at: My Research Gate or My Google Scholar
A brief introduction to storing information in DNA, by Twist Bioscience, is available on YouTube.
We study properties and constructions of constrained binary codes for emerging applications in digital information storage and communication systems, focusing particularly on the following areas:
Balanced, almost-balanced binary codes
Constant weight binary codes: low weight codes, high weight codes, fixed-length and variable-length Knuth balancing schemes
Energy-Harvesting codes for simultaneous energy and information transfer
Sliding Window-Constrained Codes
Binary constrained codes in thermal-aware communication (i.e., temperature control systems, power and heat control systems, etc.)
Two-dimensional (2D) (or multi-dimensional) constrained codes
Low-weight binary codes (1D & 2D codes) and applications in ReRAM
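To make the balancing idea concrete, here is a minimal sketch of the core step of Knuth's balancing scheme for an even-length binary word: flip a prefix of the word until it contains equally many zeros and ones. The function names are hypothetical, and the sketch returns the prefix length k directly; a complete scheme would also have to encode k, typically as a short balanced prefix.

```python
def knuth_balance(bits):
    """Flip the shortest prefix that makes an even-length binary word balanced.

    Returns (k, balanced_word), where k is the number of flipped leading bits.
    """
    n = len(bits)
    if n % 2 != 0:
        raise ValueError("word length must be even")
    ones = sum(bits)  # weight of the word with its first k bits flipped (k = 0)
    for k in range(n + 1):
        if 2 * ones == n:
            return k, [1 - b for b in bits[:k]] + list(bits[k:])
        # Flipping one more bit changes the weight by exactly +1 or -1,
        # so the weight must pass through n/2 before k reaches n.
        ones += 1 - 2 * bits[k]
    raise AssertionError("unreachable for even-length input")


def knuth_unbalance(k, word):
    """Invert knuth_balance by flipping the first k bits back."""
    return [1 - b for b in word[:k]] + list(word[k:])
```

For the word `[1, 1, 1, 1]`, flipping the first two bits gives the balanced word `[0, 0, 1, 1]`, so `knuth_balance` returns `(2, [0, 0, 1, 1])`.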
We are interested in efficient designs of binary and non-binary codes in the Levenshtein distance (edit errors, i.e., deletions, insertions, and substitutions) or the Damerau distance (edit errors and adjacent transpositions). Because these errors model many real-world noise processes, such codes have broad applications in theory and practice: data storage and communication systems (including synchronisation error channels, magnetic recording systems, and mobile data), bioinformatics and computational biology (such as sequence alignment and variant detection), and natural language processing (NLP), where edit distance underlies spelling-error modelling and automated word-correction techniques. Research interests include:
Binary codes, non-binary codes
Permutation codes, multi-permutation codes, rank modulation
Burst error-correcting codes, codes correcting tandem repeats (tandem duplications)
Segmented error-correcting codes (Blocks, subblocks, sliding window error-correction)
Efficient design of ECCs for emerging applications in non-volatile memory technologies (such as DNA storage, flash memory, ReRAM)
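For readers unfamiliar with the two metrics, the following minimal sketch computes the Levenshtein distance by dynamic programming and, optionally, also counts an adjacent transposition as a single edit. With transpositions enabled it computes the restricted variant known as optimal string alignment, which is a simplification of the full Damerau-Levenshtein distance; the function name and the `transpositions` flag are hypothetical.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Minimum number of deletions, insertions and substitutions turning a into b.

    With transpositions=True, swapping two adjacent symbols also costs one edit
    (restricted Damerau variant, i.e. optimal string alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i symbols
    for j in range(cols):
        d[0][j] = j  # insert all j symbols
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]
```

For the strands `"ACGT"` and `"AGCT"`, the Levenshtein distance is 2 (two substitutions), while allowing adjacent transpositions reduces the distance to 1.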