2025-03: Muyao will present "Probability-based sequence comparison finds the oldest ever nuclear mitochondrial DNA segments in mammalian genomes" at RECOMB-CG!
2025-03: Martin will present "A simple way to find related sequences with position-specific probabilities" at RECOMB-CG!
2025-03: "Further varieties of ancient endogenous retrovirus in human DNA" was published in Mobile DNA!
2024-11: "The statistics of parametrized syncmers in a simple mutation process without spurious matches" was published in Journal of Computational Biology!
2024-10: Muyao Huang-san, Xinyi Liu-san, and Mariko Nakagawa-san will present their work at APBJC 2024! Huang-san won an audience choice award!
2024-09: Muyao Huang-san's NuMTs are in the UCSC genome browser!
2024-08: "A simple method for finding related sequences by adding probabilities of alternative alignments" (preprint) was published in Genome Research!
Research summary 2022: Our aim is to find interesting and useful information in genetic sequences, and to develop algorithmic and mathematical methods for this purpose. Using sensitive probability-based analysis, we recently discovered the oldest ever "protein fossils": segments of formerly protein-coding DNA. This revealed a great diversity of transposable elements in vertebrate ancestors of the Paleozoic Era. We also collaborate with medical geneticists to understand complex chromosome rearrangements and tandem repeat expansions/contractions that cause disease. We discovered the cause of neuronal intranuclear inclusion disease: a tandem repeat expansion in a human-specific gene. In related work, we have detected recombination events between LINE and SINE repeat elements, showing that recombination of repeat elements generates somatic complexity in human genomes. Another project found significant non-existence of sequences in genomes and proteomes, providing clues about immune recognition and pathogen/host adaptation. Finally, we are developing a mathematically optimal way to sample big sequence data, based on minimally overlapping words, so that it can be analyzed quickly.
Official lab page at CBMS, University of Tokyo
We are not specialists in black box machine learning methods (e.g. deep learning), useful and wonderful though they are, because we aim to understand, not just predict.