The most important thing: For critical thinking, against popularity-based science, to save us from a dark age! (pdf)
[Extended version (pdf)]

2026-07: We're presenting our work at the IIBMP conference next month!
- DT-Patricia: Fast and Exact Global Similarity Search over Large-Scale Sequence Sets Using the Diagonal Transition Algorithm on a Patricia Tree
- Evolutionary Patterns of DNA Base Substitutions Across Eukaryotes
- Fast Gapped Alignment Improves Remote DNA Homology Search

2026-06: "EvoSubster: a pipeline for evolutionary inference of single- and double-base substitution spectra" was published in Bioinformatics Advances!

2026-03: "Simple and thorough detection of related sequences with position-varying probabilities of substitutions, insertions, and deletions" was published in the Journal of Computational Biology!

2026-03: Kawaguchi-san presents "PatriciaWFA: Accelerating Exact Large-Scale Sequence Similarity Search Using a Patricia Tree and the Wavefront Algorithm" at IPSJ SIG BIO, and wins the Excellent Presentation prize!

2026-02: "Draft genome sequence of an entomopathogenic fungus Amphichorda felina" was published in Microbiology Resource Announcements!

2026-02: "Probability-based sequence comparison finds pre-eutherian nuclear mitochondrial DNA segments in mammalian genomes" was published in the Journal of Computational Biology!

We are a research group at the University of Tokyo. We use computers to study genetic sequences, which hold the recipes of life and traces of evolutionary history.

The Frith group was featured in the 2025-03 edition of our school magazine SOSEI (Japanese, English).

Research summary 2022: Our aim is to find interesting and useful information in genetic sequences, and to develop algorithmic and mathematical methods for this purpose. Using sensitive probability-based analysis, we recently discovered the oldest-ever "protein fossils": segments of formerly protein-coding DNA. This revealed a great diversity of transposable elements in vertebrate ancestors of the Paleozoic Era. We also collaborate with medical geneticists to understand complex chromosome rearrangements and tandem repeat expansions and contractions that cause disease. We discovered the cause of neuronal intranuclear inclusion disease: a tandem repeat expansion in a human-specific gene. In related work, we have detected recombination events between LINE and SINE repeat elements, showing that recombination of repeat elements generates somatic complexity in human genomes. Another project found significant non-existence of sequences in genomes and proteomes, providing clues about immune recognition and pathogen/host adaptation. Finally, we are developing a mathematically optimal way to sample big sequence data, based on minimally overlapping words, so that it can be analyzed quickly.

Official lab page at CBMS, University of Tokyo

We are not specialists in black-box machine learning methods (e.g. deep learning), useful and wonderful though they are, because we aim to understand, not just to predict.
