Our laboratory focuses on making biological discoveries through computational analysis of rapidly growing biological data. We are also developing the new mathematical models of life that such analysis requires.

Since the first successes in the 1990s, researchers have decoded the full genomes of thousands of species. The information generated by those efforts is not limited to genome sequences; it also covers other building blocks of life such as RNA, proteins, metabolites, and DNA modifications. However, integrated analysis of such extremely heterogeneous data has only just begun, and many problems await solutions. We apply statistical techniques to detect faint signals in the noise, signals that can lead to a deeper understanding of life.

Function and evolution of RNA structures

Various RNA molecules such as messenger RNA, transfer RNA, and microRNA are involved in the expression of proteins. Most RNA molecules form secondary structures through base pairs such as A-U and C-G. The stabilizing energies of these secondary structures are relatively large and have a significant impact on the regulation and efficiency of gene expression. Accurate models of RNA secondary structure exist that build on stochastic context-free grammars, a concept from information science, and they allow computer-based investigation of RNA structures. Using such models extensively, we study various biological processes involving RNA, such as the molecular interactions of microRNA and RNA-binding proteins, alternative splicing, and messenger RNA translation (Fig. 1). We are also investigating RNA structural evolution using genome sequences of human and vertebrate populations.

Fig. 1: Genomic-scale sequence analysis using the software tool Raccess to calculate RNA accessibility.

Raccess is useful for determining which regions form exposed secondary structures, like the upper one.
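As a rough illustration of how base-pairing rules turn into a computable structure model, the sketch below counts the maximum number of nested A-U and C-G pairs in a sequence with a Nussinov-style dynamic program. This is a deliberately simplified stand-in, not the stochastic context-free grammar or energy models mentioned above and not how Raccess works; the function name `max_base_pairs` and the `min_loop` parameter are assumptions made for this example.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested A-U / C-G pairs (Nussinov-style DP).

    Illustrative simplification: it counts pairs and ignores energies.
    min_loop is the minimum number of unpaired bases inside a hairpin.
    """
    pairs = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some base k, leaving a loop of
            # at least min_loop unpaired bases between them
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCCC"))
```

Grammar-based and energy-based models replace the simple pair count with probabilities or free energies, but they rely on the same kind of recursion over subsequences.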

Evolution of cancer genomes

Cancers are diseases in which cells multiply uncontrollably, and they are often caused by the accumulation of DNA mutations. In many types of cancer, each cell division introduces various types of mutations into the genome sequence. Since such changes in cancer genomes resemble genome evolution during speciation, we can use various evolutionary and genetic tools to study cancer progression. We are using tools from population genetics, such as Markov processes and coalescent theory, to estimate the growth of cancer tissues. We are seeking methods for computing the probabilities of cancer metastasis or recurrence from the estimated quantitative data.
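To give a flavor of the coalescent machinery, the following minimal sketch simulates the time to the most recent common ancestor of n sampled lineages under the standard constant-size Kingman coalescent and compares it with the known expectation 2(1 - 1/n). A constant population size is an assumption made only for this example; growing tumor tissue would call for a model with changing population size. All function names here are hypothetical.

```python
import random


def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n lineages,
    in coalescent time units (each pair of lineages merges at rate 1)."""
    return 2.0 * (1.0 - 1.0 / n)


def simulate_tmrca(n, rng):
    """One realization: while k lineages remain, wait an exponential
    time with rate k*(k-1)/2 until the next pair coalesces."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0
        t += rng.expovariate(rate)
    return t


def mean_simulated_tmrca(n, replicates=20000, seed=0):
    """Average simulated TMRCA over many independent replicates."""
    rng = random.Random(seed)
    total = sum(simulate_tmrca(n, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    n = 10
    print("expected :", expected_tmrca(n))
    print("simulated:", mean_simulated_tmrca(n))
```

The sequence of waiting times is itself a Markov process on the number of remaining lineages, which is one reason these two tools are often used together.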

Bioinformatics of embryonic development and cell differentiation

Embryonic development in animals begins with the cleavage of fertilized eggs, followed by gastrulation and mesoderm differentiation, which results in the formation of organs, bones, and muscles. Such macroscopic changes in animal morphology are precisely controlled through complex interactions between transcription networks and signaling molecules. However, the technologies for making predictions about those mechanisms from cell-level data, such as transcription factor binding and histone modifications, are still in their infancy. We are developing methods that combine differential equations for embryonic development from mathematical biology with Bayesian analysis of gene regulatory networks from bioinformatics, in order to associate macroscopic stages of embryonic development with microscopic sequencing data. Our aim is to simulate animal developmental processes using sequencing data.
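As a toy example of the differential-equation side of this approach, the sketch below integrates a single-gene model in which expression x is driven by an activator through a Hill function and decays at a constant rate: dx/dt = alpha * a^n / (K^n + a^n) - delta * x. The model, the parameter values, and the function name `simulate_expression` are assumptions chosen for illustration and are not the laboratory's actual equations.

```python
def simulate_expression(activator, alpha=2.0, delta=0.5, K=1.0, n=2,
                        dt=0.01, steps=5000):
    """Forward-Euler integration of
        dx/dt = alpha * a^n / (K^n + a^n) - delta * x
    starting from x = 0. All parameters are illustrative assumptions."""
    # Fraction of maximal activation for a fixed activator level
    hill = activator ** n / (K ** n + activator ** n)
    x = 0.0
    for _ in range(steps):
        x += dt * (alpha * hill - delta * x)
    return x


if __name__ == "__main__":
    # The analytical steady state is alpha * hill / delta.
    for a in (0.0, 0.5, 1.0, 4.0):
        print(a, round(simulate_expression(a), 4))
```

In a data-driven setting, parameters such as alpha, delta, and K would be treated as unknowns and inferred from sequencing measurements rather than fixed by hand.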