Life science using computers

Since the first successes in the 1990s, researchers have decoded the full genomes of thousands of species, from microbes to human beings. The information generated by those efforts relates not only to genome sequences but also to other building blocks of life such as RNA, proteins, metabolites, and DNA modifications. The enormous amount of data that the “omics” sciences produce through such investigations therefore demands ever greater reliance on storage, search, and analysis techniques from the information sciences to make discoveries.

Established in the mid-20th century, the field of information sciences is relatively new. The amazing performance improvements in computational hardware over the past 20 years, however, have led to its emergence as the dominant agent of change not only in daily life but also in many fields of scientific study. The life sciences are no exception: methods from the information sciences are increasingly being adopted as data stores grow. I myself am a “dry” researcher who works solely with computers, as opposed to those who do “wet” work in labs, so techniques from the information sciences are indispensable to me. Every time I see new large datasets related to human-scale projects announced, I’m happy to know that informatic methods have grown more powerful once again.

My primary interest lies in how we can extract the maximum amount of information from accumulated data. Although the researchers performing the measurements are usually looking to solve specific problems related to life, the exhaustive nature of the data they produce may allow us to discover totally unexpected biological truths. Comparative analysis of large datasets from unrelated research projects can likewise lead to discoveries not noticed by the original researchers. I hope that my work with technology can help breathe new life into old published data and uncover the profound biological truths within them.

I’m currently working primarily on developing highly sensitive methods for detecting signals hidden within genome sequences. I approach this problem from a number of angles, including using phylogenetic theories to discover functional sites, applying the concept of context-free grammars from the information sciences to the analysis of secondary structures of RNA, and using Bayesian statistical approaches to discover frequent sequential patterns. I hope that someday our current development efforts will change these massive datasets from seemingly random strings of letters into stories that tell of the fundamental principles of life.

(Sept. 2011, Translated with modifications from “Sousei” Vol.18)