
Sohta Ishikawa - 石川 奏太

Postdoctoral Researcher
Research Fellow of the Japan Society for the Promotion of Science (PD)
School of Science, University of Tokyo
Iwasaki Wataru Lab

Please see below, and see "CV_SAI.pdf" for details.

My Research

1. Methodological studies of phylogenetic artifacts caused by compositional biases in sequences

  • Performance assessment of RY-coding and non-homogeneous models in phylogenetic inference from nucleotide sequences with significant compositional heterogeneity
  In phylogenetic analyses of nucleotide sequences, ‘homogeneous’ substitution models, which assume that base composition is stationary across lineages, are widely used. However, a homogeneous model-based analysis can yield an artifactual tree when the sequences differ in base composition. Such artifacts can be countered by two approaches: ‘RY-coding’ and ‘non-homogeneous (NH)’ models. The former converts the four bases into two-state characters, purine (R) and pyrimidine (Y), to homogenize composition among sequences (Phillips and Penny, 2003). The latter instead incorporates compositional heterogeneity explicitly by allocating free model parameters branch by branch (Galtier and Gouy, 1998; Dutheil and Boussau, 2008). Although both approaches have been applied to several empirical datasets, their basic properties had not been systematically examined by simulation.

  In this study, I conducted what is effectively the first simulation study assessing the performance of maximum-likelihood phylogenetic analyses that incorporate RY-coding or NH models in the presence of compositional heterogeneity. The two methods were applied to ‘4-taxon’ datasets with varying degrees of heterogeneity in adenine and thymine (AT) content. Compared with a homogeneous model-based analysis, both RY-coding and NH model-based analyses recovered the true phylogenetic relationships more reliably when AT content differed by ~20% among sequences. Nevertheless, I found that the accuracy of RY-coding-based inference depends, at least to some extent, on the substitution process that generated the sequence data (e.g., the transition/transversion ratio). Furthermore, RY-coding-based inferences can be severely biased when recoding fails to remove complex patterns of compositional heterogeneity in the data. By contrast, NH models appeared robust against all types of compositional heterogeneity examined in this study and are widely applicable to phylogenetic analyses of various empirical datasets. For more information, please refer to Ishikawa, Inagaki, and Hashimoto (2012), listed in my CV.
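A minimal Python sketch of the RY-coding idea, assuming the standard recoding A, G → R and C, T → Y with gaps and other symbols passed through unchanged. The function names and the two toy sequences are illustrative only, not data from the study. In these toy sequences A and T occur equally often, as do G and C, so their very different AT contents collapse to the same R fraction after recoding.

```python
"""RY-coding sketch: purines (A, G) -> R, pyrimidines (C, T) -> Y."""

RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def ry_code(seq):
    """Recode a nucleotide sequence; gaps and ambiguity symbols are kept as-is."""
    return "".join(RY_MAP.get(base, base) for base in seq.upper())


def state_fraction(seq, states, alphabet):
    """Fraction of sites (restricted to `alphabet`) whose state is in `states`."""
    sites = [s for s in seq if s in alphabet]
    return sum(s in states for s in sites) / len(sites)


at_rich = "AAATTTGC"  # toy sequence, AT content 0.75
gc_rich = "GGGCCCAT"  # toy sequence, AT content 0.25

for name, seq in [("at_rich", at_rich), ("gc_rich", gc_rich)]:
    print(name,
          state_fraction(seq, "AT", "ACGT"),           # AT content before recoding
          state_fraction(ry_code(seq), "R", "RY"))     # R content after recoding
# at_rich 0.75 0.5
# gc_rich 0.25 0.5
```

The recoding only erases heterogeneity that lies between the R and Y classes' members (A vs. T, G vs. C in balanced proportions); if composition also shifts between purines and pyrimidines, the R fraction itself varies, which is consistent with the failure cases for RY-coding described above.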

2. Computational challenges in efficiently parallelizing phylogenetic inference with non-homogeneous models on current supercomputing systems

  Recent advances in genome sequencing enable us to phylogenetically analyze large matrices composed of hundreds of genes from diverse organisms. Such ‘phylogenomic analyses,’ however, are often affected by heterogeneity in base or amino-acid composition, codon usage, and substitution rate across genomes, or even within a genome. Non-homogeneous (NH) models are expected to be critical for ameliorating artifacts arising from these systematic biases in phylogenomic analyses. Nevertheless, phylogenomic analyses have been conducted almost exclusively under homogeneous models, for two reasons. First, phylogenetic inference based on NH models can be far more computationally intensive than inference under homogeneous models, because NH models require an enormous number of model parameters to be optimized. Second, all of the currently available phylogenetic programs that exploit parallel computing on large numbers of CPUs (and GPUs) implement only homogeneous models. It is therefore urgent to build a new phylogenetic program that combines efficient parallel computing with NH models.
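To make the per-branch parameters concrete, here is a Python sketch of a Tamura (1992)-style rate matrix whose equilibrium GC content θ is set separately for each branch, in the spirit of the Galtier and Gouy (1998) model (AT content is 1 − θ). The normalization to a mean rate of 1, the parameter values, and all function names are assumptions for illustration; this is not NHML's implementation. Transition probabilities come from a plain scaling-and-squaring matrix exponential so that only the standard library is needed.

```python
"""T92-style substitution model with a branch-specific GC content (sketch)."""


def t92_rate_matrix(theta, kappa):
    """Rate matrix Q (order A, C, G, T) and equilibrium frequencies for one branch.

    theta: equilibrium GC content on this branch (AT content = 1 - theta).
    kappa: transition/transversion rate ratio.
    """
    pi = [(1 - theta) / 2, theta / 2, theta / 2, (1 - theta) / 2]
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}  # A<->G and C<->T
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                q[i][j] = pi[j] * (kappa if (i, j) in transitions else 1.0)
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    # Assumed convention: rescale so the mean rate at this branch's equilibrium is 1.
    mu = -sum(pi[i] * q[i][i] for i in range(4))
    return [[v / mu for v in row] for row in q], pi


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(m, terms=20):
    """Matrix exponential via scaling and squaring plus a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(v) for v in row) for row in m)
    s = 0
    while norm > 0.5:  # scale down until the Taylor series converges quickly
        norm /= 2
        s += 1
    a = [[v / 2 ** s for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):  # term = a^k / k!
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(s):  # undo the scaling: exp(A) = exp(A / 2^s)^(2^s)
        result = mat_mul(result, result)
    return result


def branch_transition_probs(theta, kappa, length):
    """P(t) = exp(Q t) for one branch, using that branch's own theta."""
    q, _ = t92_rate_matrix(theta, kappa)
    return expm([[v * length for v in row] for row in q])


# Two lineages sharing kappa but with different GC contents (hypothetical values).
p_at_rich = branch_transition_probs(theta=0.3, kappa=2.0, length=0.1)
p_gc_rich = branch_transition_probs(theta=0.6, kappa=2.0, length=0.1)
```

A homogeneous model would share a single θ across the tree; giving every branch its own θ adds one free parameter per branch, which is one source of the extra optimization cost described above.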

  To this end, I have collaborated with the laboratory for High Performance Computing Systems at the University of Tsukuba to parallelize a phylogenetic program, ‘NHML’, which implements an NH model that allows AT content to vary across lineages (Galtier and Gouy, 1998). Fine-grained parallelization with OpenMP was applied to the calculation of site-wise log-likelihoods (site-lnLs) for a given tree, while coarse-grained parallelization with the Message Passing Interface (MPI) was applied to the evaluation of alternative trees during the ML tree search based on subtree pruning and regrafting (SPR). In addition to this ‘Hybrid’ parallelization, I newly implemented a medium-grained parallelization with MPI: during the lnL calculation for a given tree, the optimization of model parameters (e.g., the equilibrium AT content on each branch) and of branch lengths can be assigned in parallel to different groups of MPI processes. The performance of this ‘multi-grained’ parallelization of NHML was benchmarked on simulated datasets comprising ~130 species and ~10,000 nucleotide positions. As a result, I achieved satisfactory speedup (i.e., parallel efficiency ≥ 0.5) of maximum-likelihood tree inference on up to 64 computational nodes and 1,024 CPU cores of the supercomputer system ‘T2K-Tsukuba’ (http://www.top500.org/system/176215) at the Center for Computational Sciences, University of Tsukuba.
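The work decomposition described above can be sketched in Python as follows. This is not NHML's OpenMP/MPI code: a thread pool stands in for OpenMP threads (in CPython it illustrates the split of work rather than delivering real speedup), round-robin rank grouping stands in for what an MPI program would typically do with a communicator split (`MPI_Comm_split`), and `site_lnl`, the group layout, and the timings are hypothetical.

```python
"""Sketch of a multi-grained work decomposition (hypothetical names and values)."""
import math
from concurrent.futures import ThreadPoolExecutor


def tree_lnl(site_lnl, n_sites, n_workers):
    """Fine-grained level: the tree lnL is a sum of site-lnLs, so contiguous
    blocks of sites can be evaluated independently and the partial sums reduced."""
    blocks = [(w * n_sites // n_workers, (w + 1) * n_sites // n_workers)
              for w in range(n_workers)]

    def block_sum(block):
        start, end = block
        return sum(site_lnl(site) for site in range(start, end))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(block_sum, blocks))


def rank_groups(n_ranks, n_groups):
    """Medium/coarse-grained level: partition process ranks round-robin into
    groups; each group can then handle one task, such as optimizing one
    branch's parameters or scoring one candidate SPR tree."""
    return [list(range(g, n_ranks, n_groups)) for g in range(n_groups)]


def parallel_efficiency(t_serial, t_parallel, n_units):
    """E = T1 / (p * Tp): speedup divided by the number of processing units."""
    return t_serial / (n_units * t_parallel)


# Toy usage: 10,000 sites, each contributing ln(0.25) (a placeholder value).
total = tree_lnl(lambda site: math.log(0.25), n_sites=10_000, n_workers=4)
groups = rank_groups(n_ranks=8, n_groups=2)
```

With made-up timings, a job taking 100 s serially and 2 s on 64 units gives `parallel_efficiency(100.0, 2.0, 64)` = 0.78125, which would satisfy the E ≥ 0.5 criterion used above.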

3. Detection of gene conversion (recombination) events among bacterial sequences using phylogenetic methods

  Bacteria have two paralogous peptide-chain release factors, RF1 and RF2, which differ in stop-codon recognition. The two RF families are generally expected to have evolved independently since they arose from a single gene-duplication event in an ancestral bacterial genome. However, my survey based on phylogenetic and statistical methods detected inter- or intra-genomic conversions between RF1 and RF2 genes in diverse bacterial genomes; the converted regions encompass a domain that plays a key role in the interaction with the ribosome during translation termination. Structural analyses suggested that conversions of this region are functionally neutral for both RF1 and RF2, implying that 'partial' conversion between paralogous genes is more frequent than generally assumed. For more detailed information, please see Ishikawa, Kamikawa, and Inagaki (2015), listed in my CV.
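As a rough illustration of how a partial conversion can show up in aligned sequences, the Python sketch below scans an alignment in windows and records which reference paralog is closest to a query gene by uncorrected p-distance; a change of nearest reference along the gene marks a candidate converted region. This similarity scan is a deliberately simplified stand-in, not the phylogenetic and statistical procedure used in the study, and the window size, step, reference labels, and toy sequences are hypothetical.

```python
"""Windowed nearest-reference scan (simplified stand-in for conversion detection)."""


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("inf")  # no comparable sites: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_reference_scan(query, refs, window, step):
    """Return (window_start, name_of_closest_reference) for each window."""
    labels = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        dists = {name: p_distance(query[start:end], seq[start:end])
                 for name, seq in refs.items()}
        labels.append((start, min(dists, key=dists.get)))
    return labels


def switch_points(labels):
    """Window starts where the closest reference differs from the previous window."""
    return [start for (start, name), (_, prev) in zip(labels[1:], labels)
            if name != prev]


# Toy alignment: the query matches "RF1" in its first half and "RF2" in its second.
refs = {"RF1": "ACGTACGTACGTACGT", "RF2": "TGCATGCATGCATGCA"}
query = "ACGTACGT" + "TGCATGCA"
labels = nearest_reference_scan(query, refs, window=4, step=4)
print(switch_points(labels))  # [8]
```

A real analysis would compare tree topologies or likelihoods across regions and assess significance, since raw similarity is easily confounded by rate variation among sites and lineages.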

4. Collaboration on large-scale phylogenetic analyses

  In addition to the main research themes above, I have collaborated with a number of evolutionary biologists on the global phylogeny of eukaryotes. In particular, I contributed substantially to two major projects elucidating the evolutionary affiliations of two novel microbial eukaryotes, Tsukubamonas globosa and Palpitomonas bilix. I led the 157-protein phylogenomic analyses that determined the positions of T. globosa and P. bilix in the global phylogeny of eukaryotes. I also carried out statistical analyses to investigate potential systematic errors (e.g., long-branch attraction, compositional biases, covarions).
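Phylogenomic analyses such as the 157-protein analyses mentioned above usually begin by concatenating single-protein alignments into one supermatrix. The Python sketch below shows that step under simple assumptions: each alignment is a taxon-to-sequence dictionary, taxa missing from a protein are padded with gaps, and 1-based partition coordinates are recorded for later use with partitioned models. The protein and taxon names are hypothetical, and this is not the pipeline used in those projects.

```python
"""Supermatrix concatenation sketch (hypothetical protein and taxon names)."""


def concatenate(alignments, taxa):
    """Concatenate (name, {taxon: aligned_seq}) alignments in the given order.

    Returns the supermatrix {taxon: sequence} and a list of
    (name, first_column, last_column) partitions using 1-based coordinates.
    """
    matrix = {t: [] for t in taxa}
    partitions = []
    start = 1
    for name, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are missing or of unequal length")
        length = lengths.pop()
        for t in taxa:
            matrix[t].append(aln.get(t, "-" * length))  # pad absent taxa with gaps
        partitions.append((name, start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in matrix.items()}, partitions


supermatrix, partitions = concatenate(
    [("protA", {"taxon1": "MK", "taxon2": "MR"}),
     ("protB", {"taxon1": "LLV"})],
    taxa=["taxon1", "taxon2"],
)
```

Here `supermatrix` is `{"taxon1": "MKLLV", "taxon2": "MR---"}` and `partitions` is `[("protA", 1, 2), ("protB", 3, 5)]`; the gap padding keeps every row the same length even when a taxon lacks some proteins.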


Peer-reviewed Journal Papers

: Equally contributing authors

1. Templeton T, Asada M, Jiratanh M, Sohta A. Ishikawa, Tiawsirisup S, Sivakumar T, Namangala B, Takeda M, Mohkaew K, Ngamjituea S, Inoue N, Sugimoto C, Inagaki Y, Suzuki Y, Yokoyama N, Kaewthamasorn M, Kaneko O. (2016), Ungulate malaria parasites. Scientific Reports (accepted).

2. Sohta A. Ishikawa, Kamikawa R, Inagaki Y. (2015), Multiple conversion between the genes encoding bacterial class-I release factors. Scientific Reports, 5:12406.

3. Kamikawa R, Tanifuji G, Sohta A. Ishikawa, Ishii K, Matsuo Y, Onodera N, Ishida K, Hashimoto T, Miyashita H, Mayama S, Inagaki Y. (2015), Proposal of a Twin-arginine translocator system–mediated constraint against loss of ATP synthase genes from nonphotosynthetic plastid genomes. Molecular Biology and Evolution, 32(10), pp 2598-2604.

4. Sohta A. Ishikawa, Nakao M, Inagaki Y, Hashimoto T, Sato M. (2014), MPI/OpenMP HYBRID Parallelization of Phylogenetic Analyses based on Non-Homogeneous Substitution Models: Implementation and Performance Evaluation for Large-Scale Computing Systems. IPSJ Transactions on Advanced Computing Systems, 7(3), pp 13-24. (in Japanese)

5. Yabuki A, Kamikawa R, Sohta A. Ishikawa, Kolisko M, Kim E, Tanabe AS, Kume K, Ishida K, Inagaki Y. (2014), Palpitomonas bilix represents a basal cryptist lineage: insight into the character evolution in Cryptista. Scientific Reports, 4:4641.

6. Kamikawa R, Kolisko M, Nishimura Y, Yabuki A, Brown MW, Sohta A. Ishikawa, Ishida K, Roger AJ, Hashimoto T, Inagaki Y. (2014), Gene-content evolution in discobid mitochondria deduced from the phylogenetic position and complete mitochondrial genome of Tsukubamonas globosa. Genome Biology and Evolution, 6(2), pp 306-315.

7. Nagayasu E, Sohta A. Ishikawa, Taketani S, Chakraborty G, Yoshida A, Inagaki Y, Maruyama H. (2013), Identification of a bacteria-like ferrochelatase in Strongyloides venezuelensis, an animal parasitic nematode. PLOS ONE, 8(3), e58458.

8. Sohta A. Ishikawa, Hashimoto T. (2012), Assessment of the performance of phylogenetic inference based on simulated protein-coding sequences with significant compositional heterogeneity. Proceedings of the Institute of Statistical Mathematics, 60(2), pp 289-303. (in Japanese)

9. Sohta A. Ishikawa, Inagaki Y, Hashimoto T. (2012), RY-coding and non-homogeneous models can ameliorate the maximum-likelihood inferences from nucleotide sequence data with parallel compositional heterogeneity. Evolutionary Bioinformatics, 8, pp 357-371.

10. Ishitani Y, Sohta A. Ishikawa, Inagaki Y, Tsuchiya M, Takahashi K, Takishita K. (2011), Multigene phylogenetic analyses including diverse radiolarian species support the "Retaria" hypothesis - the sister relationship of Radiolaria and Foraminifera. Marine Micropaleontology, 81(1), pp 32-42.

11. Matsumoto T, Sohta A. Ishikawa, Hashimoto T, Inagaki Y. (2011), A deviant genetic code in the green alga-derived plastid in the dinoflagellate Lepidodinium chlorophorum. Molecular Phylogenetics and Evolution, 60(1), pp 68-72.

12. Reimer JD, Sohta A. Ishikawa, Hirose M. (2011), New records and molecular characterization of Acrozoanthus (Cnidaria: Anthozoa: Hexacorallia) and its endosymbionts (Symbiodinium spp.) from Taiwan. Marine Biodiversity, 41(2), pp 313-323.


Peer-reviewed Conference Papers

1. Sohta A. Ishikawa, Nakao M, Inagaki Y, Hashimoto T, Sato M. (2014), Hybrid MPI/OpenMP parallelization of a phylogenetic program with Non-Homogeneous models: toward the analyses of large-scale sequence datasets. High Performance Computing Symposium 2014, pp 10-20. (in Japanese)


Please check Ishikawa_Presentations


Please check Ishikawa_Products

Contact Info

mail: saishi@b.s.u-tokyo.ac.jp, or s.ishikawa.biol.phylo@gmail.com
*please convert a full-width "@" to a half-width one

Ishikawa Sohta,
2016/04/11 22:11