Our team has been committed to developing artificial intelligence algorithms for medical research, including single nucleotide polymorphism (SNP), cancer, and epidemiological association analysis, and to developing a bioinformatics system, all of which have contributed to research on the association between genes and cancer. I have worked on seven research topics: (1) interaction between risk factors for multiple outcomes (multi-outcome interaction), (2) improvement of the identification rate of gene–gene interactions, (3) improvement of whole-genome association identification, (4) improvement of the evaluation criteria for gene interactions, (5) improvement of the identification rate for unbalanced case–control data sets, (6) improvement of the identification rate of gene–gene interactions for data sets with a small sample size, and (7) overcoming the limitations of binary classification. To summarize the related breakthroughs and innovations, six studies are presented as follows:
Identifying and characterizing the interaction between risk factors for multiple outcomes (multi-outcome interaction) has been one of the greatest challenges in the study of complex multifactorial diseases. However, existing approaches have several limitations in identifying multi-outcome interactions. To address this issue, we proposed a multi-outcome interaction identification approach called MOAI. MOAI was motivated by the difficulty of estimating an interaction that occurs simultaneously across multiple outcomes and by the success of the Pareto set filter operator in identifying multi-outcome interactions. MOAI permits the identification of interactions across multiple outcomes and is applicable to population-based study designs. Our experimental results showed that existing approaches cannot effectively identify multi-outcome interactions, whereas MOAI exhibited clearly superior performance. We applied MOAI to identify the interaction between risk factors for colorectal cancer (CRC) in both the metastasis and mortality prognostic outcomes. An interaction between vaspin and carcinoembryonic antigen (CEA) was found, and our results showed that patients with CRC characterized by higher vaspin (≥30%) and CEA (≥5) levels had a simultaneously increased risk of both metastasis and mortality. The immunostaining evidence revealed that the identified multi-outcome interaction could effectively distinguish non-metastatic/surviving patients from metastatic/deceased patients, which offers multi-prognostic outcome risk estimation for CRC. To our knowledge, this is the first report of a multi-outcome interaction associated with a complex multifactorial disease.
MOAI tool link: https://sites.google.com/view/moaitool/home
Y-D Lin, Y-C Lee, C-P Chiang, S-H Moi, J-Y Kan (2022, Jan). MOAI: a multi-outcome interaction identification approach reveals an interaction between vaspin and carcinoembryonic antigen on colorectal cancer prognosis. Briefings in Bioinformatics, Vol. 23, Art. no. bbab427, https://doi.org/10.1093/bib/bbab427. (SCI IF=11.622, Rank=2/58 [3.4%], Quartile=Q1, Mathematical & Computational biology)
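The Pareto set filtering idea behind MOAI can be illustrated with a minimal sketch. This is not the published MOAI implementation: the candidate names and scores below are made-up toy values, and the function names are hypothetical. It only shows how a candidate interaction is kept when no other candidate scores at least as well on every outcome and strictly better on one.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every outcome and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_set(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]


# Toy scores (hypothetical): one value per outcome, e.g. (metastasis, mortality).
candidates = {
    "A x B": (0.71, 0.69),
    "A x C": (0.64, 0.58),
    "B x C": (0.66, 0.70),
    "C x D": (0.60, 0.55),
}

front = pareto_set(candidates)
print(front)
```

In this toy example, "A x C" and "C x D" are dominated by "A x B" on both outcomes, so only "A x B" and "B x C" remain as candidates for a multi-outcome interaction.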
In genome-wide association studies (GWAS), the large number of SNP combinations limits the ability of many algorithms to detect potential gene–gene interactions. On the basis of a differential evolution algorithm and multifactor dimensionality reduction (MDR), our study proposes an innovative algorithm to rapidly and accurately detect relevant gene–gene interactions in a GWAS. Our algorithm achieved 10% identification of gene–gene interactions, which is superior to other algorithms. In a GWAS, our algorithm requires only 60 s to completely identify gene–gene interactions in human chromosome 1, whereas a conventional algorithm requires at least 6 h for the same task. Our algorithm has high efficiency and accuracy in identifying potential gene–gene interactions in a GWAS and provides bioinformatics tools that can be used to identify the association between genes and diseases.
C-H Yang, L-Y Chuang, Y-D Lin (2017, Aug). CMDR based differential evolution identify the epistatic interaction in genome-wide association studies. Bioinformatics, Vol. 33, 2354-2362. (SCI IF=7.307, Rank=2/57 [3.5%], Quartile=Q1, Mathematical & Computational biology)
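The core MDR step that this study builds on can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: it scores SNP pairs by training accuracy only (no cross-validation), it searches all pairs exhaustively instead of using differential evolution, and the function names and toy data are hypothetical. Each genotype combination is labeled high-risk when its case/control ratio exceeds the overall ratio.

```python
from collections import Counter
from itertools import combinations


def mdr_accuracy(genotypes, labels, snp_pair):
    """Training accuracy of the MDR high-risk/low-risk rule for one SNP pair.

    Assumes labels are 1 (case) or 0 (control) and both classes are present.
    """
    i, j = snp_pair
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = (row[i], row[j])
        (cases if y == 1 else controls)[cell] += 1

    # Overall case/control ratio serves as the high-risk threshold.
    threshold = sum(cases.values()) / sum(controls.values())

    correct = 0
    for cell in set(cases) | set(controls):
        high_risk = cases[cell] > threshold * controls[cell]
        # High-risk cells predict "case"; low-risk cells predict "control".
        correct += cases[cell] if high_risk else controls[cell]
    return correct / len(labels)


def best_pair(genotypes, labels):
    """Exhaustive search over all SNP pairs (feasible only for small inputs)."""
    n_snps = len(genotypes[0])
    return max(
        combinations(range(n_snps), 2),
        key=lambda pair: mdr_accuracy(genotypes, labels, pair),
    )


# Toy data: disease status depends only on the combination of SNP 0 and SNP 2.
genotypes = [(a, b, c) for a in (0, 1) for b in (0, 1, 2) for c in (0, 1)]
labels = [1 if a != c else 0 for a, b, c in genotypes]

print(best_pair(genotypes, labels))
```

The exhaustive search grows quadratically with the number of SNPs for pairs (and faster for higher orders), which is the cost that a guided search such as differential evolution is meant to avoid.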
I proposed an improved fuzzy c-means-based entropy (FCME) approach to overcome the limitations of binary classification. The approach introduces degrees of membership into multifactor dimensionality reduction (MDR); the resulting method is named FCMEMDR. The FCME approach and MDR were integrated to more precisely differentiate between similar frequencies of multifactor genotypes in cases of possible epistasis. Our study used 68 diseases and 6,800 data sets to evaluate FCMEMDR. FCMEMDR increased the recognition rate of epistasis to >83% (other algorithms provide a recognition rate of approximately 76%). Furthermore, fuzzy c-means allows FCMEMDR to provide a new visualization technique that helps determine the effects of epistasis on a disease, thereby contributing to disease prevention and treatment.
C-H Yang, L-Y Chuang, Y-D Lin (2020, Apr). Epistasis analysis using an improved fuzzy c-means-based entropy approach. IEEE Transactions on Fuzzy Systems, Vol. 28, 718-730. (SCI IF=9.518, Rank=7/136 [5.1%], Quartile=Q1, Computer science, Artificial intelligence)
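To illustrate why degrees of membership help with similar genotype frequencies, the sketch below applies the standard fuzzy c-means membership formula to a single value. It is not the published FCME formulation (the entropy component is omitted), and the choice of the case proportion as the input, the two cluster centers, and the function name are assumptions made for illustration.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of a scalar x in each cluster.

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), where d_k = |x - center_k|.
    Assumes the centers are distinct and the fuzzifier m is greater than 1.
    """
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:
        # x coincides with a center: full membership in that cluster.
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]


# Assumed centers: 0.0 for the low-risk cluster, 1.0 for the high-risk cluster.
centers = (0.0, 1.0)

# Case proportions of two genotype cells that a hard 0.5 threshold would
# place in opposite classes despite being nearly identical.
for proportion in (0.48, 0.52):
    low, high = fcm_memberships(proportion, centers)
    print(f"{proportion:.2f}: low-risk={low:.3f}, high-risk={high:.3f}")
```

Under a binary rule the two cells above receive opposite labels, whereas their fuzzy memberships stay close to each other, which reflects how similar the underlying frequencies are.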
We used multiple-criteria decision-making analysis to combine two crucial machine learning algorithms, namely MDR and SNPruler, for identifying gene–gene interactions. Our algorithm has the advantage of using multiple evaluation criteria and offers a breakthrough in the study of various diseases. The study used 46 disease data sets, and the recognition rate of gene–gene interactions was 7% higher than that of other algorithms. Our algorithm can simultaneously consider more risk factors and perform a GWAS, thus helping researchers to more accurately infer the cause of a disease. Our article was selected as the cover page for Bioinformatics in IEEE Journal of Biomedical and Health Informatics, Vol. 23 (2019).
C-H Yang, L-Y Chuang, Y-D Lin (2019, Jan). Multiple-criteria decision analysis-based multifactor dimensionality reduction for detecting gene-gene interactions. IEEE Journal of Biomedical and Health Informatics, Vol. 23, 416-426. (SCI IF=5.223, Rank=1/27 [3.7%], Quartile=Q1, Medical informatics)
In case–control studies of many rare diseases, cases are difficult to collect, and most algorithms are limited by sample size, meaning they cannot effectively identify significant gene–gene interactions. We proposed a balance rule for MDR in gene–gene interaction analysis. Our balance rule can adjust the imbalanced distribution between cases and controls in each cross-validation fold and effectively improve accuracy with small sample sizes. The study analyzed small-sample data sets for 48 diseases. Our balance rule improved the recognition rate of gene–gene interactions by >50%. Our algorithm provides a breakthrough in the inference of gene associations for small sample sizes and helps researchers to infer significant gene–gene interactions.
C-H Yang, L-Y Chuang, Y-D Lin (2020, Jul). Class balanced multifactor dimensionality reduction to detect gene–gene interactions. IEEE-ACM Transactions on Computational Biology and Bioinformatics, Vol. 17, 71-81. (SCI IF=3.015, Rank=11/124 [8.9%], Quartile=Q1, Statistics & Probability)
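The effect of case–control imbalance on evaluation can be shown with a small sketch. It does not reproduce the balance rule proposed in the study; instead it uses balanced accuracy, a standard measure that averages sensitivity and specificity, to show why plain accuracy is misleading when cases are rare. The data and function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; assumes both classes are present."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_cases = sum(t == 1 for t in y_true)
    n_controls = len(y_true) - n_cases
    sensitivity = tp / n_cases
    specificity = tn / n_controls
    return (sensitivity + specificity) / 2


# Toy imbalanced data: 10 cases and 90 controls.
y_true = [1] * 10 + [0] * 90
# A useless model that labels every sample as a control.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A model that never detects a case still reaches 90% plain accuracy here, while its balanced accuracy is no better than chance, which is why imbalance has to be handled explicitly when ranking candidate interactions.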
We showed that the MDR method is easily affected by the evaluation formula used. Our algorithm is the first to apply multiobjective theory to SNP–SNP interaction analysis. Our algorithm is applicable across diseases and has a superior ability to identify potential disease associations. The algorithm can effectively reduce the possibility of missing the most significant SNP–SNP interactions because of errors in the evaluation formula. Our study used 48 disease data sets to evaluate the algorithm. Our algorithm improved the recognition rate of SNP–SNP interactions to 5%. Thus, it can effectively determine the association between diseases and genes.
C-H Yang, L-Y Chuang, Y-D Lin (2018, Oct). Multiobjective multifactor dimensionality reduction to detect SNP–SNP interactions. Bioinformatics, Vol. 34, 2228-2236. (SCI IF=5.481, Rank=3/59 [5.1%], Quartile=Q1, Mathematical & Computational biology)