New
Our team currently focuses on deep learning (DL) applications in bioinformatics, including chronic diseases and oncology. In time-to-event modeling, Cox proportional hazards (CoxPH) analysis is the most frequently used method for estimating overall survival. However, CoxPH analysis is limited to explaining only single or partial risk effects among clinicopathological factors. In a recently completed study, CHY proposed DeepCoxPH, a risk score estimation strategy based on DL and CoxPH, to improve risk stratification for analyzing overall survival. Weights abstracted from DL and hazard ratios from CoxPH were transformed into a risk score estimate in the fully adjusted model. DeepCoxPH exhibited more comprehensive risk weight estimation for overall survival compared with other strategies. DeepCoxPH was applied to predict long-term overall survival in breast cancer. A Kaplan–Meier curve revealed that DeepCoxPH improved the discrimination between high- and low-risk strata for both short- and long-term overall survival in breast cancer. The aim of this study on risk score estimation based on machine learning and parametric statistical analysis was to identify risk stratifications for overall survival by considering comprehensive risk effects among multiple clinicopathological factors. In addition, multiple machine-learning-based algorithms have been designed and applied in various fields.
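As a rough illustration of the Kaplan–Meier comparison described above, the following Python sketch splits patients at the median of a risk score and computes the product-limit survival estimate for each group. The median cutoff, the function names, and the toy data are assumptions made for illustration; how DeepCoxPH derives and thresholds its score is not specified here.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate).

    times  -- follow-up time per patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)          # each patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]         # drop deaths and censored patients after time t
    return curve

def stratify(scores):
    """Label patients high/low risk at the median score (an assumed cutoff)."""
    cutoff = sorted(scores)[len(scores) // 2]
    return ["high" if s >= cutoff else "low" for s in scores]

# Toy data, invented purely for illustration.
scores = [0.2, 0.9, 0.4, 0.8, 0.1, 0.7]
times = [60, 12, 48, 20, 72, 30]      # months of follow-up
events = [0, 1, 1, 1, 0, 1]

groups = stratify(scores)
for label in ("low", "high"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(label, curve)
```

A clear gap between the two printed curves is what "improved discrimination" refers to; in practice a log-rank test would be used to assess whether the gap is significant.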
Kaohsiung Medical University Hospital
In the era of advanced precision medicine, next-generation genomic data are crucial to achieving breakthroughs in cancer medicine. Effective estimation of cancer mortality risk from cancer-associated genomic data remains a vital challenge. The combination of machine learning algorithms and conventional survival analysis can advance the detection of high-risk missense mutation variants and candidate genes associated with cancer mortality in next-generation genomic data. We proposed FuzzyDeepCoxPH, a fuzzy logic system combined with machine learning algorithms and conventional survival analysis, to identify high-risk missense mutation variants and candidate genes highly associated with cancer mortality. DL-derived abstracted weights and Cox proportional hazards (CoxPH) hazard ratios were used to develop four model-based risk scores that account for the factor importance associated with risk stratification, time-varying effects, and individual and interaction effects among features. Fuzzy rules based on a fuzzy logic system were designed to integrate these considerations by merging the four model-based risk scores into an advanced risk estimate. Clinical features and next-generation sequencing data from deoxyribonucleic acid and ribonucleic acid were used to evaluate FuzzyDeepCoxPH performance. Our results indicated that FuzzyDeepCoxPH can effectively distinguish high-risk variants and candidate genes related to cancer mortality. In FuzzyDeepCoxPH, the fuzzy logic system was applied to combine DL-based and CoxPH-based models to provide a comprehensive cancer mortality risk estimation for cancer medicine.
Kaohsiung Chang Gung Memorial Hospital
Measuring a patient's functional status is crucial when patients with end-stage renal disease begin a dialysis program. The influence of the disease on patients can be examined by measuring Karnofsky Performance Status (KPS) scores, together with a quality-of-life survey and clinical variables. The dataset was collected retrospectively from 1166 patients receiving regular hemodialysis (HD) at one hospital during the 5-year study period. KPS scores were used to quantify functional status. To identify risk factors for functional status, clinical factors including demographics, laboratory data, and HD vintage were selected. We applied a classification and regression tree (CART) approach and logistic regression to determine risk factors for functional impairment among HD patients. Ten risk factors were identified by CART and the regression model (age, primary kidney disease subclass, treatment years, hemoglobin, albumin, creatinine, phosphorus, intact parathyroid hormone, ferritin, and cardiothoracic ratio). The results of logistic regression with selected interaction models showed that older age or higher hematocrit, blood urea nitrogen, and glucose levels could significantly increase the log-odds of obtaining low KPS scores at in-person visits. In the interaction results, the combination of older age with higher albumin level, and of higher creatinine level with longer HD treatment years, could significantly decrease the log-odds of a low KPS score assessment during in-person visits. Age, hemoglobin, albumin, urea, creatinine levels, primary kidney disease subclass, and HD duration are the major determinants of functional status in HD patients.
National Taiwan University Hospital
In hypertension association studies, most individually non-significant single nucleotide polymorphisms (SNPs) are overlooked. Their possible SNP–SNP interactions are usually ignored, which contributes to missing heritability. In the present study, we proposed a particle swarm optimization (PSO) algorithm to analyze the SNP–SNP interactions associated with hypertension. A genotype dataset of eight SNPs of renin-angiotensin system genes for 130 non-hypertension and 313 hypertension subjects was included. Without SNP–SNP interaction, most individual SNPs showed no significant difference between the hypertension and non-hypertension groups. For SNP–SNP interaction, PSO can select SNP combinations involving different numbers of SNPs, namely the best SNP barcodes, that show the maximum frequency difference between the non-hypertension and hypertension groups. After computation, the best PSO-generated SNP barcodes were dominant in the non-hypertension group in terms of the occurrences of frequency differences between the non-hypertension and hypertension groups. The odds ratio (OR) values of the best SNP barcodes involving 2–8 SNPs ranged from 0.705 to 0.334, suggesting that these SNP barcodes were protective against hypertension. In conclusion, this study demonstrated that non-significant SNPs may generate a joint effect in an association study. Our proposed PSO algorithm is effective in identifying the best protective SNP barcodes against hypertension.
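The notion of a "best SNP barcode" can be made concrete with a small Python sketch. It replaces PSO with an exhaustive search, which is only feasible for tiny inputs, and it assumes that the objective is the absolute difference in within-group barcode frequency. The 0/1/2 genotype coding, the function names, the 0.5 continuity correction in the odds ratio, and the toy data are all illustrative assumptions rather than details of the study.

```python
from itertools import combinations, product

def barcode_count(group, barcode):
    """Number of subjects whose genotypes match every (snp_index, genotype) pair."""
    return sum(all(subject[s] == g for s, g in barcode) for subject in group)

def best_barcode(cases, controls, n_snps, genotypes=(0, 1, 2)):
    """Exhaustively search for the n_snps-SNP barcode with the largest
    absolute difference in within-group frequency (PSO would search this
    space heuristically instead of enumerating it)."""
    total_snps = len(cases[0])
    best, best_diff = None, -1.0
    for snps in combinations(range(total_snps), n_snps):
        for gts in product(genotypes, repeat=n_snps):
            barcode = tuple(zip(snps, gts))
            diff = abs(barcode_count(controls, barcode) / len(controls)
                       - barcode_count(cases, barcode) / len(cases))
            if diff > best_diff:
                best, best_diff = barcode, diff
    return best, best_diff

def odds_ratio(cases, controls, barcode):
    """Unadjusted odds ratio of being a case given the barcode.
    0.5 is added to every cell so that empty cells do not divide by zero."""
    a = barcode_count(cases, barcode) + 0.5                   # cases with barcode
    b = len(cases) - barcode_count(cases, barcode) + 0.5      # cases without
    c = barcode_count(controls, barcode) + 0.5                # controls with barcode
    d = len(controls) - barcode_count(controls, barcode) + 0.5
    return (a * d) / (b * c)

# Toy genotypes for three SNPs, invented for illustration only.
cases = [(0, 1, 2), (0, 1, 1), (1, 1, 0), (0, 2, 2)]
controls = [(1, 0, 2), (1, 0, 0), (1, 0, 1), (0, 0, 2)]

barcode, diff = best_barcode(cases, controls, n_snps=2)
print(barcode, diff, odds_ratio(cases, controls, barcode))
```

An odds ratio below 1 for a barcode that is more frequent among controls corresponds to the "protective" reading in the text.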
Taipei Medical University
Chi-Mei Medical Center
Oral cancer is the sixth most common cancer worldwide and has a high mortality rate. Biomarkers that anticipate susceptibility, prognosis, or response to treatment are much needed. Oral cancer is a polygenic disease involving complex interactions among genetic and environmental factors, which require multifaceted analyses. Here, in a dataset of 103 oral cancer cases and 98 controls from Taiwan, we examined the association between oral cancer risk and the DNA repair genes X-ray repair cross-complementing group (XRCCs) 1–4, together with the environmental factors of smoking, alcohol drinking, and betel quid (BQ) chewing. We employed logistic regression, multifactor dimensionality reduction (MDR), and hierarchical interaction graphs to analyze gene–gene (G×G) and gene–environment (G×E) interactions. We identified a significantly elevated risk for the XRCC2 rs2040639 heterozygous variant among smokers [adjusted odds ratio (OR) = 3.7, 95% confidence interval (CI) = 1.1–12.1] and alcohol drinkers [adjusted OR = 5.7, 95% CI = 1.4–23.2]. The best two-factor G×G interaction model for oral cancer included XRCC1 rs1799782 and XRCC2 rs2040639 [OR = 3.13, 95% CI = 1.66–6.13]. For the G×E interaction, the estimated ORs of oral cancer for the two-factor (drinking–BQ chewing), three-factor (XRCC1–XRCC2–BQ chewing), four-factor (XRCC1–XRCC2–age–BQ chewing), and five-factor (XRCC1–XRCC2–age–drinking–BQ chewing) models were 32.9 [95% CI = 14.1–76.9], 31.0 [95% CI = 14.0–64.7], 49.8 [95% CI = 21.0–117.7], and 82.9 [95% CI = 31.0–221.5], respectively. Taken together, the genotypes of the XRCC1 rs1799782 and XRCC2 rs2040639 DNA repair genes appear to be significantly associated with oral cancer. These associations were enhanced by exposure to certain environmental factors. The observations presented here warrant further research in larger study samples to examine their relevance for routine clinical care in oncology.
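For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the Python sketch below computes the unadjusted OR from a 2×2 table with the standard log-scale (Woolf) interval. The ORs reported above are adjusted estimates from logistic regression, so this is a simplified stand-in; the function name and the counts are hypothetical and do not come from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a -- exposed cases        b -- unexposed cases
    c -- exposed controls     d -- unexposed controls
    z -- normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
or_, lower, upper = odds_ratio_ci(a=30, b=20, c=15, d=35)
print(f"OR = {or_:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval whose lower bound exceeds 1 indicates a significantly elevated risk at the 5% level. The wide intervals reported for the multi-factor models reflect small cell counts, which inflate the standard error term.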
Kaohsiung Municipal Hsiao‑Kang Hospital
In association studies, the combined effects of single nucleotide polymorphism (SNP)-SNP interactions and the problem of imbalanced data between cases and controls are frequently ignored. In the present study, we used an improved multifactor dimensionality reduction (MDR) approach, MDR-ER, to detect high-order SNP-SNP interactions in an imbalanced breast cancer data set containing seven SNPs of chemokine CXCL12/CXCR4 pathway genes. Most individual SNPs were not significantly associated with breast cancer. After MDR-ER analysis, six significant SNP-SNP interaction models with seven genes (highest cross-validation consistency, 10; classification error rates, 41.3–21.0; and prediction error rates, 47.4–55.3) were identified. The CD4 and VEGFA genes were associated in a 2-loci interaction model (classification error rate, 41.3; prediction error rate, 47.5; odds ratio (OR), 2.069; 95% bootstrap CI, 1.40–2.90; P=1.71E-04), and this pair also appeared in all the best 2- to 7-loci models. As the number of loci increased, the classification error rates and P-values decreased. The power of every 2- to 7-loci model was >0.9. The minimum classification error rate among the MDR-ER-generated models was obtained with the 7-loci interaction model (classification error rate, 21.0; OR=15.282; 95% bootstrap CI, 9.54–23.87; P=4.03E-31). In the epistasis network analysis, the overall effect on breast cancer susceptibility was identified, and the SNPs were ordered by impact on breast cancer as follows: CD4 = VEGFA > KITLG > CXCL12 > CCR7 = MMP2 > CXCR4. In conclusion, MDR-ER can effectively and correctly identify the best SNP-SNP interaction models in an imbalanced data set for breast cancer cases.
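The core MDR step, and one common way to make it robust to imbalanced groups, can be sketched in Python as follows. Each multi-locus genotype cell is labeled high-risk when its share of cases exceeds its share of controls, and models are ranked by a balanced error rate that weights both groups equally. Whether this matches the exact balancing functions of MDR-ER is an assumption; cross-validation, bootstrapping, and significance testing are omitted, and the names and toy data are illustrative only.

```python
from collections import Counter
from itertools import combinations

def mdr_model(cases, controls, loci):
    """Build one MDR model on the given loci.

    Returns (high_risk_cells, balanced_error). A cell is high-risk when the
    proportion of cases in it exceeds the proportion of controls in it, so
    the labeling does not depend on the overall case:control ratio.
    """
    case_cells = Counter(tuple(s[i] for i in loci) for s in cases)
    ctrl_cells = Counter(tuple(s[i] for i in loci) for s in controls)
    high = {cell for cell in set(case_cells) | set(ctrl_cells)
            if case_cells[cell] / len(cases) > ctrl_cells[cell] / len(controls)}
    false_neg = sum(n for cell, n in case_cells.items() if cell not in high)
    false_pos = sum(n for cell, n in ctrl_cells.items() if cell in high)
    # Balanced error: average of the per-group misclassification rates.
    error = 0.5 * (false_neg / len(cases) + false_pos / len(controls))
    return high, error

def best_model(cases, controls, n_loci):
    """Return the n_loci combination with the lowest balanced error."""
    total_loci = len(cases[0])
    return min(combinations(range(total_loci), n_loci),
               key=lambda loci: mdr_model(cases, controls, loci)[1])

# Toy imbalanced data (3 cases, 6 controls, 3 SNPs), invented for illustration.
cases = [(1, 1, 0), (1, 1, 2), (1, 1, 1)]
controls = [(0, 0, 0), (0, 1, 2), (0, 0, 1), (1, 0, 0), (0, 1, 1), (0, 0, 2)]

loci = best_model(cases, controls, n_loci=2)
print(loci, mdr_model(cases, controls, loci))
```

Comparing proportions rather than raw counts is what keeps the minority group from being swamped: with raw counts, a cell holding one case and two controls would be labeled low-risk even though it contains a third of all cases.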