Deep learning in omics
High-throughput sequencing and new types of experimental data call for new computational frameworks. To address this challenge, we use deep-learning algorithms to improve data quality, extract novel biological features from large-scale single-cell and higher-order genomic data, and integrate multi-omics results.
DeepLUCIA: Deep Learning-based Universal Chromatin Interaction Annotator
The importance of chromatin loops in gene regulation is broadly accepted. Chromatin loops are mainly predicted by two approaches: a transcription factor (TF) binding-dependent approach and a genomic variation-based approach. However, neither approach provides an adequate understanding of gene regulation in human tissues. To address this issue, we developed a deep learning-based chromatin loop prediction model, the Deep Learning-based Universal Chromatin Interaction Annotator (DeepLUCIA).
Colon Hi-C CNN-XGBoost
In addition to their well-known effects on coding regions, large-scale structural variations (SVs) can dysregulate a gene by disrupting the insulation that shields it from unwanted cis-regulatory elements. To investigate SV-mediated alterations of 3D chromatin structure, multi-omics profiles, including 3D genome maps, were generated from 50 patients. A machine learning-based computational pipeline that we developed identified the 3D structural alterations and the genes that established interactions with super-enhancers (SEs). Analysis of over 600 SVs revealed that SE-hijacking can extend beyond the nearest original topologically associating domain (TAD) boundary, in addition to the existing TAD fusion/shuffle model. Timing analysis showed that ~40% of the hijacking genes have clonal characteristics and are recurrently associated with cell-cycle/DNA repair functions, which can be used as a prognostic marker. Collectively, our results highlight the regulatory role of noncoding structural variations on oncogene
RNA-CITE seq reconstruction autoencoder
This study presents an autoencoder model that integrates CITE sequencing (CITE-seq) data with RNA sequencing data to reconstruct the CITE-seq data. The model shifts the distribution of protein expression to resemble that of RNA expression while preserving the overall gene-specific protein expression level. In addition, the quality of the CITE-seq data was measured as its distance from the RNA sequencing data, and the degree to which the model changed the data was confirmed to differ according to this quality measure.
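The idea of scoring CITE sequencing quality by its distance from the RNA data can be illustrated with a minimal sketch. The text does not specify the metric, so everything here is an assumption made for illustration: each protein is paired with the RNA of its encoding gene, both profiles are z-scored per cell, and the Euclidean distance between them serves as the score (a smaller distance is read as higher quality). The function names `zscore` and `rna_distance` and the toy values are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers; a constant list maps to all zeros."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def rna_distance(protein, rna):
    """Euclidean distance between the z-scored protein and RNA profiles of one cell.

    Assumption: protein[i] and rna[i] refer to the same gene.
    """
    if len(protein) != len(rna):
        raise ValueError("protein and rna must cover the same genes")
    zp, zr = zscore(protein), zscore(rna)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(zp, zr)))


# Toy data (invented): (protein counts, RNA counts) for three paired genes.
cells = {
    "cell_a": ([10.0, 20.0, 30.0], [1.0, 2.0, 3.0]),  # same relative pattern
    "cell_b": ([30.0, 20.0, 10.0], [1.0, 2.0, 3.0]),  # reversed pattern
}

scores = {name: rna_distance(p, r) for name, (p, r) in cells.items()}

# Rank cells from lowest distance (closest to RNA) to highest.
ranked = sorted(scores, key=scores.get)
```

In this toy case `cell_a` ranks first because its protein profile follows the RNA profile exactly after standardization, while `cell_b` has the opposite pattern. A score of this kind could then be compared with how strongly a reconstruction model alters each cell, which is the relationship the paragraph above describes.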