Deep learning in omics

The growth of high-throughput sequencing and the emergence of new types of experimental data call for new computational frameworks. To address this challenge, we use deep-learning algorithms to improve data quality, extract novel biological features from large-scale single-cell and higher-order genomic data, and integrate multi-omics results.

DeepLUCIA: Deep Learning-based Universal Chromatin Interaction Annotator

The importance of chromatin loops in gene regulation is broadly accepted. Chromatin loops are mainly predicted in two ways: a transcription factor (TF) binding-dependent approach and a genomic variation-based approach. However, neither approach provides an adequate understanding of gene regulation in human tissues. To address this issue, we developed a deep learning-based chromatin loop prediction model, the Deep Learning-based Universal Chromatin Interaction Annotator (DeepLUCIA).

Colon Hi-C CNN-XGBoost

In addition to their well-known effects on coding regions, large-scale structural variations (SVs) can dysregulate a gene when insulation from unwanted cis-regulatory elements fails. To investigate SV-mediated alterations of 3D chromatin structure, multi-omics profiles, including 3D genome maps, were generated from 50 patients. The machine learning-based computational pipeline we developed identified the 3D structural alterations and the genes that established interactions with super-enhancers (SEs). Analysis of over 600 SVs revealed that SE-hijacking can form beyond the nearest original topologically associating domain (TAD) boundary, in addition to the existing TAD fusion/shuffle model. Timing analysis showed that ~40% of hijacking genes have clonal characteristics and are recurrently associated with cell-cycle/DNA repair functions, which can be used as a prognostic marker. Collectively, our results highlight the regulatory role of noncoding structural variations on oncogene
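
As a rough illustration of what "beyond the nearest original TAD boundary" could mean computationally, the sketch below counts how many reference TAD boundaries lie between an SV breakpoint and a candidate target gene on the same chromosome. This is one possible reading and not the pipeline described above: the function names, the coordinate convention (sorted boundary positions in base pairs), and the use of a single breakpoint are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def boundaries_between(pos_a, pos_b, tad_boundaries):
    """Count TAD boundaries strictly between two positions on one chromosome.

    tad_boundaries must be sorted in ascending order (hypothetical input format).
    """
    lo, hi = sorted((pos_a, pos_b))
    # bisect_left(hi) counts boundaries < hi; bisect_right(lo) counts boundaries <= lo.
    count = bisect_left(tad_boundaries, hi) - bisect_right(tad_boundaries, lo)
    return max(0, count)  # guards the lo == hi case when a boundary sits at that position


def exceeds_nearest_boundary(breakpoint, gene_tss, tad_boundaries):
    """True if at least one original boundary separates the breakpoint from the gene."""
    return boundaries_between(breakpoint, gene_tss, tad_boundaries) >= 1


if __name__ == "__main__":
    boundaries = [100_000, 200_000, 300_000]  # illustrative values only
    print(exceeds_nearest_boundary(150_000, 250_000, boundaries))
    print(exceeds_nearest_boundary(150_000, 180_000, boundaries))
```

Under this reading, a gene in the same original TAD as the breakpoint gives a count of zero, and any positive count marks an interaction that reaches past at least one original boundary.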

RNA-CITE seq reconstruction autoencoder

This study presents an autoencoder model that integrates CITE sequencing data and RNA sequencing data to reconstruct the CITE sequencing data. The model shifts the distribution of protein expression to resemble that of RNA expression while preserving the overall gene-specific protein expression level. In addition, the quality of the CITE sequencing data was measured by its distance from the RNA sequencing data, and the degree to which the model changed the sequencing data was confirmed to differ according to this defined quality.
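
The distance-based quality measure can be sketched as follows. The text above does not state which distance is used, so this example assumes a Euclidean distance between z-scored protein and RNA profiles over matched gene/protein pairs. The function names `zscore` and `rna_protein_distance` are hypothetical, and the autoencoder itself is not reproduced here.

```python
import math


def zscore(values):
    """Standardize a profile to zero mean and unit variance (zeros if constant)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


def rna_protein_distance(protein, rna):
    """Euclidean distance between z-scored protein and RNA profiles.

    Both inputs list expression for the same matched gene/protein pairs,
    in the same order. Smaller distance = protein profile closer to RNA
    (assumed here to indicate higher CITE-seq quality).
    """
    if len(protein) != len(rna):
        raise ValueError("protein and rna must cover the same matched features")
    zp, zr = zscore(protein), zscore(rna)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(zp, zr)))


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    print(rna_protein_distance([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
    print(rna_protein_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

Standardizing each profile first makes the distance insensitive to the different scales of antibody-derived counts and RNA counts, so it reflects only how well the two profiles agree in shape.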