In recent years, our academic research has achieved significant advancements across multiple domains, including protein structure prediction, post-translational modification prediction, antibody epitope prediction, enzymology research, and medical informatics applications. Through the development of innovative computational methodologies and predictive systems, we have provided robust computational tools for protein function research while offering novel perspectives for the investigation of related diseases.
Our research has advanced bioinformatics and provided substantial support for biomedical research. In protein structure prediction, we developed QuaBingo [1] and QUATgo [2], both of which significantly improve the accuracy of protein quaternary structure prediction. Extending these methods to protein subcellular localization, our REALoc system handles both singleplex and multiplex protein localization within a single predictive framework [3]. Building on similar strategies, our SUMOgo system advances post-translational modification (PTM) prediction by incorporating the interplay of various PTMs, thereby improving the prediction of SUMOylation sites [4]. In antibody research, we developed NIgPred, a system that integrates diverse heterogeneous features with advanced machine learning methods and outperforms existing epitope prediction tools [5]. Beyond protein-centric prediction, our enzymology studies used genetic algorithm-based protein-ligand docking software to discover potential anticancer agents against melanoma [6] and identified novel enzymatic activities in BDH, significantly broadening its biotechnological applications [7]. Finally, translating our computational expertise into clinical practice, we developed the LBN system, which provides a rapid, cost-effective, and non-invasive diagnostic solution for bladder cancer [8].
In addition to research, we actively engage in industry-academia collaboration, currently developing a convolutional neural network-based application for image identification to assess the population of ants within their nests. Moving forward, we aim to extend these preliminary outcomes into broader industrial applications in bioinformatics, medical informatics, and AI-driven deep learning technologies. Additionally, we mentor undergraduate students in conducting artificial intelligence and deep learning projects. Following a year of dedicated training and hands-on experience, our students successfully develop deep learning models with strong predictive performance, applying them effectively in diverse AI application scenarios. Examples of student-driven projects include medical applications such as respiratory sound analysis for distinguishing lung diseases and image recognition for skin melanoma, as well as ecological applications including identification of fall armyworms, lepidopteran larvae, and common ant species in Taiwan.
Building upon our existing research achievements, we plan to deepen our investigations in the following areas:
AI-driven "Structural Alphabet 2.0" system: We will develop an AI-driven "Structural Alphabet 2.0" system that enables ultra-fast structural indexing and functional annotation of entire proteomes. Although the original Structural Alphabet (SA) effectively compressed protein 3D structures and supported rapid structural comparisons (e.g., 3D-BLAST), the recent surge of near-atomic-level predictions from AlphaFold2 and ESMFold demands more scalable solutions. To address this, we aim to use self-supervised deep learning approaches, including contrastive learning and variational autoencoders, to adaptively refine the SA framework and generate a new embedding ("SA-Embedding") that maps proteins into a low-dimensional vector space. Coupled with approximate nearest-neighbor (ANN) indexing such as locality-sensitive hashing (LSH), this method is intended to achieve sub-second nearest-neighbor searches across hundreds of millions of protein models. Furthermore, we plan to develop "u3D-BLAST," integrating enhanced structural alphabet substitution matrix (SASM) algorithms and deep-learning-based scoring functions for statistically robust evaluation and automatic functional annotation (Gene Ontology terms, EC numbers). Ultimately, this platform will deliver large-scale structural annotations akin to UniProt’s functional annotations, benefiting structural genomics, protein engineering, and remote homology detection across databases such as CATH and Pfam-Clan.
Application of advanced deep learning methods: We will explore more sophisticated deep learning architectures, such as graph neural networks (GNNs) and attention mechanisms, for protein structure and function prediction. These methods can capture complex patterns within protein sequences and structures more effectively, thereby enhancing prediction accuracy.
Structural Immunoinformatics Platform development: We aim to develop a "Structural Immunoinformatics Platform," integrating Structural Alphabet (SA) methods to rapidly map antibody interaction spaces and enable AI-driven antibody design. This platform will establish a "CDR-SA Signature" database for ultra-fast structural matching of millions of antibody sequences, complemented by an "Epitope-Hotspot Mapping" system to predict antigen interaction hotspots. Furthermore, an AI-guided workflow will be created for generative design of antibodies with improved affinity and manufacturability, significantly accelerating vaccine and therapeutic antibody development.
Integration of multi-omics data: By integrating proteomics, genomics, and metabolomics data, we aim to develop more comprehensive predictive models to enhance prediction accuracy. Such integration not only provides a more holistic biological perspective but also captures interactions across different omics levels, thus facilitating deeper insights into biological mechanisms.
Clinical application of bioinformatics tools: We will apply the developed bioinformatics tools to clinical research, particularly in disease diagnosis and personalized medicine. These clinical applications will not only validate the practicality of the tools but also provide innovative methods and resources for clinical practice.
Development of user-friendly bioinformatics tools: We will package our predictive systems and computational tools into more user-friendly interfaces, making them accessible to non-specialists. Such user-friendly tools will broaden their utilization and facilitate the application of bioinformatics methodologies in biological and medical research.
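As an illustration of the nearest-neighbor indexing idea mentioned for the "SA-Embedding" above, the following is a minimal sketch of random-hyperplane locality-sensitive hashing in plain Python. It is not the planned system: the class name HyperplaneLSH, its parameters (n_bits, n_tables), and the toy three-dimensional vectors are all hypothetical, and a production index over hundreds of millions of models would rely on a dedicated library rather than this code.

```python
import random
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class HyperplaneLSH:
    """Toy random-hyperplane LSH index (hypothetical helper, not the authors' code)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    @staticmethod
    def _key(planes, vec):
        # Each bit records on which side of a hyperplane the vector falls.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes)

    def add(self, name, vec):
        self.vectors[name] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(name)

    def query(self, vec, k=1):
        # Collect candidates that share a bucket in any table, then rank exactly.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, vec), []))
        ranked = sorted(
            candidates, key=lambda n: cosine(vec, self.vectors[n]), reverse=True
        )
        return ranked[:k]
```

Vectors that point in similar directions tend to fall into the same bucket, so only a small candidate set needs exact comparison. Using several tables trades memory for recall. The search is approximate, so a true neighbor can be missed when it shares no bucket with the query.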
Developing disease-USA network with weighted RWR score for protein pathogenic prediction
2016/08/01 - 2017/07/31
Funding: NTD 423,000
Structure-Oriented Distance Motif for Protein Functions Prediction and Annotation
2012/08/01 - 2013/07/31
Funding: NTD 528,000
Networks of Protein Structural Units in Pharmaceutical Applications
2012/01/01 - 2012/07/31
Funding: NTD 500,000
Integrated Image Recognition and Analysis of Ant Growth: An Ant Ecological Diary Service Platform (整合影像辨識分析螞蟻成長過程-螞蟻生態日記服務平台)
2019/10/01 - 2020/07/31
Funding: NTD 500,000
In cooperation with Empire of Ants (螞蟻帝國)
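A Study on Using Large Language Models to Improve the Accuracy and Efficiency of Medical Record Classification: Training a Medical Large Language Model for Application to a Medical Review Platform
2024/07/01 - 2025/02/28
Student: 黎芷妤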
Using SA, we define new local structural fragments called units of structural alphabet (USAs) that represent unique features of protein structures. Each USA is composed of two secondary structure elements and the loop located between them. USAs can therefore capture both the flexibility of variable loops and the stability of secondary structures. We conduct a similarity search and investigate the network formed by all-against-all USA sequence comparisons, where nodes represent USAs and links represent homology relationships.
Our findings show a highly uneven degree distribution, with a few highly connected USAs (hubs) coexisting with many nodes that have only a few links. Networks with such a power-law degree distribution are scale-free. These findings not only suggest the existence of organizing principles for local protein structures but also allow us to identify key fragments that are potentially useful for new drug development and design. Of particular interest is the identification of USAs in the set of known drug protein targets.
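The network analysis described above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not the original analysis pipeline: the function names, the USA identifiers, and the toy edge list are hypothetical, and the least-squares fit on a log-log scale is only a crude indicator of power-law behavior (more rigorous fits use maximum-likelihood estimation).

```python
from collections import Counter
from math import log


def node_degrees(edges):
    """Degree of each node in an undirected network, ignoring duplicates and self-links."""
    unique = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    return degree


def degree_distribution(degree):
    """Map each degree k to the number of nodes that have exactly k links."""
    return Counter(degree.values())


def hubs(degree, min_degree):
    """Nodes whose degree is at least min_degree (the threshold is an assumption)."""
    return sorted(node for node, k in degree.items() if k >= min_degree)


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(k); negative for heavy-tailed networks."""
    points = [(log(k), log(n)) for k, n in distribution.items() if k > 0 and n > 0]
    if len({x for x, _ in points}) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Toy homology links between hypothetical USAs: one hub (U1) and several sparse nodes.
toy_edges = [("U1", "U2"), ("U1", "U3"), ("U1", "U4"), ("U1", "U5"),
             ("U1", "U6"), ("U2", "U3"), ("U3", "U2")]
toy_degree = node_degrees(toy_edges)
toy_distribution = degree_distribution(toy_degree)
```

In a real analysis, the edge list would come from the all-against-all USA comparison, with a link drawn whenever two USAs pass the chosen homology threshold.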
We developed a structural alphabet (SA) derived from the kappa-alpha (κ, α) plot, together with a novel BLOSUM-like substitution matrix, the structural alphabet substitution matrix (SASM), for searching structural alphabet databases. The SA underpins 3D-BLAST, a fast structure database search method that is as fast as BLAST and reports the statistical significance (E-value) of each alignment, indicating the reliability of a hit protein structure. We also developed an automated server, fastSCOP, which integrates the fast structure database search tool (3D-BLAST) with a detailed structural comparison tool and recognizes the SCOP domains and SCOP superfamilies of query structures.
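To make "BLOSUM-like" concrete, the sketch below computes log-odds substitution scores from counted pairs of aligned letters, following the general BLOSUM recipe (observed pair frequency divided by the frequency expected from background letter frequencies, on a log2 scale). It is only an assumption that the structural alphabet substitution matrix follows this generic recipe in detail. The function name log_odds_matrix, the two-letter alphabet, and the toy counts are hypothetical.

```python
from collections import Counter
from math import log2


def log_odds_matrix(aligned_pairs, scale=2.0):
    """BLOSUM-style scores: round(scale * log2(observed / expected)) for each letter pair."""
    # Count unordered pairs, so (a, b) and (b, a) are the same observation.
    pair_counts = Counter(tuple(sorted(pair)) for pair in aligned_pairs)
    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each letter, derived from the pair frequencies.
    background = Counter()
    for (a, b), freq in observed.items():
        if a == b:
            background[a] += freq
        else:
            background[a] += freq / 2
            background[b] += freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        # Mismatched pairs can occur in two orders, hence the factor of 2.
        expected = background[a] * background[b] * (1 if a == b else 2)
        score = round(scale * log2(freq / expected))
        scores[(a, b)] = score
        scores[(b, a)] = score
    return scores


# Toy aligned structural-alphabet letters (hypothetical data).
toy_pairs = [("A", "A")] * 6 + [("B", "B")] * 2 + [("A", "B")] * 2
toy_scores = log_odds_matrix(toy_pairs)
```

Pairs that are aligned more often than chance predicts receive positive scores, and pairs aligned less often receive negative scores. This property lets a BLAST-style search over structural alphabet sequences reward structurally conserved substitutions.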