Some of the ongoing projects in our lab include:
· Network analysis of biomedical associations in public knowledge bases: A huge number of associations among biomedical entities are scattered across public resources such as the biomedical literature, electronic health records, and public health surveillance systems. Systematic analysis of such heterogeneous data gives biomedical scientists unprecedented opportunities to infer novel associations among biological entities in the context of precision medicine and translational research. However, querying these databases directly is computationally challenging, because the associations among biomedical entities are complex yet sparse. We have developed a series of network-based computational applications to bridge the gap between the knowledge needs of translational researchers and existing knowledge discovery strategies, including 1) identification of novel disease-drug-gene associations from literature knowledge (Pub #1); 2) comparison of vaccine-adverse event associations in biomedical articles and the Vaccine Ontology (Pub #2); and 3) identification of sex-specific vaccine-adverse event associations in the Vaccine Adverse Event Reporting System (Pub #3).
· Integrative analysis of next-generation sequencing data: Next-generation sequencing technology offers the promise of scientific discovery but also poses the challenge of interpreting the results. Integrative bioinformatics solutions that address this challenge and facilitate the filtering and interpretation of human sequence variation data are beginning to emerge. In the past few years, we have led or participated in the development of numerous next-generation sequencing data analysis pipelines, such as the Targeted RE-sequencing Annotation Tool (TREAT) (Pub #4), the Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) (Pub #5), and Model-based Analysis of ChIP-exo (MACE) (Pub #6). Focusing on long non-coding RNAs (lncRNAs), we have developed an integrative bioinformatics approach to systematically identify and characterize numerous known and novel lncRNAs related to cardiac mechanisms (Pub #7) and melanoma progression (Pub #8).
· Identification of disease-specific networks: Genetic methods have uncovered thousands of complex, tissue-specific, mutation-induced effects and identified multiple disease gene targets. Important associations between diseases and other biological entities, however, are usually scattered across biomedical publications. Systematic analysis of these disease-specific associations can help reveal hidden associations between different diseases and related genes or drugs. Recently, we have developed novel network-based computational strategies to identify (1) statistically over-represented subnetwork patterns, called network motifs, in an integrated drug–disease–gene association network (Pub #1), and (2) latent disease-gene associations from PubMed articles (Pub #9).
· Identification of vaccine adverse events from social media data: Social media-based detection systems for vaccine adverse event (AE) surveillance have attracted increasing interest and have demonstrated the ability to capture timely and prevalent disease information. Despite these advantages, social media-based AE detection faces serious challenges, such as labor-intensive labeling and class imbalance in the training data. To address the challenges of both traditional reporting systems and social media, we have developed a combinatorial classification approach that integrates Twitter data with information from the Vaccine Adverse Event Reporting System (VAERS) to identify potential AEs after influenza vaccination. We demonstrate that formal reports improve AE detection performance when the amount of social media data is small (Pub #10).
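To make the notion of a "statistically over-represented" network motif from the disease-specific network project more concrete, here is a minimal, generic sketch. It is not the lab's published method. It assumes one candidate motif (a drug-disease-gene triangle in which all three pairs are linked) and assumes a degree-preserving edge-swap null model. The function names (`count_triangles`, `swap_edges`, `motif_z_score`) and the toy entities are hypothetical. The observed motif count is compared with counts from randomized networks, and a large positive z-score suggests over-representation.

```python
import random
import statistics


def count_triangles(drug_disease, drug_gene, disease_gene):
    """Count (drug, disease, gene) triples in which all three pairs are linked."""
    genes_of_drug, genes_of_disease = {}, {}
    for drug, gene in drug_gene:
        genes_of_drug.setdefault(drug, set()).add(gene)
    for disease, gene in disease_gene:
        genes_of_disease.setdefault(disease, set()).add(gene)
    # For each drug-disease edge, every gene linked to both endpoints closes a triangle.
    return sum(
        len(genes_of_drug.get(drug, set()) & genes_of_disease.get(disease, set()))
        for drug, disease in drug_disease
    )


def swap_edges(edges, n_swaps, rng):
    """Randomize a bipartite edge set while preserving every node's degree."""
    edges = sorted(edges)  # sorted so that a seeded rng gives reproducible results
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a1, b1), (a2, b2) = edges[i], edges[j]
        new1, new2 = (a1, b2), (a2, b1)
        # Skip swaps that would leave the edges unchanged or create a duplicate edge.
        if a1 == a2 or b1 == b2 or new1 in present or new2 in present:
            continue
        present.difference_update({edges[i], edges[j]})
        present.update({new1, new2})
        edges[i], edges[j] = new1, new2
    return set(edges)


def motif_z_score(drug_disease, drug_gene, disease_gene, n_random=200, seed=0):
    """Compare the observed triangle count with degree-preserving random networks."""
    rng = random.Random(seed)
    observed = count_triangles(drug_disease, drug_gene, disease_gene)
    null_counts = []
    for _ in range(n_random):
        # Shuffle each edge type separately (about 10 swap attempts per edge).
        randomized = [
            swap_edges(e, 10 * len(e), rng)
            for e in (drug_disease, drug_gene, disease_gene)
        ]
        null_counts.append(count_triangles(*randomized))
    mean = statistics.mean(null_counts)
    sd = statistics.pstdev(null_counts)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, z


if __name__ == "__main__":
    # Hypothetical toy network; entity names are placeholders, not real data.
    drug_disease = {("D1", "S1"), ("D2", "S2"), ("D3", "S3"), ("D1", "S2")}
    drug_gene = {("D1", "G1"), ("D2", "G2"), ("D3", "G3"), ("D3", "G1")}
    disease_gene = {("S1", "G1"), ("S2", "G2"), ("S3", "G3"), ("S2", "G3")}
    observed, null_mean, z = motif_z_score(drug_disease, drug_gene, disease_gene)
    print(f"observed={observed}, null mean={null_mean:.2f}, z={z:.2f}")
```

A real motif analysis would typically enumerate many subgraph types and use far larger networks and more randomizations. The single triangle motif here only keeps the example short.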