With the advent of Next Generation Sequencing (NGS) and the growing volume of large-scale data derived from these technologies (omics data), there is an increasing need for bioinformatics tools designed specifically for omics data analysis. My research focuses on tools and pipelines for omics data analytics.
Specific research interests and directions in this area include:
The improvement of classification algorithms for the successful analysis of transcriptomic data (e.g. miRNA-Seq, RNA-Seq, single-cell RNA-Seq) using systems bioinformatics approaches.
The implementation of machine-learning algorithms to facilitate the diagnosis of complex diseases.
The analysis of whole exome/whole genome next generation sequencing data for the characterization and classification of genetic variation in disease, with emphasis on variants of unknown significance (VUS).
Microbiome data analysis, with emphasis on antimicrobial peptide prediction and bacterial competition.
Genomics of SARS-CoV-2.
Additional interests include the investigation of approaches for linking scRNA-seq to drug repurposing data.
My research interests also include the development and application of bioinformatics tools and software for the analysis of bacterial diversity using genomic and 16S metagenomic data. I make use of established pipelines and also modify and set up new procedures and methodologies for the analysis of omics data derived from environmental sampling.
In addition, I analyze data derived from full shotgun metagenomics, currently using data management web applications such as IMG (hosted at the Joint Genome Institute, JGI) and MG-RAST to obtain taxonomic and functional information on microbial communities inhabiting specific environments.
Early research interests focused on the improvement of classification algorithms for the analysis of microarray expression data obtained from cases with varying pathological conditions or different grades of cancer. I have implemented machine-learning algorithms, such as Support Vector Machines and Artificial Neural Networks, using programming languages such as R and Java. The aim of such applications is to aid in the diagnosis of complex genetic diseases such as cancer by providing supplementary information to complement the classical clinical, histopathological and genetic information available today.
During my PhD I also focused on the field of microRNAs (miRNAs). MiRNAs have been shown to mediate a form of gene regulation in which small fragments (~22 nt) of RNA hybridize very specifically to the 3' UTR of target gene transcripts and either target the transcript for degradation or inhibit translation by disrupting the normal binding of the translational machinery. My work in this field entailed the computational prediction of novel miRNA genes residing within cancer-associated genomic regions (CAGRs). I used a machine learning algorithm based on probabilistic Hidden Markov Models to construct a model from known miRNAs, which was then used to scan CAGRs for the prediction of novel miRNAs.
The prediction tool (SSCprofiler) is available as a graphical user interface as well as a web service.
Other work along this line involves the improvement of miRNA target prediction tools. More specifically, I have developed a novel computational methodology (Targetprofiler) for the prediction of miRNA gene targets based on Profile Hidden Markov Models. Target predictions using this improved methodology have been performed for novel miRNA gene candidates predicted by SSCprofiler, and a biologically relevant target has been experimentally verified.
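As a rough illustration of the general idea of scanning a genomic region with a Hidden Markov Model, the sketch below scores sliding windows with the forward algorithm and reports windows whose log-odds against a uniform background exceed a threshold. This is not the SSCprofiler or Targetprofiler implementation: the two states, every probability, the window size, the threshold and all function names are hypothetical values chosen only to make the example runnable. In practice the model parameters would be estimated from known miRNAs.

```python
import math

# Toy two-state HMM over DNA. States and probabilities are illustrative
# assumptions, not parameters trained on real miRNA data.
STATES = ("stem", "loop")
START = {"stem": 0.5, "loop": 0.5}
TRANS = {
    "stem": {"stem": 0.9, "loop": 0.1},
    "loop": {"stem": 0.2, "loop": 0.8},
}
EMIT = {
    "stem": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "loop": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}
BACKGROUND = 0.25  # uniform null model: each base equally likely


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq):
    """Log P(seq | model), summed over all state paths (forward algorithm)."""
    alpha = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    for base in seq[1:]:
        alpha = {
            s: math.log(EMIT[s][base])
            + log_sum_exp([alpha[p] + math.log(TRANS[p][s]) for p in STATES])
            for s in STATES
        }
    return log_sum_exp(list(alpha.values()))


def log_odds(seq):
    """Log-likelihood ratio of the HMM versus the uniform background."""
    return forward_log_likelihood(seq) - len(seq) * math.log(BACKGROUND)


def scan(region, window=8, step=1, threshold=0.0):
    """Return (start, score) for every window scoring above the threshold."""
    hits = []
    for start in range(0, len(region) - window + 1, step):
        score = log_odds(region[start:start + window])
        if score > threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    for start, score in scan("ACGTGCGCGCGCATAT"):
        print(start, round(score, 3))
```

Working in log space avoids numerical underflow on longer windows, and comparing against a background model makes scores comparable across windows of equal length.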