Our research is focused mainly on the computational analysis of biological sequences (which include DNA, RNA and proteins). Our main research interests include prediction of protein structure and function from the primary sequence, development of stochastic models for analyzing biological sequences, large-scale genome analyses, development of methodology for genetic association and gene expression studies and analysis of biological systems.
Protein secondary and tertiary structure is largely determined by the protein's primary sequence. Thus, one of the earliest goals of computational biology was to develop methods for predicting the secondary and tertiary structure of proteins from the information contained in their primary sequence. For some classes of proteins, such as the transmembrane proteins, prediction is even more important, since these proteins are difficult to study by experimental means (e.g. X-ray crystallography). One of the main research interests of our lab is the development of computational methods for predicting the structure and function of membrane proteins. In particular, we are involved in:
Prediction of alpha-helical membrane protein topology
Prediction and discrimination of transmembrane beta-barrels (bacterial outer membrane proteins)
Functional classification of G-protein Coupled Receptors (GPCRs)
Incorporation of experimentally derived information in the prediction methods
Prediction of subcellular location of proteins
Prediction of N-terminal signal peptides and other sorting signals in protein sequences
Applications in genome-wide analyses
Machine learning comprises a large class of algorithms and computational techniques that recognize complex patterns and make decisions by learning from large amounts of (usually labeled) data. Computational biology and bioinformatics make extensive use of machine learning algorithms because of the complexity of the biological systems under study. We mainly study one particular class of machine learning algorithms, the Hidden Markov Models, as well as other related Markovian models, for applications to biological data. Markovian models are well suited to analyzing biological sequences because they capture the sequential nature of such data. In particular, we are involved in:
Development of maximum likelihood and conditional maximum likelihood training algorithms for Hidden Markov Models
Development of decoding (recognition) algorithms for Hidden Markov Models
Development of maximum likelihood parameter estimation algorithms for other classes of Markovian models, especially for higher-order Markov chain models
Development of hybrid methods (e.g. hybrids of Hidden Markov Models and Neural Networks)
Development of semi-supervised training algorithms (i.e. utilizing both labeled and unlabeled data)
Applications in various problems of analyzing biological sequences (DNA, RNA and proteins)
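As an illustration of what a decoding (recognition) algorithm for a Hidden Markov Model does, the following is a minimal sketch of the standard Viterbi algorithm in Python. It is not the lab's own method or software. The two-state model (M for membrane, O for non-membrane), the reduced two-letter alphabet (h for hydrophobic, p for polar) and all probabilities are hypothetical values chosen only for this example.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path and its log probability."""
    # best[t][s]: log probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for symbol in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to recover the full path
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(table):
    """Convert a dictionary of probabilities to log space."""
    return {key: math.log(value) for key, value in table.items()}


# Hypothetical toy model (all numbers are assumptions made for this sketch)
states = ["M", "O"]
log_start = logs({"M": 0.5, "O": 0.5})
log_trans = {"M": logs({"M": 0.9, "O": 0.1}), "O": logs({"M": 0.1, "O": 0.9})}
log_emit = {"M": logs({"h": 0.8, "p": 0.2}), "O": logs({"h": 0.2, "p": 0.8})}

path, log_prob = viterbi("pphhhhpp", states, log_start, log_trans, log_emit)
print("".join(path), log_prob)
```

Working in log space avoids numerical underflow on long sequences. For this toy input the central hydrophobic run is labeled M and the flanking polar residues are labeled O.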
The rapidly developing field of genetic epidemiology, a fusion of traditional genetics and epidemiology, studies the genetic components of disease as well as the joint effects of genetic factors and environmental determinants in large populations. Whereas traditional genetic studies (e.g. linkage studies, segregation studies) are usually used to identify the major determinants of monogenic diseases, which are caused by rare variants, modern genetic-association studies aim to decipher the role played by a large number of common genetic variants in the development of common (multifactorial) diseases (e.g. diabetes, heart disease, cancer). We are involved in both applied and methodological research in this area, with a particular focus on the meta-analysis of genetic-association studies and especially of genome-wide association studies (GWAS). The continuously increasing number of published genetic association studies has made it imperative to collect and synthesize the available information on a particular gene-disease association and to provide a quantitative overall estimate, a procedure known as meta-analysis. In particular, we are involved in:
Developing methodology for meta-analysis of genetic-association studies and GWAS
Software development for meta-analysis
Multivariate meta-analysis
Applications in common diseases such as diabetes, hypertension, heart disease and auto-immune diseases
Gene expression analysis and meta-analysis of microarray and RNA-seq data
Haplotype analysis
Methodological issues in genetic association studies (Hardy-Weinberg equilibrium, Linkage Disequilibrium etc.)
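To make the idea of a "quantitative overall estimate" concrete, the following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis of log odds ratios in Python, together with Cochran's Q statistic for heterogeneity. It shows one textbook approach and is not the lab's own methodology or software. The function name and the three study estimates are hypothetical values invented for the example.

```python
import math


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Pool per-study log odds ratios using inverse-variance weights.

    Returns the pooled log odds ratio, its standard error and Cochran's Q.
    """
    # Each study is weighted by the inverse of its sampling variance
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_odds_ratios)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of the studies from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_odds_ratios))
    return pooled, pooled_se, q


# Hypothetical per-study estimates (assumed values, not real data)
study_log_or = [0.2, 0.3, 0.1]
study_se = [0.1, 0.2, 0.1]

pooled, pooled_se, q = fixed_effect_meta(study_log_or, study_se)

# 95% confidence interval, reported on the odds-ratio scale
lower = math.exp(pooled - 1.96 * pooled_se)
upper = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled OR = {math.exp(pooled):.3f} (95% CI {lower:.3f} to {upper:.3f}), Q = {q:.3f}")
```

Under the null hypothesis of homogeneity, Q approximately follows a chi-square distribution with k - 1 degrees of freedom for k studies. When heterogeneity is substantial, a random-effects model is usually preferred over the fixed-effect model sketched here.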