Melanomics: studying the genetic architecture of a skin cancer (2014-2015)
Melanoma is the most dangerous form of skin cancer. It develops from the pigment-containing cells of the skin (melanocytes) and is usually caused by exposure to ultraviolet (UV) light from the sun. Investigating the genes and genetic mechanisms that drive the progression of the cancer may improve our approaches to prevention and cure, for example by identifying novel therapeutic drug targets. We analyse DNA and RNA sequencing data from matched normal-tumor pairs to:
1. Detect somatic mutations and changes in RNA expression
2. Interpret the effect of somatic mutations on the development of cancer
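As a rough illustration of step 1, the sketch below flags candidate somatic SNVs as tumor variants that are absent from the matched normal sample and have enough supporting reads. It is a hypothetical example, not our pipeline. The `Variant` record, `candidate_somatic` and the `min_alt_reads`/`min_vaf` thresholds are all assumptions. Real somatic callers model sequencing error, tumor purity and normal contamination statistically rather than taking a simple set difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_reads: int = 0  # reads supporting the alternative allele
    depth: int = 0      # total read depth at the site

    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)


def candidate_somatic(tumor, normal, min_alt_reads=5, min_vaf=0.1):
    """Keep tumor variants absent from the normal sample that pass read-support thresholds."""
    normal_keys = {v.key() for v in normal}
    candidates = []
    for v in tumor:
        if v.key() in normal_keys:
            continue  # also seen in the normal sample: likely germline
        if v.depth == 0:
            continue  # no coverage, cannot estimate the allele fraction
        vaf = v.alt_reads / v.depth  # variant allele fraction in the tumor
        if v.alt_reads >= min_alt_reads and vaf >= min_vaf:
            candidates.append(v)
    return candidates


# Toy data with made-up coordinates
tumor = [
    Variant("chr7", 1500, "A", "T", alt_reads=30, depth=100),  # tumor only, well supported
    Variant("chr1", 1000, "G", "C", alt_reads=50, depth=100),  # also present in normal
    Variant("chr2", 500, "C", "T", alt_reads=2, depth=100),    # too few supporting reads
]
normal = [Variant("chr1", 1000, "G", "C", alt_reads=48, depth=95)]

somatic = candidate_somatic(tumor, normal)
print([v.key() for v in somatic])  # [('chr7', 1500, 'A', 'T')]
```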
Systems genomics of a neuroblastoma cell line (2012-2014)
See http://systemsbiology.uni.lu/shsy5y for the resources we produced for the SH-SY5Y neuroblastoma cell line (published in BMC Genomics).
Whole-genome and exome analysis of families with epilepsy
Epilepsy is a common, serious neurological disorder affecting six million people in Europe. As part of a European Science Foundation grant project, I am developing a bioinformatic workflow to compare genetic variations across individuals in a family. The main goal is to find candidate genetic variations that are likely to cause the disease. Comparing variations across individuals is difficult for two reasons: the human genome contains several kinds of variation (SNVs, indels, copy number variations and structural variations), and each kind is measured with its own degree of uncertainty. For example, the exact locations of large structural variations in the genome are not known, whereas indels in repetitive regions may be reported ambiguously.
Once we have identified candidate variations for epilepsy, our collaborators may validate them in separate studies.
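One simple way to compare calls across family members despite this positional uncertainty is to match variants of the same type within a type-specific distance tolerance. You then keep the calls that are shared by all affected members and absent from unaffected ones, which corresponds to a dominant inheritance model. The sketch below is a hypothetical illustration of that idea, not our workflow. The `Call` record, the `TOLERANCE` values and the dominant-model filter are all assumptions. A real pipeline would also normalise (left-align) indels against the reference before comparing them, rather than relying only on a distance window.

```python
from dataclasses import dataclass

# Hypothetical matching tolerances in base pairs: exact for SNVs,
# looser for variant types whose positions are less certain.
TOLERANCE = {"SNV": 0, "INDEL": 10, "CNV": 1000, "SV": 500}


@dataclass(frozen=True)
class Call:
    kind: str         # "SNV", "INDEL", "CNV" or "SV"
    chrom: str
    start: int
    end: int
    allele: str = ""  # alternative allele, compared only for SNVs


def same_variant(a, b):
    """Same type and chromosome, both ends within the type's tolerance (and same allele for SNVs)."""
    if a.kind != b.kind or a.chrom != b.chrom:
        return False
    tol = TOLERANCE[a.kind]
    if abs(a.start - b.start) > tol or abs(a.end - b.end) > tol:
        return False
    return a.kind != "SNV" or a.allele == b.allele


def in_member(call, member_calls):
    return any(same_variant(call, c) for c in member_calls)


def segregating(affected, unaffected):
    """Calls of the first affected member found in every affected member and in no unaffected member."""
    first, *others = affected
    return [
        call for call in first
        if all(in_member(call, m) for m in others)
        and not any(in_member(call, m) for m in unaffected)
    ]


child = [Call("SV", "chr2", 1000, 5000), Call("SNV", "chr5", 300, 300, "T")]
affected_parent = [Call("SV", "chr2", 1200, 5300), Call("SNV", "chr5", 300, 300, "T")]
unaffected_parent = [Call("SNV", "chr5", 300, 300, "T")]

print(segregating([child, affected_parent], [unaffected_parent]))
# [Call(kind='SV', chrom='chr2', start=1000, end=5000, allele='')]
```

Here the structural variation is matched between child and affected parent even though its reported breakpoints differ by a few hundred base pairs, while the SNV is discarded because the unaffected parent also carries it.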
Gene regulation as a Boolean network (2010-2012)
Microarray studies fail to detect the differential expression of many genes because of low statistical power, even though previous literature shows that some of these genes take part in the regulatory process. We present a method that integrates prior knowledge about the gene regulatory network with context-specific microarray results to predict differential expression of genes that the microarray analysis missed (published in Nucleic Acids Research).
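To give a flavour of how a Boolean network can propagate information, the sketch below clamps genes whose differential expression status was observed and repeatedly applies a simple update rule to the rest of a signed regulatory network until the states stop changing. This is a generic, hypothetical illustration, not the published method. The update rule (a gene is ON if at least one activator is ON and no inhibitor is ON), the encoding of ON as "differentially expressed" and the `propagate` function are all assumptions.

```python
def propagate(network, observed, max_steps=100):
    """Synchronously update a Boolean network with observed genes held fixed.

    network:  gene -> (activators, inhibitors), taken from prior knowledge
    observed: gene -> bool, True if the gene was found differentially expressed
    """
    genes = set(network) | set(observed)
    for activators, inhibitors in network.values():
        genes.update(activators, inhibitors)

    # Unobserved genes start OFF (not differentially expressed).
    state = {g: observed.get(g, False) for g in genes}
    for _ in range(max_steps):
        new_state = {}
        for g in genes:
            if g in observed or g not in network:
                new_state[g] = state[g]  # clamped, or no known regulators
                continue
            activators, inhibitors = network[g]
            new_state[g] = (any(state[a] for a in activators)
                            and not any(state[i] for i in inhibitors))
        if new_state == state:
            return state  # fixed point reached
        state = new_state
    return state  # may not have converged, e.g. an oscillating cycle


network = {
    "B": (["A"], []),
    "C": (["B"], ["D"]),
    "E": (["B"], []),
}
observed = {"A": True, "D": False}
result = propagate(network, observed)
print(sorted(g for g, on in result.items() if on and g not in observed))  # ['B', 'C', 'E']
```

In this toy network, the observed change in gene A is propagated to B, C and E, which would be reported as predicted differentially expressed genes; clamping the inhibitor D to ON instead would keep C OFF.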
Genome analysis workflow (2012-present)
Several platforms exist for sequencing the human genome, such as Complete Genomics, Illumina, ABI and Pacific Biosciences SMRT, and new ones keep emerging. To take full advantage of them, we need to assess the strengths and weaknesses of each platform and find a way to integrate information across platforms. We are developing a bioinformatic workflow that visualises the differences between sequencing platforms and creates a combined genome sequence. The current implementation considers variants from Complete Genomics (CG) and Illumina (IL).
Left: percentage of variants by genomic location. Concordant SNPs (partial and full) are more likely to lie within genes than platform-specific variants, which are slightly biased towards intergenic regions. Middle: distribution of SNP quality values reported by Complete Genomics (CG). Right: distribution of SNP quality values reported by Illumina (IL). A quality threshold of 40 on both CG and IL SNPs appears to filter out the majority of platform-specific SNPs.
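The concordance classes and the quality threshold described in the figure can be sketched as follows. This is a hypothetical illustration, not the workflow itself. The input layout (a dictionary from `(chromosome, position)` to `(genotype, quality)`), the function names and the definitions of "full" concordance (same site, same genotype) and "partial" concordance (same site, different genotype) are all assumptions. The threshold of 40 comes from the figure, but it is applied to each platform's calls separately, since each platform reports quality on its own scale.

```python
from collections import Counter


def classify(cg, il):
    """Label each SNP site by concordance between Complete Genomics (cg) and Illumina (il) calls.

    cg, il: dict mapping (chrom, pos) -> (genotype, quality)
    """
    labels = {}
    for site in cg.keys() | il.keys():
        if site in cg and site in il:
            same_genotype = cg[site][0] == il[site][0]
            labels[site] = "full" if same_genotype else "partial"
        elif site in cg:
            labels[site] = "CG-only"
        else:
            labels[site] = "IL-only"
    return labels


def quality_filter(calls, min_qual=40):
    """Drop calls below the platform-specific quality threshold."""
    return {site: call for site, call in calls.items() if call[1] >= min_qual}


# Toy calls with made-up positions and quality values
cg = {("chr1", 100): ("0/1", 120), ("chr1", 200): ("1/1", 15), ("chr2", 50): ("0/1", 90)}
il = {("chr1", 100): ("0/1", 99), ("chr2", 50): ("1/1", 60), ("chr3", 10): ("0/1", 30)}

before = Counter(classify(cg, il).values())
after = Counter(classify(quality_filter(cg), quality_filter(il)).values())
print(sorted(before.items()))  # [('CG-only', 1), ('IL-only', 1), ('full', 1), ('partial', 1)]
print(sorted(after.items()))   # [('full', 1), ('partial', 1)]
```

In this toy example the two low-quality platform-specific calls are removed while the concordant sites remain. On real data, a fixed threshold will also discard some true variants, so the cut-off is a trade-off rather than a clean separation.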