
Programs

Predicting double-strand DNA breaks using epigenome marks or DNA at kilobase resolution

We devised a computational approach to predict double-strand breaks (DSBs) from the epigenomic and chromatin context, as well as from DNA motifs.

The R code and data are available in the GitHub repository.


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Uncovering direct and indirect molecular determinants of chromatin loops using a computational integrative approach

We proposed a generalized linear regression model with interaction terms to dissect the roles of genomic features (insulator binding proteins, cofactors, motifs and promoters) in establishing or maintaining 3D organization, as measured by long-range contacts (PLOS Computational Biology).
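To make the role of an interaction term concrete, here is a minimal sketch in Python (it is not the HiCglmi R code). It assumes a count model with a log link, which the description above does not specify, and the feature names and coefficient values are hypothetical.

```python
import math

def expected_contacts(beta, x_a, x_b):
    """Expected contact count under a log-linear model with one interaction term.

    beta = (intercept, effect of feature A, effect of feature B, interaction A:B).
    All values are hypothetical; the log link is an assumption for illustration.
    """
    b0, b_a, b_b, b_ab = beta
    linear_predictor = b0 + b_a * x_a + b_b * x_b + b_ab * x_a * x_b
    return math.exp(linear_predictor)

beta = (1.0, 0.2, 0.1, 0.7)
both = expected_contacts(beta, 1, 1)
only_a = expected_contacts(beta, 1, 0)
only_b = expected_contacts(beta, 0, 1)
neither = expected_contacts(beta, 0, 0)

# Joint fold change divided by the product of the two separate fold changes.
# Under this model the ratio equals exp(b_ab), the interaction effect.
interaction_fold = (both / neither) / ((only_a / neither) * (only_b / neither))
```

In this sketch, a positive interaction coefficient means the two features together are associated with more contacts than their separate effects would predict.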

The model is implemented in the R package "HiCglmi", available at the bottom of this page (file HiCglmi_1.0.tar.gz). Here is the HiCglmi R package manual.


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Computational identification of genomic features that influence 3D chromatin domain formation

We proposed a multiple logistic regression model to measure the influence of genomic features, such as DNA-binding proteins and functional elements, on 3D domain borders (PLOS Computational Biology). This model accounts for both colocalization and statistical interaction among multiple genomic features.
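As an illustration of a logistic regression with a statistical interaction, the sketch below fits two binary features and their product by plain gradient ascent. It is written in Python for brevity and is not the HiCfeat implementation; the feature labels, the counts and the optimizer settings are all hypothetical.

```python
import math

# Hypothetical counts per feature combination: (has_A, has_B, n_bins, n_borders)
cells = [
    (0, 0, 100, 10),
    (1, 0, 100, 20),
    (0, 1, 100, 20),
    (1, 1, 100, 60),
]

def design(a, b):
    # Intercept, two main effects, and the A:B interaction term
    return (1.0, float(a), float(b), float(a * b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(cells, steps=20000, learning_rate=1.0):
    """Maximize the mean Bernoulli log-likelihood by plain gradient ascent."""
    total = sum(n for _, _, n, _ in cells)
    beta = [0.0, 0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0, 0.0]
        for a, b, n, k in cells:
            x = design(a, b)
            p = sigmoid(sum(bj * xj for bj, xj in zip(beta, x)))
            # k borders observed versus n * p expected in this cell
            for j in range(4):
                grad[j] += (k - n * p) * x[j] / total
        beta = [bj + learning_rate * gj for bj, gj in zip(beta, grad)]
    return beta

beta = fit_logistic(cells)
odds_ratio_interaction = math.exp(beta[3])
```

With these made-up counts, bins carrying both features are borders more often than the two main effects alone would suggest, so the fitted interaction odds ratio is greater than 1.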

The model is implemented in the R package "HiCfeat" available at the bottom of this page (file HiCfeat_1.0.tar.gz). Here is the HiCfeat R package manual.


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Identification of late drug responsive genes from time-series gene expression data

We proposed a new model, the dynamic time order network, to distinguish and connect early and late drug-responsive gene targets (BMC Systems Biology).

This network is constructed from an integrated differential equation. Spline regression is applied to accurately model the variation of gene expression over time. A likelihood ratio test is then used to infer the time order of each pair of genes. One application of the model is the discovery of estrogen response biomarkers. For this purpose, we focused on genes that respond late when breast cancer cells are treated with estradiol (E2).
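For readers unfamiliar with likelihood ratio tests, the following Python sketch shows the generic computation for two nested models that differ by a single parameter. It is not taken from the DTON scripts: the single degree of freedom and the log-likelihood values are assumptions made for illustration.

```python
import math

def likelihood_ratio_test(loglik_null, loglik_alt):
    """Return (statistic, p-value) for nested models differing by one parameter.

    Uses the chi-square distribution with 1 degree of freedom, whose
    survival function is erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_alt - loglik_null)
    statistic = max(statistic, 0.0)  # guard against tiny negative values from rounding
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log-likelihoods for one gene pair
stat, p = likelihood_ratio_test(loglik_null=-52.3, loglik_alt=-48.1)
```

A small p-value indicates that the richer model explains the data significantly better than the restricted one.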

The R scripts and data are available in the rar archive code_scriptR_DTON.rar.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Modeling and visualization of linkage disequilibrium using latent Bayesian networks


We proposed a new method called CFHLC+ to learn forests of hierarchical latent class models (FHLCMs) for modeling and visualization of linkage disequilibrium using Bayesian networks with latent variables (COMPSTAT 2010, BMC Bioinformatics, PLoS ONE). CFHLC+ was developed in C++ by Raphaël Mourad using the ProBT library, which is dedicated to Bayesian networks. Several datasets are also provided in the archive datasets.rar. Another version of CFHLC that can handle any data is also available for download (CFHLC_general_use.rar).

An archive named CFHLC+_program.rar, containing the CFHLC+ program, sample data and a readme, can be downloaded at the bottom of this page. It was compiled on 32-bit Windows Vista and requires the ProBT and Boost dynamic libraries, which are included in the archive. CFHLC+ is the second version of CFHLC, an algorithm dedicated to linkage disequilibrium modeling through forests of hierarchical latent class models. The drawback of the first version is that the genotype sequence must be split into windows so that a specific model can be learned inside each window. Splitting the sequence into windows simplifies model learning by constraining the models. As a consequence, this method cannot account for dependences between contiguous windows. In the second version, not published, the sequence is not split, which ensures better modeling. Nevertheless, in this second implementation, the model is constrained by a maximal physical distance between SNPs.
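The difference between the two constraints can be sketched in a few lines of Python. This is only an illustration of which SNP pairs each constraint allows, not the CFHLC+ learning algorithm; the positions, window size and distance threshold are hypothetical.

```python
from itertools import combinations

def pairs_within_windows(n_snps, window_size):
    """SNP index pairs that fall inside the same fixed, non-overlapping window."""
    pairs = set()
    for start in range(0, n_snps, window_size):
        window = range(start, min(start + window_size, n_snps))
        pairs.update(combinations(window, 2))
    return pairs

def pairs_within_distance(positions, max_distance):
    """SNP index pairs whose physical distance does not exceed max_distance."""
    return {
        (i, j)
        for i, j in combinations(range(len(positions)), 2)
        if abs(positions[j] - positions[i]) <= max_distance
    }

# Hypothetical SNP positions in base pairs
positions = [100, 200, 300, 400, 500, 600]
windowed = pairs_within_windows(len(positions), window_size=3)
by_distance = pairs_within_distance(positions, max_distance=150)

# Neighbouring SNPs that sit on either side of a window boundary
boundary_pair = (2, 3)
```

Here `boundary_pair` is absent from `windowed` but present in `by_distance`: fixed windows drop dependences across a window boundary, whereas a distance constraint keeps them as long as the SNPs are physically close.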

CFHLC+_program.rar (6679k), Raphael Mourad, Aug 23, 2011, 2:54 AM
CFHLC_general_use.rar (2120k), Raphael Mourad, Sep 9, 2011, 9:14 AM
(unnamed file), Raphael Mourad, Sep 29, 2015, 9:03 AM
HiCfeat_1.0.tar.gz (128k), Raphael Mourad, Sep 29, 2015, 9:00 AM
(unnamed file), Raphael Mourad, Oct 17, 2016, 11:07 AM
HiCglmi_1.0.tar.gz (2519k), Raphael Mourad, Oct 17, 2016, 11:07 AM
code_scriptR_DTON.rar (3221k), Raphael Mourad, Jan 8, 2013, 6:53 AM
datasets.rar (1072k), Raphael Mourad, Oct 4, 2011, 2:42 AM