Publications

CTDquerier: a tool for enrichment analysis in genetic and toxicological studies using CTD

C Hernandez-Ferrer, JR Gonzalez

April, 2018 | Bioinformatics

Biomedical studies currently include a large number of genomic and environmental factors to study the aetiology of human diseases. The R/Bioconductor project provides several tools to perform enrichment analysis at the gene-pathway level. However, there is a need to perform similar analyses at the gene-chemical or gene-disease level to provide complementary knowledge of the causal path between factors and health. While CTD provides information on gene-disease and gene-chemical relationships, there is no software for its integration into R/Bioconductor analysis pipelines. CTDquerier helps users to easily get and visualize CTD data in the R/Bioconductor framework. The use of the package is illustrated by enhancing a real data analysis of asthma-related genes.

Circulating miRNAs, isomiRs and small RNA clusters in human plasma and breast milk

M Rubio, M Bustamante, C Hernandez-Ferrer, D Fernandez-Orth, L Pantano, Y Sarria, M Pique-Borras, K Vellve, R Carreras, X Estivill, JR Gonzalez, A Mayor

March, 2018 | PLoS ONE

Circulating small RNAs, including miRNAs but also isomiRs and other RNA species, have the potential to be used as non-invasive biomarkers for communicable and non-communicable diseases. This study aims to characterize and compare small RNA profiles in human biofluids. For this purpose, RNA was extracted from plasma and breast milk samples from 15 healthy postpartum mothers. Small RNA libraries were prepared with the NEBNext small RNA library preparation kit and sequenced on an Illumina HiSeq2000 platform. miRNAs, isomiRs and clusters of small RNAs were annotated using the seqBuster/seqCluster framework in 5 plasma and 10 milk samples that passed the initial quality control. The RNA yield was 81 ng/mL [standard deviation (SD): 41] and 3985 ng/mL (SD: 3767) for plasma and breast milk, respectively. The mean number of good quality reads was 4.04 million (M) (40.01% of the reads) in plasma and 12.5M (89.6%) in breast milk. One thousand one hundred eighty-two miRNAs, 12,084 isomiRs and 1,053 small RNA clusters that included piwi-interfering RNAs (piRNAs), tRNAs, small nucleolar RNAs (snoRNAs) and small nuclear RNAs (snRNAs) were detected. Samples grouped by biofluid, with 308 miRNAs, 1,790 isomiRs and 778 small RNA clusters differentially detected. In summary, plasma and milk showed different small RNA profiles. In both, miRNAs, piRNAs, tRNAs, snRNAs, and snoRNAs were identified, confirming the presence of non-miRNA species in plasma, and describing them for the first time in milk.

The acute effects of ultraviolet radiation on the blood transcriptome are independent of plasma 25OHD3

M Bustamante*, C Hernandez-Ferrer*, Y Sarria, GI Harrison, L Nonell, W Kang, MR Friedlander, X Estivill, JR Gonzalez, M Nieuwenhuijsen, AR Youn

September, 2017 | Environmental Research

The molecular basis of many health outcomes attributed to solar ultraviolet radiation (UVR) is unknown. We tested the hypothesis that they may originate from transcriptional changes in blood cells. This was determined by assessing the effect of fluorescent solar simulated radiation (FSSR) on the transcriptional profile of peripheral blood pre- and 6h, 24h and 48h post-exposure in nine healthy volunteers. Expression of 20 genes was down-regulated and one was up-regulated at 6h after FSSR. All recovered to baseline expression at 24h or 48h. These genes have been associated with immune regulation, cancer and blood pressure; health effects attributed to vitamin D via solar UVR exposure. Plasma vitamin D3 [25(OH)D3] levels increased over time after FSSR and were maximal at 48h. The increase was more pronounced in participants with low basal 25(OH)D3 levels. Mediation analyses suggested that changes in gene expression due to FSSR were independent of 25(OH)D3 and blood cell subpopulations.

psygenet2r: a R/Bioconductor package for the analysis of psychiatric disease genes

A Gutierrez-Sacristan*, C Hernandez-Ferrer*, JR Gonzalez, LI Furlong

August, 2017 | Bioinformatics

Psychiatric disorders have a great impact on morbidity and mortality. Genotype-phenotype resources for psychiatric diseases are key to enabling the translation of research findings into better patient care. PsyGeNET is a knowledge resource on psychiatric diseases and their genes, developed by text mining and curated by domain experts. We present psygenet2r, an R package that contains a variety of functions for leveraging the PsyGeNET database and facilitating its analysis and interpretation. The package offers different types of queries to the database along with a variety of analysis and visualization tools, including the study of the anatomical structures in which the genes are expressed and insight into the genes' molecular function. The psygenet2r package is especially suited for network medicine analysis of psychiatric disorders.

MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration

C Hernandez-Ferrer*, C Ruiz-Arenas*, A Beltran-Gomila, JR Gonzalez

January, 2017 | BMC Bioinformatics

Reduction in the cost of genomic assays has generated large amounts of biomedical-related data. As a result, current studies perform multiple experiments in the same subjects. While Bioconductor's methods and classes implemented in different packages manage individual experiments, there is no standard class to properly manage different omic datasets from the same subjects. In addition, most R/Bioconductor packages that have been designed to integrate and visualize biological data use basic data structures that lack clear general methods, such as subsetting or selecting samples. To cover this need, we have developed MultiDataSet, a new R class based on Bioconductor standards, designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and incomplete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third-party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new unimplemented data from any biological experiment.

The Pregnancy Exposome: Multiple Environmental Exposures in the INMA-Sabadell Birth Cohort

O Robinson, X Basagaña, L Agier, M de Castro, C Hernandez-Ferrer, JR Gonzalez, JO Grimalt, M Nieuwenhuijsen, J Sunyer, R Slama, M Vrijheid

July, 2015 | Environmental Science & Technology

The exposome is defined as the totality of human environmental exposures from conception onward, complementing the genome, and its holistic approach may advance understanding of disease etiology. We aimed to describe the correlation structure of the exposome during pregnancy to better understand the relationships between and within families of exposure and to develop analytical tools appropriate to exposome data. Estimates on 81 environmental exposures of current health concern were obtained for 728 women enrolled in the INMA (INfancia y Medio Ambiente) birth cohort, in Sabadell, Spain, using biomonitoring, geospatial modeling, remote sensors, and questionnaires. Pair-wise Pearson's and polychoric correlations were calculated and principal components were derived. The median absolute correlation across all exposures was 0.06 (5th-95th centiles, 0.01-0.54). There were strong levels of correlation within families of exposure (median = 0.45, 5th-95th centiles, 0.07-0.85). Nine exposures (11%) had a correlation higher than 0.5 with at least one exposure outside their exposure family. Effectively all the variance in the data set (99.5%) was explained by 40 principal components. Future exposome studies should interpret exposure effects in light of their correlations to other exposures. The weak to moderate correlation observed between exposure families will permit adjustment for confounding in future exposome studies.

affy2sv: an R package to pre-process Affymetrix CytoScan HD and 750K arrays for SNP, CNV, inversion and mosaicism calling

C Hernandez-Ferrer, I Quintela Garcia, K Danielski, A Carracedo, LA Perez-Jurado, JR Gonzalez

May, 2015 | BMC Bioinformatics

The well-known Genome-Wide Association Studies (GWAS) have led to many scientific discoveries using SNP data. Even so, they have not been able to explain the full heritability of complex diseases. Now, other structural variants such as copy number variants or DNA inversions, either germ-line or in mosaicism events, are being studied. We present the R package affy2sv to pre-process Affymetrix CytoScan HD/750K arrays (and also Genome-Wide SNP 5.0/6.0 and Axiom arrays) in structural variant studies.