Searching for the environmental causes of complex diseases

Post date: Jan 21, 2013 4:54:15 PM

Alain-Jacques Valleron

Professor of epidemiology and biostatistics at Université Pierre et Marie Curie, Paris, and member of the Académie des sciences.

The basic idea of this project is that the time has come to search for the environmental causes of diseases using data-driven approaches (http://jem.rupress.org/content/205/13/2953.long).

In 1990, the Human Genome Project was launched. In 2003, the sequencing of the first human genome was completed. In 2005, the first Genome-Wide Association Study (GWAS) was published. In a GWAS, the genotypes at a large number of SNPs (Single Nucleotide Polymorphisms) are compared between a group of “cases” and a group of “controls”. After appropriate correction for multiple testing (for example, by controlling the “False Discovery Rate”), SNPs whose three genotypes are distributed differently in cases and in controls give clues to a possible genetic cause of the disease. The strength of the association is measured by the odds ratio. By July 2012, less than a decade later, 1,250 GWAS had been published (see www.genome.gov/gwastudies), covering the entire spectrum of human diseases. This approach was methodologically completely different from the genetic research of the 1980s, which was “hypothesis driven”: it explored candidate genes in appropriately designed family studies.
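The per-SNP comparison described above can be sketched in Python. This is a minimal illustration under our own assumptions, not the pipeline of any particular GWAS: genotype counts (AA, Aa, aa) are assumed to be already tabulated for cases and controls, a Pearson chi-square test is applied to the 2 × 3 genotype table, an allelic odds ratio is reported, and the Benjamini-Hochberg procedure is used as one common way of controlling the False Discovery Rate across SNPs. The function names and counts are hypothetical.

```python
import math
from typing import List, Sequence


def genotype_chi2(cases: Sequence[int], controls: Sequence[int]) -> float:
    """Pearson chi-square statistic for the 2 x 3 table of genotype
    counts (AA, Aa, aa) in cases versus controls."""
    total = sum(cases) + sum(controls)
    stat = 0.0
    for row in (cases, controls):
        row_total = sum(row)
        for j in range(3):
            expected = row_total * (cases[j] + controls[j]) / total
            if expected > 0:  # skip genotype columns that are empty in both groups
                stat += (row[j] - expected) ** 2 / expected
    return stat


def chi2_pvalue_df2(stat: float) -> float:
    """Upper-tail p-value of a chi-square with 2 degrees of freedom,
    which has the closed form exp(-x / 2)."""
    return math.exp(-stat / 2.0)


def allelic_odds_ratio(cases: Sequence[int], controls: Sequence[int]) -> float:
    """Odds ratio of carrying allele 'a' (cases vs controls), counting
    two alleles per person: AA has 0 copies, Aa has 1, aa has 2."""
    a_cases, big_a_cases = cases[1] + 2 * cases[2], 2 * cases[0] + cases[1]
    a_ctrl, big_a_ctrl = controls[1] + 2 * controls[2], 2 * controls[0] + controls[1]
    return (a_cases * big_a_ctrl) / (big_a_cases * a_ctrl)


def benjamini_hochberg(pvalues: List[float], q: float = 0.05) -> List[bool]:
    """Flag the tests rejected when controlling the False Discovery Rate at q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that the k-th smallest p-value is <= k * q / m
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            cutoff = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff
    return rejected


# Hypothetical genotype counts (AA, Aa, aa) for one SNP
cases, controls = [200, 500, 300], [300, 500, 200]
stat = genotype_chi2(cases, controls)
print(stat, chi2_pvalue_df2(stat), allelic_odds_ratio(cases, controls))
```

With these hypothetical counts (1,000 cases and 1,000 controls), the chi-square statistic is 40 (p ≈ 2 × 10⁻⁹) and the allelic odds ratio is about 1.49. In a real scan, the p-values of all SNPs would be passed together to `benjamini_hochberg`.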

However, the overall conclusion that can be drawn from all this work is that the genetic factors identified so far explain only a very small percentage of complex, common diseases. For example, this percentage is certainly well below 30% in Type 1 Diabetes, and the same holds for most cancers, cardiovascular diseases, Alzheimer's disease, etc. Environmental causes are therefore likely to play an important role in explaining, alone or in combination with genetic (G) factors, the causality of these diseases. Moreover, the incidence of numerous diseases is changing quickly, while the genetic structure of the population remains fairly stable: Type 1 Diabetes (whose incidence has doubled over the last decades) and other immune-related diseases (e.g. asthma), malformations of the male reproductive system at birth, etc.
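To make the speed of such a change concrete, one can compute the constant annual growth rate r implied by a doubling of incidence over T years. The values of T below are assumptions chosen for illustration only; the text says only "the last decades".

```latex
\begin{aligned}
(1+r)^{T} = 2 \quad &\Longrightarrow \quad r = 2^{1/T} - 1,\\
T = 20 \text{ years}: \quad & r = 2^{1/20} - 1 \approx 0.035 \ (\approx 3.5\% \text{ per year}),\\
T = 30 \text{ years}: \quad & r = 2^{1/30} - 1 \approx 0.023 \ (\approx 2.3\% \text{ per year}).
\end{aligned}
```

A sustained increase of a few percent per year, doubling incidence within one or two generations, is hard to reconcile with a fairly stable genetic structure of the population.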

In the project we have set up over the last three years, we want to study environmental (E) factors with the same data-driven paradigms as those currently used to study genetic (G) factors. The genome is replaced by the “environnementome”, the list of values taken by all environmental factors for which data are available, regardless of their a priori plausibility in the etiology of the diseases. The data may be obtained from large questionnaires completed by volunteers (cases and controls). However, our main source of data comes from linking the patients' successive addresses (in utero, at birth, during their early school years, etc.) to a series of environmental databases: climate, physical and chemical environment, infectious diseases (those under continuous space-time surveillance), land cover (as monitored by satellites at a 250 m × 250 m resolution), sociological and demographic variables, etc. These measurements are made on “cases”, but must also be made on “controls” to allow a comparison. As always in epidemiology, selecting unbiased physical controls (real persons) is costly and difficult. We have therefore also defined “virtual controls”: locations drawn at random on the map.
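The address-to-database linkage and the virtual controls can be sketched as follows, under simplifying assumptions that are ours rather than the project's: addresses are assumed to be already geocoded to projected coordinates in metres, each environmental database is represented as a dictionary mapping a 250 m grid cell to a value, and virtual controls are drawn by picking a cell with probability proportional to a weight. Equal weights give a uniform draw over the map; population counts would be one possible alternative. All names (`cell_of`, `environmental_profile`, `draw_virtual_controls`, `forest_share`) are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

CELL_SIZE_M = 250.0  # side of a grid cell in metres (250 m x 250 m land-cover grid)

Cell = Tuple[int, int]
Point = Tuple[float, float]  # projected coordinates in metres (assumed geocoding)


def cell_of(point: Point) -> Cell:
    """Index of the grid cell that contains a point."""
    x_m, y_m = point
    return int(x_m // CELL_SIZE_M), int(y_m // CELL_SIZE_M)


def environmental_profile(points: Sequence[Point],
                          layers: Dict[str, Dict[Cell, float]]) -> List[Dict[str, float]]:
    """Read every environmental layer at each address: one row of the
    'environnementome' per address."""
    return [{name: layer[cell_of(p)] for name, layer in layers.items()} for p in points]


def draw_virtual_controls(cell_weights: Dict[Cell, float], n: int,
                          rng: random.Random) -> List[Point]:
    """Draw n virtual controls: pick a cell with probability proportional
    to its weight, then a uniformly random point inside that cell."""
    cells = list(cell_weights)
    chosen = rng.choices(cells, weights=[cell_weights[c] for c in cells], k=n)
    return [((i + rng.random()) * CELL_SIZE_M, (j + rng.random()) * CELL_SIZE_M)
            for i, j in chosen]


# Hypothetical layer and weights: equal weights give a uniform draw over the study area
layers = {"forest_share": {(0, 0): 0.2, (1, 0): 0.7}}
virtual = draw_virtual_controls({(0, 0): 1.0, (1, 0): 1.0}, 5, random.Random(42))
print(environmental_profile(virtual, layers))
```

Because virtual controls go through exactly the same lookup as the cases' addresses, cases and controls can then be compared variable by variable.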

To test our approach, we chose two diseases of children, because in children it is simpler to reconstruct the environmental lifeline preceding the disease:

The first is a large cohort of Type 1 Diabetes patients assembled over the last 3 years. By January 2013, 6,000 children whose parents had given informed consent had been recruited, with clinical and biological follow-up every 6 months. Of these, 2,500 have been fully genotyped to allow the search for Gene × Environment interactions. All of them were linked to the environmental databases, and a subset of the cohort completed an 800-item questionnaire. Our preliminary statistical analyses have mainly concerned the infectious-disease environment, for which France has had large-scale, real-time geographical data since 1984. See the public website (http://www.isis-diab.org).
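Because the genotyped children are all patients, one design that could exploit them for Gene × Environment questions is the case-only design; the text does not say that this is the method used in the project. Assuming that genotype and exposure are independent in the general population, the odds ratio between carrying a risk genotype and being exposed, computed among cases only, estimates the multiplicative gene-environment interaction. The sketch below computes that odds ratio with a Wald 95% confidence interval; the function name and counts are hypothetical.

```python
import math
from typing import Tuple


def case_only_interaction(carrier_exposed: float, carrier_unexposed: float,
                          noncarrier_exposed: float, noncarrier_unexposed: float,
                          z: float = 1.96) -> Tuple[float, float, float]:
    """Case-only G x E interaction odds ratio with a Wald confidence interval.

    All four counts are patients. If genotype and exposure are independent
    in the source population, this odds ratio estimates the multiplicative
    gene-environment interaction.
    """
    a, b, c, d = carrier_exposed, carrier_unexposed, noncarrier_exposed, noncarrier_unexposed
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid log(0)
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error of log OR
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts among genotyped patients
print(case_only_interaction(120, 80, 60, 90))
```

With 120 exposed and 80 unexposed carriers versus 60 exposed and 90 unexposed non-carriers, the interaction odds ratio is 120 × 90 / (80 × 60) = 2.25. The independence assumption is the weak point of this design: if the genotype influences exposure (or the reverse), the estimate is biased.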

The second consists of a large set (more than 15,000) of male babies born with congenital anomalies of the genitalia (cryptorchidism, hypospadias), together with a comparable dataset of “normal” control children.

The ultimate dream would be for innovative algorithms to discover that the a priori homogeneous set of patients with the disease is, in reality, a collection of subsets, each caused by a specific combination of genetic and environmental factors. To begin with, we believe that the data-driven search for the environmental causes of diseases can be carried out in much the same way as the genome-wide studies.
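The parallel with genome studies can be made concrete with a minimal environment-wide scan, in which each variable of the “environnementome” plays the role of a SNP. This sketch assumes hypothetical binary exposures (for example, whether an address lies in a given land-cover class), compares cases with controls (real or virtual) using a 2 × 2 chi-square test, and controls the False Discovery Rate with the Benjamini-Hochberg procedure. The variable and function names are illustrative, and the sketch does not attempt the subset-discovery step described above, which would require different algorithms.

```python
import math
from typing import Dict, List, Tuple


def chi2_2x2_pvalue(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square p-value (1 df, no continuity correction) for the table
    [[exposed cases, unexposed cases], [exposed controls, unexposed controls]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a margin is empty (e.g. everyone exposed): nothing to test
    stat = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(stat / 2))  # P(chi-square with 1 df > stat)


def benjamini_hochberg(pvalues: List[float], q: float = 0.05) -> List[bool]:
    """Flag the tests rejected when controlling the False Discovery Rate at q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            cutoff = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff
    return rejected


def environment_wide_scan(cases: Dict[str, List[bool]], controls: Dict[str, List[bool]],
                          q: float = 0.05) -> Dict[str, Tuple[float, bool]]:
    """Test every binary environmental variable, as a GWAS tests every SNP,
    and return (p-value, significant at FDR q) for each variable."""
    names = list(cases)
    pvalues = []
    for name in names:
        a = sum(cases[name])
        c = sum(controls[name])
        pvalues.append(chi2_2x2_pvalue(a, len(cases[name]) - a, c, len(controls[name]) - c))
    flags = benjamini_hochberg(pvalues, q)
    return {name: (p, flag) for name, p, flag in zip(names, pvalues, flags)}


# Hypothetical exposures for 100 cases and 100 (virtual) controls
cases = {"near_river": [True] * 60 + [False] * 40, "high_altitude": [True] * 50 + [False] * 50}
controls = {"near_river": [True] * 40 + [False] * 60, "high_altitude": [True] * 50 + [False] * 50}
print(environment_wide_scan(cases, controls))
```

With these hypothetical data, `near_river` (60% of cases exposed versus 40% of controls) gives a chi-square of 8 and p = erfc(2) ≈ 0.0047, and is flagged; `high_altitude`, identical in both groups, is not.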