Population-based 3D genome structure analysis reveals driving forces in spatial genome organization
Conformation capture technologies (e.g. Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and inter-chromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization.
Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. PNAS (2016)
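The distinction between an ensemble-averaged contact frequency and contacts that co-occur in the same structure can be illustrated with a minimal sketch. It is not the authors' method: the representation of a structure as a list of (x, y, z) coordinates, the distance cutoff, and all function names are assumptions made for illustration, and the toy population is invented.

```python
import math

def in_contact(structure, pair, cutoff):
    """True if the two loci in `pair` lie within `cutoff` of each other."""
    i, j = pair
    return math.dist(structure[i], structure[j]) <= cutoff

def contact_frequency(population, pair, cutoff):
    """Fraction of structures in which `pair` is in contact (ensemble view)."""
    return sum(in_contact(s, pair, cutoff) for s in population) / len(population)

def co_occurrence(population, pair_a, pair_b, cutoff):
    """Fraction of structures in which both pairs are in contact at once."""
    both = sum(
        in_contact(s, pair_a, cutoff) and in_contact(s, pair_b, cutoff)
        for s in population
    )
    return both / len(population)

# Hypothetical toy population: two structures with three loci each.
toy_population = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],  # loci 0 and 1 touch
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (5.0, 0.0, 0.0)],  # loci 1 and 2 touch
]
```

With a cutoff of 1.5, both contacts (0, 1) and (1, 2) have an ensemble frequency of 0.5 in this toy population, yet their co-occurrence is 0.0: the averaged data alone cannot tell whether the two contacts ever appear in the same cell.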
Mining 3D genome structure populations identifies major factors governing the stability of regulatory communities
3D genome structures vary from cell to cell even in an isogenic sample. Unlike protein structures, genome structures are highly plastic, posing a significant challenge for structure-function mapping. Here we report an approach to comprehensively identify 3D chromatin clusters that each occur frequently across a population of genome structures, either deconvoluted from ensemble-averaged Hi-C data or from a collection of single-cell Hi-C data. Applying our method to a population of genome structures (at the macrodomain resolution) of lymphoblastoid cells, we identify an atlas of stable inter-chromosomal chromatin clusters. A large number of these clusters are enriched in binding of specific regulatory factors and are therefore defined as “Regulatory Communities.” We reveal two major factors, centromere clustering and transcription factor binding, which significantly stabilize such communities. Finally, we show that the regulatory communities differ substantially from cell to cell, indicating that expression variability could be impacted by genome structures.
Mining 3D genome structure populations identifies major factors governing the stability of regulatory communities. Nature Communications (2016)
TopDom: An efficient and deterministic method for identifying topological domains in genomes
Genome-wide proximity ligation assays allow the identification of chromatin contacts at unprecedented resolution. Several studies reveal that mammalian chromosomes are composed of topological domains (TDs) at sub-megabase resolution, which appear to be conserved across cell types and to some extent even between organisms. Identifying topological domains is now an important step toward understanding the structure and functions of spatial genome organization. However, current methods for TD identification demand extensive computational resources, require careful tuning and/or encounter inconsistencies in results. In this work, we propose an efficient and deterministic method, TopDom, to identify TDs, along with a set of statistical methods for evaluating their quality. TopDom is much more efficient than existing methods and depends on just one intuitive parameter, a window size, for which we provide easy-to-implement optimization guidelines. TopDom also identifies more and higher quality TDs than the popular directional index algorithm. The TDs identified by TopDom provide strong support for the cross-tissue TD conservation. Finally, our analysis reveals that the locations of housekeeping genes are closely associated with cross-tissue conserved TDs.
TopDom: an efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Research (2015)
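How a single window-size parameter can drive domain boundary detection may be illustrated with a simplified sketch. It is not the TopDom implementation and omits the statistical quality evaluation mentioned in the abstract. It assumes a symmetric contact matrix given as a list of lists, scores each bin boundary by the mean contact frequency between the bins just upstream and just downstream of it, and reports local minima of that score as candidate boundaries. The function names are hypothetical.

```python
def window_signal(matrix, window):
    """Score each boundary between bin i and bin i + 1.

    The score is the mean contact frequency between up to `window` bins
    upstream of the boundary and up to `window` bins downstream of it.
    """
    n = len(matrix)
    signal = []
    for i in range(n - 1):
        upstream = range(max(0, i - window + 1), i + 1)
        downstream = range(i + 1, min(n, i + 1 + window))
        values = [matrix[a][b] for a in upstream for b in downstream]
        signal.append(sum(values) / len(values))
    return signal

def local_minima(signal):
    """Indices where the signal dips below its left neighbour
    and does not exceed its right neighbour."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] < signal[i - 1] and signal[i] <= signal[i + 1]
    ]

# Hypothetical toy matrix: two domains of four bins each, with strong
# contacts (10) inside a domain and weak contacts (1) between domains.
toy_matrix = [
    [10 if (a < 4) == (b < 4) else 1 for b in range(8)] for a in range(8)
]
```

For this toy matrix and a window of 2, the signal is lowest at index 3, the boundary between bins 3 and 4, which is where the two domains meet. A larger window averages over more of the matrix and smooths the signal, which is why the choice of window size matters.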
Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data
Genome-wide proximity ligation assays, e.g. Hi-C and its variant TCC, have recently become important tools to study spatial genome organization. Removing biases from chromatin contact matrices generated by such techniques is a critical preprocessing step of subsequent analyses. The continuing decline of sequencing costs has led to an ever-improving resolution of the Hi-C data, resulting in very large matrices of chromatin contacts. Such large matrices, however, pose a great challenge to the memory usage and speed of their normalization. Therefore, there is an urgent need for fast and memory-efficient methods for normalization of Hi-C data. We developed Hi-Corrector, an easy-to-use, open source implementation of the Hi-C data normalization algorithm. Its salient features are (i) scalability: the software is capable of normalizing Hi-C data of any size in reasonable times; (ii) memory efficiency: the sequential version can run on any single computer with very limited memory, no matter how little; (iii) fast speed: the parallel version can run very fast on multiple computing nodes with limited local memory. Availability and implementation: The sequential version is implemented in ANSI C and can be easily compiled on any system; the parallel version is implemented in ANSI C with the MPI library (a standardized and portable parallel environment designed for solving large-scale scientific problems).
Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data. Bioinformatics (2015)
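Bias removal from a contact matrix can be sketched as iterative matrix balancing. The abstract does not name the normalization algorithm, so this is a generic illustration and not Hi-Corrector's code: it assumes a symmetric, non-negative matrix held entirely in memory (none of the memory-saving or parallel strategies described above), and it uses a square-root damped update chosen here for stable convergence. All names are hypothetical.

```python
import math

def balance(matrix, iterations=500, tol=1e-9):
    """Rescale a symmetric matrix so that all non-empty rows sum to the same value.

    Returns (balanced, bias) such that, up to floating-point error,
    matrix[i][j] == balanced[i][j] * bias[i] * bias[j].
    """
    n = len(matrix)
    balanced = [list(map(float, row)) for row in matrix]
    bias = [1.0] * n
    for _ in range(iterations):
        sums = [sum(row) for row in balanced]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # nothing to balance in an all-zero matrix
        mean = sum(nonzero) / len(nonzero)
        # Damped correction factor per row; empty rows are left untouched.
        factors = [math.sqrt(s / mean) if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return balanced, bias
```

Each pass estimates a per-row bias from the current row sums and divides it out of both the row and the matching column, which keeps the matrix symmetric. The accumulated `bias` vector records what was removed, so the original counts can be recovered from the balanced matrix.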
Genome architectures revealed by tethered chromosome conformation capture and population-based modeling
We describe tethered conformation capture (TCC), a method for genome-wide mapping of chromatin interactions. By performing ligations on solid substrates rather than in solution, TCC substantially enhances the signal-to-noise ratio, thereby facilitating a detailed analysis of interactions within and between chromosomes. We identified a group of regions in each chromosome in human cells that account for the majority of interchromosomal interactions. These regions are marked by high transcriptional activity, suggesting that their interactions are mediated by transcriptional machinery. Each of these regions interacts with numerous other such regions throughout the genome in an indiscriminate fashion, partly driven by the accessibility of the partners. As a different combination of interactions is likely present in different cells, we developed a computational method to translate the TCC data into physical chromatin contacts in a population of three-dimensional genome structures. Statistical analysis of the resulting population demonstrates that the indiscriminate properties of interchromosomal interactions are consistent with the well-known architectural features of the human genome.
Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nature Biotechnology (2011)
Physical tethering and volume exclusion determine higher-order genome organization in budding yeast
We show that tethering of heterochromatic regions to nuclear landmarks and random encounters of chromosomes in the confined nuclear volume are sufficient to explain the higher-order organization of the budding yeast genome. We have quantitatively characterized the contact patterns and nuclear territories that emerge when chromosomes are allowed to behave as constrained but otherwise randomly configured flexible polymer chains in the nucleus. Remarkably, this constrained random encounter model explains in a statistical manner the experimental hallmarks of the S. cerevisiae genome organization, including (1) the folding patterns of individual chromosomes; (2) the highly enriched interactions between specific chromatin regions and chromosomes; (3) the emergence, shape, and position of gene territories; (4) the mean distances between pairs of telomeres; and (5) even the co-location of functionally related gene loci, including early replication start sites and tRNA genes. Therefore, most aspects of the yeast genome organization can be explained without calling on biochemically mediated chromatin interactions. Such interactions may modulate the preexisting propensity for colocalization but seem not to be the cause for the observed higher-order organization. The fact that geometrical constraints alone yield a highly organized genome structure, on which different functional elements are specifically distributed, has strong implications for the folding principles of the genome and the evolution of its function.
Physical tethering and volume exclusion determine higher-order genome organization in budding yeast. Genome Research (2012)
Global reorganization of budding yeast chromosome conformation in different physiological conditions. Journal of Cell Biology (2016)
Comparative 3D Genome Structure Analysis of the Fission and the Budding Yeast
We studied the 3D structural organization of the fission yeast genome, which emerges from the tethering of heterochromatic regions in otherwise randomly configured chromosomes represented as flexible polymer chains in a nuclear environment. This model is sufficient to explain in a statistical manner many experimentally determined distinctive features of the fission yeast genome, including chromatin interaction patterns from Hi-C experiments and the co-locations of functionally related and co-expressed genes, such as genes expressed by Pol-III. Our findings demonstrate that some previously described structure-function correlations can be explained as a consequence of random chromatin collisions driven by a few geometric constraints (mainly due to centromere-SPB and telomere-NE tethering) combined with the specific gene locations in the chromosome sequence. We also performed a comparative analysis between the fission and budding yeast genome structures, for which we previously detected a similar organizing principle. Due to the different chromosome sizes and numbers, substantial differences are observed in the 3D structural genome organization between the two species, most notably in the nuclear locations of orthologous genes and the extent of nuclear territories for genes and chromosomes. Despite those differences, functional similarities are remarkably maintained: functionally related genes show a similar spatial clustering behavior in both yeasts, even though their nuclear locations are largely different between the species.
Comparative 3D Genome Structure Analysis of the Fission and the Budding Yeast. PLOS ONE (2015)