DNA compositional domains

DNA sequences are formed by patches or domains of different nucleotide composition. In simple, homogeneous sequences, domains can be identified by eye; however, most DNA sequences show a complex compositional heterogeneity. To analyse such nonstationary sequence structures, we used a computationally efficient segmentation method based on the Jensen–Shannon entropic divergence (Bernaola-Galván et al., 2012; Oliver et al., 1999).
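As an illustration of the quantity involved, the following minimal Python sketch computes the length-weighted Jensen–Shannon divergence between the two subsequences on either side of a cut, and scans for the cut that maximizes it. The function names (`shannon_entropy`, `js_divergence`, `best_cut`) are hypothetical and chosen for this example only. The sketch is not the published segmentation program: it omits the statistical significance test and recomputes entropies naively at every position, which is far slower than an efficient implementation.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two adjacent subsequences."""
    n = len(left) + len(right)
    return (
        shannon_entropy(left + right)
        - (len(left) / n) * shannon_entropy(left)
        - (len(right) / n) * shannon_entropy(right)
    )


def best_cut(seq):
    """Return (position, divergence) for the cut with the largest divergence.

    Naive scan over all interior positions; illustrative only.
    """
    candidates = ((i, js_divergence(seq[:i], seq[i:])) for i in range(1, len(seq)))
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    toy = "A" * 10 + "G" * 10  # two perfectly homogeneous patches
    print(best_cut(toy))
```

In a segmentation of this kind, a cut would be accepted only if its divergence is significant at the chosen level, and the procedure would then be repeated on each resulting subsequence.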

We divide a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance level. Once a sequence is partitioned into domains, a global measure of sequence compositional complexity (SCC), accounting for both the sizes and the compositional biases of all the domains in the sequence, can be derived (Román-Roldán et al., 1998). SCC is computed as a function of the significance level, which provides a multiscale view of sequence complexity. Four DNA alphabets or mapping rules (Bernaola-Galván et al., 1999) were used: {A,T,C,G}, {S,W}, {R,Y} and {K,M}.
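The mapping rules and the complexity measure can be sketched as follows. The two-letter groupings are the standard IUPAC codes (S = C or G, W = A or T, R = A or G, Y = C or T, K = G or T, M = A or C). The SCC formula used here, the entropy of the whole sequence minus the length-weighted entropies of its domains, is our reading of the definition in Román-Roldán et al. (1998) and should be checked against that paper. The names `ALPHABETS`, `recode` and `scc` are hypothetical, and the domain boundaries are supplied by hand rather than produced by a segmentation run.

```python
import math
from collections import Counter

# Standard IUPAC two-letter groupings, applied to the bases in the order A, C, G, T.
ALPHABETS = {
    "SW": str.maketrans("ACGT", "WSSW"),  # strong (C,G) vs weak (A,T)
    "RY": str.maketrans("ACGT", "RYRY"),  # purine (A,G) vs pyrimidine (C,T)
    "KM": str.maketrans("ACGT", "MMKK"),  # keto (G,T) vs amino (A,C)
}


def recode(seq, alphabet):
    """Recode a DNA sequence; 'ATCG' keeps the four-letter alphabet."""
    seq = seq.upper()
    return seq if alphabet == "ATCG" else seq.translate(ALPHABETS[alphabet])


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def scc(seq, boundaries):
    """Assumed SCC: H(whole sequence) minus the length-weighted domain entropies.

    boundaries: sorted cut positions strictly inside the sequence.
    """
    n = len(seq)
    edges = [0, *boundaries, n]
    domains = [seq[a:b] for a, b in zip(edges, edges[1:])]
    return shannon_entropy(seq) - sum(len(d) / n * shannon_entropy(d) for d in domains)


if __name__ == "__main__":
    toy = "AAAATTTTGGGGCCCC"
    print(scc(recode(toy, "ATCG"), [4, 8, 12]))  # four homogeneous domains
    print(scc(recode(toy, "SW"), [8]))           # AT-rich half vs GC-rich half
    print(scc(recode(toy, "RY"), [8]))           # same cut, purine/pyrimidine alphabet
```

In this toy sequence the same cut at position 8 separates two maximally different domains under {S,W} but two identical compositions under {R,Y}, which illustrates why the choice of alphabet matters.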

Using the UCSC Track Hub facility, we provide here both genome maps and genome coordinates of the compositional domains obtained at the 0.95 significance level in different groups of genome sequences:


Bernaola-Galván P, Oliver JL, Hackenberg M, Coronado AV, Ivanov PC, Carpena P. 2012. Segmentation of time series with long-range fractal correlations. Eur Phys J B 85:211. doi:10.1140/epjb/e2012-20969-5
Bernaola-Galván P, Oliver JL, Román-Roldán R. 1999. Decomposition of DNA sequence complexity. Phys Rev Lett 83:3336–3339.
Oliver JL, Román-Roldán R, Pérez J, Bernaola-Galván P. 1999. SEGMENT: identifying compositional domains in DNA sequences. Bioinformatics 15:974–979.
Román-Roldán R, Bernaola-Galván P, Oliver JL. 1998. Sequence compositional complexity of DNA through an entropic segmentation method. Phys Rev Lett 80:1344–1347.