Functional genomics experiments based on next-generation sequencing (e.g. ChIP-seq, MNase-seq, DNase-seq, FAIRE-seq) that measure the biochemical activity of various elements in the genome often produce artifact signal in certain regions of the genome. It is important to keep track of and filter out artifact regions that tend to show artificially high signal (excessive, unstructured, anomalous read mapping). Below is a list of comprehensive empirical blacklists identified by the ENCODE and modENCODE consortia. Note that these blacklists were empirically derived from large compendia of data using a combination of automated heuristics and manual curation. These blacklists are applicable to functional genomic data based on short-read sequencing (20-100 bp reads). They are not directly applicable to RNA-seq or any other transcriptome data types. The blacklisted regions typically appear uniquely mappable, so simple mappability filters do not remove them. These regions are often found at specific types of repeats such as centromeres, telomeres and satellite repeats. It is especially important to remove these regions when computing measures of similarity, such as Pearson correlation, between genome-wide tracks, because such measures are especially affected by outliers.
Blacklists for various species and genome versions can be downloaded from http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/
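As a rough illustration of the filtering step described above, here is a minimal Python sketch that drops any region (e.g. a peak) overlapping a blacklisted interval. It assumes the blacklist and the regions are available as BED-style lines (chromosome, 0-based start, exclusive end); that file format is an assumption, not something stated on this page. The function names and all coordinates in the example are hypothetical and chosen only for illustration. It is not the procedure used to build the blacklists.

```python
from bisect import bisect_right
from collections import defaultdict


def parse_bed(lines):
    """Parse BED-style lines into (chrom, start, end) tuples (0-based, half-open)."""
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def build_index(blacklist):
    """Group blacklist intervals by chromosome, then sort and merge them."""
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlapping or touching: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_blacklist(index, chrom, start, end):
    """Return True if [start, end) overlaps any blacklisted interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval whose start lies before the region's end.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def filter_regions(regions, index):
    """Keep only the regions that do not overlap the blacklist."""
    return [r for r in regions if not overlaps_blacklist(index, *r)]


# Hypothetical example data (made-up coordinates, not real blacklist entries).
blacklist_lines = ["chr1\t100\t200", "chr1\t150\t300", "chr2\t50\t60"]
peak_lines = [
    "chr1\t10\t90",
    "chr1\t90\t101",
    "chr1\t300\t400",
    "chr2\t55\t58",
    "chr3\t1\t10",
]

index = build_index(parse_bed(blacklist_lines))
kept = filter_regions(parse_bed(peak_lines), index)
for chrom, start, end in kept:
    print(chrom, start, end, sep="\t")
```

Because the intervals are half-open, a region that merely touches the end of a blacklisted interval (chr1 300-400 in the example) is kept. For genome-scale data, an established interval tool would likely be a better choice than this hand-rolled index; the sketch is only meant to make the overlap logic explicit.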
The human hg19 blacklist was generated by Anshul Kundaje as part of the ENCODE project.
The worm, fly and mouse blacklists and the GRCh38 human blacklist were generated by Alan Boyle and Anshul Kundaje as part of the ENCODE and modENCODE projects.
If you use these tracks in any work, please cite the flagship ENCODE paper: ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247.
A detailed publication on these tracks is imminent and the appropriate citation will be provided here.
In the meantime, you can check out this paper, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3989762/, which uses the human blacklist to examine artifacts in ChIP-seq and ChIP-exo data. However, please DO NOT cite it as the primary source of the blacklist. The primary citation should be the flagship ENCODE paper and a link to this website.
Below is a brief description of how the human track was generated. Analogous procedures were used for the mouse, worm and fly tracks.