Output files

***.pdf

  • This file contains all the figures plotted by the Profile function.

***.R

  • This file contains all the R commands for plotting figures in the .pdf file.

***_genomicSiteCategory.xls

  • This file contains the data used to plot each figure in the .pdf file. There should be one such file for each category of genomic sites, e.g., ***_TSS.xls for transcription start sites (TSS) or ***_gene.xls for gene bodies.

  • Each file starts with the column "pos". Each row in this column gives a position (distance) relative to a genomic site of the corresponding category.

  • There will be one or more additional columns, each representing the average occupancy of one wiggle data set around one group of genomic sites. The column name indicates the specific wiggle data set, genomic site group, and genomic site category. E.g., for a column named "wild_type.wig.up_regulated.xls.tss", the wiggle file wild_type.wig, the gene file up_regulated.xls, and the genomic site category TSS (transcription start site) were used to calculate the data in this column.
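
The layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the tool itself: it assumes that the .xls files are tab-delimited text (the description above does not state the file format), the function name read_profile is hypothetical, and the sample values are made up for illustration.

```python
import csv
import io


def read_profile(text):
    """Parse a profile table into {column_name: [(pos, value), ...]}.

    Assumes tab-delimited text whose first column is named "pos".
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if header[0] != "pos":
        raise ValueError("expected the first column to be 'pos'")
    curves = {name: [] for name in header[1:]}
    for row in reader:
        if not row:
            continue  # skip blank lines
        pos = float(row[0])
        for name, cell in zip(header[1:], row[1:]):
            curves[name].append((pos, float(cell)))
    return curves


# Made-up values, for illustration only.
sample = (
    "pos\twild_type.wig.up_regulated.xls.tss\n"
    "-10\t1.5\n"
    "0\t3.0\n"
    "10\t2.0\n"
)

curves = read_profile(sample)
column = "wild_type.wig.up_regulated.xls.tss"
# Position with the highest average occupancy in this column.
peak_pos, peak_value = max(curves[column], key=lambda pv: pv[1])
print(peak_pos, peak_value)
```

To read a real file, pass its contents to the function, e.g. read_profile(open(path).read()), where path points to one of the ***_genomicSiteCategory.xls files.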

***_genomicSiteCategory_heatmap/

    • This is a directory containing data for plotting heat maps, e.g., by loading the data into the tool MEV. There should be one such directory for each category of genomic sites. Each directory contains the data from which the values in the file ***_genomicSiteCategory.xls are calculated, e.g., ***_TSS_heatmap/ contains the data from which the values in the file ***_TSS.xls are calculated.

    • Each such directory contains one or more .xls files. Each file contains the data used to calculate one column of the file ***_genomicSiteCategory.xls. E.g., in the directory ***_TSS_heatmap/, there may be a file up_regulated.xls.tss.wild_type.wig.heatmap.xls that contains the data used to calculate the column "wild_type.wig.up_regulated.xls.tss" in the file ***_TSS.xls.

    • Each file in the directory, e.g., the file up_regulated.xls.tss.wild_type.wig.heatmap.xls, starts with the following columns:

    • (1) name: the name of a genomic site, e.g., the gene name of a TSS (transcription start site)

    • (2) max: the maximal occupancy in a given region, e.g., from 1.5kb upstream to 1.5kb downstream of a TSS

    • (3) min: the minimal occupancy in a given region, e.g., from 1.5kb upstream to 1.5kb downstream of a TSS

    • (4) sum: the sum of occupancy values in a given region, e.g., from 1.5kb upstream to 1.5kb downstream of a TSS

    • (5) Multiple additional columns follow, each named after the data point it represents in the given region, e.g., "-100" and "100" may represent the 100th data point upstream and downstream of a TSS. When calculating for gene bodies, "-100", "100" and "+100" may represent the 100th data point upstream of the TSS, downstream of the TSS, and downstream of the TTS (transcription termination site), respectively. NOTE: each data point represents a bin, e.g., when --bin_size is set to 10, each data point represents 10bp.
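
The relationship between a heat map file and the corresponding column of ***_genomicSiteCategory.xls can be sketched as follows. This is an illustration under stated assumptions, not the tool's own code: it assumes the files are tab-delimited text, that the "average occupancy" is a plain arithmetic mean over all genomic sites in the file, and that a data point label multiplied by --bin_size gives an approximate distance in bp. The names column_means and label_to_bp are hypothetical, and the sample values are made up.

```python
import csv
import io

SUMMARY_COLUMNS = 4  # name, max, min, sum come before the data points


def column_means(text):
    """Average each data-point column over all genomic sites (rows)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    labels = header[SUMMARY_COLUMNS:]
    totals = [0.0] * len(labels)
    n_sites = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        for i, cell in enumerate(row[SUMMARY_COLUMNS:]):
            totals[i] += float(cell)
        n_sites += 1
    if n_sites == 0:
        raise ValueError("no genomic sites found")
    return {label: total / n_sites for label, total in zip(labels, totals)}


def label_to_bp(label, bin_size):
    """Convert a data-point label such as "-100" or "+100" to bp.

    Assumption: the label counts bins, so the distance is label * bin_size.
    For gene bodies, "+100" is counted from the TTS, not the TSS.
    """
    return int(label) * bin_size


# Made-up values, for illustration only.
sample = (
    "name\tmax\tmin\tsum\t-1\t0\t1\n"
    "geneA\t4.0\t1.0\t7.0\t1.0\t4.0\t2.0\n"
    "geneB\t6.0\t2.0\t11.0\t3.0\t6.0\t2.0\n"
)

means = column_means(sample)
print(means)
print(label_to_bp("-100", 10))
```

Under these assumptions, the values returned by column_means for a file such as up_regulated.xls.tss.wild_type.wig.heatmap.xls would correspond to the column "wild_type.wig.up_regulated.xls.tss" in ***_TSS.xls; if the tool averages differently, the numbers will not match exactly.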