CITE-seq-counter

CITE-seq-counter is a program that counts the UMIs of antibody tags in raw sequencing reads. It has been tested with single-cell CITE-seq samples processed with 10x Genomics technologies. The cell barcode, antibody barcode, and UMI positions are all adjustable. A white list is mandatory in the current version, and 1 mismatch is allowed in the barcode and the UMI. PCR duplicates (same UMI + barcode) are excluded from the count.
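The counting rules above can be illustrated with a short Go sketch. It is not the program's actual source: the names `hamming`, `correct`, `Counter`, and `Add` are hypothetical, the white-list lookup is a plain linear scan written for clarity, only the cell barcode is corrected (the UMI is compared exactly), and duplicates are assumed to be keyed on cell barcode + tag + UMI. It also keeps every seen molecule in memory, which may differ from how the tool keeps its own footprint low.

```go
package main

import "fmt"

// hamming returns the number of mismatching positions between two
// sequences, or -1 when their lengths differ.
func hamming(a, b string) int {
	if len(a) != len(b) {
		return -1
	}
	d := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			d++
		}
	}
	return d
}

// correct maps seq to a white-listed barcode: an exact match wins,
// otherwise the first entry found at exactly one mismatch is used.
func correct(seq string, whiteList []string) (string, bool) {
	best, found := "", false
	for _, w := range whiteList {
		switch hamming(seq, w) {
		case 0:
			return w, true
		case 1:
			if !found {
				best, found = w, true
			}
		}
	}
	return best, found
}

// observation identifies one molecule (assumed key for PCR duplicates).
type observation struct{ cell, tag, umi string }

// Counter accumulates deduplicated UMI counts per cell and tag.
type Counter struct {
	whiteList []string
	seen      map[observation]bool
	Counts    map[string]map[string]int // cell -> tag -> UMI count
}

// NewCounter builds an empty Counter for the given white list.
func NewCounter(whiteList []string) *Counter {
	return &Counter{
		whiteList: whiteList,
		seen:      make(map[observation]bool),
		Counts:    make(map[string]map[string]int),
	}
}

// Add records one read and reports whether it increased a count.
func (c *Counter) Add(cell, tag, umi string) bool {
	fixed, ok := correct(cell, c.whiteList)
	if !ok {
		return false // barcode is not within one mismatch of the white list
	}
	obs := observation{fixed, tag, umi}
	if c.seen[obs] {
		return false // same barcode + tag + UMI: PCR duplicate
	}
	c.seen[obs] = true
	if c.Counts[fixed] == nil {
		c.Counts[fixed] = make(map[string]int)
	}
	c.Counts[fixed][tag]++
	return true
}

func main() {
	c := NewCounter([]string{"AAAA", "CCCC"}) // toy 4-base barcodes
	c.Add("AAAT", "AB1", "GG")                // one mismatch, corrected to AAAA
	c.Add("AAAA", "AB1", "GG")                // duplicate of the read above
	c.Add("AAAA", "AB1", "GT")                // new UMI for the same cell and tag
	c.Add("GGTT", "AB1", "GG")                // no white-list match, dropped
	fmt.Println(c.Counts)
}
```

For a white list of realistic size, a map lookup for exact matches followed by a search over single-base variants would be a more practical design than the linear scan used here.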

The software is written in Go; it is fast and keeps memory usage as low as possible. Only the result table and two sequences are held in RAM at any one time.

How to cite : please cite this reference

┌──────────────────────────────────────────┐
│  CITE-seq-counter (c)Frederic PONT 2018  │
│ Free Software GNU General Public License │
└──────────────────────────────────────────┘

Config summary : {1 16 17 26 1 15 ^[ATGC]{15}[TGC][A]{6,}}

barcode.tsv in whiteList

tags.tsv in tags

552 possible AB sequences in tags list

213955 possible cell sequences in white list

Undetermined_S0_L001_R1_001.fastq in fastqR1

Undetermined_S0_L001_R2_001.fastq in fastqR2

R1 and R2 parsing...

filename = fastqR1/Undetermined_S0_L001_R1_001.fastq size = 1.828925e+07 sequences

14441538 / 18289250 [=============================>-----] 78.96% 10s

Manual :

The software was statically compiled for 64-bit Linux and Windows, so there is nothing to install.

1- Unzip the software.

2- Edit the conf.json file to match the cell/AB barcodes and UMI positions.

{

"Cell_barcode_first_base": 1,

"Cell_barcode_last_base": 16,

"Umi_first_base": 17,

"Umi_last_base": 26,

"AB_barcode_first_base": 1,

"AB_barcode_last_base": 15,

"Tag_regex": "^[ATGC]{15}[TGC][A]{6,}"

}
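As an illustration of how these settings could be consumed, here is a minimal Go sketch that decodes the configuration and cuts a cell barcode and a UMI out of a read. It rests on assumptions that this manual does not state: positions are taken to be 1-based and inclusive (so 1 to 16 is a 16-base barcode), the cell barcode and UMI are taken from R1, and the `Config` struct, `parseConfig`, and `slice1` are hypothetical names rather than the program's own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the keys of conf.json (hypothetical struct).
type Config struct {
	CellBarcodeFirstBase int    `json:"Cell_barcode_first_base"`
	CellBarcodeLastBase  int    `json:"Cell_barcode_last_base"`
	UmiFirstBase         int    `json:"Umi_first_base"`
	UmiLastBase          int    `json:"Umi_last_base"`
	ABBarcodeFirstBase   int    `json:"AB_barcode_first_base"`
	ABBarcodeLastBase    int    `json:"AB_barcode_last_base"`
	TagRegex             string `json:"Tag_regex"`
}

// exampleConf reproduces the configuration shown in this manual.
const exampleConf = `{
	"Cell_barcode_first_base": 1,
	"Cell_barcode_last_base": 16,
	"Umi_first_base": 17,
	"Umi_last_base": 26,
	"AB_barcode_first_base": 1,
	"AB_barcode_last_base": 15,
	"Tag_regex": "^[ATGC]{15}[TGC][A]{6,}"
}`

// parseConfig decodes the JSON configuration.
func parseConfig(data []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(data, &c)
	return c, err
}

// slice1 extracts bases first..last from seq, assuming 1-based
// inclusive positions. It returns false if the range does not fit.
func slice1(seq string, first, last int) (string, bool) {
	if first < 1 || last < first || last > len(seq) {
		return "", false
	}
	return seq[first-1 : last], true
}

func main() {
	conf, err := parseConfig([]byte(exampleConf))
	if err != nil {
		panic(err)
	}
	r1 := "ACGTACGTACGTACGTTTTTGGGGCC" // made-up 26-base R1 read
	cell, _ := slice1(r1, conf.CellBarcodeFirstBase, conf.CellBarcodeLastBase)
	umi, _ := slice1(r1, conf.UmiFirstBase, conf.UmiLastBase)
	fmt.Println("cell barcode:", cell, "UMI:", umi)
}
```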

The Tag_regex value is a pattern matching the antibody barcode + UMI + polyA.

For more information about regular expressions, visit :

http://perldoc.perl.org/perlre.html
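To see what the default pattern accepts, the sketch below tests a few made-up reads with Go's standard regexp package. Whether the program itself uses this package is an assumption; note that Go's regexp implements the RE2 syntax, which covers this pattern but not every Perl feature. The helper `isTagRead` and the example reads are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern is the default Tag_regex from conf.json: 15 bases,
// then one base that is not A, then at least six A.
var tagPattern = regexp.MustCompile(`^[ATGC]{15}[TGC][A]{6,}`)

// isTagRead reports whether a read starts with a valid tag layout.
func isTagRead(read string) bool {
	return tagPattern.MatchString(read)
}

func main() {
	reads := []string{
		"ACGTACGTACGTACGTAAAAAAAA", // 15 bases + T + 8 A
		"ACGTACGTACGTACGAAAAAAAAA", // base 16 is A, rejected by [TGC]
		"ACGTNCGTACGTACGTAAAAAAAA", // N inside the first 15 bases
		"ACGTACGTACGTACGTAAAAA",    // polyA stretch of only 5 A
	}
	for _, r := range reads {
		fmt.Println(r, isTagRead(r))
	}
}
```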

3- Copy the uncompressed FASTQ R1 file into the fastqR1 directory, and the uncompressed FASTQ R2 file into the fastqR2 directory.

4- Edit the tags.tsv file in the tags directory. This tab-separated table contains two columns: the AB tag and the AB name. For example :

5- Copy your white list into the whiteList directory. The file must have a single column containing the cell barcodes (see the test file for an example). The white list is mandatory and can be obtained from the Seurat software.

6- Run the software using the Linux or Windows binary.

7- The result table is written to the result directory.