Extract Fasta in list parallel

This software extracts Fasta sequences that match a list of keywords or sequences (new in v018 and later). It is similar to Extract Fasta in list but is designed for powerful computers. Sequences are read in groups of 10000 and processed by 10 CPUs, so it is 5 times faster than Extract Fasta in list but needs more RAM and CPU cores. For massively parallel computers, and to search for keywords in titles only (no sequence search), I suggest Extract Fasta in list parallel GO.
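The batch-and-parallelize idea described above can be sketched in Python. This is only an illustration under stated assumptions, not the Perl implementation: the function names (read_fasta, batches, filter_batch, extract_parallel) are hypothetical, and the defaults of 10000 records per batch and 10 workers simply mirror the figures quoted above.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def read_fasta(lines):
    """Yield (title, sequence) pairs from an iterable of FASTA lines."""
    title, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line, []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


def batches(records, size=10000):
    """Group an iterator of records into lists of at most `size` items."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def filter_batch(batch, items):
    """Keep records whose flattened text (title + sequence) contains any item."""
    return [(t, s) for t, s in batch if any(k in t + s for k in items)]


def extract_parallel(records, items, size=10000, workers=10):
    """Filter batches in `workers` processes, preserving input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # All batches are submitted up front, which trades memory for speed.
        futures = [pool.submit(filter_batch, b, items)
                   for b in batches(records, size)]
        return [rec for f in futures for rec in f.result()]
```

On platforms that start worker processes by spawning, extract_parallel should be called from under an `if __name__ == "__main__":` guard.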

Manual:

1- install Perl (a free programming language) and GNU parallel.

2- unzip the software

3- copy your Fasta files in the “fasta” directory.

4- copy your reference lists in the “lists” directory, one item per line.

The lines of the reference list are interpreted as strings:

GEN1 matches GEN1, GEN11, GEN12 etc.

(GEN1) matches (GEN1)

The fasta block (title + sequence) is flattened before the search, so it is possible to search for a sequence.
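The matching rules above can be illustrated with a short Python sketch. It assumes that "flattened" means the title and sequence lines are joined into one string with line breaks removed, and that matching is a case-sensitive literal substring test; both are assumptions, and the function names (load_list, flatten, matches) are hypothetical rather than taken from the Perl script.

```python
def load_list(lines):
    """Return the non-empty items of a reference list, one item per line."""
    return [ln.strip() for ln in lines if ln.strip()]


def flatten(block):
    """Join the title and sequence lines of one FASTA block into one string."""
    return "".join(line.strip() for line in block.splitlines())


def matches(block, items):
    """True if any item occurs as a literal substring of the flattened block."""
    flat = flatten(block)
    return any(item in flat for item in items)


block = ">sp|P1 GEN11 protein\nACGTAC\nGTTTAA\n"

print(matches(block, ["GEN1"]))    # GEN1 is a substring of GEN11
print(matches(block, ["(GEN1)"]))  # parentheses are matched literally
print(matches(block, ["TACGTT"]))  # sequence spanning a line break
```

Because the block is flattened first, the third search finds a sequence that is split across two lines in the original file.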

5- edit parallel_extract_conf.txt to set the number of CPUs for parallel processing.

6- run the software with the command: perl parallel_extract_fasta-0.3.pl

7- processed files are in the “results” directory.

8- search log files are in the “log” directory.

download
