Technical Support
How to use:
- Install Python 2.7 or Python 3 on your machine.
- Install NumPy and Biopython.
- Install the future module with pip:
pip install future
- Click “Clone or download” > “Download ZIP”, then extract the downloaded file.
- Run “sddc.py”:
- Windows: open the file “sddc.py” with python.exe.
- U/Linux: make the script executable with the command
chmod u+x sddc.py
- Mac: use the command
python sddc.py
- Type your options and press Enter.
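Before running the script, you can check from a Python 3 interpreter that the prerequisites listed above are importable. This is a minimal sketch, not part of sddc.py; it assumes the import names numpy (NumPy), Bio (Biopython) and future (the future package), and it uses importlib.util.find_spec, which is not available in Python 2.7.

```python
import importlib.util

# Display name -> import name (assumed mapping for the prerequisites above).
REQUIRED_MODULES = {"NumPy": "numpy", "Biopython": "Bio", "future": "future"}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the display names of packages whose module cannot be found."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites are importable.")
```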
If you want to dereplicate protein sequences, use the following command:
python sddc.py -in (input_file) -p -out (output_file) -mode derep
If you want to dereplicate protein sequences and preserve their original order in the output file, use the following command:
python sddc.py -in (input_file) -p -out (output_file) -mode derep -org_order
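As a rough illustration of what dereplication means, the sketch below removes exact duplicate sequences and keeps the first occurrence of each one in input order, which is the kind of ordering -org_order asks for. It is not the sddc.py implementation: records are assumed to be already parsed into (header, sequence) tuples, duplicates are assumed to be exact, case-sensitive matches, and the function name dereplicate is hypothetical. How sddc.py orders its output without -org_order is not described here.

```python
def dereplicate(records):
    """Keep the first record for each distinct sequence, in input order.

    records: list of (header, sequence) tuples (hypothetical in-memory form).
    Sequences are compared exactly, case-sensitively (an assumption).
    """
    seen = set()
    unique = []
    for header, seq in records:
        if seq not in seen:          # first time this sequence appears
            seen.add(seq)
            unique.append((header, seq))
    return unique


records = [
    ("seq1", "MKTAYIAK"),
    ("seq2", "MVLSPADK"),
    ("seq3", "MKTAYIAK"),  # same sequence as seq1, so it is dropped
]
print(dereplicate(records))  # [('seq1', 'MKTAYIAK'), ('seq2', 'MVLSPADK')]
```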
If you want to dereplicate protein sequences with a minimum length of 30 when the sequences are in multiple files, use the following command:
python sddc.py -in (input_file) -p -out (output_file) -mode derep -min_length 30 -multi
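A possible reading of -min_length and -multi is sketched below: records from several input files are pooled, sequences shorter than the minimum length are dropped, and the remaining ones are dereplicated. The function merge_and_filter and the in-memory (header, sequence) records are hypothetical, and it is an assumption that a sequence exactly as long as the minimum is kept and that duplicates are removed across files.

```python
def merge_and_filter(files, min_length):
    """Pool records from several inputs, drop short sequences, then dereplicate.

    files: list of record lists, one per (hypothetical) input file,
    each record being a (header, sequence) tuple.
    """
    seen = set()
    kept = []
    for records in files:                # treat all files as one pool
        for header, seq in records:
            if len(seq) < min_length:    # discard sequences below the minimum
                continue
            if seq not in seen:          # keep only the first copy of a sequence
                seen.add(seq)
                kept.append((header, seq))
    return kept


file_a = [("a1", "M" * 40), ("a2", "M" * 10)]
file_b = [("b1", "M" * 40), ("b2", "K" * 35)]
kept = merge_and_filter([file_a, file_b], 30)
print([header for header, _ in kept])  # ['a1', 'b2']
```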
If you want to dereplicate nucleotide sequences using the optimum approach with a normal protein length of 300, use the following command:
python sddc.py -in (input_file) -n -out (output_file) -mode derep -optimum -prot_length 300
If you want to filter protein sequences inclusively by name (i.e., to retrieve only the sequences whose names you have listed in your filter file), use the following command:
python sddc.py -in (input_file) -p -out (output_file) -mode filter -flt_by name -flt_file (filter_file) -approach inclusive
If you want to filter protein sequences inclusively by keyword(s) (i.e., to retrieve only the sequences whose names contain the keywords, separated by commas, listed in your filter file), use the following command:
python sddc.py -in (input_file) -p -out (output_file) -mode filter -flt_by name -flt_file (filter_file in csv) -approach inclusive -kw
If you want to filter protein sequences exclusively by name (i.e., to retrieve only the sequences whose names are not listed in your filter file), use the following command:
python sddc.py -in (input_file) -p -out (output_file) -mode filter -flt_by name -flt_file (filter_file) -approach exclusive
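The inclusive and exclusive name filters above differ only in whether the listed names are kept or dropped. The sketch assumes the filter file holds one name per line and that each record's name is matched exactly; the helpers read_names and filter_by_name are hypothetical and do not reproduce sddc.py's own matching rules.

```python
def read_names(lines):
    """Collect one name per non-empty line (assumed filter-file layout)."""
    return {line.strip() for line in lines if line.strip()}


def filter_by_name(records, names, approach="inclusive"):
    """inclusive: keep only listed names; exclusive: drop listed names."""
    if approach not in ("inclusive", "exclusive"):
        raise ValueError("approach must be 'inclusive' or 'exclusive'")
    keep_listed = approach == "inclusive"
    # A record is kept when "is listed" agrees with the chosen approach.
    return [(name, seq) for name, seq in records
            if (name in names) == keep_listed]


records = [("sp1", "MKT"), ("sp2", "MVL"), ("sp3", "MAA")]
names = read_names(["sp1\n", "sp3\n", "\n"])
print(filter_by_name(records, names, "inclusive"))  # [('sp1', 'MKT'), ('sp3', 'MAA')]
print(filter_by_name(records, names, "exclusive"))  # [('sp2', 'MVL')]
```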
If you want to filter protein sequences exclusively by keyword(s) in their names (i.e., to retrieve only the sequences whose names do not contain the keywords, separated by commas, listed in your filter file), use the following command:
python sddc.py -in (input_file) -p -out (output_file) -mode filter -flt_by name -flt_file (filter_file in csv) -approach exclusive -kw
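The keyword filters can be sketched the same way. Here every comma-separated field of the CSV filter file is treated as a keyword, and a record matches when its header contains any keyword as a case-sensitive substring; both rules, and the helpers read_keywords and filter_by_keyword, are assumptions for illustration (Python 3).

```python
import csv
import io


def read_keywords(csv_text):
    """Collect every non-empty comma-separated field as a keyword."""
    keywords = []
    for row in csv.reader(io.StringIO(csv_text)):
        keywords.extend(field.strip() for field in row if field.strip())
    return keywords


def filter_by_keyword(records, keywords, approach="inclusive"):
    """inclusive: keep headers containing any keyword; exclusive: drop them."""
    if approach not in ("inclusive", "exclusive"):
        raise ValueError("approach must be 'inclusive' or 'exclusive'")
    keep_matches = approach == "inclusive"
    return [(header, seq) for header, seq in records
            if any(kw in header for kw in keywords) == keep_matches]


records = [
    ("kinase A [Homo sapiens]", "MKT"),
    ("transporter B [Mus musculus]", "MVL"),
    ("kinase C [Mus musculus]", "MAA"),
]
keywords = read_keywords("kinase, Homo\n")
print(filter_by_keyword(records, keywords, "inclusive"))
print(filter_by_keyword(records, keywords, "exclusive"))
```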
If you want to filter nucleotide sequences by sequence (exclusive filtering only), use the following command:
python sddc.py -in (input_file) -n -out (output_file) -mode filter -flt_by seq -flt_file (filter_file)
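Filtering by sequence removes the records whose sequence appears in the filter file. This minimal sketch assumes exact, case-insensitive matching of whole sequences; filter_by_sequence is a hypothetical helper, not sddc.py's code.

```python
def filter_by_sequence(records, filter_sequences):
    """Drop records whose sequence matches one in the filter list (exclusive)."""
    # Normalise once so "atgcc" and "ATGCC" count as the same sequence.
    excluded = {seq.strip().upper() for seq in filter_sequences}
    return [(header, seq) for header, seq in records
            if seq.upper() not in excluded]


records = [("n1", "ATGCC"), ("n2", "GGTTA"), ("n3", "atgcc")]
print(filter_by_sequence(records, ["ATGCC"]))  # [('n2', 'GGTTA')]
```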
If you want to exchange words in the FASTA headers of your protein sequences, use the following command:
python sddc.py -in (input_file) -p -out (output_file) -mode exchange_headers -ex_file (exchange_file in csv)
If you want to exchange words in the FASTA headers of your nucleotide sequences, use the following command:
python sddc.py -in (input_file) -n -out (output_file) -mode exchange_headers -ex_file (exchange_file in csv)
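Header exchange can work the same way for protein and nucleotide files because only the headers change. The sketch below assumes a two-column exchange CSV (old word, new word) and replaces whole whitespace-separated words; read_exchange_table and exchange_headers are hypothetical names, and rejoining words with single spaces is a simplification of this sketch (Python 3).

```python
import csv
import io


def read_exchange_table(csv_text):
    """Map old word -> new word from two-column CSV rows (assumed layout)."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[0].strip():
            table[row[0].strip()] = row[1].strip()
    return table


def exchange_headers(records, table):
    """Replace whole whitespace-separated words in each header; keep sequences."""
    return [(" ".join(table.get(word, word) for word in header.split()), seq)
            for header, seq in records]


records = [("contig_1 unknown_protein", "MKT"), ("contig_2 kinase", "ATG")]
table = read_exchange_table(
    "unknown_protein,hypothetical_protein\ncontig_2,Ecoli_contig_2\n")
print(exchange_headers(records, table))
# [('contig_1 hypothetical_protein', 'MKT'), ('Ecoli_contig_2 kinase', 'ATG')]
```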