downloadfiles_SRA

1. Download, install, and load the SRA Toolkit, which is needed to download the fastq files from SRA. I downloaded sratoolkit.2.8.2-1, which is saved in the source folder: /uufs/chpc.utah.edu/common/home/u6007910/source/sratoolkit.2.8.2-1-centos_linux64/. All of the toolkit's commands are in the bin folder inside this folder, so to run a command we have to give the path to that bin folder.
2. I downloaded the files from SRA using the script downloadfastq.sh, which is saved in the folder: /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/scripts. This is a bash script that takes a file listing the accession numbers of the sequences to download, reads the file line by line (one accession number per line), and downloads each fastq.gz file using the fastq-dump command from the SRA Toolkit. Because I had a lot of fastq files to download, I wrote this as a bash script and submitted it as a job to the cluster.
3. When we download files from SRA, a .sra file is also generated for each fastq file. These .sra files are saved in the folder: /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/srafiles_ncbi.
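
The download loop described in step 2 can be sketched roughly as follows. This is not the actual downloadfastq.sh, only a minimal illustration of the same idea. It assumes the accession list has one accession per line; the function name download_accessions and the list file name accessions.txt are hypothetical, and the FASTQ_DUMP variable is just a convenient way to give the path to the bin folder from step 1.

```shell
#!/bin/bash
# Sketch only; the real downloadfastq.sh may differ.
# Path to fastq-dump in the toolkit's bin folder (see step 1).
# Set FASTQ_DUMP beforehand to point at a different install.
FASTQ_DUMP="${FASTQ_DUMP:-/uufs/chpc.utah.edu/common/home/u6007910/source/sratoolkit.2.8.2-1-centos_linux64/bin/fastq-dump}"

# download_accessions LIST OUTDIR
# Reads one accession per line from LIST and asks fastq-dump to write
# a gzip-compressed fastq file for each accession into OUTDIR.
download_accessions() {
    local list="$1" outdir="$2" acc
    mkdir -p "$outdir"
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Skip blank lines in the list
        [ -z "$acc" ] && continue
        # Report a failed accession but keep going with the rest
        "$FASTQ_DUMP" --gzip --outdir "$outdir" "$acc" < /dev/null \
            || echo "failed: $acc" >&2
    done < "$list"
}

# Example call (accessions.txt is a hypothetical file name):
# download_accessions accessions.txt fastqfiles
```

Reporting failures instead of stopping means one bad accession does not end a long cluster job; the failed accessions can be collected from the job's error log and retried.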

I downloaded 1545 fastq.gz files from SRA, but I deleted the files from 4 species which had only 1 individual sequenced: boha, curi, peti, and shep. These 4 species accounted for 125 files in total. After removing these 125 files, I have 1420 files in the folder: /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments/fastqfiles. In this folder, the file delete.txt contains the accession numbers of the deleted files for these species.
The fastq.gz files were decompressed to fastq using: for f in *.fastq.gz; do gunzip "$f"; done. (The files are gzip-compressed, so gunzip is needed; the unzip command only handles .zip archives.) This command is saved as unzip.sh in the folder: /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments
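
A removal step like this can be scripted from the list of accession numbers. The sketch below is an assumption about how it could be done, not a record of the commands actually used: it assumes delete.txt has one accession per line and that each file is named <accession>.fastq.gz. The function names remove_listed and count_fastq are hypothetical.

```shell
#!/bin/bash
# Sketch only; assumes files are named <accession>.fastq.gz.

# remove_listed LIST DIR
# Deletes DIR/<accession>.fastq.gz for every accession listed in LIST.
remove_listed() {
    local list="$1" dir="$2" acc
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Skip blank lines in the list
        [ -z "$acc" ] && continue
        rm -f -- "$dir/$acc.fastq.gz"
    done < "$list"
}

# count_fastq DIR
# Prints the number of .fastq.gz files directly inside DIR.
count_fastq() {
    find "$1" -maxdepth 1 -name '*.fastq.gz' | wc -l | tr -d ' '
}

# Example calls, run from inside the fastqfiles folder:
# remove_listed delete.txt .
# count_fastq .
```

Counting the files before and after the removal is a quick check that the totals match the expected arithmetic (1545 - 125 = 1420).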
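
With this many files, it can help to test each archive before decompressing it, since an interrupted download leaves a truncated .gz file. The following is a hedged variant of the decompression loop, not the contents of unzip.sh; the function name decompress_all is hypothetical. It uses gzip -t to check integrity and gunzip to decompress.

```shell
#!/bin/bash
# Sketch only; a more defensive version of the decompression loop.

# decompress_all DIR
# Tests each DIR/*.fastq.gz with gzip -t and decompresses the ones
# that pass. gunzip replaces each .fastq.gz file with the .fastq file.
decompress_all() {
    local dir="$1" f
    for f in "$dir"/*.fastq.gz; do
        # If nothing matches, the pattern stays literal; skip it
        [ -e "$f" ] || continue
        if gzip -t "$f"; then
            gunzip "$f"
        else
            echo "corrupt, skipped: $f" >&2
        fi
    done
}

# Example call, run from the folder holding the fastq.gz files:
# decompress_all .
```

Any file reported as corrupt can be downloaded again by its accession number before continuing.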
