SRA sequence upload

Here are the steps for submitting sequence reads to the NCBI Sequence Read Archive (SRA). The files I uploaded are on the cluster in the zip file: (put folder details).

1. Create an NCBI account and log in to start a new SRA submission.

2. Fill in the project details on each page until you reach the step described next.

3. Bio Attributes (BioSample attributes) file: This is one of the files you need to upload during submission. It can be a text file or an Excel file; either way, the page for this step provides a template. USE THIS TEMPLATE. Have the details of your library prep ready, although this file mainly describes how the samples were collected. Remember that the sample name is the name you are proposing for each sample once all the files are on the SRA server. Alternatively, you can fill in the built-in table on the SRA page with your sample details.

Here is a link to my bio attributes table: https://docs.google.com/spreadsheets/d/178AJMI6TMfuSvdXGPB4AdEKBkqwZe73pOszNovZFEVY/edit#gid=24822202

Notice how some columns are the same and some change for each sample.
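Because most columns repeat the same value for every sample and only a few change, a short script can generate a tab-delimited version of the table for you to check and then paste or upload. This is a minimal sketch: the column names (sample_name, organism, isolate, tissue), the sample names and the values are all hypothetical placeholders, and the real headers must come from the template on the submission page.

```shell
#!/usr/bin/env bash
# Sketch: build a tab-delimited BioSample attributes table in which some
# columns are identical for every sample and others change per sample.
# Column names and values are HYPOTHETICAL placeholders; copy the real
# headers from the template on the SRA submission page.
set -euo pipefail

out="biosample_attributes.tsv"
organism="Genus species"   # same for every sample (placeholder)
tissue="whole body"        # same for every sample (placeholder)

# Hypothetical sample names; these become the names proposed to SRA.
samples=("sampleA" "sampleB" "sampleC")

# Header row, tab-separated
printf 'sample_name\torganism\tisolate\ttissue\n' > "$out"

# One row per sample: sample_name and isolate vary, the rest are constant
for s in "${samples[@]}"; do
    printf '%s\t%s\t%s\t%s\n' "$s" "$organism" "${s}_isolate" "$tissue" >> "$out"
done

cat "$out"
```

Editing the constant values once at the top is less error-prone than retyping them in every row of a large sheet.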

4. SRA Metadata File: This file is a pain to prepare. Again, the page for this step offers two templates, one in text format and one in Excel format. Use the Excel file, as it has the allowed options preloaded in the columns, and choose the values for those columns from the drop-down menus. Likewise, if you enter the details in the built-in table on the SRA site, you can use drop-down menus to select the option for each column.

Here is a link to my metadata table: https://docs.google.com/spreadsheets/d/1WbRv21unqPnZIG3uDRNTJVOu9t3_NzpknE9hFLjADrg/edit#gid=54150393

Again, notice how some columns are the same and some differ for each ID. Have all the information for your library prep ready to enter into this table. The reference FASTA assembly name is the same for all samples.

5. I chose to upload BAM files, so I had to give the name of the FASTA file that the reads in the BAM files were aligned to in the column labeled assembly. List each file in a separate row. Each of these files then has to be uploaded separately to the FTP server, so you might end up adding hundreds of files (read more here: https://www.ncbi.nlm.nih.gov/sra/docs/submitformats/).
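With hundreds of BAM files, typing one row per file by hand invites typos, so the rows can be generated from the folder contents. A minimal sketch, assuming each BAM file corresponds to one sample whose name is the file name without ".bam", and using hypothetical column names (sample_name, filetype, filename, assembly); take the real headers from the SRA metadata template. The assembly name final.assembly.fasta is the one uploaded later in this guide.

```shell
#!/usr/bin/env bash
# Sketch: write one SRA metadata row per BAM file, with the same reference
# assembly name in every row. Column names are HYPOTHETICAL; copy the real
# ones from the SRA metadata template.
set -euo pipefail

# write_sra_metadata DIR ASSEMBLY OUT
write_sra_metadata() {
    local dir="$1" assembly="$2" out="$3" bam name
    printf 'sample_name\tfiletype\tfilename\tassembly\n' > "$out"
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue   # no BAM files: header row only
        name="$(basename "$bam")"
        # Assumption: sample name = file name minus the .bam extension
        printf '%s\tbam\t%s\t%s\n' "${name%.bam}" "$name" "$assembly" >> "$out"
    done
}

# Example: rows for every BAM file in the current folder
write_sra_metadata . final.assembly.fasta sra_metadata.tsv
```

The file names in the generated sheet are exactly the names the FTP upload must deliver, which makes mismatches easier to spot.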

6. Create a new folder on the cluster and copy all the BAM files and the FASTA file into it.
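The folder setup can be done in one go. A minimal sketch; the function name stage_upload and the paths in the example comment are hypothetical, so point them at wherever your alignments and assembly actually live.

```shell
#!/usr/bin/env bash
# Sketch: gather the BAM files and the reference FASTA into one upload folder.
# The function name and example paths are HYPOTHETICAL placeholders.
set -euo pipefail

# stage_upload SRC_DIR FASTA UPLOAD_DIR
stage_upload() {
    local src="$1" fasta="$2" dest="$3"
    mkdir -p "$dest"            # create the folder if it does not exist yet
    cp "$src"/*.bam "$dest"/    # errors out if SRC_DIR holds no BAM files
    cp "$fasta" "$dest"/        # the assembly named in the metadata sheet
    ls -lh "$dest"              # quick look at what will be uploaded
}

# Example (hypothetical paths):
# stage_upload ~/alignments ~/reference/final.assembly.fasta ~/sra_upload
```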

7. From this folder, upload the files to the NCBI FTP server following these instructions: https://www.ncbi.nlm.nih.gov/sra/docs/submitportal/. Here are the commands I used (text after # is an explanation, not part of the command):

ftp -i   # -i turns off per-file prompting, so mput does not ask before each file

open ftp-private.ncbi.nlm.nih.gov

Name (ftp-private.ncbi.nlm.nih.gov:u6007910): enter the username shown on the SRA submission page (specific to your submission)

Password: enter the password shown on the SRA submission page (specific to your submission)

cd uploads/schaturvedi@aggiemail.usu.edu_V0dwUzKn   # cd into your own upload folder instead

mkdir new_folder   # create a new folder for your files; do not upload files into the root directory

cd new_folder

# upload the FASTA assembly and the BAM files (or a compressed archive of them) into the new folder

mput *.bam   # or: mput * to upload everything in the local folder

put final.assembly.fasta

put melissaHostUse.tar.bz2
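If you would rather not type the commands interactively, one alternative is to write them to a command file and feed it to ftp. This is a sketch, not NCBI's documented procedure: USERNAME, PASSWORD, the upload folder and new_folder are hypothetical placeholders for the values on your submission page, and it only picks up .bam and .fasta files (add other patterns, such as an archive, if you need them).

```shell
#!/usr/bin/env bash
# Sketch: write an ftp command file that uploads every BAM file plus the
# FASTA assembly from DIR. USERNAME/PASSWORD and the folder names are
# HYPOTHETICAL placeholders; use the values from your SRA submission page.
set -euo pipefail

# write_ftp_script DIR UPLOAD_DIR SUBFOLDER OUT
write_ftp_script() {
    local dir="$1" upload_dir="$2" sub="$3" out="$4" f
    {
        echo "open ftp-private.ncbi.nlm.nih.gov"
        echo "user USERNAME PASSWORD"
        echo "cd $upload_dir"
        echo "mkdir $sub"
        echo "cd $sub"
        for f in "$dir"/*.bam "$dir"/*.fasta; do
            if [ -e "$f" ]; then
                echo "put $f $(basename "$f")"   # local path, remote name
            fi
        done
        echo "bye"
    } > "$out"
}

# Example (hypothetical folder name):
# write_ftp_script . uploads/YOUR_UPLOAD_FOLDER new_folder upload_cmds.txt
```

After replacing the placeholders you could run ftp -i -n < upload_cmds.txt; -n stops ftp from prompting for a login so the "user" line in the file is used instead. The file contains your upload password, so delete it once the transfer finishes.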

Once the files are uploaded, it takes 10-15 minutes for them to appear on your SRA submission page. When they do, select the folder as your preload folder and the files will be listed automatically.

Complete the submission and wait a few minutes to be assigned your BioProject accession number. This is the number you need for the manuscript.

Here is a great resource for troubleshooting, with a step-by-step guide and pictures: https://github.com/CandiceChuDVM/RNA-Seq/wiki/Tutorial:-How-to-upload-your-data-to-the-evil-Sequence-Read-Archive-(SRA)%3F

# Convert BAM to SAM (-h keeps the header lines in the SAM output)
for file in ./*.bam; do
    echo "$file"
    samtools view -h "$file" > "${file%.bam}.sam"
done
