Post date: May 07, 2014 5:29:28 PM
We are going to move forward with the most recent de novo assembly of the sosorum data, "Esosorum_40mil_denovo_mmp92.ace" (28% assembled, 583340 contigs), because this is probably not abnormal for salamanders. Once I have a reference sequence, the reference-based assembly will be done at USU with BWA.
At Wyoming I:
#Made consensus sequence (I chose contigs that were 80 to 96 bases in length because that's what I did with all three other taxa):
cp /data/local/lycaeides_gbs/Scripts/pruneContigs.pl ./
perl pruneContigs.pl Esosorum_40mil_denovo_mmp92.ace 80 96
grep Contig pruned_Esosorum_40mil_denovo_mmp92.ace | wc
Number of contigs after pruning: 579590
mv pruned_Esosorum_40mil_denovo_mmp92.ace pruned_Esosorum_40mil_denovo_mmp92.fasta
#Then I tried to assemble these sequences to themselves to identify similar, potentially repetitive contigs.
cp Esosorum_40mil_denovo.smng.txt Esosorum_40mil_qc.smng.txt
emacs Esosorum_40mil_qc.smng.txt
#I changed:
loadSeq file:
"/data/local/august13_ut/pruned_Esosorum_40mil_denovo_mmp92.fasta"
setParam minMatchPercent:84
RealignContigs
saveProject file: "/data/local/august13_ut/Esosorum_40mil_qc_mmp84.fasta"
format:Phrap
saveReport file: "/data/local/august13_ut/Esosorum_40mil_qc_mmp84.report.txt"
writeUnassembledSeqs file: "/data/local/august13_ut/Esosorum_40mil_qc_mmp84.fasta"
closeProject
smng Esosorum_40mil_qc.smng.txt
574953 contigs did not assemble to any other contig (these are the good ones, since they are less likely to be repetitive)
Then I copied the relevant sosorum stuff from sunflower to greenhouse to USU:
Esosorum_40mil_qc_mmp84.fasta
parsed_clean_TXState14_NoIndex_L005_R1_001.fastq
Esosorum_barcodes.csv
Then I mounted the USU projects folder on my computer, went to /Volumes/labs/evolution/projects/, edited the script that splits the fastq file into one file per individual, and made a new file listing all the individuals:
cp ../example_scripts/splitFastq.pl ./
#I changed the regular expression to: if (/^\@(E\-BS\-[CW]\-[A-Z0-9]+\-\d+)/){
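Before running the full split, the pattern above can be checked against a few header lines. This is only a sketch: it translates the Perl expression to an extended regular expression for sed (`\d` becomes `[0-9]`), and the function name `match_id` and the sample header in the comment are made up for illustration, not taken from the real fastq file.

```shell
# ERE translation of the Perl pattern used in splitFastq.pl (assumed equivalent).
id_pattern='^@(E-BS-[CW]-[A-Z0-9]+-[0-9]+)'

# Hypothetical helper: print the captured individual ID for one header line,
# or print nothing if the line does not match the pattern.
match_id() {
    printf '%s\n' "$1" | sed -n -E "s/${id_pattern}.*/\1/p"
}

# Example call with a made-up header:
# match_id '@E-BS-C-A12-003'
```

A header that prints nothing here would also be skipped by the split script, so it is worth trying one header per ID style in the barcode file.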
less Esosorum_barcodes.csv
cut -f 3 -d "," Esosorum_barcodes.csv > ids.txt
#I deleted the first row of that file in emacs.
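The cut and the manual header deletion could probably be done in one step. A minimal sketch, assuming the barcode file is a simple comma-separated file with a single header row and no quoted commas; `extract_ids` is a hypothetical helper name, not part of the lab scripts.

```shell
# Hypothetical helper: print one column of a CSV file, skipping the header row.
# $1 = CSV file, $2 = 1-based column number
extract_ids() {
    tail -n +2 "$1" | cut -d "," -f "$2"
}

# With the files from this post it would be called like this:
# extract_ids Esosorum_barcodes.csv 3 > ids.txt
```

This avoids opening the file in emacs, but it is still worth looking at the first few lines of ids.txt afterwards.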
#I then had to raise the limit on the number of files I can have open at once on Mac OS X, because the split script writes one file per individual:
ulimit -S -n 2048
ulimit -a
perl splitFastq.pl ids.txt parsed_clean_TXState14_NoIndex_L005_R1_001.fastq
#This will run for a few hours. To check that it is working:
head -n 8 E-BS*fastq
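Besides looking at the first records, counting reads per individual gives a quick sense of whether any file is empty or unusually small. A rough sketch, assuming every fastq record takes exactly four lines (no wrapped sequences); `count_reads` is a hypothetical helper name.

```shell
# Hypothetical helper: count reads in a fastq file, assuming 4 lines per record.
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Per-individual read counts for the split files:
# for f in E-BS*fastq; do printf '%s\t%s\n' "$f" "$(count_reads "$f")"; done
```

Counts taken while the split is still running will be partial, so they are only meaningful once splitFastq.pl has finished.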
#Then I'll follow Zach's directions for the reference-based assembly in BWA: https://sites.google.com/site/gompertlabnotes/home/lab-protocols/alignment-and-variant-calling