After Matt ran the analyses, he sent me a list of transcripts to annotate.
I double-checked the scaffold positions for the transcripts in multiple raw fastq files from Matt, and then did the following to create a file with the cuffIDs, their scaffolds, and their start and stop positions:
Folder: /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/matt_transcriptome/cufflinks/genomeref/transcript_annot
cut -f1 ../KS001.gtf > scaffold_ks001
cut -f9 ../KS001.gtf | cut -d';' -f2 | cut -d' ' -f3 > cuffids_ks001
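The commands above do not show how startstop_ks001 (read in the R step) was created. A reasonable guess is that it holds columns 4 and 5 of the same GTF file, which are the feature start and end in the GTF format. The following is a minimal Python sketch, under that assumption, that pulls all four values from each line in one pass. The function names parse_gtf_line and read_gtf are hypothetical and are not part of the original workflow.

```python
import re

def parse_gtf_line(line):
    """Return (transcript_id, scaffold, start, stop) for one GTF line, or None.

    Assumes tab-separated GTF with the scaffold in column 1, start and stop
    in columns 4 and 5, and a transcript_id "..." entry in column 9.
    """
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 9:
        return None  # skip comment lines and malformed lines
    match = re.search(r'transcript_id "([^"]+)"', fields[8])
    if match is None:
        return None  # no transcript id on this line
    return (match.group(1), fields[0], int(fields[3]), int(fields[4]))

def read_gtf(path):
    """Parse every usable line of a GTF file into a list of tuples."""
    rows = []
    with open(path) as handle:
        for line in handle:
            row = parse_gtf_line(line)
            if row is not None:
                rows.append(row)
    return rows
```

Calling read_gtf("../KS001.gtf") would give one tuple per GTF line, which keeps the ID, scaffold, and positions aligned without relying on three separate files having the same row order.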
##in R
trlist<-read.table("transcript_list.txt", header=F)
trids<-unique(trlist[,1])
#get the files with all the scaffolds and transcript ids
sp<-read.table("scaffold_ks001", header=F)
cid<-read.table("cuffids_ks001", header=F)
pos<-read.table("startstop_ks001", header=F)
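#combine columns: cuffID, scaffold, start/stop (the three files must have the same number of rows, in the same order)
final<-cbind(cid, sp, pos) #cbind on data frames already returns a data frame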
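#loop over the requested transcript ids, pull the matching unique rows, and write them out
#note: append=TRUE adds to any existing scafs_transcriptids.txt, so delete the old file before re-running
for (i in trids){
  scafids<-unique(final[which(final[,1] == i),])
  print(scafids)
  write.table(scafids, "scafs_transcriptids.txt", append=TRUE, row.names=FALSE, col.names=FALSE, quote=FALSE)
}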
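For reference, the same crosscheck can be sketched in Python. This is only an illustration, assuming each row is a (cuffID, scaffold, start, stop) tuple. The names select_transcripts and write_rows are hypothetical. Unlike the R loop, it keeps rows in their input order rather than grouping them by transcript ID, and it overwrites the output file instead of appending to it.

```python
def select_transcripts(rows, wanted_ids):
    """Keep unique rows whose first element (the cuffID) is in wanted_ids."""
    wanted = set(wanted_ids)
    seen = set()
    selected = []
    for row in rows:
        if row[0] in wanted and row not in seen:
            seen.add(row)          # remember the row so duplicates are dropped
            selected.append(row)
    return selected

def write_rows(rows, path):
    """Write one space-separated row per line, overwriting any existing file."""
    with open(path, "w") as out:
        for row in rows:
            out.write(" ".join(str(value) for value in row) + "\n")
```

Writing with mode "w" avoids the duplicated rows that an appending write produces when the step is run twice.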
I used the following script to run the annotation: create_snp_annotations.py. This is the same script I used to annotate SNPs for the dubois data. I ran it on both the start and the stop positions of the transcripts that Matt sent me.
For each run, I used only two columns from the scafs_transcriptids.txt file: scaffold and start position, or scaffold and stop position.
python create_snp_annotations.py --map transcripts_start --ann genome_annotation.txt --out out_transcripts_start
python create_snp_annotations.py --map transcripts_stop --ann genome_annotation.txt --out out_transcripts_stop
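The two map files passed to --map above could be derived from scafs_transcriptids.txt along these lines. This is a sketch only. It assumes each line of that file is "cuffID scaffold start stop" separated by whitespace, and that the map file is simply "scaffold position" per line. The exact map format expected by create_snp_annotations.py is not stated in these notes, so check it before use. The function names are hypothetical.

```python
def split_start_stop(lines):
    """Split 'cuffID scaffold start stop' lines into start and stop map lines."""
    starts, stops = [], []
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or incomplete lines
        _cuff_id, scaffold, start, stop = parts[:4]
        starts.append(f"{scaffold} {start}")
        stops.append(f"{scaffold} {stop}")
    return starts, stops

def write_map_files(in_path, start_path, stop_path):
    """Read the combined table and write the two map files."""
    with open(in_path) as handle:
        starts, stops = split_start_stop(handle)
    with open(start_path, "w") as out:
        out.write("\n".join(starts) + "\n")
    with open(stop_path, "w") as out:
        out.write("\n".join(stops) + "\n")
```

For example, write_map_files("scafs_transcriptids.txt", "transcripts_start", "transcripts_stop") would produce both inputs in one step.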
On July 10th, 2020, Su'ad asked me to send back annotations for her result tables. I wrote a script (transcript_annot.py) for this, created the tables (*annot.csv), and sent them back to her. In the output tables, I manually edited the column names and filled a few empty columns with 0. The files are saved in the suad_results_annot folder inside the transcript_annot folder above.
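The zero-filling was done by hand, but if the tables need to be regenerated it could be scripted. The following is a minimal sketch, assuming the tables are ordinary comma-separated files. The names fill_row and fill_empty_cells are hypothetical and are not part of transcript_annot.py.

```python
import csv

def fill_row(row, fill="0"):
    """Replace empty or whitespace-only cells in one row with the fill value."""
    return [cell if cell.strip() else fill for cell in row]

def fill_empty_cells(in_path, out_path, fill="0"):
    """Copy a CSV file, filling empty cells in the data rows.

    The first row is treated as the header and copied unchanged.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for index, row in enumerate(reader):
            writer.writerow(row if index == 0 else fill_row(row, fill))
```

Writing to a separate output path keeps the original table intact, so the filled version can be compared against it.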