Path to the directory : /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation
Here are the files saved in the directory for this project :
FOLDER : fasta_genome
This folder contains the fasta genome and the index file associated with it, generated while running the different steps of the alignment. To create the .fai index file for variant calling I used the following : samtools faidx <.fasta>
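To show what the .fai index holds, here is a minimal Python sketch that computes the five columns samtools faidx records per sequence (name, length, byte offset of the first base, bases per line, bytes per line). It is only an illustration and not a replacement for samtools faidx; the function name fai_records and the demo sequences are made up for this example.

```python
def fai_records(lines):
    """Return (name, length, offset, linebases, linewidth) for each sequence.

    `lines` is an iterable of byte strings, e.g. a FASTA file opened in "rb" mode.
    """
    records = []
    name = None
    length = seq_offset = linebases = linewidth = 0
    offset = 0  # byte position of the start of the current line
    for line in lines:
        if line.startswith(b">"):
            if name is not None:
                records.append((name, length, seq_offset, linebases, linewidth))
            # The sequence name is the header text up to the first whitespace.
            name = line[1:].split()[0].decode()
            length = linebases = linewidth = 0
            seq_offset = offset + len(line)  # first base follows the header line
        elif name is not None and line.strip():
            bases = len(line.rstrip(b"\r\n"))
            if linebases == 0:
                # The first sequence line defines the line layout.
                linebases, linewidth = bases, len(line)
            length += bases
        offset += len(line)
    if name is not None:
        records.append((name, length, seq_offset, linebases, linewidth))
    return records


# Hypothetical in-memory FASTA, used only for demonstration.
demo = [b">scaffold_1 example\n", b"ACGT\n", b"AC\n", b">scaffold_2\n", b"GGG\n"]
for rec in fai_records(demo):
    print("\t".join(str(x) for x in rec))
```

For a real genome the same function could be called on open("<.fasta>", "rb"), but samtools faidx remains the tool actually used for this project.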
FOLDER : sampleinfo
accessionlist.txt : list of accession website URLs from Owen's Excel file
accessionnumbers.txt : accession numbers of all the sequences that needed to be downloaded. I created this list from the accessionlist.txt above using :
Timema_climate_adaptation_info_for_Sam.xlsx : master file which Owen sent me with all the info. I split all the sheets into separate files as follows.
Timema_climate_adaptation_info_accession.csv : accession number details
Timema_climate_adaptation_info_climatedata.csv : climate data details
Timema_climate_adaptation_info_sequenceinfo.csv : sequences from NCBI and their info
Timema_climate_adaptation_info_siteinfo.csv : information about the locations where samples were collected for this project
Timema_distribution_map_moes-pops-only.jpg : map for timema
entropy_plots_by_species-site-host.pdf : entropy plots created by Owen.
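The command I used to build accessionnumbers.txt from accessionlist.txt is not recorded above. The following Python sketch only shows one way such a list could be derived, under the assumption that every URL contains an SRA run accession of the form SRR, ERR or DRR followed by digits. The regular expression, the function name extract_accessions and the example URLs are all hypothetical.

```python
import re

# Assumption: accessions look like SRR1234567 (or ERR/DRR); adjust if the
# URLs in accessionlist.txt use a different accession type.
ACCESSION_RE = re.compile(r"\b([SED]RR\d+)\b")


def extract_accessions(lines):
    """Return unique accessions in order of first appearance."""
    found = []
    for line in lines:
        match = ACCESSION_RE.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


# Made-up URLs for demonstration only.
demo_urls = [
    "https://example.org/sra/SRR0000001",
    "https://example.org/sra?run=SRR0000002",
    "https://example.org/sra/SRR0000001",
]
print("\n".join(extract_accessions(demo_urls)))
```

With the real file, the lines would come from open("accessionlist.txt") and the result would be written one accession per line.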
FOLDER : scripts
downloadfastq.sh : bash script to download the sequences from the NCBI SRA. This script goes through the accessionnumbers.txt file above and downloads the fastq file for each accession number.
unzip.sh : bash script to unzip fastq.gz files downloaded from SRA to fastq files
runbwa.sh : This was my simple bash version of running bwa from Zach's EvoGenomics class.
wrap_qsub_slurm_bwa_aln.pl : wrapper script used to run bwa alignments using bwa aln.
wrap_qsub_slurm_bwa_mem.pl : wrapper script used to run bwa alignments using bwa mem.
wrap_qsub_slurm_sam2bam.pl : wrapper script to convert sam files to bam format
timemaVariantCalling.sh : shell script to do variant calling for each of the species
vcf2gl.pl : convert vcf file to genotype likelihood file
variantUnfiltered.sh : convert bcf to vcf without any filtering
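As an illustration of the decompression step that unzip.sh performs, here is a minimal Python sketch that expands one fastq.gz file into a plain fastq file. It is not the bash script itself; the function name gunzip_fastq and its arguments are assumptions made for this example.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_fastq(gz_path, out_dir):
    """Decompress <name>.fastq.gz into out_dir/<name>.fastq and return the new path."""
    gz_path = Path(gz_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Dropping the final suffix turns "sample.fastq.gz" into "sample.fastq".
    out_path = out_dir / gz_path.with_suffix("").name
    # Stream the data in chunks so large fastq files are never held in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

To process a whole folder, the function could be called for every path returned by Path("fastqgzfiles").glob("*.fastq.gz"); the original .gz files are left in place.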
FOLDER : srafiles_ncbi : all the corresponding sra files are saved in this folder.
FOLDER : alignments :
Details for this folder will be added shortly.
FOLDER: fastqfiles_ncbi : all the fastq files downloaded from the SRA are saved in this folder. Within it, the zipped fastq files downloaded from the SRA are saved in the subfolder fastqgzfiles.
FOLDER: samsaifiles: all the sam files and index files. Folder slurmfiles contains slurm output of the bwa runs.
wrap_qsub_slurm_sam2bam.pl : perl wrapper script to convert sam files to bam format (the same script listed under FOLDER : scripts).
FOLDER: bamfiles
Contains 8 species folders : bart, cali, chum, cris, knul, land, podu, popp. The respective bam, sorted.bam and bai (index) files are in each of these folders. Refer to the variant calling page for the number of files in each folder.
Each folder contains :
*.bam files
*.sorted.bam.bai files
variantsTimema*.bcf = bcf file for this species
variantsTimema*.vcf = vcf file for this species
variantsTimema*_unfiltered.vcf = vcf file for this species without any filtering applied
findreadcounts.py = python script to determine total number of reads
out.txt = contains mapped and unmapped reads information
This folder also contains 8 text files, one per species, each containing the sample info for every alignment of that species.
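The mapped and unmapped read counts reported in out.txt can be illustrated with a minimal Python sketch. It is not the findreadcounts.py script itself: it assumes the alignments are available as SAM text (for example the output of samtools view on a bam file), uses the SAM FLAG bit 0x4 to mark unmapped reads, and skips secondary and supplementary records so each read is counted once. The function name count_reads and the demo records are made up.

```python
def count_reads(sam_lines):
    """Count mapped and unmapped primary records in SAM-formatted text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x900:
            continue  # 0x100 secondary or 0x800 supplementary alignment
        if flag & 0x4:
            unmapped += 1  # 0x4 means the read is unmapped
        else:
            mapped += 1
    return {"total": mapped + unmapped, "mapped": mapped, "unmapped": unmapped}


# Hypothetical records for demonstration only.
demo = [
    "@SQ\tSN:scaffold_1\tLN:1000",
    "read1\t0\tscaffold_1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read1\t256\tscaffold_1\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print(count_reads(demo))
```

Whether the original script also excludes secondary and supplementary alignments is not stated here, so its totals may differ from this sketch.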
FOLDER: variantcalling
variantsTimema*.vcf = variant calling file for all species
timemaVariants_*.gl = genotype likelihood file generated from the vcf file for each species (created by running the vcf2gl.pl script)
vcf2gl.pl = script to convert vcf to gl files
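To illustrate the kind of information a vcf to gl conversion works from, here is a minimal Python sketch that pulls the per-sample PL (phred-scaled genotype likelihood) values out of one VCF data line. It is not the vcf2gl.pl script, and the exact layout of the .gl files it writes is not described here. The sketch assumes biallelic sites (three PL values per sample) and, as a further assumption, encodes missing genotypes as 0,0,0; the function name vcf_line_to_pl is hypothetical.

```python
def vcf_line_to_pl(line):
    """Return (locus_id, flat list of PL values) for one VCF data line, or None."""
    if line.startswith("#"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT/sample columns
    chrom, pos, fmt = fields[0], fields[1], fields[8]
    keys = fmt.split(":")
    if "PL" not in keys:
        return None
    pl_index = keys.index("PL")
    values = []
    for sample in fields[9:]:
        parts = sample.split(":")
        raw = parts[pl_index].split(",") if pl_index < len(parts) else []
        if len(raw) == 3 and all(v.isdigit() for v in raw):
            values.extend(int(v) for v in raw)
        else:
            values.extend([0, 0, 0])  # assumption: missing data as equal likelihoods
    return f"{chrom}:{pos}", values


# Made-up VCF line for demonstration only.
demo = "scaffold_1\t100\t.\tA\tG\t50\t.\tDP=10\tGT:PL\t0/1:20,0,30\t./.:."
print(vcf_line_to_pl(demo))
```

Applied line by line to a variantsTimema*.vcf file, this yields one locus identifier and one row of likelihood values per variant, which is the raw material for a genotype likelihood file.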