filelog_cluster

Path to the directory : /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation

Here are the files saved in the directory for this project :

FOLDER : fasta_genome

This folder contains the fasta genome and the index files generated for it during the different steps of the alignment. To create the .fai index file for variant calling I used : samtools faidx <.fasta>
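For reference, a .fai index is a small tab-separated table with one row per sequence: name, length, byte offset of the first base, bases per line, and bytes per line. The Python sketch below only illustrates that layout on an in-memory fasta. The function name fai_records is hypothetical, and the index in this folder was made with samtools faidx, not with this code.

```python
def fai_records(fasta_bytes):
    """Return one (name, length, offset, linebases, linewidth) tuple per sequence.

    Illustrative only: assumes a well-formed fasta whose sequence lines
    all have the same width (except the last line of each record).
    """
    records = []
    pos = 0          # byte position of the current line in the file
    current = None   # mutable record for the sequence being read
    for line in fasta_bytes.splitlines(keepends=True):
        if line.startswith(b">"):
            if current is not None:
                records.append(tuple(current))
            name = line[1:].split()[0].decode()
            # the first base starts right after the header line
            current = [name, 0, pos + len(line), 0, 0]
        elif current is not None:
            bases = len(line.rstrip(b"\r\n"))
            if current[3] == 0 and bases > 0:
                current[3] = bases        # bases per line
                current[4] = len(line)    # bytes per line, newline included
            current[1] += bases
        pos += len(line)
    if current is not None:
        records.append(tuple(current))
    return records


example = b">scaf1 desc\nACGT\nACGT\nAC\n>scaf2\nGG\n"
for rec in fai_records(example):
    print("\t".join(str(x) for x in rec))
```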

FOLDER : sampleinfo

accessionlist.txt : list of accession website URLs from Owen's excel file
accessionnumbers.txt : accession numbers of all the sequences that needed to be downloaded. I created this list from the accessionlist.txt above using :
Timema_climate_adaptation_info_for_Sam.xlsx : master file which Owen sent me with all the info. I split all the sheets into separate files as follows.
Timema_climate_adaptation_info_accession.csv : accession number details
Timema_climate_adaptation_info_climatedata.csv : climate data details
Timema_climate_adaptation_info_sequenceinfo.csv : sequences from ncbi and info
Timema_climate_adaptation_info_siteinfo.csv : information about locations where samples were collected from for this project
Timema_distribution_map_moes-pops-only.jpg : map for timema
entropy_plots_by_species-site-host.pdf : entropy plots created by Owen.

FOLDER : scripts

downloadfastq.sh : bash script to download the sequences from the ncbi SRA. This script goes through the accessionnumbers.txt file above and downloads the fastq relevant to the accession number.
unzip.sh : bash script to unzip fastq.gz files downloaded from SRA to fastq files
runbwa.sh : This was my simple bash version of running bwa from Zach's EvoGenomics class.
wrap_qsub_slurm_bwa_aln.pl : wrapper script used to run bwa alignments using bwa aln.
wrap_qsub_slurm_bwa_mem.pl : wrapper script used to run bwa alignments using bwa mem.
wrap_qsub_slurm_sam2bam.pl : wrapper script to convert sam files to bam format
timemaVariantCalling.sh : shell script to do variant calling for each of the species
vcf2gl.pl : convert vcf file to genotype likelihood file
variantUnfiltered.sh : convert bcf to vcf without any filtering

FOLDER : srafiles_ncbi : all the corresponding sra files are saved in this folder.

FOLDER : alignments :

Details of this folder will be added shortly.

FOLDER: fastqfiles_ncbi : all the fastq files downloaded from the SRA are saved in this folder. The zipped fastq files (fastq.gz) downloaded from the SRA are saved in the subfolder fastqgzfiles.
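The unzip step (unzip.sh in the scripts folder) is a bash script and is not reproduced here. As a rough equivalent, the Python sketch below decompresses every *.fastq.gz file in one directory into another. The function name gunzip_all and the example paths are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(gz_dir, out_dir):
    """Decompress every *.fastq.gz in gz_dir into out_dir; return the new paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for gz_path in sorted(Path(gz_dir).glob("*.fastq.gz")):
        out_path = out_dir / gz_path.name[:-3]  # drop the trailing ".gz"
        # stream the data so large fastq files are never held in memory
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written


if __name__ == "__main__":
    # assumed layout: run from inside fastqfiles_ncbi
    for path in gunzip_all("fastqgzfiles", "."):
        print(path)
```

The original .gz files are left in place, so the zipped copies in fastqgzfiles stay available.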

FOLDER: samsaifiles: all the sam files and index files. Folder slurmfiles contains slurm output of the bwa runs.

wrap_qsub_slurm_sam2bam.pl : perl wrapper script to convert sam files to bam format.

FOLDER: bamfiles

Contains 8 species folders : bart, cali, chum, cris, knul, land, podu, popp. The respective bam, sorted.bam and bai (index) files are in each of these folders. Refer to the variant calling page for the number of files in each folder.

Each folder contains :

*.bam files
*.sorted.bam.bai files
variantsTimema*.bcf = bcf file for this species
variantsTimema*.vcf = vcf file for this species
variantsTimema*_unfiltered.vcf = vcf file for this species without any filtering
findreadcounts.py = python script to determine total number of reads
out.txt = contains mapped and unmapped reads information
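The contents of findreadcounts.py are not shown on this page. As a sketch of one way to get mapped and unmapped read counts, the Python below tallies SAM records by their FLAG field (bit 0x4 means unmapped). It assumes text SAM lines as input, for example piped from samtools view, and it skips secondary and supplementary records so that a read is not counted twice. The function name count_reads is hypothetical.

```python
import sys


def count_reads(sam_lines):
    """Count mapped and unmapped reads from an iterable of SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) record
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return {"total": mapped + unmapped, "mapped": mapped, "unmapped": unmapped}


if __name__ == "__main__" and not sys.stdin.isatty():
    # assumed usage: samtools view file.bam | python this_script.py
    print(count_reads(sys.stdin))
```

For paired-end data each mate is its own SAM record, so the totals here are per record rather than per pair.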

This folder also contains 8 text files, one per species, each containing the sample info for every alignment of that species.

FOLDER: variantcalling

variantsTimema*.vcf = variant calling file for all species
timemaVariants_*.gl = genotype likelihood file created from the vcf file for each species (by running the vcf2gl.pl script)
vcf2gl.pl = script to convert vcf to gl files
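The exact output layout of vcf2gl.pl is not documented on this page. As a rough illustration of the conversion idea, the Python sketch below takes one vcf data line, finds the PL (phred-scaled genotype likelihood) entry in the FORMAT column, and writes a locus id followed by the PL values of every sample. The "scaffold:position" id, the space-separated layout, the "0 0 0" placeholder for missing samples, and the function name vcf_line_to_gl are all assumptions, not a description of the perl script.

```python
def vcf_line_to_gl(line):
    """Turn one vcf data line into 'chrom:pos PL PL PL ...' (sketch only)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], fields[1]
    fmt_keys = fields[8].split(":")      # FORMAT column, e.g. GT:PL
    pl_idx = fmt_keys.index("PL")        # raises ValueError if PL is absent
    values = []
    for sample in fields[9:]:
        parts = sample.split(":")
        pl = parts[pl_idx] if pl_idx < len(parts) else "."
        if pl in (".", ""):
            # assumption: a missing sample gets equal likelihoods
            values.extend(["0", "0", "0"])
        else:
            values.extend(pl.split(","))
    return f"{chrom}:{pos} " + " ".join(values)


example = "scaf1\t100\t.\tA\tG\t50\t.\tDP=10\tGT:PL\t0/1:10,0,20\t1/1:30,5,0"
print(vcf_line_to_gl(example))
```

For a biallelic site each sample contributes three values, one per genotype (homozygous reference, heterozygous, homozygous alternate).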
