Lab Code and Datasets

Feel free to use any code we've shared publicly on GitHub. If you find an error, please let us know and we'd be happy to correct it. We also ask that you properly cite our work; each repository has an associated DOI.

Our GitHub repositories:

Random-scripts (DOI: 10.5281/zenodo.10287)

Here we will upload various miscellaneous programs we use to make our lives easier. Perhaps they can make your life easier too!

(1) 'sort_by_chr.sh' #I found it frustrating to sort large files containing multiple chromosomes, so I wrote a shell script that takes a list of chromosomes and an input file, sorts each chromosome on its own, and writes everything to a single output.

(2) 'split_by_chr.pl' #takes a FASTA input and writes an individual output FASTA file for each FASTA entry. Set up to exclude unknown chromosomes etc. (modify for your organism).
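
The per-chromosome sorting idea behind item (1) can be sketched in Python as follows. This is only an illustration, not the contents of 'sort_by_chr.sh': the function name, the tab-delimited layout, and the column positions (chromosome in column 1, position in column 2, no header lines) are all assumptions made for the example.

```python
from collections import defaultdict


def sort_by_chromosome(lines, chromosomes, chr_col=0, pos_col=1):
    """Group tab-delimited lines by chromosome, then sort each group by position.

    Hypothetical helper: column indices and the tab delimiter are assumptions.
    Lines whose chromosome is not in `chromosomes` are dropped.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        groups[fields[chr_col]].append((int(fields[pos_col]), line))

    out = []
    for chrom in chromosomes:
        # Sort on position only; Python's sort is stable, so ties keep input order.
        for _, line in sorted(groups.get(chrom, []), key=lambda item: item[0]):
            out.append(line)
    return out
```

Sorting each chromosome separately keeps every sort small and lets the caller decide the chromosome order (for example chr1, chr2, ..., chrX) instead of relying on lexical order.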

VCF-conversion-tools (DOI: 10.5281/zenodo.10288)

Here we have uploaded a variety of programs we regularly use to convert between VCF and other file formats, such as PHASE, fastPHASE, MS and LDhat.

(1) Files in this folder will help convert between VCF and other formats:

'fastPHASE2VCF.pl' #converts fastPHASE output to VCF

'thinVCF.pl' #thins VCF files (slightly different algorithm from vcftools, removes all but one site if sites are close to each other)

'vcf_merge.pl' #merges multiple VCF files into single file

'vcf2fastPHASE.pl' #converts a VCF file to fastPHASE input for autosomes and females

'vcf2fastPHASE_4males.pl' #converts VCF file to fastPHASE input for males on the X

'vcf2MS.pl' #converts VCF to MS format
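
As a rough illustration of distance-based thinning, the sketch below keeps a site only when it lies at least a minimum distance from the last site kept on the same chromosome. This is one simple reading of "removes all but one site if sites are close to each other" and is not necessarily the algorithm used in 'thinVCF.pl'; the function name, the (chromosome, position) input, and the `min_dist` parameter are assumptions made for the example.

```python
def thin_sites(sites, min_dist):
    """Return a thinned list of (chrom, pos) tuples.

    Hypothetical helper: assumes `sites` is sorted by chromosome, then position.
    A site is kept if it starts a new chromosome or is at least `min_dist`
    bases past the previously kept site.
    """
    kept = []
    last_chrom, last_pos = None, None
    for chrom, pos in sites:
        if chrom != last_chrom or pos - last_pos >= min_dist:
            kept.append((chrom, pos))
            last_chrom, last_pos = chrom, pos
    return kept
```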

(2) A few other programs are included, as they can be used with the others to convert to and from LDhat formats:

'fastPHASE2LDhat.pl' #converts fastPHASE output to LDhat input

'MS2LDhat.pl' #converts MS input to LDhat input

'MS2PHASE.pl' #converts MS input to PHASE input

(3) The following programs are useful for creating BED files of overlapping subsets of SNPs from a VCF file:

'CreateChrBedFromVCF.pl' #creates BED file with bins of 4k SNPs, 100 overlapping, useful for running LDhat

'CreateChrBedFromVCF2.pl' #creates BED file with bins of 400 SNPs, 100 overlapping, useful for running PHASE
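
The overlapping-bin idea can be sketched as follows, assuming "100 overlapping" means that consecutive bins share 100 SNPs. The function names and the choice to report each bin as a 0-based, half-open BED interval spanning its first and last SNP are assumptions made for the example, not a description of the Perl scripts themselves.

```python
def snp_windows(positions, size, overlap):
    """Return (bed_start, bed_end) for bins of `size` SNPs sharing `overlap` SNPs.

    Hypothetical helper: `positions` are sorted 1-based VCF positions on one
    chromosome. BED starts are 0-based, so the first SNP's position is reduced by 1.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    windows = []
    start = 0
    n = len(positions)
    while start < n:
        end = min(start + size, n)  # the last bin may hold fewer than `size` SNPs
        windows.append((positions[start] - 1, positions[end - 1]))
        if end == n:
            break
        start += step
    return windows


def bed_lines(chrom, positions, size, overlap):
    """Format the windows as tab-delimited BED lines (chrom, start, end)."""
    return [f"{chrom}\t{s}\t{e}" for s, e in snp_windows(positions, size, overlap)]
```

With `size=4000, overlap=100` this mirrors the bin settings described for LDhat, and `size=400, overlap=100` those described for PHASE.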

Code from the Great Ape recombination analysis (GarMap)

This repository houses all the files and code that were generated as part of the GarMap project. DOI: 10.5281/zenodo.13975