Sedgwick Reserve, where I studied pollen and seed dispersal in Quercus agrifolia and Q. lobata.

 

Intron length distributions within the 5'UTR, coding sequence, and 3'UTR of Arabidopsis thaliana genes.

Flower of Delonix regia

Pollen from Delonix regia stained to estimate viability; viable grains are dark blue, nonviable grain is pale blue.

Umeå Plant Science Centre

Software

Here you will find software packages to perform a variety of mostly genetics-related tasks, mostly specific to projects of mine, mostly in R.

 Package  Description  Version  Download
checkNullAlleles checkNullAlleles() is an R function that checks a set of comparison genotypes for possible null alleles against a set of reference genotypes. A possible null allele is reported at a locus when the reference genotype is heterozygous and the comparison genotype is homozygous for one of the reference alleles. The function simply reports such cases; whether each one represents an actual null allele is for you to decide. You can check a single dataset for internal consistency by using it as both the reference and the comparison genotypes. The ZIP file includes the R function, example reference and comparison genotype files, and the results of a sample run using those files. Genotypes must be in GenAlEx format; use the readGenalex() function, included in the ZIP file and also listed separately below, to read this format into an R data.frame.
> source("readGenalex-0.2.R")
> source("checkNullAlleles-0.2.R")
> checkNullAlleles("reference_genotypes.txt", "compare_genotypes.txt")

checkNullAlleles 0.2: 6 reference genotypes
checkNullAlleles 0.2: 16 comparison genotypes

comp8	1 compare	1/1	2/1	2/5	*3/3*	3/1 
ref6	1 ref	        1/1	2/1	2/5	*2/3*	3/1 

comp10	1 compare	2/3	1/1	*4/4*	3/3	6/1 
ref2	1 ref	        2/3	1/1	*2/4*	3/3	6/1 

comp12	1 compare	3/3	2/1	2/2	3/1	*3/3* 
ref4	1 ref	        3/3	2/1	2/2	3/1	*2/3* 



0.2 Link in repository
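The rule that checkNullAlleles() reports on can be sketched in a few lines of R. This is only an illustration of the rule as described above, not the code shipped in the ZIP file; isPotentialNull() is a hypothetical helper name, and it looks at a single locus of a single reference/comparison pair.

```r
# Hypothetical helper, not the packaged checkNullAlleles() code.
# ref and comp are length-2 vectors holding the two alleles at one locus.
isPotentialNull <- function(ref, comp) {
  ref.het  <- ref[1] != ref[2]      # reference genotype is heterozygous
  comp.hom <- comp[1] == comp[2]    # comparison genotype is homozygous
  ref.het && comp.hom && comp[1] %in% ref
}

# Locus 4 of ref6 (2/3) versus comp8 (3/3) from the sample run
isPotentialNull(c(2, 3), c(3, 3))   # TRUE
# Locus 1 of the same pair (1/1 versus 1/1): nothing to report
isPotentialNull(c(1, 1), c(1, 1))   # FALSE
```

The flagged pairs in the sample run (marked with asterisks) all satisfy this condition.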
convertHapSNPsToArlequin convertHapSNPsToArlequin() is an R function to convert a file containing population-specific haplotype SNPs in a simple format to an Arlequin project file. A number of options are provided for controlling the output. This tool is in no way affiliated with the Arlequin project. The website for the current version of Arlequin (3.5) is http://popgen.unibe.ch/software/arlequin35/. 0.1 Link in repository
fastagc.pl fastagc.pl is a Perl script to compute GC and base content, along with a few other statistics, for FASTA-formatted sequences in the input file(s). Input sequences may contain only the bases A, C, G and T, plus N. GC content is calculated both including and excluding Ns. GC content can also be calculated for sequences broken into blocks, where the block size is the length of the FASTA input lines: if a sequence is broken into 60-bp lines, then the block size is 60 bp. GC content can be computed four ways, all of which may be produced in one run:
  • total GC content across all input sequences (this is the output shown below)
  • GC content on a block-by-block basis along all input sequences, as if they were concatenated
  • mean GC content within each block across all input sequences
  • GC content of each input sequence
$ gunzip < Cpapaya_113.fa.gz | fastagc.pl
statistic	                value
N seqs	                        5901
N FASTA lines	                5687642
N lines in longest sequence	103987
Mean block length including N	60.25
Mean block length excluding N	42.99
N letters including N	        342680090
N bases excluding N	        244530832
GC including N	                0.24919
GC excluding N	                0.34921
A                        	79583366
C	                        42687682
G	                        42705800
T	                        79553984
N	                        98149258
A%	                        0.23224
C%	                        0.12457
G%	                        0.12462
T%	                        0.23215
N%	                        0.28642
0.1 Link in repository
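The two total GC figures in the sample fastagc.pl output differ only in their denominator. As a quick check, here is the arithmetic redone in R from the base counts printed above; the variable names are just for illustration and this is not part of fastagc.pl itself.

```r
# Base counts copied from the sample fastagc.pl output above
counts <- c(A = 79583366, C = 42687682, G = 42705800,
            T = 79553984, N = 98149258)

gc <- counts[["C"]] + counts[["G"]]

# Including N: the denominator counts every letter
gc.incl.n <- gc / sum(counts)
# Excluding N: the denominator counts only A, C, G and T
gc.excl.n <- gc / sum(counts[c("A", "C", "G", "T")])

round(c(incl.N = gc.incl.n, excl.N = gc.excl.n), 5)
# incl.N 0.24919, excl.N 0.34921, matching the output above
```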
plotGFF plotGFF() is an R function to produce a simple graphical plot of the annotation contained within a GFF file. Read the file with the import.gff() function from the Bioconductor package rtracklayer before calling plotGFF(). Tracks of sites (e.g., SNP locations/frequencies) can be added below the GFF plot. plotGFF() requires the Bioconductor packages rtracklayer, IRanges and Biostrings. For a much more sophisticated plot, see e.g. gff2ps (http://genome.crg.es/software/gfftools/GFF2PS.html).

0.1 Link in repository
plotSpatialPies plotSpatialPies() is an R function to plot pie charts representing fractions of a code at particular locations. Input is a data file containing sample entries with (at least) the named columns 'site', 'lat', 'long', and 'code', and optionally a second file containing named columns 'code' and 'color' specifying the common (across-sites) color to use for the pie wedges of each code. Output can be controlled with a number of options. This was originally developed to plot proportions of particular genotypes at population locations, but it will now plot more general proportions of each 'code' at each site. Requires the library "plotrix" for its very useful floating.pie() function. The function is useful for rapid examination of data; depending on your needs, it may require modification to produce presentation-quality output.

0.2 Link in repository
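To make the input layout for plotSpatialPies() concrete, here is a minimal sketch in base R of how per-site proportions of each code can be tabulated from sample entries. The data values are made up for illustration, and this is not the plotSpatialPies() code itself.

```r
# Made-up sample entries with the four named columns described above
samples <- data.frame(
  site = c("S1", "S1", "S1", "S2", "S2"),
  lat  = c(34.70, 34.70, 34.70, 34.72, 34.72),
  long = c(-120.04, -120.04, -120.04, -120.02, -120.02),
  code = c("a", "a", "b", "b", "c")
)

counts <- table(samples$site, samples$code)  # sites-by-codes count table
props  <- prop.table(counts, margin = 1)     # each row (site) sums to 1
props
```

Each row of props holds the wedge sizes for one pie, which would be drawn at that site's 'long' and 'lat' position.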
pmi pmi() is an R function to calculate PMI (Probability of Maternal Identity) statistics, as described in Grivet et al. (2005) and Scofield et al. (2010). The script calculates all three PMI site-wise estimators qgg, rgg, and q*gg, along with weighted and unweighted means and variances and pairwise PMI statistics. Also provided are the functions pmiPooled() for pooling data subsets and then calculating PMI statistics, and pmiPlot() for plotting the sites-by-types table in a format acceptable for publication (e.g. Figure 1 of Scofield et al. 2010, see below). For more information, see http://www.eeb.ucla.edu/Faculty/Sork/Sorklab/software_pmi.html.

0.2 Link in repository
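For a single site, the two simplest PMI estimators can be sketched as follows. This assumes the commonly used definitions: qgg as the sum of squared frequencies of maternal types at the site, and rgg as its counterpart for sampling without replacement. Check these against Grivet et al. (2005) and Scofield et al. (2010) before relying on them. pmiSite() is a hypothetical helper, not the packaged pmi() code, and q*gg is not covered.

```r
# Hypothetical helper, not the packaged pmi() code.
# counts: number of seeds from each maternal source found at one site.
pmiSite <- function(counts) {
  n   <- sum(counts)
  # Assumed definition: sum of squared type frequencies
  qgg <- sum((counts / n)^2)
  # Assumed definition: chance that two seeds drawn without replacement
  # share a maternal source
  rgg <- sum(counts * (counts - 1)) / (n * (n - 1))
  c(qgg = qgg, rgg = rgg)
}

# Four seeds at a site: three from one mother, one from another
pmiSite(c(3, 1))   # qgg = 0.625, rgg = 0.5
```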
readGenalex readGenalex() is an R function to read GenAlEx-format genotype files into an annotated data.frame.  Several functions are provided for accessing and printing this data. GenAlEx and its documentation are available from http://www.anu.edu.au/BoZo/GenAlEx/
> source("readGenalex-0.2.R")
> refgt <- readGenalex("reference_genotypes.txt")
> refgt
    id Site loc1 loc1.2 loc2 loc2.2 loc3 loc3.2 loc4 loc4.2 loc5 loc5.2
1 ref1    1    3      3    2      3    2      2    3      3    4      3
2 ref2    1    2      3    1      1    2      4    3      3    6      1
3 ref3    1    3      3    2      3    2      2    3      1    4      2
4 ref4    1    3      3    2      1    2      2    3      1    2      3
5 ref5    1    1      1    1      3    2      5    3      3    6      2
6 ref6    1    1      1    2      1    2      5    2      3    3      1
> attributes(refgt)
$names
 [1] "id"     "Site"   "loc1"   "loc1.2" "loc2"   "loc2.2" "loc3"   ...
$row.names
[1] 1 2 3 4 5 6

$class
[1] "data.frame"

$n.loci
[1] 5

$ploidy
[1] 2

$n.samples
[1] 6
 
...
0.2 Link in repository
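The attributes shown above can be used to find the columns belonging to a given locus. The sketch below rebuilds two rows of the printout by hand so that it runs without readGenalex(). locusColumns() is a hypothetical helper, and it assumes that allele columns start at column 3, as they do in the printout.

```r
# Rows ref1 and ref2, first two loci only, copied from the printout above
refgt <- data.frame(
  id = c("ref1", "ref2"), Site = c(1, 1),
  loc1 = c(3, 2), loc1.2 = c(3, 3),
  loc2 = c(2, 1), loc2.2 = c(3, 1)
)
# Set by hand here to mimic the attributes readGenalex() attaches
attr(refgt, "n.loci") <- 2
attr(refgt, "ploidy") <- 2

# Hypothetical helper: indices of the columns holding the alleles of
# locus i, assuming allele columns start at column 3
locusColumns <- function(x, i) {
  p <- attr(x, "ploidy")
  2 + (i - 1) * p + seq_len(p)
}

refgt[, locusColumns(refgt, 2)]   # the loc2 and loc2.2 columns
```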