Gene Expression Analysis
bimodal_blog_post.R - R script to model bimodal gene expression using non-linear least squares regression (Blog Post)
Tiling Array Analysis
HCMV_Tiling_Array_Stat.R - R script for analysis and visualization of a custom HCMV tiling array (Huang et al. 2012)
HCMV_Annotate_Tiling_Array.pl - Perl script to average custom HCMV tiling array signal across transcripts annotated via BLAST (Huang et al. 2012)
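The averaging step described for HCMV_Annotate_Tiling_Array.pl can be illustrated with a minimal sketch. This is not the Perl script itself: the function name, the (position, signal) probe layout, and the inclusive region coordinates are all assumptions made for illustration.

```python
from statistics import mean


def average_signal_by_region(probes, regions):
    """Average probe signal within each annotated region.

    probes  : list of (position, signal) tuples (hypothetical layout)
    regions : list of (name, start, end) tuples, coordinates inclusive
    Returns a dict mapping region name to mean signal, or None when
    no probe falls inside the region.
    """
    averages = {}
    for name, start, end in regions:
        # Collect the signal of every probe that lies inside this region.
        signals = [signal for pos, signal in probes if start <= pos <= end]
        averages[name] = mean(signals) if signals else None
    return averages


# Toy data, invented for illustration only.
probes = [(100, 2.0), (150, 4.0), (300, 10.0)]
regions = [("geneA", 90, 200), ("geneB", 250, 350), ("geneC", 400, 500)]
result = average_signal_by_region(probes, regions)
```

A region with no probes is reported as None rather than 0, so that missing coverage is not mistaken for low signal.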
Next Generation Sequencing
run_bismark_se.pl - Perl script that executes all of the steps necessary for running Bismark for single-end read data. Script is meant to be run on a Linux server.
run_bismark_pe.pl - Same as above, but this script is for paired-end reads instead of single-end reads. Additionally, this script uses samtools for alignment stats and also calculates the number of CpG sites represented in a sample.
find_bismark_CpG_sites.pl - Template script to create a COHCAP (Warden et al. 2013) annotation file for a targeted BS-Seq dataset. The user will also need to provide a table of gene coordinates (GENCODE_Genes.bed in the script, downloaded here under group "Genes and Gene Prediction Tracks" and track "GENCODE Genes V12") as well as CpG island / targeted regions (UCSC_CpG_Islands.bed in the script, downloaded here under group "Regulation" and track "CpG Islands").
run_RNA_Seq.pl - Perl script that aligns paired-end RNA-Seq data via TopHat, provides alignment stats via samtools, and fastq file stats using the FASTX-Toolkit. Script is meant to be run on a Linux server.
run_RNA_Seq_de_novo.pl - Perl script that preprocesses RNA-Seq data (according to the guidelines from Szpara et al. 2011: removing adapter sequence, filtering out mononucleotides, and trimming based upon quality scores), assembles contigs / transcripts using Oases, estimates mRNA abundance using eXpress, and predicts gene function using BLAST (specifically, using the output format from CLC Bio Genomics Workbench). As a technical note, I have found CLC Bio de novo to be the most useful for de novo RNA-Seq, even though this algorithm is not specifically designed to handle RNA-Seq data. However, I believe this Perl script currently provides the most useful open-source solution. In fact, the Velvet contigs (which are included as output files in the Oases output) are probably roughly similar to the CLC Bio de novo results.
run_miRNA_Seq.pl - Perl script that aligns single-end small RNA-Seq data via novoalign (optimized for miRNA alignment), provides alignment stats via samtools, and fastq file stats using the FASTX-Toolkit. Script is meant to be run on a Linux server.
Exon_Capture_workflow.pl - Perl script that executes relatively standard pre-processing analysis of exon capture DNA-Seq data (BWA for alignment, samtools for alignment stats, Picard for duplicate removal and targeted sequencing stats, and VarScan for SNP identification). Script is meant to be run on a Linux server.
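The three read-preprocessing steps listed for run_RNA_Seq_de_novo.pl (adapter removal, mononucleotide filtering, quality trimming) can be sketched roughly as follows. This is only an illustration, not the Perl script or the Szpara et al. procedure: the function names, the adapter string, the Phred+33 encoding, the quality cutoff of 20, the minimum length, and the reading of "mononucleotides" as reads made of a single repeated base are all assumptions.

```python
def remove_adapter(seq, qual, adapter):
    """Clip the read at the first exact occurrence of the adapter."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def is_mononucleotide(seq):
    """True when the read consists of one repeated base (e.g. poly-A)."""
    return len(seq) > 0 and len(set(seq)) == 1


def trim_low_quality_tail(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose Phred score is below min_q.

    Assumes Phred+33 encoded quality strings (an assumption).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(seq, qual, adapter, min_len=20):
    """Apply the three steps in order; return None if the read is discarded."""
    seq, qual = remove_adapter(seq, qual, adapter)
    seq, qual = trim_low_quality_tail(seq, qual)
    if len(seq) < min_len or is_mononucleotide(seq):
        return None
    return seq, qual


# Toy reads with a placeholder adapter, for illustration only.
ADAPTER = "AGATCGGAAG"
kept = preprocess("ACGTACGTAC" + ADAPTER, "I" * 20, ADAPTER, min_len=5)
dropped = preprocess("AAAAAAAAAA", "I" * 10, ADAPTER, min_len=5)
trimmed = trim_low_quality_tail("ACGTAC", "IIII##")
```

Exact string matching is the simplest possible adapter search; dedicated trimming tools also tolerate mismatches and partial adapters at the read end, which this sketch does not.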
qPCR_normalization.pl - Perl script to normalize qPCR Ct values based upon GAPDH expression. More specifically, this script was used to analyze the output from Fluidigm Dynamic Arrays for Single-Cell Gene Expression Analysis (with the first 11 lines deleted, saved as a tab-delimited text file)
qPCR_normalization.R - R script to reformat the output from qPCR_normalization.pl into a matrix and visualize the expression of GAPDH in order to visually identify outliers. The output from this script can be used for differential expression analysis (where I would typically use the Gene Expression workflow in Partek)
qPCR_cor_heatmap.R - R script that takes the output from qPCR_normalization.R (or a filtered version of that output) and produces a similarity matrix to identify co-expressed genes (visualized as a heatmap)
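The qPCR steps above (normalizing Ct values to GAPDH, then building a gene-by-gene similarity matrix) can be sketched as below. This is not the code from the scripts: the delta-Ct definition (gene Ct minus GAPDH Ct in the same sample), the use of Pearson correlation as the similarity measure, the data layout, and all function names are assumptions for illustration.

```python
from math import sqrt


def normalize_to_reference(ct_by_gene, reference="GAPDH"):
    """Return delta Ct (gene Ct minus reference Ct) for each sample.

    ct_by_gene: dict mapping gene name to a list of Ct values,
    one per sample, in the same sample order for every gene.
    """
    ref = ct_by_gene[reference]
    return {
        gene: [ct - r for ct, r in zip(values, ref)]
        for gene, values in ct_by_gene.items()
        if gene != reference
    }


def pearson(x, y):
    """Pearson correlation of two equal-length lists (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def correlation_matrix(values_by_gene):
    """Return (sorted gene names, square matrix of pairwise correlations)."""
    genes = sorted(values_by_gene)
    matrix = [
        [pearson(values_by_gene[a], values_by_gene[b]) for b in genes]
        for a in genes
    ]
    return genes, matrix


# Toy Ct values for four samples, invented for illustration only.
ct = {
    "GAPDH": [18.0, 19.0, 18.5, 20.0],
    "GENE1": [25.0, 27.0, 26.0, 29.0],
    "GENE2": [30.0, 29.0, 29.5, 28.0],
}
delta_ct = normalize_to_reference(ct)
genes, matrix = correlation_matrix(delta_ct)
```

The resulting matrix is what a heatmap function would take as input; genes whose rows look alike would be candidates for co-expression.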
Fiji_stitch.ijm - Fiji / ImageJ macro to stitch together adjacent regions for a large number of files.
Fiji_Montage.ijm - Fiji / ImageJ macro to stitch together adjacent, non-overlapping windows for a large number of images
vocabBuilder.jar - program I wrote to help improve my vocabulary for the GRE (compiled from vocabBuilder.java)