Scripts

Microarray
 
Gene Expression Analysis
 
bimodal_blog_post.R - R script to model bimodal gene expression using non-linear least squares regression (Blog Post)
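
The blog post's R code is not reproduced here. As an illustration of the technique, the following Python sketch fits the sum of two Gaussian curves to a bimodal curve (for example, a histogram or density of expression values) by nonlinear least squares. It uses Gauss-Newton steps, the default algorithm of R's `nls()`, with a forward-difference Jacobian and step halving. The parameter order `(a1, mu1, sd1, a2, mu2, sd2)`, the starting values, and the noise-free synthetic data are assumptions chosen for illustration.

```python
import math


def bimodal(x, p):
    """Sum of two Gaussian curves; p = (a1, mu1, sd1, a2, mu2, sd2)."""
    a1, m1, s1, a2, m2, s2 = p
    return (a1 * math.exp(-((x - m1) ** 2) / (2 * s1 ** 2))
            + a2 * math.exp(-((x - m2) ** 2) / (2 * s2 ** 2)))


def ssr(xs, ys, p):
    """Residual sum of squares of the model at parameters p."""
    return sum((y - bimodal(x, p)) ** 2 for x, y in zip(xs, ys))


def solve(a, b):
    """Solve the small linear system a z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def fit_bimodal(xs, ys, p0, max_iter=200, tol=1e-12):
    """Gauss-Newton with step halving; returns fitted (a1, mu1, sd1, a2, mu2, sd2)."""
    p = list(p0)
    n = len(p)
    for _ in range(max_iter):
        resid = [y - bimodal(x, p) for x, y in zip(xs, ys)]
        old = sum(e * e for e in resid)
        # Forward-difference Jacobian of the model with respect to each parameter
        jac = []
        for x in xs:
            f0 = bimodal(x, p)
            row = []
            for k in range(n):
                h = 1e-7 * max(1.0, abs(p[k]))
                q = p[:]
                q[k] += h
                row.append((bimodal(x, q) - f0) / h)
            jac.append(row)
        # Normal equations: (J^T J) step = J^T r
        jtj = [[sum(r[i] * r[j] for r in jac) for j in range(n)] for i in range(n)]
        jtr = [sum(r[i] * e for r, e in zip(jac, resid)) for i in range(n)]
        step = solve(jtj, jtr)
        # Halve the step until the residual sum of squares decreases
        t = 1.0
        while t > 1e-10:
            cand = [pi + t * si for pi, si in zip(p, step)]
            new = ssr(xs, ys, cand)
            if new < old:
                break
            t /= 2
        else:
            break  # no improving step found: treat as converged
        p = cand
        if old - new <= tol * (1.0 + old):
            break
    p[2], p[5] = abs(p[2]), abs(p[5])  # the model depends only on sd**2
    return p


# Hypothetical noise-free bimodal curve and starting values near the two modes
true_params = (10.0, 3.0, 1.0, 6.0, 7.0, 1.2)
xs = [i / 10 for i in range(101)]
ys = [bimodal(x, true_params) for x in xs]
fit = fit_bimodal(xs, ys, p0=(9.0, 2.8, 1.2, 5.0, 7.3, 1.4))
print([round(v, 3) for v in fit])
```

Gauss-Newton needs starting values reasonably close to the two modes (these can usually be read off a histogram); a poor start can leave one component covering both peaks.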
 
Tiling Array Analysis
 
HCMV_Tiling_Array_Stat.R - R script for analysis and visualization of a custom HCMV tiling array (Huang et al. 2012)
 
HCMV_Annotate_Tiling_Array.pl - Perl script to average custom HCMV tiling array signal across transcripts annotated via BLAST (Huang et al. 2012)
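
A minimal Python sketch of the averaging step, assuming each probe has already been reduced to a single genomic coordinate on the HCMV genome and each BLAST hit to a (start, end) pair; the names, coordinates, and signal values below are hypothetical. BLAST reports minus-strand hits with the subject start greater than the end, so the sketch normalizes the order before collecting probes.

```python
from statistics import mean


def average_probe_signal(probes, transcripts):
    """Average probe signal over each annotated transcript region.

    probes: list of (genomic_position, signal) pairs.
    transcripts: dict of name -> (start, end), inclusive coordinates; start may be
    greater than end for hits reported on the minus strand.
    Returns name -> mean signal, or None if no probe falls inside the region.
    """
    averages = {}
    for name, (start, end) in transcripts.items():
        lo, hi = min(start, end), max(start, end)
        values = [signal for pos, signal in probes if lo <= pos <= hi]
        averages[name] = mean(values) if values else None
    return averages


# Hypothetical probes and BLAST-derived transcript coordinates
probes = [(100, 1.0), (150, 3.0), (300, 5.0)]
transcripts = {"UL1": (90, 160), "US2": (320, 250), "none": (400, 500)}
result = average_probe_signal(probes, transcripts)
print(result)
```

The linear scan over probes is fine for a viral genome; a much larger array would benefit from sorting probes by position and using binary search.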

Next Generation Sequencing

BS-Seq Analysis
 
run_bismark_se.pl - Perl script that executes all of the steps necessary for running Bismark for single-end read data.  Script is meant to be run on a Linux server.
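
The main steps such a driver chains together can be sketched in Python as a list of Bismark command lines that are printed (dry run) or executed one after another. The genome folder, FASTQ name, and BAM name are hypothetical; the BAM file name Bismark writes depends on the version and aligner, and the options should be checked against the installed Bismark release. This is not the Perl script's own logic.

```python
import subprocess


def bismark_se_commands(genome_dir, fastq, bam):
    """Command lines for a single-end Bismark run, one list per step."""
    return [
        # 1. Bisulfite-convert and index the reference (needed once per genome)
        ["bismark_genome_preparation", genome_dir],
        # 2. Bisulfite-aware alignment of the single-end reads
        ["bismark", "--genome", genome_dir, fastq],
        # 3. Per-cytosine methylation calls plus a bedGraph/coverage summary
        ["bismark_methylation_extractor", "--single-end", "--bedGraph", bam],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it (stopping on failure) unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical paths; dry_run=True only prints the commands
cmds = bismark_se_commands("/data/hg19", "sample1.fastq", "sample1_bismark_bt2.bam")
run(cmds)
```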
 
run_bismark_pe.pl - Same as above, but this script is for paired-end reads instead of single-end reads.  Additionally, this script uses samtools for alignment stats and also calculates the number of CpG sites represented in a sample.
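
How the paired-end script counts CpG sites is not described here. One plausible approach, sketched in Python under that assumption, is to count distinct positions in the coverage file written by the Bismark methylation extractor with `--bedGraph` (columns: chromosome, start, end, methylation %, count methylated, count unmethylated), keeping only sites with a minimum read depth. The `min_coverage` cutoff and the example lines are hypothetical. Unless strand-specific calls have been merged, the two cytosines of one CpG dinucleotide can appear as separate rows, so this count can include both strands of a site.

```python
def count_covered_cpg_sites(cov_lines, min_coverage=1):
    """Count distinct CpG positions covered by at least min_coverage reads.

    Each line is assumed to follow the Bismark coverage layout:
    chromosome, start, end, methylation %, count methylated, count unmethylated.
    """
    sites = set()
    for line in cov_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip headers or malformed lines
        depth = int(fields[4]) + int(fields[5])
        if depth >= min_coverage:
            sites.add((fields[0], int(fields[1])))
    return len(sites)


# Hypothetical coverage lines
cov = [
    "chr1\t100\t100\t50.0\t1\t1",
    "chr1\t200\t200\t0.0\t0\t3",
    "chr2\t50\t50\t100.0\t1\t0",
]
print(count_covered_cpg_sites(cov, min_coverage=2))
```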
 
find_bismark_CpG_sites.pl - Template script to create a COHCAP (Warden et al. 2013) annotation file for a targeted BS-Seq dataset.  The user must also provide a table of gene coordinates (GENCODE_Genes.bed in the script, downloaded from the UCSC Table Browser under group "Genes and Gene Prediction Tracks" and track "GENCODE Genes V12") and a table of CpG islands or targeted regions (UCSC_CpG_Islands.bed in the script, downloaded under group "Regulation" and track "CpG Islands").
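
The core of such an annotation step is an interval-overlap lookup, sketched here in Python. The output layout (SiteID, Chr, Loc, Gene, Island) is an assumption about the COHCAP annotation format and should be checked against the COHCAP documentation; the gene and island records are hypothetical. BED intervals are 0-based and end-exclusive, so a 1-based CpG position `pos` falls inside an interval when `start < pos <= end`.

```python
def read_bed(lines):
    """Parse BED lines into (chrom, start, end, name); starts are 0-based, ends exclusive."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        f = line.rstrip("\n").split("\t")
        name = f[3] if len(f) > 3 else "."
        regions.append((f[0], int(f[1]), int(f[2]), name))
    return regions


def annotate_sites(sites, genes, islands):
    """Annotate 1-based CpG positions with an overlapping gene and CpG island.

    Returns (SiteID, Chr, Loc, Gene, Island) rows; empty strings mean no overlap.
    Islands are labelled by 1-based coordinates, e.g. chr1:96-150.
    """
    rows = []
    for chrom, pos in sites:
        gene = next((n for c, s, e, n in genes if c == chrom and s < pos <= e), "")
        island = next((f"{c}:{s + 1}-{e}" for c, s, e, _ in islands
                       if c == chrom and s < pos <= e), "")
        rows.append((f"{chrom}:{pos}", chrom, pos, gene, island))
    return rows


# Hypothetical BED records and CpG sites
genes = read_bed(["chr1\t90\t300\tGENE_A"])
islands = read_bed(["track name=cpg", "chr1\t95\t150\tCpG: 20"])
sites = [("chr1", 100), ("chr1", 200), ("chr2", 5)]
rows = annotate_sites(sites, genes, islands)
for row in rows:
    print("\t".join(map(str, row)))
```

The nested scan is O(sites x regions); for genome-wide tables, sorting both inputs and sweeping them together scales much better.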
 
RNA-Seq Analysis
 
run_RNA_Seq.pl - Perl script that aligns paired-end RNA-Seq data via TopHat, provides alignment stats via samtools, and fastq file stats using the FASTX-Toolkit.  Script is meant to be run on a Linux server.
  • run_RNA_Seq_v2.pl - Perl script that aligns paired-end RNA-Seq data via TopHat, summarizes mRNA expression levels using Cufflinks, provides alignment stats via samtools, and fastq file stats using the FASTX-Toolkit. Script is meant to be run on a Linux server.  The output can be used in Partek (.bam files from TopHat) or R (tab-delimited text file from Cufflinks output).
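
The fastq statistics mentioned above are per-cycle quality summaries of the kind FASTX-Toolkit's `fastx_quality_stats` produces. A minimal Python sketch of one such statistic (mean Phred quality at each read position), assuming plain four-line FASTQ records, is shown below; the reads are hypothetical. The offset is a parameter because Illumina 1.8+ data use Phred+33 encoding while older Illumina data used Phred+64.

```python
def per_cycle_mean_quality(fastq_lines, offset=33):
    """Mean Phred quality at each read position (cycle) across all reads.

    Assumes plain four-line FASTQ records; offset=33 for Phred+33 encoding,
    64 for older Phred+64 data.
    """
    totals, counts = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:
            continue  # only the quality line of each record
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two hypothetical reads of different lengths
reads = ["@r1", "ACGT", "+", "IIII", "@r2", "ACG", "+", "!!5"]
print(per_cycle_mean_quality(reads))  # [20.0, 20.0, 30.0, 40.0]
```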
 
run_RNA_Seq_de_novo.pl - Perl script that preprocesses RNA-Seq data following the guidelines of Szpara et al. 2011 (removing adapter sequence, filtering out mononucleotides, and trimming based upon quality scores), assembles contigs / transcripts using Oases, estimates mRNA abundance using eXpress, and predicts gene function using BLAST (specifically, using the output format from CLC Bio Genomics Workbench).  As a technical note, I have found the CLC Bio de novo assembler to be the most useful for de novo RNA-Seq, even though this algorithm is not specifically designed to handle RNA-Seq data.  However, I believe this Perl script currently provides the most useful open-source solution.  In fact, the Velvet contigs (which are included among the Oases output files) are probably roughly similar to the CLC Bio de novo results.
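
The three preprocessing steps can be sketched per read in Python as follows. The thresholds (`min_overlap`, `min_q`, `max_fraction`, `min_len`) are hypothetical and are not the values used by the script or by Szpara et al.; "filtering out mononucleotides" is interpreted here, as an assumption, as discarding reads dominated by a single base. `AGATCGGAAGAGC` is the common start of the Illumina TruSeq adapter, used only as an example.

```python
def trim_adapter(seq, qual, adapter, min_overlap=5):
    """Remove a 3' adapter: a full match anywhere, or a partial match at the read end."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # Look for the longest adapter prefix (at least min_overlap bases) ending the read
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


def trim_low_quality_tail(seq, qual, min_q=20, offset=33):
    """Drop 3' bases whose Phred+33 score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def is_mononucleotide(seq, max_fraction=0.8):
    """True if a single base makes up at least max_fraction of the read."""
    if not seq:
        return True
    return max(seq.count(b) for b in "ACGTN") / len(seq) >= max_fraction


def preprocess(seq, qual, adapter, min_len=20):
    """Return the cleaned (seq, qual), or None if the read should be discarded."""
    seq, qual = trim_adapter(seq, qual, adapter)
    seq, qual = trim_low_quality_tail(seq, qual)
    if len(seq) < min_len or is_mononucleotide(seq):
        return None
    return seq, qual


ADAPTER = "AGATCGGAAGAGC"
print(preprocess("ACGTACGTAGATCGG", "I" * 15, ADAPTER, min_len=5))  # partial adapter removed
print(preprocess("AAAAAAAAAAAA", "I" * 12, ADAPTER, min_len=5))     # mononucleotide: None
print(preprocess("ACGTTGCAACGT", "IIIIIIIIII##", ADAPTER, min_len=5))  # low-quality tail trimmed
```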
 
run_miRNA_Seq.pl - Perl script that aligns single-end small RNA-Seq data via novoalign (optimized for miRNA alignment), provides alignment stats via samtools, and fastq file stats using the FASTX-Toolkit. Script is meant to be run on a Linux server.
 
DNA-Seq Analysis
 
Exon_Capture_workflow.pl - Perl script that executes a relatively standard pre-processing analysis of exon capture DNA-Seq data (BWA for alignment, samtools for alignment stats, Picard for duplicate removal and targeted sequencing stats, and VarScan for SNP identification).  Script is meant to be run on a Linux server.
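
The duplicate-removal idea can be sketched in Python with a simplified single-end rule loosely modeled on Picard MarkDuplicates: reads with the same chromosome, strand, and 5' position are treated as duplicates, and the read with the highest sum of base qualities is kept. This sketch ignores soft clipping, mate pairs, and optical duplicates, all of which Picard handles, so it is an illustration rather than a replacement. The `Read` record and the example reads are hypothetical.

```python
from collections import namedtuple

# Hypothetical alignment record: 1-based alignment start/end, strand "+" or "-"
Read = namedtuple("Read", "name chrom start end strand quals")


def find_duplicates(reads):
    """Return names of reads flagged as duplicates (read names assumed unique).

    The 5' end of a minus-strand read is its alignment end. Within each group the
    read with the highest summed base quality is kept; ties keep the first seen.
    """
    best = {}
    for r in reads:
        five_prime = r.start if r.strand == "+" else r.end
        key = (r.chrom, five_prime, r.strand)
        score = sum(r.quals)
        if key not in best or score > best[key][0]:
            best[key] = (score, r.name)
    keep = {name for _, name in best.values()}
    return [r.name for r in reads if r.name not in keep]


reads = [
    Read("r1", "chr1", 100, 150, "+", [30] * 51),
    Read("r2", "chr1", 100, 149, "+", [20] * 50),  # same 5' end as r1, lower quality
    Read("r3", "chr1", 100, 150, "-", [30] * 51),  # minus strand: 5' end is 150
]
print(find_duplicates(reads))  # ['r2']
```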
 
qPCR

qPCR Analysis

qPCR_normalization.pl - Perl script to normalize qPCR Ct values based upon GAPDH expression.  More specifically, this script was used to analyze the output from Fluidigm Dynamic Arrays for Single-Cell Gene Expression Analysis (after deleting the first 11 lines and saving the file as tab-delimited text).
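
The exact arithmetic in qPCR_normalization.pl is not described here; a common choice, assumed in this Python sketch, is the delta-Ct method: subtract the reference gene's Ct (GAPDH) from each gene's Ct in the same sample, optionally converting to relative expression as 2^-dCt (which assumes roughly 100% amplification efficiency). Parsing of the Fluidigm export is not shown; failed reactions are represented as `None` and the values are hypothetical.

```python
def delta_ct(ct_table, reference="GAPDH"):
    """Normalize Ct values to a reference gene: dCt = Ct(gene) - Ct(reference).

    ct_table: dict of sample -> dict of gene -> Ct (None for a failed reaction).
    Samples without a reference Ct are skipped.
    """
    normalized = {}
    for sample, genes in ct_table.items():
        ref = genes.get(reference)
        if ref is None:
            continue
        normalized[sample] = {g: ct - ref for g, ct in genes.items()
                              if g != reference and ct is not None}
    return normalized


def relative_expression(dct):
    """2^-dCt, assuming roughly 100% amplification efficiency."""
    return 2 ** (-dct)


cts = {
    "cell1": {"GAPDH": 15.0, "MYC": 20.0, "TP53": None},
    "cell2": {"GAPDH": None, "MYC": 18.0},
}
d = delta_ct(cts)
print(d)                                       # {'cell1': {'MYC': 5.0}}
print(relative_expression(d["cell1"]["MYC"]))  # 0.03125
```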

qPCR_normalization.R - R script to reformat the output from qPCR_normalization.pl into a matrix and visualize the expression of GAPDH in order to visually identify outliers.  The output from this script can be used for differential expression analysis (where I would typically use the Gene Expression workflow in Partek)
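
The two steps described above (reshaping to a matrix and screening GAPDH for outliers) can be sketched in Python as follows. The R script identifies outliers visually; the median/MAD cutoff used here, with a hypothetical `k = 3`, is an automated stand-in and not the author's method. Plotting is not shown, and the sample values are hypothetical.

```python
from statistics import median


def to_matrix(records):
    """Pivot (sample, gene, value) records into a gene x sample matrix (None = missing)."""
    samples = sorted({s for s, _, _ in records})
    genes = sorted({g for _, g, _ in records})
    lookup = {(s, g): v for s, g, v in records}
    matrix = [[lookup.get((s, g)) for s in samples] for g in genes]
    return genes, samples, matrix


def flag_outliers(ref_ct, k=3.0):
    """Samples whose reference-gene Ct lies more than k MADs from the median.

    If more than half the samples share the median value, the MAD is 0 and any
    deviation at all is flagged.
    """
    values = list(ref_ct.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [s for s, v in ref_ct.items() if abs(v - med) > k * mad]


genes, samples, m = to_matrix([("c1", "GAPDH", 15.0), ("c2", "GAPDH", 15.5), ("c1", "MYC", 3.0)])
print(genes, samples, m)  # ['GAPDH', 'MYC'] ['c1', 'c2'] [[15.0, 15.5], [3.0, None]]
gapdh = {"c1": 15.0, "c2": 15.2, "c3": 14.9, "c4": 15.1, "c5": 22.0}
print(flag_outliers(gapdh))  # ['c5']
```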

qPCR_cor_heatmap.R - R script that takes the output from qPCR_normalization.R (or a filtered version of that output) and produces a similarity matrix to identify co-expressed genes (visualized as a heatmap)
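
A minimal Python sketch of the similarity-matrix step, assuming Pearson correlation between genes across samples (the R script may use a different measure). It also lists gene pairs above a hypothetical correlation threshold; rendering the heatmap is not shown, and `1 - r` is one common distance if the genes are to be clustered. Missing values are not handled, and a gene with constant expression would cause a division by zero.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(rows):
    """Gene x gene similarity matrix from a gene x sample expression matrix."""
    return [[pearson(r1, r2) for r2 in rows] for r1 in rows]


def coexpressed_pairs(names, cm, threshold=0.8):
    """Gene pairs whose correlation is at least threshold."""
    return [(names[i], names[j])
            for i in range(len(names)) for j in range(i + 1, len(names))
            if cm[i][j] >= threshold]


# Hypothetical expression values for three genes across four samples
names = ["A", "B", "C"]
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
cm = correlation_matrix(expr)
print(coexpressed_pairs(names, cm))  # [('A', 'B')]
```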
 
Molecular Imaging
 
Fiji_stitch.ijm - Fiji / ImageJ macro to stitch together adjacent regions for a large number of files.
 
Fiji_Montage.ijm - Fiji / ImageJ macro to stitch together adjacent, non-overlapping windows for a large number of images
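
The macros are written in the ImageJ macro language; the grid arithmetic behind them can be sketched in Python, assuming equal-sized tiles acquired in row-major order. With `overlap=0.0` the positions give a plain montage of non-overlapping windows; a positive overlap gives only approximate starting positions, which a stitching plugin such as Fiji's Grid/Collection stitching then refines by image registration (not shown). Tiles are represented as hypothetical lists of pixel rows.

```python
def tile_positions(n_tiles, columns, tile_w, tile_h, overlap=0.0):
    """Top-left (x, y) pixel offset of each tile in a row-major grid.

    overlap is the fractional overlap between neighbouring tiles.
    """
    step_x = round(tile_w * (1 - overlap))
    step_y = round(tile_h * (1 - overlap))
    return [((i % columns) * step_x, (i // columns) * step_y) for i in range(n_tiles)]


def montage(tiles, columns):
    """Paste equal-sized 2-D tiles (lists of pixel rows) into one image, without overlap."""
    h, w = len(tiles[0]), len(tiles[0][0])
    rows = -(-len(tiles) // columns)  # ceiling division
    canvas = [[0] * (w * columns) for _ in range(h * rows)]
    for tile, (x, y) in zip(tiles, tile_positions(len(tiles), columns, w, h)):
        for r in range(h):
            canvas[y + r][x:x + w] = tile[r]
    return canvas


tiles = [[[1, 1], [1, 1]], [[2, 2], [2, 2]], [[3, 3], [3, 3]]]
for row in montage(tiles, columns=2):
    print(row)
print(tile_positions(4, 2, 100, 80, overlap=0.1))  # [(0, 0), (90, 0), (0, 72), (90, 72)]
```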

Other

vocabBuilder.jar - Java program I wrote to help improve my vocabulary for the GRE (compiled from vocabBuilder.java)