Scripts

Microarray
 
Gene Expression Analysis
 
bimodal_blog_post.R - R script to model bimodal gene expression using non-linear least squares regression (Blog Post)
 
Tiling Array Analysis
 
HCMV_Tiling_Array_Stat.R - R script for analysis and visualization of a custom HCMV tiling array (Huang et al. 2012)
 
HCMV_Annotate_Tiling_Array.pl - Perl script to average custom HCMV tiling array signal across transcripts annotated via BLAST (Huang et al. 2012)
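
The averaging step can be illustrated with a minimal Python sketch. This is not the Perl script itself: the function name `average_signal_by_annotation`, the two dictionaries, and the example values are all hypothetical, and the probe-to-annotation mapping (which the script obtains from BLAST) is assumed to be already available.

```python
from collections import defaultdict


def average_signal_by_annotation(probe_signal, probe_to_annotation):
    """Average probe-level signal over all probes sharing an annotation label."""
    grouped = defaultdict(list)
    for probe_id, signal in probe_signal.items():
        label = probe_to_annotation.get(probe_id)
        if label is not None:  # probes without an annotation are skipped
            grouped[label].append(signal)
    return {label: sum(values) / len(values) for label, values in grouped.items()}


# Hypothetical values, for illustration only.
probe_signal = {"p1": 2.0, "p2": 4.0, "p3": 10.0, "p4": 7.0}
probe_to_annotation = {"p1": "regionA", "p2": "regionA", "p3": "regionB"}
averages = average_signal_by_annotation(probe_signal, probe_to_annotation)
```

Probe `p4` has no annotation in this example, so it does not contribute to any average.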

Next Generation Sequencing

BS-Seq Analysis
 
run_bismark_se.pl - Perl script that executes all of the steps necessary for running Bismark for single-end read data.  Script is meant to be run on a Linux server.
 
run_bismark_pe.pl - Same as above, but this script is for paired-end reads instead of single-end reads.  Additionally, this script uses samtools for alignment stats and also calculates the number of CpG sites represented in a sample.
 
find_bismark_CpG_sites.pl - Template script to create a COHCAP (Warden et al. 2013) annotation file for a targeted BS-Seq dataset.  The user will also need to provide a table of gene coordinates (GENCODE_Genes.bed in the script, downloaded here under group "Genes and Gene Prediction Tracks" and track "GENCODE Genes V12") as well as a table of CpG islands / targeted regions (UCSC_CpG_Islands.bed in the script, downloaded here under group "Regulation" and track "CpG Islands").
 
RNA-Seq Analysis
 
run_RNA_Seq.pl - Perl script that aligns paired-end RNA-Seq data via TopHat, provides alignment stats via samtools, and fastq file stats using the FASTX-Toolkit.  Script is meant to be run on a Linux server.
  • run_RNA_Seq_v2.pl - Perl script that aligns paired-end RNA-Seq data via TopHat, summarizes mRNA expression levels using Cufflinks, provides alignment stats via samtools, and fastq file stats using the FASTX-Toolkit.  Script is meant to be run on a Linux server.  The output can be used in Partek (.bam files from TopHat) or R (tab-delimited text file from Cufflinks output).
 
run_RNA_Seq_de_novo.pl - Perl script that preprocesses RNA-Seq data (according to the guidelines from Szpara et al. 2011: remove adapter sequence, filter out mononucleotides, and trim based upon quality scores), assembles contigs / transcripts using Oases, estimates mRNA abundance using eXpress, and predicts gene function using BLAST (specifically, using the output format from CLC Bio Genomics Workbench).  As a technical note, I have found CLC Bio de novo to be the most useful for de novo RNA-Seq, even though this algorithm is not specifically designed to handle RNA-Seq data.  However, I believe this Perl script currently provides the most useful open-source solution.  In fact, the Velvet contigs (which are included as output files in the Oases output) are probably roughly similar to the CLC Bio de novo results.
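
The three preprocessing steps (adapter removal, mononucleotide filtering, quality trimming) can be sketched roughly as below. This is a simplified Python illustration, not the Perl script and not the exact procedure from Szpara et al. 2011: the adapter string, the quality threshold of 20, the Phred+33 encoding, the minimum length, the order of the steps, and all function names are assumptions made for the example.

```python
def is_mononucleotide(seq):
    """True if the read consists of a single repeated base (e.g. 'AAAA')."""
    return len(seq) > 0 and len(set(seq.upper())) == 1


def trim_by_quality(seq, qual, min_q=20, offset=33):
    """Drop bases from the 3' end until one with quality >= min_q is reached."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(seq, qual, adapter, min_len=4):
    """Clip an exact-match 3' adapter, quality-trim, then filter the read.

    Returns (seq, qual), or None if the read is discarded.
    """
    idx = seq.find(adapter)
    if idx != -1:  # remove the adapter and everything after it
        seq, qual = seq[:idx], qual[:idx]
    seq, qual = trim_by_quality(seq, qual)
    if len(seq) < min_len or is_mononucleotide(seq):
        return None
    return seq, qual


# Hypothetical adapter and reads, for illustration only.
ADAPTER = "AGATCGG"
kept = preprocess("ACGTACGTAGATCGG", "IIIIIIII#######", ADAPTER)
dropped = preprocess("AAAAAAAA", "IIIIIIII", ADAPTER)
```

Real adapter clipping usually tolerates mismatches and partial adapters at the read end; exact matching is used here only to keep the sketch short.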
 
run_miRNA_Seq.pl - Perl script that aligns single-end small RNA-Seq data via novoalign (optimized for miRNA alignment), provides alignment stats via samtools, and fastq file stats using the FASTX-Toolkit.  Script is meant to be run on a Linux server.
 
DNA-Seq Analysis
 
Exon_Capture_workflow.pl - Perl script that executes relatively standard pre-processing analysis of exon capture DNA-Seq data (BWA for alignment, samtools for alignment stats, Picard for duplicate removal and targeted sequencing stats, and VarScan for SNP identification).  Script is meant to be run on a Linux server.
 
run_CoNIFER.pl - Creates the necessary input files and runs CoNIFER to call DNA copy number alterations.  Normalized coverage values are also exported to DNAcopy for greater sensitivity in copy number calls.

qPCR

qPCR Analysis

 qPCR_normalization.pl - Perl script to normalize qPCR Ct values based upon GAPDH expression.  More specifically, this script was used to analyze the output from Fluidigm Dynamic Arrays for Single-Cell Gene Expression Analysis (with the first 11 lines deleted, saved as a tab-delimited text file)
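
One common way to normalize Ct values against a reference gene is the delta-Ct calculation (gene Ct minus reference Ct). The Python sketch below illustrates that idea under the assumption that this is the kind of normalization intended; the function name `normalize_to_reference`, the gene labels, and the Ct values are hypothetical, and the actual Perl script may differ in its details.

```python
def normalize_to_reference(ct_values, reference="GAPDH"):
    """Return delta-Ct (gene Ct minus reference Ct) for one sample.

    ct_values maps gene name -> raw Ct; the reference gene itself is omitted
    from the result.
    """
    ref_ct = ct_values[reference]
    return {
        gene: ct - ref_ct
        for gene, ct in ct_values.items()
        if gene != reference
    }


# Hypothetical Ct values for a single cell, for illustration only.
sample = {"GAPDH": 18.0, "GENE_A": 22.5, "GENE_B": 17.0}
delta_ct = normalize_to_reference(sample)
```

A lower delta-Ct corresponds to higher expression relative to GAPDH, since Ct decreases as template abundance increases.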

qPCR_normalization.R - R script to reformat the output from qPCR_normalization.pl into a matrix and visualize the expression of GAPDH in order to visually identify outliers.  The output from this script can be used for differential expression analysis (where I would typically use the Gene Expression workflow in Partek)

qPCR_cor_heatmap.R - R script that takes the output from qPCR_normalization.R (or a filtered version of that output) and produces a similarity matrix to identify co-expressed genes (visualized as a heatmap)
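
As a rough illustration of how a gene-by-gene similarity matrix can be built, the Python sketch below uses Pearson correlation across samples. The choice of Pearson correlation is an assumption (the R script may use a different similarity measure), and the function names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(expression):
    """Build a nested dict: matrix[gene1][gene2] = correlation across samples."""
    genes = sorted(expression)
    return {
        g1: {g2: pearson(expression[g1], expression[g2]) for g2 in genes}
        for g1 in genes
    }


# Hypothetical normalized expression values (one list entry per sample).
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(expression)
```

In this toy input, GENE_A and GENE_B rise together while GENE_C falls, so a heatmap of `matrix` would group the first two genes apart from the third.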
 
Molecular Imaging
 
Fiji_stitch.ijm - Fiji / ImageJ macro to stitch together adjacent regions for a large number of files.
 
Fiji_Montage.ijm - Fiji / ImageJ macro to stitch together adjacent, non-overlapping windows for a large number of images.

Other

vocabBuilder.jar - program I wrote to help improve my vocabulary for the GRE (compiled from vocabBuilder.java)