Gregory R. Warnes, Ph.D.

greg@warnes.net – 585-678-6661

Biostatistician – Data Scientist – Statistical Methods Developer – Analytic Software Developer



Summary of Qualifications

  • Skilled biostatistician, statistical methods developer, software developer, and expert in the R statistical analysis system.

  • Background in infectious disease modeling, immune system modeling, and pre-clinical pharmaceutical and academic research. 

  • Deep technical understanding of advanced targeted and high-resolution genetic, genomic, and epigenetic biomedical technologies.

  • History of collaborative relationships with scientists that extend beyond statistical support to focus on solving the underlying scientific research problems.

  • Excels at communicating with, and bridging between, diverse scientific, technical, business, and cultural communities.

  • Demonstrated record of improving efficiency and scientific impact by developing and improving tools and processes. 

Employment History

Owner & Chief Scientist

GRW Consulting

02/2013–Present

Provide modeling expertise, data analysis, and scientific software development for clients in diverse industries, including pharma/biotech, agriculture, and finance. Recent engagements include statistical support and guidance for a bioinformatics team in the research organization of a major pharmaceutical company; development of software that enables non-quantitative marketing executives at a major pharmaceutical company to produce statistically rigorous market predictions; and analytic tool development for agricultural crop irrigation planning and management.

Principal Data Scientist

Torqata Data and Analytics, LLC

08/2018–04/2020

Invented and led development of Torqata's marquee "Category Compass" SaaS product. Category Compass is a first-of-its-kind tool for tire retailers that applies an advanced market demand model to identify location-specific market opportunities and optimal product stocking strategies. Also provided statistical training and consulting, including experimental design and analysis and guidance on the development and evaluation of machine learning models.

Principal Data Scientist

Medidata Solutions, Inc.

08/2016–07/2018

Developed advanced tools, processes, algorithms, architectures, and innovative approaches for leveraging a large repository of clinical trial data (1,000+ pharmaceutical companies, 10,000+ studies, 10M+ clinical subjects). Designed and prototyped a web application for analyzing genomic biomedical data that explicitly supports the workflow for addressing targeted scientific questions with high-dimensional data while providing rigorous and flexible statistical modeling.

Senior Principal Biostatistician

Boehringer Ingelheim Pharmaceuticals Inc.

01/2014–08/2016

Acted as statistical expert and lead statistician for the research division in Ridgefield, CT. Provided strategic vision and set direction for the use of statistical approaches in analysis and decision-making. Ensured that research programs were statistically rigorous and efficient. Provided statistical expertise and training to project teams and internal statistical experts on experimental design, analytic techniques, approaches to statistical issues, analysis plans, and inference from data. Evaluated novel high-dimensional biological technologies, designed experiments to validate and optimize their performance, and recommended appropriate analytic tools.

Senior Expert Modeler

Novartis Institutes for Biomedical Research

07/2011–02/2013

Developed an interactive, statistically rigorous web-based tool for assigning clinical study patients and sites to countries, optimizing time to completion and cost while accounting for the effects of country, therapeutic area, indication, study phase, and study length on costs, delays, and recruitment rates, subject to resource and other constraints. Performed clinical trial simulations for ophthalmology using PK, PD, patient distribution, and trial feature information.

Founding Director

Center for Research Computing, University of Rochester

07/2008–07/2011

Founded and managed a comprehensive program supporting computing as a research tool at the University of Rochester. Managed funding, purchasing, installation, maintenance, and support for high-performance computing technology. Managed IT and computational science staff. Established center policies and procedures. Developed training and outreach programs for university researchers. Obtained over $10 million in contributions, gifts, and grant funding. Over a two-year period, grew the user population from 0 to more than 400 researchers, representing more than $300M in research grants.

Associate Professor

Department of Biostatistics and Computational Biology,

University of Rochester

05/2006–07/2011

Co-directed the biocomputing core of the University of Rochester Center for Biodefense Immune Modeling (CBIM). Provided statistical support for immunological research on influenza and HIV. Developed DEDiscover, a high-quality software application for modeling biological processes using systems of differential equations. Taught courses in applied Bayesian modeling using Markov Chain Monte Carlo and in statistical computing. Performed statistical consulting, statistical research, and computational research in collaboration with faculty from Immunology, Medicine, Biochemistry and Biophysics, Biostatistics, and Computer Science.

Statistical Consultant

Gregory R. Warnes Statistical Consulting

01/2010–07/2011

Provided statistical, data analysis, and modeling expertise to clients in medical, legal, and financial organizations.

Owner & Chief Scientist

Random Technologies LLC

12/2006–12/2009

Established and operated a software business providing enterprise-class packaging, enhancements, services, and training for the R statistical software system. Performed statistical consulting and software development for scientific, financial, and marketing organizations.

Associate Director

Nonclinical Statistics, Pfizer Global Research and Development

11/2000–05/2006

Performed statistical analysis, methods development, and software implementation for pre-clinical, safety, and basic science projects utilizing "-omic" technologies. These projects spanned disease areas (atherosclerosis to obesity), experimental species (yeast to human subjects), sample sizes (4 to 40,000), and measurement technologies (e.g., small-scale DNA genotyping, Affymetrix mRNA microarrays, 2-D gel proteomics, whole-genome SNP genotyping, and NMR metabonomics). Designed and implemented MIDAS, a web-based system for processing mRNA microarray data from machine data to analytic results, quadrupling the number of mRNA microarray studies supported by each statistician from 10 to 40 per year while improving the reliability, reproducibility, and scientific impact of study results. Developed, published, and maintained more than 20 R, Python, and Zope extension packages.

Associate Research Scientist

Department of Computer Science, Yale University

01/2003–05/2006

Developed parallel and distributed algorithms, methods, and software for the analysis of genetic, genomic, proteomic, and metabonomic data, including statistical genetics packages for R. Developed, published, and maintained more than 20 R, Python, and Zope extension packages.

Summer Intern

Statistics and Data Mining Research, Bell Labs

06/1999–09/1999

Under the direction of John M. Chambers, Ph.D., developed parallel modules for pseudo-random number generation, bootstrapping, and Markov Chain Monte Carlo (MCMC) for the Omegahat next-generation statistical computing system.

Research Assistant

Fred Hutchinson Cancer Research Center

12/1997–10/2000

Developed and validated algorithms and software for Markov Chain Monte Carlo (MCMC) on parallel and cluster computers. Contributed computational methods descriptions to grant applications, including a successful proposal to use a Beowulf parallel computing cluster. Built, maintained, and supported a 16-node Beowulf parallel computing cluster. Provided scientific programming and MCMC expertise to other project members. Developed the HYDRA MCMC library, the MCGibbsit MCMC diagnostic, and the mcgibbsit R package.


Education

Ph.D. Biostatistics

University of Washington, Seattle, WA

12/2000


Preceptor:

Adrian E. Raftery, Ph.D.

Algorithms, Diagnostics, & Software for Parallel MCMC


Biology Advisor:

Brian J. Reid, M.D., Ph.D.

Genetic & Epigenetic changes during progression from Barrett’s Esophagus to Esophageal Adenocarcinoma

M.Sc. Biostatistics

University of Washington, Seattle, WA

12/1997

B.Sc. Statistics

Brigham Young University, Provo, UT

04/1995

A.Sc. Computer Science

Weber State University, Ogden, UT

06/1992


Computer Skills

Scientific and Mathematical Software

R, Spotfire, SAS, MATLAB, Mathematica, NumPy, Pipeline Pilot, etc.

Computer Languages

Scripting: Python, Perl, JavaScript, etc.

Compiled: Java, C/C++, Fortran, Pascal, etc.

Formatting/Markup Languages

Markdown, LaTeX, HTML/HTTP, XML, etc.

Communications APIs

JSON; RMI: SOAP, CORBA, Java RMI

Parallel Computing

MPI, OpenMP, etc.

Operating Systems

Mac OS X, Linux/Unix, MS-Windows, etc.


Spoken Languages

Native English, Fluent French, Survival German


Selected Publications

Wu H, Miao H, Warnes GR, Wu C, LeBlanc A, Dykes C, Demeter LM. “DEDiscover: A Computation and Simulation Tool for HIV Viral Fitness Research,” 2008 International Conference on BioMedical Engineering and Informatics (BMEI 2008), vol. 1, pp. 687-694, 27-30 May 2008.

Kooner JS, Chambers JC, Aguilar-Salinas CA, et al. “Genome-wide scan identifies variation in MLXIPL associated with plasma triglycerides in man,” Nature Genetics, 40, 149-151 (2008).

Burrows RB, Warnes GR, Hanumara RC. “Statistical Modeling of Biochemical Pathways,” IET Systems Biology, 1, 353 (2007).

Mank-Seymour AR, Richmond JL, Wood LS, Reynolds JM, Fan Y, Warnes GR, Milos MP, Thompson JF. "Association of torsades de pointes with novel and known single nucleotide polymorphisms in long QT syndrome genes," American Heart Journal, Volume 152, Issue 6, Pages 1116-1122, August 2006 

Warnes GR. “Sample Size Estimation for Microarray Experiments using the SSIZE package,” R News, Volume 6, Issue 5, pp. 64-68, December 2006. 

Caba E, Dickinson DA, Warnes GR, Aubrecht J. “Differentiating mechanisms of toxicity using global gene expression analysis in Saccharomyces cerevisiae,” Special Issue on EEMS 2004, Mutation Research, Volume 575, Issues 1-2, Aug. 2005, Pages 34-46. 


Selected Software

Category Compass: First-of-its-kind SaaS application for tire retailers that applies an advanced market demand model to identify location-specific market opportunities and optimal product stocking strategies. https://www.torqata.com/solutions

IrrigatorPro.org: Web application that gives growers of peanuts, corn, and cotton a simple tool for deciding when to irrigate to ensure optimal crop growth while minimizing irrigation costs. It leverages long-term, multi-crop irrigation management research performed by the ARS National Peanut Research Laboratory in Dawson, Georgia. http://irrigatorpro.org

DEDiscover: High-quality software application that enables biologists and physicians to model biological processes using systems of ordinary and delay differential equations. https://www.dediscover.org/ 

MiDAS: Web-based system, designed around the scientific workflow, that allows scientists, lab staff, and statisticians to store, manage, process, and analyze mRNA microarray data using the best available statistical, graphical, and computational technologies. Built on R, Zope, and RStatServer.

RStatServer: System for quickly developing and deploying web applications that integrate statistical computations and graphics using R. Includes Python, R & Zope packages RSessionDA, Rpy, RSOAP, SOAPpy, and fpconst.

R Packages: General tools: gregmisc, gtools, gdata, gmodels, gplots, namespace, SASxport, session; Statistical genetics & genomics: genetics, GeneticsBase, GeneticsDesign, GeneticsPed, GeneticsQC, fbat, ssize, qvalue; Parallel computing: fork, Rlsf; Audiology: SII; MCMC diagnostics: mcgibbsit.


Patents

Warnes, GR. Method and apparatus for transmission and reception of a signal over multiple frequencies with time offset encoding at each frequency. US Patent 9,602,228, 2017 https://patents.google.com/patent/US9602228B1/en


Publications

Wu H, Miao H, Warnes GR, Wu C, LeBlanc A, Dykes C, Demeter LM. “DEDiscover: A Computation and Simulation Tool for HIV Viral Fitness Research,” 2008 International Conference on BioMedical Engineering and Informatics (BMEI 2008), vol. 1, pp. 687-694, 27-30 May 2008. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4548758&isnumber=4548615

Kooner JS, Chambers JC, Aguilar-Salinas CA, et al. “Genome-wide scan identifies variation in MLXIPL associated with plasma triglycerides in man,” Nature Genetics, 40, 149-151 (2008). http://www.nature.com/ng/journal/v40/n2/full/ng.2007.61.html

Burrows RB, Warnes GR, Hanumara RC. “Statistical Modeling of Biochemical Pathways,” IET Systems Biology, 1, 353 (2007). http://www.sciencedirect.com/science/article/B6T2C-4C4C52R-1/2/b405d51f9c53f02c012b72296796927a

Mank-Seymour AR, Richmond JL, Wood LS, Reynolds JM, Fan Y, Warnes GR, Milos MP, Thompson JF. “Association of torsades de pointes with novel and known single nucleotide polymorphisms in long QT syndrome genes,” American Heart Journal, Volume 152, Issue 6, Pages 1116-1122, August 2006. http://www.mdconsult.com/das/article/body/214469302-2/jorg=journal&source=&sp=16680575&sid=0/N/561766/s0002870306007605.pdf?issn=0002-8703

Warnes GR. “Sample Size Estimation for Microarray Experiments using the SSIZE package”, R News, Volume 6, Issue 5, pp. 64-68, December 2006. http://cran.r-project.org/doc/Rnews/Rnews_2006-5.pdf 

Warnes GR, Jain N. “Balloonplot: A graphical tool for displaying tabular data”, R News, Volume 6, Issue 2, May 2006. http://cran.r-project.org/doc/Rnews/Rnews_2006-2.pdf 

Warnes GR, Liu P. “Sample Size Estimation for Microarray Experiments”, Technical report 06/06, Department of Biostatistics and Computational Biology, University of Rochester, 2006. (also submitted to Bioinformatics) http://www.urmc.rochester.edu/biostat/people/faculty/documents/0606_warnes_liu.pdf 

Caba E, Dickinson DA, Warnes GR, Aubrecht J. “Differentiating mechanisms of toxicity using global gene expression analysis in Saccharomyces cerevisiae,” Special Issue on EEMS 2004, Mutation Research, Volume 575, Issues 1-2, Aug. 2005, Pages 34-46. http://www.sciencedirect.com/science/article/B6T2C-4G3CX6K-1/2/b86f0d9d41a37bd2a6e805c423b4f32f 

Warnes GR. “RSOAP - Using ‘R’ with Python,” PyZine, Volume 11, Issue 05, Apr. 2004. http://www.pyzine.com/Issue005/Section_Articles/article_RSOAP.html

Dickinson DA, Warnes GR, Quievryn G, Messer J, Zhitkovich A, Rubitski E, Aubrecht J. “Differentiation of DNA-reactive and non-reactive genotoxic mechanisms using gene expression profile analysis,” Mutation Research, Volume 549, Issues 1-2, May 2004, Pages 29-41. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.78.4690&rep=rep1&type=pdf#page=35

Warnes GR. “The Genetics Package,” R News, Volume 3, Issue 1, Jun. 2003. http://cran.r-project.org/doc/Rnews/Rnews_2003-1.pdf 

Warnes GR. “The Gregmisc Package: Something for Everyone” submitted to R News. http://www.warnes.net/Research/Publications/gregmisc.pdf  

Warnes GR. “HYDRA: A Java library for Markov Chain Monte Carlo,” Journal of Statistical Software, Volume 7, Issue 4, Mar. 2002. http://www.jstatsoft.org/v07/i04/ 

Warnes GR. “The Normal Kernel Coupler: An adaptive Markov Chain Monte Carlo method for efficiently sampling from multi-modal distributions,” submitted to Journal of the American Statistical Association, currently in revision.

Yanez ND, Warnes GR, and Kronmal RA. “A Univariate Measurement Error Model for Longitudinal Change,” Communications in Statistics, Volume 30, Issue 2, 2001. http://dx.doi.org/10.1081/STA-100002031

Warnes GR. “The Normal Kernel Coupler: An adaptive Markov Chain Monte Carlo method for efficiently sampling from multi-modal distributions,” Technical Report no. 395, Department of Statistics, University of Washington, Apr. 2001. http://www.stat.washington.edu/research/reports/2001/tr395.pdf 

Warnes GR. “HYDRA: A Java library for Markov Chain Monte Carlo,” Technical Report no. 394, Department of Statistics, University of Washington, Apr. 2001. http://www.stat.washington.edu/research/reports/2001/tr394.pdf

Warnes GR. “The Normal Kernel Coupler: An adaptive Markov Chain Monte Carlo method for efficiently sampling from multi-modal distributions,” Ph.D. thesis, Department of Biostatistics, University of Washington, Oct. 2000.



Published Software


Warnes GR. “SII”, an R package that calculates the ANSI S3.5-1997 Speech Intelligibility Index (SII), a standard method for computing the intelligibility of speech from acoustical measurements of speech, noise, and hearing thresholds. http://cran.r-project.org/package=SII, 2010-

Warnes GR. “SASxport”, an R package that provides functions for reading, listing the contents of, and writing SAS xport format files, fully supporting translation between R “factor” objects and SAS FORMAT labels. http://cran.r-project.org/package=SASxport, 2007-

Warnes GR, Miao H, Wu C, LeBlanc A, Wu H. “DEDiscover”, a cross-platform software tool for differential equation model simulation and estimation, designed with special attention to the features necessary for modeling the interaction between the human immune system and viruses. https://cbim.urmc.rochester.edu/software/dediscover, 2007-

Qiu W, Lazarus R, Warnes GR, Jain N. “GeneticsQC”, a package of classes and functions for the open-source statistical package R that checks the quality of a genetics data set (as a geneSet object), e.g., by reporting counts of missing genotypes for markers or subjects, checking for Mendelian errors, and testing for Hardy-Weinberg Equilibrium (HWE). This package also provides functions to filter out low-quality markers, subjects, and/or families. http://r-genetics.org, 2007-

Warnes GR, Duffy D, Man M, Qiu W, Lazarus R. “GeneticsDesign”, a package for the open-source statistical package R that provides classes and functions for designing genetics studies, including power and sample-size calculations. http://bioconductor.org/packages/2.6/bioc/html/GeneticsDesign.html, 2007- 

Qiu W, Lazarus R, Warnes GR, Jain N. “fbat”, a package for the open-source statistical software package R that implements a broad class of Family Based Association Tests for genetics data, with adjustments for population admixture using the code from the 'FBAT' software program. http://bioconductor.org/packages/2.6/bioc/html/fbat.html, 2006- 

Warnes GR, Lazarus R, Chasalow SD, Montana G, O'Connel M, Cheng J, Jain N. “GeneticsBase”, a package for the open-source statistical package R that provides classes and functions for handling and analyzing large-scale genetic data (up to 1e6 markers). http://bioconductor.org/packages/2.6/bioc/html/GeneticsBase.html, 2005

Warnes GR. “WardListing”, a tool for downloading LDS Ward Membership information from the official LDS web site, and translating it into formats appropriate for importing into address book software. http://www.warnes.net/Software/WardListing, 2005- 

Smith CR and Warnes GR. “Rlsf”, a package of functions for the open-source statistical package R that provides functions for using R with the LSF cluster/grid queuing system. http://cran.r-project.org/package=Rlsf, 2005-

Warnes GR and Li F. “ssize”, a package of functions for the open-source statistical package R that provides functions for computing and displaying sample size information for gene expression arrays. http://bioconductor.org/packages/2.6/bioc/html/ssize.html, 2004-

Warnes GR. “gmodels”, a package of functions for the open-source statistical package R that provides various R programming tools for model fitting. http://cran.r-project.org/package=gmodels, 2005

Warnes GR. “gdata”, a package of functions for the open-source statistical package R that provides various R programming tools for data manipulation. http://cran.r-project.org/package=gdata, 2005-

Warnes GR. “gtools”, a package of functions for the open-source statistical package R that provides various general purpose programming tools. http://cran.r-project.org/package=gtools, 2005

Warnes GR. “gplots”, a package of functions for the open-source statistical package R that provides various R programming tools for plotting data. http://cran.r-project.org/package=gplots, 2005-

Moreira W, Warnes GR. “rpy”, a robust Python interface to the R programming language. http://rpy.sf.net, 2004-

Warnes GR. “fork”, a package of functions for the open-source statistical package R that provides simple wrappers around the Unix process management API calls: fork, wait, waitpid, kill, and _exit. This enables construction of R programs that utilize multiple concurrent processes. http://cran.r-project.org/package=fork, 2003-

Warnes GR. “fpconst”, a Python library providing constants and functions for creating and detecting IEEE 754 floating-point special values. http://research.warnes.net/projects/RStatServer/fpconst, 2003-

Warnes GR. “CSVFile”, objects for the open-source web application development system Zope that automatically detect uploaded Microsoft Excel files and translate them into comma-delimited text files. http://research.warnes.net/projects/RStatServer/csvfile/, 2003-

Warnes GR. “RSessionDA”, a Python module to allow Zope access to the features of the open-source statistical package/language R. http://research.warnes.net/projects/RStatServer/rsessionda/, 2003- 

Warnes GR. “session”, a package of functions for the open-source statistical package R that permits the state of an R session to be saved and restored, as well as functions for capturing the result and output of evaluating strings containing R commands. http://cran.r-project.org/package=session, 2002-

Warnes GR. “RSOAP”, a server providing access to the features of the open-source statistical package R via the SOAP protocol. http://research.warnes.net/projects/RStatServer/rsoap/, 2002- 

Ullman C, Matthews B, Warnes GR, Blunk C. “SOAPpy”, a SOAP implementation for Python. http://pywebsvcs.sourceforge.net/, 2002-2004 

Warnes GR and Leisch F. “genetics”, a package for handling marker-based genetic data within the open-source statistical package R. The package includes functions to compute allele frequencies, use genetic markers in statistical models, estimate disequilibrium, and test for departure from Hardy-Weinberg equilibrium. http://cran.r-project.org/package=genetics, 2002-

Warnes GR et al.  “gregmisc”, a package of useful utility functions for the open-source statistical package R. Most functions in the gregmisc library fall into five general areas: permutations and combinations, tools for linear models, plots, data manipulation, and fixed or extended versions of existing functions.  http://cran.r-project.org/package=gregmisc, 2001- 

Warnes GR. “DistLib”, a Java library containing classes for computing features of and generating random numbers from a variety of statistical distribution functions. http://statdistlib.sourceforge.net/, 2000-

Warnes GR. “mcgibbsit”, a package for the open-source statistical package R implementing the MCGIBBSIT diagnostic software for multiple (potentially interrelated) MCMC samplers. http://cran.r-project.org/package=mcgibbsit, 2000-

Warnes GR. “HYDRA”, an open-source, platform-neutral library for performing Markov Chain Monte Carlo. It implements the logic of standard MCMC samplers within a framework designed to be easy to use and to extend while allowing integration with other software tools. http://research.warnes.net/projects/mcmc/hydra, 2000- 

Warnes GR. “ClusterNFS”, an NFS server that allows diskless NFS clients to share a common file system by providing for host, user, and group-specific files within the same directory structure via interpreted file name extensions. http://ClusterNFS.sourceforge.net, 1999-