The following are reference primers to help other analysts learn a variety of analytical software and concepts. These primers were created, in large part, to help past students and colleagues quickly re-learn class or analytical material. Please feel free to use them, as they will be up for life!
R
Chad has a wealth of experience programming in the statistical language R. A large majority of Chad's industry, academic, and government-related work at comScore involved coding in R, and he has written several advanced scripts. On several occasions, Chad has instructed others on how to make their scripts more efficient in both run time and resource use. Below are some of the introductory primers that Chad has written in R for the general community.
1. General Tutorial on Basic Programming in R - This introductory primer provides a basic overview of the various data structures available in R. It is designed mostly for beginners and covers the basic functionality of R. Here is a PPT deck that corresponds to this basic primer.
2. Functional Programming Tutorial in R - This primer gives an overview of functional programming in R using the apply, sapply, and lapply functions. The tutorial also gives an overview of the plyr package for subset processing with various data types.
3. Statistical Programming in R - This primer gives an overview of R's statistical functions and is divided into four parts. The first part covers basic statistical functions; the second details the statistical distribution functions, such as those for the normal and beta distributions; the third covers basic regression modeling with the lm and glm functions; and the fourth provides an overview of classification models such as the SVM and linear regression models.
4. Graphical Production Tutorial in R - This primer gives an overview of the various graphical functions in R. This includes the graphics library in R and the ggplot2 library.
The following gives an overview of some of the graphical productions that Chad has made in R. Each of the figures provides an example of the powerful graphical abilities available in R. The R script used to produce all these plots is provided at the very end.
a. Correlation and Scatterplot - An example showing correlations and scatterplots between several variables in the same plot. Very useful for any initial or exploratory data analysis.
b. Density and Smoothing Plot - A density plot and a smoothing plot displayed side by side in the same figure. The smoothing plot has both a local smoothing fit and an lm fit. These figures demonstrate how multiple statistical models can be displayed in the same plot. This type of figure is also very useful for exploratory analysis.
c. Faceted Data Plot - A scatterplot which is faceted (or sub-divided) on a categorical variable relative to the x-axis. From a data visualization standpoint, this can be extremely useful.
d. ggplot2 Position Placement - A figure with 3 scatterplots from the ggplot2 library positioned at different locations on the figure. This demonstrates how to place individual plots at different locations within a single figure.
e. Graphical Production Script - The R script used to produce all the aforementioned graphical plots.
Scripts for Large-Scale Data Processing:
5. Parallel Processing in R - Chad has experience with parallel execution jobs in R. This primer provides a brief overview of the snow package in R. The snow package is one of R's main parallel execution packages.
6. data.table Package in R - The data.table package in R allows for fast processing of several common data operations such as aggregation and merging. This primer provides an overview of the package's core abilities. The last section is dedicated to benchmarking data.table functions against data.frame functions.
7. ODBC Connections in R - This tutorial provides an introduction to making ODBC connections in R, with several examples of connecting to Microsoft Access and SQL Server. Also included is code to write back to a database (e.g., Access).
Applied Study Design
Primer on the Study Design for Observational and Experimental Studies - This is a detailed primer on how to design observational and experimental studies. The objective of the primer is to explain, in as simple terms as possible, the step-by-step design strategy for an observational or experimental study. It was written to be general enough to apply to almost any kind of observational or experimental setting.
Simio
The following is a 'golden' reference model for Simio (divided into several files due to space limitations with the personal edition). Simio is a discrete event simulation language.
This golden reference model is divided into sub-models, and each sub-model demonstrates a basic Simio programming concept. The model's documentation is color-coded: how-to sections (green), important concepts (blue), and technical notes (black). The model is meant to be a quick refresher for anyone learning Simio.
Python
Python Data Mining and Syntax - A reference primer in Python to quickly learn the basic syntax and data mining (aggregation, subset, merging).
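As a rough illustration of the three operations named above (aggregation, subset, merging), here is a minimal sketch in plain Python using only the standard library. The sample records, field names, and helper functions (subset, aggregate_sum, inner_merge) are hypothetical and made up for this example; they are not taken from the primer, which may well use a library such as pandas instead.

```python
from collections import defaultdict

# Hypothetical sample data, invented for illustration (not from the primer).
orders = [
    {"order_id": 1, "customer_id": "a", "amount": 20.0},
    {"order_id": 2, "customer_id": "b", "amount": 35.0},
    {"order_id": 3, "customer_id": "a", "amount": 15.0},
    {"order_id": 4, "customer_id": "c", "amount": 50.0},
]
customers = [
    {"customer_id": "a", "region": "East"},
    {"customer_id": "b", "region": "West"},
]


def subset(rows, predicate):
    """Subset: keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def aggregate_sum(rows, key, value):
    """Aggregation: sum the `value` field within each group defined by `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def inner_merge(left, right, key):
    """Merging: inner join of two lists of records on a shared `key` field."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    # Left rows with no match on the right are dropped (inner join).
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


large_orders = subset(orders, lambda row: row["amount"] >= 20.0)
totals_by_customer = aggregate_sum(orders, "customer_id", "amount")
orders_with_region = inner_merge(orders, customers, "customer_id")
```

In this sketch the merge is an inner join, so the order from customer "c", who has no entry in customers, does not appear in orders_with_region.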
MATLAB
MATLAB Basic Syntax - A reference primer in MATLAB to learn basic syntax.
MATLAB Data Mining - A reference primer in MATLAB to learn basic data mining (aggregation, subset, merging).
MATLAB ODBC Primer - A reference primer in MATLAB to create ODBC connections.
Class Material
The following are my class slides and supplementary files for all of my classes taught at the University of Dayton. I provide this material in the hope that it could be useful in the future for any of my past students - or anyone else. To my past students: just remember to give kudos to your favorite professor Dr. Kimmel. :)
IET 322/SYE571 - Data Analytics in R
IET 335 - Discrete Event Simulation in Simio (Lectures)
IET 335 - Discrete Event Simulation in Simio (Labs)