There are numerous statistical analysis methods, and mastering them can take years. For simple A/B tests on the web, however, it's enough to know how to apply a few simple techniques. This page shows you how.
Tools
R
The R scripts on this page make it easy to do the kinds of analysis we do in IS211. R is available for free on Windows, Macintosh, and Linux.
Running Examples
- First, either download and install R or sign up for and access RStudio Cloud. 
- When you run R, you will see the console window, where you type commands and see output. 
- Copy the example files (.r and .csv) into your home directory (~). 
  - On Windows, your home directory is "c:\Users\<your_username>\Documents" (also called "My Documents" in some places). 
  - On Macintosh, your home directory is "/Users/<your_username>". 
- To work in a different directory: 
  - Open the .r script in a text editor. 
  - Look for the line setwd("~") near the top of the script. 
  - Change "~" to any folder path you like. 
  - Use "/" to separate directory names (even on Windows). 
- Load your data into the .csv file. 
  - Edit the file with Microsoft Excel. 
  - See below for notes on each file's format. 
 
- Go to the R console window and run the script from the menu (or source it from the console; see the sketch after this list). 
  - On Windows, choose "File->Source R Code...". 
  - On Macintosh, choose "File->Source File...". 
 
- Output: 
  - Statistics will be printed to the console. 
  - Histograms (if any) will be saved as png files in your working directory. 
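If you prefer typing commands to using the menus, the same two steps can be done from the console. This is only a sketch: the folder path and the script name below are placeholders for wherever your own files live.

```
# Point R at the folder that holds your .r and .csv files
# (this path is only an example -- substitute your own).
setwd("C:/Users/<your_username>/Documents/is211")

# Sourcing a script from the console is equivalent to choosing
# "File->Source R Code..." (Windows) or "File->Source File..." (Macintosh).
source("paired-t-test.r")
```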
 
 
 
Getting Help
- "help(function_name)": Enter this into the R console for quick information on a function. 
- Go to Quick-R if you want to learn more about R. 
 
 
Other Tools
You can also run your t-test using Microsoft Excel; see the supplementary information about running t-tests in Excel.
These tools are not supported, but they have been used effectively by others in IS211.
Variable Types
- Interval (including Ratio): ordered, and the distance between values is meaningful. 
  - e.g. task time, number of clicks, number of errors, or money 
- Ordinal: ordered, but the distance between values is not meaningful. 
  - e.g. Likert scale responses (1-5, 1-7, etc.) 
- Nominal: not ordered. 
  - e.g. task success or click-through (True/False), preference (Version A/Version B) 
 
 
Interval Variable Tests (Normal Distributions Only)
The most common case in IS211 is to compare the mean of some interval variable between version A and version B of an interface. The statistical test we use depends on the type of experiment we are running.
Note that these tests can only be used if the variable is normally distributed. Check the variable's histogram, and if it does not look normally distributed, use one of the tests for ordinal variables instead. Variables that are often normally distributed include task time, number of clicks, and money.
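A quick way to do this check is to read your .csv into R and plot a histogram, writing it to a png file the same way the scripts report theirs. The file name and the column name VersionA below are placeholders for your own data.

```
# Eyeball the distribution of one condition before trusting a t-test.
# The file name and column name are examples only -- use your own.
data <- read.csv("paired-t-test.csv")

png("versionA-histogram.png")   # write the plot to your working directory
hist(data$VersionA, main = "Histogram of Version A", xlab = "Value")
dev.off()
```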
- Within-subjects: Paired-Sample T-Test 
  - script: paired-t-test.r 
  - input: paired-t-test.csv 
    - One column for each condition. 
    - The first (header) row has the names of the conditions. 
    - Each row after the first contains data for one participant. 
- Between-subjects: Welch Two-Sample T-Test 
  - script: welch-t-test.r 
  - input: welch-t-test.csv 
    - One column for each condition. 
    - The first (header) row has the names of the conditions. 
    - Each row after the first contains two different participants' data. 
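Both of these tests are available in base R as the t.test function, so a minimal sketch of the analysis (not the course scripts themselves) looks roughly like the following; the column names VersionA and VersionB stand in for whatever your header row contains.

```
# Within-subjects: paired-sample t-test on the two condition columns.
within_data <- read.csv("paired-t-test.csv")
t.test(within_data$VersionA, within_data$VersionB, paired = TRUE)

# Between-subjects: Welch two-sample t-test (t.test's default when the
# samples are unpaired and var.equal is left FALSE).
between_data <- read.csv("welch-t-test.csv")
t.test(between_data$VersionA, between_data$VersionB, paired = FALSE)
```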
 
 
 
Ordinal Variable Tests (Also for Non-Normal Interval Variables)
Use one of the following tests to compare an ordinal variable, such as Likert scale responses, between version A and version B of an interface. Again, the statistical test we use depends on the type of experiment we are running.
You should also use these tests instead of t-tests for comparing means of interval variables whenever the histogram for the variable does not look normally distributed (for example, when it is heavily skewed in one direction). Variables like number of errors are usually skewed toward 0, so we often don't bother with t-tests at all in this case.
(Are you wondering why we use the t-test at all? It's because the t-test has more power to detect significant results. In other words, it takes fewer participants to see significant results. So use the t-test if you can.)
- Within-subjects: Wilcoxon test 
  - script: wilcoxon.r 
  - input: wilcoxon.csv 
    - One column for each condition. 
    - The first (header) row has the names of the conditions. 
    - Each row after the first contains data for one participant. 
- Between-subjects: Mann-Whitney U test 
  - script: mann-whitney-u.r 
  - input: mann-whitney-u.csv 
    - One column for each condition. 
    - The first (header) row has the names of the conditions. 
    - Each row after the first contains two different participants' data. 
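Both of these are covered by base R's wilcox.test function: paired = TRUE gives the Wilcoxon signed-rank test, and paired = FALSE gives the rank-sum test, which is equivalent to the Mann-Whitney U test. A minimal sketch, again with placeholder column names:

```
# Within-subjects: Wilcoxon signed-rank test on paired condition columns.
within_data <- read.csv("wilcoxon.csv")
wilcox.test(within_data$VersionA, within_data$VersionB, paired = TRUE)

# Between-subjects: Mann-Whitney U test (Wilcoxon rank-sum test in R).
between_data <- read.csv("mann-whitney-u.csv")
wilcox.test(between_data$VersionA, between_data$VersionB, paired = FALSE)
```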
 
 
 
Nominal Variable Tests
Comparing Counts
The following test compares counts of a nominal variable. This is useful for comparing how often an event occurs between two versions of an interface. For example, how many users succeed in completing a task or click a link when using Version A vs. Version B? The following test will help you check whether or not these differences are significant. This test works for both within-subjects and between-subjects experiments.
- Fisher's Exact Test 
  - script: fishers-exact.r 
  - input: fishers-exact.csv 
    - The first (header) row contains column labels (can be changed). 
    - Each row after the first contains one sample. 
      - For within-subjects, each participant will have two rows. 
    - The first column has the name of the condition. 
    - The second column has the value. 
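In base R, this long format (one row per sample) can be cross-tabulated with table and passed to fisher.test. A rough sketch, with the placeholder column names Condition and Success standing in for whatever your header row says:

```
# Build a 2x2 contingency table from the long-format data, then test it.
# "Condition" and "Success" are placeholder column names.
samples <- read.csv("fishers-exact.csv")

counts <- table(samples$Condition, samples$Success)
print(counts)        # sanity-check the contingency table before testing
fisher.test(counts)
```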
 
 
 
Comparing Preferences
In within-subjects experiments, it's fairly common to ask participants to choose between two alternatives (e.g. "Do you prefer A or B?"). A binomial test can help you determine if you have enough participants to show that this preference is significant.
- Binomial test 
  - script: binomial.r 
  - input: binomial.csv 
    - All data should be in one column. 
    - One row for each participant. 
    - Each row should have the name of the condition that the participant preferred. 
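In base R this corresponds to the binom.test function: count how many participants preferred one version and test that count against a 50/50 null hypothesis. A minimal sketch, with the column name Preference and the label "A" as placeholders for your own data:

```
# Binomial test of preference counts against an expected 50/50 split.
# "Preference" and "A" are placeholders -- match them to your own file.
prefs <- read.csv("binomial.csv")

n_a     <- sum(prefs$Preference == "A")
n_total <- nrow(prefs)
binom.test(n_a, n_total, p = 0.5)
```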