There are numerous statistical analysis methods, and mastering them can take years. For simple A/B tests on the web, however, it's enough to know how to apply a few simple techniques. This page shows you how.
Tools
R
The R scripts on this page make it easy to do the kinds of analysis we do in IS211. R is available for free on Windows, Macintosh, and Linux.
Running Examples
First, either download and install R or sign up for RStudio Cloud.
When you run R, you will see the console window, where you type commands and see output.
Copy the example files (.r and .csv) into your home directory (~).
On Windows, your home directory is "c:\Users\<your_username>\Documents".
This is also called "My Documents" in some places.
On Macintosh, your home directory is "/Users/<your_username>".
To work in a different directory:
Open the .r script in a text editor.
Look for the line "setwd("~")" near the top of the script.
Change "~" to any folder path you like.
Use "/" to separate directory names (even on Windows).
Load your data into the .csv file
Edit the file with Microsoft Excel
See below for notes on each file's format
Go to the R console window and run the script from the menu.
On Windows, choose "File->Source R Code..."
On Macintosh, choose "File->Source File..."
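The working-directory and run steps above can also be done from the R console. The following is a minimal sketch, not one of the course scripts: the folder is a temporary stand-in, and the file name example.r and the variable message_text are hypothetical names made up for illustration.

```r
# Stand-in folder so the sketch runs anywhere. In practice, use your own
# path with forward slashes, e.g. "C:/Users/<your_username>/Documents".
work_dir <- tempdir()

# Hypothetical one-line script, written here only so source() has a file to run.
writeLines('message_text <- "script ran"', file.path(work_dir, "example.r"))

old_dir <- setwd(work_dir)   # setwd() returns the previous working directory
source("example.r")          # runs the script, much like sourcing it from the menu
print(message_text)
setwd(old_dir)               # restore the original working directory
```

Saving the value returned by setwd() makes it easy to switch back afterwards.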
Output
Statistics will be printed to the console.
Histograms (if any) will be saved as png files in your working directory.
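For reference, saving a histogram as a png file in R usually looks something like the sketch below. This is an assumption about the general approach, not the code of the course scripts; the data values and the file name task_time_hist.png are invented for illustration.

```r
# Invented task times in seconds, for illustration only.
task_time <- c(12.1, 15.3, 9.8, 14.2, 11.7, 13.5, 10.9, 16.0)

# Hypothetical output file; a temporary folder is used so the sketch runs anywhere.
out_file <- file.path(tempdir(), "task_time_hist.png")

png(out_file)                                          # open a PNG graphics device
hist(task_time, main = "Task time", xlab = "Seconds")  # draw into the file
invisible(dev.off())                                   # close the device so the file is written
print(out_file)
```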
Getting Help
"help(function_name)": Enter this into the R console for quick information on a function.
Go to Quick-R if you want to learn more about R.
Other Tools
You can also run your t-test using Microsoft Excel. Here is some supplementary information about running a t-test in Excel.
These tools are not supported, but have been used effectively by others in IS211.
Variable Types
Interval (includes Ratio)
ordered, distance between values is meaningful
e.g. task time, number of clicks, number of errors, or money
Ordinal
ordered, distance between values is not meaningful
e.g. Likert scale response (1-5, 1-7, etc.)
Nominal
not ordered
e.g. task success or click-through (True/False), preference (Version A/Version B)
Interval Variable Tests (Normal Distributions Only)
The most common case in IS211 is to compare the mean of some interval variable between version A and version B of an interface. The statistical test we use depends on the type of experiment we are running.
Note that these tests can only be used if the variable is normally distributed. Check the variable's histogram and use a test for ordinal variables if your variable does not look normally distributed. Variables that are often normally distributed include task time, number of clicks, or money.
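As a rough illustration of the histogram check, the sketch below bins two invented variables without drawing anything, so the shape can be read from the counts. The variable names and values are made up for this example. In an interactive session you would normally just call hist(task_time) and look at the picture.

```r
# Invented data for illustration only.
task_time  <- c(31, 35, 38, 40, 41, 42, 43, 44, 46, 48, 51, 55)  # roughly symmetric
num_errors <- c(0, 0, 0, 0, 1, 0, 2, 0, 1, 0, 5, 0)              # piled up at 0

# plot = FALSE returns the bin counts instead of drawing the histogram.
time_hist  <- hist(task_time, plot = FALSE)
error_hist <- hist(num_errors, plot = FALSE)

print(time_hist$counts)   # counts that rise and fall suggest a bell-like shape
print(error_hist$counts)  # counts concentrated in the first bin suggest skew
```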
Within-subjects: Paired-Sample T-Test
script: paired-t-test.r
input: paired-t-test.csv
One column for each condition.
The first (header) row has the names of the conditions.
Each row after the first contains data for one participant.
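A minimal sketch of this test using R's built-in t.test() follows. It is not the contents of paired-t-test.r, which may differ; the condition names A and B and all values are invented, and the data is read from an inline string rather than a file so the example is self-contained.

```r
# Invented task times in seconds, laid out like the input format above:
# a header row of condition names, then one row per participant.
csv_text <- "A,B
41.2,36.5
38.7,35.1
45.0,39.8
36.4,37.0
42.9,38.2
40.1,36.9"
times <- read.csv(text = csv_text)

# paired = TRUE pairs each participant's A value with their own B value.
result <- t.test(times$A, times$B, paired = TRUE)
print(result)
```

With a real file you would call read.csv("paired-t-test.csv") instead of read.csv(text = csv_text).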
Between-subjects: Welch Two Sample T-test
script: welch-t-test.r
input: welch-t-test.csv
One column for each condition.
The first (header) row has the names of the conditions.
Each row after the first contains data from two different participants, one per condition.
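A minimal sketch of this test using R's built-in t.test() follows. It is not the contents of welch-t-test.r, which may differ; the condition names A and B and all values are invented, and the data is read from an inline string so the example is self-contained.

```r
# Invented task times in seconds. Each column holds a different group of
# participants, so values on the same row are unrelated to each other.
csv_text <- "A,B
41.2,33.5
38.7,36.1
45.0,30.8
36.4,37.0
42.9,34.2
40.1,31.9"
times <- read.csv(text = csv_text)

# var.equal = FALSE (the default) requests Welch's version of the t-test,
# which does not assume the two groups have the same variance.
result <- t.test(times$A, times$B, var.equal = FALSE)
print(result)
```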
Ordinal Variable Tests (Also for Non-Normal Interval Variables)
Use one of the following tests to compare an ordinal variable, such as Likert scale responses, between version A and version B of an interface. These tests work on the ranks of the values rather than on their means. Again, the statistical test we use depends on the type of experiment we are running.
You should also use these tests instead of t-tests for interval variables whenever the histogram for the variable does not look normally distributed (for example, when it is heavily skewed in one direction). Variables like number of errors are usually skewed toward 0, so we often don't bother with t-tests at all in this case.
(Are you wondering why we use the t-test at all? It's because the t-test has more statistical power when the variable really is normally distributed. In other words, it takes fewer participants to detect a real difference. So use the t-test if you can.)
Within-subjects: Wilcoxon test
script: wilcoxon.r
input: wilcoxon.csv
One column for each condition.
The first (header) row has the names of the conditions.
Each row after the first contains data for one participant.
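A minimal sketch of this test using R's built-in wilcox.test() follows. It is not the contents of wilcoxon.r, which may differ; the condition names A and B and the Likert responses are invented, and the data is read from an inline string so the example is self-contained.

```r
# Invented 1-5 Likert responses: one row per participant, one column per condition.
csv_text <- "A,B
4,3
5,3
3,2
4,4
2,1
5,4
4,2
3,3"
ratings <- read.csv(text = csv_text)

# paired = TRUE gives the Wilcoxon signed rank test.
# exact = FALSE uses the normal approximation, because Likert data usually
# contains tied values and an exact p-value cannot be computed with ties.
result <- wilcox.test(ratings$A, ratings$B, paired = TRUE, exact = FALSE)
print(result)
```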
Between-subjects: Mann-Whitney U test
script: mann-whitney-u.r
input: mann-whitney-u.csv
One column for each condition.
The first (header) row has the names of the conditions.
Each row after the first contains data from two different participants, one per condition.
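In R, the Mann-Whitney U test is run with wilcox.test() without the paired option, where it is reported as the Wilcoxon rank sum test. The sketch below is not the contents of mann-whitney-u.r, which may differ; the condition names A and B and the Likert responses are invented.

```r
# Invented 1-5 Likert responses. Each column holds a different group of
# participants, so values on the same row are unrelated to each other.
csv_text <- "A,B
4,2
5,3
3,2
4,1
5,3
4,2
3,4
5,1"
ratings <- read.csv(text = csv_text)

# Unpaired wilcox.test() is the Mann-Whitney U test.
# exact = FALSE uses the normal approximation, since the data contains ties.
result <- wilcox.test(ratings$A, ratings$B, exact = FALSE)
print(result)
```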
Nominal Variable Tests
Comparing Counts
The following test compares counts of a nominal variable. This is useful for comparing how often an event occurs between two versions of an interface. For example, how many users succeed in completing a task or click a link when using Version A vs. Version B? The following test will help you check whether or not these differences are significant. This test works for both within-subjects and between-subjects experiments.
Fisher's Exact Test
script: fishers-exact.r
input: fishers-exact.csv
The first (header) row contains column labels (can be changed).
Each row after the first contains one sample.
For within-subjects, each participant will have two rows.
The first column has the name of the condition.
The second column has the value.
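A minimal sketch of this test using R's built-in fisher.test() follows. It is not the contents of fishers-exact.r, which may differ; the column labels condition and success and all values are invented, and the data is read from an inline string so the example is self-contained.

```r
# Invented task-success data: one sample per row, with the condition name
# in the first column and the value in the second.
csv_text <- "condition,success
A,TRUE
A,TRUE
A,FALSE
A,TRUE
A,TRUE
A,TRUE
B,FALSE
B,TRUE
B,FALSE
B,FALSE
B,TRUE
B,FALSE"
samples <- read.csv(text = csv_text)

# Cross-tabulate condition against outcome to get a 2x2 table of counts.
counts <- table(samples$condition, samples$success)
print(counts)

result <- fisher.test(counts)
print(result)
```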
Comparing Preferences
In within-subjects experiments, it's fairly common to ask participants to choose between two alternatives (e.g. "Do you prefer A or B?"). A binomial test can help you determine if you have enough participants to show that this preference is significant.
Binomial test
script: binomial.r
input: binomial.csv
All data should be in one column
One row for each participant
Each row should have the name of the condition that the participant preferred
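A minimal sketch of this test using R's built-in binom.test() follows. It is not the contents of binomial.r, which may differ. The sketch assumes the column has a header row, here given the hypothetical name preferred, and the responses are invented.

```r
# Invented preference data: one row per participant, naming the preferred condition.
csv_text <- "preferred
A
A
B
A
A
A
B
A
A
A"
prefs <- read.csv(text = csv_text)

n_a     <- sum(prefs$preferred == "A")  # participants who preferred A
n_total <- nrow(prefs)                  # all participants

# p = 0.5 is the null hypothesis that A and B are equally likely to be preferred.
result <- binom.test(n_a, n_total, p = 0.5)
print(result)
```

The same p-value results if you count the participants who preferred B instead, since the test against p = 0.5 is symmetric.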