This tutorial will guide you through a basic analysis of fictional data. It is intended as a worked example of how to start thinking about analysing data in R, and is in no way intended to be authoritative. Much more comprehensive guides are available on the internet and in published textbooks. My hope is that this is a useful primer, particularly for those researching cognitive impairment and the use of neuropsychological tests.
You can work through this tutorial in a stats package of your choice, but the worked examples here are all in R. You can download R and RStudio for free (available for all platforms). Below is what RStudio looks like.
I learnt R by making sense of some code which was written for a similar purpose, then trying to adapt that code to my own ends. This tutorial is designed very much with this mode of learning in mind. Sample code in R is supplied at the bottom of each page. This isn't a stats course; rather, this tutorial assumes you have a working knowledge of stats somewhere at the back of your mind, and just need to be shown how to do things in practice (and in R).
If you've never used R before, you will probably have to install all the packages or libraries used. For each library that you don't have, install it.
e.g. install.packages("dplyr")
There are new libraries to use/install on each page.
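If you'd rather not check each library by hand, here is a minimal sketch of one way to install only what is missing. The helper names missing_pkgs and install_missing are my own (hypothetical), and "dplyr" is just the example from above; extend the list for each page.

```r
# Return the subset of `pkgs` that is not yet installed.
# requireNamespace() checks for a package without attaching it.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install whatever is missing; does nothing if everything is already there.
install_missing <- function(pkgs) {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Usage (uncomment to run; this needs an internet connection):
# install_missing(c("dplyr"))
```

Once a package is installed, load it at the top of your script with library(), e.g. library(dplyr).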
I hope you find this tutorial useful. Please email lachlan.fotheringham@gmail.com with any queries or comments.
Vignette:
You are a researcher working in Fife, Scotland. You have just completed a study in the village of Limekilns on Fife syndrome. Fife syndrome affects cognition in older people, which you hope to detect using one of three candidate pen-and-paper tests. The study aimed to establish which was the most appropriate test to use in the clinic. You only enrolled participants aged >=60 years. This was a cross-sectional study.
Here is what the data looks like:
Please download this data. This is a CSV file - a generic format for use with any spreadsheet or stats program. This can be a useful format for passing data around.
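As a rough sketch of getting a CSV into R: the example below writes a tiny made-up file to a temporary location and reads it back, so it runs without the real download. The column names (id, age, sex, diagnosis, testA, testB, testC) and every value are assumptions for illustration; check them against the actual file.

```r
# Hypothetical stand-in for the downloaded file: a few invented rows written
# to a temporary CSV so this example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,age,sex,diagnosis,testA,testB,testC",
  "1,72,female,yes,18,25,40",
  "2,65,male,no,27,29,52",
  "3,81,female,yes,,22,38"   # empty field: testA is missing for this person
), csv_path)

# With the real data, pass the path of the file you downloaded instead.
dat <- read.csv(csv_path, stringsAsFactors = FALSE)

str(dat)              # column names and types
head(dat)             # first few rows
colSums(is.na(dat))   # number of missing values in each column
```

It's worth running str() and head() every time you load data, just to confirm R has read the columns the way you expect.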
1. Please create a table to summarise participant characteristics - sometimes called a demographics table.
This online guide might help you understand what a demographics table is if you're not sure.
It might look something like this...
Make sure to check your target journal for the exact requirements.
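Here is one possible sketch of a demographics table in base R, split by diagnosis. The data are simulated, and the variables (age, sex) and the summary statistics chosen are my assumptions; swap in whichever characteristics your study actually recorded.

```r
set.seed(1)

# Simulated participants for illustration only; not the tutorial's data.
dat <- data.frame(
  diagnosis = rep(c("no", "yes"), each = 50),
  age       = sample(60:90, 100, replace = TRUE),
  sex       = sample(c("female", "male"), 100, replace = TRUE)
)

# Split into one data frame per diagnosis group, then summarise each group.
groups <- split(dat, dat$diagnosis)

demographics <- sapply(groups, function(g) c(
  "n"              = as.character(nrow(g)),
  "Age, mean (SD)" = sprintf("%.1f (%.1f)", mean(g$age), sd(g$age)),
  "Female, n (%)"  = sprintf("%d (%.0f%%)",
                             sum(g$sex == "female"),
                             100 * mean(g$sex == "female"))
))

# Rows are characteristics, columns are diagnosis groups.
print(demographics, quote = FALSE)
```

There are packages which build these tables for you, but doing it by hand once makes it clearer what each cell contains.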
Luckily you find a reference textbook which includes all the tests you are using (below).
2. Now inspect the data. Are there any problems? How would we detect them? Specifically, look for test scores that are out of range. Consider what to do about them.
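One way to look for out-of-range scores is to compare each value against the published minimum and maximum. In this sketch the scores are invented and the 0-30 range is a placeholder, not the real range of any of the three tests; take the true limits from your reference text.

```r
# Invented scores; ids 3 and 5 are deliberately impossible.
dat <- data.frame(
  id    = 1:6,
  testA = c(12, 25, 31, 18, -1, 29)
)

# ASSUMPTION: placeholder limits. Replace with the published range of the test.
valid_min <- 0
valid_max <- 30

# TRUE where a score falls outside the allowed range.
out_of_range <- dat$testA < valid_min | dat$testA > valid_max

# Rows to check against the original paper records.
dat[which(out_of_range), ]

# One option among several: treat impossible values as missing, not as guesses.
dat$testA_clean <- ifelse(out_of_range, NA, dat$testA)
```

Whether you correct, exclude, or set such values to missing is a judgement call; whatever you choose, record it so it can be reported.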
This density plot below (similar to a histogram) shows one way of inspecting data. Notice the blip around 20. Can this be right? What should we do about this?
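A density plot like this can be drawn in base R with density() and plot(). The scores here are simulated with a deliberate cluster at 20 to mimic the blip; the means, spread, and counts are all my assumptions.

```r
set.seed(42)

# Simulated scores: most sit around 10, plus ten identical entries of 20.
scores <- c(rnorm(190, mean = 10, sd = 2), rep(20, 10))

# Kernel density estimate: a smoothed version of a histogram.
dens <- density(scores, na.rm = TRUE)
plot(dens, main = "Density of test scores (simulated)", xlab = "Score")
rug(scores)  # tick marks along the axis show individual observations

# Tabulate the values in the blip to see whether they are a few repeated entries.
table(scores[scores > 18])
```

If the blip turns out to be the same value repeated many times, that points towards a data entry or coding issue rather than a genuine subgroup.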
3. Use a statistical test of your choice to measure the association between each of your candidate tests and the diagnosis of Fife Syndrome. Which looks most promising at this stage?
You may need to use an online flow chart to help you choose a test. This isn't a tutorial on statistical test choice.
Your answer might look something like this:
  .y.   group1 group2    n1    n2 statistic    df        p p.signif
  <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>    <dbl> <chr>
1 testA no     yes      100    99      7.82  194. 3.27e-13 ****
2 testB no     yes      100   100      3.55  197. 4.88e- 4 ***
3 testC no     yes      100   100      3.51  196. 5.61e- 4 ***
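The non-integer degrees of freedom in the output above suggest a Welch t-test, though that is my reading rather than something stated here. As a minimal sketch, assuming a t-test suits your data, base R's t.test() can be run for each candidate test in turn. The data below are simulated, so the numbers will not match the table.

```r
set.seed(123)

# Simulated scores for illustration; group means and SDs are invented.
dat <- data.frame(
  diagnosis = rep(c("no", "yes"), each = 100),
  testA = c(rnorm(100, 25, 3), rnorm(100, 21.0, 4)),
  testB = c(rnorm(100, 25, 3), rnorm(100, 23.5, 3)),
  testC = c(rnorm(100, 25, 3), rnorm(100, 23.5, 3))
)

tests <- c("testA", "testB", "testC")

# Welch two-sample t-test (the default in t.test) for each candidate test,
# comparing the "no" group with the "yes" group.
results <- do.call(rbind, lapply(tests, function(v) {
  tt <- t.test(dat[[v]][dat$diagnosis == "no"],
               dat[[v]][dat$diagnosis == "yes"])
  data.frame(test      = v,
             statistic = unname(tt$statistic),
             df        = unname(tt$parameter),
             p         = tt$p.value)
}))

results
```

A larger statistic and smaller p-value indicate a bigger difference between groups, but remember this says nothing yet about confounding.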
It's probably also useful to think about the association of your potential confounders with the diagnosis of Fife syndrome.
4. Use an appropriate statistical test to look at the association between each of your confounders and the diagnosis.
e.g. below
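As a sketch of how this might be done, the example below assumes two confounders, age (continuous) and sex (categorical); these are my assumptions, so substitute the confounders in your own data. A t-test is used for age and a chi-squared test for sex, on simulated values.

```r
set.seed(7)

# Simulated confounders for illustration only.
dat <- data.frame(
  diagnosis = rep(c("no", "yes"), each = 100),
  age       = c(sample(60:85, 100, replace = TRUE),
                sample(65:90, 100, replace = TRUE)),
  sex       = sample(c("female", "male"), 200, replace = TRUE)
)

# Categorical confounder: chi-squared test on the sex-by-diagnosis table.
sex_table <- table(dat$sex, dat$diagnosis)
sex_test  <- chisq.test(sex_table)

# Continuous confounder: Welch t-test of age between diagnosis groups.
age_test <- t.test(age ~ diagnosis, data = dat)

sex_table
c(sex_p = sex_test$p.value, age_p = age_test$p.value)
```

A confounder that differs clearly between groups is one you will probably want to adjust for later on.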
Here is some R code to help you along with tasks 1-4: