00. YOUR FIRST STEPS

This tutorial takes only about 15 minutes. You will quickly and easily learn the essential concepts you need to carry out statistical analyses on your own with this toolbox. By the end of this short tutorial you will be able to follow and understand every stage of the statistical analysis of a data matrix, using the help you will find in each section of this website.

Next you will learn the following concepts: VECTOR, ASSIGNMENT, INDEXING, DATA.FRAME, LIST and FUNCTION.

BASICS IN R.

  • Run R and set the working directory (File -> Change dir). Open a script file (File -> Open script) or create a new one (File -> New script).

  • With the cursor on the first line of the script you have opened or created, press the key combination that runs the line (CTRL + R on Windows, Cmd + Enter on Mac, CTRL + Enter on Linux). The result appears in the CONSOLE (the R window that displays the results).

VECTORS, INDEXING and ASSIGNMENTS

Now you are ready to take your first steps in R. First create a vector (e.g. my.vector). Write the following line in the script created above:

my.vector= c(2,3,7,10,14,16,8,9,18,5)

If you now press CTRL + R (hereinafter [CR]) you will see that command echoed in the console. You have created a vector using the function c( ). In R you should always use c( ) to enter several values into a vector. You have assigned the numbers 2, 3, ... 5 to an object called my.vector. To view the contents of that object (which acts as a store), just write my.vector and press [CR]; you will see the sequence of numbers that the object my.vector contains:

> my.vector

[1] 2 3 7 10 14 16 8 9 18 5
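The assignment above can also be written with the arrow operator <-, which is the more traditional form in R; both forms store the values under the given name. A minimal sketch in base R (length( ) and class( ) are base functions, not part of this toolbox; my.vector2 and longer.vector are names made up for the example):

```r
# Two equivalent ways to assign a vector to a name
my.vector = c(2,3,7,10,14,16,8,9,18,5)
my.vector2 <- c(2,3,7,10,14,16,8,9,18,5)

identical(my.vector, my.vector2)   # TRUE: both objects hold the same values
length(my.vector)                  # number of values stored: 10
class(my.vector)                   # "numeric"

# c( ) also joins existing vectors and new values into a longer vector
longer.vector = c(my.vector, 20, 21)
length(longer.vector)              # 12
```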

If we want to access the number in the 4th position of the object my.vector, we type the position in brackets:

> my.vector[4]

[1] 10

Numbers in positions 1 to 5:

> my.vector[1:5]

[1] 2 3 7 10 14

Positions 3, 7, and 9 to the end (since these positions are not consecutive, we combine them with c( )):

my.vector[c(3,7,9:10)]

> my.vector[c(3,7,9:10)]

[1] 7 8 18 5

All the numbers except those in the first and last positions (note the minus sign before c( )):

my.vector[-c(1,10)]

> my.vector[-c(1,10)]

[1] 3 7 10 14 16 8 9 18
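As a small complement to the indexing examples above, the last position can be computed with the base function length( ) instead of being typed by hand, so the same line still works if the vector grows. A sketch, where n is just a helper name chosen for the example:

```r
my.vector = c(2,3,7,10,14,16,8,9,18,5)
n = length(my.vector)            # 10 in this example

my.vector[c(3,7,9:n)]            # positions 3, 7 and 9 to the end
my.vector[-c(1,n)]               # everything except the first and last values
my.vector[n]                     # the last value only
```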

We can also create a vector with character (qualitative) values:

between.factor = c('sex','zone')

between.factor = c('sex','zon)

Notice the quotation marks around the variable names. Running the second line produces an error, since the apostrophe that opens zone was never closed.

As we will see, this kind of vector is needed, for example, to tell R that we have two between factors, sex and zone.
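A minimal sketch of both cases in base R. To keep the script runnable, the faulty line is handed to parse( ) as text and the resulting error is caught with tryCatch( ); this wrapper, and the names bad.line and result, are only for illustration and are not how you would normally write the line.

```r
between.factor = c('sex','zone')
class(between.factor)      # "character"
length(between.factor)     # 2
between.factor[2]          # "zone"

# The line with the unclosed apostrophe, checked without stopping the script
bad.line = "between.factor = c('sex','zon)"
result = tryCatch(
  { parse(text = bad.line); 'ok' },
  error = function(e) 'syntax error'
)
result                     # "syntax error"
```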

DATA.FRAME

R organizes all information in two fundamental types of storage that you should know: data.frame and list. To begin to work with these types of storage we will use one of the data files available as demos in R. For convenience, we assign the content of the OBrienKaiser database to the object dat. Write the following in your script and execute it [CR]:

dat = OBrienKaiser

head(dat)

The command head( ) is very important in R and worth remembering. By default it shows you the first 6 records of any object that contains data. In this case the object dat belongs to the class data.frame. A data.frame is very similar to the spreadsheets you usually handle in office software: it usually (not always) has subjects in rows and variables in columns. Our OBrienKaiser database corresponds to a split-plot design with treatment and gender as between factors and two repeated-measures factors: phase with 3 levels (before, after and follow-up) and hour with 5 levels (5 measurements, one per hour, in each phase). Therefore this is a 3 x 2 x 3 x 5 design.
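head( ) also accepts the number of records to show. A sketch with the built-in iris data set, used here only so that the lines run in a plain R session (it is a stand-in, not the OBrienKaiser data):

```r
dat = iris            # built-in example data, stands in for OBrienKaiser
head(dat)             # first 6 records (the default)
head(dat, n = 3)      # first 3 records only
tail(dat, n = 3)      # last 3 records
class(dat)            # "data.frame"
```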

Another useful function is var.names( ) or vn( ). It gives you the names of the variables in your database, the column each one occupies and its type (numeric, integer, factor, character, logical, etc.). Let's see its use with our running example.

var.names(dat)

var.names(dat, alphabetic=T)
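If the toolbox is not loaded, roughly the same information can be obtained with base R functions. This is only a rough equivalent, not the toolbox's own implementation of var.names( ); the built-in iris data set is used as a stand-in for the running example:

```r
dat = iris
names(dat)            # variable names, in column order
sapply(dat, class)    # type of each variable (numeric, factor, ...)
sort(names(dat))      # the same names in alphabetical order
dim(dat)              # number of rows and columns: 150 5
```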

If you write dimension(dat) in the console or in your script you will see that we have 16 records in 17 columns (variables): 2 columns correspond to the between-group factors and the remaining 15 correspond to the 3 x 5 nesting of the repeated-measures factors phase * hour. A data.frame therefore has rows and columns, which can be accessed with the same indexing we used in the previous step, only now we need one index for rows and one for columns. Consider:

dat[which.rows, which.columns]
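Some concrete cases of this two-index form, sketched with the built-in iris data set as a stand-in for the running example. Leaving an index empty means "all rows" or "all columns":

```r
dat = iris
dat[2, 3]             # row 2, column 3: a single value (1.4)
dat[1:3, ]            # rows 1 to 3, all columns
dat[, 5]              # all rows, column 5
dat[1:3, c(1,5)]      # rows 1 to 3, columns 1 and 5
dat[-(1:100), 1:2]    # drop the first 100 rows, keep columns 1 and 2
dat$Species           # the same column 5, accessed by name with $
```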

You can also access any variable in a data.frame simply by typing the name of the store followed by the $ symbol and the name of the variable you want to see:

storage$variable

You can also create a new variable in a data.frame in the same way, with the $ symbol, assigning the value or values you want to that new variable:

storage$new.variable = log(storage$variable)

In this case we are creating a variable that is the log of another variable already in the data.frame.
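The same pattern, sketched on the built-in iris data set so that it runs as written; dat, Sepal.Length and log.length here play the roles of storage, variable and new.variable above:

```r
dat = iris
dat$Sepal.Length[1:5]                     # first 5 values of one variable
dat$log.length = log(dat$Sepal.Length)    # new variable, added as the last column
names(dat)                                # now includes "log.length"
head(dat$log.length, n = 3)               # first 3 values of the new variable
```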

LIST

While a data.frame is an object that stores data in rows (units or subjects) and columns (variables), a list is a collection of objects. Think of it as a chest of drawers, each drawer having a name or number, in which we can store any type of object regardless of its size or type. Recall that the split-plot 3 x 2 x 3 x 5 design of the previous data.frame had two within-group factors, which we call phase, with 3 levels, and hour, with 5. If you have a repeated-measures design like this, you must give R the names of the factors and their levels, properly nested, in a list called within.factor (created with the list function). Consider:

within.factor= list(phase=c('before','after','following'),hour=1:5)

Of course, if the nesting had been hour x phase, hour should be indicated first and then phase. The indexing of the elements of a list is slightly different from the previous cases.

To access an element of a list you can use a single bracket, e.g. [2]. But the result is still a list, so no arithmetic operation can be done on it; if you try, R will give you an error. If you want to run an arithmetic operation on an element of a list, you must extract that element with double brackets, e.g. [[2]]. The double bracket removes the list class, and the information thus extracted has its native class.
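The difference between the two kinds of bracket can be sketched with the within.factor list defined above. The failing operation is wrapped in tryCatch( ) only so that the script keeps running; outcome is a name chosen for the example:

```r
within.factor = list(phase=c('before','after','following'), hour=1:5)

class(within.factor[2])       # "list": single bracket keeps the list wrapper
class(within.factor[[2]])     # "integer": double bracket returns the content
within.factor[[2]] * 2        # 2 4 6 8 10
within.factor$hour * 2        # same result, element accessed by name
within.factor[['phase']][3]   # "following"

# Arithmetic on a single-bracket result fails; the error is caught here
outcome = tryCatch(within.factor[2] * 2, error = function(e) 'error')
outcome                       # "error"
```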

FUNCTION

In general, all computer applications work with functions that collect, by means of arguments, the elements that, properly combined, produce the result desired by the user. Calling a function typically requires the function name and, in parentheses, the arguments that the function uses.

name.of.function(arg1,arg2,arg3,...,arg.n)

In many cases the user wants to assign the result of that function to a new object:

object = name.of.function(arg1,arg2,arg3,...,arg.n)

In this toolbox, most functions have both mandatory and optional arguments. An optional argument normally takes a default value that the user can modify at any time. For example, take the function frequency.fnc( ), with which we can obtain the frequency or contingency table of one or more variables.

frequency.fnc(dat, variable='gender')

frequency.fnc(dat, variable='gender', prop=T)

We have added the argument prop=T. This is a logical argument, since only two values are possible: T (True) and F (False). In this example prop=T gives us the table with the relative frequencies of the variable gender. (By default the value is False, so we do not get the relative frequencies unless we ask for them with prop=T.)
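How an optional argument with a default value behaves can be sketched with a small function of our own. The function freq.sketch( ) and the data in my.dat are hypothetical and written only for this illustration; they are not the toolbox's frequency.fnc( ) and make no claim about how it works internally.

```r
# Hypothetical helper for illustration; NOT the toolbox's frequency.fnc( )
freq.sketch = function(data, variable, prop = FALSE) {
  counts = table(data[[variable]])            # absolute frequencies
  if (prop) prop.table(counts) else counts    # relative frequencies only on request
}

my.dat = data.frame(gender = c('F','M','F','F'))        # made-up data
freq.sketch(my.dat, variable = 'gender')                # F: 3, M: 1
freq.sketch(my.dat, variable = 'gender', prop = TRUE)   # F: 0.75, M: 0.25
```

In base R, T and F are ordinary variables that hold TRUE and FALSE and can be overwritten by accident, so writing TRUE and FALSE in full is the safer habit in your own scripts.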

All function names in this toolbox end in .fnc, for example frequency.fnc. In general, the syntax of all functions follows common rules:

  1. The first argument of the function is usually the name of the database on which we operate, and it never goes in quotation marks or apostrophes.

  2. Whenever we refer to a variable within the function.fnc syntax, we indicate it either by the number of its column (or columns) or by its name. If we use the name, it must always be in quotation marks or apostrophes.

See some examples:

descriptives.fnc(my.dat, variable=1:6)

descriptives.fnc(my.dat, variable=c(1,3,5,6) )

descriptives.fnc(my.dat, dv='weight', which.factor='sex')

quantile.fnc(my.dat, variable=c(2:5,7,12) )

quantile.fnc(my.dat, variable=c(2:5,7,12), which.factor='grade')
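Rule 2 mirrors how base R itself selects columns: by position or by quoted name. A sketch with the built-in iris data set as a stand-in; the toolbox functions above are not called here, and by.number and by.name are names chosen for the example:

```r
dat = iris
by.number = dat[, c(1,5)]                         # columns given by position
by.name   = dat[, c('Sepal.Length','Species')]    # same columns given by quoted name
identical(by.number, by.name)                     # TRUE: both select the same columns
which(names(dat) == 'Species')                    # 5: column number of a given name
```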

If you want quick help for a particular function, just write its name with parentheses and no arguments:

frequency.fnc( )