New variable

Copy, Paste and Adapt

dat$A=1
dat$A[dat$zone=='rural' & dat$age <=10]=0

my.list=list( F1=c(1,5,10,2,8,3), F3=c(9,11,7) )
dat=compute.fnc(dat, variables=my.list)

my.expression=list(P1='1.5*X1+log(X2+1)',
P2='sum(X2+X1)^2')
dat=compute.fnc(dat, expression=my.expression)

Objectives

New variables can be created in three ways:

  1. From conditions imposed on other variables in the database.

  2. By applying a summary function (mean, median, sum or sum of squares) to a set of user-defined variables.

  3. By applying a user-defined algorithm or numerical expression to p variables (a linear or nonlinear combination).

1.- New variable from conditions

Create a new variable whose values depend on user-imposed conditions on one or more other variables.

From the database OBrienKaiser we want to create a new variable named block.1 with two levels: high and low. The level low is assigned to subjects who belong to the control treatment and have a value lower than or equal to 4 in the variable fup.1; all other subjects are high.

IMPORTANT: The variables involved in the filter conditions should not contain missing values (NA). If any of them does, recode the NA values first, for example to -999. In the OBrienKaiser example that follows we proceed as if the variable treatment had missing values (NA).
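To see why missing values matter here, the following minimal sketch uses two small hypothetical vectors (not the OBrienKaiser data) and base R only. A comparison against NA returns NA rather than TRUE or FALSE, so the condition itself becomes undefined for that case. The base R replacement used to recode the NA is a stand-in chosen for this illustration, not the recode call of the tutorial.

```r
# Hypothetical vectors, invented for this illustration
treatment <- c("control", NA, "A")
fup.1     <- c(3, 2, 5)

# With a missing value the condition is neither TRUE nor FALSE for case 2
cond.with.na <- treatment == "control" & fup.1 <= 4
print(cond.with.na)   # TRUE NA FALSE

# Recode the NA to a sentinel value (base R stand-in for recode)
treatment[is.na(treatment)] <- "-999"

# Now every case gets a defined answer
cond.recoded <- treatment == "control" & fup.1 <= 4
print(cond.recoded)   # TRUE FALSE FALSE
```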

dat=OBrienKaiser

dat$treatment=recode(dat$treatment, "NA=-999")

dat$block.1='high'

dat$block.1[dat$treatment=='control' & dat$fup.1 <= 4]='low'

The instruction follows this scheme:

storage$new.variable[conditions]=new.value
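The scheme can be tried on a small self-contained example. This is a minimal sketch with a hypothetical data frame called toy (its values are invented and are not taken from OBrienKaiser): first every row receives the default level, then only the rows that meet both conditions are overwritten.

```r
# Hypothetical data frame, invented for this illustration
toy <- data.frame(
  treatment = c("control", "control", "A", "B", "control"),
  fup.1     = c(2, 7, 3, 9, 4),
  stringsAsFactors = FALSE
)

# Step 1: every row gets the default level
toy$block.1 <- "high"

# Step 2: storage$new.variable[conditions] = new.value
toy$block.1[toy$treatment == "control" & toy$fup.1 <= 4] <- "low"

print(table(toy$block.1))
```

Rows 1 and 5 are the only ones that are both control and at most 4 in fup.1, so they are the two cases that end up as low.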

frequency.fnc(dat, variable='block.1')

#------------------------------------------------------------------

# TABLE OF FRECUENCY

#------------------------------------------------------------------

$block.1

$block.1$n.total

[1] 16

$block.1$table

high  low
  12    4

We now restore the recoded values of the variable treatment to their original value NA.

dat$treatment=recode(dat$treatment, "-999=NA")

The most frequent mistakes when creating new variables this way are:

      1. Omitting the $new.variable.name part before the condition in brackets.

      2. Omitting the storage name (dat in this example) and the $ sign in front of each variable used in the condition.

2.- compute.fnc

    • Create new variables by applying a summary function (mean, median, sum or sum of squares) to user-defined variables of the database.

    • Create new variables by applying an algorithm indicated by the user, for example: (2*x1+log(x2)-3*1/x3)

NEW VARIABLES FROM THE APPLICATION OF A SUMMARY FUNCTION

From the database iqitems we compute, for each subject, the mean of the items in two factors that we call F1 and F3, following the factor structure shown in the figure of the reliability analysis calculated by omega.

First, we create a list with the column numbers of the variables whose means will form the factors defined in that figure. To do this, we first need to know which columns these variables occupy in the database.

var.names(iqitems)

   name      column type
1  reason.4       1 integer
2  reason.16      2 integer
3  reason.17      3 integer
4  reason.19      4 integer
5  letter.7       5 integer
6  letter.33      6 integer
7  letter.34      7 integer
8  letter.58      8 integer
9  matrix.45      9 integer
10 matrix.46     10 integer
11 matrix.47     11 integer
12 matrix.55     12 integer
13 rotate.3      13 integer
14 rotate.4      14 integer
15 rotate.6      15 integer
16 rotate.8      16 integer

The output shows which column each item occupies.
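If the toolbox is not at hand, a similar overview can be built with base R. The sketch below is an approximation under stated assumptions: toy is a hypothetical three-column data frame, and the overview table only imitates the layout of the listing above. It also shows how match() returns column positions from names, which avoids counting columns by eye.

```r
# Hypothetical data frame with three integer items
toy <- data.frame(reason.4 = 1:2, letter.7 = 3:4, matrix.46 = 5:6)

# Name, position and type of every column (base R imitation of the listing)
overview <- data.frame(
  name   = names(toy),
  column = seq_along(toy),
  type   = vapply(toy, function(v) class(v)[1], character(1)),
  row.names = NULL
)
print(overview)

# Column positions looked up by name, in the order requested
cols <- match(c("letter.7", "reason.4"), names(toy))
print(cols)   # 2 1
```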

my.list=list( F1=c(1,5,10,2,8,3),

F3=c(9,11,7) )

iqitems=compute.fnc(iqitems, variables=my.list)

*** The new variables: F1 F3 have beed created using the summary function mean ***

*** following next criteria:

$F1

[1] "reason.4" "letter.7" "matrix.46" "reason.16" "letter.58" "reason.17"

$F3

[1] "matrix.45" "matrix.47" "letter.34"

head(iqitems[,c('F1','F3')])

         F1       F3
5  3.666667 4.333333
6  3.166667 4.000000
7  3.666667 4.000000
8  2.666667 4.000000
9  2.500000 4.000000
10 4.000000 3.666667

Next we request that F1 and F3 be the median of the defined variables.

iqitems=compute.fnc(iqitems, variables=my.list,

statistic='median')

And now the sum of those variables.

iqitems=compute.fnc(iqitems, variables=my.list,

statistic='sum')

Now the sum of squares (ss) of the variables.

iqitems=compute.fnc(iqitems, variables=my.list,

statistic='ss')
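For readers who want to check such results by hand, the four statistics can be approximated in base R as row-wise summaries over the listed columns. This is a sketch under assumptions, not the implementation of compute.fnc: toy and the helper row.stat are hypothetical, the treatment of missing values is not addressed, and ss is taken here to mean the raw sum of squared values, which may differ from what the toolbox computes.

```r
# Hypothetical data frame: two subjects, three items
toy <- data.frame(x1 = c(1, 2), x2 = c(3, 4), x3 = c(5, 9))

# Column positions per new variable, as in my.list above
my.list <- list(F1 = c(1, 3), F3 = c(2, 3))

# Hypothetical helper: apply a function to each row of the chosen columns
row.stat <- function(data, cols, fun) {
  as.numeric(apply(as.matrix(data[, cols, drop = FALSE]), 1, fun))
}

F1.mean   <- row.stat(toy, my.list$F1, mean)
F1.median <- row.stat(toy, my.list$F1, median)
F1.sum    <- row.stat(toy, my.list$F1, sum)
# Assumption: 'ss' is the raw sum of squared values
F1.ss     <- row.stat(toy, my.list$F1, function(v) sum(v^2))
F3.mean   <- row.stat(toy, my.list$F3, mean)

print(data.frame(F1.mean, F1.median, F1.sum, F1.ss, F3.mean))
```

For the first subject, F1 uses the values 1 and 5, giving a mean of 3, a sum of 6 and a raw sum of squares of 26.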

NEW VARIABLES FROM THE APPLICATION OF A USER ALGORITHM

On the same database iqitems we create two new variables, each the result of a user-defined algorithm (a linear or nonlinear combination of variables), using a new argument: expression

my.expression=list(

P1='1.8*reason.4+0.34*reason.16-log(reason.17+1)',

P2='sqrt(rotate.3)+(1/rotate.4)' )

iqitems = compute.fnc(iqitems, expression=my.expression)

*** The new variables P1 P2 have beed created using the expressions

$P1

[1] "1.8*reason.4+0.34*reason.16-log(reason.17+1)"

$P2

[1] "sqrt(rotate.3)+(1/rotate.4)"

head(iqitems[,c('P1','P2')])

           P1       P2
5   4.4740899 2.402735
6   4.8105621 2.649490
7   5.1505621 2.953427
8   5.2540899 1.142857
9  -0.2494379 2.378925
10  6.9505621 2.232051
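The idea of storing formulas as text and evaluating them against the columns of a database can be sketched in base R with parse() and eval(). This is only an illustration of the technique on a hypothetical data frame toy with invented expressions; whether compute.fnc works this way internally is an assumption.

```r
# Hypothetical data frame and expressions, invented for this illustration
toy <- data.frame(x1 = c(1, 4), x2 = c(0, 3))

my.expression <- list(
  P1 = "2*x1 + x2",
  P2 = "sqrt(x1) + 1/(x2 + 1)"
)

# Evaluate each text expression using the columns of toy as variables
for (name in names(my.expression)) {
  toy[[name]] <- eval(parse(text = my.expression[[name]]), envir = toy)
}

print(toy[, c("P1", "P2")])
```

Each expression is evaluated once for the whole column, so the functions used inside it must be vectorised, as sqrt and log are.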
