Using the functions aggregate() and ddply()
Post date: Mar 27, 2014 5:33:19 AM
Use aggregate() to aggregate values by one or more grouping keys. aggregate() covers the simpler cases that ddply() can handle. ddply() supports much more complex operations than aggregate() and is similar to 'GROUP BY' in SQL.
ddply() requires the R package "plyr"; a more detailed tutorial can be found here.
student <- as.character(c(1,1,2,2,3,3,4,4))
day <- c(1,2,1,2,1,2,1,2)
group <- c('A','A','A','A','B','B','B','B')
math <- c(20,25,30,20,10,15,20,25)
science <- c(100,50,75,80,30,40,50,60)
d <- data.frame(group,student,day,math,science)
#   group student day math science
# 1     A       1   1   20     100
# 2     A       1   2   25      50
# 3     A       2   1   30      75
# 4     A       2   2   20      80
# 5     B       3   1   10      30
# 6     B       3   2   15      40
# 7     B       4   1   20      50
# 8     B       4   2   25      60

## Calculate max for each group
aggregate(x=d[,c('math', 'science')], by=list('Group'=d$group), FUN=max)
#   Group math science
# 1     A   30     100
# 2     B   25      60

## Calculate max for each group in each day
aggregate(x=d[,c('math', 'science')], by=list('group'=d$group, 'day'=d$day), FUN=max)
#   group day math science
# 1     A   1   30     100
# 2     B   1   20      50
# 3     A   2   25      80
# 4     B   2   25      60

## We can also use package plyr to do the same
library(plyr)
ddply(d, .(group, day), summarize, math = max(math), science = max(science))
#   group day math science
# 1     A   1   30     100
# 2     A   2   25      80
# 3     B   1   20      50
# 4     B   2   25      60

## Calculate the max of the summed score (math + science) for each student
## This is a cross-column operation, which aggregate() cannot do through FUN
ddply(d, .(student), summarize, mathscience = max(math+science))
#   student mathscience
# 1       1         120
# 2       2         105
# 3       3          55
# 4       4          85

## Note that we cannot calculate across columns inside FUN with aggregate():
## FUN is applied to each column separately, so x is a plain vector, not a data frame.
## The command below will give an error
# aggregate(x=d[,c('math', 'science')], by=list('student'=d$student), FUN=function(x) { max(x$math + x$science) } )
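If plyr is not available, one possible workaround is to do the cross-column step before aggregating, so that aggregate() only has to reduce a single column. The sketch below assumes it is acceptable to add a helper column to the data frame; it reuses the name mathscience from the ddply() call above. It rebuilds the same example data so it runs on its own.

```r
# Same example data as above, rebuilt so this block is self-contained.
d <- data.frame(
  group   = c('A','A','A','A','B','B','B','B'),
  student = as.character(c(1,1,2,2,3,3,4,4)),
  day     = c(1,2,1,2,1,2,1,2),
  math    = c(20,25,30,20,10,15,20,25),
  science = c(100,50,75,80,30,40,50,60)
)

# Helper column (an assumption of this sketch): per-row sum of both scores.
d$mathscience <- d$math + d$science

# aggregate() now reduces one precomputed column per student.
res <- aggregate(x = d['mathscience'], by = list(student = d$student), FUN = max)
print(res)
#   student mathscience
# 1       1         120
# 2       2         105
# 3       3          55
# 4       4          85
```

This gives the same numbers as the ddply() version, at the cost of modifying d. For anything more involved than a single precomputed column, ddply() is likely the more convenient choice.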