Using the functions aggregate() and ddply()

Post date: Mar 27, 2014 5:33:19 AM

Use aggregate() to summarize values by one or more grouping keys. aggregate() covers the simple cases that ddply() also handles. ddply() supports much more complex operations than aggregate() and is similar to 'group by' in SQL.

ddply() requires the R package "plyr"; a more detailed tutorial can be found here.

student <- as.character(c(1,1,2,2,3,3,4,4))
day <- c(1,2,1,2,1,2,1,2)
group <- c('A','A','A','A','B','B','B','B')
math <- c(20,25,30,20,10,15,20,25)
science <- c(100,50,75,80,30,40,50,60)

d <- data.frame(group,student,day,math,science)
#   group student day math science
# 1     A       1   1   20     100
# 2     A       1   2   25      50
# 3     A       2   1   30      75
# 4     A       2   2   20      80
# 5     B       3   1   10      30
# 6     B       3   2   15      40
# 7     B       4   1   20      50
# 8     B       4   2   25      60

## Calculate max for each group
aggregate(x=d[,c('math', 'science')], by=list('Group'=d$group), FUN=max)
#   Group math science
# 1     A   30     100
# 2     B   25      60

## Calculate max for each group in each day
aggregate(x=d[,c('math', 'science')], by=list('group'=d$group, 'day'=d$day), FUN=max)
#   group day math science
# 1     A   1   30     100
# 2     B   1   20      50
# 3     A   2   25      80
# 4     B   2   25      60

## We can also use package plyr to do the same
library(plyr)
ddply(d, .(group, day), summarize,
      math = max(math),
      science = max(science))
#   group day math science
# 1     A   1   30     100
# 2     A   2   25      80
# 3     B   1   20      50
# 4     B   2   25      60

## Calculate max of the summed score for each student
## This is a cross-column operation, which a single aggregate() call on the raw columns cannot do
ddply(d, .(student), summarize,
      mathscience = max(math+science))
#   student mathscience
# 1       1         120
# 2       2         105
# 3       3          55
# 4       4          85

## Note that we cannot calculate across columns inside FUN with aggregate():
## it passes each column to FUN separately as a vector, so FUN never sees both columns.
## The command below will give an error
# aggregate(x=d[,c('math', 'science')], by=list('student'=d$student), FUN=function(x) { max(x[,math] + x$science) } )
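If plyr is not available, one possible workaround for the cross-column case is to compute the combined value as its own column first and then aggregate that single column. The sketch below assumes the same data as above. The column name total and the object names d2 and res are made up for illustration; this is one way to do it in base R, not the only one.

```r
# Same data as in the example above.
d <- data.frame(
  group   = c('A','A','A','A','B','B','B','B'),
  student = as.character(c(1,1,2,2,3,3,4,4)),
  day     = c(1,2,1,2,1,2,1,2),
  math    = c(20,25,30,20,10,15,20,25),
  science = c(100,50,75,80,30,40,50,60)
)

# Step 1: build the cross-column value as its own column.
# 'total' is a hypothetical column name chosen for this sketch.
d2 <- transform(d, total = math + science)

# Step 2: aggregate that single column by student (formula interface).
res <- aggregate(total ~ student, data = d2, FUN = max)
print(res)
#   student total
# 1       1   120
# 2       2   105
# 3       3    55
# 4       4    85
```

The result matches the ddply() output for mathscience. The trade-off is an extra intermediate column, whereas ddply() evaluates the expression within each group directly.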