03 Slice and Dice in R

For illustration we will use the ggplot2::diamonds data set (ggplot2 is the name of the package that ships it).

``> diamonds = ggplot2::diamonds``
``> head(diamonds)``
``  carat       cut color clarity depth table price    x    y    z``
``1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43``
``2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31``
``3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31``
``4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63``
``5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75``
``6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48``

Find mean carat for all observations

``> mean(diamonds$carat)``

Find mean carat for premium diamonds

``> mean(diamonds$carat[diamonds$cut == "Premium"])``

Here is another way, using subset() to filter the rows first

``> mean(subset(diamonds, cut == "Premium")$carat)``
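The two approaches can be checked against each other. The sketch below is only an illustration: toy is a hypothetical hand-built data frame that copies three columns of the six head(diamonds) rows shown earlier, so it runs without ggplot2 installed.

```r
# Toy data: carat, cut and price from the six head(diamonds) rows above.
toy <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  price = c(326, 326, 327, 334, 335, 336),
  stringsAsFactors = FALSE
)

# Logical indexing: keep the carat values where cut is "Premium".
m_index <- mean(toy$carat[toy$cut == "Premium"])

# subset(): filter the rows first, then pull out the column.
m_subset <- mean(subset(toy, cut == "Premium")$carat)

# Both routes select the same two rows, so the means agree.
isTRUE(all.equal(m_index, m_subset))
```

subset() reads more naturally at the console, while logical indexing is the safer choice inside functions because it does not rely on non-standard evaluation.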

Create another data frame with the carat and price columns of the Premium diamonds

``> diamonds.premium = diamonds[diamonds$cut == "Premium", c("carat", "price")]``
``> str(diamonds.premium)``

Verify the row count by using the table function on the cut column of the diamonds dataset. The count shown for Premium should match the number of rows in the new data frame.

``> table(diamonds$cut)``
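The same check can be done programmatically rather than by eye. This is a minimal sketch on a hypothetical toy data frame built from the six head(diamonds) rows shown earlier; the variable names are made up for the example.

```r
# Toy data: carat, cut and price from the six head(diamonds) rows above.
toy <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  price = c(326, 326, 327, 334, 335, 336),
  stringsAsFactors = FALSE
)

# Filter rows by cut and keep two columns, as in the text.
toy_premium <- toy[toy$cut == "Premium", c("carat", "price")]

# Row count of the filtered frame versus the table() count for that level.
n_rows <- nrow(toy_premium)
n_tab  <- table(toy$cut)[["Premium"]]

n_rows == n_tab
```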

Find median carat for Premium or Ideal cut diamonds.

``> median(diamonds$carat[diamonds$cut == "Premium" | diamonds$cut == "Ideal"])``
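When the list of accepted values grows, chaining == with | gets long. A possible alternative is the %in% operator, sketched here on a hypothetical toy data frame copied from the head(diamonds) rows shown earlier.

```r
# Toy data: carat and cut from the six head(diamonds) rows above.
toy <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  stringsAsFactors = FALSE
)

# Two comparisons joined with the element-wise OR operator.
med_or <- median(toy$carat[toy$cut == "Premium" | toy$cut == "Ideal"])

# Same filter with %in%: TRUE where cut matches any listed value.
med_in <- median(toy$carat[toy$cut %in% c("Premium", "Ideal")])

identical(med_or, med_in)
```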

Take a random sample of 10 records from the dataset

``> diamonds[sample(nrow(diamonds), 10), ]``
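sample() draws different rows on every call. If a repeatable sample is wanted, one option is to call set.seed() first. The sketch below uses a hypothetical six-row toy data frame and an arbitrary seed value.

```r
# Toy data: carat and price from the six head(diamonds) rows above.
toy <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  price = c(326, 326, 327, 334, 335, 336)
)

# Fix the random number generator state (123 is an arbitrary choice).
set.seed(123)
idx_a <- sample(nrow(toy), 3)   # 3 distinct row numbers out of 1..6

# Resetting the same seed reproduces the same draw.
set.seed(123)
idx_b <- sample(nrow(toy), 3)

identical(idx_a, idx_b)

# Use the sampled row numbers to pull the records.
toy_sample <- toy[idx_a, ]
```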

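Select rows and columns in different combinations. The comments describe base data.frame behaviour. In recent ggplot2 releases diamonds is a tibble, where single-column selections with [ stay data frames unless drop = TRUE is given; converting with as.data.frame(diamonds) first reproduces the results described below.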

``diamonds[, c("price")] # returns a vector``
``diamonds[, c("price"), drop = FALSE] # returns a single-column data frame``
``diamonds[, 2] # returns a vector``
``diamonds[, 2, drop = FALSE] # returns a single-column data frame``
``diamonds[, c("price", "cut")] # returns all rows with the price and cut columns``
``diamonds[1, c("price", "cut")] # returns row 1 with the price and cut columns (rows are numbered from 1)``
``diamonds[1:4, c("price", "cut")] # returns rows 1 to 4 with the price and cut columns``
``diamonds[1:4, 1:3] # returns rows 1 to 4 with columns 1 to 3 (columns are numbered from 1)``
``diamonds[1:4, c(1, 2, 4)] # returns rows 1 to 4 and columns 1, 2, and 4``
``diamonds[c(1, 2, 4), c(4, 1)] # returns rows 1, 2, 4 and columns 4 and 1, in that order``
``diamonds[4, 2] # returns the cell value at row 4, column 2``
``diamonds[["cut"]] # returns the cut column as a vector (a factor), without the data frame wrapper``
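One way to confirm what each form returns is to inspect class() and dim(). The sketch below assumes a plain base data.frame; toy is a hypothetical stand-in built from the head(diamonds) rows shown earlier, with cut stored as character rather than as a factor.

```r
# Toy data: carat, cut and price from the six head(diamonds) rows above.
toy <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  price = c(326, 326, 327, 334, 335, 336),
  stringsAsFactors = FALSE
)

# A single column selected with [ , ] is dropped to a vector by default.
cls_vec <- class(toy[, "price"])

# drop = FALSE keeps the data frame wrapper.
cls_df <- class(toy[, "price", drop = FALSE])

# Row and column selections combine: 4 rows by 2 columns here.
dims <- dim(toy[1:4, c("price", "cut")])

# A single cell: row 4, column 2 (the cut column in this toy frame).
cell <- toy[4, 2]

# [[ ]] extracts the column itself.
cls_col <- class(toy[["cut"]])
```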

Apply functions in R

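plyr package equivalents (in the Input and Output columns: a = array, d = data frame, l = list, r = number of replications)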

``Base function   Input   Output   plyr function``
``------------------------------------------------``
``aggregate       d       d        ddply + colwise``
``apply           a       a/l      aaply / alply``
``by              d       l        dlply``
``lapply          l       l        llply``
``mapply          a       a/l      maply / mlply``
``replicate       r       a/l      raply / rlply``
``sapply          l       a        laply``
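To make the Input and Output columns concrete, here is a minimal sketch of a few of the base functions from the table. It uses only base R (no plyr) and a hypothetical toy data frame copied from the head(diamonds) rows shown earlier.

```r
# Toy data: carat, cut and price from the six head(diamonds) rows above.
toy <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24),
  cut   = c("Ideal", "Premium", "Good", "Premium", "Good", "Very Good"),
  price = c(326, 326, 327, 334, 335, 336),
  stringsAsFactors = FALSE
)
num_cols <- toy[c("carat", "price")]   # a data frame is a list of columns

# lapply: list in, list out (one mean per column).
means_list <- lapply(num_cols, mean)

# sapply: list in, simplified to a named numeric vector.
means_vec <- sapply(num_cols, mean)

# apply: matrix/array in; MARGIN = 2 applies the function per column.
col_max <- apply(as.matrix(num_cols), 2, max)

# aggregate: data frame in, data frame out (mean price per cut).
price_by_cut <- aggregate(price ~ cut, data = toy, FUN = mean)
```

Here lapply and sapply compute the same values and differ only in the shape of the result, which is the distinction the Output column records.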