The Tidyverse Diamonds Dataset
The diamonds dataset is used extensively in the R for Data Science (R4DS) textbook. It ships with the ggplot2 package, part of the tidyverse, and contains information on 53,940 diamonds, including the price, carat, color, clarity and cut of each one. The table below shows the range of diamonds available. It turns out that there are more diamonds with high-quality cuts than with low-quality cuts.

The dataset documentation describes the variables as follows:

price: price in US dollars ($326–$18,823)
carat: weight of the diamond (0.2–5.01)
cut: quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color: diamond color, from D (best) to J (worst)
clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x: length in mm (0–10.74)
y: width in mm (0–58.9)
z: depth in mm (0–31.8)
depth: total depth percentage, z / mean(x, y) = 2 * z / (x + y) (43–79)
table: width of the top of the diamond relative to its widest point (43–95)

Cut, color and clarity are ordered factor variables. An ordered factor arranges categorical values in a low-to-high rank order. For example, there are 5 categories of diamond cut, with "Fair" being the lowest grade and "Ideal" the highest. Six variables have a numeric (double) structure: carat, depth, table, x, y and z. One variable has an integer structure: price. See the explanation provided by Y. Wendy Huynh.

Below, we use the diamonds dataset and ggplot2 to create visualizations in RStudio. We rely mainly on dplyr and ggplot2 commands drawn from R4DS.
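The properties described above (size, column types, ordered cut levels, value ranges and the skew toward better cuts) can be checked directly in R. This is a minimal sketch, assuming the ggplot2 and dplyr packages are installed (both are attached by library(tidyverse)); the object names cut_counts and p are illustrative choices, not part of either package.

```r
# Assumes ggplot2 and dplyr are installed; ggplot2 ships the diamonds data
library(ggplot2)
library(dplyr)

# Size of the dataset: the text describes 53,940 diamonds
dim(diamonds)

# Storage class of each column: price should be "integer",
# carat/depth/table/x/y/z "numeric", and cut/color/clarity "ordered"
sapply(diamonds, function(col) class(col)[1])

# Levels of the ordered factor cut, from lowest grade to highest
levels(diamonds$cut)

# Ranges of price (US dollars) and carat weight
range(diamonds$price)
range(diamonds$carat)

# Number of diamonds per cut grade (cut_counts is an illustrative name)
cut_counts <- diamonds %>% count(cut)
print(cut_counts)

# The same counts as a bar chart; geom_bar() tallies rows per cut level
p <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()
print(p)
```

Because geom_bar() counts rows per level on its own, the bar heights should match the n column produced by count(), which makes the chart a quick visual check of the claim that higher-quality cuts are more common.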