R Session
1. Advanced Tableau Visualization
1.1. Clustering in Tableau
- Start from the scatter plot sheet created in the previous lecture: All Opioids Rx ~ All Opioids Deaths (varied by County name)
- Remove the trend line
- Analytics\Cluster: drag Cluster onto the plot and set the number of clusters to 4
- Manually select the 3 points (Horry, Greenville, Charleston)
- Right-click and choose Create Set
- Enter the name: “Cty High Opioids with Deaths/Rx”
- The new set will appear in the Data pane
- Drag the newly created set to Filters
- Rename the sheet to Cluster
1.2. Trend for both data sets
- Drag Data Year to Columns
- Drag All Opioids Deaths and All Opioids Rx to Rows
- Since the two y-axes are on different scales, put them on a common scale: click the down arrow on each measure in Rows and select “Quick Table Calculation” – “Percent Difference”
- Notice that both measures now share the same unit (percent) and that there is a reduction for the year 2016 in both data sets
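The percent difference applied above can be reproduced by hand. The following is a minimal sketch in R, assuming the default behaviour of comparing each value with the previous one. The yearly counts and the function name `pct_diff` are made up for illustration and are not the workshop data.

```r
# Percent difference from the previous value: (current - previous) / |previous|
pct_diff <- function(x) {
  prev <- c(NA, head(x, -1))   # shift the series by one position
  (x - prev) / abs(prev)
}

# Hypothetical yearly counts on very different scales
deaths <- c(100, 120, 150, 135)
rx     <- c(50000, 52000, 53000, 47700)

# After the transformation both series are unitless fractions
print(round(pct_diff(deaths), 3))
print(round(pct_diff(rx), 3))
```

The first value of each result is NA because the first year has no previous year to compare with.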
1.3. Segmentation
- Open a new sheet and name it Segmentation
- Drag All Opioids Deaths and All Opioids Rx to Columns
- Drag County name to Rows
- Drag County name to Color
- Sort the view by All Opioids Deaths
1.4. Baseline
- Download Global_T_SST_MSL.csv and open it
- Data\New Data Source: connect to the downloaded file as a new data source
- Drag Date to Columns, click the down arrow next to Date and change it from Discrete to Continuous
- Analytics: drag Reference Band onto the sheet, then choose Table
- Data pane: drag Median to Rows and Date to Detail (click the down arrow and change it to Month)
- Drag Median to Color
- Change the color palette to Sunrise-Sunset Diverging
- Change the sheet name to “Global Temperature”
1.5. Forecast
- Drag Date to Columns, click the down arrow next to Date and change it from Discrete to Continuous
- Drag MSL to Rows
- Analytics: drag Forecast onto the sheet
- Change the sheet name to Forecast
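Tableau's Forecast is based on exponential smoothing. As a rough analogue, the sketch below uses `HoltWinters` from base R with a level and a trend but no seasonal component. The series is synthetic and the smoothing parameters are hand-picked assumptions. Tableau selects its own model, so the numbers will not match the sheet.

```r
# Synthetic yearly series with an upward trend (NOT the workshop data)
set.seed(1)
msl <- ts(0.3 * (1:40) + rnorm(40, sd = 0.5), start = 1980)

# Double exponential smoothing: level + trend, no seasonal component.
# alpha and beta are hypothetical, hand-picked smoothing parameters.
fit <- HoltWinters(msl, alpha = 0.5, beta = 0.1, gamma = FALSE)

# Forecast the 5 years that follow the end of the series
fc <- predict(fit, n.ahead = 5)
print(fc)
```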
1.6. Heatmap
- Open a new sheet
- Drag Date to Columns (Year)
- Drag Date to Rows (change to Month)
- Drag Median to Color
- Change the color scheme
- Change the view to “Fit Width”
- Set the color bar limits to -1 and +1
- Change the sheet name to Heatmap of global temperature
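The heatmap layout above (Year on Columns, Month on Rows, a value on Color) corresponds to reshaping a monthly series into a month-by-year grid. A minimal sketch in R follows, using synthetic values in place of the Median field. The variable names are illustrative only.

```r
# Synthetic monthly values for 3 years (NOT the workshop data)
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 36)
set.seed(1)
anomaly <- rnorm(36, sd = 0.6)

yr <- as.integer(format(dates, "%Y"))
mo <- as.integer(format(dates, "%m"))

# Rows = month, columns = year, cell = mean value for that month
grid <- tapply(anomaly, list(month = mo, year = yr), mean)

# Clamp to [-1, +1], mirroring the color bar limits set in Tableau
clamped <- pmin(pmax(grid, -1), 1)
print(round(clamped, 2))
```

Passing `clamped` to `image()` or `heatmap()` would draw the grid, although the colors will differ from the Tableau palette.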
1.7. Plot in 2 axes
- Open a new sheet
- Drag Date to Columns
- Drag Median to Rows
- Drag MSL to the right axis
- Drag Date to Detail and change it to Month
- Notice the change
- Change the sheet name to Global Temp and MSL
2. Configure R to operate in Tableau
Download R:
- For Windows: download here: https://cran.r-project.org/bin/windows/base/
- For macOS: download here: https://cran.r-project.org/bin/macosx/
Download RStudio: https://rstudio.com/products/rstudio/download/
Install the Rserve package:
- Open RStudio
- Tools\Install Packages: type in Rserve
Load Rserve package in R:
> library(Rserve)
> Rserve()
Open Tableau:
- Help\Settings and Performance\Manage External Service Connection
- External Service: Rserve
- Server: localhost
- Port: 6311
- Click on Test Connection
- If the connection is successful, press OK
- You are ready to run R in Tableau
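If Test Connection fails, the usual cause is that Rserve is not installed or not running. The sketch below wraps the two commands from this section in a guard. The function name `start_rserve_if_available` is hypothetical, and the `--no-save` argument is an optional choice rather than something this handout requires.

```r
# Start Rserve only when the package is installed; otherwise explain what to do.
start_rserve_if_available <- function() {
  if (!requireNamespace("Rserve", quietly = TRUE)) {
    message('Rserve is not installed; run install.packages("Rserve") first.')
    return(invisible(FALSE))
  }
  # Rserve listens on port 6311 by default, the port entered in Tableau
  Rserve::Rserve(args = "--no-save")
  invisible(TRUE)
}
```

Call `start_rserve_if_available()` from the R console before opening the connection dialog in Tableau.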
3. Data mining in Tableau using R
3.1. Load input data and working with Script()
- Download the `mtcars.csv` from the link below
- Open Tableau and load the text file `mtcars.csv`
- This is a sample data set with different specifications for each car: type, fuel consumption, make, weight, etc. In our data mining example, we will model the miles-per-gallon (mpg) variable from different input variables
- Now let's create a simple script.
- Go to Sheet
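- Analysis\Create Calculated Field...
- A new window appears. Enter the name for the field: "SimpleR"
- Select SCRIPT_REAL from the function list on the right and start typing the input:
SCRIPT_REAL('output <- .arg1 + .arg2 + .arg3', AVG([Mpg]), AVG([Cyl]), AVG([Disp]))
- Inside the script, .arg1, .arg2 and .arg3 refer to the Tableau expressions listed after it, in that order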
- Press Ok to go back to Sheet
- Drag SimpleR to Rows
- Drag Mpg to Columns
- Drag Car to Color
- Observe the change
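To see what the SimpleR field computes, the same expression can be run directly in R. This is a sketch that assumes one row per car, so that each AVG() is just that car's value. The `.arg` vectors are filled from R's built-in mtcars instead of being passed in by Tableau.

```r
data(mtcars)

# Stand-ins for the vectors Tableau would send through Rserve
.arg1 <- mtcars$mpg    # AVG([Mpg])
.arg2 <- mtcars$cyl    # AVG([Cyl])
.arg3 <- mtcars$disp   # AVG([Disp])

# Same expression as in the calculated field; one result per car
output <- .arg1 + .arg2 + .arg3

# First car (Mazda RX4): 21 + 6 + 160 = 187
print(head(data.frame(Car = rownames(mtcars), SimpleR = output)))
```

The script must return one value per row it receives (or a single value), which is why the result has the same length as the inputs.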
3.2. Linear Modeling with R and Tableau
We will create the linear model in R and visualize it in Tableau.
Open RStudio
- Tools\Install Packages, type in caret to install
- Type in the following script into R script:
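library(caret)
data(mtcars)
set.seed(123)  # make the train/test split reproducible
# Put 60% of the rows in the training set
indT <- createDataPartition(y = mtcars$mpg, p = 0.6, list = FALSE)
training <- mtcars[indT, ]
# Fit a linear model of mpg on cyl, wt and hp
modLM <- train(mpg ~ cyl + wt + hp, data = training, method = "lm")
# Save the fitted model so that Tableau (through Rserve) can load it later
save(modLM, file = "c:/CLEMSON/Workshop/LMmodel.rda")
- LMmodel.rda has now been saved to your local computer. Next, we will load this model into Tableau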
Open Tableau
- Open a new sheet
- Analysis\Create Calculated Field...
- A new window appears. Enter the name for the field: "LinearMod"
- Select SCRIPT_REAL from the function list on the right and start typing the input:
SCRIPT_REAL('
library(caret) # make sure caret is loaded so predict() dispatches to the caret model
mydata <- data.frame(mpg = .arg1, cyl = .arg2, wt = .arg3, hp = .arg4)
load("c:/CLEMSON/Workshop/LMmodel.rda")
output <- predict(modLM, newdata = mydata)
',
AVG([Mpg]),
AVG([Cyl]),
AVG([Wt]),
AVG([Hp]))
- Press Ok to go back to Sheet
- Drag LinearMod to Rows
- Drag Mpg to Columns
- Drag TrainTest to Color & Shape
- Go to the Analytics tab and drag Trend Line onto the plot (drop it on Linear)
- Observe the change in correlation for training and testing sets
- Save the sheet with name: "Linear Modeling"
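The comparison made on this sheet can also be checked numerically. The sketch below uses only base R and assumes that caret's `method = "lm"` fits an ordinary linear model. It also shows one plausible way a TrainTest label could be built. The handout does not say how that field was created, so the `sample()` split and the column values "Train" and "Test" are assumptions and will not reproduce the exact rows chosen by `createDataPartition`.

```r
data(mtcars)
set.seed(123)

# Hypothetical 60/40 split using base R
n_train    <- round(0.6 * nrow(mtcars))
train_rows <- sample(nrow(mtcars), n_train)

cars <- mtcars
cars$TrainTest <- ifelse(seq_len(nrow(cars)) %in% train_rows, "Train", "Test")

# Fit on the training rows only, then predict for every car
fit <- lm(mpg ~ cyl + wt + hp, data = cars[cars$TrainTest == "Train", ])
cars$pred <- predict(fit, newdata = cars)

# Root mean squared error, reported separately for each split
rmse <- function(obs, est) sqrt(mean((obs - est)^2))
by_split <- sapply(split(cars, cars$TrainTest),
                   function(d) rmse(d$mpg, d$pred))
print(round(by_split, 2))
```

A noticeably larger error on the Test rows than on the Train rows is the numeric counterpart of the weaker trend line seen in Tableau.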
3.3. Random Forest with R and Tableau
We will create the Random Forest model in R and visualize it in Tableau.
Open RStudio
- Type in the following script into R script:
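library(caret)
data(mtcars)
set.seed(123)  # make the split and the forest reproducible
# Put 60% of the rows in the training set
indT <- createDataPartition(y = mtcars$mpg, p = 0.6, list = FALSE)
training <- mtcars[indT, ]
# method = "rf" needs the randomForest package to be installed
modRF <- train(mpg ~ cyl + wt + hp, data = training, method = "rf")
# Save the fitted model so that Tableau (through Rserve) can load it later
save(modRF, file = "c:/CLEMSON/Workshop/RFmodel.rda")
- RFmodel.rda has now been saved to your local computer. Next, we will load this model into Tableau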
Open Tableau
- Open a new sheet
- Analysis\Create Calculated Field...
- A new window appears. Enter the name for the field: "RandomForest"
- Select SCRIPT_REAL from the function list on the right and start typing the input:
SCRIPT_REAL('
library(caret) # make sure caret is loaded so predict() dispatches to the caret model
mydata <- data.frame(mpg = .arg1, cyl = .arg2, wt = .arg3, hp = .arg4)
load("c:/CLEMSON/Workshop/RFmodel.rda")
output <- predict(modRF, newdata = mydata)
',
AVG([Mpg]),
AVG([Cyl]),
AVG([Wt]),
AVG([Hp]))
- Press Ok to go back to Sheet
- Drag RandomForest to Rows
- Drag Mpg to Columns
- Drag TrainTest to Color & Shape
- Go to the Analytics tab and drag Trend Line onto the plot (drop it on Linear)
- Observe the change in correlation for training and testing sets
- Save the sheet with name: "Random Forest"
3.4. Principal Component Analysis (PCA) with R and Tableau
We will create the PCA model in R and visualize it in Tableau.
Open RStudio
- Type in the following script into R script:
data(mtcars)
# Drop vs and am (binary 0/1 columns); PCA works best with continuous numeric data
datain <- mtcars[, c(1:7, 10:11)]
# Center and scale each variable before computing the principal components
mtcars.pca <- prcomp(datain, center = TRUE, scale. = TRUE)
save(mtcars.pca, file = "c:/CLEMSON/Workshop/PCAmodel.rda")
- PCAmodel.rda has now been saved to your local computer. Next, we will load this model into Tableau
Open Tableau
- Open a new sheet
- Analysis\Create Calculated Field...
- A new window appears. Enter the name for the field: "PCA1"
- Select SCRIPT_REAL from the function list on the right and start typing the input:
SCRIPT_REAL('
load("c:/CLEMSON/Workshop/PCAmodel.rda")
# Scores come back in the row order of the R mtcars data
PCA1 <- mtcars.pca$x[, 1]
', ATTR([Car]))
- Similarly, create a new field named "PCA2":
SCRIPT_REAL('
load("c:/CLEMSON/Workshop/PCAmodel.rda")
# Scores come back in the row order of the R mtcars data
PCA2 <- mtcars.pca$x[, 2]
', ATTR([Car]))
- Press Ok to go back to Sheet
- Drag PCA1 to Columns
- Drag PCA2 to Rows
- Drag Wt to Color
- Drag Car to Tooltip, click on Tooltip, change to Label
- Change the colorbar
- Observe how the cars spread out along the first two principal components
- Save the sheet with name: "PCA"
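Before plotting PCA1 against PCA2, it helps to know how much of the variation those two components carry. The sketch below repeats the `prcomp` call in base R and builds a small table of scores keyed by car name. The `scores` data frame and the `wanted` vector are illustrative names, and the lookup by name is only a suggestion for keeping rows aligned when the order of cars differs.

```r
data(mtcars)
datain <- mtcars[, c(1:7, 10:11)]            # drop vs and am
pca <- prcomp(datain, center = TRUE, scale. = TRUE)

# Share of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_explained)[1:2], 3))  # cumulative share of PC1 and PC2

# Scores for the first two components, one row per car
scores <- data.frame(Car  = rownames(mtcars),
                     PCA1 = pca$x[, 1],
                     PCA2 = pca$x[, 2],
                     row.names = NULL)

# Look up scores by car name instead of relying on row order
wanted <- c("Fiat 128", "Mazda RX4")
print(scores[match(wanted, scores$Car), ])
```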
3.5. Kmeans clustering with R and Tableau
Open Tableau
- Open a new sheet
- Analysis\Create Calculated Field...
- A new window appears. Enter the name for the field: "kmeans"
- Select SCRIPT_REAL from the function list on the right and start typing the input:
SCRIPT_REAL('
mydata <- data.frame(mpg = .arg1, cyl = .arg2, wt = .arg3, hp = .arg4)
set.seed(123)
km <- kmeans(mydata, 3) # split into 3 clusters
km$cluster
',
AVG([Mpg]),
AVG([Cyl]),
AVG([Wt]),
AVG([Hp]))
- Press Ok to go back to Sheet
- Drag Wt to Columns
- Drag Mpg to Rows
- Drag kmeans to Color & Shape
- Drag Car to Tooltip, click on Tooltip, change to Label
- Change the colorbar
- Observe how the cars are grouped into the 3 clusters
- Save the sheet with name: "k-means"
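The same clustering can be inspected directly in R to see what each cluster looks like. The sketch below mirrors the script above on R's built-in mtcars. The second run on standardized columns is an addition for comparison only and is not part of the workshop script. It is included because hp has a much larger numeric range than the other variables.

```r
data(mtcars)
mydata <- mtcars[, c("mpg", "cyl", "wt", "hp")]

# Same call as in the Tableau script, on the raw values
set.seed(123)
km <- kmeans(mydata, centers = 3)
print(table(km$cluster))                      # cluster sizes
print(aggregate(mydata, by = list(cluster = km$cluster), FUN = mean))

# Comparison only: cluster after standardizing each column
set.seed(123)
km_scaled <- kmeans(scale(mydata), centers = 3)
print(table(raw = km$cluster, scaled = km_scaled$cluster))
```

The cluster numbers themselves are arbitrary labels, so the cross-table is read by looking at which cells are populated, not at whether the labels match.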
3.6. Fuzzy C-means clustering with R and Tableau
Open Tableau
- Open a new sheet
- Analysis\Create Calculated Field...
- A new window appears. Enter the name for the field: "Fuzzy C-Means"
- Select SCRIPT_REAL from the function list on the right and start typing the input:
SCRIPT_REAL('
library(ppclust) # you will need to install this package first using R
mydata <- data.frame(mpg = .arg1, cyl = .arg2, wt = .arg3, hp = .arg4)
set.seed(123)
res.fcm <- fcm(mydata, centers = 3)
res.fcm$cluster
',
AVG([Mpg]),
AVG([Cyl]),
AVG([Wt]),
AVG([Hp]))
- Press Ok to go back to Sheet
- Drag Wt to Columns
- Drag Mpg to Rows
- Drag Fuzzy C-Means to Color & Shape
- Drag Car to Tooltip, click on Tooltip, change to Label
- Change the colorbar
- Observe how the cars are grouped into the 3 clusters and compare with the k-means result
- Save the sheet with name: "Fuzzy C-means"
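Unlike k-means, fuzzy c-means gives every car a degree of membership in each cluster, and the cluster shown in Tableau is simply the one with the highest membership. The sketch below is a minimal base R implementation of the standard update rules, written for illustration. It is not the ppclust code and will not reproduce its results exactly. The function name `fuzzy_cmeans`, the fuzzifier `m = 2`, and the use of standardized columns are all assumptions.

```r
fuzzy_cmeans <- function(x, k, m = 2, iters = 100, tol = 1e-6) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Random initial memberships; each row sums to 1
  u <- matrix(runif(n * k), n, k)
  u <- u / rowSums(u)
  for (i in seq_len(iters)) {
    w <- u^m
    # Cluster centers: membership-weighted means (k x p matrix)
    centers <- (t(w) %*% x) / colSums(w)
    # Squared distance of every point to every center (n x k matrix)
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    d2 <- pmax(d2, 1e-12)                 # avoid division by zero
    # Membership update: proportional to distance^(-2 / (m - 1))
    inv   <- d2^(-1 / (m - 1))
    u_new <- inv / rowSums(inv)
    converged <- max(abs(u_new - u)) < tol
    u <- u_new
    if (converged) break
  }
  list(membership = u, centers = centers,
       cluster = max.col(u, ties.method = "first"))
}

data(mtcars)
set.seed(123)
res <- fuzzy_cmeans(scale(mtcars[, c("mpg", "cyl", "wt", "hp")]), k = 3)
print(round(head(res$membership), 2))     # membership degrees per cluster
print(table(res$cluster))                 # crisp labels, as used for Color
```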
==================================================
You can find the Tableau file for R attached