R Session

1. Advanced Tableau Visualization

1.1. Clustering in Tableau

- From the previous lecture, use the scatter plot sheet:

- This is the scatter plot of All Opioid Rx vs. All Opioid Deaths (one mark per County name)

- Remove Trendline

- Analytics pane: drag Cluster onto the plot and set the number of clusters to 4

- Manually select the 3 points (Horry, Greenville, Charleston)

- Right click and Create Set.

- Enter the name: “Cty High Opioids with Deaths/Rx”

- The new set will appear in the Data pane

- Drag the newly created set to Filters

- Rename the sheet to Cluster

1.2. Trend for both data sets

- Drag Data Year to Columns

- Drag All Opioid Deaths and All Opioid Rx to Rows

- Since the two y-axes are on different scales, put them on a common scale: click the down arrow on each Rows field and select “Quick Table Calculation”, then “Percent Difference”

- Notice that both measures now use the same unit, and that there is a reduction in the number of deaths for the year 2016 in both data sets
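The “Percent Difference” step can be reproduced outside Tableau to see why it puts both series on one scale. The sketch below is a minimal illustration in R, assuming Tableau's default setting of comparing each value with the previous one. The yearly values are made-up placeholders, not the workshop data, and `pct_diff` is a hypothetical helper name.

```r
# Percent difference from the previous value: (current - previous) / |previous|
pct_diff <- function(x) {
  c(NA, diff(x) / abs(head(x, -1)))
}

# Hypothetical yearly totals on very different scales (not the real data)
deaths <- c(500, 550, 605, 580)
rx     <- c(4.0e6, 4.2e6, 4.41e6, 4.3e6)

# Both columns are now percentages and can share one axis
data.frame(deaths_pct = pct_diff(deaths) * 100,
           rx_pct     = pct_diff(rx) * 100)
```

The first row is NA because the first year has no previous value to compare against.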

1.3. Segmentation

- Open a new sheet and name it Segmentation

- Drag All Opioid Deaths and All Opioid Rx to Columns

- Drag County name to Rows

- Drag County name to Color

- Sort by All Opioid Deaths

1.4. Baseline

- Download Global_T_SST_MSL.csv and open it

- Data\New Data Source to open the new data set

- Drag Date to Columns, click the down arrow next to Date and change it from Discrete to Continuous

- Analytics pane: drag Reference Band onto the sheet and choose Table

- Data pane: drag Median to Rows and Date to Detail (click the down arrow and change it to Month)

- Drag Median to Color

- Change the color palette to Sunrise-Sunset Diverging

- Rename the sheet to “Global Temperature”

1.5. Forecast

- Drag Date to Columns, click the down arrow next to Date and change it from Discrete to Continuous

- Drag MSL to Rows

- Analytics pane: drag Forecast onto the sheet

- Rename the sheet to Forecast

1.6. Heatmap

- Open a new sheet

- Drag Date to Columns (Year)

- Drag Date to Rows (change to Month)

- Drag Median to Color

- Change the color scheme

- Change the view to “Fit Width”

- Set the color bar limits to -1 and +1

- Rename the sheet to Heatmap of global temperature

1.7. Plot in 2 axes

- Open a new sheet

- Drag Date to Columns

- Drag Median to Rows

- Drag MSL to the right axis

- Drag Date to Detail and change it to Month

- Notice the change

- Rename the sheet to Global Temp and MSL

2. Configure R to operate in Tableau

Download R:

- For Windows: download here: https://cran.r-project.org/bin/windows/base/

- For macOS: download here: https://cran.r-project.org/bin/macosx/

Download RStudio: https://rstudio.com/products/rstudio/download/

Install the Rserve package:

- Open RStudio

- Tools\Install Packages: type in Rserve

Load the Rserve package in R and start the server:

> library(Rserve)

> Rserve()
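The same installation and start-up can also be done entirely from the R console. This is only a sketch: `start_rserve` is a hypothetical wrapper name, and the default port 6311 is the one entered in Tableau's connection dialog in this handout.

```r
# Hypothetical helper: install Rserve if it is missing, then start the server.
start_rserve <- function(port = 6311) {
  if (!requireNamespace("Rserve", quietly = TRUE)) {
    install.packages("Rserve")
  }
  # --no-save keeps the server process from writing a workspace on exit
  Rserve::Rserve(port = port, args = "--no-save")
}

# start_rserve()   # uncomment to launch; Tableau then connects to localhost:6311
```

The call is left commented out so that sourcing the script does not start a background server by accident.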

Open Tableau:

- Help\Settings and Performance\Manage External Service Connection

- External Service: Rserve

- Server: localhost

- Port: 6311

- Click on Test Connection

- If the connection is successful, press OK

- You are ready to run R in Tableau

3. Data mining in Tableau using R

3.1. Load input data and working with Script()

- Download the `mtcars.csv` from the link below

- Open Tableau and load the text file `mtcars.csv`

- This is a sample data set with car specifications such as type, fuel consumption, make, weight, etc. In our data mining examples, we will model the miles-per-gallon (mpg) variable from different input variables

- Now let's create a simple script.

- Go to Sheet

- Analysis\Create Calculated Field...

- A new window appears. Enter the name for the field: "SimpleR"

- Select SCRIPT_REAL from the function list on the right and start typing the script

SCRIPT_REAL('output <- .arg1 + .arg2+.arg3',AVG([Mpg]),AVG([Cyl]),AVG([Disp]))

- Press Ok to go back to Sheet

- Drag SimpleR to Rows

- Drag Mpg to Columns

- Drag Car to Color

- Observe the change
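To see what the SimpleR script computes, it helps to run the same expression directly in R. In the sketch below, the `.arg1` to `.arg3` vectors are filled by hand from R's built-in `mtcars` data as stand-ins for what Tableau would send. This assumes each mark is a single car (because Car is on Color), so that AVG([Mpg]) is just that car's mpg.

```r
data(mtcars)

# Stand-ins for the vectors Tableau passes to R (one element per mark)
.arg1 <- mtcars$mpg
.arg2 <- mtcars$cyl
.arg3 <- mtcars$disp

# Same expression as in the calculated field
output <- .arg1 + .arg2 + .arg3

# One value per mark comes back, in the same order as the inputs
length(output) == length(.arg1)
head(data.frame(car = rownames(mtcars), SimpleR = output))
```

The sum itself has no physical meaning; the point is only that the script receives vectors and must hand back a vector of matching length.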

3.2. Linear Modeling with R and Tableau

We will create the linear model in R and visualize it in Tableau.

Open RStudio

- Tools\Install Packages, type in caret to install

- Type the following into a new R script:

library(caret)

data(mtcars)

set.seed(123)

indT <- createDataPartition(y=mtcars$mpg,p=0.6,list=FALSE)

training <- mtcars[indT,]

modLM <- train(mpg ~ cyl + wt + hp, data = training,method="lm")

save(modLM,file="c:/CLEMSON/Workshop/LMmodel.rda")

- LMmodel.rda has now been saved to your local computer. Next, we will load this model into Tableau
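The Tableau script can refer to the model by its original variable name because `load()` restores objects under the names they were saved with. The sketch below shows that round trip. To stay free of extra packages it uses a base-R `lm()` fit and a temporary file as stand-ins for the caret model and the workshop folder, and the `new_car` values are made up.

```r
data(mtcars)

# Base-R stand-in for the caret model, to keep the sketch dependency-free
fit <- lm(mpg ~ cyl + wt + hp, data = mtcars)

path <- tempfile(fileext = ".rda")  # temporary file instead of a fixed folder
save(fit, file = path)
rm(fit)                             # simulate a fresh R session

restored <- load(path)              # load() returns the names it restored
print(restored)

# The object is available again under its saved name
new_car <- data.frame(cyl = 4, wt = 2.5, hp = 100)  # made-up car
pred <- predict(fit, newdata = new_car)
pred
```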

Open Tableau

- Open a new sheet

- Analysis\Create Calculated Field...

- A new window appears. Enter the name for the field: "LinearMod"

- Select SCRIPT_REAL from the function list on the right and start typing the script

SCRIPT_REAL('

mydata <- data.frame(mpg=.arg1, cyl=.arg2, wt=.arg3,hp=.arg4)

load("c:/CLEMSON/Workshop/LMmodel.rda")

output <- predict(modLM, newdata = mydata)

',

AVG([Mpg]),

AVG([Cyl]),

AVG([Wt]),

AVG([Hp]))

- Press Ok to go back to Sheet

- Drag LinearMod to Rows

- Drag Mpg to Columns

- Drag TrainTest to Color & Shape

- Go to the Analytics pane and drag Trend Line onto the plot (drop it on Linear)

- Compare the correlation between predicted and observed mpg for the training and testing sets

- Save the sheet with name: "Linear Modeling"
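The training versus testing comparison can be cross-checked in R. The following is a sketch under stated assumptions: it uses base-R `sample()` and `lm()` as stand-ins for `createDataPartition()` and `train()`, so the split will not match the caret one exactly, and it builds its own `TrainTest` column, which is only assumed to play the same role as the TrainTest field in the workshop CSV.

```r
data(mtcars)
set.seed(123)

# Base-R stand-in for createDataPartition(): a simple 60% random sample
n_train   <- round(0.6 * nrow(mtcars))
train_idx <- sample(nrow(mtcars), n_train)

cars <- mtcars
cars$TrainTest <- ifelse(seq_len(nrow(cars)) %in% train_idx, "Train", "Test")

# Fit on the training rows only, then predict for every car
fit <- lm(mpg ~ cyl + wt + hp, data = cars[train_idx, ])
cars$pred <- predict(fit, newdata = cars)

# Correlation between observed and predicted mpg, per group
cors <- sapply(split(cars, cars$TrainTest),
               function(d) cor(d$mpg, d$pred))
cors
```

A noticeably lower correlation on the Test rows than on the Train rows would suggest the model fits the training data better than it generalizes.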

3.3. Random Forest with R and Tableau

We will create the Random Forest model in R and visualize it in Tableau.

Open RStudio

- Type the following into a new R script:

library(caret)

data(mtcars)

set.seed(123)

indT <- createDataPartition(y=mtcars$mpg,p=0.6,list=FALSE)

training <- mtcars[indT,]

modRF <- train(mpg ~ cyl + wt + hp, data = training,method="rf")

save(modRF,file="c:/CLEMSON/Workshop/RFmodel.rda")

- RFmodel.rda has now been saved to your local computer. Next, we will load this model into Tableau

Open Tableau

- Open a new sheet

- Analysis\Create Calculated Field...

- A new window appears. Enter the name for the field: "RandomForest"

- Select SCRIPT_REAL from the function list on the right and start typing the script

SCRIPT_REAL('

mydata <- data.frame(mpg=.arg1, cyl=.arg2, wt=.arg3,hp=.arg4)

load("c:/CLEMSON/Workshop/RFmodel.rda")

output <- predict(modRF, newdata = mydata)

',

AVG([Mpg]),

AVG([Cyl]),

AVG([Wt]),

AVG([Hp]))

- Press Ok to go back to Sheet

- Drag RandomForest to Rows

- Drag Mpg to Columns

- Drag TrainTest to Color & Shape

- Go to the Analytics pane and drag Trend Line onto the plot (drop it on Linear)

- Compare the correlation between predicted and observed mpg for the training and testing sets

- Save the sheet with name: "Random Forest"

3.4. Principal Component Analysis (PCA) with R and Tableau

We will create the PCA model in R and visualize it in Tableau.

Open RStudio

- Type the following into a new R script:

data(mtcars)

# Exclude vs and am (PCA works best with continuous numeric data)

datain <- mtcars[, c(1:7, 10:11)]

mtcars.pca <- prcomp(datain, center = TRUE, scale. = TRUE)

save(mtcars.pca,file="c:/CLEMSON/Workshop/PCAmodel.rda")

- PCAmodel.rda has now been saved to your local computer. Next, we will load this model into Tableau
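Before moving to Tableau, it can be useful to check how much of the variation the first two components capture, since only those two are plotted. A minimal sketch in base R, repeating the model fit so that it runs on its own:

```r
data(mtcars)
datain <- mtcars[, c(1:7, 10:11)]          # same columns as the script above
mtcars.pca <- prcomp(datain, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- mtcars.pca$sdev^2 / sum(mtcars.pca$sdev^2)
round(var_explained[1:2], 3)

# Scores for the first two components: one row per car
head(mtcars.pca$x[, 1:2])
```

The columns of `mtcars.pca$x` are the values that the PCA1 and PCA2 calculated fields return to Tableau.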

Open Tableau

- Open a new sheet

- Analysis\Create Calculated Field...

- A new window appears. Enter the name for the field: "PCA1"

- Select SCRIPT_REAL from the function list on the right and start typing the script

SCRIPT_REAL('

load("c:/CLEMSON/Workshop/PCAmodel.rda")

PCA1 <- mtcars.pca$x[,1]

',ATTR([Car]))

- Similarly, create a new field named "PCA2":

SCRIPT_REAL('

load("c:/CLEMSON/Workshop/PCAmodel.rda")

PCA2 <- mtcars.pca$x[,2]

' ,ATTR([Car]))

- Press Ok to go back to Sheet

- Drag PCA1 to Columns

- Drag PCA2 to Rows

- Drag Wt to Color

- Drag Car to Tooltip, click on Tooltip, change to Label

- Change the color palette

- Observe how the cars spread along the first two principal components and how Wt varies across them

- Save the sheet with name: "PCA"

3.5. Kmeans clustering with R and Tableau

Open Tableau

- Open a new sheet

- Analysis\Create Calculated Field...

- A new window appears. Enter the name for the field: "kmeans"

- Select SCRIPT_REAL from the function list on the right and start typing the script

SCRIPT_REAL('

mydata <- data.frame(mpg=.arg1, cyl=.arg2, wt=.arg3,hp=.arg4)

set.seed(123)

km <- kmeans(mydata,3) #Split into 3 clusters

km$cluster

',

AVG([Mpg]),

AVG([Cyl]),

AVG([Wt]),

AVG([Hp]))

- Press Ok to go back to Sheet

- Drag Wt to Columns

- Drag Mpg to Rows

- Drag kmeans to Color & Shape

- Drag Car to Tooltip, click on Tooltip, change to Label

- Change the color palette

- Observe how the cars are grouped into the 3 clusters

- Save the sheet with name: "k-means"
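The same clustering can be run directly in R to inspect the result. The sketch below uses R's built-in `mtcars` data as a stand-in for the values Tableau sends. The second, standardized run is an optional variation that is not part of the workshop script; it is included only to illustrate that k-means is sensitive to column scales (here hp has a much larger range than wt).

```r
data(mtcars)
mydata <- mtcars[, c("mpg", "cyl", "wt", "hp")]

# Same call as in the calculated field
set.seed(123)
km_raw <- kmeans(mydata, 3)

# Assumption for illustration: standardize columns so hp does not dominate
set.seed(123)
km_scaled <- kmeans(scale(mydata), 3)

# Cross-tabulate the two assignments to see how much they differ
tab <- table(raw = km_raw$cluster, scaled = km_scaled$cluster)
tab
```

Cluster numbers are arbitrary labels, so the table should be read for how the groups overlap, not for matching numbers.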

3.6. Fuzzy C-means clustering with R and Tableau

Open Tableau

- Open a new sheet

- Analysis\Create Calculated Field...

- A new window appears. Enter the name for the field: "Fuzzy C-Means"

- Select SCRIPT_REAL from the function list on the right and start typing the script

SCRIPT_REAL('

library(ppclust) # Install this package in R beforehand

mydata <- data.frame(mpg=.arg1, cyl=.arg2, wt=.arg3,hp=.arg4)

set.seed(123)

res.fcm <- fcm(mydata, centers=3)

res.fcm$cluster

',

AVG([Mpg]),

AVG([Cyl]),

AVG([Wt]),

AVG([Hp]))

- Press Ok to go back to Sheet

- Drag Wt to Columns

- Drag Mpg to Rows

- Drag Fuzzy C-Means to Color & Shape

- Drag Car to Tooltip, click on Tooltip, change to Label

- Change the color palette

- Observe how the cars are grouped into the 3 clusters

- Save the sheet with name: "Fuzzy C-means"

==================================================

You can find the Tableau file for R attached