# install packages & load libraries
install.packages("car") # once ever
install.packages("corrplot")
library(car) # once every time you reload R
library(corrplot)
# splitting into test and training
dataSet <- hockeySalaries # <- CHANGE hockeySalaries to your data set, then run the following code as is (don't change anything else)
n <- nrow(dataSet)
trainrows <- sample(1:n, floor(3*n/4)) # 3/4 of the rows go to training, the remaining 1/4 to testing
train <- dataSet[trainrows,]
test <- dataSet[-trainrows,]
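# Sketch (not part of the original recipe): setting a seed before sample()
# makes the random split repeatable, so your results don't change on every run.
# Assumptions: the built-in mtcars data stands in for your data set, the seed
# value 123 is arbitrary, and every demo* name is made up for illustration.
set.seed(123)
demoData <- mtcars
demoN <- nrow(demoData)                            # 32 rows
demoRows <- sample(1:demoN, floor(3 * demoN / 4))  # 24 row numbers for training
demoTrain <- demoData[demoRows, ]
demoTest <- demoData[-demoRows, ]
nrow(demoTrain)  # 24
nrow(demoTest)   # 8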
# correlation matrix (change the data set and column names to match your own variables)
corMatrix <- cor(hockeySalaries[c("Salary","Weight","Overall","DraftRound","Goals","Assists")], use = "complete.obs")
corMatrix
# visualizing the correlation matrix
pairs(hockeySalaries[c("Salary","Weight","Overall","DraftRound","Goals","Assists")]) # pairs() has no "use" argument, so it is dropped here
corrplot(corMatrix)
# VIF (variance inflation factor) ... each value should be <5
model <- lm(Salary ~ Weight + Overall + DraftRound + Goals + Assists, data=train)
vif(model)
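# Sketch of what a VIF measures, using only base R (an illustration, not the
# car package's own code). Assumptions: mtcars stands in for your data and the
# demo* names are made up. The VIF of a predictor is 1 / (1 - R^2), where R^2
# comes from regressing that predictor on the other predictors in the model.
demoAux <- lm(wt ~ hp + disp, data = mtcars)  # wt explained by the other two predictors
demoR2 <- summary(demoAux)$r.squared
demoVifWt <- 1 / (1 - demoR2)
demoVifWt  # compare with the wt entry of vif(lm(mpg ~ wt + hp + disp, data = mtcars))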
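# logistic regression (for a yes/no outcome) <- CHANGE Predicted and each Predictor to your own column names
model <- glm(Predicted ~ Predictor + Predictor + Predictor, family="binomial", data = train)
# multiple linear regression (for a numeric outcome); this overwrites "model", so run only the one you need
model <- lm(Predicted ~ Predictor + Predictor + Predictor, data = train)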
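# testing a logistic regression
# predict() on a glm does not accept type="class"; type="response" returns probabilities, which are cut at 0.5 here
train$predictedValues <- predict(model, train, type="response") > 0.5
trainCorrect <- train$predictedValues == train$actualValues # <- CHANGE actualValues to the column you're predicting (coded 0/1 or TRUE/FALSE)
sum(trainCorrect)/length(trainCorrect) # proportion correct on the training set
test$predictedValues <- predict(model, test, type="response") > 0.5
testCorrect <- test$predictedValues == test$actualValues # <- CHANGE actualValues to the column you're predicting (coded 0/1 or TRUE/FALSE)
sum(testCorrect)/length(testCorrect) # proportion correct on the test set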
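# Sketch: a confusion table shows which kinds of mistakes a logistic model
# makes, not just how many. Assumptions: mtcars stands in for your data, am
# (coded 0/1) is the outcome, the 0.5 cutoff is a common default rather than
# a rule, and the demo* names are made up for illustration.
demoLogit <- glm(am ~ wt + hp, family = "binomial", data = mtcars)
demoProb <- predict(demoLogit, mtcars, type = "response")  # probabilities between 0 and 1
demoPred <- as.numeric(demoProb > 0.5)                       # 1 if probability is above 0.5, else 0
table(actual = mtcars$am, predicted = demoPred)              # rows are actual values, columns are predictions
mean(demoPred == mtcars$am)                                  # proportion predicted correctly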
# testing multiple linear regression
plot(train$Predicted, predict(model, train)) # Need to change "Predicted" in these lines
fitTrain <- lm(train$Predicted ~ predict(model, train))
summary(fitTrain)
plot(test$Predicted, predict(model, test))
fitTest <- lm(test$Predicted ~ predict(model, test))
summary(fitTest)
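# Sketch: the Multiple R-squared that summary(fitTrain) and summary(fitTest)
# report is the squared correlation between actual and predicted values, so
# the two numbers can be compared directly. Assumptions: mtcars stands in for
# your data, the seed is arbitrary, and the demo* names are made up.
set.seed(1)
demoRows2 <- sample(1:nrow(mtcars), 24)
demoTrain2 <- mtcars[demoRows2, ]
demoTest2 <- mtcars[-demoRows2, ]
demoLm <- lm(mpg ~ wt + hp, data = demoTrain2)
demoTrainR2 <- cor(demoTrain2$mpg, predict(demoLm, demoTrain2))^2
demoTestR2 <- cor(demoTest2$mpg, predict(demoLm, demoTest2))^2
demoTrainR2  # R-squared on the training set
demoTestR2   # a much lower value here than on the training set suggests overfitting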
# more code to be added for decision trees
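# A possible starting point for decision trees (a sketch only, not the code
# this sheet plans to add). Assumptions: the rpart package, which normally
# ships with R, is the tree library; mtcars stands in for your data; the
# demo* names are made up for illustration.
library(rpart)
demoCars <- mtcars
demoCars$am <- factor(demoCars$am)  # a classification tree needs a factor outcome
demoTree <- rpart(am ~ wt + hp, data = demoCars, method = "class")
demoTreePred <- predict(demoTree, demoCars, type = "class")  # type="class" is valid for rpart trees
mean(demoTreePred == demoCars$am)                             # proportion classified correctly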
Copy the code below into a new RMarkdown file to make your project. You can overwrite everything with this code, and then change the title and author to your own. After that, change nothing besides "YOUR-TEXT-HERE" and the code chunks.
---
title: "Model Fitting Project Template"
author: "Bowman Dickson"
date: "2024-11-13"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### 1. Explanation of the problem/question you are trying to answer.
YOUR-TEXT-HERE
### 2. Explanation of the dataset and variables included. What variables are included in the dataset and why might you think they’ll work in your model? Did you add any features (i.e. combine or change any variables)? Any weird things about any of the variables?
YOUR-TEXT-HERE
### 3. Splitting into test/train set. Show how you split off 20% of the values for a test set.
```{r}
```
### 4. Model making process: Show your initial model and explain how you decided which variables to include (walk through your process in good detail).
```{r}
```
YOUR-TEXT-HERE
### 5. Model Evaluation 1: Show your residuals. Do they look good? If not, what modifications did you make (logging a variable? Omitting some data?)?
```{r}
```
YOUR-TEXT-HERE
### 6. Model Evaluation 2: Is there any multicollinearity in your model? Show your VIF and correlation matrix analysis to show that there is not. If there is, remove a variable.
```{r}
```
YOUR-TEXT-HERE
### 7. Model Evaluation 3: Show the summary of your model. How strong is your model based on the $r^2$ and p-values?
```{r}
```
YOUR-TEXT-HERE
### 8. Model Evaluation 4: Show how your model performs on the test set vs. the training set. Is there any overfitting?
```{r}
```
YOUR-TEXT-HERE
### 9. Model in practice 1: Show a calculation for a theoretical data point from the future to show how your model would work in practice. Explain the result.
```{r}
```
YOUR-TEXT-HERE
### 10. Model in practice 2: Explain the meaning behind one of the slopes in your model, citing specific numbers. Does it make sense the way it is?
YOUR-TEXT-HERE
### 11. Brief reflection: Take us to the real world. Does it make sense which variables showed up in your model and which didn’t? Do you feel like it’s realistic and performs well? What information would you add to it if you had infinite time?
YOUR-TEXT-HERE