Dataset:
Game data: https://www.kaggle.com/c/pubg-finish-placement-prediction
Gamer data: Survey data
Software: SPSS, Python
Dataset description:
29 variables, 1000 rows. Divided into two parts: match characteristics and competition characteristics
Target : WinPlace Perc
For gamers:
Damage, DBNOs, enemy kills, and number of heals are important
Team kills, vehicle distance, and swim distance seem unimportant
Most match types are squad-fpp, duo-fpp, and squad.
The box plot shows that match type does influence the target
Some variables are highly correlated:
Damage dealt - Kills
Kill Place - Kill streak
Kill Points - Rank Points - Win Points
Walk Distance - Win place perc (target)
Clear positive relationship between walk distance and target
Damage Dealt and Kills are highly correlated
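A minimal sketch of how the highly correlated pairs listed above could be found. It runs on synthetic data: the column names (`damageDealt`, `kills`, `walkDistance`, `swimDistance`), the generated values, and the 0.7 threshold are assumptions for illustration, not values from the analysis.

```python
import numpy as np

def high_corr_pairs(data, names, threshold=0.7):
    """Return (name_a, name_b, r) for column pairs with |Pearson r| >= threshold."""
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic stand-in data (assumption): damage is built from kills plus noise.
rng = np.random.default_rng(0)
kills = rng.poisson(2.0, size=1000).astype(float)
damage = 100.0 * kills + rng.normal(0.0, 30.0, size=1000)
walk = rng.gamma(2.0, 500.0, size=1000)
swim = rng.exponential(5.0, size=1000)

names = ["damageDealt", "kills", "walkDistance", "swimDistance"]
data = np.column_stack([damage, kills, walk, swim])
pairs = high_corr_pairs(data, names)
for a, b, r in pairs:
    print(f"{a} - {b}: r = {r:.2f}")
```

On the real data, `data` would hold the numeric columns of the 1000-row sample instead of the generated arrays.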
Method : PCA
Components : 2
Variance : 0.75
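A hedged sketch of the PCA step: standardize the variables, keep two components, and report the share of variance they explain. The input here is synthetic (six made-up variables driven by two latent factors), so the printed ratio is not expected to match the 0.75 reported above; the function name `pca_explained_variance` is hypothetical.

```python
import numpy as np

def pca_explained_variance(X, n_components):
    """Standardize columns, then return (scores, explained variance ratio of kept components)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    var = S**2 / (Z.shape[0] - 1)      # variance along each component
    ratio = var / var.sum()            # share of total variance
    scores = Z @ Vt[:n_components].T   # data projected onto kept components
    return scores, ratio[:n_components]

# Synthetic stand-in data (assumption): two latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.5 * rng.normal(size=(1000, 6))

scores, ratio = pca_explained_variance(X, n_components=2)
print("explained variance ratio:", ratio.round(3))
print("cumulative:", round(float(ratio.sum()), 3))
```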
Method : Backward Elimination
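A rough sketch of backward elimination for a linear regression. The source does not state the removal criterion, so this sketch swaps in AIC: starting from all predictors, it drops the one whose removal lowers AIC most and stops when no removal helps. The data and the predictor names are synthetic assumptions.

```python
import numpy as np

def aic(X, y):
    """AIC of an OLS fit with intercept (Gaussian likelihood, up to a constant)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]

def backward_eliminate(X, y, names):
    """Drop one predictor at a time for as long as doing so lowers AIC."""
    keep = list(range(X.shape[1]))
    best = aic(X[:, keep], y)
    while len(keep) > 1:
        # AIC of each candidate model with one predictor removed.
        trials = [(aic(X[:, [k for k in keep if k != j]], y), j) for j in keep]
        score, drop = min(trials)
        if score >= best:
            break  # no removal improves the model
        best = score
        keep.remove(drop)
    return [names[k] for k in keep]

# Synthetic stand-in data (assumption): only the first two columns drive y.
rng = np.random.default_rng(0)
n = 500
names = ["walkDistance", "boosts", "swimDistance", "teamKills"]
X = rng.normal(size=(n, 4))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 0.1, size=n)

selected = backward_eliminate(X, y, names)
print(selected)
```

A p-value based removal rule could be substituted for `aic` without changing the loop.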
Coefficient Estimation
DBNOs, Kill Place and Match Type are negatively related to the target.
The remaining predictors are positively related to the target
Max Place and Num Groups may have a multicollinearity issue
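One common way to check the suspected multicollinearity is the variance inflation factor (VIF); values above roughly 5 to 10 are usually read as a warning sign. The sketch below is an illustration on synthetic data, where `numGroups` is deliberately generated as nearly equal to `maxPlace`; the function name `vif` and the generated values are assumptions.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic stand-in data (assumption): numGroups tracks maxPlace closely.
rng = np.random.default_rng(0)
max_place = rng.integers(20, 101, size=1000).astype(float)
num_groups = max_place - rng.integers(0, 4, size=1000)
walk = rng.gamma(2.0, 500.0, size=1000)

X = np.column_stack([max_place, num_groups, walk])
vifs = vif(X)
for name, v in zip(["maxPlace", "numGroups", "walkDistance"], vifs):
    print(f"{name}: VIF = {v:.1f}")
```

If both variables show a high VIF on the real data, dropping one of them or combining them is a typical remedy.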
Utility and Validity
The utility metrics (left) show that the model performs well on the dataset
The residual plots below suggest that the model satisfies the linear regression assumptions
Residual Distribution
Residual QQ Plot
Prediction Difference
If we accept an error of up to 10% in winning probability, about 70% of predictions are correct.
Two players with different features can still have a similar predicted winning percentage
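The 10% tolerance check described above can be sketched as follows. The function name `within_tolerance_rate` and the five example values are made up for illustration; they are not the model's actual predictions.

```python
def within_tolerance_rate(y_true, y_pred, tol=0.10):
    """Fraction of predictions whose absolute error is at most tol."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Made-up actual and predicted win placement percentages (assumption).
y_true = [0.90, 0.50, 0.20, 0.75, 0.10]
y_pred = [0.85, 0.65, 0.25, 0.70, 0.30]

rate = within_tolerance_rate(y_true, y_pred)
print(rate)  # 3 of 5 predictions fall within 0.10, so 0.6
```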
Gamers tend to think damage dealt, DBNOs, and kills are the most important factors in winning
The analysis indicates that walk distance, weapons acquired, and boosts are important
That doesn't mean the regression is wrong; the gamers' factors can still be used in the regression to predict winning.