To go into further detail on the modeling choices and methods:
Team sheet data was gathered directly from LabMaus’ tournament webpages. A bag-of-words approach was used to create a one-hot encoded version of the team sheet dataset, and linearly dependent covariates were then removed so that the number of observations was greater than the number of covariates.
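As a rough sketch of what this encoding step can look like in R (the team-building choices, team sheets, and counts below are made up purely for illustration, not the actual LabMaus data):

```r
# Illustrative sketch only: treat each team sheet as a "bag" of tokens
# (Pokemon, items, abilities, moves); the vocabulary and teams are simulated.
set.seed(1)
vocab <- c("Archaludon", "Electro Shot", "Incineroar", "Fake Out",
           "Flutter Mane", "Booster Energy", "Rillaboom", "Wood Hammer")
teams <- replicate(120, sample(vocab, 5), simplify = FALSE)  # 120 fake team sheets

# One-hot encoding: one row per team, one 0/1 column per team-building choice
X <- t(vapply(teams, function(choices) as.integer(vocab %in% choices),
              integer(length(vocab))))
colnames(X) <- vocab
```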
The linear model I chose was Ridge regression, fit with a penalized iteratively reweighted least squares (IRLS) algorithm. LASSO and Elastic Net can shrink some coefficients to exactly zero, performing feature selection, and this is not what we’d like to do here. In reality, we know all team sheet choices contribute to tournament success, so eliminating some of these choices discards information from the dataset unnecessarily.
Although ordinary least squares (OLS) regression also achieves the goal of descriptive inference, it tends to overfit to covariates with low representation in the data. Ridge also falls prey to this, as can be seen in some of the bar plots on the Shiny App, but it mitigates the problem somewhat compared to OLS. To choose the Ridge penalty parameter, I used 10-fold cross-validation, optimizing for both out-of-sample log loss and out-of-sample residual fit.
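For illustration, here is a glmnet-based sketch of this kind of penalized fit and penalty selection, continuing from the snippet above. glmnet’s binomial solver wraps an IRLS-style step around a penalized weighted least squares fit, and the wins/games outcome is an assumption, so treat this as a stand-in for the actual implementation rather than the exact code used:

```r
library(glmnet)

# Assumed outcome: wins and games played per team (simulated here purely
# so the example runs; the real outcome comes from tournament results).
set.seed(2)
games <- rep(8, nrow(X))
wins  <- rbinom(nrow(X), size = games, prob = 0.5)
y <- cbind(games - wins, wins)   # two-column count response for family = "binomial"

# alpha = 0 gives a pure Ridge penalty (alpha = 1 would be LASSO).
# 10-fold CV over lambda, scored by out-of-sample binomial deviance
# (proportional to log loss); only the log-loss criterion is shown here.
cv_fit <- cv.glmnet(X, y, family = "binomial", alpha = 0,
                    nfolds = 10, type.measure = "deviance")
cv_fit$lambda.min                # penalty with the best cross-validated log loss
```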
Another important point to consider is the inclusion of covariates with low representation within the dataset. Many team-building choices appear on only one or a few teams, and because these teams often look similar to other teams in the dataset, models tend to overestimate the effect of these specific choices, which results in inference that is biased by small-sample effects. For example, if a team in the dataset is extremely similar to other teams, differing only by a single move choice, and this team performs poorly, the model will estimate that this move choice had a large negative contribution to win percentage, when in reality this doesn't match the intuitive relationship between team-building and tournament performance.
Including these small-sample effects usually produces a butterfly effect: when low-usage team-building choices are assigned large positive or negative contributions, they strongly distort the inference for high-usage choices. This results in plots and tables that are much harder to draw conclusions from: we have to consider the combination of Pokémon+Item+Ability+Moves all together to get a sense of what the model favors or doesn't favor, rather than just looking at the contribution of a single team-building choice in isolation.
To help deal with this issue, I decided to remove covariates with very low representation within the dataset. It's easy to remove covariates that appear on less than some percentage of teams, but some covariates are extremely correlated with others, and this also produces small-sample effects. For example, teams with Archaludon almost always run Electro Shot as well. Including both of these covariates results in Archaludon having a negative contribution to win rate, while Electro Shot has a large positive contribution, because the teams with Archaludon but without Electro Shot usually perform poorly. Intuitively, however, Archaludon is a very useful Pokémon, so its negative contribution to win rate leads to an incorrect conclusion from the model's inference. To further deal with this issue, I computed correlations between team-building choices and eliminated one feature from each highly correlated pair. This leaves almost all team-building choices contributing positively to win percentage, which allows more concrete conclusions to be drawn and results in cleaner tables and plots.
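A minimal sketch of these two filtering steps, continuing from the snippets above (both cutoffs, and the use of caret's findCorrelation(), are illustrative choices rather than the exact ones used):

```r
library(caret)

# 1) Drop choices that appear on fewer than some fraction of teams
#    (the 5% threshold here is illustrative).
usage <- colMeans(X)
X_filtered <- X[, usage >= 0.05, drop = FALSE]

# 2) Drop one feature from each highly correlated pair
#    (the 0.9 cutoff is illustrative). findCorrelation() returns the
#    column indices recommended for removal.
high_cor <- findCorrelation(cor(X_filtered), cutoff = 0.9)
if (length(high_cor) > 0) X_filtered <- X_filtered[, -high_cor, drop = FALSE]
```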
With the chosen penalty parameter, I fit the penalized IRLS Ridge model to the entire tournament dataset to perform descriptive inference. Everything displayed on the Shiny App is descriptive inference performed on the entire dataset and should not be interpreted as predictive in any way.
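Continuing the glmnet-based sketch, the final descriptive fit and the per-choice contributions would look roughly like this:

```r
# Refit on the full (filtered) dataset at the chosen penalty and extract
# the per-choice contributions, the kind of quantity shown in the plots/tables.
final_fit <- glmnet(X_filtered, y, family = "binomial", alpha = 0,
                    lambda = cv_fit$lambda.min)
contributions <- coef(final_fit)   # intercept plus one coefficient per choice

# Choices with the largest positive estimated contributions
head(sort(contributions[-1, 1], decreasing = TRUE))
```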