Premier League Week 15

Predictive Analysis of Team Performance

Executive Summary

This project analyzes Premier League team performance through Week 15 and applies machine-learning models to predict each club’s points. Linear Regression and Random Forest Regression models were both trained on week-by-week performance data. Despite their different mathematical foundations, the two models produced remarkably similar predictions, which says as much about the dataset, its structure, and the relationships between team metrics and point accumulation as it does about the models.
The visualization below compares actual vs. predicted points for all Premier League teams after 15 matchweeks, showing how accurate and consistent the models are and how well the dataset lets them tell teams apart.

Interpretation of the Chart: Actual vs. Predicted Points (Week 15)

This bar chart presents a side-by-side comparison of each team’s actual accumulated points versus the predicted points generated by the machine-learning models. The key observations include:

1. Predictions cluster tightly around the mid-table range

The models predict between 20 and 28 points for most teams, even for clubs whose actual totals sit well above that range (e.g., Arsenal) or well below it (e.g., Burnley, Sheffield United).
This clustering indicates that the models regress toward the mean, a common outcome when the dataset is small or the features explain only part of the variation in points.
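The shrinkage effect can be illustrated with a minimal sketch. The numbers below are invented for illustration and are not the project's data, and the single goal-difference feature is an assumption. The point is only that a least-squares fit produces predictions with a narrower spread than the actual values whenever the fit is imperfect.

```python
# Hypothetical (goal difference, points) values; NOT the project's data.
goal_diff = [18, 12, 9, 5, 2, 0, -1, -3, -6, -10, -12, -14]
points = [36, 27, 30, 22, 25, 19, 23, 16, 18, 12, 14, 7]


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(goal_diff, points)
predicted = [intercept + slope * x for x in goal_diff]

# Compare the spread of actual points with the spread of predictions.
actual_range = max(points) - min(points)
predicted_range = max(predicted) - min(predicted)

print(f"actual range:    {actual_range:.1f}")
print(f"predicted range: {predicted_range:.1f}")
print(f"best team:  actual {points[0]}, predicted {predicted[0]:.1f}")
print(f"worst team: actual {points[-1]}, predicted {predicted[-1]:.1f}")
```

In this toy data the best team is predicted below its actual total and the worst team above it, which is the same pull toward the middle seen in the chart.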

2. Strong linear patterns in the data

Both Random Forest and Linear Regression delivered nearly identical predictions. This suggests:

The underlying relationship between input features and points is mostly linear.
There are no strong non-linear interactions for the models to exploit.
Team-level Week 15 variables (e.g., goals scored, goals conceded, win/loss record) are sufficiently correlated with points that a linear model explains most of the trend.

3. Underfitting due to limited dataset complexity

With only 15 matchweeks and a small set of basic performance metrics, the models lack the information needed to distinguish between teams with similar profiles. As a result:

High-performing teams are under-predicted
Low-performing teams are over-predicted
Mid-table teams show the highest predictive accuracy

This behavior is typical when models do not have access to stronger predictors such as expected goals (xG), injury impact, style-of-play metrics, or recent momentum.
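One way to check this pattern is to average the signed error (predicted minus actual) within each third of the table. The sketch below assumes two parallel lists of actual and predicted points. The function name `mean_error_by_tier` and the values are hypothetical and are not taken from the project.

```python
def mean_error_by_tier(actual, predicted, tiers=("top", "middle", "bottom")):
    """Mean signed error (predicted - actual), grouped by actual points.

    Negative values mean a tier is under-predicted, positive values mean
    it is over-predicted.
    """
    # Indices of teams sorted from most to fewest actual points.
    order = sorted(range(len(actual)), key=lambda i: actual[i], reverse=True)
    size = len(order) // len(tiers)
    result = {}
    for k, name in enumerate(tiers):
        start = k * size
        # The last tier absorbs any remainder.
        end = len(order) if k == len(tiers) - 1 else start + size
        errors = [predicted[i] - actual[i] for i in order[start:end]]
        result[name] = sum(errors) / len(errors)
    return result


# Hypothetical values for 12 teams; NOT the project's data.
actual = [36, 33, 30, 27, 24, 22, 21, 20, 19, 16, 10, 5]
predicted = [28, 27, 27, 26, 24, 23, 22, 21, 21, 21, 20, 20]

tier_errors = mean_error_by_tier(actual, predicted)
for tier, error in tier_errors.items():
    print(f"{tier:>6}: {error:+.2f}")
```

A negative value for the top tier, a positive value for the bottom tier, and a value near zero for the middle tier would match the three observations listed above.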

4. Clear ranking patterns still emerge

Despite the narrow prediction range, the models correctly capture the relative ordering of many teams:

Teams consistently performing well (Arsenal, Manchester City, Aston Villa) remain near the top.
Teams struggling (Burnley, Sheffield United, Luton Town) appear at the bottom.

This demonstrates that even with limited data, the models recognize broad performance trends.
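Ranking quality can be measured separately from point accuracy. The sketch below computes a Spearman rank correlation (the Pearson correlation of the two rank lists) alongside the mean absolute error. The data are hypothetical, not the project's, and the choice of Spearman correlation as the metric is an assumption made here.

```python
def ranks(values):
    """Rank values from 1 (largest) downward; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(actual, predicted):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(actual), ranks(predicted))


# Hypothetical values for 12 teams; NOT the project's data.
actual = [36, 33, 30, 27, 24, 22, 21, 20, 19, 16, 10, 5]
predicted = [28, 27, 27, 26, 24, 23, 22, 21, 21, 21, 20, 20]

rho = spearman(actual, predicted)
mean_abs_error = sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

print(f"Spearman rank correlation: {rho:.3f}")
print(f"Mean absolute error:       {mean_abs_error:.2f} points")
```

A rank correlation close to 1 alongside an error of several points would indicate that the ordering is captured even though the predicted range is too narrow.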

Patterns and Insights Identified

1. Consistency matters more than single-match spikes

The models place greater weight on stable features such as:

goal difference
total goals scored
defensive record
overall form

Because these indicators rarely swing dramatically across 15 matches, predictions stabilize around a core trend.

2. Variability is low across features

The week-15 dataset shows strong correlations between:

goals scored and total points
goal difference and total points
wins and goal metrics

Because these features carry overlapping information, teams with similar statistical profiles naturally receive similar predicted values.
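Pairwise correlations like these can be checked directly. The sketch below computes the Pearson correlation for every pair of columns. The column names and values are hypothetical stand-ins for the Week 15 dataset, not the project's actual data.

```python
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical team-level columns for 10 teams; NOT the project's data.
columns = {
    "goals_for": [33, 30, 28, 24, 22, 20, 18, 16, 14, 11],
    "goal_diff": [20, 15, 12, 6, 3, -1, -4, -8, -12, -18],
    "wins": [11, 10, 9, 7, 6, 5, 5, 4, 3, 1],
    "points": [36, 32, 30, 24, 21, 19, 17, 14, 11, 6],
}

# Correlation for every unordered pair of columns.
correlations = {
    (a, b): pearson(columns[a], columns[b])
    for a, b in combinations(columns, 2)
}

for (a, b), r in correlations.items():
    print(f"{a:>9} vs {b:<9}: r = {r:.3f}")
```

When every pair correlates this strongly, each additional feature adds little new information, which is consistent with the narrow spread of predictions.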

3. Random Forest’s behavior becomes linear with small data

Although Random Forest is designed to model non-linear relationships, it behaves almost identically to Linear Regression when:

the feature space is simple
the dataset is small
the relationships are close to linear

This is why both models converged on nearly the same predictions.
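The convergence can be reproduced on toy data. The sketch below uses bagged regression trees written in plain Python as a simplified stand-in for a Random Forest (one feature, no feature subsampling). This is an assumption for illustration, not the project's implementation. The data are synthetic: a linear trend plus small noise for 20 hypothetical teams.

```python
import random


def fit_line(xs, ys):
    """Least-squares line for one feature; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def sse(pts):
    """Sum of squared deviations of y from its mean."""
    mean = sum(y for _, y in pts) / len(pts)
    return sum((y - mean) ** 2 for _, y in pts)


def build_tree(pts, depth):
    """Grow a regression tree on (x, y) pairs; a leaf is the mean of y."""
    xs = sorted({x for x, _ in pts})
    if depth == 0 or len(xs) < 2:
        return sum(y for _, y in pts) / len(pts)
    best = None
    # Try a split halfway between each pair of neighbouring x values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [p for p in pts if p[0] <= t]
        right = [p for p in pts if p[0] > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, left, right)
    _, t, left, right = best
    return (t, build_tree(left, depth - 1), build_tree(right, depth - 1))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        t, left, right = node
        node = left if x <= t else right
    return node


def fit_forest(pts, n_trees=100, depth=3, seed=0):
    """Fit each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [build_tree(rng.choices(pts, k=len(pts)), depth) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5


# Synthetic data: points rise linearly with goal difference, plus small noise.
noise = random.Random(1)
goal_diff = list(range(-10, 10))
points = [20 + 1.5 * x + noise.uniform(-2, 2) for x in goal_diff]

slope, intercept = fit_line(goal_diff, points)
line_pred = [intercept + slope * x for x in goal_diff]

forest = fit_forest(list(zip(goal_diff, points)))
forest_pred = [predict_forest(forest, x) for x in goal_diff]

agreement = pearson(line_pred, forest_pred)
print(f"correlation between the two sets of predictions: {agreement:.3f}")
```

On data like this, the averaged trees have little to learn beyond the straight-line trend, so the two sets of predictions track each other closely.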
