This project analyzes Premier League team performance up to Week 15 and applies machine-learning models to predict each club’s points. Both Linear Regression and Random Forest Regression were developed to model week-by-week performance trends. Despite their different mathematical foundations, the two models produced remarkably similar predictions, which reveals as much about the dataset as about the models: its size, its structure, and how closely team metrics track point accumulation.
The visualization below compares actual and predicted points for every Premier League team after 15 matchweeks, showing how accurate and consistent the models are and how well the dataset lets them tell teams apart.
This bar chart presents a side-by-side comparison of each team’s actual accumulated points versus the predicted points generated by the machine-learning models. The key observations include:
The models consistently predict between 20 and 28 points for most teams, even for clubs that are significantly overperforming (e.g., Arsenal) or underperforming (e.g., Burnley, Sheffield United).
This clustering indicates that the models regress toward the mean, a common outcome when the dataset is small or the features explain only part of the variation in points.
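For a least-squares line with a single feature, this shrinkage can be written down exactly. Here x is one feature (for instance goal difference, a simplification of the project's multi-feature setup), y is points, \bar{x} and \bar{y} are their means, s_x and s_y their standard deviations, and r their Pearson correlation:

```latex
\hat{y}_i = \bar{y} + r\,\frac{s_y}{s_x}\,(x_i - \bar{x})
\quad\Longrightarrow\quad
\frac{\hat{y}_i - \bar{y}}{s_y} = r\,\frac{x_i - \bar{x}}{s_x},
\qquad
s_{\hat{y}} = |r|\,s_y \le s_y
```

Unless the feature predicts points perfectly (|r| = 1), each prediction sits closer to the league average, in standardized units, than the feature value it came from, and the predictions are spread less widely than the actual points. With several features the same holds with |r| replaced by the square root of the model's R².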
Both Random Forest and Linear Regression delivered nearly identical predictions. This suggests:
The underlying relationship between input features and points is mostly linear.
There are no strong non-linear interactions for the models to exploit.
Team-level Week 15 variables (e.g., goals scored, goals conceded, win/loss record) are sufficiently correlated with points that a linear model explains most of the trend.
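Part of this linearity is built into the scoring system: a win earns 3 points, a draw 1 and a loss 0. The Python sketch below uses hypothetical function names and made-up team records, purely for illustration. It shows that with 15 matches played, even wins and losses alone determine points through a linear formula, because draws = 15 - wins - losses.

```python
MATCHES_PLAYED = 15


def league_points(wins: int, draws: int) -> int:
    """Premier League scoring: 3 points per win, 1 per draw, 0 per loss."""
    return 3 * wins + draws


def points_from_win_loss(wins: int, losses: int, played: int = MATCHES_PLAYED) -> int:
    """Points from wins and losses only: draws = played - wins - losses,
    so points = 3*wins + (played - wins - losses) = played + 2*wins - losses."""
    return played + 2 * wins - losses


# Hypothetical (wins, draws, losses) records after 15 matches; not real data.
records = {"Team A": (11, 2, 2), "Team B": (6, 4, 5), "Team C": (2, 3, 10)}

for team, (w, d, l) in records.items():
    assert w + d + l == MATCHES_PLAYED  # every team has played 15 matches
    print(team, league_points(w, d), points_from_win_loss(w, l))
```

Both functions agree for every record, so a model given the win/loss record as features can in principle recover points with a purely linear fit, leaving Random Forest little non-linear structure to exploit.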
With only 15 matchweeks and a small set of basic performance metrics, the models lack the information needed to distinguish between teams with similar profiles. As a result:
High-performing teams are under-predicted.
Low-performing teams are over-predicted.
Mid-table teams show the highest predictive accuracy.
This behavior is typical when models do not have access to stronger predictors such as expected goals (xG), injury impact, style-of-play metrics, or recent momentum.
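The under- and over-prediction pattern can be reproduced with a one-feature least-squares fit. The sketch below assumes a single hypothetical feature (goal difference) and six made-up teams; it illustrates the mechanism and is not the project's model or data. A positive residual (actual minus predicted) means the team was under-predicted.

```python
from statistics import mean


def fit_ols(xs, ys):
    """Ordinary least squares with one feature: returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical (goal difference, points) pairs after 15 matches; not real data.
teams = {
    "Team A": (12, 36),
    "Team B": (14, 30),
    "Team C": (2, 22),
    "Team D": (-2, 20),
    "Team E": (-14, 14),
    "Team F": (-12, 10),
}

xs = [gd for gd, _ in teams.values()]
ys = [pts for _, pts in teams.values()]
intercept, slope = fit_ols(xs, ys)

residuals = {}
for team, (gd, pts) in teams.items():
    predicted = intercept + slope * gd
    residuals[team] = pts - predicted  # > 0: under-predicted, < 0: over-predicted
    print(f"{team}: actual {pts}, predicted {predicted:.1f}, residual {residuals[team]:+.1f}")
```

With these numbers Team A, which has the most points, gets a positive residual, Team F, with the fewest, gets a negative one, and the two mid-table teams have the smallest errors. The pattern holds at the extremes and in mid-table rather than for every single team, which is typical when the feature only partly explains points.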
Despite the narrow prediction range, the models correctly capture the relative ordering of many teams:
Teams consistently performing well (Arsenal, Manchester City, Aston Villa) remain near the top.
Teams struggling (Burnley, Sheffield United, Luton Town) appear at the bottom.
This demonstrates that even with limited data, the models recognize broad performance trends.
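Ordering and accuracy can be measured separately. One option, assumed here rather than taken from the project, is Kendall's rank correlation (tau-a), which scores how many team pairs the predictions order the same way as the actual table. The values below are hypothetical and deliberately compressed into a 20 to 28 point band.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical actual and predicted points for six teams; not real data.
actual = [36, 30, 22, 20, 14, 10]
predicted = [27, 28, 24, 23, 21, 20]

tau = kendall_tau_a(actual, predicted)
print(f"Kendall tau-a = {tau:.3f}")
```

Only the top two teams are swapped, so 14 of the 15 pairs agree and tau = 13/15 (about 0.87), even though the highest prediction, 28, is 8 points below the leader's 36.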
Patterns and Insights Identified
The models place greater weight on stable features such as:
goal difference
total goals scored
defensive record
overall form
Because these indicators rarely swing dramatically across 15 matches, predictions stabilize around a core trend.
The Week 15 dataset shows strong correlations between:
goals scored and total points
goal difference and total points
wins and goal metrics
Because these features carry largely overlapping information, teams with similar statistical profiles naturally receive similar predicted values.
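The strength of these correlations is easy to check directly. The sketch below computes Pearson correlations on hypothetical Week 15 figures for six teams (made-up numbers, not the project's dataset), with goal difference derived as goals scored minus goals conceded.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical team figures after 15 matches (illustrative only).
scored = [32, 30, 24, 20, 15, 12]
conceded = [12, 16, 22, 22, 29, 30]
points = [36, 30, 22, 20, 14, 10]
goal_diff = [s - c for s, c in zip(scored, conceded)]

features = {"scored": scored, "goal_diff": goal_diff, "points": points}
correlations = {}
for a, b in combinations(features, 2):
    correlations[(a, b)] = pearson(features[a], features[b])
    print(f"r({a}, {b}) = {correlations[(a, b)]:.3f}")
```

All three coefficients exceed 0.9 for these numbers, so each feature carries nearly the same information; adding more of them gives either model little extra signal.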
Although Random Forest is designed to model non-linear relationships, it behaves almost identically to Linear Regression when:
the feature space is simple
the dataset is small
the relationships are close to linear
This explains why both models converged on nearly identical predictions.
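To see the convergence concretely, here is a stripped-down sketch of the random-forest idea, bootstrap-resampled regression trees averaged together (bagging), next to a least-squares line. It is written from scratch under simplifying assumptions (one hypothetical feature, made-up data, 200 trees of depth 2) and is neither the project's model nor scikit-learn's implementation. With a single feature, the per-split feature subsampling of a random forest has nothing to choose from, so bagging is what remains.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (the split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(points, depth):
    """Depth-limited regression tree on (x, y) pairs with one feature.
    Internal nodes are (threshold, left, right); leaves are mean targets."""
    xs = sorted({x for x, _ in points})
    if depth == 0 or len(xs) < 2:
        return mean([y for _, y in points])
    best_t, best_err = None, float("inf")
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate split halfway between distinct x values
        err = sse([y for x, y in points if x <= t]) + sse([y for x, y in points if x > t])
        if err < best_err:
            best_t, best_err = t, err
    left = [p for p in points if p[0] <= best_t]
    right = [p for p in points if p[0] > best_t]
    return (best_t, fit_tree(left, depth - 1), fit_tree(right, depth - 1))


def predict_tree(node, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if x <= threshold else right
    return node


def fit_forest(points, n_trees=200, depth=2, seed=0):
    """Bagging: each tree is fit on a bootstrap resample of the data."""
    rng = random.Random(seed)
    return [fit_tree([rng.choice(points) for _ in points], depth) for _ in range(n_trees)]


def predict_forest(trees, x):
    """Forest prediction = average of the individual tree predictions."""
    return mean([predict_tree(t, x) for t in trees])


def fit_linear(points):
    """One-feature ordinary least squares; returns (intercept, slope)."""
    x_bar = mean([x for x, _ in points])
    y_bar = mean([y for _, y in points])
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical (goal difference, points) pairs, increasing and close to linear.
data = [(-18, 10), (-14, 14), (-2, 20), (2, 22), (14, 30), (20, 36)]
trees = fit_forest(data)
intercept, slope = fit_linear(data)

for x, y in data:
    print(f"gd {x:+d}: actual {y}, linear {intercept + slope * x:.1f}, "
          f"forest {predict_forest(trees, x):.1f}")
```

Because these points rise steadily with goal difference, every tree is a non-decreasing step function, so the forest orders the teams the same way the line does (allowing for ties) and its extra flexibility has little to add. The sketch also shows one systematic difference: a forest prediction is an average of means of training points, so it can never leave the observed range (10 to 36 here), whereas the line extrapolates above 36 for a goal difference of 40. That bounded behavior is one more reason tree ensembles tend to pull extreme teams toward the middle.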