Football statistics

Statistical analysis of sporting performance has been replaced by cognitive analysis, which can process a large number of variables and link them to one another through dependency relationships. Investigating each single factor is pivotal to obtaining increasingly trustworthy detail; this approach is not as limited as non-linear regression, which restricts both the number of variables and the understanding of them.

This limitation can be avoided if a neural network and machine learning approach is adopted from the outset. On the one hand, calculation methodologies and basic mathematics applied to sport science have made this concept much more reliable and have consistently reduced error margins; on the other hand, the measurement and evaluation of the variables have themselves been improved through machine learning.

A correct analysis of a football match would be possible if we could establish how multiple factors affect the final result of the game; however, we must be careful not to impose quantities or variables that are constant over time, since these would not yield reliable results. In other words, the parametric assumptions made for non-linear models cannot be the same as those used for linear models, because interactive, real-time approaches are needed to collect variable data within a given timeframe. It is extremely important to shape models that allow a football analyst to gather and process data and variables in real time.
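One way to avoid treating a quantity as constant over the whole match is to summarise it over a moving timeframe. The following is only a minimal sketch of that idea: the class name `SlidingWindow`, the 300-second window and the sample values are all hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindow:
    """Mean of a measured value over the last `window_s` seconds (hypothetical helper)."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.events = deque()  # (timestamp_s, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp_s, value):
        # Record the new measurement, then drop anything that fell out of the window.
        self.events.append((timestamp_s, value))
        self.total += value
        self._evict(timestamp_s)

    def _evict(self, now_s):
        # Assumes timestamps are non-decreasing, so the oldest event is on the left.
        while self.events and self.events[0][0] <= now_s - self.window_s:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def mean(self):
        return self.total / len(self.events) if self.events else 0.0


# Made-up measurements (e.g. metres covered per action), not real match data.
window = SlidingWindow(window_s=300)
window.add(0, 10.0)
window.add(200, 20.0)
window.add(400, 30.0)  # the measurement taken at t=0 is now outside the window
```

The summary therefore changes as the match evolves, instead of being fixed once for the whole game.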

By dividing the analysis of an athlete’s performance into three phases (measurement, evaluation and calculation), it is possible to study the problem in its full complexity without omitting aspects that we wrongly regard as negligible. Indeed, the core of machine learning lies in the opportunity to measure events without any limit on the amount of data measured, which certainly helps when we consider how many aspects and variables (predictable and unpredictable) affect sport science. This process can help overcome the computational difficulties that normally arise in multidisciplinary activities.

The measurement phase mentioned above can be carried out by introducing cognitive technologies, which succeed in collecting a great deal of data as well as a large number of variables whose importance might be overlooked at first. How can we measure the physical capabilities of an athlete, and what tools should be used for performance analysis?

The analysis (evaluation) phase, instead, can be described as the step that classifies already-collected data or assigns a precise meaning to them. How does the physical aspect weigh on tactics, and vice versa? Do other variables, previously ignored, have to be considered?

The calculation, finally, is the process based on a series of algorithms that can also be used in real time, because it relies on a predefined mathematical procedure. The algorithm updates itself as soon as it receives data from a collection source (e.g. cameras and football analysis tools placed on a given pitch at a given time). If we imagine a coach who wants to make efficient use of the best skills of one or more of his players in a given match, real-time data measurement and evaluation can show him the area of the pitch where a given player performs best, or indicate how the player can compete against the opponent.
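As an illustration of an algorithm that updates itself whenever new data arrives, the sketch below keeps a running average of a per-action rating for each pitch zone and reports the zone where the player is currently doing best. It is a simplified assumption and not the method used by any specific tracking tool: the class `ZoneTracker`, the zone names and the ratings are hypothetical.

```python
from collections import defaultdict


class ZoneTracker:
    """Running average of a player's action rating per pitch zone (hypothetical)."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def update(self, zone, rating):
        # Incremental mean: each new event adjusts the average,
        # so past events do not need to be stored or reprocessed.
        self.count[zone] += 1
        self.mean[zone] += (rating - self.mean[zone]) / self.count[zone]

    def best_zone(self):
        # Zone with the highest running average, or None before any data arrives.
        if not self.mean:
            return None
        return max(self.mean, key=self.mean.get)


# Made-up event stream standing in for a real-time collection source.
tracker = ZoneTracker()
for zone, rating in [("left_wing", 0.6), ("centre", 0.4), ("left_wing", 0.8)]:
    tracker.update(zone, rating)

best = tracker.best_zone()
```

Because each update costs a constant amount of work, the estimate can be refreshed after every event during the match.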

Once the main function is fully established, the process continues with a systematic structure that defines the necessary sub-systems and, in particular, the human control needed to validate each phase through experience. In other words, this process has to supply a result that is as close as possible to the reality being examined. It is crucial to evaluate as many factors as possible correctly if we want to provide a reliable analysis of sport performance, since a wrong or imprecise analysis produces unreliable outcomes.

To obtain an applied expected goals (xG) statistic, a great deal of information has to be studied, such as the shots on target made across multiple seasons by all the teams participating in a given competition. Each shot is classified according to the player’s position, the distance from the goal, the measured angle between the player and the goal, the type of pass received (cross, on-the-ground pass, lateral pass, through ball, etc.) and whether a defender is marking the player. At this point a science analyst comes in, who establishes how xG determines the final match score. Projections become an essential approach for understanding how goals (scored and conceded) influence the final score.
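A common way to turn such shot features into a scoring probability is a logistic model, where a team's xG is the sum of the probabilities of its shots. The sketch below assumes that form; the field names and every coefficient are invented for illustration and are not fitted to any data, so the resulting values have no real-world meaning.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    distance_m: float  # distance from the goal, in metres
    angle_rad: float   # angle between the player and the goal, in radians
    pass_type: str     # "cross", "ground", "lateral", "through", ...
    marked: bool       # True if a defender is marking the shooter


# Illustrative coefficients only (assumptions, NOT estimated from real shots).
INTERCEPT = -1.0
W_DISTANCE = -0.10
W_ANGLE = 1.5
W_MARKED = -0.5
W_PASS = {"through": 0.4, "ground": 0.2, "lateral": 0.0, "cross": -0.3}


def shot_xg(shot):
    """Scoring probability of a single shot under the assumed logistic model."""
    z = (
        INTERCEPT
        + W_DISTANCE * shot.distance_m
        + W_ANGLE * shot.angle_rad
        + W_MARKED * shot.marked
        + W_PASS.get(shot.pass_type, 0.0)  # unknown pass types contribute nothing
    )
    return 1.0 / (1.0 + math.exp(-z))


def team_xg(shots):
    """A team's xG is the sum of the probabilities of its shots."""
    return sum(shot_xg(s) for s in shots)


# Two made-up shots: a close, unmarked chance and a distant, marked one.
close_shot = Shot(distance_m=8.0, angle_rad=0.9, pass_type="through", marked=False)
far_shot = Shot(distance_m=25.0, angle_rad=0.3, pass_type="cross", marked=True)
total = team_xg([close_shot, far_shot])
```

In practice the coefficients would be estimated from the multi-season shot data described above rather than set by hand.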

For instance, if we wanted to deduce the final score of a game by analysing only expected goals conceded, we could easily conclude that this is impossible. In fact, as the following tables demonstrate, the losing team reached an xG value twice as high as that of the winning team:

These scenarios (which occurred in the Italian top league, Serie A, in the 2020-21 season) show that further data has to be included for a more comprehensive analysis of a team’s performance and, no less importantly, to understand how that performance affects the final outcome of the game. In the above table, we simply tried to show that a team can make more shots on target than the opponent, play more balls in the opponent’s box, or constantly be in possession of the ball in the last 20-25 metres, and still not win the match. Moreover, even the xG value, based on what the team’s attempts to score and/or its presence in the opponent’s box could potentially produce, may favour the losing team.

Notes:

Rec% is the percentage of the time a player successfully received a pass. A minimum of 30 minutes played per squad game is required to qualify as a leader.

Succ Press is the successful pressing produced by a team.

Int is the number of ball interceptions made by a team.

The tables above make it quite clear that the losing team was also superior in other parameters, such as ball recoveries, successful pressing and ball interceptions. Nevertheless, the result went in favour of the team that took advantage more effectively and efficiently of the fewer opportunities that arose during the match. A multi-factorial analysis is pivotal: if we want to evaluate a team’s performance correctly, we need to consider every measurable, analysable and calculable aspect. Therefore, the capacity to exploit few opportunities and reach the best possible result (winning the game) depends on a large number of factors that, in practice, could be built into a bespoke tool.
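One simple way to express how well a team exploited its opportunities is the ratio of goals scored to xG. This is only one possible indicator, shown here as a sketch: the function name `finishing_efficiency` is hypothetical and the match figures are invented for illustration, not taken from the Serie A tables.

```python
def finishing_efficiency(goals, xg):
    """Goals scored per expected goal; None if the team created no chances."""
    if xg <= 0:
        return None
    return goals / xg


# Hypothetical match (NOT real data): the loser has twice the winner's xG.
winner = {"goals": 2, "xg": 0.8}
loser = {"goals": 1, "xg": 1.6}

winner_eff = finishing_efficiency(winner["goals"], winner["xg"])
loser_eff = finishing_efficiency(loser["goals"], loser["xg"])
```

A value above 1 means the team scored more than its chances suggested, and a value below 1 means it scored less, which is why xG alone cannot predict the result.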

It is worth introducing the psycho-physical condition of an athlete and/or tactical preparation in certain strategies, factors that essentially have to be considered in addition to those elements previously defined as less fundamental than others. A single player or an entire team can show their superiority even though they shoot on target less frequently. Indeed, we also need to evaluate the lack of reactivity of the opponent’s goalkeeper and/or the poor tactical or physical capacity of the opponent’s defence in challenging whoever has reached a better performance at that given time. Hence, we must know the event that we want to measure, analyse and calculate. Improving an athlete’s performance does not only mean increasing his physical capability, but also focusing on the tactical elements that affect his propensity to challenge the opponent team efficiently. This last aspect is very difficult to evaluate; however, we can certainly shape logical mathematical models, based on systems engineering approaches and machine learning, that treat this parameter (the propensity to challenge the opponent team efficiently) as a human factor. The difference is always made by the players on the pitch, which is why the next step will be to monitor their performance in a more comprehensive way.
