Using machine learning to predict football transfer fees

Posted on: 16 August 2024 in Research


Using advanced performance metrics and machine learning, researchers from the Centre for Sports Business have developed new models to predict transfer fees for professional footballers with greater precision.

Unlike previous work underpinned by basic performance statistics, Professor Ian McHale and Dr Benjamin Holmes’ study incorporates advanced player performance metrics, such as plus-minus, action value and crowd-sourced ratings.

The results show these more complex metrics substantially improve the predictive accuracy of regression models.

They then modelled transfer fees using machine learning algorithms, which forecast fees more precisely than the best-performing regression model.

The models are particularly valuable for those making monetary decisions in the transfer market, as they can help identify a benchmark price for players and assess value-for-money from previous decisions.

Value for money in the football transfer market 

In 2017, Brazilian international Neymar swapped Barcelona for Paris Saint-Germain (PSG) in a world-record €222m deal, as the Parisians chased a first-ever Champions League title.

Fast-forward to 2023: PSG had reached only one Champions League final, and the player once dubbed 'the next Pele' moved to Saudi Arabia after six injury-wrecked seasons in Paris.

Neymar’s stint at PSG is an infamous example of how, despite vast sums of money being exchanged between clubs, decision making on transfer fees remains rather rudimentary in the football industry. The transfer market has become big business: 3,279 transfers involved a fee in 2023 (up 14.7% from 2022), worth a combined £7.6bn, compared to £5.1bn the year before1.

Spending large sums on a relatively small number of assets carries high risks, as mistakes made on players who fail to live up to expectations can have catastrophic consequences for clubs, including relegation and bankruptcy.

Advanced player performance metrics improve predictive accuracy

Identifying value-for-money in the football labour market involves searching for metrics that accurately capture the current and future ability of players, and subsequently modelling transfer fees to understand what factors drive the price a club is willing to pay for a player.

Traditionally, studies have focused on basic performance statistics, such as goals scored and minutes played.

However, these fail to capture the true impact of a player on their team and the multitude of on-pitch actions they are responsible for.

To fill this gap, Ian and Ben’s study introduces two families of objective player performance metrics:

  • Expected goals2 plus-minus (xGPM), a rating which assesses a team’s performance with and without a specific player, by measuring changes in net expected goals from one set of players on the pitch (teammates and rivals) to the next (eg after a substitution or when a player is sent off).
  • Goal Impact Metric (GIM), a deep reinforcement learning action value rating3 which indicates how player actions contribute to changing (increasing or decreasing) the probability of an episode of play ending in a goal.
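The "with and without" idea behind a plus-minus rating can be illustrated with a deliberately naive on/off comparison. This is only a sketch: the `Segment` structure, the function name and the numbers are invented for illustration, and the xGPM used in the study may be estimated quite differently (for example, by rating all players jointly rather than one at a time).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A spell of play during which the line-ups do not change (hypothetical structure)."""
    minutes: float        # length of the spell
    net_xg: float         # team xG created minus xG conceded during the spell
    on_pitch: frozenset   # the team's players on the pitch


def on_off_net_xg(segments, player):
    """Net xG per 90 minutes with the player on the pitch, minus the same rate without."""
    on_min = on_xg = off_min = off_xg = 0.0
    for seg in segments:
        if player in seg.on_pitch:
            on_min += seg.minutes
            on_xg += seg.net_xg
        else:
            off_min += seg.minutes
            off_xg += seg.net_xg
    if on_min == 0 or off_min == 0:
        return None  # no comparison possible without both kinds of spell
    return 90 * (on_xg / on_min - off_xg / off_min)


# Invented example: player "A" is replaced by "C" after 60 minutes of the first match.
segments = [
    Segment(60, 0.9, frozenset({"A", "B"})),
    Segment(30, -0.3, frozenset({"B", "C"})),
    Segment(90, 0.45, frozenset({"A", "C"})),
]
rating_a = on_off_net_xg(segments, "A")
```

A raw on/off difference like this credits a player for the quality of whoever shares the pitch with them, which is one reason plus-minus ratings are usually adjusted for teammates and opponents.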

Both ratings capture the full contribution of a player’s on-pitch performance and are not position specific, as players earn ratings for defensive actions as much as for offensive ones.

In addition to objective ratings, Ian and Ben also included crowd-sourced ratings from sofifa.com4, specifically overall and potential ratings, which estimate a player’s current and future ability respectively.

The study’s final dataset5 was completed with transfer details, player characteristics and experience, and financial information on the clubs involved in the transfer, specifically the fees paid for players over the previous three years.

They then estimated transfer fees using regression analysis, a technique that predicts the value of one variable (here, the transfer fee) from information on related variables, such as player performance metrics.
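As a rough illustration of what such a regression does, the sketch below fits a straight line by least squares with a single predictor. The ratings and fees are made up, the use of the logarithm of the fee is an assumption chosen because fees are heavily skewed, and the study's actual models include many more variables.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: a player rating and the fee paid, in millions.
ratings = [70, 75, 80, 85, 90]
fees_m = [2.0, 5.0, 12.0, 30.0, 80.0]

# Assumption: model log(fee) rather than the fee itself.
log_fees = [math.log(fee) for fee in fees_m]
intercept, slope = fit_simple_ols(ratings, log_fees)


def predict_fee(rating):
    """Predicted fee in millions, converted back from the log scale."""
    return math.exp(intercept + slope * rating)
```

Calling `predict_fee(82)` then returns a benchmark fee for a player with that rating, under this toy model only.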

The results6 show advanced player performance metrics substantially improve the predictive accuracy of the model, compared to those founded on basic metrics.

Machine learning based models boost predictive accuracy further

Continuing with the same dataset, Ian and Ben tested the forecasting potential of different models based on machine learning algorithms.

They used part of the sample (1,557 transfers) to train three algorithms7, then tested how well each predicted out-of-sample transfer fees, that is, the actual amounts paid in the remaining 389 transfers.
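The hold-out procedure can be sketched as follows. The 389 test transfers are roughly 20% of the 1,946 in the dataset, so the sketch uses that fraction; the random shuffle, the error measure (RMSE), the function names and the placeholder data are all assumptions, and the trivial mean-fee baseline merely stands in for the study's glmnet, xgbDART and xgbTree models.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def out_of_sample_rmse(fit, train, test):
    """Fit on the training rows only, then score on the held-out rows."""
    predict = fit(train)
    return rmse([fee for _, fee in test], [predict(features) for features, _ in test])


def fit_mean_baseline(train):
    """Stand-in model: always predicts the average training fee."""
    mean_fee = sum(fee for _, fee in train) / len(train)
    return lambda features: mean_fee


# Synthetic placeholder rows of (features, fee); not real transfer data.
rows = [((i % 40,), float(i % 40)) for i in range(1946)]
train, test = train_test_split(rows)
baseline_error = out_of_sample_rmse(fit_mean_baseline, train, test)
```

Any candidate model can be passed in place of `fit_mean_baseline`, so that every model is judged on the same transfers it never saw during training.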

The results show that two of the models, xgbDART and xgbTree, forecast fees most precisely, with a remarkable gain in predictive accuracy over linear regression models.

The results also show the most important variable for determining transfer fees is the average price paid in transfers by the purchasing club in the three years prior to the transfer in question.

According to the authors, this variable must incorporate many elements contributing to the transfer fee, including the club’s financial strength and league, plus the ability of the players they typically target, as high spending clubs are more likely to show interest in top footballers.

The sofifa overall rating is the next most important variable, and GIM features higher than standard performance measures, which suggests decision makers take into account information over and above the traditionally used basic statistics.
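One generic way to rank variables by importance is permutation importance: shuffle one variable and see how much the model's error grows. The sketch below is a model-agnostic illustration only; the post does not say how importance was measured in the study, and the function name, toy model and data are invented.

```python
import random


def permutation_importance(predict, rows, targets, column, seed=0):
    """Increase in mean squared error after shuffling one feature column."""
    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    shuffled_col = [r[column] for r in rows]
    random.Random(seed).shuffle(shuffled_col)
    # Rebuild each row (a tuple) with the shuffled value in place of the original.
    permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled_col)]
    return mse(permuted) - mse(rows)


# Toy setting: the "model" uses only the first feature, so the second cannot matter.
rows = [(float(i), float((i * 7) % 10)) for i in range(10)]
targets = [2.0 * r[0] for r in rows]


def toy_model(row):
    return 2.0 * row[0]


importance_first = permutation_importance(toy_model, rows, targets, column=0)
importance_second = permutation_importance(toy_model, rows, targets, column=1)
```

Here `importance_second` is exactly zero because the toy model ignores that column, while shuffling the first column damages the predictions.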

Identifying ‘good’ and ‘bad’ transfers

Finally, Ian and Ben compared the predicted and actual transfer fees of historical transfers to identify the best and worst value-for-money transfers, based on the excess money paid by the clubs.
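In its simplest form, this comparison amounts to subtracting the predicted fee from the fee actually paid and sorting the results. The sketch below uses invented labels and figures, and the function name is hypothetical; the study may measure the excess differently (for example, relative to the predicted fee).

```python
def rank_by_excess(transfers):
    """Sort (label, actual_fee, predicted_fee) records from best to worst value.

    The excess is actual minus predicted: negative means the club paid less
    than the model's benchmark, positive means it paid more.
    """
    scored = [(label, actual - predicted) for label, actual, predicted in transfers]
    return sorted(scored, key=lambda item: item[1])


# Invented figures in millions; not results from the study.
transfers = [
    ("Player A", 40.0, 55.0),
    ("Player B", 80.0, 50.0),
    ("Player C", 20.0, 20.0),
]
ranked = rank_by_excess(transfers)
```

The first entries of `ranked` are the apparent bargains and the last are the apparent overpayments.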

Unsurprisingly, Luis Suarez and Bruno Fernandes were identified as ‘good’ transfers, while the fees paid for Neymar, Philippe Coutinho and Harry Maguire were too high.

This is perhaps the most valuable use of the models for clubs, as analysis of transfer fees can help identify a benchmark price for new signings.

The models also provide a prediction interval, from the lowest to the highest estimated transfer price.

This offers the potential to inform negotiations, with buyers aiming for the lowest reasonable price and sellers for the highest reasonable price, within these bounds.
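The post does not describe how the study's intervals are computed. One common approach, sketched here purely as an assumption, is to take the errors a model made on held-out transfers (on the log scale) and add their lower and upper quantiles to a new point prediction. All names and numbers are illustrative.

```python
import math


def empirical_quantile(values, q):
    """Quantile q (between 0 and 1) of a non-empty list, with linear interpolation."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def prediction_interval(point_log_fee, log_residuals, coverage=0.9):
    """Lower and upper fee bounds built from past errors on the log scale."""
    tail = (1 - coverage) / 2
    lower = point_log_fee + empirical_quantile(log_residuals, tail)
    upper = point_log_fee + empirical_quantile(log_residuals, 1 - tail)
    return math.exp(lower), math.exp(upper)


# Invented held-out errors: log(actual fee) minus log(predicted fee).
log_residuals = [-0.4, -0.2, 0.0, 0.2, 0.4]
low, high = prediction_interval(math.log(10.0), log_residuals, coverage=0.5)
```

With a point prediction of 10 (in millions), `low` and `high` bracket the range in which about half of comparable past fees fell, which is the kind of range a buyer and seller could negotiate within.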

The results also show clubs known for using analytics for player recruitment are better at identifying value-for-money (eg Liverpool, Brentford), while those known for overpaying for players are identified by the model as making less cost-effective decisions (eg Manchester United, Barcelona).

This suggests an added benefit of using the models, as they allow clubs to assess the historical cost-effectiveness of their decisions and use the information to improve current recruitment strategies. 

 

1 Source: FIFA Global Transfer Market Report 2023

2 Expected goals (xG) is equal to the probability of a shot resulting in a goal, with the main determinant being the location of the shot relative to the goal.

3 Deep reinforcement learning action value ratings are machine-learning-based metrics that aim to identify the most suitable sequence of actions to maximise rewards in a particular situation, which in the context of this study would be scoring a goal.

4 sofifa.com is a website where members rate players on a wide range of skills, such as heading, shooting and passing.

5 Data was collated for a sample of 1,946 transfers between 11 August 2016 and 29 September 2020. The transfers involved players leaving 31 different leagues and going to 62 unique leagues, with a total of 69 different leagues involved.

6 The study presents four linear regression models: 1) with basic metrics of player performance; 2) adds only xGPM and GIM; 3) adds only sofifa metrics; and 4) includes all types of performance metrics. Results show model 4 is the best-fitting.

7 Elastic-net regression algorithm, glmnet, and extreme gradient boosting algorithms xgbDART and xgbTree.   

Professor Ian McHale

Professor of Sports Analytics and Director of the Centre for Sports Business

Dr Ben Holmes

Lecturer in Sports Analytics

You can read Ian and Ben's paper here:

Holmes, B., and McHale, I. (2023). 'Estimating transfer fees of professional footballers using advanced performance metrics and machine learning'.