As the Rugby World Cup draws to a close, with England playing South Africa in the final this Saturday, I have taken the opportunity to reflect on my predictions from a month ago, which you can read here.
In my previous article, I explained how I used Oracle Machine Learning to predict the entire World Cup on a game-by-game basis, with my predicted final being a New Zealand victory over England. In reality, the two teams met in the semi-final, where England convincingly beat the three-time champions to progress to the final.
My predictions for the Pool stage showed some successes, but also some glaring failures.
Having noticed these failures, I wanted to find out where I had gone wrong, and I began by examining similar machine-learning predictions made by other people. One example, created by Justin Fisher (linked here), used the same data as I had, including the World Rankings to enhance the dataset. This may seem like a big coincidence, but in truth the freely available data for international rugby union is very limited. Justin's methods, however, were very different from my own: he used the powerful XGBoost algorithm to classify whether a team would win or lose, whereas I predicted each team's score using Oracle Machine Learning's pre-built Support Vector Machines algorithm.
Using XGBoost allowed Justin to attach a level of confidence to each prediction. His model predicted New Zealand to beat England, and Wales to beat South Africa, whereas both results went the other way in the real tournament. So despite our different modelling methods, working from the same base data led us to the same incorrect results.
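The practical difference between predicting scores and predicting win/loss can be sketched in a few lines. The minimal Python example below makes no assumptions about the actual Oracle Machine Learning or XGBoost APIs; it simply shows that a score-regression model yields a winner and a margin but no probability, whereas a classifier that outputs a win probability lets you report a confidence. The `MatchCall` container, the function names and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchCall:
    """A predicted match outcome (hypothetical container for illustration)."""
    winner: str                         # team name, or "draw"
    margin: Optional[float] = None      # predicted points margin, if known
    confidence: Optional[float] = None  # probability of the predicted winner, if known


def call_from_scores(team_a: str, team_b: str, score_a: float, score_b: float) -> MatchCall:
    """Score-regression approach: predict each team's score, then compare.

    This gives a winner and a margin, but no probability of that result."""
    if score_a == score_b:
        return MatchCall(winner="draw", margin=0.0)
    winner = team_a if score_a > score_b else team_b
    return MatchCall(winner=winner, margin=abs(score_a - score_b))


def call_from_probability(team_a: str, team_b: str, p_a_wins: float) -> MatchCall:
    """Classification approach: the model outputs P(team_a wins).

    The predicted winner's probability doubles as a confidence level.
    A probability of exactly 0.5 is assigned to team_a."""
    if not 0.0 <= p_a_wins <= 1.0:
        raise ValueError("p_a_wins must be a probability in [0, 1]")
    if p_a_wins >= 0.5:
        return MatchCall(winner=team_a, confidence=p_a_wins)
    return MatchCall(winner=team_b, confidence=1.0 - p_a_wins)


if __name__ == "__main__":
    # Illustrative numbers only, not outputs of either model described above.
    print(call_from_scores("New Zealand", "England", 24.0, 20.0))
    print(call_from_probability("New Zealand", "England", 0.7))
```

Both calls name New Zealand as the winner, but only the second says how sure the model is, which is the extra information XGBoost's probabilistic output provides.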
Another interesting set of predictions was published in the Express. The underlying data and methods are not explicitly described, though the data may well have been similar to what I used. The article explains that the predictions used supercomputers to simulate the tournament thousands of times, taking the most common result as the overall prediction. Once again, this model predicted New Zealand and Wales to progress to the final. Having scoured the internet, I found that almost everyone was predicting a New Zealand win at this World Cup, from machine learning experts to sports fans and even the bookmakers; yet New Zealand didn't make it to the final.
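The Express article does not describe its model, so the following is only a generic sketch of the "simulate many times, keep the most common result" idea, not a reconstruction of their method. It assumes a four-team single-elimination bracket (using the real semi-final pairings) and a Bradley-Terry-style win probability computed from strength ratings; the ratings, function names and simulation count are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def win_probability(strength_a: float, strength_b: float) -> float:
    """Bradley-Terry style chance that team A beats team B (an assumed model)."""
    return strength_a / (strength_a + strength_b)


def play_match(team_a: str, team_b: str, strengths: Dict[str, float],
               rng: random.Random) -> str:
    """Simulate one knockout match by comparing a uniform draw with P(A wins)."""
    p = win_probability(strengths[team_a], strengths[team_b])
    return team_a if rng.random() < p else team_b


def simulate_knockout(bracket: List[str], strengths: Dict[str, float],
                      rng: random.Random) -> str:
    """Play a single-elimination bracket once and return the champion.

    Adjacent entries meet in each round, so list order defines the draw."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [play_match(teams[i], teams[i + 1], strengths, rng)
                 for i in range(0, len(teams), 2)]
    return teams[0]


def most_common_champion(bracket: List[str], strengths: Dict[str, float],
                         n_sims: int = 10_000, seed: int = 0) -> Tuple[str, Counter]:
    """Simulate the tournament n_sims times and return the modal champion."""
    n = len(bracket)
    if n == 0 or n & (n - 1):
        raise ValueError("bracket size must be a power of two")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    counts = Counter(simulate_knockout(bracket, strengths, rng)
                     for _ in range(n_sims))
    champion, _ = counts.most_common(1)[0]
    return champion, counts


if __name__ == "__main__":
    # Hypothetical strength ratings, purely for illustration.
    strengths = {"England": 8.0, "New Zealand": 9.0,
                 "Wales": 7.0, "South Africa": 8.5}
    # Semi-final pairings: England v New Zealand, Wales v South Africa.
    bracket = ["England", "New Zealand", "Wales", "South Africa"]
    champion, counts = most_common_champion(bracket, strengths)
    print(champion, counts.most_common())
```

The modal champion is only the single most likely outcome; the other teams still win a sizeable share of simulated tournaments, which is why a one-off final can easily go against the prediction.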
The problem we all face is that we are trying to predict a one-off sporting event: we can predict who is more likely to win, but there is always a chance of ending up with egg on our faces. Sports are notoriously difficult to predict because of the huge number of factors that can influence the outcome of a match, including seemingly random events such as Typhoon Hagibis, which caused three of the Pool stage games to be cancelled and recorded as 0-0 draws.
For instance, I realised soon after creating my model that it was an oversight not to account for the advantage Japan would have from playing in their home country, which helped them win all four of their pool stage games. Home advantage is a common phenomenon in international sport: England won the Football World Cup on home soil in 1966, South Africa won the Rugby World Cup at home in 1995, and India won the Cricket World Cup at home in 2011. There are countless factors that influence the outcome of sporting events, and building a truly strong predictive model would have required years of data collection and modelling.
Despite these weaknesses, the project was far from a complete failure. Across the games played, my model picked the correct winner 74.4% of the time, well above what random guessing would achieve.
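One way to put a number on "well above random guessing" is a one-sided exact binomial test against a coin-flip baseline. This is a minimal sketch under stated assumptions: each pick is treated as an independent 50/50 guess under the null hypothesis, draws are ignored, and because the article reports only the percentage, the counts in the usage block are hypothetical placeholders to be replaced with the real tally. The function names are invented for illustration; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def accuracy(correct: int, total: int) -> float:
    """Fraction of matches where the predicted winner actually won."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def p_value_vs_coin_flip(correct: int, total: int) -> float:
    """One-sided exact binomial test against a 50% coin-flip baseline.

    Returns P(X >= correct) for X ~ Binomial(total, 0.5): the chance that
    guessing every winner at random would do at least this well."""
    if not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total")
    favourable = sum(comb(total, k) for k in range(correct, total + 1))
    return favourable / 2 ** total


if __name__ == "__main__":
    # Hypothetical counts for illustration; substitute the real tally of games.
    correct, total = 30, 40
    print(f"accuracy = {accuracy(correct, total):.1%}")
    print(f"p-value vs coin flip = {p_value_vs_coin_flip(correct, total):.4g}")
```

A small p-value says coin-flipping would rarely match the model's hit rate; it does not say the model is well calibrated, only that it beats pure chance.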
This project was a fun exploration of Oracle Machine Learning, a platform on which models can be built and tested very quickly, letting me consider a wider variety of data and models than if I had coded them myself. Here at DSP-Explorer we are exploring how to integrate Oracle Machine Learning models into our day-to-day work, in settings where machine learning is better suited and more reliable. Check out the DSP-Explorer website to find out more about our Oracle consultancy services or our Managed Services offerings.
To conclude this project, I used my Oracle Machine Learning model to predict the final, in which I predict an extra-time victory for England. Here’s hoping my prediction is correct as I’ll be supporting Eddie Jones’s men all the way this Saturday morning!