Last month I made my first foray into sports analysis with a project that analysed and predicted the football Premier League table. It was an interesting exploration of the state of English football, but it was missing one key ingredient: a bold prediction of the future! With that in mind, I decided the Rugby World Cup offered the perfect opportunity to make a claim that could well be debunked in the coming weeks.
In this article I claim that New Zealand will win the 2019 Rugby World Cup, beating England in the final on the 2nd of November, based on predictions from my Oracle Machine Learning model.
To make a machine-learning-based prediction you first need data, and for this I used two main sources: RugbyData.com and espn.co.uk. The former provided historical head-to-head statistics for each pair of countries from the 1870s until the end of 2018, while the latter provided the result of every international match from the last two years, in a different format.
The data was not easily accessible and had to be scraped from the websites' HTML. I then had to reformat it into a usable shape and fill gaps where some pairs of countries had never played each other. This is the bread-and-butter work of data science: this kind of 'data mining' (which is about as glamorous as it sounds) is often said to take up more than half of a typical data scientist's workload.
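As a rough illustration of the gap-filling step, here is a minimal Python sketch. It assumes the head-to-head data has been reduced to (matches played, matches won) for each ordered pair of teams, and it fills pairs that have never met with a neutral 50% win rate. Both the data layout and the neutral default are assumptions for illustration, not necessarily what this project did, and the team names and numbers are placeholders.

```python
from itertools import permutations

# Illustrative head-to-head records: (team, opponent) -> (played, won).
# Placeholder teams and invented numbers, not real RugbyData.com figures.
head_to_head = {
    ("Team A", "Team B"): (10, 6),
    ("Team B", "Team A"): (10, 4),
}


def fill_head_to_head(records, teams, neutral_win_rate=0.5):
    """Return a win rate for every ordered pair of distinct teams.

    Pairs with no recorded matches get `neutral_win_rate`, an assumed
    default chosen so that an unplayed fixture favours neither side.
    """
    filled = {}
    for team, opponent in permutations(teams, 2):
        played, won = records.get((team, opponent), (0, 0))
        filled[(team, opponent)] = won / played if played else neutral_win_rate
    return filled


rates = fill_head_to_head(head_to_head, ["Team A", "Team B", "Team C"])
for pair, rate in sorted(rates.items()):
    print(pair, rate)
```

A neutral default is the simplest choice; a more careful approach might fall back on each team's overall record or world ranking instead.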
My machine learning model was built in Oracle Machine Learning, with the data pulled from Oracle Autonomous Data Warehouse (ADW). Using ADW was very convenient, as it cut the time and effort spent on data handling, and accessing the data was straightforward after the initial processing. Oracle Machine Learning also saved a huge amount of time: first by helping me decide which variables to include in the model, and then in building and tuning the model, much of which it handles autonomously.
All the data used to build these models predates the start of the Rugby World Cup, so that the predictions are fair and not influenced by results from the current competition. In my predictions, Japan were knocked out in the Pool stage with only one win. At the time of writing, Japan have won both of their games so far, including a massive upset victory over Ireland. This is part of the magic of sport that no statistical model can fully capture, although the predictions would likely have improved if the data had accounted for Japan's advantage as hosts.
In Pool B the model seems to be performing well so far, although I predicted South Africa to beat New Zealand, whereas in reality New Zealand won that game. Similarly, in Pool D the model predicted Australia to beat Wales, but Wales won that match. Otherwise, the pool predictions are holding up well after the first 12 days of the tournament, which gives me a little more confidence in my predictions.
There are no huge surprises in my predicted Quarter-Finals: the eight countries that reach this stage were the eight highest-ranked teams in the world going into the tournament. Although Ireland entered the tournament ranked slightly above New Zealand, my model predicts that the All Blacks will progress to the Semi-Finals.
My model predicted the Semi-Final between England and South Africa to be a 22-22 draw, which left me in a bit of a pickle over which team should progress. Thankfully, the model's predictions were not rounded to whole points, and closer inspection showed England scoring 22.1 points to South Africa's 21.9. This margin has no real-world meaning, since rugby scores are whole numbers, but I used it to justify sending England through to the final at South Africa's expense.
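The tie-break amounts to comparing the unrounded predicted scores rather than the rounded ones. Here is a minimal Python sketch; `pick_winner` is a hypothetical helper for illustration, not part of Oracle Machine Learning.

```python
def pick_winner(team_a, score_a, team_b, score_b):
    """Return the team with the higher unrounded predicted score.

    Rounded scores can tie even when the raw predictions differ, so the
    comparison uses the raw values. A perfect tie raises an error because
    the predictions alone cannot separate the teams.
    """
    if score_a == score_b:
        raise ValueError(f"{team_a} and {team_b} have identical predicted scores")
    return team_a if score_a > score_b else team_b


# Predicted Semi-Final scores quoted in the text.
england, south_africa = 22.1, 21.9
print(round(england), round(south_africa))  # both round to 22
print(pick_winner("England", england, "South Africa", south_africa))  # England
```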
Unfortunately for my home country, the model predicts that England will lose to New Zealand in the final. Even so, reaching the final would be a huge improvement for England, who failed to reach the knock-out stages at the last World Cup. If the prediction is correct, it would be a third successive World Cup victory for the All Blacks, an incredible achievement.
We don't yet know the outcome of the Rugby World Cup, but I feel my predictions using Oracle Machine Learning are looking pretty good. That said, a clear weakness of my model is that only around 100 matches were used to train it: that is how many times the World Cup teams played each other between the start of 2018 and the start of the tournament. I could have used an earlier start date to build a larger training set, but I decided against it, as older results are likely to reflect relationships that no longer hold, and I didn't want out-of-date match results to skew my predictions.
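The choice of training window can be expressed as a simple filter: keep only matches between World Cup teams played from the start of 2018 up to the tournament's opening day (20 September 2019). This is a minimal Python sketch with placeholder fixtures; the `Match` record and `training_matches` function are hypothetical and not taken from the original project.

```python
from datetime import date
from typing import NamedTuple


class Match(NamedTuple):
    played_on: date
    home: str
    away: str


# Window used in the text: start of 2018 up to the opening day of the
# 2019 Rugby World Cup (20 September 2019), which is excluded.
TRAIN_START = date(2018, 1, 1)
TRAIN_END = date(2019, 9, 20)


def training_matches(matches, world_cup_teams, start=TRAIN_START, end=TRAIN_END):
    """Keep matches inside [start, end) where both sides are World Cup teams."""
    return [
        m for m in matches
        if start <= m.played_on < end
        and m.home in world_cup_teams
        and m.away in world_cup_teams
    ]


# Placeholder fixtures to show which rows the filter keeps.
matches = [
    Match(date(2017, 11, 18), "Team A", "Team B"),  # before the window
    Match(date(2018, 6, 9), "Team A", "Team C"),    # kept
    Match(date(2019, 8, 31), "Team B", "Team D"),   # Team D is not a World Cup team
    Match(date(2019, 9, 21), "Team A", "Team B"),   # played during the tournament
]
teams = {"Team A", "Team B", "Team C"}
print(training_matches(matches, teams))
```

Moving `TRAIN_START` earlier is exactly the trade-off described in the text: more training matches, but more results from squads and eras that may no longer be representative.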
I think these predictions are a good end-point for this project, as further analysis risks taking the fun out of sport, and they leave plenty of uncertainty to keep in mind while watching the Rugby World Cup. Watch this space for a review of how my model did!
*not actually true. Probably.