How Twitter Experienced the 2018 Champions League Final
Applying Azure Sentiment Analysis technology to Tweets from football fans.
Last week Bayern Munich beat Paris Saint-Germain 1-0 in the 2020 Champions League final. Knowing this match was coming up, and being a Liverpool fan, I was inspired to look back at the previous two finals. The 2019 final is certainly the happier memory for me, as Liverpool beat Spurs 2-0, but the more interesting match from a pure football perspective was Liverpool’s 3-1 loss to Real Madrid in 2018.
The most contentious moment of the 2018 final came when Real Madrid captain Sergio Ramos injured Liverpool’s key player, Mohamed Salah. Liverpool fans were outraged: it appeared that Ramos had deliberately pulled Salah to the floor to hurt him, and he could be seen smiling afterwards.
The other key moments were a pair of mistakes by Liverpool goalkeeper Loris Karius, which led to Real Madrid goals in the second half. It emerged after the game that Karius had been playing with a concussion, which had also been inflicted by Sergio Ramos earlier in the match.
To understand how fans were feeling, I took all of the tweets sent during the match that featured the hashtag ‘#UCLFinal’, where UCL stands for UEFA Champions League. Using the Twitter API, I collected 350,000 tweets, along with metadata such as the time of the tweet, the user’s main language, and their Twitter username. Together this gave me a CSV file containing 1.8 GB of data.
Analysis 1: Language Detection
The first analysis I ran was language detection using Microsoft Azure Machine Learning Studio. For over 80% of the tweets, the detected language matched the user’s main language, but this still left tens of thousands of tweets written in a different language to the user’s main language. Most of the tweets were in English, which is not surprising as I only collected tweets with the hashtag ‘#UCLFinal’, which is written in English.
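As a rough illustration of the comparison described above, here is a minimal Python sketch that measures how often the detected language matches the user’s main language. It assumes the detection results have already been exported as rows; the field names `user_lang` and `detected_lang` and the sample rows are hypothetical, not taken from the real dataset.

```python
from collections import Counter


def language_match_rate(tweets):
    """Return the fraction of tweets whose detected language equals the user's main language."""
    if not tweets:
        return 0.0
    matches = sum(1 for t in tweets if t["user_lang"] == t["detected_lang"])
    return matches / len(tweets)


def mismatch_pairs(tweets):
    """Count (user language, detected language) pairs for tweets where the two differ."""
    return Counter(
        (t["user_lang"], t["detected_lang"])
        for t in tweets
        if t["user_lang"] != t["detected_lang"]
    )


# Made-up sample rows using ISO 639-1 codes; not real data from the match.
sample = [
    {"user_lang": "en", "detected_lang": "en"},
    {"user_lang": "es", "detected_lang": "es"},
    {"user_lang": "es", "detected_lang": "en"},
    {"user_lang": "fr", "detected_lang": "en"},
    {"user_lang": "en", "detected_lang": "en"},
]

rate = language_match_rate(sample)
pairs = mismatch_pairs(sample)
print(f"{rate:.0%} of sample tweets match the user's main language")
print(pairs.most_common())
```

Counting the mismatched pairs separately makes it easy to see which combinations, such as a Spanish-speaking user tweeting in English, account for the tweets that do not match.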
The first column below shows the language of the user, and the second/third column shows the language of the actual tweet, as found by the language detection model. These two-letter language codes are ISO 639-1 codes, with ‘en’ for English and ‘es’ for Spanish being the two most common in our dataset.
Analysis 2: Identifying the Subject
Next, I wanted to know what people were tweeting about, so I used a model for Named Entity Recognition. This locates and extracts the names of people, organisations, etc. from text, which allowed me to determine which subjects people were tweeting about most frequently.
The most common entities were the names of the two teams competing, though if we add together the frequencies of ‘Ramos’ and ‘Sergio Ramos’, we see that there were actually more tweets about him than about either team individually. I was surprised to see that Karius wasn’t very high in this list, barely being mentioned more than Dua Lipa, who provided the pre-match entertainment. Karius wasn’t a particularly famous player at this time, so it may be that many people just referred to him as the “Liverpool goalkeeper” rather than by his name.
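Adding together the counts for ‘Ramos’ and ‘Sergio Ramos’ can be sketched in Python as below. This is only an illustration of the idea: the alias table, the function name `merge_entity_counts` and the sample entity list are all hypothetical, and the real output of the Named Entity Recognition model would need its own alias list.

```python
from collections import Counter

# Hypothetical alias table: maps lower-cased surface forms to one canonical name.
ALIASES = {
    "ramos": "Sergio Ramos",
    "sergio ramos": "Sergio Ramos",
    "salah": "Mohamed Salah",
    "mo salah": "Mohamed Salah",
}


def merge_entity_counts(entities):
    """Count entity mentions, folding known aliases into a single canonical name."""
    counts = Counter()
    for name in entities:
        cleaned = name.strip()
        # Unknown names are kept as they are; known aliases are replaced.
        counts[ALIASES.get(cleaned.lower(), cleaned)] += 1
    return counts


# Made-up entity mentions, one per extracted entity; not real results.
entities = [
    "Ramos", "Sergio Ramos", "Liverpool", "Real Madrid",
    "ramos", "Liverpool", "Karius",
]

merged = merge_entity_counts(entities)
print(merged.most_common())
```

A lookup table like this only catches aliases that are listed explicitly, so descriptions such as “Liverpool goalkeeper” would still be missed unless they were added by hand.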
Analysis 3: Sentiment Analysis
Finally, I wanted to look at sentiment analysis, which gave me an opportunity to use the Azure Cognitive Services that our CTO, Phil Brown, used a few months ago to analyse the sentiment of the lyrics in pop songs such as Deee-Lite’s ‘Groove Is in the Heart’ [read here]. Sentiment analysis gives a “happiness” score to a piece of text, on a scale from 0 to 1, where 0 is utter misery and 1 is absolute delight.
First, I found 1,000 tweets which appeared to be from Liverpool fans (judging by their usernames or the content of their tweets), then I used the Cognitive Services API to run sentiment analysis against these tweets. Any score above 0.5 was categorised as ‘positive’, and any score below 0.5 as ‘negative’.
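The categorisation step can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analysis: the function name `categorise` is hypothetical, the scores are made up, and returning `None` for a score of exactly 0.5 is an assumption, since the rule above does not cover that case.

```python
def categorise(score, threshold=0.5):
    """Label a 0-to-1 sentiment score as 'positive' or 'negative'.

    A score exactly equal to the threshold returns None (an assumption:
    the rule described in the post does not say how such scores are handled).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score > threshold:
        return "positive"
    if score < threshold:
        return "negative"
    return None


# Made-up scores for illustration; not real API output.
scores = [0.92, 0.08, 0.5, 0.51]
labels = [categorise(s) for s in scores]
print(labels)
```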
Plotting these sentiment scores against game time gives us an insight into which events were the most talked about during the match. At kick-off it appears that Liverpool fans were feeling very positive, hoping their team could win the competition. This lasted for approximately half an hour, until Mohamed Salah was taken off injured, causing the biggest spike in negative tweets that we see during the match.
The other two peaks of negativity came when Karius made the two mistakes that led to Real Madrid goals. However, the other goal Real Madrid scored was met with much less negativity, possibly because it was such a great goal that it was hard for Liverpool fans to complain about it. The main spike of positivity for Liverpool fans came when Mané scored an equaliser early in the second half, but this positivity soon faded when Real Madrid went 2-1 up, and later 3-1 up.
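To find spikes like the ones described above, the tweets can be grouped into time windows and the share of negative tweets calculated for each window. The Python sketch below shows one way to do this under stated assumptions: the function name, the five-minute window, the placeholder kick-off time and the sample tweets are all hypothetical. It also measures elapsed minutes since kick-off, which includes the half-time break, so the windows do not line up exactly with the match clock in the second half.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def negative_share_by_window(tweets, kickoff, window_minutes=5):
    """Return {window start in minutes since kick-off: share of negative tweets}.

    `tweets` is an iterable of (sent_at, score) pairs, where score is on the
    0-to-1 scale and anything below 0.5 counts as negative.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for sent_at, score in tweets:
        minutes = (sent_at - kickoff).total_seconds() / 60
        if minutes < 0:
            continue  # ignore tweets sent before kick-off
        window = int(minutes // window_minutes) * window_minutes
        totals[window] += 1
        if score < 0.5:
            negatives[window] += 1
    return {w: negatives[w] / totals[w] for w in sorted(totals)}


# Placeholder kick-off time and made-up tweets, for illustration only.
kickoff = datetime(2018, 5, 26, 18, 45)
tweets = [
    (kickoff - timedelta(minutes=1), 0.5),
    (kickoff + timedelta(minutes=2), 0.9),
    (kickoff + timedelta(minutes=4), 0.8),
    (kickoff + timedelta(minutes=31), 0.1),
    (kickoff + timedelta(minutes=33), 0.2),
    (kickoff + timedelta(minutes=34), 0.7),
]

shares = negative_share_by_window(tweets, kickoff)
for window, share in shares.items():
    print(f"{window:>3} min: {share:.0%} negative")
```

Using the share of negative tweets rather than the raw count keeps a window with a lot of traffic from looking negative simply because more people were tweeting.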
In conclusion, I have found Azure Machine Learning Studio and Azure Cognitive Services to be very useful tools for text analytics, including sentiment analysis. The libraries of pre-built models are particularly useful in saving time for data scientists and making machine learning more accessible.
The two tools are useful for slightly different reasons: Azure Cognitive Services gives straightforward access to machine learning, whereas Azure Machine Learning Studio is aimed at a data scientist audience and provides more flexibility in how the models are configured. Both Azure options gave very strong results with far less work than manually building models in code, and allowed me to focus on understanding the context of the data.
To learn more about how DSP can help you apply Machine Learning to make business-intelligent decisions, get in touch today using the form below. Or check out our other recent blog posts on our work with Azure Machine Learning.