Super Bowl 50 According to Twitter; Sentiment Analysis of 1.8 Million Tweets
At AYLIEN we like to use topical events like the FIFA World Cup and the Super Bowl to showcase our technology in a simple and interesting way. We mainly choose world events with a lot of hype around them so that we can dig into public opinion using data collected from social media and other sources. This time around we focused on Super Bowl 50, which took place on the 7th of February 2016, to get a handle on the public reaction.
Super Bowl 50 saw the Denver Broncos line out against the Carolina Panthers in a battle between the strongest defensive outfit in the league, the Broncos, and an offense-focused Panthers team.
We set out to try and understand the public reaction to Super Bowl 50 by collecting and analyzing reactions online. We hoped to uncover interesting insights and correlations in the build up to and during the actual game. We focused our attention on the volume of chatter surrounding the event for each team, the battle of the quarterbacks and even the advertising battle which has become such a huge part of the whole Super Bowl event each year.
(Interested in the ads battle? Check out the recording of our recent Webinar with RapidMiner where we dive into who came out on top in the SB50 commercials battle here.)
Overall we collected about 1.8 million tweets using the Twitter Search and Streaming APIs. We also pulled team information like rosters and coaches’ names from the SportRadar API, which we later used to separate tweets by team. We analyzed all of the tweets gathered using the AYLIEN Text Analysis API and visualized our results in Tableau. We’ll talk more about the whole process later in the blog.
- Twitter Search API and Twitter Streaming API
- SportRadar API
- AYLIEN Text Analysis API
We focused our data collection on keywords, hashtags and handles that were related to Super Bowl 50. You can download the data set here.
Once we collected all of our tweets we spent a bit of time cleaning and prepping our data. We disregarded some of the metadata which we felt we didn’t need. We kept key indicators like time stamps, geolocation, tweet ID and the raw text of each tweet. We also removed any retweets and tweets that contained links. From previous experience, tweets that contain links are mostly objective and don’t hold any opinion towards the event.
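The cleaning step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from our walkthrough: the field names (`id`, `created_at`, `geo`, `text`, `retweeted_status`) assume tweets shaped like the classic Twitter REST API JSON, and `clean_tweets` is a hypothetical helper name.

```python
import re

# Assumption: each tweet is a dict shaped like the Twitter REST API JSON.
URL_PATTERN = re.compile(r"https?://\S+")
KEPT_FIELDS = ("id", "created_at", "geo", "text")


def is_retweet(tweet):
    """Treat a tweet as a retweet if it embeds the original or starts with 'RT @'."""
    return "retweeted_status" in tweet or tweet.get("text", "").startswith("RT @")


def has_link(tweet):
    """True when the raw text contains an http(s) URL."""
    return bool(URL_PATTERN.search(tweet.get("text", "")))


def clean_tweets(tweets):
    """Drop retweets and link tweets, keeping only the fields we analyze."""
    cleaned = []
    for tweet in tweets:
        if is_retweet(tweet) or has_link(tweet):
            continue
        cleaned.append({field: tweet.get(field) for field in KEPT_FIELDS})
    return cleaned
```

Filtering before any analysis keeps the number of API calls down, since every tweet that survives this step is later sent for sentiment and concept analysis.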
You can read more about the technicalities of the process and even copy the code we used in our walkthrough available here.
We used Tableau to visualize our results and embedded some of the more interesting visualizations below.
We started off by analyzing the volume of chatter on Twitter in the build up to and during Super Bowl 50. You can see in the graph below how the chatter builds in the few days leading up to the event, with pretty obvious peaks in volume at the start and right at the end of the game. The highest number of tweets was published as people reacted to the result.
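A volume curve like this comes down to bucketing timestamps. Here is a minimal sketch that counts tweets per hour, assuming timestamps in Twitter's `created_at` format (for example `Sun Feb 07 23:30:12 +0000 2016`); `hourly_volume` is a hypothetical helper and not part of our published code.

```python
from collections import Counter
from datetime import datetime

# Assumption: timestamps use Twitter's created_at layout.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def hourly_volume(timestamps):
    """Return (hour, tweet_count) pairs sorted chronologically."""
    buckets = Counter()
    for raw in timestamps:
        moment = datetime.strptime(raw, CREATED_AT_FORMAT)
        # Truncate to the start of the hour so tweets share a bucket.
        buckets[moment.replace(minute=0, second=0, microsecond=0)] += 1
    return sorted(buckets.items())
```

The same bucketing works per minute for game day by truncating only the seconds.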
Volume of Tweets:
We looked at the overall volume, which was somewhat interesting, but we also wanted to know which team had the most vocal fans and who was tweeting the most. To understand the reaction towards each team we needed to separate the tweets in some way. The first approach we took was to use pre-identified hashtags, in this case #BroncosWin and #PanthersWin, which in the build up were touted as the official hashtags to use.
This didn’t prove too useful, however. Hashtags can have massive spikes in usage and popularity, but they usually fade away quite dramatically or get replaced by other hashtags that happen to be trending at the time. This is nicely visualized in the graph below, which shows the decline in usage for each hashtag.
Team Specific Hashtags
Our second approach was a bit more technical and it focused around the idea of classifying tweets as Denver or Carolina focused based on what concepts – team, coach, players, cities – were mentioned in a tweet. We accomplished this using our Concept Extraction feature. For example if a tweet mentions Cam Newton it is most likely a tweet about Carolina.
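To make the idea concrete, here is a simplified sketch of the team assignment. It swaps Concept Extraction for plain substring matching against a small hand-written term list, so it is only a stand-in: the real feature links mentions to known entities rather than matching strings, and `TEAM_TERMS` and `classify_team` are hypothetical names.

```python
# Assumption: a tiny hand-picked term list standing in for extracted concepts.
TEAM_TERMS = {
    "Carolina": {"panthers", "carolina", "charlotte", "cam newton", "greg olsen", "ron rivera"},
    "Denver": {"broncos", "denver", "peyton manning", "demaryius thomas", "gary kubiak"},
}


def classify_team(text):
    """Return the team with the most term hits, or None when unclear."""
    lowered = text.lower()
    hits = {
        team: sum(term in lowered for term in terms)
        for team, terms in TEAM_TERMS.items()
    }
    best = max(hits.values())
    winners = [team for team, count in hits.items() if count == best]
    # No mentions at all, or an even split, means we cannot pick a side.
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]
```

Returning `None` for ties and for tweets with no mentions keeps ambiguous tweets out of both team buckets rather than guessing.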
Volume by Team:
We had a lot more success with this approach and were able to classify around 40% of the 1.8M tweets we collected as either Carolina or Denver focused or not relevant. As you can see in the visualization above, the Panthers fans were far more vocal, tweeting about twice as much as the Broncos fans.
Were the Broncos fans quietly confident, or is it down to something simpler, like there being more Panthers fans than Broncos fans?
Location of Tweets
We also wanted to understand where these tweets were coming from. We assumed that for each team the majority of the activity would be centered on their home cities, Denver and Charlotte, and we were right. You can see a strong concentration of activity clustered around North Carolina for the Panthers tweets.
It was much the same for the Broncos tweets with most of the activity focused around Colorado.
While both teams seem to have pockets of fans based in other major cities on the east and west coasts, there seem to be a lot more Panthers fans spread along the east coast from Florida to New Hampshire.
The other obvious clusters were coming mainly from the San Francisco area where the game was held.
While the volume of tweets and how it increases and decreases is interesting, it doesn’t tell us a whole lot about the opinion of the public, who they were going for, which players they like and who they thought would come out on top.
We used the results from the analysis we did using our Sentiment Analysis and Concept Extraction features to understand what people were actually tweeting about and what their sentiment was, towards teams and some players.
First off we looked at the overall polarity, i.e. how many tweets were classified as positive, how many as negative and how many as neutral. The majority of tweets, as expected, came back as neutral.
While you can’t tell from this graph which team the positivity or negativity is directed at, there are still some interesting insights here. Notably, the most opinionated tweets are positive in the build up to the game: everyone believes in their team, they’re excited for the big game and they’re showing it with an overall positive sentiment. The negativity does creep in once the game kicks off, however, reaching its peak at the end of the game, which you could assume was down to the disappointment of the Carolina fans.
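As a rough illustration of the polarity breakdown, the sketch below turns a list of per-tweet labels into shares. It assumes each tweet has already been labelled `positive`, `negative` or `neutral` by a sentiment classifier; `polarity_shares` is a hypothetical helper name.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def polarity_shares(labels):
    """Return the fraction of tweets carrying each polarity label."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}
```

Computing shares rather than raw counts makes it easier to compare quiet days in the build up with the much busier game day.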
We’ll talk more about this in the next section but that initial very severe spike in activity and positivity is also interesting.
Using the same approach as before with Concept Extraction to separate tweets we could split the positive and negative tweets into Broncos and Panthers related tweets.
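Combining the two signals is essentially a group-by. The following sketch assumes each analyzed tweet has been reduced to a record with `team`, `day` and `polarity` keys (hypothetical field names) and counts opinionated tweets per team per day.

```python
from collections import defaultdict


def sentiment_by_team(records):
    """Count positive and negative tweets for each (team, day) pair."""
    table = defaultdict(lambda: {"positive": 0, "negative": 0})
    for record in records:
        # Skip tweets we could not assign to a team, and neutral ones.
        if record["team"] is None or record["polarity"] == "neutral":
            continue
        table[(record["team"], record["day"])][record["polarity"]] += 1
    return dict(table)
```

A table in this shape maps directly onto a chart with one positive and one negative series per team.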
The Carolina Panthers certainly had the most chatter about them from a volume point of view but they also had the most positive sentiment towards them in the build up and the beginning of the game.
Were the Panthers fans too cocky?
Sentiment towards each team (build up):
What happened on February 5th?
The first thing we noticed from the visualization above was the extreme spike on Feb 5th. It’s somewhat strange to see such a large spike in activity at that time, especially because it was so rich in positive sentiment. After some digging in the data we figured out that this was down to a campaign run by Sports Central, where they asked their followers to vote on who was going to win Super Bowl 50 using a Twitter poll. Once you voted, your account automatically tweeted one of the following tweets. This certainly shows the effect a poll can have on Twitter, but that effect is quite short lived.
— Mike Waldron (@MikeWallly) February 26, 2016
As was the case with the hashtags #BroncosWin and #PanthersWin we discussed earlier, the Twitter poll gave a very strong indication of the public opinion at that time, but failed to deliver insight throughout the rest of the build up and during the game itself.
The graph below focuses on game day. There was quite a significant spike in negative sentiment towards Carolina during the last two quarters, and at the end of the game we can see a significant amount of negativity. The opposite effect can be seen on the Denver side, with a large positive spike right at the end of the game as fans expressed their delight with the result.
Sentiment towards each team (Game):
We also wanted to focus on some key individuals and how they performed in the eyes of the public. Anyone who watches football will tell you, a lot of the game focuses on one key position, the quarterback.
Below we analyzed both the positive and negative reactions towards Cam Newton and Peyton Manning during the game. The overall reaction by fans to Newton’s performance was pretty poor. He only completed 18/41 passes and was sacked a total of 6 times, which shows up clearly in the dominance of negative sentiment towards Newton in tweets, especially as it became more evident that the Broncos had shut down the Panthers’ offense.
Manning, despite not throwing a single touchdown pass, was still praised for his performance and control of the game. After all, Denver are known for their defense-focused strategies and they closed out the Carolina attack and Newton’s offensive efforts in particular. So while Manning didn’t deliver a perfect quarterback performance, as usual he delivered the goods in the form of a win, much to the satisfaction of the fans.
Player Reaction on Game Day
From a volume of tweets point of view, other players of note included Greg Olsen and Demaryius Thomas who had the most mentions in tweets.
Other Players Mentioned
While this was put together as a fun exercise, there are some key takeaways that can be applied to more business-focused and commercial applications. From a data analytics point of view this use case could be classed as a “voice of the customer” application of Text Analytics with a focus on social listening. It’s pretty clear there is a wealth of information about customer opinion towards brands and events on social platforms like Twitter.
- Hashtags can provide insight but they are heavily influenced by flocking and are easily overshadowed or replaced
- Twitter Polls are a great way to gain immediate traction and reaction from the twittersphere but they are extremely short lived
- The ability to segment reactions based on concepts in tweets allows for a greater understanding of opinion towards entities and concepts, in this case teams and players, but the same can easily be applied to people and brands, for example
As we mentioned above we ran a similar analysis process on brand related tweets for the Super Bowl commercials. Check it out here.