
In the first part of our World Cup 2014 blog series, we analyzed 30 million tweets collected between June 6th and July 14th about the biggest sporting event in the world, FIFA World Cup 2014, and looked at some high-level associations and insights about the tournament. In a nutshell, we observed a repeating pattern of spikes in tweet volumes around match times and important events. In the second part of the series, we’re going to dive deep into the tweets and analyze their content using our very own Text Analysis API and RapidMiner to get a more in-depth view of the data.


We’re using the same datasets that were used in part 1 (tweets.csv) plus a new dataset called tweets-sentiment.csv, which contains the sentiment polarity and subjectivity results obtained using our Sentiment Analysis API in tweet mode.

Top hashtags and mentions

Let’s start our analysis by finding the most popular hashtags and @ mentions from the tournament, by tokenizing tweets and sorting the tokens by frequency:
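The analysis in this post was done in RapidMiner, but the same step can be sketched in a few lines of Python. This is only an illustration: the regular expression, the case-insensitive counting and the sample tweets are our assumptions, not the exact workflow behind the chart.

```python
import re
from collections import Counter

# A hashtag or @mention: '#' or '@' followed by word characters,
# not preceded by a word character (so e-mail addresses are skipped).
TOKEN_RE = re.compile(r"(?<!\w)[#@]\w+")

def top_tags(tweets, n=10):
    """Return the n most frequent hashtags and @mentions, ignoring case."""
    counts = Counter()
    for text in tweets:
        counts.update(token.lower() for token in TOKEN_RE.findall(text))
    return counts.most_common(n)

# Made-up sample tweets, for illustration only.
sample = [
    "What a goal! #WorldCup #GER",
    "Come on @England #WorldCup",
    "#worldcup fever #BRA",
]
top = top_tags(sample, n=3)
print(top)
```

Lower-casing the tokens merges variants such as “#WorldCup” and “#worldcup”; drop that step if you want to count them separately.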

Sentiment Analysis

We’re now going to look into the polarity values (“positive” or “negative”) of these tweets to see what these values are for different entities and how they change over time, as a result of various events.

Note: we are only analyzing English tweets for the following examples, which introduces a sampling bias. The following charts and insights are based on the opinions of the English-speaking Twitter users.

Sentiment over time

Different events concerning players or teams affect how people think and talk about them. Using polarity analysis, we can get an idea of people’s reactions to various events, which can provide valuable insight. Let’s look at two major talking points from the tournament as examples: Luis Suarez and Brazil’s shocking performance.

1. On June 24th, Uruguayan Luis Suarez was widely accused of biting Italy defender Giorgio Chiellini, which was followed by a big wave of negative comments and feedback on social media. Suarez issued an apology on June 30th, which seems to have been satisfactory for the Twitter community (take note PR people!):

2. Brazil had arguably one of its worst performances in World Cup history during the 2014 tournament. This is pretty evident when you analyze the sentiment of tweets about #BRA after every lost match or controversial win:

Before the 3rd place playoff game between Brazil and Netherlands, people were hopeful that the catastrophic loss against Germany might bring the best out of the Brazilians. However, a few minutes into the game it was pretty clear that this was not going to be the case:
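To reproduce this kind of sentiment-over-time view yourself, one simple option is to compute, for each day, the share of tweets about an entity that were labelled “positive”. The sketch below assumes each row holds a timestamp, the tweet text and a polarity label; the row layout, the timestamp format and the sample rows are hypothetical, and the actual columns of tweets-sentiment.csv may differ.

```python
from collections import defaultdict
from datetime import datetime

def daily_positive_share(rows, keyword):
    """Map each date to the fraction of matching tweets labelled 'positive'.

    rows: iterable of (created_at, text, polarity) tuples (assumed layout).
    keyword: case-insensitive substring used to select tweets about an entity.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for created_at, text, polarity in rows:
        if keyword.lower() not in text.lower():
            continue
        day = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").date()
        totals[day] += 1
        if polarity == "positive":
            positives[day] += 1
    return {day: positives[day] / totals[day] for day in sorted(totals)}

# Made-up rows, for illustration only.
rows = [
    ("2014-06-24 18:00:00", "Suarez bit him?!", "negative"),
    ("2014-06-24 18:05:00", "Shameful from Suarez", "negative"),
    ("2014-06-30 12:00:00", "Fair play to Suarez for apologising", "positive"),
    ("2014-06-30 12:10:00", "Suarez apology is too late", "negative"),
    ("2014-06-30 13:00:00", "Great match today #GER", "positive"),
]
shares = daily_positive_share(rows, "suarez")
for day, share in shares.items():
    print(day.isoformat(), share)
```

Plotting these daily shares as a line chart gives a curve comparable in spirit to the ones discussed in this section.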

Popularity by sentiment

We can use the average polarity measures for various entities to see how positively or negatively people talk about them.
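As a rough sketch of how such an average could be computed, the snippet below encodes “positive” as +1 and “negative” as -1 and averages the scores of all tweets that mention an entity. The numeric encoding, the substring matching and the sample rows are assumptions made for illustration; they are not necessarily how the charts in this post were produced.

```python
from collections import defaultdict

# Assumed numeric encoding of the polarity labels.
SCORE = {"positive": 1.0, "negative": -1.0}

def average_polarity(rows, entities):
    """Average polarity score per entity over the tweets that mention it.

    rows: iterable of (text, polarity) tuples (assumed layout).
    entities: strings to look for, e.g. team code hashtags.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for text, polarity in rows:
        if polarity not in SCORE:
            continue  # skip any label we have no score for
        lowered = text.lower()
        for entity in entities:
            if entity.lower() in lowered:
                sums[entity] += SCORE[polarity]
                counts[entity] += 1
    return {e: sums[e] / counts[e] for e in entities if counts[e]}

# Made-up rows, for illustration only.
rows = [
    ("#GER were superb", "positive"),
    ("#GER again, brilliant", "positive"),
    ("Poor from #BRA", "negative"),
    ("#BRA fought hard", "positive"),
]
averages = average_polarity(rows, ["#GER", "#BRA", "#ARG"])
print(averages)
```

A value near +1 means almost all tweets mentioning the entity were positive, a value near -1 means almost all were negative, and entities with no matching tweets are left out of the result.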


Average polarity for the 16 teams that qualified for the second round:


Average polarity for the top 10 scorers as well as two noteworthy players, Tim Howard of USA and Luis Suarez of Uruguay:

Most ‘polar’ hashtags and mentions

Finally, let’s look at some of the most positive and negative hashtags and mentions:

Final notes

Analyzing the sentiment of tweets gives an extraordinary view into the opinions of the public in relation to a certain topic or event. Listening to “social chatter” allows you to extract detailed insight into opinions and trends on brands, companies, events, football teams etc. and how they change over time with, say, the launch of a product, a company announcement, a crisis event or, in the case above, a footballer biting another player.

In Suarez’s case, his “brand” took a major hit and “social chatter” about him turned pretty sour following the biting incident. However, his PR team’s involvement and his deal at Barcelona allowed him to bounce back quite quickly, as shown quite clearly by the switch in polarity of tweets about him.

To learn more about Sentiment Analysis, check out our recent blog posts. If you are interested in analyzing the sentiment of text, tweets, comments or reviews, you can get free access to our Text Analysis API.






The FIFA World Cup is without doubt the biggest sporting event in the world, with millions of fans and viewers from all around the globe who use social media to share their thoughts and emotions about the games, teams and players, creating massive amounts of content in the process.

Throughout the tournament, Facebook saw a record-breaking 3 billion interactions and Twitter saw a whopping 672 million tweets about the World Cup.

That’s why at AYLIEN we decided to collect some of this data using Twitter’s Streaming API and analyze tweets related to the World Cup, looking for interesting insights and correlations.

We are going to explore how you can use text analysis techniques to dig into some of this data in a series of blog posts.

In Part 1 of the series, we’re going to get a high-level view of our data and look for some basic insights about the tournament.

Data and Tools

Data: datasets used in this blog post are as follows:

  • tweets.csv: Around 30 million Tweets (80 million including retweets – which are omitted) collected between June 6th and July 14th using the Twitter Streaming API, and filtered by some of the official World Cup hashtags (e.g. “#WorldCup” and “#Brazil2014”), as well as team code hashtags (e.g. “#ARG” and “#GER”) and Twitter usernames of teams and players. (Note: we’re assuming that Twitter samples the tweets in a uniform fashion and without any major side effect on their distribution)
  • matches.csv: Information about the 64 matches, such as match time and results, obtained using the World Cup json project.
  • events.csv: Information about match events such as goals, substitutions and cards, obtained using the World Cup json project.

Tools: For these posts we will use AYLIEN Text Analysis API for Sentiment Analysis, RapidMiner for data processing and Tableau for interactive visualizations.


Let’s start our quest by taking a look at the matches and their events, such as goals, substitutions and red and yellow cards:

Things to note:

  • The number of matches with 5 or more yellow cards tends to increase in later stage games, possibly due to higher sensitivity and intensity of these matches.

Tweet languages

Now let’s take a look at a breakdown of the most popular languages used in our tweets dataset:

Things to note:

  • English, Spanish and Portuguese, in that order, are the three most used languages in our tweets dataset.

Tweet locations

Next we’ll have a look at the distribution of geo-tagged tweets over different countries around the globe, along with their languages:

Tweets and events

Plotting the total volume of tweets over time shows a repeating pattern of spikes appearing at match times and also at times when a major event has occurred (such as elimination of a team, qualification for the next round, or shocking results). Let’s have a look at a few examples:

1. Tweet volume by language

In these examples, we’re going to see how the volume of tweets in a language is affected by the matches and critical events related to teams from countries where that language is spoken (also note the trend lines in black):

Note: double click on the charts to zoom, click and hold to pan.

Teams: USA, England, Australia, Cameroon and Nigeria.

Teams: Germany and Switzerland.

Teams: France, Belgium, Algeria, Cameroon and Côte d’Ivoire.

Teams: Spain, Argentina, Mexico, Uruguay, Chile, Costa Rica, Ecuador, Honduras and Colombia.

Teams: Italy.

Teams: Brazil and Portugal.

2. Tweet volume during matches

A similar pattern can be observed at a smaller scale during matches, with spikes appearing for each goal or major event. Let’s see an example from the Brazil – Germany match:
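If you would like to look for such spikes in your own data, a minimal sketch is to bucket tweet timestamps (per hour for the whole tournament, per minute within a match) and flag buckets whose volume sits well above the average. The timestamp format, the "mean plus k standard deviations" rule and the sample data are our assumptions for illustration; the charts in this post were built in Tableau.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

def bucket_volume(timestamps, fmt="%Y-%m-%d %H:00"):
    """Count tweets per time bucket; use fmt="%Y-%m-%d %H:%M" for minutes."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        counts[moment.strftime(fmt)] += 1
    return dict(sorted(counts.items()))

def spikes(volume, k=2.0):
    """Return buckets whose count exceeds mean + k * standard deviation."""
    values = list(volume.values())
    threshold = mean(values) + k * pstdev(values)
    return [bucket for bucket, n in volume.items() if n > threshold]

# Made-up timestamps: five quiet hours with 2 tweets each, one busy hour with 20.
quiet = [f"2014-07-08 {h:02d}:{m:02d}:00" for h in range(15, 20) for m in (10, 40)]
busy = [f"2014-07-08 20:{m:02d}:00" for m in range(20)]
volume = bucket_volume(quiet + busy)
spike_buckets = spikes(volume)
print(spike_buckets)
```

Lowering `k` makes the detection more sensitive, which may be useful for per-minute buckets where volumes are noisier.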

3. Tweet volumes for different teams

Finally, let’s take a look at how the volume of tweets that mention a team changes over time for the four teams that qualified for the semi-finals round (for each team we are counting mentions of the team’s full name e.g. “Germany” as well as its team code hashtag e.g. “#GER”):
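The counting rule described above (full team name or team code hashtag) can be sketched as follows. The row layout, the daily granularity and the sample tweets are assumptions for illustration; each tweet is counted at most once per team, even if it contains both the name and the hashtag.

```python
import re
from collections import Counter

# The four semi-finalists and their team code hashtags.
TEAMS = {
    "Germany": "#GER",
    "Argentina": "#ARG",
    "Netherlands": "#NED",
    "Brazil": "#BRA",
}

def team_patterns(teams):
    """Build one case-insensitive pattern per team: full name or code hashtag."""
    return {
        name: re.compile(
            rf"\b{re.escape(name)}\b|{re.escape(tag)}\b", re.IGNORECASE
        )
        for name, tag in teams.items()
    }

def daily_mentions(rows, teams=TEAMS):
    """Count tweets per (day, team); rows are (created_at, text) tuples."""
    patterns = team_patterns(teams)
    counts = Counter()
    for created_at, text in rows:
        day = created_at[:10]  # assumes timestamps start with YYYY-MM-DD
        for name, pattern in patterns.items():
            if pattern.search(text):
                counts[(day, name)] += 1
    return counts

# Made-up tweets, for illustration only.
rows = [
    ("2014-07-08 20:11:00", "Germany are flying #GER"),
    ("2014-07-08 20:30:00", "Poor #BRA, what is happening to Brazil?"),
    ("2014-07-08 21:00:00", "#GER 7-1 #BRA"),
    ("2014-07-09 22:00:00", "Argentina through on penalties #ARG #NED"),
]
mentions = daily_mentions(rows)
print(mentions)
```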

Subscribe to our blog and stay tuned for part 2, where we use Text Analytics to dig deep into the tweets’ contents.

Got some cool use cases of text analysis? We would love to hear about them. Get in touch below.

Update: here is the second part of the series.
