

Super Bowl 51 had us on the edge of our seats. A dramatic comeback and a shocking overtime finish meant the 111.3 million Americans who tuned in certainly got what they came for. Even though TV viewership was down on previous years, the emotional rollercoaster that was Sunday’s game will go down as one of the greatest Super Bowls.

As with any major sporting event, the Super Bowl creates an incredible amount of hype, particularly on social media. All of the social chatter and media coverage around the Super Bowl make it a fantastic case study for analyzing the voice of fans and their reactions to the event. Using advanced Machine Learning and Natural Language Processing techniques, such as Sentiment Analysis, we were able to understand how fans of both the Patriots and the Falcons collectively felt at any given moment throughout the event.

Not familiar with Sentiment Analysis? Sentiment Analysis is used to detect positive or negative polarity in text and can help you understand the split in opinion from almost any body of text, website or document.

Our process

We used the Twitter Streaming API to collect a total of around 2.2 million tweets that mentioned a selection of game and team-related keywords, hashtags and handles. Using the AYLIEN Text Analysis API, we analyzed each of these tweets and visualized our results using Tableau. In particular, we were interested in uncovering and investigating the following key areas:

  • Volume of tweets before, during and after the game
  • Sentiment of tweets before, during and after the game
  • Team-specific fan reactions
  • The most tweeted players
  • The most popular Super Bowl hashtag

Keyword selection

We focused our data collection on keywords, hashtags and handles related to Super Bowl 51 and the two competing teams, including:

#SB51, #superbowl, #superbowlLI, #superbowl51, #superbowl2017, #HouSuperBowl, #Patriots, #NEPatriots, #newenglandpatriots, #Falcons, #AtlantaFalcons.

Once we collected all of our tweets, we spent a bit of time cleaning and prepping our data set, first by disregarding some of the metadata we felt we didn’t need. We kept key indicators like timestamps, tweet IDs and the raw text of each tweet. We also removed retweets and tweets that contained links: from previous experience, we find that tweets containing links are mostly objective and generally don’t hold any author opinion towards the event.
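For readers who want to try a similar cleaning step, here is a minimal Python sketch. It is illustrative only, not the code behind this analysis. It assumes each collected tweet is a dict with `id`, `created_at` and `text` keys (loosely modeled on Twitter's v1.1 tweet payload, where native retweets carry a `retweeted_status` field), and its simple substring keyword check is an approximation of how the Streaming API tracks keywords.

```python
import re

# Hashtags from the keyword list above, lowercased for case-insensitive matching.
KEYWORDS = ["#sb51", "#superbowl", "#superbowlli", "#superbowl51", "#superbowl2017",
            "#housuperbowl", "#patriots", "#nepatriots", "#newenglandpatriots",
            "#falcons", "#atlantafalcons"]

LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def is_retweet(tweet):
    # Native retweets carry "retweeted_status"; manual ones usually start with "RT @".
    return "retweeted_status" in tweet or tweet["text"].startswith("RT @")


def clean(tweets):
    """Keep only id, timestamp and text; drop retweets, tweets with links and off-topic tweets."""
    cleaned = []
    for tweet in tweets:
        text = tweet["text"]
        if is_retweet(tweet) or LINK_RE.search(text):
            continue
        if not any(keyword in text.lower() for keyword in KEYWORDS):
            continue
        cleaned.append({"id": tweet["id"], "created_at": tweet["created_at"], "text": text})
    return cleaned


sample = [
    {"id": 1, "created_at": "2017-02-05T23:30:05", "text": "What a comeback! #SB51", "lang": "en"},
    {"id": 2, "created_at": "2017-02-05T23:30:06", "text": "RT @NFL: Touchdown! #SB51"},
    {"id": 3, "created_at": "2017-02-05T23:30:07", "text": "Highlights https://example.com #Patriots"},
    {"id": 4, "created_at": "2017-02-05T23:30:08", "text": "Lunch time"},
]
print(clean(sample))  # only tweet 1 survives, reduced to id, created_at and text
```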

Tools we used


As with many of our data-driven blog posts, we used Tableau to visualize our results. All visualizations are interactive, and you can hover your mouse over each one to dive deeper into the key data points from which they are generated.

We began our analysis of Super Bowl 51 by looking at the overall volume of tweets in the lead-up to and during the game.

Tweet volume over time: all tweets

The graph below represents minute-by-minute fluctuations in tweet volumes before, during and after the game. For reference, we’ve highlighted some of the key moments throughout the event with the corresponding spikes in tweet volume.

As you can see, there is a definite and steady increase in tweet volume in the period leading up to the game. From kickoff, it is then all about reactions to in-game highlights, as seen by the sharp spikes and dips in volumes. We’ve also highlighted the halftime period to show you the effect that Lady Gaga’s performance had on tweet volumes.
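Minute-by-minute volumes like these come from a simple bucketing step. Since our charts were built in Tableau, the Python sketch below is only an assumption about how the same aggregation could be done in code; it assumes timestamps are ISO 8601 strings without a timezone suffix.

```python
from collections import Counter
from datetime import datetime


def minute_volumes(timestamps):
    """Return (minute, tweet_count) pairs, sorted by time."""
    counts = Counter()
    for ts in timestamps:
        # Truncate each timestamp to the start of its minute.
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        counts[minute] += 1
    return sorted(counts.items())


def busiest_minutes(volumes, n=3):
    """The n minutes with the highest volume: candidate spikes worth annotating."""
    return sorted(volumes, key=lambda pair: pair[1], reverse=True)[:n]


volumes = minute_volumes([
    "2017-02-05T23:30:05", "2017-02-05T23:30:59", "2017-02-05T23:31:10",
])
print(volumes)
print(busiest_minutes(volumes, n=1))
```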

Let’s now take a closer look at the pre-game period and in particular, fan predictions.

Pre-game tweet volume: #PatriotsWin vs. #FalconsWin

For the past 13 years, video game developer EA Sports has been using its football game ‘Madden NFL’ to simulate and predict the winner of the Super Bowl each year. It now has a 10-3 success-failure record, in case you were wondering! In recent times, it has also been inviting the Twittersphere to show support for their team by using a certain hashtag in their tweets. For 2017, it was #PatriotsWin vs. #FalconsWin.
So, which set of fans was the most vocal in the 2017 #MyMaddenPrediction battle? We listened to Twitter in the build-up to the game for mentions of both hashtags, and here’s what we found:

58.57% of tweets mentioned #FalconsWin while 41.43% went with #PatriotsWin. While the Patriots were firm pre-game favorites, it is likely that the neutral football fan on Twitter got behind the underdog Falcons as they chased their first ever Super Bowl win, in just their second appearance.

Tweet volume over time by team

Now that we’ve seen the overall tweet volume and the pre-game #MyMaddenPrediction volumes, let’s take a look at tweet volumes for each individual team before, during and after the game.
The graph below represents tweet volumes for both teams, with the New England Patriots in the top section and the Atlanta Falcons in the bottom section.

Talk about a game of two halves! That vertical line you can see between the two main peaks represents halftime, and as you can see, Falcons fans were considerably louder in the first half of the game, before the Patriots fans brought the noise in the second half as their team pulled off one of the greatest comebacks in Super Bowl history.

Sentiment analysis of tweets

While tweet volumes relating to either team can be a clear indicator of their on-field dominance during various periods of the game, we like to go a step further and look at the sentiment of these tweets to develop an understanding of how public opinion develops and fluctuates.

The charts below are split into two sections:

Top: Volume of tweets over time, by sentiment (Positive / Negative)

Bottom: Average sentiment polarity over time (Positive / Negative)
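To make the two panels concrete, here is a small Python sketch that turns per-tweet sentiment labels into both views. It assumes each tweet has already been labeled positive, negative or neutral and bucketed by minute (the minute keys are hypothetical strings). The +1 / 0 / -1 mapping used for averaging is our assumption, not a documented scoring scheme.

```python
from collections import defaultdict

# Assumed numeric mapping for averaging; the post does not specify one.
POLARITY_SCORE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def sentiment_by_minute(records):
    """records: (minute, label) pairs -> per-minute counts and average polarity."""
    buckets = defaultdict(list)
    for minute, label in records:
        buckets[minute].append(label)
    summary = {}
    for minute in sorted(buckets):
        labels = buckets[minute]
        summary[minute] = {
            # Top panel: volume of tweets by sentiment.
            "positive": labels.count("positive"),
            "negative": labels.count("negative"),
            # Bottom panel: average polarity for the minute.
            "avg_polarity": sum(POLARITY_SCORE[label] for label in labels) / len(labels),
        }
    return summary


records = [("23:30", "positive"), ("23:30", "negative"),
           ("23:30", "positive"), ("23:31", "negative")]
for minute, stats in sentiment_by_minute(records).items():
    print(minute, stats)
```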

New England Patriots

What’s immediately clear from the chart above is that, for the majority of the game, Patriots fans weren’t too happy and, it seems, had given up hope. However, as the gradual increase in positive tweet sentiment and volume in the final third shows, their mood clearly and understandably changed.

Atlanta Falcons

In stark contrast to the Patriots chart, Falcons fans were producing high volumes of positive sentiment for the majority of the game, until the Patriots’ comeback materialized and their mood took a turn for the worse, as indicated by the drop of sentiment into negative territory.

Most tweeted individuals

To understand who people were talking about in their tweets, we looked at the most-mentioned individuals. Unsurprisingly, Tom Brady featured heavily after his 5th Super Bowl triumph. However, the most-mentioned individual had no part to play in the actual game.

All notable players and scorers (even Brady himself) were brushed aside when it came to who viewers were talking about and reacting to most on Twitter, as halftime show performer Lady Gaga dominated. To put the singer’s dominance into perspective, she was mentioned in nearly as many tweets as Brady and Falcons quarterback Matt Ryan combined!

To get an idea of the scale of her halftime performance, check out this incredible timelapse:

Interestingly, national anthem singer Luke Bryan was tweeted about more than both Patriots Head Coach Bill Belichick and Julian Edelman, who made the catch of the game. Further proof, if needed, that the Super Bowl is not just about football: it is becoming more and more of an entertainment spectacle off the field.
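Counting mentions of individuals is usually done with entity extraction. As a much simpler stand-in, the Python sketch below matches a hand-written alias list (hypothetical and far from complete) using word boundaries, and counts each person at most once per tweet.

```python
import re
from collections import Counter

# Hypothetical alias lists; an entity-extraction service would find names automatically.
ALIASES = {
    "Lady Gaga": ["lady gaga", "gaga"],
    "Tom Brady": ["tom brady", "brady"],
    "Matt Ryan": ["matt ryan"],
}

# One case-insensitive, word-bounded pattern per person.
PATTERNS = {
    person: re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    for person, names in ALIASES.items()
}


def count_mentions(texts):
    """Count tweets mentioning each person (at most once per tweet)."""
    counts = Counter()
    for text in texts:
        for person, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[person] += 1
    return counts


texts = ["Lady Gaga was amazing", "GAGA!!!", "Brady does it again",
         "Tom Brady and Matt Ryan shake hands", "Commercial break"]
print(count_mentions(texts))
```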

Most popular Super Bowl hashtags

We saw a variety of hashtags emerge for the Super Bowl this year, so we decided to see which were used the most. Here are the top 5 most popular Super Bowl hashtags, which we have visualized with volumes below:






Despite the NFL’s best efforts to get Twitter using #SB51, the simplest and most obvious hashtag, #SuperBowl, was the clear winner.
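When tallying hashtags, case matters: #SuperBowl and #superbowl should count as the same tag. Here is a minimal Python sketch that normalizes case and counts each tag at most once per tweet (our assumption; counting every occurrence would be equally valid).

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(texts, n=5):
    """Return the n most common hashtags, case-insensitively."""
    counts = Counter()
    for text in texts:
        # A set, so a tag repeated within one tweet is counted once.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        counts.update(tags)
    return counts.most_common(n)


texts = ["#SuperBowl time!", "#superbowl #SB51", "Go #Patriots #SB51 #SuperBowl", "#superbowl"]
print(top_hashtags(texts, n=2))
```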


There is no other event on the planet that creates as much hype in the sporting, advertising and entertainment worlds. But the Super Bowl as we know it today is far less about the football and more about the entertainment factor and commercial opportunity. With big brands spending at least $5 million on a 30-second commercial, and competing for viewers’ eyes and, more importantly, for their promotion through shares and likes on social media, the Super Bowl has become big business.

In our next installment, we’ve analyzed the chatter around Super Bowl 51 from a branding point of view. We collected and analyzed Twitter data and news and media coverage of the event to pinpoint which brands and commercials joined the Patriots as Super Bowl 51 champions.




Here at AYLIEN we spend our days creating cutting-edge NLP and Text Analysis solutions such as our Text Analysis API and News API to help developers build powerful applications and processes.

We understand, however, that not everyone has the programming knowledge required to use APIs, and this is why we created our Text Analysis Add-on for Google Sheets – to bring the power of NLP and Text Analysis to anyone who knows how to use a simple spreadsheet.

Today we want to show you how you can build an intelligent sentiment analysis tool with zero coding using our Google Sheets Add-on and a free service called IFTTT.

Here’s what you’ll need to get started:

What is IFTTT?

IFTTT stands for If This, Then That. It is a free service that enables you to automate specific tasks by triggering actions on apps when certain criteria are met. For example, “if the weather forecast predicts rain tomorrow, notify me by SMS”.

Step 1 – Connect Google Drive to IFTTT

  • Log in to your IFTTT account
  • Search for, and select, Google Drive
  • Click Connect and enter your Google login information

Step 2 – Create Applets in IFTTT

Applets are the processes you create to trigger actions based on certain criteria. It’s really straightforward: you define the trigger (the ‘If’) and then the action (the ‘That’). In our weather-SMS example, the ‘If’ is a rain forecast within a weather app, and the ‘That’ is a text message that gets sent to a specified cell phone number.
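The trigger/action idea can be modeled in a few lines of Python. This is only a conceptual sketch: IFTTT applets are configured through IFTTT’s own interface, and the `Applet` class below is hypothetical, not part of any IFTTT API. Sending an SMS is simulated by appending to a list.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Applet:
    """Conceptual model only: a trigger ("this") paired with an action ("that")."""
    trigger: Callable[[dict], bool]
    action: Callable[[dict], None]

    def handle(self, event: dict) -> bool:
        """Run the action if the event satisfies the trigger; report whether it fired."""
        if self.trigger(event):
            self.action(event)
            return True
        return False


# The weather/SMS example, with the SMS simulated by an in-memory outbox.
outbox = []
rain_alert = Applet(
    trigger=lambda event: event.get("forecast") == "rain",
    action=lambda event: outbox.append(f"SMS: rain forecast for {event['day']}"),
)
rain_alert.handle({"forecast": "rain", "day": "tomorrow"})
rain_alert.handle({"forecast": "sun", "day": "Friday"})
print(outbox)
```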

To create an applet, go to My Applets and click New Applet.

Click the blue +this.


You will then be shown a list of available apps. In this case, we want to source specific tweets, so select the Twitter app.

You will then be asked to choose a trigger. Select New tweet from search.

You can now define exactly what tweets you would like to source, based on their content. You can be quite specific with your search using Twitter’s search operators, which we’ve listed below:

Twitter search operators

To search for specific words, hashtags or languages

  • Tweets containing all words in any position (“Twitter” and “search”)  
  • Tweets containing exact phrases (“Twitter search”)
  • Tweets containing any of the words (“Twitter” or “search”)
  • Tweets excluding specific words (“Twitter” but not “search”)
  • Tweets with a specific hashtag (#twitter)
  • Tweets in a specific language (written in English)

To search for specific people or accounts

  • Tweets from a specific account (Tweeted by “@TwitterComms”)
  • Tweets sent as replies to a specific account (in reply to “@TwitterComms”)
  • Tweets that mention a specific account (Tweet includes “@TwitterComms”)

To exclude Retweets and/or links

  • To exclude Retweets (“-rt”)
  • To exclude links/URLs (“-http”) and (“-https”)
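To see how these pieces combine into a single search string, here is a hypothetical Python helper (`build_query` is our own name) that assembles one from the operators above, using the same exclusions as the list (`-rt`, `-http`, `-https`) plus the `lang:` operator from Twitter’s standard search syntax. It groups alternative phrases in parentheses to make the OR explicit; check that the tool you use, IFTTT’s trigger included, honors each operator before relying on it.

```python
def build_query(any_phrases=(), hashtags=(), exclude=(), lang=None):
    """Assemble a Twitter-style search string (hypothetical helper)."""
    parts = []
    if any_phrases:
        quoted = [f'"{phrase}"' for phrase in any_phrases]  # exact-phrase matching
        parts.append("(" + " OR ".join(quoted) + ")" if len(quoted) > 1 else quoted[0])
    parts.extend(f"#{tag.lstrip('#')}" for tag in hashtags)  # specific hashtags
    parts.extend(f"-{word}" for word in exclude)  # leading minus excludes a word
    if lang:
        parts.append(f"lang:{lang}")  # restrict to one language
    return " ".join(parts)


print(build_query(["bad santa 2 is", "bad santa 2 was"], exclude=["rt", "http", "https"]))
```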

Our first trigger

We’re going to search for tweets that mention “bad santa 2 is” or “bad santa 2 was”. Why are we searching for these terms? We find that original, opinionated tweets generally use one of these phrases. These phrases also help to cut out tweets that contain no opinion (neutral sentiment).


Our goal with this tool is to analyze viewer reactions to “Bad Santa 2”, which means neutral tweets aren’t especially interesting to us in this case. However, if we wanted to assess the overall buzz on Twitter about Bad Santa 2, we might simply look for any mention at all and concentrate on the volume of tweets.

And so, here’s our first trigger.


Click Create Trigger when you’re happy with your search.

Notice how the Twitter icon has been added. Now let’s choose our action. Click the blue +that.

Next, search for or select Google Drive. You will then be given 4 options – select Add row to spreadsheet. This action will add each matching tweet to an individual row in Google Sheets.

Next, give the spreadsheet a name. We simply went for ‘Bad Santa 2’. Click Create Action. You will then be able to review your applet. Click Finish when you are happy with it.

Done! Tweets that match your search criteria will start appearing in an auto-generated Google Sheet within minutes. Now you can repeat this process to create a second applet. We chose another movie, Allied, searching for “Allied was” or “Allied is”.

Here is an example of what you can expect to see accumulate in your Google Sheet:


Note: When you install our Google Sheets Add-on, we’ll give you 1,000 credits to use for free. You then have the option to purchase additional credits should you wish to. For this example, we will stay within the free range and analyze 500 tweets for each movie. You may choose to use more or fewer, depending on your preference.

Step 3 – Clean your data

Because of the nature of Twitter, you’re probably going to find a lot of junk and spammy tweets in your spreadsheet. To minimize the number of these tweets that end up in your final data set, there are a few things we recommend you do:

Sort your tweets alphabetically

By sorting your tweets alphabetically, you can quickly scroll down through your spreadsheet and easily spot multiples of the same tweet. It’s a good idea to delete duplicate tweets: they not only skew your overall results but can often point to bot or spam activity on Twitter. To sort your tweets alphabetically, select the entire column, then select Data and Sort sheet by column B, A-Z.


Remove retweets (if you haven’t already done so)

Alphabetically sorting your tweets will also list all retweets together (beginning with RT). You may or may not want to include retweets, but this is entirely up to you. We decided to remove all retweets because there are so many bots out there auto-retweeting and we felt that using this duplicate content isn’t exactly opinion mining.

Search and filter certain words

Think about the movie(s) you are searching for and how their titles may be used in different contexts. For example, we searched for tweets mentioning ‘Allied’, and while we used Twitter’s search operators to exclude words like forces, battle and treaty, we noticed a number of tweets about a company named ‘Allied’. By searching for the company’s Twitter handle, we could highlight and delete the tweets in which it was mentioned.
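If you export the sheet, the same clean-up can be scripted. Here is a minimal Python sketch, assuming tweets are plain strings; the handle `@example_allied_co` is hypothetical and stands in for the unrelated company. Duplicates are compared case-insensitively, which is our choice.

```python
def clean_rows(tweets, blocked_terms=()):
    """Mimic the manual clean-up: sort, then drop duplicates, retweets and off-topic tweets."""
    blocked = [term.lower() for term in blocked_terms]
    seen = set()
    kept = []
    for text in sorted(tweets, key=str.lower):  # alphabetical, like sorting the sheet
        key = text.strip().lower()
        if key in seen:
            continue  # duplicate: often a sign of bot or spam activity
        seen.add(key)
        if text.startswith("RT "):
            continue  # retweet
        if any(term in key for term in blocked):
            continue  # mentions an unrelated account, e.g. another company called Allied
        kept.append(text)
    return kept


rows = [
    "Allied was great",
    "RT @someone: Allied was great",
    "Allied was great",
    "@example_allied_co Allied is hiring",  # hypothetical handle
    "allied is too long",
]
print(clean_rows(rows, blocked_terms=["@example_allied_co"]))
```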

NB: Remove movie title from tweets

Before you move on to Step 4 and analyze your tweets, it is important to remove the movie title from each tweet, as it may affect the quality of your results. For example, our tweet-level sentiment analysis feature will read ‘Bad Santa 2…’ in a tweet and may assign negative sentiment because of the word ‘bad’.

To remove all mentions of your chosen movie title, simply use Edit > Find and replace in Google Sheets.
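Outside of Sheets, the same find-and-replace can be done case-insensitively with a regular expression. A small Python sketch (`remove_title` is our own name):

```python
import re


def remove_title(text, title):
    """Remove every case-insensitive occurrence of the title, then tidy whitespace."""
    pattern = re.compile(re.escape(title), re.IGNORECASE)
    without_title = pattern.sub("", text)
    return re.sub(r"\s{2,}", " ", without_title).strip()


print(remove_title("I think bad santa 2 is great", "Bad Santa 2"))  # I think is great
```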

Step 4 – Analyze your tweets

Now comes the fun part! It’s time to analyze your tweets using the AYLIEN Text Analysis Add-on. If you have not yet installed the Add-on, you can do so here.

Using our Add-on couldn’t be easier. Simply select the column containing all of your tweets, then click Add-ons > Text Analysis.

Select sentiment

To find out whether our tweets have been written in a positive, neutral or negative way, we use Sentiment Analysis.

Note: While Sentiment Analysis is a complex and fascinating field in NLP and Machine Learning research, we won’t get into it in too much detail here. Put simply, it enables you to establish the sentiment polarity (whether a piece of text is positive, negative or neutral) of large volumes of text, with ease.

Next, click the drop-down menu and select Sentiment Analysis > Analyze.

Each tweet will then be analyzed for subjectivity (whether it is written subjectively or objectively) and sentiment polarity (whether it is written in a positive, negative or neutral manner). You will also see a confidence score for both subjectivity and sentiment. This tells you how confident we are that the assigned label (positive, negative, objective, etc.) is correct.

By repeating this process for our Allied tweets, we can then compare our results and find out which movie was better received by Twitter users.

Step 5 – Compare & visualize

In total we analyzed 1,000 tweets, 500 for each movie. Through a simple count of positive, negative and neutral tweets, we received the following results;

Bad Santa 2

Positive – 170

Negative – 132

Neutral – 198


Allied

Positive – 215

Negative – 91

Neutral – 194

Now to generate a percentage score for each movie. Let’s start by excluding all neutral tweets. We can then easily figure out what percentage of the remaining tweets are positive. So, for Allied, of the remaining 306 tweets, 215 were positive, giving us a positive score of 70%.

By doing the same with Bad Santa 2, we get 56%.

Allied wins!
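The arithmetic is easy to check. This Python sketch reproduces both scores from the counts above: neutral tweets are dropped, and positives are taken as a share of what remains.

```python
def positive_score(positive, negative):
    """Percentage of opinionated (non-neutral) tweets that are positive."""
    opinionated = positive + negative
    if opinionated == 0:
        raise ValueError("no positive or negative tweets to score")
    return 100 * positive / opinionated


# (positive, negative) counts from the results above; neutral tweets are excluded.
results = {"Bad Santa 2": (170, 132), "Allied": (215, 91)}
for movie, (pos, neg) in results.items():
    print(f"{movie}: {positive_score(pos, neg):.0f}%")
# Bad Santa 2: 56%
# Allied: 70%
```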

To visualize your results, use your tweet volume data to generate some charts and graphs in Google Sheets.

Comparing our results with Rotten Tomatoes & IMDb

It’s always interesting to compare the results of your analysis with those of others. To compare ours, we went to the two major movie review sites, Rotten Tomatoes and IMDb, and we were pleasantly surprised by how similar the results were!


Rotten Tomatoes shows both a critic score and an audience score for Allied. Since we analyzed tweets from a Twitter audience, we are more interested in the latter. Our score of 70% comes very close to that of almost 15,000 reviewers on Rotten Tomatoes: just 1% off!


IMDb provides an audience-based review score of 7.2/10 for Allied. Again, this is very close to our own result.


Our result of 56% for Bad Santa 2, while not as close as our Allied result, was still pretty close to the Rotten Tomatoes audience score.


With IMDb, however, we once again come within 1%: its audience score for Bad Santa 2 is 5.7/10.



We hope that this simple and fun use-case using our Google Sheets Add-on will give you an idea of just how useful, flexible and simple Text Analysis can be, without the need for any complicated code.

While we decided to focus on movie reviews in this example, there are countless other uses for you to try. Here are a few ideas:

  • Track mentions of brands or products
  • Track event hashtags
  • Track opinions towards election candidates

Ready to get started? Click here to install our Text Analysis Add-on for Google Sheets.




In recent months, we have been bolstering our sentiment analysis capabilities, thanks to some fantastic research and work from our team of scientists and engineers.

Today we’re delighted to introduce you to our latest feature, Sentence-Level Sentiment Analysis.

New to Sentiment Analysis? No problem. Let’s quickly get you up to speed:

What is Sentiment Analysis?

Sentiment Analysis is used to detect positive or negative polarity in text. Also known as opinion mining, sentiment analysis is an area of text analysis and natural language processing (NLP) research that is growing in popularity as a multitude of use-cases emerge. Here are a few examples of questions that sentiment analysis can help answer in various industries:

  • Brands – are people speaking positively or negatively when they mention my brand on social media?
  • Hospitality – what percentage of online reviews for my hotel/restaurant are positive/negative?
  • Finance – are there negative trends developing around my investments, partners or clients?
  • Politics – which candidate is receiving more positive media coverage in the past week?

We could go on and on with an endless list of examples but we’re sure you get the gist of it. Sentiment Analysis can help you understand the split in opinion from almost any body of text, website or document – an ideal way to uncover the true voice of the customer.

Types of Sentiment Analysis

Depending on your specific use-case and needs, we offer a range of sentiment analysis options:

Document Level Sentiment Analysis

Document-level sentiment analysis analyzes a piece of text as a whole, providing an overall sentiment polarity for the entire body of text.

For example, this camera review:


receives the following result:


Want to test your own text or URLs? Check out our live demo.

Aspect-Based Sentiment Analysis (ABSA)

ABSA starts by locating sentences that relate to industry-specific aspects and then analyzes sentiment towards each individual aspect. For example, a hotel review may touch on comfort, staff, food, location, etc. ABSA can be used to uncover sentiment polarity for each aspect separately.

Here’s an example of results obtained from a hotel review we found online:


Note how each aspect is automatically extracted and then given a sentiment polarity score.
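To illustrate the two ABSA stages (find the sentences that touch an aspect, then score those sentences), here is a toy Python sketch. The aspect terms and sentiment word lists are hypothetical and hand-written; real ABSA models learn aspects and sentiment from data, so this is not how our endpoint works internally.

```python
import re

# Toy resources for illustration only.
ASPECT_TERMS = {
    "staff": {"staff", "receptionist", "service"},
    "food": {"breakfast", "food", "dinner"},
    "location": {"location", "beach", "downtown"},
}
POSITIVE = {"friendly", "great", "delicious", "perfect", "helpful"}
NEGATIVE = {"rude", "cold", "noisy", "dirty", "slow"}


def words(text):
    return re.findall(r"[a-z']+", text.lower())


def polarity(sentence):
    """Naive lexicon score: count of positive words minus negative words."""
    tokens = words(sentence)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def aspect_sentiment(review):
    """Map each aspect to the polarities of the sentences that mention it."""
    results = {}
    for sentence in re.split(r"(?<=[.!?])\s+", review.strip()):
        tokens = set(words(sentence))
        for aspect, terms in ASPECT_TERMS.items():
            if tokens & terms:
                results.setdefault(aspect, []).append(polarity(sentence))
    return results


review = "The staff were friendly and helpful. Breakfast was cold. Great location near the beach."
print(aspect_sentiment(review))
```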

Click to learn more about Aspect-Based Sentiment Analysis.

Sentence-Level Sentiment Analysis (SLSA)

Our latest feature breaks down a body of text into sentences and analyzes each sentence individually, providing sentiment polarity for each.
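The difference between document-level and sentence-level analysis is easy to see with a toy scorer. This Python sketch uses a tiny hand-written lexicon and a naive sentence splitter, both assumptions for illustration rather than how our models work: scored as a whole, the review comes out positive, while the sentence-level view surfaces the one negative sentence.

```python
import re

# Tiny illustrative lexicon; real sentiment models are trained, not hand-written.
POSITIVE = {"love", "great", "sharp", "excellent"}
NEGATIVE = {"terrible", "poor", "heavy", "disappointing"}


def polarity(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_level(text):
    """Sentence-level view: one (sentence, polarity) pair per sentence."""
    return [(s, polarity(s)) for s in split_sentences(text)]


review = "I love this camera. The pictures are sharp. Battery life is terrible."
print("document:", polarity(review))  # the whole review scored at once
for sentence, label in sentence_level(review):
    print(label, "|", sentence)
```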

SLSA in action

Sentence-Level Sentiment Analysis is available in our Google Sheets Add-on and also through the ABSA endpoint in our Text Analysis API. Here’s a sample query to try with the Text Analysis API;

Now let’s take a look at it in action in the Sheets Add-on.

Analyze text

We imported some hotel reviews into Google Sheets and then ran an analysis using our Text Analysis Add-on. Below you will see the full review in column A, and then each sentence in a column of its own with a corresponding sentiment polarity (positive, negative or neutral), as well as a confidence score. This score reflects how confident we are that the sentiment is correct, with 1.0 representing complete confidence.


Analyze URLs

This new feature also enables you to analyze URLs in bulk: it first scrapes the main text content from each web page and then runs SLSA on each sentence individually.

In the GIF below, you can see how the content from a URL on Business Insider is first broken down into individual sentences and then assigned a positive, negative or neutral sentiment at sentence level, thus providing a granular insight into the sentiment of an article.


What’s the benefit of SLSA?

As we touched on earlier, sentiment analysis, in general, has a wide range of potential use-cases and benefits. However, Document-Level Sentiment Analysis can miss granular details in text because it only provides an overall sentiment score.

Sentence-Level Sentiment Analysis allows you to perform a more in-depth analysis of text by uncovering the positive, neutral and negative sentences that drive the overall document-level polarity. It can help you locate instances of strong opinion in a body of text, providing greater insight into the true thoughts and feelings of the author.

SLSA can also be used to analyze and summarize a collection of online reviews by extracting all the individual sentences within them that are written with either positive or negative sentiment.

Ready to get started?

Our Text Analysis Add-on for Google Sheets has been developed to help people with little or no programming knowledge take advantage of our Text Analysis capabilities. If you are in any way familiar with Google Sheets or MS Excel you will be up and running in no time. We’ll even give you 1,000 free credits to play around with. Click here to download your Add-on or click the image below to get started for free with our Text Analysis API.





It’s certainly an exciting time to be involved in Natural Language Processing (NLP), not only for those of us involved in the development and cutting-edge research that is powering its growth, but also for the multitude of organizations and innovators finding more and more ways to take advantage of it to gain a competitive edge within their respective industries.

With the global NLP market expected to grow to a value of $16 billion by 2021, it’s no surprise to see the tech giants of the world investing heavily and competing for a piece of the pie. More than 30 private companies working to advance artificial intelligence technologies have been acquired in the last 5 years by corporate giants competing in the space, including Google, Yahoo, Intel, Apple and Salesforce. [1]

It’s not all about the big boys, however, as NLP, text analysis and text mining technologies are becoming more and more accessible to smaller organizations, innovative startups and even hobbyist programmers.

NLP is helping organizations make sense of vast amounts of unstructured data, at scale, giving them a level of insight and analysis that they could have only dreamed about even just a couple of years ago.

Today we’re going to take a look at 3 industries on the cusp of disruption through the adoption of AI and NLP technologies:

  1. The legal industry
  2. The insurance industry
  3. Customer service

NLP & Text Analysis in the Legal industry

While we’re still a long way from robot lawyers, the current organic crop of legal professionals is already taking advantage of NLP, text mining and text analysis techniques and technologies. These help them make better-informed decisions more quickly by discovering key insights that are often buried in large volumes of data, or that may seem irrelevant until analyzed at scale, uncovering strategy-boosting and often case-changing trends.

Let’s take a look at three examples of how legal pros are leveraging NLP and text analysis technologies to their advantage:

  • Information retrieval in ediscovery
  • Contract management
  • Article summarization

Information retrieval in ediscovery

Ediscovery refers to discovery in legal proceedings such as litigation, government investigations, or Freedom of Information Act requests, where the information sought is in electronic format. Electronic documents are often accompanied by metadata that is not found on paper documents, such as the date and time the document was written, shared, etc. This level of minute detail can be crucial in legal proceedings.

As far as NLP is concerned, ediscovery is mainly about information retrieval, aiding legal teams in their search for relevant and useful documents.

In many cases, the amount of data requiring analysis can exceed 100GB, when often only 5% – 10% of it is actually relevant. With outside service bureaus charging $1,000 per GB to filter and reduce this volume, you can start to see how costs can quickly soar.
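To make the scale concrete, take the 100GB figure and the $1,000-per-GB rate quoted above:

```latex
\begin{aligned}
\text{bureau cost} &= 100\ \text{GB} \times \$1{,}000/\text{GB} = \$100{,}000\\
\text{relevant data} &\approx (5\%\ \text{to}\ 10\%) \times 100\ \text{GB} = 5\ \text{to}\ 10\ \text{GB}
\end{aligned}
```

In other words, roughly 90 to 95 GB of the data being paid for would turn out to be irrelevant.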

Data can be filtered and separated by extracting mentions of specific entities (people, places, currency amounts, etc.), by including or excluding specific timeframes and, in the case of email threads, by keeping only emails that mention the company, person or defendant in question.
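A minimal Python sketch of this kind of culling, assuming the entity names of interest are already known (here they are simply passed in as search terms rather than extracted by an NLP model). The documents and the company name “Acme Corp” are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    sent: date
    text: str


def filter_documents(docs, required_terms, start, end):
    """Keep documents dated within [start, end] that mention at least one required term."""
    terms = [term.lower() for term in required_terms]
    return [
        doc for doc in docs
        if start <= doc.sent <= end and any(term in doc.text.lower() for term in terms)
    ]


docs = [
    Document("e1", date(2015, 3, 2), "Meeting with Acme Corp about the merger"),
    Document("e2", date(2015, 3, 5), "Lunch plans for Friday"),
    Document("e3", date(2016, 1, 9), "Acme Corp invoice attached"),
]
relevant = filter_documents(docs, ["Acme Corp"], date(2015, 1, 1), date(2015, 12, 31))
print([doc.doc_id for doc in relevant])
```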

Contract management

NLP enables contract management departments to extract key information, such as currency amounts and dates, to generate reports that summarize terms across contracts, allowing for comparisons among terms for risk assessment purposes, budgeting and planning.

In cases relating to Intellectual Property disputes, attorneys are using NLP and text mining techniques to extract key information from sources such as patents and public court records to help give them an edge with their case.
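Extracting amounts and dates can start as simply as pattern matching, though production systems typically rely on trained named-entity recognition. This Python sketch uses regular expressions on a made-up clause and only recognizes the formats shown (for example, “March 1, 2017”).

```python
import re

# Simple patterns for illustration; real contracts call for a trained entity extractor.
CURRENCY_RE = re.compile(r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d{2})?")
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}\b"
)


def extract_terms(contract_text):
    """Pull currency amounts and long-form dates out of contract text."""
    return {
        "amounts": CURRENCY_RE.findall(contract_text),
        "dates": DATE_RE.findall(contract_text),
    }


clause = ("The Supplier shall pay $250,000.00 by March 1, 2017 "
          "and a further £12,500 on June 30, 2017.")
print(extract_terms(clause))
```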

Article summarization

Legal documents can be notoriously long and tedious to read through in their entirety. Sometimes all that is required is a concise summary of the overall text to help gain an understanding of its content. Summarization of such documents is possible with NLP, where a defined number of sentences are selected from the main body of text to create, for example, a summary of the top 5 sentences that best reflect the content of the document as a whole.
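One simple way to pick the “top” sentences is frequency-based extractive summarization: score each sentence by how common its words are across the whole document, then keep the best ones in their original order. The Python sketch below is that baseline, with a small hypothetical stopword list; it is not a description of any particular legal tool.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "be", "by", "for", "on", "this", "that", "with", "shall", "any"}


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, n=5):
    """Pick the n sentences whose words are most frequent in the whole text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not automatically favored.
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]  # restore original order


text = ("The tenant pays rent monthly. Rent is due on the first day. "
        "The landlord maintains the roof. Late rent incurs a fee.")
print(summarize(text, n=3))
```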

NLP & Text Analysis in the Insurance industry

Insurance providers gather massive amounts of data each day from a variety of channels, such as their website, live chat, email, social networks, agents and customer care reps. Not only is this data coming in from multiple channels, it also relates to a wide variety of issues, such as claims, complaints, policies, health reports, incident reports, customer and potential customer interactions on social media, email, live chat, phone… the list goes on and on.
The biggest issue plaguing the insurance industry is fraud. Let’s take a look at how NLP, data mining and text analysis techniques can help insurance providers tackle these key issues:

  • Streamline the flow of data to the correct departments/agents
  • Improve agent decision making by putting timely and accurate data in front of them
  • Improve SLA response times and overall customer experience
  • Assist in the detection of fraudulent claims and activity

Streamlining the flow of data

That barrage of data and information that insurance companies are being hit by each and every day needs to be intricately managed, stored, analyzed and acted upon in a timely manner. A missed email or note may not only result in poor service and an upset customer, it could potentially cost the company financially if, for example, relevant evidence in a dispute or claim case fails to surface or reach the right person/department on time.

Natural Language Processing is helping insurance providers ensure the right data reaches the right set of eyeballs at the right time through automated grouping and routing of queries and documents. This goes beyond simple keyword matching: text analysis techniques are used to ‘understand’ the context and category of a piece of text and classify it accordingly.
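As a minimal illustration of learned routing, here is a multinomial Naive Bayes classifier with add-one smoothing in plain Python. The training queries and department labels are hypothetical, and a real system would need far more data and likely a stronger model; the point is that the router learns word statistics from labeled examples instead of relying on hand-written keyword rules.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesRouter:
    """Multinomial Naive Bayes with add-one smoothing, used to route text to a queue."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of each word.
            score = math.log(doc_count / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training examples.
texts = [
    "I want to file a claim for water damage",
    "My claim for the car accident is still pending",
    "How do I renew my policy",
    "Please update the address on my policy",
    "I am unhappy with how my call was handled",
    "Your agent was rude and I want to complain",
]
labels = ["claims", "claims", "policies", "policies", "complaints", "complaints"]
router = NaiveBayesRouter().fit(texts, labels)
print(router.predict("Status of my accident claim"))
print(router.predict("The agent was rude"))
```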

Fraud detection

According to a recent report by Insurance Europe, detected and undetected fraudulent claims are estimated to represent 10% of all claims expenditure in Europe. Of note here, of course, is the fraud that goes undetected.

Insurance companies are using NLP and text analysis techniques to mine the data contained within unstructured sources such as applications, claims forms and adjuster notes to unearth certain red flags in submitted claims. For example, a regular indicator of organized fraudulent activity is the appearance of common phrases or descriptions of incidents from multiple claimants. The trained human eye may or may not be able to spot such instances but regardless, it would be a time consuming exercise and likely prone to subjectivity and inconsistency from the handler.
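One simple way to surface shared phrasing across claims is to compare word n-grams (“shingles”) with Jaccard similarity. In this Python sketch the claims are hypothetical and the 4-word shingle size and 0.3 threshold are arbitrary choices; a flagged pair is a lead for an investigator, not proof of fraud.

```python
import re
from itertools import combinations


def shingles(text, n=4):
    """Set of n-word sequences (word n-grams) in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def flag_similar_claims(claims, threshold=0.3, n=4):
    """Return (id_a, id_b, similarity) for claim pairs that share many word n-grams."""
    sets = {claim_id: shingles(text, n) for claim_id, text in claims.items()}
    flagged = []
    for (id_a, set_a), (id_b, set_b) in combinations(sets.items(), 2):
        similarity = jaccard(set_a, set_b)
        if similarity >= threshold:
            flagged.append((id_a, id_b, round(similarity, 2)))
    return flagged


claims = {
    "C1": "I was stopped at the red light when the other car hit me from behind",
    "C2": "I was stopped at the red light when a van hit me from behind",
    "C3": "A pipe burst in the kitchen overnight and flooded the floor",
}
print(flag_similar_claims(claims))
```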

The solution for insurance providers is to develop NLP-powered analytical dashboards that support quick decision making, highlight potential fraudulent activity and therefore enable their investigators to prioritize cases based on specifically defined KPIs.

NLP, Text Analysis & Customer Service

In a world that is increasingly focused on SLAs, KPIs and ROIs, the role of Customer Support and Customer Success, particularly in technology companies, has never been more important to the overall performance of an organization. With the ever-increasing number of startups and innovative companies disrupting pretty much every industry out there, customer experience has become a key differentiator in markets flooded with consumer choice.

Let’s take a look at four ways that NLP and text analysis are helping to improve CX in particular:

  • Chat bots
  • Analyzing customer/agent interactions
  • Sentiment analysis
  • Automated routing of customer queries

Chat bots

It’s safe to say that chat bots are a pretty big deal right now! These conversational agents are beginning to pop up everywhere as companies look to take advantage of the cutting-edge AI that powers them.

Chances are that you interact with multiple artificial agents on a daily basis, perhaps even without realizing it. They are making recommendations as we shop online, answering our support queries in live chats, generating personalized fitness routines and communicating with us as virtual assistants to schedule meetings.


A recent interaction I had with a personal assistant bot, Amy
Chat bots are helping to bring a personalized experience to users. When done right, not only can they reduce costs for an organization, as they require less input from human agents, but they can also add significant value to the customer experience with intelligent, targeted and round-the-clock assistance at hand.

Analyzing customer/agent interactions

Interactions between support agents and customers can uncover interesting and actionable insights and trends. Many interactions are in text format by default (email, live chat, feedback forms) while voice-to-text technology can be used to convert phone conversations to text so they can be analyzed.

Listening to their customers

The voice of the customer is more important today than ever before. Social media channels offer a gold mine of publicly available consumer opinion just waiting to be tapped. NLP and text analysis enable you to analyze huge volumes of social chatter to help you understand how people feel about specific events, products, brands, companies, and so on.

Analyzing the sentiment towards your brand, for example, can help you decrease churn and improve customer support by uncovering and proactively working on improving negative trends. It can help show you what you are doing wrong before too much damage has been done, but also quickly show you what you are doing right and should therefore continue doing.

Customer feedback containing significantly high levels of negative sentiment can be relayed to Product and Development teams to help them focus their time and effort accordingly.
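To make this a little more concrete, here’s a minimal sketch of how negative trends might be flagged once feedback has been run through sentiment analysis. It assumes each item already carries a date and a polarity label from whatever sentiment service you use. The 40% alert threshold is an arbitrary example value, not a recommendation.

```javascript
// Minimal sketch: share of negative feedback per day, plus a simple alert.
// Assumed input: [{ date: 'YYYY-MM-DD', polarity: 'positive' | 'negative' | 'neutral' }]
function negativeShareByDay(items) {
  const days = new Map();
  for (const { date, polarity } of items) {
    const d = days.get(date) || { total: 0, negative: 0 };
    d.total += 1;
    if (polarity === 'negative') d.negative += 1;
    days.set(date, d);
  }
  const result = [];
  for (const [date, { total, negative }] of days) {
    result.push({ date, total, negativeShare: negative / total });
  }
  // ISO dates sort correctly as strings.
  return result.sort((a, b) => a.date.localeCompare(b.date));
}

// Dates whose negative share is at or above the (hypothetical) threshold.
function flagNegativeDays(items, threshold = 0.4) {
  return negativeShareByDay(items)
    .filter(d => d.negativeShare >= threshold)
    .map(d => d.date);
}

// Made-up sample feedback.
const feedback = [
  { date: '2016-09-01', polarity: 'positive' },
  { date: '2016-09-01', polarity: 'neutral' },
  { date: '2016-09-02', polarity: 'negative' },
  { date: '2016-09-02', polarity: 'negative' },
  { date: '2016-09-02', polarity: 'positive' },
];
console.log(flagNegativeDays(feedback)); // [ '2016-09-02' ]
```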

Automated routing of customer queries

Because of the multi-channel nature of customer support, customer queries and requests tend to come in from a variety of sources: email, social media, feedback forms and live chat. Speed of response is a key performance metric for many organizations, so routing customer queries to the relevant department in as few steps as possible can be crucial.

NLP is being used to automatically categorize and route customer queries without human intervention. As mentioned earlier, this goes beyond simple keyword matching: text analysis techniques are used to ‘understand’ the context and category of a piece of text and classify it accordingly.
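Here is a minimal sketch of the routing step itself. It assumes an upstream text classifier has already returned a category label and a confidence score for each query. The category names, queue names and the 0.7 cut-off are all hypothetical. Low-confidence or unknown predictions fall back to a human triage queue instead of being routed blindly.

```javascript
// Hypothetical mapping from predicted category to support queue.
const ROUTES = {
  billing: 'finance-team',
  bug_report: 'engineering',
  account_access: 'tier1-support',
};

// Route one classified query. `prediction` is assumed to look like
// { label: 'billing', confidence: 0.92 } from some upstream classifier.
function routeQuery(prediction, minConfidence = 0.7) {
  const queue = ROUTES[prediction.label];
  if (!queue || prediction.confidence < minConfidence) {
    return 'human-triage'; // unknown label or unsure prediction: let a person decide
  }
  return queue;
}

console.log(routeQuery({ label: 'billing', confidence: 0.92 }));    // finance-team
console.log(routeQuery({ label: 'bug_report', confidence: 0.55 })); // human-triage
console.log(routeQuery({ label: 'refund', confidence: 0.99 }));     // human-triage
```

Keeping a human fallback is a deliberate design choice. A misrouted query costs time, so it can be cheaper to send uncertain cases to a person than to guess.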


As the sheer amount of unstructured data out there keeps growing, so too does the need to gather, analyze and make sense of it. Regardless of the industry they operate in, organizations that make good use of NLP and text analysis stand to gain a competitive advantage as they battle for market share.




One of our API users, Farhad from Taskulu, recently published a blog post on how he uses the AYLIEN Text Analysis API for lead generation on Twitter.

As Farhad puts it himself: “There are many people out there literally asking you to introduce your product to them so they can become your customers, but the problem is that it’s very difficult to find them!”

The idea behind the app he created is simple: discover, understand and get involved in conversations about management platforms on Twitter as they happen.

With about 50 lines of code, a free AYLIEN subscription and a Google spreadsheet, Taskulu are now actively mining Twitter in real time for “Tweeleads”: Tweets made by users who are looking for alternative project management platforms. In the first 8 hours of having the app live, Farhad managed to gather about 35 “Tweeleads” in a spreadsheet. He has since been following up on each lead and having some interesting conversations with prospects.

For those of you who want to try it out, they’ve published the code for generating Tweeleads on GitHub. You can also find the documentation and instructions for setting it up on the GitHub page.

The whole setup takes no longer than 5 minutes, so it’s well worth giving it a go!

Farhad Hedayati is co-founder and CEO of Taskulu, a management platform that lets you keep all the stakeholders and resources of your project in one place, making project managers’ lives, team management and communication a lot easier. He regularly writes about growth hacking, marketing techniques, startups and, occasionally, programming. You can follow him on Twitter: @farhad_hf

If you have an interesting application for Text Analysis or our API, let us know. We love to see our users hack together simple and useful apps on top of our API.



If you’ve visited our blog before, you’ll know we like to analyze social chatter surrounding major events around the world and try to visualize our findings in interesting ways. In the past, we’ve collected Tweets about the FIFA World Cup and Apple Live, and most recently we decided to analyze the public reaction on Twitter to Super Bowl XLIX.

Not surprisingly, Super Bowl XLIX generated a huge amount of chatter on social networks, with Twitter estimating that over 28.4 million posts were made with terms relating to the Super Bowl.

At AYLIEN, we collected just under 4 million Tweets from the hashtags, handles and keywords we were monitoring. To keep our sample clean, we removed any retweets and spam and only worked with Tweets written in English, which left us with about 3.5 million Tweets to play with. Our idea was to run this sample through our Text Analysis API and visualize the results in some interactive graphs.
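The cleaning step can be sketched as a simple filter over Tweet objects from the Twitter API (v1.1 format). These objects carry a `lang` field and, for retweets, a `retweeted_status` object. The post doesn’t say how spam was identified, so the spam check below is only a placeholder assumption: it drops Tweets stuffed with more than a handful of hashtags.

```javascript
// Minimal sketch of cleaning a Tweet sample.
// Assumes Twitter API v1.1 Tweet objects with `text`, `lang`,
// an optional `retweeted_status`, and `entities.hashtags`.
function isRetweet(tweet) {
  return Boolean(tweet.retweeted_status) || /^RT @/.test(tweet.text);
}

// Placeholder spam heuristic (an assumption, not the method used in the post).
function looksLikeSpam(tweet, maxHashtags = 5) {
  const tags = (tweet.entities && tweet.entities.hashtags) || [];
  return tags.length > maxHashtags;
}

// Keep English, non-retweet, non-spam Tweets.
function cleanSample(tweets) {
  return tweets.filter(t => t.lang === 'en' && !isRetweet(t) && !looksLikeSpam(t));
}

// Made-up sample Tweets.
const sample = [
  { text: 'What a game!', lang: 'en', entities: { hashtags: [] } },
  { text: 'RT @NFL: Touchdown!', lang: 'en', entities: { hashtags: [] } },
  { text: 'Quel match !', lang: 'fr', entities: { hashtags: [] } },
  { text: 'Win now', lang: 'en', entities: { hashtags: [1, 2, 3, 4, 5, 6].map(i => ({ text: 'tag' + i })) } },
];
console.log(cleanSample(sample).map(t => t.text)); // [ 'What a game!' ]
```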


Data Collection:

  • Twitter Streaming API
  • RapidMiner

Data Analysis:

  • AYLIEN Text Analysis API
    • Language Detection
    • Entity Extraction
    • Sentiment Analysis

Data Visualization:

  • Tableau


First of all, we looked at where most of the Twitter activity was coming from. Not surprisingly, most of it came from the US. Europeans were also quite active and, by the looks of it, were happy to suffer from a lack of sleep at work on Monday in order to stay up and experience the event live.


The Super Bowl chatter, however, did spread far and wide. There was even a Tweet posted during the game by somebody in Antarctica!

Maybe it was this guy?

    Total Volume

Next, we analyzed the volume of Tweets over time, hoping to see how major events before, during and immediately after the game affected how vocal fans were. Lo and behold, it worked.
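Measuring volume over time comes down to grouping Tweets into fixed time buckets and counting them. Here is a minimal sketch. It assumes each Tweet has a `created_at` value that JavaScript’s `Date` can parse; ISO 8601 strings are used here to be safe. The 5-minute bucket size is an arbitrary choice.

```javascript
// Count Tweets per fixed-width time bucket (default: 5 minutes).
function volumeOverTime(tweets, bucketMinutes = 5) {
  const bucketMs = bucketMinutes * 60 * 1000;
  const counts = new Map();
  for (const t of tweets) {
    const ms = new Date(t.created_at).getTime();
    const start = Math.floor(ms / bucketMs) * bucketMs; // start of this Tweet's bucket
    counts.set(start, (counts.get(start) || 0) + 1);
  }
  // Sort buckets chronologically and label them with ISO timestamps.
  return [...counts.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([start, count]) => ({ bucket: new Date(start).toISOString(), count }));
}

// Made-up sample timestamps.
const sampleTweets = [
  { created_at: '2015-02-02T01:00:30Z' },
  { created_at: '2015-02-02T01:03:10Z' },
  { created_at: '2015-02-02T01:07:45Z' },
];
console.log(volumeOverTime(sampleTweets));
```

Plotting `count` against `bucket` gives the kind of volume curve described here. Smaller buckets show sharper spikes, but they also make the curve noisier.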

    Major Events

In the top graph we have the total volume of Tweets, and in the lower graph we have displayed the Tweets related to certain entities: teams, players, coaches and, of course, Katy Perry.

What’s interesting here is that you can see exactly when activity kicked off for the pre-game coverage. You can imagine fans settling down for the game and voicing their opinions and predictions on Twitter via their phones or tablets. Throughout the game, there were three major spikes in activity: just before kickoff, at halftime, and at the turning point of the game when the Patriots went ahead 28-24.


By far the most mentions went to none other than… Katy Perry, who headlined the much-anticipated halftime show. What does this tell us about Super Bowl fans? They love pop music?

Halftime shows aside, there were some interesting constants and spikes in mentions as the game developed. References to the much-loved Tom Brady were pretty constant throughout the game. Pete Carroll, however, only really featured with a spike in activity towards the end of the game (I wonder why that was?). Not immediately after the game, but once the dust had settled and reality sank in, there was a huge spike of activity mentioning the Seahawks. Perhaps this came from disgruntled fans expressing their frustration, or maybe even from loyal fans reiterating their support for a team that was narrowly defeated. Somehow, I doubt it was the latter.

So while the spikes in mentions are interesting, what is more insightful is the context of the Tweets: whether they were posted in support of teams and individuals, or the opposite.

    Sentiment Analysis: Positive and Negative

    One of the more interesting visualizations we produced was the sentiment intensity graph below:

    Sentiment Intensity

This displays the polarity (positive or negative) of Tweets that mentioned either team. Perhaps the most interesting event on this graph is the extreme swing in polarity of Tweets mentioning the Seahawks. Right when the Patriots took the lead, the polarity of posts mentioning the Seahawks went from slightly negative to “very negative”. It’s also pretty clear from the drop in positivity that Seahawks fans reacted very negatively to handing the lead over to their opponents.
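A swing like this can be measured with a simple net-sentiment score per team and time bucket, (positive − negative) / total, which ranges from -1 to 1. The post doesn’t state the exact formula behind its intensity graph, so treat this as one reasonable choice rather than the method used. The sketch also assumes each Tweet has already been given a polarity label and a list of the teams it mentions. A Tweet that mentions both teams counts towards both with the same label, which is a simplification.

```javascript
// Net sentiment per team and time bucket: (positive - negative) / total.
// Assumed input: [{ bucket: 'Q4', teams: ['Seahawks'], polarity: 'negative' }, ...]
function netSentiment(tweets) {
  const acc = {};
  for (const t of tweets) {
    for (const team of t.teams) {
      const key = team + '|' + t.bucket;
      if (!acc[key]) acc[key] = { team, bucket: t.bucket, pos: 0, neg: 0, total: 0 };
      acc[key].total += 1;
      if (t.polarity === 'positive') acc[key].pos += 1;
      if (t.polarity === 'negative') acc[key].neg += 1;
    }
  }
  return Object.values(acc).map(a => ({
    team: a.team,
    bucket: a.bucket,
    score: (a.pos - a.neg) / a.total,
  }));
}

// Made-up labelled sample.
const labelled = [
  { bucket: 'Q3', teams: ['Seahawks'], polarity: 'positive' },
  { bucket: 'Q3', teams: ['Seahawks'], polarity: 'negative' },
  { bucket: 'Q4', teams: ['Seahawks', 'Patriots'], polarity: 'negative' },
  { bucket: 'Q4', teams: ['Seahawks'], polarity: 'negative' },
  { bucket: 'Q4', teams: ['Patriots'], polarity: 'positive' },
];
console.log(netSentiment(labelled));
```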

We were also interested in how the polarity of Tweets mentioning certain entities developed throughout the game. We’ve displayed some of the most interesting ones below, focusing on the teams, key players and Pete Carroll.

    Sentiment Polarity

Tweets mentioning the Patriots had very little negativity associated with them, which was also evident in our first sentiment graph. From this, we can infer that either Patriots fans have a lot more faith in their team, or the Patriots are generally liked a lot more by football fans with no tie to either team. Either way, they had a much larger following of supporters than their opponents. Another interesting aspect of this visualization is how “hero” Tom Brady stayed in the positive range throughout. Sentiment toward Marshawn Lynch, and especially Pete Carroll, plummeted after the game as fans voiced their opinions on Carroll’s Super Bowl-losing decision. Opting not to run the ball with Marshawn Lynch, and going for the touchdown pass instead, was a decision that cost him dearly.

    The Ads

These days, the Super Bowl is as much about the ads and halftime show as it is about the football. Before the game, we decided to track a few of the bigger-name brands to try and get a feel for who won the ads battle.

    Brand Mentions

Of the four brands we followed, Budweiser’s #lostdog campaign dominated, with more than five times as many mentions on Twitter as the other brands. We also tracked viewers’ reactions to the advertisements by again analyzing the sentiment of Tweets that referenced each brand.

Not only did Budweiser have the most mentions, it also had the strongest positive reaction to its ad, as shown below. The same can’t be said for T-Mobile’s ad with Kim Kardashian, which was very poorly received by Super Bowl fans. But you know what they say: bad publicity is good publicity.

    Brand Sentiment
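A similar tally works for the ads: count the Tweets per brand, and work out what share of those Tweets was labelled positive. The brand keyword lists below are hypothetical. Matching is a plain case-insensitive substring search, which is a simplification of the entity extraction used in the post.

```javascript
// Hypothetical keyword lists per brand.
const BRANDS = {
  Budweiser: ['budweiser', '#lostdog'],
  'T-Mobile': ['t-mobile', 'tmobile'],
};

// Mentions and share of positive Tweets per brand.
// Assumed input: [{ text: '...', polarity: 'positive' | 'negative' | 'neutral' }]
function brandStats(tweets, brands = BRANDS) {
  const stats = {};
  for (const [brand, terms] of Object.entries(brands)) {
    const matching = tweets.filter(t => terms.some(term => t.text.toLowerCase().includes(term)));
    const positive = matching.filter(t => t.polarity === 'positive').length;
    stats[brand] = {
      mentions: matching.length,
      positiveShare: matching.length ? positive / matching.length : 0,
    };
  }
  return stats;
}

// Made-up sample of scored Tweets.
const scored = [
  { text: 'That #LostDog ad got me', polarity: 'positive' },
  { text: 'Budweiser wins the ads again', polarity: 'positive' },
  { text: 'Budweiser puppy overload', polarity: 'neutral' },
  { text: 'That T-Mobile ad was awful', polarity: 'negative' },
];
console.log(brandStats(scored));
```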



For anyone who doesn’t know, the WebSummit is an annual tech conference held in Dublin, Ireland. For three days in November, tech junkies, media personnel, startups, tech giants, founders and even schoolkids all descend on the RDS to learn from the best, secure investment or land that perfect job. It truly is a geek’s networking heaven.

Apart from WiFi issues and long queues, this year’s WebSummit was a huge success. With over 22,000 attendees this year, its biggest yet, the WebSummit has gone from strength to strength since starting out as a 500-person meetup for the local technology community.

Before the WebSummit, which we attended and thoroughly enjoyed, we decided that, along with countless other analytics companies, we would put together a blog post on data gathered from Twitter over the course of the 3 or 4 days.

We tracked the official WebSummit hashtags (#WebSummit, #WebSummit14 and #WebSummit2014) and also gathered data on a dozen of the speakers (some for different reasons than others), listed below:

12 Speakers

  • Paddy Cosgrave
  • Drew Houston
  • Bono
  • Mark Pincus
  • John Collison
  • Mikel Svane
  • Phil Libin
  • Eva Longoria
  • David Goldberg
  • David Karp
  • Lew Cirne
  • Peter Thiel

Using Twitter’s Streaming API, we collected about 77,300 Tweets in total from 10am Nov 3 to 10am Nov 7 (GMT), monitoring the hashtags and users mentioned above.
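With the twit client, a filtered stream is opened with `T.stream('statuses/filter', { track: [...] })`, and Tweets arrive through `stream.on('tweet', ...)`. That part needs live credentials, so the sketch below covers only the offline bookkeeping. It keeps Tweets that fall inside the collection window and carry one of the official hashtags. The year (2014) is inferred from the #WebSummit2014 hashtag, and the speaker accounts are left out for brevity.

```javascript
// Collection window from the post, in GMT/UTC (year assumed to be 2014).
const WINDOW_START = Date.UTC(2014, 10, 3, 10, 0, 0); // months are 0-based: 10 = November
const WINDOW_END = Date.UTC(2014, 10, 7, 10, 0, 0);

// Official hashtags, lowercased and without '#', as they appear in entities.hashtags.
const TRACKED_HASHTAGS = new Set(['websummit', 'websummit14', 'websummit2014']);

function inWindow(tweet) {
  const ms = new Date(tweet.created_at).getTime();
  return ms >= WINDOW_START && ms < WINDOW_END;
}

function hasTrackedHashtag(tweet) {
  const tags = (tweet.entities && tweet.entities.hashtags) || [];
  return tags.some(h => TRACKED_HASHTAGS.has(h.text.toLowerCase()));
}

function keep(tweets) {
  return tweets.filter(t => inWindow(t) && hasTrackedHashtag(t));
}

// Made-up sample Tweets.
const collected = [
  { created_at: '2014-11-04T12:00:00Z', entities: { hashtags: [{ text: 'WebSummit' }] } }, // kept
  { created_at: '2014-11-07T11:00:00Z', entities: { hashtags: [{ text: 'WebSummit' }] } }, // after window
  { created_at: '2014-11-05T09:00:00Z', entities: { hashtags: [{ text: 'Dublin' }] } },    // not tracked
];
console.log(keep(collected).length); // 1
```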

Once we had gathered the Tweets, we used the AYLIEN Text Analysis API from within RapidMiner to analyze their sentiment, and then visualized the data in Tableau. You can read more about RapidMiner and the AYLIEN Text Analysis API here.


While activity was fairly constant over the three days, you can see three major spikes in the volume of Tweets, one for each day. Judging by the drop in volume as each day progressed, people were enjoying themselves too much at the Night Summit to be Tweeting from the pub. There was also a pretty evident dip in activity during lunch, which suggests we all enjoyed the food or the networking opportunities the lunch break provided.

The second graph below shows the volume of Tweets mentioning one of the speakers we were monitoring. You can clearly see spikes in volume when they hit the stage to speak. Tweets mentioning Paddy Cosgrave, WebSummit’s founder, stayed pretty constant throughout. Surprisingly, the most talked-about speaker at this technology conference wasn’t the founder of Dropbox or even Peter Thiel. It was Eva Longoria, the star of Desperate Housewives! Bono came in second and Peter Thiel was the third most mentioned speaker. It turns out even tech geeks have a thing for celebrities.

    Geo-location and Language

We used the location data returned via the Twitter API to map where most of the activity was coming from. Not surprisingly, chatter was mainly concentrated in Dublin. What was surprising was how little activity came from the US.

    Tweets from and about the Summit were predominantly written in English. Considering there were companies and attendees from all over the world we expected more multi-lingual data and were surprised by the lack of Tweets in other languages.
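Both breakdowns can be computed straight from fields in Twitter API v1.1 Tweet objects. The country comes from `place.country_code`, which is only present when the user attached a place, and the language comes from `lang`, which is Twitter’s own language guess and uses `und` for “undetermined”. Here is a minimal sketch; Tweets without a place are counted under `unknown`.

```javascript
// Generic "count by key" helper.
function countBy(items, keyFn) {
  const counts = {};
  for (const item of items) {
    const key = keyFn(item);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// `place` is null on Tweets without an attached location.
const byCountry = tweets => countBy(tweets, t => (t.place && t.place.country_code) || 'unknown');
const byLanguage = tweets => countBy(tweets, t => t.lang || 'und');

// Made-up sample Tweets.
const summitTweets = [
  { lang: 'en', place: { country_code: 'IE' } },
  { lang: 'en', place: null },
  { lang: 'de', place: { country_code: 'DE' } },
  { lang: 'en', place: { country_code: 'IE' } },
];
console.log(byCountry(summitTweets));  // { IE: 2, unknown: 1, DE: 1 }
console.log(byLanguage(summitTweets)); // { en: 3, de: 1 }
```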


We hoped to get a feel for people’s reactions to the event by mining and analyzing the voice of attendees through their expressions and activity on Twitter. Overall, the sentiment of the event was quite positive; however, some negative trends crept in throughout. People were most vocal about the poor connectivity and the queues for food.

We split the analyzed Tweets into positive and negative sets and created a word cloud for each by extracting the keywords mentioned in those Tweets. This gave a pretty clear understanding of what people liked and disliked about the event.
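A word cloud is essentially a word-frequency table. The sketch below counts words per polarity after dropping a small stop-word list, hashtags and mentions. The post extracted keywords with the AYLIEN API, so this plain word count is a simplified stand-in. The stop-word list is also our own assumption.

```javascript
// Small, illustrative stop-word list (an assumption; real lists are much longer).
const STOP_WORDS = new Set(['the', 'a', 'an', 'and', 'is', 'at', 'to', 'of', 'in', 'was', 'so', 'no']);

// Most frequent words among Tweets with the given polarity, as [word, count] pairs.
function topWords(tweets, polarity, limit = 10) {
  const counts = new Map();
  for (const t of tweets) {
    if (t.polarity !== polarity) continue;
    const words = t.text.toLowerCase().match(/[a-z#@']+/g) || [];
    for (const w of words) {
      if (STOP_WORDS.has(w) || w.startsWith('#') || w.startsWith('@')) continue;
      counts.set(w, (counts.get(w) || 0) + 1);
    }
  }
  // Highest count first; ties broken alphabetically for a stable order.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Made-up sample Tweets.
const cloudSample = [
  { text: 'Great talks and great people at #WebSummit', polarity: 'positive' },
  { text: 'Love Dublin, great event', polarity: 'positive' },
  { text: 'No wifi at the RDS again', polarity: 'negative' },
  { text: 'The wifi is so bad', polarity: 'negative' },
];
console.log(topWords(cloudSample, 'positive', 3)); // [ [ 'great', 3 ], [ 'dublin', 1 ], [ 'event', 1 ] ]
console.log(topWords(cloudSample, 'negative', 1)); // [ [ 'wifi', 2 ] ]
```

Words that show up near the top of both lists are the kind of “undecided” terms mentioned below.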


    The Good

  • Attendees used words like “Great”, “Love” and “Amazing” to describe the conference
  • They also really enjoyed their time in “Dublin”
  • People loved the main stage
  • The event lived up to its networking promises as attendees had positive things to say about the “people” they met

    The Bad

  • “Bad” was a common word used in negative descriptions of the event, as was “critics” and surprisingly, “people”
  • The “wifi” certainly featured as a negative topic, as did the queues
  • The RDS (event holders) took a bit of a hit for not providing adequate wifi

    Some words and terms were evident in both positive and negative Tweets. The jury was out on Eva Longoria’s attendance and it’s pretty obvious the public is still undecided on what they make of Bono.

    The WiFi (The “Ugly”)

Considering it was a tech event, you would presume connectivity would be a given. That wasn’t the case, and there was a strong reaction to the lack of a WiFi signal. At an event that gets 20,000+ tech heads into one room, each with at least two devices, keeping everyone connected was always going to be a challenge.

    The initial reaction to the WiFi issues was evident in the sharp drop in polarity of Tweets. Each day it certainly had an effect on the overall sentiment of the event. However, at the close of the event the polarity had returned to where it started as people wrapped their WebSummit experience up in mostly positive Tweets. Perhaps the lack of connectivity also meant that a lot of the attendees didn’t even get the option to vent their frustrations online.

    We really enjoyed our time at the Summit, met some great people and companies and learned a lot from some of the excellent speakers. Looking forward to next year’s Summit already!



At AYLIEN, we do our best to make sure our users get up and running and calling our API in the shortest time possible. As part of a new initiative, we are going to be sharing use case ideas, source code and fully functional apps to help you get the most out of our API. For this edition of the blog, we are going to focus on a pretty common use case for our API: analyzing Tweets.

There is a wealth of insight that can be extracted from Tweets. You can read more on analyzing Tweets and social data in our previous blog post, “Why is Sentiment Analysis important from a business perspective”.

Today, we’re going to provide you with the source code for a functioning app that searches Twitter for a keyword, retrieves matching Tweets and analyzes their text. As part of the process, we’ll call two analysis endpoints: Sentiment Analysis on every Tweet, and Hashtag Suggestion on Tweets that contain a URL.

    What you’ll need to get going:


    • Twitter API access: Get your consumer key, consumer secret, access token and access token secret to make calls to the Twitter API here.
    • Node.js running on your machine: If you don’t already have Node.js on your machine, you can download it here.
    • Twit: A Twitter API client library for Node.js. You can download it here.
    • A text editor: You can use any editor; we recommend Sublime Text, which you can download here.
    • AYLIEN Text Analysis API access: Get your Application ID and Application Key here. See our ‘getting started’ blog for details on how to sign up.
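The snippet in this post has placeholders where these credentials get pasted into the source file. If you’d rather keep them out of the code, a small helper like this one can read them from environment variables instead. It is our own illustrative addition, and the variable names are ones we made up. The `twitter` object it returns uses the same keys that twit’s constructor expects.

```javascript
// Hypothetical environment variable names; choose whatever suits your setup.
const REQUIRED = [
  'AYLIEN_APP_ID',
  'AYLIEN_APP_KEY',
  'TWITTER_CONSUMER_KEY',
  'TWITTER_CONSUMER_SECRET',
  'TWITTER_ACCESS_TOKEN',
  'TWITTER_ACCESS_TOKEN_SECRET',
];

// Return all credentials, or throw an error listing the missing ones.
function loadCredentials(env = process.env) {
  const missing = REQUIRED.filter(name => !env[name]);
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  return {
    aylien: { id: env.AYLIEN_APP_ID, key: env.AYLIEN_APP_KEY },
    twitter: {
      consumer_key: env.TWITTER_CONSUMER_KEY,
      consumer_secret: env.TWITTER_CONSUMER_SECRET,
      access_token: env.TWITTER_ACCESS_TOKEN,
      access_token_secret: env.TWITTER_ACCESS_TOKEN_SECRET,
    },
  };
}
```

With this helper in place, `new Twit(loadCredentials().twitter)` would replace the hard-coded configuration.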


    Overview of the code in action

    To give an overview of what can be achieved, we will first look at the code in action. The complete code snippet is given at the end of this blog for you to copy and paste.

Step 1. Set up your Environment

    Ensure Node.js is running on your machine, download the twit client library from GitHub, get access to the Twitter API and finally, open an AYLIEN Text Analysis API account.

    Step 2. Copy the Code

Open your text editor, copy and paste the code snippet (provided at the bottom of this post) and save the file as tweetsentiment.js. Next, open a command prompt and navigate to the folder where you saved the file.

The application takes two command line parameters of your choosing: a keyword for the Twitter query and the number of Tweets the query should return.
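The snippet reads these two values straight from `process.argv[2]` and `process.argv[3]`. If you’d like a clearer error when one is missing, you could validate them first with a helper along these lines. This is our own addition rather than part of the original app. The upper bound of 100 reflects the maximum `count` documented for Twitter’s standard `search/tweets` endpoint.

```javascript
// Validate `node tweetsentiment.js <keyword> <count>` arguments.
function parseArgs(argv) {
  const [keyword, countArg] = argv.slice(2); // skip the `node` and script path entries
  const count = Number(countArg);
  if (!keyword) {
    throw new Error('Usage: node tweetsentiment.js <keyword> <count>');
  }
  if (!Number.isInteger(count) || count < 1 || count > 100) {
    throw new Error('count must be a whole number between 1 and 100');
  }
  return { keyword, count };
}

console.log(parseArgs(['node', 'tweetsentiment.js', 'websummit', '3']));
// { keyword: 'websummit', count: 3 }
```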

    Step 3. Run your Code

Run the code by typing “node tweetsentiment.js websummit 3”. In this case, we are querying the keyword “websummit” and asking for 3 Tweets to be returned.

Once the Tweets are returned by the Twitter API, they are fed to the AYLIEN Text Analysis API, which determines their polarity and generates optimal hashtags for any URLs they contain.

Note: Ensure you replace the YOUR_APP_ID and YOUR_APP_KEY placeholders in the code with the application ID and application key you received when you signed up for the AYLIEN API. You will also need to fill in the Twitter API credentials you received from Twitter. If all goes well, you should see output on the command line similar to that shown below.


    Tweet Text           :  RT @IndoBusiness: The #WebSummit is drawing to a close. #Bono up soon. Watch live here: Or follow our blog here: htt.
    Sentiment Polarity   :  neutral
    Polarity Confidence  :  0.9702560119839743
    Hashtags :  [
    Tweet Text           :  RT @FierceClever: I swear to god, if I hear the word "disrupt" one more time... #websummit
    Sentiment Polarity   :  negative
    Polarity Confidence  :  0.8947368421052632
    Hashtags :  No Hastags available as no Url specified in the Tweet
    Tweet Text           :  Having a great day at the #websummit . Were at stand ECM 243 in the village if anyone would like to pop over before closing!
    Sentiment Polarity   :  positive
    Polarity Confidence  :  0.9230769230769231
    Hashtags :  No Hastags available as no Url specified in the Tweet

Taking the first result as an example, you can see that the Tweet itself is displayed, followed by the “sentiment polarity” of the Tweet (positive, neutral or negative) and the “polarity confidence”, i.e. the confidence that the returned sentiment is correct (a number between 0 and 1). Finally, if the Tweet contains a URL, a list of optimal hashtags is generated for that web page or article.

    How the code works

It’s worth looking at the two parts of the solution that do most of the heavy lifting:

    1. Querying Twitter

Querying Twitter is straightforward with the twit client and takes a single call:

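    T.get('search/tweets', {
      q: process.argv[2],             // keyword to search for
      count: Number(process.argv[3])  // number of Tweets to return
    }, function(err, data, response) {
      if (err) { return console.error(err); }
      data.statuses.forEach(function(s) {
        // pass each Tweet (s.text, s.entities.urls) on for analysis
      });
    });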

The call above uses the supplied command line arguments to query Twitter. It then passes the returned results, one by one, to the code that feeds the body of each Tweet and its embedded URL (if any) to the AYLIEN API endpoints for analysis.

    2. Analyzing the Tweets that are returned.

The function below takes three arguments: the AYLIEN endpoint to call (Sentiment, Hashtags, Entities etc.), the parameters the endpoint should work on (i.e. whether we are passing a piece of text or a URL for analysis, along with the actual text or URL), and a callback function to call when the analysis is complete.

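    function call_api(endpoint, parameters, callback) {
      var postData = querystring.stringify(parameters);
      var request = https.request({
        host: '',  // AYLIEN Text Analysis API host (not shown here; fill it in)
        path: '/api/v1/' + endpoint,
        method: 'POST',
        headers: {
          'Accept':                             'application/json',
          'Content-Type':                       'application/x-www-form-urlencoded',
          'Content-Length':                     Buffer.byteLength(postData),
          'X-AYLIEN-TextAPI-Application-ID':    APPLICATION_ID,
          'X-AYLIEN-TextAPI-Application-Key':   APPLICATION_KEY
        }
      }, function(response) {
        var data = "";
        response.on('data', function(chunk) {
          data += chunk;  // collect the response body as it arrives
        });
        response.on('end', function() {
          callback(JSON.parse(data));  // hand the parsed JSON result back
        });
      });
      request.write(postData);  // send the form-encoded parameters
      request.end();
    }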

The examples we have used analyze the Tweets for sentiment and hashtag suggestions, but it’s up to you which endpoints you use. Maybe you want to extract entities or concepts from the Tweets as well. A full list of our endpoints can be found in our documentation.

    The Code Snippet

    var Twit = require('twit');  // Twitter API client library (npm install twit)
    var https = require('https'), querystring = require('querystring');

    // AYLIEN API credentials
    var APPLICATION_ID = 'YOUR_APP_ID';
    var APPLICATION_KEY = 'YOUR_APP_KEY';

    // Twitter API credentials
    var T = new Twit({
        consumer_key:         'YOUR_TWITTER_CONSUMER_KEY'
      , consumer_secret:      'YOUR_TWITTER_CONSUMER_SECRET'
      , access_token:         'YOUR_TWITTER_ACCESS_TOKEN'
      , access_token_secret:  'YOUR_TWITTER_ACCESS_TOKEN_SECRET'
    });

    var analysisResults = {};  // results keyed by Tweet ID
    var pending = 0;           // outstanding results; output is printed when this reaches 0

    console.log("Processing your request. Please wait...");
    T.get('search/tweets', { q: process.argv[2], count: Number(process.argv[3]) }, function(err, data, response) {
      if (err) { return console.error(err); }
      // Two results per Tweet: sentiment, plus hashtags (or a placeholder message)
      pending = data.statuses.length * 2;
      if (pending === 0) { return console.log("No Tweets found."); }
      data.statuses.forEach(function(s) {
        var id = s.id_str;
        var returnedUrls = s.entities.urls;
        analysisResults[id] = { text: s.text };

        // Sentiment analysis on the Tweet text
        call_api('sentiment', { 'text': s.text }, function(result) {
          analysisResults[id].sentiment = {
            endpoint: 'sentiment',
            polarity: result.polarity,
            polarity_confidence: result.polarity_confidence
          };
          done();
        });

        // Hashtag suggestions for the first URL in the Tweet, if there is one
        if (returnedUrls.length > 0) {
          call_api('hashtags', { 'url': returnedUrls[0].expanded_url }, function(result) {
            analysisResults[id].hashtags = { endpoint: 'hashtags', hashtags: result.hashtags };
            done();
          });
        } else {
          analysisResults[id].hashtags = {
            endpoint: 'hashtags',
            hashtags: 'No Hastags available as no Url specified in the Tweet'
          };
          done();
        }
      });
    });

    // Called after each result is stored; prints everything once all are in
    function done() {
      pending--;
      if (pending === 0) { outputResults(); }
    }

    function outputResults() {
      for (var key in analysisResults) {
        console.log("Tweet Text           : ", analysisResults[key].text);
        console.log("Sentiment Polarity   : ", analysisResults[key].sentiment.polarity);
        console.log("Polarity Confidence  : ", analysisResults[key].sentiment.polarity_confidence);
        console.log("Hashtags : ", analysisResults[key].hashtags.hashtags);
      }
    }

    // POST the parameters to an AYLIEN endpoint and pass the parsed JSON to callback
    function call_api(endpoint, parameters, callback) {
      var postData = querystring.stringify(parameters);
      var request = https.request({
        host: '',  // AYLIEN Text Analysis API host (not shown here; fill it in)
        path: '/api/v1/' + endpoint,
        method: 'POST',
        headers: {
          'Accept':                             'application/json',
          'Content-Type':                       'application/x-www-form-urlencoded',
          'Content-Length':                     Buffer.byteLength(postData),
          'X-AYLIEN-TextAPI-Application-ID':    APPLICATION_ID,
          'X-AYLIEN-TextAPI-Application-Key':   APPLICATION_KEY
        }
      }, function(response) {
        var data = "";
        response.on('data', function(chunk) {
          data += chunk;
        });
        response.on('end', function() {
          callback(JSON.parse(data));
        });
      });
      request.write(postData);
      request.end();
    }
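The snippet keeps track of outstanding requests with a counter. An alternative approach wraps each callback-style call in a Promise and waits for all of them with `Promise.all`, so you don’t need to know the number of calls up front. `fakeCallApi` is a hypothetical stand-in for `call_api` that lets the example run offline; in the real app you would pass `call_api` instead. Like the original, this sketch doesn’t handle request errors.

```javascript
// Wrap a callback-style function (endpoint, params, callback) in a Promise.
function callApiAsync(callApi, endpoint, params) {
  return new Promise(resolve => callApi(endpoint, params, resolve));
}

// Analyze every Tweet; hashtag suggestions only when the Tweet has a URL.
async function analyzeTweets(statuses, callApi) {
  return Promise.all(statuses.map(async s => {
    const sentiment = await callApiAsync(callApi, 'sentiment', { text: s.text });
    const urls = s.entities.urls;
    const hashtags = urls.length > 0
      ? (await callApiAsync(callApi, 'hashtags', { url: urls[0].expanded_url })).hashtags
      : 'No hashtags available as no URL specified in the Tweet';
    return { text: s.text, polarity: sentiment.polarity, hashtags };
  }));
}

// Hypothetical offline stand-in for call_api, answering asynchronously.
function fakeCallApi(endpoint, params, callback) {
  setTimeout(() => {
    if (endpoint === 'sentiment') callback({ polarity: 'positive', polarity_confidence: 0.9 });
    else callback({ hashtags: ['#Example'] });
  }, 10);
}

const demoStatuses = [
  { text: 'Great day at #websummit', entities: { urls: [] } },
  { text: 'Read this', entities: { urls: [{ expanded_url: 'http://example.com' }] } },
];
analyzeTweets(demoStatuses, fakeCallApi).then(results => console.log(results));
```

`Promise.all` keeps the results in the same order as the input Tweets, even though the individual calls finish in any order.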



    Social Listening

It simply isn’t an option these days for businesses to ignore the voice of their customers on social channels. There is a huge amount of business insight hidden in text on social channels, but it can be difficult to block out the noise and gain business insight from social data. Buying signals, support queries, complaints etc. can all be gleaned from social chatter and activity by properly analyzing the voice of customers and users online. For more on this, check out our blog post “Why sentiment analysis is important from a business perspective.”

    Analyzing social data and listening to the voice of your customers can be hard and often involves costly software solutions and/or certain technical expertise to gather data, analyze it and visualize the results.

    That’s pretty much why we built our Text Analysis add-on. We built it with the everyday analyst or marketer in mind. We wanted to provide a quick and easy way for our users to analyze text, without the hassle, cost and complications of traditional Text Analysis tools.

Our Text Analysis add-on is built on a package of machine learning and natural language processing APIs that allow you to perform sophisticated text analysis without any programming or technical expertise.

    In this how-to guide we are going to demonstrate just how easy it is to collect tweets, analyze them and report on your findings from within Google Sheets.

    If you haven’t used the add-on before you can download it here and if it’s your first time then check out our getting started tutorial to get up and running.

    To build a social listening tool, you will need the following:

    • An inquisitive mind
    • Google Spreadsheets
    • AYLIEN Text Analysis add-on
    • Some way of gathering your tweets (Copy and paste, RSS feeds, Twitter Curator)

    For the purpose of this blog, we are going to gather a sample of 100 tweets that mention Ryanair, analyze them, look for insights and graph our results. We will aim to automatically determine what language the tweets are written in, extract any mentions of locations and determine what the sentiment is towards Ryanair from this sample set of tweets.

    Step 1 – Data Collection

Collect and gather your tweets in a blank spreadsheet. You can copy and paste the tweets from another source, use Twitter Curator to collect your tweets with the click of a button or, if you have the technical expertise, write a script to automatically mine Twitter. (Keep an eye out for our webinar and blog on how to build a basic Twitter mining tool.)

    Step 2 – Analysis

    Once you have your tweets laid out as desired in a Spreadsheet, start your Text Analysis Add-on. For a guide on how to get up and running with the add-on visit our tutorial page.

First things first, determine what language each tweet is written in by using the language detection function (=language(X)). Keep in mind you can drag the formula down through the rest of the column to analyze all the tweets automatically, which saves a lot of time and effort.




    Then, extract any mentions of locations by using the locations extraction function (=locations(X)) and do the same as above to drag the formula throughout the rest of the column.




Lastly, use the sentiment analysis feature to find out whether the tweets are positive, negative or neutral. This can be done using the Sentiment Polarity function (=sentimentpolarity(X)).




    Following this you should have a spreadsheet that looks like this:




    (Keep in mind the colour coding on the sentiment column is down to the formatting of the column and isn’t generated automatically.)

    So far we have collected and analyzed our tweets, now all that is left to do is build some pretty graphs to visualize the data.

    Step 3 – Reporting

    The advantage of having this data in a Spreadsheet means it is extremely flexible. It can be shared, copied, combined with other data and reported on very easily.

We are going to create some basic reports based on the data we have gathered throughout the process. This will be done entirely within Google Spreadsheets by using the pivot table report. Pivot tables are a very handy way of preparing your data to be visualized in graphs; you can read more about them here.

    To get started with your report, select the range of data you want to report on. Choose data in the main toolbar and click on Pivot Table Report.




    As an example we are going to create a simple bar chart showing the different languages of the tweets in our dataset.

Once you have clicked on Pivot Table Report in the drop-down menu, a separate sheet called “pivot table 1” will open. In the sidebar of the sheet there is a reporting widget, which is where you choose how your report is laid out.

    In this particular report, we want to get a breakdown of the different languages used in the sample set of tweets and figure out what language is used the most.

    Add language under "Rows", and under "Values" choose language as well. The report widget defaults to summarizing by SUM, which leaves the table full of zeros: SUM ignores text, and the language column contains only text. Change the summary to "COUNTA", which counts non-empty cells, so that the table shows how many tweets are in each language.
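    To make the SUM versus COUNTA difference concrete, here is a minimal Python sketch that mimics both summaries on a language column. It is an approximation of the spreadsheet behaviour, not Google Sheets itself; the helper names and the sample language codes are hypothetical and chosen only for illustration.

```python
from collections import Counter

def sheets_sum(cells):
    """Approximate SUM: text cells are ignored, so only numbers are added."""
    return sum(c for c in cells if isinstance(c, (int, float)))

def counta_pivot(cells):
    """Approximate a pivot with language in Rows and COUNTA(language) in Values:
    count the non-empty cells for each distinct value."""
    return Counter(c for c in cells if c not in (None, ""))

# Hypothetical language column; these values are made up for illustration.
languages = ["en", "en", "es", "en", "fr", "", "en", "es"]

print(sheets_sum(languages))                  # text only, so the total is 0
print(counta_pivot(languages).most_common())  # tweets per language, largest first
```

    The empty cell is skipped by the COUNTA-style count, just as a blank row would be in the spreadsheet.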




    Below is an example of what a basic pivot table should look like:

    [Screenshot: an example pivot table]


    In order to graph the results, choose the data you want to include by highlighting the appropriate cells in the table. Click on “insert” in your toolbar and choose “chart”.

    You should be left with a simple bar chart like the “Tweets by Language” one below. You can get a bit more creative with how you customize your charts by adding colours and formatting.

    You can choose from a wide range of chart types, such as bar charts, geo charts and pie charts, all of which appear in our completed graphs below.


    Tweets by language: Using the language detection feature, we could easily see that the majority of the 100 tweets we sampled were in English.


    Sentiment of tweets: Here we have provided a chart that displays the percentage of tweets that were positive, negative and neutral. On close inspection we noticed the majority of the neutral tweets were general enquiries or news reports.
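    The percentages in a chart like this are just each polarity's count divided by the total number of tweets. A minimal sketch of that calculation, assuming the sentiment column holds the labels "positive", "negative" and "neutral"; the sample labels below are hypothetical, not the results from our sample.

```python
from collections import Counter

def sentiment_shares(polarities):
    """Return the percentage of tweets carrying each polarity label."""
    counts = Counter(polarities)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100 * n / total for label, n in counts.items()}

# Hypothetical sentiment column (not the post's actual results).
labels = ["positive", "neutral", "negative", "neutral",
          "positive", "neutral", "negative", "neutral"]

shares = sentiment_shares(labels)
for label in ("positive", "negative", "neutral"):
    print(f"{label}: {shares.get(label, 0):.1f}%")
```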


    Geo locations: Here we have displayed mentions of locations which were extracted automatically from tweets.




    Sentiment of tweets by location: This graph makes it clear that, in the small sample of tweets we analyzed, there was a lot of negativity in tweets that also mentioned Corfu. On further investigation, it turned out that a delayed flight had left passengers stranded in Corfu at the time the sample tweets were collected.
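    A sentiment-by-location chart is a cross-tabulation: locations in the rows, polarity labels in the columns, and a count of tweets in each cell. The sketch below shows that pivot in Python under two simplifying assumptions: each tweet contributes at most one extracted location, and blank locations are skipped. The location names and rows are hypothetical placeholders.

```python
from collections import Counter, defaultdict

def sentiment_by_location(rows):
    """Cross-tabulate (location, polarity) pairs: one Counter of polarity
    labels per location, like a pivot with location in Rows, polarity in
    Columns and COUNTA as the summary."""
    table = defaultdict(Counter)
    for location, polarity in rows:
        if location:  # skip tweets where no location was extracted
            table[location][polarity] += 1
    return dict(table)

# Hypothetical (location, polarity) rows; place names are placeholders.
rows = [("Location A", "negative"), ("Location A", "negative"),
        ("Location A", "neutral"), ("Location B", "positive"),
        ("", "neutral"), ("Location B", "negative")]

for location, counts in sorted(sentiment_by_location(rows).items()):
    negative_share = counts["negative"] / sum(counts.values())
    print(location, dict(counts), f"{negative_share:.0%} negative")
```

    Looking at the negative share per location, rather than raw counts alone, is what makes an outlier like a single heavily negative location stand out.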

    You can download the add-on here, and for more information on our Text Analysis add-on, including videos, cheat sheets and tutorials, visit our tutorial page.



    Welcome to the second part of our blog series, Analyzing Text in RapidMiner. In the first part of the series, we built a basic setup for analyzing the sentiment of any arbitrary text to find out whether it's positive, negative or neutral. In this post, we're going to build a slightly more sophisticated process than the last one, which we can use to scrape movie reviews from Rotten Tomatoes and analyze them in RapidMiner.

    In a nutshell, we’re going to:

    • Scrape movie reviews for The Dark Knight using the Web Mining extension for RapidMiner.
    • Run the reviews through AYLIEN Text Analysis API to extract their sentiment.
    • Compare the extracted sentiment values with the Fresh or Rotten ratings from reviewers to see if they follow the same pattern.


    Requirements:

    • RapidMiner v5.3+ (download)
    • Text Analysis API key (subscribe for free here)

    Step 1: Extract Reviews from Rotten Tomatoes

    The Web Mining extension comes with a set of useful tools designed to crawl the web and scrape web pages for information. Here we’re using the Process Documents from Web operator to scrape a review page, and we use XPath queries to extract the text of the reviews:

    • First, drag and drop a Process Control > Loop > Loop operator into your process. We will use the Loop operator to scrape multiple pages of reviews, which gives us more reviews to analyze.
    • Next, configure the Loop operator to run 5 times, which means we’re going to scrape 5 pages of The Dark Knight reviews – a total of 100 reviews.
    • Now double-click the Loop operator and add the Web Mining > Process Documents from Web operator, which will fetch the contents of each review page and provide its HTML for further analysis.
    • Configure the newly added operator to fetch the Nth page of reviews, where N is the current iteration of the Loop operator. The url parameter should look like this{iteration}
    • Process Documents from Web exposes a sub-process for further analysis of the page contents. Double click the operator to access this sub-process and add a Text Processing > Transformation > Cut Document operator, which will extract individual reviews from a single review page.
    • Configure the Cut Document operator to segment the page using the following XPath query: //h:table[@class='table table-striped']/h:tr
    • The Cut Document operator will expose each extracted segment in a sub-process, so let’s add a Text Processing > Extraction > Extract Information operator to extract the actual text of the review.
    • Now let’s connect everything and run the process to get our 100 reviews.
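    Outside RapidMiner, the Loop and Cut Document steps boil down to "build one URL per iteration, then pull the matching table rows out of each page". The sketch below reproduces that with Python's standard library. It assumes the h: prefix in the XPath query refers to the standard XHTML namespace and that the page has already been converted to well-formed XHTML; the URL template and sample page are hypothetical, not the real Rotten Tomatoes URL or markup. Real pages may also wrap rows in a tbody element, which this exact query would not match.

```python
import xml.etree.ElementTree as ET

# Namespace that the h: prefix in the post's XPath is assumed to refer to.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}
ROW_XPATH = ".//h:table[@class='table table-striped']/h:tr"

def page_urls(template, iterations=5):
    """Mimic the Loop operator: one URL per iteration, numbered from 1."""
    return [template.format(iteration=i) for i in range(1, iterations + 1)]

def cut_reviews(xhtml):
    """Mimic Cut Document: return the whitespace-normalized text of every
    table row matched by the XPath query."""
    root = ET.fromstring(xhtml)
    return [" ".join("".join(row.itertext()).split())
            for row in root.findall(ROW_XPATH, XHTML_NS)]

# Hypothetical template and page; neither is the real Rotten Tomatoes markup.
TEMPLATE = "https://example.com/reviews?page={iteration}"
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<table class="table table-striped">
  <tr><td>A gripping, dark superhero film.</td></tr>
  <tr><td>Too long and too loud.</td></tr>
</table>
<table class="other"><tr><td>Not a review</td></tr></table>
</body></html>"""

print(page_urls(TEMPLATE))
print(cut_reviews(SAMPLE_PAGE))
```

    Only rows from the table with the matching class are returned; the second table is ignored, which is exactly what the attribute predicate in the XPath query is for.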

    Step 2: Analyze Reviews using Sentiment Analysis API

    Now that we have run the process and have our reviews, it's time to send them to the Text API's /sentiment endpoint and see whether they are positive, negative or neutral.

    • Let’s URL-encode the reviews first. To do that, we’re going to use the Web Mining > Utility > Encode URLs operator.
    • Next we’ll send the encoded text to the Text API using the Web Mining > Services > Enrich Data by Webservice operator.
    • Now that our reviews are encoded and connected to the Text API, it's time to run the entire process and analyze these 100 reviews!
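    URL-encoding matters because reviews contain spaces, apostrophes and ampersands that would otherwise break the query string. The sketch below shows the encoding step in Python and builds, without sending, a GET request carrying the encoded review. The endpoint URL, the "text" parameter name and the empty headers are placeholders, not the documented Text API interface; take the real values, including authentication headers, from the API documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: take the real /sentiment URL, parameter names and
# authentication headers from the Text API documentation.
SENTIMENT_ENDPOINT = "https://api.example.com/sentiment"

def build_sentiment_request(review_text, headers=None):
    """URL-encode one review (the job of Encode URLs) and build, without
    sending, a GET request of the kind Enrich Data by Webservice issues."""
    query = urlencode({"text": review_text})
    return Request(f"{SENTIMENT_ENDPOINT}?{query}", headers=headers or {})

req = build_sentiment_request("Heath Ledger's Joker is chilling & brilliant")
print(req.get_method(), req.full_url)
```

    `urlencode` turns spaces into `+` and escapes the apostrophe and `&`, so the whole review arrives as a single query parameter.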

    In the results, we get a polarity column that tells us whether each review is positive, negative or neutral.

    Step 3: Extract Freshness scores and compare them to Sentiment values

    What we accomplished in Step 2 is cool, but let’s evaluate the results by checking if the sentiment polarity scores match the “Freshness” scores given by Rotten Tomatoes reviewers.

    For anyone who doesn't know, the "Freshness" score on Rotten Tomatoes tells us whether a review is positive (Fresh) or negative (Rotten).

    • First things first, add a second XPath query to extract the Freshness score as a boolean value (Fresh = true, Rotten = false).
    • Before we can check the data for correlations, we must do a bit of a cleanup and pre-processing:
      1. Remove the text column after Sentiment Analysis is done, using the Select Attributes operator.
      2. Convert the polarity and fresh columns to numerical columns so that, for instance, polarity=positive becomes polarity_positive=1 and fresh=true becomes fresh_true=1. For that, we'll use the Nominal to Numerical operator.
    • Then add a Modeling > Correlation and Dependency Computation > Correlation Matrix operator, which computes the pairwise correlations between all of the attributes.
    • Finally, run the process again to produce the correlation matrix.
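    Nominal to Numerical followed by Correlation Matrix amounts to dummy coding each label into its own 0/1 column and then computing a Pearson correlation for every pair of columns. A minimal Python sketch of both steps follows; the helper names and the six sample reviews are hypothetical, and Freshness is encoded as the strings "true"/"false" so the column names come out as fresh_true and fresh_false.

```python
from math import sqrt

def dummy_code(values, prefix):
    """Mimic Nominal to Numerical (dummy coding): one 0/1 column per label."""
    return {f"{prefix}_{label}": [1 if v == label else 0 for v in values]
            for label in sorted(set(values))}

def pearson(x, y):
    """Pearson correlation coefficient; NaN if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

def correlation_matrix(columns):
    """Correlation for every pair of columns, like the Correlation Matrix operator."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}

# Hypothetical labels for six reviews (not the post's data).
polarity = ["positive", "negative", "positive", "neutral", "negative", "positive"]
fresh = ["true", "false", "true", "true", "false", "false"]

columns = {**dummy_code(polarity, "polarity"), **dummy_code(fresh, "fresh")}
matrix = correlation_matrix(columns)
for pair in [("polarity_positive", "fresh_true"), ("polarity_negative", "fresh_false")]:
    print(pair, round(matrix[pair], 3))
```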

    In the Correlation Matrix, polarity_positive is positively correlated with fresh_true, and polarity_negative is positively correlated with fresh_false. In other words, the sentiment polarity scores largely agree with the reviewers' Fresh or Rotten ratings.
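    A positive correlation shows that sentiment and Freshness move together, but it doesn't directly say what fraction of reviews agree. A complementary check is a plain agreement rate. The sketch below is one way to compute it, under an assumption of ours: neutral reviews are skipped, since Rotten Tomatoes has no neutral rating. The helper name and sample labels are hypothetical.

```python
def agreement_rate(polarities, fresh_flags):
    """Share of non-neutral reviews whose polarity matches the Freshness
    rating (positive with Fresh, negative with Rotten). Neutral reviews are
    skipped because Rotten Tomatoes has no neutral rating."""
    pairs = [(p, f) for p, f in zip(polarities, fresh_flags) if p != "neutral"]
    if not pairs:
        return float("nan")
    matches = sum((p == "positive") == f for p, f in pairs)
    return matches / len(pairs)

# Hypothetical labels for six reviews (True = Fresh, False = Rotten).
polarity = ["positive", "negative", "positive", "neutral", "negative", "positive"]
fresh = [True, False, True, True, False, False]

print(f"{agreement_rate(polarity, fresh):.0%} of non-neutral reviews agree")
```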

    That's it: 100 reviews scraped and analyzed using RapidMiner and the AYLIEN Text Analysis API. Pat yourself on the back, good job!
