Extracting Reality from Data

Text Analysis: 10 business use cases you may not have thought of…

One of the coolest things about analysing text is that text is everywhere! Irrespective of industry, companies and individuals want to make better-informed business decisions based on trackable, measurable insight. With advances in Text Analysis, companies can now mine text to uncover insights and improve their service or offering to prosper in their market.

So far at AYLIEN, our Text Analysis API has had great success in the news and media space. But this is just the tip of the Text Analytics iceberg: countless other industries can gain the same value from such insights. As we don’t have unlimited time, let’s stick with a Top 10 list of use cases for Text Analytics.

1. Sports trading - One of the most popular sports to bet on, particularly in Europe, is football (soccer). The top sports traders gather data from the mainstream media and have a deep understanding of the game and its politics at a local level. If you live in England and bet on English football, irrespective of the division, it’s relatively easy to understand your market. You can successfully bet on a local second-division English team because you speak the language, read the local newspapers and may even follow some of the team members on Twitter. But what if you’d like to do the same for a similar team in Spain and you don’t speak a word of Spanish? A Text Analysis API capable of understanding Spanish would allow you to extract meaning from local Twitter feeds, giving you insight into what local fans, who understand the squad dynamics at a local level, are saying about their team. If, for example, the star striker of Real Club Deportivo Mallorca has an argument with his wife the night before a cup game, is he as likely to be the top scorer on match day?

2. Financial Trading - As with sports trading, insight into what is happening at a local level can be very valuable to a financial trader. Domain-specific sentiment analysis and classification can add real value here: just as fans have their own distinct vocabulary for each sport, so too do traders in particular markets. Intent recognition and Spoken Language Understanding services, which detect user intents (e.g. “buy”, “sell”) in short utterances, can help guide traders in deciding what to trade, how much and how quickly.
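
To make intent detection from short utterances concrete, here is a minimal keyword-based sketch. It is not AYLIEN's method or that of any specific Spoken Language Understanding service: the intent patterns and the detect_intents / extract_quantity helpers are hypothetical, and a production system would learn domain-specific vocabulary from labelled data rather than hard-code it.

```python
import re
from typing import List, Optional

# Hypothetical keyword patterns per intent (an assumption for illustration);
# a real system would learn trader vocabulary from labelled utterances.
INTENT_PATTERNS = {
    "buy": re.compile(r"\b(buy|buying|long|accumulate)\b", re.IGNORECASE),
    "sell": re.compile(r"\b(sell|selling|short|dump|offload)\b", re.IGNORECASE),
}


def detect_intents(utterance: str) -> List[str]:
    """Return every intent whose keyword pattern appears in the utterance."""
    return [intent for intent, pattern in INTENT_PATTERNS.items()
            if pattern.search(utterance)]


def extract_quantity(utterance: str) -> Optional[int]:
    """Return the first whole number in the utterance, e.g. a share count."""
    match = re.search(r"\b(\d+)\b", utterance)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    text = "Looking to buy 500 shares of ACME before the close"
    print(detect_intents(text), extract_quantity(text))  # prints ['buy'] 500
```

Keyword rules like these break down quickly on negation ("don't buy") and slang, which is exactly where a domain-specific, trained classifier earns its keep.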

3. Voice of the customer (VOC) - VOC applications are primarily used by companies to determine what customers are saying about a product or service. Sources of such data include emails, surveys, call center logs and social media streams such as blogs, tweets, forum posts and newsfeeds. For example, a telecom company could use VOC text analysis to scan Twitter for customer gripes about its broadband internet service. This would give the company an early warning when customers are annoyed with the performance of the service, and allow it to address the issue before customers call to complain officially or request a contract cancellation.

4. Fraud - Whether it’s workers claiming false compensation or a motorist disclosing a false home address, fraudulent activity can be discovered much more quickly when investigators can join the dots faster. In the latter case, for example, the guilty party may give an address that already has many claims associated with it, or the vehicle may have been involved in other claims. Being able to capture this information saves the insurer time and gives them greater insight into the case.
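
The "join the dots" idea can be sketched as a lookup over previous claims: normalise the address text, then count how often the same address or vehicle already appears. This is a minimal illustration under stated assumptions, not an insurer's actual fraud model; the record fields (address, vehicle), the flag_claim helper and both thresholds are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(text.lower().split())


def flag_claim(new_claim: Dict[str, str], past_claims: List[Dict[str, str]],
               address_threshold: int = 2, vehicle_threshold: int = 1) -> List[str]:
    """Return reasons why a new claim deserves a closer look.

    The `address` / `vehicle` field names and both thresholds are assumptions.
    """
    address_counts = Counter(normalise(c["address"]) for c in past_claims)
    vehicle_counts = Counter(normalise(c["vehicle"]) for c in past_claims)
    reasons = []
    n_address = address_counts[normalise(new_claim["address"])]
    if n_address >= address_threshold:
        reasons.append(f"address already has {n_address} claims")
    n_vehicle = vehicle_counts[normalise(new_claim["vehicle"])]
    if n_vehicle >= vehicle_threshold:
        reasons.append(f"vehicle involved in {n_vehicle} earlier claim(s)")
    return reasons
```

In practice the address and vehicle values would themselves be extracted from free-text claim forms, and fuzzier matching than whitespace and case normalisation would be needed.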

5. Manufacturing or warranty analysis - In this use case, companies apply text analytics to warranty claims, dealer technician lines, repair orders, customer relations text and other potential sources of information to extract certain entities or concepts (like the engine or a certain part). They can then analyze this information, for example by looking at how the entities cluster, whether the clusters are growing and whether they are a cause for concern.
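
A minimal sketch of the "are the clusters growing?" check, assuming each claim has already been reduced to a period label and its free text. Simple keyword matching against a hypothetical parts vocabulary stands in for real entity extraction and clustering here; PARTS, count_part_mentions and growing_parts are illustrative names, not part of any product.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical parts vocabulary; a real system would extract these entities.
PARTS = ("engine", "brake", "transmission", "battery")


def count_part_mentions(claims: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Map each part to {period: number of claims mentioning it}."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for period, text in claims:
        lowered = text.lower()
        for part in PARTS:
            if part in lowered:  # substring match, so "brakes" counts as "brake"
                counts[part][period] += 1
    return counts


def growing_parts(counts: Dict[str, Dict[str, int]], periods: List[str]) -> List[str]:
    """Return parts whose claim count strictly increases across `periods`."""
    growing = []
    for part, by_period in counts.items():
        series = [by_period.get(p, 0) for p in periods]
        if all(earlier < later for earlier, later in zip(series, series[1:])):
            growing.append(part)
    return sorted(growing)
```

A strictly increasing count is a crude signal; comparing growth against overall claim volume would avoid flagging parts that only grow because sales grew.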

6. Customer service routing - In this use case, companies use text analytics to route requests to the right customer service representatives. For example, say you’ve emailed a company, while on hold to one of their reps, with a question or a complaint about one of their products. The company can use text analytics to route that email intelligently to the appropriate person. The same approach could work in a call center, provided the speech-to-text software is sufficiently accurate.
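
As a rough illustration of intelligent routing, the sketch below scores each team by how many of its keywords appear in an email and falls back to a general queue when nothing matches. The team names, keyword sets and route_email function are hypothetical; a real deployment would more likely train a text classifier on historical tickets.

```python
import re
from typing import Dict, Set

# Hypothetical teams and trigger words (assumptions for illustration only).
ROUTES: Dict[str, Set[str]] = {
    "billing": {"invoice", "charge", "charged", "refund", "payment", "bill"},
    "technical_support": {"error", "broken", "crash", "install", "connection", "slow"},
    "returns": {"return", "exchange", "damaged", "replacement"},
}


def route_email(body: str, default: str = "general") -> str:
    """Send the email to the team with the most keyword hits, else `default`."""
    words = set(re.findall(r"[a-z']+", body.lower()))
    scores = {team: len(words & keywords) for team, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)  # ties go to the first team listed
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(route_email("I was charged twice on my last invoice, please refund me"))
```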

7. Lead generation - As with VOC applications, taking timely action on social media information can help companies both retain existing customers and gain new ones. For example, if a person tweets that they are interested in a certain product or service, text analytics can detect this and pass the information to a sales rep, who can then pursue the prospect and convert them into a customer.

8. TV advertising & audience analysis - TV shows and live televised events are some of the most talked-about topics on Twitter. Marketers and TV producers can both benefit from Text Analytics, in two distinct ways. If producers understand how their audience ‘feels’ about certain characters, settings, storylines, featured music etc., they can make adjustments in a bid to please their viewers and thereby increase audience size and ratings. Marketers can dig into social media streams to analyse the effectiveness of product placement and of the commercials aired during breaks. For example, the Game of Thrones character ‘Cersei’ is becoming a fashion icon amongst fans, who regularly tweet about her latest frock. High street retailers that want to take advantage of this trend could release a line of ‘Queen of Westeros’ style clothing and align their commercials with shows like Game of Thrones. Text Analytics could also be used by TV executives looking to sell to advertisers. For example, a TV company could mine viewers’ tweets and forum activity to profile its audience more accurately. Instead of merely pitching the size of their audience to advertisers, they could wow them by identifying viewers’ gender, location, age etc. and their feelings towards certain products.

9. Recruitment - Text Analysis could be used in both the search and selection phases of recruitment. The most basic application would be identifying the skills of a potential hire. In the recruitment industry, the real value comes from identifying prospects before they become active on the job market. For example, it would be very powerful to know if somebody tweets about disliking their job or expresses an interest in working in a different field, larger/smaller company, different location etc. Once you have identified such a prospect, you could use Text Analytics to analyse the suitability of this person based on what others say about them. Mining news and blog articles, forum postings and other sources could help to evaluate potential hires.

10. Review Sites - Companies like Expedia have millions of reviews on their website, from travellers all over the world. Given that their users are looking for a stress-free experience, having to sift through hundreds of reviews to find a place to stay can be a real turn-off. Text Analysis can be used here to build tools that summarize properties in 2-3 word phrases. Instead of scrolling through a list of hotel features like heated pool, massage therapy, buffet breakfast etc., you could simply read “Luxurious Hotel and Spa”.

Did you like our top 10 use cases? If you work in an industry that’s not mentioned above and have an idea of how Text Analytics could help you, please let us know!

Subscribe to our blog and keep an eye out for our next post on how Text Analytics can add value to your business.


Text Analytics meets 2014 World Cup tweets - Part 1

The FIFA World Cup is without doubt the biggest sporting event in the world, with millions of fans and viewers around the globe using social media to share their thoughts and emotions about the games, teams and players, creating massive amounts of content in the process.

Throughout the tournament, Facebook saw a record-breaking 3 billion interactions and Twitter saw a whopping 672 million tweets about the World Cup.

That’s why at AYLIEN we decided to collect some of this data using Twitter’s Streaming API and analyze tweets related to the World Cup, looking for interesting insights and correlations.

In a series of blog posts, we are going to explore how you can use text analysis techniques to dig into this data.

In Part 1 of the series, we’re going to get a high-level view of our data and look for some basic insights about the tournament.

Data and Tools

Data: datasets used in this blog post are as follows:

  • tweets.csv: Around 30 million tweets (80 million including retweets, which are omitted) collected between June 6th and July 14th using the Twitter Streaming API, filtered by some of the official World Cup hashtags (e.g. “#WorldCup” and “#Brazil2014”), team code hashtags (e.g. “#ARG” and “#GER”) and the Twitter usernames of teams and players. (Note: we assume that Twitter samples tweets uniformly, without any major side effect on their distribution.)
  • matches.csv: Information about the 64 matches, such as match times and results, obtained from the World Cup JSON project.
  • events.csv: Information about match events such as goals, substitutions and cards, obtained from the World Cup JSON project.

Tools: For these posts we will use the AYLIEN Text Analysis API for Sentiment Analysis, RapidMiner for data processing and Tableau for interactive visualizations.


Let’s start our quest by taking a look at the matches and their events, such as goals, substitutions and red and yellow cards:

Things to note:

  • The number of matches with 5 or more yellow cards tends to increase in later stage games, possibly due to higher sensitivity and intensity of these matches.

Tweet languages

Now let’s take a look at a breakdown of the most popular languages used in our tweets dataset:

Things to note:

  • English, Spanish and Portuguese, in that order, are the three most used languages in our tweets dataset.
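
A language breakdown like this one boils down to counting a language field per tweet. The sketch below assumes tweets.csv has a column named lang holding each tweet's language code; that column name is an assumption, as is the language_breakdown helper. The post used RapidMiner and Tableau for this kind of processing; this is just a plain-Python alternative.

```python
import csv
import io
from collections import Counter
from typing import IO, List, Tuple


def language_breakdown(csv_file: IO[str], lang_column: str = "lang",
                       top: int = 3) -> List[Tuple[str, int, float]]:
    """Return (language, tweet count, share of all tweets) for the top languages."""
    counts = Counter(row[lang_column] for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    return [(lang, n, n / total) for lang, n in counts.most_common(top)]


if __name__ == "__main__":
    # Tiny in-memory stand-in for tweets.csv (made-up rows, not the real data).
    sample = io.StringIO("id,lang\n1,en\n2,es\n3,en\n4,pt\n5,en\n6,es\n")
    for lang, n, share in language_breakdown(sample):
        print(f"{lang}: {n} tweets ({share:.0%})")
```

On the real file you would pass an open handle instead, e.g. open("tweets.csv", newline="", encoding="utf-8").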

Tweet locations

Next we’ll have a look at the distribution of geo-tagged tweets over different countries around the globe, along with their languages:

Tweets and events

Plotting the total volume of tweets over time shows a repeating pattern of spikes appearing at match times and also at times when a major event has occurred (such as elimination of a team, qualification for the next round, or shocking results). Let’s have a look at a few examples:
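
One simple way to find such spikes automatically is to compare each minute's tweet count with the average of the minutes just before it. The sketch below does that; the window size, the 2x threshold, the minimum baseline and the find_spikes name are arbitrary assumptions, and this is not how the charts in this post were produced.

```python
from statistics import mean
from typing import List, Sequence


def find_spikes(counts: Sequence[int], window: int = 5, factor: float = 2.0,
                min_baseline: float = 1.0) -> List[int]:
    """Return indices whose count exceeds `factor` times the trailing mean.

    The trailing mean covers the `window` counts before each index and is
    floored at `min_baseline` so a run of zeros still yields a usable baseline.
    """
    spikes = []
    for i in range(window, len(counts)):
        baseline = max(mean(counts[i - window:i]), min_baseline)
        if counts[i] > factor * baseline:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    per_minute = [10, 12, 11, 9, 10, 50, 12, 11, 10, 10, 11]
    print(find_spikes(per_minute))  # prints [5]
```

Because a spike stays inside the trailing window for the next few minutes, it raises the baseline and can mask a second event shortly afterwards; a trailing median is a common, more robust alternative.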

1. Tweet volume by Language

In these examples, we’re going to see how the volume of tweets in a language is affected by the matches and critical events related to teams from countries where that language is spoken (also note the trend lines in black):


English (teams: USA, England, Australia, Cameroon and Nigeria)

German (teams: Germany and Switzerland)

French (teams: France, Belgium, Algeria, Cameroon and Côte d’Ivoire)

Spanish (teams: Spain, Argentina, Mexico, Uruguay, Chile, Costa Rica, Ecuador, Honduras and Colombia)

Italian (teams: Italy)

Portuguese (teams: Brazil and Portugal)

2. Tweet volume during matches

A similar pattern can be observed at a smaller scale during matches, with spikes appearing for each goal or major event. Let’s see an example from the Brazil - Germany match:

3. Tweet volumes for different teams

Finally, let’s take a look at how the volume of tweets mentioning a team changes over time for the four teams that reached the semi-finals (for each team we count mentions of the team’s full name, e.g. “Germany”, as well as its team code hashtag, e.g. “#GER”):
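
Counting mentions this way (the team's full name or its team code hashtag) can be sketched with one case-insensitive regular expression per team. The four semi-finalists and their FIFA codes are real; the (day, text) tweet format and the helper names are assumptions, and each tweet counts at most once per team.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# The 2014 semi-finalists and their FIFA team codes.
TEAMS = {"Brazil": "#BRA", "Germany": "#GER", "Argentina": "#ARG", "Netherlands": "#NED"}


def mention_patterns(teams: Dict[str, str]) -> Dict[str, re.Pattern]:
    """Build one regex per team matching its full name or its code hashtag."""
    return {
        name: re.compile(rf"\b{re.escape(name)}\b|{re.escape(code)}\b", re.IGNORECASE)
        for name, code in teams.items()
    }


def daily_mentions(tweets: Iterable[Tuple[str, str]],
                   patterns: Dict[str, re.Pattern]) -> Dict[str, Counter]:
    """Count, per team and per day, the tweets that mention the team at least once."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for day, text in tweets:
        for team, pattern in patterns.items():
            if pattern.search(text):
                counts[team][day] += 1
    return counts
```

The trailing \b keeps "#GER" from matching inside longer hashtags such as "#GERMANY", while "#Germany" is still counted through the full-name alternative.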

Subscribe to our blog and stay tuned for part 2, where we use Text Analytics to dig deep into the tweets’ contents.

Got some cool use cases of text analysis? We would love to hear about them. Get in touch below.


Welcome on board to our 3 new team members!

Kevin Boyle - COO
Mike Waldron - VP Sales and Marketing
Mike O’Gorman - Sales and Marketing Director

We have the pleasure of introducing Kevin Boyle as our new COO, Mike Waldron as our VP of Sales and Marketing and Mike O’Gorman as our Sales and Marketing Director.

Welcome on board, folks!


I have made this letter longer than usual, because I lack the time to make it short — Blaise Pascal

We live in the age of TL;DRs and 140-character texts: bite-sized content that is easy to consume and quick to digest. We’re so used to skimming through feeds of TL;DRs to acquire information and knowledge about our friends and surroundings that we rarely sit through a whole article unless we find it extremely interesting.

It’s not necessarily a “bad” thing, though: we now have the option to trade breadth for depth, which gives us more control over how we acquire new information and makes us more efficient overall.

This is an option we did not have before, as most content was produced in long form, often without considering readers’ time constraints. But in the age of the Internet, textual content must compete with other types of media, such as images and videos, which are inherently easier to consume.

Vision: The Brevity Knob

In an ideal world, every piece of content should come with a knob attached to it that lets you adjust its length and depth by just turning the knob in either direction, towards brevity or verbosity:

  • If it’s a movie, you would start with the trailer and, depending on how interesting you find it, turn the knob to watch the whole movie or a 60- or 30-minute version of it.
  • For a Wikipedia article, you would start with the gist, and then gradually turn the knob to learn more and gain deeper knowledge about the subject.
  • When reading news, you would read one or two sentences that briefly describe the event and, if needed, turn the knob to add a couple more paragraphs and some context to the story.

This is our simple vision for how summarization technology should work.

Text Summarization

At AYLIEN we’ve been working on a Text Summarization technology that works just like the knob we described above: you give it some text, a news article perhaps, specify the target length of your summary, and our Summarization API automatically summarizes your text for you. Using it you can turn an article like this:

Into a handful of key sentences:

  1. Designed to promote a healthier balance between our real lives and those lived through the small screens of our digital devices, Moment tracks how much you use your phone each day, helps you create daily limits on that usage, and offers “occasional nudges” when you’re approaching those limits.
  2. The app’s creator, Kevin Holesh, says he built Moment for himself after realizing how much his digital addictions were affecting his real-world relationships.
  3. My main goal with Moment was make me aware of how many minutes I’m burning on my phone each day, and it’s helped my testers do that, too.”
  4. The overall goal with Moment is not about getting you to “put down your phone forever and go live in the woods,” Holesh notes on the app’s website.
  5. There’s also a bonus function in the app related to whether or not we’re putting our phone down in favor of going out on the town, so to speak – Moment can also optionally track where you’ve been throughout the day.


A New Version

Today we’re happy to announce a new version of our Summarization API that has numerous advantages over the previous versions and gives you more control over the length of the generated summary.

Two new parameters, sentences_number and sentences_percentage, let you control the length of your summary. For example, to get a summary that is 10% of the length of the original text, you would make the following request:

curl --get --include "https://aylien-text.p.mashape.com/summarize?url=http%3A%2F%2Fwww.bbc.com%2Fsport%2F0%2Ffootball%2F25912393&sentences_percentage=10" -H "X-Mashape-Key: YOUR_MASHAPE_KEY"
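
To illustrate what the two length parameters control, here is a toy extractive summarizer that keeps the highest-scoring sentences in their original order. This is not how the AYLIEN Summarization API works internally: word-frequency scoring is a deliberately simple stand-in, the tiny stopword list and the default of 5 sentences are assumptions, and rounding sentences_percentage up to a whole number of sentences is our guess at sensible behaviour. Only the parameter names come from the API.

```python
import math
import re
from collections import Counter
from typing import List, Optional

# Assumed, deliberately tiny stopword list so common words don't dominate scores.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for", "that", "with"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, sentences_number: Optional[int] = None,
              sentences_percentage: Optional[float] = None) -> List[str]:
    sentences = split_sentences(text)
    if sentences_number is not None:
        k = sentences_number
    elif sentences_percentage is not None:
        k = math.ceil(len(sentences) * sentences_percentage / 100)  # round up
    else:
        k = 5  # hypothetical default length
    k = max(1, min(k, len(sentences)))

    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        # Average document frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Pick the k best sentences, then restore their original order.
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    demo = ("Moment tracks phone use. The weather was nice. "
            "Moment helps you limit phone use. Cats sleep a lot.")
    print(summarize(demo, sentences_percentage=50))
    # prints ['Moment tracks phone use.', 'Moment helps you limit phone use.']
```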

We hope you find this new technology useful. Please check it out on our website and let us know if you have any questions or feedback: hello@aylien.com

Happy TL;DRing!

Hallo, bonjour, ciao, hola, olá! Text API supports 5 new languages

When we launched our Text Analysis API back in February, we made a promise to put quality before quantity – meaning that we won’t build a new feature without making sure the current features are all working reasonably well.

That’s why we initially focused on English as the only language supported by the API.

Now we’ve reached a stage where we feel comfortable extending our knowledge of Machine Learning and Text Analysis to other languages, so we’ve added support for 5 new languages to our Concept Extraction and Hashtag Suggestion endpoints. Starting today, you can extract concepts mentioned in documents written in German, French, Italian, Spanish and Portuguese in the same way you would for English documents. The same goes for Hashtag Suggestion.

Here’s a sample request:

curl -v --data-urlencode "url=http://www.lemonde.fr/europeennes-2014/article/2014/05/22/sarkozy-demolit-l-ue-existante-tout-en-disant-qu-il-l-aime_4423949_4350146.html" \
    -H "X-Mashape-Authorization: YOUR_MASHAPE_KEY" \

Note that you can use language=auto to have the API automatically detect the language of the document for you.
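
The language=auto option leaves detection to the API, and how it does so isn't described here. Purely to illustrate the idea, the sketch below guesses among the six supported languages by counting common function words (stopwords). The short word lists and detect_language are assumptions, and this approach is far less reliable than a trained language identifier, especially on short texts such as tweets.

```python
import re
from typing import Dict, Set

# Assumed, very short stopword lists for the six supported languages.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "with"},
    "de": {"der", "die", "und", "ist", "das", "nicht", "mit", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "des", "une", "pas", "qui"},
    "it": {"il", "la", "che", "di", "e", "non", "un", "per", "sono"},
    "es": {"el", "la", "que", "de", "y", "los", "es", "una", "por"},
    "pt": {"o", "a", "que", "de", "e", "os", "não", "uma", "com"},
}


def detect_language(text: str) -> str:
    """Return the language whose stopwords occur most often, or 'unknown'."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so "não" stays whole
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)  # ties go to the first language listed
    return best if scores[best] > 0 else "unknown"
```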

We are planning to eventually add support for these 5 languages to all other endpoints, so stay tuned for more!