

We’re just two days away from seeing the Atlanta Falcons and New England Patriots go head to head at Super Bowl 51 in Texas. With an anticipated viewership of over 100 million people, it’s no surprise that some of the world’s biggest brands are pulling out all the stops in an attempt to win a much anticipated off-field battle. We are of course talking about the annual Super Bowl ads battle, where top brands are willing to cough up over $5 million for just 30 seconds of TV airtime.

Sentiment Analysis of tweets from Super Bowl 2016

Last year, we analyzed 1.8 million tweets during Super Bowl 50 to uncover the best, the worst, and the most controversial ads according to Twitter users. Using advanced Sentiment Analysis techniques, we were able to uncover that Amazon’s star-studded effort was the most popular ad at Super Bowl 50, earning the highest volume of positive tweets. PayPal, on the other hand, found themselves at the opposite end of the positivity scale, receiving the highest volume of negative tweets. And the most controversial? We had a clear winner in that category with Mountain Dew’s Puppy Monkey Baby shocking, confusing and amusing viewers in equal measure!

Of course, it’s not all about those 30 seconds of TV airtime. Brands that create something memorable can reap the rewards long after the final whistle has blown. Popular ads can go viral in minutes, with those that fail to impress being left behind and quickly forgotten. Just take a look at the YouTube views for these three brand ads since Super Bowl 50;

YouTube views since Super Bowl 50

With close to 30 million YouTube hits, it’s safe to say that Mountain Dew did pretty well from their wacky creation last year! For PayPal on the other hand, it was back to the drawing board with an expensive disappointment.

Watch: Mountain Dew’s Super Bowl 50 ad “Puppymonkeybaby”

Note: In this post, which is part 1 of a 3-part series, we’re going to focus on the hype surrounding the ads battle in the lead-up to the big game. Check back for parts 2 and 3, where we’ll dive into the in-game reaction on social media and how the brands fared in the press reaction after the event.

The most anticipated ads of Super Bowl 51

This year, as well as once again analyzing millions of tweets to uncover the good, the bad and the ugly among Super Bowl 51 commercials (check back next week for that one!), we thought it would be cool to find out which brands are receiving the most media attention in the lead up to the event.

Using the AYLIEN News API, we sourced and analyzed thousands of news stories that mentioned keywords relating to the Super Bowl and the brands that are advertising throughout. From these stories, and using the power of Natural Language Processing in our Text Analysis engine, we were able to uncover which brands have been mentioned most in news stories in the lead-up to the event.

The top 15 most mentioned brands

The bubble chart below represents the 15 brands that have received the most mentions in Super Bowl commercial-related news content since January 1. The bigger the bubble, the higher the mentions volume;

Right away we can see a clear leader in Budweiser, who received 50% more mentions than the second most mentioned brand, Pepsi. Why are Budweiser receiving so much attention? Well, much like Mountain Dew last year, controversy is proving to be a key factor, as we’re about to show you.

Want to track mentions and get intelligent, NLP-driven insights into the world’s news content? Sign-up for a free 14 day trial of our News API and get started!

Our top 3 Super Bowl commercials to watch out for

Having uncovered the top 15 most mentioned brands, we thought we would put our necks on the line by selecting three of these brands that we believe will make the biggest splash on social media during Super Bowl 51.


In an attempt to better understand the reasoning behind the hype around Budweiser, we analyzed all news stories mentioning “Super Bowl” and “Budweiser” to see what other topics were present in the collection of articles. From our results we removed keywords relating to the football game itself, as well as obvious brand-related words such as Bud, Anheuser-Busch, beer, etc. The topics that remained quickly gave us an indication of why this ad is proving to be controversial in the US;

Topics extracted from stories mentioning “Super Bowl” and “Budweiser”

Coincidence, or political statement?

Budweiser’s commercial preview, titled Born The Hard Way, shows Adolphus Busch, the co-founder of Anheuser-Busch, arriving in the US from Germany with the dream of opening his own brewery. With the immigrant theme of the commercial and an opening line of dialogue that reads “You don’t look like you’re from around here”, thoughts of a political statement quickly spring to mind.

Watch: Budweiser’s Super Bowl ad preview “Born The Hard Way”

Despite Budweiser vice-president, Ricardo Marques, stating that “There’s really no correlation with anything else that’s happening in the country”, news outlets and social media commentators beg to differ, with a strong split in opinion quickly forming. We’re even seeing the spread of #BoycottBudweiser across many tweets.

Whether intentional or not, Budweiser have placed themselves firmly at the center of a fiery debate on immigration, and it will be fascinating to see the public reaction to their main showpiece on Sunday.

Our Budweiser prediction

  • Most controversial ad this year
  • Ad content will be irrelevant, and a political debate will rage on Twitter


Snickers will make Super Bowl history this year by being the first brand to perform and broadcast their commercial live during the event.

While Snickers have released a number of small teaser-style previews with a western-theme, we’re still not sure exactly how this one is going to play out.

Watch: Snickers’ Super Bowl ad teaser

With the intrigue of a live performance, as well as the inclusion of superstars like Betty White and Adam Driver, we’re excited to see how this one goes, particularly the reaction on social media.

Live commercial, live Twitter reaction

The world’s first live Super Bowl commercial presents us with the opportunity to track public reaction before, during and after the performance. While we’ll be tracking and analyzing the reaction to all of our top 15 ads, the uniqueness of Snickers’ live commercial brings a whole new level of insight into the tracking of public opinion. Judging by the teasers, it appears that Snickers are going for a wild west-style performance with horses, celebrities and a number of performers.

The big question is, how will social media respond to a real-time, potentially unpolished and unpredictable live performance? We can’t wait to find out!

Our Snickers prediction

  • Live format will inspire and drive high social engagement.
  • A popular cast, inclusion of horses and a fun theme will see Snickers near the top of our most liked ads in terms of positive Twitter sentiment.

Want to track Twitter reactions yourself? Build your own sentiment analysis tool in just 10 minutes. No coding required, and it’s free 🙂


Our second most mentioned brand, Pepsi are investing heavily in Super Bowl 51 with commercials for two products, as well as sponsoring the 12-minute Halftime Show.

For Pepsi, their main aim is to generate awareness around two new products; LIFEWTR and Pepsi Zero Sugar. Have they been successful in this regard so far? While our post-game analysis will give us a better indication of the overall success of their campaign, we can perhaps already say that these two products are being somewhat overshadowed.

Here are the top keywords from stories mentioning “Super Bowl” and “Pepsi”, excluding game-related and obvious brand-related keywords such as Houston, PepsiCo, football, etc.

Topics extracted from stories mentioning “Super Bowl” and “Pepsi”

If you weren’t aware of who was performing during the Super Bowl Halftime Show, now you are! Lady Gaga is absolutely dominating in terms of media mentions, and Pepsi’s high mention volume is most definitely a result of the singer’s involvement in the Halftime Show that they just happen to be sponsoring.

Perhaps worryingly for Pepsi, we saw no mention of LIFEWTR or Pepsi Zero Sugar in our top 100 keyword results.

Watch: Pepsi Super Bowl 51 ad “Inspiration Drops”

Last year, PayPal were accused of playing it safe when it came to their Super Bowl ad. Have Pepsi made the same mistake with LIFEWTR?

Our Pepsi prediction

  • Huge Twitter mention volumes for Pepsi, owing to Lady Gaga’s performance.
  • Low mention volumes for LIFEWTR and Pepsi Zero Sugar.
  • Tame public reaction to LIFEWTR commercial and very low YouTube views.

Who will be the winners and losers at Super Bowl 51?

We’ll be listening to and analyzing news and social media content before, during and after Super Bowl 51 to bring you our annual insights into public and media reaction to both the game itself and the ads battle, so check back next week to find out who were the biggest winners and losers!

Happy Super Bowl weekend to you all 🙂

News API - Sign up



With our News API, our goal is to make the world’s news content easier to collect, monitor and query, just like a database. We leverage Machine Learning and Natural Language Processing to process, normalize and analyze this content to make it easier for our users to gain access to rich and high quality metadata, and use powerful filtering capabilities that will ultimately help you to find precise and targeted stories with ease.

To this end, we have just launched a cool new feature, Real-time monitoring. Real-time monitoring allows you to further automate your collection and analysis of the world’s news content by creating tailored searches that source and automatically retrieve highly-relevant news stories, as soon as they are published.


You can read more about our latest feature – which is now also available in our News API SDKs – below.

Real-time monitoring

With Real-time monitoring enabled you can automatically pull stories as they are published, based on your specific search query. Users who rely on having access to the latest stories as soon as they are published, such as news aggregators and news app developers for example, should find this new feature particularly interesting.

The addition of this powerful new feature will help ensure that your app, webpage or news feed is bang up to date with the latest and most relevant news content, without the need for manual searching and updating.

Newly published stories can be pulled every minute (configurable), and duplicate stories in subsequent searches will be ignored. This ensures you are only getting the most recent publications, rather than a repeat of what has come before.


We have created code in seven different programming languages to help get you started with Real-time monitoring, each of which can be found below, as well as in our documentation.

NB: Real-time monitoring will only work when you set the sort_by parameter to published_at and sort_direction to desc.
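
To make the polling and de-duplication behaviour described above more concrete, here is a minimal sketch in Ruby. It is not the News API SDK code: `StoryMonitor`, `poll` and the fetcher lambda are hypothetical names used for illustration, and the fetcher simply stands in for whatever call returns the latest stories sorted by published_at in descending order.

```ruby
require 'set'

# Hypothetical helper (not part of any AYLIEN SDK): wraps a callable that
# returns the latest stories, newest first, and hands back only unseen ones.
class StoryMonitor
  def initialize(fetcher)
    @fetcher  = fetcher   # callable returning an array of { id:, title: } hashes
    @seen_ids = Set.new   # ids of stories that were already delivered
  end

  # Runs one polling cycle and returns only the stories not seen before.
  def poll
    fresh = @fetcher.call.reject { |story| @seen_ids.include?(story[:id]) }
    fresh.each { |story| @seen_ids << story[:id] }
    fresh
  end
end

# Made-up data standing in for two successive search responses.
responses = [
  [{ id: 2, title: "Story B" }, { id: 1, title: "Story A" }],
  [{ id: 3, title: "Story C" }, { id: 2, title: "Story B" }]
]
monitor = StoryMonitor.new(-> { responses.shift || [] })

puts monitor.poll.map { |story| story[:title] }.inspect
puts monitor.poll.map { |story| story[:title] }.inspect
```

The second call returns only "Story C", because "Story B" was already delivered by the first one. In a real monitor, `poll` would be called on a timer (for example, once a minute) with a fetcher that queries the API.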


The main benefit of this cool new feature is that you can be confident you are receiving the very latest stories and insights, without delay, by creating an automated process that will continue to retrieve relevant content as soon as it is published online. By automating the retrieval of content in real-time, you can cut down on manual input and generate feeds, charts and graphs that will automatically update in real-time.

We hope that you find this new update useful, and we would love to hear any feedback you may have.

To start using our News API for free and query the world’s news content easily, click the image below.


News API - Sign up



It’s certainly an exciting time to be involved in Natural Language Processing (NLP), not only for those of us who are involved in the development and cutting-edge research that is powering its growth, but also for the multitude of organizations and innovators out there who are finding more and more ways to take advantage of it to gain a competitive edge within their respective industries.

With the global NLP market expected to grow to a value of $16 billion by 2021, it’s no surprise to see the tech giants of the world investing heavily and competing for a piece of the pie. More than 30 private companies working to advance artificial intelligence technologies have been acquired in the last 5 years by corporate giants competing in the space, including Google, Yahoo, Intel, Apple and Salesforce. [1]

It’s not all about the big boys, however, as NLP, text analysis and text mining technologies are becoming more and more accessible to smaller organizations, innovative startups and even hobbyist programmers.

NLP is helping organizations make sense of vast amounts of unstructured data, at scale, giving them a level of insight and analysis that they could have only dreamed about even just a couple of years ago.

Today we’re going to take a look at 3 industries on the cusp of disruption through the adoption of AI and NLP technologies;

  1. The legal industry
  2. The insurance industry
  3. Customer service

NLP & Text Analysis in the Legal industry

While we’re still a long, long way from robot lawyers, the current organic crop of legal professionals is already taking advantage of NLP, text mining and text analysis techniques to make better-informed decisions more quickly. These technologies surface key insights that are often buried in large volumes of data, or that seem irrelevant until analyzed at scale, uncovering strategy-boosting and often case-changing trends.

Let’s take a look at three examples of how legal pros are leveraging NLP and text analysis technologies to their advantage:

  • Information retrieval in ediscovery
  • Contract management
  • Article summarization

Information retrieval in ediscovery

Ediscovery refers to discovery in legal proceedings such as litigation, government investigations, or Freedom of Information Act requests, where the information sought is in electronic format. Electronic documents are often accompanied by metadata that is not found on paper documents, such as the date and time the document was written, shared, etc. This level of minute detail can be crucial in legal proceedings.

As far as NLP is concerned, ediscovery is mainly about information retrieval, aiding legal teams in their search for relevant and useful documents.

In many cases, the amount of data requiring analysis can exceed 100GB, when often only 5% – 10% of it is actually relevant. With outside service bureaus charging $1,000 per GB to filter and reduce this volume, you can start to see how costs can quickly soar.

Data can be filtered and separated by extracting mentions of specific entities (people, places, currency amounts, etc.), by including or excluding specific timeframes and, in the case of email threads, by keeping only mails that mention the company, person or defendant in question.
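
As a rough illustration of this kind of filtering, the Ruby sketch below keeps only emails that fall inside a timeframe and mention a target entity. The `Email` struct, the `relevant_emails` method and the sample data are all hypothetical, and the sketch assumes the entity mentions have already been extracted by an earlier NLP step.

```ruby
require 'date'

# Hypothetical document structure: each email carries its sent date and the
# entity mentions already extracted from it by an NLP step.
Email = Struct.new(:subject, :sent_on, :entities)

# Keeps emails sent within the timeframe that mention at least one target entity.
def relevant_emails(emails, targets:, from:, to:)
  wanted = targets.map(&:downcase)
  emails.select do |email|
    in_range  = (from..to).cover?(email.sent_on)
    mentioned = email.entities.any? { |entity| wanted.include?(entity.downcase) }
    in_range && mentioned
  end
end

# Made-up sample data.
emails = [
  Email.new("Q3 contract", Date.new(2016, 3, 14), ["Acme Corp", "John Smith"]),
  Email.new("Lunch?",      Date.new(2016, 3, 15), ["Dublin"]),
  Email.new("Old invoice", Date.new(2014, 1, 2),  ["Acme Corp"])
]

hits = relevant_emails(emails,
                       targets: ["acme corp"],
                       from: Date.new(2016, 1, 1),
                       to: Date.new(2016, 12, 31))
puts hits.map(&:subject).inspect
```

Only "Q3 contract" survives: "Lunch?" mentions no target entity and "Old invoice" falls outside the timeframe.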

Contract management

NLP enables contract management departments to extract key information, such as currency amounts and dates, to generate reports that summarize terms across contracts, allowing for comparisons among terms for risk assessment purposes, budgeting and planning.

In cases relating to Intellectual Property disputes, attorneys are using NLP and text mining techniques to extract key information from sources such as patents and public court records to help give them an edge with their case.

Article summarization

Legal documents can be notoriously long and tedious to read through in their entirety. Sometimes all that is required is a concise summary of the overall text to help gain an understanding of its content. Summarization of such documents is possible with NLP, where a defined number of sentences are selected from the main body of text to create, for example, a summary of the top 5 sentences that best reflect the content of the document as a whole.
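
One simple way to select sentences like this is frequency-based extractive summarization, sketched below in Ruby. This is a toy approach for illustration only, not a description of how any particular summarization product works; the method names and the stopword list are assumptions.

```ruby
# Common words ignored when scoring (a tiny illustrative list).
STOPWORDS = %w[the a an of to and in is are was were it that this for on with as by be].freeze

# Splits text into sentences on ., ! or ? followed by whitespace.
def split_sentences(text)
  text.split(/(?<=[.!?])\s+/).map(&:strip).reject(&:empty?)
end

# Lowercased word tokens minus stopwords.
def content_words(sentence)
  sentence.downcase.scan(/[a-z']+/).reject { |word| STOPWORDS.include?(word) }
end

# Returns the `count` highest-scoring sentences, in their original order.
# A sentence's score is the summed document-wide frequency of its content words.
def summarize(text, count)
  sentences = split_sentences(text)
  freq = Hash.new(0)
  sentences.each { |s| content_words(s).each { |word| freq[word] += 1 } }

  ranked = sentences.each_with_index.sort_by do |sentence, index|
    [-content_words(sentence).sum { |word| freq[word] }, index]
  end
  ranked.first(count).sort_by { |_, index| index }.map(&:first)
end

text = "The contract covers payment terms. " \
       "Payment is due within thirty days of the contract date. " \
       "The weather was nice."
puts summarize(text, 2)
```

Summing frequencies favours longer sentences, so a more careful version might normalize the score by sentence length.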

NLP & Text Analysis in the Insurance industry

Insurance providers gather massive amounts of data each day from a variety of channels, such as their website, live chat, email, social networks, agents and customer care reps. Not only is this data coming in from multiple channels, it also relates to a wide variety of issues, such as claims, complaints, policies, health reports, incident reports, customer and potential customer interactions on social media, email, live chat, phone… the list goes on and on.

The biggest issue plaguing the insurance industry is fraud. Let’s take a look at how NLP, data mining and text analysis techniques can help insurance providers tackle this and other key issues:

  • Streamline the flow of data to the correct departments/agents
  • Improve agent decision making by putting timely and accurate data in front of them
  • Improve SLA response times and overall customer experience
  • Assist in the detection of fraudulent claims and activity

Streamlining the flow of data

That barrage of data and information that insurance companies are being hit by each and every day needs to be intricately managed, stored, analyzed and acted upon in a timely manner. A missed email or note may not only result in poor service and an upset customer, it could potentially cost the company financially if, for example, relevant evidence in a dispute or claim case fails to surface or reach the right person/department on time.

Natural Language Processing is helping insurance providers ensure the right data reaches the right set of eyeballs at the right time through automated grouping and routing of queries and documents. This goes beyond simple keyword-matching with text analysis techniques used to ‘understand’ the context and category of a piece of text and classify it accordingly.
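
To give a feel for classification-based routing, here is a minimal multinomial naive Bayes classifier in Ruby. It is a teaching sketch under simplifying assumptions (a handful of made-up training examples and a hypothetical `QueryRouter` class), not the method any specific insurer or API uses.

```ruby
# Minimal multinomial naive Bayes text classifier (illustrative only).
class QueryRouter
  def initialize
    @word_counts = Hash.new { |hash, dept| hash[dept] = Hash.new(0) } # dept => word => count
    @doc_counts  = Hash.new(0)                                        # dept => number of examples
    @vocabulary  = {}                                                 # every word seen in training
  end

  # Adds one labelled example to the model.
  def train(department, text)
    @doc_counts[department] += 1
    tokenize(text).each do |word|
      @word_counts[department][word] += 1
      @vocabulary[word] = true
    end
  end

  # Returns the department with the highest log-probability for the text.
  def route(text)
    total_docs = @doc_counts.values.sum.to_f
    @doc_counts.keys.max_by do |dept|
      total_words = @word_counts[dept].values.sum
      prior = Math.log(@doc_counts[dept] / total_docs)
      tokenize(text).sum(prior) do |word|
        # Laplace smoothing so unseen words do not zero out a department.
        Math.log((@word_counts[dept][word] + 1.0) / (total_words + @vocabulary.size))
      end
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z]+/)
  end
end

router = QueryRouter.new
router.train(:claims,  "I want to file a claim for car accident damage")
router.train(:claims,  "my claim for water damage was rejected")
router.train(:billing, "my premium payment was charged twice")
router.train(:billing, "update the card used for my monthly payment")

puts router.route("there was an accident and I need to claim")
puts router.route("I was charged twice for my payment")
```

Unlike a fixed keyword list, the classifier weighs every word in the query against what it has seen for each department, so it can still route a message whose wording differs from the training examples.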

Fraud detection

According to a recent report by Insurance Europe, detected and undetected fraudulent claims are estimated to represent 10% of all claims expenditure in Europe. Of note here, of course, is the fraud that goes undetected.

Insurance companies are using NLP and text analysis techniques to mine the data contained within unstructured sources such as applications, claims forms and adjuster notes to unearth certain red flags in submitted claims. For example, a regular indicator of organized fraudulent activity is the appearance of common phrases or descriptions of incidents from multiple claimants. The trained human eye may or may not be able to spot such instances but regardless, it would be a time consuming exercise and likely prone to subjectivity and inconsistency from the handler.
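
The "common phrases from multiple claimants" red flag can be approximated by looking for word sequences (n-grams) that are shared across claims from different people. The Ruby sketch below is one simple way to do that; the method names, the choice of 4-word phrases and the sample claims are all assumptions made for illustration.

```ruby
# Returns the distinct word n-grams (as strings) found in a piece of text.
def ngrams(text, n)
  words = text.downcase.scan(/[a-z0-9']+/)
  words.each_cons(n).map { |gram| gram.join(" ") }.uniq
end

# Maps each phrase to the claimants who used it, keeping only phrases that
# appear in claims from at least `min_claimants` different people.
def shared_phrases(claims, n: 4, min_claimants: 2)
  claimants_by_phrase = Hash.new { |hash, phrase| hash[phrase] = [] }
  claims.each do |claimant, description|
    ngrams(description, n).each { |phrase| claimants_by_phrase[phrase] << claimant }
  end
  claimants_by_phrase.select { |_, claimants| claimants.uniq.size >= min_claimants }
end

# Made-up claim descriptions keyed by claimant.
claims = {
  "Claimant A" => "The other car came out of nowhere and hit my rear bumper",
  "Claimant B" => "A van came out of nowhere and clipped my door",
  "Claimant C" => "I reversed into a bollard in the car park"
}

shared_phrases(claims).each do |phrase, claimants|
  puts "#{phrase.inspect} used by #{claimants.join(', ')}"
end
```

A shared phrase is only a prompt for a human investigator to look closer; common idioms will match innocently, so in practice the phrase length and threshold would need tuning.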

The solution for insurance providers is to develop NLP-powered analytical dashboards that support quick decision making, highlight potential fraudulent activity and therefore enable their investigators to prioritise cases based on specifically defined KPIs.

NLP, Text Analysis & Customer Service

In a world that is increasingly focused on SLAs, KPIs and ROIs, the role of Customer Support and Customer Success, particularly in technology companies, has never been more important to the overall performance of an organization. With the ever-increasing number of startups and innovative companies disrupting pretty much every industry out there, customer experience has become a key differentiator in markets flooded with consumer choice.

Let’s take a look at four ways that NLP and text analysis are helping to improve CX in particular:

  • Chat bots
  • Analyzing customer/agent interactions
  • Sentiment analysis
  • Automated routing of customer queries

Chat bots

It’s safe to say that chat bots are a pretty big deal right now! These conversational agents are beginning to pop up everywhere as companies look to take advantage of the cutting-edge AI that powers them.

Chances are that you interact with multiple artificial agents on a daily basis, perhaps even without realizing it. They are making recommendations as we online shop, answering our support queries in live chats, generating personalized fitness routines and communicating with us as virtual assistants to schedule meetings.


A recent interaction I had with a personal assistant bot, Amy

Chat bots are helping to bring a personalized experience to users. When done right, not only can this reduce spend in an organization, as they require less input from human agents, but it can also add significant value to the customer experience with intelligent, targeted and round-the-clock assistance at hand.

Analyzing customer/agent interactions

Interactions between support agents and customers can uncover interesting and actionable insights and trends. Many interactions are in text format by default (email, live chat, feedback forms) while voice-to-text technology can be used to convert phone conversations to text so they can be analyzed.

Listening to their customers

The voice of the customer is more important today than ever before. Social media channels offer a gold mine of publicly available consumer opinion just waiting to be tapped. NLP and text analysis enable you to analyze huge volumes of social chatter to help you understand how people feel about specific events, products, brands, companies, and so on.

Analyzing the sentiment towards your brand, for example, can help you decrease churn and improve customer support by uncovering and proactively working on improving negative trends. It can help show you what you are doing wrong before too much damage has been done, but also quickly show you what you are doing right and should therefore continue doing.

Customer feedback containing significantly high levels of negative sentiment can be relayed to Product and Development teams to help them focus their time and efforts accordingly.

Because of the multi-channel nature of customer support, you tend to have customer queries and requests coming in from a variety of sources – email, social media, feedback forms, live chat. Speed of response is a key performance metric for many organizations and so routing customer queries to the relevant department, in as few steps as possible, can be crucial.

NLP is being used to automatically route and categorize customer queries, without any human interaction. As mentioned earlier, this goes beyond simple keyword-matching with text analysis techniques being used to ‘understand’ the context and category of a piece of text and classify it accordingly.


As the sheer amount of unstructured data out there grows and grows, so too does the need to gather, analyze and make sense of it. Regardless of the industry in which they operate, organizations that focus on benefitting from NLP and text analysis will no doubt gain a competitive advantage as they battle for market share.


Text Analysis API - Sign up


This is the fourth in our series of blogs on getting started with our various SDKs. Depending on what your language preference is, we have SDKs available for Node.js, Python, Ruby, PHP, Go, Java and .Net (C#). Last week’s blog focused on using our Go SDK. This week we’ll focus on getting up and running with Ruby.

If you are new to our API and you don’t have an account, you can go directly to the Getting Started page on our website, which will take you through the signup process. You can sign up to a free plan to get started, which allows you to make up to 1,000 calls per day to the API for free.

Downloading and Installing the Ruby SDK

All of our SDK repositories are hosted on GitHub; you can get the Ruby repository here. The simplest way to install the SDK is by using “gem”.

$ gem install aylien_text_api

Configuring the SDK with your AYLIEN credentials

Once you have received your AYLIEN APP_ID and APP_KEY from the signup process and have downloaded the SDK you can start making calls by passing your configuration parameters as a block to AylienTextApi.configure.

require 'aylien_text_api'

# Set the credentials once, then create a client that uses them.
AylienTextApi.configure do |config|
  config.app_id        =    "YOUR_APP_ID"
  config.app_key       =    "YOUR_APP_KEY"
end
client = AylienTextApi::Client.new

Alternatively, you can pass them as parameters to the AylienTextApi::Client class.

require 'aylien_text_api'

client = AylienTextApi::Client.new(app_id: "YOUR APP ID", app_key: "YOUR APP KEY")

When calling the various API endpoints you can specify a piece of text directly for analysis or you can pass a url linking to the text or article you wish to analyze.

Language Detection

First let’s take a look at the language detection endpoint. Specifically, we will detect the language of the sentence “What language is this sentence written in?”

You can call the endpoint using the following piece of code.

text = "What language is this sentence written in?"
language = client.language text: text
puts "Text: #{language[:text]}"
puts "Language: #{language[:lang]}"
puts "Confidence: #{language[:confidence]}"

You should receive an output very similar to the one shown below which shows that the language detected was English and the confidence that it was detected correctly (a number between 0 and 1) is very close to 1 indicating that you can be pretty sure it is correct.

Language Detection Results

Text: What language is this sentence written in?
Language: en
Confidence: 0.9999962069593649

Sentiment Analysis

Next we will look at analyzing the sentence “John is a very good football player” to determine its sentiment, i.e. positive, neutral or negative. The endpoint will also determine if the text is subjective or objective. You can call the endpoint with the following piece of code.

text = "John is a very good football player!"
sentiment = client.sentiment text: text
puts "Text            :  #{sentiment[:text]}"
puts "Sentiment Polarity  :  #{sentiment[:polarity]}"
puts "Polarity Confidence  :  #{sentiment[:polarity_confidence]}"
puts "Subjectivity  :  #{sentiment[:subjectivity]}"
puts "Subjectivity Confidence  :  #{sentiment[:subjectivity_confidence]}"

You should receive an output similar to the one shown below which indicates that the sentence is objective and is positive, both with a high degree of confidence.

Sentiment Analysis Results

Text            :  John is a very good football player!
Sentiment Polarity  :  positive
Polarity Confidence  :  0.9999988272764874
Subjectivity  :  objective
Subjectivity Confidence  :  0.9896821594138254

Article Classification

Next we will take a look at the classification endpoint. The Classification endpoint automatically assigns an article or piece of text to one or more categories making it easier to manage and sort. The classification is based on IPTC International Subject News Codes and can identify up to 500 categories. The code below analyses a BBC news article about mega storms on the planet Uranus.

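url = ""  # URL of the article to analyze
classify = client.classify url: url
# Print the label, IPTC code and confidence of each assigned category.
classify[:categories].each do |cat|
	puts "Label : #{cat[:label]}"
	puts "Code : #{cat[:code]}"
	puts "Confidence : #{cat[:confidence]}"
end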

When you run this code you should receive an output similar to that shown below which assigns the article an IPTC label of “natural science – astronomy” with an IPTC code of 13004007.

Article Classification Results

Label : natural science - astronomy
Code : 13004007
Confidence : 1.0

Hashtag Analysis

Next we will look at analyzing the same BBC article to extract hashtag suggestions for sharing the article on social media with the following code.

url = ""  # URL of the article to analyze
hashtags = client.hashtags url: url
# Print each suggested hashtag on its own line.
hashtags[:hashtags].each do |str|
	puts str
end

You should receive the output shown below.

Hashtag Suggestion Results

If Ruby isn’t your preferred language then check out our SDKs for node.js, Go, PHP, Python, Java and .Net (C#). For more information regarding the APIs go to the documentation section of our website.

We will be publishing ‘getting started’ blogs for the remaining languages over the coming weeks so keep an eye out for them. If you haven’t already done so you can get free access to our API on our sign up page.

Text Analysis API - Sign up



As part of our blog series, ‘Text Analysis 101: a basic understanding for business users’, we will aim to explain how Text Analysis and Natural Language Processing work from a non-technical point of view.

For the first installment, we are going to discover how text is understood by machines, what methods are used in text analysis and why Entity and Concept extraction techniques are so important in the process.

Text Analysis

Text Analysis refers to the process of retrieving high-quality information from text. It involves Information Retrieval (IR) processes and lexical analysis techniques to study word frequency, distributions and patterns, and uses information extraction and association analysis to attempt to understand text. The main goal of Text Analysis as a practice is to turn text into data for further analysis, whether that is from a business intelligence, research, data analytics or investigative perspective. Certain aspects of text can be identified with modern techniques, and these allow machines to understand a document, article or piece of text.

Technological advancements, greater computing power and investment in research have meant that Natural Language Processing techniques have evolved, performance has improved and adoption across the business world has grown dramatically. According to Alta Plana’s latest report, “Text Analytics 2014: User Perspectives on Solutions and Providers”, the Text Analytics market now has an estimated value exceeding $2bn.

Traditionally, NLP techniques focused on words. These techniques relied on statistical algorithms to analyze and attempt to understand text. However, there has been a push in recent times to equip machines with the capabilities to not just analyze, but to “understand” text. There are numerous approaches to the problem, some being more popular and more accurate than others.

Document Representation Models – Bag of Words and Bag of Concepts

Traditionally, analysis systems were focused on words and they failed to identify concepts when attempting to understand text. The diagram below outlines how, as we move up the pyramid and consider concepts in our analysis, we move closer to machines extracting meaning from text.

Bag of Words

The bag-of-words model is a representation that has been traditionally used in NLP and IR. In this model, all grammar, sentence structure and word order can be disregarded and a piece of text, a document or a sentence can be seen or represented as a “bag of words”. The collection of words can then be analyzed using the Document-term Matrix for occurrences of certain words, in order to better understand the document, based on its most representative terms. While analyzing words is somewhat successful, a greater focus on concepts within text has proven to increase a machine’s overall understanding of text.
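
For readers who would like to see the idea in code, the Ruby sketch below builds a bag of words for each document and then a small document-term matrix. The method names and sample sentences are made up for illustration, and the tokenizer is deliberately naive (lowercase ASCII letters only).

```ruby
# Lowercases text and counts its word tokens; word order is discarded.
def bag_of_words(text)
  text.downcase.scan(/[a-z]+/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

# Builds a document-term matrix: one row per document, one column per term.
# Returns the sorted term list along with the rows of counts.
def document_term_matrix(documents)
  bags  = documents.map { |doc| bag_of_words(doc) }
  terms = bags.flat_map(&:keys).uniq.sort
  rows  = bags.map { |bag| terms.map { |term| bag.fetch(term, 0) } }
  [terms, rows]
end

terms, rows = document_term_matrix(["The dog chased the cat", "The cat sat"])
puts terms.inspect
rows.each { |row| puts row.inspect }

# Word order is ignored, so these two sentences produce the same bag.
puts bag_of_words("dog bites man") == bag_of_words("man bites dog")
```

The last line prints `true`, which shows both the strength of the model (simplicity) and its main limitation: "dog bites man" and "man bites dog" look identical to it.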

Bag of Concepts

Looking beyond just the words on the surface of a document can provide context to improve a computer’s understanding of text. As demonstrated in the pyramid above, analyzing the words alone can be seen as a base level analysis while considering concepts as part of the analysis goes a step further to improve overall understanding.

A concept-based approach may provide greater insight by not relying on the words alone and by considering concepts as part of the analysis process. By combining the BoW and BoC approaches to understanding text, performance and accuracy can be greatly improved. This is especially true when we are dealing with a somewhat lesser-known sample of text.

You can read more about the Bag of Concepts approach here:

“Using Bag-of-Concepts to Improve the Performance of Support Vector Machines in Text Categorization”

To move towards more of a concept-based model of Text Analysis, we need to be able to identify entities and concepts within a text. In order to understand how this is done, it’s important to discuss what entities and concepts are and how we identify and utilize them from an analysis point of view.


An entity is something that exists in itself, a thing with distinct or independent existence.


A concept can be defined as an abstract or generic idea generalized from particular instances.

But how can machines recognize entities and concepts in text?

Named Entity Recognition (NER)

Also known as Entity Extraction, NER aims to automatically locate and classify elements of text into predefined categories such as the names of persons, organizations and locations, expressions of time, quantities, monetary values, percentages, etc. The NER approach uses linguistic grammar-based techniques, statistical modelling techniques, or both to identify and extract entities from text.

Consider the following piece of text as an example:

“Michael loved the Apple iPhone. He always admired Steve Jobs, but he couldn’t justify spending over $500 on a new phone.”

Using NER, certain mentions of entities can be identified in a sentence or an entire piece of text, as highlighted below:

“Michael [Person] loved the Apple [Organization] iPhone. He always admired Steve Jobs [Person], but he couldn’t justify spending over $500 [Money] on a new phone.”
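The tagging above can be approximated with a very small rule-based sketch. It assumes a hand-written gazetteer (a lookup of known names) and a regular expression for money amounts. Both are hypothetical, and a production NER system would typically rely on richer grammar rules or a trained statistical model instead.

```python
import re

# Hypothetical gazetteer: a tiny lookup of known names and their types.
GAZETTEER = {
    "Steve Jobs": "Person",
    "Michael": "Person",
    "Apple": "Organization",
}
# Dollar amounts such as $500 or $1,250.75.
MONEY_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")

def tag_entities(text):
    """Return (start, mention, label) tuples sorted by position in the text."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group(), label))
    for match in MONEY_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "Money"))
    return sorted(found)

text = ("Michael loved the Apple iPhone. He always admired Steve Jobs, "
        "but he couldn't justify spending over $500 on a new phone.")
entities = [(mention, label) for _, mention, label in tag_entities(text)]
print(entities)
# [('Michael', 'Person'), ('Apple', 'Organization'),
#  ('Steve Jobs', 'Person'), ('$500', 'Money')]
```

Note that the lookup labels every "Apple" as an organization regardless of context, which is exactly the weakness discussed next.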

It isn’t always possible, however, to identify entities in a piece of text using NER exclusively. Written language isn’t always exact and trying to understand a piece of text without considering the context can lead to inaccuracies. Is a mention of Apple referring to the company, the fruit or even the artist Billy Apple? That is where disambiguation and concepts can add more clarity and accuracy to the analysis process.

Named Entity Disambiguation (NED)

Named entity disambiguation can be used to identify and extract concepts from text. Its approach to the problem differs from NER in that it doesn’t rely on grammar or statistics. Also known as entity linking, NED uses a knowledge base as a reference to identify entities. This could be a public knowledge base like Wikipedia or a training text, which is often domain specific.

The process is outlined simply below:

Step 1. Spotting: looking for surface forms like “apple” (the sequence of the letters a-p-p-l-e)

Step 2. Candidate generation: identifying potential candidates, such as Apple Inc., Apple (the fruit), Billy Apple, etc.

Step 3. Disambiguation: referencing a knowledge base and considering the context to identify a concept.
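The three steps can be sketched in a few lines. The miniature knowledge base below is hypothetical: each candidate meaning of “apple” is described by a handful of context keywords, and disambiguation simply picks the candidate whose keywords overlap most with the surrounding words. Real systems use far larger knowledge bases and more careful scoring.

```python
import re

# Hypothetical miniature knowledge base: candidates and their context keywords.
KNOWLEDGE_BASE = {
    "apple": {
        "Apple Inc.": {"iphone", "mac", "steve", "jobs", "company", "phone"},
        "Apple (fruit)": {"fruit", "tree", "eat", "pie", "orchard", "juice"},
        "Billy Apple": {"artist", "art", "gallery", "pop", "exhibition"},
    },
}

def spot(tokens):
    """Step 1: find surface forms that appear in the knowledge base."""
    return [t for t in tokens if t in KNOWLEDGE_BASE]

def candidates(surface_form):
    """Step 2: list the possible meanings of a surface form."""
    return KNOWLEDGE_BASE[surface_form]

def disambiguate(surface_form, tokens):
    """Step 3: pick the candidate whose keywords overlap most with the context."""
    context = set(tokens) - {surface_form}
    options = candidates(surface_form)
    # On a tie, max() keeps the first candidate listed in the knowledge base.
    return max(options, key=lambda name: len(options[name] & context))

def link_entities(text):
    """Run all three steps and map each spotted surface form to a concept."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {form: disambiguate(form, tokens) for form in spot(tokens)}

print(link_entities("Apple unveiled a new iPhone and Mac"))
print(link_entities("She baked an apple pie with fruit from the orchard"))
print(link_entities("The gallery showed art by Billy Apple"))
```

The same surface form resolves to the company, the fruit and the artist respectively, purely because of the words around it.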

Entities vs Concepts

It is often best to identify and extract both named entities and concepts in order to fully understand a piece of text. Entities may be common, well known and easy to identify, but there may also be concepts within your text that would be overlooked without the disambiguation process.

Identifying concepts does have some advantages over only considering entities as part of the entire analysis process. By referring to a knowledge base, like Wikipedia, further information about a concept can be identified and utilised. For example, in an article that mentions Steve Jobs, iPhone, Mac and Palo Alto but not “apple”, based on the information sourced in your knowledge base, you could still identify “apple” as a concept.
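A minimal sketch of that kind of inference is shown below. It assumes a hypothetical table of knowledge-base links from each mention to related concepts, and treats a concept as present when enough mentions point to it. The links and the threshold are illustrative choices, not values taken from any real knowledge base.

```python
from collections import Counter

# Hypothetical knowledge-base links from a mention to related concepts.
RELATED_CONCEPTS = {
    "Steve Jobs": ["Apple Inc.", "Pixar"],
    "iPhone": ["Apple Inc.", "Smartphone"],
    "Mac": ["Apple Inc.", "Personal computer"],
    "Palo Alto": ["California", "Silicon Valley"],
}

def infer_concepts(mentions, min_support=2):
    """Return concepts that at least `min_support` mentions link to."""
    votes = Counter()
    for mention in mentions:
        votes.update(RELATED_CONCEPTS.get(mention, []))
    return [concept for concept, count in votes.most_common() if count >= min_support]

article_mentions = ["Steve Jobs", "iPhone", "Mac", "Palo Alto"]
print(infer_concepts(article_mentions))  # ['Apple Inc.']
```

The word “apple” never appears in the list of mentions, but three of the four mentions link to the same concept, so it is surfaced anyway.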

Concepts can also be used to pull additional information and insights from a knowledge base, providing an automated and straightforward way to enhance and augment any document. For instance, for every concept of type “place”, a map of that place could be added to the document, knowing the place’s exact latitude and longitude.

Being able to identify entities and concepts means key aspects can be identified and extracted from documents, articles, emails, etc., which allows machines to provide greater analysis and enhancement capabilities and a deeper understanding of text.

Our next blog in the series will focus on how text is classified and summarized automatically.

Text Analysis API - Sign up


Whether you’re a publisher, a content distributor or an advertiser, content is king if you are looking to increase engagement. As web users, we don’t visit and regularly return to websites, blogs and news sites if we don’t find them engaging or they don’t fulfill our needs. When we visit websites, we generally want to learn something or be entertained. If businesses want to engage with audiences and, more importantly, relevant audiences online, they need to create high-quality, informative content.



What’s Changed?

As web users, the way we search for and consume information has changed dramatically. This has meant publishers and content creators have had to adapt too. More content is being consumed and shared online today than ever before. According to a recent study by IBM, “90% of the data in the world today has been created in the last two years alone.” This content is also published and shared across numerous channels (blogs, platforms, news sites and social media, for example), which means staying informed and ahead of the curve has become harder than ever.

Traditionally, content distributors and search engines served relevant content to users based on their expressed requirements. In a Google search, for example, we would enter our search term and the engine would return a relevant result. It was a similar situation for advertisers and publishers: we would be served ads or promoted content based on our search terms or what we asked for.

Today things are a little different. As web users, we expect relevant content to be pushed to us and placed under our noses via our favourite blogs, social sites and even well-targeted ads. We expect informative, relevant and sharable content to be automatically placed at our fingertips.

This new focus on content has also brought about some notable developments in the advertising and content distribution spaces. Content discovery platforms, publishers and advertisers have needed to adapt and not be left behind and have done a relatively good job of reacting to change and embracing technology.

Advancements in NLP, Machine Learning and Text Analysis are right at the heart of how content discovery and distribution have changed. Being able to analyze vast amounts of content, extract topics, entities, concepts and keywords, and even summarize long pieces of text allows for easier and more accurate categorization, discovery and distribution.

So Who Benefits?

With all of the advancements in technology being utilized to make the web better, it is difficult to know who benefits the most: publishers, advertisers, content distributors or web users.


Publishers

Providing relevant content to attract visitors to your sites is one thing, but keeping visitors engaged and returning is even more important to publishers. Being able to suggest another relevant article or video to readers means visitors are more likely to spend time on your site, re-visit your site, consume more content and share more articles.
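As a simple illustration of how a related article might be suggested, the sketch below compares word counts using cosine similarity and picks the closest match. The article titles and texts are invented for the example, and real recommendation systems usually draw on much richer signals such as concepts, entities and reader behaviour.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 when either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_related(current_text, other_articles):
    """Return the title of the article most similar to the one being read."""
    current_vec = vectorize(current_text)
    return max(
        other_articles,
        key=lambda title: cosine_similarity(current_vec, vectorize(other_articles[title])),
    )

# Invented sample articles, keyed by title.
articles = {
    "Football recap": "Super Bowl overtime drama as Patriots rally",
    "Baking tips": "How to bake bread at home",
}
suggestion = suggest_related("Patriots win the Super Bowl in overtime", articles)
print(suggestion)  # Football recap
```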


Advertisers

Semantic Advertising (which you can read about here) allows advertisers to serve more targeted ads, which results in higher CTRs. Being able to analyze content and automatically serve well-targeted, relevant ads means the ad publishers and networks, as well as the brands behind the ads, extend their reach, improve their relevancy and benefit from better-performing ad campaigns.

Content discovery/distribution platforms

Text Analysis and Natural Language Processing technologies allow content platforms to easily discover popular content, group or categorize it and distribute that content effectively on the right channels to the right target audience resulting in a more engaged and growing user base.

Web Users

As web users, things have become a lot easier. We now utilise automated discovery tools, engage with content discovery platforms and follow influencers on social channels, for example, to be kept abreast of content we wish to consume. It has never been easier for us to discover and share content online. We can now consume and discover information, news articles, videos, services and products that are relevant to us on whatever channel or format we choose with little or no input ourselves.



Human beings are remarkably adept at understanding each other, given that we speak in languages of our own construction which are merely symbols of the information we’re trying to convey.

We’re skilled at understanding for two reasons. First, we’ve had, literally, millions of years to acquire the necessary skills. Second, we speak in, generally, the same terms, the same languages. Still, it’s an incredible feat, to extract understanding and meaning from such an avalanche of signal.

Consider this: researchers in Japan used the K Computer, currently the fourth most powerful supercomputer in the world, to process a single second of human brain activity.

It took the computer 40 minutes to process that single second of brain activity.

For machines to reach the level of understanding that’s required for today’s applications and news organizations, then, would require those machines to sift through astronomical amounts of data, separating the meaningful from the meaningless. Much like our brains consciously process only a fraction of the information they store, a machine that could separate the wheat from the chaff would be capable of extracting remarkable insights.

We live in the dawn of the computer age, but in the thirty years since personal computing went mainstream, we’ve seen little progress in how computers work on a fundamental level. They’ve gotten faster, smaller, and more powerful, but they still require huge amounts of human input to function. We tell them what to do, and they do it. But what if what we’re truly after is understanding? To endow machines with the ability to learn from us, to interact with us, to understand what we want? That’s the next phase in the evolution of computers.

Enter NLP

Natural Language Processing (NLP) is the catalyst that will spark that phase. NLP is a branch of Artificial Intelligence that allows computers to not just process, but to understand human language, thus eliminating the language barrier.

Chances are, you already use applications that employ NLP:

  • Google Translate: human language translation is already changing the way humans communicate, by breaking down language barriers.
  • Siri and Google Now: contextual services built into your smartphone rely heavily on NLP. NLP is why Google knows to show you directions when you say “How do I get home?”.

There are many other examples of NLP in products you already use, of course. The technology driving NLP, however, is not quite where it needs to be (which is why you get so frustrated when Siri or Google Now misunderstands you). In order to truly reach its potential, this technology, too, has a next step: understand you. It’s not enough to recognize generic human traits or tendencies; NLP has to be smart enough to adapt to your needs.

Most startups and developers simply don’t have the time or the resources to tackle these issues themselves. That’s where we come in. AYLIEN (that’s us) has combined three years of our own research with emerging academic studies on NLP to provide a set of common NLP functionalities in the form of an easy-to-use API bundle.

Announcing the AYLIEN Text Analysis API

The Text API consists of eight distinct Natural Language Processing, Information Retrieval, and Machine Learning APIs which, when combined, allow developers to extract meaning and insight from any document with ease.

Here’s how we do it.

Article Extraction

This tool extracts the main body of an article, removing all extraneous clutter, but leaving intact vital elements like embedded images and video.


Article Summarization

This one does what it says on the tin: summarizes a given article in just a few sentences.
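To give a feel for what summarization can involve, here is a minimal extractive sketch that scores sentences by the average frequency of their words and keeps the top few in their original order. This is a common textbook technique and an assumption made for illustration, not a description of how the API itself works. The stopword list and sample text are also invented.

```python
import re
from collections import Counter

# Small illustrative stopword list.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that",
             "for", "on", "with", "as", "was"}

def content_words(text):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, preserving their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    frequencies = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return [s for s in sentences if s in top]

sample = ("Solar power is growing fast. Solar panels turn sunlight into power. "
          "My cat sleeps all day.")
print(summarize(sample, max_sentences=2))
# ['Solar power is growing fast.', 'Solar panels turn sunlight into power.']
```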



Classification

The Classification feature uses a database of more than 500 categories to properly tag an article according to IPTC NewsCode standards.


Entity Extraction

This tool can extract any entities (people, locations, organizations) or values (URLS, emails, phone numbers, currency amounts and percentages) mentioned in a given text.
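The value types listed above lend themselves to pattern matching. The sketch below uses simplified regular expressions to pull URLs, email addresses, currency amounts and percentages out of a string. The patterns are assumptions for illustration only and will miss many real-world formats.

```python
import re

# Simplified, illustrative patterns. Real-world formats are far more varied.
PATTERNS = {
    "url": re.compile(r"https?://[^\s,]+"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "currency": re.compile(r"[$€£]\d+(?:,\d{3})*(?:\.\d+)?"),
    "percentage": re.compile(r"\d+(?:\.\d+)?%"),
}

def extract_values(text):
    """Return every match for each value type, keyed by type name."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}

sample = ("Email sales@example.com or visit https://example.com/pricing "
          "for 20% off the $1,499.99 plan.")
values = extract_values(sample)
print(values)
# {'url': ['https://example.com/pricing'], 'email': ['sales@example.com'],
#  'currency': ['$1,499.99'], 'percentage': ['20%']}
```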


Concept Extraction

Concept Extraction continues the work of Entity Extraction, linking the entities mentioned to the relevant DBPedia and Linked Data entries, including their semantic types (such as DBPedia and types).


Language Detection

Language Detection, of course, detects the language of a document from a database of 62 languages, returning that information in ISO 639-1 format.
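A toy version of language detection can be built by counting how many common function words from each language appear in a text. The sketch below covers only four languages with six hypothetical stopwords each and returns an ISO 639-1 code. It is a simplified assumption for illustration, and dependable detectors generally use character n-gram statistics over much more data.

```python
import re

# Tiny illustrative stopword lists keyed by ISO 639-1 language code.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "fr": {"le", "la", "et", "est", "les", "des"},
    "es": {"el", "la", "y", "es", "los", "de"},
    "de": {"der", "die", "und", "ist", "das", "nicht"},
}

def detect_language(text):
    """Return the code whose stopwords appear most often, or None if none match."""
    tokens = set(re.findall(r"[^\W\d_]+", text.lower()))
    scores = {code: len(words & tokens) for code, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(detect_language("The cat is in the garden"))       # en
print(detect_language("Le chat est dans le jardin"))     # fr
print(detect_language("El perro es de los vecinos"))     # es
print(detect_language("Der Hund ist nicht das Problem")) # de
```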


Sentiment Analysis

Sentiment Analysis detects the tone, or sentiment, of a text in terms of polarity (positive or negative) and subjectivity (subjective or objective).
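The two dimensions can be illustrated with a very small lexicon-based sketch: count positive and negative words to decide polarity, and treat any opinion word as a sign of subjectivity. The word lists are hypothetical, and the extra "neutral" label for ties is an assumption added for this example. Real sentiment models are trained on labelled data and handle negation, sarcasm and context.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"love", "great", "amazing", "funny", "best"}
NEGATIVE = {"hate", "boring", "awful", "worst", "confusing"}

def analyze_sentiment(text):
    """Return polarity and subjectivity labels based on simple word counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    if positives > negatives:
        polarity = "positive"
    elif negatives > positives:
        polarity = "negative"
    else:
        polarity = "neutral"  # assumption: ties and no opinion words are neutral
    subjectivity = "subjective" if positives + negatives > 0 else "objective"
    return {"polarity": polarity, "subjectivity": subjectivity}

print(analyze_sentiment("I love that ad, it was amazing"))
print(analyze_sentiment("That ad was boring and awful"))
print(analyze_sentiment("The game starts at six"))
```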


Hashtag Suggestion

Because discoverability is crucial to social media, Hashtag Suggestion automatically suggests ultra-relevant hashtags to engage audiences across social media.


This suite of tools is the result of years of research mixed with good, old-fashioned hard work. We’re excited about the future of the Semantic Web, and we’re proud to offer news organizations and developers an easy-to-use API bundle that gets us one step closer to realizing our vision.

We’re happy to announce that you can start using the Text API from today for free. Happy hacking, and let us know what you think.
