

There is a wealth of information hidden in the contents and the markup of a web page that can be extremely useful when trying to understand what a page is all about while trawling the web. One classic example would be tags: those short phrases or keywords that bloggers and publishers use to describe what a webpage, article or blog post is about. Tags can be rendered as visual elements on the page, or hidden away using `meta` attributes.
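As an illustration, hidden tags like these can be pulled out of a page's markup with a few lines of standard-library Python. The sketch below handles two common meta conventions (`name="keywords"` and the Open Graph-style `property="article:tag"`); the exact attributes vary by CMS, and the sample HTML here is made up for the example:

```python
from html.parser import HTMLParser

class MetaTagExtractor(HTMLParser):
    """Collects tag keywords from two common meta conventions:
    <meta name="keywords" content="a, b"> and
    <meta property="article:tag" content="a">."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or attrs.get("property") or "").lower()
        content = attrs.get("content") or ""
        if name == "keywords":
            # comma-separated keyword list
            self.tags.extend(t.strip() for t in content.split(",") if t.strip())
        elif name == "article:tag" and content:
            # one tag per meta element
            self.tags.append(content)

html = """<html><head>
<meta name="keywords" content="neural networks, Artificial Intelligence">
<meta property="article:tag" content="poker">
</head><body></body></html>"""

parser = MetaTagExtractor()
parser.feed(html)
print(parser.tags)  # → ['neural networks', 'Artificial Intelligence', 'poker']
```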


example of meta attributes

It is obvious that by extracting these tags we can learn a whole lot about any blog post or article that we are analyzing. They describe a piece of content the way their author or editor would, and they may contain various pieces of information such as the high-level topical category, the entities (people, places, organizations, etc.) mentioned, or the concepts that the article is about. This makes them an excellent source of information to leverage when classifying web pages.

The problem with extracting these tags is that the way they are structured and expressed differs greatly across web pages and sites. The different Content Management Systems used by blogs and news websites each have their own way of presenting metadata such as tags, making this information difficult to access and parse.


examples of visual tags from various blogging platforms

Today we are announcing the launch of a much-requested addition to our Article Extraction API that provides a uniform and standard interface for extracting tags from any blog post or article on the web.

Tag Extraction

We’ve supercharged the article extraction feature in our Text Analysis API to make it even easier to extract useful information from a webpage. Through our Article Extraction endpoint, users can already extract metadata such as the author name, publish date, main image, article title and main body of text from a page. But in many cases, a web page will contain other useful information, often in the form of tags.

The Tag Extraction feature will identify and extract any relevant tags present on a page, no matter how the page is structured or where the tags appear.
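For reference, a call to the Article Extraction endpoint is a single authenticated GET request. The sketch below builds (but does not send) such a request using only the standard library; the endpoint URL and header names follow the Text Analysis API documentation of the time, so check them against the current docs before use, and the app ID/key here are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_extract_request(page_url, app_id, app_key):
    """Build (but do not send) a GET request to the Article Extraction
    endpoint; the extracted tags come back in the `tags` field of the
    JSON response."""
    query = urlencode({"url": page_url})
    return Request(
        "https://api.aylien.com/api/v1/extract?" + query,
        headers={
            "X-AYLIEN-TextAPI-Application-ID": app_id,
            "X-AYLIEN-TextAPI-Application-Key": app_key,
        },
    )

req = build_extract_request(
    "https://www.wired.com/2017/02/libratus/", "MY_APP_ID", "MY_APP_KEY"
)
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` and decoding the JSON body would then give you the `tags`, `author`, `title` and other fields shown in the examples below.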

So how can these tags be used?

These extracted tags can be utilized in a number of ways:

To tag or classify a web page

The extracted tags can often give very useful insight into what a page is about. They are frequently added manually by the author, an editor or the web designer, meaning they can provide very accurate descriptions of the page.

Take, for example, these tags extracted from a Wired article on Artificial Intelligence below.

{
  "author": "Cade Metz",
  "image": "",
  "tags": [
    "neural networks",
    "Artificial Intelligence"
  ],
  "article": "For almost three weeks, Dong Kim sat at a casino in Pittsburgh and played poker against a machine. But Kim wasn’t just a...",
  "videos": [ ... ],
  "title": "Inside the Poker AI That Out-Bluffed the Best Humans",
  "publishDate": "2017-02-01T07:00:43+00:00",
  "feeds": [ ... ]
}

Classify a page according to a taxonomy

While these extracted tags can be useful for understanding a webpage, on their own they don’t classify content against a particular taxonomy. They can, however, be fed into our Classification by Taxonomy feature to categorize a piece of content or a page into predetermined categories efficiently.

First, we extract the tags from an Irish Times article on Conor McGregor:

{
  "author": "Emmet Malone",
  "image": "",
  "tags": [
    "Other Sports",
    "Nate Diaz",
    "Dana White",
    "Conor Mcgregor"
  ],
  "article": "When this imbroglio finally blows over, we can explore what Conor McGregor has against Connecticut. For now, let’s conce...",
  "videos": [ ... ],
  "title": "Conor McGregor lays cards on table in poker game with UFC",
  "publishDate": "2016-04-21T22:48:00+00:00",
  "feeds": [ ... ]
}

You’ll see the tags present in the results above.

Next, we use our Classification by Taxonomy feature to automatically categorize the content. You’ll see from the results below that it is correctly categorized as Sports and Martial Arts.

{
  "text": "UFC, Other Sports, Nate Diaz, Other, Dana White, Sport, Conor Mcgregor",
  "taxonomy": "iab-qag",
  "language": "en",
  "categories": [
    {
      "confident": true,
      "score": 0.22010621132863611,
      "label": "Sports",
      "links": [ ... ],
      "id": "IAB17"
    },
    {
      "confident": true,
      "score": 0.11470804569304427,
      "label": "Martial Arts",
      "links": [ ... ],
      "id": "IAB17-20"
    }
  ]
}
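The classification call takes plain text, so one simple way to wire the two features together (a sketch, not the only option) is to join the extracted tags into a single comma-separated string, which is how the `text` field in the classification results above appears to have been built:

```python
def tags_to_classifier_text(extracted):
    """Join extracted tags into a single string suitable as the `text`
    input of a classification call."""
    return ", ".join(extracted["tags"])

extracted = {
    "tags": ["UFC", "Other Sports", "Nate Diaz", "Other",
             "Dana White", "Sport", "Conor Mcgregor"],
}
print(tags_to_classifier_text(extracted))
# → UFC, Other Sports, Nate Diaz, Other, Dana White, Sport, Conor Mcgregor
```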

Classifying text-light web pages

In most NLP-driven page classification problems, you rely heavily on the main body of text on a page for context and understanding. However, some web pages contain little or no text, which makes them harder to classify or categorize. Common examples include pages that contain only a video, a collection of photos, or product-style pages like the one below.


an example of a product page that is very light on text

As an example, take the page above: it’s a product page from the Best Buy website. You’ll notice there’s very little text on the page to use as an analysis target, apart from a few headings. On top of that, there are lots of other elements like ads and buttons on the page, which make it even harder to scrape. Considering how different every product page is, it’s almost impossible to build a script or bot that will classify these pages reliably.

Using the Tag Extraction feature, however, means you can leverage other elements of the page, as explained above and shown in the results below. As you can see from the JSON results, the listed tags are precise and do an excellent job of describing the page in question.

{
  "author": "",
  "image": "",
  "tags": [
    "Smart Lights",
    "Switches & Plugs",
    "Smart Home",
    "Home Living",
    "Best Buy Canada",
    "Smart Lighting"
  ],
  "article": "The Hue Phoenix rises to the occasion when you want to create ambience and mood lighting. Using the Philips Hue app, you...",
  "videos": [ ... ],
  "title": "Philips Hue Phoenix Table Lamp - Opal White",
  "publishDate": "",
  "feeds": [ ... ]
}
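One way to put this into practice is a simple fallback rule: classify on the article body when there is enough of it, and fall back to the extracted tags for text-light pages. The threshold and field names below are illustrative assumptions, not part of the API:

```python
MIN_ARTICLE_CHARS = 250  # arbitrary threshold for this sketch

def classification_input(extracted, min_chars=MIN_ARTICLE_CHARS):
    """Prefer the article body; fall back to the extracted tags when
    the page is too light on text to classify reliably."""
    body = (extracted.get("article") or "").strip()
    if len(body) >= min_chars:
        return body
    return ", ".join(extracted.get("tags", []))

product_page = {
    "article": "",
    "tags": ["Smart Lights", "Smart Home", "Best Buy Canada"],
}
print(classification_input(product_page))
# → Smart Lights, Smart Home, Best Buy Canada
```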


Whatever your reason for understanding web pages at scale, this new feature provides a fantastic opportunity to dive even deeper into the content you analyze and to classify a wider variety of web pages.

Want to try it for yourself? Click the image below to sign up for free access and 1,000 calls per day with our Text Analysis API.

Text Analysis API - Sign up



Last week we showed you how we analyzed 2.2 million tweets associated with Super Bowl 51 to gauge the public’s reaction to the event. While the Atlanta Falcons and New England Patriots waged war on the field, a battle of ever-increasing popularity and importance was taking place off it. We are of course referring to the Super Bowl ads battle, where some of the biggest brands on Earth pay top dollar for a 30-second slot during one of sport’s greatest spectacles.

With roughly 35% of the US population tuning in to watch this year’s Super Bowl, it’s easy to see why brands pay what they do to be involved, which is in the region of $5 million for 30 seconds of airtime. Breaking it down, that’s over $166,000 per second!

So after analyzing how Twitter reacted to the game itself, we wanted to once again dive into the much anticipated and scrutinized battle of the brands, this time by looking at both Twitter’s reaction as well as online news content.

Our Process

In particular, we were interested in uncovering and investigating the following key areas:

  • Volume of tweets before, during and after the game
  • Sentiment of tweets before, during and after the game
  • Brand-specific public reactions
  • Brand-specific tweet volumes
  • Reaction from online news
  • Most mentioned brands and individuals in online news

To do so, we looked at both Twitter and the news.


We used the Twitter Streaming API to collect a total of around 2.2 million tweets that mentioned a selection of game and team-related keywords, hashtags and handles. Using the AYLIEN Text Analysis API, we then analyzed each of these tweets and visualized our results using Tableau.
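As an illustration of the counting step, here is a minimal sketch (with made-up sample tweets and a shortened brand list) of how brand mentions can be tallied from collected tweet texts; a real pipeline would also track hashtags and handles:

```python
from collections import Counter

# case-insensitive search strings → display name (illustrative subset)
BRANDS = {
    "pepsi": "Pepsi",
    "budweiser": "Budweiser",
    "avocados from mexico": "Avocados from Mexico",
    "snickers": "Snickers",
}

def brand_mention_counts(tweets):
    """Count tweets mentioning each tracked brand (naive substring
    match; a tweet can count towards several brands)."""
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        for needle, brand in BRANDS.items():
            if needle in text:
                counts[brand] += 1
    return counts

sample = [
    "Good luck @ladygaga! #PepsiHalftime",
    "That Budweiser ad though...",
    "Pepsi and Budweiser both going big tonight",
]
print(brand_mention_counts(sample))
```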


To analyze the reaction in online news content, we performed specific search queries using the AYLIEN News API, again using Tableau to visualize.

Our Predictions

Prior to the Super Bowl, we looked at and analyzed brand mention volumes in online news in an attempt to predict how the public would react to their ads when they aired during the game.

From our analysis, we selected our top 3 Super Bowl ads to watch out for and made predictions for each. We’ll expand on each prediction below and also look at the performance of some other interesting brands. Here are the brands we analyzed:

  • Pepsi
  • Budweiser
  • Avocados from Mexico
  • Intel
  • KFC
  • Snickers
  • T-Mobile
  • KIA

You can also check out the original blog post here: Using NLP & media monitoring to predict the winners and losers of the Super Bowl 51 ads battle.

So without further ado, let’s see how we did!

Tweet volumes by brand

To begin, we wanted to see which brands performed best in terms of tweet mentions. How many tweets contained a mention of each brand?

The chart below shows how brand-related chatter on Twitter developed before, during and after the game.

Straight away we can see that two brands in particular considerably outperformed the rest when it came to spikes in mention volumes: Pepsi and Avocados from Mexico.

These two brands along with Budweiser make up our top 3, with the others really failing to make much of an impact in comparison.

Perhaps the most interesting observation from this chart is the double volume spike for Pepsi, which came pre-game and mid-game. Let’s take a look at the reason behind this:


Our pre-game predictions

  • Huge Twitter mention volumes for Pepsi, owing to Lady Gaga’s performance.
  • Low mention volumes for LIFEWTR and Pepsi Zero Sugar.
  • Tame public reaction to LIFEWTR commercial and very low YouTube views.


In terms of tweet mention volume, Pepsi was the clear overall winner. The beverage giant focused their efforts on generating awareness around two new products: LIFEWTR and Pepsi Zero Sugar. LIFEWTR was given its own commercial in the first quarter, while Zero Sugar sponsored the halftime show.

Judging by the sheer volume of tweet chatter around Pepsi, you might assume that their ad and new products had been well received by the viewing public. However, as we predicted prior to the game, Pepsi’s high mention volume was mostly down to the fact that they sponsored the halftime show, which starred Lady Gaga. The two spikes visible in the chart below actually have very little to do with either product. Rather, they represent 1) a barrage of pre-game good-luck tweets for Gaga and 2) Twitter’s reaction to the singer’s halftime show performance.

Sentiment analysis of Pepsi tweets

The chart below shows volumes of positive and negative tweets before, during and after the game. It should be noted that the majority of tweets collected have neutral sentiment, and offer no opinion either way. We therefore exclude tweets with neutral sentiment.
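A minimal sketch of that filtering step, assuming each analyzed tweet carries a `polarity` field of "positive", "negative" or "neutral" (the field name here is an assumption; sentiment endpoints commonly return something of this shape):

```python
def polar_volumes(tweets):
    """Drop neutral tweets and tally positive vs negative volumes,
    mirroring how the sentiment charts in this post were built."""
    volumes = {"positive": 0, "negative": 0}
    for tweet in tweets:
        polarity = tweet["polarity"]
        if polarity in volumes:  # "neutral" tweets are excluded
            volumes[polarity] += 1
    return volumes

analyzed = [
    {"text": "Loved the Pepsi halftime show!", "polarity": "positive"},
    {"text": "Pepsi ad was ok I guess", "polarity": "neutral"},
    {"text": "Not a fan of that ad", "polarity": "negative"},
]
print(polar_volumes(analyzed))  # → {'positive': 1, 'negative': 1}
```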

Less than 5% of tweets mentioning Pepsi also included mentions of LIFEWTR or Zero Sugar, such was the dominance of Lady Gaga on Twitter. While Pepsi had a strong brand presence throughout the event, they perhaps failed to highlight their two new products. The chart below compares mentions of Pepsi and Lady Gaga, with an obvious winner;

Further evidence of Pepsi’s apparent failure to highlight their new products comes from the LIFEWTR ad’s performance on YouTube, where it currently has 1.2 million views. When compared to the likes of KIA and Mr. Clean, whose ads have 21.6 and 14.5 million views respectively, you can see how little impact the commercial had on viewers.

Watch: Pepsi’s Super Bowl ad for LIFEWTR


Our pre-game predictions

  • Most controversial ad this year
  • Ad content will be irrelevant, and a political debate will rage on Twitter


Budweiser’s Super Bowl ad, titled Born The Hard Way, depicts Adolphus Busch, the co-founder of Anheuser-Busch, arriving in the US from Germany with a dream of opening his own beer brewery. With its immigrant theme and opening line “You don’t look like you’re from around here”, the ad unintentionally fuelled a political debate online: Trump supporters saw it as a clear dig at the President’s planned travel ban, while Trump opponents saw it as a political statement and a celebration of immigrant history in the US.

Sentiment analysis of Budweiser tweets

The high volumes of both positive and negative tweets certainly back up our prediction that Budweiser would air the most controversial ad of Super Bowl 51. While positive sentiment outweighed negative throughout the event, there was clearly a strong split in opinion.

As expected, the ad itself was minimally discussed on Twitter. Rather, it was seen as a political statement, and one that many felt they needed to be either for or against. On one side of the fence, we had tweeters threatening to never buy Budweiser products again and using #boycottbudweiser in their tweets.

On the other side, we had people declaring their love for the brand, and encouraging others to go out and buy the beer!

Although Budweiser claim the ad was shot in the summer of 2016, long before the current controversy around Trump’s travel ban came to the fore, we are left to wonder whether their timing was mere coincidence or a well-planned publicity stunt.

Watch: Budweiser’s Super Bowl ad “Born The Hard Way”


Our pre-game predictions

  • Live format will inspire and drive high social engagement.
  • A popular cast, inclusion of horses and a fun theme will see Snickers near the top of our most liked ads in terms of positive Twitter sentiment.


Snickers made Super Bowl history this year by being the first brand to perform and broadcast their commercial live during the event. With the intrigue of a live performance, as well as the inclusion of superstars like Betty White and Adam Driver, we were excited to see how this one played out, particularly the reaction on social media.

Sentiment analysis of Snickers tweets

While we did pretty well predicting the reactions to both the Pepsi and Budweiser ads, we’ll put our hands up here and admit we got this one wrong!

Overall, mentions of Snickers on Twitter were very low. From the tweets we did gather, the vast majority of them had neutral sentiment, meaning viewers really didn’t feel strongly about the ad either way.

Whether they are well received or not, great ads tend to make people feel something. Unfortunately for Snickers, their innovative approach and live broadcast wasn’t enough to make up for an ad that failed to make viewers feel anything, and ultimately it fell flat.

Watch: Snickers LIVE Super Bowl ad

Lady Gaga vs. the brands

We showed earlier the impact Lady Gaga had on Pepsi’s tweet volumes, but to really drive home the full extent of Twitter’s reaction to the singer during the Super Bowl, we’ve compared total tweet volumes for each of the three brands covered in this post against Lady Gaga’s.

Tweet volumes: Lady Gaga, Pepsi, Budweiser & Snickers

Reaction in online news

Now that we’ve looked at some of the public reaction to Super Bowl 51 on Twitter, we wanted to also look at how the news reacted to the event, and in particular the ads battle. To do this, we began by looking at the most mentioned keywords in news stories mentioning “Super Bowl” and “ad” or “commercial”.

Note: We removed obvious, unhelpful and game-related keywords such as Houston, football, Falcons, Tom Brady, etc.

What were the most talked about Super Bowl topics in the news?

The bubble chart below shows the most mentioned topics from online news content in the week immediately following the Super Bowl. The bigger the bubble, the higher the mention count. You can hover over each bubble to view more information and data.

It’s hard to escape politics these days, and the Super Bowl was no exception. With the likes of Budweiser, Airbnb, 84 Lumber, Audi and Coca-Cola all airing ads that related to the current political climate, it is no surprise to see these brands mentioned most alongside keywords like “political” and “Donald Trump”.

Most mentioned individuals

With brands mostly dominating the previous chart, we decided to narrow our focus to the individuals who were mentioned most. Again, Donald Trump tops the list, followed by Lady Gaga, Melissa McCarthy (KIA ad) and Justin Bieber (T-Mobile ad).


As we touched on in our previous post, the modern-day Super Bowl is becoming increasingly less about the game itself, and more about the surrounding hype, entertainment and commercial opportunities that come with an event of such magnitude.

With top brands spending a minimum of $5 million for a 30-second commercial, what seems like a heavy investment can result in a big increase in brand awareness as viewers promote ads through shares and likes on social media. There is uncapped potential for these ads too. Create something special that connects with, amuses or fascinates viewers, and your ad may be viewed and shared for years to come.

Thanks to advancements in Natural Language Processing and Text Analysis, brands can analyze ad performance down to the minutest of details and gain powerful insights in their quest to create commercial content that resonates with viewers.



Our researchers at AYLIEN keep abreast of and contribute to the latest developments in the field of Machine Learning. Recently, two of our research scientists, John Glover and Sebastian Ruder, attended NIPS 2016 in Barcelona, Spain. In this post, Sebastian highlights some of the stand-out papers and trends from the conference.


The Conference on Neural Information Processing Systems (NIPS) is one of the two top conferences in machine learning. It took place for the first time in 1987 and is held every December, historically in close proximity to a ski resort. This year, it took place in sunny Barcelona. The conference (including tutorials and workshops) ran from Monday, December 5 to Saturday, December 10. The full conference program is available here.

Machine Learning seems to grow more pervasive every month. However, it is still sometimes hard to keep track of the actual extent of this development. One of the most accurate barometers for this evolution is the growth of NIPS itself. The number of attendees skyrocketed at this year’s conference, growing by over 50% year-over-year.


Image 1: The growth of the number of attendees at NIPS follows (the newly coined) Terry’s Law (named after Terrence Sejnowski, the president of the NIPS foundation; faster growth than Moore’s Law)

Unsurprisingly, Deep Learning (DL) was by far the most popular research topic, with roughly one in four of the more than 2,500 submitted papers (and 568 accepted papers) dealing with deep neural networks.


Image 2: Distribution of topics across all submitted papers (Source: The review process for NIPS 2016)

On the other hand, the distribution of research paper topics has quite a long tail and reflects the diversity of topics at the conference that span everything from theory to applications, from robotics to neuroscience, and from healthcare to self-driving cars.

Generative Adversarial Networks

One of the hottest developments within Deep Learning was Generative Adversarial Networks (GANs). These minimax game-playing networks have by now won the favor of many luminaries in the field. Yann LeCun hails them as the most exciting development in ML in recent years. The organizers and attendees of NIPS seem to side with him: NIPS featured a tutorial by Ian Goodfellow about his brainchild, which led to a packed main conference hall.


Image 3: A full conference hall at the GAN tutorial

Though a fairly recent development, there are many cool extensions of GANs among the conference papers:

  • Reed et al. propose a model that allows you to specify not only what you want to draw (e.g. a bird) but also where to put it in an image.
  • Chen et al. disentangle factors of variation in GANs by representing them with latent codes. The resulting models allow you to adjust e.g. the type of a digit, its breadth and width, etc.

In spite of their popularity, we know alarmingly little about what makes GANs so capable of generating realistic-looking images. In addition, making them work in practice is an arduous endeavour, and a lot of (undocumented) hacks are necessary to achieve the best performance. Soumith Chintala presented a collection of these hacks in his “How to train your GAN” talk at the Adversarial Training workshop.


Image 4: How to train your GAN (Source: Soumith Chintala)

Yann LeCun muses in his keynote that the development of GANs parallels the history of neural networks themselves: They were poorly understood and hard to get to work in the beginning and only took off once researchers figured out the right tricks and learned how to make them work. At this point, it seems unlikely that GANs will experience a winter anytime soon; the research community is still at the beginning in learning how to make the best use of them and it will be exciting to see what progress we can make in the coming years.

On the other hand, the success of GANs so far has been limited mostly to Computer Vision due to their difficulty in modelling discrete rather than continuous data. The Adversarial Training workshop showcased some promising work in this direction (see e.g. our own John Glover’s paper on modeling documents, this paper and this paper on generating text, and this paper on adversarial evaluation of dialogue models). It remains to be seen if 2017 will be the year in which GANs break through in NLP.

The Nuts and Bolts of Machine Learning

Andrew Ng gave one of the best tutorials of the conference with his take on building AI applications using Deep Learning. Drawing on his experience of managing the 1,300-person AI team at Baidu and hundreds of applied AI projects, and equipped solely with two whiteboards, he shared many insights about how to build and deploy AI applications in production.

Besides better hardware, Ng attributes the success of Deep Learning to two factors: first, in contrast to traditional methods, deep NNs are able to learn more effectively from large amounts of data; second, end-to-end (supervised) Deep Learning allows us to map from inputs directly to outputs.

While this approach to training chatbots or self-driving cars is sufficient for writing innovative research papers, Ng emphasized that end-to-end DL is often not production-ready: a chatbot that maps from text directly to a response is not able to have a coherent conversation or fulfill a request, while mapping from an image directly to a steering command might have literally fatal side effects if the model has not encountered the corresponding part of the input space before. Rather, for a production model, we still want intermediate steps: for a chatbot, we prefer to have an inference engine that generates a response, while in a self-driving car, DL is used to identify obstacles, and the steering is performed by a traditional planning algorithm.


Image 5: Andrew Ng on end-to-end DL (right: end-to-end DL chatbot and chatbot with inference engine; left bottom: end-to-end DL self-driving car and self-driving car with intermediate steps)

Ng also shared that the most common mistake he sees in project teams is tracking the wrong metrics: in an applied machine learning project, the only relevant metrics are the training error, the development error, and the test error. These metrics alone enable the project team to know what steps to take, as he demonstrated in the diagram below:


Image 6: Andrew Ng’s flowchart for applied ML projects

A key facilitator of the recent success of ML has been the advances in hardware that allow faster computation and storage. Given that Moore’s Law will reach its limits sooner or later, one might reason that the rise of ML might also plateau. Ng, however, argued that the commitment by leading hardware manufacturers such as NVIDIA and Intel, and the ensuing performance improvements to ML hardware, will fuel further growth.

Among ML research areas, supervised learning is the undisputed driver of the recent success of ML and will likely continue to drive it for the foreseeable future. In second place, Ng saw neither unsupervised learning nor reinforcement learning, but transfer learning. We at AYLIEN are bullish on transfer learning for NLP and think that it has massive potential.

Recurrent Neural Networks

The conference also featured a symposium dedicated to Recurrent Neural Networks (RNNs). The symposium coincided with the 20-year anniversary of LSTM…


Image 7: Jürgen Schmidhuber kicking off the RNN symposium

… being rejected from NIPS 1996. The fact that papers that do not use LSTMs have been rare in the most recent NLP conferences (see our EMNLP blog post) is a testament to the perseverance of the authors of the original paper, Sepp Hochreiter and Jürgen Schmidhuber.

At NIPS, several papers sought to improve RNNs in different ways, while other improvements apply to Deep Learning in general:

  • Salimans and Kingma propose Weight Normalisation to accelerate training that can be applied in two lines of Python code.
  • Li et al. propose a multinomial variant of dropout that sets neurons to zero depending on the data distribution.
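The reparameterisation behind Salimans and Kingma's Weight Normalisation really is tiny: a weight vector w is rewritten as w = g · v / ‖v‖, so its norm is learned via the scalar g, decoupled from its direction v. A plain-Python sketch of the idea (not tied to any particular framework):

```python
import math

def weight_norm(v, g):
    """Weight Normalisation: reparameterise a weight vector as
    w = g * v / ||v||, so that ||w|| == g regardless of v's scale."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

w = weight_norm([3.0, 4.0], g=2.0)
print(w)  # → [1.2, 1.6]; the norm of w equals g (2.0)
```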

The Neural Abstract Machines & Program Induction (NAMPI) workshop also featured several speakers talking about RNNs:

  • Alex Graves focused on his recent work on Adaptive Computation Time (ACT) for RNNs, which allows the processing time to be decoupled from the sequence length. He showed that a word-level language model with ACT could reach state-of-the-art performance with fewer computations.
  • Edward Grefenstette outlined several limitations and potential future research directions in the context of RNNs in his talk.

Improving classic algorithms

While Deep Learning is a fairly recent development, the conference featured also several improvements to algorithms that have been around for decades:

  • Ge et al. show in their best paper that the non-convex objective for matrix completion has no spurious local minima, i.e. every local minimum is a global minimum.
  • Bachem et al. present a method that guarantees accurate and fast seedings for large-scale k-means++ clustering. The presentation was one of the most polished ones of the conference and the code is open-source and can be installed via pip.
  • Ashtiani et al. show that we can make NP-hard k-means clustering problems solvable by allowing the model to pose queries for a few examples to a domain expert.

Reinforcement Learning

Reinforcement Learning (RL) was another much-discussed topic at NIPS with an excellent tutorial by Pieter Abbeel and John Schulman dedicated to RL. John Schulman also gave some practical advice for getting started with RL.

One of the best papers of the conference introduces Value Iteration Networks, which learn to plan by providing a differentiable approximation to a classic planning algorithm via a CNN. This paper was another cool example of one of the major benefits of deep neural networks: They allow us to learn increasingly complex behaviour as long as we can represent it in a differentiable way.

During the week of the conference, several research environments for RL were simultaneously released, among them OpenAI’s Universe, Deep Mind Lab, and FAIR’s Torchcraft. These will likely be a key driver in future RL research and should open up new research opportunities.

Learning-to-learn / Meta-learning

Another topic that came up in several discussions over the course of the conference was Learning-to-learn or Meta-learning:

  • Andrychowicz et al. learn an optimizer in a paper with the ingenious title “Learning to learn by gradient descent by gradient descent”.
  • Vinyals et al. learn how to one-shot learn in a paper that frames one-shot learning in the sequence-to-sequence framework and has inspired new approaches for one-shot learning.

Most of the existing papers on meta-learning demonstrate that wherever you are doing something that gives you gradients, you can optimize them using another algorithm via gradient descent. Prepare for a surge of “Meta-learning for X” and “(Meta-)+learning” papers in 2017. It’s LSTMs all the way down!

Meta-learning was also one of the key talking points at the RNN symposium. Jürgen Schmidhuber argued that a true meta-learner would be able to learn in the space of all programs and would have the ability to modify itself and elaborated on these ideas at his talk at the NAMPI workshop. Ilya Sutskever remarked that we currently have no good meta-learning models. However, there is hope as the plethora of new research environments should also bring progress in this area.

General Artificial Intelligence

Learning how to learn also plays a role in the pursuit of the elusive goal of attaining General Artificial Intelligence, which was a topic in several keynotes. Yann LeCun argued that in order to achieve General AI, machines need to learn common sense. While common sense is often vaguely mentioned in research papers, Yann LeCun gave a succinct explanation of what common sense is: “Predicting any part of the past, present or future percepts from whatever information is available.” He called this predictive learning, but noted that this is really unsupervised learning.

His talk also marked an appearance of the controversial and often tongue-in-cheek-copied image of a cake, which he used to argue that unsupervised learning is the most challenging task and the one on which we should concentrate our efforts, while RL is only the cherry on the cake.


Image 8: The Cake slide of Yann LeCun’s keynote

Drew Purves focused on the bilateral relationship between the environment and AI in what was probably the most aesthetically pleasing keynote of the conference (just look at those graphics!)


Image 9: Graphics by Max Cant of Drew Purves’ keynote (Source: Drew Purves)

He emphasized that while simulations of ecological tasks in naturalistic environments could be an important test bed for General AI, General AI is needed to maintain the biosphere in a state that will allow the continued existence of our civilization.


Image 10: Nature needs AI and AI needs Nature from Drew Purves’ keynote

While it is frequently (and incorrectly) claimed that neural networks work so well because they emulate the brain’s behaviour, Saket Navlakha argued during his keynote that we can still learn a great deal from the engineering principles of the brain. For instance, rather than pre-allocating a large number of neurons, the brain generates thousands of synapses per minute until its second year. Afterwards, until adolescence, the number of synapses is pruned, decreasing by roughly 50%.


Image 11: Saket Navlakha’s keynote

It will be interesting to see how neuroscience can help us to advance our field further.

In the context of the Machine Intelligence workshop, another environment was introduced in the form of FAIR’s CommAI-env, which allows agents to be trained through interaction with a teacher. During the panel discussion, the ability to learn hierarchical representations and to identify patterns was emphasized. However, although the field is making rapid progress on standard tasks such as object recognition, it is unclear whether the focus on such specific tasks actually brings us closer to General AI.

Natural Language Processing

While NLP is more of a niche topic at NIPS, there were a few papers with improvements relevant to NLP:

  • He et al. propose a dual learning framework for MT in which two agents translate in opposite directions and teach each other via reinforcement learning.
  • Sokolov et al. explore how to use structured prediction under bandit feedback.
  • Huang et al. extend Word Mover’s Distance, an unsupervised document similarity metric, to the supervised setting.
  • Lee et al. model the helpfulness of reviews by taking into account position and presentation biases.

Finally, a workshop on learning methods for dialogue explored how end-to-end systems, linguistics and ML methods can be used to create dialogue agents.



Jürgen Schmidhuber, the father of the LSTM, was not only present on several panels, but did his best to remind everyone that whatever your idea, he had a similar idea two decades ago and you had better cite him lest he interrupt your tutorial.



Boston Dynamics’ Spot proved that, even though everyone is excited by learning and learning-to-learn, traditional planning algorithms are enough to win the admiration of a hall full of learning enthusiasts.


Image 12: Boston Dynamics’ Spot amid a crowd of fascinated onlookers


Apple, one of the most secretive companies in the world, has decided to be more open, to publish, and to engage with academia. This can only be good for the community. We’re looking forward to more Apple research papers.


Image 13: Ruslan Salakhutdinov at the Apple lunch event


Uber announced their acquisition of Cambridge-based AI startup Geometric Intelligence and threw one of the most popular parties of NIPS.


Image 14: The Geometric Intelligence logo

Rocket AI

Speaking of startups, the “launch” of Rocket AI and their patented Temporally Recurrent Optimal Learning had some people fooled (note the acronym). Riva-Melissa Tez finally cleared up the confusion.


These were our impressions from NIPS 2016. We had a blast and hope to be back in 2017!


Text Analysis API - Sign up



The 2016 US Presidential election was one of (if not the) most controversial in the nation’s history. With the end prize being arguably the most powerful job in the world, the two candidates were always going to find themselves coming under intense media scrutiny. With more media outlets covering this election than any that have come before it, an increase in media attention and influence was a given.

But how much of an influence does the media really have on an election? Does journalistic bias sway voter opinion, or does voter opinion (such as poll results) generate journalistic bias? Does the old adage “all publicity is good publicity” ring true at election time?

“My sense is that what we have here is a feedback loop. Does media attention increase a candidate’s standing in the polls? Yes. Does a candidate’s standing in the polls increase media attention? Also yes.” -Jonathan Stray @jonathanstray

Thanks to an ever-increasing volume of media content flooding the web, paired with advances in natural language processing and text analysis capabilities, we are in a position to delve deeper into these questions than ever before, and by analyzing the final sixty days of the 2016 US Presidential election, that’s exactly what we set out to do.

So, where did we start?

We started by building a very simple search using our News API to scan thousands of monitored news sources for articles related to the election. These articles, 170,000 in total, were then indexed automatically using our text analysis capabilities in the News API.

This meant that key data points in those articles were identified and indexed to be used for further analysis:

  • Keywords
  • Entities
  • Concepts
  • Topics

With each of the articles or stories sourced comes granular metadata such as publication time, publication source, source location, journalist name and sentiment polarity of each article. Combined, these data points provided us with an opportunity to uncover and analyze trends in news stories relating to the two presidential candidates.

We started with a simple count of how many times each candidate was mentioned from our news sources in the sixty days leading up to election day, as well as the keywords that were mentioned most.


By extracting keywords from the news stories we sourced, we get a picture of the key players, topics, organizations and locations that were mentioned most. We generated the interactive chart below using the following steps;

  1. We called the News API using the query below.
  2. We called it again, but searched for “Trump NOT Clinton”.
  3. Mentions of the two candidates naturally dominated in both sets of results so we removed them in order to get a better understanding of the keywords that were being used in articles written about them. We also removed some very obvious and/or repetitive words such as USA, America, White House, candidate, day, etc.

Here’s the query;
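For illustration, boolean title queries like these can be composed programmatically before being sent to the API. The helper below is a sketch under our own assumptions; it is not the News API's actual query syntax:

```python
# Illustrative only: build "X NOT Y" boolean title queries so each result
# set mentions one candidate but excludes the other.
def candidate_query(include, exclude):
    """Return a boolean query string including one name, excluding the other."""
    return '"{0}" AND NOT "{1}"'.format(include, exclude)

clinton_query = candidate_query("Hillary Clinton", "Donald Trump")
trump_query = candidate_query("Donald Trump", "Hillary Clinton")

print(clinton_query)  # "Hillary Clinton" AND NOT "Donald Trump"
print(trump_query)    # "Donald Trump" AND NOT "Hillary Clinton"
```

The NOT operator is what keeps the two result sets cleanly separated, so articles mentioning both candidates don't muddy the keyword clusters.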

You can hover your cursor over each cluster to view details;

Most mentioned keywords in articles about Hillary Clinton

Straight away, bang in the middle of these keywords, we can see FBI and right beside it, emails.

Most mentioned keywords in articles about Donald Trump

Similar to Hillary, Trump’s main controversies appear most prominently in his keywords, with terms like women, video, sexual and assault all appearing prominently.

Most media mentions

If this election was decided by the number of times a candidate was mentioned in the media, who would win? We used the following search queries to total the number of mentions from all sources over the sixty days immediately prior to election day;

Note: We could also have performed this search with a single query, but we wanted to separate the candidates for further analysis, and in doing this, we removed overlapping stories with titles that mentioned both candidates.

Here’s what we found, visualized;

Who was mentioned more in the media? Total mentions volume:

It may come as no surprise that Trump was mentioned considerably more than Clinton during this period, but was he consistently more prominent in the news over these sixty days, or was there perhaps a major story that has skewed the overall results? By using the Time Series endpoint, we can graph the volume of stories over time.
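Conceptually, a time series like this is just a count of matching stories bucketed by publication day. A toy sketch of that aggregation, using invented stories rather than real API output:

```python
from collections import Counter
from datetime import date

# Toy stand-in for stories returned by a search: (published_date, title).
stories = [
    (date(2016, 10, 10), "Trump tape scandal deepens"),
    (date(2016, 10, 10), "More accusations surface"),
    (date(2016, 10, 11), "Campaign responds to allegations"),
]

# Count stories per day -- conceptually what a time-series endpoint returns.
volume_by_day = Counter(d for d, _ in stories)
print(volume_by_day[date(2016, 10, 10)])  # 2
```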

We generated the following chart using results from the two previous queries;

How media mentions for both candidates fluctuated in the final 60 days

As you would expect, the volume of mentions for each candidate fluctuates throughout the sixty day period, and to answer our previous question – yes, Donald Trump was consistently more prominent in terms of media mentions throughout this period. In fact, he was mentioned more than Hillary Clinton in 55 of the 60 days.

Let’s now take a look at some of the peak mention periods for each candidate to see if we can uncover the reasons for the spikes in media attention;

Donald Trump

Trump’s peak period of media attention was October 10-13, as indicated by the highest red peak in the graph above. This period represented the four highest individual days of mention volume and can be attributed to the scandal that arose from sexual assault accusations and a leaked tape showing Trump making controversial comments about groping women.

The second highest peak, October 17-20, coincides with a more positive period for Trump, as a combination of a strong final presidential debate and a growing email scandal surrounding Hillary Clinton increased his media spotlight.

Hillary Clinton

Excluding the sharp rise in mentions just before election day, Hillary’s highest volume days in terms of media mentions occurred from October 27-30 as news of the re-emergence of an FBI investigation surfaced.

So we’ve established the dates during the sixty days when each candidate was at their peak of media attention. Now we want to try to establish the sentiment polarity of the stories that were being written about each candidate throughout this period. In other words, we want to know whether stories were being written in a positive, negative or neutral way. To achieve this, we performed Sentiment Analysis.

Sentiment analysis

Sentiment Analysis is used to detect positive or negative polarity in text. Also known as opinion mining, sentiment analysis is a field of text analysis and natural language processing (NLP) research that is growing in popularity as a multitude of use cases emerge. Put simply, we perform Sentiment Analysis to uncover whether a piece of text is written in a positive, negative or neutral manner.
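To make the idea concrete, here is a deliberately crude lexicon-based scorer. Real sentiment analysis (including ours) uses trained models rather than a hand-picked word list, so treat this purely as an illustration of the positive/negative/neutral distinction:

```python
# Toy lexicons -- these word lists are invented for illustration.
POSITIVE = {"strong", "win", "positive", "support"}
NEGATIVE = {"scandal", "assault", "controversial", "negative"}

def polarity(text):
    """Label text positive/negative/neutral by counting lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("a strong debate win"))      # positive
print(polarity("another scandal emerges"))  # negative
```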

Note: The vast majority of news articles about the election will undoubtedly contain mentions of both Trump and Clinton. We therefore decided to only count stories with titles that mentioned just one candidate. We believe this significantly increases the likelihood that the article was written about that candidate. To achieve this, we generated search queries that included one candidate while excluding the other. The News API supports boolean operators, making such search queries possible.

First of all, we wanted to compare the overall sentiment of all stories with titles that mentioned just one candidate. Here are the two queries we used;

And here are the visualized results;

What am I seeing here? Blue represents articles written in a neutral manner, red in a negative manner and green in a positive manner. Again, you can hover over the graph to view more information.

What was the overall media sentiment towards Hillary Clinton?

What was the overall media sentiment towards Donald Trump?

Those of you that followed the election, to any degree, will probably not be surprised by these results. We don’t really need data to back up the claim that Trump ran the more controversial campaign and therefore generated more negative press.

Again, similar to how we previously graphed mention volumes over time, we also wanted to see how sentiment in the media fluctuated throughout this sixty day period. First we’ll look at Clinton’s mention volume and see if there is any correlation between mention volume and sentiment levels.

Hillary Clinton

How to read this graph: The top half (blue) represents fluctuations in the number of daily media mentions (‘000’s) for Hillary Clinton. The bottom half represents fluctuations in the average sentiment polarity of the stories in which she was mentioned. Green = positive and red = negative.

You can hover your cursor over the data points to view more in-depth information.

Mentions Volume (top) vs. Sentiment (bottom) for Hillary Clinton

From looking at this graph, one thing becomes immediately clear; as volume increases, polarity decreases, and vice versa. What does this tell us? It tells us that perhaps Hillary was in the news for the wrong reasons too often – there were very few occasions when both volume and polarity increased simultaneously.

Hillary’s average sentiment remained positive for the majority of this period. However, that sharp dip into the red circa October 30 came just a week before election day. We must also point out the black line that cuts through the bottom half of the graph. This is a trend line representing average sentiment polarity and as you can see, it gets consistently closer to negative as election day approaches.

Mentions Volume (top) vs. Sentiment (bottom) for Donald Trump

Trump’s graph paints a different picture altogether. There was not a single day when his average polarity entered into the positive (green). What’s interesting to note here, however, is how little his mention volumes affected his average polarity. While there are peaks and troughs, there were no major swings in either direction, particularly in comparison to those seen on Hillary’s graph.

These results are of course open to interpretation, but what is becoming evident is that perhaps negative stories in the media did more damage to Clinton’s campaign than they did to Trump’s. While Clinton’s average sentiment polarity remained consistently more positive, Trump’s didn’t appear to be as badly affected when controversial stories emerged. He was consistently controversial!

Trump’s lowest point, in terms of negative press, came just after the second presidential debate at the end of September. What came after this point is the crucial detail, however. Trump’s average polarity recovered and mostly improved for the remainder of the campaign. Perhaps critically, we see his highest and most positive averages of this period in the final 3 weeks leading up to election day.

Sentiment from sources

At the beginning of this post we mentioned the term media bias and questioned its effect on voter opinion. While we may not be able to prove this effect, we can certainly uncover any traces of bias from media content.

What we would like to uncover is whether certain sources (i.e. publications) write more or less favorably about either candidate.

To test this, we’ve analyzed the sentiment of articles written about both candidates from two publications: USA Today and Fox News.

USA Today


Similar to the overall sentiment (from all sources) displayed previously, the sentiment polarity of articles from USA Today shows consistently higher levels of negative sentiment towards Donald Trump. The larger-than-average percentage of neutral results indicates that USA Today took a more objective approach in their coverage of the election.

USA Today – Sentiment towards Hillary Clinton

USA Today – Sentiment towards Donald Trump

Fox News

Again, Trump dominates in relation to negative sentiment from Fox News. However, what’s interesting to note here is that Fox produced more than double the percentage of negative story titles about Hillary Clinton than USA Today did. We also found that, percentage-wise, they produced half as many positive stories about her. Also, 3.9% of Fox’s Trump coverage was positive, versus USA Today’s 2.5%.

Fox News – Sentiment towards Hillary Clinton

Fox News – Sentiment towards Donald Trump

Media bias?

These figures raise the question: how can two major news publications write about the exact same news with such varied levels of sentiment? It certainly highlights the potential influence that the media can have on voter opinion, especially when you consider how many people see each article or headline. The figures below represent social shares for a single news article;

Social share figures for a single news article

Bear in mind, these figures don’t represent the number of people who saw the article, they represent the number of people who shared it. The actual number of people who saw this on their social feed will be a high-multiple of these figures. In fact, we grabbed the average daily social shares, per story, and graphed them to compare;

Average social shares per story

Pretty even, and despite Trump being mentioned over twice as many times as Clinton during this sixty day period, he certainly didn’t outperform her when it came to social shares.


Since the 2016 US election was decided there has been a sharp focus on the role played by news and media outlets in influencing public opinion. While we’re not here to join the debate, we are here to show you how you can deep-dive into news content at scale to uncover some fascinating and useful insights that can help you source highly targeted and precise content, uncover trends and assist in decision making.

To start using our News API for free and query the world’s news content easily, click here.

News API - Sign up



It’s certainly an exciting time to be involved in Natural Language Processing (NLP), not only for those of us who are involved in the development and cutting-edge research that is powering its growth, but also for the multitude of organizations and innovators out there who are finding more and more ways to take advantage of it to gain a competitive edge within their respective industries.

With the global NLP market expected to grow to a value of $16 billion by 2021, it’s no surprise to see the tech giants of the world investing heavily and competing for a piece of the pie. More than 30 private companies working to advance artificial intelligence technologies have been acquired in the last 5 years by corporate giants competing in the space, including Google, Yahoo, Intel, Apple and Salesforce. [1]

It’s not all about the big boys, however, as NLP, text analysis and text mining technologies are becoming more and more accessible to smaller organizations, innovative startups and even hobbyist programmers.

NLP is helping organizations make sense of vast amounts of unstructured data, at scale, giving them a level of insight and analysis that they could have only dreamed about even just a couple of years ago.

Today we’re going to take a look at 3 industries on the cusp of disruption through the adoption of AI and NLP technologies;

  1. The legal industry
  2. The insurance industry
  3. Customer service

NLP & Text Analysis in the Legal industry

While we’re still a long, long way from robot lawyers, today’s legal professionals are already taking advantage of NLP, text mining and text analysis techniques and technologies to make better-informed decisions in less time, by discovering key insights that are often buried in large volumes of data, or that may seem irrelevant until analyzed at scale, where they can reveal strategy-boosting and often case-changing trends.

Let’s take a look at three examples of how legal pros are leveraging NLP and text analysis technologies to their advantage;

  • Information retrieval in ediscovery
  • Contract management
  • Article summarization

Information retrieval in ediscovery

Ediscovery refers to discovery in legal proceedings such as litigation, government investigations, or Freedom of Information Act requests, where the information sought is in electronic format. Electronic documents are often accompanied by metadata that is not found on paper documents, such as the date and time the document was written, shared, etc. This level of minute detail can be crucial in legal proceedings.

As far as NLP is concerned, ediscovery is mainly about information retrieval, aiding legal teams in their search for relevant and useful documents.

In many cases, the amount of data requiring analysis can exceed 100GB, when often only 5% – 10% of it is actually relevant. With outside service bureaus charging $1,000 per GB to filter and reduce this volume, you can start to see how costs can quickly soar.

Data can be filtered and separated by extracting mentions of specific entities (people, places, currency amounts, etc), including/excluding specific timeframes and in the case of email threads, only include mails that contain mentions of the company, person or defendant in question.
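A toy sketch of that kind of filtering, using plain substring matching as a stand-in for real entity extraction (the documents, company name and dates below are all invented):

```python
from datetime import date

# Toy document set: (date written, text). Real ediscovery pipelines use
# proper named-entity recognition, not substring search.
docs = [
    (date(2015, 3, 1), "Email from Acme Corp about the merger"),
    (date(2016, 6, 5), "Lunch plans, nothing relevant"),
    (date(2016, 7, 9), "Acme Corp invoice discrepancies noted"),
]

def relevant(docs, entity, start, end):
    """Keep documents inside the timeframe that mention the entity."""
    return [text for d, text in docs if start <= d <= end and entity in text]

hits = relevant(docs, "Acme Corp", date(2016, 1, 1), date(2016, 12, 31))
print(hits)  # only the July 2016 Acme Corp document survives both filters
```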

Contract management

NLP enables contract management departments to extract key information, such as currency amounts and dates, to generate reports that summarize terms across contracts, allowing for comparisons among terms for risk assessment purposes, budgeting and planning.

In cases relating to Intellectual Property disputes, attorneys are using NLP and text mining techniques to extract key information from sources such as patents and public court records to help give them an edge with their case.

Article summarization

Legal documents can be notoriously long and tedious to read through in their entirety. Sometimes all that is required is a concise summary of the overall text to help gain an understanding of its content. Summarization of such documents is possible with NLP, where a defined number of sentences are selected from the main body of text to create, for example, a summary of the top 5 sentences that best reflect the content of the document as a whole.
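The idea can be sketched with a bare-bones frequency-based extractor that ranks sentences by how common their words are across the document; production summarizers are considerably more sophisticated:

```python
import re
from collections import Counter

def summarize(text, n=2):
    """Pick the n sentences whose words are most frequent overall --
    a minimal extractive summarizer, kept in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(s):
        words = re.findall(r"[a-z']+", s.lower())
        return sum(freqs[w] for w in words) / max(len(words), 1)

    chosen = set(sorted(sentences, key=score, reverse=True)[:n])
    return [s for s in sentences if s in chosen]

doc = ("The contract terms must be reviewed. "
       "The contract terms cover payment. "
       "An unrelated remark appears here.")
print(summarize(doc))  # the two contract sentences
```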

NLP & Text Analysis in the Insurance industry

Insurance providers gather massive amounts of data each day from a variety of channels, such as their website, live chat, email, social networks, agents and customer care reps. Not only is this data coming in from multiple channels, it also relates to a wide variety of issues, such as claims, complaints, policies, health reports, incident reports, customer and potential customer interactions on social media, email, live chat, phone… the list goes on and on.

The biggest issue plaguing the insurance industry is fraud. Let’s take a look at how NLP, data mining and text analysis techniques can help insurance providers tackle this and other key issues;

  • Streamline the flow of data to the correct departments/agents
  • Improve agent decision making by putting timely and accurate data in front of them
  • Improve SLA response times and overall customer experience
  • Assist in the detection of fraudulent claims and activity

Streamlining the flow of data

That barrage of data and information that insurance companies are being hit by each and every day needs to be intricately managed, stored, analyzed and acted upon in a timely manner. A missed email or note may not only result in poor service and an upset customer, it could potentially cost the company financially if, for example, relevant evidence in a dispute or claim case fails to surface or reach the right person/department on time.

Natural Language Processing is helping insurance providers ensure the right data reaches the right set of eyeballs at the right time through automated grouping and routing of queries and documents. This goes beyond simple keyword-matching with text analysis techniques used to ‘understand’ the context and category of a piece of text and classify it accordingly.

Fraud detection

According to a recent report by Insurance Europe, detected and undetected fraudulent claims are estimated to represent 10% of all claims expenditure in Europe. Of note here, of course, is the fraud that goes undetected.

Insurance companies are using NLP and text analysis techniques to mine the data contained within unstructured sources such as applications, claims forms and adjuster notes to unearth certain red flags in submitted claims. For example, a regular indicator of organized fraudulent activity is the appearance of common phrases or descriptions of incidents from multiple claimants. The trained human eye may or may not be able to spot such instances but regardless, it would be a time consuming exercise and likely prone to subjectivity and inconsistency from the handler.
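One way to surface that red flag automatically is to look for word n-grams shared across claims from different claimants. A minimal sketch, with invented claim texts:

```python
def ngrams(text, n=3):
    """Return the set of word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(claim_a, claim_b, n=3):
    """Word trigrams appearing in both claims -- repeated boilerplate
    descriptions across claimants can be a fraud red flag."""
    return ngrams(claim_a, n) & ngrams(claim_b, n)

a = "the other car came out of nowhere and hit me"
b = "suddenly the other car came out of nowhere"
print(len(shared_phrases(a, b)))  # 5 shared trigrams
```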

The solution for insurance providers is to develop NLP-powered analytical dashboards that support quick decision making, highlight potential fraudulent activity and thereby enable their investigators to prioritize cases based on specifically defined KPIs.

NLP, Text Analysis & Customer Service

In a world that is increasingly focused on SLAs, KPIs and ROIs, the role of Customer Support and Customer Success, particularly in technology companies, has never been more important to the overall performance of an organization. With the ever-increasing number of startups and innovative companies disrupting pretty much every industry out there, customer experience has become a key differentiator in markets flooded with consumer choice.

Let’s take a look at four ways that NLP and text analysis are helping to improve CX in particular;

  • Chat bots
  • Analyzing customer/agent interactions
  • Sentiment analysis
  • Automated routing of customer queries

Chat bots

It’s safe to say that chat bots are a pretty big deal right now! These conversational agents are beginning to pop up everywhere as companies look to take advantage of the cutting-edge AI that powers them.

Chances are that you interact with multiple artificial agents on a daily basis, perhaps even without realizing it. They are making recommendations as we shop online, answering our support queries in live chats, generating personalized fitness routines and communicating with us as virtual assistants to schedule meetings.


A recent interaction I had with a personal assistant bot, Amy

Chat bots are helping to bring a personalized experience to users. When done right, not only can this reduce spend in an organization, as they require less input from human agents, but it can also add significant value to the customer experience, with intelligent, targeted and round-the-clock assistance at hand.

Analyzing customer/agent interactions

Interactions between support agents and customers can uncover interesting and actionable insights and trends. Many interactions are in text format by default (email, live chat, feedback forms) while voice-to-text technology can be used to convert phone conversations to text so they can be analyzed.

Listening to their customers

The voice of the customer is more important today than ever before. Social media channels offer a gold mine of publicly available consumer opinion just waiting to be tapped. NLP and text analysis enables you to analyze huge volumes of social chatter to help you understand how people feel about specific events, products, brands, companies, and so on.

Analyzing the sentiment towards your brand, for example, can help you decrease churn and improve customer support by uncovering and proactively working on improving negative trends. It can help show you what you are doing wrong before too much damage has been done, but also quickly show you what you are doing right and should therefore continue doing.

Customer feedback containing significantly high levels of negative sentiment can be relayed to Product and Development teams to help them focus their time and efforts more accordingly.

Automated routing of customer queries

Because of the multi-channel nature of customer support, you tend to have customer queries and requests coming in from a variety of sources – email, social media, feedback forms, live chat. Speed of response is a key performance metric for many organizations, and so routing customer queries to the relevant department, in as few steps as possible, can be crucial.

NLP is being used to automatically route and categorize customer queries, without any human interaction. As mentioned earlier, this goes beyond simple keyword-matching with text analysis techniques being used to ‘understand’ the context and category of a piece of text and classify it accordingly.
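As a rough illustration of going beyond keyword matching, here is a toy router that compares a query's bag of words against hand-made category profiles using cosine similarity. Real systems learn these profiles from labeled data or a semantic model; the categories and word weights below are invented:

```python
import math
from collections import Counter

# Invented category profiles; in practice these are learned, not hand-picked.
CATEGORIES = {
    "billing": Counter({"invoice": 2, "refund": 2, "charge": 1, "payment": 2}),
    "technical": Counter({"error": 2, "crash": 2, "login": 1, "bug": 2}),
}

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(query):
    """Send the query to the most similar category."""
    vec = Counter(query.lower().split())
    return max(CATEGORIES, key=lambda c: cosine(vec, CATEGORIES[c]))

print(route("I was charged twice, please refund my payment"))  # billing
```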


As the sheer amount of unstructured data out there grows and grows, so too does the need to gather, analyze and make sense of it. Regardless of the industry in which they operate, organizations that focus on benefitting from NLP and text analysis will no doubt gain a competitive advantage as they battle for market share.


Text Analysis API - Sign up





Deep Learning is a new area of Machine Learning research that has been gaining significant media interest owing to the role it is playing in artificial intelligence applications like image recognition, self-driving cars and most recently the AlphaGo vs. Lee Sedol matches. Recently, Deep Learning techniques have become popular in solving traditional Natural Language Processing problems like Sentiment Analysis.

For those of you that are new to the topic of Deep Learning, we have put together a list of ten common terms and concepts explained in simple English, which will hopefully make them a bit easier to understand. We’ve done the same in the past for Machine Learning and NLP terms, which you might also find interesting.


Perceptron

In the human brain, a neuron is a cell that processes and transmits information. A perceptron can be considered a super-simplified version of a biological neuron.

A perceptron will take several inputs and weigh them up to produce a single output. Each input is weighted according to its importance in the output decision.
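A perceptron is small enough to write out directly: a weighted sum of the inputs plus a bias, thresholded to a single binary output. The weights below are hand-picked to compute logical AND, purely for illustration:

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of inputs, thresholded to a single binary output."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Hand-picked weights so the perceptron computes AND of two binary inputs.
weights, bias = [1.0, 1.0], -1.5
print(perceptron([1, 1], weights, bias))  # 1
print(perceptron([1, 0], weights, bias))  # 0
```

In a trained perceptron, these weights would be learned from examples rather than chosen by hand.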

Artificial Neural Networks

Artificial Neural Networks (ANN) are models influenced by biological neural networks such as the central nervous systems of living creatures and most distinctly, the brain.

ANNs are processing devices, such as algorithms or physical hardware, and are loosely modeled on the cerebral cortex of mammals, albeit on a considerably smaller scale.

Let’s call them a simplified computational model of the human brain.


Backpropagation

A neural network learns by training, using an algorithm called backpropagation. To train a neural network, it is first given an input, which produces an output. We then teach the network what the correct, or ideal, output should have been for that input. The ANN takes this ideal output and adapts its weights, based on how much each weight contributed to the overall prediction, to yield a more accurate output the next time it receives a similar input.

This process is repeated many times, until the margin of error between the actual output and the ideal output is considered acceptable.
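That train-compare-adjust loop can be sketched with a single weight and a squared-error objective; backpropagation applies this same idea through every layer of a network via the chain rule. All values below are toy numbers:

```python
def train(x, target, w=0.0, lr=0.1, steps=100):
    """Repeatedly nudge one weight toward the value that maps x to target."""
    for _ in range(steps):
        output = w * x           # forward pass
        error = output - target  # compare with the ideal output
        grad = 2 * error * x     # gradient of squared error w.r.t. w
        w -= lr * grad           # adjust the weight
    return w

w = train(x=2.0, target=6.0)
print(round(w, 3))  # converges to 3.0, since 3.0 * 2.0 == 6.0
```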

Convolutional Neural Networks

A convolutional neural network (CNN) can be considered as a neural network that utilizes numerous identical replicas of the same neuron. The benefit of this is that it enables a network to learn a neuron once and use it in numerous places, simplifying the model learning process and thus reducing error. This has made CNNs particularly useful in the area of object recognition and image tagging.

CNNs learn more and more abstract representations of the input with each convolution. In the case of object recognition, a CNN might start with raw pixel data, then learn highly discriminative features such as edges, followed by basic shapes, complex shapes, patterns and textures.
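The weight-sharing idea is easiest to see in one dimension: the same small filter (one "neuron") slides across the whole input. The filter values here are hand-picked to respond where neighbouring values differ, a toy stand-in for the edge detectors a CNN learns:

```python
def conv1d(signal, kernel):
    """Slide the same kernel across the signal -- shared weights at work."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference filter: nonzero output marks an "edge" in the signal.
signal = [0, 0, 1, 1, 0]
print(conv1d(signal, [-1, 1]))  # [0, 1, 0, -1]
```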





Recurrent Neural Network

Recurrent Neural Networks (RNN) make use of sequential information. Unlike traditional neural networks, where all inputs and outputs are assumed to be independent of one another, RNNs rely on the preceding computations and what has previously been calculated. An RNN can be conceptualized as a neural network unrolled over time: where you would have different layers in a regular neural network, in an RNN you apply the same layer to the input at each timestep, using the output, i.e. the state of the previous timestep, as input. Connections between entities in an RNN form a directed cycle, creating a sort of internal memory that helps the model leverage long chains of dependencies.
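The unrolling idea in a few lines: the same update is applied at every timestep, with the previous state fed back in as input. The weights are toy values and nothing is learned here:

```python
def rnn(inputs, w_in=0.5, w_state=0.5):
    """Apply the same update at every timestep, carrying state forward."""
    state = 0.0
    for x in inputs:
        state = w_in * x + w_state * state  # same "layer" at each step
    return state

# The final state depends on where in the sequence the input occurred,
# not just on the last value -- the network has a kind of memory.
print(rnn([1, 0, 0]))  # 0.125
print(rnn([0, 0, 1]))  # 0.5
```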

Recursive Neural Network

A Recursive Neural Network is a generalization of a Recurrent Neural Network, generated by applying a fixed and consistent set of weights repetitively, or recursively, over the structure. Recursive Neural Networks take the form of a tree, while Recurrent Neural Networks form a chain. Recursive Neural Nets have been utilized in Natural Language Processing for tasks such as Sentiment Analysis.





Supervised Neural Network

For a supervised neural network to produce an ideal output, it must have been previously given this output. It is ‘trained’ on a pre-defined dataset and, based on this dataset, can produce accurate outputs depending on the input it has received. You could therefore say that it has been supervised in its learning, having been given, for example, both the question and the ideal answer.

Unsupervised Neural Network

This involves providing a program or machine with an unlabeled dataset that it has not been previously trained on, with the goal of automatically discovering patterns and trends through clustering.

Gradient Descent

Gradient Descent is an algorithm used to find the local minimum of a function. By initially guessing the solution and using the function gradient at that point, we guide the solution in the negative direction of the gradient and repeat this technique until the algorithm eventually converges at the point where the gradient is zero – local minimum. We essentially descend the error surface until we arrive at a valley.
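A minimal sketch of the technique, descending f(x) = (x - 3)^2, whose gradient is 2(x - 3) and whose minimum sits at x = 3:

```python
def gradient_descent(grad, x=0.0, lr=0.1, steps=200):
    """Repeatedly step in the negative gradient direction."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2; the algorithm converges to the minimum at 3.
minimum = gradient_descent(grad=lambda x: 2 * (x - 3))
print(round(minimum, 4))  # 3.0
```

The learning rate `lr` controls the step size: too small and convergence is slow, too large and the solution can overshoot the valley.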

Word Embedding

Similar to the way a painting might be a representation of a person, a word embedding is a representation of a word, using real-valued numbers. Word embeddings can be trained and used to derive similarities between both other words, and other relations. They are an arrangement of numbers representing the semantic and syntactic information of words in a format that computers can understand.

Word vectors created through this process manifest interesting characteristics that almost look and sound like magic at first. For instance, if we subtract the vector of Man from the vector of King, the result will be almost equal to the vector resulting from subtracting Woman from Queen. Even more surprisingly, the result of subtracting Run from Running almost equates to that of Seeing minus See. These examples show that the model has not only learnt the meaning and the semantics of these words, but also the syntax and the grammar to some degree.
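The King/Queen arithmetic can be demonstrated with toy vectors. The 3-dimensional embeddings below are hand-picked purely for illustration; real models learn vectors with hundreds of dimensions from large text corpora:

```python
import numpy as np

# Hand-picked toy embeddings (an assumption for the example only).
vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman should land closest to queen
target = vec["king"] - vec["man"] + vec["woman"]
best = max((w for w in vec if w != "king"), key=lambda w: cosine(target, vec[w]))
print(best)  # -> queen
```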



So there you have it – some pretty technical deep learning terms explained in simple English. We hope this helps you get your head around some of the tricky terms you might come across as you begin to explore deep learning.



News API - Sign up



We recently added a feature to our API that allows users to classify text according to their own labels. This unsupervised method of classification relies on Explicit Semantic Analysis in order to determine how closely matched a piece of text and a label or tag are.

This method of classification provides greater flexibility when classifying text and doesn’t rely on a particular taxonomy to understand and categorize a piece of text.

Explicit Semantic Analysis (ESA) works at the level of meaning rather than on the surface-form vocabulary of a word or document. ESA represents the meaning of a piece of text as a combination of the concepts found in that text, and is used in document classification, semantic relatedness calculation (i.e. how similar in meaning two words or pieces of text are to each other) and information retrieval.

In document classification, for example, documents are tagged to make them easier to manage and sort. Tagging a document with keywords makes it easier to find. However, keyword tagging alone has its limitations; searches carried out using vocabulary with a similar meaning, but different actual words, may not uncover relevant documents. Classifying text semantically, i.e. representing the document as concepts and lowering the dependence on specific keywords, can greatly improve a machine’s understanding of text.

How is Explicit Semantic Analysis achieved?

Wikipedia is a large and diverse knowledge base in which each article can be considered a distinct concept. In Wikipedia-based ESA, a concept is generated for each article. Each concept is then represented as a vector of the words which occur in the article, weighted by their tf-idf score.
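A toy version of this weighting might look as follows. The three short stand-in "articles" are assumptions for the example; real ESA uses the full Wikipedia corpus:

```python
import math
from collections import Counter

# Toy "articles" standing in for Wikipedia concept pages.
articles = {
    "Mars":    "mars is a planet in the solar system".split(),
    "Jupiter": "jupiter is the largest planet".split(),
    "Tennis":  "tennis is a sport played with a racket".split(),
}

def tf_idf(articles):
    """Weight each word in each concept by term frequency x inverse
    document frequency, so corpus-wide words score near zero."""
    n = len(articles)
    df = Counter(w for words in articles.values() for w in set(words))
    vectors = {}
    for concept, words in articles.items():
        tf = Counter(words)
        vectors[concept] = {w: tf[w] * math.log(n / df[w]) for w in tf}
    return vectors

vectors = tf_idf(articles)
# "planet" occurs in 2 of 3 articles, so it is weighted lower than
# "racket", which occurs in only one; "is" occurs in all and scores 0.
```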

The meaning of any given word can then be represented as a vector of that word’s relatedness, or “association weighting”, to the Wikipedia-based concepts.

“word” 	---> <concept1, weight1>, <concept2, weight2>, <concept3, weight3> - - -

A trivial example might be:

“Mars” -----> <planet, 0.90>, <Solar system, 0.85>, <Jupiter, 0.30> - - - -

Comparing two word vectors (using cosine similarity) we can get a numerical value for the semantic relatedness of words i.e. we can quantify how similar the words are to each other based on their association weighting to the various concepts.

Note: In Text Analysis a vector is simply a numerical representation of a word or document. It is easier for algorithms to work with numbers than with characters. Additionally, vectors can be plotted graphically and the “distance” between them is a visual representation of how closely related in terms of meaning words and documents are to each other.

Explicit Semantic Analysis and Documents

Larger documents are represented as a combination of individual word vectors derived from the words within a document. The resultant document vectors are known as “concept” vectors. For example, a concept vector might look something like the following:

“Mars” 		---> <planet, 0.90>, <Solar system, 0.85>, <Jupiter, 0.30> - - - -
“explorer” 	---> <adventurer, 0.89>, <pioneer, 0.70>, <vehicle, 0.20> - - -
:			:			:			:
“wordn” 	---> <conceptb, weightb>, <conceptd, weightd>, <conceptp, weightp> - - -

Graphically, we can represent a concept vector as the centroid of the word vectors it is composed of. The image below illustrates the centroid of a set of vectors i.e. it is the center or average position of the vectors.


So, to compare how similar two phrases are we can create their concept vectors from their constituent word vectors and then compare the two, again using cosine similarity.
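The centroid-and-compare step can be sketched like so. The 2-dimensional word vectors are hand-picked assumptions for the example; in ESA they would be concept-weight vectors derived from Wikipedia:

```python
import numpy as np

# Toy word vectors (hand-picked for illustration only).
word_vec = {
    "mars":   np.array([0.9, 0.1]),
    "planet": np.array([0.8, 0.2]),
    "tennis": np.array([0.1, 0.9]),
    "racket": np.array([0.2, 0.8]),
}

def concept_vector(phrase):
    """Centroid (average position) of the phrase's word vectors."""
    return np.mean([word_vec[w] for w in phrase.split()], axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = concept_vector("mars planet")
b = concept_vector("tennis racket")
print(round(cosine(a, concept_vector("planet mars")), 4))  # same words -> 1.0
print(round(cosine(a, b), 2))                              # unrelated -> 0.34
```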

ESA and Dataless Classification

This functionality is particularly useful when you want to classify a document but don’t want to use a known taxonomy. It allows you to specify, on the fly, a proprietary taxonomy on which to base the classification. You provide the text to be classified as well as the potential labels, and ESA determines which label is most closely related to your piece of text.
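In outline, dataless classification is "embed the text, embed each label, pick the closest label". A minimal sketch, with hand-picked toy concept vectors standing in for the Wikipedia-derived ones a real ESA system would use:

```python
import numpy as np

# Toy concept vectors (assumption: real ESA derives these from Wikipedia).
concept = {
    "deflation": np.array([0.9, 0.1, 0.0]),
    "economy":   np.array([0.8, 0.2, 0.1]),
    "football":  np.array([0.0, 0.9, 0.2]),
    "sport":     np.array([0.1, 0.8, 0.3]),
}

def vectorise(words):
    # Labels and text must use words in the toy vocabulary above.
    known = [concept[w] for w in words if w in concept]
    return np.mean(known, axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def classify(text, labels):
    """Return the user-supplied label semantically closest to the text."""
    doc = vectorise(text.lower().split())
    return max(labels, key=lambda l: cosine(doc, vectorise([l])))

print(classify("deflation hits the economy", ["sport", "economy"]))  # -> economy
```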


ESA operates at the level of concepts and meaning rather than just the surface form vocabulary. As such, it can improve the accuracy of document classification, information retrieval and semantic relatedness.

If you would like to know more about this topic check out this excellent blog from Christopher Olah and this very accessible research paper from Egozi, Markovitch and Gabrilovich, both of which I referred to heavily when researching this blog post.

Keep an eye out for more in our “Text Analysis 101” series.


Text Analysis API - Sign up


Online news aggregation services are sites that allow you to view content from online newspapers, media outlets, blogs and so on in one place. They allow you to filter the news you receive by category, topic, keyword, date and outlet, e.g. technology, food or entertainment, in the last day, week or year. They save us a lot of time and hassle, as we receive the news we want without having to hop from site to site to read updates from our favourite authors or follow the topics we’re most interested in. In other words, they provide a consolidated space with the latest news and updates from many different news sources.


Text Analysis and News Aggregation


News Aggregators have been around for quite a while. The most progressive apps are the ones that learn and keep track of our likes and interests, in order to uncover and suggest relevant and new content which we may not have been aware of.

How does News Aggregation work?

Generally, a content provider will publish their content via a feed link, which News Aggregators subscribe to. From then on, the aggregator is informed whenever new content is available. These feeds from the various news sources are commonly referred to as RSS and/or Atom feeds.
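To make the feed idea concrete, here is a sketch of pulling item titles and links out of an RSS 2.0 document with Python's standard library (the feed snippet and URLs are invented for the example):

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 snippet of the kind a publisher exposes as a feed.
rss = """<rss version="2.0"><channel>
  <title>Example News</title>
  <item><title>Story one</title><link>http://example.com/1</link></item>
  <item><title>Story two</title><link>http://example.com/2</link></item>
</channel></rss>"""

channel = ET.fromstring(rss).find("channel")
items = [(i.findtext("title"), i.findtext("link"))
         for i in channel.findall("item")]
print(items)  # [('Story one', 'http://example.com/1'), ('Story two', 'http://example.com/2')]
```

A real aggregator would fetch the feed over HTTP on a schedule and diff against items it has already seen.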

For machines to analyze and attempt to understand this content they often rely on elements of Natural Language Processing and Text Analysis.

Why the need for Text Analysis?

If you consider that a news aggregator might be dealing with 50,000-plus articles per day, you can quickly see why being able to analyze content automatically is an essential part of the process. Even if we allowed two minutes for each article to be read and classified by a human (which is ridiculously fast), it would take almost 70 days of nonstop work to get through the 50K articles. Clearly, then, this is a task for machines, and in particular for Machine Learning and Natural Language Processing in the form of Text Analysis.
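For what it's worth, the 70-day figure is straightforward arithmetic:

```python
articles_per_day = 50_000
minutes_per_article = 2  # a generous human reading-and-classifying speed

days_of_nonstop_work = articles_per_day * minutes_per_article / 60 / 24
print(round(days_of_nonstop_work, 1))  # -> 69.4
```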

What does the Text Analysis Process for News Aggregation look like?

Article Extraction

One of the first tasks to be completed by a text analysis engine when presented with an article or URL is to strip away the clutter and extract the main text and media. This process is generally referred to as Article Extraction. The story may then be processed further to summarize the article in some way. This can be useful when presenting stories for consumption, as readers will generally spend less than 3 seconds deciding whether or not to click an article to give it further attention.

Classification based on Named Entity Extraction

Once the text of an article has been extracted, it is passed to another part of the analysis engine for classification, i.e. to determine whether the story is about Technology, Arts, Entertainment, Business, Finance, etc. Classification is partly achieved by extracting named entities such as people, places, organisations, keywords, dates and Twitter handles from the article. Proper categorization is critical, as an article is only valuable if its audience can find it. Classification tags an article with metadata from up to 500 categories, which conform to the IPTC NewsCode taxonomy.

Further Classification and Organisation based on Concept Extraction

More sophisticated analysis engines can also extract concepts from an article. Focusing on concepts when analyzing text, rather than relying purely on keywords, results in better tagging and allows aggregation services to cluster similar stories together. For example, an article on the current state of the Japanese economy, when passed to AYLIEN’s Concept Extraction API endpoint, yielded the concepts “deflation“, “public debt“, “sales tax“, “economist“, “gross domestic product“, “abenomics“, “moody’s analytics” and “bond”, among others.

A key feature of a concept extraction system is the ability to provide what is called “word sense disambiguation”, i.e. the ability to realise that in a tech article that mentions Steve Jobs, the word “Apple” is more likely to refer to the company than the fruit! Extracting concepts can also be coupled with Topic Modelling and Clustering, which allow the reader to follow stories as they progress through time and also allow the system to uncover and present similar stories while removing duplicate or near-duplicate articles.

Going a level deeper in understanding text

Context is key to the way News Aggregators provide relevant content. We chatted with Drew Curtis, who had this to say about the current state of News Aggregators: “They’re looking at article content only. I’m arguing that the next level is taking content that people maybe don’t care as much about and adding context to make them care.” Drew also gave us a nice example, Net Neutrality: “it’s been around for years, but only recently did anyone figure out how to make the average person care about it.”

More sophisticated analysis engines can extract intent and high-level concepts from an article, which, when combined, allow you to add context and to better understand the story, not just the article. Traditional news aggregators only go so deep in attempting to understand text and push content to readers based on topics or keywords. The next wave of news aggregators needs to “understand” and distribute content, not just “tag” and distribute content.

Sentiment Analysis of content can also help add context. It can allow machines to detect the tone of a text: whether it’s positive or negative, subjective or objective. Keeping track of the types of articles (sentiment, topics, categories) that a reader consumes, shares or upvotes allows a system to learn about a reader’s preferences and present articles that are more and more in tune with the reader’s tastes.

Spreading the Word

Sharing useful or interesting stories gives us some “social currency” and so we are always keen to pass on articles that we think our friends and colleagues might enjoy or find useful. A good text analysis engine will also aid in this process by, for example, providing hashtag suggestions which allow for more effective sharing of content across social media sites.


Text Analysis provides the tools that make it possible for Content Aggregation systems to make sense of the myriad news articles that are published every day and to present the reader with articles that are honed to their individual tastes. But only when we start focusing on machines “understanding content” before it is recommended will news aggregators become truly powerful.



The publishing industry has changed dramatically. Mainstream newspapers and magazines have given way to desktop publishing and the Internet as economics have changed the game.

Let’s look at the main drivers behind this change.

More competition – Self-publishing has moved into mainstream online channels. The increase of entrants into the market means more choice and much of it is free.

The introduction of Apps – Apps create a more engaging and effective way to interact with an audience. The ever-increasing ownership and usage of mobile devices means that more readers can be reached.

Real-time social sharing – It can be argued that Facebook and Twitter provide the most up-to-date news channels. The sharing dimension can also be very appealing to readers who want to contribute to reporting the news, as opposed to passively receiving it.

Shift from mass to a niche market – Before the inception of the internet, successful newspapers and magazines appealed to the general public. Today, however, digital publishing has far lower production costs and a far greater reach to service niche markets.




According to Ofcom, use of the internet to consume news has increased for computers, laptops, tablets and mobiles since 2013, while TV has seen a small decrease from 78 to 75 percent. Use of any type of online platform to consume news increased from 32 to 41 percent this year, and is now higher than the use of newspapers (40 percent) and radio (36 percent).

This shift in how we consume news has forced publishers to change their strategies in order to compete. More specifically, publishers understand that their content needs to be more relevant, richer, interactive, timely and discoverable.

An Example

Let’s say an editor hears about a bus crashing near a major school, close to a fire station. The editor wants to write about the story, and they want to include historical information about the causes of bus crashes (e.g. time of day, time of year, equipment malfunction, driver error, etc., based on other bus crashes from the past 30 years) to give the story more depth and context. In most cases, a journalist would have tagged past documents with dates and keywords. This is generally a manual process, and documents could therefore easily be left untagged due to human error. Tags may be missed if different individuals are involved in the process, and some people may not be as thorough as others. For instance, if somebody simply tags a document “bus crash”, it might be very difficult to find similar stories, much less analyze what happened in other relevant crashes.

Enter Text Analytics

By incorporating text analysis software, historical data can be culled for relevant concepts, entities, sentiments and relationships to produce a far richer tagging system. Information about the bus crash such as the type of bus involved, location, times, dates and causes could be extracted from the text. These entities would be kept as metadata about the articles and used when needed.
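As a deliberately naive sketch of "extract entities, keep them as metadata" (real systems use trained NER models, not regular expressions and word lists; the sentence and date are invented for the example):

```python
import re

article = "A bus crashed near Lincoln High School on 12/03/2014, injuring six."

# Naive illustration only: a real engine would run a trained
# entity-extraction model over the text instead of these heuristics.
metadata = {
    "dates":    re.findall(r"\d{2}/\d{2}/\d{4}", article),
    "keywords": [w for w in ("bus", "crash", "school") if w in article.lower()],
}
print(metadata)  # {'dates': ['12/03/2014'], 'keywords': ['bus', 'crash', 'school']}
```

The resulting metadata dictionary is what would be stored alongside the article and queried later, e.g. "all bus crashes in the last 30 years".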

The Benefits

Text Analysis software can ‘understand’ the relationships between articles and provide suggestions for similar content. This benefits editors, who can be far more productive as they navigate easily through a complete dataset. Research is therefore easier, a lot of time is saved, and the end product is often richer, as the editor can reference similar events and give more depth and context to their article.

Richer, more relevant content can improve user engagement, meaning more page views by a narrower market, which can increase the potential for generating advertising revenue. A consumer that is more engaged with their content is far more likely to subscribe to niche newsletters, which can allow publishers to develop these relationships further and upsell their service to their consumers.

Our conclusion

Text analytics is essential in the publishing industry because it saves time when gathering data and allows you to produce richer content that attracts more readers in narrower markets, where consumers are often more loyal.
