

Welcome to the first in a series of monthly posts where we’ll be showcasing the power of our News API by looking back at online news stories to uncover emerging insights and trends from topical news categories.

For February, we’ve taken a look at the following three news categories:

  1. Arts & Entertainment
  2. Science
  3. Politics

and for each category we performed the following analyses:

  • Publication volumes over time
  • Top stories
  • Most shared stories on social media
  • Most mentioned topics

Try it yourself

We’ve included code snippets for each of the analyses above so you can follow along or modify them to create your own search queries.

If you haven’t already signed up for our News API, you can do so here with a free 14-day trial.

Arts & Entertainment

The graph below shows publication volumes in the Arts & Entertainment category throughout the month of February 2017.

Note: All visualizations are interactive. Simply hover your cursor over each one to explore the various data points and information.

Volume of stories published: Arts & Entertainment

From the graph above we can see a number of spikes indicating sharp rises in publication volumes. Let’s take a look at the top three:

Top stories

The three stories that contributed to the biggest spikes in news publication volumes were:

  1. The Academy Awards (aka the Oscars)
  2. The BAFTAs & Grammys take place on the same night
  3. Reviews of Lady Gaga’s performance during the Super Bowl halftime show

It is interesting to note that the Oscars generated more content than both the BAFTAs and Grammys combined.

While sport-related content is not included in this category, we did see a spike in stories mentioning Lady Gaga after her Super Bowl performance in the halftime show. These stories focused mostly on the singer’s performance and her choice of outfits.

Try it yourself – here’s the query we used for volume by category
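In Python, a query like this might be sketched as follows. The parameter names, the IAB category ID (`IAB1` for Arts & Entertainment) and the endpoint in the trailing comment are assumptions based on typical News API usage, not the exact query from this post; check the current documentation and substitute your own credentials.

```python
# Build the parameters for a daily publication-volume query.
# Parameter names and the IAB-QAG category ID are assumptions;
# consult the News API docs before using.

def volume_query(category_id, start, end):
    """Parameters for a publication-volume time series in one category."""
    return {
        "categories.taxonomy": "iab-qag",
        "categories.id": category_id,       # e.g. "IAB1" = Arts & Entertainment (assumed ID)
        "published_at.start": start,
        "published_at.end": end,
        "period": "+1DAY",                  # one data point per day
    }

params = volume_query("IAB1", "2017-02-01T00:00:00Z", "2017-02-28T23:59:59Z")
for key in sorted(params):
    print(f"{key}={params[key]}")

# Sending it would look something like (assumes the `requests` package
# and valid API keys; endpoint URL is an assumption):
# requests.get("https://api.newsapi.aylien.com/api/v1/time_series",
#              params=params,
#              headers={"X-AYLIEN-NewsAPI-Application-ID": APP_ID,
#                       "X-AYLIEN-NewsAPI-Application-Key": APP_KEY})
```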

Read more: We analyzed 2.2 million Super Bowl tweets to gauge public and media reaction to the game itself and the brands and celebrities featured throughout:

Sentiment Analysis of 2.2 million Super Bowl tweets from Super Bowl 51

Using NLP to understand how Twitter and the media reacted to the Super Bowl 51 ads battle

Most mentioned topics

From the 52,000+ articles we sourced for the Arts & Entertainment category in February, we looked at the most mentioned topics:

With three of the biggest award events in the film and music industries taking place in February, it is no surprise to see film and music as the two most mentioned topics.

Try it yourself – here’s the query we used for most mentioned topics
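A hedged Python sketch of the topic aggregation: we assume a trends-style endpoint that takes a `field` parameter naming the facet to count (here, entities mentioned in article bodies). The field and parameter names are illustrative, not verbatim from this post.

```python
# Sketch of a "most mentioned topics" aggregation over one category.
# The `field` value and other parameter names are assumptions.

def trends_query(category_id, start, end, field="entities.body.text"):
    """Parameters for aggregating the most mentioned entities in a category."""
    return {
        "field": field,                   # facet to aggregate on (assumed name)
        "categories.taxonomy": "iab-qag",
        "categories.id": category_id,
        "published_at.start": start,
        "published_at.end": end,
    }

params = trends_query("IAB1", "2017-02-01T00:00:00Z", "2017-02-28T23:59:59Z")
print(params["field"])
```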

Read more: We looked at the online news media reaction to the 89th Academy Awards:

The Oscars: analyzing 22,000 news stories using Natural Language Processing & Text Analysis

Most shared on social media

What were the most shared stories on social media? We analyzed share counts from Facebook, LinkedIn and Reddit to see what type of content is performing best on each channel.


Facebook

  1. 50 Most Popular Women on the Web, Per Google Search Results (ABC News. 152,570 shares)
  2. Adam Levine to Receive Star on Hollywood Walk of Fame (Billboard. 103,658 shares)


LinkedIn

  1. 14 of the Best Brands on Instagram Right Now (HubSpot. 2,158 shares)
  2. PwC issues apology after Oscars best picture envelope mistake (The Guardian. 1,354 shares)


Reddit

  1. Louis C.K. Inks Deal With Netflix for Two Stand-Up Specials (Variety. 32,534 points)
  2. Childish Gambino is ‘definitely’ working with Chance the Rapper (MTC. 20,013 points)

Try it yourself – here’s the query we used for social shares
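The shares query can be sketched in Python as a stories search sorted by per-network share count. The `sort_by` values are assumptions modelled on the three channels compared above (Facebook, LinkedIn, Reddit); verify them against the current query documentation.

```python
# Sketch: fetch the top stories in a category sorted by share count on
# one social network. The sort_by field names are assumptions.

def top_shared_query(category_id, network, per_page=2):
    """Parameters for the most shared stories on one network."""
    return {
        "categories.taxonomy": "iab-qag",
        "categories.id": category_id,
        "sort_by": f"social_shares_count.{network}",  # assumed field name
        "sort_direction": "desc",
        "per_page": per_page,
    }

for network in ("facebook", "linkedin", "reddit"):
    print(top_shared_query("IAB1", network)["sort_by"])
```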


Science

We sourced a total of 26,000+ news stories categorized under Science and found that scientific discoveries tend to generate the most news content.

Volume of stories published: Science

Top stories

  1. NASA announces the discovery of 7 Earth-like planets
  2. Scientists turn food poisoning microbe into powerful cancer fighter
  3. Scientists reveal a new 8th continent called ‘Zealandia’

On February 22, NASA announced the discovery of 7 Earth-like planets just 40 light years away that could potentially harbour alien life.

Closer to home, scientists revealed the existence of an eighth continent, Zealandia, and discovered that a food poisoning microbe could be turned into a powerful cancer fighter.

Most mentioned topics

Although it didn’t generate quite as much news content as the previously mentioned stories, the drought in California and other parts of the US was evident among the most mentioned topics in the Science category.

Most shared on social media


Facebook

  1. Exxon knew of climate change in 1981, email says – but it funded deniers for 27 more years (The Guardian. 144,647 shares)
  2. Earth Day picked as date for science march on Washington (CNN. 81,870 shares)


LinkedIn

  1. Science Says These Five Things Prove You’re Smart (Forbes. 2,857 shares)
  2. NASA Announces Discovery Of 7 Earth-Sized Planets In Nearby Star System (Real Clear Politics. 1,423 shares)


Reddit

  1. China is now the world’s largest solar power producer (Digital Trends. 52,397 points)
  2. ‘Shell knew’: oil giant’s 1991 film warned of climate change danger (The Guardian. 32,414 points)

Law, Government & Politics

For the Law, Government & Politics category we thought we would try something a little different. The chart below shows two separate volume trends. The red volume represents all 163,000+ stories published in this category in February. The yellow volume represents all stories from this category that did not mention Donald Trump. The total: 48,000 stories.

This means that around 70% of all stories in this category mentioned the US President.

Volume of stories published: Law, Government & Politics with/without Trump

Most mentioned topics

Try it yourself – here’s the query we used for category volumes without Trump
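A Python sketch of the with/without comparison: one category-volume query excluding a phrase, plus the percentage arithmetic behind the ~70% figure. The News API’s `text` parameter supports boolean search, but the exact `NOT` syntax and the category ID shown here are assumptions; verify against the current query documentation.

```python
# Sketch: category volume with an excluded phrase, and the share of
# stories that mentioned it. Parameter names and the NOT syntax are
# assumptions, not verbatim from this post.

def category_volume_params(category_id, exclude_phrase=None):
    params = {
        "categories.taxonomy": "iab-qag",
        "categories.id": category_id,     # assumed ID for Law, Gov't & Politics
        "period": "+1DAY",
    }
    if exclude_phrase:
        params["text"] = f'NOT "{exclude_phrase}"'   # assumed negation syntax
    return params

def share_of_total(total, without):
    """Percentage of stories that DID mention the excluded phrase."""
    return round(100 * (total - without) / total, 1)

print(category_volume_params("IAB11", "Donald Trump")["text"])
print(share_of_total(163_000, 48_000))   # roughly the 70% noted in the post
```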

With our previous observation in mind, you probably won’t be surprised by the most mentioned topics from the 163,000+ stories published in the Law, Government & Politics category.

Most shared on social media


Facebook

  1. Alabama immigration: crops rot as workers vanish to avoid crackdown (The Guardian. 217,679 shares)
  2. 18 WTF Moments From Trump’s Unhinged Press Conference (Rolling Stone. 113,397 shares)


LinkedIn

  1. It’s official, Narendra Modi is the most followed world leader on Facebook (Quartz. 2,540 shares)
  2. What CEOs say happened in Trump’s closed-door meeting with big pharma (Washington Post. 1,570 shares)


Reddit

  1. US appeals court upholds suspension of Trump travel ban (CNBC. 90,980 points)
  2. Fox News’s ‘Swedish defence advisor’ unknown to country’s military officials (The Guardian. 74,772 points)


We hope that this post has given you an idea of the kind of in-depth and precise analyses that our News API users are performing to source and analyze specific news content that is of interest to them.

Ready to try the News API for yourself? Simply click the image below to sign up for a 14-day free trial.

News API - Sign up



The 89th Academy Awards, better known as the Oscars, took place in Los Angeles over the weekend with Hollywood’s finest walking the famed red carpet with hopes of taking home the movie industry’s most prestigious award.

With an event of such magnitude and popularity naturally comes a whole lot of hype and media commentary, both on social media and in the news. Today we’re going to take a look at the latter by leveraging the power of Natural Language Processing and Text Analysis to analyze and gather insights from news content relating to the Oscars.

In doing so, we’ve used the AYLIEN News API to collect and analyze news stories, and Tableau to visualize our results.

How did the media react to the Oscars?

To begin, we wanted to see how the Oscars affected news publication volumes. We looked at the Movies category for the month of February and graphed the daily story volumes below:

Daily story publication volumes: Movies category

As you can see, there is a clear spike in story volumes on the day of the event and on the days around it. On the day the Awards took place, we see a 350% increase in story volumes compared to the average day in February.

Which publishers produced the most Oscars-related content?

Next we looked at the sources of all Oscars-related news content published in February to uncover the most active publishers.

Note: You can hover over each individual bubble to see precise story volumes.

Story volume per publisher

Which Oscars-related stories were shared most on social media?

We analyzed share counts from Facebook, LinkedIn and Reddit to see what type of content is performing best on each channel.


Facebook

  1. What We Lose When We Give Awards to Men Like Casey Affleck (Elle. 53,327 shares)
  2. Katherine Johnson, real-life subject of ‘Hidden Figures’ receives standing ovation at Oscars (ABC News. 48,155 shares)


LinkedIn

  1. PwC issues apology after Oscars best picture envelope mistake (The Guardian. 1,354 shares)
  2. PwC Partner at Oscars Tweeted Backstage Minutes Before Best Picture Mix-Up (Wall Street Journal. 1,253 shares)


Reddit

  1. Trump Lashes Out at New York Times Ad Set to Air on the Oscars Tonight (Ad Week. 15,507 points)
  2. Lagerfeld: Meryl Streep Passed On Oscar Dress When Chanel Refused to Pay (WWD. 6,962 points)


It’s interesting to see the different types of content that are shared most on each social network. Facebook users tend to share stories that are perhaps written with the intention of generating an emotional response, while LinkedIn, being a professional network, favors stories relating to organizations; in this case PwC, who were at the center of an embarrassing mix-up at the Oscars.

Which Best Picture nominees were mentioned most in the news?

We analyzed the nine nominees for Best Picture from February 1 to just before the event to see which movies were receiving the most attention in the press, before the results were announced.
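A Python sketch of this per-nominee count: one query per title, ranked by the returned story count. The fetcher is injectable so the sketch runs standalone; in practice it would call the News API’s stories search and read the result count. The demo counts below are illustrative, not real data.

```python
# Rank Best Picture nominees by news mentions. `fetch_count` stands in
# for a real News API call (e.g. a title search returning a story count);
# here we rank from a supplied mapping so the sketch runs standalone.

NOMINEES = [
    "Arrival", "Fences", "Hacksaw Ridge", "Hell or High Water",
    "Hidden Figures", "La La Land", "Lion", "Manchester by the Sea",
    "Moonlight",
]

def rank_by_mentions(titles, fetch_count):
    """Return (title, count) pairs sorted from most to least mentioned."""
    counts = [(title, fetch_count(title)) for title in titles]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)

# Illustrative-only counts, not real data:
demo = {"La La Land": 900, "Moonlight": 300, "Arrival": 500}
ranking = rank_by_mentions(list(demo), demo.get)
print(ranking[0][0])   # most mentioned nominee in the demo data
```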

Most mentioned movies prior to the event

Not only was La La Land the favourite to win the Best Picture Oscar, it was also the most mentioned of the nine nominees in the news in the lead up to the Awards. The actual winner, Moonlight, was perhaps surprisingly only the fifth most mentioned out of the nine nominees.

Which individuals were mentioned most in the news?

As one of the entertainment industry’s showcase annual events, the Academy Awards is never short of big name actors, celebrities and performers.

By extracting mentions of people from our analyzed news content we can see which individuals received the most media attention.

Most mentioned individuals in Oscars-related news content

At the top of our most-mentioned list are Best Actress winner Emma Stone and Awards host Jimmy Kimmel. It’s perhaps interesting to note the position of Best Actor winner Casey Affleck, who came in as just the seventh most mentioned individual.

In case you’re wondering about the appearance of Donald Trump here, he got a bit of a roasting on the night by Jimmy Kimmel! Let’s now take a look at the media’s reaction to the host’s performance.

Jimmy Kimmel: Hit or Miss?

Using Sentiment Analysis, we can uncover whether an article has been written in a positive, negative or neutral way. To analyze the media’s reaction to Jimmy Kimmel, we only retrieved stories that mentioned him in the title. We find that this significantly increases the chances that the story (or at least the majority of it) is about the individual in question, rather than just mentioning him in the body text of the article.
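A hedged Python sketch of that title-restricted retrieval and the sentiment tally. The `title` parameter name and the shape of the per-story sentiment field are assumptions for illustration, not the exact query used in this post.

```python
# Sketch: retrieve stories mentioning a person in the headline and tally
# headline polarity. Parameter names and the response shape are assumed.

def titled_story_params(name):
    """Parameters restricting the search to headline mentions."""
    return {"title": f'"{name}"', "language": ["en"], "per_page": 100}

def polarity_breakdown(stories):
    """Count positive / negative / neutral title sentiment across stories."""
    tally = {"positive": 0, "negative": 0, "neutral": 0}
    for story in stories:
        tally[story["sentiment"]["title"]["polarity"]] += 1
    return tally

sample = [
    {"sentiment": {"title": {"polarity": "positive"}}},
    {"sentiment": {"title": {"polarity": "neutral"}}},
    {"sentiment": {"title": {"polarity": "positive"}}},
]
print(titled_story_params("Jimmy Kimmel")["title"])
print(polarity_breakdown(sample))
```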

Here’s what we found;

News media reaction to Oscars host Jimmy Kimmel


We hope that this post has given you an idea of the kind of in-depth and precise analyses that our News API users are performing to source and analyze specific news content that is of interest to them.

Ready to try the News API for yourself? Simply click the image below to sign up for a 14-day free trial.

News API - Sign up



There is a wealth of information hidden in the contents and the markup of a web page that can be extremely useful when trying to understand what a page is all about while trawling the web. One classic example would be tags: those short phrases or keywords that bloggers and publishers use to describe what a webpage, article or blog post is about. Tags can be rendered as visual elements on the page, or hidden away using `meta` attributes.


example of meta attributes

It is obvious that by extracting these tags we can learn a whole lot about any blog post or article that we are analyzing. They describe a piece of content the way their author or editor would, and they may contain various pieces of information such as the high-level topical category, the entities (people, places, organizations, etc.) mentioned, or the concepts that the article is about. This makes them an excellent source of information to leverage when classifying web pages.

The problem with extracting these tags is that the way web pages are structured, and the way tags are expressed, differs greatly across pages and sites. The different content management systems used by blogs and news websites each have their own way of presenting metadata such as tags, making this information difficult to access and parse.


examples of visual tags from various blogging platforms

Today we are announcing the launch of a much-requested addition to our Article Extraction API that provides a uniform and standard interface for extracting tags from any blog post or article on the web.

Tag Extraction

We’ve supercharged our article extraction feature in the Text Analysis API to make it even easier to extract useful information from a webpage. Through our Article Extraction endpoint, users already have the ability to extract metadata such as author name, publish date, main image, article title and the main body of text from a page. But in a lot of cases, a web page will contain other useful information about that page, often in the form of tags.

The Tag Extraction feature will identify and extract any relevant tags present on the page no matter the structure of the page or where the tags are present.
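In Python, reading the tags out of an extraction result might look like the sketch below. The sample response mirrors the JSON examples shown later in this post; the endpoint URL and header names in the trailing comment are assumptions.

```python
# Sketch: pull tags out of an Article Extraction response. The response
# shape follows the JSON examples in this post.

def extract_tags(extraction):
    """Return the list of tags from an extraction result, if any."""
    return extraction.get("tags", [])

sample_response = {
    "author": "Cade Metz",
    "tags": ["neural networks", "Artificial Intelligence"],
    "title": "Inside the Poker AI That Out-Bluffed the Best Humans",
}
print(extract_tags(sample_response))

# A real call might look like (assumed endpoint, requires an API key):
# requests.get("https://api.aylien.com/api/v1/extract",
#              params={"url": article_url},
#              headers={"X-AYLIEN-TextAPI-Application-ID": APP_ID,
#                       "X-AYLIEN-TextAPI-Application-Key": APP_KEY})
```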

So how can these tags be used?

These extracted tags can be utilized in a number of ways:

To tag or classify a web page

The extracted tags can often give very useful insight into what a page is about. They are often added manually by the author, an editor or the web designer, meaning they can provide very accurate descriptions of a page’s content.

Take, for example, these tags extracted from a Wired article on Artificial Intelligence below.

{
  "author": "Cade Metz",
  "image": "...",
  "tags": [
    "neural networks",
    "Artificial Intelligence"
  ],
  "article": "For almost three weeks, Dong Kim sat at a casino in Pittsburgh and played poker against a machine. But Kim wasn’t just a...",
  "videos": [ ... ],
  "title": "Inside the Poker AI That Out-Bluffed the Best Humans",
  "publishDate": "2017-02-01T07:00:43+00:00",
  "feeds": [ ... ]
}

Classify a page according to a taxonomy

While these extracted tags can be useful in understanding a webpage, they don’t necessarily help if your aim is to classify content based on a particular taxonomy.

The tags can also be used to classify a piece of content or a page into predetermined categories according to a particular taxonomy. For example, using our Classification by Taxonomy feature, these tags enable users to categorize content efficiently.

First, we extract the tags from an Irish Times article on Conor McGregor:

{
  "author": "Emmet Malone",
  "image": "...",
  "tags": [
    "Other Sports",
    "Nate Diaz",
    "Dana White",
    "Conor Mcgregor"
  ],
  "article": "When this imbroglio finally blows over, we can explore what Conor McGregor has against Connecticut. For now, let’s conce...",
  "videos": [ ... ],
  "title": "Conor McGregor lays cards on table in poker game with UFC",
  "publishDate": "2016-04-21T22:48:00+00:00",
  "feeds": [ ... ]
}

You’ll see the tags present in the results above.

Next, we use our Classification by Taxonomy feature to automatically categorize the content. You’ll see from the results below that it is correctly categorized as Sports and Martial Arts.

{
  "text": "UFC, Other Sports, Nate Diaz, Other, Dana White, Sport, Conor Mcgregor",
  "taxonomy": "iab-qag",
  "language": "en",
  "categories": [
    {
      "confident": true,
      "score": 0.22010621132863611,
      "label": "Sports",
      "links": [ ... ],
      "id": "IAB17"
    },
    {
      "confident": true,
      "score": 0.11470804569304427,
      "label": "Martial Arts",
      "links": [ ... ],
      "id": "IAB17-20"
    }
  ]
}
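The tags-to-taxonomy pipeline described above can be sketched in a few lines of Python: join the extracted tags into a single piece of text for the classifier, then keep only the categories it is confident about. The field names follow the JSON examples in this post; everything else is illustrative.

```python
# Sketch of the two-step pipeline: tags -> classifier input -> confident
# category labels. Field names mirror the JSON results shown in the post.

def tags_to_text(tags):
    """Join extracted tags into one snippet for the taxonomy classifier."""
    return ", ".join(tags)

def confident_labels(classification):
    """Keep only categories the classifier marked as confident."""
    return [c["label"] for c in classification["categories"] if c["confident"]]

tags = ["UFC", "Other Sports", "Nate Diaz", "Dana White", "Conor Mcgregor"]
print(tags_to_text(tags))

result = {
    "taxonomy": "iab-qag",
    "categories": [
        {"confident": True, "label": "Sports", "id": "IAB17"},
        {"confident": True, "label": "Martial Arts", "id": "IAB17-20"},
        {"confident": False, "label": "Other", "id": "IAB24"},
    ],
}
print(confident_labels(result))
```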

Classifying text-light web pages

In most NLP-driven page classification problems, you rely heavily on the main body of text present on a page for context and understanding. However, some web pages contain little or no text, which makes them harder to classify or categorize. Common examples include pages that contain only a video, a collection of photos, or product-style pages like the one below.


an example of a product page that is very light on text

As an example, let’s take the page above: a product page from the Best Buy website. You’ll notice there is very little text on the page to target for analysis, apart from a few headings. On top of that, there are lots of other elements, such as ads and buttons, which make the page even harder to scrape. Given how much product pages differ from one another, it’s almost impossible to build an efficient script or bot that will classify these pages successfully.

Using the Tag Extraction feature however means you can leverage other elements of the page as previously explained and shown in the results below. As you can see from the JSON results, the listed tags are very precise, and do an excellent job of describing the page in question.

{
  "author": "",
  "image": "...",
  "tags": [
    "Smart Lights",
    "Switches & Plugs",
    "Smart Home",
    "Home Living",
    "Best Buy Canada",
    "Smart Lighting"
  ],
  "article": "The Hue Phoenix rises to the occasion when you want to create ambience and mood lighting. Using the Philips Hue app, you...",
  "videos": [ ... ],
  "title": "Philips Hue Phoenix Table Lamp - Opal White",
  "publishDate": "",
  "feeds": [ ... ]
}


Whatever your reason for wanting to understand web pages at scale, this new feature provides a fantastic opportunity to dive even deeper into the web content you analyze and to classify a wider variety of web pages.

Want to try it for yourself? Click the image below to sign up for free access and 1,000 calls per day with our Text Analysis API.

Text Analysis API - Sign up



Last week we showed you how we analyzed 2.2 million tweets associated with Super Bowl 51 to gauge the public’s reaction to the event. While the Atlanta Falcons and New England Patriots waged war on the field, a battle of ever-increasing popularity and importance was taking place off it. We are of course referring to the Super Bowl ads battle, where some of the biggest brands on Earth pay top dollar for a 30-second slot during one of sport’s greatest spectacles.

With roughly 35% of the US population tuning in to watch this year’s Super Bowl, it’s easy to see why brands pay what they do to be involved, which is in the region of $5 million for 30 seconds of airtime. Breaking it down, that’s over $166,000 per second!

So after analyzing how Twitter reacted to the game itself, we wanted to once again dive into the much anticipated and scrutinized battle of the brands, this time by looking at both Twitter’s reaction as well as online news content.

Our Process

In particular, we were interested in uncovering and investigating the following key areas:

  • Volume of tweets before, during and after the game
  • Sentiment of tweets before, during and after the game
  • Brand-specific public reactions
  • Brand-specific tweet volumes
  • Reaction from online news
  • Most mentioned brands and individuals in online news

To do so, we looked at both Twitter and the news.


Twitter

We used the Twitter Streaming API to collect a total of around 2.2 million tweets that mentioned a selection of game and team-related keywords, hashtags and handles. Using the AYLIEN Text Analysis API, we then analyzed each of these tweets and visualized our results using Tableau.


News

To analyze the reaction in online news content, we performed specific search queries using the AYLIEN News API, again using Tableau to visualize.

Our Predictions

Prior to the Super Bowl, we looked at and analyzed brand mention volumes in online news in an attempt to predict how the public would react to their ads when they aired during the game.

From our analysis, we selected our top 3 Super Bowl ads to watch out for and made predictions for each. We’ll expand on each prediction below and also look at the performance of some other interesting brands. Here are the brands we analyzed:

  • Pepsi
  • Budweiser
  • Avocados from Mexico
  • Intel
  • KFC
  • Snickers
  • T-Mobile
  • KIA

You can also check out the original blog post here: Using NLP & media monitoring to predict the winners and losers of the Super Bowl 51 ads battle.

So without further ado, let’s see how we did!

Tweet volumes by brand

To begin, we wanted to see which brands performed best in terms of tweet mentions. How many tweets contained a mention of each brand?

The chart below shows how brand-related chatter on Twitter developed before, during and after the game.

Straight away we can see that two brands in particular considerably outperformed the rest when it came to spikes in mention volumes: Pepsi and Avocados from Mexico.

These two brands along with Budweiser make up our top 3, with the others really failing to make much of an impact in comparison.

Perhaps the most interesting observation from this chart is the double volume spike for Pepsi, which came pre-game and mid-game. Let’s take a look at the reason behind this:


Pepsi

Our pre-game predictions

  • Huge Twitter mention volumes for Pepsi, owing to Lady Gaga’s performance.
  • Low mention volumes for LIFEWTR and Pepsi Zero Sugar.
  • Tame public reaction to LIFEWTR commercial and very low YouTube views.


In terms of tweet mention volume, Pepsi was the clear overall winner. The beverage giant focused its efforts on generating awareness around two new products: LIFEWTR and Pepsi Zero Sugar. LIFEWTR was given its own commercial in the first quarter, while Zero Sugar sponsored the halftime show.

Judging by the sheer volume of tweet chatter around Pepsi, you might assume that their ad and new products were well received by the viewing public. However, as we predicted prior to the game, Pepsi’s high mention volume was mostly down to the fact that they sponsored the halftime show starring Lady Gaga. The two spikes visible in the chart below actually have very little to do with either product. Rather, they represent 1) a barrage of pre-game good-luck tweets for Gaga and 2) Twitter’s reaction to the singer’s halftime show performance.

Sentiment analysis of Pepsi tweets

The chart below shows volumes of positive and negative tweets before, during and after the game. It should be noted that the majority of tweets collected have neutral sentiment, and offer no opinion either way. We therefore exclude tweets with neutral sentiment.
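The neutral-exclusion step described above reduces to a simple tally, sketched here in Python. The tweet dicts are illustrative stand-ins for Text Analysis API sentiment results, not real data.

```python
# Sketch: drop neutral tweets and tally positive vs negative volume.
# Tweet dicts are illustrative stand-ins for sentiment-analysis output.

def polarity_volumes(tweets):
    """Count positive and negative tweets, excluding neutral ones."""
    counts = {"positive": 0, "negative": 0}
    for tweet in tweets:
        polarity = tweet["polarity"]
        if polarity in counts:          # neutral tweets are excluded
            counts[polarity] += 1
    return counts

sample = [
    {"text": "great ad!", "polarity": "positive"},
    {"text": "pepsi halftime", "polarity": "neutral"},
    {"text": "not a fan", "polarity": "negative"},
    {"text": "loved it", "polarity": "positive"},
]
print(polarity_volumes(sample))
```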

Less than 5% of tweets mentioning Pepsi also included mentions of LIFEWTR or Zero Sugar, such was the dominance of Lady Gaga on Twitter. While Pepsi had a strong brand presence throughout the event, they perhaps failed to highlight their two new products. The chart below compares mentions of Pepsi and Lady Gaga, with an obvious winner;

Further evidence of Pepsi’s apparent failure to highlight their new products comes from the LIFEWTR ad performance on YouTube, which currently has 1.2 million views. Compared to the likes of KIA and Mr. Clean, whose ads have 21.6 and 14.5 million views respectively, you can see how little impact the commercial had on viewers.

Watch: Pepsi’s Super Bowl ad for LIFEWTR


Budweiser

Our pre-game predictions

  • Most controversial ad this year
  • Ad content will be irrelevant, and a political debate will rage on Twitter


Budweiser’s Super Bowl ad, titled Born The Hard Way, depicts Adolphus Busch, the co-founder of Anheuser-Busch, arriving in the US from Germany with a dream of opening his own beer brewery. With its immigrant theme and opening line, “You don’t look like you’re from around here”, the ad unintentionally fuelled a political debate online: Trump supporters saw it as a clear dig at the President’s planned travel ban, while Trump opponents saw it as a political statement and a celebration of immigrant history in the US.

Sentiment analysis of Budweiser tweets

The high volumes of both positive and negative tweets certainly back up our prediction that Budweiser would air the most controversial ad of Super Bowl 51. While positive sentiment outweighed negative throughout the event, there was clearly a strong split in opinion.

As expected, the ad itself was minimally discussed on Twitter. Rather, it was seen as a political statement, one that many felt they needed to be either for or against. On one side of the fence, we had tweeters threatening never to buy Budweiser products again and using #boycottbudweiser in their tweets.

On the other side, we had people declaring their love for the brand, and encouraging others to go out and buy the beer!

Although Budweiser claims the ad was shot in the summer of 2016, long before the controversy around Trump’s travel ban came to the fore, we are left to wonder whether the timing was mere coincidence or a well-planned publicity stunt.

Watch: Budweiser’s Super Bowl ad “Born The Hard Way”


Snickers

Our pre-game predictions

  • Live format will inspire and drive high social engagement.
  • A popular cast, inclusion of horses and a fun theme will see Snickers near the top of our most liked ads in terms of positive Twitter sentiment.


Snickers made Super Bowl history this year by being the first brand to perform and broadcast their commercial live during the event. With the intrigue of a live performance, as well as the inclusion of superstars like Betty White and Adam Driver, we were excited to see how this one played out, particularly the reaction on social media.

Sentiment analysis of Snickers tweets

While we did pretty well predicting the reactions to both the Pepsi and Budweiser ads, we’ll put our hands up here and admit we got this one wrong!

Overall, mentions of Snickers on Twitter were very low. From the tweets we did gather, the vast majority of them had neutral sentiment, meaning viewers really didn’t feel strongly about the ad either way.

Whether they are well received or not, great ads tend to make people feel something. Unfortunately for Snickers, their innovative approach and live broadcast wasn’t enough to make up for an ad that failed to make viewers feel anything, and ultimately it fell flat.

Watch: Snickers LIVE Super Bowl ad

Lady Gaga vs. the brands

We showed earlier the impact that Lady Gaga had on Pepsi’s tweet volumes, but to really drive home the full extent of Twitter’s reaction to the singer during the Super Bowl, we’ve charted total tweet volumes for each of the three brands we’ve looked at in this post alongside Lady Gaga’s.

Tweet volumes: Lady Gaga, Pepsi, Budweiser & Snickers

Reaction in online news

Now that we’ve looked at some of the public reaction to Super Bowl 51 on Twitter, we wanted to also look at how the news reacted to the event, and in particular the ads battle. To do this, we began by looking at the most mentioned keywords in news stories mentioning “Super Bowl” and “ad” or “commercial”.

Note: We removed obvious, unhelpful and game-related keywords such as Houston, football, Falcons, Tom Brady, etc.

What were the most talked about Super Bowl topics in the news?

The bubble chart below shows the most mentioned topics from online news content in the week immediately following the Super Bowl. The bigger the bubble, the higher the mention count. You can hover over each bubble to view more information and data.

It’s hard to escape politics these days, and the Super Bowl was no exception. With the likes of Budweiser, Airbnb, 84 Lumber, Audi and Coca-Cola all airing ads that related to the current political climate, it is no surprise to see these brands mentioned most, alongside political topics and Donald Trump.

Most mentioned individuals

With brands mostly dominating the previous chart, we decided to narrow our focus to the individuals who were mentioned most. Again, Donald Trump tops the list, followed by Lady Gaga, Melissa McCarthy (KIA ad) and Justin Bieber (T-Mobile ad).


As we touched on in our previous post, the modern-day Super Bowl is becoming increasingly less about the game itself, and more about the surrounding hype, entertainment and commercial opportunities that come with an event of such magnitude.

With top brands spending a minimum of $5 million for a 30 second commercial, what seems like a heavy investment can result in a big increase in brand awareness as viewers promote ads through shares and likes on social media. There is uncapped potential for these ads too. Create something special that connects with, amuses or fascinates viewers and your ad may be viewed and shared for years to come.

Thanks to advancements in Natural Language Processing and Text Analysis, brands can analyze ad performance down to the minutest of details and gain powerful insights in their quest to create commercial content that resonates with viewers.

Text Analysis API - Sign up



Super Bowl 51 had us on the edge of our seats. A dramatic comeback and a shocking overtime finish meant the 111.3 million Americans who tuned into the event certainly got what they came for. Even though TV viewership was down on previous years, the emotional rollercoaster that was Sunday’s game will certainly go down as one of the greatest.

As with any major sporting event, the Super Bowl creates an incredible amount of hype, particularly on Social Media. All of the social chatter and media coverage around the Super Bowl means it’s a fantastic case study in analyzing the voice of fans and their reactions to the event. Using advanced Machine Learning and Natural Language Processing techniques, such as Sentiment Analysis, we are able to understand how fans of both the Patriots and the Falcons collectively felt at any given moment throughout the event.

Not familiar with Sentiment Analysis? Sentiment Analysis is used to detect positive or negative polarity in text and can help you understand the split in opinion from almost any body of text, website or document.

Our process

We used the Twitter Streaming API to collect a total of around 2.2 million tweets that mentioned a selection of game and team-related keywords, hashtags and handles. Using the AYLIEN Text Analysis API, we analyzed each of these tweets and visualized our results using Tableau. In particular, we were interested in uncovering and investigating the following key areas:

  • Volume of tweets before, during and after the game
  • Sentiment of tweets before, during and after the game
  • Team-specific fan reactions
  • The most tweeted players
  • The most popular Super Bowl hashtag

Keyword selection

We focused our data collection on keywords, hashtags and handles that were related to Super Bowl 51 and the two competing teams, including:

#SB51, #superbowl, #superbowlLI, #superbowl51, #superbowl2017, #HouSuperBowl, #Patriots, #NEPatriots, #newenglandpatriots, #Falcons, #AtlantaFalcons.

Once we had collected all of our tweets, we spent some time cleaning and prepping our data set, first by disregarding metadata we didn’t need while keeping key indicators like timestamps, tweet IDs and the raw text of each tweet. We also removed retweets and tweets that contained links: from previous experience, we find that tweets containing links are mostly objective and generally don’t express the author’s opinion about the event.
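The cleaning step just described can be sketched as a small filter in Python. The retweet and link heuristics below are simplifications of what a real pipeline would do (the Twitter API also exposes a retweet flag and link entities directly), and the sample tweets are invented for illustration.

```python
# Sketch of the tweet-cleaning step: drop retweets and tweets containing
# links; keep only timestamp, ID and raw text. Heuristics are simplified.

def clean_tweets(tweets):
    kept = []
    for tweet in tweets:
        text = tweet["text"]
        if text.startswith("RT @"):                   # retweet
            continue
        if "http://" in text or "https://" in text:   # contains a link
            continue
        kept.append({"id": tweet["id"],
                     "created_at": tweet["created_at"],
                     "text": text})
    return kept

sample = [
    {"id": 1, "created_at": "t1", "text": "What a game! #SB51"},
    {"id": 2, "created_at": "t2", "text": "RT @fan: What a game! #SB51"},
    {"id": 3, "created_at": "t3", "text": "Highlights: https://example.com"},
]
print(len(clean_tweets(sample)))   # only the first tweet survives
```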

Tools we used


As with many of our data-driven blog posts, we used Tableau to visualize our results. All visualizations are interactive: hover your mouse over each one to dive deeper into the key data points from which they are generated.

We began our analysis of Super Bowl 51 by looking at the overall volume of tweets in the lead up and during the game.

Tweet volume over time: all tweets

The graph below represents minute-by-minute fluctuations in tweet volumes before, during and after the game. For reference, we’ve highlighted some of the key moments throughout the event with the corresponding spikes in tweet volume.

As you can see, there is a definite and steady increase in tweet volume in the period leading up to the game. From kickoff, it is then all about reactions to in-game highlights, as seen by the sharp spikes and dips in volumes. We’ve also highlighted the halftime period to show you the effect that Lady Gaga’s performance had on tweet volumes.

Let’s now take a closer look at the pre-game period and in particular, fan predictions.

Pre-game tweet volume: #PatriotsWin vs. #FalconsWin

For the past 13 years, video game developers EA Sports have been using their football game ‘Madden NFL’ to simulate and predict the winner of the Super Bowl. Their record now stands at 10-3, in case you were wondering! In recent years, they have also been inviting the Twittersphere to show support for their team by using a designated hashtag in their tweets. For 2017, it was #PatriotsWin vs. #FalconsWin.
So, which set of fans were the most vocal in the 2017 #MyMaddenPrediction battle? We listened to Twitter in the build up to the game for mentions of both hashtags, and here’s what we found;

58.57% of tweets mentioned #FalconsWin while 41.43% went with #PatriotsWin. While the Patriots were firm pre-game favorites, it is likely that the neutral football fan on Twitter got behind the underdog Falcons as they chased their first ever Super Bowl win, in just their second appearance.

Tweet volume over time by team

Now that we’ve seen the overall tweet volume and the pre-game #MyMaddenPrediction volumes, let’s take a look at tweet volumes for each individual team before, during and after the game.
The graph below represents tweet volumes for both teams, with the New England Patriots in the top section and the Atlanta Falcons in the bottom section.

Talk about a game of two halves! That vertical line you can see between the two main peaks represents halftime, and as you can see, Falcons fans were considerably louder in the first half of the game, before the Patriots fans brought the noise in the second half as their team pulled off one of the greatest comebacks in Super Bowl history.

Sentiment analysis of tweets

While tweet volumes relating to either team can be a clear indicator of their on-field dominance during various periods of the game, we like to go a step further and look at the sentiment of these tweets to develop an understanding of how public opinion develops and fluctuates.

The charts below are split into two sections:

Top: Volume of tweets over time, by sentiment (Positive / Negative)

Bottom: Average sentiment polarity over time (Positive / Negative)
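The aggregation behind the bottom chart can be sketched as follows, assuming each analyzed tweet has been reduced to a (minute, polarity score) pair with scores in [-1, 1]; the data here is purely illustrative:

```python
from collections import defaultdict

# Sketch of the bottom chart's aggregation: average sentiment polarity
# per minute, from (minute, score) pairs produced by sentiment analysis.
def average_polarity_by_minute(tweets):
    buckets = defaultdict(list)
    for minute, score in tweets:
        buckets[minute].append(score)
    return {m: sum(scores) / len(scores) for m, scores in sorted(buckets.items())}

tweets = [("23:01", 1), ("23:01", -1), ("23:02", -1), ("23:02", -1)]
by_minute = average_polarity_by_minute(tweets)
print(by_minute)  # {'23:01': 0.0, '23:02': -1.0}
```

The top chart is the same bucketing, but counting positive and negative tweets per minute instead of averaging their scores.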

New England Patriots

What’s immediately clear from the chart above is that, for the majority of the game, Patriots fans weren’t too happy and, it seems, had given up hope. However, as you can see from the gradual increase in positive tweet volume and sentiment in the final third, their mood clearly and understandably changed.

Atlanta Falcons

In stark contrast to the Patriots chart, Falcons fans were producing high volumes of positive tweets for the majority of the game, until the Patriots comeback materialized and their mood took a turn for the worse, as shown by the drop in average sentiment into negative territory.

Most tweeted individuals

To get an understanding of who people were talking about in their tweets, we looked at the most mentioned individuals. Unsurprisingly, Tom Brady was heavily featured after his 5th Super Bowl triumph. However, the most mentioned individual had no part to play in the actual game.

All notable players and scorers (and even Brady himself) were shrugged aside when it came to who viewers were talking about and reacting to most on Twitter, as halftime show performer Lady Gaga dominated. To put the singer’s domination into perspective, she was mentioned in nearly as many tweets as Brady and Falcons quarterback Matt Ryan combined!

To get an idea of the scale of her halftime performance, check out this incredible timelapse;

Interestingly, national anthem singer Luke Bryan was tweeted more than both the Patriots’ Head Coach Bill Belichick and catch-of-the-game winner Julian Edelman. Further proof, if needed, that the Super Bowl is not just about the game of football, but that it is becoming more and more of an entertainment spectacle off the field.

Most popular Super Bowl hashtags

We saw a variety of hashtags emerge for the Super Bowl this year, so we decided to see which were the most used. Here are the top 5 most popular Super Bowl hashtags, which we have visualized with volumes below:






Despite the NFL’s best efforts to get Twitter using #SB51, the most obvious and simple hashtag of #SuperBowl was a clear winner.
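Tallying hashtags like this is straightforward once the raw tweet text is in hand. A minimal sketch, with made-up tweet texts:

```python
from collections import Counter
import re

# Sketch of the hashtag tally: extract hashtags from raw tweet text and
# count them case-insensitively to find the most used.
def top_hashtags(texts, n=5):
    tags = Counter()
    for text in texts:
        tags.update(tag.lower() for tag in re.findall(r"#\w+", text))
    return tags.most_common(n)

texts = ["Here we go #SuperBowl #SB51", "What a game! #superbowl", "#SB51 kickoff"]
top = top_hashtags(texts, 2)
print(top)  # [('#superbowl', 2), ('#sb51', 2)]
```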


There is no other event on the planet that creates as much hype in the sporting, advertising and entertainment worlds. But the Super Bowl as we know it today is far less about the football and more about the entertainment factor and commercial opportunity. With big brands spending a minimum of $5 million for a 30-second commercial, and competing not just for viewers’ eyes but, more importantly, for viewers’ promotion through shares and likes on social media, the Super Bowl has become big business.

In our next installment, we’ve analyzed the chatter around Super Bowl 51 from a branding point of view. We collected and analyzed Twitter data and news and media coverage of the event to pinpoint which brands and commercials joined the Patriots as Super Bowl 51 champions.

Text Analysis API - Sign up



We’re just two days away from seeing the Atlanta Falcons and New England Patriots go head to head at Super Bowl 51 in Texas. With an anticipated viewership of over 100 million people, it’s no surprise that some of the world’s biggest brands are pulling out all the stops in an attempt to win a much anticipated off-field battle. We are of course talking about the annual Super Bowl ads battle, where top brands are willing to cough up over $5 million for just 30 seconds of TV airtime.

Sentiment Analysis of tweets from Super Bowl 2016

Last year, we analyzed 1.8 million tweets during Super Bowl 50 to uncover the best, the worst, and the most controversial ads according to Twitter users. Using advanced Sentiment Analysis techniques, we were able to uncover that Amazon’s star-studded effort was the most popular ad at Super Bowl 50, earning the highest volume of positive tweets. PayPal, on the other hand, found themselves at the opposite end of the positivity scale, receiving the highest volume of negative tweets. And the most controversial? We had a clear winner in that category with Mountain Dew’s Puppy Monkey Baby shocking, confusing and amusing viewers in equal measure!

Of course, it’s not all about those 30 seconds of TV airtime. Brands that create something memorable can reap the rewards long after the final whistle has blown. Popular ads can go viral in minutes, with those that fail to impress being left behind and quickly forgotten. Just take a look at the YouTube views for these three brand ads since Super Bowl 50;

YouTube views since Super Bowl 50

With close to 30 million YouTube hits, it’s safe to say that Mountain Dew did pretty well from their wacky creation last year! For PayPal on the other hand, it was back to the drawing board with an expensive disappointment.

Watch: Mountain Dew’s Super Bowl 50 ad “Puppymonkeybaby”

Note: In this post, which is part 1 of a 3-part series, we’re going to focus on the hype surrounding the ads battle in the lead up to the big game. Check back for parts 2 and 3, where we’ll dive into the in-game reaction on social media and how the brands fared in the press reaction after the event.

The most anticipated ads of Super Bowl 51

This year, as well as once again analyzing millions of tweets to uncover the good, the bad and the ugly among Super Bowl 51 commercials (check back next week for that one!), we thought it would be cool to find out which brands are receiving the most media attention in the lead up to the event.

Using the AYLIEN News API, we sourced and analyzed thousands of news stories that mentioned keywords relating to the Super Bowl and the brands advertising throughout it. From these stories, and using the power of Natural Language Processing in our Text Analysis engine, we were able to uncover which brands have been mentioned most in news stories in the lead up to the event.

The top 15 most mentioned brands

The bubble chart below represents the 15 brands that have received the most mentions in Super Bowl commercial-related news content since January 1. The bigger the bubble, the higher the mention volume:

Right away we can see a clear leader in Budweiser, who received 50% more mentions than the second most mentioned brand, Pepsi. Why are Budweiser receiving so much attention? Well, much like Mountain Dew last year, controversy is proving to be a key factor, as we’re about to show you.

Want to track mentions and get intelligent, NLP-driven insights into the world’s news content? Sign-up for a free 14 day trial of our News API and get started!

Our top 3 Super Bowl commercials to watch out for

Having uncovered the top 15 most mentioned brands, we thought we would put our necks on the line by selecting three of these brands that we believe will make the biggest splash on social media during Super Bowl 51.


In an attempt to better understand the reasoning behind the hype around Budweiser, we analyzed all news stories mentioning “Super Bowl” and “Budweiser” to see what other topics were present in the collection of articles. From our results we removed keywords relating to the football game itself, as well as obvious brand-related words such as Bud, Anheuser-Busch, beer, etc. The topics that remained quickly gave us an indication of why this ad is proving to be controversial in the US;

Topics extracted from stories mentioning “Super Bowl” and “Budweiser”
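The filtering step described above can be sketched as a keyword tally with an exclusion list. The story keywords below are invented for illustration, and the exclusion set is just a sample of the game- and brand-related terms we removed:

```python
from collections import Counter

# Sketch of the topic-filtering step: tally keywords across stories,
# dropping game-related and obvious brand-related terms before ranking.
EXCLUDED = {"super bowl", "budweiser", "bud", "anheuser-busch", "beer", "football"}

def top_topics(story_keywords, n=10):
    counts = Counter()
    for keywords in story_keywords:
        counts.update(k.lower() for k in keywords if k.lower() not in EXCLUDED)
    return [topic for topic, _ in counts.most_common(n)]

stories = [
    ["Budweiser", "immigration", "Adolphus Busch", "beer"],
    ["Super Bowl", "immigration", "commercial"],
]
topics = top_topics(stories, 3)
print(topics)  # ['immigration', 'adolphus busch', 'commercial']
```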

Coincidence, or political statement?

Budweiser’s commercial preview, titled Born The Hard Way, shows Adolphus Busch, the co-founder of Anheuser-Busch, arriving in the US from Germany with the dream of opening his beer brewery. With the commercial’s immigrant theme and its opening line of dialogue, “You don’t look like you’re from around here”, thoughts of a political statement quickly spring to mind.

Watch: Budweiser’s Super Bowl ad preview “Born The Hard Way”

Despite Budweiser vice-president, Ricardo Marques, stating that “There’s really no correlation with anything else that’s happening in the country”, news outlets and social media commentators beg to differ, with a strong split in opinion quickly forming. We’re even seeing the spread of #BoycottBudweiser across many tweets.

Whether intentional or not, Budweiser have placed themselves firmly at the center of a fiery debate on immigration, and it will be fascinating to see the public reaction to their main showpiece on Sunday.

Our Budweiser prediction

  • Most controversial ad this year
  • Ad content will be irrelevant, and a political debate will rage on Twitter


Snickers will make Super Bowl history this year by being the first brand to perform and broadcast their commercial live during the event.

While Snickers have released a number of small teaser-style previews with a western-theme, we’re still not sure exactly how this one is going to play out.

Watch: Snickers’ Super Bowl ad teaser

With the intrigue of a live performance, as well as the inclusion of superstars like Betty White and Adam Driver, we’re excited to see how this one goes, particularly the reaction on social media.

Live commercial, live Twitter reaction

The world’s first live Super Bowl commercial presents us with the opportunity to track public reaction before, during and after the performance. While we’ll be tracking and analyzing the reaction to all of our top 15 ads, the uniqueness of Snickers’ live commercial brings a whole new level of insight into the tracking of public opinion. Judging by the teasers, it appears that Snickers are going for a wild west-style performance with horses, celebrities and a number of performers.

The big question is, how will social media respond to a real-time, potentially unpolished and unpredictable live performance? We can’t wait to find out!

Our Snickers prediction

  • Live format will inspire and drive high social engagement.
  • A popular cast, inclusion of horses and a fun theme will see Snickers near the top of our most liked ads in terms of positive Twitter sentiment.

Want to track Twitter reactions yourself? Build your own sentiment analysis tool in just 10 minutes. No coding required, and it’s free 🙂


Our second most mentioned brand, Pepsi, is investing heavily in Super Bowl 51 with commercials for two products, as well as sponsoring the 12-minute Halftime Show.

For Pepsi, the main aim is to generate awareness of two new products: LIFEWTR and Pepsi Zero Sugar. Have they been successful in this regard so far? While our post-game analysis will give us a better indication of the overall success of their campaign, we can perhaps already say that these two products are being somewhat overshadowed.

Here are the top keywords from stories mentioning “Super Bowl” and “Pepsi”, excluding game-related and obvious brand-related keywords such as Houston, PepsiCo, football, etc.

Topics extracted from stories mentioning “Super Bowl” and “Pepsi”

If you weren’t aware of who was performing during the Super Bowl Halftime Show, now you are! Lady Gaga is absolutely dominating in terms of media mentions, and Pepsi’s high mention volume is most definitely a result of the singer’s involvement in the Halftime Show that they just happen to be sponsoring.

Perhaps worryingly for Pepsi, we saw no mention of LIFEWTR or Pepsi Zero Sugar in our top 100 keyword results.

Watch: Pepsi Super Bowl 51 ad “Inspiration Drops”

Last year, PayPal were accused of playing it safe when it came to their Super Bowl ad. Have Pepsi made the same mistake with LIFEWTR?

Our Pepsi prediction

  • Huge Twitter mention volumes for Pepsi, owing to Lady Gaga’s performance.
  • Low mention volumes for LIFEWTR and Pepsi Zero Sugar.
  • Tame public reaction to LIFEWTR commercial and very low YouTube views.

Who will be the winners and losers at Super Bowl 51?

We’ll be listening to and analyzing news and social media content before, during and after Super Bowl 51 to bring you our annual insights into public and media reaction to both the game itself and the ads battle, so check back next week to find out who were the biggest winners and losers!

Happy Super Bowl weekend to you all 🙂

News API - Sign up



With our News API, our goal is to make the world’s news content easier to collect, monitor and query, just like a database. We leverage Machine Learning and Natural Language Processing to process, normalize and analyze this content, giving you access to rich, high-quality metadata and powerful filtering capabilities that ultimately help you find precise, targeted stories with ease.

To this end, we have just launched a cool new feature, Real-time monitoring. Real-time monitoring allows you to further automate your collection and analysis of the world’s news content by creating tailored searches that source and automatically retrieve highly-relevant news stories, as soon as they are published.


You can read more about our latest feature – which is now also available in our News API SDKs – below.

Real-time monitoring

With Real-time monitoring enabled you can automatically pull stories as they are published, based on your specific search query. Users who rely on having access to the latest stories as soon as they are published, such as news aggregators and news app developers for example, should find this new feature particularly interesting.

The addition of this powerful new feature will help ensure that your app, webpage or news feed is bang up to date with the latest and most relevant news content, without the need for manual searching and updating.

Newly published stories can be pulled every minute (configurable), and duplicate stories in subsequent searches will be ignored. This ensures you are only getting the most recent publications, rather than a repeat of what has come before.


We have created code in seven different programming languages to help get you started with Real-time monitoring, each of which can be found below, as well as in our documentation.

NB: Real-time monitoring will only work when you set the sort_by parameter to published_at and sort_direction to desc.
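As a minimal sketch of the polling-and-deduplication pattern (not our official SDK code), the logic looks like the following. The real News API call is stubbed out here by a fake fetch function so the example is self-contained; the `PARAMS` dict shows the required `sort_by` and `sort_direction` settings:

```python
# Minimal sketch of a Real-time monitoring loop: poll for the newest
# stories and skip any already seen. `fetch_stories` stands in for a real
# News API stories call with sort_by=published_at, sort_direction=desc.
PARAMS = {"title": "Super Bowl", "sort_by": "published_at", "sort_direction": "desc"}

def poll_once(fetch_stories, seen_ids):
    fresh = [s for s in fetch_stories(PARAMS) if s["id"] not in seen_ids]
    seen_ids.update(s["id"] for s in fresh)
    return fresh

# Fake fetch simulating two polls one minute apart; story 101 reappears
# in the second batch and is deduplicated.
batches = iter([
    [{"id": 101, "title": "Kickoff"}, {"id": 100, "title": "Preview"}],
    [{"id": 102, "title": "Halftime"}, {"id": 101, "title": "Kickoff"}],
])
fake_fetch = lambda params: next(batches)

seen = set()
first = poll_once(fake_fetch, seen)
second = poll_once(fake_fetch, seen)
print([s["id"] for s in first], [s["id"] for s in second])  # [101, 100] [102]
```

In production you would run `poll_once` on a timer (e.g. every minute) against the live endpoint and feed the fresh stories into your app or feed.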


The main benefit of this cool new feature is that you can be confident you are receiving the very latest stories and insights, without delay, by creating an automated process that will continue to retrieve relevant content as soon as it is published online. By automating the retrieval of content in real-time, you can cut down on manual input and generate feeds, charts and graphs that will automatically update in real-time.

We hope that you find this new update useful, and we would love to hear any feedback you may have.

To start using our News API for free and query the world’s news content easily, click the image below.


News API - Sign up



In this post, Software Engineer at AYLIEN, Afshin Mehrabani talks us through his experience in building and launching an incredibly popular open source library called Intro.js.

Four years ago, when I started Intro.js as a weekend project, I knew it was a useful idea, but I didn’t expect the likes of Microsoft, IBM, SAP and Amazon to eventually use what I had built.

In this post I’ll talk about what sparked the idea for Intro.js, why I open sourced it and what I have learned in the process.

The idea

I originally started Intro.js to use it on a particular product I was working on at the time. I was working as part of a web development team building web-based stock exchange software. The system had many user facing components, tools and widgets and it was a little confusing to use. This meant the support team we worked with spent most of their time on phone calls guiding people on how to use the software, find options, complete trades etc. instead of dealing with the bugs and issues our users had.

Things got even worse when we completely changed the user interface. The support team were working almost exclusively on user interface-related queries and spending a lot of time guiding users through critical workflows. As you can imagine, having our support team answer the same questions and run thousands of customers through a guided tour just wasn’t efficient.

As part of the development team, we had to ensure the UI/UX was kept simple and straightforward enough for users to understand the workflow. I know this is a common problem in software design, but for some web-based portals like this one, it’s an almost impossible task.

Imagine the UI for stock exchange software: it’s made up of many critical moving parts that are paramount not just to the user’s experience but to the software carrying out the job at hand.


Metatrader screenshot

Financial software example – source:

These UIs are busy and complicated, and for the most part they need to be. If an object is included in the UI, it’s critical to the job at hand. You can’t eliminate any objects from the page, yet you still need to make it clear enough for users to interact with.

I decided there had to be a better way to improve the usability of the product that didn’t rely on hours and hours of support time. That’s when I came up with the idea to develop a JavaScript and CSS tool that demonstrates the software and its components to the user using a step by step guide. The whole idea is based around a user starting a guided tour as they use the product. This ensures a user sees all the important elements and workflows that are critical to how they use the software.

Introducing Intro.js

The pictures below illustrate some sample steps of a simple Intro.js guide, with the highlighted component, a description of the step and control buttons.


Intro.js 1

Intro.js 3


Through the control buttons a user can decide to go to the next step, previous step or close the guide.

Although it was a relatively simple idea, after implementing the original version of Intro.js on that project I realised what I had built could benefit a lot of people facing the same problem we had, which is why I decided to open-source it and let the community use and add to it, while of course making some improvements to the code base.

I have now open-sourced almost all of the useful components I built. Not only has this helped a lot of people save time and money, but we’ve also built an active and vibrant community around Intro.js who help maintain the project, add more features, fix bugs and everything else that needs to be done.

In the last year, I have been building on the original offering. Most recently adding the Hints feature. Using this feature, you can add clickable circles to different elements of the webpage with extra descriptions:


Intro.js Hints

Intro.js Hints 2


Bringing it to the masses

So I had built and validated my idea in a live project and I was pretty sure there was demand out there for something like Intro.js but the challenge was how would I get this into the hands of other developers.

The open source community

After releasing the first version of Intro.js, I created a simple HTML page to demonstrate what it does. I suppose I didn’t think too much about how I was going to find users but I knew from already being actively involved in the open source community that if something was truly useful it would spread and gather pace through the community eventually. I had no idea though that within a couple of days I’d be struggling to keep up with the demand.

To try and get some exposure, I decided to submit a post on Hacker News as well as some other dev-focused outlets, none of which had the effect of Hacker News. Within a couple of hours of the post going live it was gathering some serious interest. Over a couple of days the post received about 800 upvotes and 160 comments, and it stayed on the homepage of Hacker News for two consecutive days.

During these two days, I got about 100 pull requests merged and after a month released some other minor versions to address the initial version issues and bugs.

Intro.js was a real hit and I never expected the interest it gathered from users and the hard work of other developers who helped the project by fixing issues, adding features and writing plugins for Angular.js, React, jQuery, etc. However, in saying that, it still probably took about a year to release a stable version of Intro.js that I was happy with.

Since the launch, we’ve had some really great companies and developers using Intro.js; it’s been featured on some pretty cool blogs and outlets like SitePoint and IBM developerWorks, and it has collected nearly 15,000 stars on GitHub.

Maintaining an open-source project

Things get more complicated when you have many, many users. You have to be responsible for the product you provide, answer questions, maintain code and merge pull requests, all while also adding more features to the library.

Juggling an open source project, my job at AYLIEN (we’re hiring, BTW 😉) and a Masters course at university became increasingly difficult, and finding new maintainers to work with has also proven quite difficult. To keep Intro.js alive, I needed to look at what options were available.


I was unable to devote all of my time to developing and maintaining an open-source project. It was clear I needed to hire some more people, mainly web developers, to work on customer queries, fix issues and maintain the project.

This is why I decided to add a commercial license to Intro.js. I was reluctant to do so in the beginning, but after some careful consideration, I’m glad I did. This decision has helped the project a lot: the codebase has improved significantly since, we can now properly serve our users, and we can invest more time in adding features and keeping our users happy.

Adding a commercial license to an open-source project motivates the creators and maintainers to keep developing the project and release more awesome, useful versions. Since moving to a commercial license we have added a documentation section using Doc42 and moved the Markdown files out of GitHub. Moreover, Intro.js now has an active tag on StackOverflow, where developers can tag their questions with Intro.js and get answers from the awesome community.

Note: There are many open-source licensing options available. You can read more about them here.

What’s next?

We are going to release v3.0 soon which will add responsive tours and many more features. Meanwhile, we are fixing more issues and bugs and v2.5.0 will be released in the next two weeks.

Besides this open-source project, I have developed many other JavaScript projects and they are available here.


I have learned a lot more than I could have imagined by building and maintaining a real open-source project. I have gained invaluable experience in answering user queries, releasing versions, licensing products and so on, and for that reason alone I can say for sure that open-sourcing the project was the best decision I made.

Intro.js was a weekend project that turned into a real product, and I hope this post gives you the push to go and build something yourself and contribute to the movement, or better still, to open-source something you’ve already built that you think appeals to the masses. Publishing and releasing the first stable version of an open-source project could take you more than a year, but during this process, I promise, you will learn a lot and have a lot of fun doing it. And who knows, you may even get a nice Thank You email from a tech giant ;).

Text Analysis API - Sign up


Our researchers at AYLIEN keep abreast of and contribute to the latest developments in the field of Machine Learning. Recently, two of our research scientists, John Glover and Sebastian Ruder, attended NIPS 2016 in Barcelona, Spain. In this post, Sebastian highlights some of the stand-out papers and trends from the conference.


The Conference on Neural Information Processing Systems (NIPS) is one of the two top conferences in machine learning. It took place for the first time in 1987 and is held every December, historically in close proximity to a ski resort. This year, it took place in sunny Barcelona. The conference (including tutorials and workshops) went on from Monday, December 5 to Saturday, December 10. The full conference program is available here.

Machine Learning seems to become more pervasive every month. However, it is still sometimes hard to keep track of the actual extent of this development. One of the most accurate barometers for this evolution is the growth of NIPS itself. The number of attendees skyrocketed at this year’s conference growing by over 50% year-over-year.


Image 1: The growth of the number of attendees at NIPS follows (the newly coined) Terry’s Law (named after Terrence Sejnowski, the president of the NIPS foundation; faster growth than Moore’s Law)

Unsurprisingly, Deep Learning (DL) was by far the most popular research topic, with about one in four of the more than 2,500 submitted papers (and of the 568 accepted papers) dealing with deep neural networks.


Image 2: Distribution of topics across all submitted papers (Source: The review process for NIPS 2016)

On the other hand, the distribution of research paper topics has quite a long tail and reflects the diversity of topics at the conference that span everything from theory to applications, from robotics to neuroscience, and from healthcare to self-driving cars.

Generative Adversarial Networks

One of the hottest developments within Deep Learning was Generative Adversarial Networks (GANs). These minimax game-playing networks have by now won the favor of many luminaries in the field. Yann LeCun hails them as the most exciting development in ML in recent years. The organizers and attendees of NIPS seemed to side with him: NIPS featured a tutorial by Ian Goodfellow on his brainchild, which drew a packed main conference hall.


Image 3: A full conference hall at the GAN tutorial

Though a fairly recent development, there are many cool extensions of GANs among the conference papers:

  • Reed et al. propose a model that allows you to specify not only what you want to draw (e.g. a bird) but also where to put it in an image.
  • Chen et al. disentangle factors of variation in GANs by representing them with latent codes. The resulting models allow you to adjust e.g. the type of a digit, its rotation and width, etc.

In spite of their popularity, we know alarmingly little about what makes GANs so capable of generating realistic-looking images. In addition, making them work in practice is an arduous endeavour, and a lot of (undocumented) hacks are necessary to achieve the best performance. Soumith Chintala presented a collection of these hacks in his “How to train your GAN” talk at the Adversarial Training workshop.


Image 4: How to train your GAN (Source: Soumith Chintala)

Yann LeCun muses in his keynote that the development of GANs parallels the history of neural networks themselves: They were poorly understood and hard to get to work in the beginning and only took off once researchers figured out the right tricks and learned how to make them work. At this point, it seems unlikely that GANs will experience a winter anytime soon; the research community is still at the beginning in learning how to make the best use of them and it will be exciting to see what progress we can make in the coming years.

On the other hand, the success of GANs so far has been limited mostly to Computer Vision due to their difficulty in modelling discrete rather than continuous data. The Adversarial Training workshop showcased some promising work in this direction (see e.g. our own John Glover’s paper on modeling documents, this paper and this paper on generating text, and this paper on adversarial evaluation of dialogue models). It remains to be seen if 2017 will be the year in which GANs break through in NLP.

The Nuts and Bolts of Machine Learning

Andrew Ng gave one of the best tutorials of the conference with his take on building AI applications using Deep Learning. Drawing on his experience of managing the 1,300-person AI team at Baidu and hundreds of applied AI projects, and equipped solely with two whiteboards, he shared many insights about how to build and deploy AI applications in production.

Besides better hardware, Ng attributes the success of Deep Learning to two factors: first, in contrast to traditional methods, deep NNs are able to learn more effectively from large amounts of data; secondly, end-to-end (supervised) Deep Learning allows us to learn to map from inputs directly to outputs.

While this approach to training chatbots or self-driving cars is sufficient to write innovative research papers, Ng emphasized that end-to-end DL is often not production-ready: a chatbot that maps from text directly to a response is not able to have a coherent conversation or fulfill a request, while mapping from an image directly to a steering command might have literally fatal side effects if the model has not encountered the corresponding part of the input space before. Rather, for a production model, we still want to have intermediate steps: for a chatbot, we prefer to have an inference engine that generates a response, while in a self-driving car, DL is used to identify obstacles, while the steering is performed by a traditional planning algorithm.


Image 5: Andrew Ng on end-to-end DL (right: end-to-end DL chatbot and chatbot with inference engine; left bottom: end-to-end DL self-driving car and self-driving car with intermediate steps)

Ng also shared that the most common mistake he sees in project teams is tracking the wrong metrics: In an applied machine learning project, the only relevant metrics are the training error, the development error, and the test error. These metrics alone tell the project team what steps to take, as he demonstrated in the diagram below:


Image 6: Andrew Ng’s flowchart for applied ML projects
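In pseudocode, the flowchart boils down to checking the three errors in order. The sketch below is our reading of the talk; the recommended actions come from Ng, but the function itself and the target threshold are hypothetical:

```python
def next_step(train_err, dev_err, test_err, target=0.05):
    """A sketch of Ng's flowchart for applied ML projects (our reading)."""
    if train_err > target:
        # High bias: the model cannot even fit the training data.
        return "bigger model, train longer, or new architecture"
    if dev_err > target:
        # High variance: the model does not generalize to held-out data.
        return "more data, regularization, or new architecture"
    if test_err > target:
        # The dev set has been overfit through repeated model selection.
        return "get a bigger dev set"
    return "done"
```

The point of keeping the checks in this order is that each error bounds the next: fixing bias before variance avoids chasing generalization problems with a model that cannot even fit its training data.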

A key facilitator of the recent success of ML has been the advances in hardware that allowed faster computation and storage. Given that Moore’s Law will reach its limits sooner or later, one might reason that the rise of ML might plateau as well. Ng, however, argued that the commitment of leading hardware manufacturers such as NVIDIA and Intel, and the ensuing performance improvements to ML hardware, will fuel further growth.

Among ML research areas, supervised learning is the undisputed driver of the recent success of ML and will likely continue to drive it for the foreseeable future. In second place, Ng saw neither unsupervised learning nor reinforcement learning, but transfer learning. We at AYLIEN are bullish on transfer learning for NLP and think that it has massive potential.

Recurrent Neural Networks

The conference also featured a symposium dedicated to Recurrent Neural Networks (RNNs). The symposium coincided with the 20-year anniversary of LSTM…


Image 7: Jürgen Schmidhuber kicking off the RNN symposium

… being rejected from NIPS 1996. The fact that papers not using LSTMs have become rare at recent NLP conferences (see our EMNLP blog post) is a testament to the perseverance of the authors of the original paper, Sepp Hochreiter and Jürgen Schmidhuber.

At NIPS, several papers sought to improve RNNs in different ways, while other improvements apply to Deep Learning in general:

  • Salimans and Kingma propose Weight Normalisation, a reparameterization that accelerates training and can be applied in two lines of Python code.
  • Li et al. propose a multinomial variant of dropout that sets neurons to zero depending on the data distribution.
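The Weight Normalisation trick really is that small: each weight vector w is reparameterized as a learned scalar magnitude g times the direction v/‖v‖, so magnitude and direction are optimized separately. A minimal NumPy sketch (our own illustration, not the authors’ code):

```python
import numpy as np

def weight_norm(v, g):
    # Weight Normalisation: w = g * v / ||v||, decoupling the weight
    # vector's direction (v) from its learned magnitude (g).
    return g * v / np.linalg.norm(v)

w = weight_norm(np.array([3.0, 4.0]), g=2.0)
# ||w|| equals g exactly, whatever the norm of v
```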

The Neural Abstract Machines & Program Induction (NAMPI) workshop also featured several speakers talking about RNNs:

  • Alex Graves focused on his recent work on Adaptive Computation Time (ACT) for RNNs, which decouples the processing time from the sequence length. He showed that a word-level language model with ACT can reach state-of-the-art performance with less computation.
  • Edward Grefenstette outlined several limitations and potential future research directions in the context of RNNs in his talk.

Improving classic algorithms

While Deep Learning is a fairly recent development, the conference also featured several improvements to algorithms that have been around for decades:

  • Ge et al. show in their best paper that the non-convex objective for matrix completion has no spurious local minima, i.e. every local minimum is a global minimum.
  • Bachem et al. present a method that guarantees accurate and fast seedings for large-scale k-means++ clustering. The presentation was one of the most polished ones of the conference and the code is open-source and can be installed via pip.
  • Ashtiani et al. show that NP-hard k-means clustering problems become solvable if the model is allowed to query a domain expert about a few examples.
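For context, the exact seeding step that Bachem et al. speed up looks like this in plain NumPy. This is a sketch of classic k-means++ seeding, not their MCMC-based approximation:

```python
import numpy as np

def kmeans_pp_seeding(X, k, seed=0):
    """Classic k-means++ seeding: each new center is sampled with
    probability proportional to its squared distance from the nearest
    center chosen so far. Bachem et al. approximate this sampling
    step with MCMC to make it fast on large datasets."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

# Two well-separated clusters of duplicated points: the second seed must
# land in the cluster the first seed missed, since duplicates of the
# first seed have zero squared distance and hence zero probability.
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 10.0)])
centers = kmeans_pp_seeding(X, k=2)
```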

Reinforcement Learning

Reinforcement Learning (RL) was another much-discussed topic at NIPS with an excellent tutorial by Pieter Abbeel and John Schulman dedicated to RL. John Schulman also gave some practical advice for getting started with RL.

One of the best papers of the conference introduces Value Iteration Networks, which learn to plan by providing a differentiable approximation to a classic planning algorithm via a CNN. This paper was another cool example of one of the major benefits of deep neural networks: They allow us to learn increasingly complex behaviour as long as we can represent it in a differentiable way.
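The key observation behind Value Iteration Networks is that the classic value-iteration update, a linear backup over neighbouring states followed by a max over actions, is built entirely from differentiable operations, so it can be embedded in a CNN and trained by backprop. A toy 1-D grid-world sketch of that update (our own illustration, not the paper’s implementation):

```python
import numpy as np

def value_iteration_step(V, R, gamma=0.9):
    # Backup: value reachable by moving left or right on a ring of states
    # (the convolution-like part), then a max over the two actions.
    left = np.roll(V, 1)
    right = np.roll(V, -1)
    return R + gamma * np.maximum(left, right)

# Iterating to convergence: the state holding the reward ends up most valuable.
R = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
V = np.zeros_like(R)
for _ in range(200):
    V = value_iteration_step(V, R)
```

In the actual VIN, the hand-written transition structure above is replaced by learned convolution filters, which is what makes the planner trainable end-to-end.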

During the week of the conference, several research environments for RL were simultaneously released, among them OpenAI’s Universe, DeepMind Lab, and FAIR’s TorchCraft. These will likely be a key driver in future RL research and should open up new research opportunities.

Learning-to-learn / Meta-learning

Another topic that came up in several discussions over the course of the conference was Learning-to-learn or Meta-learning:

  • Andrychowicz et al. learn an optimizer in a paper with the ingenious title “Learning to learn by gradient descent by gradient descent”.
  • Vinyals et al. learn how to one-shot learn in a paper that frames one-shot learning in the sequence-to-sequence framework and has inspired new approaches to one-shot learning.

Most of the existing papers on meta-learning demonstrate that wherever a computation gives you gradients, you can optimize that computation with another algorithm, itself trained via gradient descent. Prepare for a surge of “Meta-learning for X” and “(Meta-)+learning” papers in 2017. It’s LSTMs all the way down!
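The “gradients all the way down” idea fits in a few lines. In this toy sketch (an entirely hypothetical setup of our own, far simpler than the LSTM optimizers in the papers above), the inner learner minimizes f(w) = w² with a step size eta, and eta itself is trained by gradient descent on the post-update loss:

```python
def inner_update(w, eta):
    # One inner gradient step on f(w) = w**2, whose gradient is 2*w.
    return w - eta * 2 * w

def meta_grad(w, eta):
    # d/d_eta of f(inner_update(w, eta)), computed analytically:
    # f(w') = w'**2, so df/d_eta = 2 * w' * dw'/d_eta = 2 * w' * (-2 * w).
    w_new = inner_update(w, eta)
    return 2 * w_new * (-2 * w)

eta = 0.0
for _ in range(100):
    eta -= 0.1 * meta_grad(w=1.0, eta=eta)
# eta converges to 0.5, the optimal one-step size for f(w) = w**2
```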

Meta-learning was also one of the key talking points at the RNN symposium. Jürgen Schmidhuber argued that a true meta-learner would be able to learn in the space of all programs and would have the ability to modify itself, and elaborated on these ideas in his talk at the NAMPI workshop. Ilya Sutskever remarked that we currently have no good meta-learning models. However, there is hope, as the plethora of new research environments should also bring progress in this area.

General Artificial Intelligence

Learning how to learn also plays a role in the pursuit of the elusive goal of attaining General Artificial Intelligence, which was a topic in several keynotes. Yann LeCun argued that in order to achieve General AI, machines need to learn common sense. While common sense is often vaguely mentioned in research papers, Yann LeCun gave a succinct explanation of what common sense is: “Predicting any part of the past, present or future percepts from whatever information is available.” He called this predictive learning, but noted that this is really unsupervised learning.

His talk also marked an appearance of the controversial, and often tongue-in-cheek copied, cake slide, which he used to argue that unsupervised learning is the most challenging task on which we should concentrate our efforts, while RL is only the cherry on the cake.


Image 8: The Cake slide of Yann LeCun’s keynote

Drew Purves focused on the bilateral relationship between the environment and AI in what was probably the most aesthetically pleasing keynote of the conference (just look at those graphics!)


Image 9: Graphics by Max Cant of Drew Purves’ keynote (Source: Drew Purves)

He emphasized that while simulations of ecological tasks in naturalistic environments could be an important test bed for General AI, General AI in turn is needed to maintain the biosphere in a state that will allow the continued existence of our civilization.


Image 10: Nature needs AI and AI needs Nature from Drew Purves’ keynote

While it is frequently (and incorrectly) claimed that neural networks work so well because they emulate the brain’s behaviour, Saket Navlakha argued during his keynote that we can still learn a great deal from the brain’s engineering principles. For instance, rather than pre-allocating a large number of neurons, the brain generates thousands of synapses per minute up to its second year. The number of synapses is then pruned, decreasing by about 50% by adolescence.


Image 11: Saket Navlakha’s keynote

It will be interesting to see how neuroscience can help us to advance our field further.

In the context of the Machine Intelligence workshop, another environment was introduced in the form of FAIR’s CommAI-env, which allows agents to be trained through interaction with a teacher. During the panel discussion, the ability to learn hierarchical representations and to identify patterns was emphasized. However, although the field is making rapid progress on standard tasks such as object recognition, it is unclear whether the focus on such specific tasks actually brings us closer to General AI.

Natural Language Processing

While NLP is more of a niche topic at NIPS, there were a few papers with improvements relevant to NLP:

  • He et al. propose a dual learning framework for MT that has two agents translating in opposite directions teaching each other via reinforcement learning.
  • Sokolov et al. explore how to use structured prediction under bandit feedback.
  • Huang et al. extend Word Mover’s Distance, an unsupervised document similarity metric, to the supervised setting.
  • Lee et al. model the helpfulness of reviews by taking into account position and presentation biases.

Finally, a workshop on learning methods for dialogue explored how end-to-end systems, linguistics and ML methods can be used to create dialogue agents.



Jürgen Schmidhuber, the father of the LSTM, was not only present on several panels, but also did his best to remind everyone that whatever your idea, he had had a similar idea two decades ago, and you had better cite him lest he interrupt your tutorial.



Boston Dynamics’ Spot proved that — even though everyone is excited by learning and learning-to-learn — traditional planning algorithms are enough to win the admiration of a hall full of learning enthusiasts.


Image 12: Boston Dynamics’ Spot amid a crowd of fascinated onlookers


Apple, one of the most secretive companies in the world, has decided to be more open: to publish and to engage with academia. This can only be good for the community. We’re looking forward to more Apple research papers.


Image 13: Ruslan Salakhutdinov at the Apple lunch event


Uber announced their acquisition of Cambridge-based AI startup Geometric Intelligence and threw one of the most popular parties of NIPS.


Image 14: The Geometric Intelligence logo

Rocket AI

Talking about startups, the “launch” of Rocket AI and their patented Temporally Recurrent Optimal Learning had some people fooled (note the acronym in the tweets below). Riva-Melissa Tez finally cleared up the confusion.


These were our impressions from NIPS 2016. We had a blast and hope to be back in 2017!

