

Dubbed Europe’s largest technology marketplace and a ‘Davos for geeks’, the Web Summit has gone from strength to strength in recent years as more and more companies, employees, tech junkies and media personnel flock to the annual event to check out the latest innovations, startups and a star-studded lineup of speakers and exhibitors.


Having grown from a small gathering of around 500 like-minded people in Dublin, this year’s event, which was held in Lisbon for the first time, topped 50,000 attendees representing 15,000 companies from 166 countries.

With such a large gathering of techies, there was bound to be a whole lot of chatter relating to the event on Twitter. So being the data geeks that we are, and before we jetted off to Lisbon ourselves, we turned our digital ears to Twitter and listened for the duration of the event to see what we could uncover.

Our process

We collected a total of just over 80,000 tweets throughout the event by focusing our search on keywords, Twitter handles and hashtags such as ‘Web Summit’, #websummit, @websummit, etc.
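As a rough illustration, keyword matching of this kind can be sketched in a few lines of Python. The function and example tweets below are made up for illustration; the post doesn’t show the actual collection pipeline.

```python
# Hypothetical sketch of a keyword-based tweet filter. Only the search terms
# come from the post; everything else here is illustrative.
KEYWORDS = {"web summit", "#websummit", "@websummit"}

def matches_event(tweet_text):
    """Return True if the tweet mentions any tracked keyword, hashtag or handle."""
    text = tweet_text.lower()
    return any(keyword in text for keyword in KEYWORDS)

stream = [
    "Loving the talks at #WebSummit today!",
    "Lisbon is beautiful this time of year.",
    "@websummit the WiFi is painfully slow...",
]
collected = [t for t in stream if matches_event(t)]  # keeps the 1st and 3rd tweets
```

In practice you would run a filter like this over a live stream or search endpoint rather than a fixed list.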

We used the following tools to collect, analyze and visualize the data;

And here’s what we found;

What languages were the tweets written in?

In total, we collected tweets written in 42 different languages.

Out of our 80,000 tweets, 60,000 were written in English, representing 75% of the total volume.

The pie chart below shows all languages, excluding English. As you can see, Portuguese was the next most-used language, with just under 11% of tweets written in the host country’s native tongue. Spanish and French tweets each represented around 2.5% of total volume.

How did tweet volumes fluctuate throughout the week?

The graph below represents hourly tweet volume fluctuations throughout the week. As you can see, there are four distinct peaks.

While we can’t list all the reasons for these spikes in volume, we did find a few recurring trends during these times, which we have added to the graph;

Let’s now take a more in-depth look at each peak.

What were the causes of these fluctuations?

By adding the average hourly sentiment polarity to this graph we can start to gather a better understanding of how people felt while writing their tweets.

Not familiar with sentiment analysis? This is a feature of text analysis and natural language processing (NLP) that is used to detect positive or negative polarity in text. In short, it tells us whether a piece of text, or a tweet in this instance, has been written in a positive, negative or neutral way. Learn more.
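To make the idea concrete, here is a deliberately tiny lexicon-based sketch. Real sentiment systems, including the one used for this analysis, rely on far more sophisticated NLP models, so treat this purely as an illustration of the positive/negative/neutral output.

```python
# Toy lexicon-based polarity: count positive vs. negative words.
# Illustrative only -- production models go well beyond word lookup.
POSITIVE = {"great", "amazing", "awesome", "good", "love", "nice"}
NEGATIVE = {"slow", "unreliable", "bad", "terrible", "queue"}

def polarity(text):
    """Return 'positive', 'negative' or 'neutral' for a piece of text."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `polarity("the talks were great")` comes back positive, while `polarity("the wifi is slow")` comes back negative.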

Interestingly, each tweet volume peak correlates with a sharp drop in sentiment. What does this tell us? People were taking to Twitter to complain!

Positivity overall

Overall, average sentiment remained in the positive (green) for the entire week. That dip into negative (red) that you can see came during the early hours of Day 2 as news of the US election result broke. Can’t blame the Web Summit for that one!

We can also see distinct rises in positive sentiment around the 5pm mark each day as attendees took to Twitter to reflect on an enjoyable day.

Sentiment also remained comparatively high during the later hours of each day as the Web Summit turned to Night Summit – we’ll look at this in more detail later in the post.


Mike, Afshin, Noel & Hamed after a hectic but enjoyable day at the Web Summit

What was the overall sentiment of the tweets?

The pie chart below shows the breakdown of all 80,000 tweets, split by positive, negative and neutral sentiment.

The majority of tweets (80%) were written in a neutral manner. 14% were written with positive sentiment, with the remaining 6% written negatively.

To uncover the reasons behind both the positive and negative tweets, we extracted and analyzed mentioned keywords to see if we could spot any trends.

What were the most common keywords found in positive tweets?

We used our Entity and Concept Extraction features to uncover keywords, phrases, people and companies that were mentioned most in both positive and negative tweets.

As you can imagine, there were quite a few keywords extracted from 80,000 tweets so we trimmed it down by taking the following steps;

  • Sort by mention count
  • Take the top 100 most mentioned keywords
  • Remove obvious or unhelpful keywords (Web Summit, Lisbon, Tech, etc)
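The trimming steps above can be sketched as follows; the counts are invented for illustration.

```python
# Sketch of the trimming steps: sort by mention count, keep the top n,
# drop obvious or unhelpful keywords. Counts below are made up.
STOPLIST = {"web summit", "lisbon", "tech"}

def top_keywords(counts, n=100):
    """Rank keywords by mention count and filter out unhelpful ones."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(k, c) for k, c in ranked[:n] if k.lower() not in STOPLIST]

counts = {"WiFi": 412, "Lisbon": 900, "great": 350, "queue": 198}
ranked = top_keywords(counts, n=3)  # [("WiFi", 412), ("great", 350)]
```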

And here are our results. You can hover over individual clusters to see more information.

We can see some very positive phrases here, with great, amazing, awesome, good, love and nice featuring prominently.

The most mentioned speaker in the positive tweets was Gary Vaynerchuk (@garyvee), which makes sense given the sharp rise in positive sentiment his fans produced on the sentiment-over-time graph earlier in this post.

What were the most common keywords found in negative tweets?

We took the exact same approach to generate a list of the most mentioned keywords from tweets with negative sentiment;

For those of you that attended Web Summit, it will probably come as no surprise to see WiFi at the forefront of the negativity. While it did function throughout the event, many attendees found it unreliable and too slow, leading to many using their own data and hotspotting from their cell phones.

Mentions of queue, long, full, lines and stage are key indicators of just how upset people became while queueing for the opening ceremony at the main stage, only for many to be turned away because the venue became full.

The most mentioned speaker from negative tweets was Dave McClure (@davemcclure). The 500 Startups Founder found himself in the news after sharing his views on the US election result with an explosive on-stage outburst. It should be noted that just because Dave was the most mentioned speaker from all negative tweets, it doesn’t necessarily mean people were being negative towards him. In fact, many took to Twitter to support him;

Much of the negativity came from people simply quoting what Dave had said on stage, which naturally contained high levels of negative sentiment;

Which speakers were mentioned most?

Web Summit 2016 delivered a star-studded lineup of 663 speakers. What we wanted to know was: who was mentioned most on Twitter?

By combining mentions of names and Twitter handles, we generated and sorted a list of the top 25 most mentioned speakers.
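A rough sketch of how name and handle mentions might be combined is shown below. The alias table and tweets are illustrative, not our actual pipeline, which covered all 663 speakers.

```python
# Illustrative sketch: canonicalize name and handle mentions before counting.
from collections import Counter

ALIASES = {
    "gary vaynerchuk": "Gary Vaynerchuk",
    "@garyvee": "Gary Vaynerchuk",
    "dave mcclure": "Dave McClure",
    "@davemcclure": "Dave McClure",
}

def speaker_mentions(tweets):
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        # Count each speaker at most once per tweet, whether mentioned
        # by name, by handle, or both.
        counts.update({canon for alias, canon in ALIASES.items() if alias in text})
    return counts.most_common()

ranked = speaker_mentions([
    "Great talk by @garyvee!",
    "Gary Vaynerchuk absolutely killed it on stage",
    "@davemcclure went off script today",
])
```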

Messrs Vaynerchuk and McClure once again appear prominently, with the former being the most mentioned speaker overall throughout the week. Joseph Gordon-Levitt, actor and Founder of HitRECord, came in second place, followed by Web Summit founder Paddy Cosgrave.

Which airline flew to Lisbon with the happiest customers?

With attendees visiting Lisbon from 166 countries, we thought it would be cool to see which airline brought in the happiest customers. By extracting mentions of the airlines that fly in to Lisbon, we could then analyze the sentiment of the tweets in which they were mentioned.

For most airlines, there simply wasn’t enough data available to analyze. However, we did find enough mentions of Ryanair and British Airways to be able to analyze and compare.

Here’s what we found;

Ryanair vs. British Airways

The graph below is split into three levels of sentiment – positive, neutral and negative. Ryanair is represented in blue and British Airways in red.

It’s really not hard to pick a winner here. British Airways were not only mentioned in more positive tweets, they were also mentioned in considerably fewer negative tweets.

Night Summit: which night saw the highest tweet volumes?

In total we found 593 mentions of night summit. The graph below shows tweet volumes for each day, and as you can see, November 7 was a clear winner in terms of volume.

…and which morning saw the most hangovers?!

Interestingly, we found a correlation between low tweet volumes (mentioning Night Summit, #nightsummit, etc.) and higher mentions of hangovers the following day!

59% of tweets mentioning hangover, hungover, resaca, etc, came on November 10 – the day after the lowest tweet volume day.

35% came on November 9 while just 6% came on November 8 – the day after the highest tweet volume day.

What do these stats tell us? Well, while we can’t be certain, we’re guessing that the more people partied, the less they tweeted. Probably a good idea 🙂


In today’s world, if someone wants to express their opinion on an event, brand, product, service, or anything really, they will more than likely do so on social media. There is a wealth of information published through user generated content that can be accessed in near real-time using Text Analysis and Text Mining solutions and techniques.

Wanna try it for yourself? Click the image below to sign up to our Text Analysis API with 1,000 free calls per day.

Text Analysis API - Sign up



In recent months, we have been bolstering our sentiment analysis capabilities, thanks to some fantastic research and work from our team of scientists and engineers.

Today we’re delighted to introduce you to our latest feature, Sentence-Level Sentiment Analysis.

New to Sentiment Analysis? No problem. Let’s quickly get you up to speed;

What is Sentiment Analysis?

Sentiment Analysis is used to detect positive or negative polarity in text. Also known as opinion mining, sentiment analysis is a feature of text analysis and natural language processing (NLP) research that is increasingly growing in popularity as a multitude of use-cases emerge. Here’s a few examples of questions that sentiment analysis can help answer in various industries;

  • Brands – are people speaking positively or negatively when they mention my brand on social media?
  • Hospitality – what percentage of online reviews for my hotel/restaurant are positive/negative?
  • Finance – are there negative trends developing around my investments, partners or clients?
  • Politics – which candidate is receiving more positive media coverage in the past week?

We could go on and on with an endless list of examples but we’re sure you get the gist of it. Sentiment Analysis can help you understand the split in opinion from almost any body of text, website or document – an ideal way to uncover the true voice of the customer.

Types of Sentiment Analysis

Depending on your specific use-case and needs, we offer a range of sentiment analysis options;

Document Level Sentiment Analysis

Document level sentiment analysis looks at and analyzes a piece of text as a whole, providing an overall sentiment polarity for a body of text.

For example, this camera review;

[Screenshot: camera review text]

receives the following result;

[Screenshot: document-level sentiment result]

Want to test your own text or URLs? Check out our live demo.

Aspect-Based Sentiment Analysis (ABSA)

ABSA starts by locating sentences that relate to industry-specific aspects and then analyzes sentiment towards each individual aspect. For example, a hotel review may touch on comfort, staff, food, location, etc. ABSA can be used to uncover sentiment polarity for each aspect separately.

Here’s an example of results obtained from a hotel review we found online;

[Screenshot: aspect-based sentiment results for a hotel review]

Note how each aspect is automatically extracted and then given a sentiment polarity score.

Click to learn more about Aspect-Based Sentiment Analysis.

Sentence-Level Sentiment Analysis (SLSA)

Our latest feature breaks down a body of text into sentences and analyzes each sentence individually, providing sentiment polarity for each.
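As a minimal illustration of the idea (not the production model), you can split a text into sentences and score each one individually:

```python
# Toy sentence-level sentiment: naive splitting plus a stand-in polarity
# function. The real feature uses far more robust models for both steps.
import re

def sentences(text):
    """Naive sentence splitter on ., ! and ? boundaries."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def naive_polarity(sentence):
    """Stand-in polarity function, illustrative only."""
    s = sentence.lower()
    if "spotless" in s or "great" in s:
        return "positive"
    if "bland" in s or "bad" in s:
        return "negative"
    return "neutral"

def slsa(text):
    """Return (sentence, polarity) pairs for each sentence in the text."""
    return [(s, naive_polarity(s)) for s in sentences(text)]

results = slsa("The room was spotless. The food was bland.")
# [("The room was spotless.", "positive"), ("The food was bland.", "negative")]
```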

SLSA in action

Sentence-Level Sentiment Analysis is available in our Google Sheets Add-on and also through the ABSA endpoint in our Text Analysis API. Here’s a sample query to try with the Text Analysis API;
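As a rough sketch of what such a request might look like in Python: the endpoint path and header names below are assumptions based on the Text Analysis API conventions of the time, not confirmed by this post, and the request is assembled but never sent.

```python
# Sketch only: endpoint path and header names are assumptions; swap in the
# real values from the Text Analysis API documentation before use.
def absa_request(text, domain="hotels"):
    """Assemble (but do not send) an ABSA request for the given domain."""
    url = "https://api.aylien.com/api/v1/absa/" + domain
    headers = {
        "X-AYLIEN-TextAPI-Application-ID": "YOUR_APP_ID",
        "X-AYLIEN-TextAPI-Application-Key": "YOUR_APP_KEY",
    }
    params = {"text": text}
    return url, headers, params

url, headers, params = absa_request(
    "The room was spotless but the food was bland."
)
```

From here you would POST `params` to `url` with the HTTP client of your choice and read per-sentence polarities out of the JSON response.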

Now let’s take a look at it in action in the Sheets Add-on.

Analyze text

We imported some hotel reviews into Google Sheets and then ran an analysis using our Text Analysis Add-on. Below you will see the full review in column A, and then each sentence in a column of its own with a corresponding sentiment polarity (positive, negative or neutral), as well as a confidence score. This score reflects how confident we are that the sentiment is correct, with 1.0 representing complete confidence.

[Screenshot: sentence-level sentiment results in Google Sheets]

Analyze URLs

This new feature also enables you to analyze large volumes of URLs: it first scrapes the main text content from each web page and then runs SLSA on each sentence individually.

In the GIF below, you can see how the content from a URL on Business Insider is first broken down into individual sentences and then assigned a positive, negative or neutral sentiment at sentence level, thus providing a granular insight into the sentiment of an article.


What’s the benefit of SLSA?

As we touched on earlier, sentiment analysis, in general, has a wide range of potential use-cases and benefits. However, Document-Level Sentiment Analysis can often miss out on uncovering granular details in text by only providing an overall sentiment score.

Sentence-Level Sentiment Analysis allows you to perform a more in-depth analysis of text by uncovering the positive, neutral and negative sentences that drive the overall document-level polarity. It can help you locate instances of strong opinion in a body of text, providing greater insight into the true thoughts and feelings of the author.

SLSA can also be used to analyze and summarize a collection of online reviews by extracting all the individual sentences within them that are written with either positive or negative sentiment.

Ready to get started?

Our Text Analysis Add-on for Google Sheets has been developed to help people with little or no programming knowledge take advantage of our Text Analysis capabilities. If you are in any way familiar with Google Sheets or MS Excel you will be up and running in no time. We’ll even give you 1,000 free credits to play around with. Click here to download your Add-on or click the image below to get started for free with our Text Analysis API.


Text Analysis API - Sign up



The 2016 US Presidential election was one of the most controversial (if not the most controversial) in the nation’s history. With the end prize being arguably the most powerful job in the world, the two candidates were always going to find themselves coming under intense media scrutiny. With more media outlets covering this election than any that have come before it, an increase in media attention and influence was a given.

But how much of an influence does the media really have on an election? Does journalistic bias sway voter opinion, or does voter opinion (such as poll results) generate journalistic bias? Does the old adage “all publicity is good publicity” ring true at election time?

“My sense is that what we have here is a feedback loop. Does media attention increase a candidate’s standing in the polls? Yes. Does a candidate’s standing in the polls increase media attention? Also yes.” -Jonathan Stray @jonathanstray

Thanks to an ever-increasing volume of media content flooding the web, paired with advances in natural language processing and text analysis capabilities, we are in a position to delve deeper into these questions than ever before, and by analyzing the final sixty days of the 2016 US Presidential election, that’s exactly what we set out to do.

So, where did we start?

We started by building a very simple search using our News API to scan thousands of monitored news sources for articles related to the election. These articles, 170,000 in total, were then indexed automatically using our text analysis capabilities in the News API.

This meant that key data points in those articles were identified and indexed to be used for further analysis:

  • Keywords
  • Entities
  • Concepts
  • Topics

With each of the articles or stories sourced comes granular metadata such as publication time, publication source, source location, journalist name and sentiment polarity of each article. Combined, these data points provided us with an opportunity to uncover and analyze trends in news stories relating to the two presidential candidates.

We started with a simple count of how many times each candidate was mentioned from our news sources in the sixty days leading up to election day, as well as the keywords that were mentioned most.


By extracting keywords from the news stories we sourced, we get a picture of the key players, topics, organizations and locations that were mentioned most. We generated the interactive chart below using the following steps;

  1. We called the News API using the query below.
  2. We called it again, but searched for “Trump NOT Clinton”
  3. Mentions of the two candidates naturally dominated in both sets of results so we removed them in order to get a better understanding of the keywords that were being used in articles written about them. We also removed some very obvious and/or repetitive words such as USA, America, White House, candidate, day, etc.

Here’s the query;
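As a rough sketch of what such a query looks like, here are the parameters we might pass to the News API stories endpoint. The field names below are assumptions for illustration, not the exact query we ran.

```python
# Illustrative sketch of a title-level boolean query against a News API
# stories endpoint. Field names are assumptions, not the exact query used.
def election_query(include, exclude, days=60):
    """Build query params for stories whose titles mention one candidate only."""
    return {
        "title": "%s NOT %s" % (include, exclude),
        "language[]": "en",
        "published_at.start": "NOW-%dDAYS" % days,
        "published_at.end": "NOW",
    }

clinton_q = election_query("Clinton", "Trump")
trump_q = election_query("Trump", "Clinton")
```

The `NOT` operator is what excludes stories whose titles mention both candidates, as described in the steps above.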

You can hover your cursor over each cluster to view details;

Most mentioned keywords in articles about Hillary Clinton

Straight away, bang in the middle of these keywords, we can see FBI and right beside it, emails.

Most mentioned keywords in articles about Donald Trump

As with Hillary, Trump’s main controversies appear front and center in his keywords, with terms like women, video, sexual and assault all featuring prominently.

Most media mentions

If this election was decided by the number of times a candidate was mentioned in the media, who would win? We used the following search queries to total the number of mentions from all sources over the sixty days immediately prior to election day;

Note: We could also have performed this search with a single query, but we wanted to separate the candidates for further analysis, and in doing this, we removed overlapping stories with titles that mentioned both candidates.

Here’s what we found, visualized;

Who was mentioned more in the media? Total mentions volume:

It may come as no surprise that Trump was mentioned considerably more than Clinton during this period, but was he consistently more prominent in the news over these sixty days, or was there perhaps a major story that has skewed the overall results? By using the Time Series endpoint, we can graph the volume of stories over time.

We generated the following chart using results from the two previous queries;

How media mentions for both candidates fluctuated in the final 60 days

As you would expect, the volume of mentions for each candidate fluctuates throughout the sixty day period, and to answer our previous question – yes, Donald Trump was consistently more prominent in terms of media mentions throughout this period. In fact, he was mentioned more than Hillary Clinton in 55 of the 60 days.

Let’s now take a look at some of the peak mention periods for each candidate to see if we can uncover the reasons for the spikes in media attention;

Donald Trump

Trump’s peak period of media attention was October 10-13, as indicated by the highest red peak in the graph above. This period represented the four highest individual days of mention volume and can be attributed to the scandal that arose from sexual assault accusations and a leaked tape showing Trump making controversial comments about groping women.

The second highest peak, October 17-20, coincides with a more positive period for Trump, as a combination of a strong final presidential debate and a growing email scandal surrounding Hillary Clinton increased his media spotlight.

Hillary Clinton

Excluding the sharp rise in mentions just before election day, Hillary’s highest volume days in terms of media mentions occurred from October 27-30 as news of the re-emergence of an FBI investigation surfaced.

So we’ve established when, within the sixty days, each candidate was at their peak of media attention. Now we want to try to establish the sentiment polarity of the stories being written about each candidate throughout this period. In other words, we want to know whether stories were written in a positive, negative or neutral way. To achieve this, we performed Sentiment Analysis.

Sentiment analysis

Sentiment Analysis is used to detect positive or negative polarity in text. Also known as opinion mining, sentiment analysis is a feature of text analysis and natural language processing (NLP) research that is increasingly growing in popularity as a multitude of use-cases emerge. Put simply, we perform Sentiment Analysis to uncover whether a piece of text is written in a positive, negative or neutral manner.

Note: The vast majority of news articles about the election will undoubtedly contain mentions of both Trump and Clinton. We therefore decided to only count stories with titles that mentioned just one candidate. We believe this significantly increases the likelihood that the article was written about that candidate. To achieve this, we generated search queries that included one candidate while excluding the other. The News API supports boolean operators, making such search queries possible.

First of all, we wanted to compare the overall sentiment of all stories with titles that mentioned just one candidate. Here are the two queries we used;

And here are the visualized results;

What am I seeing here? Blue represents articles written in a neutral manner, red in a negative manner and green in a positive manner. Again, you can hover over the graph to view more information.

What was the overall media sentiment towards Hillary Clinton?

What was the overall media sentiment towards Donald Trump?

Those of you that followed the election, to any degree, will probably not be surprised by these results. We don’t really need data to back up the claim that Trump ran the more controversial campaign and therefore generated more negative press.

Again, similar to how we previously graphed mention volumes over time, we also wanted to see how sentiment in the media fluctuated throughout this sixty day period. First we’ll look at Clinton’s mention volume and see if there is any correlation between mention volume and sentiment levels.

Hillary Clinton

How to read this graph: The top half (blue) represents fluctuations in the number of daily media mentions (‘000’s) for Hillary Clinton. The bottom half represents fluctuations in the average sentiment polarity of the stories in which she was mentioned. Green = positive and red = negative.

You can hover your cursor over the data points to view more in-depth information.

Mentions Volume (top) vs. Sentiment (bottom) for Hillary Clinton

From looking at this graph, one thing becomes immediately clear; as volume increases, polarity decreases, and vice versa. What does this tell us? It tells us that perhaps Hillary was in the news for the wrong reasons too often – there were very few occasions when both volume and polarity increased simultaneously.

Hillary’s average sentiment remained positive for the majority of this period. However, that sharp dip into the red circa October 30 came just a week before election day. We must also point out the black line that cuts through the bottom half of the graph. This is a trend line representing average sentiment polarity and as you can see, it gets consistently closer to negative as election day approaches.

Mentions Volume (top) vs. Sentiment (bottom) for Donald Trump

Trump’s graph paints a different picture altogether. There was not a single day when his average polarity entered into the positive (green). What’s interesting to note here, however, is how little his mention volumes affected his average polarity. While there are peaks and troughs, there were no major swings in either direction, particularly in comparison to those seen on Hillary’s graph.

These results are of course open to interpretation, but what is becoming evident is that perhaps negative stories in the media did more damage to Clinton’s campaign than they did to Trump’s. While Clinton’s average sentiment polarity remained consistently more positive, Trump’s didn’t appear to be as badly affected when controversial stories emerged. He was consistently controversial!

Trump’s lowest point, in terms of negative press, came just after the second presidential debate at the end of September. What came after this point is the crucial detail, however. Trump’s average polarity recovered and mostly improved for the remainder of the campaign. Perhaps critically, we see his highest and most positive averages of this period in the final 3 weeks leading up to election day.

Sentiment from sources

At the beginning of this post we mentioned the term media bias and questioned its effect on voter opinion. While we may not be able to prove this effect, we can certainly uncover any traces of bias from media content.

What we would like to uncover is whether certain sources (i.e. publications) write more or less favorably about either candidate.

To test this, we’ve analyzed the sentiment of articles written about both candidates from two publications: USA Today and Fox News.

USA Today


Similar to the overall sentiment (from all sources) displayed previously, the sentiment polarity of articles from USA Today shows consistently higher levels of negative sentiment towards Donald Trump. The larger-than-average percentage of neutral results indicates that USA Today took a more objective approach in its coverage of the election.

USA Today – Sentiment towards Hillary Clinton

USA Today – Sentiment towards Donald Trump

Fox News

Again, Trump dominates in relation to negative sentiment from Fox News. However, what’s interesting to note here is that Fox produced more than double the percentage of negative story titles about Hillary Clinton than USA Today did. We also found that, percentage-wise, they produced half as many positive stories about her. Also, 3.9% of Fox’s Trump coverage was positive, versus USA Today’s 2.5%.

Fox News – Sentiment towards Hillary Clinton

Fox News – Sentiment towards Donald Trump

Media bias?

These figures beg the question: how can two major news publications cover the exact same news with such varied levels of sentiment? It certainly highlights the potential influence that the media can have on voter opinion, especially when you consider how many people see each article or headline. The figures below represent social shares for a single news article;

[Screenshot: social share counts for a single news article]

Bear in mind, these figures don’t represent the number of people who saw the article, they represent the number of people who shared it. The actual number of people who saw this on their social feed will be a high-multiple of these figures. In fact, we grabbed the average daily social shares, per story, and graphed them to compare;

Average social shares per story

Pretty even, and despite Trump being mentioned over twice as many times as Clinton during this sixty day period, he certainly didn’t outperform her when it came to social shares.


Since the 2016 US election was decided, there has been a sharp focus on the role played by news and media outlets in influencing public opinion. While we’re not here to join the debate, we are here to show you how you can deep-dive into news content at scale to uncover some fascinating and useful insights that can help you source highly targeted and precise content, uncover trends and assist in decision making.

To start using our News API for free and query the world’s news content easily, click here.

News API - Sign up


Here at AYLIEN we have a team of researchers who like to keep abreast of, and regularly contribute to, the latest developments in the field of Natural Language Processing. Recently, one of our research scientists, Sebastian Ruder, attended EMNLP 2016 in Austin, Texas. In this post, Sebastian has highlighted some of the stand-out papers and trends from the conference.


Image: Jackie Cheung

I spent the past week in Austin, Texas at EMNLP 2016, the Conference on Empirical Methods in Natural Language Processing.

There were a lot of papers at the conference (179 long papers, 87 short papers, and 9 TACL papers in all) — too many to read every single one. The entire program can be found here. In the following, I will highlight some trends and papers that caught my eye:

Reinforcement learning

One thing that stood out was that RL seems to be slowly finding its footing in NLP, with more and more people using it to solve complex problems:


Dialogue

Dialogue was a focus of the conference, with all three keynote speakers dealing with different aspects of it: Christopher Potts talked about pragmatics and how to reason about the intentions of the conversation partner; Stefanie Tellex concentrated on how to use dialogue for human-robot collaboration; and Andreas Stolcke focused on the problem of addressee detection in his talk.

Among the papers, a few that dealt with dialogue stood out:

  • Andreas and Klein model pragmatics in dialogue with neural speakers and listeners;
  • Liu et al. show how not to evaluate your dialogue system;
  • Ouchi and Tsuboi select addressees and responses in multi-party conversations;
  • Wen et al. study diverse architectures for dialogue modelling.


Sequence-to-sequence models

Seq2seq models were again front and center. It is not common for a method to have its own session two years after its introduction (Sutskever et al., 2014). While in past years many papers employed seq2seq e.g. for Neural Machine Translation, some papers this year focused on improving the seq2seq framework itself:

Semantic parsing

While seq2seq’s use for dialogue modelling was popularised by Vinyals and Le, it is hard to get it to work for goal-oriented tasks that require an intermediate representation on which to act. Semantic parsing is used to convert a message into a more meaningful representation that can be used by another component of the system. As this technique is useful for sophisticated dialogue systems, it is great to see progress in this area:

X-to-text (or natural language generation)

While mapping from text-to-text with the seq2seq paradigm is still prevalent, EMNLP featured some cool papers on natural language generation from other inputs:


Parsing

Parsing and syntax are a mainstay of every NLP conference, and the community seems to particularly appreciate innovative models that push the state of the art in parsing: the ACL ’16 outstanding paper by Andor et al. introduced a globally normalized model for parsing, while the best EMNLP ’16 paper by Lee et al. combines a global parsing model with a local search over subtrees.

Word embeddings

There were still papers on word embeddings, but it felt less overwhelming than at past EMNLP or ACL conferences, with most methods trying to fix a particular flaw rather than training embeddings for embeddings’ sake. Pilehvar and Collier de-conflate senses in word embeddings, while Wieting et al. achieve state-of-the-art results for character-based embeddings.

Sentiment analysis

Sentiment analysis has been popular in recent years (as attested by the introductions of many recent papers on sentiment analysis). Sadly, many of the conference papers on sentiment analysis reduce to leveraging the latest deep neural network for the task to beat the previous state-of-the-art without providing additional insights. There are, however, some that break the mold: Teng et al. find an effective way to incorporate sentiment lexicons into a neural network, while Hu et al. incorporate structured knowledge into their sentiment analysis model.

Deep Learning

By now, it is clear to everyone: Deep Learning is here to stay. In fact, deep learning and neural networks claimed the two top spots of keywords that were used to describe the submitted papers. The majority of papers used at least an LSTM; using no neural network seems almost contrarian now and is something that needs to be justified. However, there are still many things that need to be improved — which leads us to…

Uphill Battles

While making incremental progress is important to secure grants and publish papers, we should not lose track of the long-term goals. In this spirit, one of the best workshops that I’ve attended was the Uphill Battles in Language Processing workshop, which featured 12 talks and not one, but four all-star panels on text understanding, natural language generation, dialogue and speech, and grounded language. Summaries of the panel discussions should be available soon at the workshop website.

This was my brief review of some of the trends of EMNLP 2016. I hope it was helpful.




With our News API, our goal is to make the world’s news content easier to query, just like a database. Additionally, we leverage Machine Learning to process, normalize and analyze this content, giving our users access to rich, high-quality metadata and powerful filtering capabilities that ultimately help you find the needle in the haystack more easily.

To this end, we have just launched two new handy features for filtering stories based on their image metadata and setting range queries for social media share counts. You can read more about these two features – which are now also available in our News API SDKs – below.

Image metadata filters

News content published online is increasingly multimodal, to the point that it is rare to find an article or blog post that doesn’t include an image or a video. Our News API stats show that 83% of all the articles in our index contain at least one image.

Therefore, it is important to be able to search and filter stories not just based on their textual content, but also based on their images.

To facilitate this, we now analyze each extracted image of each news article to capture its size (width and height), format and content length. Additionally, we have introduced 7 new parameters for filtering stories based on these attributes:

  • media.images.width.min: minimum image width (in pixels)
  • media.images.width.max: maximum image width (in pixels)
  • media.images.height.min: minimum image height (in pixels)
  • media.images.height.max: maximum image height (in pixels)
  • media.images.content_length.min: minimum image content size (in bytes)
  • media.images.content_length.max: maximum image content size (in bytes)
  • media.images.format[]: image format (possible values are: JPEG, PNG, GIF, SVG, ICO, TIFF, CUR, WEBP and BMP).

As an example, let’s use these parameters to retrieve stories about Golf that have an image in JPEG or PNG format larger than 80 KB:
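The original query snippet did not survive extraction, so here is a minimal sketch in Python that builds an equivalent request URL using only the standard library. The endpoint path and the choice of searching titles for ‘Golf’ are assumptions; the parameter names are taken from the list above.

```python
from urllib.parse import urlencode

# Hypothetical query: stories about Golf with a JPEG or PNG image
# larger than 80 kB. The parameter names match the list above; the
# endpoint URL is an assumption based on the public News API docs.
params = [
    ("title", "Golf"),
    ("media.images.format[]", "JPEG"),   # repeated keys encode the array
    ("media.images.format[]", "PNG"),
    ("media.images.content_length.min", 80000),  # ~80 kB, in bytes
]

query = urlencode(params)
url = "https://api.aylien.com/news/stories?" + query
print(url)
```

Sending this URL with your API credentials in the request headers (or issuing the same query through one of our SDKs) returns the matching stories as JSON.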


Here’s an image returned from the search query above:


Social range filters

One of the most popular features of our News API is its ability to sort stories based on how many times they have been shared on social media. However, if you use this to retrieve popular stories over a long period of time, you will sometimes notice that a few highly popular stories (those shared hundreds of thousands of times) dominate the top of the results, preventing you from easily accessing the long tail of interesting and popular stories.

To address this, we have introduced the following 8 new parameters that allow you to set range (i.e. minimum and maximum) filters on social media share counts:

  • social_shares_count.facebook.min: minimum number of Facebook shares
  • social_shares_count.facebook.max: maximum number of Facebook shares
  • social_shares_count.google_plus.min: minimum number of Google+ shares
  • social_shares_count.google_plus.max: maximum number of Google+ shares
  • social_shares_count.linkedin.min: minimum number of LinkedIn shares
  • social_shares_count.linkedin.max: maximum number of LinkedIn shares
  • social_shares_count.reddit.min: minimum number of Reddit shares
  • social_shares_count.reddit.max: maximum number of Reddit shares

To retrieve all stories that mention Donald Trump, and have been shared between 50 and 500 times on Facebook, we can use the following query:
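As with the image-filter example, the original query snippet is missing, so here is a minimal sketch that builds the request URL with Python’s standard library. The endpoint path is an assumption; the range-filter parameter names are taken from the list above.

```python
from urllib.parse import urlencode

# Hypothetical query: stories mentioning Donald Trump that were shared
# between 50 and 500 times on Facebook. Parameter names match the list
# above; the endpoint URL is an assumption.
params = [
    ("text", "Donald Trump"),
    ("social_shares_count.facebook.min", 50),
    ("social_shares_count.facebook.max", 500),
]

url = "https://api.aylien.com/news/stories?" + urlencode(params)
print(url)
```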

These filters are now available across all our News API SDKs. We hope that you find these new updates useful, and we would love to hear any feedback you may have.

To start using our News API for free and query the world’s news content easily, click here.




We are always keen to speak with potential candidates for various roles here at AYLIEN. If you’re interested in joining the team, we would love to hear from you. Please email your CV to

At AYLIEN we are using recent advances in Artificial Intelligence to try to understand natural language. Part of what we do is building products such as our Text Analysis API and News API to help people extract meaning and insight from text. We are also a research lab, conducting research that we believe will make valuable contributions to the field of Artificial Intelligence, as well as driving further product development (see this post about a recent publication on aspect-based sentiment analysis by one of our research scientists for example).

We are excited to announce that we are currently accepting applications from students and researchers for funded PhD and Masters opportunities, as part of the Irish Research Council Employment Based Programme.

The Employment Based Programme (EBP) enables students to complete their PhD or Masters degree while working with us here at AYLIEN.

For students and researchers, we feel that this is a great opportunity to work in industry with a team of talented scientists and engineers, and with the resources and infrastructure to support your work.

About us

We’re an award-winning VC-backed text analysis company specialising in cutting-edge AI, deep learning and natural language processing research to offer developers and solution builders a package of APIs that bring intelligent analysis to a wide range of apps and processes, helping them make sense of large volumes of unstructured data and content.

With thousands of users worldwide and a growing customer base that includes great companies such as Sony, Complex Media, Getty Images, and McKinsey, we’re growing fast and enjoy working as part of a diverse and super smart team here at our office in Dublin, Ireland.

You can learn more about AYLIEN, who we are and what we do, by checking out our blog and two of our core offerings – our Text Analysis API and News API.

About the IRC Employment Based Programme

The Irish Research Council’s Employment Based Programme (EBP) is a unique national initiative, providing students with an opportunity to work in a co-educational environment involving a higher education institution and an employment partner.

The EBP provides a co-educational opportunity for researchers, as they are employed directly by AYLIEN while also being full-time students working on their research degree. One of the key benefits of this arrangement is that you will see your academic outputs transferred into a practical setting. This immersive aspect of the programme will enable you to work with some really bright minds who can help you generate research ideas and bring benefits to your work that may otherwise not have come to light under a traditional academic Masters or PhD route.


The scholarship funding consists of €24,000 per annum towards salary and a maximum of €8,000 per annum for tuition, travel and equipment expenses. Depending on a candidate’s level of seniority and expertise, the salary amount may be increased.

Our experience with the EBP

AYLIEN is proud to host and work with two successful programme awardees under the EBP, Sebastian Ruder and Peiman Barnaghi. Both Sebastian and Peiman have been working under the supervision of Dr. John Breslin, who is an AYLIEN advisor and a lecturer at NUI Galway and the Insight Centre. We also have academic ties with University College Dublin (UCD) through Barry Smyth. Barry is a Full Professor and Digital Chair of Computer Science at UCD, and recently joined the team at AYLIEN as an advisor.

Back row, left to right: Peiman and Sebastian with Parsa Ghaffari, AYLIEN Founder & CEO

Sebastian Ruder

Throughout his research, Sebastian has developed language and domain-agnostic Deep Learning-based models for sentiment analysis and aspect-based sentiment analysis that have been published at conferences and are used in production. His main research focus is to develop efficient methods to enable models to learn from each other and to equip them with the capability to adapt to new domains and languages.

“The Employment Based Programme for me brings academia and industry together in the best possible way: It enables me to immerse myself and get to the bottom of hard problems; at the same time, I am able to collaborate with driven and inspiring individuals at AYLIEN. I find this immersion of research-oriented people like myself sitting next to people that are hands-on with diverse technical backgrounds very compelling. This stimulating and fast-paced working environment provides me with direction and focus for my research, while the ‘get stuff done’ mentality allows me to concentrate and accomplish meaningful things” – Sebastian Ruder, Research Scientist at AYLIEN

Here are some of Sebastian’s recent publications:

  • INSIGHT-1 at SemEval-2016 Task 4: Convolutional Neural Networks for Sentiment Classification and Quantification (arXiv)
  • INSIGHT-1 at SemEval-2016 Task 5: Deep Learning for Multilingual Aspect-based Sentiment Analysis (arXiv)
  • A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis (arXiv)
  • Towards a continuous modeling of natural language domains (arXiv)

Peiman Barnaghi

Peiman’s research, in collaboration with the Insight Centre for Data Analytics, NUI Galway, focuses on scalable topic-level sentiment analysis on streaming feeds. His main focus is applying Machine Learning and Deep Learning methods to Twitter data to detect polarity trends towards a topic across large sets of tweets and to determine the degree of polarity.

Here are some of Peiman’s recent publications:

  • Opinion Mining and Sentiment Polarity on Twitter and Correlation between Events and Sentiment (link)
  • Text Analysis and Sentiment Polarity on FIFA World Cup 2014 Tweets (PDF)

You can read more about our experience with the EBP in the Irish Research Council’s Annual Report (pages 29 & 31).

Details & requirements

First and foremost, your thesis topic must be something you are passionate about. While prior experience with the topic is important, it is not crucial. We can work with you to establish a suitable topic that overlaps with both the supervisor’s general area of interest/research and our own research and product directions.

Suggested read: Survival Guide to a PhD by Andrej Karpathy

We are particularly interested in applicants with interests in the following areas (but are open to other suggestions):

  • Representation Learning
  • Domain Adaptation and Transfer Learning
  • Sentiment Analysis
  • Question Answering
  • Dialogue Systems
  • Entity and Relation Extraction
  • Topic Modeling
  • Document Classification
  • Taxonomy Inference
  • Document Summarization
  • Machine Translation

You have the option to complete a Masters (1 year, or 2 years if structured) or a PhD (3 years, or 4 years if structured) degree.

AYLIEN will co-fund your scholarship and provide you with professional guidance and mentoring throughout the programme. It is a prerequisite that you spend 50–70% of your time on site with us and the remainder at your higher education institution (HEI).

The programme is open to students worldwide with a bachelor’s degree or higher, and you will ideally be based within a commutable distance of our office in Dublin City Centre.


It would be ideal if you have already identified or engaged with a potential supervisor at a university in Ireland. However, if not, we will help you with finding a suitable supervisor.

Important dates and deadlines

Please note: all times stated are Ireland time and are estimates based on last year’s programme. Full details will be released in December.

Call open: 6 December 2016

FAQ Deadline: 8 February 2017 (16:00)

Applicant Deadline: 15 February 2017 (16:00)

Supervisor, Employment Mentor and Referee Deadline: 22 February 2017 (16:00)

Research Office Endorsement Deadline: 1 March 2017 (16:00)

Outcome of Scheme: 26 May 2017

Scholarship Start Date: 1 October 2017

How to apply

To express your interest, please forward your CV and accompanying note with topic suggestions to