
The global health and wellness market is valued at almost $4 trillion annually, and the media plays a central role in consumer buying decisions. Spotting trends in this coverage can therefore generate hugely valuable insights, as well as tell us some interesting facts about the global health and wellness industry. With this in mind, we decided to look into what was published in the Health category last month and see what we could find.

In February, our News API gathered, analyzed, and indexed over 2.5 million new stories as they were published, giving us a huge enriched dataset of news content to look into. About 70,000 of these articles were categorized as belonging to the Health category (seems low, right? If you’re wondering why, take a look at this previous blog, where we looked into the most popular categories and saw that Sports, News, and a few others have vastly greater publishing volumes than the rest).

The News API is an extremely powerful tool for looking into enriched news content at scale and generating intelligent insights about what the world is talking about. In this blog, we’re going to look into these almost 70,000 stories to ask the following questions:

  1. What publishing patterns could we identify over the course of the month?
  2. Which consumers were these stories aimed at?
  3. What was the sentiment in these stories?
  4. Who and what was being talked about in these stories?

If you want to dive into this content for yourself, grab an API key to start your free trial, and get up and running in minutes.

How many stories are published in the Health category every day?

To find out what patterns of publication the stories in the Health category followed, we used the Time Series endpoint to see the daily count of new stories from the past 8 weeks and plotted the results on the chart below.

You can see a clear pattern in publication volumes in the chart below, with the average weekday volume of new Health stories being around 3,000 stories. On the weekends (when most journalists/writers are off work), this dips by about two thirds to 1,000 new stories per day.
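For readers who want to reproduce this, here is a minimal sketch of the Time Series request. The parameter and header names follow the News API conventions, but the Health category ID and the credentials are placeholder assumptions you should check against your own account.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.aylien.com/news/time_series"

def build_params(category_id="IAB7", weeks=8):
    """Query daily story counts for one category over the trailing weeks."""
    return {
        "categories.taxonomy": "iab-qag",
        "categories.id": category_id,            # assumed ID for Health
        "published_at.start": "NOW-{}WEEKS/DAY".format(weeks),
        "published_at.end": "NOW/DAY",
        "period": "+1DAY",                       # one data point per day
    }

def fetch_daily_counts(app_id, app_key):
    """Fetch the daily counts; credentials are placeholders."""
    url = API_URL + "?" + urllib.parse.urlencode(build_params())
    req = urllib.request.Request(url, headers={
        "X-AYLIEN-NewsAPI-Application-ID": app_id,
        "X-AYLIEN-NewsAPI-Application-Key": app_key,
    })
    with urllib.request.urlopen(req) as resp:
        # Each point looks like {"published_at": "...", "count": 1234}
        return json.load(resp)["time_series"]
```

The returned `time_series` list can be plotted directly to reveal the weekday/weekend pattern described above.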

We can also see two noticeable spikes that break this trend in the first and last weeks of February. By narrowing down the time period of our search, we can see that this increased coverage in the first week was due to the announcement of a new health industry venture between Amazon and Berkshire Hathaway, while the second spike was caused by the release of a study that found a correlation between alcohol consumption and early-onset dementia.


Does the Health & Wellness coverage talk about men or women more?

With the huge spend on health and wellness that we mentioned in the introduction, we thought it would be interesting to see who the health coverage was aimed at – men or women. The difference we found is striking – in stories in the Health category, the word “women” appeared in the title almost four times as often as “men”:


Was the Sentiment of Health & Wellness stories different for Men and Women?

Knowing that there were many more Health & Wellness stories aimed at women than at men is a valuable insight in itself, but using the News API we can go further and analyze the sentiment of each of these stories.

The News API’s Trends endpoint allows you to do exactly this – every time the News API collects a new story, it analyzes its sentiment. Using the Trends endpoint, you can then search for the volume of stories with a positive, negative, or neutral tone.

You can see below that stories in the Health category with “women” in the title were generally evenly balanced between positive, negative, and neutral. On the other hand, stories in the same category that contained “men” in the title tended to be more negative than positive, with 47% of these stories containing a negative tone.
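As a sketch, the per-polarity counts above can be gathered by running the same title search three times, once per polarity. The `sentiment.body.polarity` parameter name and the Health category ID are our assumptions here, so treat this as illustrative.

```python
def sentiment_queries(title_keyword, category_id="IAB7"):
    """One query dict per sentiment polarity for stories whose title
    contains `title_keyword` in the given (assumed) category."""
    base = {
        "title": title_keyword,
        "categories.taxonomy": "iab-qag",
        "categories.id": category_id,
    }
    return {
        polarity: dict(base, **{"sentiment.body.polarity": polarity})
        for polarity in ("positive", "negative", "neutral")
    }

def polarity_shares(counts):
    """Convert raw story counts into whole-number percentage shares."""
    total = sum(counts.values())
    return {polarity: round(100 * n / total) for polarity, n in counts.items()}
```

`polarity_shares` turns the three raw counts into percentage figures like the 47% negative share quoted above.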


What entities did the Health stories about women mention?

So far, we have seen that of the 70,000 news stories published about health last month, many more were about women’s health than men’s health, and that the stories about women’s health tended to have a more positive tone than the stories about men’s health.

With the News API, we can dive further into these stories and see exactly what people, things, and organizations are mentioned the most. Using the Trends endpoint, we gathered all of the entities mentioned in Health stories with “women” in the title and plotted them on the chart below. You can see that ‘study’ is frequently mentioned in these stories.
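For charting, the Trends response just needs to be ranked by count. The response shape assumed in this sketch ({"trends": [{"value": ..., "count": ...}]}) is how we read the returned JSON, so treat it as illustrative rather than definitive.

```python
def top_entities(trends_response, n=10):
    """Rank the entities in a Trends-endpoint response by mention count
    and return the n most-mentioned as (entity, count) pairs."""
    ranked = sorted(trends_response["trends"],
                    key=lambda t: t["count"], reverse=True)
    return [(t["value"], t["count"]) for t in ranked[:n]]
```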


We were curious to see why the word ‘study’ was mentioned so much, so we checked the results in the Time Series endpoint, which shows us the volume of stories over time. Specifically, we searched for stories in the Health category that mentioned both “women” and “study” in the body. This search shows us when Health stories about women quoted studies. You can see two big spikes:


With the Stories endpoint, we can go further into the trends from last month and see the actual news stories that made up these spikes. We took a random selection of three stories from each of the two spikes in stories that mention “study” in the title.

On February 15th, you can see a WHO report on women’s choices while in labor dominated the news:

While on the 20th, a study on the health impact of cleaning products from the University of Bergen prompted a spike in stories:

That concludes our high-level look into February’s publishing trends in the Health category. If you have an interest in this category, use the link below to grab an API key and start digging into the data yourself. Our News API gathers, analyzes, and indexes over 2.5 million new stories each month in near-real time, so whatever industry you’re in, the News API can help you generate timely, actionable insights.

News API - Sign up


Each month on the AYLIEN blog, we look into the previous month’s news to see what trends and insights we can extract using our News API. 2018 started with the news of a serious comedown from the highs of Bitcoin mania in December, and Donald Trump continuing to blaze a trail through American politics with new controversies. So last month was not exactly short of subject matter to dive into.

Using the News API, we looked into 2.6 million of the stories published last month and analyzed the coverage of two topics that caught our attention in January:

  • the media’s coverage of the rapid rise and fall in the value of Bitcoin
  • the reaction to President Trump’s description of African countries as “shitholes”


Bitcoin’s rise and fall in the media

Previously on the AYLIEN blog, we looked at the rising number of stories being published about Bitcoin in November and saw that media coverage of Bitcoin was closely entwined with the cryptocurrency’s fortunes. This helps explain the huge number of people who bought in and drove the price up, despite the relatively tiny number of people who understand cryptocurrency.

So with the recent crash in the price of Bitcoin, we decided to look into whether the media had a role in the plummeting confidence of these same masses of people in Bitcoin’s prospects.

How did the media’s publishing patterns relate to the price of Bitcoin?

To look into the relationship between the media and Bitcoin, we compared the volume of stories published about Bitcoin with the daily closing price of Bitcoin (downloaded in CSV format from a historical price source). We did this using the Time Series endpoint, requesting the daily volume of stories published with ‘Bitcoin’ in the title.
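Joining the two datasets is a simple merge on date once the Time Series response and the price CSV are in hand. This is a hedged sketch: the CSV column names (`Date`, `Close`) are assumptions about the downloaded file, and the time-series point shape follows the endpoint description above.

```python
import csv

def merge_counts_with_prices(time_series, price_csv_path):
    """Join daily story counts with daily closing prices by date.
    Assumes the CSV has 'Date' and 'Close' columns with YYYY-MM-DD dates,
    and that each time-series point has 'published_at' and 'count'."""
    prices = {}
    with open(price_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            prices[row["Date"]] = float(row["Close"])
    merged = []
    for point in time_series:
        date = point["published_at"][:10]  # "2018-01-05T00:00:00Z" -> "2018-01-05"
        if date in prices:
            merged.append({"date": date,
                           "stories": point["count"],
                           "close": prices[date]})
    return merged
```

The merged rows can then be plotted on a dual-axis chart to compare publishing volume against price.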

You can see that the number of stories about Bitcoin skyrocketed on the day that the cryptocurrency crossed the $10,000 milestone. As its value continued to rise, the press interest remained (even rising a little), but after this initial infatuation, you can see that the only spikes in story volume occurred when Bitcoin lost value.

When you look closely at the patterns of publication volume, you can see that after three news cycles of Bitcoin price rallies, the media were only interested in Bitcoin losing value. For example, notice how the price of Bitcoin increased by $2,000 in one day on January 5th, but this did not prompt any spike in press coverage. On the other hand, when Bitcoin lost value the following week, you can see two spikes in media interest. Using the Time Series endpoint like this lets us begin to extract insights into the press coverage without even looking at a single story.

What were these stories talking about?

From the relationship between the Bitcoin price and the spikes in publishing volume, we can see a definite correlation and guess that the media were most interested in stories about the misfortune of Bitcoin investors. But using the News API, we can actually look into every one of these stories and analyze what was being written about at scale.

To do this, we used the keywords parameter of the Trends endpoint to retrieve the most-mentioned people, places, and things from stories published in January with ‘Bitcoin’ in the title. After that, we converted the JSON returned by the News API into CSV format using this free, easy-to-use tool and visualized the file in Tableau.
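If you’d rather skip a conversion tool, the JSON-to-CSV step can be scripted in a few lines. The response shape assumed here is the {"trends": [...]} list of value/count pairs described above.

```python
import csv
import json

def trends_json_to_csv(json_path, csv_path):
    """Flatten a Trends-endpoint JSON response (assumed shape:
    {"trends": [{"value": ..., "count": ...}, ...]}) into a two-column
    CSV that Tableau or any spreadsheet can read directly."""
    with open(json_path) as f:
        trends = json.load(f)["trends"]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["keyword", "mentions"])
        for t in trends:
            writer.writerow([t["value"], t["count"]])
```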

You can see from the chart that besides the obvious keywords like ‘blockchain’ and ‘currency,’ ‘South Korea’ and ‘volatility’ are two highly-prevalent keywords that cannot be explained away by a conceptual relation to Bitcoin. This means that the media had a big focus on South Korea’s clampdown on the cryptocurrency and its volatility in general.

What content were people sharing about Bitcoin last month?

Knowing that publishers were focused on the falling values of Bitcoin is great, as it gives us a hint about what narrative the media was pushing around it. But we can also take a look at what people were sharing on social media, which can give us an insight into what aspects of the saga the public were most interested in.

Looking at the most-shared stories on Facebook, you can see that people were most interested in stories about government clampdowns on the cryptocurrency and its bad fortune in general. This shows that there was a popular appetite for the negative coverage.

  1. “Bitcoin prices fall as South Korea says ban still an option,” The Associated Press. 55,046 shares.
  2. “France wants tougher rules on bitcoin to avoid criminal use,” The Associated Press. 54,963 shares.
  3. “Facebook is banning all ads promoting cryptocurrencies — including bitcoin and ICOs,” recode. 18,384 shares.

Donald Trump’s description of African countries as “sh*tholes”

In terms of news coverage, there’s really no getting away from Donald Trump. For better or worse, the media is fixated on the President and January was no different. Last month, the President of the United States referred to Haiti and the countries of Africa as “shitholes” during talks on immigration reform, took a test used to diagnose dementia, and gave a speech to world leaders at Davos.

We decided to see which stories the press were most interested in by using the Stories endpoint again. From the peaks in publishing volume about Trump, you can see that interest in the President peaked the day after the “shithole” comment, followed by his speeches at Davos and the State of the Union on the last day of the month.


What sentiment did the media show towards each event?

So we can see the events that prompted media coverage of the President in January. But with the News API, we can look into the sentiment of the coverage of each event. This gives us an insight into how people felt about the events influencing the news cycle.
To do this, we used the Time Series endpoint and searched for the daily volume of stories with ‘Trump’ in the title, with separate queries for positive, negative, and neutral sentiment polarities in the story body.

Below you can see the difference between the overwhelmingly negative sentiment in the coverage around the time of the “shithole” controversy and the more balanced coverage on January 31st, the day Trump made his State of the Union speech.

Most-shared stories on Facebook about Trump

Knowing what events the media published most about is great, but we can go further and see exactly what stories people read and shared the most.

  1. “Trump attacks protections for immigrants from ‘shithole’ countries in Oval Office meeting,” Washington Post. 872,383 shares.
  2. “Trump referred to Haiti and African countries as ‘shithole’ nations,” NBC News. 396,978 shares.
  3. “Trump, Defending His Mental Fitness, Says He’s a ‘Very Stable Genius’,” The New York Times. 277,664 shares.

These are just two topics that interested us last month, but with the vast amount of enriched content that the News API gives you access to, you can dive into any topic, popular or niche, and start generating insights. Sign up for your free trial below and dive in!

News API - Sign up


Advances in Natural Language Processing and Machine Learning are broadening the scope of what technology can do in people’s everyday lives, and because of this, an unprecedented number of people are developing a curiosity about these fields. And with the availability of educational content online, it has never been easier to go from curiosity to proficiency.

We gathered some of our favorite resources together so you will have a jumping off point into studying these fields on your own. Some of the resources here are suitable for absolute beginners in either Natural Language Processing or Machine Learning, and others are suitable for those with an understanding of one who wish to learn more about the other.

We’ve split these resources into two categories:

  • Online courses and textbooks for structured learning experiences and reference material
  • NLP and Machine Learning blogs to benefit from the work of some researchers and students who distill current advances in research into interesting and readable posts.

The resources on this post are 12 of the best, not the 12 best, and as such should be taken as suggestions on where to start learning without spending a cent, nothing more!

6 free Natural Language Processing & Machine Learning courses & educational resources:

  1. Speech and Language Processing by Dan Jurafsky and James Martin was first printed in 1999 and its third edition was printed last year. It’s a comprehensive and highly readable introduction to NLP that progresses through the concepts quickly.
  2. Andrew Ng’s course on Machine Learning is probably the best standalone introduction to the topic, because of both the content (delivered by Ng himself) and the structure (weekly readings, videos, and assignments). After this, you can proceed to Ng’s Deep Learning class with a solid foundation.

  3. The Deep Learning book by Goodfellow, Bengio, and Courville is an authoritative textbook on the subject. Some (minor) criticism leveled at it focused on verbose definitions, but if you’re new to the subject, you’ll appreciate this greatly as it gives a bit of context to new concepts.
  4. The video lectures and resources for Stanford’s Natural Language Processing with Deep Learning are great for those who have completed an introduction to Machine Learning/Deep Learning and want to apply what they’ve learned to Natural Language Processing. The programming assignments are in Python.
  5. Sentdex’s YouTube channel is an extensive collection of in-depth educational content, with tutorial series on topics from introductory Machine Learning and Natural Language Processing to training a self-driving car in Grand Theft Auto with Deep Learning. While the Stanford series offers a glimpse into a university class on Deep Learning, these videos cover the same topics in a much more informal setting. If you’re interested in the “how” of Machine Learning rather than the “why,” you should start here instead of Ng’s class or the Stanford videos.
  6. scikit-learn, a popular Python library for Machine Learning, has a number of hands-on tutorials, including some on text data.


6 Natural Language Processing & Machine Learning blogs to follow

  1. Sebastian Ruder, a research scientist focusing on Transfer Learning and Natural Language Processing here at AYLIEN, is the author of this great blog.
  2. Vered Schwartz authors the cautiously titled Probably Approximately a Scientific Blog, which explains Natural Language Processing concepts and research in accurate and interesting ways (like this explanation of one of the challenges of NLP – ambiguity – via Dad jokes).
  3. Sujit Pal is a developer who frequently updates his blog, Salmon Run. Since Sujit comes from a programming rather than a scientific background, this blog is great for programmers who want to learn from a proficient Machine Learning practitioner.
  4. Ben Frederickson writes posts on his blog about technical and NLP-related subjects like this post on Unicode and lots of other stuff, like this great post on recommending music with different algorithms.
  5. Although it hasn’t been updated in a while, Kavita Ganesan’s Text Analytics 101 has a list of useful explainers for NLP concepts such as N-grams, as well as not-strictly-NLP things you might find useful, such as comparisons of CrowdFlower and Amazon’s Mechanical Turk.
  6. Finally (this isn’t a blog), to keep up with current developments in NLP and Machine Learning research, interesting articles about the subject, and the newest software libraries, sign up for NLP News, a fortnightly newsletter thoughtfully curated by Sebastian Ruder.

Looking into some of these educational resources and keeping an eye on these blogs is a great way to become more proficient in Natural Language Processing and Machine Learning. Be sure to keep an eye on the research sections of our website and our blog to read about new research from the Science team!

While these are great resources for starting a journey to become proficient in Natural Language Processing and Machine Learning, you can leverage these technologies in minutes with our three NLP solutions.

Text Analysis API - Sign up


Last week, Snapchat unveiled a major redesign of their app that received quite a bit of negative feedback. As a video-sharing platform that has integrated itself into users’ daily lives, Snapchat relies on simplicity and ease of use. So when large numbers of these users begin to express pretty serious frustration about the app’s new design, it’s a big threat to their business.


You can bet that right now Snapchat are analyzing exactly how big a threat this backlash is by monitoring the conversation online. This is a perfect example of businesses leveraging the Voice of their Customer with tools like Natural Language Processing. Businesses that track their product’s reputation online can quantify how serious events like this are and make informed decisions on their next steps. In this blog, we’ll give a couple of examples of how you can dive into online chatter and extract important insights on customer opinion.

This TechCrunch article pointed out that 83% of Google Play Store reviews in the immediate aftermath of the update gave the app one or two stars. But as we mentioned in a blog last week, star rating systems aren’t enough – they don’t tell you why people feel the way they do and most of the time people base their star rating on a lot more than how they felt about a product or service.

To get accurate and in-depth insights, you need to understand exactly what a reviewer is positive or negative about, and to what degree they feel this way. This can only be done effectively with text mining.

So in this short blog, we’re going to use text mining to:

  1. Analyze a sample of the Play Store reviews to see what Snapchat users mentioned in reviews posted since the update.
  2. Gather and analyze a sample of 1,000 tweets mentioning “Snapchat update” to see if the reaction was similar on social media.

In each of these analyses, we’ll use the AYLIEN Text Analysis API, which comes with a free plan that’s ideal for testing it out on small datasets like the ones we’ll use in this post.


What did the app reviewers talk about?

As TechCrunch pointed out, 83% of reviews posted since the update shipped gave the app one or two stars, which gives us a high-level overview of the sentiment shown towards the redesign. But to dig deeper, we need to look at what people were actually talking about in all of these reviews.

As a sample, we gathered the 40 reviews readily available on the Google Play Store and saved them in a spreadsheet. We can analyze what people were talking about in them by using our Text Analysis API’s Entities feature. This feature analyzes a piece of text and extracts the people, places, organizations and things mentioned in it.

Alongside the entities, the feature also returns a list of keywords. To get a quick look into what the reviewers were talking about in a positive and negative light, we visualized the keywords extracted along with the average sentiment of the reviews they appeared in.

From the 40 reviews, our Text Analysis API extracted 498 unique keywords. Below you can see a visualization of the keywords extracted and the average sentiment of the reviews they appeared in from most positive (1) to most negative (-1).

First of all, you’ll notice that keywords like “love” and “great” are high on the chart, while “frustrating” and “terrible” are low on the scale – which is what you’d expect. But if you look at keywords that refer to Snapchat itself, you’ll see that “Bitmoji” appears high on the chart, while “stories,” “layout,” and “unintuitive” all appear low on the chart, giving an insight into what Snapchat’s users were angry about.
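The chart can be reproduced with a small aggregation: average each review’s sentiment score over the keywords extracted from it. This is a sketch with hypothetical input shapes, not the exact pipeline we ran.

```python
from collections import defaultdict

def keyword_sentiment(reviews):
    """Average review sentiment per keyword. `reviews` is a list of
    (keywords, score) pairs, where the score runs from -1 (most
    negative) to 1 (most positive), as in the chart described above."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for keywords, score in reviews:
        for kw in keywords:
            totals[kw] += score
            counts[kw] += 1
    return {kw: totals[kw] / counts[kw] for kw in totals}
```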


How did Twitter react to the Snapchat update?

Twitter is such an accurate gauge of what the general public is talking about that the US Geological Survey uses it to monitor for earthquakes – because the speed at which people react to earthquakes on Twitter outpaces even their own seismic data feeds! So if people Tweet about earthquakes during the actual earthquakes, they are absolutely going to Tweet their opinions of Snapchat updates.

To get a snapshot of the Twitter conversation, we gathered 1,000 Tweets that mentioned the update. To gather the Tweets, we ran a search on Twitter using the Twitter Search API (this is really easy – take a look at our beginners’ guide to doing this in Python).
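As a sketch of the gathering step, using the tweepy 3.x interface (`tweepy.Cursor` over `api.search`): the credential names below are placeholders, and the cleaning helper is an optional extra we’d suggest for dropping retweets and duplicates.

```python
def fetch_tweets(query='"Snapchat update"', limit=1000,
                 consumer_key=None, consumer_secret=None,
                 access_token=None, access_secret=None):
    """Gather tweets matching `query` via the Twitter Search API
    (tweepy 3.x interface; credentials are placeholders)."""
    import tweepy  # deferred so the helper below works without tweepy installed
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)
    return [status.text
            for status in tweepy.Cursor(api.search, q=query, lang="en").items(limit)]

def clean_sample(texts, limit=1000):
    """Drop retweets and exact duplicates, keeping at most `limit` tweets."""
    seen, sample = set(), []
    for text in texts:
        if text.startswith("RT @") or text in seen:
            continue
        seen.add(text)
        sample.append(text)
        if len(sample) == limit:
            break
    return sample
```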

After we gathered our Tweets, we analyzed them with our Sentiment Analysis feature and as you can see, the Tweets were overwhelmingly negative:  

Quantifying the positive, negative, and neutral sentiment shown towards the update on Twitter is useful, but using Text Mining we can go one step further and extract the keywords mentioned in every one of these Tweets. To do this, we use the Text Analysis API’s Entities feature.
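Here is a sketch of calling the Sentiment Analysis feature on each Tweet and tallying the results. The endpoint, header, and parameter names follow the Text Analysis API documentation, with placeholder credentials; treat the details as assumptions to verify against your own account.

```python
import json
import urllib.parse
import urllib.request

SENTIMENT_URL = "https://api.aylien.com/api/v1/sentiment"

def analyze_sentiment(text, app_id, app_key):
    """Call the sentiment endpoint for one tweet and return its polarity."""
    data = urllib.parse.urlencode({"text": text, "mode": "tweet"}).encode()
    req = urllib.request.Request(SENTIMENT_URL, data=data, headers={
        "X-AYLIEN-TextAPI-Application-ID": app_id,
        "X-AYLIEN-TextAPI-Application-Key": app_key,
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["polarity"]  # "positive" | "negative" | "neutral"

def tally(polarities):
    """Count each polarity across a batch of analyzed tweets."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for polarity in polarities:
        counts[polarity] += 1
    return counts
```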

Disclaimer: this being Twitter, there was quite a bit of opinion expressed in a NSFW manner 😉


The number of expletives we identified as keywords reinforces the severity of the opinion expressed towards the update. You can see that “stories” and “story” are two of the few prominently-featured keywords that referred to feature updates while keywords like “awful” and “stupid” are good examples of the most-mentioned keywords in reaction to the update as a whole.

It’s clear that using text mining processes like sentiment analysis and entity extraction can provide a detailed overview of public reaction to an event by extracting granular information from product reviews and social media chatter.

If you can think of insights you could extract with text mining about topics that matter to you, our Text Analysis API allows you to analyze 1,000 documents per day free of charge and getting started with our tools couldn’t be easier – click on the image below to sign up.

Text Analysis API - Sign up


Online review sites are the world’s repository of customer opinion – every day, hundreds of thousands of customers give publicly available feedback on their experiences with businesses. With customer opinion available on a scale like this, anyone can generate insights about their business, their competitors, and potential opportunities.

But to leverage these sites like this, you need to understand what is being talked about positively and negatively in the text of hundreds or thousands of reviews. Since analyzing that many reviews manually would be far too time-consuming, most people don’t attempt any quantitative analysis beyond looking at the star ratings, which are too vague and can frequently be misleading.

So in this blog, we’re going to show you how to use Text Mining to quickly generate accurate insights from thousands of reviews. We’re going to scrape and analyze restaurant reviews from TripAdvisor and show you how easy it is to build a robust sentiment analysis workflow without writing any code, using a point-and-click scraping tool and the AYLIEN Text Analysis Add-on for Google Sheets.

We’ll break the process down into three easy-to-follow steps:

  1. We’ll show you how to scrape reviews from TripAdvisor using a point-and-click scraping tool
  2. We’ll use the AYLIEN Text API Google Sheets Add-on to analyze the sentiment expressed in each review toward several aspects of the dining experience.
  3. We’ll show you the results of our sample analysis

As we mentioned, neither of the tools we’ll use require coding skills, and you can use both of them for free.


Why are star reviews not enough on their own?

Take a look at the difference between these three-star reviews (which are for the same branch of the same restaurant chain):




From looking at these reviews, you can spot two important things with the star ratings and the review texts:

  1. Even though the star rating is the same, one of the reviews is positive, the other is negative. This gap between the star rating and what the reviewer really thought is part of the reason Netflix recently ditched the star review system.
  2. The text review allows you to see why the review is positive or negative – the specific aspects that made their dining experience positive or negative.

So to get an accurate analysis of customer opinion from reviews, you need to read the text of every review. The problem here is that doing this at scale is extremely time consuming and pretty much impossible. But we can solve this problem using Text Analytics and Machine Learning.


How to scrape reviews from TripAdvisor with a point-and-click tool

In order to find out what people are saying about businesses, we first need to gather the reviews. For this blog, we decided to analyze customer reviews of Texas Roadhouse, ranked by Business Insider as America’s best restaurant chain.

We chose to compare reviews of their branch in Gatlinburg, Tennessee with those of the branch in Dubai, as this might let us see how customers in diverse regions are responding to the Texas Roadhouse offering. Each of these branches had more than 1,000 reviews, which gives us a generous amount of data to analyze.




Usually, gathering data like this would involve writing code to scrape the review sites, but a point-and-click scraping tool makes this task a lot easier – it allows you to scrape sites by simply pointing and clicking at the data you want. You can sign up for a free trial and watch a handy introductory video on the tool’s site (but we’ll walk you through the process below).

Once you’ve picked which restaurant you want to analyze and signed up for a trial of the scraping tool, open up the restaurant’s TripAdvisor page in it by entering the URL in the New Extractor input box. If you point and click on the text of a review, the tool will scrape all of the reviews on the page and save them for you.




You’ve now scraped the reviews from a single page. But since you’ll probably want a lot more than the 10 reviews on each TripAdvisor page, we’ll show you how to scrape a few hundred in one go.

Scraping hundreds of reviews at once

You may notice that when you are browsing reviews of a restaurant on TripAdvisor, the page URL changes every time you select the next 10 reviews – it adds “-or10-” for the next ten results, “-or20-” for the following ten, and so on. You can see this offset in the URL right before the restaurant name.

In our Texas Roadhouse example, each successive page of reviews simply increments this offset by 10. The scraping tool allows us to scrape numerous web pages at once if we upload a list of URLs in a spreadsheet. So to gather 1,000 restaurant reviews, we need to upload a spreadsheet with 100 of these URLs, with the “-or10” offset increasing by 10 each time.

To make your life a little easier, we’ll share the simple, six-step workaround we used for this with you here:

Step 1: Select the URL of the second page of results containing reviews of the restaurant you want to analyze – the one containing “-or10”.

Step 2: Open up a spreadsheet and fill the first three cells of column A (A1, A2, and A3) with the URL, but only up to “-or10” – copy and paste the remainder of the URL somewhere else for now (in our case we cut “-Texas_Roadhouse-Dubai_Emirate_of_Dubai.html” and pasted it into another cell).

Step 3: Edit cells A2 and A3 to end with “-or20” and “-or30”, respectively. Then select all three cells and drag the selection down until you have 100 rows covered. Excel or Google Sheets will then follow the pattern you set in the first three cells.


Step 4: Since these are not the complete URLs, you’ll need to append the remainder of the URL from Step 2 to the text in each cell. You can do this by typing “=A1&[the rest of your URL]” in the first cell of a new column and extending that formula downwards.


Step 5: copy and paste the values of this new column into column A, and save your spreadsheet. Your spreadsheet should now have one column with 100 rows.

Step 6: Open up the scraping tool, create a new extractor, and open its settings. Click on Import URLs, select the spreadsheet with your URLs, and save. Once you click Run URLs, the tool will start scraping the 1,000 reviews from the URLs you’ve given it. Once it’s done, download the results and open the file in Google Sheets.
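If you’re comfortable with a few lines of code, the spreadsheet steps above can be condensed into a short script. The prefix/suffix split of the URL mirrors Step 2; the values you pass in would be the two parts of your own restaurant’s URL.

```python
import csv

def review_page_urls(prefix, suffix, pages=100):
    """Build the paginated TripAdvisor URLs by incrementing the '-orN-'
    offset by 10 per page. `prefix` is everything before the offset and
    `suffix` everything after it (both restaurant-specific)."""
    return ["{}-or{}-{}".format(prefix, 10 * page, suffix)
            for page in range(1, pages + 1)]

def write_url_sheet(urls, path):
    """Save the URLs as a one-column spreadsheet ready for upload."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for url in urls:
            writer.writerow([url])
```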



Analyzing the Sentiment of Reviews

So at this point, we’ve gathered 1,000 reviews of each Texas Roadhouse branch, with each review containing a customer’s feedback about their experience in the restaurant. In every review, customers express positive, negative, and neutral sentiment toward the various aspects of their experience.

AYLIEN’s Aspect-Based Sentiment Analysis feature detects the aspects mentioned in a piece of text, and then analyzes the sentiment shown toward each of these aspects. In this blog we’re analyzing restaurants, but you can also use this feature to analyze reviews of hotels, cars, and airlines. In the restaurants domain, the Aspect-Based Sentiment Analysis feature detects mentions of seven aspects.



Using our Text Analysis API is easy with the Google Sheets Add-on, which you can download for free here (the Add-on comes with 1,000 credits free so you can test it out). You can complete the analysis by following these three easy steps:

Step 1: Once you’ve downloaded the Add-on, it will be available in the Add-ons menu in your Google Sheets toolbar. Open it up by selecting it and clicking Start.

Step 2: Before you begin, select Aspect-Based Sentiment Analysis from the Analysis Type menu, then select all of the cells that contain your reviews.

Step 3: To begin the sentiment analysis, click Analyze. The Text API will then extract the aspects mentioned in each review one by one, and print them in three columns next to the review – Positive, Negative, and Neutral. These results will be returned to you at a rate of about three per second, so our 2,000 reviews should take around ten minutes to analyze.



Results of the Aspect-Based Sentiment Analysis

At this point, each line of your spreadsheet will contain a column of the reviews you gathered, a column of the aspects mentioned in a positive tone, one with the aspects mentioned in a negative tone, and one with aspects mentioned in a neutral tone. To get a quick look into our data, we put together the following visualizations by simply using the spreadsheet’s word counting function.

First off, let’s take a look at the most-mentioned aspects in all of the reviews we gathered. To do this, all you need to do is separate every aspect listed in Google Sheets into its own cell using a simple function, and then use a formula to count them.



To put each aspect mentioned into its own cell, we’ll use the Split text to columns function, in the Data toolbar. This function will move every word in a cell into a cell of its own by splitting the cell horizontally – that is, if a cell in column A has three words, the Split text function will move the second two words into the adjacent cells in columns B and C.
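Outside of Sheets, the same split-and-count step can be sketched in a few lines of Python. The comma-separated cell format here is a hypothetical stand-in for however your Add-on output is delimited:

```python
from collections import Counter

# One cell per review from the "Positive" column; comma-separated aspects is an
# assumption about the output format -- adjust the delimiter to match yours.
positive_cells = ["food, staff", "food, value", "staff"]

# "Split text to columns" equivalent: break each cell into individual aspects
split_rows = [cell.split(", ") for cell in positive_cells]

# Flatten the rows and count how often each aspect is mentioned
counts = Counter(aspect for row in split_rows for aspect in row)
# counts["food"] == 2, counts["staff"] == 2, counts["value"] == 1
```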

From the pie chart, we can see that food and staff alone accounted for almost two thirds of the total mentions, and after that there’s a bit of a drop-off. After these two aspects, the things customers were most concerned about were how busy the restaurant was and the value of the meal.

Knowing which aspects of the dining experience people were most likely to leave reviews about is useful, but we can go further and analyze the sentiment attached to each aspect. Let’s take a look at the sentiment attached to each aspect in each of the Texas Roadhouse branches.

To do this, use Google Sheets’ COUNTIF formula to count every time the Text API listed an aspect in the positive, negative, and neutral columns. Do this by creating a table with each aspect as rows and Positive, Negative, and Neutral as columns, and use the following formula: =COUNTIF(the range of cells that contain the aspects in each sentiment,”*aspect*”).

After you’ve entered the formula, fill it out as in the example below, which counts the number of times food is mentioned positively: =COUNTIF(B1:B988,"*food*").
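For reference, the COUNTIF step can be mirrored in Python with a small helper; the toy columns below are made-up stand-ins for your spreadsheet’s sentiment ranges:

```python
def countif_contains(cells, aspect):
    # Equivalent of =COUNTIF(range, "*aspect*"): count cells containing the substring
    return sum(1 for cell in cells if aspect in cell)

# Toy data standing in for the Positive/Negative/Neutral columns of the sheet
columns = {
    "Positive": ["food, staff", "food"],
    "Negative": ["staff"],
    "Neutral": ["value"],
}
aspects = ["food", "staff", "value"]

# Build the aspect-by-sentiment table used for the stacked bar chart
table = {
    aspect: {sentiment: countif_contains(cells, aspect)
             for sentiment, cells in columns.items()}
    for aspect in aspects
}
# table["food"] == {"Positive": 2, "Negative": 0, "Neutral": 0}
```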



Once you’ve done this, fill in the results on a table like the one below, and then insert a chart from the Insert tab.


We chose a stacked bar chart, as it allows us to get a quick grasp of what aspects people were interested in and how they felt about each aspect. First off, take a look at the sentiment shown to each aspect by the reviewers of the Dubai branch. You can see that the reviews are very positive:

When we compare the reviews of the Dubai branch above with the Tennessee reviews, we can see immediately that the American branch received more positive reviews than its Dubai counterpart:

Interestingly, we can also see from the volume of mentions of each aspect that customers in Dubai were more concerned with value than their American counterparts, while reviewers in Tennessee paid more attention to the restaurant staff (with most of this extra attention being negative).


These are just a few things that jumped out at us after a sample analysis of a couple of restaurants. If you want to get started leveraging TripAdvisor (or another review site) for your own research using the steps in this blog, sign up for a free trial here, and download our Google Sheets Add-on here (there’s no sign-up required for the Add-on and it comes with free credits so you can test it out).

Text Analysis API - Sign up


It’s now the end of an eventful year that saw the UK begin negotiations to leave the EU, the fight of the century between a boxer and a mixed martial artist, and the discovery of alternative facts. The world’s news publishers reported all of this and the countless other events that shaped 2017, leaving a vast record of what the media was talking about right through the year.

Using Natural Language Processing, we can dive into this record to generate insights about topics that interest us. Our News API has been hard at work gathering, analyzing, and indexing over 25 million news stories in near-real time over 2017. The News API extracts and stores dozens of data points on every story, from classifying the subject matter to analyzing the sentiment, to listing the people, places, and things mentioned in every one.

This enriched content provides us with a vast dataset of structured data about what the world was talking about throughout the year, allowing us to take a quantitative look at the news trends of 2017.

Using the News API, we’re going to dive into two questions on topics that dominated last year’s news coverage:

  1. What was the coverage of Donald Trump’s first year in office like?
  2. What trends affected sports coverage – consistently the most popular category – in 2017?


Trump’s first year in office

How much did the media publish?

Any review of 2017’s news has to begin with Donald Trump and his first year in office as President. To begin with, we wanted to see how the US President was covered over the course of the year, to see which events the media covered the most. To do this, we used the Time Series endpoint to analyze the daily volume of stories that mentioned Trump in the title.
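In code, a query along these lines might be sketched as follows. The endpoint path and parameter names are assumptions based on our reading of the News API documentation, not the exact query we ran:

```python
# Sketch of a daily-volume Time Series query for stories with "Trump" in the title
NEWS_API_TIME_SERIES = "https://api.aylien.com/news/time_series"

def time_series_params(title, start, end, period="+1DAY"):
    # period="+1DAY" asks for one data point per day
    return {
        "title": title,
        "published_at.start": start,
        "published_at.end": end,
        "period": period,
    }

params = time_series_params("Trump", "2017-01-01T00:00:00Z", "2017-12-31T23:59:59Z")
# Send a GET to NEWS_API_TIME_SERIES with these params plus your
# X-AYLIEN-NewsAPI-Application-ID and X-AYLIEN-NewsAPI-Application-Key headers;
# the response contains a list of {"published_at", "count"} points to plot.
```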

Take a look at what the News API found:


From this chart, you can see that the media are generally less interested in Trump now than they were during the first month or two of his presidency. Despite the coverage of the Charlottesville protests, the media fixation on Trump is slowly tapering off.


How did sentiment in the coverage of Trump vary over the year?

Knowing what the media was the most interested in about the President is useful information, but we can also track the sentiment expressed in each one of these stories, and see how the overall sentiment polarity changed over time.

Again using the Time Series endpoint, we can do this. Take a look at what the News API found:

You can see that the News API detected the most negative sentiment in stories about Trump around the time of his condolence call with the widow of a fallen US soldier, to whom he reportedly said, “he knew what he was signing up for”. The most positive sentiment was detected around the time of Trump’s speech in Riyadh, and as the NFL kneeling controversy began to expand.

You will also notice spikes in positive sentiment in stories about Trump around his administration’s repeal of DACA, and again as more and more NFL players joined the kneeling protests. Since both of these spikes follow shortly after the events themselves, we think the coverage most likely reflects the reactions and backlash to these developments.


What other things were mentioned in stories about Trump?

So we know how both the volume of stories about Trump and their sentiment varied over time. But knowing exactly what other people, organizations, and things were mentioned in these stories across the year would let us see what all of these stories were about.

The News API extracts the entities mentioned in every story it analyzes. Using the Trends endpoint, we can search for the 100 entities that were most frequently mentioned in stories about Trump in 2017. These entities are visualized below.
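A Trends query along these lines can be sketched as below; the `field` value for aggregating over entities is an assumption to check against the News API docs:

```python
# Sketch of a Trends query for the most-mentioned entities in Trump stories
TRENDS_ENDPOINT = "https://api.aylien.com/news/trends"

trends_params = {
    "title": "Trump",
    "published_at.start": "2017-01-01T00:00:00Z",
    "published_at.end": "2017-12-31T23:59:59Z",
    "field": "entities.body.links.dbpedia",  # aggregate over entities in story bodies
}
# The response pairs each entity with its story count, ready to feed into a
# word cloud or bubble chart.
```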

Perhaps unsurprisingly, we can see that Trump coverage was dominated by his campaign’s and administration’s involvement with Russia. What is quite remarkable is the scale of that dominance: Russia was mentioned in more stories with ‘Trump’ in the title than the US itself.


What were the most-shared stories about Trump in 2017?

Seeing which stories were shared the most on social networking sites can be very interesting. It can also yield some important business insights as the more a story is shared, the more value it generates for advertisers and publishers.

We can do this with the News API by using the Stories endpoint. Since Facebook consistently garners the most shares of news stories of all the social networks, we returned the top three stories:

  1. “Trump Removes Anthony Scaramucci From Communications Director Role,” The New York Times – 1,061,494 shares.
  2. “Trump announces ban on transgender people in U.S. military,” The Washington Post – 696,341 shares.
  3. “Trump admin. to reverse ban on elephant trophies from Africa,” ABC News – 638,917 shares.
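A sketch of the kind of Stories query that produces a list like the one above; the `sort_by` value and parameter names are assumptions from the News API docs:

```python
# Sketch of a Stories query for the three most-shared Trump stories on Facebook
STORIES_ENDPOINT = "https://api.aylien.com/news/stories"

stories_params = {
    "title": "Trump",
    "published_at.start": "2017-01-01T00:00:00Z",
    "published_at.end": "2017-12-31T23:59:59Z",
    "sort_by": "social_shares_count.facebook",  # most-shared on Facebook first
    "per_page": 3,                              # just the top three stories
}
# Each returned story would include its title, source, and share counts.
```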


2017 in Sports Coverage

Sports is the subject that the media writes the most about, by quite a bit. This is reflected in the fact that the News API gathered over five million stories about sports in 2017, more than any other single subject category.

To make sense of this content at this scale, we need to first understand the subject matter of each story. To enable us to do this, the News API classifies each story according to two taxonomies.

To analyze the most popular sports, we used the Time Series endpoint to see how the daily volume of stories about the four most popular sports varied over time. We searched for stories that the News API classified as belonging to the categories Soccer, American Football, Baseball, and Basketball in the advertising industry’s IAB-QAG taxonomy. To narrow our search down a bit, we decided to look into Autumn, the busiest time of year for sports.
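A sketch of this kind of category-based Time Series query is below; the category ids are placeholders, since the exact IAB-QAG id for each sport needs to be looked up in the taxonomy:

```python
# Sketch of one Time Series query per sport, filtered by IAB-QAG category
def category_series_params(category_id, start, end):
    return {
        "categories.taxonomy": "iab-qag",
        "categories.id": category_id,
        "published_at.start": start,
        "published_at.end": end,
        "period": "+1DAY",
    }

# Placeholder ids -- look up the real IAB-QAG id for each sport before querying
sports = {
    "Soccer": "IAB17-XX",
    "American Football": "IAB17-XX",
    "Baseball": "IAB17-XX",
    "Basketball": "IAB17-XX",
}
series_queries = {
    name: category_series_params(cid, "2017-09-01T00:00:00Z", "2017-11-30T23:59:59Z")
    for name, cid in sports.items()
}
```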

Take a look at what the News API returned:

We can see that the biggest event to cause a spike in stories was Mike Pence’s out-of-the-ordinary appearance at an NFL game as the kneeling protests expanded – a game he left after the players kneeled during the playing of the national anthem.

Other than this, the biggest spike in stories was clearly caused by the closing of the English transfer window on the last day of August, showing the dominant presence of soccer in the world’s media outlets.


Who and what were the media talking about?

Being able to see the spikes in the volume of sports stories around certain events is a useful resource to have, but we can use the News API to see exactly what people, places, and organizations were talked about in every one of the over 25 million stories it gathered in 2017.

To do this, we again used the Trends endpoint to find the most-mentioned entities in sports stories from 2017. Take a look at what the News API found:

You can immediately see the dominance of popular soccer clubs in the media coverage, but locations that host popular NFL and NBA teams are also featured prominently. However, soccer has a clear lead over its American competitors in terms of media attention, probably due to the global reach of soccer.


What were the most-shared sports stories on Facebook in 2017?

The Time Series endpoint showed us that the NFL kneeling protests were the most-covered sports event of 2017. Using the News API, we can also see how many times each one of the over 25 million stories was shared across social media.

Looking at the top three most-shared sports stories on Facebook, we can see that the kneeling protests were the subject of two of them. This shows us that the huge spike in the volume of stories about these protests was responding to genuine public demand – people were sharing these stories with their friends and followers online.

  1. “Wife of ‘American Sniper’ Chris Kyle Just Issued Major Challenge to NFL – Every Player Should Read This,” Independent Journal Review – 830,383 shares.
  2. “Vice President Mike Pence leaves Colts-49ers game after players kneel during anthem,” Fox News – 829,466 shares.
  3. “UFC: Dana White admits Mark Hunt’s UFC career could be over,” New Zealand Herald – 772,926 shares.


Use the News API for yourself

Well that concludes our brief look back at a couple of the biggest media trends of 2017. If there are any subjects of interest to you, try out our free two-week trial of the News API and see what insights you can extract. With the easy-to-use SDKs and extensive documentation, you can make your first query in minutes.


News API - Sign up


Being able to leverage news content at scale is an extremely useful resource for anyone analyzing business, social, or economic trends. But in order to extract valuable insights from this content, we sometimes need to build analysis tools that help us understand it.

To serve everyone who needs a simple, end-to-end solution for this complex task, we’ve put together a fully-functional example of a RapidMiner process that sources data from the AYLIEN News API and analyzes it using some of RapidMiner’s operators.


What can you do with the News API in RapidMiner?

With news content now accessible at web scale, data scientists are constantly creating new ways to generate value with insights from news content that were previously almost impossible to extract. Every month, our News API gathers millions of stories in near-real time, analyzes every news story as it is published, and stores each of them along with dozens of extracted data points and metadata.

Equipped with this structured data about what the world’s media is talking about, RapidMiner users can leverage the extensive range of tools the studio has to offer, including:

  • 1,500+ built-in operations & Extensions to dive into your data
  • 100+ data modelling & machine learning operators
  • Advanced visualization tools.

Using a 3-D scatter plot to visualize news data with four variables in RapidMiner Studio

How do I get started with the News API process?

In this blog, we’re going to showcase an example of how you can use our News API within RapidMiner to build useful content analysis processes that aggregate and analyze news content with ease. We’ve picked a fun little example that analyzes articles from TechCrunch and builds a classification model to predict which reporter wrote any new article it is shown (pro tip: you can use the same model to pick which TechCrunch journalist you should target for your pitch!). We hope this blog sparks some creative ideas and use cases combining RapidMiner and our News API.

This sample process consists of two main steps:

  1. Gathering enriched news content from the News API using the Web Mining extension
  2. Building a classification model by using RapidMiner’s Naive Bayes operator.

If you are unfamiliar with RapidMiner, there are some great introductory videos and walkthroughs for beginners on their YouTube channel.

So let’s get started!

We’ve made it really easy to get started with the News API and RapidMiner: download this pre-built process and open it with RapidMiner. Next, grab your credentials for the News API by signing up for our free two-week trial.

Once you’ve downloaded the process and opened it up with RapidMiner, you’ll see the main operators outlined in the Process tab. You will see that there are seven operators in total, the first three gather data from the News API while the last four train the classifier.


To make your first calls to the News API, the first thing you need to do is build your search criteria. To build your News API query, click on the Set Macros operator in the top left of your console. Once you’ve selected the operator, clicking on the Edit List button in the Parameters tab will show you the list of parameters for your News API query. Enter the API credentials (your API key and application ID) that you obtained from the News API developer portal when you signed up, and configure your search parameters – check out the full list of query parameters in our News API documentation to build your search query.


The purpose of this blog is to build a classifier that will predict which TechCrunch journalist wrote an article. In order to do this, we first need to teach the model by gathering relevant training data from TechCrunch. To get this data, we built a query that searched for every article published on the site in the past 30 days and returned the author of each one, along with the contents of the articles they wrote. The News API can return up to 100 results at a time, but since we wanted more than 100 articles, we used pagination to iterate over the search results for five pages, giving us 500 results. You can see the query we used in the screenshot above.
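The pagination step can be sketched as below; the parameter names (including the hypothetical `source.domain` filter) are assumptions based on the News API docs rather than the exact query in the process:

```python
# Page-based pagination sketch, mirroring the five-page loop in the process
def paged_params(base_params, page, per_page=100):
    params = dict(base_params)       # copy so the base query is not mutated
    params["page"] = page
    params["per_page"] = per_page    # 100 results is the assumed per-request maximum
    return params

base = {
    "source.domain": "techcrunch.com",   # hypothetical filter for TechCrunch
    "published_at.start": "NOW-30DAYS",  # the last 30 days
}
# Five pages of 100 results each gives the 500 training articles
page_queries = [paged_params(base, page) for page in range(1, 6)]
```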

Importantly, after you have defined these parameters in the Set Macros operator, you’ll need to make the same changes by editing the query list in the Get Page operator within the Process Loop. To do this, double-click on the Loop icon in the Process tab, then double-click the Get Page icon, and select the Edit List button next to Query Parameters.


When you’re entering the parameters, be sure to enter every parameter you entered in the previous window and follow the convention already set in the list (entering the parameter in the “%{___}” format).

News API Results

Once you have defined your parameters in both lists, hit the Run (play) button at the top of the console and let RapidMiner run your News API query. Once it has finished running, you can view the results in the Results window. Below you can see a screenshot of the enriched results with the dozens of data points that the News API returns.


Having access to this enriched news content in RapidMiner allows you to extract useful insights from this unstructured data. After running the analysis, you can browse the results of the search using simple visualizations to show data points like sentiment or, as in the graph below, authorship, which shows us which authors published the most articles in the time period we set.



Training a Classifier

For the sample analysis in this blog, we’re building a classifier using RapidMiner’s Naive Bayes operator.

Naive Bayes is a common algorithm used in Machine Learning for data classification. You can read more about it in an explainer blog we wrote for novices, which talks you through how the algorithm works. Essentially, this classifier will guess which author new articles belong to by learning from features in the training data – the news content we retrieved from our News API results. By analyzing the most common features in the articles from each author, the model learns that different words and phrases are more likely to appear in articles from different authors.
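To make the idea concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with add-one smoothing, trained on toy articles by two hypothetical authors (RapidMiner’s operator does all of this, plus evaluation, for you):

```python
from collections import Counter, defaultdict
import math

def train_nb(docs):
    """docs: list of (author, text) pairs. Returns class counts, word counts, vocab."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for author, text in docs:
        words = text.lower().split()
        class_counts[author] += 1
        word_counts[author].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    """Return the author with the highest log-probability for the given text."""
    words = text.lower().split()
    total = sum(class_counts.values())
    best_author, best_score = None, float("-inf")
    for author in class_counts:
        # log prior + sum of log likelihoods with add-one (Laplace) smoothing
        score = math.log(class_counts[author] / total)
        denom = sum(word_counts[author].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[author][w] + 1) / denom)
        if score > best_score:
            best_author, best_score = author, score
    return best_author

# Toy training set with two made-up authors
docs = [
    ("alice", "cryptocurrency bitcoin blockchain funding"),
    ("alice", "bitcoin price cryptocurrency exchange"),
    ("bob", "startup venture funding round"),
    ("bob", "startup product launch venture"),
]
model = train_nb(docs)
guess = predict("bitcoin cryptocurrency news", *model)  # -> "alice"
```

Word choice drives the prediction here just as in the screenshot: “cryptocurrency” only ever appears in one author’s articles, so that author wins.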

For example, take a look below at how our classifier has learned which writers are most likely to talk about ‘cryptocurrency’. You can test your classifier by selecting the Attribute button in the top left corner.



Once the process has fully run, it will retrieve and process the news content, and train a Naive Bayes classifier that, given the body of an article, tries to predict the likely author from among all TechCrunch journalists.

Additionally, RapidMiner will evaluate this classifier for us on a held-out subset of the data we retrieved from the News API, comparing the true labels (known authors) to the model’s predictions (predicted authors) on the test set, and providing us with an accuracy score and a confusion matrix:


There are many ways to improve the performance of this classifier, for example by using a more advanced classification algorithm like SVM instead of Naive Bayes. In this post, our goal was to show you how easy it is to retrieve news content from our News API and load it into RapidMiner for further analysis and processing.

Things to try next:

  • Try changing your News API query to repeat this process for journalists from a different news outlet
  • Try using a more powerful algorithm such as SVM or Logistic Regression (RapidMiner includes implementations for many different classifiers, and you can easily replace them with one another)
  • Try to apply a minimum threshold on the number of articles that must exist for each author that the model is trained on

This process is just one simple example of what RapidMiner’s analytic capabilities can perform on enriched news content. By running the first three operators on their own, you can take a look at the enriched content that the News API generates and begin to leverage RapidMiner’s advanced capabilities on an ever-growing dataset of structured news data.

To get started with a free trial of the News API, click on the link below, and be sure to check back on our blog over the coming weeks to see some sample analyses and walkthroughs.

News API - Sign up



Over the past few weeks, the hype about Bitcoin has reached fever pitch, as the cryptocurrency’s rise in price accelerated and the Dollar value of one Bitcoin crossed $10,000 (two weeks later, it’s at over $12,000). Considering you could buy one Bitcoin for $200 in 2015, this is pretty impressive.




But how can we explain this rise? And is it just mania?

A great article in The Atlantic talked about how all currencies are a consensual decision – from a bag of beads to modern banknotes, if a large enough group of people decide something is a currency, then it becomes one.

This is one way to explain the phenomenal rise in value of Bitcoin – from news reports on Russian bots influencing the 2016 election to people noticing how Facebook tracks your behaviour to sell you ads, the average person in the street now knows far more about digital technology than they did in 2009, when Bitcoin was launched. When this is coupled with a popular distrust of banks, it’s easy to see how Bitcoin gains its intrinsic value.

So if the value of Bitcoin is dependent on what large groups of people think about cryptocurrency in general, then understanding what large groups of people are reading about Bitcoin is important, because the media coverage of Bitcoin informs buying decisions. Last month, our News API gathered, analyzed, and indexed 2.6 million news stories as they were published. In this blog, we’re going to look into these stories to see what the media was saying about Bitcoin in November.

We’re going to look at three things:

  • What is the scale of this hype and how is it accelerating?
  • What concepts do the media talk about in stories about Bitcoin, and have the popular concepts changed since the hype has grown?
  • Has the media started to express more positive, negative, or neutral sentiment about Bitcoin since its price shot up?


How big is the hype?

First of all, we need to quantify all of this media attention to see how big the hype actually is by finding exactly how many stories were published about Bitcoin last month and how this compares to previous months.

You can see that over November, media interest in Bitcoin increased as the cryptocurrency’s value grew (despite a dip over the Thanksgiving weekend), with this interest peaking on the first day the value of Bitcoin hit the 10,000-dollar mark. So we can see the media hype was focused on Bitcoin crossing the $10,000 milestone, rather than on any definite indicators of rising value in the future.


What else are the media talking about when they talk about Bitcoin?

Knowing the scale of hype about Bitcoin is useful, but knowing what was being talked about in these almost 14,000 stories would let us look even deeper into the Bitcoin saga. When our News API indexes a story, it analyzes and stores dozens of data points, one of which is a list of all other concepts mentioned in the story. Having access to this list for every one of the millions of stories our News API gathers every month gives us a really useful dataset to query.

So we decided to analyze this in two periods – June and November. This will allow us to see if the media has started talking about other subjects since the hype really started taking off in the Autumn.

To analyze this, we used the Trends endpoint to return the most-mentioned concepts in stories with “Bitcoin” in the title. Take a look below and see what else was mentioned.


You can see that the descriptive concepts are the most popular – Ethereum, blockchain, and Coinbase all put Bitcoin in context, and these are all concepts that would be mentioned when talking about Bitcoin in general. This would be very useful if we were looking for related stories, but for our analysis we need to look past these most-mentioned concepts and pay attention to what else was talked about.

Importantly, Japan is mentioned prominently, prompted by the country’s largest Forex market opening up to Bitcoin trading.

In contrast with this, looking at the results of the same search in November’s stories, we can see new concepts being mentioned.


Jamie Dimon and CME Group are two of the most-mentioned concepts here, resulting from JP Morgan Chase’s decision to start offering trades on Bitcoin futures (from CME), despite their CEO (Dimon) publicly declaring only six weeks earlier that he’d fire anyone “stupid enough” to deal in the cryptocurrency.

Remember that November’s story volume is 500% greater than June’s. If November’s story volume was driven by US-based traders beginning to deal in Bitcoin, while June’s story volume included Japanese traders’ much earlier adoption of the cryptocurrency, this gives us a hint that perhaps a lot of the hype around Bitcoin is focused on the news of a few well known financial institutions, which interestingly enough are all based in the US.


What was the sentiment of the coverage?

So we know the volume of the Bitcoin coverage and what was being talked about, but knowing whether all of this coverage was positive, negative, or neutral would let us understand the sentiment shown toward this hype.

Our News API analyzes the sentiment of every story it gathers, so using the Time Series Endpoint again, we can analyze the sentiment of stories with “Bitcoin” in the title over the past few months.
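In code, this amounts to one Time Series query per sentiment class; the `sentiment.title.polarity` parameter name is an assumption from the News API docs:

```python
# Sketch of a sentiment-split Time Series: one query per polarity class
def sentiment_series_params(title, polarity):
    return {
        "title": title,
        "sentiment.title.polarity": polarity,  # "positive" | "neutral" | "negative"
        "period": "+1DAY",
    }

polarity_queries = {
    polarity: sentiment_series_params("Bitcoin", polarity)
    for polarity in ("positive", "neutral", "negative")
}
# Plotting the three resulting daily series together gives the sentiment split
# of Bitcoin coverage over time.
```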


You can see that the sentiment split has stayed roughly the same since June, except in August, when the coverage became more favourable, and in September, when it became more negative. This spike in negative coverage was likely caused by Jamie Dimon’s public remarks about the cryptocurrency being a fraud.

Interestingly, you can see the coverage gets much more positive in November, when the coverage exploded.

If you want to analyze the coverage of Bitcoin in greater detail than our brief layman’s overview here, you can start making queries to our News API in minutes, even without writing a line of code. Sign up for a two-week trial free of charge, with no card details required by clicking on the image below.

News API - Sign up


It’s an exciting time here at AYLIEN – in the past couple of months, we’ve moved office, closed a funding round, and added six people to the team. We’re delighted to announce our most recent hire and our first Chief Architect and Principal Engineer, Hunter Kelly.


The AYLIEN tech infrastructure has grown to quite a scale at this point. In addition to serving over 30,000 users with three product offerings, we’re also a fully-functional AI research lab that houses five full-time researchers, who in turn feed their findings back into the products. With such a complex architecture and such demanding backend workloads, bringing in an engineer with Hunter’s breadth and quality of experience is a huge boost to us as we move into the next phase of our journey.

At first glance, Hunter’s career has followed a seemingly meandering path through some really interesting companies. After graduating from UC Berkeley in the 90s, he joined the Photoscience Department at Pixar in California, became one of the first engineers in Google’s Dublin office, and at NewBay, he designed and built a multi-petabyte storage solution for handling user-generated content, still in use by some of the world’s largest telcos today. Hunter is joining us from Zalando’s Fashion Insight Centre, where as the first engineer in the Dublin office he kicked off the Fashion Content Platform, which was shortlisted as a finalist in the 2017 DatSci Awards.

The common thread in those roles, while perhaps not obvious, is data. Hunter brings this rich experience working on data, both from an engineering and data science perspective, to focus on one extremely important problem – how can we leverage data to solve our hardest problems?

This question is central to AI research, and Hunter’s expertise is a perfect fit with AYLIEN’s mission to make Natural Language Processing hassle-free for developers. Our APIs handle the heavy lifting so developers can leverage Deep NLP in a couple of lines of code, and the ease with which our users do this is down to the great work our science and engineering teams do. Adding Hunter to the intersection of these teams will add a huge amount to our capabilities here and we’re really excited about the great work we can get done.

Here’s what Hunter had to say about joining the team:

“I’m really excited to be joining AYLIEN at this point in time. I think that AI and Machine Learning are incredibly powerful tools that everyone should be able to leverage. I really look forward to being able to bring my expertise and experience, particularly with large-scale and streaming data platforms, to AYLIEN to help broaden their already impressive offerings. AI is just reaching that critical point of moving beyond academia and reaching wide-scale adoption. Making its power accessible to the wider community beyond very focused experts is a really interesting and exciting challenge.”

When he’s not in AYLIEN, Hunter can be found spending time with his wife and foster children, messing around learning yet another programming language, painting minis, playing board games, tabletop RPG’s and wargames, or spending too much time playing video games.  He’s also been known to do some Salsa dancing, traveling, sailing, and scuba diving.

Check out Hunter’s talks on his most recent work at ClojureConj and this year’s Kafka Summit.

Text Analysis API - Sign up



With the world’s media now publishing news content at web scale, it’s possible to leverage this content and discover what news the world is consuming in real time. Using the AYLIEN News API, you can both look for high-level trends in global media coverage and also dive into the content to discover what the world is talking about.   

So in this blog we’re going to take a high-level look at some of the more than 2.5 million stories our News API gathered, analyzed, and indexed last month, and see what we find. Instead of searching for stories using a detailed search query, we’re simply going to retrieve articles written in English and see what patterns emerge in this content.

To understand the distribution of articles published over time, we’ll use the Time Series endpoint. This endpoint allows us to see the volume of stories published over time, according to whatever parameters you set. To get an overview of recent trends in content publishing, we simply set the language parameter to English. Take a look at the pattern that emerges over the past two months:

The first thing you’ll notice is how steady the publishing industry’s patterns are – there is a steady output of around 60,000 new stories in English every weekday, dropping to about 30,000 stories on weekends. This pattern is very regular, except in the last week of the month, when a small but noticeable spike in story volume occurs.

What caused this spike in story volume?

To find the cause of these extra 2,000 – 4,000 stories, we browsed the volume numbers of the biggest categories to see if we could identify a particular subject category which followed the same pattern. We found an unmistakable match in the Finance category – as well as taking place in the same period, this spike also matches the volume of extra stories – roughly an extra 2,000 stories above the daily average.

In addition to this, we also found a similar spike at the end of July. Take a look at the daily story volume of finance stories published over the past six months:

What topics were discussed in this content?

Knowing that the increase in story volume in the last week of October was due to a spike in the number of Finance stories is great, but we can go further and see what was actually being talked about in these stories. To do this, we leveraged the News API’s capability to analyze the keywords, entities, and concepts mentioned in each story. The News API lets you discover patterns like this in news stories using the Trends endpoint.
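A Trends query of this kind can be sketched as follows. The `field` values (e.g. `"keywords"`) and other parameter names are based on the News API documentation, while the category ID and the date range are illustrative assumptions standing in for the Finance category and the week of the spike.

```python
from urllib.parse import urlencode

NEWS_API = "https://api.aylien.com/news"

def trends_url(field="keywords", category_id="IAB13",
               start="2017-10-23T00:00:00Z", end="2017-10-30T00:00:00Z"):
    """Build a Trends request URL: the most frequent values of one field.

    field could be "keywords" or an entity/concept field per the docs;
    the category ID and dates here are illustrative assumptions.
    """
    params = {
        "field": field,
        "categories.taxonomy": "iab-qag",
        "categories.id[]": category_id,
        "published_at.start": start,
        "published_at.end": end,
    }
    return f"{NEWS_API}/trends?{urlencode(params)}"
```

The response ranks each keyword by how many matching stories mention it, which is the data behind the bubble chart below.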

Analyzing keywords gives us an overview of which people, organizations, and things were mentioned most in the roughly 10,000 Finance stories published in that week. Looking at the chart below, it’s pretty easy to see what caused the spike.

From the results shown in the bubble chart, keywords and concepts like “quarter”, “earnings”, “financial”, and “company” stand out. From this analysis we can make a good guess that much of the content published in the last week of October related to quarterly results and financial reporting by companies. This makes sense, since the Time Series chart showed a similar spike at the end of July, three months earlier.

We thought this was interesting: why was so much published about something so arcane to the general public? In the first graph, the spike caused by these quarterly earnings reports was visible even on a chart of all stories published in English. But we doubt that quarterly earnings reports make the everyday news consumer drop everything to check the headlines.

Why were people interested in quarterly earnings reports?

So we know from the spike in story volume that the media were interested in the quarterly earnings reports. But what were social media users interested in? To find out, we decided to gather the most-shared stories from the Finance category during the last week of October – the week of the spike. This will tell us whether there was a particular aspect of the earnings reports that prompted such a spike in this topic.

The News API lets us do that with the Stories endpoint, by simply searching for the most-shared stories from the Finance category during the last week of October across Facebook, LinkedIn, and Reddit.
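A Stories query sorted by share count can be sketched like this. The `sort_by` values (e.g. `"social_shares_count.facebook"`) follow the News API documentation, while the Finance category ID and the date range are illustrative assumptions; running one query per network yields the three top-three lists below.

```python
from urllib.parse import urlencode

NEWS_API = "https://api.aylien.com/news"

def top_shared_stories_url(network="facebook", per_page=3,
                           start="2017-10-23T00:00:00Z",
                           end="2017-10-30T00:00:00Z"):
    """Build a Stories request URL sorted by share count on one network.

    Networks assumed valid per the docs: facebook, linkedin, reddit.
    "IAB13" is an assumed ID for the Finance category.
    """
    params = {
        "categories.taxonomy": "iab-qag",
        "categories.id[]": "IAB13",
        "published_at.start": start,
        "published_at.end": end,
        "sort_by": f"social_shares_count.{network}",
        "per_page": per_page,
    }
    return f"{NEWS_API}/stories?{urlencode(params)}"

# One query per network gives the three lists shown below:
urls = [top_shared_stories_url(n) for n in ("facebook", "linkedin", "reddit")]
```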

You can see that of the nine stories we gathered, five are about the earnings reports, and despite this being quite a business-focused topic, Facebook, not LinkedIn, was the network on which this content was most popular.


Facebook

  1. “Swiss bank UBS reports 14 percent growth in 3Q net profit,” Associated Press, 39,907 shares
  2. “Amazon shares soar as earnings beat expectations,” Associated Press, 39,870 shares
  3. “US stocks higher as banks and technology companies rebound,” Associated Press, 39,867 shares

LinkedIn

  1. “Jeff Bezos is now the richest man in the world with $90 billion,” CNBC, 7,318 shares
  2. “CVS Reportedly Looking To Buy Aetna Insurance For $66 Billion,” Consumerist, 2,172 shares
  3. “New Uber Visa Credit Card From Barclays Coming Next Week,” Forbes, 1,967 shares

Reddit

  1. “First reading on third-quarter GDP up 3.0%, vs 2.5% rise expected,” CNBC, 14,807 upvotes
  2. “New study says Obamacare premiums will jump in 2018 — in large part because of Trump,” Business Insider, 6,255 upvotes
  3. “MSNBC host literally left his seat to fact-check Jim Renacci,” Cleveland, 4,641 upvotes


You can see above that of the nine most-shared stories on social media in the week of the spike, only five actually mention the earnings reports. This suggests that although the media published a huge amount about the reports, the general public wasn’t especially interested in them.

Since we are basing this assumption on just a few headlines, it’s only a hunch. But with the News API, we can put hunches like this to the test by analyzing quantitative data.

Exactly how interested were people in quarterly earnings reports?

To be a bit more rigorous about how interested people were in the quarterly earnings reports, we compared the share counts of the 100 most-shared stories from the week of the spike with those from the corresponding week of the previous month. We can do this using the News API’s Stories endpoint, since the News API monitors the share count of every story it indexes. We focused on Facebook, since in the previous section it was the network where this content was most shared.
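The aggregation step for this comparison is plain data processing on the story records each query returns. The helper below assumes the story shape documented for the News API, where `social_shares_count.facebook` holds a list of count snapshots and the last snapshot is taken as the most recent; run it once per week’s result set and compare the totals.

```python
def total_facebook_shares(stories):
    """Sum Facebook share counts over a list of story records.

    Each record is assumed to mirror the News API story shape:
    {"social_shares_count": {"facebook": [{"count": N}, ...]}},
    with the latest snapshot last in the list.
    """
    total = 0
    for story in stories:
        snapshots = story.get("social_shares_count", {}).get("facebook", [])
        if snapshots:
            # Take the most recent share-count snapshot for this story.
            total += snapshots[-1]["count"]
    return total
```

Calling this on the top-100 stories from each week gives the two totals compared in the chart below.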

Take a look at how often people were sharing the 100 most-shared stories on Facebook in the last week of October:  

You can see that people were sharing Finance stories less often in the last week of October than in the same period in September. This is interesting because we already saw that over three times more Finance stories were published in that period, so we have to conclude that people on social media generally just weren’t interested in these stories.

This is a useful reminder that looking only at viral stories about a subject can mislead us about how interested people actually are in that subject.

Well, that concludes this month’s roundup of news with the News API. If you want to dive into the world’s news content and use text analysis to extract insights, the News API has a two-week free trial, which you can activate by clicking the link below.

News API - Sign up