Alternative Data and Machine Learning – extracting value from “the New Goldmine”



The landscape of data is ever-changing, which means analysts need to evolve both their thinking and their data collection methods to stay ahead of the curve. In many cases, data that might have been considered unique, uncommon or unattainably expensive just a few years ago is now widely used and often very affordable. The analysts who exploit these sources while they remain untapped are the ones who reap the rewards, gaining a competitive advantage before the rest of their industry or peers catch on.

This type of data is often referred to as alternative data. With the ever-increasing volume of data available in the modern world comes the opportunity to gain unique insights, a competitive advantage within an industry, and higher profits. It is perhaps no surprise, then, that the scramble to get hold of such data has been dubbed the new gold rush.

With so many of our customers here at AYLIEN using our Text Analysis and News APIs to source and analyze alternative data in the form of unstructured content, we thought we would take a look at this trend to give you an idea of how and why it is becoming so popular and important.

What is alternative data?

Alternative data can be described as data derived from non-traditional sources. It complements traditional data sources and produces analytical insights that would not have been achievable with traditional data alone.

Put simply, it’s data that isn’t commonly being used within a specific industry or use case, but can potentially be used to gain a competitive advantage over those that do not have access to it.

Let’s look at investors as an example. Ask any investor which data source they could not do without, and they’ll most likely say their Bloomberg terminal or a similar service. Bloomberg’s data services let investors easily scan financial data generated by thousands of companies. Terminals such as Bloomberg’s are so widespread that investors can hardly compete without one. However, that same ubiquity makes it difficult for any investor to gain a competitive advantage from them, since everyone receives the same data at the same time.

So, how is alternative data being sourced and utilized?

Alternative data in use

To give you an idea of just how significant alternative data can be, and of the seemingly endless channels from which it can be sourced, we’re going to look at recent examples involving three brands: Chipotle, GoPro and JCPenney.

1. Chipotle

The CEO of Foursquare, a search-and-discovery mobile app, predicted a 30% drop in Q1 sales for Chipotle based on footfall data gathered from the app’s users. Foursquare had detected a fall in customer footfall because fewer of its users were ‘checking in’ at Chipotle restaurants, and the prediction proved correct.
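The underlying reasoning can be sketched as a simple year-over-year comparison. If check-ins are treated as a proxy for store visits, the percentage change in check-ins for a quarter hints at the direction of sales. The minimal sketch below is not Foursquare’s method. It assumes sales move roughly in proportion to visits, which is a strong simplification, and it uses made-up counts. The function name yoy_change is hypothetical.

```python
def yoy_change(previous: int, current: int) -> float:
    """Percentage change from last year's count to this year's count."""
    if previous <= 0:
        raise ValueError("previous period count must be positive")
    return (current - previous) / previous * 100.0


# Hypothetical Q1 check-in counts for one restaurant chain (illustrative only).
q1_last_year = 100_000
q1_this_year = 70_000

change = yoy_change(q1_last_year, q1_this_year)
# Assumption: sales move roughly in proportion to store visits, so a fall in
# check-ins of this size is read as a signal of a similar fall in sales.
print(f"Check-ins changed by {change:.1f}% year over year")
```

In practice, such a proxy would also need to correct for changes in the app’s own user base, so that a shrinking panel is not mistaken for shrinking demand.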


2. GoPro

In November of last year, Wall Street was shocked by the news that GoPro had reported a loss of 60 cents per share on $240.56 million in revenue, after analysts had predicted a far less severe loss of 36 cents per share on $314.06 million in revenue.

While Wall Street never saw this coming, Quandl, a financial data provider, was able to foresee the drop in GoPro’s earnings. In this case, the data came from electronic receipts extracted from over 3 million inboxes. A significant decline in emailed sales receipts from GoPro’s biggest distribution channel, Amazon, was a key indicator of what was to come. The graph below shows a clear drop in Q3 sales through Amazon, in contrast to other channels:

GoPro revenue estimates by channel

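The receipt-based signal works by grouping purchases by sales channel and by quarter, then comparing quarters within each channel. The minimal sketch below is not Quandl’s pipeline. The Receipt schema, the function names and the sample values are all hypothetical. A real panel of inboxes is only a sample of all buyers, so its totals would also have to be scaled up before they could serve as revenue estimates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Receipt:
    """One emailed sales receipt (hypothetical schema, for illustration only)."""
    channel: str   # retailer the product was bought through, e.g. "Amazon"
    quarter: str   # fiscal quarter label, e.g. "Q3"
    amount: float  # purchase value in dollars


def revenue_by_channel(receipts):
    """Sum receipt amounts for each (channel, quarter) pair."""
    totals = defaultdict(float)
    for r in receipts:
        totals[(r.channel, r.quarter)] += r.amount
    return dict(totals)


def quarter_change(totals, channel, prev_q, curr_q):
    """Percentage change in receipt revenue for one channel between two quarters."""
    prev = totals.get((channel, prev_q), 0.0)
    curr = totals.get((channel, curr_q), 0.0)
    if prev == 0:
        return None  # no baseline to compare against
    return (curr - prev) / prev * 100.0


# Illustrative receipts, not real panel data.
receipts = [
    Receipt("Amazon", "Q2", 300.0),
    Receipt("Amazon", "Q2", 200.0),
    Receipt("Amazon", "Q3", 250.0),
    Receipt("Other", "Q2", 100.0),
    Receipt("Other", "Q3", 110.0),
]
totals = revenue_by_channel(receipts)
for channel in ("Amazon", "Other"):
    print(channel, quarter_change(totals, channel, "Q2", "Q3"))
```

Here the sample Amazon receipts fall by half from Q2 to Q3 while the other channel grows slightly. That is the kind of divergence between channels described above.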

3. JCPenney

When JCPenney reported its Q2 results in 2015, the news came as a surprise to most investors. Some investors were not surprised at all, however. They had been tracking near-real-time satellite imagery of JCPenney parking lots, which showed a clear trend of increasing customer footfall.

Satellite images of JCPenney parking lots indicated an increase in customer footfall

These three examples show alternative data being sourced from a social app, email receipts and satellite imagery, highlighting the breadth and variety of potential sources that can be explored.

How can NLP and Text Analysis help?

These days we have more data at our fingertips than ever before. One of the main challenges is extracting meaningful insights from it, particularly from vast amounts of unstructured data. Advancements in Natural Language Processing (NLP), Machine Learning and Text Analysis are assisting analysts in exploring data sources that previously would not have been considered worthwhile to their use case or industry.

Here are a few examples:

Social media

Social media channels such as Twitter and Facebook are a gold mine of public opinion and information just waiting to be tapped. If you want to know how the public feels about a specific brand, individual or event, social media offers an easily accessible data source that can be analyzed for trends to help predict consumer behavior.

For example, our sentiment analysis of 1.7 million tweets during last year’s Super Bowl showed Amazon to be the most talked-about brand during the game. Its ad for the Echo also received the highest level of positive sentiment. The result: the Amazon Echo shot up to second place on the bestseller list within a week.
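Once each tweet has a sentiment label, finding the most talked-about brand and its share of positive mentions is a matter of aggregation. The minimal sketch below assumes the labels were produced beforehand by a sentiment classifier, which is not shown. It matches brands by naive substring search. The function name brand_stats and the tweets are hypothetical, and “Acme” is a made-up brand.

```python
from collections import Counter


def brand_stats(tweets, brands):
    """Return {brand: (mention_count, share_of_positive_mentions)}.

    `tweets` is a list of (text, sentiment) pairs; the sentiment label
    ("positive", "negative" or "neutral") is assumed to come from a
    sentiment classifier that was run over each tweet beforehand.
    """
    mentions = Counter()
    positives = Counter()
    for text, sentiment in tweets:
        lowered = text.lower()
        for brand in brands:
            # Naive substring match; a real pipeline would use entity extraction.
            if brand.lower() in lowered:
                mentions[brand] += 1
                if sentiment == "positive":
                    positives[brand] += 1
    return {
        brand: (mentions[brand], positives[brand] / mentions[brand] if mentions[brand] else 0.0)
        for brand in brands
    }


# Hypothetical labeled tweets; "Acme" is a made-up brand.
tweets = [
    ("Loved the Amazon Echo ad!", "positive"),
    ("That Amazon spot was weird", "negative"),
    ("Acme's commercial made me laugh", "positive"),
]
stats = brand_stats(tweets, ["Amazon", "Acme"])
most_talked_about = max(stats, key=lambda b: stats[b][0])
print(stats, most_talked_about)
```

Substring matching will also catch unrelated words that contain a brand name, which is why entity extraction is the safer choice at the scale of millions of tweets.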

News stories

Advanced yet accessible NLP tools have made it much easier to source and analyze large amounts of news content. That content can be filtered by countless combinations of search parameters to deliver precise results to the end user.

One relevant example we have been seeing among our News API users is the analysis of press releases containing mentions of specific keywords, individuals and companies that are of interest to, for example, an investor.

An investor may hold a portfolio of several startups while keeping a close eye on numerous others. They can automatically pull stories as soon as they are published, whenever those stories mention ‘startups’, companies in their own portfolio or companies in a similar area. That way they always have the very latest information and can act on it without delay.

Search result from our live News API demo
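Pulling stories that mention a watchlist of keywords or companies boils down to matching each incoming story against those terms. The minimal sketch below does that locally with a single case-insensitive regular expression. It does not call the News API and does not reproduce its query syntax. The Story record, the function names, the watchlist and “Acme Robotics” are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Story:
    """Minimal news story record (hypothetical; not the News API's response format)."""
    title: str
    body: str


def build_matcher(terms):
    """Compile one case-insensitive regex matching any watchlist term as a whole word."""
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def matching_stories(stories, terms):
    """Yield (story, matched_terms) for every story that mentions a watchlist term."""
    matcher = build_matcher(terms)
    for story in stories:
        text = story.title + " " + story.body
        found = {m.group(1).lower() for m in matcher.finditer(text)}
        if found:
            yield story, found


# Hypothetical watchlist and stories.
watchlist = ["startups", "Acme Robotics"]
stories = [
    Story("Acme Robotics raises Series B", "The round was led by existing investors."),
    Story("Weather update", "Sunny skies all week."),
    Story("Why startups are hiring", "Demand for engineers keeps rising."),
]
for story, terms in matching_stories(stories, watchlist):
    print(story.title, sorted(terms))
```

Whole-word matching avoids false hits inside longer words. It also means “startups” will not match the singular “startup”, so a real watchlist would usually list such variants explicitly.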

Some users have even automated their trading process based on press releases, earnings announcements and news of acquisitions.

Customer reviews

Analyzing online customer reviews at scale gives savvy sales professionals and marketers an opportunity to spot a need or weakness in the product or service being reviewed. For example, we analyzed 500 hotel reviews using Aspect-Based Sentiment Analysis and were able to uncover how reviewers felt about specific aspects of the hotel, such as location, beds, food, staff and WiFi.

As you can see from the charts below, where green = positive and red = negative, the hotel particularly fell down in the areas of beds, WiFi and overall value. As a seller of beds or WiFi solutions, for example, we would find this data extremely useful.

Aspect-Based Sentiment Analysis of hotel reviews
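Charts like these come from tallying the positive and negative opinions extracted for each aspect. The minimal sketch below covers only that aggregation step. It assumes an aspect-based sentiment model has already turned each review into (aspect, polarity) pairs. It then ranks the aspects where negative opinions outnumber positive ones. The function names and the sample mentions are hypothetical and are not the results of the 500-review analysis.

```python
from collections import defaultdict


def aspect_summary(mentions):
    """Tally positive and negative mentions per aspect.

    `mentions` is an iterable of (aspect, polarity) pairs, assumed to be the
    output of an aspect-based sentiment model run over each review.
    Neutral or unknown polarities are ignored.
    """
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for aspect, polarity in mentions:
        if polarity in ("positive", "negative"):
            summary[aspect][polarity] += 1
    return dict(summary)


def weak_aspects(summary):
    """Aspects with more negative than positive mentions, worst (largest gap) first."""
    gaps = [
        (aspect, counts["negative"] - counts["positive"])
        for aspect, counts in summary.items()
        if counts["negative"] > counts["positive"]
    ]
    return [aspect for aspect, _ in sorted(gaps, key=lambda pair: pair[1], reverse=True)]


# Illustrative extracted mentions, not the results of the 500-review analysis.
mentions = [
    ("beds", "negative"), ("beds", "negative"),
    ("wifi", "negative"), ("wifi", "negative"), ("wifi", "positive"),
    ("staff", "positive"), ("location", "positive"),
]
summary = aspect_summary(mentions)
print(weak_aspects(summary))
```

Ranking by the raw gap favors frequently mentioned aspects. Ranking by the share of negative mentions would instead surface rarely discussed aspects that are consistently criticized.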


While we’ve only scratched the surface today, we hope we’ve given you a useful introduction to alternative data and to how it is increasingly being used to gain a competitive advantage across a multitude of use cases and industries.
