
The end of AlchemyAPI

Since the very beginning we’ve competed head to head with AlchemyAPI. We have always had a lot in common with them; from our initial mission of bringing NLP to the masses to launching features and products with similar goals and objectives. Even though we wouldn’t have admitted it then, we looked up to Alchemy in many ways.

The news of their acquisition by IBM meant a couple of things: (1) our biggest competitor was going to be incorporated into a much larger platform and ecosystem, and (2) the NLP-as-a-service market was going to gather even more hype. More importantly, the acquisition opened up an opportunity for us to pick up some of the market share left behind in the fallout.

What’s happened since then has been interesting. We hear them mentioned less and less in our sales calls and we’ve noticed more and more customers make the move to us over moving to Watson/Bluemix/Knowledge Studio…whatever it is you need to move to 😉

When we ask our ex-AlchemyAPI users why they’ve decided to move to us over sticking with IBM, one answer comes up again and again: the experience we provide is better than working with IBM Watson.

When they talk about experience they refer to some key points:

  • Support
  • Developer experience
  • Flexibility
  • Transparent pricing*
  • Ease of use

* I dare you to try to figure out how much your monthly spend will be on a Watson service!

The deprecation of the service

Yes all of this was inevitable and to be honest we were surprised it took this long. AlchemyAPI had built a strong brand and for a lot of the reasons mentioned above it made sense for IBM to hold on to that Alchemy branding and to try and win over their user base.

In the last couple of weeks IBM announced they’re going to kill the AlchemyAPI services by deprecating the AlchemyLanguage and AlchemyNews products. It’s being touted as a rebrand and for the most part it is. However, there are some key elements you need to be aware of in both products before you consider making the move to Watson.

So what’s actually happening?

IBM are shutting down two core AlchemyAPI services, AlchemyLanguage and AlchemyNews. As of the 7th of April you can’t create any new instances of either Alchemy service and support will cease on the 7th of April 2018 for both products. IBM’s advice is to switch to one of their other existing services, Watson NLU or Watson Language Classifier for AlchemyLanguage users and the Watson Discovery News service for AlchemyNews users.

As we already mentioned, this was all expected to happen eventually and all seems pretty straightforward, right? Not really…

What you need to know

We’ve been investigating this for the last couple of days and we’re still a little confused as to what a typical migration for users will look like. Figuring out which of the 120+ services on Bluemix you need is hard, deciding which Watson service you should migrate to is confusing, and it’s really not clear which elements of AlchemyAPI they are keeping and which they’re dumping in the bin.

Even though they’ve made the effort to phase out the Alchemy services with the customer in mind, in our opinion they haven’t made working with Bluemix and/or Watson easy from the get go, no matter which service you’re using. This means existing Alchemy users are going to be faced with a number of challenges: changes in pricing, flexibility and accessibility and the deprecation of some core features.

1. Using Watson/Bluemix; have they lost the built-for-developers feel?

Access and ease of use

At AYLIEN we obsess over the “Developer Experience”. We take steps to make it as easy as possible for our users to get up and running with the service. Our documentation is clear, complete and easy to understand and navigate. We provide SDKs for 7 different programming languages. We have a robust and fully functional free product offering and we do everything we can to make even our paid plans accessible to development teams of all shapes and sizes.

The accessibility of our tech is something we feel very strongly about at AYLIEN and that’s something we believe was important to the AlchemyAPI team too. However, it’s difficult to say the same about IBM. We can’t comment on the details of the strategy but from what we hear from speaking with our users and the accessibility of Bluemix and Watson as a whole, things are looking a little grim for developers hoping to use the Watson language services.

  • You need a Bluemix account to access the Watson services, and it’s only available on a 30-day trial
  • For most of the Watson services you’ll only find 3 to 5 SDKs available
  • The pricing structure of many of the Watson services is extremely prohibitive and downright confusing (more on that below)

Flexibility and support

What we hear time and time again from our customers is that working with us is just easier. Put simply, they don’t want to deal with a beast of an enterprise like IBM. They want to know that if they have a feature request or feedback, they’ll be listened to. They don’t want to jump through hoops and engage in a drawn-out sales process in order to use the service. They want to know that the tech they’re using is constantly evolving and that the team behind it is passionate about advancing it. Above all, they want the reassurance that if something goes wrong they can pick up the phone and talk to someone who cares.

We do everything we can to make sure our customers are getting the most of our APIs. We’ll run on-boarding calls with new users, we consult with our users on how they can integrate with their solutions and from time to time we’ll work directly with our customers to customize existing features to their specific needs. This level of service is only possible because of our ability to stay flexible and agile.

AYLIEN Text Analysis API

Sample support thread – Yes our founder still handles some support queries

2. Pricing; what’s going to happen to your monthly cost?

If you do plan on sitting down to figure out how your cost might change after your move to Watson we recommend grabbing a coffee and snack to get you through it. We have created some comparison tables below to provide a quick overview of some of the savings you can make by migrating to AYLIEN over Watson. They are broken down by service and describe what you should expect to pay as an AlchemyLanguage or AlchemyNews user moving to the Watson services vs what you would pay with AYLIEN.

AlchemyLanguage users

Note: Watson NLU and NLC services are charged on a tiered basis based on the number of Natural Language Units you use. Like our pricing it’s volume based so we were able to compare some price points based on example volumes.

Pricing example:

Hits / NLU units    AYLIEN    Watson    Saving
180,000             $199      $540      $341
1,000,000           $649      $1,500    $851
2,000,000           $649      $2,500    $1,851
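
The Saving column is simply the Watson price minus the AYLIEN price at the same volume. Here is a minimal Python sketch of that arithmetic, using the price points from the table above. The variable and function names are ours and purely illustrative.

```python
# Price points taken from the comparison table above (USD per month).
# Each entry: (hits or NLU units, AYLIEN price, Watson price).
PRICE_POINTS = [
    (180_000, 199, 540),
    (1_000_000, 649, 1_500),
    (2_000_000, 649, 2_500),
]


def monthly_saving(aylien_price, watson_price):
    """Saving from choosing AYLIEN over Watson; negative means Watson is cheaper."""
    return watson_price - aylien_price


for volume, aylien, watson in PRICE_POINTS:
    saving = monthly_saving(aylien, watson)
    print(f"{volume:>9,} units: save ${saving:,} per month")
```

Swap in your own volumes and plan prices to repeat the comparison for your usage.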

AlchemyNews users

Pricing Example:

Note: Watson Discovery News and AYLIEN News API are priced a little differently. Our pricing is based on how many stories you collect and analyze but we don’t care how many queries you make and Watson Discovery News charges per query made plus enrichments. In Watson Discovery News however there is a limit of 50 results (stories) per query which means it’s not too difficult to compare the pricing based on an example number of stories collected.

Stories      AYLIEN    Watson    Saving
100,000      $211      $200      -$11
1,000,000    $1,670    $2,000    $330
2,000,000    $2,871    $4,000    $1,129
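
Because Watson Discovery News returns at most 50 stories per query, a story volume can be converted into a minimum number of queries before comparing prices. Below is a rough sketch of that conversion. The 50-result cap is the figure quoted in the note above; per-query prices are deliberately left out because they are not stated here.

```python
import math

RESULTS_PER_QUERY = 50  # Watson Discovery News cap quoted in the note above


def min_queries_needed(stories):
    """Smallest number of queries that can return the given number of stories."""
    return math.ceil(stories / RESULTS_PER_QUERY)


for stories in (100_000, 1_000_000, 2_000_000):
    queries = min_queries_needed(stories)
    print(f"{stories:>9,} stories -> at least {queries:,} queries")
```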


– We’re running an Alchemy Amnesty offer –

We know that many of Alchemy’s hardcore fans are looking for an alternative that is just as user friendly and powerful, which is why we’re running an Alchemy Amnesty: we’re giving away 2 months free on any of our plans to any customer moving from AlchemyAPI to AYLIEN. To avail of the offer, sign up here and just drop us an email at sales@aylien.com.

Alchemy Amnesty

3. Features; what’s going and what’s staying?

The AlchemyLanguage features are being incorporated into Watson NLU and Watson NLC while the AlchemyNews product is being incorporated into a larger product named Watson Discovery News.

As part of the migration however, there are a number of changes to the feature set which we’ve set out below. We’ve only listed the features that are either changing or being canned altogether.

AlchemyLanguage

Feature Watson NLU AYLIEN Text API
Entity & Concept Extraction
Sentiment Analysis
Language Detection
Image Tagging
(separate service)
Related Phrases
(separate service)
Summarization
Hashtag Suggestion
Article Extraction
Semantic Labelling
Microformat Extraction
Date Extraction


AlchemyNews

Feature Watson Discovery News AYLIEN News API
Customized Sources
Source Rank
(Blekko)

(Alexa)
Industry Taxonomies 1 2
Languages 1 6
Summarization
Clustering
Social Stats
Similar Articles/Stories
Deduplication


Don’t just take our word for it, make your own mind up. All the details you need to start testing our service for each of our solutions are listed below.

Text API

News API

Final thoughts

Users are moving to AYLIEN for a variety of reasons that we outlined above. The primary drivers are ease of use, flexibility, support, feature set and pricing. If you’re dreading making the move to IBM and you miss the experience of dealing with a dev friendly team, we’ve got you covered ;).


In this tutorial we’re going to show you how easy it is to analyze customer opinion in reviews and social media content using the “Text Analysis by AYLIEN” Extension for RapidMiner. In particular we will walk you through building a review analysis process using our Aspect-Based Sentiment Analysis feature to mine and analyze customer reviews.

If you’re new to RapidMiner, or it’s your first time using the Text Analysis Extension, you should first read our Getting Started tutorial which takes you through the installation process and the basics behind using our extension. You can download a copy of the Extension in the RapidMiner marketplace.

N.B. If you haven’t got an AYLIEN account, you can sign up for a free account here. You’ll need this to use the Text Analysis Extension.

Aspect based Sentiment Analysis

 

What is Aspect-based Sentiment Analysis (ABSA)?

The whole idea behind our ABSA features is to provide a way for our users to extract specific aspects from a piece of text and measure the sentiment towards each aspect individually. Our customers use it to analyze reviews, Facebook comments, tweets and input from customer feedback forms to determine not just the sentiment of the overall text but what aspects, in particular, a customer likes or dislikes from that text.

We’ve trained domain specific models for the following industries:

  • Restaurants
  • Hotels
  • Airlines
  • Cars

Building the review analysis process

So, here’s what we’re going to do:

  • Analyze the sentiment of reviews collected in a CSV file
  • Understand the top aspects mentioned and their sentiment (positive, negative or neutral)
  • Run a correlation analysis on the words and aspects
  • Visualize our findings

Here’s what our completed process will look like when we’re finished. In this tutorial we’re going to walk you through each step of the process and what operators we used to analyze hotel reviews.

Completed RapidMiner Process

Step 1. Analyzing reviews

For the purpose of this tutorial we are going to use a collection of reviews we gathered on one particular hotel from publicly available sources. Our reviews were listed in a CSV file which we loaded into RapidMiner using the Read CSV operator.

Loading the reviews is the easy part. As you can see from the completed process we start with a Read CSV operator that reads the file containing all our reviews from the disk. All you need to do is to specify the path to the file. We then use the AYLIEN Analyze Aspect-Based Sentiment operator to analyze the sentiment of each review in our file.

Screen Shot 2016-09-13 at 18.51.58

 

 

 

Remember to set your input attribute to “Review” and to choose your domain option in the parameters section of the operator. In this case we’re using the hotels model.

 

Screen Shot 2016-09-12 at 12.08.10

 

Once you’ve run the analysis, a new ExampleSet will be generated under the Results tab. It contains a new column listing the aspects present in each review and the respective polarity (positive, negative or neutral) of each aspect.

You can see the identified aspects and their polarity listed in the column in yellow.

Screen Shot 2016-09-09 at 11.13.28

 

While we have the results listed in the ExampleSet, the format they are in makes it a little difficult to analyze and visualize them further, which is why we need to spend some time cleaning and prepping the results.

Step 2. Prepping the results

In order to make sense of the data we are going to create word vectors through a tokenization process using a “Process documents to data” operator as shown in the image below. This operator is used to create word vectors from string attributes through tokenization and other text processing functions.

Before running our tokenization we duplicate our data using a Multiply operator, which allows us to run two types of analysis in parallel, with different end goals in mind, using the same data. This is why we have two separate “Process documents to data” operators in our process.

 

Screen Shot 2016-09-13 at 19.00.56

 The ABSA Result Processor:

For the first process we’re going to tokenize our ABSA results (which are in the format “aspect:polarity”) using a simple whitespace split, and we’re going to assign weights to these newly created columns or features based on Binary Term Occurrences, i.e. a 1 if a specific aspect:polarity pair exists in a review, and 0 otherwise. You can see the various parameters we’ll use in the Parameters section below.
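
If it helps to see this step outside RapidMiner, here is a small Python sketch of a whitespace split followed by Binary Term Occurrence weighting. The sample ABSA strings are made up for illustration, and this is not what the RapidMiner operator runs internally.

```python
# Hypothetical ABSA output: one whitespace-separated string of
# "aspect:polarity" pairs per review.
absa_results = [
    "beds:positive location:positive",
    "cleanliness:negative beds:positive",
    "staff:neutral",
]

# Tokenize with a simple whitespace split.
tokenized = [result.split() for result in absa_results]

# The vocabulary becomes the new columns (features).
vocabulary = sorted({token for tokens in tokenized for token in tokens})

# Binary term occurrence: 1 if the pair appears in the review, 0 otherwise.
binary_vectors = [
    [1 if term in tokens else 0 for term in vocabulary]
    for tokens in tokenized
]

print(vocabulary)
for row in binary_vectors:
    print(row)
```

Each row is one review and each column is one aspect:polarity pair, which is the shape the later steps work with.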

Screen Shot 2016-09-12 at 15.30.57

 

Screen Shot 2016-09-13 at 19.15.08

 

The Review Text Processor:

For the second processor we’re going to run some further text processing functions on the review text from our duplicated set.

First we’ll tokenize the text to create unigram tokens and transform them all to lowercase. We’ll then clean the data by filtering out tokens that contain non-letter characters, such as numbers and punctuation, using a regular expression ([A-Za-z]*). Finally we’ll discard tokens shorter than 3 characters and remove all stopwords. All of these functions can be seen in the sub-process below.
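
For readers who think in code, the same cleaning pipeline can be sketched in a few lines of Python. The stopword list here is a tiny illustrative one, the tokenizer is a simple split of our own choosing, and the function name is hypothetical; RapidMiner’s operators will differ in detail.

```python
import re

# A tiny illustrative stopword list; RapidMiner ships its own, larger one.
STOPWORDS = {"the", "and", "was", "were", "but", "for", "with", "our"}

LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def preprocess(review):
    # 1. Tokenize into unigrams on whitespace and common punctuation.
    tokens = [t for t in re.split(r'[\s.,;:!?()"]+', review) if t]
    # 2. Lowercase every token.
    tokens = [t.lower() for t in tokens]
    # 3. Keep only tokens made up entirely of letters.
    tokens = [t for t in tokens if LETTERS_ONLY.fullmatch(t)]
    # 4. Discard tokens shorter than 3 characters.
    tokens = [t for t in tokens if len(t) >= 3]
    # 5. Remove stopwords.
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("The beds were dirty, but the 2 pools and room101 view was OK!"))
```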

Screen Shot 2016-09-12 at 15.28.38

 

Screen Shot 2016-09-13 at 19.14.59

 

So now that we have our sentiment analysis done and the data processed and cleaned, we’ll run some further processes which will make mining the data and visualizing it a little easier.  

Step 3. Splitting and filtering results

 

Screen Shot 2016-09-13 at 18.53.05

 

 

 

 

 

 

 

 

 

First we’ll use a Split operator to separate out aspects and polarity attributes (which, if you recall, are in the format “aspect:polarity” in our data, e.g. “beds:positive”). In the parameters section you should choose the attribute you want to split and the split pattern. In this case we’re going to split by “:” in our results as shown below.
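
In code terms the split is a one-liner per token. A minimal sketch with made-up pairs, using the word_1 and word_2 column names that RapidMiner generates:

```python
# Hypothetical tokens in the "aspect:polarity" format described above.
pairs = ["beds:positive", "cleanliness:negative", "staff:neutral"]

# Split each token on ":" into two columns, mirroring word_1 and word_2.
rows = []
for pair in pairs:
    aspect, polarity = pair.split(":", 1)
    rows.append({"word_1": aspect, "word_2": polarity})

for row in rows:
    print(row)
```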

 

Screen Shot 2016-09-13 at 19.14.43

 

The ExampleSet generated should resemble the one below showing the attribute split into word_1 (Aspect) and word_2 (Polarity) columns:

Screen Shot 2016-09-13 at 18.09.32

 

Using the duplicated results we’re going to isolate both positive and negative results using a simple Filter operator which is also shown in the image above.

Your filtered ExampleSets should resemble the one displayed below, showing the positive aspects and their count. You will have also noted that we used a Sort operator to sort our results in descending order of total occurrences, i.e. which aspect:polarity pairs appeared most frequently in the entire review set, which will help in our visualization process.
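
The filter-then-sort logic can be sketched in Python as follows. The mentions are invented sample data and the function name is ours; the point is only to show counting per polarity and sorting in descending order of occurrences.

```python
from collections import Counter

# Hypothetical (aspect, polarity) pairs, one per mention across all reviews.
mentions = [
    ("beds", "positive"), ("location", "positive"), ("beds", "positive"),
    ("cleanliness", "negative"), ("staff", "positive"), ("location", "positive"),
    ("beds", "positive"), ("cleanliness", "negative"), ("wifi", "negative"),
]


def top_aspects(mentions, polarity):
    """Count mentions with the given polarity, most frequent first."""
    counts = Counter(aspect for aspect, p in mentions if p == polarity)
    return counts.most_common()


print(top_aspects(mentions, "positive"))
print(top_aspects(mentions, "negative"))
```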

Screen Shot 2016-09-13 at 18.09.05

Step 4. Correlation analysis

The final step before we visualize our results is running a correlation analysis between words used in our reviews and the positive, negative and neutral aspects. We want to see which words are most commonly used to express a certain sentiment (positive, negative or neutral) towards a certain aspect (e.g. beds).

Luckily in RapidMiner this is very easy to do using the Correlation Matrix operator. In order to use it however we first need to join the two ExampleSets that we created separately, so we’ll have the words and the aspect:polarity pairs in one dataset. To be able to do that, we need to assign numerical IDs to our results, which can be done with the Generate ID operators. Afterwards we’ll simply use the Join operator to merge these ExampleSets and feed the result to the Correlation Matrix operator.

 

Screen Shot 2016-09-13 at 18.52.42

 

 

 

 

 

 

 

Your Correlation Matrix should resemble the one below:

Screen Shot 2016-09-13 at 18.08.47

 

The higher the correlation coefficient (the values in the matrix), the stronger the correlation, with 1 being the highest and -1 the lowest, i.e. an inverse correlation.
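
To make the join and the coefficient concrete, here is a small Python sketch. It assumes the matrix holds Pearson correlation coefficients and uses invented binary features for four reviews; the IDs play the role of the ones produced by Generate ID.

```python
import math

# Hypothetical binary features keyed by review ID (the ID is what the join uses).
word_features = {          # 1 if the word "dirty" appears in the review
    1: {"dirty": 1}, 2: {"dirty": 0}, 3: {"dirty": 1}, 4: {"dirty": 0},
}
aspect_features = {        # 1 if "cleanliness:negative" was detected
    1: {"cleanliness:negative": 1}, 2: {"cleanliness:negative": 0},
    3: {"cleanliness:negative": 1}, 4: {"cleanliness:negative": 1},
}

# Join the two sets on their shared IDs.
joined = {
    review_id: {**word_features[review_id], **aspect_features[review_id]}
    for review_id in word_features.keys() & aspect_features.keys()
}


def pearson(xs, ys):
    """Pearson correlation, between -1 and 1 (undefined if a column is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


ids = sorted(joined)
dirty = [joined[i]["dirty"] for i in ids]
negative_cleanliness = [joined[i]["cleanliness:negative"] for i in ids]
print(round(pearson(dirty, negative_cleanliness), 3))
```

A positive value here says that reviews containing “dirty” tend to be the same reviews flagged with a negative cleanliness aspect.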

Using the matrix table you can filter and identify words extracted from reviews that correlate with a certain aspect:polarity attribute and vice-versa, as shown in the example below where words like dirty, blankets, complaint, dingy and cigarette correlate with the negative references to cleanliness.

Screen Shot 2016-09-13 at 16.24.40

Step 5. Basic Visualization

Doing basic visualizations in RapidMiner is easy using either the Charts function or the Advanced charts capabilities in your Results tabs. Below we’ve used some simple bar charts to visualize our findings.

 

Negative aspects mentioned

Screen Shot 2016-09-13 at 18.08.37

Positive aspects mentioned

Screen Shot 2016-09-13 at 18.07.58

Polarity of aspects mentioned

Screen Shot 2016-09-13 at 18.07.05

 

You can download the entire RapidMiner Process and try it for yourself – Download the process

If you’d like to read more about how you can collect reviews for analysis using RapidMiner, check out our tutorial on Scraping Rotten Tomatoes reviews with RapidMiner.

We’ve also found some useful customer review datasets which you can use if you’d like to build this process yourself using sample reviews.

 





Text Analysis API - Sign up





Data Science

Introduction

The dust has truly settled on what was one of the biggest sporting occasions of the year, the 2016 European Championships. The worldwide interest in the Euro 2016 soccer tournament was particularly evident across social media platforms with Twitter, Facebook and even Instagram seeing record numbers in tournament-related interactions over the 4 week period.

As you may have seen before, here at AYLIEN we like to monitor and gather social media and news content around particular events in search of interesting insights using our Text Mining capabilities through our APIs.

Previous posts: Super Bowl 50 according to Twitter and Text Analytics meets 2014 World Cup.

So what did we do this time?

We collected a total of 27 million tweets over the course of the tournament with the purpose of mining these tweets to look for interesting correlations and insights. Using the Twitter Search API, we built searches around official hashtags and handles for both the tournament itself and the teams involved. Following some simple preprocessing of the data, such as removing retweets and tweets containing links to narrow our focus and eliminate some noise, we moved our data to a big MySQL database which made it a lot easier to work with.
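
As an illustration of that preprocessing, here is a rough Python sketch that drops retweets and tweets containing links. It assumes retweets can be recognised by the conventional “RT @” prefix and links by an http(s) URL; our actual pipeline may have used different signals, such as fields in the Twitter API response.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def is_retweet(text):
    """Treat tweets starting with the conventional 'RT @' prefix as retweets."""
    return text.startswith("RT @")


def contains_link(text):
    return URL_PATTERN.search(text) is not None


def clean_tweets(tweets):
    """Drop retweets and tweets containing links."""
    return [t for t in tweets if not is_retweet(t) and not contains_link(t)]


sample = [
    "What a goal by Griezmann! #EURO2016",
    "RT @someone: What a goal by Griezmann! #EURO2016",
    "Match report here https://example.com/report #EURO2016",
]
print(clean_tweets(sample))
```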

The first piece of analysis we did was to run all the tweets through our Language Detection endpoint to split them up by language. You could also use the language predictions provided by Twitter to save some time. The second piece of analysis we did was to analyze the sentiment of all of the English tweets, which amounted to about 17 million in total. The final task involved extracting mentions of Entities in these tweets, paying particular attention to mentions of the countries playing at the tournament.

We decided to dive deeper into 4 areas of interest:

  • Volume of tweets and language;
  • Teams of particular interest (Portugal, France, Iceland and England);
  • The Final game (Portugal v France);
  • And of course, Cristiano Ronaldo (yes, he gets his own section!)

 

Tools used:

  • Twitter Search API
  • AYLIEN Text Analysis API
  • AYLIEN News API
  • Tableau

 

Volume of tweets

As was to be expected, the majority of social chatter around the tournament was focused in Europe. Other areas of note included the US and Australia, but perhaps most surprising was the high concentration of tweets from soccer fans in Indonesia. This was also reflected in the tweets-by-language analysis we ran, which we’ll discuss later.

Not surprisingly, the most mentioned team was the host nation and tournament runners-up, France. In second place were the champions, Portugal, and in third place were England, who were up there for all the wrong reasons, which we’ll dive into a little bit later in the post.

The vast majority of tweets, regardless of their geographic origin, were in English, but the full language breakdown is worth a closer look.

 

Tweets by language

Tweets in English accounted for over 62% of all tweets collected. 15% of tweets were written in French and about 11% were made in Spanish. Other languages to feature included Portuguese, German and Italian but the biggest surprise was the volume of tweets written in Indonesian which was also highlighted in our Geographic analysis.

Looking at the volume of tweets by language highlights some interesting insights around public interest and following throughout the tournament, revealing a clear connection between fan following and interest in the tournament as a whole.

For tweets written in French and Portuguese you can get a clear understanding of how far a team progressed in the tournament by looking at the volume of tweets written in their native language throughout the tournament. The spikes in the visualizations represent each game and the trend line shows the evident rise or fall in following.

The diminishing voice of a team’s fans is just as evident: the volume of tweets in their language falls away over the course of the tournament in the lead-up to the team’s departure.

 

Team Focus

As we mentioned, we decided to pick 4 teams to focus our analysis on – Portugal, France, Iceland and England. We chose teams that were either linked to major events or talking points in the tournament or had performed particularly well.

Each graph in the story below shows the volume of tweets mentioning that team, the tweet polarity (whether it’s positive or negative) and also the rolling average polarity throughout the tournament.
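
For anyone wanting to reproduce the rolling average line, a minimal Python sketch follows. The daily polarity scores and the window size of 3 are invented for illustration and are not the values behind our graphs.

```python
from collections import deque


def rolling_average(values, window):
    """Mean of the most recent `window` values at each position."""
    recent = deque(maxlen=window)
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical daily polarity scores for one team: +1 positive, -1 negative.
daily_polarity = [0.4, 0.2, -0.6, -0.8, 0.1, 0.5]
print(rolling_average(daily_polarity, window=3))
```

A larger window gives a smoother line that reacts more slowly to single events such as a red card or a late goal.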

We’ve chosen interesting talking points for each team and highlighted when they occurred and their effect on fan reaction in each graph.

Tip: Click the linked talking points for news stories gathered with our News API.

England:

England were a terrible disappointment at the Euro 2016 tournament. The star-studded team of the Premier League’s top players failed to impress and were knocked out of the tournament by a much weaker team (on paper) from Iceland. Their tournament was also heavily overshadowed by the behavior of their fans, and the departure of their manager Hodgson only highlighted the scale of the issues the English FA had to deal with.

Talking points:

 

Iceland:

A country with a population of 323,000 and about 100 professional footballers provided us with the feel-good story of the tournament. Iceland, who were really only expected to show up, did a whole lot more by coming second in their group, drawing with the eventual tournament winners and toppling one of the tournament favourites, England, in the round of 16.

Talking points:

 

France:

The tournament favorites France easily progressed to the knockout stages where they dealt with a far less experienced Icelandic side and impressively put 2 past the current world champions, Germany. The team which had the tournament’s top scorer was truly on form and looked like they had the tournament in the bag.

 

Portugal:

Although they eventually reigned supreme, Portugal had an anything but impressive tournament. Having failed to win 6 of their 7 games in the regulation 90 minutes, they relied on snatching wins during extra time and on holding their nerve in the lottery of penalty shootouts. They even had a couple of close calls with two far weaker teams in Iceland and Austria. The form of their main man, Cristiano Ronaldo, was at the heart of both their successes and failings throughout the tournament as the Portuguese talisman, carrying an injury throughout, could only show us glimpses of his best.

Talking points:

 

The Final

The final of Euro 2016 attracted as many as 300 million viewers across the world. What was expected to be a high-tempo showdown between a goal-hungry, in-form French team and a well-drilled Portuguese team who hadn’t lost a game in the tournament turned out to be a whole lot less.

Portugal’s Ronaldo and France’s Griezmann were facing off for the title of Euro 2016 top goalscorer but it was for other reasons that Ronaldo took the limelight and some would argue the sting out of the game as a whole.

Talking points:

  • Ronaldo is fouled by Payet in the 12th minute
  • Ronaldo is forced to leave the field injured after 26 minutes
  • France miss a number of close chances
  • The game enters extra time and is looking like it will go to penalties
  • Eder scores and France look to be defeated

Ronaldo

Cristiano dominated social chatter and news throughout the tournament. Usually it’s his goal tally alone which puts him in the spotlight but during Euro 2016 he was the talk of the tournament for a variety of other reasons, and we’re not even referring to the moth incident!

 

Talking points:


  • Ronaldo shows his true colors and passion for the team on the sidelines
  • Like a true goal scorer, Ronaldo never misses an opportunity…to take his top off


Conclusion

Simple case studies like this highlight the wealth of information hidden in social chatter. Brands and organizations who care about the voice of their customer have no choice in today’s world but to try and leverage social media conversations in order to stay on top of what it is their customers like or dislike about them and their competitors. If you’d like to hear more about using AYLIEN for social listening drop us a line at hello@aylien.com, we’d love to hear from you.

 





Product

Introduction

Our News API is much more than just a news aggregator. It collects news based on a variety of different search criteria: keywords, entities, categories, sentiment and so on. However it’s the ability to index and analyze the content sourced that makes the News API extremely powerful.

Whether you’re pushing data from the News API into an app, resurfacing the analysis in a news feed or building intuitive dashboards with the data extracted, the News API allows you to get a deep understanding for what is happening in the news on a near real time basis.

Our News API has a variety of analysis-focused endpoints that allow our users to make sense of the data extracted from news content in a meaningful way. In this blog we’ll talk you through some of the features/endpoints that can help you dive into the stories you collect with the News API and the data extracted. We’ll introduce the endpoints that our News API users leverage to:

  • Track events over time
  • Spot Trends in news content
  • Compare sources’ and authors’ opinions
  • Monitor coverage of the same or similar stories across the web

Leveraging time stamped data: /time_series

Simply put, a Time Series is a sequence of data points plotted over a period of time. Our Time Series endpoint is used when you want to analyze a set of data points relative to a certain timeframe. This makes it easier to analyze timestamped data and means the data can be easily visualized in a meaningful way.

We’ve included a simple example below where we’ve used the Time Series endpoint to understand the volume of stories over a given time period. You’ll also notice in the example that we’ve used the Time Series endpoint to track how the polarity or sentiment of stories changes over time, not just the volume.
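
Conceptually, a time series like this is just stories bucketed by period. The sketch below shows that idea locally in Python with invented timestamps and polarities; it does not call the /time_series endpoint and is not a description of how the endpoint is implemented.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical stories: (publication timestamp, sentiment polarity).
stories = [
    ("2016-07-10T09:15:00", "positive"),
    ("2016-07-10T18:40:00", "negative"),
    ("2016-07-11T07:05:00", "positive"),
    ("2016-07-11T12:30:00", "positive"),
    ("2016-07-12T21:00:00", "neutral"),
]

# Bucket stories by day, counting each polarity separately.
series = defaultdict(Counter)
for timestamp, polarity in stories:
    day = datetime.fromisoformat(timestamp).date().isoformat()
    series[day][polarity] += 1

for day in sorted(series):
    total = sum(series[day].values())
    print(day, "total:", total, dict(series[day]))
```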

 

 

Our customers are using this to monitor how something like a topic, entities or even a particular article might be talked about on social media over a certain time period. It helps them spot stories or topics that might be quickly gathering pace online, hinting at their popularity, importance and virality. We even have users utilising the Time Series endpoint combined with others to identify when is the best time to post articles of a particular nature, for example, posting Personal Finance categorised articles at the end of a fiscal quarter versus mid quarter.

Visit our documentation for more info on the Time Series endpoint.

Working with numerical data points and metrics: /histograms

A histogram is a graph which represents the distribution of numerical data. Our histogram endpoint allows you to get an aggregated profile of a certain metric. It’s up to you which metric you use.

 

 

Above we’ve shared two sample graphs we built with data extracted using the Histogram endpoint. We’ve graphed the number of social shares per story and the length of articles in words sourced through the API.

Our News API customers use this feature to uncover insights like which content categories are most popular across social media, or how many words an author tends to use when writing about a certain topic, which is useful if you’re sending them a news tip, for instance.
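
To show what an aggregated profile of a metric looks like, here is a small local Python sketch that bins article lengths into fixed-width intervals. The word counts, the interval width of 250 and the function name are all made up for illustration; the /histograms endpoint does this aggregation for you on the server side.

```python
from collections import Counter


def histogram(values, interval_width):
    """Count values per fixed-width bin, keyed by each bin's lower bound."""
    bins = Counter((value // interval_width) * interval_width for value in values)
    return dict(sorted(bins.items()))


# Hypothetical article lengths in words.
word_counts = [120, 340, 410, 95, 780, 450, 300, 1020]
print(histogram(word_counts, interval_width=250))
```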

Visit our documentation for more info on the Histograms endpoint.

Uncovering useful insights: /trends

The Trends endpoint is designed to make it easier to identify the most frequently appearing entities, keywords and topical or sentiment-related categories, meaning you can analyze how often something occurs in the content you source through the API.

You can use the Trends endpoint to monitor the frequency of Categories, Entities and Keywords in the results sourced from the API. For example, you would use the Trends endpoint if you wanted to identify the most mentioned entities from a collection of news articles or the frequency of content category in a collection of articles.
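
The underlying idea is a frequency count over a field. A minimal local sketch in Python, with invented entity lists standing in for stories returned by the API:

```python
from collections import Counter

# Hypothetical entities extracted from a small collection of stories.
stories_entities = [
    ["Portugal", "France", "Cristiano Ronaldo"],
    ["France", "Germany"],
    ["Portugal", "Cristiano Ronaldo", "UEFA"],
    ["France", "Portugal"],
]

# Count how many times each entity is mentioned across the collection.
entity_counts = Counter(
    entity for entities in stories_entities for entity in entities
)

for entity, count in entity_counts.most_common(3):
    print(entity, count)
```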

 

 

This endpoint enables our News API users to get a better understanding of topics and mentions of people, organizations, and places in news stories, as shown in the example above. It also means you can conduct distribution or quantitative focused analyses on say, the breakdown of one topic or category to another, or the overall sentiment of articles, as shown in the pie charts below.

Visit our documentation for more info on the Trends endpoint.

Story Coverages and finding related articles: /coverages & /related_stories

The Coverages and Related Stories endpoints provide a 360-degree view of the news reactions to a story. They are designed to provide an easy way for our News API users to source news articles covering the same story. Our users utilize both endpoints in different ways and for a variety of different reasons.

Coverages: Coverages allows you to understand the reach an article has from a news coverage point of view. Our PR focused users utilize this endpoint to get an understanding for how well a press release is performing, based on the number of Coverages it’s had.

Visit our documentation for more info on the Coverages endpoint.

Related Stories: Related Stories looks for semantically similar articles, which are those articles that might be covering the same story or dealing with the same topic. It provides an understanding of how a news story is breaking and the overall reach of a story.

When combined with other parameters like location, source or author, it can be used to compare how coverage of the same story might differ between geographical regions or by an author’s angle.

Note: Both the Related Stories and Coverages endpoints support GET and POST methods. This means you can provide a URL or raw text as your input to the News API.

Our API users use this feature to identify related articles to those sourced in their search or from existing text or news articles by passing a URL or the raw text of an article.

As you can see below we’ve used the Related Stories endpoint to source related news content based on the text of a tweet. The API provides a number of semantically similar stories in its results.

Input:

 

Output:

 

Visit our documentation for more info on the Related Stories endpoint.

We hope this gives you a better understanding of how our News API can provide you with an intelligent way of sourcing and analyzing news content at scale.

 





Introduction

As you may be aware, we recently boosted our Text Analysis API offering with a cool new feature, Aspect-Based Sentiment Analysis. The whole idea behind Aspect-Based Sentiment Analysis (ABSA) is to provide a way for our users to extract specific aspects from a piece of text and determine the sentiment towards each aspect individually. Our customers use it to analyze reviews, Facebook comments, tweets and customer feedback forms to determine not just the sentiment of the overall text but what, in particular, the author likes or dislikes from that text.

We’ve built models for 4 different domains (industries). You can see the domains and the domain specific aspects listed in the image below.

 

 

Not familiar with Sentiment Analysis? We explain it quickly and simply here to help get you up to speed. You can also try it for yourself with our easy-to-use online Demo.

ABSA for the AYLIEN Google Sheets Add-on

We’ve just added the ABSA features to our Google Sheets Add-on. The add-on enables users to perform Text Analysis functions from within Google Sheets, without any coding/programming knowledge. We designed the add-on to be as simple and user friendly as possible. If you are in any way familiar with Google Sheets (or Microsoft Excel), you’ll be up and running in no time.

It’s so simple, in fact, that we can analyze the sentiment of text (hotel reviews in this instance) in under 15 seconds.

Analyzing hotel reviews from Yelp

To showcase our latest feature addition, we analyzed 500 hotel reviews, from Yelp, within Google Sheets. We’ve embedded the sheet we used below to show you exactly how the add-on functions. 

Findings

After running Aspect-Based Sentiment Analysis on the 500 hotel reviews, we had an abundance of data and information to draw insights from. As you can see from the “Analyzed Reviews” sheet in the embedded spreadsheet below, the results of our analysis are laid out neatly in columns containing the review text, the review rating (out of 5) and the positive, negative and neutral aspects found within each review.

Looking beyond the overall sentiment of a review tells us what exactly the customer liked or disliked about their experience. It allows a much greater understanding of the voice of the customer.

Aspects Mentioned

The first thing we wanted to figure out was what aspects the reviewers were speaking about across all reviews and how often each aspect was mentioned.

 

 

It’s interesting how little WiFi is mentioned in reviews when it can, in our opinion, be one of the more frustrating aspects of a hotel’s service. It’s also pretty clear from the graph that people are most opinionated about a hotel’s location and the amenities of their room (TV, Iron, mini bar etc…).

Word Cloud

We used another simple Sheets Add-on, Word Cloud Generator, to produce word clouds from the most cited positive and negative aspects. Straight away we can see that the two biggest complaints among customers are the room itself and hotel amenities.

 

Sentiment per Aspect

Having understood what it is people talk about in reviews we also wanted to dive into the opinion towards each aspect across all the reviews.

To do this we created aspect-specific pie charts to give us more targeted visualizations (you can see interactive versions in the “Graphs” sheet above).

Note: It’s important to keep in mind the percentages shown are based on the reviews that mention that particular aspect. So taking WiFi, for example, 45.9% of reviews that mentioned this aspect were negative.
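
As a rough illustration of how those per-aspect percentages are worked out, here is a small sketch. The rows are made-up toy data (not the Yelp reviews we analyzed), and the function name is our own; the point is simply that each percentage is taken over the reviews mentioning that aspect, not over all reviews.

```python
from collections import defaultdict

# Made-up rows for illustration: one (aspect, polarity) pair per review mention.
mentions = [
    ("wifi", "negative"), ("wifi", "positive"),
    ("wifi", "negative"), ("wifi", "neutral"),
    ("location", "positive"), ("location", "positive"),
]


def sentiment_share(mentions):
    """Return, per aspect, the percentage of its mentions with each polarity."""
    counts = defaultdict(lambda: defaultdict(int))
    for aspect, polarity in mentions:
        counts[aspect][polarity] += 1

    shares = {}
    for aspect, by_polarity in counts.items():
        # The denominator is only the reviews that mention this aspect
        total = sum(by_polarity.values())
        shares[aspect] = {
            polarity: round(100 * n / total, 1)
            for polarity, n in by_polarity.items()
        }
    return shares


shares = sentiment_share(mentions)
print(shares["wifi"])
```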

 

 

Analyzing reviews on a per aspect basis enables hotel management, for example, to focus on and track the performance of specific aspects, departments or services, such as food & beverage or how the staff are perceived by guests. It can also help to uncover correlations between certain aspects – does an increase in positive sentiment toward Staff lead to an increase in positive sentiment towards Value?

Sentiment of Aspects

The last thing we wanted to do was to try and get a high-level overview of the entire collection of reviews. The graph we created below gives a really nice overview and comparison of customer sentiment towards each of the various aspects across all reviews. It gives a snapshot view of hotel performance as a whole according to all of the reviews.

Conclusion

We’re really excited about this latest addition to our Text Analysis API and Google Sheets Add-on. The analysis we performed above really was so simple that we believe anyone can do the same with their own text/data.

Wanna try it for yourself? You can access the AYLIEN Google Sheets Add-on for free and we’ll even give you 1,000 credits free to run your own analysis.

If you want to explore ABSA a bit deeper (and 12 other Text Analysis features) you can sign up for our Text Analysis API which will enable you to make up to 1,000 API calls per day. No credit card required. Click the image below to get started!

 





It’s hard to believe that it has been almost a month since we launched our News API. We have been thrilled with the feedback we have been receiving from our users and wanted to start sharing some useful hints and tips for using the API. Here’s four to begin with :-).

1. Advanced search using boolean operators

To find exactly what you’re looking for while searching for articles, the News API supports standard boolean search operators such as AND, OR, NOT and brackets in the full-text fields such as Title, Body or Text:

  • AND: the AND operator can be used to combine multiple search terms that must appear together. Example: `fashion AND shoes` shows results that include both “fashion” and “shoes”.
  • OR: the OR operator can be used to combine multiple search terms that may or may not appear together. Example: `fashion OR shoes` shows results that include “fashion”, “shoes” or both.
  • NOT: the NOT or “-” operator can be used to exclude a word or phrase. Example: `fashion NOT shoes` or `fashion -shoes` shows results that include “fashion” but not “shoes”.
  • “ ”: double quotes can be used to enforce exact phrase matches. Example: “The Batman Begins” will only match results that include the entire phrase.
  • ( ): brackets can be used to group phrases together. Example: `fashion AND (shoes OR dresses)` matches results that either contain “fashion” and “shoes” OR “fashion” and “dresses”.

Let’s find articles that include “Trump” or “Cruz” in their title, and don’t include “Clinton” or “Sanders” in their body:

 

API Call:

curl -X GET --header "Accept: application/json" --header "X-AYLIEN-NewsAPI-Application-ID: YOUR_APP_ID" --header "X-AYLIEN-NewsAPI-Application-Key: YOUR_APP_KEY" "https://api.newsapi.aylien.com/api/v1/stories?title=Trump%20OR%20Cruz&body=-Clinton%20-Sanders&language%5B%5D=en&sort_by=published_at&sort_direction=desc&cursor=*&per_page=10"
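
If you would rather build that query string in code than hand-encode it, here is a minimal Python sketch using only the standard library. It only constructs the URL and does not send the request; the function name is our own, and the parameter names are copied from the curl call above. You would still need to send your application ID and key headers with the request.

```python
from urllib.parse import urlencode, quote

BASE_URL = "https://api.newsapi.aylien.com/api/v1/stories"


def build_stories_url(title_query, body_query, per_page=10):
    """Build a /stories URL with boolean queries on the title and body fields."""
    params = {
        "title": title_query,
        "body": body_query,
        "language[]": "en",
        "sort_by": "published_at",
        "sort_direction": "desc",
        "cursor": "*",
        "per_page": per_page,
    }
    # quote_via=quote encodes spaces as %20 (as in the curl call);
    # safe="*" keeps the cursor value as a literal asterisk.
    return BASE_URL + "?" + urlencode(params, safe="*", quote_via=quote)


url = build_stories_url("Trump OR Cruz", "-Clinton -Sanders")
print(url)
```

Keeping the boolean expression as a plain string and letting `urlencode` handle the escaping makes it harder to break a query with a stray space or quote.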

 

Results

 

2. Search article or headline

The News API allows you to search for articles based on the headline, body content or both. Here’s an example:

To search for articles that mention “Messi” in their title and “Barcelona” in their body we can run the following query:

curl -X GET --header "Accept: application/json" --header "X-AYLIEN-NewsAPI-Application-ID: YOUR_APP_ID" --header "X-AYLIEN-NewsAPI-Application-Key: YOUR_APP_KEY" "https://api.newsapi.aylien.com/api/v1/stories?title=Messi&body=Barcelona&language%5B%5D=en&sort_by=published_at&sort_direction=desc&cursor=*&per_page=10"

 

Results

 

Other things to try:

You can use the above filters together with boolean search (our previous tip) to run more precise queries. For example, search for articles that mention “Donald Trump” in their title, and don’t include any mentions to “Clinton” in their body.

 

3. Build your own recommendation system and generate related articles based on your own content

Chances are that you have seen the popular “From around the web” feature popping up across a multitude of blogs and news sites on the Internet. Wish you could build your own fully customized version of that? Well, look no further.

 

 

Using the Related Stories endpoint of the News API, not only can you generate recommendations for articles retrieved through the News API, but also for any other piece of text. All you need is a title and a description. Let’s look at an example:

Let’s assume we want to show recommended articles on the Wikipedia article for Hillary Clinton. We can extract the title and a bit of description from the page using a short bit of code, or using the AYLIEN Article Extraction API. From the page we can extract the following:

 

 

Title: Hillary Clinton

Description: Hillary Diane Rodham Clinton /ˈhɪləri daɪˈæn ˈrɒdəm ˈklɪntən/ (born October 26, 1947) is an American politician. She is a candidate for the Democratic nomination for President of the United States in the 2016 election. She was the 67th United States Secretary of State from 2009 to 2013. From 2001 to 2009, Clinton served as a United States Senator from New York. She is the wife of the 42nd President of the United States Bill Clinton, and was First Lady of the United States during his tenure from 1993 to 2001.

Now let’s feed this to the Related Stories endpoint and see what we get back:

API Call:

curl -X POST --header "Content-Type: application/x-www-form-urlencoded" --header "Accept: application/json" --header "X-AYLIEN-NewsAPI-Application-ID: YOUR_APP_ID" --header "X-AYLIEN-NewsAPI-Application-Key: YOUR_APP_KEY" -d "story_title=Hillary%20Clinton&story_body=Hillary%20Diane%20Rodham%20Clinton%20%2F%CB%88h%C9%AAl%C9%99ri%20da%C9%AA%CB%88%C3%A6n%20%CB%88r%C9%92d%C9%99m%20%CB%88kl%C9%AAnt%C9%99n%2F%20(born%20October%2026%2C%201947)%20is%20an%20American%20politician.%20She%20is%20a%20candidate%20for%20the%20Democratic%20nomination%20for%20President%20of%20the%20United%20States%20in%20the%202016%20election.%20She%20was%20the%2067th%20United%20States%20Secretary%20of%20State%20from%202009%20to%202013.%20From%202001%20to%202009%2C%20Clinton%20served%20as%20a%20United%20States%20Senator%20from%20New%20York.%20She%20is%20the%20wife%20of%20the%2042nd%20President%20of%20the%20United%20States%20Bill%20Clinton%2C%20and%20was%20First%20Lady%20of%20the%20United%20States%20during%20his%20tenure%20from%201993%20to%202001.&boost_by=recency&per_page=5" "https://api.newsapi.aylien.com/api/v1/related_stories"
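
Hand-encoding a long description like that is error-prone, so here is a minimal Python sketch that prepares the same kind of POST request with the standard library. It only builds the request object and does not send it; the function name is our own, the field and header names are taken from the curl call above, and the description is shortened for readability.

```python
from urllib.parse import urlencode, quote
from urllib.request import Request

RELATED_URL = "https://api.newsapi.aylien.com/api/v1/related_stories"


def build_related_request(title, body, app_id, app_key,
                          boost_by="recency", per_page=5):
    """Prepare (but do not send) a form-encoded POST to /related_stories."""
    form = urlencode(
        {
            "story_title": title,
            "story_body": body,
            "boost_by": boost_by,
            "per_page": per_page,
        },
        quote_via=quote,  # spaces become %20, as in the curl call
    )
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
        "X-AYLIEN-NewsAPI-Application-ID": app_id,
        "X-AYLIEN-NewsAPI-Application-Key": app_key,
    }
    # Supplying data makes urllib treat this as a POST request
    return Request(RELATED_URL, data=form.encode("ascii"), headers=headers)


req = build_related_request(
    "Hillary Clinton",
    "Hillary Diane Rodham Clinton is an American politician.",
    "YOUR_APP_ID",
    "YOUR_APP_KEY",
)
print(req.get_method(), req.data[:40])
```

Passing `req` to `urllib.request.urlopen` would send it, assuming valid credentials.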

And we get the following articles back:

 

 

Other things to try:

You can use all the regular story filters such as Sentiment, Source or Category filters to further narrow down the scope of the returned stories, e.g. to get related articles with a positive tone, or related stories that are from a specific source, and so on.

In the above example we boosted the related stories by recency. You could also boost them by `popularity` to give a higher weight to how popular each story is on social media when sorting the related stories.

 

4. Trending articles on social media

The News API constantly monitors social media to measure the popularity of each and every article that it retrieves and indexes. This information can be used in two forms:

1. To sort articles by social popularity when searching for articles (using any of the /stories, /related_stories or /coverages endpoints), which can be done by setting the `sort_by` parameter to `social_shares_count`.

2. To profile each article’s popularity over time by looking at its `social_shares_count` property.

Let’s get the most popular article that mentions David Bowie from the last 60 days by setting `sort_by` to `social_shares_count` and `per_page` to 1:

 

API Call:

curl -X GET --header "Accept: application/json" --header "X-AYLIEN-NewsAPI-Application-ID: YOUR_APP_ID" --header "X-AYLIEN-NewsAPI-Application-Key: YOUR_APP_KEY" "https://api.newsapi.aylien.com/api/v1/stories?title=%22David%20Bowie%22&language%5B%5D=en&published_at.start=NOW-60DAYS&published_at.end=NOW&categories.confident=true&cluster=false&cluster.algorithm=lingo&sort_by=social_shares_count&sort_direction=desc&cursor=*&per_page=1"

 

We get the following article that has about 3,000 shares on Facebook:

 

 

We continuously and repeatedly profile articles for their social performance, so if you look at the JSON, under the hood you will see something like the following:

 

 

Which shows us how the popularity of this article has changed over time.

 

Conclusion

So there you go. 4 things that you may not have known you could do with the AYLIEN News API. Check back regularly as we will be sharing more hints, tips and cool use-cases.

Want to try the News API for yourself? Click the image below for a 14-day free trial.






Data Science

Dubbed the biggest leak of its kind ever, bigger than WikiLeaks and Edward Snowden’s leak in 2013, the Panama Leaks has shed light on how the world’s rich and famous are moving and hiding money across the globe.

 

 

The information, released by the International Consortium of Investigative Journalists following a tip off from an anonymous source connected to German newspaper Süddeutsche Zeitung, is a cache of over 11 million documents which show how money is laundered through offshore accounts and entities.


The documents, which are not entirely public yet, show how the world’s super rich exploit international tax regimes in order to hide money and assets.

At the center of the controversy is a Panamanian law firm, Mossack Fonseca, who helped clients route and move money all over the world. Those incriminated in the leak range from world leaders such as Vladimir Putin to soccer stars such as Lionel Messi.

If you want to follow the action live, check out this reddit live page.

So why are we so interested in the leak at AYLIEN?

Well for one, we’re data geeks and the thought of mining a 2.6TB leak greatly appeals to us 😉 and apart from the fact that we care, we also regularly use world events like this to showcase our technology and solutions.

When the news broke, we thought, wouldn’t it be cool to mine the reports. We wanted to look for interesting data points like people mentioned, organizations, locations, topics discussed and so on. In total there are thought to be over 210,000 entities named, with documents dating as far back as the 70’s, in a collection of emails, contracts, transcripts, photos and even passports. That’s a lot of interesting data to mine!

The actual documents haven’t been released yet, but there has been a massive amount of chatter on the subject across news outlets, blogs and social media. Using our News API we decided to concentrate on what news outlets were saying by mining news content from across the world with the goal of extracting the same insights we mentioned above.

So, what did we do?

We started by building a very simple search using our News API to scan thousands of monitored news sources for articles related to the leak. In total we collected over 4,000 articles which were then indexed automatically using our text analysis capabilities in the News API.

This meant that key data points in those articles were identified and indexed to be used for further analysis:

  • Keywords
  • Entities
  • Concepts
  • Topics

Search used:

“Panama Leaks” OR “panama papers” OR “Mossack Fonseca” (Try it in our demo)

API call:

https://api.newsapi.aylien.com/api/v1/time_series?period=%2B1HOUR&text=%22panama+leaks%22+OR+%22panama+papers%22+OR+%22Mossack+Fonseca%22&published_at.start=NOW-3DAYS&published_at.end=NOW
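
One detail worth calling out in the time series call is the `period=%2B1HOUR` value: the `+` has to be percent-encoded, because a raw `+` in a query string is decoded as a space. The sketch below builds the URL with Python's standard library and shows the difference. It does not send a request, and the function name is our own; the parameter names come from the call above.

```python
from urllib.parse import urlencode, quote, parse_qs

TIME_SERIES_URL = "https://api.newsapi.aylien.com/api/v1/time_series"


def build_time_series_url(text_query, period="+1HOUR",
                          start="NOW-3DAYS", end="NOW"):
    """Build a /time_series URL bucketed by the given period."""
    params = {
        "period": period,
        "text": text_query,
        "published_at.start": start,
        "published_at.end": end,
    }
    # quote() turns "+" into %2B and spaces into %20
    return TIME_SERIES_URL + "?" + urlencode(params, quote_via=quote)


query = '"panama leaks" OR "panama papers" OR "Mossack Fonseca"'
url = build_time_series_url(query)
print(url)

# What a server would see if the "+" were sent unencoded: a leading space
decoded_raw = parse_qs("period=+1HOUR")["period"][0]
print(repr(decoded_raw))
```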

Note: The visualizations we created below were generated at the time of the analysis. Given the rate of new content surfacing we plan on updating these regularly.  

With the stories gathered we decided to dive into them and attempt to extract any interesting data points that the API could surface.

 

Analysis

The first thing we looked at was how the story developed over the past few days following the original story breaking. You can see the news chatter around the topic developing on the evening of April 3rd when The Guardian ran their original story, “Revealed: the $2bn offshore trail that leads to Vladimir Putin”.

We used the Time Series endpoint in the News API to graph the volume of stories over time.

The graph shows how the volume of stories increases as the story spreads and other timezones come online. We’ve noted some of the more prominent stories by choosing the ones with the highest volume of social shares, which can be easily extracted with our API.

 

Volume over time:

What, Who and Where?

The second thing we wanted to look at was what was being discussed: which individuals, organizations and countries in particular were mentioned in the articles and how often they were mentioned.

We used the API calls below to extract any mentions of Entities and Concepts in the articles indexed. The main data points we focused on were keywords, people, organizations and countries.

API Call Entities:

https://api.newsapi.aylien.com/api/v1/trends?text=%22panama%20leaks%22%20OR%20%22panama%20papers%22%20OR%20%22Mossack%20Fonseca%22&language%5B%5D=en&published_at.start=NOW-3DAYS&published_at.end=NOW&field=entities.body.links.dbpedia

API Call Keywords:

https://api.newsapi.aylien.com/api/v1/trends?text=%22panama%20papers%22%20OR%20%22panama%20leaks%22%20OR%20%22Mossack%20Fonseca%22&language%5B%5D=en&published_at.start=2016-04-04T06%3A00%3A00Z&published_at.end=2016-04-04T12%3A00%3A00Z&categories.confident=true&field=keywords

 

Keywords:

Countries:

People:

Organizations:

Trends:

The final piece of analysis, while quite basic, was surprisingly interesting. Using the News API’s Trends endpoint, we looked at how the entities and concepts extracted developed over time as more and more stories broke.

It’s clear that Vladimir Putin was implicated from the start, but it’s interesting to see how the likes of David Cameron, Lionel Messi and Xi Jinping were only mentioned following further investigation and coverage.

Entities:


We’re planning on running some further analysis as the story develops. Stay tuned to the blog for updates to the data viz’s and further blog posts.

If you’d like to try it for yourself just create your free News API account and start collecting and analyzing stories. Our News API is the most powerful way of searching, sourcing and indexing news content from across the globe. We crawl and index thousands of news sources every day and analyze their content using our NLP-powered Text Analysis Engine to give you an enriched and flexible news data source.

 





Product

Introduction

We are thrilled to announce the launch of the AYLIEN News API, our groundbreaking new service that is going to make it easier than ever before for developers and solution builders to collect, index and understand content at scale to uncover trends, identify hot topics and find influencers.

 

More content was uploaded yesterday than any one human could ever consume in their entire life

– Condé Nast

 

Given the content explosion we’re experiencing on the web today, the need to aggregate and understand news and web content at scale and as close to real-time as possible is more important now than ever before. The News API will help you source and analyze specific, relevant and actionable content from blogs and news sources from across the web. We’re monitoring the Internet 24/7 to provide a constant stream of content so you can keep your finger on the pulse with the most up-to-date news content and data within your applications and solutions.

 

The World’s news data, at your fingertips

The News API enables users to search, source and understand news content from across the web in real-time. By harnessing the power of Machine Learning and NLP-driven technology, users can stay ahead of the curve by collecting news content and extracting what is relevant and important to them.

 

 

You can use our News API to build intelligent content-driven apps and solutions by searching and filtering thousands of news sources, extracting the key data points and delivering valuable and actionable insights.

 

1. Search & Filter

We crawl and index thousands of news sources every day and analyze their content using our NLP-powered Text Analysis Engine to bring you an enriched and flexible news data source.

Our powerful search and filtering capabilities allow users to source and collect the news that matters most to them. Users can build their queries on a variety of data points, including:

– Entities (people, places, products, organizations, etc.)

– Writer Sentiment (positive, negative or neutral opinion)

– Topics

– Categories (industry-specific taxonomies)

– Time (down to minute level, and up to 60 days of historical data)

– Location
– Outlets (news sources and blogs)
– Authors (journalists and influencers)

– Language (English, Spanish, Portuguese, Italian, French and German – more to come)

 

We currently monitor an ever-growing list of thousands of sources from across the world. Given the amount of noise out there, we are focusing on quality over quantity by providing access to high quality and trusted sources.

 

2. Extract key data points

The AYLIEN News API goes beyond just sourcing news. We extract key data points from news content, generating an enriched, valuable and actionable data source that can be used to power intelligent news aggregators, content-driven apps and news dashboards. These data points include:

– Keywords 

– Entities mentioned (People, Places, Products, Organizations, etc)

– Categories (according to industry-specific taxonomies)

– Sentiment Analysis of writer’s opinion 

– Language Detection (English, Spanish, Portuguese, Italian, French and German – with more to come)

– Automated article summaries

– Hashtags (automatically generated for each story)

This data is extracted in a matter of seconds from the time the article is published, giving you speedy access to the key data points in the world’s news content.

News API users can build complex search queries to query the news like they would a database, giving them tailored streams of news and content. Try it yourself by building some test queries here.

 

3. Deliver insights

Social media performance

We continuously monitor social media to measure the mention performance of each story, and profile this performance over time to give users an understanding of the increasing or decreasing popularity of the story.

 

 

Sentiment and category breakdown

We leverage the data points we extract from each and every story to help users answer questions like: What % of articles are talking about category X? or What % of articles are positive, negative or neutral?

 

 

Volume over time

We provide historical data for the previous 60 days, enabling users to clearly see how many stories match a query in a given time window.

 

 

Word clouds

Our word cloud capabilities provide users with a snapshot of the most-used keywords or entities within a given time period.

 

 

Histograms

Informative histograms can be easily created to provide snapshots about a query, an author, outlet or even vertical.

 

 

Integration: As Developer-friendly as it gets

Integrating with our News API is simple. In addition to our extensive and interactive documentation, we’re providing code snippets for the most popular programming languages to help developers get up and running in no time. Results are provided in a well-structured JSON format.

 

 

Pricing & Free Trial 

News API paid plans start from $49 per month and because we charge on a pay-per-story basis, you only pay for what you use. So you are in complete control of your usage – no bill shocks!

We are currently offering a 14-day trial. During your trial we will help you make the most of the API by providing access to our extensive interactive documentation and sending you helpful tips, sample code snippets and query inspiration. Check out our Pricing Calculator to get an estimate.

 









About AYLIEN

We are a Dublin, Ireland-based AI and Machine Learning company. We provide a range of content analysis solutions to developers, data scientists, marketers and academics. Our core offerings include packages of Information Retrieval, Machine Learning, Natural Language Processing and Image Recognition APIs that allow our users to make sense of human-generated content at scale.


Data Science

Introduction

At AYLIEN we like to use topical and interesting events like the FIFA World Cup and the Super Bowl to showcase our technology in a simple and interesting way. Primarily we choose world events with a lot of hype associated with them so that we can try and dive into public opinion from data collected from social media and other sources. This time around we decided to focus on Super Bowl 50, which took place on the 7th of February 2016, to try and get a handle on the public reaction to it.

Super Bowl 50 saw the Denver Broncos line out against the Carolina Panthers in what was a battle of the strongest defensive outfit in the league, the Broncos, versus an offensive focused Panthers team.

We set out to try and understand the public reaction to Super Bowl 50 by collecting and analyzing reactions online. We hoped to uncover interesting insights and correlations in the build up to and during the actual game. We focused our attention on the volume of chatter surrounding the event for each team, the battle of the quarterbacks and even the advertising battle which has become such a huge part of the whole Super Bowl event each year.

(Interested in the ads battle? Check out the recording of our recent Webinar with RapidMiner where we dive into who came out on top in the SB50 commercials battle here.)

Process

Overall we collected about 1.8 million tweets using the Twitter Search and Streaming APIs. We also pulled team information like rosters and coaches’ names from the SportRadar API, which we later used to segregate tweets. We analyzed all of the tweets gathered using the AYLIEN Text Analysis API and visualized our results in Tableau. We’ll talk more about the whole process later in the blog.

Tools used:

We focused our data collection on keywords, hashtags and handles that were related to Super Bowl 50. You can download the data set here.

Once we collected all of our tweets we spent a bit of time cleaning and prepping our data. We disregarded some of the metadata which we felt we didn’t need. We kept key indicators like time stamps, geolocation, tweet ID and the raw text of each tweet. We also removed any retweets and tweets that contained links. From previous experience, tweets that contain links are mostly objective and don’t hold any opinion towards the event.
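
The cleaning rules described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the code from our walkthrough: we assume each tweet is a dictionary with `id`, `created_at`, `geo` and `text` keys, that retweets either carry a `retweeted_status` key or start with "RT @", and the sample tweets are made up.

```python
import re

# Matches http:// or https:// links anywhere in the tweet text
URL_PATTERN = re.compile(r"https?://\S+")

# The fields we keep; everything else is treated as unneeded metadata
KEPT_FIELDS = ("id", "created_at", "geo", "text")


def is_retweet(tweet):
    """Assumption: a retweet has a 'retweeted_status' key or an 'RT @' prefix."""
    return "retweeted_status" in tweet or tweet["text"].startswith("RT @")


def clean_tweets(tweets):
    """Drop retweets and tweets with links, keeping only the key fields."""
    kept = []
    for tweet in tweets:
        if is_retweet(tweet) or URL_PATTERN.search(tweet["text"]):
            continue
        kept.append({field: tweet.get(field) for field in KEPT_FIELDS})
    return kept


# Made-up sample tweets for illustration
sample = [
    {"id": 1, "text": "RT @fan: Go Broncos!", "lang": "en"},
    {"id": 2, "text": "Game preview https://example.com/sb50", "lang": "en"},
    {"id": 3, "text": "Cam Newton looks nervous", "lang": "en", "geo": None},
]
cleaned = clean_tweets(sample)
print(cleaned)
```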

You can read more about the technicalities of the process and even copy the code we used in our walkthrough available here.

Visualizations

We used Tableau to visualize our results and embedded some of the more interesting visualizations below.

We started off by analyzing the volume of chatter on Twitter in the build up to and during Super Bowl 50. You can see in the graph below how the chatter builds in the few days leading up to the event with pretty obvious peaks and troughs in volume at the start and right at the end of the game, with the highest number of tweets published as people expressed their reaction to the result.

Volume

Volume of Tweets:

We looked at the overall volume which was somewhat interesting but we also wanted to know which team had the most vocal fans and who was tweeting the most. To understand the reaction towards each team we needed to separate the tweets in some way. The first approach we took was to use pre-identified hashtags, in this case #BroncosWin and #PanthersWin, which in the build up were touted as the official hashtags to use.

This didn’t prove too useful, however. For the most part, hashtags can have massive spikes in usage and popularity but they usually fade away quite dramatically or get replaced by other hashtags that might be trending at that time. This is nicely visualized in the graph below which shows the decline in usage for each hashtag.

Team Specific Hashtags

Our second approach was a bit more technical and it focused around the idea of classifying tweets as Denver or Carolina focused based on what concepts – team, coach, players, cities – were mentioned in a tweet. We accomplished this using our Concept Extraction feature. For example if a tweet mentions Cam Newton it is most likely a tweet about Carolina.
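
To give a feel for the idea, here is a deliberately simplified sketch. It swaps our Concept Extraction feature for plain substring matching against a small hand-written list of team-related terms, so it is a stand-in for the approach rather than what we actually ran; the term lists, labels and function name are all our own assumptions.

```python
# Hand-written, incomplete term lists used purely for illustration
TEAM_TERMS = {
    "Carolina": ["panthers", "carolina", "cam newton", "charlotte"],
    "Denver": ["broncos", "denver", "peyton manning"],
}


def classify_tweet(text):
    """Label a tweet by which team's terms it mentions."""
    lowered = text.lower()
    matched = [
        team
        for team, terms in TEAM_TERMS.items()
        if any(term in lowered for term in terms)
    ]
    if len(matched) == 1:
        return matched[0]
    # Tweets naming both teams, or neither, are not assigned to a single team
    return "Both" if matched else "Unclassified"


print(classify_tweet("Cam Newton is unstoppable"))
print(classify_tweet("Great halftime show"))
```

Real concept extraction resolves mentions to known entities, so it copes with nicknames and ambiguous words far better than this kind of string matching.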

Volume by Team:

We had a lot more success with this approach and were able to classify around 40% of the 1.8M tweets we collected as either Carolina or Denver focused or not relevant. As you can see in the visualization above, the Panthers fans were far more vocal, tweeting about twice as much as the Broncos fans.

Were the Broncos fans quietly confident or is it down to something simpler, like there being more Panthers fans than Broncos fans?

Location of Tweets

We also wanted to understand where these tweets and the activity was coming from. We could assume that for each team the majority of their activity would focus around their home cities, Denver and Charlotte and we were right. You can see a strong concentration of activity clustered around North Carolina for the Panthers tweets.

Carolina Tweets:

It was much the same for the Broncos tweets with most of the activity focused around Colorado.

While both teams seem to have pockets of fans based in other major cities on the east and west coasts, there seem to be a lot more Panthers fans spread throughout the east coast from Florida to New Hampshire.

The other obvious clusters were coming mainly from the San Francisco area where the game was held.

Denver Tweets:



Sentiment

While the volume of tweets and how it increases and decreases is interesting, it doesn’t tell us a whole lot about the opinion of the public, who they were going for, which players they like and who they thought would come out on top.

We used the results from the analysis we did using our Sentiment Analysis and Concept Extraction features to understand what people were actually tweeting about and what their sentiment was, towards teams and some players.

Overall Sentiment:


First off we looked at the overall polarity i.e. how many tweets were positive, how many were classified negative and how many we deemed neutral. The majority of tweets, as expected came back as neutral.

While you can’t tell from this graph which team the positivity or negativity is directed at, there are still some interesting insights here. Notably, the most opinionated tweets are positive in the build up to the game: everyone believes in their team, they’re excited for the big game and they’re showing that excitement with an overall positive sentiment. The negativity does creep in once the game kicks off, however, reaching its peak at the end of the game, which you could assume was down to the disappointment of the Carolina fans.

We’ll talk more about this in the next section but that initial very severe spike in activity and positivity is also interesting.

Teams

Using the same approach as before with Concept Extraction to separate tweets we could split the positive and negative tweets into Broncos and Panthers related tweets.
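As a rough sketch of this segmentation step, assume each analyzed tweet has already been reduced to a record holding a polarity label and a list of extracted concepts. The record shape, the `TEAM_CONCEPTS` map and the `sentimentByTeam` helper are hypothetical names used for illustration, not the output format of our API:

```javascript
// Hypothetical mapping from extracted concept names to teams.
var TEAM_CONCEPTS = {
  'Denver Broncos': 'Broncos',
  'Carolina Panthers': 'Panthers'
};

// Tally positive/negative/neutral tweets per team.
// Each tweet is assumed to look like:
//   { polarity: 'positive', concepts: ['Carolina Panthers'] }
function sentimentByTeam(tweets) {
  var counts = {};
  tweets.forEach(function(tweet) {
    // Collect the distinct teams mentioned in this tweet.
    var teams = {};
    tweet.concepts.forEach(function(concept) {
      var team = TEAM_CONCEPTS[concept];
      if (team) { teams[team] = true; }
    });
    Object.keys(teams).forEach(function(team) {
      counts[team] = counts[team] || { positive: 0, negative: 0, neutral: 0 };
      // Ignore polarity labels outside the three expected values.
      if (counts[team].hasOwnProperty(tweet.polarity)) {
        counts[team][tweet.polarity] += 1;
      }
    });
  });
  return counts;
}

var sample = [
  { polarity: 'positive', concepts: ['Carolina Panthers'] },
  { polarity: 'negative', concepts: ['Carolina Panthers', 'Denver Broncos'] },
  { polarity: 'neutral', concepts: ['Super Bowl'] }
];
console.log(sentimentByTeam(sample));
```

A tweet that mentions both teams is counted once for each, and tweets with no team concept are left out of the per-team totals.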

The Carolina Panthers certainly had the most chatter about them from a volume point of view but they also had the most positive sentiment towards them in the build up and the beginning of the game.

Were the Panthers fans too cocky?

Sentiment towards each team (build up):


What happened on February 5th?

The first thing we noticed from the visualization above was the extreme spike on Feb 5th. It’s somewhat strange to see such a large spike in activity at that time, especially because it was so rich in positive sentiment. After some digging in the data we figured out that this was down to a campaign run by Sports Central, where they asked their followers to vote on who was going to win Super Bowl 50 using a Twitter poll. Once you voted, your account automatically tweeted one of the following tweets. This certainly shows the effect a poll can have on Twitter, but its effectiveness is quite short lived.



As was the case with the hashtags #BroncosWin and #PanthersWin we discussed earlier, the Twitter poll gave a very strong indication of the public opinion at that time, but failed to deliver insight throughout the rest of the build up and during the game itself.

Game Day

The graph below focuses on game day. There was quite a significant spike in negative sentiment towards Carolina during the last two quarters, and at the end of the game we can see a significant amount of negativity present. The opposite effect can be seen on the Denver side, with a large positive spike right at the end of the game, where fans expressed their delight with the result.

Sentiment towards each team (Game):


Players

We also wanted to focus on some key individuals and how they performed in the eyes of the public. Anyone who watches football will tell you, a lot of the game focuses on one key position, the quarterback.

Below we analyzed both the positive and negative reactions towards both Cam Newton and Peyton Manning during the game. The overall reaction by fans to Newton’s performance was pretty poor. He only completed 18/41 passes and was sacked a total of 6 times, which shows up pretty clearly in the dominance of negative sentiment towards Newton in tweets, especially as it became more evident that the Broncos had shut them down.

Manning, having not thrown a single touchdown pass, was still praised for his performance and control of the game. After all, Denver are known for their defense-focused strategies and they closed out the Carolina attack, and Newton’s offensive efforts in particular. So while Manning didn’t deliver a perfect quarterback performance, as usual he delivered the goods in the form of a win, much to the satisfaction of the fans.

Player Reaction on Game Day

In terms of tweet volume, other players of note included Greg Olsen and Demaryius Thomas, who had the most mentions in tweets.

Other Players Mentioned

Conclusion

While this was put together as a fun exercise, there are some key takeaways that can be applied to more business and commercially-focused applications. From a data analytics point of view this use case could be classed as a “voice of the customer” application of Text Analytics with a focus on social listening. It’s pretty clear there is a wealth of information about customer opinion towards brands and events on social platforms like Twitter.

Key Takeaways:

  • Hashtags can provide insight but they are heavily influenced by flocking and are easily overshadowed or replaced
  • Twitter Polls are a great way to gain immediate traction and reaction from the twittersphere but they are extremely short lived
  • The ability to segment reactions based on concepts in tweets allows for a greater understanding of opinion towards entities and concepts, in this case teams and players, but the same can easily be applied to people and brands, for example

As we mentioned above we ran a similar analysis process on brand related tweets for the Super Bowl commercials. Check it out here.






As you know, from time to time at AYLIEN we like to share useful code snippets, apps and use cases that either we’ve put together or our users have built. For this blog we wanted to showcase a neat little script one of our engineers hacked together that can be used when the target webpage you’re analyzing is light on content.

Our standard Article Extraction feature extracts the main body of text, the primary image, the headline, author etc. It’s primarily designed for article type, or blog type pages where there is a significant chunk of text on the page.

But what happens when you want to Categorize a home page or even a domain, or what if there just isn’t enough text on a page to analyze? 

You have to look elsewhere.

While these types of pages (product pages, home pages, feature pages) may not have one large article-style piece of text, they often have other strong indicators and text hidden in other areas, such as:

  • Headers
  • Meta descriptions
  • Meta tags for images
  • Keyword Tags

 

These other text sources on webpages can be effectively leveraged when trying to classify “text light” webpages.

Here’s an example demonstrating and explaining how it’s done using our Node.js SDK:

var _ = require('underscore'),
    cheerio = require('cheerio'),
    request = require('request'),
    AYLIENTextAPI = require("aylien_textapi");

var textapi = new AYLIENTextAPI({
    application_id: "YOUR_APP_ID",
    application_key: "YOUR_APP_KEY"
});

var url = 'http://www.bbc.com/';
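request(url, function(err, resp, body) {
  if (err) {
    console.error(err);
    return;
  }
  // Build a text blob from the page's meta tags, headings, links and image alt text
  var text = extract(body);
  textapi.classifyByTaxonomy({'text': text, 'taxonomy': 'iab-qag'}, function(err, result) {
    if (err) {
      console.error(err);
      return;
    }
    console.log(result.categories);
  });
});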

function getText(tagName, $) {
  var texts = _.chain($(tagName)).map(function(e) {
    return $(e).text().trim();
  }).filter(function(t) {
    return t.length > 0;
  }).value();

  return texts.join(' ');
}

function extract(body) {
  var $ = cheerio.load(body);
  var keywords = $('meta[name="keywords"]').attr('content');
  var description = $('meta[name="description"]').attr('content');
  var imgAlts = _($('img[alt]')).map(function(e) {
    return $(e).attr('alt').trim();
  }).join(' ');

  var h1 = getText('h1', $);
  var h2 = getText('h2', $);
  var links = getText('a', $);
  // Include the meta keywords and description alongside the on-page text
  var text = [keywords, description, h1, h2, links, imgAlts].join(' ');

  return text;
}

In a nutshell, what we’re doing here is extracting the text present in the meta tags, H1 and H2 tags, linked text and image alt attributes within the page’s HTML instead of relying on the main body of text.

Our ad-tech-focused customers currently use this or similar approaches to classify text-light pages and domains against the IAB QAG taxonomy.
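Once the categories come back, you will usually want to keep only the stronger matches. Here’s a minimal sketch of that step, assuming each category is an object with a `label` and a numeric `score`. Those field names, the `topCategories` helper and the threshold value are assumptions for illustration, so check them against the response you actually receive:

```javascript
// Keep only categories at or above a minimum score, best first.
// Each category is assumed to look like { label: 'Sports', score: 0.42 }.
function topCategories(categories, minScore) {
  return (categories || [])
    .filter(function(c) { return c.score >= minScore; })
    .sort(function(a, b) { return b.score - a.score; })
    .map(function(c) { return c.label; });
}

// Example with made-up scores:
var categories = [
  { label: 'News', score: 0.2 },
  { label: 'Sports', score: 0.6 },
  { label: 'Technology', score: 0.05 }
];
console.log(topCategories(categories, 0.1));
```

Because `filter` returns a new array, the original list is left untouched, so you can still log or store the full result.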

You can copy and paste the snippet for further use or test it out in our Sandbox Environment.




