Model Updates: Entity-level Sentiment Analysis and Brand New Entity Extraction Models Now Live in the Text Analysis API

Here at AYLIEN, we host a fully-functional research lab of five NLP scientists who carry out leading-edge research into NLP, machine learning, and deep learning. One of the missions of this lab is to constantly feed research output into the models that power our services, so our engineers can build and maintain a world-leading NLP service.

Specifically, the benefits of housing a research lab are twofold – not only are our endpoints being constantly updated with state-of-the-art models, but we can also build brand new service offerings based on the latest research.

We’re really excited to share the results of some of the work our research and engineering teams have done over the past couple of months. We’ve launched a brand new feature – Entity-level Sentiment Analysis – and made major upgrades to two of our most-used features – Entity and Concept extraction.

1. Deep Learning-Powered Entity-Level Sentiment Analysis

Using sentiment analysis, people can build software that understands whether a piece of text is positive, negative, or neutral. Usually, the sentiment of a piece of text is analyzed at document-level, meaning, if you analyze a news article or a review, all of the text in that document is analyzed together, and the entire document is classified as having one sentiment.

But news articles, product reviews, and even Tweets frequently contain differing sentiment about different things. Which means relying on an overall sentiment score can sometimes be misleading. For example, is the sentence “Steve Jobs is a legend. I guess Apple are ok, but I’d never buy Samsung, those things are too hard to use” positive, negative, or neutral?

ELSA has identified each entity mentioned in the sentence and then provided a sentiment score towards that entity. Since the endpoint leverages our Concept Extraction model, which we’ll talk about further down, an entity type is also returned in it’s prediction.

ELSA provides much more granular sentiment analysis results and helps supercharge your media monitoring solution by providing accurate and relevant sentiment scores for the brands, products, people and organizations you care about.

We’re working on a more detailed walkthrough of ELSA, which we’ll post  in the next couple of days, but if you want to dive in and test it out, sign up to the Text Analysis API to start making 1,000 calls to the Text API per day with no charge.

2. Improved Entities and Concept Extraction endpoints

The Entity and Concept extraction endpoints are used by our customers to understand what topics and things are being talked about in a piece of text. Each of these endpoints work by recognizing entities such as things, places or people in a piece of text, and the Concepts endpoint goes one further by cross referencing the entities found with DBpedia, and returns the Concepts listed on DBpedia to the user along with their URIs.

Both of these endpoints now have new models running under the hood, so the Text Analysis API will now not only recognize more entities in text with the Entities endpoint, but it will also map them even more accurately to DBpedia with the Concepts endpoint.

For the user, this means not only that more accurate results will be returned, but also that there will be more of them. For example, take a look at the following story (full story available here):

Here are the concepts returned by the old model:

Old Concepts Model
Apple Inc. European Union Netflix
Blood alcohol content Facebook New York (magazine)
China Instagram PepsiCo
CNNMoney Italy The Coca-Cola Company
Comcast JPMorgan Chase The Walt Disney Company
Constellation Brands Mark Zuckerberg University of Cambridge
Crown Castle International Corp. Microsoft Whole Foods Market
Duke Energy

Now see how the upgraded Concepts endpoint returns over 70% more results than the old endpoint from the body of the story:

New Concepts Model
Amazon (company) Earnings growth Microsoft
Apple Inc. Ecological resilience Netflix
Beer Europe Pepsi
Bond (finance) European Union Phosphoenolpyruvic acid
British Aircraft Corporation Facebook Real estate
Cambridge Analytica Google Social networking service
China Instagram The Move
Cloud computing Interest rate The Walt Disney Company
Coca-Cola Italy Trade war
Comcast JPMorgan Chase United States Department of the Treasury
Constellation Brands Kale Whole Foods Market
Donald Trump presidential campaign, 2016 Mark Zuckerberg Wine
Duke Energy Market capitalization

3. New and Improved IPTC Classification

IPTC Subject Codes is one of the major taxonomies used to organize content produced by the media. Our new IPTC classifier effectively doubles the Text Analysis API’s previous coverage of this taxonomy.

For the user, this means that the Text Analysis API can now accurately predict much more granular subjects, so you can retrieve much more detailed results. For instance, take a look at the article below, which used to be categorized as ‘software’ (04003005). You can see that the new model categorizes this story as ‘new product’ (04016030), a more granular label that can help users generate better insights.

Here’s the response for this story:

The full JSON response includes the prediction of the IPTC subject code for the story as well as the parent category, in this case ’company information’ (04016000):

You can try out all of these features for yourself by checking out the Text Analysis API’s demo, but if you want to test it out on data that interests you, sign up and start trying out 1,000 hits per day with no charge.

Text Analysis API - Sign up

Let's Talk