Model Updates: Entity-level Sentiment Analysis and Brand New Entity Extraction Models Now Live in the Text Analysis API
Here at AYLIEN, we host a fully-functional research lab of five NLP scientists who carry out leading-edge research into NLP, machine learning, and deep learning. One of the missions of this lab is to constantly feed research output into the models that power our services, so our engineers can build and maintain a world-leading NLP service.
Specifically, the benefits of housing a research lab are twofold – not only are our endpoints being constantly updated with state-of-the-art models, but we can also build brand new service offerings based on the latest research.
We’re really excited to share the results of some of the work our research and engineering teams have done over the past couple of months. We’ve launched a brand new feature – Entity-level Sentiment Analysis – and made major upgrades to two of our most-used features – Entity and Concept extraction.
1. Deep Learning-Powered Entity-Level Sentiment Analysis
Using sentiment analysis, people can build software that understands whether a piece of text is positive, negative, or neutral. Usually, the sentiment of a piece of text is analyzed at document-level, meaning, if you analyze a news article or a review, all of the text in that document is analyzed together, and the entire document is classified as having one sentiment.
But news articles, product reviews, and even Tweets frequently contain differing sentiment about different things. Which means relying on an overall sentiment score can sometimes be misleading. For example, is the sentence “Steve Jobs is a legend. I guess Apple are ok, but I’d never buy Samsung, those things are too hard to use” positive, negative, or neutral?
ELSA has identified each entity mentioned in the sentence and then provided a sentiment score towards that entity. Since the endpoint leverages our Concept Extraction model, which we’ll talk about further down, an entity type is also returned in it’s prediction.
ELSA provides much more granular sentiment analysis results and helps supercharge your media monitoring solution by providing accurate and relevant sentiment scores for the brands, products, people and organizations you care about.
We’re working on a more detailed walkthrough of ELSA, which we’ll post in the next couple of days, but if you want to dive in and test it out, sign up to the Text Analysis API to start making 1,000 calls to the Text API per day with no charge.
2. Improved Entities and Concept Extraction endpoints
The Entity and Concept extraction endpoints are used by our customers to understand what topics and things are being talked about in a piece of text. Each of these endpoints work by recognizing entities such as things, places or people in a piece of text, and the Concepts endpoint goes one further by cross referencing the entities found with DBpedia, and returns the Concepts listed on DBpedia to the user along with their URIs.
Both of these endpoints now have new models running under the hood, so the Text Analysis API will now not only recognize more entities in text with the Entities endpoint, but it will also map them even more accurately to DBpedia with the Concepts endpoint.
For the user, this means not only that more accurate results will be returned, but also that there will be more of them. For example, take a look at the following story (full story available here):
Here are the concepts returned by the old model:
|Old Concepts Model|
|Apple Inc.||European Union||Netflix|
|Blood alcohol content||New York (magazine)|
|CNNMoney||Italy||The Coca-Cola Company|
|Comcast||JPMorgan Chase||The Walt Disney Company|
|Constellation Brands||Mark Zuckerberg||University of Cambridge|
|Crown Castle International Corp.||Microsoft||Whole Foods Market|
Now see how the upgraded Concepts endpoint returns over 70% more results than the old endpoint from the body of the story:
|New Concepts Model|
|Amazon (company)||Earnings growth||Microsoft|
|Apple Inc.||Ecological resilience||Netflix|
|Bond (finance)||European Union||Phosphoenolpyruvic acid|
|British Aircraft Corporation||Real estate|
|Cambridge Analytica||Social networking service|
|Cloud computing||Interest rate||The Walt Disney Company|
|Comcast||JPMorgan Chase||United States Department of the Treasury|
|Constellation Brands||Kale||Whole Foods Market|
|Donald Trump presidential campaign, 2016||Mark Zuckerberg||Wine|
|Duke Energy||Market capitalization|
3. New and Improved IPTC Classification
IPTC Subject Codes is one of the major taxonomies used to organize content produced by the media. Our new IPTC classifier effectively doubles the Text Analysis API’s previous coverage of this taxonomy.
For the user, this means that the Text Analysis API can now accurately predict much more granular subjects, so you can retrieve much more detailed results. For instance, take a look at the article below, which used to be categorized as ‘software’ (04003005). You can see that the new model categorizes this story as ‘new product’ (04016030), a more granular label that can help users generate better insights.
Here’s the response for this story:
The full JSON response includes the prediction of the IPTC subject code for the story as well as the parent category, in this case ’company information’ (04016000):
You can try out all of these features for yourself by checking out the Text Analysis API’s demo, but if you want to test it out on data that interests you, sign up and start trying out 1,000 hits per day with no charge.