Feature Update: Unsupervised Classification Endpoint Added to AYLIEN Text Analysis API

Our development team has been working hard on adding features to the API that allow our users to analyze, classify and tag text in more flexible ways. Unsupervised Classification is a feature we are really excited about, and we’re happy to announce that, as of today, it is available as a fully functional and documented feature.

So what exactly is Unsupervised Classification?

It’s a training-less approach to classification. Unlike our standard classification, which is based on IPTC News Codes, it doesn’t rely on a predefined taxonomy to categorize text. This method allows automatic tagging of text that can be tailored to a user’s needs, without the need for a pre-trained classifier.

Why are we so excited about it?

Our Unsupervised Classification endpoint allows users to specify a set of labels and submit a piece of text; the endpoint then analyzes the text and assigns the most appropriate label to it. This gives our users greater flexibility to decide how they want to tag and classify text.

There are a number of ways this endpoint can be used, and we’ll walk you through a few simple examples: text classification from a URL, customer service routing of social interactions, and classification with a hierarchical taxonomy.

Classification of Text

We’ll start with a simple example to show how the feature works. The user passes a piece of text or a URL to the API, along with a number of labels. In the case below we want to find out which label, Football, Baseball, Hockey or Basketball, best represents the following article: ‘http://insider.espn.go.com/nfl/story/_/id/12300361/bold-move-new-england-patriots-miami-dolphins-new-york-jets-buffalo-bills-nfl’

Code Snippet:


var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YourAppId',
  application_key: 'YourAppKey'
});

var params = {
  url: 'http://insider.espn.go.com/nfl/story/_/id/12300361/bold-move-new-england-patriots-miami-dolphins-new-york-jets-buffalo-bills-nfl',
  'class': ['basketball', 'baseball', 'football', 'hockey']
};

textapi.unsupervisedClassify(params, function(error, response) {
  if (error !== null) {
    console.log(error, response);
  } else {
    console.log("\nThe text to classify is : \n\n",
      response.text, "\n");
    // Print every candidate label with its score.
    for (var i = 0; i < response.classes.length; i++) {
      console.log("label - ", response.classes[i].label,
        ", score -", response.classes[i].score, "\n");
    }
  }
});

Results:


The text to classify is:

"Each NFL team's offseason is filled with small moves and marginal personnel decisions... "

label -  football , score - 0.13

label -  baseball , score - 0.042

label -  hockey , score - 0.008

label -  basketball , score - 0.008

Based on the scores provided, we can confidently say that the article is about football and should be assigned the “Football” label.
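If you would rather select the winning label in code than read it off the printed list, a small helper can do it. The following is a minimal sketch, assuming the response has the shape printed above (a `classes` array of objects with `label` and `score`). `topClass` is a hypothetical helper name, not part of the aylien_textapi SDK, and the sample data is simply the scores from the results above, deliberately shuffled.

```javascript
// Hypothetical helper (not part of the aylien_textapi SDK).
// Assumes `classes` looks like [{ label: 'football', score: 0.13 }, ...].
function topClass(classes) {
  if (!classes || classes.length === 0) {
    return null; // nothing to choose from
  }
  var best = classes[0];
  for (var i = 1; i < classes.length; i++) {
    if (classes[i].score > best.score) {
      best = classes[i];
    }
  }
  return best;
}

// The scores printed in the results above, deliberately out of order.
var sampleClasses = [
  { label: 'hockey', score: 0.008 },
  { label: 'football', score: 0.13 },
  { label: 'baseball', score: 0.042 },
  { label: 'basketball', score: 0.008 }
];

console.log(topClass(sampleClasses).label); // football
```

Scanning for the maximum, rather than reading the first element, means the helper does not depend on the order in which the classes are returned.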

Customer Service Routing

As another example, let’s say we want to automatically determine whether a post on social media should be routed to our Sales, Marketing or Support departments. We’ll take the following comment: “I’d like to place an order for 1000 units.” To route it, we pass the text to the API along with our pre-chosen labels, in this case ‘Sales’, ‘Customer Support’ and ‘Marketing’.

Code Snippet:


var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YourAppId',
  application_key: 'YourAppKey'
});

var params = {
  text: "I'd like to place an order for 1000 units.",
  'class': ['Sales', 'Customer Support', 'Marketing']
};

textapi.unsupervisedClassify(params, function(error, response) {
  if (error !== null) {
    console.log(error, response);
  } else {
    console.log("\nThe text to classify is : \n\n",
      response.text, "\n");
    // Print every candidate department with its score.
    for (var i = 0; i < response.classes.length; i++) {
      console.log("label - ",
        response.classes[i].label,
        ", score -", response.classes[i].score, "\n");
    }
  }
});

Results:


The text to classify is:

I'd like to place an order for 1000 units.

label -  Sales , score - 0.032

label -  Customer Support , score - 0.008

label -  Marketing , score - 0.002

Similarly, based on the scores, which reflect how closely the text semantically matches each label, we can decide that this inquiry should be handled by a sales agent rather than by marketing or support.
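Turning those scores into an actual routing decision could look something like the sketch below. Everything here beyond the label names and scores is an assumption made for illustration: the queue names, the `routeMessage` helper, the `manual-triage` fallback and the `0.01` cut-off are all hypothetical, and the cut-off in particular is not a value recommended by the API.

```javascript
// Hypothetical routing table: label -> queue name (queue names are made up).
var departmentQueues = {
  'Sales': 'sales-queue',
  'Customer Support': 'support-queue',
  'Marketing': 'marketing-queue'
};

// Pick a queue for a classified message.
// `classes` is assumed to be [{ label: ..., score: ... }, ...] as printed above.
// `minScore` is an illustrative cut-off below which a human should decide.
function routeMessage(classes, queues, minScore, fallbackQueue) {
  // Copy before sorting so the caller's array is left untouched.
  var sorted = (classes || []).slice().sort(function(a, b) {
    return b.score - a.score;
  });
  var best = sorted[0];
  if (!best || best.score < minScore ||
      !Object.prototype.hasOwnProperty.call(queues, best.label)) {
    return fallbackQueue;
  }
  return queues[best.label];
}

// The scores printed in the results above.
var orderClasses = [
  { label: 'Sales', score: 0.032 },
  { label: 'Customer Support', score: 0.008 },
  { label: 'Marketing', score: 0.002 }
];

console.log(routeMessage(orderClasses, departmentQueues, 0.01, 'manual-triage')); // sales-queue
```

A fallback queue is useful because the endpoint always scores the labels it is given, even when none of them fits the message particularly well.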

Divide and Conquer

Our next example uses the unsupervised classification feature with a hierarchical taxonomy. When classifying text, it’s sometimes necessary to add a sub-label for finer-grained classification, for example “Sports – Basketball” instead of just “Sports”.

So, in this example we’re going to analyze a simple piece of text, “The oboe is a woodwind musical instrument.”, and attempt to provide a more descriptive classification result based on the following taxonomy:

  • ‘music’: [‘Instrument’, ‘composer’],
  • ‘technology’: [‘computers’, ‘space’, ‘physics’],
  • ‘health’: [‘disease’, ‘medicine’, ‘fitness’],
  • ‘sport’: [‘football’, ‘baseball’, ‘basketball’]

    The taxonomy maps each primary label to a set of secondary labels, for example ‘music’ (primary) to ‘Instrument’ and ‘composer’ (secondary).

    Code Snippet:

    
    var AYLIENTextAPI = require('aylien_textapi');
    var textapi = new AYLIENTextAPI({
      application_id: 'YourAppId',
      application_key: 'YourAppKey'
    });
    
    var _ = require('underscore');
    var taxonomy = {
      'music':      ['Instrument', 'composer'],
      'technology': ['computers', 'space', 'physics'],
      'health':     ['disease', 'medicine', 'fitness'],
      'sport':      ['football', 'baseball', 'basketball']
    };
    
    var topClasses = ['technology', 'music', 'health', 'sport'];
    var queryText = "The oboe is a woodwind musical instrument.";
    var params = {
      text: queryText,
      'class': topClasses
    };
    
    textapi.unsupervisedClassify(params, function(error, response) {
      if (error !== null) {
        console.log(error, response);
      } else {
        var classificationResult = '';
        console.log("\nThe text to classify is : \n\n",
          response.text, "\n");
        // Step 1: record the best-scoring primary label.
        classificationResult = response.classes[0].label +
          " (" + response.classes[0].score + ") ";
        // Step 2: classify again, using only that label's secondary labels.
        params = {
          text: queryText,
          'class': _.values(
            _.pick(taxonomy, response.classes[0].label)
          )[0]
        };
        textapi.unsupervisedClassify(params,
          function(error, response) {
            if (error !== null) {
              console.log(error, response);
            } else {
              classificationResult += " - " +
                response.classes[0].label +
                " (" + response.classes[0].score +
                ") ";
              console.log("Label: ", classificationResult);
            }
          }
        );
      }
    });
    

    Results:

    
    The text to classify is :
    
    The oboe is a woodwind musical instrument.
    
    Label    :     music (0.076)  - Instrument (0.342)
    

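    As you can see from the results, the piece of text has been assigned ‘music’ as its primary label and ‘Instrument’ as its secondary label. The code gets there in two calls: it first classifies the text against the four primary labels, then classifies it again against only the secondary labels of the winning primary label.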
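The same two-step idea can be pulled out into a reusable function. The sketch below is one possible way to do that, not the SDK's own API: `classifyHierarchically` and `fakeClassify` are hypothetical names. The function accepts any classifier with the same callback shape as `textapi.unsupervisedClassify`, and the stand-in classifier exists only so the example runs without API credentials. It returns the two winning scores printed above and a placeholder score of 0 for every other label.

```javascript
var taxonomy = {
  'music':      ['Instrument', 'composer'],
  'technology': ['computers', 'space', 'physics'],
  'health':     ['disease', 'medicine', 'fitness'],
  'sport':      ['football', 'baseball', 'basketball']
};

// Hypothetical helper: classify against the primary labels, then against the
// secondary labels of the winner. `classify` must have the callback shape
// classify(params, function(error, response) { ... }).
function classifyHierarchically(classify, taxonomy, text, callback) {
  var primaryLabels = Object.keys(taxonomy);
  classify({ text: text, 'class': primaryLabels }, function(error, response) {
    if (error) { return callback(error); }
    var primary = response.classes[0]; // assumes the best score comes first
    var secondaryLabels = taxonomy[primary.label];
    classify({ text: text, 'class': secondaryLabels }, function(error2, response2) {
      if (error2) { return callback(error2); }
      callback(null, { primary: primary, secondary: response2.classes[0] });
    });
  });
}

// Stand-in for the real API so the sketch runs offline. Placeholder scores:
// only the two values below come from the results above; the rest are 0.
var cannedScores = { music: 0.076, Instrument: 0.342 };
function fakeClassify(params, callback) {
  var classes = params['class'].map(function(label) {
    return { label: label, score: cannedScores[label] || 0 };
  }).sort(function(a, b) { return b.score - a.score; });
  callback(null, { text: params.text, classes: classes });
}

var hierarchicalResult = null;
classifyHierarchically(fakeClassify, taxonomy,
  'The oboe is a woodwind musical instrument.',
  function(error, result) {
    if (error) { console.log(error); return; }
    hierarchicalResult = result;
    console.log(result.primary.label + ' - ' + result.secondary.label);
  });
```

To run this against the live service, pass a function that forwards to `textapi.unsupervisedClassify` in place of `fakeClassify`. Like the snippet above, the sketch assumes the highest-scoring class is returned first.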

    All the code snippets in our examples are fully functional and can be copied and pasted or tested in our sandbox. We’ll also be adding some of these and more interesting apps to our sandbox over the next week or so that will showcase some interesting use cases for Unsupervised Classification. We’d also love to hear more about how you would use this feature, so don’t hesitate to get in touch with comments or feedback.









    Author



    Mike Waldron

    Head of Marketing & Sales @ AYLIEN. A legal convert with a master’s degree from Smurfit Business School, Mike runs our Sales and Marketing at AYLIEN. He gathered his sales and marketing experience with technology companies in Sydney and Dublin before getting the startup itch and joining the team at AYLIEN. Twitter: @MikeWallly