
Our “Getting started with AYLIEN Text Analysis API” blogs focus on working with the API using different programming languages. We have previously published code snippets and getting started guides for C#, Node.js, PowerShell, Python, Java, Ruby and PHP.

Today we’re going to look at using the API with Go. We’re going to run through some basic Text Analysis functions like analyzing the sentiment of a piece of text, detecting what language a piece of text is written in, classifying an article and generating hashtags for a URL.

To begin we’ll look at the code in action. Then we’ll go through it, section by section, to investigate each of the endpoints used in the code snippet and the results.

Here’s what we’re going to do:

Note: The getting started page on our website has a range of code snippets in other programming languages and links to our SDKs. We are making use of the Go SDK in this blog.

Overview of the code in action

The complete code snippet is given at the end of this blog. Copy and paste it into a text editor of your choice, and make sure you replace the YOUR_APP_ID and YOUR_APP_KEY placeholders with your own application ID and application key. These were sent to you when you signed up as an API user. If you haven’t signed up, you can make your way to our sign up page to register for free.

Save the file as TextAPISample.go. Type “go run TextAPISample.go” to see the code snippet in action.

Once you have run the code, you should receive the following output:

Language Detection Results
Text: What language is this sentence written in?
Language: en (0.999998)

Sentiment Analysis Results
Text: John is a very good football player!
Sentiment: positive (0.999999)
Subjectivity: Objective
Subjectivity Confidence: 0.989682

Classification Analysis Results:
Label : disease - heart disease
IPTC code : 07001005
Confidence : 1.000000

Hashtag Suggestion Results:
#Yoga
#HeartDisease
#Hypertension
#Cholesterol
#Oxygen
#BritishHeartFoundation
#ErasmusUniversityRotterdam
#Rotterdam
#Netherlands
#HathaYoga

In this case we have detected that the first piece of text is written in English and the sentiment or polarity of the second statement is positive. We have also generated hashtags for the URL and classified the content.

We’ll now go through the code snippet in more detail.

Language Detection

Using the Language Detection endpoint you can analyze a piece of text or a URL and automatically determine what language it is written in. In the code we have used in this blog, the “languageParams” variable controls whether the call is made specifying the text directly or as a URL.

languageParams := &textapi.LanguageParams{Text: "What language is this sentence written in?"}
lang, err := client.Language(languageParams)

In this case we have specified that we want to analyze the text “What language is this sentence written in?”. As you can see from the output below, the API determined that the text is written in English and returned a confidence score of 0.999998 that the language was detected correctly. For all of the endpoints, the API returns the text which was analyzed for reference, and we have included it in the results in each case.

Result:

Text            :   What language is this sentence written in?
Language        :   en
Confidence      :   0.999998

Sentiment Analysis

Similarly, the Sentiment Analysis endpoint takes a piece of text or a URL and analyzes it to determine whether it is positive, negative or neutral, and it also provides a confidence score in the results.

sentimentParams := &textapi.SentimentParams{Text: "John is a very good football player!"}
sentiment, err := client.Sentiment(sentimentParams)

Analyzing the text “John is a very good football player!”, the API determined that the sentiment of the text is positive. We can be pretty confident it’s correct based on the returned confidence score of 0.999999. It also detected that the statement was objective, with a confidence of 0.989682.

Result:

Text            :   John is a very good football player!
Sentiment Polarity  :   positive
Polarity Confidence  :   0.999999
Subjectivity  : objective
Subjectivity Confidence  :   0.989682

Hashtag Suggestions

The Hashtag Suggestion endpoint analyzes a URL and generates a list of hashtag suggestions which can be used to ensure that your content or URLs are optimally shared on social media:

hashtagsParams := &textapi.HashtagsParams{URL: "http://www.bbc.com/news/health-30475999"}
hashtags, err := client.Hashtags(hashtagsParams)

For hashtag suggestions, we have used an article published on the BBC news website about how yoga can help prevent heart disease. The URL for the article is “http://www.bbc.com/news/health-30475999”. The hashtag suggestion endpoint first extracts the text from the URL, then analyzes that text and generates hashtag suggestions for it.

Result:

Hashtag Suggestion Results
#Yoga
#HeartDisease
#Hypertension
#Cholesterol
#Oxygen
#BritishHeartFoundation
#ErasmusUniversityRotterdam
#Rotterdam
#Netherlands
#HathaYoga

Article Classification

The Classification endpoint automatically assigns an article or piece of text to one or more categories, making it easier to manage and sort. The classification is based on IPTC International Subject News Codes and can identify up to 500 categories.

classifyParams := &textapi.ClassifyParams{URL: "http://www.bbc.com/news/health-30475999"}
class, err := client.Classify(classifyParams)

When we analyze the URL pointing to the BBC news story, we receive the results shown below. As you can see, it has labeled the article as “disease - heart disease” with a corresponding IPTC code of 07001005 and a confidence of 1.

Result:

Classification Analysis Results
Classification Label        :   disease - heart disease
Classification Code         :   07001005
Classification Confidence   :   1.000000

For more getting started guides and code snippets to help you get up and running with our API, visit our getting started page on our website.

 


The Complete Code Snippet

package main

import (
    "fmt"
    textapi "github.com/AYLIEN/aylien_textapi_go"
)

func main() {
    auth := textapi.Auth{"YOUR_APP_ID", "YOUR_APP_KEY"}
    client, err := textapi.NewClient(auth, true)
    if err != nil {
        panic(err)
    }


    endpoints := []string{"language", "sentiment", "classify", "hashtags"}

    for _, endpoint := range endpoints {
        switch endpoint {
        case "language":
            languageParams := &textapi.LanguageParams{Text: "What language is this sentence written in?"}
            lang, err := client.Language(languageParams)
            if err != nil {
                panic(err)
            }
            fmt.Printf("\nLanguage Detection Results\n")
            fmt.Printf("Text            :   %s\n", lang.Text)
            fmt.Printf("Language        :   %s\n", lang.Language)
            fmt.Printf("Confidence      :   %f\n\n", lang.Confidence)
        case "sentiment":
            sentimentParams := &textapi.SentimentParams{Text: "John is a very good football player!"}
            sentiment, err := client.Sentiment(sentimentParams)
            if err != nil {
                panic(err)
            }
            fmt.Printf("Sentiment Analysis Results\n")
            fmt.Printf("Text            :   %s\n", sentiment.Text)
            fmt.Printf("Sentiment Polarity  :   %s\n", sentiment.Polarity)
            fmt.Printf("Polarity Confidence  :   %f\n", sentiment.PolarityConfidence)
            fmt.Printf("Subjectivity  : %s\n", sentiment.Subjectivity)
            fmt.Printf("Subjectivity Confidence  :   %f\n\n", sentiment.SubjectivityConfidence)
        case "classify":
            classifyParams := &textapi.ClassifyParams{URL: "http://www.bbc.com/news/health-30475999"}
            class, err := client.Classify(classifyParams)
            if err != nil {
                panic(err)
            }
            fmt.Printf("Classification Analysis Results\n")
            for _, v := range class.Categories {
                fmt.Printf("Classification Label        :   %s\n", v.Label)
                fmt.Printf("Classification Code         :   %s\n", v.Code)
                fmt.Printf("Classification Confidence   :   %f\n\n", v.Confidence)
            }
        case "hashtags":
            hashtagsParams := &textapi.HashtagsParams{URL: "http://www.bbc.com/news/health-30475999"}
            hashtags, err := client.Hashtags(hashtagsParams)
            if err != nil {
                panic(err)
            }
            fmt.Printf("Hashtag Suggestion Results\n")
            for _, v := range hashtags.Hashtags {
                fmt.Printf("%s\n", v)
            }
        }
    }
}

We recently spoke about TTFHW (time to first hello world) in a previous blog post and how important developer experience is to us. Ensuring our API is easy to use, quick to get started with and flexible is a major priority at AYLIEN. With this in mind, we recently launched SDKs in various languages, and today we’re happy to announce that our developer Sandbox is now live on our developer portal. The idea of the Sandbox is to provide a simple environment for our API users, making it quicker and easier for developers to play with, test and build upon our API.

The Sandbox is a development and test environment that allows you to start making calls to the API as quickly and easily as possible. It comprises an editor and output window that let users start coding and making calls to the API without having to set up a dedicated environment. It currently has two sample apps: a “Basic Features App”, a simple application that showcases each feature of the API, and an application that grabs the top 5 discussions on Hacker News and summarizes them. Both are fully functional and editable, for you to use as a working app, to test the API, or as the building blocks for an idea you might have. We plan to add more applications over the next couple of weeks to showcase some interesting use cases for the API.

You can access the Sandbox with an AYLIEN Text Analysis API account. If you haven’t signed up for a free account, you can register for one here.

Getting Started

Once you’ve logged in, the Sandbox will automatically generate your App ID and account information so you can start making calls immediately. Choose the SDK for the language of your choice from the menu on the right-hand side and start hacking.

 

 

Choose an APP

Choose which app you want to use from the drop-down menu. As we mentioned, there are two sample apps currently available: a feature-focused app and a Hacker News summarization app.

 

 

Build your own

The code is fully editable, so you can use the apps as building blocks for new ideas, or even build your own app from right within the Sandbox. Your work can also be downloaded and saved on your local machine.

 

 

Invite your friends!

If you’re having fun in the Sandbox, please invite your friends: simply grab the URL and pass it on. Note: whoever you share it with will need their own AYLIEN account to make any calls.

 

 

We hope you enjoy using our Developer Sandbox. You can get access to it here. We’d love to hear about any cool apps you’re building and we would be happy to feature them as sample apps in the Developer Sandbox.

Happy Hacking!




Text Analysis API - Sign up





Our “Getting up and running with AYLIEN Text Analysis API” blogs focus on working with the API using different programming languages. Previously we published code snippets and getting started guides for C#, Node.js, PowerShell, Python, Java and Ruby. Today we’re going to focus on PHP.

As we did in our previous blogs, we’re going to perform some basic Text Analysis functions like analyzing the sentiment of a piece of text, detecting what language a piece of text is written in, classifying an article and generating hashtags for a URL. The idea is to make it as easy as possible for developers to get up and running with our API, and to showcase how easy it is to get started in your chosen programming language.

We’re first going to look at the code in action. Then we’ll go through it, section by section, to investigate each of the endpoints used in the code snippet and the results.

Here’s what we’re going to do:

Note: The getting started page on our website has a range of code snippets in other programming languages and links to our SDKs.

Overview of the code in action

The complete code snippet is given at the end of this blog. Copy and paste it into a text editor of your choice, and make sure you replace the YOUR_APP_ID and YOUR_APP_KEY placeholders with your own application ID and application key. These were sent to you when you signed up as an API user. If you haven’t signed up, you can make your way to our sign up page to register for free.

Save the file as TextAPISample.php in the docs folder of your web server. If you are using XAMPP, the docs folder is typically at “xampp/htdocs”. Open a web browser and go to the URL “http://localhost/TextAPISample.php” to see the code snippet in action.

Once you open it, you should receive the following output:

Text: What language is this sentence written in?
Language: en (0.999996)

Text: John is a very good football player!
Sentiment: positive (0.999999)

Classification:
Label : natural science - geology
IPTC code : 13004001
Confidence : 1.000000

Hashtags:
#Moon
#Earth
#Theia
#MoonRock
#SolarSystem
#Fingerprint
#BBCNews
#ComputerSimulation
#Mars
#Oxygen
#Meteorite
#Venus
#AsteroidBelt
#Mercury
#Crust
#Sessility
#OpenUniversity
#SolarWind
#Meijer
#UniversityOfOxford
#Apollo
#Netherlands
#UniversityOfGroningen
#GreekLanguage
#CollisionTheory
#Selene
#Astronaut
#GiantImpactHypothesis

In this case we have detected that the first piece of text is written in English and the sentiment or polarity of the second statement is positive. We have also generated hashtags for the URL and classified the content.

We’ll now go through the code snippet in more detail.

Language Detection

Using the Language Detection endpoint you can analyze a piece of text or a URL and automatically determine what language it is written in. In the code we have used in this blog, the “$params” variable controls whether the call is made specifying the text directly or as a URL.

$params = array('text' => 'What language is this sentence written in?');
$language = call_api('language', $params);

In this case we have specified that we want to analyze the text “What language is this sentence written in?”. As you can see from the output below, the API determined that the text is written in English and returned a confidence score of 0.999996 that the language was detected correctly. For all of the endpoints, the API returns the text which was analyzed for reference, and we have included it in the results in each case.

Result:

Text: What language is this sentence written in?
Language: en (0.999996)

Sentiment Analysis

Similarly, the Sentiment Analysis endpoint takes a piece of text or a URL and analyzes it to determine whether it is positive, negative or neutral, and it also provides a confidence score in the results.

$params = array('text' => 'John is a very good football player!');
$sentiment = call_api('sentiment', $params);

Analyzing the text “John is a very good football player!”, the API determined that the sentiment of the text is positive. We can be pretty confident it’s correct based on the returned confidence score of 0.999999.

Result:

Text: John is a very good football player!
Sentiment: positive (0.999999)

Hashtag Suggestions

The Hashtag Suggestion endpoint analyzes a URL and generates a list of hashtag suggestions which can be used to ensure that your content or URLs are optimally shared on social media:

$params = array('url' => $url);
$hashtags = call_api('hashtags', $params);

For hashtag suggestions, we have used an article published on the BBC news website about how the moon was formed when a planet called Theia collided with the Earth approximately 4.5 billion years ago. The URL for the article is “http://www.bbc.com/news/science-environment-27688511”. The hashtag suggestion endpoint first extracts the text from the URL, then analyzes that text and generates hashtag suggestions for it.

Result:

Hashtags:
#Moon
#Earth
#Theia
#MoonRock
#SolarSystem
#Fingerprint
#BBCNews
#ComputerSimulation
#Mars
#Oxygen
#Meteorite
#Venus
#AsteroidBelt
#Mercury
#Crust
#Sessility
#OpenUniversity
#SolarWind
#Meijer
#UniversityOfOxford
#Apollo
#Netherlands
#UniversityOfGroningen
#GreekLanguage
#CollisionTheory
#Selene
#Astronaut
#GiantImpactHypothesis

Article Classification

The Classification endpoint automatically assigns or tags an article or piece of text to one or more categories, making it easier to manage and sort. The classification endpoint is based on IPTC International Subject News Codes and can identify up to 500 categories.

$params = array('url' => $url);
$classify = call_api('classify', $params);

When we analyze the URL pointing to the BBC news story, we receive the results shown below. As you can see, it has labeled the article as “natural science - geology” with a corresponding IPTC code of 13004001 and a confidence of 1.

Result:

Label : natural science - geology
IPTC code : 13004001
Confidence : 1.000000

For more getting started guides and code snippets to help you get up and running with our API, visit our getting started page on our website. If you haven’t already done so you can get free access to our API on our sign up page.









The Complete Code Snippet

<?php
define('APPLICATION_ID', 'YOUR_APP_ID');
define('APPLICATION_KEY', 'YOUR_APP_KEY');

function call_api($endpoint, $parameters) {
  $ch = curl_init('https://api.aylien.com/api/v1/' . $endpoint);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept: application/json',
    'X-AYLIEN-TextAPI-Application-Key: ' . APPLICATION_KEY,
    'X-AYLIEN-TextAPI-Application-ID: ' . APPLICATION_ID
  ));
  curl_setopt($ch, CURLOPT_POST, true);
  curl_setopt($ch, CURLOPT_POSTFIELDS, $parameters);
  $response = curl_exec($ch);
  curl_close($ch);
  return json_decode($response);
}

$endpoints = array("language", "sentiment", "classify", "hashtags");
$url = "http://www.bbc.com/news/science-environment-27688511";

foreach ($endpoints as $endpoint) {
  switch ($endpoint) {
    case "language":
    {
      $params = array('text' => 'What language is this sentence written in?');
      $language = call_api('language', $params);
      echo sprintf("Text: %s\n", $language->text);
      echo sprintf("Language: %s (%F)\n", $language->lang, $language->confidence);
      break;
    }
    case "sentiment":
    {
      $params = array('text' => 'John is a very good football player!');
      $sentiment = call_api('sentiment', $params);
      echo sprintf("\nText: %s\n", $sentiment->text);
      echo sprintf("Sentiment: %s (%F)\n", $sentiment->polarity, $sentiment->polarity_confidence);
      break;
    }
    case "classify":
    {
      echo sprintf("\nClassification:\n");
      $params = array('url' => $url);
      $classify = call_api('classify', $params);
      foreach ($classify->categories as $val) {
        echo sprintf("Label        :   %s\n", $val->label);
        echo sprintf("IPTC code    :   %s\n", $val->code);
        echo sprintf("Confidence   :   %F\n", $val->confidence);
      }
      break;
    }
    case "hashtags":
    {
      echo sprintf("\nHashtags:\n");
      $params = array('url' => $url);
      $hashtags = call_api('hashtags', $params);
      foreach ($hashtags->hashtags as $val) {
        echo sprintf("%s\n", $val);
      }
      break;
    }
  }
}
?>

Introduction

The automatic classification of documents is an example of how Machine Learning (ML) and Natural Language Processing (NLP) can be leveraged to enable machines to better understand human language. By classifying text, we are aiming to assign one or more classes or categories to a document or piece of text, making it easier to manage and sort the documents. Manually categorizing and grouping text sources can be extremely laborious and time-consuming, especially for publishers, news sites, blogs or anyone who deals with a lot of content.

Broadly speaking, there are two classes of ML techniques: supervised and unsupervised. In supervised methods, a model is created based on previous observations i.e. a training set. In the case of document classification, categories are predefined and a training dataset of documents is manually tagged as part of a category. Following the creation of a training dataset, a classifier is trained on the manually tagged dataset. The idea being that, the classifier will then be able to predict any given document’s category from then on.

Unsupervised ML techniques differ in that they do not require a training dataset and, in the case of documents, the categories are not known in advance. Unsupervised techniques such as Clustering and Topic Modelling are used to automatically discover groups of similar documents within a collection of documents. In this blog, we are going to concentrate on supervised methods of classification.

What a Classifier does

Classifiers make “predictions”; that is their job. In layman’s terms, when a classifier is fed a new document to classify, it predicts that the document belongs to a particular class or category, and often returns or assigns a category “label” for the document. Depending on the classification algorithm or strategy used, the classifier might also provide a confidence measure indicating how confident it is that the classification label is correct. To explain how a classifier works, it is probably best to illustrate with a simple example.

How a Classifier works

As we mentioned, classification is about prediction. Take a simple example of predicting whether or not a football game will go ahead. First we want to create a dataset. To do this, we would track the outside temperature and whether or not it rained on each game night over the course of a year, building up a dataset of weather conditions. We could then “tag” this dataset with information about whether or not the game went ahead, creating a training dataset for future predictions.

In this case, we have two “features”, temperature and rain, to help us predict whether the game will be played or not, as illustrated in the table below. On any new match night, we could then reference our table and use it to help us predict whether or not a game would go ahead. In this simple case, if the temperature is below zero and it is raining (or snowing!) then there is a good chance that the game will be cancelled.

 

Temp (Degrees C)   Rain   Play?
15                 No     Yes
23                 Yes    Yes
-6                 Yes    No
-6                 No     Yes

 

In the table above, each column is called a “feature”, the “Play?” column is referred to as a “class” or “label” and the rows are called “instances”. These instances can be thought of as data points, which could be represented as a vector, as shown below:

<feature1, feature2,…, featureN>
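The match-night example above can be sketched in a few lines of Go. This is a minimal illustration, not production code: the training rows come from the table above, and the prediction is a simple nearest-match lookup with an arbitrary distance penalty for a mismatched rain feature.

```go
package main

import "fmt"

// An instance from the match-night dataset: two weather features plus a class label.
type instance struct {
	temp float64 // temperature in degrees C
	rain bool    // did it rain?
	play bool    // the class label: did the game go ahead?
}

// The training set from the table above.
var training = []instance{
	{15, false, true},
	{23, true, true},
	{-6, true, false},
	{-6, false, true},
}

// predict classifies a new match night by finding the most similar
// past instance (a simple nearest-neighbour lookup).
func predict(temp float64, rain bool) bool {
	best, bestDist := training[0], 1e9
	for _, inst := range training {
		dist := inst.temp - temp
		if dist < 0 {
			dist = -dist
		}
		if inst.rain != rain {
			dist += 10 // arbitrary penalty when the rain feature differs
		}
		if dist < bestDist {
			best, bestDist = inst, dist
		}
	}
	return best.play
}

func main() {
	// A cold, rainy night most closely resembles the {-6, Yes, No} row.
	fmt.Println(predict(-4, true))
}
```

A real classifier would generalize from many instances rather than copying the label of the single closest row, but the idea is the same: features in, predicted class out.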

A simple Illustration of Document Classification

If we apply a similar methodology to documents we can use the words within a document as the “features” to help us predict the classification of the document. Again, using a simple example:

In this example, we have three very short documents in our training set as shown below:

 

Reference Document Class 1:   Some tigers live in the zoo
Reference Document Class 2:   Green is a color
Reference Document Class 3:   Go to New York city

 

We would start by taking all of the words across the three documents in our training set and creating a table or vector from these words.

<some,tigers,live,in,the,zoo,green,is,a,color,go,to,new,york,city> class

Then for each of the training documents, we would create a vector by assigning a 1 if the word exists in the training document and a 0 if it doesn’t, tagging the document with the appropriate class as follows.

 

some tigers live in the zoo green is a color go to new york city
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 class 1
0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 class 2
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 class 3

 

When a new untagged document arrives for classification and it contains the words “Orange is a color” we would create a word vector for it by marking the words which exist in our classification vector.

 

some tigers live in the zoo green is a color go to new york city
0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 unknown class

 

If we then compare this vector for the document of unknown class to the vectors representing our three document classes, we would see that it most closely resembles the vector for class 2 documents.

Comparison of the unknown document class with class 1 (6 matching terms)

< 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 > class 1

< 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0> Unknown class

Comparison of the unknown document class with class 2 (14 matching terms – winner!!)

< 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0>class 2

< 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0> Unknown class

Comparison of the unknown document class with class 3 (7 matching terms)

< 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1 > class 3

< 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0> Unknown class

It is then possible to label the new document as a class 2 document with an adequate degree of confidence. This is a very simple but common example of a statistical Natural Language Processing method.
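The word-vector comparison described above is easy to implement. The following Go sketch (our own illustration, not a library you would use in practice) builds binary vectors over the fifteen-word vocabulary and picks the class whose reference vector agrees with the new document in the most positions:

```go
package main

import (
	"fmt"
	"strings"
)

// The vocabulary built from every word in the three reference documents.
var vocab = strings.Fields("some tigers live in the zoo green is a color go to new york city")

// vectorize turns a document into a binary word vector over the vocabulary.
func vectorize(doc string) []int {
	words := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(doc)) {
		words[w] = true
	}
	vec := make([]int, len(vocab))
	for i, w := range vocab {
		if words[w] {
			vec[i] = 1
		}
	}
	return vec
}

// matches counts the positions where two vectors agree.
func matches(a, b []int) int {
	n := 0
	for i := range a {
		if a[i] == b[i] {
			n++
		}
	}
	return n
}

// classify returns the class whose reference vector most closely
// resembles the document's vector.
func classify(doc string, refs map[string]string) string {
	v := vectorize(doc)
	best, bestScore := "", -1
	for class, ref := range refs {
		if s := matches(v, vectorize(ref)); s > bestScore {
			best, bestScore = class, s
		}
	}
	return best
}

func main() {
	refs := map[string]string{
		"class 1": "Some tigers live in the zoo",
		"class 2": "Green is a color",
		"class 3": "Go to New York city",
	}
	// "Orange is a color" shares is/a/color with class 2 (14 matching positions).
	fmt.Println(classify("Orange is a color", refs))
}
```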

A more detailed look at real world document classification

A real world classifier has three components to it and we will look at each of these components individually to explain in a little bit more detail how a classifier works.

1. The dataset

As we demonstrated above, a statistical method of classification requires a collection of documents which have been manually tagged with their appropriate category. The quality of this dataset is by far the most important component of a statistical NLP classifier.

The dataset needs to be large enough to have an adequate number of documents in each class. For example if you wished to classify documents into 500 possible categories you may require 100 documents per category so a total of at least 50,000 documents would be required.

The dataset also needs to be of a high enough quality in terms of how distinct the documents in the different categories are from each other to allow clear delineation between the categories.

2. Preprocessing

In our simple examples, we have given equal importance to each and every word when creating document vectors. We could do some preprocessing and decide to give different weighting to words based on their importance to the document in question. A common methodology used to do this is TF-IDF (term frequency – inverse document frequency). The TF-IDF weighting for a word increases with the number of times the word appears in the document but decreases based on how frequently the word appears across the entire document set. This has the effect of giving a lower overall weighting to words which occur more frequently in the document set such as “a”, “it”, etc.
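A basic TF-IDF weighting can be sketched as follows. This is one common formulation (raw term count times the log of total documents over document frequency); real systems vary in how they normalize and smooth these terms.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tfidf returns, for each document, a map from word to TF-IDF weight:
// the word's count in that document, scaled down by how many
// documents in the whole collection contain the word.
func tfidf(docs []string) []map[string]float64 {
	tokenized := make([][]string, len(docs))
	df := map[string]int{} // document frequency of each word
	for i, d := range docs {
		tokenized[i] = strings.Fields(strings.ToLower(d))
		seen := map[string]bool{}
		for _, w := range tokenized[i] {
			if !seen[w] {
				seen[w] = true
				df[w]++
			}
		}
	}
	weights := make([]map[string]float64, len(docs))
	for i, words := range tokenized {
		tf := map[string]float64{} // term frequency within this document
		for _, w := range words {
			tf[w]++
		}
		weights[i] = map[string]float64{}
		for w, f := range tf {
			idf := math.Log(float64(len(docs)) / float64(df[w]))
			weights[i][w] = f * idf
		}
	}
	return weights
}

func main() {
	docs := []string{
		"a tiger lives in a zoo",
		"green is a color",
		"a trip to the zoo",
	}
	w := tfidf(docs)
	// "a" appears in every document, so its IDF is log(3/3) = 0,
	// while a distinctive word like "tiger" keeps a positive weight.
	fmt.Println(w[0]["a"], w[0]["tiger"] > 0)
}
```

Note how the ubiquitous word “a” is weighted to zero while “tiger”, which appears in only one document, keeps a positive weight, exactly the effect described above.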

3. Classification Algorithm and Strategy

In our example above, the algorithm we used to classify our document was very simple. We classified the document by comparing the number of matching terms in the document vectors to see which class it most closely resembled. In reality, we may be placing documents into more than one category type and we may also be assigning multiple labels to a document within a given category type. We may also have a hierarchical structure in our taxonomy, and therefore require a classifier that takes that into account.

For example, using IPTC International Subject News Codes to assign labels, we may give a document two labels simultaneously, such as “sports event – World Cup” and “sport – soccer”, with “sports” and “sports event” being the root categories and “soccer” and “World Cup” being the child categories.

There are numerous algorithms used in classification, such as Support Vector Machines (SVMs), Naive Bayes and Decision Trees, the details of which are beyond the scope of this blog.

Conclusion

We hope that you now have a better understanding of the basics of document classification and how it works. As a recap, in supervised methods, a model is created based on a training set. A classifier is then trained on this manually tagged training dataset and is expected to predict any given document’s category from then on. The biggest factor affecting the quality of these predictions is the quality of the training data set. Keep an eye on our blog for more in the “Text Analysis 101” series.











At AYLIEN, we’re constantly working to improve and enhance our product offering. We have a long list of features and enhancements that we’re working through and adding to every day, and a lot of them focus on one key aspect which we feel is pivotal to our success: Developer Experience.

Part of our “mission” (insert fluffy corporate mission here) is to bring text analytics to the everyday man. To do this, we realised our API needs to be super easy to use and extremely easy to integrate with. On top of that, our documentation needs to be clear and simple, our support needs to be on point, and above all we need to be “developer focused”.

We want our API to be all of the following:

  • Simple
  • Hackable
  • Self-service
  • Developer Focused

One of the KPIs we track quite closely is “time to first hello world” (TTFHW), a nice idea we came across in a recent SlideShare by John Musser of ProgrammableWeb.

With all of this in mind, we are constantly iterating on our sign-up process to ensure it’s as easy as possible for developers to harness the power of our API. One of the initiatives we have been working on recently is providing SDKs for popular languages, so our users can get up and running with the API faster.

As Parsa, our founder put it; “We believe that many industries and businesses could benefit from NLP and Text Analytics, and we’ve made it our mission to make it easier for everyone to tap into these technologies to create smarter and more efficient tools and applications that will change the way their industry works. Our SDKs are an important step in this direction, and enable any developer to add the smarts of AYLIEN’s Text Analysis API to their applications.”

Our first batch of SDKs focus on the most common languages used by our users and can be downloaded from our GitHub Repo.

We plan on adding SDKs for Java, C# and Go, and we should have these ready early in the new year. Keeping in line with our focus on developer experience, we also plan on launching a series of sample apps and a sandbox coding area for developers to test the API in. Happy Hacking!

 











 


This edition, the 5th in the series of our “Getting up and running with AYLIEN Text Analysis API” blogs, will focus on working with the API using C#. Previously we published code snippets and getting started guides for Node.js, PowerShell, Python, Java and Ruby.

As we did in our previous blogs, we’re going to perform some basic Text Analysis like detecting what language a piece of text is written in, analyzing the sentiment of a piece of text, classifying an article and finally generating some hashtags for a URL. The idea here is to make it as easy as possible for developers to get up and running with our API and to showcase how easy it is to get started in your chosen language.

We’re first going to look at the code in action. We’ll then go through the code, section by section, to investigate each of the endpoints used in the code snippet.

Here’s what we’re going to do:

  • Detect what language the following text is written in: “What language is this sentence written in?”
  • Analyze the sentiment of the following statement: “John is a very good football player!”
  • Generate a classification (IPTC Code and label) for the URL: “http://www.bbc.com/earth/story/20141110-earths-magnetic-field-flips-more”
  • Generate hashtags for the URL: “http://www.bbc.com/earth/story/20141110-earths-magnetic-field-flips-more”

Note: The getting started page on our website has a range of code snippets in other programming languages you can use if you would prefer not to use C#.

Overview of the code in action

The complete code snippet is given at the end of this blog for you to copy and paste. To get it running, open a text editor of your choice and copy and paste the snippet. Before running the code, make sure you replace the YOUR_APP_ID and YOUR_APP_KEY placeholders in the code with your own application id and application key. You would have been sent these when you signed up as an API user. If you haven’t signed up you can make your way to our sign up page to register for free.

Save the file as TextAPISample.cs and open a developer command prompt. Navigate to the folder where you saved the code snippet and compile the code by typing “csc TextAPISample.cs”. You can now run the executable by typing TextAPISample at the command prompt.

Once you run it, you should receive the following output:


C:\src\GettingStarted>TextAPISample


Text     : What Language is this sentence written in?
Language : en (0.9999971593789524)


Text     : John is a very good football player!
Sentiment: positive (0.9999988272764874)


Classification:

Label        :   natural science - geology
IPTC code    :   13004001
Confidence   :   1.0


Hashtags:

#EarthsMagneticField
#RockMusic
#SantaCruzCalifornia
#Lava
#California
#Earth
#Convection
#Iron
#Helsinki
#SpaceColonization
#HistoryOfTheEarth
#Granite
#Finland
#DynamoTheory
#Compass
#SouthMagneticPole
#Hematite
#SolarRadiation

In this case we have detected that the first piece of text is written in English and the sentiment or polarity of the second statement is positive. We have also generated hashtags for the URL and classified the content.

Next, we’ll go through the code snippet, section by section.

Language Detection

Using the Language Detection endpoint you can analyze a piece of text or a URL and automatically determine what language it is written in. In the code used in this blog, the “parameters” variable controls whether the call is made with the text specified directly or as a URL.


parameters.text = "What Language is this sentence written in?";
var language = CallApi("language", parameters);

In this case we have specified that we want to analyze the text “What language is this sentence written in?” and, as you can see from the output below, the API determined that the text is written in English, with a confidence score of 0.999998 that the language was detected correctly. For all of the endpoints, the API returns the text which was analyzed for reference, and we have included it in the results in each case.

Result:


Text     : What Language is this sentence written in?
Language : en (0.999997599332592)
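If you are on modern .NET, where System.Web.Script.Serialization is not available, the same response can be parsed with System.Text.Json instead. This is a minimal sketch, assuming the response uses the "lang" and "confidence" field names suggested by the output above rather than anything taken from official documentation:

```csharp
using System.Text.Json;

static class LanguageResponse
{
    // Parse the JSON shape suggested by the printed results above.
    // Field names "lang" and "confidence" are assumptions.
    public static (string Lang, double Confidence) Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        return (root.GetProperty("lang").GetString(),
                root.GetProperty("confidence").GetDouble());
    }
}
```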

Sentiment Analysis

Similarly, the Sentiment Analysis endpoint takes a piece of text or a URL and analyzes it to determine whether it is positive, negative or neutral. It also provides a confidence score in the results.


parameters.text = "John is a very good football player!";
var sentiment = CallApi("sentiment", parameters);

Here we analyze the text “John is a very good football player!”. The API has determined that the sentiment of the text is positive, and we can be pretty confident it is correct based on the returned confidence score of 0.999999.

Result:


Text     : John is a very good football player!
Sentiment: positive (0.9999988272764874)
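In a real application you will usually want to act on the confidence score rather than just print it. A minimal sketch of one way to do that; the 0.7 threshold is an arbitrary illustrative choice, not an API recommendation:

```csharp
static class SentimentInterpretation
{
    // Fall back to "uncertain" when polarity_confidence is below
    // a threshold of our own choosing (0.7 here, purely illustrative).
    public static string Interpret(string polarity, double confidence,
        double threshold = 0.7)
    {
        return confidence >= threshold ? polarity : "uncertain";
    }
}
```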

Hashtag Suggestions

The Hashtag Suggestion endpoint analyzes a URL and generates a list of hashtag suggestions which can be used to ensure that your content or URLs are shared optimally on social media:


parameters.url = "http://www.bbc.com/earth/story/20141110-earths-magnetic-field-flips-more";
var hashtags = CallApi("hashtags", parameters);

For hashtag suggestions, we have used an article about changes in the orientation of the Earth’s magnetic field, published on the BBC news website: “http://www.bbc.com/earth/story/20141110-earths-magnetic-field-flips-more”. The endpoint first extracts the text from the URL, then analyzes that text and generates hashtag suggestions for it.

Result:


Hashtags:
#EarthsMagneticField
#RockMusic
#SantaCruzCalifornia
#Lava
#California
#Earth
#Convection
#Iron
#Helsinki
#SpaceColonization
#HistoryOfTheEarth
#Granite
#Finland
#DynamoTheory
#Compass
#SouthMagneticPole
#Hematite
#SolarRadiation
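Notice that the suggestions come back in a CamelCase, space-free style. Purely to illustrate that output convention (this is not how the endpoint works internally), here is a sketch that formats a phrase the same way:

```csharp
using System;
using System.Linq;

static class HashtagFormat
{
    // Illustrative only: join a phrase into the CamelCase hashtag
    // style shown in the results above.
    public static string ToHashtag(string phrase)
    {
        var words = phrase.Split(new[] { ' ', '-' },
            StringSplitOptions.RemoveEmptyEntries);
        return "#" + string.Concat(
            words.Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
    }
}
```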

Article Classification

The Classification endpoint automatically assigns or tags an article or piece of text to one or more categories, making it easier to manage and sort. The classification is based on IPTC International Subject News Codes and can identify up to 500 categories.


parameters.url = "http://www.bbc.com/earth/story/20141110-earths-magnetic-field-flips-more";
var classify = CallApi("classify", parameters);

When we analyze the URL pointing to the BBC news story, we receive the results shown below. As you can see, it has labelled the article as “natural science - geology” with a corresponding IPTC code of 13004001 and a confidence of 1.0.

Result:


Classification:

Label        :   natural science - geology
IPTC code    :   13004001
Confidence   :   1.0
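The endpoint can return more than one category, each with its own confidence, so you may want to pick the best match. A small sketch of that idea; the Category record below is a hypothetical shape mirroring the label/code/confidence fields in the result, not a type from our SDK:

```csharp
using System.Collections.Generic;
using System.Linq;

static class ClassificationHelpers
{
    // Hypothetical record mirroring the fields printed above.
    public record Category(string Label, string Code, double Confidence);

    // Pick the most confident category when several are returned.
    public static Category Top(IEnumerable<Category> categories) =>
        categories.OrderByDescending(c => c.Confidence).First();
}
```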

For more getting started guides and code snippets to help you get up and running with our API, visit our getting started page on our website. If you haven’t already done so you can get free access to our API on our sign up page.

The Complete Code Snippet


using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Dynamic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Web;
using System.Web.Script.Serialization;

namespace AylienTextAPI
{
    class Program
    {
        static void Main(string[] args)
        {
            var url = "http://www.bbc.com/earth/story/20141110-earths-magnetic-field-flips-more";
            string[] endpoints = new string[] {"language", "sentiment", "classify", "hashtags"};

            foreach (var endpoint in endpoints)
            {
                switch (endpoint)
                {
                    case "language":
                    {
                        dynamic parameters = new System.Dynamic.ExpandoObject();
                        parameters.text = "What Language is this sentence written in?";
                        var language = CallApi("language", parameters);
                        Console.WriteLine("\nText     : {0}", language["text"]);
                        Console.WriteLine("Language : {0} ({1})",
                            language["lang"], language["confidence"]);
                        break;
                    }
                    case "sentiment":
                    {
                        dynamic parameters = new System.Dynamic.ExpandoObject();
                        parameters.text = "John is a very good football player!";
                        var sentiment = CallApi("sentiment", parameters);
                        Console.WriteLine("\nText     : {0}", sentiment["text"]);
                        Console.WriteLine("Sentiment: {0} ({1})",
                            sentiment["polarity"], sentiment["polarity_confidence"]);
                        break;
                    }
                    case "classify":
                    {
                        dynamic parameters = new System.Dynamic.ExpandoObject();
                        parameters.url = url;
                        var classify = CallApi("classify", parameters);
                        Console.Write("\nClassification: \n");
                        foreach (var item in classify["categories"])
                        {
                            Console.WriteLine("Label        :   {0}", item["label"].ToString());
                            Console.WriteLine("IPTC code    :   {0}", item["code"].ToString());
                            Console.WriteLine("Confidence   :   {0}", item["confidence"].ToString());
                        }
                        break;
                    }
                    case "hashtags":
                    {
                        dynamic parameters = new System.Dynamic.ExpandoObject();
                        parameters.url = url;
                        var hashtags = CallApi("hashtags", parameters);
                        Console.Write("\nHashtags: \n");
                        foreach (var item in hashtags["hashtags"])
                        {
                            Console.WriteLine(item.ToString());
                        }
                        break;
                    }
                }
            }
        }

        private static dynamic CallApi(String endpoint, dynamic parameters)
        {
            String APPLICATION_ID = "YOUR_APP_ID";
            String APPLICATION_KEY = "YOUR_APP_KEY";
            Uri address = new Uri("https://api.aylien.com/api/v1/" +
                endpoint);

            // Create the web request
            HttpWebRequest request = WebRequest.Create(address) as
                HttpWebRequest;

            // Set type to POST
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";

            // Create the data we want to send
            StringBuilder data = new StringBuilder();
            var i = 0;
            foreach (var item in parameters)
            {
                if (i == 0)
                    data.Append(item.Key + "=" +
                        HttpUtility.UrlEncode(item.Value));
                else
                    data.Append("&" + item.Key + "=" +
                        HttpUtility.UrlEncode(item.Value));

                i++;
            }

            // Create a byte array of the data we want to send
            byte[] byteData = UTF8Encoding.UTF8.GetBytes(data.ToString());

            // Set the content length in the request headers
            request.ContentLength = byteData.Length;
            request.Accept = "application/json";
            request.Headers.Add("X-AYLIEN-TextAPI-Application-ID",
                APPLICATION_ID);
            request.Headers.Add("X-AYLIEN-TextAPI-Application-Key",
                APPLICATION_KEY);

            // Write data
            using (Stream postStream = request.GetRequestStream())
            {
                postStream.Write(byteData, 0, byteData.Length);
            }

            // Get response
            using (HttpWebResponse response = request.GetResponse()
                as HttpWebResponse)
            {
                // Get the response stream
                StreamReader reader =
                    new StreamReader(response.GetResponseStream());

                // Serialize to JSON dynamic object
                var serializer = new JavaScriptSerializer();
                serializer.RegisterConverters(new[] { new DynamicJsonConverter() });
                return serializer.Deserialize(reader.ReadToEnd(),
                    typeof(object));
            }
        }
    }

    // http://stackoverflow.com/a/3806407/1455811
    public sealed class DynamicJsonConverter : JavaScriptConverter
    {
        public override object Deserialize(
            IDictionary<string, object> dictionary,
            Type type, JavaScriptSerializer serializer)
        {
            if (dictionary == null)
                throw new ArgumentNullException("dictionary");

            return type == typeof(object) ?
                new DynamicJsonObject(dictionary) : null;
        }

        public override IDictionary<string, object>
            Serialize(object obj, JavaScriptSerializer serializer)
        {
            throw new NotImplementedException();
        }

        public override IEnumerable<Type> SupportedTypes
        {
            get { return new ReadOnlyCollection<Type>(
                new List<Type>(new[] { typeof(object) })); }
        }

        #region Nested type: DynamicJsonObject

        private sealed class DynamicJsonObject : DynamicObject
        {
            private readonly IDictionary<string, object> _dictionary;

            public DynamicJsonObject(IDictionary<string, object> dictionary)
            {
                if (dictionary == null)
                    throw new ArgumentNullException("dictionary");
                _dictionary = dictionary;
            }

            public override string ToString()
            {
                var sb = new StringBuilder("{");
                ToString(sb);
                return sb.ToString();
            }

            private void ToString(StringBuilder sb)
            {
                var firstInDictionary = true;
                foreach (var pair in _dictionary)
                {
                    if (!firstInDictionary)
                        sb.Append(",");
                    firstInDictionary = false;
                    var value = pair.Value;
                    var name = pair.Key;
                    if (value is string)
                    {
                        sb.AppendFormat("{0}:\"{1}\"", name, value);
                    }
                    else if (value is IDictionary<string, object>)
                    {
                        new DynamicJsonObject((IDictionary<string, object>)
                            value).ToString(sb);
                    }
                    else if (value is ArrayList)
                    {
                        sb.Append(name + ":[");
                        var firstInArray = true;
                        foreach (var arrayValue in (ArrayList)value)
                        {
                            if (!firstInArray)
                                sb.Append(",");
                            firstInArray = false;
                            if (arrayValue is IDictionary<string, object>)
                                new DynamicJsonObject(
                                    (IDictionary<string, object>)
                                    arrayValue).ToString(sb);
                            else if (arrayValue is string)
                                sb.AppendFormat("\"{0}\"", arrayValue);
                            else
                                sb.AppendFormat("{0}", arrayValue);

                        }
                        sb.Append("]");
                    }
                    else
                    {
                        sb.AppendFormat("{0}:{1}", name, value);
                    }
                }
                sb.Append("}");
            }

            public override bool TryGetMember(GetMemberBinder binder,
                out object result)
            {
                if (!_dictionary.TryGetValue(binder.Name, out result))
                {
                    // return null to avoid exception.
                    // caller can check for null this way...
                    result = null;
                    return true;
                }

                result = WrapResultObject(result);
                return true;
            }

            public override bool TryGetIndex(GetIndexBinder binder,
                object[] indexes, out object result)
            {
                if (indexes.Length == 1 && indexes[0] != null)
                {
                    if (!_dictionary.TryGetValue(indexes[0].ToString(),
                        out result))
                    {
                        // return null to avoid exception.
                        // caller can check for null this way...
                        result = null;
                        return true;
                    }

                    result = WrapResultObject(result);
                    return true;
                }

                return base.TryGetIndex(binder, indexes, out result);
            }

            private static object WrapResultObject(object result)
            {
                var dictionary = result as IDictionary<string, object>;
                if (dictionary != null)
                    return new DynamicJsonObject(dictionary);

                var arrayList = result as ArrayList;
                if (arrayList != null && arrayList.Count > 0)
                {
                    return arrayList[0] is IDictionary<string, object>
                        ? new List<object>(arrayList
                            .Cast<IDictionary<string, object>>()
                            .Select(x => new DynamicJsonObject(x)))
                        : new List<object>(arrayList.Cast<object>());
                }

                return result;
            }
        }

        #endregion
    }
}







