
### Introduction

We recently added a feature to our API that allows users to classify text according to their own labels. This unsupervised method of classification relies on Explicit Semantic Analysis in order to determine how closely matched a piece of text and a label or tag are.

This method of classification provides greater flexibility when classifying text and doesn’t rely on a particular taxonomy to understand and categorize a piece of text.

Explicit Semantic Analysis (ESA) works at the level of meaning rather than on the surface-form vocabulary of a word or document. ESA represents the meaning of a piece of text as a combination of the concepts found in the text. It is used in document classification, semantic relatedness calculation (i.e. how similar in meaning two words or pieces of text are to each other) and information retrieval.

In document classification, for example, documents are tagged to make them easier to manage and sort. Tagging a document with keywords makes it easier to find. However, keyword tagging alone has its limitations: searches carried out using vocabulary with a similar meaning but different actual words may not uncover relevant documents. Classifying text semantically, i.e. representing the document as concepts and lowering the dependence on specific keywords, can greatly improve a machine’s understanding of text.

### How is Explicit Semantic Analysis achieved?

Wikipedia is a large and diverse knowledge base where each article can be considered a distinct concept. In Wikipedia-based ESA, a concept is generated for each article. Each concept is then represented as a vector of the words which occur in the article, weighted by their tf-idf score.

The meaning of any given word can then be represented as a vector of that word’s relatedness, or “association weighting”, to the Wikipedia-based concepts.

“word” ---> <concept1, weight1>, <concept2, weight2>, <concept3, weight3>, ...


A trivial example might be:

“Mars” ---> <planet, 0.90>, <Solar system, 0.85>, <jupiter, 0.30>, ...
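To make the word-to-concept mapping more concrete, here is a minimal sketch that builds tf-idf weighted concept vectors from three tiny stand-in "articles" and then inverts them into word vectors. The article texts, the function names and the exact tf-idf variant (term frequency times `log(N / df)`) are assumptions made for illustration; a real ESA index is built from the whole of Wikipedia and may weight terms differently.

```javascript
// Toy "articles": each title stands in for one Wikipedia concept (hypothetical data).
const articles = {
  Planet: "mars jupiter planet orbit sun",
  "Solar system": "sun planet orbit mars comet",
  Explorer: "explorer pioneer expedition voyage",
};

// Count how many articles contain each word (document frequency).
function documentFrequencies(docs) {
  const df = {};
  for (const text of Object.values(docs)) {
    for (const word of new Set(text.split(/\s+/))) {
      df[word] = (df[word] || 0) + 1;
    }
  }
  return df;
}

// Build word -> { concept: tf-idf weight }, the inverted view described above.
function buildWordVectors(docs) {
  const df = documentFrequencies(docs);
  const n = Object.keys(docs).length;
  const wordVectors = {};
  for (const [concept, text] of Object.entries(docs)) {
    const words = text.split(/\s+/);
    for (const word of new Set(words)) {
      const tf = words.filter((w) => w === word).length / words.length;
      const idf = Math.log(n / df[word]);
      // Words that occur in every article get a weight of 0 and are skipped.
      if (tf * idf > 0) {
        wordVectors[word] = wordVectors[word] || {};
        wordVectors[word][concept] = tf * idf;
      }
    }
  }
  return wordVectors;
}

const wordVectors = buildWordVectors(articles);
console.log(wordVectors.mars);     // weights for the "Planet" and "Solar system" concepts
console.log(wordVectors.explorer); // a weight for the "Explorer" concept only
```

In this toy corpus "mars" ends up associated with the two astronomy concepts and with nothing else, which is the behaviour the “Mars” example describes.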


By comparing two word vectors (using cosine similarity), we can get a numerical value for the semantic relatedness of the words, i.e. we can quantify how similar the words are to each other based on their association weighting to the various concepts.

Note: In Text Analysis a vector is simply a numerical representation of a word or document. It is easier for algorithms to work with numbers than with characters. Additionally, vectors can be plotted graphically and the “distance” between them is a visual representation of how closely related in terms of meaning words and documents are to each other.
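As a rough illustration of the comparison step, the sketch below computes the cosine similarity of sparse word vectors stored as plain objects. The weights for “Mars” and “explorer” reuse the illustrative numbers from this post; the “Venus” vector is invented for the example and is not real ESA output.

```javascript
// Cosine similarity between two sparse vectors stored as { concept: weight } objects.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const key of Object.keys(a)) {
    if (Object.prototype.hasOwnProperty.call(b, key)) dot += a[key] * b[key];
  }
  const norm = (v) => Math.sqrt(Object.values(v).reduce((s, x) => s + x * x, 0));
  const denom = norm(a) * norm(b);
  // Two vectors with no weights at all are treated as unrelated.
  return denom === 0 ? 0 : dot / denom;
}

// Illustrative weights only.
const mars = { planet: 0.9, "Solar system": 0.85, jupiter: 0.3 };
const venus = { planet: 0.88, "Solar system": 0.8, goddess: 0.4 };
const explorer = { adventurer: 0.89, pioneer: 0.7, vehicle: 0.2 };

console.log(cosineSimilarity(mars, venus));    // close to 1: many shared concepts
console.log(cosineSimilarity(mars, explorer)); // 0: no shared concepts
```

Because only shared concepts contribute to the dot product, two words that never co-occur with the same concepts score exactly 0, while a vector compared with itself scores 1.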

### Explicit Semantic Analysis and Documents

Larger documents are represented as a combination of individual word vectors derived from the words within a document. The resultant document vectors are known as “concept” vectors. For example, a concept vector might look something like the following:

“Mars”     ---> <planet, 0.90>, <Solar system, 0.85>, <jupiter, 0.30>, ...
“explorer” ---> <adventurer, 0.89>, <pioneer, 0.70>, <vehicle, 0.20>, ...
...
“wordn”    ---> <conceptb, weightb>, <conceptd, weightd>, <conceptp, weightp>, ...


Graphically, we can represent a concept vector as the centroid of the word vectors it is composed of, i.e. the center or average position of those vectors.

So, to compare how similar two phrases are we can create their concept vectors from their constituent word vectors and then compare the two, again using cosine similarity.
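The phrase comparison described above can be sketched as follows: average the word vectors of each phrase into a centroid, then compare the centroids with cosine similarity. The tiny word-vector table, the whitespace tokenizer and the function names are assumptions for this sketch only.

```javascript
// Average a list of sparse word vectors into one concept vector (the centroid).
function centroid(vectors) {
  const sum = {};
  for (const vec of vectors) {
    for (const [concept, weight] of Object.entries(vec)) {
      sum[concept] = (sum[concept] || 0) + weight;
    }
  }
  for (const concept of Object.keys(sum)) sum[concept] /= vectors.length;
  return sum;
}

function cosineSimilarity(a, b) {
  let dot = 0;
  for (const key of Object.keys(a)) {
    if (Object.prototype.hasOwnProperty.call(b, key)) dot += a[key] * b[key];
  }
  const norm = (v) => Math.sqrt(Object.values(v).reduce((s, x) => s + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical word vectors; a real system would look these up in an ESA index.
const wordVectors = new Map([
  ["mars", { planet: 0.9, "Solar system": 0.85 }],
  ["explorer", { adventurer: 0.89, pioneer: 0.7, vehicle: 0.2 }],
  ["rover", { vehicle: 0.8, planet: 0.3 }],
]);

// Build a phrase's concept vector from the words we have vectors for.
function phraseVector(phrase) {
  const known = phrase.toLowerCase().split(/\s+/).filter((w) => wordVectors.has(w));
  return centroid(known.map((w) => wordVectors.get(w)));
}

const marsExplorer = phraseVector("Mars explorer");
const marsRover = phraseVector("Mars rover");
console.log(marsExplorer);
console.log(cosineSimilarity(marsExplorer, marsRover));
```

Words without a known vector are simply ignored here; how unknown words are handled in practice is a design choice this sketch does not try to settle.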

### ESA and Dataless Classification

This functionality is particularly useful when you want to classify a document, but you don’t want to use a known taxonomy. It allows you to specify on the fly a proprietary taxonomy on which to base the classification. You provide the text to be classified as well as potential labels and through ESA it is determined which label is most closely related to your piece of text.
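A minimal sketch of that label-selection step, assuming the text and each candidate label have already been turned into concept vectors: score every label against the text and sort best-first. The vectors, the function name `classifyByRelatedness` and the `{ label, score }` result shape are illustrative assumptions, not a description of how the API is implemented.

```javascript
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const key of Object.keys(a)) {
    if (Object.prototype.hasOwnProperty.call(b, key)) dot += a[key] * b[key];
  }
  const norm = (v) => Math.sqrt(Object.values(v).reduce((s, x) => s + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Score every candidate label against the text and sort best-first.
function classifyByRelatedness(textVector, labelVectors) {
  return Object.entries(labelVectors)
    .map(([label, vec]) => ({ label, score: cosineSimilarity(textVector, vec) }))
    .sort((x, y) => y.score - x.score);
}

// Hypothetical concept vectors for a piece of text and three user-supplied labels.
const textVector = {
  "American football": 0.7,
  "National Football League": 0.9,
  Quarterback: 0.5,
};
const labelVectors = {
  football: { "American football": 0.8, "National Football League": 0.6, "Association football": 0.7 },
  baseball: { "Major League Baseball": 0.9, Pitcher: 0.6 },
  hockey: { "National Hockey League": 0.9, "Ice hockey": 0.8 },
};

const ranked = classifyByRelatedness(textVector, labelVectors);
console.log(ranked[0].label); // the label most closely related to the text
console.log(ranked);
```

No training data is involved at any point: changing the taxonomy only means supplying a different set of label vectors.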

### Summary

ESA operates at the level of concepts and meaning rather than just the surface form vocabulary. As such, it can improve the accuracy of document classification, information retrieval and semantic relatedness.

If you would like to know more about this topic check out this excellent blog from Christopher Olah and this very accessible research paper from Egozi, Markovitch and Gabrilovich, both of which I referred to heavily when researching this blog post.

Keep an eye out for more in our “Text Analysis 101” series.

If you’re new to AYLIEN and don’t yet have an account, you can take a look at our blog on getting started with the API, or you can go directly to the getting started page on our website, which will take you through the signup process. We provide a free plan which allows you to make up to 1,000 calls per day to the API.

All of our SDK repositories are hosted on GitHub. To use the SDK, start by making the following addition to your composer.json.


{
  "require": {
    "aylien/textapi": "0.1.*"
  }
}


Once you’ve installed the SDK you’re ready to start coding! For the remainder of this blog we’ll walk you through making calls using the PHP SDK and show the output you should receive in each case while showcasing a few simple features of the API.

### Configuring the SDK with your AYLIEN credentials

Once you’ve subscribed to our API and have downloaded the SDK you can start making calls by adding the following PHP code.


require __DIR__ . "/vendor/autoload.php";
$textapi = new AYLIEN\TextAPI("YourApplicationId", "YourApplicationKey");


When calling the API you can pass a piece of text directly to the API for analysis or you can pass a URL and we will automatically extract the main piece of text on that webpage.

### Language Detection

Let’s take a look at the Language Detection endpoint. The Language Detection endpoint is quite straightforward. It will automatically tell you what language a piece of text is written in. In this example we’ll detect the language of the following sentence: ‘What language is this sentence written in?’

You can call the endpoint using the following piece of code.


$text = "What language is this sentence written in?";
$language = $textapi->Language(array("text" => $text));
echo sprintf("Text: %s <br/>", $language->text);
echo sprintf("Language: %s <br/>", $language->lang);
echo sprintf("Confidence: %F <br/>", $language->confidence);


You should receive an output similar to the one shown below, which shows that the language detected was English. The confidence that it was detected correctly (a number between 0 and 1) is very close to 1, indicating that you can be pretty sure it is correct.

#### Language Detection Results


Text: What language is this sentence written in?
Language: en
Confidence: 0.999997


### Sentiment Analysis

Next, we’ll look at analyzing the sentence “John is a very good football player” to determine its sentiment, i.e. positive, neutral or negative. The endpoint will also determine if the text is subjective or objective. You can call the endpoint with the following piece of code.


$text = "John is a very good football player!";
$sentiment = $textapi->Sentiment(array("text" => $text));
echo sprintf(" <br/>Text: %s <br/>", $sentiment->text);
echo sprintf("Sentiment Polarity: %s <br/>", $sentiment->polarity);
echo sprintf("Polarity Confidence: %F <br/>", $sentiment->polarity_confidence);


You should receive an output similar to the one shown below, which indicates that the sentence is positive with a high degree of confidence.

#### Sentiment Analysis Results


Text: John is a very good football player!
Sentiment Polarity: positive
Polarity Confidence: 0.999999


### Article Classification

Now we’re going to take a look at our Classification endpoint. The Classification endpoint automatically assigns an article or piece of text to one or more categories, making it easier to manage and sort. The classification is based on IPTC International Subject News Codes and can identify up to 500 categories. The code below analyses a BBC news article about how animals evolved on Earth.


$url = "http://www.bbc.com/earth/story/20150112-did-snowball-earth-make-animals";
echo sprintf("<br/>Classification:<br/>");
$classify = $textapi->Classify(array("url" => $url));
echo sprintf("URL: %s", $url);
foreach ($classify->categories as $val) {
    echo sprintf("<br/>Label        :   %s     ", $val->label);
    echo sprintf("<br/>IPTC code : %s ", $val->code);
    echo sprintf("<br/>Confidence   :   %F     ", $val->confidence);
}


When you run this code you should receive an output similar to that shown below, which assigns the article an IPTC label of “natural science – geology” with an IPTC code of 13004001.

#### Article Classification Results


Classification:
URL: http://www.bbc.com/earth/story/20150112-did-snowball-earth-make-animals
Label : natural science - geology
IPTC code : 13004001
Confidence : 1.000000


### Hashtag Analysis

Next, we’ll look at analyzing the same BBC article to extract hashtag suggestions for sharing the article on social media with the following code.


echo sprintf("<br/><br/>Hashtags:<br/>");
echo sprintf("URL: %s", $url);
$hashtags = $textapi->Hashtags(array("url" => $url));
foreach ($hashtags->hashtags as $val) {
    echo sprintf(" <br/> %s", $val);
}


You should receive the output shown below.

#### Hashtag Suggestion Results


Hashtags:
URL: http://www.bbc.com/earth/story/20150112-did-snowball-earth-make-animals
#SnowballEarth
#Oxygen
#Earth
#Evolution
#CarbonDioxide
#Glacier
#GlacialPeriod
#OperationDeepFreeze
#CambrianExplosion ....


If you’re more of a Node or Java fan, check out the rest of our SDKs for node.js, Go, Ruby, Python, Java and .Net (C#). For more information regarding the APIs, go to the documentation section of our website.

We’re publishing ‘getting started’ blogs for the remaining SDKs over the coming weeks so keep an eye out for them.

One of our API users, Farhad from Taskulu, recently published a blog post on how he uses the AYLIEN Text Analysis API for lead generation on Twitter.

As Farhad puts it himself: “There are many people out there literally asking you to introduce your product to them so they can become your customers, but the problem is that it’s very difficult to find them!”

The idea behind the app he created is simple: discover, understand and get involved in conversations about management platforms on Twitter, as they happen.

For those of you who want to try it out, they’ve published the code for generating Tweeleads on GitHub. You can also find the documentation and instructions for setting it up on the GitHub page.

The whole process doesn’t take longer than 5 minutes! If you want to give it a go, you’ll need to do the following:

Farhad Hedayati is co-founder and CEO of Taskulu, a management platform that lets you keep all the stakeholders and resources of your project inside one project, making project managers’ lives, team management and communication a lot easier. He regularly writes about growth hacking, marketing techniques, startups and, occasionally, programming. You can follow him on Twitter: @farhad_hf

If you have an interesting application for Text Analysis or our API then let us know. We love to see our users hack simple and useful apps together off our API.

We’ve just added support for microformat parsing to our Text Analysis API through our Microformat Extraction endpoint.

Microformats are simple conventions or entities that are used on web pages, to describe a specific type of information, for example, Contact info, Reviews, Products, People, Events, etc.

Microformats are often included in the HTML of pages on the web to add semantic information about that page. They make it easier for machines and software to scan, process and understand webpages. AYLIEN Microformat Extraction allows users to detect, parse and extract embedded Microformats when they are present on a page.
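To give a feel for what such embedded information looks like, the sketch below takes a small hCard-style HTML fragment (hCard marks up fields with class names such as `vcard`, `fn`, `org` and `email`) and pulls a few fields out with a deliberately naive regular expression. The fragment, the helper name and the field-name mapping are assumptions for illustration; this is not how the endpoint is implemented, and real pages call for a proper HTML parser.

```javascript
// A small HTML fragment marked up with hCard class names (hypothetical sample).
const html = `
<div class="vcard">
  <span class="fn">Sally Ride</span>
  <span class="org">Sally Ride Science</span>
  <a class="email" href="mailto:sally@example.com">sally@example.com</a>
</div>`;

// Map hCard class names to friendlier output keys (illustrative mapping).
const fieldNames = { fn: "fullName", org: "organization", email: "email" };

// Naive extraction: find the element whose class attribute equals the field
// name and capture its text content. Good enough for a sketch only.
function extractHCard(markup, fields) {
  const card = {};
  for (const [className, outputKey] of Object.entries(fields)) {
    const pattern = new RegExp('class="' + className + '"[^>]*>([^<]*)<');
    const match = markup.match(pattern);
    if (match) card[outputKey] = match[1].trim();
  }
  return card;
}

const card = extractHCard(html, fieldNames);
console.log(card);
```

The point is only that the class names carry the meaning: software that knows the hCard convention can tell a name from an organisation without understanding the page layout.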

Currently, the API supports the hCard format. We will be providing support for the other formats over the coming months. The quickest way to get up and running with this endpoint is to download an SDK and check out the documentation. We have gone through a simple example below to showcase the endpoint’s capabilities.

### Microformat Extraction in Action

The following piece of code sets up the credentials for accessing our API. If you don’t have an AYLIEN account, you can sign up here.


var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_APP_ID',
  application_key: 'YOUR_APP_KEY'
});



The next piece of code accesses an HTML test page containing microformats, which we have set up on CodePen to illustrate how the endpoint works (check out http://codepen.io/michaelo/pen/VYxxRR.html to see the raw HTML). The code consists of a call to the microformats endpoint and a forEach statement to display any hCards detected on the page.


textapi.microformats('http://codepen.io/michaelo/pen/VYxxRR.html',
  function(err, res) {
    if (err !== null) {
      console.log("Error: " + err);
    } else {
      // Print each hCard found on the page, followed by a separator.
      res.hCards.forEach(function(hCard) {
        console.log(hCard);
        console.log("\n****************************************");
        console.log("End Of vCard");
        console.log("******************************************");
      });
    }
  });



As you can see from the results below, there are two hCards on the page, one for Sally Ride and the other for John Glenn. The documentation for the endpoint shows the structure of the data returned by the endpoint and lists the optional hCard fields that are currently supported. You can copy the code above and paste it into our sandbox environment to view the results for yourself and play around with the various fields.

#### Results


{ birthday: '1951-05-26',
organization: 'Sally Ride Science',
telephoneNumber: '+1.818.555.1212',
location:
{ id: '9f15e27ff48eb28c57f49fb177a1ed0af78f93ab',
latitude: '37.386013',
longitude: '-122.082932' },
photo: 'http://example.com/sk.jpg',
email: 'sally@example.com',
url: 'http://sally.example.com',
fullName: 'Sally Ride',
structuredName:
{ familyName: 'van der Harten',
givenName: 'Sally',
honorificSuffix: 'Ph.D.',
honorificPrefix: 'Dr.' },
logo: 'http://www.abc.com/pub/logos/abccorp.jpg',
id: '7d021199b0d826eef60cd31279037270e38715cd',
note: '1st American woman in space.',
{ streetAddress: '123 Main st.',
countryName: 'U.S.A',
postalCode: 'LWT12Z',
id: '00cc73c1f9773a66613b04f11ce57317eecf636b',
region: 'California',
locality: 'Los Angeles' },
category: 'physicist' }

****************************************
End Of vCard
****************************************

{ birthday: '1921-07-18',
telephoneNumber: '+1.818.555.1313',
location:
latitude: '30.386013',
longitude: '-123.082932' },
photo: 'http://example.com/jg.jpg',
email: 'johnglenn@example.com',
url: 'http://john.example.com',
fullName: 'John Glenn',
structuredName:
{ familyName: 'Glenn',
givenName: 'John',
id: 'a1146a5a67d236f340c5e906553f16d59113a417',
honorificPrefix: 'Senator' },
logo: 'http://www.example.com/pub/logos/abccorp.jpg',
id: '18538282ee1ac00b28f8645dff758f2ce696f8e5',
note: '1st American to orbit the Earth',
{ streetAddress: '456 Main st.',
countryName: 'U.S.A',
postalCode: 'PC123',
id: '8cc940d376d3ddf77c6a5938cf731ee4ac01e128',
region: 'Ohio',
locality: 'Columbus' } }

****************************************
End Of vCard
****************************************



Microformats Extraction allows you to automatically scan and understand webpages by pulling relevant information from HTML. This microformat information is easier for both humans and now machines to understand than other complex forms such as XML.

Our development team have been working hard adding additional features to the API which allow our users to analyze, classify and tag text in more flexible ways. Unsupervised Classification is a feature we are really excited about and we’re happy to announce that it is available as a fully functional and documented feature, as of today.

#### So what exactly is Unsupervised Classification?

It’s a training-less approach to classification, which means that, unlike our standard classification, which is based on IPTC News Codes, it doesn’t rely on a predefined taxonomy to categorize text. This method of classification allows automatic tagging of text that can be tailored to a user’s needs, without the need for a pre-trained classifier.

#### Why are we so excited about it?

Our Unsupervised Classification endpoint will allow users to specify a set of labels, analyze a piece of text and then assign the most appropriate label to that text. This gives our users greater flexibility to decide how they want to tag and classify text.

There are a number of ways this endpoint can be used and we’ll walk you through a couple of simple examples: Text Classification from a URL and Customer Service Routing of social interactions.

### Classification of Text

We’ll start with a simple example to show how the feature works. The user passes a piece of text or a URL to the API, along with a number of labels. In the case below we want to find out which label, Football, Baseball, Hockey or Basketball, best represents the following article: ‘http://insider.espn.go.com/nfl/story/_/id/12300361/bold-move-new-england-patriots-miami-dolphins-new-york-jets-buffalo-bills-nfl’

#### Code Snippet:


var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YourAppId',
  application_key: 'YourAppKey'
});

var params = {
  url: 'http://insider.espn.go.com/nfl/story/_/id/12300361/bold-move-new-england-patriots-miami-dolphins-new-york-jets-buffalo-bills-nfl',
  'class': ['basketball', 'baseball', 'football', 'hockey']
};

textapi.unsupervisedClassify(params, function(error, response) {
  if (error !== null) {
    console.log(error, response);
  } else {
    console.log("\nThe text to classify is : \n\n",
      response.text, "\n");
    // Print every candidate label with its relatedness score.
    for (var i = 0; i < response.classes.length; i++) {
      console.log("label - ", response.classes[i].label,
        ", score -", response.classes[i].score, "\n");
    }
  }
});


#### Results:


The text to classify is:

"Each NFL team's offseason is filled with small moves and marginal personnel decisions... "

label -  football , score - 0.13

label -  baseball , score - 0.042

label -  hockey , score - 0.008

label -  basketball , score - 0.008


Based on the scores provided, we can confidently say that the article is about football and should be assigned a “Football” label.

### Customer Service Routing

As another example, let’s say we want to automatically determine whether a post on social media should be routed to our Sales, Marketing or Support Departments. In this example, we’ll take the following comment: “I’d like to place an order for 1000 units.” and automatically determine whether it should be dealt with by Sales, Marketing or Support. To do this, we pass the text to the API as well as our pre-chosen labels, in this case: ‘Sales’, ‘Customer Support’, ‘Marketing’.

#### Code Snippet:


var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YourAppId',
  application_key: 'YourAppKey'
});

var params = {
  text: "I'd like to place an order for 1000 units.",
  'class': ['Sales', 'Customer Support', 'Marketing']
};

textapi.unsupervisedClassify(params, function(error, response) {
  if (error !== null) {
    console.log(error, response);
  } else {
    console.log("\nThe text to classify is : \n\n",
      response.text, "\n");
    // Print every department label with its relatedness score.
    for (var i = 0; i < response.classes.length; i++) {
      console.log("label - ",
        response.classes[i].label,
        ", score -", response.classes[i].score, "\n");
    }
  }
});


#### Results:


The text to classify is:

I'd like to place an order for 1000 units.

label -  Sales , score - 0.032

label -  Customer Support , score - 0.008

label -  Marketing , score - 0.002


Similarly, based on the scores showing how closely the text is semantically matched to each label, we can decide that this inquiry should be handled by a sales agent rather than marketing or support.
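If you want to turn scores like these into an automatic routing decision, one possible approach is sketched below: take the top-scoring label, but fall back to a default queue when the best two scores are too close to call. The input mirrors the `response.classes` array used in the snippets above; the function name, the `"Manual review"` fallback and the margin of 0.01 are assumptions chosen for illustration, not recommended values.

```javascript
// Pick the top label from a list shaped like `response.classes`,
// falling back when the winner's lead over the runner-up is too small.
function routeByTopLabel(classes, fallback, minMargin) {
  const ranked = classes.slice().sort((a, b) => b.score - a.score);
  if (ranked.length === 0) return fallback;
  const margin = ranked.length > 1
    ? ranked[0].score - ranked[1].score
    : ranked[0].score;
  return margin >= minMargin ? ranked[0].label : fallback;
}

// Scores taken from the results shown above.
const classes = [
  { label: "Sales", score: 0.032 },
  { label: "Customer Support", score: 0.008 },
  { label: "Marketing", score: 0.002 },
];

console.log(routeByTopLabel(classes, "Manual review", 0.01));

// A made-up ambiguous case: the two best labels are tied.
const ambiguous = [
  { label: "Sales", score: 0.008 },
  { label: "Customer Support", score: 0.008 },
];
console.log(routeByTopLabel(ambiguous, "Manual review", 0.01));
```

Since the raw scores are small, comparing them with each other is likely to be more useful than comparing them with a fixed absolute cut-off.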

### Divide and Conquer

Our next example deals with the idea of using the unsupervised classification feature with a hierarchical taxonomy. When classifying text, it’s sometimes necessary to add a sub-label for finer-grained classification, for example “Sports – Basketball” instead of just “Sports”.

So, in this example we’re going to analyze a simple piece of text, “The oboe is a woodwind musical instrument”, and we’ll attempt to provide a more descriptive classification result based on the following taxonomy:

• ‘music’: [‘Instrument’, ‘composer’],
• ‘technology’: [‘computers’, ‘space’, ‘physics’],
• ‘health’: [‘disease’, ‘medicine’, ‘fitness’],
• ‘sport’: [‘football’, ‘baseball’, ‘basketball’]

The taxonomy has primary labels and secondary labels, for example ‘music’ (primary) and ‘Instrument’, ‘composer’ (secondary).

#### Code Snippet:


var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YourAppId',
  application_key: 'YourAppKey'
});

var _ = require('underscore');
var taxonomy = {
  'music':      ['Instrument', 'composer'],
  'technology': ['computers', 'space', 'physics'],
  'health':     ['disease', 'medicine', 'fitness'],
  'sport':      ['football', 'baseball', 'basketball']
};

var topClasses = ['technology', 'music', 'health', 'sport'];
var queryText = "The oboe is a woodwind musical instrument.";
var params = {
  text: queryText,
  'class': topClasses
};

// First pass: find the best primary label.
textapi.unsupervisedClassify(params, function(error, response) {
  if (error !== null) {
    console.log(error, response);
  } else {
    var classificationResult = '';
    console.log("\nThe text to classify is : \n\n",
      response.text, "\n");
    classificationResult = response.classes[0].label +
      " (" + response.classes[0].score + ") ";
    // Second pass: classify again using only the sub-labels of the winner.
    params = {
      text: queryText,
      'class': _.values(
        _.pick(taxonomy, response.classes[0].label)
      )[0]
    };
    textapi.unsupervisedClassify(params,
      function(error, response) {
        if (error !== null) {
          console.log(error, response);
        } else {
          classificationResult += " - " +
            response.classes[0].label +
            " (" + response.classes[0].score +
            ") ";
          console.log("Label: ", classificationResult);
        }
      }
    );
  }
});


#### Results:


The text to classify is :

The Obo is a large musical instrument

Label    :     music (0.076)  - Instrument (0.342)


As you can see from the results, the piece of text has been assigned ‘music’ as its primary label and ‘instrument’ as its secondary label.
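The same two-pass idea can be written independently of any particular client, which makes it easier to experiment with locally. In the sketch below, `classify(text, labels)` is a hypothetical async function that resolves to `[{ label, score }, ...]` sorted best-first (for example, a promisified wrapper around an API call), and `stubClassify` is a stand-in that only checks whether the label appears inside the text. Neither is part of the SDK, and the stub is not a real relatedness measure.

```javascript
// Two-pass classification: pick the best primary label, then the best of its sub-labels.
async function classifyHierarchically(text, taxonomy, classify) {
  const [primary] = await classify(text, Object.keys(taxonomy));
  const [secondary] = await classify(text, taxonomy[primary.label]);
  return { primary, secondary };
}

// Stand-in classifier for local experimentation (hypothetical): a label scores 1
// when it appears as a substring of the text, otherwise 0.
async function stubClassify(text, labels) {
  const lower = text.toLowerCase();
  return labels
    .map((label) => ({ label, score: lower.includes(label.toLowerCase()) ? 1 : 0 }))
    .sort((a, b) => b.score - a.score);
}

const taxonomy = {
  music: ["Instrument", "composer"],
  technology: ["computers", "space", "physics"],
  health: ["disease", "medicine", "fitness"],
  sport: ["football", "baseball", "basketball"],
};

classifyHierarchically("The oboe is a woodwind musical instrument.", taxonomy, stubClassify)
  .then((result) => {
    console.log(result.primary.label + " - " + result.secondary.label);
  });
```

Passing the classifier in as a parameter keeps the traversal logic separate from the scoring, so the stub can later be swapped for a real call without touching `classifyHierarchically`.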

All the code snippets in our examples are fully functional and can be copied and pasted or tested in our sandbox. We’ll also be adding some of these and more interesting apps to our sandbox over the next week or so that will showcase some interesting use cases for Unsupervised Classification. We’d also love to hear more about how you would use this feature, so don’t hesitate to get in touch with comments or feedback.

If you’ve visited our blog before, you’ll know we like to analyze social chatter surrounding major events around the world and try to visualize our findings in interesting ways. In the past, we’ve collected Tweets from the FIFA World Cup and Apple Live, and most recently we decided to analyze the public reaction on Twitter to Super Bowl XLIX.

Not surprisingly, Super Bowl XLIX generated a huge amount of chatter on social networks, with Twitter estimating that over 28.4 million posts were made with terms relating to the Super Bowl.

At AYLIEN, we collected just under 4 million Tweets from the hashtags, handles and keywords we were monitoring. To keep our sample clean, we removed any retweets and spam from the Tweets collected and only worked with those Tweets that were written in English. We were left with about 3.5 million Tweets to play with. Our idea was to collect a sample of Tweets, run them through our Text Analysis API and visualise the results in some interactive graphs.
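A cleaning step of this kind might look roughly like the sketch below, which drops retweets and keeps only English Tweets. The field names (`text`, `lang`, `isRetweet`), the sample Tweets and the `RT @` prefix check are assumptions made for the example; spam filtering is left out, and this is not the pipeline that was actually used.

```javascript
// Drop retweets and non-English tweets from a list of tweet-like objects.
// `lang` is assumed to hold a language code produced by a language detection step.
function cleanTweets(tweets) {
  return tweets.filter(
    (t) => !t.isRetweet && !/^RT @/.test(t.text) && t.lang === "en"
  );
}

// Hypothetical sample data.
const tweets = [
  { text: "What a catch! #SB49", lang: "en", isRetweet: false },
  { text: "RT @fan: What a catch! #SB49", lang: "en", isRetweet: false },
  { text: "¡Qué partido!", lang: "es", isRetweet: false },
  { text: "Go Pats", lang: "en", isRetweet: true },
];

const cleaned = cleanTweets(tweets);
console.log(cleaned.length + " of " + tweets.length + " tweets kept");
```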

### Tools:

#### Data Collection:

• Twitter Streaming API
• RapidMiner

#### Data Analysis:

• AYLIEN Text Analysis API
    • Language Detection
    • Entity Extraction
    • Sentiment Analysis
• Tableau

### Location

First of all, we looked at where most of the Twitter activity was coming from. Not surprisingly, the most activity was coming from the US. Europeans were also quite active and by the looks of it were happy to suffer from lack of sleep at work on the Monday in order to stay up and experience the event live.

#### Location

The Super Bowl chatter, however, did spread far and wide. There was even a Tweet posted during the game by somebody in Antarctica!

### Total Volume

Next, we analysed the volume of Tweets over time. We hoped to see how major events before, during and immediately after the game affected how vocal fans were. Lo and behold, it worked.

#### Major Events

On top, we have the total volume of Tweets and in the lower graph we have displayed the Tweets related to certain entities, teams, players, coaches and of course Katy Perry.

What’s interesting here is that you can see exactly when the activity kicked off for the pre-game coverage. You can imagine fans settling down for the game and voicing their opinions and predictions on Twitter via their phone or tablet. Throughout the game, there were 3 major spikes in activity: just before the kickoff, at halftime and at the turning point of the game when the Patriots went ahead 28-24.
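A volume-over-time series like this comes down to counting Tweets per time bucket and then looking for the busiest buckets. The sketch below shows one way to do that; the timestamps are invented, the one-minute bucket size is arbitrary and the function name is hypothetical. The charts in this post were built in Tableau, not with this code.

```javascript
// Count items per fixed-size time bucket, returning buckets in time order.
function volumeByBucket(timestamps, bucketMs) {
  const counts = new Map();
  for (const ts of timestamps) {
    const start = Math.floor(ts / bucketMs) * bucketMs;
    counts.set(start, (counts.get(start) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([start, count]) => ({ start, count }));
}

// Hypothetical tweet timestamps in milliseconds since the start of collection.
const minute = 60 * 1000;
const timestamps = [0, 10000, 59999, 60000, 185000];

const volume = volumeByBucket(timestamps, minute);
console.log(volume);

// The busiest bucket is a simple stand-in for "a spike in activity".
const peak = volume.reduce((best, b) => (b.count > best.count ? b : best));
console.log("Busiest minute starts at " + peak.start + " ms with " + peak.count + " tweets");
```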

### Mentions

By far the most mentions went to none other than…Katy Perry, who headlined the much-anticipated halftime show. What does this tell us about the Superbowl fans? They love pop music?

Half time shows aside, there were some interesting constants and spikes in mentions as the game developed. References to the much loved Tom Brady were pretty constant throughout the game. Pete Carroll, however, only really featured with a spike in activity towards the end of the game (I wonder why that was?). Not immediately after the game but once the dust had settled and reality sunk in, there was a huge spike of activity mentioning the Seahawks, perhaps from disgruntled fans expressing their frustration or maybe even loyal fans reiterating their support for a team who were narrowly defeated. Somehow, I doubt it was the latter.

So while the spikes in mentions are interesting, what is more insightful is the context of the Tweets: whether these are posts made in support of teams and individuals or the opposite.

### Sentiment Analysis: Positive and Negative

One of the more interesting visualizations we produced was the sentiment intensity graph below:

#### Sentiment Intensity

This displays the polarity (positive or negative) of Tweets which mentioned either team. Perhaps the most interesting event on this graph is the extreme swing in polarity of Tweets mentioning the Seahawks. Right when the Patriots took the lead, the polarity of posts mentioning the Seahawks went from slightly negative to “very negative”. It’s also pretty clear from the drop in positivity that Seahawks fans reacted very negatively to handing over their lead to their opponents.

We were also interested in how the polarity of Tweets mentioning certain entities developed throughout the game. We displayed some of the most interesting ones below by focusing on the teams, key players and Pete Carroll.

#### Sentiment Polarity

Tweets mentioning the Patriots had very little negativity associated with them, which was also evident in our first sentiment graph. From this, we can assume that either Patriots fans have a lot more faith in their team or the Patriots are generally liked a lot more by football fans with no tie to either team. Either way, they had a much larger following of supporters than their opponents. Another interesting aspect of this visualisation is how the “hero” Tom Brady stayed in the positive range throughout, while sentiment toward Marshawn Lynch, and Pete Carroll especially, plummeted after the game as fans voiced their opinions on Carroll’s Super Bowl-losing decision. Opting not to run the ball with Marshawn Lynch and going for the touchdown pass instead was a decision that cost him dearly.

These days, the Super Bowl is as much about the ads and halftime show as it is about the football. Before the game, we decided to track a few of the bigger name brands to try and get a feel for who won the ads battle.

#### Brand Mentions

Of the 4 brands we followed, Budweiser’s #lostdog campaign dominated, with more than 5 times the mentions on Twitter of the other brands. We also tracked viewers’ reactions to the advertisements by again analyzing the sentiment of Tweets that referenced the brand.

While Budweiser had the most mentions, they also had the strongest positive reaction to the ad, as shown below. However, the same can’t be said for T-Mobile’s ad with Kim Kardashian, which was very poorly received by Super Bowl fans. But you know what they say, bad publicity is good publicity.

#### Brand Sentiment

This is the fourth in our series of blogs on getting started with our various SDKs. Depending on what your language preference is, we have SDKs available for Node.js, Python, Ruby, PHP, Go, Java and .Net (C#). Last week’s blog focused on using our Go SDK. This week we’ll focus on getting up and running with Ruby.

If you are new to our API and you don’t have an account, you can go directly to the Getting Started page on our website, which will take you through the signup process. You can sign up to a free plan to get started, which allows you to make up to 1,000 calls per day to the API for free.

All of our SDK repositories are hosted on GitHub, and you can get the Ruby repository here. The simplest way to install the SDK is by using “gem”.

$ gem install aylien_text_api


### Configuring the SDK with your AYLIEN credentials

Once you have received your AYLIEN APP_ID and APP_KEY from the signup process and have downloaded the SDK you can start making calls by passing your configuration parameters as a block to AylienTextApi.configure.

require 'aylien_text_api'

AylienTextApi.configure do |config|
  config.app_id  = "YOUR_APP_ID"
  config.app_key = "YOUR_APP_KEY"
end
client = AylienTextApi::Client.new


Alternatively, you can pass them as parameters to the AylienTextApi::Client class.

require 'aylien_text_api'

client = AylienTextApi::Client.new(app_id: "YOUR APP ID", app_key: "YOUR APP KEY")


When calling the various API endpoints you can specify a piece of text directly for analysis or you can pass a url linking to the text or article you wish to analyze.

### Language Detection

First, let’s take a look at the Language Detection endpoint. Specifically, we will detect the language of the sentence ‘What language is this sentence written in?’

You can call the endpoint using the following piece of code.

text = "What language is this sentence written in?"
language = client.language text: text
puts "Text: #{language[:text]}"
puts "Language: #{language[:lang]}"
puts "Confidence: #{language[:confidence]}"


You should receive an output very similar to the one shown below, which shows that the language detected was English. The confidence that it was detected correctly (a number between 0 and 1) is very close to 1, indicating that you can be pretty sure it is correct.

#### Language Detection Results

Text: What language is this sentence written in?
Language: en
Confidence: 0.9999962069593649


### Sentiment Analysis

Next, we will look at analyzing the sentence “John is a very good football player” to determine its sentiment, i.e. positive, neutral or negative. The endpoint will also determine if the text is subjective or objective. You can call the endpoint with the following piece of code.

text = "John is a very good football player!"
sentiment = client.sentiment text: text
puts "Text            :  #{sentiment[:text]}"
puts "Sentiment Polarity  :  #{sentiment[:polarity]}"
puts "Polarity Confidence  :  #{sentiment[:polarity_confidence]}"
puts "Subjectivity  :  #{sentiment[:subjectivity]}"
puts "Subjectivity Confidence  :  #{sentiment[:subjectivity_confidence]}"


You should receive an output similar to the one shown below, which indicates that the sentence is objective and is positive, both with a high degree of confidence.

#### Sentiment Analysis Results

Text            :  John is a very good football player!
Sentiment Polarity  :  positive
Polarity Confidence  :  0.9999988272764874
Subjectivity  :  objective
Subjectivity Confidence  :  0.9896821594138254


### Article Classification

Next we will take a look at the classification endpoint. The Classification endpoint automatically assigns an article or piece of text to one or more categories making it easier to manage and sort. The classification is based on IPTC International Subject News Codes and can identify up to 500 categories. The code below analyses a BBC news article about mega storms on the planet Uranus.

url = "http://www.bbc.com/earth/story/20150121-mega-storms-sweep-uranus"
classify = client.classify url: url
classify[:categories].each do |cat|
  puts "Label : #{cat[:label]}"
  puts "Code : #{cat[:code]}"
  puts "Confidence : #{cat[:confidence]}"
end


When you run this code you should receive an output similar to that shown below, which assigns the article an IPTC label of “natural science – astronomy” with an IPTC code of 13004007.

#### Article Classification Results

Label : natural science - astronomy
Code : 13004007
Confidence : 1.0


### Hashtag Analysis

Next we will look at analyzing the same BBC article to extract hashtag suggestions for sharing the article on social media with the following code.

url = "http://www.bbc.com/earth/story/20150121-mega-storms-sweep-uranus"
hashtags = client.hashtags url: url
hashtags[:hashtags].each do |str|
  puts str
end


You should receive the output shown below.

#### Hashtag Suggestion Results

If Ruby isn’t your preferred language then check out our SDKs for node.js, Go, PHP, Python, Java and .Net (C#). For more information regarding the APIs go to the documentation section of our website.

We will be publishing ‘getting started’ blogs for the remaining languages over the coming weeks so keep an eye out for them. If you haven’t already done so you can get free access to our API on our sign up page.