
This is the fourth post in our series of blogs on getting started with AYLIEN’s various SDKs. There are SDKs available for Node.js, Python, Ruby, PHP, Go, Java and .Net (C#). For this week’s instalment we’re going to focus on C#.

If you are new to AYLIEN and Text Analysis and you do not have an account yet, you can take a look at our blog on how to get started with the API, or alternatively go directly to our Getting Started page, which will take you through the signup process. We provide a free plan to get started, which allows users to make up to 1,000 calls per day to the API for free.

Downloading and Installing the C# SDK

All of our SDK repositories are hosted on GitHub. You can find the C# repository here. The simplest way to install the SDK is with the NuGet package manager. Simply type the following from a command line tool.


nuget install Aylien.TextApi

Alternatively, from Visual Studio, choose “Manage NuGet Packages” under the “Project” menu and search for the AYLIEN package under online packages.

Once you have installed the SDK you’re ready to start coding. The Sandbox area of the website has a number of sample applications in Node.js which help to demonstrate what the APIs can do. In the remainder of this blog we will walk you through making calls using the C# SDK and show the output you should receive in each case.

Configuring the SDK with your AYLIEN credentials

Once you have received your AYLIEN APP_ID and APP_KEY from the signup process and you have downloaded the SDK, you can start making calls by adding the AYLIEN namespace to your C# code.


using Aylien.TextApi;
using System;

And initialising a client with your AYLIEN credentials


Client client = new Client("YOUR_APP_ID", "YOUR_APP_KEY");

When calling the various API endpoints you can specify whether you want to analyze a piece of text directly or a URL linking to the text or article you wish to analyze.
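For example, to run language detection on a hosted article rather than a raw string, you could pass the url parameter instead of text. This is a minimal sketch; it assumes Language accepts a url argument in the same way Classify and Hashtags do later in this post.


// Sketch: analyze a URL instead of a raw piece of text.
Language language = client.Language(url: "http://www.bbc.com/news/uk-scotland-glasgow-west-30944584");
Console.WriteLine("Language: {0}", language.Lang);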

Language Detection

First let’s take a look at the language detection endpoint by analyzing the following sentence: ‘What language is this sentence written in?’

You can call this endpoint using the following piece of code.


Language language = client.Language(text: "What language is this sentence written in?");
Console.WriteLine("Text: {0}", language.Text);
Console.WriteLine("Language: {0}", language.Lang);
Console.WriteLine("Confidence: {0}", language.Confidence);

You should receive an output very similar to the one shown below, which shows the language detected as English along with a confidence score. The confidence score is very close to 1, so you can be pretty sure it’s correct.

Language Detection Results


Text: What language is this sentence written in?
Language: en
Confidence: 0.9999982

Sentiment Analysis

Next, we’ll look at analyzing the sentence “John is a very good football player!” to determine its sentiment, i.e. whether it’s positive, neutral or negative. The endpoint will also determine if the text is subjective or objective. You can call the endpoint with the following piece of code.


Sentiment sentiment = client.Sentiment(text: "John is a very good football player!");
Console.WriteLine("Text: {0}", sentiment.Text);
Console.WriteLine("Sentiment Polarity  : {0}", sentiment.Polarity);
Console.WriteLine("Polarity Confidence  : {0}", sentiment.PolarityConfidence);
Console.WriteLine("Subjectivity  : {0}", sentiment.Subjectivity);
Console.WriteLine("Subjectivity Confidence  : {0}", sentiment.SubjectivityConfidence);

You should receive an output similar to the one shown below. This indicates that the sentence is objective and is positive, both with a high degree of confidence.

Sentiment Analysis Results


Text: John is a very good football player!
Sentiment Polarity  : positive
Polarity Confidence  : 0.999998827276487
Subjectivity  : objective
Subjectivity Confidence  : 0.989682159413825

Article Classification

We’re now going to take a look at the Classification endpoint. The Classification endpoint automatically assigns an article or piece of text to one or more categories making it easier to manage and sort. Our classification is based on IPTC International Subject News Codes and can identify up to 500 categories. The code below analyses a BBC news article about scientists who have managed to slow down the speed of light.


Classify classify = client.Classify(url: "http://www.bbc.com/news/uk-scotland-glasgow-west-30944584");
Console.Write("\nClassification:\n");
foreach (var item in classify.Categories)
{
    Console.WriteLine("Label        :   {0}", item.Label.ToString());
    Console.WriteLine("IPTC code    :   {0}", item.Code.ToString());
    Console.WriteLine("Confidence   :   {0}", item.Confidence.ToString());
}

When you run this code you should receive an output similar to that shown below which assigns the article an IPTC label of “applied science – particle physics” with an IPTC code of 13001004.

Article Classification Results


Classification:
Label        :   applied science - particle physics
IPTC code    :   13001004
Confidence   :   0.9877892

Hashtag Analysis

Next, we’ll analyze the same BBC article and extract hashtag suggestions for sharing it on social media.


Hashtags hashtags = client.Hashtags(url: "http://www.bbc.com/news/uk-scotland-glasgow-west-30944584");
Console.Write("\nHashtags:\n");
foreach (var item in hashtags.HashtagsMember)
{
    Console.WriteLine(item.ToString());
}

You should receive the output shown below.

Hashtag Suggestion Results


Hashtags:
#Glasgow
#HeriotWattUniversity
#Scotland
#Moon
#QuantumRealm
#LiquidCrystal
#Tie
#Bicycle
#Wave-particleDuality
#Earth
#Physics

Check out our SDKs for Node.js, Go, PHP, Python, Java and Ruby if C# isn’t your preferred language. For more information regarding the APIs, go to the documentation section of our website.


Last week’s getting started blog focused on the Python SDK. This week we’re going to focus on using the API with Go. This is the third in our series of blogs on getting started with AYLIEN’s various SDKs. You can access all our SDK repositories on GitHub.

If you are new to our API and Text Analysis in general and you don’t have an account you can go directly to the Getting Started page on the website which will take you through how to open an account. You can choose a free plan to get started, which allows you to make up to 1,000 calls per day to the API for free.

Downloading and Installing the Go SDK

The simplest way to install the repository is with “go get”. Simply type the following from a command line tool.


$ go get github.com/AYLIEN/aylien_textapi_go

Utilizing the SDK with your AYLIEN credentials

Once you’ve subscribed to our API and have downloaded the SDK you can start making calls by adding the following code to your Go program.


import (
    "fmt"

    textapi "github.com/AYLIEN/aylien_textapi_go"
)

auth := textapi.Auth{"YOUR_APP_ID", "YOUR_APP_KEY"}
client, err := textapi.NewClient(auth, true)
if err != nil {
    panic(err)
}

When calling the API you can specify whether you wish to analyze a piece of text directly or a URL linking to the text or article you wish to analyze.
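For example, to point an endpoint at a hosted article you would set the URL field of the params struct instead of Text. This is a minimal sketch; it assumes LanguageParams exposes a URL field in the same way ClassifyParams and HashtagsParams do later in this post.


// Sketch: analyze a URL instead of a raw piece of text.
languageParams := &textapi.LanguageParams{URL: "http://www.bbc.com/earth/story/20150114-the-biggest-fruit-in-the-world"}
lang, err := client.Language(languageParams)
if err != nil {
    panic(err)
}
fmt.Printf("Language: %s\n", lang.Language)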

Language Detection

We’re going to first showcase the Language Detection endpoint by analyzing the sentence “What language is this sentence written in?” using the following piece of code.


languageParams := &textapi.LanguageParams{Text: "What language is this sentence written in?"}
lang, err := client.Language(languageParams)
if err != nil {
    panic(err)
}
fmt.Printf("\nLanguage Detection Results\n")
fmt.Printf("Text            :   %s\n", lang.Text)
fmt.Printf("Language        :   %s\n", lang.Language)
fmt.Printf("Confidence      :   %f\n\n", lang.Confidence)

You should receive an output very similar to the one shown below, which shows the language detected was English and that the confidence it was detected correctly (a number between 0 and 1) is very close to 1, which means you can be pretty sure it is correct.

Language Detection Results


Text            :   What language is this sentence written in?
Language        :   en
Confidence      :   0.999997

Sentiment Analysis

Next we’ll look at analyzing the following short piece of text, “John is a very good football player!”, to determine its sentiment, i.e. whether it’s positive, neutral or negative.


sentimentParams := &textapi.SentimentParams{Text: "John is a very good football player!"}
sentiment, err := client.Sentiment(sentimentParams)
if err != nil {
    panic(err)
}
fmt.Printf("Sentiment Analysis Results\n")
fmt.Printf("Text            :   %s\n", sentiment.Text)
fmt.Printf("Sentiment Polarity  :   %s\n", sentiment.Polarity)
fmt.Printf("Polarity Confidence  :   %f\n", sentiment.PolarityConfidence)
fmt.Printf("Subjectivity  :   %s\n", sentiment.Subjectivity)
fmt.Printf("Subjectivity Confidence  :   %f\n\n", sentiment.SubjectivityConfidence)

You should receive an output similar to the one shown below which indicates that the sentence is objective and is positive, both with a high degree of confidence.

Sentiment Analysis Results


Text            :   John is a very good football player!
Sentiment Polarity  :   positive
Polarity Confidence  :   0.999999
Subjectivity  : objective
Subjectivity Confidence  :   0.989682

Article Classification

AYLIEN’s Classification endpoint automatically assigns an article or piece of text to one or more categories, making it easier to manage and sort. The classification is based on IPTC International Subject News Codes and can identify up to 500 categories. The code below analyses a BBC news article about a one ton pumpkin ;).


classifyParams := &textapi.ClassifyParams{URL: "http://www.bbc.com/earth/story/20150114-the-biggest-fruit-in-the-world"}
class, err := client.Classify(classifyParams)
if err != nil {
    panic(err)
}
fmt.Printf("Classification Analysis Results\n")
for _, v := range class.Categories {
    fmt.Printf("Classification Label        :   %s\n", v.Label)
    fmt.Printf("Classification Code         :   %s\n", v.Code)
    fmt.Printf("Classification Confidence   :   %f\n\n", v.Confidence)
}

When you run this code you should receive an output similar to that shown below which assigns the article an IPTC label of “natural science – biology” with an IPTC code of 13004008.

Classification Results


Classification Label        :   natural science - biology
Classification Code         :   13004008
Classification Confidence   :   0.929754

Hashtag Suggestion

Next, we’ll have a look at analyzing the same BBC article and extracting hashtag suggestions for it.


hashtagsParams := &textapi.HashtagsParams{URL: "http://www.bbc.com/earth/story/20150114-the-biggest-fruit-in-the-world"}
hashtags, err := client.Hashtags(hashtagsParams)
if err != nil {
    panic(err)
}
fmt.Printf("Hashtag Suggestion Results\n")
for _, v := range hashtags.Hashtags {
    fmt.Printf("%s\n", v)
}

You should receive an output similar to the one below.

Hashtags


Hashtag Suggestion Results
#Carbon
#Sugar
#Squash
#Agriculture
#Juicer
#BBCEarth
#TopsfieldMassachusetts
#ArnoldArboretum
#AtlanticGiant
#HarvardUniversity
#Massachusetts

If Go isn’t your weapon of choice then check out our SDKs for Node.js, Ruby, PHP, Python, Java and .Net (C#). For more information regarding our API, go to the documentation section of our website.

We will be publishing ‘getting started’ blogs for the remaining languages over the coming weeks so keep an eye out for them.


This is the second in our series of blogs on getting started with AYLIEN’s various SDKs. There are SDKs available for Node.js, Python, Ruby, PHP, Go, Java and .Net (C#). Last week’s blog focused on the Node.js SDK. This week we will focus on Python.

If you are new to AYLIEN and don’t have an account you can take a look at our blog on getting started with the API or alternatively you can go directly to the Getting Started page on the website which will take you through the signup process. We have a free plan available which allows you to make up to 1,000 calls to the API per day for free.

Downloading and Installing the Python SDK

All of our SDK repositories are hosted on Github. You can find the Python repository here. The simplest way to install it is with the Python package installer, pip. Simply run the following from a command line tool.


$ pip install --upgrade aylien-apiclient

The following libraries will be installed when you install the client library:

  • httplib2

    Once you have installed the SDK you’re ready to start coding! The Sandbox area of the website has a number of sample applications in Node.js which help to demonstrate what the API can do. For the remainder of this blog we will walk you through making calls to three of the API endpoints using the Python SDK.

    Utilizing the SDK with your AYLIEN credentials

    Once you have received your AYLIEN credentials and have downloaded the SDK you can start making calls by adding the following code to your python script.

    
    from aylienapiclient import textapi
    c = textapi.Client("YourApplicationID", "YourApplicationKey")
    

    When calling the various endpoints you can specify a piece of text directly for analysis or you can pass a URL linking to the text or article you wish to analyze.
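    For example, to analyze a hosted article you would pass a 'url' key instead of 'text'. Here is a minimal sketch; it assumes the Language endpoint accepts a 'url' key in the same way Classify does later in this post.

    
    language = c.Language({'url': 'http://www.bbc.com/news/science-environment-30747971'})
    print("Language: %s" % language["lang"])
    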

    Language Detection

    First let’s take a look at the language detection endpoint. We’re going to detect the language of the sentence “What language is this sentence written in?”

    You can do this by simply running the following piece of code:

    
    language = c.Language({'text': 'What language is this sentence written in?'})
    print("Language Detection Results:\n\n")
    print("Language: %s\n" % (language["lang"]))
    print("Confidence: %f\n" % (language["confidence"]))
    

    You should receive an output very similar to the one shown below, which shows that the language detected was English and that the confidence it was detected correctly (a number between 0 and 1) is very close to 1, indicating you can be pretty sure it is correct.

    Language Detection Results:

    
    Language	: en
    Confidence	: 0.9999984486883192
    

    Sentiment Analysis

    Next we will look at analyzing the sentence “John is a very good football player!” to determine its sentiment, i.e. whether it’s positive, neutral or negative. The API will also determine if the text is subjective or objective. You can call the endpoint with the following piece of code.

    
    sentiment = c.Sentiment({'text': 'John is a very good football player!'})
    print("\n\nSentiment Analysis Results:\n\n")
    print("Polarity: %s\n" % sentiment["polarity"])
    print("Polarity Confidence: %s\n" % (sentiment["polarity_confidence"]))
    print("Subjectivity: %s\n" % sentiment["subjectivity"])
    print("Subjectivity Confidence: %s\n" % (sentiment["subjectivity_confidence"]))
    

    You should receive an output similar to the one shown below which indicates that the sentence is positive and objective, both with a high degree of confidence.

    Sentiment Analysis Results:

    
    Polarity: positive
    Polarity Confidence: 0.9999988272764874
    Subjectivity: objective
    Subjectivity Confidence: 0.9896821594138254
    

    Article Classification

    Next we will take a look at the classification endpoint. The Classification Endpoint automatically assigns an article or piece of text to one or more categories making it easier to manage and sort. The classification is based on IPTC International Subject News Codes and can identify up to 500 categories. The code below analyzes a BBC news article about the first known picture of an oceanic shark giving birth.

    
    category = c.Classify({'url': 'http://www.bbc.com/news/science-environment-30747971'})
    print("\nArticle Classification Results:\n\n")
    print("Label      : %s\n" % category["categories"][0]["label"])
    print("Code       : %s\n" % category["categories"][0]["code"])
    print("Confidence : %s\n" % category["categories"][0]["confidence"])
    
    

    When you run this code you should receive an output similar to that shown below, which assigns the article an IPTC label of “science and technology – animal science” with an IPTC code of 13012000.

    Article Classification Results:

    
    Label   : science and technology - animal science
    Code    : 13012000
    Confidence      : 0.9999999999824132
    

    If Python is not your preferred language then check out our SDKs for Node.js, Ruby, PHP, Go, Java and .Net (C#). For more information regarding the APIs, go to the documentation section of our website.

    We will be publishing ‘getting started’ blogs for the remaining languages over the coming weeks so keep an eye out for them. If you haven’t already done so, you can get free access to our API on our sign up page.

    Happy Hacking!



    If you’re a regular reader of our blog, you will have heard us mention Time to First Hello World (TTFHW) quite a bit. It’s all part of our focus on Developer Experience and our efforts to make our APIs as easy as possible to use and integrate with.

    In line with this initiative, in late 2014 we launched Software Development Kits for Ruby, PHP, Node.js and Python, and we promised to add more SDKs for other popular languages early in the new year. The idea behind the SDKs is to make it as easy as possible for our users to get up and running with the API and to start making calls as quickly as possible.

    We’re happy to announce that AYLIEN Text Analysis SDKs are now available for Java, C# and Go. We know there were a few of our users waiting on the Java SDK, so we’re particularly happy to now offer the Java SDK among others. You can download them directly from our AYLIEN GitHub repository below.

    If you have requests for features or improvements to our API, or our other product offerings, make sure you let us know about them. Also, if you haven’t played with it yet, check out our Developer Sandbox. It’s a Text Analysis playground for developers: a place you can go to test the API, fiddle with ideas and build the foundations of your Text Analysis service. Happy Hacking!

     



    What is eDiscovery?

    In order to understand how Text Analysis technology can help as part of the eDiscovery process it is important to first understand, what eDiscovery is and why it is important in the legal profession. Wikipedia describes legal discovery as “the pre-trial phase in a lawsuit in which each party…can obtain evidence from the opposing party.” eDiscovery is an umbrella term used to indicate the discovery process for electronic documents.

     


     

    Given that the vast majority of information is stored electronically in one form or another, the discovery process requires law firm associates to review text documents, email trails etc. to determine if they are relevant (responsive or non-responsive) to a particular case. It is pretty much a data reduction and analysis task, which is time-consuming and therefore an extremely costly process.

    Given the proliferation of electronic documents within a corporate environment and the sheer mass of e-documents within an organization’s data warehouse one may have to consider documents numbering in the millions or tens of millions as part of a discovery process. It is almost impossible for a human being to trawl through such a vast amount of documents with a fine tooth comb without any technological assistance. Natural Language Processing and Machine Learning technologies, therefore, are well placed to add some smarts and automation to the process in order to save time, eliminate human error and overall reduce costs.

    Text Analysis used in the process

    Text Analysis practices can be used as part of an overall eDiscovery process to reduce time, increase accuracy and lower costs. Unsupervised and supervised methods can be used to achieve this goal.

    Unsupervised Methods:

    Machine Learning practices and the application of Text Analysis as part of the discovery process can help by allowing certain tasks, such as language detection, entity extraction, concept extraction, summarization and classification of documents, to be conducted automatically. Metadata created for individual documents can also be considered in terms of the overall document repository to cluster documents by concept and uncover duplicate and/or near-duplicate documents quickly with little or no heavy lifting.
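    As a rough illustration of the duplicate-detection idea, here is a minimal sketch (not AYLIEN code; it uses the open-source scikit-learn library) that vectorizes documents and flags highly similar pairs as near-duplicates:

    
    # Illustrative sketch (not AYLIEN code): flag near-duplicate documents
    # by comparing TF-IDF vectors with cosine similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "The merger agreement was signed on Friday.",
        "On Friday, the merger agreement was signed.",
        "Quarterly earnings exceeded analyst expectations.",
    ]

    vectors = TfidfVectorizer().fit_transform(documents)
    similarity = cosine_similarity(vectors)

    # Pairs scoring above the threshold are candidate near-duplicates.
    THRESHOLD = 0.8  # tunable; 1.0 means identical token distributions
    for i in range(len(documents)):
        for j in range(i + 1, len(documents)):
            if similarity[i, j] >= THRESHOLD:
                print("Documents %d and %d look like near-duplicates (score %.2f)"
                      % (i, j, similarity[i, j]))
    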

    Additionally, the metadata created can allow the automatic discovery of topics in documents and add a temporal dimension to see how a topic evolves over time; this process is known as topic modelling. Consider email threading as an example, i.e. taking what would otherwise be disparate emails and linking them together into a thread to see how the conversation evolved over time.

    Supervised Methods:

    While unsupervised methods are useful in the eDiscovery process they will most likely never entirely replace the human aspect of discovery, and for the most part they don’t aim to be a complete replacement. They’re more of a very smart and efficient aid in the process.

    Major benefits are realized when predictive coding is combined with human review, a process known as Technology Assisted Review, or TAR. A sample set of documents is analyzed, usually by a senior attorney, and scored in terms of responsiveness to discovery requests for the case. eDiscovery software then applies mathematical algorithms and machine learning techniques to automatically analyze the rest of the documents and score them for relevance based on what it “learns” from the reviewed sample.

    Scores generated through predictive coding can be used to automatically cull large numbers of documents from consideration without the need for human review.
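    To make the predictive coding step concrete, here is a minimal sketch (illustrative scikit-learn code, not an actual eDiscovery product) in which a classifier is trained on the attorney-scored sample and its probability scores decide which unreviewed documents can be culled:

    
    # Illustrative sketch: train on a small attorney-reviewed sample, then
    # score the remaining corpus so low-scoring documents can be culled.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    reviewed_docs = ["email about the disputed contract", "lunch menu for Friday"]
    labels = [1, 0]  # 1 = responsive, 0 = non-responsive (the attorney's coding)

    vectorizer = TfidfVectorizer()
    model = LogisticRegression().fit(vectorizer.fit_transform(reviewed_docs), labels)

    unreviewed = ["draft amendment to the contract", "office holiday party invite"]
    scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]

    for doc, score in zip(unreviewed, scores):
        action = "route to human review" if score >= 0.5 else "cull"
        print("%.2f  %s: %s" % (score, action, doc))
    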

    Benefits

    In recent years, the adoption of natural language processing and machine learning technologies as part of the eDiscovery process has been on the rise mainly due to the fact that it aids knowledge discovery, saves time and reduces costs.

    Knowledge Discovery:

    The sheer volume of documents and data to review as part of an eDiscovery process is massively overwhelming for a team of legal professionals who might be searching for a specific line of text among millions of documents. Sometimes they may not even know what they are looking for. Incorporating advanced and specialized technology into the process means the search and discovery process can ensure no page is left unturned.

    Time:

    In most cases eDiscovery projects are time bound and teams work day and night to meet important deadlines. With limited time and huge volumes of data and text to get through, eDiscovery teams are often fighting an uphill battle. Technology can assist in processing large amounts of data in a fraction of the time it takes a team of legal professionals.

    Cost:

    The number of data sources to be analyzed and the size of the legal teams involved mean an eDiscovery project can often prove quite costly. The introduction of Text Analysis as part of the eDiscovery process means the time it takes and the number of professionals needed on an eDiscovery team can both be greatly reduced, which in turn reduces the cost of the overall project.

    Conclusion

    It seems technology will never fully replace the role of a legal expert in the eDiscovery process, but as machines and software get smarter the role of technology in the entire process is only going to grow.










    One of the coolest things about analyzing text is that it’s everywhere! Irrespective of industry, companies and individuals want to make better-informed business decisions based on trackable and measurable insight. With advancements in Text Analysis, companies can now mine text to uncover insights and improve their service or offering to prosper in their market.

    So far at AYLIEN, our Text Analysis API has had great success in the news and media space. But this is just the tip of the Text Analytics iceberg. Countless other industries can gain the same value from these insights. As we don’t have a countless amount of time, let’s stick with a top 10 list of use cases for Text Analytics.

    1. Sports trading – One of the most popular sports to bet on, particularly in Europe, is football (soccer). The top sports traders gather data from the mainstream media and have a deep understanding of the game and its politics at a local level. If you live in England and you bet on English football, irrespective of the division, it’s relatively easy to understand your market. You can successfully bet on a local second division English team because you speak the language, read the local newspapers and may even follow some of the team members on Twitter. But what if you’d like to do the same for a similar team in Spain and you don’t speak a word of Spanish? A Text Analysis API capable of understanding Spanish would allow you to extract meaning from local Twitter feeds, giving you insights into what the local fans are saying about their team. These people understand the squad dynamics at a local level. If, for example, the star striker of Real Club Deportivo Mallorca has an argument with his wife the night before his cup game, is he as likely to be the top scorer on match day?

    2. Financial Trading – As with sports trading, having an insight into what is happening at a local level can be very valuable to a financial trader. Domain-specific sentiment analysis/classification can add real value here: just as fans have their own distinct vocab for their sport, so too do traders in particular markets. Intent recognition and Spoken Language Understanding services for detecting user intents (e.g. “buy”, “sell”, etc.) from short utterances can help guide traders in deciding what to trade, how much and how quickly.

    3. Voice of the customer (VOC) – VOC applications are primarily used by companies to determine what a customer is saying about a product or service. Sources of such data include emails, surveys, call center logs and social media streams like blogs, tweets, forum posts, newsfeeds, and so on. For example, a telecom company could use voice of customer text analysis to scan Twitter for customer gripes about their broadband internet services. This would give them an early warning when customers were annoyed with the performance of the service and allow them to intercept the issue before the customer called to officially complain or request contract cancellation.

    4. Fraud – Whether it’s workers claiming false compensation or a motorist disclosing a false home address, fraudulent activity can be discovered much more quickly when those investigating can join the dots together faster. In the latter case, for example, the guilty party may give an address that has many claims associated with it, or the driven vehicle may have been involved in other claims. Having the ability to capture this information saves the insurer time and gives them greater insight into the case.

    5. Manufacturing or warranty analysis – In this use case, companies examine the text that comes from warranty claims, dealer technician lines, report orders, customer relations text, and other potential information using text analytics to extract certain entities or concepts (like the engine or a certain part). They can then analyze this information, looking at how the entities cluster and to see if the clusters are increasing in size and whether they are a cause for concern, for example.

    6. Customer service routing – In this use case, companies can use text analytics to route requests to customer service representatives. For example, say you’ve sent an email to a company while on hold to one of their reps. You might have a question or a complaint about one of their products. The company can use text analytics for intelligent routing of that email to the appropriate person at the company. This could also be possible in a call center situation, provided you have sufficiently accurate speech-to-text software.

    7. Lead generation – As was the case with the VOC application, taking timely action on a piece of Social Media information can be used to both retain and gain new customers. For example, if a person tweets that they are interested in a certain product or service, text analytics can discover this & feed this info to a sales rep who can then pursue this prospect and convert them into a customer.

    8. TV advertising & audience analysis – TV shows or live televised events are some of the most talked-about topics on Twitter. Marketers and TV producers can both benefit from using Text Analytics in two distinct ways. If producers can get an understanding of how their audience ‘feels’ about certain characters, settings, storylines, featured music etc. they can make adjustments in a bid to appease their viewers and therefore increase the audience size and viewer ratings. Marketers can dig into social media streams to analyse the effectiveness of product placement and commercials aired during the breaks. For example, the TV character ‘Cersei’ from Game of Thrones is becoming a fashion icon amongst fans, who regularly tweet about her latest frock. High street retailers that want to take advantage of this trend could release a line of ‘Queen of Westeros’ style clothing and align their commercials with shows like Game of Thrones. Text Analytics could also be used by TV executives looking to sell to advertisers. For example, a TV company could mine viewers’ tweets and forum activity to profile their audience more accurately. So instead of merely pitching the size of their audience to advertisers, they could wow them by identifying their gender, location, age etc. and their feelings towards certain products.

    9. Recruitment – Text Analysis could be used in both the search and selection phases of recruitment. The most basic application would be identifying the skills of a potential hire. In the recruitment industry, the real value comes from identifying prospects before they become active on the job market. For example, it would be very powerful to know if somebody tweets about disliking their job or expresses an interest in working in a different field, larger/smaller company, different location etc. Once you have identified such a prospect, you could use Text Analytics to analyse the suitability of this person based on what others say about them. Mining news and blog articles, forum postings and other sources could help to evaluate potential hires.

    10. Review Sites – Companies like Expedia have millions of reviews on their website, from travellers all over the world. Given the nature of the site and the fact that their users are looking for a stress free experience, having to sift through hundreds of reviews to find a place to stay can be a real turn off. Text Analysis can be used here to build tools that can summarize multiple properties in 2-3 word phrases. Instead of scrolling through a list of hotel features like heated pool, massage therapy, buffet breakfast etc, you could simply say “Luxurious Hotel and Spa”.

    Did you like our top 10 use cases? If you work in an industry that’s not mentioned above and have an idea of how Text Analytics could help you, please let us know!

    Subscribe to our blog and keep an eye out for our next post on how Text Analytics can add value to your business.

    Drop us an email or @mention us on Twitter.










    I have made this letter longer than usual, because I lack the time to make it short — Blaise Pascal

    We live in the age of “TL;DRs” and 140-character texts: bite-sized content that is easy to consume and quick to digest. We’re so used to skimming through feeds of TL;DRs to acquire information and knowledge about our friends and surroundings that we barely sit through reading a whole article unless we find it extremely interesting.

    It’s not necessarily a “bad” thing though – we are getting an option to exchange breadth for depth, which gives us more control over how we acquire new information with a higher overall efficiency.

    This is an option we previously did not have, as most content was produced in long form and often without considering readers’ time constraints. But in the age of the Internet, textual content must compete with other types of media, such as images and videos, that are inherently easier to consume.

    Vision: The Brevity Knob

    In an ideal world, every piece of content should come with a knob attached to it that lets you adjust its length and depth by just turning the knob in either direction, towards brevity or verbosity:

    • If it’s a movie, you would start with a trailer and based on how interesting you find it, you could turn the knob to watch the whole movie, or a 60 or 30-minute version of it.
    • For a Wikipedia article, you would start with the gist, and then gradually turn the knob to learn more and gain deeper knowledge about the subject.
    • When reading news, you would read one or two sentences that describe the event in short and if needed, you’d turn the knob to add a couple more paragraphs and some context to the story.

    This is our simplistic vision for how summarization technology should work.

    Text Summarization

    At AYLIEN we’ve been working on a Text Summarization technology that works just like the knob we described above: you give it some text, a news article perhaps, specify the target length of your summary, and our Summarization API automatically summarizes your text for you, turning a full news article into a handful of key sentences:

    1. Designed to promote a healthier balance between our real lives and those lived through the small screens of our digital devices, Moment tracks how much you use your phone each day, helps you create daily limits on that usage, and offers “occasional nudges” when you’re approaching those limits.
    2. The app’s creator, Kevin Holesh, says he built Moment for himself after realizing how much his digital addictions were affecting his real-world relationships.
    3. My main goal with Moment was make me aware of how many minutes I’m burning on my phone each day, and it’s helped my testers do that, too.”
    4. The overall goal with Moment is not about getting you to “put down your phone forever and go live in the woods,” Holesh notes on the app’s website.
    5. There’s also a bonus function in the app related to whether or not we’re putting our phone down in favor of going out on the town, so to speak – Moment can also optionally track where you’ve been throughout the day.

    See a Live Demo

    A New Version

    Today we’re happy to announce a new version of our Summarization API that has numerous advantages over the previous versions and gives you more control over the length of the generated summary.

    Two new parameters, sentences_number and sentences_percentage, allow you to control the length of your summary. So to get a summary that is 10% of the original text in length, you would make the following request:

    curl --get --include "https://aylien-text.p.mashape.com/summarize?url=http%3A%2F%2Fwww.bbc.com%2Fsport%2F0%2Ffootball%2F25912393&sentences_percentage=10" -H "X-Mashape-Key: YOUR_MASHAPE_KEY"
    
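    Similarly, to request a summary of a fixed length, say five sentences, you would use sentences_number instead (a sketch of the same request):

    curl --get --include "https://aylien-text.p.mashape.com/summarize?url=http%3A%2F%2Fwww.bbc.com%2Fsport%2F0%2Ffootball%2F25912393&sentences_number=5" -H "X-Mashape-Key: YOUR_MASHAPE_KEY"
    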

    We hope you find this new technology useful. Please check it out on our website and let us know if you have any questions or feedback: hello@aylien.com

    Happy TL;DRing!









    When we launched our Text Analysis API back in February, we made a promise to put quality before quantity – meaning that we won’t build a new feature without making sure the current features are all working reasonably well.

    That’s why we initially focused on English as the only language supported by the API.

    Now we’ve reached a stage where we feel comfortable extending our knowledge of Machine Learning and Text Analysis to other languages, so we’ve decided to add support for 5 new languages to our Concept Extraction and Hashtag Suggestion endpoints: starting today, you can extract concepts mentioned in documents written in German, French, Italian, Spanish and Portuguese in the same way you would extract concepts from English documents. The same goes for Hashtag Suggestion.

    Here’s a sample request:

    curl -v --data-urlencode "url=http://www.lemonde.fr/europeennes-2014/article/2014/05/22/sarkozy-demolit-l-ue-existante-tout-en-disant-qu-il-l-aime_4423949_4350146.html" \
        -H "X-Mashape-Authorization: YOUR_MASHAPE_KEY" \
        "https://aylien-text.p.mashape.com/concepts?language=fr"
    

    Note that you can use language=auto to have the API automatically detect the language of the document for you.
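    For example, the same request with automatic language detection:

    curl -v --data-urlencode "url=http://www.lemonde.fr/europeennes-2014/article/2014/05/22/sarkozy-demolit-l-ue-existante-tout-en-disant-qu-il-l-aime_4423949_4350146.html" \
        -H "X-Mashape-Authorization: YOUR_MASHAPE_KEY" \
        "https://aylien-text.p.mashape.com/concepts?language=auto"
    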

    We are planning to eventually add support for these 5 languages to all other endpoints, so stay tuned for more!











    Making API requests one by one can be inefficient when you have a large number of documents you wish to analyze. We’ve added a batch processing feature that makes it easier to process a large number of documents all at once using the Text Analysis API.

    Steps to use this feature are as follows:

    Step 1. Package all your documents in one file

    Start by putting all your documents (or URLs) in one big text file – one document/URL per line. Example:

    Don't panic.
    Time is an illusion. Lunchtime doubly so.
    For a moment, nothing happened. Then, after a second or so, nothing continued to happen.
    

    Step 2. Make a batch request and obtain job identifier

    Calling the /batch endpoint creates a new analysis job that will be processed eventually. There are a couple of parameters that you need to provide to /batch:

    • data – Data to be analyzed †
    • endpoints – Comma-separated list of Text Analysis API endpoints. Possible values: classify, concepts, entities, extract, language, sentiment, summarize, hashtags
    • entities_type – Type of entities in your file, whether they are URLs or texts. Possible values: text, url
    • output_format – The format you wish to download the batch results in (default: json). Possible values: json, xml

    † Maximum file size is 5MB

    All other parameters sent to /batch will be passed down to the endpoints you’ve specified in endpoints as-is. For example:

    curl -v -H "X-Mashape-Authorization: YOUR_MASHAPE_KEY" \
        -F data=@"/home/amir/42" \
        -F "endpoints=sentiment" \
        -F "entities_type=text" \
        -F "output_format=xml" \
        -F "mode=tweet" https://aylien-text.p.mashape.com/batch
    

    This uploads the contents of the file /home/amir/42 and indicates that each line is a text (not a URL), that the desired operation is sentiment analysis, and that you wish to download the results in XML format.

    A successful request will return a 201 Created with a Location header indicating the URI you can poll to get the status of your submitted job. For your convenience, the URI is also included in the body of the response.

    Step 3. Poll the job status information until it is finished

    You can call the URI obtained in the last step to see the status of your job. Your job can be in one of these states: pending, in-progress, failed, or completed. If your job is completed you’ll receive a 303 See Other with a Location header indicating where you can download your results. It’s also included in the body of the response. Example:

    curl -H "X-Mashape-Authorization: YOUR_MASHAPE_KEY" \
        -H "Accept: text/xml" \
        "https://aylien-text.p.mashape.com/queue?uuid=68e16fe3-3cde-43dd-86b7-52136b398e0d"
    

    Sample response (XML):

    <result><status>completed</status><location>https://textapi-batch-results.s3.amazonaws.com/...</location></result>
    

    And sample JSON response:

    {
        "status": "completed",
        "location": "https://textapi-batch-results.s3.amazonaws.com/..."
    }
    

    Step 4. Download your results

    The location value obtained in the last step is a pre-signed S3 object URL which you can easily download using curl or wget. Please note that results will be kept for only 7 days after the job is finished and will be deleted afterwards. If you fail to obtain the results during this period, you must re-submit your job.
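    For example, a minimal download step might look like this (the LOCATION variable is just for illustration; substitute the pre-signed URL from the response):

    curl -o results.json "$LOCATION"
    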

    Happy crunching!











    Human beings are remarkably adept at understanding each other, given that we speak in languages of our own construction which are merely symbols of the information we’re trying to convey.

    We’re skilled at understanding for two reasons. First, we’ve had, literally, millions of years to acquire the necessary skills. Second, we speak in, generally, the same terms, the same languages. Still, it’s an incredible feat, to extract understanding and meaning from such an avalanche of signal.

    Consider this: researchers in Japan used the K Computer, currently the fourth most powerful supercomputer in the world, to process a single second of human brain activity.

    It took the computer 40 minutes to process that single second of brain activity.

    For machines to reach the level of understanding that’s required for today’s applications and news organizations, then, would require those machines to sift through astronomical amounts of data, separating the meaningful from the meaningless. Much like our brains consciously process only a fraction of the information they store, a machine that could separate the wheat from the chaff would be capable of extracting remarkable insights.

    We live in the dawn of the computer age, but in the thirty years since personal computing went mainstream, we’ve seen little progress in how computers work on a fundamental level. They’ve gotten faster, smaller, and more powerful, but they still require huge amounts of human input to function. We tell them what to do, and they do it. But what if what we’re truly after is understanding? To endow machines with the ability to learn from us, to interact with us, to understand what we want? That’s the next phase in the evolution of computers.

    Enter NLP

    Natural Language Processing (NLP) is the catalyst that will spark that phase. NLP is a branch of Artificial Intelligence that allows computers to not just process, but to understand human language, thus eliminating the language barrier.

    Chances are, you already use applications that employ NLP:

    • Google Translate: human language translation is already changing the way humans communicate, by breaking down language barriers.
    • Siri and Google Now: contextual services built into your smartphone rely heavily on NLP. NLP is why Google knows to show you directions when you say “How do I get home?”.

    There are many other examples of NLP in products you already use, of course. The technology driving NLP, however, is not quite where it needs to be (which is why you get so frustrated when Siri or Google Now misunderstands you). In order to truly reach its potential, this technology, too, has a next step: understand you. It’s not enough to recognize generic human traits or tendencies; NLP has to be smart enough to adapt to your needs.

    Most startups and developers simply don’t have the time or the resources to tackle these issues themselves. That’s where we come in. AYLIEN (that’s us) has combined three years of our own research with emerging academic studies on NLP to provide a set of common NLP functionalities in the form of an easy-to-use API bundle.

    Announcing the AYLIEN Text Analysis API

    The Text API consists of eight distinct Natural Language Processing, Information Retrieval, and Machine Learning APIs which, when combined, allow developers to extract meaning and insight from any document with ease.

    Here’s how we do it.

    Article Extraction

    This tool extracts the main body of an article, removing all extraneous clutter, but leaving intact vital elements like embedded images and video.


    Article Summarization

    This one does what it says on the tin: summarizes a given article in just a few sentences.


    Classification

    The Classification feature uses a database of more than 500 categories to properly tag an article according to IPTC NewsCode standards.


    Entity Extraction

    This tool can extract any entities (people, locations, organizations) or values (URLs, emails, phone numbers, currency amounts and percentages) mentioned in a given text.
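    As a quick sketch of what a call might look like (assuming the entities endpoint follows the same Mashape URL pattern as the other Text API requests shown in these posts):

    curl --get "https://aylien-text.p.mashape.com/entities?url=http%3A%2F%2Fwww.bbc.com%2Fnews%2Fscience-environment-30747971" -H "X-Mashape-Key: YOUR_MASHAPE_KEY"
    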


    Concept Extraction

    Concept Extraction continues the work of Entity Extraction, linking the entities mentioned to the relevant DBpedia and Linked Data entries, including their semantic types (such as DBpedia and schema.org types).


    Language Detection

    Language Detection, of course, detects the language of a document from a database of 62 languages, returning that information in ISO 639-1 format.


    Sentiment Analysis

    Sentiment Analysis detects the tone, or sentiment, of a text in terms of polarity (positive or negative) and subjectivity (subjective or objective).


    Hashtag Suggestion

    Because discoverability is crucial to social media, Hashtag Suggestion automatically suggests ultra-relevant hashtags to engage audiences across social media.


    This suite of tools is the result of years of research mixed with good, old-fashioned hard work. We’re excited about the future of the Semantic Web, and we’re proud to offer news organizations and developers an easy-to-use API bundle that gets us one step closer to realizing our vision.

    We’re happy to announce that you can start using the Text API from today for free. Happy hacking, and let us know what you think.










