Extracting Reality from Data

TL;DR Era

I have made this letter longer than usual, because I lack the time to make it short — Blaise Pascal

We live in the age of “TL;DR”s and 140-character texts: bite-sized content that is easy to consume and quick to digest. We’re so used to skimming through feeds of TL;DRs to learn about our friends and surroundings that we rarely read a whole article unless we find it extremely interesting.

It’s not necessarily a “bad” thing, though: we now have the option to trade depth for breadth, which gives us more control over how we acquire new information and lets us do it more efficiently.

This is an option we didn’t have before, as most content was produced in long form, often without considering the reader’s time constraints. But in the age of the Internet, text must compete with other types of media, such as images and videos, which are inherently easier to consume.

Vision: The Brevity Knob

In an ideal world, every piece of content would come with a knob that lets you adjust its length and depth by turning it in either direction, towards brevity or verbosity:

  • If it’s a movie, you would start with the trailer and, depending on how interesting you find it, turn the knob to watch a 30- or 60-minute version, or the whole movie.
  • For a Wikipedia article, you would start with the gist, and then gradually turn the knob to learn more and gain deeper knowledge about the subject.
  • When reading the news, you would read one or two sentences that briefly describe the event and, if needed, turn the knob to add a couple more paragraphs and some context to the story.

This, in simple terms, is our vision for how summarization technology should work.

Text Summarization

At AYLIEN we’ve been working on a Text Summarization technology that works just like the knob described above: you give it some text (a news article, perhaps), specify the target length of your summary, and our Summarization API summarizes the text for you. For example, it can turn a full-length news article about Moment, an app that tracks how much you use your phone, into a handful of key sentences:

  1. Designed to promote a healthier balance between our real lives and those lived through the small screens of our digital devices, Moment tracks how much you use your phone each day, helps you create daily limits on that usage, and offers “occasional nudges” when you’re approaching those limits.
  2. The app’s creator, Kevin Holesh, says he built Moment for himself after realizing how much his digital addictions were affecting his real-world relationships.
  3. My main goal with Moment was make me aware of how many minutes I’m burning on my phone each day, and it’s helped my testers do that, too.”
  4. The overall goal with Moment is not about getting you to “put down your phone forever and go live in the woods,” Holesh notes on the app’s website.
  5. There’s also a bonus function in the app related to whether or not we’re putting our phone down in favor of going out on the town, so to speak – Moment can also optionally track where you’ve been throughout the day.


A New Version

Today we’re happy to announce a new version of our Summarization API that has numerous advantages over the previous versions and gives you more control over the length of the generated summary.

Two new parameters, sentences_number and sentences_percentage, let you control the length of your summary. For example, to get a summary that is 10% of the original text’s length, you would make the following request:

curl --get --include "https://aylien-text.p.mashape.com/summarize?url=http%3A%2F%2Fwww.bbc.com%2Fsport%2F0%2Ffootball%2F25912393&sentences_percentage=10" -H "X-Mashape-Key: YOUR_MASHAPE_KEY"
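A minimal Python sketch of building that same request URL, assuming the Mashape-hosted endpoint from the curl example. The helper name build_summarize_url is hypothetical, and the rule that exactly one of the two length parameters is sent per request is our assumption rather than documented behaviour.

```python
from urllib.parse import urlencode

# Endpoint copied from the curl example above.
SUMMARIZE_URL = "https://aylien-text.p.mashape.com/summarize"


def build_summarize_url(article_url, sentences_percentage=None, sentences_number=None):
    """Return a GET URL for /summarize (hypothetical helper).

    Assumption: exactly one of the two length parameters is sent per request.
    """
    if (sentences_percentage is None) == (sentences_number is None):
        raise ValueError("pass exactly one of sentences_percentage or sentences_number")
    params = {"url": article_url}
    if sentences_percentage is not None:
        params["sentences_percentage"] = sentences_percentage
    else:
        params["sentences_number"] = sentences_number
    # urlencode percent-escapes the article URL (":" -> %3A, "/" -> %2F),
    # producing the same query string as the curl example.
    return SUMMARIZE_URL + "?" + urlencode(params)


url = build_summarize_url("http://www.bbc.com/sport/0/football/25912393",
                          sentences_percentage=10)
```

The resulting URL is then sent with your X-Mashape-Key header, exactly as in the curl command.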

We hope you find this new technology useful. Please check it out on our website and let us know if you have any questions or feedback: hello@aylien.com

Happy TL;DRing!

Hallo, bonjour, ciao, hola, olá! Text API supports 5 new languages

When we launched our Text Analysis API back in February, we made a promise to put quality before quantity – meaning that we won’t build a new feature without making sure the current features are all working reasonably well.

That’s why we initially focused on English as the only language supported by the API.

Now we’ve reached a stage where we feel comfortable extending our knowledge of Machine Learning and Text Analysis to other languages, so we’ve added support for 5 new languages to our Concept Extraction and Hashtag Suggestion endpoints. Starting today, you can extract concepts mentioned in documents written in German, French, Italian, Spanish and Portuguese in the same way you would for English documents. The same goes for Hashtag Suggestion.

Here’s a sample request:

curl -v --data-urlencode "url=http://www.lemonde.fr/europeennes-2014/article/2014/05/22/sarkozy-demolit-l-ue-existante-tout-en-disant-qu-il-l-aime_4423949_4350146.html" \
    -H "X-Mashape-Authorization: YOUR_MASHAPE_KEY" \
    "https://aylien-text.p.mashape.com/concepts?language=fr"

Note that you can use language=auto to have the API automatically detect the language of the document for you.
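The same request can be prepared in Python with the standard library. This is a sketch, not an official client: concepts_request is a hypothetical helper, and apart from fr (used above) and auto, the two-letter codes are our assumption, based on the ISO 639-1 format the API uses for language detection. As with curl's --data-urlencode, the document URL travels in a form-encoded POST body while language stays in the query string.

```python
from urllib.parse import urlencode
from urllib.request import Request

CONCEPTS_URL = "https://aylien-text.p.mashape.com/concepts"

# "fr" and "auto" appear in this post; the other codes are assumed ISO 639-1
# codes for English, German, Italian, Spanish and Portuguese.
LANGUAGES = {"en", "de", "fr", "it", "es", "pt", "auto"}


def concepts_request(doc_url, language, api_key):
    """Build (but do not send) a POST request to /concepts."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    # language goes in the query string, as in the curl example.
    full_url = CONCEPTS_URL + "?" + urlencode({"language": language})
    # url goes in a form-encoded body, which is what --data-urlencode sends.
    body = urlencode({"url": doc_url}).encode("ascii")
    return Request(full_url, data=body, method="POST",
                   headers={"X-Mashape-Authorization": api_key,
                            "Content-Type": "application/x-www-form-urlencoded"})


req = concepts_request("http://www.lemonde.fr/", "auto", "YOUR_MASHAPE_KEY")
# urllib.request.urlopen(req) would actually send it.
```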

We are planning to eventually add support for these 5 languages to all other endpoints, so stay tuned for more!

Batch processing documents with Text API

Making API requests one by one can be inefficient when you have a large number of documents you wish to analyze. We’ve added a batch processing feature that makes it easier to process a large number of documents all at once using the Text Analysis API.

Steps to use this feature are as follows:

Step 1. Package all your documents in one file

Start by putting all your documents (or URLs) in one big text file — one document/URL per line. Example:

Don't panic.
Time is an illusion. Lunchtime doubly so.
For a moment, nothing happened. Then, after a second or so, nothing continued to happen.
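A small Python sketch of this step. write_batch_file is a hypothetical helper: it collapses any line breaks inside a document (so one document cannot spill onto several lines), skips blank documents, and refuses files over the 5MB limit given for /batch. Treating 5MB as 5 * 1024 * 1024 bytes is our assumption.

```python
import tempfile
from pathlib import Path

# 5MB limit from the /batch docs; the exact byte count is an assumption.
MAX_BATCH_BYTES = 5 * 1024 * 1024


def write_batch_file(documents, path):
    """Write one document (or URL) per line; return the number of lines written."""
    # Collapse all whitespace, including newlines, inside each document.
    lines = [" ".join(doc.split()) for doc in documents]
    lines = [line for line in lines if line]  # drop blank documents
    if not lines:
        raise ValueError("no non-empty documents to write")
    payload = ("\n".join(lines) + "\n").encode("utf-8")
    if len(payload) > MAX_BATCH_BYTES:
        raise ValueError(f"batch file is {len(payload)} bytes; limit is {MAX_BATCH_BYTES}")
    Path(path).write_bytes(payload)
    return len(lines)


docs = [
    "Don't panic.",
    "Time is an illusion.\nLunchtime doubly so.",
    "For a moment, nothing happened. Then, after a second or so, nothing continued to happen.",
]
with tempfile.TemporaryDirectory() as tmp:
    batch_path = Path(tmp, "batch.txt")
    written = write_batch_file(docs, batch_path)
    contents = batch_path.read_text(encoding="utf-8")
```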

Step 2. Make a batch request and obtain job identifier

Calling the /batch endpoint creates a new analysis job, which is queued and processed asynchronously. /batch accepts the following parameters:

  • data: the data to be analyzed. †
  • endpoints: comma-separated list of Text Analysis API endpoints. Possible values: classify, concepts, entities, extract, language, sentiment, summarize, hashtags
  • entities_type: the type of entries in your file, whether they are URLs or texts. Possible values: text, url
  • output_format: the format you wish to download the batch results in. Possible values: json, xml (default: json)

† Maximum file size is 5MB

All other parameters sent to /batch are passed unchanged to the endpoints listed in endpoints. For example:

curl -v -H "X-Mashape-Authorization: YOUR_MASHAPE_KEY" \
    -F data=@"/home/amir/42" \
    -F "endpoints=sentiment" \
    -F "entities_type=text" \
    -F "output_format=xml" \
    -F "mode=tweet" https://aylien-text.p.mashape.com/batch

This request uploads the contents of /home/amir/42, indicates that each line is a text (not a URL), asks for sentiment analysis, and requests the results in XML format. The extra mode=tweet parameter is passed through unchanged to the sentiment endpoint.
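In Python, the non-file fields of such a request could be assembled and checked against the allowed values listed above before uploading. batch_form_fields is a hypothetical helper; extra keyword arguments (such as mode="tweet") are forwarded untouched, mirroring how /batch passes unknown parameters on to the endpoints. Uploading the file itself needs a multipart request (curl -F, or a library such as requests), which this sketch leaves out.

```python
# Allowed values copied from the /batch parameter list above.
ENDPOINTS = {"classify", "concepts", "entities", "extract",
             "language", "sentiment", "summarize", "hashtags"}
ENTITIES_TYPES = {"text", "url"}
OUTPUT_FORMATS = {"json", "xml"}


def batch_form_fields(endpoints, entities_type, output_format="json", **extra):
    """Return the non-file form fields for a /batch request."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    unknown = sorted(set(endpoints) - ENDPOINTS)
    if unknown:
        raise ValueError(f"unknown endpoints: {unknown}")
    if entities_type not in ENTITIES_TYPES:
        raise ValueError(f"entities_type must be one of {sorted(ENTITIES_TYPES)}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    fields = {
        "endpoints": ",".join(endpoints),  # comma-separated, as documented
        "entities_type": entities_type,
        "output_format": output_format,
    }
    # Everything else is forwarded to the chosen endpoints as-is.
    fields.update({name: str(value) for name, value in extra.items()})
    return fields


fields = batch_form_fields(["sentiment"], "text", "xml", mode="tweet")
```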

A successful request returns 201 Created, with a Location header containing the URI you can poll to get the status of your job. For your convenience, the URI is also included in the response body.
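Handling that response might look like the sketch below. job_uri_from_response is a hypothetical helper that takes the status code and a plain dict of headers, comparing header names case-insensitively as HTTP requires. It reads the Location header rather than the body, since the post does not show the body's format.

```python
def job_uri_from_response(status, headers):
    """Return the job-status URI from a /batch response, or raise."""
    if status != 201:
        raise RuntimeError(f"batch submission failed with HTTP {status}")
    # HTTP header names are case-insensitive.
    for name, value in headers.items():
        if name.lower() == "location":
            return value
    raise RuntimeError("201 Created response without a Location header")


job_uri = job_uri_from_response(
    201,
    {"Location": "https://aylien-text.p.mashape.com/queue?uuid=68e16fe3-3cde-43dd-86b7-52136b398e0d"},
)
```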

Step 3. Poll the job status information until it is finished

Call the URI obtained in the last step to check the status of your job. A job can be in one of these states: pending, in-progress, failed, or completed. Once your job is completed, you’ll receive a 303 See Other response with a Location header indicating where you can download your results; the location is also included in the response body. Example:

curl -H "X-Mashape-Authorization: YOUR_MASHAPE_KEY" \
    -H "Accept: text/xml" \
    "https://aylien-text.p.mashape.com/queue?uuid=68e16fe3-3cde-43dd-86b7-52136b398e0d"

Sample response (XML):

<result>
    <status>completed</status>
    <location>https://textapi-batch-results.s3.amazonaws.com/...</location>
</result>

And sample JSON response:

{
    "status": "completed",
    "location": "https://textapi-batch-results.s3.amazonaws.com/..."
}
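Both response formats carry the same two fields, so a client can normalise them into one (status, location) pair. parse_queue_response is a hypothetical helper; picking the parser from the Content-Type, and expecting location to be absent before the job completes, are assumptions, since only the completed case is shown above.

```python
import json
import xml.etree.ElementTree as ET


def parse_queue_response(body, content_type):
    """Return (status, location) from a /queue response body."""
    if "xml" in content_type:
        root = ET.fromstring(body)
        # findtext returns None when the element is missing.
        return root.findtext("status"), root.findtext("location")
    data = json.loads(body)
    return data.get("status"), data.get("location")


# The two sample responses shown above.
xml_body = """<result>
    <status>completed</status>
    <location>https://textapi-batch-results.s3.amazonaws.com/...</location>
</result>"""
json_body = '{"status": "completed", "location": "https://textapi-batch-results.s3.amazonaws.com/..."}'
```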

Calling the /queue endpoint is free of charge.
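Because polling /queue is free, a simple fixed-interval loop is reasonable. In this sketch, wait_for_job and check_status are hypothetical: check_status stands for one GET on the job URI plus parsing, returning (status, location), and the 10-second interval and one-hour timeout are arbitrary choices. One caveat: many HTTP clients, including Python's urllib, follow a 303 See Other automatically, so a completed poll may land on the results file itself; disable redirects or read the Location header if you want the status response.

```python
import time


def wait_for_job(check_status, interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Poll a batch job until it completes; return the results location.

    check_status is a hypothetical callable returning (status, location).
    """
    waited = 0.0
    while True:
        status, location = check_status()
        if status == "completed":
            return location
        if status == "failed":
            raise RuntimeError("batch job failed; re-submit it")
        if status not in ("pending", "in-progress"):
            raise RuntimeError(f"unexpected job status: {status!r}")
        if waited >= timeout:
            raise TimeoutError(f"job still {status} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Offline demo: a scripted sequence stands in for real /queue responses.
responses = iter([("pending", None), ("in-progress", None),
                  ("completed", "https://textapi-batch-results.s3.amazonaws.com/...")])
pauses = []
location = wait_for_job(lambda: next(responses), interval=5, sleep=pauses.append)
```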

Step 4. Download your results

The location value obtained in the last step is a pre-signed S3 object URL, which you can download with curl or wget. Please note that results are kept for only 7 days after the job finishes and are deleted afterwards. If you don’t download the results within this period, you must resubmit your job.
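A Python equivalent of the curl/wget download, plus the 7-day retention arithmetic. download_results and results_deleted_after are hypothetical helpers; the finish time has to come from your own records, since the post does not say whether the API reports one. A pre-signed URL embeds its own credentials, so no API key header is sent.

```python
import shutil
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from urllib.request import urlopen

# Results are kept for 7 days after the job finishes.
RETENTION = timedelta(days=7)


def download_results(location, dest_path):
    """Stream the results file at `location` to `dest_path`."""
    # The pre-signed URL already authorizes the request; no extra headers needed.
    with urlopen(location) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)


def results_deleted_after(finished_at):
    """Approximate time after which the results are no longer available."""
    return finished_at + RETENTION


# Local demo: a file:// URL stands in for the S3 location.
with tempfile.TemporaryDirectory() as tmp:
    fake_results = Path(tmp, "results.json")
    fake_results.write_text('{"status": "ok"}', encoding="utf-8")
    downloaded = Path(tmp, "downloaded.json")
    download_results(fake_results.as_uri(), downloaded)
    downloaded_text = downloaded.read_text(encoding="utf-8")

deadline = results_deleted_after(datetime(2014, 6, 1, 12, 0, tzinfo=timezone.utc))
```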

Happy crunching!

Please note: a set of N documents batched together with M endpoints counts toward your usage as N * M requests, not as one request. The maximum file size is 5MB.
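The usage rule is plain multiplication. In this hypothetical helper, batch_usage takes the number of documents N and the list of M endpoints; for example, the three-line file from Step 1 sent to sentiment and hashtags would count as 3 * 2 = 6 requests.

```python
def batch_usage(num_documents, endpoints):
    """Requests counted toward usage for one batch job: N documents * M endpoints."""
    return num_documents * len(endpoints)


step1_example = batch_usage(3, ["sentiment", "hashtags"])                 # 3 * 2
large_job = batch_usage(10_000, ["classify", "concepts", "entities"])   # 10,000 * 3
```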

Welcome aboard, John!


Position: advisor

We are delighted to announce that Dr. John Breslin has joined AYLIEN as an advisor. John brings a unique blend of academic and industry experience, and he is currently a lecturer at NUI Galway and the Insight Centre.

He previously co-founded the very successful Boards.ie and Adverts.ie websites, as well as StreamGlider, Technology Voice and Startup Galway. He has also led Eurapp and SIOC.

We’re happy to have you on board, John, and we’re looking forward to making great things happen with your help and advice!

Introducing Text Analysis API

Human beings are remarkably adept at understanding each other, given that we speak in languages of our own construction which are merely symbols of the information we’re trying to convey.

We’re skilled at understanding each other for two reasons. First, we’ve had literally millions of years to acquire the necessary skills. Second, we generally speak in the same terms and the same languages. Still, extracting understanding and meaning from such an avalanche of signal is an incredible feat.

Consider this: researchers in Japan used the K Computer, at the time the fourth most powerful supercomputer in the world, to simulate a single second of activity in a network of nerve cells equivalent to about 1% of the human brain.

It took the computer 40 minutes to process that single second of activity.

For machines to reach the level of understanding that today’s applications and news organizations require, then, they would have to sift through astronomical amounts of data, separating the meaningful from the meaningless. Much as our brains consciously process only a fraction of the information they store, a machine that could separate the wheat from the chaff would be capable of extracting remarkable insights.

We live at the dawn of the computer age, but in the thirty years since personal computing went mainstream, we’ve seen little progress in how computers work on a fundamental level. They’ve become faster, smaller, and more powerful, but they still require huge amounts of human input to function. We tell them what to do, and they do it. But what if what we’re truly after is understanding: machines with the ability to learn from us, to interact with us, and to understand what we want? That’s the next phase in the evolution of computers.

Enter NLP

Natural Language Processing (NLP) is the catalyst that will spark that phase. NLP is a branch of Artificial Intelligence that enables computers not just to process human language but to understand it, breaking down the language barrier between people and machines.

Chances are, you already use applications that employ NLP:

  • Google Translate: human language translation is already changing the way humans communicate, by breaking down language barriers.
  • Siri and Google Now: contextual services built into your smartphone rely heavily on NLP. NLP is why Google knows to show you directions when you say “How do I get home?”.

There are, of course, many other examples of NLP in products you already use. The technology driving NLP, however, is not quite where it needs to be (which is why you get so frustrated when Siri or Google Now misunderstands you). To truly reach its potential, this technology, too, has a next step: understanding you. It’s not enough to recognize generic human traits or tendencies; NLP has to be smart enough to adapt to your needs.

Most startups and developers simply don’t have the time or the resources to tackle these issues themselves. That’s where we come in. AYLIEN (that’s us) has combined three years of our own research with emerging academic studies on NLP to provide a set of common NLP functionalities in the form of an easy-to-use API bundle.

Announcing the AYLIEN Text Analysis API

The Text API consists of eight distinct Natural Language Processing, Information Retrieval, and Machine Learning APIs which, when combined, allow developers to extract meaning and insight from any document with ease.

Here’s how we do it.

Article Extraction

This tool extracts the main body of an article, removing all extraneous clutter, but leaving intact vital elements like embedded images and video.


Article Summarization

This one does what it says on the tin: summarizes a given article in just a few sentences.


Classification

The Classification feature tags an article with the appropriate categories from a taxonomy of more than 500 categories based on the IPTC NewsCodes standard.


Entity Extraction

This tool extracts entities (people, locations, organizations) and values (URLs, emails, phone numbers, currency amounts and percentages) mentioned in a given text.


Concept Extraction

Concept Extraction builds on Entity Extraction by linking the mentioned entities to the relevant DBpedia and Linked Data entries, including their semantic types (such as DBpedia and schema.org types).


Language Detection

Language Detection, of course, detects the language of a document from a set of 62 supported languages and returns it as an ISO 639-1 code.


Sentiment Analysis

Sentiment Analysis detects the tone, or sentiment, of a text in terms of polarity (positive or negative) and subjectivity (subjective or objective).


Hashtag Suggestion

Because discoverability is crucial on social media, Hashtag Suggestion automatically suggests highly relevant hashtags to help your content reach and engage audiences.
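For developers, the eight features above correspond to the endpoint names accepted by /batch (classify, concepts, entities, extract, language, sentiment, summarize, hashtags). The pairing below is inferred from the names rather than taken from API documentation, and endpoint_url is a hypothetical helper that reuses the Mashape base URL from the examples in this post.

```python
from urllib.parse import urlencode

BASE_URL = "https://aylien-text.p.mashape.com"

# Feature names from this post mapped to endpoint names; the pairing is an assumption.
ENDPOINT_FOR_FEATURE = {
    "Article Extraction": "extract",
    "Article Summarization": "summarize",
    "Classification": "classify",
    "Entity Extraction": "entities",
    "Concept Extraction": "concepts",
    "Language Detection": "language",
    "Sentiment Analysis": "sentiment",
    "Hashtag Suggestion": "hashtags",
}


def endpoint_url(feature, **params):
    """Build a GET URL for a feature, e.g. endpoint_url("Sentiment Analysis", url=...)."""
    url = f"{BASE_URL}/{ENDPOINT_FOR_FEATURE[feature]}"
    # Query parameters are optional; urlencode escapes their values.
    return url + ("?" + urlencode(params) if params else "")
```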


This suite of tools is the result of years of research mixed with good, old-fashioned hard work. We’re excited about the future of the Semantic Web, and we’re proud to offer news organizations and developers an easy-to-use API bundle that gets us one step closer to realizing our vision.

We’re happy to announce that you can start using the Text API for free, starting today. Happy hacking, and let us know what you think.
