
Just this week, we added Image Tagging capabilities to our API. The result of a strategic partnership with Imagga, our Image Tagging endpoint is the first step towards a hybrid text and image analysis API.

### So how does it work?

Our Image Tagging endpoint uses advanced image recognition and deep learning technology to recognize and identify objects in an image. Drawing on a dataset of over 6,000 objects, it then suggests candidate tags for that image, along with a confidence score for each.

The Image Tagging feature automates image annotation, categorization and tagging, tasks that are often time-consuming and laborious. It’s designed to reduce the heavy lifting involved in dealing with images.

The quickest way to get up and running with this endpoint is to grab an SDK of your choice and check out our detailed documentation.

### Analyzing and Tagging images

For this blog post, we’re going to run through a couple of simple examples to showcase the image tagging endpoint using Node.js.

As a simple example, let’s analyze the following image and see what tags the API suggests for it. It’s clear to the human eye that it’s a house, but what exactly will the API recognize in the image? Let’s find out!

#### Code:


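```javascript
// Set up the AYLIEN Text API client with your application credentials
var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_APP_ID',
  application_key: 'YOUR_APP_KEY'
});

// 'IMAGE_URL' is a placeholder: substitute the URL of the image to analyze
textapi.imageTags(
  'IMAGE_URL',
  function(err, response, ratelimits) {
    if (err !== null) {
      console.log("Error: " + err);
    } else {
      response.tags.forEach(function(t) {
        console.log("Tag : ", t.tag + " , " + "Confidence : ", t.confidence);
      });
    }
  });
```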


#### Results:


```
Tag :  mansion , Confidence :  0.47509114790445056
Tag :  driveway , Confidence :  0.4552631524330092
Tag :  house , Confidence :  0.4304875785176193
Tag :  architecture , Confidence :  0.33942988236938837
Tag :  building , Confidence :  0.3094331548632239
Tag :  home , Confidence :  0.2697299038194921
Tag :  estate , Confidence :  0.23534657602532882
Tag :  dwelling , Confidence :  0.23327752776762856
Tag :  structure , Confidence :  0.22539033224250088
Tag :  residential , Confidence :  0.2150613973041477
Tag :  residence , Confidence :  0.19566665898999108
Tag :  housing , Confidence :  0.19091773340752613
...
```


The API returns a number of candidate tags, each with a confidence score that tells you how sure the API is that the suggested tag is correct. It fared well with this image and returned some accurate results.
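Because every suggested tag carries a confidence score, a common next step when automating annotation is to keep only the tags above some threshold. The sketch below assumes the `response.tags` shape used in the examples (objects with `tag` and `confidence`) and reuses the first six house tags from the results above; the `selectTags` helper and the 0.3 threshold are illustrative choices of ours, not part of the SDK.

```javascript
// Hypothetical helper (not part of the AYLIEN SDK): keeps tags whose
// confidence is at least `minConfidence`, highest first, capped at `maxTags`.
function selectTags(tags, minConfidence, maxTags) {
  return tags
    .filter(function(t) { return t.confidence >= minConfidence; })
    .sort(function(a, b) { return b.confidence - a.confidence; })
    .slice(0, maxTags)
    .map(function(t) { return t.tag; });
}

// The first six tags from the house results above.
const houseTags = [
  { tag: "mansion", confidence: 0.47509114790445056 },
  { tag: "driveway", confidence: 0.4552631524330092 },
  { tag: "house", confidence: 0.4304875785176193 },
  { tag: "architecture", confidence: 0.33942988236938837 },
  { tag: "building", confidence: 0.3094331548632239 },
  { tag: "home", confidence: 0.2697299038194921 }
];

console.log(selectTags(houseTags, 0.3, 10).join(", "));
// mansion, driveway, house, architecture, building
```

Where to set the threshold is a judgment call: as the other examples in this post show, the range of confidence scores varies from image to image.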

Let’s try something a little different this time, since the last image was fairly easy. Let’s say we want to analyze the image below.

#### Code:


```javascript
textapi.imageTags(
  'http://www.tencate.com/amer/Images/American%20football29-15343.jpg',
  function(err, response, ratelimits) {
    if (err !== null) {
      console.log("Error: " + err);
    } else {
      response.tags.forEach(function(t) {
        console.log("Tag : ", t.tag + " , " + "Confidence : ", t.confidence);
      });
    }
  });
```


#### Results:

 

```
Tag :  football , Confidence :  1
Tag :  stadium , Confidence :  0.24106947076803517
Tag :  back , Confidence :  0.17014749275097063
Tag :  structure , Confidence :  0.12840672955565677
Tag :  game , Confidence :  0.1186865418441251
Tag :  sport , Confidence :  0.10223847715591786
Tag :  field , Confidence :  0.09827027407477171
Tag :  team , Confidence :  0.09808334899832503
Tag :  ball , Confidence :  0.09400727111771828
Tag :  people , Confidence :  0.09279897917842066
...
```


Again, the API returned a number of accurate and relevant tags. This time it recognized that the picture was related to football and sport, and returned a variety of tags based on what it saw.

The Image Tagging feature is not only able to spot objects; it can also recognize people and certain aspects of a person in a photo. Let’s see what it recognizes in this photo of pop singer Beyonce.

#### Code:


```javascript
textapi.imageTags(
  'https://ronehiphopnc2.files.wordpress.com/2013/05/beyonce-knowles-closeup-1024x576.jpg',
  function(err, response, ratelimits) {
    if (err !== null) {
      console.log("Error: " + err);
    } else {
      response.tags.forEach(function(t) {
        console.log("Tag : ", t.tag + " , " + "Confidence : ", t.confidence);
      });
    }
  });
```


#### Results:

```
Tag :  attractive , Confidence :  0.3641480251093146
Tag :  portrait , Confidence :  0.3570435098374641
Tag :  adult , Confidence :  0.3419036072980034
Tag :  model , Confidence :  0.3381432174429411
Tag :  person , Confidence :  0.32852624615621456
Tag :  pretty , Confidence :  0.3228676609396025
Tag :  complexion , Confidence :  0.3164959079957736
Tag :  face , Confidence :  0.313449951118162
...
```


The results for an image like the one above go beyond naming objects: tags such as “attractive” and “pretty” describe qualities rather than things. The API’s top suggestions are that the image of Beyonce is an attractive portrait of a model, which isn’t far off.
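The three examples above repeat the same callback. If you’re tagging many images, one option is to build that callback once and reuse it. The `makeTagPrinter` and `tagImages` helpers below are our own illustration, not part of the SDK; `client` is assumed to be any object with the `imageTags(url, callback)` method used above, such as the `textapi` instance.

```javascript
// Hypothetical helper: returns a callback with the same shape as the ones
// above, writing each line through `log` (console.log by default).
function makeTagPrinter(log) {
  log = log || console.log;
  return function(err, response, ratelimits) {
    if (err) {
      log("Error: " + err);
      return;
    }
    // Same line format as the results shown above
    response.tags.forEach(function(t) {
      log("Tag :  " + t.tag + " , Confidence :  " + t.confidence);
    });
  };
}

// Hypothetical helper: sends every URL in `urls` to `client.imageTags`.
function tagImages(client, urls, callback) {
  urls.forEach(function(url) {
    client.imageTags(url, callback);
  });
}
```

With the client from the first example, `tagImages(textapi, [footballUrl, beyonceUrl], makeTagPrinter())` would tag both images. Since the requests run asynchronously, their results may be printed in either order.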

So, that’s how easy it is to start analyzing images with AYLIEN. Keep an eye out for our next blog post on common and not-so-common use cases for hybrid text and image analysis.

We’re very excited to announce that we’ve teamed up with the guys from Imagga to bring image analysis capabilities to our package of APIs. Through this mutually beneficial partnership, we’re taking our first step into the hybrid text and image analysis space, with an image tagging endpoint made available through our existing service.

The partnership, a match made in heaven, will add image analysis capabilities to our existing Natural Language Processing and Machine Learning APIs, creating an all-in-one content analysis suite.

The addition of image analysis capabilities was a natural next step for us in enhancing our offering. As Parsa, our founder, put it: “In today’s media, text and images are two sides of the same coin and must be analyzed and understood in tandem to get a holistic view of what’s going on in the world. There’s a strong demand among our customers for image analysis capabilities, and after evaluating a number of potential partners Imagga came out on top. We’re quite similar as companies in many ways and share the same values and goals. This is just the first of many more hybrid solutions to come.”

The image analysis endpoint uses machine learning, image recognition and deep learning algorithms to automatically identify up to 6,000 distinct objects, concepts, facial expressions and colours in images.

Our interactive demo has a number of example images that showcase the image analysis feature.

Once an image is analyzed, the API automatically returns a list of candidate tags for that image, along with a confidence measure indicating how sure the service is of each tag it suggests.

Paying customers of our Text Analysis service will get immediate access to the image processing features. The addition gives AYLIEN users far greater capabilities in how they analyse and understand content at scale, whether they’re dealing with images, text or both.

The image tagging feature is fully documented in our API docs and has been added to our various SDKs. We’ll also be following up later this week with a how-to blog post on analysing images through the API.

Image taken from Linchi Kwok’s blog

### A picture is worth how many words?

Without a doubt, one of the key things that separates us humans from the rest of the animals is the way we communicate: the volume, complexity and comprehensiveness of human communication are far greater than those of any other animal.

Over time we have developed and refined our communication methods by creating new conventions (symbols, languages) as well as new channels (telegraph, telephone, newspapers) for communicating with one another. With the rapid development of technology in the modern age, this trend has accelerated.

Today it’s easier than ever to communicate through online platforms, news sites, social channels and communities, and we are communicating with each other more than ever before by publishing, sharing and consuming information in many forms – most notably text and images.

The volume of both text and images being published and consumed online is growing exponentially:

• 2009: 2.5bn
• 2010: 3+bn
• 2012: 9bn

Source: http://royal.pingdom.com/2013/01/16/internet-2012-in-numbers/

Number of Tweets posted per day:

• 2009: 17m
• 2010: 68m
• 2011: 300m
• 2012: 450m
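To put the Tweet figures in perspective, the year-over-year multiplier can be computed directly from the list above. A quick sketch (the `growthFactors` name is ours):

```javascript
// Tweets posted per day, in millions, from the list above.
const tweetsPerDay = { 2009: 17, 2010: 68, 2011: 300, 2012: 450 };

// Ratio of each year's value to the previous year's value.
function growthFactors(series) {
  const years = Object.keys(series).map(Number).sort(function(a, b) { return a - b; });
  const factors = {};
  for (let i = 1; i < years.length; i++) {
    factors[years[i]] = series[years[i]] / series[years[i - 1]];
  }
  return factors;
}

const factors = growthFactors(tweetsPerDay);
for (const year of Object.keys(factors)) {
  console.log(year + ": x" + factors[year].toFixed(2));
}
// 2010: x4.00
// 2011: x4.41
// 2012: x1.50
```

Over the four years, the daily volume grew roughly 26-fold (from 17m to 450m).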

So in order to make sense of this vast amount of content, and to get a more holistic view of the world, we need to develop scalable and adaptable technologies that are capable of analyzing both text and images.

### Similarities and Differences Between Text and Images

Both are major communication mediums that were once very closely related. In fact, early written languages could be seen as a form of imagery.

In the modern day, too, both are found in abundance, nowhere more so than on the internet: sometimes separately, on the likes of Flickr, blogs or forums, but more often together, on platforms like Instagram, Twitter, Facebook and news sites, each with a different ratio of text to images.

There is no doubt that text is the most efficient communication medium utilized today. It is better for expressing thoughts, ideas, opinions and abstract concepts. Text gives you a lot more control over the point you want to get across – level of ambiguity, tone, context and so on.

Images on the other hand have a varying degree of expressiveness. An image is indeed worth a thousand words, but only when there’s an image to describe your message. How can one describe an abstract concept like Entropy, Human Rights or Wabi-sabi with an image? Let’s look at a brief description of these concepts from Wikipedia:

“Entropy is a thermodynamic quantity representing the unavailability of a system’s thermal energy for conversion into mechanical work, often interpreted as the degree of disorder or randomness in the system.”

“Human rights are moral principles or norms that describe certain standards of human behaviour, and are regularly protected as legal rights in national and international law.”

“Wabi-sabi (侘寂) represents a comprehensive Japanese world view or aesthetic centered on the acceptance of transience and imperfection.”

It almost hurts to try and visualize any of these concepts with an image, but the textual description does a pretty good job at giving you a general idea about them, to say the least.

While text does a much better job of describing concepts and expressing thoughts or feelings, images are sometimes easier and more efficient to produce and in some cases, they are a more appropriate and effective way to express yourself.

Question: How many of your friends have written an essay about their newborn’s first smile? How many have posted a picture or video to Facebook?

The truth is, you can capture a lot of the simple facts and events around you with a single image, and with minimal effort compared to text. Simply point your phone’s camera at something, and with a single tap you’ve captured and shared the moment with the entire world.

### Text and Image Analysis

While text and images differ in many ways and can exist independently, they are in fact complementary and non-competing communication mediums, and to get a holistic view of the world, we would need to analyze both. Understanding images is as important as understanding text, as together they provide a more accurate picture of reality.

From an AI perspective, this means we need hybrid systems that are not only capable of understanding both mediums, but are also able to discover links between the two and leverage those links to enhance the overall performance and accuracy of such analysis systems.

Those of you who read our blog regularly know that we are a Text Analysis company. At a higher level, however, we are an AI company, and we therefore have a strong interest in how complementary AI solutions such as Image Analysis can give us a better understanding of the real world. In particular, we have been interested in how Text Analysis and Image Analysis could be married to improve the insight gathered from content produced on the Internet.

To put some of our ideas into practice, we started by collecting over 150,000 news articles from about 50 major news outlets. We wanted to see if there’s a strong link or correlation between the text of an article and the images used in it. For each of these articles, we extracted the article’s text as well as its main images. Next, we analyzed the text of each article using our Text Analysis API to find the high-level category of the article (e.g. Technology, Sports, Food, etc.) as well as specific concepts and topics mentioned in it (e.g. People, Places, Organizations, etc.).

The images accompanying the text were then analyzed using Imagga’s Tagging API, which, for any given image, provides a set of tags describing the objects seen in it. The two analyses were performed independently, so when analyzing the text we had no information about the images, and vice versa.
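The setup above can be sketched as a join over two independent analyses: one record per article holding the text category and the main image’s tags, followed by a tally of which image tags co-occur with which text category. The record shape, field names and sample data below are hypothetical (a few tag scores are borrowed from the tagging examples earlier in this post), not the actual output format of either API.

```javascript
// Hypothetical per-article records: `category` comes from text analysis,
// `imageTags` from image tagging run independently on the main image.
const articles = [
  { category: "sports", imageTags: [{ tag: "stadium", confidence: 0.24 }, { tag: "football", confidence: 1 }] },
  { category: "sports", imageTags: [{ tag: "ball", confidence: 0.4 }, { tag: "football", confidence: 0.6 }] },
  { category: "celebrity", imageTags: [{ tag: "portrait", confidence: 0.36 }, { tag: "model", confidence: 0.34 }] }
];

// Counts how often each image tag (at or above `minConfidence`) appears
// in articles of each text category.
function tagCountsByCategory(records, minConfidence) {
  const counts = {};
  for (const record of records) {
    const perCategory = counts[record.category] || (counts[record.category] = {});
    for (const t of record.imageTags) {
      if (t.confidence >= minConfidence) {
        perCategory[t.tag] = (perCategory[t.tag] || 0) + 1;
      }
    }
  }
  return counts;
}

console.log(JSON.stringify(tagCountsByCategory(articles, 0.3)));
// {"sports":{"football":2,"ball":1},"celebrity":{"portrait":1,"model":1}}
```

Because the two analyses never see each other’s output, any strong association that shows up in such a tally reflects the content itself rather than one analysis influencing the other.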

### Findings

What we discovered wasn’t exactly ground-breaking, but it did support a few theories we had: it confirmed a connection between images and text, and showed how both can be used to improve the insights gathered through analysis.

#### Categorization of Articles

As mentioned above, we categorized the text of each article and tagged its main image, then cross-referenced the image tags with the categories identified in the text to uncover similarities and links between the two.

For instance, for the ABC News article titled “Kate Hudson Shows Off Her Amazing Abs” we got “people – celebrity” as the main category of the text, and the following image as the main image of the article:

Running this image through Imagga’s Tagging API gives us the following set of tags, along with the confidence score for each tag:

attractive (28%), model (25%), portrait (24%), pretty (23%), hair, sexy, person, adult, face, blond, caucasian, body, people, fashion, lady, cute, women, smile, glamour, happy, sensual, studio, smiling, human, blonde, lingerie, clothing, erotic, expression, lifestyle, looking, slim, style, fun, gorgeous, healthy, light brown, orange, skin, grey, black, red, pink.

Below, we find the most confidently tagged images for each major category and create a mosaic out of those images:

Text Category: Celebrity

Text Category: Sports

Text Category: Health

Text Category: Politics

Text Category: Technology

What we see here is a strong link between the high-level category of the text of an article, and the main image used in it.
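Selecting the images for mosaics like the ones above amounts to grouping images by the text category of their article and keeping the top N by score. The post doesn’t say exactly which score was used; the sketch below assumes it is the highest tag confidence for each image, and the `topImagesByCategory` helper, URLs and sample records are hypothetical.

```javascript
// Hypothetical records: article text category plus main-image URL and tags.
const images = [
  { category: "sports", url: "a.jpg", tags: [{ tag: "football", confidence: 1 }] },
  { category: "sports", url: "b.jpg", tags: [{ tag: "stadium", confidence: 0.24 }] },
  { category: "sports", url: "c.jpg", tags: [{ tag: "ball", confidence: 0.6 }] },
  { category: "celebrity", url: "d.jpg", tags: [{ tag: "portrait", confidence: 0.36 }] }
];

// Assumed score: the highest confidence among the image's tags.
function imageScore(image) {
  return Math.max(0, ...image.tags.map(function(t) { return t.confidence; }));
}

// Groups images by category and keeps the `n` highest-scoring URLs per group.
function topImagesByCategory(records, n) {
  const groups = {};
  for (const record of records) {
    (groups[record.category] = groups[record.category] || []).push(record);
  }
  const result = {};
  for (const category of Object.keys(groups)) {
    result[category] = groups[category]
      .sort(function(a, b) { return imageScore(b) - imageScore(a); })
      .slice(0, n)
      .map(function(r) { return r.url; });
  }
  return result;
}

console.log(JSON.stringify(topImagesByCategory(images, 2)));
// {"sports":["a.jpg","c.jpg"],"celebrity":["d.jpg"]}
```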

#### Concept Extraction

We also went a step further and extracted concepts and entities mentioned in these articles, such as notable people, places, organizations and general concepts, and looked for a link between these and the tags assigned to each article’s main image.

Here we observed that for the most part, there’s a strong association between people, organizations and brands mentioned in an article and the images that accompany them.

However, this is not the case for some types of entities, such as places, or for more abstract concepts such as Human Rights. In other words, when you’re talking about a person you’re likely to use an image of a person, but when you’re talking about a city you might use any kind of imagery, as the contrast between the mosaics for Apple Inc. and Obama versus New York City and Human Rights shows:

Concept mentioned in Text: Apple Inc.

Concept mentioned in Text: Barack Obama

Concept mentioned in Text: New York City

Concept mentioned in Text: Human Rights

So what is this all about and what are some of the ways we can use text and image analysis together?

#### Hybrid Categorization

When classifying a document such as an article, we can improve classification accuracy by analyzing its text and images together. Moreover, images are in some cases more universal than text, which can be written in many languages, so where we can’t analyze the text properly, hybrid analysis could allow us to rely on the images for categorization.
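One simple way to realize this, sketched here under our own assumptions, is late fusion: score the categories separately from the text and from the image, take a weighted average, and fall back to the image scores alone when the text can’t be analyzed. The `fuseCategories` helper, the weight, the category names and the scores are all hypothetical; the post doesn’t prescribe a particular combination method.

```javascript
// Late fusion of two category-score maps ({ category: score }).
// `textScores` may be null when the text could not be analyzed
// (e.g. an unsupported language); then the image scores are used alone.
function fuseCategories(textScores, imageScores, textWeight) {
  if (!textScores) {
    return bestCategory(imageScores);
  }
  const combined = {};
  const categories = new Set([...Object.keys(textScores), ...Object.keys(imageScores)]);
  for (const c of categories) {
    combined[c] = textWeight * (textScores[c] || 0) + (1 - textWeight) * (imageScores[c] || 0);
  }
  return bestCategory(combined);
}

// Returns the key with the highest score (null for an empty map).
function bestCategory(scores) {
  let best = null;
  for (const c of Object.keys(scores)) {
    if (best === null || scores[c] > scores[best]) best = c;
  }
  return best;
}

// Text alone leans slightly towards health; the image tips it to sports.
console.log(fuseCategories({ sports: 0.45, health: 0.5 }, { sports: 0.8, health: 0.1 }, 0.6)); // sports
// No usable text: rely on the image alone.
console.log(fuseCategories(null, { sports: 0.8, health: 0.1 }, 0.6)); // sports
```

With a text weight of 0.6, sports scores 0.6 × 0.45 + 0.4 × 0.8 = 0.59 against 0.34 for health. In practice the weight would need tuning on labeled data.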

#### Named Entity Disambiguation

As mentioned above, text can be ambiguous: when we say “apple”, are we referring to the company or the fruit? That’s the problem Named Entity Disambiguation tries to solve. But it relies on textual clues to do so: for instance, if Steve Jobs is mentioned in the same article, we’re probably referring to the company.

But what if there’s not enough textual context (say, in a tweet) to provide those clues? Then we need to look elsewhere, and thankfully a lot of digital content such as articles, tweets and comments is accompanied by an image, which, if analyzed, can also provide contextual clues. As an example, compare the set of images we had for Apple Inc. with the images below, which Imagga tagged as containing a fruit:

Images confidently tagged as fruit

So in the Apple Inc. vs. apple scenario, if our tweet contains an image that is more similar to the images above than to the ones for Apple Inc., we can confidently mark the mention of “apple” as the fruit, not the company.
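A minimal sketch of this idea, assuming tag-confidence vectors as the image representation and cosine similarity as the comparison (both our choices, not necessarily what a production system would use): check for textual clues first, and only when there are none, compare the attached image’s tags with a reference tag profile for each sense. The clue lists, reference profiles and `disambiguateApple` helper are hypothetical.

```javascript
// Hypothetical textual clues for each sense of "apple" (naive substring match).
const clues = {
  company: ["steve jobs", "iphone", "ipad"],
  fruit: ["orchard", "juice", "recipe"]
};

// Hypothetical reference profiles: typical tag confidences for images of each sense.
const profiles = {
  company: { logo: 0.5, phone: 0.4, technology: 0.3 },
  fruit: { fruit: 0.6, food: 0.4, healthy: 0.2 }
};

// Cosine similarity between two sparse { tag: weight } vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const k of Object.keys(a)) {
    normA += a[k] * a[k];
    if (k in b) dot += a[k] * b[k];
  }
  for (const k of Object.keys(b)) normB += b[k] * b[k];
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Text clues first (first matching sense wins); otherwise compare image tags.
function disambiguateApple(text, imageTags) {
  const lower = text.toLowerCase();
  for (const sense of Object.keys(clues)) {
    if (clues[sense].some(function(c) { return lower.includes(c); })) return sense;
  }
  if (!imageTags) return null; // no textual or visual evidence
  let best = null, bestScore = 0;
  for (const sense of Object.keys(profiles)) {
    const score = cosine(imageTags, profiles[sense]);
    if (score > bestScore) { best = sense; bestScore = score; }
  }
  return best;
}

console.log(disambiguateApple("Steve Jobs unveils the new Apple", null)); // company
console.log(disambiguateApple("An apple a day", { fruit: 0.7, red: 0.3 })); // fruit
```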

We’re looking forward to seeing what more we can do by combining text and image analysis, and how a hybrid approach can uncover greater insight from online content.