
Introduction

Social media and online publishing have opened up channels of mass communication to everyone. Individuals as well as organizations can now use publishing techniques to persuade, build influence and spread ideas by sharing and distributing their content online. “Content is king” online today; however, “if you build it, they will come” doesn’t always apply.

Content Discovery and Distribution

We recently spoke with Michael Schwartz, the founding partner of WebWire, which distributes business, organizational and personal news releases and press releases over the Internet. Michael told us about the four things he considers most important when distributing news releases:

  • Create well-written, informative content
  • Reach interested readers
  • Reach appropriate influencers
  • Reach appropriate members of the professional media

Michael also spoke about a common trend: “In the good old days (before the Internet), a media directory would generally guide the marketing professional to the appropriate target, and a room full of people with printed publications, scissors and tape would create effectiveness reports… those days are over.”

Traditionally, the discovery, tagging and distribution of an article or piece of content would often have been carried out by a PR professional or marketer. In recent times, as machines and software get smarter and the sheer volume of content out there continues to grow, parts of the process can be automated using technology. To match a human’s work, however, machines need to be able to understand and categorize content effectively. That’s where Text Analytics and Natural Language Processing techniques come into play.

How Text Analysis enables “modern day” distribution

Content

Creating good content that is well written, informative and engaging is central to attracting and retaining your audience’s attention. Just as “beauty is in the eye of the beholder”, content needs to be well-written, informative, relevant and engaging from the viewpoint of the audience. While Text Analytics isn’t pivotal in the content creation process, elements of it can be incorporated into the process. For example, discovering trends and topics and identifying what is attracting engagement online can help you create relevant, informative pieces. Essentially, writing good content comes down to knowledge and expertise in a certain field or area, and this is particularly difficult to automate.

Once you have created an interesting, relevant and informative piece of content, do you just sit back and hope it gets discovered organically?

Readers

To reach interested readers, you first need to identify who your audience is and meet them where they congregate. Today we don’t search for content; we expect it to be pushed to us, through news apps, Twitter, Facebook and so on. Being able to automatically extract concepts and topics from content and articles shared and distributed online allows us to identify where our target audience resides.

Text Analysis also allows us to prep our content for maximum discovery and exposure. One effective example of how this can be achieved is through the use of hashtags when sharing content on your social media sites. Being able to understand text and extract topics and concepts automatically means we can also ensure posts are distributed appropriately for maximum exposure on social channels.

Even if you distribute your content in the right place, getting it to stand out in an extremely crowded online space can be quite difficult. Traditionally, relationships with the right journalists or individuals helped in this regard. Today, we have new targets to help with amplification in the form of influencers.

Influencers

Influencers online have generally built up a reputation as trusted and knowledgeable sources of information in a particular subject area. People see them as thought leaders and rely on them as a source of content or informal advice. An influencer picking up on and sharing your content doesn’t always happen organically. Identifying appropriate influencers to target with your content was traditionally about relationships; today, technology has made the identification of, and access to, these influencers somewhat easier. By effectively analyzing content, we can match content to appropriate influencers based on the interests, keywords, entities, topics and concepts they write about.
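
To make that matching idea concrete, here is a minimal sketch of one way it could work; the influencer profiles, topic lists and Jaccard-style scoring are invented purely for illustration:

# Toy sketch: rank influencers by how much their interests overlap with
# an article's topics. All names and topic lists here are invented.
article_topics = {"machine learning", "text analysis", "marketing"}

influencers = {
    "@data_dana": {"machine learning", "text analysis", "statistics"},
    "@pr_paul": {"marketing", "branding"},
    "@chef_carla": {"food", "travel"},
}

def overlap_score(topics, interests):
    # Jaccard similarity: shared topics over all topics from either side.
    return len(topics & interests) / len(topics | interests)

ranked = sorted(influencers, key=lambda name: overlap_score(article_topics, influencers[name]), reverse=True)
print(ranked)  # ['@data_dana', '@pr_paul', '@chef_carla']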

That isn’t to say there isn’t a place for utilizing journalists to increase exposure.

Professional media

Similarly, reaching appropriate members of the professional media can be achieved by matching articles to individual journalists’ areas of interest and writing style. It is vitally important that the matching process is accurate, as sending someone content they have no interest in is technically spam. Being able to analyze opinions can also help you be a lot more targeted: through Sentiment Analysis of content, we can understand a writer’s opinion and target them with appropriate releases. For example, a journalist who writes about technology but is of the opinion that Android trumps iOS isn’t going to want to publish a piece on how great the new iPhone 6 is.

Conclusion

The internet has fundamentally changed how we communicate with each other. Social media sites, blogs, forums and mainstream media sites proliferate, and rise and fall in popularity. Keeping track of the most appropriate outlets for any given piece of content is an increasingly important, difficult and time-consuming task. The ability to automate many parts of the process allows large volumes of content to be matched accurately with the most appropriate audiences. Text Analysis can reliably aid traditional distribution tactics and processes, but it remains to be seen whether technology will trump human relationships with the right people, built over time.











The 4th edition of our “Getting up and running with AYLIEN Text Analysis API” blog series will focus on working with the API using Ruby. Previously, we published code snippets and getting started guides for Node.js, Python and Java.

Similar to our previous blogs, we’re going to perform some basic Text Analysis processes like detecting what language a piece of text is written in, analyzing the sentiment of a piece of text, classifying an article and, finally, generating some hashtags for a URL, in order to showcase how easy it is to get started in your chosen language.

We’re first going to look at the code in action. We’ll then go through the code, section by section, to investigate each of the endpoints used in the code snippet.

We are going to do the following:

  • Detect what language the following text is written in: “What language is this sentence written in?”
  • Analyze the sentiment of the following statement: “John is a very good football player!”
  • Generate a classification (IPTC Code and label) for the URL: “http://www.bbc.com/news/science-environment-30177534”
  • Generate hashtags for the URL: “http://www.bbc.com/news/science-environment-30177534”

Note: The getting started page on our website has a range of code snippets in other programming languages for you to try.

Overview of the code in action

The complete code snippet is given at the end of this blog for you to copy and paste. To run it, open a text editor of your choice and copy and paste the snippet. Before running the code, ensure you replace the YOUR_APP_ID and YOUR_APP_KEY placeholders in the code with your own application id and application key; these were sent to you when you signed up as an API user. Make your way to our sign up page if you haven’t already signed up.

Save the file as TextAPISample.rb and then open a command prompt. Navigate to the folder where you saved the code snippet and run the code by typing “ruby TextAPISample.rb”.

Note: You will need to have Ruby installed to run this example; you can download it here if you haven’t already done so.

Once you run it, you should receive the following output:


C:\src\ruby>ruby textapisample.rb

Text:   What language is this sentence written in?
Language: en (0.9999934427379825)

Text:   John is a very good football player!
Sentiment: positive (0.9999988272764874)

Hashtags: ["#SeaIce", "#WoodsHoleOceanographicInstitution", "#BBCNews", "#WHOI",
 "#RaceAndEthnicityInTheUnitedStatesCensus", "#Transponder", "#SynopticScaleMete
orology", "#WilkesLand", "#WaterColumn", "#PolarIcePacks", "#Twitter", "#Hanuman
tSingh", "#BritishAntarcticSurvey", "#AutonomousUnderwaterVehicle", "#UK", "#BBC
Online", "#NatureGeoscience", "#PackIce", "#Sonar", "#Arctic", "#Antarctica", "#
UnitedKingdom", "#Scratching", "#Ecosystem", "#AustraliaGroup"]


Classification: [{"label"=>"natural resources - oceans", "code"=>"06006007", "co
nfidence"=>1.0}]

...

In this case, we have detected that the first piece of text is written in English and that the sentiment or polarity of the second statement is positive. We have also generated hashtags for the URL and classified the content.

The detail above shows the code running in its entirety, but to highlight each feature/endpoint we will go through the code snippet, section by section.

Language Detection

Using the Language Detection endpoint you can analyze a piece of text or a URL and determine what language it is written in. In the code we have used in this blog, the parameters variable controls whether the call is made specifying the text directly or as a URL.


parameters = {"text" => "What language is this sentence written in?"}
language = call_api("language", parameters)

In this case we have specified that it should analyze the text “What language is this sentence written in?” and, as you can see from the output below, it determined that the text is written in English, with a confidence score of 0.999993 that the language was detected correctly. Note: for all of the endpoints, the API returns the text which was analyzed, for reference, and we have included it in the results in each case.

Result:


Text:   What language is this sentence written in?
Language: en (0.9999934427379825)

Sentiment Analysis

Similarly, the Sentiment Analysis endpoint takes a piece of text or a URL and analyzes it to determine whether it is positive, negative or even neutral.


parameters = {"text" => "John is a very good football player!"}
sentiment = call_api("sentiment", parameters)

In this case, we have specified that it should analyze the text “John is a very good football player!”. The API has determined that the sentiment of the piece of text is positive; we can also be pretty sure it’s correct, given the returned confidence score of 0.999998.

Result:


Text:   John is a very good football player!
Sentiment: positive (0.9999988272764874)

Hashtag Suggestions

The Hashtag Suggestion endpoint analyzes a URL and generates a list of hashtag suggestions which can be used to ensure that your content or URLs are optimally shared on social media:


parameters = {"url" => "http://www.bbc.com/news/science-environment-30177534"}
hashtags = call_api("hashtags", parameters)

For hashtag suggestions, we have used an article about measuring the thickness of the sea ice in the Antarctic, published on the BBC news website: http://www.bbc.com/news/science-environment-30177534. The hashtag suggestion endpoint first extracts the text from the URL (which is returned for reference by the call, and the start of which is shown below) and then analyzes that text and generates hashtag suggestions.

Result:


Hashtags: ["#SeaIce", "#WoodsHoleOceanographicInstitution", "#BBCNews", "#WHOI",
 "#RaceAndEthnicityInTheUnitedStatesCensus", "#Transponder", "#SynopticScaleMete
orology", "#WilkesLand", "#WaterColumn", "#PolarIcePacks", "#Twitter", "#Hanuman
tSingh", "#BritishAntarcticSurvey", "#AutonomousUnderwaterVehicle", "#UK", "#BBC
Online", "#NatureGeoscience", "#PackIce", "#Sonar", "#Arctic", "#Antarctica", "#
UnitedKingdom", "#Scratching", "#Ecosystem", "#AustraliaGroup"]

Text of website article pointed to by the url http://www.bbc.com/news/science-environment-30177534
Antarctic sub gauges sea ice thickness
A novel autonomous sub has acquired the first detailed, high-resolution 3D maps of Antarctic sea ice...

Article Classification

The Classification endpoint automatically assigns or tags an article or piece of text with one or more categories, making it easier to manage and sort. The endpoint is based on IPTC International Subject News Codes and can identify up to 500 categories.


parameters = {"url" => "http://www.bbc.com/news/science-environment-30177534"}
classify = call_api("classify", parameters)

When we pass the URL pointing to the BBC news story, we receive the results shown below. As you can see, it has labelled the article “natural resources – oceans”, with a corresponding IPTC code of 06006007 and a confidence of 1.0.

Result:


Classification: [{"label"=>"natural resources - oceans", "code"=>"06006007", "confidence"=>1.0}]

For more getting started guides and code snippets to help you get up and running with our API, visit our Getting Started page on our website. If you haven’t already done so you can get free access to our API on our sign up page.


The Complete Code Snippet


require 'net/http'
require 'uri'
require 'json'

APPLICATION_ID = 'YOUR_APP_ID'
APPLICATION_KEY = 'YOUR_APP_KEY'

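# Generic helper: POST the given parameters to the named Text API endpoint
# over HTTPS and return the parsed JSON response.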
def call_api(endpoint, parameters)
  url = URI.parse("https://api.aylien.com/api/v1/#{endpoint}")
  headers = {
      "Accept"                           =>   "application/json",
      "X-AYLIEN-TextAPI-Application-ID"  =>   APPLICATION_ID,
      "X-AYLIEN-TextAPI-Application-Key" =>   APPLICATION_KEY
  }

  http = Net::HTTP.new(url.host, url.port)
  http.use_ssl = true
  request = Net::HTTP::Post.new(url.request_uri)
  request.initialize_http_header(headers)
  request.set_form_data(parameters)

  response = http.request(request)

  JSON.parse response.body
end

parameters = {"text" => "John is a very good football player!"}
sentiment = call_api("sentiment", parameters)

parameters = {"text" => "What language is this sentence written in?"}
language = call_api("language", parameters)

parameters = {"url" => "http://www.bbc.com/news/science-environment-30177534"}
hashtags = call_api("hashtags", parameters)
classify = call_api("classify", parameters)


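# Print the results of each call.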
puts "n"
puts "Text:   #{language["text"]}"
puts "Language: #{language["lang"]} (#{language["confidence"]})"
puts "n"
puts "Text:   #{sentiment["text"]}"
puts "Sentiment: #{sentiment["polarity"]} (#{sentiment["polarity_confidence"]})"
puts "n"
puts "Hashtags: #{hashtags["hashtags"]}"
puts "n"
puts "n"
puts "Classification: #{classify["categories"]}"
puts "n"
puts "Text of website article pointed to by the url http://www.bbc.com/news/science-environment-30177534"
puts "n"
puts " #{hashtags["text"]}"










Online news aggregation services are sites that allow you to view content from online newspapers, media outlets, blogs, etc. in one place. They allow you to filter the news that you receive by category, topic, keyword, date and outlet – e.g. technology, food or entertainment, from the last day, week or year, and so on. They save us a lot of time and hassle, as they allow us to receive the news we want without having to hop from site to site to read updates from our favourite authors or follow the topics we’re most interested in. In other words, they provide a consolidated space with the latest news and updates from many different news sources.

 

Text Analysis and News Aggregation

 

News Aggregators have been around for quite a while. The most progressive apps are the ones that learn and keep track of our likes and interests in order to uncover and suggest relevant new content which we may not have been aware of.

How does News Aggregation work?

Generally, a content provider will publish their content via a feed link, which News Aggregators subscribe to. From then on, the aggregator will be informed when new content is available. These feeds from the various news sources are commonly referred to as RSS and/or Atom feeds.
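
As a rough illustration of the subscription side, polling a feed can be as simple as the sketch below; it assumes the third-party Python package feedparser, and the feed URL is just an example:

# Minimal sketch: poll an RSS feed and list the newest items.
# Assumes the third-party 'feedparser' package (pip install feedparser);
# the BBC feed URL is only an example.
import feedparser

feed = feedparser.parse("http://feeds.bbci.co.uk/news/rss.xml")
for entry in feed.entries[:5]:
    print(entry.title, "-", entry.link)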

For machines to analyze and attempt to understand this content they often rely on elements of Natural Language Processing and Text Analysis.

Why the need for Text Analysis?

If you consider that a news aggregator might be dealing with 50,000-plus articles per day, you can quickly see why being able to analyze content automatically is an essential part of the process. Even if we allowed two minutes for each article to be read and classified by a human (which is ridiculously fast), it would take almost 70 days of nonstop work to get through the 50K articles (50,000 × 2 minutes = 100,000 minutes, or roughly 69 days). Clearly, then, this is a task for machines, and in particular Machine Learning and Natural Language Processing in the form of Text Analysis.

What does the Text Analysis Process for News Aggregation look like?

Article Extraction

One of the first tasks to be completed by a text analysis engine when presented with an article or URL is to strip away the clutter and extract the main text and media. This process is generally referred to as Article Extraction. The story may be processed further to summarize the article in some way; this can be useful when presenting stories for consumption, as readers will generally spend less than 3 seconds deciding whether or not to click an article to give it further attention.
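
As a sketch of what this step can look like, here is a call in the style of the Python walkthrough that appears elsewhere in this series; note that the “extract” endpoint name and the response fields used are assumptions for illustration:

# Sketch: strip the clutter from a news URL and keep the main text.
# Reuses the call_api helper from the Python walkthrough in this series;
# the "extract" endpoint name and the response fields are assumptions.
parameters = {"url": "http://www.bbc.com/news/science-environment-30177534"}
article = call_api("extract", parameters)
print(article.get("title"))
print(article.get("article", "")[:200])  # first 200 characters of the body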

Classification based on Named Entity Extraction

Once the text of an article has been extracted, it is passed to another part of the analysis engine for classification, i.e. to determine whether the story is about Technology, Arts, Entertainment, Business, Finance, etc. Classification is partly achieved by extracting named entities such as people, places, organizations, keywords, dates and Twitter handles from the article. Proper categorization is critical, as an article is only valuable if its audience can find it. Classification tags an article with metadata from up to 500 categories, which conform to the IPTC NewsCodes taxonomy.

Further Classification and Organisation based on Concept Extraction

More sophisticated analysis engines can also extract concepts from an article. Focusing on concepts, rather than relying purely on keywords, when analyzing text results in better tagging and allows aggregation services to cluster similar stories together. For example, an article on the current state of the Japanese economy, when passed to AYLIEN’s Concept Extraction API endpoint, yielded the concepts “deflation”, “public debt”, “sales tax”, “economist”, “gross domestic product”, “abenomics”, “moody’s analytics” and “bond”, among others.

A key feature of a concept extraction system is the ability to provide what is called “word sense disambiguation”, i.e. the ability to realize that in a tech article that mentions Steve Jobs, the word “Apple” is more likely to refer to the company than the fruit! Extracting concepts can also be coupled with Topic Modelling and Clustering, which allow the reader to follow stories as they progress through time, and also allow the system to uncover and present similar stories while removing duplicate or near-duplicate articles.
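
A minimal sketch of such a call, again in the style of the Python walkthrough elsewhere in this series (the “concepts” endpoint name and the response shape are assumptions, and the URL is a placeholder):

# Sketch: pull the concepts out of an article URL.
# Reuses the call_api helper from the Python walkthrough in this series;
# the endpoint name and response shape are assumptions, and the URL is a
# placeholder.
parameters = {"url": "http://example.com/japan-economy-article"}
result = call_api("concepts", parameters)
for concept in result.get("concepts", {}):
    print(concept)  # e.g. a DBpedia-style URI for "deflation"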

Going a level deeper in understanding text

Context is key to the way News Aggregators provide relevant content. We chatted with Drew Curtis, the creator of Fark.com, who had this to say about the current state of News Aggregators: “They’re looking at article content only, I’m arguing that the next level, is taking content that people maybe don’t care as much about and adding context to make them care.” Drew also gave us a nice example, Net Neutrality: “it’s been around for years, but only recently did anyone figure out how to make the average person care about it.”

More sophisticated analysis engines can extract intent and high-level concepts from an article, which, when combined, allow you to add context and to better understand the story, not just the article. Traditional news aggregators only go so deep in attempting to understand text and pushing content to readers based on topics or keywords. The next wave of news aggregators needs to “understand” and distribute content, not just “tag” and distribute it.

Sentiment Analysis of content can also help add context. It can allow machines to detect the tone of a text: whether it’s positive or negative, subjective or objective. Keeping track of the types of articles (sentiment, topics, categories) that a reader consumes, shares or upvotes will allow a system to learn a reader’s preferences and present articles that are more and more in tune with the reader’s tastes.

Spreading the Word

Sharing useful or interesting stories gives us some “social currency” and so we are always keen to pass on articles that we think our friends and colleagues might enjoy or find useful. A good text analysis engine will also aid in this process by, for example, providing hashtag suggestions which allow for more effective sharing of content across social media sites.

Summary

Text Analysis provides the tools that make it possible for content aggregation systems to make sense of the myriad news articles that are published every day and to present the reader with articles that are honed to their individual tastes. But only when we start focusing on machines “understanding” content before it’s recommended will news aggregators become truly powerful.











This is a continuation of our “Getting up and running with AYLIEN Text Analysis API” blog series. Our last edition walked you through how to start making calls with our Text Analysis API and Java. For this blog, we are going to focus on working with the API using Python.

We’re going to perform some basic Text Analysis processes like detecting what language a piece of text is written in, analyzing the sentiment of a piece of text and, finally, generating some hashtags for a URL.

As an overview and to showcase what can be achieved, we’re first going to look at the code in action. We’ll then go through the code, section by section, to investigate each of the endpoints used in the code snippet.

As a simple example, we are going to do the following:

  • Detect what language the following text is written in: “What language is this sentence written in?”
  • Analyze the sentiment of the following statement: “John is a very good football player!”
  • Generate hashtags for the following URL: “http://www.bbc.com/news/health-29912877”

Note: We’re using Python 3 compatible code for this blog; the getting started page on our website has Python 2 and Python 3 compatible code snippets for you to try.

Overview of the code in action

The complete code snippet is given at the end of this blog for you to copy and paste. To run it, open a text editor of your choice and copy and paste the snippet. Before running the code, ensure you replace the YOUR_APP_ID and YOUR_APP_KEY placeholders in the code with your own application id and application key which would have been sent to you upon signing up to the API.

Save the file as TextAPISample.py and then open a command prompt. Navigate to the folder where you saved the code snippet and run the code by typing “python TextAPISample.py”.

Note: You will need to have Python installed to run this example; you can download it here if you haven’t done so already.

Once you run it, you should see output like the results shown in the sections below.

In this case we have detected that the first piece of text is written in English, the sentiment or polarity of the second statement is positive, and we have generated hashtags for the URL, which points to a BBC story on how the brain processes different tastes.

To highlight each feature/endpoint, we will now go through the code snippet, section by section.

Language Detection

Using the Language Detection endpoint you can analyze a piece of text or a URL and determine what language it is written in. In the demo code we have used in this blog, the parameters variable controls whether the call is made specifying the text directly or as a URL.

parameters = {"text": "What language is this sentence written in?"}
language = call_api("language", parameters)

In this case we have specified that it should analyze the text “What language is this sentence written in?” and, as you can see from the output below, it determined that the text is written in English, with a confidence score of 0.999993 that the language was detected correctly. Note: for all of the endpoints, the API returns the text which was analyzed, for reference, and we have included it in the results in each case.

Result:

Text: What language is this sentence written in?
Language: en (0.999993)

 

 

Sentiment Analysis

Similarly, the Sentiment Analysis endpoint can take a piece of text or a URL and analyze it to determine whether it is positive, negative or even neutral.

parameters = {"text": "John is a very good football player!"}
sentiment = call_api("sentiment", parameters)

In this case, we have specified that it should analyze the text “John is a very good football player!”. The API has determined that the sentiment of the piece of text is positive; we can also be pretty sure it’s correct, given the returned confidence score of 0.999999.

Result:

Text: John is a very good football player!
Sentiment: positive (0.999999)

 

 

Hashtag Suggestions

Finally, the Hashtag Suggestion endpoint analyzes a URL and generates a list of hashtag suggestions which can be used to ensure your content or URLs are optimally shared on social media:

parameters = {"url": "http://www.bbc.com/news/health-29912877"}
hashtags = call_api("hashtags", parameters)

For hashtag suggestions, we have used an article about how the brain processes tastes, published on the BBC news website: http://www.bbc.com/news/health-29912877. The hashtag endpoint first extracts the text from the URL (which is returned for reference by the call, and the start of which is shown below) and then analyzes that text and generates hashtag suggestions.

Result:

Hashtags: ['#Umami', '#Neuron', '#ColumbiaUniversity', '#Brain', '#Sugar', '#Carnivore', '#Species', '#VampireBat', '#TasteBud', '#StemCell', '#BBCNews', '#Fluorescence', '#Road', '#TonicWater']
Text: Brain's taste secrets uncovered
The brain has specialist neurons for each of the five taste categories - salty, bitter, sour, sweet and umami - US scientists have discovered.
The study, published in the journal Nature, should settle years of debate on how the brain perceives taste.
The Columbia University team showed the separate taste sensors on the tongue had a matching partner in the brain...

For more getting started guides and code snippets to help you get up and running with our API, visit our Getting Started page on our website. If you haven’t already done so you can get free access to our API on our sign up page.

    The complete code snippet:

    import json
    import urllib.request, urllib.error, urllib.parse
    
    APPLICATION_ID = 'YOUR_APP_ID'
    APPLICATION_KEY = 'YOUR_APP_KEY'
    
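    # Generic helper: POST the parameters to the given Text API endpoint
    # and return the parsed JSON response.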
    def call_api(endpoint, parameters):
      url = 'https://api.aylien.com/api/v1/' + endpoint
      headers = {
          "Accept":                             "application/json",
          "Content-type":                       "application/x-www-form-urlencoded",
          "X-AYLIEN-TextAPI-Application-ID":    APPLICATION_ID,
          "X-AYLIEN-TextAPI-Application-Key":   APPLICATION_KEY
      }
      opener = urllib.request.build_opener()
      request = urllib.request.Request(url,
        urllib.parse.urlencode(parameters).encode('utf-8'), headers)
      response = opener.open(request)
      return json.loads(response.read().decode())
    
    parameters = {"text": "What language is this sentence written in?"}
    language = call_api("language", parameters)
    
    parameters = {"text": "John is a very good football player!"}
    sentiment = call_api("sentiment", parameters)
    
    parameters = {"url": "http://www.bbc.com/news/health-29912877"}
    hashtags = call_api("hashtags", parameters)
    
    print("n")
    print("Text: %s " % (language["text"]))
    print("Language: %s (%F)" % (language["lang"], language["confidence"]))
    print("n")
    print("Text: %s " % (sentiment["text"]))
    print("Sentiment: %s (%F)" % (sentiment["polarity"], sentiment["polarity_confidence"]))
    print("n")
    print("Hashtags: %s " % (hashtags["hashtags"]))
    print("n")
    print("Text: %s " % (hashtags["text"]))
    

 










Image taken from Linchi Kwok’s blog

A picture is worth how many words?

Without a doubt, one of the key things that separates us humans from the rest of the animals is the way we communicate: the volume, level of complexity and comprehensiveness of communication among humans is far greater than that of any other animal.

Over time we have developed and refined our communication methods by creating new conventions (symbols, languages) as well as new channels (telegraph, telephone, newspapers) for communicating with one another, and with the rapid development of technology in the modern age, this trend has accelerated.

Today it’s easier than ever to communicate through online platforms, news sites, social channels and communities, and we are communicating with each other more than ever before by publishing, sharing and consuming information in many forms – most notably text and images.

The volume of both text and images being published and consumed online is growing exponentially:

Number of new photos added to Facebook every month:

  • 2009: 2.5bn
  • 2010: 3+bn
  • 2012: 9bn

Source: http://royal.pingdom.com/2013/01/16/internet-2012-in-numbers/

Number of Tweets posted per day:

  • 2009: 17m
  • 2010: 68m
  • 2011: 300m
  • 2012: 450m

Source: http://www.internetlivestats.com/twitter-statistics/

So in order to make sense of this vast amount of content, and to get a more holistic view of the world, we need to develop scalable and adaptable technologies that are capable of analyzing both text and images.

Similarities and Differences Between Text and Images

Both are major communication mediums that were once very closely related. In fact, early written languages could be seen as a form of imagery.

Today, both are found in abundance, nowhere more so than on the internet. They appear separately on the likes of Flickr, blogs or forums, but more often together on platforms like Instagram, Twitter, Facebook and news sites, each with a varying text-to-image ratio.

There is no doubt that text is the most efficient communication medium utilized today. It is better for expressing thoughts, ideas, opinions and abstract concepts. Text gives you a lot more control over the point you want to get across – level of ambiguity, tone, context and so on.

Images, on the other hand, have a varying degree of expressiveness. An image is indeed worth a thousand words, but only when there’s an image to describe your message. How can one describe an abstract concept like Entropy, Human Rights or Wabi-sabi with an image? Let’s look at a brief description of these concepts from Wikipedia:

“Entropy is a thermodynamic quantity representing the unavailability of a system’s thermal energy for conversion into mechanical work, often interpreted as the degree of disorder or randomness in the system.”

“Human rights are moral principles or norms that describe certain standards of human behaviour, and are regularly protected as legal rights in national and international law.”

“Wabi-sabi (侘寂) represents a comprehensive Japanese world view or aesthetic centered on the acceptance of transience and imperfection.”

It almost hurts to try and visualize any of these concepts with an image, but the textual description does a pretty good job of giving you a general idea about them, to say the least.

While text does a much better job of describing concepts and expressing thoughts or feelings, images are sometimes easier and more efficient to produce, and in some cases they are a more appropriate and effective way to express yourself.

Question: How many of your friends have written an essay about their newborn’s first smile? How many have posted a picture or video to Facebook?

The truth is, you can capture a lot of the simple facts and events in your surroundings with a single image, and with minimal effort compared to text. Simply point your phone’s camera at something, and with a single tap you’ve captured and shared the moment with the entire world.

Text and Image Analysis

While text and images differ in many ways and can exist independently, they are in fact complementary, non-competing communication mediums, and to get a holistic view of the world we need to analyze both. Understanding images is as important as understanding text, as together they provide a more accurate picture of reality.

From an AI perspective, this means we need hybrid systems that are not only capable of understanding both mediums, but are also able to discover links between the two and leverage those links to enhance the overall performance and accuracy of such analysis systems.

Those of you who read our blog regularly know that we are a Text Analysis company; at a higher level, however, we are an AI company, and therefore have a strong interest in how complementary AI solutions such as Image Analysis can be used to give us a better understanding of the real world. In particular, we have been interested in how Text Analysis and Image Analysis could be married to improve the insight gathered from content that’s produced on the Internet.

To put some of our ideas into practice, we started by collecting over 150,000 news articles from about 50 major news outlets. We wanted to see if there’s a strong link or correlation between the text of an article and the images used in it. For each of these articles, we extracted the article’s text as well as its main images. Next, we analyzed the text of each article using our Text Analysis API to find the high-level category of the article (e.g. Technology, Sports, Food, etc.) as well as specific concepts and topics mentioned in the articles (e.g. People, Places, Organizations, etc.).

The images accompanying the text were then analyzed using Imagga’s Tagging API, which, for any given image, provides a set of tags describing the objects seen in the image. The analysis was performed independently, so when we were analyzing the text we had no information about the images, and vice versa.

Findings

What we discovered wasn’t exactly ground-breaking, but it did prove a few theories we had and affirmed a connection between images and text, and how both can be used to improve insights gathered from an analysis point of view.

Categorization of Articles

As mentioned above, we tagged each image as part of a particular category and cross-referenced it with the categories that were identified in the text, to try and uncover similarities and links between the two.

For instance, for the ABC News article titled “Kate Hudson Shows Off Her Amazing Abs” we got “people – celebrity” as the main category of the text, and the following image as the main image of the article:

Running this image through Imagga’s Tagging API gives us the following set of tags, along with the confidence score for each tag:

attractive (28%), model (25%), portrait (24%), pretty (23%), hair, sexy, person, adult, face, blond, caucasian, body, people, fashion, lady, cute, women, smile, glamour, happy, sensual, studio, smiling, human, blonde, lingerie, clothing, erotic, expression, lifestyle, looking, slim, style, fun, gorgeous, healthy, light brown, orange, skin, grey, black, red, pink.

What we’re doing below is finding the most confidently tagged images for each major category, and creating a mosaic out of those images:

Text Category: Celebrity

Text Category: Sports

Text Category: Health

Text Category: Politics

Text Category: Technology

What we see here is a strong link between the high-level category of the text of an article, and the main image used in it.

Concept Extraction

We also went a step further and extracted concepts and entities mentioned in these articles, such as notable people, places, organizations, general concepts and so on, and we looked for a link between these and the tags assigned to the main image of the article.

Here we observed that, for the most part, there’s a strong association between people, organizations and brands mentioned in an article and the images that accompany them.

However, this is not the case for some types of entities, such as places, or for more abstract concepts such as Human Rights. This means that when you’re talking about a person, you’re more likely to use an image of a person, but when you’re talking about a city, you might use any kind of imagery – as shown in the contrast of the mosaics for Apple Inc. and Obama vs New York City and Human Rights:

Concept mentioned in Text: Apple Inc.

Concept mentioned in Text: Barack Obama

Concept mentioned in Text: New York City

Concept mentioned in Text: Human Rights


So what is this all about, and what are some of the ways we can use text and image analysis together?

Hybrid Categorization

When classifying a document such as an article, we can improve the classification accuracy by analyzing text and images simultaneously. Moreover, images are in some cases more universal than text, which can be written in various languages; in cases where we can’t analyze the text properly, hybrid analysis could allow us to rely on images for categorization.
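
A toy sketch of that idea: blend the category confidences produced by a text classifier and an image tagger into a single ranking (all of the scores and the 0.7/0.3 weighting below are invented for illustration):

# Toy sketch: combine text and image category scores into one ranking.
# All scores and the 0.7/0.3 weighting are invented for illustration.
text_scores = {"celebrity": 0.80, "sports": 0.10}    # from a text classifier
image_scores = {"celebrity": 0.60, "fashion": 0.30}  # from an image tagger

def blend(text_scores, image_scores, text_weight=0.7):
    combined = {}
    for category in set(text_scores) | set(image_scores):
        combined[category] = (text_weight * text_scores.get(category, 0.0)
                              + (1 - text_weight) * image_scores.get(category, 0.0))
    return combined

combined = blend(text_scores, image_scores)
print(max(combined, key=combined.get))  # 'celebrity'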

Named Entity Disambiguation

As mentioned above, text can be ambiguous: when we say “apple”, are we referring to the company or the fruit? That’s the problem Named Entity Disambiguation tries to solve. But it relies on textual clues to do so; for instance, if there’s a mention of Steve Jobs in the same article, we’re probably referring to the company.

But what if there’s not enough textual context (say, in a tweet) to provide those clues? We need to look elsewhere, and thankfully a lot of digital content such as articles, tweets and comments is accompanied by an image which, if analyzed, can also provide contextual clues. As an example, compare the set of images we had for Apple Inc. with the images below, which Imagga confidently tagged as containing fruit:

Images confidently tagged as fruit

So in the Apple Inc. vs apple scenario, if our tweet contains an image that is more similar to the images above than to the ones about Apple Inc., we can confidently mark the mention of “apple” as a fruit, and not a company.
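
Expressed as a toy decision rule (the similarity scores are invented; in practice they would come from comparing the tweet’s image against the two reference sets with an image-similarity model):

# Toy decision rule for the "apple" example. The similarity numbers are
# invented; a real system would compute them with an image-similarity model.
def disambiguate_apple(similarity_to_company_images, similarity_to_fruit_images):
    if similarity_to_fruit_images > similarity_to_company_images:
        return "apple (the fruit)"
    return "Apple Inc. (the company)"

print(disambiguate_apple(0.21, 0.78))  # -> 'apple (the fruit)'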

We’re looking forward to seeing what more we can do by combining text and image analysis, and how a hybrid approach can uncover greater insight from content online.











For anyone who doesn’t know, the WebSummit is an annual tech conference held in Dublin, Ireland. For three days in November, tech junkies, media personnel, startups, tech giants, founders and even school kids all descend on the RDS to learn from the best, secure investment or land that perfect job. It truly is a geek’s networking heaven.

Apart from WiFi issues and long queues, this year’s WebSummit was a huge success. Tipping over 22,000 attendees, its biggest yet, the WebSummit has gone from strength to strength since starting out as a 500-person meetup for the local technology community.


Before the WebSummit, which we attended and thoroughly enjoyed, we decided that, along with countless other analytics companies, we would put together a blog post on data gathered from Twitter over the course of the 3 or 4 days.



We tracked the official hashtags of the WebSummit (#WebSummit, #WebSummit14 and #WebSummit2014) and also gathered data on a dozen or so of the speakers, listed below (some for different reasons than others):

12 Speakers

  • Paddy Cosgrave
  • Drew Houston
  • Bono
  • Mark Pincus
  • John Collison
  • Mikkel Svane
  • Phil Libin
  • Eva Longoria
  • David Goldberg
  • David Karp
  • Lew Cirne
  • Peter Thiel

Using Twitter’s Streaming API, we collected about 77,300 Tweets in total from 10am Nov 3 to 10am Nov 7 (times in GMT). We set it to monitor the hashtags and users mentioned above.

Once we had gathered the Tweets, we used AYLIEN Text Analysis API from within RapidMiner to analyze the sentiment of the Tweets. Following the analysis, we visualized the data in Tableau. You can read more about RapidMiner and AYLIEN Text Analysis API here.

Volume

While the activity was quite constant over the three days, you can see three major spikes in the volume of Tweets, one for each day. The drop in volume as each day progressed makes it pretty clear that people were enjoying themselves too much at the Night Summit to be tweeting from the pub. There was also a pretty evident dip in activity during lunch, which suggests we all enjoyed the food or the networking opportunities the lunch break provided.

The second graph below shows the volume of Tweets that mention one of the speakers we were monitoring. You can clearly see spikes in volume when they hit the stage to speak. Tweets mentioning Paddy Cosgrave, WebSummit’s founder, stayed pretty constant throughout. Surprisingly, the most talked-about speaker at this technology conference wasn’t the founder of Dropbox or even Peter Thiel; it was Eva Longoria, the star of Desperate Housewives! Bono came in second and Peter Thiel was the third most mentioned speaker. It turns out even tech geeks have a thing for celebrities.


Geo-location and Language

We utilized the location data returned via the Twitter API to map where most of the activity was coming from. Not surprisingly, chatter was mainly concentrated in Dublin. What was surprising is how little activity was coming from the US.

Tweets from and about the Summit were predominantly written in English. Considering there were companies and attendees from all over the world, we expected more multi-lingual data and were surprised by the lack of Tweets in other languages.


Sentiment/Polarity

We hoped to get a feel for people’s reactions to the event by mining and analyzing the voice of attendees through their expressions and activity on Twitter. Overall, the sentiment of the event was quite positive; however, some negative trends crept in throughout the event. People were most vocal about the lack of a good connection and the queues for the food.

We analyzed all the positive and negative Tweets and created a word cloud for each by extracting the keywords mentioned in those Tweets. This gave a pretty clear understanding of what people liked and disliked about the event.

Note: you can hover over the circles in the word clouds to see any words that aren’t displayed.


The Good

  • Attendees used words like “Great”, “Love” and “Amazing” to describe the conference
  • They also really enjoyed their time in “Dublin”
  • People loved the main stage
  • The event lived up to its networking promises, as attendees had positive things to say about the “people” they met

The Bad

  • “Bad” was a common word used in negative descriptions of the event, as was “critics” and, surprisingly, “people”
  • The “wifi” certainly featured as a negative topic, as did the queues
  • The RDS (event holders) took a bit of a hit for not providing adequate wifi

Some words and terms were evident in both positive and negative Tweets. The jury was out on Eva Longoria’s attendance, and it’s pretty obvious the public is still undecided on what to make of Bono.

The WiFi (The “Ugly”)

Considering it was a tech event, you would presume connectivity would be a given. That wasn’t the case: there was a strong reaction to the lack of a WiFi signal. At an event that gets 20,000+ tech heads into one room, each with a minimum of two devices, ensuring the ability to stay connected was always going to be a challenge.

The initial reaction to the WiFi issues was evident in the sharp drop in the polarity of Tweets, and each day it certainly had an effect on the overall sentiment of the event. However, at the close of the event, the polarity had returned to where it started, as people wrapped up their WebSummit experience in mostly positive Tweets. Perhaps the lack of connectivity also meant that a lot of the attendees didn’t even get the option to vent their frustrations online.

We really enjoyed our time at the Summit, met some great people and companies, and learned a lot from some of the excellent speakers. We’re looking forward to next year’s Summit already!











At AYLIEN, we do our best to make sure our users get up and running and calling our API in the shortest time possible. As part of a new initiative, we are going to be sharing use case ideas, source code and fully functional apps to help you get the most out of our API. For this edition of the blog, we are going to focus on a pretty common use case that a lot of users want to use our API for: analyzing Tweets.

There is a wealth of insight that can be extracted from Tweets. You can read more on analyzing Tweets and social data in our previous blog: “Why is Sentiment Analysis important from a business perspective”.

Today, we’re going to provide you with the source code for a functioning app that mines Twitter for keywords, extracts Tweets and analyzes the text in the Tweets. As part of the process, we’ll run two analysis endpoints on each Tweet: Sentiment Analysis on all of the Tweets, and Hashtag Suggestions on Tweets that contain a URL.

What you’ll need to get going:

  • Twitter API access: Get your api_key, consumer_secret key and access token to make calls to the Twitter API here.
  • Node.js running on your machine: If you don’t already have Node.js on your machine, you can download it here.
  • Twit: A Twitter API client library for Node.js. You can download it here.
  • A text editor: You can use any editor; we recommend Sublime Text, which you can download here.
  • AYLIEN Text Analysis API access: Get your Application ID and Application Key here. See our ‘getting started’ blog for details on how to sign up.

     

Overview of the code in action

To give an overview of what can be achieved, we will first look at the code in action. The complete code snippet is given at the end of this blog for you to copy and paste.

Step 1. Set up your Environment

Ensure Node.js is running on your machine, download the Twit client library from GitHub, get access to the Twitter API and, finally, open an AYLIEN Text Analysis API account.

Step 2. Copy the Code

Open your text editor, copy and paste the code snippet (provided at the bottom of this blog) and save the file as tweetsentiment.js. Next, open a command prompt and navigate to the folder where you saved the code snippet.

The application takes two command line parameters which you choose: a keyword for the Twitter query and the number of Tweets the query should return.

Step 3. Run your Code

Run the code by typing "node tweetsentiment.js websummit 3". In this case we are querying the keyword "websummit" and asking for 3 Tweets to be returned.

Once the Tweets are returned by the Twitter API, they are fed to AYLIEN Text Analysis API, where the polarity will be determined and where the optimal hashtags for URLs will be generated.

Note: Ensure you replace the YOUR_APP_ID and YOUR_APP_KEY placeholders in the code with the application id and application key which you received when you signed up for the AYLIEN API. You will also need to fill in your specific Twitter API credentials that you received from Twitter. All going well, you should see output on the command line similar to that shown below.

Results:

    Tweet Text           :  RT @IndoBusiness: The #WebSummit is drawing to a close. #Bono up soon. Watch live here: http://t.co/onpdYvoIy4 Or follow our blog here: htt.
    Sentiment Polarity   :  neutral
    Polarity Confidence  :  0.9702560119839743
    Hashtags :  [
      '#RyderCup',
      '#PeterThiel',
      '#Davos',
      '#AdrianGrenier',
      '#PaulMcGinley',
      '#FoundersFund'
    ]
    
    Tweet Text           :  RT @FierceClever: I swear to god, if I hear the word "di
    srupt" one more time... #websummit
    Sentiment Polarity   :  negative
    Polarity Confidence  :  0.8947368421052632
    Hashtags :  No Hashtags available as no URL specified in the Tweet
    
    Tweet Text           :  Having a great day at the #websummit . Were at stand ECM 243 in the village if anyone would like to pop over before closing!
    Sentiment Polarity   :  positive
    Polarity Confidence  :  0.9230769230769231
    Hashtags :  No Hashtags available as no URL specified in the Tweet
    

Taking the first result as an example, you can see that the Tweet itself is displayed, followed by the “sentiment polarity” of the Tweet (positive, neutral or negative) and the “polarity confidence”, i.e. the confidence that the returned sentiment is correct (a number between 0 and 1). Finally, if the Tweet contains an embedded URL, a list of optimal hashtags is generated for that webpage/article.

How the code works

It’s worth looking at the two parts of the solution that do most of the heavy lifting:

1. Querying Twitter

Querying Twitter is very straightforward using the Twit client and requires just one call:

    T.get('search/tweets', {
    	q: process.argv[2],
    	count: process.argv[3]
    }, function(err, data, response) {
      data.statuses.forEach(function(s) {
    ...

The above call uses the supplied command-line arguments to query Twitter; it then passes the returned results one by one to the function that feeds the body of each Tweet, and the embedded URL (if any), to the AYLIEN API endpoints for analysis.

2. Analyzing the Tweets that are returned

The function below takes the following arguments: the AYLIEN endpoint to call (Sentiment, Hashtags, Entities, etc.), the parameters the endpoint should work on (i.e. we indicate whether we are passing a piece of text or a URL for analysis, and we also pass the actual text or URL), and a callback function to call when the analysis is complete.

    function call_api(endpoint, parameters, callback) {
      var postData = querystring.stringify(parameters);
      var request = https.request({
        host: 'api.aylien.com',
        path: '/api/v1/' + endpoint,
        headers: {
          'Accept':                             'application/json',
          'Content-Type':                       'application/x-www-form-urlencoded',
          'Content-Length':                     postData.length,
          'X-AYLIEN-TextAPI-Application-ID':    APPLICATION_ID,
          'X-AYLIEN-TextAPI-Application-Key':   APPLICATION_KEY,
        }
      }, function(response) {
        var data = "";
        response.on('data', function(chunk) {
          data += chunk;
        });
        response.on('end', function() {
          callback(JSON.parse(data));
        });
      });
      request.write(postData);
      request.end();
    }

The examples we have used analyze the Tweets for Sentiment and Hashtag Suggestions. It’s up to you which endpoints you wish to use; maybe you want to extract entities or concepts from the Tweets as well. A full list of our endpoints can be found in our documentation.

The Code Snippet

    var Twit = require('./node_modules/twit')  //Twitter API client library
    var https = require('https'), querystring = require('querystring');
    
    //AYLIEN API Credentials
    const APPLICATION_KEY = 'YOUR_APP_KEY',
          APPLICATION_ID = 'YOUR_APP_ID';
    
    
    //Twitter API Credentials
    var T = new Twit({
        consumer_key:         'YOUR_TWITTER_CONSUMER_KEY'
      , consumer_secret:      'YOUR_TWITTER_CONSUMER_SECRET'
      , access_token:         'YOUR_TWITTER_ACCESS_TOKEN'
      , access_token_secret:  'YOUR_TWITTER_ACCESS_TOKEN_SECRET'
    })
    
    
    var analysisResults = {};
    var parameters;
    var i = process.argv[3] * 2; // Counter to track when the asynchronous API calls have completed
    
    console.log("Processing your request. Please wait...")
    console.log("n");
    
    T.get('search/tweets', { q: process.argv[2], count: process.argv[3] }, function(err, data, response) {
      data.statuses.forEach(function(s) {
        var returnedUrls = s.entities.urls;
        analysisResults[s.id] = {};
        analysisResults[s.id].text = s.text;
      	parameters = {'text': s.text};
        callAylienAPIs(parameters,outputResults);
    
        function callAylienAPIs(parameters, callback) {
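          // Each Tweet produces two asynchronous results: a sentiment call,
          // plus either a hashtags call (when the Tweet contains a URL) or a
          // placeholder. The shared counter i is decremented as each result
          // arrives, and the callback prints everything once it reaches zero.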
    
          call_api('sentiment', parameters, function(result) {
            var a = {};
            a.endpoint = 'sentiment';
            a.polarity = result.polarity;
            a.polarity_confidence = result.polarity_confidence;
            analysisResults[s.id].sentiment = a;
            i--;
            if (i == 0)  {
              callback();
            }
          })
    
          if (returnedUrls.length > 0 ) {
              var url_parameters = {'url' : returnedUrls[0].expanded_url };
              call_api('hashtags', url_parameters, function(result) {
                var a = {};
                a.endpoint = 'hashtags';
                a.hashtags = result.hashtags;
                analysisResults[s.id].hashtags = a;
                i--;
                if (i == 0)  {
                  callback();
                }
              })
            } else {
                var a = {};
                a.endpoint = 'hashtags';
                a.hashtags = 'No Hashtags available as no URL specified in the Tweet';
                analysisResults[s.id].hashtags = a;
                i--;
                if (i == 0)  {
                  callback();
                }
            }
        }
      });
    
    })
    
    
    function outputResults() {
    
        for (var key in analysisResults) {
        console.log("Tweet Text           : ", analysisResults[key].text);
        console.log("Sentiment Polarity   : ", analysisResults[key].sentiment.polarity);
        console.log("Polarity Confidence  : ", analysisResults[key].sentiment.polarity_confidence);
        console.log("Hashtags : ", analysisResults[key].hashtags.hashtags);
        console.log("n");
        }
    }
    
    
    
    
    function call_api(endpoint, parameters, callback) {
      var postData = querystring.stringify(parameters);
      var request = https.request({
        host: 'api.aylien.com',
        path: '/api/v1/' + endpoint,
        headers: {
          'Accept':                             'application/json',
          'Content-Type':                       'application/x-www-form-urlencoded',
          'Content-Length':                     postData.length,
          'X-AYLIEN-TextAPI-Application-ID':    APPLICATION_ID,
          'X-AYLIEN-TextAPI-Application-Key':   APPLICATION_KEY,
        }
      }, function(response) {
        var data = "";
        response.on('data', function(chunk) {
          data += chunk;
        });
        response.on('end', function() {
          callback(JSON.parse(data));
        });
      });
      request.write(postData);
      request.end();
    }








