
Introduction

There is a wealth of information in a LinkedIn profile. You can tell a lot about someone and how well they are suited to a role by analyzing their profile on LinkedIn, and let’s face it, LinkedIn is the number one platform for showcasing yourself to potential employers and recruiters.

However, there are a number of issues that arise when relying on LinkedIn profiles to understand a candidate’s suitability for a role and their current job function.

The Experiment

We set out to find out which section of a LinkedIn profile contains the most insight into an individual’s job function, using Semantic Labeling to try and predict an individual’s job function from the information on their profile.

How did we do it?

We scraped and parsed a number of well-known LinkedIn profiles. Using the information we extracted from each profile, such as keywords, summaries, job titles and skills, we attempted to predict an individual’s job function from each information section, to understand which section best represents an individual’s ability or function.

We started out by choosing 4 general tags or labels for an individual’s profile that would point towards their high-level job function:

    • Information Technology
    • Finance
    • Marketing
    • Operations

 

Using the Semantic Labeling feature to check how closely a tag or label, like Marketing, related to the content of each profile section, we could essentially predict an individual’s actual job function.
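As a rough sketch of how this looks in code, here is the kind of call we make against the Text API’s training-less (unsupervised) classification endpoint, using the aylien_textapi Node.js SDK. The method name unsupervisedClassify, the 'class' parameter and the result shape are assumptions here and should be checked against the SDK documentation; the skills text is purely illustrative.

var AYLIENTextAPI = require('aylien_textapi');

var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_AYLIEN_APP_ID',
  application_key: 'YOUR_AYLIEN_APP_KEY'
});

// Illustrative text from a profile's Skills section, plus our four candidate labels.
var skillsText = 'Inbound Marketing, SEO, Content Strategy, Social Media, Google Analytics';
var labels = ['Information Technology', 'Finance', 'Marketing', 'Operations'];

textapi.unsupervisedClassify({text: skillsText, 'class': labels}, function(error, result) {
  if (error === null) {
    // Assumed result shape: result.classes = [{label: ..., score: ...}, ...], best match first.
    console.log('Predicted job function: ' + result.classes[0].label +
      ' (score: ' + result.classes[0].score + ')');
  }
});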

Our findings are displayed in the sheet embedded below. The first section of the sheet contains the profiles and the information extracted. The yellow section shows the prediction results based on the Skills section, red shows the Summary results and green shows the Job Title results.

When a label/job function is assigned following our analysis it is also accompanied by a confidence score, which indicates how confident we are in the results. This is important to note as we dive into some of the results. The “winning” results with the highest scores are marked in green.

Note:

For this blog, we kept the functions quite general but you can get quite specific as we have with Gregory’s account below.

Scraped information and Results

But what section of a profile provides the most insight?

Content

When analyzing a LinkedIn profile, or even using the search feature, we primarily focus on keywords mentioned in the content of that profile: educational institutions, companies and technologies, for example.

Relying on keywords can often cause problems: there is a huge amount of latent information in a profile that is overlooked when scanning for keywords alone. A major problem with keyword search is that it misses related skills. For example, someone might have “Zend Framework” on their profile but not PHP, even though the PHP knowledge is implied, because Zend is a PHP framework. A good recruiter, or someone with programming knowledge, would know this; an average recruiter, however, may not.

The same could be said for someone who mentions Image Processing in their profile: there is no obvious keyword connection to related knowledge such as Facial Recognition. A knowledge base such as Wikipedia, DBpedia or Freebase can be used to discover these latent connections:
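As a rough sketch of that idea, a concept extraction call can link a raw skill phrase to knowledge-base entries (DBpedia URIs), which can then be followed to related resources such as PHP for Zend Framework. This assumes the aylien_textapi Node.js SDK exposes a concepts method and that its result is keyed by DBpedia URI; both are assumptions to verify against the SDK documentation.

var AYLIENTextAPI = require('aylien_textapi');

var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_AYLIEN_APP_ID',
  application_key: 'YOUR_AYLIEN_APP_KEY'
});

// Link raw skill phrases to knowledge-base entries. A concept such as
// dbpedia.org/resource/Zend_Framework can then be followed within DBpedia
// to related resources (e.g. PHP), surfacing skills the profile never states.
textapi.concepts({text: 'Zend Framework, Image Processing'}, function(error, result) {
  if (error === null) {
    // Assumed result shape: result.concepts keyed by DBpedia URI.
    console.log(Object.keys(result.concepts));
  }
});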

Job Titles

Relying on job titles can also cause problems. They can be inaccurate, misleading or even made up, especially today, as people often create their own titles. Take Rand Fishkin’s profile on LinkedIn as an example: unless you know of MOZ and Rand’s wizardry, you would have no idea he is at the forefront of Inbound, Social and SEO.

 

Another good example is the profile of Dharmesh Shah, founder of HubSpot. Dharmesh’s title is Founder and CTO of HubSpot. Running our analysis on the extracted title returns Information Technology as his job function, with a score of .016, which is somewhat accurate. However, running the same analysis on his Skills section gives a far more accurate result, suggesting Dharmesh is actually a marketer, with a “winning” score of .23.

Summaries

A profile Summary can be quite insightful and can provide a strong understanding of someone’s ability and function. The problem is that summaries aren’t always present, or they contain very little information, causing them to be overlooked or disregarded, as was the case with many of the example profiles we used.

The profiles that did have a detailed summary provided some strong results, with Rand Fishkin’s summary returning an accurate label of Marketing with a score of .188.

There was one section that outperformed the others when providing relevant tags and confidence scores.

Skills

The Skills section on a LinkedIn profile is a gold mine of insight. Based on the information extracted from the skills section, we could more accurately predict an individual’s job function.

Comparing the results and labels assigned across all the information sections and on every profile we used, the Skills section produced the most accurate relationships and the highest confidence scores, which can be seen marked green in the sheets above.

Conclusion

We don’t have an exact science or formula for deciding whether a label is accurate or not. However, our experiment still does a good job of highlighting that far more information and insight can be gleaned from the Skills section of a LinkedIn profile when deciding, at first glance or automatically, how well a candidate is suited to a particular job function. We will explore these ideas in future posts.

 


Semantic Labeling is a very popular feature with our Text API users, so we’ve decided to roll it out as a fully functional Text Analysis Add-on feature too.

For this blog we’re going to walk you through what it does and use some examples to showcase how useful it can be for classifying or categorizing text.

So what exactly is Semantic Labeling?

It’s an intelligent way of tagging or categorizing text based on labels that you suggest. It’s a training-less approach to classification, which means it doesn’t rely on a predefined taxonomy to categorize or tag textual content.

With Semantic Labeling you can provide a piece of text, specify a set of labels and the add-on will automatically assign the most appropriate label to that text. This gives add-on users greater flexibility to decide how they want to tag and categorize text in their spreadsheets.

Our customers are using this feature for a variety of different use cases. We’ll walk you through a couple of simple ones, to show you the feature in action.

Text Classification from a URL

Say, for example, I run a sports blog and I want to automatically curate and categorize lots of articles/URLs into the predefined categories I cover on my blog and list them in a spreadsheet.

Any of the features in the add-on can be used to analyze a URL. Just choose the cell containing that URL in your spreadsheet and hit analyze. Using the Semantic Labeling feature is a little different because you need to also submit your candidate labels through the Text Analysis Add-on sidebar.

Once you choose Semantic Labeling, you’ll notice 5 label options will populate on the right. This is where you enter your categories or labels. In this case, we’re going to use the following URL and Labels.

Example URL:

http://insider.espn.go.com/nfl/story/_/id/12300361/bold-move-new-england-patriots-miami-dolphins-new-york-jets-buffalo-bills-nfl

Labels:

  • Golf
  • Football
  • Soccer
  • Hockey
  • Cricket
     
Once you’ve selected the cell that you want to analyze and you’ve entered your labels, just hit analyze.

The add-on will then populate the results in the next 5-10 cells in that row, as in the example below.

In this case, the add-on chose “Football” as the label most closely related to the article on that webpage. The add-on also displays a confidence score showing which label is “the winner”.

As you can see from the screenshot of the article below, it did a pretty nice job of recognizing that the article had nothing to do with soccer or golf and was primarily about Football.

Article Screenshot:
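If you’d rather call the underlying Text API directly than use the add-on, the same classification can be sketched in Node.js. This assumes the aylien_textapi SDK exposes the training-less classification endpoint as unsupervisedClassify and returns a classes array of label/score pairs; both the method name and the result shape are assumptions to check against the SDK documentation.

var AYLIENTextAPI = require('aylien_textapi');

var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_AYLIEN_APP_ID',
  application_key: 'YOUR_AYLIEN_APP_KEY'
});

// Candidate labels, exactly as entered in the add-on sidebar.
var labels = ['Golf', 'Football', 'Soccer', 'Hockey', 'Cricket'];

textapi.unsupervisedClassify({
  url: 'http://insider.espn.go.com/nfl/story/_/id/12300361/bold-move-new-england-patriots-miami-dolphins-new-york-jets-buffalo-bills-nfl',
  'class': labels
}, function(error, result) {
  if (error === null) {
    // Assumed result shape: result.classes sorted by score, highest first.
    console.log('Winning label: ' + result.classes[0].label +
      ' (score: ' + result.classes[0].score + ')');
  }
});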

Customer Query Routing

We’ve also seen our users analyze social interactions like Tweets, Facebook comments and even Email to try and intelligently understand and tag them without the need for manual reading.

So, let’s say we want to automatically determine whether a post on social media should be routed to and dealt with by our Sales, Support or Finance Departments.

We’ll use 2 different Tweets that could be handled by different teams within a business and use the different department titles as labels.

Labels:

  • Sales
  • Finance
  • Support

Tweets:
  • “Are you guys down? I can’t access my account?”
  • “Who do I get in touch with if I want to purchase your software?”
     
Again, choose the cells you want to analyze that contain your Tweets, add your candidate labels in the sidebar and hit analyze.

The add-on, as shown in the previous example, will populate its results in the next few cells, showing the most appropriate label first along with its score.

Again, the add-on was pretty accurate in assigning the correct label to each Tweet. The first Tweet was tagged as most relevant to Support and the second was most appropriately referred to the Sales department.
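For anyone routing messages outside a spreadsheet, the same idea can be sketched against the underlying Text API. As before, this assumes the aylien_textapi SDK exposes the training-less classification endpoint as unsupervisedClassify and returns a classes array sorted by score; verify the names against the SDK docs.

var AYLIENTextAPI = require('aylien_textapi');

var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_AYLIEN_APP_ID',
  application_key: 'YOUR_AYLIEN_APP_KEY'
});

var departments = ['Sales', 'Finance', 'Support'];
var tweets = [
  "Are you guys down? I can't access my account?",
  "Who do I get in touch with if I want to purchase your software?"
];

tweets.forEach(function(tweet) {
  textapi.unsupervisedClassify({text: tweet, 'class': departments}, function(error, result) {
    if (error === null) {
      // Assumed result shape: result.classes sorted by score, highest first.
      console.log('"' + tweet + '" -> route to ' + result.classes[0].label);
    }
  });
});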

This feature allows you to analyze and categorize long- and short-form text based on your own labels or tags. You can submit between 2 and 5 labels to the add-on and it will return the most semantically relevant tag as well as a confidence score.










  • This blog is an adaptation of a talk, “Computer intelligence”, delivered by our founder Parsa Ghaffari (@parsaghaffari) and Kevin Koidl (@koidl), Ph.D., M.Sc. (Dipl. Wirtsch. Inf. TU), a research fellow at Trinity College Dublin and founder of Wripl.
  • The talk is a discussion of how computer, or artificial, intelligence works, its applications in the industry and the challenges it presents. You can watch the original video here.

Introduction

Artificial intelligence, or Computer intelligence [1], is hot in the tech scene right now. It’s a high priority for tech giants like Google and Facebook, journalists are writing about how it will take our jobs and our lives, and it’s even hot in Hollywood (although mostly in the technophobic fashion typical of 21st-century Hollywood).

In the industry, all of a sudden AI is everywhere and it almost looks like we’re ready to replace Marc Andreessen’s famous “software is eating the world” with “AI is eating the world”.

But what exactly are we talking about when we refer to Artificial or Computer intelligence?

AI could be defined as the science and engineering of making intelligent computers and computer programs. Since we don’t have a solid definition of intelligence that is not relative to human intelligence, we can define it as the ability to learn or understand things or to deal with new or difficult situations. We also know what computers are: they’re essentially machines that are programmed to carry out specific tasks. So Computer Intelligence could be seen as a combination of these two concepts: an algorithmic approach to mimicking human intelligence.

Two branches of AI

Back in the 60s, AI got to a point where it could actually do things, and that created a new branch of AI that was more practical and more pragmatic, which, as a result, was eventually adopted and pioneered by industry. The new branch (which we call Narrow AI in this article) had different optimization goals and success metrics compared to the original, now called General AI.

 


General AI

If your goal was to predict what’s going to happen next in the room where you’re sitting, one option would be to consult with a physicist who would probably take an analytical approach and use well known equations from Thermodynamics, Electromagnetism and Newtonian Physics to predict the next state of the room.

A fundamentally different approach that doesn’t require a physicist’s involvement would be to set up as many sensors as possible (think video cameras, microphones, thermometers, etc) to capture and feed all the data from the room to a Super Computer, which then runs some form of probabilistic modelling to predict the next state.

The results you get from the second approach would be far more accurate than the ones produced by the physicist. However, with the second approach you do not really understand why things are the way they are, and that’s what General AI is all about: understanding how things such as language, cognition and vision work and how they can be replicated.

Narrow AI

Narrow AI is a more focused application of Computer Intelligence that aims to solve a specific problem and is driven by industry, economics and results. Common use cases you will certainly have heard of include Siri on your iPhone and self-driving cars.

While Siri can be seen as an AI application, that doesn’t mean that the intelligence behind Siri can also power a self-driving car. The AI behind each is very different; one can’t do the other.

It’s also true that with Narrow AI the intelligence works by crunching information under set conditions for economic outputs. Siri, for example, can only answer certain questions: questions she has the answer to, or can retrieve an answer to by referencing a database.

Challenges of AI

Technical Challenges

As human beings, understanding visual and lingual information comes to us naturally: we read a piece of text and we can extract meaning, intent, feelings and information; we look at a picture and we identify objects, colours, people and places.

However, for machines it’s not that easy. Take this sentence, for instance: “I made her duck”. It’s a pretty straightforward sentence, but it has multiple meanings. There are actually four potential meanings for that short sentence:

  • I cooked her some duck
  • I forced her to duck
  • I made her duck (the duck belonged to her)
  • I made her duck (made her a duck out of wood, for example)

When we interpret text we rely on prompts, either syntax indicators or just context, that help us predict the meaning of a sentence, but teaching a machine to do this is a lot harder. There is a lot of ambiguity in language that makes it extremely hard for machines to understand text or language in general.

 


 

The same can be said for an image or picture, or visual information in general. As humans, we can pick up and recognise certain things in an image within a matter of seconds: we know there is a man and a dog in a picture, we recognise colours and even brands, but it takes an intelligent machine to do the same.

 


 

Philosophical Challenges

One of the main arguments against AI’s success is that we don’t have a good understanding of human intelligence, and, therefore, are not able to fully replicate it. A convincing counter-argument, pioneered by the likes of Ray Kurzweil, is that intelligence or consciousness is an emergent property of the comparatively simpler building blocks of our brains (Neurons) and to replicate a brain or to create intelligence, all we need to do is to understand, decode and replicate these building blocks.

Ethical Challenges

Imagine you’re in a self-driving car and it’s taking you over a narrow bridge. Suddenly a person appears in front of the car (say, due to losing balance) and to avoid hitting that person the AI must take a sharp turn which will result in the car falling off the bridge. If you hit the person, they will die and if you fall off the bridge you will get killed.

One solution is for the AI to predict who’s more “valuable” and make a decision based on that. So it would factor in things like age, job status, family status and so on and boil it down to a numerical comparison between you and the other person’s “worth”. But how accurate would that be? Or would you ever buy a self-driving car that has a chance of killing you?

Conclusion

While some serious challenges in AI remain open, industry and the enterprise have latched on to the benefits that AI technologies like Natural Language Processing, Image Recognition and Machine Learning can bring to a variety of problems and applications.

One thing can be said for certain, and it’s that AI has left the science and research labs and is powering developments in health, business and media. Industry has recognised the potential of Narrow AI and how it can change, enhance and optimize the way we approach problems and tasks as human beings.


[1] The border between AI and human intelligence is getting blurred, therefore eventually we might get to a point where intelligent behaviour manifested by a machine can no longer be labeled as “artificial”. In that case, Computer Intelligence would be better suited. That said, we use the terms Computer Intelligence and Artificial Intelligence interchangeably in this article.

 


Watch our founder and CEO, Parsa Ghaffari (@parsaghaffari), and Kevin Koidl (@koidl), a Research Fellow at Trinity College Dublin’s Department of Computer Science and the ADAPT research centre, discuss Computer Intelligence in a recent talk they gave at Science Gallery, Dublin.

This interactive discussion takes you from General AI right through to the modern-day applications of Narrow AI, paying particular attention to a real-life example, the Bigfoot App, which was created as part of the Lifelogging exhibition currently running at Science Gallery, Dublin.


Unstructured data is information that isn’t organized in a predefined manner or data that doesn’t have a pre-defined model. It is often text heavy, but it can also contain numbers, dates and images. This data does not follow a specified format.

It’s data we encounter and create every day: social posts, news articles, emails, audio files and images are all examples of unstructured data. It’s usually human generated and created for consumption by humans and not machines. It comes with a level of complexity and ambiguity that makes it extremely difficult for machines and computers to understand.

Structured data, on the other hand, is data that can be easily organized and referenced, it usually resides in a table or a relational database. It’s easily categorized and organized according to a predefined schema making it super easy to analyze and understand even through machine interactions.

It’s thought that about 10% of data within an enterprise or organization is structured data. So, what are we missing in that other 90%?

Consider the wealth of information and insight that’s overlooked because it’s just too difficult to analyze without having to sit down and read it.

  • Emails
  • Presentations
  • NPS data
  • Customer Complaints and Queries
  • Images

These all contain valuable information and insights from a business point of view. However, they’re all internal data sources; we haven’t even considered external sources like social media data, photos, press releases and reviews, which hold as much, if not more, valuable information that is often left unexplored from a data analysis point of view.

While unstructured data outnumbers structured data by about 4 to 1, the two often exist in tandem, and the ability to analyze both can provide far more insightful analyses. Take emails, for example. There is structured data in an email that is easy to understand, categorize and reference: the sender, the receiver, the date and time it was sent, and so on. But the body of the email, the actual content, isn’t so easy to understand. It can’t be categorized easily, it doesn’t fit into a predefined model and it requires manual human analysis to be understood.
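As a rough illustration of that split, here is a hypothetical email record: the metadata fields slot straight into a schema, while the body needs something like NLP (classification, summarization, sentiment and so on) before a machine can make sense of it. All field names and values below are made up for illustration.

// Structured: fits a schema, easy to store, filter and query.
var email = {
  from: 'customer@example.com',
  to: 'support@example.com',
  sentAt: '2015-06-22T09:14:00Z',
  // Unstructured: free text, needs NLP before a machine can reason about it.
  body: "Hi, I was charged twice for my subscription last month. " +
        "Can someone look into this and refund the difference?"
};

console.log('Schema fields:', email.from, email.to, email.sentAt);
console.log('Body length (needs analysis):', email.body.length);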

How can we start extracting value from the mountain of unstructured data out there?

These data sources are often overlooked simply because it’s too hard for humans to trawl through them, and basic information retrieval tools and solutions just don’t have the power to understand this lingual and visual information. While it’s easy for a human to look at a piece of text or an image and extract information from it, it’s far from trivial for a machine to do the same.

With modern technology however, we can start making sense of the mountain of data which is primarily unexplored. Artificial Intelligence, Machine Learning, Natural Language Processing and Image Recognition developments are some of the technological advances that are helping make sense of the vast amount of unstructured data available both publicly and within the enterprise.

Machines can understand unstructured information better than ever before. We have made advancements in Artificial Intelligence that have brought human-like capabilities to how machines understand unstructured information.

Understanding Lingual information

 

 

Understanding visual information

 

 

These technologies have been around for quite some time, but it’s only in recent years that they have been popularized and adopted as part of advanced data management strategies.

The adoption of modern data management technologies, cheaper computing power and the media hype on the power of AI have only cemented the fact that we are on our way to being able to make sense of the majority of data out there, which has somewhat been overlooked up until now. It seems we have accepted there are powerful insights in human generated data that doesn’t sit in a database. We have embraced the power of computer intelligence to help us harness that information by adding a depth to data analysis that is impossible to achieve otherwise, but we need a shift in investment and mindset in order to truly take advantage of the “big data” out there.

 









We like to share simple and useful examples of how we use our Text Analysis API to automate everyday tasks and business processes. As a life hack of sorts, we’ve put together a useful script you can use to stay on top of your favorite news sources and content outlets.

If you’re like us, you’ll have a morning routine of checking various news sites, blogs and forums for interesting news or content, either to share or just for personal interest. Trawling through news sites and different outlets can be time-consuming, jumping from site to site and skim-reading articles can be a bit of a pain, and news aggregators often don’t provide the personalization of content they promise.

With this in mind, we decided to hack together a useful script in Node.js to do the hard work for you. This script, available on GitHub, will track an RSS feed or feeds that you specify and email you a personalized daily digest. I know what you’re thinking: another daily digest! But this is different. Using Artificial Intelligence, we’ll automatically categorize your articles and summarize them into a TL;DR, so you’ll know immediately which articles to read, which to share and which to ignore, in no time at all.

At AYLIEN, we use this hack to stay on top of useful and interesting content gathered from a range of different outlets. It means we don’t have to spend time checking up on the various news sites and blogs we curate content from. The content we gather is either used internally or we share it through other channels if we think it’s of interest to our customers and followers.

We’re going to take you through how the script works so that you can customize it to monitor a feed or feeds of your choice.

What you’ll need

For this script to run, you’re going to need the following:

  • A Text Analysis API Key (free)
  • A Mailgun account (free)
  • Node.js
  • An RSS feed or multiple feeds to monitor

Setup

Step 1.

Get your AYLIEN and Mailgun accounts set up and note your AYLIEN_APP_ID, AYLIEN_APP_KEY and MAILGUN_API_KEY, which you’ll need to run the script later.

Step 2.

Find the URL(s) for the feed or feeds you would like to monitor. For this example, we’re going to use http://www.kdnuggets.com/feed, an interesting blog on all things data and analytics.

Step 3.

Copy and paste the JavaScript code given at the end of this blog into your text editor, or grab the code from GitHub.

Step 4.

At the top of the script, there’s a section to fill in your specific AYLIEN and Mailgun credentials, as below:

const AYLIEN_APP_ID = '123456789';
const AYLIEN_APP_KEY = '123456789';
const MAILGUN_API_KEY = '12345678911';
const SENDING_EMAIL_ACCOUNT = 'sending@example.com';
const RECIPIENTS = 'Recipient 1 <recipient1@example.com>';
const NUM_STORRIES_PER_RSS_FEED = 3;
var rssFeeds = ["http://www.kdnuggets.com/feed", "http://news.ycombinator.com/rss"];

Step 5.

Save the script with the name rssToEmail.js or whatever you choose. Next, open command prompt/terminal, navigate to the folder where you saved the script and run it.


node rssToEmail.js

If you’re running the script in a Windows environment, you’ll see output similar to the following.


C:\src\javascript>node rssToEmail.js
Processing rss feed - http://www.kdnuggets.com/feed...
Processing rss feed - http://news.ycombinator.com/rss...
Preparing Digest email...
Success

C:\src\javascript>

Results

Provided your code and setup are running OK, you should now have an email in the inbox you specified titled ‘Your Daily RSS Digest’. The email will resemble the one below:

As you can see in the sample image above, each story will have a Title extracted from the URL, a two-tier classification, a link to the original article and, most importantly, a TL;DR of the article to give you a quick hit of info, which should be enough to decide whether you want to read, share or ignore it.

Try adding Hashtag Suggestions if you’re hoping to use it as a content curation tool. That will include optimal hashtags, along with the other data, for you to use if you’re resharing the content, ensuring you get maximum exposure on social sites.
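As a rough sketch of that extension, the SDK’s hashtag suggestion endpoint could be called for each story link and its output appended to the digest entry. This assumes the aylien_textapi SDK exposes a hashtags method returning a hashtags array; the method name, result shape and example URL are assumptions to check before use.

var AYLIENTextAPI = require('aylien_textapi');

var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_AYLIEN_APP_ID',
  application_key: 'YOUR_AYLIEN_APP_KEY'
});

// Suggest hashtags for an article URL; the returned list could be appended
// to each story in the digest email built by the script below.
textapi.hashtags({url: 'http://example.com/article'}, function(error, result) {
  if (error === null) {
    // Assumed result shape: result.hashtags is an array of strings.
    console.log('Suggested hashtags: ' + result.hashtags.join(' '));
  }
});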

Code Snippet

 

//**************************************
//Configuration details specific to you.
//**************************************
const AYLIEN_APP_ID = 'YOUR_AYLIEN_APP_ID';
const AYLIEN_APP_KEY = 'YOUR_AYLIEN_APP_KEY';
const MAILGUN_API_KEY = 'YOUR_MAILGUN_API_KEY';
const SENDING_EMAIL_ACCOUNT = 'sending@example.com';
const RECIPIENTS = 'Recipient 1 <recipient1@example.com>';
const NUM_STORRIES_PER_RSS_FEED = 3;
var rssFeeds = [
  "http://www.kdnuggets.com/feed",
  "http://news.ycombinator.com/rss"
];

//**************************************
//End of configuration section
//**************************************


var AYLIENTextAPI = require('aylien_textapi'),
  request = require('request'),
  xml2js = require('xml2js'),
  Mailgun = require('mailgun').Mailgun;

var textapi = new AYLIENTextAPI({
  application_id: AYLIEN_APP_ID,
  application_key: AYLIEN_APP_KEY
});

var mg = new Mailgun(MAILGUN_API_KEY);
var emailBody = '';
var numStories = 0;

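// For each RSS feed: fetch the XML, parse it, classify and summarize the
// top NUM_STORRIES_PER_RSS_FEED stories, then send a single digest email
// via Mailgun once every story has been processed.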
rssFeeds.forEach(function(rssFeed) {
  setTimeout(function() {
    request(rssFeed, function(error, response, body) {
      if (error === null) {
        parser.parseString(body);
      } else {
        console.log('rss error : ', error);
      }
    });
    var parser = new xml2js.Parser();
    console.log("Processing rss feed - " + rssFeed + "...");
    parser.addListener('end', function(result) {
      var items = result.rss.channel[0].item;
      items.slice(0, NUM_STORRIES_PER_RSS_FEED).forEach(function(
        item) {
        var title, link;
        title = item.title[0];
        link = item.link[0];
        textapi.classify(link, function(error, result) {
          if (error === null && result.categories[0]) {
            var label = result.categories[0].label;
            var code = result.categories[0].code;
            textapi.summarize({
              url: link,
              sentences_number: 3
            }, function(err, resp) {
              if (err === null) {
                var summary = "";
                for (var i = 0; i < resp.sentences.length; i++) {
                  summary += resp.sentences[i] + " ";
                }
                var story = '\n\nTitle : ' + title +
                  '\n' + 'Classification : ' + label +
                  '\n' + 'Link : ' + link +
                  '\nSummary : ' + summary;
                emailBody += story;
                numStories++;
                if (numStories ==
                  NUM_STORRIES_PER_RSS_FEED * rssFeeds.length
                ) {
                  console.log(
                    "Preparing Digest email...");
                  mg.sendText(SENDING_EMAIL_ACCOUNT, [
                      RECIPIENTS
                    ],
                    'Your Daily RSS Digest',
                    emailBody,
                    'noreply@example.com', {},
                    function(err) {
                      if (err) console.log(
                        'Error sending message to mailgun...: ' +
                        err);
                      else console.log('Success');
                    });
                }
              }
            });
          } else {
            console.log('classify error : ', error);
            if (result && result.categories) {
              console.log('classify error ', result.categories[0]);
            }
            numStories++;
          }
        });
      });
    });
  }, 3000);
});

Natural Language Processing, Artificial Intelligence and Machine Learning are changing how content is discovered, analyzed and shared online. More recently, there has been a push to harness the power of Text Analytics to help understand and distribute content at scale. This is particularly evident with the popularity of recommendation engines and intelligent content analysis technologies like Outbrain and Taboola, who now have a presence on most content focused sites.

Intelligent software and technological advancements allow machines to understand content as a human would. When we read a piece of text, we make certain observations about it. We understand what it’s about, we notice mentions of companies, people, places, we understand concepts present in it, we’re able to categorize it and if needed we could easily summarize it. All because we understand it and we can process it.

Text Analysis and NLP techniques allow machines to do just that: understand and process text. The main difference is that machines can work a lot faster than us humans.

But just how far can machines go in understanding a piece of content?

We won’t dwell too much on how the process works in this post, but instead we’ll focus on what level of understanding a machine can extract from text, using the following news article as an example:

http://www.reuters.com/article/2015/03/06/us-markets-stocks-idUSKBN0M21AP20150306

If you want to read more about how the process works and the different approaches to Text Analysis, you can download our Text Analysis 101 ebook here.

Extracting insight

When we read a news article, we might want to know what concepts it deals with and whether it mentions people, places, dates and so on. We might need to determine whether it’s positive or negative, what the author’s intent is and whether it’s written subjectively or objectively. We do this almost subconsciously when we read text. NLP and AI allow machines to somewhat mimic this process by extracting certain things from text like Keywords, Entities, Concepts and Sentiment.

Entities

Content often contains mentions of people, locations, products, organizations and so on, which we collectively call Named Entities. It can also contain values such as links, telephone numbers, email addresses, currency amounts and percentages. Using statistical analysis and machine learning methods, these entities can be recognized and extracted from text, as shown below.
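As a rough sketch of what this looks like in practice, an entity extraction call can be run on the example article using the aylien_textapi Node.js SDK (the same SDK used in the RSS digest script above). The entities method name and the result shape below are assumptions to verify against the SDK documentation.

var AYLIENTextAPI = require('aylien_textapi');

var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_AYLIEN_APP_ID',
  application_key: 'YOUR_AYLIEN_APP_KEY'
});

var articleUrl = 'http://www.reuters.com/article/2015/03/06/us-markets-stocks-idUSKBN0M21AP20150306';

textapi.entities({url: articleUrl}, function(error, result) {
  if (error === null) {
    // Assumed result shape: result.entities keyed by type,
    // e.g. {organization: [...], person: [...], location: [...]}.
    Object.keys(result.entities).forEach(function(type) {
      console.log(type + ': ' + result.entities[type].join(', '));
    });
  }
});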

 

 

Concepts

Sometimes you wish to find entities and concepts based on information that exists in an external knowledge base, such as Wikipedia. Looking beyond statistical analysis methods and using a Linked Data-aware process, machines can extract concepts from text. This allows for a greater understanding of the topics present in text.

Extracting concepts is a more intelligent and more accurate approach, which gives a deeper understanding of text. These methods of analysis also allow machines to disambiguate terms and make decisions about how they interpret text; for example, whether a mention of “apple” refers to the company or the fruit, as displayed in the results below.
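A minimal sketch of that concept extraction applied to the sample article, again assuming the aylien_textapi SDK exposes a concepts method whose result is keyed by knowledge-base (DBpedia) URI; the names are assumptions to verify against the SDK docs.

var AYLIENTextAPI = require('aylien_textapi');

var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_AYLIEN_APP_ID',
  application_key: 'YOUR_AYLIEN_APP_KEY'
});

textapi.concepts({
  url: 'http://www.reuters.com/article/2015/03/06/us-markets-stocks-idUSKBN0M21AP20150306'
}, function(error, result) {
  if (error === null) {
    // Assumed result shape: result.concepts keyed by DBpedia URI, so an
    // ambiguous surface form like "Apple" resolves to a specific resource.
    console.log(Object.keys(result.concepts));
  }
});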

 

 

All of this information we can glean from text showcases how machines understand it. However, it doesn’t need to stop there; with all of this information it’s possible to go a step further and start categorizing or classifying text.

Classification

Based on the meta-category of an article, we can easily understand what a piece of content is about. As shown in the results below for our sample article, classifying text makes it far easier to understand content at a high level.

Classifying text means it is far easier to manage and sort large numbers of articles or documents without the need for human analysis, which is often time-consuming and inefficient.
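The classify call is the same one used in the RSS digest script above; here is a minimal sketch applied to the sample article. The client setup and placeholder keys are assumptions, and the object-style url parameter mirrors the summarize call in that script.

var AYLIENTextAPI = require('aylien_textapi');

var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_AYLIEN_APP_ID',
  application_key: 'YOUR_AYLIEN_APP_KEY'
});

textapi.classify({
  url: 'http://www.reuters.com/article/2015/03/06/us-markets-stocks-idUSKBN0M21AP20150306'
}, function(error, result) {
  if (error === null && result.categories[0]) {
    // Same result shape as in the RSS digest script: a readable label and a category code.
    console.log(result.categories[0].label + ' (' + result.categories[0].code + ')');
  }
});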

Sentiment Analysis

Machines can even go as far as interpreting an author’s intent by analyzing text. Utilizing modern NLP techniques, machines can determine whether a piece of text is written subjectively or objectively, and whether it’s positive, negative or neutral.
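A minimal sketch of a sentiment call on the sample article, assuming the aylien_textapi SDK exposes a sentiment method returning polarity and subjectivity fields; those names are assumptions to verify against the SDK documentation.

var AYLIENTextAPI = require('aylien_textapi');

var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_AYLIEN_APP_ID',
  application_key: 'YOUR_AYLIEN_APP_KEY'
});

textapi.sentiment({
  url: 'http://www.reuters.com/article/2015/03/06/us-markets-stocks-idUSKBN0M21AP20150306'
}, function(error, result) {
  if (error === null) {
    // Assumed result shape: polarity (positive/negative/neutral) and
    // subjectivity (subjective/objective), each with a confidence score.
    console.log(result.polarity, result.subjectivity);
  }
});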

 

 

It’s also possible to process the insights to create something more, like a summarization of content for example.

Summarization

Sometimes articles and documents are just too long to consume; that’s where intelligent services like automatic text summarization can help. After analyzing a piece of content, it’s possible for machines to extract the key sentences or points conveyed and to display them as a consumable summary, like the one below.
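The summarize call is the same one used in the RSS digest script above; a minimal sketch for the sample article follows, with client setup and placeholder keys assumed.

var AYLIENTextAPI = require('aylien_textapi');

var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_AYLIEN_APP_ID',
  application_key: 'YOUR_AYLIEN_APP_KEY'
});

textapi.summarize({
  url: 'http://www.reuters.com/article/2015/03/06/us-markets-stocks-idUSKBN0M21AP20150306',
  sentences_number: 3
}, function(error, result) {
  if (error === null) {
    // result.sentences is the same array the RSS digest script joins into a TL;DR.
    result.sentences.forEach(function(sentence) {
      console.log('- ' + sentence);
    });
  }
});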

 

 

These are some straightforward examples of the information machines can extract from text as part of the content analysis process. Our next post will deal with what exactly can be done with the information gleaned from text. We’ll look at real-life use cases and examples of how automated Text Analysis is shaping how we deal with content online.

 


The web as we know it today is a supersaturated content network. It is extremely difficult to discover, track, and distribute digital content efficiently and effectively online. The amount of content being created online on a daily basis (text, images and video) and the wide range of channels distributing it, means it has become increasingly difficult to block out the noise and focus on relevant content.

Content online is growing at an alarming rate:

News websites, blogs, recommendation apps, aggregators, social platforms are all battling for readers and for content to be consumed on their platforms. Content has become a massive component of digital marketing strategies, but the sheer volume of content out there means it’s harder than ever to find relevant content.

Blocking out the noise

While it’s not an entirely new concept, Content Analysis, especially applied to digital content, is more relevant today than ever before, because we’re creating so much of it. We need the ability to gather and curate content at scale and uncover what’s relevant. By curate, we mean collecting new and popular content, extracting insight from it and deciding whether it’s relevant to a specific need or just noise.

Traditionally, content analysis was carried out by knowledgeable humans who would manually trawl through hundreds, if not thousands, of pieces of content looking for useful or interesting pieces. Up until recently we relied on keyword search or tagging to make this process easier, but there was still a large amount of time wasted in the curation process.

Things are a little different today

Advancements in and the democratization of Natural Language Processing, Image Recognition and Machine Learning have had a profound effect on how we continue to discover, consume and distribute content on the web.

Content Discovery

Discovering content online is about listening to the web and grabbing relevant pieces of content from a massive number of data sources. Information retrieval, Machine Learning and Semantic Search advancements now make it possible for machines to monitor content at scale. Intelligent systems can now listen to the web and automatically discover and recommend relevant or personalized content. They can learn what content is relevant to a specific need and automatically sift through the noise to uncover what matters, without relying on keywords or constant human supervision.

Analyzing Content

Analyzing content is about extracting insight: understanding content to a human level, extracting topics or concepts and mentions of people, places and brands from text, or knowing that an image contains the face of a man or a view of a sunset. Natural Language Processing allows machines and software to do exactly that: understand content. Another aspect of analyzing content is understanding how that content is consumed and shared: what platforms it has been circulated on, what news sites are covering it, how many likes and retweets it has and who is consuming it.

Whether you’re a publisher, a marketer, a recommendation engine, a news outlet or an advertiser, relying on a team of human agents to discover and analyze content just isn’t good enough anymore. We need to embrace technological advances to work smarter and faster and keep on top of the expanding web of content online today.

 



Disclaimer:

The following press release is fictitious. While it is quite believable that we would acquire four of the biggest companies in the world, we were actually just having some fun with April Fools’.

Press Release, 1/4/2015

AYLIEN, a content analysis company, based out of Dublin, Ireland, announced details today of their strategic acquisitions of a number of tech giants including Apple, IBM, Microsoft and Google.

 

 

The Market

There has been a lot of activity in the Machine Learning, Artificial Intelligence and Natural Language Processing market in recent times, so AYLIEN’s acquisition of four of the biggest players in the technology industry comes as no surprise to tech and financial analysts.

The takeovers add further to the disruption in an already exciting and rapidly growing space. Parsa Ghaffari, AYLIEN’s founder, has been quoted as saying “these acquisitions just made sense”; however, he was unwilling to comment on the commercial details of the takeovers.

The Details

The acquisitions, which have been dubbed #AYLIENINVASION, will have a significant impact on AYLIEN’s vital statistics:

  • AYLIEN’s market cap is now valued at a staggering ~$1610 Trillion
  • Head Count at the 1,000 square foot Dublin HQ will grow from 7 to 660,219 employees
  • The company’s cash reserves will also increase significantly to ~$311 Billion

The Challenge

Ghaffari also spoke about the challenge of integrating all of the disparate technology they have acquired into one “Super Intelligent Enterprise Huge Data Suite”.

“We have clear goals and ideas for what we want to do with the majority of the technology we have acquired. Our main challenge right now is, figuring out what exactly to do with the Microsoft technology… It’s likely we’ll leave it on the shelf.”

One thing he is clear on, however, is the rebranding of the technology to “Spiral Loof”, which Parsa mentioned are the names of his pet tortoise and hamster.
