
Introduction

Welcome to the second installment in our series of monthly posts where we’ll be showcasing our News API by looking back at online news stories, articles and blog posts to uncover emerging insights and trends from topical categories.

For our February review, we looked at three IAB categories: Arts & Entertainment, Science and Politics.

For March, we’ve decided to narrow our focus a little further by looking at IAB subcategories to give you an idea of just how specific and granular you can be when sourcing and analyzing content through the News API. With this in mind, we’ve gone with the following three subcategories:

  1. Cell phones (subcategory of Tech & Computing)
  2. Boxing (subcategory of Sports)
  3. Stocks (subcategory of Personal Finance)

and for each subcategory we have performed the following analyses:

  • Publication volumes over time
  • Top stories
  • Most mentioned topics
  • Most shared stories on social media

Try it yourself

We’ve included code snippets for each of the analyses above so you can follow along or modify them to create your own search queries.

If you haven’t already signed up to our News API, you can do so here with a free 14-day trial.

1. Cell phones

The graph below shows publication volumes in the Cell phones subcategory throughout the month of March 2017.

Note: All visualizations are interactive. Simply hover your cursor over each to explore the various data points and information.

Volume of stories published: Cell phones

From the graph above we can see a number of spikes indicating sharp rises in publication volumes. Let’s take a look at the top three:

Top stories

The three stories that contributed to the biggest spikes in news publication volumes:

  1. Samsung release their latest flagship phone, the Galaxy S8.
  2. The UK introduces a loss-of-license punishment for new drivers caught using their cell phones while driving.
  3. HTC reveal a limited edition version of their U Ultra smart phone.

It will perhaps come as no surprise to see one of the world’s top smartphone manufacturers, Samsung, getting the most media attention with the launch of their latest flagship model. In comparison, rivals HTC failed to generate the same level of hype around their latest model. However, by releasing a teaser about a surprise product release on March 15 they still managed to generate two of the top four publication volume spikes within the cell phone category in March.

Try it yourself – here’s the query we used for volume by category
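For those following along, here’s a minimal sketch of that volume-by-category query as a raw HTTP request in Node.js. The endpoint, header and parameter names follow our News API documentation; treat the IAB category ID used for Cell phones (IAB19-6 below) as an assumption to verify against the taxonomy reference.

var https = require('https');
var querystring = require('querystring');

// One data point per day for the last 30 days of coverage.
var params = querystring.stringify({
  'categories.taxonomy': 'iab-qag',
  'categories.id[]': 'IAB19-6', // Cell phones (verify against the taxonomy)
  'published_at.start': 'NOW-30DAYS',
  'published_at.end': 'NOW',
  period: '+1DAY'
});

https.get({
  host: 'api.newsapi.aylien.com',
  path: '/api/v1/time_series?' + params,
  headers: {
    'X-AYLIEN-NewsAPI-Application-ID': 'YOUR_APP_ID',
    'X-AYLIEN-NewsAPI-Application-Key': 'YOUR_APP_KEY'
  }
}, function(res) {
  var body = '';
  res.on('data', function(chunk) { body += chunk; });
  res.on('end', function() {
    // Each element pairs a date with the number of stories published that day.
    console.log(JSON.parse(body).time_series);
  });
});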

Read more: We looked at Samsung’s recent exploding battery crisis to highlight how news content can be analyzed to track the voice of the customer in relation to crisis prevention and damage limitation.

Most mentioned topics

From the 7,000+ articles we sourced in the Cell phones category in March, we looked at the most mentioned topics:

Try it yourself – here’s the query we used for most mentioned topics
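A comparable sketch for this query, assuming the /trends endpoint aggregated over the keywords field (entity fields can be used instead), with the same authentication headers as the time series sketch above.

var https = require('https');
var querystring = require('querystring');

var params = querystring.stringify({
  field: 'keywords', // the field to aggregate mentions over
  'categories.taxonomy': 'iab-qag',
  'categories.id[]': 'IAB19-6', // Cell phones (verify against the taxonomy)
  'published_at.start': 'NOW-30DAYS',
  'published_at.end': 'NOW'
});

https.get({
  host: 'api.newsapi.aylien.com',
  path: '/api/v1/trends?' + params,
  headers: {
    'X-AYLIEN-NewsAPI-Application-ID': 'YOUR_APP_ID',
    'X-AYLIEN-NewsAPI-Application-Key': 'YOUR_APP_KEY'
  }
}, function(res) {
  var body = '';
  res.on('data', function(chunk) { body += chunk; });
  res.on('end', function() {
    // A list of {value, count} pairs, ready for a bubble or bar chart.
    console.log(JSON.parse(body).trends);
  });
});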

Most shared on social media

What were the most shared stories on social media? We analyzed share counts from Facebook, LinkedIn and Reddit to see what type of content is performing best on each channel.

Facebook

  1. Man dies charging iPhone while in the bath (BBC. 26,072 shares)
  2. US bans electronic devices on flights from eight Muslim countries (The Independent. 25,886 shares)

LinkedIn

  1. Samsung tries to reclaim its reputation with the Galaxy S8 (Washington Post. 890 shares)
  2. It’s Possible to Hack a Phone With Sound Waves, Researchers Show (NY Times. 814 shares)

Reddit

  1. Samsung confirms the Note 7 is coming back as a refurbished device (The Verge. 7,193 votes)
  2. The Galaxy S8 will be Samsung’s biggest test ever (The Verge. 4,981 votes)

Try it yourself – here’s the query we used for social shares
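And a final sketch for the social shares query: the same stories endpoint, sorted by share count. We assume sort_by accepts social_shares_count.facebook, with .linkedin and .reddit variants for the other networks.

var https = require('https');
var querystring = require('querystring');

var params = querystring.stringify({
  'categories.taxonomy': 'iab-qag',
  'categories.id[]': 'IAB19-6',
  'published_at.start': 'NOW-30DAYS',
  'published_at.end': 'NOW',
  sort_by: 'social_shares_count.facebook', // or .linkedin / .reddit
  per_page: 5
});

https.get({
  host: 'api.newsapi.aylien.com',
  path: '/api/v1/stories?' + params,
  headers: {
    'X-AYLIEN-NewsAPI-Application-ID': 'YOUR_APP_ID',
    'X-AYLIEN-NewsAPI-Application-Key': 'YOUR_APP_KEY'
  }
}, function(res) {
  var body = '';
  res.on('data', function(chunk) { body += chunk; });
  res.on('end', function() {
    JSON.parse(body).stories.forEach(function(story) {
      console.log(story.title);
    });
  });
});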

2. Boxing

We sourced a total of 9,000+ articles categorized under Boxing and found that what goes on outside the ring can garner just as much (if not more) media interest than what happens in it.

Volume of stories published: Boxing

Top stories

The three stories that contributed to the biggest spikes in news publication volumes:

  1. Heavyweight bout between David Haye and Tony Bellew.
  2. Floyd Mayweather urges the UFC to allow him and Conor McGregor to fight.
  3. Middleweight bout between Gennady Golovkin and Daniel Jacobs.

The two biggest fights in world boxing during the month of March are clearly represented by publication spikes in the chart above, particularly the heavyweight clash between Haye and Bellew. However, as we mentioned, it’s not all about what happens in the ring.

The second largest spike we see above was the result of Floyd Mayweather, who hasn’t fought since September 2015, pleading with the UFC to allow a ‘superfight’ with Conor McGregor to go ahead. Neither Mayweather nor McGregor has competed recently, nor does either have a future fight scheduled, yet they still find themselves the two most discussed individuals in this category. The bubble chart below, showing the most mentioned topics from the boxing category, further highlights this.

Most mentioned topics

Most shared on social media

Facebook

  1. Floyd Mayweather ‘officially out of retirement for Conor McGregor’ fight (FOX Sports. 56,951 shares)
  2. Bad refs, greedy NBF officials frustrating boxers – Apochi (Punchng. 42,367 shares)

LinkedIn

  1. David Haye has Achilles surgery after Tony Bellew defeat (BBC. 234 shares)
  2. David Haye rules out retirement as he targets Tony Bellew rematch (BBC. 130 shares)

Reddit

  1. Teenage kickboxer dies after Leeds title fight (BBC. 1,502 shares)
  2. Muhammad Ali family vows to fight Trump’s ‘Muslim ban’ after airport detention (The Independent. 1,147 shares)

3. Stocks

The graph below shows publication volumes in the Stocks subcategory throughout the month of March 2017. In total we collected just over 30,000 articles.

Volume of stories published: Stocks

Top stories

The three stories that contributed to the biggest spikes in news publication volumes:

  1. Retailer Target sees its stock drop by 13.5% as consumers boycott the retailer over its pro-transgender stance.
  2. The US Federal Reserve increases interest rates, adding further pressure to the housing market.
  3. Oil drops below US$53 as a report shows rising US crude stockpiles.

Most mentioned locations

Rather than focusing solely on extracted topics for this category, we thought it would be interesting to separate mentions of locations and organizations. The chart above shows the most mentioned locations from all 30,000 articles published under the Stocks subcategory in March.

Most mentioned organizations

The chart above shows the top mentioned organizations, including well-known banks, investment firms and sources. It is interesting to see the likes of Facebook, Twitter and Snapchat in the mix as well.

In March we saw Barclays declare Facebook as “the stock to own for the golden age of mobile”, referring to the upcoming 3-5 year period. Earlier in the month, Snapchat closed their first day of public trading up 44% at $24.48 a share.

Most shared on social media

Facebook

  1. Trump’s Approval Rating Hits New Record Low (Slate. 39,582 shares)
  2. Target Retailer Hits $15 Billion Loss Since Pro-Transgender Announcement (Breitbart. 30,107 shares)

LinkedIn

  1. How on earth did India come up with these GDP numbers? (QZ. 2,579 shares)
  2. Home Prices in 20 U.S. Cities Rise at Fastest Pace Since 2014 (Bloomberg. 1,601 shares)

Reddit

  1. Bernie Sanders and Planned Parenthood are the most popular things in America, Fox News finds (The Week. 28,075 votes)
  2. GameStop Is Going to Close at Least 150 Stores (Fortune. 4,982 votes)

Conclusion

We hope that this post has given you an idea of the kind of in-depth and precise analyses that our News API users are performing to source and analyze specific news content that is of interest to them.

Ready to try the News API for yourself? Simply click the image below to sign up for a 14-day free trial.





News API - Sign up





Introduction

The landscape of data is ever-changing, meaning analysts need to evolve both their thinking and their data collection methods to stay ahead of the curve. In many cases, data that might have been considered unique, uncommon or unattainably expensive just a few years ago is now widely used and often very affordable. The analysts who take advantage of these data sources while they remain untapped can reap the rewards, gaining a competitive advantage before the rest of their industry or peers catch on.

This type of data is often referred to as alternative data, and with the ever-increasing levels of data available in the modern world comes the opportunity to gain unique insights, competitive industry advantage, and boosted profits. It is perhaps no surprise then to hear that the scramble to get hold of such data has been dubbed the new gold rush.

With so many of our customers here at AYLIEN using our Text Analysis and News APIs to source and analyze alternative data in the form of unstructured content, we thought we would take a look at this trend to give you an idea of how and why it is becoming so popular and important.

What is alternative data?

Alternative data can be described as data derived from non-traditional sources; data that can be used to complement traditional sources to produce improved analytical insights that would otherwise not have been achievable with traditional data alone.

Put simply, it’s data that isn’t commonly being used within a specific industry or use case, but can potentially be used to gain a competitive advantage over those that do not have access to it.

Let’s look at investors as an example. Ask any investor what data source they could not do without and they’ll most likely say it’s their Bloomberg terminal, or a similar device. Bloomberg’s data services enable investors to easily scan financial data generated by thousands of companies. The ubiquity of terminals such as Bloomberg’s means that every investor requires one to be successful. That same ubiquity, however, makes it difficult for investors to gain any sort of competitive advantage, seeing as they’re all receiving the same data at the same time.

So, how is alternative data being sourced and utilized?

Alternative data in use

To give you an idea of just how significant alternative data can be, and the seemingly endless channels from which it can be sourced, we’re going to look at recent examples involving three brands: Chipotle, GoPro and JCPenney.

1. Chipotle

The CEO of Foursquare, a search-and-discovery mobile app, predicted a 30% drop in Q1 sales for Chipotle based on footfall data accumulated from the app’s users. Foursquare could see a decline in customer footfall from the drop in users ‘checking in’ at Chipotle restaurants, and the prediction proved correct.

Source: https://medium.com/foursquare-direct/foursquare-predicts-chipotle-s-q1-sales-down-nearly-30-foot-traffic-reveals-the-start-of-a-mixed-78515b2389af#.bycy28hgd

2. GoPro

In November of last year, Wall Street was shocked by the news that GoPro had reported a loss of 60 cents per share on $240.56 million in revenue, after analysts had predicted a far less severe loss of 36 cents per share on $314.06 million in revenue.

While Wall Street never saw this coming, Quandl, a financial data provider, was able to foresee the drop in GoPro’s earnings. In this case, the data came from electronic receipts extracted from over 3 million inboxes. A significant decline in email sales receipts from GoPro’s biggest distribution channel, Amazon, was a key indicator of what was to come. The graph below shows a clear drop in Q3 sales from Amazon, in contrast to other channels:

GoPro revenue estimates by channel

Source: https://blog.quandl.com/email-receipts-predicted-gopros-q3-earnings

3. JCPenney

When JCPenney reported their Q2 results in 2015, the news came as a surprise to most investors. Some investors, however, were not surprised at all, because they had been tracking satellite imagery of JCPenney parking lots in near real-time, which showed a clear trend of increasing customer footfall.

Satellite images of JCPenney parking lots indicated an increase in customer footfall

These three examples show alternative data being sourced from a social app, email receipts and satellite imagery, highlighting the breadth and variety of potential sources that can be explored.

How can NLP and Text Analysis help?

These days we have more data at our fingertips than ever before. One of the main challenges is extracting meaningful insights from it, particularly from vast amounts of unstructured data. Advancements in Natural Language Processing (NLP), Machine Learning and Text Analysis are assisting analysts in exploring data sources that previously would not have been considered worthwhile to their use case or industry.

Here are a few examples:

Social media

Social media channels such as Twitter and Facebook are a gold mine of public opinion and information just waiting to be tapped. If you want to know how the public feels about a specific brand, individual or event, social media offers an easily accessible data source that can be analyzed for trends to help predict consumer behavior.

For example, our sentiment analysis of 1.7 million tweets during last year’s Super Bowl showed Amazon to be the most talked-about brand during the game. Their ad for the Echo also received the highest level of positive sentiment. Result: the Amazon Echo shot up to second place in the bestseller list within a week.
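The building block behind an analysis like that is a single sentiment call per tweet. Here’s a minimal sketch using our Node.js Text Analysis SDK; the example tweet is invented.

var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_APP_ID',
  application_key: 'YOUR_APP_KEY'
});

textapi.sentiment({
  text: 'Loving the Amazon Echo ad. Alexa wins the Super Bowl!'
}, function(err, result) {
  if (err === null) {
    // polarity is positive, negative or neutral, with a confidence score
    console.log(result.polarity, result.polarity_confidence);
  } else {
    console.log(err);
  }
});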

News stories

Advances in NLP have made it easy to source and analyze large amounts of news content that can be narrowed down by countless search parameter combinations to deliver precise results to the end-user.

One relevant example we have been seeing among our News API users is the analysis of press releases containing mentions of specific keywords, individuals and companies that are of interest to, for example, an investor.

An investor may have a portfolio of multiple startup companies and may also be keeping a close eye on numerous others. By automatically pulling stories containing mentions of ‘startups’, companies within their own portfolio or companies in a similar area, as soon as they are published, they are accessing the very latest information and can therefore act accordingly and without delay.
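As a rough illustration, that kind of monitoring can be expressed as a stories query filtered by keyword and publication window. The sketch below (Node.js, raw HTTP) follows our News API parameter names; swap the title filter for your own watchlist, or filter on extracted entities (see the parameter reference).

var https = require('https');
var querystring = require('querystring');

var params = querystring.stringify({
  title: 'startups',                // stories with "startups" in the title
  'published_at.start': 'NOW-1DAY', // only the very latest coverage
  sort_by: 'published_at'
});

https.get({
  host: 'api.newsapi.aylien.com',
  path: '/api/v1/stories?' + params,
  headers: {
    'X-AYLIEN-NewsAPI-Application-ID': 'YOUR_APP_ID',
    'X-AYLIEN-NewsAPI-Application-Key': 'YOUR_APP_KEY'
  }
}, function(res) {
  var body = '';
  res.on('data', function(chunk) { body += chunk; });
  res.on('end', function() {
    JSON.parse(body).stories.forEach(function(story) {
      console.log(story.published_at, story.title);
    });
  });
});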

Search result from our live News API demo

Some users have even automated their trading process based on press releases, earnings announcements and news of acquisitions.

Customer reviews

The analysis of online customer reviews at scale provides an opportunity for savvy sales professionals and marketers looking to spot a need or weakness in the product or service being reviewed. For example, we analyzed the sentiment of 500 hotel reviews and, using Aspect-Based Sentiment Analysis, we were able to uncover how reviewers felt about specific hotel aspects, such as location, beds, food, staff and WiFi.

As you can see from the charts below, where green = positive and red = negative, the hotel particularly fell down in the areas of beds, WiFi and overall value. As a seller of beds or WiFi solutions, for example, we would find this data extremely useful.
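To try this on your own reviews, a request along the lines of the sketch below hits the aspect-based sentiment endpoint for the hotels domain. The path, header names and response shape follow our Text Analysis API documentation at the time; verify them against the current reference before relying on them.

var https = require('https');
var querystring = require('querystring');

var params = querystring.stringify({
  text: 'Great location and friendly staff, but the beds were ' +
        'uncomfortable and the WiFi kept dropping.'
});

https.get({
  host: 'api.aylien.com',
  path: '/api/v1/absa/hotels?' + params,
  headers: {
    'X-AYLIEN-TextAPI-Application-ID': 'YOUR_APP_ID',
    'X-AYLIEN-TextAPI-Application-Key': 'YOUR_APP_KEY'
  }
}, function(res) {
  var body = '';
  res.on('data', function(chunk) { body += chunk; });
  res.on('end', function() {
    // Each detected aspect (e.g. beds, staff, wifi) carries its own polarity.
    JSON.parse(body).aspects.forEach(function(aspect) {
      console.log(aspect.aspect, aspect.polarity);
    });
  });
});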

Aspect-Based Sentiment Analysis of hotel reviews

Conclusion

While we’ve merely scratched the surface today, we hope we’ve given you a useful and insightful introduction to alternative data and how it is increasingly being used to gain competitive advantage across a multitude of use cases and industries.






Here at AYLIEN we have a team of researchers who like to keep abreast of, and regularly contribute to, the latest developments in the field of Natural Language Processing. Recently, one of our research scientists, Sebastian Ruder, attended EMNLP 2016 in Austin, Texas. In this post, Sebastian has highlighted some of the stand-out papers and trends from the conference.


Image: Jackie Cheung

I spent the past week in Austin, Texas at EMNLP 2016, the Conference on Empirical Methods in Natural Language Processing.

There were a lot of papers at the conference (179 long papers, 87 short papers, and 9 TACL papers all in all) — too many to read every single one. The entire program can be found here. In the following, I will highlight some trends and papers that caught my eye:

Reinforcement learning

One thing that stood out was that RL seems to be slowly finding its footing in NLP, with more and more people using it to solve complex problems:

Dialogue

Dialogue was a focus of the conference, with all three keynote speakers dealing with different aspects of it: Christopher Potts talked about pragmatics and how to reason about the intentions of the conversation partner; Stefanie Tellex concentrated on how to use dialogue for human-robot collaboration; finally, Andreas Stolcke focused on the problem of addressee detection in his talk.

Among the papers, a few that dealt with dialogue stood out:

  • Andreas and Klein model pragmatics in dialogue with neural speakers and listeners;
  • Liu et al. show how not to evaluate your dialogue system;
  • Ouchi and Tsuboi select addressees and responses in multi-party conversations;
  • Wen et al. study diverse architectures for dialogue modelling.

Sequence-to-sequence

Seq2seq models were again front and center. It is not common for a method to have its own session two years after its introduction (Sutskever et al., 2014). While in past years many papers employed seq2seq, e.g. for Neural Machine Translation, some papers this year focused on improving the framework itself:

Semantic parsing

While seq2seq’s use for dialogue modelling was popularised by Vinyals and Le, it is harder to get it to work with goal-oriented tasks that require an intermediate representation on which to act. Semantic parsing is used to convert a message into a more meaningful representation that can be used by another component of the system. As this technique is useful for sophisticated dialogue systems, it is great to see progress in this area:

X-to-text (or natural language generation)

While mapping from text-to-text with the seq2seq paradigm is still prevalent, EMNLP featured some cool papers on natural language generation from other inputs:

Parsing

Parsing and syntax are a mainstay of every NLP conference and the community seems to particularly appreciate innovative models that push the state-of-the-art in parsing: The ACL ’16 outstanding paper by Andor et al. introduced a globally normalized model for parsing, while the best EMNLP ‘16 paper by Lee et al. combines a global parsing model with a local search over subtrees.

Word embeddings

There were still papers on word embeddings, but it felt less overwhelming than at the past EMNLP or ACL, with most methods trying to fix a particular flaw rather than training embeddings for embeddings’ sake. Pilehvar and Collier de-conflate senses in word embeddings, while Wieting et al. achieve state-of-the-art results for character-based embeddings.

Sentiment analysis

Sentiment analysis has been popular in recent years (as attested by the introductions of many recent papers on sentiment analysis). Sadly, many of the conference papers on sentiment analysis reduce to leveraging the latest deep neural network for the task to beat the previous state-of-the-art without providing additional insights. There are, however, some that break the mold: Teng et al. find an effective way to incorporate sentiment lexicons into a neural network, while Hu et al. incorporate structured knowledge into their sentiment analysis model.

Deep Learning

By now, it is clear to everyone: Deep Learning is here to stay. In fact, deep learning and neural networks claimed the two top spots of keywords that were used to describe the submitted papers. The majority of papers used at least an LSTM; using no neural network seems almost contrarian now and is something that needs to be justified. However, there are still many things that need to be improved — which leads us to…

Uphill Battles

While making incremental progress is important to secure grants and publish papers, we should not lose track of the long-term goals. In this spirit, one of the best workshops that I’ve attended was the Uphill Battles in Language Processing workshop, which featured 12 talks and not one, but four all-star panels on text understanding, natural language generation, dialogue and speech, and grounded language. Summaries of the panel discussions should be available soon at the workshop website.

This was my brief review of some of the trends of EMNLP 2016. I hope it was helpful.

 






Introduction

Deep Learning is a new area of Machine Learning research that has been gaining significant media interest owing to the role it is playing in artificial intelligence applications like image recognition, self-driving cars and most recently the AlphaGo vs. Lee Sedol matches. Recently, Deep Learning techniques have become popular in solving traditional Natural Language Processing problems like Sentiment Analysis.

For those of you that are new to the topic of Deep Learning, we have put together a list of ten common terms and concepts explained in simple English, which will hopefully make them a bit easier to understand. We’ve done the same in the past for Machine Learning and NLP terms, which you might also find interesting.

Perceptron

In the human brain, a neuron is a cell that processes and transmits information. A perceptron can be considered as a super-simplified version of a biological neuron.

A perceptron will take several inputs and weigh them up to produce a single output. Each input is weighted according to its importance in the output decision.
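To make that concrete, here’s a toy perceptron in plain JavaScript. The inputs, weights and threshold are invented purely for illustration.

// Weighted sum of inputs, thresholded to a binary output.
function perceptron(inputs, weights, threshold) {
  var sum = 0;
  for (var i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i]; // each input weighted by its importance
  }
  return sum > threshold ? 1 : 0; // fire or don't fire
}

// Decide "go to the festival?" from three yes/no inputs:
// [good weather, friends going, near public transport]
console.log(perceptron([1, 0, 1], [0.6, 0.2, 0.2], 0.5)); // 1 (go)
console.log(perceptron([0, 1, 0], [0.6, 0.2, 0.2], 0.5)); // 0 (stay home)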

Artificial Neural Networks

Artificial Neural Networks (ANNs) are models influenced by biological neural networks, such as the central nervous systems of living creatures and, most distinctly, the brain.

ANNs are processing devices, such as algorithms or physical hardware, and are loosely modeled on the cerebral cortex of mammals, albeit on a considerably smaller scale.

Let’s call them a simplified computational model of the human brain.

Backpropagation

A neural network learns by training, using an algorithm called backpropagation. To train a neural network it is first given an input which produces an output. The first step is to teach the neural network what the correct, or ideal, output should have been for that input. The ANN can then take this ideal output and begin adapting the weights to yield an enhanced, more precise output (based on how much they contributed to the overall prediction) the next time it receives a similar input.

This process is repeated many, many times until the margin of error between the actual output and the ideal output is considered acceptable.

Convolutional Neural Networks

A convolutional neural network (CNN) can be considered as a neural network that utilizes numerous identical replicas of the same neuron. The benefit of this is that it enables a network to learn a neuron once and use it in numerous places, simplifying the model learning process and thus reducing error. This has made CNNs particularly useful in the area of object recognition and image tagging.

CNNs learn more and more abstract representations of the input with each convolution. In the case of object recognition, a CNN might start with raw pixel data, then learn simple features such as edges, followed by basic shapes, complex shapes, patterns and textures.

 

source: http://stats.stackexchange.com/questions/146413

 

 

Recurrent Neural Network

Recurrent Neural Networks (RNNs) make use of sequential information. Unlike traditional neural networks, where all inputs and outputs are assumed to be independent of one another, RNNs rely on preceding computations and what has previously been calculated. An RNN can be conceptualized as a neural network unrolled over time: where you would have different layers in a regular neural network, in an RNN you apply the same layer to the input at each timestep, using the output, i.e. the state of the previous timestep, as input. Connections between entities in an RNN form a directed cycle, creating a sort of internal memory that helps the model leverage long chains of dependencies.

Recursive Neural Network

A Recursive Neural Network is a generalization of a Recurrent Neural Network, generated by applying a fixed and consistent set of weights repetitively, or recursively, over the structure. A Recursive Neural Network takes the form of a tree, while a Recurrent Neural Network forms a chain. Recursive Neural Nets have been utilized in Natural Language Processing for tasks such as Sentiment Analysis.

 

source:  http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf

 

 

Supervised Neural Network

For a supervised neural network to produce an ideal output, it must have previously been given this output. It is ‘trained’ on a pre-defined dataset and, based on this dataset, can produce accurate outputs depending on the input it has received. You could therefore say that it has been supervised in its learning, having, for example, been given both the question and the ideal answer.

Unsupervised Neural Network

This involves providing a program or machine with an unlabeled data set that it has not previously been trained on, with the goal of automatically discovering patterns and trends through clustering.

Gradient Descent

Gradient Descent is an algorithm used to find the local minimum of a function. Starting from an initial guess at the solution, we use the gradient of the function at that point to guide the solution in the negative direction of the gradient, and repeat this technique until the algorithm eventually converges at the point where the gradient is zero – a local minimum. We essentially descend the error surface until we arrive at a valley.
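Here’s a minimal sketch of the idea in JavaScript, minimizing the toy function f(x) = (x - 3)^2, whose gradient is 2(x - 3).

function gradientDescent(gradient, start, learningRate, steps) {
  var x = start;
  for (var i = 0; i < steps; i++) {
    x -= learningRate * gradient(x); // step in the negative gradient direction
  }
  return x;
}

var minimum = gradientDescent(function(x) { return 2 * (x - 3); }, 0, 0.1, 100);
console.log(minimum); // ~3, the point where the gradient is zero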

Word Embedding

Similar to the way a painting might be a representation of a person, a word embedding is a representation of a word, using real-valued numbers. Word embeddings can be trained and used to derive similarities and relations between words. They are an arrangement of numbers representing the semantic and syntactic information of words in a format that computers can understand.

Word vectors created through this process manifest interesting characteristics that almost look and sound like magic at first. For instance, if we subtract the vector of Man from the vector of King, the result will be almost equal to the vector resulting from subtracting Woman from Queen. Even more surprisingly, the result of subtracting Run from Running almost equates to that of Seeing minus See. These examples show that the model has not only learnt the meaning and the semantics of these words, but also the syntax and the grammar to some degree.
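To illustrate the mechanics (and only the mechanics), here’s a toy example in JavaScript with made-up three-dimensional vectors; real embeddings have hundreds of dimensions and are learned from large corpora, so these numbers are invented purely to show the arithmetic.

var vectors = {
  king:  [0.9, 0.8, 0.1],
  man:   [0.5, 0.1, 0.1],
  woman: [0.5, 0.1, 0.9],
  queen: [0.9, 0.8, 0.9]
};

function add(a, b)      { return a.map(function(v, i) { return v + b[i]; }); }
function subtract(a, b) { return a.map(function(v, i) { return v - b[i]; }); }

// king - man + woman lands (approximately) on our "queen" vector
console.log(add(subtract(vectors.king, vectors.man), vectors.woman));
// ~[0.9, 0.8, 0.9]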

 

 

So there you have it – some pretty technical deep learning terms explained in simple English. We hope this helps you get your head around some of the tricky terms you might come across as you begin to explore deep learning.

 

 





We help our users understand and classify content so they can extract insight from it. Being able to classify and tag content like news articles, blogs and web pages allows our users to manage and categorize content effectively and, more importantly, at scale. Up until now we’ve offered two forms of content classification/categorization: one based on IPTC Subject Codes, specifically useful for our news and media customers, and a second, flexible tagging feature based on Semantic Labeling, for those who wish to apply custom labels to text.

As of today, however, we offer a third classification feature, focused on advertising, which allows our Ad Tech users to tag and categorize text based on Interactive Advertising Bureau (IAB) standards.

We’re super excited about our IAB classification feature, which categorizes content based on the IAB Quality Assurance Guidelines. It automatically categorizes text into hierarchical groups based on the IAB QAG taxonomy, providing easily referenceable and usable tags, examples of which you can see below.

IAB QAG Taxonomy

The IAB QAG contextual taxonomy was developed by the IAB in conjunction with taxonomy experts from academia, ad measurement companies, and members of the IAB Networks & Exchanges Committee in order to define content categories on at least two tiers, making content classification a lot more consistent across the advertising industry. The first tier is a broad-level category and the second a more detailed description – in other words, a root and leaf structure.

Example Article:

Results:

"categories": [
  {
      "leaf": {
          "confidence": 0.07787707145827048,
          "id": "IAB2-10","label":
          "Automotive>Electric Vehicle"},
      "root": {
          "confidence": 0.6789603849779564,
          "id": "IAB2","label":
          "Automotive>"
          }
      }

]
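Getting results like these from your own content is a single call with our SDKs. The sketch below uses our Node.js SDK and assumes the taxonomy-based classification is exposed as classifyByTaxonomy with an 'iab-qag' taxonomy parameter; check the SDK documentation for the exact method name in your version.

var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YOUR_APP_ID',
  application_key: 'YOUR_APP_KEY'
});

textapi.classifyByTaxonomy({
  url: 'http://www.bbc.com/news/technology-33764155', // any article URL
  taxonomy: 'iab-qag'
}, function(err, result) {
  if (err === null) {
    result.categories.forEach(function(category) {
      // Root and leaf tiers, each with its own confidence score.
      console.log(category.root.label, '>', category.leaf.label,
        '(' + category.leaf.confidence + ')');
    });
  } else {
    console.log(err);
  }
});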

IAB classification was the most requested feature addition we’ve had over the last 6 months. As more and more companies invest in online advertising, publishers, agencies, ad networks and brands all want to be sure their ads are being displayed in the best place possible.

More Accurate Content Tagging = Better Ad Targeting

Using automatically generated, IAB-certified labels means our users can categorize large amounts of content, retrospectively or in near real time. These tags can then be used to improve how content is managed and where ads are placed, using semantic/contextual targeting powered by the IAB-approved taxonomy, ensuring ad impressions are displayed in the right place at the right time.

Building our classifier on the IAB QAG taxonomy means it is a lot easier for our users to build solutions and applications that conform to industry standards and integrate well with Ad Tech solutions like OpenRTB, ad exchanges and platforms.

We’ve also updated our SDKs to make it quick and easy to get up and running. Check out our live IAB demo or visit our documentation to see how easy it is to start classifying text according to the IAB guidelines.






Most of our users will make three or more calls to our API for every piece of text or URL they analyze. For example, if you’re a publisher who wants to extract insight from an article or URL, it’s likely you’ll want to use more than one of our features to get a proper understanding of that particular article or URL.

With this in mind, we decided to make it faster, easier and more efficient for our users to run multiple analysis operations in one single call to the API.

Our Combined Calls endpoint allows you to run more than one type of analysis on a piece of text or URL without having to call each endpoint separately.

  • Run multiple operations at once
  • Speed up your analysis process
  • Write cleaner, more efficient code

Combined Calls

To showcase how useful the Combined Calls endpoint can be, we’ve run a typical process that a lot of our news and media focused users would use when analyzing URLs or articles on news sites.

In this case, we’re going to Classify the article in question and extract any Entities and Concepts present in the text. Running a process like this would typically involve passing the same URL to the API three times, once for each analysis operation, and then retrieving three separate results, one per operation. With Combined Calls, however, we make just one call to the API and retrieve one set of results, which is a lot more efficient and cleaner for the end user.

Code Snippet:

var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
    application_id: "APP_ID",
    application_key: "APP_KEY"
});

textapi.combined({
    "url": "http://www.bbc.com/news/technology-33764155",
    "endpoint": ["entities", "concepts", "classify"]
}, function(err, result) {
  if (err === null) {
    console.log(JSON.stringify(result));
  } else {
    console.log(err)
  }
});

The code snippet above was written using our Node.js SDK. SDKs are available for a variety of languages on our SDKs page.

Results

We’ve broken down the results below into three sections (Entities, Concepts and Classification) to help with readability, but using the Combined Calls endpoint all of these results would be returned together.

Entities:

{
    "results": [
        {
            "endpoint": "entities",
            "result": {
                "entities": {
                    "keyword": [
                        "internet servers",
                        "flaw in the internet",
                        "internet users",
                        "server software",
                        "exploits of the flaw",
                        "internet",
                        "System (DNS) software",
                        "servers",
                        "flaw",
                        "expert",
                        "vulnerability",
                        "systems",
                        "software",
                        "exploits",
                        "users",
                        "websites",
                        "addresses",
                        "offline",
                        "URLs",
                        "services"
                    ],
                    "organization": [
                        "DNS",
                        "BBC"
                    ],
                    "person": [
                        "Daniel Cid",
                        "Brian Honan"
                    ]
                },
                "language": "en"
            }
        },

Concepts:

{
            "endpoint": "concepts",
            "result": {
                "concepts": {
                    "http://dbpedia.org/resource/Apache": {
                        "support": 3082,
                        "surfaceForms": [
                            {
                                "offset": 1261,
                                "score": 0.9726336488480631,
                                "string": "Apache"
                            }
                        ],
                        "types": [
                            "http://dbpedia.org/ontology/EthnicGroup"
                        ]
                    },
                    "http://dbpedia.org/resource/BBC": {
                        "support": 61289,
                        "surfaceForms": [
                            {
                                "offset": 1108,
                                "score": 0.9997923194235071,
                                "string": "BBC"
                            }
                        ],
                        "types": [
                            "http://dbpedia.org/ontology/Agent",
                            "http://schema.org/Organization",
                            "http://dbpedia.org/ontology/Organisation",
                            "http://dbpedia.org/ontology/Company"
                        ]
                    },
                    "http://dbpedia.org/resource/Denial-of-service_attack": {
                        "support": 503,
                        "surfaceForms": [
                            {
                                "offset": 264,
                                "score": 0.9999442627824017,
                                "string": "denial-of-service attacks"
                            }
                        ],
                        "types": [
                            ""
                        ]
                    },
                    "http://dbpedia.org/resource/Domain_Name_System": {
                        "support": 1279,
                        "surfaceForms": [
                            {
                                "offset": 442,
                                "score": 1,
                                "string": "Domain Name System"
                            },
                            {
                                "offset": 462,
                                "score": 0.9984593397878601,
                                "string": "DNS"
                            }
                        ],
                        "types": [
                            ""
                        ]
                    },
                    "http://dbpedia.org/resource/Hacker_(computer_security)": {
                        "support": 1436,
                        "surfaceForms": [
                            {
                                "offset": 0,
                                "score": 0.7808308562314218,
                                "string": "Hackers"
                            },
                            {
                                "offset": 246,
                                "score": 0.9326746054676964,
                                "string": "hackers"
                            }
                        ],
                        "types": [
                            ""
                        ]
                    },
                    "http://dbpedia.org/resource/Indian_School_Certificate": {
                        "support": 161,
                        "surfaceForms": [
                            {
                                "offset": 794,
                                "score": 0.7811847159512098,
                                "string": "ISC"
                            }
                        ],
                        "types": [
                            ""
                        ]
                    },
                    "http://dbpedia.org/resource/Internet_Systems_Consortium": {
                        "support": 35,
                        "surfaceForms": [
                            {
                                "offset": 765,
                                "score": 1,
                                "string": "Internet Systems Consortium"
                            }
                        ],
                        "types": [
                            "http://dbpedia.org/ontology/Agent",
                            "http://schema.org/Organization",
                            "http://dbpedia.org/ontology/Organisation",
                            "http://dbpedia.org/ontology/Non-ProfitOrganisation"
                        ]
                    },
                    "http://dbpedia.org/resource/OpenSSL": {
                        "support": 105,
                        "surfaceForms": [
                            {
                                "offset": 1269,
                                "score": 1,
                                "string": "OpenSSL"
                            }
                        ],
                        "types": [
                            "http://schema.org/CreativeWork",
                            "http://dbpedia.org/ontology/Work",
                            "http://dbpedia.org/ontology/Software"
                        ]
                    }
                },
                "language": "en"
            }
        },

Classification:

{
"endpoint": "classify",
            "result": {
                "categories": [
                    {
                        "code": "04003005",
                        "confidence": 1,
                        "label": "computing and information technology - software"
                    }
                ],
                "language": "en"
      }
  }

You can find more information on using Combined Calls in our Text Analysis Documentation.

We should also point out that the existing rate limits will also apply when using Combined Calls. You can read more about our rate limits here.






Introduction

This is the second edition of our NLP terms explained blog posts. The first edition deals with some simple terms and NLP tasks, while this edition gets a little more complicated. Again, we’ve chosen some common terms at random and tried to break them down in simple English to make them a bit easier to understand.

Part of Speech tagging (POS tagging)

Sometimes referred to as grammatical tagging or word-category disambiguation, part of speech tagging refers to the process of determining the part of speech for each word in a given sentence based on the definition of that word and its context. Many words, especially common ones, can serve as multiple parts of speech. For example, “book” can be a noun (“the book on the table”) or verb (“to book a flight”).

Parsing

Parsing is a major task of NLP. It’s focused on determining the grammatical analysis, or parse tree, of a given sentence. There are two forms of parse trees: constituency-based and dependency-based.

Semantic Role Labeling

This is an important step towards making sense of the meaning of a sentence. It focuses on detecting the semantic arguments associated with a verb or verbs in a sentence and classifying those arguments into specific roles.

Machine Translation

A sub-field of computational linguistics, MT investigates the use of software to translate text or speech from one language to another.

Statistical Machine Translation

SMT is one of a few different approaches to Machine Translation. A common approach in NLP, it relies on statistical methods based on bilingual corpora, such as the Canadian Hansard corpus. Other approaches to Machine Translation include Rule-Based Translation and Example-Based Translation.

Bayesian Classification

Bayesian classification is a classification method based on Bayes Theorem and is commonly used in Machine Learning and Natural Language Processing to classify text and documents. You can read more about it in Naive Bayes for Dummies.
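To make the idea concrete, here’s a toy Naive Bayes classifier in plain JavaScript. Real implementations work with log-probabilities and far more data; the tiny training set below is invented for illustration.

// Count word frequencies per class from a handful of labeled documents.
var docs = [
  { text: 'great fight knockout win',    label: 'sports'  },
  { text: 'boxer wins title fight',      label: 'sports'  },
  { text: 'stocks rally after earnings', label: 'finance' },
  { text: 'shares drop on weak report',  label: 'finance' }
];

var counts = {}, totals = {}, priors = {}, vocab = {};
docs.forEach(function(doc) {
  priors[doc.label] = (priors[doc.label] || 0) + 1 / docs.length;
  doc.text.split(' ').forEach(function(word) {
    counts[doc.label] = counts[doc.label] || {};
    counts[doc.label][word] = (counts[doc.label][word] || 0) + 1;
    totals[doc.label] = (totals[doc.label] || 0) + 1;
    vocab[word] = true;
  });
});

// Pick the label maximizing P(label) * product of P(word | label).
function classify(text) {
  var vocabSize = Object.keys(vocab).length;
  var best = null, bestScore = -1;
  Object.keys(priors).forEach(function(label) {
    var score = priors[label];
    text.split(' ').forEach(function(word) {
      // P(word | label) with add-one (Laplace) smoothing
      score *= ((counts[label][word] || 0) + 1) / (totals[label] + vocabSize);
    });
    if (score > bestScore) { bestScore = score; best = label; }
  });
  return best;
}

console.log(classify('knockout win'));    // sports
console.log(classify('earnings report')); // finance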

Hidden Markov Model (HMM)

In order to understand a HMM we need to define a Markov Model. This is used to model randomly changing systems where it is assumed that future states only depend on the present state and not on the sequence of events that happened before it.

An HMM is a Markov model where the system being modeled is assumed to have unobserved, or hidden, states. There are a number of common algorithms used with hidden Markov models: the Viterbi algorithm, for example, computes the most likely sequence of hidden states, while the forward algorithm computes the probability of a sequence of observations; both are often used in NLP applications.

In hidden Markov models, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.

Conditional Random Fields (CRFs)

A class of statistical modeling methods often applied in pattern recognition and machine learning, where they are used for structured prediction. Ordinary classifiers predict labels for a sample without taking neighboring samples into account; a CRF model, however, takes context into account. CRFs are commonly used in NLP (e.g. in Named Entity Extraction) and more recently in image recognition.

Affinity Propagation (AP)

AP is a clustering algorithm commonly used in Data Mining. Unlike other clustering algorithms, such as k-means, AP does not require the number of clusters to be estimated before running the algorithm. A semi-supervised version of AP is commonly used in NLP.

Relationship extraction

Given a chunk of words or a piece of text, relationship extraction is the task of determining the relationships between named entities.

 





What’s Blockspring?

Blockspring is a really exciting, YC-backed startup that pitches itself as “The world’s library of functions, accessible from everywhere you do work.” Their platform allows you to interact with a library of various APIs through a spreadsheet, simple code snippets and, soon, a chat interface.

The platform lets you run 1000+ functions directly from your spreadsheet or through simple code snippets for the more technically inclined. Accessing APIs with Blockspring is done through the concept of functions and they certainly have some cool APIs available to interact with in their library.

Where Blockspring gets really interesting, though, is when you start to combine multiple functions. Your spreadsheet pretty much becomes a playpen where you can interact with one or multiple APIs and create powerful applications and “mashups”. Examples of what can be done with Blockspring include automating social activity and monitoring, gathering marketing data about user segments and usage, accessing public datasets, scraping websites and now even analyzing text and unstructured data, all of which are really nicely showcased on their getting started page.

AYLIEN and Blockspring

Like Blockspring, we want to get the power of our API into the hands of anyone who can get value from it. We launched our own Text Analysis Add-on for Google Sheets last year. The add-on works in the same way as Blockspring, through simple functions, and acts as an interface for our Text Analysis API. Integrating with Blockspring, however, means our users can now open up their use cases by combining our functions with other complementary APIs to create powerful tools and integrations.

All of the AYLIEN end-points are available through Blockspring as simple snippets or spreadsheet functions and getting started with AYLIEN and Blockspring is really easy.

It’s simple to get up and running:

Step 1.

Sign up to Blockspring

Step 2.

Grab your AYLIEN APP ID and API key and keep it handy. If you don’t have an AYLIEN account just sign up here.

Step 3.

Explore the getting started section to see examples of the functions and APIs available.

Step 4.

Try some of the different functions through their interactive docs to get a feel for how they work.

Step 5.

Go wild and start building and creating mashups of functions with code snippets or in Google Sheets.

PS: Don’t forget to add your AYLIEN keys to your Blockspring account in the Secrets section of your account settings. Once they’ve been added, you won’t have to do it again. 

We’re really excited to see what the Blockspring community start to build with our various functions. Over the next couple of weeks, we’ll also be showcasing some cool mashups that we’ve put together in Blockspring so keep your eyes peeled on the blog.


We’ve just added support for microformat parsing to our Text Analysis API through our Microformat Extraction endpoint.

Microformats are simple conventions or entities used on web pages to describe a specific type of information, for example, Contact info, Reviews, Products, People, Events, etc.

Microformats are often included in the HTML of pages on the web to add semantic information about that page. They make it easier for machines and software to scan, process and understand webpages. AYLIEN Microformat Extraction allows users to detect, parse and extract embedded Microformats when they are present on a page.

Currently, the API supports the hCard format. We will be providing support for the other formats over the coming months. The quickest way to get up and running with this endpoint is to download an SDK and check out the documentation. We have gone through a simple example below to showcase the endpoint’s capabilities.

Microformat Extraction in Action

The following piece of code sets up the credentials for accessing our API. If you don’t have an AYLIEN account, you can sign up here.


var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
    application_id: 'YOUR_APP_ID',
    application_key: 'YOUR_APP_KEY'
});

The next piece of code accesses an HTML test page containing microformats that we have set up on CodePen to illustrate how the endpoint works (check out http://codepen.io/michaelo/pen/VYxxRR.html to see the raw HTML). The code consists of a call to the microformats endpoint and a forEach statement to display any hCards detected on the page.


textapi.microformats('http://codepen.io/michaelo/pen/VYxxRR.html',
    function(err, res) {
    if (err !== null) {
        console.log("Error: " + err);
    } else {
        res.hCards.forEach(function(hCard) {
            console.log(hCard);
            console.log("\n****************************************");
            console.log("End Of vCard");
            console.log("****************************************");
        });
    }
});

As you can see from the results below, there are two hCards on the page, one for Sally Ride and the other for John Glenn. The documentation for the endpoint shows the structure of the data returned and lists the optional hCard fields that are currently supported. You can copy the code above and paste it into our sandbox environment to view the results for yourself and play around with the various fields.

Results


{ birthday: '1951-05-26',
  organization: 'Sally Ride Science',
  telephoneNumber: '+1.818.555.1212',
  location:
   { id: '9f15e27ff48eb28c57f49fb177a1ed0af78f93ab',
     latitude: '37.386013',
     longitude: '-122.082932' },
  photo: 'http://example.com/sk.jpg',
  email: 'sally@example.com',
  url: 'http://sally.example.com',
  fullName: 'Sally Ride',
  structuredName:
   { familyName: 'van der Harten',
     givenName: 'Sally',
     honorificSuffix: 'Ph.D.',
     id: 'fe8b0d3222512769e99cd64d256eeda2cadd2838',
     additionalName: 'K.',
     honorificPrefix: 'Dr.' },
  logo: 'http://www.abc.com/pub/logos/abccorp.jpg',
  id: '7d021199b0d826eef60cd31279037270e38715cd',
  note: '1st American woman in space.',
  address:
   { streetAddress: '123 Main st.',
     countryName: 'U.S.A',
     postalCode: 'LWT12Z',
     id: '00cc73c1f9773a66613b04f11ce57317eecf636b',
     region: 'California',
     locality: 'Los Angeles' },
  category: 'physicist' }

****************************************
End Of vCard
****************************************


{ birthday: '1921-07-18',
  telephoneNumber: '+1.818.555.1313',
  location:
   { id: '265b201c7c65cee9af67cad1400c278a672b092a',
     latitude: '30.386013',
     longitude: '-123.082932' },
  photo: 'http://example.com/jg.jpg',
  email: 'johnglenn@example.com',
  url: 'http://john.example.com',
  fullName: 'John Glenn',
  structuredName:
   { familyName: 'Glenn',
     givenName: 'John',
     id: 'a1146a5a67d236f340c5e906553f16d59113a417',
     additionalName: 'Herschel',
     honorificPrefix: 'Senator' },
  logo: 'http://www.example.com/pub/logos/abccorp.jpg',
  id: '18538282ee1ac00b28f8645dff758f2ce696f8e5',
  note: '1st American to orbit the Earth',
  address:
   { streetAddress: '456 Main st.',
     countryName: 'U.S.A',
     postalCode: 'PC123',
     id: '8cc940d376d3ddf77c6a5938cf731ee4ac01e128',
     region: 'Ohio',
     locality: 'Columbus' } }

****************************************
End Of vCard
****************************************

Microformat Extraction allows you to automatically scan and understand webpages by pulling relevant information from HTML. This microformat information is easier for both humans and, now, machines to understand than more complex formats such as XML.






Our development team have been working hard adding additional features to the API which allow our users to analyze, classify and tag text in more flexible ways. Unsupervised Classification is a feature we are really excited about and we’re happy to announce that it is available as a fully functional and documented feature, as of today.

So what exactly is Unsupervised Classification?

It’s a training-less approach to classification, which means that, unlike our standard classification, which is based on IPTC News Codes, it doesn’t rely on a predefined taxonomy to categorize text. This method of classification allows automatic tagging of text that can be tailored to a user’s needs, without the need for a pre-trained classifier.

Why are we so excited about it?

Our Unsupervised Classification endpoint will allow users to specify a set of labels, analyze a piece of text and then assign the most appropriate label to that text. This allows greater flexibility for our users to decide how they want to tag and classify text.

There are a number of ways this endpoint can be used, and we’ll walk you through a couple of simple examples: Text Classification from a URL and Customer Service Routing of social interactions.

Classification of Text

We’ll start with a simple example to show how the feature works. The user passes a piece of text or a URL to the API, along with a number of labels. In the case below we want to find out which label, Football, Baseball, Hockey or Basketball, best represents the following article: ‘http://insider.espn.go.com/nfl/story/_/id/12300361/bold-move-new-england-patriots-miami-dolphins-new-york-jets-buffalo-bills-nfl’

Code Snippet:


var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YourAppId',
  application_key: 'YourAppKey'
});

var params = {
  url: 'http://insider.espn.go.com/nfl/story/_/id/12300361/bold-move-new-england-patriots-miami-dolphins-new-york-jets-buffalo-bills-nfl',
  'class': ['basketball', 'baseball', 'football', 'hockey']
};

textapi.unsupervisedClassify(params, function(error, response) {
  if (error !== null) {
    console.log(error, response);
  } else {
    console.log("\nThe text to classify is:\n\n",
      response.text, "\n");
    for (var i = 0; i < response.classes.length; i++) {
      console.log("label - ", response.classes[i].label,
        ", score -", response.classes[i].score, "\n");
    }
  }
});

Results:


The text to classify is:

"Each NFL team's offseason is filled with small moves and marginal personnel decisions... "

label -  football , score - 0.13

label -  baseball , score - 0.042

label -  hockey , score - 0.008

label -  basketball , score - 0.008

Based on the scores provided, we can confidently say that the article is about football and should be assigned a “Football” label.

Customer Service Routing

As another example, let’s say we want to automatically determine whether a post on social media should be routed to our Sales, Marketing or Support Departments. In this example, we’ll take the following comment: “I’d like to place an order for 1000 units.” and automatically determine whether it should be dealt with by Sales, Marketing or Support. To do this, we pass the text to the API as well as our pre-chosen labels, in this case: ‘Sales’, ‘Customer Support’, ‘Marketing’.

Code Snippet:


var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YourAppId',
  application_key: 'YourAppKey'
});

var params = {
  text: "I'd like to place an order for 1000 units.",
  'class': ['Sales', 'Customer Support', 'Marketing']
};

textapi.unsupervisedClassify(params, function(error, response) {
  if (error !== null) {
    console.log(error, response);
  } else {
    console.log("\nThe text to classify is:\n\n",
      response.text, "\n");
    for (var i = 0; i < response.classes.length; i++) {
      console.log("label - ",
        response.classes[i].label,
        ", score -", response.classes[i].score, "\n");
    }
  }
});

Results:


The text to classify is:

I'd like to place an order for 1000 units.

label -  Sales , score - 0.032

label -  Customer Support , score - 0.008

label -  Marketing , score - 0.002

Similarly, based on the scores showing how closely the text semantically matches each label, we can decide that this inquiry should be handled by a sales agent rather than marketing or support.

Divide and Conquer

Our next example deals with the idea of using the unsupervised classification feature with a hierarchical taxonomy. When classifying text, it’s sometimes necessary to add a sub-label for finer-grained classification, for example “Sports – Basketball” instead of just “Sports”.

So, in this example we’re going to analyze a simple piece of text: “The oboe is a woodwind musical instrument”, and we’ll attempt to provide a more descriptive classification result based on the following taxonomy:

  • ‘music’: [‘Instrument’, ‘composer’],
  • ‘technology’: [‘computers’, ‘space’, ‘physics’],
  • ‘health’: [‘disease’, ‘medicine’, ‘fitness’],
  • ‘sport’: [‘football’, ‘baseball’, ‘basketball’]

The taxonomy has a primary label and a secondary label, for example ‘music’ (primary) and ‘Instrument, composer’ (secondary).

Code Snippet:

    
var AYLIENTextAPI = require('aylien_textapi');
var textapi = new AYLIENTextAPI({
  application_id: 'YourAppId',
  application_key: 'YourAppKey'
});

var _ = require('underscore');
var taxonomy = {
  'music':      ['Instrument', 'composer'],
  'technology': ['computers', 'space', 'physics'],
  'health':     ['disease', 'medicine', 'fitness'],
  'sport':      ['football', 'baseball', 'basketball']
};

var topClasses = ['technology', 'music', 'health', 'sport'];
var queryText = "The oboe is a woodwind musical instrument.";
var params = {
  text: queryText,
  'class': topClasses
};

textapi.unsupervisedClassify(params, function(error, response) {
  if (error !== null) {
    console.log(error, response);
  } else {
    var classificationResult = '';
    console.log("\nThe text to classify is:\n\n",
      response.text, "\n");
    classificationResult = response.classes[0].label +
      " (" + response.classes[0].score + ") ";
    // Classify again, this time against the secondary labels that
    // sit under the winning primary label.
    params = {
      text: queryText,
      'class': _.values(
        _.pick(taxonomy, response.classes[0].label)
      )[0]
    };
    textapi.unsupervisedClassify(params,
      function(error, response) {
        if (error !== null) {
          console.log(error, response);
        } else {
          classificationResult += " - " +
            response.classes[0].label +
            " (" + response.classes[0].score +
            ") ";
          console.log("Label: ", classificationResult);
        }
      }
    );
  }
});
    

Results:


The text to classify is:

The oboe is a woodwind musical instrument.

Label: music (0.076) - Instrument (0.342)
    

As you can see from the results, the piece of text has been assigned ‘music’ as its primary label and ‘Instrument’ as its secondary label.

All the code snippets in our examples are fully functional and can be copied and pasted or tested in our sandbox. Over the next week or so we’ll also be adding some of these, along with more interesting apps that showcase use cases for Unsupervised Classification, to our sandbox. We’d also love to hear more about how you would use this feature, so don’t hesitate to get in touch with comments or feedback.





