With our News API, our goal is to make the world’s news content easier to query, just like a database. Additionally, we leverage Machine Learning to process, normalize and analyze this content, giving our users access to rich, high-quality metadata and powerful filtering capabilities that ultimately make it easier to find the needle in the haystack.
To this end, we have just launched two new handy features for filtering stories based on their image metadata and setting range queries for social media share counts. You can read more about these two features – which are now also available in our News API SDKs – below.
Image metadata filters
News content published online is increasingly multimodal, to the point that it is rare to find an article or blog post that doesn’t include an image or a video. Our News API stats show that 83% of all the articles in our index contain at least one image.
Therefore, it is important to be able to search and filter stories not just based on their textual content, but also based on their images.
To facilitate this, we now analyze each extracted image of each news article to capture its size (width and height), format and content length. Additionally, we have introduced 7 new parameters for filtering stories based on these attributes:
media.images.width.min: minimum image width (in pixels)
media.images.width.max: maximum image width (in pixels)
media.images.height.min: minimum image height (in pixels)
media.images.height.max: maximum image height (in pixels)
media.images.content_length.min: minimum image content size (in bytes)
media.images.content_length.max: maximum image content size (in bytes)
media.images.format: image format (possible values are: JPEG, PNG, GIF, SVG, ICO, TIFF, CUR, WEBP and BMP).
As an example, let’s use these parameters to retrieve stories about Golf that have an image in JPEG or PNG format larger than 80 KB in size:
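As a minimal sketch of what such a query might look like over plain HTTP (the endpoint URL, header names and credentials below are placeholders; check the News API documentation or use one of the official SDKs for the exact details):

```python
# A sketch of the Golf query described above. The endpoint URL, header
# names and credentials are placeholder assumptions, not guaranteed to
# match the current API; adapt them from the News API docs or an SDK.
# import requests  # uncomment together with the call below

def build_golf_image_params():
    """Stories about Golf with a JPEG or PNG image larger than ~80 KB."""
    return {
        "title": "Golf",
        "media.images.format[]": ["JPEG", "PNG"],
        "media.images.content_length.min": 80000,  # in bytes
    }

params = build_golf_image_params()
# response = requests.get(
#     "https://api.aylien.com/news/stories",
#     headers={"X-AYLIEN-NewsAPI-Application-ID": "YOUR_APP_ID",
#              "X-AYLIEN-NewsAPI-Application-Key": "YOUR_APP_KEY"},
#     params=params)
```

Note how the image filters compose with ordinary search parameters such as `title`: they simply narrow the result set further.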
Here’s an image returned from the search query above:
Social range filters
One of the most popular features of our News API is its ability to sort stories based on how many times they have been shared on social media. However, if you use this to retrieve popular stories over a long period of time, you will sometimes notice that a few highly popular stories (those shared hundreds of thousands of times) come out on top, preventing you from easily accessing the long tail of interesting and popular stories.
To address this, we have introduced the following 8 new parameters that allow you to set range (i.e. minimum and maximum) filters on social media share counts:
social_shares_count.facebook.min: minimum number of Facebook shares
social_shares_count.facebook.max: maximum number of Facebook shares
social_shares_count.google_plus.min: minimum number of Google+ shares
social_shares_count.google_plus.max: maximum number of Google+ shares
social_shares_count.linkedin.min: minimum number of LinkedIn shares
social_shares_count.linkedin.max: maximum number of LinkedIn shares
social_shares_count.reddit.min: minimum number of Reddit shares
social_shares_count.reddit.max: maximum number of Reddit shares
To retrieve all stories that mention Donald Trump, and have been shared between 50 and 500 times on Facebook, we can use the following query:
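As a hedged sketch of that query (again, the endpoint and header names are placeholder assumptions to adapt from the News API docs or SDKs):

```python
# A sketch of the Donald Trump range query described above. Endpoint,
# header names and credentials are placeholders, not verified values.
# import requests  # uncomment together with the call below

def build_trump_share_params():
    """Stories mentioning Donald Trump shared 50-500 times on Facebook."""
    return {
        "text": "Donald Trump",
        "social_shares_count.facebook.min": 50,
        "social_shares_count.facebook.max": 500,
    }

params = build_trump_share_params()
# response = requests.get(
#     "https://api.aylien.com/news/stories",
#     headers={"X-AYLIEN-NewsAPI-Application-ID": "YOUR_APP_ID",
#              "X-AYLIEN-NewsAPI-Application-Key": "YOUR_APP_KEY"},
#     params=params)
```

Combining a `.min` and a `.max` for the same network is what lets you skip the viral outliers and surface the long tail.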
These filters are now available across all our News API SDKs. We hope that you find these new updates useful, and we would love to hear any feedback you may have.
To start using our News API for free and query the world’s news content easily, click here.
Yesterday I was talking to a friend at a Starbucks by the River Liffey, and I explained to him how I, as a solo founder, approach taking advice from my team and advisors. I don’t think there is anything novel or special about my approach; I think it simply boils down to openness, trust and good communication.
He immediately came back to me and said: “So you’ve basically made a co-founder out of your team?” and I felt like that’s exactly what I’ve tried to do.
Having this ‘virtual’ co-founder has been a crucial step in our progress and development as a company, and in my personal development. I always encourage other founders to engage with advisors and mentors in areas they lack expertise in; done right, this can provide a level of help and support you would normally get from a second or third or fourth co-founder. As a founder, you have to make tough decisions and navigate diverse and challenging areas, so even if you have the strongest sense of intuition, you really need that second voice and that extra pair of eyes to validate your decisions, or at least to give you a better understanding of your intuition. That’s where an experienced advisor can be tremendously helpful.
In my case, in addition to having the chance to work with amazingly talented, supportive and caring people on a daily basis, I’ve been lucky enough to have two brilliant advisors – Shawn Broderick and John Breslin.
Over the past couple of years, Shawn and John have helped me with various issues, from fundraising to team building to product directions and building ties with academia.
AYLIEN advisory board update
Today I’m proud to announce the addition of two new advisors to our team: Prof. Barry Smyth of UCD and INSIGHT and Dr. James (Jimi) Shanahan of UC Berkeley, Xerox and NativeX.
Both individuals are highly accomplished, and I find it difficult to put them in a single category like “academic” or “entrepreneurial”: they are both well balanced between the academic/research world and the business world, and in addition to being distinctly successful academics, they have both started, grown and sold companies. So instead, I’m going to tell you a bit more about their backgrounds and how we plan to work together in future.
Barry is a Full Professor and Digital Chair of Computer Science at University College Dublin. To date, he has published in excess of 400 scientific articles and has contributed to dozens of patents.
In 1999, Barry co-founded ChangingWorlds, bringing advanced personalization tech to the mobile sector. ChangingWorlds grew to 120 people before being acquired by Amdocs Ltd in 2008, the same year in which Barry co-founded HeyStaks Technologies Ltd – a company focused on commercializing new social search technology.
Barry received a Ph.D. in Artificial Intelligence from Trinity College Dublin and holds a B.Sc. in Computer Science from University College Dublin.
James has over 20 years’ experience developing and researching cutting-edge information management systems that harness information retrieval, linguistics, and machine learning in applications and domains such as web search and computational advertising at companies such as NativeX, Digg, AT&T, SearchMe and Turn Inc.
A frequent speaker at various academic and commercial conferences, James has published seven books and 45 refereed papers in machine learning and information systems. As you may have guessed from the image above, James is a keen kiteboarder. In fact, he represented Ireland at the Kiteboarding World Championships in both 2014 and 2015!
James received a Ph.D. in Engineering Mathematics from the University of Bristol, United Kingdom, and holds a B.Sc. in Computer Science from the University of Limerick, Ireland.
Barry and Jimi have strong knowledge of, and ties to, academia. Together with our other brilliant academic advisor, John Breslin, we will be working on growing our partnerships with leading universities and research institutions in Ireland and abroad. This will hopefully result in wider academic collaborations between us and other organizations, and will ultimately lead to new publications, products and internship/fellowship opportunities with us for students and researchers working in the Machine Learning and Natural Language Processing space.
If you’re interested in collaborating with us in any of these areas, please feel free to get in touch with me directly: firstname.lastname@example.org
It is a strong indicator of today’s globalized world and rapidly growing access to Internet platforms, that we have users from over 188 countries and 500 cities globally using our Text Analysis and News APIs. Our users need to be able to understand and analyze what’s being said out there, about them, their products, services, or their competitors, regardless of the locality and the language used.
Social media content on platforms like Twitter, Facebook and Instagram can provide unrivalled insights into customer opinion and experience to brands and organizations. However, as shown by the following stats, users post content in a multitude of languages on these platforms:
A look at online review platforms such as Yelp and TripAdvisor, as well as various news outlets and blogs, reveals similar patterns regarding the variety of language used.
Therefore, no matter if you are a social media analyst, or a hotel owner trying to gauge customer satisfaction, or a hedge fund analyst trying to analyze a foreign market, you need to be able to understand textual content in a multitude of languages.
The Challenge with Multilingual Text Analysis
Scaling Natural Language Processing (NLP) and Natural Language Understanding (NLU) applications – which form the basis of our Text Analysis and News APIs – to multiple human languages has traditionally proven to be difficult, mainly due to the language-dependent nature of preprocessing and feature engineering techniques employed in traditional approaches.
However, Deep Learning-based NLP methods, which have gained tremendous attention and popularity over the last couple of years, have proven to bring a great amount of invariance to NLP processes and pipelines, including invariance to the language used in a document or utterance.
At AYLIEN we have been following the rise and the evolution of Deep Learning-based NLP closely, and our research team have been leveraging Deep Learning to tackle a multitude of interesting and novel problems in Representation Learning, Sentiment Analysis, Named Entity Recognition, Entity Linking and Generative Document Models, with multiple publications to date.
Additionally, using technologies such as TensorFlow, Docker and Kubernetes, as well as software engineering best practices, our engineering team ensures this research is surfaced in our products by ensuring our proprietary models are performant and scalable, enabling us to serve millions of requests every day.
Multilingual Sentiment Analysis with AYLIEN
Today we’re excited to announce an early result of these efforts: the launch of the first version of our Deep Learning-based Sentiment Analysis models for short sentences, now available for English, Spanish and German.
Let’s explore a couple of examples and see these new capabilities in action:
A Spanish tweet:
“Vamos!! Se ganó, valio la pena levantarse temprano, bueno el futbol todo lo vale :D” (roughly: “Let’s go!! We won, it was worth getting up early; well, football is worth it all :D”)
A German tweet:
“Lange wird es mein armes Handy nicht mehr machen 🙁 Nach 5 Jahren muss ich mein Samsung Galaxy S 2 wohl bald aufgeben” (roughly: “My poor phone won’t last much longer 🙁 After 5 years I’ll probably have to give up my Samsung Galaxy S 2 soon”)
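For illustration, here is a hedged sketch of sending a short text like the tweets above to a sentiment endpoint. The endpoint URL, the `mode` parameter and the header names are assumptions based on the Text Analysis API of the time; the official SDKs wrap all of this for you:

```python
# Sketch of a short-text sentiment call. Endpoint, parameter and header
# names below are assumptions, not verified values; prefer an SDK.
# import requests  # uncomment together with the call below

def sentiment_payload(text, mode="tweet"):
    """Build the payload for a sentiment call; 'mode' distinguishes
    tweet-like short text from longer documents."""
    return {"text": text, "mode": mode}

payload = sentiment_payload(
    "Lange wird es mein armes Handy nicht mehr machen :(")
# response = requests.post(
#     "https://api.aylien.com/api/v1/sentiment",
#     headers={"X-AYLIEN-TextAPI-Application-ID": "YOUR_APP_ID",
#              "X-AYLIEN-TextAPI-Application-Key": "YOUR_APP_KEY"},
#     data=payload)
# The JSON response would include a polarity (negative, for this tweet)
# together with a confidence score.
```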
Our new models leverage the power of word embeddings, transfer learning and Convolutional Neural Networks to provide a simple, yet powerful end-to-end Sentiment Analysis pipeline which is largely language agnostic.
Additionally, in contrast to more traditional machine learning models, this new model allows us to learn representations from large amounts of unlabeled data. This is particularly valuable for languages such as German where manually annotated data is scarce or expensive to generate, as it enables us to train sentiment models that leverage small amounts of annotated data in a language to great effect.
Over the next couple of months, we will be continuing to work on improving these models as well as rolling out support for even more languages. Your feedback can be extremely helpful in shaping our roadmap, so if you have any thoughts, ideas or questions please feel free to reach out to us at email@example.com.
We are also excited about the new research that we’ve been doing on cross-lingual embeddings, which should make the process of multilingual Sentiment Analysis even easier.
In recent times, deep learning techniques have become more and more prevalent in NLP tasks; just take a look at the list of accepted papers at this year’s NAACL conference and you can’t miss it. The field has largely moved away from traditional NLP approaches to focus on deep learning and how it can be leveraged in language problems, as successfully as it has been in both image and audio recognition tasks.
One of these approaches that has seen great success and is backed by a wave of research papers and funding is the concept of word embeddings.
For those of you who aren’t familiar with them, word embeddings are essentially dense vector representations of words.
Similar to the way a painting might be a representation of a person, a word embedding is a representation of a word, using real-valued numbers. They are an arrangement of numbers representing the semantic and syntactic information of words and their context, in a format that computers can understand.
Word embeddings can be trained and used to derive similarities and relations between words. This means that by encoding the word “mother” as a small, dense vector of real numbers (say 100 or 200 dimensions, or even more) and “father” as another such vector, we can compare the two and better understand each word’s context.
Word vectors created through this process manifest interesting characteristics that almost look and sound like magic at first. For instance, if we subtract the vector of Man from the vector of King, the result will be almost equal to the vector resulting from subtracting Woman from Queen. Even more surprisingly, the result of subtracting Walked from Walking almost equates to that of Swam minus Swimming. These examples show that the model has not only learnt the meaning and the semantics of these words, but also the syntax and the grammar to some degree.
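The arithmetic behind these analogies can be sketched with toy vectors. Real embeddings are learned, 100–300 dimensional vectors; the hand-built 2-dimensional space below (one axis loosely for “royalty”, one for “gender”) is purely illustrative, but it shows the mechanics of the King − Man + Woman ≈ Queen computation:

```python
import numpy as np

# Toy 2-d "embeddings" (axes: royalty, gender). Illustrative only; real
# embeddings are learned from large corpora, not hand-assigned.
vectors = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - man + woman lands (in this toy space, exactly) on queen:
analogy = vectors["king"] - vectors["man"] + vectors["woman"]
print(cosine(analogy, vectors["queen"]))  # -> 1.0 in this toy space
```

With real learned embeddings the analogy vector is only approximately equal to the target word’s vector, so in practice one looks for the nearest neighbour by cosine similarity.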
Relations between words according to word embeddings
As our very own NLP Research Scientist Sebastian Ruder explains: “word embeddings are one of the few currently successful applications of unsupervised learning. Their main benefit arguably is that they don’t require expensive annotation, but can be derived from large unannotated corpora that are readily available. Pre-trained embeddings can then be used in downstream tasks that use small amounts of labeled data.”
Although word embeddings have almost become the de facto input layer in many NLP tasks, they do have some drawbacks. Let’s take a look at some of the challenges we face with word2vec, probably the most popular and widely used model today.
Inability to handle unknown or OOV words
Perhaps the biggest problem with word2vec is the inability to handle unknown or out-of-vocabulary (OOV) words.
If your model hasn’t encountered a word before, it will have no idea how to interpret it or how to build a vector for it. You are then forced to use a random vector, which is far from ideal. This can particularly be an issue in domains like Twitter where you have a lot of noisy and sparse data, with words that may only have been used once or twice in a very large corpus.
No shared representations at sub-word levels
There are no shared representations at sub-word levels with word2vec. For example, you and I might encounter a new word that ends in “less”, and from our knowledge of words that end similarly we can guess that it’s probably an adjective indicating a lack of something, like flawless or careless.
Word2vec represents every word as an independent vector, even though many words are morphologically similar, just like our two examples above.
This can also become a challenge in morphologically rich languages such as Arabic, German or Turkish.
Scaling to new languages requires new embedding matrices
Scaling to new languages requires new embedding matrices and does not allow for parameter sharing, meaning cross-lingual use of the same model isn’t an option.
Cannot be used to initialize state-of-the-art architectures
As explained earlier, pre-training word embeddings on weakly supervised or unsupervised data has become increasingly popular, as have various state-of-the-art architectures that take character sequences as input. If you have a model that takes character-based input, you normally can’t leverage the benefits of pre-training, which forces you to randomize embeddings.
So while the application of deep learning techniques like word embeddings and word2vec in particular have brought about great improvements and advancements in NLP, they are not without their flaws.
At AYLIEN, Representation Learning, the wider field that word embeddings fall under, is an active area of research for us. Our scientists are actively working on better embedding models and on approaches for overcoming some of the challenges mentioned above.
Stay tuned for some exciting updates over the next few weeks ;).
As you may know we recently launched a new service offering, our News API, and over the past week or so we’ve been using it to run some little experiments around analyzing news content.
We wanted to use the News API to collect and analyze popular news headlines. We set out to find both similarities and differences in the way two journalists write headlines for their respective news articles and blog posts. The two reporters we selected operate in, and write about, two very different industries/topics and have two very different writing styles:
Finance: Akin Oyedele of Business Insider, who covers market updates.
Celebrity: Carly Ledbetter of the Huffington Post, who mainly writes about celebrities.
Note: For a more technical, in-depth and interactive representation of this project, check out the Jupyter notebook we created. This includes sample code and more in depth descriptions of our approach.
We set out some clear steps to follow in comparing the writings of our two selected authors:
Collect news headlines from both of our journalists
Create parse trees from collected headlines (we explain parse trees below!)
Extract information from each parse tree that is indicative of the overall headline structure
Define a simple sequence similarity metric to quantitatively compare any pair of headlines
Apply the same metric to all headlines collected for each author to find similarity
Use K-Means and tSNE to produce a visual map of all the headlines so we can clearly see the differences between our two journalists
So what exactly are parse trees?
In linguistics, a parse tree is a rooted tree that represents the syntactic structure of a sentence according to some pre-defined grammar. For example, with a simple sentence like “The cat sat on the mat”, a parse tree might look like this:
Thankfully parsing our extracted headlines isn’t too difficult. We used the Pattern Library for Python to parse the headlines and generate our parse trees.
In total we gathered about 700 article headlines for each journalist using the AYLIEN News API, which we then analyzed using Python. If you’d like to give it a go yourself, you can grab the Pickled data files directly from the GitHub repository (link), or by using the data collection notebook we prepared for this project.
First we loaded all the headlines for Akin Oyedele, then we created parse trees for all 700 of them, and finally we stored them together with some basic information about the headline in the same Python object.
Then using a sequence similarity metric, we compared all of these headlines two by two, to build a similarity matrix.
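As an illustration of such a metric, Python’s standard library provides a ready-made sequence similarity in `difflib`. The chunk-type sequences below are hypothetical stand-ins for the parse-tree information extracted in the earlier step (see the notebook for the real pipeline):

```python
from difflib import SequenceMatcher

# Each headline reduced to its sequence of chunk types -- hypothetical
# examples standing in for the output of the parsing step.
h1 = ["NP", "VP", "PP", "NP"]          # e.g. a short stock-update headline
h2 = ["NP", "VP", "PP", "NP", "ADVP"]  # a slightly longer variant

# ratio() = 2*M/T, where M is the number of matching elements and T is
# the total number of elements in both sequences.
similarity = SequenceMatcher(None, h1, h2).ratio()
print(round(similarity, 2))  # -> 0.89
```

Computing this ratio for every pair of headlines yields the similarity matrix used in the clustering step.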
To visualize headline similarities for Akin, we generated a 2D scatter plot with the aim of grouping similarly structured headlines close together on the chart.
To achieve this, we used two handy techniques, outlined below:
tSNE to reduce the dimensionality of our similarity matrix from 700 down to 2
K-Means to identify 5 clusters of similar headlines and add some color
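The two steps above can be sketched with scikit-learn. The input below is random stand-in data (and smaller than the real 700×700 similarity matrix) so the snippet runs end to end; in the project, the pairwise similarity matrix takes its place:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

# Random stand-in for the real 700x700 headline similarity matrix.
rng = np.random.default_rng(0)
similarity_matrix = rng.random((50, 50))

# Step 1: tSNE reduces each headline's similarity profile to 2-d coords.
coords = TSNE(n_components=2, init="random", perplexity=5,
              random_state=0).fit_transform(similarity_matrix)

# Step 2: K-Means assigns each headline to one of 5 clusters, which we
# can then use to color the scatter plot.
labels = KMeans(n_clusters=5, n_init=10,
                random_state=0).fit_predict(coords)
print(coords.shape)  # (50, 2)
```

Plotting `coords` colored by `labels` gives the kind of map discussed below.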
The chart above shows a number of dense groups of headlines, as well as some sparse ones. Each dot on the graph represents a headline, as you can see when you hover over one in the interactive version. Similar titles are, as you can see, grouped together quite cleanly. Some of the standout groups are:
The circular group left of center typically consists of short, snappy stock update headlines such as “Viacom is crashing”
The large circular group on the top right is mostly made up of announcement-style headlines in the “Here come the…” format.
The small green circular group towards the bottom left contains headlines with near-identical phrasing, such as “Industrial production falls more than expected” or “ADP private payrolls rise more than expected”.
Comparing the two authors
By repeating the process for our second journalist, Carly Ledbetter, we were then able to compare both authors and see how many common patterns exist between the two in terms of how they write their headlines.
We observed that roughly 50% (347/700) of the headlines had a similar structure.
Here we can see the same dense and sparse patterns, as well as groups of points that are somewhat unique to each author, or shared by both authors. The yellow dots represent our Celebrity focused author and the blue our finance guy.
The bottom right cluster is almost exclusive to the first author, as it covers the short financial/stock report headlines such as “Here comes CPI”, but it also includes some headlines from the second author, such as “There’s Another Leonardo DiCaprio Doppelgänger”. The same could be said about the top middle cluster.
The top right cluster mostly contains single-verb headlines about celebrities doing things, such as “Kylie Jenner Graces Coachella With Her Peachy Presence” or “Kate Hudson Celebrated Her Birthday With A Few Shirtless Men” but it also includes market report headlines from the first author such as “Oil rig count plunges for 7th straight week”.
Conclusion and future work
In this project we’ve shown how you can retrieve and analyze news headlines, evaluate their structure and similarity, and visualize the results on an interactive map.
While we were happy with the results and found the project quite interesting, there were some areas we thought could be improved. Some of the weaknesses of our approach, and ways to improve them, are:
– Using entire parse trees instead of just the chunk types
– Using a tree or graph similarity metric instead of a sequence similarity one (ideally a linguistic-aware one too)
– Better pre-processing to identify and normalize Named Entities, etc.
In our next post, we’re going to study the correlations between various headline structures and some external metrics like number of Shares and Likes on Social Media platforms, and see if we can uncover any interesting patterns. We can hazard a guess already that the short, snappy Celebrity style headlines would probably get the most shares and reach on social media, but there’s only one way to find out.
If you’d like to access the data used or want to see the sample code we used head over to our Jupyter notebook.
In this blog we’re going to walk you through our new Semantic Labeling feature and use some examples to showcase how useful it can be for classifying or categorizing text.
So what exactly is Semantic Labeling?
It’s an intelligent way of tagging or categorizing text based on labels that you suggest. It’s a training-less approach to classification, meaning it doesn’t rely on a predefined taxonomy to categorize or tag textual content.
With Semantic Labeling you can provide a piece of text and specify a set of labels, and the add-on will automatically assign the most appropriate label to that text. This gives add-on users greater flexibility in deciding how they want to tag and categorize text in their spreadsheets.
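To give a feel for the idea of training-less labeling, here is a deliberately crude, self-contained toy: it scores each candidate label by word overlap with a hand-picked set of related words. This is only an illustration of the concept, not the add-on’s actual algorithm, which computes semantic relatedness rather than literal word overlap:

```python
# Toy "training-less" labeler: scores each candidate label against a
# text by word overlap with hand-picked related words. Illustrative
# only -- the real feature uses semantic relatedness, not exact matches.
def best_label(text, label_words):
    words = set(text.lower().split())
    scores = {label: len(words & related)
              for label, related in label_words.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

labels = {
    "Football": {"quarterback", "touchdown", "nfl"},
    "Soccer":   {"goalkeeper", "penalty", "fifa"},
    "Golf":     {"birdie", "putt", "fairway"},
}
print(best_label("the quarterback threw a late touchdown", labels))
# -> ('Football', 2)
```

The second element of the result plays the role of the confidence score the add-on displays alongside the winning label.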
Our customers are using this feature for a variety of different use cases. We’ll walk you through a couple of simple ones, to show you the feature in action.
Text Classification from a URL
Say, for example, I run a sports blog and I want to automatically curate and categorize lots of articles/URLs into the predefined categories I cover on my blog, and list them in a spreadsheet.
Any of the features in the add-on can be used to analyze a URL. Just choose the cell containing that URL in your spreadsheet and hit analyze. Using the Semantic Labeling feature is a little different because you need to also submit your candidate labels through the Text Analysis Add-on sidebar.
Once you choose Semantic Labeling, you’ll notice 5 label options will populate on the right. This is where you enter your categories or labels. In this case, we’re going to use the following URL and Labels.
Once you’ve selected the cell that you want to analyze and you’ve entered your labels, just hit analyze.
The add-on will then populate the results in the next 5-10 cells in that row, as in the example below.
In this case the add-on chose “Football” as the label most closely related to the article on that webpage, and displays a confidence score showing which label is “the winner”.
As you can see from the screenshot of the URL below it did a pretty nice job of recognizing the article had nothing to do with soccer or golf and was primarily about Football.
Customer Query Routing
We’ve also seen our users analyze social interactions like Tweets, Facebook comments and even Email to try and intelligently understand and tag them without the need for manual reading.
So, let’s say we want to automatically determine whether a post on social media should be routed to and dealt with by our Sales, Support or Finance Departments.
We’ll use 2 different Tweets that could be handled by different teams within a business and use the different department titles as labels.
“Are you guys down? I can’t access my account?”
“Who do I get in touch with if I want to purchase your software?”
Again, choose the cells you want to analyze that contain your Tweets, add your candidate labels in the sidebar and hit analyze.
The add-on, as shown in the previous example, will populate its results in the next few cells, showing the most appropriate label first along with its score.
Again the add-on was pretty accurate in assigning the correct labels to each Tweet. The first Tweet was tagged as most relevant to support and the second one was most appropriately referred to the sales department.
This feature allows you to analyze and categorize long and short form text based off your own labels or tags. You can submit between 2 and 5 labels to the add-on and it will return the most semantically relevant tag as well as a confidence score.
This blog is an adaptation of a talk, “Computer intelligence”, delivered by our founder Parsa Ghaffari (@parsaghaffari) and Dr. Kevin Koidl (@koidl), a research fellow at Trinity College Dublin and founder of Wripl.
The talk is a discussion of how computer, or artificial, intelligence works, its applications in the industry and the challenges it presents. You can watch the original video here.
Artificial intelligence, or Computer intelligence, is hot in the tech scene right now: it’s a high priority for tech giants like Google and Facebook, journalists are writing about how it will take our jobs and our lives, and it’s even hot in Hollywood (although mostly in the technophobic fashion typical of 21st-century Hollywood).
In the industry, all of a sudden AI is everywhere and it almost looks like we’re ready to replace Marc Andreessen’s famous “software is eating the world” with “AI is eating the world”.
But what exactly are we talking about when we refer to Artificial or Computer intelligence?
AI could be defined as the science and engineering of making intelligent computers and computer programs. Since we don’t have a solid definition of intelligence that is not relative to human intelligence, we can define it as the ability to learn or understand things or to deal with new or difficult situations. We also know what computers are, they’re essentially machines that are programmed to carry out a specific task. So Computer Intelligence could be seen as a combination of these two concepts: an algorithmic approach to mimicking human intelligence.
Two branches of AI
Back in the 60s, AI got to a point where it could actually do things, and that created a new branch of AI that was more practical and pragmatic, and was eventually adopted and pioneered by industry. This new branch (which we call Narrow AI in this article) had different optimization goals and success metrics compared to the original, now called General AI.
To illustrate the difference: if your goal was to predict what’s going to happen next in the room where you’re sitting, one option would be to consult a physicist, who would probably take an analytical approach and use well-known equations from Thermodynamics, Electromagnetism and Newtonian Physics to predict the next state of the room.
A fundamentally different approach that doesn’t require a physicist’s involvement would be to set up as many sensors as possible (think video cameras, microphones, thermometers, etc) to capture and feed all the data from the room to a Super Computer, which then runs some form of probabilistic modelling to predict the next state.
The results you get from the second approach would likely be far more accurate than the ones produced by the physicist. However, with the second approach you don’t really understand why things are the way they are, and that’s what General AI is all about: understanding how things such as language, cognition and vision work, and how they can be replicated.
Narrow AI is a more focused application of Computer Intelligence that aims to solve a specific problem and is driven by industry, economics and results. Common use cases you will certainly have heard of include Siri on your iPhone or self-driving cars for example.
While Siri can be seen as an AI application, that doesn’t mean that the intelligence behind Siri can also power a self-driving car. The AI behind both is very different, one can’t do the other.
With Narrow AI, the intelligence works by crunching information under set conditions for economic outputs. Siri, for example, can only answer certain questions: those she has the answer to, or can retrieve an answer to by referencing a database.
Challenges of AI
As human beings, understanding visual and linguistic information comes to us naturally: we read a piece of text and we can extract meaning, intent, feelings and information; we look at a picture and we identify objects, colours, people and places.
However, for machines it’s not that easy. Take this sentence for instance: “I made her duck”. It’s a pretty straightforward sentence, but it actually has four potential meanings:
I cooked her some duck
I forced her to duck
I made her duck (the duck belonged to her)
I made her duck (I made her a duck, out of wood for example)
When we interpret text we rely on prompts, either syntax indicators or just context, that help us predict the meaning of a sentence, but teaching a machine to do this is a lot harder. There is a lot of ambiguity in language that makes it extremely hard for machines to understand text or language in general.
The same can be said for an image or picture, or visual information in general. As humans we can pick up and recognise certain things in an image within a matter of seconds, we know that there are a man and a dog in a picture, we recognise colours and even brands, but it takes an intelligent machine to do the same.
One of the main arguments against AI’s success is that we don’t have a good understanding of human intelligence, and, therefore, are not able to fully replicate it. A convincing counter-argument, pioneered by the likes of Ray Kurzweil, is that intelligence or consciousness is an emergent property of the comparatively simpler building blocks of our brains (Neurons) and to replicate a brain or to create intelligence, all we need to do is to understand, decode and replicate these building blocks.
Imagine you’re in a self-driving car and it’s taking you over a narrow bridge. Suddenly a person appears in front of the car (say, after losing their balance), and to avoid hitting that person the AI must take a sharp turn that will send the car off the bridge. If you hit the person, they will die; if the car falls off the bridge, you will be killed.
One solution is for the AI to predict who’s more “valuable” and make a decision based on that. It would factor in things like age, job status and family status, and boil everything down to a numerical comparison between your “worth” and the other person’s. But how accurate would that be? And would you ever buy a self-driving car that has a chance of killing you?
While some serious challenges in AI remain open, industry and the enterprise have latched on to the benefits that AI technologies such as Natural Language Processing, Image Recognition and Machine Learning can bring to a variety of problems and applications.
One thing can be said for certain: AI has left the science and research labs and is powering developments in health, business and media. Industry has recognised the potential of Narrow AI and how it can change, enhance and optimize the way we approach problems and tasks as human beings.
The border between AI and human intelligence is getting blurred, so we might eventually reach a point where intelligent behaviour manifested by a machine can no longer be labeled “artificial”. In that case, “Computer Intelligence” would be a better-suited term. That said, we use the terms Computer Intelligence and Artificial Intelligence interchangeably in this article.
I have made this letter longer than usual, because I lack the time to make it short — Blaise Pascal
We live in the age of “TL;DR”s and 140-character texts: bite-sized content that is easy to consume and quick to digest. We’re so used to skimming through feeds of TL;DRs to acquire information and knowledge about our friends and surroundings that we barely sit through reading a whole article unless we find it extremely interesting.
It’s not necessarily a “bad” thing though – we are getting an option to exchange breadth for depth, which gives us more control over how we acquire new information with a higher overall efficiency.
This is an option we previously did not have, as most content was produced in long form and often without considering readers’ time constraints. But in the age of the Internet, textual content must compete with other types of media, such as images and videos, that are inherently easier to consume.
Vision: The Brevity Knob
In an ideal world, every piece of content should come with a knob attached to it that lets you adjust its length and depth by just turning the knob in either direction, towards brevity or verbosity:
If it’s a movie, you would start with a trailer and based on how interesting you find it, you could turn the knob to watch the whole movie, or a 60 or 30-minute version of it.
For a Wikipedia article, you would start with the gist, and then gradually turn the knob to learn more and gain deeper knowledge about the subject.
When reading news, you would read one or two sentences that describe the event in short and if needed, you’d turn the knob to add a couple more paragraphs and some context to the story.
This is our simplistic vision for how summarization technology should work.
At AYLIEN we’ve been working on a Text Summarization technology that works just like the knob described above: you give it some text, a news article perhaps, specify the target length of your summary, and our Summarization API automatically summarizes the text for you. Using it, you can turn an article like this:
Into a handful of key sentences:
Designed to promote a healthier balance between our real lives and those lived through the small screens of our digital devices, Moment tracks how much you use your phone each day, helps you create daily limits on that usage, and offers “occasional nudges” when you’re approaching those limits.
The app’s creator, Kevin Holesh, says he built Moment for himself after realizing how much his digital addictions were affecting his real-world relationships.
“My main goal with Moment was to make me aware of how many minutes I’m burning on my phone each day, and it’s helped my testers do that, too.”
The overall goal with Moment is not about getting you to “put down your phone forever and go live in the woods,” Holesh notes on the app’s website.
There’s also a bonus function in the app related to whether or not we’re putting our phone down in favor of going out on the town, so to speak – Moment can also optionally track where you’ve been throughout the day.
Today we’re happy to announce a new version of our Summarization API that has numerous advantages over the previous versions and gives you more control over the length of the generated summary.
Two new parameters, sentences_number and sentences_percentage, allow you to control the length of your summary. For example, to get a summary that is 10% of the original text in length, you would make the following request:
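As a rough sketch of how such a request could be assembled, the snippet below builds the query string for a 10% summary. Note that the endpoint URL here is an assumption for illustration only; consult the API reference for the real path and authentication details.

```python
from urllib.parse import urlencode

# Hypothetical endpoint path, for illustration only; check the
# Summarization API reference for the real one.
SUMMARIZE_URL = "https://api.aylien.com/api/v1/summarize"

def build_summary_request(article_url, sentences_percentage=None, sentences_number=None):
    """Build the query string for a summarization call.

    sentences_percentage and sentences_number are the two new
    length-control parameters; pass only one of them per request.
    """
    params = {"url": article_url}
    if sentences_percentage is not None:
        params["sentences_percentage"] = sentences_percentage
    if sentences_number is not None:
        params["sentences_number"] = sentences_number
    return SUMMARIZE_URL + "?" + urlencode(params)

# A summary that is 10% of the original article's length:
request_url = build_summary_request("http://example.com/article", sentences_percentage=10)
```

Passing sentences_number=3 instead would request a fixed three-sentence summary.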
Making API requests one by one can be inefficient when you have a large number of documents to analyze. We’ve added a batch processing feature that makes it easy to process large collections of documents all at once using the Text Analysis API.
Steps to use this feature are as follows:
Step 1. Package all your documents in one file
Start by putting all your documents (or URLs) in one big text file – one document/URL per line. Example:
Time is an illusion. Lunchtime doubly so.
For a moment, nothing happened. Then, after a second or so, nothing continued to happen.
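As a minimal sketch, Step 1 amounts to writing one document per line to a text file (the filename documents.txt is arbitrary):

```python
# Each document (or URL) goes on its own line; the filename is arbitrary.
documents = [
    "Time is an illusion. Lunchtime doubly so.",
    "For a moment, nothing happened. Then, after a second or so, nothing continued to happen.",
]

with open("documents.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(documents) + "\n")
```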
Step 2. Make a batch request and obtain job identifier
Calling the /batch endpoint creates a new analysis job that will be processed eventually. There are a couple of parameters that you need to provide to /batch:
Data to be analyzed †
Comma separated list of Text Analysis API endpoints
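A submission along these lines can be sketched in Python. Note that the form-field names used below (data, entries_type, endpoints, output_format) are illustrative assumptions rather than the API’s documented names:

```python
# A minimal sketch of assembling the fields for a /batch submission.
# The field names (data, entries_type, endpoints, output_format) are
# illustrative assumptions, not the API's documented names.

def build_batch_fields(path, entries_type="text", endpoints=("sentiment",), output_format="xml"):
    with open(path, "rb") as f:
        data = f.read()  # contents of the documents file, e.g. /home/amir/42
    return {
        "data": data,                      # the documents, one per line
        "entries_type": entries_type,      # each line is a text, not a URL
        "endpoints": ",".join(endpoints),  # comma-separated endpoint list
        "output_format": output_format,    # download the results as XML
    }
```

These fields would then be POSTed to the /batch endpoint as a regular form upload.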
Such a request uploads the contents of the file /home/amir/42 and indicates that each line is a text (not a URL), that the desired operation is sentiment analysis, and that you wish to download the results in XML format.
A successful request will result in a 201 Created response, with a Location header indicating the URI you can poll to get the status of your submitted job. For your convenience, the URI is also included in the response body.
Step 3. Poll the job status information until it is finished
You can call the URI obtained in the last step to see the status of your job. Your job can be in one of these states: pending, in-progress, failed, or completed. Once your job is completed, you’ll receive a 303 See Other response with a Location header indicating where you can download your results. This URL is also included in the body of the response. Example:
The location value obtained in the last step is a pre-signed S3 object URL, which you can easily download using curl or wget. Please note that results are kept for only 7 days after the job finishes and are deleted afterwards. If you fail to retrieve the results during this period, you will need to re-submit your job.
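Putting Step 3 together, a simple polling loop might look like the sketch below. Here get_status is a stand-in for an HTTP GET against the job URI; it is not part of the API itself:

```python
import time

def poll_until_done(get_status, interval=5.0, max_tries=120):
    """Poll a batch job until it reaches a terminal state.

    get_status() should perform a GET on the job URI from the previous
    step and return (state, location), where state is one of the
    documented states: pending, in-progress, failed, completed.
    """
    for _ in range(max_tries):
        state, location = get_status()
        if state == "completed":
            return location  # pre-signed S3 URL for the results
        if state == "failed":
            raise RuntimeError("batch job failed")
        time.sleep(interval)  # pending or in-progress: wait and retry
    raise TimeoutError("job did not finish within the polling budget")
```

Keep in mind that the returned location URL is only useful while the results are retained (7 days), so download them promptly.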
We are delighted to announce that Dr. John Breslin has joined AYLIEN as an advisor. John has a unique blend of academic and industry experience, and he is currently a lecturer at NUI Galway and the Insight Centre.