Introducing Real-Time Clustering, Multilingual NLP, and Translated Content

In our biggest feature rollout to date, we’re bringing a whole new dimension of investigation and discovery capabilities to the processes and solutions our customers build with the News API.

Introducing:

  • Real-Time Clustering: An automatic topic and event discovery algorithm that groups similar articles together in clusters to provide a 50,000-foot view of the world’s news landscape
  • Multilingual NLP: Full NLP support for 14 languages, extending our coverage of global news
  • Translated Content: Our in-house neural machine translation system translates every non-English article ingested by the News API into English

Check out our interactive demo to see each of the features in action, or start your free News API trial. Existing News API users can request access to Clustering and Multilingual content from their account manager by filling out our contact sales form.

These features greatly improve our customers’ search, discovery, and investigation workflows and processes by providing a 360-degree view of trending topics and events that matter.

Some of our customers are already benefiting from our private release by using clustering, machine translation (MT), and Multilingual NLP in their search and discovery processes and solutions.

  • News and Media platforms: We work with editorial teams and newsrooms to help them understand and deduplicate news content across languages and regions, so they can discover and publish emerging stories more efficiently while reducing bias by gathering multiple perspectives on events.
  • Risk Intelligence solutions: We have drastically improved the efficiency and accuracy of the traditional risk analyst and adverse media screening processes by identifying all the events that represent a risk or opportunity in an industry sector, geography or language.
  • Financial Analytics solutions: We’re helping analysts and asset managers find alpha-rich events by surfacing topics and events of interest relating to sectors or investment portfolios through proactive search and advanced modeling.

Let’s look at each feature in detail and dive into how they’re used.

Real-Time Clustering

Our NLP models add tags and enrichments (more than 25 data points, in fact) to every single article ingested in the News API. These data points include metadata and semantic information, such as who or what is mentioned in an article, what topics are covered, the sentiment of the article, and so on.

Having this in-depth understanding of each article means we can effectively group articles that cover the same event or topic together in real time, regardless of the time of publication, the source, or even the language the articles are written in.

The ability to segment and group the world’s news at a topic or event level enables our users to greatly improve the efficiency and accuracy of their applications and processes through:

  • Event detection workflows
  • Automatic topic discovery 
  • Deduplication of news stories
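
As a rough illustration of the deduplication case, the sketch below keeps only the earliest story from each cluster. It assumes each story already carries a cluster identifier; the field names (`id`, `cluster_id`, `published_at`, `title`) and the sample records are hypothetical and are not taken from the News API schema.

```python
from datetime import datetime

# Hypothetical story records. The field names and values are assumptions
# made for illustration, not the actual News API response format.
stories = [
    {"id": 1, "cluster_id": "c1", "published_at": "2020-01-10T08:00:00",
     "title": "Flooding hits coastal towns"},
    {"id": 2, "cluster_id": "c1", "published_at": "2020-01-10T08:45:00",
     "title": "Coastal towns flooded after storm"},
    {"id": 3, "cluster_id": "c2", "published_at": "2020-01-10T09:10:00",
     "title": "Central bank holds rates"},
]


def deduplicate(stories):
    """Keep the earliest-published story from each cluster."""
    earliest = {}
    for story in stories:
        key = story["cluster_id"]
        published = datetime.fromisoformat(story["published_at"])
        if key not in earliest or published < earliest[key][0]:
            earliest[key] = (published, story)
    # Return one representative per cluster, ordered by publication time.
    ordered = sorted(earliest.values(), key=lambda pair: pair[0])
    return [story for _, story in ordered]


unique_stories = deduplicate(stories)
for story in unique_stories:
    print(story["cluster_id"], story["title"])
```

Choosing the earliest story as the representative is only one option; a workflow could just as well pick the story from a preferred source or the longest article in the cluster.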

You can read more about getting started with the Clusters endpoint in our documentation. Below you’ll find a visualization of an example cluster retrieved using the Clusters endpoint.

Multilingual NLP

An inherent problem with Media Intelligence initiatives and solutions is that most organizations are forced to concentrate their analysis on English-first publications. This is due to several reasons:

  1. Limited access to content
  2. Substandard analysis and search capabilities on multilingual content
  3. The lack of a multilingual workforce

Particularly in the Risk Analysis space, analysts tracking international events need to be sure they can not only accurately identify the events that matter but also tag and categorize them correctly, irrespective of the language they’re published in. Moreover, for certain risk events such as crime, terror, or civil unrest, access to local and regional media is paramount to the discovery and investigation process.

We’ve greatly improved the News API’s ability to ingest and analyze content on a global scale by adding full NLP support for 14 languages, including Turkish, Arabic, and Chinese. This means that every data point and NLP enrichment we add for English content is also added for articles in European languages, Turkish, Chinese, and Arabic, which increases coverage and enhances search and discovery.

By ingesting and analyzing multilingual content, we make it easy to both detect and fully understand stories and events in different geographies. We also identify and track the long tail of events that may not be covered by English-speaking publications.
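
One way to quantify this is to measure, for a cluster of stories about the same event, how long English-language coverage lagged behind the first non-English report. The sketch below is a minimal illustration under stated assumptions: the `language` and `published_at` fields and the sample data are hypothetical, not the News API schema.

```python
from datetime import datetime, timedelta
from typing import Optional


def english_lead_time(cluster_stories) -> Optional[timedelta]:
    """Time between the first non-English story and the first English story.

    Returns None when the cluster has no English coverage (a long-tail
    event) or no non-English coverage. A negative value means English
    sources reported first.
    """
    first_seen = {}
    for story in cluster_stories:
        published = datetime.fromisoformat(story["published_at"])
        group = "en" if story["language"] == "en" else "other"
        if group not in first_seen or published < first_seen[group]:
            first_seen[group] = published
    if "en" not in first_seen or "other" not in first_seen:
        return None
    return first_seen["en"] - first_seen["other"]


# Hypothetical clusters, made up for illustration.
breaking_cluster = [
    {"language": "es", "published_at": "2020-01-10T10:00:00"},
    {"language": "es", "published_at": "2020-01-10T10:30:00"},
    {"language": "en", "published_at": "2020-01-10T11:20:00"},
]
long_tail_cluster = [
    {"language": "tr", "published_at": "2020-01-10T07:15:00"},
]

lead = english_lead_time(breaking_cluster)
no_english = english_lead_time(long_tail_cluster)
```

A result of `None` flags exactly the long-tail events described above: stories that would never surface in an English-only workflow.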

You can read more about getting started with Multilingual content in our documentation. Below you’ll find an example of an event breaking on local non-English sources before being picked up by an English-language publisher 1 hour and 20 minutes later.

* A note on content coverage: We’re adding new sources to the News API every day and have a prioritized list of multilingual sources that we’re working through. If you have particular language or source requirements, please let us know using the following form.

Translated Content

From reports of flooding in Spain to civil unrest in Hong Kong, valuable information about significant events is published on the web every minute of every hour, in a wide variety of human languages. But without an army of multilingual agents or analysts, events reported in languages other than English are inaccessible to most teams and processes.

This makes a multilingual investigation process massively inefficient and extraordinarily costly, not to mention the bias that can be introduced when analysts rely only on content published in their native or supported language.

As part of our analysis pipeline, we’ve now added our own proprietary Neural Machine Translation system for 14 languages. For each of the 14 non-English languages supported in the News API, we now offer the original and translated text as part of each story object. This allows you not only to source and analyze the stories that matter to you but also to research and investigate events more accurately and efficiently.
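
To make the idea of a story object carrying both texts concrete, here is a minimal sketch of a helper that returns English text for any story, falling back to the translation when the original is not in English. The object layout (`language`, `title`, `body`, and a `translations` mapping keyed by language code) and the sample story are assumptions for illustration only; consult the documentation for the actual field names.

```python
def english_text(story):
    """Return (title, body) in English, or None if no English text exists."""
    if story["language"] == "en":
        return story["title"], story["body"]
    # Hypothetical layout: translations keyed by target language code.
    translated = story.get("translations", {}).get("en")
    if translated is None:
        return None
    return translated["title"], translated["body"]


# A made-up story object for illustration, not real API output.
spanish_story = {
    "language": "es",
    "title": "Inundaciones en el este de España",
    "body": "Las fuertes lluvias han provocado inundaciones.",
    "translations": {
        "en": {
            "title": "Flooding in eastern Spain",
            "body": "Heavy rains have caused flooding.",
        }
    },
}

title, body = english_text(spanish_story)
print(title)
```

Keeping the original text alongside the translation means an analyst can search and triage in English, then hand the source-language text to a native speaker when precision matters.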

Below you’ll find the same article we discovered above translated to English:

You can read more about getting started with Multilingual content in our documentation. Below you’ll find an example of a breaking news story translated from Spanish to English.

* A note about quality: Our translations are typically on a par with the state of the art in the industry, although quality varies across language pairs. They may not be of the standard you would need to publish the translated text online, but they are fully legible and therefore support an analyst or investigation workflow very well. You can also visit our research hub to read more about our work in machine translation and cross-lingual NLP.

Ready to get started?

Our advanced discovery and investigation features are available in our Advanced and Enterprise Packages. You can try them free for 14 days by signing up for our free trial or, if you’re an existing News API user, contact your account manager by filling out our contact sales form.

