
Create in-depth and intelligent end-of-year review posts using NLP and Text Analysis

Intro

As 2016 draws to a close and another year of news stories is added to the archives, many news aggregators and online media sites will be publishing year-in-review posts in the coming days, looking back on the biggest events and stories in their niche or genre in 2016.

With this in mind, we thought it would be cool to show you the kind of intelligent insights that our users are gaining with the NLP and Text Analysis capabilities in our News API. To do this, we decided to take a close look at three news categories and the stories published within each one, in an attempt to uncover publishing trends, the most popular stories, the most mentioned people and organizations, and other insights of note.

Categories

The three categories we decided to look at are:

  • Science
  • Arts & Entertainment
  • Sport

We’ve decided not to include the Politics category, as we’re sure you have seen, heard and read more than enough about the US Presidential Election this year! We did, however, carry out a pretty cool study on using NLP and Text Mining to understand how media coverage influenced the US presidential election, which is definitely worth checking out.

So, where did we start?

We started by building simple search queries using our News API to scan thousands of monitored news sources for articles related to each category. These articles, over 2.25 million in total, were then indexed automatically using our text analysis capabilities in the News API.

This meant that key data points in those articles were identified and indexed for further analysis:

  • Keywords
  • Entities
  • Concepts
  • Topics

Each article also comes with granular metadata such as publication time, source, source location, journalist name and sentiment polarity. Combined, these data points gave us an opportunity to uncover and analyze trends in news stories over the course of the year.

Science

2016 truly was an interesting year across all news categories, and the world of scientific discovery was no exception. The year saw some major scientific breakthroughs, ranging from the February announcement that gravitational waves had finally been directly detected, to the discovery of Proxima b, an Earth-sized exoplanet located about 4.2 light years from Earth.

To begin our analysis of the Science category, we will first look at publishing volumes and trends to see if we can spot any high-volume news periods throughout the year. By graphing total daily volumes of stories from this category from January through December, we can do just that.

Publishing volumes and trends

The graph below shows the daily news volumes for the Science category. We have labelled some of the more interesting spikes to give you an idea of how certain stories affected news publication volumes throughout the year.

 

Although 2016 was the year in which Einstein’s prediction of gravitational waves was finally confirmed, an AI program beat a Go world champion, and a European lander attempted to touch down on Mars (albeit unsuccessfully), by far the biggest spike in news stories came just after a US Presidential debate in which Donald Trump denied that climate change was real. Perhaps there is no escaping Politics after all!
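The daily volumes and spikes discussed above come down to two steps: counting stories per calendar day, then flagging unusually busy days. Below is a minimal Python sketch, assuming each story has already been retrieved as a dict with an ISO-8601 `published_at` timestamp (a hypothetical field name used for illustration, not necessarily the News API's exact schema). The "mean plus two standard deviations" threshold is our own illustrative rule, not necessarily how the spikes in the graph were picked out.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def parse_timestamp(value):
    """Parse an ISO-8601 string; older Pythons reject a trailing 'Z', so map it to +00:00."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def daily_volumes(stories):
    """Count stories per calendar day, keyed by datetime.date and sorted by day.

    'published_at' is a hypothetical field name. Days with no stories are
    simply absent from the result.
    """
    counts = Counter(parse_timestamp(story["published_at"]).date() for story in stories)
    return dict(sorted(counts.items()))


def find_spikes(volumes, z=2.0):
    """Return the days whose volume exceeds mean + z * population standard deviation."""
    values = list(volumes.values())
    if len(values) < 2:
        return []
    threshold = mean(values) + z * pstdev(values)
    return [day for day, count in volumes.items() if count > threshold]
```

With a year of stories loaded, `find_spikes(daily_volumes(stories))` returns candidate days to label on a graph. Because the sketch skips days with zero stories, you may want to fill those in before computing the threshold if your data has gaps.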

Most popular day(s) to publish

There was no stand-out day of the week on which noticeably more content was published in the Science category. While the early part of the week (Mon, Tue, Wed) appears to be somewhat favored, there do not appear to be any obvious trends in publication days in this category.
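Checking for a "most popular day" amounts to bucketing stories by weekday and comparing each day's share with the 1/7 (about 14.3%) you would expect if publishing were spread evenly. A minimal sketch, assuming ISO-8601 publication timestamps as input (the input format is our assumption):

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_shares(timestamps):
    """Return the fraction of stories published on each weekday.

    Weekdays are taken in each timestamp's own UTC offset; datetime.weekday()
    returns 0 for Monday through 6 for Sunday.
    """
    counts = Counter(
        datetime.fromisoformat(ts.replace("Z", "+00:00")).weekday() for ts in timestamps
    )
    total = sum(counts.values())
    # Guard against empty input so every weekday still gets a 0.0 share.
    return {name: (counts[i] / total if total else 0.0) for i, name in enumerate(WEEKDAYS)}
```

Note that the weekday depends on the time zone used: a story stamped late on Sunday evening in New York falls on Monday in UTC, which can shift the shares when sources span several regions.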

Most active publishers in this category

The graph below shows the most active publishers in the Science category by story volume in 2016.
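Ranking publishers by story volume is a counting exercise over each story's source. A minimal sketch, assuming a hypothetical `{"source": {"name": ...}}` layout for each story record:

```python
from collections import Counter


def top_publishers(stories, n=10):
    """Return the n sources with the most stories as (name, count) pairs.

    Stories without a source name are grouped under "unknown". Ties keep the
    order in which sources were first seen, as Counter.most_common does.
    """
    counts = Counter(
        (story.get("source") or {}).get("name", "unknown") for story in stories
    )
    return counts.most_common(n)
```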

 

Most Facebook shares for stories in this category

  1. 26 Pictures That Will Make You Re-Evaluate Your Entire Existence (BuzzFeed – 2,629,653 shares)
  2. Well: The Scientific 7-Minute Workout (NY Times – 405,721 shares)
  3. Nasa-funded study: industrial civilisation headed for ‘irreversible collapse’? (The Guardian – 222,483 shares)

Most mentioned organizations

 

Perhaps unsurprisingly, NASA were by far the most mentioned organization in the Science category in 2016. The European Space Agency made the news as their ExoMars probe approached Mars for landing, only to fail at that final stage. SpaceX received plenty of media attention after launching a Falcon 9 on a cargo resupply mission to the International Space Station for NASA and, for the first time in history, landing the rocket’s first stage on a drone ship in the Atlantic Ocean.
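Because every story is indexed with the entities it mentions, a "most mentioned" ranking can be built by counting entity names of a given type. A minimal sketch, assuming a hypothetical `{"entities": [{"text": ..., "type": ...}]}` layout. Counting each organization at most once per story (document frequency) is our own design choice, so that a single article repeating a name many times does not dominate the ranking:

```python
from collections import Counter


def most_mentioned(stories, entity_type="Organization", n=10):
    """Count how many stories mention each entity of the given type.

    The entity layout and the "Organization" label are hypothetical; use
    whatever type labels your entity extractor actually emits.
    """
    counts = Counter()
    for story in stories:
        # A set, so repeated mentions within one story count only once.
        names = {
            entity["text"]
            for entity in story.get("entities", [])
            if entity.get("type") == entity_type
        }
        counts.update(names)
    return counts.most_common(n)
```

Passing a different type label (for example a hypothetical `"SportsTeam"`) would produce a ranking of the same kind for other entity categories.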

Arts & Entertainment

As we’re sure you’ve noticed, the past year has seen a number of high-profile celebrity deaths, with David Bowie, Prince, Muhammad Ali and Gene Wilder, among many others, dominating news and social media after their passing. It is no surprise, then, particularly within this category, that spikes in news story volumes occurred around the times of these deaths.

Publishing volumes and trends

 

Most popular day to publish

Interestingly, the majority of volume spikes seen on the graph above occur on Thursdays.

Most active publishers in this category

 

Most Facebook shares in this category

  1. This “All About That Bass” Cover Will Make Every Mom Crack Up (TMZ – 914,068 shares)
  2. MASH’s ‘Colonel Potter’ dies aged 96 (ABC Australia – 248,902 shares)
  3. EDM Bubble bursts for haters (Huffington Post – 177,957 shares)

Most mentioned artists/entities

 

Sport

2016 saw Rio de Janeiro host the Olympic Games, at which swimming legend Michael Phelps racked up his 22nd gold medal. This achievement, combined with a controlled detonation of a suspect device at the Games and the news that Lionel Messi was reversing his decision to retire from international football, gave us our highest news volume day of 2016: August 12.

 

Most popular day to publish

Looking at the graph above, we can see a definite trend towards higher volumes of sport stories being published on Wednesdays and Thursdays.

Most active publishers in this category

 

Most mentioned sports teams

 

Although Manchester United have not won the EPL title in recent years, the club’s global dominance in terms of popularity and media attention shows no signs of slowing down. Last season’s surprise winners, Leicester City, appear prominently, as do Euro 2016 winners Portugal.

The sheer dominance of soccer clubs perhaps shows just how much appeal the sport has all around the world, as even smaller EPL clubs such as West Ham and Everton received more mentions than the likes of the New York Yankees and the Golden State Warriors.

American football fan? Check out our sentiment analysis of 1.8 million tweets from Super Bowl 50.

Video content is king?

While video marketing is nothing new, it has certainly been receiving increasing attention in recent times, as more and more marketing studies and industry experts play up its ever-growing importance. It has been predicted that video will account for 69% of all consumer Internet traffic by 2017, so we wanted to see whether the news content we have collected and analyzed shows any evidence to support this prediction.

Our News API lets you search and sort news stories by video and image volume, so that’s exactly what we did. By graphing the average number of videos per story in each category, we uncovered the following trends:

Average number of videos per story

 

What’s immediately evident here is that stories in the Arts & Entertainment category featured considerably more videos in 2016 than those in the other two categories, Science and Sport. Interestingly, and in line with industry predictions, the average number of videos per published story grew steadily over 2016 in all three categories. This is particularly evident in the Arts & Entertainment category, but there is also a definite increase from year-start to year-end in Science and Sport.
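The per-category averages can be reproduced by grouping stories by category and month, then dividing the number of videos by the number of stories in each group. A minimal sketch, assuming hypothetical `category`, `published_at` and `media` fields, with each media item carrying a `type` such as `"video"` or `"image"`:

```python
from collections import defaultdict
from datetime import datetime


def avg_videos_per_story(stories):
    """Average number of videos per story, keyed by (category, "YYYY-MM").

    Stories with no videos still count towards the denominator, so the
    result is videos per story, not videos per video-carrying story.
    """
    totals = defaultdict(lambda: [0, 0])  # (category, month) -> [videos, stories]
    for story in stories:
        published = datetime.fromisoformat(story["published_at"].replace("Z", "+00:00"))
        key = (story["category"], published.strftime("%Y-%m"))
        videos = sum(1 for item in story.get("media", []) if item.get("type") == "video")
        totals[key][0] += videos
        totals[key][1] += 1
    # Sorted keys give each category's months in chronological order.
    return {key: videos / count for key, (videos, count) in sorted(totals.items())}
```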

Conclusion

We hope that this post has given you an idea of the kind of in-depth and precise analyses that our News API users are performing to create end-of-year review posts. Ready to try the News API for yourself? Sign up for a 14-day free trial.




Author



Noel Bambrick

Customer Success Manager @ AYLIEN

A graduate of the Dublin Institute of Technology and Digital Marketing Institute in Ireland, Noel heads up Customer Success here at AYLIEN. A keen runner, writer and traveller, Noel joined the team having previously gained experience with SaaS companies in Australia and Canada.

Twitter: @noelbambrick