Using NLP and Text Mining to understand how media coverage influenced the US presidential election
The 2016 US Presidential election was one of (if not the) most controversial in the nation’s history. With the end prize being arguably the most powerful job in the world, the two candidates were always going to find themselves coming under intense media scrutiny. With more media outlets covering this election than any that have come before it, an increase in media attention and influence was a given.
But how much of an influence does the media really have on an election? Does journalistic bias sway voter opinion, or does voter opinion (such as poll results) generate journalistic bias? Does the old adage “all publicity is good publicity” ring true at election time?
“My sense is that what we have here is a feedback loop. Does media attention increase a candidate’s standing in the polls? Yes. Does a candidate’s standing in the polls increase media attention? Also yes.” -Jonathan Stray @jonathanstray
Thanks to an ever-increasing volume of media content flooding the web, paired with advances in natural language processing and text analysis capabilities, we are in a position to delve deeper into these questions than ever before, and by analyzing the final sixty days of the 2016 US Presidential election, that’s exactly what we set out to do.
So, where did we start?
We started by building a very simple search using our News API to scan thousands of monitored news sources for articles related to the election. These articles, 170,000 in total, were then indexed automatically using our text analysis capabilities in the News API.
This meant that key data points in those articles were identified and indexed to be used for further analysis:
With each of the articles or stories sourced comes granular metadata such as publication time, publication source, source location, journalist name and sentiment polarity of each article. Combined, these data points provided us with an opportunity to uncover and analyze trends in news stories relating to the two presidential candidates.
We started with a simple count of how many times each candidate was mentioned from our news sources in the sixty days leading up to election day, as well as the keywords that were mentioned most.
By extracting keywords from the news stories we sourced, we get a picture of the key players, topics, organizations and locations that were mentioned most. We generated the interactive chart below using the following steps;
- We called the News API using the query below.
- We called it again, but searched for “Trump NOT Clinton”
- Mentions of the two candidates naturally dominated in both sets of results so we removed them in order to get a better understanding of the keywords that were being used in articles written about them. We also removed some very obvious and/or repetitive words such as USA, America, White House, candidate, day, etc.
Here’s the query;
You can hover your cursor over each cluster to view details;
Most mentioned keywords in articles about Hillary Clinton
Straight away, bang in the middle of these keywords, we can see FBI and right beside it, emails.
Most mentioned keywords in articles about Donald Trump
Similar to Hillary, Trump’s main controversies appear most prominently in his keywords, with terms like women, video, sexual and assault all appearing prominently.
Most media mentions
If this election was decided by the number of times a candidate was mentioned in the media, who would win? We used the following search queries to total the number of mentions from all sources over the sixty days immediately prior to election day;
Note: We could also have performed this search with a single query, but we wanted to separate the candidates for further analysis, and in doing this, we removed overlapping stories with titles that mentioned both candidates.
Here’s what we found, visualized;
Who was mentioned more in the media? Total mentions volume:
It may come as no surprise that Trump was mentioned considerably more than Clinton during this period, but was he consistently more prominent in the news over these sixty days, or was there perhaps a major story that has skewed the overall results? By using the Time Series endpoint, we can graph the volume of stories over time.
We generated the following chart using results from the two previous queries;
How media mentions for both candidates fluctuated in the final 60 days
As you would expect, the volume of mentions for each candidate fluctuates throughout the sixty day period, and to answer our previous question – yes, Donald Trump was consistently more prominent in terms of media mentions throughout this period. In fact, he was mentioned more than Hillary Clinton in 55 of the 60 days.
Let’s now take a look at some of the peak mention periods for each candidate to see if we can uncover the reasons for the spikes in media attention;
Trump’s peak period of media attention was October 10-13, as indicated by the highest red peak in the graph above. This period represented the four highest individual days of mention volume and can be attributed to the scandal that arose from sexual assault accusations and a leaked tape showing Trump making controversial comments about groping women.
The second highest peak, October 17-20, coincides with a more positive period for Trump, as a combination of a strong final presidential debate and a growing email scandal surrounding Hillary Clinton increased his media spotlight.
Excluding the sharp rise in mentions just before election day, Hillary’s highest volume days in terms of media mentions occurred from October 27-30 as news of the re-emergence of an FBI investigation surfaced.
So we’ve established the dates over the sixty days when each candidate was at their peak of media attention. Now we want to try establish the sentiment polarity of the stories that were being written about each candidate throughout this period. In other words, we want to know whether stories were being written in a positive, negative or neutral way. To achieve this, we performed Sentiment Analysis.
Sentiment Analysis is used to detect positive or negative polarity in text. Also known as opinion mining, sentiment analysis is a feature of text analysis and natural language processing (NLP) research that is increasingly growing in popularity as a multitude of use-cases emerge. Put simply, we perform Sentiment Analysis to uncover whether a piece of text is written in a positive, negative or neutral manner.
Note: The vast majority of news articles about the election will undoubtedly contain mentions of both Trump and Clinton. We therefore decided to only count stories with titles that mentioned just one candidate. We believe this significantly increases the likelihood that the article was written about that candidate. To achieve this, we generated search queries that included one candidate while excluding the other. The News API supports boolean operators, making such search queries possible.
First of all, we wanted to compare the overall sentiment of all stories with titles that mentioned just one candidate. Here are the two queries we used;
And here are the visualized results;
What am I seeing here? Blue represents articles written in a neutral manner, red in a negative manner and green in a positive manner. Again, you can hover over the graph to view more information.
What was the overall media sentiment towards Hillary Clinton?
What was the overall media sentiment towards Donald Trump?
Those of you that followed the election, to any degree, will probably not be surprised by these results. We don’t really need data to back up the claim that Trump ran the more controversial campaign and therefore generated more negative press.
Again, similar to how we previously graphed mention volumes over time, we also wanted to see how sentiment in the media fluctuated throughout this sixty day period. First we’ll look at Clinton’s mention volume and see if there is any correlation between mention volume and sentiment levels.
How to read this graph: The top half (blue) represents fluctuations in the number of daily media mentions (‘000’s) for Hillary Clinton. The bottom half represents fluctuations in the average sentiment polarity of the stories in which she was mentioned. Green = positive and red = negative.
You can hover your cursor over the data points to view more in-depth information.
Mentions Volume (top) vs. Sentiment (bottom) for Hillary Clinton
From looking at this graph, one thing becomes immediately clear; as volume increases, polarity decreases, and vice versa. What does this tell us? It tells us that perhaps Hillary was in the news for the wrong reasons too often – there were very few occasions when both volume and polarity increased simultaneously.
Hillary’s average sentiment remained positive for the majority of this period. However, that sharp dip into the red circa October 30 came just a week before election day. We must also point out the black line that cuts through the bottom half of the graph. This is a trend line representing average sentiment polarity and as you can see, it gets consistently closer to negative as election day approaches.
Mentions Volume (top) vs. Sentiment (bottom) for Donald Trump
Trump’s graph paints a different picture altogether. There was not a single day when his average polarity entered into the positive (green). What’s interesting to note here, however, is how little his mention volumes affected his average polarity. While there are peaks and troughs, there were no major swings in either direction, particularly in comparison to those seen on Hillary’s graph.
These results are of course open to interpretation, but what is becoming evident is that perhaps negative stories in the media did more damage to Clinton’s campaign than they did to Trump’s. While Clinton’s average sentiment polarity remained consistently more positive, Trump’s didn’t appear to be as badly affected when controversial stories emerged. He was consistently controversial!
Trumps lowest point, in terms of negative press, came just after the second presidential debate at the end of September. What came after this point is the crucial detail, however. Trump’s average polarity recovered and mostly improved for the remainder of the campaign. Perhaps critically, we see his highest and most positive averages of this period in the final 3 weeks leading up to election day.
Sentiment from sources
At the beginning of this post we mentioned the term media bias and questioned its effect on voter opinion. While we may not be able to prove this effect, we can certainly uncover any traces of bias from media content.
What we would like to uncover is whether certain sources (ie publications) write more or less favorably about either candidate.
To test this, we’ve analyzed the sentiment of articles written about both candidates from two publications: USA Today and Fox News.
Similar to the overall sentiment (from all sources) displayed previously, the sentiment polarity of articles from USA Today shows consistently higher levels of negative sentiment towards Donald Trump. The larger than average percentage of neutral results indicate that USA Today took a more objective approach in their coverage of the election.
USA Today – Sentiment towards Hillary Clinton
USA Today – Sentiment towards Donald Trump
Again, Trump dominates in relation to negative sentiment from Fox News. However, what’s interesting to note here is that Fox produced more than double the percentage of negative story titles about Hillary Clinton than USA Today did. We also found that, percentage-wise, they produced half as many positive stories about her. Also, 3.9% of Fox’s Trump coverage was positive, versus USA Today’s 2.5%.
Fox News – Sentiment towards Hillary Clinton
Fox News – Sentiment towards Donald Trump
These figures beg the question; how are two major news publications writing about the exact same news, with such varied levels of sentiment? It certainly highlights the potential influence that the media can have on voter opinion, especially when you consider how many people see each article/headline. The figures below represent social shares for a single news article;
Bear in mind, these figures don’t represent the number of people who saw the article, they represent the number of people who shared it. The actual number of people who saw this on their social feed will be a high-multiple of these figures. In fact, we grabbed the average daily social shares, per story, and graphed them to compare;
Average social shares per story
Pretty even, and despite Trump being mentioned over twice as many times as Clinton during this sixty day period, he certainly didn’t outperform her when it came to social shares.
Since the 2016 US election was decided there has been a sharp focus on the role played by news and media outlets in influencing public opinion. While we’re not here to join the debate, we are here to show you how you can deep-dive into news content at scale to uncover some fascinating and useful insights that can help you source highly targeted and precise content, uncover trends and assist in decision making.
To start using our News API for free and query the world’s news content easily, click here.