What does the news tell us about the Panama Leaks? (News Analysis case study)
Dubbed the biggest leak of its kind ever, bigger than WikiLeaks and Edward Snowden’s leak in 2013, the Panama Leaks has shed light on how the world’s rich and famous are moving and hiding money across the globe.
The information, released by the International Consortium of Investigative Journalists following a tip-off from an anonymous source connected to German newspaper Süddeutsche Zeitung, is a cache of over 11 million documents that show how money is laundered through offshore accounts and entities.
The documents, which are not entirely public yet, show how the world’s super rich exploit international tax regimes in order to hide money and assets.
At the center of the controversy is a Panamanian law firm, Mossack Fonseca, which helped clients route and move money all over the world. Those implicated in the leak range from world leaders such as Vladimir Putin to soccer stars such as Lionel Messi.
If you want to follow the action live, check out this reddit live page.
So why are we so interested in the leak at AYLIEN?
Well, for one, we’re data geeks, and the thought of mining a 2.6TB leak greatly appeals to us 😉 Beyond the fact that we care about the story, we also regularly use world events like this to showcase our technology and solutions.
When the news broke, we thought: wouldn’t it be cool to mine the reports? We wanted to look for interesting data points such as the people, organizations, locations and topics mentioned. In total, over 210,000 entities are thought to be named, with the documents dating as far back as the 70s in a collection of emails, contracts, transcripts, photos and even passports. That’s a lot of interesting data to mine!
The actual documents haven’t been released yet, but there has been a massive amount of chatter on the subject across news outlets, blogs and social media. Using our News API we decided to concentrate on what news outlets were saying by mining news content from across the world with the goal of extracting the same insights we mentioned above.
So, what did we do?
We started by building a very simple search using our News API to scan thousands of monitored news sources for articles related to the leak. In total we collected over 4,000 articles, which were then indexed automatically using the text analysis capabilities in the News API.
This meant that key data points in those articles were identified and indexed for further analysis. The search query we used was:
“Panama Leaks” OR “panama papers” OR “Mossack Fonseca” (Try it in our demo)
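As a minimal sketch of how a query like this could be assembled programmatically, the snippet below joins quoted phrases with OR. The helper name `build_or_query` is hypothetical and is not part of the News API; the resulting string would simply be passed to whichever search parameter the API expects.

```python
def build_or_query(phrases):
    """Join phrases into a boolean OR query, wrapping each phrase in quotes."""
    # Drop empty entries and surrounding whitespace before quoting.
    cleaned = [p.strip() for p in phrases if p.strip()]
    return " OR ".join(f'"{p}"' for p in cleaned)


query = build_or_query(["Panama Leaks", "panama papers", "Mossack Fonseca"])
print(query)
```

Keeping the phrases in a list makes it easy to add new search terms as the story develops.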
Note: The visualizations we created below were generated at the time of the analysis. Given the rate of new content surfacing we plan on updating these regularly.
With the stories gathered we decided to dive into them and attempt to extract any interesting data points that the API could surface.
The first thing we looked at was how coverage developed in the days after the original story broke. You can see the news chatter around the topic picking up on the evening of April 3rd, when The Guardian ran its original story, “Revealed: the $2bn offshore trail that leads to Vladimir Putin”.
We used the Time Series endpoint in the News API to graph the volume of stories over time.
The graph shows how the volume of stories increases as the story spreads and other timezones come online. We’ve noted some of the more prominent stories by choosing the ones with the highest volume of social shares, which can be easily extracted with our API.
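To illustrate the idea behind a volume-over-time graph, here is a rough sketch that buckets story timestamps by hour on the client side. It is not how the Time Series endpoint is implemented; the function name `volume_per_hour` and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def volume_per_hour(published_times):
    """Count stories per hour from ISO 8601 timestamp strings."""
    buckets = Counter()
    for ts in published_times:
        dt = datetime.fromisoformat(ts)
        # Truncate to the start of the hour so stories share a bucket.
        buckets[dt.replace(minute=0, second=0, microsecond=0)] += 1
    return sorted(buckets.items())


# Made-up timestamps for illustration only, not data from the analysis.
sample = [
    "2016-04-03T18:05:00",
    "2016-04-03T18:40:00",
    "2016-04-03T19:10:00",
]
for hour, count in volume_per_hour(sample):
    print(hour.isoformat(), count)
```

The sorted list of (hour, count) pairs can be handed directly to most charting libraries.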
Volume over time:
What, Who and Where?
The second thing we wanted to look at was what was being discussed: which individuals, organizations and countries were mentioned in the articles, and how often.
We used the API calls below to extract any mentions of Entities and Concepts in the indexed articles. The main types we focused on were keywords, people, organizations and countries.
API Call Entities:
API Call Keywords:
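As a rough, hypothetical sketch of the aggregation step (not the API calls themselves), the snippet below counts how many stories mention each entity of a given type. The story structure, the field names `people` and `organizations`, and the placeholder values are all assumptions made for illustration.

```python
from collections import Counter


def count_mentions(stories, entity_type):
    """Count how many stories mention each entity of the given type."""
    counts = Counter()
    for story in stories:
        # Use a set so a name repeated within one story is counted once.
        counts.update(set(story.get(entity_type, [])))
    return counts.most_common()


# Placeholder stories for illustration only, not data from the analysis.
stories = [
    {"people": ["Person A"], "organizations": ["Org X"]},
    {"people": ["Person A", "Person B"], "organizations": ["Org X"]},
    {"people": ["Person C", "Person A"]},
]
print(count_mentions(stories, "people"))
print(count_mentions(stories, "organizations"))
```

Counting per story rather than per occurrence keeps one long article from dominating the ranking; counting raw occurrences would be an equally reasonable choice.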
The final piece of analysis, while quite basic, was surprisingly interesting. Using the News API’s Trends endpoint, we looked at how the entities and concepts extracted developed over time as more and more stories broke.
It’s clear that Vladimir Putin was implicated from the start, but it’s interesting to see that the likes of David Cameron, Lionel Messi and Xi Jinping were only mentioned following further investigation and coverage.
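One simple way to check when a name enters the coverage is to record the earliest publication date on which it appears. The sketch below assumes each story carries a `published_at` ISO 8601 string and a `people` list; both field names and the sample values are hypothetical, and this is not the Trends endpoint itself.

```python
def first_mentions(stories):
    """Map each person to the earliest date (YYYY-MM-DD) they are mentioned."""
    first_seen = {}
    for story in stories:
        day = story["published_at"][:10]  # keep only the date part
        for name in story.get("people", []):
            # ISO dates sort correctly as plain strings.
            if name not in first_seen or day < first_seen[name]:
                first_seen[name] = day
    return first_seen


# Placeholder stories for illustration only, not data from the analysis.
stories = [
    {"published_at": "2016-04-04T09:00:00", "people": ["Person A", "Person B"]},
    {"published_at": "2016-04-03T18:00:00", "people": ["Person A"]},
    {"published_at": "2016-04-05T12:00:00", "people": ["Person C"]},
]
print(first_mentions(stories))
```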
We’re planning on running some further analysis as the story develops. Stay tuned to the blog for updates to the data visualizations and further blog posts.
If you’d like to try it for yourself, just create your free News API account and start collecting and analyzing stories. Our News API is the most powerful way of searching, sourcing and indexing news content from across the globe. We crawl and index thousands of news sources every day and analyze their content using our NLP-powered Text Analysis Engine to give you an enriched and flexible news data source.