Web Summit 2016 according to Twitter: Text mining 80,000 tweets

Intro

Dubbed Europe’s largest technology marketplace and ‘Davos for geeks’, the Web Summit has gone from strength to strength in recent years as more and more companies, employees, tech junkies and media personnel flock to the annual event to check out the latest innovations, startups and a star-studded lineup of speakers and exhibitors.

Having grown from a small gathering of around 500 like-minded people in Dublin, this year’s event, which was held in Lisbon for the first time, topped 50,000 attendees representing 15,000 companies from 166 countries.

With such a large gathering of techies, there was bound to be a whole lot of chatter relating to the event on Twitter. So being the data geeks that we are, and before we jetted off to Lisbon ourselves, we turned our digital ears to Twitter and listened for the duration of the event to see what we could uncover.

Our process

We collected a total of just over 80,000 tweets throughout the event by focusing our search on keywords, Twitter handles and hashtags such as ‘Web Summit’, #websummit, @websummit, etc.
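For readers who want to try something similar, here is a minimal sketch of the filtering idea: keep a tweet if its text contains any tracked keyword, handle or hashtag. The term list and the `matches_tracked_terms` helper are assumptions for illustration, based only on the examples named above; this is not the collection pipeline used for this post.

```python
# Hypothetical tracking terms, taken from the examples named in the post.
TRACKED_TERMS = ["web summit", "#websummit", "@websummit"]

def matches_tracked_terms(text, terms=TRACKED_TERMS):
    """Return True if the tweet text contains any tracked term (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)

# Invented example tweets, not real data.
tweets = [
    "Landed in Lisbon for #WebSummit!",
    "Great talk at Web Summit today",
    "Nothing to do with the event",
]
collected = [t for t in tweets if matches_tracked_terms(t)]
print(len(collected))
```

Simple substring matching like this is deliberately loose: it catches different capitalizations of the same term, but it would also match longer strings that happen to contain one.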

We used the following tools to collect, analyze and visualize the data;

And here’s what we found;

What languages were the tweets written in?

In total, we collected tweets written in 42 different languages.

Out of our 80,000 tweets, 60,000 were written in English, representing 75% of the total volume.

The pie chart below shows all languages, excluding English. As you can see, Portuguese was the next most-used language, with just under 11% of tweets being written in the host country’s native tongue. Spanish and French tweets each represented around 2.5% of total volume.
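The percentages here are simple shares of the total: 60,000 English tweets out of 80,000 is 60,000 / 80,000 = 75%. As a rough sketch, assuming each tweet already carries a detected language code (the `language_shares` helper and the toy input are hypothetical), the breakdown could be computed like this:

```python
from collections import Counter

def language_shares(lang_codes):
    """Return each language's share of tweets as a percentage, largest first."""
    counts = Counter(lang_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: 100 * n / total for lang, n in counts.most_common()}

# Toy input, not the real dataset: 6 English, 1 Portuguese, 1 Spanish tweet.
sample = ["en"] * 6 + ["pt", "es"]
shares = language_shares(sample)
print(shares)
```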

How did tweet volumes fluctuate throughout the week?

The graph below represents hourly tweet volume fluctuations throughout the week. As you can see, there are four distinct peaks.
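An hourly volume series like this one can be built by truncating each tweet's timestamp to the hour and counting. The sketch below assumes timestamps arrive as ISO 8601 strings; both the input format and the `hourly_volume` helper are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

def hourly_volume(timestamps):
    """Count tweets per hour, given ISO 8601 timestamp strings."""
    buckets = Counter()
    for ts in timestamps:
        # Truncate to the start of the hour so tweets share a bucket.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        buckets[hour] += 1
    return dict(sorted(buckets.items()))

# Invented timestamps, not real data.
sample = ["2016-11-08T09:05:00", "2016-11-08T09:40:00", "2016-11-08T10:15:00"]
volume = hourly_volume(sample)
print(volume)
```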

While we can’t list all the reasons for these spikes in volume, we did find a few recurring trends during these times, which we have added to the graph;

Let’s now take a more in-depth look at each peak.

What were the causes of these fluctuations?

By adding the average hourly sentiment polarity to this graph we can start to gather a better understanding of how people felt while writing their tweets.

Not familiar with sentiment analysis? This is a feature of text analysis and natural language processing (NLP) that is used to detect positive or negative polarity in text. In short, it tells us whether a piece of text, or a tweet in this instance, has been written in a positive, negative or neutral way.

Interestingly, each tweet volume peak correlates with a sharp drop in sentiment. What does this tell us? People were taking to Twitter to complain!
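To make the idea of average hourly sentiment polarity concrete, here is a minimal sketch. It assumes each tweet has already been given a numeric polarity score between -1 (negative) and 1 (positive); that scoring scale and the `hourly_mean_polarity` helper are assumptions, not a description of how the graph in this post was produced.

```python
from collections import defaultdict
from datetime import datetime

def hourly_mean_polarity(scored_tweets):
    """Average polarity per hour from (iso_timestamp, polarity_score) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, score in scored_tweets:
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        sums[hour] += score
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}

# Invented scores, not real data.
sample = [
    ("2016-11-08T09:05:00", 1.0),
    ("2016-11-08T09:40:00", 0.0),
    ("2016-11-08T10:15:00", -1.0),
]
means = hourly_mean_polarity(sample)
print(means)
```

Plotting these hourly means alongside the hourly tweet counts is one way to check whether volume peaks and sentiment dips line up.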

Positivity overall

Overall, average sentiment remained in the positive (green) for the entire week. That dip into negative (red) that you can see came during the early hours of Day 2 as news of the US election result broke. Can’t blame the Web Summit for that one!

 

We can also see distinct rises in positive sentiment around the 5pm mark each day as attendees took to Twitter to reflect on an enjoyable day.

Sentiment also remained comparatively high during the later hours of each day as the Web Summit turned to Night Summit – we’ll look at this in more detail later in the post.

Mike, Afshin, Noel & Hamed after a hectic but enjoyable day at the Web Summit

What was the overall sentiment of the tweets?

The pie chart below shows the breakdown of all 80,000 tweets, split by positive, negative and neutral sentiment.

The majority of tweets (80%) were written in a neutral manner. 14% were written with positive sentiment, with the remaining 6% written negatively.

To uncover the reasons behind both the positive and negative tweets, we extracted and analyzed mentioned keywords to see if we could spot any trends.

What were the most common keywords found in positive tweets?

We used our Entity and Concept Extraction features to uncover keywords, phrases, people and companies that were mentioned most in both positive and negative tweets.

As you can imagine, there were quite a few keywords extracted from 80,000 tweets, so we trimmed the list down by taking the following steps:

  • Sort by mention count
  • Take the top 100 most mentioned keywords
  • Remove obvious or unhelpful keywords (Web Summit, Lisbon, Tech, etc)
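The three steps above can be sketched in a few lines. The exclusion list below contains only the examples named in the list, and the `top_keywords` helper is hypothetical; the real list of removed keywords was presumably longer.

```python
from collections import Counter

# Hypothetical exclusion list; the post names only these three examples.
EXCLUDED = {"web summit", "lisbon", "tech"}

def top_keywords(keywords, limit=100, excluded=EXCLUDED):
    """Sort by mention count, keep the top `limit`, then drop unhelpful keywords."""
    counts = Counter(k.lower() for k in keywords)
    top = counts.most_common(limit)  # steps 1 and 2: sort and truncate
    return [(k, n) for k, n in top if k not in excluded]  # step 3: filter

# Invented keyword mentions, not real data.
sample = ["great", "great", "Lisbon", "Lisbon", "Lisbon", "wifi"]
print(top_keywords(sample))
```

Because filtering happens after truncation, the final list can end up shorter than 100 entries, which matches the order of the steps described.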

And here are our results. You can hover over individual clusters to see more information.

We can see some very positive phrases here, with great, amazing, awesome, good, love and nice featuring prominently.

The most mentioned speaker in the positive tweets was Gary Vaynerchuk (@garyvee), which makes sense given the sharp rise in positive sentiment his fans produced on our sentiment-over-time graph earlier in this post.

What were the most common keywords found in negative tweets?

We took the exact same approach to generate a list of the most mentioned keywords from tweets with negative sentiment;

For those of you that attended Web Summit, it will probably come as no surprise to see WiFi at the forefront of the negativity. While it did function throughout the event, many attendees found it unreliable and too slow, leading to many using their own data and hotspotting from their cell phones.

Mentions of queue, long, full, lines and stage are key indicators of just how upset people became while queueing for the opening ceremony at the main stage, only for many to be turned away because the venue became full.

The most mentioned speaker from negative tweets was Dave McClure (@davemcclure). The 500 Startups Founder found himself in the news after sharing his views on the US election result with an explosive on-stage outburst. It should be noted that just because Dave was the most mentioned speaker from all negative tweets, it doesn’t necessarily mean people were being negative towards him. In fact, many took to Twitter to support him;

 

Much of the negativity came from people simply quoting what Dave had said on stage, which naturally contained high levels of negative sentiment;

 

Which speakers were mentioned most?

Web Summit 2016 delivered a star-studded lineup of 663 speakers. What we wanted to know was: who was mentioned most on Twitter?

By combining mentions of names and Twitter handles, we generated and sorted a list of the top 25 most mentioned speakers.
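One way to combine names and handles is to map every alias of a speaker to a single canonical name before counting. The alias table below is a hypothetical fragment containing only speakers named in this post, and `speaker_mentions` is an illustrative helper, not the code behind our list.

```python
from collections import Counter

# Hypothetical alias table; only names and handles that appear in the post.
SPEAKER_ALIASES = {
    "Gary Vaynerchuk": ["gary vaynerchuk", "@garyvee"],
    "Dave McClure": ["dave mcclure", "@davemcclure"],
}

def speaker_mentions(tweets, aliases=SPEAKER_ALIASES, top_n=25):
    """Count tweets mentioning each speaker by name or handle, most mentioned first."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for speaker, names in aliases.items():
            if any(name in lowered for name in names):
                counts[speaker] += 1  # a tweet counts once per speaker
    return counts.most_common(top_n)

# Invented example tweets, not real data.
sample = [
    "Loved @garyvee today",
    "Gary Vaynerchuk was on form, thanks @garyvee",
    "@davemcclure on stage now",
]
print(speaker_mentions(sample))
```

Counting each tweet once per speaker avoids double counting when a tweet uses both the name and the handle.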

Messrs Vaynerchuk and McClure once again appear prominently, with the former being the most mentioned speaker overall throughout the week. Joseph Gordon-Levitt, actor and Founder of HitRECord, came in second place, followed by Web Summit founder Paddy Cosgrave.

Which airline flew to Lisbon with the happiest customers?

With attendees visiting Lisbon from 166 countries, we thought it would be cool to see which airline brought in the happiest customers. By extracting mentions of the airlines that fly in to Lisbon, we could then analyze the sentiment of the tweets in which they were mentioned.
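As a rough sketch of that approach, assuming every tweet already has a sentiment label of positive, neutral or negative, the per-airline breakdown could be tallied as follows. The `airline_sentiment` helper and the sample tweets are invented for illustration.

```python
from collections import Counter, defaultdict

AIRLINES = ["ryanair", "british airways"]  # illustrative list of airline names

def airline_sentiment(labelled_tweets, airlines=AIRLINES):
    """Tally sentiment labels per airline from (text, label) pairs."""
    breakdown = defaultdict(Counter)
    for text, label in labelled_tweets:
        lowered = text.lower()
        for airline in airlines:
            if airline in lowered:
                breakdown[airline][label] += 1
    return {airline: dict(counts) for airline, counts in breakdown.items()}

# Invented example tweets and labels, not real data.
sample = [
    ("Flying Ryanair to Lisbon", "neutral"),
    ("British Airways flight was smooth", "positive"),
    ("Boarding with British Airways now", "neutral"),
]
print(airline_sentiment(sample))
```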

For most airlines, there simply wasn’t enough data available to analyze. However, we did find enough mentions of Ryanair and British Airways to be able to analyze and compare.

Here’s what we found;

Ryanair vs. British Airways

The graph below is split into three levels of sentiment – positive, neutral and negative. Ryanair is represented in blue and British Airways in red.

It’s really not hard to pick a winner here. British Airways were not only mentioned in more positive tweets, they were also mentioned in considerably fewer negative tweets.

 

Night Summit: which night saw the highest tweet volumes?

In total, we found 593 mentions of Night Summit. The graph below shows tweet volumes for each day, and as you can see, November 7 was a clear winner in terms of volume.

...and which morning saw the most hangovers?!

Interestingly, we found a correlation between low tweet volumes (mentioning Night Summit, #nightsummit, etc.) and higher mentions of hangovers the following day!

59% of tweets mentioning hangover, hungover, resaca, etc, came on November 10 – the day after the lowest tweet volume day.

35% came on November 9 while just 6% came on November 8 – the day after the highest tweet volume day.

What do these stats tell us? Well, while we can’t be certain, we’re guessing that the more people partied, the less they tweeted. Probably a good idea 🙂

Conclusion

In today’s world, if someone wants to express their opinion on an event, brand, product, service, or anything really, they will more than likely do so on social media. There is a wealth of information published through user generated content that can be accessed in near real-time using Text Analysis and Text Mining solutions and techniques.

Wanna try it for yourself? Click the image below to sign up to our Text Analysis API with 1,000 free calls per day.

 





Text Analysis API - Sign up




Author


Noel Bambrick

Customer Success Manager @ AYLIEN

A graduate of the Dublin Institute of Technology and Digital Marketing Institute in Ireland, Noel heads up Customer Success here at AYLIEN. A keen runner, writer and traveller, Noel joined the team having previously gained experience with SaaS companies in Australia and Canada.

Twitter: @noelbambrick