
Introduction

Super Bowl 51 had us on the edge of our seats. A dramatic comeback and a shocking overtime finish meant the 111.3 million Americans who tuned into the event certainly got what they came for. Even though TV viewership was down compared with previous years, the emotional rollercoaster that was Sunday’s game will certainly go down as one of the greatest.

As with any major sporting event, the Super Bowl creates an incredible amount of hype, particularly on Social Media. All of the social chatter and media coverage around the Super Bowl means it’s a fantastic case study in analyzing the voice of fans and their reactions to the event. Using advanced Machine Learning and Natural Language Processing techniques, such as Sentiment Analysis, we are able to understand how fans of both the Patriots and the Falcons collectively felt at any given moment throughout the event.

Not familiar with Sentiment Analysis? Sentiment Analysis is used to detect positive or negative polarity in text and can help you understand the split in opinion from almost any body of text, website or document.

Our process

We used the Twitter Streaming API to collect a total of around 2.2 million tweets that mentioned a selection of game and team-related keywords, hashtags and handles. Using the AYLIEN Text Analysis API, we analyzed each of these tweets and visualized our results using Tableau. In particular, we were interested in uncovering and investigating the following key areas:

  • Volume of tweets before, during and after the game
  • Sentiment of tweets before, during and after the game
  • Team-specific fan reactions
  • The most tweeted players
  • The most popular Super Bowl hashtag

Keyword selection

We focused our data collection on keywords, hashtags and handles that were related to Super Bowl 51 and the two competing teams, including:

#SB51, #superbowl, #superbowlLI, #superbowl51, #superbowl2017, #HouSuperBowl, #Patriots, #NEPatriots, #newenglandpatriots, #Falcons, #AtlantaFalcons.

Once we collected all of our tweets, we spent a bit of time cleaning and prepping our data set, first by disregarding some of the metadata which we felt we didn’t need. We kept key indicators like time stamps, tweet IDs and the raw text of each tweet. We also removed retweets and tweets that contained links. From previous experience, we find that tweets containing links are mostly objective and generally don’t hold any author opinion towards the event.
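To make the cleaning step above concrete, here is a minimal Python sketch of that kind of filter. It is an illustration rather than the exact code we ran: the field names (`id`, `created_at`, `text`, `retweeted_status`) and the "RT @" prefix check are assumptions about the shape of the tweet data, and the helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Matches http(s) links and bare t.co short links (assumed link formats).
URL_PATTERN = re.compile(r"https?://\S+|\bt\.co/\S+", re.IGNORECASE)


def is_retweet(tweet: Dict) -> bool:
    """Treat a tweet as a retweet if it has a retweeted_status payload
    or its text starts with the conventional 'RT @' prefix."""
    return "retweeted_status" in tweet or tweet.get("text", "").startswith("RT @")


def clean_tweets(tweets: Iterable[Dict]) -> List[Dict]:
    """Drop retweets and tweets containing links; keep only the
    tweet ID, the time stamp and the raw text."""
    cleaned = []
    for tweet in tweets:
        text = tweet.get("text", "")
        if is_retweet(tweet) or URL_PATTERN.search(text):
            continue
        cleaned.append(
            {"id": tweet["id"], "created_at": tweet["created_at"], "text": text}
        )
    return cleaned
```

Filtering before analysis keeps the sentiment step focused on tweets that are likely to carry an opinion, and it also reduces the number of documents that need to be analyzed.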

Tools we used

Visualizations

Like with many of our data-driven blog posts, we used Tableau to visualize our results. All visualizations are interactive and you can hover your mouse over each one to dive deeper into the key data points from which they are generated.

We began our analysis of Super Bowl 51 by looking at the overall volume of tweets in the lead up and during the game.

Tweet volume over time: all tweets

The graph below represents minute-by-minute fluctuations in tweet volumes before, during and after the game. For reference, we’ve highlighted some of the key moments throughout the event with the corresponding spikes in tweet volume.

As you can see, there is a definite and steady increase in tweet volume in the period leading up to the game. From kickoff, it is then all about reactions to in-game highlights, as seen by the sharp spikes and dips in volumes. We’ve also highlighted the halftime period to show you the effect that Lady Gaga’s performance had on tweet volumes.
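If you want to reproduce a minute-by-minute volume series like this from your own data, the bucketing can be sketched in a few lines of Python. This assumes each tweet's time stamp has already been parsed into a `datetime`; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def tweets_per_minute(timestamps: Iterable[datetime]) -> List[Tuple[datetime, int]]:
    """Count tweets in each one-minute bucket, in chronological order."""
    # Truncate every time stamp to the start of its minute, then count.
    buckets = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return sorted(buckets.items())
```

The resulting list of (minute, count) pairs can be exported as a CSV and plotted in the visualization tool of your choice.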

Let’s now take a closer look at the pre-game period and in particular, fan predictions.

Pre-game tweet volume: #PatriotsWin vs. #FalconsWin

For the past 13 years, video game developers EA Sports have been using their football game ‘Madden NFL’ to simulate and predict the winner of the Super Bowl each year. They now have a 10-3 success-failure rate, in case you were wondering! In recent times, they have also been inviting the Twittersphere to show their support for their team by using a certain hashtag in their tweets. For 2017, it was #PatriotsWin vs. #FalconsWin.
So, which set of fans were the most vocal in the 2017 #MyMaddenPrediction battle? We listened to Twitter in the build-up to the game for mentions of both hashtags, and here’s what we found:

58.57% of tweets mentioned #FalconsWin while 41.43% went with #PatriotsWin. While the Patriots were firm pre-game favorites, it is likely that the neutral football fan on Twitter got behind the underdog Falcons as they chased their first ever Super Bowl win, in just their second appearance.
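A split like the one above comes down to counting how many tweets mention each hashtag and dividing by the total. The Python sketch below shows one way to do that; it is an assumption about the method rather than the exact calculation we used, and it counts a tweet that contains both hashtags once for each.

```python
import re
from typing import Dict, Iterable


def hashtag_share(texts: Iterable[str], hashtags: Iterable[str]) -> Dict[str, float]:
    """Return each hashtag's percentage share of all matching mentions.

    Matching is case-insensitive and keys are returned in lowercase.
    """
    wanted = {tag.lower() for tag in hashtags}
    counts = dict.fromkeys(wanted, 0)
    for text in texts:
        # Count each hashtag at most once per tweet.
        for tag in set(re.findall(r"#\w+", text.lower())) & wanted:
            counts[tag] += 1
    total = sum(counts.values())
    if total == 0:
        return {tag: 0.0 for tag in counts}
    return {tag: round(100 * n / total, 2) for tag, n in counts.items()}
```

Calling `hashtag_share(tweet_texts, ["#PatriotsWin", "#FalconsWin"])` on a list of tweet texts returns a dictionary with one percentage per hashtag.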

Tweet volume over time by team

Now that we’ve seen the overall tweet volume and the pre-game #MyMaddenPrediction volumes, let’s take a look at tweet volumes for each individual team before, during and after the game.
The graph below represents tweet volumes for both teams, with the New England Patriots in the top section and the Atlanta Falcons in the bottom section.

Talk about a game of two halves! That vertical line you can see between the two main peaks represents halftime, and as you can see, Falcons fans were considerably louder in the first half of the game, before the Patriots fans brought the noise in the second half as their team pulled off one of the greatest comebacks in Super Bowl history.

Sentiment analysis of tweets

While tweet volumes relating to either team can be a clear indicator of their on-field dominance during various periods of the game, we like to go a step further and look at the sentiment of these tweets to develop an understanding of how public opinion develops and fluctuates.

The charts below are split into two sections:

Top: Volume of tweets over time, by sentiment (Positive / Negative)

Bottom: Average sentiment polarity over time (Positive / Negative)
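Both sections can be derived from the same per-tweet sentiment labels. The Python sketch below illustrates one possible aggregation: it counts positive and negative tweets per minute and computes an average polarity between -1 and 1. Scoring positive as +1 and negative as -1 is an assumption made for this example, not a description of how the charts were actually built, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def sentiment_by_minute(records: Iterable[Tuple[datetime, str]]) -> Dict[datetime, dict]:
    """Aggregate (timestamp, label) pairs into per-minute sentiment stats.

    Labels other than 'positive' and 'negative' are ignored.
    """
    buckets = defaultdict(lambda: {"positive": 0, "negative": 0})
    for ts, label in records:
        if label in ("positive", "negative"):
            buckets[ts.replace(second=0, microsecond=0)][label] += 1

    summary = {}
    for minute, counts in sorted(buckets.items()):
        total = counts["positive"] + counts["negative"]
        # Average polarity: +1 per positive tweet, -1 per negative tweet.
        average = (counts["positive"] - counts["negative"]) / total
        summary[minute] = {**counts, "average": average}
    return summary
```

Running this separately on the tweets for each team gives one volume series and one polarity series per team.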

New England Patriots

What’s immediately clear from the chart above is that, for the majority of the game, Patriots fans weren’t too happy and, it seems, had given up hope. However, as you can see from the gradual increase in positive tweet sentiment and volume in the final third, their mood clearly and understandably changed.

Atlanta Falcons

In stark contrast to the Patriots chart, Falcons fans were producing high volumes of positive sentiment for the majority of the game, until the Patriots comeback materialized, and their mood took a turn for the worse, as indicated by the drop of sentiment into negative.

Most tweeted individuals

To get an understanding of who people were talking about in their tweets, we looked at the top mentioned individuals. Unsurprisingly, Tom Brady was heavily featured after his 5th Super Bowl triumph. However, the most mentioned individual had no part to play in the actual game.

All notable players and scorers (and even Brady himself) were shrugged aside when it came to who the viewers were talking about and reacting to most on Twitter, as halftime show performer Lady Gaga dominated. To put the singer’s domination into perspective, she was mentioned in nearly as many tweets as Brady and Ryan were combined!

To get an idea of the scale of her halftime performance, check out this incredible timelapse:


Interestingly, national anthem singer Luke Bryan was tweeted more than both the Patriots’ Head Coach Bill Belichick and catch-of-the-game winner Julian Edelman. Further proof, if needed, that the Super Bowl is not just about the game of football, but that it is becoming more and more of an entertainment spectacle off the field.

Most popular Super Bowl hashtags

We saw a variety of hashtags emerge for the Super Bowl this year, so we decided to see which were the most used. Here are the top 5 most popular Super Bowl hashtags, which we have visualized with volumes below:

#SuperBowl

#SB51

#SuperBowl2017

#SuperBowlLI

#SuperBowl51

Despite the NFL’s best efforts to get Twitter using #SB51, the most obvious and simple hashtag of #SuperBowl was a clear winner.

Conclusion

There is no other event on the planet that creates as much hype in the sporting, advertising and entertainment worlds. But the Super Bowl as we know it today is far less about the football and more about the entertainment factor and commercial opportunity. With big brands spending a minimum of $5 million for a 30-second commercial and competing for viewers’ eyes and, more importantly, for viewers’ promotion through shares and likes on social media, the Super Bowl has become big business.

In our next installment, we’ve analyzed the chatter around Super Bowl 51 from a branding point of view. We collected and analyzed Twitter data and news and media coverage of the event to pinpoint which brands and commercials joined the Patriots as Super Bowl 51 champions.






Introduction

We’re just two days away from seeing the Atlanta Falcons and New England Patriots go head to head at Super Bowl 51 in Texas. With an anticipated viewership of over 100 million people, it’s no surprise that some of the world’s biggest brands are pulling out all the stops in an attempt to win a much anticipated off-field battle. We are of course talking about the annual Super Bowl ads battle, where top brands are willing to cough up over $5 million for just 30 seconds of TV airtime.

Sentiment Analysis of tweets from Super Bowl 2016

Last year, we analyzed 1.8 million tweets during Super Bowl 50 to uncover the best, the worst, and the most controversial ads according to Twitter users. Using advanced Sentiment Analysis techniques, we were able to uncover that Amazon’s star-studded effort was the most popular ad at Super Bowl 50, earning the highest volume of positive tweets. PayPal, on the other hand, found themselves at the opposite end of the positivity scale, receiving the highest volume of negative tweets. And the most controversial? We had a clear winner in that category with Mountain Dew’s Puppy Monkey Baby shocking, confusing and amusing viewers in equal measure!

Of course, it’s not all about those 30 seconds of TV airtime. Brands that create something memorable can reap the rewards long after the final whistle has blown. Popular ads can go viral in minutes, with those that fail to impress being left behind and quickly forgotten. Just take a look at the YouTube views for these three brand ads since Super Bowl 50:

YouTube views since Super Bowl 50

With close to 30 million YouTube hits, it’s safe to say that Mountain Dew did pretty well from their wacky creation last year! For PayPal on the other hand, it was back to the drawing board with an expensive disappointment.

Watch: Mountain Dew’s Super Bowl 50 ad “Puppymonkeybaby”

Note: In this post, which is part 1 of a 3-part series, we’re going to focus on the hype surrounding the ads battle in the lead-up to the big game. Check back for parts 2 and 3, where we’ll dive into the in-game reaction on social media and how the brands fared from the press reaction after the event.

The most anticipated ads of Super Bowl 51

This year, as well as once again analyzing millions of tweets to uncover the good, the bad and the ugly among Super Bowl 51 commercials (check back next week for that one!), we thought it would be cool to find out which brands are receiving the most media attention in the lead up to the event.

Using the AYLIEN News API, we sourced and analyzed thousands of news stories that mentioned keywords relating to the Super Bowl and the brands that are advertising throughout. From these stories, and using the power of Natural Language Processing in our Text Analysis engine, we were able to uncover which brands have been mentioned most in news stories in the lead-up to the event.

The top 15 most mentioned brands

The bubble chart below represents the 15 brands that have received the most mentions in Super Bowl commercial-related news content since January 1. The bigger the bubble, the higher the mention volume:

Right away we can see a clear leader in Budweiser, who received 50% more mentions than the second most mentioned brand, Pepsi. Why are Budweiser receiving so much attention? Well, much like Mountain Dew last year, controversy is proving to be a key factor, as we’re about to show you.


Want to track mentions and get intelligent, NLP-driven insights into the world’s news content? Sign-up for a free 14 day trial of our News API and get started!

Our top 3 Super Bowl commercials to watch out for

Having uncovered the top 15 most mentioned brands, we thought we would put our necks on the line by selecting three of these brands that we believe will make the biggest splash on social media during Super Bowl 51.

Budweiser

In an attempt to better understand the reasoning behind the hype around Budweiser, we analyzed all news stories mentioning “Super Bowl” and “Budweiser” to see what other topics were present in the collection of articles. From our results we removed keywords relating to the football game itself, as well as obvious brand-related words such as Bud, Anheuser-Busch, beer, etc. The topics that remained quickly gave us an indication of why this ad is proving to be controversial in the US:

Topics extracted from stories mentioning “Super Bowl” and “Budweiser”
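The filtering step described above, removing game-related and brand-related keywords before ranking what is left, can be sketched in Python as follows. This assumes the keywords for each story have already been extracted by a text analysis step; the function name and input shape are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_topics(
    story_keywords: Iterable[Iterable[str]], exclude: Iterable[str], n: int = 10
) -> List[Tuple[str, int]]:
    """Rank keywords by the number of stories they appear in,
    skipping an exclusion list of game and brand terms."""
    blocked = {term.lower() for term in exclude}
    counts = Counter()
    for keywords in story_keywords:
        # Count each keyword once per story, case-insensitively.
        counts.update({keyword.lower() for keyword in keywords} - blocked)
    return counts.most_common(n)
```

Counting each keyword once per story, rather than once per occurrence, stops a single long article from dominating the ranking.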

Coincidence, or political statement?

Budweiser’s commercial preview, titled Born The Hard Way, shows Adolphus Busch, the co-founder of Anheuser-Busch, arriving in the US from Germany with the dream of opening his beer brewery. With the immigrant theme of the commercial and the opening line of dialogue being “You don’t look like you’re from around here”, thoughts of a political statement quickly spring to mind.

Watch: Budweiser’s Super Bowl ad preview “Born The Hard Way”

Despite Budweiser vice-president, Ricardo Marques, stating that “There’s really no correlation with anything else that’s happening in the country”, news outlets and social media commentators beg to differ, with a strong split in opinion quickly forming. We’re even seeing the spread of #BoycottBudweiser across many tweets.



Whether intentional or not, Budweiser have placed themselves firmly at the center of a fiery debate on immigration, and it will be fascinating to see the public reaction to their main showpiece on Sunday.

Our Budweiser prediction

  • Most controversial ad this year
  • Ad content will be irrelevant, and a political debate will rage on Twitter

Snickers

Snickers will make Super Bowl history this year by being the first brand to perform and broadcast their commercial live during the event.

While Snickers have released a number of small teaser-style previews with a western-theme, we’re still not sure exactly how this one is going to play out.

Watch: Snickers’ Super Bowl ad teaser

With the intrigue of a live performance, as well as the inclusion of superstars like Betty White and Adam Driver, we’re excited to see how this one goes, particularly the reaction on social media.

Live commercial, live Twitter reaction

The world’s first live Super Bowl commercial presents us with the opportunity to track public reaction before, during and after the performance. While we’ll be tracking and analyzing the reaction to all of our top 15 ads, the uniqueness of Snickers’ live commercial brings a whole new level of insight into the tracking of public opinion. Judging by the teasers, it appears that Snickers are going for a wild west-style performance with horses, celebrities and a number of performers.

The big question is, how will social media respond to a real-time, potentially unpolished and unpredictable live performance? We can’t wait to find out!

Our Snickers prediction

  • Live format will inspire and drive high social engagement.
  • A popular cast, inclusion of horses and a fun theme will see Snickers near the top of our most liked ads in terms of positive Twitter sentiment.


Want to track Twitter reactions yourself? Build your own sentiment analysis tool in just 10 minutes. No coding required, and it’s free 🙂

Pepsi

Our second most mentioned brand, Pepsi are investing heavily in Super Bowl 51 with commercials for two products, as well as sponsoring the 12-minute Halftime Show.

For Pepsi, their main aim is to generate awareness around two new products; LIFEWTR and Pepsi Zero Sugar. Have they been successful in this regard so far? While our post-game analysis will give us a better indication of the overall success of their campaign, we can perhaps already say that these two products are being somewhat overshadowed.

Here are the top keywords from stories mentioning “Super Bowl” and “Pepsi”, excluding game-related and obvious brand-related keywords such as Houston, PepsiCo, football, etc.

Topics extracted from stories mentioning “Super Bowl” and “Pepsi”

If you weren’t aware of who was performing during the Super Bowl Halftime Show, now you are! Lady Gaga is absolutely dominating in terms of media mentions, and Pepsi’s high mention volume is most definitely a result of the singer’s involvement in the Halftime Show that they just happen to be sponsoring.

Perhaps worryingly for Pepsi, we saw no mention of LIFEWTR or Pepsi Zero Sugar in our top 100 keyword results.

Watch: Pepsi Super Bowl 51 ad “Inspiration Drops”

Last year, PayPal were accused of playing it safe when it came to their Super Bowl ad. Have Pepsi made the same mistake with LIFEWTR?

Our Pepsi prediction

  • Huge Twitter mention volumes for Pepsi, owing to Lady Gaga’s performance.
  • Low mention volumes for LIFEWTR and Pepsi Zero Sugar.
  • Tame public reaction to LIFEWTR commercial and very low YouTube views.

Who will be the winners and losers at Super Bowl 51?

We’ll be listening to and analyzing news and social media content before, during and after Super Bowl 51 to bring you our annual insights into public and media reaction to both the game itself and the ads battle, so check back next week to find out who were the biggest winners and losers!

Happy Super Bowl weekend to you all 🙂



Introduction

With our News API, our goal is to make the world’s news content easier to collect, monitor and query, just like a database. We leverage Machine Learning and Natural Language Processing to process, normalize and analyze this content to make it easier for our users to gain access to rich and high quality metadata, and use powerful filtering capabilities that will ultimately help you to find precise and targeted stories with ease.

To this end, we have just launched a cool new feature, Real-time monitoring. Real-time monitoring allows you to further automate your collection and analysis of the world’s news content by creating tailored searches that source and automatically retrieve highly-relevant news stories, as soon as they are published.

real-time monitoring

You can read more about our latest feature – which is now also available in our News API SDKs – below.

Real-time monitoring

With Real-time monitoring enabled you can automatically pull stories as they are published, based on your specific search query. Users who rely on having access to the latest stories as soon as they are published, such as news aggregators and news app developers for example, should find this new feature particularly interesting.

The addition of this powerful new feature will help ensure that your app, webpage or news feed is bang up to date with the latest and most relevant news content, without the need for manual searching and updating.

Newly published stories can be pulled every minute (configurable), and duplicate stories in subsequent searches will be ignored. This ensures you are only getting the most recent publications, rather than a repeat of what has come before.

Usage

We have created code in seven different programming languages to help get you started with Real-time monitoring, each of which can be found below, as well as in our documentation.

NB: Real-time monitoring will only work when you set the sort_by parameter to published_at and sort_direction to desc.
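To illustrate the idea, here is a minimal Python sketch of a polling loop that pulls new stories on a fixed interval and skips duplicates. It is not the SDK code from our documentation: `fetch_latest` is a hypothetical stand-in for whatever function calls the stories endpoint in your setup, and the assumption that each story is a dictionary with an `id` field is made for this example only. The `sort_by` and `sort_direction` values are the ones required above.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, Optional


def monitor_stories(
    fetch_latest: Callable[[Dict], Iterable[Dict]],
    query: Dict,
    interval_seconds: float = 60.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Poll fetch_latest on a fixed interval and yield each story once.

    fetch_latest is a hypothetical callable that takes the search
    parameters and returns stories, newest first.
    """
    # Real-time monitoring requires newest-first ordering by publish time.
    params = {**query, "sort_by": "published_at", "sort_direction": "desc"}
    seen_ids = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        batch = list(fetch_latest(params))
        # Reverse the newest-first batch so stories are yielded oldest first.
        for story in reversed(batch):
            if story["id"] not in seen_ids:
                seen_ids.add(story["id"])
                yield story
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval_seconds)
```

With `max_polls` left as `None` the generator runs until you stop it. In a long-running process you would probably also want to cap the size of `seen_ids`, for example by only remembering IDs from the last few polls.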

Benefit

The main benefit of this cool new feature is that you can be confident you are receiving the very latest stories and insights, without delay, by creating an automated process that will continue to retrieve relevant content as soon as it is published online. By automating the retrieval of content in real-time, you can cut down on manual input and generate feeds, charts and graphs that will automatically update in real-time.

We hope that you find this new update useful, and we would love to hear any feedback you may have.

To start using our News API for free and query the world’s news content easily, click the image below.

 



Introduction

In this post, Afshin Mehrabani, Software Engineer at AYLIEN, talks us through his experience in building and launching an incredibly popular open source library called Intro.js.

Four years ago, when I started Intro.js as a weekend project, I knew it was a useful idea, but I didn’t expect to eventually have the likes of Microsoft, IBM, SAP and Amazon using what I had built.

In this post I’ll talk about what sparked the idea for Intro.js, why I open sourced it and what I have learned in the process.

The idea

I originally started Intro.js to use it on a particular product I was working on at the time. I was working as part of a web development team building web-based stock exchange software. The system had many user facing components, tools and widgets and it was a little confusing to use. This meant the support team we worked with spent most of their time on phone calls guiding people on how to use the software, find options, complete trades etc. instead of dealing with the bugs and issues our users had.

Things got even worse when we completely changed the user interface. The support team were working almost exclusively on user interface-related queries and were spending a lot of time guiding users through critical workflows. As you can imagine, having our support team answer the same questions and run thousands of customers through a guided tour just wasn’t efficient.

As part of the development team, we had to ensure the UI/UX was kept simple and straightforward enough for users to understand the workflow. I know this is a common problem in software design, but for some web-based portals like this one, it’s an almost impossible task.

Imagine the UI for stock exchange software: it’s made up of many critical moving parts that are paramount not just to the user’s experience but to the software carrying out the job at hand.

 

Metatrader screenshot

Financial software example – source: www.metatrader4.com

These UIs are busy and complicated, and for the most part they need to be. If an object is included in the UI, it’s critical to the job at hand. You can’t eliminate any objects from the page, and you still need to make it clear enough for users to interact with.

I decided there had to be a better way to improve the usability of the product that didn’t rely on hours and hours of support time. That’s when I came up with the idea to develop a JavaScript and CSS tool that demonstrates the software and its components to the user using a step by step guide. The whole idea is based around a user starting a guided tour as they use the product. This ensures a user sees all the important elements and workflows that are critical to how they use the software.

Introducing Intro.js

The pictures below illustrate some sample steps of a simple Intro.js guide, with the highlighted component, a description of the step and control buttons.

 

Intro.js 1

Intro.js 3

 

Through the control buttons a user can decide to go to the next step, previous step or close the guide.

Although it was a relatively simple idea, after implementing the original version of Intro.js on that project I realised that what I had built could benefit a lot of people who were facing the same problem we had. That is why I decided to open-source it and let the community use and add to it while, of course, making some improvements to the code base.

I have now open-sourced almost all of the useful components I built. Not only has this helped a lot of people save time and money, but we’ve also built an active and vibrant community around Intro.js who help maintain the project, add more features, fix bugs and everything else that needs to be done.

In the last year, I have been building on the original offering, most recently by adding the Hints feature. Using this feature, you can add clickable circles to different elements of the webpage with extra descriptions:

 

Intro.js Hints

Intro.js Hints 2

 

Bringing it to the masses

So I had built and validated my idea in a live project, and I was pretty sure there was demand out there for something like Intro.js, but the challenge was how to get it into the hands of other developers.

The open source community

After releasing the first version of Intro.js, I created a simple HTML page to demonstrate what it does. I suppose I didn’t think too much about how I was going to find users but I knew from already being actively involved in the open source community that if something was truly useful it would spread and gather pace through the community eventually. I had no idea though that within a couple of days I’d be struggling to keep up with the demand.

To try and get some exposure, I decided to submit a post on Hacker News as well as some other dev-focused outlets, none of which had the effect of Hacker News. Within a couple of hours of the post going live, it was gathering some serious interest. Over a couple of days, the post received about 800 upvotes and 160 comments, and it was on the homepage of Hacker News for two consecutive days.

During these two days, I got about 100 pull requests merged and after a month released some other minor versions to address the initial version issues and bugs.

Intro.js was a real hit and I never expected the interest it gathered from users and the hard work of other developers who helped the project by fixing issues, adding features and writing plugins for Angular.js, React, jQuery, etc. However, in saying that, it still probably took about a year to release a stable version of Intro.js that I was happy with.

Since the launch, we’ve had some really great companies and developers using Intro.js. It’s been featured on some pretty cool blogs and outlets like SitePoint and IBM developerWorks, and has collected nearly 15,000 stars on GitHub.

Maintaining an open-source project

Things get more complicated when you have many, many users. You have to be responsible for the product you provide, answer questions, maintain code and merge pull requests, all while adding more features to the library.

Juggling an open source project, my job at AYLIEN ( we’re hiring BTW 😉 ) and a Masters course at university became increasingly difficult, and finding new maintainers to work with also proved quite difficult. This meant that, to keep Intro.js alive, I needed to look at what options were available.

Licensing

I was unable to devote all of my time to develop and maintain an open-source project. It was clear I needed to hire some more people, mainly web developers, to work on customer queries, fix issues and maintain the project.

Which is why I decided to add a commercial license to Intro.js. I was reluctant to do so in the beginning but after some careful consideration, I’m glad I did. This decision has helped the project a lot; the codebase has improved significantly since, we can now properly serve our users and we can invest more time in adding more features and keeping our users happy.

Adding a commercial license to an open-source project motivates the creators and the maintainers to keep developing the project and release more awesome, useful versions. Since moving to a commercial license, we have added a documentation section using Doc42 and moved the Markdown files from GitHub to introjs.com/docs. Moreover, Intro.js now has an active tag on Stack Overflow, where developers can tag their questions with Intro.js and get answers from the awesome community.

Note: There are many open-source licensing options available. You can read more about them here.

What’s next?

We are going to release v3.0 soon which will add responsive tours and many more features. Meanwhile, we are fixing more issues and bugs and v2.5.0 will be released in the next two weeks.

Besides this open-source project, I have developed many other JavaScript projects and they are available here.

Conclusion

I have learned a lot more than I could have imagined by building and maintaining a real open-source project. I have gained invaluable experience in answering user queries, releasing versions, licensing products and so on, and for that reason alone I can say for sure that open sourcing the project was the best decision I made.

Intro.js was a weekend project that turned into a real product. I hope this post gives you the push to go and build something yourself and contribute to the movement or, better still, to open source something you’ve already built that you think appeals to the masses. Publishing and releasing the first stable version of an open-source project could take you more than a year, but I promise you will learn a lot and have a lot of fun doing it. Who knows, you may even get a nice Thank You email from a tech giant ;).






Our researchers at AYLIEN keep abreast of and contribute to the latest developments in the field of Machine Learning. Recently, two of our research scientists, John Glover and Sebastian Ruder, attended NIPS 2016 in Barcelona, Spain. In this post, Sebastian highlights some of the stand-out papers and trends from the conference.

NIPS

The Conference on Neural Information Processing Systems (NIPS) is one of the two top conferences in machine learning. It took place for the first time in 1987 and is held every December, historically in close proximity to a ski resort. This year, it took place in sunny Barcelona. The conference (including tutorials and workshops) went on from Monday, December 5 to Saturday, December 10. The full conference program is available here.

Machine Learning seems to become more pervasive every month. However, it is still sometimes hard to keep track of the actual extent of this development. One of the most accurate barometers for this evolution is the growth of NIPS itself. The number of attendees skyrocketed at this year’s conference growing by over 50% year-over-year.

terry_law

Image 1: The growth of the number of attendees at NIPS follows (the newly coined) Terry’s Law (named after Terrence Sejnowski, the president of the NIPS foundation; faster growth than Moore’s Law)

Unsurprisingly, Deep Learning (DL) was by far the most popular research topic, with about one in four of the more than 2,500 submitted papers (568 of which were accepted) dealing with deep neural networks.

submissions_distribution

Image 2: Distribution of topics across all submitted papers (Source: The review process for NIPS 2016)

On the other hand, the distribution of research paper topics has quite a long tail and reflects the diversity of topics at the conference that span everything from theory to applications, from robotics to neuroscience, and from healthcare to self-driving cars.

Generative Adversarial Networks

One of the hottest developments within Deep Learning was Generative Adversarial Networks (GANs). The minimax game playing networks have by now won the favor of many luminaries in the field. Yann LeCun hails them as the most exciting development in ML in recent years. The organizers and attendees of NIPS seem to side with him: NIPS featured a tutorial by Ian Goodfellow about his brainchild, which led to a packed main conference hall.

full_conference_hall_gan_tutorial

Image 3: A full conference hall at the GAN tutorial
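For readers new to GANs, the "minimax game" mentioned above is usually written as the following objective, as formulated in the original GAN paper by Goodfellow et al. (2014). It is included here for reference and is not taken from the tutorial slides. D is the discriminator, G is the generator, p_data is the distribution of real data and p_z is the noise distribution that the generator samples from.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

The discriminator tries to maximize this value by telling real samples from generated ones, while the generator tries to minimize it by producing samples that the discriminator classifies as real.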

Though a fairly recent development, there are many cool extensions of GANs among the conference papers:

  • Reed et al. propose a model that allows you to specify not only what you want to draw (e.g. a bird) but also where to put it in an image.
  • Chen et al. disentangle factors of variation in GANs by representing them with latent codes. The resulting models allow you to adjust e.g. the type of a digit, its rotation and width, etc.

In spite of their popularity, we know alarmingly little about what makes GANs so capable of generating realistic-looking images. In addition, making them work in practice is an arduous endeavour and a lot of (undocumented) hacks are necessary to achieve the best performance. Soumith Chintala presents a collection of these hacks in his “How to train your GAN” talk at the Adversarial Training workshop.

Image 4: How to train your GAN (Source: Soumith Chintala)

Yann LeCun muses in his keynote that the development of GANs parallels the history of neural networks themselves: They were poorly understood and hard to get to work in the beginning and only took off once researchers figured out the right tricks and learned how to make them work. At this point, it seems unlikely that GANs will experience a winter anytime soon; the research community is still at the beginning in learning how to make the best use of them and it will be exciting to see what progress we can make in the coming years.

On the other hand, the success of GANs so far has been limited mostly to Computer Vision due to their difficulty in modelling discrete rather than continuous data. The Adversarial Training workshop showcased some promising work in this direction (see e.g. our own John Glover’s paper on modeling documents, this paper and this paper on generating text, and this paper on adversarial evaluation of dialogue models). It remains to be seen if 2017 will be the year in which GANs break through in NLP.

The Nuts and Bolts of Machine Learning

Andrew Ng gave one of the best tutorials of the conference with his take on building AI applications using Deep Learning. Drawing on his experience of managing the 1,300-person AI team at Baidu and hundreds of applied AI projects, and equipped solely with two whiteboards, he shared many insights about how to build and deploy AI applications in production.

Besides better hardware, Ng attributes the success of Deep Learning to two factors: Firstly, in contrast to traditional methods, deep NNs are able to learn more effectively from large amounts of data. Secondly, end-to-end (supervised) Deep Learning allows us to learn to map from inputs directly to outputs.

While this approach to training chatbots or self-driving cars is sufficient to write innovative research papers, Ng emphasized that end-to-end DL is often not production-ready: A chatbot that maps from text directly to a response is not able to have a coherent conversation or fulfill a request, while mapping from an image directly to a steering command might have literally fatal side effects if the model has not encountered the corresponding part of the input space before. Rather, for a production model, we still want to have intermediate steps: For a chatbot, we prefer to have an inference engine that generates a response; in a self-driving car, DL is used to identify obstacles, while the steering is performed by a traditional planning algorithm.

Image 5: Andrew Ng on end-to-end DL (right: end-to-end DL chatbot and chatbot with inference engine; left bottom: end-to-end DL self-driving car and self-driving car with intermediate steps)

Ng also shared that the most common mistake he sees in project teams is that they track the wrong metrics: In an applied machine learning project, the only relevant metrics are the training error, the development error, and the test error. These metrics alone enable the project team to know what steps to take, as he demonstrated in the diagram below:

Image 6: Andrew Ng’s flowchart for applied ML projects
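
The kind of decision logic such a flowchart encodes can be sketched in a few lines of Python. This is a minimal illustration rather than a transcription of the diagram: the function name, the two thresholds and the suggested actions are assumptions that follow the common bias/variance reading of the three error metrics.

```python
def next_step(train_err, dev_err, test_err, target_err=0.05, max_gap=0.02):
    """Suggest a next step from the three error metrics.

    target_err and max_gap are illustrative thresholds (assumptions),
    and the advice strings are a common bias/variance heuristic.
    """
    if train_err > target_err:
        # High training error: the model underfits (bias problem).
        return "bigger model or longer training"
    if dev_err - train_err > max_gap:
        # Large train/dev gap: the model overfits (variance problem).
        return "more data or stronger regularization"
    if test_err - dev_err > max_gap:
        # Large dev/test gap: the dev set itself has been overfitted.
        return "larger dev set"
    return "done"


print(next_step(train_err=0.10, dev_err=0.11, test_err=0.12))
print(next_step(train_err=0.01, dev_err=0.08, test_err=0.09))
```

The checks run in a fixed order, so a team always addresses the training error first and only then looks at the gaps between the data splits.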

A key facilitator of the recent success of ML has been the advances in hardware that allowed faster computation and storage. Given that Moore’s Law will reach its limits sooner or later, one might reason that the rise of ML might also plateau. Ng, however, argued that the commitment by leading hardware manufacturers such as NVIDIA and Intel and the ensuing performance improvements to ML hardware would fuel further growth.

Among ML research areas, supervised learning is the undisputed driver of the recent success of ML and will likely continue to drive it for the foreseeable future. In second place, Ng saw neither unsupervised learning nor reinforcement learning, but transfer learning. We at AYLIEN are bullish on transfer learning for NLP and think that it has massive potential.

Recurrent Neural Networks

The conference also featured a symposium dedicated to Recurrent Neural Networks (RNNs). The symposium coincided with the 20-year anniversary of LSTM…

Image 7: Jürgen Schmidhuber kicking off the RNN symposium

… being rejected from NIPS 1996. The fact that papers that do not use LSTMs have been rare in the most recent NLP conferences (see our EMNLP blog post) is a testament to the perseverance of the authors of the original paper, Sepp Hochreiter and Jürgen Schmidhuber.

At NIPS, we had several papers that sought to improve RNNs in different ways:

Other improvements apply to Deep Learning in general:

  • Salimans and Kingma propose Weight Normalisation, a reparameterisation that accelerates training and can be applied in two lines of Python code.
  • Li et al. propose a multinomial variant of dropout that sets neurons to zero depending on the data distribution.
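
The reparameterisation behind Weight Normalisation expresses each weight vector as w = g · v / ‖v‖, so that its length g and its direction v are learned separately. A minimal, framework-free sketch follows; the helper name weight_norm and the toy numbers are illustrative choices, not the authors’ code.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v|| for a direction vector v and scalar gain g."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


v = [3.0, 4.0]  # unnormalised direction parameters
g = 2.0         # scalar parameter that sets the length of w
w = weight_norm(v, g)
print(w, math.sqrt(sum(x * x for x in w)))
```

Whatever the scale of v, the resulting w always has length g, which is what decouples the two quantities during optimisation.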

The Neural Abstract Machines & Program Induction (NAMPI) workshop also featured several speakers talking about RNNs:

  • Alex Graves focused on his recent work on Adaptive Computation Time (ACT) for RNNs, which allows the processing time to be decoupled from the sequence length. He showed that a word-level language model with ACT could reach state-of-the-art with fewer computations.
  • Edward Grefenstette outlined several limitations and potential future research directions in the context of RNNs in his talk.

Improving classic algorithms

While Deep Learning is a fairly recent development, the conference also featured several improvements to algorithms that have been around for decades:

  • Ge et al. show in their best paper that the non-convex objective for matrix completion has no spurious local minima, i.e. every local minimum is a global minimum.
  • Bachem et al. present a method that guarantees accurate and fast seedings for large-scale k-means++ clustering. The presentation was one of the most polished ones of the conference and the code is open-source and can be installed via pip.
  • Ashtiani et al. show that we can make NP-hard k-means clustering problems solvable by allowing the model to pose queries for a few examples to a domain expert.
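
For context, the baseline that fast-seeding work aims to speed up is standard k-means++ seeding, which picks each new centre with probability proportional to its squared distance from the centres chosen so far. The sketch below shows only this standard procedure, not the accelerated method from the paper; the function name and the toy points are made up for illustration.

```python
import random


def kmeanspp_seeds(points, k, rng):
    """Pick k initial centres with standard D^2 (k-means++) sampling."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres)
              for p in points]
        # Sample the next centre with probability proportional to d2.
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
seeds = kmeanspp_seeds(points, 3, random.Random(0))
print(seeds)
```

Each round requires a full pass over the data to recompute the distances, which is the cost that becomes prohibitive at large scale.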

Reinforcement Learning

Reinforcement Learning (RL) was another much-discussed topic at NIPS with an excellent tutorial by Pieter Abbeel and John Schulman dedicated to RL. John Schulman also gave some practical advice for getting started with RL.

One of the best papers of the conference introduces Value Iteration Networks, which learn to plan by providing a differentiable approximation to a classic planning algorithm via a CNN. This paper was another cool example of one of the major benefits of deep neural networks: They allow us to learn increasingly complex behaviour as long as we can represent it in a differentiable way.
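
For reference, the classic planning algorithm in question, value iteration, repeatedly applies a Bellman backup to every state. The sketch below runs the tabular version on a toy five-state chain; the environment, reward and discount factor are made up for illustration and are not taken from the paper. Roughly speaking, Value Iteration Networks express this kind of backup with CNN operations so that it can be trained end to end.

```python
def value_iteration(n_states, goal, gamma=0.9, sweeps=50):
    """Tabular value iteration on a 1-D chain with moves left/right.

    Moving into the goal state yields reward 1; all other moves yield 0.
    """
    values = [0.0] * n_states
    for _ in range(sweeps):
        new_values = values[:]
        for s in range(n_states):
            if s == goal:
                continue  # terminal state keeps value 0
            candidates = []
            for step in (-1, 1):
                nxt = min(max(s + step, 0), n_states - 1)
                reward = 1.0 if nxt == goal else 0.0
                candidates.append(reward + gamma * values[nxt])
            # Bellman backup: keep the value of the best action.
            new_values[s] = max(candidates)
        values = new_values
    return values


print(value_iteration(n_states=5, goal=4))
```

The resulting values grow as states get closer to the goal, so acting greedily with respect to them yields a plan that walks towards it.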

During the week of the conference, several research environments for RL were simultaneously released, among them OpenAI’s Universe, DeepMind Lab, and FAIR’s TorchCraft. These will likely be a key driver in future RL research and should open up new research opportunities.

Learning-to-learn / Meta-learning

Another topic that came up in several discussions over the course of the conference was Learning-to-learn or Meta-learning:

  • Andrychowicz et al. learn an optimizer in a paper with the ingenious title “Learning to learn by gradient descent by gradient descent”.
  • Vinyals et al. learn how to one-shot learn in a paper that frames one-shot learning in the sequence-to-sequence framework and has inspired new approaches for one-shot learning.

Most of the existing papers on meta-learning demonstrate that wherever you are doing something that gives you gradients, you can optimize them using another algorithm via gradient descent. Prepare for a surge of “Meta-learning for X” and “(Meta-)+learning” papers in 2017. It’s LSTMs all the way down!

Meta-learning was also one of the key talking points at the RNN symposium. Jürgen Schmidhuber argued that a true meta-learner would be able to learn in the space of all programs and would have the ability to modify itself. He elaborated on these ideas in his talk at the NAMPI workshop. Ilya Sutskever remarked that we currently have no good meta-learning models. However, there is hope, as the plethora of new research environments should also bring progress in this area.

General Artificial Intelligence

Learning how to learn also plays a role in the pursuit of the elusive goal of attaining General Artificial Intelligence, which was a topic in several keynotes. Yann LeCun argued that in order to achieve General AI, machines need to learn common sense. While common sense is often vaguely mentioned in research papers, Yann LeCun gave a succinct explanation of what common sense is: “Predicting any part of the past, present or future percepts from whatever information is available.” He called this predictive learning, but noted that this is really unsupervised learning.

His talk also marked the appearance of a controversial and often tongue-in-cheek copied image of a cake, which he used to demonstrate that unsupervised learning is the most challenging task where we should concentrate our efforts, while RL is only the cherry on the icing of the cake.

Image 8: The Cake slide of Yann LeCun’s keynote

Drew Purves focused on the bilateral relationship between the environment and AI in what was probably the most aesthetically pleasing keynote of the conference (just look at those graphics!).

Image 9: Graphics by Max Cant of Drew Purves’ keynote (Source: Drew Purves)

He emphasized that while simulations of ecological tasks in naturalistic environments could be an important test bed for General AI, General AI is needed to maintain the biosphere in a state that will allow the continued existence of our civilization.

Image 10: Nature needs AI and AI needs Nature from Drew Purves’ keynote

While it is frequently, and incorrectly, claimed that neural networks work so well because they emulate the brain’s behaviour, Saket Navlakha argued during his keynote that we can still learn a great deal from the engineering principles of the brain. For instance, rather than pre-allocating a large number of neurons, the brain generates thousands of synapses per minute until its second year. Afterwards, until adolescence, the synapses are pruned and their number decreases by ~50%.

Image 11: Saket Navlakha’s keynote

It will be interesting to see how neuroscience can help us to advance our field further.

In the context of the Machine Intelligence workshop, another environment was introduced in the form of FAIR’s CommAI-env, which allows agents to be trained through interaction with a teacher. During the panel discussion, the ability to learn hierarchical representations and to identify patterns was emphasized. However, although the field is making rapid progress on standard tasks such as object recognition, it is unclear whether the focus on such specific tasks actually brings us closer to General AI.

Natural Language Processing

While NLP is more of a niche topic at NIPS, there were a few papers with improvements relevant to NLP:

  • He et al. propose a dual learning framework for MT that has two agents translating in opposite directions teaching each other via reinforcement learning.
  • Sokolov et al. explore how to use structured prediction under bandit feedback.
  • Huang et al. extend Word Mover’s Distance, an unsupervised document similarity metric, to the supervised setting.
  • Lee et al. model the helpfulness of reviews by taking into account position and presentation biases.

Finally, a workshop on learning methods for dialogue explored how end-to-end systems, linguistics and ML methods can be used to create dialogue agents.

Miscellaneous

Schmidhuber

Jürgen Schmidhuber, the father of the LSTM, was not only present on several panels, but also did his best to remind everyone that whatever your idea, he had had a similar idea two decades ago and you had better cite him lest he interrupt your tutorial.

 

Robotics

Boston Dynamics’ Spot proved that, even though everyone is excited by learning and learning-to-learn, traditional planning algorithms are enough to win the admiration of a hall full of learning enthusiasts.

Image 12: Boston Dynamics’ Spot amid a crowd of fascinated onlookers

Apple

Apple, one of the most secretive companies in the world, has decided to be more open, to publish, and to engage with academia. This can only be good for the community. We’re looking forward to more Apple research papers.

Image 13: Ruslan Salakhutdinov at the Apple lunch event

Uber

Uber announced their acquisition of Cambridge-based AI startup Geometric Intelligence and threw one of the most popular parties of NIPS.

Image 14: The Geometric Intelligence logo

Rocket AI

Speaking of startups, the “launch” of Rocket AI and their patented Temporally Recurrent Optimal Learning had some people fooled (note the acronyms in the tweets below). Riva-Melissa Tez finally cleared up the confusion.

 

These were our impressions from NIPS 2016. We had a blast and hope to be back in 2017!

 






Intro

For PR professionals, entrepreneurs, marketers, or anyone else looking to connect with relevant journalists, reporters and influencers to cover their press release, the biggest challenge often lies in finding the most suitable people to approach.

This can be a time-consuming and often fruitless endeavour, as many take a spray-and-pray approach, sending out high volumes of emails in the hope that someone out there picks one up. One of the main drawbacks of this approach, however, is that mass emails aren’t targeted, are inevitably written in an impersonal manner, and generally fail to grab the attention of the intended recipient.

To help streamline this entire process, we’re going to show you how you can use Machine Learning and NLP to significantly improve your PR targeting. It is a technique we’ve used at AYLIEN to land coverage in the likes of TechCrunch, The Next Web and Forbes.

Using the AYLIEN News API, we’ll show you how easy it can be to quickly build your own highly-targeted list of journalists, reporters and influencers to reach out and pitch to.

As an example, let’s say you’ve recently gone through a funding round and you’re hoping to get some press coverage and exposure. We’ll start by identifying the publishers who have generated the most articles mentioning startups and funding in the past 60 days. We will then narrow our search and get more targeted by finding specific people who write about startup funding, and then finish by giving you some tips and instructions on how to create a highly-targeted search to match your own needs.

Which publishers are writing about startup funding?

To find the publishers that write the most about startups and funding, we’ll use the /trends endpoint in the News API. Using the /trends endpoint enables you to identify the most frequently mentioned keywords, entities and topical or sentiment-related categories in news content. Put simply, it allows you to measure the number of times that specific elements of interest are mentioned in the content you source through the News API.

By performing the following search using /trends, we can source these metrics for all stories that mention our keywords–startup and funding–and by specifying field=source.name, our results will be returned with a count for each source (publisher, news outlet or blog).
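
As a rough sketch, a request of this kind could be assembled in Python as shown below. Only the /trends endpoint, the field parameter and the 60-day window come from the text; the base URL, the text and published_at parameter names and the NOW-60DAYS syntax are assumptions that should be checked against the News API documentation, and authentication is left out.

```python
from urllib.parse import urlencode

# Assumed base URL, for illustration only; check the News API documentation.
BASE_URL = "https://api.newsapi.aylien.com/api/v1"


def trends_url(field, keywords, days=60):
    """Build a /trends request URL for stories mentioning all keywords."""
    params = {
        "field": field,                            # e.g. source.name (from the text)
        "text": " AND ".join(keywords),            # assumed keyword parameter
        "published_at.start": f"NOW-{days}DAYS",   # assumed date-range syntax
        "published_at.end": "NOW",                 # assumed date-range syntax
    }
    return f"{BASE_URL}/trends?{urlencode(params)}"


print(trends_url("source.name", ["startup", "funding"]))
```

Keeping the field as an argument means the same helper can be reused for other groupings simply by passing a different field name.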

Here’s the query we used;

Our News API returns results in JSON format, and here’s what they look like for this query;


{
"trends": [
  {
"value": "TechCrunch",
"count": 206
},
  {
"value": "Fortune",
"count": 108
},
  {
"value": "Business Insider",
"count": 91
},
  {
"value": "PR Newswire",
"count": 70
},
  {
"value": "Inc.com",
"count": 64
},
  {
"value": "Seeking Alpha",
"count": 62
},
  {
"value": "Forbes",
"count": 52
},
  {
"value": "CNBC TV18",
"count": 46
},
  {
"value": "Entrepreneur.com",
"count": 44
},
  {
"value": "Bloomberg",
"count": 43
},
  {
"value": "BetaKit",
"count": 33
},
  {
"value": "Market Wired",
"count": 31
},
  {
"value": "Huffington Post",
"count": 26
},
  {
"value": "Quartz",
"count": 26
},
  {
"value": "Fast Company",
"count": 24
},
  {
"value": "Business Wire",
"count": 23
},
  {
"value": "Business Standard",
"count": 19
},
  {
"value": "ZDNet",
"count": 18
},
  {
"value": "Daily Mail UK",
"count": 18
},
  {
"value": "Mashable",
"count": 17
},
  {
"value": "The Guardian",
"count": 16
},
  {
"value": "Deccan Herald",
"count": 15
},
  {
"value": "Globe and Mail",
"count": 14
},
  {
"value": "Business Line",
"count": 13
},
  {
"value": "Reuters",
"count": 12
},
  {
"value": "Upstart Business Journal RSS Feed",
"count": 12
},
  {
"value": "The Next Web",
"count": 10
},
  {
"value": "The Wall Street Journal",
"count": 10
},
  {
"value": "Economic Times",
"count": 10
},
  {
"value": "Variety",
"count": 10
},
  {
"value": "Madison",
"count": 10
},
  {
"value": "Times of Israel",
"count": 10
},
  {
"value": "CNN",
"count": 9
},
  {
"value": "CNET",
"count": 9
},
  {
"value": "Globes",
"count": 9
},
  {
"value": "The Verge",
"count": 8
},
  {
"value": "Autonews",
"count": 8
},
  {
"value": "Yahoo",
"count": 8
},
  {
"value": "Irish Independent",
"count": 8
},
  {
"value": "Modern Ghana",
"count": 8
},
  {
"value": "Drudge Report",
"count": 8
},
  {
"value": "Berlin Startup Jobs",
"count": 8
},
  {
"value": "Digital Trend",
"count": 7
},
  {
"value": "Times of India",
"count": 7
},
  {
"value": "Albuquerque Journal",
"count": 7
},
  {
"value": "USA Today",
"count": 6
},
  {
"value": "Nikkei Asian Review",
"count": 6
},
  {
"value": "Times Picayune",
"count": 6
},
  {
"value": "New Zealand Herald",
"count": 6
},
  {
"value": "Sify",
"count": 6
},
  {
"value": "Star",
"count": 6
},
  {
"value": "Malay Mail",
"count": 6
},
  {
"value": "WCPO",
"count": 6
},
  {
"value": "The Guardian Nigeria",
"count": 6
},
  {
"value": "The Economist",
"count": 5
},
  {
"value": "Japan Times",
"count": 5
},
  {
"value": "Republican",
"count": 5
},
  {
"value": "Daily Courier",
"count": 5
},
  {
"value": "Sydney Morning Herald",
"count": 5
},
  {
"value": "Gulf News",
"count": 5
},
  {
"value": "Bangkok Post",
"count": 5
},
  {
"value": "Buzz Feed",
"count": 5
},
  {
"value": "DNA",
"count": 5
},
  {
"value": "Kyiv Post",
"count": 5
},
  {
"value": "Portland Press Herald",
"count": 5
},
  {
"value": "Roanoke Times",
"count": 5
},
  {
"value": "ALL TOP STARTUPS",
"count": 5
},
  {
"value": "Irish Central",
"count": 5
},
  {
"value": "CRN",
"count": 5
},
  {
"value": "Haaretz",
"count": 5
},
  {
"value": "Nigeria Communications Week",
"count": 5
},
  {
"value": "Wired",
"count": 4
},
  {
"value": "Kiplinger",
"count": 4
},
  {
"value": "Vietnam Net",
"count": 4
},
  {
"value": "M Live - 786",
"count": 4
},
  {
"value": "Scoop",
"count": 4
},
  {
"value": "Arkansas Democrat Gazette",
"count": 4
},
  {
"value": "Newsweek",
"count": 4
},
  {
"value": "Stuff",
"count": 4
},
  {
"value": "Yale Daily News",
"count": 4
},
  {
"value": "Anthill Online",
"count": 4
},
  {
"value": "Medium",
"count": 4
},
  {
"value": "Vice Motherboard",
"count": 4
},
  {
"value": "IT news Africa",
"count": 4
},
  {
"value": "Zero Hedge",
"count": 3
},
  {
"value": "Oregonian",
"count": 3
},
  {
"value": "Philippine Daily Inquirer",
"count": 3
},
  {
"value": "Daily Caller",
"count": 3
},
  {
"value": "Benzinga",
"count": 3
},
  {
"value": "Billboard",
"count": 3
},
  {
"value": "International Business Times - UK",
"count": 3
},
  {
"value": "Age",
"count": 3
},
  {
"value": "D Magazine",
"count": 3
},
  {
"value": "Montreal Gazette",
"count": 3
},
  {
"value": "Hill",
"count": 3
},
  {
"value": "ARL Now",
"count": 3
},
  {
"value": "Canadian Business",
"count": 3
},
  {
"value": "Channel News Asia",
"count": 3
},
  {
"value": "China Post",
"count": 3
}
],
"field": "source.name"
}


By importing our results into a visualization tool such as Tableau, we can quickly get an idea of which publishers are writing most about our selected keywords.

Note: The chart below is interactive. You can hover over and click the various bubbles to see more information.

Straight away we can see that TechCrunch dominate our results, generating almost twice as many matches as the next top result. What does this tell us? It tells us that TechCrunch are more than likely a leading publisher when it comes to writing about startup funding.
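
The “almost twice as many” observation can be checked directly from the JSON. The snippet below is a small sketch that uses only the first three entries of the response shown earlier and compares the top two counts.

```python
import json

# First three entries of the /trends response shown earlier.
response = json.loads("""
{"trends": [
  {"value": "TechCrunch", "count": 206},
  {"value": "Fortune", "count": 108},
  {"value": "Business Insider", "count": 91}
], "field": "source.name"}
""")

# Rank publishers by number of matching stories, highest first.
ranked = sorted(response["trends"], key=lambda t: t["count"], reverse=True)
top, runner_up = ranked[0], ranked[1]
ratio = top["count"] / runner_up["count"]
print(f"{top['value']} has {ratio:.2f}x the matches of {runner_up['value']}")
```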

Which reporters are writing about startup funding?

Now that we’ve established the top publishers writing about startups and funding, we’ll look to find out which specific reporters/influencers are writing the most content around this subject area.

Similar to our previous query, we’re once again going to use the /trends endpoint. This time, however, we’ll look at field=author.name. Here’s the search query we used;

Here are our visualized results for the query above;


If further proof were needed that TechCrunch are leaders in reporting about startup funding, check out the top ten authors from our results, and who they write for. TechCrunch reporters make up half of the top 10, but top of the list is Erin Griffiths of Fortune.

  1. Erin Griffiths – Fortune
  2. Steve O’Hear – TechCrunch
  3. Lora Kolodny – TechCrunch
  4. Kia Kokalitcheva – Fortune
  5. Sarah Buhr – TechCrunch
  6. Ingrid Lunden – TechCrunch
  7. Sam Shead – Business Insider
  8. Connie Loizos – TechCrunch
  9. Jessica Galang – BetaKit
  10. Tas Bindi – ZDNet

What now?

Now that you have a list of reporters who you know are writing plenty of content around your area of interest, you can focus your efforts on contacting them individually, rather than sending out blind and impersonal mass emails.

Reporters generally have a profile or portfolio of their work on their publisher’s website, and so by citing this relevant work as a reason for contacting them specifically, you are showing that you have done your homework and have intentionally reached out to them.

Further narrowing your search

Depending on your own precise search criteria, there are a number of options available to narrow down your search and pinpoint exactly what, and who, you are looking for.

Search by article title

While searching for mentions of startup and funding gave us some excellent results, perhaps you have a niche product or app and you would like to find a reporter who has previously written about your exact field of expertise. Searching by article title is often the most accurate method of sourcing content that is specifically about your keyword, rather than just mentioning it somewhere in the body of text.

Previously, we found that 5 out of our top 10 search results for startup and funding write for TechCrunch. But what if we want to be even more targeted and find a reporter who specifically writes about fintech startups and funding?

To do so, we will reuse our earlier search query for startup and funding, but we will now add a parameter to search article titles for the word fintech. Here’s our updated query;

JSON results;


{
"trends": [
    {
"value": "Oscar Williams-grut",
"count": 9
},
  {
"value": "Erweiterte Suche",
"count": 5
},
  {
"value": "Andrew Meola",
"count": 3
},
  {
"value": "Natasha Lomas",
"count": 2
},
  {
"value": "Steve O'hear",
"count": 2
},
  {
"value": "Tas Bindi",
"count": 2
},
  {
"value": "John Rampton",
"count": 1
},
  {
"value": "Roger Aitken",
"count": 1
},
  {
"value": "Aaron Aders",
"count": 1
},
  {
"value": "Tx Zhuo",
"count": 1
},
  {
"value": "Lisa Rabasca Roepe",
"count": 1
},
  {
"value": "Richie Hecker",
"count": 1
},
  {
"value": "Mileika Lasso",
"count": 1
},
  {
"value": "Peter Nowak",
"count": 1
},
  {
"value": "Par Sophie",
"count": 1
},
  {
"value": "Spencer Israel",
"count": 1
},
  {
"value": "John Detrixhe",
"count": 1
},
  {
"value": "Julie Verhage",
"count": 1
},
  {
"value": "Jessica Galang",
"count": 1
},
  {
"value": "Douglas Soltys",
"count": 1
},
  {
"value": "Ara Rodríguez",
"count": 1
},
  {
"value": "Jessica Vomiero",
"count": 1
},
  {
"value": "Valeria Ríos",
"count": 1
},
  {
"value": "Amy Feldman",
"count": 1
},
  {
"value": "Ameinfo Staff",
"count": 1
},
  {
"value": "Kevin Sandhu",
"count": 1
},
  {
"value": "George Beall",
"count": 1
},
  {
"value": "Par Delphine",
"count": 1
},
  {
"value": "Caitlin Hotchkiss",
"count": 1
},
  {
"value": "Robert Hackett",
"count": 1
},
  {
"value": "Nathan Sinnott",
"count": 1
},
  {
"value": "Eliran Rubin",
"count": 1
},
  {
"value": "Lee Roden",
"count": 1
},
  {
"value": "Piruze Sabuncu",
"count": 1
},
  {
"value": "Danon Gabriel",
"count": 1
},
  {
"value": "Rachel Witkowski",
"count": 1
}
],
"field": "author.name"
}

As you can see from the JSON results above, Oscar Williams-Grut has recently written 9 articles matching our search query. A quick look at Oscar’s profile on Business Insider confirms that he writes about finance, specializing in fintech, business, markets, and politics. He would certainly top our list of contacts if we wanted to reach out about a fintech startup funding press release!

Screen Shot 2016-12-14 at 15.14.16

Location and language

Our News API scans content from thousands of sources and RSS feeds worldwide, in multiple languages, meaning you can narrow your search to locate content in specific languages and from specific countries. As an example, you can add the following parameters to your search query to return only content from sources in Portugal that is also written in Portuguese;

  • source.locations.country[]=pt
  • language[]=pt
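
As a small illustration, the two parameters above can be turned into a query-string fragment with Python’s standard library. The variable names are illustrative; passing safe="[]" simply keeps the square brackets unescaped so the output matches the parameters as written above.

```python
from urllib.parse import urlencode

# The two filters from the list above, kept in order.
filters = [
    ("source.locations.country[]", "pt"),  # sources located in Portugal
    ("language[]", "pt"),                  # stories written in Portuguese
]

# safe="[]" keeps the square brackets readable instead of percent-encoding them.
query_string = urlencode(filters, safe="[]")
print(query_string)
```

The resulting fragment can be appended to an existing search query with an ampersand.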

Social shares count

One of the main reasons for finding relevant reporters and bloggers in the first place is to gain as much public exposure as possible. One way to help ensure this is to source reporters based on the number of shares their content receives on social media.

You can be quite specific here by choosing the social network(s) that interest you most. For example, perhaps your content is best suited for distribution on Facebook. You can therefore find out which reporters tend to generate the most shares on Facebook by adding a minimum share count for that network. Here’s an example query that will do just that, by only sourcing authors who have generated over 10,000 shares on Facebook in the past 60 days;

At the time of writing, this query is returning the names of four reporters, each of whom has generated over 10,000 Facebook shares with content containing our keywords startup and funding published in the past 60 days.

Of course, the further you lower the minimum number of shares, the more results you will obtain. We changed the above search query to contain a minimum of 5,000 shares and our results almost trebled.

Alexa rank

Similar to how we defined a minimum number of Facebook social shares in the example above, you also have the option to define the minimum and maximum Alexa rank of websites that you source.

Why is this useful? The Alexa ranking system analyzes how often websites are visited and ranks them against each other according to the volume of visits they receive. The idea is pretty simple – a site’s rank is calculated from the amount of traffic it has generated over the past 3 months.

If you’re looking to maximize your exposure, you will naturally want your content to be featured on sites with the highest visitor traffic, and you will therefore be looking at sites with the best Alexa ranks.

Try the search query below. It is the same as our earlier search for publishers, but we are now narrowing the search to only include sites with an Alexa rank of 1-1000.

Click here to learn more about sourcing and filtering news content by Alexa rank.

Conclusion

It took us less than 5 minutes to source and visualize the top publishers and reporters writing about startup funding, which could potentially save hours of time scanning the web and social media in the search for suitable influencers to reach out to about your press release.

Ready to try the News API for yourself? Click the image below and sign up for a free 14-day trial.

 




News API - Sign up





Intro

Here at AYLIEN we spend our days creating cutting-edge NLP and Text Analysis solutions such as our Text Analysis API and News API to help developers build powerful applications and processes.

We understand, however, that not everyone has the programming knowledge required to use APIs, and this is why we created our Text Analysis Add-on for Google Sheets – to bring the power of NLP and Text Analysis to anyone who knows how to use a simple spreadsheet.

Today we want to show you how you can build an intelligent sentiment analysis tool with zero coding using our Google Sheets Add-on and a free service called IFTTT.

Here’s what you’ll need to get started;

What is IFTTT?

IFTTT stands for If This, Then That. It is a free service that enables you to automate specific tasks by triggering actions on apps when certain criteria are met. For example, “if the weather forecast predicts rain tomorrow, notify me by SMS”.

Step 1 – Connect Google Drive to IFTTT

  • Log in to your IFTTT account
  • Search for, and select, Google Drive
  • Click Connect and enter your Google login information

Step 2 – Create Applets in IFTTT

Applets are the processes you create to trigger actions based on certain criteria. It’s really straightforward. You define the trigger (the ‘this’) and then the action (the ‘that’). In our previous weather-SMS example, the ‘this’ is a rain status within a weather app, and the ‘that’ is a text message that gets sent to a specified cell phone number.

To create an applet, go to My Applets and click New Applet.

Here’s what you’ll see. Click the blue +this

Screen Shot 2016-12-02 at 14.48.41

You will then be shown a list of available apps. In this case, we want to source specific tweets, so select the Twitter app.

You will then be asked to choose a trigger. Select New tweet from search.

You can now define exactly what tweets you would like to source, based on their content. You can be quite specific with your search using Twitter’s search operators, which we’ve listed below;

Twitter search operators

To search for specific words, hashtags or languages

  • Tweets containing all words in any position (“Twitter” and “search”)  
  • Tweets containing exact phrases (“Twitter search”)
  • Tweets containing any of the words (“Twitter” or “search”)
  • Tweets excluding specific words (“Twitter” but not “search”)
  • Tweets with a specific hashtag (#twitter)
  • Tweets in a specific language (written in English)

To search for specific people or accounts

  • Tweets from a specific account (Tweeted by “@TwitterComms”)
  • Tweets sent as replies to a specific account (in reply to “@TwitterComms”)
  • Tweets that mention a specific account (Tweet includes “@TwitterComms”)

To exclude Retweets and/or links

  • To exclude Retweets (“-rt”)
  • To exclude links/URLs (“-http”) and (“-https”)
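
As an illustration, several of the operators above can be combined into a single search string. The helper below is a hypothetical sketch (the function name and the example phrases are made up): it wraps each phrase in quotes for exact matching, joins the phrases with OR, and appends the exclusion operators for Retweets and links.

```python
def build_search(phrases, exclude=("rt", "http", "https")):
    """Combine exact phrases with OR and append exclusion operators."""
    quoted = " OR ".join(f'"{p}"' for p in phrases)
    excluded = " ".join(f"-{term}" for term in exclude)
    return f"{quoted} {excluded}".strip()


print(build_search(["Twitter search", "Twitter help"]))
```

The resulting string can be pasted into the search field when defining the trigger.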

Our first trigger

We’re going to search for tweets that mention “bad santa 2 is” or “bad santa 2 was”. Why are we searching for these terms? Well, we find that original, opinionated tweets generally use either one of these phrases. It also helps to cut out tweets that contain no opinion (neutral sentiment) such as the one below;


 

Our goal with this tool is to analyze the viewer reaction to “Bad Santa 2”, which means tweets such as this one aren’t particularly interesting to us in this case. However, if we wanted to assess the overall buzz on Twitter about Bad Santa 2, we might just look for any mention at all and concentrate on the volume of tweets.

And so, here’s our first trigger.

Screen Shot 2016-12-07 at 10.53.35

Click Create Trigger when you’re happy with your search. You will then see the following;

Screen Shot 2016-12-01 at 17.35.29

Notice how the Twitter icon has been added. Now let’s choose our action. Click the blue +that

Next, search for or select Google Drive. You will then be given 4 options – select Add row to spreadsheet. This action will add each matching tweet to an individual row in Google Sheets.

Next, give the spreadsheet a name. We simply went for ‘Bad Santa 2’. Click Create Action. You will then be able to review your applet. Click Finish when you are happy with it.

Done! Tweets that match your search criteria will start appearing in an auto-generated Google Sheet within minutes. Now you can go through this process again to create a second applet. We chose another movie, Allied. (“Allied was” or “Allied is”).

Here is an example of what you can expect to see accumulate in your Google Sheet;

Screen Shot 2016-12-02 at 17.59.39

Note: When you install our Google Sheets Add-on we’ll give you 1,000 credits to use for free. You then have the option to purchase additional credits should you wish to. For this example, we will stay within the free range and analyze 500 tweets for each movie. You may choose to use more or fewer, depending on your preference.

Step 3 – Clean your data

Because of the nature of Twitter, you’re probably going to find a lot of crap and spammy tweets in your spreadsheet. To minimize the amount of these tweets that end up in your final data set, there are a few things we recommend you do;

Sort your tweets alphabetically

By sorting your tweets alphabetically, you can quickly scroll through your spreadsheet and spot multiples of the same tweet. It’s a good idea to delete these duplicates: not only do they skew your overall results, they often point to bot or spam activity on Twitter. To sort your tweets alphabetically, select the entire column, then select Data and Sort sheet by column B, A-Z.

AYLIEN Start Analysis

Remove retweets (if you haven’t already done so)

Alphabetically sorting your tweets will also list all retweets together (beginning with RT). You may or may not want to include retweets, but this is entirely up to you. We decided to remove all retweets because there are so many bots out there auto-retweeting and we felt that using this duplicate content isn’t exactly opinion mining.
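If you prefer to do this clean-up outside the spreadsheet, the two steps above (deleting duplicates and removing retweets) can be sketched in a few lines of Python. This is only an illustration and not part of the Add-on. The function name is hypothetical, and treating any tweet that starts with "RT @" as a retweet is an assumption.

```python
def drop_duplicates_and_retweets(tweets):
    """Keep the first copy of each tweet and skip anything that looks like a retweet."""
    seen = set()
    kept = []
    for text in tweets:
        # Normalize whitespace and case so near-identical copies compare equal.
        normalized = " ".join(text.split()).lower()
        if normalized.startswith("rt @"):
            continue  # assumed retweet marker
        if normalized in seen:
            continue  # duplicate of a tweet we already kept
        seen.add(normalized)
        kept.append(text)
    return kept


sample = [
    "RT @fan: Allied was great",
    "Allied was great",
    "allied was  great",
    "Bad Santa 2 is hilarious",
]
print(drop_duplicates_and_retweets(sample))
```

With the made-up sample above, only "Allied was great" and "Bad Santa 2 is hilarious" survive.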

Search and filter certain words

Think about the movie(s) you are searching for and how their titles may be used in different contexts. For example, we searched for tweets mentioning ‘Allied’, and while we used Twitter’s search operators to exclude words like forces, battle and treaty, we noticed a number of tweets about a company named ‘Allied’. By searching for their company Twitter handle, we could highlight and delete the tweets in which they were mentioned.

NB: Remove movie title from tweets

Before you move on to Step 4 and analyze your tweets, it is important to remove the movie title from each tweet, as it may affect the quality of your results. For example, our tweet-level sentiment analysis feature will read ‘Bad Santa 2…’ in a tweet and may assign negative sentiment because of the inclusion of the word bad.

To remove all mentions of your chosen movie title, simply use Edit > Find and Replace in Google Sheets.
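The same title removal can be sketched in Python for anyone working with an exported copy of their tweets. This is a minimal example using the standard `re` module. The helper name `remove_title` is hypothetical, and case-insensitive matching is an assumption about how titles appear in tweets.

```python
import re


def remove_title(text, title):
    """Strip every case-insensitive occurrence of the title and tidy the spacing."""
    pattern = re.compile(re.escape(title), flags=re.IGNORECASE)
    stripped = pattern.sub("", text)
    # Collapse the double spaces left behind by the removal.
    return " ".join(stripped.split())


print(remove_title("Bad Santa 2 is surprisingly funny", "bad santa 2"))
```

Here the word "Bad" no longer reaches the sentiment analysis step, so it cannot pull the tweet towards a negative label.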

Step 4 – Analyze your tweets

Now comes the fun part! It’s time to analyze your tweets using the AYLIEN Text Analysis Add-on. If you have not yet installed the Add-on, you can do so here.

Using our Add-on couldn’t be easier. Simply select the column containing all of your tweets, then click Add-ons > Text Analysis.

Select sentiment

To find out whether our tweets have been written in a positive, neutral or negative way, we use Sentiment Analysis.

Note: While Sentiment Analysis is a complex and fascinating field in NLP and Machine Learning research, we won’t get into it in too much detail here. Put simply, it enables you to establish the sentiment polarity (whether a piece of text is positive, negative or neutral) of large volumes of text, with ease.

Next, click the drop-down menu and select Sentiment Analysis > Analyze.

Each tweet will then be analyzed for subjectivity (whether it is written subjectively or objectively) and sentiment polarity (whether it is written in a positive, negative or neutral manner). You will also see a confidence score for both subjectivity and sentiment. This tells you how confident we are that the assigned label (positive, negative, objective, etc) is correct.

tweetyfill

By repeating this process for our Allied tweets, we can then compare our results and find out which movie has been best received by Twitter users.

Step 5 – Compare & visualize

In total we analyzed 1,000 tweets, 500 for each movie. Through a simple count of positive, negative and neutral tweets, we received the following results;

Bad Santa 2

Positive – 170

Negative – 132

Neutral – 198

Allied

Positive – 215

Negative – 91

Neutral – 194

Now to generate a percentage score for each movie. Let’s start by excluding all neutral tweets. We can then easily figure out what percentage of the remaining tweets are positive. So, for Allied, of the remaining 306 tweets, 215 were positive, giving us a positive score of 70%.

By doing the same with Bad Santa 2, we get 56%.

Allied wins!
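The percentage scores above can be reproduced with a couple of lines of Python. This is simply the arithmetic described in the text (positive tweets divided by positive plus negative tweets, with neutral tweets excluded). The function name is ours, and rounding to the nearest whole percent is an assumption.

```python
def positive_score(positive, negative):
    """Share of non-neutral tweets that are positive, as a whole percentage."""
    return round(100 * positive / (positive + negative))


print("Allied:", positive_score(215, 91))       # 215 of 306 non-neutral tweets
print("Bad Santa 2:", positive_score(170, 132))  # 170 of 302 non-neutral tweets
```

Running this gives 70 for Allied and 56 for Bad Santa 2, matching the scores quoted above.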

To visualize your results, use your tweet volume data to generate some charts and graphs in Google Sheets;

piecharts

Comparing our results with Rotten Tomatoes & IMDb

It’s always interesting to compare the results of your analysis with those of others. To compare ours, we went to two major movie review sites, Rotten Tomatoes & IMDb, and we were pleasantly surprised by the similarity in our results!

Allied

The image below from Rotten Tomatoes shows both the critic (left) and audience (right) scores for Allied. Since we analyzed tweets from a Twitter audience, we are more interested in the latter. Our score of 70% comes very close to that of almost 15,000 reviewers on Rotten Tomatoes – just 1% off!

Screen Shot 2016-12-07 at 15.54.03

IMDb provide an audience-based review score of 7.2/10. Again, very close to our own result.

Screen Shot 2016-12-07 at 16.08.46

Our result for Bad Santa 2, while not as close as that of Allied, was still pretty close to Rotten Tomatoes with 56%.

Screen Shot 2016-12-07 at 15.54.24

With IMDb, however, we once again come within 1% with a score of 5.7/10.

Screen Shot 2016-12-07 at 16.09.04

Conclusion

We hope that this simple and fun use-case using our Google Sheets Add-on will give you an idea of just how useful, flexible and simple Text Analysis can be, without the need for any complicated code.

While we decided to focus on movie reviews in this example, there are countless other uses for you to try. Here are a few ideas;

  • Track mentions of brands or products
  • Track event hashtags
  • Track opinions towards election candidates

Ready to get started? Click here to install our Text Analysis Add-on for Google Sheets.





Text Analysis API - Sign up





Intro

Dubbed Europe’s largest technology marketplace and Davos for geeks, the Web Summit has been going from strength to strength in recent years as more and more companies, employees, tech junkies and media personnel flock to the annual event to check out the latest innovations, startups and a star-studded lineup of speakers and exhibitors.

20161108_125147

Having grown from a small gathering of around 500 like-minded people in Dublin, this year’s event, which was held in Lisbon for the first time, topped 50,000 attendees representing 15,000 companies from 166 countries.

With such a large gathering of techies, there was bound to be a whole lot of chatter relating to the event on Twitter. So being the data geeks that we are, and before we jetted off to Lisbon ourselves, we turned our digital ears to Twitter and listened for the duration of the event to see what we could uncover.

Our process

We collected a total of just over 80,000 tweets throughout the event by focusing our search on keywords, Twitter handles and hashtags such as ‘Web Summit’, #websummit, @websummit, etc.

We used the following tools to collect, analyze and visualize the data;

And here’s what we found;

What languages were the tweets written in?

In total, we collected tweets written in 42 different languages.

Out of our 80,000 tweets, 60,000 were written in English, representing 75% of the total volume.

The pie chart below shows all languages, excluding English. As you can see, Portuguese was the next most-used language, with just under 11% of tweets being written in the host country’s native tongue. Spanish and French tweets represented around 2.5% of total volume each.


How did tweet volumes fluctuate throughout the week?

The graph below represents hourly tweet volume fluctuations throughout the week. As you can see, there are four distinct peaks.

While we can’t list all the reasons for these spikes in volume, we did find a few recurring trends during these times, which we have added to the graph;



Let’s now take a more in-depth look at each peak.

What were the causes of these fluctuations?

By adding the average hourly sentiment polarity to this graph we can start to gather a better understanding of how people felt while writing their tweets.

Not familiar with sentiment analysis? This is a feature of text analysis and natural language processing (NLP) that is used to detect positive or negative polarity in text. In short, it tells us whether a piece of text, or a tweet in this instance, has been written in a positive, negative or neutral way. Learn more.

Interestingly, each tweet volume peak correlates with a sharp drop in sentiment. What does this tell us? People were taking to Twitter to complain!
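To make the idea of hourly volume and average hourly sentiment concrete, here is a minimal Python sketch. It is not the pipeline we used for the graphs. Mapping positive, neutral and negative to +1, 0 and -1 is an assumption made purely for illustration, and the timestamps in the sample are made up.

```python
from collections import defaultdict
from datetime import datetime

# Assumed numeric values for averaging; not taken from the original analysis.
POLARITY_VALUE = {"positive": 1, "neutral": 0, "negative": -1}


def hourly_summary(tweets):
    """Return {hour: (tweet_count, average_polarity)} for (timestamp, polarity) pairs."""
    buckets = defaultdict(list)
    for timestamp, polarity in tweets:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(POLARITY_VALUE[polarity])
    return {hour: (len(values), sum(values) / len(values))
            for hour, values in sorted(buckets.items())}


sample = [
    ("2016-11-08T09:05:00", "positive"),
    ("2016-11-08T09:40:00", "negative"),
    ("2016-11-08T09:59:59", "negative"),
    ("2016-11-08T10:15:00", "neutral"),
]
for hour, (count, average) in hourly_summary(sample).items():
    print(hour, count, round(average, 2))
```

An hour with many tweets and a low average, like the 09:00 bucket in this sample, is the kind of pattern that shows up as a volume peak paired with a sentiment drop.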



Positivity overall

Overall, average sentiment remained in the positive (green) for the entire week. That dip into negative (red) that you can see came during the early hours of Day 2 as news of the US election result broke. Can’t blame the Web Summit for that one!



We can also see distinct rises in positive sentiment around the 5pm mark each day as attendees took to Twitter to reflect on an enjoyable day.


Sentiment also remained comparatively high during the later hours of each day as the Web Summit turned to Night Summit – we’ll look at this in more detail later in the post.

20161110_182007

Mike, Afshin, Noel & Hamed after a hectic but enjoyable day at the Web Summit

What was the overall sentiment of the tweets?

The pie chart below shows the breakdown of all 80,000 tweets, split by positive, negative and neutral sentiment.



The majority of tweets (80%) were written in a neutral manner. 14% were written with positive sentiment, with the remaining 6% written negatively.

To uncover the reasons behind both the positive and negative tweets, we extracted and analyzed mentioned keywords to see if we could spot any trends.

What were the most common keywords found in positive tweets?

We used our Entity and Concept Extraction features to uncover keywords, phrases, people and companies that were mentioned most in both positive and negative tweets.

As you can imagine, there were quite a few keywords extracted from 80,000 tweets so we trimmed it down by taking the following steps;

  • Sort by mention count
  • Take the top 100 most mentioned keywords
  • Remove obvious or unhelpful keywords (Web Summit, Lisbon, Tech, etc)
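The trimming steps above can be sketched in Python with `collections.Counter`. This is an illustrative sketch rather than the exact process we ran. The function name, the case-insensitive counting and the sample keywords are all assumptions.

```python
from collections import Counter


def top_keywords(keywords, limit=100, unhelpful=()):
    """Count mentions, keep the `limit` most mentioned, then drop unhelpful keywords."""
    counts = Counter(keyword.lower() for keyword in keywords)
    blocked = {word.lower() for word in unhelpful}
    # most_common() returns (keyword, count) pairs sorted by count, highest first.
    return [(word, n) for word, n in counts.most_common(limit) if word not in blocked]


extracted = ["Lisbon"] * 4 + ["great"] * 3 + ["WiFi", "wifi"] + ["love"]
print(top_keywords(extracted, limit=3, unhelpful=["Web Summit", "Lisbon", "Tech"]))
```

Note that the filter runs after the top-N cut, mirroring the order of the steps listed above, so the final list can be shorter than `limit`.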

And here are our results. You can hover over individual clusters to see more information.



We can see some very positive phrases here, with great, amazing, awesome, good, love and nice featuring prominently.

The most mentioned speaker from the positive tweets was Gary Vaynerchuk (@garyvee), which makes sense considering the sharp rise in positive sentiment we saw his fans produce earlier in this post on our sentiment-over-time graph.

What were the most common keywords found in negative tweets?

We took the exact same approach to generate a list of the most mentioned keywords from tweets with negative sentiment;



For those of you that attended Web Summit, it will probably come as no surprise to see WiFi at the forefront of the negativity. While it did function throughout the event, many attendees found it unreliable and too slow, leading to many using their own data and hotspotting from their cell phones.

Mentions of queue, long, full, lines and stage are key indicators of just how upset people became while queueing for the opening ceremony at the main stage, only for many to be turned away because the venue became full.

The most mentioned speaker from negative tweets was Dave McClure (@davemcclure). The 500 Startups Founder found himself in the news after sharing his views on the US election result with an explosive on-stage outburst. It should be noted that just because Dave was the most mentioned speaker from all negative tweets, it doesn’t necessarily mean people were being negative towards him. In fact, many took to Twitter to support him;


Much of the negativity came from people simply quoting what Dave had said on stage, which naturally contained high levels of negative sentiment;

Which speakers were mentioned most?

Web Summit 2016 delivered a star-studded lineup of 663 speakers. What we wanted to know was: who was mentioned most on Twitter?

By combining mentions of names and Twitter handles, we generated and sorted a list of the top 25 most mentioned speakers.
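One way to combine name and handle mentions is sketched below in Python. This is a simplified illustration, not our actual implementation. It assumes plain case-insensitive substring matching and counts a tweet once per speaker even if it contains both the name and the handle. The function name and sample tweets are made up.

```python
from collections import Counter


def count_speaker_mentions(tweets, speakers):
    """speakers maps a display name to its aliases (full name, Twitter handle)."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for name, aliases in speakers.items():
            # Count the tweet once for this speaker if any alias appears in it.
            if any(alias.lower() in lowered for alias in aliases):
                counts[name] += 1
    return counts.most_common()


speakers = {
    "Gary Vaynerchuk": ["Gary Vaynerchuk", "@garyvee"],
    "Dave McClure": ["Dave McClure", "@davemcclure"],
}
tweets = [
    "Loved @garyvee on stage",
    "Gary Vaynerchuk was great, thanks @garyvee",
    "Dave McClure speaking now",
]
print(count_speaker_mentions(tweets, speakers))
```

With these sample tweets, Gary Vaynerchuk is counted twice and Dave McClure once.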



Messrs Vaynerchuk and McClure once again appear prominently, with the former being the most mentioned speaker overall throughout the week. Joseph Gordon-Levitt, actor and Founder of HitRECord, came in second place, followed by Web Summit founder Paddy Cosgrave.

Which airline flew to Lisbon with the happiest customers?

With attendees visiting Lisbon from 166 countries, we thought it would be cool to see which airline brought in the happiest customers. By extracting mentions of the airlines that fly in to Lisbon, we could then analyze the sentiment of the tweets in which they were mentioned.

For most airlines, there simply wasn’t enough data available to analyze. However, we did find enough mentions of Ryanair and British Airways to be able to analyze and compare.

Here’s what we found;

Ryanair vs. British Airways

The graph below is split into three levels of sentiment – positive, neutral and negative. Ryanair is represented in blue and British Airways in red.



It’s really not hard to pick a winner here. British Airways were not only mentioned in more positive tweets, they were also mentioned in considerably fewer negative tweets.

Night Summit: which night saw the highest tweet volumes?

In total we found 593 mentions of night summit. The graph below shows tweet volumes for each day, and as you can see, November 7 was a clear winner in terms of volume.



...and which morning saw the most hangovers?!

Interestingly, we found a correlation between low tweet volumes (mentioning Night Summit, #nightsummit, etc.) and higher mentions of hangovers the following day!

59% of tweets mentioning hangover, hungover, resaca, etc, came on November 10 – the day after the lowest tweet volume day.

35% came on November 9 while just 6% came on November 8 – the day after the highest tweet volume day.

What do these stats tell us? Well, while we can’t be certain, we’re guessing that the more people partied, the less they tweeted. Probably a good idea 🙂

Conclusion

In today’s world, if someone wants to express their opinion on an event, brand, product, service, or anything really, they will more than likely do so on social media. There is a wealth of information published through user generated content that can be accessed in near real-time using Text Analysis and Text Mining solutions and techniques.

Wanna try it for yourself? Click the image below to sign up to our Text Analysis API with 1,000 free calls per day.
 




Text Analysis API - Sign up





Intro

In recent months, we have been bolstering our sentiment analysis capabilities, thanks to some fantastic research and work from our team of scientists and engineers.

Today we’re delighted to introduce you to our latest feature, Sentence-Level Sentiment Analysis.

New to Sentiment Analysis? No problem. Let’s quickly get you up to speed;

What is Sentiment Analysis?

Sentiment Analysis is used to detect positive or negative polarity in text. Also known as opinion mining, sentiment analysis is a feature of text analysis and natural language processing (NLP) research that is growing in popularity as a multitude of use-cases emerge. Here are a few examples of questions that sentiment analysis can help answer in various industries;

  • Brands – are people speaking positively or negatively when they mention my brand on social media?
  • Hospitality – what percentage of online reviews for my hotel/restaurant are positive/negative?
  • Finance – are there negative trends developing around my investments, partners or clients?
  • Politics – which candidate is receiving more positive media coverage in the past week?

We could go on and on with an endless list of examples but we’re sure you get the gist of it. Sentiment Analysis can help you understand the split in opinion from almost any body of text, website or document – an ideal way to uncover the true voice of the customer.

Types of Sentiment Analysis

Depending on your specific use-case and needs, we offer a range of sentiment analysis options;

Document Level Sentiment Analysis

Document level sentiment analysis looks at and analyzes a piece of text as a whole, providing an overall sentiment polarity for a body of text.

For example, this camera review;

Screen Shot 2016-11-22 at 17.56.07

receives the following result;

Screen Shot 2016-11-22 at 17.56.14

Want to test your own text or URLs? Check out our live demo.

Aspect-Based Sentiment Analysis (ABSA)

ABSA starts by locating sentences that relate to industry-specific aspects and then analyzes sentiment towards each individual aspect. For example, a hotel review may touch on comfort, staff, food, location, etc. ABSA can be used to uncover sentiment polarity for each aspect separately.

Here’s an example of results obtained from a hotel review we found online;

Screen Shot 2016-11-22 at 17.58.05

Note how each aspect is automatically extracted and then given a sentiment polarity score.

Click to learn more about Aspect-Based Sentiment Analysis.

Sentence-Level Sentiment Analysis (SLSA)

Our latest feature breaks down a body of text into sentences and analyzes each sentence individually, providing sentiment polarity for each.

SLSA in action

Sentence-Level Sentiment Analysis is available in our Google Sheets Add-on and also through the ABSA endpoint in our Text Analysis API. Here’s a sample query to try with the Text Analysis API;

Now let’s take a look at it in action in the Sheets Add-on.

Analyze text

We imported some hotel reviews into Google Sheets and then ran an analysis using our Text Analysis Add-on. Below you will see the full review in column A, and then each sentence in a column of its own with a corresponding sentiment polarity (positive, negative or neutral), as well as a confidence score. This score reflects how confident we are that the sentiment is correct, with 1.0 representing complete confidence.

Screen Shot 2016-11-23 at 17.54.55

Analyze URLs

This new feature also enables you to analyze volumes of URLs as it first scrapes the main text content from each web page and then runs SLSA on each sentence individually.

In the GIF below, you can see how the content from a URL on Business Insider is first broken down into individual sentences and then assigned a positive, negative or neutral sentiment at sentence level, thus providing a granular insight into the sentiment of an article.

SLSA

What’s the benefit of SLSA?

As we touched on earlier, sentiment analysis, in general, has a wide range of potential use-cases and benefits. However, Document-Level Sentiment Analysis can often miss out on uncovering granular details in text by only providing an overall sentiment score.

Sentence-Level Sentiment Analysis allows you to perform a more in-depth analysis of text by uncovering the positively, neutrally and negatively written sentences to find the root causes of the overall document-level polarity. It can assist you in locating instances of strong opinion in a body of text, providing greater insight into the true thoughts and feelings of the author.

SLSA can also be used to analyze and summarize a collection of online reviews by extracting all the individual sentences within them that are written with either positive or negative sentiment.
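As a rough sketch of that summarizing step, the Python snippet below groups sentences by polarity once you already have sentence-level results, for example copied out of your spreadsheet. The input format (sentence, polarity, confidence), the function name and the 0.6 confidence cut-off are all assumptions made for illustration. The sample sentences are invented.

```python
from collections import defaultdict


def group_opinionated_sentences(results, min_confidence=0.6):
    """Collect positive and negative sentences, skipping neutral or low-confidence ones."""
    grouped = defaultdict(list)
    for sentence, polarity, confidence in results:
        if polarity != "neutral" and confidence >= min_confidence:
            grouped[polarity].append(sentence)
    return dict(grouped)


results = [
    ("The staff were wonderful.", "positive", 0.93),
    ("We stayed for three nights.", "neutral", 0.88),
    ("The room was noisy and cold.", "negative", 0.81),
    ("Breakfast was okay I guess.", "positive", 0.41),
]
print(group_opinionated_sentences(results))
```

The output keeps one positive and one negative sentence. The neutral sentence and the low-confidence one are left out.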

Ready to get started?

Our Text Analysis Add-on for Google Sheets has been developed to help people with little or no programming knowledge take advantage of our Text Analysis capabilities. If you are in any way familiar with Google Sheets or MS Excel you will be up and running in no time. We’ll even give you 1,000 free credits to play around with. Click here to download your Add-on or click the image below to get started for free with our Text Analysis API.

 




Text Analysis API - Sign up




