Getting Started with the “Text Analysis by AYLIEN” Extension for RapidMiner
“Text Analysis by AYLIEN” is an Extension made up of different Operators that allow you to analyze and make sense of textual data from within RapidMiner. The different Operators contained in “Text Analysis by AYLIEN” include the following:
- Sentiment Analysis
- Entity Extraction
- Language Detection
- Hashtag Suggestion
- Related Phrases
Prefer to watch a video tutorial? Click here.
Getting started with the extension is easy. To walk you through the setup and basic usage, we’re going to do the following:
- Install the Extension and get it up and running
- Create and run a simple RapidMiner Process that will analyze the Sentiment of a sample piece of text from a Document
- Create and run a Process that analyzes the Sentiment of text using an ExampleSet instead of a Document
- Create and run a Process that uses the Search Twitter Operator to collect and analyze Tweets
“Text Analysis by AYLIEN” can be found in the RapidMiner Marketplace. You can navigate directly to the Marketplace from within RapidMiner Studio by using the side panel.
Once you’ve installed the Text Analysis extension, you can find its Operators from within RapidMiner by simply searching for AYLIEN. Here you’ll see the list of Operators that were installed as part of the Text Analysis Extension.
Credentials and Connecting
The first thing we need to do before we can start analyzing text is make sure we’re connected to the AYLIEN API. You can configure your connections under Settings and Manage Connections.
To connect to the AYLIEN API you need an App ID and API Key. If you haven’t already got yours, you can grab one for free here.
Create a new connection of type “Aylien Text Analysis Connection”, add your credentials (App ID and API Key) as shown below and hit “Save all changes”.
Now we’re pretty much good to go, and can reuse the connection we just created in all Text Analysis Operators.
Note: Depending on your subscription plan, you are subject to per-minute and daily rate limits (60 calls/minute and 1,000 calls/day on the Free plan). Once you reach your per-minute limit, the Operator will wait for a few seconds before running the subsequent batches, repeating until all the rows have been analyzed. If you reach your daily limit, you’ll get an alert in RapidMiner, but the already analyzed documents will be gracefully returned.
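To make the note above concrete, here is a minimal Python sketch of the behaviour it describes: pause when the per-minute limit is hit, and stop at the daily limit while keeping whatever has already been analyzed. This is not the extension’s code. The function name `analyze_rows`, the `analyze` callback, the five-second pause and the one-call-per-row assumption are all hypothetical and only there for illustration.

```python
import time
from typing import Callable, List, Tuple

PER_MINUTE_LIMIT = 60   # Free plan figure quoted in the note
DAILY_LIMIT = 1000      # Free plan figure quoted in the note


def analyze_rows(
    rows: List[str],
    analyze: Callable[[str], str],
    per_minute: int = PER_MINUTE_LIMIT,
    per_day: int = DAILY_LIMIT,
    sleep: Callable[[float], None] = time.sleep,
    pause_seconds: float = 5.0,  # "a few seconds"; the exact value is an assumption
) -> Tuple[List[str], bool]:
    """Hypothetical helper: returns (results, daily_limit_hit).

    Assumes one API call per row and that no calls were made earlier today.
    """
    results: List[str] = []
    calls_in_batch = 0
    for row in rows:
        if len(results) >= per_day:
            # Daily limit reached: stop, but return what was already analyzed.
            return results, True
        if calls_in_batch >= per_minute:
            # Per-minute limit reached: wait, then continue with the next batch.
            sleep(pause_seconds)
            calls_in_batch = 0
        results.append(analyze(row))
        calls_in_batch += 1
    return results, False
```

Passing `sleep` as a parameter lets you try the logic without actually waiting, for example by handing in a function that just records each pause.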
Example 1. – Document Sentiment Analysis
As shown below, the first thing we do is add an Analyze Sentiment (Document) Operator to our Process.
In this case, we’ve also added a Create Document Operator as shown in the screenshot below, where we’ll type or paste the text we want to analyze.
Add the text you want to analyze to the document. One of our teammates is a big softy and loves puppies, so for the purpose of this tutorial we’re using a simple quote from him, “I love puppies”, to demonstrate the extension in action.
So now we’ve completed the bare bones of our Process. It’s set up to analyze the sentiment of a piece of text we’ve written in a document.
To run the Process, hit play. Before you do, make sure you connect your Operators to each other and to the results ports (we’re always forgetting to do this!), and select the Connection you created earlier in the Analyze Sentiment Operator.
Your results will be displayed in a results tab, similar to below.
The next thing we want to show you is how you can analyze text from an ExampleSet rather than a Document. To do this we still use a Sentiment Analysis Operator, but the ExampleSet version instead of the Document version. For your convenience, we’ve bundled two versions of some of the Operators, so you can choose the one that matches the data you’re analyzing. The two formats can easily be converted to one another using the Documents to Data and Data to Documents Operators.
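If the difference between the two formats is unclear, the following plain-Python analogy may help. It is only a sketch of the idea and has nothing to do with RapidMiner’s internals: a Document is treated as a single string, and an ExampleSet as a list of rows with one text column. The function names and the `text` column name are assumptions chosen for illustration.

```python
from typing import Dict, List


def documents_to_data(documents: List[str], column: str = "text") -> List[Dict[str, str]]:
    # Each document becomes one row with a single text column.
    return [{column: doc} for doc in documents]


def data_to_documents(rows: List[Dict[str, str]], column: str = "text") -> List[str]:
    # Each row's text column becomes one document.
    return [row[column] for row in rows]
```

Converting in one direction and then back gives you the original list, which is why you can pick whichever version of an Operator suits the data you already have.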
Example 2. – ExampleSet Sentiment Analysis
As before, just add the Analyze Sentiment Operator to your Process, this time using the ExampleSet version.
You’ll also need to add a Data Generation Operator, which will essentially create a basic ExampleSet with the text we want to analyze in it. Note that this could’ve been similarly obtained from a CSV file or a database.
Make sure they’re all connected again and hit run. In this case your results will look a little different and they’ll be stored in a table like below, with the Polarity and Subjectivity confidence scores listed first and the Polarity and Subjectivity results listed second.
The last thing we want to show you in our basic tutorial is how you can leverage other data sources and Operators as part of your Process or mashup. In this case, we’re going to combine the Search Twitter Operator with our Sentiment Operator to pull tweets into RapidMiner, analyze their sentiment and visualize the results.
Example 3. – Using the Search Twitter Operator
As we did in our previous example, we’ll use an Analyze Sentiment Operator and combine it with a Search Twitter Operator as shown in the screenshot below.
With the Search Twitter Operator you can create queries exactly as you would when using the Twitter search API. On the right-hand side of the screenshot below, you can see that we’ve created a query that searches for the keyword “puppies”, removing retweets (-rt) and links (-http) to reduce noise and duplicates. We’ve made sure the search only pulls in English tweets by setting the language parameter to “en”. We’ve also put a limit of 20 Tweets on our results, but it’s up to you how many you analyze. When building your search, keep in mind that the Free plan has a rate limit of 60 calls/minute.
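As a rough illustration of how the search settings above fit together, here is a small Python sketch. It does not call Twitter or RapidMiner. `TwitterSearch`, `build_search` and `minutes_needed` are hypothetical names, and the estimate assumes one sentiment call per tweet, which is our simplification rather than something the Operator guarantees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterSearch:
    """Hypothetical container mirroring the Operator parameters described above."""
    query: str
    language: str
    limit: int


def build_search(keyword: str, language: str = "en", limit: int = 20) -> TwitterSearch:
    # Exclusion terms as quoted in the text: -rt drops retweets, -http drops links.
    query = " ".join([keyword, "-rt", "-http"])
    return TwitterSearch(query=query, language=language, limit=limit)


def minutes_needed(tweet_count: int, calls_per_minute: int = 60) -> int:
    # Assumes one sentiment call per tweet; rounds up to whole minutes.
    return -(-tweet_count // calls_per_minute)
```

With the settings from this example, `build_search("puppies")` produces the query `puppies -rt -http`, and 20 tweets fit comfortably inside a single minute of the Free plan’s allowance.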
Hit run and your results will be displayed in an ExampleSet similar to the one below.
You can create some simple visualizations to display your results by going to the “Charts” section on the left-hand side of the results screen. We created a very simple pie-chart which shows the distribution of positive, negative and neutral tweets.
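The pie chart is simply showing the share of each polarity label in the results. If you ever want to compute those shares outside RapidMiner, a minimal Python sketch could look like this. The function name `polarity_shares` is hypothetical, and the labels in the usage note are illustrative input, not real results.

```python
from collections import Counter
from typing import Dict, Iterable


def polarity_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of rows carrying each polarity label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No rows at all: nothing to chart.
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}
```

For instance, calling it on the four made-up labels `["positive", "positive", "neutral", "negative"]` gives 50% positive, 25% neutral and 25% negative.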
There you have it: analyzing text in RapidMiner has never been easier.
We’re really excited to see what kind of mashups and Processes RapidMiner users come up with. For the more seasoned RapidMiner user, we’ve also put together some more advanced tutorials/mashups, one using Twitter Search and the other utilizing RSS feeds.
We’ve also put together a repository of sample Processes that we’ll be adding to on a regular basis. It will be a collection of use-case-focused RapidMiner Processes that can be downloaded and imported directly into RapidMiner. You can find more info in the documentation section of our website.
P.S. If you’ve built or are building something cool, tell us about it. We may even feature it on our blog!