Analyzing Text in RapidMiner – Part 1

One of the major challenges with mining the Web and Social Media for insights is trying to get all of your data into one place. To do this, you need to extract information from multiple sources in order to gain an accurate and holistic view.

Combining multiple data sources and analyzing their content can be a daunting task. Thankfully, data mining frameworks such as RapidMiner and Weka make it quick and straightforward to pull that information together.

In this blog post, we’re going to show you how to use AYLIEN’s Text Analysis API from within RapidMiner to analyze text gathered from sources on the web.

The Web Mining extension for RapidMiner provides access to internet sources like web pages, RSS feeds, and web services. In this tutorial, we’re going to use it to make HTTP requests to the Text Analysis API. In part 2 we will use it to scrape information from web pages such as Rotten Tomatoes.

Requirements

  • RapidMiner v5.3+ (download)
  • Text Analysis API key (subscribe for free here)

Step 1: Install Web Mining for RapidMiner

  • Open the RapidMiner Marketplace by selecting Help > Updates and Extensions (Marketplace)
  • Search the Marketplace for “Web Mining” and install the extension

Step 2: Set up the API call

The Web Mining package provides you with an operator for invoking external web services. This operator is called “Enrich Data by Webservice” and can be found in the Operators panel under Web Mining > Services > Enrich Data by Webservice.

  • Drag and drop an instance of the Webservice operator into your process
  • Select the operator to access its configuration parameters
  • Set the following values for the parameters:
    • url: “https://api.aylien.com/api/v1/sentiment?mode=tweet&text=<%text%>” or if you’re using Mashape: “https://aylien-text.p.mashape.com/sentiment?mode=tweet&text=<%text%>”
    • request method: POST
    • body: “text=<%text%>”
    • request properties:
      • Accept: “text/xml”
      • X-AYLIEN-TextAPI-Application-Key: “YOUR_API_KEY”
      • X-AYLIEN-TextAPI-Application-ID: “YOUR_APPLICATION_ID”
      • If you’re using Mashape: X-Mashape-Key: “YOUR_API_KEY”
    • query type: XPath
    • xpath queries:
      • polarity: “//polarity/text()”

Here we are calling the /sentiment endpoint of the Text Analysis API to find out whether some text is positive, negative or neutral. The XPath query pulls the polarity value out of the XML response.
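For readers who want to see what these settings amount to outside RapidMiner, here is a minimal Python sketch of the same request. The function names and the sample XML response are assumptions made for illustration only; the real response may use a different root element or carry more fields.

```python
import urllib.request
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint and query string taken from the operator's url parameter.
SENTIMENT_URL = "https://api.aylien.com/api/v1/sentiment?mode=tweet"


def build_sentiment_request(text, app_id, api_key):
    """Build (without sending) a POST request that mirrors the operator settings."""
    body = urlencode({"text": text}).encode("utf-8")  # form-encoded body: text=...
    headers = {
        "Accept": "text/xml",  # ask for XML so an XPath-style query can be applied
        "X-AYLIEN-TextAPI-Application-Key": api_key,
        "X-AYLIEN-TextAPI-Application-ID": app_id,
    }
    return urllib.request.Request(
        SENTIMENT_URL, data=body, headers=headers, method="POST"
    )


def extract_polarity(xml_text):
    """Return the text of the first <polarity> element, like //polarity/text()."""
    root = ET.fromstring(xml_text)
    node = next(root.iter("polarity"), None)
    return node.text if node is not None else None


# Hypothetical response body, assumed only for this example.
SAMPLE_RESPONSE = "<result><polarity>positive</polarity></result>"

request = build_sentiment_request(
    "I love puppies!", "YOUR_APPLICATION_ID", "YOUR_API_KEY"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
print(extract_polarity(SAMPLE_RESPONSE))
```

The sketch only builds the request. To actually send it, you would pass it to urllib.request.urlopen and hand the decoded response body to extract_polarity.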

Step 3: Set up the input text

Now that our API call is set up, we need to provide the operator with some input text.

  • Install the Text Processing extension, the same way you installed the Web Mining extension in Step 1
  • Add an instance of the Text Processing > Create Document operator
  • Select the Create Document operator and add some text by clicking Edit Text
  • Add the Text Processing > Documents to Data operator to convert the Document to an ExampleSet, and set the text attribute parameter to “text”
  • Add the Web Mining > Utility > Encode URLs operator to URL-encode the text, and set the url attribute parameter to “text”
  • Finally, connect the URL-encoded text input to the Enrich Data by Webservice operator created in Step 2
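Conceptually, the last two steps URL-encode the value of the “text” attribute and substitute it for the <%text%> placeholder. The Python sketch below illustrates that idea; fill_template is a hypothetical helper, and the exact encoding RapidMiner produces (for example “+” versus “%20” for spaces) may differ from what is assumed here.

```python
from urllib.parse import quote

# URL template from Step 2, with its <%text%> placeholder.
URL_TEMPLATE = "https://api.aylien.com/api/v1/sentiment?mode=tweet&text=<%text%>"


def fill_template(template, row):
    """Replace each <%name%> placeholder with the URL-encoded value of row[name]."""
    filled = template
    for name, value in row.items():
        # safe="" percent-encodes every reserved character, including "/" and "&".
        filled = filled.replace("<%" + name + "%>", quote(str(value), safe=""))
    return filled


# One example row, standing in for the ExampleSet built from the document.
example_row = {"text": "I love puppies!"}
print(fill_template(URL_TEMPLATE, example_row))
```

Encoding matters because raw text can contain characters such as “&” or “?” that would otherwise be read as part of the URL structure.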

Step 4: Run!

Now that we have everything set up, it’s time to run our process by clicking the Run button.

In our example, the text “I love puppies!” was deemed positive, and the result is now available in RapidMiner for further analysis and reporting. You could use one of the many other methods provided in the Text Processing package to generate any number of documents and analyze their sentiment in the same fashion. Also, by changing the url parameter in the API call, you can access any other endpoint of the Text Analysis API (Concept Extraction, Classification, Summarization and so on).
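To give a feel for what analyzing many documents “in the same fashion” looks like, here is a small Python sketch that applies a polarity function to a list of texts and tallies the results. Both the sample texts and stub_polarity are made up for illustration; the stub merely stands in for a real call to the /sentiment endpoint and does not reflect how the API scores text.

```python
from collections import Counter


def analyze_all(texts, get_polarity):
    """Pair every text with the polarity returned by get_polarity."""
    return [(text, get_polarity(text)) for text in texts]


def stub_polarity(text):
    """Hypothetical stand-in for an API call, so the example runs offline."""
    return "positive" if "love" in text.lower() else "neutral"


# Made-up sample documents, for illustration only.
documents = ["I love puppies!", "The meeting is at noon.", "We love RapidMiner."]

results = analyze_all(documents, stub_polarity)
tally = Counter(polarity for _, polarity in results)

for text, polarity in results:
    print(polarity, "-", text)
print(dict(tally))
```

Keeping the polarity function as a parameter means the stub can later be swapped for a function that really calls the API without touching the loop.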

Next stop: analyzing movie reviews

In the second part of this series, we’re going to crawl Rotten Tomatoes with RapidMiner to extract movie reviews and analyze their sentiment to gain some interesting insights.

For more examples of how we used a similar setup to analyze millions of tweets about various events such as World Cup 2014 and the #AppleLive event, check out our previous blog posts.

Author


Mike Waldron

Head of Marketing & Sales @ AYLIEN

A legal convert with a master’s degree from Smurfit Business School, Mike runs Sales and Marketing at AYLIEN. He gathered his sales and marketing experience with technology companies in Sydney and Dublin before getting the startup itch and joining the team at AYLIEN. Twitter: @MikeWallly