
Feature Update: Microformats Extraction Added to AYLIEN Text Analysis API

We’ve just added support for microformat parsing to our Text Analysis API through our Microformat Extraction endpoint.

Microformats are simple conventions for marking up specific types of information on web pages, such as contact details, reviews, products, people and events.

Microformats are often included in the HTML of pages on the web to add semantic information about that page. They make it easier for machines and software to scan, process and understand webpages. AYLIEN Microformat Extraction allows users to detect, parse and extract embedded Microformats when they are present on a page.
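To make this concrete, here is a rough sketch of what hCard markup can look like inside a page. The HTML snippet is a made-up example (it is not the test page used later in this post), and `listClassNames` is a hypothetical helper written only for illustration; the class names `vcard`, `fn`, `org`, `tel` and `email` come from the hCard convention itself.

```javascript
// Hypothetical hCard markup, held in a string for illustration only.
// The class names (vcard, fn, org, tel, email) are the semantic hooks
// that a microformat parser looks for.
const sampleHCard = `
<div class="vcard">
  <span class="fn">Sally Ride</span>
  <span class="org">Sally Ride Science</span>
  <span class="tel">+1.818.555.1212</span>
  <a class="email" href="mailto:sally@example.com">sally@example.com</a>
</div>`;

// Naive helper (an assumption, not part of any SDK): collect every class
// name that appears in the markup. A real parser handles far more cases.
function listClassNames(html) {
  const names = [];
  const pattern = /class="([^"]+)"/g;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    names.push(...match[1].split(/\s+/));
  }
  return names;
}

console.log(listClassNames(sampleHCard));
```

The point of the sketch is only that the semantic information lives in ordinary `class` attributes; the endpoint described below does the actual detection and parsing for you.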

Currently, the API supports the hCard format. We will be providing support for other formats over the coming months. The quickest way to get up and running with this endpoint is to download an SDK and check out the documentation. We have gone through a simple example below to showcase the endpoint's capabilities.

Microformat Extraction in Action

The following piece of code sets up the credentials for accessing our API. If you don’t have an AYLIEN account, you can sign up here.


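// Load the AYLIEN Text Analysis SDK for Node.js
var AYLIENTextAPI = require('aylien_textapi');

// Replace the placeholder strings with your own credentials
var textapi = new AYLIENTextAPI({
    application_id: 'YOUR_APP_ID',
    application_key: 'YOUR_APP_KEY'
});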

The next piece of code accesses an HTML test page containing microformats that we have set up on CodePen to illustrate how the endpoint works (check out http://codepen.io/michaelo/pen/VYxxRR.html to see the raw HTML). The code consists of a call to the microformats endpoint and a forEach statement that displays any hCards detected on the page.


// Extract microformats from the test page
textapi.microformats('http://codepen.io/michaelo/pen/VYxxRR.html',
    function(err, res) {
    if (err !== null) {
        console.log("Error: " + err);
    } else {
        // Print each hCard found on the page, followed by a separator
        res.hCards.forEach(function(hCard) {
            console.log(hCard);
            console.log("\n****************************************");
            console.log("End Of vCard");
            console.log("****************************************");
        });
    }
});

As you can see from the results below, there are two hCards on the page, one for Sally Ride and the other for John Glenn. The documentation for the endpoint shows the structure of the data it returns and lists the optional hCard fields that are currently supported. You can copy the code above and paste it into our sandbox environment to view the results for yourself and play around with the various fields.

Results


{ birthday: '1951-05-26',
  organization: 'Sally Ride Science',
  telephoneNumber: '+1.818.555.1212',
  location:
   { id: '9f15e27ff48eb28c57f49fb177a1ed0af78f93ab',
     latitude: '37.386013',
     longitude: '-122.082932' },
  photo: 'http://example.com/sk.jpg',
  email: 'sally@example.com',
  url: 'http://sally.example.com',
  fullName: 'Sally Ride',
  structuredName:
   { familyName: 'van der Harten',
     givenName: 'Sally',
     honorificSuffix: 'Ph.D.',
     id: 'fe8b0d3222512769e99cd64d256eeda2cadd2838',
     additionalName: 'K.',
     honorificPrefix: 'Dr.' },
  logo: 'http://www.abc.com/pub/logos/abccorp.jpg',
  id: '7d021199b0d826eef60cd31279037270e38715cd',
  note: '1st American woman in space.',
  address:
   { streetAddress: '123 Main st.',
     countryName: 'U.S.A',
     postalCode: 'LWT12Z',
     id: '00cc73c1f9773a66613b04f11ce57317eecf636b',
     region: 'California',
     locality: 'Los Angeles' },
  category: 'physicist' }

****************************************
End Of vCard
****************************************


{ birthday: '1921-07-18',
  telephoneNumber: '+1.818.555.1313',
  location:
   { id: '265b201c7c65cee9af67cad1400c278a672b092a',
     latitude: '30.386013',
     longitude: '-123.082932' },
  photo: 'http://example.com/jg.jpg',
  email: 'johnglenn@example.com',
  url: 'http://john.example.com',
  fullName: 'John Glenn',
  structuredName:
   { familyName: 'Glenn',
     givenName: 'John',
     id: 'a1146a5a67d236f340c5e906553f16d59113a417',
     additionalName: 'Herschel',
     honorificPrefix: 'Senator' },
  logo: 'http://www.example.com/pub/logos/abccorp.jpg',
  id: '18538282ee1ac00b28f8645dff758f2ce696f8e5',
  note: '1st American to orbit the Earth',
  address:
   { streetAddress: '456 Main st.',
     countryName: 'U.S.A',
     postalCode: 'PC123',
     id: '8cc940d376d3ddf77c6a5938cf731ee4ac01e128',
     region: 'Ohio',
     locality: 'Columbus' } }

****************************************
End Of vCard
****************************************
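Once you have the hCards, you will usually want only a few fields from each one. The sketch below assumes the response shape shown in the results above; `sampleResponse` is a hand-copied subset of those results, and `summarizeHCard` is a hypothetical helper rather than part of the SDK. Because hCard fields are optional (John Glenn's card has no organization, for instance), the helper falls back to null when a field is missing.

```javascript
// Subset of the results above, copied by hand so the example runs offline.
const sampleResponse = {
  hCards: [
    {
      fullName: 'Sally Ride',
      organization: 'Sally Ride Science',
      email: 'sally@example.com',
      address: { locality: 'Los Angeles', region: 'California' }
    },
    {
      fullName: 'John Glenn',
      email: 'johnglenn@example.com',
      address: { locality: 'Columbus', region: 'Ohio' }
    }
  ]
};

// Hypothetical helper: reduce one hCard to a flat summary,
// tolerating any optional fields that are absent.
function summarizeHCard(hCard) {
  const address = hCard.address || {};
  const place = [address.locality, address.region].filter(Boolean).join(', ');
  return {
    name: hCard.fullName || '(unknown)',
    organization: hCard.organization || null,
    email: hCard.email || null,
    place: place || null
  };
}

const summaries = sampleResponse.hCards.map(summarizeHCard);
console.log(summaries);
```

In a real call you would pass `res.hCards` from the callback shown earlier instead of `sampleResponse.hCards`.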

Microformat Extraction allows you to automatically scan and understand webpages by pulling relevant information from their HTML. Microformat data is easy for humans to read and, once extracted, easy for machines to process, particularly when compared with more complex formats such as XML.









Author



Mike Waldron

Head of Marketing & Sales @ AYLIEN

A legal convert with a master's degree from Smurfit Business School, Mike runs Sales and Marketing at AYLIEN. He gained his sales and marketing experience with technology companies in Sydney and Dublin before getting the startup itch and joining the team at AYLIEN. Twitter: @MikeWallly