Batch processing documents with Text API

Making API requests one by one can be inefficient when you have a large number of documents you wish to analyze. We’ve added a batch processing feature to the Text Analysis API that lets you submit all of them at once.

Steps to use this feature are as follows:

Step 1. Package all your documents in one file

Start by putting all your documents (or URLs) in one big text file – one document/URL per line. Example:

Don't panic.
Time is an illusion. Lunchtime doubly so.
For a moment, nothing happened. Then, after a second or so, nothing continued to happen.
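If your documents currently live in separate files, a small shell helper can produce this one-document-per-line layout. This is a minimal sketch, assuming a POSIX shell and a hypothetical directory of `*.txt` files; `pack_documents` and the file names are illustrative, not part of the API. Because the batch format uses newlines as separators, each document's own line breaks are joined with spaces.

```shell
# pack_documents DIR OUT
# Hypothetical helper: joins every *.txt file in DIR into OUT, one document per line.
pack_documents() {
  dir=$1 out=$2
  : > "$out"                        # start with an empty batch file
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue         # skip the literal pattern when nothing matches
    # paste -s joins all lines of one file into a single space-separated line
    paste -s -d ' ' "$f" >> "$out"
  done
}
```

For example, `pack_documents ./docs documents.txt` writes one line per file. For a batch of URLs, simply list one URL per line instead.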

Step 2. Make a batch request and obtain job identifier

Calling the /batch endpoint creates a new analysis job, which is queued and processed asynchronously. /batch accepts the following parameters:

Parameter | Description | Possible values | Required
--- | --- | --- | ---
data | Data to be analyzed † | - | Yes
endpoints | Comma-separated list of Text Analysis API endpoints | classify, concepts, entities, extract, language, sentiment, summarize, hashtags | Yes
entities_type | Type of entities in your file: URLs or texts | text, url | Yes
output_format | Format you wish to download the batch results in (default: json) | json, xml | No

† Maximum file size is 5MB
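Since an oversized upload will be rejected, it can be worth checking the file before sending it. A minimal sketch, assuming a POSIX shell; `check_batch_size` is a hypothetical helper, and because the post doesn't say whether "5MB" means 5,000,000 or 5,242,880 bytes, it uses the stricter 5,000,000.

```shell
# check_batch_size FILE
# Returns 0 if FILE is within the limit, 1 otherwise.
# Assumption: the limit is 5,000,000 bytes (the stricter reading of "5MB").
MAX_BATCH_BYTES=5000000
check_batch_size() {
  size=$(wc -c < "$1" | tr -d ' ')   # byte count; tr strips BSD wc padding
  if [ "$size" -gt "$MAX_BATCH_BYTES" ]; then
    echo "$1 is $size bytes; split it into smaller batches" >&2
    return 1
  fi
  return 0
}
```

If a file is too large, `split -l 10000 documents.txt batch_` breaks it into files of 10,000 lines each, which you can then check and submit as separate jobs.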

All other parameters sent to /batch are passed as-is to each endpoint listed in endpoints. For example:

curl -v -H "X-Mashape-Authorization: YOUR_MASHAPE_KEY" \
    -F data=@"/home/amir/42" \
    -F "endpoints=sentiment" \
    -F "entities_type=text" \
    -F "output_format=xml" \
    -F "mode=tweet" https://aylien-text.p.mashape.com/batch

This uploads the contents of /home/amir/42, indicates that each line is a text (not a URL), requests sentiment analysis, passes mode=tweet through to the sentiment endpoint as-is, and asks for the results in XML format.

A successful request returns 201 Created with a Location header containing the URI you can poll for the status of your job. For your convenience, the URI is also included in the response body.
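To script this step, you can capture the job URI straight from the Location header. A minimal sketch, assuming a POSIX shell and that your key is stored in a `MASHAPE_KEY` environment variable; `extract_location` and `submit_batch` are hypothetical names, and `submit_batch` simply wraps the curl call above with `-D -` (dump response headers to stdout) and `-o /dev/null` (discard the body).

```shell
# extract_location: read raw HTTP response headers on stdin and print the
# Location value. Header names are case-insensitive and HTTP lines end in CRLF.
extract_location() {
  awk 'tolower($1) == "location:" { sub(/\r$/, ""); print $2; exit }'
}

# submit_batch FILE ENDPOINTS
# Hypothetical wrapper around the /batch request; prints the job URI.
submit_batch() {
  curl -s -D - -o /dev/null \
    -H "X-Mashape-Authorization: $MASHAPE_KEY" \
    -F "data=@$1" \
    -F "endpoints=$2" \
    -F "entities_type=text" \
    https://aylien-text.p.mashape.com/batch | extract_location
}
```

Usage: `job_uri=$(submit_batch documents.txt sentiment)`. Reading the header rather than parsing the body keeps the script independent of the output format you chose.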

Step 3. Poll the job status information until it is finished

You can call the URI obtained in the previous step to check the status of your job. A job is always in one of four states: pending, in-progress, failed, or completed. Once your job is completed, you’ll receive a 303 See Other response with a Location header indicating where you can download your results; the same URI is also included in the response body. Example:

curl -H "X-Mashape-Authorization: YOUR_MASHAPE_KEY" \
    -H "Accept: text/xml" \
    "https://aylien-text.p.mashape.com/queue?uuid=68e16fe3-3cde-43dd-86b7-52136b398e0d"

Sample response (XML):

<result><status>completed</status><location>https://textapi-batch-results.s3.amazonaws.com/...</location></result>

And a sample JSON response:

{
    "status": "completed",
    "location": "https://textapi-batch-results.s3.amazonaws.com/..."
}
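Polling can be automated with a simple loop. This is a minimal sketch, assuming a POSIX shell, a `MASHAPE_KEY` environment variable, and that `Accept: application/json` yields the JSON form shown above (by analogy with the `text/xml` example). `json_field` is a crude sed-based extractor that only handles flat responses like these (use `jq` if you have it), and the 60-second default interval is arbitrary.

```shell
# json_field NAME: print the string value of "NAME" from a flat JSON object on stdin.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# poll_job JOB_URI: wait until the job finishes; print the results location.
poll_job() {
  job_uri=$1
  while :; do
    # Without -L, curl does not follow the 303, so we read the status body itself.
    body=$(curl -s -H "X-Mashape-Authorization: $MASHAPE_KEY" \
      -H "Accept: application/json" "$job_uri")
    status=$(printf '%s\n' "$body" | json_field status)
    case $status in
      completed) printf '%s\n' "$body" | json_field location; return 0 ;;
      failed)    echo "batch job failed: $job_uri" >&2; return 1 ;;
      *)         sleep "${POLL_INTERVAL:-60}" ;;  # pending, in-progress, or no reply
    esac
  done
}
```

Usage: `location=$(poll_job "$job_uri")`. Given the possible queueing delays, a generous interval avoids making needless status calls.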

Step 4. Download your results

The location value obtained in the previous step is a pre-signed S3 object URL, which you can download directly with curl or wget. Please note that results are kept for only 7 days after the job finishes and are deleted afterwards. If you don't download them within this period, you will need to re-submit your job.
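A minimal download sketch, assuming a POSIX shell; `download_results` is a hypothetical name. Since the URL is pre-signed, no Mashape header is needed, but it must be quoted: pre-signed S3 URLs carry `&`-separated query parameters that the shell would otherwise interpret.

```shell
# download_results URL OUT
# Saves the batch results at URL to the file OUT.
download_results() {
  # -f: exit non-zero on HTTP errors (e.g. if the link has expired);
  # -sS: no progress bar, but still print errors.
  curl -fsS -o "$2" "$1"
}
```

Usage: `download_results "$location" results.xml`. With wget, the equivalent is `wget -O results.xml "$location"`.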

Happy crunching!

Heads up: Processing time

If your job is urgent, we don’t recommend using the batch processing endpoint. Due to the way our queuing system works, in busy periods you might see a delay of up to a few days before your job is processed. If you would like to execute multiple endpoints for a single document in one call, we recommend using our Combined Calls endpoint instead.

Heads up: Batch calls calculation

A batch of N documents sent to M endpoints counts toward your usage as N * M requests, not as a single request. For example, 1,000 documents analyzed with 3 endpoints count as 3,000 requests. The maximum file size is 5MB.
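To estimate how much of your quota a batch will use before submitting it, you can compute N * M from the file and the endpoints list. A minimal sketch, assuming a POSIX shell; `batch_cost` is a hypothetical helper, and it assumes every non-blank line is one document (the post doesn't say how blank lines are counted).

```shell
# batch_cost FILE ENDPOINTS
# Prints N * M: non-blank lines in FILE times entries in the comma-separated list.
batch_cost() {
  n=$(grep -c . "$1")
  # Put each endpoint on its own line, then count the non-empty lines
  m=$(printf '%s\n' "$2" | tr ',' '\n' | grep -c .)
  echo $((n * m))
}
```

Usage: `batch_cost documents.txt classify,entities,sentiment`.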





Author


Parsa Ghaffari

CEO and Founder of AYLIEN. Parsa is an AI, Machine Learning and NLP enthusiast whose aim is to make these techniques and technologies more accessible and easier to use for developers and data scientists. When he’s not working, he likes to play chess ('parsabg' on lichess.org). Twitter: @parsaghaffari