The Difference Between Entities and Keywords, and When To Use Them
Searching for news should be an efficient, accurate and insightful process, no matter what role you’re in, be it Risk Intelligence, Media Monitoring, or Investment Research to name just a few. Ideally, your search will return all the relevant and reliable information you need with the minimum input in the fastest time possible.
Unfortunately, for analysts in the risk, finance or media space, this is often not the case. Traditional media intelligence processes and workflows rely heavily on manual keyword searches, which in today’s lightning-fast world are no longer good enough. Inefficiencies aside, overreliance on keyword-based search and investigation is problematic because keywords are prone to returning irrelevant results due to lexical ambiguities (e.g. Apple vs apple). Working around this problem requires lengthy, complex queries that take a huge amount of time to create, update and maintain, which hurts productivity.
There is a better way of searching, however, that focuses on more than just keywords.
Using Artificial Intelligence (AI), and specifically Machine Learning (ML) and Natural Language Processing (NLP), to search for news by entities helps save time and increase precision. It also reduces false positives and provides a variety of other insights that were not previously possible. This blog post looks at the differences between entities and keywords, and the benefits of using the former over the latter.
What Is An Entity?
The Oxford English Dictionary offers a basic starting point, defining an entity as ‘a thing with distinct and independent existence’. In the context of this blog, news articles contain mentions of things like people, places, products, companies, and even concepts. Collectively we call these entities. In theory, each one is distinct and independent of other entities, so it can be used as a search input to deliver more relevant news articles with minimal effort from the end user. There are, of course, problems that arise with entity extraction too.
For instance, two or more entities can have identical spellings but be very different things (again, apple vs Apple, or jaguar vs Jaguar). Unlike a keyword search, however, an NLP-based approach such as the one in our News API predicts which entity is being referred to by considering the rest of the article or document for context, and by drawing on a knowledge base such as DBpedia/Wikipedia for further context. Taking the above ‘Apple vs apple’ example, it will filter out articles about the fruit if you are looking for articles about the company. This process is called Named Entity Disambiguation.
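To make the idea of using context concrete, here is a minimal sketch in Python. It is not how the News API works internally: it assumes a tiny, hypothetical knowledge base of context words per candidate entity and simply picks the candidate whose words overlap most with the article text. The names KNOWLEDGE_BASE, tokenize and disambiguate are all made up for illustration.

```python
import re

# Hypothetical mini knowledge base: each candidate entity is described by a
# few context words. A real system would draw on something like DBpedia.
KNOWLEDGE_BASE = {
    "Apple Inc.": {"iphone", "company", "shares", "technology", "ceo", "mac"},
    "apple (fruit)": {"fruit", "orchard", "tree", "juice", "harvest", "pie"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(article_text, knowledge_base=KNOWLEDGE_BASE):
    """Return the candidate entity sharing the most context words with the text."""
    words = tokenize(article_text)
    scores = {
        entity: len(words & context)
        for entity, context in knowledge_base.items()
    }
    # On a tie, max() keeps the first candidate listed in the knowledge base.
    return max(scores, key=scores.get)


print(disambiguate("Apple shares rose after the company unveiled a new iPhone."))
print(disambiguate("The orchard expects a strong apple harvest for juice and pie."))
```

Counting shared words is a deliberately crude stand-in for the statistical models a production system would use, but it shows why the surrounding text matters more than the ambiguous word itself.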
The Benefit of Entities Over Keywords
To demonstrate the benefits of entities further, let’s look at an example of a problematic keyword search. In this case, searching for the music streaming company ‘Pandora’ opens a Pandora’s Box of ambiguities and irrelevant articles.
Analysts will quickly realise that there are results returned for the music streaming service, but also results for the Danish jewellery company called Pandora, the Greek myth, the DC comic character, the feature film, the TV series, the novel, the theme park, the musician, the sculpture, the places, and even the fungus! The list goes on.
A boolean search could help you filter to only include the music streaming company, but queries can get very long and complicated very quickly. Here’s a short version of what a search string could look like to filter out some of the other main results for ‘Pandora’:
“Pandora” NOT (“Pandora A/S” OR “Jewellery” OR “Denmark” OR “Danish” OR “Charms” OR “Rings” OR “Bracelets” OR “Necklaces” OR “Earrings” OR “Pandora film” OR “Park Jung-woo” OR “South Korean disaster film” OR … you get the picture ;)
Now imagine doing that for every query, and having to update and maintain queries as time goes by.
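The maintenance burden is easier to see when the query is assembled programmatically. The following is a rough sketch, assuming a generic boolean syntax with quoted phrases, OR and NOT; the helper build_exclusion_query is hypothetical and not part of any particular search product. The exclusion terms are taken from the example string above.

```python
def build_exclusion_query(term, excluded_terms):
    """Build a boolean query that matches `term` but none of `excluded_terms`."""
    quoted = " OR ".join(f'"{t}"' for t in excluded_terms)
    return f'"{term}" NOT ({quoted})'


# Exclusions for just two of the unwanted meanings of "Pandora".
jewellery_terms = ["Pandora A/S", "Jewellery", "Denmark", "Danish", "Charms",
                   "Rings", "Bracelets", "Necklaces", "Earrings"]
film_terms = ["Pandora film", "Park Jung-woo", "South Korean disaster film"]

query = build_exclusion_query("Pandora", jewellery_terms + film_terms)
print(query)
print(len(query), "characters so far")
```

Every additional meaning of the word needs its own list of exclusion terms, and each list has to be revisited whenever the news changes, so the query only ever grows.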
In contrast, let’s see what the boolean string looks like beside the entity search for the exact same search intent:
That’s how quick and easy it is. Leveraging NLP as part of your news discovery and investigation process doesn’t have to be difficult. As part of our News API we offer upwards of 25 different search and filtering parameters, including entity-, concept- and category-based search, so you can find what matters to you in news content without having to rely on non-scalable keyword queries. You can also read more about when to use entities over keywords in our documentation.
When To Use Keywords
This is not to say that keywords are obsolete. Far from it. They are an essential search tool with two very particular uses in this context.
The first is when searching for an entity that is not very well known, such as one that doesn’t have a Wikipedia entry. The vast majority of people do not have a Wikipedia page dedicated to them, so if a company is carrying out Know Your Customer (KYC) or Customer Due Diligence (CDD) research on a little-known entity or individual, a keyword-based search is perhaps the best route to take.
The other way that keywords are valuable is when refining an entity search. For example, combining the entity search Pandora Music with the keyword search ‘number of customers’ narrows the results to articles that are about the right Pandora and that also discuss the specific topic you are looking for.
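As a rough illustration of combining the two, the sketch below filters a handful of made-up articles first by entity and then by keyword phrase. The articles, the entity labels and the entity_keyword_search helper are all hypothetical; they stand in for whatever entity tags and query parameters a real search service exposes.

```python
# Hypothetical, hand-tagged articles; the "entities" labels are illustrative only.
ARTICLES = [
    {"title": "Streaming service reports growth",
     "body": "Pandora said the number of customers on its paid tier rose.",
     "entities": ["Pandora (music streaming)"]},
    {"title": "Jeweller opens new stores",
     "body": "Pandora said the number of customers visiting its shops rose.",
     "entities": ["Pandora (jewellery)"]},
    {"title": "Streaming service adds podcasts",
     "body": "Pandora announced new podcast features for listeners.",
     "entities": ["Pandora (music streaming)"]},
]


def entity_keyword_search(articles, entity, keyword):
    """Keep articles tagged with `entity` whose body contains `keyword`."""
    needle = keyword.lower()
    return [
        article for article in articles
        if entity in article["entities"] and needle in article["body"].lower()
    ]


results = entity_keyword_search(
    ARTICLES, "Pandora (music streaming)", "number of customers"
)
for article in results:
    print(article["title"])
```

The keyword alone would also match the jewellery article, and the entity alone would also match the podcast article. Only the combination isolates the one article that is about the streaming company and its customer numbers.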
To test Aylien’s News API and discover the benefits of entity search for yourself, you can sign up for a free 14-day trial here.