Last updated on February 14, 2022 by Balvinder Dang
Your company stands to benefit greatly from entity extraction, but technology jargon can sometimes get in the way of understanding its true business value. Here’s what entity extraction can do for your enterprise.
What is entity extraction?
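Entity extraction is a technique for automatically generating knowledge from content, and more specifically from documents: it identifies the people, organisations, places, dates, specialist terms and other items (the entities) that a document mentions.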
Enterprise organisations have focused on the management and analysis of structured data to power business intelligence applications and decision support.
However, this data is dwarfed by the amount of data held in documents.
Examples include:
- Patient medical records and case notes in healthcare;
- Regulations, studies and formulations in pharmaceuticals;
- Policies, claims and schedules in insurance.
The common feature is that documents have no standard format.
Even a simple document such as an invoice can come in many layouts and variants that frustrate machine processing. The default approach is to have human experts read documents to understand their context and relevance.
Clearly, this is hugely expensive, as well as slow and unreliable.
The problem is particularly acute when a business needs to prove a negative:
“Tell me if we have ever seen this claim / event / customer before”
Entity extraction promises to automate this process: reducing time and cost, standardising quality and massively increasing the amount of data that can be brought to bear on difficult use cases such as drug effectiveness, fraud detection and customer relationship management. Looking to the future, large, rich data sets are going to be particularly valuable for powering AI applications.
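To make the “have we ever seen this claim before?” question concrete, here is a minimal sketch of how extracted entities can answer it. It assumes a claim is identified by three extracted entities (claimant name, postcode and incident date); these fields and every name in the code (`ClaimKey`, `make_key`, `ClaimIndex`) are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimKey:
    """Normalised entities identifying a claim (hypothetical choice of fields)."""
    claimant: str
    postcode: str
    incident_date: str  # ISO 8601, e.g. "2021-11-03"


def normalise_name(name: str) -> str:
    # Lower-case and collapse whitespace so "Jane  SMITH" matches "jane smith".
    return " ".join(name.lower().split())


def normalise_postcode(postcode: str) -> str:
    # Compare UK postcodes without spaces and in upper case.
    return re.sub(r"\s+", "", postcode).upper()


def make_key(claimant: str, postcode: str, incident_date: str) -> ClaimKey:
    return ClaimKey(normalise_name(claimant), normalise_postcode(postcode), incident_date)


class ClaimIndex:
    """Remembers every claim key seen so far; lookups are O(1) on average."""

    def __init__(self) -> None:
        self._seen: set[ClaimKey] = set()

    def add(self, key: ClaimKey) -> None:
        self._seen.add(key)

    def seen_before(self, key: ClaimKey) -> bool:
        return key in self._seen


index = ClaimIndex()
index.add(make_key("Jane Smith", "SW1A 1AA", "2021-11-03"))
print(index.seen_before(make_key("jane  SMITH", "sw1a1aa", "2021-11-03")))  # True
```

Exact matching on normalised keys is the simplest possible design. It shows why extraction matters (a negative can only be proven once every document has been reduced to comparable entities), but a real fraud-detection system would typically add fuzzy matching on top.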
Some entities are trivial to extract, for example a date or a postcode. But even for these examples, questions arise as to what the date or postcode relates to:
- Was it the date the document originated, or when an event occurred?
- Is it the customer’s address or that of a third party?
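A minimal sketch shows both halves of this point. The regular expressions below (simplified patterns for ISO and dd/mm/yyyy dates and a loose UK postcode shape, all assumptions rather than production-grade rules) find the entities easily, but only the surrounding words hint at what each one relates to. The `extract` function and its `window` parameter are hypothetical names.

```python
import re

# Deliberately simplified patterns (assumptions, not production-grade):
# ISO dates (2021-10-28) or dd/mm/yyyy dates, and a loose UK postcode shape.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b")
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2})\b")


def extract(text: str, window: int = 30) -> list[dict]:
    """Find dates and postcodes, keeping the preceding text as context."""
    entities = []
    for label, pattern in (("DATE", DATE_RE), ("POSTCODE", POSTCODE_RE)):
        for match in pattern.finditer(text):
            start = match.start()
            entities.append({
                "type": label,
                "value": match.group(1),
                "start": start,
                # The words just before a match are the main clue to its role.
                "context": text[max(0, start - window):start].strip(),
            })
    return sorted(entities, key=lambda e: e["start"])


text = ("Letter dated 03/11/2021. Accident on 2021-10-28 at SW1A 1AA; "
        "witness lives at M1 1AE.")
for entity in extract(text):
    print(entity["type"], entity["value"], "<-", entity["context"])
```

The patterns alone cannot tell the letter date from the accident date, or the accident location from the witness's address; deciding that requires the context, which is where the harder part of entity extraction lies.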
And having extracted these entities, how should they be managed and applied to solve business problems?
These questions can be answered when entity extraction is implemented to solve the specific problems your enterprise faces, generating value across departments.
The value of entity extraction for your enterprise
The combination of data models and low-cost compute opens the door to entity extraction. The data models may be pre-existing ontologies (for example SNOMED for medical terms) or unique data models developed from training data.
The data models “understand” specialised terms, their meaning and more importantly how they relate to other concepts and entities.
These relationships can be extremely complicated and are poorly suited to the standard relational database technologies dominant in enterprise computing.
Fortunately, computer science has already developed the techniques to model and manage entity data.
Graph databases represent entities as nodes and relationships as edges, which makes hierarchical, parent-child structures natural to model.
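To illustrate why a graph suits hierarchies, here is a minimal in-memory sketch. It is not a graph database, and the body-part hierarchy and the `Hierarchy` class are made up for illustration. A question such as “how many hand injuries?” becomes a walk down the graph, so injuries recorded against a finger, thumb, nail or palm are all counted.

```python
from collections import defaultdict


class Hierarchy:
    """Toy parent-child graph: an edge means 'child is part of parent'."""

    def __init__(self) -> None:
        self.children: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, parent: str, child: str) -> None:
        self.children[parent].add(child)

    def descendants(self, node: str) -> set[str]:
        """All nodes below `node`, found with an iterative depth-first walk."""
        found: set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for child in self.children.get(current, ()):
                if child not in found:  # also protects against cycles
                    found.add(child)
                    stack.append(child)
        return found


body = Hierarchy()
body.add("hand", "finger")
body.add("hand", "thumb")
body.add("hand", "palm")
body.add("finger", "nail")

injuries = ["nail", "thumb", "knee", "palm"]
scope = body.descendants("hand") | {"hand"}
print(sum(1 for part in injuries if part in scope))  # 3
```

In a relational table the same question needs a recursive query or a fixed number of self-joins; in a graph it is a plain traversal, however deep the hierarchy goes.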
Semantic technologies (such as RDF triples) give an elegant method of capturing relationship data. For example:
- If a customer lives in London, we can infer that they also live in England, even though this fact was not present in the original data;
- A simple query about the number of finger injuries can also match synonyms (e.g., finger/digit/phalanx) and comparable terms such as thumb, nail and palm.
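Both examples can be sketched in a few lines. Facts are stored as (subject, predicate, object) triples in the spirit of RDF, and one inference rule is applied until no new facts appear. The predicate names `livesIn` and `locatedIn`, the synonym table and the function names are hypothetical. Production semantic stores express such rules in standards like RDFS/OWL and query them with SPARQL; this Python only mimics the idea.

```python
# Toy triple store: (subject, predicate, object) facts, in the spirit of RDF.
Triple = tuple[str, str, str]

FACTS: set[Triple] = {
    ("customer:42", "livesIn", "London"),
    ("London", "locatedIn", "England"),
    ("England", "locatedIn", "United Kingdom"),
}


def infer(facts: set[Triple]) -> set[Triple]:
    """Apply one rule until nothing new appears:
    (x livesIn a) and (a locatedIn b)  =>  (x livesIn b)."""
    result = set(facts)
    while True:
        new = {
            (x, "livesIn", b)
            for (x, p, a) in result if p == "livesIn"
            for (a2, q, b) in result if q == "locatedIn" and a2 == a
        } - result
        if not new:
            return result
        result |= new


# Hypothetical synonym table; an ontology would normally supply this.
SYNONYMS: dict[str, set[str]] = {"finger": {"finger", "digit", "phalanx"}}


def count_mentions(terms_in_records: list[str], query: str) -> int:
    """Count records whose extracted term matches the query or a synonym."""
    accepted = SYNONYMS.get(query, {query})
    return sum(1 for term in terms_in_records if term in accepted)


inferred = infer(FACTS)
print(("customer:42", "livesIn", "England") in inferred)  # True
print(count_mentions(["digit", "knee", "phalanx", "finger"], "finger"))  # 3
```

Note that the England fact is never written down; it is derived from London's position in the geography, exactly as described above.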
With entities extracted and relationships defined, the final step is to make this data searchable from a user interface, typically through existing internal applications or business intelligence tools.
Entity extraction: 6 problems to overcome to maximize value
There are 6 fundamental problems to overcome to implement entity extraction properly, all of which can be solved through a rigorous design process:
1. Creating the training model if no ontology exists
Clearly, both the quantity and the quality of the data are a big part of solving the entity extraction puzzle. Without an existing ontology to work with, gathering the necessary information from public sources and building a training model takes a lot longer.
That’s why Datavid stands on the shoulders of giants, building on the work of public organisations (or private enterprises) when solving this first problem. Simply sourcing the data without expert curation, however, leads to lower-quality results.
2. Gathering data from multiple sources (data silos)
Once each data source is identified, the next problem is bringing it all together in one place. Each source will have its own schema, making it practically impossible to extract entities consistently without first unifying the data.
Orchestrating extracts is also an important part of the data integration process. Some data can be imported immediately, whereas other extracts may need to be scheduled if the data must be refreshed regularly.
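Unifying the data usually starts with mapping each silo's fields onto one canonical model. The sketch below assumes two hypothetical sources (`crm` and `claims`) whose field names are invented for illustration; it renames fields, drops anything unmapped and records where each row came from.

```python
from typing import Any

# Hypothetical mappings from each silo's field names to one canonical model.
MAPPINGS: dict[str, dict[str, str]] = {
    "crm": {"cust_name": "name", "post_code": "postcode", "created": "date"},
    "claims": {"ClaimantName": "name", "Postcode": "postcode", "LossDate": "date"},
}


def to_canonical(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename a source record's fields to the canonical schema."""
    mapping = MAPPINGS[source]
    # Fields without a mapping are dropped; missing fields are simply absent.
    canonical = {target: record[field]
                 for field, target in mapping.items() if field in record}
    canonical["source"] = source  # keep provenance so results can be traced back
    return canonical


crm_row = {"cust_name": "Jane Smith", "post_code": "SW1A 1AA", "created": "2021-11-03"}
claim_row = {"ClaimantName": "Jane Smith", "Postcode": "SW1A 1AA", "LossDate": "2021-10-28"}
unified = [to_canonical("crm", crm_row), to_canonical("claims", claim_row)]
print(unified)
```

Keeping the mappings as data rather than code means a new silo can be added by writing one more dictionary entry, and the `source` field preserves provenance for auditing.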
3. Managing multiple data types in one database
Getting data of different types (graph, semantic, tabular) to work together while keeping each in its original form is tricky. Multi-model databases such as MarkLogic Server can do this, but it is hard to achieve with other technologies.
Without the proper technologies and know-how, this problem can make or break an entity extraction project. At Datavid we recommend MarkLogic because of its 3-in-1 formula: multi-model database, data hub, and cognitive search engine.
4. Presenting the data in a business-friendly format
To make the most of the information extracted, a subject matter expert must be able to access it in a user-friendly, intuitive interface made for humans. They shouldn’t have to write code, edit spreadsheets, or anything in between.
One way to achieve this is to integrate the entity extraction engine with an existing business intelligence application. Another is to use a data discovery platform (such as Datavid Rover) designed for this specific purpose.
5. Security and data curation across the system
All businesses deal with data today, but enterprises in particular handle high volumes of sensitive data on a daily basis. Without a secure implementation, your intellectual property and your customers’ private information are at risk.
The problem here is all-encompassing: everything has to be developed with security and data curation in mind. The ability to infer highly personal facts from snippets of data also has important privacy and legal ramifications.
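One building block is filtering extracted entities by the viewer's role before they reach any application. The sketch below is illustrative only: the sensitive entity types, the `sensitive-reader` role and the `view_for` function are assumptions, not a recommended policy and not any product's security API.

```python
from dataclasses import dataclass

# Illustrative policy (assumption): these entity types are sensitive, and only
# users holding the hypothetical "sensitive-reader" role may see their values.
SENSITIVE_TYPES = {"PERSON", "POSTCODE", "DIAGNOSIS"}
PRIVILEGED_ROLE = "sensitive-reader"


@dataclass(frozen=True)
class Entity:
    type: str
    value: str


def view_for(entities: list[Entity], roles: set[str]) -> list[Entity]:
    """Return the entities as a user with `roles` is allowed to see them."""
    if PRIVILEGED_ROLE in roles:
        return list(entities)
    return [Entity(e.type, "[REDACTED]") if e.type in SENSITIVE_TYPES else e
            for e in entities]


record = [Entity("PERSON", "Jane Smith"), Entity("DATE", "2021-11-03"),
          Entity("DIAGNOSIS", "fractured phalanx")]
print(view_for(record, {"analyst"}))
```

Applying the filter at the data layer, rather than in each user interface, means every application inherits the same rules. Masking individual values does not by itself stop sensitive facts from being inferred from combinations of the remaining entities, which is why security has to be designed in across the whole system.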
6. Delivering in a timely and cost-effective manner
Data projects at the enterprise level are complex; they require the right technical expertise and management capacity to be delivered successfully. Starting an entity extraction project without properly assessing these resources is a big risk.
To cut overall project costs and deliver a working solution on time, hiring an external partner is often the best way to go. Datavid, for example, has decades of combined experience delivering large-scale data projects.
With these 6 problems (and solutions) in mind, you are ready to start considering your options and extracting additional value from your enterprise’s documents.
The best tool for your entity extraction project
There’s no “one-size-fits-all” solution to a successful entity extraction project at the enterprise level. But there is a next-best thing: a data discovery platform.
With platforms like Datavid Rover, your enterprise can almost immediately solve the data silo, data type, and security problems associated with entity extraction.
Having a secure application as a baseline to start feeding and analysing your data also increases the likelihood of success and cuts down the delivery time.
To learn more about what Datavid Rover can do for your enterprise, request a free 30-minute consultation and start extracting valuable entities from your documents.