We think data curation is cool – and here’s why you should too.
People from a publishing background will recognise the importance of curation – selecting content for a purpose based on its characteristics.
That same mentality can be applied to enterprise data.
A large business has so much data, in so many places, with different data structures, data models and data standards.
Search
The first problem is search: how can a business user even find information? The second problem is relevance: how can you find the right information when you don’t know what is there?
The answer is to understand the relationship between data elements. We need a rich, inclusive language for describing how data elements relate according to their type.
How we think about data-intensive projects
Datavid’s approach is to use standard coding frameworks and commercial off-the-shelf (COTS) software.
This way, we stand on the shoulders of giants to deliver cost-effective business results.
Fortunately, the computer science for managing relationship data has already been created.
Semantics has a beguilingly simple construct:
Subject -> Predicate -> Object
(e.g. Andrew lives in London)
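As a rough illustration, a triple can be modelled as a three-field record, and a collection of triples can be queried by leaving any field open. The Python sketch below is not Datavid's implementation: the `Triple` type, the `match` helper, and every sample fact other than "Andrew lives in London" are assumptions made up for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Only the first fact comes from the article; the others are illustrative.
facts = {
    Triple("Andrew", "lives in", "London"),
    Triple("Maria", "lives in", "Paris"),
    Triple("Andrew", "works for", "Acme"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return, in sorted order, the triples that agree with every given field."""
    return sorted(
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


# "Where does Andrew live?" leaves only the object open.
print(match(facts, subject="Andrew", predicate="lives in"))
```

A production system would typically hold such facts in a dedicated triple store rather than an in-memory set, but the shape of the data is the same.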
Context makes data intelligent: for example, it lets a system understand that a Purchase Order relates to an Invoice. We can also infer new facts.
If we know a person lives in London, we can answer the question:
Do they live in England?
But how did the system know that London is in England?
When data is structured so that machines can understand it and humans can easily retrieve it, the system feeds itself this knowledge through iterations of data mining, labeling, and linking.
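One way to picture this kind of inference is as a small rule applied repeatedly to linked facts until nothing new turns up. The following Python sketch assumes two hand-written rules and a hypothetical "is in" predicate; the facts about England and the United Kingdom are added here for illustration, and real semantic platforms use far richer rule languages than this.

```python
def infer(facts):
    """Apply two simple rules until no new fact can be derived."""
    known = set(facts)
    while True:
        new = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                if o != s2 or p2 != "is in":
                    continue
                # (X lives in Y) + (Y is in Z)  =>  (X lives in Z)
                # (X is in Y)    + (Y is in Z)  =>  (X is in Z)
                if p in ("lives in", "is in"):
                    new.add((s, p, o2))
        if new <= known:
            return known
        known |= new


facts = {
    ("Andrew", "lives in", "London"),
    ("London", "is in", "England"),
    ("England", "is in", "United Kingdom"),
}
derived = infer(facts)
print(("Andrew", "lives in", "England") in derived)  # True
```

Nobody stated that Andrew lives in England; the answer follows from linking the fact about Andrew to the fact about London.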
The role of ontologies (i.e. structured dictionaries)
The technology already exists to create “structured dictionaries” or, as they are known in the field, ontologies.
An ontology explains hierarchies and synonyms.
By this means, we can start to understand degrees of separation between entities.
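To make the idea of degrees of separation concrete, here is a minimal Python sketch of a toy ontology with a hierarchy and a synonym table. The terms, the `broader` and `synonyms` mappings, and the `degrees_of_separation` function are all hypothetical names chosen for this example; counting hops with a breadth-first search is one simple way to measure distance, not necessarily the method any particular product uses.

```python
from collections import deque

# Hierarchy: each term points to its broader (parent) term.
broader = {
    "London": "England",
    "Manchester": "England",
    "England": "United Kingdom",
    "Scotland": "United Kingdom",
}
# Synonyms: alternative labels mapped onto one preferred term.
synonyms = {"UK": "United Kingdom", "U.K.": "United Kingdom"}


def canonical(term):
    """Resolve a synonym to its preferred term."""
    return synonyms.get(term, term)


def degrees_of_separation(a, b):
    """Count hierarchy links between two terms; None if they are unconnected."""
    start, goal = canonical(a), canonical(b)
    # Treat each broader/narrower link as walkable in both directions.
    neighbours = {}
    for child, parent in broader.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        term, dist = queue.popleft()
        if term == goal:
            return dist
        for nxt in neighbours.get(term, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(degrees_of_separation("London", "UK"))          # 2
print(degrees_of_separation("London", "Manchester"))  # 2
```

The synonym table is what lets a search for "UK" land on the same entity as "United Kingdom", while the hierarchy supplies the distance between entities.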
This is the world that Datavid inhabits – leveraging existing artifacts and applying our methodology and know-how to solve complex data problems through concepts such as ontologies and entities.
The role of all of this?
To help business users find hidden information that is relevant to them, which is paramount to the success of a modern enterprise looking to streamline its processes.
Ontologies aren’t the only piece of the puzzle though…
Bringing data together in one place
If you want to leverage your information effectively, it’s key that you develop a strategy for it, sometimes referred to as an Enterprise Data Management framework.
Without the expertise to back up this effort, it takes time to master the best practices and find real value in your data management journey.
That’s why Datavid is committed to getting you there faster.
Our team of expert MarkLogic consultants has worked on dozens of data-intensive projects requiring data curation and enterprise search expertise.
We work with data hubs and warehouses to source all of your raw data and give it context; then we surface it through intuitive web interfaces which employees can feel comfortable using.
If you think that data curation (and discovery) is the perfect tool for your enterprise to make an impact both internally and on the broader market, give us a hint.
We’d love to guide your way forward.
Frequently asked questions
An example of data curation is the process of selecting, organizing, and managing a collection of research data for long-term usability and accessibility. For instance, a data curator may curate a dataset by documenting its metadata, ensuring proper formatting and consistency, and applying data quality checks.
The steps of data curation include selection, documentation, organization, quality assurance, preservation, access and sharing, and discovery and reuse.
Data curation is used by various stakeholders, including researchers, data scientists, librarians, archivists, and data professionals, but it can be used as an approach by anyone in any industry.