Articles on all things data | Datavid blog

Data centricity and FAIR: Creating value from data

Written by Avinash Dixit | Jul 4, 2024

Data-centricity emphasises placing data at the core of organisational strategies and operations.

The FAIR principles complement this approach by ensuring data is managed effectively to maximise value, empowering businesses to cultivate a culture of data-driven decision-making that fuels continuous improvement and innovation.

Fragmentation to cohesion

Data centricity is a modern approach that emphasises organising and managing data based on the scientific processes and insights it supports rather than the specific applications or platforms that handle it.

This paradigm shift is rooted in the belief that the actual value of data lies in its potential to be reused, shared, and repurposed across various contexts and applications, fostering greater scientific collaboration and innovation.

Data-centricity enables:

  • High-quality, reliable data: Fundamental for sound scientific conclusions.
  • Efficiency: Saves time and resources by avoiding the need to collect or process the same data repeatedly.
  • Interdisciplinary utilisation: Facilitates data usage across different fields, fostering interdisciplinary research and potentially leading to new scientific discoveries.

Managing data independently 

Data-centricity advocates for managing data independently of the specific systems or tools used to collect or process it.

This means designing data architectures that prioritise the long-term preservation and accessibility of data over the particularities of any one system.

The benefits to organisations include:

  • Cleaner data: More researchers can access it quickly, accelerating data discovery and increasing transparency.
  • Common inventory language: Reduces duplication of effort and addresses the lack of transparency.
  • Reduced data management burden: Frees scientists from the burden of data management, aligning with regulatory agencies' directions.
  • Standardised collaboration: Transitions from chaotic to standardised data, enabling better collaboration with external partners.

Data management woes

  • Persistent data silos: Continue to delay critical decision-making processes, hindering overall efficiency.
  • Inefficient data science efforts: Around 80% of data science efforts are consumed by data wrangling tasks, reducing the time available for analysis.
  • High IT costs: A substantial 50-60% of the IT budget is allocated to integration efforts, indicating significant expenditure that could be optimised with better data management solutions.

Enterprise Knowledge Graphs and data-centricity

Enterprise Knowledge Graphs (EKGs) and data-centricity are closely intertwined concepts that, when combined, provide a robust framework for managing, integrating, and leveraging data within an organisation.

An Enterprise Knowledge Graph (EKG) is an advanced data model that interlinks and contextualises data from various organisational sources, forming a comprehensive knowledge network.

It uses RDF to create interconnected datasets, applies ontologies for semantic context, and adapts flexibly to changes without extensive rework.

EKGs support complex queries via SPARQL, enabling seamless data integration, enhanced data discovery through semantic relationships, deeper contextual insights, and data reusability across multiple applications.
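As a rough illustration of the idea, the sketch below models a tiny knowledge graph as subject-predicate-object triples in plain Python, with a wildcard pattern match standing in for a SPARQL graph pattern. It is not RDF or SPARQL itself: the `ex:` identifiers, the two source systems, and the `match` helper are assumptions made for this example only.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Facts from two hypothetical source systems, expressed as triples.
crm_triples: Set[Triple] = {
    ("ex:customer/42", "rdf:type", "ex:Customer"),
    ("ex:customer/42", "ex:name", "Acme Labs"),
}
lims_triples: Set[Triple] = {
    ("ex:sample/7", "rdf:type", "ex:Sample"),
    ("ex:sample/7", "ex:submittedBy", "ex:customer/42"),
}

# Integration is a set union: shared identifiers link the two sources.
graph: Set[Triple] = crm_triples | lims_triples


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def samples_with_customer_name(graph: Set[Triple]) -> List[Tuple[str, str]]:
    """Join across sources: sample -> submitting customer -> customer name."""
    results = []
    for sample, _, customer in match(graph, p="ex:submittedBy"):
        for _, _, name in match(graph, s=customer, p="ex:name"):
            results.append((sample, name))
    return results


print(samples_with_customer_name(graph))
```

Because both sources refer to the customer by the same identifier, the join needs no application-specific mapping code. A production EKG would typically use an RDF triple store and SPARQL for the same purpose.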

The following image depicts the transformation from application-centric to data-centric architecture using EKGs.

Linking FAIR principles with data-centricity

Linking FAIR (Findable, Accessible, Interoperable, Reusable) principles with data-centricity involves understanding how these principles can enhance a data-centric approach in research and data management.

This comprehensive exploration will delve into the core aspects of FAIR data, its significance in contemporary research and data management practices, and its influence on various stakeholders across multiple sectors.

FAIR principles

Findability

Metadata and data are identified by URIs, or Uniform Resource Identifiers (F1), described with rich RDF-based metadata (F2), explicitly linked (F3), and indexed in searchable resources like triple stores (F4).

Accessibility

Data is retrieved via standardised protocols like SPARQL, supporting open, free, and authenticated access (A1, A1.1, A1.2). Metadata, stored in separate named graphs (A2), remains accessible even if data is unavailable.
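To make the A2 point concrete, here is a minimal sketch that models named graphs as a Python dictionary and shows the metadata surviving after the data graph is retired. The graph names, the `dcterms:` properties used, and the `describe` helper are assumptions for illustration, not part of any specific triple store.

```python
# Named graphs modelled as a dict: graph name -> set of triples.
store = {
    "ex:graph/metadata": {
        ("ex:dataset/1", "dcterms:title", "Assay results 2023"),
        ("ex:dataset/1", "dcterms:license", "CC BY 4.0"),
    },
    "ex:graph/data": {
        ("ex:dataset/1", "ex:hasRecord", "ex:record/1001"),
    },
}


def describe(store, dataset):
    """Return the dataset's metadata and whether its data is still held."""
    metadata = sorted(
        t for t in store.get("ex:graph/metadata", set()) if t[0] == dataset
    )
    data_available = any(
        t[0] == dataset for t in store.get("ex:graph/data", set())
    )
    return {"metadata": metadata, "data_available": data_available}


# Retire the data graph; the metadata graph is untouched.
del store["ex:graph/data"]

info = describe(store, "ex:dataset/1")
print(info["data_available"], len(info["metadata"]))
```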

Interoperability

Data uses RDF, and metadata uses OWL in searchable triple stores (I1). Vocabularies are in RDF and OWL, following FAIR principles (I2) and using qualified references (I3).

Reusability

Metadata includes detailed usability attributes (R1). Clear usage licenses and detailed provenance are provided (R1.1, R1.2). Data and metadata adhere to domain-relevant community standards (R1.3).

FAIR principles in data-centric decision making

In a data-centric approach, data is at the core of decision-making and processes. Ensuring that data is easily findable (F) enhances this approach by making relevant data readily available to stakeholders.

FAIR: Implementing metadata standards, using persistent identifiers (like DOIs), and maintaining well-organised data repositories makes data findable. This supports data-centric operations by reducing the time spent searching for data.
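One way to picture how organised metadata cuts search time is a keyword index over a catalogue keyed by persistent identifier. The sketch below is a simplified, in-memory illustration; the catalogue entries and DOI-style identifiers are made-up placeholders, and `build_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical catalogue entries keyed by a persistent identifier.
catalogue = {
    "doi:10.1234/example.a": {
        "title": "Rainfall measurements 2020",
        "keywords": ["climate", "rainfall"],
    },
    "doi:10.1234/example.b": {
        "title": "Soil moisture survey",
        "keywords": ["soil", "climate"],
    },
}


def build_index(catalogue):
    """Map each lower-cased title word and keyword to matching identifiers."""
    index = defaultdict(set)
    for identifier, meta in catalogue.items():
        terms = meta["title"].lower().split()
        terms += [k.lower() for k in meta["keywords"]]
        for term in terms:
            index[term].add(identifier)
    return index


def search(index, term):
    """Return identifiers whose metadata mentions the term, sorted."""
    return sorted(index.get(term.lower(), set()))


index = build_index(catalogue)
print(search(index, "climate"))
```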

Access (A) to data is crucial for a data-centric approach, where data needs to be readily available for analysis and decision-making.

FAIR: Providing clear usage licenses, implementing authentication and authorisation protocols, and ensuring data can be retrieved efficiently make data accessible. This enhances a data-centric approach by ensuring that data is available when needed under the right conditions.

Data from different sources often needs to be integrated and analysed in a data-centric approach. Interoperability (I) ensures that data from various origins can work together seamlessly.

FAIR: Adopting standard formats, vocabularies, and ontologies ensures interoperability. This is essential for data-driven approaches integrating diverse datasets to derive comprehensive insights.

For a data-centric strategy to be sustainable, data must be reusable (R) to avoid redundancy and build upon previous work.

FAIR: Clear and detailed documentation, adherence to community standards, and ensuring data quality and provenance make data reusable. This supports data-centric approaches by enabling data to be repurposed and recombined for various analyses, reducing duplication of effort.

Degrees of FAIR

The image depicts six levels of data maturity in the context of FAIR principles, ranging from raw, uncatalogued data to fully described and integrated knowledge. This progression illustrates increasing degrees of FAIRness, from basic cataloguing and accessibility to sophisticated, integrated, and AI-enabled data environments.

Practical implementation steps

Step 1# Adopt standards and protocols 

Use standard metadata schemas, data formats, and ontologies widely accepted in your domain. Adhering to standardised data formats and protocols allows data to be easily shared and understood by different systems and researchers. This reduces the friction involved in data exchange and promotes interoperability.

Examples include FHIR for healthcare information exchange and CDISC for clinical research data.

Step 2# Develop robust metadata

Create detailed metadata that describes your data’s content, context, and structure to make it more findable and reusable.

Metadata provides context, detailing the origin, methodology, and data collection parameters, which is crucial for other researchers who might use the data in the future.
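A metadata record of this kind could be sketched as a small data class with a completeness check, as below. The field names and the list of required fields are assumptions chosen for illustration, not a published metadata schema; in practice you would follow the schema your domain already uses.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetMetadata:
    identifier: str
    title: str
    origin: str
    methodology: str
    collection_parameters: str
    license: str
    keywords: List[str] = field(default_factory=list)


# Assumed minimum set of fields for a record to count as complete.
REQUIRED_FIELDS = ("identifier", "title", "origin", "methodology", "license")


def missing_fields(meta: DatasetMetadata) -> List[str]:
    """List required fields that are empty or whitespace only."""
    values = asdict(meta)
    return [name for name in REQUIRED_FIELDS if not str(values[name]).strip()]


record = DatasetMetadata(
    identifier="https://example.org/dataset/1",
    title="Example assay results",
    origin="Example Lab, Site A",
    methodology="Plate reader, triplicate measurements",
    collection_parameters="25 degC, pH 7.4",
    license="CC BY 4.0",
    keywords=["assay", "example"],
)
incomplete = DatasetMetadata(
    identifier="https://example.org/dataset/2",
    title="Untitled run",
    origin="",
    methodology="",
    collection_parameters="",
    license="",
)

print(json.dumps(asdict(record), indent=2))
print(missing_fields(incomplete))
```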

Step 3# Use persistent identifiers

Apply DOIs or other persistent identifiers to datasets to ensure they can be reliably found and cited. See the illustration below: 
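In code, working with DOIs often comes down to checking the identifier's shape and building a resolvable link. The sketch below uses a deliberately loose pattern (a `10.` prefix, a registrant code, a slash, and a suffix without whitespace) and the public `https://doi.org/` resolver. It checks syntax only and cannot tell whether a DOI is actually registered; the sample DOI is a placeholder.

```python
import re
from urllib.parse import quote

# Loose syntactic check: "10.", a 4-9 digit registrant code, "/",
# then a non-empty suffix containing no whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_valid_doi(doi: str) -> bool:
    """Return True if the string has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(doi))


def doi_to_url(doi: str) -> str:
    """Build a resolver URL for a syntactically valid DOI."""
    if not is_valid_doi(doi):
        raise ValueError(f"Not a valid DOI: {doi!r}")
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("10.1234/example-dataset"))
```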

Step 4# Promote interoperability 

To ensure data can be easily integrated, use common formats and languages (e.g., JSON, XML) and adopt domain-specific standards.

Making data accessible to multiple systems boosts flexibility and prevents vendor lock-in, freeing data from proprietary systems. This fosters seamless integration and empowers businesses to thrive in a dynamic environment.
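As a small illustration of format independence, the sketch below serialises the same record to both JSON and XML using only the Python standard library and reads it back. The record's field names and the `observation` element name are assumptions for this example; a real project would follow its domain standard instead.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical observation; values are kept as strings so that both
# formats round-trip to exactly the same dictionary.
record = {
    "id": "station-001",
    "variable": "temperature",
    "unit": "degC",
    "value": "21.5",
}


def to_json(record: dict) -> str:
    return json.dumps(record, sort_keys=True)


def to_xml(record: dict) -> str:
    root = ET.Element("observation")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


print(to_json(record))
print(to_xml(record))
```

Because the record lives independently of either serialisation, a consuming system can pick whichever format it supports.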

Step 5# Ensure data accessibility

Set up access controls that balance openness with necessary restrictions and use APIs to facilitate data access.

Establishing clear policies and practices for data governance ensures that data remains accessible, secure, and trustworthy. This encompasses data stewardship, ethical considerations, and compliance with regulatory requirements.

See the example describing data accessibility components below:

Dataset: Global Biodiversity Survey Data 
 
Access controls: 
  • Public access: Non-sensitive data is accessible via a public API.
  • API endpoint (example): https://api.globalbiodiversity.org/public
  • Restricted access: Sensitive data available to verified researchers via a secure API.
  • API endpoint (example): https://api.globalbiodiversity.org/secure
  • Authentication: OAuth 2.0
Data governance:
  • Stewardship: Managed by a dedicated team ensuring data quality and integrity.
  • Ethics and compliance: Adheres to ethical guidelines and international regulations.
Contact: Name, Organisation, Email
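The access rules in this example can be sketched as a small decision function. This is a simplified illustration, not a real authorisation system: the `Requester` class and `resolve_endpoint` function are hypothetical, and the `authenticated` flag merely stands in for a completed OAuth 2.0 flow, which is not implemented here.

```python
from dataclasses import dataclass
from typing import Optional

# Example endpoints taken from the sample description above.
PUBLIC_ENDPOINT = "https://api.globalbiodiversity.org/public"
SECURE_ENDPOINT = "https://api.globalbiodiversity.org/secure"


@dataclass(frozen=True)
class Requester:
    name: str
    authenticated: bool = False        # stand-in for a valid OAuth 2.0 token
    verified_researcher: bool = False  # stand-in for a completed vetting step


def resolve_endpoint(sensitive: bool, requester: Requester) -> Optional[str]:
    """Return the endpoint the requester may use, or None if access is denied."""
    if not sensitive:
        return PUBLIC_ENDPOINT
    if requester.authenticated and requester.verified_researcher:
        return SECURE_ENDPOINT
    return None


anonymous = Requester("anonymous")
researcher = Requester("researcher", authenticated=True, verified_researcher=True)

print(resolve_endpoint(False, anonymous))
print(resolve_endpoint(True, anonymous))
print(resolve_endpoint(True, researcher))
```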
 

Step 6# Document thoroughly

To enhance reusability, provide comprehensive documentation on the data collection process, quality checks, and usage guidelines.

Promoting a culture that values data-centric practices requires training researchers and stakeholders on the importance of data management and the benefits of a data-centric approach. 

See the example describing key documentation below: 

Data collection:

  • Sources: N weather stations worldwide
  • Period: Month/Year – Month/Year
  • Variables: Temperature, precipitation, humidity, wind speed
  • Methods: Hourly recordings, daily summaries

Quality checks: 

  • Validation: Real-time anomaly detection
  • Post-collection: Outlier detection, consistency checks
  • Handling missing data: Imputation for minor gaps

Usage Guidelines: 

  • Access: Public repository and API
  • Repository: datarepo.climate.org
  • API (example): https://api.climate.org/data
  • License: CC BY 4.0
  • Citation: Climate Institute. DOI: 10.12345/climate data 

Training & support: 

  • Workshops: Regular training sessions 
  • Helpdesk: support@climate.org (sample email)
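Documenting quality checks is easier when the checks themselves are explicit. The sketch below shows one possible reading of "outlier detection" and "imputation for minor gaps" from the example above: a median-based outlier flag and linear interpolation of short gaps. The specific methods, the 3.5 threshold, and the two-reading gap limit are assumptions for illustration, not the procedure of any particular dataset.

```python
from statistics import median
from typing import List, Optional


def fill_minor_gaps(values: List[Optional[float]],
                    max_gap: int = 2) -> List[Optional[float]]:
    """Linearly interpolate runs of missing values no longer than max_gap."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap (or n if the gap is trailing)
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue  # leave leading, trailing, and long gaps untouched
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap + 1)
        for k in range(gap):
            filled[start + k] = left + step * (k + 1)
    return filled


def flag_outliers(values: List[Optional[float]],
                  threshold: float = 3.5) -> List[bool]:
    """Flag values whose modified z-score (median/MAD based) exceeds threshold."""
    present = [v for v in values if v is not None]
    if not present:
        return [False] * len(values)
    med = median(present)
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        return [False] * len(values)
    return [
        v is not None and abs(0.6745 * (v - med) / mad) > threshold
        for v in values
    ]


temperatures = [20.0, 21.0, 19.5, 20.5, 55.0, 20.0]
print(flag_outliers(temperatures))
print(fill_minor_gaps([10.0, None, 14.0]))
```

Recording the chosen thresholds and gap limits alongside the data lets later users judge whether the cleaned values suit their analysis.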

Why FAIR data matters

The relevance of FAIR data transcends industries and academic fields, becoming a pivotal element in advancing knowledge and technology.

In research environments, FAIR data facilitates a more collaborative and efficient use of information, leading to faster and more reliable scientific discoveries.

Implementing FAIR principles leads to improved data quality, enhanced transparency, and more robust scientific and business outcomes.

It also enhances decision-making processes, improves compliance with regulatory requirements, and fosters business innovation by providing a solid framework for data management and usage.

Barriers to implementing FAIR data

Implementing FAIR principles faces several challenges, including managing comprehensive metadata, ensuring data interoperability across different systems, maintaining high data quality and consistency, and balancing accessibility with security and compliance requirements.

Additionally, scaling FAIR principles across large organisations is resource-intensive, requiring significant financial and human investments, while cultural shifts towards data-centric approaches can meet resistance.

Technical expertise is crucial but often in short supply, necessitating investment in skilled personnel and training.

Solutions involve phased implementation, adopting standard technologies like RDF and SPARQL, robust data governance frameworks, secure access controls, and promoting organisational buy-in through training and change management strategies.

Summing up

The FAIR Data Principles mark a critical step towards cohesive, efficient data management practices that can significantly impact research outcomes, business efficiency, and overall data quality.

As data continues to be recognised as a valuable asset, implementing these principles is beneficial and essential for any data-driven organisation aiming to thrive in a digital economy.

The journey from data fragmentation to cohesion requires a commitment to adopting these principles, which, while challenging, offer substantial rewards in innovation, efficiency, and compliance. 

In summary, data-centricity is about valuing data as a core asset, structuring it around the scientific work it supports, and ensuring its longevity and accessibility beyond the lifespan of any single system or project. 

This approach promotes a more efficient, collaborative, and innovative scientific environment. 

Data-centric approaches offer a straightforward path to realising FAIR principles. Adopting data-centric strategies comprehensively addresses the goals outlined in the FAIR principles, along with additional benefits. 

With a solution like Datavid Rover, you can achieve data centricity through FAIR principles.
It is an extensible platform that enables organisations to quickly transform their data into actionable insights and knowledge.
