Data-centricity emphasises placing data at the core of organisational strategies and operations.
The FAIR principles complement this approach by ensuring data is managed effectively to maximise its value, empowering businesses to cultivate a culture of data-driven decision-making that fuels continuous improvement and innovation.
Data-centricity is a modern approach that emphasises organising and managing data based on the scientific processes and insights it supports, rather than the specific applications or platforms that handle it.
This paradigm shift is rooted in the belief that the actual value of data lies in its potential to be reused, shared, and repurposed across various contexts and applications, fostering greater scientific collaboration and innovation.
Data-centricity enables:
Data-centricity advocates for managing data independently of the specific systems or tools used to collect or process it.
This means designing data architectures that prioritise the long-term preservation and accessibility of data over the particularities of any one system.
The benefits to organisations include:
Enterprise Knowledge Graphs (EKGs) and data-centricity are closely intertwined concepts that, when combined, provide a robust framework for managing, integrating, and leveraging data within an organisation.
An Enterprise Knowledge Graph (EKG) is an advanced data model that interlinks and contextualises data from various organisational sources, forming a comprehensive knowledge network.
It uses RDF to create interconnected datasets, applies ontologies for semantic context, and adapts flexibly to changes without extensive rework.
EKGs support complex queries via SPARQL, enabling seamless data integration, enhanced data discovery through semantic relationships, deeper contextual insights, and data reusability across multiple applications.
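To make the idea of interlinked triples and pattern-based queries more concrete, here is a minimal sketch in plain Python. It is not RDF or SPARQL itself: the `ex:` names, the sample triples, and the `match` and `departments_for_compound` helpers are all hypothetical, and a real EKG would use a triple store and a SPARQL endpoint instead.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples, imagined as coming from two separate source systems.
TRIPLES: List[Triple] = [
    ("ex:study42", "ex:usesCompound", "ex:compoundA"),
    ("ex:study43", "ex:usesCompound", "ex:compoundA"),
    ("ex:study42", "ex:ledBy", "ex:alice"),
    ("ex:study43", "ex:ledBy", "ex:bob"),
    ("ex:alice", "ex:worksIn", "ex:oncology"),
    ("ex:bob", "ex:worksIn", "ex:toxicology"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def departments_for_compound(compound: str) -> Set[str]:
    """Chain three patterns, loosely like a SPARQL basic graph pattern:
    ?study ex:usesCompound <compound> . ?study ex:ledBy ?person .
    ?person ex:worksIn ?dept
    """
    departments: Set[str] = set()
    for study, _, _ in match(p="ex:usesCompound", o=compound):
        for _, _, person in match(s=study, p="ex:ledBy"):
            for _, _, dept in match(s=person, p="ex:worksIn"):
                departments.add(dept)
    return departments
```

Because every fact has the same subject, predicate, object shape, a question that spans both source systems becomes a chain of pattern matches rather than a bespoke integration between two applications.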
The following image depicts the transformation from application-centric to data-centric architecture using EKGs.
Linking FAIR (Findable, Accessible, Interoperable, Reusable) principles with data-centricity involves understanding how these principles can enhance a data-centric approach in research and data management.
This comprehensive exploration will delve into the core aspects of FAIR data, its significance in contemporary research and data management practices, and its influence on various stakeholders across multiple sectors.
Metadata and data are identified by Uniform Resource Identifiers, or URIs (F1), described with rich RDF-based metadata (F2), explicitly linked (F3), and indexed in searchable resources such as triple stores (F4).
Data is retrieved via standardised protocols like SPARQL, supporting open, free, and authenticated access (A1, A1.1, A1.2). Metadata, stored in separate named graphs (A2), remains accessible even if data is unavailable.
Data uses RDF, and metadata uses OWL in searchable triple stores (I1). Vocabularies are expressed in RDF and OWL, follow FAIR principles themselves, and use qualified references to other data and metadata (I2, I3).
Metadata includes detailed usability attributes (R1). Clear usage licenses and detailed provenance are provided (R1.1, R1.2). Data and metadata adhere to domain-relevant community standards (R1.3).
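A few of these points can be checked mechanically. The sketch below is only an illustration under stated assumptions: the record is a plain dictionary, the field names (`identifier`, `description`, `license`, `provenance`) are hypothetical, and the URI test is simplified to http(s)-style identifiers, so URN-style identifiers would be flagged incorrectly.

```python
from typing import Dict, List
from urllib.parse import urlparse


def fair_gaps(record: Dict[str, str]) -> List[str]:
    """Return a list of FAIR sub-principles the record appears to miss.

    Field names are assumptions made for this sketch, not a standard schema.
    """
    gaps: List[str] = []

    # F1: the identifier should be a globally unique URI.
    parsed = urlparse(str(record.get("identifier", "")))
    if not (parsed.scheme and parsed.netloc):
        gaps.append("F1: identifier is not an absolute http(s)-style URI")

    # F2: some descriptive metadata should be present.
    if not record.get("description"):
        gaps.append("F2: no descriptive metadata")

    # R1.1: a usage licence should be stated.
    if not record.get("license"):
        gaps.append("R1.1: no usage licence")

    # R1.2: provenance should be recorded.
    if not record.get("provenance"):
        gaps.append("R1.2: no provenance")

    return gaps


complete = {
    "identifier": "https://example.org/dataset/weather-2024",
    "description": "Hourly temperature readings",
    "license": "CC-BY-4.0",
    "provenance": "Collected by a hypothetical station network",
}
incomplete = {"identifier": "weather-2024", "description": "Hourly readings"}
```

Here `fair_gaps(complete)` returns an empty list, while `fair_gaps(incomplete)` reports the missing URI, licence, and provenance.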
In a data-centric approach, data is at the core of decision-making and processes. Ensuring that data is easily findable (F) enhances this approach by making relevant data readily available to stakeholders.
FAIR: Implementing metadata standards, using persistent identifiers (like DOIs), and maintaining well-organised data repositories makes data findable. This supports data-centric operations by reducing the time spent searching for data.
Access (A) to data is crucial for a data-centric approach, where data needs to be readily available for analysis and decision-making.
FAIR: Providing clear usage licenses, implementing authentication and authorisation protocols, and ensuring data can be retrieved efficiently all make data accessible. This enhances a data-centric approach by ensuring that data is available when needed under the right conditions.
Data from different sources often needs to be integrated and analysed in a data-centric approach. Interoperability (I) ensures that data from various origins can work together seamlessly.
FAIR: Adopting standard formats, vocabularies, and ontologies ensures interoperability. This is essential for data-centric approaches that integrate diverse datasets to derive comprehensive insights.
For a data-centric strategy to be sustainable, data must be reusable (R) to avoid redundancy and build upon previous work.
FAIR: Clear and detailed documentation, adherence to community standards, and ensuring data quality and provenance make data reusable. This supports data-centric approaches by enabling data to be repurposed and recombined for various analyses, reducing duplication of effort.
The image depicts six levels of data maturity in the context of FAIR principles, ranging from raw, uncatalogued data to fully described and integrated knowledge. This progression illustrates increasing degrees of FAIRness, from basic cataloguing and accessibility to sophisticated, integrated, and AI-enabled data environments.
Use standard metadata schemas, data formats, and ontologies widely accepted in your domain. Adhering to standardised data formats and protocols allows data to be easily shared and understood by different systems and researchers. This reduces the friction involved in data exchange and promotes interoperability.
Examples include FHIR for healthcare information exchange and CDISC for clinical research data.
Create detailed metadata that describes your data’s content, context, and structure to make it more findable and reusable.
Metadata provides context, detailing the origin, methodology, and data collection parameters, which is crucial for other researchers who might use the data in the future.
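As a rough illustration of what such a metadata record might contain, the following sketch defines a small data class and serialises it to JSON. The `DatasetMetadata` class, its field names, and the sample values are hypothetical; in practice you would follow the metadata schema accepted in your domain.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record: content, context, and structure."""
    title: str                      # content: what the dataset is
    origin: str                     # context: where the data came from
    methodology: str                # context: how it was collected
    collection_parameters: Dict[str, object] = field(default_factory=dict)
    columns: Dict[str, str] = field(default_factory=dict)  # structure


record = DatasetMetadata(
    title="Hourly temperature readings",
    origin="Hypothetical weather station network",
    methodology="Automated sensors, readings averaged per hour",
    collection_parameters={"interval_minutes": 60, "unit": "celsius"},
    columns={
        "station_id": "string",
        "timestamp": "ISO 8601 datetime",
        "temperature": "float",
    },
)

# Serialise to JSON so the record can be stored or indexed alongside the data.
metadata_json = json.dumps(asdict(record), indent=2, sort_keys=True)
```

Keeping the record in a plain, serialisable structure means it can travel with the dataset regardless of which system produced it.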
Apply DOIs or other persistent identifiers to datasets to ensure they can be reliably found and cited. See the illustration below:
To ensure data can be easily integrated, use common formats and languages (e.g., JSON, XML) and adopt domain-specific standards.
Making data accessible to multiple systems boosts flexibility and prevents vendor lock-in, freeing data from proprietary systems. This fosters seamless integration and empowers businesses to thrive in a dynamic environment.
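The sketch below shows one record written in both JSON and XML using only the Python standard library, then read back from each. The `reading` record and the helper names are made up for illustration, and the flat one-level XML layout is an assumption rather than any particular domain standard.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical record; values are kept as strings so both formats round-trip.
reading: Dict[str, str] = {
    "station_id": "ST-001",
    "timestamp": "2024-01-01T00:00:00Z",
    "temperature_c": "4.2",
}


def to_json(record: Dict[str, str]) -> str:
    """Serialise the record as a JSON object."""
    return json.dumps(record, sort_keys=True)


def to_xml(record: Dict[str, str], root_tag: str = "reading") -> str:
    """Serialise the record as flat XML, one child element per field."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Dict[str, str]:
    """Parse the flat XML layout produced by to_xml back into a dict."""
    return {child.tag: child.text or "" for child in ET.fromstring(text)}
```

Any consumer that can parse either format recovers the same record, which is the practical meaning of freeing data from a single proprietary system.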
Set up access controls that balance openness with necessary restrictions and use APIs to facilitate data access.
Establishing clear policies and practices for data governance ensures that data remains accessible, secure, and trustworthy. This encompasses data stewardship, ethical considerations, and compliance with regulatory requirements.
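One way to picture the balance between openness and restriction is a small policy check like the one below. This is a sketch under assumptions, not a governance framework: the `Dataset` class, the `"open"` and `"restricted"` access levels, and the role names are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset descriptor carrying its own access policy."""
    name: str
    access_level: str  # "open" or "restricted" (assumed levels)
    allowed_roles: FrozenSet[str] = frozenset()


def can_read_data(dataset: Dataset, user_roles: Set[str]) -> bool:
    """Open data is readable by anyone; restricted data needs a matching role."""
    if dataset.access_level == "open":
        return True
    return bool(dataset.allowed_roles & user_roles)


public_weather = Dataset(name="public-weather", access_level="open")
trial_results = Dataset(
    name="trial-results",
    access_level="restricted",
    allowed_roles=frozenset({"analyst", "steward"}),
)
```

For example, `can_read_data(trial_results, {"analyst"})` is true while `can_read_data(trial_results, {"guest"})` is false. An API layer could call a check of this kind before returning data, while leaving the dataset's metadata visible to everyone.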
See the example describing data accessibility components below:
To enhance reusability, provide comprehensive documentation on the data collection process, quality checks, and usage guidelines.
Promoting a culture that values data-centric practices requires training researchers and stakeholders on the importance of data management and the benefits of a data-centric approach.
See the example describing key documentation below:
Data collection:
Sources: N weather stations worldwide
Quality checks:
Usage Guidelines:
Training & support:
The relevance of FAIR data transcends industries and academic fields, becoming a pivotal element in advancing knowledge and technology.
In research environments, FAIR data facilitates a more collaborative and efficient use of information, leading to faster and more reliable scientific discoveries.
Implementing FAIR principles leads to improved data quality, enhanced transparency, and more robust scientific and business outcomes.
Implementing FAIR principles enhances decision-making processes, improves compliance with regulatory requirements, and fosters business innovation by providing a solid framework for data management and usage.
Implementing FAIR principles faces several challenges, including managing comprehensive metadata, ensuring data interoperability across different systems, maintaining high data quality and consistency, and balancing accessibility with security and compliance requirements.
Additionally, scaling FAIR principles across large organisations is resource-intensive, requiring significant financial and human investments, while cultural shifts towards data-centric approaches can meet resistance.
Technical expertise is crucial but often in short supply, necessitating investment in skilled personnel and training.
Solutions involve phased implementation, adopting standard technologies like RDF and SPARQL, robust data governance frameworks, secure access controls, and promoting organisational buy-in through training and change management strategies.
The FAIR Data Principles mark a critical step towards cohesive, efficient data management practices that can significantly impact research outcomes, business efficiency, and overall data quality.
As data continues to be recognised as a valuable asset, implementing these principles is not only beneficial but essential for any data-driven organisation aiming to thrive in a digital economy.
The journey from data fragmentation to cohesion requires a commitment to adopting these principles, which, while challenging, offer substantial rewards in innovation, efficiency, and compliance.
In summary, data-centricity is about valuing data as a core asset, structuring it around the scientific work it supports, and ensuring its longevity and accessibility beyond the lifespan of any single system or project.
This approach promotes a more efficient, collaborative, and innovative scientific environment.
Data-centric approaches offer a straightforward path to realising FAIR principles. Adopting data-centric strategies comprehensively addresses the goals outlined in the FAIR principles, along with additional benefits.
With a solution like Datavid Rover, you can achieve data-centricity through FAIR principles.
It is an extensible platform enabling organisations to transform their data into actionable insights and knowledge quickly.