Turning 70 years of R&D into a searchable scientific knowledge base

Industry

Life Sciences

Challenge

Syngenta faced fragmented legacy data across siloed systems, limited search capabilities, and compliance risks, leading to duplicated research, wasted resources, and missed insights.

Solution and Results

Datavid built Synapse, an AI-powered semantic search platform that unified decades of scientific knowledge, enabling faster discovery, improved compliance, and greater reuse of existing research across the organization.

Technology used

Progress MarkLogic, Angular, Apache NiFi, Tesseract, AWS Lambda, AWS SQS, AWS S3, ABBYY, Apache Tika, FedChem

www.syngenta.com

16M+

fully searchable docs

50–60%

faster search performance

30–40%

less time spent on data discovery

20–30%

reduction in compliance risk

“Synapse is really quite amazing at finding information from the simplest of queries.”

User of Synapse


About Syngenta

Syngenta is a global leader in the agrochemical space, recognised for its innovation and impact across a wide range of applications.

Setting the Scene

In agricultural R&D, past knowledge is a critical asset, but only if it can be found. With tens of millions of documents spanning over 70 years and no centralised search, Syngenta, one of the world's leading agrochemical companies, faced mounting challenges in making its legacy research discoverable.

Valuable insights were locked away in data silos, leading to duplicated studies, delayed innovation, and unnecessary costs.

The Syngenta CP (Crop Protection) R&D client required a centralized enterprise semantic search, using NLP to understand user intent. The goal was to accelerate research workflows, surface historical knowledge (even back to the 1960s), and reduce compliance risks across globally distributed teams.

Datavid partnered with Syngenta to develop Synapse, an AI-powered semantic search platform that transforms unstructured and structured documents into an accessible, compliant, and searchable knowledge base.


The Challenges

Too much data, too little visibility

Syngenta operates at global scale, enabling deep, collaborative R&D efforts, but this also brings considerable complexity.

Over the years, Syngenta’s research and development teams generated tens of millions of documents spanning over 70 years. However, there was no centralized platform to access or search this body of knowledge. Information was scattered across siloed systems: SharePoint, Veeva Vault, internal drives, regulatory websites, and even scanned paper documents, making it nearly impossible to locate what was already known.


A solution was needed. Otherwise, the data silos would keep growing, making these problems more frequent and driving up the cost of duplicated research.

1. Search that didn't understand science

The search capabilities that did exist relied on basic keyword matching. They lacked semantic understanding and couldn’t interpret scientific synonyms, chemical identifiers, or domain-specific regulatory language. As a result, critical insights were buried, and valuable research went unused simply because it couldn’t be found.

2. Duplicate studies, repeated costs

This lack of visibility meant that teams often unintentionally duplicated studies, repeating work whose results already existed somewhere in the system. This slowed research and wasted time, resources, and budget.

3. Compliance bottlenecks and risk

Reviewing documents for compliance was a manual, time-consuming process prone to human error and inefficiency. There were no automated controls to flag or protect confidential or regulated content, exposing the business to compliance risks and delaying time to insight.

4. Limited scientific utility

The existing tools lacked support for chemical structure searches or integration with scientific taxonomies and industry vocabularies. This made it harder for scientists to explore data in the context they needed, limiting the utility of a vast knowledge base that should have been a competitive advantage.

The Solution

Cognitive search powered by semantic enrichment and robust architecture


To turn their legacy research into a strategic information asset, Syngenta CP R&D set out to create a unified, searchable platform capable of surfacing decades of fragmented knowledge.

For this ambitious goal, they chose Datavid, a specialist in data intelligence solutions, as their partner.

Together, we built Synapse: a semantic search and discovery platform purpose-built to handle scientific, regulatory, and chemical data at scale.

At its core, Synapse combines advanced semantic enrichment with a resilient architecture, enabling researchers to access trusted information quickly, securely, and in context.

Key capabilities of the platform include:

Semantic synonym search across chemical names, commercial labels, and regulatory terms, making it easier for researchers to find relevant data no matter how it is referenced.

Ontology-driven classification of scientific and regulatory concepts to ensure consistent tagging, search accuracy, and discoverability across 22+ content sources.

Automated ingestion and enrichment of over 37 million documents, with ongoing updates and content growth managed at scale.

Role-based access control (RBAC) and detailed audit trails ensure compliance and data security for sensitive regulatory information.

Flexible integration mechanisms, including APIs, file ingestion, and web scraping to connect with structured and unstructured sources, even those without native APIs.

Real-time system health monitoring, with alerts for ingestion failures or anomalies, to ensure reliability and continuity across workflows.

Search analytics dashboard to track platform usage and identify content gaps.

Automated lifecycle workflows for content retention, archiving, and deletion in line with governance policies.

Regulatory vocabulary harmonization across disparate global standards and internal taxonomies.
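
To make the first capability in this list more concrete, here is a minimal sketch of synonym-aware search. It is an illustration only and not how Synapse is implemented: the synonym group, the sample documents, and the function names (`build_lookup`, `expand_query`, `search`) are all assumptions made for this example. The idea is that a query term is expanded to every known name for the same substance (common name, trade name, CAS number) before matching.

```python
# Illustrative synonym groups (common name, trade name, CAS number).
# These entries are assumptions for the example, not Synapse's vocabulary.
SYNONYM_GROUPS = [
    {"azoxystrobin", "amistar", "131860-33-8"},
]

# Hypothetical sample documents keyed by an invented identifier.
DOCUMENTS = {
    "trial-report": "Field trial of Amistar on wheat",
    "residue-memo": "Residue analysis for 131860-33-8",
    "soil-note": "Soil pH survey",
}


def build_lookup(groups):
    """Map every term to the full set of terms in its synonym group."""
    lookup = {}
    for group in groups:
        normalized = frozenset(term.lower() for term in group)
        for term in normalized:
            lookup[term] = normalized
    return lookup


def expand_query(query, lookup):
    """Expand each query token to all of its known synonyms."""
    terms = set()
    for token in query.lower().split():
        terms |= lookup.get(token, {token})
    return terms


def search(query, documents, lookup):
    """Return ids of documents that contain any expanded query term."""
    terms = expand_query(query, lookup)
    return [
        doc_id
        for doc_id, text in documents.items()
        if terms & set(text.lower().split())
    ]


LOOKUP = build_lookup(SYNONYM_GROUPS)
print(search("azoxystrobin", DOCUMENTS, LOOKUP))
```

In this sketch a search for the common name also finds the documents that mention only the trade name or the CAS number, which plain keyword matching would miss. A production system would typically keep such vocabularies in a managed ontology or thesaurus rather than in a hard-coded list.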

The Outcomes

Faster research. Smarter decisions. Measurable ROI.

With the launch of Synapse, Syngenta CP R&D has fundamentally transformed how scientific knowledge is accessed and used across the organization.

More than 16 million internal and external documents from 22 structured and unstructured sources are now fully searchable within a single, integrated platform. Historical knowledge, including pre-digital formats dating back to the 1960s, is readily available to researchers worldwide.

The result? A dramatic 50–60% improvement in search performance, enabling users to retrieve relevant information in minutes rather than the 2–3 weeks it previously took.

This shift has unlocked several measurable benefits:

30–40% less time spent by scientists and regulatory teams on data discovery

20–30% reduction in compliance risk, thanks to automated filtering of sensitive data

Duplicate studies identified and removed, saving thousands per project

Data classified into 16 categories, with 30+ concept types automatically extracted for deeper semantic understanding

Seamless access and sharing of insights across teams via simple export and collaboration tools
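
As a rough illustration of how automated filtering of sensitive data can work together with role-based access control and an audit trail, the sketch below filters search results by a sensitivity label and records every access decision. The labels, roles, and names used here (`Document`, `ROLE_CLEARANCE`, `AuditLog`, `filter_results`) are hypothetical and are not taken from Synapse.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    sensitivity: str  # hypothetical labels: "public", "internal", "regulated"


# Hypothetical mapping from role to the sensitivity labels it may read.
ROLE_CLEARANCE = {
    "scientist": {"public", "internal"},
    "regulatory": {"public", "internal", "regulated"},
}


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, user, role, doc_id, allowed):
        """Append one access decision with a UTC timestamp."""
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "doc_id": doc_id,
            "allowed": allowed,
        })


def filter_results(user, role, results, audit):
    """Keep only the documents the role may see; log every decision."""
    allowed_labels = ROLE_CLEARANCE.get(role, set())  # unknown roles see nothing
    visible = []
    for doc in results:
        allowed = doc.sensitivity in allowed_labels
        audit.record(user, role, doc.doc_id, allowed)
        if allowed:
            visible.append(doc)
    return visible


results = [
    Document("d1", "Product label summary", "public"),
    Document("d2", "Registration dossier", "regulated"),
]
audit = AuditLog()
visible = filter_results("alice", "scientist", results, audit)
print([doc.doc_id for doc in visible], len(audit.entries))
```

Denying by default for unknown roles and logging denied as well as granted requests keeps the filter conservative and leaves a complete trail for later compliance review.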

What once required extensive manual effort (comparing databases, compiling spreadsheets, and reconciling documents) is now handled instantly through cognitive search and semantic enrichment.

Importantly, the platform is not static. It is being continuously enhanced with LLM-powered enrichment and UX updates. It is expanding to support additional departments and business units, ensuring Synapse grows in value as the organization evolves.

Syngenta is now saving significant time and money on every new R&D project and, more importantly, redirecting focus toward innovation and long-term business goals instead of tedious document retrieval.
