Turning 70 years of R&D into a searchable scientific knowledge base

Industry
Life Sciences
Challenge
Syngenta faced fragmented legacy data across siloed systems, limited search capabilities, and compliance risks, leading to duplicated research, wasted resources, and missed insights.
Solution and Results
Datavid built Synapse, an AI-powered semantic search platform that unified decades of scientific knowledge, enabling faster discovery, improved compliance, and greater reuse of existing research across the organization.
Technology used
Progress MarkLogic, Angular, Apache NiFi, Tesseract, AWS Lambda, AWS SQS, AWS S3, ABBYY, Apache Tika, FedChem
“Synapse is really quite amazing at finding information from the simplest of queries.”
User of Synapse

About Syngenta
With almost 50 thousand people spread across the globe, Syngenta is a powerhouse in the agrochemical space, leading the field in most of its applications worldwide.
Setting the Scene
In agricultural R&D, past knowledge is a critical asset, but only if it can be found. With tens of millions of documents spanning over 70 years and no centralized search, Syngenta faced mounting challenges in making its legacy research discoverable.
Valuable insights were locked away in data silos, leading to duplicated studies, delayed innovation, and unnecessary costs.
The Syngenta CP (Crop Protection) R&D client required a centralized enterprise semantic search, using NLP to understand user intent. The goal was to accelerate research workflows, surface historical knowledge (even back to the 1960s), and reduce compliance risks across globally distributed teams.
Datavid partnered with Syngenta to develop Synapse, an AI-powered semantic search platform that transforms unstructured and structured documents into an accessible, compliant, and searchable knowledge base.
The Challenges
Too much data, too little visibility
Syngenta employs over 49,000 people across more than 100 countries. This global scale enables deep, collaborative R&D efforts, but it also brings considerable complexity.
Over the years, Syngenta’s research and development teams generated tens of millions of documents spanning over 70 years. However, there was no centralized platform to access or search this body of knowledge. Information was scattered across siloed systems: SharePoint, Veeva Vault, internal drives, regulatory websites, and even scanned paper documents, making it nearly impossible to locate what was already known.
Without a solution, the data silos would keep growing, and so would the problems they caused and the money wasted on duplicated research.
1. Search that didn't understand science
The search capabilities that did exist relied on basic keyword matching. They lacked semantic understanding and couldn’t interpret scientific synonyms, chemical identifiers, or domain-specific regulatory language. As a result, critical insights were buried, and valuable research went unused simply because it couldn’t be found.
2. Duplicate studies, repeated costs
This lack of visibility meant that teams often unintentionally duplicated studies, repeating work whose results already existed somewhere in the system. This slowed the research process and led to unnecessary time, resources, and budget use.
3. Compliance bottlenecks and risk
Reviewing documents for compliance was a manual, time-consuming process prone to human error and inefficiency. There were no automated controls to flag or protect confidential or regulated content, exposing the business to compliance risks and delaying time to insight.
4. Limited scientific utility
The existing tools lacked support for chemical structure searches or integration with scientific taxonomies and industry vocabularies. This made it harder for scientists to explore data in the context they needed, limiting the utility of a vast knowledge base that should have been a competitive advantage.
The Solution
Cognitive search powered by semantic enrichment and robust architecture
To turn their legacy research into a strategic information asset, Syngenta CP R&D set out to create a unified, searchable platform capable of surfacing decades of fragmented knowledge.
For this ambitious goal, they chose Datavid, a specialist in data intelligence solutions, as their partner.
Together, we built Synapse: a semantic search and discovery platform purpose-built to handle scientific, regulatory, and chemical data at scale.
At its core, Synapse combines advanced semantic enrichment with a resilient architecture, enabling researchers to access trusted information quickly, securely, and in context.
Key capabilities of the platform include:
- Semantic synonym search across chemical names, commercial labels, and regulatory terms, making it easier for researchers to find relevant data no matter how it is referenced.
- Ontology-driven classification of scientific and regulatory concepts to ensure consistent tagging, search accuracy, and discoverability across 22+ content sources.
- Automated ingestion and enrichment of over 37 million documents, with ongoing updates and content growth managed at scale.
- Role-based access control (RBAC) and detailed audit trails ensure compliance and data security for sensitive regulatory information.
- Flexible integration mechanisms, including APIs, file ingestion, and web scraping to connect with structured and unstructured sources, even those without native APIs.
- Real-time system health monitoring, with alerts for ingestion failures or anomalies, to ensure reliability and continuity across workflows.
- Search analytics dashboard to track platform usage and identify content gaps.
- Automated lifecycle workflows for content retention, archiving, and deletion in line with governance policies.
- Regulatory vocabulary harmonization across disparate global standards and internal taxonomies.
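To make the synonym search capability in the list above more concrete, here is a minimal Python sketch of query-time synonym expansion. It is an illustration only, not Synapse's implementation and not MarkLogic's API: the names SYNONYM_GROUPS, DOCS, expand_query, and search are hypothetical, the documents are invented, and the substring matching stands in for a real search index.

```python
# Hypothetical synonym groups: every name in a set refers to the same concept.
# A real system would load these from a curated ontology or thesaurus.
SYNONYM_GROUPS = [
    {"glyphosate", "n-(phosphonomethyl)glycine", "roundup"},
    {"maximum residue limit", "mrl"},
]

# Invented sample documents, keyed by a made-up document id.
DOCS = {
    "doc-a": "Field trial of N-(phosphonomethyl)glycine on cereal crops.",
    "doc-b": "Roundup application rates for orchards.",
    "doc-c": "Soil moisture survey results.",
}


def expand_query(query: str) -> set[str]:
    """Return the query term plus every synonym from any group containing it."""
    term = query.strip().lower()
    expanded = {term}
    for group in SYNONYM_GROUPS:
        if term in group:
            expanded |= group
    return expanded


def search(query: str, documents: dict[str, str]) -> list[str]:
    """Return ids of documents whose text contains any expanded term.

    Naive case-insensitive substring matching, used here only to keep
    the sketch short; it is not how a production index would match.
    """
    terms = expand_query(query)
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    ]


if __name__ == "__main__":
    # Neither matching document contains the literal word "glyphosate".
    print(search("glyphosate", DOCS))
```

In this sketch, a plain keyword match for "glyphosate" would find nothing, while the expanded query reaches the documents that use the chemical name or the commercial label instead.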
The Outcomes
Faster research. Smarter decisions. Measurable ROI.
With the launch of Synapse, Syngenta CP R&D has fundamentally transformed how scientific knowledge is accessed and used across the organization.
More than 16 million internal and external documents from 22 structured and unstructured sources are now fully searchable within a single, integrated platform. Historical knowledge, including pre-digital formats dating back to the 1960s, is readily available to researchers worldwide.
The result? A dramatic 50–60% improvement in search performance, enabling users to retrieve relevant information in minutes rather than the 2–3 weeks it previously took.
This shift has unlocked several measurable benefits:
- 30–40% less time spent by scientists and regulatory teams on data discovery
- 20–30% reduction in compliance risk, thanks to automated filtering of sensitive data
- Duplicate studies identified and removed, saving thousands per project
- Data classified into 16 categories, with 30+ concept types automatically extracted for deeper semantic understanding
- Seamless access and sharing of insights across teams via simple export and collaboration tools
What once required extensive manual effort (comparing databases, compiling spreadsheets, and reconciling documents) is now handled instantly through cognitive search and semantic enrichment.
Importantly, the platform is not static. It is being continuously enhanced with LLM-powered enrichment and UX updates. It is expanding to support additional departments and business units, ensuring Synapse grows in value as the organization evolves.
Syngenta is now saving significant time and money on every new R&D project and, more importantly, redirecting focus toward innovation and long-term business goals instead of tedious document retrieval.