Empowering data transformation at scale

CAS

Industry

Publishing

Challenge

CAS faced the urgent need to modernize legacy systems and siloed workflows to rapidly scale from 80 years of chemical curation to agile, cross-disciplinary research—amid mounting data volumes, fragmented standards, and delivery bottlenecks.

Solution and Results

CAS’s transformation wasn’t just technical—it was a shift in mindset. By uniting domain, content, and technology experts around a shared vision, they redefined how teams work, communicate, and innovate. With agile processes, a culture of collaboration, and the right tools in place, CAS accelerated delivery timelines, launched a new product in record time, and positioned itself to lead in emerging scientific fields like biology and materials science.

<75 days
from ingestion to product (was 12+ months)
14
new data sources integrated in one iteration
$8.3M
in high-value biomarker data included
7 months
from PoC to launch of a second product

“We went from taking over a year to process a single data source to doing 14 in one go—including $8.3 million worth of biomarker data. That’s what transformation looks like.”

Rodney Fulford

Assistant Director, Content and Technology Strategy, CAS

About CAS

CAS, a division of the American Chemical Society, is a trusted provider of curated scientific data products and solutions. Known globally for its substance registry and curated chemical information, CAS supports scientific discovery through deep domain expertise, rigorous content curation, and advanced data platforms. With a mission to empower innovation, CAS now extends beyond chemistry into biology and materials science, embracing modern technologies and strategic partnerships to deliver next-generation research tools.

Setting the Scene

The scientific information industry is undergoing a significant transformation. With the exponential growth of data in life sciences, materials science, and chemistry, organizations face challenges in managing, integrating, and deriving meaningful insights from vast and diverse datasets. Traditional data management systems often struggle with the volume, variety, and velocity of modern scientific data, leading to inefficiencies and delayed discoveries.

CAS, renowned for its comprehensive chemical substance database, recognized the need to evolve its product offerings and infrastructure to serve interdisciplinary research. Pharmaceutical companies and academic researchers now want gold-standard, curated biological data as well as chemical data.

A new data hub, new infrastructure, and a new way of working were needed to rapidly meet this demand, across two areas:

  • Internal operations: modernising curation tools, some over 30 years old 
  • Customer-facing platforms: enabling semantic search and data-driven insights 

This AI-driven transformation is supported by CAS’s strategic investments across the entire value chain, from operations to product development and customer enablement.

The Challenges

From legacy to agility

Despite its industry leadership, CAS was constrained by legacy systems and fragmented workflows that were no longer fit for purpose in a data-intensive, fast-moving research environment. 

Key obstacles: 

  • High data volume: Content scattered across many disconnected sources;
  • Inconsistent standards: Lack of harmonization across formats and taxonomies;
  • Poor compatibility: Diverse systems and tools blocked integration and insights;
  • Slow delivery timelines: New content could take 18+ months to become productized;
  • Rigid processes: Agile ceremonies had lost meaning, creating bottlenecks instead of speed;
  • Cultural silos: Siloed roles and documentation-heavy workflows fostered a blame culture rather than collaboration.

CAS also faced a unique pressure: to compress 80 years of chemical curation experience into just 2–3 years to cover new disciplines such as biology.

The Solution

Technology, process, and cultural change

THE TRIANGLE OF SUCCESS 

“This wasn’t just about technology. The real transformation came from aligning people, processes, and platforms—rethinking how we work, not just what we use.” 
— Rodney Fulford, CAS 

To drive innovation, CAS adopted a foundational model they call the Triangle of Success, recognizing that impactful AI transformation requires three distinct capabilities: 


  • Domain Experts: Who bring scientific context and define ontologies 
  • Content Experts: Who ensure proper data modelling and scientific harmonization 
  • Tech/Algorithm Experts: Who build scalable, intelligent systems 

This triangulated expertise enabled effective semantic modelling, entity extraction, and relevance-driven curation. 

 
TECHNOLOGY MODERNISATION WITH A SEMANTIC DATA PLATFORM AND AWS

CAS selected a fit-for-purpose semantic data platform, hosted on AWS, to replace five siloed data stores with a single operational data hub. The platform supports both structured and unstructured content management, combining hierarchical XML with triple-based metadata to enable advanced search, enrichment, and content reuse. Importantly, the solution was not solely focused on semantics: it was designed to manage rich XML content in tandem with RDF triples to meet the diverse needs of CAS’s content architecture.
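
As a rough illustration of the XML-plus-triples idea, the Python sketch below keeps a small hierarchical XML document alongside a list of subject-predicate-object triples that describe it. The triples are used to find the document and the XML is used to read its content. The element names, identifiers, predicates, and helper functions are invented for illustration; they are not CAS's schema or the platform's API.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: hierarchical XML holds the rich content.
DOC_XML = """
<record id="doc-001">
  <title>Example biomarker study</title>
  <body>
    <section name="summary">Protein X is elevated in condition Y.</section>
  </body>
</record>
"""

# Hypothetical metadata: (subject, predicate, object) triples about the document.
TRIPLES = [
    ("doc-001", "mentions", "protein-x"),
    ("doc-001", "mentions", "condition-y"),
    ("protein-x", "associatedWith", "condition-y"),
]


def subjects_matching(triples, predicate, obj):
    """Return the subjects of all triples with the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def section_text(xml_text, section_name):
    """Read one named section out of the hierarchical XML content."""
    root = ET.fromstring(xml_text)
    for section in root.iter("section"):
        if section.get("name") == section_name:
            return section.text
    return None


# The triples answer "which documents mention protein-x?" ...
doc_ids = subjects_matching(TRIPLES, "mentions", "protein-x")
# ... and the XML supplies the content of the matching document.
summary = section_text(DOC_XML, "summary")
```

In a real semantic data platform both kinds of data would live in the same store and be queried together; the sketch only shows why the two representations complement each other.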
 

ROADRUNNER: 90-DAY PROOF OF CONCEPT 

To validate the new architecture, CAS launched Project Roadrunner, a 90-day PoC focused on the core knowledge management pipeline:

  • Unified ingestion, curation, discovery, and enrichment 
  • Replaced ETL-heavy workflows with agile, semantic pipelines 
  • Enabled content visibility within 15 minutes of ingestion 
  • Demonstrated "Stage Gate" checks for fast-cycle quality control 

Key milestone: reduced cycle time for content from over a year to <75 days
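
One way to picture a "Stage Gate" check is as a small set of named quality rules that every content item must pass before it moves to the next pipeline stage. The Python sketch below is a hypothetical illustration of that pattern; the `ContentItem` fields and the two gate rules are assumptions, not the checks CAS actually runs.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """Hypothetical content item flowing through the pipeline."""
    item_id: str
    title: str = ""
    entities: list = field(default_factory=list)


def gate_has_title(item):
    """Pass only if the item has a non-empty title."""
    return bool(item.title.strip())


def gate_has_entities(item):
    """Pass only if at least one entity was extracted."""
    return len(item.entities) > 0


# Illustrative gate list: (name, check function) pairs, evaluated in order.
GATES = [("has_title", gate_has_title), ("has_entities", gate_has_entities)]


def run_stage_gates(item, gates=GATES):
    """Return (passed, failures), where failures lists the names of failed gates."""
    failures = [name for name, check in gates if not check(item)]
    return (not failures, failures)


good = ContentItem("a1", title="Example study", entities=["protein-x"])
bad = ContentItem("a2", title="Untagged study")

good_result = run_stage_gates(good)
bad_result = run_stage_gates(bad)
```

Because each gate is a plain function, a failing item reports exactly which rule it missed, which is what makes this kind of check suitable for fast feedback cycles.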

PHOENIX: A NEW CURATION ENGINE 

The success of Roadrunner laid the foundation for Phoenix, CAS’s next-generation platform for scientific content creation. Phoenix brings flexibility, feedback-driven iteration, and test data subsetting, all powered by unified semantic models.

ODH: CREATING A GOLDEN RECORD FOR SCIENTIFIC DATA 

Building on Phoenix’s foundation for curation, CAS initiated the Operational Data Hub (ODH): a modern data management architecture that serves as a central point for accessing, integrating, and processing scientific data at scale. Designed for performance, the ODH enables fast, structured access to curated content, solving complex data integration challenges while laying the groundwork for advanced enterprise strategies like cloud scalability and future technology adoption. 

Unlike a raw data lake, the ODH produces a “golden record” through multi-stage semantic enrichment. It ingests structured data from multiple external vendors, transforms it into a unified CAS XML format, and links key biomedical entities such as targets, diseases, drugs, and interactions. Datavid supported the initiative by contributing to metadata modeling, UI development, and foundation standardization, ensuring downstream platforms can reliably consume high-quality, linked content. This trusted data layer later became essential to powering products like BioFinder.
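
The ingest, transform, and link steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the vendor names, field mappings, placeholder drug and target identifiers, and the output XML layout are all hypothetical, and the real ODH enrichment involves many more stages.

```python
import xml.etree.ElementTree as ET

# Hypothetical vendor feeds that describe the same drug with different field names.
VENDOR_A = [{"drug_name": "Drug-A", "target": "target-1"}]
VENDOR_B = [{"compound": " drug-a ", "protein": "TARGET-2"}]

# Illustrative mapping from each vendor's fields to a shared vocabulary.
FIELD_MAP = {
    "vendor_a": {"drug": "drug_name", "target": "target"},
    "vendor_b": {"drug": "compound", "target": "protein"},
}


def normalize(source, row):
    """Map one vendor row onto shared field names and consistent casing."""
    mapping = FIELD_MAP[source]
    return {
        "drug": row[mapping["drug"]].strip().lower(),
        "target": row[mapping["target"]].strip().upper(),
        "source": source,
    }


def build_golden_records(batches):
    """Merge normalized rows per drug, linking all targets and sources."""
    golden = {}
    for source, rows in batches.items():
        for row in rows:
            rec = normalize(source, row)
            entry = golden.setdefault(rec["drug"], {"targets": set(), "sources": set()})
            entry["targets"].add(rec["target"])
            entry["sources"].add(rec["source"])
    return golden


def to_xml(drug, entry):
    """Serialize one merged record into a single (hypothetical) XML layout."""
    root = ET.Element("record", {"drug": drug})
    for target in sorted(entry["targets"]):
        ET.SubElement(root, "interaction", {"target": target})
    for source in sorted(entry["sources"]):
        ET.SubElement(root, "source", {"name": source})
    return ET.tostring(root, encoding="unicode")


golden = build_golden_records({"vendor_a": VENDOR_A, "vendor_b": VENDOR_B})
xml_out = to_xml("drug-a", golden["drug-a"])
```

The point of the sketch is that two differently shaped vendor rows end up as one record with both interactions attached, which is the essence of a golden record as opposed to a raw data lake.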

PRIME WITH DATAVID: FROM POC TO BIOFINDER V2 LAUNCH 

To drive product innovation, CAS launched Project PRIME (Product Innovation Mechanism) with Datavid as its strategic co-development partner. The initiative began as a 90-day proof of concept (July to November), where Datavid played a central role in rethinking the BioFinder platform’s architecture.

Working closely with both product and content teams, Datavid enabled agile development workflows that accelerated delivery and reduced risk. Following the success of the PoC, CAS proceeded with full-scale implementation, culminating in the launch of BioFinder V2 earlier this year.

The result: a modernized, scalable product delivered on time and built through close collaboration with Datavid.


 


The Outcomes

Speed, scale, and scientific impact

The impact of CAS’s transformation was immediate and far-reaching. By modernizing its technology, embracing agile processes, and rethinking its organizational culture, CAS delivered breakthrough outcomes across operations, data, and product innovation.

Quantifiable outcomes 

  • <75 days to process content from ingestion to product (down from 18+ months)
  • 14 new data sources integrated in one iteration
  • $8.3 million in curated biomarker data prepared for the May Life Sciences launch

Operational advancements 

  • Unified five legacy repositories into one operational data hub
  • Seamless ingestion, semantic enrichment, and search/discovery
  • Low-latency test data subsetting and content quality checks 

Cultural and process shifts 

  • Shift from legacy-heavy workflows to startup-style innovation
  • Agile principles refocused on business outcomes
  • Business experts and implementers co-located to foster shared accountability
  • Reduced blame culture, enhanced trust, and real-time collaboration 
 

Strategic enablement 

  • Phoenix now powers content creation with live feedback loops
  • BioFinder V2 delivered in 90 days, showcasing agile product transformation
  • CAS is now equipped to scale rapidly into biology, materials science, and beyond

Business Outcomes

After more than 100 years as a single-product organization, CAS successfully launched a second product, moving from proof of concept to production in just seven months. 

The CAS journey demonstrates that with the right mindset, tools, and partners, even the most established organizations can transform. By aligning data, people, and purpose, CAS is no longer just curating scientific knowledge; it is shaping its future.
