From data to curated information to new product

Industry
Publishing
Challenge
CAS faced the urgent need to modernize legacy systems and siloed workflows to rapidly scale from 80 years of chemical curation to agile, cross-disciplinary research—amid mounting data volumes, fragmented standards, and delivery bottlenecks.
Solution and Results
CAS’s transformation wasn’t just technical—it was a shift in mindset. By uniting domain, content, and technology experts around a shared vision, they redefined how teams work, communicate, and innovate. With agile processes, a culture of collaboration, and the right tools in place, CAS accelerated delivery timelines, launched a new product in record time, and positioned itself to lead in emerging scientific fields like biology and materials science.
Technology used
Progress MarkLogic, AWS
“We went from taking over a year to process a single data source to doing 14 in one go—including $8.3 million worth of biomarker data. That’s what transformation looks like.”
Rodney Fulford
Assistant Director, Content and Technology Strategy, CAS

About CAS
CAS, a division of the American Chemical Society, is a trusted provider of curated scientific data products and solutions. Known globally for its substance registry and curated chemical information, CAS supports scientific discovery through deep domain expertise, rigorous content curation, and advanced data platforms. With a mission to empower innovation, CAS now extends beyond chemistry into biology and materials science, embracing modern technologies and strategic partnerships to deliver next-generation research tools.
Setting the Scene
The scientific information industry is undergoing a significant transformation. With the exponential growth of data in life sciences, materials science, and chemistry, organizations face challenges in managing, integrating, and deriving meaningful insights from vast and diverse datasets. Traditional data management systems often struggle with the volume, variety, and velocity of modern scientific data, leading to inefficiencies and delayed discoveries.
CAS, renowned for its comprehensive chemical substance database, recognized the need to evolve its product offerings and infrastructure. The new requirement was support for interdisciplinary research: pharmaceutical companies and academic researchers now want gold-standard, curated biological research data as well as chemical data.
A new data hub, new infrastructure, and a new way of working were needed to rapidly meet this demand.
This AI-driven transformation is supported by CAS’s strategic investments across the entire value chain, from operations to product development and customer enablement:
- Internal operations: modernising curation tools, some over 30 years old
- Customer-facing platforms: enabling semantic search and data-driven insights
- Customer AI initiatives: through co-developed, agile solutions
The Challenges
From legacy to agility
Despite its industry leadership, CAS was constrained by legacy systems and fragmented workflows that were no longer fit for purpose in a data-intensive, fast-moving research environment.
Key obstacles:
- High data volume: Content scattered across many disconnected sources;
- Inconsistent standards: Lack of harmonization across formats and taxonomies;
- Poor compatibility: Diverse systems and tools blocked integration and insights;
- Slow delivery timelines: New content could take 18+ months to become productized;
- Rigid processes: Agile ceremonies had lost meaning, creating bottlenecks instead of speed;
- Cultural silos: Siloed roles and documentation-heavy workflows fostered a blame culture rather than collaboration.
CAS also faced a unique pressure: to compress 80 years of chemical curation experience into just 2–3 years in order to cover new disciplines such as biological research.
The Solution
Technology, process, and cultural change
THE TRIANGLE OF SUCCESS
To drive innovation, CAS adopted a foundational model they call the Triangle of Success, recognizing that impactful AI transformation requires three distinct capabilities:
- Domain Experts: Who bring scientific context and define ontologies
- Content Experts: Who ensure proper data modelling and scientific harmonization
- Tech/Algorithm Experts: Who build scalable, intelligent systems
This triangulated expertise enabled effective semantic modelling, entity extraction, and relevance-driven curation.
This transformation was not limited to technological change. As Rodney Fulford from CAS highlighted, true innovation came from aligning people, processes, and platforms, introducing a cultural shift as much as a technical one.
TECHNOLOGY MODERNISATION WITH MARKLOGIC AND AWS
“Big data makes the hard things possible, but it makes the simple things hard. MarkLogic gave us visibility and speed to do the simple things well.”
— Rodney Fulford, CAS
CAS selected Progress MarkLogic, hosted on AWS, to replace five siloed data stores with a single operational data hub.
The platform supports both structured and unstructured content management, combining hierarchical XML with triple-based metadata to enable advanced search, enrichment, and content reuse. Importantly, the solution was not solely focused on semantics: it was designed to manage rich XML content in tandem with RDF triples to meet the diverse needs of CAS’s content architecture.
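As a rough illustration of what keeping rich XML and RDF triples together can look like, the Python sketch below builds a single document that carries hierarchical content plus embedded triples, then reads both back. The `sem:triples` wrapper follows MarkLogic's convention for triples embedded in documents; everything else (the `record`, `content`, and `substance` element names, the `example.org` IRIs, and the helper functions) is a hypothetical assumption and does not reflect CAS's actual schema.

```python
import xml.etree.ElementTree as ET

# Namespace MarkLogic uses for triples embedded in documents.
SEM_NS = "http://marklogic.com/semantics"
ET.register_namespace("sem", SEM_NS)


def build_record(substance_name, triples):
    """Build one XML document that carries content and triples side by side."""
    record = ET.Element("record")  # hypothetical root element
    content = ET.SubElement(record, "content")
    ET.SubElement(content, "substance").text = substance_name
    block = ET.SubElement(record, f"{{{SEM_NS}}}triples")
    for subject, predicate, obj in triples:
        triple = ET.SubElement(block, f"{{{SEM_NS}}}triple")
        ET.SubElement(triple, f"{{{SEM_NS}}}subject").text = subject
        ET.SubElement(triple, f"{{{SEM_NS}}}predicate").text = predicate
        ET.SubElement(triple, f"{{{SEM_NS}}}object").text = obj
    return record


def read_triples(record):
    """Return the embedded triples as (subject, predicate, object) tuples."""
    ns = {"sem": SEM_NS}
    return [
        (
            t.findtext("sem:subject", namespaces=ns),
            t.findtext("sem:predicate", namespaces=ns),
            t.findtext("sem:object", namespaces=ns),
        )
        for t in record.findall("sem:triples/sem:triple", ns)
    ]


# Hypothetical example data: one substance and one fact about it.
doc = build_record(
    "aspirin",
    [(
        "http://example.org/substance/aspirin",
        "http://example.org/inhibits",
        "http://example.org/target/COX-1",
    )],
)
print(ET.tostring(doc, encoding="unicode"))
print(read_triples(doc))
```

The point of the sketch is that the hierarchical content and the triples travel in one document, so a search over the text and a query over the relationships can refer to the same record.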
MarkLogic offered:
- Real-time visibility into hierarchical and unstructured data
- Support for multi-model content (XML, JSON, RDF, SQL)
- Built-in semantic search, triple stores, and enrichment capabilities
- Superior speed, performance, and change agility
ROADRUNNER: 90-DAY PROOF OF CONCEPT
To validate the new architecture, CAS launched Project Roadrunner, a 90-day POC focused on the core knowledge management pipeline:
- Unified ingestion, curation, discovery, and enrichment
- Replaced ETL-heavy workflows with agile, semantic pipelines
- Enabled content visibility within 15 minutes of ingestion
- Demonstrated "Stage Gate" checks for fast-cycle quality control
Key milestone: reduced cycle time for content from over a year to <75 days
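The "Stage Gate" idea can be sketched as a pipeline in which each stage's output must pass a quality check before the next stage runs. This is a minimal Python sketch under stated assumptions: the stage names, the gate rules, the `KNOWN_ENTITIES` vocabulary, and the record fields are all hypothetical and are not taken from Project Roadrunner itself.

```python
from typing import Callable, NamedTuple

Record = dict


class Stage(NamedTuple):
    name: str
    transform: Callable[[Record], Record]
    gate: Callable[[Record], bool]  # quality check run right after the stage


def run_pipeline(record, stages):
    """Run stages in order; stop at the first gate that fails.

    Returns (record, failed_stage_name), where the name is None on success.
    """
    for stage in stages:
        record = stage.transform(record)
        if not stage.gate(record):
            return record, stage.name
    return record, None


# Hypothetical controlled vocabulary used by the enrichment stage.
KNOWN_ENTITIES = {"egfr", "aspirin"}


def ingest(record):
    # Normalize whitespace on the incoming text.
    return {**record, "text": record.get("text", "").strip()}


def enrich(record):
    # Tag the record with any known entities mentioned in the text.
    words = {w.strip(".,").lower() for w in record["text"].split()}
    return {**record, "entities": sorted(words & KNOWN_ENTITIES)}


STAGES = [
    Stage("ingest", ingest, lambda r: bool(r["text"])),
    Stage("enrich", enrich, lambda r: len(r["entities"]) > 0),
]

result, failed_at = run_pipeline({"text": " Aspirin does not bind EGFR. "}, STAGES)
print(result["entities"], failed_at)
```

Because a failed gate reports the stage where it stopped, a problem surfaces at the step that caused it rather than at the end of a long batch run.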
PHOENIX: A NEW CURATION ENGINE
The success of Roadrunner laid the foundation for Phoenix, CAS’s next-generation platform for scientific content creation. Phoenix brings flexibility, feedback-driven iteration, and test data subsetting, all powered by unified semantic models.
ODH: CREATING A GOLDEN RECORD FOR SCIENTIFIC DATA
Building on Phoenix’s foundation for curation, CAS initiated the Operational Data Hub (ODH): a modern data management architecture that serves as a central point for accessing, integrating, and processing scientific data at scale. Designed for performance, the ODH enables fast, structured access to curated content, solving complex data integration challenges while laying the groundwork for advanced enterprise strategies like cloud scalability and future technology adoption.
Unlike a raw data lake, the ODH produces a “golden record” through multi-stage semantic enrichment. It ingests structured data from multiple external vendors, transforms it into a unified CAS XML format, and links key biomedical entities such as targets, diseases, drugs, and interactions. Datavid supported the initiative by contributing to metadata modeling, UI development, and foundation standardization, ensuring downstream platforms can reliably consume high-quality, linked content. This trusted data layer later became essential to powering products like BioFinder.
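The golden-record step described here (map each vendor's fields onto one schema, then merge and link what refers to the same entity) can be illustrated with a small Python sketch. The vendor names, field mappings, and sample rows are hypothetical assumptions made for illustration; the real ODH works on XML and uses its own enrichment stages, which are not shown.

```python
from collections import defaultdict

# Hypothetical per-vendor field mappings onto one unified schema.
FIELD_MAPS = {
    "vendor_a": {"drug_name": "drug", "target_gene": "target", "indication": "disease"},
    "vendor_b": {"compound": "drug", "protein": "target", "condition": "disease"},
}


def normalize(vendor, row):
    """Rename vendor-specific fields and normalize values for matching."""
    mapping = FIELD_MAPS[vendor]
    return {mapping[k]: v.strip().lower() for k, v in row.items() if k in mapping}


def build_golden_records(rows):
    """Merge (vendor, row) pairs into one linked record per drug."""
    golden = defaultdict(
        lambda: {"targets": set(), "diseases": set(), "sources": set()}
    )
    for vendor, row in rows:
        rec = normalize(vendor, row)
        entry = golden[rec["drug"]]
        if "target" in rec:
            entry["targets"].add(rec["target"])
        if "disease" in rec:
            entry["diseases"].add(rec["disease"])
        entry["sources"].add(vendor)  # keep provenance for each golden record
    return dict(golden)


# Hypothetical sample rows describing the same drug in two vendor formats.
rows = [
    ("vendor_a", {"drug_name": "Imatinib", "target_gene": "ABL1", "indication": "CML"}),
    ("vendor_b", {"compound": " imatinib ", "protein": "KIT", "condition": "GIST"}),
]
golden = build_golden_records(rows)
print(golden["imatinib"])
```

Keeping the contributing sources on each merged record is one simple way to preserve provenance, so downstream consumers can trace a linked fact back to the vendor data it came from.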
PRIME WITH DATAVID: FROM POC TO BIOFINDER V2 LAUNCH
To drive product innovation, CAS launched Project PRIME (Product Innovation Mechanism) with Datavid as its strategic co-development partner. The initiative began as a 90-day proof of concept (July to November), where Datavid played a central role in rethinking the BioFinder platform’s architecture.
Working closely with both product and content teams, Datavid enabled agile development workflows that accelerated delivery and reduced risk. Following the success of the POC, CAS proceeded with full-scale implementation, culminating in the launch of BioFinder V2 earlier this year.
The result: a modernized, scalable product delivered on time and built through close collaboration with Datavid.
The Outcomes
Speed, scale, and scientific impact
The impact of CAS’s transformation was immediate and far-reaching. By modernizing its technology, embracing agile processes, and rethinking its organizational culture, CAS delivered breakthrough outcomes across operations, data, and product innovation.
Quantifiable outcomes
- <75 days to process content from ingestion to product (down from 18+ months)
- 14 new data sources integrated in one iteration
- $8.3 million in curated biomarker data prepared for May Life Sciences launch
Operational advancements
- Unified five legacy repositories into one operational data hub
- Seamless ingestion, semantic enrichment, and search/discovery
- Low-latency test data subsetting and content quality checks
Cultural and process shifts
- Shift from legacy-heavy workflows to startup-style innovation
- Agile principles refocused on business outcomes
- Business experts and implementers co-located to foster shared accountability
- Reduced blame culture, enhanced trust, and real-time collaboration
Strategic enablement
- Phoenix now powers content creation with live feedback loops
- BioFinder V2 delivered in 90 days, showcasing agile product transformation
- CAS is now equipped to scale rapidly into biology, materials science, and beyond
Business Outcomes
After more than 100 years as a single-product organization, CAS successfully launched a second product, moving from proof of concept to production in just seven months.
The CAS journey demonstrates that with the right mindset, tools, and partners, even the most established organizations can transform. By aligning data, people, and purpose, CAS is no longer just curating scientific knowledge; it is shaping its future.