3x Faster AI Training for Cohere | CoreWeave Case Study


How Cohere deployed NVIDIA GB200 NVL72 on CoreWeave to achieve 3× faster training for its North agentic AI platform

Challenge

Cohere is building the next generation of AI for the enterprise, delivering production-ready LLMs and agentic AI systems for real-world applications.

For its most recent project, Cohere needed a compute and storage foundation powerful enough to support North, its secure agentic AI platform for enterprises, while maintaining the agility and efficiency that define its approach to innovation.

To achieve this, Cohere required a cloud partner that could deliver cutting-edge compute performance, high-throughput AI storage, and the operational reliability needed to run next-generation agentic workloads at scale.

Core needs

Massively scalable compute to support rapidly rising AI workloads and training cycles
High-performance storage to overcome cloud object-storage limitations at scale
A proactive partner capable of deploying NVIDIA GB200 NVL72 early and reliably

Because we had access to compute early, we were able to optimize for speed and efficiency. When it came time to train models for North, our team was able to focus on training iterations to bring our enterprise customers efficient and secure agentic AI.

Autumn Moulder
VP of Engineering, Cohere

Solution

Cohere partnered with CoreWeave to deploy one of the industry’s first NVIDIA GB200 NVL72 clusters in production. Having worked with CoreWeave across multiple GPU generations (including A100, H100, and GH200), Cohere trusted CoreWeave’s ability to operationalize new hardware early and at scale.

This deployment combined high-performance compute with CoreWeave AI Object Storage to accelerate data movement, simplify replication across clouds, and reduce costs.

Early access to NVIDIA GB200 NVL72
CoreWeave provided Cohere with early GB200 capacity for testing, enabling them to validate workflows and identify issues before production deployment. This accelerated Cohere’s development process for North and future large-scale models.
Proactive hardware health management
CoreWeave’s automated rack-health monitoring (e.g., validating cluster node health) surfaced issues earlier than other cloud providers, reducing operational risk for Cohere during early adoption.
Operational excellence with CoreWeave Kubernetes Service (CKS)
CKS delivered reliable operations with built-in observability tooling and declarative nodepool management via infrastructure as code, requiring minimal intervention from Cohere’s team.
ARM64 validation through GH200 nodes
GH200 availability allowed Cohere to test ARM64 compatibility with their software stack, uncovering surprises and resolving them before full GB200 rollout.
High-performance AI Object Storage
CoreWeave AI Object Storage provided sustained high throughput, reduced replication friction, and introduced cost-efficient tiering.
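
As an illustration of the kind of ARM64 compatibility check described above, the following is a minimal sketch, not Cohere's actual validation process. It assumes a Python-based stack; the helper names (is_arm64, check_imports) and the module list are hypothetical. It reports whether the host is 64-bit ARM and which dependencies fail to import there, which is where a missing ARM64 wheel or a failed native build typically shows up.

```python
import importlib
import platform

# Machine strings commonly reported on 64-bit ARM hosts (Linux: aarch64, macOS: arm64).
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine=None):
    """Return True if the given (or current) machine string denotes 64-bit ARM."""
    name = (machine or platform.machine()).lower()
    return name in ARM64_NAMES


def check_imports(modules):
    """Try to import each module; map each name to None (ok) or an error string."""
    results = {}
    for mod in modules:
        try:
            importlib.import_module(mod)
            results[mod] = None
        except Exception as exc:  # missing wheels or broken native extensions land here
            results[mod] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical module list; substitute the packages your own stack depends on.
    report = check_imports(["json", "sqlite3", "not_a_real_module_xyz"])
    print("arm64 host:", is_arm64())
    for mod, err in report.items():
        print(f"{mod}: {'ok' if err is None else err}")
```

Running a check like this on a GH200 node before moving to GB200 would list the packages that need an ARM64 build, though a real validation would also exercise the code paths, not just the imports.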

Outcomes

The deployment of NVIDIA GB200 NVL72 on CoreWeave enabled Cohere to accelerate experimentation cycles, iterate more quickly on large-scale models, and bring North to its customers faster. With higher performance and more consistent throughput, Cohere can sustain rapid innovation while strengthening its position as a leader in enterprise-ready, agentic AI.

3× faster training performance at scale
Cohere achieved up to 3× higher training performance for 100B-parameter models compared to previous-generation Hopper GPUs—even before Blackwell-specific optimizations. This uplift allows their team to train more frequently, shorten iteration loops, and accelerate innovation velocity.
High-throughput multicloud data access
Cohere realized consistently high storage throughput of up to 7 GB/s per GPU across clouds and regions, powered by CoreWeave AI Object Storage. This enabled a unified dataset architecture without the performance penalties or replication friction common in traditional cloud object storage.
Faster time to market on next-gen AI
With early access to GH200 and GB200 hardware, proactive cluster health tools, and seamless CKS operations, Cohere moved from prototype to production faster. Shorter training cycles and reduced overhead drive meaningful ROI and enable earlier delivery of new AI capabilities.
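
To put the headline figures in perspective, here is a rough back-of-the-envelope sketch. It takes the "up to 3×" and "up to 7 GB/s per GPU" numbers from this case study as upper bounds and relies on the fact that one GB200 NVL72 rack contains 72 GPUs. The 30-hour baseline run and the one-week window are hypothetical values chosen only for illustration, and the function names are not from any CoreWeave tooling.

```python
GPUS_PER_NVL72_RACK = 72       # one GB200 NVL72 rack contains 72 Blackwell GPUs
PER_GPU_THROUGHPUT_GB_S = 7.0  # "up to 7 GB/s per GPU" (upper bound from the case study)
TRAINING_SPEEDUP = 3.0         # "up to 3x" vs. Hopper (upper bound from the case study)


def aggregate_throughput_gb_s(num_gpus, per_gpu_gb_s=PER_GPU_THROUGHPUT_GB_S):
    """Best-case aggregate storage throughput if every GPU reads at the peak rate."""
    return num_gpus * per_gpu_gb_s


def time_after_speedup(baseline_hours, speedup=TRAINING_SPEEDUP):
    """Wall-clock time of a run once the speedup is applied."""
    return baseline_hours / speedup


def runs_per_window(window_hours, run_hours):
    """Number of complete training runs that fit in a time window."""
    return int(window_hours // run_hours)


if __name__ == "__main__":
    baseline_hours = 30.0  # hypothetical Hopper-era run length
    week_hours = 168.0     # hypothetical planning window

    print(aggregate_throughput_gb_s(GPUS_PER_NVL72_RACK))  # 504.0 GB/s per rack
    faster = time_after_speedup(baseline_hours)
    print(faster)                                          # 10.0 hours
    print(runs_per_window(week_hours, baseline_hours))     # 5 runs per week
    print(runs_per_window(week_hours, faster))             # 16 runs per week
```

Under these assumptions, a single rack could draw up to 504 GB/s from storage, and a team could fit 16 runs into a week instead of 5. Actual results depend on the model, the data pipeline, and how close a workload gets to the quoted peaks.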


1. Faster training: “Thousands of NVIDIA Grace Blackwell GPUs Now Live at CoreWeave, Propelling Development for AI Pioneers,” NVIDIA, April 15, 2025.
2. Throughput/GPU: “CAIOS Achieves 7+ GB/s per GPU on NVIDIA Blackwell Ultra,” CoreWeave, September 22, 2025.

