CoreWeave Kubernetes Service

Our managed Kubernetes environment is purpose-built for developing, training, and deploying AI applications.

Designed for generative AI

Blazing-fast performance. Reliability. Security. Ease of use. Real workload and infrastructure transparency.

At CoreWeave, every element of our stack is intentionally built around generative AI. CoreWeave Kubernetes Service (CKS) lies at the heart of that.

Kubernetes on bare metal

We’ve removed the hypervisor layer entirely, so your teams work directly with bare-metal nodes for optimal node performance, lower latency, better observability, and faster time to market.

Preconfigured clusters for AI

Free your teams from spending countless hours managing complex Kubernetes clusters. CKS clusters come with their components preinstalled and preconfigured.

That includes network and storage interfaces, GPU drivers, Slurm-on-Kubernetes, and observability plug-ins, so clusters are ready for production use on day one.

Tightly integrated with AI workload orchestration tools

CKS is built to integrate natively with workload orchestration tools like Slurm, Kubeflow, and KServe to help your developers focus on what they do best: innovating.

Industry-leading performance, scale, and resiliency

Spin up GPU superclusters in an environment built for AI workloads, with ultra-low latency, high-speed interconnect, and “human-in-the-loop” automation for top-tier performance.

Get maximum performance from your GPU nodes

CKS clusters use bare-metal nodes with NVIDIA BlueField DPUs for offloading node and resource management processes. That gives you high performance from your GPUs during model training, experimentation, and inference.

Supercomputer-level scale and performance

CKS is powered by NVIDIA InfiniBand with SHARP, the industry’s best cluster scale-out interconnect, and by purpose-built cloud storage services. It supports scaling across clusters with 100k+ GPUs while delivering cutting-edge performance.

Reliability and resilience

CKS is deeply integrated with Mission Control—our collection of cluster health management tools and services. Get the most out of your AI infrastructure with little to no fleet management overhead.

Enterprise-grade security and observability

Trusted by leading AI labs and enterprises, CKS provides the enterprise-grade security and observability solutions you need to run your mission-critical workloads. With unprecedented visibility into what’s going on in your clusters, you can bounce back from workload interruptions quickly and maximize cluster utilization.

Securely connect via Virtual Private Cloud (VPC)

Create isolated CKS clusters and manage their compute and storage resources using VPC networking and encryption support, powered by NVIDIA BlueField DPUs.

Granular observability to pinpoint issues

Traditional virtualized cloud environments provide limited visibility into infrastructure issues.

CoreWeave’s approach offers cutting-edge observability tools that provide real-time insight into detailed cluster-, node-, and job-level metrics.

Plus, CKS is complemented by intelligent monitoring that identifies and removes problem nodes before they can disrupt workloads.

Nip interruptions in the bud

Automated, proactive health-checking runs continuously on idle nodes, identifying patterns that point to potential hardware issues and swapping out problem nodes before they impact your workload.

Your teams benefit directly from the lessons and experience we’ve gained managing some of the industry’s largest GPU deployments.
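To make the idea of proactive health-checking concrete, here is a minimal Python sketch of the selection step: given health reports for a set of nodes, pick the idle ones that failed a check so they can be swapped out before a workload lands on them. The `NodeReport` type, its fields, and the check names are hypothetical and used only for illustration; this is not how CKS or Mission Control is implemented.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NodeReport:
    """Hypothetical health summary for one node (not a CKS data structure)."""
    name: str
    idle: bool                # True when no workload is scheduled on the node
    failed_checks: List[str]  # names of health checks that failed, if any


def nodes_to_swap(reports: Iterable[NodeReport]) -> List[str]:
    """Return the names of idle nodes that failed at least one health check."""
    return [r.name for r in reports if r.idle and r.failed_checks]


reports = [
    NodeReport("node-a", idle=True, failed_checks=[]),
    NodeReport("node-b", idle=True, failed_checks=["gpu-memory"]),
    NodeReport("node-c", idle=False, failed_checks=["nic-link"]),
]
print(nodes_to_swap(reports))  # ['node-b']
```

In this sketch the busy node is deliberately left alone, since the checks described here run on idle nodes; a real system would handle nodes that are already running workloads through a separate path.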

A full stack of solutions

CKS was made to support developers with AI workloads. That’s why CKS leverages a holistic tech stack that makes building and deploying AI applications faster, easier, and more cost-efficient.

See the power of SUNK

SUNK runs Slurm on CKS, letting you easily run Slurm jobs and containerized workloads on the same cluster. That gives you better workload fungibility and greater resource utilization.

Cut time with Tensorizer

Never waste time waiting for models to load. Tensorizer shortens model loading times on your CKS nodes by serializing AI models and their tensors into a single file that can be streamed from HTTPS or S3 endpoints.
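The general idea of a single-file, streamable tensor format can be sketched in a few lines of standard-library Python. The example below is a toy illustration only: the on-disk layout and the `write_tensors` and `stream_tensors` helpers are assumptions made for this sketch, not Tensorizer’s actual file format or API. Each tensor is written as a small JSON header followed by its raw bytes, so a reader can hand back tensors one at a time from any file-like stream without waiting for the whole file.

```python
import io
import json
import struct
from array import array
from typing import BinaryIO, Dict, Iterator, Tuple


def write_tensors(out: BinaryIO, tensors: Dict[str, array]) -> None:
    """Write each tensor as: 4-byte header length, JSON header, raw bytes."""
    for name, values in tensors.items():
        payload = values.tobytes()  # native byte order; fine for a same-machine demo
        header = json.dumps(
            {"name": name, "typecode": values.typecode, "nbytes": len(payload)}
        ).encode("utf-8")
        out.write(struct.pack("<I", len(header)))
        out.write(header)
        out.write(payload)


def _read_exact(src: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, tolerating short reads from a network-like stream."""
    buf = bytearray()
    while len(buf) < n:
        chunk = src.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended in the middle of a record")
        buf.extend(chunk)
    return bytes(buf)


def stream_tensors(src: BinaryIO) -> Iterator[Tuple[str, array]]:
    """Yield (name, tensor) pairs one at a time as bytes arrive."""
    while True:
        first = src.read(1)
        if not first:  # clean end of stream
            return
        (header_len,) = struct.unpack("<I", first + _read_exact(src, 3))
        header = json.loads(_read_exact(src, header_len))
        values = array(header["typecode"])
        values.frombytes(_read_exact(src, header["nbytes"]))
        yield header["name"], values


# Usage: an in-memory buffer stands in for an HTTPS or S3 response body.
buf = io.BytesIO()
write_tensors(buf, {
    "layer0.weight": array("f", [1.0, 2.0, 3.0]),
    "layer0.bias": array("f", [0.5]),
})
buf.seek(0)
for name, values in stream_tensors(buf):
    print(name, values.tolist())
```

Because the reader consumes the stream sequentially, the first tensors can be placed in memory while later ones are still in flight, which is the property that makes a single streamable file attractive for fast model loading.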

Do more with Mission Control

CoreWeave Mission Control ensures CKS cluster readiness at delivery. Comprehensive monitoring tracks the health of all infrastructure components, enabling optimal cluster performance and resiliency.

Start building on CKS today

Don’t settle for a Kubernetes platform built for web applications. Use a platform made for AI.