Slurm on Kubernetes (SUNK)

Run large-scale Slurm research clusters with SUNK, deeply integrated with CoreWeave’s AI infrastructure for unmatched utilization, scalability, and job performance.

Maximize researcher productivity with a Slurm cluster purpose-built for AI

Run training, inference, and reinforcement learning on a single high-performance cluster built for research at scale. SUNK delivers the full power of Slurm on CoreWeave’s optimized infrastructure, supporting jobs that scale beyond 32,000 GPUs, with dedicated researcher environments, automated user access management, and deep job observability. Researchers move faster, utilize resources more efficiently, and scale seamlessly from experiment to production.

Enterprise-grade research infrastructure at scale

SUNK unifies security, scalability, performance, and observability to deliver a high-performance Slurm experience purpose-built for AI research clusters on CoreWeave’s optimized infrastructure.

Security

SUNK User Provisioning automatically synchronizes POSIX and Slurm users with CoreWeave IAM or any supported Identity Provider, such as Okta or Google Workspace. User and group updates propagate instantly, eliminating manual configuration, reducing operational risk, and accelerating secure researcher access to compute.

Scalability

Workloads and data move quickly and without friction across clouds and regions, with no lock-in, so you keep the flexibility to choose and get the performance that makes you want to stay.

Performance

The SUNK Scheduler runs training, inference, and reinforcement learning workloads on the same cluster to maximize efficiency. Topology-aware scheduling and optimized job requeue improve performance and resource utilization across every phase of research.

Observability

Quickly troubleshoot and optimize performance using Grafana dashboards purpose-built for SUNK. Access rich visibility into Slurm job metrics, hardware, networking, and storage layers, all tightly integrated with CoreWeave’s observability stack for end-to-end infrastructure insight.

Streamline secure access with SUNK

Discover how Automated User Provisioning in SUNK automates identity management for AI research clusters. Reduce setup time, improve security, and keep teams focused on innovation.

Run on industry-leading cloud infrastructure services

SUNK runs on infrastructure services that provide the ideal combination of ease of use, workload fungibility, performance, and scale.

Compute Services

Get the latest GPU compute you need for your AI workloads through a Kubernetes-native environment

Storage Services

Flexible, purpose-built, high-performance storage solutions tailored for AI

Networking Services

High-performance networking for optimal cluster scale-out and connectivity

Supercomputing scale and enterprise-grade security

CoreWeave’s GPU megaclusters help support multi-trillion-parameter model training.

“When customers experienced challenges in interoperating between Slurm and Kubernetes orchestration frameworks, we gave them that capability through our SUNK service that integrates these frameworks. This allows both training and inference to work on the same infrastructure, which is a massive efficiency unlock for our customers.”

— Mike Intrator, CEO at CoreWeave

Technical partnership and 24/7 support

Our team of solution architects will get SUNK up and running for you in a matter of hours.

A partnership mindset

Get top-of-the-line assistance with comprehensive onboarding

Best-in-class teams

Access expert engineers for day-to-day support via Slack, with ultra-fast turnaround times

Enhanced observability

Get better visibility into critical hardware, Kubernetes, and Slurm job metrics via intuitive dashboards

Frequently asked questions

How does CoreWeave’s Automated User Provisioning (AUP) work with my existing identity provider?

AUP connects directly to enterprise Identity Providers (IdPs) such as Google Workspace, Okta, or Microsoft Entra using the SCIM protocol. It automatically syncs users and groups from your existing directory into CoreWeave IAM, keeping access policies consistent across environments without manual setup or custom scripts.
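For readers unfamiliar with SCIM, here is a minimal sketch of the kind of payloads an identity provider typically sends under the SCIM 2.0 standard: a User resource for onboarding and a PatchOp that deactivates the user. The schema URNs and attribute names come from the SCIM specification itself; the helper function names and the sample user are hypothetical, and nothing here describes CoreWeave's specific endpoints or configuration.

```python
import json

# Schema URNs defined by the SCIM 2.0 standard (RFC 7643 and RFC 7644).
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
SCIM_PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def build_scim_user(user_name, given_name, family_name, email, active=True):
    """Build a SCIM 2.0 User resource (hypothetical helper for illustration)."""
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": user_name,
        "name": {"givenName": given_name, "familyName": family_name},
        "emails": [{"value": email, "primary": True}],
        "active": active,
    }


def build_deactivate_patch():
    """Build a SCIM 2.0 PatchOp that marks a user inactive (offboarding)."""
    return {
        "schemas": [SCIM_PATCH_SCHEMA],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }


if __name__ == "__main__":
    # Sample data is made up for this example.
    user = build_scim_user("jdoe", "Jane", "Doe", "jdoe@example.com")
    print(json.dumps(user, indent=2))
    print(json.dumps(build_deactivate_patch(), indent=2))
```

In a typical SCIM integration the identity provider sends these payloads itself whenever the directory changes, which is why no custom scripts are needed on the customer side.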

What’s the difference between Automated User Provisioning (AUP) and SUNK User Provisioning (SUP)?

AUP handles identity federation: it brings users and groups from your enterprise IdP into CoreWeave IAM. SUP handles access provisioning: it automatically creates and manages accounts inside Slurm on Kubernetes (SUNK) clusters. Together, AUP and SUP automate the full lifecycle from identity to cluster access, eliminating manual onboarding and offboarding.
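Conceptually, provisioning of this kind is a reconciliation between the users a directory says should exist and the accounts a cluster currently has. The sketch below illustrates that general idea only. The function name, inputs, and sample usernames are hypothetical, and it is not a description of how SUP is actually implemented.

```python
def plan_account_changes(directory_users, cluster_accounts):
    """Compare desired users with existing accounts (illustrative only).

    directory_users: usernames the identity system says should have access.
    cluster_accounts: usernames that currently exist on the cluster.
    Returns the accounts to create and the accounts to remove.
    """
    desired = set(directory_users)
    current = set(cluster_accounts)
    return {
        "create": sorted(desired - current),  # onboarding
        "remove": sorted(current - desired),  # offboarding
    }


if __name__ == "__main__":
    # Made-up usernames for the example.
    plan = plan_account_changes(
        directory_users=["alice", "bob", "carol"],
        cluster_accounts=["bob", "dave"],
    )
    print(plan)
```

Running a comparison like this automatically on every directory change is what removes the manual onboarding and offboarding steps.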

Do AUP and SUP help with access control and compliance?

Yes. AUP and SUP ensure every access change made in your IdP is reflected across CoreWeave IAM and SUNK in real time. That means instant deprovisioning when users leave and auditable, policy-driven access control for compliance and security reviews.

See what SUNK can do

Get the resource flexibility your teams need to build, train, and deploy new models.