Agenda

Accelerating the path to Superintelligence

September 29 – October 1, 2026
Moscone South, San Francisco
Three tracks

Tracks Overview

track 01

Build adaptive AI infrastructure

Running AI in production is a different job from running anything else. The infrastructure decisions you make today—how you provision capacity, design for security, optimize throughput, and manage costs—determine how fast you can move tomorrow. In this track, hear from customers and CoreWeave experts on the latest in compute, networking, storage, observability, and security at AI scale, plus the platform patterns that turn raw infrastructure into systems your team can actually rely on.

track 02

Push models further

The expectations on your models keep growing—and the infrastructure and workflows behind them have to keep pace. In this track, hear from customers and CoreWeave experts who are pushing the model frontier on what they've learned running distributed training at scale, building RL pipelines that hold up in production, fine-tuning for real domain constraints, and the experiment tracking and deployment patterns that make every iteration faster.

track 03

Evaluate and monitor agents

Shipping an agent is the beginning of the work, not the end. Production agents live and die by how well you can see what they're doing, catch what's breaking, and turn real-world experience into continuous improvement. In this track, hear from customers and CoreWeave experts on how to gain end-to-end observability, surface failure modes, prevent regressions, and close the loop from inference back to training—so every user interaction makes your agent measurably better.

Track
01

Build adaptive AI infrastructure

Hear from CoreWeave customers and experts on building AI infrastructure—compute, networking, storage, observability, and security—that your team can actually rely on in production.

Sessions

Run AI workloads on NVIDIA Vera Rubin NVL72

Not every architecture leap is created equal. Learn what makes NVIDIA Vera Rubin NVL72 genuinely different—from its underlying architecture and the workloads it’s optimized for, to the performance gains it delivers over prior generations. Hear directly from CoreWeave’s product and engineering teams on what it actually takes to run AI workloads on Vera Rubin, so you can make an informed call on whether it’s the right next step for your organization.

Plan capacity and control costs at scale

In a constrained GPU market, the teams that win are the ones who know how to find available capacity and make smart decisions about how to use it. Learn how to evaluate spot, flex, and reserved models based on your workload needs. Use real-time usage dashboards to monitor spending and keep your team accountable. And apply the same cost management patterns that have helped CoreWeave customers meaningfully improve their infrastructure economics.

Secure your AI infrastructure end to end

Security gaps don’t announce themselves—they just get exploited. Learn how to design security into your AI infrastructure from day zero, covering identity and access management, workload isolation, secure sandbox environments, and encryption. Through real customer examples spanning multiple industries, you’ll see where organizations most commonly expose risk and what it takes to close those gaps before they become problems.

Operate AI seamlessly across clouds

Multi-cloud shouldn’t mean multiplying your operational overhead. Learn how to manage AI workloads across clouds and environments without the complexity that typically comes with it—from deploying Cross-Cloud LOTA and SUNK across providers to bridging cloud and on-premises infrastructure. You’ll leave with concrete, proven patterns for simplifying multi-environment operations, grounded in real customer examples.

Maximize token output with infrastructure optimizations

Token throughput isn’t just a model problem; it’s an infrastructure problem. Learn how decisions at the rack level, from cooling and rack design to orchestration and control planes, directly affect output at scale. You’ll learn how to benchmark throughput across configurations, understand the tradeoffs that matter in large deployments, and apply patterns that help your team push more tokens without sacrificing reliability or efficiency.

Discover best practices for efficient training and inference

Getting more out of your existing compute doesn’t require more hardware—it requires better practices. Learn how to run training and inference workloads more efficiently, from infrastructure optimizations that reduce time-to-train to serving strategies that lower latency and cost at inference time. You’ll leave with actionable techniques you can apply immediately to raise MFU, improve goodput, and get measurably more from the infrastructure you already have.

Track
02

Push models further

Learn what it takes to push models further: distributed training at scale, production-ready RL pipelines, fine-tuning for real domain constraints, and faster iteration on every run.

Sessions

Train at scale

Pre-training at hyperscale breaks in predictable ways if you know where to look. Learn how to run pre-training at scale with end-to-end observability across your models and clusters, and how pioneering teams are eliminating the infrastructure overhead that typically slows the first model run. We’ll walk through a real production setup: what the configuration looks like, what breaks, and what you need to have in place before you scale. You’ll leave with a clear picture of what production-ready hyperscale training actually requires.

Run AI at scale

Published benchmark numbers are only useful if you know how to interrogate them. Learn how CoreWeave’s MLPerf and NVIDIA Exemplar results for training and inference were produced, what the NVIDIA partnership enables at the hardware layer, and how to map those published numbers to your own model size and workloads. You’ll leave with a practical framework for cutting through infrastructure claims before they cost you a contract—and a clearer sense of what performance at scale actually looks like when the methodology is transparent.

Accelerate the loop

Slow iteration doesn’t just feel painful; it compounds directly into time to market. Learn two concrete patterns for tightening the development loop: how to run experiment tracking at production scale without the overhead that typically slows the cycle, and how connecting your data pipeline to serving under a unified infrastructure layer eliminates the handoff delays that consume the second half of your roadmap. Both patterns are designed to give you actionable insights you can apply immediately.

Tune and optimize

When the standard fine-tuning recipe doesn’t fit your domain, the gap between general practice and what your workload actually requires gets expensive fast. Learn what fine-tuning at scale truly demands, with a focused look at two verticals—robotics and creative/biotech—where the tuning approach, data format, and hardware configuration diverge sharply from general practice. You’ll leave with a concrete model for designing your fine-tuning workflow around your domain’s actual constraints, rather than forcing your workload into a stack that wasn’t built for it.

Reinforce and align

RL has different infrastructure failure modes than supervised fine-tuning, and most teams hit them mid-run. Learn where the pipeline actually breaks (environment isolation, compute scaling at reward-model inference, the gap between a research RL setup and a production one) and which infrastructure pattern fits your current stage. We’ll cover the full range: infrastructure-scale RL training for teams running serious workloads, and serverless approaches for teams that want the capability without managing persistent compute. You’ll leave knowing how to build a production RL pipeline that doesn’t surprise you at the worst moment.

Run in production

Getting a model to production is an infrastructure decision as much as a model decision, and the wrong call costs you in ops overhead, latency, or both. Learn how to evaluate the deployment models that balance infrastructure abstraction, customization, and SLA requirements for production workloads. Through two real production deployments, you’ll walk away with a clear decision framework for choosing the right approach for your workload, your team, and the performance bar your users actually expect.

Track
03

Evaluate and monitor agents

Master end-to-end agent observability—surfacing failures, preventing regressions, and closing the loop from inference back to training so every interaction makes your agent better.

Sessions

Agent architectures and multi-agent systems

The gap between a demo agent and a production one comes down to architecture. Learn how modern agent systems are designed to reason, plan, remember, and collaborate, and which decisions determine whether they hold up when complexity scales. We’ll explore context management, memory systems, planning loops, and multi-agent coordination, with an honest look at the tradeoffs between simplicity and autonomy. You’ll leave with practical patterns for decomposing complex tasks and building agents that stay effective as workflows, tools, and user expectations grow more demanding.

Agent frameworks, protocols, and tooling

The agent ecosystem is maturing fast, and knowing which abstractions to reach for makes the difference between a system that scales and one that becomes a maintenance burden. Learn how to navigate the emerging landscape of agent frameworks, SDKs, and protocols—how to choose the right abstractions for your use case, instrument agents for real visibility, and leverage standards that let agents, tools, and services work together without brittle custom glue. You’ll leave with practical approaches to building development workflows that accelerate iteration without compounding operational complexity.

Agent evaluation and reliability

Demos don’t predict production performance—rigorous evaluation does. Learn how leading teams measure agent performance, detect regressions, and build genuine confidence in increasingly autonomous systems. We’ll cover evaluation methodologies from offline benchmarks to production-derived test suites, how to construct high-quality evaluation datasets from real-world traces, and which metrics actually correlate with user outcomes. You’ll leave with the tools to move beyond anecdotal results toward repeatable, trustworthy agent performance at scale.

Agent observability and operations

You can’t operate what you can’t see. Learn how to monitor, troubleshoot, and run fleets of agents in production—using tracing, observability, and telemetry to surface what agents are actually doing, why they fail, and where performance bottlenecks emerge. We’ll cover operational best practices for managing reliability, controlling costs, and debugging complex multi-step workflows. You’ll leave with practical techniques for turning opaque agent behavior into clear, actionable operational insight.

Agentic inference

Agent workloads don’t behave like standard inference—and infrastructure designed for one will underperform on the other. Learn how the unique demands of agentic workloads are reshaping inference systems design, from tool calling and context management to model routing and workflow orchestration. We’ll work through the real tradeoffs between latency, cost, and capability, and examine how infrastructure choices ripple through agent performance. You’ll leave knowing how to optimize systems built for complex, iterative reasoning at scale.

Accelerate the agent loop

The fastest-improving agent systems aren’t just running tasks; they’re learning from their own outputs. Learn how autonomous research systems are transforming the way organizations discover insights, improve models, and accelerate experimentation. We’ll cover how evaluation-driven development makes these systems trustworthy in practice, from turning production traces into benchmarks to building quality gates that determine which behaviors get promoted to production. You’ll leave with a concrete approach to building agents that automate research tasks and get measurably better over time.

Immersive workshops
hands-on labs

Intermediate/Advanced
~4 hours

Autoresearch: Building the Tireless Scientist

Some problems don't fit in a single prompt. They take dozens of experiments, careful measurement, and the patience to try again. In this hands-on workshop, you will build an autoresearch agent that proposes a change, runs it, reads the result, and iterates — turning a problem statement into a campaign of measurable progress against a real benchmark.

You will design the loop, trace every step with W&B Weave, log experiments as the agent works, and watch the metrics climb. No local GPU or heavy setup required.

Led by:
Brandon Kates
Forward Deployed Engineer, CoreWeave
Kai Wei Tan
Senior Forward Deployed Engineer, CoreWeave
Beginner/Intermediate
~2.5 hours

Climbing the Inference Ladder

Every model deployment starts the same way — one API call, one response. But the moment you care about latency, cost, custom weights, or control, that single call becomes a journey. In this hands-on workshop, you'll climb the inference ladder on CoreWeave: starting with Serverless (your first OpenAI-compatible request in seconds, zero infrastructure), graduating to Dedicated Inference (bring your own weights and tune the knobs that shape throughput and tail latency), and finishing on self-hosted CKS (your own serving stack, total control over GPU and runtime).

You'll fire requests at each tier, compare the latency and cost trade-offs, and track every experiment in Weights & Biases so you can see — not guess — which rung is right for your workload. By the end, you'll have a mental map of the full inference journey and the hands-on reps to know when to climb and when to stop.

Led by:
Nisha Nadkarni
Specialist Field Engineer, CoreWeave
Tara Madhyastha
Senior Specialist Field Engineer, CoreWeave
Beginner/Intermediate
~2.5 hours

Anatomy of Performance: Where Every Second Hides

Your job just launched. GPUs become hot, NCCL collectives form, gradients start flowing — and somewhere between "looks fine" and "why is my MFU so low," you realize you really have no idea where the time actually goes. In this hands-on workshop, you'll crack open a training run on SUNK (Slurm on Kubernetes) and learn to see what's really happening under the hood — from the first collective to the final checkpoint write.

We'll cover the three places training runs lose time and most people don't look: communication, computation, and I/O. You'll build intuition for NCCL and collective operations, learn to profile and identify bottlenecks, and understand how storage performance and data loading patterns shape end-to-end throughput. Along the way, we'll cover the monitoring tools and checkpointing practices that keep long runs healthy.

You'll leave with the tools and the instincts to close the gap between the utilization you're getting and the GPUs you're paying for.

Led by:
Nisha Nadkarni
Specialist Field Engineer, CoreWeave
Tara Madhyastha
Senior Specialist Field Engineer, CoreWeave
Intermediate/Advanced
~1.5 hours

Let's Get Physical with AI: Teaching Robots New Skills in Simulation

Learn how the CoreWeave team trained a humanoid to perform {inserttask name-tbd} and how we leverage simulation environments to train, develop, and test these skills before deploying on a real embodied AI device.

You will build your own robot skill, test it in a simulation environment running on CoreWeave infrastructure, and track it using Weights & Biases.

Structured as a competition:

  • Define your own policy
  • Control the robot in the simulation to perform the task
  • Teams closest to the success criteria win
Led by:
Anu Vatsa
Senior Account Solutions Architect, CoreWeave
Intermediate
~1.5 hours

From Sequence to Shipped: Fine-Tune a Small Protein Model

Fine-tune your own protein-confidence model and ship it to the W&B Model Registry with full lineage. By the end of this hands-on 90-minute workshop, you'll have a small, versioned, trustworthy model ready to plug into downstream research workflows and AI agents.

Led by:
Lorenzo Balderrama Porras
Position or Title, CoreWeave
Intermediate
1.5 hours

Ship It or Block It: Govern a Production AI Agent

Your AI agent passes every eval. The demo looks great. Then someone in compliance asks "what did we test, on which version, and who signed off?" and the room goes quiet. In this hands-on workshop, you'll run a real governance review gate on a live agent, and compete while you do it.

We'll cover the two sides of every release decision most teams keep separate: breaking the model, and defending it. First, as the red team, you'll launch jailbreaks, prompt injection, and PII-extraction attacks, scoring points for every exploit by severity. Then, as the governance board, you'll scope the risks, read the evals and red-team evidence, probe the model yourself, and decide whether to approve, request changes, or block, mapped to NIST AI RMF and the EU AI Act. A live leaderboard tracks it all on W&B Weave.

Led by:
Karan Nisar
Position or Title, CoreWeave