AI Fleet Management 101
CoreWeave

On-Demand

Webinar: AI Fleet Management 101

45 min. on-demand

Chen Goldberg
SVP of Engineering
CoreWeave
Peter Salanki
CTO
CoreWeave

Improve the observability and reliability of your AI cluster.

Ready to elevate your Kubernetes cluster management skills? Watch CoreWeave’s CTO, Peter Salanki, and SVP of Engineering, Chen Goldberg, discuss strategies to improve full-stack observability and reliability with AI fleet management.

Gain practical knowledge of building more reliable and efficient AI operations in Kubernetes.

Key takeaways

  • Uncover critical components of a large-scale Kubernetes training cluster optimized for AI workloads.
  • Learn how advanced fleet management techniques can enhance cluster resilience and accelerate time-to-market for AI models.
  • Discover how automation can help detect, diagnose, and respond to job failures, minimizing downtime.
  • Gain insights via comprehensive monitoring across all layers of your AI infrastructure stack.

Keep job interruptions to a minimum, and know why they happen when they do. Register for the webinar today.

Speakers

Chen Goldberg
CoreWeave
SVP of Engineering
Peter Salanki
CoreWeave
CTO

Watch the webinar on-demand
