KubeCon 2024 Recap: AI Innovation and What's Ahead

KubeCon + CloudNativeCon North America 2024 just wrapped up, bringing together thousands of innovators, developers, and thought leaders in the Kubernetes and cloud-native ecosystem. This year’s event showcased how Kubernetes continues to evolve, particularly as a foundation for AI and machine learning applications. CoreWeave was proud to be part of this pivotal event, sharing our expertise in AI infrastructure and learning from some of the brightest minds in the industry.

KubeCon attracts some heavy hitters in the Kubernetes community, and this year’s edition was one for the books. For CoreWeave, this year was not just about showcasing what we do—it was about engaging with the community, exchanging ideas, and bringing valuable insights back to our teams. Here’s what we learned, what we shared, and how we’re keeping the conversation going.

What We Learned

KubeCon 2024 highlighted how Kubernetes is maturing as a platform for AI/ML workloads. The sessions and conversations sparked valuable insights for our team, including:

  • The Kubernetes Community Is Growing: The Kubernetes community has grown substantially since our last visit to KubeCon in 2023. Even more exciting than the sheer growth in numbers is the increasing diversity of open-source projects and AI use cases.
  • AI/ML Advancements Are Accelerating: Tools like Kueue, a Kubernetes-native job queueing system, are evolving rapidly and adding much-needed features for scheduling and scaling. This shift reflects how AI infrastructure needs are driving innovation across the Kubernetes ecosystem.
  • Customer Needs Are Dynamic: A recurring theme throughout the event was the importance of flexibility. Customers' needs evolve quickly, particularly when it comes to multi-cluster scheduling and cross-site coordination—key challenges we’re working to address at CoreWeave.
  • New Connections Are Critical: It was great to have one-on-one conversations with the people directly responsible for building the software we use. One standout was a sync with the Cilium team; Cilium is an open-source, cloud-native solution for network connectivity in Kubernetes.
  • Networking Is Essential: Conversations with experts about the networking stack reinforced the need for low-latency, high-performance connectivity, which aligns with the solutions we’re building for our clients.

Our team left KubeCon inspired by the community’s drive to tackle complex challenges and excited to apply what we have learned to our work.

What We Shared

CoreWeave’s contributions to KubeCon 2024 centered on our expertise in AI infrastructure, particularly the tools and strategies we use to manage large-scale Kubernetes clusters.

  • Keynote Presentation: Our CTO, Peter Salanki, and SVP of Engineering, Chen Goldberg, kicked off KubeCon with the keynote "Take a Peek Under the Hood of Cloud-Native AI at Scale." In front of more than 5,000 attendees, they took an in-depth look at the challenges of managing Kubernetes-based AI training clusters and highlighted CoreWeave’s approach to fleet management through Mission Control. Audience feedback was fantastic: attendees told us they were particularly interested in the performance metrics and interruption-reduction strategies covered in the talk.
  • theCUBE Interview: After the keynote, Peter and Chen sat down for an interview with theCUBE. The conversation covered CoreWeave's strategies for efficient fleet management, emphasizing observability, automation, and cluster health as the keys to optimal performance.
  • Mission Control Demos: At our booth, attendees received hands-on demonstrations of Mission Control, our comprehensive system for validation, monitoring, and automation. Visitors saw how Mission Control gives users visibility into cluster health from day one and helps maintain high reliability throughout the cluster’s lifecycle.
  • SUNK Demos: Attendees also had the opportunity to explore SUNK (Slurm on Kubernetes), CoreWeave’s solution for workload fungibility. Visitors were excited to see how SUNK lets training and inference workloads run simultaneously on the same cluster, streamlining resource use and reducing bottlenecks to maximize efficiency.
  • Booth Conversations: Engaging with attendees at the booth was another highlight. The feedback we received validated the solutions we’ve built for our customers and underscored the value of transparency, observability, and automation in large-scale cluster management.

Join the Conversation

Let’s keep the momentum from KubeCon going. Keynote speakers Peter Salanki and Chen Goldberg also presented a webinar on CoreWeave's Mission Control and how we optimize AI fleet management, and the full recording is available on demand.

Thank you to everyone who stopped by our booth, attended our keynote, or shared their insights with us at KubeCon. We look forward to continuing these conversations and driving innovation together.