Scaling Generative AI with Confidence: LLM-d and OpenShift for Distributed Inference

As large language models grow in capability, they also grow in complexity—requiring GPU memory and compute beyond what most single systems can provide. For infrastructure and operations teams, this creates new challenges around deployment, scheduling, cost management, and reliability.

In this session, we’ll introduce LLM-d, an open, Kubernetes-native framework for distributed inference. You’ll learn how Red Hat is leading efforts across the community to shape LLM-d into a scalable, operator-friendly platform for production GenAI.

We’ll demonstrate how LLM-d integrates into OpenShift AI, supports multi-GPU workloads, and provides:

  • Declarative model deployment using Kubernetes-native APIs
  • Distributed serving for large models such as Llama 3 and Granite, as sketched below
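
For a concrete feel of the declarative pattern, here is a minimal sketch using the official Kubernetes Python client. The LLMInferenceService kind, the serving.llm-d.example API group, and every field in the spec are hypothetical placeholders for illustration; they are not the published llm-d schema.

```python
# Minimal sketch: declaratively request a distributed model deployment
# through a hypothetical custom resource (not the actual llm-d API).
from kubernetes import client, config

config.load_kube_config()  # authenticate with the current kubeconfig context

manifest = {
    "apiVersion": "serving.llm-d.example/v1alpha1",  # hypothetical group/version
    "kind": "LLMInferenceService",                   # hypothetical kind
    "metadata": {"name": "granite-demo", "namespace": "genai"},
    "spec": {
        "model": "ibm-granite/granite-8b-instruct",  # illustrative model reference
        "replicas": 2,                               # serving replicas to reconcile
        "resources": {"limits": {"nvidia.com/gpu": "4"}},  # GPUs per replica
    },
}

# Submit the resource; a controller would reconcile the deployment from here.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.llm-d.example",
    version="v1alpha1",
    namespace="genai",
    plural="llminferenceservices",
    body=manifest,
)
```

The value of this shape is operational: scaling, upgrading, or relocating a model becomes a spec change that the cluster reconciles, rather than a hand-run, imperative rollout.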

Event details

Date: Thursday, 11 September 2025
Time: 10:30 AM IST | 1 PM SGT | 3 PM AEST

Speakers
