Scaling Generative AI with Confidence: LLM-d and OpenShift for Distributed Inference

As large language models grow in capability, they also grow in complexity—requiring GPU memory and compute beyond what most single systems can provide. For infrastructure and operations teams, this creates new challenges around deployment, scheduling, cost management, and reliability.

In this session, we’ll introduce LLM-d, an open, Kubernetes-native framework for distributed inference. You’ll learn how Red Hat is leading efforts across the community to shape LLM-d into a scalable, operator-friendly platform for production GenAI.

We’ll demonstrate how LLM-d integrates into OpenShift AI, supports multi-GPU workloads, and provides:

  • Declarative model deployment using Kubernetes-native APIs
  • Distributed serving for large models such as Llama 3 and Granite, as sketched below
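
For a concrete feel of the declarative pattern, here is a minimal sketch using the official Kubernetes Python client. The LLMInferenceService kind, the serving.llm-d.example API group, and every field in the spec are hypothetical placeholders for illustration; they are not the published llm-d schema.

```python
# Minimal sketch: declaratively request a distributed model deployment
# through a hypothetical custom resource (not the actual llm-d API).
from kubernetes import client, config

config.load_kube_config()  # authenticate with the current kubeconfig context

manifest = {
    "apiVersion": "serving.llm-d.example/v1alpha1",  # hypothetical group/version
    "kind": "LLMInferenceService",                   # hypothetical kind
    "metadata": {"name": "granite-demo", "namespace": "genai"},
    "spec": {
        "model": "ibm-granite/granite-8b-instruct",  # illustrative model reference
        "replicas": 2,                               # serving replicas to reconcile
        "resources": {"limits": {"nvidia.com/gpu": "4"}},  # GPUs per replica
    },
}

# Submit the resource; a controller would reconcile the deployment from here.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.llm-d.example",
    version="v1alpha1",
    namespace="genai",
    plural="llminferenceservices",
    body=manifest,
)
```

The value of this shape is operational: scaling, upgrading, or relocating a model becomes a spec change that the cluster reconciles, rather than a hand-run, imperative rollout.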

Event details

Date: Thursday, 11 September 2025
Time: 10:30 AM IST | 1 PM SGT | 3 PM AEST

Speakers
