Inference Optimization Workshop

A technical session on latency reduction, throughput tuning, and cost-aware serving design for GenAI workloads.

Focus areas

  • Time to first token analysis
  • Cost-aware routing and batching

The session is intended for engineers operating serving infrastructure, not first-time GenAI users.