Inference Optimization Workshop
A technical session on latency reduction, throughput tuning, and cost-aware serving design for GenAI workloads.
Focus areas
- Time to first token analysis
- Throughput tuning
- Cost-aware routing and batching
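As a concrete illustration of the first focus area, time to first token (TTFT) can be measured from a streamed response trace. The following is a minimal sketch, not code from the workshop: the `time_to_first_token` helper and the `(timestamp, token)` trace format are illustrative assumptions.

```python
def time_to_first_token(token_stream, start_time):
    """Seconds from request start until the first token arrives.

    token_stream: iterable of (timestamp, token) pairs, timestamps in
    seconds on the same clock as start_time. The signature and trace
    format here are hypothetical, chosen for illustration.
    """
    for timestamp, _token in token_stream:
        # The first pair yielded is the first token; TTFT is its delay.
        return timestamp - start_time
    return None  # the stream produced no tokens at all

# Simulated trace: request sent at t=0.0, first token arriving at t=0.42
trace = [(0.42, "Hello"), (0.55, ","), (0.70, " world")]
print(time_to_first_token(trace, start_time=0.0))  # 0.42
```

In production, the timestamps would come from instrumentation around the serving client rather than a prebuilt list; aggregating TTFT across requests (e.g. p50/p99) is what makes the metric actionable.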
The session is intended for engineers operating serving infrastructure, not first-time GenAI users.