LoRA multi-tenant inference serving
A serving approach for multi-tenant workloads in which a single shared base model is combined with per-tenant LoRA adapters, so tenants share compute and weights for the base model while adapter weights — and therefore tenant-specific behavior — remain isolated.
vLLM
LoRA adapters
object storage
Kubernetes
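The core mechanic can be sketched in plain Python: a registry maps each tenant to exactly one adapter (fetched from object storage in a real deployment), and a mixed batch is grouped by resolved adapter so a single base-model forward pass can apply a different LoRA delta per group. This is an illustrative sketch, not vLLM's implementation; the names `AdapterRegistry`, `Request`, and `group_batch` are assumptions invented here (vLLM itself exposes multi-LoRA serving natively via its `LoRARequest` API).

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class Request:
    tenant_id: str
    prompt: str

class AdapterRegistry:
    """Maps each tenant to its own LoRA adapter; tenants never share adapters."""
    def __init__(self) -> None:
        self._adapters: dict[str, str] = {}

    def register(self, tenant_id: str, adapter_path: str) -> None:
        # adapter_path would point into object storage (e.g. an S3 key).
        self._adapters[tenant_id] = adapter_path

    def resolve(self, tenant_id: str) -> str:
        # Fail closed: an unknown tenant gets an error, never another
        # tenant's adapter -- this is the isolation boundary.
        if tenant_id not in self._adapters:
            raise KeyError(f"no adapter registered for tenant {tenant_id!r}")
        return self._adapters[tenant_id]

def group_batch(requests: list[Request],
                registry: AdapterRegistry) -> dict[str, list[str]]:
    """Group a mixed batch by resolved adapter path, so the shared base
    model runs once per batch while each group uses its own LoRA delta."""
    groups: dict[str, list[str]] = defaultdict(list)
    for req in requests:
        groups[registry.resolve(req.tenant_id)].append(req.prompt)
    return dict(groups)
```

In a real stack the grouping is done per token batch inside the engine rather than per request list, and adapters are paged in and out of GPU memory on demand; the fail-closed lookup is the part that preserves isolation.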