Jamie Dborin
Behind the Stack, Ep 3: How to Serve 100 Models on a Single GPU with No Cold Starts
In many orgs, self-hosting LLMs starts with a single model. Then comes a customisation request. Then another. And before long, you’ve got dozens of fine-tuned variants, each trained with LoRA or another parameter-efficient technique.
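To make concrete why these variants accumulate so easily, here is a minimal sketch using Hugging Face PEFT (the model name and rank are illustrative assumptions, not from the original post). A LoRA adapter adds only a small pair of low-rank matrices per targeted layer, so each fine-tuned variant is megabytes on top of a shared multi-gigabyte base model:

```python
# Minimal sketch: attaching a LoRA adapter to a base model with Hugging Face PEFT.
# The model name and hyperparameters below are illustrative, not prescriptive.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

config = LoraConfig(
    r=16,                                  # rank of the low-rank adapter matrices
    lora_alpha=32,                         # scaling factor for the adapter output
    target_modules=["q_proj", "v_proj"],   # attach adapters to attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
# Prints the adapter's trainable parameter count: a tiny fraction of the base
# model, which is why dozens of per-customer variants are cheap to produce.
model.print_trainable_parameters()
```

Because only the adapter weights differ between variants, the storage and training cost of each new customisation is small, and the fleet of fine-tunes grows quickly.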