HeadlinesBriefing.com

Auto-Scaling ComfyUI-API on Azure AKS

DEV Community

A solutions architect migrated a ComfyUI and ComfyUI-API Stable Diffusion orchestrator from a single GPU VM to Azure Kubernetes Service (AKS). The goal was to solve scalability and cost issues by separating the training, fine-tuning, and inference workloads that had previously all run on one machine.

After evaluating tools like vLLM and KServe, the team chose AKS with KEDA for HTTP-based autoscaling. They containerized the application with a custom Dockerfile; models such as `dreamshaper_8.safetensors` are downloaded at runtime rather than baked into the image, which keeps the image lean. The deployment includes a GPU node pool (`Standard_NC4as_T4_v3`) and a separate CPU pool for other workloads.
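The runtime-download idea can be illustrated with a small container startup script. This is a minimal sketch, not the article's actual code: the `ensure_model` helper, the `MODEL_URL` and `MODEL_PATH` environment variables, and the default `models/checkpoints/` location are all assumptions made for illustration.

```python
import os
import urllib.request
from pathlib import Path


def ensure_model(dest, url, fetch=urllib.request.urlretrieve):
    """Download `url` to `dest` unless the file is already present.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    dest = Path(dest)
    if dest.exists():
        # Already present (e.g. on a persistent volume): skip the download.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so a crash never leaves a
    # half-written file under the final name.
    tmp = dest.with_name(dest.name + ".part")
    fetch(url, str(tmp))
    os.replace(tmp, dest)
    return dest


if __name__ == "__main__":
    # Hypothetical configuration: both variables are assumptions, not from the article.
    model_url = os.environ.get("MODEL_URL")
    model_path = os.environ.get(
        "MODEL_PATH", "models/checkpoints/dreamshaper_8.safetensors"
    )
    if model_url:
        ensure_model(model_path, model_url)
```

The trade-off of this pattern is a smaller image in exchange for a slower first start, since the first pod on a fresh node has to fetch the weights before it can serve requests.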

KEDA's HTTP add-on scales the ComfyUI-API deployment between zero and a maximum of two replicas based on incoming traffic. After five minutes without requests, it scales the deployment back to zero, so idle periods do not keep GPU-backed replicas running. The summary presents the result as a production-ready, cost-effective way to orchestrate complex AI workflows in the cloud.
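The scaling policy described above (zero to two replicas, five-minute scale-down) might be expressed as an `HTTPScaledObject` resource for the KEDA HTTP add-on. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. The field names follow the add-on's `http.keda.sh/v1alpha1` schema as documented for recent releases and should be checked against the installed version; the resource name, host, service name, and port are placeholders, not values from the article.

```python
import json


def http_scaled_object(name, host, service, port,
                       max_replicas=2, scaledown_seconds=300):
    """Build an HTTPScaledObject manifest that allows scaling to zero."""
    return {
        "apiVersion": "http.keda.sh/v1alpha1",
        "kind": "HTTPScaledObject",
        "metadata": {"name": name},
        "spec": {
            # Requests for these hosts are routed through the add-on's interceptor.
            "hosts": [host],
            "scaleTargetRef": {
                "name": name,
                "kind": "Deployment",
                "apiVersion": "apps/v1",
                "service": service,
                "port": port,
            },
            # min=0 enables scale-to-zero; max=2 matches the cap in the text.
            "replicas": {"min": 0, "max": max_replicas},
            # Seconds without traffic before scaling down (five minutes).
            "scaledownPeriod": scaledown_seconds,
        },
    }


# Placeholder values: every name and the port here are hypothetical.
manifest = http_scaled_object(
    name="comfyui-api",
    host="comfyui.example.com",
    service="comfyui-api",
    port=3000,
)
print(json.dumps(manifest, indent=2))
```

Scaling the deployment to zero only stops the pods. Whether the GPU nodes themselves are released depends on how the node pool's autoscaler is configured, which this sketch does not cover.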