Scaling architecture
CDN and static assets
Static assets (JavaScript, CSS, images) are served from a globally distributed edge network. Requests are served from the nearest edge location to the user, and the CDN scales automatically with no configuration required.
App service
The app service scales based on two metrics, evaluated independently. If either metric exceeds its target, new instances are added:
- Request count: by default, when requests per minute per instance exceed 1000, additional instances are added to distribute the load.
- CPU utilization: by default, when average CPU usage across instances exceeds 70%, additional instances are added.
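The either-metric rule above can be sketched as a small predicate. This is only an illustration of the documented behavior, not the platform's implementation: the function name `should_scale_out` and its parameter names are hypothetical, and the default values are the ones stated in the list.

```python
def should_scale_out(
    requests_per_minute_per_instance: float,
    average_cpu_percent: float,
    request_target: float = 1000.0,  # documented default: 1000 requests/min/instance
    cpu_target: float = 70.0,        # documented default: 70% average CPU
) -> bool:
    """Hypothetical helper: True when either metric exceeds its target."""
    # Each metric is evaluated on its own; one breach is enough to scale out.
    over_requests = requests_per_minute_per_instance > request_target
    over_cpu = average_cpu_percent > cpu_target
    return over_requests or over_cpu


# High request rate with low CPU still triggers scale-out.
print(should_scale_out(1200, 30))
# Low request rate with high CPU also triggers scale-out.
print(should_scale_out(500, 85))
```

The sketch treats "exceeds" as strictly greater than the target, so an instance sitting exactly at 1000 requests per minute or 70% CPU does not trigger scaling here. Whether the platform treats the boundary the same way is an assumption.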
Deployments use a rolling update strategy: new instances are launched and must pass health checks before traffic is shifted to them. Previous instances continue serving requests until the new ones are healthy, so deployments neither cause downtime nor affect autoscaling.
Worker service
The worker service scales independently of the app service, based on workflow demand, that is, the number of workflows currently running or waiting to run. By default, the scaling target is calculated as max concurrency × target demand (4 × 50% = 2). When demand per instance exceeds this value, additional instances are added.
A lower target demand value scales more aggressively, adding capacity before instances are fully utilized. A higher value waits until instances are closer to their concurrency limit before scaling.
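To make the arithmetic concrete, the following sketch computes the scaling target and checks whether a given demand per instance exceeds it. The function names `scaling_target` and `exceeds_target` are hypothetical, chosen for illustration. The defaults (max concurrency 4, target demand 50%) come from the text above.

```python
def scaling_target(max_concurrency: int = 4, target_demand: float = 0.5) -> float:
    """Hypothetical helper: max concurrency x target demand."""
    return max_concurrency * target_demand


def exceeds_target(
    demand_per_instance: float,
    max_concurrency: int = 4,
    target_demand: float = 0.5,
) -> bool:
    """Hypothetical helper: True when demand per instance is above the target."""
    return demand_per_instance > scaling_target(max_concurrency, target_demand)


# Documented defaults: 4 x 50% = 2 workflows per instance.
print(scaling_target())
# A lower target demand (25%) gives a target of 1, so scaling starts earlier.
print(scaling_target(target_demand=0.25))
# A higher target demand (75%) gives a target of 3, so scaling starts later.
print(scaling_target(target_demand=0.75))
```

With the defaults, an instance handling 3 workflows exceeds the target of 2 even though it is still below its concurrency limit of 4, which is the headroom that a lower target demand buys.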
The worker service supports scale to zero — when min instances is set to 0, all worker instances are removed when there is no workflow demand, and new instances are provisioned automatically when workflows are submitted.
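The effect of min instances on scale to zero can be illustrated with a simplified instance-count calculation. The source does not describe how the platform derives the instance count, so the formula here (demand divided by the per-instance target, rounded up, then clamped between the limits) is an assumption, as are the name `desired_instances` and the example upper limit of 10.

```python
import math


def desired_instances(
    workflow_demand: int,
    min_instances: int = 0,
    max_instances: int = 10,            # hypothetical limit, for illustration only
    target_per_instance: float = 2.0,   # default target from the text: 4 x 50%
) -> int:
    """Hypothetical helper: instance count for a given workflow demand."""
    # Assumption: enough instances to keep demand per instance at or below the target.
    needed = math.ceil(workflow_demand / target_per_instance)
    # Clamp to the configured instance limits.
    return max(min_instances, min(max_instances, needed))


# No demand and min instances of 0: every worker instance is removed.
print(desired_instances(0))
# No demand but min instances of 1: one instance stays warm.
print(desired_instances(0, min_instances=1))
# Five running or waiting workflows at a target of 2 per instance.
print(desired_instances(5))
```

Under these assumptions, setting min instances to 1 trades idle cost for avoiding the provisioning delay when the first workflow arrives.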
All worker service scaling settings — including concurrency, target demand, instance limits, and cooldowns — are configurable per app.

