Deployed apps scale automatically: instances are added when demand increases and removed when it subsides. Each layer of the architecture scales independently. Scaling parameters for the app service, worker service, and database are configurable per app, while the CDN requires no configuration.

Scaling architecture

CDN and static assets

Static assets (JavaScript, CSS, images) are served from a globally distributed edge network. Requests are served from the nearest edge location to the user, and the CDN scales automatically with no configuration required.

App service

The app service scales based on two metrics, evaluated independently. If either metric exceeds its target, new instances are added:
  • Request count — by default, when requests per minute per instance exceed 1000, additional instances are added to distribute the load.
  • CPU utilization — by default, when average CPU usage across instances exceeds 70%, additional instances are added.
When both metrics fall below their targets, instances are gradually removed down to the minimum. Cooldown periods prevent rapid oscillation — after scaling out, the system waits before adding another instance, and after scaling in, it waits before removing another. All app service scaling settings — including request count target, CPU target, instance limits, and cooldowns — are configurable per app.
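The decision rule above can be sketched as a small Python model. This is a minimal sketch under stated assumptions, not the platform's implementation: the class and field names are hypothetical, only the 1000 requests-per-minute and 70% CPU defaults come from this page, the instance limits and cooldown values in the usage lines are made up for illustration, and adding or removing exactly one instance per evaluation is a simplification.

```python
from dataclasses import dataclass


@dataclass
class AppScalingSettings:
    # Hypothetical field names; only the two metric targets are documented defaults.
    min_instances: int
    max_instances: int
    scale_out_cooldown_s: float
    scale_in_cooldown_s: float
    requests_per_minute_target: float = 1000.0  # per instance (documented default)
    cpu_target_percent: float = 70.0            # average across instances (documented default)


@dataclass
class AppServiceScaler:
    settings: AppScalingSettings
    instances: int
    last_scale_out: float = float("-inf")
    last_scale_in: float = float("-inf")

    def evaluate(self, rpm_per_instance: float, avg_cpu_percent: float, now: float) -> int:
        s = self.settings
        over_requests = rpm_per_instance > s.requests_per_minute_target
        over_cpu = avg_cpu_percent > s.cpu_target_percent
        if over_requests or over_cpu:
            # Either metric above its target triggers scale-out, unless the
            # scale-out cooldown since the last scale-out has not yet elapsed.
            if (self.instances < s.max_instances
                    and now - self.last_scale_out >= s.scale_out_cooldown_s):
                self.instances += 1  # assumption: one instance per step
                self.last_scale_out = now
        elif (rpm_per_instance < s.requests_per_minute_target
              and avg_cpu_percent < s.cpu_target_percent):
            # Both metrics below target: remove one instance, never going below
            # the minimum, and only after the scale-in cooldown has elapsed.
            if (self.instances > s.min_instances
                    and now - self.last_scale_in >= s.scale_in_cooldown_s):
                self.instances -= 1
                self.last_scale_in = now
        return self.instances


# Illustrative values for limits and cooldowns (not platform defaults).
settings = AppScalingSettings(min_instances=1, max_instances=5,
                              scale_out_cooldown_s=60, scale_in_cooldown_s=300)
scaler = AppServiceScaler(settings, instances=2)
print(scaler.evaluate(1500, 40, now=0))   # 3: requests over target
print(scaler.evaluate(1500, 40, now=30))  # 3: still inside scale-out cooldown
print(scaler.evaluate(800, 85, now=90))   # 4: CPU over target
print(scaler.evaluate(200, 20, now=100))  # 3: both metrics below target
```

Keeping separate timestamps for scale-out and scale-in mirrors the two cooldowns described above: a recent scale-out only delays further scale-outs, and a recent scale-in only delays further scale-ins.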
Deployments use a rolling update strategy: new instances are launched and must pass health checks before traffic is shifted to them. Previous instances continue serving requests until the new ones are healthy, so deployments do not cause downtime or interfere with autoscaling.
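A rolling update can be modelled as a sequence of traffic snapshots, which makes the no-downtime property easy to check. The sketch below is hypothetical: the function name is illustrative, it assumes a one-for-one replacement of previous instances by new ones, and stopping the rollout when a new instance fails its health check is an assumed policy that this page does not describe.

```python
from typing import Callable, List


def rolling_update(old: List[str], new: List[str],
                   is_healthy: Callable[[str], bool]) -> List[List[str]]:
    """Simulate a one-for-one rolling update and return each traffic snapshot.

    Assumes len(new) == len(old) and replaces instances one at a time.
    """
    serving = list(old)
    snapshots = [list(serving)]
    for new_instance, old_instance in zip(new, old):
        if not is_healthy(new_instance):
            # An unhealthy instance never receives traffic; the remaining
            # previous instances keep serving (assumption: the rollout stops).
            break
        serving.append(new_instance)    # shift traffic to the healthy new instance
        snapshots.append(list(serving))
        serving.remove(old_instance)    # only then retire one previous instance
        snapshots.append(list(serving))
    return snapshots


steps = rolling_update(["old-1", "old-2"], ["new-1", "new-2"], lambda i: True)
assert all(steps)   # at least one instance is serving at every step
print(steps[-1])    # ['new-1', 'new-2']
```

Because a previous instance is retired only after its replacement has joined the serving set, every snapshot contains at least one instance.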

Worker service

The worker service scales independently of the app service, based on workflow demand: the number of workflows currently running or waiting to run. By default, the per-instance scaling target is calculated as max concurrency × target demand (4 × 50% = 2). When demand per instance exceeds this value, additional instances are added.

A lower target demand value scales more aggressively, adding capacity before instances are fully utilized. A higher value waits until instances are closer to their concurrency limit before scaling.

The worker service supports scale to zero. When min instances is set to 0, all worker instances are removed when there is no workflow demand, and new instances are provisioned automatically when workflows are submitted.

All worker service scaling settings, including concurrency, target demand, instance limits, and cooldowns, are configurable per app.
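The worker target arithmetic can be sketched in Python. The field and function names are hypothetical, and sizing the pool as ceil(demand ÷ target per instance), clamped to the instance limits, is an assumed target-tracking rule; this page only states that instances are added when demand per instance exceeds the target. The defaults (max concurrency 4, target demand 50%) are the documented ones.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkerScalingSettings:
    # Hypothetical field names; the defaults match the documented 4 × 50%.
    min_instances: int
    max_instances: int
    max_concurrency: int = 4
    target_demand: float = 0.5

    @property
    def target_per_instance(self) -> float:
        # Workflows (running + waiting) one instance should carry: 4 × 0.5 = 2.
        return self.max_concurrency * self.target_demand


def desired_workers(demand: int, s: WorkerScalingSettings) -> int:
    """Instances needed so demand per instance stays at or below the target.

    Assumption: proportional (target-tracking) sizing, clamped to the limits.
    With min_instances=0 and no demand this returns 0 (scale to zero).
    """
    needed = math.ceil(demand / s.target_per_instance)
    return max(s.min_instances, min(s.max_instances, needed))


default = WorkerScalingSettings(min_instances=0, max_instances=10)
print(desired_workers(0, default))     # 0: no demand, scaled to zero
print(desired_workers(5, default))     # 3: 5 workflows / 2 per instance, rounded up

aggressive = WorkerScalingSettings(min_instances=0, max_instances=10, target_demand=0.25)
print(desired_workers(5, aggressive))  # 5: target drops to 1 workflow per instance
```

Halving target demand to 25% lowers the per-instance target to 1, so the same five workflows call for five instances instead of three, which is the more aggressive behavior described above.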

Database

Each app gets a dedicated Postgres database that scales independently. Both Synthetiq Hosted and BYOI use serverless Postgres that auto-scales within configurable capacity limits. See Configuration for details.