
Case study: cutting API latency with a disciplined regional rollout

Moving core APIs closer to APAC and MENA users—without a big-bang cutover. Measurements, parallel stacks, and rollback you can actually run.


Context

A platform serving students and institutions across several continents ran primarily out of a single cloud region. p95 API latency from Kathmandu, Dubai, and Lagos routinely sat 3× higher than from North America. Mobile web felt “fine” on Wi‑Fi but fragile on 4G once JSON payloads and waterfall calls stacked up.

Leadership wanted improvement before rewriting the whole backend. The mandate: measurable latency wins, minimal downtime, and a credible rollback if something went wrong.

Baseline (honest numbers)

We picked three synthetic checks and three RUM-derived dashboards:

| Check | Before (p95, ms) | Notes |
| --- | --- | --- |
| Auth token exchange | 420–680 | TLS + RTT dominated |
| Application list (paginated) | 510–890 | N+1-ish queries amplified RTT |
| File metadata (signed URL) | 380–620 | Round trip to origin region |

RUM confirmed the same story: TTFB on document requests correlated with distance to origin, not with CPU load on the cluster.
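For readers who want to reproduce the baseline numbers: the checks above aggregate raw latency samples into a p95. A minimal sketch of that aggregation, using the nearest-rank method (function name and method choice are illustrative, not the platform's actual tooling):

```python
import math

def p95(samples_ms):
    """95th-percentile latency via nearest-rank: sort the samples, then take
    the value at ceil(0.95 * n) - 1. This is how the table rows above read."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]
```

Whatever percentile implementation you use, pin it down once and use it everywhere, or your "before" and "after" tables quietly measure different things.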

Strategy: parallel region, not lift-and-shift day one

  1. Stand up a read-optimized path in Singapore (API + Redis replica + read pool).
  2. Route only read-heavy, idempotent endpoints to the new region via GeoDNS with a low TTL.
  3. Keep writes on the primary region initially, with async replication we already trusted.
  4. Feature flag per tenant cohort so we could steer 10% → 50% → 100% of traffic.

This avoided rewriting data ownership on day one while still moving bytes closer to users.
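The per-tenant cohort flag in step 4 can be sketched with deterministic hashing, so a tenant's assignment is stable as the rollout percentage climbs (salt value and function name are illustrative assumptions, not our actual flag system):

```python
import hashlib

def in_regional_cohort(tenant_id: str, rollout_pct: int,
                       salt: str = "apac-read-path") -> bool:
    """Bucket each tenant into 0-99 by hashing tenant_id + a rollout salt.
    Because the hash is deterministic, a tenant admitted at 10% stays in
    the cohort at 50% and 100% -- no tenant flaps between regions."""
    digest = hashlib.sha256(f"{salt}:{tenant_id}".encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < rollout_pct
```

The important property is monotonicity: raising the percentage only ever adds tenants, which keeps debugging sane mid-rollout.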

Implementation notes

Connection pools

Opening one pool per instance against a faraway primary still hurt when bursts hit. We:

  • Right-sized pool limits per service so we did not exhaust DB max_connections when regional replicas doubled instance counts.
  • Added pool wait-time metrics—when waits climbed, we scaled replicas before CPUs maxed out.
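The right-sizing arithmetic is simple but easy to skip: divide the primary's connection budget across every instance in every region, with headroom reserved for admin and replication sessions. A sketch (parameter names are illustrative):

```python
def pool_limit_per_instance(db_max_connections: int,
                            reserved_admin: int,
                            services: dict) -> int:
    """Split the database's connection budget across all instances.
    `services` maps service name -> total instance count across regions,
    which is exactly the number that doubles when a new region comes up."""
    total_instances = sum(services.values())
    budget = db_max_connections - reserved_admin
    if budget <= 0 or total_instances == 0:
        raise ValueError("no connection budget to divide")
    return budget // total_instances
```

Re-run this whenever instance counts change; the failure mode it prevents is the new region exhausting `max_connections` on a primary that was sized for one region.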

Caching boundaries

We cached stable reads (reference data, feature flags) at the edge with short TTLs and explicit cache keys per locale. We did not cache personalized lists without a hard invalidation story—stale “application status” is worse than +200ms latency.
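The caching rule above—short TTLs, explicit per-locale keys, nothing personalized—can be sketched as a toy in-memory cache (class name and the injected clock are illustrative; the real edge cache was the CDN's):

```python
import time

class EdgeCache:
    """Toy TTL cache keyed by (resource, locale). Short TTLs for stable
    reference data only; personalized lists deliberately never go in here,
    because a stale entry is worse than an extra round trip."""

    def __init__(self, ttl_s: float, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock          # injectable for tests
        self._store = {}

    def _key(self, resource: str, locale: str):
        return (resource, locale)   # explicit key: no accidental cross-locale hits

    def get(self, resource: str, locale: str):
        entry = self._store.get(self._key(resource, locale))
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl_s:
            del self._store[self._key(resource, locale)]  # expire on read
            return None
        return value

    def put(self, resource: str, locale: str, value):
        self._store[self._key(resource, locale)] = (value, self.clock())
```

Making the locale part of the key is the whole point: it turns "why is this user seeing Arabic reference data?" from an incident into an impossibility.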

Rollback

A GeoDNS toggle plus the feature flag meant rollback was "flip traffic, then drain pools," not "restore a database." We rehearsed it once in staging; the runbook completed in under 15 minutes of wall-clock time.
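The runbook's shape fits in a dozen lines. A sketch of the sequencing, with the three hooks (`flip_traffic`, `drain_pools`, `verify`) left as assumptions you wire to your own DNS provider, pool manager, and health checks:

```python
import time

def rollback(flip_traffic, drain_pools, verify, max_wait_s=900):
    """Rehearsed rollback order: flip GeoDNS + flag back to the primary,
    wait for regional pools to drain, then verify primary-only serving.
    All three callables are hypothetical hooks, not a real provider API."""
    flip_traffic()                      # step 1: stop sending new traffic regionally
    deadline = time.monotonic() + max_wait_s
    while not drain_pools():            # step 2: poll until in-flight work finishes
        if time.monotonic() > deadline:
            raise TimeoutError("regional pools did not drain within budget")
        time.sleep(5)
    return verify()                     # step 3: health check before declaring done
```

The ordering matters: draining before verifying means the health check measures the state users will actually see, and the timeout keeps a stuck drain from silently stalling the rollback.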

Results

After the full cutover of read paths, plus moving upload initiation to a regional bucket with S3 Transfer Acceleration where applicable:

| Check | After (p95, ms) | Main driver |
| --- | --- | --- |
| Auth token exchange | 260–410 | RTT + edge TLS termination |
| Application list | 320–520 | Fewer round trips + regional DB reads |
| File metadata | 210–380 | Regional origin for signed URL step |

Numbers varied by ISP and time of day; the direction held across two intake cycles.

What I’d do earlier next time

  • Instrument per-query timings broken down by DB region from day one—otherwise you argue about “the network” forever.
  • Put connection pool dashboards next to latency dashboards; they are the same story told twice.
  • Write the rollback runbook before the first 10% traffic shift, not after the first incident dress rehearsal.
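The first bullet—per-query timings tagged by DB region—needs almost no machinery to start. A sketch using an in-memory sink (in production you would emit to your metrics pipeline instead; the names here are illustrative):

```python
import time
from collections import defaultdict

# (region, query_name) -> list of durations in ms. Illustrative sink only.
timings_ms = defaultdict(list)

def timed_query(region: str, name: str):
    """Context manager recording wall-clock duration under (region, name),
    so latency arguments about 'the network' can be settled with data."""
    class _Timer:
        def __enter__(self):
            self.start = time.perf_counter()

        def __exit__(self, *exc):
            elapsed_ms = (time.perf_counter() - self.start) * 1000
            timings_ms[(region, name)].append(elapsed_ms)
            return False    # never swallow exceptions from the query
    return _Timer()
```

Usage is one line around each query call site, e.g. `with timed_query("ap-southeast-1", "application_list"): rows = db.fetch(...)`—and the region tag is what makes the resulting dashboard an argument-ender.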

Takeaway

Regional rollout is a product and ops problem as much as an infra one. If you can’t explain who gets which stack and how to undo it in minutes, you don’t have a rollout—you have a hope.