If your Unity NPC voice cuts out with HttpRequestException: 502 Bad Gateway from the ElevenLabs Conversational AI Unity SDK, the provider edge is overloaded or your client has no graceful degradation path.
Mid-2026 indie-tier pricing pulled many small teams onto ElevenLabs at once; 502 spikes during Q3 evening peak hours are now the dominant Discord report for Unity voice NPCs. This article adds retry discipline, a circuit breaker, cache warming, and a local Piper fallback so dialogue never goes silent.
Problem summary
Common symptoms:
- Hero NPC speaks on first line, then silence after the player triggers the next branch.
- Console shows 502 Bad Gateway or HttpRequestException with the ElevenLabs endpoint in the stack trace.
- Retries without backoff make failures worse during peak hours.
- No offline voice path when the API is unhealthy.
Why this matters now:
- 2026 Conversational AI tier changes increased concurrent indie traffic on shared edges.
- Players interpret voice dropouts as broken game quality, not transient cloud load.
- Fest demos with voice-forward trailers fail hour-one reviews when the first spike hits.
Root causes
- Provider edge overload during regional peak (502 is upstream, not your API key).
- SDK default behavior: either no retry at all, or immediate tight retry loops that amplify load.
- No local fallback — silence instead of degraded audio.
- Cold voice cache — every line hits the network on first play.
- No circuit breaker — one NPC can spam failing requests for the whole scene.
Step-by-step fix
Step 1 — Wrap SDK calls with bounded exponential backoff
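Retry only on transient 5xx responses (502, 503, 504). Never retry 4xx such as 401 or 403; those indicate a credential or request problem that a retry cannot fix.
Suggested policy:
| Retry | Base delay | Jitter |
|---|---|---|
| 1 | 250 ms | 0–100 ms |
| 2 | 500 ms | 0–150 ms |
| 3 | 1000 ms | 0–200 ms |
Stop after the third retry (four calls in total).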
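    // Needs: using System.Net.Http; using System.Threading;
    //        using System.Threading.Tasks; using UnityEngine;
    async Task<AudioClip> SynthesizeWithRetryAsync(string text, CancellationToken ct)
    {
        // Base delay in ms before retry 1, 2 and 3.
        var delays = new[] { 250, 500, 1000 };
        for (int i = 0; i <= delays.Length; i++)
        {
            try
            {
                return await _elevenLabsClient.SynthesizeAsync(text, ct);
            }
            // IsRetryable502 is your own helper; it should accept 502, 503 and 504 only.
            catch (HttpRequestException ex) when (IsRetryable502(ex))
            {
                // Retries exhausted: leave the loop so the typed exception below is thrown.
                if (i == delays.Length) break;
                // The int overload of Random.Range excludes the upper bound, hence 101.
                var jitter = Random.Range(0, 101 + i * 50);
                await Task.Delay(delays[i] + jitter, ct);
            }
        }
        throw new VoiceSynthesisUnavailableException("ElevenLabs exhausted retries");
    }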
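The policy table can also be kept in one small, testable helper so the numbers are not scattered through dialogue code. The following is a minimal sketch, not part of the ElevenLabs SDK: VoiceRetryPolicy, IsRetryableStatus, and DelayMs are hypothetical names, and it assumes you can obtain the numeric HTTP status from your client.

```csharp
using System;

// Hypothetical helper; none of these names come from the ElevenLabs SDK.
public static class VoiceRetryPolicy
{
    // Base delay in ms before retry 1, 2 and 3, taken from the policy table.
    private static readonly int[] BaseDelaysMs = { 250, 500, 1000 };

    public static int MaxRetries
    {
        get { return BaseDelaysMs.Length; }
    }

    // Retry only transient gateway errors; 4xx such as 401/403 will not heal on retry.
    public static bool IsRetryableStatus(int statusCode)
    {
        return statusCode == 502 || statusCode == 503 || statusCode == 504;
    }

    // retryIndex is zero-based: 0 -> 250 ms + up to 100 ms, 1 -> 500 ms + up to 150 ms, ...
    public static int DelayMs(int retryIndex, System.Random rng)
    {
        if (retryIndex < 0 || retryIndex >= BaseDelaysMs.Length)
            throw new ArgumentOutOfRangeException("retryIndex");
        int maxJitter = 100 + retryIndex * 50;
        // System.Random.Next(max) excludes max, so add 1 to include the upper bound.
        return BaseDelaysMs[retryIndex] + rng.Next(maxJitter + 1);
    }
}
```

System.Random is used here instead of UnityEngine.Random so the helper can run in edit-mode tests and off the main thread.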
Step 2 — Add a circuit breaker (30-second window)
Track consecutive 5xx in a rolling window:
- Open circuit after 3 failures within 30 seconds.
- While open: skip live API; route to fallback immediately.
- Half-open after 60 seconds: allow one probe request.
- Close circuit on probe success.
Log circuit_state (closed / open / half_open) in your dialogue telemetry.
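One way to express these rules is a small state machine with an injectable clock. This is a sketch under assumptions: VoiceCircuitBreaker and its members are hypothetical names, a failed probe is assumed to reopen the circuit (the rules above do not say), and the class is not thread-safe, so call it from the Unity main thread.

```csharp
using System;
using System.Collections.Generic;

public enum CircuitState { Closed, Open, HalfOpen }

// Hypothetical breaker: 3 failures within 30 s opens; half-open after 60 s.
public sealed class VoiceCircuitBreaker
{
    private const int FailureThreshold = 3;
    private static readonly TimeSpan FailureWindow = TimeSpan.FromSeconds(30);
    private static readonly TimeSpan OpenDuration = TimeSpan.FromSeconds(60);

    private readonly Func<DateTime> _now;
    private readonly Queue<DateTime> _failures = new Queue<DateTime>();
    private DateTime _openedAt;
    private bool _probeInFlight;

    public CircuitState State { get; private set; }

    public VoiceCircuitBreaker(Func<DateTime> now = null)
    {
        _now = now ?? (() => DateTime.UtcNow);
        State = CircuitState.Closed;
    }

    // True: the caller may hit the live API. False: route to fallback immediately.
    public bool TryAcquire()
    {
        if (State == CircuitState.Open && _now() - _openedAt >= OpenDuration)
        {
            State = CircuitState.HalfOpen;
            _probeInFlight = false;
        }
        if (State == CircuitState.Closed) return true;
        if (State == CircuitState.HalfOpen && !_probeInFlight)
        {
            _probeInFlight = true; // allow exactly one probe request
            return true;
        }
        return false;
    }

    public void RecordSuccess()
    {
        if (State == CircuitState.Open) return; // ignore stale successes while open
        _failures.Clear();
        State = CircuitState.Closed;
    }

    public void RecordFailure()
    {
        DateTime now = _now();
        if (State == CircuitState.HalfOpen) { Open(now); return; } // assumption: failed probe reopens
        _failures.Enqueue(now);
        // Drop failures that have aged out of the 30-second window.
        while (_failures.Count > 0 && now - _failures.Peek() > FailureWindow)
            _failures.Dequeue();
        if (_failures.Count >= FailureThreshold) Open(now);
    }

    private void Open(DateTime now)
    {
        State = CircuitState.Open;
        _openedAt = now;
        _failures.Clear();
    }
}
```

Call TryAcquire before each live request and RecordSuccess or RecordFailure after it; the State property is the value to log as circuit_state.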
Step 3 — Warm voice cache at scene start
For each named NPC in the active scene:
- Pre-synthesize opening bark + greeting lines at load.
- Store AudioClip or .ogg bytes keyed by (npc_id, line_id, voice_id).
- On 502 during play, play the cached line for the same line_id if its hash matches the script text.
Cache directory example: StreamingAssets/voice_cache/v1/.
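A cache key that embeds a hash of the script text lets a stale clip miss the cache automatically when a writer edits the line. The sketch below is one possible layout, not a required one: VoiceCacheKey, the double-underscore separator, and the 16-character hash prefix are assumptions, and the IDs are assumed to be filesystem-safe. Clips synthesized at runtime generally need a writable root such as Application.persistentDataPath, since StreamingAssets is read-only on some platforms.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical key builder for cached voice clips.
public static class VoiceCacheKey
{
    // Hex-encoded SHA-256 of the script text; changes whenever the line is edited.
    public static string TextHash(string scriptText)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(scriptText));
            var sb = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest) sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    // Layout (an assumption): voice_cache/v1/<npc>__<line>__<voice>__<hash prefix>.ogg
    public static string RelativePath(string npcId, string lineId, string voiceId, string scriptText)
    {
        string hash = TextHash(scriptText).Substring(0, 16);
        string file = npcId + "__" + lineId + "__" + voiceId + "__" + hash + ".ogg";
        return Path.Combine("voice_cache", "v1", file);
    }
}
```

Combine the returned relative path with whichever root directory your build uses, then check File.Exists before deciding whether to synthesize.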
Step 4 — Piper local fallback for low-stakes barks
Use Piper (or OS TTS where acceptable) for:
- Ambient crowd lines
- Combat callouts
- Repeated UI acknowledgements
Keep ElevenLabs for hero moments only. Fallback must return audio within 500 ms of circuit open.
Pipeline:
Request → cache hit? → play
        → circuit open? → Piper fallback
        → else ElevenLabs with retry
        → still fail? → Piper + log
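The pipeline above can be sketched as a single resolver that takes each stage as a delegate, which keeps it independent of any particular SDK. Everything here is hypothetical naming (VoiceRouter, VoiceSource, ResolveAsync); in a Unity project TClip would typically be AudioClip, the live delegate would wrap the retrying ElevenLabs call, and the fallback delegate would wrap Piper.

```csharp
using System;
using System.Threading.Tasks;

public enum VoiceSource { Cache, Live, Fallback }

// Hypothetical router: cache first, then live API if allowed, else local fallback.
public sealed class VoiceRouter<TClip> where TClip : class
{
    private readonly Func<string, TClip> _tryGetCached;      // returns null on a cache miss
    private readonly Func<bool> _liveAllowed;                 // false while the circuit is open
    private readonly Func<string, Task<TClip>> _live;         // live synthesis, retries included
    private readonly Func<string, Task<TClip>> _fallback;     // local TTS
    private readonly Action<string> _log;

    public VoiceRouter(
        Func<string, TClip> tryGetCached,
        Func<bool> liveAllowed,
        Func<string, Task<TClip>> live,
        Func<string, Task<TClip>> fallback,
        Action<string> log)
    {
        _tryGetCached = tryGetCached;
        _liveAllowed = liveAllowed;
        _live = live;
        _fallback = fallback;
        _log = log;
    }

    public async Task<(TClip Clip, VoiceSource Source)> ResolveAsync(string lineId)
    {
        TClip cached = _tryGetCached(lineId);
        if (cached != null) return (cached, VoiceSource.Cache);

        // Circuit open: skip the network entirely.
        if (!_liveAllowed()) return (await _fallback(lineId), VoiceSource.Fallback);

        try
        {
            return (await _live(lineId), VoiceSource.Live);
        }
        // Cancellation is not a provider failure, so let it propagate.
        catch (Exception ex) when (!(ex is OperationCanceledException))
        {
            _log("live synthesis failed for " + lineId + ": " + ex.Message);
            return (await _fallback(lineId), VoiceSource.Fallback);
        }
    }
}
```

Returning the source alongside the clip makes it easy to record in telemetry how often players actually hear the fallback voice.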
Step 5 — Per-NPC token and request budget
- Cap requests per NPC per session (example: 40 lines).
- Cap characters per minute per NPC to prevent one chatty NPC starving others.
- Queue synthesis per NPC (FIFO), not global fire-and-forget.
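The two caps can be enforced with one small object per NPC, for example held in a dictionary keyed by NPC ID. This sketch covers only the caps, not the FIFO queue. NpcVoiceBudget and TrySpend are hypothetical names, the 40-line default mirrors the example above, and the 600 characters per minute default is an arbitrary placeholder to tune.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-NPC budget: a session line cap plus a rolling one-minute character cap.
public sealed class NpcVoiceBudget
{
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(1);

    private readonly int _maxLinesPerSession;
    private readonly int _maxCharsPerMinute;
    private readonly Func<DateTime> _now;
    private readonly Queue<KeyValuePair<DateTime, int>> _recent =
        new Queue<KeyValuePair<DateTime, int>>();
    private int _linesUsed;
    private int _charsInWindow;

    public NpcVoiceBudget(int maxLinesPerSession = 40, int maxCharsPerMinute = 600,
                          Func<DateTime> now = null)
    {
        _maxLinesPerSession = maxLinesPerSession;
        _maxCharsPerMinute = maxCharsPerMinute; // placeholder default, not a provider limit
        _now = now ?? (() => DateTime.UtcNow);
    }

    // Returns true and records the spend if the line fits both caps; otherwise false.
    public bool TrySpend(string text)
    {
        DateTime now = _now();
        // Forget characters spent more than a minute ago.
        while (_recent.Count > 0 && now - _recent.Peek().Key >= Window)
            _charsInWindow -= _recent.Dequeue().Value;

        if (_linesUsed >= _maxLinesPerSession) return false;
        if (_charsInWindow + text.Length > _maxCharsPerMinute) return false;

        _linesUsed++;
        _charsInWindow += text.Length;
        _recent.Enqueue(new KeyValuePair<DateTime, int>(now, text.Length));
        return true;
    }
}
```

When TrySpend returns false, route the line to the cached or local fallback voice rather than dropping it.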
Verification checklist
- Forced 502 (sandbox or proxy) triggers fallback within 500 ms.
- Three synthetic 502s open the circuit; fourth call skips API.
- After 60 s half-open, one successful probe closes the circuit.
- Scene-start warm cache plays without network on first player interaction.
- Peak-hour playtest: no silent gaps longer than 1 s on hero lines.
Alternative fixes for edge cases
- WebGL builds: ElevenLabs from browser may need a backend proxy; do not embed API keys in client WASM.
- Multi-language: use a separate voice_id per locale in cache keys.
- Streaming TTS: if the SDK supports streaming, treat a mid-stream 502 as cancel + fallback clip, not a hang.
Prevention tips
- Pre-render hero monologue for demo/fest builds as committed .ogg assets.
- Monitor 502_rate_5m in your dialogue dashboard; alert above 5%.
- Load-test 20 concurrent synthesis calls before cert week.
- Keep a non-AI text-only mode flag for cert reviewers without API keys.
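If you do not yet have a dashboard, the 5-minute 502 rate can be tracked in the client and sent with your existing telemetry. This is a minimal sketch with hypothetical names (GatewayErrorRate, Record, Rate, ShouldAlert); it counts every synthesis request in a rolling five-minute window and flags when the share of 502 responses exceeds 5%.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rolling-window tracker for the share of requests that returned 502.
public sealed class GatewayErrorRate
{
    private static readonly TimeSpan Window = TimeSpan.FromMinutes(5);
    private const double AlertThreshold = 0.05;

    private readonly Func<DateTime> _now;
    private readonly Queue<KeyValuePair<DateTime, bool>> _samples =
        new Queue<KeyValuePair<DateTime, bool>>();
    private int _errors;

    public GatewayErrorRate(Func<DateTime> now = null)
    {
        _now = now ?? (() => DateTime.UtcNow);
    }

    // Call once per completed request; was502 is true for a 502 response.
    public void Record(bool was502)
    {
        Prune();
        _samples.Enqueue(new KeyValuePair<DateTime, bool>(_now(), was502));
        if (was502) _errors++;
    }

    // Fraction of requests in the last five minutes that returned 502 (0 if none).
    public double Rate()
    {
        Prune();
        return _samples.Count == 0 ? 0.0 : (double)_errors / _samples.Count;
    }

    public bool ShouldAlert()
    {
        return Rate() > AlertThreshold;
    }

    // Drop samples older than the window and keep the error count in sync.
    private void Prune()
    {
        DateTime cutoff = _now() - Window;
        while (_samples.Count > 0 && _samples.Peek().Key < cutoff)
        {
            if (_samples.Dequeue().Value) _errors--;
        }
    }
}
```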
FAQ
Is 502 always ElevenLabs overload?
Usually yes for intermittent 502 during peak hours. A persistent 502 on every call suggests a wrong endpoint URL; a revoked key normally returns 401 rather than 502, so check for that separately.
Should I retry forever?
No. Bounded retries plus circuit breaker protect player experience and provider edges.
Can I use Windows SAPI instead of Piper?
Acceptable for internal QA; Piper gives more consistent cross-platform timbre for shipped fallback.
Does fallback violate AI disclosure rules?
Disclose assistive / generative voice in store forms. Fallback is still AI-assisted if Piper replaces the same pipeline — update your disclosure packet accordingly.
Related links
- OpenAI API 429 Too Many Requests in Unity NPC Dialogue - Retry Backoff and Token Budget Fix
- Anthropic API 529 Overloaded in Game Backend - Queue Retry and Fallback Model Fix
- 15 Free LLM-Driven NPC Dialogue and Local Fallback Net Resources (2026)
- 12 Free AI Voice and Dialogue Tools for Indie Games (2026)
- Your First LLM NPC Dialogue System With a Hard Fallback Net (2026)
- Official: ElevenLabs API status and Conversational AI docs
Bookmark this fix before your next voice-heavy playtest, and share it with gameplay programmers wiring Conversational AI in Unity.