ElevenLabs Conversational AI NPC with Local Ollama Fallback Architecture - 2026

Your trailer shows a voice-reactive NPC. In playtest, ElevenLabs returns 502 during the second line, the dialogue box shows cloud text, and the speaker goes silent. You added Ollama as “fallback”, but Unity freezes on the first token because the integration still shells out to ollama run and blocks on redirected stdout.

That is not an AI feature—it is two half-wired paths pretending to be resilience.

Mid-2026 moved thousands of indie teams onto Conversational AI voice tiers after price drops, while the same quarter brought 502 spikes on shared edges and partner reviewers who cut the network mid-dialogue. This AI Integration / Workflow post documents a dual-stack architecture: ElevenLabs for primary voice + cloud dialogue when healthy, Ollama HTTP for local text when the cloud LLM or voice fails, Piper (or cached clips) when voice must stay local, and voice_fallback_receipt_v1.json so cert lanes see proof, not promises.

Non-repetition note: The help pages fix the 502 retry and the Ollama Process hang in isolation, and the beginner LLM fallback net covers text-only paths. This post covers the combined ElevenLabs + Ollama architecture for voice-first NPCs, not Steam metadata or Construct saves.

Why this matters now (May 2026)

502 evening peaks: the ElevenLabs Unity SDK help article documents the spike; this architecture prevents silence once retries are exhausted.
Ollama adoption without HTTP discipline: the Process redirect hang is a common Windows 11 failure when teams copy CLI tutorials.
Partner cert airplane mode: reviewers expect deterministic lines when the cloud fails; voice can degrade, but gameplay must continue.
Steam AI disclosure: the store disclosure checklist needs honest “online voice + local text fallback” wording.
Cost predictability: Conversational AI is billed per minute; Ollama is “free at runtime” but not free in engineering time, so the architecture keeps the two budgets separate.

Direct answer: Implement a Dialogue Director state machine with three lanes above a canned floor: Primary (ElevenLabs voice + cloud LLM if used), Text fallback (Ollama /api/chat streaming), and Voice degrade (Piper or pre-baked WAV), with the authored canned tree as the path that always works. Log lane switches in voice_fallback_receipt_v1.json during QA week.

Who this is for

Unity 6 or Godot 4.5 teams shipping voice-forward fest demos
Engineers who fixed 502 retries but still have silent NPCs on spike
Producers preparing Q3 partner packets beside AI disclosure sprints
Beginners who completed the hard fallback net tutorial and want to add an ElevenLabs voice layer

Time to wire MVP: ~2–3 evenings after a working dialogue tree exists.

Architecture overview

Player selects dialogue option
          │
          v
+-------------------+
| Dialogue Director |
+---------+---------+
          │
  health + budget check
          │
  +-------+--------+--------+
  v       v        v        v
Primary  Text     Voice    Canned
(EL +    (Ollama) (Piper/  (authored
cloud)            cache)   tree)
  │       │        │        │
  +-------+--------+--------+
          v
 Same UI + subtitle pipeline
          v
 Log lane + latency in receipt

Lane definitions

Lane	When	Output
Primary	ElevenLabs healthy, circuit closed, token budget OK	Streamed voice + optional cloud LLM line
Text fallback	Voice 502 or LLM 429/timeout	Ollama HTTP line + subtitle; Piper or silent VO
Voice degrade	Text OK but voice unhealthy	Cached WAV / Piper on Ollama text
Hard canned	Ollama down	Authored tree only—never silent

Rule: Hard canned is mandatory; Ollama is optional enhancement; ElevenLabs is never the only path.
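A minimal C# sketch of that routing rule, assuming plain .NET types (no Unity API) and hypothetical names (LaneInputs, LaneRouter). It uses a [Flags] variant of the Lane enum so one line can record a combination such as TextFallback + VoiceDegrade; the plain enum from the beginner path works too if you log lanes separately.

```csharp
using System;

// Hypothetical sketch, not a reference implementation.
// [Flags] so one line can log TextFallback | VoiceDegrade together.
[Flags]
public enum Lane { None = 0, Primary = 1, TextFallback = 2, VoiceDegrade = 4, Canned = 8 }

// Health snapshot the Director reads before each line (hypothetical field names).
public sealed class LaneInputs
{
    public bool ElevenLabsHealthy;  // recent voice calls succeeding
    public bool CircuitOpen;        // breaker tripped after repeated 5xx
    public bool BudgetOk;           // per-scene cloud budget not exhausted
    public bool OllamaHealthy;      // /api/tags answered within the timeout
    public bool PiperAvailable;     // offline TTS model shipped and loaded
}

public static class LaneRouter
{
    // Order encodes the rule: ElevenLabs is never the only path,
    // Ollama is an optional enhancement, Canned is always reachable.
    public static Lane Route(LaneInputs s)
    {
        if (s.ElevenLabsHealthy && !s.CircuitOpen && s.BudgetOk) return Lane.Primary;
        if (!s.OllamaHealthy) return Lane.Canned;
        return s.PiperAvailable ? Lane.TextFallback | Lane.VoiceDegrade : Lane.TextFallback;
    }
}
```

Because the final fallback is Lane.Canned, no combination of health flags routes a line to silence.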

Beginner path (first evening)

Keep authored dialogue tree as source of truth for story beats.
Add DialogueDirector.cs (or Godot autoload) with enum Lane { Primary, TextFallback, VoiceDegrade, Canned }.
Health-check http://127.0.0.1:11434/api/tags before the first NPC prompt.
Wrap ElevenLabs calls per the 502 help article: three retries, then open the circuit.
On circuit open → call Ollama HTTP → Piper on resulting text → log lane in receipt JSON.
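The retry wrapper from these steps might look like this in C#. A sketch, assuming attempt is your own hypothetical delegate that performs one ElevenLabs request and returns its HTTP status; “three retries” is read here as one initial call plus up to three retries, so adjust maxRetries if your help article counts attempts differently.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; a sketch of bounded exponential backoff.
public static class VoiceRetry
{
    // Transient edge errors worth retrying. 401/403 (bad key) and other codes are not.
    public static bool IsRetryable(int status) => status == 502 || status == 503 || status == 504;

    // Retry n (1-based) waits baseDelay * 2^(n-1), e.g. 250 ms, 500 ms, 1 s.
    public static TimeSpan Backoff(int retry, TimeSpan baseDelay) =>
        TimeSpan.FromTicks(baseDelay.Ticks << (retry - 1));

    // One initial call plus up to maxRetries retries. Returns the final status so the
    // caller can record a circuit failure (5xx) and fall through to the next lane.
    public static async Task<int> CallWithRetryAsync(
        Func<CancellationToken, Task<int>> attempt, int maxRetries, TimeSpan baseDelay,
        CancellationToken ct)
    {
        int status = await attempt(ct);
        for (int retry = 1; retry <= maxRetries && IsRetryable(status); retry++)
        {
            await Task.Delay(Backoff(retry, baseDelay), ct);
            status = await attempt(ct);
        }
        return status; // 401/403 come back after a single call: fix the key, fall through
    }
}
```

Adding random jitter to each delay is a common refinement, so that many clients do not retry in lockstep during a fest spike.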

Success check: Airplane mode mid-conversation produces spoken or subtitled line within 3 seconds—no infinite spinner.
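One way to enforce that 3-second success check is to race the live lanes against a timer and fall back to the authored line when the timer wins. A minimal sketch in plain C#; liveLine is a hypothetical delegate that runs Primary and then Ollama and returns null on failure.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; a sketch of a deadline with a canned fallback.
public static class LineDeadline
{
    // If the live lanes do not produce text before the deadline, fail, or return null,
    // the authored canned line is used, so there is never an infinite spinner.
    public static async Task<(string Text, bool UsedCanned)> FirstLineOrCannedAsync(
        Func<CancellationToken, Task<string>> liveLine, string cannedLine, TimeSpan deadline)
    {
        using var cts = new CancellationTokenSource(deadline); // also cancels the live work
        Task<string> live = liveLine(cts.Token);
        Task winner = await Task.WhenAny(live, Task.Delay(deadline));

        if (winner == live && live.Status == TaskStatus.RanToCompletion && live.Result != null)
            return (live.Result, false);

        cts.Cancel(); // stop spending tokens or GPU on a line the player will not hear
        return (cannedLine, true);
    }
}
```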

Developer path (production)

Separate secrets: keep the ElevenLabs API key in an environment variable or a git-ignored ScriptableObject; never commit it.
Token budget per scene: if you pair a cloud LLM, see the OpenAI 429 help patterns for cloud text.
Warm voice cache: hash (voice_id, text) → WAV on disk for fest builds.
Circuit breaker: 3 consecutive 5xx in 30s → Primary disabled 60s.
Evidence folder: release-evidence/ai/voice_fallback_receipt_v1.json per milestone build.

ElevenLabs Conversational AI (primary lane)

Responsibilities

Voice synthesis for NPC lines (Conversational AI agent or TTS endpoint per your SDK version).
Optional agent session if you use their full conversational stack—document which subset you ship (voice-only vs full agent).
Latency budget: target <2.5s to first audio byte on LAN; log p95 in receipt.

Retry and circuit (summary)

Event	Action
502/503/504	Exponential backoff ×3 per help article
401/403	No retry—fix key, fall through
Circuit open	Skip Primary 60s; route Text fallback
Cooldown (60s) elapsed	Half-open: one trial line; success closes the circuit, failure reopens it

Do not implement unbounded retry loops—fest traffic makes spikes worse for everyone.
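A minimal circuit breaker matching the table and the developer-path numbers, sketched in plain C# under these assumptions: the clock is injected (so QA can simulate the 60 s cooldown), RecordFailure is called once per line whose retries ended in 5xx, and 401/403 never count. Class and member names are hypothetical.

```csharp
using System;

// Hypothetical sketch: 3 failed lines within 30 s open the circuit for 60 s,
// then exactly one half-open trial decides whether it closes or reopens.
public sealed class VoiceCircuitBreaker
{
    public enum State { Closed, Open, HalfOpen }

    private readonly Func<DateTime> _now;   // injectable clock, e.g. () => DateTime.UtcNow
    private readonly int _threshold;
    private readonly TimeSpan _window;
    private readonly TimeSpan _cooldown;
    private int _failures;
    private DateTime _firstFailureAt;
    private DateTime _openedAt;
    private bool _trialInFlight;

    public VoiceCircuitBreaker(Func<DateTime> now, int threshold = 3,
        double windowSeconds = 30, double cooldownSeconds = 60)
    {
        _now = now;
        _threshold = threshold;
        _window = TimeSpan.FromSeconds(windowSeconds);
        _cooldown = TimeSpan.FromSeconds(cooldownSeconds);
    }

    public State Current { get; private set; } = State.Closed;

    // Ask before each Primary call; false means route to Text fallback.
    public bool AllowRequest()
    {
        if (Current == State.Open && _now() - _openedAt >= _cooldown)
        {
            Current = State.HalfOpen;   // cooldown over: allow one trial line
            _trialInFlight = false;
        }
        if (Current == State.Closed) return true;
        if (Current == State.HalfOpen && !_trialInFlight) { _trialInFlight = true; return true; }
        return false;
    }

    public void RecordSuccess()
    {
        Current = State.Closed;
        _failures = 0;
        _trialInFlight = false;
    }

    // Call once per line whose retries ended in 5xx. Do not count 401/403.
    public void RecordFailure()
    {
        DateTime now = _now();
        if (Current == State.HalfOpen) { Trip(now); return; }   // trial failed: reopen
        if (_failures == 0 || now - _firstFailureAt > _window)
        {
            _failures = 0;                                      // start a new window
            _firstFailureAt = now;
        }
        if (++_failures >= _threshold) Trip(now);
    }

    private void Trip(DateTime now)
    {
        Current = State.Open;
        _openedAt = now;
        _failures = 0;
        _trialInFlight = false;
    }
}
```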

Cache warming table

Line type	Warm?
Tutorial barks	Yes—ship in build
Branching shop	Top 20 lines
Procedural LLM	No—expect fallback lane

Store under StreamingAssets/voice_cache/{hash}.wav with manifest voice_cache_manifest.json listing text_hash, voice_id, semver.
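A sketch of the cache key and path in C#, assuming SHA-256 over voice_id and the exact line text. The hash choice, the separator character, and the VoiceCacheKey name are assumptions, not a format the post prescribes. In Unity you would pass Application.streamingAssetsPath as the root.

```csharp
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for StreamingAssets/voice_cache/{hash}.wav.
public static class VoiceCacheKey
{
    // SHA-256 over voice_id and line text joined by a unit separator (U+001F),
    // so ("ab", "c") and ("a", "bc") cannot collide. Lowercase hex for file names.
    public static string Hash(string voiceId, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(voiceId + "\u001F" + text);
        using var sha = SHA256.Create();
        byte[] digest = sha.ComputeHash(bytes);
        var sb = new StringBuilder(digest.Length * 2);
        foreach (byte b in digest) sb.Append(b.ToString("x2"));
        return sb.ToString();
    }

    public static string WavPath(string streamingAssetsRoot, string hash) =>
        Path.Combine(streamingAssetsRoot, "voice_cache", hash + ".wav");
}
```

Any edit to a line's text, including punctuation, produces a new hash, so re-warm the cache and regenerate voice_cache_manifest.json whenever the authored line changes.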

Ollama HTTP (text fallback lane)

Why HTTP only

The Ollama Process hang fix is non-negotiable on Windows 11 Unity paths. Production uses:

POST http://127.0.0.1:11434/api/chat
{ "model": "llama3.2:3b", "stream": true, "messages": [...] }

Parse NDJSON chunks; append each message.content fragment to the UI subtitle buffer; trigger Piper when the line is complete (the chunk with "done": true).
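A sketch of that parse loop in plain .NET (System.Text.Json and HttpClient rather than Unity types), assuming Ollama's streaming chunk shape: each line is a JSON object whose message.content holds a fragment, and the last one has "done": true. The OllamaNdjson name and the onFragment callback are hypothetical. Reading with HttpCompletionOption.ResponseHeadersRead lets fragments reach the subtitle buffer as they arrive instead of after the whole body.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; a sketch of incremental NDJSON handling for /api/chat.
public static class OllamaNdjson
{
    // Handles one NDJSON line: appends message.content to `line`, reports the
    // fragment (e.g. to the subtitle buffer), and returns true on "done": true.
    public static bool Feed(string ndjsonLine, StringBuilder line, Action<string> onFragment)
    {
        if (string.IsNullOrWhiteSpace(ndjsonLine)) return false;
        using JsonDocument doc = JsonDocument.Parse(ndjsonLine);
        JsonElement root = doc.RootElement;

        // Ollama error responses carry an "error" string; treat one as a failed line.
        if (root.TryGetProperty("error", out JsonElement err))
            throw new InvalidOperationException(err.GetString());

        if (root.TryGetProperty("message", out JsonElement msg) &&
            msg.TryGetProperty("content", out JsonElement content))
        {
            string fragment = content.GetString() ?? "";
            line.Append(fragment);
            if (fragment.Length > 0) onFragment(fragment);
        }
        return root.TryGetProperty("done", out JsonElement done) && done.GetBoolean();
    }

    // Streams /api/chat and returns the full line, or null if the stream ended early.
    public static async Task<string> StreamChatAsync(HttpClient http, string requestJson,
        Action<string> onFragment, CancellationToken ct)
    {
        using var req = new HttpRequestMessage(HttpMethod.Post, "http://127.0.0.1:11434/api/chat")
        {
            Content = new StringContent(requestJson, Encoding.UTF8, "application/json")
        };
        // ResponseHeadersRead: start reading before the body is complete.
        using HttpResponseMessage resp =
            await http.SendAsync(req, HttpCompletionOption.ResponseHeadersRead, ct);
        resp.EnsureSuccessStatusCode();
        using Stream body = await resp.Content.ReadAsStreamAsync();
        using var reader = new StreamReader(body, Encoding.UTF8);

        var line = new StringBuilder();
        string ndjson;
        while ((ndjson = await reader.ReadLineAsync()) != null)
        {
            if (Feed(ndjson, line, onFragment)) return line.ToString(); // hand to Piper
        }
        return null; // no "done" chunk: caller falls back to Canned
    }
}
```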

Prompt contract (keep NPC safe)

Field	Purpose
`system`	Persona + content policy + max sentences (sent as the first message, role "system")
`context`	Quest state from authored variables only
`player_choice`	Selected option text
`max_tokens`	Hard cap (e.g. 120); maps to Ollama's options.num_predict

Never pass raw player chat into Ollama on fest demos—selected options only.
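The contract can be mapped onto an Ollama /api/chat body roughly as below. A sketch, assuming the system prompt goes in a role "system" message, quest context and the chosen option share one user message, and max_tokens becomes Ollama's options.num_predict; the message layout and the OllamaPromptContract name are assumptions, not a requirement.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper; builds the request body from the prompt contract.
public static class OllamaPromptContract
{
    // `context` must come from authored quest variables and `playerChoice` from the
    // selected option text, never from raw player chat.
    public static string BuildChatBody(string model, string system,
        IReadOnlyDictionary<string, string> context, string playerChoice, int maxTokens)
    {
        var quest = new List<string>();
        foreach (KeyValuePair<string, string> kv in context) quest.Add(kv.Key + "=" + kv.Value);

        var body = new
        {
            model,
            stream = true,
            messages = new[]
            {
                new { role = "system", content = system },
                new { role = "user", content = "Quest state:\n" + string.Join("\n", quest) +
                                                "\nPlayer chose: " + playerChoice }
            },
            options = new { num_predict = maxTokens } // Ollama's cap on generated tokens
        };
        return JsonSerializer.Serialize(body);
    }
}
```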

Health check before scene load

curl -s http://127.0.0.1:11434/api/tags

Unity: UnityWebRequest.Get("http://127.0.0.1:11434/api/tags") with a 2 s timeout (req.timeout = 2). Failure → skip the Ollama lane for this scene; if Primary is also unavailable, go straight to Hard canned.
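The same probe in plain .NET HttpClient, as a hedged sketch: any failure (connection refused, timeout, non-2xx) returns false, so the Director skips the lane rather than hanging. OllamaHealth is a hypothetical name; creating one HttpClient per call is acceptable for a once-per-scene check but not for per-line traffic.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper; a sketch of the pre-scene health check.
public static class OllamaHealth
{
    // GET {baseUrl}/api/tags with a hard timeout. Refused connections, timeouts and
    // non-2xx answers all return false so the Director never waits on a dead daemon.
    public static async Task<bool> IsUpAsync(string baseUrl = "http://127.0.0.1:11434",
        double timeoutSeconds = 2)
    {
        using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(timeoutSeconds) };
        try
        {
            using HttpResponseMessage resp = await http.GetAsync(baseUrl.TrimEnd('/') + "/api/tags");
            return resp.IsSuccessStatusCode;
        }
        catch (HttpRequestException) { return false; }   // refused, blocked, DNS
        catch (TaskCanceledException) { return false; }  // HttpClient.Timeout elapsed
    }
}
```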

Model choice (2026 realistic)

Hardware	Model hint
Dev laptop 16GB	`llama3.2:3b`
Player PC mid	Optional smaller quant in settings
Steam Deck	Canned only—do not assume Ollama

Document in settings: Local dialogue: Off / Auto / Required.

Piper voice degrade (voice when ElevenLabs down)

When Ollama returns text but ElevenLabs circuit is open:

Feed text to Piper offline TTS (or play gender-matched cached narrator WAV).
Show subtitles always—accessibility + cert reviewers read text.
Log lane=VoiceDegrade in receipt.

Honest limit: Piper cannot reproduce your ElevenLabs voice, so the NPC sounds different in degrade mode; disclose this in the AI storefront disclosure bullet.

Dialogue Director state machine (pseudo)

OnPlayerChoose(optionId):
  if CircuitBreaker.Open: goto FallbackText
  if TokenBudget.Exhausted: goto FallbackText
  try Primary = ElevenLabs.SynthesizeAsync(line)     // up to 3 retries on 502/503/504
  on success: CircuitBreaker.RecordSuccess(); PlayAudio(Primary); ShowSubtitle(line); return
  on retryable5xx after retries: CircuitBreaker.RecordFailure()
  on 401/403: no retry, no failure recorded; fall through
  FallbackText:
    text = OllamaHealthy ? Ollama.StreamChat(prompt) : null
    if text is null or empty:
      PlayCanned(optionId); return
    ShowSubtitle(text)
    if VoiceDegradeEnabled: PlayPiper(text)

Godot: same logic in autoload with await and signals—avoid blocking main thread.
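The pseudocode translates to async C# roughly as follows. A minimal sketch: every provider is a hypothetical delegate in DirectorDeps so the flow can be exercised without ElevenLabs, Ollama, or Piper installed; SpeakPrimary is assumed to apply the three retries itself and return the final HTTP status, and PlayCanned is assumed to show its own subtitle.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical provider seams; wire them to your ElevenLabs, Ollama, Piper and audio code.
public sealed class DirectorDeps
{
    public Func<bool> PrimaryAllowed;                                 // breaker + budget
    public Func<string, CancellationToken, Task<int>> SpeakPrimary;   // final HTTP status
    public Action<bool> RecordPrimaryResult;                          // feeds the breaker
    public Func<CancellationToken, Task<bool>> OllamaHealthy;
    public Func<string, CancellationToken, Task<string>> OllamaLine;  // null on failure
    public Func<string, Task<bool>> SpeakPiper;                       // false if unavailable
    public Action<string> ShowSubtitle;
    public Action<string> PlayCanned;                                 // authored clip by optionId
    public Action<string> LogLane;                                    // receipt / QA log
}

public sealed class DialogueDirector
{
    private readonly DirectorDeps _d;
    public DialogueDirector(DirectorDeps deps) => _d = deps;

    public async Task OnPlayerChooseAsync(string optionId, string authoredLine, string prompt,
        CancellationToken ct)
    {
        if (_d.PrimaryAllowed())
        {
            int status = await _d.SpeakPrimary(authoredLine, ct);
            if (status >= 200 && status < 300)
            {
                _d.RecordPrimaryResult(true);
                _d.ShowSubtitle(authoredLine);   // subtitles in every lane
                _d.LogLane("Primary");
                return;
            }
            if (status >= 500) _d.RecordPrimaryResult(false); // 401/403 do not trip the breaker
        }

        string text = await _d.OllamaHealthy(ct) ? await _d.OllamaLine(prompt, ct) : null;
        if (string.IsNullOrEmpty(text))
        {
            _d.PlayCanned(optionId);
            _d.LogLane("Canned");
            return;
        }

        _d.ShowSubtitle(text);
        bool spoke = await _d.SpeakPiper(text);
        _d.LogLane(spoke ? "TextFallback+VoiceDegrade" : "TextFallback");
    }
}
```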

voice_fallback_receipt_v1.json

Store under release-evidence/ai/:

{
  "receipt_type": "voice_fallback_receipt_v1",
  "build_id": "fest-demo-20260522",
  "tests": {
    "primary_lane_smoke": "pass",
    "airplane_mode_mid_dialogue": "pass",
    "ollama_stopped_mid_dialogue": "pass",
    "elevenlabs_502_simulated": "pass"
  },
  "fallback_pass": true,
  "lanes_observed": ["Primary", "TextFallback", "VoiceDegrade", "Canned"],
  "p95_primary_latency_ms": 2100,
  "observed_date_utc": "2026-05-22"
}
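A sketch of writing that receipt from a QA run in C# with System.Text.Json, assuming fallback_pass should be derived (true only when every recorded test is "pass") rather than typed by hand. Property names mirror the JSON above; the VoiceFallbackReceipt class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical model of voice_fallback_receipt_v1.json.
public sealed class VoiceFallbackReceipt
{
    [JsonPropertyName("receipt_type")]
    public string ReceiptType { get; set; } = "voice_fallback_receipt_v1";

    [JsonPropertyName("build_id")]
    public string BuildId { get; set; } = "";

    // Test name -> "pass" / "fail", e.g. "airplane_mode_mid_dialogue".
    [JsonPropertyName("tests")]
    public Dictionary<string, string> Tests { get; set; } = new Dictionary<string, string>();

    // Computed, never hand-edited: every recorded test must pass.
    [JsonPropertyName("fallback_pass")]
    public bool FallbackPass => Tests.Count > 0 && Tests.Values.All(v => v == "pass");

    [JsonPropertyName("lanes_observed")]
    public List<string> LanesObserved { get; set; } = new List<string>();

    [JsonPropertyName("p95_primary_latency_ms")]
    public int P95PrimaryLatencyMs { get; set; }

    [JsonPropertyName("observed_date_utc")]
    public string ObservedDateUtc { get; set; } =
        DateTime.UtcNow.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

    public string ToJson() =>
        JsonSerializer.Serialize(this, new JsonSerializerOptions { WriteIndented = true });

    // e.g. Write("release-evidence/ai")
    public void Write(string evidenceDir)
    {
        Directory.CreateDirectory(evidenceDir);
        File.WriteAllText(Path.Combine(evidenceDir, "voice_fallback_receipt_v1.json"), ToJson());
    }
}
```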

QA scenarios (mandatory)

#	Setup	Pass
1	Normal network	Primary plays
2	Airplane mode before line	Canned or Ollama+Piper <3s
3	Kill Ollama mid-scene	Canned
4	Mock 502 (proxy or test hook)	Degrade after retries
5	Token budget = 0	No cloud spend; fallback

Record in release-evidence/ai/qa-voice-fallback.md with build_id.

Unity vs Godot wiring notes

Concern	Unity 6	Godot 4.5
HTTP	UnityWebRequest + coroutine	HTTPRequest node
Audio	AudioSource clip queue	AudioStreamPlayer
Threading	Main thread parse NDJSON	Signals to main
Secrets	ScriptableObject + env	`.env` not in export

Both engines: one Dialogue Director—do not duplicate logic per NPC.

Cost and budget table (indicative)

Path	Cost driver	Control
ElevenLabs	Characters / minutes	Scene cap + cache
Cloud LLM (if any)	Tokens	Per-session budget
Ollama	Dev time	Optional player setting
Piper	Disk	Ship voice model in build

Publish max minutes per playtest hour in producer sheet—prevents surprise invoice week before fest.
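A per-scene cap can be as small as the counter below. A sketch under stated assumptions: units are whatever your plan bills (characters or seconds), the cap comes from the producer sheet, and SceneVoiceBudget is a hypothetical name. A cap of 0 reproduces QA scenario 5 (no cloud spend).

```csharp
using System;

// Hypothetical per-scene cloud voice cap.
public sealed class SceneVoiceBudget
{
    private readonly int _cap;

    public SceneVoiceBudget(int cap)
    {
        if (cap < 0) throw new ArgumentOutOfRangeException(nameof(cap));
        _cap = cap;
    }

    public int Used { get; private set; }
    public bool Exhausted => Used >= _cap;

    // Reserve before a Primary call; false means route to a fallback lane instead.
    // Cached WAV hits cost nothing, so they should not reserve.
    public bool TryReserve(int units)
    {
        if (units < 0) throw new ArgumentOutOfRangeException(nameof(units));
        if (Used + units > _cap) return false;
        Used += units;
        return true;
    }

    public void ResetForNewScene() => Used = 0;
}
```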

Disclosure and store copy alignment

Store page must match runtime:

Online: ElevenLabs voice generation (and cloud LLM if used).
Offline: Local text model (Ollama) + offline TTS (Piper) + canned lines.

A mismatch shows up as store-copy tags in your refund dashboard when the trailer implies the NPC always speaks with a neural voice.

Work through the Steam AI disclosure checklist before upload.

Pairing with Sentis and other AI stacks

Teams mixing Unity Sentis for classification and ElevenLabs for voice should keep receipts separate:

sentis_deploy_receipt_v1.json for ONNX
voice_fallback_receipt_v1.json for NPC lanes

Do not conflate “AI feature” paragraphs in partner README—list each subsystem.

Resource roundup cross-links

15 free LLM dialogue fallback resources
50 free AI tools refresh — ElevenLabs Conversational AI row
Live-ops prompt registry sprint for prompt semver beside voice

Common mistakes

Ollama via Process on Windows—freeze.
ElevenLabs only—silent fest demos on 502.
LLM writes quest state—narrative breaks; context from authored vars only.
No canned path—cert fail airplane mode.
Unbounded retries—worse spikes + bill shock.
Presenting Piper and ElevenLabs output as one voice without disclosure.
Blocking main thread on HTTP—stutter even when fallback works.
No receipt—partners cannot verify.
Player free-text chat on demo—moderation risk.
Skipping cache warming for tutorial lines leaves every tutorial line exposed to 502s.

Proof table

Claim	Evidence	Pass
502 handled	Simulated test log	Degrade <3s
Ollama HTTP	curl + in-game	NDJSON streams
Airplane mode	QA scenario 2	Canned plays
Disclosure	Store bullet	Matches lanes
Cost cap	Budget sheet	Scene limits set

Key takeaways

2026 voice NPCs need three lanes—Primary, Ollama text, Piper/cache degrade.
Never shell out to ollama run from Unity on Windows; use HTTP per the help fix.
Circuit breaker after ElevenLabs 502—protect players and provider.
Hard canned tree is cert-non-negotiable.
voice_fallback_receipt_v1.json documents QA for partners.
Pairs with existing help fixes and beginner fallback net—this is combined architecture.
Align Steam AI disclosure with actual lanes.
Subtitles always-on for accessibility and degrade mode.

FAQ

Do I need both ElevenLabs and Ollama?
Primary voice wants ElevenLabs (or similar). Ollama covers text when cloud fails; you can ship canned-only without Ollama but not cloud-only without canned.

Can Ollama drive voice directly?
Not production-quality alone—pair with Piper or subtitles.

Godot on Steam Deck?
Assume Canned default; optional Ollama off.

Does Conversational AI agent replace dialogue tree?
No—tree owns beats; AI fills delivery within bounds.

How does this relate to Inworld/Convai?
Same director pattern—vendor SDKs still need canned + degrade lanes.

502 fix article enough?
Help fixes retries; this post wires full lane graph.

Anthropic/Gemini instead of Ollama?
The Director is identical: swap the HTTP client. A cloud model does not cover airplane mode, though, so keep the canned tree as the offline floor.

Latency SLA for fest?
Target p95 <3s to first feedback (audio or subtitle).

Legal on voice cloning?
Use licensed voice IDs; disclose in store AI section.

Conclusion

Voice-first NPCs are a system design problem in 2026—not a plugin purchase. Wire ElevenLabs behind a circuit breaker, Ollama behind HTTP, Piper behind degrade, and canned lines behind everything else. Log the receipt before you mark the demo fest-ready.

Next reads: 502 retry help, Ollama HTTP help, and beginner fallback net.

Ninety-minute architecture sketch

Minute	Task
0–15	Draw Dialogue Director diagram on paper
15–30	Add canned + circuit enum stubs
30–50	Ollama health GET + one streamed line
50–70	ElevenLabs wrap with 3 retries
70–90	Write receipt JSON + airplane test

Sketch night does not replace five QA scenarios—it starts the graph.

Evidence folder layout

release-evidence/ai/
  voice_fallback_receipt_v1.json
  qa-voice-fallback.md
  voice_cache_manifest.json
  prompt_registry_semver.txt

Aligns with release evidence taxonomy and prompt registry sprint when live-ops edits ship mid-fest.

NDJSON streaming parse (Unity sketch)

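Ollama returns one JSON object per line when "stream": true. Each chunk carries only a fragment of message.content, and the final "done": true chunk usually carries none, so concatenate the fragments. The coroutine yields until the request finishes, so the main thread never blocks; it does buffer the whole body, so use a DownloadHandlerScript subclass if you need word-by-word subtitles.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

[Serializable] public class ChatMessage { public string role; public string content; }
[Serializable] public class ChatRequest { public string model; public bool stream; public List<ChatMessage> messages; }
[Serializable] public class ChatChunk { public ChatMessage message; public bool done; }

// Member of a MonoBehaviour; run via StartCoroutine. BuildMessages is your own helper.
IEnumerator StreamOllamaLine(string prompt, Action<string> onLineDone)
{
    var body = JsonUtility.ToJson(new ChatRequest { model = "llama3.2:3b", stream = true, messages = BuildMessages(prompt) });
    using var req = new UnityWebRequest("http://127.0.0.1:11434/api/chat", "POST");
    req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
    req.downloadHandler = new DownloadHandlerBuffer();
    req.SetRequestHeader("Content-Type", "application/json");
    req.timeout = 5; // seconds; a stuck daemon must not hang the line (tune per min-spec PC)
    yield return req.SendWebRequest(); // yields to the engine; the main thread keeps running
    if (req.result != UnityWebRequest.Result.Success) { onLineDone(null); yield break; }
    var text = new StringBuilder();
    foreach (var line in req.downloadHandler.text.Split('\n'))
    {
        if (string.IsNullOrWhiteSpace(line)) continue;
        var chunk = JsonUtility.FromJson<ChatChunk>(line);
        if (chunk.message != null) text.Append(chunk.message.content); // fragments accumulate
        if (chunk.done) { onLineDone(text.ToString()); yield break; }
    }
    onLineDone(null); // no done chunk: treat as failure and fall back to Canned
}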

Godot 4.5: use HTTPRequest with request_completed, split buffer by newline, emit dialogue_line_ready signal. Same contract—no Process.

Latency measurement (receipt input)

Log per line in qa-voice-fallback.md:

Metric	How
`t_request`	Player confirms option
`t_first_token`	First NDJSON chunk or first audio byte
`t_playback`	AudioSource.Play

Compute p95 over 20 scripted lines in Primary lane and 10 in Text fallback. If Text fallback p95 > 4s on target min-spec PC, shrink model or shorten max_tokens.
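A nearest-rank p95 in C#, as a sketch. Nearest-rank is one of several percentile definitions; pick one and keep it fixed across builds so receipts stay comparable. LatencyStats is a hypothetical name; the 4000 ms threshold comes from the rule above.

```csharp
using System;

// Hypothetical helper for qa-voice-fallback.md latency numbers.
public static class LatencyStats
{
    // Nearest-rank percentile: sort, then take the value at rank ceil(p * n / 100).
    public static double Percentile(double[] samplesMs, double p)
    {
        if (samplesMs == null || samplesMs.Length == 0)
            throw new ArgumentException("need at least one sample", nameof(samplesMs));
        if (p <= 0 || p > 100) throw new ArgumentOutOfRangeException(nameof(p));

        double[] sorted = (double[])samplesMs.Clone();
        Array.Sort(sorted);
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[rank - 1];
    }

    // Text fallback p95 above 4 s on the min-spec PC: shrink the model or max_tokens.
    public static bool TextFallbackTooSlow(double[] textFallbackMs) =>
        Percentile(textFallbackMs, 95) > 4000;
}
```

With 20 Primary samples the p95 is the 19th-fastest line; with only 10 Text fallback samples, nearest-rank p95 is the slowest sample, so a single outlier can fail the check. That is conservative, which suits a cert gate.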

Moderation and content bounds (2026 policy reality)

Risk	Mitigation
Model invents quest	Context from authored flags only
Unsafe text	Blocklist filter on output string
Player harassment	No free-text to cloud on demo
Voice deepfake confusion	Store disclosure + subtitle

Run human sign-off on prompt registry per 14-day sprint before changing system prompts mid-fest.

Fest-week operations calendar

Day	Action
Mon	Warm voice cache for tutorial NPC
Wed	Check ElevenLabs status page; lower scene token cap if spikes
Fri	Re-run airplane mode QA; update receipt
Upload eve	Freeze `prompt_registry` semver

Pair with BUILD_RECEIPT notes: voice_fallback_pass=true.

Troubleshooting matrix

Symptom	Check	Fix
Silent after 502	Circuit stuck open	Half-open trial
Editor freeze	Process Ollama	HTTP only
Subtitle only	Piper path disabled	Enable VoiceDegrade
Wrong persona	Prompt registry drift	Freeze semver
High bill	Cache miss	Warm lines
Deck stutter	Ollama on Deck	Default Canned
Duplicate lines	Retry without idempotency	Hash request id

Vendor-neutral comparison (architecture level)

Capability	ElevenLabs Conversational AI	Ollama local	Canned tree
Voice quality	High	N/A (use Piper)	Fixed WAV
Latency	Network	GPU/CPU bound	Instant
Cost	Per use	Dev time	Authored time
Cert offline	Degrade	Yes with HTTP	Yes
Narrative control	Medium	Medium	Full

Architecture stays valid if you swap ElevenLabs for another TTS—keep lane graph.

Worked scenario — 502 during shopkeeper haggle

1. Player selects the “Haggle” option opt_haggle_3.
2. Director checks the circuit: closed. Calls ElevenLabs.
3. Provider returns 502 after retries. Circuit records failure #1.
4. Director routes to Text fallback: Ollama returns a shorter line with a price hint from quest.haggle_count.
5. Voice degrade: Piper speaks the line; the subtitle shows the same text.
6. Receipt logs lane=TextFallback+VoiceDegrade, latency_ms=2400.
7. Player hears a voice, not silence. Refund risk avoided.

If Ollama is also down, step 4 becomes Canned shopkeeper_haggle_fail_02.wav, and there is still no silence.

Integration with playtest CSV

Add these columns to the 18 playtest tools sheet:

voice_lane (Primary / TextFallback / VoiceDegrade / Canned)
elevenlabs_502_Y/N
ollama_up_Y/N

With these columns, playtesters report reproducible lane bugs instead of “voice felt weird.”

Corporate laptop note

Ollama help mentions AppLocker. Architecture default: detect blocked daemon → Canned without hanging. Log ollama_blocked=true in receipt for partner laptops.

Stretch goals

Ship in-game AI status icon (cloud / local / offline).
Add Anthropic 529 queue help as alternate cloud text behind same Director.
Automate receipt generation, and check fallback_pass with a grep step in the validate-packet script.

Found this useful? Bookmark the two help fixes and implement the Director once, so the 502 and the Ollama hang stop being two separate firefights.

Security checklist (ship gate)

[ ] API keys in CI secrets—not repo
[ ] Ollama bound to 127.0.0.1 only in shipping build
[ ] No player PII in prompts
[ ] Log redaction for dialogue text in public builds
[ ] Rate limit per NPC to prevent spam-click cost explosion
[ ] Disable debug “force cloud” cheat in release binaries

Dual-cloud text optional layer

Some teams use ElevenLabs for voice and OpenAI/Anthropic for line generation on Primary lane. Architecture unchanged:

Cloud text failure (429) → open text circuit → Ollama
Cloud text success + voice 502 → Ollama not required if cached WAV exists for that line hash

Document which cloud APIs you actually call in disclosure—three vendors is still one Director.

Audio mixer routing

Route Primary and Piper into same NPC bus with shared reverb send so degrade mode does not sound like a different character from another room. Subtitle styling stays identical across lanes—players perceive one NPC with a bad connection, not two different systems.

Build flavors table

Flavor	Primary	Ollama	Canned
`DEVELOPMENT`	On	On	On
`FEST_DEMO`	On	Auto	On
`STEAM_DECK`	On	Off	On
`REVIEWER_OFFLINE`	Off	Off	On only

Use scripting define symbols—do not #if inside Dialogue Director without tests per flavor.
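A sketch of keeping flavor logic in one tested table in C#: the only #if picks the current flavor, and every lane decision goes through LaneAvailability.For, which tests can call for each flavor. Enum and class names are hypothetical; the symbols FEST_DEMO, STEAM_DECK and REVIEWER_OFFLINE mirror the table above and would be set as scripting define symbols.

```csharp
using System;

// Hypothetical flavor table mirroring the build flavors above.
public enum BuildFlavor { Development, FestDemo, SteamDeck, ReviewerOffline }
public enum OllamaMode { Off, Auto, On }

public sealed class LaneAvailability
{
    public bool Primary { get; }
    public OllamaMode Ollama { get; }
    public bool Canned => true;   // not configurable: the floor for every flavor

    private LaneAvailability(bool primary, OllamaMode ollama) { Primary = primary; Ollama = ollama; }

    // Pure mapping, so a test can assert every flavor without rebuilding.
    public static LaneAvailability For(BuildFlavor flavor) => flavor switch
    {
        BuildFlavor.Development     => new LaneAvailability(true,  OllamaMode.On),
        BuildFlavor.FestDemo        => new LaneAvailability(true,  OllamaMode.Auto),
        BuildFlavor.SteamDeck       => new LaneAvailability(true,  OllamaMode.Off),
        BuildFlavor.ReviewerOffline => new LaneAvailability(false, OllamaMode.Off),
        _ => throw new ArgumentOutOfRangeException(nameof(flavor))
    };

    // The single #if in the project: choose the flavor from define symbols.
#if FEST_DEMO
    public static BuildFlavor Current => BuildFlavor.FestDemo;
#elif STEAM_DECK
    public static BuildFlavor Current => BuildFlavor.SteamDeck;
#elif REVIEWER_OFFLINE
    public static BuildFlavor Current => BuildFlavor.ReviewerOffline;
#else
    public static BuildFlavor Current => BuildFlavor.Development;
#endif
}
```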

Partner README one-liner

Voice: ElevenLabs online with Piper/canned fallback. Dialogue text: cloud when healthy, Ollama HTTP locally, canned tree always. Evidence: release-evidence/ai/voice_fallback_receipt_v1.json.

Paste into partner ZIP README beside hash manifests from cold-hash challenge.

Idempotency and duplicate line prevention

When retries fire after a timeout, the player must not hear two lines for one choice. Hash (npc_id, option_id, attempt_id), where attempt_id identifies the player's choice and is shared by every retry of it, and discard duplicate audio starts within 500 ms. Log duplicate_suppressed=true in QA when testing flaky Wi-Fi; it proves the Director is production-safe, not demo-only.
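A minimal duplicate guard in C#, assuming attempt_id identifies one player choice and is shared by all network retries of it (if it changed per retry, the guard could not catch retry duplicates). The clock is injected for testing, and DuplicateLineGuard is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical guard: one audio start per (npc_id, option_id, attempt_id) per window.
public sealed class DuplicateLineGuard
{
    private readonly TimeSpan _window;
    private readonly Func<DateTime> _now;   // injectable clock, e.g. () => DateTime.UtcNow
    private readonly Dictionary<(string, string, string), DateTime> _lastStart =
        new Dictionary<(string, string, string), DateTime>();

    public DuplicateLineGuard(Func<DateTime> now, double windowMs = 500)
    {
        _now = now;
        _window = TimeSpan.FromMilliseconds(windowMs);
    }

    // True: start playback. False: suppress and log duplicate_suppressed=true.
    public bool TryStart(string npcId, string optionId, string attemptId)
    {
        var key = (npcId, optionId, attemptId);
        DateTime now = _now();
        if (_lastStart.TryGetValue(key, out DateTime last) && now - last < _window) return false;
        _lastStart[key] = now;
        return true;
    }
}
```

The dictionary grows by one entry per choice; for long sessions, prune entries older than the window.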

Accessibility requirements (2026 baseline)

Subtitles for every lane including Primary.
Subtitle speed independent of audio length when Piper runs faster than ElevenLabs.
Visual indicator when voice is synthetic vs recorded—icons help disclosure honesty.
Reduce motion setting should not disable fallback subtitles.

Fest reviewers increasingly check accessibility alongside airplane mode—architecture treats both as first-class QA scenarios.

When to skip ElevenLabs entirely

Project signal	Recommendation
No VO budget	Canned + subtitles only
Web-only tiny scope	Text + Piper; skip cloud voice
Strict offline SKU	No Primary lane in that SKU
Narration-heavy AA	Recorded WAV primary; AI for barks only

Architecture doc still applies—lanes shrink but Director remains. Ship the graph once; swap vendors later without rewriting NPC scenes. That is the difference between a voice feature and a shippable NPC system.