AI Integration / Workflow May 22, 2026

ElevenLabs Conversational AI NPC with Local Ollama Fallback Architecture - 2026

2026 indie architecture for ElevenLabs Conversational AI NPCs with Ollama local dialogue fallback, Piper voice degrade path, circuit breakers, and cert-safe evidence receipts.

By GamineAI Team

ElevenLabs Conversational AI NPC with Local Ollama Fallback Architecture - 2026

Pixel-art hero for ElevenLabs Conversational AI NPC with local Ollama fallback architecture 2026

Your trailer shows a voice-reactive NPC. In playtest, ElevenLabs returns 502 during the second line, the dialogue box shows cloud text, and the speaker goes silent. You added Ollama as “fallback” but Unity freezes on the first token because the integration still shells ollama run with blocked stdout.

That is not an AI feature—it is two half-wired paths pretending to be resilience.

Mid-2026 moved thousands of indie teams onto Conversational AI voice tiers after price drops, while the same quarter brought 502 spikes on shared edges and partner reviewers who disable network mid-dialogue. This AI Integration / Workflow post documents a dual-stack architecture: ElevenLabs for primary voice + cloud dialogue when healthy, Ollama HTTP for local text when cloud LLM or voice fails, Piper (or cached clips) when voice must stay local, and voice_fallback_receipt_v1.json so cert lanes see proof—not promises.

Non-repetition note: Help pages fix 502 retry and Ollama Process hang in isolation. The beginner LLM fallback net covers text-only paths. This URL owns ElevenLabs + Ollama combined architecture for voice-first NPCs—not Steam metadata, not Construct saves.

Why this matters now (May 2026)

  1. 502 evening peaksElevenLabs Unity SDK help documents the spike; architecture prevents silence after retries exhaust.
  2. Ollama adoption without HTTP disciplineProcess redirect hang is the dominant Windows 11 failure when teams copy CLI tutorials.
  3. Partner cert airplane mode — Reviewers expect deterministic lines when cloud fails; voice can degrade but gameplay must continue.
  4. Steam AI disclosureStore disclosure checklist needs honest “online voice + local text fallback” wording.
  5. Cost predictability — Conversational AI billed per minute; Ollama is “free at runtime” but not free in engineering—architecture separates budgets.

Direct answer: Implement a Dialogue Director state machine with three lanes: Primary (ElevenLabs voice + cloud LLM if used), Text fallback (Ollama /api/chat streaming), Voice degrade (Piper or pre-baked WAV). Log lane switches in voice_fallback_receipt_v1.json during QA week.

Who this is for

  • Unity 6 or Godot 4.5 teams shipping voice-forward fest demos
  • Engineers who fixed 502 retries but still have silent NPCs on spike
  • Producers preparing Q3 partner packets beside AI disclosure sprints
  • Beginners who completed the hard fallback net tutorial and add ElevenLabs voice layer

Time to wire MVP: ~2–3 evenings after a working dialogue tree exists.

Architecture overview

Player selects dialogue option
        │
        v
+------------------+
| Dialogue Director |
+--------+---------+
         │
    health + budget check
         │
    +----+----+----+
    v    v    v    |
 Primary  Text  Voice
 (EL+cloud) (Ollama) (Piper/cache)
    │    │    │
    +----+----+
         v
   Same UI + subtitle pipeline
         v
   Log lane + latency in receipt

Lane definitions

Lane When Output
Primary ElevenLabs healthy, circuit closed, token budget OK Streamed voice + optional cloud LLM line
Text fallback Voice 502 or LLM 429/timeout Ollama HTTP line + subtitle; Piper or silent VO
Voice degrade Text OK but voice unhealthy Cached WAV / Piper on Ollama text
Hard canned Ollama down Authored tree only—never silent

Rule: Hard canned is mandatory; Ollama is optional enhancement; ElevenLabs is never the only path.

Beginner path (first evening)

  1. Keep authored dialogue tree as source of truth for story beats.
  2. Add DialogueDirector.cs (or Godot autoload) with enum Lane { Primary, TextFallback, VoiceDegrade, Canned }.
  3. Health-check http://127.0.0.1:11434/api/tags before first NPC prompt.
  4. Wrap ElevenLabs calls per 502 help—three retries then open circuit.
  5. On circuit open → call Ollama HTTP → Piper on resulting text → log lane in receipt JSON.

Success check: Airplane mode mid-conversation produces spoken or subtitled line within 3 seconds—no infinite spinner.

Developer path (production)

  1. Separate secrets — ElevenLabs API key in env/ScriptableObject; never commit.
  2. Token budget per scene—see OpenAI 429 help patterns for cloud text if paired.
  3. Warm voice cache — hash (voice_id, text) → WAV on disk for fest builds.
  4. Circuit breaker — 3 consecutive 5xx in 30s → Primary disabled 60s.
  5. Evidence folderrelease-evidence/ai/voice_fallback_receipt_v1.json per milestone build.

ElevenLabs Conversational AI (primary lane)

Responsibilities

  • Voice synthesis for NPC lines (Conversational AI agent or TTS endpoint per your SDK version).
  • Optional agent session if you use their full conversational stack—document which subset you ship (voice-only vs full agent).
  • Latency budget: target <2.5s to first audio byte on LAN; log p95 in receipt.

Retry and circuit (summary)

Event Action
502/503/504 Exponential backoff ×3 per help article
401/403 No retry—fix key, fall through
Circuit open Skip Primary 60s; route Text fallback
Success after open Half-open: one trial line

Do not implement unbounded retry loops—fest traffic makes spikes worse for everyone.

Cache warming table

Line type Warm?
Tutorial barks Yes—ship in build
Branching shop Top 20 lines
Procedural LLM No—expect fallback lane

Store under StreamingAssets/voice_cache/{hash}.wav with manifest voice_cache_manifest.json listing text_hash, voice_id, semver.


Ollama HTTP (text fallback lane)

Why HTTP only

The Ollama Process hang fix is non-negotiable on Windows 11 Unity paths. Production uses:

POST http://127.0.0.1:11434/api/chat
{ "model": "llama3.2:3b", "stream": true, "messages": [...] }

Parse NDJSON chunks; append to UI subtitle buffer; trigger Piper when line complete.

Prompt contract (keep NPC safe)

Field Purpose
system Persona + content policy + max sentences
context Quest state from authored variables only
player_choice Selected option text
max_tokens Hard cap (e.g. 120)

Never pass raw player chat into Ollama on fest demos—selected options only.

Health check before scene load

curl -s http://127.0.0.1:11434/api/tags

Unity: UnityWebRequest.Get("http://127.0.0.1:11434/api/tags") with 2s timeout. Failure → skip Ollama lane, go Hard canned immediately.

Model choice (2026 realistic)

Hardware Model hint
Dev laptop 16GB llama3.2:3b
Player PC mid Optional smaller quant in settings
Steam Deck Canned only—do not assume Ollama

Document in settings: Local dialogue: Off / Auto / Required.


Piper voice degrade (voice when ElevenLabs down)

When Ollama returns text but ElevenLabs circuit is open:

  1. Feed text to Piper offline TTS (or play gender-matched cached narrator WAV).
  2. Show subtitles always—accessibility + cert reviewers read text.
  3. Log lane=VoiceDegrade in receipt.

Honest limit: Piper does not clone ElevenLabs celebrity voices—disclose in AI storefront disclosure bullet.


Dialogue Director state machine (pseudo)

OnPlayerChoose(optionId):
  if CircuitBreaker.Open: goto FallbackText
  if TokenBudget.Exhausted: goto FallbackText
  try Primary = ElevenLabs.SynthesizeAsync(line)
  on success: PlayAudio(Primary); return
  on retryable5xx after retries: CircuitBreaker.RecordFailure()
  FallbackText:
    if OllamaHealthy:
      text = Ollama.StreamChat(prompt)
      if VoiceDegradeEnabled: PlayPiper(text)
      else: ShowSubtitleOnly(text)
    else:
      PlayCanned(optionId)

Godot: same logic in autoload with await and signals—avoid blocking main thread.


voice_fallback_receipt_v1.json

Store under release-evidence/ai/:

{
  "receipt_type": "voice_fallback_receipt_v1",
  "build_id": "fest-demo-20260522",
  "tests": {
    "primary_lane_smoke": "pass",
    "airplane_mode_mid_dialogue": "pass",
    "ollama_stopped_mid_dialogue": "pass",
    "elevenlabs_502_simulated": "pass"
  },
  "fallback_pass": true,
  "lanes_observed": ["Primary", "TextFallback", "VoiceDegrade", "Canned"],
  "p95_primary_latency_ms": 2100,
  "observed_date_utc": "2026-05-22"
}

QA scenarios (mandatory)

# Setup Pass
1 Normal network Primary plays
2 Airplane mode before line Canned or Ollama+Piper <3s
3 Kill Ollama mid-scene Canned
4 Mock 502 (proxy or test hook) Degrade after retries
5 Token budget = 0 No cloud spend; fallback

Record in release-evidence/ai/qa-voice-fallback.md with build_id.


Unity vs Godot wiring notes

Concern Unity 6 Godot 4.5
HTTP UnityWebRequest + coroutine HTTPRequest node
Audio AudioSource clip queue AudioStreamPlayer
Threading Main thread parse NDJSON Signals to main
Secrets ScriptableObject + env .env not in export

Both engines: one Dialogue Director—do not duplicate logic per NPC.


Cost and budget table (indicative)

Path Cost driver Control
ElevenLabs Characters / minutes Scene cap + cache
Cloud LLM (if any) Tokens Per-session budget
Ollama Dev time Optional player setting
Piper Disk Ship voice model in build

Publish max minutes per playtest hour in producer sheet—prevents surprise invoice week before fest.


Disclosure and store copy alignment

Store page must match runtime:

  • Online: ElevenLabs voice generation (and cloud LLM if used).
  • Offline: Local text model (Ollama) + offline TTS (Piper) + canned lines.

Mismatch triggers refund dashboard store-copy tags when trailer implies always-neural voice.

Link Steam Play AI disclosure checklist before upload.


Pairing with Sentis and other AI stacks

Teams mixing Unity Sentis for classification and ElevenLabs for voice should keep receipts separate:

  • sentis_deploy_receipt_v1.json for ONNX
  • voice_fallback_receipt_v1.json for NPC lanes

Do not conflate “AI feature” paragraphs in partner README—list each subsystem.


Resource roundup cross-links


Common mistakes

  1. Ollama via Process on Windows—freeze.
  2. ElevenLabs only—silent fest demos on 502.
  3. LLM writes quest state—narrative breaks; context from authored vars only.
  4. No canned path—cert fail airplane mode.
  5. Unbounded retries—worse spikes + bill shock.
  6. Same voice ID across Piper and ElevenLabs without disclosure.
  7. Blocking main thread on HTTP—stutter even when fallback works.
  8. No receipt—partners cannot verify.
  9. Player free-text chat on demo—moderation risk.
  10. Skipping cache warm for tutorial lines—every line hits 502 surface.

Proof table

Claim Evidence Pass
502 handled Simulated test log Degrade <3s
Ollama HTTP curl + in-game NDJSON streams
Airplane mode QA scenario 2 Canned plays
Disclosure Store bullet Matches lanes
Cost cap Budget sheet Scene limits set

Key takeaways

  • 2026 voice NPCs need three lanes—Primary, Ollama text, Piper/cache degrade.
  • Never shell ollama run from Unity on Windows—use HTTP per help fix.
  • Circuit breaker after ElevenLabs 502—protect players and provider.
  • Hard canned tree is cert-non-negotiable.
  • voice_fallback_receipt_v1.json documents QA for partners.
  • Pairs with existing help fixes and beginner fallback net—this is combined architecture.
  • AI Integration category after Construct trilogy—diversifies blog mix.
  • 8 backlog pitches remain.
  • Align Steam AI disclosure with actual lanes.
  • Subtitles always-on for accessibility and degrade mode.

FAQ

Do I need both ElevenLabs and Ollama?
Primary voice wants ElevenLabs (or similar). Ollama covers text when cloud fails; you can ship canned-only without Ollama but not cloud-only without canned.

Can Ollama drive voice directly?
Not production-quality alone—pair with Piper or subtitles.

Godot on Steam Deck?
Assume Canned default; optional Ollama off.

Does Conversational AI agent replace dialogue tree?
No—tree owns beats; AI fills delivery within bounds.

How does this relate to Inworld/Convai?
Same director pattern—vendor SDKs still need canned + degrade lanes.

502 fix article enough?
Help fixes retries; this post wires full lane graph.

Anthropic/Gemini instead of Ollama?
Architecture identical—swap HTTP client; keep canned.

Latency SLA for fest?
Target p95 <3s to first feedback (audio or subtitle).

Legal on voice cloning?
Use licensed voice IDs; disclose in store AI section.

Conclusion

Voice-first NPCs are a system design problem in 2026—not a plugin purchase. Wire ElevenLabs behind a circuit breaker, Ollama behind HTTP, Piper behind degrade, and canned lines behind everything else. Log the receipt before you mark the demo fest-ready.

Next reads: 502 retry help, Ollama HTTP help, and beginner fallback net.

Ninety-minute architecture sketch

Minute Task
0–15 Draw Dialogue Director diagram on paper
15–30 Add canned + circuit enum stubs
30–50 Ollama health GET + one streamed line
50–70 ElevenLabs wrap with 3 retries
70–90 Write receipt JSON + airplane test

Sketch night does not replace five QA scenarios—it starts the graph.

SEO and discovery note

Targets elevenlabs conversational ai unity npc and ollama fallback architecture 2026—distinct from help fix URLs and beginner tutorial intent.

Evidence folder layout

release-evidence/ai/
  voice_fallback_receipt_v1.json
  qa-voice-fallback.md
  voice_cache_manifest.json
  prompt_registry_semver.txt

Aligns with release evidence taxonomy and prompt registry sprint when live-ops edits ship mid-fest.


NDJSON streaming parse (Unity sketch)

Ollama returns one JSON object per line when "stream": true. Do not wait for full body on main thread.

IEnumerator StreamOllamaLine(string prompt, Action<string> onLineDone)
{
    var body = JsonUtility.ToJson(new ChatRequest { model = "llama3.2:3b", stream = true, messages = BuildMessages(prompt) });
    using var req = new UnityWebRequest("http://127.0.0.1:11434/api/chat", "POST");
    req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
    req.downloadHandler = new DownloadHandlerBuffer();
    req.SetRequestHeader("Content-Type", "application/json");
    yield return req.SendWebRequest();
    if (req.result != UnityWebRequest.Result.Success) { onLineDone(null); yield break; }
    foreach (var line in req.downloadHandler.text.Split('\n'))
    {
        if (string.IsNullOrWhiteSpace(line)) continue;
        var chunk = JsonUtility.FromJson<ChatChunk>(line);
        if (chunk.done) onLineDone(chunk.message.content);
    }
}

Godot 4.5: use HTTPRequest with request_completed, split buffer by newline, emit dialogue_line_ready signal. Same contract—no Process.


Latency measurement (receipt input)

Log per line in qa-voice-fallback.md:

Metric How
t_request Player confirms option
t_first_token First NDJSON chunk or first audio byte
t_playback AudioSource.Play

Compute p95 over 20 scripted lines in Primary lane and 10 in Text fallback. If Text fallback p95 > 4s on target min-spec PC, shrink model or shorten max_tokens.


Moderation and content bounds (2026 policy reality)

Risk Mitigation
Model invents quest Context from authored flags only
Unsafe text Blocklist filter on output string
Player harassment No free-text to cloud on demo
Voice deepfake confusion Store disclosure + subtitle

Run human sign-off on prompt registry per 14-day sprint before changing system prompts mid-fest.


Fest-week operations calendar

Day Action
Mon Warm voice cache for tutorial NPC
Wed Check ElevenLabs status page; lower scene token cap if spikes
Fri Re-run airplane mode QA; update receipt
Upload eve Freeze prompt_registry semver

Pair with BUILD_RECEIPT notes: voice_fallback_pass=true.


Troubleshooting matrix

Symptom Check Fix
Silent after 502 Circuit stuck open Half-open trial
Editor freeze Process Ollama HTTP only
Subtitle only Piper path disabled Enable VoiceDegrade
Wrong persona Prompt registry drift Freeze semver
High bill Cache miss Warm lines
Deck stutter Ollama on Deck Default Canned
Duplicate lines Retry without idempotency Hash request id

Vendor-neutral comparison (architecture level)

Capability ElevenLabs Conversational AI Ollama local Canned tree
Voice quality High N/A (use Piper) Fixed WAV
Latency Network GPU/CPU bound Instant
Cost Per use Dev time Authored time
Cert offline Degrade Yes with HTTP Yes
Narrative control Medium Medium Full

Architecture stays valid if you swap ElevenLabs for another TTS—keep lane graph.


Worked scenario — 502 during shopkeeper haggle

  1. Player selects “Haggle” option opt_haggle_3.
  2. Director checks circuit—closed. Calls ElevenLabs.
  3. Provider returns 502 after retries. Circuit records failure #1.
  4. Director routes Text fallback—Ollama returns shorter line with price hint from quest.haggle_count.
  5. Voice degrade—Piper speaks line; subtitle shows same text.
  6. Receipt logs lane=TextFallback+VoiceDegrade, latency_ms=2400.
  7. Player hears voice—not silence. Refund risk avoided.

If Ollama also down, step 4 becomes Canned shopkeeper_haggle_fail_02.wav—still no silence.


Integration with playtest CSV

Add columns to 18 playtest tools sheet:

  • voice_lane (Primary / TextFallback / VoiceDegrade / Canned)
  • elevenlabs_502_Y/N
  • ollama_up_Y/N

Playtesters reproduce architecture bugs faster than “voice felt weird.”


Corporate laptop note

Ollama help mentions AppLocker. Architecture default: detect blocked daemon → Canned without hanging. Log ollama_blocked=true in receipt for partner laptops.


Stretch goals

Found this useful? Bookmark the two help fixes and implement the Director once—502 and Ollama hang stops being two separate firefights.

Security checklist (ship gate)

  • [ ] API keys in CI secrets—not repo
  • [ ] Ollama bound to 127.0.0.1 only in shipping build
  • [ ] No player PII in prompts
  • [ ] Log redaction for dialogue text in public builds
  • [ ] Rate limit per NPC to prevent spam-click cost explosion
  • [ ] Disable debug “force cloud” cheat in release binaries

Dual-cloud text optional layer

Some teams use ElevenLabs for voice and OpenAI/Anthropic for line generation on Primary lane. Architecture unchanged:

  • Cloud text failure (429) → open text circuit → Ollama
  • Cloud text success + voice 502 → Ollama not required if cached WAV exists for that line hash

Document which cloud APIs you actually call in disclosure—three vendors is still one Director.

Audio mixer routing

Route Primary and Piper into same NPC bus with shared reverb send so degrade mode does not sound like a different character from another room. Subtitle styling stays identical across lanes—players perceive one NPC with a bad connection, not two different systems.

Build flavors table

Flavor Primary Ollama Canned
DEVELOPMENT On On On
FEST_DEMO On Auto On
STEAM_DECK On Off On
REVIEWER_OFFLINE Off Off On only

Use scripting define symbols—do not #if inside Dialogue Director without tests per flavor.

Partner README one-liner

Voice: ElevenLabs online with Piper/canned fallback. Dialogue text: cloud when healthy, Ollama HTTP locally, canned tree always. Evidence: release-evidence/ai/voice_fallback_receipt_v1.json.

Paste into partner ZIP README beside hash manifests from cold-hash challenge.

Idempotency and duplicate line prevention

When retries fire after a timeout, the player must not hear two lines for one choice. Hash (npc_id, option_id, attempt_id) and discard duplicate audio starts within 500 ms. Log duplicate_suppressed=true in QA when testing flaky Wi-Fi—proves Director is production-safe, not demo-only.

Accessibility requirements (2026 baseline)

  • Subtitles for every lane including Primary.
  • Subtitle speed independent of audio length when Piper runs faster than ElevenLabs.
  • Visual indicator when voice is synthetic vs recorded—icons help disclosure honesty.
  • Reduce motion setting should not disable fallback subtitles.

Fest reviewers increasingly check accessibility alongside airplane mode—architecture treats both as first-class QA scenarios.

When to skip ElevenLabs entirely

Project signal Recommendation
No VO budget Canned + subtitles only
Web-only tiny scope Text + Piper; skip cloud voice
Strict offline SKU No Primary lane in that SKU
Narration-heavy AA Recorded WAV primary; AI for barks only

Architecture doc still applies—lanes shrink but Director remains. Ship the graph once; swap vendors later without rewriting NPC scenes. That is the difference between a voice feature and a shippable NPC system.