Tutorial May 11, 2026

Your First LLM-Driven NPC Dialogue System with a Hard Fallback Net - Unity and Godot 2026 Beginner Build

A 2026 beginner guide to building LLM-driven NPC dialogue with a hard fallback net in Unity and Godot. Mid-2026 LLM API price drops made per-NPC dialogue indie-viable, but partner-cert reviewers now formally reject submissions without a deterministic fallback path. This post walks through the two-path architecture, the five safety guarantees, OpenAI/Anthropic/Gemini/Ollama wiring, the canned-line fallback tree, and token-budget discipline.

By GamineAI Team


Why this matters now

Three things changed in 2026 that turned LLM-driven NPC dialogue from a tech-demo curiosity into a production-viable feature for indie teams - and one of them turned the fallback net from optional polish into a hard cert-lane requirement:

  1. Mid-2026 LLM API price drops at OpenAI, Anthropic, and Google brought per-NPC dialogue costs into indie range. A small studio can now budget a per-player-hour cost (a few cents in many cases) instead of a per-prompt cost; the unit economics finally close for sub-$30 indie titles.
  2. Local-model maturity arrived: by mid-2026, Ollama, llama.cpp, and quantized models run on player hardware (RTX 3060+ on PC, M-series Macs, even Steam Deck for smaller models), so the same dialogue can ship with a free-at-runtime path.
  3. Partner-cert reviewers formally rejected several indie submissions in spring 2026 specifically because the demo failed to handle LLM API outages gracefully. The late-2026 reviewer image now actively tests with the network suspended mid-dialogue and expects a deterministic, on-disk fallback path. No fallback = no cert.

This post is the beginner-first build for the two-path system: live LLM remote path (for the variety) + deterministic local fallback (for the cert reviewer pressing the airplane-mode toggle). Wire it up in one focused evening for Unity 6.6 LTS or Godot 4.5; the architecture is the same, only the API calls differ.

If your team has not yet built a basic dialogue system, start with How to Implement a Simple Dialogue System in Unity and How to Build a Simple Dialogue and Quest System in Godot 4 first. The LLM layer goes on top of a working dialogue tree, not in place of one.

Who this guide is for

  • Beginner-to-intermediate indie developers with a working dialogue system in Unity or Godot who want to add live LLM-driven variety to NPC speech.
  • Solo developers preparing for a 2026-2027 partner-cert submission who need a fallback story for the reviewer image.
  • Anyone who has read about Inworld, Convai, or similar live-LLM-NPC platforms and wants to understand the architecture without committing to a vendor SDK.

Not for: teams already running production LLM dialogue in a shipped title (you are past this scope), and teams without any working dialogue system (start with the primers linked above).

What the LLM-driven NPC system actually is (and is not)

In one sentence: the LLM generates contextually-varied dialogue lines for an NPC at runtime when the network is healthy; the fallback returns pre-authored canned lines from a deterministic local tree when the network is not healthy. Both paths share the same dialogue-state machine, the same UI rendering, and the same player-facing pacing.

What it is not:

  • Not a replacement for your narrative designer. The LLM does not invent quests, plot points, or characters; those stay scripted in your dialogue tree. The LLM only fills in how the NPC says the line within tight authored bounds.
  • Not a chat window. Player-facing LLM chat creates moderation, latency, and creative-control problems out of scope for a beginner build. The LLM operates on NPC outbound speech only, with the player still selecting from authored response options.
  • Not free. Even at 2026 price drops, the live path costs real money. The token-budget discipline section below covers how to keep cost predictable.

The two-path architecture in one diagram

+-----------------------------+
| Player triggers NPC line    |
| (existing dialogue tree)    |
+--------------+--------------+
               |
               v
+-----------------------------+   network healthy + within budget?
| Dialogue Director           |---------+---------------------------+
+-----------------------------+         |                           |
                                    yes |                        no |
                                        v                           v
                         +------------------------+   +------------------------+
                         | Live LLM Path          |   | Fallback Path          |
                         | (OpenAI/Anthropic/     |   | (canned lines from     |
                         |  Gemini/Ollama)        |   |  authored local tree)  |
                         +-----------+------------+   +-----------+------------+
                                     |                            |
                                     +-------------+--------------+
                                                   v
                               +---------------------------------------+
                               | Output Sanitizer + UI Renderer        |
                               +---------------------------------------+
                                                   |
                                                   v
                                    +-----------------------------+
                                    | NPC says the line on screen |
                                    +-----------------------------+

The two boxes that matter for safety:

  • Dialogue Director decides which path to use, per line, in under 50ms (see the health-check sketch after this list).
  • Output Sanitizer strips/clamps anything the LLM returned that does not fit the player-facing contract.
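
Answering "network healthy?" in well under 50ms rules out any synchronous probe, so the Director has to read a cached health flag that recent request outcomes keep fresh. Here is a minimal Unity C# sketch; the class name, the 30-second backoff window, and the ReportFailure hook are assumptions layered on top of the recipe below, not part of it:

using System;
using UnityEngine;

// Cached health flag: never probes the network synchronously. The
// Director's catch blocks call ReportFailure(); LooksHealthy is an
// O(1) read combining Unity's reachability hint with a backoff window.
public static class NetworkHealth
{
    private static DateTime lastFailureUtc = DateTime.MinValue;
    private static readonly TimeSpan backoff = TimeSpan.FromSeconds(30);

    public static void ReportFailure() => lastFailureUtc = DateTime.UtcNow;

    public static bool LooksHealthy =>
        Application.internetReachability != NetworkReachability.NotReachable
        && DateTime.UtcNow - lastFailureUtc > backoff;
}

After a timeout or unreachable error, the live path stays dark for 30 seconds instead of making every subsequent NPC line wait out the full 2-second timeout.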

The five non-negotiable safety guarantees

These are the guarantees the partner-cert reviewer image checks. Build all five into the system from the start; retrofitting them later is harder than building them in.

Guarantee 1 - Deterministic offline operation

Disconnecting the network mid-dialogue must produce a complete authored fallback line within one frame of the next NPC line trigger. No hangs, no spinner, no "Connection lost" dialog. The fallback path is invisible to the player.

Guarantee 2 - Maximum response latency cap

The live path has a hard timeout (default 2 seconds). If the LLM has not returned a full line by then, the Dialogue Director cancels the request and routes to fallback. The player never waits.

Guarantee 3 - Output sanitization

Every line the LLM returns - even on the live path - passes through a sanitizer that:

  • Strips characters outside the engine font's glyph range.
  • Caps line length at 220 characters (longer lines break dialogue UI on Steam Deck native 1280×800).
  • Rejects responses that match a built-in profanity / slur list (use a 2026 community-maintained list, not your own).
  • Rejects responses containing JSON, markdown, code blocks, or obvious model artifacts (the LLM occasionally returns these).
  • On any sanitizer rejection, routes to fallback for that line.

Guarantee 4 - Token budget enforcement

The Dialogue Director tracks per-session token usage and hard-stops the live path when the per-player-hour budget is exceeded. The session falls back to canned lines for the rest of the play session, with no UI difference. Beginner-default budget: 40,000 input + 5,000 output tokens per player-hour - roughly 100 NPC lines at ~400 input tokens each, which at 2026 prices is a cent or two.

Guarantee 5 - Logged, replayable failure modes

Every fallback trigger is logged with a reason code: network_timeout, network_unreachable, sanitizer_rejected_glyph, sanitizer_rejected_length, sanitizer_rejected_profanity, sanitizer_rejected_artifact, budget_exceeded, provider_error_4xx, provider_error_5xx. The log goes to Application.persistentDataPath/llm_fallback_log.json (Unity) or user://llm_fallback_log.json (Godot) for the cert reviewer to inspect.
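
A minimal Unity C# sketch of that logger - the Entry fields and the NodeId property on DialogueContext are assumptions, and the file is JSON Lines (one object per line), which is easy to append and easy for a reviewer to grep. The Godot version mirrors it against user://llm_fallback_log.json:

using System;
using System.IO;
using UnityEngine;

// Appends one JSON object per fallback trigger (JSON Lines format)
// to the path the cert reviewer inspects.
public static class LLMFailureLog
{
    private static readonly string LogPath =
        Path.Combine(Application.persistentDataPath, "llm_fallback_log.json");

    [Serializable]
    private struct Entry
    {
        public string timestampUtc;
        public string nodeId;
        public string reason;
    }

    public static void Record(DialogueContext ctx, string reasonCode)
    {
        var entry = new Entry
        {
            timestampUtc = DateTime.UtcNow.ToString("o"),
            nodeId = ctx.NodeId,
            reason = reasonCode
        };
        File.AppendAllText(LogPath, JsonUtility.ToJson(entry) + "\n");
    }
}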

Build all five and you pass the cert reviewer's airplane-mode toggle test on the first try.

Setting up the Unity 6.6 LTS recipe

Time budget: 45-90 minutes from "I have a working dialogue tree" to "I have a live LLM line speaking with the fallback ready."

Step 1 - Pick a provider and create an API key

Pick one of:

  • OpenAI (gpt-4o-mini is the indie-budget sweet spot in 2026; lower cost per token, fast response).
  • Anthropic (claude-3-5-haiku equivalent for indie scale; good safety defaults).
  • Google Gemini (gemini-1.5-flash or 2.0-flash; competitive 2026 pricing).
  • Ollama local (llama-3.1-8b-instruct or qwen2.5-7b-instruct; free at runtime, requires user GPU and the model file shipped or fetched).

For your first build, pick the cloud provider you already have an account with. Add Ollama as a second provider later for the offline-by-default tier.

Store the API key so it never ships in the binary. For a beginner build, that means an environment variable at runtime on the developer machine and a server-side proxy in production - the key never touches the player's machine.
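
A minimal sketch of the developer-machine half - the variable name LLM_PROXY_TOKEN is hypothetical, and in production the client would hold no provider key at all, only a session token for your proxy:

using System;

// Developer-machine only: read the proxy auth token from an environment
// variable so no key ever lands in source control or a build.
public static class LLMProxyAuth
{
    public static string Token =>
        Environment.GetEnvironmentVariable("LLM_PROXY_TOKEN") ?? "";

    public static bool LivePathAvailable => !string.IsNullOrEmpty(Token);
}

If LivePathAvailable is false at startup, the Director simply never leaves the fallback path - which is also a convenient way to exercise Guarantee 1 during development.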

Step 2 - Add the LLMDialogueDirector.cs component

Create a new MonoBehaviour at Assets/Scripts/Dialogue/LLMDialogueDirector.cs:

using System;
using System.Collections;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using UnityEngine;

public class LLMDialogueDirector : MonoBehaviour
{
    [SerializeField] private float requestTimeoutSeconds = 2.0f;
    [SerializeField] private int tokensInputPerHourBudget = 40000;
    [SerializeField] private int tokensOutputPerHourBudget = 5000;
    [SerializeField] private string proxyEndpoint = "https://api.yourstudio.com/llm-proxy";

    private static readonly HttpClient http = new HttpClient();
    private TokenBudget budget = new TokenBudget();
    private FallbackTree fallback;

    public void Initialize(FallbackTree localFallback)
    {
        fallback = localFallback;
        budget.ResetIfNewHour();
    }

    public async Task<string> GetLineAsync(DialogueContext ctx, CancellationToken ct)
    {
        if (!budget.HasHeadroom(tokensInputPerHourBudget, tokensOutputPerHourBudget))
        {
            LLMFailureLog.Record(ctx, "budget_exceeded");
            return fallback.LineFor(ctx);
        }

        try
        {
            using var timeout = CancellationTokenSource.CreateLinkedTokenSource(ct);
            timeout.CancelAfter(TimeSpan.FromSeconds(requestTimeoutSeconds));

            string line = await CallProxyAsync(ctx, timeout.Token);
            if (line == null)
            {
                // CallProxyAsync already logged a provider_error_* reason code.
                return fallback.LineFor(ctx);
            }
            string sanitized = OutputSanitizer.Clean(line, out string rejectReason);
            if (sanitized == null)
            {
                LLMFailureLog.Record(ctx, rejectReason);
                return fallback.LineFor(ctx);
            }
            budget.Consume(input: ctx.PromptTokens, output: ctx.EstimatedOutputTokens);
            return sanitized;
        }
        catch (TaskCanceledException)
        {
            LLMFailureLog.Record(ctx, "network_timeout");
            return fallback.LineFor(ctx);
        }
        catch (HttpRequestException)
        {
            LLMFailureLog.Record(ctx, "network_unreachable");
            return fallback.LineFor(ctx);
        }
    }

    private async Task<string> CallProxyAsync(DialogueContext ctx, CancellationToken ct)
    {
        var body = new StringContent(ctx.ToProxyJson(), Encoding.UTF8, "application/json");
        var resp = await http.PostAsync(proxyEndpoint, body, ct);
        if (!resp.IsSuccessStatusCode)
        {
            string code = ((int)resp.StatusCode / 100) == 4 ? "provider_error_4xx" : "provider_error_5xx";
            LLMFailureLog.Record(ctx, code);
            return null;
        }
        return await resp.Content.ReadAsStringAsync();
    }
}

Three things this code commits to:

  • Single-method API. GetLineAsync(ctx, ct) is the entire surface area.
  • Always returns a usable line - either live or fallback. Callers do not handle null.
  • Cancellation everywhere. Required for clean abort on scene-load or NPC interrupt.
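
The snippets above lean on a DialogueContext type that never gets shown. A minimal sketch - the field names match the usages above, the token estimate is a rough 4-characters-per-token heuristic rather than a real tokenizer, and the From(...) factory and quest-state fields are omitted for brevity:

using System;
using UnityEngine;

// Everything the Director needs to build a prompt and pick a fallback.
[Serializable]
public class DialogueContext
{
    public string NodeId;
    public string SpeakerId;
    public string SceneId;
    public string EmotionTag;
    public string PromptHint;   // the node's llm_prompt_hint field

    // Rough heuristic: ~4 characters per token, plus a fixed system-prompt
    // overhead. Swap in a real tokenizer count if you need tight budgets.
    public int PromptTokens => Mathf.CeilToInt(PromptHint.Length / 4f) + 380;
    public int EstimatedOutputTokens => 50;

    public string ToProxyJson() => JsonUtility.ToJson(this);
}

JsonUtility serializes public fields only, so the computed token properties stay out of the proxy payload.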

Step 3 - Add the OutputSanitizer.cs static class

using System.Text;

public static class OutputSanitizer
{
    private const int MaxChars = 220;
    private static readonly System.Collections.Generic.HashSet<string> Blocklist =
        new System.Collections.Generic.HashSet<string> { /* load from blocklist.txt at startup */ };

    public static string Clean(string raw, out string rejectReason)
    {
        rejectReason = null;
        if (string.IsNullOrEmpty(raw))
        {
            rejectReason = "sanitizer_rejected_artifact";
            return null;
        }
        if (raw.Contains("```") || raw.Contains("{\"") || raw.Contains("```json"))
        {
            rejectReason = "sanitizer_rejected_artifact";
            return null;
        }
        if (raw.Length > MaxChars)
        {
            rejectReason = "sanitizer_rejected_length";
            return null;
        }
        var sb = new StringBuilder(raw.Length);
        foreach (char c in raw)
        {
            if (c < 32 && c != '\n') continue;
            // C# chars are UTF-16 code units; glyphs above 0xFFFF arrive as
            // surrogate pairs, so a c > 0xFFFF test can never fire. Check surrogates.
            if (char.IsSurrogate(c)) { rejectReason = "sanitizer_rejected_glyph"; return null; }
            sb.Append(c);
        }
        string text = sb.ToString();
        foreach (string banned in Blocklist)
        {
            if (text.IndexOf(banned, System.StringComparison.OrdinalIgnoreCase) >= 0)
            {
                rejectReason = "sanitizer_rejected_profanity";
                return null;
            }
        }
        return text;
    }
}

The sanitizer is the most important file in the system. Spend an hour reviewing this code line by line; it is the boundary between "you pass cert" and "you fail cert."

Step 4 - Wire the Director into your existing dialogue tree

Wherever your existing DialogueController.cs currently picks the line to speak, route it through the Director:

async void OnNpcSpeak(DialogueNode node)
{
    var ctx = DialogueContext.From(node, currentScene, currentQuest, playerState);
    // destroyCancellationToken (MonoBehaviour, Unity 2022.2+) aborts the await
    // cleanly if this component is destroyed mid-request (scene load, despawn).
    string line = await director.GetLineAsync(ctx, destroyCancellationToken);
    npcDialogueUI.Show(node.SpeakerId, line);
}

If your existing system was synchronous, this is the place to add async/await. Coroutine versions work but are messier; prefer async.

Step 5 - Build the FallbackTree from your existing dialogue authoring

Your existing dialogue tree already contains authored lines. The FallbackTree.LineFor(ctx) function should return the closest authored match for the current (speakerId, sceneId, questState, emotionTag) tuple. Every NPC line in the LLM-driven system must have at least one authored fallback; the Director cannot return a null line and the player cannot see "..." or an empty bubble.
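
A minimal sketch of that lookup - exact tuple match first, then progressively looser matches, then a per-speaker default. The three-part key (quest state folded out for brevity) and the random pick among authored lines are assumptions to adapt to your authoring format:

using System.Collections.Generic;
using System.Linq;

// Deterministic, on-disk fallback: every lookup terminates in an
// authored line, never null.
public class FallbackTree
{
    private readonly Dictionary<(string speaker, string scene, string emotion), List<string>> lines;
    private readonly Dictionary<string, string> speakerDefaults;
    private readonly System.Random rng = new System.Random();

    public FallbackTree(
        Dictionary<(string speaker, string scene, string emotion), List<string>> authored,
        Dictionary<string, string> defaults)
    {
        lines = authored;
        speakerDefaults = defaults;
    }

    public string LineFor(DialogueContext ctx)
    {
        // 1. Exact (speaker, scene, emotion) match.
        if (lines.TryGetValue((ctx.SpeakerId, ctx.SceneId, ctx.EmotionTag), out var exact))
            return exact[rng.Next(exact.Count)];

        // 2. Same speaker and scene, any emotion.
        var loose = lines
            .Where(kv => kv.Key.speaker == ctx.SpeakerId && kv.Key.scene == ctx.SceneId)
            .SelectMany(kv => kv.Value)
            .ToList();
        if (loose.Count > 0)
            return loose[rng.Next(loose.Count)];

        // 3. Last resort: the speaker's authored default line.
        return speakerDefaults[ctx.SpeakerId];
    }
}

The guarantee that step 3 always succeeds is exactly what the build-time validator below enforces.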

Make this rule mechanical (a validator sketch follows the list):

  • On dialogue-tree import, run a validator that flags any node without a fallback_line field.
  • Reject the build if any node lacks a fallback.
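
A minimal editor-side sketch of that validator, assuming nodes live as JSON files under Assets/Dialogue - the crude string check stands in for real JSON parsing, and wiring it into the build pipeline (e.g. via IPreprocessBuildWithReport) is left out:

#if UNITY_EDITOR
using System.IO;
using System.Linq;
using UnityEditor;
using UnityEditor.Build;
using UnityEngine;

// Flags every dialogue-node JSON lacking a fallback_line / fallback_lines
// field, and fails hard so the build cannot ship without one.
public static class FallbackValidator
{
    [MenuItem("Tools/Dialogue/Validate Fallback Lines")]
    public static void ValidateAll()
    {
        var offenders = Directory
            .GetFiles("Assets/Dialogue", "*.json", SearchOption.AllDirectories)
            .Where(path => !File.ReadAllText(path).Contains("\"fallback_line"))
            .ToList();

        foreach (var path in offenders)
            Debug.LogError($"Dialogue node missing fallback line(s): {path}");

        if (offenders.Count > 0)
            throw new BuildFailedException($"{offenders.Count} node(s) lack a fallback line.");

        Debug.Log("All dialogue nodes have fallback lines.");
    }
}
#endif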

Setting up the Godot 4.5 recipe

Time budget: 40-75 minutes.

Step 1 - Pick the same provider and create the same API key setup

Same choices as Unity above; environment variables on developer machine, server-side proxy on ship.

Step 2 - Create llm_dialogue_director.gd

extends Node
class_name LLMDialogueDirector

@export var request_timeout_seconds: float = 2.0
@export var tokens_input_per_hour_budget: int = 40000
@export var tokens_output_per_hour_budget: int = 5000
@export var proxy_endpoint: String = "https://api.yourstudio.com/llm-proxy"

var _http: HTTPRequest
var _budget: TokenBudget = TokenBudget.new()
var _fallback: FallbackTree

func initialize(fallback: FallbackTree) -> void:
    _fallback = fallback
    _budget.reset_if_new_hour()
    _http = HTTPRequest.new()
    add_child(_http)
    _http.timeout = request_timeout_seconds

func get_line(ctx: DialogueContext) -> String:
    if not _budget.has_headroom(tokens_input_per_hour_budget, tokens_output_per_hour_budget):
        LLMFailureLog.record(ctx, "budget_exceeded")
        return _fallback.line_for(ctx)

    var headers := ["Content-Type: application/json"]
    var body := ctx.to_proxy_json()
    var err = _http.request(proxy_endpoint, headers, HTTPClient.METHOD_POST, body)
    if err != OK:
        LLMFailureLog.record(ctx, "network_unreachable")
        return _fallback.line_for(ctx)

    var result = await _http.request_completed
    var request_result: int = result[0]
    var response_code: int = result[1]
    var body_buffer: PackedByteArray = result[3]

    # result[0] is the HTTPRequest.Result enum; the HTTP status lives in result[1].
    if request_result != HTTPRequest.RESULT_SUCCESS:
        var reason := "network_timeout" if request_result == HTTPRequest.RESULT_TIMEOUT else "network_unreachable"
        LLMFailureLog.record(ctx, reason)
        return _fallback.line_for(ctx)

    if response_code < 200 or response_code >= 300:
        var code := "provider_error_4xx" if response_code < 500 else "provider_error_5xx"
        LLMFailureLog.record(ctx, code)
        return _fallback.line_for(ctx)

    var raw: String = body_buffer.get_string_from_utf8()
    var reject_reason: Array[String] = [""]
    var sanitized: String = OutputSanitizer.clean(raw, reject_reason)
    if sanitized.is_empty():
        LLMFailureLog.record(ctx, reject_reason[0])
        return _fallback.line_for(ctx)

    _budget.consume(ctx.prompt_tokens, ctx.estimated_output_tokens)
    return sanitized

Step 3 - Create output_sanitizer.gd

extends RefCounted
class_name OutputSanitizer

const MAX_CHARS := 220
static var BLOCKLIST: Array[String] = []  # filled from res://config/blocklist.txt at startup; a const cannot be populated at runtime

static func clean(raw: String, reject_reason: Array[String]) -> String:
    if raw.is_empty():
        reject_reason[0] = "sanitizer_rejected_artifact"
        return ""
    if raw.contains("```") or raw.contains("{\""):
        reject_reason[0] = "sanitizer_rejected_artifact"
        return ""
    if raw.length() > MAX_CHARS:
        reject_reason[0] = "sanitizer_rejected_length"
        return ""

    var cleaned := ""
    for i in raw.length():
        var cp: int = raw.unicode_at(i)
        if cp < 32 and cp != 10:
            continue
        if cp > 0xFFFF:
            reject_reason[0] = "sanitizer_rejected_glyph"
            return ""
        cleaned += String.chr(cp)

    var lower := cleaned.to_lower()
    for banned in BLOCKLIST:
        if lower.contains(banned):
            reject_reason[0] = "sanitizer_rejected_profanity"
            return ""

    return cleaned

Step 4 - Wire into existing Godot dialogue UI

func _on_npc_speak(node: DialogueNode) -> void:
    var ctx := DialogueContext.from_node(node, current_scene, current_quest, player_state)
    var line: String = await director.get_line(ctx)
    npc_dialogue_ui.show(node.speaker_id, line)

The fallback dialogue tree - what authoring looks like

The fallback tree is your dialogue tree with one extra field per node: fallback_line (or fallback_lines array if you want random selection).

For a typical RPG NPC, your dialogue node JSON / Resource looks like:

{
  "node_id": "merchant_greet_01",
  "speaker_id": "merchant_npc",
  "scene_id": "village_square",
  "emotion_tag": "friendly",
  "fallback_lines": [
    "Looking for something special today?",
    "Welcome traveler - what can I get you?",
    "Take a look around. I have what you need."
  ],
  "llm_prompt_hint": "merchant in village square, friendly, daytime, greeting customer entering shop",
  "response_options": [
    { "id": "buy", "label": "Show me your wares." },
    { "id": "leave", "label": "Just looking." }
  ]
}

The 3-line fallback set gives the player some variety in offline mode without the repetition becoming obvious. The llm_prompt_hint is the context the LLM uses to compose the live-path variant; keep it short and concrete.

Authoring 3 fallback lines per significant NPC node is the indie sweet spot. Below 3 feels repetitive in offline mode; above 6 burns authoring time the team doesn't have.

Token budget discipline (the indie cost-control math)

At 2026 prices for the indie-budget models (gpt-4o-mini, claude-3-5-haiku, gemini-2.0-flash):

  • Input tokens: roughly $0.10-0.30 per million tokens.
  • Output tokens: roughly $0.30-0.90 per million tokens.

A typical NPC line:

  • Prompt: ~400 input tokens (system + context + dialogue history + emotion tag).
  • Response: ~50 output tokens (a single dialogue line is short).
  • Per-line cost: ~$0.00007 to $0.00018.

A player who triggers 100 NPC lines in an hour costs ~$0.007 to $0.018 per hour on the live path. For a $20 indie title where 5% of players enable the live LLM path, the LLM cost is well under 1% of revenue, even before the fallback path catches budget overruns.

The math fails when:

  • You ship without a budget cap. A player who leaves the game running for 10 hours can cost $0.10-0.20 instead of $0.02; multiply by player count and the month-end bill becomes a surprise.
  • You ship without caching. Identical (speaker_id, scene_id, emotion_tag) calls in a 15-minute window should return the cached previous line, not a new API call.
  • You ship without per-session and per-day caps in the proxy. A compromised key or scripted client can drain runway fast without server-side cap enforcement.

Wire the budget cap in LLMDialogueDirector (tokens_input_per_hour_budget), the 15-minute cache in the proxy, and the per-key daily cap at the provider dashboard. Three layers, not one.
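
The TokenBudget type the Director calls into is never shown above. A minimal sketch - the rolling-hour window is an assumption (a per-session window also works), and the member names match the Unity snippet:

using System;

// Tracks per-hour token consumption for Guarantee 4. Not thread-safe;
// fine for main-thread Unity dialogue calls.
public class TokenBudget
{
    private int inputUsed;
    private int outputUsed;
    private DateTime windowStartUtc = DateTime.UtcNow;

    public void ResetIfNewHour()
    {
        if ((DateTime.UtcNow - windowStartUtc).TotalHours >= 1.0)
        {
            inputUsed = 0;
            outputUsed = 0;
            windowStartUtc = DateTime.UtcNow;
        }
    }

    public bool HasHeadroom(int inputBudget, int outputBudget)
    {
        ResetIfNewHour();
        return inputUsed < inputBudget && outputUsed < outputBudget;
    }

    public void Consume(int input, int output)
    {
        inputUsed += input;
        outputUsed += output;
    }
}

Remember this client-side cap is only the first layer; the proxy and the provider dashboard hold the authoritative caps.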

Common beginner mistakes

  1. Shipping the API key in the game binary. The player extracts it within hours; the bill arrives within days. Always use a server-side proxy.
  2. Skipping output sanitization. The first time the LLM returns a markdown code block in your NPC dialogue UI is the day the cert reviewer flags it.
  3. No per-session budget cap. A long play session drains your budget without you noticing in QA.
  4. Forgetting to author fallback lines for every node. The validator at build time catches this; do not skip it.
  5. Letting the live-path latency exceed 2 seconds. Players experience the LLM as "the NPC is laggy" rather than "the LLM is varied." Hard timeout to fallback.
  6. Streaming the LLM response token-by-token to the UI. Tempting for dramatic effect; cert reviewers actively dislike this because suspended-network mid-stream creates UI glitches. Render the full line at once after sanitization.
  7. Using the player's typed input as the LLM prompt. That is a chat window, not an NPC dialogue system. Player input goes through your existing response-option system; it is not free text to the LLM.
  8. Treating the fallback path as second-class. The reviewer image will exercise it. Author the fallback lines with the same care as the live prompt hints.

Pro tips

  1. Cache aggressively at the proxy. A 15-minute cache on (speaker_id, scene_id, emotion_tag, dialogue_history_hash) typically catches 30-50% of API calls. Direct savings.
  2. Profile fallback-path triggers per build. A spike in sanitizer_rejected_* reasons after a model upgrade means the new model's output style broke your sanitizer; recalibrate.
  3. Ship a development-mode toggle that forces the fallback path. QA exercises the fallback every session that way; bugs surface during normal play, not during cert.
  4. Localize the fallback lines. The LLM live path can localize on the fly (with risk); the fallback path is the authored ground truth and must be localized like any other dialogue.
  5. Pin one model version per ship. "Latest" is a moving target. Pin gpt-4o-mini-2024-07-18 (or whichever 2026 build), test against it, ship against it. Upgrade deliberately.
  6. Use the Ollama local path for a free-at-runtime tier. Ship the system with an in-game toggle ("Use offline AI dialogue: ON/OFF"); offline routes to Ollama if a suitably sized model file is present and to the canned fallback if not.
  7. Document the prompt template in prompts/<speaker_id>.md. Version-control it. The prompt is your dialogue-style API; treat it like code.
  8. Set temperature to 0.7 and top_p to 0.9 for indie-scale NPC dialogue. Lower temperature is more consistent but feels flat; higher is unpredictable. 0.7 / 0.9 is the indie sweet spot for 2026 mid-tier models (an example request body follows this list).
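
For that last tip, this is roughly the request body the proxy would forward for the merchant-greeting node, assuming an OpenAI-style chat-completions API - the system text is illustrative, and max_tokens of 60 matches the ~50-output-token line estimate:

{
  "model": "gpt-4o-mini-2024-07-18",
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 60,
  "messages": [
    { "role": "system", "content": "You are the merchant NPC. Reply with exactly one spoken line, under 220 characters, no markdown." },
    { "role": "user", "content": "merchant in village square, friendly, daytime, greeting customer entering shop" }
  ]
}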

Pre-cert-submission checklist

Before you submit to a partner cert with LLM-driven dialogue, verify every box:

  • [ ] Network suspended mid-dialogue produces a fallback line within one frame of next trigger.
  • [ ] 2-second hard timeout enforced on every live-path call.
  • [ ] Output sanitizer rejects: glyphs outside font range, lines over 220 chars, profanity from blocklist, markdown/JSON artifacts.
  • [ ] Per-player-hour token budget enforced; over-budget routes to fallback for rest of session.
  • [ ] llm_fallback_log.json written to Application.persistentDataPath / user:// with reason codes for every fallback trigger.
  • [ ] Every dialogue node has at least one authored fallback line (validator passes at build time).
  • [ ] API key not in the shipped binary (proxy in front, environment variable on developer machine).
  • [ ] Proxy enforces per-IP rate limit and per-key daily token cap.
  • [ ] Cached response window at proxy (15-min default) confirmed working in dev tools.
  • [ ] One model version pinned (no "latest" in production calls).
  • [ ] Development-mode toggle ships disabled for production but available for QA.

If all eleven boxes are checked, the cert reviewer's airplane-mode-toggle pass succeeds. If any box is unchecked, retest before submission.

Key takeaways

  • Mid-2026 LLM API price drops brought per-NPC dialogue into indie unit economics; partner-cert reviewers now formally require a fallback path after several spring-2026 indie rejections on this exact issue.
  • Two-path architecture: live LLM remote path for variety + deterministic local fallback tree for guarantees. Same dialogue state machine, same UI, same pacing - the player should not be able to tell which path served each line in normal play.
  • Five non-negotiable safety guarantees: deterministic offline operation within one frame, 2-second max latency to fallback, output sanitizer (glyphs / length / profanity / artifacts), per-player-hour token budget cap, logged replayable failure reasons.
  • Unity recipe: LLMDialogueDirector.cs MonoBehaviour + OutputSanitizer.cs static class + FallbackTree resource + proxy endpoint. 45-90 minutes.
  • Godot recipe: llm_dialogue_director.gd Node + output_sanitizer.gd RefCounted + FallbackTree Resource + proxy endpoint. 40-75 minutes.
  • Fallback tree authoring: every dialogue node gets 3 authored fallback lines + a short llm_prompt_hint field. Build-time validator rejects any node missing a fallback.
  • Token budget math: indie-budget 2026 models cost ~$0.00007-0.00018 per line; ~$0.007-0.018 per player-hour at 100 lines/hour. Three layers of caps required (client budget cap, proxy cache and caps, provider per-key daily cap).
  • Eight common beginner mistakes: API key in binary, no sanitizer, no budget cap, missing fallback lines, latency over 2s, token-streaming UI, treating player input as LLM prompt, treating fallback as second-class.
  • Pre-cert checklist: 11 boxes. Network suspended, hard timeout, sanitizer rejections, budget cap, fallback log, every node has fallback, no key in binary, proxy rate-limit, cache window, pinned model version, dev-mode toggle disabled in production.
  • Ship Ollama as a free-at-runtime tier alongside the cloud provider. Offline-by-default players get the local model; online players get the cloud variety; everyone gets the canned fallback when both fail.

FAQ

1. Do we have to use a server-side proxy for our first build? For a development build on your own machine, no - an environment variable is fine. For anything you give to a tester, alpha player, or partner-cert reviewer, yes. The proxy is the single most important production-readiness investment in this system.

2. Which 2026 provider has the best indie-tier pricing right now? The three majors (OpenAI, Anthropic, Google) cluster within 30-50% of each other on indie-budget tiers as of 2026. Pick the one whose dashboard you already know. Switch later if you must. Lock-in is small at this scale because the prompt template is portable.

3. Can we run pure Ollama with no cloud path at all? Yes, and that is a legitimate choice for offline-first or budget-zero builds. The fallback net is still required; an Ollama model can still fail to return within the latency cap on weak hardware. The architecture above works with Ollama as the "live" path and canned lines as the fallback.

4. What about voice / TTS for the LLM-generated lines? Voice synthesis (ElevenLabs Conversational AI, Cartesia Sonic, etc.) adds a second async layer on top of the dialogue layer and is out of scope for a beginner build. Stick to text-only for the first pass; add voice as a follow-up once the dialogue layer is cert-passing.

5. How do we measure whether the LLM-driven dialogue is actually improving the player experience? A/B with the fallback-only path on a small slice of your playtest cohort. Compare session length, NPC re-talk rate, and "this NPC felt repetitive" feedback. The LLM variety is worth shipping if and only if the metrics move; otherwise the simpler canned-line system is the win.

Related reading

Resource lists worth bookmarking: 12 Free AI Voice and Dialogue Tools for Indie Games 2026, Top 12 Free Narrative and Dialogue Tools for Indie Games 2026 Edition, 50 Free AI Tools for Game Developers (Updated January 2026).

Authoritative references: OpenAI API docs, Anthropic API docs, Google Gemini API docs, Ollama project, llama.cpp project.

If your team has shipped LLM-driven NPC dialogue through a partner-cert process in 2025-2026, the most useful thing you can do for the community is publish the exact sanitizer rejection counts you saw in QA and the prompt templates that produced the lowest rejection rates. The architecture above gets you to cert; the prompt-tuning craft gets you to a great player experience.