Ollama Local LLM Fallback Hangs at First Token in Unity 6 on Windows 11 - Process Redirect vs HTTP API Fix

Problem: Your Unity 6 NPC dialogue path calls ollama run <model> through System.Diagnostics.Process with redirected stdout. The Editor freezes on the first line: StandardOutput.ReadLine() never returns, and no token reaches the UI.

Who is affected now: Indie teams wiring Ollama as the local fallback after mid-2026 cloud LLM price drops and partner-cert expectations for a deterministic offline path — especially on Windows 11 where tutorial Process redirect samples still dominate blog posts.

Fastest safe fix: Stop driving dialogue through the CLI. Call http://127.0.0.1:11434/api/chat with UnityWebRequest, parse NDJSON chunks on the main thread or a worker, and reserve Process only for a one-shot ollama list health check.

Direct answer

Ollama’s CLI is interactive and line-buffered; Unity’s synchronous ReadLine() on a redirected pipe waits for a full line that may never arrive during streaming generation. The supported integration surface is the HTTP API (/api/chat with "stream": true). Health-check the daemon before the first NPC prompt, stream tokens incrementally, and disable the live LLM branch when Ollama is down so the Editor never blocks on a hung child process.

Why this issue spikes in 2026

LLM API price drops pushed more indies to add local fallback beside OpenAI/Anthropic paths.
Partner and store reviews increasingly ask for a deterministic offline dialogue path when cloud APIs fail.
Unity 6 samples still show Process.Start for “call external tools,” which copies poorly to long-running Ollama streams.
Windows 11 stdout pipe behavior plus AppLocker on corporate laptops add silent child-process failures.

Pair this fix with ElevenLabs Conversational AI Unity SDK 502 Bad Gateway (cloud voice spike) and OpenAI API 429 Too Many Requests (cloud text throttle). Architecture context: Your First LLM NPC Dialogue System With a Hard Fallback Net (2026) and 15 Free LLM-Driven NPC Dialogue Local Fallback Net Resources.

Symptoms and phrases to match

First NPC line after cloud failure never appears; Editor spinner runs indefinitely.
ollama run works in PowerShell but hangs inside Play Mode.
Hang only on first call; later calls fail fast if you kill the Editor.
WaitForExit() is called before reading stdout, so it blocks on a process that never exits while ollama run is still generating.
Works on macOS/Linux dev machine, fails on Windows 11 QA laptops.

Root causes (check in this order)

Stdout line buffering — ReadLine() blocks until a newline; streaming tokens may not flush as full lines.
WaitForExit() ordering — waiting for CLI exit while the model is still generating.
Wrong API surface — ollama run is not the production integration; use HTTP /api/chat.
Daemon not running — Unity starts a new ollama.exe per prompt; cold start + pipe stall.
Windows AppLocker / Defender — blocks child ollama.exe without a visible Unity error.
Main-thread blocking — synchronous Process I/O on the Unity player loop.

Fastest safe fix path

Step 1 — Prove Ollama is up (outside Unity)

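In PowerShell (call curl.exe explicitly; in Windows PowerShell 5.1 the bare curl command is an alias for Invoke-WebRequest):

ollama list
curl.exe http://127.0.0.1:11434/api/tags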

Both must succeed before you wire Unity. If curl fails, start the tray app or run ollama serve once and pin it in your dev README.

Step 2 — Replace `Process` dialogue with HTTP `/api/chat`

Request body (non-streaming smoke test first):

{
  "model": "llama3.1:8b",
  "messages": [{ "role": "user", "content": "Say hi in one sentence." }],
  "stream": false
}

Endpoint: POST http://127.0.0.1:11434/api/chat

Once the smoke test returns within 600 ms, switch to "stream": true and read the response as NDJSON (one JSON object per line, each terminated by a newline).
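Reading NDJSON incrementally mostly comes down to splitting the incoming bytes on newlines before decoding. The following is a minimal sketch of that idea in plain C#, with no Unity dependency; NdjsonLineBuffer and its members are hypothetical names, not part of Unity or Ollama.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not a Unity or Ollama API): accumulates raw bytes and
// emits one UTF-8 string per complete, newline-terminated line.
public sealed class NdjsonLineBuffer
{
    readonly List<byte> _pending = new List<byte>();

    // Append a chunk as it arrives; returns every line completed by this chunk.
    public List<string> Append(byte[] data, int length)
    {
        var lines = new List<string>();
        for (int i = 0; i < length; i++)
        {
            if (data[i] == (byte)'\n')
            {
                // Decode whole lines only, so a multi-byte UTF-8 character
                // split across two chunks is never decoded in halves.
                string line = Encoding.UTF8.GetString(_pending.ToArray()).TrimEnd('\r');
                _pending.Clear();
                if (line.Length > 0) lines.Add(line);
            }
            else
            {
                _pending.Add(data[i]);
            }
        }
        return lines;
    }

    // Call once at end of stream to recover a final line that had no trailing newline.
    public string Flush()
    {
        if (_pending.Count == 0) return null;
        string rest = Encoding.UTF8.GetString(_pending.ToArray());
        _pending.Clear();
        return rest;
    }
}
```

In Unity, one plausible wiring is a DownloadHandlerScript subclass whose ReceiveData(byte[] data, int dataLength) override forwards to Append and parses each returned line as one JSON object.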

Step 3 — UnityWebRequest streaming wrapper (C#)

Use a coroutine or async with UnityWebRequest — do not block Update() on ReadLine():

using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public sealed class OllamaChatClient : MonoBehaviour
{
    const string ChatUrl = "http://127.0.0.1:11434/api/chat";

    public IEnumerator StreamNpcLine(string model, string userPrompt, System.Action<string> onDelta)
    {
        var payload = JsonUtility.ToJson(new ChatRequest
        {
            model = model,
            stream = true,
            messages = new[] { new ChatMessage { role = "user", content = userPrompt } }
        });

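        using var req = new UnityWebRequest(ChatUrl, "POST");
        req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(payload));
        // DownloadHandlerBuffer exposes the body only after the request completes,
        // so this version delivers all chunks together at the end of generation.
        req.downloadHandler = new DownloadHandlerBuffer();
        req.SetRequestHeader("Content-Type", "application/json");
        req.timeout = 30; // seconds; total budget per NPC line

        var op = req.SendWebRequest();
        while (!op.isDone)
        {
            // Yield every frame so the Editor stays responsive while Ollama generates.
            // For token-by-token delivery, swap in a DownloadHandlerScript that splits the buffer by newlines.
            yield return null;
        }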

        if (req.result != UnityWebRequest.Result.Success)
        {
            Debug.LogError($"Ollama HTTP failed: {req.responseCode} {req.error}");
            yield break;
        }

        foreach (var line in req.downloadHandler.text.Split('\n'))
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            var chunk = JsonUtility.FromJson<ChatStreamChunk>(line);
            if (!string.IsNullOrEmpty(chunk.message?.content))
                onDelta?.Invoke(chunk.message.content);
            if (chunk.done) break;
        }
    }

    [System.Serializable] class ChatRequest
    {
        public string model;
        public bool stream;
        public ChatMessage[] messages;
    }

    [System.Serializable] class ChatMessage { public string role; public string content; }

    [System.Serializable] class ChatStreamChunk
    {
        public ChatMessage message;
        public bool done;
    }
}

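For Play Mode responsiveness, either use a DownloadHandlerScript that parses NDJSON as bytes arrive, or move the HTTP work off the main thread with Task.Run and a thread-safe queue consumed in Update. UnityWebRequest must be created and polled on the main thread, so the Task.Run route needs a different client such as System.Net.Http.HttpClient.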
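The off-main-thread route can be sketched as follows. This is an illustration under assumptions, not the article's tested implementation: it swaps UnityWebRequest for System.Net.Http.HttpClient, and OllamaStreamWorker, RunAsync, and PumpLinesAsync are hypothetical names.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical worker: streams /api/chat off the main thread and queues raw NDJSON lines.
public sealed class OllamaStreamWorker
{
    // One shared client; the caller's CancellationToken bounds the request instead of HttpClient.Timeout.
    static readonly HttpClient Http = new HttpClient { Timeout = Timeout.InfiniteTimeSpan };

    // Drained on the main thread, for example in Update() via TryDequeue.
    public readonly ConcurrentQueue<string> Lines = new ConcurrentQueue<string>();

    public async Task RunAsync(string url, string jsonBody, CancellationToken ct)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };
        // ResponseHeadersRead returns once headers arrive, so the body can be read incrementally.
        using var response = await Http
            .SendAsync(request, HttpCompletionOption.ResponseHeadersRead, ct)
            .ConfigureAwait(false);
        response.EnsureSuccessStatusCode();
        using var body = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
        await PumpLinesAsync(body, Lines, ct).ConfigureAwait(false);
    }

    // Reads newline-delimited lines from any stream and enqueues the non-empty ones.
    // Cancellation is only checked between lines; aborting a stalled read is not shown here.
    public static async Task PumpLinesAsync(Stream body, ConcurrentQueue<string> sink, CancellationToken ct)
    {
        using var reader = new StreamReader(body, Encoding.UTF8);
        string line;
        while ((line = await reader.ReadLineAsync().ConfigureAwait(false)) != null)
        {
            ct.ThrowIfCancellationRequested();
            if (line.Length > 0) sink.Enqueue(line);
        }
    }
}
```

A caller would start it with Task.Run(() => worker.RunAsync(url, body, cts.Token)) and drain worker.Lines in Update, parsing each line there so that all UI work stays on the main thread.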

Step 4 — Health-check on startup (Process allowed here only)

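One-shot check before enabling the LLM path. The HTTP probe is the simplest option; a one-shot ollama list through Process is also acceptable here because it exits immediately. Run the probe as a coroutine so it never spins the main thread:

IEnumerator CheckOllamaReachable(System.Action<bool> onResult)
{
    using var req = UnityWebRequest.Get("http://127.0.0.1:11434/api/tags");
    req.timeout = 2; // seconds
    // Yield instead of busy-waiting on isDone, which would freeze the frame for up to the full timeout.
    yield return req.SendWebRequest();
    onResult?.Invoke(req.result == UnityWebRequest.Result.Success);
}

If the result is false: skip Ollama, log once, route to Ink/Yarn/canned fallback immediately.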
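The "log once, then route to fallback" rule can be kept in one small object. A minimal sketch, assuming the health check hands back a boolean; LocalLlmGate and DialogueRoute are hypothetical names introduced only for illustration.

```csharp
using System;

public enum DialogueRoute { LocalLlm, Canned }

// Hypothetical gate: remembers the latest health-check result, logs the first
// failure only, and tells callers which dialogue source to use.
public sealed class LocalLlmGate
{
    readonly Action<string> _log;
    bool _loggedFailure;

    public bool Enabled { get; private set; }

    public LocalLlmGate(Action<string> log) { _log = log; }

    // Feed this the boolean produced by the startup health check.
    public void ReportHealth(bool reachable)
    {
        Enabled = reachable;
        if (!reachable && !_loggedFailure)
        {
            _loggedFailure = true;
            _log?.Invoke("Ollama unreachable; using canned dialogue.");
        }
    }

    // Decided from cached state, so the choice costs nothing per NPC line.
    public DialogueRoute Route()
    {
        return Enabled ? DialogueRoute.LocalLlm : DialogueRoute.Canned;
    }
}
```

In Unity the logger could be passed as msg => Debug.LogWarning(msg), with ReportHealth called from the health-check callback.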

Step 5 — Remove anti-patterns from old tutorials

Anti-pattern	Replace with
`ollama run` + `ReadLine()` loop	`POST /api/chat` + NDJSON
`WaitForExit()` before reading stdout	HTTP request lifetime
New `Process` per NPC line	Single daemon + HTTP keep-alive
Blocking main thread on I/O	Coroutine / async + timeout

Verification checklist

[ ] curl http://127.0.0.1:11434/api/tags succeeds on the QA machine.
[ ] First streamed token arrives within 600 ms of SendWebRequest (localhost, warm model).
[ ] Editor stays interactive while tokens append (no multi-second freeze).
[ ] Killing ollama.exe disables the LLM branch and plays canned lines within one frame budget.
[ ] No hung ollama child processes left after exiting Play Mode (Task Manager clean).
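The first-token check in the list above is easier to verify with a number than by eye. A small sketch of a timer that records the delay to the first delta only; FirstTokenTimer is a hypothetical helper, and the 600 ms budget is the figure from the checklist.

```csharp
using System.Diagnostics;

// Hypothetical helper: measures the delay between sending a request and the first token.
public sealed class FirstTokenTimer
{
    readonly Stopwatch _clock = new Stopwatch();

    // Null until the first token of the current request has arrived.
    public long? FirstTokenMs { get; private set; }

    // Call immediately before sending the request.
    public void Start()
    {
        FirstTokenMs = null;
        _clock.Restart();
    }

    // Call from the delta callback; only the first call per request is recorded.
    public void OnToken()
    {
        if (FirstTokenMs == null) FirstTokenMs = _clock.ElapsedMilliseconds;
    }

    public bool WithinBudget(long budgetMs)
    {
        return FirstTokenMs.HasValue && FirstTokenMs.Value <= budgetMs;
    }
}
```

Call Start() just before the request goes out, OnToken() inside the onDelta callback, and check WithinBudget(600) afterwards. In a Unity script that also imports UnityEngine, qualify System.Diagnostics.Stopwatch explicitly to avoid the Debug name clash.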

Prevention

Health-check on boot; set llm_local_enabled = false when tags endpoint fails.
Cache last-good NPC response keyed by (npc_id, prompt_seed_hash) for the session.
Timeout every HTTP call (2 s for the health probe, 30 s total per line; UnityWebRequest.timeout is a single whole-request value in seconds) and fall back to deterministic dialogue.
Document “Ollama must be running before Play” in README — do not auto-spawn from Unity in production builds unless you ship the installer.
CI: optional headless smoke against /api/tags on build agents that run dialogue integration tests.
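The last-good response cache from the list above could look like the following. This is a sketch under assumptions: NpcLineCache is a hypothetical name, the key mirrors (npc_id, prompt_seed_hash), and how the hash is computed is left to the caller because the article does not specify it.

```csharp
using System.Collections.Generic;

// Hypothetical session-scoped cache of the last good line per (npcId, promptSeedHash).
public sealed class NpcLineCache
{
    readonly Dictionary<(string, int), string> _lines = new Dictionary<(string, int), string>();

    // Store only non-empty lines so a failed generation never overwrites a good one.
    public void Store(string npcId, int promptSeedHash, string line)
    {
        if (string.IsNullOrEmpty(line)) return;
        _lines[(npcId, promptSeedHash)] = line;
    }

    // Returns true and the cached line when this NPC already answered this prompt seed.
    public bool TryGet(string npcId, int promptSeedHash, out string line)
    {
        return _lines.TryGetValue((npcId, promptSeedHash), out line);
    }
}
```

On a timeout or failed request, a caller can try TryGet before dropping to canned dialogue. The cache lives in memory only, which matches the per-session scope described above.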

Troubleshooting table

Symptom	Likely cause	Fix
Hang on first `ReadLine()`	CLI stdout buffering	Switch to HTTP `/api/chat`
Immediate `connection refused`	Daemon not running	`ollama serve` / tray app; health-check gate
Works in Editor, fails in build	Stripped `UnityWebRequest` or HTTP blocked	Allow localhost in firewall; check the Player Settings "Allow downloads over HTTP" option; test player build
Empty response, no error	Wrong model tag	`ollama pull llama3.1:8b`; match `model` string
Child `ollama.exe` blocked	AppLocker / AV	Whitelist path; use HTTP to existing service
Slow first token only	Cold model load	Pre-warm with tiny prompt at scene load

Frequently asked questions

Q: Can I keep Process for anything?

A: Yes — one-shot health probes (ollama list exit code) are fine. Do not stream dialogue through stdout.
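A one-shot probe stays safe when it never redirects stdout and always has a deadline. The sketch below uses only System.Diagnostics; OllamaCliProbe and ExitsCleanly are hypothetical names, and whether ollama list returns a non-zero exit code when the daemon is down is assumed from the answer above, not verified here.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical one-shot probe: runs a short-lived command and reports whether it exited with code 0.
public static class OllamaCliProbe
{
    public static bool ExitsCleanly(string fileName, string arguments, int timeoutMs)
    {
        var info = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
            // stdout and stderr are deliberately not redirected: there is no pipe to read or block on.
        };
        try
        {
            using var process = Process.Start(info);
            if (process == null) return false;
            if (!process.WaitForExit(timeoutMs))
            {
                // A probe that overruns its deadline counts as a failure; clean up the child.
                try { process.Kill(); } catch (InvalidOperationException) { }
                return false;
            }
            return process.ExitCode == 0;
        }
        catch (Win32Exception)
        {
            // The executable is missing or was blocked from launching.
            return false;
        }
    }
}
```

A call such as OllamaCliProbe.ExitsCleanly("ollama", "list", 2000) blocks the calling thread for up to the timeout, so it belongs in startup code or on a worker thread, never in the per-frame dialogue path.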

Q: Does this apply to Godot or Unreal?

A: Same root cause: use the HTTP API, not CLI redirect. Godot: HTTPClient, polling the response body in chunks (the HTTPRequest node buffers the whole response); Unreal: FHttpModule.

Q: Should Unity ship Ollama with the game?

A: Most PC indies document “install Ollama separately” for dev fallback. Shipping the runtime is a distribution/licensing choice — still use HTTP to 127.0.0.1, not Process pipes.

Q: First token still slow after HTTP fix?

A: Warm the model at scene start with a 1-token prompt; quantize to a smaller GGUF for festival laptops.
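A warm-up request can reuse the /api/chat endpoint with a very small generation budget. The sketch below only builds the JSON body; OllamaWarmup and BuildBody are hypothetical names, and it assumes Ollama's num_predict option is used to cap the reply at one token.

```csharp
// Hypothetical helper: builds a minimal non-streaming chat body for warming a model.
public static class OllamaWarmup
{
    public static string BuildBody(string model)
    {
        // Minimal escaping for the model tag; control characters are not handled.
        string safeModel = model.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "{\"model\":\"" + safeModel + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],"
             + "\"stream\":false,"
             + "\"options\":{\"num_predict\":1}}";
    }
}
```

POST the result to http://127.0.0.1:11434/api/chat at scene load and discard the reply; the goal is only to have the model resident before the first real NPC prompt. The body is built by hand to keep the sketch free of Unity dependencies.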