AI Integration Problems May 20, 2026

Ollama Local LLM Fallback Hangs at First Token in Unity 6 on Windows 11 - Process Redirect vs HTTP API Fix

Fix Ollama local LLM fallback hanging at the first token when Unity 6 starts ollama run via Process redirect on Windows 11. Use HTTP /api/chat streaming, health checks, and cached NPC responses.

By GamineAI Team

Problem: Your Unity 6 NPC dialogue path calls ollama run <model> through System.Diagnostics.Process with redirected stdout. The Editor freezes on the first line: StandardOutput.ReadLine() never returns, and no token reaches the UI.

Who is affected now: Indie teams wiring Ollama as the local fallback after mid-2026 cloud LLM price drops and partner-cert expectations for a deterministic offline path — especially on Windows 11 where tutorial Process redirect samples still dominate blog posts.

Fastest safe fix: Stop driving dialogue through the CLI. Call http://127.0.0.1:11434/api/chat with UnityWebRequest, parse NDJSON chunks on the main thread or a worker, and reserve Process only for a one-shot ollama list health check.

Direct answer

Ollama’s CLI is interactive and line-buffered; Unity’s synchronous ReadLine() on a redirected pipe waits for a full line that may never arrive during streaming generation. The supported integration surface is the HTTP API (/api/chat with "stream": true). Health-check the daemon before the first NPC prompt, stream tokens incrementally, and disable the live LLM branch when Ollama is down so the Editor never blocks on a hung child process.

Why this issue spikes in 2026

  1. LLM API price drops pushed more indies to add local fallback beside OpenAI/Anthropic paths.
  2. Partner and store reviews increasingly ask for a deterministic offline dialogue path when cloud APIs fail.
  3. Unity 6 samples still show Process.Start for “call external tools,” which copies poorly to long-running Ollama streams.
  4. Windows 11 stdout pipe behavior, combined with AppLocker on corporate laptops, causes silent child-process failures.

Pair this fix with ElevenLabs Conversational AI Unity SDK 502 Bad Gateway (cloud voice spike) and OpenAI API 429 Too Many Requests (cloud text throttle). Architecture context: Your First LLM NPC Dialogue System With a Hard Fallback Net (2026) and 15 Free LLM-Driven NPC Dialogue Local Fallback Net Resources.

Symptoms and phrases to match

  • First NPC line after cloud failure never appears; Editor spinner runs indefinitely.
  • ollama run works in PowerShell but hangs inside Play Mode.
  • Hang only on first call; later calls fail fast if you kill the Editor.
  • WaitForExit() before reading stdout — process never exits during run.
  • Works on macOS/Linux dev machine, fails on Windows 11 QA laptops.

Root causes (check in this order)

  1. Stdout line buffering: ReadLine() blocks until a newline; streaming tokens may not flush as full lines.
  2. WaitForExit() ordering: waiting for CLI exit while the model is still generating.
  3. Wrong API surface: ollama run is not the production integration; use HTTP /api/chat.
  4. Daemon not running: Unity starts a new ollama.exe per prompt; cold start + pipe stall.
  5. Windows AppLocker / Defender: blocks child ollama.exe without a visible Unity error.
  6. Main-thread blocking: synchronous Process I/O on the Unity player loop.
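
Root cause 1 can be reproduced without Ollama or Unity. The following is a minimal sketch in plain .NET, assuming an in-process anonymous pipe is a fair stand-in for a redirected child stdout; the class name ReadLineStallDemo and the 200 ms / 2000 ms waits are illustrative choices, not part of any Ollama or Unity API. It writes a partial line with no newline and shows that ReadLine() stays blocked until the newline arrives.

```csharp
using System.IO;
using System.IO.Pipes;
using System.Threading.Tasks;

public static class ReadLineStallDemo
{
    // Returns whether ReadLine() finished before any newline was written,
    // and the line it produced once the newline arrived.
    public static (bool completedEarly, string line) Run()
    {
        using var server = new AnonymousPipeServerStream(PipeDirection.Out);
        using var client = new AnonymousPipeClientStream(PipeDirection.In, server.ClientSafePipeHandle);
        using var writer = new StreamWriter(server) { AutoFlush = true };
        using var reader = new StreamReader(client);

        // Read on a worker so the calling thread can observe the stall.
        var readTask = Task.Run(() => reader.ReadLine());

        // Partial output, flushed to the pipe but with no newline yet.
        writer.Write("Hel");
        bool completedEarly = readTask.Wait(200);

        // The newline is what finally releases ReadLine().
        writer.Write("lo\n");
        string line = readTask.Wait(2000) ? readTask.Result : null;
        return (completedEarly, line);
    }
}
```

If the same read ran on Unity's main thread instead of a worker, the Editor would freeze for as long as the newline was missing, which is the hang described above.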

Fastest safe fix path

Step 1 — Prove Ollama is up (outside Unity)

In PowerShell:

ollama list
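curl.exe http://127.0.0.1:11434/api/tags

Both must succeed before you wire Unity. Call curl.exe explicitly: in Windows PowerShell 5.1, bare curl is an alias for Invoke-WebRequest and behaves differently. If the request fails, start the tray app or run ollama serve once, and document that step in your dev README.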

Step 2 — Replace Process dialogue with HTTP /api/chat

Request body (non-streaming smoke test first):

{
  "model": "llama3.1:8b",
  "messages": [{ "role": "user", "content": "Say hi in one sentence." }],
  "stream": false
}

Endpoint: POST http://127.0.0.1:11434/api/chat

Once the smoke test returns within 600 ms on a warm model (the first call after startup is slower while the model loads), switch to "stream": true and read the NDJSON response, one JSON object per line.

Step 3 — UnityWebRequest streaming wrapper (C#)

Use a coroutine or async with UnityWebRequest — do not block Update() on ReadLine():

using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public sealed class OllamaChatClient : MonoBehaviour
{
    const string ChatUrl = "http://127.0.0.1:11434/api/chat";

    public IEnumerator StreamNpcLine(string model, string userPrompt, System.Action<string> onDelta)
    {
        var payload = JsonUtility.ToJson(new ChatRequest
        {
            model = model,
            stream = true,
            messages = new[] { new ChatMessage { role = "user", content = userPrompt } }
        });

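        using var req = new UnityWebRequest(ChatUrl, "POST");
        req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(payload));
        req.downloadHandler = new DownloadHandlerBuffer();
        req.SetRequestHeader("Content-Type", "application/json");
        req.timeout = 30; // seconds; matches the per-line budget under Prevention

        // DownloadHandlerBuffer collects the whole NDJSON body, so onDelta only fires
        // after generation finishes. For true incremental display, swap in a custom
        // DownloadHandlerScript that splits on newlines as bytes arrive.
        var op = req.SendWebRequest();
        while (!op.isDone)
            yield return null;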

        if (req.result != UnityWebRequest.Result.Success)
        {
            Debug.LogError($"Ollama HTTP failed: {req.responseCode} {req.error}");
            yield break;
        }

        foreach (var line in req.downloadHandler.text.Split('\n'))
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            var chunk = JsonUtility.FromJson<ChatStreamChunk>(line);
            if (!string.IsNullOrEmpty(chunk.message?.content))
                onDelta?.Invoke(chunk.message.content);
            if (chunk.done) break;
        }
    }

    [System.Serializable] class ChatRequest
    {
        public string model;
        public bool stream;
        public ChatMessage[] messages;
    }

    [System.Serializable] class ChatMessage { public string role; public string content; }

    [System.Serializable] class ChatStreamChunk
    {
        public ChatMessage message;
        public bool done;
    }
}

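For Play Mode responsiveness, stream incrementally instead of buffering the whole reply: either use a DownloadHandlerScript that parses NDJSON as bytes arrive, or move the HTTP work to a worker with Task.Run and a thread-safe queue consumed in Update. UnityWebRequest has to be created and polled on the main thread, so the Task.Run route needs a different client such as System.Net.Http.HttpClient.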
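
The "parse NDJSON as bytes arrive" part can be kept independent of Unity. The following is a minimal sketch, not a definitive implementation; the class name NdjsonLineBuffer and its two methods are hypothetical helpers. It accumulates raw bytes and emits one string per completed line, so a chunk boundary in the middle of a JSON object, or in the middle of a multi-byte UTF-8 character, does not corrupt the output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: turns arbitrary byte chunks into complete NDJSON lines.
public sealed class NdjsonLineBuffer
{
    readonly List<byte> pending = new List<byte>();

    // Feed the next chunk of raw response bytes; onLine fires once per completed line.
    public void Append(byte[] data, int length, Action<string> onLine)
    {
        for (int i = 0; i < length; i++)
        {
            // The byte 0x0A never occurs inside a multi-byte UTF-8 sequence,
            // so splitting on it before decoding is safe.
            if (data[i] == (byte)'\n') Flush(onLine);
            else pending.Add(data[i]);
        }
    }

    // Call when the response ends to emit a final line that had no trailing newline.
    public void Flush(Action<string> onLine)
    {
        if (pending.Count == 0) return;
        string line = Encoding.UTF8.GetString(pending.ToArray()).TrimEnd('\r');
        pending.Clear();
        if (line.Length > 0) onLine(line);
    }
}
```

Inside a DownloadHandlerScript subclass, ReceiveData(byte[] data, int dataLength) can forward to Append(data, dataLength, onLine) and return true, and CompleteContent() can call Flush(onLine). Each emitted line can then go through JsonUtility.FromJson<ChatStreamChunk>(line) exactly as in the wrapper above.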

Step 4 — Health-check on startup (Process allowed here only)

One-shot check before enabling the LLM path:

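// Coroutine version: a busy-wait on isDone would freeze the main thread for up to
// the full timeout, which is the blocking pattern this article removes.
// Start with StartCoroutine(CheckOllamaReachable(ok => { /* enable or skip LLM */ })).
IEnumerator CheckOllamaReachable(System.Action<bool> onResult)
{
    using var req = UnityWebRequest.Get("http://127.0.0.1:11434/api/tags");
    req.timeout = 2; // seconds
    yield return req.SendWebRequest();
    onResult?.Invoke(req.result == UnityWebRequest.Result.Success);
}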

If false: skip Ollama, log once, route to Ink/Yarn/canned fallback immediately.
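
The "skip, log once, route to canned" rule can be kept in one small object so that every NPC asks the same gate. The following is a minimal sketch in plain C#, assuming the health check result is passed in from elsewhere; LocalLlmGate, DialogueSource, and the log message are hypothetical names chosen for illustration.

```csharp
using System;

public enum DialogueSource { LocalLlm, Canned }

// Hypothetical gate: remembers the latest health result and logs a failure only once.
public sealed class LocalLlmGate
{
    readonly Action<string> log;
    bool loggedFailure;

    public bool Enabled { get; private set; }

    public LocalLlmGate(Action<string> log)
    {
        this.log = log ?? (_ => { });
    }

    // Call with the result of each health check (startup, or after a failed request).
    public void ReportHealth(bool reachable)
    {
        Enabled = reachable;
        if (!reachable && !loggedFailure)
        {
            loggedFailure = true;
            log("Ollama unreachable; routing NPC dialogue to canned fallback.");
        }
    }

    // Dialogue code asks this before every line instead of probing Ollama itself.
    public DialogueSource Choose() => Enabled ? DialogueSource.LocalLlm : DialogueSource.Canned;
}
```

In Unity the constructor argument could be Debug.LogWarning, and the Canned branch would hand off to the Ink/Yarn/canned path mentioned above.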

Step 5 — Remove anti-patterns from old tutorials

Anti-pattern                          | Replace with
--------------------------------------|--------------------------------
ollama run + ReadLine() loop          | POST /api/chat + NDJSON
WaitForExit() before reading stdout   | HTTP request lifetime
New Process per NPC line              | Single daemon + HTTP keep-alive
Blocking main thread on I/O           | Coroutine / async + timeout

Verification checklist

  • [ ] curl http://127.0.0.1:11434/api/tags succeeds on the QA machine.
  • [ ] First streamed token arrives within 600 ms of SendWebRequest (localhost, warm model).
  • [ ] Editor stays interactive while tokens append (no multi-second freeze).
  • [ ] Killing ollama.exe disables the LLM branch and plays canned lines within one frame budget.
  • [ ] No hung ollama child processes left after exiting Play Mode (Task Manager clean).

Prevention

  • Health-check on boot; set llm_local_enabled = false when tags endpoint fails.
  • Cache last-good NPC response keyed by (npc_id, prompt_seed_hash) for the session.
  • Timeout every HTTP call (2 s connect, 30 s total per line) and fall back to deterministic dialogue.
  • Document “Ollama must be running before Play” in README — do not auto-spawn from Unity in production builds unless you ship the installer.
  • CI: optional headless smoke against /api/tags on build agents that run dialogue integration tests.
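
The session cache from the list above can be sketched as a dictionary with a tuple key. This is a minimal illustration, assuming the caller already computes an integer prompt seed hash; NpcResponseCache and its method names are hypothetical, and nothing here persists across sessions.

```csharp
using System.Collections.Generic;

// Hypothetical session cache: last good reply per (npc_id, prompt_seed_hash).
public sealed class NpcResponseCache
{
    readonly Dictionary<(string npcId, int promptSeedHash), string> lastGood =
        new Dictionary<(string npcId, int promptSeedHash), string>();

    // Store only non-empty replies so a failed generation never overwrites a good one.
    public void Store(string npcId, int promptSeedHash, string response)
    {
        if (string.IsNullOrWhiteSpace(response)) return;
        lastGood[(npcId, promptSeedHash)] = response;
    }

    // Returns true and the cached reply when this NPC has answered this prompt before.
    public bool TryGet(string npcId, int promptSeedHash, out string response)
    {
        return lastGood.TryGetValue((npcId, promptSeedHash), out response);
    }
}
```

On a timeout or a failed request, the dialogue code can try TryGet first and fall through to deterministic dialogue only when the cache misses.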

Troubleshooting table

Symptom                         | Likely cause                             | Fix
--------------------------------|------------------------------------------|-----------------------------------------------
Hang on first ReadLine()        | CLI stdout buffering                     | Switch to HTTP /api/chat
Immediate connection refused    | Daemon not running                       | ollama serve / tray app; health-check gate
Works in Editor, fails in build | Stripped UnityWebRequest or HTTP blocked | Allow localhost in firewall; test player build
Empty response, no error        | Wrong model tag                          | ollama pull llama3.1:8b; match model string
Child ollama.exe blocked        | AppLocker / AV                           | Whitelist path; use HTTP to existing service
Slow first token only           | Cold model load                          | Pre-warm with tiny prompt at scene load

Frequently asked questions

Q: Can I keep Process for anything?

A: Yes — one-shot health probes (ollama list exit code) are fine. Do not stream dialogue through stdout.

Q: Does this apply to Godot or Unreal?

A: Same root cause: use the HTTP API, not CLI redirect. Godot: HTTPRequest with stream buffer; Unreal: FHttpModule.

Q: Should Unity ship Ollama with the game?

A: Most PC indies document “install Ollama separately” for dev fallback. Shipping the runtime is a distribution/licensing choice — still use HTTP to 127.0.0.1, not Process pipes.

Q: First token still slow after HTTP fix?

A: Warm the model at scene start with a 1-token prompt; quantize to a smaller GGUF for festival laptops.
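
A warm-up request body can be sketched as follows, assuming the num_predict generation option is used to cap the reply at one token. The helper name OllamaWarmup, the "hi" prompt, and the hand-rolled escaping are illustrative choices; the model tag is whatever you already pass to /api/chat.

```csharp
// Hypothetical helper that builds a one-token, non-streaming /api/chat body.
public static class OllamaWarmup
{
    public static string BuildPayload(string model)
    {
        // Minimal escaping so an unusual model tag cannot break the JSON string.
        string safeModel = model.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "{\"model\":\"" + safeModel + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],"
             + "\"stream\":false,"
             + "\"options\":{\"num_predict\":1}}";
    }
}
```

POST the result to http://127.0.0.1:11434/api/chat from a coroutine at scene load and ignore the reply; the point is only to have the model resident before the first real NPC line.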

Related help articles and resources

Bookmark this page when your local fallback path freezes the Editor — the fix is almost always HTTP streaming, not a better Process redirect.