Ollama Local LLM Fallback Hangs at First Token in Unity 6 on Windows 11 - Process Redirect vs HTTP API Fix
Problem: Your Unity 6 NPC dialogue path calls ollama run <model> through System.Diagnostics.Process with redirected stdout. The Editor freezes on the first line: StandardOutput.ReadLine() never returns, and no token reaches the UI.
Who is affected now: Indie teams wiring Ollama as the local fallback after mid-2026 cloud LLM price drops and partner-cert expectations for a deterministic offline path — especially on Windows 11 where tutorial Process redirect samples still dominate blog posts.
Fastest safe fix: Stop driving dialogue through the CLI. Call http://127.0.0.1:11434/api/chat with UnityWebRequest, parse NDJSON chunks on the main thread or a worker, and reserve Process only for a one-shot ollama list health check.
Direct answer
Ollama’s CLI is interactive and line-buffered; Unity’s synchronous ReadLine() on a redirected pipe waits for a full line that may never arrive during streaming generation. The supported integration surface is the HTTP API (/api/chat with "stream": true). Health-check the daemon before the first NPC prompt, stream tokens incrementally, and disable the live LLM branch when Ollama is down so the Editor never blocks on a hung child process.
Why this issue spikes in 2026
- LLM API price drops pushed more indies to add local fallback beside OpenAI/Anthropic paths.
- Partner and store reviews increasingly ask for a deterministic offline dialogue path when cloud APIs fail.
- Unity 6 samples still show `Process.Start` for “call external tools,” which copies poorly to long-running Ollama streams.
- Windows 11 stdout pipe behavior + AppLocker on corporate laptops adds silent child-process failures.
Pair this fix with ElevenLabs Conversational AI Unity SDK 502 Bad Gateway (cloud voice spike) and OpenAI API 429 Too Many Requests (cloud text throttle). Architecture context: Your First LLM NPC Dialogue System With a Hard Fallback Net (2026) and 15 Free LLM-Driven NPC Dialogue Local Fallback Net Resources.
Symptoms and phrases to match
- First NPC line after cloud failure never appears; Editor spinner runs indefinitely.
- `ollama run` works in PowerShell but hangs inside Play Mode.
- Hang only on first call; later calls fail fast if you kill the Editor.
- `WaitForExit()` before reading stdout: the process never exits during `run`.
- Works on macOS/Linux dev machine, fails on Windows 11 QA laptops.
Root causes (check in this order)
- Stdout line buffering: `ReadLine()` blocks until a newline; streaming tokens may not flush as full lines.
- `WaitForExit()` ordering: waiting for CLI exit while the model is still generating.
- Wrong API surface: `ollama run` is not the production integration; use HTTP `/api/chat`.
- Daemon not running: Unity starts a new `ollama.exe` per prompt; cold start + pipe stall.
- Windows AppLocker / Defender: blocks child `ollama.exe` without a visible Unity error.
- Main-thread blocking: synchronous `Process` I/O on the Unity player loop.
Fastest safe fix path
Step 1 — Prove Ollama is up (outside Unity)
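In PowerShell (call curl.exe explicitly, because plain curl is an alias for Invoke-WebRequest in Windows PowerShell 5.1):
ollama list
curl.exe http://127.0.0.1:11434/api/tags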
Both must succeed before you wire Unity. If curl fails, start the tray app or run ollama serve once and pin it in your dev README.
Step 2 — Replace Process dialogue with HTTP /api/chat
Request body (non-streaming smoke test first):
{
  "model": "llama3.1:8b",
  "messages": [{ "role": "user", "content": "Say hi in one sentence." }],
  "stream": false
}
Endpoint: POST http://127.0.0.1:11434/api/chat
When smoke test returns within 600 ms, switch to "stream": true and read NDJSON lines (one JSON object per line).
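A streamed response arrives in arbitrary byte chunks, so a single JSON line can be split across two reads. The following is a minimal sketch of a buffer that hands back only complete lines. `NdjsonLineBuffer` is a hypothetical helper name, not part of Ollama or Unity.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: accumulates raw bytes and returns only complete NDJSON lines.
public sealed class NdjsonLineBuffer
{
    // A stateful decoder copes with a multi-byte UTF-8 character split across chunks.
    readonly Decoder decoder = Encoding.UTF8.GetDecoder();
    readonly StringBuilder pending = new StringBuilder();

    public List<string> Append(byte[] data, int length)
    {
        var chars = new char[Encoding.UTF8.GetMaxCharCount(length)];
        int decoded = decoder.GetChars(data, 0, length, chars, 0);
        pending.Append(chars, 0, decoded);

        var lines = new List<string>();
        int start = 0;
        for (int i = 0; i < pending.Length; i++)
        {
            if (pending[i] != '\n') continue;
            string line = pending.ToString(start, i - start).Trim(); // Trim also drops a trailing '\r'
            if (line.Length > 0) lines.Add(line);
            start = i + 1;
        }
        pending.Remove(0, start); // keep the unfinished tail for the next chunk
        return lines;
    }
}
```

The same `Append` call could sit inside a `DownloadHandlerScript.ReceiveData(byte[] data, int dataLength)` override, with each returned line passed to `JsonUtility.FromJson`.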
Step 3 — UnityWebRequest streaming wrapper (C#)
Use a coroutine or async with UnityWebRequest — do not block Update() on ReadLine():
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public sealed class OllamaChatClient : MonoBehaviour
{
    const string ChatUrl = "http://127.0.0.1:11434/api/chat";

    // Sends one prompt and reports each content delta through onDelta.
    public IEnumerator StreamNpcLine(string model, string userPrompt, System.Action<string> onDelta)
    {
        var payload = JsonUtility.ToJson(new ChatRequest
        {
            model = model,
            stream = true,
            messages = new[] { new ChatMessage { role = "user", content = userPrompt } }
        });

        using var req = new UnityWebRequest(ChatUrl, "POST");
        req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(payload));
        req.downloadHandler = new DownloadHandlerBuffer();
        req.SetRequestHeader("Content-Type", "application/json");
        req.timeout = 30; // seconds, so a stalled daemon cannot hold the coroutine forever

        var op = req.SendWebRequest();
        while (!op.isDone)
        {
            // For production: use a custom DownloadHandlerScript or split buffer by newlines.
            yield return null;
        }

        if (req.result != UnityWebRequest.Result.Success)
        {
            Debug.LogError($"Ollama HTTP failed: {req.responseCode} {req.error}");
            yield break;
        }

        // The body is read only after the request completes, so the chunks are
        // replayed here at the end rather than as they arrive.
        foreach (var line in req.downloadHandler.text.Split('\n'))
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            var chunk = JsonUtility.FromJson<ChatStreamChunk>(line);
            if (!string.IsNullOrEmpty(chunk.message?.content))
                onDelta?.Invoke(chunk.message.content);
            if (chunk.done) break;
        }
    }

    [System.Serializable] class ChatRequest
    {
        public string model;
        public bool stream;
        public ChatMessage[] messages;
    }

    [System.Serializable] class ChatMessage { public string role; public string content; }

    [System.Serializable] class ChatStreamChunk
    {
        public ChatMessage message;
        public bool done;
    }
}
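For Play Mode responsiveness, deliver tokens as they arrive: either use a DownloadHandlerScript that parses NDJSON as bytes arrive, or move the HTTP work to a background task (Task.Run with HttpClient, since UnityWebRequest must be used from the main thread) and drain a thread-safe queue in Update.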
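One possible shape for the background-task option is sketched below, assuming `HttpClient` is available in your Unity API compatibility level. `OllamaStreamWorker` and its members are hypothetical names. The worker only enqueues raw NDJSON lines and leaves parsing and UI updates to the main thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical worker: reads the streamed response off the main thread.
public sealed class OllamaStreamWorker
{
    static readonly HttpClient Http = new HttpClient();
    readonly ConcurrentQueue<string> lines = new ConcurrentQueue<string>();

    public string Error;           // set before Finished when the request fails
    public volatile bool Finished; // true once the stream ends, fails, or is cancelled

    public Task Start(string jsonBody, CancellationToken ct) => Task.Run(async () =>
    {
        try
        {
            using var req = new HttpRequestMessage(HttpMethod.Post, "http://127.0.0.1:11434/api/chat")
            {
                Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
            };
            // ResponseHeadersRead returns as soon as headers arrive, so the body can be streamed.
            using var resp = await Http.SendAsync(req, HttpCompletionOption.ResponseHeadersRead, ct);
            resp.EnsureSuccessStatusCode();
            using var reader = new StreamReader(await resp.Content.ReadAsStreamAsync());
            string line;
            while ((line = await reader.ReadLineAsync()) != null)
            {
                ct.ThrowIfCancellationRequested();
                if (line.Length > 0) lines.Enqueue(line);
            }
        }
        catch (Exception e) { Error = e.Message; }
        finally { Finished = true; }
    });

    // Call from Update: drains at most one line per call without blocking.
    public bool TryDequeue(out string line) => lines.TryDequeue(out line);
}
```

In `Update`, a loop such as `while (worker.TryDequeue(out var line)) { ... }` can parse each line with `JsonUtility.FromJson` and append the delta to the dialogue UI. Cancel the token when leaving Play Mode so the task does not outlive the scene.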
Step 4 — Health-check on startup (Process allowed here only)
One-shot check before enabling the LLM path:
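// Non-blocking probe: yields until the request finishes instead of spinning on isDone.
IEnumerator CheckOllamaReachable(System.Action<bool> onResult)
{
    using var req = UnityWebRequest.Get("http://127.0.0.1:11434/api/tags");
    req.timeout = 2; // seconds
    yield return req.SendWebRequest();
    onResult?.Invoke(req.result == UnityWebRequest.Result.Success);
}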
If false: skip Ollama, log once, route to Ink/Yarn/canned fallback immediately.
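The "skip, log once, route to fallback" rule can be kept in one small object so every NPC consults the same flag. This is a sketch only; `LocalLlmGate` and its members are hypothetical names.

```csharp
using System;

// Hypothetical gate between the live LLM path and deterministic dialogue.
public sealed class LocalLlmGate
{
    readonly Action<string> log;
    bool warned;

    public bool LlmLocalEnabled { get; private set; }

    public LocalLlmGate(Action<string> log) { this.log = log; }

    // Call with the result of the /api/tags probe.
    public void ReportHealth(bool reachable)
    {
        LlmLocalEnabled = reachable;
        if (reachable || warned) return;
        warned = true; // log only the first failure to avoid console spam
        log("Ollama unreachable; routing to canned dialogue.");
    }
}
```

In Unity the gate could be built with `new LocalLlmGate(msg => Debug.LogWarning(msg))`, and dialogue code would check `LlmLocalEnabled` before starting a request.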
Step 5 — Remove anti-patterns from old tutorials
| Anti-pattern | Replace with |
|---|---|
| `ollama run` + `ReadLine()` loop | `POST /api/chat` + NDJSON |
| `WaitForExit()` before reading stdout | HTTP request lifetime |
| New `Process` per NPC line | Single daemon + HTTP keep-alive |
| Blocking main thread on I/O | Coroutine / async + timeout |
Verification checklist
- [ ] `curl http://127.0.0.1:11434/api/tags` succeeds on the QA machine.
- [ ] First streamed token arrives within 600 ms of `SendWebRequest` (same LAN, warm model).
- [ ] Editor stays interactive while tokens append (no multi-second freeze).
- [ ] Killing `ollama.exe` disables the LLM branch and plays canned lines within one frame budget.
- [ ] No hung `ollama` child processes left after exiting Play Mode (Task Manager clean).
Prevention
- Health-check on boot; set `llm_local_enabled = false` when tags endpoint fails.
- Cache last-good NPC response keyed by `(npc_id, prompt_seed_hash)` for the session.
- Timeout every HTTP call (2 s connect, 30 s total per line) and fall back to deterministic dialogue.
- Document “Ollama must be running before Play” in `README`; do not auto-spawn from Unity in production builds unless you ship the installer.
- CI: optional headless smoke against `/api/tags` on build agents that run dialogue integration tests.
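The session cache from the list above could look roughly like this. It is a sketch under assumptions: `NpcLineCache` is a hypothetical name, and the prompt seed hash is taken to be an `int` computed elsewhere.

```csharp
using System.Collections.Generic;

// Hypothetical session cache of the last good reply per (npc_id, prompt_seed_hash).
public sealed class NpcLineCache
{
    readonly Dictionary<(string npcId, int promptSeedHash), string> lastGood =
        new Dictionary<(string npcId, int promptSeedHash), string>();

    // Store only non-empty replies so a failed generation never overwrites a good one.
    public void Store(string npcId, int promptSeedHash, string line)
    {
        if (!string.IsNullOrWhiteSpace(line))
            lastGood[(npcId, promptSeedHash)] = line;
    }

    // Returns the cached reply, or the deterministic canned line when nothing is cached.
    public string GetOrFallback(string npcId, int promptSeedHash, string cannedLine)
    {
        return lastGood.TryGetValue((npcId, promptSeedHash), out var line) ? line : cannedLine;
    }
}
```

When a request times out, calling `GetOrFallback` keeps the NPC talking without waiting on the daemon again.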
Troubleshooting table
| Symptom | Likely cause | Fix |
|---|---|---|
| Hang on first `ReadLine()` | CLI stdout buffering | Switch to HTTP `/api/chat` |
| Immediate `connection refused` | Daemon not running | `ollama serve` / tray app; health-check gate |
| Works in Editor, fails in build | Stripped `UnityWebRequest` or HTTP blocked | Allow localhost in firewall; test player build |
| Empty response, no error | Wrong model tag | `ollama pull llama3.1:8b`; match model string |
| Child `ollama.exe` blocked | AppLocker / AV | Whitelist path; use HTTP to existing service |
| Slow first token only | Cold model load | Pre-warm with tiny prompt at scene load |
Frequently asked questions
Q: Can I keep Process for anything?
A: Yes — one-shot health probes (ollama list exit code) are fine. Do not stream dialogue through stdout.
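A one-shot probe of that kind might be written as below. This is a sketch, with `OllamaCliProbe` as a hypothetical name. It never redirects stdout, so there is no pipe to stall on, and it bounds the wait with a timeout. It assumes `ollama list` reports failure through a non-zero exit code.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical one-shot probe: checks only the exit code, never reads stdout.
public static class OllamaCliProbe
{
    // True only if the executable starts, exits within timeoutMs, and returns exit code 0.
    public static bool ExitsCleanly(string fileName, string arguments, int timeoutMs)
    {
        try
        {
            var info = new ProcessStartInfo(fileName, arguments)
            {
                UseShellExecute = false,
                CreateNoWindow = true
            };
            using var process = Process.Start(info);
            if (process == null) return false;
            if (!process.WaitForExit(timeoutMs))
            {
                // Still running after the timeout: stop it so no child process is left behind.
                try { process.Kill(); } catch (InvalidOperationException) { }
                return false;
            }
            return process.ExitCode == 0;
        }
        catch (Win32Exception)
        {
            return false; // executable not found, or launch blocked by policy
        }
    }
}
```

A call such as `OllamaCliProbe.ExitsCleanly("ollama", "list", 2000)` blocks for up to two seconds, so it is best run once at startup or from a background task rather than during gameplay.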
Q: Does this apply to Godot or Unreal?
A: Same root cause: use the HTTP API, not CLI redirect. Godot: HTTPRequest with stream buffer; Unreal: FHttpModule.
Q: Should Unity ship Ollama with the game?
A: Most PC indies document “install Ollama separately” for dev fallback. Shipping the runtime is a distribution/licensing choice — still use HTTP to 127.0.0.1, not Process pipes.
Q: First token still slow after HTTP fix?
A: Warm the model at scene start with a 1-token prompt; quantize to a smaller GGUF for festival laptops.
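A pre-warm coroutine along those lines is sketched here. The `OllamaPrewarm` component is hypothetical, and the `num_predict` option is assumed to be the Ollama setting that caps generated tokens. Verify both against the Ollama API documentation for your version.

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical component: sends one tiny request at scene start so the model is loaded early.
public sealed class OllamaPrewarm : MonoBehaviour
{
    [SerializeField] string model = "llama3.1:8b";

    IEnumerator Start()
    {
        var body = JsonUtility.ToJson(new WarmRequest
        {
            model = model,
            stream = false,
            messages = new[] { new Msg { role = "user", content = "hi" } },
            options = new Options { num_predict = 1 } // assumption: limits the reply to one token
        });

        using var req = new UnityWebRequest("http://127.0.0.1:11434/api/chat", "POST");
        req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
        req.downloadHandler = new DownloadHandlerBuffer();
        req.SetRequestHeader("Content-Type", "application/json");
        req.timeout = 30; // seconds; a cold model load can take a while

        yield return req.SendWebRequest();

        if (req.result != UnityWebRequest.Result.Success)
            Debug.LogWarning($"Ollama pre-warm failed: {req.responseCode} {req.error}");
    }

    [System.Serializable] class WarmRequest
    {
        public string model;
        public bool stream;
        public Msg[] messages;
        public Options options;
    }

    [System.Serializable] class Msg { public string role; public string content; }
    [System.Serializable] class Options { public int num_predict; }
}
```

The reply is discarded. The only goal is to pay the model load cost before the first NPC line is requested.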
Related help articles and resources
- ElevenLabs Conversational AI Unity SDK 502 Bad Gateway — cloud voice fallback sibling.
- OpenAI API 429 Too Many Requests — cloud text throttle before local fallback.
- Anthropic Messages API 400 Tools JSON Schema — tool-schema failures on hybrid stacks.
- 15 Free LLM-Driven NPC Dialogue Local Fallback Net Resources (2026) — Ollama, llama.cpp, LangChain anchors.
- Your First LLM NPC Dialogue System With a Hard Fallback Net (2026) — two-path architecture.
- Official: Ollama API documentation
Bookmark this page when your local fallback path freezes the Editor — the fix is almost always HTTP streaming, not a better Process redirect.