Local Whisper CUDA Silent CPU Fallback on Windows 11 Playtest VOD Batch - How to Fix
Problem: Your overnight playtest VOD batch uses local Whisper. ffmpeg extracts audio fine. python -c "import torch; print(torch.cuda.is_available())" prints True, but a 2-hour clip still takes all night—Task Manager shows Python on CPU, not your NVIDIA GPU.
Who is affected now: Indies adopting the May 2026 local Whisper playtest triage pipeline on Windows 11 laptops. The dominant failure is silent CPU fallback: CUDA looks available, but Whisper never binds the GPU.
Fastest safe fix: Run device probes in the same venv as whisper, reinstall the matching torch CUDA wheel, pass device="cuda" explicitly (or use faster-whisper with device="cuda"), log device_used on the first segment in playtest_vod_triage_receipt_v1.json, and fall back to base on CPU with an explicit receipt flag—not an accidental overnight stall.
Direct answer
torch.cuda.is_available() only proves that the torch build in the interpreter you probed can see a CUDA device; it does not prove that the interpreter running your batch loaded Whisper weights on the GPU. Wrong venv, CPU-only torch, hybrid graphics routing, empty CUDA_VISIBLE_DEVICES, or a copied device="cpu" snippet all produce batches that run roughly 10× slower with no obvious error line. Fix the PyTorch + CUDA pairing first; then force device logging so receipts prove which path ran.
Why this issue spikes in 2026
- Overnight playtest VOD triage replaced “watch forty Discord clips” workflows.
- Teams install CPU torch by default (pip install torch) then wonder why CUDA is “broken.”
- Laptop hybrid graphics send python.exe to the iGPU unless the NVIDIA control panel prefers discrete.
- Blog snippets omit --device cuda on CLI; OpenAI Whisper Python API defaults can surprise on mixed installs.
Pair with 15 Free Local Whisper and ffmpeg Playtest VOD Triage Tools, FsCheck Editor hang help when the same machine also runs save fuzz overnight, and Ollama first-token hang for local LLM summarize passes on transcripts.
Symptoms and search phrases
- torch.cuda.is_available() → True, GPU usage 0% during whisper run.
- First 60 s of audio takes minutes to transcribe.
- Worked on a teammate’s desktop; fails on Windows 11 laptop.
- nvidia-smi shows no Python process while batch runs.
- Upgraded GPU driver; batch got slower, not faster.
- Multiple Python installs: where python ≠ venv used in batch script.
Root causes (check in order)
- CPU-only PyTorch in the active venv (+cpu wheel).
- CUDA toolkit / driver mismatch with installed torch CUDA build.
- Wrong venv: system Python runs Whisper, venv has CUDA torch (or reverse).
- CUDA_VISIBLE_DEVICES empty or set to invalid index.
- Hybrid graphics: Windows routes Python to integrated GPU.
- Explicit device="cpu" or missing device in copied script.
- Whisper subprocess uses different interpreter than your probe command.
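The script-observable items in that list can be folded into a small triage helper. This is a minimal sketch, not part of Whisper or PyTorch: the function name classify_fallback and the reason strings are hypothetical, and it assumes you collect the probe facts yourself. Wrong-venv and hybrid-graphics causes are not detectable from these inputs and still need the manual checks.

```python
from typing import Optional


def classify_fallback(
    torch_version: str,
    torch_cuda_build: Optional[str],
    cuda_available: bool,
    cuda_visible_devices: Optional[str],
    requested_device: Optional[str],
) -> str:
    """Return the first matching root cause (hypothetical reason strings)."""
    # A "+cpu" local tag or a missing CUDA build means a CPU-only wheel.
    if "+cpu" in torch_version or torch_cuda_build is None:
        return "cpu_only_torch"
    # CUDA build is present, but this process cannot use a device.
    if not cuda_available:
        # A variable that is set but empty hides every GPU from CUDA.
        if cuda_visible_devices is not None and cuda_visible_devices.strip() == "":
            return "cuda_visible_devices_empty"
        return "driver_or_runtime_mismatch"
    # GPU is usable, but the script asked for the CPU anyway.
    if requested_device == "cpu":
        return "explicit_cpu_device"
    return "ok"
```

Feeding it os.environ.get("CUDA_VISIBLE_DEVICES") rather than a default string keeps "unset" (None) distinct from "set but empty".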
Fastest safe fix path
Step 1 — Prove the venv and GPU name (same shell as batch)
cd C:\path\to\your\playtest-vod-project
.\.venv\Scripts\Activate.ps1
python -c "import torch; print('cuda_available', torch.cuda.is_available()); print('device', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'none')"
python -c "import torch; print('torch_version', torch.__version__); print('cuda', torch.version.cuda)"
where.exe python  # in PowerShell, plain "where" is an alias for Where-Object
Pass: GPU name prints (e.g. NVIDIA GeForce RTX 4060 Laptop GPU).
Fail: cuda_available True but device errors → driver/CUDA runtime broken.
Fail: +cpu in torch.__version__, or cuda prints None → reinstall CUDA wheel (Step 2).
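Because the shell probe and the batch can run under different interpreters, it may help to run the same probe from inside the batch script itself. A minimal sketch, assuming a helper named probe_environment (hypothetical) that you call at batch start; it imports torch lazily so it still reports the interpreter path when torch is missing:

```python
import importlib.util
import json
import sys


def probe_environment() -> dict:
    """Collect interpreter and torch facts from the running process."""
    facts = {
        "python_executable": sys.executable,
        # True when running inside a venv created by the venv module.
        "in_venv": sys.prefix != sys.base_prefix,
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if facts["torch_installed"]:
        import torch  # lazy import: the probe must work without torch

        facts["torch_version"] = str(torch.__version__)
        facts["torch_cuda_build"] = torch.version.cuda  # None on CPU-only wheels
        facts["cuda_available"] = torch.cuda.is_available()
        facts["gpu_name"] = (
            torch.cuda.get_device_name(0) if facts["cuda_available"] else None
        )
    return facts


if __name__ == "__main__":
    # Print once at batch start so the log shows which interpreter ran.
    print(json.dumps(probe_environment(), indent=2))
```

If python_executable in the batch log differs from the path you activated in Step 1, the probe and the batch are not testing the same install.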
Step 2 — Reinstall matching PyTorch CUDA wheel
From pytorch.org pick Windows + Pip + CUDA matching your driver (example CUDA 12.x):
pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
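Re-run Step 1. Pin versions in requirements.txt, including the CUDA index URL so pip can resolve the +cu124 build (it is not published on PyPI):
--extra-index-url https://download.pytorch.org/whl/cu124
torch==2.5.1+cu124
openai-whisper==20240930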
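To confirm the pin actually took, the local build tag in the version string can be checked in code. A minimal sketch; wheel_flavor and is_cuda_wheel are hypothetical helper names. It treats a version with no tag as "not proven CUDA", which is an assumption that fits Windows wheels; in that case torch.version.cuda remains the authoritative check.

```python
def wheel_flavor(version: str) -> str:
    """Return the local build tag of a torch version string.

    "2.5.1+cu124" gives "cu124", "2.5.1+cpu" gives "cpu",
    and a version without "+" gives "" (no tag).
    """
    _, _, local = version.partition("+")
    return local


def is_cuda_wheel(version: str) -> bool:
    """True only when the local tag names a CUDA build such as cu124."""
    return wheel_flavor(version).startswith("cu")
```

In a batch script this would be called as is_cuda_wheel(torch.__version__) before any model is loaded.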
Step 3 — Force Whisper device + log first segment
OpenAI Whisper Python API:
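import torch
import whisper
# Pick the GPU only when this interpreter's torch can actually see it.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("whisper_device", device)  # must appear in batch log
model = whisper.load_model("small", device=device)
print("model_device", model.device)  # where the weights actually landed
# fp16 only applies on GPU; on CPU Whisper runs in fp32.
result = model.transcribe("audio/clip.wav", fp16=(device == "cuda"))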
CLI smoke (60 s clip):
whisper audio/smoke.wav --model small --device cuda --output_dir transcripts/
Watch Task Manager → GPU during the run—CUDA or 3D utilization should spike.
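Task Manager is a visual check; a scripted one can ask nvidia-smi which processes hold a compute context. A minimal sketch, assuming nvidia-smi is on PATH; the helper names are hypothetical. On some Windows driver modes the process list can be incomplete, so treat an empty result as a hint rather than proof.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_compute_apps(csv_text: str) -> List[Tuple[int, str]]:
    """Parse 'pid, process_name' rows from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.splitlines():
        pid_text, _, name = line.partition(",")
        try:
            rows.append((int(pid_text.strip()), name.strip()))
        except ValueError:
            continue  # skip blank or unexpected lines
    return rows


def python_on_gpu(rows: List[Tuple[int, str]]) -> bool:
    """True when any listed compute process looks like a Python interpreter."""
    return any("python" in name.lower() for _, name in rows)


def query_compute_apps() -> str:
    """Run nvidia-smi; return its CSV output, or '' if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return ""
    done = subprocess.run(
        [exe, "--query-compute-apps=pid,process_name", "--format=csv,noheader"],
        capture_output=True, text=True, timeout=30, check=False,
    )
    return done.stdout if done.returncode == 0 else ""
```

Calling python_on_gpu(parse_compute_apps(query_compute_apps())) while the smoke clip is transcribing gives a yes/no answer you can write to the batch log.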
Step 4 — faster-whisper lane (often clearer device binding)
pip install faster-whisper
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cuda", compute_type="float16")
# transcribe() returns a lazy generator; decoding runs as you iterate.
segments, info = model.transcribe("audio/clip.wav")
print("device_used", "cuda", "lang", info.language)
for s in segments:
    print(s.start, s.text)
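If the CUDA load itself can fail, the fallback should be explicit and recorded rather than silent. A minimal sketch of that idea: load_with_fallback is a hypothetical helper that takes the model constructor as a callable, so it makes no assumptions about which exception a given backend raises. The lane order and the "int8" CPU compute type are illustrative choices.

```python
from typing import Any, Callable, Sequence, Tuple


def load_with_fallback(
    loader: Callable[[str, str], Any],
    lanes: Sequence[Tuple[str, str]] = (("cuda", "float16"), ("cpu", "int8")),
) -> Tuple[Any, str, str, str]:
    """Try each (device, compute_type) lane in order.

    Returns (model, device_used, compute_type, fallback_reason).
    fallback_reason is "" when the first lane loaded.
    """
    reason = ""
    for device, compute_type in lanes:
        try:
            model = loader(device, compute_type)
        except Exception as exc:  # constructor errors vary by backend
            reason = f"{device}_load_failed: {exc}"
            continue
        return model, device, compute_type, reason
    raise RuntimeError(f"no device lane could load the model ({reason})")
```

With faster-whisper, the loader could be lambda d, c: WhisperModel("small", device=d, compute_type=c); the returned device_used and fallback_reason are the values to log on the first segment.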
Step 5 — Windows hybrid graphics
- Settings → System → Display → Graphics
- Add python.exe from your venv (\.venv\Scripts\python.exe).
- Set High performance (NVIDIA discrete).
- Reboot once after driver updates.
Step 6 — Honest CPU fallback + receipt
If GPU still unavailable after Step 2:
- Switch model small → base for overnight queue.
- Set receipt flag; do not pretend CUDA ran.
{
  "schema": "playtest_vod_triage_receipt_v1",
  "batch_date": "2026-05-24",
  "whisper_model": "base",
  "device_used": "cpu",
  "device_fallback_reason": "cuda_wheel_reinstall_failed",
  "clips_processed": 5,
  "gates": { "T3_transcript": true, "T6_receipt": true }
}
Pass: First 60 s clip on GPU finishes in under 2 minutes with visible GPU use; receipt shows "device_used": "cuda".
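The receipt can be assembled by the batch script so the device field is never typed by hand. A minimal sketch using the field names from the example receipt; build_receipt and write_receipt are hypothetical helpers, and the gates object is left out because its meaning is defined elsewhere in the pipeline.

```python
import json
from pathlib import Path
from typing import Optional


def build_receipt(
    batch_date: str,
    whisper_model: str,
    device_used: str,
    clips_processed: int,
    fallback_reason: Optional[str] = None,
) -> dict:
    """Build a receipt dict; a non-CUDA run always carries a reason."""
    receipt = {
        "schema": "playtest_vod_triage_receipt_v1",
        "batch_date": batch_date,
        "whisper_model": whisper_model,
        "device_used": device_used,
        "clips_processed": clips_processed,
    }
    if device_used != "cuda":
        # Never leave a CPU run unexplained.
        receipt["device_fallback_reason"] = fallback_reason or "unspecified"
    return receipt


def write_receipt(receipt: dict, path: Path) -> None:
    """Write the receipt as indented JSON, creating parent folders."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(receipt, indent=2), encoding="utf-8")
```

Passing the device string returned by the model loader, rather than a constant, is what makes the receipt evidence instead of intent.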
Verification checklist
- [ ] Step 1 GPU name prints in batch venv.
- [ ] whisper_device cuda (or faster-whisper cuda) in log file.
- [ ] Task Manager shows GPU activity during first segment.
- [ ] playtest_vod_triage_receipt_v1.json includes device_used.
- [ ] Full nightly batch completes inside your window (estimate: small ~0.3–0.5× realtime on mid laptop GPU).
- [ ] where whisper / where python point to same venv in scheduled task.
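The batch-window check is simple arithmetic and can be scripted. A minimal sketch, assuming the realtime estimate means processing time as a fraction of audio length (so 0.5 means a 120-minute clip takes about 60 minutes); that reading, and both function names, are assumptions.

```python
from typing import Iterable


def estimated_batch_hours(clip_minutes: Iterable[float], realtime_factor: float) -> float:
    """Processing hours for a queue.

    realtime_factor is processing time divided by audio duration.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return sum(clip_minutes) * realtime_factor / 60.0


def fits_window(
    clip_minutes: Iterable[float], realtime_factor: float, window_hours: float
) -> bool:
    """True when the estimated batch time fits the overnight window."""
    return estimated_batch_hours(clip_minutes, realtime_factor) <= window_hours
```

Five 2-hour clips at a factor of 0.5 come to 600 × 0.5 / 60 = 5 hours. Measuring the factor on your own smoke clip is more reliable than any published estimate.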
Prevention
- One venv per machine for playtest triage; document activate command in playtest-vod/README.md.
- Pin torch + CUDA index URL in requirements.txt; no unpinned pip install torch on QA laptops.
- CI smoke: 30 s clip on release runner; fail build if device_used != cuda on GPU agents.
- Log device_used on segment 1, not only at batch end.
- Prefer faster-whisper when you need explicit compute_type and CTranslate2 speed.
- Do not run Whisper batch on the same machine as 1M FsCheck Editor tests; the two compete for RAM.
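The CI smoke rule can be expressed as a gate over the receipt text. A minimal sketch; device_gate is a hypothetical helper, and how a runner is marked as a GPU agent is left to your CI setup.

```python
import json


def device_gate(receipt_text: str, expect_gpu: bool) -> int:
    """Return a process exit code: 0 passes, 1 fails the build."""
    receipt = json.loads(receipt_text)
    device = receipt.get("device_used")
    if device is None:
        return 1  # a receipt without device_used proves nothing
    if expect_gpu and device != "cuda":
        return 1  # GPU agent silently ran on CPU
    return 0
```

A CI step could read the receipt file and pass the result to sys.exit, so a silent CPU run on a GPU agent fails the build instead of only running slowly.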
Troubleshooting
| Symptom | Fix |
|---|---|
| CUDA out of memory | Use small not large; close browser; compute_type=int8_float16 in faster-whisper |
| cudnn errors after driver update | Reinstall torch wheel matching new driver; reboot |
| GPU spikes then 0% | Normal between segments; watch sustained load on long files |
| True but get_device_name fails | Driver install corrupt; DDU + clean NVIDIA install |
| Batch uses wrong Python | Scheduled Task must call \.venv\Scripts\python.exe full path |
| AMD GPU laptop | CUDA path N/A; use CPU base or ROCm experimental; log device_used: cpu |
FAQ
Is cloud Whisper easier?
Upload caps and consent issues favor local batching—chunk long VODs with ffmpeg before any cloud API. Keep local as default when playtest NDAs restrict uploads.
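Chunking with ffmpeg can be scripted by building the argument list in Python. A minimal sketch using ffmpeg's segment muxer; the function name, the 600-second default, and the output pattern are illustrative assumptions.

```python
from typing import List


def ffmpeg_chunk_command(
    source: str, out_pattern: str, chunk_seconds: int = 600
) -> List[str]:
    """Build an ffmpeg command that splits audio into fixed-length WAV chunks."""
    return [
        "ffmpeg", "-hide_banner",
        "-i", source,
        "-vn",           # drop the video stream
        "-ac", "1",      # mono
        "-ar", "16000",  # 16 kHz, the rate Whisper resamples to anyway
        "-f", "segment",
        "-segment_time", str(chunk_seconds),
        out_pattern,     # e.g. chunks/clip_%03d.wav
    ]
```

Passing the list to subprocess.run avoids quoting problems with Windows paths that contain spaces.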
Does WSL2 share the GPU?
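Sometimes; a native Windows venv is simpler for fest teams. If using WSL, the NVIDIA driver stays on the Windows host, but the CUDA build of torch must also be installed inside the WSL distro, not only in the Windows venv.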
Apple Silicon?
Different lane (MPS/Metal). This help is Windows 11 + NVIDIA CUDA.
Can I summarize transcripts with Ollama?
Yes—text only, HTTP API; never block on CLI stdout per Ollama help above.
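A summarize pass over HTTP might look like the following. A minimal sketch against Ollama's /api/generate endpoint on its default local port, with streaming off so one JSON object comes back; the function names, prompt wording, and timeout are assumptions, and the model name is whatever you have pulled locally.

```python
import json
import urllib.request


def build_summary_request(
    transcript: str, model: str, host: str = "http://localhost:11434"
) -> urllib.request.Request:
    """Build a non-streaming generate request for a transcript summary."""
    payload = {
        "model": model,
        "prompt": "Summarize the key playtest issues in this transcript:\n\n"
        + transcript,
        "stream": False,  # one JSON response instead of a token stream
    }
    return urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(transcript: str, model: str, timeout_s: float = 120.0) -> str:
    """Send the request with a timeout so a hung model cannot stall the batch."""
    req = build_summary_request(transcript, model)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

The explicit timeout is the point: a stalled summarize call raises an error the batch can log and skip.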
Related links
- Local Whisper Playtest VOD Triage Pipeline (2026)
- 15 Free Local Whisper and ffmpeg Playtest VOD Triage Tools
- Steam Playtest vs Fest Demo Isolation Playbook
- Ollama Local LLM Fallback First-Token Hang Fix
- OpenAI Whisper (GitHub)
- PyTorch Get Started
Log device_used on clip one—a green cuda.is_available() line in yesterday’s shell is not proof tonight’s batch used the GPU.