AI Integration Problems May 24, 2026

Local Whisper CUDA Silent CPU Fallback on Windows 11 Playtest VOD Batch - How to Fix

Fix local Whisper on Windows 11 when CUDA looks available but playtest VOD transcription runs CPU-only and stalls overnight. PyTorch wheel alignment, venv discipline, device_used receipt logging.

By GamineAI Team

Problem: Your overnight playtest VOD batch uses local Whisper. ffmpeg extracts audio fine. python -c "import torch; print(torch.cuda.is_available())" prints True, but a 2-hour clip still takes all night—Task Manager shows Python on CPU, not your NVIDIA GPU.

Who is affected now: Indies adopting the May 2026 local Whisper playtest triage pipeline on Windows 11 laptops. The dominant failure is silent CPU fallback: CUDA looks available, but Whisper never binds the GPU.

Fastest safe fix: Run device probes in the same venv as whisper, reinstall the matching torch CUDA wheel, and pass device="cuda" explicitly (or use faster-whisper with device="cuda"). Log device_used on the first segment in playtest_vod_triage_receipt_v1.json. If the GPU still cannot be used, fall back to the base model on CPU with an explicit receipt flag, not an accidental overnight stall.

Direct answer

torch.cuda.is_available() only proves that the torch build in the interpreter you probed can see a CUDA device. It does not prove that Whisper loaded its weights on the GPU, or that the batch runs in that same interpreter. Wrong venv, CPU-only torch, hybrid graphics routing, empty CUDA_VISIBLE_DEVICES, or a copied device="cpu" snippet all produce batches around 10× slower with no obvious error line. Fix the PyTorch + CUDA pairing first; then force device logging so receipts prove which path ran.
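One way to see the gap between "CUDA is available" and "the weights are on the GPU" is to ask the loaded model where its parameters live. The sketch below assumes the model behaves like a torch.nn.Module (openai-whisper models do); weights_device is a hypothetical helper name, not part of any library.

```python
def weights_device(model) -> str:
    """Return the device type ("cuda", "cpu", ...) of the model's first parameter.

    Assumes `model` exposes .parameters() the way a torch.nn.Module does.
    """
    first_param = next(iter(model.parameters()))
    device = first_param.device
    # torch.device carries a .type attribute; fall back to parsing the string
    # form ("cuda:0" -> "cuda") so simple stand-in objects also work.
    return getattr(device, "type", str(device).split(":")[0])
```

In the batch script, calling print("weights_device", weights_device(model)) right after whisper.load_model(...) puts the answer in the same log as the transcription itself.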

Why this issue spikes in 2026

  1. Overnight playtest VOD triage replaced “watch forty Discord clips” workflows.
  2. Teams install CPU torch by default (pip install torch) then wonder why CUDA is “broken.”
  3. Laptop hybrid graphics send python.exe to the iGPU unless the NVIDIA control panel prefers discrete.
  4. Blog snippets omit --device cuda on CLI; OpenAI Whisper Python API defaults can surprise on mixed installs.

Pair with 15 Free Local Whisper and ffmpeg Playtest VOD Triage Tools, FsCheck Editor hang help when the same machine also runs save fuzz overnight, and Ollama first-token hang for local LLM summarize passes on transcripts.

Symptoms and search phrases

  • torch.cuda.is_available() → True, but GPU usage stays at 0% during the whisper run.
  • First 60 s of audio takes minutes to transcribe.
  • Worked on a teammate’s desktop; fails on Windows 11 laptop.
  • nvidia-smi shows no Python process while batch runs.
  • Upgraded GPU driver; batch got slower, not faster.
  • Multiple Python installs: the python found on PATH is not the venv interpreter the batch script uses.

Root causes (check in order)

  1. CPU-only PyTorch in the active venv (+cpu wheel).
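  2. NVIDIA driver too old for the CUDA runtime bundled in the installed torch wheel (pip wheels ship their own CUDA runtime, so a separate toolkit install is not the usual culprit).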
  3. Wrong venv — system Python runs Whisper, venv has CUDA torch (or reverse).
  4. CUDA_VISIBLE_DEVICES empty or set to invalid index.
  5. Hybrid graphics — Windows routes Python to integrated GPU.
  6. Explicit device="cpu" or missing device in copied script.
  7. Whisper subprocess uses different interpreter than your probe command.
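Several of these causes can be checked mechanically before any audio is touched. The following is a rough sketch, not a complete diagnostic: diagnose and its finding names are hypothetical, and it covers only causes 1, 3/7 and 4 from the list above.

```python
import os


def diagnose(torch_version, cuda_build, visible_devices, probe_exe, batch_exe):
    """Return names of likely root causes, in the order of the list above.

    torch_version   -- torch.__version__, e.g. "2.5.1+cpu"
    cuda_build      -- torch.version.cuda (None on CPU-only builds)
    visible_devices -- os.environ.get("CUDA_VISIBLE_DEVICES")
    probe_exe       -- interpreter used for the manual probe (sys.executable)
    batch_exe       -- interpreter the batch or scheduled task launches
    """
    findings = []
    if "+cpu" in torch_version or cuda_build is None:
        findings.append("cpu_only_torch")
    # An unset variable (None) hides nothing; an empty string or -1 hides every GPU.
    if visible_devices is not None and visible_devices.strip() in ("", "-1"):
        findings.append("cuda_visible_devices_hides_gpu")
    same = os.path.normcase(os.path.abspath(probe_exe)) == os.path.normcase(
        os.path.abspath(batch_exe)
    )
    if not same:
        findings.append("interpreter_mismatch")
    return findings
```

An empty list does not prove the GPU is used; it only rules out the cheapest explanations before moving on to driver and hybrid-graphics checks.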

Fastest safe fix path

Step 1 — Prove the venv and GPU name (same shell as batch)

cd C:\path\to\your\playtest-vod-project
.\.venv\Scripts\Activate.ps1
python -c "import torch; print('cuda_available', torch.cuda.is_available()); print('device', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'none')"
python -c "import torch; print('torch_version', torch.__version__); print('cuda', torch.version.cuda)"
where.exe python  # in PowerShell, plain "where" is an alias for Where-Object

Pass: GPU name prints (e.g. NVIDIA GeForce RTX 4060 Laptop GPU).
Fail: cuda_available True but device errors → driver/CUDA runtime broken.
Fail: +cpu in torch_version (or cuda prints None) → reinstall the CUDA wheel (Step 2).
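The probe only counts if the batch runs under the same interpreter. A small sketch that records this from inside Python, using only the standard library; interpreter_report is a hypothetical name, and the idea is to emit it as the first line of the batch log so it can be compared with the shell output above.

```python
import sys


def interpreter_report() -> dict:
    """Describe which Python is running and whether it is a virtual environment."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix
        # points at the interpreter the venv was created from.
        "in_venv": sys.prefix != sys.base_prefix,
    }


print(interpreter_report())
```

If executable in the batch log differs from what the Step 1 shell reports, the probe result says nothing about the batch.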

Step 2 — Reinstall matching PyTorch CUDA wheel

From pytorch.org pick Windows + Pip + CUDA matching your driver (example CUDA 12.x):

pip uninstall torch torchvision torchaudio -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

Re-run Step 1. Pin versions in requirements.txt:

--extra-index-url https://download.pytorch.org/whl/cu124
torch==2.5.1+cu124
openai-whisper==20240930
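Pins drift when someone later runs an unpinned install. As a sketch, the check below compares name==version lines against what is installed, using importlib.metadata from the standard library. It assumes simple == pins only; parse_pin and pin_mismatches are hypothetical helpers.

```python
from importlib import metadata


def parse_pin(line: str):
    """Split a 'name==version' requirement line into (name, version)."""
    name, _, version = line.strip().partition("==")
    return name.strip(), version.strip()


def pin_mismatches(pins, installed_version=metadata.version):
    """Return {package: (pinned, installed)} for every pin that does not match.

    installed_version is injectable so the check can be exercised without
    the packages being present.
    """
    problems = {}
    for line in pins:
        name, pinned = parse_pin(line)
        try:
            installed = installed_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            problems[name] = (pinned, installed)
    return problems
```

Calling pin_mismatches(["torch==2.5.1+cu124", "openai-whisper==20240930"]) at batch start would flag a venv where torch has quietly become a +cpu build.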

Step 3 — Force Whisper device + log first segment

OpenAI Whisper Python API:

import whisper
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print("whisper_device", device)  # must appear in batch log
model = whisper.load_model("small", device=device)
result = model.transcribe("audio/clip.wav", fp16=(device == "cuda"))

CLI smoke (60 s clip):

whisper audio/smoke.wav --model small --device cuda --output_dir transcripts/

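Watch Task Manager → Performance → GPU during the run: the Cuda or 3D graph for the NVIDIA GPU should spike. If neither graph is shown, check nvidia-smi for a python.exe process instead.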
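Eyeballing a graph does not leave a record. A possible alternative is to poll nvidia-smi while the smoke clip runs and look for a Python process in its compute-apps list. This sketch assumes nvidia-smi is on PATH and that the process list is populated on your driver mode; the helper names are hypothetical.

```python
import csv
import io
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(csv_text: str) -> list:
    """Parse 'pid, process_name, used_memory' rows into dicts (memory is ignored)."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) >= 2 and fields[0].strip().isdigit():
            rows.append({"pid": int(fields[0]), "name": fields[1].strip()})
    return rows


def python_on_gpu(csv_text: str) -> bool:
    """True if any listed compute process looks like a Python interpreter."""
    return any("python" in row["name"].lower() for row in parse_compute_apps(csv_text))


def query_compute_apps() -> str:
    """Run nvidia-smi once; call this while the smoke clip is transcribing."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
```

Running python_on_gpu(query_compute_apps()) a few seconds into the smoke clip gives a yes/no answer that can go straight into the batch log.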

Step 4 — faster-whisper lane (often clearer device binding)

pip install faster-whisper

from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="float16")
# segments is a generator: transcription only runs as you iterate over it
segments, info = model.transcribe("audio/clip.wav")
print("device_used", "cuda", "lang", info.language)
for s in segments:
    print(s.start, s.text)

See faster-whisper on GitHub.
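Because faster-whisper returns segments lazily, the time until the first segment arrives is a cheap early signal of which device is doing the work. A minimal sketch of a wrapper that logs once and then passes segments through unchanged; log_first_segment is a hypothetical helper, and the device string is whatever you passed to WhisperModel.

```python
import time


def log_first_segment(segments, device: str, log=print):
    """Yield segments unchanged, logging device and latency when the first one arrives."""
    started = time.perf_counter()
    for index, segment in enumerate(segments):
        if index == 0:
            elapsed = time.perf_counter() - started
            log(f"device_used={device} first_segment_s={elapsed:.2f}")
        yield segment
```

Usage is a one-line change to the loop: for s in log_first_segment(segments, "cuda"): print(s.start, s.text).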

Step 5 — Windows hybrid graphics

  1. Settings → System → Display → Graphics
  2. Add python.exe from your venv (\.venv\Scripts\python.exe).
  3. Set High performance (NVIDIA discrete).
  4. Reboot once after driver updates.

Step 6 — Honest CPU fallback + receipt

If GPU still unavailable after Step 2:

  • Switch model small → base for the overnight queue.
  • Set the receipt flag; do not pretend CUDA ran.

{
  "schema": "playtest_vod_triage_receipt_v1",
  "batch_date": "2026-05-24",
  "whisper_model": "base",
  "device_used": "cpu",
  "device_fallback_reason": "cuda_wheel_reinstall_failed",
  "clips_processed": 5,
  "gates": { "T3_transcript": true, "T6_receipt": true }
}

Pass: First 60 s clip on GPU finishes in under 2 minutes with visible GPU use; receipt shows "device_used": "cuda".
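The receipt is easier to keep honest when code builds it rather than a hand-edited file. The sketch below uses the field names from the receipt above and refuses to write a non-CUDA receipt without a reason. The gates object is left out because its semantics are not defined here; build_receipt and write_receipt are hypothetical names.

```python
import json
from pathlib import Path


def build_receipt(batch_date, whisper_model, device_used, clips_processed,
                  fallback_reason=None):
    """Build a receipt dict; non-CUDA runs must state why they fell back."""
    receipt = {
        "schema": "playtest_vod_triage_receipt_v1",
        "batch_date": batch_date,
        "whisper_model": whisper_model,
        "device_used": device_used,
        "clips_processed": clips_processed,
    }
    if device_used != "cuda":
        if not fallback_reason:
            raise ValueError("non-CUDA runs need a device_fallback_reason")
        receipt["device_fallback_reason"] = fallback_reason
    return receipt


def write_receipt(receipt, path="playtest_vod_triage_receipt_v1.json"):
    """Write the receipt as indented JSON."""
    Path(path).write_text(json.dumps(receipt, indent=2), encoding="utf-8")
```

Raising on a missing reason is a deliberate choice: a silent CPU run then fails loudly at receipt time instead of the next morning.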

Verification checklist

  • [ ] Step 1 GPU name prints in batch venv.
  • [ ] whisper_device cuda (or faster-whisper cuda) in log file.
  • [ ] Task Manager shows GPU activity during first segment.
  • [ ] playtest_vod_triage_receipt_v1.json includes device_used.
  • [ ] Full nightly batch completes inside your window (estimate: the small model needs roughly 0.3–0.5× the audio duration on a mid-range laptop GPU).
  • [ ] where.exe whisper and where.exe python point to the same venv in the scheduled task.
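The batch-window item is simple arithmetic, sketched here under one assumption: the realtime factor is read as processing time divided by audio duration, so 0.5 means a 2-hour clip takes about 1 hour. The function names are hypothetical.

```python
def batch_hours(clip_minutes, realtime_factor):
    """Estimated wall-clock hours: total audio minutes times the realtime factor."""
    return sum(clip_minutes) * realtime_factor / 60.0


def fits_window(clip_minutes, realtime_factor, window_hours):
    """True if the estimated batch time fits the overnight window."""
    return batch_hours(clip_minutes, realtime_factor) <= window_hours
```

For example, five 120-minute clips at a factor of 0.5 come to 600 × 0.5 / 60 = 5 hours, which fits an 8-hour window. The same clips at a factor ten times larger (the CPU slowdown mentioned earlier) come to 50 hours and do not.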

Prevention

  1. One venv per machine for playtest triage—document activate command in playtest-vod/README.md.
  2. Pin torch + CUDA index URL in requirements.txt; no unpinned pip install torch on QA laptops.
  3. CI smoke: 30 s clip on release runner; fail build if device_used != cuda on GPU agents.
  4. Log device_used on segment 1, not only at batch end.
  5. Prefer faster-whisper when you need explicit compute_type and CTranslate2 speed.
  6. Do not run the Whisper batch on the same machine as 1M FsCheck Editor tests; the two compete for RAM.
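The CI smoke in item 3 can be a few lines that read the receipt and turn it into an exit code. This is a sketch assuming the receipt format shown earlier; gate and main are hypothetical names, and how your CI system consumes exit codes is up to its own configuration.

```python
import json


def gate(receipt_text: str, expect_device: str = "cuda") -> int:
    """Return 0 when the receipt shows the expected device, 1 otherwise."""
    try:
        receipt = json.loads(receipt_text)
    except json.JSONDecodeError:
        return 1
    if not isinstance(receipt, dict):
        return 1
    return 0 if receipt.get("device_used") == expect_device else 1


def main(argv):
    """argv[1] is the path to the receipt JSON written by the smoke run."""
    with open(argv[1], encoding="utf-8") as handle:
        return gate(handle.read())
```

In the CI step, ending the script with sys.exit(main(sys.argv)) (after import sys) makes a CPU fallback on a GPU agent fail the build instead of passing quietly.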

Troubleshooting

Symptom | Fix
CUDA out of memory | Use small, not large; close the browser; set compute_type="int8_float16" in faster-whisper
cudnn errors after driver update | Reinstall the torch wheel matching the new driver; reboot
GPU spikes then 0% | Normal between segments; watch sustained load on long files
cuda_available True but get_device_name fails | Driver install corrupt; run DDU, then a clean NVIDIA install
Batch uses wrong Python | Scheduled Task must call \.venv\Scripts\python.exe by full path
AMD GPU laptop | CUDA path N/A; use CPU base or ROCm experimental; log device_used: cpu

FAQ

Is cloud Whisper easier?
Upload caps and consent issues favor local batching—chunk long VODs with ffmpeg before any cloud API. Keep local as default when playtest NDAs restrict uploads.
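Chunking with ffmpeg can be scripted so every clip is split the same way. The sketch below only builds the argument list, using ffmpeg's segment muxer. The 600-second chunk length and the output pattern are illustrative assumptions, and ffmpeg_chunk_args is a hypothetical helper.

```python
def ffmpeg_chunk_args(source, out_pattern, chunk_seconds=600):
    """Build an ffmpeg command that extracts mono 16 kHz audio in fixed-length chunks."""
    return [
        "ffmpeg", "-i", source,
        "-vn",             # drop the video stream
        "-ac", "1",        # mono
        "-ar", "16000",    # 16 kHz, the rate Whisper works at internally
        "-f", "segment",   # segment muxer: split output into numbered files
        "-segment_time", str(chunk_seconds),
        out_pattern,       # e.g. "audio/clip_%03d.wav"
    ]
```

Passing the list to subprocess.run(args, check=True) avoids shell quoting problems with Windows paths that contain spaces.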

Does WSL2 share the GPU?
Sometimes. A native Windows venv is simpler for fest teams. If using WSL, install the CUDA build of torch inside WSL as well; the NVIDIA driver itself stays on the Windows host.

Apple Silicon?
Different lane (MPS/Metal). This help is Windows 11 + NVIDIA CUDA.

Can I summarize transcripts with Ollama?
Yes—text only, HTTP API; never block on CLI stdout per Ollama help above.

Log device_used on clip one—a green cuda.is_available() line in yesterday’s shell is not proof tonight’s batch used the GPU.