OpenAI Whisper API Returns 413 Payload Too Large on Playtest VOD - How to Fix
Problem: Your nightly script uploads a 90-minute playtest VOD to the OpenAI Whisper API. The HTTP response is 413 Payload Too Large. No transcript file lands in playtest-vod/out/. The issue board stays empty while Discord clips pile up.
Who is affected now: Teams that read the local Whisper playtest triage blog, got legal approval for cloud transcription, and skipped the segment step because “Whisper should handle long files.” The API enforces a per-request file size cap (commonly 25 MB on the speech-to-text endpoint)—a full-session MP3 from ffmpeg extract often exceeds it in one shot.
Fastest safe fix: Extract mono 16 kHz WAV with ffmpeg → split into 10–15 minute segments each under 25 MB → upload segments with verbose_json → merge text with offset seconds per chunk → set merge_ok: true and chunk_count in playtest_vod_triage_receipt_v1.json. If consent README forbids cloud, use local faster-whisper instead—do not retry the full VOD on 413.
Direct answer
413 means the request body exceeded the API’s upload limit—not that Whisper failed linguistically. Sending an entire playtest session as one MP3/M4A will fail until you chunk. Split audio before upload, transcribe each chunk, stitch timestamps, and log proof in your triage receipt. On repeated 413, halve segment duration and use exponential backoff—never resend the same oversized file unchanged.
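The "halve on repeated 413" rule can be sketched as a small helper. This is a minimal illustration, not part of the original script: the function name and the 60-second floor are assumptions chosen for the example.

```python
def next_segment_seconds(current_sec: int, floor_sec: int = 60) -> int:
    """Halve the segment duration after a 413; floor_sec is a hypothetical lower bound."""
    halved = current_sec // 2
    if halved < floor_sec:
        # Below the floor, size is probably not a duration problem any more.
        raise ValueError(
            f"segment already {current_sec} s; check bitrate or container instead"
        )
    return halved


# Durations a script would try after repeated 413 responses, starting at 15 minutes.
plan = [900]
while True:
    try:
        plan.append(next_segment_seconds(plan[-1]))
    except ValueError:
        break
print(plan)
```

Each new duration means re-running the split, so the bytes sent after a 413 are never the same bytes that were rejected.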
Why this issue spikes in June 2026
- The Whisper API chunking resource list shipped beside the local-only stack—teams compare cloud vs local without reading the 25 MB cap.
- Playtest VOD sessions routinely run 60–120 minutes after OBS Replay Buffer exports.
- 413 looks like a “broken API key” in logs—facilitators retry the same file and burn rate limits.
- Fest-week volume makes partial transcripts worse than no transcript—merge order must be deterministic.
Pair with 15 Free Local Whisper and ffmpeg tools when consent blocks cloud, OBS Replay Buffer zero-duration audio when ffprobe shows no audio before upload, and community playtest ops for README language before any upload.
Symptoms and search phrases
- HTTP 413 / Payload Too Large on audio/transcriptions.
- First chunk succeeds; chunk 3+ fails when segments drift over cap.
- Content-Type: application/json on multipart upload (wrong; must be multipart/form-data).
- Transcript stops mid-session with no error in summary markdown.
- Retry loop hammers API with the same 40 MB file.
- Legal approved cloud, but no cloud_api_used row in receipt JSON.
Root causes (check in order)
- Full VOD uploaded as one file — exceeds 25 MB after extract.
- High bitrate extract — 16-bit stereo 48 kHz WAV runs about 11.5 MB per minute, so even a ten-minute segment is far over the cap.
- Wrong container — raw MKV sent instead of compressed segment.
- Missing segment loop — script assumes one openai.audio.transcriptions.create per session.
- No merge step — partial JSON files never concatenated with time offsets.
- 413 retry without resize — backoff on the same payload.
- Consent gap — cloud used when README says local-only (process issue, not HTTP).
Beginner path (first 30 minutes)
Prerequisites: ffmpeg on PATH, OpenAI API key in env var (not committed), one playtest clip under playtest-vod/inbox/, consent README allows cloud if you upload.
- Extract audio: ffmpeg -i session.mkv -vn -ac 1 -ar 16000 -c:a pcm_s16le session.wav
- Check size: if over 25 MB, you must segment; continue to Step 1 below.
- Cut one 10-minute test segment and upload only that—confirm 200 response.
- If test passes, run the segment loop on the full WAV.
Common mistake: Uploading the MKV video—always extract audio first; video inflates size and wastes quota.
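Cutting the 10-minute test segment can be scripted. The sketch below only builds and runs a standard ffmpeg invocation (`-t` to limit duration, `-c copy` to avoid re-encoding). The function names and the file names session.wav and test_seg.wav are placeholders for this example.

```python
import shutil
import subprocess


def cut_test_segment_cmd(src: str, dst: str, seconds: int = 600) -> list[str]:
    """Build an ffmpeg command that copies the first `seconds` of audio from src to dst."""
    return ["ffmpeg", "-y", "-i", src, "-t", str(seconds), "-c", "copy", dst]


def cut_test_segment(src: str, dst: str, seconds: int = 600) -> None:
    """Run the command, failing early with a clear message if ffmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(cut_test_segment_cmd(src, dst, seconds), check=True)


# Show the command without running it.
print(" ".join(cut_test_segment_cmd("session.wav", "test_seg.wav")))
```

Upload only test_seg.wav first. A 200 on that file confirms auth and format before the full loop spends quota.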
Fastest safe fix path
Step 1 — Normalize audio (mono 16 kHz)
ffmpeg -i "playtest-vod/inbox/session_2026-05-24.mkv" `
-vn -ac 1 -ar 16000 -c:a pcm_s16le `
"playtest-vod/work/session_2026-05-24.wav"
Outbound: ffmpeg documentation, OpenAI speech-to-text guide.
Step 2 — Segment under upload cap (10–15 min default)
Fixed-duration split (900 s = 15 min):
$segmentSec = 900
ffmpeg -i "playtest-vod/work/session_2026-05-24.wav" `
-f segment -segment_time $segmentSec -reset_timestamps 1 `
"playtest-vod/work/seg_%03d.wav"
Pass: Each seg_*.wav is under 25 MB (Get-Item seg_*.wav | Select Length).
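Fail: Still over cap → lower $segmentSec to 600 (10 min) or export MP3 at 64–128 kbps for API-only lane. Mono 16 kHz 16-bit PCM is 32,000 bytes per second (about 1.92 MB per minute), so a 900 s WAV segment is about 28.8 MB and fails this check, while a 600 s segment is about 19.2 MB and passes.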
Optional: Silero VAD splits at silence—see chunking resource list.
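The size check in Step 2 is plain arithmetic, so it can be worked out before any split runs. The sketch below assumes the 25 MB cap is counted as 25 × 1024 × 1024 bytes and ignores the small WAV header. The function names are illustrative.

```python
CAP_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB cap counted in binary megabytes


def pcm_bytes(seconds: float, sample_rate: int = 16000,
              channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Approximate size of uncompressed PCM audio, ignoring the WAV header."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


def max_seconds(bytes_per_second: float, cap_bytes: int = CAP_BYTES) -> int:
    """Longest whole-second segment that stays under cap_bytes at a constant byte rate."""
    return int(cap_bytes // bytes_per_second)


print(pcm_bytes(900))           # 15-minute mono 16 kHz segment, in bytes
print(pcm_bytes(600))           # 10-minute segment, in bytes
print(max_seconds(32000))       # mono 16 kHz 16-bit PCM
print(max_seconds(64_000 / 8))  # constant 64 kbps MP3
```

Under these assumptions, PCM WAV segments have to stay under roughly 13.5 minutes, while a 64 kbps MP3 can run close to 54 minutes per request.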
Step 3 — Upload segments with verbose_json
Python (OpenAI SDK v1+):
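from pathlib import Path
import json
import time

from openai import APIStatusError, OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
segments_dir = Path("playtest-vod/work")
offset = 0.0  # seconds of audio covered by earlier segments
merged = []

for wav in sorted(segments_dir.glob("seg_*.wav")):
    # Pre-flight guard: never send a file the API would reject with 413.
    size_mb = wav.stat().st_size / (1024 * 1024)
    if size_mb > 24:
        raise RuntimeError(f"segment too large: {wav} ({size_mb:.1f} MB)")

    for attempt in range(4):
        try:
            with wav.open("rb") as f:
                resp = client.audio.transcriptions.create(
                    model="whisper-1",
                    file=f,
                    response_format="verbose_json",
                )
            break
        except APIStatusError as e:
            # Back off only on transient errors (429 or 5xx). A 413 is raised
            # immediately: resending the same bytes cannot succeed, so re-split.
            transient = e.status_code == 429 or e.status_code >= 500
            if transient and attempt < 3:
                time.sleep(2 ** attempt)
                continue
            raise

    # Shift this chunk's timestamps by the audio already transcribed.
    for seg in resp.segments or []:
        merged.append({
            "start": seg.start + offset,
            "end": seg.end + offset,
            "text": seg.text,
        })
    offset += float(resp.duration)

# Write the merged transcript into the out/ folder as merged.json.
out_path = Path("playtest-vod/out/merged.json")
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")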
API reference: Create transcription.
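The offset math is easier to test when pulled out of the upload loop. A minimal sketch, assuming each chunk result has been reduced to a plain dict with a `duration` and a `segments` list. That dict shape is hypothetical, not the SDK's response type.

```python
def merge_chunks(chunks):
    """Shift each chunk's segment times by the total duration of the chunks before it.

    `chunks` must be in upload order; each item has `duration` (seconds) and
    `segments`, a list of {"start", "end", "text"} dicts.
    """
    merged, offset = [], 0.0
    for chunk in chunks:
        for seg in chunk["segments"]:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
        offset += float(chunk["duration"])
    return merged


chunks = [
    {"duration": 600.0, "segments": [{"start": 0.0, "end": 4.0, "text": "first"}]},
    {"duration": 600.0, "segments": [{"start": 1.5, "end": 3.0, "text": "second"}]},
]
merged = merge_chunks(chunks)
print(merged[1]["start"], merged[1]["end"])
```

Because the offset depends only on chunk order and duration, sorting the segment files before upload keeps the merge deterministic.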
Step 4 — Write merged transcript + receipt
{
"schema": "playtest_vod_triage_receipt_v1",
"batch_date": "2026-05-24",
"build_label": "fest-demo-2026-05-24-rc2",
"surface": "playtest",
"cloud_api_used": true,
"chunk_count": 6,
"merge_ok": true,
"segment_duration_sec": 900,
"gates": { "T3_transcript": true, "T6_receipt": true }
}
Attach merged .json or summary.md with build_id and surface per playtest isolation playbook.
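The receipt can be derived from what the batch actually produced rather than typed by hand. In this sketch, `merge_ok` is taken to mean "at least one segment and start times never go backwards". That definition is an assumption, and the gates block is left out because its criteria are defined in the triage blog.

```python
import json


def build_receipt(segment_files, merged, segment_duration_sec, batch_date, build_label):
    """Assemble receipt fields from the segment list and the merged transcript."""
    starts = [seg["start"] for seg in merged]
    # Assumed meaning of merge_ok: non-empty and monotonic start times.
    merge_ok = bool(starts) and all(a <= b for a, b in zip(starts, starts[1:]))
    return {
        "schema": "playtest_vod_triage_receipt_v1",
        "batch_date": batch_date,
        "build_label": build_label,
        "surface": "playtest",
        "cloud_api_used": True,
        "chunk_count": len(segment_files),
        "merge_ok": merge_ok,
        "segment_duration_sec": segment_duration_sec,
    }


receipt = build_receipt(
    segment_files=[f"seg_{i:03d}.wav" for i in range(6)],
    merged=[{"start": 0.0}, {"start": 905.2}, {"start": 4210.0}],
    segment_duration_sec=900,
    batch_date="2026-05-24",
    build_label="fest-demo-2026-05-24-rc2",
)
print(json.dumps(receipt, indent=2))
```

Counting files for `chunk_count` keeps the receipt honest when a segment was skipped or the split was re-run at a shorter duration.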
Step 5 — Local fallback when cloud blocked or 413 persists
If README forbids cloud or segments still fail after resize:
- Route files to local faster-whisper batch.
- Set "cloud_api_used": false and "device_used": "cuda" or "cpu" in receipt.
- Do not upload raw VOD to any SaaS “because API failed once.”
Working dev path (proof table)
| Check | Artifact | Pass signal |
|---|---|---|
| Segment size | Get-Item seg_*.wav | All < 24 MB |
| HTTP status | batch.log | No 413 lines |
| Merge order | merged.json | Monotonic start times |
| Receipt | playtest_vod_triage_receipt_v1.json | merge_ok: true, chunk_count matches files |
| Consent | playtest README | Cloud row only if allowed |
| Surface tag | summary.md header | surface=playtest or fest_public |
Verification checklist
- [ ] 90-minute test VOD completes with ordered segments and no 413 in logs.
- [ ] Merged transcript references timestamps past 60:00 (proves offset math).
- [ ] chunk_count in receipt matches seg_*.wav file count.
- [ ] Exponential backoff tested; script does not infinite-retry same file.
- [ ] Wednesday smoke row vod_triage_ok updated when batch green.
- [ ] Local fallback path documented when consent denies cloud.
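Several of the checklist items can be asserted by a script instead of by eye. A rough sketch, assuming the merged transcript is a list of dicts with `start` and `end` in seconds and the receipt is a parsed dict. The function name and the failure messages are illustrative.

```python
def verify_batch(merged, receipt, segment_file_count, min_last_end_sec=3600.0):
    """Return a list of human-readable failures; an empty list means the checks passed."""
    failures = []
    starts = [seg["start"] for seg in merged]
    if any(a > b for a, b in zip(starts, starts[1:])):
        failures.append("segment start times are not monotonic")
    # For a 90-minute test VOD, timestamps must pass 60:00 if offsets were applied.
    if not merged or max(seg["end"] for seg in merged) <= min_last_end_sec:
        failures.append("no timestamp past 60:00; offsets may not be applied")
    if receipt.get("chunk_count") != segment_file_count:
        failures.append("chunk_count does not match seg_*.wav file count")
    return failures


good = verify_batch(
    merged=[{"start": 0.0, "end": 4.0}, {"start": 3700.0, "end": 3704.5}],
    receipt={"chunk_count": 9},
    segment_file_count=9,
)
print(good)
```

The backoff, smoke-row, and consent items still need a human or a separate job. This only covers the checks that can be read from the artifacts.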
Prevention
- Default local per triage blog; cloud is opt-in per consent README.
- Pre-flight script: reject any upload file > 24 MB before HTTP call.
- Pin segment_duration_sec in repo config; do not tune by hand each night.
- Log segment index + file size on every API call.
- CI smoke: 30 s clip via API; separate job for 12-minute synthetic WAV near cap.
- Tag transcripts with surface before merging into fest fix lists.
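The pre-flight and logging items above could look like the sketch below. It rejects any file over a byte limit before an HTTP call is made and returns the index, name, and size rows a log would need. The function name is an assumption, and the temporary files exist only to demonstrate it.

```python
import tempfile
from pathlib import Path

LIMIT_BYTES = 24 * 1024 * 1024  # the 24 MB guard, kept under the 25 MB cap


def preflight(paths, limit_bytes=LIMIT_BYTES):
    """Return (index, name, size_bytes) rows in upload order; raise if any file is too large."""
    rows = []
    for index, path in enumerate(sorted(Path(p) for p in paths)):
        size = path.stat().st_size
        if size > limit_bytes:
            raise ValueError(f"{path.name}: {size} bytes exceeds {limit_bytes}")
        rows.append((index, path.name, size))
    return rows


# Demo with tiny stand-in files and a tiny limit.
with tempfile.TemporaryDirectory() as tmp:
    for i, size in enumerate([10, 30]):
        Path(tmp, f"seg_{i:03d}.wav").write_bytes(b"\0" * size)
    rows = preflight(Path(tmp).glob("seg_*.wav"), limit_bytes=100)
    try:
        preflight(Path(tmp).glob("seg_*.wav"), limit_bytes=20)
        rejected = False
    except ValueError:
        rejected = True
print(rows, rejected)
```

Running this before the upload loop turns a 413 into a local error that names the offending segment.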
Troubleshooting
| Symptom | Fix |
|---|---|
| 413 on chunk 1 only | Wrong file (MKV); re-extract mono 16 kHz |
| 413 on all chunks | Bitrate too high; use MP3 64k or shorter segment_time |
| 200 but empty text | Silent segment; trim with VAD or skip dead air |
| Duplicated paragraphs | Merge missing offset; add resp.duration per chunk |
| 401 / 403 | API key env—not 413; fix auth first |
| Rate limit 429 | Backoff; reduce parallel uploads to 1 |
| Partial files in out/ | Crash mid-loop; resume from last seg_N index |
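For the "resume from last seg_N index" row, one option is to compare finished outputs against the segment list. This sketch assumes the loop writes one seg_NNN.json per finished segment into the out folder. That layout is hypothetical and not something the original script does.

```python
import tempfile
from pathlib import Path


def pending_segments(work_dir, out_dir):
    """List seg_*.wav files with no matching seg_*.json in out_dir, in upload order."""
    done = {p.stem for p in Path(out_dir).glob("seg_*.json")}
    return [p for p in sorted(Path(work_dir).glob("seg_*.wav")) if p.stem not in done]


# Demo: three segments, the first already transcribed before a crash.
with tempfile.TemporaryDirectory() as tmp:
    work, out = Path(tmp, "work"), Path(tmp, "out")
    work.mkdir()
    out.mkdir()
    for i in range(3):
        (work / f"seg_{i:03d}.wav").write_bytes(b"")
    (out / "seg_000.json").write_text("{}")
    remaining = [p.name for p in pending_segments(work, out)]
print(remaining)
```

A resumed run also has to rebuild the running offset from the durations of the finished segments, otherwise the merged timestamps restart at zero.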
FAQ
Is 413 the same as a rate limit?
No—413 is body size. 429 is rate. See MDN 413.
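A batch script can encode that distinction so a size error never takes the retry path. A minimal sketch; the action labels are made up for this example.

```python
def action_for_status(status: int) -> str:
    """Map an HTTP status from the transcription call to the next step for the batch."""
    if status == 413:
        return "resplit"   # body too large: shorten segments, never resend as-is
    if status == 429:
        return "backoff"   # rate limit: wait, then retry the same segment
    if status in (401, 403):
        return "fix_auth"  # key or permission problem, unrelated to size
    if 500 <= status < 600:
        return "backoff"   # transient server error
    if 200 <= status < 300:
        return "ok"
    return "fail"


for code in (200, 401, 413, 429, 503):
    print(code, action_for_status(code))
```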
Should I use MP3 or WAV for API segments?
WAV is fine if each segment is under the cap. MP3 reduces size when you need longer segments per request.
Can I send video to the API?
Extract audio first—video wastes cap and often triggers 413.
Local Whisper instead?
Yes—preferred when NDAs restrict upload. See CUDA local batch help.
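The lane decision and the local call might be sketched as follows. `choose_lane` returns the receipt fields named in Step 5. `transcribe_local` uses the faster-whisper `WhisperModel` class; the "small" model size and the compute types are assumptions to adjust for your hardware.

```python
def choose_lane(consent_allows_cloud: bool, cloud_failed_after_resize: bool,
                cuda_available: bool) -> dict:
    """Return the receipt fields for the lane a batch should use."""
    if consent_allows_cloud and not cloud_failed_after_resize:
        return {"cloud_api_used": True}
    return {"cloud_api_used": False, "device_used": "cuda" if cuda_available else "cpu"}


def transcribe_local(path: str, device: str = "cpu", model_size: str = "small"):
    """Transcribe one file with faster-whisper; imported lazily so routing works without it."""
    from faster_whisper import WhisperModel

    compute_type = "float16" if device == "cuda" else "int8"
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(path)
    return [{"start": s.start, "end": s.end, "text": s.text} for s in segments]


print(choose_lane(consent_allows_cloud=False, cloud_failed_after_resize=False,
                  cuda_available=True))
```

Local files have no upload cap, so the full WAV can go through `transcribe_local` without segmenting.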
Does this replace the triage blog pipeline?
No—it extends the optional cloud row. The blog’s T1–T6 gates still apply; add chunk_count when cloud_api_used is true.
Related links
- 15 Free OpenAI Whisper API Chunking and Local Fallback Tools
- 15 Free Local Whisper and ffmpeg Playtest VOD Tools
- Local Whisper Playtest VOD Triage Pipeline (2026)
- Local Whisper CUDA Silent CPU Fallback (Windows 11)
- Community Playtest Feedback Ops (2026)
- OpenAI speech-to-text guide
Segment before upload—413 on a full playtest VOD is a chunking bug, not an API outage.