OpenAI Whisper API Returns 413 Payload Too Large on Playtest VOD - How to Fix

Problem: Your nightly script uploads a 90-minute playtest VOD to the OpenAI Whisper API. The HTTP response is 413 Payload Too Large. No transcript file lands in playtest-vod/out/. The issue board stays empty while Discord clips pile up.

Who is affected now: Teams that read the local Whisper playtest triage blog, got legal approval for cloud transcription, and skipped the segment step because “Whisper should handle long files.” The API enforces a per-request file size cap (commonly 25 MB on the speech-to-text endpoint)—a full-session MP3 from ffmpeg extract often exceeds it in one shot.

Fastest safe fix: Extract mono 16 kHz WAV with ffmpeg → split into segments of about 10 minutes, each under 25 MB (mono 16 kHz 16-bit WAV is about 1.9 MB per minute, so a 15-minute WAV segment is about 29 MB; use MP3 if you need 15-minute chunks) → upload segments with verbose_json → merge text with offset seconds per chunk → set merge_ok: true and chunk_count in playtest_vod_triage_receipt_v1.json. If the consent README forbids cloud, use local faster-whisper instead. Do not retry the full VOD on 413.

Direct answer

413 means the request body exceeded the API's upload limit, not that Whisper failed linguistically. Sending an entire playtest session as one MP3/M4A will fail until you chunk. Split audio before upload, transcribe each chunk, stitch timestamps, and log proof in your triage receipt. On repeated 413, halve the segment duration and re-split; reserve exponential backoff for 429 and 5xx responses, where waiting can actually help. Never resend the same oversized file unchanged.
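
The retry rule above can be written down as a small decision function. This is a minimal sketch, not part of any SDK: the action labels, the `floor` parameter, and both function names are illustrative assumptions.

```python
def action_for_status(status: int) -> str:
    """Map an HTTP status from the transcription call to a pipeline action."""
    if 200 <= status < 300:
        return "ok"
    if status == 413:
        return "resplit"   # payload too large: waiting will not help
    if status in (401, 403):
        return "fix_auth"  # key or permission problem, unrelated to size
    if status == 429 or 500 <= status < 600:
        return "backoff"   # transient: retry the same segment after a delay
    return "fail"


def next_segment_seconds(current: int, floor: int = 60) -> int:
    """Halve the segment duration after a 413, never going below floor."""
    return max(floor, current // 2)


if __name__ == "__main__":
    print(action_for_status(413), next_segment_seconds(900))
```

Keeping the 413 branch separate from the backoff branch is the point: a size error only goes away when the payload changes.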

Why this issue spikes in June 2026

The Whisper API chunking resource list shipped beside the local-only stack—teams compare cloud vs local without reading the 25 MB cap.
Playtest VOD sessions routinely run 60–120 minutes after OBS Replay Buffer exports.
413 looks like a “broken API key” in logs—facilitators retry the same file and burn rate limits.
Fest-week volume makes partial transcripts worse than no transcript—merge order must be deterministic.

Pair with 15 Free Local Whisper and ffmpeg tools when consent blocks cloud, OBS Replay Buffer zero-duration audio when ffprobe shows no audio before upload, and community playtest ops for README language before any upload.

Symptoms and search phrases

HTTP 413 / Payload Too Large on audio/transcriptions.
First chunk succeeds; chunk 3+ fails when segments drift over cap.
Content-Type: application/json on multipart upload (wrong—must be multipart/form-data).
Transcript stops mid-session with no error in summary markdown.
Retry loop hammers API with the same 40 MB file.
Legal approved cloud, but no cloud_api_used row in receipt JSON.

Root causes (check in order)

Full VOD uploaded as one file: exceeds 25 MB after extract.
High-bitrate extract: stereo 48 kHz 16-bit WAV runs about 11.5 MB per minute and blows the cap in under three minutes.
Wrong container: raw MKV sent instead of a compressed audio segment.
Missing segment loop: script assumes one openai.audio.transcriptions.create call per session.
No merge step: partial JSON files are never concatenated with time offsets.
413 retry without resize: backoff resends the same payload.
Consent gap: cloud used when the README says local-only (a process issue, not an HTTP one).
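
The size-related causes above come down to arithmetic for uncompressed PCM. The sketch below assumes 16-bit samples by default, a 44-byte canonical WAV header, and a cap of 25 × 1024 × 1024 bytes; the exact byte value of the cap is an assumption, so check the current API documentation. The function names are illustrative.

```python
WAV_HEADER_BYTES = 44          # canonical PCM header; ffmpeg may write a slightly larger one
CAP_BYTES = 25 * 1024 * 1024   # assumed byte value of the 25 MB cap


def pcm_wav_bytes(seconds: float, sample_rate: int = 16000,
                  channels: int = 1, bits: int = 16) -> int:
    """Approximate size of an uncompressed PCM WAV file."""
    bytes_per_second = sample_rate * channels * (bits // 8)
    return WAV_HEADER_BYTES + int(seconds * bytes_per_second)


def max_seconds_under(cap_bytes: int, sample_rate: int = 16000,
                      channels: int = 1, bits: int = 16) -> int:
    """Longest whole-second duration whose WAV stays at or under cap_bytes."""
    bytes_per_second = sample_rate * channels * (bits // 8)
    return (cap_bytes - WAV_HEADER_BYTES) // bytes_per_second


if __name__ == "__main__":
    for label, secs in [("10 min mono 16 kHz", 600), ("15 min mono 16 kHz", 900)]:
        print(label, round(pcm_wav_bytes(secs) / 1e6, 1), "MB")
    print("stereo 48 kHz fits for", max_seconds_under(CAP_BYTES, 48000, 2), "seconds")
```

Under these assumptions a mono 16 kHz segment fits for a little over 13 minutes, which is why 10 minutes is a comfortable duration for WAV and 15 minutes is not.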

Beginner path (first 30 minutes)

Prerequisites: ffmpeg on PATH, OpenAI API key in env var (not committed), one playtest clip under playtest-vod/inbox/, consent README allows cloud if you upload.

Extract audio: ffmpeg -i session.mkv -vn -ac 1 -ar 16000 -c:a pcm_s16le session.wav
Check size: if the WAV is over 25 MB, you must segment it. Continue with Step 2 below (Step 1 repeats the extract you just ran).
Cut one 10-minute test segment and upload only that—confirm 200 response.
If test passes, run the segment loop on the full WAV.

Common mistake: Uploading the MKV video—always extract audio first; video inflates size and wastes quota.

Fastest safe fix path

Step 1 — Normalize audio (mono 16 kHz)

ffmpeg -i "playtest-vod/inbox/session_2026-05-24.mkv" `
  -vn -ac 1 -ar 16000 -c:a pcm_s16le `
  "playtest-vod/work/session_2026-05-24.wav"

Outbound: ffmpeg documentation, OpenAI speech-to-text guide.

Step 2 — Segment under upload cap (10–15 min default)

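Fixed-duration split (600 s = 10 min). Mono 16 kHz 16-bit PCM is about 1.9 MB per minute, so a 10-minute segment is about 19 MB, while a 15-minute (900 s) WAV segment would be about 29 MB and exceed the cap:

$segmentSec = 600
ffmpeg -i "playtest-vod/work/session_2026-05-24.wav" `
  -f segment -segment_time $segmentSec -reset_timestamps 1 `
  "playtest-vod/work/seg_%03d.wav"

Pass: Each seg_*.wav is under 25 MB (Get-Item "playtest-vod/work/seg_*.wav" | Select-Object Name, Length).
Fail: Still over cap → confirm the extract really is mono 16 kHz, lower $segmentSec further, or export MP3 at 64–128 kbps for the API-only lane (MP3 segments can run 15 minutes and stay under the cap).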

Optional: Silero VAD splits at silence—see chunking resource list.
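
Before running the split, it helps to know how many segments to expect, since that number later has to match chunk_count in the receipt. A minimal sketch with illustrative function names; ffmpeg may cut slightly off the nominal boundary, so treat the result as an estimate to compare against the actual file count.

```python
import math


def expected_chunk_count(total_seconds: float, segment_seconds: int) -> int:
    """Number of files a fixed-duration split should produce."""
    if total_seconds <= 0 or segment_seconds <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(total_seconds / segment_seconds)


def last_chunk_seconds(total_seconds: float, segment_seconds: int) -> float:
    """Length of the final, usually shorter, segment."""
    remainder = total_seconds % segment_seconds
    return remainder if remainder else float(segment_seconds)


if __name__ == "__main__":
    # A 90-minute session is 5400 s.
    print(expected_chunk_count(5400, 600), expected_chunk_count(5400, 900))
```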

Step 3 — Upload segments with verbose_json

Python (OpenAI SDK v1+):

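import json
import time
from pathlib import Path

import openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
segments_dir = Path("playtest-vod/work")
out_path = Path("playtest-vod/out/merged.json")
offset = 0.0  # seconds of audio covered by earlier segments
merged = []

# Zero-padded names (seg_000.wav, seg_001.wav, ...) sort in playback order.
for wav in sorted(segments_dir.glob("seg_*.wav")):
    size_mb = wav.stat().st_size / (1024 * 1024)
    if size_mb > 24:
        raise RuntimeError(f"segment too large: {wav} ({size_mb:.1f} MB)")

    for attempt in range(4):
        try:
            with wav.open("rb") as f:
                resp = client.audio.transcriptions.create(
                    model="whisper-1",
                    file=f,
                    response_format="verbose_json",
                )
            break
        except openai.APIStatusError as e:
            # A 413 cannot succeed on retry: the payload itself is too large.
            if e.status_code == 413:
                raise RuntimeError(
                    f"413 on {wav}: re-split with a shorter segment_time"
                ) from e
            # Back off only on rate limits and server errors, up to 3 retries.
            if (e.status_code == 429 or e.status_code >= 500) and attempt < 3:
                time.sleep(2 ** attempt)
                continue
            raise

    # Shift this file's timestamps by the audio that came before it.
    for seg in resp.segments or []:
        merged.append({
            "start": seg.start + offset,
            "end": seg.end + offset,
            "text": seg.text,
        })
    offset += float(resp.duration)

# Write the merged transcript as merged.json under playtest-vod/out/.
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")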

API reference: Create transcription.
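
The offset arithmetic in the loop above is easier to test when it is separated from the network call. The sketch below assumes each chunk has already been reduced to a plain dict with a `duration` and a list of `segments`; that intermediate shape and the function name are assumptions for illustration, not the SDK's response type.

```python
def merge_chunks(chunks: list) -> list:
    """Merge per-chunk segments into one timeline.

    chunks must be in playback order; each is
    {"duration": float, "segments": [{"start", "end", "text"}, ...]}.
    """
    merged = []
    offset = 0.0
    for chunk in chunks:
        for seg in chunk["segments"]:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
        # Advance by the chunk's full duration, including trailing silence.
        offset += float(chunk["duration"])
    return merged


if __name__ == "__main__":
    demo = [
        {"duration": 600.0, "segments": [{"start": 2.0, "end": 5.0, "text": "first"}]},
        {"duration": 600.0, "segments": [{"start": 1.5, "end": 4.0, "text": "second"}]},
    ]
    for row in merge_chunks(demo):
        print(row)
```

Advancing by the chunk duration rather than by the last segment's `end` keeps later chunks aligned when a chunk ends in silence.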

Step 4 — Write merged transcript + receipt

{
  "schema": "playtest_vod_triage_receipt_v1",
  "batch_date": "2026-05-24",
  "build_label": "fest-demo-2026-05-24-rc2",
  "surface": "playtest",
  "cloud_api_used": true,
  "chunk_count": 9,
  "merge_ok": true,
  "segment_duration_sec": 600,
  "gates": { "T3_transcript": true, "T6_receipt": true }
}

Attach merged .json or summary.md with build_id and surface per playtest isolation playbook.
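
Deriving the receipt from the files on disk, rather than typing it by hand, keeps chunk_count honest. The sketch below is one way to do that. It treats merge_ok as "at least one segment file exists and the merged start times are monotonic", which is an assumption about what the flag should mean. It leaves out the gates field, because that is set by the rest of the triage pipeline. The function name is illustrative.

```python
import json
from pathlib import Path


def build_receipt(work_dir: Path, merged: list, batch_date: str,
                  build_label: str, segment_duration_sec: int,
                  surface: str = "playtest") -> dict:
    """Build the cloud-lane receipt fields from actual artifacts."""
    chunk_count = len(list(work_dir.glob("seg_*.wav")))
    starts = [seg["start"] for seg in merged]
    # Assumed definition of merge_ok: segments exist and the timeline never goes backwards.
    merge_ok = chunk_count > 0 and bool(starts) and starts == sorted(starts)
    return {
        "schema": "playtest_vod_triage_receipt_v1",
        "batch_date": batch_date,
        "build_label": build_label,
        "surface": surface,
        "cloud_api_used": True,
        "chunk_count": chunk_count,
        "merge_ok": merge_ok,
        "segment_duration_sec": segment_duration_sec,
    }


if __name__ == "__main__":
    receipt = build_receipt(Path("playtest-vod/work"), [], "2026-05-24",
                            "fest-demo-2026-05-24-rc2", 600)
    print(json.dumps(receipt, indent=2))
```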

Step 5 — Local fallback when cloud blocked or 413 persists

If README forbids cloud or segments still fail after resize:

Route files to local faster-whisper batch.
Set "cloud_api_used": false and "device_used": "cuda" or "cpu" in receipt.
Do not upload raw VOD to any SaaS “because API failed once.”
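
The routing rule above is small enough to encode directly, so the nightly script cannot drift into the cloud lane by accident. A minimal sketch: how consent is read from the README is left to the caller, and the function and parameter names are assumptions.

```python
def choose_lane(consent_allows_cloud: bool, cloud_failed_after_resize: bool,
                local_device: str = "cpu") -> dict:
    """Return the receipt fields for the lane this batch should use."""
    if local_device not in ("cuda", "cpu"):
        raise ValueError("local_device must be 'cuda' or 'cpu'")
    if consent_allows_cloud and not cloud_failed_after_resize:
        return {"cloud_api_used": True}
    # Consent denied, or segments still fail after resizing: stay local.
    return {"cloud_api_used": False, "device_used": local_device}


if __name__ == "__main__":
    print(choose_lane(consent_allows_cloud=False, cloud_failed_after_resize=False))
```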

Working dev path (proof table)

Check	Artifact	Pass signal
Segment size	`Get-Item seg_*.wav`	All < 24 MB
HTTP status	batch.log	No 413 lines
Merge order	`merged.json`	Monotonic `start` times
Receipt	`playtest_vod_triage_receipt_v1.json`	`merge_ok: true`, `chunk_count` matches files
Consent	playtest README	Cloud row only if allowed
Surface tag	summary.md header	`surface=playtest` or `fest_public`

Verification checklist

[ ] 90-minute test VOD completes with ordered segments and no 413 in logs.
[ ] Merged transcript references timestamps past 60:00 (proves offset math).
[ ] chunk_count in receipt matches seg_*.wav file count.
[ ] Retry policy tested: backoff applies to 429 and 5xx only, retries are capped, and a 413 never resends the same file.
[ ] Wednesday smoke row vod_triage_ok updated when batch green.
[ ] Local fallback path documented when consent denies cloud.
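
Several of the checklist items can be checked mechanically once the batch finishes. The sketch below covers merge order, the 60:00 timestamp proof, and the chunk_count match. The function name and the in-memory shapes of `merged` and `receipt` are assumptions that mirror the examples in this article.

```python
def verify_batch(merged: list, receipt: dict, segment_file_count: int,
                 min_span_seconds: float = 3600.0) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    starts = [seg["start"] for seg in merged]
    if starts != sorted(starts):
        problems.append("start times are not monotonic")
    if not starts or max(starts) <= min_span_seconds:
        problems.append("no timestamp past 60:00, offset math unproven")
    if receipt.get("chunk_count") != segment_file_count:
        problems.append("chunk_count does not match segment file count")
    if receipt.get("merge_ok") is not True:
        problems.append("merge_ok is not true")
    return problems


if __name__ == "__main__":
    merged = [{"start": 10.0}, {"start": 3700.0}]
    receipt = {"chunk_count": 9, "merge_ok": True}
    print(verify_batch(merged, receipt, 9) or "all checks passed")
```

For sessions shorter than an hour, pass a smaller `min_span_seconds`; the 3600 default matches the 90-minute test VOD.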

Prevention

Default local per triage blog; cloud is opt-in per consent README.
Pre-flight script: reject any upload file > 24 MB before HTTP call.
Pin segment_duration_sec in repo config—do not tune by hand each night.
Log segment index + file size on every API call.
CI smoke: 30 s clip via API; separate job for 12-minute synthetic WAV near cap.
Tag transcripts with surface before merging into fest fix lists.
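
The pre-flight and logging items above fit in one helper that runs before every HTTP call. A minimal sketch: the 24 MB limit comes from the list above, while the audio-only allow-list, the log line format, and the function name are assumptions for this pipeline rather than the API's own rules.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 24 * 1024 * 1024           # pre-flight limit, below the API cap
ALLOWED_SUFFIXES = {".wav", ".mp3", ".m4a"}   # assumed allow-list: extracted audio only


def preflight(path: Path, index: int) -> str:
    """Validate one segment before upload and return a line for batch.log."""
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"segment {index}: {path.name} is not an extracted audio file")
    size = path.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"segment {index}: {path.name} is {size} bytes, over the pre-flight limit"
        )
    return f"segment={index} file={path.name} bytes={size}"
```

Calling `preflight(wav, i)` at the top of the upload loop and appending the returned line to batch.log gives the segment index and size for every request.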

Troubleshooting

Symptom	Fix
413 on chunk 1 only	Wrong file (MKV); re-extract mono 16 kHz
413 on all chunks	Bitrate too high; use MP3 64k or shorter `segment_time`
200 but empty text	Silent segment; trim with VAD or skip dead air
Duplicated or out-of-order paragraphs	Merge missing `offset`, so every chunk restarts at 0 s; add the `resp.duration` of all earlier chunks to each `start` and `end`
401 / 403	API key env—not 413; fix auth first
Rate limit 429	Backoff; reduce parallel uploads to 1
Partial files in out/	Crash mid-loop; resume from last `seg_N` index
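
Resuming after a mid-loop crash needs a record of which segments already finished. The sketch below assumes the loop saves one JSON file per completed segment, named after the segment (seg_000.json for seg_000.wav). That per-segment file layout is hypothetical and not something the script in this article writes as shown.

```python
from pathlib import Path


def pending_segments(work_dir: Path, out_dir: Path) -> list:
    """Return segment WAVs, in playback order, that have no saved result yet."""
    done = {p.stem for p in out_dir.glob("seg_*.json")}
    return [p for p in sorted(work_dir.glob("seg_*.wav")) if p.stem not in done]


if __name__ == "__main__":
    for wav in pending_segments(Path("playtest-vod/work"), Path("playtest-vod/out")):
        print("still to upload:", wav.name)
```

A resumed run also has to rebuild the time offset from the durations of the finished segments before it merges anything new.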

FAQ

Is 413 the same as a rate limit?
No—413 is body size. 429 is rate. See MDN 413.

Should I use MP3 or WAV for API segments?
WAV is fine if each segment is under the cap. MP3 reduces size when you need longer segments per request.
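
To see how much longer an MP3 segment can run, estimate its size from the bitrate. The sketch assumes constant-bitrate encoding, ignores tag and container overhead, and takes the cap as 25 × 1024 × 1024 bytes (an assumption to verify against the current API documentation). The function names are illustrative.

```python
CAP_BYTES = 25 * 1024 * 1024  # assumed byte value of the 25 MB cap


def cbr_bytes(seconds: float, kbps: int) -> int:
    """Approximate size of a constant-bitrate audio file."""
    return int(seconds * kbps * 1000 / 8)


def max_minutes_under(cap_bytes: int, kbps: int) -> float:
    """Longest duration in minutes that stays under cap_bytes at this bitrate."""
    return cap_bytes * 8 / (kbps * 1000) / 60


if __name__ == "__main__":
    for kbps in (64, 128):
        print(kbps, "kbps:", round(cbr_bytes(900, kbps) / 1e6, 1), "MB per 15 min,",
              round(max_minutes_under(CAP_BYTES, kbps), 1), "min max")
```

Under these assumptions a 15-minute segment is about 7 MB at 64 kbps and about 14 MB at 128 kbps, well under the cap in both cases.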

Can I send video to the API?
Extract audio first—video wastes cap and often triggers 413.

Local Whisper instead?
Yes—preferred when NDAs restrict upload. See CUDA local batch help.

Does this replace the triage blog pipeline?
No—it extends the optional cloud row. The blog’s T1–T6 gates still apply; add chunk_count when cloud_api_used is true.