Your First OBS Replay Buffer ffmpeg Concat Before Whisper Playtest Batch in One Evening - 2026 Beginner Pipeline
You dropped twelve OBS Replay Buffer MKV files into playtest-vod/inbox/. Each clip plays fine alone. You ran ffmpeg -f concat and got Non-monotonous DTS, a 0-byte output, or audio that drifts two seconds by minute four. You gave up and transcribed clips one-by-one—then wondered why your local Whisper pipeline batch script skipped half the folder.
In June–July 2026, facilitators batch Replay Buffer saves after Discord playtests. The failure is almost never Whisper; it is merge discipline before ASR. This beginner-first pipeline covers the evening between capture and transcription: lock OBS fragment settings, ffprobe every fragment, normalize to one audio format, concat with proof, set concat_ok in playtest_vod_triage_receipt_v1.json, then hand off to Whisper.
Scope note: the OBS zero-duration audio help covers missing audio tracks; this page covers timestamp and concat failures when audio exists. Whisper API 413 chunking covers cloud size limits, not local MKV merges. A deeper fix guide, OBS MKV fragments ffmpeg concat help, is planned as the companion to this tutorial.
Pair this with the 15 Free Local Whisper and ffmpeg tools list, the 18 playtest feedback tools list, the playtest isolation playbook, and BUILD_RECEIPT so every batch carries a build_id.
Who this is for and what you get
| Audience | You will be able to… |
|---|---|
| First-time playtest facilitator | Merge a session's Replay Buffer clips into one Whisper-ready file |
| Solo dev | Stop losing Tuesday clips to concat errors |
| Producer | Require concat_ok: true before triage standup |
Time: one evening (~90 minutes first setup; 20 minutes per playtest session after).
Prerequisites: OBS Studio with Replay Buffer enabled, ffmpeg and ffprobe on PATH, and the empty playtest-vod/inbox/ folder convention from the Whisper pipeline blog.
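Before the evening starts, a short Python check can confirm the PATH prerequisite. This is a sketch: `missing_tools` is a hypothetical helper name, and it only checks that the executables can be found, not which versions they are.

```python
import shutil


def missing_tools(names=("ffmpeg", "ffprobe")):
    """Return the tools that are not found on PATH (an empty list means ready)."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("ready" if not missing else "missing on PATH: " + ", ".join(missing))
```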
Why this matters now (June–July 2026)
- Replay Buffer default — Community playtest ops recommend Replay Buffer; facilitators produce many small MKVs, not one MP4.
- Whisper batch scripts — The local VOD triage blog assumes one audio file per session; concat is the missing middle step.
- DTS gaps — Mixed sample rates and non-monotonic timestamps explode naive concat—beginners blame Whisper.
- Consent and cost — Merged local file avoids re-uploading twelve fragments to a cloud API.
- October volume — Fixing merge in July prevents triage collapse when fest playtests multiply.
Direct answer: fragments/ → per-file ffprobe log → normalize to 48 kHz stereo WAV segments → concat to session_merged.wav → concat_ok in receipt → Whisper once.
Evening overview (four blocks)
| Block | Minutes | Output |
|---|---|---|
| 1 — OBS profile lock | 20 | obs-replay-profile.md with buffer seconds + tracks |
| 2 — Fragment intake + O1–O2 | 25 | Renamed clips + ffprobe_table.csv |
| 3 — Normalize + concat O3–O5 | 35 | session_merged.wav + concat log |
| 4 — Receipt + Whisper handoff O6 | 10 | playtest_vod_triage_receipt_v1.json with concat_ok |
Mental model — three layers
| Layer | Tool | Proves |
|---|---|---|
| Capture | OBS Replay Buffer | Last N seconds saved on hotkey |
| Merge | ffmpeg (this article) | One timeline-safe audio file |
| Understand | Whisper | Searchable text for triage |
Skipping the merge and running Whisper per clip works for three files; at twelve it breaks down operationally, because nothing ties the transcripts into one build_id session.
Block 1 — OBS Replay Buffer profile lock
Document once in playtest-vod/obs-replay-profile.md:
| Setting | Recommended | Why |
|---|---|---|
| Format | MKV | Default; supports separate tracks |
| Replay Buffer | 120–180 s | Enough context; not huge files |
| Audio tracks | Desktop + Mic (if used) | Zero-duration audio if tracks wrong |
| Filename pattern | replay_<build_id>_%CCYY-%MM-%DD_%hh-%mm-%ss | Sortable; OBS has no build_id variable, so type it into the filename prefix (see naming below) |
| Output path | playtest-vod/inbox/ | Matches triage blog |
Hotkey discipline: facilitators say the build_id aloud or type it into an overlay when saving, so each capture matches the build_id checked in the Thursday row review.
Outbound: OBS Replay Buffer documentation (official KB).
Naming fragments (beginner rule)
playtest-vod/inbox/
2026-05-25_session-rc4/
001_replay_2026-05-25_19-02-11.mkv
002_replay_2026-05-25_19-14-33.mkv
...
Rules:
- One folder per playtest session (same build_id).
- Three-digit prefix enforces sort order; never rely on filesystem mtime.
- No spaces in filenames (Windows shell scripts thank you).
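The numbering rule can be applied with a small rename pass. A minimal sketch, assuming the OBS filenames embed a sortable date-time stamp (OBS's default `%CCYY-%MM-%DD %hh-%mm-%ss` format does); `prefix_fragments` is a hypothetical helper, and `dry_run=True` returns the plan without renaming anything.

```python
import re
from pathlib import Path

PREFIX = re.compile(r"^\d{3}_")


def prefix_fragments(session, dry_run=False):
    """Give unprefixed .mkv clips a 001_, 002_, ... prefix in filename order.

    Sorting is by name (the embedded timestamp), never by mtime.
    Spaces become underscores so shell scripts stay simple.
    """
    session = Path(session)
    clips = sorted(session.glob("*.mkv"), key=lambda p: p.name)
    # Continue numbering after any clips that already carry a prefix.
    n = max((int(p.name[:3]) for p in clips if PREFIX.match(p.name)), default=0)
    plan = []
    for clip in clips:
        if PREFIX.match(clip.name):
            continue
        n += 1
        new_name = f"{n:03d}_{clip.name.replace(' ', '_')}"
        plan.append((clip.name, new_name))
        if not dry_run:
            clip.rename(session / new_name)
    return plan
```

Run it once per session folder, for example `prefix_fragments("playtest-vod/inbox/2026-05-25_session-rc4", dry_run=True)` to preview, then again without `dry_run`.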
Gates O1–O6 (concat pass)
| Gate | Name | Pass criterion |
|---|---|---|
| O1 | Audio present | Each fragment: ffprobe shows audio stream, duration greater than 0 |
| O2 | Sample rate known | sample_rate and channels logged per file |
| O3 | Normalized segments | Each fragment converted to norm_XXX.wav same rate/channels |
| O4 | Concat output | session_merged.wav exists and duration ≈ sum of inputs ± 2 s |
| O5 | Listen smoke | No speed-up chipmunk, no long silence gaps mid-file |
| O6 | Receipt | concat_ok: true in playtest_vod_triage_receipt_v1.json |
Any O1–O5 gate that is RED blocks Whisper. Fix the merge before spending ASR time.
Block 2 — ffprobe every fragment (O1–O2)
From session folder:
cd playtest-vod/inbox/2026-05-25_session-rc4
for f in *.mkv; do
echo "=== $f ==="
ffprobe -hide_banner -show_streams -select_streams a:0 "$f"
done > ffprobe_log.txt
Build ffprobe_table.csv:
file,audio_codec,sample_rate,channels,duration_sec,pass_o1
001_replay....mkv,aac,48000,2,125.4,yes
002_replay....mkv,aac,44100,2,118.2,yes
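Filling ffprobe_table.csv by hand gets tedious at twelve clips. A sketch that asks ffprobe for JSON instead of the text log; the helper names are hypothetical, and this `-show_entries` selection is one reasonable choice, not the only one. Duration is read from the container (`format=duration`) because MKV audio streams often do not report their own duration.

```python
import csv
import json
import subprocess
from pathlib import Path

FIELDS = ["file", "audio_codec", "sample_rate", "channels", "duration_sec", "pass_o1"]


def probe(path):
    """Return ffprobe JSON for the first audio stream plus the container duration.

    An unreadable file yields {}, which row_from_probe marks as an O1 fail.
    """
    cmd = ["ffprobe", "-v", "error", "-select_streams", "a:0",
           "-show_entries", "stream=codec_name,sample_rate,channels:format=duration",
           "-of", "json", str(path)]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError:
        return {}
    return json.loads(result.stdout or "{}")


def row_from_probe(name, info):
    """Turn one ffprobe JSON result into an ffprobe_table.csv row (O1 and O2)."""
    streams = info.get("streams") or []
    audio = streams[0] if streams else {}
    try:
        duration = float(info.get("format", {}).get("duration"))
    except (TypeError, ValueError):
        duration = 0.0
    return {
        "file": name,
        "audio_codec": audio.get("codec_name", ""),
        "sample_rate": audio.get("sample_rate", ""),
        "channels": audio.get("channels", ""),
        "duration_sec": round(duration, 1),
        "pass_o1": "yes" if audio and duration > 0 else "no",
    }


def build_table(session):
    """Probe every .mkv in the session folder and write ffprobe_table.csv."""
    session = Path(session)
    table = session / "ffprobe_table.csv"
    with table.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for mkv in sorted(session.glob("*.mkv")):
            writer.writerow(row_from_probe(mkv.name, probe(mkv)))
    return table
```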
| O1 fail signal | Likely cause | Fix pointer |
|---|---|---|
| No audio stream | OBS track matrix | Zero-duration audio help |
| duration=0 | Corrupt save / disk full | Re-capture |
| Mixed 44100 and 48000 | Different OBS sessions | Normalize in O3 (not a fail if O3 passes) |
Block 3 — Normalize then concat (O3–O5)
Do not concat raw MKVs when sample rates differ. Normalize first.
Step A — Normalize each fragment to WAV
mkdir -p norm
i=1
for f in *.mkv; do   # bash expands the glob in sorted order, no ls parsing needed
out=$(printf "norm/norm_%03d.wav" "$i")
ffmpeg -y -i "$f" -vn -ac 2 -ar 48000 -c:a pcm_s16le "$out"
i=$((i+1))
done
Beginner checks:
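- -vn drops video; Whisper only needs audio.
- 48000 Hz stereo is a stable interchange format. Whisper works on 16 kHz audio, and the extract step in the triage blog can resample once.
- If OBS writes Desktop and Mic to separate tracks, ffmpeg's automatic stream selection keeps only one audio stream. Run ffprobe without -select_streams to see every track, then map or mix the tracks on purpose.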
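The worked example below excludes a fragment that fails O1, while the bash loop above normalizes everything it finds. A sketch that reads ffprobe_table.csv and plans one ffmpeg command per passing fragment, so the norm_ numbering has no holes; `plan_normalize` is a hypothetical name, and the CSV columns are the ones shown in Block 2.

```python
import csv
from pathlib import Path


def plan_normalize(table, rate=48000, channels=2):
    """Return one ffmpeg argv per fragment that passed O1, in filename order.

    Outputs are numbered norm_001.wav, norm_002.wav, ... without gaps,
    so a fragment that failed O1 is simply left out of the merge.
    """
    table = Path(table)
    session = table.parent
    with table.open(newline="", encoding="utf-8") as fh:
        kept = sorted(r["file"] for r in csv.DictReader(fh) if r["pass_o1"] == "yes")
    commands = []
    for i, name in enumerate(kept, 1):
        out = session / "norm" / f"norm_{i:03d}.wav"
        commands.append([
            "ffmpeg", "-y", "-i", str(session / name), "-vn",
            "-ac", str(channels), "-ar", str(rate), "-c:a", "pcm_s16le", str(out),
        ])
    return commands
```

Create norm/ first, then execute each planned command with `subprocess.run(argv, check=True)`.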
Step B — Concat demuxer list file
cd norm
ls -1 norm_*.wav | sort | sed "s/^/file '/;s/$/'/" > concat_list.txt
ffmpeg -y -f concat -safe 0 -i concat_list.txt -c copy ../session_merged.wav 2>&1 | tee ../concat_log.txt
If -c copy fails with DTS errors, re-encode once:
ffmpeg -y -f concat -safe 0 -i concat_list.txt -ac 2 -ar 48000 -c:a pcm_s16le ../session_merged.wav
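The `sed` one-liner breaks if a filename ever contains a single quote. A Python sketch of the same list file that quotes each entry the way the concat demuxer expects (a literal `'` inside single quotes is written as `'\''`); the helper names are hypothetical.

```python
from pathlib import Path


def concat_line(name):
    """Quote one path for the concat demuxer; a literal ' becomes '\\''."""
    return "file '" + name.replace("'", "'\\''") + "'"


def write_concat_list(norm_dir):
    """Write concat_list.txt next to the WAVs, in norm_001, norm_002, ... order.

    Entries are bare filenames: the concat demuxer resolves relative
    paths against the directory that holds the list file.
    """
    norm_dir = Path(norm_dir)
    wavs = sorted(p.name for p in norm_dir.glob("norm_*.wav"))
    if not wavs:
        # An empty list is what produces the 0-byte merged file.
        raise FileNotFoundError(f"no norm_*.wav in {norm_dir}")
    listing = norm_dir / "concat_list.txt"
    listing.write_text("".join(concat_line(w) + "\n" for w in wavs), encoding="utf-8")
    return listing
```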
Step C — Duration proof (O4)
cd ..
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 session_merged.wav
Sum duration_sec from the CSV for the fragments you actually normalized; a merged duration within ±2 seconds of that sum passes O4. Larger drift means a missing fragment or a double count, so re-run the inventory.
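The O4 arithmetic as a sketch: sum only the rows that were actually normalized, then compare against ffprobe's reading of the merged file. The helper names are hypothetical; the ±2 s tolerance is the gate from the table above.

```python
import subprocess


def merged_duration(path):
    """Read the container duration of the merged WAV via ffprobe, in seconds."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


def o4_pass(fragment_durations, merged_sec, tolerance_sec=2.0):
    """O4: the merged length must be within ±tolerance of the fragments that went in."""
    expected = sum(fragment_durations)
    return abs(merged_sec - expected) <= tolerance_sec
```

For example, `o4_pass(durations_of_passing_rows, merged_duration("session_merged.wav"))`.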
Step D — Listen smoke (O5)
Put on headphones and skim the start, middle, and end of session_merged.wav. A chipmunk voice means a wrong sample rate was assumed. Long dead air usually means a hotkey save captured an idle menu; that is not a concat failure, but tag it low priority for triage.
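Listening stays the O5 gate, but a scan for long near-silent stretches tells you where to drop the needle. A standard-library sketch, assuming the 16-bit PCM WAV produced above; `silent_gaps`, the 30 s minimum gap, and the amplitude threshold of 200 are illustrative choices to tune, not calibrated values.

```python
import array
import sys
import wave


def silent_gaps(path, min_gap_sec=30.0, window_sec=1.0, threshold=200):
    """Return (start_sec, end_sec) spans where every window's peak stays under threshold.

    Expects 16-bit PCM WAV. threshold is a raw sample amplitude out of 32767.
    """
    gaps = []
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        rate, channels = wav.getframerate(), wav.getnchannels()
        frames_per_window = max(1, int(rate * window_sec))
        t, gap_start = 0.0, None
        while True:
            raw = wav.readframes(frames_per_window)
            if not raw:
                break
            samples = array.array("h", raw)
            if sys.byteorder == "big":  # WAV sample data is little-endian
                samples.byteswap()
            peak = max(max(samples), -min(samples))
            if peak < threshold:
                if gap_start is None:
                    gap_start = t
            else:
                if gap_start is not None and t - gap_start >= min_gap_sec:
                    gaps.append((gap_start, t))
                gap_start = None
            t += len(samples) / channels / rate
        # A quiet stretch that runs to the end of the file still counts.
        if gap_start is not None and t - gap_start >= min_gap_sec:
            gaps.append((gap_start, t))
    return gaps
```

Skim each reported span by ear; a gap is a prompt to listen, not an automatic O5 fail.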
Block 4 — Receipt and Whisper handoff (O6)
Extend the triage receipt from the Whisper pipeline blog:
{
"schema": "playtest_vod_triage_receipt_v1",
"build_id": "nextfest-oct-2026-rc4",
"surface": "playtest_invite",
"session_folder": "playtest-vod/inbox/2026-05-25_session-rc4",
"fragment_count": 12,
"concat_ok": true,
"merged_audio": "playtest-vod/inbox/2026-05-25_session-rc4/session_merged.wav",
"merged_duration_sec": 1420.5,
"normalize_profile": "48000_stereo_pcm_s16le",
"gates": {
"O1_audio_present": "pass",
"O2_sample_rate_logged": "pass",
"O3_normalized": "pass",
"O4_duration_proof": "pass",
"O5_listen_smoke": "pass",
"O6_receipt": "pass"
},
"whisper_next": "extract_16k_mono_then_batch",
"notes": "Replay Buffer 120s; facilitator laptop Win11"
}
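Before handing off, a receipt check keeps concat_ok honest. A sketch, assuming the field names in the JSON above; `receipt_problems` is a hypothetical helper that returns reasons to hold Whisper, so an empty list means the batch can run.

```python
import json

REQUIRED_GATES = ("O1_audio_present", "O2_sample_rate_logged", "O3_normalized",
                  "O4_duration_proof", "O5_listen_smoke", "O6_receipt")


def receipt_problems(receipt):
    """List reasons Whisper should not run yet; an empty list means hand off."""
    problems = []
    if receipt.get("schema") != "playtest_vod_triage_receipt_v1":
        problems.append("unexpected schema")
    for field in ("build_id", "surface"):
        if not receipt.get(field):
            problems.append(f"missing {field}")
    gates = receipt.get("gates", {})
    for gate in REQUIRED_GATES:
        if gates.get(gate) != "pass":
            problems.append(f"{gate} is not pass")
    if receipt.get("concat_ok") is not True:
        problems.append("concat_ok is not true")
    return problems


def load_and_check(path):
    """Read a receipt file from disk and return its problems."""
    with open(path, encoding="utf-8") as fh:
        return receipt_problems(json.load(fh))
```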
Only after concat_ok is true, run Whisper extract and transcribe on session_merged.wav (or, if you chose the cloud lane, chunk the merged file per the API 413 help).
Surface and build_id (do not skip)
| Field | Source |
|---|---|
| build_id | In-game build_label or BUILD_RECEIPT |
| surface | playtest_invite vs fest_public per isolation playbook |
A wrong surface corrupts triage boards. That is not a concat bug, but the receipt must be right before standup.
PowerShell variant (Windows facilitators)
$session = "playtest-vod\inbox\2026-05-25_session-rc4"
New-Item -ItemType Directory -Force -Path "$session\norm" | Out-Null
$i = 1
Get-ChildItem "$session\*.mkv" | Sort-Object Name | ForEach-Object {
$out = "{0}\norm\norm_{1:D3}.wav" -f $session, $i
ffmpeg -y -i $_.FullName -vn -ac 2 -ar 48000 -c:a pcm_s16le $out
$i++
}
The concat list and merge commands mirror the bash version; keep paths quoted.
Common concat errors (troubleshooting)
| Error / symptom | Cause | Fix |
|---|---|---|
| Non-monotonous DTS | Raw MKV concat | Normalize to WAV first (O3) |
| Output file 0 bytes | Empty concat list | Sort + concat_list.txt paths |
| Audio faster than video | Wrong -ar on input | Re-normalize from source MKV |
| Whisper timestamps jump | Per-clip transcripts merged without offsets | Transcribe merged file once |
| Fragment 7 silent | OBS saved before game audio | O1 fail; exclude from concat |
| Merged too short | Missing numbered file | Re-check 001–00N sequence |
When not to concat
| Situation | Action |
|---|---|
| Fragments from different build_id | Separate session folders |
| One fragment is 2 h rest-of-stream | Transcribe alone; do not merge with 2 min clips |
| Legal requires per-clip deletion | Keep fragments; document policy |
| Cloud API only | Still normalize; upload segments under 25 MB each |
A planned decision-tree post (backlog) will expand on lane choice; tonight, assume local concat plus local Whisper.
Facilitator README snippet (paste)
## Replay Buffer → Whisper
1. Save clips into `playtest-vod/inbox/YYYY-MM-DD_session-<build_id>/` with 001_ prefix.
2. Run ffprobe table; fix zero-audio before merge.
3. Normalize to 48 kHz WAV → `session_merged.wav`.
4. Set `concat_ok` in playtest_vod_triage_receipt_v1.json.
5. Run Whisper batch only when concat_ok is true.
Link this README from the multi-channel facilitator contract once that post ships.
Integration with weekly ops
| Day | Ritual | Uses merged audio? |
|---|---|---|
| After playtest | This concat pipeline | Creates session_merged.wav |
| Same night | Whisper triage | Yes |
| Wednesday | Demo smoke | No (binary) |
| Thursday | Row review | Receipt only |
Worked example (twelve fragments)
Input: 12 MKVs, 11× ~120 s + 1× 45 s, mixed 44100/48000 from two OBS restarts.
| Step | Result |
|---|---|
| O1 | Fragment 9 fails; it cannot be re-captured, so it is excluded; 11 pass |
| O3 | 11 norm_*.wav |
| O4 | Merged 1337 s vs expected 1338 s (1 s drift, within ±2 s): PASS |
| O5 | No chipmunk |
| Whisper | One transcript; issues tagged with approximate timestamps |
Lesson: excluding the bad fragment beat a blind concat of all twelve.
Python batch helper (optional)
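from pathlib import Path
import json
import subprocess
session = Path("playtest-vod/inbox/2026-05-25_session-rc4")
fragments = sorted(session.glob("*.mkv"))  # 001_, 002_ prefixes set the order
norm = session / "norm"
norm.mkdir(exist_ok=True)
# O3: normalize each fragment to 48 kHz stereo 16-bit PCM WAV
outputs = []
for i, mkv in enumerate(fragments, 1):
    out = norm / f"norm_{i:03d}.wav"
    subprocess.run([
        "ffmpeg", "-y", "-i", str(mkv), "-vn",
        "-ac", "2", "-ar", "48000", "-c:a", "pcm_s16le", str(out)
    ], check=True)
    outputs.append(out)
# Concat list holds bare names; the demuxer resolves them next to the list file
concat_list = norm / "concat_list.txt"
concat_list.write_text("".join(f"file '{p.name}'\n" for p in outputs), encoding="utf-8")
merged = session / "session_merged.wav"
subprocess.run([
    "ffmpeg", "-y", "-f", "concat", "-safe", "0",
    "-i", str(concat_list), "-c", "copy", str(merged)
], check=True)
# concat_ok stays false until a human passes O4 duration proof and O5 listen smoke
receipt = {
    "schema": "playtest_vod_triage_receipt_v1",
    "concat_ok": False,
    "fragment_count": len(fragments),
    "merged_audio": str(merged),
}
(session / "playtest_vod_triage_receipt_v1.json").write_text(
    json.dumps(receipt, indent=2), encoding="utf-8"
)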
Automate only after one manual GREEN evening; scripts should not hide the O5 listen smoke.
Privacy and retention
- Merged WAV still contains player voice—same consent as triage blog.
- Delete norm/ intermediates after Whisper if disk is tight; keep the receipt and transcript.
- Do not upload session_merged.wav to public issue trackers.
Outbound references
- ffmpeg concat demuxer — official concat docs
- Whisper GitHub — model sizes for batch after merge
Related GamineAI reads
- Local Whisper playtest VOD triage
- OBS zero-duration audio help
- 15 Free Local Whisper ffmpeg tools
- 18 playtest feedback tools
- Playtest isolation playbook
- Thursday BUILD_RECEIPT row review
- CUDA silent CPU fallback help
Key takeaways
- Replay Buffer produces many MKVs—merge before Whisper, not twelve separate ASR jobs without a plan.
- Run O1–O6: ffprobe audio, log rates, normalize to 48 kHz WAV, concat, duration proof, listen smoke, receipt.
- Set concat_ok: true on playtest_vod_triage_receipt_v1.json before batch transcription.
- Normalizing before concat fixes most Non-monotonous DTS errors that beginners blame on Whisper.
- Numbered filenames (001_, 002_) beat sorting by clock or mtime.
- One session folder per build_id; never merge fragments across builds.
- Pair with the zero-duration audio help when O1 fails.
- ~90 minutes first evening; ~20 minutes per session once profile is locked.
- Deeper fixes will live in the planned MKV concat help.
- The 16-tool concat prep listicle bookmarks tools; this page is the hands-on beginner pipeline.
FAQ
Why not concat MKV directly?
Different codecs, sample rates, B-frames, and DTS timelines across hotkey saves break naive concat. Normalizing to WAV is boring and reliable.
Does this replace the Whisper pipeline blog?
No. That blog owns extract → transcribe → triage. This blog owns OBS fragments → session_merged.wav.
What if only three clips exist?
You may skip concat and transcribe per clip, but still run O1 and use the same receipt schema, with concat_ok: true, fragment_count: 3, and a note that you ran in per-file mode.
Should I use MP4 instead of MKV in OBS?
MKV is fine if you normalize. MP4 does not remove the need for O3 when rates differ.
Cloud Whisper after concat?
Yes—chunk session_merged.wav under API limits. Local-first teams stay on local Whisper resources.
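Chunk sizes follow from PCM arithmetic: bytes per second equal sample rate × channels × bytes per sample. 48 kHz stereo 16-bit WAV is 192,000 bytes/s, so a 25 MB cap (treated here as 25,000,000 bytes, an assumption to check against your provider) holds about 130 s; 16 kHz mono is 32,000 bytes/s, about 781 s. A sketch with a safety margin that cuts chunks with ffmpeg's segment muxer; the helper names, the 0.95 margin, and the chunk_%03d.wav pattern are illustrative, and segment boundaries can split a sentence because there is no overlap.

```python
WAV_HEADER_BYTES = 44  # canonical PCM WAV header; real headers can be larger


def max_chunk_seconds(limit_bytes, rate, channels, bytes_per_sample=2, margin=0.95):
    """Longest whole-second PCM WAV chunk that stays under limit_bytes.

    margin leaves headroom for larger headers and rounding.
    """
    bytes_per_sec = rate * channels * bytes_per_sample
    return int((limit_bytes * margin - WAV_HEADER_BYTES) // bytes_per_sec)


def chunk_command(src, seconds, pattern="chunk_%03d.wav"):
    """ffmpeg argv: downmix to 16 kHz mono, then split with the segment muxer."""
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", "16000",
            "-c:a", "pcm_s16le", "-f", "segment", "-segment_time", str(seconds), pattern]
```

Because `chunk_command` outputs 16 kHz mono, size its chunks with `max_chunk_seconds(limit, 16000, 1)`, not the 48 kHz stereo figure.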
How does this relate to Thursday row review?
Row review diffs BUILD_RECEIPT rows; concat receipt proves triage inputs were merged correctly for that build_id.
What sample rate for Whisper?
This pipeline uses a 48 kHz interchange; the triage blog often resamples to 16 kHz mono at extract. That is one resample step for the merged session, not one per fragment.
Batch folder layout (release evidence)
Archive proof beside BUILD_RECEIPT when partners ask how playtest feedback was captured:
release-evidence/
06-playtest-vod/
2026-05-25_session-rc4/
ffprobe_table.csv
concat_log.txt
playtest_vod_triage_receipt_v1.json
session_merged.wav # optional archive; may delete after transcript
transcript/
session_merged.txt
Producer rule: Standup slide shows concat_ok, fragment_count, and top three issue titles from transcript—not raw MKV paths.
Silero VAD pre-check (optional O5b)
Before Whisper, run a ten-second VAD sanity check on session_merged.wav if facilitators were silent during menu captures:
# illustrative: your VAD tool prints speech spans
# fail O5b if zero speech spans but duration > 600s
This catches merged menu captures with no commentary. It is not a concat failure, but it saves ASR minutes. Link the Silero docs when tools listicle #2 ships.
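The O5b rule itself is small enough to encode once, whichever VAD produces the spans. A sketch: `o5b_pass` is a hypothetical helper, and the spans are assumed to be a list of (start, end) pairs from your VAD of choice (for example, speech timestamps reported by Silero VAD).

```python
def o5b_pass(speech_spans, duration_sec, long_session_sec=600.0):
    """O5b: a merged file longer than long_session_sec with zero speech spans fails.

    Only the number of spans matters here; their format is up to your VAD.
    """
    return not (len(speech_spans) == 0 and duration_sec > long_session_sec)
```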
Compare lanes — concat merge vs per-clip ASR
| Approach | Pros | Cons |
|---|---|---|
| Merged + one Whisper | One timeline; simpler issue titles | One bad fragment excluded manually |
| Per-clip Whisper | Isolates corrupt MKV | Twelve transcripts; offset math hell |
| Cloud API per clip | No local GPU | Cost; consent; 413 on long clips |
Tonight's pipeline uses merged audio plus one Whisper run, aimed at facilitators with 6–20 clips per session.
Steam Playtest night checklist (facilitator)
| Before session | During | After (this pipeline) |
|---|---|---|
| OBS profile saved | Hotkey saves with verbal build_id | ffprobe table GREEN |
| Disk 5 GB free | Note game mode + map | normalize + concat |
| README linked in Discord | No desktop audio-only tracks | concat_ok receipt |
| Isolation playbook read | Tag surface in overlay | Whisper triage blog steps |
Engine-agnostic note
This pipeline is tooling, not Unity/Godot/Construct specific. Console capture may use different containers—still run O1–O6 on whatever files land in inbox/. Wednesday demo smoke remains the game binary gate; concat is the feedback audio gate.
Mistakes we see in Discord support threads
| Quote | Reality |
|---|---|
| “Whisper broke” | Concat never produced merged WAV |
| “Only last clip transcribed” | concat_list.txt not sorted |
| “Chipmunk voice” | Forced 16 kHz on 48 kHz without resample |
| “Merged 40 minutes” | Included AFK hour—split sessions |
| “API 413” | Merged file huge—chunk after concat per 413 help |
Pointing beginners to O1–O6 before model size debates saves hours.