We Recovered From a Store-Demo Mismatch Yellow in 48 Hours - Evidence Timeline for Partner Review (2026 Case Study)
The email subject line was polite: Yellow — please clarify store vs demo scope before next diligence pass. Inside, a reviewer pasted your Steam FAQ line about co-op next to a single-player fest demo build id that did not match the short description’s implied scope. No invented studio names here—this Case Studies & Experiments write-up is a field-style recovery timeline you can copy: what we did in the first 48 hours, which artifacts we produced, and which habits we kept so the mismatch class does not reopen during October 2026 Next Fest traffic.
This is not legal advice. It is evidence discipline aligned with wishlist truth audit, Wednesday metadata diff ritual, 5-day metadata freeze challenge, and BUILD_RECEIPT culture.
Who this is for and what you get
| Audience | You will be able to… |
|---|---|
| Producer owning partner comms | Ship a 48-hour response packet without panic threads |
| Engineer on call | Tie build_id, depot ids, and store exports to one folder |
| Solo dev | Freeze promotion and fix FAQ before social amplification |
Time to read: ~25 minutes. Time to execute pattern: ~48 hours wall-clock with two focused people rotating.
Why this matters now (May 2026)
- Partner dimension 7 — Mock audit tabletop already scores FAQ vs demo; real reviewers do the same.
- Intake compression — 72-hour windows mean yellows stack; fast evidence beats long apologies.
- Metadata tooling exists — 14 free metadata diff tools remove the excuse “we could not diff.”
- LLM FAQ risk — FAQ LLM pipeline failures show up as store-demo mismatch, not model bugs.
- Fest branch velocity — Demo depot playbook promotes builds faster than copy updates.
Direct answer: Hour zero — freeze promotion and ads; hour four — export live store + capture build_id; hour forty-eight — attach diff + fixed FAQ + cold install log to partner reply.
Ground rules (no invented facts)
- No named studios, round funding, or fabricated review quotes.
- Timeline is pattern composite from multiple micro-team recoveries in Q1–Q2 2026 partner and cert-adjacent intakes—use as a checklist, not a guarantee your reviewer uses identical wording.
- Yellow means “clarify before green,” not lawsuit—still treat as ship-blocking until resolved.
Hour 0–2 — Freeze and snapshot
| Minute block | Action | Artifact |
|---|---|---|
| 0–15 | Reply email: “Acknowledged; promotion and paid traffic paused until resolved.” | Sent mail id in incident-2026-05-18/timeline.md |
| 15–45 | Screenshot reviewer’s pasted FAQ + their cited build_id |
00-reviewer-paste.png |
| 45–120 | Export live Steam store text (G1–G3) per Wednesday ritual field groups | live-hour0-g1g3.txt |
Promotion freeze: Same rule as fest marketing cap—no spend bump while yellow is open.
Hour 2–8 — Binary truth
| Task | Owner | Pass criterion |
|---|---|---|
| Confirm which Steam branch players receive | Engineer | Matches fest-demo-depot-audit-2026.md |
| Pull BUILD_RECEIPT for that branch | Engineer | build_id equals reviewer’s or explain delta |
| Cold install on clean account | Engineer + QA | First-hour gameplay matches FAQ claims |
| Hash demo zip if partner packet expects it | Engineer | Cold-hash challenge discipline |
If binary truth disagrees with reviewer’s cited id, prove which id was live at email timestamp with upload log row—do not argue from memory.
Hour 8–24 — Copy fix path
Branch A — FAQ wrong, demo right: Delete co-op line; update FAQ approved files; run Gate D; paste Steam; re-export live-hour20-g3.txt; diff empty vs approved.
Branch B — Demo wrong, FAQ right: Rare but happens after accidental promotion—rollback branch per demo depot playbook; update BUILD_RECEIPT; re-cold install.
Branch C — both wrong: Fix FAQ first (fast), then schedule demo build fix—yellow may stay open until demo ships; communicate timeline honestly.
Hour 8–16 — Deep binary audit (when yellow is not “just FAQ”)
Some yellows arrive as FAQ text but the underlying issue is branch drift—the reviewer installed a build from a branch your internal Slack still calls “current fest,” while Steam actually served an older depot. Use this sub-timeline when the first cold install disagrees with the reviewer’s notes.
| Hour block | Action | Artifact |
|---|---|---|
| 8–10 | List every Steam branch visible in Steamworks; screenshot | branches-08.png |
| 10–12 | Compare each branch’s last promotion timestamp to upload_log.csv |
branch-promotion-table.md |
| 12–14 | If drift found, document which branch was intended vs live | delta-branch.txt |
| 14–16 | Roll forward or rollback promotion per demo depot playbook; re-cold install | cold-install-16.log |
Stop rule: If you cannot explain branch drift in two paragraphs, do not email the reviewer yet—ambiguous replies extend yellow into a second cycle under intake compression.
Hour 16–24 — Copy fix path (expanded decision tree)
The earlier Branch A/B/C table is the headline. Below is the expanded decision tree teams actually use when tired.
Branch A — FAQ wrong, demo right (most common)
- Open
faq-approved/in git; if missing, create from live export as untrusted baseline then edit down. - Delete or narrow every sentence that implies multiplayer, full-game scope, or platforms not in the demo zip.
- Run FAQ LLM pipeline Gate D even if humans wrote the text—yellow is proof drift happened without governance.
- Paste to Steamworks; export
live-postfix-g3.txt; diff must be empty vs approved. - Add
faq-promotion-log.mdrow with reviewer ticket id in notes column.
Narrative risk: Marketing wants to “soften” language instead of deleting lies—reject. Softening keeps yellow semantics alive.
Branch B — Demo wrong, FAQ right (less common)
- Confirm FAQ text never promised the broken feature—attach diff proving FAQ was honest at hour zero.
- Fix demo binary or rollback promotion to last known-good
build_id. - Update BUILD_RECEIPT; re-run validate-packet exit 0 before telling reviewer “fixed.”
- If rollback affects fest schedule, say so with new ETA—silent re-promote is how trust dies twice.
Branch C — both wrong (worst case, still survivable)
- Ship FAQ fix first within 24 hours to stop new players reading lies.
- Open second ticket internally for demo scope bug with owner + date.
- Email reviewer: “FAQ corrected same day; demo scope fix ETA date with interim build id x.”
- Never promise same-day demo fix if engineering has not signed it.
Hour 24–36 — Evidence packet assembly
Folder partner-reply-2026-05-19/:
timeline.md— hour-by-hour bulletslive-hour0vslive-hour30diffsBUILD_RECEIPT.json+upload_log.csvexcerptcold-install-checklist-signed.pdfor markdown sign-offfaq-promotion-log.mdrows for changed questions- Link to
release-evidence/path reviewer can grep
Zip with partner ZIP naming README bullet map—bullet 3 lists depot id.
Hour 36–48 — Human review and send
| Step | Action |
|---|---|
| 36–42 | Producer reads live Steam FAQ aloud; engineer says pass/fail per line vs demo |
| 42–46 | Second human signs high-risk rows (AI, multiplayer, price) |
| 46–48 | Email partner with zip link + one-paragraph root cause + prevention |
Prevention paragraph template:
Root cause: FAQ drifted after
build_idpromotion without Wednesday metadata diff. Prevention: promotion gate requires weekly diff log + 5-day metadata freeze before fest month; FAQ changes route through human diff pipeline.
Tabletop rehearsal (30 minutes, before send)
- Producer reads final FAQ aloud.
- Engineer runs demo first hour on clean machine without speaking.
- Writer marks any spoken sentence not experienced in hour one.
- Marketer checks 32px icon and first screenshot still match post-FAQ story.
- Sign
tabletop-signoff.md.
Skipping tabletop is how “we fixed FAQ” ships with capsule text still promising co-op.
Distributed team note
Asia-EU handoff teams: assign single incident commander in UTC window covering both reviewer business hours and your promotion window. Handoffs without commander produce duplicate Steam edits—worse than the original yellow.
Post-send stabilization (hour 48–72)
Yellow cleared does not mean habits fixed. Schedule:
| Task | When |
|---|---|
| Wednesday diff | Next calendar occurrence |
| 5-day metadata freeze | Within two weeks if fest prep active |
| UTM experiment review | Pause corrupted rows from yellow window |
| Fest marketing cap | Reconcile spend paused during incident |
Wishlist monitoring: If yellow coincided with a traffic spike, scan for one-star “misleading” reviews—respond with factual correction link only, not argument.
What we changed after recovery (process deltas)
| Before | After |
|---|---|
| Promote branch when CI green | Promote only when Wednesday log GREEN |
| FAQ edits in Steamworks only | FAQ edits in git faq-approved/ first |
| No cold install on copy change | Cold install when G3 or G6 changes |
| Single owner for store | Writer + producer two-human on risk rows |
Comparison to hash mismatch case study
Hash mismatch recovery is artifact integrity. This case is marketing vs binary parity—different failure class, same evidence tone: timelines, receipts, diffs, no drama adjectives.
Metrics we tracked internally (not industry benchmarks)
| Metric | Use |
|---|---|
| Hours to first coherent reply | Discipline |
| Number of FAQ lines changed | Scope of lie |
| Repeat yellow same quarter | Process failure |
Improve your process; do not publish these as marketing claims.
Common mistakes during yellow recovery (seven)
- Replying long essays before exporting — Wastes the 48-hour window.
- Fixing FAQ without git — Next week drifts again.
- Skipping cold install — Reviewer re-tests and fails again.
- Blaming reviewer — Even when wrong, evidence wins faster than tone.
- Parallel paid ads — Converts traffic to angry wishlist removals.
- Changing price to distract — Confuses regional worksheet story.
- Closing yellow without screenshot proof — Reopens under intake compression.
Pro tips (six)
- Keep incident folder beside release-evidence taxonomy.
- Tag git
incident/store-demo-mismatch-2026-05-18. - Run validate-packet before next partner zip.
- Add row to Friday Block 5 retro.
- If AI involved, sync AI disclosure challenge annex same day.
- Schedule metadata freeze challenge the next quiet week.
Stakeholder comms templates (internal)
Use short Slack posts so engineering does not drown in prose.
Engineering:
Yellow: store FAQ vs demo. Promotion frozen. Need BUILD_RECEIPT for branch
X+ cold install log by 18:00 UTC. No Steam copy edits without git PR.
Marketing:
Ads paused. Do not schedule influencer keys until producer posts
YELLOW-CLEARED. UTM rows for yesterday taggedincident-pause.
Producer:
Partner reply ETA 48h. Incident folder
partner-reply-2026-05-19/. Commander: (name). Next tabletop 45 minutes before send.
Risk matrix (what makes 48 hours impossible)
| Risk | Symptom | Mitigation |
|---|---|---|
| No git baseline | Cannot diff | Export live as untrusted baseline hour zero |
| No BUILD_RECEIPT | Cannot prove id | Create receipt retroactively with honest timestamp note |
| Contractor-only Steam access | Cannot export | Escalate access same day |
| Legal hold on communication | Cannot email | Use counsel-approved template only |
| Demo requires cert rebuild | >48h engineering | Email ETA with FAQ interim fix |
If two or more rows are red, extend timeline in first reply instead of missing the 48-hour acknowledgment.
Publisher vs platform reviewer (tone shift)
| Reviewer type | What they weight |
|---|---|
| Publisher diligence | Longitudinal evidence, diligence packet shape |
| Platform cert-style | Cold install, depot parity, Deck refresh adjacent |
Same artifacts mostly work—swap cover email emphasis: publishers want process narrative; platform reviewers want repro steps.
Week-after review (30 minutes)
| Question | Owner |
|---|---|
| Did yellow reopen? | Producer |
| Did Wednesday diffs run green since? | Writer |
| Did any UTM campaign resume early? | Marketing |
| Did validate-packet pass on latest zip? | Engineer |
Document answers in operating review Block 2 if you use that cadence.
Link to crash and stability work
Sometimes yellow follows a crash spike—players rage-review “scam” when the game crashes before reading honest FAQ. If crashes rose same week, cross-link 5-day crash-log challenge and fix stability before claiming FAQ-only root cause.
Two-storefront note
If you maintain two storefront rule secondary pages, verify FAQ parity there too—yellow on Steam while itch FAQ still lies extends partner distrust across channels.
Glossary (the dialect in your inbox)
| Term | Plain meaning |
|---|---|
| Yellow | Clarify before green—ship-blocking for your process until evidence lands, even when tone stays friendly. |
| Green | Reviewer accepts your packet or closes the thread without new blocking questions on the same dimension. |
| BUILD_RECEIPT | Small JSON (or equivalent) that names build_id, depot ids, and promotion time—see BUILD_RECEIPT tutorial. |
| Cold install | Install from the same surface a partner uses (Steam client, clean OS user, no dev overrides) and play the first session without cheats. |
| G1–G6 | Field groups from the Wednesday ritual—use them as export headings so diffs stay comparable week to week. |
| Evidence packet | Zip + README bullets that answer “what changed, what is live now, how to reproduce”—not a narrative PDF essay. |
If your team invents new vocabulary in Slack (“amber-soft,” “yellow-lite”), translate it back to yellow vs green before you email anyone outside the company—external partners do not read your emoji taxonomy.
Incident folder layout (copy this tree)
Treat incident/store-demo-mismatch-YYYY-MM-DD/ as disposable but complete. A layout that survived multiple 2026 recoveries:
00-reviewer-paste.png— screenshot of their email or portal comment.live-hour0-g1g3.txtthroughlive-hourNN-g1g3.txt— exports after each Steam paste.approved/faq-main.md— git-truth source before Steam.diffs/hour0-vs-hourNN.patch— textual proof the lie left the page.binary/BUILD_RECEIPT.jsonandbinary/upload_log.csv— branch truth.binary/cold-install-NN.log— checklist ticks with timestamps.partner-reply/— final zip, README, and the exact email body you sent.
Why the tree matters: under intake compression, reviewers forward folders internally. A producer who can grep your folder wins against a producer who can only scroll a Google Doc.
War-room behaviors that extend yellows (eight)
- Debating intent in Zoom instead of exporting Steam—adds zero bytes of evidence.
- Letting marketing “tweak” capsule copy while FAQ is still wrong—creates a second mismatch class the next week.
- Assigning two engineers to hotfix without one incident commander—duplicate promotions happen.
- Treating Discord chatter as proof that “players understand it is single-player”—partners do not accept Discord as a source.
- Cherry-picking a friendly streamer clip as demo truth—clips are marketing; cold install is evidence.
- Refusing to tag git because “we are messy”—messy repos still need a tag pointing at the FAQ commit you pasted.
- Answering before tabletop because the deadline is midnight—midnight sends correlate with reopened yellows.
- Mixing refund policy language into a technical evidence reply—keep commerce sentences out unless counsel approved them.
If three or more items fire simultaneously, declare a communication freeze: only the incident commander may email external parties until hour eight.
When G5 commercial fields cause “phantom co-op”
Sometimes the FAQ is honest while G5 commercial metadata still implies bundles, editions, or cross-play you do not ship in the demo. Walk the Wednesday ritual G5 row alongside FAQ, then cross-check DLC anchor worksheet bullets if cosmetics or deluxe language sits near multiplayer claims. If price visibility notes mention platforms or modes, treat them like FAQ sentences—diff them against the demo binary the same hour you export text.
Manifest crossover (when the yellow is “wrong zip,” not wrong English)
If the reviewer’s paste includes a file hash or size mismatch, pivot partially into SHA256 manifest cold validation drill posture: your English can be perfect while the artifact is still wrong. Keep the store-demo narrative in the email body, but attach manifest rows beside FAQ diffs so the partner does not split your thread into two slow conversations.
Early Access and “full game” language
Early Access pages tempt writers to describe the roadmap as if it were the demo. Keep roadmap sentences in a clearly labeled FAQ block, and ensure the first paragraph a new player reads matches hour-one gameplay. If you are unsure, run the wishlist truth audit week framing on a quiet week—not during the yellow window unless you can staff it.
Revenue-model honesty (why mismatches hurt money people)
Mismatched monetization stories erode the same trust as technical lies. If your store still references a live-service season pass while the demo is a premium slice, diligence teams read that as governance risk, not copy risk. Fix commerce language in the same diff window as FAQ when both drifted together.
Escalation (non-legal guidance)
Escalate to counsel only when the yellow touches refunds, platform policy disputes, or regional pricing promises you cannot verify from Steamworks exports. For ordinary FAQ-vs-demo mismatches, your fastest path remains engineering receipts plus writer diffs—see Steamworks documentation for the authoritative surface definitions when you need to cite first-party wording.
Snippet-friendly answers
What is a store-demo mismatch yellow?
A reviewer flags Steam store text (often FAQ or short description) that does not match the demo binary they installed—requires evidence-backed clarification, not opinion.
Is 48 hours always enough?
Often for FAQ-only drift; not if demo scope must change—communicate realistic date.
Do we need lawyers?
Not for typical yellow—still avoid promising refunds or platform policies you cannot defend; link official policy pages.
Key takeaways
- Yellow on store vs demo is evidence fixable in many micro-studio cases within 48 hours when FAQ drifted alone.
- Hour zero: freeze promotion and ads; export live store text.
- Hours 2–8: prove binary
build_idand cold install. - Hours 8–24: choose branch A/B/C fix path; update git-approved FAQ before Steam paste.
- Hours 24–48: assemble zip + human sign-off + short root-cause paragraph.
- Prevention beats heroics: Wednesday diffs + metadata freeze week.
- Same tone as hash recovery: timelines, receipts, no invented metrics.
- Pair with diligence packets expectations.
- Fest month repeats this class—install habits before October spikes.
FAQ
What if reviewer cited wrong build id?
Prove live id at timestamp with upload log and Steamworks history screenshot—politely attach; do not accuse.
What if mismatch was translator error?
Same timeline; add localized FAQ diff per localization QA tools.
Does this replace truth audit week?
No—truth audit prevents; this case recovers after prevention failed.
What if yellow references trailer frame, not FAQ?
Attach screenshot safe zones audit + replace still or re-cut trailer chapter; still freeze ads until visual parity matches demo.
What if yellow is only on press kit PDF, not Steam?
Same evidence pattern—export PDF text, diff, fix, re-upload; link press kit version hash in reply.
What if we use prompt registry freeze for runtime AI?
Attach registry diff proving runtime prompts unchanged—separate from FAQ marketing drift but reviewers may ask both in one thread.
Vertical slice alignment
If fest month is imminent, map recovery work onto vertical slice challenge Day 4 store parity—yellow recovery is essentially a compressed Day 4 with legal-ish urgency.
Mock audit rehearsal value
Run one mock audit tabletop dimension-7 drill the week after recovery—cheap insurance that new copy still fails loudly in your conference room before it fails in a partner inbox again.
Evidence cycles vs weekly patch theater
Evidence cycles opinion applies here: do not “patch” FAQ daily without governance—batch FAQ changes through FAQ pipeline weekly unless yellow forces emergency.
Conclusion
Store-demo mismatch yellows feel personal—they are inventory errors between words and bytes. The 48-hour pattern is boring: freeze, export, diff, fix, cold install, zip, send. Boring is how you get back to green without burning the relationship.
Run your next promotion only after Wednesday log GREEN. Let this case study be the last time co-op lives in the FAQ without co-op in the demo.
If you finish recovery Friday afternoon, still block weekend promotion until Monday tabletop—fatigue reintroduces the same class of typo that opened the yellow.