Case Studies & Experiments May 20, 2026

We Recovered From a Store-Demo Mismatch Yellow in 48 Hours - Evidence Timeline for Partner Review (2026 Case Study)

2026 case study—recover from store vs demo mismatch yellow in 48 hours with evidence timeline, metadata diffs, and partner review prep for micro-studios.

By GamineAI Team

We Recovered From a Store-Demo Mismatch Yellow in 48 Hours - Evidence Timeline for Partner Review (2026 Case Study)

Pixel-art hero illustration for a 2026 store versus demo mismatch yellow recovery timeline — evidence-first partner review pattern for micro-studios

The email subject line was polite: Yellow — please clarify store vs demo scope before next diligence pass. Inside, a reviewer pasted your Steam FAQ line about co-op next to a single-player fest demo build id that did not match the short description’s implied scope. No invented studio names here—this Case Studies & Experiments write-up is a field-style recovery timeline you can copy: what we did in the first 48 hours, which artifacts we produced, and which habits we kept so the mismatch class does not reopen during October 2026 Next Fest traffic.

This is not legal advice. It is evidence discipline aligned with wishlist truth audit, Wednesday metadata diff ritual, 5-day metadata freeze challenge, and BUILD_RECEIPT culture.

Who this is for and what you get

Audience You will be able to…
Producer owning partner comms Ship a 48-hour response packet without panic threads
Engineer on call Tie build_id, depot ids, and store exports to one folder
Solo dev Freeze promotion and fix FAQ before social amplification

Time to read: ~25 minutes. Time to execute pattern: ~48 hours wall-clock with two focused people rotating.

Why this matters now (May 2026)

  1. Partner dimension 7Mock audit tabletop already scores FAQ vs demo; real reviewers do the same.
  2. Intake compression72-hour windows mean yellows stack; fast evidence beats long apologies.
  3. Metadata tooling exists14 free metadata diff tools remove the excuse “we could not diff.”
  4. LLM FAQ riskFAQ LLM pipeline failures show up as store-demo mismatch, not model bugs.
  5. Fest branch velocityDemo depot playbook promotes builds faster than copy updates.

Direct answer: Hour zero — freeze promotion and ads; hour four — export live store + capture build_id; hour forty-eight — attach diff + fixed FAQ + cold install log to partner reply.

Ground rules (no invented facts)

  • No named studios, round funding, or fabricated review quotes.
  • Timeline is pattern composite from multiple micro-team recoveries in Q1–Q2 2026 partner and cert-adjacent intakes—use as a checklist, not a guarantee your reviewer uses identical wording.
  • Yellow means “clarify before green,” not lawsuit—still treat as ship-blocking until resolved.

Hour 0–2 — Freeze and snapshot

Minute block Action Artifact
0–15 Reply email: “Acknowledged; promotion and paid traffic paused until resolved.” Sent mail id in incident-2026-05-18/timeline.md
15–45 Screenshot reviewer’s pasted FAQ + their cited build_id 00-reviewer-paste.png
45–120 Export live Steam store text (G1–G3) per Wednesday ritual field groups live-hour0-g1g3.txt

Promotion freeze: Same rule as fest marketing cap—no spend bump while yellow is open.

Hour 2–8 — Binary truth

Task Owner Pass criterion
Confirm which Steam branch players receive Engineer Matches fest-demo-depot-audit-2026.md
Pull BUILD_RECEIPT for that branch Engineer build_id equals reviewer’s or explain delta
Cold install on clean account Engineer + QA First-hour gameplay matches FAQ claims
Hash demo zip if partner packet expects it Engineer Cold-hash challenge discipline

If binary truth disagrees with reviewer’s cited id, prove which id was live at email timestamp with upload log row—do not argue from memory.

Hour 8–24 — Copy fix path

Branch A — FAQ wrong, demo right: Delete co-op line; update FAQ approved files; run Gate D; paste Steam; re-export live-hour20-g3.txt; diff empty vs approved.

Branch B — Demo wrong, FAQ right: Rare but happens after accidental promotion—rollback branch per demo depot playbook; update BUILD_RECEIPT; re-cold install.

Branch C — both wrong: Fix FAQ first (fast), then schedule demo build fix—yellow may stay open until demo ships; communicate timeline honestly.

Hour 8–16 — Deep binary audit (when yellow is not “just FAQ”)

Some yellows arrive as FAQ text but the underlying issue is branch drift—the reviewer installed a build from a branch your internal Slack still calls “current fest,” while Steam actually served an older depot. Use this sub-timeline when the first cold install disagrees with the reviewer’s notes.

Hour block Action Artifact
8–10 List every Steam branch visible in Steamworks; screenshot branches-08.png
10–12 Compare each branch’s last promotion timestamp to upload_log.csv branch-promotion-table.md
12–14 If drift found, document which branch was intended vs live delta-branch.txt
14–16 Roll forward or rollback promotion per demo depot playbook; re-cold install cold-install-16.log

Stop rule: If you cannot explain branch drift in two paragraphs, do not email the reviewer yet—ambiguous replies extend yellow into a second cycle under intake compression.

Hour 16–24 — Copy fix path (expanded decision tree)

The earlier Branch A/B/C table is the headline. Below is the expanded decision tree teams actually use when tired.

Branch A — FAQ wrong, demo right (most common)

  1. Open faq-approved/ in git; if missing, create from live export as untrusted baseline then edit down.
  2. Delete or narrow every sentence that implies multiplayer, full-game scope, or platforms not in the demo zip.
  3. Run FAQ LLM pipeline Gate D even if humans wrote the text—yellow is proof drift happened without governance.
  4. Paste to Steamworks; export live-postfix-g3.txt; diff must be empty vs approved.
  5. Add faq-promotion-log.md row with reviewer ticket id in notes column.

Narrative risk: Marketing wants to “soften” language instead of deleting lies—reject. Softening keeps yellow semantics alive.

Branch B — Demo wrong, FAQ right (less common)

  1. Confirm FAQ text never promised the broken feature—attach diff proving FAQ was honest at hour zero.
  2. Fix demo binary or rollback promotion to last known-good build_id.
  3. Update BUILD_RECEIPT; re-run validate-packet exit 0 before telling reviewer “fixed.”
  4. If rollback affects fest schedule, say so with new ETA—silent re-promote is how trust dies twice.

Branch C — both wrong (worst case, still survivable)

  1. Ship FAQ fix first within 24 hours to stop new players reading lies.
  2. Open second ticket internally for demo scope bug with owner + date.
  3. Email reviewer: “FAQ corrected same day; demo scope fix ETA date with interim build id x.”
  4. Never promise same-day demo fix if engineering has not signed it.

Hour 24–36 — Evidence packet assembly

Folder partner-reply-2026-05-19/:

  1. timeline.md — hour-by-hour bullets
  2. live-hour0 vs live-hour30 diffs
  3. BUILD_RECEIPT.json + upload_log.csv excerpt
  4. cold-install-checklist-signed.pdf or markdown sign-off
  5. faq-promotion-log.md rows for changed questions
  6. Link to release-evidence/ path reviewer can grep

Zip with partner ZIP naming README bullet map—bullet 3 lists depot id.

Hour 36–48 — Human review and send

Step Action
36–42 Producer reads live Steam FAQ aloud; engineer says pass/fail per line vs demo
42–46 Second human signs high-risk rows (AI, multiplayer, price)
46–48 Email partner with zip link + one-paragraph root cause + prevention

Prevention paragraph template:

Root cause: FAQ drifted after build_id promotion without Wednesday metadata diff. Prevention: promotion gate requires weekly diff log + 5-day metadata freeze before fest month; FAQ changes route through human diff pipeline.

Tabletop rehearsal (30 minutes, before send)

  1. Producer reads final FAQ aloud.
  2. Engineer runs demo first hour on clean machine without speaking.
  3. Writer marks any spoken sentence not experienced in hour one.
  4. Marketer checks 32px icon and first screenshot still match post-FAQ story.
  5. Sign tabletop-signoff.md.

Skipping tabletop is how “we fixed FAQ” ships with capsule text still promising co-op.

Distributed team note

Asia-EU handoff teams: assign single incident commander in UTC window covering both reviewer business hours and your promotion window. Handoffs without commander produce duplicate Steam edits—worse than the original yellow.

Post-send stabilization (hour 48–72)

Yellow cleared does not mean habits fixed. Schedule:

Task When
Wednesday diff Next calendar occurrence
5-day metadata freeze Within two weeks if fest prep active
UTM experiment review Pause corrupted rows from yellow window
Fest marketing cap Reconcile spend paused during incident

Wishlist monitoring: If yellow coincided with a traffic spike, scan for one-star “misleading” reviews—respond with factual correction link only, not argument.

What we changed after recovery (process deltas)

Before After
Promote branch when CI green Promote only when Wednesday log GREEN
FAQ edits in Steamworks only FAQ edits in git faq-approved/ first
No cold install on copy change Cold install when G3 or G6 changes
Single owner for store Writer + producer two-human on risk rows

Comparison to hash mismatch case study

Hash mismatch recovery is artifact integrity. This case is marketing vs binary parity—different failure class, same evidence tone: timelines, receipts, diffs, no drama adjectives.

Metrics we tracked internally (not industry benchmarks)

Metric Use
Hours to first coherent reply Discipline
Number of FAQ lines changed Scope of lie
Repeat yellow same quarter Process failure

Improve your process; do not publish these as marketing claims.

Common mistakes during yellow recovery (seven)

  1. Replying long essays before exporting — Wastes the 48-hour window.
  2. Fixing FAQ without git — Next week drifts again.
  3. Skipping cold install — Reviewer re-tests and fails again.
  4. Blaming reviewer — Even when wrong, evidence wins faster than tone.
  5. Parallel paid ads — Converts traffic to angry wishlist removals.
  6. Changing price to distract — Confuses regional worksheet story.
  7. Closing yellow without screenshot proof — Reopens under intake compression.

Pro tips (six)

  1. Keep incident folder beside release-evidence taxonomy.
  2. Tag git incident/store-demo-mismatch-2026-05-18.
  3. Run validate-packet before next partner zip.
  4. Add row to Friday Block 5 retro.
  5. If AI involved, sync AI disclosure challenge annex same day.
  6. Schedule metadata freeze challenge the next quiet week.

Stakeholder comms templates (internal)

Use short Slack posts so engineering does not drown in prose.

Engineering:

Yellow: store FAQ vs demo. Promotion frozen. Need BUILD_RECEIPT for branch X + cold install log by 18:00 UTC. No Steam copy edits without git PR.

Marketing:

Ads paused. Do not schedule influencer keys until producer posts YELLOW-CLEARED. UTM rows for yesterday tagged incident-pause.

Producer:

Partner reply ETA 48h. Incident folder partner-reply-2026-05-19/. Commander: (name). Next tabletop 45 minutes before send.

Risk matrix (what makes 48 hours impossible)

Risk Symptom Mitigation
No git baseline Cannot diff Export live as untrusted baseline hour zero
No BUILD_RECEIPT Cannot prove id Create receipt retroactively with honest timestamp note
Contractor-only Steam access Cannot export Escalate access same day
Legal hold on communication Cannot email Use counsel-approved template only
Demo requires cert rebuild >48h engineering Email ETA with FAQ interim fix

If two or more rows are red, extend timeline in first reply instead of missing the 48-hour acknowledgment.

Publisher vs platform reviewer (tone shift)

Reviewer type What they weight
Publisher diligence Longitudinal evidence, diligence packet shape
Platform cert-style Cold install, depot parity, Deck refresh adjacent

Same artifacts mostly work—swap cover email emphasis: publishers want process narrative; platform reviewers want repro steps.

Week-after review (30 minutes)

Question Owner
Did yellow reopen? Producer
Did Wednesday diffs run green since? Writer
Did any UTM campaign resume early? Marketing
Did validate-packet pass on latest zip? Engineer

Document answers in operating review Block 2 if you use that cadence.

Link to crash and stability work

Sometimes yellow follows a crash spike—players rage-review “scam” when the game crashes before reading honest FAQ. If crashes rose same week, cross-link 5-day crash-log challenge and fix stability before claiming FAQ-only root cause.

Two-storefront note

If you maintain two storefront rule secondary pages, verify FAQ parity there too—yellow on Steam while itch FAQ still lies extends partner distrust across channels.

Glossary (the dialect in your inbox)

Term Plain meaning
Yellow Clarify before green—ship-blocking for your process until evidence lands, even when tone stays friendly.
Green Reviewer accepts your packet or closes the thread without new blocking questions on the same dimension.
BUILD_RECEIPT Small JSON (or equivalent) that names build_id, depot ids, and promotion time—see BUILD_RECEIPT tutorial.
Cold install Install from the same surface a partner uses (Steam client, clean OS user, no dev overrides) and play the first session without cheats.
G1–G6 Field groups from the Wednesday ritual—use them as export headings so diffs stay comparable week to week.
Evidence packet Zip + README bullets that answer “what changed, what is live now, how to reproduce”—not a narrative PDF essay.

If your team invents new vocabulary in Slack (“amber-soft,” “yellow-lite”), translate it back to yellow vs green before you email anyone outside the company—external partners do not read your emoji taxonomy.

Incident folder layout (copy this tree)

Treat incident/store-demo-mismatch-YYYY-MM-DD/ as disposable but complete. A layout that survived multiple 2026 recoveries:

  • 00-reviewer-paste.png — screenshot of their email or portal comment.
  • live-hour0-g1g3.txt through live-hourNN-g1g3.txt — exports after each Steam paste.
  • approved/faq-main.md — git-truth source before Steam.
  • diffs/hour0-vs-hourNN.patch — textual proof the lie left the page.
  • binary/BUILD_RECEIPT.json and binary/upload_log.csv — branch truth.
  • binary/cold-install-NN.log — checklist ticks with timestamps.
  • partner-reply/ — final zip, README, and the exact email body you sent.

Why the tree matters: under intake compression, reviewers forward folders internally. A producer who can grep your folder wins against a producer who can only scroll a Google Doc.

War-room behaviors that extend yellows (eight)

  1. Debating intent in Zoom instead of exporting Steam—adds zero bytes of evidence.
  2. Letting marketing “tweak” capsule copy while FAQ is still wrong—creates a second mismatch class the next week.
  3. Assigning two engineers to hotfix without one incident commander—duplicate promotions happen.
  4. Treating Discord chatter as proof that “players understand it is single-player”—partners do not accept Discord as a source.
  5. Cherry-picking a friendly streamer clip as demo truth—clips are marketing; cold install is evidence.
  6. Refusing to tag git because “we are messy”—messy repos still need a tag pointing at the FAQ commit you pasted.
  7. Answering before tabletop because the deadline is midnight—midnight sends correlate with reopened yellows.
  8. Mixing refund policy language into a technical evidence reply—keep commerce sentences out unless counsel approved them.

If three or more items fire simultaneously, declare a communication freeze: only the incident commander may email external parties until hour eight.

When G5 commercial fields cause “phantom co-op”

Sometimes the FAQ is honest while G5 commercial metadata still implies bundles, editions, or cross-play you do not ship in the demo. Walk the Wednesday ritual G5 row alongside FAQ, then cross-check DLC anchor worksheet bullets if cosmetics or deluxe language sits near multiplayer claims. If price visibility notes mention platforms or modes, treat them like FAQ sentences—diff them against the demo binary the same hour you export text.

Manifest crossover (when the yellow is “wrong zip,” not wrong English)

If the reviewer’s paste includes a file hash or size mismatch, pivot partially into SHA256 manifest cold validation drill posture: your English can be perfect while the artifact is still wrong. Keep the store-demo narrative in the email body, but attach manifest rows beside FAQ diffs so the partner does not split your thread into two slow conversations.

Early Access and “full game” language

Early Access pages tempt writers to describe the roadmap as if it were the demo. Keep roadmap sentences in a clearly labeled FAQ block, and ensure the first paragraph a new player reads matches hour-one gameplay. If you are unsure, run the wishlist truth audit week framing on a quiet week—not during the yellow window unless you can staff it.

Revenue-model honesty (why mismatches hurt money people)

Mismatched monetization stories erode the same trust as technical lies. If your store still references a live-service season pass while the demo is a premium slice, diligence teams read that as governance risk, not copy risk. Fix commerce language in the same diff window as FAQ when both drifted together.

Escalation (non-legal guidance)

Escalate to counsel only when the yellow touches refunds, platform policy disputes, or regional pricing promises you cannot verify from Steamworks exports. For ordinary FAQ-vs-demo mismatches, your fastest path remains engineering receipts plus writer diffs—see Steamworks documentation for the authoritative surface definitions when you need to cite first-party wording.

Snippet-friendly answers

What is a store-demo mismatch yellow?
A reviewer flags Steam store text (often FAQ or short description) that does not match the demo binary they installed—requires evidence-backed clarification, not opinion.

Is 48 hours always enough?
Often for FAQ-only drift; not if demo scope must change—communicate realistic date.

Do we need lawyers?
Not for typical yellow—still avoid promising refunds or platform policies you cannot defend; link official policy pages.

Key takeaways

  • Yellow on store vs demo is evidence fixable in many micro-studio cases within 48 hours when FAQ drifted alone.
  • Hour zero: freeze promotion and ads; export live store text.
  • Hours 2–8: prove binary build_id and cold install.
  • Hours 8–24: choose branch A/B/C fix path; update git-approved FAQ before Steam paste.
  • Hours 24–48: assemble zip + human sign-off + short root-cause paragraph.
  • Prevention beats heroics: Wednesday diffs + metadata freeze week.
  • Same tone as hash recovery: timelines, receipts, no invented metrics.
  • Pair with diligence packets expectations.
  • Fest month repeats this class—install habits before October spikes.

FAQ

What if reviewer cited wrong build id?
Prove live id at timestamp with upload log and Steamworks history screenshot—politely attach; do not accuse.

What if mismatch was translator error?
Same timeline; add localized FAQ diff per localization QA tools.

Does this replace truth audit week?
No—truth audit prevents; this case recovers after prevention failed.

What if yellow references trailer frame, not FAQ?
Attach screenshot safe zones audit + replace still or re-cut trailer chapter; still freeze ads until visual parity matches demo.

What if yellow is only on press kit PDF, not Steam?
Same evidence pattern—export PDF text, diff, fix, re-upload; link press kit version hash in reply.

What if we use prompt registry freeze for runtime AI?
Attach registry diff proving runtime prompts unchanged—separate from FAQ marketing drift but reviewers may ask both in one thread.

Vertical slice alignment

If fest month is imminent, map recovery work onto vertical slice challenge Day 4 store parity—yellow recovery is essentially a compressed Day 4 with legal-ish urgency.

Mock audit rehearsal value

Run one mock audit tabletop dimension-7 drill the week after recovery—cheap insurance that new copy still fails loudly in your conference room before it fails in a partner inbox again.

Evidence cycles vs weekly patch theater

Evidence cycles opinion applies here: do not “patch” FAQ daily without governance—batch FAQ changes through FAQ pipeline weekly unless yellow forces emergency.

Conclusion

Store-demo mismatch yellows feel personal—they are inventory errors between words and bytes. The 48-hour pattern is boring: freeze, export, diff, fix, cold install, zip, send. Boring is how you get back to green without burning the relationship.

Run your next promotion only after Wednesday log GREEN. Let this case study be the last time co-op lives in the FAQ without co-op in the demo.

If you finish recovery Friday afternoon, still block weekend promotion until Monday tabletop—fatigue reintroduces the same class of typo that opened the yellow.