Q3 2026 Cert Intake Mock Audit Tabletop - Seven Dimensions Before Partner Upload

Your release-evidence folder looks complete until a partner opens it and asks why leadership rollups disagree with partner annex totals from the same build. That question is not a missing screenshot—it is a scoring dimension you never rehearsed.
Q3 2026 stacks Gamescom-adjacent reviews, the Steam Deck Verified autumn refresh, and early Meta holiday intake inside one operational quarter. This Process & Workflow guide is a ninety-minute mock audit tabletop with seven weighted dimensions so micro-studios practice the whole packet stack before upload—not one checklist row at a time.
Why this matters now (May 2026)
- Intake density — Multiple submission cadences land in Q3; teams that only drill once per year arrive with April habits in August packets.
- Reviewer pattern shift — Partners increasingly ask for cross-artifact parity (dashboards, annexes, replay goldens, FAQ readbacks) in one zip—not isolated PDFs.
- Lesson stack integration — If you followed the AI RPG governance arc through Lesson 172, this blog is the studio-facing runbook; if not, you can still run the tabletop with the Q3 mock audit tooling resource.
- Fest pressure — October Next Fest rewards teams that freeze branches with proof, not optimism.
Direct answer: Schedule two mock audits per cert window—T-14 days (systemic fixes) and T-3 days (last-touch verification)—using the seven-dimension rubric below. Any dimension below pass threshold opens a deficiency ticket; no real upload until tickets close or you explicitly waive with signer approval.
Who this is for, time, and outcome
| Who | Solo leads and 2–4 person teams shipping PC, Deck, or XR with partner intake |
| Time | 90 minutes tabletop + 30 minutes ticket filing |
| Outcome | Single mock score, deficiency log, and updated release-evidence/ rows |
Prerequisites: Staging packet assembled (not production guesses). Read Friday Block 5 maintenance so folders are current.
Beginner quick start
- Copy the packet manifest (below) into a dated folder
release-evidence/06-governance/mock-audit-2026-05-16/. - Assign four roles: Facilitator, Evidence clerk, Reviewer A, Reviewer B.
- Run dimensions 1–7 in order; do not skip to “easy” PNG checks first.
- File one ticket per red dimension with owner + due date.
- Re-run only failed dimensions at T-3.
Success check: Facilitator can read the total weighted score aloud and every red dimension has a ticket URL.
The packet manifest (assemble before the room)
Mirror what partners actually open:
- Leadership SLA rollup export (CSV + PDF)
- Partner annex export (same UTC window)
- Carve-out / freeze-bypass audit column sample
- Footer metadata semver row + replay-parser allow-list excerpt
- Weekly reconciliation job log (last two runs)
- Synthetic replay diff gate CI summary (last green + last intentional fail screenshot)
- Staffing buffer burn-down dashboard slice (leadership-only)
- FAQ readback packet or store-page truth audit excerpt
- Build manifest hash + smoke video path list
Store hashes in manifest.json so the tabletop debates content, not “which file is latest.”
Tabletop roles (ninety minutes)
| Minute block | Activity |
|---|---|
| 0–10 | Facilitator reads build ID, window, and non-goals |
| 10–75 | Score dimensions 1–7 (8–9 min each) |
| 75–85 | Weighted total + pass/fail line |
| 85–90 | Deficiency ticket blast + T-3 calendar hold |
Pro tip: Ban laptops except the evidence clerk machine. Everyone else uses printed score sheets or a shared doc—prevents silent “fixing” during the drill.
Seven dimensions and pass thresholds
Weights sum to 100. Adjust only if your partner contract names different priorities—document changes in README.
Dimension 1 - Leadership vs partner SLA parity (weight 20)
Question: Do leadership and partner extracts share metric dictionary IDs, UTC bounds, and tuple revision?
Pass: Numeric deltas within your published epsilon policy (or governance replay gate); bypass columns present on both slices.
Fail tags: rollup_mismatch, utc_skew, missing_bypass_column
Tools: csv diff discipline, OpenTelemetry exemplar IDs cited in annex footnotes.
Dimension 2 - Freeze and carve-out lineage (weight 15)
Question: Does every carve-out row cite parent freeze ID and lift decision ID?
Pass: Annex footers explain numeric deltas; no “verbal bypass” without audit ID.
Fail tags: orphan_lift, unsigned_carve_out
Cross-read: 7-day RC freeze challenge, freeze-lift resource list.
Dimension 3 - Footer semver and replay parser contract (weight 15)
Question: Do footer schema_version bumps match replay-parser allow-list and promoted build?
Pass: Mixed-archive migration note exists if any breaking field shipped in last 30 days.
Fail tags: semver_drift, parser_reject_unknown_field
Dimension 4 - Weekly reconciliation discipline (weight 15)
Question: Did the last two weekly jobs run and produce signer acknowledgments?
Pass: Skipped-run policy documented; red state not silently ignored.
Fail tags: skipped_reconciliation, stale_ack
Dimension 5 - Synthetic replay diff gate (weight 15)
Question: Do frozen goldens match promoted exports within epsilon?
Pass: Last CI run artifact linked in evidence folder; intentional fails documented.
Fail tags: golden_drift, missing_ci_artifact
Dimension 6 - Staffing buffer visibility (weight 10)
Question: Does leadership-only slice show buffer burn-down without hiding partner totals?
Pass: Chart legend states audience; no duplicate metric names across folders.
Fail tags: buffer_hidden, metric_name_collision
Dimension 7 - FAQ and store truth readback (weight 10)
Question: Do FAQ bullets match demo build and AI disclosure?
Pass: Human-gated patch notes reference only shipped changes.
Fail tags: faq_drift, ai_disclosure_gap
Scoring sheet (copy into your repo)
Dimension | Weight | Score 0-5 | Weighted | Pass? (>=3)
1 SLA parity | 20 | | |
2 Freeze lineage | 15 | | |
3 Footer semver | 15 | | |
4 Reconciliation | 15 | | |
5 Replay diff | 15 | | |
6 Staffing buffer | 10 | | |
7 FAQ readback | 10 | | |
TOTAL | 100 | | |
Pass line: Total >= 80 and no dimension below 3.
Waivers: Signer email in waivers.md with expiry—partners treat unsigned waivers as failures.
T-14 vs T-3 drills
| Drill | When | Goal |
|---|---|---|
| T-14 | Two weeks before intake | Fix systemic reds (dictionary, UTC, semver) |
| T-3 | Three days before | Verify last-touch greens; no new scope |
Do not substitute T-3 for T-14. Teams that only T-3 discover schema failures too late for clean builds.
Deficiency ticket template
## DEF-2026-05-16-03
Dimension: 1 SLA parity
Tag: rollup_mismatch
Build: rc-2026-05-14
Owner: @name
Due: 2026-05-18
Evidence: release-evidence/06-governance/mock-audit-2026-05-16/dim1/
Acceptance: Partner + leadership totals within epsilon doc; bypass cols match
Link tickets in operating review Block 3 so governance work is visible beside art tasks.
What partner reviewers actually ask in Q3 2026 (plain language)
After reading intake feedback patterns across PC, handheld, and XR lanes in 2026, the questions cluster into six themes—your seven dimensions map to them deliberately:
- “Why does this number differ from that number?” — Dimension 1 and 4.
- “Who approved the exception?” — Dimension 2.
- “Will old packets still parse?” — Dimension 3.
- “Prove automation ran.” — Dimensions 4 and 5.
- “Are you staffed for the lane you promised?” — Dimension 6.
- “Does the store page lie?” — Dimension 7.
A mock audit that only checks video files and crash logs misses half the intake failures indie teams report in Discord support channels. The tabletop exists to make those themes scorable before someone else scores you.
Facilitator scripts per dimension (copy aloud)
Dimension 1 script (8 minutes)
“Evidence clerk, open leadership rollup and partner annex for build
RC-###with UTC window2026-05-01T00:00:00Zthrough2026-05-14T23:59:59Z. Reviewer A, read metric IDs aloud for rowactive_players_7d. Reviewer B, confirm the same ID exists in partner annex with identical definition footnote. If either side uses a different label, stop—we score red. If numeric delta exceeds epsilon doc, red. If bypass column missing on either slice, red. Otherwise green.”
Do not debate why the game design changed—only whether exports agree.
Dimension 2 script
“List every carve-out row in the annex. For each, show parent freeze ID and lift decision ID in the evidence folder. If any row says ‘hotfix’ without ID, red. If signer name missing, red.”
Cross-link to rollup mismatch help patterns when reviewers simulate disagreement between slices.
Dimension 3 script
“Read footer
schema_versionfrom promoted build metadata. Open replay-parser allow-list file at same git SHA. Any field in build not in allow-list is red unless waiver signed.”
Teams upgrading Unity 6.6 LTS mid-quarter trigger this dimension most often.
Dimension 4 script
“Show last two weekly reconciliation job logs with
job_run_id. If either week skipped, show skip policy and signer acknowledgment. Two consecutive skips without red governance state is automatic red.”
Dimension 5 script
“Open last CI synthetic replay summary. Confirm golden hashes match
manifest.json. If last run failed intentionally, show ticket proving failure was expected test—not accidental drift.”
Pair tooling with save fuzz resource when saves feed governance CSVs.
Dimension 6 script
“Open leadership-only staffing chart. Confirm legend says audience. Confirm no metric name duplicates partner annex series names.”
Dimension 7 script
“Read three FAQ bullets aloud. Evidence clerk launches demo build. Reviewer A marks each bullet true/false. Any false is red. Check AI disclosure checklist row for same build ID.”
Ninety-minute agenda (minute-by-minute)
| Min | Owner | Action |
|---|---|---|
| 0–5 | Facilitator | Read build, scope, pass line |
| 5–8 | Clerk | Verify manifest hashes |
| 8–23 | All | Dimension 1 |
| 23–31 | All | Dimension 2 |
| 31–39 | All | Dimension 3 |
| 39–47 | All | Dimension 4 |
| 47–55 | All | Dimension 5 |
| 55–61 | All | Dimension 6 |
| 61–67 | All | Dimension 7 |
| 67–72 | Facilitator | Compute weighted score |
| 72–80 | All | Agree reds; assign owners |
| 80–90 | Clerk | File tickets; schedule T-3 |
Buffer rule: If dimension 1 goes red at minute 20, stop—remaining dimensions are informational only until dimension 1 is fixed and re-run. Prevents fantasy scores.
Folder layout under release-evidence/06-governance/
06-governance/
mock-audit-2026-05-16/
manifest.json
dim1-sla-parity/
leadership.csv
partner.csv
epsilon-policy.md
dim2-freeze/
carve-out-rows.pdf
dim3-footer/
schema-version.txt
parser-allow-list.json
dim4-reconciliation/
week-2026-w19.log
week-2026-w20.log
dim5-replay/
ci-summary.html
dim6-staffing/
leadership-slice.png
dim7-faq/
faq-readback.md
score-sheet.md
deficiencies/
DEF-2026-05-16-01.md
This layout makes Friday maintenance a diff review instead of an archaeological dig.
Post-tabletop partner email (template)
Use when a publisher or platform partner asks for status before intake:
Subject: Mock audit complete - build RC-2026-05-14 - score 84/100
We completed an internal seven-dimension mock audit on 2026-05-16.
Total weighted score: 84/100 (pass line 80).
Open deficiencies: 2 (dimension 2 carve-out IDs, dimension 7 FAQ bullet 3).
Remediation ETA: 2026-05-18.
Evidence folder index: release-evidence/06-governance/mock-audit-2026-05-16/manifest.json
Honest scores build trust faster than “everything looks good” emails that collapse on first reviewer pass.
Cross-links for specialized lanes
| If your packet includes… | Also run… |
|---|---|
| LLM NPC features | Prompt registry freeze sprint |
| Save-heavy RPG | Save corruption test blog |
| Steam Deck slice | Deck case study |
| Browser demo | Truth audit challenge |
When to waive vs fix
Waivers are not failures if signed. Use waivers for:
- Known cosmetic FAQ wording lagging demo by 48 hours with patch scheduled.
- Partner-only metric deferred with documented roadmap.
Never waive:
- Rollup mismatches without epsilon justification.
- Missing bypass audit IDs.
- Parser allow-list drift on promoted build.
Unsigned waivers are how teams learn “we thought it was fine” from a rejection email.
Second tabletop muscle memory
Teams that run the tabletop twice in one cert window report:
- Dimension 1 time drops from 15 minutes to 6.
- Fewer arguments about file versions.
- Reviewers stop inventing new dimensions mid-meeting.
The first tabletop feels expensive. The second feels like insurance.
Engine-specific smoke (optional add-on, +30 min)
Not a scored dimension—run after tabletop if time:
- Unity 6.6 LTS: Revalidation checklist + Input preflight
- Godot 4.5: Web CI matrix receipts if browser demo ships in packet
- Deck: Verified autumn refresh replay hooks attached
Common mistakes
- Treating mock audit as marketing review — Capsule polish does not fix
rollup_mismatch. - Scoring without frozen manifest — Debating “latest export” burns the room.
- Single reviewer — One brain invents excuses; four roles surface gaps.
- Skipping dimension 7 — FAQ drift fails intake after technical greens.
- No T-3 — T-14 fixes regress before upload.
How this pairs with publisher diligence
Publisher diligence packets and milestone receipt habits reuse the same folder taxonomy. Running this tabletop before diligence calls prevents “send us another zip” loops that burn calendar.
Cold-open drill (five minutes)
Facilitator picks a random DEF-* ticket from last month and asks: “Show me the artifact that proves this is fixed in today’s build.” If the room hesitates, maintenance discipline slipped—return to Block 5 Friday ritual.
Key takeaways
- Q3 2026 intake rewards cross-artifact parity, not single PDF excellence.
- Run seven dimensions with explicit weights; total >= 80, no dimension < 3.
- Schedule T-14 for systemic fixes and T-3 for verification.
- Use four roles; one laptop as evidence clerk.
- Every red dimension becomes a deficiency ticket with acceptance criteria.
- Assemble a packet manifest with hashes before scoring.
- Pair with Lesson 172 and the Q3 tooling resource.
- FAQ and AI disclosure are scored, not optional narrative.
- Waivers require signer email, not chat agreements.
- Engine smoke is additive, not a substitute for governance dimensions.
FAQ
How is this different from the 7-day RC freeze challenge?
The RC freeze challenge builds daily gates during freeze week. This tabletop scores the assembled packet before partner upload—complementary, not duplicate.
Do solo developers need four people?
Use hats: facilitator + reviewer split across two sessions if needed. Minimum two distinct reviewers to reduce self-deception.
What if we fail dimension 1 only?
Do not upload. Fix dictionary and UTC, re-export both slices, rerun dimension 1 only, then full tabletop if anything touched semver or parser contracts.
Can we use spreadsheets instead of CSV exports?
Yes if partners accept them—still apply epsilon and bypass column rules. Document format in manifest.
Where does Deck Verified evidence go?
Under release-evidence/01-build/handheld/ with replay hook references from soft-lock replay article.
How long should deficiency tickets stay open?
Until acceptance criteria pass in staging build cited on ticket—close on evidence, not optimism.
Should we run mock audit before or after RC freeze?
Before partner upload, after you believe RC is feature-complete. Ideal sequence: 7-day RC freeze → mock audit tabletop → T-14 fixes → T-3 verification → upload.
What score should we target internally?
80 is pass; 85+ is healthy buffer for reviewer interpretation drift. Chasing 95 before first upload often delays shipping without reducing real risk.
How do we handle multiple SKUs (PC + Deck + XR)?
Run dimension 1–3 per SKU if partner annexes differ; run dimensions 4–7 once at studio level if shared governance. Document SKU scope in manifest targets[].
Does mock audit replace legal review of AI disclosure?
No. Dimension 7 checks presence and consistency of disclosure artifacts; counsel still owns regulatory wording.
What if our partner changes intake rules mid-quarter?
Re-run dimension 3 and 7 only if schema or FAQ requirements changed; full tabletop if rollup definitions change.
Can contractors attend?
Yes as reviewers if they can challenge evidence without defensiveness. Contractors should not be sole evidence clerk—knowledge transfer matters.
How does this relate to Lesson 172 homework?
Lesson 172 teaches the rubric; this article is the facilitation playbook for the same rubric in a live room. Do homework first if your team has never scored dimensions.
Red-team prompts (optional 15 minutes after tabletop)
Ask one reviewer to argue why intake should reject the packet. Common red-team hits:
- “Leadership CSV was exported Friday; partner annex Monday—different player cohorts.”
- “FAQ mentions co-op; demo is single-player only.”
- “Replay hash green but build ID in CI log does not match manifest.”
If red-team finds a gap scoring missed, add a process deficiency ticket to improve facilitator script—not to hide the gap.
Metrics to track across quarters
| Metric | Why |
|---|---|
| Minutes per dimension | Shows which discipline needs tooling |
| Open DEF count at upload | Should trend toward zero |
| Repeat reds same dimension | Signals systemic debt |
| T-3 pass rate | Measures whether T-14 fixes stuck |
Log these in your operating review Block 3 for one quarter—you will see whether governance investment pays off in fewer intake loops.
Remote and async tabletop variants
Distributed teams can run the same rubric without a single 90-minute Zoom block:
- Pre-read (async, 24h): Evidence clerk uploads manifest + folder zip; reviewers score dimensions privately on the worksheet.
- Sync (45m): Facilitator reads only disagreements and reds—do not re-walk greens.
- Close (async): Clerk files DEF tickets; owners comment with artifact links.
Rules that do not bend: frozen build ID, four distinct hats across the team (one person may wear two hats only if another reviewer is external), and no scoring until manifest hashes verify. Async saves calendar; it does not save you from exporting mismatched CSVs.
For teams in Asia–EU handoff time zones, schedule dimension 1 during overlap hours when both finance and engineering can answer rollup definition questions live.
Conclusion
Mock audit is how you control the submission window instead of letting the window control you. The seven dimensions look heavy until the second tabletop—then reviewers’ questions sound familiar instead of surprising.
Book T-14 and T-3 on the same calendar as Next Fest capsule work. When the real intake email arrives, you send a zip you have already failed and fixed in rehearsal.