OpenXR Governance Replay Fails Because Partner SLA Transparency Snapshot Totals Disagree With Leadership Dashboard Rollup - How to Fix

Problem: A Quest OpenXR governance replay or tabletop review stops because the partner-facing SLA transparency annex shows totals, queue depths, or breach counters that do not match the leadership dashboard rollup for the same snapshot tuple revision. Reviewers treat the mismatch as evidence drift, block signoff, and ask you to prove which surface is canonical.

Who is affected: Release and live-ops teams running 2026 partner plus leadership dual-publish governance where both sides export CSV or PDF slices from tools that look equivalent but aggregate differently.

Fastest safe fix: Stop debating which number “feels” right. Freeze both exports under one metric dictionary revision, identical UTC window boundaries, the same tuple footer hash, and rerun a deterministic row-level diff with a published variance epsilon before any packet leaves the building.

How to confirm success: A replay script loads the partner annex and the leadership slice from the same revision pointer, their footer hashes match, and the numeric diff is within epsilon. Any row outside epsilon must show a bypass audit ID and a freeze context ID on both surfaces.

Why this issue spikes now

Partner annexes moved from informal email attachments to tuple-bound evidence in 2026 submission windows. Leadership dashboards picked up cert-window freeze and carve-out columns at the same time. When one export path rounds timestamps to local wall clock while the other stays on RFC 3339 UTC, totals diverge without anyone editing a formula.
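
A minimal sketch of how this divergence arises. The event timestamps are invented and the local zone is an assumed fixed UTC-7 offset, not taken from any real export. The same events are counted once over a UTC day and once over a local wall-clock day.

```python
from datetime import datetime, timedelta, timezone

# Invented breach events, stored as RFC 3339 UTC strings.
EVENTS = [
    "2026-03-10T01:30:00+00:00",
    "2026-03-10T02:00:00+00:00",
    "2026-03-10T12:00:00+00:00",
    "2026-03-11T03:15:00+00:00",
]


def count_in_day(events, day_start):
    """Count events in the half-open 24-hour window starting at day_start."""
    day_end = day_start + timedelta(days=1)
    return sum(day_start <= datetime.fromisoformat(e) < day_end for e in events)


# Leadership-style window: the UTC calendar day.
utc_day = datetime(2026, 3, 10, tzinfo=timezone.utc)
# Partner-style window: the same calendar date in an assumed UTC-7 local zone.
local_day = datetime(2026, 3, 10, tzinfo=timezone(timedelta(hours=-7)))

utc_total = count_in_day(EVENTS, utc_day)
local_total = count_in_day(EVENTS, local_day)
print(utc_total, local_total)
```

With these sample events the UTC day counts 3 and the local day counts 2, although no formula or source row changed.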

Replay reviewers now compare annex and rollup side by side because prior quarters taught them that “small rounding” hid real queue debt.

Direct answer

Publish one shared metric dictionary ID, export partner and leadership slices with identical UTC window labels bound to the same tuple revision, include bypass and freeze columns on both surfaces, and fail the publish pipeline when variance exceeds your documented epsilon unless each row carries an approved audit ID.

Root cause summary

Different aggregation windows: the partner annex ends at 23:59:59 local time while leadership closes on UTC day boundaries.
Lane filters: the leadership slice excludes a route class that the partner annex still counts.
Missing carve-out rows: a freeze bypass or emergency promotion changed the leadership numbers, but the annex export job ran before the change.
Stale cached CSV: a CDN or SharePoint snapshot serves yesterday’s export with today’s footer revision.
Dictionary drift: a column was renamed on one sheet, and the other still maps the old header to a different metric definition.

Fastest safe fix path

Mark both packets hold_numeric_mismatch until diff completes.
Re-export both surfaces using the same dictionary revision and UTC window parameters from a single runbook row.
Run automated diff (or scripted pivot compare) and classify every out-of-epsilon delta.
Attach missing bypass audit IDs or refresh carve-out annex rows until leadership and partner totals reconcile or every delta is explained.
Regenerate reviewer packet with new footer hash and replay.

Step-by-step fix

Step 1: Lock the snapshot tuple

Write down four fields and refuse mixed values:

snapshot_tuple_id
metric_dictionary_revision
export_window_start_utc
export_window_end_utc

Verification checkpoint: partner annex header and leadership slice header show identical values for all four fields.
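
The checkpoint above can be sketched as a small header comparison. The four field names come from this step; representing each export header as a plain dict, and the sample values, are assumptions made for illustration.

```python
TUPLE_FIELDS = (
    "snapshot_tuple_id",
    "metric_dictionary_revision",
    "export_window_start_utc",
    "export_window_end_utc",
)


def tuple_mismatches(partner_header, leadership_header):
    """Return {field: (partner_value, leadership_value)} for every field
    that is missing on either side or differs between the two headers."""
    mismatches = {}
    for field in TUPLE_FIELDS:
        partner_value = partner_header.get(field)
        leadership_value = leadership_header.get(field)
        if partner_value is None or leadership_value is None:
            mismatches[field] = (partner_value, leadership_value)
        elif partner_value != leadership_value:
            mismatches[field] = (partner_value, leadership_value)
    return mismatches


# Hypothetical header values, for illustration only.
partner = {
    "snapshot_tuple_id": "tuple-0042",
    "metric_dictionary_revision": "dict-r7",
    "export_window_start_utc": "2026-03-10T00:00:00Z",
    "export_window_end_utc": "2026-03-11T00:00:00Z",
}
leadership = dict(partner, metric_dictionary_revision="dict-r8")

print(tuple_mismatches(partner, leadership))
```

An empty result means the tuple is locked; any entry is a reason to stop before comparing numbers.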

Step 2: Prove UTC parity, not “close enough” local time

Store exports with UTC-labeled filenames and metadata, not implicit local time.
If a BI tool auto-shifts time zones, document the offset row used and mirror it on the partner job.
For daylight-saving weeks, run one extra sanity row comparing UTC hour histograms on both sides.

Verification checkpoint: first and last event timestamps in raw event feeds match within one second of the declared window on both extracts.
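
One way to script this checkpoint, as a sketch. It reads the checkpoint as two conditions: each extract's first and last events fall inside the declared window (with one second of slack), and the two extracts agree on those first and last timestamps to within one second. That reading, the function names, and the sample inputs are assumptions.

```python
from datetime import datetime, timedelta

TOLERANCE = timedelta(seconds=1)


def parse_utc(stamp):
    """Parse an RFC 3339 style stamp and reject anything not explicitly UTC."""
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    # utcoffset() is None for naive stamps, so those are rejected too.
    if parsed.utcoffset() != timedelta(0):
        raise ValueError(f"not an explicit UTC timestamp: {stamp}")
    return parsed


def boundary_problems(partner_stamps, leadership_stamps, window_start, window_end):
    """Return a list of human-readable problems; an empty list means parity."""
    start, end = parse_utc(window_start), parse_utc(window_end)
    partner = sorted(parse_utc(s) for s in partner_stamps)
    leadership = sorted(parse_utc(s) for s in leadership_stamps)
    if not partner or not leadership:
        return ["at least one extract has no events"]

    problems = []
    for name, stamps in (("partner", partner), ("leadership", leadership)):
        if stamps[0] < start - TOLERANCE or stamps[-1] > end + TOLERANCE:
            problems.append(f"{name} extract has events outside the declared window")
    if abs(partner[0] - leadership[0]) > TOLERANCE:
        problems.append("first event timestamps differ by more than one second")
    if abs(partner[-1] - leadership[-1]) > TOLERANCE:
        problems.append("last event timestamps differ by more than one second")
    return problems
```

Rejecting non-UTC stamps at parse time is deliberate: a silently shifted timestamp is exactly the failure this step is meant to surface.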

Step 3: Align filters and dimensions

List every dimension on leadership rollup (lane, route class, severity band, environment).
Confirm partner annex includes the same dimension set or documents explicit N/A with reason.
If leadership excludes experimental routes, partner annex must exclude the same keys—not a footnote buried on page seven.

Verification checkpoint: row counts per dimension match before summing KPI columns.
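
A sketch of the per-dimension row count comparison. The column names (lane, route_class, severity_band, environment) are assumed spellings of the dimensions listed in this step, and rows are assumed to be dicts such as those produced by csv.DictReader.

```python
from collections import Counter

# Assumed column names for the dimensions named in this step.
DIMENSIONS = ("lane", "route_class", "severity_band", "environment")


def dimension_count_diff(partner_rows, leadership_rows, dimensions=DIMENSIONS):
    """Return {dimension_key: (partner_count, leadership_count)} for every
    dimension combination whose row count differs between the two extracts."""

    def counts(rows):
        return Counter(tuple(row[d] for d in dimensions) for row in rows)

    partner, leadership = counts(partner_rows), counts(leadership_rows)
    return {
        key: (partner.get(key, 0), leadership.get(key, 0))
        for key in sorted(set(partner) | set(leadership))
        if partner.get(key, 0) != leadership.get(key, 0)
    }
```

A key that appears with a count of zero on one side usually points at a filter applied to only one export, such as an excluded experimental route class.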

Step 4: Surface carve-outs and bypass numbers as first-class columns

Replay failures spike when freeze bypass or emergency promotion changes totals but only leadership shows the adjustment.

Add bypass_audit_id and freeze_id columns to both partner and leadership exports when carve-outs exist.
If the partner legally cannot see certain carve-out classes, ship a redacted partner annex plus a sealed leadership annex with matching totals on shared columns and explicit “withheld class” counters that still sum.

Verification checkpoint: shared-column sums match across annex and leadership once carve-out rows are included on both sides, or are explicitly deferred with signed waiver text.
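
The arithmetic behind “withheld class” counters that still sum can be sketched as follows. The metric name breach_count, the row layout, and the sample class names are hypothetical; the point is only that the partner-visible total plus the withheld counters should equal the leadership total.

```python
def redaction_gap(leadership_rows, partner_rows, withheld_counters, metric="breach_count"):
    """Return leadership total minus (partner visible total + withheld counters).

    Zero means the redacted partner annex still reconciles with the sealed
    leadership annex. withheld_counters maps a withheld class name to the
    amount of `metric` hidden from the partner for that class.
    """
    leadership_total = sum(row[metric] for row in leadership_rows)
    partner_visible = sum(row[metric] for row in partner_rows)
    withheld_total = sum(withheld_counters.values())
    return leadership_total - (partner_visible + withheld_total)


# Hypothetical rows: one carve-out class is hidden from the partner.
leadership_rows = [
    {"route_class": "standard", "breach_count": 4},
    {"route_class": "emergency_promotion", "breach_count": 2, "bypass_audit_id": "BA-001"},
]
partner_rows = [{"route_class": "standard", "breach_count": 4}]
withheld = {"emergency_promotion": 2}

print(redaction_gap(leadership_rows, partner_rows, withheld))
```

If the withheld counter were left out of the partner annex, the gap here would be 2, which is the kind of unexplained delta that blocks a replay.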

Step 5: Kill stale cache exports

Invalidate CDN or portal cache keys tied to the tuple revision.
Re-download both files through the same path reviewers will use.
Store SHA-256 of each file beside the replay index.

Verification checkpoint: checksums on reviewer machine match the values in your handoff index.
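
A minimal sketch of the checksum check using Python's standard hashlib. The shape of the handoff index (a mapping from filename to expected hex digest) is an assumption; adapt it to whatever your replay index actually stores.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_mismatches(handoff_index, directory="."):
    """Return filenames that are missing or whose digest differs from the index.

    handoff_index is assumed to map filename -> expected hex digest.
    """
    mismatched = []
    for name, expected in handoff_index.items():
        path = Path(directory) / name
        if not path.is_file() or sha256_of(path) != expected.lower():
            mismatched.append(name)
    return mismatched
```

Run it on the reviewer machine against files downloaded through the reviewer path, not against the copies on the export host.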

Step 6: Add or tighten an epsilon diff gate in CI or publish scripts

Choose epsilon per metric (integer counts often use an epsilon of zero; rates may allow a tiny float tolerance).
Fail publish when out-of-epsilon rows lack variance_explanation_id.
Log diff output as an artifact attached to the governance job run.

Verification checkpoint: publish job cannot complete while unexplained variance rows exist.
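
A sketch of such a gate, under stated assumptions: rows are dicts keyed by a shared row key, the epsilon table maps metric name to tolerance, and a variance counts as explained only when the same variance_explanation_id appears on both surfaces. The metric names and epsilon values are illustrative, not recommended defaults.

```python
def unexplained_variance(partner_rows, leadership_rows, epsilon):
    """Return a list of (row_key, reason) for every unexplained variance.

    partner_rows / leadership_rows: {row_key: {metric: value, ...}} where a row
    may also carry "variance_explanation_id".
    epsilon: {metric: allowed absolute difference}.
    """
    failures = []
    for key in sorted(set(partner_rows) | set(leadership_rows)):
        partner = partner_rows.get(key)
        leadership = leadership_rows.get(key)
        if partner is None or leadership is None:
            failures.append((key, "row_missing_on_one_side"))
            continue
        # Assumption: the explanation must be present and identical on both rows.
        explanation = partner.get("variance_explanation_id")
        explained = explanation is not None and explanation == leadership.get(
            "variance_explanation_id"
        )
        for metric, allowed in epsilon.items():
            if abs(partner[metric] - leadership[metric]) > allowed and not explained:
                failures.append((key, metric))
    return failures


def publish_gate(partner_rows, leadership_rows, epsilon):
    """Abort the publish step with a non-zero exit if any variance is unexplained."""
    failures = unexplained_variance(partner_rows, leadership_rows, epsilon)
    if failures:
        raise SystemExit(f"publish blocked, unexplained variance: {failures}")
```

Calling publish_gate as the last step before upload means the job exits non-zero while unexplained rows exist; the failures list can be written out as the diff artifact.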

Step 7: Tabletop the replay once with printed checksums

Dry-run the exact reviewer sequence:

Print or PDF the replay index page that lists tuple ID, dictionary revision, and both file checksums.
Load partner annex first; circle the footer hash on the printout and read it aloud in the room.
Load leadership slice; confirm the same footer hash.
Walk the first three numeric columns and reconcile any delta before opening discussion topics.

Verification checkpoint: the tabletop log records no numeric discussion until checksum and dictionary rows are read verbatim.

When leadership uses BI fiscal weeks and partner uses rolling seven days

This hybrid causes predictable pain. Pick one window type for SLA evidence and demote the other view to informational only until it is regenerated with the same boundaries.

Document the chosen window type in the metric dictionary row for each KPI.
If finance insists on fiscal weeks, generate a partner annex fiscal-week companion tab that is clearly labeled—not mixed into rolling-seven-day tabs.
Never paste fiscal-week totals into a rolling-seven-day column header, even “temporarily” for a deck.

Alternative fixes when politics block raw row sharing

Shared aggregate table — both surfaces read from one warehouse view versioned by tuple ID.
Signer-visible reconciliation appendix — short table listing only mismatched metrics with owner and resolution state (use when raw rows cannot ship externally).
Third-party witness export — analytics vendor snapshot hashed independently when partner and leadership stacks cannot share a database.

Verification checklist before replay

[ ] Same metric_dictionary_revision on both headers
[ ] Same UTC window labels
[ ] Same dimension filters
[ ] Carve-out and bypass columns present or explicitly waived with signer text
[ ] File checksums recorded in replay index
[ ] Automated diff artifact attached with zero unexplained out-of-epsilon rows

Prevention

Run a weekly reconciliation job with signer-visible acknowledgment tied to job_run_id.
Ban “quick CSV from my laptop” exports for partner-facing annexes; only CI or scheduled jobs may mint numbered files.
Keep a single runbook row that names export owners, tools, and time zone assumptions per tuple revision.

FAQ

Do partner and leadership files have to come from the same database?

They must share one dictionary revision and one window definition. Physical database identity is optional if a warehouse view guarantees identical aggregation semantics.

What epsilon should we use for queue depth integers?

Default zero for integer counts unless your dictionary documents intentional smoothing. Publish any non-zero epsilon in the same packet revision as the metrics it governs.

Reviewer already saw an older annex. What now?

Ship a correction packet that bumps packet_revision, invalidates old checksums in the replay index, and states which tuples supersede prior annexes.

Can we approve leadership as canonical and ignore partner deltas?

Only with explicit signer governance text and a documented waiver path. Silent preference for one side is how replays fail twice.

What if only one team can access the warehouse view?

Use a witness export job run by a neutral owner (release engineering or analytics platform) that publishes hashed CSV to both teams from the same query revision. Both annex and leadership then cite witness_job_run_id in their headers.

Should we diff PDFs visually?

No. OCR and layout shifts create false positives. Diff structured extracts (CSV or parquet) derived from the same query revision, then render PDFs only for human reading after numbers agree.

If this unblocked your replay, bookmark the article for the next cert window. Share it with whoever owns the partner export job so UTC and dictionary revisions stop drifting apart under deadline pressure.