Lesson 76: Waiver Renewal Decision Replay Checklist for Post-Promote Telemetry Slice Divergence in RPG Live-Ops

Lesson 75 locked promotion packets into audit exports. Exports preserve what reviewers believed; they do not prove live behavior still matches that belief an hour later.

This lesson adds a decision replay checklist: a bounded, repeatable comparison between packet fields (scorecard, playbook, corrective acceptance) and post-promote telemetry slices so teams catch silent divergence while mitigation is still cheap.

What this lesson solves

You need:

A single checklist document operators run after promote, not an ad-hoc dashboard tour
Explicit slice definitions (time window, cohort, metric set) so two replays are comparable
A divergence log that links back to promotion_packet_id and, when applicable, export_batch_id from Lesson 75

Prerequisites: Lessons 74 (promotion packet row) and 75 (audit export log).
Expected time: 90-110 minutes including one dry-run replay on a past promote.

For governance vocabulary that overlaps release gates, keep wording aligned with 18 Free Release Gate Evidence Packet Templates for Indie Teams (2026 Q4) so replay findings do not invent a second taxonomy.

What you will build

A waiver_renewal_decision_replay_checklist_policy.md contract
A waiver_renewal_decision_replay_log.csv append-only schema
One reference telemetry slice profile (JSON or table) your team reuses across promotes

Step 1 - Define replay policy

Create waiver_renewal_decision_replay_checklist_policy.md and specify:

When replay is mandatory (for example: every promote, every hotfix that touches waiver lanes, or within N hours of deploy for regulated retention classes)
Maximum latency between deploy complete and replay complete
Who may sign a replay pass versus who must only observe
Escalation rule when any checklist row is divergent or unknown (default: hold new waiver relaxations until resolved)
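The policy items above can also be captured in a small machine-checkable form so tooling can flag late replays and unauthorized sign-offs. The following is a minimal sketch, not a prescribed implementation: the class name `ReplayPolicy`, the lane name `liveops_oncall`, and the two-hour latency budget are all hypothetical placeholders for values your own policy file would set.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReplayPolicy:
    # All values passed in below are illustrative assumptions, not recommended defaults.
    max_latency: timedelta   # budget from deploy complete to replay complete
    signers: frozenset       # lanes allowed to sign a replay pass
    hold_on: frozenset = frozenset({"divergent", "unknown"})


def replay_is_late(policy, deploy_completed_at, replay_completed_at):
    """True when the replay finished after the policy's latency budget."""
    return replay_completed_at - deploy_completed_at > policy.max_latency


def may_sign(policy, lane):
    """True when the lane may sign a pass rather than only observe."""
    return lane in policy.signers


def hold_waiver_relaxations(policy, row_statuses):
    """Default escalation: any divergent or unknown row holds new relaxations."""
    return any(status in policy.hold_on for status in row_statuses)


# Hypothetical policy values and timestamps for a dry run.
policy = ReplayPolicy(
    max_latency=timedelta(hours=2),
    signers=frozenset({"liveops_oncall"}),
)
deploy_done = datetime(2026, 1, 5, 10, 0, tzinfo=timezone.utc)
replay_done = datetime(2026, 1, 5, 12, 30, tzinfo=timezone.utc)

print(replay_is_late(policy, deploy_done, replay_done))
print(hold_waiver_relaxations(policy, ["aligned", "unknown"]))
```

Keeping the written policy in the markdown file and mirroring only the checkable parts in code avoids having two sources of truth for the prose rules.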

Step 2 - Map packet fields to measurable signals

For each promotion packet column your team trusts, define one telemetry binding:

packet field family	example live signals
closure scorecard lane	error rate, p95 latency, saturation index for that lane
playbook row completion	feature flag state, config version, job success ratio tied to that mitigation
corrective acceptance	test gate status, canary cohort health, debt burn metric
executive exception	cap counters, exposure meters, budget telemetry tied to the exception

If a field has no honest signal, mark it non_replayable in policy and require a human attestation row instead of pretending dashboards cover it.
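One way to keep the binding table honest is to store it as data and derive the attestation list from it. The sketch below assumes hypothetical field and signal names (`lane_error_rate`, `legal_review_note`, and so on); it only illustrates the non_replayable rule and is not a schema the lesson mandates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Binding:
    packet_field: str
    signal: Optional[str]  # None means no honest telemetry signal exists

    @property
    def mode(self):
        # A field without a signal is marked non_replayable, never guessed at.
        return "telemetry" if self.signal else "non_replayable"


# Hypothetical bindings; replace with the fields your packet actually carries.
BINDINGS = [
    Binding("closure_scorecard_lane", "lane_error_rate"),
    Binding("playbook_row_completion", "feature_flag_state"),
    Binding("corrective_acceptance", "canary_cohort_health"),
    Binding("legal_review_note", None),
]


def attestation_required(bindings):
    """Return packet fields that need a human attestation row."""
    return [b.packet_field for b in bindings if b.mode == "non_replayable"]


print(attestation_required(BINDINGS))
```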

Step 3 - Author `waiver_renewal_decision_replay_log.csv`

Append one row per replay execution. Suggested columns:

column	purpose
`replay_row_id`	monotonic id
`promotion_packet_id`	Lesson 74 reference
`export_batch_id`	Lesson 75 pointer when export exists
`deploy_marker`	build id, git sha, or release tag
`replay_slice_id`	named profile (for example `post_promote_t0_plus_2h_core_cohort`)
`slice_window_start_utc`	inclusive
`slice_window_end_utc`	exclusive
`replay_started_at_utc`	when the operator began the replay
`replay_completed_at_utc`	when the replay finished
`replay_operator_ack`	who ran it
`packet_to_telemetry_mapping_version`	version of your binding table
`overall_replay_verdict`	`aligned`, `divergent`, `inconclusive`
`divergence_summary`	short text when not aligned
`followup_ticket_id`	empty when aligned
`replay_signoff_lane`	owner lane for the verdict

Treat the log as append-only. To correct a row, append a new row that points at the original through an optional correction_of_replay_row_id column, if your tooling supports one.
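A minimal sketch of an append-only writer for this schema, using Python's standard `csv` module. The column list mirrors the table above; the helper name `append_replay_row` and the sample values are hypothetical. The example writes to an in-memory buffer so it runs anywhere; in practice you would open the CSV file in append mode (`open(path, "a", newline="")`).

```python
import csv
import io

COLUMNS = [
    "replay_row_id", "promotion_packet_id", "export_batch_id", "deploy_marker",
    "replay_slice_id", "slice_window_start_utc", "slice_window_end_utc",
    "replay_started_at_utc", "replay_completed_at_utc", "replay_operator_ack",
    "packet_to_telemetry_mapping_version", "overall_replay_verdict",
    "divergence_summary", "followup_ticket_id", "replay_signoff_lane",
]
VERDICTS = {"aligned", "divergent", "inconclusive"}


def append_replay_row(stream, row, write_header=False):
    """Validate one replay row and append it; existing rows are never rewritten."""
    verdict = row.get("overall_replay_verdict")
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if verdict != "aligned" and not row.get("divergence_summary"):
        raise ValueError("divergence_summary is required when not aligned")
    # restval leaves optional columns (for example export_batch_id) empty.
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, restval="")
    if write_header:
        writer.writeheader()
    writer.writerow(row)


# Hypothetical sample row written to an in-memory buffer.
buffer = io.StringIO()
append_replay_row(
    buffer,
    {
        "replay_row_id": "1",
        "promotion_packet_id": "pkt-0001",
        "deploy_marker": "release-example",
        "replay_slice_id": "post_promote_t0_plus_2h_core_cohort",
        "overall_replay_verdict": "aligned",
    },
    write_header=True,
)
print(buffer.getvalue())
```

Rejecting unknown verdicts at write time keeps the three-value vocabulary from drifting as more operators use the log.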

Step 4 - Build one reusable slice profile

Document replay_slice_id profiles so operators do not improvise windows under pressure. Each profile should list:

cohort keys (region, platform, account tier, or percentage canary)
metric list with query anchors or dashboard deep links
expected stability assumptions (weekday vs weekend, event blackout)

Pro tip: Keep the first profile narrow. A two-hour window on your core paying cohort beats a twenty-four-hour global aggregate that hides regressions behind volume.
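As an illustration, a narrow first profile could look like the JSON below, with the slice window derived from the deploy timestamp rather than typed by hand. Every key other than `replay_slice_id` (for example `window_hours`, `query_anchor`, `profile_version`) and all cohort values are assumptions made for this sketch, not a required format.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical profile; adapt keys and values to your own telemetry stack.
PROFILE_JSON = """
{
  "replay_slice_id": "post_promote_t0_plus_2h_core_cohort",
  "profile_version": 1,
  "window_hours": 2,
  "cohort": {"region": "eu", "platform": "pc", "account_tier": "paying"},
  "metrics": [
    {"name": "lane_error_rate", "query_anchor": "dashboards/waiver-lanes#error-rate"}
  ],
  "stability_assumptions": ["weekday", "no_event_blackout"]
}
"""


def slice_window(profile, deploy_completed_at):
    """Return (start, end): start is inclusive, end is exclusive."""
    start = deploy_completed_at
    end = start + timedelta(hours=profile["window_hours"])
    return start, end


profile = json.loads(PROFILE_JSON)
start, end = slice_window(profile, datetime(2026, 1, 5, 10, 0, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
```

Computing the window from the profile means two operators replaying the same deploy land on identical `slice_window_start_utc` and `slice_window_end_utc` values.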

Step 5 - Run a tabletop dry-run

Pick a historical promote with known outcome. Replay using only artifacts you would still have (packet row, export pointer, telemetry snapshots). Note every gap. Update policy and bindings before you rely on this for a live gate.

Common mistakes

Mistake: Replay becomes a generic health review. Fix: bind each step to a specific packet field; if it does not map, mark it non_replayable.
Mistake: Slices drift between replays. Fix: freeze replay_slice_id versions and bump packet_to_telemetry_mapping_version when queries change.
Mistake: Green replay while export is missing. Fix: Lesson 75 export completeness is a prerequisite row in your policy for regulated classes.

Mini challenge

Take one live promotion_packet_id.
List five packet fields and the exact telemetry query or dashboard tile that proves each.
Identify one field that is only provable by human attestation and write the attestation wording.

FAQ

Is this redundant with canary analysis?

Canary analysis proves rollout safety for the binary. Replay proves decision documentation still matches observed live posture for waiver-specific claims.

How soon after promote should replay run?

That is a policy decision. Many teams run a first pass within two hours for fast feedback, then a second daily pass for slower-moving debt metrics.

What if telemetry is temporarily incomplete?

Log the verdict as inconclusive with a reason, open a follow-up ticket, and treat the replay as yellow: hold new waiver relaxations until the signals recover.
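One possible way to roll per-row results up into `overall_replay_verdict` is sketched below. The precedence used here (divergent beats inconclusive, and an empty checklist counts as inconclusive) is an assumption for illustration; your policy file should state the actual rule.

```python
VERDICTS = ("aligned", "divergent", "inconclusive")


def overall_verdict(row_results):
    """Roll per-row checklist results up into one replay verdict."""
    results = list(row_results)
    unknown = set(results) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unknown row results: {sorted(unknown)}")
    if "divergent" in results:
        return "divergent"
    # Assumption: no rows checked, or any missing signal, is not a pass.
    if not results or "inconclusive" in results:
        return "inconclusive"
    return "aligned"


def relaxations_allowed(verdict):
    """Only a fully aligned replay lets new waiver relaxations proceed."""
    return verdict == "aligned"


print(overall_verdict(["aligned", "inconclusive"]))
print(relaxations_allowed(overall_verdict(["aligned", "aligned"])))
```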

Lesson recap

You now have a decision replay checklist pattern that compares waiver promotion packets to bounded telemetry slices, logs divergence early, and stays linked to export and packet identifiers for audit continuity.

Next lesson teaser

Continue to Lesson 77: Waiver Renewal Replay Divergence Triage Queue with SLA, Severity Rubric, and Re-Promote Gates in RPG Live-Ops, which routes divergent and inconclusive replay rows into lane owners with SLA, severity, and explicit re-promote gates.