Lesson 76: Waiver Renewal Decision Replay Checklist for Post-Promote Telemetry Slice Divergence in RPG Live-Ops
Lesson 75 locked promotion packets into audit exports. Exports preserve what reviewers believed; they do not prove live behavior still matches that belief an hour later.
This lesson adds a decision replay checklist: a bounded, repeatable comparison between packet fields (scorecard, playbook, corrective acceptance) and post-promote telemetry slices so teams catch silent divergence while mitigation is still cheap.

What this lesson solves
You need:
- A single checklist document operators run after promote, not an ad-hoc dashboard tour
- Explicit slice definitions (time window, cohort, metric set) so two replays are comparable
- A divergence log that links back to promotion_packet_id and, when applicable, export_batch_id from Lesson 75
Prerequisites: Lessons 74 (promotion packet row) and 75 (audit export log).
Expected time: 90-110 minutes including one dry-run replay on a past promote.
For governance vocabulary that overlaps release gates, keep wording aligned with 18 Free Release Gate Evidence Packet Templates for Indie Teams (2026 Q4) so replay findings do not invent a second taxonomy.
What you will build
- A waiver_renewal_decision_replay_checklist_policy.md contract
- A waiver_renewal_decision_replay_log.csv append-only schema
- One reference telemetry slice profile (JSON or table) your team reuses across promotes
Step 1 - Define replay policy
Create waiver_renewal_decision_replay_checklist_policy.md and specify:
- When replay is mandatory (for example: every promote, every hotfix that touches waiver lanes, or within N hours of deploy for regulated retention classes)
- Maximum latency between deploy complete and replay complete
- Who may sign a replay pass versus who must only observe
- Escalation rule when any checklist row is divergent or unknown (default: hold new waiver relaxations until resolved)
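The policy items above can also be mirrored in a small machine-readable form so tooling and the written contract do not drift apart. The following is a minimal sketch, not a prescribed implementation: the class name ReplayPolicy, the event kinds, the two-hour latency, and the helper names are all hypothetical placeholders for whatever your policy document actually states.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReplayPolicy:
    # All values are illustrative assumptions, not recommended defaults.
    mandatory_events: frozenset   # event kinds that always require a replay
    max_latency: timedelta        # deploy complete -> replay complete
    blocking_statuses: frozenset  # row statuses that hold new waiver relaxations


POLICY = ReplayPolicy(
    mandatory_events=frozenset({"promote", "waiver_lane_hotfix"}),
    max_latency=timedelta(hours=2),
    blocking_statuses=frozenset({"divergent", "unknown"}),
)


def replay_is_mandatory(event_kind: str, policy: ReplayPolicy = POLICY) -> bool:
    """True when the policy requires a replay for this kind of event."""
    return event_kind in policy.mandatory_events


def replay_deadline(deploy_completed_at: datetime, policy: ReplayPolicy = POLICY) -> datetime:
    """Latest time by which the replay must be complete."""
    return deploy_completed_at + policy.max_latency


def relaxations_on_hold(row_statuses: list[str], policy: ReplayPolicy = POLICY) -> bool:
    """Default escalation rule: any blocking row holds new waiver relaxations."""
    return any(status in policy.blocking_statuses for status in row_statuses)


if __name__ == "__main__":
    deployed = datetime(2026, 1, 5, 12, 0, tzinfo=timezone.utc)
    print(replay_is_mandatory("promote"), replay_deadline(deployed).isoformat())
    print(relaxations_on_hold(["aligned", "unknown"]))
```

Signer versus observer roles are left out here because they usually live in your access-control system rather than in replay code.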
Step 2 - Map packet fields to measurable signals
For each promotion packet column your team trusts, define one telemetry binding:
| packet field family | example live signals |
|---|---|
| closure scorecard lane | error rate, p95 latency, saturation index for that lane |
| playbook row completion | feature flag state, config version, job success ratio tied to that mitigation |
| corrective acceptance | test gate status, canary cohort health, debt burn metric |
| executive exception | cap counters, exposure meters, budget telemetry tied to the exception |
If a field has no honest signal, mark it non_replayable in policy and require a human attestation row instead of pretending dashboards cover it.
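One way to keep bindings honest is to store them as data and let a field with no signal fall through to attestation automatically. The sketch below assumes a simple upper-bound threshold per signal; the field names, signal names, thresholds, and the needs_attestation status are hypothetical, and real bindings may need ranges or ratios instead.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Binding:
    packet_field: str
    signal: Optional[str]                # None means no honest signal exists
    max_allowed: Optional[float] = None  # hypothetical upper bound for the signal

    @property
    def non_replayable(self) -> bool:
        return self.signal is None


# Illustrative bindings only; replace with your own packet fields and signals.
BINDINGS = [
    Binding("closure_scorecard.login_lane", "login_error_rate", 0.02),
    Binding("corrective_acceptance.canary", "canary_crash_rate", 0.01),
    Binding("executive_exception.rationale", None),
]


def evaluate(binding: Binding, observed: Dict[str, float]) -> str:
    """Return a row status for one binding given observed signal values."""
    if binding.non_replayable:
        return "needs_attestation"  # human attestation row, not a dashboard check
    value = observed.get(binding.signal)
    if value is None:
        return "unknown"            # signal is defined but was not observed
    return "aligned" if value <= binding.max_allowed else "divergent"


if __name__ == "__main__":
    observed = {"login_error_rate": 0.05}
    for b in BINDINGS:
        print(b.packet_field, evaluate(b, observed))
```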
Step 3 - Author waiver_renewal_decision_replay_log.csv
Append one row per replay execution. Suggested columns:
| column | purpose |
|---|---|
| replay_row_id | monotonic id |
| promotion_packet_id | Lesson 74 reference |
| export_batch_id | Lesson 75 pointer when export exists |
| deploy_marker | build id, git sha, or release tag |
| replay_slice_id | named profile (for example post_promote_t0_plus_2h_core_cohort) |
| slice_window_start_utc | inclusive |
| slice_window_end_utc | exclusive |
| replay_started_at_utc | |
| replay_completed_at_utc | |
| replay_operator_ack | who ran it |
| packet_to_telemetry_mapping_version | version of your binding table |
| overall_replay_verdict | aligned, divergent, inconclusive |
| divergence_summary | short text when not aligned |
| followup_ticket_id | empty when aligned |
| replay_signoff_lane | owner lane for the verdict |
Treat the log as append-only; corrections add a new row referencing correction_of_replay_row_id if your tooling supports it.
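A small writer can enforce the append-only habit by only ever opening the file in append mode and by assigning the next replay_row_id itself. This is a sketch under stated assumptions: the function name append_replay_row is hypothetical, correction_of_replay_row_id is included as an optional extra column, and the single-writer file access shown here would need locking if several operators write concurrently.

```python
import csv
import os

COLUMNS = [
    "replay_row_id", "promotion_packet_id", "export_batch_id", "deploy_marker",
    "replay_slice_id", "slice_window_start_utc", "slice_window_end_utc",
    "replay_started_at_utc", "replay_completed_at_utc", "replay_operator_ack",
    "packet_to_telemetry_mapping_version", "overall_replay_verdict",
    "divergence_summary", "followup_ticket_id", "replay_signoff_lane",
    "correction_of_replay_row_id",  # optional extra column for corrections
]
VERDICTS = {"aligned", "divergent", "inconclusive"}


def append_replay_row(path, row):
    """Append one replay row and return the replay_row_id assigned to it."""
    verdict = row.get("overall_replay_verdict")
    if verdict not in VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    if verdict != "aligned" and not row.get("divergence_summary"):
        raise ValueError("divergence_summary is required when not aligned")

    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    next_id = 1
    if not is_new:
        # Read existing ids so the new id stays monotonic.
        with open(path, newline="", encoding="utf-8") as f:
            ids = [int(r["replay_row_id"]) for r in csv.DictReader(f)]
        next_id = max(ids, default=0) + 1

    # Append mode only: existing rows are never rewritten.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, restval="")
        if is_new:
            writer.writeheader()
        writer.writerow({**row, "replay_row_id": next_id})
    return next_id
```

A correction is then just another call that sets correction_of_replay_row_id to the id of the row being corrected, leaving the original row in place.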
Step 4 - Build one reusable slice profile
Document replay_slice_id profiles so operators do not improvise windows under pressure. Each profile should list:
- cohort keys (region, platform, account tier, or percentage canary)
- metric list with query anchors or dashboard deep links
- expected stability assumptions (weekday vs weekend, event blackout)
Pro tip: Keep the first profile narrow. A two-hour window on your core paying cohort beats a twenty-four-hour global aggregate that hides regressions behind volume.
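A profile can be captured as plain data plus one function that turns it into a concrete window, so operators never compute start and end times by hand. In this sketch the cohort values, metric names, profile_version field, and the helper names slice_window and in_slice are hypothetical; only the profile id and the inclusive-start, exclusive-end convention come from the log schema in Step 3.

```python
from datetime import datetime, timedelta, timezone

# Illustrative profile; cohort keys and metrics are placeholder assumptions.
PROFILE = {
    "replay_slice_id": "post_promote_t0_plus_2h_core_cohort",
    "profile_version": 1,
    "window_offset_hours": 0,  # start this long after deploy complete
    "window_length_hours": 2,
    "cohort": {"region": "eu-west", "platform": "pc", "account_tier": "paying"},
    "metrics": ["login_error_rate", "p95_latency_ms"],
    "stability_assumptions": ["weekday", "no_event_blackout"],
}


def slice_window(profile, deploy_completed_at):
    """Return (start, end) for the slice: start inclusive, end exclusive."""
    start = deploy_completed_at + timedelta(hours=profile["window_offset_hours"])
    end = start + timedelta(hours=profile["window_length_hours"])
    return start, end


def in_slice(ts, start, end):
    """Half-open membership test matching the log's window columns."""
    return start <= ts < end


if __name__ == "__main__":
    t0 = datetime(2026, 1, 5, 12, 0, tzinfo=timezone.utc)
    start, end = slice_window(PROFILE, t0)
    print(start.isoformat(), end.isoformat())
```

Using a half-open window means two back-to-back slices never count the same event twice.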
Step 5 - Run a tabletop dry-run
Pick a historical promote with known outcome. Replay using only artifacts you would still have (packet row, export pointer, telemetry snapshots). Note every gap. Update policy and bindings before you rely on this for a live gate.
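During the dry run, gap-noting can be made mechanical by checking the artifacts and signals you expected against what actually survived. The helper below is a hypothetical sketch: the artifact keys packet_row and export_pointer and the function name find_replay_gaps are assumptions for illustration, and the snapshot is taken to be a simple mapping of signal name to value.

```python
def find_replay_gaps(required_signals, snapshot, artifacts):
    """List what a replay would be missing, in a stable order."""
    gaps = []
    # Artifacts the replay depends on besides telemetry.
    for name in ("packet_row", "export_pointer"):
        if not artifacts.get(name):
            gaps.append(f"missing artifact: {name}")
    # Signals named in the binding table but absent from the snapshot.
    for signal in required_signals:
        if signal not in snapshot:
            gaps.append(f"missing signal: {signal}")
    return gaps


if __name__ == "__main__":
    gaps = find_replay_gaps(
        required_signals=["login_error_rate", "canary_crash_rate"],
        snapshot={"login_error_rate": 0.01},
        artifacts={"packet_row": "pkt-1", "export_pointer": None},
    )
    for gap in gaps:
        print(gap)
```

Each reported gap is a candidate update to the policy or the binding table before the checklist is used as a live gate.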
Common mistakes
- Mistake: Replay becomes a generic health review. Fix: bind each step to a specific packet field; if it does not map, mark it non_replayable.
- Mistake: Slices drift between replays. Fix: freeze replay_slice_id versions and bump packet_to_telemetry_mapping_version when queries change.
- Mistake: Green replay while export is missing. Fix: Lesson 75 export completeness is a prerequisite row in your policy for regulated classes.
Mini challenge
- Take one live promotion_packet_id.
- List five packet fields and the exact telemetry query or dashboard tile that proves each.
- Identify one field that is only provable by human attestation and write the attestation wording.
FAQ
Is this redundant with canary analysis?
Canary analysis proves rollout safety for the binary. Replay proves decision documentation still matches observed live posture for waiver-specific claims.
How soon after promote should replay run?
That is a policy decision. Many teams run a first pass within two hours for fast feedback, then a second daily pass for slower-moving debt metrics.
What if telemetry is temporarily incomplete?
Log the replay as inconclusive with a reason, open a follow-up ticket, and treat that as yellow for new waiver relaxations until signals recover.
Lesson recap
You now have a decision replay checklist pattern that compares waiver promotion packets to bounded telemetry slices, logs divergence early, and stays linked to export and packet identifiers for audit continuity.
Next lesson teaser
Continue to Lesson 77: Waiver Renewal Replay Divergence Triage Queue with SLA, Severity Rubric, and Re-Promote Gates in RPG Live-Ops, which routes divergent and inconclusive replay rows into lane owners with SLA, severity, and explicit re-promote gates.
Related learning
- Lesson 75: Waiver Renewal Post-Promotion Audit Export for Rollback Evidence and Incident Lineage in RPG Live-Ops
- Lesson 74: Waiver Renewal Promotion Decision Packet Template for Scorecard, Playbook, Corrective Acceptance, and Executive Exceptions in RPG Live-Ops
- How to Score Forecast Calibration Drift Before Release Gates (Live-Ops, 2026)