Lesson 77: Waiver Renewal Replay Divergence Triage Queue with SLA, Severity Rubric, and Re-Promote Gates in RPG Live-Ops

Lesson 76 produces divergent or inconclusive replay rows. Without a triage queue, those rows become meeting anecdotes instead of owned work that someone is accountable for closing.

This lesson defines a replay divergence triage queue: severity, SLA, owner lane, and re-promote gates that block the next waiver relaxation until a fresh replay passes.

What this lesson solves

You need:

  1. One append-only triage log keyed to replay_row_id from Lesson 76
  2. A severity rubric that maps divergence to response time and escalation
  3. Explicit re-promote gates (what must be true before another promote or waiver change ships)

Prerequisites: Lesson 76 replay log fields; Lesson 74 packet identifiers.
Expected time: 90-105 minutes including one tabletop triage drill.

What you will build

  1. A waiver_renewal_replay_divergence_triage_policy.md contract
  2. A waiver_renewal_replay_divergence_triage_queue.csv append-only schema
  3. A short severity table referenced by policy (copy into the same doc or link)

Step 1 - Define triage policy

In waiver_renewal_replay_divergence_triage_policy.md, specify:

  • which overall_replay_verdict values open a triage row (divergent and inconclusive; optionally also aligned replays that required a manual override)
  • default owner lane by divergence class (scorecard vs playbook vs corrective vs telemetry gap)
  • SLA clock start (replay completion timestamp versus triage row creation)
  • freeze rule for new waiver relaxations while any sev-1 or sev-2 triage row is open
  • re-promote gate definition: a new replay row id that shows aligned after mitigation, tied to the same promotion_packet_id lineage or its documented successor
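The trigger and freeze rules above can be sketched as two small predicates. This is a minimal illustration, not a prescribed implementation: the owner lane names, the manual_override flag, and the reading of "open" as "any status other than closed" are assumptions to replace with whatever your policy doc actually says.

```python
# Verdicts that always open a triage row, per the policy bullets.
TRIAGE_VERDICTS = {"divergent", "inconclusive"}

# Hypothetical default owner lanes per divergence class; substitute your own.
DEFAULT_OWNER_LANE = {
    "scorecard_drift": "scorecard",
    "playbook_incomplete": "playbook",
    "corrective_gap": "corrective",
    "telemetry_gap": "telemetry",
}

# Severities that freeze new waiver relaxations while a row is still open.
BLOCKING_SEVERITIES = {"sev-1", "sev-2"}


def opens_triage(overall_replay_verdict: str, manual_override: bool = False) -> bool:
    """Return True when a replay verdict should open a triage row."""
    if overall_replay_verdict in TRIAGE_VERDICTS:
        return True
    # Optional policy branch (assumption): aligned replays that needed a manual override.
    return overall_replay_verdict == "aligned" and manual_override


def relaxations_frozen(triage_rows: list) -> bool:
    """Freeze rule: any non-closed sev-1 or sev-2 row blocks new relaxations."""
    return any(
        row["severity"] in BLOCKING_SEVERITIES and row["triage_status"] != "closed"
        for row in triage_rows
    )
```

Keeping both rules as pure functions over row dictionaries makes them easy to run against the CSV queue during a promote check.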

Step 2 - Severity rubric

Use a simple ordered scale so executives and ICs share language:

severity | meaning | example SLA to first meaningful update
sev-1 | customer-visible or regulator-adjacent mismatch vs packet claims | policy-defined hours
sev-2 | internal material risk, not yet user-visible | policy-defined business day
sev-3 | explainable telemetry gap, bounded mitigation | policy-defined days

Write one paragraph per level in policy so severity cannot be negotiated live without updating the doc.
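Once the rubric is fixed, sla_due_at_utc can be derived mechanically from severity and the SLA clock start. The sketch below assumes placeholder windows (4, 24, and 72 hours) purely for illustration; the real values are policy-defined. It also uses wall-clock hours, so a "business day" SLA would need a calendar-aware variant.

```python
from datetime import datetime, timedelta, timezone

# Placeholder SLA windows in hours (assumption); copy the real values from policy.
SLA_HOURS = {"sev-1": 4, "sev-2": 24, "sev-3": 72}


def sla_due_at_utc(severity: str, clock_start_utc: datetime) -> datetime:
    """Return the UTC deadline for the first meaningful update."""
    if clock_start_utc.tzinfo is None:
        # Naive timestamps are ambiguous, so refuse them rather than guess a zone.
        raise ValueError("clock_start_utc must be timezone-aware")
    window = timedelta(hours=SLA_HOURS[severity])
    return clock_start_utc.astimezone(timezone.utc) + window
```

An unknown severity raises KeyError, which is deliberate here: a row with a severity outside the rubric should fail loudly instead of receiving a silent default.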

Step 3 - Author waiver_renewal_replay_divergence_triage_queue.csv

Append one row when a replay opens triage. Suggested columns:

column | purpose
triage_row_id | monotonic id
replay_row_id | Lesson 76 pointer
promotion_packet_id | Lesson 74 pointer
export_batch_id | Lesson 75 pointer when relevant
divergence_class | short label (scorecard_drift, playbook_incomplete, telemetry_gap, etc.)
severity | sev-1, sev-2, or sev-3
owner_lane | accountable lane
triage_opened_at_utc | UTC timestamp when the triage row was opened
sla_due_at_utc | from policy
triage_status | open, mitigating, awaiting_replay, closed
mitigation_summary | what changed
repromote_gate_replay_row_id | empty until a passing replay exists
closed_at_utc | UTC timestamp when the triage row was closed
closure_ack | who accepted closure

Corrections append a new row with correction_of_triage_row_id if your tooling supports it.
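A minimal sketch of an append-only writer for this schema, using Python's standard csv module. The function name append_triage_row is hypothetical, and correction_of_triage_row_id is included as an optional trailing column on the assumption that your tooling supports correction rows.

```python
import csv
from pathlib import Path

# Column order follows the suggested schema; the last column is optional.
COLUMNS = [
    "triage_row_id", "replay_row_id", "promotion_packet_id", "export_batch_id",
    "divergence_class", "severity", "owner_lane", "triage_opened_at_utc",
    "sla_due_at_utc", "triage_status", "mitigation_summary",
    "repromote_gate_replay_row_id", "closed_at_utc", "closure_ack",
    "correction_of_triage_row_id",
]


def append_triage_row(path: str, row: dict) -> None:
    """Append one row to the queue; existing rows are never rewritten."""
    unknown = set(row) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    target = Path(path)
    # Write the header only when the file is new or empty.
    needs_header = not target.exists() or target.stat().st_size == 0
    with target.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
        if needs_header:
            writer.writeheader()
        writer.writerow(row)
```

Opening the file in append mode keeps the log append-only at the code level. It does not stop someone from editing the file by hand, so pair it with version control or file permissions if that matters for your audit trail.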

Step 4 - Re-promote gates

A re-promote gate is not a generic green build. It is:

  • a new replay_row_id with overall_replay_verdict = aligned
  • using a packet_to_telemetry_mapping_version no older than the version that produced the divergence (any version bump must be documented with reviewer ack)
  • logged in the triage row before triage_status moves to closed

If you ship a hotfix without replay, policy should force sev-1 or sev-2 treatment and document the exception in the packet lineage.
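The three gate conditions can be checked in one function before a row is allowed to close. This sketch assumes replay rows are dictionaries carrying the field names used above and that packet_to_telemetry_mapping_version is a comparable integer; both are illustrative assumptions about the Lesson 76 log. The reviewer ack for version bumps and the packet lineage check are left out and would need their own fields.

```python
def repromote_gate_passes(triage_row: dict, gate_replay: dict, divergent_replay: dict) -> bool:
    """Return True when the gate replay satisfies all three re-promote conditions."""
    # Condition 1: a new replay row (not the divergent one) with an aligned verdict.
    is_new = gate_replay["replay_row_id"] != divergent_replay["replay_row_id"]
    is_aligned = gate_replay["overall_replay_verdict"] == "aligned"
    # Condition 2: mapping version no older than the one that diverged.
    version_ok = (
        gate_replay["packet_to_telemetry_mapping_version"]
        >= divergent_replay["packet_to_telemetry_mapping_version"]
    )
    # Condition 3: the gate replay id is already logged on the triage row.
    is_logged = triage_row.get("repromote_gate_replay_row_id") == gate_replay["replay_row_id"]
    return is_new and is_aligned and version_ok and is_logged
```

Call it as the last check before setting triage_status to closed, so a green build alone can never close a row.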

Step 5 - Tabletop drill

Take one historical divergent replay narrative. Walk through the flow: open a triage row, assign severity, pick the SLA, draft the mitigation, and write the repromote_gate_replay_row_id you would require. Note any tool gaps.

Common mistakes

  • Mistake: Closing triage on verbal agreement. Fix: require repromote_gate_replay_row_id or an explicit policy exception record.
  • Mistake: Severity debates in chat. Fix: cite rubric paragraph; change rubric in policy if reality changed.
  • Mistake: Duplicate triage rows for the same replay. Fix: one triage row per replay_row_id; reopening uses a new replay_row_id after a new promote.
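Two of these mistakes are easy to catch with a periodic lint over the queue. The sketch below is an assumption-laden illustration: the function names are hypothetical, rows are dictionaries keyed by the CSV column names, and correction rows (those with correction_of_triage_row_id set) are not counted as duplicates.

```python
from collections import Counter


def duplicate_replay_ids(triage_rows: list) -> list:
    """Return replay_row_id values that have more than one original triage row."""
    counts = Counter(
        row["replay_row_id"]
        for row in triage_rows
        if not row.get("correction_of_triage_row_id")  # skip correction rows
    )
    return sorted(rid for rid, seen in counts.items() if seen > 1)


def closed_without_gate(triage_rows: list) -> list:
    """Return triage_row_id values closed with no gate replay recorded."""
    return [
        row["triage_row_id"]
        for row in triage_rows
        if row["triage_status"] == "closed"
        and not row.get("repromote_gate_replay_row_id")
    ]
```

Rows flagged by closed_without_gate are not automatically wrong; each one should either gain a gate replay id or point to an explicit policy exception record.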

Mini challenge

  1. Write three divergence_class labels your team would actually use.
  2. Map each to a default owner_lane.
  3. Pick SLA hours that fit your release cadence without being fantasy.

FAQ

Do we triage every inconclusive replay?

Policy choice. Many teams triage all inconclusive as sev-3 until explained, because unknown state is promotion-unsafe for regulated lanes.

What if mitigation cannot finish before the next train?

Escalate with an explicit exception row on the promotion packet (Lesson 74) and keep triage open until the next aligned replay closes the loop.

Lesson recap

You now have a triage queue pattern that turns replay divergence into owned work with SLA, severity, and explicit re-promote gates tied to replay_row_id and promotion_packet_id.

Next lesson teaser

Continue to Lesson 78: Waiver Renewal Divergence Causal Factor Register and Monthly Executive Readout in RPG Live-Ops, which rolls recurring divergence_class themes into a factor register and monthly executive readouts.
