Lesson 174: Signer Route Fatigue Heat-Map, Backup Owner Promotion, and Escalation (2026)
Why this matters now
Lesson 173 gave you a recurrence trend board that answers "which dimension keeps going red." This lesson answers the operational counterpart: "who is one cert window away from missing an SLA, and what is the team's response when that signer's queue starts to slip."
The autumn 2026 load is the specific shock this lesson is built for. Steam's autumn Deck Verified refresh review intake opens in late September. Quest's holiday cert window overlaps from early October. PlayStation Indies and Xbox Game Preview moved their Q4 windows roughly two weeks earlier than in the 2025 calendar to clear shipping ahead of the November-December consumer push. The result: many small teams now find themselves with three concurrently open cert windows in a single eight-week stretch, and with the same one or two human signers named on every owner-route in their governance schema.
Live-ops threads through Q3 2026 name this exact failure mode: a single signer assigned to four owner-routes (publish-gate ack per Lesson 171, FAQ rewrite ack per Lesson 170, freeze-bypass ack per Lesson 163, exec readback co-sign per Lesson 164) misses one ack by a few hours during the Quest holiday window. That single miss becomes a tuple-drift block on the Steam upload because the publish-pipeline's reverse dependency on the FAQ ack is still walking through the gate, which blocks the freeze-lift dry run scheduled for the same evening, which blocks the partner annex export the next morning, which delays the cert reviewer's intake response by 24 hours. One tired signer, three cert windows each slipped by one calendar day.
Partner reviewers in late 2026 have started asking explicitly: "what is your backup owner promote policy, and have you exercised it during a real cert window in the past quarter?" A team that cannot point to a documented promote policy plus a real-fire exercise log gets a yellow on their year-end risk letter — and the year-end risk letter rolls into Q1 2027 intake decisions.
This lesson installs the fatigue heat-map, the backup-owner promote workflow, and the escalation ladder that makes the autumn cert overlap survivable without burning out the named signers.
Lesson objectives
By the end of this lesson you will have:
- A signer-route response-time table capturing ack latency per (signer, owner-route, cert-window) tuple from existing Lesson 170 + 171 + 163 + 164 ack events.
- A fatigue heat-map computing rolling 14-day load and identifying signers approaching threshold across multiple concurrent cert windows.
- A backup owner promote policy with deterministic trigger conditions, a signed-handoff schema, and an explicit revert path when the primary owner becomes available again.
- An escalation ladder that routes from "primary signer slow" through "backup promoted" to "incident lane invoked" with named thresholds and named owners at each rung.
- A monthly fatigue review that consumes the heat-map and proposes structural owner-route changes for the next quarter rather than relying on heroic individual response.
- A partner readback line that exports backup-promote exercise history into the Lesson 170 FAQ-bound packet for the autumn 2026 + Q4 partner readback cadence.
Prerequisites from earlier lessons
This lesson assumes:
- Lesson 170 FAQ-bound readback exists with faq_change_id ack rows recording signer + timestamp.
- Lesson 171 publish-gate block_reason_owner_route exists so reroute events are signed and auditable.
- Lesson 163 freeze-bypass audit trails exist with bypass_id + signer ack rows.
- Lesson 164 leadership-partner SLA dashboard sync exists with co-signed exec readback rows.
- Lesson 166 weekly SLA snapshot reconciliation exists so the heat-map can correlate ack-latency rows against snapshot timestamps.
- Lesson 162 SLA breach forecasting + cert-window freeze gates exists so the escalation ladder has a downstream blocker to invoke when the backup-owner lane also slips.
- Lesson 173 deficiency trend board exists so signer fatigue can be cross-referenced with dimensions trending hot (a tired signer guarding a hot dimension is a higher-priority backup-promote candidate).
If any of these is missing, build it first. The fatigue heat-map without the underlying ack tables produces fictional heat patterns; the trend-board cross-reference without Lesson 173 produces a heat-map that cannot be prioritised.
The signer-route response-time table
The starting fact: across the four prerequisite ack systems (FAQ rewrites, publish-gate owner routes, freeze-bypass approvals, exec readback co-signs), the team already records signer + timestamp + outcome per ack event. The lesson's first artifact normalises these into a single signer_ack_event table:
    CREATE TABLE signer_ack_event (
        ack_event_id      TEXT PRIMARY KEY,
        signer_id         TEXT NOT NULL,
        owner_route       TEXT NOT NULL,          -- 'publish_gate' | 'faq_rewrite' | 'freeze_bypass' | 'exec_readback'
        cert_window_id    TEXT,                   -- nullable; some acks span windows
        request_dt_utc    TIMESTAMPTZ NOT NULL,
        ack_dt_utc        TIMESTAMPTZ,            -- nullable until acked
        ack_outcome       TEXT,                   -- 'approved' | 'rejected' | 'reroute_requested' | 'timeout'
        sla_target_hours  NUMERIC(5,2) NOT NULL,
        ack_latency_hours NUMERIC(8,2)
            GENERATED ALWAYS AS (
                CASE
                    WHEN ack_dt_utc IS NULL THEN NULL
                    ELSE EXTRACT(EPOCH FROM (ack_dt_utc - request_dt_utc)) / 3600
                END
            ) STORED,
        -- PostgreSQL does not allow a generated column to reference another
        -- generated column, so the latency expression is repeated here.
        breached          BOOLEAN
            GENERATED ALWAYS AS (
                ack_dt_utc IS NOT NULL
                AND EXTRACT(EPOCH FROM (ack_dt_utc - request_dt_utc)) / 3600 > sla_target_hours
            ) STORED
    );
The two generated columns (ack_latency_hours, breached) are computed at insert and update time so the heat-map below does not have to recompute on every refresh. The sla_target_hours column captures the per-route SLA (publish-gate is typically a 30-minute first-response per Lesson 171; FAQ rewrite is typically a 4-hour ack per Lesson 170; freeze-bypass is typically a 2-hour ack per Lesson 163; exec readback is typically a 24-hour co-sign per Lesson 164).
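The arithmetic behind the two generated columns can be checked outside the database. The following is a minimal Python sketch, not part of the lesson's schema: the SLA figures are the typical values quoted above, and the function names are illustrative assumptions.

```python
from datetime import datetime, timezone
from typing import Optional

# Typical per-route SLA targets in hours, as quoted in the lesson text.
SLA_TARGET_HOURS = {
    "publish_gate": 0.5,    # 30-minute first response (Lesson 171)
    "faq_rewrite": 4.0,     # 4-hour ack (Lesson 170)
    "freeze_bypass": 2.0,   # 2-hour ack (Lesson 163)
    "exec_readback": 24.0,  # 24-hour co-sign (Lesson 164)
}


def ack_latency_hours(request_dt: datetime, ack_dt: Optional[datetime]) -> Optional[float]:
    """Mirror of the ack_latency_hours column: None until the ack lands."""
    if ack_dt is None:
        return None
    return (ack_dt - request_dt).total_seconds() / 3600


def breached(latency: Optional[float], sla_target_hours: float) -> bool:
    """Mirror of the breached column: a pending ack is never marked breached."""
    return latency is not None and latency > sla_target_hours


# Hypothetical FAQ-rewrite ack requested at 09:00 UTC and acked at 14:30 UTC.
requested = datetime(2026, 10, 6, 9, 0, tzinfo=timezone.utc)
acked = datetime(2026, 10, 6, 14, 30, tzinfo=timezone.utc)
latency = ack_latency_hours(requested, acked)
print(latency, breached(latency, SLA_TARGET_HOURS["faq_rewrite"]))  # 5.5 True
```

Note that a pending ack has no latency and therefore never counts as breached, however old it is; the pending_acks_aged trigger later in the lesson is what covers that case.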
Backfill the table from existing ack systems via an ETL job that runs nightly. Once backfilled, the table contains every ack event the team has produced — and the fatigue heat-map is two queries away.
The fatigue heat-map
The heat-map is a daily-refreshed materialised view that buckets each signer's recent load and surfaces who is at risk:
    CREATE MATERIALIZED VIEW signer_fatigue_heatmap_14d AS
    SELECT
        signer_id,
        COUNT(*)                                       AS ack_count_14d,
        COUNT(*) FILTER (WHERE breached = TRUE)        AS breached_count_14d,
        COUNT(DISTINCT cert_window_id)                 AS concurrent_window_count_14d,
        AVG(ack_latency_hours)                         AS avg_latency_hours_14d,
        AVG(ack_latency_hours / sla_target_hours)      AS avg_latency_ratio_14d,
        COUNT(*) FILTER (WHERE ack_outcome IS NULL)    AS pending_acks_now
    FROM signer_ack_event
    WHERE request_dt_utc >= NOW() - INTERVAL '14 days'
    GROUP BY signer_id;
The classification follows Lesson 173's deterministic-thresholds discipline:
| Classification | Trigger condition |
|---|---|
| red | concurrent_window_count_14d >= 3 AND breached_count_14d >= 1 |
| amber | concurrent_window_count_14d >= 3 OR avg_latency_ratio_14d >= 0.75 |
| recovering | previously red/amber, now avg_latency_ratio_14d < 0.5 for 7 consecutive days |
| green | all other rows |
red means: this signer is on three or more cert windows in the last fortnight and has already missed at least one SLA. Backup-owner promote should be triggered automatically for the next eligible owner-route. amber means: the signer is at risk — either by window count or by latency ratio (averaging 75% or more of their SLA window across the fortnight is the early warning). recovering is the bookkeeping state so the system does not flip back and forth between amber and green on a single quiet day. green is the healthy default.
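The four-bucket rules can be sketched as a small pure function. This is a hedged illustration rather than the lesson's implementation: HeatmapRow, classify, and the quiet_days counter are assumed names, and the sketch reads recovering as a holding state that a formerly red or amber signer stays in until the latency ratio has been below 0.5 for 7 consecutive days. The view itself stores no history, so the caller is assumed to track the previous classification and the quiet-day count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeatmapRow:
    concurrent_window_count_14d: int
    breached_count_14d: int
    avg_latency_ratio_14d: Optional[float]  # None when nothing has been acked yet


def classify(row: HeatmapRow, previous: str = "green", quiet_days: int = 0) -> str:
    """Classify one signer row.

    previous   -- yesterday's classification (tracked by the caller; an assumption)
    quiet_days -- consecutive days with avg_latency_ratio_14d < 0.5 (also tracked by the caller)
    """
    ratio = row.avg_latency_ratio_14d if row.avg_latency_ratio_14d is not None else 0.0
    if row.concurrent_window_count_14d >= 3 and row.breached_count_14d >= 1:
        return "red"
    if row.concurrent_window_count_14d >= 3 or ratio >= 0.75:
        return "amber"
    # Hold a formerly hot signer in 'recovering' until the 7-day buffer has elapsed.
    if previous in ("red", "amber", "recovering") and quiet_days < 7:
        return "recovering"
    return "green"


print(classify(HeatmapRow(3, 1, 0.9)))                                  # red
print(classify(HeatmapRow(1, 0, 0.3), previous="red", quiet_days=2))    # recovering
```

Checking red before amber matters because every red row also satisfies the amber window-count condition.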
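Refresh the view nightly with REFRESH MATERIALIZED VIEW CONCURRENTLY so the morning operating review reads a stable snapshot and readers are not blocked while the refresh runs. In PostgreSQL the CONCURRENTLY option requires a unique index on the view; a unique index on signer_id fits here because the view holds one row per signer. Pin the materialised view's content hash so any partner export referencing fatigue state is reproducible against a known point in time.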
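The lesson does not specify how the content hash is computed, so the following is only one possible approach: serialise the view's rows to canonical JSON (rows sorted by signer_id, keys sorted) and take the SHA-256 hex digest. The function name snapshot_hash and the row shape are assumptions for illustration; NUMERIC values fetched from the database would need converting to strings or floats before serialising.

```python
import hashlib
import json


def snapshot_hash(rows: list[dict]) -> str:
    """SHA-256 hex digest over a canonical JSON rendering of the heat-map rows."""
    # Sort rows and keys so the digest does not depend on fetch order.
    ordered = sorted(rows, key=lambda r: r["signer_id"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical rows as they might be fetched from signer_fatigue_heatmap_14d.
rows = [
    {"signer_id": "pat", "ack_count_14d": 3, "breached_count_14d": 0},
    {"signer_id": "alex", "ack_count_14d": 12, "breached_count_14d": 1},
]
print(snapshot_hash(rows))
```

Because ordering is normalised before hashing, two exports of the same view state produce the same digest even if the database returns rows in a different order.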
The backup owner promote policy
The policy has four required fields per owner-route: primary owner, named backup owner, promote trigger condition, revert condition. Store these in a backup_owner_policy table:
    CREATE TABLE backup_owner_policy (
        owner_route      TEXT PRIMARY KEY,       -- one row per route
        primary_signer   TEXT NOT NULL,
        backup_signer    TEXT NOT NULL,
        promote_trigger  TEXT NOT NULL,          -- deterministic condition string
        revert_condition TEXT NOT NULL,
        policy_version   TEXT NOT NULL,          -- semver, e.g. '1.2.0'
        approved_by      TEXT NOT NULL,          -- team-lead role, signer-acked
        approved_dt_utc  TIMESTAMPTZ NOT NULL,
        CHECK (primary_signer <> backup_signer)
    );
The CHECK constraint enforces a real backup: naming the same person as both primary and backup on a route defeats the purpose, and it is the failure mode that partner reviewers explicitly look for in their late-2026 audit questionnaires.
The promote trigger condition is one of three deterministic forms:
- heatmap_red: promote if the primary signer's row in signer_fatigue_heatmap_14d classifies as red. Automatic.
- pending_acks_aged: promote if the primary signer has any pending ack (one counted in pending_acks_now) aged past 150% of its sla_target_hours. Automatic.
- manual_invocation: promote on explicit team-lead invocation, requires both an audit reason and a signed ack from the team lead. Manual.
The first two are automatic and produce a backup_promote_event row without any human in the loop. The third is the planned-rest path used when the primary signer is going on holiday or attending a conference; the team lead invokes it ahead of time so the promote is preventive rather than reactive.
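The three trigger forms can be sketched as a single evaluation function. This is an illustrative sketch under stated assumptions: PendingAck, promote_trigger, and the parameter names are hypothetical, and the order in which the triggers are checked (heat-map first, then aged acks, then manual) is a choice made here, not something the lesson specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PendingAck:
    request_dt_utc: datetime
    sla_target_hours: float


def aged_past_150_percent(ack: PendingAck, now: datetime) -> bool:
    """True when a still-pending ack is older than 150% of its SLA target."""
    age_hours = (now - ack.request_dt_utc).total_seconds() / 3600
    return age_hours > 1.5 * ack.sla_target_hours


def promote_trigger(classification: str, pending: list[PendingAck], now: datetime,
                    manual_reason: Optional[str] = None,
                    team_lead_ack: bool = False) -> Optional[str]:
    """Return the trigger_type to record, or None when no promote is due."""
    if classification == "red":
        return "heatmap_red"
    if any(aged_past_150_percent(a, now) for a in pending):
        return "pending_acks_aged"
    # Manual promotes need both an audit reason and the team lead's signed ack.
    if manual_reason and team_lead_ack:
        return "manual_invocation"
    return None


now = datetime(2026, 10, 6, 12, 0, tzinfo=timezone.utc)
stale = PendingAck(now - timedelta(hours=7), sla_target_hours=4.0)  # 7h > 6h limit
print(promote_trigger("amber", [stale], now))  # pending_acks_aged
```

The returned string matches the trigger_type values that a promote event row is expected to carry.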
Every promote event writes a row to backup_promote_event:
    CREATE TABLE backup_promote_event (
        promote_event_id       TEXT PRIMARY KEY,
        owner_route            TEXT NOT NULL REFERENCES backup_owner_policy(owner_route),
        promoted_signer        TEXT NOT NULL,
        trigger_type           TEXT NOT NULL,   -- 'heatmap_red' | 'pending_acks_aged' | 'manual_invocation'
        trigger_evidence_ref   TEXT NOT NULL,   -- pointer to the heatmap snapshot hash or aged-ack row
        promoted_at_dt_utc     TIMESTAMPTZ NOT NULL,
        reverted_at_dt_utc     TIMESTAMPTZ,
        policy_version_at_time TEXT NOT NULL
    );
The trigger_evidence_ref column is the audit anchor. A heat-map snapshot hash plus the policy version at promote time produces a forensic trail that survives partner-reviewer scrutiny.
The revert condition runs symmetrically: when the primary signer's classification returns to green and stays green for 7 consecutive days, the promote reverts automatically. The backup_promote_event row's reverted_at_dt_utc populates and the owner-route's effective signer flips back. Forced revert (team-lead manual override) is allowed for emergencies but writes a separate forced_revert_event row with reason.
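The 7-consecutive-green-days check can be sketched as follows. It assumes the team keeps a per-day record of each signer's classification (the materialised view only holds the current state), and the function name and dictionary shape are hypothetical.

```python
from datetime import date, timedelta


def eligible_for_revert(daily_classification: dict[date, str], today: date,
                        required_days: int = 7) -> bool:
    """True when the primary signer was green on each of the last `required_days` days.

    A missing day counts as not green, so gaps in the history never cause a revert.
    """
    for offset in range(required_days):
        day = today - timedelta(days=offset)
        if daily_classification.get(day) != "green":
            return False
    return True


today = date(2026, 10, 20)
history = {today - timedelta(days=i): "green" for i in range(7)}
print(eligible_for_revert(history, today))  # True
```

Treating missing days as not green is the conservative choice: a skipped nightly refresh delays the revert rather than triggering it early.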
The escalation ladder
The ladder defines what happens when the backup also starts to slip. Three rungs:
Rung 1 — Backup promoted (automatic, from policy above). Primary signer's queue routes to backup. No human discussion required. Standard operational path.
Rung 2 — Backup also amber (manual review trigger). If the backup's row in signer_fatigue_heatmap_14d also classifies as amber or red within 48 hours of the promote, the team lead is paged. The team lead's response options are documented up front: (a) invoke a deferred-decision lane for the owner-route's non-critical acks, deferring them to the post-window cleanup; (b) re-route to a tertiary backup if one exists in backup_owner_policy_extended (a separate, more permissive table allowing chained backups for routes that need them); (c) invoke Lesson 162 cert-window freeze gate if the owner-route is on the critical path of an active cert submission and no human can ack within SLA.
Rung 3 — Critical-path freeze gate invoked (option c from rung 2). This is the explicit "we cannot ship this cert window safely" lane. The freeze gate halts the publish pipeline (Lesson 171) and the freeze-lift dry run (Lesson 169) until a tertiary owner is named, a window slip is communicated to the partner reviewer with a documented reason, or the load shifts back to a recoverable state. This rung is the rare-but-named option; teams that exercise rungs 1 and 2 routinely almost never reach rung 3.
The ladder's value is the explicit naming of each rung's owner and trigger. Without it, every signer fatigue event is a one-off discussion that consumes 30-60 minutes of leadership attention. With it, rungs 1 and 2 are mechanical, rung 3 is reserved for genuine emergencies, and the partner reviewer sees a documented response pattern.
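The rung thresholds can be written down as a small decision function that suggests which rung applies. This is a sketch only: the function and parameter names are assumptions, rung 0 is used here to mean "no promote active", and rung 3 remains a team-lead decision in the lesson, so the function's output is a recommendation for the page rather than an automatic action.

```python
def suggested_rung(promoted: bool, backup_classification: str, hours_since_promote: float,
                   on_critical_path: bool, ack_possible_within_sla: bool) -> int:
    """Suggest the escalation rung for one owner-route.

    0 -- no promote active
    1 -- backup promoted and holding
    2 -- backup amber/red within 48 hours of the promote: page the team lead
    3 -- critical path with no human able to ack within SLA: freeze-gate candidate
    """
    if not promoted:
        return 0
    backup_slipping = (backup_classification in ("amber", "red")
                       and hours_since_promote <= 48)
    if not backup_slipping:
        return 1
    if on_critical_path and not ack_possible_within_sla:
        return 3
    return 2


print(suggested_rung(True, "amber", 10, on_critical_path=False, ack_possible_within_sla=True))  # 2
```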
The monthly fatigue review
Add a monthly review (separate from the Friday operating review's Block 6, which runs weekly) that consumes the heat-map and proposes structural changes:
- Input: signer_fatigue_heatmap_14d rolled up over the prior calendar month, backup_promote_event history for the month, forced_revert_event rows, and cross-reference against Lesson 173's deficiency_trend_board to identify "tired signer guarding a hot dimension" overlaps.
- Discussion: Are there owner-routes where the structural load is too high for one primary + one backup? Are there manual_invocation patterns that should be promoted to automatic triggers? Has a tertiary backup been used twice or more in the month? If so, formalise it as a named backup in backup_owner_policy_extended.
- Output: A proposed backup_owner_policy semver increment (typically a minor bump) with explicit changes, scheduled for next-month rollout. Pin to backup_owner_policy_history with diff.
The monthly review is the slow-loop counterpart to the daily heat-map refresh and the weekly Block 6 trend conversation. It is the place structural owner-route problems get fixed rather than papered over by individual heroics.
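Two of the review's discussion questions are mechanical enough to precompute before the meeting. The sketch below is an assumption-laden illustration: it takes the month's promote events as dictionaries with a trigger_type key, takes tertiary-backup usage as a plain list of owner-route names (a hypothetical shape), and treats "manual dominates" as more than half of all promotes.

```python
from collections import Counter


def monthly_review_signals(promote_events: list[dict], tertiary_uses: list[str]) -> dict:
    """Summarise the month's promote history for the fatigue review."""
    by_trigger = Counter(e["trigger_type"] for e in promote_events)
    total = sum(by_trigger.values())
    manual_share = by_trigger["manual_invocation"] / total if total else 0.0
    tertiary_counts = Counter(tertiary_uses)
    return {
        "manual_share": manual_share,
        # Most promotes being manual suggests the automatic triggers are mistuned.
        "manual_dominates": manual_share > 0.5,
        # Routes whose tertiary backup was used twice or more should be formalised.
        "formalise_tertiary_for": sorted(r for r, n in tertiary_counts.items() if n >= 2),
    }


events = [
    {"trigger_type": "manual_invocation"},
    {"trigger_type": "manual_invocation"},
    {"trigger_type": "heatmap_red"},
]
signals = monthly_review_signals(events, ["publish_gate", "publish_gate", "faq_rewrite"])
print(signals["manual_dominates"], signals["formalise_tertiary_for"])  # True ['publish_gate']
```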
The partner readback line
Q4 2026 partner readbacks now ask for backup-promote exercise history. Add a section to the Lesson 170 FAQ-bound readback packet:
    ## Signer Route Fatigue and Backup Owner Promote Exercise History

    **Reporting window:** [YYYY-Q[N]]
    **Backup owner policy version at window open:** [semver]
    **Backup owner policy version at window close:** [semver]

    ### Backup promotes this window

    | Owner route | Trigger type | Promoted to | Duration (days) | Reverted |
    |---|---|---|---|---|
    | [Route] | [Trigger] | [Backup signer] | [N] | [Y/N] |

    ### Escalation rung events

    - Rung 1 promotes: [N]
    - Rung 2 manual reviews: [N]
    - Rung 3 freeze gates invoked: [N] (each rung-3 row links to the freeze-gate audit record)

    ### Heat-map snapshot at window close

    [Embed the four-row table: red / amber / recovering / green counts]
    Heat-map snapshot hash: [SHA-256 hex]
The hash binds the readback to a specific materialised view state so the partner reviewer can request the underlying rows. Teams that ship this readback section in autumn 2026 report partner reviewer feedback shifting from "what is your backup policy?" (a yellow-flag question) to "is the tertiary backup column overcrowded?" (a green-flag improvement-discussion question).
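Filling the "Backup promotes this window" table from promote-event rows can be sketched as below. The dictionary keys follow the backup_promote_event columns; the function name, the sample signers, and the choice to measure an unreverted promote up to the window close are assumptions made for illustration.

```python
from datetime import datetime, timezone


def render_promote_rows(events: list[dict], window_close: datetime) -> str:
    """Render promote events as the readback's markdown table."""
    lines = [
        "| Owner route | Trigger type | Promoted to | Duration (days) | Reverted |",
        "|---|---|---|---|---|",
    ]
    for e in events:
        # An unreverted promote is measured up to the close of the reporting window.
        end = e["reverted_at_dt_utc"] or window_close
        duration_days = (end - e["promoted_at_dt_utc"]).days
        reverted = "Y" if e["reverted_at_dt_utc"] else "N"
        lines.append(
            f"| {e['owner_route']} | {e['trigger_type']} | {e['promoted_signer']} "
            f"| {duration_days} | {reverted} |"
        )
    return "\n".join(lines)


utc = timezone.utc
window_close = datetime(2026, 12, 31, tzinfo=utc)
sample_events = [
    {"owner_route": "publish_gate", "trigger_type": "heatmap_red", "promoted_signer": "pat",
     "promoted_at_dt_utc": datetime(2026, 10, 6, 9, 14, tzinfo=utc),
     "reverted_at_dt_utc": datetime(2026, 10, 14, 9, 14, tzinfo=utc)},
    {"owner_route": "faq_rewrite", "trigger_type": "manual_invocation", "promoted_signer": "sam",
     "promoted_at_dt_utc": datetime(2026, 12, 26, tzinfo=utc),
     "reverted_at_dt_utc": None},
]
print(render_promote_rows(sample_events, window_close))
```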
Common mistakes to avoid
- Naming the same human as primary on one owner-route and backup on another: superficially OK on each route alone, but the heat-map will show one human carrying the load, and the CHECK constraint does not catch cross-route overlap. Audit backup_owner_policy as a whole, not per row.
- Setting promote thresholds so high that they never trigger: if no heatmap_red row produces a promote event in a full month, the thresholds are wrong, not the team. Lower the threshold until the system actually promotes.
- Treating recovering as green: a signer fresh out of red who immediately gets re-loaded flips back to red within days. The 7-day recovering buffer matters.
- Allowing manual_invocation to dominate over automatic triggers: manual is for planned rest. If most promotes are manual, the automatic triggers are tuned wrong and the team is making ad-hoc calls instead of running the policy.
- Forgetting to pin the heat-map snapshot hash in the readback: without the hash, "we promoted three times last quarter" is unverifiable, and partner reviewers know it.
- Treating fatigue as a personal performance issue: the heat-map measures load, not capability. Conflating the two destroys honest reporting, and within two quarters signers under-report their queue depth to avoid being labelled "slow."
- Skipping the monthly review during a quiet month: quiet months are exactly when structural changes are safe to roll out, and skipping the review then guarantees the next loud month is reactive again.
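The whole-table audit from the first mistake above can be sketched in a few lines. It assumes the backup_owner_policy rows have been fetched as dictionaries keyed by column name; the function name and sample signers are hypothetical.

```python
from collections import defaultdict


def cross_route_overlaps(policies: list[dict]) -> dict[str, list[str]]:
    """Map each signer who is primary on one route and backup on another to their routes."""
    primary_routes = defaultdict(list)
    backup_routes = defaultdict(list)
    for p in policies:
        primary_routes[p["primary_signer"]].append(p["owner_route"])
        backup_routes[p["backup_signer"]].append(p["owner_route"])
    overlaps = {}
    # The per-row CHECK cannot see this: the same person appears in both roles across rows.
    for signer in primary_routes.keys() & backup_routes.keys():
        overlaps[signer] = sorted(primary_routes[signer] + backup_routes[signer])
    return overlaps


policies = [
    {"owner_route": "publish_gate", "primary_signer": "alex", "backup_signer": "pat"},
    {"owner_route": "faq_rewrite", "primary_signer": "pat", "backup_signer": "alex"},
    {"owner_route": "freeze_bypass", "primary_signer": "sam", "backup_signer": "kim"},
]
print(cross_route_overlaps(policies))
```

In the sample data, alex and pat back each other up on two routes, so a promote on either route moves load onto someone who is already a primary elsewhere.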
Verification checklist
- [ ] signer_ack_event table exists, backfilled from FAQ + publish-gate + freeze-bypass + exec-readback ack systems, with generated ack_latency_hours and breached columns.
- [ ] signer_fatigue_heatmap_14d materialised view exists, refreshes nightly with CONCURRENTLY, classifies rows on the four-bucket deterministic rules.
- [ ] backup_owner_policy table exists with one row per owner-route, CHECK constraint preventing primary = backup, policy semver versioning, signed approval row.
- [ ] backup_promote_event + forced_revert_event tables exist and have produced at least one row each (run a deliberate manual invocation during the first week to seed the audit).
- [ ] Escalation ladder rung definitions are pinned in the team runbook (rung 1 automatic, rung 2 manual review with documented options, rung 3 critical-path freeze).
- [ ] Monthly fatigue review is on the calendar with named attendees (team lead + at least one signer + one engineer).
- [ ] Lesson 170 FAQ-bound readback packet has the signer-route-fatigue section added with the table template plus the snapshot hash field.
- [ ] At least one backup-promote exercise has been executed end-to-end with full audit trail before the first autumn 2026 cert window opens.
If any item is missing, fix it before late September. The autumn cert overlap will not wait for the heat-map to come online.
What you have just earned
After this lesson the team has a deterministic answer to the 2026 Q4 partner question: which signers are tired, what is the policy when a primary slips, has the policy been exercised, and what are the structural improvements proposed for the next quarter. The answer is mechanical (heat-map classification rules), audited (snapshot hash on the export), and routine (monthly review).
The autumn 2026 cert overlap (Steam Deck Verified refresh + Quest holiday + PlayStation Indies + Xbox Game Preview windows compressing into eight weeks) becomes a load that the system absorbs rather than a sequence of personal-heroism scrambles. Backup-owner promotes happen automatically on heatmap_red. Manual invocations happen ahead of planned rest. Rung 2 manual reviews happen rarely. Rung 3 freeze gates almost never.
Most importantly, the team's relationship to its own human signers changes. Instead of "Alex is overloaded again," the conversation is "Alex's signer_fatigue_heatmap_14d row is red for the third consecutive day; the policy promoted Pat at 9:14 UTC; we will revert on the standard 7-day recovery." That language is what mature governance looks like to partner reviewers in late 2026 — and the team that runs it routinely is the team that ships through autumn cert overlap without breaking.
Next lesson teaser
The next lesson (Lesson 175: WORM-Style Archived Submission Packet Retention and 90-Minute Cold-Storage Retrieval Drill (2026)) covers formal retention policy for post-submit packets, cold-storage pointer schema with hashed fixture parity against Lesson 165 footer semver, and a 90-minute retrieval drill mirroring Lesson 169's dry-run discipline. Off-cycle January 2027 auditor questions against Q3 2026 packets are already appearing in partner templates; teams without a retrieval drill are missing 48-hour reply deadlines.
After Lesson 175, the queue extends through Lessons 176-181 (partner reply packet versioning against archived tuple hashes, dictionary minor-increment migration guardrails, carved-back deficiency quorum, multi-region reviewer feedback ingestion, AI-assisted governance red-team prompts with human gate, and Q1 2027 cert-intake rehearsal calendar export from the Lesson 172 rubric JSON).
Continuity
- Paired Unity guide chapter (next Guide-Create pass will author): Unity 6.6 LTS OpenXR governance signer-route fatigue heat-map, backup owner promotion, and escalation ladder preflight - editor-side Governance/Signer Fatigue Heat-Map ScriptableObject group, BI-side bind contract on the signer_fatigue_heatmap_14d materialised view, and the backup-owner policy pinned in Governance/Backup Owner Policy with the primary-not-equal-backup invariant enforced at the inspector level.
- Help article: OpenXR Governance Partner SLA Snapshot vs Leadership Dashboard Rollup Mismatch (Quest) Fix - the rollup mismatch failure mode upstream of the breached-ack rows the fatigue heat-map surfaces.
- Lesson 173 - Mock Audit Deficiency Recurrence Trend Board and Sprint Hardening Budget (2026) - the dimension hot/structural-red classifications this lesson cross-references against signer load to prioritise backup promotes for hot-dimension owner-routes.
- Lesson 172 - Q3 2026 Submission Intake Mock Audit Tabletop Scoring Rubric (2026) - the rehearsal cadence that generates ack events the signer_ack_event table consumes.
- Lesson 171 - Tuple Drift Automatic Block on Dashboard Publish Pipeline (2026) - the publish-gate owner-route this lesson protects from missed ack cascades.
- Lesson 170 - Executive Readback Redlines Versus Partner Annex FAQ Discipline (2026) - the FAQ-rewrite owner-route, and the readback packet this lesson's partner export drops into.
- Lesson 169 - Freeze Lift Rehearsal Dry Run (2026) - the dry-run rehearsal that depends on signer acks landing on time.
- Lesson 166 - Weekly SLA Snapshot Reconciliation Job (2026) - the snapshot timestamps the heat-map correlates ack latency against.
- Lesson 164 - Leadership Partner SLA Dashboard Sync (2026) - the exec-readback co-sign owner-route.
- Lesson 163 - Governance Freeze Bypass Audit Trails (2026) - the freeze-bypass ack owner-route.
- Lesson 162 - Governance SLA Breach Forecasting Cert-Window Freeze Gates (2026) - the rung-3 critical-path freeze gate this lesson's escalation ladder invokes.
A fatigue heat-map is the team's promise to its own signers that the autumn cert overlap will be absorbed by the system, not by the individual humans named on the owner-routes. Run the daily refresh, the weekly Block 6 cross-check against Lesson 173, the monthly structural review, and the autumn 2026 windows close cleanly.