Lesson 180: AI-Assisted Governance Packet Red-Team Prompts with Human Sign-Off Gate (2026)

Direct answer: Lesson 179 kept regional reviewer comments honest. Lesson 180 lets you use an LLM to stress-test frozen governance packets only as a read-only critic—findings land in an append-only log, and humans alone promote text into signer-visible annex rows.

Why this matters now (mid-2026)

Mid-2026 partner questionnaires stopped asking whether you use AI and started asking what touches submission packets:

Teams paste model output straight into FAQ-bound annex tables from Lesson 170 — reviewers flag unattested AI-generated annex risk.
Engineers wire “helpful” auto-fix scripts that rewrite partner_reply_packet bodies from Lesson 176 after an LLM pass — silent tuple drift meets forbid-silent-rewrite CI failures.
Legal wants a blanket ban on LLMs; product still wants red-team coverage. The defensible middle path is frozen packet in → findings out → human promotion only.

This lesson is that middle path—not a vibe checklist.

Lesson objectives

You will implement:

governance_packet_red_team_run bound to publish_tuple_hash and frozen packet bytes
Versioned red_team_prompt_template with read-only packet scope
Append-only red_team_finding rows (severity, section ref, model rationale); nothing is promoted until human ack
human_sign_off_promotion lane (who promoted, when, diff hash)
forbid_auto_mutate_signer_fields guard on annex / reply tables
Retention policy for model I/O logs aligned to WORM discipline from Lesson 175
Publish-gate extension ai_red_team_human_signoff_pending

Prerequisites

Lesson 171 — active publish_tuple_hash freeze before any red-team run
Lesson 176 — reply packets versioned; red-team never overwrites byte_hash
Lesson 175 — append-only archive; red-team logs are new objects, not mutations
Lesson 170 — executive readback / FAQ annex field list defines signer-visible columns
Optional: 15 Free LLM-Driven NPC Dialogue and Local Fallback Net Resources — moderation + human-gate patterns for game LLM lanes; this lesson applies the same discipline to governance packets

Signer-visible field contract

Maintain a CSV or JSON allow-list exported from Lesson 170 FAQ schema:

{
  "signer_visible_fields": [
    "annex.executive_summary",
    "annex.deficiency_table",
    "annex.faq_bound_rows",
    "partner_reply_packet.body_markdown"
  ],
  "model_may_write": [],
  "human_promotion_targets": [
    "red_team_finding.promoted_annex_patch"
  ]
}

Rule: model_may_write stays empty in production. CI fails if that list is non-empty or if any automation writes to a key listed in signer_visible_fields.
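A minimal sketch of that CI rule in Python, assuming the allow-list has already been loaded into a dict. The function name `allow_list_violations` and the `automation_write_targets` argument are hypothetical: how CI discovers which fields an automation writes is outside this lesson, so the list is simply passed in.

```python
def allow_list_violations(contract: dict, automation_write_targets: list) -> list:
    """Return human-readable violations; an empty list means the contract passes."""
    violations = []
    # Rule 1: model_may_write must stay empty in production.
    if contract.get("model_may_write"):
        violations.append("model_may_write must be empty in production")
    # Rule 2: no automation may write a signer-visible field.
    # automation_write_targets is an assumed input, not part of the lesson schema.
    signer_visible = set(contract.get("signer_visible_fields", []))
    for field in sorted(set(automation_write_targets) & signer_visible):
        violations.append(f"automation writes signer-visible field: {field}")
    return violations


CONTRACT = {
    "signer_visible_fields": [
        "annex.executive_summary",
        "annex.deficiency_table",
        "annex.faq_bound_rows",
        "partner_reply_packet.body_markdown",
    ],
    "model_may_write": [],
    "human_promotion_targets": ["red_team_finding.promoted_annex_patch"],
}
```

A CI step could fail the build whenever the returned list is non-empty and print each entry as the failure message.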

`governance_packet_red_team_run`

CREATE TABLE governance_packet_red_team_run (
  red_team_run_id         TEXT PRIMARY KEY,
  publish_tuple_hash      TEXT NOT NULL,
  packet_archive_id       TEXT REFERENCES packet_archive_pointer(archive_id),
  prompt_template_id      TEXT NOT NULL,
  prompt_template_version TEXT NOT NULL,
  frozen_packet_sha256    TEXT NOT NULL,
  model_provider          TEXT NOT NULL,
  model_id                TEXT NOT NULL,
  started_at_utc          TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at_utc        TIMESTAMPTZ,
  run_status              TEXT NOT NULL DEFAULT 'running'
    CHECK (run_status IN ('running', 'completed', 'failed', 'aborted')),
  UNIQUE (publish_tuple_hash, prompt_template_id, prompt_template_version)
    DEFERRABLE INITIALLY DEFERRED
);

Invariant: frozen_packet_sha256 must match the Lesson 171 active tuple manifest at started_at_utc. Abort the run if the tuple is promoted mid-flight.
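The invariant can be sketched as a hash comparison that runs before the model call and again before findings are inserted. This is an illustration under assumptions: `TupleDriftError` and both function names are hypothetical, and fetching the manifest hash from the Lesson 171 tuple is left to the caller.

```python
import hashlib


class TupleDriftError(RuntimeError):
    """Raised when packet bytes no longer match the active tuple manifest."""


def frozen_packet_sha256(packet_bytes: bytes) -> str:
    """Hex SHA-256 of the exact frozen packet bytes."""
    return hashlib.sha256(packet_bytes).hexdigest()


def assert_packet_matches_manifest(packet_bytes: bytes, manifest_sha256: str) -> str:
    """Return the digest if it matches the manifest; otherwise raise so the run aborts."""
    digest = frozen_packet_sha256(packet_bytes)
    if digest != manifest_sha256.lower():
        raise TupleDriftError(
            f"packet hash {digest} does not match manifest hash {manifest_sha256}"
        )
    return digest
```

Calling the check a second time just before inserting findings is one way to catch a tuple that was promoted while the model was still running; the run row would then move to `aborted`.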

`red_team_prompt_template`

Store templates outside the packet:

CREATE TABLE red_team_prompt_template (
  prompt_template_id      TEXT NOT NULL,
  template_version        TEXT NOT NULL,
  system_preamble         TEXT NOT NULL,
  user_prompt_skeleton    TEXT NOT NULL,
  max_output_tokens       INT NOT NULL CHECK (max_output_tokens BETWEEN 256 AND 8192),
  PRIMARY KEY (prompt_template_id, template_version)
);

Skeleton rules (embed in system_preamble):

You receive read-only JSON of the frozen packet—no instructions to edit it.
Output only a JSON array of findings: { "section_ref", "severity", "finding_text", "suggested_human_action" }.
Never emit replacement annex text—only critiques.
Flag contradictions between annex totals and leadership rollup language (Lesson 166 epsilon vocabulary).
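Because the model is told to output only a JSON array of findings, the worker can reject anything else before a row is inserted. The sketch below assumes strict matching on the four keys named in the skeleton rules; that strictness, and the name `parse_findings`, are assumptions rather than part of the lesson schema.

```python
import json

REQUIRED_KEYS = {"section_ref", "severity", "finding_text", "suggested_human_action"}
ALLOWED_SEVERITIES = {"blocker", "major", "minor", "info"}


def parse_findings(raw_output: str) -> list:
    """Parse model output; raise ValueError unless it is a JSON array of well-formed findings."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("model output must be a JSON array")
    for index, item in enumerate(data):
        # Exact key match is an assumed policy: extra keys could smuggle replacement text.
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError(f"finding {index} must have exactly the keys {sorted(REQUIRED_KEYS)}")
        if item["severity"] not in ALLOWED_SEVERITIES:
            raise ValueError(f"finding {index} has unknown severity {item['severity']!r}")
    return data
```

A run whose output fails this check would be marked `failed` with the raw log retained, rather than partially inserted.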

`red_team_finding` (append-only)

CREATE TABLE red_team_finding (
  finding_id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  red_team_run_id         TEXT NOT NULL REFERENCES governance_packet_red_team_run(red_team_run_id),
  section_ref             TEXT NOT NULL,
  severity                TEXT NOT NULL CHECK (severity IN ('blocker', 'major', 'minor', 'info')),
  finding_text            TEXT NOT NULL,
  suggested_human_action  TEXT,
  model_rationale         TEXT,
  created_at_utc          TIMESTAMPTZ NOT NULL DEFAULT now(),
  promotion_status        TEXT NOT NULL DEFAULT 'pending_human'
    CHECK (promotion_status IN ('pending_human', 'promoted', 'rejected', 'deferred'))
);

No UPDATE on finding_text after insert—rejections append red_team_finding_event rows instead.

Human-only promotion lane

CREATE TABLE human_sign_off_promotion (
  promotion_id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  finding_id              UUID NOT NULL REFERENCES red_team_finding(finding_id),
  promoted_by_email       TEXT NOT NULL,
  promoted_at_utc         TIMESTAMPTZ NOT NULL DEFAULT now(),
  target_field            TEXT NOT NULL,
  promoted_text_sha256    TEXT NOT NULL,
  diff_against_frozen     TEXT NOT NULL,
  signer_ack_required     BOOLEAN NOT NULL DEFAULT true,
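  -- Keep this list in sync with signer_visible_fields in the Lesson 170 allow-list export.
  CHECK (target_field IN (
    'annex.executive_summary',
    'annex.deficiency_table',
    'annex.faq_bound_rows',
    'partner_reply_packet.body_markdown'
  ))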
);

Workflow:

Model run completes → findings pending_human.
Governance owner reviews in UI—copy acceptable phrasing manually into annex editor.
Insert human_sign_off_promotion with SHA-256 of exact promoted bytes.
Route Lesson 174 signer ack if signer_ack_required.
Only then set promotion_status = 'promoted'.
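Step 3 of the workflow can be sketched as a helper that builds the promotion row from the text a human pasted into the annex editor. Assumptions to note: the text is hashed as UTF-8 bytes, the diff is stored in unified-diff form, and `build_promotion_row` plus the in-code allow-list copy are hypothetical names for illustration.

```python
import difflib
import hashlib

# Assumed in-code copy of signer_visible_fields from the allow-list export.
SIGNER_VISIBLE_FIELDS = {
    "annex.executive_summary",
    "annex.deficiency_table",
    "annex.faq_bound_rows",
    "partner_reply_packet.body_markdown",
}


def build_promotion_row(finding_id: str, promoted_by_email: str, target_field: str,
                        frozen_text: str, promoted_text: str) -> dict:
    """Build the column values for one human_sign_off_promotion insert."""
    if target_field not in SIGNER_VISIBLE_FIELDS:
        raise ValueError(f"target_field {target_field!r} is not on the signer-visible allow-list")
    # Unified diff of frozen vs promoted text, one diff line per output line.
    diff_lines = difflib.unified_diff(
        frozen_text.splitlines(), promoted_text.splitlines(),
        fromfile="frozen", tofile="promoted", lineterm="",
    )
    return {
        "finding_id": finding_id,
        "promoted_by_email": promoted_by_email,
        "target_field": target_field,
        # Hash the exact promoted bytes (UTF-8 encoding is an assumption).
        "promoted_text_sha256": hashlib.sha256(promoted_text.encode("utf-8")).hexdigest(),
        "diff_against_frozen": "\n".join(diff_lines),
        "signer_ack_required": True,
    }
```

The helper only prepares values; the insert itself, the Lesson 174 ack routing, and the final status change remain actions taken by the human owner.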

Forbid auto-mutate on signer tables

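-- Rejects writes to signer-visible tables when the session is tagged as the LLM pipeline.
CREATE OR REPLACE FUNCTION forbid_auto_mutate_signer_fields()
RETURNS TRIGGER AS $$
BEGIN
  -- missing_ok = true: an unset app.red_team_actor yields NULL, so the guard does not fire.
  IF current_setting('app.red_team_actor', true) = 'llm_pipeline' THEN
    RAISE EXCEPTION 'signer_field_auto_mutate_forbidden';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach to each signer-visible table. partner_reply_packet is shown; the trigger name is
-- illustrative, and the annex tables need the same statement.
CREATE TRIGGER partner_reply_packet_forbid_auto_mutate
  BEFORE INSERT OR UPDATE ON partner_reply_packet
  FOR EACH ROW EXECUTE FUNCTION forbid_auto_mutate_signer_fields();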

Set app.red_team_actor = 'llm_pipeline' only inside the isolated worker role. Human editors set it to human_editor or leave it unset.

Log retention (partner questionnaire row)

| Log type | Retention | Storage class |
| --- | --- | --- |
| Raw model request/response | 90 days hot | Encrypted object store, separate prefix `red-team-logs/` |
| `red_team_finding` rows | Life of `cert_window_id` + 24 months | Database + export to WORM per Lesson 175 |
| `human_sign_off_promotion` | Same as findings | Tied to annex version semver |

Export monthly red_team_retention_manifest.json with object keys + SHA-256 for intake packets.
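One possible shape for that export, sketched in Python. The lesson specifies only object keys plus SHA-256, so the `month` and `objects` field names, the function name, and the in-memory `objects` mapping (standing in for a listing of the `red-team-logs/` prefix) are all assumptions.

```python
import hashlib
import json


def build_retention_manifest(objects: dict, month: str) -> str:
    """Serialize a manifest of object keys and SHA-256 digests as stable, sorted JSON.

    objects maps object key -> raw bytes; a real export would stream from the store.
    """
    entries = [
        {"object_key": key, "sha256": hashlib.sha256(body).hexdigest()}
        for key, body in sorted(objects.items())
    ]
    return json.dumps({"month": month, "objects": entries}, indent=2, sort_keys=True)
```

Sorting the keys keeps the manifest byte-stable between exports, so the manifest file itself can be hashed and archived alongside the logs.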

Publish-gate coupling

Extend Lesson 171:

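-- Gate query: returns one row carrying the block reason while any blocker/major
-- finding on the active tuple still awaits human triage; returns no rows otherwise.
-- :active_tuple is a bind parameter holding the active publish_tuple_hash.
SELECT 'ai_red_team_human_signoff_pending' AS block_reason
WHERE EXISTS (
  SELECT 1 FROM red_team_finding f
  JOIN governance_packet_red_team_run r USING (red_team_run_id)
  WHERE r.publish_tuple_hash = :active_tuple
    AND f.severity IN ('blocker', 'major')
    AND f.promotion_status = 'pending_human'
);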

Distinct from follow_the_sun_handoff_stale (Lesson 179) and mock_audit_carve_back_pending (Lesson 178).
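For unit tests that run without a database, the same gate policy can be mirrored in a few lines of Python. This is a sketch only: rows are plain dicts using the column names from the tables above, and `red_team_block_reason` is a hypothetical helper name.

```python
BLOCK_REASON = "ai_red_team_human_signoff_pending"
BLOCKING_SEVERITIES = {"blocker", "major"}


def red_team_block_reason(findings: list, runs: list, active_tuple: str):
    """Return the block reason if a blocker/major finding on the active tuple is pending, else None."""
    # Runs bound to the active publish_tuple_hash (the JOIN in the SQL policy).
    active_run_ids = {
        run["red_team_run_id"] for run in runs if run["publish_tuple_hash"] == active_tuple
    }
    for finding in findings:
        if (
            finding["red_team_run_id"] in active_run_ids
            and finding["severity"] in BLOCKING_SEVERITIES
            and finding["promotion_status"] == "pending_human"
        ):
            return BLOCK_REASON
    return None
```

Findings attached to a stale tuple hash never block the active tuple here, which matches the join on `publish_tuple_hash` in the gate policy.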

Six-step red-team procedure

Freeze tuple: confirm the Lesson 171 gate is green for entry into red-team; this check admits the packet to review and does not clear it for publish.
Snapshot bytes: hash the frozen packet and store frozen_packet_sha256 on the run row.
Execute template: run a single model pass and save the raw log under the red-team-logs/ prefix.
Parse findings: insert append-only rows; nothing is promoted automatically.
Human triage: the owner marks each finding promoted, rejected, or deferred, and inserts a human_sign_off_promotion row for every accept.
Signer ack: route any promoted blocker through Lesson 174, then re-run the publish gate.

Common mistakes

Promoting model text without human_sign_off_promotion row — fails 2026 annex AI attestation questions.
Re-running red-team after tuple promotion without new run id — findings reference stale hash.
Letting CI “fix” annex typos via LLM — bypasses human lane; use spell-check locally on promoted text only.
Deleting raw logs at 30 days when partner asks for 90-day evidence.
Using red-team to mutate partner_reply_packet.byte_hash — use new semver row per Lesson 176.

Verification checklist

[ ] Worker role cannot UPDATE partner_reply_packet.body_markdown (trigger fires).
[ ] Blocker finding with pending_human blocks publish with distinct block_reason.
[ ] Promoted text SHA-256 matches annex editor export.
[ ] Raw log object exists for each red_team_run_id.
[ ] Lesson 170 FAQ export lists count of AI-assisted findings promoted vs rejected.

Mini exercise (90 minutes)

Freeze a synthetic packet with one intentional FAQ contradiction. Run red-team template v1 and confirm the model returns a major finding. Attempt a scripted UPDATE on the annex via the llm_pipeline role and watch the trigger reject it. Then manually promote the corrected sentence, insert a human_sign_off_promotion row, and clear the gate.

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Gate stuck after “fix” | Finding still `pending_human` | Reject or promote all blocker/major rows |
| Partner says no AI log | Retention manifest missing | Export `red_team_retention_manifest.json` |
| Duplicate findings | Re-run same template version | Bump `template_version` or abort duplicate run |
| Signer rejects promoted text | Promotion bypassed FAQ schema | Re-promote only allow-listed fields |

Pro tips

Tip — Template semver: bump minor when partner questionnaire adds a new annex section.
Tip — Pair with resource list: game NPC LLM fallback discipline and governance red-team share the same human-before-signer rule.
Tip — Defer info findings: do not block publish on info—partners care about blockers/majors only.

Next lesson teaser

The next lesson (Lesson 181: Q1 2027 Cert Intake Rehearsal Calendar Export from Lesson 172 Rubric JSON (2026)) exports the 172 rubric into ICS + governance README so Q1 2027 tabletop rooms are not double-booked in October 2026 planning decks. Lesson 180 keeps AI out of signer-visible text; Lesson 181 keeps calendar chaos out of intake week.

Continuity

Paired Unity guide chapter (next Guide-Create pass): Unity 6.6 LTS OpenXR governance red-team prompt exporter preflight — ScriptableObject template + frozen packet hash gate before any LLM call.
Lesson 179 — regional ingestion findings may feed red-team section_ref tags.
Lesson 178 — carve-back tickets should not be auto-closed by model suggestions.
Lesson 177 — dictionary columns in frozen packet must match migration phase.
Lesson 176 — promoted patches create new reply packet semver, never in-place hash rewrite.
Lesson 175 — red-team raw logs append to WORM prefix, not mutate archive rows.
Lesson 174 — signer ack on promoted blockers.
Lesson 172 — rubric JSON becomes Lesson 181 calendar source.
Lesson 171 — tuple freeze is the red-team entry gate.
Lesson 170 — FAQ field allow-list defines signer-visible boundary.

FAQ

Can we ban LLMs entirely and skip this lesson?
Yes for production annex authoring. You still need a written “no AI on signer fields” policy in intake packets—partners ask even when you do not run models.

Does red-team replace mock audit tabletop?
No. Lesson 172 tabletop stays human-scored; red-team is extra critique on frozen bytes before send.

Which model provider?
Any—log model_provider + model_id on the run row. Policy is process, not vendor.

Can deferred findings ship?
Yes if severity is minor/info only. Blocker/major pending_human keeps publish gate red.

Partners in 2026 do not fear LLMs—they fear unattested LLM edits on annexes they must sign. Red-team loudly, promote quietly, and keep every signer-visible byte behind a human thumbprint.