Lesson 180: AI-Assisted Governance Packet Red-Team Prompts with Human Sign-Off Gate (2026)

Direct answer: Lesson 179 kept regional reviewer comments honest. Lesson 180 lets you use an LLM to stress-test frozen governance packets only as a read-only critic—findings land in an append-only log, and humans alone promote text into signer-visible annex rows.

Why this matters now (mid-2026)

Mid-2026 partner questionnaires stopped asking whether you use AI and started asking what touches submission packets:

  • Teams paste model output straight into FAQ-bound annex tables from Lesson 170 — reviewers flag unattested AI-generated annex risk.
  • Engineers wire “helpful” auto-fix scripts that rewrite partner_reply_packet bodies from Lesson 176 after an LLM pass — silent tuple drift meets forbid-silent-rewrite CI failures.
  • Legal wants a blanket ban on LLMs; product still wants red-team coverage. The defensible middle path is frozen packet in → findings out → human promotion only.

This lesson is that middle path—not a vibe checklist.

Lesson objectives

You will implement:

  • governance_packet_red_team_run bound to publish_tuple_hash and frozen packet bytes
  • Versioned red_team_prompt_template with read-only packet scope
  • Append-only red_team_finding rows (severity, section ref, model rationale), none promoted until human ack
  • human_sign_off_promotion lane (who promoted, when, diff hash)
  • forbid_auto_mutate_signer_fields guard on annex / reply tables
  • Retention policy for model I/O logs aligned to WORM discipline from Lesson 175
  • Publish-gate extension ai_red_team_human_signoff_pending

Prerequisites

  • Lesson 171 — active publish_tuple_hash freeze before any red-team run
  • Lesson 176 — reply packets versioned; red-team never overwrites byte_hash
  • Lesson 175 — append-only archive; red-team logs are new objects, not mutations
  • Lesson 170 — executive readback / FAQ annex field list defines signer-visible columns
  • Optional: 15 Free LLM-Driven NPC Dialogue and Local Fallback Net Resources — moderation + human-gate patterns for game LLM lanes; this lesson applies the same discipline to governance packets

Signer-visible field contract

Maintain a CSV or JSON allow-list exported from Lesson 170 FAQ schema:

{
  "signer_visible_fields": [
    "annex.executive_summary",
    "annex.deficiency_table",
    "annex.faq_bound_rows",
    "partner_reply_packet.body_markdown"
  ],
  "model_may_write": [],
  "human_promotion_targets": [
    "red_team_finding.promoted_annex_patch"
  ]
}

Rule: model_may_write stays empty in production. CI fails if the list is non-empty or if any automation writes to a signer_visible_fields key.
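The rule above can be checked mechanically. The following is a minimal sketch of such a CI check, assuming the contract is available as the JSON document shown earlier; the function name contract_violations and the "allow-list must not be empty" check are illustrative assumptions, not part of the Lesson 170 schema.

```python
import json

def contract_violations(contract: dict) -> list:
    """Return human-readable violations of the signer-visible field contract."""
    problems = []
    # Rule from the lesson: the model may write nothing in production.
    if contract.get("model_may_write"):
        problems.append("model_may_write must be empty in production")
    # Assumption: an empty allow-list would leave the signer boundary undefined.
    if not contract.get("signer_visible_fields"):
        problems.append("signer_visible_fields must list at least one field")
    return problems

# The contract from this lesson, inlined so the sketch runs on its own.
CONTRACT = json.loads("""
{
  "signer_visible_fields": [
    "annex.executive_summary",
    "annex.deficiency_table",
    "annex.faq_bound_rows",
    "partner_reply_packet.body_markdown"
  ],
  "model_may_write": [],
  "human_promotion_targets": ["red_team_finding.promoted_annex_patch"]
}
""")

print(contract_violations(CONTRACT))  # prints []
```

A CI job could fail the build whenever the returned list is non-empty. Detecting automation that writes signer-visible keys at runtime is a separate concern that this static check does not cover.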

governance_packet_red_team_run

CREATE TABLE governance_packet_red_team_run (
  red_team_run_id         TEXT PRIMARY KEY,
  publish_tuple_hash      TEXT NOT NULL,
  packet_archive_id       TEXT REFERENCES packet_archive_pointer(archive_id),
  prompt_template_id      TEXT NOT NULL,
  prompt_template_version TEXT NOT NULL,
  frozen_packet_sha256    TEXT NOT NULL,
  model_provider          TEXT NOT NULL,
  model_id                TEXT NOT NULL,
  started_at_utc          TIMESTAMPTZ NOT NULL DEFAULT now(),
  completed_at_utc        TIMESTAMPTZ,
  run_status              TEXT NOT NULL DEFAULT 'running'
    CHECK (run_status IN ('running', 'completed', 'failed', 'aborted')),
  UNIQUE (publish_tuple_hash, prompt_template_id, prompt_template_version)
    DEFERRABLE INITIALLY DEFERRED
);

Invariant: frozen_packet_sha256 must match the Lesson 171 active tuple manifest as of started_at_utc. If the tuple is promoted mid-flight, abort the run (run_status = 'aborted').
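One way to enforce this invariant in the worker is to hash the frozen bytes and compare against the manifest value both before the model call and again before marking the run completed. This is a sketch only: TupleDriftError and assert_packet_matches_manifest are hypothetical names, and how the manifest hash is fetched from Lesson 171 is left out.

```python
import hashlib

class TupleDriftError(RuntimeError):
    """Raised when the frozen packet no longer matches the active tuple manifest."""

def frozen_packet_sha256(packet_bytes: bytes) -> str:
    """Hex SHA-256 of the exact frozen packet bytes."""
    return hashlib.sha256(packet_bytes).hexdigest()

def assert_packet_matches_manifest(packet_bytes: bytes, manifest_sha256: str) -> str:
    """Return the digest to store on the run row, or raise so the caller can abort."""
    digest = frozen_packet_sha256(packet_bytes)
    # Hex digests are case-insensitive; normalise the manifest value before comparing.
    if digest != manifest_sha256.strip().lower():
        raise TupleDriftError(
            f"frozen packet hash {digest} does not match manifest {manifest_sha256}"
        )
    return digest
```

On TupleDriftError the caller would set run_status to 'aborted' rather than record findings against a stale hash.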

red_team_prompt_template

Store templates outside the packet:

CREATE TABLE red_team_prompt_template (
  prompt_template_id      TEXT NOT NULL,
  template_version        TEXT NOT NULL,
  system_preamble         TEXT NOT NULL,
  user_prompt_skeleton    TEXT NOT NULL,
  max_output_tokens       INT NOT NULL CHECK (max_output_tokens BETWEEN 256 AND 8192),
  PRIMARY KEY (prompt_template_id, template_version)
);

Skeleton rules (embed in system_preamble):

  1. You receive read-only JSON of the frozen packet—no instructions to edit it.
  2. Output only a JSON array of findings: { "section_ref", "severity", "finding_text", "suggested_human_action" }.
  3. Never emit replacement annex text—only critiques.
  4. Flag contradictions between annex totals and leadership rollup language (Lesson 166 epsilon vocabulary).
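The preamble rules only constrain what the model is asked to do, so the worker still has to validate what comes back. Below is a hedged sketch of a strict parser for rule 2. It assumes the raw model output is a JSON string; parse_findings and the decision to reject unknown keys (so a model cannot smuggle in a replacement-text field) are illustrative choices, not a prescribed implementation.

```python
import json

ALLOWED_KEYS = {"section_ref", "severity", "finding_text", "suggested_human_action"}
REQUIRED_KEYS = {"section_ref", "severity", "finding_text"}
ALLOWED_SEVERITIES = {"blocker", "major", "minor", "info"}

def parse_findings(raw_model_output: str) -> list:
    """Validate the model's JSON array and return rows ready for append-only insert."""
    data = json.loads(raw_model_output)
    if not isinstance(data, list):
        raise ValueError("model output must be a JSON array of findings")
    findings = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"finding {index} is not a JSON object")
        extra = set(item) - ALLOWED_KEYS
        if extra:
            # Unknown keys are rejected outright instead of silently dropped.
            raise ValueError(f"finding {index} has unexpected keys: {sorted(extra)}")
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"finding {index} is missing keys: {sorted(missing)}")
        if item["severity"] not in ALLOWED_SEVERITIES:
            raise ValueError(f"finding {index} has invalid severity: {item['severity']!r}")
        # Every parsed finding starts in the human queue; nothing is promoted here.
        findings.append({**item, "promotion_status": "pending_human"})
    return findings
```

A run whose output fails validation would plausibly be marked 'failed' with the raw log retained, so reviewers can see what the model actually returned.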

red_team_finding (append-only)

CREATE TABLE red_team_finding (
  finding_id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  red_team_run_id         TEXT NOT NULL REFERENCES governance_packet_red_team_run(red_team_run_id),
  section_ref             TEXT NOT NULL,
  severity                TEXT NOT NULL CHECK (severity IN ('blocker', 'major', 'minor', 'info')),
  finding_text            TEXT NOT NULL,
  suggested_human_action  TEXT,
  model_rationale         TEXT,
  created_at_utc          TIMESTAMPTZ NOT NULL DEFAULT now(),
  promotion_status        TEXT NOT NULL DEFAULT 'pending_human'
    CHECK (promotion_status IN ('pending_human', 'promoted', 'rejected', 'deferred'))
);

No UPDATE may touch finding_text after insert; a rejection appends a red_team_finding_event row instead of editing the finding.
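The append-only rule can be rehearsed locally before wiring it into the production database. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so the trigger syntax differs from what production would use. The red_team_finding_event columns are hypothetical, since the lesson names that table without defining it, and the tables are trimmed to the columns the demo needs.

```python
import sqlite3

def open_finding_demo_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in with a trigger that freezes finding_text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE red_team_finding (
          finding_id       TEXT PRIMARY KEY,
          finding_text     TEXT NOT NULL,
          promotion_status TEXT NOT NULL DEFAULT 'pending_human'
        );
        -- Hypothetical shape: the lesson does not define this table.
        CREATE TABLE red_team_finding_event (
          event_id    INTEGER PRIMARY KEY AUTOINCREMENT,
          finding_id  TEXT NOT NULL REFERENCES red_team_finding(finding_id),
          event_type  TEXT NOT NULL,
          actor_email TEXT NOT NULL,
          note        TEXT
        );
        -- Fires only when finding_text appears in the UPDATE's SET clause.
        CREATE TRIGGER red_team_finding_text_frozen
        BEFORE UPDATE OF finding_text ON red_team_finding
        BEGIN
          SELECT RAISE(ABORT, 'finding_text_is_append_only');
        END;
    """)
    return conn

def reject_finding(conn: sqlite3.Connection, finding_id: str,
                   actor_email: str, note: str) -> None:
    """Record a rejection as a new event row and flip only the status column."""
    with conn:  # commits both statements together, rolls back on error
        conn.execute(
            "INSERT INTO red_team_finding_event (finding_id, event_type, actor_email, note) "
            "VALUES (?, 'rejected', ?, ?)",
            (finding_id, actor_email, note),
        )
        conn.execute(
            "UPDATE red_team_finding SET promotion_status = 'rejected' WHERE finding_id = ?",
            (finding_id,),
        )
```

Any attempt to run UPDATE red_team_finding SET finding_text = ... against this demo database raises an sqlite3 error, while reject_finding succeeds because it leaves the model-authored text untouched.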

Human-only promotion lane

CREATE TABLE human_sign_off_promotion (
  promotion_id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  finding_id              UUID NOT NULL REFERENCES red_team_finding(finding_id),
  promoted_by_email       TEXT NOT NULL,
  promoted_at_utc         TIMESTAMPTZ NOT NULL DEFAULT now(),
  target_field            TEXT NOT NULL,
  promoted_text_sha256    TEXT NOT NULL,
  diff_against_frozen     TEXT NOT NULL,
  signer_ack_required     BOOLEAN NOT NULL DEFAULT true,
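  -- Mirror of the Lesson 170 signer_visible_fields allow-list; keep both in sync.
  CHECK (target_field IN (
    'annex.executive_summary', 'annex.deficiency_table',
    'annex.faq_bound_rows', 'partner_reply_packet.body_markdown'))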
);

Workflow:

  1. Model run completes → findings pending_human.
  2. Governance owner reviews in the UI and manually copies any acceptable phrasing into the annex editor.
  3. Insert human_sign_off_promotion with SHA-256 of exact promoted bytes.
  4. Route Lesson 174 signer ack if signer_ack_required.
  5. Only then set promotion_status = 'promoted'.
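Step 3 of the workflow can be sketched as a small helper that builds the row a human's promotion would insert. Everything here is an assumption layered on the schema above: build_promotion_row is a hypothetical name, the text is assumed to be UTF-8 when hashed, and the unified diff format for diff_against_frozen is one reasonable choice among several.

```python
import difflib
import hashlib
from datetime import datetime, timezone

# Copied from the signer-visible field contract earlier in this lesson.
SIGNER_VISIBLE_FIELDS = frozenset({
    "annex.executive_summary",
    "annex.deficiency_table",
    "annex.faq_bound_rows",
    "partner_reply_packet.body_markdown",
})

def build_promotion_row(finding_id: str, promoted_by_email: str, target_field: str,
                        frozen_text: str, promoted_text: str) -> dict:
    """Build a human_sign_off_promotion row for text a human typed into the annex editor."""
    if target_field not in SIGNER_VISIBLE_FIELDS:
        raise ValueError(f"target_field not on the allow-list: {target_field}")
    if promoted_text == frozen_text:
        raise ValueError("promoted text is identical to the frozen text")
    diff = "".join(difflib.unified_diff(
        frozen_text.splitlines(keepends=True),
        promoted_text.splitlines(keepends=True),
        fromfile="frozen", tofile="promoted",
    ))
    return {
        "finding_id": finding_id,
        "promoted_by_email": promoted_by_email,
        "promoted_at_utc": datetime.now(timezone.utc).isoformat(),
        "target_field": target_field,
        # Assumption: the annex editor exports UTF-8, so these are the "exact bytes".
        "promoted_text_sha256": hashlib.sha256(promoted_text.encode("utf-8")).hexdigest(),
        "diff_against_frozen": diff,
        "signer_ack_required": True,
    }
```

The helper deliberately takes the promoted text as an argument supplied by the human editor; it never reads finding_text, which keeps model output out of the promotion path.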

Forbid auto-mutate on signer tables

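-- Rejects any write to a signer-visible table made under the LLM pipeline actor.
CREATE OR REPLACE FUNCTION forbid_auto_mutate_signer_fields()
RETURNS TRIGGER AS $$
BEGIN
  -- missing_ok = true: an unset app.red_team_actor yields NULL, so the write proceeds.
  IF current_setting('app.red_team_actor', true) = 'llm_pipeline' THEN
    RAISE EXCEPTION 'signer_field_auto_mutate_forbidden';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach to each signer-visible table; partner_reply_packet is shown as an example.
CREATE TRIGGER partner_reply_packet_forbid_auto_mutate
  BEFORE INSERT OR UPDATE ON partner_reply_packet
  FOR EACH ROW EXECUTE FUNCTION forbid_auto_mutate_signer_fields();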

Set app.red_team_actor = 'llm_pipeline' only inside the isolated worker role. Human editors set human_editor or leave the setting unset.

Log retention (partner questionnaire row)

| Log type | Retention | Storage class |
| --- | --- | --- |
| Raw model request/response | 90 days hot | Encrypted object store, separate prefix red-team-logs/ |
| red_team_finding rows | Life of cert_window_id + 24 months | Database + export to WORM per Lesson 175 |
| human_sign_off_promotion | Same as findings | Tied to annex version semver |

Export monthly red_team_retention_manifest.json with object keys + SHA-256 for intake packets.
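A minimal sketch of that monthly export follows, assuming the raw log objects have already been read into memory as bytes keyed by object key. The manifest layout (manifest_month, object_count, objects) is a hypothetical shape; the lesson only requires object keys plus SHA-256.

```python
import hashlib
import json

LOG_PREFIX = "red-team-logs/"

def build_retention_manifest(objects: dict, manifest_month: str) -> str:
    """Return the JSON text for red_team_retention_manifest.json.

    objects maps object key -> raw bytes of the stored model request/response log.
    """
    entries = []
    for object_key in sorted(objects):
        # Refuse keys outside the dedicated prefix so unrelated objects never leak in.
        if not object_key.startswith(LOG_PREFIX):
            raise ValueError(f"object outside {LOG_PREFIX}: {object_key}")
        entries.append({
            "object_key": object_key,
            "sha256": hashlib.sha256(objects[object_key]).hexdigest(),
        })
    manifest = {
        "manifest_month": manifest_month,
        "object_count": len(entries),
        "objects": entries,
    }
    # Sorted keys and sorted entries keep the export byte-stable for identical inputs.
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Because the output is deterministic for the same inputs, the manifest itself can be hashed and archived alongside the logs it describes.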

Publish-gate coupling

Extend Lesson 171:

-- Returns one row with the block reason while any blocker/major finding
-- for the active tuple still awaits human triage; returns no rows otherwise.
SELECT 'ai_red_team_human_signoff_pending' AS block_reason
WHERE EXISTS (
  SELECT 1 FROM red_team_finding f
  JOIN governance_packet_red_team_run r USING (red_team_run_id)
  WHERE r.publish_tuple_hash = :active_tuple
    AND f.severity IN ('blocker', 'major')
    AND f.promotion_status = 'pending_human'
);

Distinct from follow_the_sun_handoff_stale (Lesson 179) and mock_audit_carve_back_pending (Lesson 178).
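The gate policy can be exercised end to end against a throwaway database. This sketch again uses Python's sqlite3 module as a stand-in for PostgreSQL, with both tables trimmed to the columns the gate reads; publish_block_reason and open_gate_demo_db are hypothetical helper names.

```python
import sqlite3
from typing import Optional

GATE_SQL = """
SELECT EXISTS (
  SELECT 1 FROM red_team_finding f
  JOIN governance_packet_red_team_run r USING (red_team_run_id)
  WHERE r.publish_tuple_hash = ?
    AND f.severity IN ('blocker', 'major')
    AND f.promotion_status = 'pending_human'
)
"""

def open_gate_demo_db() -> sqlite3.Connection:
    """In-memory tables holding only the columns the gate query touches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE governance_packet_red_team_run (
          red_team_run_id    TEXT PRIMARY KEY,
          publish_tuple_hash TEXT NOT NULL
        );
        CREATE TABLE red_team_finding (
          finding_id       TEXT PRIMARY KEY,
          red_team_run_id  TEXT NOT NULL,
          severity         TEXT NOT NULL,
          promotion_status TEXT NOT NULL DEFAULT 'pending_human'
        );
    """)
    return conn

def publish_block_reason(conn: sqlite3.Connection, active_tuple: str) -> Optional[str]:
    """Return the block reason, or None when no blocker/major finding is pending."""
    (blocked,) = conn.execute(GATE_SQL, (active_tuple,)).fetchone()
    return "ai_red_team_human_signoff_pending" if blocked else None
```

A pending info or minor finding leaves the function returning None, which matches the lesson's rule that only blocker and major findings hold the gate.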

Six-step red-team procedure

  1. Freeze tuple: confirm the Lesson 171 gate is green for entry into red-team, not for publishing out.
  2. Snapshot bytes: hash the frozen packet and store frozen_packet_sha256 on the run row.
  3. Execute template: run a single model pass and save the raw log under the red-team-logs/ prefix.
  4. Parse findings: insert append-only rows; no row is promoted automatically.
  5. Human triage: the owner marks each finding promoted, rejected, or deferred, adding a human_sign_off_promotion row for every accept.
  6. Signer ack: route any promoted blocker through Lesson 174, then re-run the publish gate.

Common mistakes

  • Promoting model text without human_sign_off_promotion row — fails 2026 annex AI attestation questions.
  • Re-running red-team after tuple promotion without new run id — findings reference stale hash.
  • Letting CI “fix” annex typos via LLM — bypasses human lane; use spell-check locally on promoted text only.
  • Deleting raw logs at 30 days when partner asks for 90-day evidence.
  • Using red-team to mutate partner_reply_packet.byte_hash — use new semver row per Lesson 176.

Verification checklist

  • [ ] Worker role cannot UPDATE partner_reply_packet.body_markdown (trigger fires).
  • [ ] Blocker finding with pending_human blocks publish with distinct block_reason.
  • [ ] Promoted text SHA-256 matches annex editor export.
  • [ ] Raw log object exists for each red_team_run_id.
  • [ ] Lesson 170 FAQ export lists count of AI-assisted findings promoted vs rejected.

Mini exercise (90 minutes)

Freeze a synthetic packet with one intentional FAQ contradiction. Run red-team template v1. Confirm model returns a major finding. Attempt scripted UPDATE on annex via llm_pipeline role—watch trigger fail. Manually promote corrected sentence, insert human_sign_off_promotion, clear gate.

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Gate stuck after “fix” | Finding still pending_human | Reject or promote all blocker/major rows |
| Partner says no AI log | Retention manifest missing | Export red_team_retention_manifest.json |
| Duplicate findings | Re-run same template version | Bump template_version or abort duplicate run |
| Signer rejects promoted text | Promotion bypassed FAQ schema | Re-promote only allow-listed fields |

Pro tips

  • Tip — Template semver: bump minor when partner questionnaire adds a new annex section.
  • Tip — Pair with resource list: game NPC LLM fallback discipline and governance red-team share the same human-before-signer rule.
  • Tip — Defer info findings: do not block publish on info—partners care about blockers/majors only.

Next lesson teaser

The next lesson (Lesson 181: Q1 2027 Cert Intake Rehearsal Calendar Export from Lesson 172 Rubric JSON (2026)) exports the 172 rubric into ICS + governance README so Q1 2027 tabletop rooms are not double-booked in October 2026 planning decks. Lesson 180 keeps AI out of signer-visible text; Lesson 181 keeps calendar chaos out of intake week.

Continuity

  • Paired Unity guide chapter (next Guide-Create pass): Unity 6.6 LTS OpenXR governance red-team prompt exporter preflight — ScriptableObject template + frozen packet hash gate before any LLM call.
  • Lesson 179 — regional ingestion findings may feed red-team section_ref tags.
  • Lesson 178 — carve-back tickets should not be auto-closed by model suggestions.
  • Lesson 177 — dictionary columns in frozen packet must match migration phase.
  • Lesson 176 — promoted patches create new reply packet semver, never in-place hash rewrite.
  • Lesson 175 — red-team raw logs append to WORM prefix, not mutate archive rows.
  • Lesson 174 — signer ack on promoted blockers.
  • Lesson 172 — rubric JSON becomes Lesson 181 calendar source.
  • Lesson 171 — tuple freeze is the red-team entry gate.
  • Lesson 170 — FAQ field allow-list defines signer-visible boundary.

FAQ

Can we ban LLMs entirely and skip this lesson?
Yes for production annex authoring. You still need a written “no AI on signer fields” policy in intake packets—partners ask even when you do not run models.

Does red-team replace mock audit tabletop?
No. Lesson 172 tabletop stays human-scored; red-team is extra critique on frozen bytes before send.

Which model provider?
Any—log model_provider + model_id on the run row. Policy is process, not vendor.

Can deferred findings ship?
Yes if severity is minor/info only. Blocker/major pending_human keeps publish gate red.


Partners in 2026 do not fear LLMs—they fear unattested LLM edits on annexes they must sign. Red-team loudly, promote quietly, and keep every signer-visible byte behind a human thumbprint.