Lesson 32: Read-Only Analytics Warehouse Contracts and Service Accounts

Lesson 31 automated how rows arrive. This lesson defines where they live when spreadsheets stop being enough: a read-only analytics warehouse layer that stores the same closure and train facts you already trust, but with schemas, partitions, and accounts that scale to more builds and more stakeholders.

You are not building a full data mesh. You are adding one disciplined read layer so product, support, and production can query metrics without touching the ingest writer or raw webhook secrets.


What you will build

By the end of this lesson, you will have:

  1. A three-table contract (build dimension, closure facts, ingest audit) that maps directly onto Lesson 29 through Lesson 31 fields
  2. A read-only service account pattern for BI tools with explicit deny lists on write APIs
  3. A partitioning rule keyed on promotion_date so historical trains stay fast to scan
  4. A published view layer that hides webhook payload columns from casual consumers

Step 1 - Decide the graduation trigger

Move off pure Sheets when at least one of these is true:

  • you have more than twelve active build_id values in the rolling window and scroll lag breaks standups
  • two teams need the same numbers with different filters and you are tired of duplicating tabs
  • you need queryable history for audits without granting edit rights to the ingest automation account

If none apply, stay on Lesson 31. Warehouse work has a carrying cost.

Step 2 - Freeze the logical model before picking a vendor

Start vendor-neutral. Map columns you already export:

Logical entity | Source lessons | Minimum columns
dim_build | Lesson 29 closures, Lesson 30 columns | build_id, promotion_date, platform_targets, owner_lane
fact_stability_closure | Lesson 29 row at close | build_id, yellow_open_close, stability_signal, dialogue_signal, closure_utc
fact_release_train_snapshot | Lesson 30 grid | build_id, train_note, recurrence_flag, rolling_window_label
stg_ingest_batch | Lesson 31 | ingest_batch_id, source, ingest_utc, row_count, checksum

Add a surrogate key only if it genuinely makes joins easier in your warehouse. Otherwise, build_id plus ingest_batch_id stays the composite truth anchor, matching your Lesson 26 analytics confidence habit of naming the evidence object explicitly.
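
To make the contract concrete, here is a minimal DDL sketch under the composite-key approach, assuming a Postgres-style warehouse; the column types are illustrative, and the composite primary key is the part that matters.

    -- Illustrative DDL for the Step 2 contract; types are sketches, adjust to your dialect.
    CREATE TABLE dim_build (
        build_id         text PRIMARY KEY,
        promotion_date   date NOT NULL,
        platform_targets text,
        owner_lane       text
    );

    CREATE TABLE fact_stability_closure (
        build_id          text NOT NULL REFERENCES dim_build (build_id),
        ingest_batch_id   text NOT NULL,
        yellow_open_close integer,
        stability_signal  text,
        dialogue_signal   text,
        closure_utc       timestamptz,
        PRIMARY KEY (build_id, ingest_batch_id)   -- composite truth anchor, no surrogate
    );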

Step 3 - Land data in staging, promote to facts

Never let webhooks or CSV loaders write straight into BI-facing views.

  1. Landing tables or buckets accept raw files exactly as Lesson 31 produces them.
  2. Staging applies type casts, rejects unknown headers, and quarantines bad rows with a reject_reason column.
  3. Facts receive idempotent merges keyed on (build_id, ingest_batch_id) the same way you described in Lesson 31.

If a row fails staging, the dashboard should still show the last good fact row for that build_id, not a blank cell. That is a player-trust issue dressed up as SQL.
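
A minimal sketch of step 3's idempotent promotion, using Postgres-style INSERT ... ON CONFLICT in place of a dialect-specific MERGE; the staging table name stg_stability_closure is hypothetical, and the reject_reason column follows the staging convention above.

    -- Promote clean staging rows into the fact; re-running the same batch updates in place.
    INSERT INTO fact_stability_closure (
        build_id, ingest_batch_id, yellow_open_close,
        stability_signal, dialogue_signal, closure_utc
    )
    SELECT build_id, ingest_batch_id, yellow_open_close,
           stability_signal, dialogue_signal, closure_utc
    FROM stg_stability_closure               -- hypothetical staging table name
    WHERE reject_reason IS NULL              -- quarantined rows never reach the fact
    ON CONFLICT (build_id, ingest_batch_id) DO UPDATE SET
        yellow_open_close = EXCLUDED.yellow_open_close,
        stability_signal  = EXCLUDED.stability_signal,
        dialogue_signal   = EXCLUDED.dialogue_signal,
        closure_utc       = EXCLUDED.closure_utc;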

Step 4 - Create the read-only service account

Create two principals:

Principal | Purpose | Permissions
svc_rpg_ingest_writer | Lesson 31 automation | insert into landing and staging only
svc_rpg_bi_reader | Looker Studio, Metabase, Hex | select on approved views; no insert, no delete

Document a one-page deny list: the reader account must not call job APIs, bucket delete APIs, or secret managers. If your cloud console makes that annoying, that is a signal you are still on the default broad role. Narrow it.
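
A sketch of the role bindings, assuming Postgres-style grants and that the landing, staging, and analytics schema names used here are illustrative:

    -- Writer: landing and staging only, nothing BI-facing.
    GRANT USAGE  ON SCHEMA landing, staging TO svc_rpg_ingest_writer;
    GRANT INSERT ON ALL TABLES IN SCHEMA landing TO svc_rpg_ingest_writer;
    GRANT INSERT ON ALL TABLES IN SCHEMA staging TO svc_rpg_ingest_writer;

    -- Reader: approved views only, read-only; no UPDATE, DELETE, or DDL anywhere.
    GRANT USAGE  ON SCHEMA analytics TO svc_rpg_bi_reader;
    GRANT SELECT ON analytics.vw_release_train_current_window,
                    analytics.vw_yellow_recurrence,
                    analytics.vw_build_lookup
        TO svc_rpg_bi_reader;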

Step 5 - Publish analyst-safe views

Ship at least these views:

  • vw_release_train_current_window mirrors Lesson 30’s rolling window rule in SQL so every consumer uses the same predicate.
  • vw_yellow_recurrence encodes the recurrence flag logic so exec summaries cannot drift from engineering definitions.
  • vw_build_lookup joins dim_build to the latest successful ingest_batch_id per build_id.

Name view columns the same way Lessons 28 and 29 name them in briefings, so Lesson 28 patch-readiness packets still read familiar when someone deep-links from a slide deck.
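
As one way to express vw_build_lookup, here is a sketch that treats the "latest successful batch" as the most recent batch that promoted at least one row for the build; both the linkage through fact_stability_closure and that success rule are assumptions to pin to your own Lesson 31 audit fields.

    CREATE VIEW analytics.vw_build_lookup AS
    SELECT d.build_id,
           d.promotion_date,
           d.platform_targets,
           d.owner_lane,
           b.ingest_batch_id AS latest_ingest_batch_id,
           b.ingest_utc      AS latest_ingest_utc
    FROM dim_build AS d
    LEFT JOIN LATERAL (
        SELECT s.ingest_batch_id, s.ingest_utc
        FROM fact_stability_closure AS f
        JOIN stg_ingest_batch AS s USING (ingest_batch_id)
        WHERE f.build_id = d.build_id
          AND s.row_count > 0          -- assumption: "successful" means the batch promoted rows
        ORDER BY s.ingest_utc DESC
        LIMIT 1
    ) AS b ON true;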

Step 6 - Partition and prune with intent

Partition fact tables by promotion_date month (or by week if you ship very often). Keep cold partitions for legal or publisher audits, but set a documented retention cap for raw webhook JSON if it includes stack snippets. Align that cap with what you already decided in Lesson 31 about PII.
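
If you partition, the Step 2 fact sketch gains a promotion_date column (denormalized from dim_build purely for pruning, which is an assumption) and the partition key joins the primary key; a Postgres-flavoured sketch:

    -- Revised fact DDL with monthly range partitions on promotion_date.
    CREATE TABLE fact_stability_closure (
        build_id          text NOT NULL,
        ingest_batch_id   text NOT NULL,
        promotion_date    date NOT NULL,
        yellow_open_close integer,
        stability_signal  text,
        dialogue_signal   text,
        closure_utc       timestamptz,
        PRIMARY KEY (build_id, ingest_batch_id, promotion_date)  -- Postgres needs the partition key in the PK
    ) PARTITION BY RANGE (promotion_date);

    -- One partition per month; cold months stay queryable for audits until the retention cap.
    CREATE TABLE fact_stability_closure_2025_01
        PARTITION OF fact_stability_closure
        FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');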

Pro tips

  • Pro tip: Treat ingest_batch_id like a Git tag. If a batch is wrong, you roll forward with a corrective batch, not silent edits.
  • Pro tip: Mirror Launch Lesson 24 scorecard rhythm by materializing a monthly summary table fed only from vw_release_train_current_window, so commercial and stability narratives share one clock.
  • Pro tip: Add a cheap row count diff alert: landing row count versus staging promoted count should match within your duplicate tolerance every run (a query sketch follows this list).
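
A sketch of that reconciliation query, assuming stg_ingest_batch.row_count records the landing count, the hypothetical stg_stability_closure table carries reject_reason, and a tolerance of 2 rows stands in for your own duplicate tolerance:

    -- Flag batches where promoted staging rows drift from the landing row count.
    SELECT l.ingest_batch_id,
           l.row_count                            AS landing_rows,
           COUNT(s.ingest_batch_id)               AS promoted_rows,
           l.row_count - COUNT(s.ingest_batch_id) AS diff
    FROM stg_ingest_batch AS l
    LEFT JOIN stg_stability_closure AS s
           ON s.ingest_batch_id = l.ingest_batch_id
          AND s.reject_reason IS NULL
    GROUP BY l.ingest_batch_id, l.row_count
    HAVING ABS(l.row_count - COUNT(s.ingest_batch_id)) > 2;  -- 2 = example duplicate tolerance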

Common mistakes

  • Granting the BI tool the same OAuth app your writers use, then wondering why someone dragged a formula over production ingest credentials.
  • Letting analysts create private extracts with stale build_id filters that contradict the Lesson 21-style control panel thresholds your launch track already published.
  • Skipping stg_ingest_batch metadata, which makes it impossible to answer “which upload broke the train view?” in under five minutes.

Troubleshooting

Symptom | Likely cause | Fix
Dashboard shows double trains | merge key too narrow | require (build_id, ingest_batch_id, source) uniqueness in staging
Warehouse costs spike | full table scans on JSON | move parsed columns into typed fact columns, partition by date
BI tool timeouts | view joins too wide | precompute vw_release_train_current_window on a schedule
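
If the double-train symptom shows up, a uniqueness constraint in staging is a cheap guardrail; a sketch, assuming the staging table is named stg_stability_closure and carries a source column:

    -- Reject duplicate loads of the same build/batch/source at write time.
    ALTER TABLE stg_stability_closure
        ADD CONSTRAINT uq_stg_closure_build_batch_source
        UNIQUE (build_id, ingest_batch_id, source);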

Mini challenge

In one document, write:

  1. The exact CREATE VIEW (or equivalent) definition for vw_release_train_current_window, including the rolling window predicate in plain English and SQL.
  2. The IAM or role binding list for svc_rpg_bi_reader.
  3. A rollback note: how you would freeze reads and revert to Lesson 31 CSV-only mode for one release week.

FAQ

Do I need dbt or Airflow on day one?

No. Scheduled SQL in your warehouse console plus version-controlled .sql files in Git is enough until you have three failing merges a month.

How do I keep warehouse schema in sync with Lesson 31 CSV headers?

Add a CI check that parses the header row of a golden sample CSV and compares it to a checked-in schema.json. Fail the build when someone renames recurrence_flag without updating the loader.

What if we use SQLite or Postgres instead of a cloud warehouse?

The same contracts apply. Use schemas, roles, and views. Swap partition syntax for table inheritance or monthly tables.

Lesson recap

You can now:

  • describe a minimal star layout for RPG stability metrics without over-modeling
  • separate writer and reader credentials for ingest versus BI
  • expose only vetted views that match Lesson 29 through Lesson 31 language
  • partition and retain data with the same seriousness you apply to player crash bundles in the crash triage tooling overview

Next lesson teaser

When dashboards multiply, add query budgets and billing guardrails so warehouse spend stays predictable: Lesson 33: Warehouse Query Budgets and Cost Guardrails for RPG Live-Ops Metrics. Until then, instrument warehouse spend per ingest_batch_id each week and review it beside your Lesson 30 train notes.

Related learning

If this lesson stopped a silent schema drift from erasing trust in your yellow mitigation story, save it beside your Lesson 28 briefing template runbook.