OpenXR Option Scorer Model Version Binding Mismatch on Quest Build - Release Lane and Tuple Lock Fix

Your calibration packet says model M-2026.2.4 and expects option B to win for cluster MIT-12. On the Quest build candidate, rankings invert, policy filters disagree, or promotion labels do not match the signer packet. Nobody changed weights on purpose.

In 2026 mitigation and OpenXR release lanes, this is rarely "random XR behavior." It is almost always binding drift: more than one code path can supply scores, or the build tuple does not lock the scorer version your governance assumed.

Problem

Typical symptoms:

Editor or dogfood build ranks options correctly; Quest store candidate does not
CI gate passes on one agent; release owner reproduces different top score on device
Telemetry shows model_version missing, null, or different across two logs from the same build id
Policy filter accepts an option in replay but rejects it in the candidate you intended to ship
Mitigation decisions made pre-build do not match post-install first-session scoring

If you see ranking motion without a logged model identity, treat every conclusion as suspect until binding is proven.

Root cause summary

Multiple scorers: shadow/canary code accidentally left enabled; production path still reads legacy weights.
Unpinned resources: scoring config loaded from StreamingAssets, remote JSON, or Addressables without a version hash tied to the release tuple.
Conditional compilation: #if branches load different weight files for Quest than for Editor.
Stale cached config: warm start or persistent storage rehydrates an older model id after you thought you shipped a new one.
Tuple skew: build number, git SHA, and scorer manifest disagree; humans reference one tuple while automation references another.

Fix strategy: one binding path, one source of truth per tuple, mandatory model_version in telemetry.

Fastest safe fix path

Search the project for every load of scoring weights or calibration JSON. There must be one production resolver for Quest builds.
Embed model_version (and ideally a short hash) in player settings or a generated ScorerManifest included only via a deterministic pre-build step.
Log model_version at session start on device and fail closed in internal builds if it is missing.
Lock the release tuple: commit id, build number, scorer manifest hash in one row your signer packet cites.
Re-run one replay pack on the exact Quest artifact and compare rankings to the calibration packet.
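The tuple lock in the list above can be checked mechanically instead of by eye. The following is a minimal Python sketch, assuming the signer packet row and the manifest read back from the device are both available as dictionaries. The field names commit_id, build_number, and scorer_manifest_hash are illustrative, not a fixed schema.

```python
from __future__ import annotations

# Hypothetical field names; adapt to whatever your signer packet records.
TUPLE_FIELDS = ("commit_id", "build_number", "scorer_manifest_hash")


def tuple_mismatches(signer_row: dict, device_manifest: dict) -> list[str]:
    """Return the tuple fields that are missing on either side or differ."""
    mismatches = []
    for field in TUPLE_FIELDS:
        expected = signer_row.get(field)
        actual = device_manifest.get(field)
        # A missing value counts as a mismatch: an unproven binding is not a match.
        if expected is None or actual is None or expected != actual:
            mismatches.append(field)
    return mismatches
```

An empty result means the three fields agree; anything else names the fields to investigate before promotion.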

Step-by-step fix

Step 1: Inventory binding sites

List every class or ScriptableObject that can provide weights or dimension definitions.
Mark each as production, shadow, editor-only, or deprecated.

Success check: only one production path remains for player builds.
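A rough first pass at this inventory can be scripted. The sketch below scans C# sources for lines that both call a loading API and mention a scorer-related word. The search terms are assumptions about how a project might name things, and a same-line heuristic will miss loads that are split across lines or hidden behind helpers, so treat the output as a starting list for manual review.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed search terms; extend with the names your project actually uses.
LOAD_CALL = re.compile(
    r"Resources\.Load|Addressables\.LoadAssetAsync|streamingAssetsPath|File\.ReadAllText"
)
SCORER_HINT = re.compile(r"weight|calibration|scorer", re.IGNORECASE)


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that look like scorer config loads."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if LOAD_CALL.search(line) and SCORER_HINT.search(line):
            hits.append((number, line.strip()))
    return hits


def find_binding_sites(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Walk a project tree and collect candidate binding sites per .cs file."""
    sites = {}
    for path in sorted(root.rglob("*.cs")):
        hits = scan_text(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            sites[str(path)] = hits
    return sites
```

Each hit still has to be marked by hand as production, shadow, editor-only, or deprecated.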

Step 2: Collapse duplicate initialization

Common bug: startup order runs an old initializer after your new loader.

Ensure scorer init runs once, after config is available, before first option list is evaluated.
If you use Addressables, confirm the label you load in Quest matches CI.

Success check: deterministic order in player log with single "scorer_bound" event.
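The success check for this step can be run against a captured player log. This is a sketch only: scorer_bound comes from the check above, while config_loaded and option_list_scored are hypothetical event names standing in for whatever your log emits when config becomes available and when the first option list is evaluated.

```python
from __future__ import annotations

BIND_EVENT = "scorer_bound"
CONFIG_EVENT = "config_loaded"        # hypothetical event name
SCORE_EVENT = "option_list_scored"    # hypothetical event name


def bind_order_problems(log_lines: list[str]) -> list[str]:
    """Report duplicate binds and out-of-order initialization in a player log."""
    bound_at = [i for i, line in enumerate(log_lines) if BIND_EVENT in line]
    config_at = [i for i, line in enumerate(log_lines) if CONFIG_EVENT in line]
    scored_at = [i for i, line in enumerate(log_lines) if SCORE_EVENT in line]

    if len(bound_at) != 1:
        return [f"expected 1 {BIND_EVENT} event, found {len(bound_at)}"]

    problems = []
    if not config_at or config_at[0] > bound_at[0]:
        problems.append("scorer bound before config was available")
    if scored_at and scored_at[0] < bound_at[0]:
        problems.append("option list evaluated before scorer bound")
    return problems
```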

Step 3: Pin manifest to build

Generate a small manifest at build time:

model_version
weights_revision or file hash
generated_at (UTC)
git_sha or build_number

Embed it in:

a Resources/StreamingAssets file only replaced by CI, or
a PlayerSettings scripting define that maps to a checked-in manifest for that tag

Success check: manifest on device matches signer packet row.
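As an illustration of the manifest described in this step, here is a minimal Python sketch that a CI pre-build step might run. The function names and the choice of SHA-256 are assumptions. Excluding generated_at from the manifest hash is also a design assumption, made so that rebuilding the same tag yields the same hash.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_manifest(model_version: str, weights_bytes: bytes, git_sha: str) -> dict:
    """Assemble the manifest fields listed above from build inputs."""
    return {
        "model_version": model_version,
        "weights_sha256": hashlib.sha256(weights_bytes).hexdigest(),
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "git_sha": git_sha,
    }


def manifest_hash(manifest: dict) -> str:
    """Hash the manifest in a canonical form, ignoring the timestamp."""
    stable = {k: v for k, v in manifest.items() if k != "generated_at"}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The value returned by manifest_hash is what the signer packet row would cite, and what the device would log at startup.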

Step 4: Align Editor vs Quest defines

Search for:

different preprocessor symbols between Editor and Android
missing OPENXR or headset-only branches that skip the new loader

Success check: Development and Release Quest builds both load same manifest for a given tag.

Step 5: Fix cache and persistence issues

If you cache scorer config:

key cache by model_version
clear cache on app upgrade when manifest hash changes
never reuse cache across different build numbers without validation

Success check: cold install and upgrade install both report identical model_version on first frame where scoring is active.
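The caching rules above can be sketched as a small in-memory structure. This is an illustration in Python, not the shape of any particular persistence layer: the class name and methods are hypothetical, and a real implementation would sit on top of whatever storage the app already uses.

```python
from __future__ import annotations


class ScorerConfigCache:
    """Cache keyed by (model_version, manifest_hash) so stale entries never match."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], dict] = {}

    def put(self, model_version: str, manifest_hash: str, config: dict) -> None:
        self._entries[(model_version, manifest_hash)] = config

    def get(self, model_version: str, manifest_hash: str) -> dict | None:
        # A lookup with a different version or hash misses instead of
        # silently returning an older model's config.
        return self._entries.get((model_version, manifest_hash))

    def purge_stale(self, current_manifest_hash: str) -> int:
        """Drop entries for any other manifest hash; call this on app upgrade."""
        stale = [key for key in self._entries if key[1] != current_manifest_hash]
        for key in stale:
            del self._entries[key]
        return len(stale)
```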

Step 6: Telemetry contract

Add fields to your existing OpenXR startup or mitigation telemetry (see related help on startup instrumentation):

active_model_version
scorer_manifest_hash
config_load_source (resource path id, not full secrets)

Success check: every scoring decision log row joins to the same version as startup.
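The join in this success check is simple to automate once telemetry is exported. A minimal sketch, assuming the startup record and each scoring decision row are dictionaries that carry the two identity fields named above; the row layout itself is an assumption.

```python
from __future__ import annotations

IDENTITY_FIELDS = ("active_model_version", "scorer_manifest_hash")


def mismatched_decisions(startup: dict, decisions: list[dict]) -> list[int]:
    """Return indexes of decision rows whose identity is missing or differs from startup."""
    bad = []
    for index, row in enumerate(decisions):
        for field in IDENTITY_FIELDS:
            if row.get(field) is None or row.get(field) != startup.get(field):
                bad.append(index)
                break
    return bad
```

Any non-empty result means at least one score was produced under an identity the session never declared at startup.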

Verification checklist

Quest artifact A and B (same tag) produce identical model_version logs
Rankings for a frozen option set match calibration packet within expected float tolerance
Policy filter outcomes match calibration table for the same inputs
Removing network does not change the local scorer version (unless you intentionally stream config; in that case, block promotion if the device is offline when the version is first locked)
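The ranking item in this checklist depends on an explicit tolerance. The sketch below compares per-option scores from the calibration packet against scores captured on device, using an absolute tolerance. The scores in the usage example are made-up illustration values, and the tie-breaking rule (option name) is an assumption.

```python
from __future__ import annotations

import math


def compare_scores(
    expected: dict[str, float], observed: dict[str, float], tolerance: float
) -> list[str]:
    """List options whose scores are missing on one side or differ beyond tolerance."""
    problems = []
    for option in sorted(expected.keys() | observed.keys()):
        if option not in expected or option not in observed:
            problems.append(f"{option}: present in only one set")
        elif not math.isclose(
            expected[option], observed[option], rel_tol=0.0, abs_tol=tolerance
        ):
            problems.append(
                f"{option}: expected {expected[option]}, observed {observed[option]}"
            )
    return problems


def ranking(scores: dict[str, float]) -> list[str]:
    """Order options by descending score, breaking exact ties by option name."""
    return sorted(scores, key=lambda option: (-scores[option], option))


# Illustrative values only.
calibration = {"A": 0.40, "B": 0.62}
device = {"A": 0.4000004, "B": 0.6199997}
print(compare_scores(calibration, device, tolerance=1e-5))
print(ranking(calibration) == ranking(device))
```

If two options sit closer together than the tolerance, a rank swap between them is a borderline case to review on device, not automatic proof of binding drift.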

Alternative fixes

Feature flag service: if you must remote-switch models, gate by signed payload and log flag id beside model_version. Do not silently override the local manifest without an audit row.
Split configs: keep Quest-only tuning in a separate file but still one loader with explicit merge rules documented in signer packet.
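To make the signed-payload gate concrete, here is a small Python sketch using HMAC-SHA256 from the standard library. This is a stand-in chosen for brevity: a shipped client would more plausibly verify an asymmetric signature so that no secret key lives on the device. The payload fields flag_id and model_version and the audit row layout are illustrative assumptions.

```python
import hashlib
import hmac
import json


def sign_payload(payload: dict, key: bytes) -> str:
    """Compute an HMAC-SHA256 signature over a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def accept_remote_model(payload: dict, signature: str, key: bytes, audit_log: list) -> bool:
    """Accept a remote model switch only if the signature verifies; always audit."""
    accepted = hmac.compare_digest(sign_payload(payload, key), signature)
    # Every attempt writes an audit row, whether or not it is accepted.
    audit_log.append(
        {
            "flag_id": payload.get("flag_id"),
            "model_version": payload.get("model_version"),
            "accepted": accepted,
        }
    )
    return accepted
```

A rejected payload leaves the local manifest in force, and the audit row records that an override was attempted.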

Prevention tips

Treat scorer changes like code changes: review + CI + tuple lock.
Never approve promotion without a device log snippet showing model_version.
After wide rollout of a new model, run one "binding regression" test in your weekly cadence.

FAQ

Why do rankings differ slightly between Editor and Quest?

Floating-point evaluation order or platform math libraries can cause near-ties to resolve differently. Freeze tolerance bands in your calibration packet and test on device for borderline cases.

Should the scorer live in native plugin code?

If it does, ensure the same version string is exposed to C# and logged. Split stacks often cause invisible drift.

Can Addressables serve scorer config safely?

Yes, if the address is pinned per release tuple, content hash is verified before bind, and offline behavior is defined.

Escalation criteria for release owners

Escalate to a hold or rollback discussion when:

model_version cannot be confirmed on device for two consecutive candidates
policy outcomes disagree with calibration packet on the same frozen fixture
shadow and production paths both emit scores in one session (duplicate bind detected)

These are governance signals, not “wait for next patch” cosmetic issues.