
Why this matters now
Three things changed in 2026 that turned save-system fuzz testing from a nice-to-have into a partner-cert expectation:
- Partner reviewers in mid-2026 began asking for evidence that save files survive interrupted writes, hot-swapped platforms, schema migrations, and resume-from-sleep on the Steam Deck. The asks landed informally first, then formally in Q3 2026 reviewer notes. Several indie launches lost partner-cert sign-off specifically on save corruption edges that the team had never thought to test.
- The autumn 2026 Steam Deck Verified refresh actively tests resume-from-sleep with the Deck's power button mid-play. A save format that cannot survive a suspend mid-write fails the refresh check, not as a warning, as a fail.
- Property-based testing libraries matured to the point where a beginner can wire one up in a Unity or Godot project in one focused evening: FsCheck.Unity (for C# / Unity Test Framework), Hypothesis for any Python tooling around your save format, and a small GUT + GDScript wrapper pattern for Godot.
Property-based fuzzing is not a replacement for unit tests; it is the layer underneath unit tests that catches the inputs you never thought to write a unit test for. This post walks the six save invariants every save system should hold, the Unity and Godot recipes, the CI integration, and the evidence packet template you can attach to your next partner-cert submission.
If your save system is brand new and you do not yet have load/save round-tripping working, start with How to Build a Simple Save/Load System in Unity and Game Save Systems - Implementing Persistent Data in Unity. If you have a working save system and need to migrate old files across an upgrade, read Reliable Save Migration Unity Godot 2026 before adding fuzz tests so the migration paths are stable. Then come back here.
Who this guide is for
- Indie teams with a working Unity or Godot save system that has never been formally fuzz-tested.
- Solo developers preparing for their first Steam Deck Verified submission or first partner-cert submission of any kind.
- Beginner-to-intermediate engineers who have heard "property-based testing" and bounced off the academic framing.
- Teams that lost a partner-cert review specifically on a save edge case and want to never see that reviewer note again.
Not for: teams who already use Hypothesis + a robust state-machine model in production (you are past the scope of this post). Teams without any working save load/save (start with the linked primers above).
What property-based fuzz testing actually is
In one sentence: you describe properties your save files must always satisfy, the library generates thousands of inputs to try to break those properties, and any failure produces a shrunk minimal counter-example that you turn into a regression seed.
The three differences from unit testing:
- Unit test: "Save with PlayerLevel=5, load, assert PlayerLevel==5." One input. One assertion.
- Random fuzz: "Save with random bytes, load, assert no crash." Many inputs but only one assertion ("did not crash"). Catches less than people expect.
- Property-based fuzz: "For any valid PlayerState, Save(s) then Load() should produce a PlayerState equal to s." Many inputs, each tested against a meaningful invariant. The library shrinks failing inputs to the smallest example that still fails so you get a 4-byte counter-example instead of a 50KB one.
The fourth library benefit is stateful testing - you describe a state machine of save-system operations (save, load, append achievement, migrate version) and the library searches for sequences of operations that break invariants. Most beginner setups skip this; you can add it later.
The six invariants every save system should hold
Memorize these six. They are the framework the rest of the post hangs off.
Invariant 1 - Round-trip integrity
Load(Save(state)) == state for every valid state.
This is the table-stakes invariant. Most beginner save systems pass this for simple states and fail it for edge cases (Unicode names, very large inventories, dictionaries with specific key orderings, floats at NaN/infinity, deeply nested structures).
Invariant 2 - Monotonic versioning
SaveVersion(later_save) >= SaveVersion(earlier_save) and Load(save).version == SaveVersion(save).
If your save format has a version field, it must monotonically increase or stay the same as your codebase advances. Loading a save must produce a state that knows its own version. Failing this invariant means migration paths break silently.
Invariant 3 - Schema-evolution safety
Load_v2(Save_v1(state_v1)) should either succeed (with valid state) or fail loudly. Never produce silent corruption.
Schema evolution is where most production save bugs live. If you add a field, removing a field, or change a field type, old saves must either upgrade cleanly or be rejected with an actionable error. Silent partial loads that fill missing fields with zeros are the worst failure class because they look like they worked.
Invariant 4 - Atomic write
Power_loss_during(Save(state)) should leave either the previous save intact OR the new save intact. Never a half-written file.
The canonical implementation: write to a .tmp file, fsync, then rename atomically to the real path. If the power dies mid-write, the previous save survives because the rename has not happened yet.
Invariant 5 - Corruption resilience
Load(corrupt_byte_at(Save(state), position)) should either succeed (if the byte was in a non-critical field) or fail with a specific error code. Never crash, never silently load wrong state.
You will be amazed how many save systems pass invariants 1-4 and crash on a single flipped bit. Production users will flip bits via storage device errors, OS sync glitches, and SD card wear.
Invariant 6 - Size bounds
SaveSize(state) <= MaxSaveSize and Load rejects files larger than MaxSaveSize quickly.
A save that grows unbounded eventually fails. A loader without an upper bound is a denial-of-service vector if a player swaps in a malicious file. Both ends need a sane cap.
Setting up Unity Test Framework + FsCheck.Unity
The Unity recipe. Time budget: 30-45 minutes from "I have a working save system" to "I have my first failing property-based test."
Step 1 - Install FsCheck.Unity
In Unity 6.6 LTS, open Window → Package Manager → Add package from git URL and add:
https://github.com/fscheck/FsCheck.Unity.git
Wait for resolution. The package adds an FsCheck namespace usable from any Editor or Runtime test assembly.
Step 2 - Create a test assembly
In Assets/Tests/EditMode/, create an Assembly Definition file SaveSystemTests.asmdef with:
Editorplatform only.- References: your save-system assembly,
UnityEngine.TestRunner,UnityEditor.TestRunner, andFsCheck. - "Define Constraints" empty.
Step 3 - Write your first round-trip property
Create SaveSystemRoundTripTests.cs:
using NUnit.Framework;
using FsCheck;
using FsCheck.NUnit;
using YourGame.Saves;
public class SaveSystemRoundTripTests
{
[Property(MaxTest = 500)]
public bool Save_then_load_round_trips(PlayerState state)
{
var bytes = SaveSerializer.Serialize(state);
var loaded = SaveSerializer.Deserialize(bytes);
return PlayerStateEquals(state, loaded);
}
}
MaxTest = 500 runs 500 random inputs per test. Default is 100; for save-system tests, 500 catches edges 100 misses without bloating test time past ~2 seconds.
Step 4 - Register a generator for your save type
FsCheck needs to know how to generate random PlayerState instances. Add a generator class:
public static class PlayerStateGenerators
{
public static Arbitrary<PlayerState> PlayerStateArb()
{
return Arb.From(from level in Gen.Choose(1, 999)
from hp in Gen.Choose(0, 10000)
from name in Arb.Default.UnicodeString().Generator
.Where(s => s.Length <= 64)
from inventory in Arb.Default.List<InventoryItem>().Generator
select new PlayerState
{
Level = level,
Hp = hp,
Name = name,
Inventory = inventory
});
}
}
Tell FsCheck to use it via an assembly-level attribute:
[assembly: FsCheck.NUnit.Properties(Arbitrary = new[] { typeof(PlayerStateGenerators) })]
Step 5 - Run and shrink
Open Window → General → Test Runner → EditMode and click Run All. Your property runs 500 random inputs. When one fails, FsCheck shrinks the input to a minimal counter-example and prints it. Copy that counter-example into a permanent regression-seed file at Assets/Tests/EditMode/SaveSystemRegressionSeeds.cs:
[Test]
public void Regression_unicode_name_with_surrogate_pair_round_trips()
{
var state = new PlayerState { Name = "\uD83D\uDC0D", Level = 1 };
var bytes = SaveSerializer.Serialize(state);
var loaded = SaveSerializer.Deserialize(bytes);
Assert.AreEqual(state.Name, loaded.Name);
}
Every failing fuzz input becomes a permanent unit test. This is how the fuzz suite gets faster and more confident over time: the fuzz layer finds new edges, the unit layer guards against regressions on old ones.
Setting up Godot 4.5 GUT + property-based wrapper pattern
Godot does not have an FsCheck equivalent natively, but GUT (Godot Unit Test) plus a small GDScript wrapper gives you 80% of the value. Time budget: 25-40 minutes.
Step 1 - Install GUT
In Godot 4.5, open AssetLib and search GUT. Install version 9.x or newer (the 9.x line targets Godot 4.5).
Step 2 - Create a tests/ folder structure
res://tests/
test_save_roundtrip.gd
test_save_regression_seeds.gd
fuzz_helpers.gd
Step 3 - Write a minimal fuzz_helpers.gd
extends RefCounted
class_name FuzzHelpers
const ITERATIONS = 500
const SEED = 12345 # pin for reproducibility; rotate per CI run if desired
static func rand_player_state(rng: RandomNumberGenerator) -> Dictionary:
return {
"level": rng.randi_range(1, 999),
"hp": rng.randi_range(0, 10000),
"name": _rand_unicode_string(rng, rng.randi_range(0, 64)),
"inventory": _rand_inventory(rng, rng.randi_range(0, 50)),
}
static func _rand_unicode_string(rng: RandomNumberGenerator, length: int) -> String:
var out := ""
for i in length:
# cover BMP plus emoji range
var cp = rng.randi_range(32, 0x1F9FF)
out += String.chr(cp)
return out
static func _rand_inventory(rng: RandomNumberGenerator, count: int) -> Array:
var items := []
for i in count:
items.append({
"id": rng.randi_range(0, 9999),
"qty": rng.randi_range(0, 99)
})
return items
Step 4 - Write your first round-trip property
test_save_roundtrip.gd:
extends GutTest
const SaveSystem = preload("res://scripts/save_system.gd")
func test_save_then_load_round_trips():
var rng := RandomNumberGenerator.new()
rng.seed = FuzzHelpers.SEED
for i in FuzzHelpers.ITERATIONS:
var state = FuzzHelpers.rand_player_state(rng)
var bytes = SaveSystem.serialize(state)
var loaded = SaveSystem.deserialize(bytes)
assert_eq(loaded, state,
"Round-trip failed at iteration %d with state %s" % [i, state])
if not _states_equal(state, loaded):
print("[regression-seed] ", state)
return
Run via Project → Tools → GUT → Run All.
Step 5 - Capture regression seeds
When the property fails, GDScript prints the failing state. Convert it to a permanent unit test in test_save_regression_seeds.gd:
extends GutTest
func test_regression_unicode_surrogate_pair_round_trips():
var state = {"name": "\uD83D\uDC0D", "level": 1, "hp": 0, "inventory": []}
var bytes = SaveSystem.serialize(state)
var loaded = SaveSystem.deserialize(bytes)
assert_eq(loaded.name, state.name)
Same pattern as Unity. Fuzz finds new edges; unit layer guards old ones.
The Godot wrapper pattern is less elegant than FsCheck (no automatic shrinking), but the regression seed preservation is the part that delivers the partner-cert evidence. The fuzz library you use is less important than the discipline of converting failures into permanent unit tests.
Wiring the other five invariants
Round-trip integrity is the entry point. The other five follow the same pattern: define the property, write the generator (or extend the existing one), run, capture, and convert failures to regression seeds.
Invariant 2 - Monotonic versioning
[Property(MaxTest = 200)]
public bool Save_writes_correct_version(PlayerState state)
{
var bytes = SaveSerializer.Serialize(state);
var version = SaveSerializer.ReadVersionHeader(bytes);
return version == SaveSerializer.CurrentVersion;
}
Invariant 3 - Schema-evolution safety
Pair your current PlayerState with a frozen PlayerStateV1 type and verify the V2 loader handles V1 input cleanly:
[Property(MaxTest = 200)]
public bool Load_v2_handles_v1_saves(PlayerStateV1 v1State)
{
var v1Bytes = SaveSerializerV1.Serialize(v1State);
var migrated = SaveSerializer.Deserialize(v1Bytes); // current loader
return migrated != null && migrated.Level == v1State.Level;
}
Invariant 4 - Atomic write
Atomic write is best fuzzed at the file-system level rather than the byte level. Inject a fake IFileSystem that randomly aborts writes:
[Property(MaxTest = 100)]
public bool Power_loss_mid_write_preserves_previous_save(int abortAfterByteCount)
{
var fs = new FlakyFileSystem(abortAfterByteCount: abortAfterByteCount);
var previousState = new PlayerState { Level = 5 };
SaveSerializer.SaveToFileSystem(fs, previousState);
var newState = new PlayerState { Level = 6 };
try { SaveSerializer.SaveToFileSystem(fs, newState); }
catch (FlakyFileSystemAbortException) { /* expected */ }
var loaded = SaveSerializer.LoadFromFileSystem(fs);
return loaded.Level == 5 || loaded.Level == 6;
}
Invariant 5 - Corruption resilience
[Property(MaxTest = 1000)]
public bool Single_byte_flip_never_crashes(PlayerState state, int byteIndex, byte flipMask)
{
var bytes = SaveSerializer.Serialize(state);
if (bytes.Length == 0) return true;
var position = byteIndex % bytes.Length;
bytes[position] ^= flipMask;
try
{
var loaded = SaveSerializer.Deserialize(bytes);
return loaded != null;
}
catch (SaveCorruptionException)
{
return true; // expected, not a crash
}
}
MaxTest = 1000 here because corruption resilience is a high-coverage invariant - you want many bit-flip positions tested per CI run.
Invariant 6 - Size bounds
[Property(MaxTest = 100)]
public bool Save_size_bounded(PlayerState state)
{
var bytes = SaveSerializer.Serialize(state);
return bytes.Length <= SaveSerializer.MaxSaveSize;
}
[Property(MaxTest = 100)]
public bool Loader_rejects_oversize_quickly(int overshootBytes)
{
var overshoot = System.Math.Max(0, overshootBytes);
var bytes = new byte[SaveSerializer.MaxSaveSize + overshoot + 1];
var stopwatch = System.Diagnostics.Stopwatch.StartNew();
try { SaveSerializer.Deserialize(bytes); }
catch (SaveTooLargeException) { /* expected */ }
return stopwatch.ElapsedMilliseconds < 100;
}
The 100ms threshold catches loaders that allocate up-front before checking size; those are the slow rejections that turn into DoS surfaces.
CI integration recipe
Run all six invariants on every PR. Add the fuzz layer to your existing GitHub Actions (or whatever CI you use):
- name: Run Unity EditMode tests with fuzz
uses: game-ci/unity-test-runner@v4
with:
testMode: editmode
artifactsPath: artifacts/editmode
customParameters: -runTests -testCategory "Fuzz,Regression"
Two PR-time tactics that matter:
- Pin the FsCheck seed per PR but rotate it per nightly run. PR-time runs use a deterministic seed for reproducibility; nightly runs use a fresh seed to explore new inputs.
- Time-box fuzz runs. If a fuzz invariant takes longer than 30 seconds on PR-time CI, lower
MaxTestfor PR-time runs and let the nightly run handle deeper exploration. The CI shouldn't make developers wait.
For the broader CI lane shape, pair with 15 Free GitHub Actions CI Recipes for Unity and Godot Builds.
The partner-cert evidence packet template
When you submit for Steam Deck Verified or any partner cert, attach a save-system-fuzz-evidence.md file to your submission notes with this exact shape:
# Save System Fuzz Evidence - <ProjectName> - <Date>
## Invariants tested
- [x] Round-trip integrity - FsCheck.Unity, 500 iterations/PR + 5000 nightly
- [x] Monotonic versioning - FsCheck.Unity, 200 iterations/PR
- [x] Schema-evolution safety - FsCheck.Unity v1→v2 path, 200 iterations
- [x] Atomic write - FlakyFileSystem injection, 100 iterations
- [x] Corruption resilience - single-bit flip, 1000 iterations/PR
- [x] Size bounds - Loader DoS check at 100ms threshold
## Regression seeds preserved
N regression-seed unit tests in `Assets/Tests/EditMode/SaveSystemRegressionSeeds.cs`
(or Godot equivalent path).
## CI evidence
Last 30 days of nightly fuzz runs: green / N regressions caught and fixed.
## Resume-from-sleep specifically
Suspend-mid-save test: <PASS/FAIL>
Wake-up save-integrity-check pattern: <description>
Reviewers read this in 30 seconds and move on. The packet does not need to be long; it needs to be specific and verifiable.
Common beginner mistakes
- Skipping the regression seed step. Fuzz finds an edge case, you fix it, but you do not write a permanent unit test for it. Six months later the fix regresses and nobody notices.
- Treating fuzz as a replacement for unit tests. Fuzz finds edges; units guard known edges. Both layers are needed.
- Running fuzz only locally, not in CI. Without CI gating, the value of the suite decays over weeks.
- Setting
MaxTestto a huge number on PR CI. Slow PR-time CI is unsanctioned CI that gets disabled. Keep PR-time fuzz under 2 minutes total. - Letting the random seed drift on PR-time runs. PR-time CI should be deterministic. Random seed = nightly only.
- Fuzzing only the serializer, not the file-system path. Atomic write and resume-from-sleep failures live on the file-system side, not the byte-level side.
- Generating only "easy" inputs. UnicodeString with
Where(s => s.Length <= 64)is correct; UnicodeString unconstrained risks running out of memory on a 10,000-character name. Bound the generator at the size your game actually allows. - Treating
LoadCorruptionExceptionas a generic crash. Catch expected exceptions specifically; any other exception type is still a real crash.
Pro tips
- Pin a
SAVE_FORMAT_VERSIONconstant and bump it in the same PR that changes the format. Schema-evolution invariant tests then catch the version mismatch automatically. - Save the failing input on every CI failure. A
failures-<date>.txtartifact uploaded by the CI job lets you reproduce exactly the input that broke. - Run the corruption fuzz against real save files from your QA team, not only synthetic generated ones. Real saves have weird structures synthetic ones miss.
- Cap save file size deliberately. A 50KB cap for a small RPG is fine; auditing the cap proves intent.
- Add a per-platform fuzz run. Linux Editor + Windows Editor + Deck remote-play each find different edges due to filesystem differences.
- Document the regression-seed schema versioning. When you bump
SAVE_FORMAT_VERSION, mark which regression seeds were captured against which version so future migrations know what to preserve. - Audit fuzz coverage quarterly. Add new invariants when you add new save fields. Remove obsolete ones when you delete fields.
How the six invariants connect to broader cert lanes
- Steam Deck Verified autumn 2026 refresh: the resume-from-sleep check directly exercises Invariants 4 and 5. Run those two specifically before Deck Verified submission. See Steam Deck Verified Autumn 2026 Refresh.
- Steam Next Fest October 2026: the demo-week traffic spike will surface save corruption faster than a closed alpha will. See the 7-Day Vertical Slice Demo Challenge.
- Q3 2026 partner cert intake: mock-audit packets now expect fuzz evidence under reviewer-facing dashboards. See We Delayed Feature Branch Save Launch Week for the case-study framing.
Key takeaways
- Property-based fuzz testing finds the inputs you never thought to unit-test. Failing inputs shrink to minimal counter-examples and convert to permanent regression seeds.
- Six invariants every save system should hold: round-trip integrity, monotonic versioning, schema-evolution safety, atomic write, corruption resilience, size bounds.
- Unity stack: FsCheck.Unity via Package Manager + Unity Test Framework + assembly-level
[Properties(Arbitrary = ...)]attribute +[Property(MaxTest = ...)]annotated tests. - Godot stack: GUT 9.x + small
FuzzHelpersGDScript wrapper + manual seed pinning + manual regression seed capture. - Convert every failing fuzz input into a permanent unit test at
SaveSystemRegressionSeeds.cs(Unity) ortest_save_regression_seeds.gd(Godot). This is where the suite gets stronger over time. - PR-time CI: deterministic seed +
MaxTestsized so total fuzz takes under 2 minutes. Nightly CI: rotating seed + 10xMaxTest. - Atomic write fuzzes at the file-system level, not byte level. Inject a
FlakyFileSystem. - Size-bounds fuzz must check both the writer's cap and the loader's quick-reject (under 100ms for oversized inputs).
- Attach a
save-system-fuzz-evidence.mdfile to your partner-cert submission notes. Six checkboxes + last 30 days of CI green + resume-from-sleep specifically. - The fuzz library you use matters less than the discipline. Generators bounded to realistic input sizes + regression-seed preservation + CI gating = 90% of the value.
FAQ
1. Our save format is JSON. Does property-based fuzz still help? Yes - arguably more. JSON is text but the invariants still hold: round-trip integrity (escape sequences, Unicode), schema-evolution safety (added/removed fields), corruption resilience (truncated JSON, mismatched braces). FsCheck handles JSON strings fine; just bound the generator to strings under your actual size cap.
2. We use Protobuf / FlatBuffers / MessagePack. Anything different? The serializer is binary so corruption resilience and size bounds get more interesting (single-byte flips in a length prefix create more dramatic failures than in JSON). The Protobuf official test suite has good corruption-resilience patterns to copy.
3. How long should fuzz tests take on CI?
PR-time: under 2 minutes for the full fuzz suite. Nightly: 10-30 minutes is fine; the developer is asleep. If PR-time exceeds 2 minutes, lower MaxTest until it fits, and let the nightly cover the deeper exploration.
4. We have a cloud save (Steam Cloud, iCloud, Google Play Saves). Do we fuzz those too? The local save is the place to fuzz. Cloud save is the upload path; treat it as an opaque transport. The invariants you fuzz locally protect against the failure modes that survive the round-trip through cloud save.
5. The fuzz suite found a real corruption bug. How do I prioritize? Severity heuristic: (a) crashes the game = fix immediately; (b) silently corrupts state = fix this sprint; (c) fails to load with a specific error = fix next sprint and ship a migration; (d) fuzz found it but no real user could reproduce = add the regression seed and move on. Fuzz finds many edges; most are real, a few are theoretical. Triage with the heuristic.
Related reading
- How to Build a Simple Save/Load System in Unity - prerequisite for Unity readers.
- Game Save Systems - Implementing Persistent Data in Unity - primer on Unity save architecture.
- Reliable Save Migration Unity Godot 2026 - migration paths Invariant 3 depends on.
- Steam Deck Verified Autumn 2026 Refresh - upstream cert context for Invariants 4 and 5.
- 7-Day Vertical Slice Demo Challenge for Steam Next Fest October 2026 - festival traffic spike companion.
- We Delayed Feature Branch Save Launch Week - case study where save discipline saved a launch.
- AI Game Testing - Automated QA - broader QA automation context.
Resource lists worth bookmarking: 20 Free Game Save and Cloud Sync Services, 15 Free GitHub Actions CI Recipes for Unity and Godot, 20 Free Game Testing QA Tools 2026, and 18 Free Unity 6.6 Migration Regression Triage Resources.
Authoritative references: FsCheck on GitHub, Hypothesis documentation, GUT (Godot Unit Test) repository, Unity Test Framework documentation.
If your team has run this six-invariant suite against a production save system and caught a real bug, share the shrunk counter-example. The community gets faster as a network than as individual studios.