Case Studies & Experiments May 22, 2026

I Built a Game with ChatGPT and Claude - Here's What Happened (2026)

I built a game with ChatGPT and Claude in 2026—real results, failures, and lessons. Watch the educational video and copy the dual-LLM workflow for your own indie prototype.

By GamineAI Team



I did not set out to prove AI replaces game developers. I set out to answer a narrower question that indies keep asking in 2026: If I split the work between ChatGPT and Claude instead of one chat window, do I ship faster—or just fail with better documentation?

This article is the written companion to that experiment. The full educational walkthrough is on YouTube so you can watch decisions in real time, pause on prompts, and copy the workflow into your own engine.

Watch the educational video first

The video shows the same pipeline this article describes: scope lock, ChatGPT drafting, Claude review, engine integration, and playtest gates. Watch it when you want motion and screen context; read this page when you want checklists, failure modes, and links to deeper GamineAI guides.

YouTube link: https://www.youtube.com/watch?v=rB_o4zNo2bg

If the embed does not load on your reader, open the link directly—the walkthrough is the same content in video form.

Why this matters now (May 2026)

Three shifts made a dual-LLM build worth documenting:

  1. Model specialization: ChatGPT families excel at fast iteration and tool-style codegen; Claude families excel at long-context review and careful refactors. Using only one model hides that split.
  2. Store and partner scrutiny: Steam AI disclosure and seven-day disclosure sprints mean “we used AI” must be operational truth, not marketing fluff.
  3. Fallback expectations: if your game has dialogue or live ops text, ElevenLabs + Ollama architecture and LLM fallback nets are now baseline literacy, not stretch goals.

Direct answer: ChatGPT carried first drafts and velocity; Claude carried review, safety, and structure. I still did integration, playtesting, and scope cuts. The game shipped as a small vertical slice, not a AAA title—and that honesty is the point.

What I built (scope lock)

Constraint | Choice
Genre | Top-down micro-roguelite rooms
Engine | Godot 4.x (GDScript)
Session | 15–25 minute runs
Art | Mixed AI-assisted + kitbash
Audio | Stock + one generated loop
Multiplayer | None
Narrative | Light barks, no LLM in player-facing chat

Working title internally: Room Ledger—a run-based game where room order is seeded and logged for replay debugging. The name matters less than the scope contract pasted at the top of every AI thread.

If you cannot describe your game in one sentence without commas, cut scope before opening ChatGPT.

The dual-LLM contract (rules I actually followed)

Role | Model | Allowed to do | Not allowed
Drafter | ChatGPT | Design docs, GDScript v1, UI copy v1, task breakdowns | Final merge without review
Reviewer | Claude | Refactor, threat modeling, test plans, prompt registry | Invent new mechanics mid-sprint
Human (me) | | Engine wiring, playtest, cuts, uploads | Pretend AI playtests

Every merged script passed a Claude review pass with a standard prompt (below). Every new mechanic started as a ChatGPT brief with acceptance criteria.

This is different from the older platformer experiment that rotated more tools (Midjourney, Copilot). Here the story is two LLMs + one engine.

Week 0 — Design without code (ChatGPT-led)

Prompt pattern:

You are a senior game designer. Propose a 15-minute roguelite vertical slice for Godot 4.x. Output: core loop, 8 rooms max, 3 enemy types, 1 boss, fail states, and acceptance tests in plain English. No code.

What worked: Fast loop definition, readable milestone table, sensible difficulty curve on paper.

What failed: Feature creep in the same thread—“add daily challenges, add meta progression”—unless I started a new chat with “ignore prior messages except scope doc.”

Artifact saved: docs/scope_v1.md—pasted into Claude and ChatGPT system context for the rest of the project.

Beginner tip: Export ChatGPT’s design reply to Markdown the same day. Chats vanish; files do not.

Week 1 — Code drafts (ChatGPT-led, Claude-gated)

Player controller

ChatGPT produced a workable CharacterBody2D mover in one pass: acceleration, coyote time, dash with cooldown.

Claude review caught:

  • Missing is_on_floor() guard on dash reset edge case
  • Hardcoded 300 pixels/sec with no export variables
  • No input action map references (keyboard vs controller drift)

Time: ~3 hours human integration, ~45 minutes AI combined.

Room spawner + seed

ChatGPT suggested a run_seed and room_index pattern—aligned with industry discourse on RNG ledgers even though this project was Godot, not Construct.

Claude added:

  • run_id string for playtest forms
  • Warning comment on where not to call randi() (UI shake killed determinism in an earlier micro-test)

Lesson: Treat AI seed advice like any other sample code—verify with refresh/reload tests.
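
The run_seed and room_index pattern above can be sketched outside the engine to show why a cosmetic random call should never touch gameplay state. This is a minimal Python illustration, not the project's GDScript; the function names room_rng, room_order, and ui_shake_offset are hypothetical, and the 8-room count is taken from the design prompt.

```python
import random


def room_rng(run_seed: int, room_index: int) -> random.Random:
    """Return an RNG that depends only on the run seed and the room index."""
    # String seeds are hashed deterministically by random.Random,
    # unlike the built-in hash(), which is salted per process for str.
    return random.Random(f"{run_seed}:{room_index}")


def room_order(run_seed: int, room_count: int = 8) -> list[int]:
    """Shuffle room ids 0..room_count-1 using gameplay RNG only."""
    order = list(range(room_count))
    room_rng(run_seed, 0).shuffle(order)
    return order


# Cosmetic effects get their own generator so they never advance gameplay state.
ui_rng = random.Random()


def ui_shake_offset() -> float:
    """Hypothetical UI effect that draws from the cosmetic RNG only."""
    return ui_rng.uniform(-2.0, 2.0)


first = room_order(run_seed=1234)
ui_shake_offset()  # a cosmetic call between runs must not change the result
second = room_order(run_seed=1234)
assert first == second
```

In Godot, the same separation would likely mean one seeded RandomNumberGenerator instance for gameplay and a different one for UI effects, with the reload test from the lesson above as the check.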

UI overlay

ChatGPT drafted pause menu copy and layout tree. Claude shortened strings for 1280×800 and flagged unverifiable store claims in placeholder text (“infinite runs!”). Small catch, large refund-risk if shipped.
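
A string check like the one Claude performed by hand can be approximated with a small script run before each upload. The sketch below is an assumption about how such a check might look: the phrase list and the 40-character cap are hypothetical (the article names only “infinite runs!” as an example and gives no length limit), and flag_ui_strings is an illustrative name.

```python
import re

# Hypothetical phrase list; extend it to match your own store-claim policy.
RISKY_PATTERNS = [r"\binfinite\b", r"\bunlimited\b", r"\bguaranteed\b"]


def flag_ui_strings(strings: dict[str, str], max_chars: int = 40) -> list[str]:
    """Return findings for UI strings that make risky claims or run too long."""
    findings = []
    for key, text in strings.items():
        for pattern in RISKY_PATTERNS:
            if re.search(pattern, text, flags=re.IGNORECASE):
                findings.append(f"{key}: unverifiable claim matches {pattern!r}")
        if len(text) > max_chars:
            findings.append(f"{key}: {len(text)} chars exceeds cap of {max_chars}")
    return findings


ui_strings = {"pause_title": "Paused", "promo": "Infinite runs!"}
for finding in flag_ui_strings(ui_strings):
    print(finding)
```

A script like this only narrows the search; a human still decides whether a flagged string is an actual store claim.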

Day-by-day build diary (what the camera caught)

The educational video follows this rough calendar. Numbers are calendar days, not eight-hour workdays—this was nights-and-weekends indie time.

Day Focus ChatGPT Claude Human outcome
1 Scope + loop Design doc v1 Trim creep scope_v1.md frozen
2 Movement Controller draft P0 dash bug Playable box in room
3 Combat stub Hitbox template Signal naming One enemy dies
4 Room flow Spawner sketch Seed warnings Room 1→2 works
5 UI pause Menu tree Copy length Pause without crash
6 Audio pass SFX list only Placeholder blips
7 Playtest Triage list 12 bugs filed
8 Fix P0 Patch suggestions Review diffs Dash build-only bug gone
9 Polish Bark lines Tone pass Strings wired
10 Build export Checklist Disclosure draft itch HTML5 zip
11 Film edit YouTube rough cut
12 Publish Article + video live

Why publish the diary? Beginners underestimate integration days (8–10). The table sets honest expectations: AI compresses typing, not integration.

BUILD_RECEIPT excerpt (developer evidence)

We keep receipts lightweight—no fantasy metrics. A redacted slice from the project folder:

{
  "project": "room-ledger-vertical-slice",
  "engine": "godot-4.5",
  "drafter_model": "chatgpt-2026-05-pinned",
  "reviewer_model": "claude-2026-05-pinned",
  "human_integration_hours_estimate": 28,
  "ai_assist": {
    "gdscript_drafts": true,
    "store_copy_drafts": true,
    "runtime_llm_gameplay": false
  },
  "playtest_gate": {
    "checklist_version": "v3",
    "pass": true,
    "notes": "dash build-only repro fixed day 8"
  },
  "disclosure": {
    "ai_assisted_code": true,
    "human_review": true,
    "live_generative_gameplay": false
  },
  "educational_video": "https://www.youtube.com/watch?v=rB_o4zNo2bg"
}

If you adopt one habit from this article, adopt receipts—they survive chat deletion and partner questions. See BUILD_RECEIPT beginner pipeline for a fuller template.
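
A receipt is more useful when something checks it. The following Python sketch validates a receipt with the field names from the excerpt above; the consistency rules themselves (for example, that drafted code implies a code disclosure) are assumptions for illustration, and check_receipt is a hypothetical helper.

```python
import json

REQUIRED_KEYS = {
    "project", "engine", "drafter_model", "reviewer_model",
    "ai_assist", "playtest_gate", "disclosure",
}


def check_receipt(receipt: dict) -> list[str]:
    """Return a list of problems; an empty list means the receipt is consistent."""
    missing = REQUIRED_KEYS - set(receipt)
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    problems = []
    ai = receipt["ai_assist"]
    disclosure = receipt["disclosure"]
    # Assumed rule: AI-drafted code must be disclosed as AI-assisted code.
    if ai.get("gdscript_drafts") and not disclosure.get("ai_assisted_code"):
        problems.append("AI drafted code but disclosure.ai_assisted_code is not true")
    # Assumed rule: the runtime LLM flag and the store disclosure must agree.
    if ai.get("runtime_llm_gameplay") != disclosure.get("live_generative_gameplay"):
        problems.append("runtime LLM flag and live_generative_gameplay disagree")
    if not receipt["playtest_gate"].get("pass"):
        problems.append("playtest gate has not passed")
    return problems


RECEIPT_JSON = """
{
  "project": "room-ledger-vertical-slice",
  "engine": "godot-4.5",
  "drafter_model": "chatgpt-2026-05-pinned",
  "reviewer_model": "claude-2026-05-pinned",
  "ai_assist": {"gdscript_drafts": true, "store_copy_drafts": true,
                "runtime_llm_gameplay": false},
  "playtest_gate": {"checklist_version": "v3", "pass": true},
  "disclosure": {"ai_assisted_code": true, "human_review": true,
                 "live_generative_gameplay": false}
}
"""

problems = check_receipt(json.loads(RECEIPT_JSON))
print(problems or "receipt is consistent")
```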

Proof table — claims vs evidence

Claim | Evidence type | Result
Dual-LLM faster than solo typing | Time logs on movement task | Yes for boilerplate
Dual-LLM faster than solo calendar | 12-day diary | No for decisions
Claude catches Godot API drift | Review diff count | 9 P0/P1 in week 1
ChatGPT best for creative lists | Bark v1 throughput | High
Playtest without human | | Failed (always)
Video matches article | Chapter map below | Aligned
Store disclosure accurate | Binary feature audit | Matched

Proof tables are not SEO decoration—they are how you stop lying to yourself when a model sounds confident.

Week 2 — Content and polish (split workflow)

Asset | ChatGPT | Claude | Human
Enemy barks | v1 lines | tone + length cap | edited for lore
Room names | list | banned duplicate roots | picked final 8
Tutorial strings | step list | clarity pass | wired in engine
Bug triage | repro guesses | root-cause ranking | fixed in Godot

Pro tip from the video: Record Claude reviews as screen captures for your publisher folder—shows human-in-the-loop without selling fantasy “fully autonomous AI game.”

What surprised me (good)

  1. Acceptance tests in design prompts — ChatGPT outputs became playtest checklists with light edits.
  2. Claude refactors — Extracted state machine from spaghetti if chains without changing behavior—rare and valuable.
  3. Cross-model disagreement — When models disagreed on architecture, the bug was real 70% of the time.
  4. Faster boring code — Save/load boilerplate, options menu scaffolding, signal wiring templates.
  5. Educational clarity — Filming the YouTube walkthrough forced me to explain prompts out loud—which improved the actual prompts.

What disappointed me (honest)

  1. “Just fix it” loops — ChatGPT sometimes patched symptoms; Claude sometimes over-refactored. Human judgment still arbitrates.
  2. No engine plugin awareness — Both models occasionally hallucinated Godot 3 APIs until I pasted the exact version string every session.
  3. Art pipeline untouched by words — Visuals still needed human taste and manual cleanup; see AI asset generation guide for limits.
  4. Playtesting — Zero substitutes for human hands on controller.
  5. Schedule illusion — AI shortened typing time, not decision time. Total calendar time was ~4 weeks part-time, not a weekend.

Side-by-side — same task, two models

Task: Design a GameState autoload for room transitions.

Output quality | ChatGPT | Claude
Speed | Faster first draft | Slower first draft
Godot 4.x accuracy | Good with version pinned | Better signal naming
Over-engineering | Sometimes | More conservative
Comments/docs | Sparse | Thorough
Security mindset | Weak | Stronger on save tampering

Workflow I kept: ChatGPT generate → human integrate → Claude review → human playtest → merge.

Compare also prompt battle for quests—same “split models by strength” philosophy, different task.

Copy-paste prompts that survived the project

ChatGPT — feature brief

Context: Godot 4.5 GDScript roguelite vertical slice (see scope_v1.md).
Task: Implement [FEATURE] with acceptance tests.
Constraints: No new autoloads without listing them; export vars for tuning; no gameplay random() in UI nodes.
Output: GDScript + short integration steps + test checklist.

Claude — review pass

You are a lead gameplay programmer reviewing GDScript for shipping.
Input: [PASTE CODE]
Check: Godot 4.5 API correctness, edge cases, save tampering, performance on low-end PC, signal leaks.
Output: numbered issues P0/P1/P2; propose minimal diff; do not rewrite entire file unless P0 count > 3.

Human — merge gate

  • [ ] Playtest checklist run
  • [ ] run_id visible on pause overlay
  • [ ] No new store claims in UI strings
  • [ ] BUILD_RECEIPT or upload note updated

Beginner path — copy this if you start tomorrow

  1. Watch the educational video once end-to-end.
  2. Write scope_v1.md in ChatGPT with the design prompt above.
  3. Pick one engine you already tolerate.
  4. Implement movement only before combat.
  5. Add Claude review before any public itch upload.
  6. Read how to create a game with AI (no coding) if you want visual-tool routes instead of Godot.

Time budget: 10–15 hours for first playable room if scope stays tiny.

Developer path — evidence and ops

  1. Prompt registry: semver your system prompts per live-ops sprint.
  2. Receipt culture: BUILD_RECEIPT notes which model drafted which subsystem.
  3. AI disclosure: store page matches reality (assistive codegen + human integration).
  4. Voice/dialogue: if you add NPC voice later, use fallback architecture, not raw API hope.
  5. Do not ship player-facing LLM chat in v1; moderation and latency risks dominate.

Metrics I will share (and what I will not)

Share freely | Will not invent
Calendar weeks (~4 part-time) | Download counts
Lines human-edited vs AI-origin | Revenue
Bug counts by category | “Saved X hours” %
Playtest pass/fail on checklist | Wishlist conversion

The video includes honest screen recordings of failures—including a broken dash that only reproduced on build, not in editor. That clip alone was worth filming.

Relationship to other GamineAI AI build guides

Resource | How it differs
ChatGPT step-by-step beginner guide | Single-model tutorial
Claude beginner guide | Single-model tutorial
I let AI build a platformer | Older multi-tool experiment
This post + video | Dual-LLM 2026 narrative with educational film

Common mistakes when pairing ChatGPT and Claude

  1. Same mega-thread for both — Context pollution; start fresh threads per phase.
  2. Skipping human playtest — Models optimize for plausible code, not fun.
  3. Letting AI pick scope — You get an RPG in week one on paper and nothing shippable in engine.
  4. No version pin — “Godot script” is not specific enough.
  5. Hiding AI use — Disclosure rules caught up; be accurate.
  6. Trusting generated shaders blindly — Always profile on target GPU.
  7. Mixing three more tools mid-project — Tool thrash kills receipts.

If you only have one model

Only ChatGPT | Only Claude
Add strict self-review checklist | Add faster draft passes with smaller prompts
Use external linter heavily | Pair with engine debugger time
Film your own review session | Ask a peer for playtest

Dual-LLM is optional efficiency, not morality.

Steam / itch upload notes (what I actually disclosed)

Store copy listed:

  • AI-assisted code and text drafts with human review
  • No real-time generative AI gameplay in the demo build
  • Human integration, tuning, and QA

That matched the binary. If you add live LLM features later, update disclosure before the feature ships—not after a forum thread.

Tools stack (final)

Tool | Role
ChatGPT | Drafts
Claude | Review
Godot 4.5 | Engine
Git | History
itch.io | HTML5 demo
OBS | Video + repro clips
YouTube | Educational publish

Not listed as magic: Midjourney, Copilot, Cursor—used sparingly, not load-bearing for this slice.

After shipping — what I would do differently

  1. Film earlier — The video should have been episode 1, not week 3.
  2. Freeze prompts weekly — Fewer “helpful” model upgrades mid-sprint.
  3. Run determinism tests even for small roguelites. Godot breaks determinism in different places than the Construct refresh case studies, but the discipline is the same.
  4. One page prompt registry — Not fifty chat tabs.
  5. Publish the video link in devlog day one — Builds trust while building.

Key takeaways

  • ChatGPT + Claude is a workflow, not a genre—assign draft vs review roles.
  • Educational video shows the real pipeline: YouTube walkthrough.
  • Scope lock beats model choice for first ship.
  • Human still owns integration, playtest, disclosure, and cuts.
  • Four weeks part-time for a vertical slice—AI reduced typing, not decisions.
  • Claude caught edge cases ChatGPT introduced; ChatGPT saved days on boilerplate.
  • No invented metrics—honest limits on what shipped.
  • Pair with GamineAI guides for engine-specific depth after this narrative.
  • 2026 expects AI honesty on store pages and partner packets.
  • Watch, then read, then copy prompts—order matters for beginners.

Licensing, snippets, and “did AI steal this code?”

Neither model is a license lawyer. Our workflow:

  1. Paste small — Ask for minimal diffs, not 400-line dumps you cannot audit.
  2. Search odd strings — If a function name looks too polished, web-search a distinctive comment.
  3. Prefer MIT/CC0 kits for art/audio with paper trail.
  4. Document assistive AI on store pages—see Steam AI disclosure intake.
  5. Do not ship unknown shader packs labeled “royalty-free” without opening the files.

ChatGPT once returned a helper that resembled a popular tutorial repo. Claude flagged naming overlap; we rewrote manually. That hour was cheaper than a takedown thread.

Security — what we did not put in v1

Tempting feature | Why we skipped v1
Player-facing LLM chat | Moderation + latency
Cloud-only saves | Offline fest demos
Arbitrary code exec from prompts | Obvious
Unsigned mod loading | Scope

If you add voice or dialogue AI later, start with hard fallback nets, not “API always up” wishful thinking.

Godot integration notes (working dev detail)

Autoload order mattered. ChatGPT proposed GameState before AudioBus in one draft; scene tree init broke footstep cues. Claude’s review reordered autoload list with justification—small diff, big stability win.

Export variables saved arguments. Every tunable the video tweaks live (dash_cooldown, room_transition_fade) is an @export so designers (future us) do not reopen AI chats for numbers.

Signals over singleton soup. Claude pushed event bus pattern for room_cleared instead of five cross-calls. ChatGPT’s first pass used direct node paths that broke when we duplicated room scenes.

HTML5 export gotchas. itch upload failed once on missing .wasm MIME—unrelated to LLMs but worth hosting smoke tests. The video includes that failure because beginners blame AI when hosting is the culprit.

When ChatGPT and Claude disagreed (case studies)

Case A — Save format JSON vs binary
ChatGPT: human-readable JSON for debug. Claude: signed JSON + version field. Winner: Claude shape, ChatGPT’s debug pretty-print in dev builds only.

Case B — Object pooling for projectiles
ChatGPT: pool immediately. Claude: pool only if profiler shows need. Winner: ship without pool; add if fest build dips.

Case C — Daily challenge meta
ChatGPT: exciting for retention. Claude: scope creep vs scope_v1.md. Winner: cut; ship vertical slice first.

Disagreement is a feature—treat it as a design review, not annoyance.
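
Case A's outcome, signed JSON plus a version field, can be sketched as follows. The article does not say which signing scheme the project used, so HMAC-SHA256 here is an assumption, as are the names sign_save, load_save, SAVE_VERSION, and the placeholder key. A key shipped inside a client build only deters casual tampering; it does not stop a determined player.

```python
import hashlib
import hmac
import json

SAVE_VERSION = 1
SECRET_KEY = b"replace-me"  # hypothetical placeholder, not a real project key


def _signature(payload: str) -> str:
    """HMAC-SHA256 of the payload text, as a hex string."""
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_save(state: dict) -> str:
    """Wrap game state with a version field and a signature over the payload."""
    payload = json.dumps({"version": SAVE_VERSION, "state": state},
                         sort_keys=True, separators=(",", ":"))
    return json.dumps({"payload": payload, "signature": _signature(payload)})


def load_save(text: str) -> dict:
    """Verify the signature and version, then return the stored state."""
    envelope = json.loads(text)
    payload = envelope["payload"]
    if not hmac.compare_digest(_signature(payload), envelope["signature"]):
        raise ValueError("save file signature mismatch")
    data = json.loads(payload)
    if data["version"] != SAVE_VERSION:
        raise ValueError(f"unsupported save version {data['version']}")
    return data["state"]


blob = sign_save({"room_index": 3, "run_id": "demo-run"})
print(load_save(blob))
```

The debug pretty-print that ChatGPT proposed fits on top of this: dev builds can dump the decoded payload with indentation while release builds keep only the signed envelope.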

Filming the educational video — production choices

We recorded in 1080p60 with OBS, separate audio track, and chapter markers matching the day table above. Deliberate choices:

  • Show failed playtests — Trust beats polish.
  • Blur API keys — Even fake keys teach bad habits if visible.
  • On-screen prompt text — Pause-friendly for international viewers.
  • No fake revenue graphs — Aligns with GamineAI editorial rules.
  • Link in description to this article for checklists.

On the first watch of the YouTube session, keep this article open side by side; on the second watch, implement movement only.

Extended beginner FAQ (search phrasing)

How do I start if I never used Godot?
Follow how to create a game with AI no coding for visual tools, or Godot’s official “your first 2D game” docs, then return to dual-LLM drafts.

Do I need paid ChatGPT and Claude?
Free tiers can finish a slice with patience; paid tiers reduce wait time, not responsibility.

Can I use Copilot or Cursor too?
Yes, but cap tools—receipts get messy past three assistants.

What if Claude refuses a prompt?
Reframe as security review or shorten pasted code; refusal often signals scope or policy, not “broken AI.”

Should I stream the build on Twitch?
Optional—same disclosure honesty applies live.

Extended developer FAQ

CI for AI-generated GDScript?
Run gdlint/formatters if you use them; add human playtest gate; do not trust model self-report “all tests pass.”

Branch naming?
ai-draft/chatgpt-movement-v1, review/claude-movement-v1, and human/integrate-movement keep history legible.

Diff size limits for Claude?
Paste ≤300 lines per review with file path context; chain reviews for large files.
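
Chaining reviews for a large file is easier when the chunks are cut mechanically. This is a minimal sketch of the 300-line limit with file path context; review_chunks and the example path scripts/player.gd are hypothetical, and the header format is just one way to carry the path.

```python
def review_chunks(path: str, source: str, max_lines: int = 300) -> list[str]:
    """Split source into chunks of at most max_lines, each with a path header."""
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        end = min(start + max_lines, len(lines))
        header = f"# file: {path} (lines {start + 1}-{end} of {len(lines)})"
        chunks.append("\n".join([header, *lines[start:end]]))
    return chunks


# Stand-in for a long GDScript file; real use would read the file from disk.
fake_script = "\n".join(f"# line {n}" for n in range(1, 651))
for chunk in review_chunks("scripts/player.gd", fake_script):
    print(chunk.splitlines()[0])
```

Cutting at fixed line counts can split a function in half, so adjusting the boundaries by hand before pasting is still worthwhile.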

Compare to Copilot in Unity 2026 tools list?
Copilot is in-editor; ChatGPT/Claude are out-of-editor strategists—complementary, not duplicate.

FAQ

Can I build a whole game this way?
You can build milestones this way. Full games still need leadership, art direction, and QA systems.

Which ChatGPT or Claude version?
Pin the version you start with; note it in BUILD_RECEIPT. Upgrades mid-project cost time.

Is the video required?
No, but beginners should watch once—the article is the reference manual.

Godot only?
Principles transfer to Unity, Unreal, Construct—API prompts must change.

Does this replace courses?
No—see GamineAI courses and guides for structured depth.

Legal risk using AI code?
Use licenses you understand; disclose assistive AI; review for third-party snippet echoes.

Why not Gemini too?
Scope control—three models added thrash. Prompt battle covers tri-model quest design separately.

Conclusion

I built a small game with ChatGPT and Claude, not hype. ChatGPT gave me speed. Claude gave me discipline. Godot gave me a place where the work became real. The educational video gives you the classroom version; this page gives you the lab notes.

If you take one action today: write scope_v1.md, implement movement only, run one Claude review, and watch your first playtest without explaining it away. That is what happened when I stopped treating AI as a slot machine and started treating it as two specialists on a short leash.

Next reads: Your first LLM NPC fallback net, AI tools for Unity 2026, and how to create a video game with ChatGPT.

Video chapter map (align article ↔ film)

Use this map when jumping between the article and the YouTube session:

Topic in article | Watch for in video
Scope lock | Opening constraints slide
ChatGPT design | Whiteboard / doc paste
First movement code | Editor capture segment
Claude review | Split-screen diff moment
Playtest fail | Raw footage clip
Upload / disclosure | Closing checklist

Pausing on the video beats re-reading the whole article when you are implementing the same week.

Prompt registry starter (one file)

Create docs/ai_prompt_registry.md:

version: 0.1.0
drafter: chatgpt-2026-05
reviewer: claude-2026-05
scope: scope_v1.md
human_signoff: pending

Bump semver when system prompts change—partners and future-you will ask.
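
Bumping the registry version by hand is easy to forget, so a tiny helper can do it. The sketch below parses the key: value layout shown above and bumps the version; parse_registry and bump_version are hypothetical names, and which part to bump for which kind of prompt change is left to your own convention.

```python
def parse_registry(text: str) -> dict[str, str]:
    """Parse simple 'key: value' lines into a dictionary."""
    entries = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            entries[key.strip()] = value.strip()
    return entries


def bump_version(version: str, part: str = "minor") -> str:
    """Return the next semantic version for the given part."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")


REGISTRY = """version: 0.1.0
drafter: chatgpt-2026-05
reviewer: claude-2026-05
scope: scope_v1.md
human_signoff: pending
"""

entries = parse_registry(REGISTRY)
entries["version"] = bump_version(entries["version"], "minor")
entries["human_signoff"] = "pending"  # a changed prompt needs a fresh sign-off
print("\n".join(f"{key}: {value}" for key, value in entries.items()))
```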

Final note on the educational format

Publishing the walkthrough on YouTube was not afterthought marketing. It was QA for teaching: if I could not explain a step out loud, the step was not ready to merge. That discipline improved the game more than one extra feature idea from a chat window.

If this article helped, share the video link with someone who keeps asking whether “AI can make a game for them.” The honest answer is still no—but two models plus a human can ship a slice worth playing, and worth learning from.