I Built a Game with ChatGPT and Claude - Here's What Happened (2026)

I did not set out to prove AI replaces game developers. I set out to answer a narrower question that indies keep asking in 2026: If I split the work between ChatGPT and Claude instead of one chat window, do I ship faster—or just fail with better documentation?

This article is the written companion to that experiment. The full educational walkthrough is on YouTube so you can watch decisions in real time, pause on prompts, and copy the workflow into your own engine.

Watch the educational video first

The video shows the same pipeline this article describes: scope lock, ChatGPT drafting, Claude review, engine integration, and playtest gates. Watch it when you want motion and screen context; read this page when you want checklists, failure modes, and links to deeper GamineAI guides.

YouTube link: https://www.youtube.com/watch?v=rB_o4zNo2bg

If the embed does not load on your reader, open the link directly—the walkthrough is the same content in video form.

Why this matters now (May 2026)

Three shifts made a dual-LLM build worth documenting:

Model specialization — ChatGPT families excel at fast iteration and tool-style codegen; Claude families excel at long-context review and careful refactors. Using only one model hides that split.
Store and partner scrutiny — Steam AI disclosure and seven-day disclosure sprints mean “we used AI” must be operational truth, not marketing fluff.
Fallback expectations — If your game has dialogue or live ops text, ElevenLabs + Ollama architecture and LLM fallback nets are now baseline literacy—not stretch goals.

Direct answer: ChatGPT carried first drafts and velocity; Claude carried review, safety, and structure. I still did integration, playtesting, and scope cuts. The game shipped as a small vertical slice, not a AAA title—and that honesty is the point.

What I built (scope lock)

Constraint	Choice
Genre	Top-down micro-roguelite rooms
Engine	Godot 4.x (GDScript)
Session	15–25 minute runs
Art	Mixed AI-assisted + kitbash
Audio	Stock + one generated loop
Multiplayer	None
Narrative	Light barks, no LLM in player-facing chat

Working title internally: Room Ledger—a run-based game where room order is seeded and logged for replay debugging. The name matters less than the scope contract pasted at the top of every AI thread.

If you cannot describe your game in one sentence without commas, cut scope before opening ChatGPT.

The dual-LLM contract (rules I actually followed)

Role	Model	Allowed to do	Not allowed
Drafter	ChatGPT	Design docs, GDScript v1, UI copy v1, task breakdowns	Final merge without review
Reviewer	Claude	Refactor, threat modeling, test plans, prompt registry	Invent new mechanics mid-sprint
Human (me)	—	Engine wiring, playtest, cuts, uploads	Pretend AI playtests

Every merged script passed a Claude review pass with a standard prompt (below). Every new mechanic started as a ChatGPT brief with acceptance criteria.

This is different from the older platformer experiment that rotated more tools (Midjourney, Copilot). Here the story is two LLMs + one engine.

Week 0 — Design without code (ChatGPT-led)

Prompt pattern:

You are a senior game designer. Propose a 15-minute roguelite vertical slice for Godot 4.x. Output: core loop, 8 rooms max, 3 enemy types, 1 boss, fail states, and acceptance tests in plain English. No code.

What worked: Fast loop definition, readable milestone table, sensible difficulty curve on paper.

What failed: The same thread drifted into feature creep (“add daily challenges, add meta progression”) unless I started a new chat with “ignore prior messages except scope doc.”

Artifact saved: docs/scope_v1.md—pasted into Claude and ChatGPT system context for the rest of the project.

Beginner tip: Export ChatGPT’s design reply to Markdown the same day. Chats vanish; files do not.

Week 1 — Code drafts (ChatGPT-led, Claude-gated)

Player controller

ChatGPT produced a workable CharacterBody2D mover in one pass: acceleration, coyote time, dash with cooldown.

Claude review caught:

Missing is_on_floor() guard on dash reset edge case
Hardcoded 300 pixels/sec with no export variables
No input action map references (keyboard vs controller drift)

Time: ~3 hours human integration, ~45 minutes AI combined.

Room spawner + seed

ChatGPT suggested a run_seed and room_index pattern—aligned with industry discourse on RNG ledgers even though this project was Godot, not Construct.

Claude added:

run_id string for playtest forms
Warning comment on where not to call randi() (UI shake killed determinism in an earlier micro-test)

Lesson: Treat AI seed advice like any other sample code—verify with refresh/reload tests.
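
The seed discipline above can be sketched outside the engine. This is a minimal Python illustration, not the project's GDScript; build_room_order, make_ledger, and ui_shake_offset are hypothetical names. It assumes gameplay randomness draws from its own seeded generator while cosmetic effects such as UI shake use a separate one, so cosmetic calls cannot shift the room order.

```python
import random

def build_room_order(run_seed: int, room_count: int = 8) -> list[int]:
    """Return a shuffled list of room ids that depends only on run_seed."""
    gameplay_rng = random.Random(run_seed)  # dedicated generator for gameplay
    rooms = list(range(room_count))
    gameplay_rng.shuffle(rooms)
    return rooms

def make_ledger(run_id: str, run_seed: int, room_count: int = 8) -> list[dict]:
    """Log one entry per room so a playtest report can be replayed."""
    order = build_room_order(run_seed, room_count)
    return [
        {"run_id": run_id, "run_seed": run_seed, "room_index": i, "room_id": room}
        for i, room in enumerate(order)
    ]

# Cosmetic effects get their own generator (assumption: shake is purely visual).
# Drawing from it never touches the gameplay generator above.
cosmetic_rng = random.Random()

def ui_shake_offset() -> float:
    """Return a small random offset for a screen-shake effect."""
    return cosmetic_rng.uniform(-2.0, 2.0)
```

A reload test then reduces to building the order twice with the same run_seed, calling the cosmetic function in between, and comparing the results. In Godot, the analogous tool would be a RandomNumberGenerator instance with a fixed seed kept separate from any UI randomness.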

UI overlay

ChatGPT drafted pause menu copy and layout tree. Claude shortened strings for 1280×800 and flagged unverifiable store claims in placeholder text (“infinite runs!”). Small catch, large refund-risk if shipped.
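
The “infinite runs!” catch can be partly automated before a human reads the strings. The sketch below is a hedged Python example; the phrase list and the find_store_claims name are assumptions for illustration, not part of the project. It only flags candidates for a human to judge.

```python
import re

# Hypothetical phrase list; extend it with claims your store page cannot back up.
UNVERIFIABLE_CLAIMS = [r"\binfinite\b", r"\bunlimited\b", r"\bendless\b"]

def find_store_claims(ui_strings: dict[str, str]) -> list[tuple[str, str]]:
    """Return (string_key, matched_phrase) pairs for strings that need a human look."""
    hits = []
    for key, text in ui_strings.items():
        for pattern in UNVERIFIABLE_CLAIMS:
            match = re.search(pattern, text, flags=re.IGNORECASE)
            if match:
                hits.append((key, match.group(0)))
    return hits
```

Feeding it a dictionary of UI string keys and values returns the entries worth a second look; an empty list does not prove the copy is safe, it only means none of the listed phrases appeared.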

Day-by-day build diary (what the camera caught)

The educational video follows this rough calendar. Numbers are build days spread across roughly four part-time weeks, not consecutive eight-hour workdays; this was nights-and-weekends indie time.

Day	Focus	ChatGPT	Claude	Human outcome
1	Scope + loop	Design doc v1	Trim creep	scope_v1.md frozen
2	Movement	Controller draft	P0 dash bug	Playable box in room
3	Combat stub	Hitbox template	Signal naming	One enemy dies
4	Room flow	Spawner sketch	Seed warnings	Room 1→2 works
5	UI pause	Menu tree	Copy length	Pause without crash
6	Audio pass	SFX list only	—	Placeholder blips
7	Playtest	—	Triage list	12 bugs filed
8	Fix P0	Patch suggestions	Review diffs	Dash build-only bug gone
9	Polish	Bark lines	Tone pass	Strings wired
10	Build export	Checklist	Disclosure draft	itch HTML5 zip
11	Film edit	—	—	YouTube rough cut
12	Publish	—	—	Article + video live

Why publish the diary? Beginners underestimate integration days (8–10). The table sets honest expectations: AI compresses typing, not integration.

BUILD_RECEIPT excerpt (developer evidence)

We keep receipts lightweight—no fantasy metrics. A redacted slice from the project folder:

{
  "project": "room-ledger-vertical-slice",
  "engine": "godot-4.5",
  "drafter_model": "chatgpt-2026-05-pinned",
  "reviewer_model": "claude-2026-05-pinned",
  "human_integration_hours_estimate": 28,
  "ai_assist": {
    "gdscript_drafts": true,
    "store_copy_drafts": true,
    "runtime_llm_gameplay": false
  },
  "playtest_gate": {
    "checklist_version": "v3",
    "pass": true,
    "notes": "dash build-only repro fixed day 8"
  },
  "disclosure": {
    "ai_assisted_code": true,
    "human_review": true,
    "live_generative_gameplay": false
  },
  "educational_video": "https://www.youtube.com/watch?v=rB_o4zNo2bg"
}

If you adopt one habit from this article, adopt receipts—they survive chat deletion and partner questions. See BUILD_RECEIPT beginner pipeline for a fuller template.
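
A receipt is more useful when something checks it. The following Python sketch reads a receipt shaped like the excerpt above and reports contradictions between what the build does and what the disclosure says. The specific rules are my assumptions about what “consistent” should mean, and check_receipt is a hypothetical helper, not an existing tool.

```python
import json

def check_receipt(receipt: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    ai = receipt.get("ai_assist", {})
    disclosure = receipt.get("disclosure", {})
    gate = receipt.get("playtest_gate", {})

    # Runtime LLM use and the disclosed claim must agree.
    if ai.get("runtime_llm_gameplay") != disclosure.get("live_generative_gameplay"):
        problems.append("runtime_llm_gameplay and live_generative_gameplay disagree")
    # AI-drafted code must be disclosed as such.
    if ai.get("gdscript_drafts") and not disclosure.get("ai_assisted_code"):
        problems.append("gdscript_drafts is true but ai_assisted_code is not disclosed")
    # The playtest gate must have passed before an upload.
    if not gate.get("pass"):
        problems.append("playtest gate has not passed")
    # Pinned versions are the whole point of the receipt.
    for key in ("drafter_model", "reviewer_model", "engine"):
        if not receipt.get(key):
            problems.append(f"missing {key}")
    return problems

def check_receipt_text(text: str) -> list[str]:
    """Convenience wrapper for a receipt stored as JSON text."""
    return check_receipt(json.loads(text))
```

Running it before each upload turns “store disclosure accurate” from a memory exercise into a repeatable check.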

Proof table — claims vs evidence

Claim	Evidence type	Result
Dual-LLM faster than solo typing	Time logs on movement task	Yes for boilerplate
Dual-LLM faster than solo calendar	12-day diary	No for decisions
Claude catches Godot API drift	Review diff count	9 P0/P1 in week 1
ChatGPT best for creative lists	Bark v1 throughput	High
Playtest without human	—	Failed (always)
Video matches article	Chapter map below	Aligned
Store disclosure accurate	Binary feature audit	Matched

Proof tables are not SEO decoration—they are how you stop lying to yourself when a model sounds confident.

Week 2 — Content and polish (split workflow)

Asset	ChatGPT	Claude	Human
Enemy barks	v1 lines	tone + length cap	edited for lore
Room names	list	banned duplicate roots	picked final 8
Tutorial strings	step list	clarity pass	wired in engine
Bug triage	repro guesses	root-cause ranking	fixed in Godot

Pro tip from the video: Record Claude reviews as screen captures for your publisher folder. They show a human in the loop without selling the fantasy of a “fully autonomous AI game.”

What surprised me (good)

Acceptance tests in design prompts — ChatGPT outputs became playtest checklists with light edits.
Claude refactors — Extracted a state machine from spaghetti if-chains without changing behavior, which is rare and valuable.
Cross-model disagreement — When models disagreed on architecture, the bug was real 70% of the time.
Faster boring code — Save/load boilerplate, options menu scaffolding, signal wiring templates.
Educational clarity — Filming the YouTube walkthrough forced me to explain prompts out loud—which improved the actual prompts.

What disappointed me (honest)

“Just fix it” loops — ChatGPT sometimes patched symptoms; Claude sometimes over-refactored. Human judgment still arbitrates.
No engine version awareness — Both models occasionally hallucinated Godot 3 APIs until I pasted the exact version string every session.
Art pipeline untouched by words — Visuals still needed human taste and manual cleanup; see AI asset generation guide for limits.
Playtesting — Zero substitutes for human hands on controller.
Schedule illusion — AI shortened typing time, not decision time. Total calendar time was ~4 weeks part-time, not a weekend.

Side-by-side — same task, two models

Task: Design a GameState autoload for room transitions.

Output quality	ChatGPT	Claude
Speed	Faster first draft	Slower first draft
Godot 4.x accuracy	Good with version pinned	Better signal naming
Over-engineering	Sometimes	More conservative
Comments/docs	Sparse	Thorough
Security mindset	Weak	Stronger on save tampering

Workflow I kept: ChatGPT generate → human integrate → Claude review → human playtest → merge.

See also the prompt battle for quests: same “split models by strength” philosophy, different task.

Copy-paste prompts that survived the project

ChatGPT — feature brief

Context: Godot 4.5 GDScript roguelite vertical slice (see scope_v1.md).
Task: Implement [FEATURE] with acceptance tests.
Constraints: No new autoloads without listing them; export vars for tuning; no gameplay RNG calls (randi(), randf()) in UI nodes.
Output: GDScript + short integration steps + test checklist.

Claude — review pass

You are a lead gameplay programmer reviewing GDScript for shipping.
Input: [PASTE CODE]
Check: Godot 4.5 API correctness, edge cases, save tampering, performance on low-end PC, signal leaks.
Output: numbered issues P0/P1/P2; propose minimal diff; do not rewrite entire file unless P0 count > 3.

Human — merge gate

[ ] Playtest checklist run
[ ] run_id visible on pause overlay
[ ] No new store claims in UI strings
[ ] BUILD_RECEIPT or upload note updated
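
The merge gate is simple enough to enforce with a script. This Python sketch assumes the checklist lives in a text file using the same “[ ]” and “[x]” markers shown above; unchecked_items and merge_allowed are hypothetical helper names.

```python
import re

# Matches lines such as "[ ] Playtest checklist run" or "[x] run_id visible".
CHECKBOX = re.compile(r"^\[(?P<mark>[ xX])\]\s+(?P<label>.+)$")

def unchecked_items(checklist_text: str) -> list[str]:
    """Return labels of checklist lines still marked '[ ]'."""
    pending = []
    for line in checklist_text.splitlines():
        match = CHECKBOX.match(line.strip())
        if match and match.group("mark") == " ":
            pending.append(match.group("label"))
    return pending

def merge_allowed(checklist_text: str) -> bool:
    """Allow a merge only when at least one item exists and none are pending."""
    has_items = any(CHECKBOX.match(line.strip()) for line in checklist_text.splitlines())
    return has_items and not unchecked_items(checklist_text)
```

An empty checklist counts as a failed gate on purpose: a missing file should block a merge rather than wave it through.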

Beginner path — copy this if you start tomorrow

Watch the educational video once end-to-end.
Write scope_v1.md in ChatGPT with the design prompt above.
Pick one engine you already tolerate.
Implement movement only before combat.
Add Claude review before any public itch upload.
Read how to create a game with AI (no coding) if you want visual-tool routes instead of Godot.

Time budget: 10–15 hours for first playable room if scope stays tiny.

Developer path — evidence and ops

Prompt registry — semver your system prompts per live-ops sprint.
Receipt culture — BUILD_RECEIPT notes which model drafted which subsystem.
AI disclosure — Store page matches reality (assistive codegen + human integration).
Voice/dialogue — If you add NPC voice later, use fallback architecture, not raw API hope.
Do not ship player-facing LLM chat in v1—moderation and latency risks dominate.

Metrics I will share (and what I will not)

Share freely	Will not invent
Calendar weeks (~4 part-time)	Download counts
Lines human-edited vs AI-origin	Revenue
Bug counts by category	“Saved X hours” %
Playtest pass/fail on checklist	Wishlist conversion

The video includes honest screen recordings of failures—including a broken dash that only reproduced on build, not in editor. That clip alone was worth filming.

Relationship to other GamineAI AI build guides

Resource	How it differs
ChatGPT step-by-step beginner guide	Single-model tutorial
Claude beginner guide	Single-model tutorial
I let AI build a platformer	Older multi-tool experiment
This post + video	Dual-LLM 2026 narrative with educational film

Common mistakes when pairing ChatGPT and Claude

Same mega-thread for both — Context pollution; start fresh threads per phase.
Skipping human playtest — Models optimize for plausible code, not fun.
Letting AI pick scope — You get an RPG in week one on paper and nothing shippable in engine.
No version pin — “Godot script” is not specific enough.
Hiding AI use — Disclosure rules caught up; be accurate.
Trusting generated shaders blindly — Always profile on target GPU.
Mixing three more tools mid-project — Tool thrash kills receipts.

If you only have one model

Only ChatGPT	Only Claude
Add strict self-review checklist	Add faster draft passes with smaller prompts
Use external linter heavily	Pair with engine debugger time
Film your own review session	Ask a peer for playtest

Dual-LLM is optional efficiency, not morality.

Steam / itch upload notes (what I actually disclosed)

Store copy listed:

AI-assisted code and text drafts with human review
No real-time generative AI gameplay in the demo build
Human integration, tuning, and QA

That matched the binary. If you add live LLM features later, update disclosure before the feature ships—not after a forum thread.

Tools stack (final)

Tool	Role
ChatGPT	Drafts
Claude	Review
Godot 4.5	Engine
Git	History
itch.io	HTML5 demo
OBS	Video + repro clips
YouTube	Educational publish

Not listed as magic: Midjourney, Copilot, Cursor—used sparingly, not load-bearing for this slice.

After shipping — what I would do differently

Film earlier — The video should have been episode 1, not week 3.
Freeze prompts weekly — Fewer “helpful” model upgrades mid-sprint.
Run determinism tests — Even small roguelites need them; Godot has different surfaces than the Construct refresh case studies, but the discipline rhymes.
One page prompt registry — Not fifty chat tabs.
Publish the video link in devlog day one — Builds trust while building.

Key takeaways

ChatGPT + Claude is a workflow, not a genre—assign draft vs review roles.
Educational video shows the real pipeline: YouTube walkthrough.
Scope lock beats model choice for first ship.
Human still owns integration, playtest, disclosure, and cuts.
Four weeks part-time for a vertical slice—AI reduced typing, not decisions.
Claude caught edge cases ChatGPT introduced; ChatGPT saved days on boilerplate.
No invented metrics—honest limits on what shipped.
Pair with GamineAI guides for engine-specific depth after this narrative.
2026 expects AI honesty on store pages and partner packets.
Watch, then read, then copy prompts—order matters for beginners.

Licensing, snippets, and “did AI steal this code?”

Neither model is a license lawyer. Our workflow:

Paste small — Ask for minimal diffs, not 400-line dumps you cannot audit.
Search odd strings — If a function name looks too polished, web-search a distinctive comment.
Prefer MIT/CC0 kits for art/audio with paper trail.
Document assistive AI on store pages—see Steam AI disclosure intake.
Do not ship unknown shader packs labeled “royalty-free” without opening the files.

ChatGPT once returned a helper that resembled a popular tutorial repo. Claude flagged naming overlap; we rewrote manually. That hour was cheaper than a takedown thread.

Security — what we did not put in v1

Tempting feature	Why we skipped v1
Player-facing LLM chat	Moderation + latency
Cloud-only saves	Offline fest demos
Arbitrary code exec from prompts	Obvious
Unsigned mod loading	Scope

If you add voice or dialogue AI later, start with hard fallback nets, not “API always up” wishful thinking.

Godot integration notes (working dev detail)

Autoload order mattered. In one draft ChatGPT proposed GameState before AudioBus, and scene tree init broke footstep cues. Claude’s review reordered the autoload list with justification: small diff, big stability win.

Export variables saved arguments. Every tunable the video tweaks live (dash_cooldown, room_transition_fade) is an @export so designers (future us) do not reopen AI chats for numbers.

Signals over singleton soup. Claude pushed an event bus pattern for room_cleared instead of five cross-calls. ChatGPT’s first pass used direct node paths that broke when we duplicated room scenes.

HTML5 export gotchas. The itch upload failed once on a missing .wasm MIME type. That was unrelated to LLMs, but it is a good reason to run hosting smoke tests. The video includes that failure because beginners blame AI when hosting is the culprit.

When ChatGPT and Claude disagreed (case studies)

Case A — Save format: plain JSON vs signed JSON
ChatGPT: human-readable JSON for debug. Claude: signed JSON + version field. Winner: Claude shape, ChatGPT’s debug pretty-print in dev builds only.
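
To make the “signed JSON + version field” shape concrete, here is a minimal Python sketch using an HMAC over the serialized payload. It is an illustration under assumptions, not the project's save code: SECRET_KEY, SAVE_VERSION, dump_save, and load_save are hypothetical names, and a key shipped inside a game build can be extracted, so this deters casual edits rather than determined tampering.

```python
import hashlib
import hmac
import json

SAVE_VERSION = 1
# Hypothetical key. Any key shipped inside a client build can be extracted,
# so this deters casual edits rather than determined attackers.
SECRET_KEY = b"replace-me"

def _sign(payload: bytes) -> str:
    """Return a hex HMAC-SHA256 signature for the payload bytes."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def dump_save(state: dict) -> str:
    """Serialize state with a version field and wrap it with a signature."""
    body = {"version": SAVE_VERSION, "state": state}
    # sort_keys and fixed separators keep the signed bytes stable across runs.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return json.dumps({"payload": payload, "signature": _sign(payload.encode("utf-8"))})

def load_save(text: str) -> dict:
    """Verify the signature and version, then return the saved state."""
    wrapper = json.loads(text)
    payload = wrapper["payload"]
    if not hmac.compare_digest(wrapper["signature"], _sign(payload.encode("utf-8"))):
        raise ValueError("save file signature mismatch")
    body = json.loads(payload)
    if body["version"] != SAVE_VERSION:
        raise ValueError(f"unsupported save version {body['version']}")
    return body["state"]
```

The payload stays human-readable JSON, which keeps the debugging benefit ChatGPT argued for, while the signature and version field cover the cases Claude raised.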

Case B — Object pooling for projectiles
ChatGPT: pool immediately. Claude: pool only if profiler shows need. Winner: ship without pool; add if fest build dips.

Case C — Daily challenge meta
ChatGPT: exciting for retention. Claude: scope creep vs scope_v1.md. Winner: cut; ship vertical slice first.

Disagreement is a feature—treat it as a design review, not annoyance.

Filming the educational video — production choices

We recorded in 1080p60 with OBS, separate audio track, and chapter markers matching the day table above. Deliberate choices:

Show failed playtests — Trust beats polish.
Blur API keys — Even fake keys teach bad habits if visible.
On-screen prompt text — Pause-friendly for international viewers.
No fake revenue graphs — Aligns with GamineAI editorial rules.
Link in description to this article for checklists.

On the first watch, keep this article open side by side with the YouTube session; on the second watch, implement movement only.

Extended beginner FAQ (search phrasing)

How do I start if I never used Godot?
Follow how to create a game with AI no coding for visual tools, or Godot’s official “your first 2D game” docs, then return to dual-LLM drafts.

Do I need paid ChatGPT and Claude?
Free tiers can finish a slice with patience; paid tiers reduce wait time, not responsibility.

Can I use Copilot or Cursor too?
Yes, but cap tools—receipts get messy past three assistants.

What if Claude refuses a prompt?
Reframe as security review or shorten pasted code; refusal often signals scope or policy, not “broken AI.”

Should I stream the build on Twitch?
Optional—same disclosure honesty applies live.

Extended developer FAQ

CI for AI-generated GDScript?
Run gdlint/formatters if you use them; add human playtest gate; do not trust model self-report “all tests pass.”

Branch naming?
ai-draft/chatgpt-movement-v1 → review/claude-movement-v1 → human/integrate-movement keeps history legible.
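
If you want the naming scheme enforced rather than remembered, a small parser is enough. The pattern below is an assumption modeled on the three example names above; parse_branch is a hypothetical helper you could call from a pre-push hook.

```python
import re
from typing import Optional

# Pattern is an assumption modeled on the three example branch names.
BRANCH_PATTERN = re.compile(
    r"^(?P<prefix>ai-draft/chatgpt|review/claude|human/integrate)"
    r"-(?P<feature>[a-z0-9-]+?)(-v(?P<version>\d+))?$"
)

STAGES = {
    "ai-draft/chatgpt": "draft",
    "review/claude": "review",
    "human/integrate": "integrate",
}

def parse_branch(name: str) -> Optional[dict]:
    """Return stage, feature, and version for a valid branch name, else None."""
    match = BRANCH_PATTERN.match(name)
    if match is None:
        return None
    version = match.group("version")
    return {
        "stage": STAGES[match.group("prefix")],
        "feature": match.group("feature"),
        "version": int(version) if version else None,
    }
```

A name that returns None does not follow the scheme, which is a useful signal that a draft skipped the review stage.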

Diff size limits for Claude?
Paste ≤300 lines per review with file path context; chain reviews for large files.
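
Chaining reviews is easier when the chunks are cut mechanically. This Python sketch splits a file into pieces of at most 300 lines and prefixes each with the file path and line range; chunk_for_review and the header format are assumptions for illustration.

```python
def chunk_for_review(path: str, source: str, max_lines: int = 300) -> list[str]:
    """Split source into review-sized chunks, each prefixed with path and line range."""
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        # The header gives the reviewer file context without pasting the whole file.
        header = f"# file: {path} (lines {start + 1}-{start + len(block)} of {len(lines)})"
        chunks.append("\n".join([header, *block]))
    return chunks
```

Cutting on a fixed line count can split a function in half, so treat the boundaries as a starting point and nudge them to the nearest function edge by hand when it matters.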

How does this compare to Copilot in the Unity 2026 tools list?
Copilot is in-editor; ChatGPT/Claude are out-of-editor strategists—complementary, not duplicate.

FAQ

Can I build a whole game this way?
You can build milestones this way. Full games still need leadership, art direction, and QA systems.

Which ChatGPT or Claude version?
Pin the version you start with; note it in BUILD_RECEIPT. Upgrades mid-project cost time.

Is the video required?
No, but beginners should watch once—the article is the reference manual.

Godot only?
Principles transfer to Unity, Unreal, Construct—API prompts must change.

Does this replace courses?
No—see GamineAI courses and guides for structured depth.

Legal risk using AI code?
Use licenses you understand; disclose assistive AI; review for third-party snippet echoes.

Why not Gemini too?
Scope control—three models added thrash. Prompt battle covers tri-model quest design separately.

Conclusion

I built a small game with ChatGPT and Claude, not hype. ChatGPT gave me speed. Claude gave me discipline. Godot gave me a place where the work became real. The educational video gives you the classroom version; this page gives you the lab notes.

If you take one action today: write scope_v1.md, implement movement only, run one Claude review, and watch your first playtest without explaining it away. That is what happened when I stopped treating AI as a slot machine and started treating it as two specialists on a short leash.

Next reads: Your first LLM NPC fallback net, AI tools for Unity 2026, and how to create a video game with ChatGPT.

Video chapter map (align article ↔ film)

Use this map when jumping between the article and the YouTube session:

Topic in article	Watch for in video
Scope lock	Opening constraints slide
ChatGPT design	Whiteboard / doc paste
First movement code	Editor capture segment
Claude review	Split-screen diff moment
Playtest fail	Raw footage clip
Upload / disclosure	Closing checklist

Pausing on the video beats re-reading the whole article when you are implementing the same week.

Prompt registry starter (one file)

Create docs/ai_prompt_registry.md:

version: 0.1.0
drafter: chatgpt-2026-05
reviewer: claude-2026-05
scope: scope_v1.md
human_signoff: pending

Bump semver when system prompts change—partners and future-you will ask.
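
Bumping by hand is easy to forget, so here is a hedged Python sketch that parses the key: value layout shown above and bumps the version field. The function names are hypothetical, and resetting human_signoff to pending on every bump is my assumption about how sign-off should work.

```python
def parse_registry(text: str) -> dict[str, str]:
    """Parse simple 'key: value' lines into an ordered dictionary."""
    entries = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            entries[key.strip()] = value.strip()
    return entries

def bump_version(version: str, part: str = "minor") -> str:
    """Bump a MAJOR.MINOR.PATCH string, resetting the lower parts to zero."""
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")

def bump_registry(text: str, part: str = "minor") -> str:
    """Return registry text with a bumped version and sign-off reset."""
    entries = parse_registry(text)
    entries["version"] = bump_version(entries["version"], part)
    entries["human_signoff"] = "pending"  # a changed prompt needs a fresh sign-off
    return "\n".join(f"{key}: {value}" for key, value in entries.items())
```

With the starter file above, a minor bump moves the version from 0.1.0 to 0.2.0 and leaves the other fields untouched.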

Final note on the educational format

Publishing the walkthrough on YouTube was not afterthought marketing. It was QA for teaching: if I could not explain a step out loud, the step was not ready to merge. That discipline improved the game more than one extra feature idea from a chat window.

If this article helped, share the video link with someone who keeps asking whether “AI can make a game for them.” The honest answer is still no—but two models plus a human can ship a slice worth playing, and worth learning from.