AI Integration Problems

Unity ML-Agents Training Hangs or Crashes - Environment Fix

Fix Unity ML-Agents training sessions that freeze, never start, or crash the editor by checking environment setup, Academy configuration, time scale, and Python-side issues.

Problem - ML-Agents Training Freezes, Never Starts, or Crashes

You hit Play to start training with Unity ML-Agents (for example, using mlagents-learn), but:

  • The training window shows no episodes or rewards.
  • The Unity editor freezes or becomes unresponsive after a few seconds.
  • The Python trainer prints “Environment took too long to respond” or similar timeouts.
  • The game runs in the editor, but agents never move or send observations.

This usually means your environment is not talking correctly to the trainer, or the simulation is running in a way ML-Agents does not expect.


Root Causes (Why This Happens)

Typical reasons ML-Agents training hangs or crashes include:

  • The Behavior Parameters and trainer config (YAML) do not match.
  • The Agent objects never call EndEpisode or request decisions.
  • The Academy is misconfigured (for example, time scale or max steps).
  • The Python package and Unity package versions are out of sync.
  • There is a script exception in CollectObservations, OnActionReceived, or Heuristic that silently breaks the agent.
  • The game is doing heavy work every frame (for example, logging or allocations) so training looks frozen.

The fix is to check the pipeline from Python → environment → agent and back, step by step.


Step 1 – Confirm Your Versions Match

First, make sure Unity and Python ML-Agents versions are compatible.

  • In Unity:
    • Open Package Manager and check the ML-Agents and ML-Agents Extensions versions.
  • In a terminal:
pip show mlagents
pip show mlagents-envs

Check the official ML-Agents release table for supported version pairs. The Unity package and the Python package use different version numbers, so match them by release tag (for example, Release 21 pairs com.unity.ml-agents 3.0.0 with Python mlagents 1.0.0) rather than by number. If your versions come from different releases, upgrade or downgrade so they line up.

If you recently upgraded Unity or the ML-Agents package, re-run:

pip install --upgrade mlagents mlagents-envs

Then restart both the Unity editor and your training terminal.


Step 2 – Verify Behavior Name and Trainer Config Match

ML-Agents uses a behavior name to connect Unity agents to the trainer configuration.

  1. Select your Agent in the Unity Hierarchy.
  2. In the Behavior Parameters component, check:
    • Behavior Name (for example, WalkerBrain).
  3. Open your YAML config file used with mlagents-learn.
  4. Confirm you have a section with the same behavior name:
behaviors:
  WalkerBrain:
    trainer_type: ppo
    max_steps: 500000
    # ...

If the names differ, the trainer will not send actions to your agents, and training will appear idle or stuck.
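
To catch mismatches at runtime, you can log the behavior name from the agent itself and compare it against the key under behaviors: in your YAML. A minimal sketch (the script name is illustrative; BehaviorParameters lives in the Unity.MLAgents.Policies namespace):

using Unity.MLAgents.Policies;
using UnityEngine;

// Attach next to your Agent to print the behavior name on startup.
public class BehaviorNameCheck : MonoBehaviour
{
    void Start()
    {
        var bp = GetComponent<BehaviorParameters>();
        if (bp != null)
        {
            Debug.Log($"Behavior name: {bp.BehaviorName}");
        }
    }
}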


Step 3 – Check That Episodes Actually Start and End

If episodes never start or never end, training metrics will be flat.

In your Agent script:

  • Make sure OnEpisodeBegin is called to reset the environment.
  • Call AddReward and EndEpisode when an episode should finish.
  • Avoid infinite loops or waits inside these methods.

Quick sanity check:

  1. Add simple logging to your Agent subclass:
using Unity.MLAgents.Actuators; // at the top of the file, for ActionBuffers
using UnityEngine;              // for Debug.Log

public override void OnEpisodeBegin()
{
    Debug.Log("Episode begin");
    // reset environment...
}

public override void OnActionReceived(ActionBuffers actions)
{
    Debug.Log("Action received");
    // apply actions...
}
  2. Enter Play Mode without running mlagents-learn.
  3. Confirm that these logs appear in the Console when the agent runs.

If they do not:

  • The Agent component might be disabled.
  • The GameObject might be inactive or missing from the loaded scene.
  • There may be script compilation errors preventing execution.

Fix these basic lifecycle issues before trying to train.


Step 4 – Look for Silent Exceptions in Agent Methods

Exceptions in CollectObservations, OnActionReceived, or Heuristic can break the agent’s logic without an obvious crash.

While in Play Mode:

  • Open Console and enable Error and Exception logs.
  • Watch for repeated errors such as:
    • IndexOutOfRangeException in CollectObservations.
    • NullReferenceException when accessing transforms or components.

Common pitfalls:

  • Writing more observations than the Space Size declared in Behavior Parameters.
  • Using GetComponent every frame instead of caching references.
  • Assuming objects exist when they may be null on reset.

Fix these exceptions and verify that the console stays clean before training again.
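
As an illustration of the caching and null-check points above, here is a hedged sketch of a CollectObservations that avoids the common pitfalls (the class name, target field, and observation layout are placeholders for your own setup):

using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class SafeObservationAgent : Agent
{
    public Transform target;   // assigned in the Inspector
    Rigidbody cachedBody;      // cached once, not fetched every frame

    public override void Initialize()
    {
        cachedBody = GetComponent<Rigidbody>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // 3 + 3 = 6 floats: Space Size in Behavior Parameters must be 6.
        sensor.AddObservation(cachedBody.velocity);

        // Guard against the target being destroyed or missing on reset.
        sensor.AddObservation(target != null ? target.position : Vector3.zero);
    }
}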


Step 5 – Adjust Time Scale and Decision Frequency

Excessive time scale or very frequent decisions can overwhelm your environment.

On the trainer side (or via a custom engine-configuration script):

  • Keep the time scale at a reasonable value (for example, 10–20 for debugging); recent versions control this with the --time-scale flag of mlagents-learn rather than an Academy inspector setting.
  • Disable the Academy's automatic stepping (Academy.Instance.AutomaticSteppingEnabled) only if you know what you are doing.

In your Agent’s Behavior Parameters:

  • Check Decision Period:
    • 1 means a decision every frame (can be heavy).
    • For many environments, 5–10 frames per decision is enough.

If training hangs only when time scale is high:

  • Start with time scale 1–5 and scale up gradually.
  • Profile CPU usage and garbage allocations to see if your logic is too heavy.
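
The Decision Period above is normally set on the DecisionRequester component next to your Agent. As a sketch, the same settings can also be applied from code (the script name and values are examples, and it assumes the GameObject already has an Agent component):

using Unity.MLAgents;
using UnityEngine;

// Attach alongside your Agent, or simply add DecisionRequester in the Inspector.
public class DecisionSetup : MonoBehaviour
{
    void Awake()
    {
        var requester = gameObject.AddComponent<DecisionRequester>();
        requester.DecisionPeriod = 5;                 // one decision every 5 frames
        requester.TakeActionsBetweenDecisions = true; // repeat the last action in between
    }
}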

Step 6 – Confirm Python Can Connect to the Environment

If the Python trainer cannot reach Unity properly, you will see timeouts like:

  • Environment took too long to respond
  • Could not connect to Unity environment

Checklist:

  • Close all old Unity editor instances that might still be running an environment.
  • Use the recommended command. For a built executable, point --env at the build:
mlagents-learn config.yaml --run-id=my_run --env=path/to/YourBuild

or, if running from the editor, omit --env entirely:

mlagents-learn config.yaml --run-id=my_run

  • Start mlagents-learn first.
  • Then press Play in the Unity editor once.

Avoid:

  • Pressing Play multiple times with the trainer already running.
  • Running two different training sessions pointing at the same environment.

If you are using a built executable as the environment, ensure the path and platform are correct.


Step 7 – Test with a Minimal Example Scene

If your main project is complex, isolate ML-Agents in a tiny test scene:

  • One agent.
  • One simple reward condition (for example, move toward a target).
  • Short episodes (end after reaching the target or a small step count).
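
A minimal agent for such a test scene might look like the following sketch (the class name, speed, and distance threshold are placeholders; it assumes a flat plane with a target Transform assigned in the Inspector):

using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class ReachTargetAgent : Agent
{
    public Transform target;
    public float speed = 2f;

    public override void OnEpisodeBegin()
    {
        // Reset positions so every episode starts from a known state.
        transform.localPosition = Vector3.zero;
        target.localPosition = new Vector3(Random.Range(-4f, 4f), 0f, Random.Range(-4f, 4f));
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition); // 3 floats
        sensor.AddObservation(target.localPosition);    // 3 floats
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        var move = new Vector3(actions.ContinuousActions[0], 0f, actions.ContinuousActions[1]);
        transform.localPosition += move * speed * Time.deltaTime;

        // Small time penalty, larger reward on reaching the target.
        AddReward(-0.001f);
        if (Vector3.Distance(transform.localPosition, target.localPosition) < 0.5f)
        {
            AddReward(1f);
            EndEpisode();
        }
    }
}

In Behavior Parameters, this sketch would need a vector observation Space Size of 6 and 2 continuous actions, plus a Max Step value on the Agent (for example, 500) so episodes also end on a step limit.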

Use this scene to confirm that:

  • Training starts.
  • Rewards and episode counts increase.
  • The editor does not hang or crash.

If the minimal scene works but your full game does not, the issue is likely:

  • Heavy per-frame logic.
  • Complex physics or VFX.
  • Many agents or large observations/actions.

Scale up carefully from the minimal environment, checking where things break.


Verification – How to Know It’s Fixed

You should see:

  • The Python trainer printing episode counts, rewards, and loss values.
  • The Unity editor staying responsive during training (even with a higher time scale).
  • Agents moving and changing behavior over time instead of staying idle.

If metrics are updating and the editor is stable across multiple episodes, your environment is now correctly wired.


Alternative Fixes and Edge Cases

  • Headless builds on CI

    • Make sure you use the correct command-line arguments for headless environments.
    • Check that required assets and scenes are included in the build.
  • Running multiple environments in parallel

    • Confirm each environment uses a unique port if running multiple instances.
    • Verify your machine has enough CPU/RAM for parallel training.
  • Platform-specific issues (macOS, Linux, cloud)

    • Check OS-specific permissions and execution flags.
    • Ensure Python and Unity builds use compatible architectures (ARM vs x86).

Prevention Tips

  • Keep ML-Agents and Unity packages up to date, but avoid upgrading mid-training without testing.
  • Document your behavior names, YAML configs, and environment reset logic.
  • Add simple health checks:
    • Log when episodes begin/end.
    • Log cumulative reward at the end of each episode.
  • Use a minimal test scene for new projects or after major engine upgrades.
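
The health checks above can be implemented with a few lines in your Agent subclass. A sketch (FinishEpisode is a hypothetical helper you would call instead of EndEpisode directly; CompletedEpisodes and GetCumulativeReward are built-in Agent members):

using Unity.MLAgents;
using UnityEngine;

public class MonitoredAgent : Agent
{
    public override void OnEpisodeBegin()
    {
        Debug.Log($"Episode {CompletedEpisodes} begin");
    }

    // Hypothetical helper: call this instead of EndEpisode directly,
    // so the cumulative reward is logged before it is reset.
    protected void FinishEpisode()
    {
        Debug.Log($"Episode ended with cumulative reward {GetCumulativeReward()}");
        EndEpisode();
    }
}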

Related Help Articles

  • Unity AI Toolkit Scripts Not Found After Import - Package Installation Fix
  • Unity Particle System Not Rendering - Visual Effects Troubleshooting
  • OpenAI GPT-4 Turbo NPC Dialogue Returns 429 Errors in Unity - How to Fix
  • Unity Build Fails with Error CS0246 Type or Namespace Not Found - Assembly Fix

Bookmark this article so you can quickly revisit these checks the next time ML-Agents training decides to hang or crash in the middle of a sprint.