Behavior Cloning

Behavior cloning is the most direct form of imitation learning. You collect a dataset of demonstrations - pairs of (what the agent observed, what action the demonstrator took) - and train a model with ordinary supervised learning to predict the action from the observation. There is no reward, no exploration, and no simulation of consequences; the policy simply learns to copy the expert. Google’s RT-1, for instance, is fundamentally a behavior-cloning policy trained on 130,000 teleoperated robot episodes.
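
The following is a minimal Python sketch of that recipe, under loudly hypothetical assumptions:
- The task is a point on a line moving toward a target.
- A scripted proportional controller stands in for the human demonstrator.
- The "model" is a straight line fit by ordinary least squares.

The names `expert_action`, `collect_demonstrations`, `fit_linear_policy` and `cloned_policy` are illustrative, not from any library. A system like RT-1 swaps in a neural network over camera images, but the training signal is the same: regress the demonstrator's action on the observation.

```python
import random

# Hypothetical toy task: a point on a line should move toward TARGET.
# The observation is its position x; the action is a velocity command.
TARGET = 0.3
GAIN = 0.5


def expert_action(x: float, rng: random.Random) -> float:
    """Scripted stand-in for a human demonstrator: a proportional
    controller whose recorded actions carry a little noise."""
    return -GAIN * (x - TARGET) + rng.gauss(0.0, 0.05)


def collect_demonstrations(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Record n (observation, action) pairs from the expert."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        pairs.append((x, expert_action(x, rng)))
    return pairs


def fit_linear_policy(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Behavior cloning as plain supervised regression: fit a = w * x + b
    to the demonstrations by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_a = sum(a for _, a in pairs) / n
    cov = sum((x - mean_x) * (a - mean_a) for x, a in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = cov / var
    return w, mean_a - w * mean_x


demos = collect_demonstrations(200)
w, b = fit_linear_policy(demos)


def cloned_policy(x: float) -> float:
    """The learned policy: it maps observations to actions and never
    sees a reward or the consequences of its actions during training."""
    return w * x + b


print(f"learned w={w:.3f}, b={b:.3f}; expert uses w={-GAIN}, b={GAIN * TARGET:.3f}")
print(f"cloned action at x=-0.5: {cloned_policy(-0.5):.3f}")
```

The fitted `w` and `b` should land close to the expert's gain and offset, because here the demonstrations cover every position the policy will be asked about. Nothing in the loop queries a reward or simulates the effect of an action.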

The appeal is simplicity and data efficiency, but behavior cloning has a well-known weakness called distribution shift, or compounding error. The policy is trained only on states the expert visited; once it makes a small mistake, it drifts into states it never saw in training, where its predictions degrade, causing larger mistakes - a feedback loop that can derail long tasks. Methods like DAgger address this by letting the learned policy act, having the expert label the states it actually reaches, and retraining on the aggregated data. Richer policies such as Diffusion Policy target a related failure: when demonstrators handle the same situation in different ways (passing an obstacle on the left or on the right, say), a model that predicts a single averaged action can output a compromise no demonstrator chose. Diffusion Policy instead models the full distribution of expert actions.
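
A small simulation makes both the compounding-error loop and the DAgger fix visible. The sketch below is a hypothetical one-dimensional lane-keeping task, not taken from any paper:
- A constant crosswind pushes the car off-center.
- A scripted `expert` steers back to center.
- The learner is a lookup table of the mean expert action per state bin.
- In bins it has never seen, the learner outputs zero steering, an arbitrary stand-in for bad extrapolation.

Behavior cloning trains only on expert states. DAgger runs in its simplest form for a few iterations: roll out the current learner, have the expert label every visited state, aggregate, retrain. All names and constants (`WIND`, `NOISE`, `BIN`, `step`, `fit`, `act`) are illustrative assumptions.

```python
import math
import random
from collections import defaultdict
from functools import partial

# Hypothetical 1D lane-keeping task: x is the car's offset from the lane
# center, and a constant crosswind pushes it to the right every step.
WIND = 0.1      # drift per step
NOISE = 0.04    # random gusts, uniform in [-NOISE, NOISE]
HORIZON = 50    # steps per episode
BIN = 0.1       # width of the learner's state bins


def step(x, a, rng):
    """Dynamics: new offset = old offset + steering + wind + gust."""
    return x + a + WIND + rng.uniform(-NOISE, NOISE)


def expert(x):
    """Scripted expert: steer back to center and cancel the wind."""
    return -x - WIND


def bin_of(x):
    """Index of the width-BIN bin centered on a multiple of BIN."""
    return math.floor(x / BIN + 0.5)


def fit(dataset):
    """Supervised step shared by BC and DAgger: mean expert action per bin."""
    by_bin = defaultdict(list)
    for x, a in dataset:
        by_bin[bin_of(x)].append(a)
    return {k: sum(v) / len(v) for k, v in by_bin.items()}


def act(table, x):
    """Learned policy. For bins it never saw it outputs 0, an arbitrary
    stand-in for how badly a real model can extrapolate."""
    return table.get(bin_of(x), 0.0)


def rollout(policy, rng):
    """Run one episode from the lane center and return every visited state."""
    xs = [0.0]
    for _ in range(HORIZON):
        xs.append(step(xs[-1], policy(xs[-1]), rng))
    return xs


def mean_final_offset(table, rng, episodes=20):
    """How far, on average, the learned policy ends up from the center."""
    finals = [abs(rollout(partial(act, table), rng)[-1]) for _ in range(episodes)]
    return sum(finals) / episodes


rng = random.Random(0)

# Behavior cloning: label only the states the expert itself visits.
dataset = [(x, expert(x)) for _ in range(20) for x in rollout(expert, rng)]
bc_table = fit(dataset)

# DAgger: let the current learner drive, have the expert label the states
# the learner reaches, aggregate with all earlier data, and retrain.
dagger_table = bc_table
for _ in range(5):
    for _ in range(5):
        visited = rollout(partial(act, dagger_table), rng)
        dataset += [(x, expert(x)) for x in visited]
    dagger_table = fit(dataset)

bc_error = mean_final_offset(bc_table, rng)
dagger_error = mean_final_offset(dagger_table, rng)
print(f"mean final offset  BC: {bc_error:.2f}  DAgger: {dagger_error:.2f}")
```

Inside the one bin it knows, the cloned policy cancels the wind but never corrects its offset. That small error lets it wander into an unseen bin, and from there the wind carries it away with nothing to pull it back. After DAgger, those off-course states are in the training set with the expert's corrective labels, so the learner steers home. The original DAgger formulation mixes expert and learner control with a decaying coefficient. Starting from expert data and then always rolling out the learner, as here, is its simplest special case.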

Why business readers should care: behavior cloning is the workhorse behind teaching robots and agents from demonstrations, and its distribution-shift failure mode explains why a system that looks perfect in a demo can still fail in the field - the real world keeps presenting situations the demonstrations never covered.
