GameNGen: Diffusion Models Are Real-Time Game Engines

“Diffusion Models Are Real-Time Game Engines,” posted to arXiv on August 27, 2024 by Dani Valevski, Yaniv Leviathan, Moab Arar, and Shlomi Fruchter at Google, introduced GameNGen, a system that simulated the classic 1993 video game DOOM using nothing but a neural network. There was no traditional game engine running underneath; the model predicted each frame directly.

Training had two stages. First, a reinforcement-learning agent played DOOM while its sessions were recorded. Then a diffusion model was trained on those recordings to predict the next frame, conditioned on recent frames and the player’s actions. The result ran at 20 frames per second on a single TPU, stayed stable over multi-minute play sessions, and reached a next-frame PSNR of 29.4 dB, comparable to lossy JPEG compression. In a human evaluation, raters shown short clips were only slightly better than chance at telling real gameplay from the simulation. The work was accepted at ICLR 2025.
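To make the frame-by-frame idea concrete, here is a minimal Python sketch of the generation loop and of the PSNR metric. It is an illustration under stated assumptions, not the authors' implementation: `rollout`, `toy_predictor`, and `context_len` are hypothetical names, frames are flat lists of 8-bit pixel values, and the "model" is a toy rule standing in for the trained diffusion network.

```python
import math
from collections import deque


def psnr(reference, predicted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(predicted):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def rollout(predict_next, initial_frames, actions, context_len=4):
    """Generate one frame per action, feeding each output back in as context.

    predict_next is a hypothetical stand-in for the trained model: it takes
    (recent_frames, recent_actions) and returns the next frame.
    context_len is an arbitrary value chosen for this sketch.
    """
    frames = deque(initial_frames, maxlen=context_len)
    past_actions = deque(maxlen=context_len)
    generated = []
    for action in actions:
        past_actions.append(action)
        frame = predict_next(list(frames), list(past_actions))
        generated.append(frame)
        frames.append(frame)  # the model's own output becomes future context
    return generated


def toy_predictor(recent_frames, recent_actions):
    # Toy rule, not a real model: shift each pixel of the last frame
    # by the most recent action value, clamped to the 0..255 range.
    return [min(255, max(0, px + recent_actions[-1])) for px in recent_frames[-1]]


if __name__ == "__main__":
    start = [[100, 100, 100, 100]]
    frames = rollout(toy_predictor, start, actions=[1, 1, -2])
    print(frames)
    print(round(psnr([100] * 4, [110] * 4), 2))
```

The point of the loop is that every generated frame is appended to the context used for the next prediction, so there is no game state other than the recent frames and actions. The PSNR function shows what the reported figure measures: how closely a predicted frame matches the real one, on a logarithmic scale where higher is better.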

GameNGen was a striking proof that an interactive, rule-following environment could emerge entirely from a generative model trained on observation, and a stepping stone toward the broader idea of neural world models. For a general reader, it hints at a future where simulations, training environments, and even games could be generated on the fly rather than hand-built, which would change both what they cost to produce and how flexibly they can be adapted.
