On May 31, 2026, NVIDIA launched Cosmos 3, which it describes as an open foundation model for physical AI, the branch of AI that perceives and acts in the real world through robots, autonomous vehicles, and vision systems. NVIDIA calls it the first fully open omnimodel with native vision reasoning and multimodal generation across text, image, video, ambient sound, and action. The model uses a mixture-of-transformers architecture and was trained on one of the largest multimodal physical AI datasets, spanning billions of samples across text, image, video, sound, and action trajectories.
Cosmos 3 combines three capabilities that were typically handled by separate systems: vision reasoning, world generation (simulating environments), and action prediction. Developers can deploy it as a vision-language model, as a world model, or as a backbone for training robots on specific tasks. NVIDIA says this reduces physical AI training and evaluation cycles from months to days. The released variants include Cosmos 3 Super for highest accuracy and Cosmos 3 Nano for fast reasoning; a Cosmos 3 Edge variant for real-time edge deployment was announced as coming soon. The launch also named a Cosmos coalition of partners that includes Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI.
For business leaders tracking robotics and automation, Cosmos 3 lowers the barrier to building embodied AI. Because a single open model perceives, generates, simulates, and acts, companies can start from a shared foundation instead of assembling many narrow tools. That concentrates development effort on one base model and shortens the path from prototype to deployed physical system. The accompanying technical paper, released the following day, documents the architecture and benchmark results in detail.