Google DeepMind launches Veo, its rival to Sora in AI video

Google DeepMind announced Veo, its text-to-video model, at Google I/O on May 14, 2024, positioning it as a direct rival to OpenAI’s Sora, which had been revealed three months earlier. The first version could generate 1080p clips running longer than a minute and respond to cinematographic direction - shot types, camera moves, and visual styles. On December 16, 2024, Google followed with Veo 2, raising the bar to resolutions of up to 4K, clips that could extend to minutes in length, and a markedly improved grasp of real-world physics and human movement.

Veo 2 was built to understand the language of filmmaking. Google’s example was telling: ask for a low-angle tracking shot that glides through a scene, or a close-up on a scientist’s face at a microscope, and the model composes it. Google paired the launch with an updated version of Imagen 3 for still images and an image-remixing tool called Whisk, and embedded an invisible SynthID watermark in Veo output to flag it as AI-generated and help curb misinformation. Veo 3, released in May 2025, added synchronized audio - dialogue, sound effects, and ambient noise generated to match the picture.

Veo marked the moment two of the most prominent AI labs, Google DeepMind and OpenAI, were openly competing on generative video, each iterating in months rather than years, and each shipping provenance watermarking alongside the model.

Why business readers should care: AI video moved from demo to product faster than almost any prior media technology, and the major providers are building labeling in from the start. For anyone in marketing, film, or advertising, the cost and provenance of moving images are both being rewritten at once.