On December 11, 2024, Google announced Gemini 2.0, saying the model family is “built for this new agentic era” and can “understand more about the world, think multiple steps ahead, and take action on your behalf.” The first release was Gemini 2.0 Flash, an experimental version that Google said “outperforms 1.5 Pro on key benchmarks, at twice the speed.”
Gemini 2.0 Flash moved beyond text generation into native multimodal output. Google said the model “supports multimodal output like natively generated images mixed with text and steerable text-to-speech multilingual audio,” along with native tool use. The launch also previewed two agent research prototypes: Project Astra, exploring a “universal AI assistant” with longer in-session memory and lower latency, and Project Mariner, a browser-controlling agent that Google reported reached “a state-of-the-art result of 83.5%” on the WebVoyager benchmark.
Gemini 2.0 mattered as Google’s explicit pivot from chat models to agents. By framing the generation around taking action and shipping browser and assistant prototypes alongside the model, Google signaled that the next competitive front was software that does tasks, not just answers questions.