Gemini 2.0 Flash, which Google DeepMind released to general availability on February 5, 2025, is the efficiency-focused successor to Gemini 1.5 Flash. It is a multimodal model that accepts text, code, images, audio, and video as inputs, though its stable GA release outputs text only (image and audio generation remain in preview). The model supports up to 1 million tokens of input context with an output limit of roughly 8K tokens, making it well-suited for analyzing large documents, transcripts, or media files. Its knowledge is current through August 2024.
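As a rough illustration of the multimodal input path, the sketch below sends an image alongside a text prompt using the google-genai Python SDK; the file name, prompt, and API-key handling are illustrative placeholders rather than details from the model documentation.

```python
# Minimal sketch of a multimodal request to Gemini 2.0 Flash via the
# google-genai Python SDK. The image path and prompt are hypothetical.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

with open("meeting_slide.png", "rb") as f:     # hypothetical local image
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Summarize the key points on this slide in three bullets.",
    ],
)
print(response.text)  # the GA release returns text output only
```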
Gemini 2.0 Flash is optimized for speed, scalability, and agentic workflows, offering fast response times, tool use, structured outputs, and function calling. It is more cost-efficient than the Pro variants, with the trade-offs of a shorter output limit and less depth on reasoning-intensive tasks. Available through the Gemini API, Vertex AI, Google AI Studio, and the Gemini apps, the model is positioned for real-time applications, enterprise assistants, and production-scale multimodal processing where efficiency and throughput are priorities.
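For the structured-output capability mentioned above, a call through the google-genai Python SDK might look roughly like the following sketch; the Invoice schema and the invoice text in the prompt are assumptions made for the example, not part of the source description.

```python
# Minimal sketch of structured output with Gemini 2.0 Flash using the
# google-genai SDK and a Pydantic schema. The Invoice model is illustrative.
from pydantic import BaseModel
from google import genai
from google.genai import types

class Invoice(BaseModel):          # hypothetical schema for this example
    vendor: str
    total_usd: float
    due_date: str

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Extract the invoice details: ACME Corp, $1,240.50, due 2025-03-01.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Invoice,   # constrain the output to the schema
    ),
)
print(response.parsed)  # an Invoice instance parsed from the JSON response
```

Constraining the response to a schema in this way is what makes the model practical for production pipelines that need machine-readable output rather than free-form text.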