Google

Google: Gemini 2.5 Pro

Gemini 2.5 Pro Overview

Gemini 2.5 Pro, released on June 17, 2025, is Google DeepMind’s most capable model in the Gemini 2.5 family, optimized for deep reasoning, coding, and complex multimodal tasks. It accepts text, images, audio, video, and PDFs as input and outputs text. The model supports 1 million input tokens with an output capacity of up to 65K tokens, enabling large-scale comprehension of datasets, codebases, and technical documents. Its training knowledge extends to January 2025.

Pro outperforms earlier Gemini 2.0 models across benchmarks, including agentic coding tasks where it achieved ~63.8% on SWE-Bench Verified. It supports structured outputs, function calling, code execution, search grounding, and URL context, making it well-suited for enterprise, STEM, and developer workflows. However, it does not currently support image or audio generation in its stable release, and its higher computational cost and latency make it less efficient than Flash or Flash-Lite. It is available via the Gemini API, Google AI Studio, and Vertex AI.

Gemini 2.5 Pro Interactive Demo

Gemini 2.5 Pro Details & Performance

Details

Resources

Vision Tasks

Vision LanguageObject DetectionClassificationOCRVisual Question AnsweringCaptioning

Features

Foundation VisionLLMs with Vision CapabilitiesMultimodal Vision

Usage

Past 30 Days

Performance

Avg. Latency

Arena Rankings

Gemini 2.5 Pro Vision Evals

Visual Understanding

72 models · 67 tasks
HighestLowest
This model#17 of 7270.15% pass rate · better than 72%
Score70.15%pass rate across 67 tasks
Speed11.87savg response per task
Cost$0.0060 / task$1.25 in · $10.00 out / 1M
Tokens856 / task294 in · 565 out
Score key:≥75%40–74%<40%
CategoryPassedScore
Document Understanding8 / 9
88.9%
Spatial Understanding15 / 19
78.9%
Object Understanding11 / 14
78.6%
Defect Detection11 / 15
73.3%
Object Counting2 / 10
20%
HighestLowest
This model#23 of 5078.6% pass rate · better than 52%
Score78.6%pass rate across 229 tasks
Speed4.91savg response per task
Cost$0.0036 / task$1.25 in · $10.00 out / 1M
Tokens611 / task290 in · 323 out
Score key:≥75%40–74%<40%
CategoryPassedScore
License Plate Recognition27 / 30
90%
Handwritten Math8 / 10
80%
Focused Scene OCR78 / 99
78.8%
VQA & Extraction45 / 60
75%
Text Recognition22 / 30
73.3%

Scores based on a single evaluation run · Methodology

View all Vision Evals →

Gemini 2.5 Pro Pricing

Gemini 2.5 Pro costs $1.25 per 1M input tokens and $10.00 per 1M output tokens.

Input$1.25 / 1M tokens
Output$10.00 / 1M tokens
Cached input$0.125 / 1M tokens

Pricing updated Jun 28, 2026

Price vs. performance

Estimated cost per task vs. Visual Understanding score, for this model and others ranked near it. Upper-left is the sweet spot (high quality, low cost).

11 of 11 models plotted

ModelScoreMedian tokensEst. cost / taskCompare
GoogleGemini 3.1 Pro75.8%1.1K$0.0024Compare
GoogleGemini 3 Flash74.6%1.4K$0.0014Compare
OpenAIGPT-5 Mini73.1%1.8K$0.0006Compare
QwenQwen3.5 27B71.6%1.2K$0.0002Compare
AnthropicClaude Sonnet 4.670.2%2.3K$0.0080Compare
GoogleGemini 2.5 Pro(this model)70.2%856$0.0060
GoogleGemini 3.1 Flash-Lite68.7%1.1K$0.0003Compare
GoogleGemma 4 26B A4B68.7%531$0.0001Compare
QwenQwen3.6 Plus68.7%1.6K$0.0005Compare
AnthropicClaude Opus 4.867.2%2.2K$0.012Compare
AnthropicClaude Opus 4.767.2%2.6K$0.015Compare

Alternatives to Gemini 2.5 Pro

Other models worth comparing for similar use cases.

Anthropic
Claude Opus 4.8
Claude Opus 4.8 is Anthropic's most capable generally available large language model, released on May 28, 2026 as an incremental upgrade to Claude Opus 4.7. The model accepts text and image inputs and produces text outputs, with a 1 million token context window on the Claude API, Amazon Bedrock, and Google Cloud Vertex AI (200k tokens on Microsoft Foundry) and up to 128k max output tokens. It uses adaptive thinking and supports adjustable effort tiers — high by default, with extra and max tiers available for more demanding tasks. A fast mode operates at approximately 2.5x standard speed. The model is described by Anthropic as a hybrid reasoning model designed for advanced coding, agentic workflows, long-context reasoning, and professional knowledge work.Key behavioral improvements over Opus 4.7 include substantially reduced rates of unreported code flaws, improved honesty in self-assessment, and better tool-calling reliability. On Anthropic's Super-Agent benchmark, Opus 4.8 completes every case end-to-end, and it scores 84% on Online-Mind2Web for computer-use and browser-agent tasks. It achieves 88.6% on SWE-bench Verified and 69.2% on SWE-bench Pro. Alongside the model, Anthropic launched Dynamic Workflows in Claude Code (research preview), which enables Claude to orchestrate hundreds of parallel subagents for codebase-scale tasks such as large migrations. The Messages API was also updated to accept mid-task system messages without breaking prompt caching, improving support for long-running agentic pipelines.
Anthropic
Claude Opus 4.5
Claude Opus 4.5 is Anthropic’s most advanced large language model in the Claude Opus family, designed for high-end reasoning, coding, and autonomous agent workflows. Released in late 2025, it targets developers and enterprises that need reliable long-context understanding and strong multi-step problem solving in production environments.The model supports text and code natively, with reported multimodal capabilities for documents and images, and offers an exceptionally large context window of up to roughly 200,000 tokens. Claude Opus 4.5 emphasizes long-horizon task execution, complex code generation and refactoring, and sustained reasoning over large inputs. In the current landscape, it positions itself as a premium, accuracy- and reasoning-focused alternative to faster or cheaper peers, trading cost for depth and contextual fidelity. Typical applications include advanced coding assistants, research analysis, agentic automation, and enterprise knowledge workflows deployed via Anthropic’s API or major cloud platforms.
Qwen
Qwen3 VL 235B A22B Instruct
Qwen3 VL 235B A22B Instruct is a flagship multimodal vision-language model developed by Qwen (Alibaba Cloud), designed for instruction-following tasks that combine advanced text generation with visual understanding. It serves as a high-end open-weight model for developers and researchers building multimodal AI systems that require strong reasoning, perception, and long-context capabilities.The model supports interleaved text and image inputs, very long context windows (up to roughly 256K tokens), and efficient inference through a mixture-of-experts architecture with about 22B active parameters out of 235B total. In today’s landscape, it competes with top-tier proprietary vision-language models while offering the advantages of open weights and flexible deployment. Typical applications include multimodal assistants, document and image analysis, visual reasoning, and large-context instruction-based workflows.
OpenAI
GPT-5
GPT-5, released by OpenAI in August 2025, is a multimodal large language model that advances beyond the GPT-4 family with a new “unified system” architecture. This design allows the model to dynamically choose between fast responses and extended reasoning depending on task complexity. It supports text, code, and images, alongside stronger tool use and agentic workflows, making it more adaptable for real-world problem solving. While its exact context window size is not disclosed, GPT-5 is optimized for long-horizon reasoning and multi-step tool chaining, indicating substantially expanded capacity over its predecessors.The release introduced specialized variants: GPT-5 Pro, offering extended reasoning for complex workflows, and GPT-5 Codex, optimized for advanced coding tasks such as large-scale refactoring and code review. GPT-5 shows benchmark gains in coding, biomedical reasoning, multimodal analysis, and scientific tasks. Developers also gain new controls, such as verbosity and personalization parameters, for greater steerability. With these improvements, GPT-5 positions itself as OpenAI’s most capable and versatile model, suited for enterprise automation, research, healthcare, and sophisticated coding environments.

Other Google Gemini Pro models

Other versions in the same family as Gemini 2.5 Pro.

Gemini 2.5 Pro License

Proprietary

License terms and commercial-use guidance for Gemini 2.5 Pro.

License information is provided as a guide and is not legal advice.