Roboflow
Anthropic

Anthropic: Claude Sonnet 5

Claude Sonnet 5 Overview

Claude Sonnet 5 is a mid-tier large language model from Anthropic, released on June 30, 2026, as the latest model in the Sonnet series and a direct successor to Claude Sonnet 4.6. It is a hybrid reasoning model designed primarily for agentic workflows, software coding, and professional tasks. The model features a 1 million token context window, a 128k maximum output token limit, and runs adaptive thinking by default, giving API users fine-grained control over reasoning effort across five levels (low, medium, high, max, and extra-high). It uses an updated tokenizer shared with Opus 4.7 and later models, which produces approximately 30% more tokens for equivalent text compared to earlier Claude models. On benchmarks, Sonnet 5 scores 63.2% on agentic coding and 81.2% on OSWorld, narrowing the gap with Opus 4.8 while remaining at Sonnet-tier pricing.

The model supports text and image input with text output, and accepts tools including browsers and terminals for autonomous multi-step task execution. Anthropic's safety evaluations report that Sonnet 5 shows a lower rate of undesirable behaviors than Sonnet 4.6 and is generally safer in agentic contexts, with improved resistance to prompt injection and reduced sycophancy. Cybersecurity safeguards equivalent to those on Opus 4.7 and 4.8 are active, though Anthropic notes the model was not deliberately trained on cybersecurity tasks. The model is proprietary and API-only, with no open weights.

Claude Sonnet 5 Interactive Demo

Claude Sonnet 5 Details & Performance

Details

Resources

Vision Tasks

CaptioningClassificationDocument Question AnsweringMulti-Label ClassificationOCRObject DetectionVision LanguageVisual Question Answering

Features

Multimodal VisionLLMs with Vision Capabilities

Usage

Past 30 Days

Performance

Avg. Latency

Arena Rankings

Claude Sonnet 5 Vision Evals

Visual Understanding

74 models · 67 tasks
HighestLowest
This model#17 of 7470.15% pass rate · better than 72%
Score70.15%pass rate across 67 tasks
Speed3.90savg response per task
Cost$0.0048 / task$2.00 in · $10.00 out / 1M
Tokens2.2K / task2.1K in · 61 out
Score key:≥75%40–74%<40%
CategoryPassedScore
Object Understanding13 / 14
92.9%
Spatial Understanding15 / 19
78.9%
Defect Detection11 / 15
73.3%
Document Understanding6 / 9
66.7%
Object Counting2 / 10
20%
HighestLowest
This model#13 of 5283.84% pass rate · better than 73%
Score83.84%pass rate across 229 tasks
Speed2.77savg response per task
Cost$0.0019 / task$2.00 in · $10.00 out / 1M
Tokens725 / task642 in · 64 out
Score key:≥75%40–74%<40%
CategoryPassedScore
License Plate Recognition27 / 30
90%
Focused Scene OCR88 / 99
88.9%
Text Recognition24 / 30
80%
VQA & Extraction48 / 60
80%
Handwritten Math5 / 10
50%

Scores based on a single evaluation run · Methodology

View all Vision Evals →

Claude Sonnet 5 Pricing

Claude Sonnet 5 costs $2.00 per 1M input tokens and $10.00 per 1M output tokens.

Input$2.00 / 1M tokens
Output$10.00 / 1M tokens
Cached input$0.200 / 1M tokens

Pricing updated Jul 1, 2026

Price vs. performance

Estimated cost per task vs. Visual Understanding score, for this model and others ranked near it. Upper-left is the sweet spot (high quality, low cost).

11 of 11 models plotted

ModelScoreMedian tokensEst. cost / taskCompare
QwenQwen3.5 122B A10B76.1%1.2K$0.0003Compare
GoogleGemini 3.1 Pro75.8%1.1K$0.0024Compare
GoogleGemini 3 Flash74.6%1.4K$0.0014Compare
OpenAIGPT-5 Mini73.1%1.8K$0.0006Compare
QwenQwen3.5 27B71.6%1.2K$0.0002Compare
AnthropicClaude Sonnet 5(this model)70.2%2.2K$0.0048
AnthropicClaude Sonnet 4.670.2%2.3K$0.0080Compare
GoogleGemini 2.5 Pro70.2%856$0.0060Compare
GoogleGemini 3.1 Flash-Lite68.7%1.1K$0.0003Compare
GoogleGemma 4 26B A4B68.7%531$0.0001Compare
QwenQwen3.6 Plus68.7%1.6K$0.0005Compare

Alternatives to Claude Sonnet 5

Other models worth comparing for similar use cases.

OpenAI
GPT-5 Mini
GPT-5 Mini, released by OpenAI on August 7, 2025, is a mid-tier variant of the GPT-5 family that balances cost, speed, and capability. It is multimodal, supporting both text and image inputs, and offers a substantial input context window of ~400,000 tokens with output lengths up to ~128,000 tokens. While less powerful than the full GPT-5, it inherits its safety tuning, instruction-following improvements, and multimodal reasoning, making it a practical choice for developers who need large context handling without the expense of premium models.GPT-5 Mini is optimized for affordability while retaining strong reasoning performance. Benchmarks show it outperforming earlier models such as GPT-4o on many multimodal and medical VQA tasks, though it lags behind GPT-5 on the most complex problems. Ideal use cases include prototyping, scalable content generation, document analysis, and mid-range reasoning tasks where efficiency and context capacity matter more than top-tier accuracy.
Google
Gemini 3.5 Flash
Gemini 3.5 Flash is a multimodal language model developed by Google DeepMind and released at Google I/O 2026. It is built on the Gemini 3 Flash reasoning foundation and introduces configurable thinking levels (minimal, low, medium, and high) that allow developers to tune the depth of internal reasoning before a response is generated. The model accepts text, image, video, audio, and PDF inputs and produces text output, with a 1 million token context window and up to 65,000 output tokens per request. It is natively multimodal, processing visual inputs alongside text to support tasks such as image captioning, classification, optical character recognition, object detection, and visual grounding, where the model references specific regions within an image or video frame.Its vision capabilities extend to interpreting UI screenshots, diagrams, charts, and real-world scenes, as well as understanding video and live frame sequences for activity and scene recognition. The model supports combined tool use, including Google Search, URL context, code execution, and custom functions, within a single request, and it uses reasoning context from previous turns when thought signatures are present in the conversation history, enabling persistent multi-turn reasoning chains. Gemini 3.5 Flash carries a knowledge cutoff of January 2026 and is available via the Gemini API, Google AI Studio, Google Antigravity, and the Gemini Enterprise Agent Platform.
OpenAI
GPT-5.4 Mini
GPT-5.4 mini is a fast, cost-efficient model developed by OpenAI and released on March 17, 2026, optimized for high-throughput workloads and subagent orchestration. It supports text and image inputs within a 400,000-token context window, making it ideal for processing extensive visual datasets and large codebases in a single request. Designed for low-latency production environments, the model integrates with key API features including function calling, web search, and tool-based computer use, allowing it to assist in automated workflows that require navigating digital interfaces.Compared to the previous GPT-5 mini, this version runs more than twice as fast while approaching the performance levels of the flagship GPT-5.4 on reasoning and coding benchmarks. While the larger GPT-5.4 introduces native, state-of-the-art computer-use capabilities, GPT-5.4 mini provides a scalable alternative for interpreting screenshots and reasoning over dense UI layouts. For vision tasks on Playground, it excels at extracting structured information from visual documents and assisting in agentic tasks that involve real-time interpretation of software interfaces alongside text.
Google
Gemini 2.5 Flash
Gemini 2.5 Flash, released on June 17, 2025, is Google DeepMind’s production-ready, efficiency-focused model in the Gemini 2.5 family. It is multimodal, accepting text, images, video, and audio as inputs, with text as the primary output format. The model supports 1 million input tokens and up to 65K output tokens, enabling it to process very large contexts such as books, long video transcripts, or extensive datasets. Its training knowledge extends to January 2025.Designed as a price-performance leader, Gemini 2.5 Flash balances speed and reasoning power, making it suitable for everyday enterprise and developer use cases without the higher latency and cost of Pro models. It supports advanced workflows like function calling, code execution, search grounding, URL context ingestion, and structured outputs. While efficient and scalable, output length is still limited compared to its input capacity, and multimodal outputs (e.g. image or audio generation) remain restricted to specialized or preview variants.
Qwen
Qwen3.6 35B A3B
Qwen3.6-35B-A3B is a sparse Mixture-of-Experts (MoE) multimodal language model developed by the Qwen team at Alibaba Group. It carries 35 billion total parameters but activates only approximately 3 billion per forward pass via a learned routing mechanism, giving it the representational capacity of a large dense model at a fraction of the inference compute. The model is natively multimodal, processing images, documents, and video alongside text as a core architectural capability rather than an add-on. It supports a native context window of 262,144 tokens, extensible up to 1,010,000 tokens via YaRN. A key design feature is the unified thinking/non-thinking mode framework: users can switch between deliberate chain-of-thought reasoning and fast direct responses within a single model, and a "thinking preservation" option retains reasoning context across multi-turn agentic workflows to reduce redundant computation.The model is specifically optimized for agentic coding tasks, including repository-level reasoning, frontend workflow generation, multi-step tool use, and MCP (Model Context Protocol) integration. On SWE-bench Verified it scores 73.4%, on Terminal-Bench 2.0 it scores 51.5%, and on MCPMark it scores 37.0%. For vision-language tasks it achieves 92.0 on RefCOCO, 89.9 on OmniDocBench 1.5, and 83.7 on VideoMMMU. The model also supports Multi-Token Prediction (MTP) for speculative decoding. All Qwen3.6 open-weight models are released under the Apache 2.0 license.

Other Anthropic Sonnet models

Other versions in the same family as Claude Sonnet 5.

Claude Sonnet 5 License

Proprietary

License terms and commercial-use guidance for Claude Sonnet 5.

License information is provided as a guide and is not legal advice.