Anthropic: Claude Opus 4.1

Claude Opus 4.1 Overview

Claude Opus 4.1, released by Anthropic in August 2025, is the upgraded flagship of the Claude 4 family, building on Opus 4 with stronger reasoning and agentic capabilities. Like its predecessor, it is multimodal and optimized for text, code, and tool use, and its large context window suits multi-file codebases, technical workflows, and long-horizon problem solving.

On benchmarks, Opus 4.1 improves coding performance, reaching ~74.5% on SWE-Bench Verified compared to Opus 4’s ~72.5%. It demonstrates more precise debugging, refactoring, and orchestration of agentic tasks while maintaining similar safety and alignment safeguards. It is best suited for enterprise-scale software development, research automation, and advanced reasoning workflows where reliability and depth of analysis are critical.

Claude Opus 4.1 Details & Performance

Details

Vision Tasks

Vision Language, Object Detection, Classification, OCR, Visual Question Answering, Captioning

Features

Foundation Vision, LLMs with Vision Capabilities, Multimodal Vision

Claude Opus 4.1 Vision Evals

Visual Understanding

72 models · 67 tasks
This model: #41 of 72 · 59.7% pass rate · better than 38%
Score: 59.7% pass rate across 67 tasks
Speed: 7.09s avg response per task
Cost: $0.040 / task ($15.00 in · $75.00 out / 1M)
Tokens: 2.1K / task (2.0K in · 140 out)

Category · Passed · Score
Document Understanding · 8 / 9 · 88.9%
Defect Detection · 11 / 15 · 73.3%
Object Understanding · 9 / 14 · 64.3%
Spatial Understanding · 12 / 19 · 63.2%
Object Counting · 0 / 10 · 0%
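
The overall score appears to be a pooled pass rate: total tasks passed divided by total tasks, rather than an average of the category percentages. The page does not state this, so treat it as an assumption. The minimal sketch below checks it against the Visual Understanding category counts listed above; the function names are illustrative only.

```python
# (category, tasks passed, tasks attempted), copied from the category rows above
RESULTS = [
    ("Document Understanding", 8, 9),
    ("Defect Detection", 11, 15),
    ("Object Understanding", 9, 14),
    ("Spatial Understanding", 12, 19),
    ("Object Counting", 0, 10),
]


def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage."""
    return 100.0 * passed / total


def pooled_pass_rate(rows) -> float:
    """Assumed aggregate: all passed tasks over all attempted tasks."""
    passed = sum(p for _, p, _ in rows)
    total = sum(t for _, _, t in rows)
    return pass_rate(passed, total)


def mean_of_categories(rows) -> float:
    """Alternative aggregate: unweighted mean of the per-category rates."""
    return sum(pass_rate(p, t) for _, p, t in rows) / len(rows)


if __name__ == "__main__":
    for name, p, t in RESULTS:
        print(f"{name}: {pass_rate(p, t):.1f}%")
    print(f"Pooled: {pooled_pass_rate(RESULTS):.1f}%")
    print(f"Mean of categories: {mean_of_categories(RESULTS):.1f}%")
```

Pooling gives 40 of 67 tasks, or 59.7%, which matches the listed score. Averaging the five category percentages would instead give about 57.9%, so the zero in Object Counting weighs on the headline number in proportion to its 10 tasks.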
This model: #34 of 50 · 68.56% pass rate · better than 30%
Score: 68.56% pass rate across 229 tasks
Speed: 5.08s avg response per task
Cost: $0.016 / task ($15.00 in · $75.00 out / 1M)
Tokens: 656 / task (552 in · 97 out)

Category · Passed · Score
Text Recognition · 24 / 30 · 80%
Focused Scene OCR · 73 / 99 · 73.7%
VQA & Extraction · 41 / 60 · 68.3%
License Plate Recognition · 16 / 30 · 53.3%
Handwritten Math · 3 / 10 · 30%

Scores are based on a single evaluation run.

Claude Opus 4.1 Pricing

Claude Opus 4.1 costs $15.00 per 1M input tokens and $75.00 per 1M output tokens.

Input: $15.00 / 1M tokens
Output: $75.00 / 1M tokens
Cached input: $1.50 / 1M tokens

Pricing updated Jun 28, 2026
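
The per-task cost figures in the evaluation summaries can be approximated from these rates and the average token counts reported there. The sketch below is an assumption about how such an estimate is formed, not the site's documented method. `cost_per_task` and its parameters are hypothetical names, and treating cached tokens as a subset of the input tokens billed at the cached rate is also an assumption.

```python
# Listed rates in USD per 1M tokens
INPUT_USD_PER_M = 15.00
OUTPUT_USD_PER_M = 75.00
CACHED_INPUT_USD_PER_M = 1.50


def cost_per_task(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Estimated USD cost of one request.

    cached_input_tokens is the part of input_tokens assumed to be
    billed at the cached-input rate instead of the full input rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_input_tokens
    total = (fresh_tokens * INPUT_USD_PER_M
             + cached_input_tokens * CACHED_INPUT_USD_PER_M
             + output_tokens * OUTPUT_USD_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # Average token counts reported in the two evaluation summaries
    print(f"2.0K in, 140 out: ${cost_per_task(2000, 140):.4f}")
    print(f"552 in, 97 out:   ${cost_per_task(552, 97):.4f}")
    # Same first request if every input token were served from cache
    print(f"fully cached:     ${cost_per_task(2000, 140, 2000):.4f}")
```

With the reported averages this gives roughly $0.0405 and $0.0156 per task, in line with the listed $0.040 and $0.016. Output tokens cost five times as much as input tokens, so even 140 output tokens account for about a quarter of the first estimate.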

Price vs. performance

Estimated cost per task vs. Visual Understanding score, for this model and others ranked near it. Upper-left is the sweet spot (high quality, low cost).

Provider · Model · Score · Median tokens · Est. cost / task
Anthropic · Claude Opus 4.8 · 67.2% · 2.2K · $0.012
Anthropic · Claude Opus 4.7 · 67.2% · 2.6K · $0.015
Google · Gemma 4 31B · 67.2% · 467 · $0.0001
Anthropic · Claude Opus 4.6 · 64.2% · 2.3K · $0.014
Anthropic · Claude Sonnet 4.5 · 59.7% · 2.3K · $0.0092
Anthropic · Claude Opus 4.1 (this model) · 59.7% · 2.1K · $0.040
OpenAI · GPT-5 Nano · 58.2% · 2.7K · $0.0003
Qwen · Qwen3.5 397B A17B · 58.2% · 1.5K · $0.0006
Google · Gemini 2.5 Flash · 55.2% · 476 · $0.0005
Google · Gemini 2.5 Flash-Lite · 53.7% · 301 · $0.0000
MoonshotAI · Kimi K2.5 · 35.8% · 2.7K · $0.0021
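
One way to read the "sweet spot" idea numerically is a Pareto frontier: the models for which no other listed model is at least as accurate and at least as cheap, and strictly better on one of the two. The sketch below applies that to the scores and estimated costs listed above. It is an illustration, not the site's method; `Entry`, `dominates`, and `pareto_frontier` are hypothetical names, and the costs are the rounded values as displayed.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    model: str
    score: float  # Visual Understanding pass rate, percent
    cost: float   # estimated USD per task, as displayed (rounded)


ENTRIES = [
    Entry("Claude Opus 4.8", 67.2, 0.012),
    Entry("Claude Opus 4.7", 67.2, 0.015),
    Entry("Gemma 4 31B", 67.2, 0.0001),
    Entry("Claude Opus 4.6", 64.2, 0.014),
    Entry("Claude Sonnet 4.5", 59.7, 0.0092),
    Entry("Claude Opus 4.1", 59.7, 0.040),
    Entry("GPT-5 Nano", 58.2, 0.0003),
    Entry("Qwen3.5 397B A17B", 58.2, 0.0006),
    Entry("Gemini 2.5 Flash", 55.2, 0.0005),
    Entry("Gemini 2.5 Flash-Lite", 53.7, 0.0000),
    Entry("Kimi K2.5", 35.8, 0.0021),
]


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.cost <= b.cost
    strictly_better = a.score > b.score or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(entries: list[Entry]) -> list[Entry]:
    """Entries that no other entry dominates."""
    return [e for e in entries
            if not any(dominates(other, e) for other in entries)]


if __name__ == "__main__":
    for e in pareto_frontier(ENTRIES):
        print(f"{e.model}: {e.score}% at ${e.cost:.4f} / task")
```

On these figures only Gemma 4 31B and Gemini 2.5 Flash-Lite sit on the frontier. Claude Opus 4.1 is dominated by Claude Sonnet 4.5, which has the same score at a lower estimated cost. Because the displayed costs are rounded, near-ties at the cheap end should be read loosely.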

Alternatives to Claude Opus 4.1

Other models worth comparing for similar use cases.

Google
Gemini 3.1 Pro
Gemini 3.1 Pro is a proprietary multimodal model from Google’s Gemini 3 series, released in early 2026 and designed for advanced reasoning across large multimodal datasets. It accepts text, images, audio, video, and documents, supporting up to a 1-million-token input context with up to 64k output tokens. Compared with Gemini 3 Pro, it improves long-context synthesis and multi-step reasoning, enabling more reliable analysis of large documents, datasets, and software codebases. The model also advances visual understanding and grounding, allowing it to interpret UI screenshots, diagrams, and real-world scenes while referencing specific regions within images or video. These capabilities make Gemini 3.1 Pro well suited for multimodal workflows involving document processing, interface analysis, robotics research, and complex visual reasoning.
Qwen
Qwen3 VL 235B A22B Instruct
Qwen3 VL 235B A22B Instruct is a flagship multimodal vision-language model developed by Qwen (Alibaba Cloud), designed for instruction-following tasks that combine advanced text generation with visual understanding. It serves as a high-end open-weight model for developers and researchers building multimodal AI systems that require strong reasoning, perception, and long-context capabilities. The model supports interleaved text and image inputs, very long context windows (up to roughly 256K tokens), and efficient inference through a mixture-of-experts architecture with about 22B active parameters out of 235B total. It competes with top-tier proprietary vision-language models while offering the advantages of open weights and flexible deployment. Typical applications include multimodal assistants, document and image analysis, visual reasoning, and large-context instruction-based workflows.

Claude Opus 4.1 License

Proprietary

License terms and commercial-use guidance for Claude Opus 4.1.

License information is provided as a guide and is not legal advice.