GPT-5.5 Overview

GPT-5.5 is a multimodal large language model released by OpenAI on April 23, 2026, engineered for autonomous, multi-step knowledge work and agentic workflows. It accepts text, images, and code as input, featuring enhanced spatial reasoning and visual grounding to support its computer use capabilities for operating software and navigating UI elements. Built to execute complex workflows end-to-end, the model interprets loosely defined tasks, selects appropriate tools, and performs self-verification with minimal user intervention. It is available in a standard version, a Thinking mode for extended reasoning budgets, and a Pro variant that uses parallel test-time compute for maximum precision on complex tasks.

Co-optimized with NVIDIA for GB200 NVL72 infrastructure, GPT-5.5 delivers per-token latency comparable to its predecessor GPT-5.4 while maintaining a 1-million-token context window. Despite increased capability, the model achieves greater token efficiency in coding and data analysis workflows, often completing tasks with fewer total tokens than previous versions. OpenAI reports a 60% reduction in hallucination rate compared to GPT-5.4, improving reliability for accuracy-sensitive applications. API access is available via the Responses and Chat Completions endpoints at $5 per million input tokens and $30 per million output tokens, double the unit price of GPT-5.4.

GPT-5.5 Interactive Demo

GPT-5.5 Details & Performance

Details

Resources

Vision Tasks

CaptioningClassificationOCRObject DetectionVision LanguageVisual Question Answering

Features

Multimodal VisionLLMs with Vision Capabilities

Usage

Past 30 Days

Performance

Avg. Latency

Arena Rankings

GPT-5.5 Vision Evals

Visual Understanding

72 models · 67 tasks
HighestLowest
This model#3 of 7277.61% pass rate · better than 92%
Score77.61%pass rate across 67 tasks
Speed30.12savg response per task
Cost$0.011 / task$5.00 in · $30.00 out / 1M
Tokens1.7K / task1.4K in · 138 out
Score key:≥75%40–74%<40%
CategoryPassedScore
Object Understanding13 / 14
92.9%
Document Understanding8 / 9
88.9%
Defect Detection13 / 15
86.7%
Spatial Understanding15 / 19
78.9%
Object Counting3 / 10
30%
HighestLowest
This model#16 of 5081.22% pass rate · better than 66%
Score81.22%pass rate across 229 tasks
Speed5.16savg response per task
Cost$0.0030 / task$5.00 in · $30.00 out / 1M
Tokens276 / task105 in · 83 out
Score key:≥75%40–74%<40%
CategoryPassedScore
License Plate Recognition28 / 30
93.3%
VQA & Extraction52 / 60
86.7%
Text Recognition25 / 30
83.3%
Focused Scene OCR77 / 99
77.8%
Handwritten Math4 / 10
40%

Scores based on a single evaluation run · Methodology

View all Vision Evals →

GPT-5.5 Pricing

GPT-5.5 costs $5.00 per 1M input tokens and $30.00 per 1M output tokens.

Input$5.00 / 1M tokens
Output$30.00 / 1M tokens
Cached input$0.500 / 1M tokens

Pricing updated Jun 28, 2026

Price vs. performance

Estimated cost per task vs. Visual Understanding score, for this model and others ranked near it. Upper-left is the sweet spot (high quality, low cost).

8 of 8 models plotted

ModelScoreMedian tokensEst. cost / taskCompare
GoogleGemini 3.5 Flash79.1%1.4K$0.0043Compare
OpenAIGPT-5.477.6%1.7K$0.0052Compare
OpenAIGPT-5.5(this model)77.6%1.7K$0.011
QwenQwen3.5 122B A10B76.1%1.2K$0.0003Compare
GoogleGemini 3.1 Pro75.8%1.1K$0.0024Compare
GoogleGemini 3 Flash74.6%1.4K$0.0014Compare
OpenAIGPT-5 Mini73.1%1.8K$0.0006Compare
QwenQwen3.5 27B71.6%1.2K$0.0002Compare

Alternatives to GPT-5.5

Other models worth comparing for similar use cases.

Anthropic
Claude Opus 4.8
Claude Opus 4.8 is Anthropic's most capable generally available large language model, released on May 28, 2026 as an incremental upgrade to Claude Opus 4.7. The model accepts text and image inputs and produces text outputs, with a 1 million token context window on the Claude API, Amazon Bedrock, and Google Cloud Vertex AI (200k tokens on Microsoft Foundry) and up to 128k max output tokens. It uses adaptive thinking and supports adjustable effort tiers — high by default, with extra and max tiers available for more demanding tasks. A fast mode operates at approximately 2.5x standard speed. The model is described by Anthropic as a hybrid reasoning model designed for advanced coding, agentic workflows, long-context reasoning, and professional knowledge work.Key behavioral improvements over Opus 4.7 include substantially reduced rates of unreported code flaws, improved honesty in self-assessment, and better tool-calling reliability. On Anthropic's Super-Agent benchmark, Opus 4.8 completes every case end-to-end, and it scores 84% on Online-Mind2Web for computer-use and browser-agent tasks. It achieves 88.6% on SWE-bench Verified and 69.2% on SWE-bench Pro. Alongside the model, Anthropic launched Dynamic Workflows in Claude Code (research preview), which enables Claude to orchestrate hundreds of parallel subagents for codebase-scale tasks such as large migrations. The Messages API was also updated to accept mid-task system messages without breaking prompt caching, improving support for long-running agentic pipelines.
Google
Gemini 3.1 Pro
Gemini 3.1 Pro is a proprietary multimodal model from Google’s Gemini 3 series, released in early 2026 and designed for advanced reasoning across large multimodal datasets. It accepts text, images, audio, video, and documents, supporting up to a 1-million-token input context with up to 64k output tokens. Compared with Gemini 3 Pro, it improves long-context synthesis and multi-step reasoning, enabling more reliable analysis of large documents, datasets, and software codebases.The model also advances visual understanding and grounding, allowing it to interpret UI screenshots, diagrams, and real-world scenes while referencing specific regions within images or video. These capabilities make Gemini 3.1 Pro well suited for multimodal workflows involving document processing, interface analysis, robotics research, and complex visual reasoning.
Grok
Grok 4
Grok 4, released by xAI on July 9, 2025, is the fourth-generation model in the Grok family and the most advanced to date. It is multimodal, supporting text, vision, tool use, and real-time web search, with a reported 256,000-token context window for long-form reasoning and document analysis. Its training data extends through November 2024, making it the most up-to-date Grok model at launch.The lineup includes Grok 4 Generalist for broad tasks, Grok 4 Heavy for higher-capacity reasoning, and Grok 4 Code optimized for programming and debugging. A notable feature is its always-on “Think” mode, designed for deeper multi-step reasoning. While xAI has not disclosed parameter counts, Grok 4 is positioned to compete with frontier models like GPT-5 and Claude 4, balancing real-time knowledge via web integration with structured tool use. It is best suited for coding, complex reasoning, and multimodal AI assistants.
Qwen
Qwen3.5 397B A17B
Qwen3.5-397B-A17B is a 397B-parameter (17B active) open-weight multimodal model developed by Alibaba’s Qwen team, released on 2026-02-16 under Apache-2.0. It supports text and image inputs with text outputs, combining a sparse Mixture-of-Experts architecture with Gated Delta Networks for efficient scaling. The model provides native vision-language reasoning and a large ~262K token context window, extendable to ~1M tokens.As the first open-weight release in the Qwen3.5 family, it positions itself as a high-capacity, long-context alternative in the large vision-language space, balancing scale and efficiency via sparse activation. It is designed for advanced reasoning, coding, agent workflows, and multimodal understanding tasks.
Anthropic
Claude Opus 4.7
Claude Opus 4.7 is a proprietary multimodal language model developed by Anthropic, released on April 16, 2026. It is designed for agentic coding, long-horizon task execution, and enterprise knowledge work. The model supports text and vision inputs and operates with a context window of up to 1,000,000 tokens. It introduces adaptive thinking, which dynamically allocates reasoning based on task complexity, along with configurable effort controls including a new xhigh setting that sits between the existing high and max levels. It achieves 87.6% on SWE-bench Verified and 78.0% on OSWorld-Verified, reflecting strong performance on autonomous software engineering and computer use tasks respectively.Compared to Claude Opus 4.6, version 4.7 shows improved instruction following and higher reliability in extended agentic tasks. Vision capabilities now support high-resolution inputs up to 2,576px on the long edge (~3.75 megapixels), more than three times the resolution of prior Claude models, enabling finer interpretation of dense diagrams, UI screenshots, and document layouts. These improvements, combined with self-verification on long-running tasks and a new task budget system for controlling agentic loops, make it well-suited for complex software engineering, technical analysis, and multimodal vision workflows.

Other OpenAI GPT models

Other versions in the same family as GPT-5.5.

GPT-5.5 License

Proprietary

License terms and commercial-use guidance for GPT-5.5.

License information is provided as a guide and is not legal advice.