OpenAI: GPT-4.1

This model is deprecated

GPT-4.1 and can no longer be run here. Its evaluation results and details remain available for reference. Try GPT-5.5 instead.

GPT-4.1 Overview

GPT-4.1, released by OpenAI in April 2025, is a multimodal large language model that advances the GPT-4 series with major improvements in coding, reasoning, and instruction following. It accepts both text and images, supports tool calling and structured outputs, and features an expanded context window of up to ~1 million tokens—enabling it to process very large documents, multi-file codebases, or long conversations in a single prompt. Its knowledge is current through June 2024.

The GPT-4.1 family includes standard, mini, and nano variants, offering trade-offs between performance, cost, and latency. While parameter counts remain undisclosed, the series improves efficiency and responsiveness compared to GPT-4, making it suitable for both enterprise-scale tasks and cost-sensitive applications. Common use cases include software development, technical research, knowledge management, multimodal analysis, and high-context enterprise assistants.

GPT-4.1 Details & Performance

Details

Resources

—

Vision Tasks

Vision LanguageObject DetectionClassificationOCRVisual Question AnsweringCaptioning

Features

Foundation VisionLLMs with Vision CapabilitiesMultimodal Vision

Usage

Past 30 Days

Not available

Not in Playground

Performance

Avg. Latency

Arena Rankings

GPT-4.1 Vision Evals

#17 of 70 models|

Pass/fail results across 67 image tasks

Overall Score70.15%across 67 eval prompts

Prompts Passed47 / 675 task categories

Avg Response Time2.56son eval prompts

Median tokens / task891 in · 6 out~$0.0018 / task · 67/67 tasks

Score key:≥75%40–74%<40%

Category	Passed	Score
Spatial Understanding	16 / 19	84.2%
Defect Detection	12 / 15	80%
Object Understanding	11 / 14	78.6%
Document Understanding	6 / 9	66.7%
Object Counting	2 / 10	20%

Scores based on single evaluation run · Methodology

View all Vision Evals →

GPT-4.1 Pricing

GPT-4.1 costs $2.00 per 1M input tokens and $8.00 per 1M output tokens.

Input$2.00 / 1M tokens

Output$8.00 / 1M tokens

Cached input$0.500 / 1M tokens

Pricing updated Jun 22, 2026

Alternatives to GPT-4.1

Other models worth comparing for similar use cases.

GPT-5, released by OpenAI in August 2025, is a multimodal large language model that advances beyond the GPT-4 family with a new “unified system” architecture. This design allows the model to dynamically choose between fast responses and extended reasoning depending on task complexity. It supports text, code, and images, alongside stronger tool use and agentic workflows, making it more adaptable for real-world problem solving. While its exact context window size is not disclosed, GPT-5 is optimized for long-horizon reasoning and multi-step tool chaining, indicating substantially expanded capacity over its predecessors.The release introduced specialized variants: GPT-5 Pro, offering extended reasoning for complex workflows, and GPT-5 Codex, optimized for advanced coding tasks such as large-scale refactoring and code review. GPT-5 shows benchmark gains in coding, biomedical reasoning, multimodal analysis, and scientific tasks. Developers also gain new controls, such as verbosity and personalization parameters, for greater steerability. With these improvements, GPT-5 positions itself as OpenAI’s most capable and versatile model, suited for enterprise automation, research, healthcare, and sophisticated coding environments.

GPT-5.1 is an OpenAI frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, clearer long-form responses, and improved instruction following. It introduces two variants—Instant and Thinking—that dynamically adjust computational depth. Instant focuses on fast, conversational replies, while Thinking provides deeper, more thorough reasoning for complex tasks. In ChatGPT, GPT-5.1 also powers an Auto mode that switches between these variants automatically based on task difficulty.The model supports significantly expanded context windows: up to 16K/32K/128K tokens for Instant (depending on tier) and up to 196K tokens for Thinking on paid tiers. GPT-5.1 is also compatible with ChatGPT tools such as web search, file and image analysis, and multi-step workflows.GPT-5.1 includes enhanced tone and style controls, allowing responses to be tailored using presets like Friendly, Professional, or Efficient, along with fine-grained adjustments for warmth, brevity, and emoji usage. Designed for broad applications in research assistance, coding, analysis, and conversational agents, GPT-5.1 serves as OpenAI’s primary full-capability successor to GPT-5 across ChatGPT and API integrations.

Gemini 2.5 Pro, released on June 17, 2025, is Google DeepMind’s most capable model in the Gemini 2.5 family, optimized for deep reasoning, coding, and complex multimodal tasks. It accepts text, images, audio, video, and PDFs as input and outputs text. The model supports 1 million input tokens with an output capacity of up to 65K tokens, enabling large-scale comprehension of datasets, codebases, and technical documents. Its training knowledge extends to January 2025.Pro outperforms earlier Gemini 2.0 models across benchmarks, including agentic coding tasks where it achieved ~63.8% on SWE-Bench Verified. It supports structured outputs, function calling, code execution, search grounding, and URL context, making it well-suited for enterprise, STEM, and developer workflows. However, it does not currently support image or audio generation in its stable release, and its higher computational cost and latency make it less efficient than Flash or Flash-Lite. It is available via the Gemini API, Google AI Studio, and Vertex AI.

Claude Sonnet 4

Claude 4 Sonnet, released by Anthropic in May 2025, is the mid-tier model in the Claude 4 family, designed to balance capability, cost, and speed. It is multimodal, accepting both text and images, and extends beyond prior versions with improved “computer use” support, allowing API-driven interaction with desktop-like interfaces. By default, it supports 200,000 tokens of context, but as of August 2025, it also offers a 1 million-token context window in public beta—making it one of the most context-capable models available for processing entire codebases or large document sets in a single request.Sonnet 4 is significantly cheaper than the flagship Opus while still demonstrating strong reasoning, coding, and instruction-following ability with reduced hallucinations. Its extended context capabilities and lower latency make it well-suited for enterprise-scale knowledge management, software development, research assistants, and productivity automation where both cost efficiency and high reliability are essential.

Gemini 2.5 Flash

Gemini 2.5 Flash, released on June 17, 2025, is Google DeepMind’s production-ready, efficiency-focused model in the Gemini 2.5 family. It is multimodal, accepting text, images, video, and audio as inputs, with text as the primary output format. The model supports 1 million input tokens and up to 65K output tokens, enabling it to process very large contexts such as books, long video transcripts, or extensive datasets. Its training knowledge extends to January 2025.Designed as a price-performance leader, Gemini 2.5 Flash balances speed and reasoning power, making it suitable for everyday enterprise and developer use cases without the higher latency and cost of Pro models. It supports advanced workflows like function calling, code execution, search grounding, URL context ingestion, and structured outputs. While efficient and scalable, output length is still limited compared to its input capacity, and multimodal outputs (e.g. image or audio generation) remain restricted to specialized or preview variants.

GPT-4.1 License

Proprietary

License terms and commercial-use guidance for GPT-4.1.

License information is provided as a guide and is not legal advice.