Anthropic: Claude 3.5 Haiku

This model is deprecated

Claude 3.5 Haiku is deprecated and can no longer be run here. Its evaluation results and details remain available for reference. Try Claude Haiku 4.5 instead.

Claude 3.5 Haiku Overview

Claude 3.5 Haiku, released by Anthropic in October 2024, is the fastest member of the Claude 3.5 family, optimized for low-latency, high-throughput applications. It is a multimodal model that handles both text and image inputs and supports a large ~200,000-token context window. Haiku is designed to balance efficiency with intelligence, outperforming even Claude 3 Opus on several reasoning benchmarks while maintaining its hallmark speed.

Typical applications include real-time chatbots, code completion, large-scale data extraction, and content moderation—scenarios where rapid response and scalability are essential.

Claude 3.5 Haiku Details & Performance

Details

Vision Tasks

Vision Language, Object Detection, Classification, OCR, Visual Question Answering, Captioning

Features

Foundation Vision, LLMs with Vision Capabilities, Multimodal Vision

Usage

Past 30 Days: Not available

Not in Playground

Performance

Claude 3.5 Haiku Vision Evals

#35 of 70 models

Pass/fail results across 67 image tasks

Overall Score: 62.69% across 67 eval prompts

Prompts Passed: 42 / 67 across 5 task categories

Avg Response Time: 18.36 s on eval prompts

Score key: ≥75% · 40–74% · <40%

Category	Passed	Score
Document Understanding	7 / 9	77.8%
Object Understanding	10 / 14	71.4%
Defect Detection	10 / 15	66.7%
Spatial Understanding	10 / 19	52.6%
Object Counting	5 / 10	50%

Scores are based on a single evaluation run.
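
The headline figures can be reproduced from the category table. The following is a minimal sketch, assuming the overall score is simply total prompts passed divided by total prompts; the band names `high`, `medium`, and `low` are hypothetical labels for the three ranges in the score key, and treating everything from 40% up to (but not including) 75% as the middle band is also an assumption.

```python
# Pass counts copied from the category table: (passed, total).
RESULTS = {
    "Document Understanding": (7, 9),
    "Object Understanding": (10, 14),
    "Defect Detection": (10, 15),
    "Spatial Understanding": (10, 19),
    "Object Counting": (5, 10),
}


def score(passed: int, total: int) -> float:
    """Pass rate expressed as a percentage."""
    return 100.0 * passed / total


def band(pct: float) -> str:
    """Map a percentage to the score key (band names are hypothetical)."""
    if pct >= 75:
        return "high"
    if pct >= 40:
        return "medium"
    return "low"


def overall(results: dict) -> tuple:
    """Return (passed, total, percentage) summed over all categories."""
    passed = sum(p for p, _ in results.values())
    total = sum(t for _, t in results.values())
    return passed, total, score(passed, total)


if __name__ == "__main__":
    for name, (p, t) in RESULTS.items():
        pct = score(p, t)
        print(f"{name}: {p}/{t} = {pct:.1f}% ({band(pct)})")
    p, t, pct = overall(RESULTS)
    print(f"Overall: {p}/{t} = {pct:.2f}%")
```

Summing the table gives 42 of 67 prompts, which matches the reported overall score when rounded to two decimals. Only Document Understanding falls in the top band; the other four categories sit in the middle band.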

Alternatives to Claude 3.5 Haiku

Other models worth comparing for similar use cases.

Claude Haiku 4.5

Claude Haiku 4.5 is Anthropic’s lightweight model in the Claude 4.5 series, released in October 2025 under a proprietary license. Designed for speed and cost efficiency, it delivers near-frontier performance while maintaining Anthropic’s AI Safety Level 2 standard. Haiku 4.5 supports both text and multimodal (text and image) inputs, integrates tool use and extended reasoning, and features a 200,000 token context window, making it adept at handling long or complex workflows. Though the parameter count remains undisclosed, it achieves about 73.3% on SWE-bench Verified, reflecting strong coding and reasoning ability. Haiku 4.5 is ideal for developers and researchers seeking rapid, cost-effective model calls for analysis, coding, or multimodal understanding.

GPT-5 Nano

GPT-5 Nano, released by OpenAI on August 7, 2025, is the smallest and most cost-efficient model in the GPT-5 family. Like its larger counterparts, it is multimodal (accepting text and images and supporting tool use, structured outputs, and reasoning), but it is optimized for speed, low latency, and affordability. It features input and output token limits of roughly 272K and 128K tokens respectively, enabling large-context processing even at its compact scale. Its knowledge cutoff is around May 2024, slightly earlier than the full GPT-5 model.

GPT-5 Nano is well-suited for high-volume or cost-sensitive deployments such as mobile apps, embedded AI systems, or rapid-response APIs. While it offers less depth on complex reasoning and coding tasks compared to GPT-5 Mini or Pro, it retains core multimodal and agentic capabilities, making it an attractive option where efficiency and scale matter more than maximum performance.

GPT-5 Mini

GPT-5 Mini, released by OpenAI on August 7, 2025, is a mid-tier variant of the GPT-5 family that balances cost, speed, and capability. It is multimodal, supporting both text and image inputs, and offers a substantial context window of ~400,000 tokens with output lengths up to ~128,000 tokens. While less powerful than the full GPT-5, it inherits its safety tuning, instruction-following improvements, and multimodal reasoning, making it a practical choice for developers who need large context handling without the expense of premium models.

GPT-5 Mini is optimized for affordability while retaining strong reasoning performance. Benchmarks show it outperforming earlier models such as GPT-4o on many multimodal and medical VQA tasks, though it lags behind GPT-5 on the most complex problems. Ideal use cases include prototyping, scalable content generation, document analysis, and mid-range reasoning tasks where efficiency and context capacity matter more than top-tier accuracy.

Gemini 2.5 Flash-Lite

Gemini 2.5 Flash-Lite, released for general availability on July 22, 2025, is the most cost-efficient model in the Gemini 2.5 family, designed for high-volume and latency-sensitive tasks. It is multimodal, supporting text, images, video, audio, and PDFs as inputs, with text as its primary output. The model handles up to 1 million input tokens and generates outputs up to 64K tokens, making it suitable for large-scale document or media processing at low cost. It is built on a Sparse Mixture-of-Experts architecture with native multimodal support, though exact parameter counts are undisclosed.

Flash-Lite offers the lowest usage cost among Gemini 2.5 models. It introduces developer controls for “thinking mode,” letting developers trade reasoning depth against efficiency. It also integrates native tools such as code execution, search grounding, and URL context. While strong on translation, classification, coding, and general multimodal reasoning, it lacks support for image or audio generation in its stable release and is less capable than Gemini 2.5 Flash or Pro on complex reasoning-heavy workflows.

Qwen2.5 VL 7B Instruct

Qwen2.5-VL-7B-Instruct is a 7-billion parameter vision-language model from Alibaba’s QwenLM team, released on January 26, 2025 under the Apache 2.0 license. It is the instruction-tuned variant of the 7B scale in the Qwen2.5-VL family, designed to process multimodal inputs such as text, images, charts, documents, and video. The model enables structured outputs—including JSON for structured content and bounding boxes for visual localization. Weights are publicly available on Hugging Face and GitHub, making it suitable for both research and applied multimodal use.
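
As an illustration of consuming that kind of structured output, here is a minimal sketch that parses a JSON list of labelled bounding boxes from a model response. The field names `bbox_2d` and `label`, the `(x1, y1, x2, y2)` pixel ordering, and the possibility of a Markdown fence around the JSON are assumptions made for this example, not a format confirmed by this page; `Detection` and `parse_detections` are hypothetical helper names.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker that may wrap the JSON


@dataclass
class Detection:
    label: str
    box: tuple  # (x1, y1, x2, y2), assumed to be pixel coordinates


def parse_detections(raw: str) -> list:
    """Parse a JSON list of detections, tolerating a Markdown fence around it."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then everything from the closing fence on.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    detections = []
    for item in json.loads(text):
        x1, y1, x2, y2 = item["bbox_2d"]  # assumed field name
        if x2 <= x1 or y2 <= y1:
            continue  # skip boxes with no area
        detections.append(Detection(item["label"], (x1, y1, x2, y2)))
    return detections


if __name__ == "__main__":
    sample = '[{"bbox_2d": [10, 20, 110, 220], "label": "cat"}]'
    for det in parse_detections(sample):
        print(det.label, det.box)
```

Validating the boxes before use is a design choice rather than a requirement: generated JSON can be syntactically valid while still containing coordinates that make no geometric sense.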

Moondream 2

Moondream 2 is a small open-source vision-language model from Moondream, the company founded by Vikhyat Korrapati. It was first released in early 2024 and updated through mid-2025. At approximately 1.9 billion parameters, it is designed to run efficiently on consumer hardware such as laptops and edge devices while supporting a practical range of multimodal tasks. Moondream 2 combines a vision encoder based on SigLIP with a compact language backbone, trained for image understanding tasks rather than as a general chat model.

The model accepts an image paired with a natural language prompt and produces text responses, supporting visual question answering, image captioning, and image-conditioned dialogue. Later Moondream 2 releases added object localization through a point API that returns coordinates for queried objects, along with improvements to OCR, counting, and document understanding. Moondream 2 is distributed under the Apache 2.0 license and is available through Hugging Face and the maintainer's distribution. Because the model is updated frequently, production deployments should pin to a specific revision rather than tracking the latest release.

A successor model, Moondream 3 (Preview), was released in September 2025 with a 9B mixture-of-experts architecture and 2B active parameters, offering substantially stronger visual reasoning than Moondream 2 while retaining the efficiency-focused design. A referring expression segmentation extension to Moondream 3 was released in March 2026.
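
The advice to pin Moondream 2 to a specific revision can be sketched with the `revision` argument of `from_pretrained` in the Hugging Face `transformers` library. This is a hedged example: the repository id `vikhyatk/moondream2` and the need for `trust_remote_code=True` are assumptions, the helper names are hypothetical, and the tag shown in the usage comment is a placeholder to be replaced with a tag or commit hash that actually exists in the repository.

```python
MODEL_ID = "vikhyatk/moondream2"  # assumed Hugging Face repo id

# Names that move over time and therefore do not count as a pin.
FLOATING = {"", "main", "latest"}


def pinned_kwargs(revision: str) -> dict:
    """Build from_pretrained keyword arguments, rejecting floating revisions."""
    if revision in FLOATING:
        raise ValueError("pin an explicit tag or commit hash, not a moving branch")
    # trust_remote_code is assumed to be needed for this repo's custom model code.
    return {"revision": revision, "trust_remote_code": True}


def load_model(revision: str):
    """Load the model at a fixed revision (downloads weights on first use)."""
    # Imported lazily so pinned_kwargs works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(MODEL_ID, **pinned_kwargs(revision))


# Usage (placeholder tag; check the repository for real ones):
# model = load_model("2025-01-09")
```

Recording the pinned value next to the deployment configuration keeps evaluation results reproducible when the upstream repository publishes a new release.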

Claude 3.5 Haiku License

Proprietary

License terms and commercial-use guidance for Claude 3.5 Haiku.

License information is provided as a guide and is not legal advice.