AI Vision Model Leaderboard

Explore top-performing models across computer vision tasks. Compare accuracy, speed, and user votes to find the best AI models.

Votes power leaderboards.
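
The page does not say how votes are turned into scores. One common scheme for vote-driven leaderboards is an Elo-style update, where each vote is a head-to-head comparison between two models. The sketch below is illustrative only and assumes that scheme; the function names, the 400-point scale, and the K-factor of 32 are hypothetical and are not taken from this leaderboard.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def apply_vote(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the updated (winner, loser) ratings after a single vote.

    k is an assumed K-factor; a real leaderboard may use a different value
    or a different rating system altogether.
    """
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain
```

Under these assumptions, two models that both start at 1200 would move to 1216 and 1184 after one vote, and the points gained by the winner always equal the points lost by the loser.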

Object Detection

Models that detect and localize objects in images.

Rank  Model                          Score  Delta  Avg Latency
1     Florence-2 (Azure)             1366   +8     4.16 s
2     Gemini 1.5 Pro (Google)        1303   0      5.46 s
3     Yolo World                     1257   +3     2.59 s
4     Gemini 1.5 Flash (Google)      1245   -8     3.66 s
5     Claude 3.5 Sonnet (Anthropic)  1237   +17    3.91 s

Classification

Models that classify images into categories.

Rank  Model                          Score  Delta  Avg Latency
1     Claude 3 Opus (Anthropic)      1247   +12    3.03 s
2     Gemini 2.0 Flash Exp (Google)  1235   +13    4.39 s
3     Claude 3.5 Sonnet (Anthropic)  1232   0      4.20 s
4     Gemini 1.5 Pro (Google)        1229   -13    3.41 s
5     Gemini 1.5 Flash (Google)      1209   +0     4.69 s

OCR

Models that extract text from images.

Rank  Model                          Score  Delta  Avg Latency
1     GPT-4o mini (OpenAI)           1231   -14    8.00 s
2     Mistral Medium 3.1 (Mistral)   1224   +11    15.02 s
3     Gemini 1.5 Flash (Google)      1224   +12    2.71 s
4     Claude 4 Opus (Anthropic)      1223   +12    6.31 s
5     Gemma 3 27B (Google)           1223   0      2.62 s

Captioning

Models that generate descriptive captions for images.

Rank  Model                          Score  Delta  Avg Latency
1     Gemma 3 4B (Google)            1224   +12    5.55 s
2     Gemini 2.0 Flash Exp (Google)  1223   +12    3.60 s
3     Gemma 3 12B (Google)           1212   +12    7.88 s
4     Gemini 1.5 Pro (Google)        1212   +12    5.63 s
4     Pixtral 12B (Mistral)          1212   +12    3.71 s

Open Prompt

Models that interpret free-form prompts on images.

Rank  Model                          Score  Delta  Avg Latency
1     Gemini 2.5 Flash (Google)      1235   +12    3.33 s
2     Claude 3 Opus (Anthropic)      1235   +11    5.81 s
3     Gemini 1.5 Pro (Google)        1224   +12    8.28 s
4     Gemini 1.5 Flash (Google)      1224   +12    2.51 s
5     Llama 4 Maverick (Meta)        1212   +12    1.95 s