AI Vision Model Leaderboard
Updated 8 minutes ago
Explore top-performing models across computer vision tasks. Compare vote-based scores and average latency to find the best model for each task.
Votes power leaderboards.
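The page does not say how votes become scores. Scores clustered around 1200 are typical of Elo-style ratings, so the sketch below assumes a standard Elo update driven by pairwise "A beat B" votes. The K-factor of 32 and the 400-point scale are conventional Elo constants, not values confirmed for this leaderboard, and the function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Apply one pairwise vote and return the new (rating_a, rating_b).

    Assumption: a plain Elo update with K-factor `k`; the real leaderboard
    may use different constants or a different rating model entirely.
    """
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    # The update is zero-sum: whatever A gains, B loses.
    delta = k * (s_a - e_a)
    return rating_a + delta, rating_b - delta
```

For example, `elo_update(1200, 1200, True)` returns `(1216.0, 1184.0)`: two equally rated models split a 32-point swing. An upset win over a higher-rated model moves both ratings further than an expected win does.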
Object Detection
Models that detect and localize objects in images.
Rank | Model | Score | Delta | Avg Latency |
---|---|---|---|---|
1 | Florence-2 | 1366 | +8 | 4.16 s |
2 | Gemini 1.5 Pro | 1303 | 0 | 5.46 s |
3 | Yolo World | 1257 | +3 | 2.59 s |
4 | Gemini 1.5 Flash | 1245 | -8 | 3.66 s |
5 | Claude 3.5 Sonnet | 1237 | +17 | 3.91 s |
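The leaderboard reports score and latency side by side but does not combine them. One way to weigh the two is to keep only the models that no other model beats on both axes (a Pareto front). The sketch below applies that idea to the Object Detection rows above; the `Entry` type and `pareto_front` helper are hypothetical, and scores are only treated as comparable within a single task.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    model: str
    score: int        # leaderboard score (higher is better)
    latency_s: float  # average latency in seconds (lower is better)


def pareto_front(entries: List[Entry]) -> List[Entry]:
    """Return the entries that no other entry dominates.

    An entry is dominated if another entry has a score at least as high and a
    latency at least as low, and is strictly better on at least one of the two.
    """
    front = []
    for e in entries:
        dominated = any(
            o.score >= e.score
            and o.latency_s <= e.latency_s
            and (o.score > e.score or o.latency_s < e.latency_s)
            for o in entries
        )
        if not dominated:
            front.append(e)
    return front


# Rows copied from the Object Detection table.
object_detection = [
    Entry("Florence-2", 1366, 4.16),
    Entry("Gemini 1.5 Pro", 1303, 5.46),
    Entry("Yolo World", 1257, 2.59),
    Entry("Gemini 1.5 Flash", 1245, 3.66),
    Entry("Claude 3.5 Sonnet", 1237, 3.91),
]

for e in pareto_front(object_detection):
    print(e.model, e.score, e.latency_s)
```

On these rows only Florence-2 (highest score) and Yolo World (lowest latency) survive: Florence-2 is both higher scoring and faster than Gemini 1.5 Pro, and Yolo World beats the remaining two models on both axes.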
Classification
Models that classify images into categories.
Rank | Model | Score | Delta | Avg Latency |
---|---|---|---|---|
1 | Claude 3 Opus | 1247 | +12 | 3.03 s |
2 | Gemini 2.0 Flash Exp | 1235 | +13 | 4.39 s |
3 | Claude 3.5 Sonnet | 1232 | 0 | 4.20 s |
4 | Gemini 1.5 Pro | 1229 | -13 | 3.41 s |
5 | Gemini 1.5 Flash | 1209 | 0 | 4.69 s |
OCR
Models that extract text from images.
Rank | Model | Score | Delta | Avg Latency |
---|---|---|---|---|
1 | GPT-4o mini | 1231 | -14 | 8.00 s |
2 | Mistral Medium 3.1 | 1224 | +11 | 15.02 s |
3 | Gemini 1.5 Flash | 1224 | +12 | 2.71 s |
4 | Claude 4 Opus | 1223 | +12 | 6.31 s |
5 | Gemma 3 27B | 1223 | 0 | 2.62 s |
Captioning
Models that generate descriptive captions for images.
Rank | Model | Score | Delta | Avg Latency |
---|---|---|---|---|
1 | Gemma 3 4B | 1224 | +12 | 5.55 s |
2 | Gemini 2.0 Flash Exp | 1223 | +12 | 3.60 s |
3 | Gemma 3 12B | 1212 | +12 | 7.88 s |
4 | Gemini 1.5 Pro | 1212 | +12 | 5.63 s |
5 | Pixtral 12B | 1212 | +12 | 3.71 s |
Open Prompt
Models that interpret free-form prompts on images.
Rank | Model | Score | Delta | Avg Latency |
---|---|---|---|---|
1 | Gemini 2.5 Flash | 1235 | +12 | 3.33 s |
2 | Claude 3 Opus | 1235 | +11 | 5.81 s |
3 | Gemini 1.5 Pro | 1224 | +12 | 8.28 s |
4 | Gemini 1.5 Flash | 1224 | +12 | 2.51 s |
5 | Llama 4 Maverick | 1212 | +12 | 1.95 s |