Captioning Model Rankings
Updated 2 minutes ago
Votes power the rankings.
Top Model Scores
ELO ratings for the highest performing models
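The page does not say how votes are turned into ratings. As a rough illustration only, the standard Elo update for a single head-to-head vote could look like the sketch below. The K-factor of 32, the 400-point scale, and the function names `expected_score` and `update_elo` are assumptions made for this example, not details taken from the leaderboard.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one vote (k=32 is an assumed K-factor)."""
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    # The winner gains exactly what the loser gives up, so the total is conserved.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# Two equally rated models; a voter prefers the first caption.
new_a, new_b = update_elo(1200.0, 1200.0, a_won=True)
print(new_a, new_b)  # 1216.0 1184.0
```

Under this scheme an upset (a lower-rated model winning the vote) moves both ratings more than an expected result does.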
Performance vs Accuracy
ELO score vs average latency • Better models are top-left
| Rank | Type | ELO | | Avg. latency | Provider |
|---|---|---|---|---|---|
| 1 | multimodal | 1278 | 5 | 21.10 s | |
| 2 | multimodal | 1246 | 3 | 11.54 s | Qwen |
| 3 | multimodal | 1245 | 5 | 10.85 s | |
| 4 | multimodal | 1245 | 3 | 12.32 s | Qwen |
| 5 | multimodal | 1236 | 5 | 12.21 s | Anthropic |
| 6 | multimodal | 1223 | 3 | 8.91 s | |
| 7 | multimodal | 1223 | 2 | 28.26 s | Qwen |
| 8 | multimodal | 1222 | 3 | 22.27 s | Qwen |
| 9 | multimodal | 1221 | 3 | 10.40 s | |
| 10 | multimodal | 1213 | 5 | 7.16 s | |
| 11 | multimodal | 1212 | 5 | 10.50 s | OpenAI |
| 12 | multimodal | 1212 | 5 | 5.13 s | Anthropic |
| 13 | multimodal | 1212 | 3 | 9.46 s | |
| 14 | multimodal | 1211 | 4 | 17.18 s | |
| 15 | multimodal | 1211 | 3 | 6.57 s | Qwen |
| 16 | multimodal | 1203 | 3 | 14.47 s | xAI |
| 17 | multimodal | 1201 | 3 | 3.75 s | Meta |
| 18 | multimodal | 1200 | 2 | 9.80 s | OpenAI |
| 19 | multimodal | 1200 | 5 | 5.36 s | Anthropic |
| 20 | multimodal | 1200 | 3 | 15.05 s | Mistral |
| 21 | multimodal | 1200 | 5 | 10.91 s | Anthropic |
| 22 | multimodal | 1200 | 5 | 12.71 s | OpenAI |
| 23 | multimodal | 1199 | 4 | 70.55 s | |
| 24 | multimodal | 1198 | 4 | 6.71 s | |
| 25 | multimodal | 1192 | 5 | 18.48 s | OpenAI |
| 26 | multimodal | 1190 | 5 | 6.20 s | Anthropic |
| 27 | multimodal | 1190 | 3 | 4.09 s | Mistral |
| 28 | multimodal | 1189 | 3 | 3.35 s | Qwen |
| 29 | multimodal | 1188 | 4 | 17.97 s | Qwen |
| 30 | multimodal | 1188 | 3 | 23.40 s | Qwen |
| 31 | multimodal | 1188 | 3 | 6.00 s | Mistral |
| 32 | multimodal | 1188 | 3 | 14.42 s | Qwen |
| 33 | multimodal | 1188 | 5 | 9.79 s | OpenAI |
| 34 | multimodal | 1186 | 5 | 8.95 s | Anthropic |
| 35 | multimodal | 1184 | 4 | 11.87 s | Meta |
| 36 | multimodal | 1179 | 5 | 2.10 s | |
| 37 | multimodal | 1178 | 5 | 6.34 s | Anthropic |
| 38 | multimodal | 1168 | 3 | 5.63 s | Meta |
| 39 | multimodal | 1154 | 3 | 6.34 s | Qwen |
| 40 | multimodal | 1144 | 3 | 5.94 s | Microsoft |
| 41 | multimodal | 1133 | 5 | 17.11 s | OpenAI |