Gemma 4 12B is an open-weight multimodal model from Google in the Gemma 4 family. It is intended for text and image understanding tasks such as visual question answering, OCR, captioning, and document understanding, with a smaller parameter footprint than the larger Gemma 4 variants.
This entry is connected to Roboflow Playground vision evals for comparison. No runnable Playground workflow is configured yet, so the model page is used for discovery and benchmark context rather than direct hosted inference.
Usage
Past 30 Days: Not available
Not in Playground
Not yet ranked in arena
| Category | Passed | Score |
|---|---|---|
| Document Understanding | 8 / 9 | 88.9% |
| Object Understanding | 11 / 14 | 78.6% |
| Defect Detection | 11 / 15 | 73.3% |
| Spatial Understanding | 11 / 19 | 57.9% |
| Object Counting | 1 / 10 | 10.0% |
Scores are based on a single evaluation run.
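For readers who want to check the arithmetic, the sketch below recomputes each category score as passed / total, rounded to one decimal place. The pooled figure at the end is an illustrative assumption and not a score published on this page: it simply sums passes and tasks across the five categories. The names `results`, `pass_rate`, and `pooled` are hypothetical.

```python
# (passed, total) pairs copied from the table above.
results = {
    "Document Understanding": (8, 9),
    "Object Understanding": (11, 14),
    "Defect Detection": (11, 15),
    "Spatial Understanding": (11, 19),
    "Object Counting": (1, 10),
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


# Per-category scores, matching the Score column.
scores = {name: pass_rate(p, t) for name, (p, t) in results.items()}

# Pooled rate across all tasks. This is an unofficial, derived number:
# the page does not say how (or whether) an overall score is computed.
total_passed = sum(p for p, _ in results.values())
total_tasks = sum(t for _, t in results.values())
pooled = pass_rate(total_passed, total_tasks)

for name, score in scores.items():
    print(f"{name}: {score}%")
print(f"Pooled (unofficial): {total_passed} / {total_tasks} = {pooled}%")
```

Because categories have between 9 and 19 tasks each and the scores come from a single run, one task flipping changes a category score by roughly 5 to 11 percentage points, so small gaps between categories should be read with caution.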
License terms and commercial-use guidance for Gemma 4 12B.
License information is provided as a guide and is not legal advice.