Gemini 3.1 Flash-Lite is a natively multimodal reasoning model from Google DeepMind in the Gemini 3 series, based on the Gemini 3 Pro architecture. It processes text, image, video, audio, and PDF inputs within a 1 million token context window and produces text output up to 64K tokens. The model targets high-volume, latency-sensitive workloads and supports visual question answering, image and document data extraction, content moderation, classification, translation, automated speech recognition, and agentic data pipelines. It exposes configurable thinking levels of minimal, low, medium, and high, which set the depth of internal reasoning applied per request and let developers balance response quality against cost and latency.
On benchmarks reported at launch, Gemini 3.1 Flash-Lite scores 86.9% on GPQA Diamond and 76.8% on the MMMU Pro multimodal benchmark, and reaches an Elo score of 1432 on the Arena.ai leaderboard. According to Artificial Analysis benchmarks, it delivers a 2.5x faster time to first answer token and 45% higher output speed than Gemini 2.5 Flash. It also offers improved instruction following, higher audio input quality for automated speech recognition tasks, and structured JSON output for data extraction pipelines.
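When structured JSON output feeds a data extraction pipeline, the consuming code usually still checks each response before storing it. The following is a minimal sketch of that check using only the Python standard library. The invoice fields and the `parse_extraction` helper are hypothetical names chosen for illustration; they are not part of any Gemini API, and the sample string stands in for a model response.

```python
import json

# Hypothetical schema for an invoice-extraction task (an assumption for this
# sketch, not something defined by the model or its API).
REQUIRED_FIELDS = {
    "invoice_number": str,
    "total": (int, float),
    "currency": str,
}


def parse_extraction(raw: str) -> dict:
    """Parse a JSON string and check it against REQUIRED_FIELDS."""
    record = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object at the top level")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {name}")
    return record


# Stand-in for a model response; a real pipeline would pass the response text.
sample = '{"invoice_number": "INV-001", "total": 129.5, "currency": "USD"}'
record = parse_extraction(sample)
print(record["invoice_number"], record["total"], record["currency"])
```

Rejecting a record early, rather than coercing it, keeps malformed extractions out of downstream storage and makes retries straightforward.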

| Category | Passed | Score |
|---|---|---|
| Spatial Understanding | 15 / 19 | 78.9% |
| Document Understanding | 7 / 9 | 77.8% |
| Defect Detection | 11 / 15 | 73.3% |
| Object Understanding | 9 / 14 | 64.3% |
| Object Counting | 3 / 10 | 30.0% |
Scores are based on a single evaluation run.
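Each score in the table is the passed count divided by the total for that category. The short sketch below recomputes those percentages from the listed counts and also derives a pooled pass rate across all categories. The pooled figure is not reported in the source; it is a derived number shown only to illustrate the arithmetic, and the names `VISION_EVALS` and `pass_rate` are illustrative.

```python
# Passed / total counts copied from the table above.
VISION_EVALS = {
    "Spatial Understanding": (15, 19),
    "Document Understanding": (7, 9),
    "Defect Detection": (11, 15),
    "Object Understanding": (9, 14),
    "Object Counting": (3, 10),
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    return round(100 * passed / total, 1)


for category, (passed, total) in VISION_EVALS.items():
    print(f"{category}: {pass_rate(passed, total)}%")

# Pooled rate over all test cases (derived here, not stated in the source).
total_passed = sum(passed for passed, _ in VISION_EVALS.values())
total_cases = sum(total for _, total in VISION_EVALS.values())
print(f"Pooled: {total_passed} / {total_cases} = {pass_rate(total_passed, total_cases)}%")
```

Because the categories have different sizes and the scores come from a single run, the pooled rate weights larger categories more heavily and should be read as a rough summary only.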
Gemini 3.1 Flash-Lite costs $0.25 per 1M input tokens and $1.50 per 1M output tokens.
Pricing updated Jun 21, 2026
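To turn the per-token prices into a budget estimate, multiply each token count by its rate and divide by one million. The sketch below does this for a hypothetical workload of 10,000 requests with 2,000 input tokens and 300 output tokens each. The workload numbers and the `estimate_cost_usd` helper are assumptions for illustration, and the calculation ignores any caching, batch, or tiered pricing that may apply.

```python
# Listed prices in USD per 1M tokens (from the pricing line above).
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD from total input and output token counts."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical workload: 10,000 requests, 2,000 input and 300 output tokens each.
requests = 10_000
cost = estimate_cost_usd(requests * 2_000, requests * 300)
print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $9.50"
```

In this example the 3M output tokens cost nearly as much as the 20M input tokens ($4.50 versus $5.00), because output is priced at six times the input rate.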
License terms and commercial-use guidance for Gemini 3.1 Flash-Lite.
License information is provided as a guide and is not legal advice.