This model is deprecated
Llama 3.2 Vision 90b has been deprecated and can no longer be run here. Its evaluation results and details remain available for reference.
Llama 3.2 Vision 90B, released by Meta AI on September 25, 2024, is the largest vision-capable model in the Llama 3.2 family. With about 90 billion parameters (~88.8B) and a 128,000-token context window, it is designed for high-performance multimodal reasoning over images and text, while producing only text outputs. The model was trained on ~6 billion image–text pairs and instruction-tuned (SFT + RLHF), with a knowledge cutoff of December 2023.
It handles tasks such as visual question answering, captioning, and image-grounded reasoning, and achieves strong benchmark performance compared to both open and proprietary models. The model officially supports only English for multimodal (image+text) tasks, while text-only inputs are supported in eight languages (including German, French, Hindi, and Spanish). Because of its large parameter count, it requires substantial compute resources, but it is accessible via cloud providers like Amazon Bedrock, Oracle Cloud, and Azure AI Foundry. While highly capable, it is limited to text-only outputs and has narrower language support for image-based inputs.
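To give a rough sense of what "substantial compute resources" means, the sketch below estimates the memory needed for the model weights alone, using the ~88.8B parameter count quoted above. The storage formats and bytes-per-parameter values are illustrative assumptions, not figures from Meta or any provider, and the estimate ignores activations, the KV cache, and runtime overhead, so real requirements would be higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


PARAMS = 88.8e9  # parameter count quoted in the description above

# Assumed storage formats (hypothetical choices for illustration only).
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

estimates = {
    name: weight_memory_gb(PARAMS, nbytes)
    for name, nbytes in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB for weights only")
```

Under these assumptions, 16-bit weights alone come to roughly 178 GB, which is more than a single commonly available accelerator holds and is one reason hosted access is the usual route for a model of this size.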
Usage
Past 30 Days: Not available
Other models worth comparing for similar use cases.
License terms and commercial-use guidance for Llama 3.2 Vision 90b.
License information is provided as a guide and is not legal advice.