DINOv2 is a self-supervised vision foundation model released in April 2023 by Meta AI's FAIR lab. It produces general-purpose visual features that transfer to a wide range of downstream tasks, including image classification, semantic segmentation, depth estimation, and image retrieval, without task-specific fine-tuning of the backbone. DINOv2 is trained on a curated dataset of 142 million images (LVD-142M) with a self-supervised objective that combines student-teacher self-distillation at the image level with masked image modeling at the patch level, extending the approach introduced in the original DINO.
The model family spans Vision Transformer sizes from ViT-S (21M parameters) to ViT-g (1.1B parameters), with the larger variants setting state-of-the-art linear-probing results for classification, segmentation, and depth estimation at release. DINOv2 features can be used directly from a frozen backbone, which reduces the amount of labeled training data needed in downstream applications. The model is primarily used as an image encoder rather than as a complete task-specific model, which makes it a common backbone choice for custom vision pipelines. DINOv2 code and pretrained weights are released under the Apache 2.0 license, adopted after an initial CC-BY-NC 4.0 release in response to community requests for commercial compatibility. A successor model, DINOv3, was released in August 2025 with further scaling and a new training technique called Gram anchoring.
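The frozen-backbone workflow described above can be sketched as follows: features are extracted once from an encoder that is never updated, and only a linear classifier is trained on top of them. This is a minimal illustration, not DINOv2's evaluation code. To keep it runnable without downloading weights, `frozen_encoder` is a hypothetical stand-in (a fixed random projection), and the toy data, function names, and hyperparameters are all assumptions made for the example.

```python
import numpy as np

FEATURE_DIM = 384  # embedding width of the ViT-S variant


def frozen_encoder(images: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a frozen DINOv2 backbone.

    A fixed random projection maps each flattened image to a FEATURE_DIM
    vector. It has no trainable state, which is the only property of a
    frozen backbone this sketch relies on.
    """
    flat = images.reshape(len(images), -1)
    rng = np.random.default_rng(0)  # fixed seed: same "weights" on every call
    proj = rng.standard_normal((flat.shape[1], FEATURE_DIM))
    return flat @ (proj / np.sqrt(flat.shape[1]))


def fit_linear_probe(feats, labels, num_classes, lr=0.002, epochs=200):
    """Train a softmax classifier on fixed features with gradient descent."""
    weights = np.zeros((feats.shape[1], num_classes))
    bias = np.zeros(num_classes)
    targets = np.eye(num_classes)[labels]  # one-hot labels
    for _ in range(epochs):
        logits = feats @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - targets) / len(feats)  # d(cross-entropy)/d(logits)
        weights -= lr * feats.T @ grad
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def make_toy_images(rng, n_per_class):
    """Two classes of 8x8 'images' that differ in mean brightness."""
    dark = rng.normal(0.0, 1.0, size=(n_per_class, 8, 8))
    bright = rng.normal(2.0, 1.0, size=(n_per_class, 8, 8))
    images = np.concatenate([dark, bright])
    labels = np.array([0] * n_per_class + [1] * n_per_class)
    return images, labels


rng = np.random.default_rng(42)
train_images, train_labels = make_toy_images(rng, 50)
test_images, test_labels = make_toy_images(rng, 20)

# Features are extracted once; only the linear head is trained afterwards.
train_feats = frozen_encoder(train_images)
test_feats = frozen_encoder(test_images)
mean = train_feats.mean(axis=0)  # centre with training statistics only

weights, bias = fit_linear_probe(train_feats - mean, train_labels, num_classes=2)
predictions = ((test_feats - mean) @ weights + bias).argmax(axis=1)
accuracy = float((predictions == test_labels).mean())
print(f"held-out accuracy: {accuracy:.2f}")
```

With real weights, the stand-in would be replaced by a DINOv2 checkpoint run in inference mode, for example one obtained through the repository's PyTorch Hub entry points such as `torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")`. The rest of the pipeline stays the same, because the probe only ever sees precomputed feature vectors.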
Other models worth comparing for similar use cases.
License terms and commercial-use guidance for DINOv2.
License information is provided as a guide and is not legal advice.