Depth Anything V2 is a monocular depth estimation model released in June 2024 by researchers at the University of Hong Kong and TikTok. It predicts a dense depth map from a single RGB image, enabling 3D-aware applications without stereo cameras, LiDAR, or multi-view inputs. The model improves on the original Depth Anything through three changes: replacing all labeled real images with 595K high-quality synthetic images when training the teacher model, scaling up the teacher's capacity, and using that stronger teacher to pseudo-label 62 million unlabeled real images, on which the student models are then trained. Because synthetic images come with precise depth labels, this pipeline reduces the artifacts that noisy real-world labels tend to cause in reflective, transparent, and texture-poor regions. The authors report that, compared with diffusion-based depth models built on Stable Diffusion, such as Marigold, Depth Anything V2 runs more than 10× faster while producing more accurate predictions.
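Training on a mix of synthetic labels and teacher pseudo-labels only works if the loss ignores the unknown scale and shift of relative depth. The Depth Anything V2 paper uses a scale- and shift-invariant loss for this; the sketch below assumes the MiDaS-style formulation (subtract the median, divide by the mean absolute deviation, then take the mean absolute error). The function names are hypothetical, depth maps are flattened to plain Python lists of per-pixel values, and this is an illustration of the idea rather than the project's actual training code.

```python
from statistics import median


def normalize_ssi(values: list[float]) -> list[float]:
    """Remove shift (median) and scale (mean absolute deviation) from a depth map."""
    t = median(values)
    s = sum(abs(v - t) for v in values) / len(values)
    if s == 0:
        # Constant map: there is no scale to remove, so return zero-centered values.
        return [0.0 for _ in values]
    return [(v - t) / s for v in values]


def ssi_loss(pred: list[float], target: list[float]) -> float:
    """Mean absolute error between the shift- and scale-normalized maps.

    `target` can be a synthetic ground-truth label or a teacher pseudo-label;
    both are compared only up to an unknown positive scale and shift.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    p = normalize_ssi(pred)
    g = normalize_ssi(target)
    return sum(abs(a - b) for a, b in zip(p, g)) / len(p)


# A prediction that differs from the target only by scale 2 and shift 5
# incurs (numerically) zero loss; a reversed ordering is penalized.
print(ssi_loss([7.0, 9.0, 11.0, 13.0], [1.0, 2.0, 3.0, 4.0]))
print(ssi_loss([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]))
```

The invariance holds only for a positive scale factor: the mean absolute deviation is always non-negative, so a prediction that flips near and far is still penalized, which is the behavior a depth loss needs.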
Depth Anything V2 comes in four sizes: Small (25M parameters), Base (97M), Large (335M), and Giant (1.3B). It also has two output modes: relative depth (affine-invariant inverse depth, i.e., values known only up to an unknown scale and shift rather than in physical units) and metric depth (absolute distance in meters, produced by fine-tuning the relative-depth model on depth-annotated datasets). The Small model weights are released under Apache 2.0, while the Base, Large, and Giant variants are released under CC-BY-NC-4.0, which restricts them to non-commercial use. A successor model, Depth Anything 3, was released in November 2025 by the ByteDance Seed team, extending the framework to multi-view depth estimation and camera pose recovery.
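The practical difference between the two output modes shows up when a relative prediction has to be compared with real distances. A common approach (used, for example, when evaluating relative-depth models against metric ground truth) is to fit a scale and shift in inverse-depth space by least squares using a few pixels whose true distance is known. The sketch below assumes the relative output is inverse depth and uses hypothetical helper names; it is not how the metric-depth checkpoints work, since those are fine-tuned to output meters directly.

```python
def fit_scale_shift(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares (s, t) minimizing sum((s * x_i + t - y_i) ** 2)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("x must not be constant")
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s = cov_xy / var_x
    t = mean_y - s * mean_x
    return s, t


def relative_to_metric(
    rel_disparity: list[float], anchors: list[tuple[int, float]]
) -> list[float]:
    """Convert relative inverse depth to meters using known-distance anchors.

    anchors: (pixel_index, metric_depth_in_meters) pairs, e.g. from a
    rangefinder. Alignment is done in inverse-depth space, where the
    relative output is assumed to be affine-related to the true values.
    """
    xs = [rel_disparity[i] for i, _ in anchors]
    ys = [1.0 / z for _, z in anchors]
    s, t = fit_scale_shift(xs, ys)
    out = []
    for d in rel_disparity:
        inv = s * d + t
        # A non-positive inverse depth has no valid metric depth; treat it as infinitely far.
        out.append(1.0 / inv if inv > 0 else float("inf"))
    return out


# True depths 2, 4, 5, 10 m; the "relative" output is 3 * (1 / depth) + 1.
rel = [2.5, 1.75, 1.6, 1.3]
print(relative_to_metric(rel, anchors=[(0, 2.0), (3, 10.0)]))
```

Two anchors are the minimum needed to pin down both unknowns; with more anchors the least-squares fit averages out noise in the measurements.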
License terms and commercial-use guidance for Depth Anything V2.
License information is provided as a guide and is not legal advice.