ByteDance: Depth Anything V2

Depth Anything V2 Overview

Depth Anything V2 is a monocular depth estimation model released in June 2024 by researchers at the University of Hong Kong and TikTok. It predicts a dense depth map from a single RGB image, enabling 3D-aware applications without stereo cameras, LiDAR, or multi-view inputs. The model improves on the original Depth Anything through three changes: training the teacher on 595K high-quality synthetic images instead of real labeled images, scaling up the teacher's capacity, and using that stronger teacher to generate pseudo-labels for 62 million unlabeled real images, which are then used to train the student models. This pipeline reduces the depth prediction artifacts that can occur in reflective, transparent, and texture-poor regions. According to its authors, Depth Anything V2 runs more than 10× faster than diffusion-based depth models such as Marigold while producing more accurate predictions.
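The teacher-student data flow described above can be sketched in a few lines. This is only an illustration of the three stages, not the authors' training code: the names `distill` and `fit_ratio` are hypothetical, and the "models" are toy one-parameter functions standing in for real depth networks.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[float, float]          # (input, label); stands in for (image, depth map)
Model = Callable[[float], float]    # stands in for a depth network
Fitter = Callable[[Sequence[Pair]], Model]


def distill(
    synthetic_pairs: Sequence[Pair],
    unlabeled_inputs: Sequence[float],
    fit_teacher: Fitter,
    fit_student: Fitter,
) -> Tuple[Model, Model]:
    """Toy version of the pipeline: teacher on synthetic labels, student on pseudo-labels."""
    # Stage 1: train the teacher only on precisely labeled synthetic data.
    teacher = fit_teacher(synthetic_pairs)
    # Stage 2: the teacher labels the unlabeled real data (pseudo-labels).
    pseudo_pairs: List[Pair] = [(x, teacher(x)) for x in unlabeled_inputs]
    # Stage 3: train the student only on the pseudo-labeled real data.
    student = fit_student(pseudo_pairs)
    return teacher, student


def fit_ratio(pairs: Sequence[Pair]) -> Model:
    """Hypothetical toy 'training' routine: fit y = k * x with k = sum(y) / sum(x)."""
    k = sum(y for _, y in pairs) / sum(x for x, _ in pairs)
    return lambda x: k * x


if __name__ == "__main__":
    teacher, student = distill([(1.0, 2.0), (2.0, 4.0)], [3.0, 5.0], fit_ratio, fit_ratio)
    print(student(4.0))
```

The point of the structure is that the student never sees the synthetic data directly; it learns only from real images labeled by the teacher.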

Depth Anything V2 is released in four sizes: Small (25M parameters), Base (97M), Large (335M), and Giant (1.3B). It comes in two output modes: relative depth, which captures the depth ordering within a scene but has no absolute scale, and metric depth, which gives absolute distance in meters and is produced by fine-tuning the relative-depth backbone on depth-annotated datasets. The Small model weights are released under Apache 2.0; the Base, Large, and Giant variants are released under CC-BY-NC-4.0, which restricts them to non-commercial use. A successor model, Depth Anything 3, was released in November 2025 by the ByteDance Seed team, extending the framework to multi-view depth estimation and camera pose recovery.
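To make the difference between the two output modes concrete, the sketch below aligns a relative prediction to a few reference measurements with a least-squares scale and shift. This is a common way to compare scale-free predictions against ground truth, not something the model does internally. The function names are hypothetical, and the example assumes the prediction and the reference values are expressed in the same space (for example, both as depth or both as inverse depth).

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def fit_scale_shift(pred: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Return (s, t) minimizing sum((s * pred_i + t - ref_i) ** 2)."""
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("need at least two matching prediction/reference values")
    mean_pred, mean_ref = fmean(pred), fmean(ref)
    var = sum((p - mean_pred) ** 2 for p in pred)
    if var == 0:
        raise ValueError("prediction is constant; scale is undetermined")
    cov = sum((p - mean_pred) * (r - mean_ref) for p, r in zip(pred, ref))
    s = cov / var                    # closed-form least-squares slope
    t = mean_ref - s * mean_pred     # closed-form least-squares intercept
    return s, t


def apply_scale_shift(pred: Sequence[float], s: float, t: float) -> List[float]:
    """Map relative values onto the reference scale."""
    return [s * p + t for p in pred]


if __name__ == "__main__":
    relative = [0.0, 0.5, 1.0]       # scale-free prediction at three pixels
    reference = [2.0, 3.0, 4.0]      # known values at the same pixels
    s, t = fit_scale_shift(relative, reference)
    print(apply_scale_shift(relative, s, t))
```

A metric-depth checkpoint removes the need for this step, because its output is already in meters.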

Depth Anything V2 Details & Performance

Details

Vision Tasks: Depth Estimation

Features: Foundation Vision

Depth Anything V2 License

Apache 2.0 (Small); CC-BY-NC-4.0 (Base, Large, Giant)

License terms and commercial-use guidance for Depth Anything V2.

License information is provided as a guide and is not legal advice.