Meta: Detectron2

Detectron2 Overview

Detectron2 is a computer vision model library developed by Facebook AI Research (Meta), released in September 2019. It serves as a comprehensive platform for object detection, instance segmentation, panoptic segmentation, keypoint detection, and DensePose, implemented in PyTorch. It is the successor to the original Detectron framework, which was written in Caffe2, and offers a more modular and extensible codebase designed for both research and production use.

Detectron2 includes implementations of Faster R-CNN, Mask R-CNN, RetinaNet, Cascade R-CNN, Panoptic FPN, and several other architectures. Its modular design allows components such as backbones, proposal generators, and ROI heads to be swapped independently, which has made it a widely used baseline framework in academic research. It supports training on COCO-format datasets and on multi-GPU and multi-machine distributed setups.
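
As a rough illustration of what a COCO-format dataset looks like, the sketch below groups COCO annotations by image and converts each box from COCO's [x, y, width, height] layout to corner coordinates. The top-level record keys follow the per-image dictionaries that Detectron2 documents for custom datasets, but the helper names (`xywh_to_xyxy`, `group_by_image`) and the sample data are hypothetical. In practice, Detectron2's own `register_coco_instances` utility can load a COCO JSON file directly, so this is only meant to show the data layout.

```python
from collections import defaultdict


def xywh_to_xyxy(box):
    """Convert a COCO [x, y, width, height] box to [x0, y0, x1, y1]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def group_by_image(coco):
    """Build one record per image from a COCO-style dict (hypothetical helper)."""
    anns_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append({
            "bbox": xywh_to_xyxy(ann["bbox"]),
            "category_id": ann["category_id"],
            # Detectron2 also expects a "bbox_mode" field on each annotation;
            # it is omitted here to keep the sketch dependency-free.
        })
    return [
        {
            "file_name": img["file_name"],
            "height": img["height"],
            "width": img["width"],
            "image_id": img["id"],
            "annotations": anns_by_image[img["id"]],
        }
        for img in coco["images"]
    ]


# Made-up sample data in the COCO layout: images, annotations, categories.
coco = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 7, "image_id": 1, "category_id": 3, "bbox": [10, 20, 30, 40]},
    ],
    "categories": [{"id": 3, "name": "widget"}],
}

records = group_by_image(coco)
```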

Detectron2 Details & Performance

Details

Vision Tasks

Object Detection, Instance Segmentation, Semantic Segmentation, Keypoint Detection

Features

Foundation Vision

Performance

Avg. Latency

Arena Rankings

Not yet ranked in arena

Alternatives to Detectron2

Other models worth comparing for similar use cases.

Meta
Mask R-CNN
Mask R-CNN is an instance segmentation model developed by Facebook AI Research (Meta), first posted to arXiv in March 2017 and presented at ICCV in October 2017. It extends Faster R-CNN by adding a branch that predicts a binary segmentation mask for each detected object, in parallel with the classification and bounding box regression branches. A key contribution is RoIAlign, which replaces the coordinate quantization of RoIPool with bilinear interpolation to preserve spatial correspondence between features and input pixels, significantly improving mask quality. Mask R-CNN achieves strong performance on the COCO instance segmentation benchmark and supports keypoint detection through an additional output head. It remains a foundational architecture in instance segmentation and is available through Meta's Detectron2 framework. The model is most appropriate for tasks requiring pixel-level object delineation, such as medical imaging, autonomous driving, and industrial inspection.
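
To make the RoIAlign idea concrete, here is a minimal pure-Python sketch of bilinear sampling at fractional feature-map coordinates. It is a simplification under stated assumptions: it takes one sample at the center of each output bin (the published method averages several samples per bin), works on a single-channel 2D list, and expects the box to lie inside the feature map. The function names are hypothetical and this is not Detectron2's implementation.

```python
import math


def bilinear_sample(feat, y, x):
    """Interpolate feat (2D list) at fractional (y, x), with 0 <= y <= H-1 and 0 <= x <= W-1."""
    h, w = len(feat), len(feat[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align_center(feat, box, out_size):
    """Pool a box (y0, x0, y1, x1, in feature-map units) to out_size x out_size.

    Simplification: one bilinear sample per bin, taken at the bin center.
    """
    y0, x0, y1, x1 = box
    bin_h = (y1 - y0) / out_size
    bin_w = (x1 - x0) / out_size
    return [
        [
            bilinear_sample(feat, y0 + (i + 0.5) * bin_h, x0 + (j + 0.5) * bin_w)
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


feat = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]

# A quantized lookup snaps (0.5, 0.5) to cell (0, 0) and loses the offset.
quantized = feat[int(0.5)][int(0.5)]
# Bilinear sampling blends the four surrounding cells instead.
interpolated = bilinear_sample(feat, 0.5, 0.5)
pooled = roi_align_center(feat, (0.0, 0.0, 2.0, 2.0), 2)
```

The contrast between `quantized` and `interpolated` is the point of the sketch: snapping coordinates to the grid discards sub-cell position, which matters most for pixel-accurate masks.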
Azure
Faster R-CNN
Faster R-CNN is an object detection model introduced by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun at Microsoft Research, first posted to arXiv in June 2015 and published at NIPS 2015. It improves on R-CNN and Fast R-CNN by introducing the Region Proposal Network (RPN), a fully convolutional network that shares features with the detection network and generates object proposals at negligible additional cost. This makes Faster R-CNN the first near-real-time deep learning object detector based on region proposals. Faster R-CNN achieved strong detection accuracy on PASCAL VOC and MS COCO at the time of release. It remains a widely referenced architecture in computer vision research and is available through Meta's Detectron2 framework as a maintained PyTorch implementation. It is most appropriate for offline or server-side inference tasks where accuracy is prioritized over latency, as its two-stage pipeline carries higher inference cost than single-stage detectors.
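
The RPN scores a fixed set of reference boxes (anchors) at every feature-map position. The sketch below generates such anchors using the three scales and three aspect ratios described in the Faster R-CNN paper (nine anchors per position). The stride of 16, the cell-center convention, and the function name `generate_anchors` are assumptions made for illustration and may differ from any particular implementation.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x0, y0, x1, y1) anchors for every feature-map cell.

    Each anchor has area scale**2 and height/width ratio `ratio`.
    Assumption: anchors are centered on the middle of each cell.
    """
    anchors = []
    for iy in range(feat_h):
        for ix in range(feat_w):
            cx = (ix + 0.5) * stride
            cy = (iy + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3)
# 2 * 3 positions, each with 3 scales * 3 ratios = 9 anchors.
num_anchors = len(anchors)
```

Because the anchors depend only on the feature-map size, they can be computed once and reused, which is part of why proposal generation adds little cost on top of the shared convolutional features.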
YOLOv8 Instance Segmentation
YOLOv8 Instance Segmentation is the segmentation variant of the YOLOv8 model developed by Ultralytics, released in January 2023 under the AGPL-3.0 license. It extends the standard YOLOv8 detection head with a mask prediction branch that generates pixel-level segmentation masks for each detected object using a prototype mask approach. This enables real-time instance segmentation within a single forward pass. YOLOv8 Instance Segmentation shares the same backbone and neck architecture as the base detection model and is available in the same size range. It is deployable through Roboflow Inference and supports fine-tuning on custom COCO-format segmentation datasets. It is suited for applications requiring both object localization and precise mask prediction at real-time speeds.
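
In a prototype mask approach, the network predicts a small set of image-wide prototype masks plus a vector of coefficients per detection, and each instance mask is a weighted sum of the prototypes passed through a sigmoid. The sketch below shows that combination step only, in pure Python with made-up numbers. The function name `assemble_mask`, the 0.5 threshold, and the tiny 2x2 prototypes are assumptions, and the usual step of cropping the mask to the detected box is left out.

```python
import math


def assemble_mask(prototypes, coeffs, threshold=0.5):
    """Combine prototype masks into one binary instance mask.

    prototypes: list of K 2D lists (H x W), shared by all detections.
    coeffs: K per-detection weights.
    """
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            logit = sum(c * p[y][x] for c, p in zip(coeffs, prototypes))
            prob = 1.0 / (1.0 + math.exp(-logit))
            row.append(1 if prob > threshold else 0)
        mask.append(row)
    return mask


# Two toy prototypes: one fires top-left, the other bottom-right.
prototypes = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]

# This detection weights the first prototype up and the second down.
instance_mask = assemble_mask(prototypes, coeffs=[4.0, -4.0])
```

Since the prototypes are computed once per image and each detection only adds a short coefficient vector, the per-instance cost stays small, which is consistent with producing masks in a single forward pass.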
YOLO11
YOLO11 is an object detection and multi-task vision model developed by Ultralytics, released in September 2024 under the AGPL-3.0 license. It is the latest generation in the Ultralytics YOLO series and supports object detection, instance segmentation, image classification, pose estimation, and oriented bounding box detection within a single unified framework. YOLO11 introduces architectural refinements that improve accuracy while reducing parameter count compared to YOLOv8 at equivalent model sizes. YOLO11 is available in five model sizes from Nano to Extra Large and is deployable through the Ultralytics Python package, Roboflow Inference, and export formats including ONNX, TensorRT, and CoreML. It supports fine-tuning on custom datasets through the standard Ultralytics training API.
Google
MediaPipe
MediaPipe is an open-source framework developed by Google for building real-time machine learning pipelines across mobile, web, desktop, and edge platforms. First released in 2019, the framework uses a graph-based architecture where pre-built components called Calculators process streaming data such as images, video, and audio through configurable computation graphs. This design allows developers to compose perception pipelines from reusable building blocks without writing custom glue code between models. The current MediaPipe Tasks API replaces the earlier Solutions API and provides a unified cross-platform interface for vision, text, and audio. Rather than providing a single model, MediaPipe ships a suite of ready-to-use Tasks that wrap trained models for specific problems. These include Pose Landmarker for 33-point body landmark detection, Hand Landmarker for 21-point hand tracking, Face Landmarker, which extends the earlier 468-point Face Mesh with blendshape outputs for facial expression, Selfie Segmentation for person-background separation, and Holistic Landmarker for combined body, hand, and face tracking. The Tasks prioritize on-device inference with low latency and support GPU acceleration where available, making the framework a common choice for mobile augmented reality, fitness and wellness applications, gesture-based interfaces, and accessibility features such as sign language recognition.
Meta
DETR
DETR (Detection Transformer) is an end-to-end object detection model developed by Facebook AI Research (Meta), released in May 2020. It is one of the first models to eliminate hand-crafted components such as anchor generation and non-maximum suppression by framing object detection as a direct set prediction problem, solved with a transformer encoder-decoder architecture built on top of a CNN backbone. DETR achieves 42.0% AP on the COCO benchmark with a ResNet-50 backbone, performing comparably to a well-tuned Faster R-CNN at the time of release. Its attention-based design allows it to reason about global context and long-range dependencies within an image. DETR is primarily used as a research baseline and architectural reference, with subsequent works such as Deformable DETR and DINO building on its foundations to address its slow training convergence and limited small-object detection capability.
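
Set prediction relies on a one-to-one matching between predicted and ground-truth boxes, so that each object is claimed by exactly one prediction and no duplicate suppression is needed. The sketch below finds the lowest-cost matching by brute force over permutations, which is only practical for a handful of boxes; real implementations use the Hungarian algorithm (for example SciPy's `linear_sum_assignment`). The cost here is a plain L1 box distance, whereas DETR's matching cost also includes class probability and a generalized IoU term. All names and numbers are hypothetical.

```python
from itertools import permutations


def l1_distance(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def match_predictions(preds, targets):
    """Return {prediction index: target index} for the cheapest one-to-one matching.

    Brute-force stand-in for the Hungarian algorithm; assumes
    len(preds) >= len(targets). Unmatched predictions are treated as "no object".
    """
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(l1_distance(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return {p: t for t, p in enumerate(best_perm)}


# Boxes as (cx, cy, w, h) in normalized image coordinates (made-up values).
preds = [
    (0.9, 0.9, 0.2, 0.2),
    (0.1, 0.1, 0.2, 0.2),
    (0.5, 0.5, 0.3, 0.3),
]
targets = [
    (0.1, 0.1, 0.2, 0.2),
    (0.5, 0.5, 0.3, 0.3),
]

matching = match_predictions(preds, targets)
unmatched = [i for i in range(len(preds)) if i not in matching]
```

In this toy case the first prediction is left unmatched, which corresponds to the "no object" class that the model is trained to predict for surplus queries.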

Detectron2 License

Apache 2.0

License terms and commercial-use guidance for Detectron2.

License information is provided as a guide and is not legal advice.