DETR (Detection Transformer) is an end-to-end object detection model developed by Facebook AI Research (now part of Meta) and released in May 2020. It was among the first detectors to eliminate hand-crafted components such as anchor generation and non-maximum suppression. Instead, it frames object detection as a direct set prediction problem, solved with a transformer encoder-decoder architecture built on top of a CNN backbone.
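To make the set prediction framing concrete, here is a minimal sketch of the one-to-one matching between predictions and ground-truth objects that lets training work without anchors or non-maximum suppression. It rests on several assumptions: the cost is simplified to class probability plus an L1 box distance (DETR's published cost also includes a generalized IoU term and weighting factors), the optimal assignment is found by brute force rather than the Hungarian algorithm, and every name here (`match_predictions`, `match_cost`, the dictionary layout) is hypothetical and not part of any DETR codebase.

```python
from itertools import permutations


def box_l1(a, b):
    """L1 distance between two boxes given as (cx, cy, w, h) tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_cost(pred, target):
    """Simplified matching cost: low when the predicted class probability
    for the target label is high and the boxes are close.
    Assumption: no GIoU term and no loss weights, unlike the full model."""
    return -pred["probs"][target["label"]] + box_l1(pred["box"], target["box"])


def match_predictions(preds, targets):
    """Return a tuple where entry j is the index of the prediction assigned
    to target j. Brute force over all one-to-one assignments; only practical
    for tiny examples (a real implementation would use the Hungarian algorithm)."""
    best_cost, best = float("inf"), ()
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[i], targets[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best


# Three predictions (the last class index stands for "no object"), two targets.
preds = [
    {"probs": [0.10, 0.80, 0.10], "box": (0.5, 0.5, 0.2, 0.2)},
    {"probs": [0.70, 0.20, 0.10], "box": (0.2, 0.3, 0.1, 0.1)},
    {"probs": [0.05, 0.05, 0.90], "box": (0.8, 0.8, 0.1, 0.1)},
]
targets = [
    {"label": 0, "box": (0.2, 0.3, 0.1, 0.1)},
    {"label": 1, "box": (0.5, 0.5, 0.2, 0.2)},
]

matching = match_predictions(preds, targets)
print(matching)  # (1, 0): target 0 -> prediction 1, target 1 -> prediction 0
```

Because each ground-truth object is matched to exactly one prediction, any leftover prediction (index 2 here) is trained toward the "no object" class, which is what removes the need for duplicate suppression at inference time.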
DETR achieves 42.0 AP on the COCO benchmark with a ResNet-50 backbone, comparable to a well-tuned Faster R-CNN at the time of release. Its attention-based design lets it reason about global context and long-range dependencies within an image. Today DETR is used primarily as a research baseline and architectural reference; later works such as Deformable DETR and DINO build on its foundations to address its slow training convergence and weak small-object detection.