Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

Ultralytics released a technical report for YOLO26, the latest entry in the widely deployed YOLO family of real-time object detectors. The paper argues that most prior YOLO detectors still depend on non-maximum suppression (NMS) at inference time, carry heavy detection heads because of Distribution Focal Loss, need long training schedules, and can leave the smallest objects without positive label assignments. YOLO26 is presented as a redesign that removes these dependencies.
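
For readers unfamiliar with the post-processing step in question, here is a minimal sketch of classic greedy NMS in plain Python. It is an illustration of the general technique, not code from the report or from the Ultralytics library; the function names `iou` and `greedy_nms` and the 0.5 threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept by greedy NMS (hypothetical helper).

    Boxes are visited from highest to lowest score; a box is kept only if
    it does not overlap an already kept box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (illustrative values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.68 and is suppressed. An end-to-end detector aims to emit one prediction per object directly, so a pass like this, along with its threshold tuning, is no longer needed at inference time.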

The report describes several changes. The model performs NMS-free, end-to-end inference, removing the post-processing step that complicates deployment. It drops Distribution Focal Loss to lighten the detection head. The training pipeline combines MuSGD, a hybrid Muon-SGD optimizer; a Progressive Loss that shifts supervision toward the inference-time head; and STAL, a label assignment strategy aimed at giving small objects positive matches.
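
To give a sense of why Distribution Focal Loss adds weight to a head: in detectors that use it, each box side is typically predicted as a set of logits over discrete distance bins and decoded as the expected value of the resulting distribution, rather than as a single regressed number. The sketch below shows that decoding step under those assumptions. The names `softmax` and `decode_dfl_side` and the bin count are hypothetical, and this is not YOLO26 code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_dfl_side(logits):
    """Decode one box side as the expected bin index (hypothetical helper).

    With N bins the head outputs N values per side, so 4 * N values per
    box, compared with 4 values for direct regression.
    """
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))


# Uniform logits over 4 bins: the expected index is the midpoint.
uniform_distance = decode_dfl_side([0.0, 0.0, 0.0, 0.0])
# Logits sharply peaked at bin 2: the expected index is close to 2.
peaked_distance = decode_dfl_side([0.0, 0.0, 100.0, 0.0])
```

Dropping this distribution-based decoding means the head predicts fewer values per box and skips the softmax and weighted sum, which is the kind of simplification the report describes.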

According to the report, the five model scales reach 40.9 to 57.5 mAP on the COCO benchmark, with inference latencies of 1.7 to 11.8 milliseconds. The framework supports detection, segmentation, pose estimation, and oriented bounding box detection, and an open-vocabulary extension reaches 40.6 AP on LVIS minival.

YOLO matters because it is one of the most heavily used object-detection model families in production computer vision, from robotics to surveillance to industrial inspection. A new release that removes NMS and simplifies the head targets the practical friction that determines whether a detector is easy to ship across diverse hardware.

