A ConvNet for the 2020s (ConvNeXt)

“A ConvNet for the 2020s” was submitted to arXiv in January 2022 by Zhuang Liu, Saining Xie, and colleagues at Facebook AI Research (now Meta AI) and UC Berkeley. It was a direct response to a moment when vision transformers seemed poised to replace convolutional networks entirely.

Rather than propose a brand-new design, the authors ran a careful experiment: they took a standard ResNet and modernized it one change at a time while keeping the network purely convolutional. Each step imported a training recipe or design choice that had made transformers like Swin so effective, such as larger kernels, fewer activation and normalization layers, and updated training schedules. The endpoint of this gradual roadmap was ConvNeXt, a pure ConvNet that matched or beat comparable Swin transformers on image classification, object detection, and segmentation. Its largest model, pretrained on ImageNet-22K, reached 87.8 percent top-1 accuracy on ImageNet.
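To make "larger kernels, fewer activation and normalization layers" concrete, the following is a rough sketch that lists the layers of one ConvNeXt-style block and counts their parameters. The layout it assumes (a 7x7 depthwise convolution, one layer normalization, a 4x pointwise expansion, one GELU, and a pointwise projection) is taken from the paper rather than from the summary above. The names BlockSpec, block_layers, and total_params are hypothetical, and the per-channel layer-scale parameter and the residual connection are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSpec:
    """Hypothetical description of one ConvNeXt-style block."""
    dim: int              # channels entering and leaving the block
    kernel_size: int = 7  # depthwise kernel size (assumed from the paper)
    expansion: int = 4    # hidden-width multiplier (assumed from the paper)


def block_layers(spec: BlockSpec) -> list[tuple[str, int]]:
    """Return (layer name, parameter count) pairs in forward order."""
    d, k = spec.dim, spec.kernel_size
    hidden = d * spec.expansion
    return [
        # One k x k filter per channel, plus a bias per channel.
        ("depthwise conv", k * k * d + d),
        # The only normalization layer in the block: scale and shift.
        ("layer norm", 2 * d),
        # 1x1 convolution widening d -> hidden, with bias.
        ("pointwise expand", d * hidden + hidden),
        # The only activation in the block; it has no parameters.
        ("GELU", 0),
        # 1x1 convolution narrowing hidden -> d, with bias.
        ("pointwise project", hidden * d + d),
    ]


def total_params(spec: BlockSpec) -> int:
    """Sum the parameter counts of every layer in the block."""
    return sum(count for _, count in block_layers(spec))


if __name__ == "__main__":
    spec = BlockSpec(dim=96)
    for name, count in block_layers(spec):
        print(f"{name:18s} {count:>7,d}")
    print(f"{'total':18s} {total_params(spec):>7,d}")
```

Under these assumptions a 96-channel block has 79,200 parameters, and growing the depthwise kernel from 3x3 to 7x7 adds only 3,840 of them. That is one way to see why a larger kernel is affordable once the spatial convolution is depthwise: nearly all of the block's parameters sit in the two pointwise layers.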

The result reframed a debate that had felt settled. It showed that much of the transformer’s vision advantage came from its training and design conventions rather than self-attention itself, and that a well-modernized convolutional network remained fully competitive. ConvNeXt became a strong, efficient backbone and a reference point for arguing that architecture choices should be judged on evidence, not fashion.

For a general reader, ConvNeXt is a useful corrective to hype cycles: the newest architecture is not automatically the best, and sometimes the right move is to carefully update a proven design rather than discard it.
