“Caffe: Convolutional Architecture for Fast Feature Embedding” was submitted to arXiv on June 20, 2014 by Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell, working out of the Berkeley Vision and Learning Center (BVLC).
The paper describes Caffe as a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying deep networks. Its design separated model definition, written in plain-text configuration files, from implementation, so researchers could specify architectures without writing low-level code. The paper reports that Caffe could process over 40 million images a day on a single K40 or Titan GPU, roughly 2.5 milliseconds per image, and its authors describe it as among the fastest convolutional network implementations available at the time. It shipped with a collection of reference models that others could download and build on.
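As a rough sanity check on the throughput figure, the short Python sketch below converts between images per day and average milliseconds per image. It assumes a GPU running continuously at a uniform rate for a full 24 hours, which the paper does not state. The function names are illustrative and are not part of Caffe.

```python
# Assumption: the GPU runs continuously for a full day at a uniform rate.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def ms_per_image(images_per_day: float) -> float:
    """Average milliseconds spent per image at a given daily throughput."""
    return SECONDS_PER_DAY * 1000.0 / images_per_day


def images_per_day(ms_per_image_value: float) -> float:
    """Daily throughput implied by a given average per-image time in ms."""
    return SECONDS_PER_DAY * 1000.0 / ms_per_image_value


if __name__ == "__main__":
    # Per-image time implied by exactly 40 million images a day.
    print(f"{ms_per_image(40e6):.2f} ms per image")
    # Daily throughput implied by exactly 2.5 ms per image.
    print(f"{images_per_day(2.5) / 1e6:.2f} million images per day")
```

Under that assumption, exactly 40 million images a day works out to about 2.16 ms per image, while 2.5 ms per image corresponds to about 34.6 million images a day, so the two reported figures are best read as rounded approximations of each other.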
For several years Caffe was among the most widely used frameworks in computer vision research, especially for convolutional networks. Its “Model Zoo” of shareable pretrained models was an early precursor of the model-sharing culture later popularized by platforms such as the Hugging Face Hub. Caffe was eventually superseded by PyTorch and TensorFlow, and Caffe2, a successor effort at Facebook, was merged into PyTorch in 2018.