OLMo (Open Language Model) is the Allen Institute for AI’s family of fully open large language models, first released on February 1, 2024. Its defining feature is the degree of openness: where models like Llama release only weights, OLMo shipped its weights alongside the complete Dolma training corpus of roughly three trillion tokens, the code that generated that data, the full training and inference code, training logs and metrics, and more than 500 intermediate checkpoints per model. As project lead Hanna Hajishirzi put it, “Without having access to training data, researchers cannot scientifically understand how a model is working.”
The first release centered on 7-billion-parameter models, each trained on at least two trillion tokens, with a smaller 1-billion-parameter variant. Ai2 developed the models with partners including Harvard’s Kempner Institute, AMD, and the LUMI supercomputer. In November 2024, Ai2 followed with OLMo 2 at 7B and 13B parameters, trained on an updated data mix of roughly 3.9 trillion tokens and described by Ai2 at release as the best fully open models to date, often beating open-weight peers of equivalent size. Because the OLMo line is still evolving, the specific sizes and dataset details given here reflect Ai2’s announcements as of the verification date.
Why business readers should care: OLMo is the reference point for “fully open” AI. It is one of the few frontier-adjacent model families whose training data and training process are public, which matters for auditing, reproducibility, and regulatory scrutiny of how a model was built.