MosaicML releases MPT-7B

On May 5, 2023, MosaicML released MPT-7B, a decoder-style transformer with 6.7 billion parameters trained from scratch on 1 trillion tokens of text and code. Its headline feature was the license: unlike Meta’s LLaMA, which was restricted to non-commercial use at the time, MPT-7B was released under the Apache 2.0 license, which permits commercial use. That made it an open base model companies could build products on without negotiating a separate license. MosaicML reported that it matched the quality of LLaMA-7B on standard benchmarks.

The model used ALiBi (Attention with Linear Biases) in place of learned positional embeddings. Rather than adding position vectors to the input, ALiBi subtracts a penalty from each attention score in proportion to the distance between query and key, which lets the model handle, and extrapolate to, inputs longer than those seen in training. A specialized fine-tune, MPT-7B-StoryWriter-65k+, was tuned for a 65,000-token context and demonstrated generations as long as 84,000 tokens. MosaicML said the base model took about 9.5 days to train on 440 A100 GPUs at a cost of roughly $200,000, with no human intervention, and presented this as evidence that its training platform made building a capable model relatively cheap and repeatable.
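The ALiBi scheme mentioned above can be illustrated with a minimal sketch. This is not MosaicML's implementation. It follows the slope formula from the ALiBi paper for a power-of-two head count, and the function names and the head count of 8 are assumptions chosen for illustration.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    Assumption: num_heads is a power of two, the simple case in the ALiBi paper.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles a power-of-two head count")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Return bias[h][i][j], the value added to the score of query i and key j.

    The penalty grows linearly with the distance i - j. Future positions
    (j > i) get -inf here, standing in for the usual causal mask.
    """
    neg_inf = float("-inf")
    return [
        [
            [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for slope in alibi_slopes(num_heads)
    ]


# Illustrative sizes only: 8 heads, 4 positions.
bias = alibi_bias(num_heads=8, seq_len=4)
print(bias[0][3])  # biases seen by the last query in the first head
```

Because the bias depends only on the distance between positions and not on any learned per-position vector, the same function can be evaluated for a sequence length larger than the one used in training, which is the property that makes length extrapolation possible.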

MPT-7B was an important early example of an openly released, commercially usable foundation model, with variants supporting an unusually long context. Databricks announced its acquisition of MosaicML in June 2023, folding this open-model expertise into a major data platform. For businesses, a commercially licensed open base model removed a key barrier to deploying self-hosted AI in products.
