DeepSeek-Coder

DeepSeek-Coder is a series of open code models from the Chinese AI lab DeepSeek, described in “DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence” (arXiv, January 2024, by Daya Guo and colleagues). The series ranges from 1.3 billion to 33 billion parameters, and each model was trained from scratch on two trillion tokens drawn from a high-quality code corpus organized at the project level rather than as isolated files.

Training used a fill-in-the-blank objective with a 16K-token context window to strengthen both code completion and infilling, the task of filling a gap in code using the surrounding context on both sides. The paper reports state-of-the-art results among open code models across multiple benchmarks and also claims to surpass closed models such as Codex and GPT-3.5. The models were released under a permissive license that allows both research and unrestricted commercial use.
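The infilling idea can be illustrated with a minimal sketch of how a fill-in-the-blank training example might be built. This is not the paper's actual pipeline. The sentinel strings, the function name make_fim_example, the character-level split, and the prefix-then-suffix ordering are all assumptions chosen for illustration; real models define their own special tokens and operate on token sequences.

```python
import random
from typing import Tuple

# Hypothetical sentinel strings for illustration only; real models
# define their own special tokens for infilling.
FIM_PREFIX = "<FIM_PREFIX>"
FIM_SUFFIX = "<FIM_SUFFIX>"
FIM_MIDDLE = "<FIM_MIDDLE>"


def make_fim_example(document: str, rng: random.Random) -> Tuple[str, str]:
    """Cut a document into prefix/middle/suffix and return (prompt, target).

    The prompt shows the model the text on both sides of the gap;
    the target is the hidden middle span it must reproduce.
    """
    if not document:
        raise ValueError("document must be non-empty")
    # Two distinct cut points, sorted, so the middle span is never empty.
    start, end = sorted(rng.sample(range(len(document) + 1), 2))
    prefix = document[:start]
    middle = document[start:end]
    suffix = document[end:]
    # Assumed ordering: prefix, then suffix, then a marker where the
    # model should begin generating the middle.
    prompt = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    return prompt, middle


if __name__ == "__main__":
    source = "def add(a, b):\n    return a + b\n"
    prompt, target = make_fim_example(source, random.Random(0))
    print(prompt)
    print("target:", repr(target))
```

Because the prompt carries the suffix as well as the prefix, a model trained this way can condition on the code after the gap, which ordinary left-to-right completion cannot do.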

DeepSeek-Coder was an early sign that open models from outside the dominant US labs could reach or exceed the quality of well-known proprietary code systems. For businesses, it widened the field of self-hostable coding models and contributed to the broader trend of capable open-weight code AI that organizations can run privately.
