RepoBench

RepoBench is a benchmark for repository-level code auto-completion, introduced in “RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems” (arXiv, June 2023, by Tianyang Liu, Canwen Xu, and Julian McAuley). It addresses a gap in earlier evaluations, which mostly tested single-file tasks and so could not measure how well a system handles real codebases spread across many files.

The benchmark has three connected tasks. RepoBench-R (Retrieval) measures whether a system can find the relevant code snippets in other files to use as context. RepoBench-C (Code Completion) measures how well a system predicts the next line of code given both in-file and cross-file context. RepoBench-P (Pipeline) combines the two: a system must first retrieve the right cross-file context and then use it to complete the code. The benchmark covers both Python and Java.
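The way the three tasks fit together can be sketched in Python. This is a minimal, hypothetical illustration, not RepoBench's official evaluation code. It assumes a token-level Jaccard similarity as the retriever, a prompt that places retrieved cross-file snippets before the in-file context, exact match on the predicted next line as the score, and a stub function in place of a real code model. All function names (`retrieve_top_k`, `build_prompt`, `run_pipeline`, `toy_completer`, and the rest) are made up for this example.

```python
import re
from typing import Callable, List, Sequence, Set

# Hypothetical sketch only: not RepoBench's official harness.
# Assumptions: token-set Jaccard as the retriever, a simple prompt layout,
# exact match as the score, and a stub function standing in for a code model.


def tokenize(code: str) -> Set[str]:
    """Split code into a set of identifier, number, and symbol tokens (toy tokenizer)."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: |A & B| / |A | B|."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve_top_k(query: str, candidates: Sequence[str], k: int = 1) -> List[int]:
    """Retrieval step (cf. RepoBench-R): rank cross-file snippets against the in-file context."""
    # Highest similarity first; ties broken by original index for determinism.
    ranked = sorted(range(len(candidates)),
                    key=lambda i: (-jaccard(query, candidates[i]), i))
    return ranked[:k]


def retrieval_hit(query: str, candidates: Sequence[str], gold: int, k: int = 1) -> bool:
    """True if the gold snippet appears among the top-k retrieved snippets."""
    return gold in retrieve_top_k(query, candidates, k)


def build_prompt(in_file_context: str, snippets: Sequence[str]) -> str:
    """Completion input (cf. RepoBench-C): cross-file snippets, then in-file context."""
    parts = [f"# cross-file context:\n{s}" for s in snippets]
    parts.append(f"# in-file context:\n{in_file_context}")
    return "\n".join(parts)


def exact_match(predicted: str, reference: str) -> bool:
    """Score a next-line prediction, ignoring surrounding whitespace."""
    return predicted.strip() == reference.strip()


def run_pipeline(in_file_context: str, candidates: Sequence[str],
                 complete: Callable[[str], str], k: int = 1) -> str:
    """Pipeline (cf. RepoBench-P): retrieve context, then predict the next line."""
    top = retrieve_top_k(in_file_context, candidates, k)
    prompt = build_prompt(in_file_context, [candidates[i] for i in top])
    return complete(prompt)


def toy_completer(prompt: str) -> str:
    """Hypothetical stand-in for a code model: it can only produce the right
    call if load_config's definition was retrieved into the prompt."""
    if "def load_config(path)" in prompt:
        return "load_config('config.json')"
    return "None"


if __name__ == "__main__":
    # Toy "repository": snippets from other files that might be relevant.
    candidates = [
        "def load_config(path):\n    return json.load(open(path))",
        "class Logger:\n    def warn(self, msg):\n        print(msg)",
    ]
    in_file = "from utils import load_config\n\ncfg = "
    print(retrieve_top_k(in_file, candidates))  # [0]
    prediction = run_pipeline(in_file, candidates, toy_completer)
    print(exact_match(prediction, "load_config('config.json')"))  # True
```

Swapping `toy_completer` for a real model, or the Jaccard scorer for a stronger retriever, changes the components but not the shape of the evaluation: the final score depends on both the retrieval step and the completion step, which is the combination RepoBench-P is meant to test.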

RepoBench reflects how modern coding assistants actually have to work: a useful suggestion often depends on a function or type defined in a different file. For businesses evaluating AI coding tools, this makes repository-level benchmarks like RepoBench likely to be a closer guide to real-world usefulness than tests built from isolated, self-contained functions.

Last verified June 7, 2026