On June 29, 2021, GitHub announced GitHub Copilot in a post titled “Introducing GitHub Copilot: your AI pair programmer.” GitHub described it as a technical preview of a tool that helps developers write code by suggesting completions and entire functions directly inside the editor as a person types. The announcement noted it was developed with OpenAI and was particularly effective with languages including Python, JavaScript, TypeScript, Ruby, and Go.
Copilot was powered by OpenAI Codex, a model trained on code. The accompanying research paper, “Evaluating Large Language Models Trained on Code” (submitted July 7, 2021), authored by Mark Chen, Jerry Tworek, and a large team at OpenAI, documented how a large language model could be trained and measured on programming tasks. It introduced systematic ways to evaluate whether generated code actually works.
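One way to make "whether generated code actually works" concrete is to run each generated sample against unit tests and summarize the results with pass@k, the functional-correctness measure the Codex paper reports. The sketch below is only an illustration under stated assumptions, not the paper's evaluation harness: `passes_tests`, the toy samples, and the test strings are hypothetical, and the sandboxing and timeouts a real harness needs are omitted.

```python
from math import comb


def passes_tests(candidate_source: str, test_source: str) -> bool:
    """Return True if the candidate code satisfies the given assertions.

    Hypothetical helper for illustration only: a real harness would run
    untrusted generated code in a sandbox with timeouts.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # define the generated function
        exec(test_source, namespace)       # assertions raise on failure
    except Exception:
        return False
    return True


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct,
    given that c of n generated samples passed the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Three made-up model outputs for "add two numbers"; the second is wrong.
samples = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return sum([a, b])\n",
]
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

num_correct = sum(passes_tests(s, tests) for s in samples)
score = pass_at_k(n=len(samples), c=num_correct, k=1)
```

With two of the three toy samples passing, `score` is the estimated probability that a single sampled completion is correct. Counting all size-k subsets, rather than simply checking the first k samples, gives a less noisy estimate from the same set of generations.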
Copilot was the first widely visible product to bring large language models into everyday software development. It moved AI code generation from a research curiosity into a tool used by working developers, and it set the stage for the agentic coding tools that followed later in the decade.
Copilot also produced the first major “trained on our code” lawsuit. On November 3, 2022, a proposed class action, Doe 1 v. GitHub (case 4:22-cv-06823, N.D. Cal.), was filed against GitHub, Microsoft, and OpenAI on behalf of developers whose public repositories had been used to train Copilot and the underlying Codex model. The core allegation was that the defendants reproduced open-source code without honoring its licenses, stripping the attribution and license terms that those licenses require. The case predates the better-known NYT v. OpenAI suit (see 2023-nyt-v-openai) and raised the same underlying question one step earlier: whether training a model on copyrighted material, and emitting output that resembles it, is infringement.
The case went largely the defendants’ way. Judge Jon Tigar dismissed most claims with leave to amend in January 2024, and on July 5, 2024 he dismissed the central DMCA Section 1202(b) claim (over removed copyright-management information) with prejudice, reasoning that Copilot’s suggestions were generally not identical enough to the plaintiffs’ own code for that provision to apply. That left only a narrow breach-of-contract theory based on the open-source licenses, a small remainder of the original twenty-two claims. The outcome was an early signal that “you trained on my work” is far easier to allege than to win, and it keeps the library’s coverage of these disputes consistent across both code and journalism. (Sourcing: the CourtListener docket is the primary record of the case and its orders; githubcopilotlitigation.com, run by the plaintiffs’ firms, is the filing site for the complaint. The docket pages return HTTP 403 to automated fetchers, so docket dates and the July 2024 ruling were corroborated through search against the canonical docket.)