Doe v. GitHub: the Copilot copyright case narrows

In November 2022, a group of anonymous programmers (“J. Doe” plaintiffs) sued GitHub, its parent Microsoft, and OpenAI over GitHub Copilot, the AI coding assistant built on OpenAI’s Codex model and trained on billions of lines of public code from GitHub repositories. The plaintiffs argued that Copilot reproduced their open-source code without honoring the attribution and license terms attached to it, and that this stripped away “copyright management information” in violation of section 1202(b) of the Digital Millennium Copyright Act. The case, Doe v. GitHub (No. 4:22-cv-06823, N.D. Cal.), was the first major lawsuit testing how copyright applies to an AI system trained on source code.

Judge Jon S. Tigar progressively narrowed the case. In a February 2024 order he dismissed many claims, and on July 5, 2024, he dismissed the central DMCA section 1202(b) claim with prejudice. The key reasoning was the “identicality” requirement: the court agreed with the defendants that a section 1202(b) claim for removing copyright management information requires the allegedly infringing output to be identical to the original work, and the plaintiffs had not shown that Copilot produced identical copies of their code. The surviving claims focused on breach of the open-source license contracts rather than copyright. The dismissal was later taken up on interlocutory appeal to the Ninth Circuit.

Why business readers should care: Copilot was one of the first widely deployed generative-AI products, and this case set an early bar for how hard it is to win copyright claims when an AI is trained on protected material but does not copy it verbatim. The “identicality” hurdle continues to shape how plaintiffs frame AI infringement suits.