Part 3 of the US Copyright Office’s “Copyright and Artificial Intelligence” report, released as a pre-publication version on May 9, 2025, addressed the most consequential legal question in the field: whether using copyrighted works to train generative AI models is fair use. The Office declined to offer a blanket answer, stressing that fair use requires balancing all four statutory factors case by case, but it gave clear signals about where the lines likely fall.
The report found that AI training is most likely to be transformative, and thus closer to fair use, when the purpose is research or a closed, non-substitutive task. It is far less likely to qualify, the Office concluded, when developers make “commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access.” In other words, training on legally obtained data for a narrow purpose looks defensible; training on pirated material to build a commercial product that floods the same market as the original works likely “goes beyond established fair use boundaries.” Rather than recommend new legislation or a compulsory license, the Office urged the market to develop voluntary licensing solutions.
The report’s release was overshadowed by controversy: within days of its appearance, both the Librarian of Congress and the head of the Copyright Office were dismissed, raising questions about political pressure.
Why business readers should care: this report is the federal government’s most detailed view on the central legal risk in AI development. It strongly suggests that two things will determine whether companies face liability: how training data is acquired (licensed versus pirated) and what the model is used for.