EU DSM Directive: the text-and-data-mining opt-out

Directive (EU) 2019/790 on copyright in the Digital Single Market, adopted on April 17, 2019, created the European copyright framework that now governs how protected works can be used to train AI. It introduced two text-and-data-mining (TDM) exceptions. Article 3 grants research organizations and cultural heritage institutions a mandatory, non-waivable right to mine lawfully accessible works for scientific research. Article 4 is broader: it permits anyone, including commercial actors, to perform TDM on lawfully accessible works, but with a crucial catch. Rightsholders may “expressly reserve” their works from mining, and for content made publicly available online this opt-out must be machine-readable.
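
The interplay between the two exceptions can be sketched as a small decision function. This is a deliberately simplified model under stated assumptions: the names `Basis`, `MiningRequest`, and `applicable_exception` are hypothetical, each legal test is reduced to a yes/no flag, and the model ignores differences in national transposition, licences, and any other exceptions.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    """Which DSM Directive TDM exception, if any, covers a mining activity."""
    ARTICLE_3 = "scientific research exception"
    ARTICLE_4 = "general TDM exception"
    NONE = "no TDM exception applies"


@dataclass
class MiningRequest:
    # Hypothetical yes/no flags standing in for the legal tests.
    lawful_access: bool               # the miner has lawful access to the work
    research_or_heritage_body: bool   # research organization or cultural heritage institution
    scientific_research: bool         # the mining is carried out for scientific research
    rights_reserved: bool             # the rightsholder has expressly reserved the work


def applicable_exception(req: MiningRequest) -> Basis:
    # Both exceptions require lawful access to the work.
    if not req.lawful_access:
        return Basis.NONE
    # Article 3: qualifying bodies doing scientific research are not affected by a reservation.
    if req.research_or_heritage_body and req.scientific_research:
        return Basis.ARTICLE_3
    # Article 4: open to anyone, but only if the rightsholder has not opted out.
    if not req.rights_reserved:
        return Basis.ARTICLE_4
    return Basis.NONE


# A commercial developer mining a work whose rightsholder has opted out:
commercial = MiningRequest(lawful_access=True, research_or_heritage_body=False,
                           scientific_research=False, rights_reserved=True)
print(applicable_exception(commercial).name)  # NONE
```

The ordering of the checks carries the legal point: the Article 3 test runs before the reservation flag is consulted, because Article 3 contains no opt-out mechanism, while Article 4 yields to one.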

The Article 4 opt-out has become the legal hinge of the European AI debate. Because assembling AI training corpora generally involves text and data mining, the directive effectively means that AI developers may train on lawfully accessible works unless the rightsholder has signaled “no” in a machine-readable way (for example, through website terms and conditions or metadata). This shifts the burden onto creators to declare a reservation, and it has spurred the development of standards for expressing such opt-outs at web scale. The EU AI Act (Regulation (EU) 2024/1689) later reinforced this: Article 53 requires providers of general-purpose AI models to put in place a policy to identify and comply with these reservations, including through state-of-the-art technologies.
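
In practice, honoring a reservation means looking for machine-readable signals before mining a page. One such standard is the TDM Reservation Protocol (TDMRep), developed in a W3C Community Group, which defines a `tdm-reservation` HTTP response header and an equivalent HTML `<meta name="tdm-reservation">` tag, where the value `1` means rights are reserved. The sketch below checks those two signals plus robots.txt. It is a minimal sketch under assumptions: the function `tdm_reserved` and the crawler name `ExampleAIBot` are hypothetical, treating a robots.txt disallow as a valid reservation is an assumption whose legal sufficiency is debated, and it inspects already-fetched responses rather than fetching pages or TDMRep's site-wide `/.well-known/tdmrep.json` file.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _TdmMetaParser(HTMLParser):
    """Collects the content of <meta name="tdm-reservation" ...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "tdm-reservation":
            self.values.append(attr_map.get("content", "").strip())


def tdm_reserved(url: str, headers: dict[str, str], html: str,
                 robots_txt: str, user_agent: str) -> bool:
    """Hypothetical helper: True if any checked signal reserves the page from TDM."""
    # 1. TDMRep HTTP header: "tdm-reservation: 1" means rights are reserved.
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    if normalized.get("tdm-reservation") == "1":
        return True

    # 2. TDMRep HTML meta tag with the same semantics.
    parser = _TdmMetaParser()
    parser.feed(html)
    if "1" in parser.values:
        return True

    # 3. Assumption: a robots.txt disallow for this crawler is treated as a reservation.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return not robots.can_fetch(user_agent, url)


robots = "User-agent: ExampleAIBot\nDisallow: /\n"
print(tdm_reserved("https://example.com/article", {}, "<p>Hello</p>",
                   robots, "ExampleAIBot"))  # True
```

Treating any single signal as a reservation is a conservative design choice: publishers express opt-outs in different ways, and the sketch errs on the side of honoring them.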

Member states were required to transpose the directive into national law by June 7, 2021; many missed that deadline, and national implementations differ in their wording and details.

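Why business readers should care: the EU’s opt-out model is fundamentally different from the US fair-use approach, under which courts decide case by case whether a particular use of copyrighted material for training is lawful, rather than asking whether the rightsholder opted out. Companies training models on European content must check for and honor rights reservations, and creators who want to block AI training must take active, machine-readable steps to do so.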
