Reddit licenses its data for AI training and goes public

On February 22, 2024, Reddit filed its Form S-1 registration statement with the US Securities and Exchange Commission, ahead of its March IPO. The filing was unusual for a social platform: it presented Reddit’s nearly two decades of user-written discussion as a licensable data asset for training AI. Reddit disclosed that it had entered into “data licensing arrangements” with partners. The best known is Google, which announced an expanded partnership with Reddit the same week; the deal, reported to be worth roughly $60 million a year, gives Google access to Reddit’s Data API for AI training.

The timing - announcing the licensing model the same week it filed to go public - made the strategy explicit. Reddit’s conversational, human-written archive is exactly the kind of high-signal text that language models train on, and rather than let it be scraped for free, Reddit moved to sell structured, real-time access through its Data API. This followed Reddit’s controversial 2023 decision to start charging for that API, which had set the stage for treating the data as a revenue source.

The deal helped set a template. Other content owners, from Stack Overflow to news publishers, struck similar licensing agreements with AI developers, and Reddit’s own filing put the combined contract value of its data-licensing arrangements at about $203 million over terms of two to three years.

Why business readers should care: Reddit turned a cost center (hosting user discussion) into a recurring revenue line by licensing it for AI training. For any organization sitting on a large, distinctive corpus of human-generated content, this is the emerging playbook, and a reminder that “your users’ data” can be both a liability and a monetizable asset in the AI era.