In a blog post published on November 1, 2024, the Big Sleep team, a collaboration between Google Project Zero and Google DeepMind, reported that their AI agent had found a previously unknown, exploitable memory-safety vulnerability in real-world software. The agent, an evolution of an earlier project called Naptime, uses a large language model to carry out vulnerability research the way a human analyst would.
The bug was a stack buffer underflow in SQLite, one of the most widely deployed open-source database engines in the world. According to the post, the flaw stemmed from improper handling of a sentinel value of -1 in a code path within the engine’s query optimizer: when that value was used as an array index, the write landed below the start of a buffer allocated on the stack. The Big Sleep agent worked through the code, reasoned about how the function could be misused, and surfaced the issue. The SQLite developers were notified in early October and patched the bug the same day, before it appeared in any official release, so users were never exposed.
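To make the bug class concrete, here is a minimal Python simulation of how a sentinel of -1, used as an index without a check, can write below a buffer. It is an illustrative sketch only, not SQLite's code: the frame layout and the names `ROWID_SENTINEL`, `make_frame`, `unchecked_write`, and `checked_write` are all hypothetical, and a `bytearray` stands in for stack memory.

```python
# Hypothetical sentinel meaning "this is not an ordinary column".
ROWID_SENTINEL = -1

# Simulated stack frame: one byte of neighbouring data, then a 3-slot buffer.
BUF_BASE = 1   # offset of the buffer's first slot inside the frame
BUF_LEN = 3    # number of slots in the buffer


def make_frame() -> bytearray:
    """Return a fresh frame: [neighbouring byte, slot 0, slot 1, slot 2]."""
    return bytearray([0xAA, 0, 0, 0])


def unchecked_write(frame: bytearray, column: int, value: int) -> None:
    """Mimic C-style base + index addressing with no bounds check."""
    frame[BUF_BASE + column] = value


def checked_write(frame: bytearray, column: int, value: int) -> bool:
    """Reject any index outside the buffer, including the sentinel."""
    if not 0 <= column < BUF_LEN:
        return False
    frame[BUF_BASE + column] = value
    return True


if __name__ == "__main__":
    frame = make_frame()
    unchecked_write(frame, ROWID_SENTINEL, 0x41)
    print("unchecked:", frame.hex())   # neighbouring byte is overwritten

    frame = make_frame()
    ok = checked_write(frame, ROWID_SENTINEL, 0x41)
    print("checked:  ", frame.hex(), "accepted" if ok else "rejected")
```

In the unchecked version, the sentinel resolves to the byte just before the buffer, so data the function never meant to touch is overwritten. In a real C program that neighbouring memory could hold other local variables or control data, which is why this class of bug can be exploitable. The checked version shows the usual shape of a fix: validate the index before using it.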
The team called this the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software. They were careful to note prior related work, including a null-pointer issue found by another team at a DARPA event, while arguing that this finding represented a more serious vulnerability class.
For a business reader, Big Sleep is an early, concrete sign that AI agents can contribute to defensive security by finding genuine bugs in important software. The same capability, in the wrong hands, could accelerate offensive vulnerability discovery.