Nicholas Carlini

Nicholas Carlini is a researcher working at the intersection of machine learning and computer security, and one of the most prolific authors in the field of adversarial machine learning. He earned his Ph.D. at the University of California, Berkeley under David Wagner, after a Berkeley undergraduate degree in computer science and mathematics. He spent 2018 to 2023 at Google Brain and 2023 to 2025 at Google DeepMind, and is now a research scientist at Anthropic.

His research repeatedly shows that machine learning systems are less secure than they appear. He is known for systematically breaking proposed defenses against adversarial examples, including the influential “obfuscated gradients” work, which demonstrated that many published defenses provided only an illusion of robustness. He is a central contributor to several major privacy and data-extraction results, including the 2020 demonstration that GPT-2 memorizes and leaks training data, the 2023 divergence attack that extracted training data from production ChatGPT, and the 2023 paper showing that poisoning web-scale datasets is practical and cheap. He also co-authored the 2023 work on universal and transferable adversarial attacks that jailbreak aligned language models.

His work has earned multiple best-paper awards at top venues in security and machine learning, including IEEE Security and Privacy, USENIX Security, and ICML, and has been covered by outlets including the New York Times, the BBC, Nature, and Science. Outside his research he is also known for playful programming feats, such as a winning entry in the 2020 International Obfuscated C Code Contest.

Carlini’s body of work has been foundational in establishing adversarial machine learning, model privacy, and rigorous robustness evaluation as serious sub-disciplines, and in pushing the field toward attack-first, empirically tested claims about model security.
