Christopher Olah is a machine learning researcher who works on interpretability, the effort to understand what is happening inside neural networks. On his blog he sums up his goal plainly: “I work on reverse engineering artificial neural networks into human understandable algorithms.”
Olah is a co-founder of Anthropic, an AI lab focused on the safety of large models. Earlier he led interpretability research at OpenAI and worked at Google Brain, and he co-founded Distill, which he describes as “a scientific journal focused on outstanding communication.” Distill was known for interactive, visually rich explanations of machine learning ideas. His own early work on feature visualization and on the “circuits” inside vision models helped establish mechanistic interpretability as a research program.
At Anthropic his team has produced work such as a 2024 study that mapped millions of interpretable features inside a production language model. Olah notes on his site that his blog "should not be taken to reflect the views of any organization I'm affiliated with." The disclaimer marks the blog as personal exposition, which is where much of the field's clearest writing first appeared.