Evo (DNA foundation models)

Evo is a family of genomic foundation models developed at the Arc Institute, with collaborators at Stanford and UC Berkeley. Where most biological AI models specialize in proteins or in single cells, Evo operates at the level of raw DNA, learning patterns across the nucleotide sequences that encode genes, RNA, and proteins all at once.

The first model, Evo, was published in Science in November 2024. It has 7 billion parameters, was trained on a corpus of prokaryotic and phage genomes, and reads sequences up to 131 kilobases long at single-nucleotide resolution. Its hybrid architecture, StripedHyena, interleaves attention layers with Hyena layers (gated long convolutions closely related to state-space models) to handle these very long sequences efficiently. A second generation, Evo 2, followed in 2025 with a much larger training set spanning all domains of life and a longer context window of up to 1 million base pairs.
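To make "single-nucleotide resolution" concrete, here is a minimal Python sketch that assumes a character-level scheme in which every base becomes exactly one token. The function names, the allowed alphabet, and the MAX_CONTEXT constant (rounded from the "131 kilobases" figure above) are illustrative assumptions, not Evo's actual code.

```python
# Illustrative sketch only: one token per nucleotide, using each character's
# ASCII byte value as its token id. Not taken from the Evo codebase.

MAX_CONTEXT = 131_000  # hypothetical constant, rounded from "131 kilobases"
ALLOWED = set("ACGTN")  # assumed alphabet for this example


def tokenize(seq: str) -> list[int]:
    """Map each nucleotide character to one integer token (its byte value)."""
    seq = seq.upper()
    unknown = set(seq) - ALLOWED
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return list(seq.encode("ascii"))


def chunk(tokens: list[int], size: int = MAX_CONTEXT) -> list[list[int]]:
    """Split a long token list into consecutive windows of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


tokens = tokenize("acgtnACGT")
print(len(tokens))  # one token per base
```

Because there is no k-mer or subword merging in this scheme, a single-base substitution changes exactly one token, which is what lets a model reason about individual mutations.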

The models support both analysis and design. On the analysis side, they predict the effects of mutations and aspects of gene function, such as gene essentiality, without task-specific training. On the design side, they generate plausible new sequences, including molecular machines such as CRISPR-Cas systems, some of which the Arc team validated experimentally.
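One common way a sequence model can score a mutation without task-specific training is to compare the log-likelihood it assigns to the mutated sequence with that of the original. The Python sketch below illustrates the idea with a toy Markov model standing in for a real genomic language model. ToyMarkovModel, mutation_score, and the training sequences are all hypothetical and are not part of any Evo API.

```python
import math
from collections import defaultdict


class ToyMarkovModel:
    """Order-k nucleotide Markov model with add-one smoothing.

    A hypothetical stand-in for a large genomic language model; it only
    needs to expose a log_likelihood(sequence) method.
    """

    def __init__(self, k: int = 2):
        self.k = k
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        # Count how often each base follows each length-k context.
        for s in sequences:
            for i in range(self.k, len(s)):
                self.counts[s[i - self.k:i]][s[i]] += 1

    def log_likelihood(self, s: str) -> float:
        total = 0.0
        for i in range(self.k, len(s)):
            ctx = self.counts.get(s[i - self.k:i], {})
            num = ctx.get(s[i], 0) + 1        # add-one smoothing
            den = sum(ctx.values()) + 4       # four possible bases
            total += math.log(num / den)
        return total


def mutation_score(model, wild_type: str, position: int, alt_base: str) -> float:
    """Log-likelihood of the mutant minus that of the wild type.

    Negative values mean the model finds the mutant less plausible.
    """
    mutant = wild_type[:position] + alt_base + wild_type[position + 1:]
    return model.log_likelihood(mutant) - model.log_likelihood(wild_type)


model = ToyMarkovModel(k=2)
model.fit(["ACGTACGTACGTACGT"] * 5)  # made-up training data
print(mutation_score(model, "ACGTACGTACGT", 5, "T"))
```

In this toy setting, the substitution breaks a pattern the model has seen repeatedly, so the score comes out negative. With a real model the same comparison would be made using the network's own sequence likelihoods.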

For a general reader, Evo represents the arrival of foundation models for genomics: large, general-purpose networks pretrained on the language of DNA that can be adapted to many downstream biological questions, much as GPT-style models are adapted across text tasks.