DeepVariant: a universal SNP and indel caller using deep neural networks

In September 2018 Nature Biotechnology published “A universal SNP and small-indel variant caller using deep neural networks” by Ryan Poplin, Pi-Chuan Chang, and colleagues at Google. It introduced DeepVariant, a tool that reframed variant calling, a core problem in genomics, as an image-recognition task.

Reading a genome with current technology means sequencing it in many short overlapping fragments, then deciding at each position whether the differences from a reference are real variants or sequencing errors. DeepVariant turns the stacked, aligned reads around each candidate site into an image, then uses a convolutional neural network, the same kind used for photo classification, to call the genotype. This let the authors borrow mature image-classification techniques rather than hand-coding statistical models of sequencing error.
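
To make the "reads as an image" idea concrete, here is a minimal Python sketch of stacking aligned reads into a small grid around a candidate site. It is a toy under stated assumptions, not DeepVariant's actual encoding: the names build_pileup, mismatch_fraction, and BASE_CODE are hypothetical, reads are assumed to align without gaps, and the integer base codes stand in for the richer image channels a real caller would use. The hand-written mismatch_fraction summary is included only for contrast, as the kind of statistic a convolutional network would replace by learning from the whole grid.

```python
# Toy illustration only: this is NOT DeepVariant's actual pileup encoding.
BASE_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}  # 0 is reserved for "no coverage"


def build_pileup(reference, reads, center, half_width=2):
    """Stack aligned reads over a window centred on a candidate site.

    reference: reference sequence as a string.
    reads: list of (start, sequence) pairs, where start is a 0-based
           reference coordinate; gapless alignments are assumed.
    Returns a list of rows: row 0 is the reference, then one row per read.
    """
    window = range(center - half_width, center + half_width + 1)

    def encode(start, seq):
        row = []
        for pos in window:
            offset = pos - start
            base = seq[offset] if 0 <= offset < len(seq) else None
            row.append(BASE_CODE.get(base, 0))
        return row

    rows = [encode(0, reference)]
    rows.extend(encode(start, seq) for start, seq in reads)
    return rows


def mismatch_fraction(pileup):
    """Fraction of covering reads that differ from the reference at the centre column."""
    mid = len(pileup[0]) // 2
    ref_code = pileup[0][mid]
    covering = [row[mid] for row in pileup[1:] if row[mid] != 0]
    if not covering:
        return 0.0
    return sum(code != ref_code for code in covering) / len(covering)


# Four made-up reads over a made-up reference; two carry a G where the
# reference has a C at position 5.
reference = "ACGTACGTAC"
reads = [(2, "GTACGT"), (3, "TAGGTA"), (1, "CGTAGG"), (4, "ACGTAC")]
pileup = build_pileup(reference, reads, center=5)

for row in pileup:
    print(row)
print(mismatch_fraction(pileup))
```

Each row of the grid is one read and each column one reference position, so a real variant tends to show up as a consistent vertical stripe while random sequencing errors appear as scattered cells. That spatial pattern is what makes an image classifier a natural fit.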

DeepVariant outperformed existing state-of-the-art variant callers on standard benchmarks and, notably, generalized well: a model trained on human data could call variants across genome builds and in other mammalian species. The authors also showed that the same approach could be retrained for different sequencing technologies and experimental designs. That transferability meant non-human genome projects could benefit from the large, carefully curated human ground-truth datasets.

For a general reader, DeepVariant is a clean example of cross-domain transfer: a method built for recognizing cats and street signs, repurposed to read the letters of DNA more accurately. It has been widely adopted in research and clinical sequencing pipelines where accurate variant calls underpin diagnosis and discovery.
