scGPT

scGPT is a foundation model for single-cell biology described in the 2024 Nature Methods paper “scGPT: toward building a foundation model for single-cell multi-omics using generative AI,” by Haotian Cui, Chloe Wang, Bo Wang, and colleagues at the University of Toronto, the Vector Institute, and Microsoft Research. It adapts the generative, GPT-style approach to the gene-expression data of individual cells.

The model was pretrained on more than 33 million single-cell RNA-sequencing profiles. From this data it learns general representations of cells and genes that can be fine-tuned for a range of downstream tasks: classifying cell types, correcting for batch effects when combining datasets from different experiments, inferring gene regulatory networks, and predicting how cells respond to perturbations. The authors also showed that it extends beyond RNA to other modalities, such as chromatin accessibility and protein abundance.
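
The reuse pattern described above can be illustrated with a minimal sketch in Python. It does not call scGPT or any of its code: the "embeddings" are random stand-ins for what a pretrained model might produce, the two cell types are hypothetical, and the nearest-centroid classifier is a deliberately simple substitute for fine-tuning, fitted on frozen embeddings instead of updating the network itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-cell embeddings from a pretrained model.
# ASSUMPTION: these are synthetic vectors, not real scGPT output.
n_per_type, dim = 50, 16
centres = rng.normal(size=(2, dim)) * 3.0  # one centre per hypothetical cell type
embeddings = np.vstack(
    [centres[k] + rng.normal(size=(n_per_type, dim)) for k in range(2)]
)
labels = np.repeat([0, 1], n_per_type)  # 0 and 1 are placeholder cell-type labels


def fit_centroids(x, y):
    """Return the class labels and the mean embedding of each class."""
    classes = np.unique(y)
    return classes, np.stack([x[y == c].mean(axis=0) for c in classes])


def predict(x, classes, centroids):
    """Assign each cell to the class whose centroid is nearest (Euclidean)."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Deterministic split: first 35 cells of each type for fitting, the rest held out.
train_mask = np.tile(np.arange(n_per_type) < 35, 2)
classes, centroids = fit_centroids(embeddings[train_mask], labels[train_mask])
predicted = predict(embeddings[~train_mask], classes, centroids)
accuracy = float((predicted == labels[~train_mask]).mean())
print(f"held-out accuracy: {accuracy:.2f}")
```

The point of the sketch is the division of labour: the expensive representation is computed once, and each downstream task adds only a small, cheap component on top of it.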

Like Geneformer, scGPT embodies the bet that single-cell biology will follow text and images into the foundation-model era, where one large pretrained network is reused across many specific problems rather than a fresh model being built for each.

For a general reader, scGPT is a concrete sign of how quickly the recipes behind ChatGPT have been ported into the life sciences: it turns measurements of which genes are active in millions of cells into a reusable engine for biological discovery.

Sources

Last verified June 7, 2026