UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

“UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction” was posted to arXiv by Leland McInnes, John Healy, and James Melville on February 9, 2018. UMAP is a general-purpose technique for reducing high-dimensional data to a low-dimensional embedding, most often used for visualization.

The method is grounded in ideas from Riemannian geometry and algebraic topology. In practice it builds a weighted graph connecting each point to its nearest neighbors, then optimizes a low-dimensional layout whose graph structure matches the original as closely as possible. The authors argue that UMAP is competitive with t-SNE on visualization quality while running considerably faster and preserving more of the data’s global structure. It also places no computational restriction on the number of output dimensions, which lets it serve as a general dimensionality-reduction step rather than only a plotting tool.
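
As a rough illustration of the two stages described above (neighbor graph, then layout), here is a toy sketch in plain Python. It is not the authors' algorithm: it assumes Euclidean distance, a brute-force neighbor search, and a simple attract-and-repel update, whereas the actual method weights the graph edges, minimizes a cross-entropy objective, and uses approximate neighbor search. The function names knn_graph and toy_layout, and all parameter values, are made up for this example.

```python
import math
import random


def knn_graph(points, k):
    """For each point, list the indices of its k nearest neighbors (Euclidean)."""
    graph = []
    for i, p in enumerate(points):
        # Brute-force search: fine for a toy, far too slow for large data.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in dists[:k]])
    return graph


def toy_layout(graph, dim=2, epochs=200, lr=0.05, seed=0):
    """Toy layout: pull graph neighbors together, push sampled non-neighbors apart.

    This is a hypothetical stand-in for UMAP's optimization step, not a
    reimplementation of it.
    """
    rng = random.Random(seed)
    n = len(graph)
    emb = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(epochs):
        for i in range(n):
            for j in graph[i]:
                # Attraction: move point i a small step toward its neighbor j.
                for d in range(dim):
                    emb[i][d] += lr * (emb[j][d] - emb[i][d])
                # Repulsion: sample one random point; skip it if it is i itself
                # or one of i's neighbors.
                m = rng.randrange(n)
                if m == i or m in graph[i]:
                    continue
                diff = [emb[i][d] - emb[m][d] for d in range(dim)]
                sq = sum(x * x for x in diff) + 0.01  # offset keeps the step bounded
                for d in range(dim):
                    emb[i][d] += lr * diff[d] / sq
    return emb


# Two small, well-separated groups of 3-D points (made-up data).
points = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (10, 10, 10), (10, 11, 10), (11, 10, 10)]
graph = knn_graph(points, k=2)
embedding = toy_layout(graph, dim=2)
for idx, coords in enumerate(embedding):
    print(idx, [round(c, 3) for c in coords])
```

Because the layout step only ever sees the neighbor graph, the dim argument can be set to any target dimension, which mirrors the point that the output need not be two-dimensional.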

UMAP spread quickly through bioinformatics, where it became a default for visualizing single-cell genomics data, and into general data science. Like t-SNE, it requires judgment to read: hyperparameters such as the neighborhood size and the minimum distance between embedded points change the appearance of the embedding, and the distances between well-separated clusters should not be over-interpreted.

Why business readers should care: UMAP made it practical to visualize and cluster very large datasets on ordinary hardware, which is why it shows up in everything from cell biology pipelines to customer-segmentation dashboards.
