Microsoft builds a dedicated supercomputer for OpenAI on Azure

On May 19, 2020, at its Build developer conference, Microsoft announced that it had built a supercomputer hosted in Azure “exclusively for OpenAI” to train that lab’s large AI models. Microsoft described it as a single system with “more than 285,000 CPU cores,” 10,000 GPUs, and 400 gigabits per second of network connectivity for each GPU server, and said it would rank “in the top five” of publicly disclosed supercomputers in the world.

The machine was the first concrete deliverable of the partnership Microsoft and OpenAI had announced in 2019, when Microsoft invested $1 billion and OpenAI agreed to build its models on Azure. Rather than buy hardware piecemeal, OpenAI gained access to a purpose-built cluster, and Microsoft gained a flagship demonstration of Azure’s ability to host frontier-scale AI training.

This supercomputer was the infrastructure on which OpenAI trained GPT-3, the 175-billion-parameter model released later in 2020 that demonstrated few-shot learning at scale. It set the template for the next several years: frontier labs partnering with hyperscale cloud providers to access compute that would be impractical to build alone.

Why business readers should care: the 2020 Azure supercomputer marked the start of the era in which access to dedicated, top-five-scale compute became a competitive moat. The deal showed how a cloud provider and an AI lab could fuse their fortunes - infrastructure for one, a flagship customer for the other - a pattern now repeated across the industry.

Microsoft builds a dedicated supercomputer for OpenAI on Azure

Sources

Related