Computer Use: AI agents that operate software

On October 22, 2024, Anthropic announced an upgraded Claude 3.5 Sonnet and a new Claude 3.5 Haiku alongside a capability it called “computer use,” released in public beta. The idea was simple to state and hard to build: rather than calling purpose-built tools through an API, Claude would operate ordinary software the way a person does, by “looking at a screen, moving a cursor, clicking buttons, and typing text.” Anthropic described it as teaching Claude “general computer skills, allowing it to use a wide range of standard tools and software programs designed for people.” Anthropic billed it as the first frontier AI model to offer computer use in public beta, and it was candid that the capability remained “experimental, at times cumbersome and error-prone,” advising developers to start with low-risk tasks.

Three months later, on January 23, 2025, OpenAI introduced a research preview of Operator, an agent that “can go to the web to perform tasks for you.” Operator was powered by a model OpenAI called the Computer-Using Agent (CUA), which combined GPT-4o’s vision with reasoning trained through reinforcement learning. Like Claude’s computer use, it interacted with graphical interfaces “just as humans do,” with Operator using its own browser to type, click, and scroll. OpenAI reported state-of-the-art results on agent benchmarks such as OSWorld and WebVoyager and released Operator first to Pro-tier users in the United States.

Taken together, these two releases mark the pivot from agents-in-theory to agents-that-click. For years the industry had talked about autonomous agents (see devin-and-the-agent-hype), but most systems relied on bespoke integrations and brittle scripting. Letting a model perceive a screen and act on it directly meant, in principle, that an agent could use any software a person can use, with no special API required. That generality is the point, and it is also the risk: the same flexibility that lets an agent book travel or fill a form lets it click the wrong button on a live system.
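The perceive-then-act cycle described above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen for illustration: `FakeDesktop`, `Action`, `scripted_policy`, and `run_agent` are not part of Anthropic's or OpenAI's APIs, the "screenshot" is a text description rather than an image, and a hard-coded policy stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One GUI action. All names here are hypothetical, not a vendor API."""
    kind: str          # "type", "click", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real screen: one text field and one Submit button."""
    field_text: str = ""
    submitted: bool = False
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real system would return pixels; a text summary keeps this runnable.
        return f"field={self.field_text!r} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        self.log.append(action.kind)
        if action.kind == "type":
            self.field_text += action.text
        elif action.kind == "click" and (action.x, action.y) == (100, 200):
            # (100, 200) is the assumed location of the Submit button.
            self.submitted = True


def scripted_policy(observation: str) -> Action:
    """Placeholder for the model: maps what is on screen to the next action."""
    if "field=''" in observation:
        return Action("type", text="hello")
    if "submitted=False" in observation:
        return Action("click", x=100, y=200)
    return Action("done")


def run_agent(env: FakeDesktop,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> int:
    """Observe, decide, act, repeat. Returns the number of actions taken."""
    for step in range(max_steps):
        action = policy(env.screenshot())
        if action.kind == "done":
            return step
        env.apply(action)
    return max_steps  # step budget exhausted without the policy finishing
```

The `max_steps` cap reflects the risk noted above: an agent acting on a live interface needs a bound on how long it can keep clicking, and a production system would likely add confirmation steps before consequential actions.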

For business readers, computer use reframed what “automation” could mean. It complements the structured, tool-calling approach standardized by the Model Context Protocol (see 2024-model-context-protocol): MCP gives an agent clean, governed connections to data and services, while computer use gives it a fallback path through the human-facing interface when no such connection exists. Both labs stressed that the early systems were slow and error-prone, but by 2025 agentic operation had become a central thread of frontier-model competition.
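The complementary relationship described above amounts to a routing rule: prefer a structured connection when one exists, and fall back to the screen otherwise. The sketch below is illustrative only. `route_task`, `connectors`, and `drive_gui` are hypothetical names, and the plain dictionary is a stand-in for whatever registry of governed connections a real deployment would expose, not the MCP API itself.

```python
from typing import Callable, Dict, Tuple


def route_task(task: str,
               connectors: Dict[str, Callable[[], str]],
               gui_fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Use a structured connector if one is registered, else drive the GUI."""
    handler = connectors.get(task)
    if handler is not None:
        return ("connector", handler())
    return ("gui", gui_fallback(task))


# Hypothetical registry: one task has a governed, structured connection.
connectors: Dict[str, Callable[[], str]] = {
    "lookup_invoice": lambda: "invoice record via connector",
}


def drive_gui(task: str) -> str:
    # Placeholder for a screen-driving agent handling the task.
    return f"completed {task!r} through the screen"
```

Calling `route_task("lookup_invoice", connectors, drive_gui)` takes the connector path, while a task with no registered connector, such as a form in a legacy application, falls through to `drive_gui`.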

Sources

Related