GPT-4V gives ChatGPT eyes

In September 2023 OpenAI published the GPT-4V(ision) System Card, documenting the addition of image input to GPT-4 and its rollout to ChatGPT users. For most people this was the moment a mainstream chatbot gained eyes: you could upload a photo, a screenshot, a diagram, or a handwritten note and ask questions about it in the same conversation as text.

The system card is a safety document, not a research paper describing a new architecture - OpenAI did not disclose how GPT-4V was built. It covers the capabilities unlocked by adding vision and, at length, the new risks: the model identifying people in photos, reading private information from images, producing confident but wrong descriptions, being jailbroken through text embedded in pictures, and giving unsafe advice when shown medical or chemical images. It also describes the evaluations and mitigations OpenAI applied before broad release, including training the model to refuse requests to identify people in images.

GPT-4V arrived in the same year as the open BLIP-2 and LLaVA models, marking the point where general-purpose vision-language models moved from research demos into products used by millions.

Why business readers should care: GPT-4V turned image understanding into an everyday feature rather than a specialist tool. It let non-technical users do visual tasks - interpreting charts, debugging from screenshots, extracting data from photos of documents - simply by uploading an image, and it set the template for the multimodal assistants that followed.
