On March 14, 2023, the same day OpenAI announced GPT-4, the accessibility company Be My Eyes announced Virtual Volunteer, which it described as “the first-ever digital visual assistant powered by OpenAI’s new GPT-4 language model.” Be My Eyes had spent years connecting blind and low-vision users with sighted human volunteers over live video. Virtual Volunteer added an AI option: a user could send a photo and ask any question about it, and the system would answer in natural, conversational language - identifying objects, reading labels, describing a scene, or, in the company’s own example, looking in a refrigerator and suggesting what to cook.
The launch was significant for two reasons. First, it was the visible debut of GPT-4’s image-understanding capability, the “vision” half of the multimodal model, later widely called GPT-4V. OpenAI’s own GPT-4 announcement noted that the image-input feature was not yet open to all customers and was being tested with a single partner to start; Be My Eyes was that partner. This was therefore not a generic demo but the first real application of the capability, put in front of users with a concrete need. Second, it reframed what an assistive technology could be. Earlier image-to-text tools could attach only a flat caption or a set of labels to a photo; Virtual Volunteer could hold a back-and-forth about what the image showed, supplying context and nuance that a list of detected objects cannot.
The entry belongs in this library as the assistive, plainly beneficial face of the same multimodal capability that elsewhere drove anxiety about deepfakes and synthetic media. The family of models that can generate a convincing fake image is, turned around, the family of models that lets a blind person ask “what does this say?” and get a useful answer. Be My Eyes is the early, concrete demonstration that the generative-AI wave was not only a source of cultural and labor disputes but also, immediately and for a specific group of people, a genuine expansion of what was possible.