Facebook ships automatic alt text for blind users

On April 4, 2016, Facebook’s engineering team announced Automatic Alternative Text (AAT), a feature that generated spoken descriptions of photos for people using screen readers. When a blind or low-vision user navigated to a photo, VoiceOver, the iOS screen reader, would read a short, automatically generated caption such as “image may contain: two people, smiling, outdoors.”

The system used a deep convolutional neural network trained to recognize visual concepts, and it grouped the resulting detections into people, objects, and scenes. To avoid misleading users, the team limited AAT to about 100 carefully chosen concepts (covering appearance, nature, transportation, sports, food, and settings) and only surfaced a tag when the model’s precision for that concept exceeded 0.8. The work was informed by a study Facebook ran with Cornell University, which found that blind people wanted to engage with visual content but felt excluded from photo-centered conversations. The announcement also cited the roughly 39 million people worldwide who are blind. AAT launched first on iOS, in English, in the US, UK, Canada, Australia, and New Zealand.
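The filtering described above can be sketched in a few lines of Python. This is not Facebook’s code. The `Concept` record, the concept names, the helpers `pick_threshold` and `build_caption`, and the cutoff values are all hypothetical. The sketch also assumes one common way of enforcing a precision target: tune a per-concept confidence cutoff on labeled validation photos, treating 0.8 as an inclusive bound. The article does not describe how the threshold was applied.

```python
from dataclasses import dataclass
from typing import Optional

# Order in which categories are read out, following the article's grouping.
CATEGORY_ORDER = ("people", "objects", "scenes")


@dataclass(frozen=True)
class Concept:
    """One allow-listed concept (hypothetical record, not Facebook's schema)."""
    name: str
    category: str     # one of CATEGORY_ORDER
    threshold: float  # confidence cutoff chosen offline for this concept


def pick_threshold(scores: list[float], labels: list[bool],
                   min_precision: float = 0.8) -> Optional[float]:
    """Lowest confidence cutoff at which precision on labeled validation data
    is at least min_precision; None means the concept never qualifies."""
    best = None
    for cutoff in sorted(set(scores), reverse=True):
        kept = [label for score, label in zip(scores, labels) if score >= cutoff]
        if sum(kept) / len(kept) >= min_precision:
            best = cutoff
    return best


def build_caption(detections: dict[str, float],
                  concepts: dict[str, Concept]) -> Optional[str]:
    """Turn raw model confidences for one photo into an alt-text string."""
    by_category: dict[str, list[tuple[float, str]]] = {c: [] for c in CATEGORY_ORDER}
    for name, score in detections.items():
        concept = concepts.get(name)
        if concept is None:
            continue  # not on the allow-list: never spoken, however confident
        if score >= concept.threshold:
            by_category[concept.category].append((score, name))
    tags = []
    for category in CATEGORY_ORDER:
        # Within a category, read the most confident tags first.
        tags.extend(name for _, name in sorted(by_category[category], reverse=True))
    if not tags:
        return None  # assumption: caller falls back to a plain "image" label
    return "image may contain: " + ", ".join(tags)


if __name__ == "__main__":
    concepts = {
        "two people": Concept("two people", "people", 0.6),
        "smiling": Concept("smiling", "people", 0.7),
        "car": Concept("car", "objects", 0.9),
        "outdoors": Concept("outdoors", "scenes", 0.5),
    }
    detections = {"outdoors": 0.95, "smiling": 0.8, "two people": 0.9,
                  "car": 0.4, "dog": 0.99}
    print(build_caption(detections, concepts))
    # prints: image may contain: two people, smiling, outdoors
```

The allow-list check runs before the confidence check, which mirrors the design choice the article describes. A concept outside the list (here, "dog") is never spoken, however confident the model is. A listed concept that misses its cutoff (here, "car") is simply left out rather than guessed at.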

Why business readers should care: AAT was one of the earliest deployments of computer vision for accessibility at the scale of a billion-user product. Its conservative design (a tight concept list and a high precision threshold) is a lesson in shipping imperfect AI responsibly: constrain what the system will claim rather than let it guess.