In October 2018, Reuters reporter Jeffrey Dastin revealed that Amazon had quietly built and then abandoned an experimental AI tool meant to automate resume screening. A team had worked on it since 2014, aiming for a system that could read applications and rate candidates from one to five stars, much like Amazon rates products, so recruiters could focus on the top picks.
The problem was the training data. The model learned from a decade of resumes submitted to Amazon, which skewed heavily male in the technical roles it was meant to fill. The system absorbed that pattern and, in effect, taught itself that male candidates were preferable. According to the report, it penalized resumes containing the word “women’s,” as in “women’s chess club captain,” and downgraded graduates of two all-women’s colleges. Engineers neutralized those specific terms, but that gave no guarantee the model would not find other proxies for gender. Separately, there were broader concerns that it recommended unqualified candidates almost at random.
Amazon told Reuters the tool was never used by its recruiters to evaluate candidates, and the team behind it was eventually disbanded. The story nonetheless became one of the most cited real-world examples of biased AI, precisely because it showed a sophisticated, well-resourced company stumbling on a problem that is easy to create and hard to remove.
Why a business reader should care: a hiring model trained on a company’s own historical decisions will faithfully reproduce that history, including its imbalances. Removing an obvious biased signal does not fix the problem when the model can rediscover the same bias through correlated proxies.
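For readers who want to see the proxy problem in miniature, here is a toy sketch in Python. It is not Amazon's system: the eight resumes, the token `college_x`, and the scoring rule (a token's historical hire rate minus the overall hire rate) are all invented for illustration. It only shows that blocking one obvious term leaves a correlated term carrying the same penalty.

```python
from collections import defaultdict

# Hypothetical, hand-made hiring history: (tokens on a resume, 1 if hired).
# "college_x" is an invented stand-in for any feature correlated with gender.
HISTORY = [
    ({"python", "chess_club"}, 1),
    ({"java", "robotics"}, 1),
    ({"python", "robotics"}, 1),
    ({"java", "chess_club"}, 1),
    ({"python", "womens", "college_x"}, 0),
    ({"java", "womens", "college_x"}, 0),
    ({"python", "college_x"}, 0),
    ({"java", "womens"}, 0),
]


def token_scores(history, blocked=frozenset()):
    """Toy scoring rule: hire rate among resumes containing a token,
    minus the overall hire rate. Tokens in `blocked` are ignored."""
    overall = sum(hired for _, hired in history) / len(history)
    outcomes = defaultdict(list)
    for tokens, hired in history:
        for token in tokens - set(blocked):
            outcomes[token].append(hired)
    return {t: sum(v) / len(v) - overall for t, v in outcomes.items()}


before = token_scores(HISTORY)
after = token_scores(HISTORY, blocked={"womens"})

# The blocked term disappears from the model...
print("womens scored before:", "womens" in before)
print("womens scored after: ", "womens" in after)
# ...but the correlated proxy is still penalized.
print("college_x penalized after:", after["college_x"] < 0)
```

A real model is far more complex, but the mechanism is the same: as long as the historical outcomes are skewed, any feature that travels with the removed one can inherit its weight.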