Sepsis - the body’s overwhelming response to infection - kills quickly, so hospitals want to catch it early. Epic Systems, whose electronic health record (EHR) software runs much of US healthcare, built the proprietary Epic Sepsis Model to score patients for sepsis risk, and it was switched on at hundreds of hospitals. Because the model was embedded in the EHR rather than cleared as a standalone medical device, it was deployed at scale without rigorous independent evaluation.
In June 2021, JAMA Internal Medicine published “External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients,” by Andrew Wong and colleagues at the University of Michigan. They evaluated the model on 27,697 patients across 38,455 hospitalizations at Michigan Medicine; sepsis occurred in 2,552 of those hospitalizations (7 percent). The results were sobering: the model’s area under the curve (AUC), a measure of how well its scores separate patients who develop sepsis from those who do not, was just 0.63, far below the 0.76 to 0.83 Epic had reported. It missed 1,709 of the 2,552 sepsis cases - about two-thirds - and it generated alerts on 18 percent of all hospitalized patients, a volume the authors warned could drive alert fatigue, where clinicians start ignoring warnings.
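For readers who want to check the arithmetic, the figures above can be turned into rough rates in a few lines of Python. This is a back-of-envelope sketch, not the study's own analysis. It assumes the 18 percent alert rate applies to all 38,455 hospitalizations and that the alerts and the missed cases were counted at the same alert threshold; the variable names are illustrative.

```python
# Figures as reported in the article.
sepsis_cases = 2_552
missed_cases = 1_709
hospitalizations = 38_455
alert_rate = 0.18  # share of hospitalizations that triggered an alert

# Cases the model did flag, and the share of all sepsis cases that represents.
caught_cases = sepsis_cases - missed_cases
sensitivity = caught_cases / sepsis_cases
miss_rate = missed_cases / sepsis_cases

# Assumption: alerts are counted per hospitalization at the same threshold
# as the missed cases, so caught cases are a subset of the alerts.
approx_alerts = round(alert_rate * hospitalizations)
approx_share_of_alerts_with_sepsis = caught_cases / approx_alerts

print(f"Sepsis cases flagged: {caught_cases} ({sensitivity:.0%})")
print(f"Sepsis cases missed: {missed_cases} ({miss_rate:.0%})")
print(f"Approximate alerts: {approx_alerts}")
print(f"Approximate share of alerts with sepsis: {approx_share_of_alerts_with_sepsis:.0%}")
```

Under those assumptions, the model flagged roughly a third of sepsis cases, and only about one alert in eight belonged to a hospitalization in which sepsis occurred. That ratio is what makes alert fatigue a practical concern.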
The study landed hard precisely because the model was so widely used. It became a reference case for a structural problem in clinical AI: a tool can be deployed across the country on the strength of internally reported metrics, then perform far worse when an independent group measures it against real outcomes. Epic later said it had updated the model.
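To make the headline metric concrete: the area under the curve can be read as the probability that a randomly chosen patient who developed sepsis received a higher risk score than a randomly chosen patient who did not. A value of 0.5 is a coin flip and 1.0 is perfect separation, so 0.63 sits much closer to chance than the vendor's reported figures. The sketch below computes that probability directly for a hospital's own scores and outcomes. The function name and the eight-patient data set are invented purely for illustration and are not drawn from the study.

```python
def auc_from_pairs(scores, outcomes):
    """Share of (sepsis, non-sepsis) patient pairs in which the sepsis
    patient has the higher score; ties count as half a win."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case with and one without the outcome")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: model risk scores and observed outcomes
# (1 = developed sepsis, 0 = did not). Not real patient data.
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2, 0.7, 0.1]
toy_outcomes = [1, 1, 1, 0, 0, 0, 0, 0]

toy_auc = auc_from_pairs(toy_scores, toy_outcomes)
print(f"AUC on toy data: {toy_auc:.2f}")
```

The pairwise loop is written for readability rather than speed. On a real cohort with tens of thousands of hospitalizations, a rank-based routine from a statistics library would be the more practical choice.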
Why business readers should care: a predictive model bundled into widely adopted software can reach enormous scale before anyone outside the vendor checks whether it works. Independent, external validation against real outcomes - not the developer’s own figures - is what separates a useful clinical tool from one that quietly floods clinicians with bad alerts.