In July 2021, STAT and the Massachusetts Institute of Technology set out to answer a simple question with big implications for the use of AI in medicine: How do popular algorithms used to warn of bad outcomes for patients hold up over time?
The months-long experiment, born of a novel partnership in journalism and science, yielded an illuminating result: the algorithms' performance deteriorated over several years, delivering faulty advice about which patients were at the highest risk of deadly complications and prolonged hospital stays.
Getting to that conclusion required months of data wrangling and analysis to test key assumptions, replicate findings, and chase down elusive answers in the data. The outcome is described in a narrative designed to explain how algorithms that initially seem so promising can so quickly go off the rails. This document goes into greater depth about the experiment's methods, technical details, and the limitations of its findings.
In short, this is how we did it.