HeadlinesBriefing.com

Unraveling the ‘Comically Bad’ Stroke Dataset Controversy

Hacker News

Researchers at Queensland University of Technology uncovered a dataset on Kaggle that mixes celebrity photos with medical images, calling it “comically bad.” The collection, titled “droopy,” contains 1,024 pictures supposedly from stroke patients but includes duplicates of Sylvester Stallone, George Clooney, and Bell’s palsy cases. Kaggle, owned by Google, hosts the files in an open‑source repository.

The faulty set underlies a December paper in Scientific Reports that trained a stroke‑detection model for real‑time clinical use. Barnett, one of the Queensland University of Technology researchers, and Ph.D. student Alexander Gibson traced 124 papers that used the data, many of them lacking provenance and ethics statements. Alaa Mohamed of Mansoura University, the paper’s corresponding author, did not respond to inquiries.

Springer Nature added an editor’s note and retracted several articles, while Elsevier and MDPI are investigating others. Researchers argue that all tools built on these datasets must be removed until provenance is verified. The incident highlights both the risk of unvetted data fueling clinical AI and the need for stricter repository documentation.

This case echoes earlier warnings that datasets containing children’s faces, collected without consent, were used in hundreds of studies. It forces publishers to tighten data‑provenance checks and pushes the community to adopt transparent, auditable repositories. Until that happens, clinicians relying on such models risk misdiagnosis and ethical breaches.