HeadlinesBriefing.com

Causal Inference Is Eating Machine Learning | Data Science Shift

Towards Data Science

Machine learning models often predict outcomes with high accuracy yet fail badly when their predictions drive real-world decisions. One example is a hospital readmission-prediction model with 94% accuracy that was used to prioritize patients: acting on its predictions made readmission rates worse. The model had learned correlations with features such as zip code and discharge diagnosis but missed the confounders behind them, namely patients' inability to afford medications or access transportation. These hidden variables (Z) formed a 'confounding fork': Z influenced both the model's predictors (X) and the readmission outcome (Y), so X predicted Y without causing it. This gap between prediction and causation explains why models that excel on test sets can lead to harmful interventions.
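The confounding fork can be reproduced in a few lines. The sketch below is a toy simulation, not the hospital's data: the variable roles follow the text (Z hidden, X a predictor, Y the outcome), but every probability in it is an invented assumption, and X is deliberately given no effect on Y at all.

```python
import random

# Toy confounding fork: Z -> X and Z -> Y, with NO arrow from X to Y.
# All probabilities are illustrative assumptions, not figures from the article.
rng = random.Random(0)

rows = []
for _ in range(100_000):
    z = rng.random() < 0.3                    # hidden confounder (e.g. cannot afford medication)
    x = rng.random() < (0.8 if z else 0.2)    # predictor the model sees, driven by Z
    y = rng.random() < (0.6 if z else 0.1)    # outcome (readmission), driven by Z only
    rows.append((z, x, y))

def outcome_rate(keep):
    """Share of rows with y=True among rows where keep(z, x) holds."""
    selected = [y for z, x, y in rows if keep(z, x)]
    return sum(selected) / len(selected)

# Association: X looks strongly predictive of Y when Z is ignored.
naive_gap = outcome_rate(lambda z, x: x) - outcome_rate(lambda z, x: not x)

# Holding Z fixed, the apparent relationship between X and Y vanishes.
gap_z1 = outcome_rate(lambda z, x: z and x) - outcome_rate(lambda z, x: z and not x)
gap_z0 = outcome_rate(lambda z, x: (not z) and x) - outcome_rate(lambda z, x: (not z) and (not x))

print(f"naive gap: {naive_gap:.3f}, within Z=1: {gap_z1:.3f}, within Z=0: {gap_z0:.3f}")
```

Under these assumed probabilities the naive gap is large while both within-stratum gaps sit near zero, so a model can rank patients well using X even though changing X would do nothing to Y.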

Moving from association to intervention requires causal inference, a shift from 'what will happen?' to 'what should we do?'. Judea Pearl's Ladder of Causation defines three rungs: Level 1 (association) for pattern recognition, Level 2 (intervention) for predicting the effect of actions, such as 'if we give this discount, will sales rise?', and Level 3 (counterfactual) for reasoning about what would have happened under a different choice. Most ML operates at Level 1, while business decisions demand Level 2 or 3. The kidney stone study and the hormone replacement therapy case illustrate how observational data can mislead when confounding variables aren't controlled, turning accurate predictions into dangerous recommendations.
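The difference between Level 1 and Level 2 can be made concrete with the kidney stone case. The counts below are the commonly quoted figures from the 1986 kidney stone study, not numbers given in this article, so treat them as an assumption. The sketch compares the raw success rate of each treatment (association) with a back-door adjusted rate that averages over stone size, assuming stone size is the only confounder.

```python
# (successes, patients) by treatment and stone size.
# Commonly cited counts for the kidney stone study; assumed here, not taken from the article.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}
SIZES = ("small", "large")

def observed_rate(treatment):
    """Level 1: success rate as it appears in the pooled data."""
    successes = sum(counts[treatment][s][0] for s in SIZES)
    patients = sum(counts[treatment][s][1] for s in SIZES)
    return successes / patients

def adjusted_rate(treatment):
    """Level 2 under the stated assumption: sum over sizes of
    P(success | treatment, size) * P(size), i.e. back-door adjustment."""
    total = sum(counts[t][s][1] for t in counts for s in SIZES)
    rate = 0.0
    for s in SIZES:
        p_size = sum(counts[t][s][1] for t in counts) / total
        successes, patients = counts[treatment][s]
        rate += (successes / patients) * p_size
    return rate

for t in counts:
    print(t, round(observed_rate(t), 3), round(adjusted_rate(t), 3))
```

With these counts, treatment B looks better in the pooled data, while treatment A comes out ahead once stone size is held fixed. The reversal arises because the harder large-stone cases were assigned mostly to treatment A.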

Fortunately, 2026 marks a turning point. Tools like DoWhy, a Python library that originated at Microsoft Research, now make causal analysis accessible. DoWhy organizes causal inference into four steps: modeling the causal assumptions, identifying the estimand, estimating the effect, and refuting the result by testing it against robustness checks. With this tooling, data scientists can move beyond correlation and build models that not only predict but also guide effective actions.
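To show what each of the four steps involves, here is a dependency-free sketch that walks through them by hand on simulated data. It does not call DoWhy: the function names, the graph, and all probabilities are hypothetical, and stratification on a single binary confounder stands in for whatever estimator a real analysis would choose. In DoWhy itself the same stages are exposed through `CausalModel`, `identify_effect()`, `estimate_effect()`, and `refute_estimate()`.

```python
import random

rng = random.Random(42)

# Step 1 (model): assume the graph Z -> X, Z -> Y, X -> Y with Z observed.
# The simulated true effect of X on Y is -0.1 (an invented value).
def draw(n):
    data = []
    for _ in range(n):
        z = int(rng.random() < 0.4)
        x = int(rng.random() < (0.7 if z else 0.3))
        y = int(rng.random() < 0.2 + 0.3 * z - 0.1 * x)
        data.append((z, x, y))
    return data

def mean(values):
    return sum(values) / len(values)

# Step 2 (identify): Z blocks the only back-door path, so the estimand is
# sum_z (E[Y | X=1, Z=z] - E[Y | X=0, Z=z]) * P(Z=z).

# Step 3 (estimate): plug sample averages into that estimand.
def backdoor_effect(data):
    effect = 0.0
    for z in (0, 1):
        stratum = [(x, y) for zz, x, y in data if zz == z]
        treated = mean([y for x, y in stratum if x == 1])
        control = mean([y for x, y in stratum if x == 0])
        effect += (treated - control) * len(stratum) / len(data)
    return effect

# Step 4 (refute): placebo check. Shuffling X destroys any real link to Y,
# so the same estimator should now return an effect near zero.
def placebo_effect(data):
    shuffled = [x for _, x, _ in data]
    rng.shuffle(shuffled)
    return backdoor_effect([(z, x, y) for (z, _, y), x in zip(data, shuffled)])

data = draw(50_000)
naive = mean([y for _, x, y in data if x == 1]) - mean([y for _, x, y in data if x == 0])
estimate = backdoor_effect(data)
placebo = placebo_effect(data)
print(f"naive: {naive:.3f}, adjusted: {estimate:.3f}, placebo: {placebo:.3f}")
```

In this setup the naive comparison is biased upward by Z and can even show the wrong sign, the adjusted estimate lands close to the simulated effect, and the placebo run stays near zero. A placebo effect far from zero would be a warning that the estimator is picking up something other than the treatment.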