HeadlinesBriefing.com

ML model aims to predict World Cup 2026 outcomes

Towards Data Science

With the 2026 World Cup kicking off on June 11, a data scientist assembled a dataset of 49,000 matches spanning 1872-2026, merging results, Elo ratings and goal-scorer logs. The project treats each fixture as a three-way probabilistic event (home win, draw or away win) and compares multinomial regression, ridge-elastic net and LightGBM models on how well they forecast it.
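
To make the "probabilistic event" framing concrete, here is a minimal sketch of a multinomial-logit forecast from a single Elo-difference feature. It is not the author's model: the feature name `elo_diff`, the coefficients and the bias values are all illustrative assumptions, chosen only to show how three raw scores become probabilities that sum to one.

```python
import math

OUTCOMES = ("home_win", "draw", "away_win")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def outcome_probabilities(elo_diff, weights, biases):
    """Multinomial-logit probabilities from one feature.

    elo_diff: home Elo minus away Elo (hypothetical feature name).
    weights, biases: one value per outcome; illustrative, not fitted.
    """
    logits = [w * elo_diff + b for w, b in zip(weights, biases)]
    return dict(zip(OUTCOMES, softmax(logits)))


def log_loss(probs, actual):
    """Negative log-likelihood of the outcome that actually occurred."""
    return -math.log(probs[actual])


# Assumed coefficients for illustration only (not from the article).
WEIGHTS = (0.004, 0.0, -0.004)
BIASES = (0.3, 0.0, -0.1)

probs = outcome_probabilities(elo_diff=120, weights=WEIGHTS, biases=BIASES)
```

Scoring forecasts with log loss rather than plain accuracy rewards a model for putting honest probability on the draw, which is the outcome the article says is hardest to detect.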

The author engineered features to capture rating freshness, draw propensity and venue context, then split the data chronologically, training on pre‑2018 games and reserving roughly 8,000 post‑2018 matches for validation. Calibration work boosted home‑win prediction accuracy to 86%, though draw detection lagged, improving only 3.3% after feature tweaks.
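
The chronological split described above can be sketched in a few lines. This is an assumption-laden illustration rather than the project's code: the record fields (`date`, `home`, `away`, `result`) are hypothetical, and the exact boundary of January 1, 2018 is a guess, since the summary only says "pre-2018" and "post-2018".

```python
from datetime import date


def chronological_split(matches, cutoff):
    """Split matches into (train, validation) by date.

    Matches strictly before `cutoff` go to training; the rest go to
    validation, so the model never sees results from the period it is
    evaluated on.
    """
    ordered = sorted(matches, key=lambda m: m["date"])
    train = [m for m in ordered if m["date"] < cutoff]
    valid = [m for m in ordered if m["date"] >= cutoff]
    return train, valid


# Toy records with hypothetical field names and placeholder teams.
matches = [
    {"date": date(2019, 6, 7), "home": "A", "away": "B", "result": "draw"},
    {"date": date(2014, 7, 13), "home": "C", "away": "D", "result": "home_win"},
    {"date": date(2017, 11, 10), "home": "B", "away": "C", "result": "away_win"},
    {"date": date(2022, 12, 18), "home": "D", "away": "A", "result": "draw"},
]

CUTOFF = date(2018, 1, 1)  # assumed boundary; the article gives no exact date
train, valid = chronological_split(matches, CUTOFF)
```

Splitting by date instead of at random matters here because ratings and form evolve over time; a random split would let information from later matches leak into training.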

The analysis shows that soccer's low scoring creates a modeling blind spot: the models over-confidently predict home victories in matches where a draw is likely. Adding rating-difference and recent draw-rate variables narrows this gap in the best model, which still misclassifies about one-fifth of draws. The code and full dataset are publicly available for replication.
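
One plausible way to build the rating-difference and recent draw-rate variables is sketched below. The field names, the 10-match window and the choice to return `None` for teams with no history are assumptions for illustration; the summary does not say how the author defined these features.

```python
from collections import defaultdict, deque


def draw_rate(history):
    """Share of draws in a team's recent matches, or None with no history."""
    return sum(history) / len(history) if history else None


def add_draw_features(matches, window=10):
    """Build per-match features from date-sorted match records.

    Each feature row uses only matches played BEFORE the current one,
    so the current result never leaks into its own features.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for m in matches:
        home_hist = history[m["home"]]
        away_hist = history[m["away"]]
        diff = m["home_elo"] - m["away_elo"]
        rows.append({
            "rating_diff": diff,
            "abs_rating_diff": abs(diff),  # evenly matched teams draw more often
            "home_draw_rate": draw_rate(home_hist),
            "away_draw_rate": draw_rate(away_hist),
        })
        # Update histories only after the features have been recorded.
        is_draw = 1 if m["result"] == "draw" else 0
        home_hist.append(is_draw)
        away_hist.append(is_draw)
    return rows


# Toy data with hypothetical field names, already in date order.
matches = [
    {"home": "A", "away": "B", "home_elo": 1800, "away_elo": 1750, "result": "draw"},
    {"home": "A", "away": "C", "home_elo": 1800, "away_elo": 1600, "result": "home_win"},
    {"home": "B", "away": "A", "home_elo": 1750, "away_elo": 1810, "result": "draw"},
]
rows = add_draw_features(matches)
```

The bounded `deque` keeps only each team's most recent results, so the draw rate tracks current tendencies rather than a team's whole history.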