HeadlinesBriefing.com

ML Models: Training vs. Production Challenges

Towards Data Science

Machine learning models often excel in training but falter in production, revealing subtle issues that go unnoticed during development. Sudheer Singamsetty, a seasoned data scientist, shares lessons from his experience with real-time fraud detection and recommendation systems: models that performed strongly in development began to drift in production, with metrics such as click-through rate and fraud-detection accuracy slipping.

Singamsetty identifies several key issues behind this disparity. One major problem is "time travel," a form of data leakage in which a model inadvertently trains on information from the future. For instance, a fraud detection system might train on chargeback data recorded the same day as the transactions it scores, giving the model an unfair advantage.

In production, that future data does not yet exist, so the model fails. Another issue is feature defaults becoming signals: when missing values are filled with a default such as zero, the model learns to treat the default as a meaningful pattern rather than as a gap.

This can lead to incorrect risk assessments, especially during peak hours when upstream data is temporarily unavailable. A third issue, population shift without distribution shift, degrades models slowly over time: as the user base expands into new demographics or behaviors, the underlying population changes even though individual feature distributions remain stable.

Because standard monitoring tools track feature distributions, this kind of shift can go undetected, leading to gradual model degradation. These challenges underscore the importance of designing ML systems with a deep understanding of how data evolves over time and across user segments. Production systems must be built for the complexities of real-world data, where time, latency, and changing user populations all play critical roles.
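The time-travel problem can be sketched in a few lines. Everything here (the column names, the `chargeback_count` helper, the cutoff logic) is a hypothetical illustration, not the article's code; the point is that a point-in-time feature uses only data already reported at scoring time:

```python
import pandas as pd

# Hypothetical data: transactions to score, and chargebacks that are
# only reported some time after the transaction happens.
tx = pd.DataFrame({
    "user": ["a", "a", "b"],
    "score_time": pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-01 17:00", "2024-01-02 09:00"]),
})
cb = pd.DataFrame({
    "user": ["a", "b"],
    "reported_at": pd.to_datetime(["2024-01-01 12:00", "2024-01-05 12:00"]),
})

def chargeback_count(row, point_in_time):
    """Count a user's chargebacks; optionally only those already reported."""
    mask = cb["user"] == row["user"]
    if point_in_time:
        mask &= cb["reported_at"] <= row["score_time"]  # no future data
    return int(mask.sum())

# Leaky feature: the 09:00 transaction "sees" a chargeback reported at 12:00.
tx["cb_leaky"] = tx.apply(chargeback_count, axis=1, point_in_time=False)
# Point-in-time correct feature: zero until the report actually exists.
tx["cb_pit"] = tx.apply(chargeback_count, axis=1, point_in_time=True)
```

Offline, the leaky feature looks highly predictive; online, the equivalent reports have not arrived yet, which is exactly the training/production gap the article describes.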
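The defaults-as-signals issue admits a simple mitigation sketch, again with illustrative names and values: impute with a neutral statistic and expose missingness as an explicit feature, so the model can distinguish "value is zero" from "value is absent":

```python
import numpy as np

# Hypothetical feature: account age in days, sometimes missing at peak load.
raw = np.array([120.0, 45.0, np.nan, 300.0, np.nan])

# Risky: fill with 0 -- the model cannot tell "brand-new account" (often a
# real fraud signal) from "feature service timed out".
filled_zero = np.nan_to_num(raw, nan=0.0)

# Safer: impute with a neutral value AND expose missingness separately.
is_missing = np.isnan(raw).astype(float)
median = np.nanmedian(raw)                      # median of observed values
imputed = np.where(np.isnan(raw), median, raw)  # fill gaps with the median
features = np.column_stack([imputed, is_missing])
```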
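Population shift without distribution shift can be simulated directly. In this hypothetical sketch, two user segments share an identical feature distribution but behave oppositely, so feature-level monitoring stays flat while accuracy tracks the (unobserved) segment mix:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n_a, n_b):
    """Two segments with the SAME feature distribution but opposite behavior."""
    x = rng.normal(size=n_a + n_b)       # aggregate distribution is N(0, 1)
    seg_b = np.arange(n_a + n_b) >= n_a  # segment membership (unobserved)
    y = np.where(seg_b, x <= 0, x > 0)   # segment B inverts the pattern
    return x, y

def model_accuracy(x, y):
    # The model learned the segment-A rule: "x > 0 means positive".
    return float(np.mean((x > 0) == y))

# Launch mix: 90% segment A. Later mix: 50/50 after the user base expands.
x0, y0 = sample(9000, 1000)
x1, y1 = sample(5000, 5000)

# Feature monitoring sees nothing: both samples are standard normal,
# with means near zero. But accuracy falls with the mix: 0.9 vs 0.5.
acc_before = model_accuracy(x0, y0)  # 0.9
acc_after = model_accuracy(x1, y1)   # 0.5
```

A drift detector comparing `x0` against `x1` would not fire, which is why per-segment metrics, not just feature histograms, are needed in production.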