HeadlinesBriefing.com

ML Inference Optimization on Databricks

Towards Data Science

When a 420-core Databricks cluster spent nearly 10 hours processing just 18 partitions, engineers knew their ML inference pipeline needed optimization: with only 18 partitions, at most 18 of the 420 cores could do useful work at any moment. The team implemented a dynamic salting technique that distributes data across buckets in proportion to product volumes, addressing a heavily skewed 550M-row dataset in which Product D alone accounted for 79.7% of the rows.
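
A minimal sketch of that proportional salting step, assuming a hypothetical `sales` table with a `product` column (the article's actual schema isn't given): bucket counts are derived from per-product row counts, and each row draws a random salt within its product's bucket range.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
sales_df = spark.table("sales")  # hypothetical skewed 550M-row input

TOTAL_BUCKETS = 420  # e.g., one salt bucket per cluster core

# Allocate salt buckets proportional to volume: a product holding ~80%
# of the rows receives ~80% of the buckets.
counts = sales_df.groupBy("product").count()
total_rows = counts.agg(F.sum("count")).first()[0]
alloc = counts.withColumn(
    "n_buckets",
    F.greatest(
        F.lit(1),
        F.round(F.col("count") / total_rows * TOTAL_BUCKETS).cast("int"),
    ),
)

# Broadcast-join the allocation back and draw a uniform salt per row in
# [0, n_buckets), so partitions within a product stay balanced.
salted = (
    sales_df.join(F.broadcast(alloc.select("product", "n_buckets")), "product")
    .withColumn("salt", (F.rand(seed=42) * F.col("n_buckets")).cast("int"))
)
```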

The solution calculated the bucket distribution from row counts, with higher-volume products receiving more buckets. After testing several approaches, the team also enforced a cap of one million rows per partition, preventing Spark from creating oversized partitions that would have increased processing time despite the proportional distribution.
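
The cap can be expressed as a floor on each product's bucket count: ceil(rows / 1,000,000) buckets guarantees no bucket averages more than a million rows. A sketch under the same assumed schema as above:

```python
import math
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
sales_df = spark.table("sales")  # same hypothetical table as above

MAX_ROWS_PER_PARTITION = 1_000_000  # the cap described in the article

# ceil(rows / 1M) buckets per product guarantees no bucket averages more
# than 1M rows, however skewed the data is.
counts = {
    r["product"]: r["count"]
    for r in sales_df.groupBy("product").count().collect()
}
buckets = {p: math.ceil(n / MAX_ROWS_PER_PARTITION) for p, n in counts.items()}

# Illustrative arithmetic: if Product D holds 79.7% of 550M rows
# (~438.35M), it gets ceil(438_350_000 / 1_000_000) = 439 buckets.

# Re-derive the salt under the capped bucket counts, then repartition on
# (product, salt) so each Spark task sees at most ~1M rows.
bucket_df = spark.createDataFrame(
    [(p, n) for p, n in buckets.items()], ["product", "n_buckets"]
)
balanced = (
    sales_df.join(F.broadcast(bucket_df), "product")
    .withColumn("salt", (F.rand(seed=42) * F.col("n_buckets")).cast("int"))
    .repartition(sum(buckets.values()), "product", "salt")
)
```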

Partitioned tables combining salting with the row cap proved most effective for the inference workload, outperforming both liquid clustering and adaptive query execution (AQE). The approach maximized cluster utilization and delivered predictable inference times across the four-product ML pipeline, with measurable performance gains and no additional infrastructure investment.
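
As a final sketch (again under the assumed schema, and taking the `balanced` DataFrame from the previous snippet), the winning layout is simply a Delta table physically partitioned by product and salt; the alternatives it beat would instead rely on `CLUSTER BY` for liquid clustering or on enabling AQE.

```python
# Write the salted, capped data as a Delta table partitioned on
# (product, salt), so inference reads back one well-sized file group
# per bucket and keeps all 420 cores busy.
(
    balanced.write.format("delta")
    .mode("overwrite")
    .partitionBy("product", "salt")
    .saveAsTable("inference_input_salted")
)

# The alternatives the article tested, shown here only for contrast:
# liquid clustering:  CREATE TABLE ... CLUSTER BY (product)
# AQE:                spark.conf.set("spark.sql.adaptive.enabled", "true")
```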