HeadlinesBriefing.com

Why Your Pandas Code Is Slow: Row-wise Operations Explained

Towards Data Science

The author spent years writing Pandas code that ran without errors, even when it took minutes to finish. Then they discovered vectorization and realized that "no errors" and "efficient" are two different things. Pandas doesn't warn you when an operation is expensive; it simply runs it slowly. Unlike a SQL database, whose query optimizer picks an execution plan for you, Pandas executes exactly what you write, so its flexibility comes at a hidden cost.

The biggest mistake is looping through a DataFrame row by row with .iterrows() or .apply(axis=1). On 100,000 rows, .iterrows() took 10.2 seconds while the vectorized approach took 688 microseconds, roughly 14,800x faster. Pandas is built on NumPy and optimized for whole-column operations, not row-by-row processing: each row-wise step runs in the Python interpreter, while a column expression runs once in compiled code. The fix is simple: write df['sales'] * (1 - df['discount']) instead of looping.
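To make the comparison concrete, here is a minimal sketch of the three approaches side by side. The column names come from the one-liner above; the randomly generated data, the row count, and the variable names are assumptions made for illustration, not the author's benchmark.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names follow the article's one-liner,
# values are randomly generated for illustration only.
rng = np.random.default_rng(0)
n_rows = 10_000
df = pd.DataFrame({
    "sales": rng.uniform(10.0, 500.0, size=n_rows),
    "discount": rng.uniform(0.0, 0.5, size=n_rows),
})

# Slow: .iterrows() builds a new Series object for every row.
looped_values = []
for _, row in df.iterrows():
    looped_values.append(row["sales"] * (1 - row["discount"]))
looped = pd.Series(looped_values, index=df.index)

# Also slow: .apply(axis=1) still calls a Python function once per row.
applied = df.apply(lambda row: row["sales"] * (1 - row["discount"]), axis=1)

# Fast: one expression over whole columns, evaluated in compiled NumPy code.
vectorized = df["sales"] * (1 - df["discount"])

# All three approaches compute the same discounted values.
print(np.allclose(looped, vectorized), np.allclose(applied, vectorized))
```

The results are identical; only the route differs. The two row-wise versions do their arithmetic one Python-level step at a time, while the vectorized line hands the whole column to NumPy at once.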

Before optimizing, measure. Use %timeit (an IPython/Jupyter magic) to compare approaches, and df.memory_usage() to see how many bytes each column occupies; a float64 column takes twice the space of an int32 one. The real shift is one of mindset: working code and efficient code aren't the same thing. Once that clicks, everything else follows.
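As a rough sketch of the measuring step outside a notebook, the standard-library timeit module can stand in for %timeit, which only works in IPython or Jupyter. The DataFrame, the "units" column, and the function names below are hypothetical, and the timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data; "units" is an illustrative int32 column.
n_rows = 10_000
df = pd.DataFrame({
    "sales": np.linspace(10.0, 500.0, n_rows),
    "discount": np.linspace(0.0, 0.5, n_rows),
    "units": np.arange(n_rows, dtype="int32"),
})

def with_iterrows():
    # Row-by-row version: one Python-level step per row.
    return [row["sales"] * (1 - row["discount"]) for _, row in df.iterrows()]

def with_vectorization():
    # Column-wise version: a single expression over whole columns.
    return df["sales"] * (1 - df["discount"])

# Time one call of the slow version and average 100 calls of the fast one.
loop_seconds = timeit.timeit(with_iterrows, number=1)
vector_seconds = timeit.timeit(with_vectorization, number=100) / 100
print(f"iterrows:   {loop_seconds:.4f} s per call")
print(f"vectorized: {vector_seconds:.6f} s per call")

# Bytes used by each column (index excluded): float64 is 8 bytes per value,
# int32 is 4 bytes per value.
bytes_per_column = df.memory_usage(index=False)
print(bytes_per_column)
print(df.dtypes)
```

Printing df.dtypes next to the memory figures shows which columns are candidates for a smaller type.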