HeadlinesBriefing.com

LLVM Vectorization Flaw Causes Performance Regression

Hacker News

A recent LLVM patch that replaces chains of scalar fadd instructions with ordered vector reductions caused an 89% performance regression on RISC-V targets. The cost model did not account for the cost of building the input vector on every iteration, so it judged an unprofitable transformation to be profitable. This oversight in the cost model of LLVM's SLP vectorizer was large enough to show up in real-world benchmarks.
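
The reduction has to be ordered because, without flags that permit reassociation (such as -ffast-math), floating-point addition is not associative, so the compiler must keep the chain's left-to-right order. The minimal C sketch below contrasts a strict chain with a reassociated pairwise sum. The function names and input values are illustrative assumptions and do not come from the benchmark.

```c
/* Strict left-to-right chain: the shape of scalar fadd chain the patch
   targets (illustrative; not the benchmark's source code). */
double chain_sum(double start, const double *x, int n) {
    double s = start;
    for (int i = 0; i < n; i++)
        s += x[i];  /* each fadd depends on the result of the previous one */
    return s;
}

/* Reassociated pairwise sum over exactly four elements: what an
   unordered reduction would be allowed to compute. */
double tree_sum(double start, const double x[4]) {
    return start + ((x[0] + x[1]) + (x[2] + x[3]));
}
```

With x = {1e100, 1.0, -1e100, 1.0} and start 0.0, chain_sum returns 1.0 and tree_sum returns 0.0. In the pairwise version, the 1e100-magnitude partial sums absorb both 1.0 terms. The chain, by contrast, adds the last 1.0 after the large terms have already cancelled. Because the two results can differ, the vectorizer has to use an ordered reduction rather than an unordered one.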

The problematic codegen stores each scalar value to memory with a sequence of fsd (floating-point store double) instructions, reloads the values into a vector register with vle64.v, and then performs the reduction with vfredosum.vs. The round trip through memory is expensive, and the cost model never weighed it against the benefit of vectorizing. The middle-end introduced the vector construction in LLVM IR before the backend ever saw the code, so this is a middle-end optimization issue rather than a backend code-generation problem.
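
The following portable C sketch emulates the data flow of that lowered sequence. It is not RVV intrinsics code. It assumes a fixed vector length of four doubles, and the names VL, build_vector and ordered_reduce are hypothetical. The first function stands in for the fsd stores plus the vle64.v reload. The second reproduces the semantics of vfredosum.vs: the start value plus each element, added strictly in element order.

```c
#include <string.h>

#define VL 4  /* hypothetical vector length for this sketch */

/* Emulates the vector build: store each scalar to a stack slot
   (standing in for the fsd instructions), then read all slots back
   as one unit (standing in for the vle64.v load). */
void build_vector(double out[VL], double a, double b, double c, double d) {
    double slot[VL];
    slot[0] = a;
    slot[1] = b;
    slot[2] = c;
    slot[3] = d;
    memcpy(out, slot, sizeof slot);
}

/* Emulates vfredosum.vs semantics: start + v[0] + v[1] + ...,
   evaluated strictly left to right (an ordered reduction). */
double ordered_reduce(double start, const double v[VL]) {
    double acc = start;
    for (int i = 0; i < VL; i++)
        acc += v[i];
    return acc;
}
```

Because the ordered reduction performs the same additions in the same order as the scalar chain, its result is bit-identical. In this sketch the only difference vectorization introduces is the extra work of assembling the vector.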

Developers spotted the regression on Igalia's LNT instance, which tracks benchmark results on the BPI-F3 (a Banana Pi RISC-V board). The affected benchmark executed 26% more instructions and took 48% more cycles. By inspecting the LLVM IR, they traced the problem to insertelement instructions that build the vector fed into the llvm.vector.reduce.fadd intrinsic. Fixing it requires the cost model to account for the overhead of constructing that vector on every iteration.
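
The toy profitability check below shows how leaving out the build cost can flip the decision. All unit costs are illustrative assumptions and do not come from LLVM's RISC-V cost tables. The Costs struct and the functions are hypothetical, and charging one unit per insertelement is a simplification of how the build is actually lowered.

```c
#include <stdbool.h>

/* Hypothetical per-operation costs for illustration only. */
typedef struct {
    int scalar_fadd;     /* one scalar fadd in the chain */
    int ordered_reduce;  /* one ordered reduction over the whole vector */
    int insert_element;  /* one insertelement when building the vector */
} Costs;

/* Cost of keeping the scalar chain: n dependent fadds. */
int scalar_cost(const Costs *c, int n) {
    return n * c->scalar_fadd;
}

/* Cost of the vectorized form; count_build controls whether the n
   insertelements that assemble the input vector are charged. */
int vector_cost(const Costs *c, int n, bool count_build) {
    int cost = c->ordered_reduce;
    if (count_build)
        cost += n * c->insert_element;
    return cost;
}

/* Vectorize only if the vector form is strictly cheaper. */
bool should_vectorize(const Costs *c, int n, bool count_build) {
    return vector_cost(c, n, count_build) < scalar_cost(c, n);
}
```

Take the illustrative costs {1, 2, 1} and a four-element chain. The scalar chain costs 4 and the reduction alone costs 2, so a model that ignores the build chooses to vectorize. Once the four insertelements are charged, the vector form costs 6 and the check correctly keeps the scalar code.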