HeadlinesBriefing favicon HeadlinesBriefing.com

B-Trees vs LSM Trees: Database Storage Tradeoffs Explained

ByteByteGo •
×

Databases face a fundamental constraint: disk access is orders of magnitude slower than memory. This single reality shapes every architectural decision in database design. Over decades of research, two approaches have emerged as dominant: B-Trees and LSM Trees. Each takes a fundamentally different approach to organizing data on disk, and each pays a different price for its strengths.

B-Trees keep data sorted on disk in a balanced tree structure, enabling fast lookups. Reads are efficient because the database can traverse directly to the desired record. However, every write operation requires maintaining this sorted structure—insertions must find the correct position, potentially causing page splits and rewrites. This makes B-Trees read-optimized but write-heavy.

LSM Trees take the opposite approach, buffering writes in memory before flushing them to disk in sorted batches. This makes writes dramatically cheaper since there's no immediate need to maintain a perfect on-disk structure. The tradeoff is that reads become more expensive, potentially requiring multiple data structures to be checked. This approach favors write-heavy workloads.

Neither approach is objectively better. B-Trees excel in read-heavy scenarios where data integrity and consistent performance matter—think transactional systems and key-value stores. LSM Trees shine in write-intensive applications like logging, time-series databases, and analytics workloads where ingesting large volumes of data quickly matters more than individual read latency. Understanding this tradeoff is one of the most useful mental models in system design.