How Databases Store Data

Quick notes on database storage internals. Watched this video and it clicked.

The Core Problem

Databases can’t keep everything in RAM—it’s volatile and expensive. Data lives on disk, which is 1000x+ slower than RAM.

RAM: ~100 nanoseconds access time
SSD: ~10-20 microseconds
HDD: ~5-10 milliseconds

The real killer? Random access. HDD seeks cost 5-10ms each. Reading 1000 random records = 10 seconds just in seek time. Sequential reads are way faster.

So databases obsess over locality—keeping related data close together on disk.

Two Main Storage Strategies

B+ Trees (Read-Optimized)

Used by: PostgreSQL, MySQL, SQLite

How it works:

Data organized in a sorted tree structure
Each lookup takes 3-4 disk reads (even with billions of records)
Tree height is predictable: log(records)

The catch: Writes are expensive. Inserting can trigger page splits, rebalancing, and multiple writes for a single logical insert. This is write amplification.

Good for: OLTP, transactions, mixed workloads with frequent lookups

LSM Trees (Write-Optimized)

Used by: Cassandra, ScyllaDB, RocksDB, ClickHouse

How it works:

Writes go to in-memory structure first (memtable)
When full, flush to disk as sorted, immutable files (SSTables)
All writes are sequential—no random I/O
Background compaction merges old SSTables

The catch: Reads check multiple files. Uses bloom filters (probabilistic data structure) to skip files that definitely don’t contain a key.

Good for: Time-series, logs, metrics, write-heavy workloads

Key Differences

Aspect	B+ Trees	LSM Trees
Read speed	Fast, consistent	Variable (check multiple files)
Write speed	Moderate	Very high
Write pattern	Random I/O	Sequential I/O only
Best use case	Transactions, OLTP	Logs, metrics, append-heavy

Real Example

Discord migrated from MongoDB (B-Tree) to Cassandra (LSM) because random reads/writes became too slow as data grew. Message history is mostly append-only—perfect for LSM’s sequential write advantage. They later moved to ScyllaDB (faster LSM) for performance.

When to Pick What

Need strong consistency + ACID? → B+ Tree (PostgreSQL, MySQL)
Need high write throughput? → LSM (Cassandra, RocksDB)
Banking/e-commerce? → B+ Tree
Logs/metrics/events? → LSM

No universal winner—just tradeoffs based on workload.

References

PostgreSQL Docs - B+ Tree internals
RocksDB Wiki - LSM implementation details
Use The Index, Luke - database indexing fundamentals explained simply
SQLite B-Tree — how SQLite actually stores data

Lalit's Cloud & DevOps notes