This is, of course, a great result – compare it to the almost 8-fold space amplification we saw in the previous post for STCS! However, in its current implementation (in both Scylla and Cassandra), LCS doesn’t always provide such excellent space amplification.
The paper Optimizing Space Amplification in RocksDB suggests that this can be fixed by changing the level sizes so that instead of insisting that L3 has exactly 1000 sstables, we focus on L3 having 10 times more sstables than L2.
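To make the difference between the two sizing rules concrete, here is a minimal sketch in Python. It is an illustration only, not code from Scylla, Cassandra or RocksDB: the function names are hypothetical, levels are measured in sstable counts, and we assume the proposed rule simply derives each level's target from the actual size of the level below it.

```python
def fixed_level_targets(num_levels, fanout=10):
    """Fixed rule: L1 targets fanout sstables, L2 fanout**2, and so on."""
    return [fanout ** level for level in range(1, num_levels + 1)]


def dynamic_level_targets(last_level_sstables, num_levels, fanout=10):
    """Assumed relative rule: size each level from the actual last level."""
    targets = [0] * num_levels
    targets[-1] = last_level_sstables
    for i in range(num_levels - 2, -1, -1):
        # Each level aims for 1/fanout of the level below it (at least 1 sstable).
        targets[i] = max(1, targets[i + 1] // fanout)
    return targets


print(fixed_level_targets(3))         # [10, 100, 1000]
print(dynamic_level_targets(100, 3))  # [1, 10, 100]
```

With only 100 sstables in L3, the relative rule would shrink the L2 target to 10 sstables, so the last level would still hold roughly ten times more data than the level above it.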
Neither Scylla nor Cassandra has this fix yet, so in the worst case, during massive overwrites, their LCS may still have a space amplification of 2.
E.g., consider that we have a filled L2 with 100 sstables but L3 also has just 100 sstables (and not 1000).
In this case, the last level holds only about half of the data, and the other half may consist of duplicates, so we may see 2-fold space amplification.
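The arithmetic behind this example can be sketched as follows. This is a simplified model, not a measurement: we assume the pessimistic case where every sstable above the last level only overwrites data that already sits in the last level, and the function name is our own.

```python
def worst_case_space_amplification(level_sstables):
    """Upper bound on space amplification for the given per-level sstable counts.

    Assumes (pessimistically) that everything above the last level duplicates
    data already stored in the last level.
    """
    total_on_disk = sum(level_sstables)
    # The last level is a single run with no overlap, so it holds no duplicates.
    live_data = level_sstables[-1]
    return total_on_disk / live_data


# A filled L2 (100 sstables) on top of an L3 that also has just 100 sstables.
print(worst_case_space_amplification([100, 100]))  # 2.0
```

Adding a filled L1 of 10 sstables to the model raises the bound only slightly, to 2.1, which is why we speak of "about" 2-fold amplification.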
LCS does not have the temporary disk space problem which plagued STCS: While STCS may need to do huge compactions and temporarily have both input and output on disk, LCS always does small compaction steps, involving roughly 11 input and output sstables of a fixed size.
This means we may need roughly 11*160 MB, less than 2 GB, of temporary disk space – not half the disk as in STCS.
The reason is that most of the data is stored in the biggest level, and since this level is a run – with different sstables having no overlap – we cannot have any duplicates inside this run. The best case for LCS is that the last level is filled. For example, if the last level is L3, it has 1000 sstables. In this case, L2 and L1 together have just 110 sstables, compared to 1000 sstables in L3.
Each of the other levels, L1, L2, L3, etc., is a single run of an exponentially increasing size: L1 is a run of 10 sstables, L2 is a run of 100 sstables, L3 is a run of 1000 sstables, and so on. (Factor 10 is the default setting in both Scylla and Apache Cassandra.)
The first thing that Leveled Compaction does is to replace large sstables, the staple of STCS, by “runs” of fixed-size (by default, 160 MB) sstables. A run is a log-structured-merge (LSM) term for a large sorted file split into several smaller files.
As unfortunate as this is, it is of course not nearly as bad as the 8-fold space amplification we saw for STCS.
In the previous post, we looked at two simple examples to demonstrate STCS’s high space amplification. The first example was straightforward writing of new data at a constant pace, and we saw high temporary disk space use during compaction – at some points doubling the amount of disk space needed.
LCS actually has a worst case where we can get 2-fold space amplification. This happens when the last level is not filled, but rather only filled as much as the previous level.
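The level sizes, the temporary disk space estimate and the best-case ratio quoted above can be checked with a few lines of Python. This is a back-of-the-envelope sketch using the defaults mentioned in the text (160 MB sstables, factor 10, roughly 11 sstables per compaction step); the constant and function names are ours and do not correspond to any Scylla or Cassandra setting.

```python
SSTABLE_MB = 160   # default sstable size mentioned in the text
FANOUT = 10        # default growth factor between levels


def level_run_sizes_mb(num_levels):
    """Size in MB of each filled level: Ln is a run of FANOUT**n sstables."""
    return {f"L{n}": FANOUT ** n * SSTABLE_MB for n in range(1, num_levels + 1)}


def temp_space_mb(sstables_per_step=11):
    """Rough temporary space for one compaction step of ~11 sstables."""
    return sstables_per_step * SSTABLE_MB


sizes = level_run_sizes_mb(3)
print(sizes)            # {'L1': 1600, 'L2': 16000, 'L3': 160000}
print(temp_space_mb())  # 1760

# Best case: L3 is filled, so L1 and L2 add only 110 sstables on top of 1000.
best_case = sum(sizes.values()) / sizes["L3"]
print(best_case)        # 1.11
```

So even if every sstable in L1 and L2 duplicated data in a filled L3, the total on disk would be only about 1.11 times the size of L3 in this model.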