What are Memtable and SSTable in Cassandra?
Cassandra processes data at several stages on
the write path, starting with the immediate logging of a write and ending
with a write of data to disk:
1. Logging data in the commit log
2. Writing data to the memtable
3. Flushing data from the memtable
4. Storing data on disk in SSTables
When a write occurs, Cassandra stores the data in an in-memory structure
called the memtable and, to provide configurable durability, also appends the
write to the commit log on disk. The commit log receives every write made to a
Cassandra node, and these durable writes survive even if power fails on the
node. The memtable is a write-back cache of data partitions that Cassandra
looks up by key. It stores writes in sorted order until it reaches a
configurable limit, and is then flushed.
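
The first two stages can be illustrated with a minimal sketch. The class `ToyNode`, its methods, and the JSON-lines log format are hypothetical names chosen for illustration, not Cassandra internals, and calling fsync on every write is a simplifying assumption (the text above describes durability as configurable).

```python
import json
import os
import tempfile


class ToyNode:
    """Toy model of a write: commit log append, then memtable update."""

    def __init__(self, commitlog_path):
        self.commitlog_path = commitlog_path
        self.memtable = {}  # partition key -> latest value (hypothetical layout)

    def write(self, key, value):
        # Stage 1: append the write to the commit log on disk.
        # Assumption: sync on every write; this is only one possible policy.
        with open(self.commitlog_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Stage 2: apply the write to the in-memory memtable.
        self.memtable[key] = value

    def sorted_partitions(self):
        # The memtable is read back in key order, as it would be at flush time.
        return sorted(self.memtable.items())


def demo():
    path = os.path.join(tempfile.mkdtemp(), "commitlog.jsonl")
    node = ToyNode(path)
    node.write("user:2", "bob")
    node.write("user:1", "alice")
    return node.sorted_partitions()
```

Calling `demo()` writes two records to a temporary log file and returns the memtable contents in key order. The point of the sketch is the ordering of the two steps: the durable append happens before the in-memory update.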
When memtable contents exceed a configurable threshold, or the commit log
space exceeds commitlog_total_space_in_mb, the memtable data, which includes
indexes, is put in a queue to be flushed to disk. To flush the data, Cassandra
sorts memtables by partition key and then writes the data to disk
sequentially. The write path is extremely fast because it involves only a
commit log append and a sequential write.
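
A flush can be sketched as follows. The function names, the file-naming scheme, and the JSON-lines format are assumptions made for illustration; plain string ordering of keys stands in for whatever ordering Cassandra actually applies to partitions.

```python
import json
import os
import tempfile


def flush_memtable(memtable, directory, generation):
    """Write all memtable entries to a new file in key order, then empty it."""
    path = os.path.join(directory, f"toy-{generation}-Data.jsonl")  # hypothetical name
    with open(path, "w", encoding="utf-8") as out:
        # One pass over the sorted keys gives a purely sequential write.
        for key in sorted(memtable):
            out.write(json.dumps([key, memtable[key]]) + "\n")
    memtable.clear()
    return path


def read_flushed_file(path):
    """Read the flushed file back as a list of (key, value) tuples."""
    with open(path, encoding="utf-8") as f:
        return [tuple(json.loads(line)) for line in f]


def demo():
    directory = tempfile.mkdtemp()
    memtable = {"b": 2, "a": 1, "c": 3}
    path = flush_memtable(memtable, directory, generation=1)
    return read_flushed_file(path)
```

`demo()` flushes an unsorted three-entry memtable and reads the file back in key order. Each flush produces a new file and never rewrites an earlier one, which is why the same key can end up in several files over time.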
Data in the commit log is purged after its corresponding data in the memtable
is flushed to an SSTable. The commit log exists to recover the data in the
memtable in the event of a hardware failure.
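
The recovery and purge roles of the commit log can be sketched like this. `ToyCommitLog` and its methods are hypothetical, an in-memory list stands in for the on-disk log, and Cassandra's real bookkeeping is more involved than counting records.

```python
import json


class ToyCommitLog:
    """Toy commit log: an append-only list stands in for the on-disk log."""

    def __init__(self):
        self.records = []

    def append(self, key, value):
        # Every write is recorded before it is considered durable.
        self.records.append(json.dumps({"key": key, "value": value}))

    def replay(self):
        """Rebuild memtable contents after a crash by re-applying each record."""
        memtable = {}
        for line in self.records:
            record = json.loads(line)
            memtable[record["key"]] = record["value"]  # later records win
        return memtable

    def purge(self, flushed_count):
        """Drop the oldest records once their data is safely in an SSTable."""
        del self.records[:flushed_count]
```

For example, after appending writes for keys `a`, `b`, and `a` again, `replay()` returns the latest value for each key; after `purge(3)` the log is empty and a replay yields nothing, because that data now lives on disk.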
A sorted string table (SSTable) is an immutable data file to which Cassandra
writes a memtable. Cassandra flushes all the data in the memtables to
SSTables once the memtables reach a threshold value. Because SSTables are
immutable, each flush creates a new file rather than updating an existing one;
consequently, a partition is typically stored across multiple SSTable files.
For each SSTable, Cassandra creates the following structures, several of which
exist to assist read operations:
- Data (Data.db): The SSTable data
- Primary Index (Index.db): Index of the row keys with pointers to their
  positions in the data file
- Bloom filter (Filter.db): A structure stored in memory that checks whether
  row data may exist in the SSTable before Cassandra accesses it on disk
- Compression Information (CompressionInfo.db): A file holding information
  about uncompressed data length, chunk offsets and other compression
  information
- Statistics (Statistics.db): Statistical metadata about the content of the
  SSTable
- Digest (Digest.adler32): A file holding the adler32 checksum of the data file
- CRC (CRC.db): A file holding the CRC32 for chunks in an uncompressed file
- SSTable Index Summary (Summary.db): A sample of the partition index stored
  in memory
- SSTable Table of Contents (TOC.txt): A file that stores the list of all
  components for the SSTable
- Secondary Index (SI_.*.db): Built-in secondary index. Multiple SIs may exist
  per SSTable
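
To illustrate why a Bloom filter helps when a partition may be spread over several SSTables, here is a minimal sketch. All names (`ToyBloomFilter`, `ToySSTable`, `read`) are hypothetical, the hash scheme is an arbitrary choice, and "newest file wins" is a simplification of how Cassandra reconciles data across SSTables.

```python
import hashlib


class ToyBloomFilter:
    """Answers "definitely absent" or "possibly present" for a key."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions per key from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class ToySSTable:
    """Immutable key/value snapshot with its own Bloom filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = ToyBloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.disk_reads = 0  # counts simulated accesses to the data file

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely absent: skip the data file entirely
        self.disk_reads += 1
        return self.rows.get(key)


def read(key, memtable, sstables):
    """Check the memtable first, then SSTables from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for table in reversed(sstables):  # list is ordered oldest to newest
        value = table.get(key)
        if value is not None:
            return value
    return None
```

With an older table holding keys `a` and `b` and a newer one holding only `a`, a read for `a` returns the newer value, while a read for `b` falls through to the older table. The filter can report false positives but never false negatives, so skipping a table on a negative answer is always safe.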
References: DataStax Docs