Uploaded by 205223019

LSM TREE

What makes NoSQL databases so fast and efficient
Submitted By
205223019
BEFORE WE BEGIN…
 A typical Database Management System (DBMS for short) consists of multiple
components, each responsible for handling a different aspect of data storage,
retrieval and management.
 One such component is the storage engine which is responsible for providing a
reliable interface for reading and writing data efficiently from/to the
underlying storage device.
 It is the component that implements two of the four ACID properties of a
database: Atomicity and Durability.
 In addition, the performance of the storage engine weighs heavily in the choice of
a database, as it is the component closest to the storage device in use.
 Two popular data structures for implementing storage engines are B+ Trees and
LSM Trees.
THE TWO-COMPONENT LOG-STRUCTURED MERGE TREE ALGORITHM
[Figure: the two components. Memory: a balanced binary tree. Disk: SS (Sorted String) Tables.]
WRITE
def put(self, key, value):
    # Insert a key-value pair into the LSM Tree
    self.memtable.add(key, value)
    # When the MemTable reaches the size threshold, trigger compact()
    if self.memtable.size() >= self.compaction_threshold:
        self.compact()
[Figure: write path in an LSM-tree-based database like Cassandra, with a memory size threshold of 3. The server receives the writes Aman-200, Ram-700 and Ankit-320 one at a time and inserts each into the in-memory red-black tree. After the third write the threshold is reached. The SSD/HDD is shown as the on-disk component.]
When the MemTable is flushed, it is persisted to disk as an immutable SSTable that contains the sorted key-value pairs.
[Figure: SSTable creation in an LSM-tree-based database like Cassandra. A new write, Rohan-133, arrives at the server, and the in-memory red-black tree is flushed (FLUSH) to the SSD/HDD. Entries shown: Aman-200, Ram-700, Ankit-320, Rohan-133; and Aman-220, Rohan-133, sumit-222.]
The amortized time complexity of a write is O(1) with respect to the data on disk, because a write only updates the in-memory MemTable, whose size is bounded by the threshold.
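The write path described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not how Cassandra implements it: a plain dict stands in for the red-black tree, a sorted tuple stands in for an on-disk SSTable, and the names TinyLSM, threshold and flush are hypothetical.

```python
class TinyLSM:
    """Toy write path: MemTable in memory, SSTables as sorted tuples."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # max entries held in memory (assumed)
        self.memtable = {}           # stand-in for the red-black tree
        self.sstables = []           # oldest first; each run is immutable

    def put(self, key, value):
        # A write only touches the in-memory structure.
        self.memtable[key] = value
        if len(self.memtable) >= self.threshold:
            self.flush()

    def flush(self):
        # Persist the MemTable as one sorted, immutable run, then reset it.
        sstable = tuple(sorted(self.memtable.items()))
        self.sstables.append(sstable)
        self.memtable = {}


db = TinyLSM(threshold=3)
for name, score in [("Aman", 200), ("Ram", 700), ("Ankit", 320), ("Rohan", 133)]:
    db.put(name, score)

print(db.sstables)   # one flushed run, sorted by key
print(db.memtable)   # the write that arrived after the flush
```

The third write fills the MemTable and triggers the flush, so the fourth write lands in a fresh, empty MemTable.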
UPDATE
DELETE
READ
def get(self, key):
    # Retrieve the value associated with the given key
    # Search in the memtable first, then in the SSTables
    value = self.memtable.get(key)
    if value is None:
        # Assuming self.sstable is ordered oldest to newest,
        # reversed() checks the most recent SSTable first
        for table in reversed(self.sstable):
            value = table.get(key)
            if value is not None:
                break
    return value
[Figure: two versions of the same key, a=3 and a=2.]
 Every read in an LSM Tree is served first from the MemTable.
 If the key is not found in the MemTable, it is looked up in the most recent level L0, then L1, L2 and so on,
until we either find the key or return a null value.
 Since the SSTables are already sorted, we benefit from the ability to perform a binary search on the files to
quickly narrow down the range within which the key may be found.
 The time complexity of a read is O(log N), where N is the number of records on disk.
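The lookup order above can be sketched with the standard-library bisect module. This is a minimal sketch under assumptions: each SSTable is modelled as an in-memory sorted list of (key, value) pairs, the list of SSTables is ordered newest first, and the function names sstable_get and lsm_get are hypothetical.

```python
from bisect import bisect_left


def sstable_get(sstable, key):
    """Binary-search one sorted list of (key, value) pairs."""
    # (key,) sorts just before any (key, value) pair with the same key.
    i = bisect_left(sstable, (key,))
    if i < len(sstable) and sstable[i][0] == key:
        return sstable[i][1]
    return None


def lsm_get(memtable, sstables_newest_first, key):
    """Check the MemTable, then each SSTable from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for sstable in sstables_newest_first:
        value = sstable_get(sstable, key)
        if value is not None:
            return value
    return None


memtable = {"Rohan": 133}
newer = [("Aman", 220), ("Sumit", 222)]
older = [("Aman", 200), ("Ankit", 320), ("Ram", 700)]

print(lsm_get(memtable, [newer, older], "Aman"))     # newest version wins
print(lsm_get(memtable, [newer, older], "Nobody"))   # not found anywhere
```

Because the search stops at the first hit, a newer value for a key shadows any older value still sitting in an older SSTable.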
READ OPTIMISATION
 Even though search is fast on sorted data, going through all the on-disk SSTables consumes a lot of I/O.
 So summary tables, which contain the min/max key range of each disk block of every level, are kept in
memory.
 This allows the system to skip searches on those disk blocks where the key doesn't fall within the range, which
saves a lot of I/O.
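A rough sketch of the min/max idea follows. It is simplified on purpose: the summary here tracks one range per sorted run rather than per disk block, and the names build_summary and candidate_runs are hypothetical.

```python
def build_summary(sstables):
    """Record the (min_key, max_key) range of each non-empty sorted run."""
    return [(run[0][0], run[-1][0]) for run in sstables]


def candidate_runs(summary, key):
    """Return indexes of runs whose key range could contain `key`."""
    return [i for i, (lo, hi) in enumerate(summary) if lo <= key <= hi]


sstables = [
    [("Aman", 200), ("Ankit", 320)],
    [("Ram", 700), ("Rohan", 133)],
    [("Ankit", 999), ("Sumit", 222)],
]
summary = build_summary(sstables)

print(candidate_runs(summary, "Ram"))    # only runs 1 and 2 need a disk read
print(candidate_runs(summary, "Zoya"))   # no run can contain the key
```

A range match only means the key could be present, so each candidate run still has to be searched.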
WHAT HAPPENS WHEN THERE ARE SO MANY SSTABLES
 As the number of SSTables grows, it would take an increasingly long time to look up a key.
 As the SSTables accumulate, there are more and more outdated entries as keys are updated and
tombstones are added. These take up precious disk space.
COMPACTION
 Compaction is the process of merging SSTables and
eliminating redundant or obsolete data in the LSM
Tree.
 The core algorithm used by compaction is the k-way
merge sort algorithm adapted to SSTables.
 It helps manage disk space efficiently.
 Compaction involves the following steps:
 Overlapping Key Ranges: During the merge
process, SSTables with overlapping key ranges
are identified. Where the same key appears more
than once, only the most recent version is kept.
 Tombstones: In some LSM Trees, tombstones are
used to mark keys that have been deleted. During
compaction, a tombstoned key and the older values
it shadows can be dropped once no older SSTable
still holds that key, freeing up disk space.
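The k-way merge at the heart of compaction can be sketched with heapq.merge from the standard library. This is an illustration under assumptions: every run that could hold a key takes part in the merge (so dropping tombstones is safe), runs are passed newest first, and TOMBSTONE and compact are hypothetical names.

```python
import heapq

TOMBSTONE = object()   # hypothetical marker for a deleted key


def compact(sstables_newest_first):
    """Merge sorted runs into one, keeping only the newest live version."""
    # Tag each entry with its run's age so equal keys sort newest first.
    tagged = [
        [(key, age, value) for key, value in run]
        for age, run in enumerate(sstables_newest_first)
    ]
    merged, last_key = [], object()
    for key, _age, value in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
        if key == last_key:
            continue                 # older version of a key already handled
        last_key = key
        if value is not TOMBSTONE:   # deleted keys are left out of the output
            merged.append((key, value))
    return merged


newer = [("Aman", 220), ("Ram", TOMBSTONE), ("Sumit", 222)]
older = [("Aman", 200), ("Ankit", 320), ("Ram", 700)]

print(compact([newer, older]))
```

The merged run keeps Aman at its newer value, drops Ram because its newest entry is a tombstone, and stays sorted by key.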
WHAT IF THE KEY DOESN’T EXIST
 Keep a Bloom Filter at each level.
 A Bloom Filter is a space-efficient data structure that returns a firm 'no' if the key does not exist, and a 'probably
yes' if the key might exist.
 This allows the system to skip a level entirely when the filter reports that the key is not there, which reduces the
number of random I/Os required.
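A minimal Bloom filter sketch is shown below to make the 'firm no / probably yes' behaviour concrete. The sizes (64 bits, 3 hashes), the salted SHA-256 hashing scheme and the names BloomFilter and might_contain are assumptions chosen for illustration, not taken from any particular database.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0   # a Python int used as a bit array

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # Any unset bit proves the key was never added.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


level_filter = BloomFilter()
for name in ["Aman", "Ram", "Ankit"]:
    level_filter.add(name)

print(level_filter.might_contain("Ram"))    # True: added keys always match
print(level_filter.might_contain("Zoya"))   # usually False; may be a false positive
```

A read would consult the filter for a level first and only touch that level's SSTables when might_contain returns True.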
B+ TREE VS LSM TREE AVERAGE-CASE TIME COMPARISON
OPERATION    B+TREE      LSM TREE
SEARCH       O(log n)    O(log n)
INSERTION    O(log n)    O(1)
DELETION     O(log n)    O(1)
READ         O(log n)    O(log n)
DATABASES THAT USE LSM TREE
THANK YOU