Data Storage in Distributed Systems: LSM & B-Trees

Designing Data-Intensive Applications

Storage needs in distributed systems
● Store data so that the same application, or another one, can find it again later (databases)
● Remember the result of an expensive operation, to speed up reads (caches)
● Allow users to search data by keyword or filter it in various ways (search indexes)
● Send a message to another process, to be handled asynchronously (stream processing)
● Periodically crunch a large amount of accumulated data (batch processing)
● Amazon has also observed that a 100 ms increase in response time reduces sales by 1% [20]
● Total cost of ownership
  ○ It is well known that the majority of the cost of software is not in its initial development, but in its ongoing maintenance: fixing bugs, keeping its systems operational, investigating failures, adapting it to new platforms, modifying it for new use cases, repaying technical debt, and adding new features.
● Along with reliability and scalability, other important properties are operability, simplicity, and plasticity/evolvability.
● Making a system simpler does not necessarily mean reducing its functionality; it can also mean removing accidental complexity. Moseley and Marks [32] define complexity as accidental if it is not inherent in the problem that the software solves (as seen by the users) but arises only from the implementation.
● One of the best tools we have for removing accidental complexity is abstraction.

Chapter 3
● Indexes speed up reads but slow down writes. That is the tradeoff.

LSM Storage Engine

Introduction to LSM Trees:
● LSM (Log Structured Merge) Tree is a data structure employed by various
  NoSQL databases like DynamoDB, Cassandra, and ScyllaDB.
● These databases are designed to handle large volumes of write operations efficiently, which traditional relational databases struggle with.
● LSM Trees achieve this by optimizing write performance while maintaining reasonable read performance.

Comparing Storage Engines:
● LSM Trees are usually contrasted with B+ Trees, which are commonly used in relational databases.
● Unlike B+ Trees, which update pages in place, LSM Trees are append-only. This turns random write I/O into sequential I/O, which improves write performance.

Architecture of LSM Trees:
● LSM Trees leverage multiple data structures to exploit different storage device characteristics.
● They consist of two main components: Memtables and SSTables (Sorted String Tables).
● Memtables temporarily store incoming writes in memory as key-value pairs, kept sorted by key.
● When a Memtable reaches a certain size, it is flushed to disk as an immutable SSTable using sequential I/O. These files are called sorted runs. Each file typically also carries a small sparse index over the range of keys it holds, which makes looking up a key inside a large disk file fast. This is the secret sauce of SSTables.
● The new SSTable becomes the most recent segment of the LSM Tree, and this process continues as more data arrives.

Operations on LSM Trees:
● Delete: SSTables are immutable, so a deletion is recorded by writing a tombstone for the key. The tombstone follows the normal write path and ends up in the most recent SSTable, indicating that the object has been deleted.
● Read: Reads search the Memtable first and then the SSTables from newest to oldest, stopping at the first match. Since SSTables are sorted, each lookup can be efficient.
● Write: Incoming writes are buffered in memory and periodically flushed to disk as SSTables.
● Compaction: As SSTables accumulate, a compaction process merges them and discards outdated or deleted values, reclaiming disk space.

Compaction Strategies:
● Common compaction strategies include size-tiered compaction and leveled compaction.
● Compaction aims to keep the number of SSTables manageable while minimizing read and write amplification.

LSM Tree Enhancements:
● Various optimizations, like summary tables and Bloom filters, are employed to improve lookup performance and reduce I/O operations.
● Summary tables store metadata about disk blocks (such as the key range each block covers), so blocks that cannot contain the key are skipped.
● Bloom filters report that a key is either definitely absent from a level or possibly present, so most lookups for missing keys avoid disk reads altogether.

Drawbacks of LSM Trees:
● Despite their advantages, LSM Trees have drawbacks, particularly related to the resource-intensive nature of compaction.
● Compaction involves compression/decompression of data, which can impact read and write performance.
● Additionally, reads can be slow in the worst case, because the append-only design means a key may have to be looked up in several SSTables before it is found.

In summary, LSM Trees play a crucial role in enabling NoSQL databases to handle high write rates efficiently. They achieve this through a combination of append-only storage, efficient flushing mechanisms, and compaction strategies. However, mitigating the drawbacks associated with compaction remains a challenge for optimizing LSM Tree performance.

Resources
● https://www.youtube.com/watch?v=I6jB0nM9SKU&ab_channel=ByteByteGo

B-Tree Storage Engine

In a B-tree index structure, a root page serves as the starting point for key lookups. Each
page contains keys and references to child pages, with each child responsible for a specific key range. To find a key, you follow the reference corresponding to the key's range until you reach a leaf page containing individual keys and their values.

The number of references to child pages in a page is called the branching factor, typically several hundred. To update a value for an existing key, you locate the leaf page containing the key, update the value, and write the page back to disk. Adding a new key involves finding the page whose range covers the key and adding it there. If there is insufficient space, the page is split into two, and the parent page is updated to reflect the new key ranges.

This process keeps the tree balanced, so key lookups and updates stay efficient even as the dataset grows.

Notes
● During a page split, write amplification can be high, because several pages (the two halves and the parent) must be written for a single insert.
● A four-level tree of 4 KB pages with a branching factor of 500 can store up to roughly 250 TB (500^4 = 6.25 × 10^10 pages of 4 KB each).
● In order to make the database resilient to crashes, it is common for B-tree implementations to include an additional data structure on disk: a write-ahead log (WAL, also known as a redo log).
● Comparison with LSM Trees
  ○ LSM compaction can be heavy and hurt tail latencies, much as garbage collection pauses do in Java.
  ○ Disk bandwidth is shared between incoming writes (i.e. appends) and compaction.

OLAP Databases

Some of the columns in the fact table are attributes, such as the price at which the product was sold and the cost of buying it from the supplier (allowing the profit margin to be calculated). Other columns in the fact table are foreign key references to other tables, called dimension
tables. As each row in the fact table represents an event, the dimensions represent the who, what, where, when, how, and why of the event.

The name "star schema" comes from the fact that when the table relationships are visualized, the fact table is in the middle, surrounded by its dimension tables; the connections to these tables are like the rays of a star.

A variation of this template is known as the snowflake schema, where dimensions are further broken down into subdimensions.

Columnar storage

Although fact tables are often over 100 columns wide, a typical data warehouse query only accesses 4 or 5 of them at one time ("SELECT *" queries are rarely needed for analytics).
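
To make the columnar idea concrete, here is a minimal sketch in Python. It assumes a tiny made-up fact table; the column names (date_key, product_sk, quantity, net_price) and the helpers to_columns and revenue_for_product are hypothetical and exist only for illustration. It pivots the rows into one list per column and then answers a query by reading only the columns it needs.

```python
# Hypothetical fact table; column names and values are illustrative only.
rows = [
    {"date_key": 20240101, "product_sk": 31, "quantity": 2, "net_price": 5.0},
    {"date_key": 20240101, "product_sk": 69, "quantity": 1, "net_price": 12.5},
    {"date_key": 20240102, "product_sk": 31, "quantity": 4, "net_price": 5.0},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def revenue_for_product(columns, product_sk):
    """Sum quantity * net_price for one product.

    Only three columns are read; date_key is never touched, which is
    the point of a column-oriented layout.
    """
    total = 0.0
    for sk, qty, price in zip(
        columns["product_sk"], columns["quantity"], columns["net_price"]
    ):
        if sk == product_sk:
            total += qty * price
    return total


columns = to_columns(rows)
print(revenue_for_product(columns, 31))  # 2 * 5.0 + 4 * 5.0 = 30.0
```

In a real column store each of these lists would be a separate file or block on disk, so a query over 4 or 5 columns out of 100 reads only a small fraction of the table.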

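Returning to the LSM storage engine notes: the write, delete, read, and compaction operations described there can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular database implements it. The class name TinyLSM, the memtable_limit parameter, and the TOMBSTONE sentinel are all hypothetical. The memtable is a plain dict that is sorted only at flush time (real engines keep it sorted), and "SSTables" are in-memory sorted lists rather than files.

```python
import bisect

TOMBSTONE = object()  # hypothetical sentinel marking a deleted key


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}      # in-memory write buffer
        self.sstables = []      # oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just a write of a tombstone marker.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Write the memtable out as a new immutable sorted run.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: memtable first, then SSTables newest to oldest.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for table in reversed(self.sstables):
            # Binary search; (key,) sorts just before any (key, value) pair.
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                value = table[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all runs, oldest first, so newer values overwrite older ones.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        # Tombstones can be dropped only because every run is merged here.
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.sstables = [live] if live else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)     # memtable is full, flushed as the first sorted run
db.put("a", 10)
db.delete("b")     # tombstone fills the memtable, flushed as the second run
print(db.get("a"), db.get("b"), len(db.sstables))  # 10 None 2
db.compact()
print(db.sstables)  # [[('a', 10)]]
```

The sketch merges every run at once, which is closer in spirit to a single size-tiered merge than to leveled compaction; it is meant only to show why reads check newer data first and why space is reclaimed only at compaction time.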