Uploaded by Shaik Moulaali

System Design Designing Data Intensive Applications (DDIA) Notes - Google Docs

‭Designing‬‭Data‬‭Intensive‬‭applications‬
‭Storage‬‭needs‬‭in‬‭distributed‬‭systems‬
‭‬
● Store data so that they, or another application, can find it again later (databases)
● Remember the result of an expensive operation, to speed up reads (caches)
● Allow users to search data by keyword or filter it in various ways (search indexes)
● Send a message to another process, to be handled asynchronously (stream processing)
● Periodically crunch a large amount of accumulated data (batch processing)
● Amazon has also observed that a 100 ms increase in response time reduces sales by 1% [20]
‭●‬ ‭Total‬‭cost‬‭of‬‭ownership‬
‭○‬ ‭It‬‭is‬‭well‬‭known‬‭that‬‭the‬‭majority‬‭of‬‭the‬‭cost‬‭of‬‭software‬‭is‬‭not‬‭in‬‭its‬‭initial‬
‭development,‬‭but‬‭in‬‭its‬‭ongoing‬‭maintenance—fixing‬‭bugs,‬‭keeping‬‭its‬‭systems‬
‭operational,‬‭investigating‬‭failures,‬‭adapting‬‭it‬‭to‬‭new‬‭platforms,‬‭modifying‬‭it‬‭for‬
‭new‬‭use‬‭cases,‬‭repaying‬‭technical‬‭debt,‬‭and‬‭adding‬‭new‬‭features.‬
● Along with reliability and scalability, other important properties are operability, simplicity, and plasticity/evolvability.
‭●‬ ‭Making‬‭a‬‭system‬‭simpler‬‭does‬‭not‬‭necessarily‬‭mean‬‭reducing‬‭its‬‭functionality;‬‭it‬‭can‬‭also‬
‭mean‬‭removing‬‭accidental‬‭complexity.‬‭Moseley‬‭and‬‭Marks‬‭[‬‭32‬‭]‬‭define‬‭complexity‬‭as‬
‭accidental‬‭if‬‭it‬‭is‬‭not‬‭inherent‬‭in‬‭the‬‭problem‬‭that‬‭the‬‭software‬‭solves‬‭(as‬‭seen‬‭by‬
‭the‬‭users)‬‭but‬‭arises‬‭only‬‭from‬‭the‬‭implementation.‬
‭●‬ ‭One‬‭of‬‭the‬‭best‬‭tools‬‭we‬‭have‬‭for‬‭removing‬‭accidental‬‭complexity‬‭is‬‭abstraction‬‭.‬
‭Chapter‬‭3‬
● Indexes speed up reads but slow down writes, because every index has to be updated on each write. That is the trade-off to weigh when adding an index.
‭LSM‬‭Storage‬‭Engine‬
‭Introduction‬‭to‬‭LSM‬‭Trees:‬
● LSM (Log-Structured Merge) Tree is a data structure employed by various NoSQL databases such as Cassandra and ScyllaDB.
‭●‬ ‭These‬‭databases‬‭are‬‭designed‬‭to‬‭handle‬‭large‬‭volumes‬‭of‬‭write‬‭operations‬
‭efficiently,‬‭which‬‭traditional‬‭relational‬‭databases‬‭struggle‬‭with.‬
‭●‬ ‭LSM‬‭Trees‬‭achieve‬‭this‬‭by‬‭optimizing‬‭write‬‭performance‬‭and‬‭maintaining‬
‭reasonable‬‭read‬‭performance.‬
‭Comparing‬‭Storage‬‭Engines:‬
‭●‬ ‭The‬‭article‬‭contrasts‬‭LSM‬‭Trees‬‭with‬‭B+‬‭Trees,‬‭which‬‭are‬‭commonly‬‭used‬
‭in‬‭relational‬‭databases.‬
● Unlike B+ Trees, which update pages in place, LSM Trees are append-only. Writes become sequential rather than random I/O, which improves write performance.
‭Architecture‬‭of‬‭LSM‬‭Trees:‬
‭●‬ ‭LSM‬‭Trees‬‭leverage‬‭multiple‬‭data‬‭structures‬‭to‬‭exploit‬‭different‬‭storage‬
‭device‬‭characteristics.‬
● They consist of two main components: Memtables and SSTables (Sorted String Tables).
● Memtables temporarily store incoming writes in memory as key-value pairs, kept sorted by key.
● When a Memtable reaches a certain size, it is flushed to disk as an immutable SSTable using sequential I/O. These files are also called sorted runs. Each file typically carries a small sparse index over the range of keys it holds, which makes searching within a large disk file fast. This is the secret sauce of SSTables :)
● The new SSTable becomes the most recent segment of the LSM Tree, and this process continues as more data arrives.
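A minimal sketch of the flush and the sparse index, to make the idea concrete. The function names, the in-memory list standing in for a disk file, and the tiny `block_size` are all assumptions for illustration, not how any particular database lays out its files.

```python
import bisect

def flush_memtable(memtable, block_size=2):
    """Turn an in-memory dict into a sorted run plus a sparse index."""
    sorted_run = sorted(memtable.items())  # stands in for the on-disk file
    # Sparse index: only the first key of each block, with its offset.
    sparse_index = [
        (sorted_run[i][0], i) for i in range(0, len(sorted_run), block_size)
    ]
    return sorted_run, sparse_index

def sstable_get(sorted_run, sparse_index, key, block_size=2):
    """Find the one block that could hold the key, then scan only that block."""
    index_keys = [k for k, _ in sparse_index]
    pos = bisect.bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first key in the file
    start = sparse_index[pos][1]
    for k, v in sorted_run[start:start + block_size]:
        if k == key:
            return v
    return None

memtable = {"dog": 3, "apple": 1, "cat": 2, "emu": 5, "fox": 6}
run, index = flush_memtable(memtable)
print(index)                           # first key of each block and its offset
print(sstable_get(run, index, "emu"))  # found by scanning a single block
```

The index holds one entry per block rather than one per key, so it stays small enough to keep in memory even when the file itself is large.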
‭Operations‬‭on‬‭LSM‬‭Trees:‬
● Delete: Deletions are handled by writing a tombstone, a marker recording that the key has been deleted. The tombstone goes through the normal write path and ends up in the most recent SSTable.
● Read: Reads check the Memtable first, then the SSTables from newest to oldest, stopping at the first match. Since each SSTable is sorted, the lookup within a table can be efficient.
● Write: Incoming writes are buffered in memory and periodically flushed to disk as SSTables.
● Compaction: As SSTables accumulate, a compaction process merges them and discards outdated or deleted values, reclaiming disk space.
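The four operations can be sketched in a few lines. This is a toy, assuming everything lives in memory: `TinyLSM`, `TOMBSTONE`, and `memtable_limit` are made-up names, and the "SSTables" are plain sorted lists rather than files.

```python
TOMBSTONE = object()  # marker value meaning "this key was deleted"

class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        # A delete is just a write of a tombstone.
        self.put(key, TOMBSTONE)

    def _flush(self):
        self.sstables.append(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for table in reversed(self.sstables):  # newest table wins
            for k, v in table:  # a real engine would binary-search here
                if k == key:
                    return None if v is TOMBSTONE else v
        return None

    def compact(self):
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(table)
        # Dropping tombstones is safe here only because ALL tables are merged.
        live = sorted(
            ((k, v) for k, v in merged.items() if v is not TOMBSTONE),
            key=lambda kv: kv[0],
        )
        self.sstables = [live] if live else []

db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)    # memtable full: flushed as the first SSTable
db.put("a", 10)
db.delete("b")    # tombstone flushed with the second SSTable
db.put("c", 3)    # still in the memtable
```

After `db.compact()`, the two flushed tables collapse into one that holds only the latest live value for each key.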
‭Compaction‬‭Strategies:‬
● The article discusses different compaction strategies, such as size-tiered compaction and leveled compaction.
● Compaction aims to manage the number of SSTables efficiently while minimizing read and write amplification.
‭LSM‬‭Tree‬‭Enhancements:‬
‭●‬ ‭Various‬‭optimizations,‬‭like‬‭summary‬‭tables‬‭and‬‭Bloom‬‭filters,‬‭are‬
‭employed‬‭to‬‭improve‬‭lookup‬‭performance‬‭and‬‭reduce‬‭I/O‬‭operations.‬
‭●‬ ‭Summary‬‭tables‬‭store‬‭metadata‬‭about‬‭disk‬‭blocks,‬‭enabling‬‭skipping‬
‭unnecessary‬‭searches.‬
● Bloom filters report whether a key is definitely absent from a level or only possibly present, so most lookups for missing keys can skip the search entirely, reducing I/O.
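A small Bloom filter sketch shows why the answer is only ever "definitely not here" or "maybe here". The class name, the sizes, and the use of SHA-256 with a seed prefix are choices made for this example; real engines use faster hash functions and packed bit arrays.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
empty_answer = bf.might_contain("user:1")  # nothing added yet
bf.add("user:1")
added_answer = bf.might_contain("user:1")
```

A lookup only needs to touch the SSTable when `might_contain` returns True, so a key that was never written usually costs no disk reads at all.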
‭Drawbacks‬‭of‬‭LSM‬‭Trees‬‭:‬
‭●‬ ‭Despite‬‭their‬‭advantages,‬‭LSM‬‭Trees‬‭have‬‭drawbacks,‬‭particularly‬‭related‬
‭to‬‭the‬‭resource-intensive‬‭nature‬‭of‬‭compaction.‬
‭●‬ ‭Compaction‬‭involves‬‭compression/decompression‬‭of‬‭data,‬‭which‬‭can‬
‭impact‬‭read‬‭and‬‭write‬‭performance.‬
● Additionally, reads can be slow in the worst case: because the structure is append-only, a key may have to be looked up in several SSTables before it is found or ruled out.
In summary, LSM Trees play a crucial role in enabling NoSQL databases to handle high write rates efficiently. They achieve this through a combination of append-only storage, efficient flushing mechanisms, and compaction strategies. However, mitigating the drawbacks associated with compaction remains a challenge for optimizing LSM Tree performance.
‭Resources‬
‭ ‬ ‭https://www.youtube.com/watch?v=I6jB0nM9SKU&ab_channel=ByteByteGo‬
‭BTree‬‭Storage‬‭Engine‬
In a B-tree index structure, a root page serves as the starting point for key lookups. Each page contains keys and references to child pages, with each child responsible for a specific key range. To find a key, you follow the reference corresponding to the key's range until you reach a leaf page containing individual keys and their values.
The number of references to child pages in a page is called the branching factor, typically several hundred. To update a value for an existing key, you locate the leaf page containing the key, update the value, and write the page back to disk. Adding a new key involves finding the appropriate page and adding it there. If there's insufficient space, the page is split into two, and the parent page is updated to reflect the new key ranges.
This process ensures efficient key lookups and updates in the B-tree structure, even as the dataset grows.
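The lookup path described above can be sketched as follows. The `Page` class and its fields are hypothetical and live in memory; a real B-tree reads fixed-size pages from disk and handles inserts and splits as well, which this sketch leaves out.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class Page:
    keys: list
    children: list = field(default_factory=list)  # empty for leaf pages
    values: list = field(default_factory=list)    # used only by leaf pages

def btree_lookup(page, key):
    """Follow child references from the root down to a leaf page."""
    while page.children:
        # children[i] covers keys below keys[i]; the last child covers the rest.
        page = page.children[bisect.bisect_right(page.keys, key)]
    i = bisect.bisect_left(page.keys, key)
    if i < len(page.keys) and page.keys[i] == key:
        return page.values[i]
    return None

leaf1 = Page(keys=[100, 150], values=["a", "b"])
leaf2 = Page(keys=[200, 250], values=["c", "d"])
leaf3 = Page(keys=[300, 350], values=["e", "f"])
root = Page(keys=[200, 300], children=[leaf1, leaf2, leaf3])

print(btree_lookup(root, 250))  # key 250 falls in the 200..300 range, so leaf2
```

Each step down the tree narrows the key range, so the number of pages touched grows with the depth of the tree rather than with the number of keys.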
‭Notes‬
● During a page split, write amplification can be high, because both halves of the split page and the parent page have to be rewritten.
● A four-level tree of 4 KB pages with a branching factor of 500 can store up to 256 TB.
● In order to make the database resilient to crashes, it is common for B-tree implementations to include an additional data structure on disk: a write-ahead log (WAL, also known as a redo log).
● Comparison with LSM Trees
○ LSM compaction can be heavy and hurt tail latencies, similar to Java garbage collection pauses
○ Disk bandwidth is shared between writes (i.e. appends) and compaction
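The capacity figure for the four-level tree can be checked with a little arithmetic. This assumes four levels of child references, each fanning out by the branching factor, and decimal terabytes (10^12 bytes); the variable names are just for this example.

```python
branching_factor = 500
levels = 4
page_size_bytes = 4 * 1024  # 4 KB pages

# Each level multiplies the number of reachable pages by the branching factor.
reachable_pages = branching_factor ** levels
capacity_bytes = reachable_pages * page_size_bytes
capacity_tb = capacity_bytes / 10**12

print(reachable_pages)  # 62500000000
print(capacity_tb)      # 256.0
```

So the bound works out to 500^4 × 4096 bytes, which is about 256 TB.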
‭OLAP‬‭Databases‬
Some of the columns in the fact table are attributes, such as the price at which the product was sold and the cost of buying it from the supplier (allowing the profit margin to be calculated). Other columns in the fact table are foreign key references to other tables, called dimension tables. As each row in the fact table represents an event, the dimensions represent the who, what, where, when, how, and why of the event.
The name "star schema" comes from the fact that when the table relationships are visualized, the fact table is in the middle, surrounded by its dimension tables; the connections to these tables are like the rays of a star.
A variation of this template is known as the snowflake schema, where dimensions are further broken down into subdimensions.
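A tiny star schema can be tried out with Python's built-in `sqlite3` module. The table names, column names, and rows below are invented for illustration; a real warehouse would have far wider fact tables and more dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_sk   INTEGER PRIMARY KEY, city TEXT);
-- Fact table: one row per sale event, with foreign keys into the dimensions
-- plus attribute columns (price and cost).
CREATE TABLE fact_sales (
    product_sk INTEGER REFERENCES dim_product(product_sk),
    store_sk   INTEGER REFERENCES dim_store(store_sk),
    net_price  REAL,
    cost       REAL
);
INSERT INTO dim_product VALUES (1, 'Bananas'), (2, 'Fish food');
INSERT INTO dim_store   VALUES (1, 'Seattle'), (2, 'Portland');
INSERT INTO fact_sales  VALUES (1, 1, 2.50, 1.00), (1, 2, 2.00, 1.00), (2, 1, 5.00, 3.00);
""")

# Profit margin per product: join the fact table to one of its dimensions.
rows = conn.execute("""
    SELECT p.name, SUM(f.net_price - f.cost) AS margin
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_sk = p.product_sk
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)
```

The fact table sits in the middle of every join, and each dimension table answers one of the who, what, or where questions about the event.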
‭Columnar‬‭storage‬
Although fact tables are often over 100 columns wide, a typical data warehouse query only accesses 4 or 5 of them at one time ("SELECT *" queries are rarely needed for analytics).
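The difference between row and column layout can be sketched with plain Python lists. The sample rows and column names are made up; real column stores keep each column in its own compressed file rather than in a dict.

```python
# Row-oriented layout: each record keeps all of its columns together.
rows = [
    {"date": "2024-01-01", "product": "apple",  "quantity": 3, "price": 1.5},
    {"date": "2024-01-01", "product": "banana", "quantity": 2, "price": 0.5},
    {"date": "2024-01-02", "product": "apple",  "quantity": 4, "price": 1.5},
]

# Column-oriented layout: one list per column; entry i of every list
# belongs to row i.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Equivalent of: SELECT SUM(quantity) FROM sales WHERE product = 'apple'
# Only two of the four columns are touched.
total = sum(
    qty
    for product, qty in zip(columns["product"], columns["quantity"])
    if product == "apple"
)
print(total)
```

With 100 or more columns per row, reading only the few columns a query needs is where the savings of columnar storage come from.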