NoSQL and DynamoDB
Rick Houlihan, Principal Solutions Architect, Amazon Web Services
October 2015
WWW.AWSEDUCATE.COM

What to expect from the session
• Brief history of data processing
• Introduction to NoSQL
• DynamoDB internals
• Tables, API, data types, indexes
• Scaling and data modeling
• Design patterns and best practices
• Event-driven applications and DynamoDB Streams
• Reference architecture

What is a Database?
“A structured set of data held in a computer, especially one that is accessible in various ways.” – Google
“A database is an organized mechanism for storing, managing and retrieving information.” – About.com
“A place to put stuff my app needs.” – Average Developer

Data Pressure and the Timeline of Database Technology
• 90% of stored data was generated in the last 2 years
• 1 terabyte of data in 2010 equals 6.5 petabytes today
• Linear correlation between data pressure and technical innovation
• No reason these trends will not continue over time

Why NoSQL?
SQL: optimized for storage; normalized/relational; ad hoc queries; scales vertically; good for OLAP
NoSQL: optimized for compute; denormalized/hierarchical; instantiated views; scales horizontally; built for OLTP at scale

The Iron Triangle of Data – All About CAP (pick two)
• Consistency: all clients always have the same view of the data
• Availability: all clients can always read and write
• Partition tolerance: the system works well despite physical network partitions
CA: MSSQL, Oracle, DB2, Postgres, MySQL
AP: Aster Data, Greenplum, Vertica, Voldemort, Cassandra, Tokyo Cabinet, SimpleDB, KAI, CouchDB, Riak
CP: Bigtable, Hypertable, HBase, MongoDB, Terrastore, Couchbase, Scalaris, DynamoDB, BerkeleyDB, Memcache, Redis
Data models: relational, column oriented, document, key/value

Partition Management for AP Systems
[Diagram: operations run on state S until a partition starts; the system enters partition mode and the two sides diverge into states S1 and S2; partition recovery merges them back into a single state S’.]

SQL vs.
NoSQL Access Pattern

Technology Adoption and the Hype Curve

Amazon DynamoDB
• Fully managed NoSQL
• Fast and consistent
• Document or key-value
• Access control
• Scales to any workload
• Event-driven programming

Tables, API, Data Types

Table and item API
Admin: Create Table, Update Table, Delete Table, Describe Table
CRUD: Put/Get Item, Batch Put/Get Item, Update Item, Delete Item, Query, Scan
DynamoDB Streams API: ListStreams, DescribeStream, GetShardIterator, GetRecords

Data types
• String (S), Number (N), Binary (B)
• String Set (SS), Number Set (NS), Binary Set (BS)
• Boolean (BOOL), Null (NULL)
• List (L), Map (M) — used for storing nested JSON documents

Table
A table holds items; items hold attributes.
Hash key (mandatory): key-value access pattern; determines data distribution
Range key (optional): models 1:N relationships; enables rich query capabilities — all items for a hash key; ==, <, >, >=, <=; “begins with”; “between”; sorted results; counts; top/bottom N values; paged responses

Hash table
• Hash key uniquely identifies an item
• Hash key is used for building an unordered hash index
• Table can be partitioned for scale
Example: Id = 1, Name = Jim; Id = 2, Name = Andy, Dept = Eng; Id = 3, Name = Kim, Dept = Ops — Hash(1) = 7B, Hash(2) = 48, Hash(3) = CD place the items across the 0000–FF key space (split at 54/55 and A9/AA).

Hash-range table
• Hash key and range key together uniquely identify an item
• Within the unordered hash index, data is sorted by the range key
• No limit on the number of items (∞) per hash key — except if you have local secondary indexes
Example: Partition 1 (00–54): Customer# = 2, Order# = 10, Item = Pen; Customer# = 2, Order# = 11, Item = Shoes (Hash(2) = 48). Partition 2 (55–A9): Customer# = 1, Order# = 10, Item = Toy; Customer# = 1, Order# = 11, Item = Boots (Hash(1) = 7B). Partition 3 (AA–FF): Customer# = 3, Order# = 10, Item = Book; Customer# = 3, Order# = 11, Item = Paper (Hash(3) = CD).

Partitions are three-way replicated
Id = 2, Name = Andy, Dept = Eng; Id = 1, Name = Jim; Id = 3, Name = Kim, Dept = Ops — Replica 1
Id = 2, Name = Andy, Dept = Eng; Id = 1, Name = Jim; Id = 3, Name = Kim, Dept = Ops — Replica 2
Id = 2
Name = Andy, Dept = Eng; Id = 1, Name = Jim; Id = 3, Name = Kim, Dept = Ops — Replica 3 (each of Partition 1, Partition 2, … Partition N is replicated this way)

Indexes

Local secondary index (LSI)
• Alternate range key attribute
• Index is local to a hash key (or partition)
• 10 GB max per hash key, i.e., LSIs limit the # of range keys!
• Projections: KEYS_ONLY (hash, range, table key), INCLUDE (plus named attributes), or ALL (every attribute) control which attributes are copied into the index

Global secondary index (GSI)
• Alternate hash (+ range) key
• Online indexing
• Index is across all table hash keys (partitions)
• RCUs/WCUs provisioned separately for GSIs
• Same projection choices: KEYS_ONLY, INCLUDE, ALL

How do GSI updates work?
The client writes to the primary table; the global secondary index is updated asynchronously. If GSIs don’t have enough write capacity, table writes will be throttled!

LSI or GSI?
• An LSI can be modeled as a GSI
• If data size in an item collection > 10 GB, use a GSI
• If eventual consistency is okay for your scenario, use a GSI!

Scaling
Throughput: provision any amount of throughput to a table
Size: add any number of items to a table
• Max item size is 400 KB
• LSIs limit the number of range keys due to the 10 GB limit
Scaling is achieved through partitioning

Throughput
Provisioned at the table level
• Write capacity units (WCUs) are measured in 1 KB per second
• Read capacity units (RCUs) are measured in 4 KB per second
• RCUs measure strictly consistent reads
• Eventually consistent reads cost 1/2 of consistent reads
Read and write throughput limits are independent

What causes throttling?
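The read/write capacity-unit accounting described above can be sketched as a pair of helpers. The rounding rules (4 KB read units, 1 KB write units, half price for eventually consistent reads) come from the slides; the function names are mine:

```python
import math

def read_capacity_units(item_size_kb, items=1, eventually_consistent=False):
    """RCUs consumed: each strongly consistent read of an item costs
    ceil(item_size / 4 KB); eventually consistent reads cost half."""
    per_item = math.ceil(item_size_kb / 4)
    cost = per_item * items
    return cost / 2 if eventually_consistent else cost

def write_capacity_units(item_size_kb, items=1):
    """WCUs consumed: each write costs ceil(item_size / 1 KB)."""
    return math.ceil(item_size_kb) * items

# A strongly consistent read of a 6 KB item costs 2 RCUs;
# an eventually consistent read of the same item costs 1.
```

The same arithmetic explains why fetching 50 large 256 KB items with eventually consistent reads costs 50 × 64 × ½ = 1600 RCUs, as in the messaging example later in the deck.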
Throttling occurs if sustained throughput goes beyond the provisioned throughput per partition.
Non-uniform workloads
• Hot keys / hot partitions
• Very large bursts
Mixing hot data with cold data
• Use a table per time period
Overly partitioned tables
• If sustained throughput > partition limit, DynamoDB may throttle requests
• Solution: increase provisioned throughput or restructure data

[Chart: what bad NoSQL looks like — heat concentrated on a few keys over time.]

Getting the most out of DynamoDB throughput
“To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.” —DynamoDB Developer Guide
• Space: access is evenly spread over the key-space
• Time: requests arrive evenly spaced in time
[Chart: the much better picture — heat spread evenly across keys and time.]

A global leader in retargeting, with more than 10,000 active advertisers in >100 countries:
• Provisioned for over 1M transactions per second
• 4 regions in use with live traffic replication
• 120B+ key fetches worldwide per day (RTB)
• 1.5 TB of data stored per region
• 30B+ items stored in each region
• <3 ms uniform query latency, <10 ms at 99.95%

Video monitoring
• Simple video monitoring & security
• Fast growth – “suddenly petabytes”
• More inbound video than YouTube
Moved to AWS in 2009 and switched to DynamoDB along the way (2009–2013); DynamoDB reduced delivery time for video events from 5–10 seconds to 50 milliseconds.

Online Gaming
“DynamoDB came along at just the right time, and Halfbrick switched to storing our game data in DynamoDB, which alleviated our scaling problems while also freeing us from the burden of managing all the underlying hardware and software.
We love that DynamoDB handles so much of the management for us, freeing us to focus on development.” Data modeling 1:1 relationships or key-values Use a table or GSI with a hash key Use GetItem or BatchGetItem API Example: Given an SSN or license number, get attributes Users Table Hash key SSN = 123-45-6789 SSN = 987-65-4321 Attributes Email = johndoe@nowhere.com, License = TDL25478134 Email = maryfowler@somewhere.com, License = TDL78309234 Users-Email-GSI Hash key Attributes License = TDL78309234 Email = maryfowler@somewhere.com, SSN = 987-65-4321 License = TDL25478134 Email = johndoe@nowhere.com, SSN = 123-45-6789 1:N relationships or parent-children Use a table or GSI with hash and range key Use Query API Example: • Given a device, find all readings between epoch X, Y Device-measurements Hash Key Range key Attributes DeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90 DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90 N:M relationships Use a table and GSI with hash and range key elements switched Use Query API Example: Given a user, find all games. Or given a game, find all users. 
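The 1:N Query pattern shown above — all readings for one device between two epochs — maps onto the low-level DynamoDB Query API roughly as follows. This is a sketch: the `Device-measurements` table name comes from the slide, while the expression aliases are mine, and I assume the hex epoch values are stored as strings:

```python
def build_device_query(device_id, epoch_start, epoch_end):
    """Build request parameters for a DynamoDB Query that fetches all
    readings for one device (hash key ==) between two epochs (range key BETWEEN)."""
    return {
        "TableName": "Device-measurements",
        "KeyConditionExpression": "#d = :d AND #e BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#d": "DeviceId", "#e": "epoch"},
        "ExpressionAttributeValues": {
            ":d": {"N": str(device_id)},       # DeviceId is numeric in the slide
            ":start": {"S": epoch_start},      # epoch shown as hex string, e.g. "5513A97C"
            ":end": {"S": epoch_end},
        },
    }

# With boto3 this dict would be passed straight to the low-level client:
#   boto3.client("dynamodb").query(**build_device_query(1, "5513A97C", "5513A9DB"))
```

Because the range key sorts the item collection, the same shape supports the other slide operators (`begins_with`, `<`, `>=`, and so on) by swapping the key condition.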
User-Games-Table
Hash key: UserId; Range key: GameId
UserId = bob, GameId = Game1
UserId = fred, GameId = Game2
UserId = bob, GameId = Game3

Game-Users-GSI
Hash key: GameId; Range key: UserId
GameId = Game1, UserId = bob
GameId = Game2, UserId = fred
GameId = Game3, UserId = bob

Documents (JSON)
New data types (M, L, BOOL, NULL) introduced to support JSON
Document SDKs
• Simple programming model
• Conversion to/from JSON
• Java, JavaScript, Ruby, .NET
Cannot index (S, N) elements of a JSON object stored in M
• Only top-level table attributes can be used in LSIs and GSIs without Streams/Lambda
Type mapping (JavaScript → DynamoDB): string → S, number → N, boolean → BOOL, null → NULL, array → L, object → M

Rich expressions
• Projection expression — Query/Get/Scan: ProductReviews.FiveStar[0]
• Filter expression — Query/Scan: #VIEWS > :num
• Conditional expression — Put/Update/DeleteItem: attribute_not_exists(#pr.FiveStar)
• Update expression — UpdateItem: set Replies = Replies + :num

Scenarios and best practices

Event logging: storing time series data
All tables share the schema Event_id (hash key), Timestamp (range key), Attribute1 …. Attribute N.
Current table (hot data): Events_table_2015_April — RCUs = 10000, WCUs = 10000
Older tables (cold data):
• Events_table_2015_March — RCUs = 1000, WCUs = 100
• Events_table_2015_February — RCUs = 100, WCUs = 1
• Events_table_2015_January — RCUs = 10, WCUs = 1
Don’t mix hot and cold data; archive cold data to Amazon S3.

Time series tables
• Use a table per time period
• Pre-create daily, weekly, monthly tables
• Provision required throughput for the current table
• Writes go to the current table
• Turn off (or reduce) throughput for older tables

Product catalog
Popular items (read) cause scaling bottlenecks:
SELECT Id, Description, ...
FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"

[Diagram: shoppers hammer Product A and Product B; each of the ProductCatalog table’s 50 partitions gets 2000 RCUs, but the request distribution per hash key is heavily skewed toward a few item primary keys.]

Cache popular items
Put a cache in front of DynamoDB: repeat reads for popular products become cache hits instead of DynamoDB requests, and the request distribution across partitions evens out.

Messaging app
Large items; filters vs. indexes; M:N modeling — inbox and outbox
Inbox: SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC
Outbox: SELECT * FROM Messages WHERE Sender='David' LIMIT 50 ORDER BY Date DESC

Large and small attributes mixed
Messages Table — Recipient (hash key), Date (range key), Sender, Message
David, 2014-10-02, Bob, …
David, 2014-10-03, Alice, …
… 48 more messages for David …
Alice, 2014-09-28, Bob, …
Alice, 2014-10-01, Carol, …
(Many more messages)
The inbox query reads 50 items × 256 KB each — large message bodies and attachments.

Computing inbox query cost
50 (items evaluated by query) × 256 KB (average item size) × (1 RCU / 4 KB) (conversion ratio) × (1/2) (eventually consistent reads) = 1600 RCUs

Separate the bulk data
1. Query Inbox-GSI: 1 RCU (50 sequential items at 128 bytes)
2. BatchGetItem Messages: 1600 RCUs (50 separate items at 256 KB) — uniformly distributes large-item reads

Inbox-GSI — Recipient (hash key), Date (range key), Sender, Subject, MsgId
David, 2014-10-02, Bob, Hi!…, afed
David, 2014-10-03, Alice, RE: The…, 3kf8
Alice, 2014-09-28, Bob, FW: Ok…, 9d2b
Alice, 2014-10-01, Carol, Hi!…, ct7r

Messages Table — MsgId (hash key), Body
9d2b, …
3kf8, …
ct7r, …
afed, …

Inbox GSI: define which attributes to copy into the index.
Outbox GSI: hash key Sender — SELECT * FROM Messages WHERE Sender='David' LIMIT 50 ORDER BY Date DESC

Messaging app summary
• Reduce one-to-many item sizes
• Configure secondary index projections
• Use GSIs to model the M:N relationship between sender and recipient
• Distribute the reads when querying many large items at once (inbox and outbox)

Multiplayer online gaming
Query filters vs. composite key indexes

Common game back-end concepts — think in terms of APIs:
• HTTP + JSON
• Get friends, leaderboard
• Binary asset data
• Multiplayer servers
• High availability, scalability

Core (HA) game back end
• Choose a region with >= 2 Availability Zones
• Amazon EC2 for the app
• Elastic Load Balancing
• Amazon RDS database (Multi-AZ)

Scale it way out
• Amazon S3 for game data: assets, UGC, analytics … with Amazon CloudFront as the CDN
• Auto Scaling group: capacity on demand, respond to users, automatic healing
• Amazon ElastiCache: Memcached, Redis

Writing is painful
• Games are write heavy
• Caching is of limited use
• Key value, binary structures
• Database = bottleneck
• Sharding (not fun)

Amazon DynamoDB
• Fully managed NoSQL data store
• Provisioned throughput
• Secondary indexes
• PUT/GET keys
• Document support!
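The PUT/GET key-value access just listed can be sketched with low-level DynamoDB request shapes. This is a hedged sketch: the `GameState` table and its attribute names are hypothetical, not from the deck:

```python
def build_put_item(user_id, game_id, state_blob):
    """PutItem request: store one player's saved state under a hash+range key."""
    return {
        "TableName": "GameState",        # hypothetical table name
        "Item": {
            "UserId": {"S": user_id},    # hash key
            "GameId": {"S": game_id},    # range key
            "State": {"S": state_blob},  # opaque game state
        },
    }

def build_get_item(user_id, game_id):
    """GetItem request: fetch that state back by full primary key."""
    return {
        "TableName": "GameState",
        "Key": {
            "UserId": {"S": user_id},
            "GameId": {"S": game_id},
        },
        "ConsistentRead": False,  # eventually consistent read costs half the RCUs
    }

# With boto3 these dicts go straight to the low-level client:
#   boto3.client("dynamodb").put_item(**build_put_item("bob", "Game1", "..."))
#   boto3.client("dynamodb").get_item(**build_get_item("bob", "Game1"))
```

Because both calls address a single item by its full key, they cost a predictable, fixed number of capacity units regardless of table size, which is what makes this pattern scale for write-heavy games.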
Example: Leaderboard in DynamoDB
UserID (hash key), BoardName (range key), TopScore, TopScoreDate
"101", "Galaxy Invaders", 5842, "2014-09-15T17:24:31"
"101", "Meteor Blasters", 1000, "2014-10-22T23:18:01"
"101", "Starship X", 24, "2014-08-31T13:14:21"
"102", "Alien Adventure", 192, "2014-07-12T11:07:56"
"102", "Galaxy Invaders", 0, "2014-09-18T07:33:42"
"103", "Attack Ships", 3, "2014-10-19T01:13:24"
"103", "Galaxy Invaders", 2317, "2014-09-11T06:53:00"
"103", "Meteor Blasters", 723, "2014-10-19T01:14:24"
"103", "Starship X", 42, "2014-07-11T06:53:03"
• Hash key = primary key
• Range key = sort key
• Other attributes are undefined
• So… how to sort based on top score?

Leaderboard with secondary indexes
GSI — BoardName (hash key), TopScore (range key), UserID and TopScoreDate (projected)
"Alien Adventure", 192, "102"
"Attack Ships", 3, "103"
"Galaxy Invaders", 0, "102"
"Galaxy Invaders", 2317, "103"
"Galaxy Invaders", 5842, "101"
"Meteor Blasters", 723, "103"
"Meteor Blasters", 1000, "101"
"Starship X", 24, "101"
"Starship X", 42, "103"
Create a secondary index!
• Set hash key to BoardName
• Set range key to TopScore
• Project extra attributes as needed
• Can now query by BoardName, sorted by TopScore
• Handles many common gaming use cases

Sparse indexes
Scan sparse hash GSIs
Game-scores-table — Id (hash), User, Game, Score, Date
1, Bob, G1, 1300, 2012-12-23
2, Bob, G1, 1450, 2012-12-23
3, Jay, G1, 1600, 2012-12-24
4, Mary, G1, 2000, 2012-10-24
5, Ryan, G2, 123, 2012-03-10
6, Jones, G2, 345, 2012-03-20
Award-GSI — Award (hash), Id, User, Score
Champ, 4, Mary, 2000

Real-time voting
Write-heavy items
Requirements for voting
• Allow each person to vote only once
• No changing votes
• Real-time aggregation
• Voter analytics, demographics

Real-time voting architecture
Voters → Voting App → RawVotes Table and AggregateVotes Table

Scaling bottlenecks
Provision 200,000 WCUs, and each partition of the Votes table gets only 1000 WCUs — yet every voter writes to the same Candidate A or Candidate B item.

Write sharding
Spread each candidate across suffixed keys in the Votes table: Candidate A_1 … Candidate A_8, Candidate B_1 … Candidate B_8.
Voter write: UpdateItem: “CandidateA_” + rand(0, 10) ADD 1 to Votes

Shard aggregation
A periodic process (1) sums the shards and (2) stores the total — e.g., Candidate A total: 2.5M.

Shard write-heavy hash keys
• Trade off read cost for write scalability
• Consider throughput per hash key and per partition
• Your write workload is not horizontally scalable

Correctness in voting
1.
Record vote and de-dupe; retry 2. Increment candidate counter Voter RawVotes Table AggregateVotes Table UserId Candidate Date Segment Votes Alice A 2013-10-02 A_1 23 Bob B 2013-10-02 B_2 12 Eve B 2013-10-02 B_1 14 Chuck A 2013-10-02 A_2 25 Correctness in aggregation? Voter RawVotes Table AggregateVotes Table UserId Candidate Date Segment Votes Alice A 2013-10-02 A_1 23 Bob B 2013-10-02 B_2 12 Eve B 2013-10-02 B_1 14 Chuck A 2013-10-02 A_2 25 DynamoDB Streams DynamoDB Streams Stream of updates to a table Asynchronous Exactly once Strictly ordered • Per item Highly durable • Scale with table 24-hour lifetime Sub-second latency View types UpdateItem (Name = John, Destination = Pluto) View Type Destination Old image—before update Name = John, Destination = Mars New image—after update Name = John, Destination = Pluto Old and new images Name = John, Destination = Mars Name = John, Destination = Pluto Keys only Name = John DynamoDB Streams and Amazon Kinesis Client Library Partition 1 Shard 1 KCL Worker Shard 2 KCL Worker Partition 2 DynamoDB Client Application Updates Partition 3 Shard 3 KCL Worker Shard 4 KCL Worker Partition 4 Table Partition 5 Table Stream Amazon Kinesis Client Library Application Cross-region replication US East (N. 
Virginia), Asia Pacific (Sydney), and EU (Ireland): replicas kept in sync by DynamoDB Streams and the open-source cross-region replication library.

DynamoDB Streams and AWS Lambda: triggers

Real-time voting architecture (improved)
Voters → Voting App → RawVotes Table → RawVotes DynamoDB Stream → your Amazon Kinesis-enabled app → AggregateVotes Table, Amazon Redshift, Amazon EMR
• Handles any scale of election
• Vote only once, no changing votes
• Real-time, fault-tolerant, scalable aggregation
• Voter analytics, statistics

Analytics with DynamoDB Streams
• Collect and de-dupe data in DynamoDB
• Aggregate data in-memory and flush periodically
• Performs real-time aggregation and analytics

Reference Architecture: Elastic Event-Driven Applications

Thank You
WWW.AWSEDUCATE.COM