AWS Foundation
Amazon DynamoDB
Copyright IntelliPaat, All rights reserved

Agenda
1. NoSQL Databases
2. DynamoDB
3. Components
4. API
5. Provisioned Throughput
6. Partitions

NoSQL Databases
• A NoSQL (“not only” SQL) database is a non-relational and largely distributed database.
• Provides high scalability and availability.
• Supports structured, semi-structured and unstructured data.
• Types:
  o Graph database: a network database that uses nodes and edges to represent and store data. Examples: Neo4j, Titan.
  o Key-value store: schema-less format; data is stored as key-value pairs. Examples: Riak, DynamoDB.
  o Document based: data is stored as JSON documents. Examples: MongoDB, CouchDB.
  o Column based: each storage block contains data from only one column. Examples: Cassandra, HBase.
• Why NoSQL?

DynamoDB Concepts
• Fully managed NoSQL database. No database administration required.
• Scaling can be done at the individual table level.
• DynamoDB automatically spreads the data and traffic for tables over a sufficient number of servers to handle throughput and storage requirements.
• Data is stored on high-performance SSDs and replicated across multiple AZs in a Region.

Components
• Tables: data is stored in tables.
• Items: comparable to rows.
• Attributes: comparable to columns.

Demo & Lab
• AWS Management Console
• AWS CLI (Command Line Interface)
• Python Boto3

JSON
• Introduction to JSON: JavaScript Object Notation.
    {
      "EmpID": 12345,
      "EmpName": "xyz",
      "Address": {
        "Building": "Bldg-1",
        "Street": "40/1 Blvd",
        "ZipCode": 654321
      },
      "Skills": ["AWS", "Java", "Oracle"],
      "cars": [
        { "name": "Toyota", "models": ["Prius", "Camry", "Corolla"] },
        { "name": "Honda", "models": ["Accord", "Civic"] },
        { "name": "Jeep" }
      ]
    }

Attributes
• Primary Key: MUST be specified when creating a table. Unique for each item.
  o Partition Key (Hash Attribute).
  o Sort Key (Range Attribute), optional.
[Figure: items mapped by primary key to Partition-1, Partition-2 and Partition-3]

Secondary Indexes
• Defined in addition to the primary key for faster query performance.
• Types:
  o Global Secondary Indexes (GSI)
  o Local Secondary Indexes (LSI)
• Maximum of 5 local secondary indexes per table. Global secondary indexes were originally also limited to 5 per table; the current default quota is 20.
• Indexes are updated automatically when the base table is modified.
• When creating an index, the attributes to be copied (projected) into it should be specified. If none are specified, DynamoDB copies at least the key attributes.
• An index can be thought of as a separate table that sits alongside the base table.

Consistency Model
• DynamoDB data is replicated across multiple AZs in a Region.
• Eventual Consistency
• Strong Consistency
[Figure: Partition-1 with two replicas, Replication-1 and Replication-2]

Capacity Units
• When creating a table, throughput capacity can be provisioned in terms of reads and writes.
• Read Capacity Unit (RCU)
  o Strongly consistent read: one read per second, for an item size up to 4KB.
  o Eventually consistent read: two reads per second, for an item size up to 4KB, OR one read per second for an item size above 4KB and up to 8KB.
• Write Capacity Unit (WCU)
  o One write per second, for an item size up to 1KB.
• Limits (soft): initial limits on provisioned throughput
  o N. Virginia: 40000 RCU and 40000 WCU per table. 80000 RCU and 80000 WCU per account.
  o Other Regions: 10000 RCU and 10000 WCU per table. 20000 RCU and 20000 WCU per account.
• Requests that exceed the provisioned throughput fail with “ProvisionedThroughputExceededException”.

Capacity Units - Example
• Application requirement:
  o 9338 strongly consistent reads per second.
  o 5000 writes per second.
  o Average item size: 10KB.
• How many RCU/WCU should be provisioned when creating this table?
  o RCU needed to read one item per second: 10KB/4KB = 2.5, rounded up to 3.
  o RCU needed to read 9338 items per second: 9338 * 3 = 28014 (only possible in the N. Virginia Region under the initial limits).
  o WCU needed to write one item per second: 10KB/1KB = 10.
  o WCU needed to write 5000 items per second: 5000 * 10 = 50000. This is not possible in any Region until the limit is increased for the account with the help of AWS Support.

Partitions
• A partition is a storage allocation for a table, backed by SSDs.
• Data is distributed using the primary key only. With a composite primary key (partition key and sort key), items with the same partition key are always stored physically close together.
• Initial partitions: a partition can support a maximum of 3000 RCU or 1000 WCU and a data capacity of 10GB.
[Figure: one partition with 3000 RCU, 1000 WCU and 10 GB]

Partitions - Example
• Initial Partitions = (RCU/3000) + (WCU/1000), rounded up to a whole number.
• If the size of a partition goes beyond 10GB, or the provisioned throughput is increased, DynamoDB doubles the current number of partitions.
• Throughput and data are evenly distributed across partitions.
• Rebalancing is done automatically by DynamoDB in the background, without any application impact.
• Example: RCU = 8000, WCU = 1500
  o (8000/3000, rounded up = 3) + (1500/1000, rounded up = 2) = 5 partitions.
  o Each of the 5 partitions receives RCU = 8000/5 = 1600 and WCU = 1500/5 = 300.

DynamoDB API
• DynamoDB operations are performed through the provided APIs:
  o CreateTable: creates a new table. Can be used to create indexes as well.
  o DescribeTable: returns information about a table.
  o ListTables: returns the names of all tables.
  o UpdateTable: modifies the settings of a table or its indexes, creates or removes indexes on a table, or modifies DynamoDB Streams settings for a table.
  o DeleteTable: removes a table and its dependent objects.
  o PutItem: writes a single item; the primary key must be specified.
  o BatchWriteItem: writes up to 25 items in a single call.
  o GetItem: retrieves a single item by its primary key.
  o BatchGetItem: retrieves up to 100 items in a single call.
  o Query: retrieves all of the items that have a specific partition key.
  o Scan: retrieves all of the items in the specified table or index.
  o UpdateItem: modifies one or more attributes of an item when the primary key is provided.
  o DeleteItem: deletes a single item with a specific primary key.
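
The arithmetic from the Capacity Units and Partitions examples, and the way its result feeds into CreateTable, can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names and the table name "Employee" are hypothetical, the partition count rounds each term up as the worked example does (rounding the sum once would also give 5 for that example), and the dictionary only mirrors the CreateTable request parameters without calling AWS.

```python
import math

RCU_READ_SIZE_KB = 4       # one RCU: one strongly consistent read/sec, item up to 4KB
WCU_WRITE_SIZE_KB = 1      # one WCU: one write/sec, item up to 1KB
PARTITION_MAX_RCU = 3000   # per-partition read limit from the Partitions slide
PARTITION_MAX_WCU = 1000   # per-partition write limit from the Partitions slide


def required_rcu(reads_per_sec, item_kb, strongly_consistent=True):
    """RCU needed for the given read rate and average item size."""
    units_per_read = math.ceil(item_kb / RCU_READ_SIZE_KB)
    total = reads_per_sec * units_per_read
    # An eventually consistent read costs half of a strongly consistent one.
    return total if strongly_consistent else math.ceil(total / 2)


def required_wcu(writes_per_sec, item_kb):
    """WCU needed for the given write rate and average item size."""
    return writes_per_sec * math.ceil(item_kb / WCU_WRITE_SIZE_KB)


def initial_partitions(rcu, wcu):
    """Initial partition count; each term is rounded up (assumption, see lead-in)."""
    return math.ceil(rcu / PARTITION_MAX_RCU) + math.ceil(wcu / PARTITION_MAX_WCU)


def create_table_params(table_name, partition_key, rcu, wcu):
    """Build CreateTable request parameters for a table keyed on a numeric attribute."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "N"}
        ],
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": rcu,
            "WriteCapacityUnits": wcu,
        },
    }


if __name__ == "__main__":
    # Capacity Units example: 9338 reads/sec, 5000 writes/sec, 10KB items.
    rcu = required_rcu(9338, 10)
    wcu = required_wcu(5000, 10)
    print(rcu, wcu)  # 28014 50000

    # Partitions example: RCU = 8000, WCU = 1500.
    parts = initial_partitions(8000, 1500)
    print(parts, 8000 // parts, 1500 // parts)  # 5 1600 300

    # "Employee" is a hypothetical table name; EmpID comes from the JSON record.
    params = create_table_params("Employee", "EmpID", rcu, wcu)
    print(params["ProvisionedThroughput"])
```

With Boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("dynamodb").create_table(**params)`; with the values above, the request would be rejected until the account's write limit is raised, as the example notes.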