NoSQL and DynamoDB
Rick Houlihan
Principal Solutions Architect
Amazon Web Services
October 2015
WWW.AWSEDUCATE.COM
What to expect from the session
• Brief history of data processing
• Introduction to NoSQL
• DynamoDB Internals
• Tables, API, data types, indexes
• Scaling and data modeling
• Design patterns and best practices
• Event-driven applications and DynamoDB Streams
• Reference architecture
What is a Database?
“A structured set of data held in a computer, especially one that is accessible in various ways.” - Google
“A database is an organized mechanism for storing, managing and retrieving information.” – About.com
“A place to put stuff my app needs.” – Average Developer
Data Pressure
(Figure: Timeline of Database Technology)
Data Volume
(Figure: Data Volume Since 2010, historical vs. current)
• 90% of stored data was generated in the last 2 years
• 1 terabyte of data in 2010 equals 6.5 petabytes today
• Linear correlation between data pressure and technical innovation
• No reason these trends will not continue over time
Why NoSQL?
SQL | NoSQL
Optimized for storage | Optimized for compute
Normalized/relational | Denormalized/hierarchical
Ad hoc queries | Instantiated views
Scale vertically | Scale horizontally
Good for OLAP | Built for OLTP at scale
The Iron Triangle of Data – All About CAP
Pick two:
• Consistency (C): all clients always have the same view of data
• Availability (A): all clients can always read and write
• Partition tolerance (P): the system works well despite physical network partitions
Data models: Relational, Column Oriented, Document, Key/Value
CA: MSSQL, Oracle, DB2, Postgres, MySQL, Aster Data, Greenplum, Vertica
AP: Voldemort, Cassandra, Tokyo Cabinet, SimpleDB, KAI, CouchDB, Riak
CP: BigTable, Hypertable, HBase, MongoDB, Terrastore, Couchbase, Scalaris, DynamoDB, BerkeleyDB, Memcache, Redis
Partition Management for AP Systems
(Figure: timeline in which operations on state S continue after a partition starts; in partition mode the two sides diverge into states S1 and S2, and partition recovery merges them into state S'.)
SQL vs. NoSQL Access Pattern
Technology Adoption and the Hype Curve
Amazon DynamoDB
Fully Managed NoSQL
Fast and Consistent
Document or Key-Value
Access Control
Scales to Any Workload
Event Driven Programming
Tables, API, Data Types
Table and item API
Admin:
• CreateTable
• UpdateTable
• DeleteTable
• DescribeTable
CRUD:
• PutItem / GetItem
• BatchWriteItem / BatchGetItem
• UpdateItem
• DeleteItem
• Query
• Scan
DynamoDB Streams API:
• ListStreams
• DescribeStream
• GetShardIterator
• GetRecords
Data types
• Scalar: String (S), Number (N), Binary (B), Boolean (BOOL), Null (NULL)
• Set: String Set (SS), Number Set (NS), Binary Set (BS)
• Document: List (L), Map (M), used for storing nested JSON documents
Table
A table holds items; each item holds attributes.
Hash key (mandatory)
• Key-value access pattern
• Determines data distribution
Range key (optional)
• Models 1:N relationships
• Enables rich query capabilities:
  • All items for a hash key
  • ==, <, >, >=, <=
  • "begins with"
  • "between"
  • Sorted results
  • Counts
  • Top/bottom N values
  • Paged responses
Hash table
• Hash key uniquely identifies an item
• Hash key is used for building an unordered hash index
• Table can be partitioned for scale
(Figure: items Id = 1 (Name = Jim), Id = 2 (Name = Andy, Dept = Eng), and Id = 3 (Name = Kim, Dept = Ops) are placed in a key space from 00 to FF by hashing the key: Hash(1) = 7B, Hash(2) = 48, Hash(3) = CD. Partition boundaries fall at 54/55 and A9/AA.)
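As a rough illustration of hash-based placement, the Python sketch below maps a hash key to a position in a 00 to FF key space and then to one of the three partition ranges drawn on the slides (00 to 54, 55 to A9, AA to FF). It assumes MD5 as the hash function purely for illustration: DynamoDB's internal hash function and partition boundaries are not exposed, so the positions it computes will not reproduce the 7B/48/CD values in the figure.

```python
import hashlib

# Inclusive upper bounds of each partition's slice of a 00..FF key space,
# mirroring the ranges drawn on the slides (00-54, 55-A9, AA-FF).
PARTITION_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]


def key_space_position(hash_key: str) -> int:
    """Illustrative stand-in for DynamoDB's private hash: first byte of an MD5 digest."""
    return hashlib.md5(hash_key.encode("utf-8")).digest()[0]


def partition_for_position(position: int) -> int:
    """Return the 1-based partition whose key range contains the position."""
    for number, upper in enumerate(PARTITION_UPPER_BOUNDS, start=1):
        if position <= upper:
            return number
    raise ValueError(f"position {position:#x} is outside the 00..FF key space")


def partition_for(hash_key: str) -> int:
    return partition_for_position(key_space_position(hash_key))


if __name__ == "__main__":
    # The slide's hash values land in the partitions shown in the figure.
    for item_id, slide_hash in [("1", 0x7B), ("2", 0x48), ("3", 0xCD)]:
        print(item_id, partition_for_position(slide_hash), partition_for(item_id))
```

Because the hash spreads keys pseudo-randomly, a table with many distinct hash key values ends up with items spread roughly evenly across partitions.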
Hash-range table
• Hash key and range key together uniquely identify an item
• Within the unordered hash index, data is sorted by the range key
• No limit on the number of items (∞) per hash key, except if you have local secondary indexes
(Figure: orders keyed by Customer# (hash key) and Order# (range key), spread over three partitions of the key space:
Partition 1 (00:0 to 54:∞): Customer# = 2, Hash(2) = 48; Order# = 10, Item = Pen; Order# = 11, Item = Shoes
Partition 2 (55 to A9:∞): Customer# = 1, Hash(1) = 7B; Order# = 10, Item = Toy; Order# = 11, Item = Boots
Partition 3 (AA to FF:∞): Customer# = 3, Hash(3) = CD; Order# = 10, Item = Book; Order# = 11, Item = Paper)
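To make "sorted by range key within an unordered hash index" concrete, here is a toy in-memory model in Python: a dictionary of hash keys (unordered), each holding a list kept sorted by range key, plus a range query that behaves like a "between" key condition. It is a teaching sketch under that simplification, not a description of how DynamoDB stores items; the sample rows are the Customer#/Order# items from the figure.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Toy model: unordered hash keys, each with items sorted by range key."""

    def __init__(self):
        # hash key -> list of (range_key, attributes), kept sorted by range_key
        self._collections = defaultdict(list)

    def put(self, hash_key, range_key, attributes):
        items = self._collections[hash_key]
        keys = [rk for rk, _ in items]
        i = bisect.bisect_left(keys, range_key)
        if i < len(items) and items[i][0] == range_key:
            items[i] = (range_key, attributes)  # same full key: overwrite, like PutItem
        else:
            items.insert(i, (range_key, attributes))

    def query_between(self, hash_key, low, high, descending=False):
        """All items for one hash key with low <= range key <= high, in sorted order."""
        items = self._collections.get(hash_key, [])
        keys = [rk for rk, _ in items]
        matched = items[bisect.bisect_left(keys, low):bisect.bisect_right(keys, high)]
        return matched[::-1] if descending else matched


orders = HashRangeTable()
orders.put(2, 11, {"Item": "Shoes"})
orders.put(2, 10, {"Item": "Pen"})
orders.put(1, 10, {"Item": "Toy"})
print([attrs["Item"] for _, attrs in orders.query_between(2, 10, 11)])
```

The `descending` flag plays the role of Query's `ScanIndexForward=False`: because items are already sorted, both directions are a slice rather than a sort.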
Partitions are three-way replicated
(Figure: each partition (Partition 1, Partition 2, … Partition N) is stored as three identical replicas, Replica 1 to Replica 3; in the example every replica holds the items Id = 1 (Name = Jim), Id = 2 (Name = Andy, Dept = Eng), and Id = 3 (Name = Kim, Dept = Ops).)
Indexes
Local secondary index (LSI)
• Alternate range key attribute
• Index is local to a hash key (or partition)
• LSIs: 10 GB max per hash key, i.e. LSIs limit the # of range keys!
(Figure: a table with A1 (hash), A2 (range), and attributes A3, A4, A5, and three LSIs:
KEYS_ONLY: A1 (hash), A3 (range), A2 (table key)
INCLUDE A3: A1 (hash), A4 (range), A2 (table key), A3 (projected)
ALL: A1 (hash), A5 (range), A2 (table key), A3 and A4 (projected))
Global secondary index (GSI)
• Online indexing
• Alternate hash (+range) key
• Index is across all table hash keys (partitions)
• GSIs: RCUs/WCUs provisioned separately for GSIs
(Figure: a table with A1 (hash) and attributes A2, A3, A4, A5, and three GSIs:
KEYS_ONLY: A2 (hash), A1 (table key)
INCLUDE A3: A5 (hash), A4 (range), A1 (table key), A3 (projected)
ALL: A4 (hash), A5 (range), A1 (table key), A2 and A3 (projected))
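The index shapes above can be expressed as CreateTable parameters. This Python sketch builds the request dictionary for the boto3 low-level client (`client.create_table(**params)`); the table name, index names, string ("S") attribute types, and throughput numbers are hypothetical, and it follows the provisioned-throughput model of this talk, in which each GSI carries its own `ProvisionedThroughput`. LSIs can only be defined when a table is created, whereas GSIs can also be added to an existing table later (the "online indexing" point).

```python
def create_table_params(table_name: str = "ExampleTable") -> dict:
    """CreateTable request with one LSI and one GSI (hypothetical names and sizes)."""

    def key_schema(hash_attr, range_attr=None):
        schema = [{"AttributeName": hash_attr, "KeyType": "HASH"}]
        if range_attr:
            schema.append({"AttributeName": range_attr, "KeyType": "RANGE"})
        return schema

    return {
        "TableName": table_name,
        # Only attributes used in a table or index key schema are declared.
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": "S"} for name in ("A1", "A2", "A4", "A5")
        ],
        "KeySchema": key_schema("A1", "A2"),
        "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
        # LSI: same hash key as the table (A1), alternate range key (A4), projects A3.
        "LocalSecondaryIndexes": [{
            "IndexName": "A1-A4-LSI",
            "KeySchema": key_schema("A1", "A4"),
            "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["A3"]},
        }],
        # GSI: different hash (+range) key and its own provisioned throughput.
        "GlobalSecondaryIndexes": [{
            "IndexName": "A5-A4-GSI",
            "KeySchema": key_schema("A5", "A4"),
            "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["A3"]},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }],
    }
```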
How do GSI updates work?
(Figure: the client writes to the primary table; the table then applies the change to the global secondary index as an asynchronous update.)
If GSIs don’t have enough write capacity, table writes will be throttled!
LSI or GSI?
• LSI can be modeled as a GSI
• If data size in an item collection > 10 GB, use GSI
• If eventual consistency is okay for your scenario, use GSI!
Scaling
Throughput
• Provision any amount of throughput to a table
Size
• Add any number of items to a table
• Max item size is 400 KB
• LSIs limit the number of range keys due to 10 GB limit
Scaling is achieved through partitioning
Throughput
Provisioned at the table level
• Write capacity units (WCUs): one WCU is one write per second for an item up to 1 KB
• Read capacity units (RCUs): one RCU is one strongly consistent read per second for an item up to 4 KB
• Eventually consistent reads cost half as much as strongly consistent reads
Read and write throughput limits are independent
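The unit definitions translate directly into a sizing calculation. This Python sketch assumes a steady request rate and uses 1 KB = 1,024 bytes and 4 KB = 4,096 bytes; item sizes round up to the next unit, and the eventually consistent figure is rounded up to a whole unit because capacity is provisioned in whole units.

```python
import math


def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """WCUs for a steady write rate: each write consumes one unit per 1 KB, rounded up."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second


def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs for a steady read rate: one unit per 4 KB per strongly consistent read;
    eventually consistent reads cost half (rounded up to a whole unit)."""
    units = math.ceil(item_size_bytes / 4096) * reads_per_second
    return units if strongly_consistent else math.ceil(units / 2)


print(write_capacity_units(1500, 100))        # 2 KB per write after rounding -> 200
print(read_capacity_units(6000, 100))         # 8 KB per read after rounding -> 200
print(read_capacity_units(6000, 100, False))  # half of 200 -> 100
```

Because reads and writes are sized independently, a write-heavy table can carry a large WCU figure with a small RCU figure, or the reverse.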
What causes throttling?
If sustained throughput goes beyond provisioned throughput per partition
Non-uniform workloads
• Hot keys/hot partitions
• Very large bursts
Mixing hot data with cold data
• Use a table per time period
Overly partitioned tables
• If sustained throughput > partition limit, DynamoDB may throttle requests
• Solution: Increase provisioned throughput or restructure data
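Throttling per partition follows from the fact that a table's provisioned throughput is divided across its partitions. The Python sketch below uses the rule of thumb that the DynamoDB Developer Guide published around the time of this talk: roughly 3,000 RCUs or 1,000 WCUs and 10 GB per partition. Treat those constants as historical assumptions; partitioning is internal to the service and not part of its API.

```python
import math

# Historical rule-of-thumb constants (assumption; not guaranteed by the service).
RCU_PER_PARTITION = 3000
WCU_PER_PARTITION = 1000
GB_PER_PARTITION = 10


def estimated_partitions(rcu: int, wcu: int, table_size_gb: float) -> int:
    """Partition count implied by throughput or by size, whichever is larger."""
    by_throughput = math.ceil(rcu / RCU_PER_PARTITION + wcu / WCU_PER_PARTITION)
    by_size = math.ceil(table_size_gb / GB_PER_PARTITION)
    return max(by_throughput, by_size, 1)


def per_partition_throughput(rcu: int, wcu: int, table_size_gb: float):
    """Provisioned throughput divided evenly: the most a single hot key can use."""
    n = estimated_partitions(rcu, wcu, table_size_gb)
    return rcu / n, wcu / n


print(estimated_partitions(5000, 500, 8))      # throughput dominates -> 3
print(per_partition_throughput(5000, 500, 8))  # about 1666.7 RCUs and 166.7 WCUs
```

Under these assumptions, a table provisioned for 5,000 RCUs can still throttle a single hot key at around 1,667 RCUs, which is why uniform key access matters more than the table total.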
What bad NoSQL looks like…
(Figure: heat map of request traffic by partition over time.)
Getting the most out of DynamoDB throughput
“To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.”
– DynamoDB Developer Guide
Space: access is evenly spread over the key space
Time: requests arrive evenly spaced in time
Much better picture…
A global leader in retargeting
More than 10,000 active advertisers in >100 countries
• Provisioned for over 1M transactions per second
• 4 regions in use with live traffic replication
• 120B+ key fetches worldwide per day (RTB)
• 1.5 TB of data stored per region
• 30B+ items stored in each region
• <3 ms uniform query latency, <10 ms at the 99.95th percentile
(Figure: timeline from 2009 to 2013 with the milestones: simple video monitoring & security cameras; move to AWS; fast growth, “suddenly petabytes”; more inbound video than YouTube; switch to DynamoDB.)
DynamoDB reduces delivery time for video events from 5-10 seconds to 50 milliseconds
Online Gaming
“DynamoDB came along at just the right time, and Halfbrick switched to storing our game data in DynamoDB, which alleviated our scaling problems while also freeing us from the burden of managing all the underlying hardware and software. We love that DynamoDB handles so much of the management for us, freeing us to focus on development.”
Data modeling
1:1 relationships or key-values
• Use a table or GSI with a hash key
• Use GetItem or BatchGetItem API
Example: Given an SSN or license number, get attributes
Users Table
Hash key | Attributes
SSN = 123-45-6789 | Email = johndoe@nowhere.com, License = TDL25478134
SSN = 987-65-4321 | Email = maryfowler@somewhere.com, License = TDL78309234
Users-Email-GSI
Hash key | Attributes
License = TDL78309234 | Email = maryfowler@somewhere.com, SSN = 987-65-4321
License = TDL25478134 | Email = johndoe@nowhere.com, SSN = 123-45-6789
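The 1:1 lookups can be written as request dictionaries for the boto3 low-level client (`get_item`, `batch_get_item`, `query`). One detail the slide glosses over: GetItem and BatchGetItem address a table's primary key only, so a lookup through the GSI is expressed as a Query on the index's hash key. The table name "Users" is an assumption taken from the slide label; the index name is copied from the slide.

```python
def get_user_by_ssn(ssn: str) -> dict:
    """GetItem on the base table: the hash key uniquely identifies one item."""
    return {"TableName": "Users", "Key": {"SSN": {"S": ssn}}}


def get_user_by_license(license_no: str) -> dict:
    """GetItem cannot target an index, so the GSI lookup is a Query on its hash key."""
    return {
        "TableName": "Users",
        "IndexName": "Users-Email-GSI",
        "KeyConditionExpression": "#lic = :lic",
        "ExpressionAttributeNames": {"#lic": "License"},  # alias avoids reserved-word clashes
        "ExpressionAttributeValues": {":lic": {"S": license_no}},
    }


def batch_get_users(ssns: list) -> dict:
    """BatchGetItem accepts up to 100 keys per request."""
    if len(ssns) > 100:
        raise ValueError("split the keys into batches of at most 100")
    return {"RequestItems": {"Users": {"Keys": [{"SSN": {"S": s}} for s in ssns]}}}
```

Typical use would be `client.get_item(**get_user_by_ssn("123-45-6789"))` with a boto3 DynamoDB client.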
1:N relationships or parent-children
• Use a table or GSI with hash and range key
• Use Query API
Example:
• Given a device, find all readings between epoch X, Y
Device-measurements
Hash key | Range key | Attributes
DeviceId = 1 | epoch = 5513A97C | Temperature = 30, pressure = 90
DeviceId = 1 | epoch = 5513A9DB | Temperature = 30, pressure = 90
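"All readings between epoch X and Y" maps onto a Query with a BETWEEN key condition on the range key. This Python sketch builds the boto3 request; it assumes DeviceId is a Number and that epoch is stored as a fixed-width uppercase hex String, as the sample values suggest, so that string order matches numeric order. Storing epoch as a Number would remove that assumption.

```python
def readings_between(device_id: int, start_epoch_hex: str, end_epoch_hex: str) -> dict:
    """Query all readings for one device within an inclusive epoch range."""
    return {
        "TableName": "Device-measurements",
        "KeyConditionExpression": "DeviceId = :d AND #ts BETWEEN :x AND :y",
        "ExpressionAttributeNames": {"#ts": "epoch"},  # alias the range key attribute
        "ExpressionAttributeValues": {
            ":d": {"N": str(device_id)},
            ":x": {"S": start_epoch_hex},
            ":y": {"S": end_epoch_hex},
        },
    }


params = readings_between(1, "5513A97C", "5513A9DB")
print(params["KeyConditionExpression"])
```

Results come back sorted by epoch, and Query pages them automatically when they exceed one response.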
N:M relationships
• Use a table and GSI with hash and range key elements
• Use Query API
Example: Given a user, find all games. Or given a game, find all users.
User-Games-Table
Hash key | Range key
UserId = bob | GameId = Game1
UserId = fred | GameId = Game2
UserId = bob | GameId = Game3
Game-Users-GSI
Hash key | Range key
GameId = Game1 | UserId = bob
GameId = Game2 | UserId = fred
GameId = Game3 | UserId = bob
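Both directions of the N:M relationship are Queries against the same data, one on the table and one on the inverted GSI. The Python sketch below builds the two boto3 request dictionaries using the names from the slide, then checks against the sample rows what each query would find.

```python
def games_for_user(user_id: str) -> dict:
    """Query the base table: hash key UserId, range key GameId."""
    return {
        "TableName": "User-Games-Table",
        "KeyConditionExpression": "UserId = :u",
        "ExpressionAttributeValues": {":u": {"S": user_id}},
    }


def users_for_game(game_id: str) -> dict:
    """Query the inverted GSI: hash key GameId, range key UserId."""
    return {
        "TableName": "User-Games-Table",
        "IndexName": "Game-Users-GSI",
        "KeyConditionExpression": "GameId = :g",
        "ExpressionAttributeValues": {":g": {"S": game_id}},
    }


# In-memory check with the sample rows: (UserId, GameId)
ROWS = [("bob", "Game1"), ("fred", "Game2"), ("bob", "Game3")]
print(sorted(g for u, g in ROWS if u == "bob"))    # what games_for_user("bob") finds
print(sorted(u for u, g in ROWS if g == "Game2"))  # what users_for_game("Game2") finds
```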
Documents (JSON)
• New data types (M, L, BOOL, NULL) introduced to support JSON
• Document SDKs
  • Simple programming model
  • Conversion to/from JSON
  • Java, JavaScript, Ruby, .NET
• Cannot index (S, N) elements of a JSON object stored in M
  • Only top-level table attributes can be used in LSIs and GSIs without Streams/Lambda
JavaScript | DynamoDB
string | S
number | N
boolean | BOOL
null | NULL
array | L
object | M
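The type mapping above is what a document SDK applies when it converts JSON into DynamoDB's typed attribute values. This Python sketch applies the same mapping to Python values; it covers only the JSON types in the table (not sets or binary). In practice, boto3 ships a `TypeSerializer` class in `boto3.dynamodb.types` for this purpose, and it rejects Python floats in favor of `Decimal`.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a JSON-like Python value to DynamoDB's typed attribute-value form."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # numbers are sent as strings on the wire
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_attribute_value({"Replies": 3, "Tags": ["a", None], "Open": True}))
```

Nested values stay inside the top-level M attribute, which is why, as the slide notes, elements inside a map cannot be used as index keys.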
Rich expressions
Projection expression
• Query/Get/Scan: ProductReviews.FiveStar[0]
Filter expression
• Query/Scan: #VIEWS > :num
Condition expression
• Put/Update/DeleteItem: attribute_not_exists(#pr.FiveStar)
Update expression
• UpdateItem: set Replies = Replies + :num
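Each expression on the slide is a parameter of a specific API call. The Python sketch below places them in boto3 request dictionaries; the table names (ProductCatalog, Thread), their key attributes (Id, ForumName, Subject), and "Views" as the attribute behind `#VIEWS` are hypothetical choices for illustration.

```python
def first_five_star_review(product_id: int) -> dict:
    """GetItem that returns only the first element of a nested list (projection)."""
    return {
        "TableName": "ProductCatalog",
        "Key": {"Id": {"N": str(product_id)}},
        "ProjectionExpression": "ProductReviews.FiveStar[0]",
    }


def scan_threads_with_views_over(num: int) -> dict:
    """Scan with a filter. Filters run after items are read, so every evaluated
    item still consumes read capacity."""
    return {
        "TableName": "Thread",
        "FilterExpression": "#VIEWS > :num",
        "ExpressionAttributeNames": {"#VIEWS": "Views"},
        "ExpressionAttributeValues": {":num": {"N": str(num)}},
    }


def put_product_if_no_five_star(item: dict) -> dict:
    """PutItem guarded by a condition; fails if ProductReviews.FiveStar already exists."""
    return {
        "TableName": "ProductCatalog",
        "Item": item,
        "ConditionExpression": "attribute_not_exists(#pr.FiveStar)",
        "ExpressionAttributeNames": {"#pr": "ProductReviews"},
    }


def add_replies(forum_name: str, subject: str, num: int) -> dict:
    """UpdateItem with an update expression (errors if Replies does not exist yet)."""
    return {
        "TableName": "Thread",
        "Key": {"ForumName": {"S": forum_name}, "Subject": {"S": subject}},
        "UpdateExpression": "SET Replies = Replies + :num",
        "ExpressionAttributeValues": {":num": {"N": str(num)}},
    }
```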
Scenarios and best practices
Event logging
Storing time series data
Current table (hot data)
Events_table_2015_April: RCUs = 10000, WCUs = 10000
Older tables (cold data)
Events_table_2015_March: RCUs = 1000, WCUs = 100
Events_table_2015_February: RCUs = 100, WCUs = 1
Events_table_2015_January: RCUs = 10, WCUs = 1
Each table has the schema: Event_id (hash key), Timestamp (range key), Attribute1 …. Attribute N
Don’t mix hot and cold data; archive cold data to Amazon S3
Time series tables
Use a table per time period
Pre-create daily, weekly, monthly tables
Provision required throughput for current table
Writes go to the current table
Turn off (or reduce) throughput for older tables
Dealing with time series data
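A table-per-period scheme needs two small pieces of logic: naming the table for a given month and deciding how much throughput each table gets. This Python sketch copies the naming pattern and throughput tiers from the example above (current, then 1, 2, and 3+ months old). The tier values are illustrative, and giving the pre-created next-month table current-table throughput is an assumption made so writes can switch over at the month boundary.

```python
from datetime import date

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

# (RCUs, WCUs) by table age in months, copied from the example: 0, 1, 2, 3+.
TIERS = [(10000, 10000), (1000, 100), (100, 1), (10, 1)]


def table_name(year: int, month: int) -> str:
    return f"Events_table_{year}_{MONTH_NAMES[month - 1]}"


def provisioning_plan(today: date, months_back: int = 4) -> dict:
    """Map table name -> (RCUs, WCUs) for next month's pre-created table,
    the current table, and older tables."""
    plan = {}
    current = today.year * 12 + (today.month - 1)  # months since year 0
    for age in range(-1, months_back):              # age -1 = pre-created next table
        year, month_index = divmod(current - age, 12)
        tier = TIERS[min(max(age, 0), len(TIERS) - 1)]
        plan[table_name(year, month_index + 1)] = tier
    return plan


for name, (rcu, wcu) in provisioning_plan(date(2015, 4, 15)).items():
    print(name, rcu, wcu)
```

Each entry could then be applied with UpdateTable; reducing an older table's throughput is what makes the cold tables cheap.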
Product catalog
Popular items (read)
Scaling bottlenecks
SELECT Id, Description, ...
FROM ProductCatalog
WHERE Id='POPULAR_PRODUCT'
(Figure: shoppers’ requests for popular products (Product A, Product B) concentrate on the few partitions of the ProductCatalog table that hold them, out of Partition 1 … Partition K … Partition M … Partition 50, each with 2000 RCUs. Chart: requests per second by item primary key, showing the DynamoDB request distribution per hash key.)
Cache popular items
SELECT Id, Description, ...
FROM ProductCatalog
WHERE Id='POPULAR_PRODUCT'
(Figure: users read through a cache in front of the DynamoDB ProductCatalog table (Partition 1, Partition 2). Chart: requests per second by item primary key, showing cache hits alongside the remaining DynamoDB requests per hash key.)
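A cache absorbs the repeated reads of a popular item so that only misses reach the hot partition. This Python sketch shows a minimal read-through cache with a time-to-live; the `fetch` callable is a hypothetical stand-in for a GetItem call, and an in-process dictionary stands in for a shared cache such as ElastiCache.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Serve repeated reads from memory; call `fetch` only on a miss or after the TTL."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, value)

    def get(self, product_id: str) -> Any:
        now = self._clock()
        hit = self._entries.get(product_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                   # cache hit: no DynamoDB read
        value = self._fetch(product_id)     # cache miss: one DynamoDB read
        self._entries[product_id] = (now, value)
        return value
```

With a 60-second TTL, a product read thousands of times per second costs the table about one read per minute; the trade-off is that readers may see data up to one TTL old.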
Messaging app
Large items
Filters vs. indexes
M:N Modeling—inbox and outbox
(Figure: David uses the Messages app; its Inbox and Outbox both read from the Messages table.)
Inbox
SELECT *
FROM Messages
WHERE Recipient='David'
ORDER BY Date DESC
LIMIT 50
Outbox
SELECT *
FROM Messages
WHERE Sender='David'
ORDER BY Date DESC
LIMIT 50
Large and small attributes mixed
Messages Table (Inbox)
Recipient (hash key) | Date (range key) | Sender | Message
David | 2014-10-02 | Bob | …
… 48 more messages for David …
David | 2014-10-03 | Alice | …
Alice | 2014-09-28 | Bob | …
Alice | 2014-10-01 | Carol | …
(Many more messages)
SELECT *
FROM Messages
WHERE Recipient='David'
ORDER BY Date DESC
LIMIT 50
50 items × 256 KB each: large message bodies, attachments
Computing inbox query cost
50 (items evaluated by query) × 256 KB (average item size) × (1 RCU / 4 KB) (conversion ratio) × 1/2 (eventually consistent reads) = 1600 RCU
Separate the bulk data
1. Query Inbox-GSI: 1 RCU (50 sequential items at 128 bytes)
2. BatchGetItem Messages: 1600 RCU (50 separate items at 256 KB)
Uniformly distributes large item reads
Inbox-GSI
Recipient | Date | Sender | Subject | MsgId
David | 2014-10-02 | Bob | Hi!… | afed
David | 2014-10-03 | Alice | RE: The… | 3kf8
Alice | 2014-09-28 | Bob | FW: Ok… | 9d2b
Alice | 2014-10-01 | Carol | Hi!... | ct7r
Messages Table
MsgId | Body
9d2b | …
3kf8 | …
ct7r | …
afed | …
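The RCU figures on these slides follow from two read-capacity rules in the DynamoDB documentation: a Query sums the sizes of the items it evaluates and rounds the total up to the next 4 KB, while BatchGetItem rounds each item up to 4 KB individually. Eventually consistent reads (the only kind a GSI supports) cost half. This Python sketch reproduces the numbers.

```python
import math

KB = 1024


def query_rcus(item_sizes, eventually_consistent=True):
    """Query: total size of evaluated items, rounded up to 4 KB as a whole."""
    units = math.ceil(sum(item_sizes) / (4 * KB))
    return units / 2 if eventually_consistent else units


def batch_get_rcus(item_sizes, eventually_consistent=True):
    """BatchGetItem: each item is rounded up to 4 KB on its own."""
    units = sum(math.ceil(size / (4 * KB)) for size in item_sizes)
    return units / 2 if eventually_consistent else units


print(query_rcus([256 * KB] * 50))      # original inbox query: 1600.0
print(query_rcus([128] * 50))           # Inbox-GSI query: 1.0
print(batch_get_rcus([256 * KB] * 50))  # fetching the bodies: 1600.0
```

Listing the inbox now costs 1 RCU, and the 1600 RCU is paid only when bodies are actually fetched, spread across many hash keys instead of one.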
Inbox GSI
Define which attributes to copy into the index
Outbox GSI
Outbox: index keyed on Sender
SELECT *
FROM Messages
WHERE Sender='David'
ORDER BY Date DESC
LIMIT 50
Messaging app
(Figure: David’s Inbox and Outbox are served by an Inbox global secondary index and an Outbox global secondary index on the Messages table.)
• Distribute large items: querying many large items at once
• Reduce one-to-many item sizes
• Configure secondary index projections
• Use GSIs to model M:N relationship between sender and recipient
Multiplayer online gaming
Query filters vs.
composite key indexes
Common game back-end concepts
Think in terms of APIs
HTTP + JSON
Get friends, leaderboard
Binary asset data
Multiplayer servers
High availability
Scalability
Core (HA) game back end
• Choose region
• >=2 Availability Zones
• Amazon EC2 for app
• Elastic Load Balancing
• Amazon RDS database
  • Multi-AZ
Scale it way out
• Amazon S3 for game data
  • Assets
  • UGC
  • Analytics
  • ... with Amazon CloudFront (CDN)!
• Auto Scaling group
  • Capacity on demand
  • Respond to users
  • Automatic healing
• Amazon ElastiCache
  • Memcached
  • Redis
Writing is painful
• Games are write heavy
• Caching of limited use
• Key value
• Binary structures
• Database = bottleneck
Sharding (not fun)
Amazon DynamoDB
• Fully managed
• NoSQL data store
• Provisioned throughput
• Secondary indexes
• PUT/GET keys
• Document support!
Example: Leaderboard in DynamoDB
UserID (hash key) | BoardName (range key) | TopScore | TopScoreDate
"101" | "Galaxy Invaders" | 5842 | "2014-09-15T17:24:31"
"101" | "Meteor Blasters" | 1000 | "2014-10-22T23:18:01"
"101" | "Starship X" | 24 | "2014-08-31T13:14:21"
"102" | "Alien Adventure" | 192 | "2014-07-12T11:07:56"
"102" | "Galaxy Invaders" | 0 | "2014-09-18T07:33:42"
"103" | "Attack Ships" | 3 | "2014-10-19T01:13:24"
"103" | "Galaxy Invaders" | 2317 | "2014-09-11T06:53:00"
"103" | "Meteor Blasters" | 723 | "2014-10-19T01:14:24"
"103" | "Starship X" | 42 | "2014-07-11T06:53:03"
• Hash key = Primary key
• Range key = Sort key
• Other attributes are undefined
• So… how to sort based on top score?
Leaderboard with secondary indexes
Table: UserID (hash key), BoardName (range key), TopScore, TopScoreDate
Secondary index:
BoardName (hash key) | TopScore (range key) | UserID
"Alien Adventure" | 192 | "102"
"Attack Ships" | 3 | "103"
"Galaxy Invaders" | 0 | "102"
"Galaxy Invaders" | 2317 | "103"
"Galaxy Invaders" | 5842 | "101"
"Meteor Blasters" | 723 | "103"
"Meteor Blasters" | 1000 | "101"
"Starship X" | 24 | "101"
"Starship X" | 42 | "103"
• Create a secondary index!
• Set hash key to BoardName
• Set range key to TopScore
• Project extra attributes as needed
• Can now query by BoardName, sorted by TopScore
• Handles many common gaming use cases
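A "top N for a board" request is a Query on the BoardName/TopScore index with descending order and a limit. This Python sketch builds the boto3 request (the table and index names are hypothetical) and simulates the result in memory with the sample rows. TopScore must be a Number (N) attribute for the index to sort numerically.

```python
def top_scores_query(board_name: str, n: int = 10) -> dict:
    """Query the BoardName/TopScore index, highest scores first."""
    return {
        "TableName": "Leaderboard",               # hypothetical table name
        "IndexName": "BoardName-TopScore-index",  # hypothetical index name
        "KeyConditionExpression": "BoardName = :board",
        "ExpressionAttributeValues": {":board": {"S": board_name}},
        "ScanIndexForward": False,  # descending order of the range key (TopScore)
        "Limit": n,
    }


# Sample rows from the slide: (UserID, BoardName, TopScore)
ROWS = [
    ("101", "Galaxy Invaders", 5842), ("101", "Meteor Blasters", 1000),
    ("101", "Starship X", 24), ("102", "Alien Adventure", 192),
    ("102", "Galaxy Invaders", 0), ("103", "Attack Ships", 3),
    ("103", "Galaxy Invaders", 2317), ("103", "Meteor Blasters", 723),
    ("103", "Starship X", 42),
]


def local_top_scores(board_name: str, n: int = 10):
    """In-memory simulation of what the index query returns."""
    matches = [(user, score) for user, board, score in ROWS if board == board_name]
    return sorted(matches, key=lambda row: row[1], reverse=True)[:n]


print(local_top_scores("Galaxy Invaders", 2))
```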
Sparse indexes
Scan sparse hash GSIs
Game-scores-table
Id (hash) | User | Game | Score | Date | Award
1 | Bob | G1 | 1300 | 2012-12-23 |
2 | Bob | G1 | 1450 | 2012-12-23 |
3 | Jay | G1 | 1600 | 2012-12-24 |
4 | Mary | G1 | 2000 | 2012-10-24 | Champ
5 | Ryan | G2 | 123 | 2012-03-10 |
6 | Jones | G2 | 345 | 2012-03-20 |
Award-GSI
Award (hash) | Id | User | Score
Champ | 4 | Mary | 2000
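An index is sparse because an item is written to a GSI only if it has the index's key attribute. This Python sketch simulates that rule with the Game-scores-table rows: only item 4 carries Award, so the Award-GSI holds a single entry and scanning it is cheap no matter how large the table grows.

```python
GAME_SCORES = [
    {"Id": 1, "User": "Bob", "Game": "G1", "Score": 1300, "Date": "2012-12-23"},
    {"Id": 2, "User": "Bob", "Game": "G1", "Score": 1450, "Date": "2012-12-23"},
    {"Id": 3, "User": "Jay", "Game": "G1", "Score": 1600, "Date": "2012-12-24"},
    {"Id": 4, "User": "Mary", "Game": "G1", "Score": 2000, "Date": "2012-10-24",
     "Award": "Champ"},
    {"Id": 5, "User": "Ryan", "Game": "G2", "Score": 123, "Date": "2012-03-10"},
    {"Id": 6, "User": "Jones", "Game": "G2", "Score": 345, "Date": "2012-03-20"},
]


def sparse_index(items, index_hash_key, projected):
    """Keep only items that have the index key; copy the key plus projected attributes."""
    return [
        {name: item[name] for name in [index_hash_key, *projected]}
        for item in items
        if index_hash_key in item
    ]


print(sparse_index(GAME_SCORES, "Award", ["Id", "User", "Score"]))
```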
Real-Time voting
Write-heavy items
Requirements for voting
Allow each person to vote only once
No changing votes
Real-time aggregation
Voter analytics, demographics
Real-time voting architecture
(Figure: voters use the voting app, which writes to the RawVotes table and the AggregateVotes table.)
Scaling bottlenecks
(Figure: the Votes table is provisioned with 200,000 WCUs spread across Partition 1 … Partition K … Partition M … Partition N at 1000 WCUs each, yet every vote for Candidate A or Candidate B lands on the single partition holding that candidate’s key.)
Write sharding
(Figure: each voter’s write goes to one of several shard keys per candidate, Candidate A_1 … Candidate A_8 and Candidate B_1 … Candidate B_8, spread across the Votes table.)
UpdateItem: “CandidateA_” + rand(0, 10)
ADD 1 to Votes
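The sharded write can be expressed as an UpdateItem with an atomic ADD on a randomly suffixed key. This Python sketch assumes ten shards with suffixes 0 to 9 (one reading of rand(0, 10)), a table named Votes, and a hypothetical hash key attribute named Segment; the random source is injectable so the behavior can be tested.

```python
import random

NUM_SHARDS = 10  # assumption: rand(0, 10) read as ten shards, suffixes 0-9


def vote_update(candidate: str, rng=random) -> dict:
    """UpdateItem that adds one vote to a randomly chosen shard of the candidate's counter."""
    shard = rng.randrange(NUM_SHARDS)
    return {
        "TableName": "Votes",
        "Key": {"Segment": {"S": f"{candidate}_{shard}"}},  # hypothetical key attribute
        "UpdateExpression": "ADD Votes :one",  # atomic increment; creates Votes if absent
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }


print(vote_update("CandidateA")["Key"])
```

Because the shard keys hash to different partitions, one candidate's write load can use many partitions' worth of WCUs instead of one.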
Shard aggregation
(Figure: a periodic process 1. sums the shard counters Candidate A_1 … Candidate A_8 in the Votes table and 2. stores the result, Candidate A total: 2.5M.)
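The periodic aggregation step reads every shard of a candidate and sums them. This Python sketch shows both halves in memory: listing the shard keys a reader must fetch (for example with BatchGetItem) and summing per-shard counters into per-candidate totals. Shard numbering from 0 is an assumption. The sample input uses the Segment/Votes values shown in the AggregateVotes table.

```python
from collections import defaultdict


def shard_keys(candidate: str, num_shards: int) -> list:
    """Every shard key a reader must fetch to total one candidate."""
    return [f"{candidate}_{i}" for i in range(num_shards)]


def aggregate_shards(shard_counts: dict) -> dict:
    """Sum per-shard counters such as {"A_1": 23, "A_2": 25} into per-candidate totals."""
    totals = defaultdict(int)
    for segment, votes in shard_counts.items():
        candidate, _, _shard = segment.rpartition("_")
        totals[candidate] += votes
    return dict(totals)


print(aggregate_shards({"A_1": 23, "B_2": 12, "B_1": 14, "A_2": 25}))
```

This is the read-cost side of the trade-off: one logical read of a total becomes one read per shard.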
• Shard write-heavy hash keys
• Trade off read cost for write scalability
• Consider throughput per hash key and per partition
Use when your write workload is not horizontally scalable
Correctness in voting
1. Record vote and de-dupe; retry
2. Increment candidate counter
RawVotes Table
UserId | Candidate | Date
Alice | A | 2013-10-02
Bob | B | 2013-10-02
Eve | B | 2013-10-02
Chuck | A | 2013-10-02
AggregateVotes Table
Segment | Votes
A_1 | 23
B_2 | 12
B_1 | 14
A_2 | 25
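Step 1 (record and de-dupe) maps onto a conditional PutItem: `attribute_not_exists` on the hash key lets only a voter's first vote in. This Python sketch builds that request (storing Date as a string is an assumption) and adds an in-memory stand-in with the same first-write-wins behavior.

```python
def record_vote(user_id: str, candidate: str, vote_date: str) -> dict:
    """PutItem for RawVotes that succeeds only for a voter's first vote."""
    return {
        "TableName": "RawVotes",
        "Item": {
            "UserId": {"S": user_id},
            "Candidate": {"S": candidate},
            "Date": {"S": vote_date},
        },
        # A repeat vote by the same user fails with ConditionalCheckFailedException.
        "ConditionExpression": "attribute_not_exists(UserId)",
    }


class InMemoryRawVotes:
    """Local stand-in showing the same first-write-wins semantics."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, user_id: str, candidate: str) -> bool:
        if user_id in self.items:
            return False  # duplicate or changed vote: rejected
        self.items[user_id] = candidate
        return True
```

The condition makes retrying step 1 safe, but it leaves a gap: if the first attempt succeeded and only its response was lost, the retry fails the condition and the app cannot tell whether step 2 still needs to run. That is the correctness question the deck raises next.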
Correctness in aggregation?
DynamoDB Streams
Stream of updates to a table
Asynchronous
Exactly once
Strictly ordered
• Per item
Highly durable
• Scale with table
24-hour lifetime
Sub-second latency
View types
Example: UpdateItem (Name = John, Destination = Pluto) on an item where Destination = Mars
View type | Stream record contents
Old image (before update) | Name = John, Destination = Mars
New image (after update) | Name = John, Destination = Pluto
Old and new images | Name = John, Destination = Mars; Name = John, Destination = Pluto
Keys only | Name = John
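The view type chosen when a stream is enabled decides which images each record carries. This Python sketch reproduces the table above; the four view-type names are the values DynamoDB's StreamViewType setting accepts, but the record shape is simplified (real records nest these fields under a "dynamodb" key and use typed values such as {"S": "John"}).

```python
VIEW_TYPES = ("KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES")


def stream_record(view_type: str, key_names, old_image: dict, new_image: dict) -> dict:
    """Simplified stream record for an update: keys always, images per view type."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown view type: {view_type}")
    record = {"Keys": {k: new_image.get(k, old_image.get(k)) for k in key_names}}
    if view_type in ("OLD_IMAGE", "NEW_AND_OLD_IMAGES"):
        record["OldImage"] = old_image
    if view_type in ("NEW_IMAGE", "NEW_AND_OLD_IMAGES"):
        record["NewImage"] = new_image
    return record


old = {"Name": "John", "Destination": "Mars"}
new = {"Name": "John", "Destination": "Pluto"}
for view in VIEW_TYPES:
    print(view, stream_record(view, ["Name"], old, new))
```

Richer view types make consumers simpler (they can diff old and new without reading the table) at the cost of larger stream records.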
DynamoDB Streams and Amazon Kinesis Client Library
(Figure: a client application sends updates to a DynamoDB table; the table’s partitions (Partition 1 to Partition 5) feed the table’s stream, whose shards (Shard 1 to Shard 4) are each processed by a KCL worker in an Amazon Kinesis Client Library application.)
Cross-region replication
(Figure: updates to a table in US East (N. Virginia) are read from DynamoDB Streams by the open source Cross-Region Replication Library and applied to replicas in Asia Pacific (Sydney) and EU (Ireland).)
DynamoDB Streams and AWS Lambda
Triggers
Real-time voting architecture (improved)
(Figure: voters use the voting app, which writes to the RawVotes table; the RawVotes DynamoDB stream feeds downstream consumers: the AggregateVotes table, your Amazon Kinesis-enabled app, Amazon Redshift, and Amazon EMR.)
• Handle any scale of election
• Vote only once, no changing votes
• Real-time, fault-tolerant, scalable aggregation
• Voter analytics, statistics
Analytics with DynamoDB Streams
• Collect and de-dupe data in DynamoDB
• Aggregate data in-memory and flush periodically
Performing real-time aggregation and analytics
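"Aggregate in memory and flush periodically" can be sketched as an AWS Lambda handler attached to the RawVotes stream. The event shape used here (a "Records" list whose entries carry "eventName" and a "dynamodb" object with a typed "NewImage") is the one Lambda delivers for DynamoDB stream triggers, and it requires a view type that includes new images. The flush itself (for example an UpdateItem ADD into AggregateVotes) is left out; the returned dictionary stands in for it.

```python
from collections import Counter


def handler(event, context=None):
    """Count new votes per candidate for one batch of RawVotes stream records."""
    counts = Counter()
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # votes are never modified in this design; skip other events
        new_image = record["dynamodb"]["NewImage"]
        counts[new_image["Candidate"]["S"]] += 1
    return dict(counts)  # stand-in for flushing totals to AggregateVotes
```

If the handler raises, Lambda retries the whole batch, so the flush step has to tolerate a batch being applied more than once.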
Architecture
Reference Architecture
Elastic Event-Driven Applications
Thank You