Engineering Scalable
Social Games
Dr. Robert Zubek
Top social game developer
Growth Example
Roller Coaster Kingdom
1M DAUs
over a weekend
source: developeranalytics.com
Growth Example
FishVille
6M DAUs
in its first week
source: developeranalytics.com
Growth Example
FarmVille
25M DAUs
over five months
source: developeranalytics.com
Talk Overview
Introduce game developers to best
practices for large-scale web development
Part I. Game Architectures
Part II. Scaling Solutions
Part I. Game Architectures
• server approaches
• client approaches
Three Major Types
• Web server stack + HTML client
• Web server stack + Flash client
• Web + MMO stack + Flash client
Server Side

Web stack:
• Usually based on a LAMP stack
• Game logic in PHP
• HTTP communication

Mixed stack:
• Game logic in MMO server (e.g. Java)
• Web stack for everything else
Example 1: FishVille
(diagram: Web Servers + CDN, Cache and Queue Servers, DB Servers)
Example 2: YoVille
(diagram: Web Servers + CDN, MMO Servers, Cache and Queue Servers, DB Servers)
Why web stack?
• HTTP scales very well
• Stateless request/response
• Easy to load-balance across servers
• Easy to add more servers
Some limitations
• Server-initiated actions
• Storing state between requests
Example: Web stack and state
(diagram: Client hitting Web servers #1 through #4, backed by caches and DB Servers; successive requests can land on different web servers)
MMO servers
• Minimally "MMO":
  • Persistent socket connection per client
  • Live game support: chat, live events
  • Supports data push
  • Keeps game state in memory
• Very different from web!
• Harder to scale out!
Why MMO servers?
But on the plus side:
1. Supports live communication and events
2. Game logic can run on the server, independent of client's actions
3. Game state can be entirely in RAM
(a toy sketch of these mechanics follows below)
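A toy sketch of those mechanics, for illustration only: a server that keeps a persistent socket per client, holds game state in RAM, runs game logic itself, and pushes data to clients. It is written in Python for brevity (the MMO servers discussed in this talk are Java and far more involved); the player ids, port, and "earn" command are made-up assumptions.

```python
# Sketch only: persistent connections, in-memory game state, server-side game
# logic, and server-initiated pushes. All names here are illustrative.
import asyncio

game_state = {}     # all live game state kept in RAM, keyed by player id
connections = {}    # player id -> StreamWriter, so the server can push data

async def handle_client(reader, writer):
    # first line from the client is treated as its player id ("login")
    player_id = (await reader.readline()).decode().strip()
    connections[player_id] = writer
    game_state.setdefault(player_id, {"coins": 0})
    try:
        while True:                                   # persistent connection
            line = await reader.readline()
            if not line:                              # client disconnected
                break
            if line.strip() == b"earn":               # game logic runs server-side
                game_state[player_id]["coins"] += 1
                await push(player_id, f"coins={game_state[player_id]['coins']}\n")
    finally:
        connections.pop(player_id, None)

async def push(player_id, message):
    """Server-initiated data push to a connected client (chat, live events, ...)."""
    writer = connections.get(player_id)
    if writer is not None:
        writer.write(message.encode())
        await writer.drain()

async def main():
    server = await asyncio.start_server(handle_client, "0.0.0.0", 9000)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```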
Example: Web + MMO stack
(diagram: Client talking to both Web Servers and MMO Servers, backed by caches and DB Servers)
Client Side

Flash:
• High production quality games
• Game logic on client
• Can keep open socket

HTML + AJAX:
• The game is "just" a web page
• Minimal sys reqs
• Limited graphics and animation
Social Network Integration
• "Easy" because not related to scaling
• Calling the host network to get a list of friends, post something, etc.
• Networks provide REST APIs, and sometimes client libraries (see the sketch below)
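For illustration, a hedged sketch of such a REST call using the Python requests library; the endpoint, parameters, and response shape are hypothetical placeholders, not any particular network's actual API.

```python
# Sketch: calling the host network's REST API to fetch a friends list.
# The base URL, path, and JSON layout below are hypothetical.
import requests

API_BASE = "https://api.example-social-network.com"    # hypothetical

def get_friend_ids(user_id, access_token):
    resp = requests.get(
        f"{API_BASE}/users/{user_id}/friends",
        params={"access_token": access_token},
        timeout=5,
    )
    resp.raise_for_status()
    return [friend["id"] for friend in resp.json()["friends"]]
```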
Architectures
• Client: HTML or Flash (on a web server stack), or Flash (on a web + MMO stack)
• Server: web server stack, or web + MMO stack
• Data: database, cache, etc.
Introduce game developers to best practices for large-scale web development (in other words: not blowing up as you grow)
Part I. Game Architectures
Part II. Scaling Solutions
Scaling
Scale up vs. scale out
• Scale up: "I need a bigger box"
• Scale out: "I need more boxes"
Scaling
Scale up vs. scale out
Scale out has been a clear win
• At some point you can't get a box big enough, quickly enough
• Much easier to add more boxes
• But: requires architectural support from the app!
Example: Scaling up vs. out
• 500,000 daily players in first week
• Started with just one DB, which quickly became a bottleneck
• Short term: scaled up, optimized DB accesses, fixed slow queries, caching, etc.
• Long term: must scale out with the size of the player base!
Scaling
We'll look at four elements:
• Databases
• Caching
• Game servers (web + MMO)
• Capacity planning
II A. Databases
DBs are the first to fall over
(diagram: Client → App Servers → caches and queues → DB)
(diagrams: the DB tier growing step by step: app servers sending writes (W) and reads (R) to a single DB, then a master (M) with read slaves (S), then multiple master/slave pairs, then sharded masters M0 and M1)
How to partition data?
• Two most common ways:
  • Vertical – by table
    • Easy, but doesn't scale with DAUs
  • Horizontal – by row
    • Harder to do, but gives best results!
    • Different rows live on different DBs
    • Need a good mapping from row # to DB
Row striping
• Row-to-shard mapping: primary key modulo # of DBs (see the sketch below)
• Like a "logical RAID 0"
• More clever schemes exist, of course

  id    Data    shard
  100   foo     M0
  101   bar     M1
  102   baz     M0
  103   xyzzy   M1
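A minimal sketch of that modulo mapping, assuming two shards and made-up host names (Python for brevity; the real stack is PHP/MySQL).

```python
# Sketch of the row-to-shard mapping above: primary key modulo # of DBs
# ("logical RAID 0"). Shard count and host names are illustrative.
NUM_SHARDS = 2
SHARD_HOSTS = ["db-m0.internal", "db-m1.internal"]   # M0, M1

def shard_for(primary_key: int) -> int:
    """Every query must carry a shard id; compute it from the row's key."""
    return primary_key % NUM_SHARDS

def host_for(primary_key: int) -> str:
    return SHARD_HOSTS[shard_for(primary_key)]

# matches the table above: 100 -> M0, 101 -> M1, 102 -> M0, 103 -> M1
assert [shard_for(k) for k in (100, 101, 102, 103)] == [0, 1, 0, 1]
```

Note that every query must then know the shard id it computed, which is exactly the surprise called out in the sharding example later in this section.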
Scaling out your shards
(diagram: growing from shards M0 and M1 to M0 through M3, with slaves S0 and S1 taking over as new masters)
Sharding in a nutshell
• It's just data striping across databases ("logical RAID 0")
• There's no automatic replication, etc.
• No magic about how it works
• That's also what makes it robust, easy to set up and maintain!
Example: Sharding
• Biggest challenge: designing how to partition a huge existing dataset
• Partitioned both horizontally and vertically
• Lots of joins had to be broken
• Data patterns needed to be redesigned
  • With sharding, you need to know the shard ID for each query!
• Data replication had difficulty catching up with high-volume usage
Sharding surprises
Be careful with joins
• Can't do joins across shards
• Instead, do multiple selects (sketched below), or denormalize your data
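A sketch of the multiple-selects approach: group the keys by shard, run one SELECT per shard, and merge the results in the application. In-memory sqlite3 databases stand in for the real sharded MySQL servers so the example is self-contained; the table and ids are made up.

```python
# Sketch: replacing a cross-shard JOIN with one SELECT per shard, merged in
# the application layer. sqlite3 stands in for the real sharded MySQL DBs.
import sqlite3

NUM_SHARDS = 2
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for db in shards:
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
for uid, name in [(100, "foo"), (101, "bar"), (102, "baz"), (103, "xyzzy")]:
    shards[uid % NUM_SHARDS].execute("INSERT INTO users VALUES (?, ?)", (uid, name))

def load_names(user_ids):
    """Group ids by shard, run one SELECT per shard, merge results in the app."""
    by_shard = {}
    for uid in user_ids:
        by_shard.setdefault(uid % NUM_SHARDS, []).append(uid)
    names = {}
    for shard_id, ids in by_shard.items():
        marks = ",".join("?" * len(ids))
        rows = shards[shard_id].execute(
            f"SELECT id, name FROM users WHERE id IN ({marks})", ids).fetchall()
        names.update(dict(rows))
    return [names.get(uid) for uid in user_ids]

print(load_names([100, 101, 103]))   # ['foo', 'bar', 'xyzzy']
```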
Sharding surprises
Skip transactions and foreign key constraints
• CPU-expensive; easier to pay this cost in the application layer (just be careful)
• The more you keep in RAM, the less you'll need these
II B. Caching
• DBs can be too slow
• Spend memory to buy speed!
(diagram: Client → App Servers → caches and queues → DB)
Caching
• Memcached: network-attached RAM cache
• Two main uses:
  • Speed up DB access to commonly used data
  • Shared game state storage for stateless web game servers
Memcache
• Stores simple key-value pairs
  • e.g. "uid_123" => {name: "Robert", …}
• What to put there?
  • Structured game data
  • Mutexes for actions across servers (sketched below)
Caveat
• It's an LRU cache, not a DB!
• No persistence!
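One way to sketch the cross-server mutex idea is memcached's atomic add(), which only succeeds if the key doesn't already exist. This uses the python-memcached client for illustration; the key names, TTL, and example action are hypothetical, and the lock is only as durable as the cache itself (no persistence, as the caveat says).

```python
# Sketch: a cross-server mutex built on memcached's atomic add().
# Illustrative only; names, TTL, and the example action are made up.
import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def with_lock(lock_key, action, ttl=10):
    """Run action() only if we win the lock. The TTL keeps a crashed server
    from holding the lock forever; remember it's an LRU cache, not a DB."""
    if not mc.add(lock_key, "locked", time=ttl):   # add() fails if key exists
        return False                               # someone else holds the lock
    try:
        action()
        return True
    finally:
        mc.delete(lock_key)

# e.g. make sure only one web server processes this player's action at a time
with_lock("lock:uid_123:harvest", lambda: print("harvesting for uid 123"))
```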
Memcache scaling out
• Shard it just like the DB!
(diagram: App servers talking to sharded memcache instances (MCs) and DBs)
II C. Game Servers
(diagram: Client → Web Servers and MMO Servers → caches and queues → DB)
Scaling web servers
(diagram: DNS distributing client traffic across several load balancers (LBs))
Scaling content delivery
(diagram: Client gets game data only from the Web Servers; media files come from a CDN)
Scaling MMO servers
• Scaling out is easiest when servers don't need to talk to each other
  • Databases: sharding
  • Memcache: sharding
  • Web: load-balanced farms
  • MMO: ???
• Our MMO server approach: shard just like other servers
Scaling MMO servers
• Keep all servers separate from each other
  • Minimize inter-server communication
  • All participants in a live event should be on the same server
• Scaling out means no direct sharing
  • Sharing through third parties is okay
Problem #1: Load balancing
• What if a server gets overloaded?
  • Can't move a live player to a different server
• Poor man's load-balancing (sketched below):
  • Players don't pick servers; the LB does
  • If a server gets hot, remove it from the LB pool
  • If enough servers get hot, add more server instances and send new connections there
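A sketch of that pool logic, with made-up host names and a made-up "hot" threshold: new connections go to the least-loaded non-hot server, hot servers simply stop receiving new players, and live players are never migrated.

```python
# Sketch of the "poor man's load balancing" above. Illustrative values only.
MAX_CONNECTIONS = 5000          # above this, a server counts as "hot"

class MmoServerPool:
    def __init__(self, hosts):
        self.connections = {host: 0 for host in hosts}

    def pick_server(self):
        """Route a NEW player to the least-loaded server that isn't hot."""
        cool = {h: n for h, n in self.connections.items() if n < MAX_CONNECTIONS}
        if not cool:
            raise RuntimeError("all servers hot: time to add more instances")
        host = min(cool, key=cool.get)
        self.connections[host] += 1
        return host

    def add_instance(self, host):
        """Spin up a new instance and start sending new connections its way."""
        self.connections[host] = 0

    def disconnected(self, host):
        self.connections[host] -= 1

pool = MmoServerPool(["mmo1.internal", "mmo2.internal"])
print(pool.pick_server())       # -> "mmo1.internal" or "mmo2.internal"
```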
Problem #2: Deployment
• Downtime = lost revenues
  • You can measure the cost of downtime in $/minute!
Problem #2: Deployment
Deployment comparison:
• Web: just copy over PHP files
• Socket servers: much harder
  • How to deploy with zero downtime?
  • Ideally: set up a new shadow server and slowly transition players over
II D. Capacity planning

Capacity
• We believe in scaling out
• Different logistics than scaling up:
  • Demand could change rapidly
  • How to provision enough servers?
Capacity: colo vs. cloud

Colo:
• Higher fixed cost
• Lower variable cost
• Significant lead time on additional capacity
• More control over ops
• Any chips you want
• Any disks you want
• Scale up or out

Cloud:
• Low or zero fixed cost
• Higher variable cost
• Little or no lead time on additional capacity
• Less control over ops
• Limited choice of VMs
• Virtualized storage
• Can't scale up! Only out.
Operations
• Scaling out: a legion of servers
• You'll need tools:
  • App health – custom dashboard
  • Server monitoring – Munin
  • Automatic alerts – Nagios
Ops: Dashboard
Ops: Munin
Server stats from
the entire farm
Ops: Nagios
• SMS alerts for your server stats
  • Connections drop to zero
  • Or CPU load exceeds 4
  • Or test account can't connect, etc.
• Put alerts on business stats, too! (see the sketch below)
  • DAUs dropping below daily avg, etc.
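A sketch of such a business-stat alert as a Nagios-style check plugin. Nagios reads the plugin's exit code (0 = OK, 1 = WARNING, 2 = CRITICAL); the data source and thresholds here are placeholders for your own metrics.

```python
#!/usr/bin/env python
# Sketch of a custom Nagios-style check for a business stat.
# fetch_current_dau() and the thresholds are hypothetical stand-ins.
import sys

DAILY_AVG_DAU = 1_000_000       # rolling average, maintained elsewhere

def fetch_current_dau():
    return 980_000              # stub: query your stats DB / dashboard here

def main():
    dau = fetch_current_dau()
    if dau < 0.5 * DAILY_AVG_DAU:
        print(f"CRITICAL - DAU {dau} is below 50% of the daily average")
        sys.exit(2)
    if dau < 0.8 * DAILY_AVG_DAU:
        print(f"WARNING - DAU {dau} is below 80% of the daily average")
        sys.exit(1)
    print(f"OK - DAU {dau}")
    sys.exit(0)

if __name__ == "__main__":
    main()
```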
Server failures
• Common in the cloud
  • Dropping off the net
  • Sometimes even shutting down
• Be defensive
  • Ops should be prepared
  • Devs should program defensively
The End
Many thanks to those who taught me a lot:
Virtual Worlds: Scott Dale, Justin Waldron
Coaster: Biren Gandhi, Nathan Brown, Tatung Mei
Download: http://robert.zubek.net/gdc2010