MySpace.com MegaSite v2

Aber Whitcomb – Chief Technology Officer
Jim Benedetto – Vice President of Technology
Allen Hurff – Vice President of Engineering
First Megasite
64+ MM Registered Users
38 MM Unique Users
260,000 New Registered Users Per Day
23 Billion Page Views/Month
50.2% Female / 49.8% Male
Primary Age Demo: 14-34
[Chart: registered user growth from 100K to 1M, 6M, 70M, and 185M. As of April 2007]
185+ MM Registered Users
90 MM Unique Users
Demographics
50.2% Female / 49.8% Male
Primary Age Demo: 14-34
Internet rank and page views (MM), March 2007:

Site      Rank  Page views (MM)
MySpace   #1    43,723
Yahoo     #2    35,576
MSN       #3    13,672
Google    #4    12,476
facebook  #5    12,179
AOL       #6    10,609

Source: comScore Media Metrix, March 2007
[Chart: monthly page views (MM), Nov 2006 to Mar 2007, for MySpace, Yahoo, MSN, Google, eBay, and Facebook. Source: comScore Media Metrix, April 2007]
350,000 new user registrations/day
1 Billion+ total images
Millions of new images/day
Millions of songs streamed/day
4.5 Million concurrent users
Localized and launched in 14 countries
Launched China and Latin America last week
7 Datacenters
6000 Web Servers
250 Cache Servers, 16 GB RAM each
650 Ad servers
250 DB Servers
400 Media Processing servers
7000 disks in SAN architecture
70,000 Mb/s bandwidth
35,000 Mb/s on CDN
Typically used for caching MySpace user data: online status, hit counters, profiles, mail.
Provides a transparent client API for caching C# objects (see the sketch below).
Clustering
Servers divided into "Groups" of one or more "Clusters".
Clusters keep themselves up to date.
Multiple load balancing schemes based on expected load.
Heavy write environment
Must scale past 20k redundant writes per second on a 15-server redundant cluster.
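As an illustration of what a transparent cache client for C# objects can look like, here is a minimal sketch in which keys hash to one of several cache nodes. All names (CacheClient, Put, TryGet) and the routing scheme are illustrative assumptions, not the actual API:

```csharp
using System;
using System.Collections.Concurrent;

// Minimal sketch of a transparent object-cache client (illustrative names,
// not MySpace's actual API). Keys hash to one of several cache nodes; here
// ConcurrentDictionary instances stand in for the remote cluster members.
public class CacheClient
{
    private readonly ConcurrentDictionary<string, object>[] _nodes;

    public CacheClient(int nodeCount)
    {
        _nodes = new ConcurrentDictionary<string, object>[nodeCount];
        for (int i = 0; i < nodeCount; i++)
            _nodes[i] = new ConcurrentDictionary<string, object>();
    }

    // Simple hash-based routing; real load-balancing schemes vary with load.
    private ConcurrentDictionary<string, object> NodeFor(string key) =>
        _nodes[(key.GetHashCode() & int.MaxValue) % _nodes.Length];

    public void Put<T>(string key, T value) => NodeFor(key)[key] = value;

    public bool TryGet<T>(string key, out T value)
    {
        if (NodeFor(key).TryGetValue(key, out object o) && o is T t)
        {
            value = t;
            return true;
        }
        value = default(T);
        return false;
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        var cache = new CacheClient(nodeCount: 15);
        cache.Put("user:42:status", "online");
        if (cache.TryGet<string>("user:42:status", out var status))
            Console.WriteLine(status); // prints "online"
    }
}
```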
Relay
Platform for middle-tier messaging.
Up to 100k request messages per second per server in production.
Purely asynchronous; no thread blocking.
Concurrency and Coordination Runtime (CCR).
Bulk message processing.
[Diagram: Relay Client → Socket Server → CCR, via the IRelayComponents interface]
Custom unidirectional connection pooling.
Custom wire format.
Gzip compression for larger messages (see the sketch below).
Data center aware.
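A minimal sketch of the compress-only-larger-messages idea, assuming a simple one-byte flag framing and a 1 KB threshold (both illustrative; the actual custom wire format is not described here):

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Sketch: gzip-compress a message payload only when it exceeds a size
// threshold, prefixing one flag byte so the receiver knows how to decode.
// The 1 KB threshold and framing are illustrative, not MySpace's format.
public static class MessageCodec
{
    private const int CompressionThreshold = 1024;

    public static byte[] Encode(byte[] payload)
    {
        if (payload.Length < CompressionThreshold)
        {
            var plain = new byte[payload.Length + 1];
            plain[0] = 0; // flag: uncompressed
            Buffer.BlockCopy(payload, 0, plain, 1, payload.Length);
            return plain;
        }

        using var ms = new MemoryStream();
        ms.WriteByte(1); // flag: gzip-compressed
        using (var gz = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
            gz.Write(payload, 0, payload.Length);
        return ms.ToArray();
    }

    public static byte[] Decode(byte[] frame)
    {
        if (frame[0] == 0)
        {
            var payload = new byte[frame.Length - 1];
            Buffer.BlockCopy(frame, 1, payload, 0, payload.Length);
            return payload;
        }

        using var src = new MemoryStream(frame, 1, frame.Length - 1);
        using var gz = new GZipStream(src, CompressionMode.Decompress);
        using var dst = new MemoryStream();
        gz.CopyTo(dst);
        return dst.ToArray();
    }
}
```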
Configurable components:
Message Orchestration
Berkeley DB
Non-locking Memory Buckets
Fixed Alloc Shared Interlocked Int Storage for Hit Counters (see the sketch below)
Message Forwarding
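A minimal sketch of the interlocked-int idea behind the hit-counter component: a fixed allocation of integer slots updated with Interlocked operations, so heavy concurrent writes never take a lock. The class name, hashing, and layout are illustrative assumptions, not the actual component:

```csharp
using System;
using System.Threading;

// Sketch of a fixed-allocation, lock-free hit-counter store: one flat array
// of ints allocated up front, updated with Interlocked so concurrent writers
// never block. The slot-per-key mapping is a simple hash (illustrative).
public class HitCounterStore
{
    private readonly int[] _counters;

    public HitCounterStore(int slots)
    {
        _counters = new int[slots]; // fixed allocation, no per-key objects
    }

    private int SlotFor(string key) =>
        (key.GetHashCode() & int.MaxValue) % _counters.Length;

    // Atomic, non-blocking increment; safe under heavy concurrent writes.
    public int Increment(string key) =>
        Interlocked.Increment(ref _counters[SlotFor(key)]);

    public int Read(string key) =>
        Volatile.Read(ref _counters[SlotFor(key)]);
}
```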
MySpace embraced Team Foundation Server and Team System during Beta 3
MySpace was also one of the early beta testers of devBiz's Team Plain (now owned by Microsoft)
Team Foundation initially supported 32 MySpace developers and now supports 110 developers, on its way to over 230 developers
MySpace is able to branch and shelve more effectively with TFS and Team System
MySpace uses Team Foundation Server as a source repository for its .NET, C++, Flash, and ColdFusion codebases
MySpace uses Team Plain for Product Managers and other non-development roles
MySpace is a member of the Strategic Design Review committee for the Team System suite
MySpace chose Team Test Edition, which reduced cost and kept its Quality Assurance staff on the same suite as the development teams
Using MSSCCI providers and customization of Team Foundation Server (including the upcoming K2 blackpearl), MySpace was able to extend TFS with better workflow and defect tracking based on its specific needs
Maintaining a consistent, constantly changing code base and configs across thousands of servers proved very difficult
Code rolls began to take a very long time
CodeSpew – Code deployment and maintenance utility
Two-tier application
Central management server – C#
Light agent on every production server – C#
Tightly integrated with Windows PowerShell
UDP out, TCP/IP in
Massively parallel – able to update hundreds of servers at a time
File modifications are determined on a per-server basis using CRCs (see the sketch below)
Security model for code deployment authorization
Able to execute remote PowerShell scripts across the server farm
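A minimal sketch of the CRC-based approach: each agent reports a per-file CRC-32 manifest, and the management server pushes only files whose checksums differ. The manifest shape and names are illustrative assumptions, not CodeSpew's actual design:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: decide which files to push to a server by comparing per-file
// CRC-32 checksums against a master manifest (illustrative, not CodeSpew).
public static class CrcSync
{
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // Standard CRC-32 over a file's bytes, streamed in 64 KB chunks.
    public static uint Crc32(string path)
    {
        uint crc = 0xFFFFFFFFu;
        using var fs = File.OpenRead(path);
        var buf = new byte[64 * 1024];
        int n;
        while ((n = fs.Read(buf, 0, buf.Length)) > 0)
            for (int i = 0; i < n; i++)
                crc = Table[(crc ^ buf[i]) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    // Returns the relative paths whose server-side CRC differs from master
    // (or which the server is missing entirely).
    public static List<string> FilesToPush(
        IDictionary<string, uint> masterManifest,
        IDictionary<string, uint> serverManifest)
    {
        var stale = new List<string>();
        foreach (var kv in masterManifest)
            if (!serverManifest.TryGetValue(kv.Key, out uint have) || have != kv.Value)
                stale.Add(kv.Key);
        return stale;
    }
}
```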
Images
1 Billion+ images
80 TB of space
150,000 req/s
8 Gigabits/sec
Music
25 Million songs
142 TB of space
250,000 concurrent streams
Videos
60 TB of storage
15,000 concurrent streams
60,000 new videos/day
Millions of MP3, Video and Image Uploads Every Day
Ability to design custom encoding profiles (bitrate, width, height, letterbox, etc.) for a variety of deployment scenarios
Job broker engine to maximize encoding resources and provide a level of QoS
Abandonment of database connectivity in favor of a web service layer
XML-based workflow definition to provide extensibility to the encoding engine (see the sketch below)
Coded entirely in C#
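Since the engine is C# with XML-defined workflows, here is a sketch of how an encoding profile might be declared in XML and loaded with XmlSerializer; the element names and fields are assumptions, not the actual schema:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Sketch: an encoding profile deserialized from an XML workflow definition.
// Element and property names are illustrative, not MySpace's actual schema.
public class EncodingProfile
{
    public string Name { get; set; }
    public int BitrateKbps { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }
    public bool Letterbox { get; set; }
}

public static class ProfileLoader
{
    public static EncodingProfile Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(EncodingProfile));
        using var reader = new StringReader(xml);
        return (EncodingProfile)serializer.Deserialize(reader);
    }

    public static void Main()
    {
        const string xml = @"
<EncodingProfile>
  <Name>flash-video-sd</Name>
  <BitrateKbps>384</BitrateKbps>
  <Width>320</Width>
  <Height>240</Height>
  <Letterbox>true</Letterbox>
</EncodingProfile>";
        var p = Load(xml);
        Console.WriteLine($"{p.Name}: {p.Width}x{p.Height} @ {p.BitrateKbps} kbps");
    }
}
```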
Filmstrip for Image Review
Thumbnails for Categorization
[Diagram: user content uploads (web upload, FTP server, any application) pass through the Web Service Communication Layer to the Job Broker and MediaProcessor, with content stored in DFS 2.0 and distributed via the CDN]
Provides an object-oriented file store
Scales linearly to near-infinite capacity on commodity hardware
High-throughput distribution architecture
Simple cross-platform storage API
Designed exclusively for long-tail content
[Chart: content demand vs. accesses, illustrating the long tail]
Custom high-performance event-driven web server core
Written in C++ as a shared library
Integrated content cache engine
Integrates with storage layer over HTTP
Capable of more than 1 Gbit/s throughput on a dual-processor host
Capable of tens of thousands of concurrent streams
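The core itself is C++, but the event-driven idea can be sketched in C# (the document's main language): a single accept loop with awaitable per-connection handlers instead of a thread per connection. Everything here is illustrative, not the actual server:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Sketch of the event-driven idea: one accept loop, no thread-per-connection;
// each connection is handled as an awaitable task so thousands of concurrent
// streams share a small thread pool. Illustrative only (the real core is C++).
public static class TinyEventServer
{
    public static async Task Main()
    {
        var listener = new TcpListener(IPAddress.Loopback, 8080);
        listener.Start();
        while (true)
        {
            TcpClient client = await listener.AcceptTcpClientAsync();
            _ = HandleAsync(client); // fire-and-forget; errors handled inside
        }
    }

    private static async Task HandleAsync(TcpClient client)
    {
        try
        {
            using (client)
            {
                var stream = client.GetStream();
                var buf = new byte[4096];
                await stream.ReadAsync(buf, 0, buf.Length); // read the request
                byte[] resp = Encoding.ASCII.GetBytes(
                    "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nOK");
                await stream.WriteAsync(resp, 0, resp.Length);
            }
        }
        catch (Exception) { /* log and drop the connection */ }
    }
}
```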
DFS uses a generic “file pointer” data type for identifying files, allowing us to change URL formats and distribution mechanisms without altering data (see the sketch below)
Compatible with traditional CDNs like Akamai
Can be scaled at any granularity, from single nodes to complete clusters
Provides a uniform method for developers to access any media content on MySpace
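A minimal sketch of the file-pointer indirection, with illustrative types (FilePointer, IUrlResolver) that are assumptions rather than the actual DFS API: stored records keep only the opaque pointer, and swapping the resolver changes the URL format or distribution mechanism without touching stored data:

```csharp
using System;

// Sketch of DFS-style "file pointer" indirection (illustrative names, not
// the actual DFS API). Records store only the opaque pointer; the resolver
// decides the URL format, so delivery can change without data migration.
public readonly struct FilePointer
{
    public FilePointer(int volume, long fileId)
    {
        Volume = volume;
        FileId = fileId;
    }

    public int Volume { get; }
    public long FileId { get; }
}

public interface IUrlResolver
{
    Uri Resolve(FilePointer ptr);
}

// One possible mechanism: serve through a CDN hostname. Swapping this
// resolver out changes delivery without touching any stored pointers.
public class CdnUrlResolver : IUrlResolver
{
    public Uri Resolve(FilePointer ptr) =>
        new Uri($"http://cdn.example.com/v{ptr.Volume}/{ptr.FileId}");
}

public static class DfsDemo
{
    public static void Main()
    {
        var ptr = new FilePointer(volume: 12, fileId: 3456789);
        IUrlResolver resolver = new CdnUrlResolver();
        Console.WriteLine(resolver.Resolve(ptr)); // URL format is resolver-defined
    }
}
```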
[Chart: pages/sec per server by hardware generation (2005, 2006, and 2007 servers)]
Distribute MySpace servers over 3 geographically dispersed co-location sites
Maintain presence in Los Angeles
Add a Phoenix site for active/active configuration
Add a Seattle site for active/active/active with site failover capability
Sledgehammer
[Diagram: Sledgehammer components: Users, Business Logic, Server Accelerator Engine with Cache Engine, DFS Cache Daemon, Storage Cluster, Download path]