NSF PI Meeting: The Science of Cloud Computing

System Software Considerations for
Cloud Computing on Big Data
March 17, 2011
Michael Kozuch
Intel Labs Pittsburgh
Outline
1. Background: Open Cirrus
2. Cluster software stack
3. Big Data
4. Power
5. Recent news
Open Cirrus
Open Cirrus* Cloud Computing Testbed
Collaboration between industry and academia, sharing
• hardware infrastructure
• software infrastructure
• research
• applications and data sets
[Map of participating sites: UIUC*, KIT*, ISPRAS*, ETRI*, CMU*, GaTech*, CESGA*, MIMOS*, China Telecom*, China Mobile*, IDA*, and others]
Sponsored by HP, Intel, and Yahoo! (with additional support from NSF)
14 sites currently, target of around 20 in the next two years
Open Cirrus*
http://opencirrus.org
Objectives
– Foster systems research around cloud computing
– Enable federation of heterogeneous datacenters
– Vendor-neutral open-source stacks and APIs for the cloud
– Expose the research community to enterprise-level requirements
– Capture realistic traces of cloud workloads
Each Site
– Runs its own research and technical teams
– Contributes individual technologies
– Operates some of the global services
Independently-managed sites… providing a cooperative research testbed
Intel BigData Cluster
[Cluster diagram: racks of heterogeneous nodes, each behind a 24–48 Gb/s rack switch with 1 Gb/s links to its nodes; a 45 Mb/s T3 link connects the cluster to the Internet. Rack labels: rXrY = row X, rack Y; rXrYcZ = row X, rack Y, chassis Z.]
– Blade rack, 40 nodes (r2r1c1-4): mix of 1x single-core Xeon (Irwindale; 6 GB DRAM, 366 GB disk), 2x dual-core Xeon 5160 (Woodcrest; 4 GB, 2x 75 GB disk), and 2x quad-core Xeon E5345 (Clovertown; 8 GB, 2x 150 GB disk) nodes
– Blade rack, 40 nodes (r2r2c1-4): 2x quad-core Xeon E5345 (Clovertown); 8 GB DRAM, 2x 150 GB disk
– 1U rack, 15 nodes (r1r1): 2x quad-core Xeon E5420 (Harpertown); 8 GB DRAM, 2x 1 TB disk
– 1U racks, 15 nodes each (r3r2, r3r3): 2x quad-core Xeon E5440 (Harpertown); 16 GB DRAM, 2x 1 TB disk
– 2U racks, 27 nodes (r1r2): 2x six-core Xeon X5650 (Westmere-EP; 48 GB DRAM, 6x 0.5 TB disk) and 2x quad-core Xeon E5520 (Nehalem-EP; 16 GB DRAM, 6x 1 TB disk)
– 2U racks, 15 nodes each (r1r3, r1r4, r2r3): 2x quad-core Xeon E5440 (Harpertown); 8 GB DRAM, 6x 1 TB disk
– 3U storage rack, 5 nodes (r1r5): 12x 1 TB disk each
– Mobile rack, 8 1U nodes
Total: 210 nodes, 1508 cores, 2344 GB DRAM, 818 spindles, 646 TB of storage
Cloud Software Stack
Cloud Software Stack – Key Learnings
• Enable use of application frameworks (Hadoop, Maui-Torque)
• Enable general IaaS use
• Provide a Big Data storage service
• Enable physical resource allocation
[Stack diagram: Application Frameworks, IaaS, and a Storage Service layered over a Resource Allocator that manages the physical nodes]
Why physical?
1. Virtualization overhead
2. Access to physical resources
3. Security issues
Zoni Functionality
Provides each project with a mini-datacenter, isolating experiments from one another
• Allocation – assignment of physical resources to users
• Isolation – allows multiple mini-clusters to co-exist without interference
• Provisioning – booting of a specified OS
• Management – out-of-band (OOB) power management
• Debugging – OOB console access
[Diagram: two isolated domains (Domain 0, Domain 1), each with its own PXE/DNS/DHCP services, server pool, and gateway]
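As a rough sketch of what a physical-allocation layer like this does, the Python below walks through an allocate / provision / power-cycle workflow for a mini-datacenter. The function and class names are hypothetical illustrations, not Zoni's actual API.

```python
# Hypothetical sketch of a physical-allocation workflow; not Zoni's real interface.
from dataclasses import dataclass, field
from typing import List

@dataclass
class MiniDatacenter:
    owner: str
    nodes: List[str] = field(default_factory=list)   # physical nodes assigned
    vlan: int = 0                                     # isolation domain
    image: str = ""                                   # OS image to netboot

def allocate(free_nodes: List[str], owner: str, count: int, vlan: int) -> MiniDatacenter:
    """Allocation + isolation: carve physical nodes into an isolated mini-cluster."""
    assert len(free_nodes) >= count, "not enough free nodes"
    return MiniDatacenter(owner, [free_nodes.pop() for _ in range(count)], vlan)

def provision(dc: MiniDatacenter, image: str) -> None:
    """Provisioning: record the OS image each node should PXE-boot."""
    dc.image = image

def power_cycle(dc: MiniDatacenter) -> None:
    """Management/debugging: stand-in for out-of-band power and console control."""
    for node in dc.nodes:
        print(f"OOB power cycle {node}")

free = [f"r2r1c1n{i:02d}" for i in range(40)]
dc = allocate(free, owner="hadoop-experiment", count=8, vlan=42)
provision(dc, image="ubuntu-10.04-hadoop")
power_cycle(dc)
```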
Intel BigData Cluster Dashboard
Big Data
Example Applications
Application | Big Data | Algorithms | Compute Style
Scientific study (e.g. earthquake study) | Ground model | Earthquake simulation, thermal conduction, … | HPC
Internet library search | Historic web snapshots | Data mining | MapReduce
Virtual world analysis | Virtual world database | Data mining | TBD
Language translation | Text corpuses, audio archives, … | Speech recognition, machine translation, text-to-speech, … | MapReduce & HPC
Video search | Video data | Object/gesture identification, face recognition, … | MapReduce

“There has been more video uploaded to YouTube in the last 2 months than if ABC, NBC, and CBS had been airing content 24/7/365 continuously since 1948.” – Gartner
Big Data
Interesting applications are data hungry
The data grows over time
The data is immobile
– 100 TB @ 1 Gbps ~= 10 days (see the sketch below)
Compute comes to the data
The value of a cluster is its data
Big Data clusters are the new libraries
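As a quick check of the immobility claim, the sketch below works out the transfer time; only the 100 TB and 1 Gb/s figures come from the slide.

```python
# Time to move a Big Data repository over a network link: 100 TB over a fully
# utilized 1 Gb/s link takes roughly 9 days, and real transfers (protocol
# overhead, shared links) are slower still.

def transfer_days(terabytes: float, gbps: float) -> float:
    bits = terabytes * 1e12 * 8          # dataset size in bits (decimal TB)
    seconds = bits / (gbps * 1e9)        # seconds at full link rate
    return seconds / 86_400              # convert to days

print(f"{transfer_days(100, 1):.1f} days at 1 Gb/s")    # ~9.3 days
print(f"{transfer_days(100, 10):.2f} days at 10 Gb/s")  # still ~0.9 days
```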
Example Motivating Application:
Online Processing of Archival Video
• Research project: develop a context recognition system that is 90% accurate over 90% of your day
  • Leverage a combination of low- and high-rate sensing for perception
  • Federate many sensors for improved perception
  • Big Data: terabytes of archived video from many egocentric cameras
• Example query 1: “Where did I leave my briefcase?”
  • Sequential search through all video streams [Parallel Camera]
• Example query 2: “Now that I’ve found my briefcase, track it”
  • Cross-cutting search among related video streams [Parallel Time]
[Diagram: many cameras feeding a Big Data cluster]
Big Data System Requirements
Provide high-performance execution over Big Data repositories
 Many spindles, many CPUs
 Parallel processing
Enable multiple services to access a repository concurrently
Enable low-latency scaling of services
Enable each service to leverage its own software stack
 IaaS, file-system protections where needed
Enable slow resource scaling for growth
Enable rapid resource scaling for power/demand
 Scaling-aware storage
Storing the Data – Choices
Model 1: Separate Compute/Storage (compute servers + storage servers)
• Compute and storage can scale independently
• Many opportunities for reliability
Model 2: Co-located Compute/Storage (combined compute/storage servers)
• No compute resources are under-utilized
• Potential for higher throughput
Cluster Model
[Diagram: R racks of N server nodes hang off a cluster switch; each rack has a top-of-rack (TOR) switch, each node has p cores and d disks, and the relevant bandwidths are BWswitch (cluster/TOR uplink), BWnode (NIC), and BWdisk (per disk). An external network connects to the cluster switch.]
The cluster switch quickly becomes the bottleneck. Local computation is crucial.
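To see why the cluster switch dominates, here is a rough throughput model in this cluster's terms. It is a simplification, not the analysis behind the chart on the next slide; the per-disk bandwidth (0.8 Gb/s), NIC speed, and uplink values are illustrative assumptions.

```python
# Back-of-envelope aggregate read throughput (Gb/s) under the cluster model above.
# Assumes random placement forces most reads across rack uplinks and the cluster
# switch, while location-aware placement serves reads from local disks.

def aggregate_throughput(racks=20, nodes_per_rack=20, disks_per_node=2,
                         bw_disk=0.8, bw_node=1.0, bw_uplink=10.0,
                         location_aware=False):
    """All bandwidths in Gb/s; bw_uplink is each rack's link to the cluster switch."""
    per_node_storage = disks_per_node * bw_disk      # what the local spindles deliver
    per_node = min(per_node_storage, bw_node)        # NIC caps a remote reader
    if location_aware:
        # Local reads: limited only by local disks (CPU ignored here).
        return racks * nodes_per_rack * per_node_storage
    # Random placement: most data is off-rack, so traffic funnels through
    # the rack uplinks and the cluster switch.
    cross_rack = racks * bw_uplink
    return min(racks * nodes_per_rack * per_node, cross_rack)

print(aggregate_throughput(location_aware=False))  # switch-bound (~200 Gb/s)
print(aggregate_throughput(location_aware=True))   # disk-bound (~640 Gb/s)
```

With these illustrative numbers the gap is roughly 3x, the same order as the 1 Gb/s disk case measured on the next slide.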
I/O Throughput Analysis
[Bar chart: aggregate data throughput (Gb/s, up to ~6000) for random vs. location-aware data placement across four configurations (Disk-1G, SSD-1G, Disk-10G, SSD-10G); location-aware placement improves throughput by 3.5x–11x. Setup: 20 racks of 20 two-disk servers; BWswitch = 10 Gbps.]
Data Location Information
Issues:
• Many different file system possibilities (HDFS, PVFS, Lustre, etc.)
• Many different application framework possibilities
• Consumers could be virtualized
Solution:
• Standard cluster-wide Data Location Service
• Resource Telemetry Service to evaluate scheduling choices
• Enables virtualized location info and file system agnosticism
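The sketch below shows one way such a file-system-agnostic Data Location Service could be structured. The class and method names are hypothetical illustrations, not the interface used in this work.

```python
# Minimal sketch of a cluster-wide, file-system-agnostic data location registry.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class BlockLocation:
    path: str          # file path in the underlying DFS (HDFS, PVFS, Lustre, ...)
    offset: int        # byte offset of the block within the file
    length: int        # block length in bytes
    hosts: List[str]   # nodes (or VMs mapped to physical hosts) storing a replica

class DataLocationService:
    """Maps file blocks to physical hosts, independent of the DFS in use."""

    def __init__(self) -> None:
        self._index: Dict[str, List[BlockLocation]] = {}

    def register(self, loc: BlockLocation) -> None:
        # Each DFS plugin publishes its block map here.
        self._index.setdefault(loc.path, []).append(loc)

    def locate(self, path: str) -> List[BlockLocation]:
        # Schedulers/frameworks query this to place tasks near their data,
        # even when they run inside virtual machines.
        return self._index.get(path, [])

# Usage: a scheduler asks where a file lives and prefers those hosts.
dls = DataLocationService()
dls.register(BlockLocation("/data/video/cam7.seg", 0, 64 << 20, ["r1r3n05", "r2r3n11"]))
print([b.hosts for b in dls.locate("/data/video/cam7.seg")])
```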
Exposing Location Information
[Diagram: (a) non-virtualized case: a location-aware (LA) application and LA runtime run directly on the OS and DFS and query the cluster-wide Data Location Service and Resource Telemetry Service; (b) virtualized case: the LA application and runtime run in a guest OS over a VM runtime and VMM, with the same services supplying location information through the virtualization layer.]
Power
(System) Efficiency
Demand Scaling / Power Proportionality
“A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems,” Anton Beloglazov, Rajkumar Buyya, Young Choon Lee, and Albert Zomaya
Power Proportionality and Big Data
[Chart: number of blocks stored on node i (i = 1…100, 10K blocks total, up to ~2000 per node) for the Hadoop filesystem vs. the Rabbit filesystem; annotated possible power savings of ~0% and ~66%, against an optimal layout at ~95%.]
Rabbit Filesystem – a reliable, power-proportional filesystem for Big Data workloads
Simple strategy: maintain a “primary replica”
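The sketch below illustrates the primary-replica idea in its simplest form: if one replica of every block lands on a small always-on node set, the rest of the cluster can be powered down without losing data availability. It is a toy placement policy, not Rabbit's actual layout; the node counts, primary fraction, and replication factor are illustrative.

```python
# Toy "primary replica" placement: keep one copy of every block on a small set
# of always-on nodes so the remaining nodes can be powered down.
import random

def place_blocks(num_blocks, nodes, primary_fraction=0.34, replication=3):
    """Return ({node: set(block_ids)}, primary_nodes)."""
    primaries = nodes[: max(1, int(len(nodes) * primary_fraction))]
    secondaries = nodes[len(primaries):]
    layout = {n: set() for n in nodes}
    for b in range(num_blocks):
        layout[primaries[b % len(primaries)]].add(b)           # primary replica
        for n in random.sample(secondaries, replication - 1):  # remaining replicas
            layout[n].add(b)
    return layout, primaries

nodes = [f"node{i:03d}" for i in range(100)]
layout, primaries = place_blocks(10_000, nodes)
# Every block stays available even if all non-primary nodes are powered off:
covered = set().union(*(layout[n] for n in primaries))
print(len(covered) == 10_000, f"can power down {100 - len(primaries)} of 100 nodes")
```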
Recent News
Recent News
• “Intel Labs to Invest $100 Million in U.S. University Research”
  • Over five years
  • Intel Science and Technology Centers (ISTCs) – 3+2 year sponsored research
  • Half a dozen or more by 2012
  • Each can have a small number of Intel research staff on site
• A new ISTC focusing on cloud computing is possible
Tentative Research Agenda Framing
Potential Research Questions
Software stack
• Is physical allocation an interesting paradigm for the public cloud?
• What are the right interfaces between the layers?
• Can multi-variable optimization work across layers?
Big Data
• Can a hybrid cloud-HPC file system provide best-of-both-worlds?
• How should the file system deal with heterogeneity?
• What are the right file system sharing models for the cloud?
• Can physical resources be taken from the FS and given back?
Potential Research Questions
Power
• Can storage service power be reduced without reducing availability?
• How should a power-proportional FS maintain a good data layout?
Federation
• Which applications can cope with limited bandwidth between sites?
• What are the optimal ways to join data across clusters?
• How necessary is federation?
How should compute, storage, and power be managed to optimize for performance, energy, and fault-tolerance?
Backup
Scaling – Power Proportionality
Demand scaling presents a performance/power trade-off
• Our servers: 250 W loaded, 150 W idle, 10 W off, 200 s setup time
Research underway for scaling cloud applications
• Control theory
• Load prediction
• Autoscaling
[Diagram: cloud-based app receiving requests at rate λ]
Scaling beyond a single tier is less well understood
Note: the proportionality issue is orthogonal to FAWN design
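As a toy illustration of this trade-off, the sketch below sizes a server pool for a request rate λ using the power and setup numbers quoted above. The per-server capacity (100 req/s) and the 20% headroom for riding out the 200 s setup lag are assumptions for illustration, not figures from the talk.

```python
# Toy demand-scaling policy using the server numbers on this slide.
import math

P_LOADED, P_IDLE, P_OFF = 250.0, 150.0, 10.0    # watts per server
SETUP_SECONDS = 200.0                           # time to bring a server back up
REQS_PER_SERVER = 100.0                         # assumed capacity (req/s) per server

def servers_needed(request_rate, headroom=0.2):
    """Keep enough servers on to absorb load plus headroom for the setup lag."""
    return max(1, math.ceil(request_rate * (1 + headroom) / REQS_PER_SERVER))

def cluster_power(request_rate, total_servers):
    on = min(total_servers, servers_needed(request_rate))
    busy_fraction = min(1.0, request_rate / (on * REQS_PER_SERVER))
    # Interpolate between idle and loaded power for the powered-on servers.
    p_on = on * (P_IDLE + busy_fraction * (P_LOADED - P_IDLE))
    p_off = (total_servers - on) * P_OFF
    return p_on + p_off

for lam in (100, 500, 2000):                    # request rate λ (req/s)
    print(lam, servers_needed(lam), round(cluster_power(lam, 40)), "W")
```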
Scaling – Power Proportionality
Project 1: Multi-tier power management
• E.g. Facebook
[Diagram: requests at rate λ flowing through a multi-tier application]
Project 2: Multi-variable optimization
[Diagram: stack of IaaS (e.g. Tashi), distributed file system (e.g. Rabbit), resource allocator (e.g. Zoni), and physical resources, driven by request rate λ]
Project 3: Collective optimization
• Open Cirrus may have a key role