On Managing Continuous Media Data Edward Chang Hector Garcia-Molina

On Managing
Continuous Media Data
Edward Chang
Hector Garcia-Molina
Stanford University
Challenges
Large Volume of Data
MPEG2 100 Minute Movie: 3-4 GBytes
Large Data Transfer Rate
MPEG2: 4 to 6 Mbps
HDTV: 19.2 Mbps
Just-in-Time Data Requirement
Simultaneous Users
2
...Challenges
Traditional Optimization Objectives:
Maximizing Throughput!
Maximizing Throughput!!
Maximizing Throughput!!!
How about Cost?
How about Initial Latency?
3
Related Work
IBM T.J. Watson Labs. (P. Yu)
USC (S. Ghandeharizadeh)
UCLA (R. Muntz)
UBC (Raymond Ng)
Bell Labs. (B. Ozden)
etc.
4
Outline
Server (Single Disk)
Revisiting Conventional Wisdom
Minimizing Cost
Minimizing Initial Latency
Server (Parallel Disks)
Balancing Workload
Minimizing Cost & Initial Latency
Client
Handling VBR
Supporting VCR-like Functions
5
Conventional Wisdom
(for Single Disk)
Reducing Disk Latency leads to
Better Disk Utilization
Reducing Disk Latency leads to
Higher Throughput
Increasing Disk Utilization leads to
Improved Cost Effectiveness
6
Is Conventional Wisdom Right?
Does Reducing Disk Latency
lead to Better Disk Utilization?
Does Reducing Disk Latency
lead to Higher Throughput?
Does Increasing Disk Utilization
lead to Improved Cost Effectiveness?
7
Tseek: Disk Latency
TR: Disk Transfer Rate
DR: Display Rate
S: Segment Size (Peak Memory Use per Request)
T: Service Cycle Time
8
S = DR × T
T = N × (Tseek + S/TR)
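The two relations above can be solved for S by substituting T into the first one. A worked derivation, assuming TR > N × DR so that the disk can keep up with all N streams:

```latex
\begin{align*}
S &= DR \times T = DR \times N \times \left(T_{seek} + \frac{S}{TR}\right) \\
S \left(1 - \frac{N \times DR}{TR}\right) &= N \times DR \times T_{seek} \\
S &= \frac{N \times TR \times DR \times T_{seek}}{TR - N \times DR}
\end{align*}
```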
9
Disk Utilization
S = (N × TR × DR × Tseek) / (TR - N × DR)
S is directly proportional to Tseek
Dutil = (S/TR) / (S/TR + Tseek)
Dutil is Constant!
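A quick numeric check of this claim, as a minimal sketch. The parameter values (TR = 80 Mbps, DR = 4 Mbps, N = 10) and the function names are hypothetical, chosen only for illustration. Changing Tseek scales S but leaves Dutil at N × DR / TR.

```python
def segment_size(n, tr, dr, tseek):
    """Segment size S that satisfies S = DR*T and T = N*(Tseek + S/TR)."""
    if n * dr >= tr:
        raise ValueError("N * DR must be smaller than TR")
    return n * tr * dr * tseek / (tr - n * dr)


def disk_utilization(n, tr, dr, tseek):
    """Fraction of each IO spent transferring data rather than seeking."""
    transfer_time = segment_size(n, tr, dr, tseek) / tr
    return transfer_time / (transfer_time + tseek)


# Hypothetical values: TR = 80 Mbps, DR = 4 Mbps, N = 10 streams.
for tseek in (0.005, 0.010, 0.020):
    s = segment_size(10, 80.0, 4.0, tseek)
    u = disk_utilization(10, 80.0, 4.0, tseek)
    print(f"Tseek={tseek:.3f}s  S={s:.2f} Mbit  Dutil={u:.2f}")
```

Under these assumptions the utilization stays at 0.5 for every Tseek, while S grows linearly with Tseek.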
10
Is Conventional Wisdom Right?
Does Reducing Disk Latency
lead to Better Disk Utilization? NO!
Does Reducing Disk Latency
lead to Higher Throughput?
Does Increasing Disk Utilization
lead to Improved Cost Effectiveness?
11
What Affects Throughput?
[Diagram relating Disk Latency, Disk Utilization, Memory Utilization, and Throughput; one link is crossed out (×) and one is marked with a question mark (?)]
12
Memory Requirement
We Examine the Memory Requirements of Two Disk Scheduling Policies
Sweep (Elevator Policy): Enjoys the Minimum Seek Overhead
Fixed-Stretch: Suffers from High Seek Overhead
13
Per User Peak Memory Use
S = (N × TR × DR × Tseek) / (TR - N × DR)
14
Sweep (Elevator)
Disk Latency: Minimum
IO Time Variability: Very High
15
Sweep (Elevator)
Memory Sharing: Poor
Total Memory Requirement:
2 * N * Ssweep
16
Fixed-Stretch
Disk Latency: High (because of Stretch)
IO Time Variability: None (because of Fixed)
17
Fixed-Stretch
Memory Sharing: Good
Total Memory Requirement:
1/2 * N * Sfs
18
Throughput
Available Memory = 40 Mbytes
Sweep: Total Memory = 2 * N * Ssweep, N = 40
Fixed-Stretch: Total Memory = 1/2 * N * Sfs, N = 42 (Higher Throughput)
* Based on A Realistic Case Study Using Seagate Disks
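The comparison above can be reproduced in spirit with a small search for the largest N that fits a memory budget. This is a sketch under stated assumptions: the disk parameters (TR = 10 MB/s, DR = 0.1875 MB/s, Tseek of 8 ms for Sweep and 20 ms for Fixed-Stretch) are hypothetical, not the Seagate figures from the case study, so the resulting N values differ from the slide.

```python
def segment_size(n, tr, dr, tseek):
    """Per-stream segment size S = N*TR*DR*Tseek / (TR - N*DR)."""
    if n * dr >= tr:
        raise ValueError("N * DR must be smaller than TR")
    return n * tr * dr * tseek / (tr - n * dr)


def max_streams(memory, tr, dr, tseek, memory_factor):
    """Largest N such that memory_factor * N * S(N) fits in `memory`.

    memory_factor is 2 for Sweep and 1/2 for Fixed-Stretch, as on the slides.
    """
    n = 0
    while (n + 1) * dr < tr:
        need = memory_factor * (n + 1) * segment_size(n + 1, tr, dr, tseek)
        if need > memory:
            break
        n += 1
    return n


# Hypothetical disk: TR = 10 MB/s, DR = 0.1875 MB/s (1.5 Mbps), 40 MB of memory.
sweep = max_streams(40.0, 10.0, 0.1875, tseek=0.008, memory_factor=2.0)
fixed = max_streams(40.0, 10.0, 0.1875, tseek=0.020, memory_factor=0.5)
print("Sweep N =", sweep, " Fixed-Stretch N =", fixed)
```

Even with a seek overhead 2.5 times higher in this made-up setting, Fixed-Stretch supports more streams because it needs a quarter of the memory per segment.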
19
What Affects Throughput?
[Diagram relating Disk Latency, Disk Utilization, Memory Utilization, and Throughput; one link is crossed out (×) and one is marked with a question mark (?)]
20
Is Conventional Wisdom Right?
Does Reducing Disk Latency
lead to Better Disk Utilization? NO!
Does Reducing Disk Latency
lead to Higher Throughput? NO!
Does Increasing Disk Utilization
lead to Improved Cost Effectiveness?
21
Per Stream Cost
22
Per-Stream Memory Cost
Cm × S = (Cm × N × TR × DR × Tseek) / (TR - N × DR)
23
Example
Disk Cost: $200 a unit
Memory Cost: $5 per MByte
Supporting N = 40 Requires 60 MBytes of Memory
$200 + $300 = $500
Supporting N = 50 Requires 160 MBytes of Memory
$200 + $800 = $1,000
For the same cost of $1,000, it’s better to buy 2 Disks and 120 MBytes to support N = 80 Users!
Memory Use is Critical
24
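The arithmetic of the cost example can be written out as a short script. The prices and memory sizes are the ones quoted in the example; the function names are illustrative only.

```python
DISK_COST = 200.0          # dollars per disk, from the example
MEMORY_COST_PER_MB = 5.0   # dollars per MByte, from the example


def total_cost(disks, memory_mb):
    """Hardware cost of a configuration in dollars."""
    return disks * DISK_COST + memory_mb * MEMORY_COST_PER_MB


def per_stream_cost(disks, memory_mb, streams):
    """Cost divided evenly over the streams the configuration supports."""
    return total_cost(disks, memory_mb) / streams


# (disks, memory in MBytes, supported streams N), as given in the example.
configurations = [(1, 60, 40), (1, 160, 50), (2, 120, 80)]
for disks, memory_mb, streams in configurations:
    print(disks, "disk(s),", memory_mb, "MB:",
          total_cost(disks, memory_mb), "total,",
          per_stream_cost(disks, memory_mb, streams), "per stream")
```

Pushing one disk from 40 to 50 streams raises the per-stream cost from $12.50 to $20.00, while two disks at 40 streams each keep it at $12.50.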
Is Conventional Wisdom Right?
Does Reducing Disk Latency
lead to Better Disk Utilization? NO!
Does Reducing Disk Latency
lead to Higher Throughput? NO!
Does Increasing Disk Utilization
lead to Improved Cost Effectiveness?
NO!
25
So What?

26
Outline
Server (Single Disk)
Revisiting Conventional Wisdom
Minimizing Cost
Minimizing Initial Latency
Server (Parallel Disks)
Balancing Workload
Minimizing Cost & Initial Latency
Client
Handling VBR
Supporting VCR-like Functions
27
Initial Latency
What is it?
The time from when a request arrives at the server until its data is available in the server’s main memory
Where is it important?
Interactive applications (e.g., video games)
Interactive features (e.g., fast-scan)
28
Sweep (Elevator)
29
Fixed-Stretch
Space Out IOs
30
Fixed-Stretch
31
Fixed-Stretch
32
Our Contribution: BubbleUp
Fixed-Stretch Enjoys Fine Throughput
BubbleUp Remedies Fixed-Stretch to Minimize Initial Latency
33
Schedule Office Work
8am: Host a Visitor
9am: Do Email
10am: Write Paper
11am: Write Paper
Noon: Lunch
34
BubbleUp
35
BubbleUp
Empty Slots are Always Next in Time
No Additional Memory Required: Fill the Buffer up to the Segment Size
No Additional Disk Bandwidth Required: The Disk Is Idle Otherwise
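The idea of keeping empty slots "next in time" can be illustrated with a toy slot scheduler. This is a simplified model written for this summary, not the published BubbleUp algorithm: it assumes a cycle of `n_slots` equal time slots, that streams never leave, and that serving a stream early only tops its buffer up to S. The function name and the arrival format are hypothetical.

```python
from collections import deque


def simulate_bubbleup(n_slots, arrivals, total_slots):
    """Toy model: return the stream served in each time slot.

    arrivals maps a slot index to the stream id that arrives just before it.
    A new request always takes the very next slot while capacity remains;
    otherwise the least recently served stream is served.
    """
    active = deque()    # streams ordered by time of last service
    pending = deque()   # requests not yet admitted
    served = []
    for t in range(total_slots):
        if t in arrivals:
            pending.append(arrivals[t])
        if pending and len(active) < n_slots:
            stream = pending.popleft()   # the empty slot is the next slot
        elif active:
            stream = active.popleft()    # top this buffer up to S
        else:
            served.append(None)          # nothing to do, disk idle
            continue
        active.append(stream)
        served.append(stream)
    return served


print(simulate_bubbleup(4, {0: "a", 1: "b", 5: "c"}, 12))
```

In this model a new request is served in the slot right after it arrives, and no admitted stream waits longer than `n_slots` slots between services, which is the deadline a Fixed-Stretch cycle must meet.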
36
Evaluation
37
Fast-Scan
38
Fast-Scan
39
Data Placement Policies
Please refer to our publications
40
41
Chunk Allocation
Allocate Memory in Chunks
A Chunk = k * S
Replicate the Last Segment of a Chunk at the Beginning of the Next Chunk
Example
Chunk 1: s1, s2, s3, s4, s5
Chunk 2: s5, s6, s7, s8, s9
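The chunking rule in the example can be sketched as follows. The function name is hypothetical, and the handling of a short final chunk is an assumption, since the slide only shows a case that divides evenly.

```python
def chunk_allocation(segments, k):
    """Split segments into chunks of up to k segments each.

    Every chunk after the first starts with a replica of the last
    segment of the previous chunk.
    """
    if k < 2:
        raise ValueError("k must be at least 2 so that each chunk adds new data")
    if not segments:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(segments[start:start + k])
        if start + k >= len(segments):
            break
        start += k - 1   # step back one segment to replicate it
    return chunks


segments = [f"s{i}" for i in range(1, 10)]   # s1 .. s9
for number, chunk in enumerate(chunk_allocation(segments, 5), start=1):
    print(f"Chunk {number}:", ", ".join(chunk))
```

With k = 5 and nine segments this yields the two chunks shown in the example, with s5 stored twice.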
42
Chunk Allocation
Largest-Fit First
Best Fit (Last Chunk)
43
18 Segment Placement
44
Largest-Fit First
45
Best Fit
46
Outline
Server (Single Disk)
Revisiting Conventional Wisdom
Minimizing Cost
Minimizing Initial Latency
Server (Parallel Disks)
Balancing Workload
Minimizing Cost & Initial Latency
Client
Handling VBR
Supporting VCR-like Functions
47
Unbalanced Workload
48
Balanced Workload
49
Per Stream Memory Use
(Use M Disks Independently)
S = (N × TR × DR × Tseek) / (TR - N × DR)
M×N
50
Per Stream Memory Use
(Use M Disks As One Disk)
M×N
51
…Continue
S = (N × TR × DR × Tseek) / (TR - N × DR)
S’ = (N × M × TR × M × DR × Tseek) / (TR × M - N × M × DR)
S’ = M × (N × TR × DR × Tseek) / (TR - N × DR) = M × S
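The factor-of-M penalty can be checked numerically. This sketch assumes, as the formulas here do, that treating M disks as one virtual disk multiplies both the stream count and the transfer rate by M while Tseek stays the same. The parameter values and function names are hypothetical.

```python
def segment_size(n, tr, dr, tseek):
    """Per-stream segment size S = N*TR*DR*Tseek / (TR - N*DR)."""
    if n * dr >= tr:
        raise ValueError("N * DR must be smaller than TR")
    return n * tr * dr * tseek / (tr - n * dr)


def striped_segment_size(n, m, tr, dr, tseek):
    """M disks used as one virtual disk: N*M streams, rate TR*M, same Tseek."""
    return segment_size(n * m, tr * m, dr, tseek)


# Hypothetical values: N = 10 streams per disk, TR = 80 Mbps, DR = 4 Mbps, Tseek = 10 ms.
independent = segment_size(10, 80.0, 4.0, 0.010)
for m in (1, 2, 4, 8):
    striped = striped_segment_size(10, m, 80.0, 4.0, 0.010)
    print(f"M={m}: independent S={independent:.2f}  striped S'={striped:.2f}")
```

The per-stream buffer stays flat when disks are used independently and grows linearly with M under fine-grained striping.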
52
Challenges
Using M Disks Independently:
Unbalanced Workload
Low Per-Stream Memory Cost
Using M Disks As One Virtual Disk (i.e., Employing Fine-Grained Striping):
Balanced Workload
High Per-Stream Memory Cost
53
Our Approach (2DB)
Use Disks Independently
To Minimize Cost
Replicate Hot Movies (20% of Movies)
To Balance Workload
Use BubbleUp
To Minimize Initial Latency
54
2D BubbleUp (2DB)
Intelligent Data Placement
Efficient Request Scheduling
FODO, 1998
55
2DB Data Placement:
Chunk Allocation
56
2DB Scheduling
Formally, This is a Bipartite Weighted Matching Problem
Can be solved using the Hungarian method in O(V^3), where V = NM
We use a Greedy Method to reduce the problem to a Bipartite Unweighted Matching Problem
Can be solved in O(M^2)
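For readers unfamiliar with unweighted bipartite matching, here is a generic textbook augmenting-path sketch. It is not the greedy method or the O(M^2) procedure referred to above. The setting (each request lists the disks holding a copy of its movie, and each disk serves one request per slot) is an assumption made for illustration.

```python
def max_bipartite_matching(candidates, n_disks):
    """Match requests to disks, at most one request per disk.

    candidates[r] lists the disk indices that can serve request r.
    Returns (number of matched requests, owner request of each disk or None).
    """
    disk_owner = [None] * n_disks

    def try_assign(request, seen):
        for disk in candidates[request]:
            if disk in seen:
                continue
            seen.add(disk)
            # Take a free disk, or move its current owner to another disk.
            if disk_owner[disk] is None or try_assign(disk_owner[disk], seen):
                disk_owner[disk] = request
                return True
        return False

    matched = sum(try_assign(r, set()) for r in range(len(candidates)))
    return matched, disk_owner


# Hypothetical slot: three requests, each movie stored on one or two disks.
print(max_bipartite_matching([[0, 1], [0], [1, 2]], 3))
```

Replicating a movie on a second disk is what gives a request more than one candidate, which is where the matching gains its freedom to balance load.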
57
Why Does 2DB Work?
58
59
60
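n balls, n urns, finite n:
One random urn per ball: maximum load = (ln n / ln ln n)(1 + o(1))
Two candidate urns per ball, pick the less loaded: maximum load = ln ln n / ln 2 + O(1)
m balls, n urns, m > n, m and n → ∞:
d: number of possible destinations
Maximum load = (ln ln n / ln d)(1 + o(1)) + O(m/n)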
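The balls-and-urns bounds can be observed empirically with a short simulation. This is a sketch: the function name, the seed, and the choice n = 10,000 are arbitrary, and the connection to 2DB (a request for a replicated movie has d = 2 candidate disks) is the interpretation suggested by the slides.

```python
import random


def max_load(m, n, d, seed=0):
    """Throw m balls into n urns; each ball picks d random urns
    and goes into the least loaded of them. Returns the fullest urn's load."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(m):
        choices = [rng.randrange(n) for _ in range(d)]
        target = min(choices, key=loads.__getitem__)
        loads[target] += 1
    return max(loads)


n = 10_000
print("d=1:", max_load(n, n, 1))
print("d=2:", max_load(n, n, 2))
```

Giving each ball just two possible destinations typically cuts the maximum load sharply, which matches the drop from ln n / ln ln n to ln ln n / ln 2.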
61
What Does 2DB Cost?
Storage Cost
Additional disk cost = % of hot movies
Typically, 20% of movies are subscribed to 80% of the time
Throughput
Throughput is scaled back by a fraction to achieve a balanced workload
62
Evaluation
2DB Achieves Balanced Workload with
High Throughput
Compared to e.g., some dynamic load
balancing schemes
2DB Incurs Low Additional Storage Cost
2DB Enjoys Minimum Initial Latency
63
Outline
Server (Single Disk)
Revisiting Conventional Wisdom
Minimizing Cost
Minimizing Initial Latency
Server (Parallel Disks)
Balancing Workload
Minimizing Cost & Initial Latency
Client
Handling VBR
Supporting VCR-like Functions
64
Media Client
Most Studies Assume Dumb Clients
We Propose Smart Clients for
Handling VBR
Supporting VCR-like Functions
65
Handling VBR
Server Can Handle VBR
Frame rate fluctuates, but the moving average does not fluctuate as much
Rates even out when N is large, which is typically the case
66
...VBR
But, the Server Cannot Eliminate Bitrate
Mismatch
Packetization and Channel Delay
can change the bitrate
The Solution Must Be at the Client Side!
67
Supporting VCR-like Functions
Pause
Phone call interruptions
Biological needs
Fast Forward
Catching up with the program after a pause
Instant Replay
68
How to Pause A Movie?
Broadcast TV Cannot Be Paused
Pausing Via a Point-to-point Link Affects
the Server’s Scheduling
Caching!!!
Main Memory Caching?
Too expensive!
(19.2 Mbps * 20 min ≈ 2.9 GBytes)
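The size of such a main-memory cache follows from bit rate times duration. A minimal sketch, assuming 1 Mbps = 1,000,000 bits per second and 8 bits per byte; the function name is illustrative.

```python
def cache_bytes(bitrate_mbps, minutes):
    """Bytes needed to hold `minutes` of video at `bitrate_mbps`."""
    bits = bitrate_mbps * 1_000_000 * minutes * 60
    return bits / 8


size = cache_bytes(19.2, 20)
print(f"{size / 1e9:.2f} GBytes")
```

A 20-minute pause of an HDTV stream therefore needs a few gigabytes of buffer, which is why the cache has to live mostly on disk rather than in memory.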
69
Buffer Management
70
Challenges
Must Ensure Arriving Bits Do Not
Overflow the Network Buffer
Must Ensure Decoder Buffer Does Not
Underflow
Must Work with Any Off-the-shelf Disk and CPU Box
71
Our Contribution: MEDIC
MEDIC: MEmory & Disk Integrated Cache
MEDIC Manages IOs Between Memory
and Disk Efficiently
Only 4 Mbytes of main memory needed!!!
Makes a set-top box affordable
MEDIC Adapts to Hardware Configuration
72
Demo
Regular Playback
Pause
Resume Regular Playback
Fast Forward
Instant Replay (not shown)
73
Visualize MEDIC
74
Conclusions
(Contributions in Blue)
Server (Single Disk)
Revisiting Conventional Wisdom
Minimizing Cost
Minimizing Initial Latency
Server (Parallel Disks)
Balancing Workload
Minimizing Cost & Initial Latency
Client
Handling VBR
Supporting VCR-like Functions
75
…Conclusions
Our Server Supports
Low Latency Playback and Fast Forward
Our Client Supports
Pause and Low Latency Instant Replay
Together, We Propose A Complete End-to-end Solution for Continuous Media Data Delivery!
76
Future Work
Enhancing MEDIC for Managing
Heterogeneous Data, from Both
Broadcast & Internet Channels
Video Panoramas
Interactive TV
Indexing Videos for Replay
Video/Image databases
77