Applications of Redundancy in Mobile Computing

Embracing Redundancy in Mobile Computing
Jason Flinn
MobiHeld 2011
Game plan
• A bit of a rant about mobile system design
– Motivated by difficulties deploying successful(?) research
– Motivated by common mistakes I (we?) have made
• Sadly, I’m not this guy
– But, I will try to be a little provocative
– Hopefully, we can start a good discussion
Resource scarcity
• What distinguishes mobile systems?
• One clear distinction: resource scarcity
– Less powerful CPU
– Less memory
– Smaller storage
– Poorer network quality
– Limited battery power…
• What’s the correct lesson for system design?
The real lesson?
• Don’t waste CPU cycles!
• Every Joule is precious!
• Minimize network usage!
(Slide imagery: road-sign warnings — Stop! Caution! Warning! Danger!)
• Is this how we should build mobile systems?
Pitfall #1
• Don’t focus on computational resources
– Network, battery, CPU, etc.
– Easy to measure (esp. with micro-benchmarks)
• Instead, focus on human resources
– E.g., interactive response time
– Application latency, not network bandwidth
– Remember the person waiting for results
One systems challenge
Diversity of networks
How to exploit this?
1. Use option with strongest signal?
2. Use option with lowest RTT?
3. Use option with highest bandwidth?
4. Stripe data across all networks?
All optimize a computational resource
• Focus on using network efficiently
• May or may not save human resources
Challenge (cont.)
Diversity of networks
Diversity of behavior
Email
• Fetch messages
Media
• Download video
How to minimize response time?
• Low latency network for e-mail
• Stripe across networks for video
Intentional Networking
• Can’t consider just network usage
– Must consider impact on user response time
– Unfortunately, no one-size-fits-all strategy
• Must consider intentions in using the network
– Ask the user? No, the user’s time is precious
– Instead, ask the applications
• Work with Brett Higgins, Brian Noble, TJ Giuli, David Watson
Parallels to Parallel Programming
• We’ve seen this before!
– Transition from uniprocessing to multiprocessing
– Now: uninetworking to multinetworking
• What lessons can we learn from history?
– Automatic parallelization is hard!
– More effective to provide abstractions to applications
• Applications express policies for parallelization
• System provides mechanism to realize policies
Abstractions
Multiple Processors       →  Multiple Networks
Multithreaded programs    →  Multi-sockets
Locks                     →  IROBs
Condition Variables       →  Ordering Constraints
Priorities                →  Labels
Async. Events/Signals     →  Thunks
• Key insight: find parallel abstractions for applications
Abstraction: Socket
• Socket: logical connection between endpoints
(Diagram: a socket connecting a client endpoint to a server endpoint)
Abstraction: Multi-socket
• Multi-socket: virtual connection
– Measures performance of each alternative
– Encapsulates transient network failure
(Diagram: a multi-socket connecting client and server over multiple physical networks)
Abstraction: Label
• Qualitative description of network traffic
• Size: small (latency) or large (bandwidth)
• Interactivity: foreground vs. background
(Diagram: labeled traffic flowing between client and server)
Abstraction: IROB
• IROB: Isolated Reliable Ordered Bytestream
• Guarantees atomic delivery of data chunk
• Application specifies data, atomicity boundary
(Diagram: data chunks 1–4 delivered atomically as IROB 1)
Abstraction: Ordering Constraints
• App specifies partial ordering on IROBs
• Receiving end enforces delivery order
(Diagram: IROBs 1–3 in flight; IROB 3 carries “Dep: 2”, so the server delivers data to use_data() only in an order that respects the constraint)
Abstraction: Thunk
• What happens when traffic should be deferred?
– Application passes an optional callback + state
– Borrows from PL domain
• If no suitable network is available:
– Operation will fail with a special code
– Callback will be fired when a suitable network appears
• Use case: periodic background messages
– Send once, at the right time
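To make these abstractions concrete, here is a hypothetical sketch of what a thunk-style deferral call on a multi-socket might look like; the names, labels, and network-selection policy are illustrative assumptions, not the actual Intentional Networking interface.

```python
SMALL, LARGE = "small", "large"        # size labels: latency- vs. bandwidth-sensitive
FOREGROUND, BACKGROUND = "fg", "bg"    # interactivity labels
DEFERRED = "deferred"                  # special return code when traffic is deferred

class MultiSocket:
    """Toy multi-socket: one virtual connection over several physical networks."""
    def __init__(self):
        self.networks = {}             # name -> {"rtt_ms": ..., "bw_kbps": ...}
        self.pending_thunks = []       # (callback, state) pairs awaiting a network

    def pick_network(self, size):
        if not self.networks:
            return None
        if size == SMALL:              # latency-sensitive traffic: lowest RTT wins
            return min(self.networks, key=lambda n: self.networks[n]["rtt_ms"])
        return max(self.networks, key=lambda n: self.networks[n]["bw_kbps"])

    def send(self, data, size=SMALL, interactivity=FOREGROUND, thunk=None, state=None):
        net = self.pick_network(size)
        if net is None:                # no suitable network: defer via the thunk
            if thunk is not None:
                self.pending_thunks.append((thunk, state))
            return DEFERRED
        print(f"sent {len(data)} bytes ({size}/{interactivity}) on {net}")
        return len(data)

    def network_appeared(self, name, rtt_ms, bw_kbps):
        self.networks[name] = {"rtt_ms": rtt_ms, "bw_kbps": bw_kbps}
        thunks, self.pending_thunks = self.pending_thunks, []
        for thunk, state in thunks:    # fire deferred callbacks once, at the right time
            thunk(self, state)

# Usage: a periodic background update is deferred until connectivity returns.
def resend(msock, state):
    msock.send(state, size=SMALL, interactivity=BACKGROUND)

msock = MultiSocket()
if msock.send(b"status", interactivity=BACKGROUND, thunk=resend, state=b"status") == DEFERRED:
    msock.network_appeared("wifi", rtt_ms=30, bw_kbps=5000)   # the thunk fires here
```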
Evaluation: Methodology
• Gathered network traces in a moving vehicle
– Sprint 3G & open WiFi
– BW up/down, RTT
• Replayed in lab
(trace map here)
Evaluation: Comparison Strategies
• Generated from the network traces
• Idealized migration
– Always use best-bandwidth network
– Always use best-latency network
• Idealized aggregation
– Aggregate bandwidth, minimum latency
• Upper bounds on app-oblivious performance
Evaluation Results: Email
• Trace #2: Ypsilanti, MI
(Charts: on-demand fetch time and background fetch time, in seconds, for the best-bandwidth, best-latency, aggregate, and Intentional Networking strategies; Intentional Networking is annotated at roughly 7x better on-demand fetch time for about a 3% cost in background fetch time.)
Evaluation Results: Vehicular Sensing
• Trace #2: Ypsilanti, MI
(Charts: urgent update time, in seconds, and background throughput, in KB/sec, for the same four strategies; Intentional Networking is annotated at roughly 48% better urgent update time for about a 6% cost in background throughput.)
But, some issues in practice…
• Followed lab evaluation with field trial
• Deployed on Android phones
• Lessons:
– Networks failed in unpredictable ways
• Hard to differentiate failures and transient delays
• Timeouts either too conservative or too aggressive
– Predictions not as effective as during lab evaluation
• May be caused by walking vs. driving
• May be caused by more urban environment
Pitfall #2
• It’s harder than we think to predict the future
• Many predictors that work well in the lab fail in the field
– Mobile environments more unpredictable
– Traces often capture only limited types of variance
• Modularity exacerbates the problem
– Higher levels often assume predictions 100% correct
– Pick optimal strategy given predictions
An exercise
• Number generator picks 0 or 1:
– Picking the correct number wins $10
– Costs $4 to pick 0
– Costs $4 to pick 1
• Which one should you pick if 0 and 1 equally likely?
– Both! (trick question - sorry)
• What if I told you that 0 is more likely than 1?
– “How much more likely?”
• “A lot” -> pick 0; “A little” -> pick both
• (most mobile systems just pick 0)
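A quick check of the arithmetic (a small illustrative script; only the $10 payoff and the $4-per-pick cost come from the exercise):

```python
# Expected net payoff of picking {0}, {1}, or {0, 1} when 0 occurs with probability p_zero.
def expected_profit(p_zero, pick):
    win = 10.0 * (p_zero if 0 in pick else 0.0) + 10.0 * ((1 - p_zero) if 1 in pick else 0.0)
    return win - 4.0 * len(pick)

for p in (0.5, 0.6, 0.9):
    print(p,
          expected_profit(p, {0}),      # pick 0 only
          expected_profit(p, {1}),      # pick 1 only
          expected_profit(p, {0, 1}))   # pick both (redundant)
# At p=0.5, picking both ($2) beats either single pick ($1); at p=0.6 it ties pick 0;
# only when 0 is much more likely (p=0.9: $5 vs. $2) does the single pick clearly win.
```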
Strawman adaptation strategy
• Estimate future conditions (e.g., latency, bandwidth)
• Calculate benefit of each option
– E.g., Response time
• Calculate cost of each option
– Energy, wireless minutes
• Equate diverse metrics
– Utility function, constraints, etc.
• Choose the best option
• Optimal, but only if predictions 100% accurate
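A minimal sketch of this strawman, assuming illustrative point predictions and an arbitrary utility function (none of these numbers come from a real system):

```python
# Point predictions for each option (assumed values, for illustration only).
options = {
    "wifi":     {"latency_s": 0.8, "energy_j": 1.0},
    "cellular": {"latency_s": 2.5, "energy_j": 3.0},
}

def utility(latency_s, energy_j, energy_weight=0.1):
    # One way to equate diverse metrics: benefit (fast response) minus a weighted cost.
    return -latency_s - energy_weight * energy_j

best = max(options, key=lambda o: utility(**options[o]))
print("strawman picks:", best)   # optimal only if the point predictions are accurate
```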
A better strategy
• Estimates should be probability distributions
– Need to understand independence of estimates
• Calculate best single option as before
• But, also consider redundant strategies
– E.g., send request over both networks
– Does decrease in exp. response time outweigh costs?
• Redundant strategies employed when uncertain
• Single strategies used when confident
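A minimal sketch of the same decision made with distributions: compare each single option’s expected response time against the redundant option (send on both, take the first reply) plus its cost. The distributions and cost term are illustrative assumptions.

```python
import random

# Assumed latency distributions (mean s, stdev s): WiFi is fast but erratic,
# cellular is slower but steady.
networks = {"wifi": (1.5, 2.0), "cellular": (2.0, 0.2)}
EXTRA_SEND_COST = 0.2     # arbitrary penalty for the redundant transmission (energy, data)

def sample_latency(dist):
    mean, stdev = dist
    return max(0.05, random.gauss(mean, stdev))

def expected_response(strategy, trials=10000):
    # The redundant send finishes when the first copy arrives, hence min().
    return sum(min(sample_latency(networks[n]) for n in strategy)
               for _ in range(trials)) / trials

single = {n: expected_response([n]) for n in networks}
redundant = expected_response(list(networks)) + EXTRA_SEND_COST
best_single = min(single, key=single.get)
choice = "both" if redundant < single[best_single] else best_single
print(single, round(redundant, 2), "->", choice)
# Redundancy wins when the estimates are uncertain; a single network wins when
# one option is confidently best.
```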
Back to intentional networking
• Consider latency-sensitive traffic
– Predict distribution of network latencies
– Send over lowest latency network
– Send over additional networks when benefit exceeds cost
• But, not quite there yet…
– Predict a 10 ms. RTT for network x with high confidence
– How confident are we after 20 ms. with no response?
Pitfall #3
• Re-evaluate predictions due to new information
• Sometimes, lack of response is new information
• Consider conditional distributions
– Expected RTT given no response after n ms.
– Eventually, starting redundant option makes sense
• Intentional networking: trouble mode
– If no response after x ms., send request over 2nd network
– Cost: bandwidth, energy
– Benefit: less fragile system, less variance in response time
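A minimal sketch of this conditional re-evaluation, assuming an illustrative predicted RTT distribution and an arbitrary trouble-mode threshold:

```python
# Assumed predicted RTT samples (ms): usually ~10 ms, with a rare large outlier.
rtt_samples_ms = [8, 9, 10, 10, 11, 12, 13, 15, 20, 200]
TROUBLE_THRESHOLD_MS = 50     # arbitrary: when to start the redundant send

def conditional_expected_rtt(samples, elapsed_ms):
    # E[RTT | no response after elapsed_ms]: only the samples not yet ruled out remain.
    remaining = [s for s in samples if s > elapsed_ms]
    return sum(remaining) / len(remaining) if remaining else float("inf")

for elapsed in (0, 15, 25):
    est = conditional_expected_rtt(rtt_samples_ms, elapsed)
    print(f"after {elapsed} ms with no response, expected RTT is {est:.0f} ms")
    if est > TROUBLE_THRESHOLD_MS:
        print("  -> trouble mode: send the request over a second network")
```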
Generalizing: File systems
• BlueFS can read data from multiple locations
– Predict latency, energy for each, fetch from best option
(Diagram: a client reading /bluefs, with the BlueFS server as one of several possible data locations)
Fallacy: assumes perfect prediction
• N/w degradation -> high latency
• Detect n/w failures w/long timeout
Instead, embrace redundancy!
• Uncertainty in device predictions
• Devices largely independent
• Consider fetching from >1 location
Generalizing: Cyber-foraging
• By another name: remote execution, code offload
• Idea: move computation from mobile to server
• Not always a good idea
– Shipping inputs and outputs over n/w takes time, energy
– Need to recoup costs through faster, remote execution
• Example systems:
– Spectra (I’ll pick on this one)
– RPF, CloneCloud, MAUI, etc.
Example: Language Translation
4 components that could be executed remotely
Execution of each engine is optional.
(Diagram: the input text feeds the EBMT engine, dictionary engine, and glossary engine; their results are combined by the language modeler to produce the output text)
Input parameters: translation type and text size
Example: Execution Plan
• Spectra chooses an execution plan:
– which components to execute
– where to execute them
(Diagram: the same pipeline with the EBMT engine executed remotely and the remaining components executed locally)
Choosing an Execution Plan
(Flowchart: predict resource supply and resource demand; a heuristic solver chooses candidate execution plans, calculates the performance, energy, and quality of each, converts them into a utility value, and chooses the plan with maximum utility)
Cyber-foraging: summary
• Fallacy: assumes many perfect predictions
– Network latency, bandwidth, CPU load, file system state
– Also computation, communication per input
• Reality: predicting supply and demand both hard
• A better approach:
– Consider redundant execution (local & remote)
– Reevaluate execution plan based on (lack of) progress
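A minimal sketch of racing local and remote execution, with stand-in functions for the on-phone computation and a hypothetical offload RPC (not any real system’s API):

```python
import concurrent.futures
import time

def run_locally(data):
    time.sleep(0.5)                    # stand-in for the on-phone computation
    return ("local", data.upper())

def run_remotely(data):
    time.sleep(0.1)                    # stand-in: ship inputs, compute remotely, ship results
    return ("remote", data.upper())

def redundant_execute(data, timeout=5.0):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(run_locally, data), pool.submit(run_remotely, data)]
    done, _ = concurrent.futures.wait(
        futures, timeout=timeout, return_when=concurrent.futures.FIRST_COMPLETED)
    winner = next(iter(done)).result() # the fastest execution generates the output
    pool.shutdown(wait=False)          # the loser's result is simply discarded
    return winner

print(redundant_execute("hello"))      # no prediction needed: whoever finishes first wins
```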
Embracing redundancy
• When does it make sense to consider redundancy?
– Imperfect predictions
– Options that are (largely) non-interfering
– Trading computational resources for human resources
• Why is this painful?
– With 20/20 hindsight, redundancy is always wrong!
– But it is right when the future is unknown.
Pushing the needle too far?
• Sometimes embracing redundancy opens new doors
• Thought experiment for cyber-foraging:
– What if redundant execution is the common case?
– Work with Mark Gordon, Morley Mao
Reflections
• What really distinguishes mobile systems?
• One clear distinction: resource variability (not just scarcity)
– Variable CPU load (not just a less powerful CPU)
– Different platform capacities (not just less memory and smaller storage)
– Variable network bandwidth
– Variable network latency (not just poorer network quality)
– Varying importance of energy (not just limited battery power…)
• What’s the correct lesson for system design?
Conclusions
• Build systems robust to variance
– Trade computational resources for human resources
– Accept that our predictions are imperfect
– Embrace redundancy
• Questions?
Deterministic replay
• Record an execution, reproduce it later
– Academic and industry implementations
• Uses include:
– Debugging: reproducing a software bug
– Forensics: trace actions taken by an attacker
– Fault tolerance: run multiple copies of execution
– Many others…
How deterministic replay works
• Most parts of an execution are deterministic
• Only a few sources of non-determinism
– E.g., system calls, context switches, and signals
(Diagram: the recorded execution starts from an initial state and writes its non-deterministic events to a log; the replayed execution starts from the same initial state and consumes the log to reproduce the run.)
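A minimal, language-level sketch of the idea, using a random draw as the only source of non-determinism (real systems log system calls, signals, and scheduling decisions instead):

```python
import random

def program(next_random):
    # Deterministic except for the injected source of non-determinism.
    return sum(next_random() for _ in range(3))

# Record: capture every non-deterministic result in a log.
log = []
def recording_source():
    v = random.randint(0, 100)
    log.append(v)
    return v
recorded_result = program(recording_source)

# Replay: start from the same initial state and supply results from the log.
replay_iter = iter(log)
replayed_result = program(lambda: next(replay_iter))

assert recorded_result == replayed_result    # the executions are identical
print(recorded_result, replayed_result)
```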
Prior approaches to improving latency
Code offload: ship state to a (faster) server, compute there, return results
• Disadvantages: need large compute chunks; need accurate CPU, N/W predictions
Remote execution: compute on the server, with I/O crossing back to the mobile
• Disadvantages: poor interactive performance; app state disappears with N/W
Idea: Replay-enabled redundancy
• Execute 2 copies of the application
– One on the mobile phone
– One on the server
– Use deterministic replay to make them the same
• Potential advantages:
– App state still on phone if server/network dies
– Less waiting: most communication asynchronous
– No prediction: fastest execution generates output
Redundant execution example
(Diagram: both copies begin from the same initial app state; when the mobile gets user input “A”, the input is logged and the server replay is supplied “A” from the log, and likewise for input “B”.)
• Get the “best of both worlds” by overlapping I/O w/slower execution
How much extra communication?
• Common sources of non-determinism:
– User input (infrequent, not much data)
– Network I/O (server can act as n/w proxy)
– File I/O (potentially can use distributed storage)
– Scheduling (make scheduler more deterministic)
– Sensors (usually polled infrequently)
• High-data rate sensors (e.g., video) may be hard
• Hypothesis: will not send much extra data for most mobile applications
What about energy usage?
• Compare to local execution
– Extra cost: sending initial state, logged events
• Several opportunities to save energy:
– Faster app execution -> less energy used
– No need for mobile to send network output
• Server proxy will send output on mobile’s behalf
– Can skip some code execution on the mobile
• Need to transmit state delta at end of skipped section
• A big win if application terminates after section