About Me
Joshua Silver
- 4th year CS major – graduating in May
- Specialization: Databases
- Interests:
- The business side of computing … and no, not IT
- How companies can use technology to improve and enable their business
- Think Enterprise Web 2.0, mobile strategies, viral promotion on the internet, the Netflix recommendation engine, e-commerce, etc.
- Startups!
Sleepers & Workaholics
Caching Strategies in Mobile Computing
Authors: Dr. Daniel Barbará and Dr. Tomasz Imielinski
Presented by:
Joshua Silver, Fall 2008
Sleepers & Workaholics
Caching Strategies in Mobile Computing
Dr. Daniel Barbará
- Professor at George Mason University
- Several patents associated with mobile caching
Dr. Tomasz Imielinski
- Professor at Rutgers University
- Senior VP: Search Technology at Ask.com
The Big Picture Problem
- Wireless devices have limited bandwidth, limited storage, and limited battery life
- To save power, devices go offline
- Mobile devices appear randomly in new cells
- This makes data caching difficult, since the server can’t track client caches
Then and now
- Paper written in 1994
- Devices, bandwidth, and battery limitations are different today
- The essential problem still exists
With an explosion of wireless devices,
the problem is even greater
[Figure: US Cell Phone Subscribers, in millions, 1985 to 2007. Callouts: 24 Million in 1994; >240 Million in 2008.]
… and that doesn’t even take into account proprietary
handheld units (like UPS driver delivery computers,
Amazon Kindles, grocery store handheld scanners, etc.)
Source: CTIA—The Wireless Association. http://www.infoplease.com/ipa/A0933563.html
Why Caching is Important
Conserve:
1. Computational resources
2. Battery life
3. Network bandwidth
Can’t store entire dataset on handheld.
-US maps on GPS unit
-Delivery routes for UPS drivers
-Contact list on Blackberry
Traditional Strategies Fail
In a traditional client-server model, the server:
- keeps track of client caches
- pushes only the changes or sends cache invalidation messages
BUT… the server lacks knowledge of:
- Which units are in its cell
- Which units are powered ON
Quintessential problem:
Client caches in a mobile environment
cannot be tracked by a server
The Solution
Purpose: "…to propose a taxonomy
of different cache invalidation
strategies and study the impact of
clients' disconnection times on
their performance."
Sleepers & Workaholics proposes a few
solutions and evaluates their
effectiveness with mathematical rigor
Evaluation Criteria
Complicated math! …. The paper’s appendices have details.
Essentially: define two types of mobile units
- Sleepers (offline/off all the time)
- Workaholics (never go offline)
- Almost all real-world devices fall in between
How do you compare?
Normalize by defining a “hit ratio”, since it affects
overall throughput:
H_X = (valid cache hits) / (total data size)
Strategies to Evaluate
Proposed Strategies:
- Timestamps (TS)
- Amnesic Terminals (AT) (only remembering part – like amnesia)
- Signatures (SIG)
Control Strategy:
- No Cache (NC)
Timestamps
-Each cache entry has a timestamp
-Synchronous, history based, uncompressed in nature
SERVER:
Communicates with clients every n seconds (and retries until
successfully connected)
Sends a list of changed items and their associated timestamps
(to accommodate potential delay in transmission)
CLIENT:
For each item in cache:
- If the entry is in the report received from the server, purge it from the cache
- If NOT in the report, simply update its timestamp to the current time
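The client rule above can be sketched in a few lines of Python. This is only an illustration, not the paper's exact algorithm: the function name, the dictionary layout, and the use of the report's issue time as the "current time" are all assumptions. The sketch also assumes an entry is purged only when the reported update is newer than the cached copy, which is one reading of why the server sends timestamps at all.

```python
def apply_timestamp_report(cache, report, report_time):
    """Apply one invalidation report to a client cache, in place.

    cache:       dict of item id -> (value, cached_timestamp)
    report:      dict of item id -> timestamp of that item's last server update
    report_time: time at which the server issued the report
    """
    for item_id in list(cache):  # copy the keys: we delete while iterating
        value, cached_ts = cache[item_id]
        if item_id in report and report[item_id] > cached_ts:
            # Assumption: purge only when the update is newer than our copy.
            del cache[item_id]
        else:
            # Not reported as changed: the copy is good as of this report.
            cache[item_id] = (value, report_time)
    return cache


cache = {"a": ("v1", 10), "b": ("v2", 10), "c": ("v3", 25)}
report = {"a": 20, "c": 20}  # "a" and "c" last changed on the server at t=20
apply_timestamp_report(cache, report, report_time=30)
print(cache)  # "a" is purged; "b" and "c" are re-stamped with t=30
```

Item "c" survives because the client fetched it at t=25, after the reported change at t=20.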
Amnesic Terminals
-Each cache entry has an identifier
-ALSO synchronous, history based, uncompressed in nature
SERVER:
Notify clients of identifiers of items changed since the last
invalidation report.
CLIENT:
For each item in cache:
◦ If in report, purge from cache
◦ If NOT in report, do nothing
◦ ALSO, if enough time has elapsed, drop WHOLE cache and rebuild
completely.
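A minimal sketch of the Amnesic Terminals client, under stated assumptions: "enough time has elapsed" is taken to mean that more than one broadcast interval has passed since the last report the client heard, so at least one report was missed. The function and parameter names are hypothetical.

```python
def apply_amnesic_report(cache, changed_ids, report_time, last_heard, interval):
    """Update the cache in place and return the time of the report just heard.

    cache:       dict of item id -> value
    changed_ids: ids of items changed since the previous report
    last_heard:  time of the last report this client received (None if none)
    interval:    seconds between server reports
    """
    if last_heard is None or report_time - last_heard > interval:
        # Assumption: a gap longer than one interval means a report was
        # missed, so nothing in the cache can be trusted any more.
        cache.clear()
    else:
        for item_id in changed_ids:
            cache.pop(item_id, None)  # purge if cached, ignore otherwise
    return report_time


cache = {"a": 1, "b": 2, "c": 3}
last = apply_amnesic_report(cache, {"b"}, report_time=10, last_heard=0, interval=10)
print(cache)  # only "b" is purged
last = apply_amnesic_report(cache, {"a"}, report_time=40, last_heard=last, interval=10)
print(cache)  # the client slept through reports, so the whole cache is dropped
```

Because the report carries only identifiers, the client needs no per-item timestamps. The cost is that a single missed report wipes the cache.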
Signatures
-Checksums calculated over value of data to form Signature
-Since the mobile unit does not have entire database, need an
algorithm to compute a partial checksum – see the appendix
-Signatures combined using XOR
-Synchronous, state based, compressed reports
SERVER:
Server broadcasts the set of combined signatures
CLIENT:
An item in the cache is declared invalid if it belongs to “too many”
non-matching signatures (and so is suspected of being out of date)
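The idea can be illustrated with a toy Python sketch. It is not the algorithm from the paper's appendix: CRC32 stands in for the checksum, the subsets are hand-picked rather than generated, and the client is assumed to compare each broadcast against the combined signatures it stored from the previous one. All names are hypothetical.

```python
import zlib


def item_signature(value):
    """Checksum of one item's value (CRC32 here, as a stand-in)."""
    return zlib.crc32(value.encode())


def combined_signatures(db, subsets):
    """XOR together the item signatures of every subset."""
    result = []
    for subset in subsets:
        sig = 0
        for item_id in subset:
            sig ^= item_signature(db[item_id])
        result.append(sig)
    return result


def invalid_items(cached_ids, subsets, old_sigs, new_sigs, threshold):
    """Return cached ids that belong to more than `threshold` changed subsets."""
    suspects = set()
    for item_id in cached_ids:
        mismatches = sum(
            1
            for subset, old, new in zip(subsets, old_sigs, new_sigs)
            if item_id in subset and old != new
        )
        if mismatches > threshold:
            suspects.add(item_id)
    return suspects


# Subsets are agreed on by server and client ahead of time (assumption).
subsets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
db = {0: "a", 1: "b", 2: "c", 3: "d"}
old = combined_signatures(db, subsets)  # what the client stored last time
db[2] = "z"                             # the server updates item 2
new = combined_signatures(db, subsets)  # the next broadcast
print(invalid_items({1, 2}, subsets, old, new, threshold=1))  # {2}
```

Item 1 shares one changed subset with item 2 but stays below the threshold. A lower threshold would flag it as well, which is the false-positive risk that comes with compressed reports.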
No Cache
There is no cache
SERVER:
Responds to direct queries from the client with appropriate
information
CLIENT:
Query the database directly anytime item is needed
Conclusions on Effectiveness
Strategy depends on circumstances:

Signatures best for long sleepers, when the disconnection
period is long and difficult to predict

Timestamps best for query-intensive scenarios, when
the rate of queries is greater than the rate of updates,
provided that units are not workaholics

Amnesic Terminals are best for workaholics, units that
are awake most of the time
Still not satisfied …. how can we
improve effectiveness?
Only 2 options:
1. Update less often
or
2. Send less info
Relax the Consistency of the Cache
Depending on data type, data may not need to be
exact…
EX: stocks, weather, etc.
Allow values to vary within a set tolerance (like 0.05% for stock
prices, weather reports up to 2 hours old, etc.)
Makes shorter invalidation reports possible
How Do We Decide to Update?
- Consider cached copies to be quasi-copies
- Each quasi-copy has a coherency
condition attached to it
Coherency Conditions:
Delay Condition - updated based on time
Arithmetic Condition - updated based on difference
between data and quasi-copy
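The two coherency conditions can be sketched as simple checks in Python. The class and function names are hypothetical, and treating the arithmetic tolerance as a relative difference is an assumption, chosen to match the percentage example for stock prices.

```python
from dataclasses import dataclass


@dataclass
class QuasiCopy:
    value: float
    cached_at: float  # time the copy was taken, in seconds


def violates_delay(copy, now, max_age):
    """Delay condition: the copy may be at most max_age seconds old."""
    return now - copy.cached_at > max_age


def violates_arithmetic(copy, server_value, tolerance):
    """Arithmetic condition: relative difference must stay within tolerance."""
    if copy.value == 0:
        return server_value != 0  # any change from zero counts as a violation
    return abs(server_value - copy.value) / abs(copy.value) > tolerance


stock = QuasiCopy(value=100.0, cached_at=0.0)
print(violates_arithmetic(stock, 100.04, tolerance=0.0005))  # within 0.05%
print(violates_arithmetic(stock, 100.10, tolerance=0.0005))  # outside 0.05%

weather = QuasiCopy(value=21.5, cached_at=0.0)
print(violates_delay(weather, now=3 * 3600, max_age=2 * 3600))  # 3 hours old
```

Under this reading, an item would only need to appear in an invalidation report once one of its conditions is violated, which is what keeps the reports short.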
Criticism
The paper’s assumptions about which resources are scarcest
are no longer fully accurate (e.g., bandwidth is better
than predicted, battery life is longer)
- Units are rarely powered down
◦ Battery life is better than predicted
◦ Battery life is not the only thing that dictates use patterns …
reception does also
- Units still lose reception frequently
◦ Today’s most common “sleeper” condition, yet explicitly excluded from the definition in S&W