Storage Allocation in Prefetching Techniques of Web Caches

D. Zeng, F. Wang, S. Ram
Appeared in the Proceedings of the ACM Conference
on Electronic Commerce (EC’03), San Diego
June 9-12, 2003
Presented by Laura D. Goadrich
The Web

Large-scale distributed information system where data objects are
published and accessible by users
Problems caused by the demand for increased web capacity:
  Network traffic congestion
  Web server overloads
Solution: web caching
Web caching:

Benefits:
  Improves web performance (reduces access latency)
  Increases web capacity
  Alleviates traffic congestion (reduces network bandwidth consumption)
  Reduces the number of client requests that servers must handle (workload)
  Possibly improves failure tolerance and robustness of the Web
  (cached copies of web objects remain available when networks are unreachable)
Prefetching:

Anticipates users’ future needs
This research:
  Focuses on cache-related storage capacity decisions (storage
  capacity limits the number of Web objects that can be prefetched)
  Therefore addresses how to allocate cache storage in prefetching
  The authors state that this focus has not been researched before
Ideas:

Current research:
  Predicts user web accesses without considering the cache storage limit
This research: optimization-based models
  Maximize hit rate
  Maximize byte hit rate
  Minimize access latency
  (maximizing the first two is the primary goal of web caching)
Benefit of this research: guides the operation of a prefetching system
Web prefetching techniques

Client-initiated policies
  Example: user A is likely to access URL U2 right after URL U1
  Patterns learned via Markov algorithms
Server-initiated policies
  Anticipate future requests based on server logs and proactively
  send the corresponding Web objects to participating cache servers
  or client browsers
  Top-n algorithm
Hybrid policies
  Combine user access patterns from clients and general statistics
  from servers to improve the quality of prediction
Shortcoming of these policies: they do not address how to decide which
Web objects to prefetch given limited storage capacity
Assumptions/Notation

C               maximum amount of storage space available to store prefetched Web objects
i               URL of potential interest
P_i             predicted probability with which URL i will be visited
(i, P_i)        prediction of users’ future accesses
N               set of all URLs of potential interest
S_i (S_i < C)   size of the Web object referred to by i
Hit Rate (HR) Model

$$Z_{HR} = \max \sum_{i \in N} P_i X_i \qquad (1)$$

subject to

$$\sum_{i \in N} S_i X_i \le C \qquad (2)$$

$$X_i \in \{0, 1\} \quad \forall i \in N \qquad (3)$$
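A minimal sketch of the HR model on a toy instance, assuming X_i = 1 means the Web object for URL i is prefetched and X_i = 0 means it is not. The probabilities, sizes, and capacity are made-up illustration values, and the function name is hypothetical. Brute-force enumeration is used only because the instance is tiny; it is not the solution method of the paper.

```python
from itertools import product

def solve_hr_brute_force(probs, sizes, capacity):
    """Try every X in {0,1}^|N| and keep the feasible one with the largest sum of P_i * X_i."""
    best_value = 0.0
    best_x = tuple(0 for _ in probs)
    for x in product((0, 1), repeat=len(probs)):
        used = sum(s * xi for s, xi in zip(sizes, x))
        if used > capacity:          # violates the storage constraint (2)
            continue
        value = sum(p * xi for p, xi in zip(probs, x))  # objective (1)
        if value > best_value:
            best_value, best_x = value, x
    return best_value, best_x

# Hypothetical instance: three URLs, storage for 6 size units.
probs = [0.5, 0.3, 0.25]
sizes = [6, 3, 3]
z_hr, x = solve_hr_brute_force(probs, sizes, capacity=6)
print(z_hr, x)
```

In this instance the two small objects together are worth more than the single most probable object, so the most likely URL is not prefetched.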
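Byte Hit Rate (BHR) Model

$$Z_{BHR} = \max \frac{1}{\sum_{i \in N} P_i S_i} \sum_{i \in N} P_i S_i X_i \qquad (4)$$

subject to

$$\sum_{i \in N} S_i X_i \le C \qquad (2)$$

$$X_i \in \{0, 1\} \quad \forall i \in N \qquad (3)$$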
Access Latency (AL) Model
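α_i   number of seconds to establish the network connection between
      the client machine and the Web server hosting i
β_i   number of seconds per byte to transmit i over the network

$$Z_{AL} = \max \frac{1}{\sum_{i \in N} P_i (\alpha_i + \beta_i S_i)} \sum_{i \in N} P_i (\alpha_i + \beta_i S_i) X_i \qquad (7)$$

subject to

$$\sum_{i \in N} S_i X_i \le C \qquad (2)$$

$$X_i \in \{0, 1\} \quad \forall i \in N \qquad (3)$$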
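To make objective (7) concrete, the sketch below evaluates it for a given prefetch decision: the expected latency of the prefetched objects divided by the expected latency of all candidate objects. The function name and all numbers are hypothetical illustration values.

```python
def latency_saving_fraction(probs, sizes, alphas, betas, x):
    """Evaluate the AL objective for a 0/1 decision vector x.

    Each URL i contributes P_i * (alpha_i + beta_i * S_i), its expected
    access latency; prefetched URLs (x_i = 1) count as saved.
    """
    cost = [p * (a + b * s) for p, s, a, b in zip(probs, sizes, alphas, betas)]
    saved = sum(c * xi for c, xi in zip(cost, x))
    return saved / sum(cost)

# Hypothetical instance: 1 s connection setup, 0.001 s per byte.
probs = [0.5, 0.5]
sizes = [1000, 3000]
alphas = [1.0, 1.0]
betas = [0.001, 0.001]
print(latency_saving_fraction(probs, sizes, alphas, betas, x=(1, 0)))
```

Here the first object accounts for 1 of the 3 expected seconds, so prefetching it alone saves about one third of the expected latency.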
Transforming HR, BHR & AL into the Knapsack problem

Benefits of the Knapsack problem
  Well studied
  “Easiest” NP-hard problem
  Can be solved optimally by a pseudo-polynomial algorithm based on
  dynamic programming
  A fully polynomial-time approximation scheme exists
Focus on the greedy algorithm (due to paper length limits)
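One way to picture the transformation: all three models share the storage constraint and differ only in the value assigned to each object, P_i for HR, P_i S_i for BHR, and P_i (α_i + β_i S_i) for AL. The normalizing constants in the BHR and AL objectives do not change which objects are selected. The sketch below is a textbook 0/1 knapsack dynamic program, assuming integer object sizes; it illustrates the pseudo-polynomial approach mentioned above and is not the authors' implementation. Names and numbers are hypothetical.

```python
def knapsack_dp(values, sizes, capacity):
    """0/1 knapsack over integer capacities, O(|N| * C) time.

    Returns (best total value, sorted indices of the chosen objects).
    """
    n = len(values)
    best = [0.0] * (capacity + 1)   # best[c]: best value within capacity c
    take = [[False] * (capacity + 1) for _ in range(n)]
    for i in range(n):
        # Go downward so each object is used at most once.
        for c in range(capacity, sizes[i] - 1, -1):
            candidate = best[c - sizes[i]] + values[i]
            if candidate > best[c]:
                best[c] = candidate
                take[i][c] = True
    # Trace back which objects produced best[capacity].
    chosen, c = [], capacity
    for i in range(n - 1, -1, -1):
        if take[i][c]:
            chosen.append(i)
            c -= sizes[i]
    return best[capacity], sorted(chosen)

probs = [0.5, 0.3, 0.25]
sizes = [6, 3, 3]
hr_values = probs                                   # HR: value is P_i
bhr_values = [p * s for p, s in zip(probs, sizes)]  # BHR: value is P_i * S_i
print(knapsack_dp(hr_values, sizes, 6)[1])
print(knapsack_dp(bhr_values, sizes, 6)[1])
```

On this toy instance the HR values favor the two small objects, while the BHR values favor the single large one, so the choice of model changes the allocation.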
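Greedy Algorithm:
1. Sort all URLs into a sequence $i_1, i_2, \ldots, i_{|N|}$ such that

$$\frac{P_{i_1}}{S_{i_1}} \ge \frac{P_{i_2}}{S_{i_2}} \ge \cdots \ge \frac{P_{i_{|N|}}}{S_{i_{|N|}}}$$

2. Determine a threshold k defined as:

$$k = \max \left\{ j = 1, 2, \ldots, |N| : \sum_{l=1}^{j} S_{i_l} \le C \right\}$$

3. Prefetch Web objects referred to by URLs $i_1, i_2, \ldots, i_k$:

$$X_i = 1 \text{ if } i \in \{i_1, i_2, \ldots, i_k\}, \qquad X_i = 0 \text{ otherwise}$$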
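The three steps can be sketched as follows. The function takes a generic per-object value so that passing P_i gives the hit-rate version shown on the slide; the function name and the example numbers are hypothetical.

```python
def greedy_prefetch(values, sizes, capacity):
    """Greedy allocation: rank by value/size, prefetch the longest prefix that fits.

    Returns the indices of the prefetched objects in rank order.
    """
    # Step 1: sort URLs by decreasing value-to-size ratio.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / sizes[i],
                   reverse=True)
    # Step 2: find the threshold k, the longest prefix whose total size is <= C.
    chosen, used = [], 0
    for i in order:
        if used + sizes[i] > capacity:
            break  # stop at the first object that does not fit
        chosen.append(i)
        used += sizes[i]
    # Step 3: the prefix i_1..i_k is prefetched (X_i = 1), the rest is not.
    return chosen

probs = [0.4, 0.3, 0.2, 0.1]
sizes = [5, 2, 8, 1]
print(greedy_prefetch(probs, sizes, capacity=7))
```

With capacity 7 the greedy rule stops after two objects because the third-ranked one does not fit, leaving storage unused; selecting the first two URLs would fill the storage exactly and give a higher hit rate, which illustrates why greedy can fall short of the optimum.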
Other Allocation Policies Tested

Optimal policy using CPLEX
  Disadvantages
    Complex
    Increased implementation time
    Difficult to implement
Top-n
  Developed for Web usage prediction
  Used to regulate storage allocation by appropriately setting n
  Equivalent to Greedy BHR, relying only on P_i
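The equivalence follows from the greedy ratio: with BHR values P_i S_i and sizes S_i, the ratio (P_i S_i) / S_i reduces to P_i, so the ranking depends on the probabilities alone. A minimal sketch of Top-n is given below. The helper that picks n from the storage capacity is one hypothetical way of "appropriately setting n" and is an assumption, not a rule stated in the source.

```python
def top_n(probs, n):
    """Indices of the n URLs with the highest predicted probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:n]

def largest_feasible_n(probs, sizes, capacity):
    """Hypothetical helper: largest n whose Top-n objects fit in the storage."""
    used, n = 0, 0
    for i in top_n(probs, len(probs)):
        if used + sizes[i] > capacity:
            break
        used += sizes[i]
        n += 1
    return n

probs = [0.4, 0.3, 0.2, 0.1]
sizes = [5, 2, 8, 1]
n = largest_feasible_n(probs, sizes, capacity=7)
print(n, top_n(probs, n))
```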
Simulations

         |N| (a)   rep          C         α/β (b)
Small    50        Text         100,000   5,000 (slow)
Large    200       Multimedia   100,000   30,000 (fast)

LN(μ,σ) = lognormal distribution with mean e^μ and shape σ
Performance Comparison

                              Hit Rate            Byte Hit Rate       % Savings in Access Latency
Experimental Condition        Opt   G-HR  Top-n   Opt   G-HR  Top-n   Opt   G-HR  Top-n
a=50, LN(10,.05), b=5000      .47   .45   .44     .45   .44   .44     .45   .44   .44
a=50, LN(10,.05), b=30000     .47   .45   .44     .45   .44   .44     .46   .44   .44
a=50, LN(10,1), b=5000        .47   .44   .35     .32   .28   .28     .33   .29   .29
a=50, LN(10,1), b=30000       .47   .44   .35     .32   .28   .28     .38   .34   .32
a=200, LN(10,.05), b=5000     .36   .34   .33     .34   .33   .33     .34   .33   .33
a=200, LN(10,.05), b=30000    .36   .34   .33     .34   .33   .33     .35   .34   .33
a=200, LN(10,1), b=5000       .36   .34   .25     .20   .17   .17     .22   .18   .18
a=200, LN(10,1), b=30000      .36   .34   .25     .20   .17   .17     .27   .23   .21
Results

Greedy algorithms and Top-n in general achieve reasonable performance
Greedy algorithms outperform Top-n with respect to hit rate and access
latency
There is a relatively large performance gap between the optimal approach
and the fast heuristic methods when Web objects vary greatly in size
  Suggests the need for more sophisticated allocation policies, such as
  a dynamic programming-based approach
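The size of the gap can be read off the Performance Comparison table. The sketch below computes the fraction of the optimal hit rate lost by each heuristic, using the a=50 hit-rate entries copied from that table; the function name is hypothetical.

```python
def relative_gap(optimal, heuristic):
    """Fraction of the optimal value that the heuristic fails to achieve."""
    return (optimal - heuristic) / optimal

# (Opt, G-HR, Top-n) hit rates for a=50, taken from the table above.
hit_rate = {
    "LN(10,.05)": (0.47, 0.45, 0.44),  # object sizes nearly uniform
    "LN(10,1)": (0.47, 0.44, 0.35),    # object sizes vary greatly
}

for label, (opt, greedy, top_n) in hit_rate.items():
    print(label,
          round(relative_gap(opt, greedy), 3),
          round(relative_gap(opt, top_n), 3))
```

For these entries, the Top-n shortfall in hit rate grows from about 6% of the optimum with LN(10,.05) to about 26% with LN(10,1).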
Contributions:

Focus: stress the importance of effective storage allocation in prefetching
Paper contributions:
1. Provide new formulations for prefetching storage allocation
2. Create computationally efficient allocation policies by treating
   storage allocation as a knapsack problem
3. The models lead to a more precise understanding of the applicability
   and effectiveness of the Top-n policy
Future Work

Trace-based simulation
  Actual web access logs
  More realistic environment
Modeling
  Integrate allocation models with caching storage management models,
  i.e., cache replacement
Changes/Recommendations

Do not rename the same constraints
Cite more resources (5 articles, 2 books)
Discuss feasible solve times for the optimal policy (Opt)
Test or hypothesize implementation strategies for real applications