Efficient Content Distribution on the Internet

Who pays for showing a Web page to a user?
• Receiving side
  – Users pay small ISPs, who pay big ISPs, who pay even bigger ISPs
  – Concerns: reduce traffic / better response time
• Sending side
  – Web sites pay an ISP, or aggregators of ISPs (examples: broadcast, akamai)
  – Concerns? It depends...

A Simplistic View: Two Kinds of Web Sites
• Portals: really want to be seen, and want to be seen with high quality
• Libraries: want to be available

Who should pay for showing a Web page to the user?
1. Library pages: the user
2. Portal pages: the Web site
Many ISPs so far are focusing on 1 (the users, cutting down traffic) and ignoring 2 (the content providers, QoS). This neglect creates a vacuum that lets akamai.com flourish.

What should be done?
• A mechanism that links ISPs and Portals
  – Addressing the logistical difficulties:
    • One ISP charging many portals for content delivery
    • One portal getting certain assurance of QoS from many ISPs
    • Trust between the parties
• The mechanism should also be efficient
  – Caching, Replication, Routing, Differentiality

Peregrine Net Inc.
• To ISP:
  – Caching proxy
  – Caching proxy plus services (1-6 months)
  – Content distribution box
• To Portals:
  – Web acceleration proxy
  – ISP-coordinated QoS and load distribution (1-6 months)
  – URL rewrite for content distribution

The Proxy Products
• Runs on Linux, FreeBSD, Solaris, NT and other standard OS
• Scales from < 10Mb/s to over 4Gb/s
• Tiered pricing:
  – < 10Mb/s: free
  – 10Mb/s - 30Mb/s: $2K-4K per copy
  – 30Mb/s - 60Mb/s: $10K per copy or appliance
  – 60Mb/s - 155Mb/s: more expensive
  – 155Mb/s and up: clustering, even more expensive

Service 1: Premium Content Management
• Portals pay ISP for each object delivered from cache
• * Optimal cache management balancing the needs of users and those of portals
• * Efficient hit reporting to portals
• Establishing trust:
  – Third-party inspection of software
  – Statistical analysis of hit reports
• Estimated time: 2 months

Service 2: Active Cache Proxy
• Caching objects instead of datagrams
  – Web servers provide a piece of code (cache applet) that is associated with an object
  – The object and code are cached at the proxy; upon cache hit, the code is executed to generate responses
• Example: user-customized Web pages
• Benefit: Scale the Web server!
• Estimated time: 3-6 months

Service 3: Content Distribution
• Routing methods:
  – At Web server: rewrite URLs for image objects
    • What Sandpiper and Akamai are doing
  – At proxy: redirect URLs to content distribution box, or mark objects as permanently cached
• * Optimal load balancing
  – Constant server + network load-monitoring
• * Efficient content authentication
• Estimated time: 3-6 months

Service 4: Rent-A-Server
• Rent when needed:
  – Web accelerator monitors the load
  – If load exceeds limit, redirect or route requests to “rental” servers
• Rental servers capable of handling dynamic contents via process migration technology
• * Optimal server selection algorithm
• Estimated time: 6 months

How are we different from everyone else?
• We do what everyone else does
• We return part of the profits to ISPs, who carry the bits
But, in addition
• Everyone uses our proxies
• Proxies control the routing
• Proxies can do arbitrary transformation on the URLs

Additional Service: Mining the Log
• Web server performance data
• User auto-rating of search results
  – Which item really answered the user’s question? Guess it from the user’s surfing
  – Technique: build the user’s surfing graph with the search result as root
• User-profiling and feedback to Ad servers

Looking Forward: Efficient Video Content Distribution
• Caching proxy capable of handling video streams
• A hierarchy of caching proxies for video distribution
  – * Efficient “Prefix-Caching” algorithm
  – * Object popularity probing, and optional Satellite distribution for very popular objects

Steps to get there
1. Proxy product and sales
  – Current beta testers: Siemens AG, Union Bank of Switzerland, NetOne (Japan), a medium-size ISP in UK, JANET in UK
2. Premium content management, active caching proxy, content distribution service
3. Rent-A-Server
After sufficient proxy deployment: log-mining and video distribution
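
Sketch: Prefix Caching at a Video Proxy

The slides name a “Prefix-Caching” algorithm for video streams without describing it. The sketch below assumes the common meaning of the term: the proxy keeps only the first part (the prefix) of each stream, so playback can start from the proxy right away while the rest is fetched from the origin server. The names `PrefixCache`, `prefix_bytes` and `fetch_range` are hypothetical and chosen for illustration only; this is not necessarily the algorithm the slides refer to.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixCache:
    """Keeps only the first `prefix_bytes` of each video stream at the proxy.

    Illustrative sketch only; all names here are assumptions.
    """
    prefix_bytes: int
    store: dict = field(default_factory=dict)  # url -> cached prefix bytes

    def serve(self, url, fetch_range):
        """Yield the stream for `url` in up to two chunks.

        `fetch_range(url, start, end)` is a hypothetical callback returning
        bytes [start, end) from the origin server; end=None means "to the end".
        """
        prefix = self.store.get(url)
        if prefix is None:
            # Miss: fetch the prefix from the origin and keep it for next time.
            prefix = fetch_range(url, 0, self.prefix_bytes)
            self.store[url] = prefix
        # The viewer starts playing from the proxy's copy immediately...
        yield prefix
        # ...while the remainder (the suffix) is streamed from the origin.
        # A prefix shorter than prefix_bytes means the whole object is cached.
        if len(prefix) == self.prefix_bytes:
            suffix = fetch_range(url, self.prefix_bytes, None)
            if suffix:
                yield suffix
```

On a repeat request only the suffix is fetched from the origin, so start-up latency depends on the proxy rather than on the path to the Web server. How large the prefix should be, and which streams deserve one, is left open here; the popularity probing mentioned on the video slide would be one input to that choice.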