Running header (leave for the typesetter)

Modeling of Hybrid CDN-P2P for Full HD video downloading with realistic demand distribution

S.I. Salishev, Senior Lecturer, Department of Computer Science, Saint Petersburg State University (SPbSU), sergey.i.salishev@gmail.com
R.E. Shein, student, Department of System Programming, SPbSU, marso.des@gmail.com

Introduction

With the rising popularity of Full High Definition (HD) video, the problem of delivering such content to end users online has emerged. The existing Internet infrastructure is physically unable to stream Full HD video on demand to a significant number of users even in well-developed areas, let alone rural areas and developing countries. The only feasible solution is therefore content downloading. Even for downloading, network bandwidth remains a problem. To overcome it, a content delivery network (CDN) is usually employed, which distributes the payload across different nodes and network segments. The number of films currently exceeds 100K and is growing exponentially over time. Due to the long-tail property of the demand distribution [2], the number of the most popular items is growing at the same rate. A large number of popular items reduces the efficiency of caching and multicasting, as most neighboring users watch different content. CDN capacity therefore has to grow exponentially with the demand.

Another major problem is protecting the content from redistribution. Unprotected downloaded content can be redistributed with as little effort as one click, as opposed to the long process of ripping a Blu-ray Disc (BD). The content should therefore be copy-protected. Digital Rights Management (DRM) is the standard technique for copy-protecting downloaded video. On the other hand, DRM hinders the user experience, as it restricts the ways of using the content to below the level considered fair use. Also, DRM-protected content is always decrypted before it is shown, so it is susceptible to attacks on the chain of trust, i.e. hacking the playback pipeline after decoding.
All known DRM schemes, including HDCP, have already been broken by hardware or software means.

Due to the bandwidth and copy-protection problems, there is no Full HD online video service that could compete in popularity with physical BD sales. By contrast, physical DVD sales are being pushed out by online services such as Netflix, Hulu, and Amazon. Given the multibillion size of the video market, developing such a service may be a promising business opportunity.

P2P is similar to CDN except for two properties. First, peers are not reliable sources: they can appear and disappear at random, leading to Quality of Service (QoS) degradation, which is a problem for video streaming scenarios. Second, peers are not trusted to hold content that is not copy-protected. The content copy-protection problem for P2P is yet to be solved.

P2P effectively solves the bandwidth problem, as it scales up with the number of users, with the cost borne by the users themselves. For the downloading scenario, constant QoS is a secondary concern, because in case of low QoS a user can watch the video offline after downloading. The efficiency of P2P is supported by a recent Internet traffic analysis attributing about 30% of total Internet traffic to P2P [3]. P2P is also future-proof for new content types such as HDR video, retina-resolution video, and multi-view 3D video.

While DRM can be adapted to P2P networks, it is not well suited to them, as P2P is based not only on Internet networking but also on social networking between users, which presumes hardware neutrality and user convenience. The solution is to embed a digital fingerprint in each copy of the video using video watermarking, and to use traitor tracing to discourage users from redistribution.

In this paper we consider a Hybrid CDN-P2P system for video downloading. This is a different usage model from the streaming used by the existing Adobe Live Video P2P system and by the system considered by LaFortune et al. [4].
Because the user can watch the content offline, the QoS requirements are easier to meet: the system only needs to guarantee that the download completes in reasonable time, independently of the order of blocks, whereas video streaming needs to guarantee constant throughput for sequential blocks. In contrast to streaming, a downloading system can provide reasonable service even to low-bandwidth users and allows better load balancing, as the video is persistently stored on clients.

The contribution of this paper is as follows:

- We analyze the data from the Demonoid torrent tracker [5] and show that it is better fitted by the Stretched Exponential (SE) distribution (complementary to Weibull) than by the Zipf distribution. This supports the data of Guo et al. [6] on the demand distribution in commercial VoD systems and for user-generated video content. It suggests that the SE distribution is universal for video demand, independently of content source and method of delivery.
- We compare the data from Demonoid with IMDB [7] and demonstrate good correlation between the user demand distribution and the votes distribution within one year. This allows us to predict the demand distribution from publicly available popularity statistics, independently of the method of video delivery.
- We implement an agent-based model of a P2P-assisted CDN for video downloading using the BitTorrent protocol with the Demonoid demand distribution. Simulation shows a 94% reduction in CDN traffic, which exceeds the 75% reduction reported by LaFortune et al. [4] for a Hybrid CDN-P2P video streaming model. The main difference of our model is that we drop all QoS requirements needed for streaming and implement only the QoS requirements embedded in the BitTorrent protocol, which guarantee timely download completion.

Data sets and demand analysis

For this analysis we gathered a Demonoid user statistics snapshot on 13 October 2011 and an IMDB votes snapshot on 26 October 2011. We consider a CDN assisted by a P2P network and used for content downloading.
Our hypothesis is that such a network would behave similarly to existing BitTorrent networks. We aimed to assess the impact of P2P assistance and of the downloading strategy on CDN data traffic.

Our first goal was to study user behavior accurately. We used Demonoid statistics to analyze the actual demand in P2P networks (Fig. 1). The Demonoid data was fitted with a Zipf distribution, which is common for Web content, and with a Stretched Exponential distribution. SE captures the demand distribution better (R² = 0.99) than Zipf (R² = 0.77), with a scale parameter close to that reported by Guo et al. [6] for commercial VoD systems.

Figure 1. "Demonoid" downloaders in category "movie" (log-log plot of PDF versus rank; legend: demonoid; Zipf (parameter 0.58), R² = 0.7664; SE (parameters 3.1, c = 0.36), R² = 0.9927)

A tracker statistics snapshot of the "Movies" category taken toward the end of 2011, containing 33K unique film names with a non-zero number of downloaders, shows that 13.7% of the films account for 86.3% of the users, which agrees with the statistics of Tan et al. [8]. It should be noted that for niche markets the ratio can be substantially higher, e.g. 28/72 for "noir" films (Fig. 2).

Figure 2. Pareto number per category

As we are interested in easier and more reliable prediction of user demand, we investigated the correlation between the number of torrent downloaders and the popularity rating on ratings sites. We observed that within one year there is a high correlation (Pearson's r = 0.99) between the ranked number of votes on IMDB and the ranked number of torrent downloaders on the overlapping subset of movies (Fig. 3). This suggests that both values are generated by similar processes.

Figure 3. "IMDB" vs. "Demonoid" demand on overlapping subset (ranked), year 2010

The correlation between votes and demand probability for the same film is much less pronounced (Fig. 4), with r = 0.84.
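
The Zipf and SE fits to rank-ordered demand discussed above (Fig. 1) can be sketched as follows. The paper does not state how its fits were computed, so this is only one plausible procedure, assumed here: Zipf is fitted as a straight line in log-log space, and SE is fitted in the rank form y_i^c = b - a ln(i), scanning the stretch factor c over a grid. Note that the two R² values are computed in different transformed spaces. The input is synthetic data generated from the SE form with c = 0.36 (echoing Figure 1); the value of a, the catalogue size, and all function names are illustrative assumptions, not Demonoid data.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination of a fit."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_zipf(counts):
    """Fit log(y) = intercept - alpha * log(rank); return (alpha, R^2)."""
    x = np.log(np.arange(1, len(counts) + 1))
    y = np.log(counts)
    slope, intercept = np.polyfit(x, y, 1)
    return -slope, r_squared(y, slope * x + intercept)


def fit_se(counts, c_grid=None):
    """Fit y**c = b - a * log(rank), scanning c; return (c, a, R^2)."""
    if c_grid is None:
        c_grid = np.linspace(0.05, 1.0, 96)  # step 0.01
    x = np.log(np.arange(1, len(counts) + 1))
    best = None
    for c in c_grid:
        y = counts ** c
        slope, intercept = np.polyfit(x, y, 1)
        r2 = r_squared(y, slope * x + intercept)
        if best is None or r2 > best[2]:
            best = (float(c), -slope, r2)
    return best


# Synthetic rank-ordered demand (hypothetical parameters, not real data).
ranks = np.arange(1, 1001)
a_true, c_true = 2.0, 0.36
b_true = 1.0 + a_true * np.log(ranks[-1])  # least popular item gets weight 1
counts = (b_true - a_true * np.log(ranks)) ** (1.0 / c_true)

alpha, r2_zipf = fit_zipf(counts)
c_hat, a_hat, r2_se = fit_se(counts)
print(f"Zipf: alpha={alpha:.2f}, R^2={r2_zipf:.4f}")
print(f"SE:   c={c_hat:.2f}, a={a_hat:.2f}, R^2={r2_se:.4f}")
```

On real tracker data, `counts` would be the per-film downloader numbers sorted in descending order; zero counts would have to be dropped before taking logarithms.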
The weaker per-film correlation means that the vote distribution predicts only the shape of the demand distribution, not the demand probability for a specific film. This may be explained by the snapshot nature of the Demonoid data and by differences in audience between Demonoid and IMDB. The overall behavior across all years is substantially different, as torrent popularity decreases exponentially over time, which does not happen on ratings sites.

Figure 4. "IMDB" vs. "Demonoid" demand on overlapping subset, year 2010

The usual model for video demand probability is the Zipf distribution, which is commonly observed in citation ratings and in word frequencies in natural languages. However, it has been noted that the actual video demand distribution differs from it [6][9]: fetching of top-rated items is limited and low-rated items show an exponential cutoff. It is therefore better fitted by the Stretched Exponential distribution, which is supported by our data. Currently there is no clear explanation of the process generating this distribution.

P2P simulations

To investigate the impact of changing the system architecture, we implemented a behavioral simulator of a P2P network based on the ROSS framework [10]. We used a two-layer network model consisting of a star-like backbone and local networks (LANs), with the CDN modeled as a super-peer (Fig. 5). We implemented a BitTorrent protocol simulation for communication between nodes. As we considered only the downloading usage model, we implemented none of the QoS requirements needed for streaming, only the QoS requirements of the BitTorrent protocol, which guarantee download completion in reasonable time.

In the simulation with 10K peers we observed a CDN traffic reduction of 94%. This is higher than the 75% reduction reported by LaFortune et al. [4] for a model of Hybrid CDN-P2P video streaming and is on par with the numbers for large-scale P2P software distribution [11].
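
To illustrate how a CDN traffic reduction figure depends on the demand distribution, the following toy model may help. It is not the ROSS-based BitTorrent simulator described above and does not reproduce its numbers. It assumes a strongly simplified rule: requests are drawn from an SE rank distribution, the last `window` requesters stay online as seeders, and a request goes to the CDN only when none of them holds the requested film. The parameters c = 0.36 (echoing Figure 1), a = 1.0, the function names, and the seeding rule are all illustrative assumptions.

```python
from collections import Counter, deque

import numpy as np


def se_rank_probs(n_films, c=0.36, a=1.0):
    """Request probabilities from the SE rank form y_i**c = b - a*ln(i).

    b is chosen so that the least popular film has weight 1 (an assumption).
    """
    ranks = np.arange(1, n_films + 1)
    b = 1.0 + a * np.log(n_films)
    weights = (b - a * np.log(ranks)) ** (1.0 / c)
    return weights / weights.sum()


def cdn_traffic_reduction(n_films, n_requests, window, seed=0):
    """Fraction of downloads served by peers instead of the CDN (toy rule)."""
    rng = np.random.default_rng(seed)
    probs = se_rank_probs(n_films)
    requests = rng.choice(n_films, size=n_requests, p=probs).tolist()

    online = deque()     # films held by online peers, oldest first
    holders = Counter()  # film -> number of online peers holding it
    cdn_downloads = 0
    for film in requests:
        if holders[film] == 0:
            cdn_downloads += 1  # nobody online has it: fetch from the CDN
        holders[film] += 1      # the requester becomes a seeder
        online.append(film)
        if len(online) > window:
            holders[online.popleft()] -= 1  # the oldest seeder leaves
    return 1.0 - cdn_downloads / n_requests


for window in (0, 10, 100, 1000):
    reduction = cdn_traffic_reduction(n_films=5000, n_requests=50000, window=window)
    print(f"window={window:5d}  CDN traffic reduction={reduction:.3f}")
```

In this sketch, a longer seeding window or a more concentrated demand distribution raises the share of requests that peers can serve; piece-level exchange, upload capacity, and network topology are deliberately left out.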
We attribute the better result to the weaker QoS requirements and the larger swarm of uploaders in the downloading scenario. The result suggests that a P2P-assisted CDN can be profitable even when operating in niche markets, i.e. without top-rated movies.

Figure 5. Star network topology

Larger-scale simulations based on the GPS P2P simulator were performed to rule out potential mistakes in our BitTorrent stack implementation. For these simulations we considered a similar star-topology network model with 200 LANs of 512 peers each, with the CDN and the tracker as super-peers. Due to performance issues in the simulator's core, only a brief interval of time could be simulated: after approximately 45-48 simulated seconds the core practically stalls because of the growth of the event queue. The data obtained show that, once the system stabilizes, the CDN traffic reduction is 94% or more.

Integrated copy-protection

The BitTorrent protocol does not provide copy-protection. Copy-protection is a critical requirement of content owners for any practical implementation of a video distribution system. To address this problem, we consider a combined solution that includes a centralized billing system (Fig. 6).

Figure 6. System with copy-protection layer (diagram labels: P2P Network, Peer, CDN, Authority, Common Stream, Private Stream)

The content is divided into a public and a private layer. The public layer is distributed through the P2P-assisted CDN, while the private layer is distributed through a centralized system and is unique per user. The public layer is "encrypted", making the content useless to the user without the private layer. This can be achieved by modifying the semantic elements of the video stream so that it either cannot be decoded or produces objectionable artifacts. After downloading, both layers are jointly "decrypted" to provide a usable copy. The private layer can be as small as 1% of the data, which does not substantially hinder system performance.
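
The division into a large public layer and a small private layer can be sketched at the byte level as follows. This is only an illustration of the size split, under loud assumptions: the scheme described above modifies semantic elements of the video stream, whereas this sketch simply moves every `stride`-th byte into the private layer and zeroes it in the public one. It is not a secure scheme, it does not make the private layer unique per user, and the function names are hypothetical.

```python
def split_layers(data, stride=100):
    """Split data into (public, private); private holds every stride-th byte."""
    private = data[::stride]
    public = bytearray(data)
    public[::stride] = bytes(len(private))  # blank out the extracted bytes
    return bytes(public), private


def join_layers(public, private, stride=100):
    """Restore the original data from the public and private layers."""
    restored = bytearray(public)
    restored[::stride] = private
    return bytes(restored)


# Stand-in payload (not real video data).
content = bytes(range(1, 251)) * 40
public_layer, private_layer = split_layers(content)

print(len(private_layer) / len(content))  # share of data in the private layer
print(join_layers(public_layer, private_layer) == content)
```

With `stride=100` the private layer carries about 1% of the bytes, matching the proportion mentioned above; a fingerprinting scheme would additionally have to make each user's private layer distinct.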
The layered approach is compatible with both DRM and fingerprinting copy-protection schemes.

Conclusions

The correlation between Demonoid demand and IMDB votes allows us to predict the user demand distribution from publicly available statistics. We use the Demonoid demand probability distribution to simulate a Hybrid CDN-P2P video distribution system for downloading. Simulations with two different tools show a 94% reduction in CDN traffic. This demonstrates that Hybrid CDN-P2P for video downloading can effectively solve the bandwidth shortage for Full HD video distribution. Paired with the copy-protection suggested above and with some motivation for peers to keep and upload stored video, the approach can dramatically improve the speed and reduce the maintenance costs of video distribution services. Most likely, the model can be adapted to other categories of bulk downloads beyond video, such as video games.

References

[1] C. Anderson, "The Long Tail: Why the Future of Business is Selling Less of More," Hyperion, 2006.
[2] S. Goel, A. Broder, E. Gabrilovich, B. Pang, "Anatomy of the long tail: ordinary people with extraordinary tastes," In Proceedings of the third ACM international conference on Web search and data mining (WSDM '10). ACM, New York, NY, USA, 2010, pp. 201-210.
[3] C. Labovitz, "The Other 50% of the Internet," 54th North America Network Operators' Group Meeting (NANOG54), February 2012.
[4] R. LaFortune, C.D. Carothers, W.D. Smith, J. Czechowski, W. Xi, "Simulating Large-Scale P2P Assisted Video Streaming," 42nd Hawaii International Conference on System Sciences (HICSS '09), Jan. 2009.
[5] "Demonoid torrent tracker," http://demonoid.me
[6] L. Guo, E. Tan, S. Chen, Z. Xiao, X. Zhang, "Does internet media traffic really follow Zipf-like distribution?" In Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems (SIGMETRICS '07). ACM, New York, NY, USA, 2007, pp. 359-360.
[7] "Internet Movie Database," http://imdb.com
[8] T.F. Tan, S. Netessine, "Is Tom Cruise threatened? Using Netflix Prize data to examine the long tail of electronic commerce," Working Paper, 2009.
[9] M. Cha, H. Kwak, P. Rodriguez, Y. Ahn, S. Moon, "Analyzing the video popularity characteristics of large-scale user generated content systems," IEEE/ACM Trans. Netw. 17, 5, October 2009, pp. 1357-1370.
[10] C.D. Carothers, D. Bauer, S. Pearce, "ROSS: A high-performance, low-memory, modular Time Warp system," Journal of Parallel and Distributed Computing (Elsevier) 62 (11), 2002, pp. 1648-1669.
[11] C. Huang, A. Wang, J. Li, K.W. Ross, "Understanding hybrid CDN-P2P: why limelight needs its own Red Swoosh," In Proceedings of the 18th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV '08). ACM, New York, NY, USA, 2008, pp. 75-80.
[12] Z. Dekun, N. Prigent, J. Bloom, "Compressed video stream watermarking for peer-to-peer based content distribution network," IEEE International Conference on Multimedia and Expo (ICME 2009), June 28, 2009, pp. 1390-1393.
[13] GPS peer-to-peer simulator, http://www.cs.binghamton.edu/~nael/gps/