Storage Optimization for a Peer-to-Peer Video-OnDemand Network Jagadeesh M. Dyaberi, Vijay S. Pai, and Karthik Kannan Purdue University Introduction Video-on-Demand (VoD) is commercially important (video anytime, anywhere) Huge growth in VoD subscribers and requests HDTV-quality streams enabled by increasing downlink bandwidth Clients VoD Server 2 IPTV definition The official definition approved by the International Telecommunication Union focus group on IPTV (ITU-T FG IPTV) is as follows: "IPTV is defined as multimedia services such as television/video/audio/text/graphics/data delivered over IP based networks managed to provide the required level of quality of service and experience, security, interactivity and reliability." (e.g., Amazon VoD; Hulu; YouTube) 3 VoD Growth (a) Subscriber growth (b) # of VoD requests Data analysis of logs from a nationwide IPTV service provider (Amazon VoD 7.5% per month; 75% more requests per day July 2009; spike on weekend) 4 IPTV Architecture National Server Constrained Servers Regional Servers Constrained Link Community Switch/ DSLAM STB STB 5 Network trends Client-server model alone is not scalable Inter-client (peer to peer) data transfers can help Especially Peer-to-Peer (P2P) System under ISP-level control! (e.g., cable operator) Demand varies cyclically during day 6 Contributions Present mathematical model to optimize data allocation and retrieval Pre-populate data during low-load phase Simulation results show reduction in server load up to 50% 7 Outline Introduction and Contributions Background Optimization Formulation Results and Conclusion 8 Background: IPTV Characteristics Customers are provided with a Set-top Box(STB) to access IPTV service Built in hard-drive for DVR purposes Hard-drive capacity of STBs currently exceeds 100 GB STBs are always on (no “churn”) STBs centrally controlled 9 Background: IPTV Data Characteristics Viewing behavior influenced by content recency and external factors among others Nearly 47% of content overlapped between consecutive weeks Recently added content more popular Six of the top ten popular movies for the week were added at the beginning of the week 10 Background: BitTorrent (BT) Popular P2P file distribution protocol on the Internet Splits file into many pieces and downloads concurrently from multiple peers Toast added VoD Server to BT P2P network Server serves any piece upon request Modified Client Tracks current location in file stream Requests upcoming missing pieces from server Toast allows BitTorrent to operate normally, but still ensures clients have uninterrupted streams 11 Improving Toast with Pre-seeding Toast deals with a single stream Multiple streams result in fewer peers to download data from Pre-seeding: distribute data ahead of time to improve data availability in the network Pre-seeding done at low load cycle Centralized control obviates need for BitTorrent incentive mechanisms 12 Analysis: VoD system load Request Distribution by Hour 2500 1500 1000 500 9: 00 10 :0 0 11 :0 0 12 :0 0 13 :0 0 14 :0 0 15 :0 0 16 :0 0 17 :0 0 18 :0 0 19 :0 0 20 :0 0 21 :0 0 22 :0 0 23 :0 0 8: 00 7: 00 6: 00 5: 00 4: 00 3: 00 2: 00 1: 00 0 0: 00 # Of Requests 2000 Time Low load between 2 am to 8 am Exploit this low load cycle for pre-seeding (Source: Zebroid: using IPTV data to support peer-assisted VoD content delivery, NOSSDAV 2009) 13 Challenges in pre-seeding Peers must be providercontrolled to avoid churn (e.g., cable STBs) BitTorrent Peers System must consider capacity and bandwidth constraints of nodes, as well as object popularity Heavily replicate most popular objects Pre-seed less popular objects only if capacity remains Tracker 14 Pre-seeding strategies Random seeding Popularity-weighted random Optimal strategy Optimization framework with objective function (minimize server load) and constraints (bandwidth and capacity utilization at nodes) 15 Outline Introduction and Contributions Background Optimization Formulation Results and Conclusion 16 Optimization Formulation 17 Optimization Formulation Objective function Object i is present at node j Probability of request for object i Probability that node j is serving object i Function tries to minimize the probability a request is served by the server Constraint Node j is free to serve an object Probability there are n requests Node is free if it does not server any object over all possible requests Optimization Formulation Constraint Probability that node j is serving object i Node j will serve object i if • object i is present at node j • node j is free • no lower-numbered node can serve the object Constraint Capacity of node Formulation Simplification Poisson request arrival (rate of arrival λ and service time t) and Taylor series expansion Simplify Equation (2) to Linearize Equation(3) to Unlike real BitTorrent, self-service keeps a node 20 busy Outline Introduction and Contributions Background Optimization Formulation Results and Conclusion 21 Experimental Evaluation Optimization problem solved using GAMS/BARON Time limit leads to good but not always optimal solution Extended BitTorrent simulator developed by Bharambe et.al., from Microsoft Research No tit-for-tat, choking/unchoking “In-order” piece selection Pre-seeding capacity of 2 streams per STB 22 Experimental Evaluation Upload rate of 1 Mbps Download rate of 22 Mbps Bit rate of 2 Mbps Movie stream size of 1GB 40 clients and 120 streams Movie popularity Zipf curve with α = 1 Three pre-seeding strategies Solver solution for different time limit – 4 and 2.5 hours 23 Solver allocation for 4 hour limit 24 Full Load 25 Varying load level in the network 26 Load Balance – Optimized Standard Deviation of 30 chunks Max to Min Ratio: 2.8 27 Load Balance – Weighted Random Standard Deviation of 38 chunks Max to Min Ratio : 6 28 System Robustness Reduced problem size (25 nodes and 50 objects) for fully optimal solution 29 Summary Provider-controlled pre-seeding can help to reduce server load beyond basic P2P For best impact, must consider node capacity and bandwidth constraints, as well as object popularity Constrained optimization framework shown to outperform heuristics Reduction in server load up to 50% System robust to external events 30 Questions?