Network Modeling (NetMod): Wed. 8:45am-12:00pm
Instructor: Thrasyvoulos Spyropoulos
Thrasyvoulos Spyropoulos / spyropoul@eurecom.fr
Eurecom, Sophia-Antipolis

A Few Words About Your Teacher
1995-2000: Undergraduate studies in Greece
  National Technical University of Athens (NTUA)
  Specialization: Telecommunications and Networking
2000-2006: MSc and PhD in Los Angeles, California
  University of Southern California (USC)
  Thesis: Performance Analysis and Protocols for Wireless Networks
2006-2007: INRIA, Sophia-Antipolis
  Post-doc in the Planete group
2007-2010: ETH Zurich
  Senior Researcher/Lecturer
2010-present: EURECOM
  Tenured Professor

A Few Words About the Class
Goal: to teach some mathematical tools that are valuable when trying to understand real networks… in fact, real systems!
STOCHASTIC PROCESSES: learn to deal with randomness
NETWORK SCIENCE: modeling large networks
APPLICATIONS: cloud computing, web server farms, LTE/4G networks

Why Randomness?
Many things in nature are random => the same holds for networks.
More precisely: increased complexity is described as randomness.
Randomness in:
• Propagation phenomena (coding, diversity)
• Location and mobility of nodes (handoff)
• Traffic/service arrival patterns (cellular capacity allocation)
• Next link to be clicked on a webpage (browser prefetch)
• Size of files downloaded (cache sizing)
• Computing job arrivals and duration (cloud computing)
• Number of (Facebook) friends per user (advertising)
• …
Computer Science approach: • design the algorithm • deal with the worst case
Electrical Engineering approach: • optimize for probable cases • ignore rare events

Why Large Networks?
Examples: network of Internet routers, online social networks (Facebook), mesh networks.
Most networks can be modeled as a large graph.

Need a Science of (Social/Large/Complex) Networks
It is difficult to study/model a specific graph.
Specific graph: an instance of a random graph with specific qualitative properties.
“Complex/Social Network Analysis” or “Network Science”: the study of qualitative properties of large graphs/networks
• Degree distribution, diameter, connectivity, clusters
(a) WHY do these properties arise? (Scientist)
(b) HOW can they be exploited? (Engineer)
• Degree distribution => searching, security
• Clustering => advertising, information spreading

Where Do They All Come Together
Many interesting problems on networks can be modeled as a random process on a (random) graph:
• Searching
• Resource allocation in 4G/5G
• Malware spread
A timely course! Few courses around cover these topics:
• Performance analysis classes from Caltech and Carnegie Mellon University
• Complex/social networks classes from Caltech and Cornell University

Course Textbooks - Reading Material
PART I: Stochastic Processes and Queueing
• “Performance Modeling and Design of Computer Systems” by Mor Harchol-Balter – shared copies in library – online: http://proquestcombo.safaribooksonline.com/book/electricalengineering/computer-engineering/9781139610834
• “Stochastic Processes” by S. Ross – library
• “Introduction to Probability Models” by S. Ross – library
PART II: Complex Network Analysis
• “Networks, Crowds, and Markets: Reasoning About a Highly Connected World” by D. Easley and J. Kleinberg – pdf freely available online
• “Networks: An Introduction” by M. Newman – shared copy in library
Additional reference material (tutorials, articles) per topic

Course Evaluation (Grading)
Regular homeworks --- 20% of grade (among them 1-2 “lab” sessions)
Midterm exam (after Part I) --- 30% of grade
Final exam --- 50% of grade
Participation --- extra credit!
Office hours: TBD
Class web site: http://www.eurecom.fr/~spyropou/netmod2015.htm

Course Expectations
What to expect from me:
• To make the class entertaining: many examples and applications
• Interaction, interaction, interaction!
• To teach you key insights, not just “tools”: when to use which tool, and why it works
What I expect from you:
• To study the assigned material every week
• To work hard on your homeworks
• Interaction, interaction, interaction!!!

Course Prerequisites
Introductory probability theory
• Distributions (Bernoulli, geometric, binomial, Gaussian, Poisson)
• Expectations, variance, etc.
• Conditional probabilities and expectations
• Independence and correlation
• Review reference: “Introduction to Probability Models” or “A First Course in Probability” by Sheldon Ross – available in library
(Very!) elementary linear algebra
• Matrix multiplication
• Solving linear systems
• Eigenvalues
• Check out Gilbert Strang’s online lectures for a refresher (excellent!)
A tiny bit of MATLAB

Probability Refresher: Playing the Odds
Betting on the roulette: 18 red, 18 black, 2 green.
John observes the roulette and counts 5 reds in a row.
Q: In the next roll, should he bet on red or black?
Q: What if John sees 20 reds in a row??
John: “I should bet on black! 21 reds in a row are VERY unlikely!”
James: “No, it makes no difference! Rolls are independent!”
Q: Prob{20 reds in a row followed by a black}? Prob{21 reds in a row}?
(Both equal (18/38)^21: rolls are independent and Prob{red} = Prob{black} = 18/38.)
“On August 18, 1913, at the casino in Monte Carlo, black came up a record twenty-six times in succession in roulette… There was a near-panicky rush to bet on red, beginning about the time black had come up a phenomenal fifteen times. …players doubled and tripled their stakes, led to believe after black came up the twentieth time that there was not a chance in a million of another repeat. In the end the unusual run enriched the Casino by some millions of francs.”

Probability Refresher: Change or Not?
1. Pick a door: 1 door has a prize! The other 2 have goats.
Q: A door with a goat is opened: do you want to change your chosen door, or stay with the one you have?
A: Switching gives you a 2/3 chance to win!
Look up the Monty Hall problem – conditional probabilities

Probability Refresher: You Be The Judge
A terrible crime has occurred in city A, and John is one suspect.
DNA matching John’s is found at the crime scene; this is the only evidence against John.
The DNA of two different people matches with a 1 in a million chance.
The prosecutor and jury conclude John is guilty.
Q: Were they right?
City A has about 10 million people.
Q: What is the chance that John is innocent?
A: John is innocent with a 90% chance!!!
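The 2/3 answer for switching can be checked by brute force. The sketch below enumerates every equally likely (prize, pick, opened door) case with exact fractions. It assumes the standard rules of the game: the host always opens a goat door different from your pick, choosing uniformly when two such doors exist. The function name `monty_hall_win_probability` is ours, for illustration only.

```python
from fractions import Fraction

DOORS = (0, 1, 2)

def monty_hall_win_probability(switch: bool) -> Fraction:
    """Exact P(win), summing over every (prize, pick, opened door) case."""
    total = Fraction(0)
    for prize in DOORS:                  # prize placed uniformly at random
        for pick in DOORS:               # contestant picks uniformly at random
            # Host opens a goat door that is not the contestant's pick;
            # with two such doors he chooses uniformly between them.
            goat_doors = [d for d in DOORS if d not in (pick, prize)]
            for opened in goat_doors:
                weight = Fraction(1, 3) * Fraction(1, 3) * Fraction(1, len(goat_doors))
                if switch:
                    # Move to the only door that is neither picked nor opened
                    final = next(d for d in DOORS if d not in (pick, opened))
                else:
                    final = pick
                if final == prize:
                    total += weight
    return total

print("stay:  ", monty_hall_win_probability(switch=False))
print("switch:", monty_hall_win_probability(switch=True))
```

Running it prints 1/3 for staying and 2/3 for switching: staying wins only if the first pick was right, while switching wins whenever it was wrong.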
BAYES RULE: $P[\text{Innocent} \mid \text{Evidence}] = \frac{P[\text{Evidence} \mid \text{Innocent}]\,P[\text{Innocent}]}{P[\text{Evidence}]}$
(Intuition: among 10 million people, about 10 besides the culprit match by chance, so P[Innocent | Evidence] ≈ 10/11 ≈ 0.9.)

Probability Refresher: Network Bootstrapping
Sensor node bootstrapping: each node has a unique ID (e.g., x.y.z).
Goal: each node needs to broadcast its ID to all other nodes.
Protocol:
• Node X picks a slot n uniformly in [1, N]: P{n = i} = 1/N
• It broadcasts its ID in slot n
• SUCCESS: no other node picked n
• COLLISION: 2 or more nodes picked n; these nodes fail and stay “off”
Tradeoff: low N => many collisions || high N => long delay
Q: With 30 nodes, what is the minimum N such that P{collision} < 10%?
A: 200? 500? 1000? > 4500? 5000? (look up the “birthday paradox”)

Why Modeling and Performance Analysis?
Amazon Cloud: traffic from different clients
• Throughput? Delay per client? Best algorithm?
• E.g., resource allocation (CPU/network/memory)
Identify bottlenecks: improve Eurecom WiFi:
a) Install more access points? (network is too sparse)
b) Propose a better channel selection algorithm? (high interference)
Knowing performance analysis can make you good consultants :-)
“All models are wrong, but some are useful” – statistician George Box

Upgrading the Company’s Web Server
Your company has just received very positive publicity.
As a result, the incoming load (arrival rate of job requests per second) to the web server is expected to double.
Your boss tells you that you need to upgrade the server with a faster one (higher service rate μ, in jobs served per second) to ensure the same mean response time E[T].
Q: How much should you increase the service rate?
a) Double the server speed?
b) More than double the speed?
c) Less than double the speed?

Supermarket Queues and FDMA vs.
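The bootstrapping question is the birthday paradox in disguise: there is no collision only if all 30 nodes pick distinct slots, which happens with probability (N/N)·((N-1)/N)···((N-29)/N). A minimal sketch, assuming nodes choose their slots independently and uniformly as in the protocol above; the helper names `collision_probability` and `min_slots` are hypothetical.

```python
def collision_probability(nodes: int, slots: int) -> float:
    """P{at least two nodes pick the same slot}, each node uniform over `slots` slots."""
    p_distinct = 1.0
    for k in range(nodes):
        # the (k+1)-th node must avoid the k slots already taken
        p_distinct *= max(slots - k, 0) / slots
    return 1.0 - p_distinct

def min_slots(nodes: int, target: float) -> int:
    """Smallest N with P{collision} < target; P{collision} only decreases as N grows."""
    n = nodes  # with fewer slots than nodes a collision is certain
    while collision_probability(nodes, n) >= target:
        n += 1
    return n

for n in (200, 500, 1000, 5000):
    print(f"N = {n:5d}: P{{collision}} = {collision_probability(30, n):.3f}")
print("minimum N for P{collision} < 10%:", min_slots(30, 0.10))
```

For 30 nodes the search lands a little above 4,100 slots, more than a hundred times the number of nodes, which is the usual birthday-paradox surprise.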
CSMA
OPTION 1: 3 slow cashiers, 3 lines; customers randomly choose a line and stay there. Each cashier serves 10 customers per hour.
OPTION 2: 1 fast cashier, single line; it serves 30 customers per hour.

Supermarket Queues and FDMA vs. CSMA
Q: Which option has the smallest waiting time: Option 1 or Option 2?
A: Option 2 is 3x faster!

Supermarket Queues and FDMA vs. CSMA (2)
Q: Which option has the smallest waiting time: Option 2 or Option 3?
A: Similar delay (for high load).
Low load: Option 2 or Option 3?
A: Option 2, up to 3x faster.

Supermarket Queues and FDMA vs. CSMA (3)
How does all this relate to networking??
OPTION 1: FDMA (600 kHz in total)
• Separate 200 kHz channel for each flow
• Flows do not compete
OPTION 2: CSMA (600 kHz in total)
• Each node senses the channel first
• If idle: transmit the packet using the full 600 kHz
• If busy: queue (wait)
Q: Which option would you prefer for data?
Q: Which option would you prefer for voice?

Google’s PageRank: Searching for Web Pages
What lies behind this simple box??
Searching the Web:
Step 1: Crawl all web pages and create an index of {keywords, web pages}
Step 2: The user enters keywords (e.g., “Network Modeling”)
Step 3: Google finds all web pages matching these keywords
(these 3 steps are generic to almost every search engine)
Step 4: Return a list of matches ranked by importance. HOW???
Intuition 1: a page is important if many web pages refer to it
Intuition 2: a page is important if important web pages refer to it
Solution: the PageRank algorithm solves an appropriate Markov chain

The Celebrated (and Demonized!)
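One way to check the three supermarket answers is to model each option as a Markovian queue. The sketch assumes Poisson arrivals and exponential service times (M/M/1 and M/M/c models, which the slides do not specify), and it assumes that Option 3, shown only in the slide figure, is a single line feeding the three slow cashiers. In Option 1 customers pick a line at random, so each line sees a Poisson stream of rate λ/3. All function names are illustrative.

```python
import math

def mm1_response_time(lam: float, mu: float) -> float:
    """Mean time in system E[T] = 1 / (mu - lam) for an M/M/1 queue."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu")
    return 1.0 / (mu - lam)

def mmc_response_time(lam: float, mu: float, c: int) -> float:
    """Mean time in system for an M/M/c queue, via the Erlang C formula."""
    a = lam / mu                      # offered load in Erlangs
    rho = a / c                       # per-server utilization
    if rho >= 1:
        raise ValueError("unstable queue: need lam < c * mu")
    tail = a ** c / math.factorial(c) / (1 - rho)
    head = sum(a ** k / math.factorial(k) for k in range(c))
    p_wait = tail / (head + tail)     # Erlang C: probability an arrival must wait
    return p_wait / (c * mu - lam) + 1.0 / mu

def supermarket(lam: float) -> dict:
    """E[T] in hours for each option; lam = total arrivals in customers/hour."""
    return {
        # Option 1: random line choice splits the stream into 3 streams of lam/3
        "option1": mm1_response_time(lam / 3, 10.0),
        # Option 2: one fast cashier (30/hour), single line
        "option2": mm1_response_time(lam, 30.0),
        # Option 3 (assumed layout): single line, 3 slow cashiers (10/hour each)
        "option3": mmc_response_time(lam, 10.0, 3),
    }

for lam in (1.0, 15.0, 29.0):
    t = supermarket(lam)
    print(f"lam={lam:4.1f}/h  " + "  ".join(f"{k}={v * 60:6.2f} min" for k, v in t.items()))
```

Under these assumptions Option 1 is exactly 3 times slower than Option 2 at every load, since 1/(10 - λ/3) = 3/(30 - λ). Option 3 approaches Option 2 as λ nears 30 customers/hour and is about 3 times slower as λ goes to 0, which matches the answers on the slides.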
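The Markov chain behind PageRank can be sketched as a “random surfer” who, at each step, follows a random out-link of the current page; a page's rank is the stationary probability of being there. The version below adds the commonly used damping (teleport) factor of 0.85 and spreads the rank of pages without out-links uniformly. Both choices are assumptions, since the slide does not give the exact chain. The stationary distribution is computed by power iteration, and the 4-page graph is hypothetical.

```python
def pagerank(links: dict, damping: float = 0.85, tol: float = 1e-12,
             max_iter: int = 1000) -> dict:
    """Stationary distribution of the random-surfer Markov chain, by power iteration.

    links maps each page to the list of pages it links to. With probability
    `damping` the surfer follows a uniformly chosen out-link; otherwise it jumps
    to a uniformly chosen page. Pages with no out-links always jump.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}           # start from the uniform distribution
    for _ in range(max_iter):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # dangling page: spread its rank uniformly over all pages
                for q in pages:
                    new[q] += damping * rank[p] / n
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank

# Hypothetical 4-page web graph, for illustration only
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C ranks first because three pages point to it (intuition 1), and A ranks above B because its only in-link comes from the top-ranked page C (intuition 2).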
Poisson Process

Arrivals of Customers/Packets: How to Model?
Iceland volcano: why could you not talk to airline customer service?
New Year’s Eve: why can I not call my relatives?
Problem 1: Call Center Dimensioning
• Customers call randomly
• Assume (for now!) that the duration of each call is fixed
• N workers: if all are busy, the call is dropped
Question: What should N be to ensure that at most 5% of calls are dropped?
• Case 1: calls arrive regularly (one every X min)
• Case 2: calls arrive in bursts (many together, then silence)

Arrivals of Customers/Packets: How to Model?
Problem 2: Internet Router Buffer Sizing
• Packets arrive at a core router
• They need to be buffered before being forwarded further
Question: How large should the buffers be (to ensure few drops)?

Lesson: Arrival Models => System Design
We need to know/model the (random) arrival of “work” => to optimize the system!
• Calls at a call center => to pick the number of employees
• Calls to a base station (inside a cell) => to allocate frequencies
• Packets at a router => to choose the right buffer size
• (Large) jobs at a cluster/supercomputer => to choose the number of CPUs
What might we need to know?
• Average amount of work per min/hour/day
• Probability of 3, 4, 5 customers arriving within T min
• Probability that > N customers arrive within T min

Poisson Distribution
Rate of events λ: the average number of events in an interval
Probability of n events in an interval: $P(n) = \frac{\lambda^n e^{-\lambda}}{n!}$
Examples well approximated by the Poisson distribution:
• The number of deaths per year in a given age group
• The number of phone calls arriving at a call centre per minute
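The “what might we need to know?” questions reduce to evaluating the Poisson formula. A small sketch, assuming the number of arrivals in a window of length T is Poisson with mean λT; the rate of 2 calls/min, the 5-minute window and the 5% target are hypothetical numbers. `smallest_capacity` returns the smallest N with P{more than N arrivals} < 5%, a crude per-window sizing rule rather than a full call-center (loss system) analysis.

```python
import math

def poisson_pmf(n: int, lam: float) -> float:
    """P(n) = lam**n * exp(-lam) / n!, computed in log space to avoid overflow."""
    if lam == 0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def poisson_tail(n: int, lam: float) -> float:
    """P(more than n events) = 1 - sum_{k=0}^{n} P(k)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(n + 1))

def smallest_capacity(lam: float, eps: float) -> int:
    """Smallest N such that P(more than N events in the window) < eps."""
    n = 0
    while poisson_tail(n, lam) >= eps:
        n += 1
    return n

# Hypothetical numbers: 2 calls/min on average, window T = 5 min -> mean 10
mean = 2.0 * 5
for k in (3, 4, 5):
    print(f"P({k} arrivals in 5 min) = {poisson_pmf(k, mean):.4f}")
print("N with P(> N arrivals) < 5%:", smallest_capacity(mean, 0.05))
```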
• The number of new sessions arriving at a web server per hour
• The number of soldiers killed by horse-kicks each year in each corps in the Prussian cavalry
• …

Poisson Distribution
[Figure: Poisson distribution]

Poisson Process (Definition 1)
[Figure: time axis with two non-overlapping intervals T1 and T2]
Definition 1: a counting process viewpoint
Property 1 (“independent increments”): the numbers of arrivals in non-overlapping intervals (e.g., N(T1) and N(T2)) are independent
Property 2 (“stationary increments”): the number of arrivals in [t1, t2] depends only on (t2 - t1)
Property 3: the number of arrivals N(t) in an interval of length t is Poisson(λt):
$\text{Prob}\{N(t) = n\} = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$

Poisson Process: 2 More Definitions
Definition 2: a renewal process viewpoint
• Inter-arrival times are independent
• The time T between arrivals (“renewals”) is exponential(λ): $\text{Prob}\{T \le t\} = 1 - e^{-\lambda t}$
Definition 3: “aggregate of many rare events”
• $\text{Prob}\{1 \text{ event in } dt\} = \lambda\,dt + o(dt)$ (independently of past events)
• $\text{Prob}\{> 1 \text{ events in } dt\} = o(dt)$ (negligible as dt -> 0)

All Definitions are Equivalent!!
We can go (prove) from any definition to any other.
Definition 1 (Poisson) => Definition 2 (Exponential):
$\text{Prob}\{T > t\} = \text{Prob}\{0 \text{ events in } t\} = \text{Prob}\{N(t) = 0\} = \frac{(\lambda t)^0 e^{-\lambda t}}{0!} = e^{-\lambda t}$, so T is exponential.
Definition 1 (Poisson) => Definition 3 (rare events):
$\text{Prob}\{N(dt) = 0\} = e^{-\lambda dt} = 1 - \lambda dt + \frac{(\lambda dt)^2}{2!} - \dots = 1 - \lambda dt + o(dt)$
$\text{Prob}\{N(dt) = 1\} = \frac{\lambda dt}{1!} e^{-\lambda dt} = \lambda dt \left(1 - \lambda dt + \frac{(\lambda dt)^2}{2!} - \dots\right) = \lambda dt + o(dt)$
$\text{Prob}\{N(dt) > 1\} = 1 - \text{Prob}\{N(dt) = 1\} - \text{Prob}\{N(dt) = 0\} = o(dt)$
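Definitions 1 and 2 can also be compared numerically: generate arrivals from IID exponential(λ) gaps (Definition 2) and check that the count N(t) behaves like Poisson(λt) (Definition 1). The sketch checks only the mean and variance (both should equal λt) and P{N(t) = 0} = e^{-λt}; the values λ = 2, t = 3 and the seed are arbitrary choices.

```python
import math
import random

def arrivals_in_window(lam: float, t: float, rng: random.Random) -> int:
    """Count arrivals in [0, t] when inter-arrival times are IID exponential(lam)."""
    count, clock = 0, rng.expovariate(lam)
    while clock <= t:
        count += 1
        clock += rng.expovariate(lam)
    return count

rng = random.Random(42)                 # fixed seed so the run is reproducible
lam, t, runs = 2.0, 3.0, 100_000
counts = [arrivals_in_window(lam, t, rng) for _ in range(runs)]
mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / (runs - 1)
p0 = counts.count(0) / runs
print(f"mean={mean:.3f} var={var:.3f} (both should be near lam*t={lam * t})")
print(f"P(N(t)=0)={p0:.4f} vs exp(-lam*t)={math.exp(-lam * t):.4f}")
```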
Poisson as a Binomial Approximation
Divide an interval of length t into small slots of length δt; in each slot, P{arrival} = λ·δt.
The number of arrivals N(t) in t is then Binomial(n, p), with n = t/δt and p = λ·δt + o(δt).
If δt -> 0 such that np = λt, then Binomial(n, p) -> Poisson(λt).

Poisson Properties: Waiting Time to n-th Arrival
The time to wait until the next arrival (T1) is exponential.
What is the time to wait until the n-th arrival, Sn = T1 + T2 + … + Tn?
Sn is the sum of n independent and identically distributed (IID) exponential random variables => Gamma distribution, with pdf
$f_{S_n}(t) = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!}$
How to get this?
Proof 1: moment generating function
Proof 2: (CDF) $F_{S_n}(t) = \text{Prob}\{S_n \le t\} = \text{Prob}\{N(t) \ge n\}$ (Ross, Ch. 2)
Proof 3: $\text{Prob}\{t < S_n < t + dt\} = \text{Prob}\{n-1 \text{ events in } t,\ 1 \text{ event in } (t, t+dt)\}$ (Ross)

Poisson Properties: 1 Arrival in a Window T
We are told that 1 arrival has occurred in the interval [0, T].
Question: When did it happen exactly?
NOTE: this is a conditional probability.
Answer: the arrival time is uniformly distributed; any instant in the interval is equally probable:
$\text{Prob}\{S_1 \le s \mid N(T) = 1\} = s/T, \quad 0 \le s \le T$

Poisson Properties: N Arrivals in a Window T
Given n arrivals in [0, T], each arrival is uniformly distributed over the window: the arrival times S1 < S2 < S3 < … are distributed like n independent Uniform(0, T) points, sorted.

An Example: Energy-Efficiency
[Figure: timeline; the node sleeps and wakes up at T, 2T, …]
A sensor node receives event readings (sensed data) as a Poisson process with rate λ; they must be sent to a base station.
To save battery power:
(a) the wireless card is in sleep mode,
(b) events are queued during sleep mode,
(c) the node wakes up every T minutes and transmits all queued events.
QoS: an event queued for time t incurs a queueing cost c(t) = ct.
Q: What is the (expected) total cost incurred in each period T?
A: 0.5 · c · λT²
Q: Assume battery consumption is a(T) = a/T. What is the optimal T?
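The 0.5·c·λT² answer follows because about λT events arrive per period and, by the uniform-arrival property, each waits T/2 on average before the wake-up. The sketch below checks it by simulation and then answers the last question under one reading of the battery model: it assumes a(T) = a/T is an energy cost per unit time and adds it to the queueing cost per unit time, 0.5·c·λT, which gives T* = sqrt(2a/(cλ)). A different reading of a(T) would give a different T*; the parameter values are hypothetical.

```python
import math
import random

def simulated_period_cost(lam: float, c: float, T: float, rng: random.Random,
                          periods: int = 20_000) -> float:
    """Average queueing cost per sleep period of length T (linear cost c per unit wait)."""
    total = 0.0
    for _ in range(periods):
        clock = rng.expovariate(lam)          # first event of the period
        while clock <= T:
            total += c * (T - clock)          # event waits until the wake-up at T
            clock += rng.expovariate(lam)     # exponential gap to the next event
    return total / periods

def cost_rate(T: float, lam: float, c: float, a: float) -> float:
    """Assumed cost per unit time: queueing (0.5*c*lam*T**2) / T plus battery a/T."""
    return 0.5 * c * lam * T + a / T

def optimal_period(lam: float, c: float, a: float) -> float:
    """Minimizer of cost_rate: derivative 0.5*c*lam - a/T**2 = 0."""
    return math.sqrt(2 * a / (c * lam))

rng = random.Random(7)                        # hypothetical parameters below
lam, c, T, a = 4.0, 1.0, 2.0, 8.0
print("simulated cost per period:", round(simulated_period_cost(lam, c, T, rng), 2))
print("formula 0.5*c*lam*T^2:    ", 0.5 * c * lam * T ** 2)
print("optimal T under the assumed battery model:", optimal_period(lam, c, a))
```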
Poisson Thinning/Sampling
Assume Poisson arrivals with rate λ.
A second random process is created as follows: we accept each arrival with probability p < 1 (or reject it with probability 1-p).
Question: What is the expected number of accepted arrivals within T?
Answer: p·λT
Question: What is the second process?
Answer: Poisson with rate pλ
Proof?

Poisson Thinning Examples
Load balancer: new jobs arrive as a Poisson process with rate λ; the load balancer assigns each job to CPU j with probability pj (CPU1: p1, CPU2: p2, CPU3: p3, CPU4: p4).
Q: What is the input process to CPU j?
A: Poisson with rate pj·λ
www.movie-clips.com: 1 slow and 1 fast server; new jobs arrive at a scheduler as a Poisson process with rate λ.
• Job size < S (e.g., short clips) => slow server
• Job size ≥ S (e.g., long movies) => fast server
The job size x is random, with CDF F(x).
Q: Is the input process to the slow server Poisson?

Poisson Process Merging
Ethernet packets from each base station arrive as Poisson processes with rates λ1, λ2, λ3.
Question: What is the arrival process of ALL packets at the input of the Ethernet switch?
Answer: Poisson with rate λ1 + λ2 + λ3
In general, for independent processes: rate λ1 + rate λ2 => Poisson(λ1 + λ2)

Compound Poisson Process
Definition: a stochastic process [X(t), t ≥ 0] is a compound Poisson process if
$X(t) = \sum_{i=1}^{N(t)} Y_i$
where [N(t), t ≥ 0] is a Poisson process and [Yi, i ≥ 1] is a family of IID random variables, independent of N(t).
Results:
1) E[X(t)] = λt·E[Y1]
2) Var(X(t)) = λt·E[Y1²]
Example (map of open WiFi access points (APs)):
• A user uploads (a large number of) pictures to Dropbox using WiFi only
• The user walks around randomly and encounters APs as a Poisson process with rate λ
• The bytes uploaded during each WiFi contact are random, Yi: they depend on speed, congestion, distance
Q: How long until all pictures are uploaded?
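The thinning, merging and compound Poisson results can be sanity-checked by Monte Carlo. The sketch below compares sample means and variances with the predicted values (for a Poisson count, mean = variance): thinning a rate-5 process with p = 0.3, merging independent processes with rates 1, 2 and 3, and a compound Poisson process whose increments Yi are, as a hypothetical choice, Uniform(0, 2), so that E[Y1] = 1 and E[Y1²] = 4/3. Matching two moments supports the claims but does not prove them; all rates and the seed are arbitrary.

```python
import random

def poisson_count(lam: float, t: float, rng: random.Random) -> int:
    """Number of arrivals in [0, t] of a rate-lam Poisson process (exponential gaps)."""
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:
        n += 1
        clock += rng.expovariate(lam)
    return n

def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(2015)
runs, t = 50_000, 1.0

# Thinning: keep each arrival of a rate-5 process with probability p = 0.3
lam, p = 5.0, 0.3
thinned = [sum(rng.random() < p for _ in range(poisson_count(lam, t, rng)))
           for _ in range(runs)]

# Merging: superpose independent processes with rates 1, 2 and 3
rates = [1.0, 2.0, 3.0]
merged = [sum(poisson_count(r, t, rng) for r in rates) for _ in range(runs)]

# Compound Poisson: X(t) = Y_1 + ... + Y_N(t), hypothetical Y_i ~ Uniform(0, 2)
lam_c = 4.0
compound = [sum(rng.uniform(0.0, 2.0) for _ in range(poisson_count(lam_c, t, rng)))
            for _ in range(runs)]

print("thinned :", mean_var(thinned), "expected mean = var =", p * lam * t)
print("merged  :", mean_var(merged), "expected mean = var =", sum(rates) * t)
print("compound:", mean_var(compound), "expected", (lam_c * t * 1.0, lam_c * t * 4 / 3))
```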
Poisson Process: Why We Like It
Memoryless property: simplifies models
• No need to know/keep track of the past to predict the future
• Stationary behavior is sufficient!
Good approximation for the aggregate “traffic” of many independent sources
• Palm-Khintchine theorem
Why we don’t like it: it is not always true
• Many workloads have “heavy-tailed” properties (memory)