PPT - Eurecom

Network Modeling (NetMod): Wed. 8:45-12:00am
Instructor: Thrasyvoulos Spyropoulos
Thrasyvoulos Spyropoulos / spyropoul@eurecom.fr
Eurecom, Sophia-Antipolis
A Few Words About Your Teacher
1995-2000: Undergraduate studies in Greece
National Technical University of Athens (NTUA)
 Specialization: Telecommunications and Networking

2000-2006: MSc and PhD in Los Angeles, California
University of Southern California (USC)
 Thesis: Perf. Analysis and Protocols for Wireless Networks

2006-2007: INRIA, Sophia-Antipolis
 Post-doc at Planete group

2007-2010: ETH, Zurich
 Senior Researcher/Lecturer

2010-present: EURECOM
 Tenured Professor
A Few Words About the Class
Goal: to teach some mathematical tools that are valuable when
trying to understand real networks (in fact, real systems!)
STOCHASTIC PROCESSES: Learn to deal with Randomness
NETWORK SCIENCE: Modeling Large Networks
APPLICATIONS: Cloud Computing, LTE/4G Networks, Web Server Farms
Why Randomness?
 Many things in nature are random => same for networks
   - more precisely: increased complexity => described as randomness
 Randomness in:
   - Propagation phenomena (coding, diversity)
   - Location and mobility of nodes (handoff)
   - Traffic/Service arrival patterns (cellular capacity allocation)
   - Next link to be clicked on a webpage (browser prefetch)
   - Size of files downloaded (cache sizing)
   - Computing job arrivals and duration (cloud computing)
   - Number of (Facebook) friends per user (advertising)
   - …
Computer Science Approach
• Design algorithm
• Deal with worst case
Electrical Engineering Approach
• Optimize for probable cases
• Ignore rare events
Why Large Networks?
Examples: Network of Internet Routers, Online Social Nets (Facebook), Mesh Networks
 Most networks can be modeled as a large graph
Need a Science of (Social/Large/Complex) Networks
 Difficult to study/model a specific graph
 Specific graph: an instance of a random graph with specific qualitative properties
 “Complex/Social Network Analysis” or “Network Science” => the study of qualitative properties of large graphs/networks
   - Degree distribution, diameter, connectivity, clusters
   - (a) WHY do these properties arise? (Scientist)
   - (b) HOW can they be exploited? (Engineer)
 Degree distribution => searching, security
 Clustering => advertising, information spreading
Where Do They All Come Together?
 Many interesting problems on networks can be modeled as a Random Process on a (Random) Graph
   - Searching
   - Resource Allocation in 4G/5G
   - Malware Spread
 A timely course! Few courses around on these topics
   - Performance Analysis classes from Caltech and Carnegie Mellon University
   - Complex/Social Networks classes from Caltech and Cornell University
Course Textbooks - Reading Material
 PART I: Stochastic Processes and Queueing
   - “Performance Modeling and Design of Computer Systems” by Mor Harchol-Balter – shared copies in library – online:
     http://proquestcombo.safaribooksonline.com/book/electricalengineering/computer-engineering/9781139610834
   - “Stochastic Processes” by S. Ross – library
   - “Introduction to Probability Models” by S. Ross – library
 PART II: Complex Network Analysis
   - “Networks, Crowds, and Markets: Reasoning About a Highly Connected World” by D. Easley and J. Kleinberg – pdf freely available online
   - “Networks: An Introduction” by M. Newman – shared copy in library
 Additional reference material (tutorials, articles) per topic
Course Evaluation (Grading)
 Regular Homeworks --- 20% of Grade
   - Among them 1-2 “lab” sessions
 Midterm Exam (after Part I) --- 30% of Grade
 Final Exam --- 50% of Grade
 Participation --- extra credit!
 Office Hours: TBD
 Class Web Site:
http://www.eurecom.fr/~spyropou/netmod2015.htm
Course Expectations
What to expect from me:
 To make the class entertaining
   - Many examples and applications
   - Interaction Interaction Interaction!
 To teach you key insights
   - Not just “tools” => when to use which tool => why it works
What I expect from you:
 To study the assigned material every week
 To work hard on your homeworks
 Interaction Interaction Interaction!!!
Course Prerequisites
 Introductory Probability Theory
   - Distributions (Bernoulli, geometric, binomial, Gaussian, Poisson)
   - Expectations, Variance, etc.
   - Conditional probabilities and expectations
   - Independence and Correlation
   - Review Reference: “Introduction to Probability Models” or “A First Course in Probability” by Sheldon Ross – available in library
 (very!) Elementary Linear Algebra
   - Matrix multiplication
   - Solving Linear Systems
   - Eigenvalues
   - Check out Gilbert Strang’s online lectures for a refresher (excellent!)
 A tiny bit of MATLAB
Probability Refresher: Playing the Odds
 Betting on the Roulette
   - 18 red
   - 18 black
   - 2 green
 John observes the roulette and counts 5 reds in a row
Q: In the next roll should he bet on red or black?
Q: What if John sees 20 reds in a row??
John: “I should bet on black! 21 reds in a row are VERY unlikely!”
James: “No, it makes no difference! Rolls are independent!”
Q: Prob{20 reds in a row followed by a black}?
   Prob{21 reds in a row}?
On August 18, 1913, at the casino in Monte Carlo, black came up a record twenty-six times in succession in roulette… There was a near-panicky rush to bet on red, beginning about the time black had come up a phenomenal fifteen times. …players doubled and tripled their stakes, led to believe after black came up the twentieth time that there was not a chance in a million of another repeat. In the end the unusual run enriched the Casino by some millions of francs.
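The two probabilities in the last question can be compared with exact arithmetic. This is a minimal sketch, assuming a fair wheel with the 38 slots listed above and independent rolls; the function name `prob_sequence` is only illustrative.

```python
from fractions import Fraction

P_RED = Fraction(18, 38)    # 18 red slots out of 38
P_BLACK = Fraction(18, 38)  # 18 black slots out of 38

def prob_sequence(colors):
    """Probability of one exact sequence of independent rolls ('R' or 'B')."""
    p = Fraction(1)
    for c in colors:
        p *= P_RED if c == "R" else P_BLACK
    return p

p_20_reds_then_black = prob_sequence("R" * 20 + "B")
p_21_reds = prob_sequence("R" * 21)

print(float(p_20_reds_then_black), float(p_21_reds))
print(p_20_reds_then_black == p_21_reds)  # both sequences are equally likely
```

Because red and black have the same number of slots, any specific sequence of 21 red/black outcomes has the same probability, which is James's point.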
Probability Refresher: Change or Not?
1. Pick a door: 1 door has a prize! The other 2 have goats
Q: A door with a goat is opened: do you want to change your chosen door, or stay with the one you have?
A: Switching gives you a 2/3 chance to win! Look up the Monty Hall problem – Conditional probabilities
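The 2/3 figure can be checked empirically. Below is a small simulation sketch; it assumes the standard Monty Hall rules (the host knows where the prize is and always opens a goat door other than the player's pick). The function names, trial count and seed are illustrative choices.

```python
import random

def monty_hall_trial(switch, rng):
    """Play one game; return True if the player ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a goat door that is neither the player's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(monty_hall_trial(switch, rng) for _ in range(trials)) / trials

print("switch:", win_rate(True))
print("stay:  ", win_rate(False))
```

With enough trials the two estimates should settle near 2/3 and 1/3 respectively.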
Probability Refresher: You Be The Judge
 A terrible crime has occurred in city A, and John is a suspect
 DNA matching that of John is found at the crime scene
   - This is the only evidence against John
 Two DNA samples match by chance with probability 1 in a million
 The prosecutor and jury conclude John is guilty
Q: Were they right?
 City A has about 10 million people.
Q: What is the chance that John is innocent?
A: John is innocent with a 90% chance!!!
BAYES RULE:
$$P[\text{Innocent} \mid \text{Evidence}] = \frac{P[\text{Evidence} \mid \text{Innocent}]\, P[\text{Innocent}]}{P[\text{Evidence}]}$$
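One way to arrive at the roughly 90% figure is to plug numbers into Bayes' rule. The sketch below relies on assumptions the slide leaves implicit: exactly one guilty person, every resident equally likely a priori to be that person, and a guilty person's DNA always matching. The function name is hypothetical.

```python
def prob_innocent_given_match(population, p_chance_match):
    """P[Innocent | Evidence] via Bayes' rule.

    Assumptions (not stated on the slide): one guilty person, a uniform
    prior over all residents, and P[Evidence | Guilty] = 1.
    """
    p_guilty = 1.0 / population
    p_innocent = 1.0 - p_guilty
    # Total probability of observing a DNA match for a given person.
    p_evidence = 1.0 * p_guilty + p_chance_match * p_innocent
    return p_chance_match * p_innocent / p_evidence

p = prob_innocent_given_match(10_000_000, 1e-6)
print(p)
```

Intuitively, about 10 other people in the city are expected to match by chance, so John is only one of roughly 11 matching candidates.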
Probability Refresher: Network Bootstrapping
 Sensor node bootstrapping
   - Each node has a unique ID (x.y.z)
 Goal: each node needs to broadcast its ID to all other nodes
Protocol
 Node X picks a slot n uniformly in [1,N] => P{n = i} = 1/N
 Broadcasts its ID in slot n
   - SUCCESS: no other node picked n
   - COLLISION: 2 or more nodes picked n => nodes fail and stay “off”
Tradeoff: Low N => many collisions || High N => long delay
Q: If 30 nodes, what is the minimum N such that P{collision} < 10%?
   200? 500? 1000? 5000?
A: N > 4500 (look up “birthday paradox”)
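The question can be explored numerically. The sketch below interprets "collision" as "at least one slot is picked by two or more nodes" (an assumption) and computes that probability exactly as one minus the probability that all picks are distinct. The rough birthday-paradox estimate n²/(2N) < 0.1 with n = 30 gives N > 4500, consistent with the slide; the exact search lands somewhat below that. Function names are illustrative.

```python
def collision_prob(num_nodes, num_slots):
    """P{at least two of num_nodes pick the same slot among num_slots}."""
    p_all_distinct = 1.0
    for k in range(num_nodes):
        p_all_distinct *= (num_slots - k) / num_slots
    return 1.0 - p_all_distinct

def min_slots(num_nodes, target):
    """Smallest N whose collision probability is below target (linear search)."""
    n = num_nodes
    while collision_prob(num_nodes, n) >= target:
        n += 1
    return n

n_min = min_slots(30, 0.10)
print(n_min, collision_prob(30, n_min))
```

The same function reproduces the classic birthday result: `collision_prob(23, 365)` is just above one half.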
Why Modeling and Performance Analysis?
[Figure: traffic from different clients arriving at the Amazon Cloud]
 Throughput?
 Delay per Client?
 Best Algorithm? E.g. Resource Allocation (CPU/Network/Memory)
 Identify bottlenecks: Improve Eurecom WiFi:
   a) Install more Access Points? (network is too sparse)
   b) Propose a better channel selection algorithm? (high interference)
 Knowing perf. analysis can make you good consultants :-)
“All models are wrong, but some are useful” (statistician George Box)
Upgrading the Company’s Web Server
[Figure: job requests per second (2x) => server => jobs served per second (?)]
 Your company has just got very positive publicity => the incoming load (arrival rate of requests) to the web server is expected to double as a result
 Your boss tells you that you need to upgrade the server with a faster one (higher service rate μ) to ensure the same mean response time E[T]
Q: How much should you increase the service rate?
   a) Double the server speed?
   b) More than double the speed?
   c) Less than double the speed?
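One way to reason about this question is with the simplest queueing model. The sketch below assumes an M/M/1 queue (Poisson arrivals, exponential service times, a single server), which the slide does not specify; under that assumption E[T] = 1/(μ − λ). The rates used are hypothetical.

```python
def mm1_mean_response_time(lam, mu):
    """E[T] = 1 / (mu - lam) for a stable M/M/1 queue (requires lam < mu)."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu")
    return 1.0 / (mu - lam)

def service_rate_for_same_delay(lam, mu, load_factor=2.0):
    """Service rate that keeps E[T] unchanged when lam grows by load_factor."""
    target = mm1_mean_response_time(lam, mu)
    # Solve 1 / (new_mu - load_factor * lam) = target for new_mu.
    return load_factor * lam + 1.0 / target

lam, mu = 8.0, 10.0                       # hypothetical jobs per second
new_mu = service_rate_for_same_delay(lam, mu)
print(new_mu, new_mu / mu)                # required rate and its ratio to mu
```

Under this model the required rate is μ + λ, which is always less than 2μ for a stable queue, so less than double the speed suffices.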
Supermarket Queues and FDMA vs. CSMA
OPTION 1
 3 slow cashiers
 3 lines => randomly choose line and stay there
 Each cashier: 10 customers per hour
OPTION 2
 1 fast cashier
 single line
 30 customers per hour
Supermarket Queues and FDMA vs. CSMA
[Figure: Option 1 (10 cust/hour per cashier) and Option 2 (30 cust/hour)]
Q: Which option has the smallest waiting time? Option 1 or Option 2?
A: Option 2 is 3x faster!
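The factor of 3 can be reproduced under a simple model. The sketch below assumes Poisson customer arrivals and exponential service times, so that each cashier behaves as an M/M/1 queue, and that random line choice splits the arrivals evenly; none of this is stated on the slide. The arrival rates tried are hypothetical.

```python
def mm1_mean_time(lam, mu):
    """Mean time in system for an M/M/1 queue: 1 / (mu - lam), needs lam < mu."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu")
    return 1.0 / (mu - lam)

def option1_mean_time(lam_total, mu_slow=10.0, cashiers=3):
    # Random line choice splits the arrivals evenly across the slow cashiers.
    return mm1_mean_time(lam_total / cashiers, mu_slow)

def option2_mean_time(lam_total, mu_fast=30.0):
    # One fast cashier sees the whole arrival stream.
    return mm1_mean_time(lam_total, mu_fast)

for lam_total in (3.0, 15.0, 27.0):        # hypothetical customers per hour
    t1, t2 = option1_mean_time(lam_total), option2_mean_time(lam_total)
    print(lam_total, t1, t2, t1 / t2)
```

In this model the ratio is 3 at every load, since 1/(10 − λ/3) = 3/(30 − λ).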
Supermarket Queues and FDMA vs. CSMA (2)
[Figure: Option 2 and Option 3; labels: 30 cust/hour, 10 cust/hour]
Q: Which option has the smallest waiting time? Option 2 or Option 3?
A: Similar delay (for high load)
Q: Low load: Option 2 or Option 3?
A: Option 2 up to 3x faster
Supermarket Queues and FDMA vs. CSMA (3)
 How do all these relate to networking??
OPTION 1: FDMA (600 kHz total)
 Separate 200 kHz channel for each flow
 Flows do not compete
OPTION 2: CSMA (600 kHz total)
 Each node senses the channel first
 If idle => transmit pkt using the full 600 kHz
 If busy => queue (wait)
Q: Which option would you prefer for data?
Q: Which option would you prefer for voice?
Google’s PageRank: Searching for Web Pages
 What lies behind this simple box??
 Searching the Web
 Step 1: crawl all web pages and create index of {keywords-web pages}
 Step 2: User enters keywords (e.g. “Network Modeling”)
 Step 3: Google finds all web pages matching these keywords
(these 3 steps are generic to almost every search engine)
 Step 4: Return a list of matches ranked by importance. HOW???
 Intuition 1: page important if many web pages refer to it
 Intuition 2: page important if important web pages refer to it
Solution: the PageRank algorithm solves an appropriate Markov Chain
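The two intuitions can be illustrated with a toy computation. The sketch below uses the textbook "random surfer" formulation solved by power iteration; it is not necessarily the exact variant covered in the course or used by Google. The damping value 0.85 and the 4-page web are assumptions made up for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for the stationary distribution of the random-surfer chain.

    links: dict mapping each page to the list of pages it links to.
    damping: probability of following a link (0.85 is an assumed,
    commonly quoted value; the slide does not give one).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links[p] or pages      # dangling page: jump anywhere
            share = damping * rank[p] / len(targets)
            for q in targets:
                new_rank[q] += share
        rank = new_rank
    return rank

# Hypothetical 4-page web: "a", "b" and "c" link to "hub"; "hub" links to "a".
toy_web = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub", "a"]}
ranks = pagerank(toy_web)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

Here "hub" ranks first because many pages refer to it (Intuition 1), and "a" ranks well above "b" and "c" because the important page "hub" refers to it (Intuition 2).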
The Celebrated (and Demonized!)
Poisson Process
Arrivals of Customers/Packets: How to Model?
Iceland Volcano: Why could you not talk to airline cust. service?
New Year's Eve: Why can I not call my relatives?
Problem 1: Call Center Dimensioning
 Customers call randomly
   - Assume (for now!) duration of each call is fixed
 N workers: if all busy, call is dropped
 Question: What should N be to ensure at most 5% of calls are dropped?
   - Case 1: calls arrive regularly (one every X min)
   - Case 2: calls arrive in bursts (many together, then silence)
Arrivals of Customers/Packets: How to Model?
Problem 2: Internet Router Buffer Sizing
 Packets arrive at a core router
 Need to be buffered before being forwarded further
 Question: How large should the buffer be (to ensure few drops)?
Lesson: Arrival Models => System Design
 Need to know/model the (random) arrival of “work” => to optimize the system!
   - Calls at a call center => to pick the number of employees
   - Calls to a base station (inside a cell) => to allocate frequencies
   - Packets at a router => to choose the right buffer size
   - (Large) jobs at a cluster/supercomputer => to choose the number of CPUs
 What might we need to know?
   - Average amount of work per min/hour/day
   - Probability of 3, 4, 5 customers arriving within T min
   - Probability that > N customers arrive within T min
Poisson Distribution
 Rate of events: λ
   - average number of events in an interval
 Probability of n events in an interval:
$$P(n) = \frac{\lambda^n e^{-\lambda}}{n!}$$
 Examples well approximated by Poisson distribution
   - The number of deaths per year in a given age group
   - The number of phone calls arriving at a call centre per minute
   - The number of new sessions arriving at a web server per hour
   - The number of soldiers killed by horse-kicks each year in each corps in the Prussian cavalry
   - …
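The formula for P(n) is easy to evaluate directly, which also answers questions such as "probability of 3, 4, 5 arrivals" or "probability of more than N arrivals". A minimal sketch follows; the rate λ = 4 and the function names are illustrative assumptions.

```python
import math

def poisson_pmf(n, lam):
    """P(n) = lam^n * exp(-lam) / n!"""
    return lam ** n * math.exp(-lam) / math.factorial(n)

def poisson_tail(n, lam):
    """P(more than n events) = 1 - sum of P(k) for k = 0..n."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(n + 1))

lam = 4.0                        # hypothetical average events per interval
for n in (3, 4, 5):
    print(n, poisson_pmf(n, lam))
print("P(> 8 events):", poisson_tail(8, lam))
```

A quick sanity check is that the probabilities sum to 1 and that the mean of the distribution equals λ.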
Poisson Process (Definition 1)
[Figure: time axis with intervals T1 and T2]
 Definition 1: A counting process viewpoint
 Property 1 (“independent increments”): the numbers of arrivals in non-overlapping intervals (e.g. N(T1) and N(T2)) are independent
 Property 2 (“stationary increments”): the distribution of the # of arrivals in [t1,t2] only depends on (t2-t1)
 Property 3: # of arrivals N(t) in an interval of length t is Poisson(λt)
$$\text{Prob}\{N(t) = n\} = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$$
Poisson Process: 2 More Definitions
[Figure: time axis; inter-arrival time T is exponential; small interval dt]
 Definition 2: a Renewal Process viewpoint
 Inter-arrival times are independent
 Time T between arrivals (“renewals”) is exponential(λ)
$$\text{Prob}\{T \le t\} = 1 - e^{-\lambda t}$$
 Definition 3: “aggregate of many rare events”
 Prob{1 event in dt} = λdt + o(dt) (independently of past events)
 Prob{> 1 events in dt} = o(dt) (negligible as dt -> 0)
All Definitions are Equivalent!!
 We can go (prove) from any definition to any other
 Definition 1 (Poisson) => Definition 2 (Exponential)
$$\text{Prob}\{T > t\} = \text{Prob}\{0 \text{ events in } t\} = \text{Prob}\{N(t) = 0\} = \frac{(\lambda t)^0 e^{-\lambda t}}{0!} = e^{-\lambda t} \;\Rightarrow\; T \text{ is exponential}$$
 Definition 1 (Poisson) => Definition 3 (rare events)
$$\text{Prob}\{N(dt) = 0\} = e^{-\lambda dt} = 1 - \lambda dt + \frac{(\lambda dt)^2}{2!} - \dots = 1 - \lambda dt + o(dt)$$
$$\text{Prob}\{N(dt) = 1\} = \frac{\lambda dt}{1!} e^{-\lambda dt} = \lambda dt \left(1 - \lambda dt + \frac{(\lambda dt)^2}{2!} - \dots\right) = \lambda dt + o(dt)$$
$$\text{Prob}\{N(dt) > 1\} = 1 - \text{Prob}\{N(dt) = 1\} - \text{Prob}\{N(dt) = 0\} = o(dt)$$
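The equivalence can also be checked empirically in the other direction (exponential inter-arrivals => Poisson counts). The sketch below builds arrivals from independent exponential(λ) gaps and looks at the number falling in [0, t]; for a Poisson count both the mean and the variance should be close to λt. The parameter values, seed and function names are illustrative assumptions.

```python
import random

def count_arrivals(lam, t, rng):
    """Number of arrivals in [0, t] with exponential(lam) inter-arrival times."""
    clock, count = rng.expovariate(lam), 0
    while clock <= t:
        count += 1
        clock += rng.expovariate(lam)
    return count

def simulate_counts(lam=2.0, t=5.0, runs=20_000, seed=1):
    rng = random.Random(seed)
    return [count_arrivals(lam, t, rng) for _ in range(runs)]

counts = simulate_counts()
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(mean, var)   # both should be close to lam * t = 10
```

Matching mean and variance is only a necessary check, not a proof; the derivations above are what establish the equivalence.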
Poisson as a Binomial Approximation
[Figure: interval t divided into slots of length δt; P{arrival} = λ•δt in each slot]
 Number of arrivals N(t) in t is Binomial(n, p)
   - n = t/δt
   - p = λ•δt + o(δt)
 If δt -> 0 such that np = λt, then Binomial(n, p) -> Poisson(λt)
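The convergence can be observed numerically by shrinking the slot length. The sketch below compares the Binomial(n, λt/n) and Poisson(λt) probabilities for increasing n (it needs Python 3.8+ for `math.comb`); the values of λ and t and the function names are illustrative assumptions.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    return mean ** k * math.exp(-mean) / math.factorial(k)

def max_gap(lam, t, n, k_max=30):
    """Largest |Binomial(n, lam*t/n) - Poisson(lam*t)| pmf gap over k <= k_max."""
    p = lam * t / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam * t))
               for k in range(min(k_max, n) + 1))

for n in (10, 100, 1000, 10000):          # n = t / dt slots
    print(n, max_gap(lam=2.0, t=3.0, n=n))
```

The gap shrinks as n grows, i.e. as δt = t/n goes to 0 with np = λt held fixed.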
Poisson Properties: Waiting Time to n-th Arrival
 Time to wait until the next arrival (T1) is exponential
 Time to wait until the n-th arrival (Sn = T1+T2+…+Tn)?
[Figure: time axis from 0 to t with inter-arrival times T1, T2, … and n-th arrival time Sn]
 Sum of n independent and identically distributed (IID) exponential random variables => Gamma Distribution
$$\text{(pdf)}\quad f_{S_n}(t) = \lambda e^{-\lambda t}\, \frac{(\lambda t)^{n-1}}{(n-1)!}$$
 How to get this?
   - Proof 1: Moment Generating Function
   - Proof 2: (CDF) F_Sn(t) = Prob{Sn ≤ t} = P{N(t) ≥ n} (Ross, Ch.2)
   - Proof 3: P{t < Sn < t+dt} = P{n-1 events in t, 1 event in (t,t+dt)} (Ross)
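The identity behind Proof 2 can be checked numerically: integrating the Gamma pdf up to t should give the same value as P{N(t) ≥ n} computed from the Poisson distribution. A small sketch, with illustrative parameter values and function names:

```python
import math

def erlang_pdf(t, n, lam):
    """pdf of S_n: lam * exp(-lam*t) * (lam*t)^(n-1) / (n-1)!"""
    return lam * math.exp(-lam * t) * (lam * t) ** (n - 1) / math.factorial(n - 1)

def cdf_via_counts(t, n, lam):
    """P{S_n <= t} = P{N(t) >= n} = 1 - sum_{k<n} Poisson(lam*t) pmf at k."""
    return 1.0 - sum((lam * t) ** k * math.exp(-lam * t) / math.factorial(k)
                     for k in range(n))

def cdf_via_integration(t, n, lam, steps=10_000):
    """Midpoint-rule integral of the pdf over [0, t]."""
    h = t / steps
    return h * sum(erlang_pdf((i + 0.5) * h, n, lam) for i in range(steps))

print(cdf_via_counts(2.0, 3, 1.5), cdf_via_integration(2.0, 3, 1.5))
```

The two numbers agree up to the integration error, which supports the pdf given above without replacing the proofs.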
Poisson Properties: 1 arrival in a window T
[Figure: time axis from 0 to T with a single arrival at S1]
 We are told that 1 arrival has occurred in the interval [0, T]
 Question: When did it happen exactly?
   - NOTE: this is a conditional probability
 Answer: Arrival is uniformly distributed: any instant in the interval is equally probable
   - P{S1 ≤ s | N(T) = 1} = s/T (0 ≤ s ≤ T)
Poisson Properties: N arrivals in a window T
[Figure: time axis from 0 to T with arrivals at S1, S2, S3; n arrivals in T]
 n arrivals in T => each arrival is uniformly distributed in the interval
An Example: Energy-Efficiency
[Figure: sensed data arrive as Poisson(λ); node sleeps during (0, T) and wakes up at T, 2T, …]
 Sensor node receives event readings with rate λ => must send them to a base station
 To save battery power: (a) wireless card in sleep mode, (b) queue events during sleep mode, (c) wake up every T minutes and transmit all queued events
 QoS: cost of queueing an event for time t: c(t) = ct
Q: What is the total cost incurred each period T?
A: 0.5•c•λT² (in expectation)
Q: Assume battery consumption is a(T) = a/T. What is the optimal T?
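The expected cost per period follows from two Poisson properties: the expected number of arrivals in a window of length T is λT, and given the arrivals, each one is uniformly distributed in the window. A worked sketch follows. Treating a(T) = a/T as a battery cost per unit time and minimizing the sum of the two per-unit-time costs is an assumption, since the slide does not state the objective; U and J are symbols introduced here.

```latex
% Each arrival lands at a uniform time U in [0, T] and waits T - U until wake-up,
% so E[T - U] = T/2.
\mathbb{E}[\text{cost per period}]
  = \mathbb{E}[N(T)] \cdot c \cdot \mathbb{E}[T - U]
  = \lambda T \cdot c \cdot \frac{T}{2}
  = \tfrac{1}{2}\, c\, \lambda\, T^{2}

% Assumed objective: average cost per unit time, queueing plus battery.
J(T) = \frac{\tfrac{1}{2} c \lambda T^{2}}{T} + \frac{a}{T}
     = \tfrac{1}{2}\, c\, \lambda\, T + \frac{a}{T}

% Setting J'(T) = 0:
\tfrac{1}{2}\, c\, \lambda - \frac{a}{T^{2}} = 0
  \quad\Longrightarrow\quad
  T^{*} = \sqrt{\frac{2a}{c\,\lambda}}
```

Under this objective, a higher arrival rate or queueing cost pushes toward more frequent wake-ups, while a higher battery cost pushes toward longer sleep periods.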
Poisson Thinning/Sampling
 Assume Poisson arrivals with rate λ
 A 2nd random process is created as follows:
   - We accept each arrival with probability p < 1 (or reject with 1-p)
[Figure: arrivals (X) on a time axis of length T; each is accepted with prob p]
 Question: what is the expected number of accepted arrivals within T?
 Answer: p•λT
 Question: what is the second process?
 Answer: Poisson with rate pλ
 Proof?
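Short of a proof, the claim can be checked by simulation. The sketch below generates Poisson arrivals from exponential gaps, keeps each one with probability p, and compares the mean and variance of the accepted count with pλT (for a Poisson count the two coincide). Parameter values, seed and function names are illustrative assumptions.

```python
import random

def thinned_count(lam, p, horizon, rng):
    """Arrivals accepted in [0, horizon] when each arrival is kept w.p. p."""
    clock, kept = rng.expovariate(lam), 0
    while clock <= horizon:
        if rng.random() < p:
            kept += 1
        clock += rng.expovariate(lam)
    return kept

def thinning_stats(lam=5.0, p=0.3, horizon=4.0, runs=20_000, seed=2):
    rng = random.Random(seed)
    counts = [thinned_count(lam, p, horizon, rng) for _ in range(runs)]
    mean = sum(counts) / runs
    var = sum((c - mean) ** 2 for c in counts) / runs
    return mean, var

mean, var = thinning_stats()
print(mean, var)   # a Poisson count would have mean = variance = p*lam*T = 6
```

A similar experiment on the rejected arrivals would show a Poisson-like count with rate (1-p)λ.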
Poisson Thinning Examples

 Load Balancer: assign job i with probability pj to CPU j
[Figure: Poisson (rate λ) => load balancer => CPU1: p1, CPU2: p2, CPU3: p3, CPU4: p4]
Q: Input process to CPU j?
A: Poisson with rate pj·λ
 www.movie-clips.com: 1 slow and 1 fast server
[Figure: New jobs: Poisson rate λ => scheduler => slow and fast servers; job size < S (e.g. short clips) vs. job size ≥ S (e.g. long movies)]
Job size x is random ~ CDF is F(x)
Q: Is the input process to the slow server Poisson?
Poisson Process Merging
Ethernet packets from each Base Station are Poisson (rates λ1, λ2, λ3)
 Question: what is the arrival process of ALL packets at the input of the Ethernet switch?
 Answer: Poisson with rate λ1 + λ2 + λ3 (assuming the three streams are independent)
[Figure: rate λ1 + rate λ2 => Poisson(λ1+λ2)]
Compound Poisson Process
 Definition: A stochastic process [X(t), t ≥ 0] is a compound Poisson process if:
$$X(t) = \sum_{i=1}^{N(t)} Y_i$$
   - [N(t), t ≥ 0] is a Poisson process
   - [Yi, i ≥ 1] is a family of IID random variables, independent of N(t)
 Results
   1) E[X(t)] = λt•E[Y1]
   2) Var(X(t)) = λt•E[Y1²]
Example: [Figure: map of open WiFi access points (AP)]
 User uploads (a large no. of) pictures on DropBox using WiFi only
 User walks around => randomly encounters APs as a Poisson process with rate λ
 Bytes uploaded during each WiFi encounter is random: Yi
   - Depends on speed, congestion, distance
Q: How long until all pictures uploaded?
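The two results can be checked with a short simulation. The sketch below draws the Yi uniformly on [0, 1] (a hypothetical choice with E[Y] = 1/2 and E[Y²] = 1/3) and compares the sample mean and variance of X(t) with λt•E[Y1] and λt•E[Y1²]. Parameter values, seed and function names are illustrative.

```python
import random

def compound_poisson_sample(lam, t, draw_y, rng):
    """One sample of X(t) = Y_1 + ... + Y_N(t), N(t) counting Poisson arrivals."""
    clock, total = rng.expovariate(lam), 0.0
    while clock <= t:
        total += draw_y(rng)
        clock += rng.expovariate(lam)
    return total

def compound_stats(lam=2.0, t=5.0, runs=20_000, seed=3):
    rng = random.Random(seed)
    # Hypothetical choice: Y_i uniform on [0, 1], so E[Y] = 1/2, E[Y^2] = 1/3.
    xs = [compound_poisson_sample(lam, t, lambda r: r.random(), rng)
          for _ in range(runs)]
    mean = sum(xs) / runs
    var = sum((x - mean) ** 2 for x in xs) / runs
    return mean, var

mean, var = compound_stats()
print(mean, var)   # theory: lam*t*E[Y] = 5 and lam*t*E[Y^2] = 10/3
```

Note that the variance uses the second moment E[Y1²], not Var(Y1), because the randomness of N(t) contributes as well.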
Poisson Process: Why We Like It
 Memory-less property: simplifies models
   - No need to know/keep track of the past to predict future
   - Stationary behavior is sufficient!
 Good approximation for aggregate “traffic” of many independent sources
   - Palm-Khintchine Theorem
 Why we don’t like it:
   - Not always true
   - Many workloads have “heavy-tailed” properties => memory