PERFORMANCE ANALYSIS OF COMMUNICATIONS NETWORKS AND SYSTEMS

PIET VAN MIEGHEM
Delft University of Technology

Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521855150

© Cambridge University Press 2006

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2006

ISBN-13 978-0-511-16917-5 eBook (NetLibrary)
ISBN-10 0-511-16917-5 eBook (NetLibrary)
ISBN-13 978-0-521-85515-0 hardback
ISBN-10 0-521-85515-2 hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Waar een wil is, is een weg. (Where there is a will, there is a way.)
to my father
to my wife Saskia
and my sons Vincent, Nathan and Laurens

Contents

Preface  xi
1 Introduction  1

Part I  Probability theory  7

2 Random variables  9
  2.1 Probability theory and set theory  9
  2.2 Discrete random variables  16
  2.3 Continuous random variables  20
  2.4 The conditional probability  26
  2.5 Several random variables and independence  28
  2.6 Conditional expectation  34
3 Basic distributions  37
  3.1 Discrete random variables  37
  3.2 Continuous random variables  43
  3.3 Derived distributions  47
  3.4 Functions of random variables  51
  3.5 Examples of other distributions  54
  3.6 Summary tables of probability distributions  58
  3.7 Problems  59
4 Correlation  61
  4.1 Generation of correlated Gaussian random variables  61
  4.2 Generation of correlated random variables  67
  4.3 The non-linear transformation method  68
  4.4 Examples of the non-linear transformation method  74
  4.5 Linear combination of independent auxiliary random variables  78
  4.6 Problem  82
5 Inequalities  83
  5.1 The minimum (maximum) and infimum (supremum)  83
  5.2 Continuous convex functions  84
  5.3 Inequalities deduced from the Mean Value Theorem  86
  5.4 The Markov and Chebyshev inequalities  87
  5.5 The Hölder, Minkowski and Young inequalities  90
  5.6 The Gauss inequality  92
  5.7 The dominant pole approximation and large deviations  94
6 Limit laws  97
  6.1 General theorems from analysis  97
  6.2 Law of Large Numbers  101
  6.3 Central Limit Theorem  103
  6.4 Extremal distributions  104

Part II  Stochastic processes  113

7 The Poisson process  115
  7.1 A stochastic process  115
  7.2 The Poisson process  120
  7.3 Properties of the Poisson process  122
  7.4 The nonhomogeneous Poisson process  129
  7.5 The failure rate function  130
  7.6 Problems  132
8 Renewal theory  137
  8.1 Basic notions  138
  8.2 Limit theorems  144
  8.3 The residual waiting time  149
  8.4 The renewal reward process  153
  8.5 Problems  155
9 Discrete-time Markov chains  157
  9.1 Definition  157
  9.2 Discrete-time Markov chain  158
  9.3 The steady-state of a Markov chain  168
  9.4 Problems  177
10 Continuous-time Markov chains  179
  10.1 Definition  179
  10.2 Properties of continuous-time Markov processes  180
  10.3 Steady-state  187
  10.4 The embedded Markov chain  188
  10.5 The transitions in a continuous-time Markov chain  193
  10.6 Example: the two-state Markov chain in continuous-time  195
  10.7 Time reversibility  196
  10.8 Problems  199
11 Applications of Markov chains  201
  11.1 Discrete Markov chains and independent random variables  201
  11.2 The general random walk  202
  11.3 Birth and death process  208
  11.4 A random walk on a graph  218
  11.5 Slotted Aloha  219
  11.6 Ranking of webpages  224
  11.7 Problems  228
12 Branching processes  229
  12.1 The probability generating function  231
  12.2 The limit $W$ of the scaled random variables $W_k$  233
  12.3 The probability of extinction of a branching process  237
  12.4 Asymptotic behavior of $W$  240
  12.5 A geometric branching process  243
13 General queueing theory  247
  13.1 A queueing system  247
  13.2 The waiting process: Lindley's approach  252
  13.3 The Beneš approach to the unfinished work  256
  13.4 The counting process  263
  13.5 PASTA  266
  13.6 Little's Law  267
14 Queueing models  271
  14.1 The M/M/1 queue  271
  14.2 Variants of the M/M/1 queue  276
  14.3 The M/G/1 queue  283
  14.4 The GI/D/m queue  289
  14.5 The M/D/1/K queue  296
  14.6 The N*D/D/1 queue  300
  14.7 The AMS queue  304
  14.8 The cell loss ratio  309
  14.9 Problems  312

Part III  Physics of networks  317

15 General characteristics of graphs  319
  15.1 Introduction  319
  15.2 The number of paths with $j$ hops  321
  15.3 The degree of a node in a graph  322
  15.4 Connectivity and robustness  325
  15.5 Graph metrics  328
  15.6 Random graphs  329
  15.7 The hopcount in a large, sparse graph with unit link weights  340
  15.8 Problems  346
16 The Shortest Path Problem  347
  16.1 The shortest path and the link weight structure  348
  16.2 The shortest path tree in $K_N$ with exponential link weights  349
  16.3 The hopcount $h_N$ in the URT  354
  16.4 The weight of the shortest path  359
  16.5 The flooding time $T_N$  361
  16.6 The degree of a node in the URT  366
  16.7 The minimum spanning tree  373
  16.8 The proof of the degree Theorem 16.6.1 of the URT  380
  16.9 Problems  385
17 The efficiency of multicast  387
  17.1 General results for $g_N(m)$  388
  17.2 The random graph $G_p(N)$  392
  17.3 The $k$-ary tree  401
  17.4 The Chuang–Sirbu law  404
  17.5 Stability of a multicast shortest path tree  407
  17.6 Proof of (17.16): $g_N(m)$ for random graphs  410
  17.7 Proof of Theorem 17.3.1: $g_N(m)$ for $k$-ary trees  414
  17.8 Problem  416
18 The hopcount to an anycast group  417
  18.1 Introduction  417
  18.2 General analysis  419
  18.3 The $k$-ary tree  423
  18.4 The uniform recursive tree (URT)  424
  18.5 Approximate analysis  431
  18.6 The performance measure in exponentially growing trees  432

Appendix A Stochastic matrices  435
Appendix B Algebraic graph theory  471
Appendix C Solutions of problems  493
Bibliography  523
Index  529

Preface

Performance analysis belongs to the domain of applied mathematics. The major domain of application in this book concerns telecommunications systems and networks. We will mainly use stochastic analysis and probability theory to address problems in the performance evaluation of telecommunications systems and networks. The first chapter will provide a motivation and a statement of several problems.

This book aims to present methods rigorously, hence mathematically, with minimal resort to intuition. It is my belief that intuition is often gained after the result is known and rarely before the problem is solved, unless the problem is simple. Techniques and terminologies of axiomatic probability (such as definitions of probability spaces, filtration, measures, etc.) have been omitted and a more direct, less abstract approach has been adopted.
In addition, most of the important formulas are interpreted in the sense of "What does this mathematical expression teach me?" This last step justifies the word "applied", since most mathematical treatises do not interpret, as interpretation carries the risk of being imprecise and incomplete.

The field of stochastic processes is much too large to be covered in a single book and only a selected number of topics has been chosen. Most of the topics are considered classical. Perhaps the largest omission is a treatment of Brownian processes and the many related applications. A weak excuse for this omission (besides the considerable mathematical complexity) is that Brownian theory applies more to physics (analogue fields) than to system theory (discrete components). The list of omissions is rather long and only the most noteworthy are summarized: recent concepts such as martingales and the coupling theory of stochastic variables, queueing networks, scheduling rules, and the theory of long-range dependent random variables that currently dominates in the Internet. The confinement to stochastic analysis also excludes the recent new framework, called Network Calculus by Le Boudec and Thiran (2001). Network calculus is based on min-plus algebra and has been applied to (Inter)network problems in a deterministic setting.

As prerequisites, familiarity with elementary probability and knowledge of the theory of functions of a complex variable are assumed. Parts of the text in small font refer to more advanced topics or to computations that can be skipped at first reading. Part I (Chapters 2–6) reviews probability theory and is included to make the remainder self-contained. The book essentially starts with Chapter 7 (Part II) on Poisson processes. The Poisson process (independent increments and discontinuous sample paths) and Brownian motion (independent increments but continuous sample paths) are considered to be the most important basic stochastic processes.
We briefly touch upon renewal theory to move on to Markov processes. The theory of Markov processes is regarded as a foundation for many applications in telecommunications systems, in particular queueing theory. A large part of the book is devoted to Markov processes and their applications. The last chapters of Part II dive into queueing theory. Inspired by intriguing problems in telephony at the beginning of the twentieth century, Erlang pushed queueing theory onto the scene of the sciences. Since his investigations, queueing theory has grown considerably. Especially during the last decade, with the advent of the Asynchronous Transfer Mode (ATM) and the worldwide Internet, many early ideas have been refined (e.g. discrete-time queueing theory, large deviation theory, scheduling control of prioritized flows of packets) and new concepts (self-similar or fractal processes) have been proposed. Part III covers current research on the physics of networks. This Part III is undoubtedly the least mature and complete.

In contrast to most books, I have chosen to include the solutions to the problems in an Appendix to support self-study. I am grateful to colleagues and students whose input has greatly improved this text. Fernando Kuipers and Stijn van Langen have corrected a large number of misprints. Together with Fernando, Milena Janic and Almerima Jamakovic have supplied me with exercises. Gerard Hooghiemstra has made valuable comments and was always available for discussions about my viewpoints. Bart Steyaert eagerly gave the finer details of the generating function approach to the GI/D/m queue. Jan Van Mieghem has given overall comments and suggestions besides his input on the computation of correlations. Finally, I thank David Hemsley for his scrupulous corrections in the original manuscript.

Although this book is intended to be of practical use, in the course of writing it, I became more and more persuaded that mathematical rigor has ample virtues of its own.
Per aspera ad astra

January 2006
Piet Van Mieghem

1 Introduction

The aim of this first chapter is to motivate why stochastic processes and probability theory are useful tools for solving problems in the domain of telecommunications systems and networks. In any system, or for any transmission of information, there is always a non-zero probability of failure or of error penetration. Many problems in quantifying the failure rate, the bit error rate or the computation of redundancy to recover from hazards are successfully treated by probability theory.

Often in communications we deal with a large variety of signals, calls, source-destination pairs, messages, the number of customers per region, and so on. And, most often, precise information at any time is not available or, if it is available, deterministic studies or simulations are simply not feasible due to the large number of different parameters involved. For such problems, a stochastic approach is often a powerful vehicle, as has been demonstrated in the field of physics.

Perhaps the first impressive result of a stochastic approach was the statistical theory of Boltzmann and Maxwell. They studied the behavior of particles in an ideal gas and described how macroscopic quantities such as pressure and temperature can be related to the microscopic motion of the huge number of individual particles. Boltzmann also introduced the stochastic notion of the thermodynamic concept of entropy $S$,

$$S = k \log W$$

where $W$ denotes the total number of ways in which the ensembles of particles can be distributed in thermal equilibrium and where $k$ is a proportionality factor, afterwards attributed to Boltzmann as the Boltzmann constant. The pioneering work of these early physicists such as Boltzmann, Maxwell and others was the germ of a large number of breakthroughs in science. Shortly after their introduction of stochastic theory in classical physics, the theory of quantum mechanics (see e.g. Cohen-Tannoudji et al., 1977) was established.
This theory proposes that the elementary building blocks of nature, atoms and electrons, can only be described in a probabilistic sense. The conceptually difficult notion of a wave function, whose squared modulus expresses the probability that a set of particles is in a certain state, and Heisenberg's uncertainty relation exclude in a dramatic way our deterministic, macroscopic view of nature at the fine atomic scale.

At about the same time as the theory of quantum mechanics was being created, Erlang applied probability theory to the field of telecommunications. Erlang succeeded in determining the number of telephone input lines $m$ of a switch in order to serve $N_S$ customers with a certain probability $p$. Perhaps his most used formula is the Erlang B formula (14.17), derived in Section 14.2.2,

$$\Pr[N_S = m] = \frac{\rho^m/m!}{\sum_{k=0}^{m} \rho^k/k!}$$

where the load or traffic intensity $\rho$ is the ratio of the arrival rate of calls to the telephone local exchange or switch over the processing rate of the switch per line. By equating the desired blocking probability $p = \Pr[N_S = m]$, say $p = 10^{-4}$, the number of input lines $m$ can be computed for each load $\rho$. Due to its importance, books with tables relating $p$, $\rho$ and $m$ were published.

Another pioneer in the field of communications who deserves to be mentioned is Shannon. Shannon explored the concept of entropy $S$. He introduced (see e.g. Walrand, 1998) the notion of the Shannon capacity of a channel, the maximum rate at which bits can be transmitted with arbitrarily small (but non-zero) probability of errors, and the concept of the entropy rate of a source, which is the minimum average number of bits per symbol required to encode the output of a source. Many others have extended his basic ideas and it is fair to say that Shannon founded the field of information theory.

A recent important driver in telecommunications is the concept of quality of service (QoS).
Customers can use the network to transmit different types of information, such as pictures, files, voice, etc., by requiring a specific level of service depending on the type of transmitted information. For example, a telephone conversation requires that the voice packets arrive at the receiver $D$ ms later, while a file transfer is mostly not time-critical but requires an extremely low information loss probability. The value of the mouth-to-ear delay $D$ is clearly related to the perceived quality of the voice conversation. As long as $D < 150$ ms, the voice conversation has toll quality, which is, roughly speaking, the quality that we are used to in classical telephony. When $D$ exceeds 150 ms, rapid degradation is experienced and, when $D > 300$ ms, most of the test persons have great difficulty in understanding the conversation. However, perceived quality may change from person to person and is difficult to determine, even for telephony. For example, if the test person knows a priori that the conversation is transmitted over a mobile or wireless channel, as in GSM, he or she is willing to tolerate a lower quality. Therefore, quality of service is related both to the nature of the information and to the individual desire and perception. In future Internetworking, it is believed that customers may request a certain QoS for each type of information. Depending on the level of stringency, the network may either admit or refuse the customer. Since customers will also pay an amount related to this QoS stringency, the network function that determines whether to accept or refuse a call for service will be of crucial interest to any network operator.
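Erlang's loss formula mentioned earlier can be evaluated without computing factorials by the standard, numerically stable Erlang B recursion $B(\rho, 0) = 1$, $B(\rho, k) = \rho B(\rho, k-1)/(k + \rho B(\rho, k-1))$, which is algebraically equivalent to (14.17). A minimal sketch (Python; the function names are illustrative, not from the book) also inverts the formula to dimension the number of lines:

```python
def erlang_b(rho: float, m: int) -> float:
    """Blocking probability Pr[N_S = m] for an offered load rho (in erlang)
    on m lines, computed with the numerically stable Erlang B recursion."""
    b = 1.0                          # B(rho, 0) = 1
    for k in range(1, m + 1):
        b = rho * b / (k + rho * b)  # B(rho, k) from B(rho, k-1)
    return b

def lines_needed(rho: float, p: float) -> int:
    """Smallest number of lines m whose blocking probability is <= p;
    the blocking probability decreases monotonically in m."""
    m = 0
    while erlang_b(rho, m) > p:
        m += 1
    return m
```

For example, a load of 2 erlang on 5 lines blocks roughly 3.7% of the calls, and reaching a blocking probability of $10^{-4}$ requires 10 lines.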
Let us now state the connection admission control (CAC) problem for a voice conversation to illustrate the relation to stochastic analysis: "How many customers $m$ are allowed in order to guarantee that the ensemble of all voice packets reaches the destination within $D$ ms with probability $p$?" This problem is exceptionally difficult because it depends on the voice codecs used, the specifics of the network topology, the capacity of the individual network elements, the arrival process of calls from the customers, the duration of the conversation and other details. Therefore, we will simplify the question. Let us first assume that the delay is only caused by the waiting time of a voice packet in the queue of a router (or switch). As we will see in Chapter 13, this waiting time $T$ of voice packets in a single queueing system depends on (a) the arrival process: the way voice packets arrive, and (b) the service process: how they are processed. Let us assume that the arrival process, specified by the average arrival rate $\lambda$, and the service process, specified by the average service rate $\mu$, are known. Clearly, the arrival rate $\lambda$ is connected to the number of customers $m$. A simplified statement of the CAC problem is, "What is the maximum allowed $\lambda$ such that $\Pr[T > D] < \epsilon$?" In essence, the CAC problem consists in computing the tail probability of a quantity that depends on parameters of interest.

We have elaborated on the CAC problem because it is a basic design problem that appears under several disguises. A related dimensioning problem is the determination of the buffer size in a router in order not to lose more than a certain number of packets with probability $p$, given the arrival and service processes. The above-mentioned problem of Erlang is a third example.
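To make the simplified CAC question concrete, one can anticipate a standard M/M/1 result of the kind derived in Part II: for Poisson arrivals of rate $\lambda$ and exponential service at rate $\mu$, the waiting time $T$ in the queue obeys $\Pr[T > t] = \rho\, e^{-(\mu - \lambda)t}$ with $\rho = \lambda/\mu < 1$. A minimal sketch (Python; the function names are illustrative, not from the book) inverts this tail by bisection, since the tail increases monotonically in $\lambda$:

```python
import math

def waiting_tail(lam: float, mu: float, t: float) -> float:
    """M/M/1 waiting-time tail Pr[T > t] = rho * exp(-(mu - lam) * t)."""
    rho = lam / mu
    return rho * math.exp(-(mu - lam) * t)

def max_arrival_rate(mu: float, d: float, eps: float,
                     tol: float = 1e-9) -> float:
    """Largest lambda < mu with Pr[T > d] <= eps, found by bisection;
    the tail probability is increasing in lambda."""
    lo, hi = 0.0, mu
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if waiting_tail(mid, mu, d) <= eps:
            lo = mid    # still admissible: push the rate up
        else:
            hi = mid    # tail too heavy: back off
    return lo
```

With $\mu = 1$, $D = 5$ (in units of the mean service time) and $\epsilon = 0.01$, the admissible arrival rate comes out between 0.31 and 0.32, illustrating how the QoS constraint caps the offered load.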
Another example, treated in Chapter 18, is the server placement problem: "How many replicated servers $m$ are needed to guarantee that any user can access the information within $k$ hops with probability $\Pr[h_N(m) > k] \le \epsilon$?", where $\epsilon$ is a certain level of stringency and $h_N(m)$ is the number of hops towards the most nearby of the $m$ servers in a network with $N$ routers.

The popularity of the Internet results in a number of new challenges. The traditional mathematical models such as the Erlang B formula assume "smooth" traffic flows (small correlation and Markovian in nature). However, TCP/IP traffic has been shown to be "bursty" (long-range dependent, self-similar and even chaotic, non-Markovian (Veres and Boda, 2000)). As a consequence, many traditional dimensioning and control problems ask for a new solution. The self-similar and long-range dependent TCP/IP traffic is mainly caused by new complex interactions between protocols and technologies (e.g. TCP/IP/ATM/SDH) and by information transported other than voice. It is observed that the content size of information in the Internet varies considerably, causing the "Noah effect": although immense floods are extremely rare, their occurrence significantly impacts Internet behavior on a global scale. Unfortunately, the mathematics to cope with self-similar and long-range dependent processes turns out to be fairly complex and beyond the scope of this book.

Finally, we mention the current interest in understanding and modeling complex networks such as the Internet, biological networks, social networks and utility infrastructures for water, gas, electricity and transport (cars, goods, trains). Since these networks consist of a huge number of nodes $N$ and links $L$, classical and algebraic graph theory is often not suited to produce even approximate results. The beginning of probabilistic graph theory is commonly attributed to the appearance of the papers by Erdős and Rényi in the late 1950s.
They investigated a particularly simple growing model for a graph: start from $N$ nodes and connect in each step an arbitrary, not yet connected, random pair of nodes until all $L$ links are used. After about $N/2$ steps, as shown in Section 16.7.1, they observed the birth of a giant component that, in subsequent steps, swallows the smaller ones at a high rate. This phenomenon is called a phase transition and often occurs in nature. In physics it is studied in, for example, percolation theory.

To some extent, the Internet's graph bears resemblance to the Erdős–Rényi random graph. The Internet is best regarded as a dynamic and growing network, whose graph is continuously changing. Yet, in order to deploy services over the Internet, an accurate graph model that captures the relevant structural properties is desirable. As shown in Part III, a probabilistic approach based on random graphs seems an efficient way to learn about the Internet's intriguing behavior. Although the Internet's topology is not a simple Erdős–Rényi random graph, results such as the hopcount of the shortest path and the size of a multicast tree deduced from the simple random graphs provide a first-order estimate for the Internet. Moreover, analytic formulas based on classes of graphs other than the simple random graph prove difficult to obtain. This observation is similar to queueing theory, where, besides the M/G/x class of queues, hardly any closed-form expressions exist.

We hope that this brief overview provides sufficient motivation to surmount the mathematical barriers. Skill with probability theory is deemed necessary to understand complex phenomena in telecommunications. Once mastered, the power and beauty of mathematics will be appreciated.

Part I  Probability theory

2 Random variables

This chapter reviews basic concepts from probability theory. A random variable (rv) is a variable that takes certain values by chance. Throughout this book, this imprecise and intuitive definition suffices.
The precise definition involves axiomatic probability theory (Billingsley, 1995). Here, a distinction between discrete and continuous random variables is made, although a unified approach, which also includes mixed cases via the Stieltjes integral $\int g(x)\,dF(x)$ (Hardy et al., 1999, pp. 152–157), is possible. In general, the distribution $F_X(x) = \Pr[X \le x]$ holds in both cases, and

$$\int g(x)\,dF_X(x) = \begin{cases} \sum_k g(k) \Pr[X = k] & \text{where } X \text{ is a discrete rv} \\[4pt] \int g(x)\,\dfrac{dF_X(x)}{dx}\,dx & \text{where } X \text{ is a continuous rv} \end{cases}$$

In most practical situations, the Stieltjes integral reduces to the Riemann integral; otherwise, Lebesgue's theory of integration and measure theory (Royden, 1988) is required.

2.1 Probability theory and set theory

Pascal (1623–1662) is commonly regarded as one of the founders of probability theory. In his days, there was much interest in games of chance¹ and the likelihood of winning a game. In most of these games, there was a finite number $n$ of possible outcomes and each of them was equally likely. The probability of the event $A$ of interest was defined as

$$\Pr[A] = \frac{n_A}{n}$$

where $n_A$ is the number of favorable outcomes (sample points of $A$). If the number of outcomes of an experiment is not finite, this classical definition of probability does not suffice anymore. In order to establish a coherent and precise theory, probability theory employs concepts of group or set theory. The set of all possible outcomes of an experiment is called the sample space $\Omega$.

¹ "La règle des partis", a chapter in Pascal's mathematical work (Pascal, 1954), consists of a series of letters to Fermat that discuss the following problem (together with a more complex question that is essentially a variant of the probability of gambler's ruin treated in Section 11.2.1): Consider the game in which 2 dice are thrown $n$ times. How many times $n$ do we have to throw the 2 dice to throw double six with probability $p \ge 1/2$?
A possible outcome of an experiment is called a sample point $\omega$, an element of the sample space $\Omega$. An event $A$ consists of a set of sample points; an event $A$ is thus a subset of the sample space $\Omega$. The complement $A^c$ of an event $A$ consists of all sample points of the sample space $\Omega$ that are not in (the set) $A$, thus $A^c = \Omega \backslash A$. Clearly, $(A^c)^c = A$ and the complement of the sample space is the empty set, $\Omega^c = \emptyset$, or, vice versa, $\emptyset^c = \Omega$. A family $\mathcal{F}$ of events is a set of events and thus a subset of the sample space $\Omega$ that possesses particular events as elements. More precisely, a family $\mathcal{F}$ of events satisfies the three conditions that define a $\sigma$-field²: (a) $\emptyset \in \mathcal{F}$, (b) if $A_1, A_2, \ldots \in \mathcal{F}$, then $\cup_{j=1}^{\infty} A_j \in \mathcal{F}$ and (c) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$. These conditions guarantee that $\mathcal{F}$ is closed under countable unions and intersections of events.

Events and the probability of these events are connected by a probability measure $\Pr[\cdot]$ that assigns to each event of the family $\mathcal{F}$ of events of a sample space $\Omega$ a real number in the interval $[0,1]$. As Axiom 1, we require that $\Pr[\Omega] = 1$. If $\Pr[A] = 0$, the occurrence of the event $A$ is not possible, while $\Pr[A] = 1$ means that the event $A$ is certain to occur. If $\Pr[A] = p$ with $0 < p < 1$, the event $A$ has probability $p$ to occur. If the events $A$ and $B$ have no sample points in common, $A \cap B = \emptyset$, the events $A$ and $B$ are called mutually exclusive events. For example, an event and its complement are mutually exclusive because $A \cap A^c = \emptyset$. Axiom 2 of a probability measure is that, for mutually exclusive events $A$ and $B$, it holds that $\Pr[A \cup B] = \Pr[A] + \Pr[B]$. The definition of a probability measure and the two axioms are sufficient to build a consistent framework on which probability theory is founded. Since $\Pr[\emptyset] = 0$ (which follows from Axiom 2 because $A \cap \emptyset = \emptyset$ and $A = A \cup \emptyset$), for mutually exclusive events $A$ and $B$ it holds that $\Pr[A \cap B] = 0$.

As a classical example that explains the formal definitions, let us consider the experiment of throwing a fair die. The sample space consists of all possible outcomes: $\Omega = \{1, 2, 3, 4, 5, 6\}$. A particular outcome of the experiment, say $\omega = 3$, is a sample point $\omega \in \Omega$. One may be interested in the event $A$ where the outcome is even, in which case $A = \{2, 4, 6\}$ and $A^c = \{1, 3, 5\}$. If $A$ and $B$ are events, the union of these events can be written using set theory as

$$A \cup B = (A \cap B) \cup (A^c \cap B) \cup (A \cap B^c)$$

because $A \cap B$, $A^c \cap B$ and $A \cap B^c$ are mutually exclusive events. The relation is immediately understood by drawing a Venn diagram as in Fig. 2.1.

[Fig. 2.1. A Venn diagram illustrating the union $A \cup B$.]

Taking the probability measure of the union yields

$$\Pr[A \cup B] = \Pr[(A \cap B) \cup (A^c \cap B) \cup (A \cap B^c)] = \Pr[A \cap B] + \Pr[A^c \cap B] + \Pr[A \cap B^c] \qquad (2.1)$$

where the last relation follows from Axiom 2. Figure 2.1 shows that $A = (A \cap B) \cup (A \cap B^c)$ and $B = (A \cap B) \cup (A^c \cap B)$. Since the events are mutually exclusive, Axiom 2 states that

$$\Pr[A] = \Pr[A \cap B] + \Pr[A \cap B^c]$$
$$\Pr[B] = \Pr[A \cap B] + \Pr[A^c \cap B]$$

Substitution into (2.1) yields the important relation

$$\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B] \qquad (2.2)$$

Although derived for the measure $\Pr[\cdot]$, relation (2.2) also holds for other measures, for example, the cardinality (the number of elements) of a set.

² A field $\mathcal{F}$ possesses the properties: (i) $\Omega \in \mathcal{F}$; (ii) if $A, B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$ and $A \cap B \in \mathcal{F}$; (iii) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$. This definition is redundant, for we have by (ii) and (iii) that $(A \cup B)^c \in \mathcal{F}$. Further, by De Morgan's law, $(A \cup B)^c = A^c \cap B^c$, which can be deduced from Figure 2.1; again by (iii), the argument shows that the reduced statement (ii), if $A, B \in \mathcal{F}$ then $A \cup B \in \mathcal{F}$, is sufficient to also imply that $A \cap B \in \mathcal{F}$.
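For the die experiment, the decomposition (2.1) and relation (2.2) can be checked mechanically with finite sets. A minimal sketch (Python), in which the second event $B = \{4, 5, 6\}$ ("the outcome is at least four") is an illustrative choice not taken from the text, and $\Pr$ is the uniform measure $\Pr[A] = |A|/|\Omega|$:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                 # sample space of a fair die

def pr(event):
    """Uniform probability measure: Pr[A] = |A| / |Omega|, kept exact."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}                              # the outcome is even
B = {4, 5, 6}                              # illustrative event: outcome >= 4

# (2.1): the union split into three mutually exclusive parts
assert pr(A | B) == pr(A & B) + pr((omega - A) & B) + pr(A & (omega - B))
# (2.2): Pr[A u B] = Pr[A] + Pr[B] - Pr[A n B]
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
```

Using exact rationals (`Fraction`) rather than floats makes the two identities hold with equality, not merely up to rounding.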
2.1.1 The inclusion-exclusion formula

A generalization of the relation (2.2) is the inclusion-exclusion formula,

$$\Pr\left[\cup_{k=1}^{n} A_k\right] = \sum_{k_1=1}^{n} \Pr[A_{k_1}] - \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \Pr[A_{k_1} \cap A_{k_2}] + \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \sum_{k_3=k_2+1}^{n} \Pr[A_{k_1} \cap A_{k_2} \cap A_{k_3}] - \cdots + (-1)^{n-1} \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \cdots \sum_{k_n=k_{n-1}+1}^{n} \Pr\left[\cap_{j=1}^{n} A_{k_j}\right] \qquad (2.3)$$

The formula shows that the probability of the union consists of the sum of the probabilities of the individual events (first term). Since sample points can belong to more than one event $A_k$, the first term possesses double countings. The second term removes all probabilities of sample points that belong to precisely two event sets. However, by doing so (draw a Venn diagram), we also subtract the probabilities of sample points that belong to three event sets more than needed. The third term adds these again, and so on. The inclusion-exclusion formula can be written more compactly as

$$\Pr\left[\cup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} (-1)^{j-1} \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \cdots \sum_{k_j=k_{j-1}+1}^{n} \Pr\left[\cap_{m=1}^{j} A_{k_m}\right] \qquad (2.4)$$

or, with

$$S_j = \sum_{1 \le k_1 < k_2 < \cdots < k_j \le n} \Pr\left[\cap_{m=1}^{j} A_{k_m}\right]$$

as

$$\Pr\left[\cup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} (-1)^{j-1} S_j \qquad (2.5)$$

Proof of the inclusion-exclusion formula³: Let $A = \cup_{k=1}^{n-1} A_k$ and $B = A_n$, such that

³ Another proof (Grimmett and Stirzaker, 2001, p. 56) uses the indicator function defined in Section 2.2.1. Useful indicator function relations are $1_{A \cap B} = 1_A 1_B$, $1_{A^c} = 1 - 1_A$ and

$$1_{A \cup B} = 1 - 1_{(A \cup B)^c} = 1 - 1_{A^c \cap B^c} = 1 - 1_{A^c} 1_{B^c} = 1 - (1 - 1_A)(1 - 1_B) = 1_A + 1_B - 1_A 1_B = 1_A + 1_B - 1_{A \cap B}$$

Generalizing the last relation yields

$$1_{\cup_{k=1}^{n} A_k} = 1 - \prod_{k=1}^{n} (1 - 1_{A_k})$$

Multiplying out and taking the expectations using (2.13) leads to (2.3).
$A \cup B = \cup_{k=1}^{n} A_k$ and $A \cap B = A_n \cap \cup_{k=1}^{n-1} A_k = \cup_{k=1}^{n-1} (A_k \cap A_n)$ by the distributive law in set theory; then application of (2.2) yields the recursion in $n$

$$\Pr\left[\cup_{k=1}^{n} A_k\right] = \Pr\left[\cup_{k=1}^{n-1} A_k\right] + \Pr[A_n] - \Pr\left[\cup_{k=1}^{n-1} (A_k \cap A_n)\right] \qquad (2.6)$$

By direct substitution of $n \to n-1$, we have

$$\Pr\left[\cup_{k=1}^{n-1} A_k\right] = \Pr\left[\cup_{k=1}^{n-2} A_k\right] + \Pr[A_{n-1}] - \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_{n-1})\right]$$

while substitution in this formula of $A_k \to A_k \cap A_n$ gives

$$\Pr\left[\cup_{k=1}^{n-1} (A_k \cap A_n)\right] = \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n)\right] + \Pr[A_{n-1} \cap A_n] - \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n \cap A_{n-1})\right]$$

Substitution of the last two terms into (2.6) yields

$$\Pr\left[\cup_{k=1}^{n} A_k\right] = \Pr[A_{n-1}] + \Pr[A_n] - \Pr[A_{n-1} \cap A_n] + \Pr\left[\cup_{k=1}^{n-2} A_k\right] - \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_{n-1})\right] - \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n)\right] + \Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n \cap A_{n-1})\right] \qquad (2.7)$$

Similarly, in a next iteration, we use (2.6), after suitable modification, in the right-hand side of (2.7) to lower the upper index in the unions,

$$\Pr\left[\cup_{k=1}^{n-2} A_k\right] = \Pr\left[\cup_{k=1}^{n-3} A_k\right] + \Pr[A_{n-2}] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-2})\right]$$
$$\Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_{n-1})\right] = \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-1})\right] + \Pr[A_{n-2} \cap A_{n-1}] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-1} \cap A_{n-2})\right]$$
$$\Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n)\right] = \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n)\right] + \Pr[A_{n-2} \cap A_n] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-2})\right]$$
$$\Pr\left[\cup_{k=1}^{n-2} (A_k \cap A_n \cap A_{n-1})\right] = \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1})\right] + \Pr[A_{n-2} \cap A_n \cap A_{n-1}] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1} \cap A_{n-2})\right]$$

The result is

$$\Pr\left[\cup_{k=1}^{n} A_k\right] = \Pr[A_{n-2}] + \Pr[A_{n-1}] + \Pr[A_n] - \Pr[A_{n-2} \cap A_{n-1}] - \Pr[A_{n-2} \cap A_n] - \Pr[A_{n-1} \cap A_n] + \Pr[A_{n-2} \cap A_{n-1} \cap A_n] + \Pr\left[\cup_{k=1}^{n-3} A_k\right] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-2})\right] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-1})\right] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n)\right] + \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_{n-1} \cap A_{n-2})\right] + \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-2})\right] + \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1})\right] - \Pr\left[\cup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1} \cap A_{n-2})\right]$$

which starts revealing the structure of (2.3). Rather than continuing the iterations, we prove the validity of the inclusion-exclusion formula (2.3) via induction. In the case $n = 2$, the basic expression (2.2) is found.
Assume that (2.3) holds for $n$; then the case for $n+1$ must obey (2.6) with $n \to n+1$,

$$\Pr\left[\cup_{k=1}^{n+1} A_k\right] = \Pr\left[\cup_{k=1}^{n} A_k\right] + \Pr[A_{n+1}] - \Pr\left[\cup_{k=1}^{n} (A_k \cap A_{n+1})\right]$$

Substitution of (2.3) into both union terms of the above expression yields, after suitable grouping of the terms (for each order $j$, the $j$-fold intersections with indices in $\{1, \ldots, n\}$ combine with the $(j-1)$-fold intersections further intersected with $A_{n+1}$),

$$\Pr\left[\cup_{k=1}^{n+1} A_k\right] = \sum_{k_1=1}^{n+1} \Pr[A_{k_1}] - \sum_{k_1=1}^{n+1} \sum_{k_2=k_1+1}^{n+1} \Pr[A_{k_1} \cap A_{k_2}] + \sum_{k_1=1}^{n+1} \sum_{k_2=k_1+1}^{n+1} \sum_{k_3=k_2+1}^{n+1} \Pr[A_{k_1} \cap A_{k_2} \cap A_{k_3}] - \cdots + (-1)^{n} \sum_{k_1=1}^{n+1} \cdots \sum_{k_{n+1}=k_n+1}^{n+1} \Pr\left[\cap_{j=1}^{n+1} A_{k_j}\right]$$

which proves (2.3). □

Although impressive, the inclusion-exclusion formula is, because of its general nature, useful when dealing with dependent random variables. In particular, if $\Pr[\cap_{m=1}^{j} A_{k_m}] = a_j$, independent of the specific indices $k_m$, the inclusion-exclusion formula (2.4) becomes more attractive,

$$\Pr\left[\cup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} (-1)^{j-1} a_j \sum_{1 \le k_1 < k_2 < \cdots < k_j \le n} 1 = \sum_{j=1}^{n} (-1)^{j-1} \binom{n}{j} a_j$$

An application of the latter formula to multicast can be found in Chapter 17 and many others are in Feller (1970, Chapter IV).

Sometimes it is useful to reason with the complement of the union, $(\cup_{k=1}^{n} A_k)^c = \Omega \backslash \cup_{k=1}^{n} A_k = \cap_{k=1}^{n} A_k^c$. Applying Axiom 2 to $(\cup_{k=1}^{n} A_k)^c \cup (\cup_{k=1}^{n} A_k) = \Omega$,

$$\Pr\left[(\cup_{k=1}^{n} A_k)^c\right] = \Pr[\Omega] - \Pr\left[\cup_{k=1}^{n} A_k\right]$$

and using Axiom 1 and the inclusion-exclusion formula (2.5), we obtain

$$\Pr\left[(\cup_{k=1}^{n} A_k)^c\right] = 1 - \sum_{j=1}^{n} (-1)^{j-1} S_j = \sum_{j=0}^{n} (-1)^{j} S_j \qquad (2.8)$$

with the convention that $S_0 = 1$. Boole's inequalities

$$\Pr\left[\cup_{k=1}^{n} A_k\right] \le \sum_{k=1}^{n} \Pr[A_k] \qquad (2.9)$$
$$\Pr\left[\cap_{k=1}^{n} A_k\right] \ge 1 - \sum_{k=1}^{n} \Pr[A_k^c]$$

are derived as consequences of the inclusion-exclusion formula (2.3).
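The identities (2.5) and (2.8) and Boole's inequality (2.9) can be verified by brute force on arbitrarily chosen events in a small finite sample space. A minimal sketch (Python; the randomly drawn events and the sample-space size are illustrative choices), again with exact rationals so that the equalities are exact:

```python
import random
from fractions import Fraction
from itertools import combinations

random.seed(7)
N = 60
events = [set(random.sample(range(N), random.randint(5, 25)))
          for _ in range(4)]                    # four arbitrary events
pr = lambda e: Fraction(len(e), N)              # uniform measure on {0,...,N-1}

union = set().union(*events)
# S_j as in (2.5): sum of Pr over all j-wise intersections of the events
S = [sum(pr(set.intersection(*c)) for c in combinations(events, j))
     for j in range(1, len(events) + 1)]

# (2.5): Pr[union] = S_1 - S_2 + S_3 - S_4
assert pr(union) == sum((-1) ** i * s for i, s in enumerate(S))
# (2.8): Pr[complement] = sum_{j>=0} (-1)^j S_j with S_0 = 1
assert 1 - pr(union) == sum((-1) ** j * s
                            for j, s in enumerate([Fraction(1)] + S))
# Boole's inequality (2.9)
assert pr(union) <= sum(pr(e) for e in events)
```

Because the events overlap, the Boole bound is strict here; it collapses to equality only for mutually exclusive events.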
The equality sign in (2.9) holds only if all events are mutually exclusive; the inequality sign follows from the fact that possible overlaps between events are, in contrast to the inclusion-exclusion formula (2.3), not subtracted. The inclusion-exclusion formula is of a more general nature and also applies to measures on sets other than $\Pr[\,\cdot\,]$, for example to the cardinality, as mentioned above. For the cardinality of a set $A$, usually denoted by $|A|$, the inclusion-exclusion variant of (2.8) is

$$\left|\left(\cup_{k=1}^{n} A_k\right)^c\right| = \sum_{m=0}^{n} (-1)^{m} |S_m| \qquad (2.10)$$

where the total number of elements in the sample space is $|S_0| = N$ and

$$|S_m| = \sum_{1 \le k_1 < k_2 < \cdots < k_m \le n} \left|\cap_{p=1}^{m} A_{k_p}\right|$$

A nice illustration of formula (2.10) is the sieve of Eratosthenes (Hardy and Wright, 1968, p. 4), a procedure to construct the table of prime numbers⁴ up to $N$. Consider the increasing sequence of integers $\Omega = \{2, 3, 4, \ldots, N\}$ and remove successively all multiples of 2 (even numbers starting from 4, 6, ...), all multiples of 3 (starting from $3^2$ and not yet removed previously), all multiples of 5, all multiples of the next number larger than 5 that is still in the list (which is the prime 7), and so on, up to all multiples of the largest possible prime divisor that is equal to or smaller than $\left[\sqrt{N}\right]$. Here $[x]$ is the largest integer smaller than or equal to $x$. The remaining numbers in the list are prime numbers. Let us now compute the number of primes $\pi(N)$ smaller than or equal to $N$ by using the inclusion-exclusion formula (2.10).

Footnote 4: An integer $p$ is prime if $p > 1$ and $p$ has no other integer divisors than 1 and $p$ itself. The sequence of the first primes is 2, 3, 5, 7, 11, 13, etc. If $a$ and $b$ are divisors of $n$, then $n = ab$, from which it follows that $a$ and $b$ cannot both exceed $\sqrt{n}$. Hence, any composite number $n$ is divisible by a prime $p$ that does not exceed $\sqrt{n}$.
The number of primes smaller than or equal to a real number $x$ is $\pi(x)$ and, evidently, if $p_n$ denotes the $n$-th prime, then $\pi(p_n) = n$. Let $A_k$ denote the set of the multiples of the $k$-th prime $p_k$ that belong to $\Omega$. The number of such sets $A_k$ in the sieve of Eratosthenes equals the number of primes smaller than or equal to $\sqrt{N}$, hence, $n = \pi\left(\sqrt{N}\right)$. If $q \in \left(\cup_{k=1}^{n} A_k\right)^c$, this means that $q$ is not divisible by any prime number smaller than or equal to $p_n$ and that $q$ is a prime number lying in $\sqrt{N} < q \le N$. The cardinality of the set $\left(\cup_{k=1}^{n} A_k\right)^c$, the number of primes $q$ with $\sqrt{N} < q \le N$, is

$$\left|\left(\cup_{k=1}^{n} A_k\right)^c\right| = \pi(N) - \pi\left(\sqrt{N}\right)$$

On the other hand, if $r \in \cap_{p=1}^{m} A_{k_p}$ for $1 \le k_1 < k_2 < \cdots < k_m \le n$, then $r$ is a multiple of $p_{k_1} p_{k_2} \cdots p_{k_m}$, and the number of multiples of the integer $p_{k_1} p_{k_2} \cdots p_{k_m}$ in $\Omega$ is

$$\left|\cap_{p=1}^{m} A_{k_p}\right| = \left[\frac{N}{p_{k_1} p_{k_2} \cdots p_{k_m}}\right]$$

Applying the inclusion-exclusion formula (2.10) with $|\Omega| = S_0 = N - 1$ and $n = \pi\left(\sqrt{N}\right)$ gives

$$\pi(N) - \pi\left(\sqrt{N}\right) = N - 1 + \sum_{m=1}^{n} (-1)^{m} \sum_{1 \le k_1 < k_2 < \cdots < k_m \le n} \left[\frac{N}{p_{k_1} p_{k_2} \cdots p_{k_m}}\right]$$

The knowledge of the prime numbers smaller than or equal to $\left[\sqrt{N}\right]$, i.e. the first $n = \pi\left(\sqrt{N}\right)$ primes, suffices to compute the number of primes $\pi(N)$ smaller than or equal to $N$, without explicitly knowing the primes $q$ lying in $\sqrt{N} < q \le N$.

2.2 Discrete random variables

Discrete random variables are real functions $X$ defined on a discrete probability space $\Omega$ as $X : \Omega \to \mathbb{R}$ with the property that the event $\{\omega \in \Omega : X(\omega) = x\} \in \mathcal{F}$ for each $x \in \mathbb{R}$. The event $\{\omega \in \Omega : X(\omega) = x\}$ is further abbreviated as $\{X = x\}$. A discrete probability density function (pdf) $\Pr[X = x]$ has the following properties: (i) $0 \le \Pr[X = x] \le 1$ for the real $x$ that are possible outcomes of an experiment. The set of values $x$ can be finite or countably infinite and constitutes the discrete probability space.
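As a numerical cross-check of the sieve argument above, the following sketch (our own code, not from the book; all function names are ours) computes $\pi(N)$ from only the first $\pi(\sqrt{N})$ primes via the inclusion-exclusion formula and compares it with a direct sieve:

```python
from math import isqrt
from itertools import combinations

def primes_up_to(n):
    """Classic sieve of Eratosthenes: return all primes <= n."""
    mark = [True] * (n + 1)
    mark[0:2] = [False, False]
    for p in range(2, isqrt(n) + 1):
        if mark[p]:
            mark[p * p::p] = [False] * len(mark[p * p::p])
    return [p for p, is_p in enumerate(mark) if is_p]

def pi_inclusion_exclusion(N):
    """Count primes <= N using only the primes <= sqrt(N),
    via the inclusion-exclusion formula (2.10)."""
    small = primes_up_to(isqrt(N))   # the first n = pi(sqrt(N)) primes
    n = len(small)
    # |(union A_k)^c| = pi(N) - pi(sqrt(N)) = sum_m (-1)^m |S_m|, |S_0| = N - 1
    total = N - 1
    for m in range(1, n + 1):
        for combo in combinations(small, m):
            prod = 1
            for p in combo:
                prod *= p
            total += (-1) ** m * (N // prod)
    return total + n                 # add back pi(sqrt(N)) = n
```

For example, `pi_inclusion_exclusion(100)` uses only the primes 2, 3, 5, 7 and agrees with the direct count of 25 primes below 100.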
(ii) $\sum_{x} \Pr[X = x] = 1$.

In the classical example of throwing a die, the discrete probability space is $\Omega = \{1, 2, 3, 4, 5, 6\}$ and, since each of the six faces of the (fair) die is equally possible as outcome, $\Pr[X = x] = \frac{1}{6}$ for each $x \in \Omega$.

2.2.1 The expectation

An important operator acting on a discrete random variable $X$ is the expectation, defined as

$$E[X] = \sum_{x} x \Pr[X = x] \qquad (2.11)$$

The expectation $E[X]$ is also called the mean or average or first moment of $X$. More generally, if $X$ is a discrete random variable and $g$ is a function, then $Y = g(X)$ is also a discrete random variable with expectation $E[Y]$ equal to

$$E[g(X)] = \sum_{x} g(x) \Pr[X = x] \qquad (2.12)$$

A special and often used function in probability theory is the indicator function $1_y$, defined as 1 if the condition $y$ is true and zero otherwise. For example,

$$E[1_{X > a}] = \sum_{x} 1_{x > a} \Pr[X = x] = \sum_{x > a} \Pr[X = x] = \Pr[X > a]$$

$$E[1_{X = a}] = \Pr[X = a] \qquad (2.13)$$

The higher moments of a random variable are defined as the case where $g(x) = x^n$,

$$E[X^n] = \sum_{x} x^n \Pr[X = x] \qquad (2.14)$$

From the definition (2.11), it follows that the expectation is a linear operator,

$$E\left[\sum_{k=1}^{n} a_k X_k\right] = \sum_{k=1}^{n} a_k E[X_k]$$

The variance of $X$ is defined as

$$\mathrm{Var}[X] = E\left[(X - E[X])^2\right] \qquad (2.15)$$

The variance is always non-negative. Using the linearity of the expectation operator and $\mu = E[X]$, we rewrite (2.15) as

$$\mathrm{Var}[X] = E\left[X^2\right] - \mu^2 \qquad (2.16)$$

Since $\mathrm{Var}[X] \ge 0$, relation (2.16) indicates that $E\left[X^2\right] \ge (E[X])^2$. Often the standard deviation, defined as $\sigma = \sqrt{\mathrm{Var}[X]}$, is used. An interesting variational principle of the variance follows, for a real variable $u$, from

$$E\left[(X - u)^2\right] = E\left[(X - \mu)^2\right] + (u - \mu)^2$$

which is minimized at $u = \mu = E[X]$ with value $\mathrm{Var}[X]$. Hence, the best least-square approximation of the random variable $X$ by a single number is $E[X]$.
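The die example can be verified exactly with rational arithmetic. A small illustration (not from the book) evaluating (2.11), (2.13) and (2.16):

```python
from fractions import Fraction

# pmf of a fair die: Pr[X = x] = 1/6 for x in 1..6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())            # (2.11): 7/2
second = sum(x**2 * p for x, p in pmf.items())       # (2.14) with n = 2: 91/6
var = second - mean**2                               # (2.16): 35/12
tail = sum(p for x, p in pmf.items() if x > 4)       # E[1_{X>4}] = Pr[X > 4] = 1/3
```

The variational principle can be confirmed the same way: $E[(X-u)^2]$ evaluated at any $u \ne 7/2$ exceeds $35/12$.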
2.2.2 The probability generating function

The probability generating function (pgf) of a discrete random variable $X$ is defined, for complex $z$, as

$$\varphi_X(z) = E\left[z^X\right] = \sum_{x} z^x \Pr[X = x] \qquad (2.17)$$

where the last equality follows from (2.12). If $X$ is integer-valued and non-negative, then the pgf is the Taylor expansion of the complex function $\varphi_X(z)$. Commonly the latter restriction applies; otherwise the substitution $z = e^{it}$ is used, such that (2.17) expresses the Fourier series of $\varphi_X\left(e^{it}\right)$. The importance of the pgf mainly lies in the fact that the theory of functions can be applied. Numerous examples of the power of analysis will be illustrated. Concentrating on non-negative integer random variables $X$,

$$\varphi_X(z) = \sum_{k=0}^{\infty} \Pr[X = k] z^k \qquad (2.18)$$

and the Taylor coefficients obey

$$\Pr[X = k] = \frac{1}{k!} \left.\frac{d^k \varphi_X(z)}{dz^k}\right|_{z=0} \qquad (2.19)$$

$$= \frac{1}{2\pi i} \int_{C(0)} \frac{\varphi_X(z)}{z^{k+1}}\, dz \qquad (2.20)$$

where $C(0)$ denotes a contour around $z = 0$. Both are inversion formulae⁵.

Footnote 5: A similar inversion formula exists for Fourier series (see e.g. Titchmarsh (1948)).

Since the general form $E[g(X)]$ is completely defined when $\Pr[X = x]$ is known, the knowledge of the pgf results in a complete alternative description,

$$E[g(X)] = \sum_{k=0}^{\infty} \frac{g(k)}{k!} \left.\frac{d^k \varphi_X(z)}{dz^k}\right|_{z=0} \qquad (2.21)$$

Sometimes it is more convenient to compute values of interest directly from (2.17) rather than from (2.21). For example, $n$-fold differentiation of $\varphi_X(z) = E\left[z^X\right]$ yields

$$\frac{1}{n!} \frac{d^n \varphi_X(z)}{dz^n} = \frac{1}{n!} E\left[X(X-1)\cdots(X-n+1)\, z^{X-n}\right] = E\left[\binom{X}{n} z^{X-n}\right]$$

such that
$$E\left[\binom{X}{n}\right] = \frac{1}{n!} \left.\frac{d^n \varphi_X(z)}{dz^n}\right|_{z=1} \qquad (2.22)$$

Similarly, let $z = e^t$; then

$$\frac{d^n \varphi_X\left(e^t\right)}{dt^n} = E\left[X^n e^{tX}\right]$$

from which the moments follow as

$$E[X^n] = \left.\frac{d^n \varphi_X\left(e^t\right)}{dt^n}\right|_{t=0} \qquad (2.23)$$

and, more generally,

$$E[(X - a)^n] = \left.\frac{d^n \left(e^{-ta} \varphi_X\left(e^t\right)\right)}{dt^n}\right|_{t=0} \qquad (2.24)$$

2.2.3 The logarithm of the probability generating function

The logarithm of the probability generating function is defined as

$$L_X(z) = \log(\varphi_X(z)) = \log\left(E\left[z^X\right]\right) \qquad (2.25)$$

from which $L_X(1) = 0$ because $\varphi_X(1) = 1$. The derivative $L'_X(z) = \frac{\varphi'_X(z)}{\varphi_X(z)}$ shows that $L'_X(1) = \varphi'_X(1)$, while from $L''_X(z) = \frac{\varphi''_X(z)}{\varphi_X(z)} - \left(\frac{\varphi'_X(z)}{\varphi_X(z)}\right)^2$ it follows that $L''_X(1) = \varphi''_X(1) - \left(\varphi'_X(1)\right)^2$. These first few derivatives are interesting because they are related directly to probabilistic quantities. Indeed, from (2.23), we observe that

$$E[X] = \varphi'_X(1) = L'_X(1) \qquad (2.26)$$

and, with $E\left[X^2\right] = \varphi''_X(1) + \varphi'_X(1)$,

$$\mathrm{Var}[X] = \varphi''_X(1) + \varphi'_X(1) - \left(\varphi'_X(1)\right)^2 = L''_X(1) + L'_X(1) \qquad (2.27)$$

2.3 Continuous random variables

Although most of the concepts defined above for discrete random variables are readily transferred to continuous random variables, the calculus is in general more difficult. Instead of reasoning on the pdf, it is more convenient to work with the probability distribution function, defined for both discrete and continuous random variables as

$$F_X(x) = \Pr[X \le x] \qquad (2.28)$$

Clearly, $\lim_{x \to -\infty} F_X(x) = 0$, while $\lim_{x \to +\infty} F_X(x) = 1$. Further, $F_X(x)$ is non-decreasing in $x$ and

$$\Pr[a < X \le b] = F_X(b) - F_X(a) \qquad (2.29)$$

This relation follows from the observations $\{X \le a\} \cup \{a < X \le b\} = \{X \le b\}$ and $\{X \le a\} \cap \{a < X \le b\} = \emptyset$. For mutually exclusive events $A \cap B = \emptyset$, Axiom 2 in Section 2.1 states that $\Pr[A \cup B] = \Pr[A] + \Pr[B]$, which proves (2.29). As a corollary of (2.29), $F_X(x)$ is continuous from the right, which follows from (2.29) by taking $a = b - \epsilon$ for any $\epsilon > 0$ and letting $\epsilon \downarrow 0$. Less precisely, it follows from the equality sign at the right, $X \le b$, and the strict inequality at the left, $a < X$.
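For a non-negative integer random variable the pgf is a polynomial (or power series), so (2.26) and (2.27) can be checked by differentiating the coefficient list directly. A small sketch for the fair-die pgf (helper names are ours, purely illustrative):

```python
from fractions import Fraction

# pgf of a fair die as coefficient list: phi(z) = sum_k Pr[X = k] z^k, (2.18)
coeff = [Fraction(0)] * 7
for k in range(1, 7):
    coeff[k] = Fraction(1, 6)

def derivative(c):
    """Coefficient list of the derivative of the polynomial with coefficients c."""
    return [k * c[k] for k in range(1, len(c))]

def value_at(c, z):
    return sum(ck * z**k for k, ck in enumerate(c))

d1 = derivative(coeff)
d2 = derivative(d1)
mean = value_at(d1, 1)                    # E[X] = phi'(1), (2.26)
var = value_at(d2, 1) + mean - mean**2    # Var[X] = phi''(1) + phi'(1) - phi'(1)^2, (2.27)
```

Both results agree with the direct computation from the pmf: mean $7/2$, variance $35/12$.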
Hence, $F_X(x)$ is not necessarily continuous from the left, which implies that $F_X(x)$ is not necessarily continuous and may possess jumps. But even if $F_X(x)$ is continuous, the pdf is not necessarily continuous⁶. The pdf of a continuous random variable $X$ is defined as

$$f_X(x) = \frac{dF_X(x)}{dx} \qquad (2.30)$$

Footnote 6: Weierstrass was the first to present a continuous non-differentiable function,

$$f(x) = \sum_{n=0}^{\infty} b^n \cos(a^n x)$$

where $0 < b < 1$ and $a$ is an odd positive integer. Since the series is uniformly convergent for any $x$, $f(x)$ is continuous everywhere. Titchmarsh (1964, Chapter IX) demonstrates for $ab > 1 + \frac{3\pi}{2}$ that $\frac{f(x+h) - f(x)}{h}$ takes arbitrarily large values, such that $f'(x)$ does not exist. Another class of continuous non-differentiable functions are the sample paths of a Brownian motion. The Cantor function, discussed in (Berger, 1993, p. 21) and (Billingsley, 1995, p. 407), is another classical, noteworthy function with peculiar properties.

Assuming that $F_X(x)$ is differentiable at $x$, we have from (2.29), for small, positive $\Delta x$,

$$\Pr[x < X \le x + \Delta x] = F_X(x + \Delta x) - F_X(x) = \frac{dF_X(x)}{dx} \Delta x + O\left((\Delta x)^2\right)$$

Using the definition (2.30) indicates that, if $F_X(x)$ is differentiable at $x$,

$$f_X(x) = \lim_{\Delta x \to 0} \frac{\Pr[x < X \le x + \Delta x]}{\Delta x} \qquad (2.31)$$

If $f_X(x)$ is finite, then $\lim_{\Delta x \to 0} \Pr[x < X \le x + \Delta x] = \Pr[X = x] = 0$, which means that for well-behaved continuous random variables $X$ (i.e. $F_X(x)$ differentiable for most $x$), the probability that $X$ precisely equals $x$ is zero⁷. Hence, for well-behaved continuous random variables where $\Pr[X = x] = 0$ for all $x$, the inequality signs in the general formula (2.29) can be relaxed,

$$\Pr[a < X \le b] = \Pr[a \le X \le b] = \Pr[a \le X < b] = \Pr[a < X < b]$$

If $f_X(x)$ is not finite, then $F_X(x)$ is not differentiable at $x$, such that

$$\lim_{\Delta x \to 0} F_X(x + \Delta x) - F_X(x) = \Delta F_X(x) \ne 0$$

This means that $F_X(x)$ jumps upwards at $x$ by $\Delta F_X(x)$. In that case, there is a probability mass with magnitude $\Delta F_X(x)$ at the point $x$.
Although the second definition (2.31) is, strictly speaking, not valid in that case, one sometimes denotes the pdf at $y = x$ by $f_X(y) = \Delta F_X(x)\, \delta(y - x)$, where $\delta(x)$ is the Dirac impulse or delta function, with the basic property that $\int_{-\infty}^{+\infty} \delta(y - x)\, dy = 1$.

Footnote 7: In Lebesgue measure theory (Titchmarsh, 1964; Billingsley, 1995), it is said that a countable, finite or enumerable set (i.e. function evaluations at individual points) is measurable, but its measure is zero.

Even apart from the above-mentioned difficulties for certain classes of non-differentiable but continuous functions, the fact that probabilities are always confined to the region $[0,1]$ may suggest that $0 \le f_X(x) \le 1$. However, the second definition (2.31) shows that $f_X(x)$ can be much larger than 1. For example, if $X$ is a Gaussian random variable with mean $\mu$ and variance $\sigma^2$ (see Section 3.2.3), then $f_X(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$ can be made arbitrarily large. In fact,

$$\lim_{\sigma \to 0} \frac{\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)}{\sigma\sqrt{2\pi}} = \delta(x - \mu)$$

2.3.1 Transformation of random variables

It frequently appears useful to know how to compute $F_Y(x)$ for $Y = g(X)$. Only if the inverse function $g^{-1}$ exists, the event $\{g(X) \le x\}$ is equivalent to $\left\{X \le g^{-1}(x)\right\}$ if $\frac{dg}{dx} > 0$, and to $\left\{X > g^{-1}(x)\right\}$ if $\frac{dg}{dx} < 0$. Hence,

$$F_Y(x) = \Pr[g(X) \le x] = \begin{cases} F_X\left(g^{-1}(x)\right), & \frac{dg}{dx} > 0 \\ 1 - F_X\left(g^{-1}(x)\right), & \frac{dg}{dx} < 0 \end{cases} \qquad (2.32)$$

For well-behaved continuous random variables, we may rewrite (2.31) in terms of differentials, $f_X(x)\, dx = \Pr[x \le X \le x + dx]$ and, similarly for $f_Y(y)$, $f_Y(y)\, dy = \Pr[y \le Y = g(X) \le y + dy]$. If $g$ is increasing, then the event $\{y \le g(X) \le y + dy\}$ is equivalent to $\left\{g^{-1}(y) \le X \le g^{-1}(y + dy)\right\} = \{x \le X \le x + dx\}$, such that $f_Y(y)\, dy = f_X(x)\, dx$. If $g$ is decreasing, we find that $f_Y(y)\, dy = -f_X(x)\, dx$. Thus, if $g^{-1}$ and $g'$ exist, then the relation between the pdf of a well-behaved continuous random variable $X$ and that of the transformed random variable $Y = g(X)$ is

$$f_Y(y) = f_X(x) \left|\frac{dx}{dy}\right| = \frac{f_X(x)}{|g'(x)|}$$

This expression also follows by straightforward differentiation of (2.32).
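The increasing branch of (2.32) underlies inverse-transform sampling. As an illustrative check (our own code, not from the book): take $X$ uniform on $(0,1)$ and the increasing map $g(x) = -\log(1-x)$, so that $Y = g(X)$ is exponential with rate 1. Both $F_Y$ from (2.32) and $f_Y$ from the pdf relation then reduce to the exponential forms:

```python
import math

# X uniform on (0,1): F_X(x) = x, f_X(x) = 1.  Increasing map g and its inverse:
g = lambda x: -math.log(1.0 - x)
g_inv = lambda y: 1.0 - math.exp(-y)
g_prime = lambda x: 1.0 / (1.0 - x)

for y in (0.1, 0.5, 1.0, 3.0):
    x = g_inv(y)
    F_Y = x                    # (2.32), dg/dx > 0: F_Y(y) = F_X(g^{-1}(y))
    f_Y = 1.0 / g_prime(x)     # f_Y(y) = f_X(x) / |g'(x)|
    assert abs(F_Y - (1.0 - math.exp(-y))) < 1e-12   # exponential cdf
    assert abs(f_Y - math.exp(-y)) < 1e-12           # exponential pdf
```

The same computation, read backwards, is how uniform pseudo-random numbers are turned into samples of a prescribed continuous distribution.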
The chi-square distribution introduced in Section 3.3.3 is a nice example of the transformation of random variables.

2.3.2 The expectation

Analogously to the discrete case, we define the expectation of a continuous random variable as

$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx \qquad (2.33)$$

In addition, for the expectation to exist⁸, we require that $\int_{-\infty}^{\infty} |x| f_X(x)\, dx < \infty$. If $X$ is a continuous random variable and $g$ is a continuous function, then $Y = g(X)$ is also a continuous random variable with expectation $E[Y]$ equal to

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx \qquad (2.34)$$

Footnote 8: This requirement is borrowed from measure theory and Lebesgue integration (Titchmarsh, 1964, Chapter X; Royden, 1988, Chapter 4), where a measurable function is said to be integrable (in the Lebesgue sense) over $A$ if $f^+ = \max(f(x), 0)$ and $f^- = \max(-f(x), 0)$ are both integrable over $A$. Although this restriction seems only of theoretical interest, in some applications (see the Cauchy distribution defined in (3.38)) the Riemann integral may exist where the Lebesgue integral does not. For example, $\int_0^{\infty} \frac{\sin x}{x}\, dx$ equals, in the Riemann sense, $\frac{\pi}{2}$ (a standard exercise in contour integration), but this integral does not exist in the Lebesgue sense. Only for improper integrals (where the integration interval is infinite) may Riemann integration exist where Lebesgue integration does not. In most other cases (integration over a finite interval), Lebesgue integration is more general. For instance, if $f(x) = 1_{x \text{ is rational}}$, then $\int_0^1 f(u)\, du$ does not exist in the Riemann sense (since upper and lower sums do not converge to each other). However, $\int_0^1 f(u)\, du = 0$ in the Lebesgue sense (since $f$ differs from 0 only on a set of measure zero, namely the rational numbers in $[0,1]$). In probability theory and measure theory, Lebesgue integration is assumed.

It is often useful to express the expectation $E[X]$ of a non-negative random variable $X$ in tail probabilities. Upon integration by parts,

$$E[X] = \int_0^{\infty} x f_X(x)\, dx = \left. -x \int_x^{\infty} f_X(u)\, du \right|_0^{\infty} + \int_0^{\infty} dx \int_x^{\infty} f_X(u)\, du = \int_0^{\infty} \left(1 - F_X(x)\right) dx \qquad (2.35)$$

The case of a non-positive random variable $X$ is derived analogously,

$$E[X] = \int_{-\infty}^{0} x f_X(x)\, dx = \left. x \int_{-\infty}^{x} f_X(u)\, du \right|_{-\infty}^{0} - \int_{-\infty}^{0} dx \int_{-\infty}^{x} f_X(u)\, du = -\int_{-\infty}^{0} F_X(x)\, dx$$

The general case follows by addition:

$$E[X] = \int_0^{\infty} \left(1 - F_X(x)\right) dx - \int_{-\infty}^{0} F_X(x)\, dx$$

A similar expression exists for discrete random variables.
In general, for any discrete integer-valued random variable $X$, we can write

$$\begin{aligned}
E[X] &= \sum_{k=-\infty}^{\infty} k \Pr[X = k] = \sum_{k=-\infty}^{-1} k \Pr[X = k] + \sum_{k=0}^{\infty} k \Pr[X = k] \\
&= \sum_{k=-\infty}^{-1} k \left(\Pr[X \le k] - \Pr[X \le k-1]\right) + \sum_{k=0}^{\infty} k \left(\Pr[X \ge k] - \Pr[X \ge k+1]\right) \\
&= \sum_{k=-\infty}^{-2} k \Pr[X \le k] - \Pr[X \le -1] - \sum_{k=-\infty}^{-2} (k+1) \Pr[X \le k] + \sum_{k=1}^{\infty} k \Pr[X \ge k] - \sum_{k=1}^{\infty} (k-1) \Pr[X \ge k] \\
&= -\sum_{k=-\infty}^{-2} \Pr[X \le k] - \Pr[X \le -1] + \sum_{k=1}^{\infty} \Pr[X \ge k]
\end{aligned}$$

or, the mean of a discrete random variable $X$ expressed in tail probabilities is⁹

$$E[X] = \sum_{k=1}^{\infty} \Pr[X \ge k] - \sum_{k=-\infty}^{-1} \Pr[X \le k] \qquad (2.36)$$

2.3.3 The probability generating function

The probability generating function (pgf) of a continuous random variable $X$ is defined, for complex $z$, as the (double-sided) Laplace transform

$$\varphi_X(z) = E\left[e^{-zX}\right] = \int_{-\infty}^{\infty} e^{-zt} f_X(t)\, dt \qquad (2.37)$$

Again, in some cases it may be more convenient to use $z = iu$, in which case the double-sided Laplace transform reduces to a Fourier transform.
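Formula (2.36) can be checked exactly on a small integer-valued pmf with negative support (the numbers below are ours, purely illustrative; exact rational arithmetic):

```python
from fractions import Fraction

# A small integer-valued pmf taking negative and positive values
pmf = {-2: Fraction(1, 8), -1: Fraction(1, 4), 0: Fraction(1, 8),
       1: Fraction(1, 4), 3: Fraction(1, 4)}
assert sum(pmf.values()) == 1

mean = sum(k * p for k, p in pmf.items())

# E[X] = sum_{k>=1} Pr[X >= k] - sum_{k<=-1} Pr[X <= k], (2.36)
up = sum(sum(p for j, p in pmf.items() if j >= k) for k in range(1, 4))
down = sum(sum(p for j, p in pmf.items() if j <= k) for k in range(-2, 0))
assert mean == up - down
```

Here the positive tail sums contribute $1$ and the negative tail sums $1/2$, recovering $E[X] = 1/2$.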
The strength of these transforms is based on their numerous properties, especially the inverse transform,

$$f_X(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \varphi_X(z) e^{zt}\, dz \qquad (2.38)$$

where $c$ is the smallest real value of $\operatorname{Re}(z)$ for which the integral in (2.37) converges. Similarly as for discrete random variables, we have $e^{za} \varphi_X(z) = E\left[e^{-z(X-a)}\right]$ and

$$E[(X - a)^n] = (-1)^n \left.\frac{d^n \left(e^{za} \varphi_X(z)\right)}{dz^n}\right|_{z=0} \qquad (2.39)$$

Footnote 9: We remark that

$$E[X] = \sum_{k=-\infty}^{\infty} k \Pr[X = k] = \sum_{k=-\infty}^{\infty} k \left(\Pr[X \ge k] - \Pr[X \ge k+1]\right) \ne \sum_{k=-\infty}^{\infty} k \Pr[X \ge k] - \sum_{k=-\infty}^{\infty} k \Pr[X \ge k+1] = \sum_{k=-\infty}^{\infty} \Pr[X \ge k]$$

because the series in the second line are diverging. In fact, there exists a finite integer $k^*$ such that, for any real arbitrarily small $\epsilon > 0$, it holds that $\Pr[X \ge k^*] = 1 - \epsilon$ and $\Pr[X \ge k^*] \le \Pr[X \ge k]$ for all $k < k^*$. Hence,

$$E[X] = \sum_{k=-\infty}^{k^*} \Pr[X \ge k] + \sum_{k=k^*}^{\infty} \Pr[X \ge k] \ge (1 - \epsilon) \sum_{k=-\infty}^{k^*} 1 + c \to \infty$$

where $\sum_{k=k^*}^{\infty} \Pr[X \ge k] = c$ is finite. Also, even for negative $X$, $\sum_{k=-\infty}^{\infty} \Pr[X \ge k]$ is always positive.

The main difference with the discrete case lies in the definition $E\left[e^{-zX}\right]$ (continuous) versus $E\left[z^X\right]$ (discrete). Since the exponential is an entire function¹⁰ with power series around $z = 0$, $e^{-zX} = \sum_{k=0}^{\infty} \frac{(-1)^k X^k}{k!} z^k$, the expectation and summation can be reversed, leading to

$$E\left[e^{-zX}\right] = \sum_{k=0}^{\infty} \frac{(-1)^k E\left[X^k\right]}{k!} z^k \qquad (2.40)$$

provided¹¹ $E\left[X^k\right] = O(k!)$, which is a necessary condition for the summation to converge for $z \ne 0$. Assuming convergence¹², the Taylor series of $E\left[e^{-zX}\right]$ around $z = 0$ is expressed as a function of the moments of $X$, whereas in the discrete case, the Taylor series of $E\left[z^X\right]$ around $z = 0$, given by (2.18), is expressed in terms of probabilities of $X$. This observation has led to calling $E\left[e^{-zX}\right]$ sometimes the moment generating function, while $E\left[z^X\right]$ is the probability generating function of the random variable $X$.

Footnote 10: An entire (or integral) function is a complex function without singularities in the finite complex plane. Hence, a power series around any finite point has an infinite radius of convergence; in other words, it exists for all finite complex values.

Footnote 11: The Landau big-$O$ notation specifies the "order of a function" when the argument tends to some limit. Most often the limit is to infinity, but the $O$-notation can also be used to characterize the behavior of a function around some finite point. Formally, $f(x) = O(g(x))$ for $x \to \infty$ means that there exist positive numbers $c$ and $x_0$ for which $|f(x)| \le c|g(x)|$ for $x > x_0$.

Footnote 12: The lognormal distribution defined by (3.43) is an example where the summation (2.40) diverges for any $z \ne 0$.
On the other hand, the series expansion of $E\left[z^X\right]$ around $z = 1$,

$$\varphi_X(z) = \sum_{k=0}^{\infty} \Pr[X = k] (z + 1 - 1)^k = \sum_{k=0}^{\infty} \Pr[X = k] \sum_{j=0}^{k} \binom{k}{j} (z - 1)^j = \sum_{j=0}^{\infty} \left[\sum_{k=j}^{\infty} \binom{k}{j} \Pr[X = k]\right] (z - 1)^j$$

shows with (2.22) that

$$E\left[\binom{X}{j}\right] = \sum_{k=j}^{\infty} \binom{k}{j} \Pr[X = k]$$

If moments are desired, the substitution $z \to e^{-u}$ in $E\left[z^X\right]$ is appropriate.

2.3.4 The logarithm of the probability generating function

The logarithm of the probability generating function is defined as

$$L_X(z) = \log(\varphi_X(z)) = \log\left(E\left[e^{-zX}\right]\right) \qquad (2.41)$$

from which $L_X(0) = 0$ because $\varphi_X(0) = 1$. Further, analogous to the discrete case, we see that $L'_X(0) = \varphi'_X(0)$, $L''_X(0) = \varphi''_X(0) - \left(\varphi'_X(0)\right)^2$ and

$$E[X] = -\varphi'_X(0) = -L'_X(0)$$

However, the difference with the discrete case lies in the higher moments,

$$E[X^n] = (-1)^n \left.\frac{d^n \varphi_X(z)}{dz^n}\right|_{z=0} \qquad (2.42)$$

because, with $E\left[X^2\right] = \varphi''_X(0)$,

$$\mathrm{Var}[X] = \varphi''_X(0) - \left(\varphi'_X(0)\right)^2 = L''_X(0) \qquad (2.43)$$

The latter expression makes $L_X(z)$ for a continuous random variable particularly useful. Since the variance is always positive, it demonstrates that $L_X(z)$ is convex (see Section 5.5) around $z = 0$.
Finally, we mention that

$$E\left[(X - E[X])^3\right] = -L'''_X(0)$$

2.4 The conditional probability

The conditional probability of the event $A$ given the event $B$ (or on the hypothesis $B$) is defined as

$$\Pr[A|B] = \frac{\Pr[A \cap B]}{\Pr[B]} \qquad (2.44)$$

The definition implicitly assumes that the event $B$ has positive probability; otherwise the conditional probability remains undefined. We quote Feller (1970, p. 116):

Taking conditional probabilities of various events with respect to a particular hypothesis $B$ amounts to choosing $B$ as a new sample space with probabilities proportional to the original ones; the proportionality factor $\Pr[B]$ is necessary in order to reduce the total probability of the new sample space to unity. This formulation shows that all general theorems on probabilities are valid for conditional probabilities with respect to any particular hypothesis.

For example, the law $\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]$ takes the form

$$\Pr[A \cup B|C] = \Pr[A|C] + \Pr[B|C] - \Pr[A \cap B|C]$$

The formula (2.44) is often rewritten in the form

$$\Pr[A \cap B] = \Pr[A|B] \Pr[B] \qquad (2.45)$$

which easily generalizes to more events. For example, denote $A = A_1$ and $B = A_2 \cap A_3$; then

$$\Pr[A_1 \cap A_2 \cap A_3] = \Pr[A_1|A_2 \cap A_3] \Pr[A_2 \cap A_3] = \Pr[A_1|A_2 \cap A_3] \Pr[A_2|A_3] \Pr[A_3]$$

Another application of the conditional probability occurs when a partitioning of the sample space $\Omega$ is known: $\Omega = \cup_k B_k$, where all $B_k$ are mutually exclusive, which means that $B_k \cap B_j = \emptyset$ for any $k$ and $j \ne k$. Then, with (2.45),

$$\sum_k \Pr[A \cap B_k] = \sum_k \Pr[A|B_k] \Pr[B_k]$$

The event $A_k = \{A \cap B_k\}$ is a decomposition (or projection) of the event $A$ in the basis event $B_k$, analogous to the decomposition of a vector in terms of a set of orthogonal basis vectors that span the total state space. Indeed, using the associative property $A \cap \{B \cap C\} = A \cap B \cap C$ and $A \cap A = A$, the intersection $A_k \cap A_j = \{A \cap B_k\} \cap \{A \cap B_j\} = A \cap \{B_k \cap B_j\} = \emptyset$, which implies mutual exclusivity (or orthogonality).
Using the distributive property $A \cap \{B_k \cup B_j\} = \{A \cap B_k\} \cup \{A \cap B_j\}$, we observe that

$$A = A \cap \Omega = A \cap \{\cup_k B_k\} = \cup_k \{A \cap B_k\} = \cup_k A_k$$

Finally, since all events $A_k$ are mutually exclusive, $\Pr[A] = \sum_k \Pr[A_k] = \sum_k \Pr[A \cap B_k]$. Thus, if $\Omega = \cup_k B_k$ and, in addition, $B_k \cap B_j = \emptyset$ holds for any pair $j, k$, we have proved the law of total probability or decomposability,

$$\Pr[A] = \sum_k \Pr[A|B_k] \Pr[B_k] \qquad (2.46)$$

Conditioning on events is a powerful tool that will be used frequently. If the conditional probability $\Pr[A|B_k]$ is known as a function $g(B_k)$, the law of total probability can also be written in terms of the expectation operator defined in (2.12) as

$$\Pr[A] = E[g(B_k)] \qquad (2.47)$$

Also the important memoryless property of the exponential distribution (see Section 3.2.2) is an example of the application of the conditional probability. Another classical example is Bayes' rule. Consider again the events $B_k$ defined above. Using the definition (2.44) followed by (2.45),

$$\Pr[B_k|A] = \frac{\Pr[B_k \cap A]}{\Pr[A]} = \frac{\Pr[A \cap B_k]}{\Pr[A]} = \frac{\Pr[A|B_k] \Pr[B_k]}{\Pr[A]} \qquad (2.48)$$

Using (2.46), we arrive at Bayes' rule

$$\Pr[B_k|A] = \frac{\Pr[A|B_k] \Pr[B_k]}{\sum_j \Pr[A|B_j] \Pr[B_j]} \qquad (2.49)$$

where the $\Pr[B_k]$ are called the a-priori probabilities, while the $\Pr[B_k|A]$ are the a-posteriori probabilities.

The conditional distribution function of the random variable $Y$ given $X$ is defined by

$$F_{Y|X}(y|x) = \Pr[Y \le y|X = x] \qquad (2.50)$$

for any $x$ provided $\Pr[X = x] > 0$. This condition follows from the definition (2.44) of the conditional probability. The conditional probability density function of $Y$ given $X$ is defined by

$$f_{Y|X}(y|x) = \Pr[Y = y|X = x] = \frac{\Pr[X = x, Y = y]}{\Pr[X = x]} = \frac{f_{XY}(x,y)}{f_X(x)} \qquad (2.51)$$

for any $x$ such that $\Pr[X = x] > 0$ (and, similarly for continuous random variables, $f_X(x) > 0$), and where $f_{XY}(x,y)$ is the joint probability density function defined below in (2.59).
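The law of total probability (2.46) and Bayes' rule (2.49) in a toy setting (the numbers and the "corrupted packet" story are ours, purely illustrative):

```python
from fractions import Fraction

# Partition: B_k = "source k sent the packet" (a-priori probabilities),
# A = "the packet arrives corrupted" (with conditional probabilities Pr[A|B_k]).
prior = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
lik = {1: Fraction(1, 10), 2: Fraction(1, 5), 3: Fraction(1, 2)}

# Law of total probability (2.46)
pA = sum(lik[k] * prior[k] for k in prior)

# Bayes' rule (2.49): a-posteriori probabilities Pr[B_k|A]
post = {k: lik[k] * prior[k] / pA for k in prior}
assert sum(post.values()) == 1
```

The rarest but most error-prone source (k = 3) ends up with the largest a-posteriori probability, $5/12$, once a corrupted packet is observed.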
2.5 Several random variables and independence

2.5.1 Discrete random variables

Two events $A$ and $B$ are independent if

$$\Pr[A \cap B] = \Pr[A] \Pr[B] \qquad (2.52)$$

Similarly, we define two discrete random variables to be independent if

$$\Pr[X = x, Y = y] = \Pr[X = x] \Pr[Y = y] \qquad (2.53)$$

If $Z = f(X, Y)$, then $Z$ is a discrete random variable with

$$\Pr[Z = z] = \sum_{f(x,y) = z} \Pr[X = x, Y = y]$$

Applying the expectation operator (2.11) to both sides yields

$$E[f(X,Y)] = \sum_{x,y} f(x,y) \Pr[X = x, Y = y] \qquad (2.54)$$

If $X$ and $Y$ are independent and $f$ is separable, $f(x,y) = f_1(x) f_2(y)$, then the expectation (2.54) reduces to

$$E[f(X,Y)] = \sum_{x} f_1(x) \Pr[X = x] \sum_{y} f_2(y) \Pr[Y = y] = E[f_1(X)]\, E[f_2(Y)] \qquad (2.55)$$

The simplest example of the general function is $Z = X + Y$. In that case, the sum is over all $x$ and $y$ that satisfy $x + y = z$. Thus,

$$\Pr[X + Y = z] = \sum_{x} \Pr[X = x, Y = z - x] = \sum_{y} \Pr[X = z - y, Y = y]$$

If $X$ and $Y$ are independent, we obtain the convolution,

$$\Pr[X + Y = z] = \sum_{x} \Pr[X = x] \Pr[Y = z - x] = \sum_{y} \Pr[X = z - y] \Pr[Y = y]$$

2.5.2 The covariance

The covariance of $X$ and $Y$ is defined as

$$\mathrm{Cov}[X,Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y \qquad (2.56)$$

If $\mathrm{Cov}[X,Y] = 0$, then the variables $X$ and $Y$ are uncorrelated. If $X$ and $Y$ are independent, then $\mathrm{Cov}[X,Y] = 0$. Hence, independence implies uncorrelatedness, but the converse is not necessarily true. The classical example¹³ is $Y = X^2$, where $X$ has a normal $N(0,1)$ distribution (Section 3.2.3), because $\mu_X = 0$ and $E[XY] = E\left[X^3\right] = 0$, as follows from (3.23). Although $X$ and $Y$ are perfectly dependent, they are uncorrelated. Thus, independence is a stronger property than uncorrelatedness. The covariance $\mathrm{Cov}[X,Y]$ measures the degree of dependence between two (or, generally, more) random variables. If $X$ and $Y$ are positively (negatively) correlated, large values of $X$ tend to be associated with large (small) values of $Y$.

Footnote 13: Another example: let $U$ be uniform on $[0,1]$ and $X = \cos(2\pi U)$ and $Y = \sin(2\pi U)$. Using (2.34),

$$E[XY] = \int_0^1 \cos(2\pi u) \sin(2\pi u)\, du = 0$$

as well as $E[X] = E[Y] = 0$. Thus, $\mathrm{Cov}[X,Y] = 0$, but $X$ and $Y$ are perfectly dependent because $X = \cos(\arcsin Y) = \pm\sqrt{1 - Y^2}$.
As an application of the covariance, consider the problem of computing the variance of a sum $S_n$ of random variables $X_1, X_2, \ldots, X_n$. Let $\mu_k = E[X_k]$; then $E[S_n] = \sum_{k=1}^{n} \mu_k$ and

$$\begin{aligned}
\mathrm{Var}[S_n] &= E\left[(S_n - E[S_n])^2\right] = E\left[\left(\sum_{k=1}^{n} (X_k - \mu_k)\right)^2\right] \\
&= E\left[\sum_{k=1}^{n} \sum_{j=1}^{n} (X_k - \mu_k)(X_j - \mu_j)\right] \\
&= E\left[\sum_{k=1}^{n} (X_k - \mu_k)^2 + 2 \sum_{k=1}^{n} \sum_{j=k+1}^{n} (X_k - \mu_k)(X_j - \mu_j)\right]
\end{aligned}$$

Using the linearity of the expectation operator and the definition of the covariance (2.56) yields

$$\mathrm{Var}[S_n] = \sum_{k=1}^{n} \mathrm{Var}[X_k] + 2 \sum_{k=1}^{n} \sum_{j=k+1}^{n} \mathrm{Cov}[X_k, X_j] \qquad (2.57)$$

Observe that, for a set of independent random variables $\{X_k\}$, the double sum with covariances vanishes. The Cauchy-Schwarz inequality (5.17) derived in Chapter 5 indicates that

$$\left(E[(X - \mu_X)(Y - \mu_Y)]\right)^2 \le E\left[(X - \mu_X)^2\right] E\left[(Y - \mu_Y)^2\right]$$

such that the covariance is always bounded by

$$|\mathrm{Cov}[X,Y]| \le \sigma_X \sigma_Y$$

2.5.3 The linear correlation coefficient

Since the covariance is not dimensionless, the linear correlation coefficient, defined as

$$\rho(X,Y) = \frac{\mathrm{Cov}[X,Y]}{\sigma_X \sigma_Y} \qquad (2.58)$$

is often convenient to relate two (or more) different physical quantities expressed in different units. The linear correlation coefficient remains invariant (possibly apart from the sign) under a linear transformation because

$$\rho(aX + b, cY + d) = \operatorname{sign}(ac)\, \rho(X,Y)$$

This transform shows that the linear correlation coefficient $\rho(X,Y)$ is independent of the value of the mean $\mu_X$ and of the variance $\sigma_X^2$, provided $\sigma_X^2 > 0$. Therefore, many computations simplify if we normalize the random variables properly. Let us introduce the concept of a normalized random variable $X^* = \frac{X - \mu_X}{\sigma_X}$. The normalized random variable has zero mean and a variance equal to one.
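Identity (2.57) can be verified exactly on a small dependent pair, by computing $\mathrm{Var}[X+Y]$ directly from the joint pmf (the numbers are ours, purely illustrative):

```python
from fractions import Fraction

# Joint pmf of a dependent pair (X, Y)
joint = {(0, 0): Fraction(1, 3), (0, 1): Fraction(1, 6),
         (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 3)}

def E(f):
    """Expectation of f(X, Y) under the joint pmf, (2.54)."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: x**2) - mx**2
var_y = E(lambda x, y: y**2) - my**2
cov = E(lambda x, y: x * y) - mx * my                 # (2.56)

# Var[X + Y] computed directly must match (2.57) for n = 2
var_sum = E(lambda x, y: (x + y)**2) - (mx + my)**2
assert var_sum == var_x + var_y + 2 * cov
```

Here the positive covariance $1/12$ makes $\mathrm{Var}[X+Y] = 2/3$ exceed the independent-case value $1/2$.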
By the invariance under a linear transform, the correlation coefficient $\rho(X,Y) = \rho(X^*, Y^*)$ and also $\rho(X,Y) = \mathrm{Cov}[X^*, Y^*]$. The variance of $X^* \pm Y^*$ follows from (2.57) as

$$\mathrm{Var}[X^* \pm Y^*] = \mathrm{Var}[X^*] + \mathrm{Var}[Y^*] \pm 2\,\mathrm{Cov}[X^*, Y^*] = 2(1 \pm \rho(X,Y))$$

Since the variance is always non-negative, it follows that $-1 \le \rho(X,Y) \le 1$. The extremes $\rho(X,Y) = \pm 1$ imply a linear relation between $X$ and $Y$. Indeed, $\rho(X,Y) = 1$ implies that $\mathrm{Var}[X^* - Y^*] = 0$, which is only possible if $X^* = Y^* + c$, where $c$ is a constant. Hence, $X = \frac{\sigma_X}{\sigma_Y} Y + c_0$. A similar argument applies for the case $\rho(X,Y) = -1$. For example, in curve fitting, the goodness of the fit is often expressed in terms of the correlation coefficient; a perfect fit has a correlation coefficient equal to 1. In particular, in linear regression, where $Y = aX + b$, the regression coefficients $a_R$ and $b_R$ are the minimizers of the squared distance $E\left[(Y - (aX + b))^2\right]$ and are given by

$$a_R = \frac{\mathrm{Cov}[X,Y]}{\sigma_X^2} \qquad b_R = E[Y] - a_R E[X]$$

Since a correlation coefficient $\rho(X,Y) = 1$ implies $\mathrm{Cov}[X,Y] = \sigma_X \sigma_Y$, we see that $a_R = \frac{\sigma_Y}{\sigma_X}$, as derived above with normalized random variables.

Although the linear correlation coefficient is a natural measure of the dependence between random variables, it has some disadvantages. First, the variances of $X$ and $Y$ must exist, which may cause problems with heavy-tailed distributions. Second, as illustrated above, dependence can coexist with uncorrelatedness, which is awkward. Third, linear correlation is not invariant under non-linear strictly increasing transformations $T$, such that $\rho(T(X), T(Y)) \ne \rho(X,Y)$. Common intuition expects that dependence measures should be invariant under such transforms $T$. This leads to the definition of rank correlation, which satisfies that invariance property. Here, we merely mention Spearman's rank correlation coefficient, which is defined as

$$\rho_S(X,Y) = \rho(F_X(X), F_Y(Y))$$

where $\rho$ is the linear correlation coefficient and where the non-linear, strictly increasing transform is the probability distribution function.
More details are found in Embrechts et al. (2001b) and in Chapter 4.

2.5.4 Continuous random variables

We define the joint distribution function by

$$F_{XY}(x,y) = \Pr[X \le x, Y \le y]$$

and the joint probability density function by

$$f_{XY}(x,y) = \frac{\partial^2 F_{XY}(x,y)}{\partial x\, \partial y} \qquad (2.59)$$

Hence,

$$F_{XY}(x,y) = \Pr[X \le x, Y \le y] = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u,v)\, du\, dv \qquad (2.60)$$

The analogon of (2.54) is

$$E[g(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y) f_{XY}(x,y)\, dx\, dy \qquad (2.61)$$

Most of the difficulties occur in the evaluation of the multiple integrals. The change of variables in multiple dimensions involves the Jacobian. Consider the transformed random variables $U = g_1(X,Y)$ and $V = g_2(X,Y)$, and denote the inverse transform by $x = h_1(u,v)$ and $y = h_2(u,v)$; then

$$f_{UV}(u,v) = f_{XY}(h_1(u,v), h_2(u,v))\, |J(u,v)|$$

where the Jacobian $J(u,v)$ is

$$J(u,v) = \det \begin{bmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[4pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{bmatrix}$$

If $X$ and $Y$ are independent and $Z = X + Y$, we obtain the convolution

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\, dx = \int_{-\infty}^{\infty} f_X(z - y) f_Y(y)\, dy \qquad (2.62)$$

which is often denoted by $f_Z(z) = (f_X * f_Y)(z)$. If both $f_X(x) = 0$ and $f_Y(x) = 0$ for $x < 0$, then the definition (2.62) of the convolution reduces to

$$(f_X * f_Y)(z) = \int_0^{z} f_X(x) f_Y(z - x)\, dx$$

2.5.5 The sum of independent random variables

Let $S_N = \sum_{k=1}^{N} X_k$, where the random variables $X_k$ are all independent. We first concentrate on the case where $N = n$ is a (fixed) integer. Since $S_N = S_{N-1} + X_N$, direct application of (2.62) yields the recursion

$$f_{S_N}(z) = \int_{-\infty}^{\infty} f_{S_{N-1}}(z - y) f_{X_N}(y)\, dy \qquad (2.63)$$

which, when written out explicitly, leads to the $N$-fold integral

$$f_{S_N}(z) = \int_{-\infty}^{\infty} f_{X_N}(y_N)\, dy_N \cdots \int_{-\infty}^{\infty} f_{X_2}(y_2)\, f_{X_1}(z - y_N - \cdots - y_2)\, dy_2 \qquad (2.64)$$

In many cases, convolutions are more efficiently computed via generating functions.
The generating function of $S_n$ equals

$$\varphi_{S_n}(z) = E\left[z^{S_n}\right] = E\left[z^{\sum_{k=1}^{n} X_k}\right] = E\left[\prod_{k=1}^{n} z^{X_k}\right]$$

Since all $X_k$ are independent, (2.55) can be applied,

$$\varphi_{S_n}(z) = \prod_{k=1}^{n} E\left[z^{X_k}\right]$$

or, in terms of generating functions,

$$\varphi_{S_n}(z) = \prod_{k=1}^{n} \varphi_{X_k}(z) \qquad (2.65)$$

Hence, we arrive at the important result that the generating function of a sum of independent random variables equals the product of the generating functions of the individual random variables. We also note that the condition of independence is crucial in that it allows the product and expectation operator to be reversed, leading to the useful result (2.65).

Often, the random variables $X_k$ all possess the same distribution. In this case of independent identically distributed (i.i.d.) random variables with generating function $\varphi_X(z)$, the relation (2.65) further simplifies to

$$\varphi_{S_n}(z) = (\varphi_X(z))^n \qquad (2.66)$$

In the case where the number of terms $N$ in the sum $S_N$ is a random variable with generating function $\varphi_N(z)$, independent of the $X_k$, we use the general definition of expectation (2.54) for two random variables,

$$\varphi_{S_N}(z) = E\left[z^{S_N}\right] = \sum_{k=0}^{\infty} \sum_{x} z^x \Pr[S_N = x, N = k] = \sum_{k=0}^{\infty} \sum_{x} z^x \Pr[S_N = x | N = k] \Pr[N = k]$$

where the conditional probability (2.45) is used. Since the value of $S_N$ depends on the number of terms $N$ in the sum, we have $\Pr[S_N = x | N = k] = \Pr[S_k = x]$. Further, with

$$\sum_{x} z^x \Pr[S_N = x | N = k] = \varphi_{S_k}(z)$$

we have

$$\varphi_{S_N}(z) = \sum_{k=0}^{\infty} \varphi_{S_k}(z) \Pr[N = k] \qquad (2.67)$$

The average $E[S_N]$ follows from (2.26) as

$$E[S_N] = \sum_{k=0}^{\infty} \varphi'_{S_k}(1) \Pr[N = k] = \sum_{k=0}^{\infty} E[S_k] \Pr[N = k] \qquad (2.68)$$

Since $E[S_k] = E\left[\sum_{j=1}^{k} X_j\right] = \sum_{j=1}^{k} E[X_j]$ and assuming that all random variables $X_j$ have equal mean $E[X_j] = E[X]$, we have

$$E[S_N] = \sum_{k=0}^{\infty} k\, E[X] \Pr[N = k]$$

or

$$E[S_N] = E[X]\, E[N] \qquad (2.69)$$

This relation (2.69) is commonly called Wald's identity. Wald's identity holds for any random sum of (possibly dependent) random variables $X_j$, provided the number $N$ of those random variables is independent of the $X_j$.
In the case of i.i.d. random variables, we apply (2.66) in (2.67) so that

$$\varphi_{S_N}(z) = \sum_{k=0}^{\infty} \left(\varphi_X(z)\right)^k \Pr[N = k] = \varphi_N(\varphi_X(z)) \qquad (2.70)$$

This expression is a generalization of (2.66).

2.6 Conditional expectation

The generating function (2.67) of a random sum of independent random variables can be derived using the conditional expectation $E[Y|X=x]$ of two random variables $X$ and $Y$. We will first define the conditional expectation and derive an interesting property.

Suppose that we know that $X = x$; the conditional density function $f_{Y|X}(y|x)$ defined by (2.51) of the random variable $Y_c = Y|X$ can be regarded as a function of $y$ only. Using the definition of the expectation (2.33) for continuous random variables (the discrete case is analogous), we have

$$E[Y|X=x] = \int_{-\infty}^{\infty} y\,f_{Y|X}(y|x)\,dy \qquad (2.71)$$

Since this expression holds for any value of $x$ that the random variable $X$ can take, we see that $E[Y|X=x] = g(x)$ is a function of $x$ and, in addition, since $X = x$, $E[Y|X=x] = g(X)$ can be regarded as a random variable that is a function of the random variable $X$. Having identified the conditional expectation $E[Y|X=x]$ as a random variable, let us compute its expectation, or the expectation of the slightly more general random variable $h(X)\,g(X)$ with $g(X) = E[Y|X=x]$. From the general definition (2.34) of the expectation, it follows that

$$E[h(X)\,g(X)] = \int_{-\infty}^{\infty} h(x)\,g(x)\,f_X(x)\,dx = \int_{-\infty}^{\infty} h(x)\,E[Y|X=x]\,f_X(x)\,dx$$

Substituting (2.71) yields

$$E[h(X)\,g(X)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x)\,y\,f_{Y|X}(y|x)\,f_X(x)\,dy\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x)\,y\,f_{XY}(x,y)\,dy\,dx = E[h(X)\,Y]$$

where we have used (2.51) and (2.61). Thus, we find the interesting relation

$$E[h(X)\,E[Y|X=x]] = E[h(X)\,Y] \qquad (2.72)$$

As a special case where $h(x) = 1$, the expectation of the conditional expectation follows as

$$E[Y] = E_X\left[E_Y[Y|X=x]\right]$$

where the index in $E_Z$ clarifies that the expectation is over the random variable $Z$.
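The relation (2.72) and its special case $E[Y] = E_X[E_Y[Y|X=x]]$ can be checked mechanically on a small discrete example (an illustration; the joint pmf values below are arbitrary):

```python
# joint pmf Pr[X = x, Y = y] on a small grid (values chosen arbitrarily, sum to 1)
joint = {(0, 1): 0.1, (0, 3): 0.2, (1, 1): 0.25, (1, 2): 0.15, (2, 5): 0.3}

px = {}
for (x, y), pr in joint.items():
    px[x] = px.get(x, 0.0) + pr                # marginal Pr[X = x]

def cond_exp_Y(x):
    """E[Y | X = x] = sum_y y Pr[X = x, Y = y] / Pr[X = x]."""
    return sum(y * pr for (xx, y), pr in joint.items() if xx == x) / px[x]

# direct expectation of Y versus the tower rule E[Y] = E_X[ E[Y | X = x] ]
EY = sum(y * pr for (_, y), pr in joint.items())
EY_tower = sum(cond_exp_Y(x) * pr for x, pr in px.items())
assert abs(EY - EY_tower) < 1e-12

# the more general relation (2.72) with h(x) = x: E[X E[Y|X]] = E[X Y]
lhs = sum(x * cond_exp_Y(x) * pr for x, pr in px.items())
rhs = sum(x * y * pr for (x, y), pr in joint.items())
assert abs(lhs - rhs) < 1e-12
print("conditional-expectation identities hold")
```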
Applying this relation to $Y = z^{S_N}$, where $S_N = \sum_{k=1}^{N} X_k$ and all $X_k$ are independent, yields

$$\varphi_{S_N}(z) = E\left[z^{S_N}\right] = E_N\left[E_S\left[z^{S_N}|N=n\right]\right]$$

Since $E_S\left[z^{S_N}|N=n\right] = \varphi_{S_n}(z)$, as specified in (2.65), we end up with

$$\varphi_{S_N}(z) = E_N\left[\varphi_{S_N}(z)\right] = \sum_{k=0}^{\infty} \varphi_{S_k}(z)\Pr[N=k]$$

which is (2.67).

3 Basic distributions

This chapter concentrates on the most basic probability distributions and their properties. From these basic distributions, other useful distributions are derived.

3.1 Discrete random variables

3.1.1 The Bernoulli distribution

A Bernoulli random variable $X$ can only take two values: either 1 with probability $p$ or 0 with probability $q = 1-p$. The standard example of a Bernoulli random variable is the outcome of tossing a biased coin and, more generally, the outcome of a trial with only two possibilities, either success or failure. The sample space is $\Omega = \{0,1\}$ and $\Pr[X=1] = p$, while $\Pr[X=0] = q$. From this definition, the pgf follows from (2.17) as

$$\varphi_X(z) = E\left[z^X\right] = z^0\Pr[X=0] + z^1\Pr[X=1]$$

or

$$\varphi_X(z) = q + pz \qquad (3.1)$$

From (2.23) or (2.14), the $n$-th moment is $E[X^n] = p$, which shows that $\mu = E[X] = p$. From (2.24), we find $E[(X-a)^n] = p(1-a)^n + q(-a)^n$ such that the moments centered around the mean are

$$E[(X-\mu)^n] = pq^n + (-1)^n qp^n = pq\left(q^{n-1} + (-1)^n p^{n-1}\right)$$

Explicitly, with $p + q = 1$, $\operatorname{Var}[X] = pq$ and $E\left[(X-\mu)^3\right] = pq(q-p)$.

3.1.2 The binomial distribution

A binomial random variable $X$ is the sum of $n$ independent Bernoulli random variables. The sample space is $\Omega = \{0, 1, \dots, n\}$. For example, $X$ may represent the number of successes in $n$ independent Bernoulli trials, such as the number of heads after $n$-times tossing a (biased) coin. Application of (2.66) with (3.1) gives

$$\varphi_X(z) = (q + pz)^n \qquad (3.2)$$

Expanding the binomial pgf in powers of $z$, which justifies the name "binomial",

$$\varphi_X(z) = \sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k} z^k$$

and comparing to (2.18) yields

$$\Pr[X = k] = \binom{n}{k} p^k q^{n-k} \qquad (3.3)$$

The alternative, probabilistic approach starts with (3.3).
Indeed, the probability that $X$ has $k$ successes out of $n$ trials consists of precisely $k$ successes (an event with probability $p^k$) and $n-k$ failures (with probability equal to $q^{n-k}$). The total number of ways in which $k$ successes out of $n$ trials can be obtained is precisely $\binom{n}{k}$.

The mean follows from (2.23) or from the definition $X = \sum_{j=1}^{n} X_{\mathrm{Bernoulli};j}$ and the linearity of the expectation as $E[X] = np$. Higher-order moments around the mean can be derived from (2.24) as

$$E[(X-\mu)^m] = \left.\frac{d^m}{dt^m}\left[e^{-tnp}\left(q + pe^t\right)^n\right]\right|_{t=0} = \left.\frac{d^m}{dt^m}\sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\,e^{t(k-np)}\right|_{t=0} = \sum_{k=0}^{n}\binom{n}{k} p^k q^{n-k}\,(k-np)^m$$

In general, this form seems difficult to express more elegantly. It illustrates that, even for simple random variables, computations may rapidly become unattractive. For $m = 2$, the above differentiation leads to $\operatorname{Var}[X] = npq$. But this result is more economically obtained from (2.27), since $L_X(z) = n\log(q+pz)$, $L'_X(z) = \frac{np}{q+pz}$ and $L''_X(z) = -\frac{np^2}{(q+pz)^2}$. Thus,

$$\operatorname{Var}[X] = -np^2 + np = npq \qquad (3.4)$$

3.1.3 The geometric distribution

The geometric random variable $X$ returns the number of independent Bernoulli trials needed to achieve the first success. Here the sample space is the infinite set of integers. The probability density function is

$$\Pr[X = k] = pq^{k-1} \qquad (3.5)$$

because a first success (with probability $p$) obtained in the $k$-th trial is preceded by $k-1$ failures (each having probability $q = 1-p$). Clearly, $\Pr[X = 0] = 0$. The series expansion of the probability generating function,

$$\varphi_X(z) = pz\sum_{k=0}^{\infty} q^k z^k = \frac{pz}{1-qz} \qquad (3.6)$$

justifies the name "geometric". The mean $E[X] = \varphi'_X(1)$ equals $E[X] = \frac{1}{p}$. The higher-order moments can be deduced from (2.24) as

$$E[(X-\mu)^n] = \left.\frac{d^n}{dt^n}\left(\frac{p\,e^{-tq/p}}{1-qe^t}\right)\right|_{t=0} = p\sum_{k=0}^{\infty} q^k\left(k - \frac{q}{p}\right)^n$$

Similarly as for the binomial random variable, the variance most easily follows from (2.27) with $L_X(z) = \log p + \log z - \log(1-qz)$, $L'_X(z) = \frac{1}{z} + \frac{q}{1-qz}$ and $L''_X(z) = -\frac{1}{z^2} + \frac{q^2}{(1-qz)^2}$.
Thus,

$$\operatorname{Var}[X] = \frac{q}{p} + \frac{q^2}{p^2} = \frac{q}{p^2} \qquad (3.7)$$

The distribution function $F_X(k) = \Pr[X \le k] = \sum_{j=1}^{k}\Pr[X = j]$ is obtained as

$$\Pr[X \le k] = p\sum_{j=0}^{k-1} q^j = p\,\frac{1-q^k}{1-q} = 1 - q^k$$

The tail probability is

$$\Pr[X > k] = q^k \qquad (3.8)$$

Hence, the probability that the number of trials until the first success is larger than $k$ decreases geometrically in $k$ with rate $q$.

Let us now consider an important application of the conditional probability. The probability that, given the success is not found in the first $k$ trials, success does not occur within the next $m$ trials, is with (2.44)

$$\Pr[X > k+m\,|\,X > k] = \frac{\Pr[\{X > k+m\}\cap\{X > k\}]}{\Pr[X > k]} = \frac{\Pr[X > k+m]}{\Pr[X > k]}$$

and with (3.8)

$$\Pr[X > k+m\,|\,X > k] = q^m = \Pr[X > m]$$

This conditional probability turns out to be independent of the hypothesis, the event $\{X > k\}$, and reflects the famous memoryless property. Only because $\Pr[X > k]$ obeys the functional equation $f(x+y) = f(x)f(y)$ does the hypothesis or initial knowledge not matter. It is precisely as if past failures have never occurred or are forgotten and as if, after a failure, the number of trials is reset to 0. Furthermore, the only solution to the functional equation is an exponential function. Thus, the geometric distribution is the only discrete distribution that possesses the memoryless property.

3.1.4 The Poisson distribution

Often we are interested to count the number of occurrences of an event in a certain time interval, such as, for example, the number of IP packets during a time slot or the number of telephony calls that arrive at a telephone exchange per unit time. The Poisson random variable $X$ with probability density function

$$\Pr[X = k] = \frac{\lambda^k e^{-\lambda}}{k!} \qquad (3.9)$$

turns out to model many of these counting phenomena well, as shown in Chapter 7. The corresponding generating function is

$$\varphi_X(z) = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!}z^k = e^{\lambda(z-1)} \qquad (3.10)$$

and the average number of occurrences in that time interval is

$$E[X] = \lambda \qquad (3.11)$$

This average $\lambda$ determines the complete distribution.
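A quick numerical check of (3.9)-(3.11) (an illustrative sketch; the value of $\lambda$ is arbitrary): the pmf should sum to 1, have mean $\lambda$, and reproduce the pgf $e^{\lambda(z-1)}$:

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    """Pr[X = k] = lam^k e^{-lam} / k!  -- equation (3.9)."""
    return lam ** k * exp(-lam) / factorial(k)

lam, K = 3.7, 80                 # truncating at K = 80 terms is ample for lam = 3.7
pmf = [poisson_pmf(lam, k) for k in range(K)]

assert abs(sum(pmf) - 1.0) < 1e-12                              # normalization
assert abs(sum(k * p for k, p in enumerate(pmf)) - lam) < 1e-9  # E[X] = lam

for z in (0.0, 0.3, 0.8, 1.0):   # pgf check: sum_k Pr[X=k] z^k = e^{lam (z-1)}
    assert abs(sum(p * z ** k for k, p in enumerate(pmf)) - exp(lam * (z - 1))) < 1e-9
print("Poisson pmf matches its pgf (3.10)")
```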
In applications it is convenient to replace the unit interval by an interval of arbitrary length $t$ such that

$$\Pr[X = k] = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$$

equals the probability that precisely $k$ events occur in the interval with duration $t$. The probability that no events occur during $t$ time units is $\Pr[X = 0] = e^{-\lambda t}$ and the probability that at least one event (i.e. one or more) occurs is $\Pr[X > 0] = 1 - e^{-\lambda t}$. The latter is equal to the exponential distribution. We will also see later, in Theorem 7.3.2, that the Poisson counting process and the exponential distribution are intimately connected.

The sum of $n$ independent Poisson random variables, each with mean $\lambda_k$, is again a Poisson random variable with mean $\sum_{k=1}^{n}\lambda_k$, as follows from (2.65) and (3.10). The higher-order moments can be deduced from (2.24) as

$$E\left[(X-\lambda)^n\right] = e^{-\lambda}\left.\frac{d^n}{dt^n}\,e^{-\lambda(t-e^t)}\right|_{t=0}$$

from which

$$E[X] = \operatorname{Var}[X] = E\left[(X-\lambda)^3\right] = \lambda$$

The Poisson tail distribution equals

$$\Pr[X > m] = 1 - \sum_{k=0}^{m}\frac{\lambda^k e^{-\lambda}}{k!}$$

which precisely equals the distribution of the sum of $m$ exponentially distributed variables, as demonstrated below in Section 3.3.1.

The Poisson density approximates the binomial density (3.3) if $n \to \infty$ while the mean $np = \lambda$ is kept fixed. This phenomenon is often referred to as the law of rare events: in an arbitrarily large number $n$ of independent trials, each with arbitrarily small success probability $p = \frac{\lambda}{n}$, the total number of successes will approximately be Poisson distributed. The classical argument is to consider the binomial density (3.3) with $p = \frac{\lambda}{n}$,

$$\Pr[X=k] = \frac{n!}{k!\,(n-k)!}\,\frac{\lambda^k}{n^k}\left(1-\frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^k}{k!}\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-k}\prod_{j=1}^{k-1}\left(1-\frac{j}{n}\right)$$

or

$$\log\left(\Pr[X=k]\right) = \log\frac{\lambda^k}{k!} + \sum_{j=1}^{k-1}\log\left(1-\frac{j}{n}\right) - k\log\left(1-\frac{\lambda}{n}\right) + n\log\left(1-\frac{\lambda}{n}\right)$$

For large $n$, we use the Taylor expansion $\log\left(1-\frac{x}{n}\right) = -\frac{x}{n} - \frac{x^2}{2n^2} + O\left(n^{-3}\right)$ to obtain, up to order $O\left(n^{-2}\right)$,

$$\log\left(\Pr[X=k]\right) = \log\frac{\lambda^k}{k!} - \frac{k(k-1)}{2n} + \frac{k\lambda}{n} - \lambda - \frac{\lambda^2}{2n} + O\left(n^{-2}\right) = \log\frac{\lambda^k e^{-\lambda}}{k!} - \frac{(k-\lambda)^2 - k}{2n} + O\left(n^{-2}\right)$$

With $e^x = 1 + x + O(x^2)$, we finally obtain the approximation for large $n$,

$$\Pr[X=k] = \frac{\lambda^k e^{-\lambda}}{k!}\left(1 - \frac{(k-\lambda)^2 - k}{2n} + O\left(n^{-2}\right)\right)$$

The coefficient of $\frac{1}{n}$ is negative if $k \in \left(\lambda + \frac{1}{2} - \sqrt{\lambda + \frac{1}{4}},\ \lambda + \frac{1}{2} + \sqrt{\lambda + \frac{1}{4}}\right)$. In that $k$-interval, the Poisson density is a lower bound for the binomial density for large $n$ and $np = \lambda$. The reverse holds for values of $k$ outside that interval. Since for the Poisson density $\frac{\Pr[X=k]}{\Pr[X=k-1]} = \frac{\lambda}{k}$, we see that $\Pr[X=k]$ increases as $\lambda > k$ and decreases as $\lambda < k$. Thus, the maximum of the Poisson density lies around $k = \lambda = E[X]$. In conclusion, we can say that the Poisson density approximates the binomial density, for large $n$ and $np = \lambda$, from below in the region of about the standard deviation $\sqrt{\lambda}$ around the mean $E[X] = \lambda$ and from above outside this region (in the tails of the distribution).

A much shorter derivation anticipates results of Chapter 6 and starts from the probability generating function (3.2) of the binomial distribution after substitution of $p = \frac{\lambda}{n}$,

$$\lim_{n\to\infty}\varphi_X(z) = \lim_{n\to\infty}\left(1 + \frac{\lambda(z-1)}{n}\right)^n = e^{\lambda(z-1)}$$

Invoking the Continuity Theorem 6.1.3, comparison with (3.10) shows that the limit probability generating function corresponds to a Poisson distribution. The Stein-Chen (1975) Theorem¹ generalizes the law of rare events: this law even holds when the Bernoulli trials are weakly dependent.

As a final remark, let $S_n$ be the sum of i.i.d. Bernoulli trials, each with mean $p$; then $S_n$ is binomially distributed, as shown in Section 3.1.2. If $p$ is a constant and independent of the number of trials $n$, the Central Limit Theorem 6.3.1 states that $\frac{S_n - np}{\sqrt{np(1-p)}}$ tends to a Gaussian distribution. In summary, the limit distribution of a sum $S_n$ of Bernoulli trials depends on how the mean $p$ varies with the number of trials $n$ when $n \to \infty$:

if $p = \frac{\lambda}{n}$, then $S_n \overset{d}{\to}$ the Poisson distribution with density $\frac{\lambda^k e^{-\lambda}}{k!}$;

if $p$ is constant, then $\frac{S_n - np}{\sqrt{np(1-p)}} \overset{d}{\to}$ the Gaussian distribution with density $\frac{e^{-x^2/2}}{\sqrt{2\pi}}$.

¹ The proof (see e.g. Grimmett and Stirzaker (2001, pp. 130-132)) involves coupling theory of stochastic random variables.
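The under- and over-approximation regions derived above can be observed directly (a numerical illustration; $n = 10000$ and $\lambda = 5$ are arbitrary choices): inside the $k$-interval around the mean the Poisson density lies below the binomial density, outside it lies above.

```python
from math import comb, exp, factorial, sqrt

def binom_pmf(n, p, k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return lam ** k * exp(-lam) / factorial(k)

n, lam = 10_000, 5.0
p = lam / n
# end points of the k-interval derived in the text
lo = lam + 0.5 - sqrt(lam + 0.25)
hi = lam + 0.5 + sqrt(lam + 0.25)

for k in range(0, 21):
    if lo < k < hi:
        # inside: Poisson is a lower bound for the binomial
        assert poisson_pmf(lam, k) < binom_pmf(n, p, k)
    else:
        # outside (the tails): Poisson is an upper bound
        assert poisson_pmf(lam, k) > binom_pmf(n, p, k)
print("Poisson under/over-estimates the binomial exactly as predicted")
```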
The degree of dependence is expressed in terms of the total variation distance. The total variation distance between two discrete random variables $X$ and $Y$ is defined as

$$d_{TV}(X,Y) = \sum_{k}\left|\Pr[X=k] - \Pr[Y=k]\right|$$

and satisfies

$$d_{TV}(X,Y) = 2\sup_{A\subset\mathbb{Z}}\left|\Pr[X\in A] - \Pr[Y\in A]\right|$$

3.2 Continuous random variables

3.2.1 The uniform distribution

A uniform random variable $X$ has equal probability to attain any value in the interval $[a,b]$, such that the probability density function is a constant. Since $\Pr[a \le X \le b] = \int_a^b f_X(x)\,dx = 1$, the constant value equals

$$f_X(x) = \frac{1}{b-a}\,1_{x\in[a,b]} \qquad (3.12)$$

where $1_y$ is the indicator function defined in Section 2.2.1. The distribution function then follows as

$$\Pr[a \le X \le x] = \frac{x-a}{b-a}\,1_{x\in[a,b]} + 1_{x>b}$$

The Laplace transform (2.37) is²

$$\varphi_X(z) = \int_{-\infty}^{\infty} e^{-zt} f_X(t)\,dt = \frac{e^{-za} - e^{-zb}}{z(b-a)} \qquad (3.13)$$

while the mean $\mu = E[X]$ most easily follows from

$$E[X] = \int_{-\infty}^{\infty}\frac{x\,dx}{b-a}\,1_{x\in[a,b]} = \frac{a+b}{2}$$

The centered moments are obtained from (2.39) as

$$E\left[(X-\mu)^n\right] = \frac{(-1)^n}{b-a}\left.\frac{d^n}{dz^n}\,\frac{e^{\frac{(b-a)z}{2}} - e^{-\frac{(b-a)z}{2}}}{z}\right|_{z=0} = \frac{2(-1)^n}{b-a}\left.\frac{d^n}{dz^n}\,\frac{\sinh\left(\frac{(b-a)z}{2}\right)}{z}\right|_{z=0}$$

Using the power series

$$\frac{\sinh\left(\frac{(b-a)z}{2}\right)}{z} = \sum_{k=0}^{\infty}\frac{\left(\frac{b-a}{2}\right)^{2k+1}}{(2k+1)!}\,z^{2k}$$

leads to

$$E\left[(X-\mu)^{2n}\right] = \frac{(b-a)^{2n}}{(2n+1)\,2^{2n}}, \qquad E\left[(X-\mu)^{2n+1}\right] = 0 \qquad (3.14)$$

Let us define $U$ as the uniform random variable on the interval $[0,1]$. If $W = 1-U$, then $W$ is also a uniform random variable on $[0,1]$: $W$ and $U$ have the same distribution, denoted as $W \overset{d}{=} U$, because

$$\Pr[W \le x] = \Pr[1-U \le x] = \Pr[U \ge 1-x] = 1-(1-x) = x = \Pr[U \le x]$$

A probability distribution function $F_X(x) = \Pr[X \le x] = g(x)$ whose inverse exists can be written as a function of $F_U(x) = x\,1_{x\in[0,1]}$. Let $X = g^{-1}(U)$. Since the distribution function is non-decreasing, this also holds for the inverse $g^{-1}(\cdot)$.

² Notice that $ab\,z\,\varphi_X(z)$ equals the convolution $f * g$ of two exponential densities $f$ and $g$ with rates $a$ and $b$, respectively.
Applying (2.32) with $X = g^{-1}(U)$ yields

$$F_X(x) = \Pr\left[g^{-1}(U) \le x\right] = \Pr[U \le g(x)] = F_U(g(x)) = g(x)$$

For instance, $g^{-1}(U) = -\frac{\ln(1-U)}{\lambda} \overset{d}{=} -\frac{\ln U}{\lambda}$ is an exponential random variable (3.17) with parameter $\lambda$; $g^{-1}(U) = U^{1/\alpha}$ is a polynomially distributed random variable with distribution $\Pr[X \le x] = x^{\alpha}$; $g^{-1}(U) = \cot(\pi U)$ is a Cauchy random variable defined in (3.38) below. In addition, we observe that $U = g(X) = F_X(X)$, which means that any random variable $X$ is transformed into a uniform random variable $U$ on $[0,1]$ by its own distribution function.

The numbers $a_k$ that satisfy congruent recursions of the form $a_{k+1} = (\mu a_k + \nu) \bmod M$, where $M$ is a large prime number (e.g. $M = 2^{31}-1$) and $\mu$ and $\nu$ are integers (e.g. $\mu = 397\,204\,094$ and $\nu = 0$), are to a good approximation uniformly distributed. The scaled numbers $y_k = \frac{a_k}{M-1}$ are nearly uniformly distributed on $[0,1]$. Since these recursions with initial value or seed $a_0 \in [0, M-1]$ are easy to generate with computers (Press et al., 1992), the above property is very useful to generate arbitrary random variables $X = g^{-1}(U)$ from the uniform random variable $U$.

3.2.2 The exponential distribution

An exponential random variable $X$ satisfies the probability density function

$$f_X(x) = \lambda e^{-\lambda x}, \qquad x \ge 0 \qquad (3.15)$$

where $\lambda$ is the rate at which events occur. The corresponding Laplace transform is

$$\varphi_X(z) = \int_0^{\infty}\lambda e^{-\lambda t} e^{-zt}\,dt = \frac{\lambda}{z+\lambda} \qquad (3.16)$$

and the probability distribution is, for $x \ge 0$,

$$F_X(x) = 1 - e^{-\lambda x} \qquad (3.17)$$

The mean or average follows from (2.33) or from $E[X] = -\varphi'_X(0)$ as $\mu = E[X] = \frac{1}{\lambda}$. The centered moments are obtained from (2.39) as

$$E\left[\left(X-\frac{1}{\lambda}\right)^n\right] = (-1)^n\left.\frac{d^n}{dz^n}\left(\frac{\lambda\,e^{z/\lambda}}{z+\lambda}\right)\right|_{z=0}$$

Since the Taylor expansion of $\frac{\lambda e^{z/\lambda}}{z+\lambda}$ around $z = 0$ is

$$\frac{\lambda\,e^{z/\lambda}}{z+\lambda} = \left(\sum_{k=0}^{\infty}\frac{1}{k!}\left(\frac{z}{\lambda}\right)^k\right)\left(\sum_{k=0}^{\infty}(-1)^k\left(\frac{z}{\lambda}\right)^k\right) = \sum_{n=0}^{\infty}\frac{z^n}{\lambda^n}\sum_{k=0}^{n}\frac{(-1)^{n-k}}{k!}$$

we find that

$$E\left[\left(X-\frac{1}{\lambda}\right)^n\right] = \frac{n!}{\lambda^n}\sum_{k=0}^{n}\frac{(-1)^k}{k!} \qquad (3.18)$$

For large $n$, the centered moments are well approximated by

$$E\left[\left(X-\frac{1}{\lambda}\right)^n\right] \simeq \frac{n!}{\lambda^n e}$$

The exponential random variable possesses, just as its discrete counterpart the geometric random variable, the memoryless property. Indeed, analogous to Section 3.1.3, consider

$$\Pr[X \ge t+T\,|\,X > t] = \frac{\Pr[\{X \ge t+T\}\cap\{X > t\}]}{\Pr[X > t]} = \frac{\Pr[X \ge t+T]}{\Pr[X > t]}$$

and since $\Pr[X > t] = e^{-\lambda t}$, the memoryless property

$$\Pr[X \ge t+T\,|\,X > t] = \Pr[X \ge T]$$

is established. Since the only non-zero solution (proved in Feller (1970, p. 459)) to the functional equation $f(x+y) = f(x)f(y)$, which implies the memoryless property, is of the form $e^{cx}$, it shows that the exponential distribution is the only continuous distribution that has the memoryless property. As we will see later, this memoryless property is a fundamental property in Markov processes.

It is instructive to show the close relation between the geometric and exponential random variable (see Feller (1971, p. 1)). Consider the waiting time $T$ (measured in integer units of $\Delta t$) for the first success in a sequence of Bernoulli trials, where only one trial occurs in a timeslot $\Delta t$. Hence, $X = \frac{T}{\Delta t}$ is a (dimensionless) geometric random variable. From (3.8), $\Pr[T > k\Delta t] = (1-p)^k$ and the average waiting time is $E[T] = \Delta t\,E[X] = \frac{\Delta t}{p}$. The transition from the discrete to the continuous space involves the limit process $\Delta t \to 0$ subject to a fixed average waiting time $E[T]$. Let $t = k\Delta t$; then

$$\lim_{\Delta t\to 0}\Pr[T > t] = \lim_{\Delta t\to 0}\left(1 - \frac{\Delta t}{E[T]}\right)^{t/\Delta t} = e^{-t/E[T]}$$

For arbitrarily small time units, the waiting time for the first success, with average $E[T]$, turns out to be an exponential random variable.

3.2.3 The Gaussian or normal distribution

The Gaussian random variable $X$ is defined for all $x$ by the probability density function

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \qquad (3.19)$$

which explicitly shows its dependence on the average $\mu$ and variance $\sigma^2$. The importance of the Gaussian random variable stems from the Central Limit Theorem 6.3.1.
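The inverse transform method of Section 3.2.1, combined with the exponential distribution (3.15)-(3.17), is easy to exercise numerically. The sketch below (illustrative; the seed, rate and sample size are arbitrary) generates exponential samples from uniforms via $g^{-1}(U) = -\ln(1-U)/\lambda$ and checks the empirical mean and distribution function:

```python
import math
import random

lam = 2.0
rng = random.Random(2024)
# g^{-1}(U) = -ln(1 - U)/lam maps a uniform U on [0,1) to an exponential rv
samples = [-math.log(1.0 - rng.random()) / lam for _ in range(200_000)]

mean = sum(samples) / len(samples)
print(mean)                                   # should be close to 1/lam = 0.5

x = 0.6
empirical = sum(s <= x for s in samples) / len(samples)
print(empirical, 1.0 - math.exp(-lam * x))    # compare with F_X(x) = 1 - e^{-lam x}
```

Replacing the standard generator by the congruent recursion of Press et al. would illustrate the same point with the text's own uniform source.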
Often a Gaussian, also called normal, random variable with average $\mu$ and variance $\sigma^2$ is denoted by $N(\mu, \sigma^2)$. The distribution function is

$$F_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]dt = \Phi\left(\frac{x-\mu}{\sigma}\right) \qquad (3.20)$$

where³ $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$ is the normalized Gaussian distribution corresponding to $\mu = 0$ and $\sigma = 1$. The double-sided Laplace transform is

$$\varphi_X(z) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-zt}\exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]dt = e^{-\mu z + \frac{\sigma^2 z^2}{2}} \qquad (3.22)$$

and the centered moments (2.39) are

$$E\left[(X-\mu)^{2n}\right] = \left.\frac{d^{2n}}{dz^{2n}}\,e^{\frac{\sigma^2 z^2}{2}}\right|_{z=0} = \frac{(2n)!}{n!}\left(\frac{\sigma^2}{2}\right)^n, \qquad E\left[(X-\mu)^{2n+1}\right] = 0 \qquad (3.23)$$

We note from (2.65) that a sum of independent Gaussian random variables $N(\mu_k, \sigma_k^2)$ is again a Gaussian random variable $N\left(\sum_{k=1}^{n}\mu_k,\ \sum_{k=1}^{n}\sigma_k^2\right)$. If $X = N(\mu, \sigma^2)$, then the scaled random variable $Y = aX$ is a $N(a\mu, (a\sigma)^2)$ random variable, as is verified by computing $\Pr[Y \le y] = \Pr\left[X \le \frac{y}{a}\right]$. Similarly for a translation $Y = X + b$: then $Y = N(\mu+b, \sigma^2)$. Hence, a linear combination of Gaussian random variables is again a Gaussian random variable,

$$\sum_{k=1}^{n} a_k N(\mu_k, \sigma_k^2) + b = N\left(\sum_{k=1}^{n} a_k\mu_k + b,\ \sum_{k=1}^{n} a_k^2\sigma_k^2\right)$$

³ Abramowitz and Stegun (1968, Section 7.1.1) define the error function as

$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2}\,dt$$

such that (Abramowitz and Stegun, 1968, Section 7.1.22)

$$\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]dt = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right] \qquad (3.21)$$

3.3 Derived distributions

From the basic distributions, a large number of other distributions can be derived, as illustrated here.

3.3.1 The sum of independent exponential random variables

By applying (2.65) and (2.38), a substantial number of practical problems can be solved. For example, the sum of $n$ independent exponential random variables, each with different rate $\lambda_k > 0$, has the generating function

$$\varphi_{S_n}(z) = \prod_{k=1}^{n}\frac{\lambda_k}{z+\lambda_k}$$

and probability density function

$$f_{S_n}(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{e^{zt}\prod_{k=1}^{n}\lambda_k}{\prod_{k=1}^{n}(z+\lambda_k)}\,dz$$

The contour can be closed over the negative half plane for $t > 0$, where the integrand has simple poles at $z = -\lambda_k$. From the Cauchy integral theorem, we obtain

$$f_{S_n}(t) = \left(\prod_{k=1}^{n}\lambda_k\right)\sum_{j=1}^{n}\frac{e^{-\lambda_j t}}{\prod_{k=1;\,k\ne j}^{n}(\lambda_k - \lambda_j)}$$

If all rates are equal, $\lambda_k = \lambda$, the case reduces to $\varphi_{S_n}(z) = \left(\frac{\lambda}{z+\lambda}\right)^n$ with $E[S_n] = \frac{n}{\lambda}$ and with probability density

$$f_{S_n}(t) = \frac{\lambda^n}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{e^{zt}}{(z+\lambda)^n}\,dz$$

Again, the contour can be closed over the negative half plane and the $n$-th order pole is treated with Cauchy's relation for the $n$-th derivative of a complex function,

$$\frac{1}{k!}\left.\frac{d^k f(z)}{dz^k}\right|_{z=z_0} = \frac{1}{2\pi i}\oint_{C(z_0)}\frac{f(\omega)\,d\omega}{(\omega-z_0)^{k+1}}$$

as

$$f_{S_n}(t) = \frac{\lambda^n}{(n-1)!}\left.\frac{d^{n-1} e^{zt}}{dz^{n-1}}\right|_{z=-\lambda} = \frac{\lambda(\lambda t)^{n-1}}{(n-1)!}\,e^{-\lambda t} \qquad (3.24)$$

For integer $n$, this density corresponds to the $n$-Erlang random variable. When extended to real values of $n = \alpha$,

$$f_X(t;\alpha,\lambda) = \frac{\lambda(\lambda t)^{\alpha-1}e^{-\lambda t}}{\Gamma(\alpha)} \qquad (3.25)$$

it is called the Gamma probability density function, with corresponding pgf

$$\varphi_X(z;\alpha,\lambda) = \left(\frac{\lambda}{z+\lambda}\right)^{\alpha} = \left(1+\frac{z}{\lambda}\right)^{-\alpha} \qquad (3.26)$$

and distribution

$$F_X(x;\alpha,\lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\int_0^x t^{\alpha-1}e^{-\lambda t}\,dt \qquad (3.27)$$

This integral, the incomplete Gamma function, can only be expressed in closed analytic form if $\alpha$ is an integer. Hence, for the $n$-Erlang random variable $X$, the distribution follows after repeated partial integration as

$$F_X(x;\lambda,n) = \frac{\lambda^n}{(n-1)!}\int_0^x t^{n-1}e^{-\lambda t}\,dt = 1 - e^{-\lambda x}\sum_{k=0}^{n-1}\frac{(\lambda x)^k}{k!} \qquad (3.28)$$

We observe that $\Pr[X > x] = \sum_{k=0}^{n-1}\frac{(\lambda x)^k}{k!}e^{-\lambda x}$, which equals $\Pr[Y \le n-1]$, where $Y$ is a Poisson random variable with mean $\mu = \lambda x$. Further, $\Pr[X > x] = \Pr[X^* > \lambda x]$, where $E[X^*] = n$: the distribution of the sum of $n$ i.i.d. exponential random variables each with rate $\lambda$ follows by scaling $x \to \lambda x$ from the distribution of the sum of $n$ i.i.d. exponential random variables each with unit rate (or mean 1). Moreover, (2.65) and (3.26) show that a sum of $n$ independent Gamma random variables specified by $\alpha_k$ (but with the same $\lambda$) is again a Gamma random variable with $\alpha = \sum_{k=1}^{n}\alpha_k$. At last, all centered moments follow from (2.39) by series expansion around $z = 0$ as

$$E\left[(X-a)^n\right] = (-1)^n\left.\frac{d^n}{dz^n}\left[e^{za}\left(1+\frac{z}{\lambda}\right)^{-\alpha}\right]\right|_{z=0} = (-1)^n n!\,a^n\sum_{m=0}^{n}\binom{-\alpha}{m}\frac{(\lambda a)^{-m}}{(n-m)!}$$

In particular, since $\mu = E[X] = \frac{\alpha}{\lambda}$, we find with $\binom{-z}{m} = (-1)^m\frac{\Gamma(z+m)}{m!\,\Gamma(z)}$

$$E\left[(X-\mu)^n\right] = (-1)^n n!\left(\frac{\alpha}{\lambda}\right)^n\sum_{m=0}^{n}\binom{-\alpha}{m}\frac{\alpha^{-m}}{(n-m)!} = (-1)^n\frac{\alpha^n}{\lambda^n}\sum_{m=0}^{n}\binom{n}{m}\frac{\Gamma(\alpha+m)}{\Gamma(\alpha)}(-\alpha)^{-m}$$

which can be expressed in terms of the confluent hypergeometric function $U(a,b,z)$ (Abramowitz and Stegun, 1968, Chapter 13). For example, if $n = 2$, the variance equals $\sigma^2 = \frac{\alpha}{\lambda^2}$ and, further, $E\left[(X-\mu)^3\right] = \frac{2\alpha}{\lambda^3}$, $E\left[(X-\mu)^4\right] = \frac{3\alpha(\alpha+2)}{\lambda^4}$ and $E\left[(X-\mu)^5\right] = \frac{4\alpha(5\alpha+6)}{\lambda^5}$.

3.3.2 The sum of independent uniform random variables

The sum $S_k = \sum_{j=1}^{k} U_j$ of $k$ i.i.d. uniform random variables $U_j$ has as distribution function the $k$-fold convolution, denoted $f_U^{(k*)}(x)$, of the uniform density function $f_U(x) = 1_{0<x<1}$ on $[0,1]$. The distribution function equals

$$\Pr[S_k \le x] = \sum_{j=0}^{[x]}(-1)^j\,\frac{(x-j)^k}{j!\,(k-j)!} \qquad (3.29)$$

Indeed, from (2.66) and (3.13), the Laplace transform of $S_k$ is

$$\varphi_{S_k}(z) = \left(\frac{1-e^{-z}}{z}\right)^k$$

The inverse Laplace transform determines, for $c > 0$,

$$f_U^{(k*)}(x) = \frac{d}{dx}\Pr[S_k \le x] = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}e^{zx}\left(\frac{1-e^{-z}}{z}\right)^k dz$$

Using $(1-e^{-z})^k = \sum_{j=0}^{k}\binom{k}{j}(-1)^j e^{-jz}$ and the integral $\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{e^{sa}}{s^{n+1}}\,ds = \frac{a^n}{n!}\,1_{\operatorname{Re}(a)>0}$ yields

$$f_U^{(k*)}(x) = \sum_{j=0}^{k}(-1)^j\binom{k}{j}\frac{(x-j)^{k-1}}{(k-1)!}\,1_{(x-j)\ge 0} \qquad (3.30)$$

from which (3.29) follows by integration.

3.3.3 The chi-square distribution

Suppose that the total error of $n$ independent measurements $X_k$, each perturbed by Gaussian noise, has to be determined. In order to prevent that errors may cancel out, the sum of the squared errors $S = \sum_{k=1}^{n} e_k^2$ is preferred rather than $\sum_{k=1}^{n}|e_k|$. For simplicity, we assume that all errors $e_k = X_k - x_k$, where $x_k$ is the exact value of quantity $k$, have zero mean and unit variance. The corresponding distribution of $S$ is known as the chi-square distribution. From the $\chi^2$-distribution, the $\chi^2$-test in statistics is deduced, which determines the goodness of fit of a model of a distribution to a set of measurements. We refer for a discussion of the $\chi^2$-test to Leon-Garcia (1994, Section 3.8) or Allen (1978, Section 8.4).
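Formula (3.29) is easy to check numerically (an illustrative sketch): the density (3.30) is symmetric about $k/2$, so $\Pr[S_k \le k/2] = \frac{1}{2}$, and for $k = 2$ the distribution is the familiar triangular one.

```python
from math import factorial, floor

def irwin_hall_cdf(x, k):
    """Pr[S_k <= x] for the sum of k i.i.d. uniforms on [0,1], eq. (3.29)."""
    if x <= 0:
        return 0.0
    if x >= k:
        return 1.0
    return sum((-1) ** j * (x - j) ** k / (factorial(j) * factorial(k - j))
               for j in range(floor(x) + 1))

for k in range(1, 8):
    # symmetry of the density about k/2, and the boundary value at x = k
    assert abs(irwin_hall_cdf(k / 2, k) - 0.5) < 1e-12
    assert irwin_hall_cdf(k, k) == 1.0

# k = 2 gives the triangular distribution: Pr[S_2 <= 1/2] = (1/2)^2 / 2 = 1/8
assert abs(irwin_hall_cdf(0.5, 2) - 0.125) < 1e-12
print("sum-of-uniforms distribution (3.29) checks out")
```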
We first deduce the distribution of the square $Y = X^2$ of a random variable $X$, and note that if $U$ and $V$ are independent, so are the random variables $g(U)$ and $h(V)$. The event $\{Y \le y\}$ or $\{X^2 \le y\}$ is equivalent to $\{-\sqrt{y} \le X \le \sqrt{y}\}$ and non-existent if $y < 0$. With (2.29) and $y \ge 0$,

$$\Pr[Y \le y] = \Pr\left[-\sqrt{y} \le X \le \sqrt{y}\right] = F_X(\sqrt{y}) - F_X(-\sqrt{y})$$

and, after differentiation,

$$f_{X^2}(x) = \frac{f_X(\sqrt{x}) + f_X(-\sqrt{x})}{2\sqrt{x}}$$

If $X$ is a Gaussian random variable $N(\mu, \sigma^2)$, then, for $x \ge 0$,

$$f_{X^2}(x) = \frac{\exp\left[-\frac{x+\mu^2}{2\sigma^2}\right]}{\sigma\sqrt{2\pi x}}\cosh\left(\frac{\mu\sqrt{x}}{\sigma^2}\right)$$

In particular, for $N(0,1)$ random variables where $\mu = 0$ and $\sigma = 1$, the density $f_{X^2}(x) = \frac{e^{-x/2}}{\sqrt{2\pi x}}$ reduces to a Gamma density (3.25) with $\alpha = \frac{1}{2}$ and $\lambda = \frac{1}{2}$. Since the sum of $n$ independent Gamma random variables with $(\alpha, \lambda)$ is again a Gamma random variable $(n\alpha, \lambda)$, we arrive at the chi-square $\chi^2$ probability density function,

$$f_{\chi^2}(x) = \frac{x^{\frac{n}{2}-1}}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\,e^{-\frac{x}{2}} \qquad (3.31)$$

3.4 Functions of random variables

3.4.1 The maximum and minimum of a set of independent random variables

The minimum of $m$ i.i.d. random variables $\{X_k\}_{1\le k\le m}$ possesses the distribution⁴

$$\Pr\left[\min_{1\le k\le m} X_k \le x\right] = \Pr[\text{at least one } X_k \le x] = \Pr[\text{not all } X_k > x]$$

or

$$\Pr\left[\min_{1\le k\le m} X_k \le x\right] = 1 - \prod_{k=1}^{m}\Pr[X_k > x] \qquad (3.32)$$

whereas for the maximum,

$$\Pr\left[\max_{1\le k\le m} X_k > x\right] = \Pr[\text{not all } X_k \le x] = 1 - \prod_{k=1}^{m}\Pr[X_k \le x]$$

or

$$\Pr\left[\max_{1\le k\le m} X_k \le x\right] = \prod_{k=1}^{m}\Pr[X_k \le x] \qquad (3.33)$$

For example, the distribution function for the minimum of $m$ independent exponential random variables follows from (3.17) as

$$\Pr\left[\min_{1\le k\le m} X_k \le x\right] = 1 - \prod_{k=1}^{m} e^{-\lambda_k x} = 1 - \exp\left(-x\sum_{k=1}^{m}\lambda_k\right)$$

or: the minimum of $m$ independent exponential random variables, each with rate $\lambda_k$, is again an exponential random variable with rate $\sum_{k=1}^{m}\lambda_k$. In addition to the memoryless property, this property of the exponential distribution will determine the fundamentals of Markov chains.

⁴ An alternative argument for independent random variables is that the event $\{\min_{1\le k\le m} X_k > x\}$ is possible if and only if $\{X_k > x\}$ for each $1 \le k \le m$.
Similarly, the event $\{\max_{1\le k\le m} X_k \le x\}$ is possible if and only if $\{X_k \le x\}$ for each $1 \le k \le m$.

3.4.2 Order statistics

The set $X_{(1)}, X_{(2)}, \dots, X_{(m)}$ are called the order statistics of the set of random variables $\{X_k\}_{1\le k\le m}$ if $X_{(k)}$ is the $k$-th smallest value of the set $\{X_k\}_{1\le k\le m}$. Clearly, $X_{(1)} = \min_{1\le k\le m} X_k$ while $X_{(m)} = \max_{1\le k\le m} X_k$. If the set $\{X_k\}_{1\le k\le m}$ consists of i.i.d. random variables with pdf $f_X$, the joint density function of the order statistics is, only for $x_1 < x_2 < \cdots < x_m$,

$$f_{\{X_{(j)}\}}(x_1, x_2, \dots, x_m) = \frac{\partial^m}{\partial x_1\cdots\partial x_m}\Pr\left[X_{(1)}\le x_1, \dots, X_{(m)}\le x_m\right] = m!\prod_{j=1}^{m} f_X(x_j) \qquad (3.34)$$

Indeed, confining to discrete random variables for simplicity: if $x_1 < x_2 < \cdots < x_m$, then

$$\Pr\left[X_{(1)} = x_1, \dots, X_{(m)} = x_m\right] = m!\,\Pr[X_1 = x_1, \dots, X_m = x_m]$$

else

$$\Pr\left[X_{(1)} = x_1, X_{(2)} = x_2, \dots, X_{(m)} = x_m\right] = 0$$

because there are precisely $m!$ permutations of the set $\{X_k\}_{1\le k\le m}$ onto the given ordered sequence $\{x_1, x_2, \dots, x_m\}$. If the sequence is not ordered, such that $x_k > x_l$ for at least one couple of indices $k < l$, then the probability is zero because the event $\{X_{(k)} > X_{(l)}\}$ is, by definition, impossible. Finally, the product in (3.34) follows by independence.

If the set $\{X_k\}_{1\le k\le m}$ is uniformly distributed over $[0,t]$, then

$$f_{\{X_{(j)}\}}(x_1, x_2, \dots, x_m) = \frac{m!}{t^m} \quad \text{for } 0 \le x_1 < x_2 < \cdots < x_m \le t, \qquad = 0 \quad \text{elsewhere}$$

while for exponential random variables with $f_X(x) = \lambda e^{-\lambda x}$,

$$f_{\{X_{(j)}\}}(x_1, x_2, \dots, x_m) = m!\,\lambda^m e^{-\lambda\sum_{j=1}^{m} x_j} \quad \text{for } 0 \le x_1 < x_2 < \cdots < x_m, \qquad = 0 \quad \text{elsewhere}$$

The order relation between the set $X_{(1)} \le X_{(2)} \le \cdots \le X_{(m)}$ is preserved after a continuous, non-decreasing transform $g$, i.e. $g\left(X_{(1)}\right) \le g\left(X_{(2)}\right) \le \cdots \le g\left(X_{(m)}\right)$. If the distribution function $F_X$ is continuous (it is always non-decreasing), the argument shows that the order statistics of a general set of i.i.d. random variables $\{X_k\}_{1\le k\le m}$ can be reduced to a study of the order statistics of the set of i.i.d. uniform random variables $\{U_k\}_{1\le k\le m}$ on $[0,1]$, because $U = F_X(X)$.

The event $\{X_{(k)} \le x\}$ means that at least $k$ among the $m$ random variables $\{X_j\}_{1\le j\le m}$ are smaller than $x$. Since each of the $m$ random variables is chosen independently from a same distribution $F_X$, the probability that precisely $n$ of the $m$ random variables are smaller than $x$ is binomially distributed with parameter $p = \Pr[X \le x]$. Hence,

$$\Pr\left[X_{(k)} \le x\right] = \sum_{n=k}^{m}\binom{m}{n}\left(\Pr[X\le x]\right)^n\left(1-\Pr[X\le x]\right)^{m-n} \qquad (3.35)$$

The probability density function can be obtained in the usual, though cumbersome, way by differentiation,

$$f_{X_{(k)}}(x) = \frac{d}{dx}\Pr\left[X_{(k)}\le x\right] = \sum_{n=k}^{m}\binom{m}{n}\frac{d}{dx}\left[\left(\Pr[X\le x]\right)^n\left(1-\Pr[X\le x]\right)^{m-n}\right]$$

$$= f_X(x)\sum_{n=k}^{m}\binom{m}{n}\,n\left(\Pr[X\le x]\right)^{n-1}\left(1-\Pr[X\le x]\right)^{m-n} - f_X(x)\sum_{n=k}^{m}\binom{m}{n}(m-n)\left(\Pr[X\le x]\right)^{n}\left(1-\Pr[X\le x]\right)^{m-n-1}$$

Using $n\binom{m}{n} = m\binom{m-1}{n-1}$, $(m-n)\binom{m}{n} = m\binom{m-1}{n}$ and lowering the upper index in the last summation, we have

$$f_{X_{(k)}}(x) = m f_X(x)\sum_{n=k}^{m}\binom{m-1}{n-1}\left(\Pr[X\le x]\right)^{n-1}\left(1-\Pr[X\le x]\right)^{m-n} - m f_X(x)\sum_{n=k}^{m-1}\binom{m-1}{n}\left(\Pr[X\le x]\right)^{n}\left(1-\Pr[X\le x]\right)^{m-n-1}$$

$$= m f_X(x)\sum_{n=k}^{m}\binom{m-1}{n-1}\left(\Pr[X\le x]\right)^{n-1}\left(1-\Pr[X\le x]\right)^{m-n} - m f_X(x)\sum_{n=k+1}^{m}\binom{m-1}{n-1}\left(\Pr[X\le x]\right)^{n-1}\left(1-\Pr[X\le x]\right)^{m-n}$$

or, with $F_X(x) = \Pr[X \le x]$,

$$f_{X_{(k)}}(x) = m f_X(x)\binom{m-1}{k-1}\left(F_X(x)\right)^{k-1}\left(1-F_X(x)\right)^{m-k} \qquad (3.36)$$

The more elegant and faster argument is as follows: in order for $X_{(k)}$ to be equal to $x$, exactly $k-1$ of the $m$ random variables $\{X_j\}_{1\le j\le m}$ must be less than $x$, one equal to $x$ and the other $m-k$ must all be greater than $x$. Abusing the notation $f_X(x) = \Pr\left[X_{(k)} = x\right]$ and observing that $m\binom{m-1}{k-1} = \frac{m!}{1!\,(k-1)!\,(m-k)!}$ is an instance of the multinomial coefficient $\frac{m!}{n_1!\,n_2!\cdots n_k!}$, which gives the number of ways of putting $m = n_1 + n_2 + \cdots + n_k$ different objects into $k$ different boxes with $n_j$ in the $j$-th box, leads alternatively to (3.36).

3.5 Examples of other distributions

1.
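Formula (3.35) can be cross-checked against the extreme cases $k = 1$ and $k = m$, which must reduce to the minimum and maximum formulas (3.32) and (3.33). A small sketch (illustrative; $m = 5$ and the evaluation points are arbitrary), written for a generic distribution value $F = \Pr[X \le x]$:

```python
from math import comb

def order_stat_cdf(k, m, F):
    """Pr[X_(k) <= x] from (3.35), where F = Pr[X <= x] of the parent rv."""
    return sum(comb(m, n) * F ** n * (1 - F) ** (m - n) for n in range(k, m + 1))

m = 5
for F in (0.1, 0.3, 0.7, 0.9):
    # k = 1 is the minimum: Pr[min <= x] = 1 - (1 - F)^m, cf. (3.32)
    assert abs(order_stat_cdf(1, m, F) - (1 - (1 - F) ** m)) < 1e-12
    # k = m is the maximum: Pr[max <= x] = F^m, cf. (3.33)
    assert abs(order_stat_cdf(m, m, F) - F ** m) < 1e-12
print("order-statistic extremes match (3.32) and (3.33)")
```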
The Gumbel distribution appears in the theory of extremes (see Section 6.4) and is defined by the distribution function

$$F_{\mathrm{Gumbel}}(x) = e^{-e^{-a(x-b)}} \qquad (3.37)$$

The corresponding Laplace transform is

$$\varphi_{\mathrm{Gumbel}}(z) = \int_{-\infty}^{\infty} e^{-zt}\,e^{-e^{-a(t-b)}}\,a e^{-a(t-b)}\,dt = e^{-bz}\,\Gamma\left(1+\frac{z}{a}\right)$$

from which the mean follows as $E[X] = -\frac{d}{dz}\left.\left[e^{-bz}\,\Gamma\left(1+\frac{z}{a}\right)\right]\right|_{z=0} = b + \frac{\gamma}{a}$, where $\gamma = 0.57721\ldots$ is the Euler constant. The variance is best computed with (2.43), resulting in $\operatorname{Var}[X] = \frac{\pi^2}{6a^2}$.

2. The Cauchy distribution has the probability density function

$$f_{\mathrm{Cauchy}}(x) = \frac{1}{\pi(1+x^2)} \qquad (3.38)$$

and corresponding distribution

$$F_{\mathrm{Cauchy}}(x) = \frac{1}{\pi}\left(\frac{\pi}{2} + \arctan x\right)$$

The Laplace transform

$$\varphi_{\mathrm{Cauchy}}(z) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{-zx}}{1+x^2}\,dx$$

only converges for purely imaginary $z = i\omega$, in which case it reduces to a Fourier transform,

$$\varphi_{\mathrm{Cauchy}}(i\omega) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega x}}{1+x^2}\,dx$$

This integral is best evaluated by contour integration. If $\omega \ge 0$, we consider a contour $C$ consisting of the real axis and the semi-circle that encloses the negative $\operatorname{Im}(x)$-plane,

$$\oint_C\frac{e^{-i\omega x}\,dx}{1+x^2} = \int_{-\infty}^{\infty}\frac{e^{-i\omega x}\,dx}{1+x^2} + \lim_{r\to\infty}\int_{0}^{-\pi}\frac{e^{-i\omega r e^{i\theta}}}{1+r^2 e^{2i\theta}}\,i r e^{i\theta}\,d\theta$$

Since $\left|e^{-i\omega r e^{i\theta}}\right| = e^{\omega r\sin\theta} = e^{-|\omega| r|\sin\theta|}$ and $\sin\theta \le 0$ for $-\pi \le \theta \le 0$, the limit of the last integral vanishes. The contour encloses the simple pole (zero of $x^2+1 = (x-i)(x+i)$) at $x = -i$. Applying Cauchy's residue theorem, we obtain

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega x}\,dx}{1+x^2} = -2i\lim_{x\to -i}\frac{e^{-i\omega x}(x+i)}{1+x^2} = e^{-\omega}$$

If $\omega \le 0$, we close the contour over the positive $\operatorname{Im}(x)$-plane such that the contribution of the semi-circle to the contour $C$ again vanishes. The resulting contour then encloses the simple pole at $x = i$ and

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega x}\,dx}{1+x^2} = 2i\lim_{x\to i}\frac{e^{-i\omega x}(x-i)}{1+x^2} = e^{\omega}$$

Combining both expressions results in

$$\varphi_{\mathrm{Cauchy}}(i\omega) = E\left[e^{-i\omega X}\right] = e^{-|\omega|}$$

Since $|\omega|$ is not analytic around $\omega = 0$, none of the moments of the Cauchy distribution exists!
Hence, the Cauchy distribution is an example of a distribution without mean (see the requirement for the existence of the expectation in Section 2.3.2): although the improper integral $\int_{-\infty}^{\infty}\frac{x\,dx}{1+x^2} = 0$ due to symmetry (in the Riemann sense), both $\int_{-\infty}^{0}\frac{x\,dx}{1+x^2}$ and $\int_{0}^{\infty}\frac{x\,dx}{1+x^2}$ diverge. In addition, if $S_n = \sum_{k=1}^{n} X_k$ is the sum of i.i.d. Cauchy random variables $X_k$, the sample mean $\frac{S_n}{n}$ has the Fourier transform

$$E\left[e^{-i\omega\frac{S_n}{n}}\right] = \prod_{k=1}^{n} E\left[e^{-i\frac{\omega}{n}X_k}\right] = \left(E\left[e^{-i\frac{\omega}{n}X}\right]\right)^n = e^{-|\omega|}$$

Hence, the sample mean $\frac{S_n}{n}$ of i.i.d. Cauchy random variables is again a Cauchy random variable, independent of $n$. This means that the law of large numbers (see Section 6.2) does not hold for the Cauchy random variable, as a consequence of the non-existence of the mean. Also, the sum $S_n$ has Fourier transform $e^{-n|\omega|}$ and its pdf equals $f_{S_n}(x) = \frac{1}{n\pi}\,\frac{1}{1+(x/n)^2}$.

3. The Weibull distribution, with pdf defined for $x \ge 0$ and $a, b > 0$ by

$$f_{\mathrm{Weibull}}(x) = \frac{\exp\left(-\left(\frac{x}{a}\right)^b\right)}{a\,\Gamma\left(1+\frac{1}{b}\right)} \qquad (3.39)$$

generalizes the exponential distribution (3.17), which corresponds to $b = 1$ and $a = \frac{1}{\lambda}$. It is related to the Gaussian distribution if $b = 2$. Let $X$ be a Weibull random variable. All higher moments can be computed from (2.34) as

$$E\left[X^n\right] = \frac{1}{a\,\Gamma\left(1+\frac{1}{b}\right)}\int_0^{\infty} x^n\exp\left(-\left(\frac{x}{a}\right)^b\right)dx = \frac{a^n\,\Gamma\left(\frac{n+1}{b}\right)}{\Gamma\left(\frac{1}{b}\right)}$$

The generating function possesses the expansion

$$\varphi_X(z) = E\left[e^{-zX}\right] = \sum_{n=0}^{\infty}\frac{(-z)^n}{n!}E\left[X^n\right] = \frac{1}{\Gamma\left(\frac{1}{b}\right)}\sum_{n=0}^{\infty}\Gamma\left(\frac{n+1}{b}\right)\frac{(-za)^n}{n!}$$

which cannot be summed in explicit form for general $b$. Sometimes an alternative definition of the Weibull distribution appears,

$$f_{\mathrm{Weibull}}(x) = ab\,x^{b-1}e^{-ax^b} \qquad F_{\mathrm{Weibull}}(x) = 1 - e^{-ax^b} \qquad (3.40)$$

with the advantage of a simpler expression for the distribution function $F_{\mathrm{Weibull}}(x)$.
If X possesses this probability density (3.40), the moments and variance are

E[X^k] = ∫_0^{∞} x^k f_Weibull(x) dx = Γ(1 + k/b) / a^{k/b}

Var[X] = ∫_0^{∞} (x − E[X])² f_Weibull(x) dx = (Γ(1 + 2/b) − Γ²(1 + 1/b)) / a^{2/b}

The interest of the Weibull distribution in the Internet stems from the self-similarity and long-range dependence of observables (i.e. quantities that can be measured, such as the delay, the interarrival times of packets, etc.). Especially if the shape factor b ∈ (0, 1), the Weibull distribution has a sub-exponential tail that decays more slowly than any exponential, but still faster than any power law.

4. Power law behavior is often described via the Pareto distribution, with pdf for x ≥ 0 and α, β > 0,

f_Pareto(x) = (α/β)(1 + x/β)^{−α−1}    (3.41)

and with distribution function

F_Pareto(x) = ∫_0^{x} (α/β)(1 + t/β)^{−α−1} dt = 1 − (1 + x/β)^{−α}    (3.42)

Since lim_{x→∞} F(x) = 1, the power α must exceed 0. The higher moments are Beta functions (Abramowitz and Stegun, 1968, Section 6.2.1),

E[X^k] = α β^k ∫_0^{∞} t^k (1 + t)^{−α−1} dt = β^k k! Γ(α − k)/Γ(α)

and show that E[X^k] only exists if α > k. Hence, the mean E[X] only exists if α > 1. The deep tail asymptotic for large x is f_Pareto(x) = O(x^{−α−1}) and Pr[X > x] = O(x^{−α}). For example, the distribution of the nodal degree in the Internet has an exponent around α = 2.4 (see Section 15.3).

5. Another distribution with heavy tails is the lognormal distribution, defined as the random variable X = e^Y, where Y = N(μ, σ²) is a Gaussian or normal random variable.
From (2.32), it follows that Pr[e^Y ≤ x] = Pr[Y ≤ log x] for x ≥ 0 and, with (3.20),

F_lognormal(x) = (1/(σ√(2π))) ∫_{−∞}^{log x} exp(−(t − μ)²/(2σ²)) dt    (3.43)

and, for x > 0,

f_lognormal(x) = exp(−(log x − μ)²/(2σ²)) / (x σ√(2π))    (3.44)

The moments are

E[X^k] = (1/(σ√(2π))) ∫_0^{∞} x^{k−1} exp(−(log x − μ)²/(2σ²)) dx = (1/(σ√(2π))) ∫_{−∞}^{∞} e^{ku} exp(−(u − μ)²/(2σ²)) du

or, explicitly,

E[X^k] = exp(kμ) exp(k²σ²/2)    (3.45)

and

Var[X] = e^{2μ} e^{σ²} (e^{σ²} − 1)    (3.46)

The probability generating function is, by definition (2.37),

φ_X(z; μ, σ²) = (1/(σ√(2π))) ∫_0^{∞} e^{−zt} e^{−(log t − μ)²/(2σ²)} dt/t = (1/(σ√(2π))) ∫_{−∞}^{∞} e^{−z e^x} e^{−(x − μ)²/(2σ²)} dx    (3.47)

The integral (3.47) indicates that φ_X(z; μ, σ²) only exists for Re(z) ≥ 0. This means that φ_X(z; μ, σ²) is not analytic at any point z = it on the imaginary axis, because a circle with arbitrarily small but non-zero radius around z = it necessarily encircles points with Re(z) < 0, where φ_X(z; μ, σ²) does not exist. Hence, the Taylor expansion (2.40) of the generating function around z = 0 does not exist, although all moments or derivatives at z = 0 exist. Indeed, the series

Σ_{k=0}^{∞} (−1)^k E[X^k] z^k/k! = Σ_{k=0}^{∞} ((−z e^μ)^k/k!) exp(k²σ²/2)

is a divergent series (except for σ = 0 or z = 0). The fact that the pgf (3.47) is not available in closed form complicates the computation of the sum of i.i.d. lognormal random variables via (2.66). This sum appears in radio communications with several transmitters and receivers. In radio communications, the received signal levels decrease with the distance between the transmitter and the receiver. This phenomenon is called pathloss. Attenuation of radio signals due to pathloss has been modeled by averaging the measured signal powers over long times and over various locations at the same distance to the transmitter. The mean value of the signal power found in this way is referred to as the area mean power P_a (in Watts) and is well modeled as P_a(r) = c·r^{−η}, where c is a constant and η is the pathloss exponent⁵.
In reality, the received power levels may vary significantly around the mean power P_a(r) due to irregularities in the surroundings of the receiving and transmitting antennas. Measurements have revealed that the logarithm of the mean power P(r) at different locations on a circle with radius r around the transmitter is approximately normally distributed with mean equal to the logarithm of the area mean power P_a(r). The lognormal shadowing model assumes that the logarithm of P(r) is precisely normally distributed around the logarithmic value of the area mean power: log₁₀(P(r)) = log₁₀(P_a(r)) + X, where X = N(0, σ) is a zero-mean normally distributed random variable (in dB) with standard deviation σ (also in dB and, for severe fluctuations, up to 12 dB). Hence, the random variable P(r) = P_a(r)·10^X has a lognormal distribution (3.43) equal to

Pr[P(r) ≤ x] = Pr[X ≤ log₁₀(x/P_a(r))] = (1/(σ√(2π) log 10)) ∫_0^{x} exp(−(log₁₀ u − log₁₀ P_a(r))²/(2σ²)) du/u

3.6 Summary tables of probability distributions

3.6.1 Discrete random variables

Name      | Pr[X = k]                | E[X] | Var[X]    | φ_X(z) = E[z^X]
----------|--------------------------|------|-----------|---------------------
Bernoulli | Pr[X = 1] = p            | p    | p(1−p)    | 1 − p + pz
Binomial  | C(n,k) p^k (1−p)^{n−k}   | np   | np(1−p)   | ((1−p) + pz)^n
Geometric | p(1−p)^{k−1}             | 1/p  | (1−p)/p²  | pz/(1 − (1−p)z)
Poisson   | e^{−λ} λ^k/k!            | λ    | λ         | e^{λ(z−1)}

⁵ The constant c depends on the transmitted power, the receiver and transmitter antenna gains and the wavelength. The pathloss exponent η depends on the environment and terrain structure and can vary from 2 in free space to 6 in urban areas.
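The E[X] and Var[X] columns of the discrete table above are easy to cross-check numerically by summing over a (truncated) probability mass function; a small Python sketch:

```python
import math

def moments(pmf):
    # first moment and variance from a list of (k, Pr[X = k]) pairs
    mean = sum(k * p for k, p in pmf)
    var = sum(k * k * p for k, p in pmf) - mean * mean
    return mean, var

lam = 3.0
poisson = [(k, math.exp(-lam) * lam ** k / math.factorial(k)) for k in range(80)]
n, p = 10, 0.3
binomial = [(k, math.comb(n, k) * p ** k * (1 - p) ** (n - k)) for k in range(n + 1)]
q = 0.4
geometric = [(k, q * (1 - q) ** (k - 1)) for k in range(1, 250)]

print(moments(poisson))    # close to (lambda, lambda) = (3, 3)
print(moments(binomial))   # close to (np, np(1-p)) = (3.0, 2.1)
print(moments(geometric))  # close to (1/p, (1-p)/p^2) = (2.5, 3.75)
```

The truncation points are chosen so that the neglected tail mass is negligible.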
3.6.2 Continuous random variables

Name             | f_X(x)                                  | E[X]                  | Var[X]                                  | φ_X(z) = E[e^{−zX}]
-----------------|-----------------------------------------|-----------------------|-----------------------------------------|---------------------
Uniform on [a,b] | 1_{a≤x≤b}/(b−a)                         | (a+b)/2               | (b−a)²/12                               | (e^{−za} − e^{−zb})/(z(b−a))
Exponential      | λ e^{−λx}                               | 1/λ                   | 1/λ²                                    | λ/(λ+z)
Gaussian         | exp(−(x−μ)²/(2σ²))/(σ√(2π))             | μ                     | σ²                                      | e^{−μz + σ²z²/2}
Gamma            | λ(λx)^{α−1} e^{−λx}/Γ(α)                | α/λ                   | α/λ²                                    | (λ/(λ+z))^α
Gumbel           | a e^{−a(x−b)} e^{−e^{−a(x−b)}}          | b + γ/a, γ = 0.5772...| π²/(6a²)                                | e^{−bz} Γ(1 + z/a)
Cauchy           | 1/(π(1+x²))                             | does not exist        | does not exist                          | e^{−|Im(z)|} (Re(z) = 0)
Weibull          | a b x^{b−1} e^{−a x^b}                  | Γ(1+1/b)/a^{1/b}      | (Γ(1+2/b) − Γ²(1+1/b))/a^{2/b}          | no closed form
Pareto           | (α/β)(1 + x/β)^{−α−1}                   | β/(α−1), for α > 1    | αβ²/((α−1)²(α−2)), for α > 2            | no closed form
Lognormal        | exp(−(log x − μ)²/(2σ²))/(xσ√(2π))      | e^μ e^{σ²/2}          | e^{2μ} e^{σ²}(e^{σ²} − 1)               | no closed form

3.7 Problems

(i) If φ_X(z) is the probability generating function of a non-zero discrete random variable X, find an expression for E[log X] in terms of φ_X(z).
(ii) Compute the mean value of the k-th order statistic in an ensemble of (a) m i.i.d. exponentially distributed random variables with mean 1/λ and (b) m i.i.d. polynomially distributed random variables on [0,1].
(iii) Discuss how a probability density function of a continuous random variable X can be approximated from a set {x₁, x₂, ..., x_n} of n measurements or simulations.
(iv) In a circle with radius r around a sending mobile node, there are N − 1 other mobile nodes uniformly distributed over that circle. The possible interference caused by these other mobile nodes depends on their distance to the sending node at the center. Derive, for large N but constant density of mobile nodes, the pdf of the distance of the m-th nearest node to the center.
(v) Let X and Y be two independent random variables. What is the probability that the one is larger than the other?

4 Correlation

In this chapter, methods to compute bi-variate correlated random variables are discussed. As a measure for the correlation, the linear correlation coefficient defined in (2.58) is used. First, the generation of n correlated Gaussian random variables is explained. The sequel is devoted to the construction of two correlated random variables with arbitrary distribution.
4.1 Generation of correlated Gaussian random variables

Due to the importance of correlated Gaussian random variables as an underlying system for generating arbitrary correlated random variables, as will be demonstrated in Section 4.3, we discuss how they can be generated in multiple dimensions. With the notation of Section 3.2.3, a Gaussian (normal) random variable with average μ and variance σ² is denoted by N(μ, σ²). By linearly combining Gaussian random variables, we can create a new Gaussian random variable with a desired mean μ and variance σ².

4.1.1 Generation of two independent Gaussian random variables

The fact that a linear combination of Gaussian random variables is again a Gaussian random variable allows us to concentrate on normalized Gaussian random variables N(0, 1). Let X₁ and X₂ be two independent normalized Gaussian random variables. Independent random variables are not correlated, and the linear correlation coefficient ρ = 0. The resulting joint probability distribution is f_{X₁X₂}(x, y; ρ) = f_{X₁}(x) f_{X₂}(y) and, with (3.19),

f_{X₁X₂}(x, y; 0) = e^{−(x²+y²)/2}/(2π)

It is natural to consider a polar transformation and the transformed random variables R = X₁² + X₂² and Θ = arctan(X₂/X₁). The inverse transform is x = √r cos θ and y = √r sin θ, which differs slightly from the usual polar transformation in that we now define r = x² + y² instead of r² = x² + y². The reason is that the Jacobian is simpler for our purposes,

J(r, θ) = det [ ∂x/∂r  ∂x/∂θ ; ∂y/∂r  ∂y/∂θ ] = det [ cos θ/(2√r)  −√r sin θ ; sin θ/(2√r)  √r cos θ ] = 1/2

whereas the usual polar transformation has a Jacobian equal to the variable r. Using the transformation rules in Section 2.5.4,

f_{RΘ}(r, θ) = e^{−r/2}/(4π)

which shows that f_{RΘ}(r, θ) does not depend on θ. Hence, we can write f_{RΘ}(r, θ) = f_R(r) f_Θ(θ) with f_Θ(θ) = c, where c is a constant, and f_R(r) = e^{−r/2}/(4πc). This implies that Θ is a uniform random variable over an interval of length 1/c.
We also recognize from (3.15) that f_R(r) is close to an exponential density with rate λ = 1/2. Therefore, it is instructive to choose the constant c such that R is precisely an exponential random variable with rate λ = 1/2. Thus, choosing c = 1/(2π), we end up with f_R(r) = (1/2) e^{−r/2} and f_Θ(θ) = 1/(2π). These two independent random variables R and Θ can each be generated separately from a uniform random variable U on [0,1], as discussed in Section 3.2.1, leading to R = −2 ln(U₁) and Θ = 2πU₂ and, finally, to the independent Gaussian random variables

X₁ = √(−2 ln U₁) cos(2πU₂)
X₂ = √(−2 ln U₁) sin(2πU₂)

The procedure can be used to generate a single Gaussian random variable, but also more independent Gaussians, by repeating the generation procedure.

4.1.2 The n-joint Gaussian probability distribution function

A collection of n random variables X_i is called a random vector X = (X₁, X₂, ..., X_n)^T, a matrix with dimension n × 1. The average of a random vector is a vector with components E[X_i] for 1 ≤ i ≤ n. The variance of a random vector,

Var[X] = E[(X − E[X])(X − E[X])^T] = E[XX^T] − E[X](E[X])^T

is a matrix Σ_X with elements (Σ_X)_{ij} = Cov[X_i, X_j]. Since Cov[X_i, X_j] = Cov[X_j, X_i], the covariance matrix Σ_X is real and symmetric, Σ_X = Σ_X^T. The importance of real, symmetric matrices is that they have real eigenvectors (see Appendix A.2). Moreover, Σ_X is non-negative definite because, using the vector norms defined in Section A.3,

x^T Σ_X x = E[x^T (X − E[X])(X − E[X])^T x] = E[((X − E[X])^T x)^T ((X − E[X])^T x)] = E[‖(X − E[X])^T x‖₂²] ≥ 0

which implies that all real eigenvalues λ_i are non-negative. Hence, there exists an orthogonal matrix U such that

Σ_X = U diag(λ_i) U^T    (4.1)

If all random variables X_i are independent, Cov[X_i, X_j] = 0 for i ≠ j and Cov[X_i, X_i] = Var[X_i] ≥ 0, then Σ_X = diag(Var[X_i]). Gaussian random variables are completely determined by the mean and the variance, i.e. by the first two moments.
We will now show that the existence of an orthogonal transformation U for any probability distribution such that U^T Σ_X U = diag(λ_i) implies that a vector of joint Gaussian random variables can be transformed into a vector of independent Gaussian random variables. Also the reverse holds, which will be used below to generate n joint correlated Gaussian random variables. The multi-dimensional generating function of an n-joint Gaussian or n-joint normal random vector X is defined for the vector z = (z₁, z₂, ..., z_n)^T as

φ_X(z) = E[e^{−z^T X}] = exp((1/2) z^T Σ_X z − (E[X])^T z)    (4.2)

Using (4.1), and the fact that U is an orthogonal matrix such that U^{−1} = U^T and UU^T = I,

φ_X(z) = exp((1/2)(U^T z)^T diag(λ_i) U^T z − (U^T E[X])^T U^T z)

Denote the vectors w = U^T z and m = U^T E[X]. Then we have

φ_X(z) = exp((1/2) w^T diag(λ_i) w − m^T w) = exp(Σ_{j=1}^{n} (λ_j w_j²/2 − m_j w_j)) = Π_{j=1}^{n} e^{λ_j w_j²/2 − m_j w_j}

and e^{λ_j w_j²/2 − m_j w_j} = φ_{X_j}(w_j) is the Laplace transform (3.22) of a Gaussian random variable X_j, because all λ_j are real and non-negative. With (2.65), this shows that a vector of joint Gaussian random variables can be transformed into a vector of independent Gaussian random variables. Reversing the order of the manipulations also justifies that (4.2) indeed defines a general n-joint Gaussian probability generating function. If X₁, X₂, ..., X_n are joint normal and not correlated, then Σ_X is a diagonal matrix, which implies that X₁, X₂, ..., X_n are independent. As discussed in Section 2.5.2, independence implies non-correlation, but the converse is generally not true. These properties make Gaussian random variables particularly suited to deal with correlations.

[Figure] Fig. 4.1.
The joint probability density function (4.4) with μ_X = μ_Y = 0, σ_X = σ_Y = 1 and ρ = 0.

The corresponding n-joint Gaussian probability density function of the vector X can be derived, after inverse Laplace transform, for the vector x = (x₁, x₂, ..., x_n)^T as

f_X(x) = (1/((√(2π))^n √(det Σ_X))) exp(−(1/2)(x − E[X])^T Σ_X^{−1} (x − E[X]))    (4.3)

The inverse Laplace transform for n = 2 is computed in Section C.2. After computing the inverse matrix and the determinant in (4.3) explicitly, the two-dimensional (n = 2) or bi-variate Gaussian probability density function is

f_XY(x, y; ρ) = (1/(2π σ_X σ_Y √(1−ρ²))) exp[−((x−μ_X)²/σ_X² − 2ρ(x−μ_X)(y−μ_Y)/(σ_X σ_Y) + (y−μ_Y)²/σ_Y²)/(2(1−ρ²))]    (4.4)

Figures 4.1–4.3 plot f_XY(x, y; ρ) for various correlation coefficients ρ. If ρ = 0, we observe that f_XY(x, y; 0) = f_X(x) f_Y(y), which indicates that uncorrelated Gaussian random variables are also independent.

[Figure] Fig. 4.2. The joint probability density function (4.4) with μ_X = μ_Y = 0, σ_X = σ_Y = 1 and ρ = 0.8.

[Figure] Fig. 4.3. The joint probability density function (4.4) with μ_X = μ_Y = 0, σ_X = σ_Y = 1 and ρ = −0.8.

If we denote x* = (x−μ_X)/σ_X and y* = (y−μ_Y)/σ_Y, the bi-variate normal density (4.4) reduces to

f_XY(x*, y*; ρ) = exp[−((x*)² − 2ρ x* y* + (y*)²)/(2(1−ρ²))] / (2π σ_X σ_Y √(1−ρ²))

from which we can verify the partial differential equation

∂f_XY(x*, y*; ρ)/∂ρ = ∂²f_XY(x*, y*; ρ)/(∂x* ∂y*)    (4.5)

and the symmetry relations

f_XY(x*, y*; ρ) = f_XY(−x*, −y*; ρ) = f_XY(−x*, y*; −ρ) = f_XY(x*, −y*; −ρ)    (4.6)

4.1.3 Generation of n correlated Gaussian random variables

Let {X_i}_{1≤i≤n} be a set of n independent normal random variables, where each X_i is distributed as N(0, 1). The vector X is rather easily simulated. The analysis above shows that E[X] = 0 (the null vector) and Σ_X = diag(Var[X_i]) = diag(1) = I, the identity matrix.
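As an illustration, such an independent N(0,1) vector can be produced from uniform draws with the Box–Muller formulas of Section 4.1.1; a minimal Python sketch (standard library only):

```python
import math, random

def box_muller(rng):
    # one pair of independent N(0,1) variates from two independent uniforms:
    # R = -2 ln U1 is exponential with rate 1/2, Theta = 2*pi*U2 is uniform
    u1 = 1.0 - rng.random()          # in (0, 1], avoids log(0)
    u2 = rng.random()
    r, theta = -2.0 * math.log(u1), 2.0 * math.pi * u2
    return math.sqrt(r) * math.cos(theta), math.sqrt(r) * math.sin(theta)

rng = random.Random(7)
xs = []
for _ in range(50_000):
    x1, x2 = box_muller(rng)
    xs.append(x1)
    xs.append(x2)

mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean * mean
print(round(mean, 3), round(var, 3))
```

The sample mean and variance of the generated values are close to 0 and 1, as expected for N(0,1).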
We want to generate the correlated normal vector Y with a given mean vector E[Y] and a given covariance matrix Σ_Y. Since linear combinations of normal random variables are normal random variables, we consider the linear transformation

Y = AX + B

where A and B are constant matrices. We will now determine A and B. First,

E[Y] = E[AX] + E[B] = A E[X] + E[B] = E[B]

Hence, the matrix B is a vector with components equal to the given components E[Y_i] of the mean vector E[Y]. Second,

Σ_Y = E[(Y − E[Y])(Y − E[Y])^T] = E[AX(AX)^T] = E[AXX^T A^T] = A E[XX^T] A^T = A Σ_X A^T = AA^T

From the eigenvalue decomposition Σ_Y = U diag(λ_i) U^T, with real eigenvalues λ_i ≥ 0, and the fact that diag(λ_i) = diag(√λ_i)(diag(√λ_i))^T, such that

AA^T = U diag(√λ_i) (U diag(√λ_i))^T

we obtain

A = U diag(√λ_i)

The matrix A is also called the square root matrix of Σ_Y and can be found from the singular value decomposition of Σ_Y or from a Cholesky factorization (Press et al., 1992).

Example. Generate a normal vector Y with E[Y] = (300, 300)^T, standard deviations σ₁ = 106.066 and σ₂ = 35.355, and correlation ρ_Y = 0.8.

Solution: The covariance matrix of Y is obtained using the definition of the linear correlation coefficient (2.58),

Σ_Y = [ σ₁²  ρσ₁σ₂ ; ρσ₁σ₂  σ₂² ] = [ 11250  3000 ; 3000  1250 ]

The square root matrix A of Σ_Y is

A = [ 63.640  84.853 ; 0  35.355 ]

which is readily checked by computing AA^T = Σ_Y. It remains to generate m independent draws of (X₁, X₂) from a normal distribution with zero mean and unit variance, as explained in Section 4.1.1. Each pair (X₁, X₂) out of the m pairs is transformed as Y = AX + E[Y]. The result, component Y₂ versus Y₁, is shown in Fig. 4.4.

[Figure] Fig. 4.4.
The scatter diagram of the simulated vector Y.

4.2 Generation of correlated random variables

Let us consider the problem of generating two correlated random variables X and Y with given distribution functions F_X and F_Y. The correlation is expressed in terms of the linear correlation coefficient ρ(X, Y) = ρ defined in (2.58). The need to generate correlated random variables often occurs in simulations. For example, as shown in Kuipers and Van Mieghem (2003), correlations in the link weight structure may significantly increase the computational complexity of multi-constrained routing, called in brief QoS routing. The importance of measures of dependence between quantities in risk and finance is discussed in Embrechts et al. (2001b).

In general, given the distribution functions F_X and F_Y, not all linear correlations −1 ≤ ρ ≤ 1 are possible. Indeed, let X and Y be positive real random variables with infinite range, which means that F_X(x) → 1 only as x → ∞ and that F_X(x) = F_Y(x) = 0 for x < 0. Consider Y = aX + b with a < 0 and b ≥ 0. For all finite y < 0,

F_Y(y) = Pr[Y ≤ y] = Pr[aX + b ≤ y] = Pr[X ≥ (y − b)/a] = 1 − F_X((y − b)/a) > 0

which contradicts the fact that F_Y(y) = 0 for y < 0. Hence, positive random variables with infinite range cannot be correlated with ρ = −1. The requirement that the range needs to be unbounded is necessary, because two uniform random variables on [0, 1], U₁ and U₂, are negatively correlated with ρ = −1 if U₁ = 1 − U₂. In summary, the set of all possible correlations is a closed interval [ρ_min, ρ_max] with ρ_min < 0 < ρ_max. The precise computation of ρ_min and ρ_max is, in general, difficult, as shown below.

4.3 The non-linear transformation method

The non-linear transformation approach starts from a given set of two random variables X₁ and X₂ that have a correlation coefficient ρ_X ∈ [−1, 1].
If the joint distribution function

F_{X₁X₂}(x₁, x₂; ρ_X) = Pr[X₁ ≤ x₁, X₂ ≤ x₂] = ∫_{−∞}^{x₁} ∫_{−∞}^{x₂} f_{X₁X₂}(u, v; ρ_X) du dv

is known, the marginal distribution follows from (2.60) as

Pr[X₁ ≤ x₁] = ∫_{−∞}^{x₁} ∫_{−∞}^{∞} f_{X₁X₂}(u, v; ρ_X) dv du

Since for any random variable X it holds that F_X(X) = U, where U is a uniform random variable on [0, 1], it follows that U₁ = F_{X₁}(X₁) and U₂ = F_{X₂}(X₂) are correlated uniform random variables with correlation coefficient ρ_U. As shown in Section 3.2.1, if U is a uniform random variable on [0, 1], any other random variable Y with distribution function g can be constructed as g^{−1}(U). By combining the two transforms, we can generate Y₁ = g₁^{−1}(F_{X₁}(X₁)) and Y₂ = g₂^{−1}(F_{X₂}(X₂)), which are correlated because X₁ and X₂ are correlated. It may be possible to construct the correlated random variables Y₁ = T₁(X₁) and Y₂ = T₂(X₂) directly, if the transforms T₁ and T₂ are known. The goal is to determine the linear correlation coefficient ρ_Y defined in (2.58),

ρ_Y = (E[Y₁Y₂] − E[Y₁] E[Y₂]) / (√(Var[Y₁]) √(Var[Y₂]))

as a function of ρ_X. Using (2.61),

E[Y₁Y₂] = E[g₁^{−1}(F_{X₁}(X₁)) g₂^{−1}(F_{X₂}(X₂))] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g₁^{−1}(F_{X₁}(u)) g₂^{−1}(F_{X₂}(v)) f_{X₁X₂}(u, v; ρ_X) du dv    (4.7)

This relation shows that ρ_Y is a continuous function of ρ_X and that the joint distribution function of X₁ and X₂ is needed. The main difficulty now lies in the computation of the integral appearing in E[Y₁Y₂]. For X₁ and X₂, Gaussian correlated random variables are most often chosen, because an exact analytic expression (4.4) exists for the joint distribution function f_{X₁X₂}(x₁, x₂; ρ_X).

4.3.1 Properties of ρ_Y as a function of ρ_X

From now on, we choose Gaussian correlated random variables for X₁ and X₂.

Theorem 4.3.1 The correlation coefficient ρ_Y is a differentiable and increasing function of ρ_X.
Proof: From the partial differential equation (4.5) for f_{X₁X₂}(u, v; ρ_X), it follows that

∂E[Y₁Y₂]/∂ρ_X = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g₁^{−1}(F_{X₁}(u)) g₂^{−1}(F_{X₂}(v)) ∂²f_{X₁X₂}(u, v; ρ_X)/(∂u ∂v) du dv

Partial integration with respect to u and v yields

∂E[Y₁Y₂]/∂ρ_X = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (d g₁^{−1}(F_{X₁}(u))/du)(d g₂^{−1}(F_{X₂}(v))/dv) f_{X₁X₂}(u, v; ρ_X) du dv

Applying the chain rule for differentiation and dg^{−1}(x)/dx = 1/g′(g^{−1}(x)) gives

d g^{−1}(F_X(u))/du = (dg^{−1}(x)/dx)|_{x=F_X(u)} · dF_X(u)/du = f_X(u)/g′(g^{−1}(F_X(u)))

Since g′(x) and f_X(u) are probability density functions and positive, ∂E[Y₁Y₂]/∂ρ_X > 0. Hence, we have shown that ρ_Y is a differentiable, increasing function of ρ_X.  □

Since ρ_X ∈ [−1, 1], ρ_Y increases from ρ_{Y,min} at ρ_X = −1 to ρ_{Y,max} corresponding to ρ_X = 1. In the sequel, we will derive expressions to compute the boundary cases ρ_X = −1 and ρ_X = 1.

Theorem 4.3.2 (Lancaster) For any two strictly increasing real functions T₁ and T₂ that transform the correlated Gaussian random variables X₁ and X₂ to the correlated random variables Y₁ = T₁(X₁) and Y₂ = T₂(X₂), it holds that

|ρ_Y| ≤ |ρ_X|

If two correlated random variables Y₁ and Y₂ can be obtained by separate transformations from a bi-variate normal distribution with correlation coefficient ρ_X, the correlation coefficient ρ_Y of the transformed random variables cannot exceed ρ_X in absolute value. The interest of the proof is that it uses powerful properties of orthogonal polynomials and that ρ_Y is expanded in a power series in ρ_X in (4.12).

Proof: The proof is based on the orthogonal Hermite polynomials H_n(x) (see e.g. Rainville (1960) and Abramowitz and Stegun (1968, Chapter 22)), defined by the generating function

exp(2xt − t²) = Σ_{n=0}^{∞} H_n(x) t^n/n!    (4.8)

After expanding exp(2xt − t²) in a Taylor series and equating corresponding powers of t, we find that

H_n(x) = n! Σ_{k=0}^{⌊n/2⌋} (−1)^k (2x)^{n−2k}/(k!(n−2k)!)    (4.9)

with H₀(x) = 1.
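As a side check, the explicit sum (4.9) can be verified against the classical three-term recurrence H_{n+1}(x) = 2x H_n(x) − 2n H_{n−1}(x); a small Python sketch:

```python
import math

def hermite_explicit(n, x):
    # formula (4.9): H_n(x) = n! * sum_k (-1)^k (2x)^(n-2k) / (k! (n-2k)!)
    return math.factorial(n) * sum(
        (-1) ** k * (2.0 * x) ** (n - 2 * k)
        / (math.factorial(k) * math.factorial(n - 2 * k))
        for k in range(n // 2 + 1)
    )

def hermite_recurrence(n, x):
    # three-term recurrence H_{n+1} = 2x H_n - 2n H_{n-1}, with H_0 = 1, H_1 = 2x
    h_prev, h_cur = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for m in range(1, n):
        h_prev, h_cur = h_cur, 2.0 * x * h_cur - 2.0 * m * h_prev
    return h_cur

for n in range(9):
    for x in (-1.3, 0.0, 0.5, 2.0):
        assert abs(hermite_explicit(n, x) - hermite_recurrence(n, x)) < 1e-6
print("explicit formula (4.9) agrees with the recurrence")
```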
The Hermite polynomials satisfy the orthogonality relations

∫_{−∞}^{∞} e^{−x²} H_n(x) H_m(x) dx = 0,  for m ≠ n

∫_{−∞}^{∞} e^{−x²} H_n²(x) dx = 2^n n! √π

These orthogonality relations enable us to expand functions in terms of Hermite polynomials (similar to Fourier analysis). If the expansion of a function f(x),

f(x) = Σ_{k=0}^{∞} a_k H_k(x)

converges for all x, then it follows from the orthogonality relations that

a_k = (1/(2^k k! √π)) ∫_{−∞}^{∞} e^{−x²} f(x) H_k(x) dx

The joint normalized Gaussian density function can be expanded (Rainville, 1960, pp. 197–198) in terms of Hermite polynomials,

exp(−(x² − 2ρxy + y²)/(1−ρ²)) / √(1−ρ²) = e^{−x²−y²} Σ_{n=0}^{∞} H_n(x) H_n(y) ρ^n/(2^n n!)    (4.10)

In order for the covariance Cov[Y₁, Y₂] to exist, both E[Y₁²] and E[Y₂²] must be finite. Since Y_j = T_j(X_j) for j = 1, 2, the mean is

E[Y_j] = ∫_{−∞}^{∞} T_j(x) f_{X_j}(x) dx = (1/(σ_X √(2π))) ∫_{−∞}^{∞} T_j(x) exp(−(x − μ_X)²/(2σ_X²)) dx = (1/√π) ∫_{−∞}^{∞} T_j(μ_X + √2 σ_X u) e^{−u²} du

Let

T_j(μ_X + √2 σ_X u) = Σ_{k=0}^{∞} a_{k;j} H_k(u)    (4.11)

with

a_{k;j} = (1/(2^k k! √π)) ∫_{−∞}^{∞} e^{−x²} T_j(μ_X + √2 σ_X x) H_k(x) dx

then, since H₀(x) = 1,

E[Y_j] = a_{0;j}

The second moment E[Y_j²] follows from (2.34) as

E[Y_j²] = ∫_{−∞}^{∞} T_j²(x) f_{X_j}(x) dx = (1/√π) ∫_{−∞}^{∞} T_j²(μ_X + √2 σ_X u) e^{−u²} du

Substituting (4.11) gives

E[Y_j²] = Σ_{k=0}^{∞} Σ_{m=0}^{∞} a_{m;j} a_{k;j} (1/√π) ∫_{−∞}^{∞} H_k(u) H_m(u) e^{−u²} du = Σ_{k=0}^{∞} a²_{k;j} 2^k k!

which is convergent. Similarly, using (4.4) and the substitution to normalized variables,

E[Y₁Y₂] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} T₁(u) T₂(v) f_{X₁X₂}(u, v; ρ_X) du dv = (1/(π √(1−ρ_X²))) ∫_{−∞}^{∞} ∫_{−∞}^{∞} T₁(μ_X + √2 σ_X x) T₂(μ_Y + √2 σ_Y y) exp(−(x² − 2ρ_X xy + y²)/(1−ρ_X²)) dx dy

= Σ_{k=0}^{∞} Σ_{m=0}^{∞} a_{k;1} a_{m;2} (1/(π √(1−ρ_X²))) ∫_{−∞}^{∞} ∫_{−∞}^{∞} H_k(x) H_m(y) exp(−(x² − 2ρ_X xy + y²)/(1−ρ_X²)) dx dy

Using (4.10),

E[Y₁Y₂] = Σ_{k=0}^{∞} Σ_{m=0}^{∞} a_{k;1} a_{m;2} (1/π) Σ_{q=0}^{∞} (ρ_X^q/(2^q q!)) ∫_{−∞}^{∞} e^{−x²} H_k(x) H_q(x) dx ∫_{−∞}^{∞} e^{−y²} H_m(y) H_q(y) dy
Introducing the orthogonality relations for the Hermite polynomials leads to

E[Y₁Y₂] = Σ_{q=0}^{∞} a_{q;1} a_{q;2} 2^q q! ρ_X^q

The correlation coefficient becomes

ρ_Y = (Σ_{q=0}^{∞} a_{q;1} a_{q;2} 2^q q! ρ_X^q − a_{0;1} a_{0;2}) / (√(Σ_{k=0}^{∞} a²_{k;1} 2^k k! − a²_{0;1}) √(Σ_{k=0}^{∞} a²_{k;2} 2^k k! − a²_{0;2})) = Σ_{q=1}^{∞} a_{q;1} a_{q;2} 2^q q! ρ_X^q / (√(Σ_{k=1}^{∞} a²_{k;1} 2^k k!) √(Σ_{k=1}^{∞} a²_{k;2} 2^k k!))

Denote α_q = a_{q;1} √(2^q q!) and β_q = a_{q;2} √(2^q q!); then Var[Y₁] = Σ_{k=1}^{∞} α_k² and Var[Y₂] = Σ_{k=1}^{∞} β_k². Since the linear correlation coefficient ρ(X, Y) equals the correlation coefficient of the corresponding normalized random variables with mean zero and variance 1, as shown in Section 2.5.3, we may choose Σ_{k=1}^{∞} α_k² = Σ_{k=1}^{∞} β_k² = 1, such that

ρ_Y = Σ_{q=1}^{∞} α_q β_q ρ_X^q    (4.12)

If α₁² = 1 and β₁² = 1, then |ρ_Y| = |ρ_X|, because all other α_k and β_k must then vanish. In all other cases, either α₁² < 1 or β₁² < 1 or both, such that

ρ_Y = α₁β₁ρ_X + Σ_{q=2}^{∞} α_q β_q ρ_X^q

and

|Σ_{q=2}^{∞} α_q β_q ρ_X^q| ≤ Σ_{q=2}^{∞} |α_q β_q| |ρ_X|^q ≤ √(Σ_{q=2}^{∞} α_q² |ρ_X|^q) √(Σ_{q=2}^{∞} β_q² |ρ_X|^q)

where we have used the Cauchy–Schwarz inequality Σ ab ≤ √(Σ a²) √(Σ b²) (see Section 5.5). By partial summation,

Σ_{q=2}^{∞} α_q² |ρ_X|^q = (1 − |ρ_X|) Σ_{q=2}^{∞} |ρ_X|^q Σ_{k=2}^{q} α_k² ≤ (1 − |ρ_X|)(1 − α₁²) Σ_{q=2}^{∞} |ρ_X|^q = (1 − α₁²) |ρ_X|²

because Σ_{k=2}^{q} α_k² < Σ_{k=2}^{∞} α_k² = 1 − α₁². Thus,

|ρ_Y| ≤ |ρ_X| |α₁β₁| + √(1 − α₁²) √(1 − β₁²) |ρ_X|

Finally, for α₁² ≤ 1 and β₁² ≤ 1, the inequality |α₁β₁| + √(1 − α₁²) √(1 − β₁²) ≤ 1 holds. This proves Lancaster's theorem, because |ρ_X| ≤ 1.  □

4.3.2 Boundary cases

Let us investigate some cases for special values of ρ_X.

1. ρ_X = 0. Since uncorrelated Gaussian random variables (ρ_X = 0) are independent, Y₁ = g₁^{−1}(F_{X₁}(X₁)) and Y₂ = g₂^{−1}(F_{X₂}(X₂)) are also independent, such that ρ_Y = 0. Hence, uncorrelated Gaussian random variables with ρ_X = 0 lead to uncorrelated random variables Y₁ and Y₂ with ρ_Y = 0.

2. ρ_X = 1.
Perfectly positively correlated Gaussian random variables X₁ = X₂ = X have the joint distribution

f_{X₁X₂}(u, v; 1) = (1/(σ_X √(2π))) exp(−(u − μ_X)²/(2σ_X²)) δ(u − v)

which follows from Pr[X₁ ≤ x₁, X₂ ≤ x₂] = Pr[X ≤ x₁, X ≤ x₂] = Pr[X ≤ x] with x = min(x₁, x₂). In that case,

E[Y₁Y₂] = ∫_{−∞}^{∞} g₁^{−1}(F_X(u)) g₂^{−1}(F_X(u)) dF_X(u) = ∫_0^1 g₁^{−1}(x) g₂^{−1}(x) dx    (4.13)

which may lead to ρ_{Y,max} < 1, depending on the specifics of g₁ and g₂. By transforming x = g₁(u), we obtain

E[Y₁Y₂] = ∫_{g₁^{−1}(0)}^{g₁^{−1}(1)} u g₂^{−1}(g₁(u)) g₁′(u) du

which shows that, if g₁ = g₂ = g,

E[Y₁Y₂] = ∫_{g^{−1}(0)}^{g^{−1}(1)} u² g′(u) du = E[Y²]

Hence, if Y₁ and Y₂ have the same distribution function g as Y, the case ρ_X = 1 leads to

ρ_Y = (E[Y²] − (E[Y])²)/Var[Y] = 1

3. ρ_X = −1. Perfectly negatively correlated Gaussian random variables X₁ = −X₂ = X have the joint distribution

f_{X₁X₂}(u, v; −1) = (1/(σ_X √(2π))) exp(−(u − μ_X)²/(2σ_X²)) δ(u + v)

which follows from the symmetry relations (4.6). In that case,

E[Y₁Y₂] = ∫_{−∞}^{∞} g₁^{−1}(F_X(u)) g₂^{−1}(1 − F_X(u)) dF_X(u) = ∫_0^1 g₁^{−1}(x) g₂^{−1}(1 − x) dx    (4.14)

which may lead to ρ_{Y,min} > −1, depending on the specifics of g₁ and g₂.

4.4 Examples of the non-linear transformation method

4.4.1 Correlated uniform random variables

Let us first focus on the relation between ρ_X and ρ_U.
Since E[U] = 1/2 and σ_U² = 1/12, the definition of the linear correlation coefficient (2.58) gives

ρ_U = (E[U₁U₂] − 1/4)/(1/12)

where, using (2.61),

E[U₁U₂] = E[F_{X₁}(X₁) F_{X₂}(X₂)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F_{X₁}(u) F_{X₂}(v) f_{X₁X₂}(u, v; ρ_X) du dv

In the case of Gaussian correlated random variables specified by (3.20) and (4.4), we must evaluate the integral

E[U₁U₂] = ∫_{−∞}^{∞} du ∫_{−∞}^{∞} dv f_{X₁X₂}(u, v; ρ_X) ∫_{−∞}^{u} (e^{−(t−μ₁)²/(2σ₁²)}/(σ₁√(2π))) dt ∫_{−∞}^{v} (e^{−(s−μ₂)²/(2σ₂²)}/(σ₂√(2π))) ds

Substituting successively the normalized variables u′ = (u−μ₁)/σ₁, t′ = (t−μ₁)/σ₁, v′ = (v−μ₂)/σ₂ and s′ = (s−μ₂)/σ₂, we obtain

E[U₁U₂] = (1/((2π)² √(1−ρ_X²))) ∫_{−∞}^{∞} du′ ∫_{−∞}^{∞} dv′ ∫_{−∞}^{u′} e^{−t′²/2} dt′ ∫_{−∞}^{v′} e^{−s′²/2} ds′ exp(−(u′² − 2ρ_X u′v′ + v′²)/(2(1−ρ_X²)))

We now use the partial differential equation (4.5),

∂/∂ρ_X [exp(−(u′² − 2ρ_X u′v′ + v′²)/(2(1−ρ_X²)))/√(1−ρ_X²)] = ∂²/∂u′∂v′ [exp(−(u′² − 2ρ_X u′v′ + v′²)/(2(1−ρ_X²)))/√(1−ρ_X²)]

Partial integration in the v′ integral and, similarly, in the u′ integral transfers the derivatives onto the inner Gaussian integrals, whose derivatives are e^{−u′²/2} and e^{−v′²/2}, so that

∂E[U₁U₂]/∂ρ_X = (1/((2π)² √(1−ρ_X²))) ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp(−((2−ρ_X²)(u² + v²) − 2ρ_X uv)/(2(1−ρ_X²))) du dv

Completing the square and evaluating the two remaining Gaussian integrals yields

∂E[U₁U₂]/∂ρ_X = (1/((2π)² √(1−ρ_X²))) · 2π √((1−ρ_X²)/(4−ρ_X²)) = 1/(2π √(4−ρ_X²))

Thus, we find that

∂ρ_U/∂ρ_X = (6/π) · 1/√(4−ρ_X²)

or that

ρ_U = (6/π) ∫ dρ_X/√(4−ρ_X²) + c = (6/π) arcsin(ρ_X/2) + c

It remains to determine the constant c.
We have shown in Section 4.3.2 that random variables generated from uncorrelated Gaussian random variables are also uncorrelated, implying that ρ_U = 0 if ρ_X = 0 and, hence, that the constant c = 0. This finally results in

ρ_U = (6/π) arcsin(ρ_X/2)    (4.15)

In summary, two uniform correlated random variables U₁ and U₂ with correlation coefficient ρ_U are found by transforming two Gaussian correlated random variables X₁ and X₂ with correlation coefficient ρ_X = 2 sin(πρ_U/6). Equation (4.15) further shows that ρ_U = ±1 if ρ_X = ±1, which indicates that the whole range of the correlation coefficient ρ_U is possible.

4.4.2 Correlated exponential random variables

In Section 3.2.1, we have seen that, if U is a uniform random variable on [0,1], g^{−1}(U) = −(1/λ) log U is an exponential random variable with mean 1/λ. The correlation coefficient for two exponential random variables Y₁ and Y₂, with means 1/λ₁ and 1/λ₂ respectively, is

ρ_Y = (E[Y₁Y₂] − 1/(λ₁λ₂))/(1/(λ₁λ₂)) = E[λ₁λ₂ Y₁Y₂] − 1

As above, we generate Y₁ = −(1/λ₁) log F_{X₁}(X₁) and Y₂ = −(1/λ₂) log F_{X₂}(X₂), where X₁ and X₂ are correlated Gaussian random variables with correlation coefficient ρ_X. Then,

E[λ₁λ₂ Y₁Y₂] = E[log F_{X₁}(X₁) log F_{X₂}(X₂)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} log F_{X₁}(u) log F_{X₂}(v) f_{X₁X₂}(u, v; ρ_X) du dv

In the general case ρ_X ≠ 0, the previous method can be followed, which yields, after substitution of normalized variables,

E[λ₁λ₂ Y₁Y₂] = (1/((2π) √(1−ρ_X²))) ∫_{−∞}^{∞} ∫_{−∞}^{∞} log((1/√(2π)) ∫_{−∞}^{u} e^{−t²/2} dt) log((1/√(2π)) ∫_{−∞}^{v} e^{−s²/2} ds) exp(−(u² − 2ρ_X uv + v²)/(2(1−ρ_X²))) du dv

Unfortunately, we cannot evaluate this integral analytically. Let us compute the upper bound ρ_{Y,max} from (4.13) with g₁^{−1}(x) = −(1/λ₁) log x and g₂^{−1}(x) = −(1/λ₂) log x,

E[λ₁λ₂ Y₁Y₂; ρ_X = 1] = ∫_0^1 log² x dx = 2

and thus ρ_{Y,max} = 1. The lower boundary ρ_{Y,min} follows from (4.14) as¹

E[λ₁λ₂ Y₁Y₂; ρ_X = −1] = ∫_0^1 log x log(1−x) dx = 2 − π²/6

Here, we find ρ_{Y,min} = 1 − π²/6 = −0.644934...
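Both (4.15) and the exponential bounds can be illustrated by simulation: generate a correlated Gaussian pair, map it through Φ to correlated uniforms, and through −log to correlated exponentials. A Python sketch (standard library only; the sample size and ρ_X = 0.6 are arbitrary choices):

```python
import math, random

def phi(x):
    # standard normal distribution function via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def corr(a, b):
    # empirical linear correlation coefficient
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / m
    va = sum((x - ma) ** 2 for x in a) / m
    vb = sum((y - mb) ** 2 for y in b) / m
    return cov / math.sqrt(va * vb)

rng = random.Random(2006)
rho_x, n = 0.6, 100_000
u1, u2 = [], []
for _ in range(n):
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x2 = rho_x * g1 + math.sqrt(1.0 - rho_x ** 2) * g2  # correlated N(0,1) pair
    u1.append(phi(g1))                                  # U = F_X(X) is uniform
    u2.append(phi(x2))

rho_u = corr(u1, u2)
print(rho_u, 6.0 / math.pi * math.asin(rho_x / 2.0))    # empirical vs. (4.15)

y1 = [-math.log(u) for u in u1]                         # correlated exponentials
y2 = [-math.log(u) for u in u2]
print(corr(y1, y2))                                     # lies inside [1 - pi^2/6, 1]
```

The empirical ρ_U matches (4.15) within sampling error, and the exponential correlation indeed stays inside the admissible interval.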
In summary, exponential correlated random variables can be generated from Gaussian correlated random variables, but the correlation coefficient ρ_Y is limited to the interval [1 − π²/6, 1]. As explained in the introduction of Section 4.2, the exponential random variables are positive with infinite range, for which not all negative correlations are possible. The analysis demonstrates that it is not possible to construct two exponential random variables with correlation coefficient smaller than ρ_{Y,min} = 1 − π²/6 ≈ −0.645.

¹ Substituting the Taylor expansion log(1−x) = −Σ_{k=1}^{∞} x^k/k gives

∫_0^1 log x log(1−x) dx = −Σ_{k=1}^{∞} (1/k) ∫_0^1 x^k log x dx

and, since ∫_0^1 x^k log x dx = −∫_0^{∞} e^{−(k+1)u} u du = −1/(k+1)²,

∫_0^1 log x log(1−x) dx = Σ_{k=1}^{∞} 1/(k(k+1)²)

Since 1/(k(k+1)²) = 1/k − 1/(k+1) − 1/(k+1)²,

∫_0^1 log x log(1−x) dx = Σ_{k=1}^{∞} 1/k − Σ_{k=2}^{∞} 1/k − Σ_{k=2}^{∞} 1/k² = 1 − Σ_{k=2}^{∞} 1/k² = 2 − Σ_{k=1}^{∞} 1/k² = 2 − ζ(2) = 2 − π²/6

4.4.3 Correlated lognormal random variables

Two correlated lognormal random variables Y₁ and Y₂, with distribution specified in (3.43), can be constructed directly from two correlated Gaussian random variables X₁ and X₂. In particular, let Y₁ = e^{a₁X₁} and Y₂ = e^{a₂X₂}. The explicit scaling parameters can be used to set the desired mean. From (4.7),

E[Y₁Y₂] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{a₁u} e^{a₂v} f_{X₁X₂}(u, v; ρ_X) du dv = exp(a₁μ₁ + a₂μ₂ + a₁²σ₁²/2 + a₁a₂ρ_Xσ₁σ₂ + a₂²σ₂²/2)

where the Laplace transform (4.2) for n = 2 has been used. Invoking (3.45) and (3.46) with μ_j → a_jμ_j and σ_j² → a_j²σ_j², the correlation coefficient ρ_Y is

ρ_Y = (e^{a₁σ₁a₂σ₂ρ_X} − 1) / √((e^{a₁²σ₁²} − 1)(e^{a₂²σ₂²} − 1))    (4.16)

If at least one (but not all) of the quantities σ₁, σ₂, a₁ or a₂ grows large, ρ_Y tends to zero, irrespective of ρ_X. Thus, even if X₁ and X₂ and, hence, also Y₁ and Y₂ have the strongest kind of dependence possible, i.e. ρ_X = ±1, the correlation coefficient ρ_Y can be made arbitrarily small.
In case $a_1\sigma_1 = a_2\sigma_2 = \sigma$, (4.16) reduces to

$$\rho_Y = \frac{e^{\sigma^2\rho_X} - 1}{e^{\sigma^2} - 1}$$

We observe that $\rho_{Y;\max} = 1$, while $\rho_{Y;\min} = -e^{-\sigma^2} > -1$ for $\sigma > 0$; again a manifestation that, for positive random variables with infinite range, not all negative correlations are possible.

4.5 Linear combination of independent auxiliary random variables

In spite of the generality of the non-linear transformation method, the computational difficulty involved suggests that we investigate simpler methods of construction. It is instructive to consider two independent random variables $V$ and $W$ with known probability generating functions $\varphi_V(z)$ and $\varphi_W(z)$, respectively. In the discussion of the uniform random variable in Section 3.2.1, it was shown how to generate by computer an arbitrary random variable from a uniform random variable. We thus assume that $V$ and $W$ can be constructed. Let us now write $X$ and $Y$ as a linear combination of $V$ and $W$,

$$X = a_{11}V + a_{12}W + b_1$$
$$Y = a_{21}V + a_{22}W + b_2$$

which is specified by the matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

and compute the covariance defined in (2.56),

$$\mathrm{Cov}[X,Y] = E[XY] - \mu_X\mu_Y = a_{11}a_{21}\left(E\left[V^2\right] - (E[V])^2\right) + (a_{11}a_{22} + a_{12}a_{21})\left(E[VW] - E[V]E[W]\right) + a_{12}a_{22}\left(E\left[W^2\right] - (E[W])^2\right)$$

Since $V$ and $W$ are independent, $E[VW] = E[V]E[W]$, and, with the definition of the variance (2.16) and denoting $\sigma_V^2 = \mathrm{Var}[V]$ and similarly for $W$, we obtain

$$\mathrm{Cov}[X,Y] = a_{11}a_{21}\sigma_V^2 + a_{12}a_{22}\sigma_W^2$$

In the same way, we find

$$\sigma_X^2 = a_{11}^2\sigma_V^2 + a_{12}^2\sigma_W^2 \qquad\qquad \sigma_Y^2 = a_{21}^2\sigma_V^2 + a_{22}^2\sigma_W^2 \tag{4.17}$$

such that the correlation coefficient, in general, becomes

$$\rho = \frac{a_{11}a_{21}\sigma_V^2 + a_{12}a_{22}\sigma_W^2}{\sqrt{a_{11}^2\sigma_V^2 + a_{12}^2\sigma_W^2}\,\sqrt{a_{21}^2\sigma_V^2 + a_{22}^2\sigma_W^2}}$$

which is independent of the constants $b_1$ and $b_2$, since for a centered moment $E\left[(X - E[X])^2\right] = E\left[(X + b - E[X + b])^2\right]$. In order to achieve our goal of constructing two correlated random variables $X$ and $Y$, we can choose the coefficients of the matrix $A$ to obtain an expression as simple as possible.
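The covariance algebra above holds for any independent pair $V$, $W$. As a quick check (our sketch, assuming NumPy; the matrix entries are an arbitrary choice), take $V$ uniform and $W$ exponential and compare the empirical correlation with the formula:

```python
import math
import numpy as np

rng = np.random.default_rng(5)
n = 500_000
V = rng.uniform(0, 1, n)          # sigma_V^2 = 1/12
W = rng.exponential(1.0, n)       # sigma_W^2 = 1
sV2, sW2 = 1 / 12, 1.0

a11, a12, b1 = 2.0, -1.0, 3.0     # arbitrary coefficients (our choice)
a21, a22, b2 = 0.5, 1.5, -2.0

X = a11 * V + a12 * W + b1
Y = a21 * V + a22 * W + b2

rho_formula = (a11 * a21 * sV2 + a12 * a22 * sW2) / (
    math.sqrt(a11**2 * sV2 + a12**2 * sW2)
    * math.sqrt(a21**2 * sV2 + a22**2 * sW2))
rho_emp = float(np.corrcoef(X, Y)[0, 1])
```

Changing $b_1$ or $b_2$ leaves `rho_emp` untouched, as the centered-moment argument predicts.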
If we choose $X = V$, or $a_{11} = 1$, $a_{12} = b_1 = 0$, the correlation coefficient reduces to

$$\rho = \frac{a_{21}\sigma_X}{\sqrt{a_{21}^2\sigma_X^2 + a_{22}^2\sigma_W^2}} = \frac{1}{\sqrt{1 + \frac{a_{22}^2\sigma_W^2}{a_{21}^2\sigma_X^2}}}$$

By rewriting this relation, we obtain

$$a_{21} = \pm\frac{\rho}{\sqrt{1-\rho^2}}\,\frac{a_{22}\sigma_W}{\sigma_X}$$

If we choose $a_{22} = \sqrt{1-\rho^2}$, the random variables $X$ and $Y$ are specified as

$$X = V$$
$$Y = \pm\rho\,\frac{\sigma_W}{\sigma_X}\,V + \sqrt{1-\rho^2}\,W + b_2$$

and the corresponding variances (4.17) are $\sigma_X^2 = \sigma_V^2$ and $\sigma_Y^2 = \sigma_W^2$. Finally, we require that $E[W] = \mu_W = 0$, which specifies

$$b_2 = E[Y] - \rho\,\frac{\sigma_Y}{\sigma_X}\,E[X]$$

If $W$ is a zero-mean random variable with standard deviation $\sigma_W = \sigma_Y$, the random variables $X$ and $Y$ are correlated with correlation coefficient $\rho$,

$$Y = \pm\rho\,\frac{\sigma_Y}{\sigma_X}\,X + \sqrt{1-\rho^2}\,W + \mu_Y - \rho\,\frac{\sigma_Y}{\sigma_X}\,\mu_X \tag{4.18}$$

In the sequel, we take the positive sign for $\rho$. Let us now investigate what happens with the distribution functions of $X$ and $Y$. Using the pgfs for continuous random variables $\varphi_X(z) = E\left[e^{-zX}\right]$ and $\varphi_Y(z) = E\left[e^{-zY}\right]$, the last relation (4.18) becomes

$$E\left[e^{-zY}\right] = e^{-z\left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\,E\left[e^{-z\rho\frac{\sigma_Y}{\sigma_X}X}\right]E\left[e^{-z\sqrt{1-\rho^2}\,W}\right]$$

because $V = X$ and $W$ are independent, or

$$\varphi_Y(z) = e^{-z\left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\,\varphi_X\!\left(\rho\,\frac{\sigma_Y}{\sigma_X}\,z\right)\varphi_W\!\left(\sqrt{1-\rho^2}\,z\right) \tag{4.19}$$

In order to produce two random variables $X$ and $Y$ that are correlated with correlation coefficient $\rho$, the pgf of the zero-mean random variable $W$ with variance $\sigma_Y^2$ must obey

$$\varphi_W(z) = \frac{e^{\left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)\frac{z}{\sqrt{1-\rho^2}}}\,\varphi_Y\!\left(\frac{z}{\sqrt{1-\rho^2}}\right)}{\varphi_X\!\left(\frac{\rho\,\sigma_Y\,z}{\sigma_X\sqrt{1-\rho^2}}\right)} \tag{4.20}$$

which can be written in terms of the translated random variables $Y' = Y - \mu_Y$ and $X' = X - \mu_X$,

$$\varphi_W(z) = \frac{\varphi_{Y'}\!\left(\frac{z}{\sqrt{1-\rho^2}}\right)}{\varphi_{X'}\!\left(\frac{\rho\,\sigma_Y\,z}{\sigma_X\sqrt{1-\rho^2}}\right)}$$

This form shows that, if $X'$ and $Y'$ have a same distribution, $W$ possesses, in general, a different distribution.
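Construction (4.18) is simple to exercise numerically. In the sketch below (ours, assuming NumPy; the means, deviations and $\rho$ are arbitrary choices), $W$ is taken Gaussian with zero mean and $\sigma_W = \sigma_Y$, and the sample mean, standard deviation and correlation of the resulting $Y$ are compared with the targets:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.7
mu_X, sig_X = 2.0, 1.5            # illustrative parameters (our choice)
mu_Y, sig_Y = -1.0, 0.5

n = 300_000
X = mu_X + sig_X * rng.standard_normal(n)
W = sig_Y * rng.standard_normal(n)          # zero mean, sigma_W = sigma_Y

# Construction (4.18), positive sign
Y = (rho * (sig_Y / sig_X) * X
     + np.sqrt(1 - rho**2) * W
     + mu_Y - rho * (sig_Y / sig_X) * mu_X)

emp_mean = float(Y.mean())
emp_std = float(Y.std())
emp_rho = float(np.corrcoef(X, Y)[0, 1])
```

The sample statistics reproduce $\mu_Y$, $\sigma_Y$ and $\rho$ to within Monte Carlo error, as the variance computation (4.17) predicts.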
Only the pgf of a Gaussian (with zero mean) obeys the functional equation

$$f(z) = \frac{f\!\left(\frac{z}{\sqrt{1-\rho^2}}\right)}{f\!\left(\frac{\rho z}{\sqrt{1-\rho^2}}\right)}$$

The joint probability generating function follows from (2.61) as

$$\varphi_{XY}(z_1,z_2) = E\left[e^{-z_1X - z_2Y}\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-z_1x - z_2y}\,f_{XY}(x,y)\,dx\,dy$$

and the inverse is

$$f_{XY}(x,y) = \frac{1}{(2\pi i)^2}\int_{c_1-i\infty}^{c_1+i\infty}\int_{c_2-i\infty}^{c_2+i\infty}e^{z_1x + z_2y}\,\varphi_{XY}(z_1,z_2)\,dz_1\,dz_2 \tag{4.21}$$

Using (4.18), we have

$$\varphi_{XY}(z_1,z_2) = E\left[e^{-\left(z_1 + z_2\rho\frac{\sigma_Y}{\sigma_X}\right)X}\right]e^{-z_2\left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}E\left[e^{-z_2\sqrt{1-\rho^2}\,W}\right]$$
$$= e^{-z_2\left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\,\varphi_X\!\left(z_1 + \rho\,\frac{\sigma_Y}{\sigma_X}\,z_2\right)\varphi_W\!\left(\sqrt{1-\rho^2}\,z_2\right) \tag{4.22}$$

Introduced into the complex double integral (4.21), the joint probability density function of the two correlated random variables can be computed.

The main deficiency of the linear combination method is the implicit assumption that any joint distribution function $f_{XY}(x,y)$ can be constructed from the two independent random variables $V$ and $W$. The corresponding joint pgf (4.22) possesses a product form that cannot always be made compatible with the form of an arbitrary pgf $\varphi_{XY}(z_1,z_2)$. The examples below illustrate this deficiency.

4.5.1 Correlated Gaussian random variables

If $X$ and $Y$ are Gaussian random variables with Laplace transform $E\left[e^{-zY}\right]$ given in (3.22), the expression (4.20) for $\varphi_W(z)$ becomes

$$\varphi_W(z) = \exp\left[\frac{\sigma_Y^2}{2}\,z^2\right]$$

which shows that $W$ is also a Gaussian random variable, with mean $\mu_W = 0$ and standard deviation $\sigma_W = \sigma_Y$. Further, the joint pgf follows from (4.22) as

$$E\left[e^{-z_1X - z_2Y}\right] = \exp\left[-\mu_X z_1 - \mu_Y z_2 + \frac{\sigma_X^2}{2}z_1^2 + \rho\,\sigma_X\sigma_Y\,z_1z_2 + \frac{\sigma_Y^2}{2}z_2^2\right]$$

Since

$$\frac{\sigma_X^2}{2}z_1^2 + \rho\,\sigma_X\sigma_Y\,z_1z_2 + \frac{\sigma_Y^2}{2}z_2^2 = \frac{1}{2}\begin{bmatrix}z_1 & z_2\end{bmatrix}\begin{bmatrix}\sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2\end{bmatrix}\begin{bmatrix}z_1 \\ z_2\end{bmatrix}$$

formula (4.2) indicates that $E\left[e^{-z_1X - z_2Y}\right]$ is the two-dimensional pgf of a joint Gaussian with pdf (4.4). The linear combination method thus provides the exact result for correlated Gaussian random variables.

4.5.2 Correlated exponential random variables

Let $X$ and $Y$ be two correlated, exponential random variables with rates $\alpha_x$ and $\alpha_y$.
Recall that $E[X] = \mu_X = \frac{1}{\alpha_x}$. Using the Laplace transform (3.16) in (4.20), we obtain

$$\varphi_W(z) = e^{\frac{(1-\rho)\mu_Y z}{\sqrt{1-\rho^2}}}\,\frac{1 + \frac{\rho\,\mu_Y z}{\sqrt{1-\rho^2}}}{1 + \frac{\mu_Y z}{\sqrt{1-\rho^2}}}$$

The corresponding probability distribution function follows from (2.38) as

$$F_W(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{1 + \frac{\rho\,\mu_Y z}{\sqrt{1-\rho^2}}}{1 + \frac{\mu_Y z}{\sqrt{1-\rho^2}}}\,e^{z\left(t + \frac{(1-\rho)\mu_Y}{\sqrt{1-\rho^2}}\right)}\,\frac{dz}{z} \qquad c > 0$$

Define the normalized time $T = \frac{\mu_Y}{\sqrt{1-\rho^2}}$; then

$$F_W(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{1 + \rho Tz}{1 + Tz}\,e^{z(t + (1-\rho)T)}\,\frac{dz}{z}$$

Since $t + (1-\rho)T > 0$, the contour can be closed over the negative $\mathrm{Re}(z)$ plane, encircling the poles at $z = -\frac{1}{T}$ and $z = 0$. By Cauchy's residue theorem,

$$F_W(t) = \lim_{z\to 0}\frac{(1+\rho Tz)}{(1+zT)}\,e^{z(t+(1-\rho)T)} + \lim_{z\to-\frac{1}{T}}\frac{(1+\rho Tz)\left(z+\frac{1}{T}\right)}{(1+zT)\,z}\,e^{z(t+(1-\rho)T)} = 1 - (1-\rho)\,e^{-(1-\rho)}\,e^{-\frac{t}{T}}$$

Hence, for the generation of two correlated exponential random variables, the auxiliary random variable $W$ has an exponential distribution with an atom of size $(1-\rho)e^{-(1-\rho)}$ at $t = 0$, which is fortunately easy to generate with a computer. It appears that only for $\rho \geq 0$ does the linear combination method lead to correct results for exponential random variables. Moreover, the method does not give an indication of the range of validity in $\rho$. We have shown above that $\rho \in \left[1 - \frac{\pi^2}{6},\, 1\right]$. While the linear combination method applied to generate two exponential random variables still correctly treats a range of $\rho$, the application to correlated uniform random variables leads to bizarre results and definitely shows the deficiency of the method. The difficulties already encountered in this chapter in generating $n = 2$ correlated random variables with arbitrary distributions suggest that the case $n > 2$ must be even more intractable.

4.6 Problem

(i) Show in two dimensions that (4.3), or in explicit form (4.4), is indeed the joint pdf corresponding to (4.2).

5 Inequalities

Hardy et al. (1999) view the best-known inequalities from various angles, provide several different proofs and relate the nature of these inequalities.
For example, starting from the most basic inequality between the geometric and arithmetic mean¹,

$$\min(x,y) \leq \sqrt{xy} \leq \frac{x+y}{2} \leq \max(x,y) \tag{5.1}$$

which directly follows from $\left(\sqrt{x} - \sqrt{y}\right)^2 \geq 0$, they masterly extend this relation to the theorem of the arithmetic and geometric mean in several real variables $x_k$,

$$\prod_{k=1}^{n}x_k^{q_k} \leq \sum_{k=1}^{n}q_k x_k \tag{5.2}$$

where $\sum_{k=1}^{n}q_k = 1$. They further move to the inequalities of Cauchy–Schwarz, of Hölder, of Minkowski and many more. Only a few inequalities are reviewed here and we recommend the classic treatise on inequalities by Hardy, Littlewood and Pólya for those who search for more depth, elegance and insight.

5.1 The minimum (maximum) and infimum (supremum)

Since these concepts will be frequently used, we explain the difference by concentrating on the minimum and infimum (the maximum and the supremum follow analogously). Let $S$ be a non-empty subset of $\mathbb{R}$. The subset $S$ is said to be bounded from below by $M$ if there exists a number $M$ such that, for all $x \in S$, holds that $x \geq M$. The largest lower bound (largest number $M$) is called the infimum and is denoted by $\inf(S)$. Further, if there exists an element $m \in S$ such that $m \leq x$ for all $x \in S$, then this element $m$ is called the minimum and is denoted by $\min(S)$. If the minimum $\min(S)$ exists, then $\min(S) = \inf(S)$. However, the minimum does not always exist. The classical example is the open interval $(a,b)$, where $\inf((a,b)) = a$, but the minimum does not exist because $a \notin (a,b)$. On the other hand, for the closed interval $[a,b]$, we have that $\inf([a,b]) = \min([a,b]) = a$.

¹ The arithmetic-geometric mean $M(x,y)$ is the limit for $n \to \infty$ of the recursion $x_n = \frac{1}{2}(x_{n-1} + y_{n-1})$, which is an arithmetic mean, and $y_n = \sqrt{x_{n-1}y_{n-1}}$, which is a geometric mean, with initial values $x_0 = x$ and $y_0 = y$. Gauss's famous discovery of intriguing properties of $M(x,y)$ (which led, e.g., to very fast converging series for computing $\pi$) is narrated in a paper by Almkvist and Berndt (1988).
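The chain (5.1) and the footnote's AGM recursion can be exercised in a few lines. The sketch below (our illustration; the Gauss–Legendre iteration is one concrete instance of the fast $\pi$ computations the footnote alludes to, not the specific series of Almkvist and Berndt) iterates the arithmetic and geometric means:

```python
import math

# Footnote 1: the arithmetic-geometric mean M(x, y)
def agm(x, y, n_iter=12):
    for _ in range(n_iter):
        x, y = (x + y) / 2, math.sqrt(x * y)
    return x

x, y = 2.0, 18.0
gm, am = math.sqrt(x * y), (x + y) / 2    # chain (5.1): 2 <= 6 <= 10 <= 18
m_xy = agm(x, y)                          # M(x, y) lies between gm and am

# Gauss-Legendre iteration: the AGM of 1 and 1/sqrt(2) yields pi with
# quadratic convergence (an example of the fast computations of pi).
a, b, t, p = 1.0, 1 / math.sqrt(2), 0.25, 1.0
for _ in range(5):
    an = (a + b) / 2
    b, t, p = math.sqrt(a * b), t - p * (a - an) ** 2, 2 * p
    a = an
pi_approx = (a + b) ** 2 / (4 * t)
```

Five iterations already reproduce $\pi$ to full double precision, illustrating the quadratic convergence of the AGM recursion.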
This example also illustrates that every finite non-empty subset of $\mathbb{R}$ has a minimum.

5.2 Continuous convex functions

A continuous function $f(x)$ that satisfies, for $u$ and $v$ belonging to an interval $I$,

$$f\!\left(\frac{u+v}{2}\right) \leq \frac{f(u) + f(v)}{2}$$

is called convex in that interval $I$. If $f$ is convex, $-f$ is concave. Hardy et al. (1999, Section 3.6) demonstrate that this condition is fundamental, from which the more general condition²

$$f\!\left(\sum_{k=1}^{n}q_kx_k\right) \leq \sum_{k=1}^{n}q_kf(x_k) \tag{5.3}$$

where $\sum_{k=1}^{n}q_k = 1$, can be deduced. Moreover, they show that a convex function is either very regular or very irregular and that a convex function that is not "entirely irregular" is necessarily continuous. Current textbooks, in particular the book by Boyd and Vandenberghe (2004), usually start with the definition of convexity from (5.3) in case $n = 2$, where $q_1 = 1 - q_2 = q$ and $0 \leq q \leq 1$, as

$$f(qu + (1-q)v) \leq q f(u) + (1-q) f(v) \tag{5.4}$$

where $u$ and $v$ can be vectors in an $m$-dimensional space. Geometrically, with $m = 1$ as illustrated in Fig. 5.1, relation (5.4) shows that each point on the chord between $(u, f(u))$ and $(v, f(v))$ lies above the curve $f$ in the interval $I$.

[Fig. 5.1. The function $f$ is convex between $u$ and $v$.]

The more general form (5.3) asserts that the centre of gravity of any number of arbitrarily weighted points of the curve lies above or on the curve. Figure 5.1 illustrates that, for any convex function $f$ and points $a, a', b, b' \in [u,v]$ such that $a \leq a' \leq b'$ and $a < b \leq b'$, the chord $c_1$ over $(a,b)$ has a smaller slope than the chord $c_2$ over $(a',b')$ or, equivalently,

$$\frac{f(b) - f(a)}{b - a} \leq \frac{f(b') - f(a')}{b' - a'}$$

Suppose that $f(x)$ is twice differentiable in the interval $I$; then a necessary and sufficient condition for convexity is $f''(x) \geq 0$ for each $x \in I$.

² The convexity concept can be generalized (Hardy et al., 1999, Section 98) to several variables, in which case the condition (5.3) becomes

$$f\!\left(\sum_k q_kx_k,\ \sum_k q_ky_k\right) \leq \sum_k q_k f(x_k, y_k)$$
This theorem is proved in Hardy et al. (1999, pp. 76–77). Moreover, they prove that the equality in (5.3) can only occur if $f(x)$ is linear. Applied to probability, relation (5.3) with $q_k = \Pr[X = k]$ and $x_k = k$ is written with (2.12) as

$$f(E[X]) \leq E[f(X)] \tag{5.5}$$

and is known as Jensen's inequality. Jensen's inequality (5.5) also holds for continuous random variables. Indeed, if $f$ is differentiable and convex, then $f(x) - f(y) \geq f'(y)(x - y)$. Substitute $x$ by the random variable $X$ and $y = E[X]$; then

$$f(X) - f(E[X]) \geq f'(E[X])\,(X - E[X])$$

After applying the expectation operator to both sides, we obtain (5.5). An important application of Jensen's inequality is obtained for $f(x) = e^{-zx}$ with real $z$ as

$$e^{-zE[X]} \leq E\left[e^{-zX}\right] = \varphi_X(z)$$

Any probability generating function $\varphi_X(z)$ is, for real $z$, bounded from below by $e^{-zE[X]}$. A continuous analog of (5.3) with $f(x) = e^x$ (and similarly for $f(x) = \log x$),

$$\exp\left(\frac{1}{v-u}\int_u^v f(x)\,dx\right) \leq \frac{1}{v-u}\int_u^v e^{f(x)}\,dx$$

can be regarded as a generalization of the inequality between the arithmetic and geometric mean.

5.3 Inequalities deduced from the Mean Value Theorem

The mean value theorem (Whittaker and Watson, 1996, p. 65) states that, if $g(x)$ is continuous on $x \in [a,b]$, there exists a number $\xi \in [a,b]$ such that

$$\int_a^b g(u)\,du = (b-a)\,g(\xi)$$

or, alternatively, if $f(x)$ is differentiable on $[a,b]$, then

$$f(b) - f(a) = (b-a)\,f'(\xi) \tag{5.6}$$

The equivalence follows by putting $f(x) = \int_a^x g(u)\,du$. It is convenient to rewrite this relation for $0 \leq \theta \leq 1$ as

$$f(x+h) - f(x) = h\,f'(x + \theta h)$$

In this form, the mean value theorem is nothing else than the special case for $n = 1$ of Taylor's theorem (Whittaker and Watson, 1996, p. 96),

$$f(x+h) - f(x) = \sum_{k=1}^{n-1}\frac{f^{(k)}(x)}{k!}\,h^k + \frac{h^n}{n!}\,f^{(n)}(x + \theta h) \tag{5.7}$$

An important application of Taylor's Theorem (or of the mean value theorem) to the exponential function gives a list of inequalities.
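The pgf lower bound $e^{-zE[X]} \leq \varphi_X(z)$ is easy to observe on a concrete discrete distribution. In this sketch (ours; the uniform distribution on $\{0,\dots,9\}$ is an arbitrary example), the gap $\varphi_X(z) - e^{-zE[X]}$ is evaluated for several real $z$:

```python
import math

# X uniform on {0, 1, ..., 9}; f(x) = e^{-z x} is convex, so Jensen gives
#   e^{-z E[X]} <= E[e^{-z X}] = phi_X(z)   for every real z.
xs = range(10)
p = 1 / 10
EX = sum(p * x for x in xs)               # = 4.5

def pgf(z):
    return sum(p * math.exp(-z * x) for x in xs)

gaps = {z: pgf(z) - math.exp(-z * EX) for z in (-1.0, -0.3, 0.0, 0.5, 2.0)}
```

All gaps are non-negative, and the gap vanishes exactly at $z = 0$, where both sides equal 1; for larger $|z|$ the bound becomes loose, mirroring the convexity argument.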
First,

$$e^x = 1 + x + \frac{x^2}{2}\,e^{\theta x}$$

and, since $e^{\theta x} > 0$ for any finite $x$, we have, for any $x \neq 0$,

$$e^x > 1 + x \tag{5.8}$$

A direct generalization follows from Taylor's Theorem (5.7),

$$e^x = \sum_{k=0}^{n-1}\frac{x^k}{k!} + \frac{x^n}{n!}\,e^{\theta x}$$

such that, for $n = 2m$ and any $x$,

$$e^x > \sum_{k=0}^{2m-1}\frac{x^k}{k!}$$

and, for $n = 2m+1$,

$$e^x > \sum_{k=0}^{2m}\frac{x^k}{k!} \quad (x > 0) \qquad\qquad e^x < \sum_{k=0}^{2m}\frac{x^k}{k!} \quad (x < 0)$$

Second, estimates of the product $\prod_{k=0}^{n}(1 + a_kx)$, where $a_kx \neq 0$, are obtained from (5.8) as³

$$\prod_{k=0}^{n}(1 + a_kx) < \exp\left(\sum_{k=0}^{n}a_kx\right)$$

5.4 The Markov and Chebyshev inequalities

Consider first a non-negative random variable $X$. The expectation reads

$$E[X] = \int_0^{\infty}xf_X(x)\,dx = \int_0^{a}xf_X(x)\,dx + \int_a^{\infty}xf_X(x)\,dx \geq \int_a^{\infty}xf_X(x)\,dx \geq a\int_a^{\infty}f_X(x)\,dx = a\,\Pr[X \geq a]$$

Hence, we obtain the Markov inequality

$$\Pr[X \geq a] \leq \frac{E[X]}{a} \tag{5.9}$$

³ A tighter bound is obtained if all $a_k > 0$ (e.g. $a_k$ is a probability). The relation above indicates that $g(x) = \prod_{k=0}^{n}(1 + a_kx)$ is smaller than $f(x) = \exp\left(x\sum_{k=0}^{n}a_k\right)$ for any $x \neq 0$, and $g(0) = f(0) = 1$. Further, from $(1 + a_kx) < e^{a_kx}$ it can be verified that, for all Taylor coefficients $1 < k \leq n$, holds that $0 < g_k \leq f_k$ and $g_2 < f_2$, such that $g(x) = \sum_{k=0}^{n}g_kx^k < \sum_{k=0}^{n}f_kx^k < \sum_{k=0}^{\infty}f_kx^k$ for $x > 0$. Thus, for $x = 1$, we have $g(1) < f(1)$, or

$$\prod_{k=0}^{n}(1 + a_k) < \sum_{k=0}^{n}\frac{1}{k!}\left(\sum_{j=0}^{n}a_j\right)^k$$

Another proof of the Markov inequality follows after taking the expectation of the inequality $a\,1_{X\geq a} \leq X$ for $X \geq 0$. The restriction to non-negative random variables can be circumvented by considering the random variable $X = (Y - E[Y])^2$ and $a = t^2$ in (5.9),

$$\Pr\left[(Y - E[Y])^2 \geq t^2\right] \leq \frac{E\left[(Y - E[Y])^2\right]}{t^2} = \frac{\mathrm{Var}[Y]}{t^2}$$

From this, the Chebyshev inequality follows as

$$\Pr[|X - E[X]| \geq t] \leq \frac{\sigma^2}{t^2} \tag{5.10}$$

The Chebyshev inequality quantifies the spread of $X$ around the mean $E[X]$. The smaller $\sigma$, the more concentrated $X$ is around the mean. Further extensions of the Markov inequality use the equivalence between the events $\{X \geq a\} \Longleftrightarrow \{g(X) \geq g(a)\}$, where $g$ is a monotonically increasing function.
Hence, (5.9) becomes

$$\Pr[X \geq a] \leq \frac{E[g(X)]}{g(a)}$$

For example, if $g(x) = x^k$, then $\Pr[X \geq a] \leq \frac{E[X^k]}{a^k}$. An interesting application of this idea is based on the equivalence of the events $\{X \geq E[X] + t\} \Longleftrightarrow \{e^{uX} \geq e^{u(E[X]+t)}\}$, provided $u \geq 0$. For $u \geq 0$,

$$\Pr[X \geq E[X]+t] = \Pr\left[e^{uX} \geq e^{u(E[X]+t)}\right] \leq e^{-u(E[X]+t)}\,E\left[e^{uX}\right] \tag{5.11}$$

where in the last step Markov's inequality (5.9) has been used. If the generating function or Laplace transform $E\left[e^{uX}\right]$ is known, the sharpest bound is obtained by the minimizer $u^*$ in $u$ of the right-hand side, because (5.11) holds for any $u > 0$. In Section 5.7, we show that this minimizer $u^*$ obeying $\mathrm{Re}\,u > 0$ indeed exists for probability generating functions. The resulting inequality

$$\Pr[X \geq E[X]+t] \leq e^{-u^*(E[X]+t)}\,E\left[e^{u^*X}\right] \tag{5.12}$$

is called the Chernoff bound.

The Chernoff bound of the binomial distribution. Let $X$ denote a binomial random variable with probability generating function given by (3.2), such that $E\left[e^{uX}\right] = E[(e^u)^X] = (q + pe^u)^n$. Then, with $E[X] = np$,

$$e^{-u(E[X]+t)}\,E\left[e^{uX}\right] = e^{-u(np+t) + n\log(q + pe^u)}$$

Provided $\left.\frac{d^2}{du^2}\,e^{-u(E[X]+t)}E\left[e^{uX}\right]\right|_{u=u^*} > 0$, the minimum $u^*$ is the solution of

$$\frac{d}{du}\,e^{-u(E[X]+t)}E\left[e^{uX}\right] = 0$$

Explicitly,

$$\frac{d}{du}\,e^{-u(E[X]+t)}E\left[e^{uX}\right] = e^{-u(np+t) + n\log(q + pe^u)}\left(-(np+t) + \frac{npe^u}{q + pe^u}\right)$$

from which $u^*$ follows, using $q = 1 - p$, as

$$u^* = \log\left(\frac{npq + qt}{npq - pt}\right)$$

Hence,

$$e^{-u^*(E[X]+t)}\,E\left[e^{u^*X}\right] = \frac{\left(1 - \frac{t}{nq}\right)^{t - nq}}{\left(1 + \frac{t}{np}\right)^{t + np}}$$

For large $n$, but $p$ and $t$ fixed, we observe⁴ that $e^{-u^*(E[X]+t)}E\left[e^{u^*X}\right] = e^{-\frac{t^2}{2npq}}\left(1 + O\left(\frac{1}{n}\right)\right)$. Since $\mathrm{Var}[X] = npq$ and by denoting $y^2 = \frac{t^2}{\mathrm{Var}[X]}$, we find that, in the asymptotic regime for large $n$,

$$\Pr\left[\frac{|X - E[X]|}{\sqrt{\mathrm{Var}[X]}} \geq y\right] \lesssim e^{-\frac{y^2}{2}} \tag{5.13}$$

which is in agreement with the Central Limit Theorem 6.3.1. The corresponding Chebyshev inequality,

$$\Pr\left[\frac{|X - E[X]|}{\sqrt{\mathrm{Var}[X]}} \geq y\right] \leq \frac{1}{y^2}$$

is considerably less tight for the binomial distribution than the Chernoff bound (5.13). More advanced and sharper inequalities than that of Chebyshev are surveyed by Janson (2002).
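The gap between the exact binomial tail, the Chernoff bound and the Chebyshev bound is easy to see numerically. This sketch (ours; $n$, $p$, $t$ are illustrative values, with $t < nq$ so that $u^*$ is well defined) evaluates all three:

```python
import math

# Binomial(n, p): exact tail Pr[X >= np + t] vs Chernoff (5.12) and Chebyshev.
n, p = 100, 0.5
q = 1 - p
t = 15.0                                  # deviation from the mean np = 50

exact = sum(math.comb(n, k) * p**k * q ** (n - k)
            for k in range(math.ceil(n * p + t), n + 1))

# Minimizer u* = log((npq + qt)/(npq - pt)) derived in the text
u_star = math.log((n * p * q + q * t) / (n * p * q - p * t))
chernoff = math.exp(-u_star * (n * p + t)) * (q + p * math.exp(u_star)) ** n

chebyshev = n * p * q / t**2              # (5.10) applied to the same deviation
```

The Chernoff bound is decades tighter than Chebyshev here, yet still well above the exact tail, which matches the qualitative comparison in the text.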
⁴ Write

$$e^{-u^*(E[X]+t)}\,E\left[e^{u^*X}\right] = \exp\left[(t - nq)\log\left(1 - \frac{t}{nq}\right) - (t + np)\log\left(1 + \frac{t}{np}\right)\right]$$

and use the Taylor expansion of $\log(1 \pm x)$ around $x = 0$.

5.5 The Hölder, Minkowski and Young inequalities

The Hölder inequality is

$$\sum a^{\lambda}b^{\mu}\cdots \leq \left(\sum a\right)^{\lambda}\left(\sum b\right)^{\mu}\cdots \qquad \lambda + \mu + \cdots = 1$$

Let $a = X^p$ and $b = Y^q$, and further $p = \frac{1}{\lambda} > 1$ and $q = \frac{1}{\mu} > 1$ such that $\frac{1}{p} + \frac{1}{q} = 1$; then we obtain, as a frequently used application,

$$E[XY] \leq \left(E[X^p]\right)^{1/p}\left(E[Y^q]\right)^{1/q} \tag{5.14}$$

The Hölder inequality can be deduced from the basic convexity inequality (5.4). Since $-\log x$ is a convex function for real $x > 0$, the basic convexity inequality (5.4) is, with $0 \leq \lambda \leq 1$,

$$-\log(\lambda u + (1-\lambda)v) \leq -\lambda\log u - (1-\lambda)\log v$$

After exponentiation, we obtain for $u, v > 0$ a more general inequality than (5.1), which corresponds to $\lambda = \frac{1}{2}$,

$$u^{\lambda}v^{1-\lambda} \leq \lambda u + (1-\lambda)v$$

Substitute $u = \frac{|x_j|^p}{\sum_{m=1}^{n}|x_m|^p}$ and $v = \frac{|y_j|^q}{\sum_{m=1}^{n}|y_m|^q}$; then

$$\left(\frac{|x_j|^p}{\sum_{m=1}^{n}|x_m|^p}\right)^{\lambda}\left(\frac{|y_j|^q}{\sum_{m=1}^{n}|y_m|^q}\right)^{1-\lambda} \leq \lambda\,\frac{|x_j|^p}{\sum_{m=1}^{n}|x_m|^p} + (1-\lambda)\,\frac{|y_j|^q}{\sum_{m=1}^{n}|y_m|^q}$$

and summing over all $j$ yields

$$\sum_{j=1}^{n}|x_j|^{p\lambda}|y_j|^{q(1-\lambda)} \leq \left(\sum_{j=1}^{n}|x_j|^p\right)^{\lambda}\left(\sum_{j=1}^{n}|y_j|^q\right)^{1-\lambda} \tag{5.15}$$

By choosing $p = \frac{1}{\lambda}$ and $q = \frac{1}{1-\lambda}$, we arrive at the Hölder inequality with $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$,

$$\sum_{j=1}^{n}|x_jy_j| \leq \left(\sum_{j=1}^{n}|x_j|^p\right)^{1/p}\left(\sum_{j=1}^{n}|y_j|^q\right)^{1/q} \tag{5.16}$$

A specially important case of the Hölder inequality (5.14), for $p = q = 2$, is the Cauchy–Schwarz inequality,

$$(E[XY])^2 \leq E\left[X^2\right]E\left[Y^2\right] \tag{5.17}$$

It is of interest to mention that the Hölder inequality is of a general type in the following sense (Hardy et al., 1999, Theorem 101 (p. 82)). Suppose that $f(x)$ is convex (such that the inverse $g(x) = f^{-1}(x)$ is also convex) and that $f(0) = 0$. If $F(x) = \int_0^x f(u)\,du$ and $G(x) = \int_0^x g(u)\,du$, and if

$$\sum_{k=1}^{n}q_ka_kb_k \leq F^{-1}\!\left(\sum_{k=1}^{n}q_kF(a_k)\right)G^{-1}\!\left(\sum_{k=1}^{n}q_kG(b_k)\right)$$

with $\sum_{k=1}^{n}q_k = 1$ holds for all positive $a_k$ and $b_k$, then $f(x) = x^r$ and the above inequality is Hölder's inequality.
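The algebraic form (5.16) and its special case (5.17) can be spot-checked on random vectors. In this sketch (ours; the sample size and exponent are arbitrary), $p = 3$ with conjugate exponent $q = 3/2$:

```python
import math
import random

random.seed(3)
xs = [random.uniform(-2, 2) for _ in range(50)]
ys = [random.uniform(-2, 2) for _ in range(50)]

p = 3.0
q = p / (p - 1)                           # conjugate exponent: 1/p + 1/q = 1

holder_lhs = sum(abs(a * b) for a, b in zip(xs, ys))
holder_rhs = (sum(abs(a) ** p for a in xs) ** (1 / p)
              * sum(abs(b) ** q for b in ys) ** (1 / q))

# Cauchy-Schwarz (5.17) is the special case p = q = 2
cs_lhs = sum(a * b for a, b in zip(xs, ys)) ** 2
cs_rhs = sum(a * a for a in xs) * sum(b * b for b in ys)
```

Repeating with any other seed or exponent $p > 1$ leaves both inequalities intact, as the convexity derivation guarantees.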
The next inequalities are of a different type. For $p > 1$, the Minkowski inequality is

$$\left(E\left[|X+Y|^p\right]\right)^{1/p} \leq \left(E\left[|X|^p\right]\right)^{1/p} + \left(E\left[|Y|^p\right]\right)^{1/p} \tag{5.18}$$

or, written algebraically,

$$\left(\sum_{j=1}^{n}|x_j + y_j|^p\right)^{1/p} \leq \left(\sum_{j=1}^{n}|x_j|^p\right)^{1/p} + \left(\sum_{j=1}^{n}|y_j|^p\right)^{1/p}$$

Suppose that $f(x)$ is continuous and strictly increasing for $x \geq 0$ and $f(0) = 0$. Then the inverse function $g(x) = f^{-1}(x)$ satisfies the same conditions. The Young inequality states that, for $a \geq 0$ and $b \geq 0$, holds that

$$ab \leq \int_0^a f(u)\,du + \int_0^b g(u)\,du \tag{5.19}$$

with equality only if $b = f(a)$. The Young inequality follows by geometrical consideration: the first integral is the area under the curve $y = f(x)$ over $[0,a]$, while the second is the area under the curve $x = g(y) = f^{-1}(y)$ over $[0,b]$.

Applications of the Cauchy–Schwarz inequality

1. We will demonstrate that both the generating function $\varphi_X(z) = E\left[e^{-zX}\right]$ and its logarithm $L_X(z) = \log(\varphi_X(z))$ are convex functions of $z$. First, the second derivative is continuous and non-negative, because $\varphi_X''(z) = E\left[X^2e^{-zX}\right] \geq 0$. Further, since

$$L_X''(z) = \frac{\varphi_X(z)\varphi_X''(z) - \left(\varphi_X'(z)\right)^2}{\varphi_X^2(z)}$$

it remains to show that $\varphi_X(z)\varphi_X''(z) - \left(\varphi_X'(z)\right)^2 \geq 0$. From the Cauchy–Schwarz inequality (5.17) applied to $\varphi_X'(z) = -E\left[Xe^{-zX}\right]$ with $X \to e^{-\frac{z}{2}X}$ and $Y = Xe^{-\frac{z}{2}X}$, we obtain

$$\left(\varphi_X'(z)\right)^2 = \left(E\left[Xe^{-zX}\right]\right)^2 \leq E\left[e^{-zX}\right]E\left[X^2e^{-zX}\right] = \varphi_X(z)\varphi_X''(z)$$

Hence, $L_X''(z) \geq 0$.

2. Let $Y = 1_{X>0}$ in (5.17), while $X$ is a non-negative random variable; then, with (2.13),

$$(E[X])^2 \leq E\left[X^2\right]E\left[1_{X>0}\right] = E\left[X^2\right]\left(1 - \Pr[X = 0]\right)$$

such that an upper bound for $\Pr[X = 0]$ is obtained,

$$\Pr[X = 0] \leq 1 - \frac{(E[X])^2}{E[X^2]} \tag{5.20}$$

5.6 The Gauss inequality

In this section, we consider a continuous random variable $X$ with even probability density function, i.e. $f_X(-x) = f_X(x)$, which is not increasing for $x > 0$. Typical examples of such random variables are measurement errors due to statistical fluctuations. In his epoch-making paper, Gauss (1821) established the method of the least squares (see e.g.
Sections 2.2.1 and 2.5.3). In that same paper, Gauss (1821, pp. 10–11) also stated and proved Theorem 5.6.1, which is appealing because of its generality. We define the probability $m$ as

$$m = \Pr[-\theta\sigma \leq X \leq \theta\sigma] = \int_{-\theta\sigma}^{\theta\sigma}f_X(u)\,du \tag{5.21}$$

where $\sigma = \sqrt{\mathrm{Var}[X]}$ is the standard deviation.

Theorem 5.6.1 (Gauss) If $X$ is a continuous random variable with even probability density function, i.e. $f_X(-x) = f_X(x)$, which is not increasing for $x > 0$, then

if $m < \frac{2}{3}$, then $\theta \leq \sqrt{3}\,m$;
if $m = \frac{2}{3}$, then $\theta \leq \sqrt{\frac{4}{3}}$;
if $m > \frac{2}{3}$, then $\theta < \frac{2}{3\sqrt{1-m}}$;

and, conversely,

if $\theta \leq \sqrt{\frac{4}{3}}$, then $m \geq \frac{\theta}{\sqrt{3}}$;
if $\theta > \sqrt{\frac{4}{3}}$, then $m \geq 1 - \frac{4}{9\theta^2}$.

Given a bound on the probability $m$, Gauss's Theorem 5.6.1 bounds the extent of the error $X$ around its mean zero in units of the standard deviation $\sigma$ or, equivalently, it provides bounds for the normalized random variable $X^* = \frac{X - E[X]}{\sigma}$. The proof of this theorem only uses real function theory and is characteristic of the genius of Gauss.

Proof: Consider the inverse function $x = g(y)$ of the integral $y = F_X(x) - F_X(-x)$. An interesting general property of the inverse function is

$$\int_0^1 g^2(u)\,du = \int_{-\infty}^{\infty}x^2f_X(x)\,dx$$

which is verified by the substitution $x = g(u)$. Since $E[X] = 0$ and $\mathrm{Var}[X] = E\left[X^2\right]$, we have

$$\int_0^1 g^2(u)\,du = \sigma^2 = \mathrm{Var}[X] \tag{5.22}$$

Besides $g(0) = 0$, the derivative

$$g'(y) = \frac{1}{f_X(x) + f_X(-x)}$$

is increasing from $y = 0$ until $y = 1$, because $f_X(x)$ attains a maximum at $x = 0$ and is not increasing for $x > 0$. Hence, $g''(y) \geq 0$. From the differential

$$d\left(yg'(y)\right) = g'(y)\,dy + yg''(y)\,dy$$

we obtain by integration

$$yg'(y) - g(y) = \int_0^y ug''(u)\,du$$

Since $g''(y) \geq 0$, we have that $yg'(y) - g(y) \geq 0$ and, since $yg'(y) > 0$ (for $y > 0$), that

$$h(y) = 1 - \frac{g(y)}{yg'(y)}$$

lies in the interval $[0,1]$.
From (5.21), it follows that $\theta\sigma = g(m)$ and that

$$h(m) = 1 - \frac{\theta\sigma}{mg'(m)} \qquad\text{or}\qquad g'(m) = \frac{\theta\sigma}{m(1 - h(m))}$$

With this preparation, consider now the following linear function

$$G(y) = \frac{\theta\sigma\,(y - mh(m))}{m(1 - h(m))} \tag{5.23}$$

Clearly, we have that $G(m) = \theta\sigma$ and that $G'(y) = \frac{\theta\sigma}{m(1-h(m))} = g'(m)$ is independent of $y$. Since $g'(y)$ is non-decreasing — which is the basic assumption of the theorem — the difference $g'(y) - G'(y)$ is negative if $y < m$, but positive if $y > m$. Since $g'(y) - G'(y) = \frac{d}{dy}\left(g(y) - G(y)\right)$, the function $g(y) - G(y)$ has a minimum at $y = m$, for which $g(m) - G(m) = 0$. Hence, $g(y) - G(y) \geq 0$ for all $y \in [0,1]$. Further, $G(y)$ is positive for $y \in (mh(m), 1]$. Especially in this interval, the inequality $g(y) \geq G(y)$ is sharp, because $g(y)$ is positive in $(0,1]$. Thus,

$$\int_{mh(m)}^{1}G^2(y)\,dy \leq \int_{mh(m)}^{1}g^2(y)\,dy < \int_0^1 g^2(y)\,dy$$

Using (5.22) and with (5.23), we have

$$\frac{\theta^2\sigma^2\,(1 - mh(m))^3}{3m^2(1 - h(m))^2} < \sigma^2$$

from which we arrive at the inequality

$$\theta^2 < \frac{3m^2(1-z)^2}{(1 - mz)^3} \tag{5.24}$$

where $z = h(m) \in [0,1]$. The derivative of the right-hand side with respect to $z$,

$$\frac{d}{dz}\left(\frac{3m^2(1-z)^2}{(1-mz)^3}\right) = -\frac{3m^2(1-z)}{(1-mz)^4}\,(2 - 3m + mz)$$

shows that $\frac{3m^2(1-z)^2}{(1-mz)^3}$ is monotonically decreasing for all $z \in [0,1]$ if $m < \frac{2}{3}$, with maximum at $z = 0$. Thus, if $m < \frac{2}{3}$, evaluating (5.24) at $z = 0$ yields $\theta \leq \sqrt{3}\,m$. On the other hand, if $m > \frac{2}{3}$, then $\frac{3m^2(1-z)^2}{(1-mz)^3}$ is maximal provided $2 - 3m + mz = 0$, or for $z = 3 - \frac{2}{m}$. With that value of $z$, the inequality (5.24) yields $\theta < \frac{2}{3\sqrt{1-m}}$. Both regimes $m > \frac{2}{3}$ and $m < \frac{2}{3}$ tend to a same bound $\theta < \frac{2}{\sqrt{3}}$ if $m \to \frac{2}{3}$. The converse is similarly derived from (5.24). $\Box$

If $X$ has a symmetric uniform distribution with $f_X(x) = \frac{1}{2a}1_{x\in[-a,a]}$, then $m = \frac{\theta\sigma}{a}$ and $\sigma = \frac{a}{\sqrt{3}}$, from which $\theta = \sqrt{3}\,m$. This example shows that Gauss's Theorem 5.6.1 is sharp for $m \leq \frac{2}{3}$, in the sense that equality can occur in the first condition $\theta \leq \sqrt{3}\,m$.
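The uniform example above, together with the converse tail bound, can be verified in a few lines (our sketch; the chosen $\theta$ values are arbitrary):

```python
import math

# Symmetric uniform density on [-a, a]: sigma = a/sqrt(3) and
# m = Pr[-theta*sigma <= X <= theta*sigma] = theta*sigma/a = theta/sqrt(3),
# so theta = sqrt(3) m exactly: equality in the first case of Theorem 5.6.1.
a = 1.0
sigma = a / math.sqrt(3)

ms = [0.3, 0.5, 0.6]                       # values below 2/3
thetas = [m * a / sigma for m in ms]       # theta solving Pr[|X| <= theta*sigma] = m

# Converse bound for theta > sqrt(4/3): m >= 1 - 4/(9 theta^2)
theta = 1.5                                # > sqrt(4/3) ~ 1.155
m_unif = min(theta * sigma / a, 1.0)       # mass of the uniform within theta*sigma
gauss_lower = 1 - 4 / (9 * theta**2)
```

For the uniform density, the first condition holds with equality, while the converse tail bound holds with slack, as expected for a density whose support is finite.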
5.7 The dominant pole approximation and large deviations

In this section, we relate asymptotic results of generating functions to the theory of large deviations. An asymptotic expansion in discrete-time is compared to established large deviations results. The first approach, using the generating function $\varphi_X(z)$ of the random variable $X$, is an immediate consequence of Lemma 5.7.1.

Lemma 5.7.1 If $\varphi_X(z)$ is meromorphic with residues $r_k$ at the (simple) poles $p_k$, ordered as $0 < |p_0| \leq |p_1| \leq |p_2| \leq \cdots$, and if $\varphi_X(z) = o(z^{N+1})$ as $z \to \infty$, then holds

$$\varphi_X(z) = \sum_{k=0}^{N}\Pr[X = k]\,z^k + \sum_{k=0}^{\infty}\frac{r_k\,z^{N+1}}{p_k^{N+1}(z - p_k)} \tag{5.25}$$
$$= \sum_{k=0}^{N}\Pr[X = k]\,z^k + \sum_{k=0}^{\infty}r_k\left(\frac{1}{z - p_k} + \sum_{m=0}^{N}\frac{z^m}{p_k^{m+1}}\right) \tag{5.26}$$

The normalization condition $\varphi_X(1) = 1$ implies that

$$\Pr[X > N] = 1 - \sum_{k=0}^{N}\Pr[X = k] = \sum_{k=0}^{\infty}\frac{r_k}{p_k^{N+1}(1 - p_k)} \tag{5.27}$$

The Lemma follows from Titchmarsh (1964, Section 3.21). Rewriting (5.26) gives

$$\varphi_X(z) = \sum_{k=0}^{N}\Pr[X = k]\,z^k - \sum_{j=N+1}^{\infty}\left(\sum_{k=0}^{\infty}\frac{r_k}{p_k^{j+1}}\right)z^j \tag{5.28}$$

and hence,

$$\Pr[X = j] = -\sum_{k=0}^{\infty}\frac{r_k}{p_k^{j+1}} \qquad (j > N) \tag{5.29}$$

The cumulative distribution function for $K > N$ follows from (5.29) as

$$\Pr[X > K] = \sum_{j=K+1}^{\infty}\Pr[X = j] = \sum_{k=0}^{\infty}\frac{r_k}{p_k^{K+1}(1 - p_k)} \qquad (K > N) \tag{5.30}$$

Lemma 5.7.1 means that, if the plot of $\Pr[X = j]$ versus $j$ exhibits a kink at $j = N$, then $\varphi_X(z) = O\left(z^N\right)$ as $z \to \infty$. Alternatively⁵, the asymptotic regime does not start earlier than $j \geq N$. For large $K$, only the pole with smallest modulus, $p_0$, will dominate. Hence,

$$\Pr[X > K] \approx \frac{r_0}{p_0^{K+1}(1 - p_0)} \tag{5.31}$$

This approximation is called the dominant pole approximation, with the residue at the simple pole $p_0$ equal to $r_0 = \lim_{z\to p_0}\varphi_X(z)(z - p_0)$.

The second approach is a large deviations approximation in discrete-time. We have

$$\log\Pr[X > K] = \log\sum_{j=K+1}^{\infty}\Pr[X = j] \leq \log\sum_{j=K+1}^{\infty}x^{j-K-1}\Pr[X = j] \qquad (x \in \mathbb{R}\ \text{and}\ x \geq 1)$$
$$\leq \log\left(x^{-K-1}\sum_{j=0}^{\infty}x^j\Pr[X = j]\right) = -(K+1)\log x + \log\varphi_X(x) \tag{5.32}$$

This inequality holds for all real $x \geq 1$.

⁵ In terms of the queue occupancy in ATM, the initial $\Pr[X = j]$-regime for $j < N$ reflects the cell scale, while the asymptotic regime $j \geq N$ refers to the burst scale.
To get the tightest bound, we determine the maximizer $x_{\max}$ of (5.32); thus,

$$I(K) = \sup_{x\geq 1}\left[(K+1)\log x - \log\varphi_X(x)\right]$$

There exists such a supremum on account of the convexity of $I(K)$, because $\varphi_X(x)$ and $\log\varphi_X(x)$ are convex for $x \geq 1$, as shown in Section 5.5. Assuming that the maximum, say $x_{\max}$, exists, then it is the solution of

$$x_{\max}\,\varphi_X'(x_{\max}) = (K+1)\,\varphi_X(x_{\max})$$

and the large deviations estimate becomes

$$\Pr[X > K] \leq e^{-\left[(K+1)\log x_{\max} - \log\varphi_X(x_{\max})\right]} = \varphi_X(x_{\max})\,x_{\max}^{-(K+1)} \tag{5.33}$$

Observe that (5.33) can be obtained directly from (5.11) with $K = t + E[X]$. Comparing (5.33) and (5.31) indicates, for large $K$, that $x_{\max} \to p_0$, because

$$\lim_{K\to\infty}\frac{\log\Pr[X > K]}{K} = -\log p_0 = -\log x_{\max}$$

Example. A frequently appearing "dominant pole" (see, for example, the extinction probability of a Poisson branching process in Section 12.3, the M/D/1 queue in Section 14.5 and the size of the giant component in the random graph in Section 15.6.4) is the real zero, different from 1, of $e^{\rho(z-1)} - z$. The trivial zero is $z = 1$. The non-trivial solution $\zeta$ of $e^{\rho(\zeta-1)} = \zeta$ can be expressed as a Lagrange series (Markushevich, 1985, p. 94) for⁶ $\rho > 1$,

$$\zeta = e^{-\rho}\sum_{n=0}^{\infty}\frac{(n+1)^{n-1}}{n!}\left(\rho e^{-\rho}\right)^n \tag{5.34}$$

An exact and fast converging expansion for $\zeta$ around $\rho = 1$,

$$\zeta = 1 + 2(1-\rho) + \frac{8}{3}(1-\rho)^2 + \frac{28}{9}(1-\rho)^3 + \frac{464}{135}(1-\rho)^4 + \cdots \tag{5.35}$$

is derived in Van Mieghem (1996) as the zero of $\frac{e^{\rho(z-1)} - z}{z - 1}$. The numerical data show that the approximation $\zeta \simeq \rho^{-2}$, which can be deduced from the series, is accurate to within 1% for $0.84 < \rho^{-1} \leq 1$.

⁶ From (14.43), we observe for $a = b = 1$ and $z = \zeta$ that $\zeta$ in (5.34) equals 1 for all $0 \leq \rho \leq 1$.

6 Limit laws

Limit laws lie at the heart of analysis and probability theory.
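The non-trivial zero $\zeta$ can be computed two independent ways and cross-checked (our sketch; $\rho = 2$ is an arbitrary choice with $\rho > 1$):

```python
import math

def zeta_fixed_point(rho, n_iter=3000):
    # Nontrivial root of e^{rho(z-1)} = z by fixed-point iteration;
    # the map is a contraction near the root in (0, 1) for rho > 1.
    z = 0.0
    for _ in range(n_iter):
        z = math.exp(rho * (z - 1))
    return z

def zeta_lagrange(rho, terms=60):
    # Lagrange series (5.34): zeta = e^{-rho} sum (n+1)^{n-1}/n! (rho e^{-rho})^n
    x = rho * math.exp(-rho)
    s = sum((n + 1) ** (n - 1) / math.factorial(n) * x**n for n in range(terms))
    return math.exp(-rho) * s

rho = 2.0
z_fp = zeta_fixed_point(rho)
z_series = zeta_lagrange(rho)
residual = math.exp(rho * (z_fp - 1)) - z_fp
```

Both routes agree to high precision; the series converges fast because $\rho e^{1-\rho} < 1$ away from $\rho = 1$, while the fixed-point iteration becomes slow as $\rho \downarrow 1$, where the two zeros coalesce.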
Solutions of problems often simplify considerably in limit cases. For example, in Section 16.5.1, the flooding time in the complete graph with $N$ nodes and exponentially distributed link weights can be computed exactly. However, the expression is unattractive but, fortunately, the limit result for $N \to \infty$ is appealing. Many more results and deep discussions are found in the books of Feller (1970, 1971). In this chapter, we will mainly be concerned with sums of independent random variables, $S_n = \sum_{k=1}^{n}X_k$.

6.1 General theorems from analysis

In this section, we define modes of convergence of sequences of random variables and state (without proof) some general theorems that will be used later on.

6.1.1 Summability

We will need results from analysis on summability¹. First the discrete case is presented and then the continuous case.

Lemma 6.1.1 Let $\{a_n\}_{n\geq 1}$ be a sequence of numbers with $\lim_{n\to\infty}a_n = a$; then the average of the partial sums converges to $a$,

$$\lim_{n\to\infty}\frac{1}{n}\sum_{m=1}^{n}a_m = a \tag{6.1}$$

Proof: The demonstration of (6.1) is short enough to include here. The fact that there is a limit $a$ of the sequence $a_1, a_2, \ldots$ implies that, for an arbitrary $\varepsilon > 0$, there exists a finite number $n_0$ such that, for all $n > n_0$, holds that $|a_n - a| < \varepsilon$. Consider the average partial sum $s_n = \frac{1}{n}\sum_{m=1}^{n}a_m$ or, rewritten,

$$s_n - a = \frac{1}{n}\sum_{m=1}^{n_0}(a_m - a) + \frac{1}{n}\sum_{m=n_0+1}^{n}(a_m - a)$$

Hence,

$$|s_n - a| \leq \frac{1}{n}\sum_{m=1}^{n_0}|a_m - a| + \frac{1}{n}\sum_{m=n_0+1}^{n}|a_m - a| < \frac{c}{n} + \left(\frac{n - n_0}{n}\right)\varepsilon < \frac{c}{n} + \varepsilon$$

Since $c$ is a constant, $\frac{c}{n}$ can be made arbitrarily small for $n$ large enough, such that $|s_n - a| < 2\varepsilon$, which is equivalent to (6.1). $\Box$

In fact, as illustrated by many examples in Hardy (1948, Chapters I and II), relation (6.1) converges in more cases than $\lim_{n\to\infty}a_n = a$ does. For example, if $a_{2n} = 1$ and $a_{2n+1} = 0$, the limit $\lim_{n\to\infty}a_n$ does not exist, but (6.1) tends to $\frac{1}{2}$.

¹ In his classical treatise on Divergent Series, Hardy (1948) discusses Cesàro, Abel, Euler and Borel summability in depth.
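Both behaviors of the Cesàro average — following the ordinary limit when it exists, and assigning a value when it does not — can be seen directly (our sketch; the convergent sequence $a_n = 1 + 1/n$ is an arbitrary example):

```python
# Cesàro averaging (Lemma 6.1.1): a_n -> a implies (1/n) sum a_m -> a, and the
# average can converge even when a_n itself does not (a_{2n}=1, a_{2n+1}=0).
N = 100_000

# Oscillating sequence: no pointwise limit, Cesàro limit 1/2
osc = [1 if n % 2 == 0 else 0 for n in range(1, N + 1)]
cesaro_osc = sum(osc) / N

# Convergent sequence a_n = 1 + 1/n: both limits equal 1
conv = [1 + 1 / n for n in range(1, N + 1)]
cesaro_conv = sum(conv) / N
```

The oscillating sequence has Cesàro average exactly $\frac{1}{2}$, while the convergent one averages to 1 up to the $\frac{c}{n}$ term of the proof (here of order $\log N / N$).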
Probabilistically, Lemma 6.1.1 is closely related to the sample mean and the Law of Large Numbers (Section 6.2). The continuous case distinguishes between $\lim_{t\to\infty}g(t)$, which is called the pointwise limit (for sufficiently large $t$, all points $g(t)$ will be arbitrarily close to that limit), and the limit $\lim_{t\to\infty}\frac{1}{t}\int_0^t g(u)\,du$, which is called the time average² of $g$.

Lemma 6.1.2 If the pointwise limit $\lim_{t\to\infty}g(t) = g_\infty$ exists, then the time average

$$\lim_{t\to\infty}\frac{1}{t}\int_0^t g(u)\,du = g_\infty$$

Proof: The proof is analogous to that of Lemma 6.1.1 in the discrete case, since $\lim_{t\to\infty}g(t) = g_\infty$ means that, for an arbitrary $\varepsilon > 0$, there exists a finite number $T$ such that, for all $t > T$, holds that $|g(t) - g_\infty| < \varepsilon$. For any $t > T$,

$$\frac{1}{t}\int_0^t g(u)\,du - g_\infty = \frac{1}{t}\int_0^T (g(u) - g_\infty)\,du + \frac{1}{t}\int_T^t (g(u) - g_\infty)\,du$$

and

$$\left|\frac{1}{t}\int_0^t g(u)\,du - g_\infty\right| \leq \frac{1}{t}\left|\int_0^T (g(u) - g_\infty)\,du\right| + \frac{1}{t}\int_T^t |g(u) - g_\infty|\,du < \frac{c}{t} + \frac{t-T}{t}\,\varepsilon$$

Since $c$ is a constant, the lemma follows by letting $t \to \infty$. $\Box$

Both in Markov theory (Section 9.3.2) and in Little's Law (Section 13.6) these lemmas will be used.

6.1.2 Convergence of a sequence of random variables

A sequence $\{X_k\}_{k\geq 0}$ of random variables may converge to a random variable $X$ in several ways. If

$$\Pr\left[\lim_{k\to\infty}|X_k - X| = 0\right] = 1$$

then the sequence $\{X_k\}_{k\geq 0}$ converges to $X$ with probability 1 (w.p. 1) or almost surely (a.s.). This mode of convergence is denoted by $X_k \to X$ w.p. 1 or a.s. as $k \to \infty$. If, for any $\varepsilon > 0$,

$$\lim_{k\to\infty}\Pr[|X_k - X| > \varepsilon] = 0$$

then it is said that the sequence $\{X_k\}_{k\geq 0}$ converges in probability or in measure to $X$. This mode of convergence is denoted by $X_k \overset{p}{\to} X$ as $k \to \infty$. Convergence in probability is a weaker notion of convergence than almost sure convergence. Almost sure convergence implies convergence in probability, whereas convergence in probability means that there exists a subsequence of $\{X_k\}_{k\geq 0}$ that converges almost surely.

² In summability theory, it is also known as the Cesàro limit of $g$.
An equivalent criterion for almost sure convergence is
$$\Pr\left[|X_k - X| > \delta \ \text{ i.o.}\right] = 0$$
where "i.o." stands for "infinitely often", thus for an infinite number of $k$. If, for all $x$ with the possible exception of a set of measure zero where $F_X(x)$ is discontinuous, the distributions satisfy
$$\lim_{k\to\infty} F_{X_k}(x) = F_X(x)$$
then the sequence $\{X_k\}_{k\geq 0}$ converges in distribution to $X$, denoted as $X_k \xrightarrow{d} X$ as $k\to\infty$ or, sometimes, in mixed form as $X_k \xrightarrow{d} F_X$ as $k\to\infty$. If, for $q \geq 1$,
$$\lim_{k\to\infty} E\left[|X_k - X|^q\right] = 0$$
then the sequence $\{X_k\}_{k\geq 0}$ converges to $X$ in $L^q$, the space of all functions $f$ for which $\int_{-\infty}^{\infty} |f(x)|^q\,dx < \infty$. The most common values of $q$ are 1, 2 and $q = \infty$. This convergence is also called convergence in norm (see Appendix A.3). The Markov inequality (5.9),
$$\Pr[|X_k - X| \geq \delta] \leq \frac{E[|X_k - X|]}{\delta}$$
shows that convergence in mean ($q = 1$) implies convergence in probability. In general, it is fair to say that the convergence of sequences belongs to the most complicated topics in both analysis and probability theory. In many limit theorems, for example the Law of Large Numbers in Section 6.2 and Little's Law in Section 13.6, the art consists in proving the theorem with the least possible number of assumptions, or in its most widely applicable form.

6.1.3 List of general theorems

Theorem 6.1.3 (Continuity Theorem) Let $\{F_n\}_{n\geq 1}$ be a sequence of distribution functions with corresponding probability generating functions $\{\varphi_n\}_{n\geq 1}$. If $\lim_{n\to\infty}\varphi_n(z) = \varphi(z)$ exists for all $z$ and, in addition, $\varphi$ is continuous at $z = 0$, then there exists a limiting distribution function $F$ with generating function $\varphi$ for which $F_n \xrightarrow{d} F$.

Proof: See e.g. Berger (1993, p. 51). □

Theorem 6.1.4 (Dominated Convergence Theorem) Let $\{f_n\}_{n\geq 1}$ and $f$ be real functions and suppose that, for each $x$,
$$\lim_{n\to\infty} f_n(x) = f(x)$$
If there exists a real function $g(x)$ such that $|f_n(x)| \leq g(x)$ and for which the random variable $g(X)$ has finite expectation, then
$$\lim_{n\to\infty} E[f_n(X)] = E[f(X)]$$

Proof: See e.g. Royden (1988, Chapter 4). □
6.2 Law of Large Numbers

Theorem 6.2.1 (Weak Law of Large Numbers) Let $\{X_k\}$ be a sequence of independent random variables, each with distribution identical to that of the random variable $X$ with $\mu = E[X]$. If the expectation $\mu = E[X]$ exists, then, for any $\varepsilon > 0$,
$$\lim_{n\to\infty}\Pr\left[\left|\frac{S_n}{n} - \mu\right| \geq \varepsilon\right] = 0 \qquad (6.2)$$

Proof³: Replacing $X_k$ by $X_k - \mu$ demonstrates that, without loss of generality, we may assume that $\mu = 0$. Denote $U_n = \frac{S_n}{n}$; then $\varphi_{U_n}(z) = E\left[e^{-zU_n}\right] = E\left[e^{-zS_n/n}\right]$. Since the set $\{X_k\}$ is independent with common distribution, applying relation (2.66) yields
$$\varphi_{U_n}(z) = \left(\varphi_X\left(\frac{z}{n}\right)\right)^n$$
Since the expectation exists ($\mu = 0$), the Taylor expansion (2.40) of $\varphi_X$ around $z = 0$ is $\varphi_X(z) = 1 + o(z)$, and $\varphi_{U_n}(z) = \left(1 + o\left(\frac{z}{n}\right)\right)^n$. Taking the logarithm, $\log\varphi_{U_n}(z) = n\log\left(1 + o\left(\frac{z}{n}\right)\right) = n\,o\left(\frac{z}{n}\right) = o(z)$ for large $n$, such that $\lim_{n\to\infty}\varphi_{U_n}(z) = 1$. By the Continuity Theorem 6.1.3, $\varphi_U(z) = E\left[e^{-zU}\right] = 1$, which implies that $U_n \xrightarrow{d} 0$. Hence, the sequence $\frac{S_n}{n}$ converges in distribution to $\mu = 0$, which is equivalent to (6.2). □

³ An alternative proof is given in Feller (1970, pp. 247-248).

The Weak Law of Large Numbers is a general result on the behavior of the sample mean $\frac{S_n}{n}$ of independent random variables with the same existing expectation $\mu$. It is weak in the sense that only convergence in probability is established. For large $n$, the Weak Law states that the sample mean $\frac{S_n}{n}$ will be close to the expectation $\mu$ (within an arbitrary $\varepsilon$) with high probability. It does not imply that $\left|\frac{S_n}{n} - \mu\right|$ remains small for all large $n$. In fact, large fluctuations in $\left|\frac{S_n}{n} - \mu\right|$ can happen; the Weak Law of Large Numbers only concludes that large values of $\left|\frac{S_n}{n} - \mu\right|$ occur with (very) small probability. For example, in a coin-tossing experiment with a fair coin such that $\Pr[X_k = 1] = \Pr[X_k = 0] = \mu = \frac{1}{2}$ in $n$ trials, the all-heads sequence $\{X_k = 1\}_{1\leq k\leq n}$ is possible with probability $2^{-n}$, and then $\frac{S_n}{n} = 1 > \mu$. Only in the limit $n\to\infty$ does this "always heads" sequence become impossible ($\lim_{n\to\infty} 2^{-n} = 0$).
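The coin-tossing interpretation can be made concrete with a small Monte Carlo experiment. The Python sketch below (sample sizes and tolerance chosen arbitrarily for illustration) estimates $\Pr\left[\left|\frac{S_n}{n} - \frac{1}{2}\right| > \varepsilon\right]$ for a fair coin and two values of $n$:

```python
import random

# Monte Carlo estimate of Pr[|S_n/n - 1/2| > eps] for the fair coin
# Pr[X_k = 1] = Pr[X_k = 0] = 1/2 of the text.
def deviation_probability(n, eps, runs=2000, seed=42):
    rng = random.Random(seed)
    bad = 0
    for _ in range(runs):
        s = sum(rng.randint(0, 1) for _ in range(n))
        if abs(s / n - 0.5) > eps:
            bad += 1
    return bad / runs

p_small_n = deviation_probability(25, 0.1)    # moderate probability
p_large_n = deviation_probability(400, 0.1)   # nearly zero
print(p_small_n, p_large_n)
```

The estimate for $n = 400$ is far smaller than the one for $n = 25$, in line with (6.2) and with the Chebyshev bound $\frac{\mathrm{Var}[X]}{n\varepsilon^2}$ discussed next.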
For all finite $n$ there is a non-zero probability of a large deviation from the mean. If we assume, in addition to the existence of the expectation, that the variance $\mathrm{Var}[X]$ also exists, the Weak Law follows from the Chebyshev inequality (5.10). This exemplifies the increasing complexity when fewer restrictions are assumed in the theorems. Indeed, using (2.57) for independent random variables, $\mathrm{Var}\left[\frac{S_n}{n}\right] = \frac{\mathrm{Var}[X]}{n}$, and the Chebyshev inequality (5.10) gives
$$\Pr\left[\left|\frac{S_n}{n} - \mu\right| \geq \varepsilon\right] \leq \frac{\mathrm{Var}[X]}{n\varepsilon^2}$$
which tends to zero for any fixed $\varepsilon$ and finite $\mathrm{Var}[X]$. In fact, with the additional assumption of a finite variance $\mathrm{Var}[X]$, a much more precise result can be proved, known as the Central Limit Theorem (Section 6.3). We remark that the Weak Law of Large Numbers also holds in case $\mathrm{Var}[X]$ does not exist.

Theorem 6.2.2 (Strong Law of Large Numbers) Let $\{X_k\}$ be a sequence of independent random variables, each with distribution identical to that of the random variable $X$ with $\mu = E[X]$. If the expectation $\mu = E[X]$ and the variance $\mathrm{Var}[X]$ exist, then
$$\Pr\left[\lim_{n\to\infty}\frac{S_n}{n} = \mu\right] = 1 \qquad (6.3)$$

Proof: See e.g. Feller (1970, pp. 259-261), Berger (1993, pp. 46-48) or Wolff (1989, pp. 40-41). Their proof is based on the Kolmogorov criterion: the convergence of $\sum_{k=1}^{\infty}\frac{\mathrm{Var}[X_k]}{k^2}$ is a sufficient condition for the Strong Law of Large Numbers for independent random variables $X_k$ with mean $E[X_k]$ and variance $\mathrm{Var}[X_k]$. If the existence of $E\left[X^4\right]$ is assumed, Ross (1996, pp. 56-58) and Billingsley (1995, p. 85) provide a different proof. Wolff (1989, pp. 41-42) remarks that both the Weak and Strong Laws hold under much weaker conditions: it is only needed that the $X_k$ are uncorrelated. In other words, $\frac{S_n}{n}\to\mu$ w.p. 1 with $\mu = E[X_k]$, even if $\mathrm{Var}[X_k] = \infty$. □

The Strong Law of Large Numbers roughly states that $\left|\frac{S_n}{n} - \mu\right|$ remains small for sufficiently large $n$ with overwhelming probability.
The importance of the Law of Large Numbers lies in providing the mathematical foundation of the intuition that the sample mean is the best estimator of the mean.

Theorem 6.2.3 (Law of the Iterated Logarithm) Let $\{X_k\}$ be a sequence of independent random variables, each with distribution identical to that of the random variable $X$ with $\mu = E[X]$. If $\mathrm{Var}[X] = \sigma^2$ exists, then
$$\Pr\left[\limsup_{n\to\infty}\frac{S_n - n\mu}{\sqrt{2\sigma^2 n\log\log n}} = 1\right] = 1 \qquad (6.4)$$

Proof: See e.g. Billingsley (1995, pp. 154-156) or Feller (1970, Section VIII.5)⁴. □

⁴ Feller also mentions sharper bounds.

In addition to the Weak and Strong Laws of Large Numbers, the Law of the Iterated Logarithm provides information about large values of $\left|\frac{S_n}{n} - \mu\right|$. Specifically, it states that the bound $\left|\frac{S_n}{n} - \mu\right| \leq \sigma\sqrt{\frac{2\log\log n}{n}}$ holds almost surely: it is satisfied for all but finitely many $n$, and only for a finite number of values of $n$ may the converse, $\left|\frac{S_n}{n} - \mu\right| > \sigma\sqrt{\frac{2\log\log n}{n}}$, occur.

6.3 Central Limit Theorem

Theorem 6.3.1 (Central Limit Theorem) Let $\{X_k\}$ be a sequence of independent random variables, each with distribution identical to that of the random variable $X$ with finite $\mu = E[X]$ and $\sigma^2 = \mathrm{Var}[X]$. Then $\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1)$ or, explicitly,
$$\Pr\left[\frac{S_n - n\mu}{\sigma\sqrt{n}} \leq x\right] \to \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{t^2}{2}}\,dt$$

Proof: Without loss of generality, we may confine ourselves to normalized random variables (replace $X_k$ by $\frac{X_k - \mu}{\sigma}$) such that $\mu = 0$ and $\sigma = 1$. Consider the scaled random variable $U_n = a_n S_n$, where $a_n$ is a real number depending on $n$ and to be determined later. Similarly as in the proof of the Weak Law of Large Numbers, we find that $\varphi_{U_n}(z) = (\varphi_X(a_n z))^n$. Due to the existence of the variance, the Taylor expansion (2.40) of $\varphi_X$ around $z = 0$ is known with higher precision as $\varphi_X(z) = 1 + \frac{z^2}{2} + o(z^2)$. For sufficiently small $z$, the logarithm
$$\log\varphi_{U_n}(z) = n\log\left(1 + \frac{a_n^2 z^2}{2} + o(a_n^2 z^2)\right) = \frac{n\,a_n^2 z^2}{2} + o(n\,a_n^2 z^2)$$
only converges to a finite (non-zero) number if $a_n = O\left(\frac{1}{\sqrt{n}}\right)$.
Choosing the simplest function that satisfies this condition, $a_n = \frac{1}{\sqrt{n}}$, leads to $\lim_{n\to\infty}\log\varphi_{U_n}(z) = \frac{z^2}{2}$ or, since the logarithm is a continuous, increasing function, $\lim_{n\to\infty}\varphi_{U_n}(z) = \exp\left(\frac{z^2}{2}\right)$. The transform (3.22) shows that the corresponding limit random variable is a Gaussian $N(0,1)$. The theorem then follows by virtue of the Continuity Theorem 6.1.3. □

An alternative formulation of the Central Limit Theorem is that the $k$-fold convolution of any probability density function converges to a Gaussian probability density, $f_X^{(k*)}(x) \to \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$, with $\mu = kE[X]$ and $\sigma^2 = k\,\mathrm{Var}[X]$. Both the Law of Large Numbers and the Central Limit Theorem can be shown to be valid for a surprisingly large class of sequences $\{X_k\}$ where each random variable may have a different distribution. The conditions for the extension of the Central Limit Theorem are summarized in the Lindeberg conditions (Feller, 1971, p. 263). An example where the sum of independent random variables tends to a different limit distribution than the Gaussian appears in Section 16.5.1.

If higher moments are known, the convergence to the Gaussian distribution can be bounded. Feller (1971, Chapter XVI) devotes a chapter to expansions related to the Central Limit Theorem, culminating in the Berry-Esseen Theorem.

Theorem 6.3.2 (Berry-Esseen Theorem) Let $\{X_k\}$ be a sequence of independent random variables, each with distribution identical to that of the random variable $X$ with finite $\mu = E[X]$, $\sigma^2 = \mathrm{Var}[X]$ and $\theta = E\left[|X-\mu|^3\right]$. Then, with $C = 3$,
$$\sup_x\left|\Pr\left[\frac{S_n - n\mu}{\sigma\sqrt{n}} \leq x\right] - \Phi(x)\right| \leq \frac{C\,\theta}{\sigma^3\sqrt{n}} \qquad (6.5)$$

Proof: See e.g. Feller (1971, Section XVI.5). The constant $C$ can be slightly improved to $C \approx 2.05$. □

As an example of the rate of convergence towards the Gaussian distribution, the $k$-fold convolutions of the uniform density given by (3.30) are plotted in Fig. 6.1 together with the Gaussian approximation (3.19).
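The convergence shown in Fig. 6.1 can also be checked by simulation. The Python sketch below (sample sizes and test points chosen arbitrarily) compares, in the same setting of sums of $k$ independent uniform variables with $E[U] = \frac{1}{2}$ and $\mathrm{Var}[U] = \frac{1}{12}$, the empirical distribution of the standardized sum with $\Phi(x)$ at a few points:

```python
import math
import random

# Standardized sum of k uniform(0,1) variables: (S - k/2) / sqrt(k/12)
def standardized_uniform_sum(k, rng):
    s = sum(rng.random() for _ in range(k))
    return (s - k / 2) / math.sqrt(k / 12)

def max_cdf_error(k, samples=20000, seed=7):
    """Largest deviation from Phi(x) over a few test points x."""
    rng = random.Random(seed)
    data = [standardized_uniform_sum(k, rng) for _ in range(samples)]
    worst = 0.0
    for x in (-1.5, -0.5, 0.0, 0.5, 1.5):
        empirical = sum(1 for d in data if d <= x) / samples
        phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))
        worst = max(worst, abs(empirical - phi))
    return worst

print(max_cdf_error(2), max_cdf_error(16))  # both already small
```

Already for $k = 2$ the deviation is of the order of a percent, consistent with the rapid convergence visible in Fig. 6.1 and with the $\frac{1}{\sqrt{n}}$ rate of (6.5).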
[Fig. 6.1. Both the exact pdf $f_U^{(k*)}(x)$ of $k$-convolved uniform random variables, with $f_U(x) = 1_{0\leq x\leq 1}$, and the Gaussian approximation, for $k = 2, 3, 4, 8$ and $16$.]

6.4 Extremal distributions

6.4.1 Scaling laws

In this section, limit properties of the maximum and the minimum of a set $\{X_k\}$ of independent random variables are discussed. For simplicity, we assume that all random variables $X_k$ have the identical distribution $F(x) = \Pr[X \leq x]$, such that (3.33) and (3.32) simplify to
$$\Pr\left[\max_{1\leq k\leq m} X_k \leq x\right] = F^m(x)$$
$$\Pr\left[\min_{1\leq k\leq m} X_k > x\right] = (1 - F(x))^m$$
Consider the limit process when $m\to\infty$. Let $\{x_m\}$ be a sequence of real numbers. Then, confining ourselves to the maximum first,
$$\log\left(\Pr\left[\max_{1\leq k\leq m} X_k \leq x_m\right]\right) = m\log F(x_m)$$
Since $0 \leq F(x_m) \leq 1$ and since the logarithm has the Taylor expansion $\log(1-x) = -\sum_{k=1}^{\infty}\frac{x^k}{k}$ around $x = 0$, convergent for $|x| < 1$, we rewrite the right-hand side as $\log F(x_m) = \log\left[1 - (1 - F(x_m))\right]$ and, after expansion,
$$\log\left(\Pr\left[\max_{1\leq k\leq m} X_k \leq x_m\right]\right) = -m(1 - F(x_m)) + o\left[m(1 - F(x_m))\right]$$
If $\lim_{m\to\infty} m(1 - F(x_m)) = \xi$, we arrive at
$$\lim_{m\to\infty}\Pr\left[\max_{1\leq k\leq m} X_k \leq x_m\right] = e^{-\xi} \qquad (6.6)$$
Hence, by choosing an appropriate sequence $\{x_m\}$ such that $\xi$ is finite (and preferably non-zero), a scaling law for the maximum of a sequence can be obtained and, similarly, for the minimum: if $\lim_{m\to\infty} m\,F(x_m) = \xi$,
$$\lim_{m\to\infty}\Pr\left[\min_{1\leq k\leq m} X_k > x_m\right] = e^{-\xi} \qquad (6.7)$$
The distributions of $\lim_{m\to\infty}\min_{1\leq k\leq m} X_k$ and $\lim_{m\to\infty}\max_{1\leq k\leq m} X_k$ are called extremal distributions.

[Fig. 6.2. The probability density function of the three types of extremal distributions (Gumbel, Fréchet and Weibull), the latter two drawn for $\alpha = 0.5, 1$ and $2$.]

6.4.2 The Law of Extremal Types

Two distribution functions $F$ and $G$ are said to be of the same type if there exist constants $a > 0$ and $b$ for which $F(ax + b) = G(x)$ for all $x$.
Theorem 6.4.1 (Law of Extremal Types) Any extremal distribution of a sequence of i.i.d. random variables can only be of one of the three types
1. Gumbel: $F(x) = e^{-e^{-x}}$
2. Fréchet: $F(x) = e^{-x^{-\alpha}}\,1_{x\geq 0}$
3. Weibull: $F(x) = e^{-(-x)^{\alpha}}\,1_{x<0} + 1_{x\geq 0}$
where $\alpha > 0$.

Proof: See e.g. Berger (1993, pp. 65-69). □

The generality of this theorem is appealing: any maximum or minimum of a set of i.i.d. random variables has (apart from the scaling constants $a$ and $b$) one of the above three types. The corresponding probability density functions are plotted in Fig. 6.2.

6.4.3 Examples

1. Consider the set $\{X_k\}$ of exponentially distributed random variables with $F(x) = 1 - e^{-\alpha x}$. The condition for the maximum is $m\,e^{-\alpha x_m} \to \xi$ or, equivalently, $x_m = \frac{1}{\alpha}(\log m - \log\xi)$, and (6.6) becomes, after putting $x = -\log\xi$,
$$\lim_{m\to\infty}\Pr\left[\max_{1\leq k\leq m} X_k \leq \frac{1}{\alpha}(\log m + x)\right] = e^{-e^{-x}}$$
For the minimum, the condition $m\,F(x_m) = m(1 - e^{-\alpha x_m})\to\xi$ forces $x_m$ to be small, so that $1 - e^{-\alpha x_m} \approx \alpha x_m$ and $x_m \approx \frac{\xi}{\alpha m}$. Putting $\xi = e^{-x}$, the limit law (6.7) for the minimum of exponential random variables becomes
$$\lim_{m\to\infty}\Pr\left[\min_{1\leq k\leq m} X_k > \frac{e^{-x}}{\alpha m}\right] = e^{-e^{-x}}$$
For both the maximum and the minimum of exponential random variables, a scaling law thus exists that leads to a Gumbel distribution $F_{\mathrm{Gumbel}}(x) = e^{-e^{-x}}$. In other words, for large $m$, the random variables $M = \alpha\max_{1\leq k\leq m} X_k - \log m$ and $N = -\log\left(\alpha m\min_{1\leq k\leq m} X_k\right)$ have an identical distribution, equal to the Gumbel distribution.

2. Another example is the maximum of a set of i.i.d. uniform random variables $\{U_k\}$ on $[0,1]$ with $F(x) = x$ for $0\leq x\leq 1$. Since $m(1 - x_m)\to\xi$ with $0\leq x_m\leq 1$, we have $x_m \to 1 - \frac{\xi}{m}$ and, after putting $\xi = x$ with $x \geq 0$,
$$\lim_{m\to\infty}\Pr\left[\max_{1\leq k\leq m} U_k \leq 1 - \frac{x}{m}\right] = e^{-x}$$

3. Consider a rectangular lattice with sides $z_1$ and $z_2$ and with independent, identically and uniformly distributed link weights on $(0,1]$ between neighboring lattice points. The number of lattice points (nodes) equals $N = (z_1+1)(z_2+1)$ and the number of links is $L = 2z_1 z_2 + z_1 + z_2$.
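Example 1 lends itself to a direct numerical check before we continue with the lattice example: the shifted maximum $M = \alpha\max_{1\leq k\leq m}X_k - \log m$ should be approximately Gumbel distributed for large $m$. A Python sketch (sample sizes and rate chosen arbitrarily):

```python
import math
import random

# Shifted maximum M = alpha * max(X_1..X_m) - log(m) for exponential X_k
def shifted_maximum(m, alpha, rng):
    return alpha * max(rng.expovariate(alpha) for _ in range(m)) - math.log(m)

def max_gumbel_error(m=200, alpha=2.0, runs=3000, seed=3):
    """Largest deviation of the empirical CDF of M from exp(-exp(-x))."""
    rng = random.Random(seed)
    data = [shifted_maximum(m, alpha, rng) for _ in range(runs)]
    worst = 0.0
    for x in (-1.0, 0.0, 1.0, 2.0):
        empirical = sum(1 for d in data if d <= x) / runs
        worst = max(worst, abs(empirical - math.exp(-math.exp(-x))))
    return worst

print(max_gumbel_error())  # small for large m
```

The exact distribution of $M$ is $\left(1 - \frac{e^{-x}}{m}\right)^m$, so the discrepancy from $e^{-e^{-x}}$ decays as $O\left(\frac{1}{m}\right)$, which the simulation confirms.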
The shortest hop path between two diagonal corner points consists of $h = z_1 + z_2$ hops. The weight $W_h$ of such an $h$-hop path is the sum of $h$ independent uniform random variables with the distribution specified in (3.29),
$$F(x) = \Pr[W_h \leq x] = \frac{1}{h!}\sum_{j=0}^{h}(-1)^j\binom{h}{j}(x-j)^h\,1_{j\leq x}$$
In particular, $\Pr[W_h \leq h] = 1$ and, for small $x < 1$, it holds that $F(x) = \frac{x^h}{h!}$. The precise computation of the minimum weight of an $h$-hop path in a lattice is difficult due to the dependence among those $h$-hop paths, and we content ourselves here with an approximate estimate. If we neglect the dependence of the $h$-hop paths due to possible overlap, then the minimum weight among all $h$-hop paths can be approximated by (6.7), because the number⁵ $m = \binom{z_1+z_2}{z_1} = \frac{h!}{z_1!\,z_2!}$ of those $h$-hop paths is large. The limit sequence must obey $m\,F(x_m)\to\xi$ for sufficiently large $m$, which implies that $F(x_m)$ must be small or, equivalently, that $x_m$ must be small. Hence, $m\,\frac{x_m^h}{h!} = \xi$ or $x_m = \left(\frac{\xi\,h!}{m}\right)^{1/h}$. The limit law (6.7) for the minimum weight $W = \min_{1\leq k\leq m} W_{h;k}$ of the shortest hop path in a rectangular lattice is
$$\lim_{m\to\infty}\Pr\left[\min_{1\leq k\leq m} W_{h;k} > \left(\frac{h!\,x}{m}\right)^{1/h}\right] = e^{-x}$$
In other words, the random variable $\frac{m\,W^h}{h!}$ tends to an exponential random variable with mean 1 for large $m = \frac{h!}{z_1!\,z_2!}$, or
$$\Pr[W \leq y] \approx 1 - \exp\left(-\frac{m\,y^h}{h!}\right)$$

⁵ Any path in a rectangular lattice can be represented by a sequence of r(ight), l(eft), u(p) and d(own) moves, which is called an encoded path word. The encoded path word of the shortest hop path between diagonal corner points consists of $z_1$ r's (or l's) and $z_2$ d's (or u's). The total number of these paths equals $\binom{z_1+z_2}{z_1}$. Two paths coincide in a same lattice point at $j \leq h$ hops from the source node if their encoded path words have the same sum of r's and d's in the first $j$ letters. The number of overlapping links between two paths equals the number of identical consecutive letters (r or d) in a block after an equal sum of r's and d's in the encoded path words.
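The approximation $\Pr[W\leq y]\approx 1 - \exp\left(-\frac{m\,y^h}{h!}\right)$ can be probed by simulation when the $m$ paths are generated as genuinely independent sums of $h$ uniform link weights, i.e. under exactly the independence assumption made above. A Python sketch with arbitrary small values of $m$ and $h$:

```python
import math
import random

# Minimum weight over m independent h-hop paths, each path weight a sum
# of h uniform(0,1) link weights (path overlap ignored, as in the text)
def minimum_path_weight(m, h, rng):
    return min(sum(rng.random() for _ in range(h)) for _ in range(m))

def empirical_vs_limit(y, m=70, h=4, runs=3000, seed=11):
    rng = random.Random(seed)
    hits = sum(1 for _ in range(runs) if minimum_path_weight(m, h, rng) <= y)
    empirical = hits / runs
    approximation = 1 - math.exp(-m * y ** h / math.factorial(h))
    return empirical, approximation

emp, approx = empirical_vs_limit(0.8)
print(emp, approx)  # close to each other
```

For dependent (overlapping) lattice paths the agreement degrades, which is precisely the caveat made below about Theorem 6.4.1.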
Checking for overlap between $h$-hop paths requires a comparison of $\binom{z_1+z_2}{z_1}$ permutations of the encoded path words.

From (2.35), the mean shortest weight of an $h$-hop path equals
$$E[W] = \int_0^{\infty}\left(1 - F_W(x)\right)dx \approx \int_0^{\infty}\exp\left(-\frac{m\,x^h}{h!}\right)dx = \Gamma\left(1 + \frac{1}{h}\right)(z_1!\,z_2!)^{1/h}$$
For a square lattice, where $z_1 = z_2 = \frac{h}{2}$, we have
$$E[W] = \Gamma\left(1 + \frac{1}{h}\right)\left(\left(\frac{h}{2}\right)!\right)^{2/h}$$
Using Stirling's formula (Abramowitz and Stegun, 1968, Section 6.1.38) for the factorial, $h! = \sqrt{2\pi}\,h^{h+\frac{1}{2}}\,e^{-h+\frac{\theta}{12h}}$ with $0 < \theta < 1$, the mean $E[W]$ increases, for large $h$, about linearly in the number of hops $h$,
$$E[W] \simeq \Gamma\left(1 + \frac{1}{h}\right)(\pi h)^{\frac{1}{h}}\,\frac{h}{2e} \approx \frac{h}{2e}$$
The average weight of a link (or one hop) of the shortest $h$-hop path is thus roughly $\frac{1}{2e} \approx 0.184$. In spite of the fact that path dependence (overlap) has been ignored in the computation of the minimum weight, the scaling $E[W] = O(h) = O\left(\sqrt{N}\right)$ is correct. However, the approximate analysis gives neither the correct prefactor in $E[W]$ nor the correct limit pdf, which turns out to be Gaussian. Hence, if random variables are not independent, Theorem 6.4.1 does not apply. Finally, a shortest $h$-hop path is not necessarily the overall shortest path, because it is possible, though with small probability, that the overall shortest path has $h + 2j$ hops with $j > 0$.

4. The probability density function of the longest shortest path

The most commonly used process that informs each node about changes in a network topology (e.g. an autonomous domain) is called flooding: every router forwards the packet on all interfaces except for the incoming one, and duplicate packets are discarded. Flooding is particularly simple and robust since it progresses, in fact, along all possible paths from the emitting node to the receiving node. Hence, a flooded packet reaches a node in the network in the shortest possible time (if overhead in routers is ignored).
Therefore, the interesting problem lies in the determination of the flooding time $T_N$, which is the minimum time needed to inform the last node in a network with $N$ nodes. Only after $T_N$ are all topology databases at each router in the network again synchronized, i.e. all routers possess the same topology information. Rather than investigating the flooding time $T_N$ (for which we refer to Section 16.5), the largest number of traversed routers (hops), i.e. the longest shortest path from the emitting node to the furthermost node in its shortest path tree, is computed. The number of hops, in short the hopcount $H_N$, along the shortest path between two arbitrary nodes in a network containing $N$ nodes is modeled subject to the following assumptions: (a) the hopcount $H_N$ is a Poisson random variable with mean $E[H_N] = \lambda = \alpha\log N$ with $\alpha > 0$, which is motivated in Section 16.3.1; (b) the number of nodes $N$ is very large⁶; (c) all shortest paths from the emitting node towards any other node in the network are independent. The problem reduces to computing the pdf of the random variable $\max_{1\leq k\leq N-1} H_k$. The distribution function follows from (3.9) as
$$F_{H_N}(x) = \sum_{k=0}^{x}\frac{\lambda^k e^{-\lambda}}{k!} = 1 - \sum_{k=x+1}^{\infty}\frac{\lambda^k e^{-\lambda}}{k!}$$

⁶ The size of the Internet is currently estimated at about $N \approx 10^5$.

The condition $\lim_{m\to\infty} m(1 - F(x_m)) = \xi$ becomes
$$\lim_{N\to\infty} N\sum_{k=x_N+1}^{\infty}\frac{\lambda^k e^{-\lambda}}{k!} = \xi$$
from which we must choose the appropriate $x_N$ as a function of $N$. Observe that the maximum term in the series has index $k = [\lambda]$, where $[\lambda]$ denotes the largest integer smaller than or equal to $\lambda$. For, the ratio between two consecutive (positive) terms in the $k$-sum equals $\frac{a_k}{a_{k-1}} = \frac{\lambda}{k}$, such that, if $\lambda > k$, then $a_k > a_{k-1}$, implying that the terms increase, while, if $\lambda < k$, the terms $a_k < a_{k-1}$ form a decreasing sequence. The series is rewritten as
$$\sum_{k=x_N+1}^{\infty}\frac{\lambda^k}{k!} = \frac{\lambda^{x_N+1}}{(x_N+1)!}\sum_{k=0}^{\infty}\frac{(x_N+1)!\,\lambda^k}{(x_N+1+k)!} = \frac{\lambda^{x_N+1}}{(x_N+1)!}\left[1 + \frac{\lambda}{x_N+2} + \frac{\lambda^2}{(x_N+2)(x_N+3)} + \cdots\right]$$
We choose $x_N = [\lambda] + [\lambda\delta] \approx \lambda(1+\delta)$ for large $N$, and thus large $\lambda$, where $\delta$ must be related to $\xi$. The series then consists of decreasing terms. Moreover, for large $\lambda$,
$$\sum_{k=[\lambda]+[\lambda\delta]+1}^{\infty}\frac{\lambda^k}{k!} = \frac{\lambda^{\lambda(1+\delta)+1}}{(\lambda(1+\delta)+1)!}\left[1 + \frac{1}{(1+\delta)+\frac{2}{\lambda}} + \frac{1}{\left((1+\delta)+\frac{2}{\lambda}\right)\left((1+\delta)+\frac{3}{\lambda}\right)} + \cdots\right]$$
$$< \frac{\lambda^{\lambda(1+\delta)+1}}{(\lambda(1+\delta)+1)!}\left[1 + \frac{1}{1+\delta} + \frac{1}{(1+\delta)^2} + \cdots\right]$$
and thus
$$\sum_{k=[\lambda]+[\lambda\delta]+1}^{\infty}\frac{\lambda^k}{k!} = \frac{\lambda^{\lambda(1+\delta)+1}}{(\lambda(1+\delta)+1)!}\left[\frac{1+\delta}{\delta} + O\left(\frac{1}{\lambda}\right)\right]$$
Using Stirling's formula (Abramowitz and Stegun, 1968, Section 6.1.38), $x! \approx \sqrt{2\pi}\,x^{x+\frac{1}{2}}e^{-x}$ for large $x$, yields
$$\frac{\lambda^{\lambda(1+\delta)+1}}{(\lambda(1+\delta)+1)!} \approx \frac{e^{\lambda(1+\delta)[1-\log(1+\delta)]}}{\sqrt{2\pi\lambda(1+\delta)}}$$
For large $N$ (recall that $e^{-\lambda} = N^{-\alpha}$), the condition $N(1 - F_{H_N}(x_N)) \to \xi$ becomes
$$\xi \approx \frac{N^{\alpha(1+\delta)[1-\log(1+\delta)]+1-\alpha}}{\sqrt{2\pi\alpha(1+\delta)\log N}}\,\frac{1+\delta}{\delta}\left(1 + O\left(\frac{1}{\log N}\right)\right)$$
and, after taking the logarithm of both sides,
$$\log\xi + (\alpha-1)\log N + \frac{1}{2}\log\log N \approx \alpha(1+\delta)\left[1 - \log(1+\delta)\right]\log N - \frac{1}{2}\log\left(2\pi\alpha(1+\delta)\right) - \log\frac{\delta}{1+\delta} \qquad (6.8)$$
At this point, we will assume that $\delta < 1$, which justifies the expansion $\log(1+\delta) = \delta + O(\delta^2)$. This assumption will be checked later. Thus,
$$(1+\delta)\left[1-\log(1+\delta)\right] \approx (1+\delta)(1-\delta) \approx 1 - \delta^2$$
and $\delta$ must be solved from the resulting equation. The Newton-Raphson iteration can be applied, with a suitable starting value $\delta_0$, to find the solution up to the leading order in $\log N$; this demonstrates that, for $\alpha \geq 1$, the assumption $\delta < 1$ is indeed correct for large $N$. The case $\alpha < 1$ requires the application of Newton-Raphson's method on (6.8) itself, which we omit here. In general, the $k$-th Newton-Raphson iteration improves the previous one by a quantity of order $O\left(\log^{-k} N\right)$.
Since (6.8) is only accurate up to $O\left(\frac{1}{\log N}\right)$, a second iteration is superfluous and we obtain the choice $x_N = \lambda(1+\delta)$, or
$$x_N = \frac{3\alpha+1}{2}\log N - \frac{1}{4}\log\log N - \frac{1}{4}\log(2\pi\alpha) - \frac{1}{2}\log\xi$$
After substituting $x = -\log\xi$, we finally arrive, for $\alpha \geq 1$ and large $N$, at
$$\Pr\left[\max_{1\leq k\leq N-1} H_k \leq \frac{x}{2} + \frac{3\alpha+1}{2}\log N - \frac{1}{4}\log\log N - \frac{1}{4}\log(2\pi\alpha)\right] = e^{-e^{-x}}$$
from which the pdf of the hopcount of the longest shortest path (lsp) follows as
$$f_{\mathrm{lsp}}(x) = 2\,e^{-e^{-2(x-c)}}\,e^{-2(x-c)} \qquad (6.9)$$
with
$$c = \frac{3\alpha+1}{2}\log N - \frac{1}{4}\log\log N - \frac{1}{4}\log(2\pi\alpha) = \left(\frac{3}{2} + \frac{1}{2\alpha}\right)E[H_N] - \frac{1}{4}\log E[H_N] - \frac{1}{4}\log(2\pi)$$
and
$$E[\mathrm{lsp}] = c + \frac{\gamma}{2} = \left(\frac{3}{2} + \frac{1}{2\alpha}\right)E[H_N] - \frac{1}{4}\log E[H_N] - 0.170$$
$$\mathrm{var}[\mathrm{lsp}] = \frac{\pi^2}{24} \simeq 0.4112$$
Observe that the average longest shortest path is about twice the average hopcount if $\alpha = 1$, while the variance is small, constant and independent of the scaling parameter $c$ or $\alpha$. Figure 6.3 compares the above approximate analysis with simulations.

[Fig. 6.3. The hopcount of the longest shortest path for $N = 4000$. Both simulations, based on an Internet-like topology generator (with unit link weights), and the theory $f_{\mathrm{lsp}}(k)$ with fitted parameter $0.4786$ are shown.]

Notes
(i) The classical theory of extremes, extremal properties of dependent sequences and extreme values in continuous time are treated in detail in the book by Leadbetter et al. (1983).
(ii) A more recent book by Embrechts et al. (2001a) applies the theory of extremal events to problems in insurance and finance.

Part II Stochastic processes

7 The Poisson process

The Poisson process is a prominent stochastic process, mainly because it frequently appears in a wealth of physical phenomena and because it is relatively simple to analyze. Therefore, we will first treat the Poisson process before considering the more general Markov processes.
7.1 A stochastic process

7.1.1 Introduction and definitions

A stochastic¹ process, formally denoted as $\{X(t), t \in T\}$, is a sequence of random variables $X(t)$, where the parameter $t$, most often the time, runs over an index set $T$. The state space of the stochastic process is the set of all possible values for the random variables $X(t)$, and each of these possible values is called a state of the process. If the index set $T$ is a countable set, $X[k]$ is a discrete stochastic process. Often $k$ is the discrete time or a time slot in computer systems. If $T$ is a continuum, $X(t)$ is a continuous stochastic process. For example, the outcome of $n$ tosses of a coin is a discrete stochastic process with state space {heads, tails} and index set $T = \{0, 1, 2, \ldots, n\}$. The number of arrivals of packets in a router during a certain time interval $[a,b]$ is a continuous stochastic process because $t \in [a,b]$. Any realization of a stochastic process is called a sample path. For example, a sample path of the outcome of $n$ tosses of a coin is {heads, tails, tails, $\ldots$, heads}, while a sample path of the number of arrivals in $[a,b]$ is $1_{a\leq t<a+h}$, $3\times 1_{a+h\leq t<a+4h}$, $8\times 1_{a+4h\leq t<a+5h}$, $\ldots$, $13\times 1_{a+(k-1)h\leq t<b}$, where $h = \frac{b-a}{k}$. Other examples are the measurement of the temperature each day, the notation of the value of a stock each minute, or rolling a die and recording its value, which is illustrated in Fig. 7.1.

¹ The word "stochastic" is derived from a Greek verb meaning "to aim at, try to hit".

Especially for continuous stochastic processes, it is convenient to define increments as the difference $X(t) - X(u)$. The continuous-time stochastic process $X(t)$ has independent increments if changes in the value of the process in different time intervals are independent or, equivalently, if for all $t_0 < t_1 < \cdots < t_n$, the random variables $X(t_1) - X(t_0), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$ are independent.
The continuous(-time) stochastic process has stationary increments if $X(t+s) - X(s)$ possesses the same distribution for all $s$. Hence, changes in the value of the process only depend on the length $t$ of the interval between process events, not on the time point $s$.

[Fig. 7.1. Two different sample paths of the experiment: roll a die and record the outcome. The total number of different sample paths is $6^T$, where $T$ is the number of times an outcome is recorded. The state space only contains the 6 possible outcomes $\{1,2,3,4,5,6\}$.]

Stochastic processes are distinguished by (a) their state space, (b) the index set $T$ and (c) the dependence relations between the random variables $X(t)$. For example, a standard Brownian motion (or Wiener process)² is defined as a stochastic process $X(t)$ having continuous sample paths and stationary independent increments, where $X(t)$ has a normal distribution $N(0,t)$. A Poisson process, defined in more detail in Section 7.2, is a stochastic process $X(t)$ having discontinuous sample paths and stationary independent increments, where $X(t)$ has a Poisson distribution. A generalization of the Poisson process is a counting process: a counting process is defined as a stochastic process $N(t) \geq 0$ with discontinuous sample paths and stationary independent increments, but with arbitrary distribution. A counting process $N(t)$ represents the total number of events that have occurred in a time interval $[0,t]$. Examples of a counting process are the number of telephone calls at a local exchange during an interval, the number of failures in a telecommunication network, the number of corrupted bits after transmission due to channel errors, etc.

² Harrison (1990) shows that the converse is also true: if $Y$ is a continuous process with stationary independent increments, then $Y$ is a Brownian motion.
7.1.2 Modeling a stochastic process from measurements

In practice, understanding observed phenomena often asks for a stochastic model that captures the main characteristics of the studied phenomena and that enables computations of diverse quantities of interest. Examples in the field of data communications networks are the determination of the arrival process at a switch or router in order to dimension the number of buffer (memory) places, the modeling of the graph of the Internet, the distribution of the duration of a telephone call or web-browsing session, the number of visits to certain websites, the number of links that refer to a web page, the amount of downloaded information, the number of routers traversed by an email, etc. Accurate modeling is in general difficult and often trades off complexity against accuracy of the model.

Let us illustrate some aspects of modeling by considering Internet delay measurements. A motivation for obtaining an end-to-end delay model for (a part of) the Internet is the question whether massive service deployment of voice over IP (VoIP) can substitute classical telephony with a comparable quality. Specifically, classical telephony requires that the end-to-end delay of an arbitrary telephone conversation hardly exceeds 100 ms.

[Fig. 7.2. The raw data of the end-to-end delay $D$ (in ms) of IP test packets along a same path of 13 hops in the Internet, measured from 5:00 a.m. until 8:30 a.m. on 21/11/2002. Interdeparture time = 12 s, hopcount of the IP path = 13, number of measurement points = 1006, $E[D] = 35.03$ ms, $\sigma[D] = 1.36$ ms, $\min[D] = 34.18$ ms, $\max[D] = 53.95$ ms.]

The end-to-end delay along a fixed path between source and destination, measured during some interval, is an example of a continuous-time stochastic process. We have received data of the delay measured at RIPE-NCC, as illustrated in Fig. 7.2.
Figure 7.2 shows a sample path of this continuous stochastic process. The precise details of the measurement configuration are not relevant for the present purpose. It suffices to add that Figure 7.2 shows the time difference between the departure of an IP test packet of 100 bytes at the sending box and its arrival at the destination box, accurate to within 10 μs. The average sending rate of IP test packets is $\frac{1}{12}$ packets per second. Each IP test packet is assumed to follow the same path from sending to receiving box. The steadiness of the path is checked by trace-route measurements every 6 minutes.

Usually, in the next step, the histogram of the raw data is made. A histogram counts the number of data points that lie in an interval of $\Delta$ ms, which is often called the bin size. Most graphical packages allow one to choose the bin size. Figure 7.3 shows two different histograms with bin sizes $\Delta = 0.5$ ms and $\Delta = 0.1$ ms. In general, there is no universal rule to choose the bin size $\Delta$. Clearly, the bin size is bounded below by the measurement accuracy, in our case $\Delta > 10$ μs. A finer bin size provides more detail, but the resulting histogram also exhibits more stochastic variations, because there are fewer data points in a small bin and adjacent bins may possess significantly different amounts of data points. Hence, compared to one larger bin that covers the same interval, less averaging or smoothing occurs in a set of smaller bins. The normalized histogram, obtained by dividing the counts per bin by the total number of data points, provides a first approximation to the probability density function of $D$; however, it is still discrete and approximates $\Pr[k\Delta < D \leq (k+1)\Delta]$. A more precise description of constructing a histogram is given in Section C.1. The histogram is generally better suited to decide whether outliers in the data points may be due to measurement errors or not.
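The bin-size trade-off can be illustrated with a few lines of Python on synthetic delay data (a Gaussian stand-in centered around 35 ms; the RIPE measurement data themselves are not reproduced here):

```python
import random

# Counts per bin over [lo, hi) for a given bin size
def histogram(data, lo, hi, bin_size):
    nbins = int(round((hi - lo) / bin_size))
    counts = [0] * nbins
    for d in data:
        i = int((d - lo) / bin_size)
        if 0 <= i < nbins:
            counts[i] += 1
    return counts

# Synthetic "delays": Gaussian around 35 ms (a stand-in, not the RIPE data)
rng = random.Random(1)
delays = [rng.gauss(35.0, 1.0) for _ in range(1006)]

coarse = histogram(delays, 30.0, 40.0, 0.5)  # smoother, less detail
fine = histogram(delays, 30.0, 40.0, 0.1)    # more detail, noisier
print(len(coarse), len(fine))
# Dividing each count by len(delays) approximates Pr[k*bin < D <= (k+1)*bin].
```

Plotting `coarse` and `fine` side by side reproduces the qualitative effect of Fig. 7.3: the fine histogram resolves more structure but fluctuates more from bin to bin.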
Figure 7.3 suggests either to neglect the data points with $D > 40$ ms or to measure at a higher sending rate of IP test packets in order to have more detail in the intervals exceeding 38 or 40 ms. If there existed a good³ stochastic model for the end-to-end delay along fixed Internet paths, a normal procedure⁴ in engineering and physics would be to fit the histogram with that stochastic model to obtain the parameters of the model. The accuracy of the fit can be expressed in terms of the correlation coefficient $\rho$ explained in Section 2.5.3. The closer $\rho$ tends to 1, the better the fit, which gives confidence that the stochastic model corresponds with the real phenomenon.

³ Which is still lacking at the time of writing.
⁴ Other, more difficult methods in the realm of statistics must be invoked in case the measurement data are so precious and rare that any additional measurement point has a far larger cost than the cost of extensive additional computations.

[Fig. 7.3. The histogram of the end-to-end delay $D$ with a bin size of 0.1 ms (the insert has a bin size of 0.5 ms).]

Assuming that the presented measurement is a typical measurement along a fixed Internet path (which is true for about 80% of the investigated different paths), it demonstrates that there is a clear minimum at about 34 ms due to the propagation delay of electromagnetic waves. In addition, the end-to-end delay lies for 99% between 34 and 38 ms. However, there is insufficient data to pronounce claims about the tail behavior ($\Pr[D > x]$ for $x > 40$ ms). Just this region is of interest to compute the quality of service, expressed as the requirement that the probability that the end-to-end delay exceeds $x$ ms is smaller than $10^{-a}$, where $a$ specifies the stringency of the quality requirement. Toll quality in classical telephony sets $x$ at 100 ms and $a$ in the range of 4 to 5.
The existence of a good stochastic model covering the whole possible range of the end-to-end delay D would enable us to compute tail probabilities based on the parameters fitted from the measurements. The histogram is in fact a projection of the raw measurement data onto the ordinate (end-to-end delay axis): all time information (the abscissa in Fig. 7.2) is lost. Usually, the time evolution and the dependencies or correlations over time of a stochastic phenomenon are difficult, and most analyses are only tractable under certain simplifying conditions. For example, often only a steady-state analysis is possible, and the increments X(t_k) − X(t_{k−1}) of the process, for all t_0 < · · · < t_{k−1} < t_k < · · · < t_n, are assumed to be independent or weakly dependent. The study of Markov processes (Chapters 9–11) basically tries to compute and analyze the process in steady state. Figure 7.2 is measured over a relatively long period of time and indicates that after 8.00 a.m. the background traffic increases. The background traffic interferes with the IP test packets and causes them to queue longer in routers, such that larger variations are observed. However, it is in general difficult to ascertain that (a part of) the measurement is performed while the system operates in a certain stable regime (or steady state). We have touched upon some aspects of the art of modeling to motivate the importance of studying stochastic processes. In the sequel of this chapter, one of the most basic and simplest stochastic processes is investigated.

7.2 The Poisson process

A Poisson process with parameter or rate λ > 0 is an integer-valued, continuous-time stochastic process {X(t), t ≥ 0} satisfying

(i) X(0) = 0
(ii) for all t_0 = 0 < t_1 < · · · <
t_n, the increments X(t_1) − X(t_0), X(t_2) − X(t_1), . . . , X(t_n) − X(t_{n−1}) are independent random variables
(iii) for t ≥ 0, s > 0 and non-negative integers k, the increments have the Poisson distribution

Pr[X(t + s) − X(s) = k] = (λt)^k e^{−λt} / k!   (7.1)

It is convenient to view the Poisson process X(t) as a special counting process, where the number of events in any interval of length t is specified via condition (iii). From this definition, a number of properties can be derived:

(a) Condition (iii) implies that the increments are stationary, because the right-hand side does not depend on s. In other words, the increments only depend on the length t of the interval and not on the time s when the interval begins. Further, with (3.11), the mean E[X(t + s) − X(s)] = λt and, because the increments are stationary, this holds for any value of s. In particular, with s = 0 and condition (i), the expected number of events in a time interval of length t is

E[X(t)] = λt   (7.2)

Relation (7.2) explains why λ is called the rate of the Poisson process, namely, the derivative over time t, or the number of events per time unit.

(b) The probability that exactly one event occurs in an arbitrarily small time interval of length h follows from condition (iii) as

Pr[X(h + s) − X(s) = 1] = λh e^{−λh} = λh + o(h)

while the probability that no event occurs in an arbitrarily small time interval of length h is

Pr[X(h + s) − X(s) = 0] = e^{−λh} = 1 − λh + o(h)

Similarly, the probability that more than one event occurs in an arbitrarily small time interval of length h is

Pr[X(h + s) − X(s) > 1] = o(h)

Example 1  A conversation in a wireless ad-hoc network is severely disturbed by interference signals according to a Poisson process of rate λ = 0.1 per minute. (a) What is the probability that no interference signals occur within the first two minutes of the conversation?
(b) Given that the first two minutes are free of disturbing effects, what is the probability that in the next minute precisely 1 interfering signal disturbs the conversation?

(a) Let X(t) denote the Poisson interference process; then Pr[X(2) = 0] needs to be computed. Since X(0) = 0 and with (7.1), we can write Pr[X(2) = 0] = Pr[X(2) − X(0) = 0] = e^{−2λ}, which equals Pr[X(2) = 0] = e^{−0.2} = 0.8187.

(b) The events during two non-overlapping intervals of a Poisson process are independent. Thus the event {X(2) − X(0) = 0} is independent of the event {X(3) − X(2) = 1}, which means that the asked conditional probability Pr[X(3) − X(2) = 1 | X(2) − X(0) = 0] = Pr[X(3) − X(2) = 1]. From (7.1), we obtain Pr[X(3) − X(2) = 1] = 0.1 e^{−0.1} = 0.0905.

Example 2  During a certain time interval [t_1, t_1 + 10 s], the number of IP packets that arrive at a router is on average 40/s. A service provider asks us to compute the probability that 20 packets arrive in the period [t_1, t_1 + 1 s] and 30 IP packets in [t_1, t_1 + 3 s]. We may regard the arrival process as a Poisson process. We are asked to compute Pr[X(1) = 20, X(3) = 30] knowing that λ = 40 s^{−1}. Using the independence of increments and (7.1), we rewrite

Pr[X(1) = 20, X(3) = 30] = Pr[X(1) − X(0) = 20, X(3) − X(1) = 10]
= Pr[X(1) − X(0) = 20] Pr[X(3) − X(1) = 10]
= ((λ)^{20} e^{−λ} / 20!) ((2λ)^{10} e^{−2λ} / 10!) ≈ 10^{−26} ≈ 0

which means that the request of the service provider does not occur in practice.

7.3 Properties of the Poisson process

The first theorem is the converse of the above property (b) that immediately followed from the definition. The theorems presented here reveal the methodology of how stochastic processes are studied.

Theorem 7.3.1  A counting process N(t) that satisfies the conditions (i) N(0) = 0, (ii) the process N(t) has stationary and independent increments, (iii) Pr[N(h) = 1] = λh + o(h) and (iv) Pr[N(h) > 1] = o(h) is a Poisson process with rate λ > 0.
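Before turning to the proof, the numbers in Examples 1 and 2 above can be checked directly from the Poisson distribution (7.1); this small sketch only re-evaluates those formulas:

```python
from math import exp, factorial

def poisson_pmf(k, lam, t):
    """Pr[X(t+s) - X(s) = k] for a Poisson process of rate lam, Eq. (7.1)."""
    return (lam * t) ** k * exp(-lam * t) / factorial(k)

# Example 1: interference at rate 0.1 per minute.
p_no_interference = poisson_pmf(0, 0.1, 2)   # Pr[X(2) = 0] = e^{-0.2}
p_one_next_minute = poisson_pmf(1, 0.1, 1)   # Pr[X(3) - X(2) = 1]

# Example 2: lam = 40 packets/s; independent increments factorize the joint event.
p_joint = poisson_pmf(20, 40, 1) * poisson_pmf(10, 40, 2)
```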
Proof: We must show that conditions (iii) and (iv) are equivalent to condition (iii) in the definition of the Poisson process. Denote P_n(t) = Pr[N(t) = n] and consider first the case n = 0; then

P_0(t + h) = Pr[N(t + h) = 0] = Pr[N(t + h) − N(t) = 0, N(t) = 0]

Invoking independence via (ii),

P_0(t + h) = Pr[N(t + h) − N(t) = 0] Pr[N(t) = 0]

By definition, P_0(t) = Pr[N(t) = 0] and, from (iii), (iv) and the fact that Σ_{k=0}^{∞} Pr[N(h) = k] = 1, it follows that

Pr[N(h) = 0] = 1 − λh + o(h)   (v)

Combining these with the stationarity in (ii), we obtain P_0(t + h) = P_0(t)(1 − λh + o(h)) or

(P_0(t + h) − P_0(t))/h = −λ P_0(t) + o(h)/h

from which, in the limit h → 0, the differential equation P_0'(t) = −λ P_0(t) is immediate. The solution is P_0(t) = C e^{−λt} and the integration constant C follows from (i) and P_0(0) = Pr[N(0) = 0] = 1 as C = 1. This establishes condition (iii) in the definition of the Poisson process for k = 0. The verification for n > 0 is more involved.
Applying the law of total probability (2.46),

P_n(t + h) = Pr[N(t + h) = n] = Σ_{j=0}^{n} Pr[N(t + h) − N(t) = j | N(t) = n − j] Pr[N(t) = n − j]

By independence (ii), Pr[N(t + h) − N(t) = j | N(t) = n − j] = Pr[N(t + h) − N(t) = j] and, by definition, Pr[N(t) = n − j] = P_{n−j}(t), so we have

P_n(t + h) = Σ_{j=0}^{n} Pr[N(t + h) − N(t) = j] P_{n−j}(t)

By the stationarity in (ii), Pr[N(t + h) − N(t) = j] = Pr[N(h) − N(0) = j], and we obtain, using (i),

P_n(t + h) = Σ_{j=0}^{n} Pr[N(h) = j] P_{n−j}(t)

while (v) and (iii) suggest writing the sum as

P_n(t + h) = P_n(t) Pr[N(h) = 0] + P_{n−1}(t) Pr[N(h) = 1] + Σ_{j=2}^{n} P_{n−j}(t) Pr[N(h) = j]

Since P_n(t) ≤ 1 and using (iv),

Σ_{j=2}^{n} P_{n−j}(t) Pr[N(h) = j] ≤ Σ_{j=2}^{n} Pr[N(h) = j] = Pr[N(h) > 1] = o(h)

we arrive with (v) and (iii) at

P_n(t + h) = P_n(t)(1 − λh + o(h)) + P_{n−1}(t)(λh + o(h)) + o(h)

or

(P_n(t + h) − P_n(t))/h = −λ P_n(t) + λ P_{n−1}(t) + o(h)/h

which leads, after taking the limit h → 0, to the differential equation

P_n'(t) = −λ P_n(t) + λ P_{n−1}(t)

with initial condition P_n(0) = Pr[N(0) = n] = 1_{n=0}. This differential equation is rewritten as

d/dt (e^{λt} P_n(t)) = λ e^{λt} P_{n−1}(t)   (7.3)

In case n = 1, the differential equation reduces with P_0(t) = e^{−λt} to d/dt (e^{λt} P_1(t)) = λ. The general solution is e^{λt} P_1(t) = λt + C and, from the initial condition P_1(0) = 0, we have C = 0 and P_1(t) = λt e^{−λt}. The general solution to (7.3) is proved by induction. Assume that P_n(t) = (λt)^n e^{−λt}/n! holds for n; then the case n + 1 follows from (7.3) as

d/dt (e^{λt} P_{n+1}(t)) = λ (λt)^n / n!

and integrating from 0 to t, using P_{n+1}(0) = 0, yields P_{n+1}(t) = (λt)^{n+1} e^{−λt}/(n + 1)!, which establishes the induction and finalizes the proof of the theorem. □

The second theorem has very important applications, since it relates the number of events in non-overlapping intervals to the interarrival times between these events.

Theorem 7.3.2  Let {X(t), t ≥ 0} be a Poisson process with rate λ > 0 and denote by t_0 = 0 < t_1 < t_2 < · · · the successive occurrence times of events.
Then the interarrival times τ_n = t_n − t_{n−1} are independent, identically distributed exponential random variables with mean 1/λ.

Proof: For any s ≥ 0 and any n ≥ 1, the event {τ_n > s} is equivalent to the event {X(t_{n−1} + s) − X(t_{n−1}) = 0}. Indeed, the n-th interarrival time τ_n can only be longer than s time units if and only if the n-th event has not yet occurred s time units after the occurrence of the (n − 1)-th event at t_{n−1}. Since the Poisson process has independent increments (condition (ii) in the definition of the Poisson process), changes in the value of the process in non-overlapping time intervals are independent. By the equivalence of events, this implies that the interarrival times τ_n are independent random variables. Further, by the stationarity of the Poisson process (deduced from condition (iii) in the definition of the Poisson process),

Pr[τ_n > s] = Pr[X(t_{n−1} + s) − X(t_{n−1}) = 0] = e^{−λs}

which implies that any interarrival time has an identical, exponential distribution,

F_{τ_n}(x) = Pr[τ_n ≤ x] = 1 − e^{−λx}

This proves the theorem. □

The converse of Theorem 7.3.2 also holds: if the interarrival times {τ_n} of a counting process {N(t), t ≥ 0} are i.i.d. exponential random variables with mean 1/λ, then {N(t), t ≥ 0} is a Poisson process with rate λ.
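The converse can be illustrated by simulation: generating i.i.d. exponential interarrival times with mean 1/λ and counting events up to time t should reproduce a mean count close to E[X(t)] = λt as in (7.2). A minimal sketch (seed, rate and sample size are arbitrary choices):

```python
import random

def poisson_count(lam, t, rng):
    """Count events up to time t when the interarrival times are i.i.d.
    exponential with mean 1/lam (Theorem 7.3.2 and its converse)."""
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:
        n += 1
        clock += rng.expovariate(lam)
    return n

rng = random.Random(42)
lam, t = 2.0, 5.0
counts = [poisson_count(lam, t, rng) for _ in range(20000)]
mean_count = sum(counts) / len(counts)   # should approach lam * t = 10
```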
An association to the exponential distribution is the memoryless property,

Pr[τ_n > s + t | τ_n > s] = Pr[τ_n > t]

By the equivalence of the events, for any t, s ≥ 0,

Pr[τ_n > s + t | τ_n > s] = Pr[X(t_{n−1} + s + t) − X(t_{n−1}) = 0 | X(t_{n−1} + s) − X(t_{n−1}) = 0]
= Pr[X(t_{n−1} + s + t) − X(t_{n−1} + s) = 0 | X(t_{n−1} + s) − X(t_{n−1}) = 0]

By the independence of increments (in non-overlapping intervals),

Pr[τ_n > s + t | τ_n > s] = Pr[X(t_{n−1} + s + t) − X(t_{n−1} + s) = 0]

and, by the stationarity of the increments, the memoryless property is established,

Pr[τ_n > s + t | τ_n > s] = Pr[X(t_{n−1} + t) − X(t_{n−1}) = 0] = Pr[τ_n > t]

Hence, the assumption of stationary and independent increments is equivalent to asserting that, at any time s, the process probabilistically restarts with the same distribution, independently of occurrences in the past (before s). Thus, the process has no memory and, since the only continuous distribution that satisfies the memoryless property is the exponential distribution, exponential interarrival times τ_n are a natural consequence.

The arrival time of the n-th event, or the waiting time until the n-th event, is W_n = Σ_{k=1}^{n} τ_k. In Section 3.3.1, it is shown that the probability distribution of the sum of independent exponential random variables has a Gamma or Erlang distribution (3.24). Alternatively, the equivalence of the events {W_n ≤ t} ⇔ {N(t) ≥ n} directly leads to the Erlang distribution,

F_{W_n}(t) = Pr[W_n ≤ t] = Pr[N(t) ≥ n] = Σ_{k=n}^{∞} (λt)^k e^{−λt} / k!

The equivalence of the events {W_n ≤ t} ⇔ {N(t) ≥ n} is a general relation and a fundamental part of the theory of renewal processes, which we will study in the next Chapter 8.

Theorem 7.3.3  Given that exactly one event of a Poisson process {X(t), t ≥ 0} has occurred during the interval [0, t], the time of occurrence of this event is uniformly distributed over [0, t].
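Before proving the theorem, the Erlang identity above can be checked numerically: integrating the Erlang density of W_n over (0, t] must give the Poisson tail sum. A small sketch (λ, n and t chosen arbitrarily; midpoint-rule quadrature):

```python
from math import exp, factorial

def erlang_cdf_by_integration(n, lam, t, steps=100000):
    """Numerically integrate the Erlang density lam^n u^{n-1} e^{-lam u}/(n-1)!
    over (0, t], i.e. Pr[W_n <= t] for a sum of n i.i.d. exponentials."""
    du = t / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du   # midpoint rule
        total += lam ** n * u ** (n - 1) * exp(-lam * u) / factorial(n - 1) * du
    return total

def poisson_tail(n, lam, t):
    """Pr[N(t) >= n] = sum_{k>=n} (lam t)^k e^{-lam t}/k!."""
    return 1.0 - sum((lam * t) ** k * exp(-lam * t) / factorial(k) for k in range(n))

lam, n, t = 2.0, 3, 1.5
lhs = erlang_cdf_by_integration(n, lam, t)
rhs = poisson_tail(n, lam, t)   # the two agree up to quadrature error
```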
Proof: Immediate application of the conditional probability (2.44) yields, for 0 ≤ s ≤ t,

Pr[t_1 ≤ s | X(t) = 1] = Pr[{t_1 ≤ s} ∩ {X(t) = 1}] / Pr[X(t) = 1]

Using the equivalence, on the event {X(t) = 1}, of {t_1 ≤ s} and {X(s) = 1}, which follows from the stationarity of the Poisson process, gives

{t_1 ≤ s} ∩ {X(t) = 1} = {X(s) = 1} ∩ {X(t) = 1} = {X(s) = 1} ∩ {X(t) − X(s) = 0}

Applying the independence of increments over non-overlapping intervals and (7.1) yields

Pr[t_1 ≤ s | X(t) = 1] = Pr[X(s) = 1] Pr[X(t) − X(s) = 0] / Pr[X(t) = 1]
= (λs) e^{−λs} e^{−λ(t−s)} / ((λt) e^{−λt}) = s/t

which completes the proof. □

Theorem 7.3.3 is immediately generalized to n events. For any set of real variables s_j satisfying 0 = s_0 < s_1 < s_2 < · · · < s_n < t, and given that n events of a Poisson process {X(t), t ≥ 0} have occurred during the interval [0, t], the probability of the successive occurrence times 0 < t_1 < t_2 < · · · < t_n < t of these n Poisson events is

Pr[t_1 ≤ s_1, . . . , t_n ≤ s_n | X(t) = n] = Pr[{t_1 ≤ s_1, . . . , t_n ≤ s_n} ∩ {X(t) = n}] / Pr[X(t) = n]

Using a similar argument as in the proof of Theorem 7.3.3,

p = Pr[{t_1 ≤ s_1, t_2 ≤ s_2, . . . , t_n ≤ s_n} ∩ {X(t) = n}]
= Pr[X(s_1) − X(s_0) = 1, . . . , X(s_n) − X(s_{n−1}) = 1, X(t) − X(s_n) = 0]
= (Π_{j=1}^{n} Pr[X(s_j) − X(s_{j−1}) = 1]) Pr[X(t) − X(s_n) = 0]
= (Π_{j=1}^{n} λ(s_j − s_{j−1}) e^{−λ(s_j − s_{j−1})}) e^{−λ(t − s_n)}
= λ^n Π_{j=1}^{n} (s_j − s_{j−1}) e^{−λ Σ_{j=1}^{n} (s_j − s_{j−1}) − λ(t − s_n)} = λ^n Π_{j=1}^{n} (s_j − s_{j−1}) e^{−λt}

Thus,

Pr[t_1 ≤ s_1, t_2 ≤ s_2, . . . , t_n ≤ s_n | X(t) = n] = λ^n Π_{j=1}^{n} (s_j − s_{j−1}) e^{−λt} / ((λt)^n e^{−λt} / n!) = (n!/t^n) Π_{j=1}^{n} (s_j − s_{j−1})

from which the density function

f_{t_1,...,t_n}(s_1, . . . , s_n | X(t) = n) = ∂^n Pr[t_1 ≤ s_1, . . . , t_n ≤ s_n | X(t) = n] / (∂s_1 · · · ∂s_n)

follows as

f_{t_1,...,t_n}(s_1, s_2, . . . , s_n | X(t) = n) = n!/t^n

which is independent of the rate λ. If 0 < t_1 < t_2 < · · · < t_n <
t are the successive occurrence times of n Poisson events in the interval [0, t], then the random variables t_1, t_2, . . . , t_n are distributed as a set of order statistics, defined in Section 3.4.2, of n uniform random variables on [0, t]. In other words, if n i.i.d. uniform random variables on [0, t] are sorted in increasing order, they may represent n successive occurrence times of a Poisson process. The average spacing between these n ordered i.i.d. uniform random variables is t/n, as computed in Problem (ii) of Section 3.7. A related example is the conditional probability, where 0 < s < t and 0 ≤ k ≤ n,

Pr[X(s) = k | X(t) = n] = Pr[{X(s) = k} ∩ {X(t) = n}] / Pr[X(t) = n]
= Pr[{X(s) = k} ∩ {X(t) − X(s) = n − k}] / Pr[X(t) = n]
= Pr[X(s) = k] Pr[X(t) − X(s) = n − k] / Pr[X(t) = n]
= n! (λs)^k e^{−λs} (λ(t − s))^{n−k} e^{−λ(t−s)} / (k! (n − k)! (λt)^n e^{−λt})
= C(n, k) s^k (t − s)^{n−k} / t^n

Hence, if p = s/t, the conditional probability becomes

Pr[X(s) = k | X(t) = n] = C(n, k) p^k (1 − p)^{n−k}

Given that a total number of n Poisson events have occurred in the time interval [0, t], the chance that precisely k events have taken place in the sub-interval [0, s] is binomially distributed with parameters n and p = s/t. Observe that this conditional probability is also independent of the rate λ. In addition, since lim_{t→∞} X(t) = ∞ such that n → ∞, applying the law of rare events results in

lim_{t→∞} Pr[X(s) = k | X(t) = n] = (λs)^k e^{−λs} / k!

Given an everlasting Poisson process, the chance that precisely k events occur in the interval [0, s] is Poisson distributed with mean λs, proportional to the length of the interval.

Application  The arrival process of most real-time applications (such as telephony calls, interactive video, . . .) in a network is well approximated by a Poisson process. Suppose a measurement configuration is built to collect statistics of the arrival process of telephony calls in some region. During a period [0, T], precisely 1 telephony call has been measured.
What can be said of the time x ∈ [0, T] at which the telephony call has arrived at the measurement device? Theorem 7.3.3 tells us that any time in that interval is equally probable.

Theorem 7.3.4  If X(t) and Y(t) are two independent Poisson processes with rates λ_x and λ_y, then Z(t) = X(t) + Y(t) is also a Poisson process with rate λ_x + λ_y.

Proof: It suffices to demonstrate that the counting process N_Z(t) = N_X(t) + N_Y(t) has exponentially distributed interarrival times τ_Z. Suppose that N_Z(t_n) = n; it remains to compute the next arrival at time t_{n+1} = t_n + s for which N_Z(t_n + s) = n + 1. Due to the memoryless property of the Poisson process, the time until the occurrence of an event from t_n on is, for each of the random variables X and Y, again exponentially distributed with parameter λ_x and λ_y, respectively. In other words, it is irrelevant which process, X or Y, has previously caused the arrival at time t_n. Further, the event that the interarrival time of the sum process satisfies {τ_Z > s} is equivalent to {τ_X > s} ∩ {τ_Y > s}, or

Pr[τ_Z > s] = Pr[τ_X > s, τ_Y > s] = Pr[τ_X > s] Pr[τ_Y > s] = e^{−(λ_x + λ_y)s}

where the independence of X(t) and Y(t) has been used. This proves the theorem. □

A direct consequence is that any sum of independent Poisson processes is also a Poisson process with aggregate rate equal to the sum of the individual rates. This theorem is in correspondence with the sum property of the Poisson distribution.

7.4 The nonhomogeneous Poisson process

As will be shown later in Section 11.3.2, the Poisson process is a special case of a birth-and-death process, which is in turn a special case of a Markov process. Hence, it seems more instructive to discuss these special processes as applications of the Markov process. Therefore, only associations to the Poisson process are treated here. In many cases, the rate is a time-variant function λ(t), and such a process is termed a nonhomogeneous or nonstationary Poisson process.
For example, the arrival rate of a large number m of individual IP flows at a router is well approximated by a nonhomogeneous Poisson process, where the rate λ(t) varies over the day depending on the number m and the individual rate of each flow of packets. Since the sum of independent Poisson random variables is again a Poisson random variable, we have λ(t) = Σ_{j=1}^{m(t)} λ_j(t). If X(t) is a nonhomogeneous Poisson process with rate λ(t), the increment X(t) − X(s) reflects the number of events in an interval (s, t], and increments of non-overlapping intervals are still independent.

Theorem 7.4.1  If Λ(t) = ∫_0^t λ(u) du and s < t, then X(t) − X(s) is Poisson distributed with mean Λ(t) − Λ(s).

The demonstration is analogous to the proof of Theorem 7.3.1.

Proof (partly): Denote P_n(t) = Pr[N(t) − N(s) = n]; then

P_0(t + h) = Pr[N(t + h) − N(s) = 0] = Pr[N(t + h) − N(t) = 0, N(t) − N(s) = 0]

Invoking independence of the increments,

P_0(t + h) = Pr[N(t + h) − N(t) = 0] Pr[N(t) − N(s) = 0] = P_0(t)(1 − λ(t)h + o(h))

or

(P_0(t + h) − P_0(t))/h = −λ(t) P_0(t) + o(h)/h

from which, in the limit h → 0, the differential equation P_0'(t) = −λ(t) P_0(t) is immediate. Rewritten as d/dt log P_0(t) = −λ(t), after integration over (s, t] we find log P_0(t) = −(Λ(t) − Λ(s)), since P_0(s) = Pr[N(s) − N(s) = 0] = 1. Thus, for the case n = 0, we find P_0(t) = exp[−(Λ(t) − Λ(s))], which proves the theorem for n = 0. The remainder of the proof (n > 0) uses the same ingredients as the proof of Theorem 7.3.1 and is omitted. □

A nonhomogeneous Poisson process X(t) with rate λ(t) can be transformed to a homogeneous Poisson process Y(u) with rate 1 by the time transform u = Λ(t). For, Y(u) = Y(Λ(t)) = X(t), and Y(u + Δu) = Y(Λ(t) + ΔΛ(t)) = X(t + Δt), because ΔΛ(t) = λ(t)Δt for small Δt, such that

Pr[Y(u + Δu) − Y(u) = 1] = Pr[X(t + Δt) − X(t) = 1] = λ(t)Δt + o(Δt) = Δu + o(Δu)

because Δu = λ(t)Δt + o(Δt). Hence, all problems concerning nonhomogeneous Poisson processes can be reduced to the homogeneous case treated above.
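The time transform can also be used in reverse to simulate a nonhomogeneous process: generate a rate-1 homogeneous process in u and map each event time back through t = Λ^{−1}(u). A hedged sketch with the arbitrary choice λ(t) = 2t, so that Λ(t) = t² and Λ^{−1}(u) = √u; by Theorem 7.4.1 the mean count over (0, T] is then Λ(T) = T²:

```python
import math
import random

def nonhomogeneous_count(T, rng):
    """Count events of a nonhomogeneous Poisson process with rate
    lam(t) = 2t on (0, T]: a homogeneous rate-1 process in u = Lambda(t) = t^2
    maps to event times t = sqrt(u)."""
    n, u = 0, rng.expovariate(1.0)
    while math.sqrt(u) <= T:
        n += 1
        u += rng.expovariate(1.0)
    return n

rng = random.Random(3)
T = 2.0
counts = [nonhomogeneous_count(T, rng) for _ in range(20000)]
mean_count = sum(counts) / len(counts)   # should approach Lambda(T) = T^2 = 4
```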
7.5 The failure rate function

Previous sections have shown that the Poisson process is specified by a rate function λ(t). In this section, we consider the failure rate function of some object or system. Often it is interesting to know the probability that an object will fail in the interval [t, t + Δt], given that the object was still functioning well up to time t. Let X denote the lifetime of an object^5; then this probability can be written with (2.44) as

Pr[t ≤ X ≤ t + Δt | X > t] = Pr[{t ≤ X ≤ t + Δt} ∩ {X > t}] / Pr[X > t] = Pr[t < X ≤ t + Δt] / Pr[X > t]

If f_X(t) is the probability density function of X and F_X(t) = Pr[X ≤ t], then for small Δt and assuming that f_X(t) is well behaved^6 such that Pr[t < X ≤ t + Δt] ≈ f_X(t)Δt,

Pr[t ≤ X ≤ t + Δt | X > t] ≈ (f_X(t) / (1 − F_X(t))) Δt

This expression shows that

r(t) = f_X(t) / (1 − F_X(t))   (7.4)

can be interpreted as the intensity or rate at which a t-year-old object will fail. It is called the failure rate r(t), and

R(t) = 1 − F_X(t) = Pr[X > t]   (7.5)

is usually termed^7 the reliability function. Since r(t) ≈ Pr[t ≤ X ≤ t + Δt | X > t]/Δt for small Δt, the failure rate r(t) > 0, because r(t) = 0 would imply an infinite lifetime X. Using the definition (2.30) of a probability density function, we observe that

r(t) = −(dR(t)/dt) / R(t) = −d ln R(t)/dt   (7.6)

Or, since R(0) = 1, the corresponding integrated relation is

R(t) = exp[−∫_0^t r(u) du]   (7.7)

The expressions (7.6) and (7.7) are inverse relations that specify r(t) as a function of R(t) and vice versa. The reliability function R(t) is non-increasing, with maximum at t = 0, since its complement 1 − R(t) is a probability distribution function.

5 In medical sciences, X can represent in general the time for a certain event to occur: for example, the time it takes for an organism to die, the time to recover from illness, the time for a patient to respond to a therapy, and so on.
6 Recall the discussion in Section 2.3.
On the other hand, the failure rate r(t), not being a probability density function, can take any positive real value. From (7.4) we obtain the density function of the lifetime X in terms of the failure rate r(t) as

f_X(t) = r(t)R(t) = r(t) exp[−∫_0^t r(u) du]

with f_X(0) = r(0). Using the tail relation (2.35) for the expectation of the lifetime X immediately gives the mean time to failure,

E[X] = ∫_0^∞ R(t) dt   (7.8)

In case F_X(T) = 1 and f_X(T) ≠ 0 for a finite time T, which is the maximum lifetime, the definition (7.4) demonstrates that r(t) has a pole at t = T. In practice, the failure rate r(t) is relatively high for small t, due to initial imperfections that cause a number of objects to fail early, and r(t) is increasing towards the maximum lifetime T due to aging or wear and tear. This shape of r(t), as illustrated in Fig. 7.4, is called a "bath-tub" curve, which is convex.

Fig. 7.4. Example of a "bath-tub" shaped failure rate function r(t).

An often used model for the failure rate is r(t) = aλ^a t^{a−1}, with corresponding reliability function R(t) = exp[−(λt)^a], where the lifetime X has a Weibull distribution function F_X(t) = 1 − R(t) as in (3.40). In case a = 1, the failure rate r(t) = λ is constant over time, while a > 1 (a < 1) reflects an increasing (decreasing) failure rate over time. Hence, a "bath-tub" shaped (realistic) failure function as in Fig. 7.4 can be modeled by a Weibull model for r(t) with a < 1 in the beginning, a = 1 in the middle and a > 1 at the end of the lifetime. For an exponential lifetime, where f_X(t) = λe^{−λt}, the failure rate (7.4) equals r(t) = λ and is independent of time. This means that the failure rate for a t-year-old object is the same as for a new object, which is a manifestation of the memoryless property of the exponential distribution.

7 In biology, medical sciences and physics, R(t) is called the survival function and r(t) the corresponding mortality rate or hazard rate.
It also explains why λ, in both the exponential distribution and the Poisson process, is often called a 'rate'.

7.6 Problems

(i) A series of test strings, each with a variable number N of bits all equal to 1, are transmitted over a channel. Due to transmission errors, each 1-bit can be affected independently of the others and only arrives non-corrupted with probability p. The length N of the test strings (words) is a Poisson random variable with mean length λ bits. In this test, the sum Y of the bits in the arriving words is investigated to determine the channel quality via p. Compute the pdf of Y.

(ii) At a router, four QoS classes are supported, and for each class packets arrive according to a Poisson process with rate λ_j for j = 1, 2, 3, 4. Suppose that the router had a failure at time t_1 that lasted T time units. What is the probability density function of the total number of packets of the four classes that has arrived during that period?

(iii) Let N(t) = N_1(t) + N_2(t) be the sum of two independent Poisson processes with rates λ_1 and λ_2. Given that the process N(t) had an arrival, what is the probability that that arrival came from the process N_1(t)?

(iv) Peter has been monitoring the highway for nearly his entire life and found that the cars pass his house according to a Poisson process. Moreover, he discovered that the Poisson process in one lane is independent from that in the other lanes. The rate of these independent processes differs per lane and is denoted by λ_1, λ_2, λ_3, where λ_j is expressed in the number of cars on lane j per hour. (a) Given that one car passed Peter, what is the probability that it passed in lane 1? (b) What is the probability that n cars pass Peter in 1 hour? (c) What is the probability that in 1 hour n cars have passed and that they all have used lane 1?

(v) In a game, audio signals arrive in the interval (0, T) according to a Poisson process with rate λ, where T > 1/λ.
The player wins only if at least one audio signal arrives in that interval, and if he or she pushes a button (only one push allowed) upon the last of the signals. The player uses the following strategy: he or she pushes the button upon the arrival of the first signal (if any) after a fixed time s ≤ T. (a) What is the probability that the player wins? (b) Which value of s maximizes the probability of winning, and what is the probability in that case?

(vi) The arrivals of voice over IP (VoIP) packets at a router are close to a Poisson process with rate λ = 0.1 packets per minute. Due to an upgrade to install weighted fair queueing as priority scheduling rule, the router is switched off for 10 minutes. (a) What is the probability of receiving no VoIP packets while switched off? (b) What is the probability that more than ten VoIP packets will arrive during this upgrade? (c) If there was one VoIP packet in the meantime, what is the most probable minute of the arrival?

(vii) A link of a packet network carries on average ten packets per second. The packets arrive according to a Poisson process. A packet has a probability of 30% of being an acknowledgment (ACK) packet, independently of the others. The link is monitored during an interval of 1 second. (a) What is the probability that at least one ACK packet has been observed? (b) What is the expected number of all packets given that five ACK packets have been spotted on the link? (c) Given that eight packets have been observed in total, what is the probability that two of them are ACK packets?

(viii) An ADSL helpdesk treats exclusively customer requests of one of three types: (i) login problems, (ii) ADSL hardware and (iii) ADSL software problems. The opening hours of the helpdesk are from 8:00 until 16:00.
All requests arrive at the helpdesk according to Poisson processes with different rates: λ_1 = 8 requests with login problems/hour, λ_2 = 6 requests with hardware problems/hour, and λ_3 = 6 requests with software problems/hour. The Poisson arrival processes for the different types of requests are independent. (a) What is the expected number of requests in one day? (b) What is the probability that in 20 minutes exactly three requests arrive, and that all of them have hardware problems? (c) What is the probability that no requests will arrive in the last 15 minutes of the opening hours? (d) What is the probability that one request arrives between 10:00 and 10:12 and two requests arrive between 10:06 and 10:30? (e) If at the moment t + s there are k + m requests, what is the probability that there were k requests at the moment t?

(ix) Arrival of virus attacks at a PC can be modeled by a Poisson process with rate λ = 6 attacks per hour. (a) What is the probability that exactly one attack will arrive between 1 p.m. and 2 p.m.? (b) Suppose that at the moment the PC is turned on there were no attacks on the PC, but at the shut-down time precisely 60 attacks have been observed. What is the expected amount of time that the PC has been on? (c) Given that six attacks arrive between 1 p.m. and 2 p.m., what is the probability that the fifth attack will arrive between 1:30 p.m. and 2 p.m.? (d) What is the expected arrival time of that fifth attack?

(x) Consider a system S consisting of n subsystems in series as shown in Fig. 7.5. The system S operates correctly only if all subsystems operate correctly. Assume that the probability that a failure in a subsystem S_i occurs is independent of that in subsystem S_j. Given the reliability functions R_j(t) of each subsystem S_j, compute the reliability function R(t) of the system S.

Fig. 7.5. A system consisting of n subsystems in series.
(xi) Same question as in the previous exercise, but applied to a system S consisting of n subsystems in parallel as shown in Fig. 7.6.

Fig. 7.6. A system consisting of n subsystems in parallel.

8 Renewal theory

A renewal process is a counting process for which the interarrival times τ_n are i.i.d. random variables with distribution F_τ(t). Hence, a renewal process generalizes the exponential interarrival times of the Poisson process (see Theorem 7.3.2) to an arbitrary distribution. Since the interarrival times are i.i.d. random variables, at each event (or renewal) the process probabilistically restarts. The classical example of a renewal process is the successive replacement of light bulbs: the first bulb is installed at time W_0, fails at time W_1 = τ_1, and is immediately exchanged for a new bulb, which in turn fails at W_2 = τ_1 + τ_2 and is thereafter replaced by a third bulb, and so on. How many light bulbs are replaced in a period of t time units, given the lifetime distribution F_τ(t)?

Fig. 8.1. The relation between the renewal counting process N(t), the interarrival times τ_n and the waiting times W_n.

8.1 Basic notions

As illustrated in Fig. 8.1, the waiting time W_n = Σ_{k=1}^{n} τ_k (for n ≥ 1, with W_0 = 0 by convention) is related to the counting process {N(t), t ≥ 0} by the equivalence {N(t) ≥ n} ⇔ {W_n ≤ t}: the number of events (renewals) up to time t is at least n if and only if the n-th renewal occurred on or before time t. Alternatively, the number of events by time t equals the largest value of n for which the n-th event occurs before or at time t, N(t) = max[n : W_n ≤ t]. The convention W_0 = 0 implies that N(0) = 0: the counting process starts counting from zero at time 0. The main objective of renewal theory is to deduce properties of the process {N(t), t ≥ 0} as a function of the interarrival distribution F_τ(t) = Pr[τ ≤ t].
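The light-bulb question above can be explored by simulation for a concrete (assumed) lifetime distribution. Here the lifetimes are taken uniform on [0, 2] hours, an arbitrary illustrative choice with mean lifetime 1, so that roughly t renewals occur by time t (a statement made precise by the limit theorems of Section 8.2):

```python
import random

def renewals_by(t, rng):
    """Number of renewals N(t): i.i.d. uniform(0, 2) lifetimes are summed
    until the waiting time W_n first exceeds t."""
    n, w = 0, rng.uniform(0.0, 2.0)
    while w <= t:
        n += 1
        w += rng.uniform(0.0, 2.0)
    return n

rng = random.Random(5)
t = 100.0
mean_renewals = sum(renewals_by(t, rng) for _ in range(2000)) / 2000
# With mean lifetime 1, about t = 100 bulbs are replaced on average.
```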
8.1.1 The distribution of the waiting time W_n

If we assume that the interarrival times are i.i.d. with Laplace transform

φ_τ(z) = ∫_0^∞ e^{−zt} dF_τ(t) = ∫_0^∞ e^{−zt} f_τ(t) dt

the waiting time W_n is the sum of n i.i.d. random variables, specified by (2.66) as

φ_{W_n}(z) = ∫_0^∞ e^{−zt} f_{W_n}(t) dt = φ_τ^n(z)   (8.1)

By partial integration, we find the Laplace transform of the distribution F_{W_n}(t) = Pr[W_n ≤ t] = ∫_0^t f_{W_n}(u) du,

∫_0^∞ e^{−zt} F_{W_n}(t) dt = φ_{W_n}(z)/z = φ_τ^n(z)/z   (8.2)

The inverse Laplace transform follows^1 with (2.38) as

Pr[W_n ≤ t] = (1/2πi) ∫_{c−i∞}^{c+i∞} (φ_τ^n(z)/z) e^{zt} dz   (8.3)

1 In general, by integration of (2.38), we find ∫_0^t f_X(u) du = F_X(t) = (1/2πi) ∫_{c−i∞}^{c+i∞} φ_X(z) (e^{zt} − 1)/z dz, whose form seems different from (8.3). However, (1/2πi) ∫_{c−i∞}^{c+i∞} (φ_X(z)/z) dz = 0, because the contour can be closed over the Re(z) > c half-plane, where φ_X(z) is analytic, and because lim_{R→∞} φ_X(Re^{iθ}) = 0 for −π/2 < θ < π/2, which follows from the existence of the Laplace integral ∫_0^∞ e^{−zt} f_X(t) dt.

As an alternative to the approach with generating functions, we can resort to the n-fold convolution, which follows from (2.63) as

f_{W_1}(t) = f_τ(t)
f_{W_n}(t) = ∫_{−∞}^{∞} f_{W_{n−1}}(t − y) f_τ(y) dy = ∫_0^t f_{W_{n−1}}(t − y) f_τ(y) dy

Integrated,

Pr[W_n ≤ t] = ∫_{−∞}^{t} du ∫_{−∞}^{∞} f_{W_{n−1}}(u − y) f_τ(y) dy = ∫_{−∞}^{∞} (∫_{−∞}^{t−y} f_{W_{n−1}}(u) du) f_τ(y) dy = ∫_0^t Pr[W_{n−1} ≤ t − y] f_τ(y) dy

By denoting Pr[W_n ≤ t] = F_τ^{(n*)}(t), we have

F_τ^{(1*)}(t) = F_τ(t)
F_τ^{(n*)}(t) = ∫_0^t F_τ^{((n−1)*)}(t − y) f_τ(y) dy

These equations also show that we can define F_τ^{(0*)}(t) = 1. Let us define U_n(t) = Σ_{k=1}^{n} F_τ^{(k*)}(t).
By summing both sides in the last equation, we obtain
\[ U_n(t) = \int_0^t \sum_{k=1}^{n} F^{((k-1)*)}(t-y)\, f(y)\, dy = \int_0^t \sum_{k=0}^{n-1} F^{(k*)}(t-y)\, f(y)\, dy. \]
With the definition $F^{(0*)}(t) = 1$, we arrive at
\[ U_n(t) = \int_0^t U_{n-1}(t-y)\, dF(y) + F(t) \tag{8.4} \]
or, written in terms of convolutions, $U_n(t) = (U_{n-1} \star f)(t) + F(t)$.

Finally, we mention the interesting bound on the convolution $F^{(n*)}(t)$ for a non-negative random variable $\tau$,
\[ F^{(n*)}(t) = \int_0^t F^{((n-1)*)}(t-y)\, dF(y) \leq F^{((n-1)*)}(t) \int_0^t dF(y) = F^{((n-1)*)}(t)\, F(t) \]
which follows from the monotone increasing nature of any distribution function. By iteration on $n$ starting from $F^{(0*)}(t) = 1$, it is immediate that
\[ F^{(n*)}(t) \leq \left(F(t)\right)^n. \tag{8.5} \]
Since $(F(t))^n$ is the distribution of the maximum (3.33) of a set of $n$ i.i.d. random variables $\{\tau_k\}_{1 \leq k \leq n}$, the bound (8.5) means that, for $x \geq 0$,
\[ \Pr\!\left[\max_{1\leq k \leq n} \tau_k \leq x\right] \geq \Pr\!\left[\sum_{k=1}^{n} \tau_k \leq x\right] \]
which is rather obvious because $\sum_{k=1}^{n} \tau_k \geq \max_{1\leq k\leq n}\tau_k$. The equality sign is only possible if $n-1$ of the $\tau_k$ are zero.

8.1.2 The renewal function $m(t) = E[N(t)]$

From the equivalence $\{N(t) \geq n\} \Longleftrightarrow \{W_n \leq t\}$, we directly have
\[ \Pr[N(t) \geq n] = \Pr[W_n \leq t] = F^{(n*)}(t) \tag{8.6} \]
\[ \Pr[N(t) = n] = \Pr[N(t) \geq n] - \Pr[N(t) \geq n+1] = F^{(n*)}(t) - F^{((n+1)*)}(t). \]
The expected number of events in $(0,t]$ expressed via the tail probabilities (2.36) follows with (8.6) as
\[ m(t) = E[N(t)] = \sum_{k=1}^{\infty} F^{(k*)}(t) \tag{8.7} \]
and $m(t)$ is called the renewal function. According to a property of the counting process, $N(0) = 0$: the number of events in $(0,t]$ when $t \downarrow 0$ is assumed to be zero, such that $m(0) = 0$. From (8.5), it follows at each point $t$ for which $F(t) < 1$ that
\[ m(t) \leq \sum_{k=1}^{\infty} \left(F(t)\right)^k = \frac{1}{1 - F(t)} - 1. \]
Hence, for finite $t$ where $F(t) < 1$, the renewal function $m(t)$ converges at least as fast as a geometric series and is bounded. In the limit $t \to \infty$, where $\lim_{t\to\infty} F(t) = 1$, we see that $m(t)$ is not bounded anymore. Intuitively, the number of repeated events (renewals) in an infinite time interval is clearly infinite.
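The bound (8.5) is easy to probe numerically. The sketch below estimates $F^{(n*)}(t) = \Pr[W_n \leq t]$ by Monte Carlo for exponential interarrival times with rate 1 (an arbitrary choice for which $F(t) = 1 - e^{-t}$ is known in closed form) and compares it with $(F(t))^n$; the parameters $n = 3$, $t = 2$ are illustrative only.

```python
import random, math

rng = random.Random(7)
n, t, trials = 3, 2.0, 20000

# Monte Carlo estimate of F^(n*)(t) = Pr[W_n <= t] for Exp(1) interarrivals.
hits = sum(sum(rng.expovariate(1.0) for _ in range(n)) <= t
           for _ in range(trials))
F_nstar = hits / trials
F_t = 1.0 - math.exp(-t)          # F(t) for Exp(1)
print(F_nstar, F_t ** n)          # bound (8.5): F^(n*)(t) <= (F(t))^n
```

For this case $W_3$ is Gamma(3,1)-distributed, so the exact value $\Pr[W_3 \leq 2] = 1 - 5e^{-2} \approx 0.323$ is well below $(F(2))^3 \approx 0.647$, as the bound requires.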
The renewal function $m(t)$ completely characterizes the renewal process. Indeed, if $\varphi_m(z)$ is the Laplace transform of $m(t)$, then after taking the Laplace transform of both sides in (8.7) and using the definition $\Pr[W_n \leq t] = F^{(n*)}(t)$ together with (8.2), we obtain
\[ \varphi_m(z) = \frac{1}{z}\sum_{k=1}^{\infty} \varphi_\tau^k(z) = \frac{1}{z}\,\frac{\varphi_\tau(z)}{1 - \varphi_\tau(z)} \tag{8.8} \]
provided $|\varphi_\tau(z)| < 1$. From this expression, the interarrival time can be found from
\[ \varphi_\tau(z) = \frac{z\,\varphi_m(z)}{1 + z\,\varphi_m(z)} \]
after inverse Laplace transform. By taking the inverse Laplace transform (2.38), $m(t)$ is written as a complex integral
\[ m(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} \frac{\varphi_\tau(z)}{1 - \varphi_\tau(z)}\,\frac{e^{zt}}{z}\, dz. \]

8.1.3 The renewal equation

After taking the inverse Laplace transform of $\varphi_m(z) = \varphi_m(z)\varphi_\tau(z) + \frac{\varphi_\tau(z)}{z}$, which is deduced from (8.8), a third relation for $m(t)$ that often occurs is
\[ m(t) = \int_0^t m(t-u)\, dF(u) + F(t) = \int_0^t F(t-u)\, dm(u) + F(t) \tag{8.9} \]
and is called the renewal equation. Taking the limit $n \to \infty$ in (8.4) also leads to the renewal equation. Since $m(0) = 0$, the renewal equation implies that $F(0) = \Pr[\tau \leq 0] = 0$, or that processes where a zero interarrival time is possible (e.g. in simultaneous events) are ruled out. For a Poisson process, Theorem 7.3.1 states that the probability of the occurrence of simultaneous events ($h \to 0$) is zero. The requirement $m(0) = 0$ generalizes the exclusion of simultaneous events to any renewal process.

The probabilistic argument that leads to the renewal equation is as follows. By conditioning on the first renewal, for $k > 0$,
\[ \Pr[N(t) = k \mid W_1 = s] = \begin{cases} 0 & t < s \\ \Pr[N(t-s) = k-1] & t \geq s \end{cases} \]
where in the last case, for $t \geq s$, the event $\{N(t) = k\}$ is only possible if $k-1$ renewals occur in the time interval $(s, t]$, which is, due to the stationarity of the renewal process, equal to $k-1$ renewals in $(0, t-s]$.
By the law of total probability (2.46), we uncondition to find, for $k \geq 1$,
\[ \Pr[N(t) = k] = \int_0^\infty \Pr[N(t) = k \mid W_1 = s]\, \frac{d\Pr[W_1 \leq s]}{ds}\, ds = \int_0^t \Pr[N(t-s) = k-1]\, f(s)\, ds. \tag{8.10} \]
Multiplying both sides by $k$ and summing over all $k \geq 1$ gives the average at the left-hand side,
\[ E[N(t)] = \sum_{k=1}^{\infty} k \Pr[N(t) = k]. \]
The sum at the right-hand side is
\[ \sum_{k=1}^{\infty} k \Pr[N(t-s) = k-1] = \sum_{k=0}^{\infty} (k+1) \Pr[N(t-s) = k] = E[N(t-s)] + 1. \]
Combining both sides yields
\[ E[N(t)] = F(t) + \int_0^t E[N(t-s)]\, dF(s) \]
which is again the renewal equation (8.9) since $m(t) = E[N(t)]$.

8.1.4 A generalization of the renewal equation

The renewal equation (8.9) is a special case of the more general class of integral equations
\[ Y(t) = h(t) + \int_0^t Y(t-u)\, dF(u), \qquad t \geq 0 \tag{8.11} \]
in the unknown function $Y(t)$, where $h(t)$ is a known function and $F(t)$ is a distribution function. This equation can be written using the convolution notation as $Y(t) = h(t) + (Y \star F)(t)$. By conditioning on the first renewal as shown above, many renewal problems can be recast into the form of the general renewal equation (8.11). An example is the derivation of the residual life or waiting time given in Section 8.3. Therefore, it is convenient to present the solution to the general renewal equation (8.11).

Lemma 8.1.1 If $h(t)$ is bounded for all $t$, then the unique solution of the general renewal equation (8.11) is
\[ Y(t) = h(t) + \int_0^t h(t-u)\, dm(u) \tag{8.12} \]
where $m(t) = \sum_{k=1}^{\infty} F^{(k*)}(t)$ is the renewal function.

Proof: Let us first concentrate on the formal solution. In general, convolutions are best treated in the transformed domain. After taking the Laplace transform of the general renewal equation (8.11), we obtain $\varphi_Y(z) = \varphi_h(z) + \varphi_Y(z)\varphi_F(z)$, such that
\[ \varphi_Y(z) = \frac{\varphi_h(z)}{1 - \varphi_F(z)}. \]
There always exists a region in the $z$-domain where the modulus of $\varphi_F(z)$ is smaller than
1 such that the geometric series applies,
\[ \varphi_Y(z) = \varphi_h(z) \sum_{k=0}^{\infty} \left(\varphi_F(z)\right)^k = \varphi_h(z) + \varphi_h(z) \sum_{k=1}^{\infty} \left(\varphi_F(z)\right)^k. \]
Back-transforming and taking into account that $(\varphi_F(z))^k$ is the transform of a $k$-fold convolution yields
\[ Y(t) = h(t) + \left(h \star \sum_{k=1}^{\infty} F^{(k*)}\right)\!(t) = h(t) + (h \star m)(t). \]
This formal manipulation demonstrates[2] that (8.12) is a solution of the general renewal equation (8.11). Suppose now that there are two solutions $Y_1(t)$ and $Y_2(t)$. Their difference $V(t) = Y_1(t) - Y_2(t)$ obeys
\[ V(t) = \int_0^t V(t-u)\, dF(u) = (V \star F)(t). \]
By convolving both sides with $F$ and using the original equation, we deduce that $V(t) = (V \star F \star F)(t)$. Continuing this process, for each $k$, we have that $V(t) = (V \star F^{(k*)})(t)$. Since $F^{(k*)}(t) \to 0$ for all finite $t$ as $k \to \infty$ (because $m(t)$ exists for all finite $t$), and if $V(t)$ is bounded, this implies that $V(t) = 0$ for all finite $t$. This demonstrates the uniqueness and motivates the requirement that $h(t)$ should be bounded. $\Box$

[Footnote 2: Alternatively, by substituting the solution into the equation, a check is
\[ Y(t) = h(t) + (Y \star F)(t) = h(t) + (h \star F)(t) + (h \star m \star F)(t) = h(t) + (h \star F)(t) + \left(h \star \sum_{k=2}^{\infty} F^{(k*)}\right)\!(t) = h(t) + \left(h \star \sum_{k=1}^{\infty} F^{(k*)}\right)\!(t) = h(t) + (h \star m)(t). \]]

8.1.5 The renewal function for a Poisson process

Before showing below that the renewal function $m(t)$ can be specified in detail as $t \to \infty$, we consider first the Poisson process, where the interarrival times $\{\tau_n\}_{n \geq 1}$ are i.i.d. exponentially distributed with rate $\lambda$. Since $\varphi_\tau(z) = \frac{\lambda}{z+\lambda}$,
\[ m(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} \frac{\lambda\, e^{zt}}{z^2}\, dz, \qquad c > 0. \]
The contour can be closed over the negative $\mathrm{Re}(z)$-plane (because $t \geq 0$). The only singularity of the integrand is a double pole at $z = 0$ with residue
\[ m(t) = \lambda \left.\frac{d\,e^{zt}}{dz}\right|_{z=0} = \lambda t. \]
This result, of course, follows directly from the definition of the Poisson process given in (7.2). We see that the renewal function $m(t)$ for the Poisson process is linear for all $t$.
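The renewal equation (8.9) also lends itself to direct numerical solution, which gives an independent check of this linearity. The Python sketch below discretizes (8.9) on a grid with a trapezoidal rule, for $\text{Exp}(\lambda)$ interarrival times where $m(t) = \lambda t$ is known exactly; the rate $\lambda = 2$ and the grid parameters are arbitrary illustrative choices.

```python
import math

# Discretized renewal equation m(t) = F(t) + int_0^t m(t-u) f(u) du
# for Exp(lam) interarrivals, where the exact answer is m(t) = lam * t.
lam, h, steps = 2.0, 0.01, 500            # grid t_k = k*h, up to t = 5
F = [1.0 - math.exp(-lam * k * h) for k in range(steps + 1)]
f = [lam * math.exp(-lam * k * h) for k in range(steps + 1)]
m = [0.0] * (steps + 1)
for k in range(1, steps + 1):
    # Trapezoidal rule in u; the j = 0 endpoint involves m(t_k) itself,
    # so solve the implicit equation for m[k].  (m[0] = 0 kills the j = k term.)
    conv = 0.5 * m[0] * f[k] + sum(m[k - j] * f[j] for j in range(1, k))
    m[k] = (F[k] + h * conv) / (1.0 - 0.5 * h * f[0])
print(m[steps])   # close to lam * t = 10
```

The computed $m(5)$ agrees with $\lambda t = 10$ to within the $O(h^2)$ quadrature error, and intermediate grid values grow linearly in $t$, as derived above.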
Moreover, the Poisson process is the only continuous-time renewal process with a linear renewal function $m(t)$. Indeed, if[3] $m(t) = \lambda t$, the renewal equation is
\[ \lambda t = \int_0^t \lambda(t-u)\, dF(u) + F(t) = \lambda\int_0^t F(u)\, du - \lambda t F(0) + F(t). \]
By differentiation with respect to $t$ and assuming non-zero interarrival times such that $F(0) = \Pr[\tau \leq 0] = 0$, we obtain a differential equation
\[ \lambda = \lambda F(t) + \frac{dF(t)}{dt} \]
whose solution is $F(t) = 1 - e^{-\lambda t}$. By Theorem 7.3.2, exponential interarrival times characterize a Poisson process with rate $\lambda$.

[Footnote 3: A linear form $m(t) = \lambda t + \beta$ with $\beta \neq 0$ is impossible because $m(0) = 0$.]

8.2 Limit theorems

In the limit $t \to \infty$, the equivalence relation (8.6) indicates that, for any fixed value of $n$, $\Pr[N(t) \geq n] \to 1$, which means that the number of events $N(t) \to \infty$ as $t \to \infty$. Let us consider $\frac{W_{N(t)}}{N(t)}$, which is the sample mean of the first $N(t)$ interarrival times in the interval $(0, t]$. The Strong Law of Large Numbers (6.3) indicates that $\Pr\!\left[\lim_{n\to\infty} \frac{W_n}{n} = \mu\right] = 1$ and, because $N(t) \to \infty$ as $t \to \infty$, we have that $\frac{W_{N(t)}}{N(t)} \to \mu = E[\tau]$ as $t \to \infty$. Since $W_{N(t)} \leq t < W_{N(t)+1}$, we obtain the inequality
\[ \frac{W_{N(t)}}{N(t)} \leq \frac{t}{N(t)} < \frac{W_{N(t)+1}}{N(t)}. \]
Since both lower and upper bound tend to $\mu$, we arrive at the important result that $\lim_{t\to\infty} \frac{N(t)}{t} = \frac{1}{\mu}$: the random variable counting the number of events in $(0,t]$ per interval length $t$ converges to the reciprocal of the average interarrival time $\mu = E[\tau]$. Unfortunately[4], we cannot simply deduce the intuitive result that also the expectation $E\!\left[\frac{N(t)}{t}\right]$ tends to $\frac{1}{\mu}$. On the other hand, the expectation of $W_{N(t)+1}$ is obtained from Wald's identity (2.69) as $E[W_{N(t)+1}] = (E[N(t)]+1)\, E[\tau]$. Taking the expectation in the inequality $W_{N(t)} \leq t < W_{N(t)+1}$ leads to
\[ \frac{E[N(t)]}{t} \leq \frac{1}{E[\tau]} < \frac{E[N(t)]}{t} + \frac{1}{t} \]
from which, after the limit $t \to \infty$, the intuitive result follows.
Thus, we have proved[5] the following theorem:

Theorem 8.2.1 (Elementary Renewal Theorem) If $\mu = E[\tau]$ is the average interarrival time of events in the renewal process, then
\[ \lim_{t\to\infty} \frac{N(t)}{t} = \lim_{t\to\infty} \frac{E[N(t)]}{t} = \lim_{t\to\infty} \frac{m(t)}{t} = \frac{1}{\mu}. \tag{8.13} \]

The left-hand side in (8.13) describes the long-run average number of events (renewals) per unit time. The right-hand side is the reciprocal of the average interarrival time (or lifetime). For example, in the light bulb replacement process, if a bulb lasts on average $\mu$ time units, then, in the long run or steady state, the light bulbs must be replaced at rate $\frac{1}{\mu}$ per time unit.

[Footnote 4: As remarked by Ross (1996, p. 108), if $U$ is uniformly distributed on $(0,1)$, consider the random variables $Y_n = n\,1_{U \leq \frac{1}{n}}$. For large $n$, $U > 0$ with probability 1, whence $Y_n \to 0$ as $n \to \infty$. However, $E[Y_n] = n\,E\!\left[1_{U \leq \frac{1}{n}}\right] = n \cdot \frac{1}{n} = 1$ for all $n$. The sequence of random variables $Y_n$ converges to 0, although the expected values of $Y_n$ are all precisely 1.]

[Footnote 5: The elementary renewal theorem can be proved only by resorting to complex function theory and using Laplace–Stieltjes transforms (Cohen, 1969, p. 100). The limit argument provided by the Strong Law of Large Numbers follows then from a Tauberian theorem.]

The extension[6] of the Elementary Renewal Theorem is the Key Renewal Theorem. The Key Renewal Theorem gives the limit $t \to \infty$ of the solution (8.12) of the general renewal equation (8.11).

Theorem 8.2.2 (Key Renewal Theorem) If $g(t)$ is directly[7] Riemann integrable over $[0,\infty)$, then
\[ \lim_{t\to\infty} \int_0^t g(t-u)\, dm(u) = \frac{1}{\mu}\int_0^\infty g(u)\, du. \tag{8.14} \]

The proof[8] is more complicated, based on analysis, and is found in Feller (1971, Section XI.1). The essential difficulty is demonstrating that the limit at the left-hand side indeed exists. An application of the Key Renewal Theorem is presented in Section 8.3, and here we consider Blackwell's Theorem.
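The Elementary Renewal Theorem (8.13) is easy to check by simulation for a non-exponential distribution. The sketch below uses interarrival times uniform on $(1,2)$, so $\mu = 1.5$ (an arbitrary illustrative choice), and shows $m(t)/t$ approaching $1/\mu = 0.666\ldots$ as $t$ grows.

```python
import random

rng = random.Random(3)

def n_of_t(t, rng):
    """One sample of N(t) for interarrival times uniform on (1, 2)."""
    n, w = 0, 0.0
    while True:
        w += rng.uniform(1.0, 2.0)   # mu = E[tau] = 1.5
        if w > t:
            return n
        n += 1

ratios = []
for t in (10.0, 100.0, 1000.0):
    runs = 200
    m_hat = sum(n_of_t(t, rng) for _ in range(runs)) / runs
    ratios.append(m_hat / t)
    print(t, m_hat / t)              # -> 1/mu = 0.666...
```

For small $t$ the ratio still carries an $O(1/t)$ transient, in line with the bounds used in the proof above; by $t = 1000$ it agrees with $1/\mu$ to about three decimal places.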
Blackwell's Theorem follows from the Key Renewal Theorem when choosing $h(t) = 1_{t \in [0,T)}$ in the general renewal equation (8.11). The corresponding solution (8.12) for $t > T$ is
\[ Y(t) = \int_0^t 1_{t-u \in [0,T)}\, dm(u) = \int_{t-T}^t dm(u) = m(t) - m(t-T) \]
while the Key Renewal Theorem states that $\lim_{t\to\infty} Y(t) = \frac{1}{\mu}\int_0^\infty h(u)\, du = \frac{T}{\mu}$. Hence, we arrive at Blackwell's Theorem: for any fixed $T > 0$,
\[ \lim_{t\to\infty} \frac{m(t) - m(t-T)}{T} = \frac{1}{\mu}. \]
The interpretation of Blackwell's Theorem is that the number of expected renewals in an interval of length $T$ sufficiently far from the origin (or in the steady-state regime) is approximately equal to $\frac{T}{\mu}$. It can be shown that the reverse, i.e. that the Key Renewal Theorem can be deduced from Blackwell's Theorem, also holds. Hence, the Key Renewal Theorem is equivalent to Blackwell's Theorem.

Similarly to the Key Renewal Theorem, the difficulty in Blackwell's Theorem is the proof that the limit exists. If the existence of the limit is proved, which means that $\lim_{t\to\infty} m(t) - m(t-T) = a(T)$ exists, the Elementary Renewal Theorem suffices to prove that the limit has value $\frac{T}{\mu}$.

[Footnote 6: In the sequel we assume that the distribution of the interarrival times $F(t)$ is not periodic, in the sense that there exists no integer $d$ such that $\sum_{n=0}^{\infty} \Pr[\tau = nd] = 1$; or, the random variable $\tau$ does not only take integer multiples of some integer $d$.]

[Footnote 7: The concept is introduced to avoid wildly oscillating functions that are still integrable over $[0,\infty)$, such as $g(t) = t\,1_{\{|t-n| < \frac{1}{n^2}\}}$. The precise definition is given in Feller (1971). A sufficient condition for direct Riemann integrability is (a) $g(t) \geq 0$ for all $t \geq 0$, (b) $g(t)$ is non-increasing and (c) $\int_0^\infty g(u)\, du < \infty$.]

[Footnote 8: Based on the relatively new probabilistic concept of "coupling", alternative proofs of the Key Renewal Theorem exist (see e.g. Grimmett and Stirzaker (2001, pp. 429–430)).]

Following the argument of Ross (1996, p.
110), we can write, for finite $x$ and $y$,
\[ a(x+y) = \lim_{t\to\infty}\left[m(t) - m(t-x-y)\right] = \lim_{t\to\infty}\left[m(t) - m(t-x)\right] + \lim_{t\to\infty}\left[m(t-x) - m(t-x-y)\right] = a(x) + a(y). \]
Apart from the trivial solution $a(x) = 0$, the only other[9] solution of $a(x+y) = a(x) + a(y)$ is $a(x) = cx$, where $c$ is a constant. Hence, given that $\lim_{t\to\infty} m(t) - m(t-T) = a(T)$ exists, this is equivalent to the fact that the sequence $\{b_n\}_{n \geq 0}$, where $b_n = \frac{m(t_n) - m(t_n - T)}{T}$ and $t_n > t_{n-1}$, converges to a constant $c$. The simplest sequence with this property is $\{b^*_n\}_{n \geq 0}$ where $b^*_n = m(n) - m(n-1)$ and $T = 1$. Lemma 6.1.1 states that
\[ c = \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} b^*_k = \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} \left[m(k) - m(k-1)\right] = \lim_{n\to\infty} \frac{m(n)}{n} = \frac{1}{\mu} \]
where the last equality follows from the Elementary Renewal Theorem (8.13).

[Footnote 9: The proof is as follows: (i) if $y = 0$, we see that $a(x+0) = a(x) + a(0)$, or $a(0) = 0$. (ii) $a(nx) = n\,a(x)$ for integer $n$. (iii) Using (ii), we have that $a(nx + my) = n\,a(x) + m\,a(y)$. By choosing $nx + my = 0$ it follows from (i) that $a\!\left(-\frac{m}{n}y\right) = -\frac{m}{n}\,a(y)$, such that (ii) holds for rational numbers. Thus, $a(q_1 x + q_2 y) = q_1 a(x) + q_2 a(y)$ for rational numbers $q_1$ and $q_2$. (iv) Recalling the definition $f\!\left(\frac{x+y}{2}\right) \leq \frac{f(x)+f(y)}{2}$ of a convex function in Section 5.2 and the fact that a function that is both concave and convex is a linear function, it follows that $a(x)$ is linear and, with (i), that $a(x) = cx$.]

Theorem 8.2.3 (Asymptotic Renewal Distribution) If the average $\mu = E[\tau]$ and the variance $\sigma^2 = \mathrm{Var}[\tau]$ of the interarrival time of the events in a renewal process exist, then
\[ \lim_{t\to\infty} \Pr\!\left[\frac{N(t) - \frac{t}{\mu}}{\sigma\sqrt{\frac{t}{\mu^3}}} < x\right] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\, du. \tag{8.15} \]

Proof: The Elementary Renewal Theorem states that $N(t) \approx \frac{t}{\mu}$ for large $t$, which suggests to consider the random variable $X(t) = N(t) - \frac{t}{\mu}$. From the equivalence $\{N(t) < n\} \Longleftrightarrow \{W_n > t\}$, we have $\{X(t) < x_t\} \Longleftrightarrow \{W_{x_t + \frac{t}{\mu}} > t\}$, where $x_t$ is such that $x_t + \frac{t}{\mu}$ is a positive integer. Then,
\[ \Pr[X(t) < x_t] = \Pr\!\left[W_{x_t + \frac{t}{\mu}} > t\right] = \Pr\!\left[\frac{W_{x_t + \frac{t}{\mu}} - \mu\left(x_t + \frac{t}{\mu}\right)}{\sigma\sqrt{x_t + \frac{t}{\mu}}} > \frac{t - \mu\left(x_t + \frac{t}{\mu}\right)}{\sigma\sqrt{x_t + \frac{t}{\mu}}}\right]. \]
The waiting time $W_n$ consists of a sum of $n$ i.i.d. random variables with mean $\mu$ and variance $\sigma^2$. By the Central Limit Theorem 6.3.1, there holds that
\[ \lim_{n\to\infty} \Pr\!\left[\frac{W_n - n\mu}{\sigma\sqrt{n}} > x\right] = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-u^2/2}\, du \]
which implies that
\[ \lim_{t\to\infty} \Pr\!\left[\frac{W_{x_t + \frac{t}{\mu}} - \mu\left(x_t + \frac{t}{\mu}\right)}{\sigma\sqrt{x_t + \frac{t}{\mu}}} > \frac{t - \mu\left(x_t + \frac{t}{\mu}\right)}{\sigma\sqrt{x_t + \frac{t}{\mu}}}\right] = \frac{1}{\sqrt{2\pi}}\int_y^\infty e^{-u^2/2}\, du \]
provided $\lim_{t\to\infty} \frac{t - \mu\left(x_t + \frac{t}{\mu}\right)}{\sigma\sqrt{x_t + \frac{t}{\mu}}} = y$. Hence, we must determine $x_t$ such that, for large $t$,
\[ \frac{-\mu x_t}{\sigma\sqrt{x_t + \frac{t}{\mu}}} = y \]
which is satisfied if
\[ x_t = \frac{y^2\sigma^2}{2\mu^2}\left(1 \pm \sqrt{1 + 4\left(\frac{\mu}{y\sigma}\right)^2 \frac{t}{\mu}}\right) \]
and provided the negative sign is chosen. For large $t$, we see that $x_t = -\frac{y\sigma}{\mu}\sqrt{\frac{t}{\mu}} + O(1)$. Thus,
\[ \lim_{t\to\infty} \Pr\!\left[X(t) < -y\sigma\sqrt{\frac{t}{\mu^3}}\right] = \frac{1}{\sqrt{2\pi}}\int_y^\infty e^{-u^2/2}\, du \]
which is equivalent to
\[ \lim_{t\to\infty} \Pr\!\left[\frac{N(t) - \frac{t}{\mu}}{\sigma\sqrt{\frac{t}{\mu^3}}} < x\right] = \frac{1}{\sqrt{2\pi}}\int_{-x}^\infty e^{-u^2/2}\, du. \]
Noting that $\int_{-x}^\infty e^{-u^2/2}\, du = \int_{-\infty}^x e^{-u^2/2}\, du$ finally proves (8.15). $\Box$

Comparing Theorem 8.2.3 to the Central Limit Theorem 6.3.1 shows that the asymptotic variance of $N(t)$ behaves as
\[ \lim_{t\to\infty} \frac{\mathrm{Var}[N(t)]}{t} = \frac{\sigma^2}{\mu^3}. \tag{8.16} \]
Moreover, Theorem 8.2.3 is a central limit theorem for the dependent random variables $N(W_n)$, where the dependence is obvious from $N(W_n) = N(W_{n-1}) + 1$.

8.3 The residual waiting time

Suppose we inspect a renewal process at time $t$ and ask the question "How long do we have to wait on average to see the next renewal?" This question frequently arises in renewal problems. For instance, the arrival of taxis at a station is a renewal process and, often, we are interested to know how long we have to wait until the next taxi. Also, packets arriving at a router may find an earlier packet that is partially served. In order to compute the total time spent in the system, it is desirable to know the residual service time of that packet. In addition, this problem belongs to one of the classical examples demonstrating how misleading intuition in probability problems can be.
There are two different arguments to the question above, leading to two different answers: (i) since my inspection of the process does not alter or influence the process, the distribution of my waiting time should not depend on the time $t$; hence, my average waiting time equals the average interarrival time of the renewal process. (ii) the time $t$ of the inspection is chosen at random in (i.e. uniformly distributed over) the interval between two consecutive renewals; hence my expected waiting time should be half of the average interarrival time. Both arguments seem reasonable, although it is plain that one of them must be wrong. Let us try to sort out the correct answer to this apparent paradox, which, according to Feller (1971, pp. 12–13), has puzzled many before its solution was properly understood.

Fig. 8.2. Definition of the random variables: the age $A(t)$, the lifetime $L(t)$ and the residual life (or waiting time) $R(t)$.

Figure 8.2 defines the setting of the renewal problem and the quantities of interest: $A(t)$ is the age at time $t$, which is the total time elapsed since the last renewal before $t$ at time $W_{N(t)}$; the residual waiting time (or residual life or excess life) $R(t)$ is the remaining time at $t$ until the next renewal at time $W_{N(t)+1}$; and $L(t)$ is the total waiting time (or lifetime). From Fig. 8.2, we verify that
\[ A(t) = t - W_{N(t)}, \qquad R(t) = W_{N(t)+1} - t, \qquad L(t) = W_{N(t)+1} - W_{N(t)} = A(t) + R(t). \]
The distribution of the residual waiting time, $F_{R(t)}(x) = \Pr[R(t) \leq x]$, will be derived. Similar to the probabilistic argument before, we condition on the first renewal. If $W_1 = s \leq t$, then the first renewal occurs before time $t$ and the event $\{R(t) > x \mid W_1 = s\}$ has the same probability as the event $\{R(t-s) > x\}$, because the renewal process restarts from scratch at time $s$. If $s > t$, the residual waiting time $R(t)$ lies in the first renewal interval $[0, s]$.
In this case, we have either that the residual waiting time $R(t)$ is certainly not longer than $x$ if $s$ is contained in the interval $[t, t+x]$, or else the residual waiting time $R(t)$ is surely larger than $x$. In summary,
\[ \Pr[R(t) > x \mid W_1 = s] = \begin{cases} \Pr[R(t-s) > x] & \text{if } 0 \leq s \leq t \\ 0 & \text{if } t < s \leq t + x \\ 1 & \text{if } s > t + x. \end{cases} \]
Using the law of total probability (2.46),
\[ \Pr[R(t) > x] = \int_0^\infty \Pr[R(t) > x \mid W_1 = s]\, \frac{d\Pr[W_1 \leq s]}{ds}\, ds = \int_0^t \Pr[R(t-s) > x]\, f(s)\, ds + \int_{x+t}^\infty \frac{d\Pr[\tau \leq s]}{ds}\, ds \]
\[ = \int_0^t \Pr[R(t-s) > x]\, dF(s) + 1 - F(x+t). \]
This relation is an instance of the general renewal equation (8.11). Since $1 - F(x+t)$ is monotonously decreasing in $t$, for all $x$, it holds with (2.35) that
\[ \int_0^\infty \left(1 - F(x+t)\right) dt \leq \int_0^\infty \left(1 - F(t)\right) dt = E[\tau] < \infty \]
which also implies that $\lim_{t\to\infty} 1 - F(x+t) = 0$. Hence, $h(t) = 1 - F(x+t)$ is bounded for all $t \geq 0$ and Lemma 8.1.1 is applicable, yielding
\[ \Pr[R(t) > x] = 1 - F(x+t) + \int_0^t \left[1 - F(x+t-s)\right] dm(s). \]
Also, the conditions for direct Riemann integrability in the Key Renewal Theorem 8.2.2 are satisfied for $g(t) = 1 - F(x+t)$, such that
\[ \lim_{t\to\infty} \Pr[R(t) > x] = \lim_{t\to\infty} \int_0^t \left[1 - F(x+t-s)\right] dm(s) = \frac{1}{E[\tau]}\int_0^\infty \left(1 - F(x+t)\right) dt \quad \text{with (8.14)} \]
\[ = \frac{1}{E[\tau]}\int_x^\infty \left(1 - F(t)\right) dt. \]
In other words, the steady-state or equilibrium distribution function of the residual waiting time equals
\[ \lim_{t\to\infty} \Pr[R(t) \leq x] = \Pr[R \leq x] = F_R(x) = \frac{1}{E[\tau]}\int_0^x \left(1 - F(t)\right) dt. \tag{8.17} \]
Similarly, for $t > y$, the event $\{A(t) > y\}$ is equivalent to the event $\{\text{no renewals in } [t-y, t]\}$, which is equivalent to $\{R(t-y) > y\}$. Hence,
\[ \lim_{t\to\infty} \Pr[A(t) > y] = \lim_{t\to\infty} \Pr[R(t-y) > y] = \lim_{t\to\infty} \Pr[R(t) > y] = \frac{1}{E[\tau]}\int_y^\infty \left(1 - F(t)\right) dt \]
or, both the residual waiting time $R$ and the age $A$ have the same distribution in steady state ($t \to \infty$). Intuitively, when reversing the time axis in steady state, or looking backward in time, an identically distributed renewal process is observed in which the roles of the age $A$ and the residual life $R$ are interchanged.
Thus, by a time-symmetry argument, both distributions must be the same in steady state. It is instructive to compute the average residual waiting time $E[R] = E[A]$ in steady state. Using the expression of the average in terms of tail probabilities (2.35), we have
\[ E[R] = \int_0^\infty \left(1 - F_R(x)\right) dx = \frac{1}{E[\tau]}\int_0^\infty dx \int_x^\infty \left(1 - F(t)\right) dt. \]
Reversing the order of the $x$- and $t$-integration yields
\[ E[R] = \frac{1}{E[\tau]}\int_0^\infty dt\, \left(1 - F(t)\right) \int_0^t dx = \frac{1}{E[\tau]}\int_0^\infty t\left(1 - F(t)\right) dt. \]
After partial integration, we end up with
\[ E[R] = \frac{1}{2E[\tau]}\int_0^\infty t^2 f(t)\, dt = \frac{E[\tau^2]}{2E[\tau]} = \frac{\mathrm{Var}[\tau] + (E[\tau])^2}{2E[\tau]} \]
or
\[ E[R] = \frac{E[\tau]}{2} + \frac{\mathrm{Var}[\tau]}{2E[\tau]}. \tag{8.18} \]
This expression shows that the average remaining waiting time equals half of the average interarrival time plus half the ratio of the variance to the mean of the interarrival time. The last term is always positive. Since $E[A] = E[R]$ and $E[L] = E[A] + E[R]$, we observe the curious result that
\[ E[L] = E[\tau] + \frac{\mathrm{Var}[\tau]}{E[\tau]} \geq E[\tau] \]
or that the average total waiting time $E[L]$ is longer than the average interarrival time $E[\tau]$, contrary to intuition. This fact is referred to as the inspection paradox: the steady-state inter-renewal time $L(t) = W_{N(t)+1} - W_{N(t)}$ containing the inspection point at time $t$ exceeds on average the generic interarrival time, say $W_1$. The explanation is that the inspection point at time $t$ is uniformly chosen over the time axis and every inspection point is thus equally likely. The chance that the inspection point $t$ lies in a renewal interval is proportional to the length of that interval. Hence, it has higher probability to fall in a long interval, which explains[10] why $E[L] \geq E[\tau]$. Only for deterministic interarrival times, where $\mathrm{Var}[\tau] = 0$, does the equality sign hold, $E[L] = E[\tau]$.

[Footnote 10: A similar type of reasoning is used in the computation of the waiting time of the GI/D/m queueing system in Section 14.4.2.]
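A short simulation makes (8.18) and the paradox concrete. The sketch below inspects a renewal process with interarrival times uniform on $(0,2)$ (an arbitrary illustrative choice with $E[\tau] = 1$ and $\mathrm{Var}[\tau] = \frac{1}{3}$) at a fixed time and averages the residual life $R(t) = W_{N(t)+1} - t$ over many sample paths.

```python
import random

rng = random.Random(5)
t_insp, runs = 50.0, 4000     # inspection time large enough for steady state

def residual(rng):
    """R(t) = W_{N(t)+1} - t for interarrivals uniform on (0, 2)."""
    w = 0.0
    while w <= t_insp:
        w += rng.uniform(0.0, 2.0)
    return w - t_insp

avg_R = sum(residual(rng) for _ in range(runs)) / runs
# (8.18): E[R] = E[tau]/2 + Var[tau]/(2 E[tau]) = 1/2 + 1/6 = 2/3,
# clearly larger than the naive guess E[tau]/2 = 1/2.
print(avg_R)
```

The simulated average lands near $\frac{2}{3}$, not $\frac{1}{2}$: the inspection point falls preferentially into long intervals, exactly as argued above.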
For exponential interarrival times, application of (3.18) gives $\mathrm{Var}[\tau] = (E[\tau])^2$ and $E[R] = E[\tau]$, while $E[L] = 2E[\tau]$: the fact of being inspected at time $t$ changes the lifetime distribution and even doubles the expected total lifetime for exponentially distributed failure or interoccurrence times. Returning to the initial question, we observe that the intuitive result that my waiting time is $E[R] = \frac{E[\tau]}{2}$ is only correct for deterministic processes. Thus, the variability in the interarrival process causes the paradox. We will see later, in queueing theory in Section 14.3.1, that also in queueing systems the variability in the service discipline causes the average waiting time to increase. At last, Feller (1971, p. 187) remarks that an apparently unbiased inspection plan may lead to false conclusions because the actual observations are not typical of the population as a whole. When people complain that buses or trains start running irregularly, the inspection paradox shows that above-average interarrival times are experienced more often. The inspection paradox thus implies that complaints may be erroneously based on an overestimation of the real deviations from the regular time schedule of buses or trains.

By separating each renewal interval into two non-overlapping subintervals $A(t)$ and $R(t)$, we have described an alternating renewal process. An alternating renewal process models a system that can be in an on- or off-period with a repeating pattern $X_1, Y_1, X_2, Y_2, \ldots$, where each on-period $X_n$ has the same distribution $F_{\mathrm{on}}$ and is followed by an off-period $Y_n$. Each off-period also has the same distribution $F_{\mathrm{off}}$. The off-period $Y_n$ may depend on the on-period $X_n$, but the $n$-th renewal cycle with duration $X_n + Y_n$ is independent of any other cycle.
An alternating renewal process can be used to model a data stream of packets, where the on-period reflects the time to store or process an arriving packet and the off-period a (random) delay between two packets. Another example is the modeling of the end-to-end delay from a source $s$ to a destination $d$ in the Internet, where the off-period describes a queueing delay in a router due to other interfering traffic along that path. During the on-period, a packet is not blocked by other packets. The on-period equals the propagation delay to travel from the output port of one router to the output port of the next-hop router. The end-to-end delay along a path with $h$ hops equals the sum of $h$ consecutive off-periods augmented by the propagation time from $s$ to $d$.

8.4 The renewal reward process

The renewal reward process associates with each renewal at time $W_n$ a certain cost or reward $R_n$, which may vary over time and can be negative. For example, each time a light bulb fails, it must be replaced at a certain cost (negative reward), or each customer in a restaurant pays for his meal (positive reward). The reward $R_n$ may depend on the interarrival time $\tau_n$, the length of the $n$-th renewal interval, but it is independent of other renewal epochs (different from the $n$-th). Thus, the pairs $(R_n, \tau_n)$ are assumed to be independent and identically distributed. Most often, one is interested in the total reward $R(t)$ over a period $t$ (not to be confused with the residual lifetime), defined as
\[ R(t) = \sum_{n=1}^{N(t)} R_n. \tag{8.19} \]
In this setting, the renewal reward process is a generalization of the counting process, where $R_n = 1$. By slightly rewriting the total reward $R(t)$ earned over an interval $t$ as
\[ \frac{R(t)}{t} = \frac{\sum_{n=1}^{N(t)} R_n}{N(t)}\cdot\frac{N(t)}{t} \]
and taking the limit $t \to \infty$, the first fraction tends with probability one to the average reward $E[R]$ per renewal period by the Strong Law of Large Numbers (Theorem 6.2.2), while the second fraction tends to $\frac{1}{\mu} = \frac{1}{E[\tau]}$ by the Elementary Renewal Theorem 8.2.1.
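This almost-sure limit of $R(t)/t$ can be checked by simulation. In the sketch below, the reward per cycle is taken (as an arbitrary illustrative choice) as $R_n = \tau_n^2$, with $\tau_n$ uniform on $(0,2)$, so the limiting rate should be $E[\tau^2]/E[\tau] = \frac{4/3}{1}$.

```python
import random

rng = random.Random(9)
# Renewal reward sketch: reward R_n = tau_n**2 depends on the cycle length,
# and the pairs (R_n, tau_n) are i.i.d. across cycles.
t_end = 20000.0
w = total = 0.0
while True:
    tau = rng.uniform(0.0, 2.0)
    if w + tau > t_end:
        break                    # next renewal would fall beyond t_end
    w += tau
    total += tau ** 2            # reward collected at this renewal

rate = total / t_end
print(rate)                      # -> E[tau^2]/E[tau] = 4/3
```

Over roughly $t/E[\tau] = 20000$ cycles, the empirical reward rate agrees with $\frac{4}{3}$ to about two decimal places, the time-average reward per unit time claimed by the limit argument.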
Hence, with probability one it holds that
\[ \lim_{t\to\infty} \frac{R(t)}{t} = \frac{E[R]}{E[\tau]} \tag{8.20} \]
which means that the time-average reward rate equals the average reward per renewal period multiplied by the interarrival rate of renewals (or divided by the average length of a renewal interval). Similarly as in the proof of the Elementary Renewal Theorem 8.2.1, the inequality, for any $t$,
\[ \sum_{n=1}^{N(t)} R_n \leq R(t) \leq \sum_{n=1}^{N(t)} R_n + R_{N(t)+1} \]
leads, after taking the expectations and using Wald's identity (2.69), to an inequality for the averages,
\[ E[N(t)]\, E[R] \leq E[R(t)] \leq E[N(t)]\, E[R] + E\!\left[R_{N(t)+1}\right]. \]
Dividing by $t$, the limit $t \to \infty$ becomes
\[ \lim_{t\to\infty}\frac{E[N(t)]}{t}\, E[R] \leq \lim_{t\to\infty}\frac{E[R(t)]}{t} \leq \lim_{t\to\infty}\frac{E[N(t)]}{t}\, E[R] + \lim_{t\to\infty}\frac{E\!\left[R_{N(t)+1}\right]}{t}. \]
Since the average reward per renewal period is finite and $E\!\left[R_{N(t)+1}\right] = E[R]$, we obtain by the Elementary Renewal Theorem 8.2.1 that
\[ \lim_{t\to\infty}\frac{E[R(t)]}{t} = \frac{E[R]}{E[\tau]}. \tag{8.21} \]
Hence, by comparing (8.21) and (8.20), the time average of the average reward rate equals the time average of the reward rate.

Example. The hard disc in a network server is replaced at cost $C_1$ at time $T$. The lifetime or age of this mass storage has pdf $f_A$. If the hard disc fails earlier, the cost of the repair and the penalties for service disruption is $C_2$. What is the long-run cost of the hard disc in the server per unit time? Consider the replacement of hard discs as a renewal process with i.i.d. interarrival times $\tau$ with distribution
\[ \Pr[\tau < t] = \begin{cases} \int_0^t f_A(u)\, du & \text{if } t < T \\ 1 & \text{if } t \geq T. \end{cases} \]
The average replacement time follows from the tail expression (2.35),
\[ E[\tau] = \int_0^\infty \left(1 - \Pr[\tau < t]\right) dt = \int_0^T \left(1 - \Pr[\tau < t]\right) dt. \]
The replacement cost $C(\tau)$ equals $C(\tau) = C_2\,1_{\tau < T} + C_1\,1_{\tau \geq T}$ and the average cost is, with (2.13),
\[ E[C] = C_2\Pr[\tau < T] + C_1\left(1 - \Pr[\tau < T]\right). \]
The Elementary Renewal Reward Theorem (8.21) and (8.20) state that the long-run cost of replacements equals
\[ \frac{E[C]}{E[\tau]} = \frac{C_2\Pr[\tau < T] + C_1\left(1 - \Pr[\tau < T]\right)}{\int_0^T \left(1 - \Pr[\tau < t]\right) dt}. \]
Usually the replacement time $T$ is chosen to minimize this long-run cost.

8.5 Problems

More worked examples can be found in Karlin and Taylor (1975, Chapter 5).

(i) Calculate $\Pr[W_{N(t)} \leq x]$.

(ii) Derive a recursion equation for the generating function $\varphi_{N(t)}(z) = E\!\left[z^{N(t)}\right]$ of the number of renewals in the interval $[0, t]$, and deduce from that equation the renewal equation (8.9) and a relation for $\mathrm{Var}[N(t)]$.

(iii) In a TCP session from A to B, IP data packets and IP acknowledgement packets travel a distance of 2000 km over precisely the same bi-directional path. In case of congestion, the average speed is 40000 km/s, and without congestion the speed is three times higher. Congestion only occurs in 20% of the travels. What is the average speed of IP packets in the TCP session?

(iv) The production of digitalized speech samples depends primarily on the codec, with an effective average rate $r$ (bits/s). Since this rate is low compared to the ATM capacity $C$ (bits/s), UMTS will use AAL2 mini-cells, in which 1 ATM cell is occupied by $N$ users. The financial cost of a UMTS operator increases at $nc$ euro per unit time whenever there are $n < N$ speech samples waiting for transmission, and there is an additional cost of $K$ euro each time an ATM cell is transmitted. What is the average cost per unit time for the UMTS operator?

(v) The cost of replacing a router that has failed is $A$ euro. However, one can decide to replace a router that has been in service for a period of time $T$. The advantage of this approach is that the cost of replacing a working router is only $B$ euro, where $B < A$. The policy ChangeRouter consists of replacing a router either upon failure or upon reaching the age $T$, whichever occurs first. Replacement of the current router by a new one occurs instantaneously, and at each time there can only be one router in the network. Let $\{X_j\}$ be a sequence of i.i.d. random variables, where $X_j$ is the lifetime of router $j$.
(a) Find the time-average cost rate $C$ of the policy ChangeRouter. (b) Compute $C$ if $T = 5$ years, the cost of replacing the failed router is $A = 10000$ euro and the cost of replacing a working router is $B = 7000$ euro. The independent random variables $X_j$ are exponentially distributed and the average lifetime of a router is 10 years.

9 Discrete-time Markov chains

A large number of stochastic processes belong to the important class of Markov processes. The theory of Markov chains and Markov processes is well established and furnishes powerful tools to solve practical problems. This chapter will be mainly devoted to the theory of discrete-time Markov chains, while the next chapter concentrates on continuous-time Markov chains. The theory of Markov processes will be applied in later chapters to compute or formulate queueing and routing problems.

9.1 Definition

A stochastic process $\{X(t),\, t \in T\}$ is a Markov process if the future state of the process only depends on the current state of the process and not on its past history. Formally, a stochastic process $\{X(t),\, t \in T\}$ is a continuous-time Markov process if for all $t_0 < t_1 < \cdots < t_{n+1}$ of the index set $T$ and for any set $\{x_0, x_1, \ldots, x_{n+1}\}$ of the state space it holds that
\[ \Pr[X(t_{n+1}) = x_{n+1} \mid X(t_0) = x_0, \ldots, X(t_n) = x_n] = \Pr[X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n]. \tag{9.1} \]
Similarly, a discrete-time Markov chain $\{X_k,\, k \in T\}$ is a stochastic process whose state space is a finite or countably infinite set, with index set $T = \{0, 1, 2, \ldots\}$, obeying
\[ \Pr[X_{k+1} = x_{k+1} \mid X_0 = x_0, \ldots, X_k = x_k] = \Pr[X_{k+1} = x_{k+1} \mid X_k = x_k]. \tag{9.2} \]
A Markov process is called a Markov chain if its state space is discrete. The conditional probabilities $\Pr[X_{k+1} = j \mid X_k = i]$ are called the transition probabilities of the Markov chain. In general, these transition probabilities can depend on the (discrete) time $k$.
A Markov chain is entirely defined by the transition probabilities (9.2) and the initial distribution of the Markov chain Pr[X_0 = x_0]. Indeed, by the definition of conditional probability (2.45), we obtain

  Pr[X_0 = x_0, ..., X_k = x_k] = Pr[X_k = x_k | X_0 = x_0, ..., X_{k-1} = x_{k-1}] Pr[X_0 = x_0, ..., X_{k-1} = x_{k-1}]

and, by the definition of the Markov chain (9.2),

  Pr[X_0 = x_0, ..., X_k = x_k] = Pr[X_k = x_k | X_{k-1} = x_{k-1}] Pr[X_0 = x_0, ..., X_{k-1} = x_{k-1}]

This recursion relation can be iterated, resulting in

  Pr[X_0 = x_0, ..., X_k = x_k] = prod_{j=1}^{k} Pr[X_j = x_j | X_{j-1} = x_{j-1}] Pr[X_0 = x_0]   (9.3)

which demonstrates that the complete information of the Markov chain is obtained if, apart from the initial distribution, all time-dependent transition probabilities are known.

9.2 Discrete-time Markov chain

If the transition probabilities are independent of the time k,

  P_ij = Pr[X_{k+1} = j | X_k = i]   (9.4)

the Markov chain is called stationary. In the sequel, we will confine ourselves to stationary Markov chains. Since the discrete-time Markov chain is conceptually simpler than its continuous counterpart, we start the discussion with the discrete case. Let us consider a state space S with N states (where N = dim S can be infinite). It is convenient to introduce a vector notation(1). Since X_k can only take N possible values, we denote the corresponding state vector at discrete time k by s[k] = [s_1[k] s_2[k] ... s_N[k]] with s_i[k] = Pr[X_k = i]. Hence, s[k] is a 1 x N vector. Since the state X_k at discrete time k must be in one of the N possible states, we have that sum_{i=1}^{N} Pr[X_k = i] = 1 or, in vector notation, s[k].u = sum_{i=1}^{N} s_i[k] = 1, where u^T = [1 1 ... 1].
This fact is also written as ||s[k]||_1 = 1, where ||a||_1 is the q = 1 norm of a vector a defined in Appendix A.3.

(1) Unfortunately, a vector in Markov theory is represented as a single row matrix, which deviates from the general theory in linear algebra, followed in Appendix A, where a vector is represented as a single column matrix. In order to be consistent with the literature on Markov processes, we have chosen to follow the notation of Markov theory here, but elsewhere we adhere to the general convention of linear algebra.

In a stationary Markov chain, the states X_{k+1} and X_k are connected via the law of total probability (2.46),

  Pr[X_{k+1} = j] = sum_{i=1}^{N} Pr[X_{k+1} = j | X_k = i] Pr[X_k = i] = sum_{i=1}^{N} P_ij Pr[X_k = i]   (9.5)

which holds for all j, or, in vector notation,

  s[k+1] = s[k] P   (9.6)

where the transition probability matrix P is

      | P_11      P_12      P_13      ...  P_1,N-1    P_1N    |
      | P_21      P_22      P_23      ...  P_2,N-1    P_2N    |
  P = | P_31      P_32      P_33      ...  P_3,N-1    P_3N    |   (9.7)
      | ...       ...       ...       ...  ...        ...     |
      | P_N-1,1   P_N-1,2   P_N-1,3   ...  P_N-1,N-1  P_N-1,N |
      | P_N1      P_N2      P_N3      ...  P_N,N-1    P_NN    |

Since (9.6) must hold for any initial state vector s[0], by choosing s[0] equal to a base vector [0 ... 0 1 0 ... 0] (all components zero except for component i), which expresses that the Markov chain starts from one of the possible states, say state i, we find s[1] = [P_i1 P_i2 ... P_iN]. Furthermore, since ||s[k]||_1 = 1 for any k, it must hold that sum_{j=1}^{N} P_ij = 1 for any state i. The relation

  sum_{j=1}^{N} P_ij = 1   (9.8)

means that, at discrete time k, a transition certainly occurs in the Markov chain, possibly to the same state as at time k - 1. The N x N transition probability matrix P thus consists of N^2 - N free transition probabilities P_ij, since at each row one transition probability can be expressed in terms of the others, e.g. P_ik = 1 - sum_{j=1, j != k}^{N} P_ij. A matrix with elements 0 <= P_ij <= 1 obeying (9.8) is called a stochastic matrix, whose properties are investigated in Appendix A.
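The evolution s[k+1] = s[k]P and the row-sum condition (9.8) are easy to check numerically. The following minimal sketch (Python with numpy; the matrix entries are illustrative values, not from the text) verifies that a stochastic matrix preserves the norm ||s[k]||_1 = 1:

```python
import numpy as np

# A small 3-state stochastic matrix (hypothetical values for illustration).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

# Every row of a stochastic matrix must sum to 1, relation (9.8).
assert np.allclose(P.sum(axis=1), 1.0)

def evolve(s0, P, k):
    """Apply s[k+1] = s[k] P, i.e. equation (9.6), k times."""
    s = np.asarray(s0, dtype=float)
    for _ in range(k):
        s = s @ P
    return s

# Starting from the base vector of state 1, s[1] is the first row of P,
# and the 1-norm ||s[k]||_1 = 1 is preserved at every step.
s10 = evolve([1.0, 0.0, 0.0], P, 10)
print(np.isclose(s10.sum(), 1.0))
```

Starting from a base vector reproduces the observation above: one step of `evolve` returns the corresponding row of P.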
Apart from the matrix representation, Markov chains are often described by a directed graph (as illustrated in the figure below), where P_ij is represented by an edge from state i to state j provided P_ij > 0. In particular, this feature enables one to deduce structural properties of the Markov chain (such as, e.g., communicating states) elegantly.

[Figure: a seven-state Markov graph with edges labelled P_12, P_22, P_32, P_41, P_16, P_34, P_45, P_55, P_75, P_47, P_63, P_56, P_67 and P_76, and the corresponding transition probability matrix]

      | 0     P_12  0     0     0     P_16  0     |
      | 0     P_22  0     0     0     0     0     |
      | 0     P_32  0     P_34  0     0     0     |
  P = | P_41  0     0     0     P_45  0     P_47  |
      | 0     0     0     0     P_55  P_56  0     |
      | 0     0     P_63  0     0     0     P_67  |
      | 0     0     0     0     P_75  P_76  0     |

Given the initial state vector s[0], the general solution of (9.6) is

  s[k] = s[0] P^k   (9.9)

Similarly, when knowledge of the Markov chain at discrete time k is available, we obtain from (9.6) that s[k+n] = s[k] P^n. The elements of the matrix P^n are called the n-step transition probabilities,

  P_ij^n = Pr[X_{k+n} = j | X_k = i]   (9.10)

for k >= 0 and n >= 0. Since the discrete Markov chain must surely be in one of the N states n time units later, given that it started at time k in state i, we obtain an extension of (9.8): for all n >= 1,

  sum_{j=1}^{N} P_ij^n = 1   (9.11)

The demonstration of (9.11) is by induction. If n = 1, (9.8) justifies (9.11). Assume that (9.11) holds. Any matrix element of P^{n+1} can be written as P_ij^{n+1} = sum_{k=1}^{N} P_ik P_kj^n. Summing over all j yields, for n >= 0,

  sum_{j=1}^{N} P_ij^{n+1} = sum_{k=1}^{N} P_ik sum_{j=1}^{N} P_kj^n
                           = sum_{k=1}^{N} P_ik   (induction argument)
                           = 1   (n = 1 case)

This proves (9.11).

9.2.1 Definitions and classification

9.2.1.1 Irreducible Markov chains

A state j in a Markov chain is said to be reachable from state i if it is possible to proceed from state i to state j in a finite number of transitions, which is equivalent to P_ij^n > 0 for finite n. If every state is reachable from every other state, the Markov chain is said to be irreducible. The example of the Markov graph above is not irreducible because state 2 is absorbing.
Markov theory is considerably simplified if we know that the chain is irreducible, which justifies investigating methods to determine irreducibility. An equivalent requirement for the Markov chain to be irreducible is that the associated directed graph is strongly connected, i.e. there is a path from node i to node j for any pair of distinct nodes (i, j). Let us review some basic notions from graph theory (see Appendix B.1). Denote by A the adjacency matrix of P, in which all non-zero elements of P are replaced by 1. A walk of length k from state i to state j is a succession of k arcs of the form (n_0 -> n_1)(n_1 -> n_2) ... (n_{k-1} -> n_k), where n_0 = i and n_k = j. A path is a walk in which all nodes are different, i.e. n_l != n_m for all 0 <= l != m <= k. Lemma B.1.1 (proved in Appendix B.1, art. 5) states that the number of walks of length k from state i to state j is equal to the element (A^k)_ij. A directed graph is strongly connected if and only if each non-diagonal element of the matrix sum_{k=1}^{N-1} P^k or, equivalently, of B = sum_{k=1}^{N-1} A^k is positive. Since P has N states, the longest possible path between two states consists of N - 1 hops. By summing over all powers 1 <= k <= N - 1, the element b_ij of the matrix B equals the number of all possible walks (of any possible length) between i and j. Hence, if b_ij > 0 for all i != j, there exist walks from any state i to any other state j. The converse is readily verified. Another way to determine irreducibility follows from the definition of reducibility in Appendix A.4. However, the methods for strong connectivity or irreducibility are still algebraic in that they require matrix operations. A computationally more efficient method consists of applying all-pair shortest path algorithms to the Markov graph.
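The strong-connectivity test via B = sum_{k=1}^{N-1} A^k can be sketched as follows (Python/numpy; the two test matrices are hypothetical examples, the first with an absorbing state):

```python
import numpy as np

def is_irreducible(P):
    """Irreducibility test: the chain is irreducible iff every off-diagonal
    entry of B = sum_{k=1}^{N-1} A^k is positive, where A is the adjacency
    matrix of the Markov graph (non-zero entries of P replaced by 1)."""
    A = (np.asarray(P) > 0).astype(int)
    N = A.shape[0]
    B = np.zeros_like(A)
    Ak = np.eye(N, dtype=int)
    for _ in range(N - 1):
        Ak = Ak @ A        # Ak = A^k counts walks of length k (Lemma B.1.1)
        B += Ak
    # Add the identity so that the diagonal never blocks the all() check.
    return bool(((B + np.eye(N, dtype=int)) > 0).all())

# A 2-state chain with absorbing state 1 is reducible ...
P_red = np.array([[0.5, 0.5],
                  [0.0, 1.0]])
# ... while a two-state chain with p, q > 0 (Section 9.3.3) is irreducible.
P_irr = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
print(is_irreducible(P_red), is_irreducible(P_irr))
```

As the text notes, all-pair shortest path algorithms achieve the same decision more efficiently for large N; the matrix-power form above mirrors the algebraic criterion directly.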
Examples of all-pair shortest path algorithms are that of Floyd-Warshall (with computational complexity C_Floyd-Warshall = O(N^3)) and the algorithm of Johnson (with complexity C_Johnson = O(N^2 log N + N L), where L is the total number of links in the Markov graph). These algorithms are nicely discussed in Cormen et al. (1991).

9.2.1.2 Communicating states

If two states i and j are each reachable from the other, they are said to communicate, which is often denoted by i <-> j. The concept of communication is an equivalence relation: (a) reflexivity: i <-> i, since P^0 = I or P_ij^0 = delta_ij; (b) symmetry: if i <-> j then j <-> i, which follows from the definition of communication; and (c) transitivity: if i <-> j and j <-> k, then i <-> k. The transitivity follows from the non-negativity of P and from P^{n+m} = P^n P^m, such that the matrix element P_ik^{n+m} = sum_{l=1}^{N} P_il^n P_lk^m >= P_ij^n P_jk^m. By definition of i <-> j and j <-> k, we have P_ij^n > 0 and P_jk^m > 0 for some finite n and m. Hence, P_ik^{n+m} > 0, which implies i <-> k. As an application, the total state space can be partitioned into equivalence classes. States in one equivalence class communicate with each other. If there is a possibility to start in one class and to enter another class, in which case there is no return possible to the first class (otherwise the two classes would form one class), the Markov chain is reducible. In other words, a Markov chain is irreducible if the equivalence relation results in one single class.

9.2.1.3 Periodic and aperiodic Markov chains

Consider a state j in a Markov chain with P_jj^n > 0 for some n >= 1. The period d_j is defined as the greatest common divisor of those n for which P_jj^n > 0. The figure below illustrates a Markov chain with period d = N: the states 1, 2, ..., N are visited cyclically, with transition probability matrix

      | 0  1  0  ...  0  0 |
      | 0  0  1  ...  0  0 |
  P = | .  .  .  ...  .  . |
      | 0  0  0  ...  0  1 |
      | 1  0  0  ...  0  0 |

Since the greatest common divisor of a set is the largest integer d that divides any integer in the set, it is smaller than or equal to the minimum element of the set. Thus,

  1 <= d_j <= min{n : P_jj^n > 0}

The relation P_jj^{n+m} = P_jj^n P_jj^m + sum_{l=1, l != j}^{N} P_jl^n P_lj^m, deduced from matrix multiplication and the fact that all elements of P are non-negative, shows that P_jj^k cannot decrease with increasing k = n + m. Hence, if P_jj > 0, then all P_jj^k > 0 for k > 1 and thus d_j = 1.

Lemma 9.2.1 If two states i and j communicate (i <-> j), then d_i = d_j.

Proof: Let n and m be integers such that P_ij^n > 0 and P_ji^m > 0. From P^{n+m} = P^n P^m, the matrix element P_ii^{n+m} = sum_{k=1}^{N} P_ik^n P_ki^m >= P_ij^n P_ji^m. By definition of n and m, P_ii^{n+m} > 0 and, by definition of a period, d_i | (n + m). Similarly, from P^{n+l+m} = P^n P^{l+m}, the matrix element

  P_ii^{n+l+m} = sum_{r=1}^{N} P_ir^n sum_{k=1}^{N} P_rk^l P_ki^m >= P_ij^n P_jj^l P_ji^m   (9.12)

Now, if P_jj^l > 0, which implies by definition that d_j | l, then we also have that P_ii^{n+l+m} > 0, from which d_i | (n + m + l). Both conditions d_i | (n + m) and d_i | (n + m + l) imply that d_i | l. But since d_j is the largest such divisor, d_j >= d_i. By symmetry of the communication relation (replace i -> j and j -> i), d_i >= d_j, which proves the lemma. []

The consequence of Lemma 9.2.1 is that all the states in an irreducible Markov chain have a common period d. The irreducible Markov chain is periodic with period d if d > 1, else it is aperiodic (d = 1). A simple sufficient condition for an irreducible chain to be aperiodic is that P_ii > 0 for some state i. Most Markov chains of practical interest are aperiodic.

9.2.2 The hitting time

Let A be a subset of states, A a subset of S. The hitting time T_A is the first time the Markov chain is in a state of the set A; thus, for k >= 0, T_A = min(k : X_k in A). The hitting time(2) of a state j follows from the definition with A = {j}. For irreducible Markov chains, the hitting time T_j is finite for any state j.

(2) The hitting time T_j is also called the first passage time into a state j.
From the definition of the hitting time, the recursion

  Pr[T_j = m | X_0 = i] = sum_{k != j} Pr[X_1 = k | X_0 = i] Pr[T_j = m-1 | X_0 = k]

is immediate. Indeed, in order to have the first passage from state i to state j at discrete time m, it is necessary to have first a transition from state i to some other state k and to pass from that state k to state j for the first time after m - 1 time units. For a stationary Markov chain, we have for m > 0 that

  Pr[T_j = m | X_0 = i] = sum_{k != j} P_ik Pr[T_j = m-1 | X_0 = k]   (9.13)

and, by definition, for m >= 0,

  Pr[T_j = m | X_0 = j] = delta_{0m}   (9.14)

The event {X_n = j} can be decomposed in terms of the hitting time T_j. Indeed, since the events {T_j = m, X_n = j} are disjoint for 1 <= m <= n,

  {X_n = j} = union_{m=1}^{n} {T_j = m, X_n = j}

Applied to the n-step transition probabilities,

  Pr[X_n = j | X_0 = i] = Pr[union_{m=1}^{n} {T_j = m, X_n = j} | X_0 = i]
                        = sum_{m=1}^{n} Pr[T_j = m, X_n = j | X_0 = i]
                        = sum_{m=1}^{n} Pr[T_j = m | X_0 = i] Pr[X_n = j | X_0 = i, T_j = m]

By definition of the hitting time, {T_j = m} = intersection_{k=1}^{m-1} {X_k != j} with {X_m = j}, such that

  Pr[X_n = j | X_0 = i, T_j = m] = Pr[X_n = j | X_0 = i, intersection_{k=1}^{m-1} {X_k != j}, X_m = j] = Pr[X_n = j | X_m = j]

where the last step follows from the Markov property (9.2). Thus we obtain

  Pr[X_n = j | X_0 = i] = sum_{m=1}^{n} Pr[T_j = m | X_0 = i] Pr[X_n = j | X_m = j]

or, written in terms of n-step transition probabilities with (9.10),

  P_ij^n = sum_{m=1}^{n} Pr[T_j = m | X_0 = i] P_jj^{n-m}   (9.15)

For an absorbing state j, where P_jj = 1, relation (9.15) simplifies to

  P_ij^n = Pr[T_j <= n | X_0 = i]   (9.16)

9.2.3 Transient and recurrent states

The probability that a Markov chain initiated at state i will ever reach state j is denoted by

  r_ij = Pr[T_j < infinity | X_0 = i]   (9.17)

If the starting state i equals the target state j, then r_ii is the probability of ever returning to state i. If r_ii = 1, the state i is a recurrent state, while, if r_ii < 1, state i is a transient state.
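The recursion (9.13) with the initial condition (9.14) yields the distribution of the hitting time numerically. In the sketch below (Python/numpy), summing over all k is equivalent to summing over k != j, because Pr[T_j = m-1 | X_0 = j] vanishes for m-1 > 0 and equals 1 for m-1 = 0, which supplies the term P_ij at m = 1; the two-state example values p = 0.3 and q = 0.4 are illustrative:

```python
import numpy as np

def hitting_pmf(P, j, m_max):
    """Pr[T_j = m | X_0 = i] for all start states i and m = 0..m_max,
    from recursion (9.13) with base case (9.14)."""
    P = np.asarray(P, dtype=float)
    N = P.shape[0]
    f = np.zeros((m_max + 1, N))
    f[0, j] = 1.0              # (9.14): starting in j, the hitting time is 0
    for m in range(1, m_max + 1):
        f[m] = P @ f[m - 1]    # one step of (9.13)
        f[m, j] = 0.0          # (9.14): Pr[T_j = m | X_0 = j] = 0 for m > 0
    return f

# Two-state chain of Section 9.3.3 with illustrative p = 0.3, q = 0.4:
P2 = np.array([[0.7, 0.3],
               [0.4, 0.6]])
f = hitting_pmf(P2, j=1, m_max=3)
# From state 0: Pr[T_1 = m] = (1-p)^{m-1} p, a geometric distribution.
print(np.allclose(f[1:4, 0], [0.3, 0.7 * 0.3, 0.7**2 * 0.3]))
```

For the two-state chain the recursion reproduces the geometric first-passage law, which can also be read off directly from the graph.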
If i is a recurrent state, the Markov chain started at i will definitely (i.e. with probability 1) return to state i after some time. On the other hand, if i is a transient state, the Markov chain started at i has probability 1 - r_ii of never returning to state i. For an absorbing state i, defined by P_ii = 1, we have by (9.16) that r_ii = 1, implying that an absorbing state is a recurrent state. Further, the mean return time to state j when the chain started in j is denoted by

  m_j = E[T_j | X_0 = j]   (9.18)

Concepts of renewal theory will now be applied to a Markov process. Let N_k(j) denote the number of times that the Markov chain is in state j during the time interval [1, k], given that the chain started in state i, or N_k(j) = sum_{n=1}^{k} 1_{X_n = j | X_0 = i}. Using (2.13), the average number of visits to state j in the time interval [1, k] is

  E[N_k(j) | X_0 = i] = E[sum_{n=1}^{k} 1_{X_n = j} | X_0 = i] = sum_{n=1}^{k} E[1_{X_n = j} | X_0 = i] = sum_{n=1}^{k} Pr[X_n = j | X_0 = i]

or, in terms of n-step transition probabilities (9.10),

  E[N_k(j) | X_0 = i] = sum_{n=1}^{k} P_ij^n   (9.19)

The average number of times that the Markov chain is ever in state j, given that it started from state i, is, with N(j) = lim_{k->infinity} N_k(j),

  E[N(j) | X_0 = i] = sum_{n=1}^{infinity} Pr[X_n = j | X_0 = i] = sum_{n=1}^{infinity} P_ij^n

Hence, if state j is reachable from state i, there is, by definition, some k for which P_ij^k > 0, which implies that E[N(j) | X_0 = i] > 0. Further, consider the probability Pr[N(j) >= n | X_0 = i] that the number of visits to state j exceeds n, given that the Markov chain started from state i. The event {N(j) >= n} is equivalent to the occurrence of the event {N(j) >= n-1} and the event that the Markov chain will return to j again, given that it started from j. The probability of the latter event is precisely r_jj. Thus, we obtain the recursion

  Pr[N(j) >= n | X_0 = i] = r_jj Pr[N(j) >= n-1 | X_0 = i]

with solution, for n >= 1,

  Pr[N(j) >= n | X_0 = i] = (r_jj)^{n-1} Pr[N(j) >= 1 | X_0 = i]

Now, Pr[N(j) >= 1 | X_0 = i] = Pr[T_j < infinity | X_0 = i] = r_ij, such that

  Pr[N(j) >= n | X_0 = i] = (r_jj)^{n-1} r_ij   (9.20)

The average computed with (2.36) yields

  E[N(j) | X_0 = i] = sum_{n=1}^{infinity} P_ij^n = r_ij / (1 - r_jj)   (9.21)

provided r_ij > 0. If r_ij = 0, then (9.20) vanishes for every n and thus E[N(j) | X_0 = i] = 0, which means that state j is not reachable from state i. In summary:

- For a recurrent state j, for which r_jj = 1, we obtain from (9.21) that E[N(j) | X_0 = i] -> infinity (if r_ij != 0, else E[N(j) | X_0 = i] = 0) and, from (9.20),

    Pr[N(j) = infinity | X_0 = i] = lim_{n->infinity} Pr[N(j) >= n | X_0 = i] = r_ij

  A state j is recurrent if and only if sum_{n=1}^{infinity} P_jj^n diverges.

- For a transient state j, for which r_jj < 1, E[N(j) | X_0 = i] is finite and Pr[N(j) = infinity | X_0 = i] = 0 or, equivalently, Pr[N(j) < infinity | X_0 = i] = 1. A state j is transient if and only if sum_{n=1}^{infinity} P_jj^n is finite.

These relations explain the difference between a recurrent and a transient state. When the Markov chain starts at a recurrent state j, it returns infinitely often to that state because r_jj = Pr[N(j) = infinity | X_0 = j] = 1. If the chain starts at some other state i from which state j is reachable (r_ij > 0), then the chain will visit state j infinitely often. From this analysis, some consequences arise.

Corollary 9.2.2 A finite-state Markov chain must have at least one recurrent state.

Proof: Suppose, on the contrary, that the state space S is finite and all states are transient states. For a transient state j it follows from (9.21) that sum_{k=1}^{infinity} P_ij^k is finite, which implies that lim_{k->infinity} P_ij^k = 0 for any other state i. If the state space is finite and all states are transient states, then

  sum_{j in S} lim_{k->infinity} P_ij^k = 0

Since the summation has a finite number of terms, the limit and summation operator can be reversed,

  lim_{k->infinity} sum_{j in S} P_ij^k = 0

But the total law of probability (9.11) requires that sum_{j in S} P_ij^k = 1 (for any time k), which leads to a contradiction. []

Theorem 9.2.3 If i is a recurrent state that leads to a state j, then the state j is also a recurrent state and r_ij = r_ji = 1.

Proof: Clearly, the theorem is true if i = j. Suppose that, for i != j, Pr[T_i < infinity | X_0 = j] = r_ji < 1. This implies that the Markov chain starting from state j has probability 1 - r_ji > 0 of never hitting state i, which is impossible because i is a recurrent state that will be visited infinitely often. Hence r_ji = 1. Since state j != i is reachable from state i, by definition r_ij > 0 and there is a minimum discrete time n such that P_ij^n > 0 and Pr[X_k = j | X_0 = i] = 0 for k < n. Similarly, since r_ji = 1, there exists a minimum discrete time m to have a transition from j to i given that the chain started in state j; thus, Pr[X_m = i | X_0 = j] = P_ji^m > 0. From (9.12) and the fact that P_ij^n > 0 and P_ji^m > 0 and that state i is a recurrent state such that P_ii^l > 0, we have, for any l >= 0,

  P_jj^{m+l+n} >= P_ji^m P_ii^l P_ij^n > 0

Summing over all l,

  sum_{l=1}^{infinity} P_jj^{m+l+n} >= P_ij^n P_ji^m sum_{l=1}^{infinity} P_ii^l

or

  sum_{l=1}^{infinity} P_jj^l >= sum_{l=n+m+1}^{infinity} P_jj^l = sum_{l=1}^{infinity} P_jj^{m+l+n} >= P_ij^n P_ji^m sum_{l=1}^{infinity} P_ii^l

It follows from (9.21) that the right-hand side diverges. Hence, sum_{l=1}^{infinity} P_jj^l diverges and relation (9.21) indicates that r_jj = 1 or that j must be a recurrent state. []

A non-empty set C of states, C a subset of S, is said to be closed if no state i in C leads to a state j outside C. Thus, r_ij = 0 for any i in C and j not in C, which is equivalent to P_ij = 0. If the set C is closed, the Markov chain starting in C will remain, with probability 1, in C all the time. For example, if i is an absorbing state, C = {i} is closed. A closed set C is irreducible if state i is reachable from state j for all i, j in C. Theorem 9.2.3 together with Corollary 9.2.2 implies that, if C is a finite, irreducible closed set, all states in C are recurrent.
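The convergence criterion for sum_{n} P_jj^n and relation (9.21) can be checked numerically via the partial sums (9.19). A minimal sketch (Python/numpy; the chain with one transient and one absorbing state is a hypothetical example):

```python
import numpy as np

def expected_visits(P, i, j, k):
    """E[N_k(j) | X_0 = i] = sum_{n=1}^{k} (P^n)_ij, equation (9.19)."""
    P = np.asarray(P, dtype=float)
    total, Pn = 0.0, np.eye(P.shape[0])
    for _ in range(k):
        Pn = Pn @ P
        total += Pn[i, j]
    return total

# Chain with transient state 0 and absorbing (hence recurrent) state 1:
P = np.array([[0.5, 0.5],
              [0.0, 1.0]])
# Visits to the transient state stay bounded: sum_n (P^n)_00 = sum_n 0.5^n -> 1,
# in agreement with (9.21): r_00 = 1/2, so E[N(0)|X_0=0] = r_00/(1 - r_00) = 1.
print(expected_visits(P, 0, 0, 200))
# Visits to the recurrent state 1 grow without bound (roughly linearly in k).
print(expected_visits(P, 0, 1, 200))
```

The bounded partial sum identifies state 0 as transient, while the diverging partial sum identifies state 1 as recurrent, exactly as the summary above states.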
9.3 The steady-state of a Markov chain

9.3.1 The irreducible Markov chain

The steady-state vector pi = lim_{k->infinity} s[k] follows, after taking the limit k -> infinity in (9.6), as

  pi = pi P   (9.22)

or, for each component j,

  pi_j = sum_{k=1}^{N} P_kj pi_k   (9.23)

with pi.u = 1 or ||pi||_1 = 1. Equation (9.22) shows that the steady-state vector pi does not depend on the initial state s[0]. Alternatively, in view of (9.9), we trivially write P^k - P^{k-1} P = 0 or P P^{k-1} - P^k = 0 and, if A = lim_{k->infinity} P^k exists, then A - A P = 0 or P A - A = 0. This implies A(I - P) = (I - P)A = 0 or A = Q(1), where Q(lambda) is the adjoint matrix of P (see Appendix A.1, art. 7). The non-zero columns (or rows) of the adjoint matrix Q(lambda) consist of the (unscaled) eigenvector(s) belonging to eigenvalue lambda. By (9.11) the rows of P^k are normalized for any k, and so must be those of A = Q(1). Since there is only one eigenvector belonging to lambda = 1 (Frobenius Theorem A.4.2), the rows of A = lim_{k->infinity} P^k must all be the same and equal to pi = [a_11 a_12 ... a_1N]. Furthermore, only if all rows of A are equal to the steady-state vector pi does the dependence on the initial state s[0] vanish, since relation (9.9) becomes pi_j = sum_{i=1}^{N} s_i[0] a_ij = a_1j sum_{i=1}^{N} s_i[0] = a_1j. Hence, A = lim_{k->infinity} P^k = u.pi or, componentwise, for all 1 <= j <= N,

  lim_{k->infinity} P_ij^k = pi_j   (9.24)

The sequence of matrices P, P^2, P^3, ..., P^k thus converges to A = u.pi for sufficiently large k. Instead of multiplying the last matrix P^k in the sequence by P to obtain the next one, P^{k+1}, the sequence P, P^2, P^4, ..., P^{2^k}, obtained by successive squaring at the same computational effort per step, converges considerably faster to A = u.pi and may be useful for sparse P. On the other hand, relation (9.22) is an eigenvalue equation with eigenvalue lambda = 1 and eigenvector pi. The Frobenius Theorem A.4.2 states that the transition probability matrix P has one eigenvalue lambda = 1 with corresponding eigenvector pi.
Since the set of equations (P - I)^T pi^T = 0 in (9.22) has rank N - 1, the normalization condition ||pi||_1 = 1 furnishes the (last) remaining equation. Except for the trivial case where P is the identity matrix I, the solution pi is obtained from

  | P_11 - 1   P_21       P_31       ...  P_N-1,1       P_N1     |  | pi_1   |   | 0 |
  | P_12       P_22 - 1   P_32       ...  P_N-1,2       P_N2     |  | pi_2   |   | 0 |
  | P_13       P_23       P_33 - 1   ...  P_N-1,3       P_N3     |  | pi_3   | = | 0 |   (9.25)
  | ...        ...        ...        ...  ...           ...      |  | ...    |   | . |
  | P_1,N-1    P_2,N-1    P_3,N-1    ...  P_N-1,N-1 - 1 P_N,N-1  |  | pi_N-1 |   | 0 |
  | 1          1          1          ...  1             1        |  | pi_N   |   | 1 |

In practice, this method is used especially if the number of states N is large and the transition probability matrix P does not exhibit a special matrix structure. In summary, for irreducible Markov chains, there are in general two ways(3) of computing the steady-state distribution pi: via the limiting process (9.24) or via solving the set of linear equations (9.25). Recall that we have invoked the Frobenius Theorem A.4.2, which is only applicable to irreducible Markov chains. There exist cases of practical interest where (9.24) fails to hold. For example, among the two-state Markov chains studied in Section 9.3.3, there is a chain whose state bounces back and forth between state 0 and state 1, so that the limit does not exist. It is of importance to know whether the steady-state distribution exists in the sense that pi_j != 0 for at least one j. If pi_j = 0 for all j, then there is no stationary (or equilibrium or steady-state) probability distribution.

(3) A third method consists of a directed graph solution of linear, algebraic equations, discussed by Chen (1971, Chapter 3) and applied to the steady-state equation (9.22) by Hooghiemstra and Koole (2000).
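Both computation routes, the limit (9.24) by successive squaring and the linear system (9.25), can be sketched as follows (Python/numpy; the example matrix is the one of Problem (i) in Section 9.4):

```python
import numpy as np

def steady_state_linear(P):
    """Solve (9.25): rows of (P^T - I), with the last balance equation
    replaced by the normalization sum(pi) = 1."""
    P = np.asarray(P, dtype=float)
    N = P.shape[0]
    M = P.T - np.eye(N)
    M[-1, :] = 1.0                 # last row enforces ||pi||_1 = 1
    b = np.zeros(N)
    b[-1] = 1.0
    return np.linalg.solve(M, b)

def steady_state_powers(P, squarings=30):
    """Approximate A = lim P^k via repeated squaring P, P^2, P^4, ..."""
    A = np.asarray(P, dtype=float)
    for _ in range(squarings):
        A = A @ A
    return A[0]                    # all rows of the limit matrix equal pi

P3 = np.array([[0.8, 0.2, 0.0],
               [0.8, 0.0, 0.2],
               [0.0, 0.8, 0.2]])
pi1 = steady_state_linear(P3)
pi2 = steady_state_powers(P3)
print(np.allclose(pi1, pi2))
```

For this chain both routes agree with the exact answer pi = [16/21, 4/21, 1/21]; the power method is only applicable here because the chain is aperiodic (state 1 has a self-loop), so the limit (9.24) exists.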
9.3.2 The average number of visits to a recurrent state

A direct application of Lemma 6.1.1 to the steady-state of a Markov chain is that, if (9.24) holds, then

  lim_{n->infinity} (1/n) sum_{m=1}^{n} P_ij^m = pi_j

Invoking (9.19), where N_n(j)/n is the fraction of the time the chain is in state j during the interval [1, n], this relation is equivalent to

  lim_{n->infinity} E[N_n(j) | X_0 = i] / n = pi_j   (9.26)

The time average of the average number of visits to state j, given that the Markov chain started in state i, converges to the steady-state distribution. In other words, the long-run mean fraction of time that the chain spends in state j equals pi_j and is independent of the initial state i. From (9.21), it immediately follows that, if j is a transient state, pi_j = 0. Only recurrent states j have a non-zero probability pi_j that the steady state is in state j.

Lemma 6.1.1 and its consequence (9.26) suggest investigating N_n(j)/n for recurrent states j. If the Markov chain starts in a recurrent state j, we know from (9.21) that the chain returns to state j infinitely often. Let W_k(j) denote the time of the k-th visit of the Markov chain to state j. Then,

  W_k(j) = min_{m >= 1} (N_m(j) = k)

The interarrival time between the k-th and (k-1)-th visit is tau_k(j) = W_k(j) - W_{k-1}(j). The interarrival times {tau_k(j)}_{k>=1} are independent and identically distributed random variables, as follows from the Markov property. Indeed, every time t the Markov chain returns to state j, it behaves from that time onwards as if the Markov process had started from state j, ignoring the past before time t. Moreover, they have a common mean E[tau(j)] = E[tau_1(j)] equal to the mean return time to j given by E[T_j | X_0 = j] = m_j, because the hitting time is T_j = tau_1(j). In other words, just as in renewal theory in Chapter 8, we have a counting process {N_m(j), m >= 1} with associated waiting times W_k(j) and i.i.d. interarrival times tau_k(j), specified by the equivalence

  {N_m(j) < k} <=> {W_k(j) > m}

Invoking the Elementary Renewal Theorem (8.13), we obtain, with m_j = E[T_j | X_0 = j],

  lim_{n->infinity} N_n(j)/n = 1/m_j   (9.27)

Thus, the chain returns to state j on average every m_j time units and, hence, the fraction of time the chain is in state j is roughly 1/m_j. These results are summarized as follows:

Theorem 9.3.1 (Limit Law of Markov Chains) If j is a recurrent state and the Markov chain starts in state i, then, with probability 1,

  lim_{n->infinity} N_n(j)/n = 1_{T_j < infinity} / m_j   (9.28)

and

  lim_{n->infinity} E[N_n(j) | X_0 = i] / n = r_ij / m_j   (9.29)

Proof: Above we have proved the case (9.27) where the initial state is X_0 = j; in that case 1_{T_j < infinity} = 1. For an arbitrary initial distribution, it is possible that the chain will never reach the recurrent state j; in that case 1_{T_j < infinity} = 0 given X_0 = i. It remains to prove (9.29). By definition, 0 <= N_n(j) <= n or 0 <= N_n(j)/n <= 1, which demonstrates that, for any n, N_n(j)/n is bounded. From the Dominated Convergence Theorem 6.1.4, we have

  lim_{n->infinity} E[N_n(j)/n | X_0 = i] = E[lim_{n->infinity} N_n(j)/n | X_0 = i]
                                          = E[1_{T_j < infinity} / m_j | X_0 = i]
                                          = Pr[T_j < infinity | X_0 = i] / m_j = r_ij / m_j

which completes the proof. []

Theorem 9.3.1 introduces the need for an additional definition. A recurrent state j is called null recurrent if m_j = infinity, in which case (9.29) reduces to

  lim_{n->infinity} E[N_n(j) | X_0 = i] / n = 0   (9.30)

By Tauberian theorems (which investigate conditions for the converse of Lemma 6.1.1, but which are far more difficult, as illustrated in the book by Hardy (1948)), it can be shown that, for null recurrent states, the stronger result lim_{n->infinity} P_ij^n = 0 also holds. A recurrent state j is called positive recurrent if m_j < infinity. The difference between a transient and a null recurrent state, which both obey (9.30), lies in the fact that, for a transient state, the limit lim_{n->infinity} E[N_n(j) | X_0 = j] is finite while, for a null recurrent state, lim_{n->infinity} E[N_n(j) | X_0 = j] = infinity.
Relation (9.30) indicates that, for a null recurrent state, E[N_n(j) | X_0 = j] = O(n^a) where 0 < a < 1, while, for a positive recurrent state, E[N_n(j) | X_0 = j] = pi_j n + o(n). The strength of the increase of E[N_n(j) | X_0 = j] leads to terming positive recurrent states also strongly ergodic states, while null recurrent states are called weakly ergodic. Figure 9.1 sketches the classification of states in a Markov process: a state j is either transient (pi_j does not exist) or recurrent; a recurrent state is either null recurrent (pi_j = 0) or positive recurrent (pi_j > 0); and a positive recurrent state is either aperiodic (lim_{k->infinity} P_ij^k = pi_j) or periodic with period d (lim_{k->infinity} P_ij^{kd} = d pi_j).

Fig. 9.1. Classification of the states in a Markov process with the corresponding steady-state vector pi_j.

With these additional definitions, Corollary 9.2.2 can be sharpened as follows:

Corollary 9.3.2 A finite-state Markov chain must have at least one positive recurrent state.

Proof: By summing (9.11) over n and dividing by n, we find

  (1/n) sum_{m=1}^{n} sum_{j=1}^{N} P_ij^m = 1

Using (9.19) yields

  sum_{j=1}^{N} E[N_n(j) | X_0 = i] / n = 1

When taking the limit n -> infinity of both sides, the summation and limit operator can be reversed because the summation involves a finite number of terms. Hence,

  sum_{j=1}^{N} lim_{n->infinity} E[N_n(j) | X_0 = i] / n = 1

which is only possible if at least one state j is positive recurrent, because transient and null recurrent states obey (9.30). []

Similarly, Theorem 9.2.3 and the combined consequence can be sharpened:

Theorem 9.3.3 If i is a positive recurrent state that leads to a state j, then the state j is also a positive recurrent state.

Theorem 9.3.4 An irreducible Markov chain with finite state space is positive recurrent. Alternatively, a Markov chain with a finite number of states has no null recurrent states.

Thus, finite-state Markov chains appear to have simpler behavior than infinite-state Markov chains.

Theorem 9.3.5 For an irreducible, positive recurrent Markov chain (even with an infinite state space), the steady-state is unique.
Proof: The steady-state pi of a positive recurrent, irreducible Markov chain satisfies both (9.23) and (9.24), even for an infinite-state Markov chain. Suppose that a != pi is a second steady-state vector, which satisfies ||a||_1 = 1 and

  a_j = sum_{k=1}^{N} P_kj a_k   (9.31)

Multiplying both sides by P_ji and summing over all j,

  sum_{j=1}^{N} P_ji a_j = sum_{j=1}^{N} P_ji sum_{k=1}^{N} P_kj a_k = sum_{k=1}^{N} a_k sum_{j=1}^{N} P_kj P_ji = sum_{k=1}^{N} a_k P_ki^2

The reversal of the j- and k-summations is always allowed (even for N -> infinity) by absolute convergence. Using (9.31),

  a_i = sum_{k=1}^{N} a_k P_ki^2

Repeating this process leads, for any n >= 1 and i >= 1, to

  a_i = sum_{k=1}^{N} a_k P_ki^n

In the limit for n -> infinity, application of (9.24) yields

  a_i = pi_i sum_{k=1}^{N} a_k = pi_i

which demonstrates uniqueness. []

Theorem 9.3.6 For an irreducible, positive recurrent Markov chain, it holds that

  lim_{n->infinity} E[N_n(j) | X_0 = i] / n = pi_j = lim_{n->infinity} N_n(j)/n = 1/m_j   (9.32)

and

  (N_n(j) - n pi_j) / (sigma_j pi_j^{3/2} sqrt(n)) -> N(0, 1) in distribution   (9.33)

where sigma_j^2 = Var[T_j | X_0 = j].

Proof: For an irreducible, finite-state Markov chain where r_ij = 1, Theorem 9.3.1 and Theorem 9.3.4, together with (9.26), lead to the fundamental relation (9.32). Relation (9.33) is an application of the Asymptotic Renewal Distribution Theorem 8.2.3. We have shown that the interarrival times {tau_k(j)}_{k>=1} are i.i.d. with mean E[tau(j)] = E[T_j | X_0 = j] = m_j and (assumed finite) variance Var[tau(j)] = Var[T_j | X_0 = j] = sigma_j^2. []

As a corollary, from (8.16), we have

  Var[N_n(j) | X_0 = i] ~ sigma_j^2 pi_j^3 n   (9.34)

Moreover, since ||pi||_1 = 1, it must hold from (9.32) that

  sum_{j=1}^{N} 1/m_j = 1

and, from (9.22), that

  sum_{i=1}^{N} P_ij / m_i = 1/m_j

A Markov chain that is irreducible and for which all states are positive recurrent is said to be ergodic. Ergodicity implies that the steady-state distribution pi and the long-run probability distribution lim_{k->infinity} s[k] are the same. Ergodic Markov chains are basic stochastic processes in the study of queueing theory.
9.3.3 Example: the two-state Markov chain

The two-state Markov chain is defined by

      | 1-p   p   |
  P = | q     1-q |

and illustrated in Fig. 9.2. A matrix computation of the two-state Markov chain is presented in Appendix A.4.2. Here, we follow a probabilistic approach.

Fig. 9.2. A two-state Markov chain: from state 0, the chain moves to state 1 with probability p and stays in state 0 with probability 1-p; from state 1, it moves to state 0 with probability q and stays in state 1 with probability 1-q.

Since there are only two states, at any discrete time k there holds that Pr[X_k = 0] = 1 - Pr[X_k = 1]. Hence, it suffices to compute Pr[X_k = 0]. By the total law of probability and the Markov property (9.2), we have

  Pr[X_{k+1} = 0] = Pr[X_{k+1} = 0 | X_k = 1] Pr[X_k = 1] + Pr[X_{k+1} = 0 | X_k = 0] Pr[X_k = 0]

or, from Fig. 9.2: the Markov chain can only be in state 0 at time k+1 if it is in state 0 at time k and the next event at time k+1 brings it back to that same state 0, or if it is in state 1 at time k and the next event at time k+1 induces a transfer to state 0. Introducing the transition probabilities,

  Pr[X_{k+1} = 0] = q Pr[X_k = 1] + (1-p) Pr[X_k = 0]
                  = q (1 - Pr[X_k = 0]) + (1-p) Pr[X_k = 0]
                  = (1 - p - q) Pr[X_k = 0] + q

This recursion can be iterated back to k = 0,

  Pr[X_k = 0] = (1-p-q)^k Pr[X_0 = 0] + q sum_{j=0}^{k-1} (1-p-q)^j

Using the finite geometric series sum_{j=0}^{k-1} x^j = (1-x^k)/(1-x) for any x != 1, else sum_{j=0}^{k-1} x^j = k,

  Pr[X_k = 0] = q/(p+q) + (1-p-q)^k ( Pr[X_0 = 0] - q/(p+q) )   (9.35)

With Pr[X_k = 1] = 1 - Pr[X_k = 0],

  Pr[X_k = 1] = p/(p+q) + (1-p-q)^k ( Pr[X_0 = 1] - p/(p+q) )   (9.36)

If |1-p-q| < 1, the steady state pi = [Pr[X_infinity = 0]  Pr[X_infinity = 1]] directly follows as

  pi = [ q/(p+q)  p/(p+q) ]   (9.37)

Observe from (9.35) and (9.36) that, if Pr[X_0 = 0] = q/(p+q) = Pr[X_infinity = 0] and Pr[X_0 = 1] = p/(p+q) = Pr[X_infinity = 1], the Markov chain starts and remains the whole time (for all k) in the steady state. In addition, the probability of a particular sequence of states can be computed from (9.3) or directly from Fig. 9.2.
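The closed form (9.35) and the steady state (9.37) are easily checked against direct iteration of (9.6) (Python/numpy sketch; p = 0.3 and q = 0.4 are arbitrary illustrative values):

```python
import numpy as np

def pr_state0(k, p, q, s0):
    """Closed form (9.35): Pr[X_k = 0] for the two-state chain,
    with s0 = Pr[X_0 = 0]."""
    return q / (p + q) + (1 - p - q) ** k * (s0 - q / (p + q))

p, q = 0.3, 0.4
P = np.array([[1 - p, p],
              [q, 1 - q]])

# Iterate s[k+1] = s[k] P from s[0] = [1, 0] and compare with (9.35).
s = np.array([1.0, 0.0])
for k in range(1, 6):
    s = s @ P
    assert np.isclose(s[0], pr_state0(k, p, q, 1.0))

# For large k the transient term (1-p-q)^k dies out, leaving (9.37).
print(np.isclose(pr_state0(10**6, p, q, 1.0), q / (p + q)))
```

Since |1-p-q| = 0.3 < 1 here, the chain falls in the regular case 0 < p+q < 2 discussed below, and the geometric transient vanishes at rate (1-p-q)^k.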
For example,

$$\Pr[X_0 = 1, X_1 = 0, X_2 = 1, X_3 = 1] = q\,p\,(1-q)\Pr[X_0 = 1]$$

We distinguish three cases:

(i) $p = q = 0$: The Markov chain consists of two separate states that do not communicate. Each state can be considered as a single-state, irreducible Markov chain. Any real number belonging to $[0,1]$ is a steady-state solution of each separate set. Also, $P = I$ and, hence, $P^\infty = \lim_{n\to\infty} P^n = I$.

(ii) $0 < p+q < 2$: The Markov chain is aperiodic, irreducible and positive recurrent with steady-state given in (9.37). This is the regular case.

(iii) $p = q = 1$: The Markov chain is periodic with period 2, but still irreducible and positive recurrent with steady-state $\pi = \left[\frac12\;\;\frac12\right]$ given above. However, $P^{2n} = I$ and $P^{2n+1} = P$, such that $\lim_{n\to\infty} P^n$ does not exist, but

$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n} P^j = \begin{bmatrix} \frac12 & \frac12 \\ \frac12 & \frac12 \end{bmatrix}$$

9.4 Problems

(i) Given the transition probability matrix

$$P = \begin{bmatrix} 0.8 & 0.2 & 0.0 \\ 0.8 & 0.0 & 0.2 \\ 0.0 & 0.8 & 0.2 \end{bmatrix}$$

(a) draw the Markov chain, (b) compute the steady-state vector in three different ways.

(ii) Consider the discrete-time Markov chain with infinitely many states and with transition probabilities at each state $j$,

$$P_{j,j+1} = 1 - \frac{1}{j} \qquad P_{j,1} = \frac{1}{j}$$

(a) draw the Markov chain, (b) show that the drift is positive, but that the Markov chain is nevertheless recurrent.

(iii) Assume that trees in a forest fall into four age groups. Let $b[n]$, $y[n]$, $m[n]$ and $u[n]$ denote the number of baby trees, young trees, middle-aged trees and old trees, respectively, in the forest at a given time period $n$. A time period lasts 15 years. During a time period, the total number of trees remains constant, but a certain percentage of trees in each age group dies and is replaced with baby trees. All surviving trees in the baby, young and middle-aged groups enter the next age group. Surviving old trees remain old. Let $0 < p_b, p_y, p_m, p_o < 1$ denote the loss rates in each age group in percent. (a) Make a discrete Markov chain representation of the process of aging and replacement in the forest.
(b) The distribution of the tree population amongst the different age categories in time period $n$ is represented by

$$x[n] = \begin{bmatrix} b[n] & y[n] & m[n] & u[n] \end{bmatrix}^T$$

If $x[n+1] = P\,x[n]$, what is the transition probability matrix $P$?

(c) Let $p_b = 0.1$, $p_y = 0.2$, $p_m = 0.3$, $p_o = 0.4$ and suppose that $x[0] = \begin{bmatrix} 5000 & 0 & 0 & 0 \end{bmatrix}^T$. What is the number of trees in each category after 15 and after 30 years?

(d) What is the steady-state situation?

(iv) A faulty digital video conferencing system shows a clustered error pattern. If a bit is received correctly, then the chance to receive the next bit correctly is 0.999. If a bit is received incorrectly, then the next bit is incorrect with probability 0.95. (a) Model the error pattern of this system using a discrete-time Markov chain. (b) How many communicating classes does the Markov chain have? Is it irreducible? (c) In the long run, what is the fraction of correctly received bits and the fraction of incorrectly received bits? (d) After the system is repaired, it works properly for 99.9% of the time. A test sequence after repair shows that, when always starting with a correctly received bit, the next 10 bits are correctly received with probability 0.9999. What is the probability now that a correctly (and analogously incorrectly) received bit is followed by another correct (incorrect) bit?

10 Continuous-time Markov chains

Just as it was convenient in Chapter 2 to treat discrete and continuous random variables distinctly, the same recipe is advised for discrete-time and continuous-time Markov chains. Here also, it appears that the continuous case is more intricate than its discrete counterpart.

10.1 Definition

For the continuous-time Markov chain $\{X(t), t \geq 0\}$ with $N$ states, the Markov property (9.1) can be written as

$$\Pr[X(t+\tau) = j\,|\,X(\tau) = i, X(u) = x(u), 0 \leq u < \tau] = \Pr[X(t+\tau) = j\,|\,X(\tau) = i]$$

and reflects the fact that the future state at time $t+\tau$ only depends on the current state at time $\tau$.
Similarly as for the discrete-time Markov chain, we assume that the transition probabilities for the continuous-time Markov chain $\{X(t), t \geq 0\}$ are stationary, i.e. independent of the point in time,

$$P_{ij}(t) = \Pr[X(t+\tau) = j\,|\,X(\tau) = i] = \Pr[X(t) = j\,|\,X(0) = i] \qquad (10.1)$$

Analogous to (9.5) and (9.6), the state vector $s(t)$ in continuous time, with components $s_k(t) = \Pr[X(t) = k]$, obeys

$$s(t+\tau) = s(\tau)P(t) \qquad (10.2)$$

Immediately, it follows from (10.2) that

$$s(t+u+\tau) = s(\tau)P(t+u)$$
$$s(t+u+\tau) = s(\tau+u)P(t) = s(\tau)P(u)P(t)$$
$$s(t+u+\tau) = s(\tau+t)P(u) = s(\tau)P(t)P(u)$$

such that, for all $t, u \geq 0$, the $N \times N$ transition probability matrix $P(t)$ satisfies

$$P(t+u) = P(u)P(t) = P(t)P(u) \qquad (10.3)$$

This fundamental relation¹ (10.3) is called the Chapman-Kolmogorov equation. Furthermore, since the Markov chain must at any time be in one of the $N$ states, the analogon of (9.8) is, for any state $i$,

$$\sum_{j=1}^{N} P_{ij}(t) = 1 \qquad (10.4)$$

For continuous-time Markov chains, it is convenient to postulate the initial condition of the transition probability matrix

$$P(0) = I \qquad (10.5)$$

where $P(0) = \lim_{t\downarrow 0} P(t)$. The relations (10.1), (10.3), (10.4) and (10.5) are sufficient to describe the continuous-time Markov process completely.

10.2 Properties of continuous-time Markov processes

We will now concentrate on typical properties of a continuous-time Markov process.

10.2.1 The infinitesimal generator $Q$

Lemma 10.2.1 The transition probability matrix $P(t)$ is continuous for all $t \geq 0$.

Proof: Continuity is proved if $\lim_{h\to 0} P(t+h) = \lim_{h\to 0} P(t-h) = P(t)$. From (10.3) and (10.5), we have for $h > 0$,

$$\lim_{h\to 0} P(t+h) = P(t)\lim_{h\to 0} P(h) = P(t)I = P(t)$$

Similarly, the other limit follows for $t > 0$ and $0 < h < t$ from $P(t) = P(t-h)P(h)$. □

If a function is differentiable, it is continuous. However, the converse is not generally true.
Therefore, we include the additional assumption that the matrix

$$Q = P'(0) = \lim_{h\downarrow 0} \frac{P(h) - I}{h} \qquad (10.6)$$

exists. This matrix $Q$ is called the infinitesimal generator of the continuous-time Markov process, and it plays an important role as shown below. The infinitesimal generator $Q$ corresponds to $P - I$ in discrete time. From (10.4),

$$\sum_{j=1, j\neq i}^{N} P_{ij}(h) = 1 - P_{ii}(h)$$

and, dividing both sides by $h$ and letting $h$ approach zero, we find for each $i$ with the definition of $Q$ that

$$\sum_{j=1, j\neq i}^{N} q_{ij} = -q_{ii} \geq 0 \qquad (10.7)$$

Hence, the sum of each row of $Q$ is zero, $q_{ij} = \lim_{h\downarrow 0}\frac{P_{ij}(h)}{h} \geq 0$ for $i \neq j$, and $q_{ii} \leq 0$. The elements $q_{ij}$ of $Q$ are derivatives of probabilities and reflect a change in transition probability from state $i$ towards state $j$, which suggests us to call them "rates". Usually, one defines $q_i = -q_{ii} \geq 0$. Then, $\sum_{j=1}^{N} |q_{ij}| = 2q_i$, which demonstrates that $Q$ is bounded if and only if the rates $q_i$ are bounded. Karlin and Taylor (1981, p. 140) show that $q_{ij}$ is always finite. For finite-state Markov processes, the $q_j$ are finite (since the $q_{ij}$ are finite), but, in general, $q_j$ can be infinite. If $q_j = \infty$, the state is called instantaneous because, when the process enters this state, it immediately leaves the state. In the sequel, we confine the discussion to non-instantaneous states, thus $0 \leq q_j < \infty$. Continuous-time Markov chains with all states non-instantaneous are coined conservative.

¹ On a higher level of abstraction, $P(t)$ can be viewed as a linear operator acting upon the vector space defined by all possible state vectors $s(t)$. The relation $P(t+u) = P(u)P(t)$ is known as the semigroup property. The family of these commuting operators possesses an interesting algebraic structure (see e.g. Schoutens (2000)).
Probabilistically, (10.1) indicates that, for small $h$,

$$\Pr[X(t+h) = j\,|\,X(t) = i] = q_{ij}h + o(h) \qquad (i \neq j)$$
$$\Pr[X(t+h) = i\,|\,X(t) = i] = 1 - q_i h + o(h) \qquad (10.8)$$

which clearly generalizes the Poisson process (see Theorem 7.3.1) and motivates us to call $q_i$ the rate corresponding to state $i$.

Lemma 10.2.2 Given the infinitesimal generator $Q$, the transition probability matrix $P(t)$ is differentiable for all $t \geq 0$,

$$P'(t) = P(t)Q \qquad (10.9)$$
$$P'(t) = QP(t) \qquad (10.10)$$

These equations are called the forward (10.9) and backward (10.10) equation.

Proof: For $t = 0$, the lemma follows from the existence of $Q = P'(0)$. The derivative $P'(t)$ is defined, for $t > 0$, as

$$P'(t) = \lim_{h\to 0}\frac{P(t+h) - P(t)}{h}$$

where the derivative of the matrix has elements $P'_{ij}(t) = \frac{dP_{ij}(t)}{dt}$. Using (10.3),

$$P(t+h) - P(t) = P(t)P(h) - P(t) = P(t)\left(P(h) - I\right)$$
$$P(t+h) - P(t) = P(h)P(t) - P(t) = \left(P(h) - I\right)P(t)$$

we obtain

$$P'(t) = P(t)\lim_{h\to 0}\frac{P(h) - I}{h} = P(t)Q$$
$$P'(t) = \lim_{h\to 0}\frac{P(h) - I}{h}\,P(t) = QP(t)$$

which proves the lemma. □

Suppose we are interested in the probabilities $s_k(t) = \Pr[X(t) = k]$ of finding the system in state $k$ at time $t$. Each component of the state vector $s(t)$ is determined by (10.2) as

$$s_k(t+h) = \sum_{j=1}^{N} s_j(t)P_{jk}(h)$$

from which

$$\frac{s_k(t+h) - s_k(t)}{h} = \frac{P_{kk}(h) - 1}{h}\,s_k(t) + \sum_{j=1, j\neq k}^{N}\frac{P_{jk}(h)}{h}\,s_j(t)$$

In the limit $h \downarrow 0$, we find with $q_{jk} = \lim_{h\downarrow 0}\frac{P_{jk}(h)}{h}$ and $q_k = \lim_{h\downarrow 0}\frac{1 - P_{kk}(h)}{h}$ the differential equation for $s_k(t)$,

$$s'_k(t) = -q_k s_k(t) + \sum_{j=1, j\neq k}^{N} q_{jk}\, s_j(t) \qquad (10.11)$$

which, together with the initial condition $s_k(0)$, completely determines the probability $s_k(t)$ that the Markov process is in state $k$ at time $t$.

10.2.2 Algebraic properties of the infinitesimal generator $Q$

Equation (10.10) is a matrix differential equation in $t$ that can be solved similarly to the scalar differential equation $f'(t) = qf(t)$.
With the initial condition (10.5), the solution is

$$P(t) = e^{Qt} \qquad (10.12)$$

which demonstrates the importance of the infinitesimal generator $Q$, explicitly given by

$$Q = \begin{bmatrix} -q_1 & q_{12} & q_{13} & \cdots & q_{1,N-1} & q_{1N} \\ q_{21} & -q_2 & q_{23} & \cdots & q_{2,N-1} & q_{2N} \\ q_{31} & q_{32} & -q_3 & \cdots & q_{3,N-1} & q_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ q_{N-1,1} & q_{N-1,2} & q_{N-1,3} & \cdots & -q_{N-1} & q_{N-1,N} \\ q_{N1} & q_{N2} & q_{N3} & \cdots & q_{N,N-1} & -q_N \end{bmatrix} \qquad (10.13)$$

Moreover, if all eigenvalues $\lambda_k$ of $Q$ are distinct, art. 4 and art. 8 in Appendix A.1 indicate that

$$P(t) = e^{Qt} = X\,\mathrm{diag}\!\left(e^{\lambda_k t}\right) Y^T \qquad (10.14)$$

where $X$ and $Y$ contain as columns the right- and left-eigenvectors of $Q$, respectively. Written explicitly in terms of the right-eigenvectors $x_k$ and left-eigenvectors $y_k$ (which both are $N \times 1$ matrices or column vectors, as common in vector algebra), (10.14) reads

$$P(t) = \sum_{k=1}^{N} e^{\lambda_k t}\, x_k y_k^T$$

where the inner or scalar vector product $y_k^T x_k = 1$, while the outer product $x_k y_k^T$ is an $N \times N$ matrix,

$$x_k y_k^T = \begin{bmatrix} x_{k1}y_{k1} & x_{k1}y_{k2} & x_{k1}y_{k3} & \cdots & x_{k1}y_{kN} \\ x_{k2}y_{k1} & x_{k2}y_{k2} & x_{k2}y_{k3} & \cdots & x_{k2}y_{kN} \\ x_{k3}y_{k1} & x_{k3}y_{k2} & x_{k3}y_{k3} & \cdots & x_{k3}y_{kN} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{kN}y_{k1} & x_{kN}y_{k2} & x_{kN}y_{k3} & \cdots & x_{kN}y_{kN} \end{bmatrix}$$

If we further assume (thus omitting pathological cases) that $P(t)$ is a stochastic, irreducible matrix for any time $t$, Frobenius' Theorem A.4.2 indicates that all eigenvalues obey $\left|e^{\lambda_k t}\right| < 1$ and that only the largest one is precisely equal to 1, say $e^{\lambda_1 t} = 1$, which corresponds to the steady-state eigenvectors $y_1^T = \pi$ and $x_1 = u$, where $u^T = [1\;1\;\cdots\;1]$. Frobenius' Theorem A.4.2 implies that all eigenvalues of $Q$ have a negative real part, except for the steady-state eigenvalue $\lambda_1 = 0$. Hence, we may write

$$P(t) = u\pi + \sum_{k=2}^{N} e^{-|\mathrm{Re}\,\lambda_k|\,t + i\,\mathrm{Im}\,\lambda_k\, t}\, x_k y_k^T \qquad (10.15)$$

where $P_\infty = u\pi$ is the $N \times N$ matrix with each row containing the steady-state vector $\pi$. The expression (10.15) is called the spectral or eigen decomposition of the transition probability matrix $P(t)$.
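The spectral decomposition (10.14) can be checked numerically against a direct series evaluation of $e^{Qt}$; a sketch, with an assumed 3-state generator $Q$ (rows summing to zero) and an arbitrary time $t$:

```python
import numpy as np

# Hypothetical 3-state infinitesimal generator (each row sums to 0).
Q = np.array([[-2.0, 1.5, 0.5],
              [1.0, -3.0, 2.0],
              [0.5, 0.5, -1.0]])
t = 0.7

# Spectral decomposition (10.14): P(t) = X diag(e^{lambda_k t}) Y^T,
# where Y^T = X^{-1} so that the left- and right-eigenvectors satisfy
# the normalization y_k^T x_k = 1.
lam, X = np.linalg.eig(Q)
Y = np.linalg.inv(X)
P_spec = ((X * np.exp(lam * t)) @ Y).real

# Taylor series of e^{Qt} for comparison.
term = np.eye(3)
P_taylor = term.copy()
for k in range(1, 60):
    term = term @ (Q * t) / k
    P_taylor += term

print(np.abs(P_spec - P_taylor).max())  # near machine precision
print(sorted(lam.real))                 # one eigenvalue 0, rest negative
```

The printed eigenvalues confirm the structure behind (10.15): the zero eigenvalue carries the steady-state term $u\pi$, while the eigenvalues with negative real part produce the decaying transients.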
Apart from the eigen decomposition method and the Taylor expansion

$$e^{Qt} = \sum_{k=0}^{\infty}\frac{(Qt)^k}{k!} \qquad (10.16)$$

the matrix equivalent of $e^x = \lim_{n\to\infty}(1 + x/n)^n$ can be used,

$$P(t) = e^{Qt} = \lim_{n\to\infty}\left(I + \frac{Qt}{n}\right)^n \qquad (10.17)$$

Since $Q$ has negative diagonal elements and positive off-diagonal elements, computing the powers $Q^k$ as required in (10.16) suffers from numerical round-off error propagation. Relation (10.17) circumvents this problem by choosing $n$ sufficiently high, $\max_i q_i\, t < n$, such that $I + \frac{Qt}{n}$ has non-negative elements smaller than 1 everywhere. For stochastic matrices $P$, the sequence $P, P^2, P^4, \ldots, P^{2^k}$ converges rapidly. Yet another useful representation (10.24) of $P(t)$ is discussed in Section 10.4.1.

10.2.3 Exponential sojourn times

We end this section on properties by proving a remarkable and important characteristic of continuous-time Markov processes.

Theorem 10.2.3 The sojourn times $\tau_j$ of a continuous-time Markov process in a state $j$ are independent, exponential random variables with mean $\frac{1}{q_j}$.

Proof: The independence of the sojourn times follows from the Markov property (see the renewal argument in Section 9.3.2). The exponential sojourn time is proved in two different ways.

1. The proof consists in demonstrating that the sojourn times $\tau_j$ satisfy the memoryless property. In Section 3.2.2, it has been shown that the only continuous distribution that satisfies the memoryless property is the exponential distribution. The event $\{\tau_j \geq t+T\,|\,\tau_j > T\}$, for any $T \geq 0$ and $t \geq 0$, is equivalent to the event $\{X(t+T+u) = j\,|\,X(T+u) = j, X(u) = j\}$. According to the Markov property (9.1) and with (10.1),

$$\Pr[\tau_j \geq t+T\,|\,\tau_j > T] = \Pr[X(t+T+u) = j\,|\,X(T+u) = j, X(u) = j] = \Pr[X(t+T+u) = j\,|\,X(T+u) = j] = P_{jj}(t)$$

which is independent of $T$, illustrating the memoryless property.
Using the definition of conditional probability (2.44),

$$\Pr[\tau_j \geq t+T\,|\,\tau_j > T] = \frac{\Pr[\tau_j \geq t+T]}{\Pr[\tau_j > T]} = P_{jj}(t)$$

which holds for any $T$ and thus also for $T = 0$, where $\Pr[\tau_j > 0] = 1$. The distribution of the sojourn time at state $j$ satisfies

$$\Pr[\tau_j \geq t] = e^{-\alpha_j t} = P_{jj}(t)$$

After differentiation evaluated at $t = 0$, we find $\alpha_j = q_j$.

2. An alternative demonstration of the exponential sojourn times starts by considering, for an initial state $j$, the probability $H_n$ that the process remains in state $j$ during an interval $[0,t]$. The idea is to first sample the continuous-time interval with step $\frac{t}{n}$ and afterwards proceed to the limit $n \to \infty$, which corresponds to a sampling with infinitesimally small step,

$$H_n = \Pr\left[X(0) = j, X\!\left(\tfrac{t}{n}\right) = j, X\!\left(\tfrac{2t}{n}\right) = j, \ldots, X(t) = j\right]$$
$$= \prod_{m=0}^{n-1}\Pr\left[X\!\left(\tfrac{(m+1)t}{n}\right) = j\,\Big|\,X\!\left(\tfrac{mt}{n}\right) = j\right]\Pr[X(0) = j]$$
$$= \left(\Pr\left[X\!\left(\tfrac{t}{n}\right) = j\,\Big|\,X(0) = j\right]\right)^n\Pr[X(0) = j] = \left[P_{jj}\!\left(\tfrac{t}{n}\right)\right]^n\Pr[X(0) = j]$$

where (9.3) and (10.1) are used. For large $n$, $P_{jj}\!\left(\frac{t}{n}\right)$ can be expanded in a Taylor series around the origin,

$$P_{jj}\!\left(\tfrac{t}{n}\right) = P_{jj}(0) + \tfrac{t}{n}\,P'_{jj}(0) + O\!\left(\tfrac{1}{n^2}\right) = 1 - q_j\tfrac{t}{n} + O\!\left(\tfrac{1}{n^2}\right)$$

such that

$$\left[P_{jj}\!\left(\tfrac{t}{n}\right)\right]^n = \exp\left(n\log\left(1 - q_j\tfrac{t}{n} + O\!\left(\tfrac{1}{n^2}\right)\right)\right)$$

For large $n$, the logarithm can be expanded to first order as

$$\log\left(1 - q_j\tfrac{t}{n} + O\!\left(\tfrac{1}{n^2}\right)\right) = -q_j\tfrac{t}{n} + O\!\left(\tfrac{1}{n^2}\right)$$

which shows that

$$\lim_{n\to\infty}\left[P_{jj}\!\left(\tfrac{t}{n}\right)\right]^n = e^{-q_j t}$$

On the other hand,

$$\lim_{n\to\infty} H_n = \Pr[X(u) = j,\; 0 \leq u \leq t]$$

Hence, the probability that the process remains in state $j$ at least for a duration $t$ equals

$$\Pr[X(u) = j,\; 0 \leq u \leq t] = e^{-q_j t}\Pr[X(0) = j]$$

Conditioned on the initial state with (2.44),

$$\Pr[X(u) = j,\; 0 \leq u \leq t\,|\,X(0) = j] = \Pr[\tau_j \geq t] = e^{-q_j t} \qquad (10.18)$$

Without resorting to the memoryless property, Theorem 10.2.3 has been proved. □

In summary, the continuous-time Markov process $\{X(t), t \in T\}$ can be described in two equivalent ways: either by the transition probability matrix $P(t)$ or by the infinitesimal generator $Q$.
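The sampling idea in the second proof can be illustrated empirically: discretize time in steps $\Delta t$ and let the process leave state $j$ in each step with probability $q_j \Delta t$, as (10.8) prescribes; the number of steps until departure is then geometric, and the resulting sojourn time is approximately exponential with mean $1/q_j$. The rate $q_0$, the step $\Delta t$ and the sample size below are assumed values for illustration only.

```python
import numpy as np

q0 = 2.0        # assumed rate out of state 0
dt = 1e-3       # sampling step, small relative to 1/q0
rng = np.random.default_rng(7)

# Number of dt-steps until the first departure is geometric with
# success probability q0*dt; the sojourn time is steps*dt.
steps = rng.geometric(q0 * dt, size=50_000)
samples = steps * dt

print(samples.mean())   # close to 1/q0 = 0.5
print(samples.std())    # close to 0.5: for an exponential, std = mean
```

That the empirical standard deviation matches the mean is the signature of the exponential distribution, consistent with Theorem 10.2.3.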
In the first description, the process starts at time $t = t_0 = 0$ in state $x_0$, where it stays until a transition occurs at $t = t_1$, which makes the process jump to state $x_1$. In state $x_1$, the process stays until $t = t_2$, at which time it jumps to state $x_2$, and so on. The sequence of states $x_0, x_1, x_2, \ldots$ is a discrete Markov process and is called the embedded Markov chain. The embedded Markov chain is further discussed in Section 10.4. The infinitesimal description based on $Q$ formulates the evolution of the process in terms of rates. The process waits in a state $j$ until a jump or trigger occurs with rate $q_j$, and the average waiting time in state $j$ is $\frac{1}{q_j}$. If $q_j = 0$, the Markov process stays infinitely long in state $j$, implying that state $j$ is an absorbing state.

10.3 Steady-state

Theorems 9.3.4 and 9.3.6 demonstrate that, when a finite-state Markov chain is irreducible (all states communicate and $P_{ij}(t) > 0$), the steady-state exists. Since, by definition, the steady-state does not change over time, or $\lim_{t\to\infty} P'(t) = 0$, it follows from (10.9) and (10.10) that

$$QP_\infty = P_\infty Q = 0$$

where $\lim_{t\to\infty} P(t) = P_\infty$. This relation implies that $P_\infty$ is the adjoint matrix of $Q$ belonging to the eigenvalue $\lambda = 0$, which plays a role analogous to $\lambda = 1$ in the discrete case. By the same arguments as in the discrete case and as shown in Section 10.2.2, all rows of $P_\infty$ are proportional to the eigenvector of $Q$ belonging to $\lambda = 0$. Thus, the steady-state (row) vector $\pi$ is the solution of

$$\pi Q = 0 \qquad (10.19)$$

which means that $\pi$ is orthogonal to any column vector of $Q$, such that necessarily $\det Q = 0$ in order for a non-zero solution to exist. A single component of $\pi$ in (10.19) obeys, using (10.7),

$$\pi_i q_i = \sum_{j=1, j\neq i}^{N} \pi_j q_{ji} \qquad (10.20)$$

This equation has a continuity or conservation law interpretation. The left-hand side reflects the long-run rate at which the process leaves state $i$.
The right-hand side is the sum of the long-run rates of transitions towards state $i$ from all other states $j \neq i$, or the aggregate long-run rate into state $i$. The in- and outward flux at any state $i$ are, in the steady-state, precisely in balance. Therefore, the relations (10.20) are called the balance equations. The balance equation (10.20) directly follows from the differential equation (10.11) of the state probabilities $s_k(t)$, since $\lim_{t\to\infty} s_k(t) = \pi_k$ and $\lim_{t\to\infty} s'_k(t) = 0$. Alternatively, the steady-state vector obeys (10.2), or

$$\pi = s(0)P_\infty = \lim_{t\to\infty} s(0)e^{Qt}$$

which, together with (10.14), implies that all eigenvalues of $Q$ must have a negative real part, such that only $\lambda = 0$ determines the steady-state. This stability condition on the eigenvalues corresponds to that of a linear, time-invariant system. Since all rows in $P_\infty$ are equal (see also (10.15)), the dependence of the steady-state vector $\pi$ on the initial state drops out. For, analogous to the discrete-time case and recalling the normalization $\|s(0)\|_1 = 1$, a single component becomes

$$\pi_j = \sum_{k=1}^{N} s_k(0)(P_\infty)_{kj} = (P_\infty)_{1j}\sum_{k=1}^{N} s_k(0) = (P_\infty)_{1j}$$

10.4 The embedded Markov chain

The main difference between discrete and continuous-time Markov chains lies, apart from the concept of time, in the determination of the number of transitions. The sojourn time in a discrete chain is deterministic, and all times are equal to 1. In other words, if $F_{ij}(t)$ denotes the distribution function of the time until a transition from state $i$ to state $j$ occurs, then it is plain that, for a discrete-time process,

$$F_{ij}(t) = 1_{t \geq 1}$$

Even though the process remains in state $j$ with probability $P_{jj}$, there has been a transition after precisely $t = 1$ time units. On the other hand, Theorem 10.2.3 demonstrates that the sojourn times in state $j$ are exponentially distributed with mean $\frac{1}{q_j}$. After, on average, $\frac{1}{q_j}$ time units, a transition from state $j$ to another state occurs.
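Before developing the embedded chain, the steady-state relations (10.19)-(10.20) of Section 10.3 lend themselves to a direct numerical check; a sketch with an assumed 3-state generator:

```python
import numpy as np

# Hypothetical generator Q (rows sum to 0, off-diagonal rates >= 0).
Q = np.array([[-1.0, 0.4, 0.6],
              [0.5, -0.9, 0.4],
              [0.2, 0.8, -1.0]])
N = Q.shape[0]

# Solve pi Q = 0 with ||pi||_1 = 1 by appending the normalization row.
A = np.vstack([Q.T, np.ones(N)])
b = np.zeros(N + 1); b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Verify the balance equations (10.20) component by component:
# the flux out of state i equals the aggregate flux into state i.
for i in range(N):
    rate_out = pi[i] * (-Q[i, i])
    rate_in = sum(pi[j] * Q[j, i] for j in range(N) if j != i)
    assert abs(rate_out - rate_in) < 1e-10
print(pi)
```

The loop makes the conservation-law reading of (10.20) explicit: in the steady-state, each state's outgoing probability flux is exactly balanced by its incoming flux.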
In contrast to discrete-time Markov chains, after the exponentially distributed sojourn time in state $j$, the process makes a transition to another state $i \neq j$. Let us investigate this fact in more detail. Let us denote

$$V_{ij}(h) = \Pr[X(h) = j\,|\,X(h) \neq i, X(0) = i]$$

which describes the probability that, if a transition occurs, the process moves from state $i$ to a different state $j \neq i$. Using the definition of conditional probability (2.44),

$$V_{ij}(h) = \frac{\Pr[\{X(h) = j\} \cap \{X(h) \neq i\}\,|\,X(0) = i]}{\Pr[X(h) \neq i\,|\,X(0) = i]} = \frac{P_{ij}(h)}{1 - P_{ii}(h)}$$

In the limit $h \downarrow 0$, we have

$$V_{ij} = \lim_{h\downarrow 0} V_{ij}(h) = \lim_{h\downarrow 0}\frac{\frac{P_{ij}(h)}{h}}{\frac{1 - P_{ii}(h)}{h}} = \frac{q_{ij}}{q_i}$$

By (10.7), we see that $\sum_{j=1, j\neq i}^{N} V_{ij} = 1$, demonstrating that, given a transition, it is a transition out of state $i$ to another state $j$. The quantities $V_{ij}$ correspond to the transition probabilities of the embedded Markov chain. Alternatively, we can write the rate $q_{ij}$ in terms of the transition probabilities $V_{ij}$ of the embedded Markov chain as

$$q_{ij} = q_i V_{ij} \qquad (10.21)$$

Since $q_i$ is the rate (i.e. the number of transitions per unit time) of the process in state $i$, relation (10.21) shows that the transition rate $q_{ij}$ from state $i$ to state $j$ equals the rate of transitions in state $i$ multiplied by the probability that a transition from state $i$ to state $j$ occurs. By definition, $V_{ii} = 0$. For, if we assume that $V_{ii} > 0$, relation (10.21) would result in $q_{ii} = V_{ii} q_i > 0$, which contradicts the definition $q_{ii} = -q_i$. Hence, in the embedded Markov chain specified by the transition probability matrix $V$, there are no self-transitions ($V_{ii} = 0$), which is equivalent to the fact that the sum of the eigenvalues of $V$ is zero (A.7), since $\mathrm{trace}(V) = 0$. From the steady-state or balance equation (10.20), (10.21) and $V_{ii} = 0$, we observe that

$$\pi_i q_i = \sum_{j=1}^{N} \pi_j q_j V_{ji}$$

On the other hand, the embedded Markov chain has a steady-state vector $v$ that obeys (9.22) or (9.23),

$$v_i = \sum_{j=1}^{N} v_j V_{ji}$$

and $\|v\|_1 = 1$.
The relations between the steady-state vector $\pi$ of the continuous-time Markov chain and that of its corresponding embedded discrete-time Markov chain, $v$, are

$$v_i = \frac{\pi_i q_i}{\sum_{j=1}^{N}\pi_j q_j} \qquad (10.22)$$

$$\pi_i = \frac{v_i/q_i}{\sum_{j=1}^{N} v_j/q_j} \qquad (10.23)$$

The classification in the discrete-time case into transient and recurrent states can be transferred via the embedded Markov chain to continuous Markov processes.

10.4.1 Uniformization

The restriction $V_{ii} = 0$, which means that there are no self-transitions from a state into itself, can be removed. Indeed, we can rewrite the basic relation (10.12) between the transition probability matrix $P(t)$ and the infinitesimal generator $Q$, for all $\alpha \geq \max_i q_i$, as

$$P(t) = \exp\left(-\alpha t I + \alpha t\left(I + \frac{Q}{\alpha}\right)\right) = e^{-\alpha t}\exp\left(\alpha t\left(I + \frac{Q}{\alpha}\right)\right)$$

Defining $W(\alpha) = I + \frac{Q}{\alpha}$ with $\alpha \geq \max_i q_i$, a description alternative to (10.15), (10.16) and (10.17) appears,

$$P_{ij}(t) = e^{-\alpha t}\sum_{k=0}^{\infty}\frac{(\alpha t)^k}{k!}\,W^k_{ij}(\alpha) \qquad (10.24)$$

where $W(\alpha)$ is a stationary transition probability matrix and, hence, a stochastic matrix. We also observe that $\alpha W(\alpha) = Q + \alpha I$ can be regarded as a rate matrix, with the property that, for each state $i$,

$$\sum_{j=1}^{N}\alpha W_{ij}(\alpha) = \sum_{j=1}^{N} Q_{ij} + \sum_{j=1}^{N}\alpha\delta_{ij} = \alpha$$

the transition rate in any state $i$ is precisely the same, equal to $\alpha$. Whereas the embedded Markov chain defined by (10.21) has no self-transitions ($V_{ii} = 0$), we see, for any $i$ and $j$, that $W_{ii}(\alpha) = 1 - \frac{1}{\alpha}\sum_{j=1, j\neq i}^{N} q_{ij} = 1 - \frac{q_i}{\alpha} \geq 0$, while $W_{ij}(\alpha) = \frac{q_{ij}}{\alpha}$. Hence, $W(\alpha)$ can be interpreted as an embedded Markov chain that allows self-transitions. In view of (10.21), the embedded structure of $W(\alpha)$ is summarized as

$$q_{ij} = \alpha W_{ij}(\alpha) \quad \text{for } i \neq j$$
$$q_i = \alpha\left(1 - W_{ii}(\alpha)\right)$$

where the constant rate $q_i = \alpha$ for any state $i$ is, besides self-transitions $q_{ii} \neq 0$, the characterizing property. These properties also reveal that, starting from the embedded chain $V$ where $V_{ii} = 0$, we can add self-transitions $q_{ii} > 0$ with the effect that, based on (10.7), the transition rate $q_i \to q_i + q_{ii}$.
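The uniformization formula (10.24) is also a practical way to compute $P(t)$, since it sums non-negative Poisson-weighted powers of the stochastic matrix $W(\alpha)$ and therefore avoids the sign cancellations of the raw Taylor series (10.16); a sketch with an assumed 3-state generator:

```python
import numpy as np

# Hypothetical generator (rows sum to 0); alpha = max_i q_i = 3.
Q = np.array([[-2.0, 1.5, 0.5],
              [1.0, -3.0, 2.0],
              [0.5, 0.5, -1.0]])
t = 0.7
alpha = max(-Q[i, i] for i in range(3))
W = np.eye(3) + Q / alpha          # stochastic matrix W(alpha)

# Sum e^{-alpha t} (alpha t)^k / k! * W^k, per (10.24).
Pt = np.zeros((3, 3))
poisson_w = np.exp(-alpha * t)     # Pr[N(t) = 0]
Wk = np.eye(3)
for k in range(100):
    Pt += poisson_w * Wk
    Wk = Wk @ W
    poisson_w *= alpha * t / (k + 1)   # next Poisson weight

print(Pt.sum(axis=1))              # each row sums to ~1
```

Every term in the sum is non-negative, which mirrors the probabilistic interpretation: the $k$-th term is the probability of reaching $j$ from $i$ in exactly $k$ steps of the uniformized chain, weighted by the Poisson probability of $k$ transitions in $[0,t]$.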
(The accompanying figure, not reproduced here, shows a six-state embedded Markov chain with self-transitions and its corresponding transition probability matrix $V$, where the transition rates $q_i$ follow from $\sum_{j=1}^{6} V_{ij} = 1$ with $V_{ij} = \frac{q_{ij}}{q_i}$.)

This change in transition rate will change the steady-state vector, since the balance equations (10.20) change. However, the Markov process $\{X(t), t \geq 0\}$ is not modified, because a self-transition changes neither $X(t)$ nor the distribution of the time until the next transition to a different state. But self-transitions clearly change the number of transitions during some period of time. When the transition rates $q_j$ at each state $j$ are the same, the embedded Markov chain $W(\alpha)$ is called a uniformized chain.

In addition, in a uniformized chain, the steady-state vector $w(\alpha)$ of $W(\alpha)$ is the same as the steady-state vector $\pi$. Indeed, from (9.23),

$$w_j(\alpha) = \sum_{k=1}^{N} W_{kj}(\alpha)\, w_k(\alpha)$$

we have, with $W(\alpha) = I + \frac{Q}{\alpha}$,

$$w_j(\alpha) = \sum_{k=1}^{N}\left(\delta_{kj} + \frac{q_{kj}}{\alpha}\right)w_k(\alpha) = w_j(\alpha) + \frac{1}{\alpha}\sum_{k=1}^{N} w_k(\alpha)\, q_{kj}$$

or,

$$w_j(\alpha)\, q_j = \sum_{k=1, k\neq j}^{N} w_k(\alpha)\, q_{kj}$$

where $w_k(\alpha) = \pi_k$ (independent of $\alpha$) since it satisfies the balance equation (10.20), and Theorem 9.3.5 assures that the steady-state of a positive recurrent chain is unique.

We will now interpret (10.24) probabilistically. Let $N(t)$ denote the total number of transitions in $[0,t]$ in the uniformized (discrete) process $\{X_k(\alpha)\}$. Since the transition rates $q_i = \alpha$ are all the same, $N(t)$ is a Poisson process with rate $\alpha$ because, for any continuous-time Markov chain, the inter-transition or sojourn times are i.i.d. exponential random variables. Thus, $\Pr[N(t) = k] = e^{-\alpha t}\frac{(\alpha t)^k}{k!}$ is recognized as the probability that the number of transitions that occur in $[0,t]$ in the uniformized Markov chain with rate $\alpha$ equals $k$.
With (9.10), $W^k_{ij}(\alpha) = \Pr[X_k(\alpha) = j\,|\,X_0(\alpha) = i]$ is the $k$-step transition probability of the discrete uniformized Markov process $\{X_k(\alpha)\}$. Relation (10.24) can be interpreted as

$$P_{ij}(t) = \sum_{k=0}^{\infty}\Pr[X_k(\alpha) = j\,|\,X_0(\alpha) = i, N(t) = k]\Pr[N(t) = k]$$

or, the probability that the continuous Markov process moves from state $i$ to state $j$ in a time interval of length $t$ can be decomposed into an infinite sum of probabilities. Each probability corresponds to a transition from state $i$ to state $j$ in $k$ steps, where the number of intermediate transitions $k$ is a Poisson counting process with rate $\alpha$.

10.4.2 A sampled-time Markov chain

The sampled-time Markov chain approximates the continuous Markov process in that the transition probabilities $P_{ij}(t)$ are expanded to first order as in (10.8) with fixed step $h = \Delta t$. The transition probabilities of the sampled-time Markov chain are

$$P_{ij} = q_{ij}\Delta t \quad (i \neq j)$$
$$P_{ii} = 1 - q_i\Delta t$$

Clearly, the sampled-time Markov chain also allows self-transitions, as illustrated in Fig. 10.1.

Fig. 10.1. A continuous-time Markov process and its corresponding sampled-time Markov chain. (In the figure, the rates $q_{ij}$ of the continuous-time process become transition probabilities $q_{ij}\Delta t$, and each state acquires a self-loop with probability $1 - q_i\Delta t$.)

From (10.8), we observe that the approximation lies in two facts: (a) $\Delta t$ is fixed, such that $q_{ij}\Delta t \approx \Pr[X(t+\Delta t) = j\,|\,X(t) = i]$ is increasingly accurate as $\Delta t \to 0$, and (b) transitions occur at discrete times, every $\Delta t$ time units. The sampling step $\Delta t$ should be chosen such that the transition probabilities obey $0 \leq P_{ij} \leq 1$, from which we find that $\Delta t \leq \frac{1}{\max_i q_i}$.

Let $v$ denote the steady-state vector of the sampled-time Markov chain with $\|v\|_1 = 1$.
Being a discrete Markov chain, the steady-state vector components $v_j$ satisfy (9.23) for each component $j$,

$$v_j = \sum_{k=1}^{N} P_{kj}\, v_k = \Delta t\sum_{k=1, k\neq j}^{N} q_{kj}\, v_k + (1 - q_j\Delta t)\,v_j$$

or

$$q_j v_j = \sum_{k=1, k\neq j}^{N} q_{kj}\, v_k$$

By comparison with the balance equation (10.20) and by the uniqueness of the steady-state (Theorem 9.3.5), we observe that $v = \pi$: the steady-state of the sampled-time Markov chain is exactly (not approximately) equal to the steady-state of the continuous Markov chain, for any sampling step $\Delta t \leq \frac{1}{\max_i q_i}$. Although, by sampling every $\Delta t$ time units, we can possibly miss the smaller-scale dynamics of the continuous Markov chain, the long-run behavior or steady-state is exactly captured!

10.5 The transitions in a continuous-time Markov chain

Based on the embedded Markov chain, there exists a framework that deduces all properties of the continuous Markov chain. In particular, the exponential sojourn time of a continuous-time Markov chain (Theorem 10.2.3) is postulated as a defining characteristic.

Theorem 10.5.1 Let $V_{ij}$ denote the transition probabilities of the embedded Markov chain and $q_{ij}$ the rates of the infinitesimal generator. The transition probabilities of the corresponding continuous-time Markov chain are found as

$$P_{ij}(t) = \delta_{ij}e^{-q_i t} + q_i\sum_{k\neq i} V_{ik}\int_0^t e^{-q_i u}P_{kj}(t-u)\,du \qquad (10.25)$$

Proof: If $i$ is an absorbing state ($q_i = 0$), then, by definition, $P_{ij}(t) = \delta_{ij}$ for all $t \geq 0$. For a non-absorbing state $i$ and a process starting from state $i$, the event $\{\tau \leq t, X(\tau) = k\} \cap \{X(t) = j\}$ is possible if and only if the first transition, from $i$ to $k$, occurs at some time $u \in [0,t]$ and the next transition, from $k$ to $j$, takes place in the remaining time $t-u$.
The probability density function of the sojourn time is $f_i(t) = \frac{d}{dt}\Pr[\tau_i \leq t] = q_i e^{-q_i t}$ for $t \geq 0$ and, for infinitesimally small $\varepsilon$, we have

$$p = \Pr[\tau \leq t, X(\tau) = k, X(t) = j\,|\,X(0) = i]$$
$$= \int_0^t du\,\Pr[\tau = u, X(u-\varepsilon) = i\,|\,X(0) = i]\,\Pr[X(u) = k\,|\,X(u-\varepsilon) = i]\,\Pr[X(t) = j\,|\,X(u) = k]$$
$$= \int_0^t du\, f_i(u)\,V_{ik}\,P_{kj}(t-u) = V_{ik}\int_0^t q_i e^{-q_i u}P_{kj}(t-u)\,du$$

Furthermore,

$$\Pr[\tau \leq t \text{ and } X(t) = j\,|\,X(0) = i] = \sum_{k\neq i}\Pr[\tau \leq t, X(\tau) = k, X(t) = j\,|\,X(0) = i]$$

and

$$\Pr[\tau > t \text{ and } X(t) = j\,|\,X(0) = i] = \delta_{ij}\Pr[\tau_i > t] = \delta_{ij}e^{-q_i t}$$

Finally,

$$P_{ij}(t) = \Pr[X(t) = j\,|\,X(0) = i] = \Pr[\tau \leq t \text{ and } X(t) = j\,|\,X(0) = i] + \Pr[\tau > t \text{ and } X(t) = j\,|\,X(0) = i]$$

Combining all the above relations into the last one proves the theorem. □

By a change of variable $s = t - u$ in (10.25), we have

$$P_{ij}(t) = \delta_{ij}e^{-q_i t} + q_i\sum_{k\neq i}V_{ik}\,e^{-q_i t}\int_0^t e^{q_i s}P_{kj}(s)\,ds$$

and, after differentiation with respect to $t$, we find, for $t \geq 0$,

$$P'_{ij}(t) = -q_i\delta_{ij}e^{-q_i t} - q_i\sum_{k\neq i}V_{ik}\,q_i e^{-q_i t}\int_0^t e^{q_i s}P_{kj}(s)\,ds + q_i\sum_{k\neq i}V_{ik}P_{kj}(t)$$
$$= -q_i\delta_{ij}e^{-q_i t} - q_i\left(P_{ij}(t) - \delta_{ij}e^{-q_i t}\right) + q_i\sum_{k\neq i}V_{ik}P_{kj}(t) = -q_i P_{ij}(t) + q_i\sum_{k\neq i}V_{ik}P_{kj}(t)$$

Evaluated at $t = 0$, recalling that $P'(0) = Q$ and $P(0) = I$,

$$P'_{ij}(0) = -q_i P_{ij}(0) + q_i\sum_{k\neq i}V_{ik}P_{kj}(0)$$
$$q_{ij} = -q_i\delta_{ij} + q_i\sum_{k\neq i}V_{ik}\delta_{kj} = -q_i\delta_{ij} + q_i V_{ij}$$

which is precisely relation (10.21). With $q_i = -q_{ii}$ and (10.21), we arrive at

$$P'_{ij}(t) = \sum_{k=1}^{N} q_{ik}P_{kj}(t)$$

which is precisely the backward equation (10.10). Hence, (10.25) can be interpreted as an integrated form of the backward equation and thus of the entire continuous-time Markov process.

10.6 Example: the two-state Markov chain in continuous-time

The continuous-time two-state Markov chain is defined by the infinitesimal generator

$$Q = \begin{bmatrix} -\alpha & \alpha \\ \beta & -\beta \end{bmatrix}$$

where $\alpha, \beta \geq 0$. We will solve $P(t)$ from the forward equation (10.9),

$$\begin{bmatrix} P'_{11}(t) & P'_{12}(t) \\ P'_{21}(t) & P'_{22}(t) \end{bmatrix} = \begin{bmatrix} P_{11}(t) & P_{12}(t) \\ P_{21}(t) & P_{22}(t) \end{bmatrix}\begin{bmatrix} -\alpha & \alpha \\ \beta & -\beta \end{bmatrix}$$

which actually contains two independent transition probabilities, because $P_{12}(t) = 1 - P_{11}(t)$ and $P_{21}(t) = 1 - P_{22}(t)$.
The forward equation simplifies to

$$P'_{11}(t) = -(\alpha+\beta)P_{11}(t) + \beta$$
$$P'_{22}(t) = -(\alpha+\beta)P_{22}(t) + \alpha$$

Only the first equation needs to be solved since, by symmetry, the solution of $P_{22}(t)$ equals that of $P_{11}(t)$ after changing the role of $\alpha \to \beta$ and $\beta \to \alpha$. The solution of the linear, first-order, non-homogeneous differential equation consists of the solution to the corresponding homogeneous differential equation and a particular solution. The solution of the homogeneous differential equation, $P'_{11}(t) = -(\alpha+\beta)P_{11}(t)$, is $P_{11}(t) = Ce^{-(\alpha+\beta)t}$. The particular solution is generally found by variation of the constant $C$, which proposes $P_{11}(t) = C(t)e^{-(\alpha+\beta)t}$ as general solution, where $C(t)$ needs to satisfy the original differential equation. Hence, $C'(t) = \beta e^{(\alpha+\beta)t}$ or, after integration, $C(t) = \frac{\beta}{\alpha+\beta}e^{(\alpha+\beta)t} + c$. The integration constant $c$ follows from the initial condition (10.5), $P_{11}(0) = 1$. Finally, we arrive at

$$P_{11}(t) = \frac{\beta}{\alpha+\beta} + \frac{\alpha}{\alpha+\beta}\,e^{-(\alpha+\beta)t}$$
$$P_{22}(t) = \frac{\alpha}{\alpha+\beta} + \frac{\beta}{\alpha+\beta}\,e^{-(\alpha+\beta)t}$$

from which the steady-state vector is immediate,

$$\pi = \left[\frac{\beta}{\alpha+\beta}\;\;\frac{\alpha}{\alpha+\beta}\right]$$

10.7 Time reversibility

In this section, we consider only ergodic Markov chains that have a non-zero steady-state distribution $\pi$. Suppose the Markov process operates already in the steady-state or, in other words, the Markov process is stationary. We are interested in the time-reversed process defined by the sequence $X_n, X_{n-1}, \ldots$ We will show that this reversed-time sequence again constitutes a Markov process.

Theorem 10.7.1 The time-reversed Markov process is a Markov chain.
Proof: It suffices to demonstrate that the time-reversed process satisfies the Markov property

$$\Pr[X_n = x_n\,|\,X_{n+1} = x_{n+1}, \ldots, X_{n+k} = x_{n+k}] = \Pr[X_n = x_n\,|\,X_{n+1} = x_{n+1}]$$

By definition of the conditional probability (2.44),

$$R = \Pr[X_n = x_n\,|\,X_{n+1} = x_{n+1}, X_{n+2} = x_{n+2}, \ldots, X_{n+k} = x_{n+k}] = \frac{\Pr\left[\bigcap_{m=0}^{k}\{X_{n+m} = x_{n+m}\}\right]}{\Pr\left[\bigcap_{m=1}^{k}\{X_{n+m} = x_{n+m}\}\right]}$$

Since the intersection is commutative, $A \cap B = B \cap A$, the indices can be reversed. The original stationary process is a Markov process that satisfies (9.2). Using (9.2) and (9.3), we have

$$\Pr\left[\bigcap_{m=0}^{k}\{X_{n+m} = x_{n+m}\}\right] = \Pr[X_n = x_n]\prod_{m=1}^{k}\Pr[X_{n+m} = x_{n+m}\,|\,X_{n+m-1} = x_{n+m-1}]$$

and, similarly,

$$\Pr\left[\bigcap_{m=1}^{k}\{X_{n+m} = x_{n+m}\}\right] = \Pr[X_{n+1} = x_{n+1}]\prod_{m=2}^{k}\Pr[X_{n+m} = x_{n+m}\,|\,X_{n+m-1} = x_{n+m-1}]$$

Hence, after cancelling the common factors,

$$R = \frac{\Pr[X_{n+1} = x_{n+1}\,|\,X_n = x_n]\Pr[X_n = x_n]}{\Pr[X_{n+1} = x_{n+1}]}$$

Applying Bayes' rule (2.48) to the last relation finally proves the theorem. □

Consider the transition probability of the time-reversed Markov process,

$$R_{ij} = \Pr[X_n = j\,|\,X_{n+1} = i]$$

With Bayes' rule (2.48),

$$\Pr[X_n = j\,|\,X_{n+1} = i] = \frac{\Pr[X_{n+1} = i\,|\,X_n = j]\Pr[X_n = j]}{\Pr[X_{n+1} = i]}$$

and, since the process is stationary, $\Pr[X_n = j] = \pi_j$ and $\Pr[X_{n+1} = i] = \pi_i$, the transition probability of the time-reversed process is

$$R_{ij} = \frac{\pi_j P_{ji}}{\pi_i} \qquad (10.26)$$

A Markov chain is said to be time reversible if, for all $i$ and $j$, $P_{ij} = R_{ij}$. From (10.26), the condition for time reversibility is

$$\pi_i P_{ij} = \pi_j P_{ji} \qquad (10.27)$$

This condition means that, for all states $i$ and $j$, the rate $\pi_i P_{ij}$ from state $i \to j$ equals the rate $\pi_j P_{ji}$ from state $j \to i$.
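The detailed-balance condition (10.27) is straightforward to test numerically; a sketch with two assumed 3-state chains, one of birth-death (tridiagonal) type, which always satisfies (10.27), and one with a dominant directed cycle, which does not:

```python
import numpy as np

def steady_state(P):
    """Solve pi P = pi with ||pi||_1 = 1."""
    N = P.shape[0]
    A = np.vstack([P.T - np.eye(N), np.ones(N)])
    b = np.zeros(N + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def is_reversible(P, tol=1e-10):
    """Time reversibility (10.27): the flow matrix F_ij = pi_i P_ij
    must be symmetric."""
    pi = steady_state(P)
    F = pi[:, None] * P
    return bool(np.abs(F - F.T).max() < tol)

birth_death = np.array([[0.7, 0.3, 0.0],
                        [0.4, 0.2, 0.4],
                        [0.0, 0.5, 0.5]])
cycle = np.array([[0.1, 0.9, 0.0],
                  [0.0, 0.1, 0.9],
                  [0.9, 0.0, 0.1]])

print(is_reversible(birth_death), is_reversible(cycle))  # True False
```

For the cyclic chain, probability flows around the loop $1 \to 2 \to 3 \to 1$ in one direction only, so $\pi_i P_{ij} \neq \pi_j P_{ji}$; reversing time would make the cycle run the other way, a visibly different process.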
An interesting property of time-reversible Markov chains is that any vector $x$ satisfying $\|x\|_1 = 1$ and
$$x_i P_{ij} = x_j P_{ji}$$
is the steady-state vector of the time-reversible Markov chain. Indeed, summing over all $i$,
$$\sum_i x_i P_{ij} = x_j\sum_i P_{ji} = x_j$$
Theorem 9.3.5 indicates that the steady-state is unique and, thus, $x = \pi$. As a side remark, we note that a transition matrix equals its transpose, $P = P^T$, only if the Markov process is time reversible and doubly stochastic (i.e. $\pi_i = \frac{1}{N}$ for all $i$, as shown in Appendix A.5.1).

The continuous-time analogon can be deduced immediately from the discrete-time embedded Markov chain defined by the transition probabilities $V_{ij}$. Let $U_{ij}$ denote the transition probabilities of the time-reversed embedded Markov chain and $r_{ij}$ the rates of the corresponding continuous Markov chain; then, by (10.21),
$$r_{ij} = r_i U_{ij} \qquad (10.28)$$
We will now show that the sojourn times of the time-reversed continuous Markov process are indeed exponential random variables. Assume that the time-reversed process is in state $i$ at time $t$. The probability that the process is still in state $i$ at reversed time $t - u$ is, using Theorem 10.2.3,
$$\Pr[X(\tau) = i,\, t-u \le \tau \le t \mid X(t) = i] = \frac{\Pr[X(\tau) = i,\, t-u \le \tau \le t]}{\Pr[X(t) = i]} = \frac{\Pr[X(t-u) = i]\,e^{-q_i u}}{\Pr[X(t) = i]} = e^{-q_i u}$$
because, in the steady-state $t \to \infty$, $\Pr[X(t-u) = i] = \Pr[X(t) = i] = \pi_i$ for any finite $u$. Thus, the sojourn time in state $i$ of the time-reversed process is exponentially distributed with precisely the same rate $r_i = q_i$ as in the forward-time process. The steady-state vector $v$ of the embedded Markov chain can be written in terms of the steady-state vector $\pi$ of the continuous Markov chain via (10.22).
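The detailed-balance property above is easy to verify numerically. The Python sketch below (not from the text; the band matrix and its entries are arbitrarily chosen for illustration) constructs a chain satisfying $x_i P_{ij} = x_j P_{ji}$ and checks that $xP = x$.

```python
# a hypothetical 4-state birth-death-type chain (values for illustration)
p = [0.5, 0.4, 0.3]          # up-probabilities, j -> j+1
q = [0.2, 0.3, 0.6]          # down-probabilities, j+1 -> j
N = 4
P = [[0.0] * N for _ in range(N)]
for j in range(N - 1):
    P[j][j + 1] = p[j]
    P[j + 1][j] = q[j]
for j in range(N):
    P[j][j] = 1.0 - sum(P[j])   # holding probability r_j

# solve detailed balance x_j P[j][j+1] = x_{j+1} P[j+1][j], then normalize
x = [1.0]
for j in range(N - 1):
    x.append(x[j] * P[j][j + 1] / P[j + 1][j])
s = sum(x)
x = [v / s for v in x]

# x is then the steady-state vector: x P = x
xP = [sum(x[i] * P[i][j] for i in range(N)) for j in range(N)]
assert all(abs(xP[j] - x[j]) < 1e-12 for j in range(N))
```

The check works for any tri-band chain with positive up/down probabilities, because detailed balance between neighbours is exactly the relation $p_j\pi_j = q_{j+1}\pi_{j+1}$ derived for the general random walk in Section 11.2.2.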
By (10.26), we obtain
$$U_{ij} = \frac{v_j V_{ji}}{v_i} = \frac{\pi_j q_j V_{ji}}{\pi_i q_i}$$
With (10.21) and (10.28),
$$\pi_i\frac{r_{ij}}{r_i} = \pi_j\frac{q_{ji}}{q_i}$$
but, since $r_i = q_i$, we finally arrive at
$$\pi_i r_{ij} = \pi_j q_{ji} \qquad (10.29)$$
Comparing (10.29) with the discrete case (10.26), we see that the transition probabilities $P_{ij}$ and $R_{ij}$ are replaced by the rates $q_{ij}$ and $r_{ij}$. We know that $\pi_j$ is the fraction of time that the process (both forward and reversed) spends in state $j$ and that $q_{ij}$ is the rate at which the process makes transitions from state $i$ to state $j$. Equation (10.29) again has a balance interpretation: $\pi_j q_{ji}$ is the rate at which the forward process moves from state $j$ to $i$, while $\pi_i r_{ij}$ is the rate of the time-reversed process from state $i$ to $j$, and both rates are equal. Intuitively, when a process jumps from state $i \to j$ in forward time, it is plain that the process makes, in reversed time, just the opposite transition $j \to i$. Similarly as above, a continuous-time Markov chain is time reversible if, for all $i$ and $j$, it holds that $r_{ij} = q_{ij}$. For these processes (which occur often in practice, as demonstrated in the chapters on queueing), the rate from $i \to j$ equals the rate from $j \to i$, since $\pi_i q_{ij} = \pi_j q_{ji}$.

10.8 Problems

(i) Consider a computer that has two identical and independent processors. The time between failures has an exponential distribution with a mean of 1000 hours. The repair time of a damaged processor is exponentially distributed as well, with a mean of 100 hours. We assume that damaged processors can be repaired in parallel. There are clearly three states for this computer: (1) both processors work, (2) one processor is damaged and (3) both processors are damaged. (a) Draw the continuous-time Markov chain representation of these states. (b) What is the infinitesimal generator matrix $Q$ for this Markov chain? Give the relation between the state probability at time $t$ and its derivative. (c) Calculate the steady-state of this process.
(d) What is the availability of the computer if (i) both processors are required to work, or (ii) at least one processor should work?

(ii) Consider two identical servers working in parallel. When one server fails, the other has to do the whole job alone under a higher load. The failure times of the servers are exponentially distributed, with rate $\lambda_E = 3 \times 10^{-4}\ \mathrm{h}^{-1}$ when the servers are equally loaded and $\lambda_F = 7 \times 10^{-4}\ \mathrm{h}^{-1}$ when one of the servers works under "full load". In addition, both servers may fail at the same time, with a failure rate of $\lambda_B = 6 \times 10^{-5}\ \mathrm{h}^{-1}$. As soon as one of the servers fails, the repair is initiated. The average downtime of a server is $\mu^{-1} = 10$ hours. However, if both servers are damaged, the whole system must be shut down. The average time needed to repair both damaged servers is $\mu_B^{-1} = 20$ hours. (a) Draw the Markov chain of this system. (b) Determine the infinitesimal generator matrix $Q$. (c) Determine the steady-state probabilities. (d) Determine the average lifetime of the different states. (e) What is the average number of server repairs needed during a period of one year?

11 Applications of Markov chains

This chapter illustrates the theory of Markov chains with several examples. Examples of queueing problems are deferred to later chapters. Generally, Markov processes can be solved explicitly provided the transition probability matrix $P$ or the infinitesimal generator $Q$ has a special structure. Only in a very small number of problems is the entire time dependence of the process available in analytic form.

11.1 Discrete Markov chains and independent random variables

This section illustrates some simple Markov chains. Consider a set $\{Y_n\}_{n \ge 1}$ of positive integer, independent random variables that are identically distributed with $\Pr[Y = k] = a_k$.
The discrete-time Markov process defined by $X_n = Y_n$ for $n \ge 1$ possesses the (infinite) transition probability matrix
$$P = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & \cdots \\ a_1 & a_2 & a_3 & a_4 & \cdots \\ a_1 & a_2 & a_3 & a_4 & \cdots \\ a_1 & a_2 & a_3 & a_4 & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{bmatrix}$$
All rows are identical, and $\Pr[X_{n+1} = j \mid X_n = i] = a_j$ shows that the states $X_{n+1}$ and $X_n$ are independent of each other.

Another, more interesting, discrete-time Markov process is defined by $X_n = \max[Y_1, Y_2, Y_3, \ldots, Y_n]$. Hence, the process $X_n$ mirrors the maxima of the first $n$ random variables. Clearly, $X_{n+1} = \max[X_n, Y_{n+1}]$ reflects the Markov property: the next state depends only on the previous state of the process. From $X_{n+1} = \max[X_n, Y_{n+1}]$, we observe that $\Pr[X_{n+1} = j \mid X_n = i] = 0$ if $j < i$, because the maximum does not decrease by adding a new random variable to the list. If $j > i$, the state $j$ is determined by $Y_{n+1} = j$, with probability $a_j$. If $j = i$, then $Y_{n+1} \le i$, which has probability
$$\Pr[Y_{n+1} \le i] = \sum_{k=1}^{i}\Pr[Y_{n+1} = k] = \sum_{k=1}^{i}a_k = A_i$$
The corresponding probability matrix is
$$P = \begin{bmatrix} A_1 & a_2 & a_3 & a_4 & \cdots \\ 0 & A_2 & a_3 & a_4 & \cdots \\ 0 & 0 & A_3 & a_4 & \cdots \\ 0 & 0 & 0 & A_4 & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{bmatrix}$$
A related discrete-time Markov chain is $X_n = \sum_{k=1}^{n}Y_k$, which obeys $X_{n+1} = X_n + Y_{n+1}$. Furthermore, if $j \le i$, $\Pr[X_{n+1} = j \mid X_n = i] = 0$, because the random variables $Y_n$ are strictly positive, such that the sum cannot decrease by adding a new member. If $j > i$, then
$$\Pr[X_{n+1} = j \mid X_n = i] = \Pr[X_n + Y_{n+1} = j \mid X_n = i] = \Pr[Y_{n+1} = j - i] = a_{j-i}$$
The corresponding probability matrix has a Toeplitz structure, possessing the same elements on diagonal lines,
$$P = \begin{bmatrix} 0 & a_1 & a_2 & a_3 & \cdots \\ 0 & 0 & a_1 & a_2 & \cdots \\ 0 & 0 & 0 & a_1 & \cdots \\ 0 & 0 & 0 & 0 & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{bmatrix}$$
This list can be extended by considering other integer functions of the set $\{Y_n\}_{n \ge 1}$ (such as $X_{n+1} = \min[X_n, Y_{n+1}]$ or $X_n = \prod_{k=1}^{n}Y_k$, etc.).
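The structure of the maximum-process transition matrix can be built directly from a distribution $a_k$; the Python sketch below (not from the text; the distribution values are a hypothetical example with finite support) constructs the matrix and verifies that each row sums to one, since $A_i + \sum_{j>i}a_j = 1$.

```python
# a_k for k = 1..4 (a hypothetical distribution of Y with finite support)
a = [0.1, 0.2, 0.3, 0.4]
A = [sum(a[:i + 1]) for i in range(len(a))]   # A_i = a_1 + ... + a_i

# transition matrix of X_n = max[Y_1, ..., Y_n]; 0-based state i means value i+1
K = len(a)
P = [[0.0] * K for _ in range(K)]
for i in range(K):
    P[i][i] = A[i]            # Y_{n+1} <= i+1 keeps the maximum
    for j in range(i + 1, K):
        P[i][j] = a[j]        # Y_{n+1} = j+1 > i+1 raises the maximum
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)   # stochastic matrix
assert P[2][1] == 0.0                                  # maximum never decreases
```

The same loop with `a[j - i - 1]` placed at `P[i][j]` for `j > i` would produce the Toeplitz matrix of the running-sum chain.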
11.2 The general random walk

The general random walk is an important model describing the motion of an item that is constrained to move one step forward, stay at its current position, or move one step backward. In general, this "three-possibility motion" has transition probabilities that depend on the position $j$, as depicted in Fig. 11.1. Figure 11.1 illustrates that, if the process is in state $j$, it has three possible choices: remain in state $j$ with probability $r_j = \Pr[X_{k+1} = j \mid X_k = j]$, move to the next state $j+1$ with probability $p_j = \Pr[X_{k+1} = j+1 \mid X_k = j]$, or jump back to state $j-1$ with probability $q_j = \Pr[X_{k+1} = j-1 \mid X_k = j]$. A general random walk is defined by the $(N+1)\times(N+1)$ band matrix
$$P = \begin{bmatrix} r_0 & p_0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ q_1 & r_1 & p_1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & q_2 & r_2 & p_2 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & q_{N-1} & r_{N-1} & p_{N-1} \\ 0 & 0 & 0 & 0 & \cdots & 0 & q_N & r_N \end{bmatrix} \qquad (11.1)$$
where $p_j > 0$, $q_j > 0$, $r_j \ge 0$ and $q_j + r_j + p_j = 1$ for all $0 < j < N$. The bordering states $0$ and $N$ are special: $p_0 \ge 0$, $r_0 \ge 0$ and $p_0 + r_0 = 1$, while $q_N \ge 0$, $r_N \ge 0$ and $q_N + r_N = 1$.

Fig. 11.1. A transition graph of the general random walk.

The general random walk serves as a model for a number of practical phenomena:

• The one-dimensional motion of physical particles, e.g. electrons that hop from one atom to another. In this case, the number of states $N$ can be very large.

• The gambler's ruin problem: a state $j$ reflects the capital of a gambler, $p_j$ is the chance that the gambler wins, while $q_j$ is the probability that he loses. The gambler achieves his target when he reaches state $N$, but is ruined at state 0. In that case, both states are absorbing states, with $r_0 = r_N = 1$.
In games, most often the probabilities are independent of the state and simplify to $p_j = p$, $q_j = q$ and $r_j = 1 - p - q$.

• The continuous-time counterpart, the birth and death process (Section 11.3), has applications to queueing processes.

For a wealth of examples and applications of the random walk, we refer to the classical treatise of Feller (1970, Chapter III, XIV).

11.2.1 The probability of gambler's ruin

The probability of gambler's ruin is defined as $u_j = \Pr[X_T = 0 \mid X_0 = j]$, where $T = \min\{k : X_k = 0\}$ is the hitting time of state 0, which is equivalent to $u_j = \Pr[T_0 < \infty \mid X_0 = j]$. By definition, $u_0 = 1$ and, since the gambler achieves his goal at state $N$, where he stops and never gets ruined, $u_N = 0$. The law of total probability (2.46) gives the situation after the first transition,
$$\Pr[X_T = 0 \mid X_0 = j] = \sum_{k=0}^{N}\Pr[X_T = 0 \mid X_0 = j, X_1 = k]\Pr[X_1 = k \mid X_0 = j]$$
$$= \sum_{k=0}^{N}\Pr[X_T = 0 \mid X_1 = k]\Pr[X_1 = k \mid X_0 = j] \qquad \text{(Markov property)}$$
$$= \sum_{k=0}^{N}P_{jk}\Pr[X_T = 0 \mid X_1 = k]$$
$$= q_j\Pr[X_T = 0 \mid X_1 = j-1] + r_j\Pr[X_T = 0 \mid X_1 = j] + p_j\Pr[X_T = 0 \mid X_1 = j+1]$$
After the first transition, the probability $\Pr[X_T = 0 \mid X_1 = j] = u_j$ remains the same as the initial $\Pr[X_T = 0 \mid X_0 = j]$, because $T$ is a random variable depending on the state and not on the discrete time. Hence, we obtain the equations
$$u_0 = 1$$
$$u_j = q_j u_{j-1} + r_j u_j + p_j u_{j+1} \qquad (1 \le j < N)$$
which are different from the corresponding steady-state equations (11.5). The difference lies in left-multiplication by $P$, yielding $u = Pu$, instead of right-multiplication in $\pi = \pi P$. Substituting $r_j = 1 - p_j - q_j$ gives, after some modification,
$$u_{j+1} = -\frac{q_j}{p_j}u_{j-1} + \left(1 + \frac{q_j}{p_j}\right)u_j \qquad (11.2)$$
Iteration on $j$ for the first few values using $u_0 = 1$ yields
$$u_2 = -\frac{q_1}{p_1} + \left(1 + \frac{q_1}{p_1}\right)u_1$$
$$u_3 = -\frac{q_2}{p_2}u_1 + \left(1 + \frac{q_2}{p_2}\right)u_2 = -\frac{q_1}{p_1} - \frac{q_1 q_2}{p_1 p_2} + \left(1 + \frac{q_1}{p_1} + \frac{q_1 q_2}{p_1 p_2}\right)u_1$$
which suggests
$$u_j = -\sum_{k=1}^{j-1}\prod_{m=1}^{k}\frac{q_m}{p_m} + \left(1 + \sum_{k=1}^{j-1}\prod_{m=1}^{k}\frac{q_m}{p_m}\right)u_1$$
as readily verified by substitution in (11.2).
The unknown $u_1$ is determined by the last relation, $u_N = 0$. Finally, the probability of gambler's ruin is
$$\Pr[X_T = 0 \mid X_0 = j] = \frac{\sum_{k=j}^{N-1}\prod_{m=1}^{k}\frac{q_m}{p_m}}{1 + \sum_{k=1}^{N-1}\prod_{m=1}^{k}\frac{q_m}{p_m}} \qquad (11.3)$$
Similarly, the mean hitting time $\nu_j = E[T \mid X_0 = j]$ follows by reasoning on the possible transitions. From state $j$, there is a transition to state $j-1$ with probability $q_j$. In case of a transition from state $j$ to state $j-1$, the hitting time $T$ consists of the first transition plus the remaining time from state $j-1$ on, which is $\nu_{j-1}$. Using the law of total probability, or reading all possible transitions from the transition graph (see Fig. 11.1), we find
$$\nu_j = 1 + q_j\nu_{j-1} + r_j\nu_j + p_j\nu_{j+1}$$
with the boundary equations for state 0 and state $N$ (which in the gambler's ruin problem are absorbing states), $\nu_0 = \nu_N = 0$. With $r_j = 1 - p_j - q_j$,
$$\nu_{j+1} = -\frac{q_j}{p_j}\nu_{j-1} + \left(1 + \frac{q_j}{p_j}\right)\nu_j - \frac{1}{p_j}$$
By iteration,
$$\nu_2 = -\frac{1}{p_1} + \left(1 + \frac{q_1}{p_1}\right)\nu_1$$
$$\nu_3 = -\frac{1}{p_1} - \frac{1}{p_2}\left(1 + \frac{q_2}{p_1}\right) + \left(1 + \frac{q_1}{p_1} + \frac{q_1 q_2}{p_1 p_2}\right)\nu_1$$
or, in general,
$$\nu_j = -\sum_{k=1}^{j-1}\frac{1}{p_k}\left(1 + \sum_{n=1}^{k-1}\prod_{m=1}^{n}\frac{q_{k-m+1}}{p_{k-m}}\right) + \left(1 + \sum_{k=1}^{j-1}\prod_{m=1}^{k}\frac{q_m}{p_m}\right)\nu_1$$
Eliminating $\nu_1$ from $\nu_N = 0$ finally leads to the mean hitting time to ruin, or the mean duration of the game,
$$\nu_j = -\sum_{k=1}^{j-1}\frac{1}{p_k}\left(1 + \sum_{n=1}^{k-1}\prod_{m=1}^{n}\frac{q_{k-m+1}}{p_{k-m}}\right) + \left(1 + \sum_{k=1}^{j-1}\prod_{m=1}^{k}\frac{q_m}{p_m}\right)\frac{\sum_{k=1}^{N-1}\frac{1}{p_k}\left(1 + \sum_{n=1}^{k-1}\prod_{m=1}^{n}\frac{q_{k-m+1}}{p_{k-m}}\right)}{1 + \sum_{k=1}^{N-1}\prod_{m=1}^{k}\frac{q_m}{p_m}} \qquad (11.4)$$
In spite of the relatively simple difference equations, the solution rapidly grows unattractive. The particular case of the Markov chain where $q_k = q$ and $p_k = p$ simplifies considerably. The probability of gambler's ruin (11.3) becomes
$$\Pr[X_T = 0 \mid X_0 = j] = \frac{\sum_{k=j}^{N-1}\left(\frac{q}{p}\right)^k}{\sum_{k=0}^{N-1}\left(\frac{q}{p}\right)^k} = \frac{\left(\frac{q}{p}\right)^j - \left(\frac{q}{p}\right)^N}{1 - \left(\frac{q}{p}\right)^N}$$
and, if $p = q = \frac{1}{2}$, via de l'Hospital's rule, $\Pr[X_T = 0 \mid X_0 = j] = \frac{N-j}{N}$.
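The constant-probability ruin formula is easy to confirm by simulation. The Python sketch below (not from the text; parameter values and the random seed are arbitrary) compares the closed form with a Monte-Carlo estimate for a walk that may also hold its position ($r = 1 - p - q > 0$, which does not affect the absorption probabilities).

```python
import random

def ruin_prob(j, N, p, q):
    """Closed-form ruin probability for constant p, q (p != q)."""
    rho = q / p
    return (rho**j - rho**N) / (1 - rho**N)

def simulate(j, N, p, q, runs=20000, rng=random.Random(7)):
    # fraction of sample paths absorbed at 0 before reaching N
    ruined = 0
    for _ in range(runs):
        s = j
        while 0 < s < N:
            u = rng.random()
            s += 1 if u < p else (-1 if u < p + q else 0)
        ruined += (s == 0)
    return ruined / runs

p, q, N, j = 0.5, 0.3, 10, 3
assert abs(ruin_prob(j, N, p, q) - simulate(j, N, p, q)) < 0.02
```

With these values the ruin probability is about 0.21; the agreement within a couple of percent is consistent with the Monte-Carlo standard error.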
If the target fortune $N$ (at which the game ends) is infinitely large,
$$\Pr[X_T = 0 \mid X_0 = j] = 1 \qquad \text{if } q \ge p$$
$$\Pr[X_T = 0 \mid X_0 = j] = \left(\frac{q}{p}\right)^j \qquad \text{if } q < p$$
which demonstrates that the gambler will surely lose all his money if his chance $p$ of winning is smaller than the chance $q$ of losing. Even in a fair game, where $p = q$, he will surely be defeated. In a favorable game ($p > q$) with start capital $j$, ruin is still possible, with probability $\left(\frac{q}{p}\right)^j$. Another interpretation is a game with two players $a$ and $b$, in which player $a$ starts with capital $j$ and has winning chance $p$, while player $b$ starts with capital $N - j$ and wins with probability $q = 1 - p$. Similarly, the mean duration of the game (11.4) simplifies to
$$\nu_j = -\sum_{k=1}^{j-1}\frac{1 - \left(\frac{q}{p}\right)^k}{p - q} + \frac{1 - \left(\frac{q}{p}\right)^j}{1 - \left(\frac{q}{p}\right)^N}\sum_{k=1}^{N-1}\frac{1 - \left(\frac{q}{p}\right)^k}{p - q}$$
or
$$E[T \mid X_0 = j] = \frac{1}{p - q}\left[N\frac{1 - \left(\frac{q}{p}\right)^j}{1 - \left(\frac{q}{p}\right)^N} - j\right]$$

11.2.2 The steady-state

The steady-state equation (9.23) for the vector component $\pi_j$ becomes, for $1 \le j < N$,
$$\pi_j = p_{j-1}\pi_{j-1} + r_j\pi_j + q_{j+1}\pi_{j+1} \qquad (11.5)$$
and, for $j = 0$ and $j = N$,
$$\pi_0 = r_0\pi_0 + q_1\pi_1$$
$$\pi_N = p_{N-1}\pi_{N-1} + r_N\pi_N$$
We rewrite these equations using $r_j = 1 - q_j - p_j$, $r_0 = 1 - p_0$ and $r_N = 1 - q_N$ as
$$p_0\pi_0 = q_1\pi_1 \qquad (11.6)$$
$$p_j\pi_j = (p_{j-1}\pi_{j-1} - q_j\pi_j) + q_{j+1}\pi_{j+1} \qquad (11.7)$$
$$p_{N-1}\pi_{N-1} = q_N\pi_N$$
Explicitly, for a few values of $j$, we observe that
$$p_1\pi_1 = (p_0\pi_0 - q_1\pi_1) + q_2\pi_2 = q_2\pi_2$$
$$p_2\pi_2 = (p_1\pi_1 - q_2\pi_2) + q_3\pi_3 = q_3\pi_3$$
or, in general, for all $j$,
$$p_j\pi_j = q_{j+1}\pi_{j+1}$$
By iteration of $\pi_{j+1} = \frac{p_j}{q_{j+1}}\pi_j$, starting at $j = 0$, we find
$$\pi_{j+1} = \frac{p_j}{q_{j+1}}\frac{p_{j-1}}{q_j}\cdots\frac{p_0}{q_1}\pi_0 = \pi_0\prod_{m=0}^{j}\frac{p_m}{q_{m+1}}$$
The normalization $\|\pi\|_1 = 1$ yields a condition for $\pi_0$,
$$\pi_0 = \left(1 + \sum_{k=1}^{N}\prod_{m=0}^{k-1}\frac{p_m}{q_{m+1}}\right)^{-1}$$
which determines the complete steady-state vector of the general random walk as
$$\pi_j = \frac{\prod_{m=0}^{j-1}\frac{p_m}{q_{m+1}}}{1 + \sum_{k=1}^{N}\prod_{m=0}^{k-1}\frac{p_m}{q_{m+1}}} \qquad (11.8)$$
These relations remain valid even when the number of states $N$ tends to infinity, provided the infinite sum converges.
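The product form (11.8) translates directly into a few lines of code. The Python sketch below (not from the text; the constant probabilities are an illustrative choice) computes the steady state of a general walk and checks it against the geometric form $\frac{(1-\rho)\rho^j}{1-\rho^{N+1}}$ that holds when $p_k = p$ and $q_k = q$ with $\rho = p/q$.

```python
def walk_steady_state(p, q):
    """Steady state (11.8): pi_j proportional to prod_{m=0}^{j-1} p_m / q_{m+1}.
    p[j] = p_j for j = 0..N-1 and q[j] = q_{j+1} for j = 0..N-1."""
    w = [1.0]
    for pj, qj in zip(p, q):
        w.append(w[-1] * pj / qj)
    s = sum(w)
    return [v / s for v in w]

# constant probabilities: rho = p/q = 0.8, N = 5 interior up/down steps
rho, N = 0.8, 5
pi = walk_steady_state([0.4] * N, [0.5] * N)
for j in range(N + 1):
    assert abs(pi[j] - (1 - rho) * rho**j / (1 - rho**(N + 1))) < 1e-12
```

The same function, fed birth rates over death rates, yields the birth and death steady state (11.15)-(11.16) of the next section when the sums converge.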
In the simple case where $p_k = p$ and $q_k = q$, we obtain, with $\rho = \frac{p}{q}$,
$$\pi_j = \frac{(1 - \rho)\rho^j}{1 - \rho^{N+1}} \qquad (11.9)$$

11.3 Birth and death process

A birth and death process¹ is defined by the infinitesimal generator matrix
$$Q = \begin{bmatrix} -\lambda_0 & \lambda_0 & 0 & 0 & 0 & \cdots \\ \mu_1 & -(\lambda_1+\mu_1) & \lambda_1 & 0 & 0 & \cdots \\ 0 & \mu_2 & -(\lambda_2+\mu_2) & \lambda_2 & 0 & \cdots \\ 0 & 0 & \mu_3 & -(\lambda_3+\mu_3) & \lambda_3 & \cdots \\ 0 & 0 & 0 & \mu_4 & -(\lambda_4+\mu_4) & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
The transition graph is shown in Fig. 11.2. Although the theory in the previous chapter was derived for finite-state Markov chains, the birth and death process is a generalization to an infinite number of states. The general random walk (Section 11.2) forms the embedded Markov chain of the birth and death process, with transition probabilities specified by (10.21), resulting in $V_{i,i-1} = \frac{\mu_i}{\lambda_i + \mu_i}$, $V_{i,i+1} = \frac{\lambda_i}{\lambda_i + \mu_i}$ and $V_{ik} = 0$ for $k \ne i - 1$ and $k \ne i + 1$. The transition probability matrix is a tri-band diagonal matrix, which is irreducible if all $\lambda_j > 0$ and $\mu_j > 0$.

Fig. 11.2. The transition graph of a birth and death process.

The basic system of differential equations that completely describes the birth and death process follows from the general state probability equations (10.11) for $s_k(t) = \Pr[X(t) = k]$ as
$$s_0'(t) = -\lambda_0 s_0(t) + \mu_1 s_1(t) \qquad (11.10)$$
$$s_k'(t) = -(\lambda_k + \mu_k)s_k(t) + \lambda_{k-1}s_{k-1}(t) + \mu_{k+1}s_{k+1}(t) \qquad (11.11)$$
with initial condition $s_k(0) = \Pr[X(0) = k]$. Exact analytic solutions for arbitrary $\lambda_k$ and $\mu_k$ are not possible. Indeed, let us denote the Laplace transform of $s_k(t)$ by
$$S_k(z) = \int_0^{\infty}e^{-zt}s_k(t)\,dt \qquad (11.12)$$
Since $s_k(t)$ is a continuous and bounded function ($|s_k(t)| \le 1$ for all $t > 0$), the Laplace transform exists for $\operatorname{Re}(z) > 0$.

¹ Kleinrock (1975, p. ix) mentions that William Feller was the father of the birth and death process.
The Laplace transforms of (11.10) and (11.11) become
$$(\lambda_0 + z)S_0(z) = s_0(0) + \mu_1 S_1(z) \qquad (11.13)$$
$$(\lambda_k + \mu_k + z)S_k(z) = s_k(0) + \lambda_{k-1}S_{k-1}(z) + \mu_{k+1}S_{k+1}(z) \qquad (11.14)$$
which is a set of difference equations, more complex due to the initial condition $s_k(0)$ than the set (A.51) in Appendix A.5.2.3. That set (A.51) appears in the general random walk, whose solution is shown to be intractable in general. This infinite set of differential equations has been thoroughly studied over the years under several simplifying conditions on $\lambda_k$ and $\mu_k$, for example, $\lambda_k = \lambda$ and $\mu_k = \mu$ for all $k$. As shown in Chapter 13, they form the basis for the simplest set of queueing models of the family M/M/m/K.

11.3.1 The steady-state

The steady-state follows from (10.19) as the solution of the set
$$-\lambda_0\pi_0 + \mu_1\pi_1 = 0$$
$$\lambda_{j-1}\pi_{j-1} - (\lambda_j + \mu_j)\pi_j + \mu_{j+1}\pi_{j+1} = 0$$
This set is identical to (11.6) and (11.7) provided $p_j$ and $q_j$ are changed to $\lambda_j$ and $\mu_j$. After this modification in (11.8), the steady-state of the birth and death process is
$$\pi_0 = \frac{1}{1 + \sum_{k=1}^{\infty}\prod_{m=0}^{k-1}\frac{\lambda_m}{\mu_{m+1}}} \qquad (11.15)$$
$$\pi_j = \frac{\prod_{m=0}^{j-1}\frac{\lambda_m}{\mu_{m+1}}}{1 + \sum_{k=1}^{\infty}\prod_{m=0}^{k-1}\frac{\lambda_m}{\mu_{m+1}}} \qquad j \ge 1 \qquad (11.16)$$
Theorem 9.3.4 states that an irreducible Markov chain with a finite number of states is necessarily recurrent. However, it is in general difficult to decide whether an irreducible Markov chain with an infinite number of states is recurrent or transient. In case of the birth and death process, it is possible to determine when the process is transient or recurrent. The process is transient if and only if the embedded Markov chain (determined above) is transient. Section 9.2.3 discusses that, for a recurrent chain, $r_{ij} = \Pr[T_j < \infty \mid X_0 = i]$ equals 1: a finite hitting time means that every state $j$ is certainly visited starting from initial state $i$. Applied to the embedded Markov chain, it follows from the gambler's ruin (11.3) that
$$\Pr[T_0 < \infty \mid X_0 = j] = 1 - \frac{\sum_{k=0}^{j-1}\prod_{m=1}^{k}\frac{q_m}{p_m}}{\sum_{k=0}^{N-1}\prod_{m=1}^{k}\frac{q_m}{p_m}}$$
Thus, for any fixed initial state $j$, the condition for a recurrent chain, $\Pr[T_j < \infty \mid X_0 = i] = 1$, is only possible in the limit $N \to \infty$ if $\lim_{N\to\infty}\sum_{k=0}^{N-1}\prod_{m=1}^{k}\frac{q_m}{p_m} = \infty$. Transformed to the birth and death rates, the condition for recurrence becomes $S_2 = \sum_{j=1}^{\infty}\prod_{m=1}^{j}\frac{\mu_m}{\lambda_m} = \infty$. Furthermore, we observe from (11.16) that the infinite series $S_1 = \sum_{j=1}^{\infty}\prod_{m=0}^{j-1}\frac{\lambda_m}{\mu_{m+1}}$ must converge in order to have a stationary or steady-state distribution. In summary, if $S_1 < \infty$ and $S_2 = \infty$, the birth and death process is positive recurrent. If $S_1 = \infty$ and $S_2 = \infty$, it is null recurrent. If $S_2 < \infty$, the birth and death process is transient.

11.3.2 A pure birth process

A pure birth process is defined as a process $\{X(t), t \ge 0\}$ for which in any state $i$ it holds that $\mu_i = 0$. It follows from Fig. 11.2 that a birth process can only jump to higher states, such that $P_{ij}(t) = 0$ for $j < i$. Similarly, in a pure death process $\{X(t), t \ge 0\}$, all birth rates $\lambda_i = 0$.

11.3.2.1 The Poisson process

Let us first consider the simplest case, where all birth rates are equal, $\lambda_i = \lambda$, and where $P_{jj}(t) = \Pr[\tau_j > t \mid X(0) = j] = e^{-\lambda t}$. Using either the backward or forward equation, or (10.25) with $V_{i,i+1} = 1$, yields
$$P_{ij}(t) = \delta_{ij}e^{-\lambda t} + \lambda\int_0^t e^{-\lambda u}P_{i+1,j}(t-u)\,du$$
or, for $j = i + k$ with $k > 0$,
$$P_{i,i+k}(t) = \lambda e^{-\lambda t}\int_0^t e^{\lambda u}P_{i+1,i+k}(u)\,du$$
Explicitly, for $k = 1$,
$$P_{i,i+1}(t) = \lambda e^{-\lambda t}\int_0^t e^{\lambda u}e^{-\lambda u}\,du = \lambda t\,e^{-\lambda t}$$
which is independent of $i$ and, thus, holds for any $i \ge 1$. For $k = 2$,
$$P_{i,i+2}(t) = \lambda e^{-\lambda t}\int_0^t e^{\lambda u}P_{i+1,i+2}(u)\,du = \lambda e^{-\lambda t}\int_0^t e^{\lambda u}\lambda u\,e^{-\lambda u}\,du = \frac{(\lambda t)^2}{2}e^{-\lambda t}$$
This suggests us to propose, for any $i \ge 0$,
$$P_{i,i+k}(t) = \frac{(\lambda t)^k}{k!}e^{-\lambda t} \qquad (11.17)$$
which is verified inductively as
$$P_{i,i+k}(t) = \lambda e^{-\lambda t}\int_0^t e^{\lambda u}P_{i,i+k-1}(u)\,du = \lambda e^{-\lambda t}\int_0^t e^{\lambda u}\frac{(\lambda u)^{k-1}}{(k-1)!}e^{-\lambda u}\,du = \frac{(\lambda t)^k}{k!}e^{-\lambda t}$$
Hence, the transition probabilities of a pure birth process with constant rate have a Poisson distribution (11.17) and are only a function of the difference in states $k = j - i \ge 0$, for any $t \ge 0$.
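The Poisson form (11.17) can be checked by a small Monte-Carlo experiment. The Python sketch below (not from the text; rate, horizon and seed are arbitrary) counts the jumps of a constant-rate pure birth process in $[0,t]$ and compares the empirical frequency of exactly $k$ jumps with $\frac{(\lambda t)^k}{k!}e^{-\lambda t}$.

```python
import math
import random

def p_k_jumps(t, lam, k, runs=20000, rng=random.Random(1)):
    """Fraction of runs in which a constant-rate pure birth process
    makes exactly k jumps in [0, t] (sojourn times are Exp(lam))."""
    hits = 0
    for _ in range(runs):
        time, jumps = 0.0, 0
        while True:
            time += rng.expovariate(lam)
            if time > t:
                break
            jumps += 1
        hits += (jumps == k)
    return hits / runs

lam, t, k = 2.0, 1.0, 3
poisson = (lam * t) ** k / math.factorial(k) * math.exp(-lam * t)
assert abs(p_k_jumps(t, lam, k) - poisson) < 0.02
```

The exact probability here is about 0.18, and the Monte-Carlo standard error with 20 000 runs is well below the tolerance used.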
Moreover, for $0 \le u \le t$, consider the increment $X(t) - X(u)$:
$$\Pr[X(t) - X(u) = k] = \sum_{i \ge 0}\Pr[X(u) = i, X(t) = i + k] = \sum_{i \ge 0}\Pr[X(u) = i]\Pr[X(t) = i + k \mid X(u) = i]$$
$$= \sum_{i \ge 0}\Pr[X(u) = i]P_{i,i+k}(t - u) = \frac{(\lambda(t-u))^k}{k!}e^{-\lambda(t-u)}\sum_{i \ge 0}\Pr[X(u) = i]$$
Thus, the increment $X(t) - X(u)$ has a Poisson distribution,
$$\Pr[X(t) - X(u) = k] = \frac{(\lambda(t-u))^k}{k!}e^{-\lambda(t-u)} \qquad (11.18)$$
and, since $X(0) = 0$ and the increments are independent (Markov property), we conclude that the constant-rate pure birth process is a Poisson process (Section 7.2).

11.3.2.2 The general birth process

In case the birth rates $\lambda_k$ depend on the actual state $k$, the pure birth process can be regarded as the simplest generalization of the Poisson process. The Laplace-transformed difference equations (11.13) and (11.14) reduce to the set
$$S_0(z) = \frac{s_0(0)}{\lambda_0 + z}$$
$$S_k(z) = \frac{s_k(0)}{\lambda_k + z} + \frac{\lambda_{k-1}}{\lambda_k + z}S_{k-1}(z)$$
which, by the usual iteration, has the solution
$$S_k(z) = \sum_{j=0}^{k}\frac{s_j(0)\prod_{m=j}^{k-1}\lambda_m}{\prod_{m=j}^{k}(\lambda_m + z)} \qquad (11.19)$$
with the convention that $\prod_{m=a}^{b}f(m) = 1$ if $a > b$. The validity of this general solution is verified by substitution into the difference equation for $S_k(z)$. The form of $S_k(z)$ is a ratio that can always be transformed back to the time domain, provided that $\lambda_k$ is known. If all $\lambda_k > 0$ are distinct, using (2.38) with $c > 0$, we find
$$s_k(t) = \sum_{j=0}^{k}s_j(0)\prod_{m=j}^{k-1}\lambda_m\,\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{e^{zt}}{\prod_{m=j}^{k}(\lambda_m + z)}\,dz$$
By closing the contour over the negative real half-plane ($\operatorname{Re}(z) < 0$), only simple poles at $z = -\lambda_n$ are encountered,
$$\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{e^{zt}}{\prod_{m=j}^{k}(\lambda_m + z)}\,dz = \sum_{n=j}^{k}\frac{e^{-\lambda_n t}}{\prod_{m=j;\,m\ne n}^{k}(\lambda_m - \lambda_n)}$$
resulting in
$$s_k(t) = \sum_{j=0}^{k}s_j(0)\sum_{n=j}^{k}\frac{e^{-\lambda_n t}\prod_{m=j}^{k-1}\lambda_m}{\prod_{m=j;\,m\ne n}^{k}(\lambda_m - \lambda_n)} \qquad (11.20)$$
If some $\lambda_k = \lambda_j$, multiple poles occur and a slightly more complex result appears, which can still be computed in exact analytic form.

11.3.2.3 The Yule process

A classical example of a process with distinct birth rates is the Yule process, where $\lambda_k = k\lambda$. In that case, (11.20) can be simplified. With
$$\prod_{m=j}^{k-1}\lambda_m = \lambda^{k-j}\frac{(k-1)!}{(j-1)!}$$
and
$$\prod_{m=j;\,m\ne n}^{k}(\lambda_m - \lambda_n) = \lambda^{k-j}\prod_{m=j}^{n-1}(m - n)\prod_{m=n+1}^{k}(m - n)$$
and with $\prod_{m=j}^{n-1}(m - n) = (-1)^{n-j}\prod_{l=1}^{n-j}l = (-1)^{n-j}(n-j)!$ and $\prod_{m=n+1}^{k}(m - n) = \prod_{l=1}^{k-n}l = (k-n)!$, we find
$$\frac{\prod_{m=j}^{k-1}\lambda_m}{\prod_{m=j;\,m\ne n}^{k}(\lambda_m - \lambda_n)} = \frac{(-1)^{n-j}(k-1)!}{(j-1)!\,(n-j)!\,(k-n)!}$$
such that
$$\sum_{n=j}^{k}\frac{e^{-n\lambda t}\prod_{m=j}^{k-1}\lambda_m}{\prod_{m=j;\,m\ne n}^{k}(\lambda_m - \lambda_n)} = \frac{(k-1)!}{(j-1)!}\sum_{q=0}^{k-j}\frac{(-1)^{q}e^{-(q+j)\lambda t}}{q!\,(k-j-q)!} = \frac{(k-1)!\,e^{-j\lambda t}}{(k-j)!\,(j-1)!}\sum_{q=0}^{k-j}\binom{k-j}{q}\left(-e^{-\lambda t}\right)^q = \binom{k-1}{j-1}e^{-j\lambda t}\left(1 - e^{-\lambda t}\right)^{k-j}$$
Finally, for the Yule process, we obtain from (11.20) the evolution of the state probabilities over time,
$$s_k(t) = \sum_{j=0}^{k}s_j(0)\binom{k-1}{j-1}e^{-j\lambda t}\left(1 - e^{-\lambda t}\right)^{k-j} \qquad (11.21)$$
In practice, $s_j(0) = \delta_{jn}$ if the process starts from state $n$ (implying $s_k(t) = 0$ for $k < n$, because the process moves to the right for $t \ge 0$) and the general form simplifies to
$$s_k(t) = \binom{k-1}{n-1}e^{-n\lambda t}\left(1 - e^{-\lambda t}\right)^{k-n} \qquad (11.22)$$
The Yule process has been used as a simple model for the evolution of a population in which each individual gives birth at exponential rate $\lambda$, and $X(t)$ denotes the number of individuals in the population (which never decreases, as there are no deaths) as a function of time $t$. In state $k$, the population has precisely $k$ individuals that each generate births, such that $\lambda_k = k\lambda$ is the birth rate of the population. If the population starts at $t = 0$ with one individual, $n = 1$, the evolution over time has the distribution $s_k(t) = e^{-\lambda t}\left(1 - e^{-\lambda t}\right)^{k-1}$, which is recognized from (3.5) as a geometric distribution with mean $e^{\lambda t}$. Since the sojourn times of a Markov process are i.i.d. exponential random variables, the average time $T_k$ to reach $k$ individuals from one ancestor equals $E[T_k] = \sum_{j=1}^{k}\frac{1}{\lambda_j} = \frac{1}{\lambda}\sum_{j=1}^{k}\frac{1}{j}$, which is well approximated (Abramowitz and Stegun, 1968, Section 6.3.18) as $E[T_k] \approx \frac{\log(k+1) + \gamma}{\lambda}$, where $\gamma = 0.577\,215\ldots$ is Euler's constant. If the population starts with $n$ individuals, the distribution (11.22) at time $t$ consists of a sum of $n$ i.i.d. geometric random variables, which is a negative binomial distribution.
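The geometric law for a population started from a single ancestor is easy to reproduce by simulation: in state $k$ the next birth occurs after an $\mathrm{Exp}(k\lambda)$ sojourn. The Python sketch below (not from the text; rate, horizon and seed are arbitrary) checks the mean $e^{\lambda t}$.

```python
import math
import random

def yule_population(t, lam, rng):
    """Simulate a Yule process (birth rate k*lam in state k) from one
    individual up to time t; returns the population size X(t)."""
    k, time = 1, 0.0
    while True:
        time += rng.expovariate(k * lam)   # sojourn in state k ~ Exp(k*lam)
        if time > t:
            return k
        k += 1

rng = random.Random(42)
lam, t = 1.0, 1.5
mean = sum(yule_population(t, lam, rng) for _ in range(20000)) / 20000
assert abs(mean - math.exp(lam * t)) < 0.15   # E[X(t)] = e^{lam t}
```

A histogram of the samples would likewise match the geometric distribution (11.22) with $n = 1$.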
The Yule process has been employed, for example, as a crude model to estimate the spread of a disease or epidemic and the splitting of molecules into new species by cosmic rays.

11.3.3 Constant rate birth and death process

In a constant rate birth and death process, both the birth rate $\lambda_k = \lambda$ and the death rate $\mu_k = \mu$ are constant for any state $k$. From (11.16), the steady-state for all states $j$, with $\rho = \frac{\lambda}{\mu} < 1$,
$$\pi_j = (1 - \rho)\rho^j \qquad j \ge 0 \qquad (11.23)$$
only depends on the ratio of birth over death rate. The time-dependent constant rate birth and death process can still be computed in analytic form. In this case, the infinitesimal generator $Q$ has a tri-band Toeplitz structure, which can be diagonalized in analytic form as shown in Appendix A.5.2.1. In this section, we present an alternative approach. Instead of dealing with an infinite set of difference equations, a generating function approach seems more convenient. Let us denote the generating function of the Laplace transforms $S_k(z)$ by
$$\varphi(x,z) = \sum_{k=0}^{\infty}S_k(z)x^k \qquad (11.24)$$
Using (11.12) in (11.24) gives
$$\varphi(x,z) = \sum_{k=0}^{\infty}\int_0^{\infty}e^{-zt}s_k(t)x^k\,dt = \int_0^{\infty}e^{-zt}\sum_{k=0}^{\infty}s_k(t)x^k\,dt$$
where the reversal of summation and integration is allowed because all terms are positive. Since $0 \le s_k(t) \le 1$, the sum converges at least for $|x| < 1$, which shows that $\varphi(x,z)$ is analytic inside the unit circle $|x| < 1$ for any $\operatorname{Re}(z) > 0$. After multiplying (11.14) by $x^k$ and summing over all $k$, we obtain
$$(\lambda + z)S_0(z) + (\lambda + \mu + z)\sum_{k=1}^{\infty}S_k(z)x^k = \sum_{k=0}^{\infty}s_k(0)x^k + \mu S_1(z) + \lambda\sum_{k=1}^{\infty}S_{k-1}(z)x^k + \mu\sum_{k=1}^{\infty}S_{k+1}(z)x^k$$
and, written in terms of $\varphi(x,z)$,
$$\varphi(x,z) = \frac{\mu(1 - x)S_0(z) - \sum_{k=0}^{\infty}s_k(0)x^{k+1}}{\lambda x^2 - x(\lambda + \mu + z) + \mu}$$
Note that, in the general case of state-dependent $\lambda_k$ and $\mu_k$, an expression in terms of $\varphi(x,z)$ is not possible. Before continuing with the computations, we make the additional simplification that $s_k(0) = \delta_{kj}$: we assume that the constant rate birth and death process starts in state $j$.
With this initial condition, the generating function
$$\varphi(x,z) = \frac{\mu(1 - x)S_0(z) - x^{j+1}}{\lambda x^2 - x(\lambda + \mu + z) + \mu} \qquad (11.25)$$
still depends on the unknown function $S_0(z)$. The following derivation, involving the theory of complex functions, demonstrates a standard procedure that will also be useful in other queueing problems. The denominator in (11.25) has two roots,
$$x_1 = \frac{\lambda + \mu + z}{2\lambda} + \frac{1}{2\lambda}\sqrt{(\lambda + \mu + z)^2 - 4\lambda\mu}$$
$$x_2 = \frac{\lambda + \mu + z}{2\lambda} - \frac{1}{2\lambda}\sqrt{(\lambda + \mu + z)^2 - 4\lambda\mu}$$
We need the powerful theorem of Rouché (Titchmarsh, 1964, p. 116) to deduce more about the location of $x_1$ and $x_2$.

Theorem 11.3.1 (Rouché) If $f(z)$ and $g(z)$ are analytic inside and on a closed contour $C$, and $|g(z)| < |f(z)|$ on $C$, then $f(z)$ and $f(z) + g(z)$ have the same number of zeros inside $C$.

Choose $f(x) = -x(\lambda + \mu + z)$ and $g(x) = \lambda x^2 + \mu$, such that $f(x) + g(x) = \lambda x^2 - x(\lambda + \mu + z) + \mu$, the denominator in (11.25). Since both $f(x)$ and $g(x)$ are polynomials, they are analytic everywhere in the complex $x$-plane. We know that $\varphi(x,z)$ is analytic inside the unit disk. If the roots $x_1$ or $x_2$ lie inside the unit disk, the numerator in (11.25) must have zeros at precisely the same places in order for $\varphi(x,z)$ to be analytic inside the unit disk. Hence, we consider as contour $C$ in Rouché's Theorem the unit circle $|x| = 1$. Clearly, $f(x)$ has a single zero inside the unit circle (because $\lambda > 0$, $\mu > 0$ and $\operatorname{Re}(z) > 0$). Furthermore, on the unit circle $|x| = 1$,
$$|-x(\lambda + \mu + z)| = |\lambda + \mu + z| > \lambda + \mu \ge |\lambda x^2 + \mu|$$
which shows that $|g(x)| < |f(x)|$ on the unit circle. Rouché's Theorem then tells us that $f(x) + g(x)$ has precisely one zero inside the unit circle. This implies that $|x_1| > 1$ and $|x_2| < 1$, and that the numerator in (11.25) has a zero at $x_2$,
$$\mu(1 - x_2)S_0(z) - x_2^{j+1} = 0$$
This relation determines the unknown function $S_0(z)$ as
$$S_0(z) = \frac{x_2^{j+1}}{\mu(1 - x_2)}$$
such that (11.25) becomes
$$\varphi(x,z) = \frac{x_2^{j+1}\frac{1 - x}{1 - x_2} - x^{j+1}}{\lambda(x - x_1)(x - x_2)} = \frac{x_2^{j+1}(1 - x) - x^{j+1}(1 - x_2)}{\lambda(1 - x_2)(x - x_1)(x - x_2)}$$
We know that the numerator can be divided by $(x - x_2)$, or explicitly,
$$x_2^{j+1}(1 - x) - x^{j+1}(1 - x_2) = x_2^{j+1} - x^{j+1} + x_2 x\left(x^j - x_2^j\right) = (x_2 - x)\left[x_2^j + (1 - x_2)\sum_{k=1}^{j}x_2^{j-k}x^k\right]$$
Finally,
$$\varphi(x,z) = \frac{x_2^j + (1 - x_2)\sum_{k=1}^{j}x_2^{j-k}x^k}{\lambda(1 - x_2)(x_1 - x)} \qquad (11.26)$$
By expanding $\frac{1}{x_1 - x}$ in a Taylor series around $x = 0$ (allowed for $|x| < 1 < |x_1|$), and denoting
$$a_0 = x_2^j, \qquad a_k = (1 - x_2)x_2^{j-k} \quad (1 \le k \le j), \qquad a_k = 0 \quad (k > j)$$
we have
$$\varphi(x,z) = \frac{1}{\lambda(1 - x_2)x_1}\left(\sum_{k=0}^{\infty}a_k x^k\right)\left(\sum_{m=0}^{\infty}x_1^{-m}x^m\right) = \frac{1}{\lambda(1 - x_2)x_1}\sum_{k=0}^{\infty}\left(\sum_{m=0}^{k}a_{k-m}x_1^{-m}\right)x^k$$
Comparing with (11.24) and equating the corresponding powers of $x$, we find an explicit form of the Laplace transforms of the probabilities that the birth and death process is in state $k$,
$$S_k(z) = \frac{1}{\lambda(1 - x_2)x_1}\sum_{m=0}^{k}\frac{a_{k-m}}{x_1^{m}}$$
This expression can be put in different forms by using relations among the zeros $x_1$ and $x_2$, such as $x_1 + x_2 = \frac{\lambda + \mu + z}{\lambda}$ and $x_1 x_2 = \frac{\mu}{\lambda}$. This ingenuity is required to recognize in $S_k(z)$ a known Laplace transform. Otherwise, one has to proceed by computing the inverse Laplace transform by contour integration via (2.38). In any case, the computation needs advanced skills in complex function theory, and we content ourselves here with presenting the result without derivation (see e.g. Cohen (1969, pp. 80-82)),
$$s_k(t) = e^{-(\lambda+\mu)t}\left[\rho^{(k-j)/2}I_{k-j}(at) + \rho^{(k-j-1)/2}I_{k+j+1}(at)\right] + (1-\rho)\rho^k\,e^{-(\lambda+\mu)t}\sum_{m=k+j+2}^{\infty}\rho^{-m/2}I_m(at) \qquad (11.27)$$
where $\rho = \frac{\lambda}{\mu}$, $a = 2\sqrt{\lambda\mu}$, and $I_\nu(z)$ denotes the modified Bessel function (Abramowitz and Stegun, 1968, Section 9.6.1).
Using the asymptotic formulas for the modified Bessel function, the behavior of $s_k(t)$ for large $t$ can be derived (see e.g. Cohen (1969, p. 84)): for $\rho \ne 1$, the deviation from the steady-state decays as
$$s_k(t) - (1-\rho)\rho^k = O\!\left(t^{-3/2}e^{-\left(\sqrt{\lambda}-\sqrt{\mu}\right)^2 t}\right)$$
while
$$s_k(t) = \frac{1}{\sqrt{\pi\mu t}}\left(1 + O(t^{-1})\right) \qquad \text{only if } \rho = 1$$
This expression demonstrates that the constant rate birth and death process converges to the steady-state $(1-\rho)\rho^k$ with a relaxation rate $\left(\sqrt{\lambda}-\sqrt{\mu}\right)^2$. Clearly, the higher $\rho$, the lower the relaxation rate and the slower the process tends to equilibrium, as illustrated in Fig. 11.3. Intuitively, two effects play a role. Since the probability that states with large $k$ are visited increases with increasing $\rho$, the build-up time for this occupation will be larger. In addition, the variability of the number of visited states (further derived for the M/M/1 queue in Section 14.1) increases with increasing $\rho$, which suggests that larger oscillations of the sample paths around the steady-state are likely to occur, enlarging the convergence time.

Fig. 11.3. The probability $s_4(t)$ that the process is in state 4, given that it started from state 0, as a function of time (in units of average death time, $\mu = 1$) for various $\rho = \frac{\lambda}{\mu}$. The insert shows the relaxation time $\tau = \left(\sqrt{\mu} - \sqrt{\lambda}\right)^{-2}$ (in units of average death time, $\mu = 1$). The corresponding steady-state probabilities $\pi_4$ are 0.0012, 0.015, 0.051, 0.072, 0.082, 0.065 for $\rho = 0.2, 0.4, 0.6, 0.7, 0.8, 0.9$, respectively. Observe that for $\rho = 0.9$ the plotted 100 time units are smaller than the relaxation time, which is 379 time units.

11.4 A random walk on a graph

Let $G(N,L)$ denote a graph with $N$ nodes and $L$ links. Suppose that the link weight $w_{ij} = w_{ji}$ of an edge from node $i \to j$ (or vice versa) is proportional to the transition probability $P_{ij}$ that a packet at node $i$ decides to move to node $j$. Clearly, $w_{ii} = 0$.
Specifically, with (9.8),
$$P_{ij} = \frac{w_{ij}}{\sum_{k=1}^{N}w_{ik}}$$
This constraint (9.8) destroys the symmetry of the link weight structure ($w_{ij} = w_{ji}$) because, in general, $P_{ij} \ne P_{ji}$, since $\sum_{k=1}^{N}w_{ik} \ne \sum_{k=1}^{N}w_{jk}$. The sequence of nodes (or links) visited by the packet resembles a random walk on the graph $G(N,L)$ and constitutes a Markov chain. Moreover, the steady-state of this Markov process is readily obtained by observing that the chain is time reversible. Indeed, the condition for time reversibility (10.27) becomes
$$\pi_i\frac{w_{ij}}{\sum_{k=1}^{N}w_{ik}} = \pi_j\frac{w_{ji}}{\sum_{k=1}^{N}w_{jk}}$$
or, since $w_{ij} = w_{ji}$,
$$\frac{\pi_i}{\sum_{k=1}^{N}w_{ik}} = \frac{\pi_j}{\sum_{k=1}^{N}w_{jk}}$$
This implies that $\pi_i \propto \sum_{k=1}^{N}w_{ik}$ and, using the normalization $\|\pi\|_1 = 1$, we obtain the steady-state probabilities for all nodes $i$,
$$\pi_i = \frac{\sum_{k=1}^{N}w_{ik}}{\sum_{l=1}^{N}\sum_{k=1}^{N}w_{lk}} = \frac{\sum_{k=1}^{N}w_{ik}}{2\sum_{l=1}^{N}\sum_{k=l+1}^{N}w_{lk}}$$
This Markov process can model an active packet that monitors the network by collecting state information (number of packets, number of lost or retransmitted packets, etc.) in each router. Of course, the link weight structure $w_{ij}$ for the active packet is decisive and requires additional information to be chosen efficiently. For example, for traffic monitoring, the distribution of the number of packets forwarded by each router must be obtained. For the collection of these data, the active packet should in steady-state visit all nodes about equally frequently, i.e. $\pi_i = \frac{1}{N}$, implying that the Markov transition matrix $P$ must be doubly stochastic (see Appendix A.5.1).

11.5 Slotted Aloha

The Aloha protocol is a basic example of a multiple access communication scheme, of which Ethernet² is considered the direct descendant. Aloha (which means "hello" in the Hawaiian language) was invented by Norman Abramson at the University of Hawaii in the beginning of the 1970s to provide packet-switched radio communication between a central computer and various data terminals on the campus.
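The closed form $\pi_i \propto \sum_k w_{ik}$ can be verified directly. The Python sketch below (not from the text; the symmetric weight matrix is an arbitrary 4-node example) checks both stationarity, $\pi P = \pi$, and detailed balance.

```python
# symmetric weight matrix of a hypothetical 4-node graph (w_ij = w_ji, w_ii = 0)
w = [[0, 2, 1, 0],
     [2, 0, 3, 1],
     [1, 3, 0, 2],
     [0, 1, 2, 0]]
n = len(w)
strength = [sum(row) for row in w]
pi = [s / sum(strength) for s in strength]   # pi_i = sum_k w_ik / sum_l sum_k w_lk

# P_ij = w_ij / sum_k w_ik; check that pi is indeed stationary: pi P = pi
P = [[w[i][j] / strength[i] for j in range(n)] for i in range(n)]
piP = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
assert all(abs(piP[j] - pi[j]) < 1e-12 for j in range(n))

# detailed balance pi_i P_ij = pi_j P_ji also holds (time reversibility)
assert all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12
           for i in range(n) for j in range(n))
```

Note that $\pi$ is uniform only when all node strengths $\sum_k w_{ik}$ are equal, which is exactly the doubly stochastic condition mentioned above.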
Slotted Aloha is a discrete-time version of the pure Aloha protocol, where all transmitted packets have equal length and where each packet requires one timeslot for transmission. Consider a network consisting of $N$ nodes that can communicate with each other via a shared communication channel (e.g. a radio channel) using the slotted Aloha protocol. The simplest arrival process $A$ of packets at each node is a Poisson process. We assume that the Poisson arrivals at a node are independent of the Poisson arrivals at another node and that all Poisson arrivals at a node have the same rate $\frac{\lambda}{N}$, where $\lambda$ is the overall arrival rate at the network of $N$ nodes. The idea of the Aloha protocol is that, upon receipt of a packet, the node transmits that newly arrived packet in the next timeslot. In case two nodes happen to transmit a packet in the same timeslot, a collision occurs, which results in a retransmission of the packets. A node with a packet that must be retransmitted is said to be backlogged. Even if new packets arrive at a backlogged node, the retransmitted packet is the first one to be transmitted and, for simplicity (to ignore queueing of packets at a node), we assume that those new packets are discarded. If backlogged nodes retransmitted the packet in the very next timeslot, surely a new collision would occur. Therefore, backlogged nodes wait for some random number of timeslots before retransmitting.

² The essential difference with Ethernet's CSMA/CD (carrier sense multiple access with collision detection) is that Aloha does not use carrier sensing and does not stop transmitting when collisions are detected. Carrier sensing is only adequate if the nodes are near to each other (as in a local area network) such that collisions can be detected before the completion of transmission. Only then is a timely reaction possible.
We assume, for simplicity, that $p_r$ is the retransmission probability (which is the same for all backlogged nodes): in each timeslot, a backlogged node retransmits with probability $p_r$, and this probability $p_r$ is the same for each timeslot. The number of timeslots between the occurrence of a collision and the retransmission is then a geometric random variable $T_r$ (see Section 3.1.3) with parameter $p_r$ such that $\Pr[T_r = k] = p_r(1-p_r)^{k-1}$.

11.5.1 The Markov chain

The slotted Aloha protocol constitutes a discrete-time Markov chain $X_k \in \{0, 1, \ldots, N\}$, where a state $j$ counts the number of backlogged nodes out of the $N$ nodes in total and the subscript $k$ refers to the $k$-th timeslot. Each of the $j$ backlogged nodes retransmits a packet in the next timeslot with probability $p_r$, while each of the $N - j$ unbacklogged nodes will surely transmit a packet in the next timeslot provided a packet arrives in the current timeslot. The latter event (at least one arrival $A$) occurs with probability $p_a = \Pr[A > 0] = 1 - \Pr[A = 0]$. If we assume that the arrival process is Poissonean, then $p_a = 1 - \exp\left(-\frac{\lambda}{N}\right)$, but the computations in this section are more generally valid. The probability that $n$ backlogged nodes in state $j$ retransmit in the next timeslot is binomially distributed,

$$b_n(j) = \binom{j}{n} p_r^n (1-p_r)^{j-n}$$

and, similarly, the probability that $n$ unbacklogged nodes in state $j$ transmit in the next timeslot is

$$u_n(j) = \binom{N-j}{n} p_a^n (1-p_a)^{N-j-n}$$

A packet is transmitted successfully if and only if (a) one new arrival and no backlogged packet or (b) no new arrival and one backlogged packet is transmitted. The probability of successful transmission in state $j$ per timeslot equals

$$p_s(j) = u_1(j)\,b_0(j) + u_0(j)\,b_1(j)$$

The transition probability $P_{j,j+m}$ equals
$$P_{j,j+m} = \begin{cases} u_m(j) & 2 \le m \le N-j \\ u_1(j)\,(1 - b_0(j)) & m = 1 \\ u_1(j)\,b_0(j) + u_0(j)\,(1 - b_1(j)) & m = 0 \\ u_0(j)\,b_1(j) & m = -1 \end{cases}$$

The state with $j$ backlogged nodes jumps to the state $j-1$, with one backlogged node less, if no new packets are sent and there is precisely one successful retransmission. The state $j$ remains the same if there is one new arrival and no retransmission, or if there is no new arrival and either none or more than one retransmission. The state $j$ jumps to state $j+1$ if there is one new arrival from an unbacklogged node and at least one retransmission, because then there surely are collisions and the number of backlogged nodes increases by 1. Finally, the state $j$ jumps to state $j+m$ if $m$ new packets arrive from $m$ different unbacklogged nodes, which always causes collisions irrespective of how many backlogged nodes also retransmit in the next timeslot. The Markov chain is illustrated in Fig. 11.4, which shows that the state can decrease by at most 1 per timeslot.

Fig. 11.4. Graph of the Markov chain for slotted Aloha. Each state $j$ counts the number of backlogged nodes.

The transition probability matrix $P$ has the structure

$$P = \begin{bmatrix} P_{00} & P_{01} & P_{02} & \cdots & \cdots & P_{0N} \\ P_{10} & P_{11} & P_{12} & \cdots & \cdots & P_{1N} \\ 0 & P_{21} & P_{22} & \cdots & \cdots & P_{2N} \\ \vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & P_{N-1,N-2} & P_{N-1,N-1} & P_{N-1,N} \\ 0 & 0 & \cdots & 0 & P_{N,N-1} & P_{NN} \end{bmatrix}$$

whose eigenstructure is computed in Appendix A.5.3. In the asymptotic regime where $N \to \infty$, slotted Aloha has the peculiar property that the steady-state vector does not exist. Although for a small number of nodes $N$ the steady-state equations can be solved, when the number $N$ grows, slotted Aloha turns out to be unstable. It seems difficult to prove that $\lim_{N\to\infty} \pi = 0$, but there is another argument that suggests the truth of this awkward Aloha property.
The expected change in backlog per timeslot is

$$E[X_{k+1} - X_k | X_k = j] = (N - j)\,p_a - p_s(j) \qquad (11.28)$$

and equals the expected number of new arrivals minus the expected number of successful transmissions. This quantity $E[X_{k+1} - X_k | X_k = j]$ is often called the drift. If the drift is positive for all timeslots $k$, the Markov chain moves (on average) to higher states, or to the right in Fig. 11.4. Since $p_s(j) \le 1$ and $p_a = 1 - \exp\left(-\frac{\lambda}{N}\right)$, it follows that

$$\lim_{N\to\infty} E[X_{k+1} - X_k | X_k = j] = \infty$$

Thus, the drift tends to infinity, which means that, on average, the number of backlogged nodes increases unboundedly and suggests (but does not prove; a counterexample is given in problem (ii) of Section 9.4) that the Markov chain is transient for $N \to \infty$. A more detailed discussion and engineering approaches to cure this instability are found in Bertsekas and Gallager (1992, Chapter 4). The interest of the analysis of slotted Aloha lies in the fact that other types of multiple-access protocols, such as the important class of carrier sense multiple access (CSMA) protocols, can be analyzed in a similar manner. Of the CSMA class with collision detection, Ethernet is by far the most important because it is the basis of local area networks. Multiple-access protocols of the CSMA/CD type are discussed in our book Data Communications Networking (Van Mieghem, 2004a).
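The transition probabilities above and the drift (11.28) can be assembled and cross-checked numerically. The parameters $N$, $\lambda$ and $p_r$ below are arbitrary illustrative choices; the two assertions verify that each row of $P$ is a probability distribution and that the drift formula agrees with the direct expectation $\sum_i P_{ji}\,(i-j)$.

```python
import numpy as np
from math import comb, exp

# Arbitrary illustrative parameters for the slotted-Aloha chain
N, lam, p_r = 8, 0.5, 0.1
p_a = 1 - exp(-lam / N)       # probability of at least one arrival at a node

def b(n, j):                  # n of the j backlogged nodes retransmit
    return comb(j, n) * p_r**n * (1 - p_r)**(j - n)

def u(n, j):                  # n of the N-j unbacklogged nodes transmit
    return comb(N - j, n) * p_a**n * (1 - p_a)**(N - j - n)

P = np.zeros((N + 1, N + 1))
for j in range(N + 1):
    if j > 0:
        P[j, j - 1] = u(0, j) * b(1, j)                    # one successful retransmission
    P[j, j] = u(1, j) * b(0, j) + u(0, j) * (1 - b(1, j))  # success of a new packet, or no change
    if j < N:
        P[j, j + 1] = u(1, j) * (1 - b(0, j))              # new arrival collides
    for m in range(2, N - j + 1):
        P[j, j + m] = u(m, j)                              # m new arrivals collide

assert np.allclose(P.sum(axis=1), 1.0)    # each row sums to one

# Drift (11.28): E[X_{k+1} - X_k | X_k = j] = (N - j) p_a - p_s(j)
p_s = lambda j: u(1, j) * b(0, j) + u(0, j) * b(1, j)
drift = np.array([(N - j) * p_a - p_s(j) for j in range(N + 1)])
direct = np.array([sum(P[j, i] * (i - j) for i in range(N + 1)) for j in range(N + 1)])
assert np.allclose(drift, direct)
```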
11.5.2 Efficiency of slotted Aloha and the offered traffic G

We now investigate the probability of a successful transmission in state $j$ in more detail,

$$p_s(j) = u_1(j)b_0(j) + u_0(j)b_1(j) = (N-j)\,p_a(1-p_a)^{N-j-1}(1-p_r)^j + j\,p_r(1-p_r)^{j-1}(1-p_a)^{N-j} = \left[\frac{(N-j)\,p_a}{1-p_a} + \frac{j\,p_r}{1-p_r}\right](1-p_a)^{N-j}(1-p_r)^j$$

For a small arrival probability $p_a$ and a small retransmission probability $p_r$, the probability of successful transmission in state $j$ can be approximated by using the Taylor expansions $(1-x)^{\alpha} = e^{\alpha\ln(1-x)} = e^{-\alpha x}(1 + o(1))$ and $\frac{x}{1-x} = x + O(x^2)$ as

$$p_s(j) = \left[(N-j)p_a + j\,p_r + O\!\left(p_a^2 + p_r^2\right)\right] e^{-[(N-j)p_a + j\,p_r]}(1 + o(1)) = [(N-j)p_a + j\,p_r]\exp\left(-[(N-j)p_a + j\,p_r]\right)(1 + o(1))$$

Similarly, the probability that no packet is transmitted in state $j$ equals

$$p_{no}(j) = u_0(j)b_0(j) = (1-p_r)^j(1-p_a)^{N-j} = \exp\left(-[(N-j)p_a + j\,p_r]\right)(1 + o(1))$$

Hence, for small $p_a$ and small $p_r$, the probabilities of successful transmission and of no transmission in state $j$ are well approximated by

$$p_s(j) \simeq t(j)\,e^{-t(j)} \qquad p_{no}(j) \simeq e^{-t(j)}$$

Now, $t(j) = (N-j)p_a + j\,p_r$ is the expected number of arrivals and retransmissions in state $j$ or, equivalently, the total rate of transmission attempts in state $j$. That total rate of transmission attempts in state $j$, $t(j)$, is also called the offered traffic $G$. The analysis shows that, for small $p_a$ and small $p_r$, $p_s(j)$ and $p_{no}(j)$ are closely approximated in terms of a Poisson random variable with rate $t(j)$. Moreover, the probability of successful transmission $p_s(j)$ can be interpreted as the departure rate from state $j$ or the throughput $S_{SAloha} = G\,e^{-G}$, which is maximized if $G = t(j) = 1$. By controlling $p_r$ to achieve $t(j) = (N-j)p_a + j\,p_r = 1$, slotted Aloha performs with highest throughput. The efficiency $\eta_{SAloha}$ of slotted Aloha with many nodes $N \gg 1$ is defined as the maximum fraction of time during which packets are transmitted successfully, which is $\max_j p_s(j) = e^{-1}$. Hence, $\eta_{SAloha} \approx 36\%$.
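The throughput curve $S(G) = G\,e^{-G}$ derived above is easy to evaluate numerically; the grid bounds below are arbitrary. The maximum indeed lies at $G = 1$ with value $e^{-1} \approx 0.368$, the 36% efficiency.

```python
import numpy as np

# Throughput of slotted Aloha as a function of the offered traffic G
G = np.linspace(0.01, 5.0, 500)    # arbitrary evaluation grid
S = G * np.exp(-G)

G_opt = G[np.argmax(S)]
assert abs(G_opt - 1.0) < 0.02                 # maximum at G = 1
assert abs(S.max() - np.exp(-1.0)) < 1e-3      # peak throughput 1/e ~ 36%
```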
Pure Aloha (Bertsekas and Gallager, 1992), where the nodes can start transmitting at arbitrary times instead of only at the beginning of timeslots, performs only half as efficiently as slotted Aloha, with $\eta_{PAloha} = 18\%$. Recall that each packet is assumed to have an equal length that corresponds to the length of one timeslot. In pure Aloha, a packet transmitted at time $t$ is successful if no other packet is sent during $(t-1, t+1)$. This time interval is precisely equal to two timeslots in slotted Aloha, which explains why $\eta_{PAloha} = \frac{1}{2}\eta_{SAloha}$. The same observation tells us that, in pure Aloha, $p_{no}(j) \simeq e^{-2t(j)}$ because in the successful interval the expected number of arrivals and retransmissions is twice that in slotted Aloha. The throughput $S$ roughly equals the total rate of transmission attempts $G$ (which is the same as in slotted Aloha) multiplied by $p_{no}(j) \simeq e^{-2t(j)}$; hence, $S_{PAloha} = G\,e^{-2G}$.

11.6 Ranking of webpages

To retrieve webpages related to a user's query, current websearch engines first perform a search similar to that in text processors to find all webpages containing the query terms. Due to the massive size of the World Wide Web, this first action can result in a huge number of retrieved webpages. Several thousands of webpages related to a query are not uncommon. To reduce the list of webpages, many websearch engines apply a ranking criterion to sort this list. In this section, we discuss PageRank, the hyperlink-based ranking system used by the Google search engine. PageRank elegantly exploits the power of discrete Markov theory.

11.6.1 A Markov model of the web

The hyperlink structure of the World Wide Web can be viewed as a directed graph with $N$ nodes. Each node in the webgraph represents a certain webpage and the directed edges represent hyperlinks. Let us consider a small collection of webpages as in Fig. 11.5 to illustrate the underlying idea of PageRank, invented by Brin and Page, the founders of Google.
Fig. 11.5. A subgraph of the World Wide Web (a) and the corresponding transition probability matrix $P$ (b), whose non-zero elements are $P_{12}, P_{14}, P_{15}, P_{23}, P_{24}, P_{25}, P_{43}, P_{51}$ and $P_{52}$.

The topology of any graph is determined by an adjacency matrix (see Appendix B.1). A reasonable criterion to assess the importance of a webpage is the number of times that this webpage is visited. This criterion suggests that we consider a discrete Markov chain whose transition probability matrix $P$ corresponds to the adjacency matrix of the webgraph (shown in (b) in Fig. 11.5). The element $P_{ij}$ of the Markov transition probability matrix is the probability of moving from webpage $i$ (state $i$) to webpage $j$ (state $j$) in one time step. The component $s_i[k]$ of the corresponding state vector $s[k]$ denotes the probability that at time $k$ webpage $i$ is visited. The long-run mean fraction of time that webpage $i$ is visited equals the steady-state probability $\pi_i$ of the Markov chain. This probability $\pi_i$ is the ranking measure of the importance of webpage $i$ used in Google. The basic idea is indeed simple, but we have not yet shown how to determine the elements $P_{ij}$, nor whether the steady-state probability vector $\pi$ exists. In particular, we will demonstrate that guaranteeing that the steady-state vector $\pi$ exists and can be computed for a Markov chain containing some billions of states — the order of magnitude of the number $N$ of webpages — requires a deeper knowledge of discrete Markov chains. To start determining the elements $P_{ij}$, we assume that, given we are on webpage $i$, any hyperlink on that webpage has equal probability of being clicked on. This assumption implies that $P_{ij} = \frac{1}{d_i}$, where the degree $d_i$ of a node $i$ (see also Section 15.3) equals the number of adjacent neighbors of node $i$ in the webgraph. This number $d_i$ is thus equal to the number of hyperlinks on webpage $i$. The transition probability matrix in Fig.
11.5 then becomes

$$P = \begin{bmatrix} 0 & \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} \\ 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \end{bmatrix}$$

The uniformity assumption is in most cases the best we can make if no additional information is available. If, for example, web usage information is available showing that a random surfer accessing page 2 is twice as likely to jump to page 4 as to any other neighboring webpage of 2, then the second row can be replaced by $\begin{pmatrix} 0 & 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \end{pmatrix}$. When solely adopting the adjacency matrix of the webgraph as the underlying structure of the Markov transition probability matrix $P$, we cannot assure that $P$ is a stochastic matrix. For example, it often occurs that a node, such as node 3 in our example in Fig. 11.5, does not contain outlinks. Such nodes are called dangling nodes. For example, many webpages may point to an important document on the web, which itself does not refer to any other webpage. The corresponding row in $P$ possesses only zero elements, which violates the basic law (9.8) of a stochastic matrix. To rectify the deviation from a stochastic matrix, each zero row must be replaced by a particular non-zero row vector³ $v^T$ that obeys (9.8), i.e. $\|v\|_1 = v^T u = 1$ where $u^T = [1\ 1\ \cdots\ 1]$. Again, the simplest recipe is to invoke uniformity and to replace any zero row by $v^T = \frac{u^T}{N}$. In our example, we replace the third row by $\begin{pmatrix} \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \end{pmatrix}$ and obtain

$$\bar{P} = \begin{bmatrix} 0 & \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} \\ 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\ 0 & 0 & 1 & 0 & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \end{bmatrix}$$

However, this adjustment is not sufficient to ensure the existence of a steady-state vector $\pi$. In Section 9.3.1, we have shown that, if the Markov chain is irreducible, the steady-state vector $\pi$ exists. In an irreducible Markov chain any state is reachable from any other (Section 9.2.1.1). By its very nature, the World Wide Web leads almost surely to a reducible Markov chain. In order to create an irreducible matrix, Brin and Page have considered

$$\bar{\bar{P}} = \alpha\bar{P} + (1-\alpha)\frac{u\,u^T}{N}$$

where $0 < \alpha < 1$.
Here, $u\,u^T$ is an $N \times N$ matrix with each element equal to 1 and $\bar{P}$ is the previously adjusted matrix without zero rows. The linear combination of the stochastic matrix $\bar{P}$ and the stochastic perturbation matrix $\frac{u\,u^T}{N}$ ensures that $\bar{\bar{P}}$ is an irreducible stochastic matrix. Every node is now directly connected (reachable in one step) to any other (because of $u\,u^T$), which makes the Markov chain irreducible with aperiodic, positive recurrent states (see Fig. 9.1). Slightly more generally, we can replace the matrix $\frac{u\,u^T}{N}$ by $u\,v^T$, where $v$ is a probability vector as above, but where we must additionally require that each component of $v$ is non-zero in order to guarantee reachability. Brin and Page have called $v^T$ the personalization vector, which enables deviation from uniformity. Hence, we arrive at the Brin and Page Markov transition probability matrix

$$\bar{\bar{P}} = \alpha\bar{P} + (1-\alpha)\,u\,v^T \qquad (11.29)$$

³ We use the normal vector algebra convention, but remark that the stochastic vectors $\pi$ and $s[k]$ are also row vectors (without the transpose sign)!

For $v^T = \begin{bmatrix} \frac{1}{16} & \frac{4}{16} & \frac{6}{16} & \frac{4}{16} & \frac{1}{16} \end{bmatrix}$ and $\alpha = \frac{4}{5}$, the transition probability matrix in our example (with the dangling row replaced by $v^T$) becomes

$$\bar{\bar{P}} = \begin{bmatrix} \frac{1}{80} & \frac{19}{60} & \frac{3}{40} & \frac{19}{60} & \frac{67}{240} \\ \frac{1}{80} & \frac{1}{20} & \frac{41}{120} & \frac{19}{60} & \frac{67}{240} \\ \frac{1}{16} & \frac{1}{4} & \frac{3}{8} & \frac{1}{4} & \frac{1}{16} \\ \frac{1}{80} & \frac{1}{20} & \frac{7}{8} & \frac{1}{20} & \frac{1}{80} \\ \frac{33}{80} & \frac{9}{20} & \frac{3}{40} & \frac{1}{20} & \frac{1}{80} \end{bmatrix}$$

If the presented method were implemented directly, the initially very sparse matrix $P$ would be replaced by the dense matrix $\bar{\bar{P}}$, which for the size $N$ of the web would increase storage dramatically. Therefore, a more effective way is to define a special vector $a$ whose component $a_j = 1$ if row $j$ in $P$ is a zero row, i.e. if node $j$ is a dangling node. Then, $\bar{P} = P + a\,v^T$ is a rank-one update of $P$, and so is $\bar{\bar{P}}$ because

$$\bar{\bar{P}} = \alpha\left(P + a\,v^T\right) + (1-\alpha)\,u\,v^T = \alpha P + (\alpha a + (1-\alpha)u)\,v^T$$

11.6.2 Computation of the PageRank steady-state vector

The steady-state vector $\pi$ obeys the eigenvalue equation (9.22), thus $\pi = \pi\bar{\bar{P}}$.
Rather than solving this eigenvalue equation, Brin and Page propose to compute the steady-state vector from $\pi = \lim_{k\to\infty} s[k]$. Specifically, for any starting vector $s[0]$ (usually $s[0] = \frac{u^T}{N}$), we iterate the equation (9.6) $m$ times and choose $m$ sufficiently large such that $\|s[m] - \pi\| \le \epsilon$, where $\epsilon$ is a prescribed tolerance. Before turning to the convergence of the iteration process, which actually computes powers of $\bar{\bar{P}}$ as observed from (9.9), we first concentrate on the basic iteration (9.6),

$$s[k+1] = s[k]\,\bar{\bar{P}} = s[k]\left(\alpha P + (\alpha a + (1-\alpha)u)\,v^T\right)$$

Since $s[k]\,u = 1$, we find

$$s[k+1] = \alpha\,s[k]\,P + (\alpha\,s[k]\,a + 1-\alpha)\,v^T \qquad (11.30)$$

This formula indicates that only the product of $s[k]$ with the (extremely) sparse matrix $P$ needs to be computed and that $\bar{P}$ and $\bar{\bar{P}}$ are never formed nor stored. As shown in Appendix A.4.3, the rate of convergence of a Markov chain towards the steady state is determined by the second largest eigenvalue. Furthermore, Lemma A.4.4 demonstrates that, for any personalization vector $v^T$, the second largest eigenvalue of $\bar{\bar{P}}$ is $\alpha\mu_2$, where $\mu_2$ is the second largest eigenvalue of $\bar{P}$. Lemma A.4.4 thus shows that, by choosing $\alpha$ in (11.29) appropriately, the iteration (11.30) converges at least as $\alpha^k$ (since $|\mu_2| < 1$ for irreducible and $\mu_2 = 1$ for reducible Markov chains) towards the steady-state vector $\pi$. Brin and Page report that only 50 to 100 iterations of (11.30) for $\alpha = 0.85$ are sufficient. Clearly, fast convergence is found for small $\alpha$, but then (11.29) shows that the true characteristics of the webgraph are suppressed. This brings us to a final remark concerning the irreducibility approach. The original method of Brin and Page that resulted in (11.29), by enforcing that each node is connected to each other, alters the true nature of the webgraph, even though the "connectivity strength" added to create irreducibility is extremely small, $\frac{1}{N}$ in the case $v^T = \frac{u^T}{N}$.
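The iteration (11.30) can be sketched on the toy webgraph of Fig. 11.5 (hyperlinks $1\to\{2,4,5\}$, $2\to\{3,4,5\}$, node 3 dangling, $4\to 3$, $5\to\{1,2\}$), here with a uniform personalization vector. Only products with the sparse $P$ and the dangling indicator $a$ are formed; the dense matrix is built at the end solely to verify stationarity.

```python
import numpy as np

# Adjacency matrix of the Fig. 11.5 webgraph
A = np.array([[0, 1, 0, 1, 1],
              [0, 0, 1, 1, 1],
              [0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 1, 0, 0, 0]], dtype=float)
N = len(A)
deg = A.sum(axis=1)
a = (deg == 0).astype(float)          # dangling-node indicator vector
P = np.divide(A, deg[:, None], out=np.zeros_like(A), where=deg[:, None] > 0)

alpha = 0.85
v = np.full(N, 1 / N)                 # uniform personalization vector v^T
s = np.full(N, 1 / N)                 # starting vector s[0] = u^T / N

# Iteration (11.30): s[k+1] = alpha s[k] P + (alpha s[k].a + 1 - alpha) v^T
for _ in range(200):
    s = alpha * (s @ P) + (alpha * (s @ a) + 1 - alpha) * v

# Verify against the dense Google matrix alpha*(P + a v^T) + (1-alpha) u v^T
G = alpha * (P + np.outer(a, v)) + (1 - alpha) * np.outer(np.ones(N), v)
assert np.isclose(s.sum(), 1.0)
assert np.allclose(s @ G, s, atol=1e-10)
```

Note that each step of (11.30) preserves $\sum_i s_i[k] = 1$, since the correction term exactly replaces the probability mass lost through the dangling rows of $P$.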
Instead of maximally connecting all nodes, another irreducibility approach, of minimally connecting nodes, investigated by Langville and Meyer (2005), consists of creating one dummy node that is connected to all other nodes and to which all other nodes are connected, to ensure overall reachability. Such an approach changes the webgraph less. The large size $N$ of the web introduces several challenges, such as storage, stability and updating of the PageRank vector $\pi$, choosing the personalization vector $v$ and other implementational considerations, for which we refer to Langville and Meyer (2005).

11.7 Problems

(i) Determine the steady-state probability distribution for the birth-death processes with the following transition intensities: (a) $\lambda_j = \lambda$ and $\mu_j = j\mu$, and (b) $\lambda_j = \frac{\lambda}{j+1}$ and $\mu_j = \mu$, where $\lambda$ and $\mu$ are constants.

(ii) Consider slotted Aloha as in Section 11.5. There are eight stations that compete for slots by transmitting with probability 0.12 each in one slot. Assume that the stations always have packets to transmit. Compute the average time for one station to transmit seven packets.

12 Branching processes

A branching process is an evolutionary process that starts with an initial set of items that produce several other items with a certain probability distribution. These generated items in turn again produce new items, and so on. If we denote by $X_k$ the number of items in the $k$-th generation and by $Y_{k,j}$ the number of items produced by the $j$-th item in generation $k$, then the basic law relating the number of items in the $k$-th and $(k+1)$-th generation is, for $k \ge 0$,

$$X_{k+1} = \sum_{j=1}^{X_k} Y_{k,j} \qquad (12.1)$$

Figure 12.1 illustrates the basic law (12.1) of a branching process.

Fig. 12.1. A branching process with one root ($X_0 = 1$) drawn as a tree in which all nodes of generation $k$ lie at the same distance $k$ from the root (label 0); here $X_1 = 3 = Y_{0,1}$ and $X_2 = 5 = Y_{1,1} + Y_{1,2} + Y_{1,3} = 2 + 0 + 3$.
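The basic law (12.1) translates directly into a Monte-Carlo sketch. The Poisson production distribution and the parameter $\mu = 1.5$ below are arbitrary illustrative choices; anticipating the relation $E[X_k] = \mu^k$ for $X_0 = 1$ derived in Section 12.1, the sample mean of $X_3$ should be close to $\mu^3 = 3.375$.

```python
import math
import random

random.seed(7)

mu = 1.5    # assumed mean production E[Y] per item (illustrative)

def poisson(mean):
    # Knuth's product method, adequate for a small mean
    L, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def generation(k):
    """Sample X_k by applying X_{k+1} = sum_{j=1}^{X_k} Y_{k,j} k times, X_0 = 1."""
    x = 1
    for _ in range(k):
        x = sum(poisson(mu) for _ in range(x))
    return x

runs = 20000
mean3 = sum(generation(3) for _ in range(runs)) / runs
assert abs(mean3 - mu**3) < 0.15   # E[X_3] = mu^3 = 3.375
```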
In general, the production process in each generation $k$ can be different, but most often, and in the sequel, it is assumed that all generations produce items with the same probability distribution, such that all random variables $Y_{k,j}$ are independent and have the same distribution as $Y$. The branching process is entirely defined by the basic law (12.1) and the distribution of the initial set $X_0$. The basic law (12.1) indicates that the number of items $X_{k+1}$ in generation $k+1$ depends only on the number of items $X_k$ in the previous generation $k$. The Markov property (9.2) is obeyed,

$$\Pr[X_{k+1} = x_{k+1} | X_0 = x_0, \ldots, X_k = x_k] = \Pr[X_{k+1} = x_{k+1} | X_k = x_k] = \Pr\left[\sum_{j=1}^{x_k} Y_j = x_{k+1}\right]$$

which shows that the branching process $\{X_k\}_{k\ge 0}$ is a Markov chain with transition probabilities $P_{ij} = \Pr\left[\sum_{m=1}^{i} Y_m = j\right]$. The discrete branching process can be extended to a continuous-time branching process in which items are produced continuously in time, rather than by generations. Since continuous-time Markov processes are mathematically more difficult than their discrete counterparts, we omit continuous-time branching processes, but refer to the book of Harris (1963) and to a simple example, the Yule process, in Section 11.3.2.3. There are many examples of branching processes and we briefly describe some of the most important. In biology, a certain species generates offspring and the survival of that species after $n$ generations is studied as a branching process. In the same vein, what is the probability that a family name that is inherited by sons only will eventually become extinct? This was the question posed by Galton and Watson that gave birth to the theory of branching processes in 1874. In physics, branching processes have been studied to understand nuclear chain reactions. A nucleus is split by a neutron and several new free neutrons are generated.
Each of these free neutrons may in turn hit another nucleus, producing additional free neutrons, and so on. In micro-electronics, the avalanche breakdown of a diode is another example of a branching process. In queueing theory, all new arrivals of packets during the service time of a particular packet can be described as a branching process. The process continues as long as the queue lasts. The number of duplicates generated by a flooding process in a communications network is a branching process: a flooded packet is sent on all interfaces of a router except for the incoming interface. The spread of computer viruses in the Internet can be modeled approximately as a branching process. The application of a branching process to compute the hopcount of the shortest path between two arbitrary nodes in a network is discussed in Section 15.7.

12.1 The probability generating function

Since the $Y_{k,j}$ are independent, identically distributed random variables with the same distribution as $Y$ and independent of the random variable $X_k$, the probability generating function $\varphi_{X_{k+1}}(z)$ of $X_{k+1}$ follows from (2.70) and the basic law (12.1) as

$$\varphi_{X_{k+1}}(z) = \varphi_{X_k}(\varphi_Y(z)) \qquad (12.2)$$

with $\varphi_{X_0}(z) = E\left[z^{X_0}\right] = f(z)$, where $f(z)$ is a given probability generating function. Iterating the general relation (12.2) gives

$$\varphi_{X_{k+1}}(z) = \varphi_{X_{k-1}}(\varphi_Y(\varphi_Y(z))) = \cdots = f(\varphi_Y(\varphi_Y(\cdots(\varphi_Y(z)))))$$

where the last relation consists of nested repeated applications of $\varphi_Y(\cdot)$, called the iterates of $\varphi_Y(\cdot)$. The expectation can be derived from the probability generating function by differentiation and setting $z = 1$.
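For a production distribution with finite support, the pgf $\varphi_Y(z)$ is a polynomial and the iteration (12.2) (with $f(z) = z$, i.e. $X_0 = 1$) reduces to polynomial composition. The distribution $\Pr[Y=0]=0.2$, $\Pr[Y=1]=0.3$, $\Pr[Y=2]=0.5$ below is an arbitrary illustration; the checks confirm that the coefficients of $\varphi_{X_3}$ sum to 1, that $\varphi_{X_3}'(1) = \mu^3$, and that $\Pr[X_3 = 0] = \varphi_{X_3}(0)$ equals the value obtained by iterating $q \mapsto \varphi_Y(q)$ from $q = 0$.

```python
from numpy.polynomial import Polynomial

# phi_Y(z): coefficients are the production probabilities (illustrative choice)
pY = Polynomial([0.2, 0.3, 0.5])
mu = pY.deriv()(1.0)              # E[Y] = phi_Y'(1) = 1.3

# Iterate (12.2) with phi_{X_0}(z) = z: phi_{X_{k+1}}(z) = phi_Y(phi_{X_k}(z))
phi = Polynomial([0.0, 1.0])
for _ in range(3):
    phi = pY(phi)                 # polynomial composition
pX3 = phi                         # pgf of X_3

# Cross-check Pr[X_3 = 0] by iterating the scalar map q -> phi_Y(q)
q = 0.0
for _ in range(3):
    q = pY(q)

assert abs(pX3.coef.sum() - 1.0) < 1e-12     # probabilities sum to 1
assert abs(pX3.deriv()(1.0) - mu**3) < 1e-9  # E[X_3] = mu^3
assert abs(pX3(0.0) - q) < 1e-12             # Pr[X_3 = 0] = phi_{X_3}(0)
```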
More elegantly, by taking the expectation of the basic law (12.1) and recalling that $X_k$ and the $Y_{k,j}$ are independent, we have, with $\mu = E[Y]$,

$$E[X_{k+1}] = E\left[\sum_{j=1}^{X_k} Y_{k,j}\right] = E[Y\,X_k] = \mu\,E[X_k]$$

Iteration starting from a given average $E[X_0]$ of the initial population gives

$$E[X_k] = \mu^k\,E[X_0] \qquad (12.3)$$

Using (2.27), the variance of $X_{k+1}$ follows from (12.2) with

$$\varphi'_{X_{k+1}}(z) = \varphi'_{X_k}(\varphi_Y(z))\,\varphi'_Y(z)$$
$$\varphi''_{X_{k+1}}(z) = \varphi''_{X_k}(\varphi_Y(z))\left(\varphi'_Y(z)\right)^2 + \varphi'_{X_k}(\varphi_Y(z))\,\varphi''_Y(z)$$

as

$$\text{Var}[X_{k+1}] = \varphi''_{X_{k+1}}(1) + \varphi'_{X_{k+1}}(1) - \left(\varphi'_{X_{k+1}}(1)\right)^2 = \mu^2\,\text{Var}[X_k] + E[X_k]\,\text{Var}[Y]$$

Iteration starting from a given variance $\text{Var}[X_0]$ of the initial set of items and employing the expression (12.3) for the average yields

$$\text{Var}[X_1] = \mu^2\,\text{Var}[X_0] + E[X_0]\,\text{Var}[Y]$$
$$\text{Var}[X_2] = \mu^2\,\text{Var}[X_1] + E[X_1]\,\text{Var}[Y] = \mu^4\,\text{Var}[X_0] + \left(\mu^2 + \mu\right)E[X_0]\,\text{Var}[Y]$$
$$\text{Var}[X_3] = \mu^2\,\text{Var}[X_2] + E[X_2]\,\text{Var}[Y] = \mu^6\,\text{Var}[X_0] + \left(\mu^4 + \mu^3 + \mu^2\right)E[X_0]\,\text{Var}[Y]$$

from which we deduce

$$\text{Var}[X_k] = \mu^{2k}\,\text{Var}[X_0] + \sum_{j=k-1}^{2(k-1)} \mu^j\,E[X_0]\,\text{Var}[Y]$$

or

$$\text{Var}[X_k] = \mu^{2k}\,\text{Var}[X_0] + E[X_0]\,\text{Var}[Y]\,\mu^{k-1}\,\frac{\mu^k - 1}{\mu - 1} \qquad (12.4)$$

Substitution into the recursion for $\text{Var}[X_k]$ confirms the correctness of (12.4). The relations for the expectation (12.3) and the variance (12.4) of the number of items in generation $k$ imply that, if the average production per generation is $E[Y] = \mu = 1$, then $E[X_k] = E[X_0]$ and $\text{Var}[X_k] = \text{Var}[X_0] + k\,E[X_0]\,\text{Var}[Y]$. In the case that the average production $E[Y] = \mu > 1$ ($E[Y] = \mu < 1$), the average population per generation grows (decreases) exponentially in $k$ with rate $\log\mu$ and, similarly for large $k$, the standard deviation $\sqrt{\text{Var}[X_k]}$ grows (decreases) exponentially in $k$ with the same rate $\log\mu$. Hence, the most important factor in the branching process is the average production $E[Y] = \mu$ per generation.
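The closed form (12.4) can be checked numerically against the one-step recursion $\text{Var}[X_{k+1}] = \mu^2\text{Var}[X_k] + E[X_k]\text{Var}[Y]$; all parameter values below are arbitrary.

```python
# Arbitrary illustrative parameters
mu, varY, EX0, varX0 = 1.5, 2.0, 3.0, 4.0

var_rec, EX = varX0, EX0
for k in range(1, 11):
    # One-step recursion for Var[X_k]
    var_rec = mu**2 * var_rec + EX * varY
    EX *= mu                              # E[X_k] = mu^k E[X_0]
    # Closed form (12.4)
    var_closed = mu**(2*k) * varX0 + EX0 * varY * mu**(k-1) * (mu**k - 1) / (mu - 1)
    assert abs(var_rec - var_closed) / var_closed < 1e-9
```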
The variance terms and $E[X_0]$ only play a role as prefactors. A branching process is called critical if $\mu = 1$, subcritical if $\mu < 1$ and supercritical if $\mu > 1$. In the sequel, we will only consider supercritical ($\mu > 1$) branching processes. Often, the initial set of items consists of only one item. In that case, $X_0 = 1$, $\varphi_{X_0}(z) = f(z) = z$ and

$$E[X_k] = \mu^k \qquad \text{Var}[X_k] = \text{Var}[Y]\,\mu^{k-1}\,\frac{\mu^k - 1}{\mu - 1}$$

while the explicit nested form of the probability generating function indicates that

$$\varphi_{X_{k+1}}(z) = \varphi_Y(\varphi_{X_k}(z)) \qquad (12.5)$$

This relation is only valid if $f(z) = z$ or, equivalently, only if $X_0 = 1$. In case $f(z) = z$, $\varphi_{X_k}(z)$ is the $k$-th iterate of $\varphi_Y(z)$.

Example. Due to the nested structure (12.2), closed-form expressions for the $k$-th generation probability generating function $\varphi_{X_k}(z)$ are rare. Assume that $X_0 = 1$. A simple case that allows explicit computation occurs for a deterministic production of $m$ offspring in each generation, for which $\varphi_Y(z) = z^m$. We have from (12.5) that $\varphi_{X_1}(z) = \varphi_Y(z) = z^m$ and

$$\varphi_{X_k}(z) = z^{m^k}$$

This branching process evolves as the $m$-ary tree shown in Fig. 17.7. A second example that can be computed exactly is the geometric branching process studied in Section 12.5.

12.2 The limit $W$ of the scaled random variables $W_k$

The conditional expectation, defined in Section 2.6,

$$E[X_{k+1}|X_k, X_{k-1}, \ldots, X_0] = E[X_{k+1}|X_k] \quad \text{(Markov property)} = E\left[\sum_{j=1}^{X_k} Y_{k,j}\,\Big|\,X_k\right] = E[Y\,X_k|X_k] = \mu\,X_k$$

is a random variable, which suggests that we consider the scaled random variable $W_k = \frac{X_k}{\mu^k}$, because

$$E[W_{k+1}|W_k, W_{k-1}, \ldots, W_1] = W_k$$

while (12.3) shows that $E[W_k] = E[X_0]$ for all $k$. The stochastic process $\{W_k\}_{k\ge 1}$ is a martingale, which is a generalization of a fair game, with the characteristic property that at each step $k$ in the process $E[W_k]$ is a constant (independent of $k$).
From (12.4), the variance of the scaled random variables $W_k = \frac{X_k}{\mu^k}$ is

$$\text{Var}[W_k] = \text{Var}[X_0] + E[X_0]\,\text{Var}[Y]\,\frac{1 - \mu^{-k}}{\mu^2 - \mu} \qquad (12.6)$$

which, provided $E[Y] = \mu > 1$, geometrically tends to a constant independent of $k$. The expression (12.6) for the variance indicates that

$$\text{Var}[W] = \lim_{k\to\infty}\text{Var}[W_k] = \text{Var}[X_0] + \frac{E[X_0]\,\text{Var}[Y]}{\mu^2 - \mu} \qquad (12.7)$$

exists provided $E[Y] = \mu > 1$. We now show that the limit variable $W = \lim_{k\to\infty} W_k$ exists if $E[Y] = \mu > 1$.

Theorem 12.2.1 If $E[Y] = \mu > 1$, the scaled random variables $W_n \to W$ a.s.

Proof: Consider

$$E\left[(W_{k+n} - W_n)^2\right] = E\left[W_{k+n}^2\right] + E\left[W_n^2\right] - 2E[W_{k+n}W_n]$$

Using (2.72) with $h(x) = x$ and the Markov property,

$$E[W_{k+n}W_n] = E[E[W_{k+n}|W_n]\,W_n] = E\left[W_n^2\right]$$

we have, with (2.16), $E[W_{k+n}] = E[W_n] = E[X_0]$ and (12.6),

$$E\left[(W_{k+n} - W_n)^2\right] = \text{Var}[W_{k+n}] - \text{Var}[W_n] = E[X_0]\,\text{Var}[Y]\,\frac{\mu^{-n}\left(1 - \mu^{-k}\right)}{\mu^2 - \mu}$$

In the limit $k \to \infty$,

$$E\left[(W - W_n)^2\right] = O\left(\mu^{-n}\right)$$

which means that the sequence $\{W_k\}_{k\ge 1}$ converges to $W$ in $L^2$ or in mean square (see Section 6.1.2). Moreover,

$$\sum_{n=1}^{\infty} E\left[(W - W_n)^2\right] = E\left[\sum_{n=1}^{\infty}(W - W_n)^2\right] = O(1)$$

which means that the series has finite expectation and that $\sum_{n=1}^{\infty}(W - W_n)^2$ is finite with probability 1. The convergence of this series implies, for large $n$, that $(W - W_n)^2 \to 0$ with probability 1, i.e. $W_n \to W$ a.s. $\square$

Theorem 12.2.1 means that the number of items in generation $k$ is, for large $k$, well approximated by $X_k \approx W\mu^k$. Hence, an asymptotic analysis of a branching process crucially relies on the properties of the limit random variable $W$. The generating function $\varphi_W(z) = E\left[z^W\right]$ of this limit random variable can be deduced as the limit of the sequence of generating functions

$$\varphi_{W_k}(z) = E\left[z^{W_k}\right] = E\left[z^{\mu^{-k}X_k}\right] = \varphi_{X_k}\!\left(z^{\mu^{-k}}\right) \qquad (12.8)$$

Using (12.5) in case¹ $X_0 = 1$ with $z \to z^{\mu^{-k-1}}$,

$$E\left[z^{\mu^{-k-1}X_{k+1}}\right] = \varphi_Y\!\left(E\left[z^{\mu^{-k-1}X_k}\right]\right)$$

leads, with $W_k = \frac{X_k}{\mu^k}$, to the recursion for the pgf of the scaled random variables $W_k$,

$$\varphi_{W_{k+1}}(z) = \varphi_Y\!\left(\varphi_{W_k}\!\left(z^{\frac{1}{\mu}}\right)\right)$$

¹ The use of the general equation (12.2) is inadequate.
In the limit $k \to \infty$, where $W_k \to W$ a.s., we can apply the Continuity Theorem 6.1.3, which results in the functional equation for the pgf of the continuous limit random variable $W$,

$$\varphi_W(z) = \varphi_Y\!\left(\varphi_W\!\left(z^{\frac{1}{\mu}}\right)\right) \qquad (12.9)$$

Since $W$ is a continuous random variable except at $W = 0$, as explained below (see (12.19)), it is more convenient to define the moment generating function $\chi_W(t) = E\left[e^{-tW}\right]$. Obviously, the relation between the two generating functions is, with $z = e^{-t}$,

$$\chi_W(t) = \varphi_W\!\left(e^{-t}\right)$$

With $z = e^{-t}$ in (12.9), the functional equation for $\chi_W(t)$ is, for $t \ge 0$ and $E[W] = E[X_0] = 1$,

$$\chi_W(t) = \varphi_Y\!\left(\chi_W\!\left(\frac{t}{\mu}\right)\right) \qquad (12.10)$$

The functional equation (12.10) is simpler than (12.9) and $\chi_W(t)$ is convex for all $t$, while $\varphi_W(z)$ is not convex for all $z$. In particular, $\varphi_W(z) = \chi_W(-\log z)$ is not analytic at $z = 0$ and appears² to have a concave regime near $z \downarrow 0$ where $\chi'_W(-\log z) + \chi''_W(-\log z) < 0$.

Lemma 12.2.2 $\chi_W(t)$ is the only probability generating function satisfying the functional equation (12.10).

Proof: Let $\psi_{W^*}(t) = E\left[e^{-tW^*}\right]$ and $\chi_W(t) = E\left[e^{-tW}\right]$ be two probability generating functions that both satisfy (12.10). Then $\psi_{W^*}(t) - \chi_W(t)$ is continuous for $\text{Re}\,t \ge 0$ and, since $E[W] = E[W^*] = 1$, the Taylor series (2.40) around $t = 0$ is

$$\psi_{W^*}(t) - \chi_W(t) = (-t)\,E[W^* - W] + \sum_{k=2}^{\infty}\frac{(-t)^k}{k!}\,E\left[(W^*)^k - W^k\right] = -t\sum_{k=1}^{\infty}\frac{(-t)^k}{(k+1)!}\,E\left[(W^*)^{k+1} - W^{k+1}\right]$$

from which $\psi_{W^*}(t) - \chi_W(t) = t\,h(t)$ with $h(0) = 0$. Since $|\varphi'_Y(z)| \le \mu$ for $|z| \le 1$, equation (5.6) of the Mean Value Theorem implies $|\varphi_Y(a) - \varphi_Y(b)| \le \mu|a - b|$ for any $|a|, |b| \in [0, 1]$. Since $|\chi_W(t)| \le 1$ and $|\psi_{W^*}(t)| \le 1$ for $\text{Re}(t) \ge 0$, we obtain

$$|t\,h(t)| = \left|\varphi_Y\!\left(\psi_{W^*}\!\left(\frac{t}{\mu}\right)\right) - \varphi_Y\!\left(\chi_W\!\left(\frac{t}{\mu}\right)\right)\right| \le \mu\left|\psi_{W^*}\!\left(\frac{t}{\mu}\right) - \chi_W\!\left(\frac{t}{\mu}\right)\right| = \mu\left|\frac{t}{\mu}\right|\,\left|h\!\left(\frac{t}{\mu}\right)\right|$$

or

$$|h(t)| \le \left|h\!\left(\frac{t}{\mu}\right)\right|$$

After $K$ iterations, we have that $|h(t)| \le \left|h\!\left(\frac{t}{\mu^K}\right)\right|$, which holds for any integer $K$.

² This fact is observed for both a geometric and a Poisson production distribution function.
Hence, for any finite $t$ and since $h(t)$ is continuous, which allows $\lim_{K\to\infty} h\!\left(\frac{t}{\mu^K}\right) = h\!\left(\lim_{K\to\infty}\frac{t}{\mu^K}\right)$,

$$|h(t)| \le \lim_{K\to\infty}\left|h\!\left(\frac{t}{\mu^K}\right)\right| = |h(0)| = 0$$

which proves the Lemma. $\square$

Lemma 12.2.2 is important because solving the functional equation, for example by Taylor expansion, is one of the primary tools to determine $\chi_W(t)$. If $\varphi_Y(z)$ is analytic inside a circle with radius $R_Y > 0$ centered at $z = 1$, then the Taylor series around $z_0 = 1$,

$$\varphi_Y(z) = 1 + \sum_{k=1}^{\infty} \upsilon_k (z-1)^k$$

converges for all $|z - 1| < R_Y$. The definition $\chi_W(t) = E\left[e^{-tW}\right]$ implies that the maximum value of $|\chi_W(t)|$ inside and on a circle with radius $r$ around the origin is attained at $\chi_W(-r)$. The functional equation (12.10) then shows that $\chi_W(t)$ is analytic inside a circle around $t = 0$ with radius $R_W$ for which $\chi_W\!\left(-\frac{R_W}{\mu}\right) < 1 + R_Y$. Since $\chi_W(0) = 1$ and $\chi_W(t)$ is convex and decreasing for real $t$, and $R_Y > 0$, there exists such a non-zero value of $R_W$. This implies that the Taylor series

$$\chi_W(t) = 1 + \sum_{k=1}^{\infty}\omega_k t^k \qquad (12.11)$$

converges around $t = 0$ for $|t| < R_W$. There exists a recursion to compute $\omega_k$ for any $k \ge 1$, as shown in Van Mieghem (2005). If $\chi_W(t)$ is not known in closed form, the interest of the Taylor series (12.11) lies in its fast convergence for small values of $|t| < 1$. The recursion for the Taylor coefficients $\omega_k$ enables the computation of $\chi_W(t)$ for $|t| < 1$ to any desired degree of accuracy. The functional equation $\chi_W(t) = \varphi_Y\!\left(\chi_W\!\left(\frac{t}{\mu}\right)\right)$ extends the $t$-range to the entire complex plane. For large values of $t$, and in particular for negative real $t$, $\chi_W(t)$ is best computed from $\chi_W\!\left(\frac{t}{\mu^{[\log_\mu|t|]+1}}\right)$ after $[\log_\mu|t|]+1$ functional iterations of (12.10). Indeed, since $\mu > 1$ such that $\frac{|t|}{\mu^{[\log_\mu|t|]+1}} < 1$, the Taylor series (12.11) provides an accurate start value $\chi_W\!\left(\frac{t}{\mu^{[\log_\mu|t|]+1}}\right)$ for this iterative scheme.
12.3 The Probability of Extinction of a Branching Process

In many applications, the probability that the process will eventually terminate, and which parameters influence this extinction probability, are of interest. For instance, a nuclear reaction will only lead to an explosion if critical starting conditions are obeyed. The branching process terminates if, for some generation $n > 0$, $X_n = 0$ and, of course, $X_m = 0$ for all $m > n$. Let us denote

$$q_k = \Pr[X_k = 0] = \varphi_{X_k}(0)$$

If we assume that $X_0 = 1$, the analysis simplifies because the more specific version (12.5) holds. Hence, only if the initial set consists of a single item $X_0 = 1$,

$$q_{k+1} = \varphi_{X_{k+1}}(0) = \varphi_Y\left(\varphi_{X_k}(0)\right) = \varphi_Y(q_k) \qquad (12.12)$$

and, with $q_0 = \varphi_{X_0}(0) = 0$, $q_1 = \varphi_Y(0) = \Pr[Y = 0] \geq 0$. Obviously, if there is never production, $\Pr[Y = 0] = 1$, extinction occurs immediately, while if there is always production, $\Pr[Y = 0] = 0$, extinction never occurs. By its definition (2.18), a probability generating function of a non-negative discrete random variable is strictly increasing along the positive real $z$-axis. When excluding the extreme cases, such that $0 < \Pr[Y = 0] < 1$, by the strict increase of $\varphi_Y(x)$ for $x = \operatorname{Re} z \geq 0$, we observe that

$$0 = q_0 < q_1 = \varphi_Y(0) < q_2 = \varphi_Y(q_1) < q_3 = \varphi_Y(q_2) < \cdots$$

The series $q_0, q_1, q_2, \ldots$ is a monotone increasing sequence bounded by 1 because $\varphi_Y(1) = 1$. Hence, the probability of extinction

$$\pi_0 = \lim_{k\to\infty} \Pr[X_k = 0] = \Pr[W = 0]$$

exists and $0 < \pi_0 \leq 1$. The existence of a limiting process and the fact that the probability generating function is analytic for $|z| < 1$ and, hence, continuous, which allows us to interchange $\lim_{k\to\infty} \varphi_Y(q_k) = \varphi_Y\left(\lim_{k\to\infty} q_k\right)$, yields the equation for the extinction probability $\pi_0$,

$$\pi_0 = \varphi_Y(\pi_0) \qquad (12.13)$$

It demonstrates that the extinction probability $\pi_0$ is a root of $\varphi_Y(x) - x$ in the interval $x \in [0,1]$. Since $\varphi_W(0) = \Pr[W = 0] = \pi_0$, this equation (12.13) follows more directly from (12.9).
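The monotone recursion (12.12) converging to $\pi_0$ is easy to evaluate numerically. The sketch below (function and variable names are illustrative) iterates $q_{k+1} = \varphi_Y(q_k)$ from $q_0 = 0$ for a Poisson production process, whose pgf $\varphi_Y(z) = e^{\mu(z-1)}$ is a standard fact rather than taken from this section.

```python
import math

def extinction_probability(pgf, tol=1e-12, max_iter=10_000):
    """Iterate q_{k+1} = phi_Y(q_k) from q_0 = 0, the recursion (12.12);
    the monotone limit is the extinction probability pi_0 of (12.13)."""
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

# Poisson production with mean mu: phi_Y(z) = exp(mu (z - 1))
mu = 2.0
pi0 = extinction_probability(lambda z: math.exp(mu * (z - 1.0)))
print(pi0)   # the root of pi_0 = exp(mu (pi_0 - 1)); approx 0.2032 for mu = 2
```

Because the iterates increase monotonically towards the smallest positive root, no special starting-point care is needed; the convergence is geometric with rate $\varphi'_Y(\pi_0)$.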
Notice, however, that in the functional equation (12.9) the function $z^{1/\mu}$ is not analytic at $z = 0$ if $\mu \notin \mathbb{N}$, which may cause that $f_W(x)$ is possibly not continuous at $x = 0$, although the limit $\lim_{z\to 0} \varphi_W(z) = \pi_0$ exists. On the other hand, since $\varphi_W(z) = \chi_W(-\log z)$, the extinction probability is found as

$$\lim_{t\to\infty} \chi_W(t) = \pi_0$$

and the convexity of $\chi_W(t)$ implies that, for any real value of $t$, $\chi_W(t) \geq \pi_0$.

An alternative, more probabilistic derivation of equation (12.13) is as follows. Applying the law of total probability (2.46) to the definition of the extinction probability,

$$\pi_0 = \Pr[X_n = 0 \text{ for some } n > 0] = \sum_{j=0}^{\infty} \Pr[X_n = 0 \text{ for some } n > 0 \mid X_1 = j]\, \Pr[X_1 = j]$$

Only if $X_0 = 1$, relation (12.5) indicates that $\varphi_{X_1}(z) = \varphi_Y(z)$, which implies that $\Pr[X_1 = j] = \Pr[Y = j]$. In addition, given that the first generation consists of $j$ items, the branching process will eventually terminate if and only if each of the $j$ sets of items generated by the first generation eventually dies out. Since each set evolves independently and since the probability that any set generated by a particular ancestor in the first generation becomes extinct is $\pi_0$, we arrive at

$$\pi_0 = \sum_{j=0}^{\infty} \pi_0^j \Pr[Y = j] = \varphi_Y(\pi_0)$$

The different viewpoints thus lead to a same result, summarized by:

Theorem 12.3.1 If $X_0 = 1$ and $0 < \Pr[Y = 0] < 1$, the extinction probability $\pi_0$ is (a) the smallest positive real root of $x = \varphi_Y(x)$ and (b) $\pi_0 = 1$ if and only if $E[Y] \leq 1$ and $\Pr[Y = 0] + \Pr[Y = 1] < 1$.

Proof: (a) Suppose that $x_o$ is the smallest positive real root obeying $\varphi_Y(x_o) = x_o > 0$. Then, $q_1 = \varphi_Y(0) < \varphi_Y(x_o) = x_o$. Assume (induction hypothesis) that $q_n < x_o$. The recursion (12.12) and the strict increase of $\varphi_Y(x)$ then show that $q_{n+1} = \varphi_Y(q_n) < \varphi_Y(x_o) = x_o$. Hence, the principle of induction demonstrates that $q_n < x_o$ for all (finite) $n$ and, hence, that $\pi_0 \leq x_o$.

(b) First, the condition $\Pr[Y = 0] + \Pr[Y = 1] <$
$1$ implies that $\Pr[Y > 1] > 0$ and that there exists at least one integer $j > 1$ such that $\Pr[Y = j] > 0$. In that case, for real $x > 0$ but smaller than the radius of convergence $R$, which is at least $R = 1$, the second derivative $\varphi''_Y(x) = \sum_{j=2}^{\infty} j(j-1)\Pr[Y = j]\, x^{j-2}$ is positive, which implies that $\varphi_Y(x)$ is strictly convex in $(0,1)$. Since $x = 1$ obeys $x = \varphi_Y(x)$ and $\varphi_Y(0) = \Pr[Y = 0] \in (0,1)$, the strictly convex function $y = \varphi_Y(x)$ can only intersect the line $y = x$ in some point $x \in (0,1)$ if $\varphi_Y(x)$ is below that line near their intersection at $x = 1$, or if $\varphi'_Y(1) = E[Y] > 1$. In the other case, if $E[Y] \leq 1$, the only intersection is at $x = 1$. $\Box$

The two possibilities are drawn in Fig. 12.2.

Fig. 12.2. The generating function $\varphi_Y(x)$ along the positive real axis $x$. The two possible cases are shown: curve $a$ corresponds to $E[Y] < 1$ and curve $b$ to $E[Y] > 1$. The fast convergence towards the zero $\pi_0$ is exemplified by the sequence $x_0 > x_1 = \varphi_Y(x_0) > x_2 = \varphi_Y(x_1) > x_3 = \varphi_Y(x_2)$.

A root equation such as (12.13) also appears in queueing models such as the M/G/1 (Section 14.3) and GI/D/1 (Section 14.4) and reflects the asymptotic behavior, as explained in Section 5.7. The extinction probability $\pi_0$ can be expressed explicitly as a Lagrange series, as demonstrated in Van Mieghem (1996). The branching process with infinitely many generations $k \to \infty$ can be viewed as an infinite directed tree where each node has a finite degree a.s. The fact that $\pi_0 < 1$ if $E[Y] > 1$ implies that, in infinite directed trees, there exists an infinitely long path starting from the root with probability $1 - \pi_0$.

Theorem 12.3.2 The limiting branching process with $X_0 = 1$ obeys for $k \to \infty$

$$\Pr[X_k = 0] \to \pi_0$$
$$\Pr[X_k = j] \to 0 \quad \text{for any } j > 0$$

Proof: First, if $E[Y] \leq 1$, then Theorem 12.3.1 states that $\pi_0 = 1$. For any probability generating function $\varphi(z)$, it holds that $|\varphi(z)| \leq 1$ for $|z| \leq 1$.
Hence, $\varphi_{X_k}(x) \leq 1$ for real $x \in [0,1]$. Moreover, $q_k = \varphi_{X_k}(0) \leq \varphi_{X_k}(x)$. In the limit $k \to \infty$, $q_k \to \pi_0 = 1$, which implies that for all $x \in [0,1]$ it holds that $\varphi_{X_k}(x) \to \pi_0 = 1$. The fact that a probability generating function, a Taylor series around $z = 0$, converges to a constant $\pi_0$ for $0 \leq x \leq 1$ implies that $\Pr[X_k = j] \to 0$ for any $j > 0$ and $\Pr[X_k = 0] \to \pi_0$.

The second case, $E[Y] > 1$, possesses an extinction probability $\pi_0 < 1$. For $x \in (\pi_0, 1)$, Fig. 12.2 shows that $\pi_0 < \varphi_Y(x) < x < 1$. By induction using (12.5), we find that $\pi_0 < \varphi_{X_k}(x) < \varphi_{X_{k-1}}(x) < \cdots < 1$, or $\lim_{k\to\infty} \varphi_{X_k}(x) = \pi_0$ for $x \in (\pi_0, 1)$. For $x \in [0, \pi_0)$, the same argument $q_k = \varphi_{X_k}(0) \leq \varphi_{X_k}(x) \leq \pi_0$ shows that $\lim_{k\to\infty} \varphi_{X_k}(x) = \pi_0$ for $x \in [0,1)$. This proves the theorem. $\Box$

Theorem 12.3.2 states that, regardless of the value of $E[Y]$, the probability that the $k$-th generation will consist of any finite positive number of items tends to zero if $k \to \infty$. Theorem 12.3.2 is equivalent to the statement that, after an infinite number $k \to \infty$ of evolutions or generations, $X_k \to \infty$ with probability $1 - \pi_0$. Theorem 12.3.2 also illustrates that a Markov chain with an infinite number of states behaves differently than a chain with a finite number of states. In particular, Theorem 12.3.2 shows that the infinite Markov chain $\{X_k\}_{k\geq 0}$ has a single absorbing state $X_k = 0$, while all other states $j$ are transient ($\lim_{n\to\infty} P_{ij}^n = 0$ for $1 \leq i, j < \infty$). The existence of the steady-state vector (not all components are zero) does not imply that the branching process with $X_0 = 1$ and $0 < \Pr[Y = 0] < 1$ and infinitely many states is an irreducible Markov chain.

12.4 Asymptotic behavior of W

The convexity of $\chi_W(t)$ implies that $\chi'_W(t) \leq 0$ for all real $t$ and that $|\chi'_W(t)|$ is decreasing in $t$. We know that $\chi'_W(0) = -1$. Since $\lim_{t\to\infty} \chi_W(t) = \pi_0$, it follows that $\lim_{t\to\infty} \chi'_W(t) = 0$. The following Lemma 12.4.1 is a little more precise.

Lemma 12.4.1 $\chi'_W(t) = o(t^{-1})$ for $t \to \infty$.
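The dichotomy of Theorem 12.3.2 — extinction with probability $\pi_0$, unbounded growth otherwise — can be observed by direct simulation. The sketch below (illustrative names; the population cap that declares survival is a practical shortcut, justified because the extinction probability starting from several hundred items is negligible) uses a geometric production process with $\mu = 2$, for which Section 12.5 gives $\pi_0 = 1/\mu = 0.5$.

```python
import random

def branch_once(n, p):
    """One generation: each of n items independently produces Y items,
    with Pr[Y = k] = q p^k (geometric production, q = 1 - p)."""
    total = 0
    for _ in range(n):
        while random.random() < p:   # successes before the first failure
            total += 1
    return total

random.seed(7)
mu = 2.0
p = mu / (mu + 1.0)            # E[Y] = p/(1 - p) = mu
runs, extinct = 4_000, 0
for _ in range(runs):
    x = 1                      # X_0 = 1
    for _ in range(30):
        x = branch_once(x, p)
        if x == 0:
            extinct += 1
            break
        if x > 500:            # survival is then essentially certain
            break
est = extinct / runs
print(est)                     # close to pi_0 = 1/mu = 0.5
```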
Proof: The derivative of the functional equation (12.10) is $\chi'_W(t) = \frac{1}{\mu}\,\varphi'_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right)\chi'_W\left(\frac{t}{\mu}\right)$. By iteration, we have

$$\mu^K \chi'_W\left(\mu^K t\right) = \chi'_W(t) \prod_{j=0}^{K-1} \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right)$$

Since $\chi_W(t) \in [\pi_0, 1]$ for real $t \geq 0$, then $\varphi'_Y(\pi_0) \leq \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) \leq \mu$ for any $j$. Theorem 12.3.1 states that if $\mu = \varphi'_Y(1) > 1$, then there are two zeros, $\pi_0$ and 1, of $f(z) = \varphi_Y(z) - z$ in $z \in [0,1]$. By Rolle's Theorem applied to the continuous function $f(z) = \varphi_Y(z) - z$, there exists an $\eta \in (\pi_0, 1)$ for which $f'(\eta) = 0$. Equivalently, $\varphi'_Y(\eta) = 1$ and $\eta > \pi_0$. Since $\varphi'_Y(z)$ is monotonously increasing in $z \in [0,1]$, we have that $\varphi'_Y(0) = \Pr[Y = 1] \leq \varphi'_Y(\pi_0) < 1$. Since $\chi_W(t)$ is continuous and monotone decreasing, there exists an integer $K_0$ such that $\varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) < 1$ for $j > K_0$ and any $t > 0$. Hence,

$$\lim_{K\to\infty} \prod_{j=0}^{K-1} \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) = \prod_{j=0}^{K_0-1} \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right)\prod_{j=K_0}^{\infty} \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) \to 0$$

and, for any finite $t > 0$, $\mu^K \chi'_W\left(\mu^K t\right) \to 0$ for $K \to \infty$, which implies the lemma. $\Box$

Lemma 12.4.1 is, for large $t$, equivalent to $|\chi'_W(t)| \leq C t^{-1-\tau}$ for some real $\tau > 0$ and where $C$ is a finite positive real number. Lemma 12.4.1 thus suggests that

$$\chi'_W(t) = -g(t)\, t^{-\tau-1} \qquad (12.14)$$

where $0 < g(t) \leq C$ on the real positive $t$-axis.

Lemma 12.4.2 If $\varphi'_Y(\pi_0) > 0$ and $\mu > 1$, then

$$F = \lim_{t\to\infty} g(t) \qquad (12.15)$$

exists, is finite and strictly positive.

Proof: We first use (a) the convexity of any pgf $\chi_W(t)$, implying that $\chi''_W(t) \geq 0$ for all $t$, and we then invoke (b) the functional equation (12.10) of $\chi_W(t)$.

(a) The function $g(t) = -\chi'_W(t)\, t^{\tau+1}$ is differentiable, thus continuous, and has, for real $t > 0$, only one extremum at $t = t^*$ obeying

$$t^* = -\frac{(\tau+1)\,\chi'_W(t^*)}{\chi''_W(t^*)} > 0$$

Since $\chi'_W(0) = -1$, implying that $g(t) = t^{\tau+1}(1 + o(t))$ as $t \downarrow 0$, or that $g(t)$ is initially monotone increasing in $t$, the extremum at $t = t^*$ is a maximum. The derivative of $g(t) = -\chi'_W(t)\, t^{\tau+1}$ is, with (12.14),

$$g'(t) = \frac{\tau+1}{t}\, g(t) - \chi''_W(t)\, t^{\tau+1}$$

such that, for finite $t^*$, $\max g(t) = \frac{(t^*)^{\tau+2}}{\tau+1}\,\chi''_W(t^*)$.
Since $\chi''_W(t) \geq 0$ for all $t$, we also obtain the inequality, for $t \geq 0$,

$$g'(t) \leq \frac{\tau+1}{t}\, g(t) \leq \frac{(\tau+1)\,C}{t}$$

from which $\lim_{t\to\infty} g'(t) \leq 0$. Hence, $g(t)$ is not increasing for $t \to \infty$.

(b) Substitution of (12.14) in the derivative of the functional equation (12.10) yields

$$g(t) = \mu^{\tau}\,\varphi'_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right) g\left(\frac{t}{\mu}\right) \qquad (12.16)$$

Since $\varphi'_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right) \geq \varphi'_Y(\pi_0) > 0$ (the restriction of this Lemma), there holds, with $A = \mu^{\tau}\varphi'_Y(\pi_0) > 0$, for all $t > 0$ that

$$g(t) \geq A\, g\left(\frac{t}{\mu}\right)$$

For $t < t^*$, $g(t)$ is shown in (a) to be monotone increasing, which requires that $A \geq 1$ for $\mu > 1$. But, since the inequality with $A \geq 1$ holds for all $t > 0$, we must have that $t^* \to \infty$. Hence, $g(t)$ is continuous and strictly increasing for all $t \geq 0$ with a maximum at infinity, which proves the existence of a unique limit $F \leq C$. If $F = 0$, the suggestion (12.14) is not correct, implying that $\chi'_W(t)$ decreases faster than any power of $t^{-1}$. $\Box$

The proof of Lemma 12.4.1 indicates that this case can occur if $\varphi'_Y(\pi_0) = 0$. In fact, $A = 1$. For, when passing to the limit $t \to \infty$ in (12.16) using Lemma 12.4.2, we obtain $\mu^{-\tau} = \varphi'_Y(\pi_0)$, which determines the exponent $\tau$ as

$$\tau = -\frac{\log \varphi'_Y(\pi_0)}{\log \mu} \qquad (12.17)$$

After integration of (12.14), we have that

$$\chi_W(t) = \pi_0 + \int_t^{\infty} g(u)\, u^{-\tau-1}\, du \qquad (12.18)$$

Approximating $g(u)$ by its limit $F$ for large $t$, we obtain the asymptotic form

$$\chi_W(t) \approx \pi_0 + \frac{F}{\tau}\, t^{-\tau}$$

Beside $\tau$ and the extinction probability $\pi_0$, the parameter $F$ appears as an additional characterizing quantity of a branching process. The behavior of the Laplace transform (2.37) for large $t$ reflects the behavior of the probability density function for small $x$.
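As a quick sanity check on (12.17), the exponent $\tau$ can be evaluated for the geometric production process of Section 12.5, where $\pi_0 = 1/\mu$ and $\varphi'_Y(\pi_0) = 1/\mu$, so that $\tau = 1$ for every $\mu > 1$. The small script below merely verifies this arithmetic (the variable names are mine):

```python
import math

# tau = -log(phi_Y'(pi_0)) / log(mu) of (12.17), evaluated for the
# geometric production process phi_Y(z) = q/(1 - p z) with pi_0 = 1/mu
taus = []
for mu in (1.5, 2.0, 4.0):
    p, q = mu / (mu + 1.0), 1.0 / (mu + 1.0)
    pi0 = 1.0 / mu
    dphi = p * q / (1.0 - p * pi0) ** 2    # phi_Y'(pi_0), equals 1/mu here
    taus.append(-math.log(dphi) / math.log(mu))
print(taus)    # tau = 1 for every mu > 1, as found in Section 12.5
```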
Hence, using $\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} \frac{e^{xt}}{t^{s}}\, dt = \frac{x^{s-1}}{\Gamma(s)}$ for $\operatorname{Re} s > 0$, the probability density function is, for small $x$,

$$f_W(x) \approx \pi_0\, \delta(x) + \frac{F\, x^{\tau-1}}{\Gamma(\tau+1)} \qquad (12.19)$$

The probability density function $f_W(x)$ is not continuous at $x = 0$ if $\pi_0 > 0$ and reflects the two different regimes: (a) $W = 0$, implying that the branching process becomes extinct, $X_k = 0$, from some generation $k$ on, and (b) $W > 0$, implying $X_k \approx W \mu^k$ for large $k$: the number of items per generation grows exponentially with prefactor $W$. If two sample paths of a same branching process are generated, $X_{k,1} \approx W_1 \mu^k$ and $X_{k,2} \approx W_2 \mu^k$ may be largely different for large $k$, because of the random nature of $W$: although the prefactors $W_1$ and $W_2$ both have the same probability density function $f_W(x)$, $W_1$ can differ substantially from $W_2$, as illustrated by the pdf $f_W(x)$ in Fig. 12.3.

Fig. 12.3. The probability density function of the limit random variable $W$ for both a geometric and a Poisson production process for a same set of values of the average $\mu = E[Y]$.

12.5 A geometric branching process

Consider a production generating function $\varphi_Y(z)$ of the fractional linear form $f(z) = \frac{az+b}{cz+d}$. Beside straightforward iteration of (12.5), a more elegant approach${}^3$ relies on the following property of $f(z)$. For $z \neq x$, the difference is

$$f(z) - f(x) = \frac{(ad - bc)(z - x)}{(d + cz)(d + cx)}$$

and, hence, for any two points $x_0$ and $x_1$,

$$\frac{f(z) - f(x_0)}{f(z) - f(x_1)} = \left(\frac{c x_1 + d}{c x_0 + d}\right)\left(\frac{z - x_0}{z - x_1}\right)$$

${}^3$ The linear fractional transformation $f(z)$ is an automorphism of the extended complex plane and basic in the geometric theory of a complex function, for which we refer to the book of Sansone and Gerretsen (1960, vol. 2).
Fixed points of an automorphism of the extended plane are solutions of $z = f(z)$, which is a quadratic equation $cz^2 + (d - a)z - b = 0$, and which shows that there are at most two different fixed points.

Let us now confine ourselves to the two fixed points, $x_0$ and $x_1$, of $f(z)$ that are solutions of $f(z) = z$, and let $\delta = \frac{c x_1 + d}{c x_0 + d}$; then

$$\frac{f(z) - x_0}{f(z) - x_1} = \delta\, \frac{z - x_0}{z - x_1}$$

Now, substitute $z \to f(z)$; then

$$\frac{f(f(z)) - x_0}{f(f(z)) - x_1} = \delta\, \frac{f(z) - x_0}{f(z) - x_1} = \delta^2\, \frac{z - x_0}{z - x_1}$$

Let us denote the iterates of $f(z)$ by $w_n = f_n(z) = f(f_{n-1}(z))$. By iterating, we find that the iterates obey

$$\frac{w_n - x_0}{w_n - x_1} = \delta^n\, \frac{z - x_0}{z - x_1}$$

or

$$w_n = \frac{x_0 (z - x_1) - x_1 \delta^n (z - x_0)}{(z - x_1) - \delta^n (z - x_0)} \qquad (12.20)$$

Since the probability generating function (3.6) of a geometric random variable $Y$ is of the form $f(z) = \frac{az+b}{cz+d}$, a geometric branching process is regarded as a basic reference model in the study of branching processes. The production process in each generation obeys $\Pr[Y = k] = q p^k$ for $k \geq 0$, leading to $\varphi_Y(z) = \frac{q}{1 - pz}$, which is slightly different from (3.6). We know that the equation $\varphi_Y(x) = x$ can have two real zeros in $[0,1]$: one at $x_1 = 1$, since $\varphi_Y(z)$ is a probability generating function, and another at

$$x_0 = \frac{q}{p} = \frac{1}{E[Y]} = \frac{1}{\mu} = \pi_0$$

such that

$$\delta = \frac{-p x_1 + 1}{-p x_0 + 1} = \frac{q}{p} = \frac{1}{\mu}$$

The functional equation (12.5) associates $w_n = \varphi_{X_n}(z)$ and, after substitution in (12.20), we obtain

$$\varphi_{X_n}(z) = \frac{\left(\mu^{n-1} - 1\right)\mu z - \mu^n + 1}{\left(\mu^n - 1\right)\mu z - \mu^{n+1} + 1} \qquad (12.21)$$

In the case that $E[Y] = \mu \to 1$, or $p = q$, using the rule of de l'Hospital gives

$$\varphi_{X_n}(z) = \frac{n - (n-1)z}{n + 1 - nz}$$

From (12.21), the probabilities of extinction at the $k$-th generation are

$$\Pr[X_k = 0] = \varphi_{X_k}(0) = \frac{\mu^k - 1}{\mu^{k+1} - 1}$$

If $E[Y] = \mu > 1$, then $\lim_{k\to\infty} \Pr[X_k = 0] = \frac{1}{\mu} = x_0 = \pi_0$ (Theorem 12.3.2), whereas for $E[Y] = \mu \leq 1$, we find that $\lim_{k\to\infty} \Pr[X_k = 0] = 1$. If $T_0$ is the hitting time, defined in Section 9.2.2 as the smallest discrete time $k$ such that $X_k = 0$, then $\Pr[T_0 \leq k] = \Pr[X_k = 0]$.
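The closed form (12.21) can be checked against direct iteration of $\varphi_{X_n}(z) = \varphi_Y(\varphi_{X_{n-1}}(z))$, which follows from (12.5). The sketch below (names illustrative) does this for $\mu = 2.5$ and also evaluates the extinction probabilities $\Pr[X_k = 0] = \frac{\mu^k - 1}{\mu^{k+1} - 1}$, which approach $\pi_0 = 1/\mu$:

```python
def phi_Xn(z, mu, n):
    """Closed form (12.21) for the geometric branching process."""
    num = (mu ** (n - 1) - 1.0) * mu * z - mu ** n + 1.0
    den = (mu ** n - 1.0) * mu * z - mu ** (n + 1) + 1.0
    return num / den

mu = 2.5
p, q = mu / (mu + 1.0), 1.0 / (mu + 1.0)
phi_Y = lambda z: q / (1.0 - p * z)        # pgf of the production process

# iterate phi_{X_n}(z) = phi_Y(phi_{X_{n-1}}(z)) with phi_{X_0}(z) = z
z, w = 0.3, 0.3
for n in range(1, 8):
    w = phi_Y(w)
    assert abs(w - phi_Xn(z, mu, n)) < 1e-10

for k in (1, 5, 20):
    print(k, phi_Xn(0.0, mu, k))           # tends to pi_0 = 1/mu = 0.4
```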
The probability generating function of the scaled random variables $W_k = \frac{X_k}{\mu^k}$ follows from (12.21) and (12.8) as

$$\varphi_{W_k}(z) = \frac{\left(\mu^{k-1} - 1\right)\mu\, z^{\mu^{-k}} - \mu^k + 1}{\left(\mu^k - 1\right)\mu\, z^{\mu^{-k}} - \mu^{k+1} + 1}$$

from which $\varphi_{W;\mathrm{Geo}}(z) = \lim_{k\to\infty} \varphi_{W_k}(z)$ follows as

$$\varphi_{W;\mathrm{Geo}}(z) = \frac{\mu - 1 - \log z}{\mu - 1 - \mu \log z} \qquad (12.22)$$

and

$$\chi_{W;\mathrm{Geo}}(t) = \frac{t + \mu - 1}{\mu t + \mu - 1} = 1 + \sum_{k=1}^{\infty} (-1)^k \frac{\mu^{k-1}}{(\mu-1)^{k-1}}\, t^k \qquad (12.23)$$

Since $\chi_W(t) = E\left[e^{-Wt}\right]$ and using (2.40), all moments are found as $E\left[W^k\right] = \frac{k!\, \mu^{k-1}}{(\mu-1)^{k-1}}$. Furthermore, with $\pi_0 = \varphi_W(0) = \frac{1}{\mu}$ and from (2.38), the probability density function follows as

$$f_{W;\mathrm{Geo}}(x) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} \frac{t + \mu - 1}{\mu t + \mu - 1}\, e^{xt}\, dt \qquad (c > 0)$$

By closing the contour for $x > 0$ over the negative $\operatorname{Re}(t)$-plane, we encounter a simple pole at $t = -1 + \frac{1}{\mu} = -(1 - \pi_0) < 0$ (since $\mu > 1$), resulting in

$$f_{W;\mathrm{Geo}}(x) = \begin{cases} \left(1 - \frac{1}{\mu}\right)^2 \exp\left(-x\left(1 - \frac{1}{\mu}\right)\right) & x > 0 \\ \frac{1}{\mu}\,\delta(x) & x = 0 \\ 0 & x < 0 \end{cases} \qquad (12.24)$$

From (12.7), the variance is $\operatorname{Var}[W_{\mathrm{Geo}}] = \frac{\mu+1}{\mu-1}$. The limit random variable $W_{\mathrm{Geo}}$ of a geometric branching process is exponentially distributed with an atom at $x = 0$ equal to the extinction probability $\pi_0 = \frac{1}{\mu}$. From (12.17), the exponent $\tau_{\mathrm{Geo}} = 1$ for any value of $\mu > 1$. Comparing (12.24) and the general relation (12.19) for small $x$ indicates that the parameter $F = \left(1 - \frac{1}{\mu}\right)^2$ for a geometric production process.

The limit random variable $W$ for production processes $Y$ of which all moments exist can be computed via Taylor series expansions. In Van Mieghem (2005), series for both $\chi_{W;\mathrm{Po}}(t)$ and $f_{W;\mathrm{Po}}(x)$ of a Poisson branching process are presented. Fig. 12.3 illustrates that the probability density function $f_{W;\mathrm{Po}}(x)$ of a Poisson branching process is definitely distinct from that of a geometric branching process. Since $E[W] = 1$, the variance $\operatorname{Var}[W_{\mathrm{Po}}] = \frac{1}{\mu-1}$ of a Poisson limit random variable $W_{\mathrm{Po}}$ implies that $f_{W;\mathrm{Po}}(x)$ is centered around $x = 1$ more tightly as $\mu$ increases.

13 General queueing theory

Queueing theory describes basic phenomena, such as the waiting time, the throughput, the losses and the number of queueing items, in queueing systems.
Following Kleinrock (1975), any system in which arrivals place demands upon a finite-capacity resource can be broadly termed a queueing system. Queueing theory is a relatively new branch of applied mathematics that is generally considered to have been initiated by A. K. Erlang in 1918 with his paper on the design of automatic telephone exchanges, in which the famous Erlang blocking probability, the Erlang B-formula (14.17), was derived (Brockmeyer et al., 1948, p. 139). It was only after the Second World War, however, that queueing theory was boosted, mainly by the introduction of computers and the digitalization of the telecommunications infrastructure. For engineers, the two volumes by Kleinrock (1975, 1976) are perhaps the most well known, while in applied mathematics, apart from the penetrating influence of Feller (1970, 1971), the Single Server Queue of Cohen (1969) is regarded as a landmark. Since Cohen's book, which incorporates most of the important work before 1969, a wealth of books and excellent papers have appeared, an evolution that is still continuing today.

13.1 A queueing system

Examples of queueing abound in daily life: queueing situations at a ticket window in the railway station or post office, at the cash points in the supermarket, the waiting room at the airport, train station or hospital, etc. In telecommunications, the packets arriving at the input port of a router or switch are buffered in the output queue before transmission to the next hop towards the destination. In general, a queueing system consists of (a) arriving items (packets or customers), (b) a buffer or waiting room, (c) a service center and (d) departures from the system.

Fig. 13.1. The main processes in a general queueing system: the arrival process, the queueing process, the service process and the departure process.

The main processes, as illustrated in Fig. 13.1, are stochastic in nature.
Initially in queueing theory, the main stochastic processes were described in continuous time, while with the introduction of the Asynchronous Transfer Mode (ATM) in the late eighties, many queueing problems were more effectively treated in discrete time, where the basic time unit or time slot was the minimum service time of one ATM cell. In the literature, there is unfortunately no widely adopted standard notation for the main random variables, which often troubles the transparency. Let us start by defining the main random variables in continuous time.

13.1.1 The arrival process

The arrival process is characterized by the arrival time $t_n$ of the $n$-th packet (customer) and the interarrival time $\tau_n = t_n - t_{n-1}$ between the $n$-th and $(n-1)$-th packet. If all interarrival times are i.i.d. random variables with distribution $F_A(t)$, then

$$\Pr[\tau_n \leq t] = F_A(t)$$

As illustrated in Fig. 8.1, we can associate a counting process $\{N(t),\, t \geq 0\}$ to the arrival process $\{t_n,\, n \geq 0\}$ by the equivalence $\{N(t) \geq n\} \Longleftrightarrow \{t_n \leq t\}$. In other words, if all interarrival times are i.i.d., the number of arriving packets (customers) is a general renewal process with interarrival time distribution specified by $F_A(t)$. We mention explicitly the condition of independence, which was initially considered a natural assumption. In recent measurements, however, arrivals of IP packets are shown not to obey this simple condition of independence, which has led to the use of complicated self-similar and long-range dependent arrival processes. In the sequel, we will use the following notation: $N_A(t)$ is the number of arrivals at time $t$, while $A(t) = \int_0^t N_A(u)\, du$ is the total number of arrivals in the interval $[0, t]$.

13.1.2 The service process

The service process is specified in a similar way by the service time $x_n$ of the $n$-th packet (customer). If the random variables $x_n$ are i.i.d. with distribution $F_x(t)$, then

$$\Pr[x_n \leq t] = F_x(t) \qquad (13.1)$$

The service process needs additional specifications.
First of all, in a single-server queueing system, only one packet (customer) is served at a time. If there is more than one server, more packets can evidently be served simultaneously. Next, we must detail the service discipline or scheduling rule, which describes the way a packet is treated. There is a large variety of service disciplines. If all packets are of equal priority, the simplest rule is first-in-first-out (FIFO), which serves the packets in the same order in which they arrive. Other types, such as last-in-first-out or a random order, are possible, though in telecommunications FIFO occurs most often. If we have packets of different multimedia flows, all with different quality of service requirements, not all packets have equal priority. For instance, a delay-sensitive packet (of e.g. a voice call) must be served as soon as possible, preferably before non-delay-sensitive packets (of e.g. a file transfer). In these cases, packets are extracted from the queue by a certain scheduling rule. The simplest case is a two-priority system with a head-of-the-line scheduling rule: high-priority packets are always served before low-priority packets. In the sequel, we confine the presentation to a single-server system with one type of packet and a FIFO discipline. Hence, we omit a discussion of scheduling rules.

A next assumption is that of work conservation: if there is a packet waiting for service, the server will always serve the packet. Thus, the server is only idle if there are no packets waiting in the buffer, and it immediately starts service when the first packet is placed in the queue or arrives. In a non-work-conserving system, the server may stay idle even if there are customers waiting (e.g. a situation where patients have to wait during a coffee break in a hospital). Finally, we assume that the arrival process is independent of the service process. Situations where arriving packets of some type (e.g.
control) change the way the remaining packets in the buffer are served, or a service discipline that serves at a rate proportional to the number of waiting packets, are not treated. The service in a router consists in fetching the packet from the buffer, inspecting the header to determine the correct output port and placing the packet on the output link for transmission. In this chapter, unless the contrary is explicitly mentioned, we consider a single-server queueing system under a work-conserving FIFO service discipline, in which the arrival and service process are independent.

13.1.3 The queueing process

From Fig. 13.1, we observe at least two aspects regarding the queue or buffer: (a) the number of different queues and (b) the number of positions in the queue. In general, a queueing system may have several queues or even a shared queue for different servers. For example, in a router, there is one physical fast memory or buffer in which arriving packets are placed. Depending on the output interfaces, each link driver per output port is a server that extracts the packets destined for its link from the common buffer and transmits them on this link. For simplicity, we consider here only one queue with $K$ positions. Often, queueing analyses are greatly simplified in the infinite buffer case $K \to \infty$. If the buffer is infinitely long, there is zero loss, as opposed to the finite buffer case, in which losses can occur if the queue is full and packets arrive. So far, the description of the queueing system is complete: we have specified the arrival process, the service process and the physical size of the waiting room or queue.
We now turn our attention to desirable quantities that can be deduced from the model specification of the queueing system, such as (a) the waiting or queueing time $w_n$ of the $n$-th packet, (b) the system time $T_n = w_n + x_n$ of the $n$-th packet, (c) the unfinished work (also called the virtual waiting time or workload) $v(t)$ at time $t$, (d) the number of packets in the queue $N_Q(t)$ or in the system $N_S(t)$ at time $t$ and (e) the departure time $r_n$ of the $n$-th packet.

The waiting or queueing time $w_n$ of the $n$-th packet is only zero if the queue is empty at arrival time $t_n$. The unfinished work $v(t)$ at time $t$ is the time needed to empty the queueing system, or to serve all remaining packets in the system (queue plus server) at time $t$. Hence, the unfinished work at time $t$ is equal to the sum of the service times of the $N_Q(t)$ buffered packets at time $t$ plus the remaining service time of the packet under service at time $t$. Precisely at an arrival epoch $t = t_n$, as illustrated in Fig. 13.2, we observe that $v(t_n) = T_n = w_n + x_n$. In addition, $v'(t) = -1$ for all $t \neq t_n$, or $v(t) = \max[T_n - t + t_n,\, 0]$ for $t \geq t_n$. The departure times $r_n$ satisfy $r_n = t_n + T_n$. The time during which the server is busy is called a busy period and, likewise, the interval of non-activity is called an idle period.

Fig. 13.2. The unfinished work $v(t)$ and the number of packets in the system $N_S(t)$ as a function of time. At any new arrival at $t_n$ holds $v(t_n) = w_n + x_n$. The unfinished work $v(t)$ decreases with slope $-1$ between two arrivals. The waiting times $w_n$ and departure times $r_n$ are also shown. Notice that $w_1 = w_5 = 0$.

13.1.4 The Kendall notation for queueing systems

Kendall introduced a notation that is commonly used to describe or classify the type of a queueing system.
The general syntax is $A/B/n/K/m$, where $A$ specifies the interarrival process, $B$ the service process, $n$ the number of servers, $K$ the number of positions in the queue and $m$ restricts the number of allowed arrivals in the queueing system. Examples for both the interarrival distribution $A$ and the service distribution $B$ are M (memoryless or Markovian) for the exponential distribution, G for a general distribution${}^1$ and D for a deterministic distribution. When other letters are used besides these three common assignments, the meaning will be defined. For example, M/G/1 stands for a queueing system with exponentially distributed interarrival times, a general service distribution and 1 server. If one of the two last identifiers, $K$ and $m$, is not written, they should be interpreted as infinity. Hence, M/G/1 has an infinitely long queue and no restriction on the number of allowed arrivals.

${}^1$ Often G is written where GI, a general independent process, is meant. We interpret G as a general interarrival process, which can be correlated over time.

13.1.5 The traffic intensity

An important parameter in any queueing system is the traffic intensity $\rho$, also called the load or the utilization, defined as the ratio of the mean service time $E[x] = \frac{1}{\mu}$ over the mean interarrival time $E[\tau] = \frac{1}{\lambda}$,

$$\rho = \frac{E[x]}{E[\tau]} = \frac{\lambda}{\mu} \qquad (13.2)$$

where $\lambda$ and $\mu$ are the mean arrival and service rate, respectively. Clearly, if $\rho > 1$, or $E[x] > E[\tau]$, which means that the mean service time is longer than the mean interarrival time, then the queue will grow indefinitely long for large $t$, because packets are arriving faster on average than they can be served. In this case ($\rho > 1$), the queueing system is unstable and will never reach a steady state. The case where $\rho = 1$ is critical. In practice, therefore, mostly situations where $\rho < 1$ are of interest. If $\rho < 1$, a steady state can be reached.
These considerations are a direct consequence of the law of conservation of packets in the system, but can be proved rigorously by ergodic theory or Markov steady-state theory, which determine when the process is positive recurrent.

13.2 The waiting process: Lindley's approach

From the definition of the waiting time and from Fig. 13.2, a relation between $w_{n+1}$ and $w_n$ is found. Suppose the waiting time for the first packet is $w_1 = w$, which is the initialization. If $r_n \leq t_{n+1}$, which means that the $n$-th packet leaves the queueing system before the $(n+1)$-th packet arrives, the system is idle and $w_{n+1} = 0$. In all other situations, $r_n > t_{n+1}$, the $n$-th packet is still in the queueing system when the next $(n+1)$-th packet arrives, and $w_{n+1} = t_n + w_n + x_n - t_{n+1}$. Indeed, the waiting time of the $(n+1)$-th packet equals the system time $T_n = w_n + x_n$ of the $n$-th packet, which started at $t_n$, minus its own arrival time $t_{n+1}$. During the interval $[t_n,\, t_{n+1}]$, the queueing system has processed an amount of the unfinished work equal to $t_{n+1} - t_n = \tau_{n+1}$ time units. Hence, we arrive at the general recursion for the waiting time,

$$w_{n+1} = \max\left(w_n + x_n - \tau_{n+1},\, 0\right)$$

Let $\zeta_n = x_n - \tau_{n+1}$; then

$$w_{n+1} = \max[0,\, w_n + \zeta_n] = \max[0,\, \max[w_{n-1} + \zeta_{n-1},\, 0] + \zeta_n] = \max[0,\, \zeta_n,\, w_{n-1} + \zeta_{n-1} + \zeta_n] \qquad (13.3)$$

and, by iteration,

$$w_{n+1} = \max\left[0,\, \zeta_n,\, \zeta_{n-1} + \zeta_n,\, \zeta_{n-2} + \zeta_{n-1} + \zeta_n,\, \ldots,\, \sum_{k=1}^{n} \zeta_k + w_1\right] \qquad (13.4)$$

A number of observations are in order. First, if both the interarrival times $\tau_{n+1}$ and the service times $x_n$ are i.i.d. random variables and mutually independent, then the differences $\zeta_n$ are i.i.d. random variables. In addition, $w_n$ and $\zeta_n$ are also independent, because (13.4) shows that $w_n$ only depends on $\zeta_k$ with indices $k < n$.
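The recursion (13.3) is also the basis of discrete-event simulation of a single-server queue. The sketch below (parameter choices are illustrative) applies $w_{n+1} = \max(w_n + x_n - \tau_{n+1},\, 0)$ to exponential interarrival and service times, i.e. an M/M/1 queue; for that special case the classical mean waiting time $\frac{\lambda}{\mu(\mu - \lambda)}$ provides a check.

```python
import random

def simulate_waits(lam, mu, n, seed=42):
    """Lindley's recursion w_{n+1} = max(w_n + x_n - tau_{n+1}, 0) for an
    M/M/1 queue: exponential interarrivals (rate lam), services (rate mu)."""
    rng = random.Random(seed)
    w, waits = 0.0, []
    for _ in range(n):
        x = rng.expovariate(mu)       # service time x_n
        tau = rng.expovariate(lam)    # interarrival time tau_{n+1}
        w = max(w + x - tau, 0.0)
        waits.append(w)
    return waits

waits = simulate_waits(lam=0.5, mu=1.0, n=200_000)   # rho = 0.5 < 1: stable
mean_w = sum(waits) / len(waits)
print(mean_w)   # near lam/(mu (mu - lam)) = 1.0 for these rates
```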
Then, the waiting time process $\{w_n\}_{n\geq 1}$ is a discrete-time Markov process with a continuous state space (the waiting times $w_n$ are positive real numbers), because the general relation (13.3) reveals that, since the random variable $\zeta_n$ is independent of $w_n$, the waiting time of the $(n+1)$-th packet only depends on the waiting time of the previous, $n$-th packet. This is the Markov property. Since the state space is a continuum, it is not a Markov chain, merely a Markov process.

Second, if there exists a packet $m$ for which $w_m = 0$ (e.g. packet $m = 5$ in Fig. 13.2), which means that the $m$-th packet finds the system empty, then all packets after the $m$-th packet are isolated from the effects of those before the $m$-th. Mathematically, this separation between two busy periods directly follows from (13.3), because $w_{m+1} = \max[0,\, \zeta_m]$, leading via iteration for $n \geq m$ to

$$w_{n+1} = \max\left[0,\, \zeta_n,\, \zeta_{n-1} + \zeta_n,\, \ldots,\, \sum_{k=m+1}^{n} \zeta_k,\, \sum_{k=m}^{n} \zeta_k\right]$$

In other words, this relation is similar to (13.4), as if the system were started from $k = m$ with $w_m = 0$ instead of from $k = 1$ with $w_1 = w$. Any busy period can be regarded as a renewal of the waiting process, independent of the previous busy periods.

Third, again invoking the assumption that the $\zeta_n$ are i.i.d. random variables, the order in the sequence $\{\zeta_n\}_{n\geq 1}$ is of no importance in (13.4), and we may relabel the random variables in (13.4) as $\zeta_k \to \zeta_{n-k+1}$ to obtain a new random variable

$$\omega_{n+1} = \max\left[0,\, \zeta_1,\, \zeta_1 + \zeta_2,\, \ldots,\, \sum_{k=1}^{n-1} \zeta_k,\, \sum_{k=1}^{n} \zeta_k + w_1\right]$$

which is identically distributed as $w_{n+1}$. The interest of this observation is that, provided $w_1 = 0$ and, hence, $\omega_{n+1} = \max_{0 \leq j \leq n} \sum_{k=0}^{j} \zeta_k$ where $\zeta_0 = 0$, the sequence $\{\omega_n\}_{n\geq 1}$ can only increase with $n$, because the maximum cannot decrease${}^2$ if an additional term $\sum_{k=0}^{n+1} \zeta_k$ is added. Thus, if $w_1 = 0$, the event $\{\omega_{n+1} < x\}$ is always contained in $\{\omega_n < x\}$.
{} n=0 which means that the random variable $q with same distribution as the waiting time zq converges to a limit random variable that is the supremum P of the terms mn=0 n in the series. From this relation, it follows that the steady-state distribution Z ({) of the waiting time is " # m X n ? { Z ({) = lim Pr [zq ? {] = lim Pr [$q ? {] = Pr sup q<" q<" mD0 n=0 if the latter probability exists, i.e. not zero for all {. Lindley has proved that, if ? 1, the latter corresponds to a proper probability distribution. In other words, the steady-state distribution of the waiting time in a GI/G/1 system3 exists. Alternatively, the Markov process {zq }qD1 is ergodic if ? 1. Lindley’s proof is as follows. Due to the assumption that q are i.i.d. random variables, the 1 Sq Strong Law of Large Numbers (6.3) is applicable: Pr limq<" q n=0 n = H [] = 1 where H [] = H [{] 3 H [ ] ? 0 (the mean service time is smaller than the mean interarrival time) if ? 1 while H [] A 0 if A 1. In case A 1, there exists a number A 0 and A 1 such Sq Sq 1 that, for all q A , holds n=0 n D H [] q with probability 1. For large , n=0 n can be k l Sm made larger than any fixed { such that Pr supmD0 n=0 n ? { = 0. In case ? 1, we have for Sq su!ciently large q that n=0 n ? 0. Thus, for any { A 0 and A 0, there exists a number (independent of {) such that, for all q A , % q & % q & [ [ Pr n ? { D Pr n ? 0 A 1 3 n=0 n=0 while, for q ? , we can always find a number A 0 such that for all { A , & % q [ n ? { A 1 3 Pr n=0 Sm Since supmD0 n=0 n is attained for m ? or m A and because both regimes can be bounded by the same lower bound, 6 5 & %qA & %q? m [ [ [ n ? {8 A Pr n ? { 1m? + Pr n ? { 1mA Pr 7sup mD0 n=0 n=0 n=0 A13 k l k l S S Clearly, lim{<" Pr supmD0 mn=0 n ? { = 1 and Pr [zq ? 0] = 0, thus Pr supmD0 mn=0 n ? { 2 This observation cannot be made from (13.4) because q , which aects all but the first term in the maximum, can be negative. 
is non-decreasing in $x$ and a proper probability distribution. We omit the considerations for the case $\rho = 1$. ∎

³ Notice that the analysis crucially relies on the independence of the interarrival and the service process.

We now concentrate on the computation of the steady-state distribution of the waiting time (in the queue) in the case that the load $\rho < 1$, and under the confining assumption that both the interarrival times $\tau_{n+1}$ and the service times $x_n$ are i.i.d. random variables. We find from (13.3) that
$$ \Pr[w_{n+1} < x] = \Pr[w_n + r_n < x] \qquad \text{if } x \ge 0 $$
$$ \Pr[w_{n+1} < x] = 0 \qquad\qquad\qquad\ \text{if } x < 0 $$
With the law of total probability (2.46) and since $r_n$ can be negative, the right-hand side is
$$ \Pr[w_n + r_n < x] = \int_{-\infty}^{\infty} \Pr[w_n < x - s \mid r_n = s]\; \frac{d}{ds}\Pr[r_n < s]\; ds $$
Using the independence of $w_n$ and $r_n$, and the fact that $w_n \ge 0$, we obtain for $x \ge 0$,
$$ \Pr[w_n + r_n < x] = \int_{-\infty}^{x} \Pr[w_n < x - s]\; d\Pr[r_n < s] $$
The distribution $\Pr[r_n < s] = \Pr[x_n - \tau_{n+1} < s] = C_n(s)$ can be computed (see Problem (v) in Chapter 3) provided the interarrival and service processes are known. Proceeding to the steady-state by letting $n \to \infty$ leads to Lindley's integral equation in $W(x) = \lim_{n\to\infty}\Pr[w_n < x]$, with $C(x) = \lim_{n\to\infty} C_n(x)$,
$$ W(x) = \int_{-\infty}^{x} W(x-s)\, dC(s) \qquad \text{if } x \ge 0 \qquad (13.5) $$
$$ W(x) = 0 \qquad\qquad\qquad\qquad\quad\ \text{if } x < 0 $$
The integral equation (13.5) is of the Wiener–Hopf type and is treated in general by Titchmarsh (1948, Section 11.17) and specifically by Kleinrock (1975, Section 8.2) and Cohen (1969, p. 337). Apart from Lindley's approach, Pollaczek has used variants of the complex integral expression
$$ \max(x, 0) = \frac{x\, e^{-ax}}{2\pi i}\int_{c-i\infty}^{c+i\infty} \frac{e^{xz}}{z-a}\,\frac{dz}{z} \qquad (c > \operatorname{Re}(a) > 0) $$
to treat the complicating non-linear function $\max(x,0)$ in (13.3). Several other approaches (Kleinrock, 1975, Chapter 8) have been proposed to solve (13.3). We will only discuss the approach due to Beneš, because his approach does not make the confining assumption that both the interarrival times $\tau_{n+1}$ and the service times $x_n$ are i.i.d. random variables.
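The Lindley recursion and its iterated maximum form (13.4) are easy to check numerically. The following Python sketch (our own construction, not from the book; all function names are ours) computes the waiting times of a GI/G/1 sample path both ways and confirms that they coincide:

```python
import random

def lindley_waits(interarrivals, services, w1=0.0):
    """Waiting times via the recursion w_{n+1} = max(0, w_n + x_n - tau_{n+1})."""
    w = [w1]
    for x, tau in zip(services, interarrivals):
        w.append(max(0.0, w[-1] + x - tau))
    return w

def waits_from_partial_sums(interarrivals, services, w1=0.0):
    """Waiting times via the iterated form (13.4): the maximum of 0, the suffix
    sums of r_k = x_k - tau_{k+1}, and the full sum of the r_k plus w_1."""
    r = [x - tau for x, tau in zip(services, interarrivals)]
    w = [w1]
    for n in range(1, len(r) + 1):
        suffix = [sum(r[j:n]) for j in range(n)]   # suffix[j] = r_{j+1} + ... + r_n
        w.append(max([0.0] + suffix[1:] + [suffix[0] + w1]))
    return w

# A random sample path with load rho = 0.8 (arrival rate 1, service rate 1.25)
random.seed(7)
taus = [random.expovariate(1.0) for _ in range(50)]
xs = [random.expovariate(1.25) for _ in range(50)]
w_rec = lindley_waits(taus, xs)
w_max = waits_from_partial_sums(taus, xs)
```

Both routes produce the same waiting-time sequence, illustrating that the recursion (13.3) and the unrolled maximum (13.4) are two views of the same quantity.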
As mentioned before, in Internet traffic, which has been shown to be long-range dependent (i.e. correlated over many time units, mainly due to TCP's control loop), the interarrival times can be far from independent.

Fig. 13.3. The amount of work $\omega(t)$ arriving to the queueing system versus time $t$. At $t = u$, we observe that $\xi(u) = \omega(u) - u + v(0^-) > 0$. The largest value of $\xi(u) - \xi(t)$ is found for $t = t_1$, the only point where $\xi$ is negative in $[0,u)$. Graphically, we shift the $45°$ line so as to intersect the point $(t_1, \omega(t_1))$ to determine $v(u)$. At $t = \tau$, $\xi(\tau) < 0$, and the largest negative value of $\xi$ in $[0,\tau)$ is attained at $t = t_8$. Three of the five idle periods have also been shown.

13.3 The Beneš approach to the unfinished work

Instead of observing the queueing system at a single time $t$, the Beneš approach considers the behavior over a time interval $[0,t)$. Let $\omega(t)$, $\iota(t)$ and $b(t)$ denote the amount of work arriving to the queueing system in the interval $[0,t)$, the total idle time of the server in $[0,t)$ and the total busy time of the server in $[0,t)$, respectively. The amount of work arriving to the system is expressed in units of time and must be regarded as the time needed to process this work, similarly to the definition of the unfinished work. If the work to process arrives at discrete times, then $\omega(t)$ increases in jumps, as illustrated in Fig. 13.3,
$$ \omega(t) = \sum_{j=0}^{A(t)} x_j $$
where $A(t)$ is the number of arrivals in $[0,t)$. In general, however, the work may arrive continuously over time, possibly with jumps at certain times. The purpose is to determine the unfinished work or virtual waiting time $v(t)$ at the time instant $t$, and not over a time interval $[0,t)$ as the previously defined quantities $\omega(t)$, $\iota(t)$, $b(t)$ and $A(t)$.
Clearly, for $t > 0$, the unfinished work at time $t$ consists of the total amount of work brought in by arrivals during $[0,t)$, plus the amount of work present just before $t = 0$, minus the total time the server has been active,
$$ v(t) = v(0^-) + \omega(t) - b(t) \qquad (13.6) $$
From the definitions above,
$$ \iota(t) + b(t) = t \qquad (13.7) $$
Moreover, $\omega(t)$, $\iota(t)$ and $b(t)$ are non-decreasing and right-continuous (jumps may occur) functions of time $t$. Since $\iota(t)$ and $b(t)$ are complementary, it is convenient to eliminate $b(t)$ from (13.6) and (13.7) and to concentrate further on the total idle time $\iota(t)$, given as
$$ \iota(t) = v(t) + t - v(0^-) - \omega(t) \qquad (13.8) $$
If $v(u) > 0$ at every time $u \in [0,t)$, then $\iota(t) = 0$. On the other hand, if $v(u) = 0$ at some time $u \in [0,t)$, then it follows from (13.8) that
$$ \iota(u) = u - v(0^-) - \omega(u) \qquad (13.9) $$
Since $\iota(t)$ is non-decreasing in $t$, the total idle time in the interval $[0,t)$, built up at the moments $u$ when the buffer is empty ($v(u) = 0$), is the largest value of (13.9) that can be reached in $[0,t)$,
$$ \iota(t) = \sup_{0\le u\le t}\big(u - v(0^-) - \omega(u)\big) $$
where the supremum is needed because $\omega(t)$ can increase discontinuously (in jumps). Combining the two regimes, we obtain in general that
$$ \iota(t) = \max\Big[0,\; \sup_{0\le u\le t}\big(u - v(0^-) - \omega(u)\big)\Big] \qquad (13.10) $$
Equating the two general expressions (13.8) and (13.10) for the total idle time of the server leads to a new relation for the unfinished work,
$$ v(t) = v(0^-) + \omega(t) - t + \max\Big[0,\; \sup_{0\le u\le t}\big(u - v(0^-) - \omega(u)\big)\Big] $$
$$ \phantom{v(t)} = \max\Big[v(0^-) + \omega(t) - t,\; \sup_{0\le u\le t}\big\{\omega(t) - t - \big(\omega(u) - u\big)\big\}\Big] $$
The quantity $\xi(t) = \omega(t) - t + v(0^-)$ is recognized as the server overload during $[0,t)$, while $\omega(t) - t$ is the amount of excess work arriving during the interval $[0,t)$. Thus, $\xi(t) - \xi(u)$ is the amount of excess work during $[u,t)$, or the overload of the server during $[u,t)$, provided $u > 0$ and $\xi(0) = v(0^-)$.
Then,
$$ v(t) = \max\Big[\xi(t),\; \sup_{0<u\le t}\{\xi(t) - \xi(u)\}\Big] $$
and, with the convention that $v(0^-) = \sup_{0\le u\le 0^-}\{\xi(0) - \xi(u)\} = \xi(0)$,
$$ v(t) = \sup_{0\le u\le t}\{\xi(t) - \xi(u)\} \qquad (13.11) $$
The unfinished work $v(t)$ at time $t$ is equal to the largest value of the overload or excess work over any interval $[u,t) \subseteq [0,t)$. The relation (13.11) is illustrated and further explained in Fig. 13.3. This general relation (13.11) shows that the unfinished work is the maximum of a stochastic process.

Furthermore, if $v(t) = 0$, (13.11) indicates that $\sup_{0\le u\le t}\{\xi(t) - \xi(u)\} = 0$. Let $u^*$ denote the value at which $\sup_{0\le u\le t}\{\xi(t) - \xi(u)\} = \xi(t) - \xi(u^*) = 0$. But $\xi(u^*)$ is then the lowest value of $\xi$ in $[0,t)$ and, unless an arrival occurs during the interval $[t, t+\Delta t]$, $\xi(t + \Delta t) = \xi(t) - \Delta t < \xi(t)$. This argument shows that, as soon as a new idle period begins, $\xi(t)$ attains the minimum value so far. During an idle period, as shown in Fig. 13.4, $\xi(t)$ decreases further, linearly with slope $-1$, towards a new minimum $\xi(b_j)$ in $[0, b_j]$, until the beginning of a new busy period, say the $j$-th at $t = b_j$. Then, for all $b_j < t < b_{j+1}$,
$$ v(t) = \xi(t) - \xi(b_j) = \sup_{b_j < u \le t}\{\xi(t) - \xi(u)\} $$
In other words, we observe that idle periods decouple the past behavior from the future behavior, as deduced earlier from the waiting-time analysis in Section 13.2. As illustrated in Fig. 13.4, the sequence $\{\xi(b_j)\}$, where $b_j$ denotes the start of the $j$-th busy period, is monotonically decreasing in $b_j$, i.e. $\xi(b_j) > \xi(b_{j+1})$ for any $j$.

Fig. 13.4. The excess work $\xi(t)$ for the same process as in the previous plot. The arrows labeled $b_j$ denote the start of the $j$-th busy period. Observe that $\xi(b_j)$ is the minimum so far and that a busy period ends at the $t > b_j$ for which $\xi(t) = \xi(b_j)$. The length of a busy period is represented by a double arrow.

Let us proceed to compute the distribution of the unfinished work, following an idea due to Beneš.
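Before doing so, relation (13.11) can be checked numerically in a slotted toy model with unit server rate (a sketch of our own, not from the book; we take $v(0^-) = 0$). The direct workload recursion is compared with $\xi(t) - \min_{0\le u\le t}\xi(u)$, which equals the supremum in (13.11):

```python
import random

def workload_two_ways(arrivals):
    """Unfinished work in a slotted model (unit server rate, v(0-) = 0).
    arrivals[k] = work arriving in slot k.
    Returns (v_direct, v_sup): the direct recursion v <- max(0, v + a - 1)
    versus relation (13.11), v(t) = sup over u of (xi(t) - xi(u))."""
    xi = [0.0]                        # xi(0) = v(0-) = 0
    for a in arrivals:
        xi.append(xi[-1] + a - 1.0)   # slope -1 per slot, jumps at arrivals
    v_direct, v = [], 0.0
    for a in arrivals:
        v = max(0.0, v + a - 1.0)
        v_direct.append(v)
    v_sup, running_min = [], xi[0]
    for k in range(1, len(xi)):
        running_min = min(running_min, xi[k])
        v_sup.append(xi[k] - running_min)   # xi(t) - min_{0<=u<=t} xi(u)
    return v_direct, v_sup

random.seed(1)
work = [float(random.choice([0, 0, 1, 1, 2])) for _ in range(200)]  # mean load 0.8
v_direct, v_sup = workload_two_ways(work)
```

On every sample path the two sequences agree, which is exactly the content of (13.11): the unfinished work is the running maximum of the increments of the excess-work process.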
Beneš applies the identity⁴, valid for all $z$,
$$ e^{-z\theta} = 1 - z\int_0^{\theta} e^{-z\vartheta}\, d\vartheta $$
to the total idle time of the server by putting $\theta = \iota(u)$,
$$ e^{-z\iota(t)} = 1 - z\int_{\iota^{-1}(0)}^{\iota^{-1}(t)} e^{-z\iota(u)}\, d\iota(u) $$
where $\iota^{-1}(t)$ is the inverse function. Note that $\iota(0) = 0$ and that $d\iota(u) = 1_{\{v(u)=0\}}\,du = \delta(v(u))\,du$, where $\delta(x)$ is the Dirac impulse. Letting $t \to \iota(t)$,
$$ e^{-z\iota(t)} = 1 - z\int_0^{t} e^{-z\iota(u)}\,\delta(v(u))\, du $$

⁴ Borovkov (1976, p. 30) proposes another, but less simple, approach that avoids the use of the identity ingeniously introduced by Beneš.

Substituting (13.9) in the integral, which is only valid if $v(u) = 0$, and (13.8) at the left-hand side, which is generally valid, gives
$$ e^{-z(v(t)+t-v(0^-)-\omega(t))} = 1 - z\int_0^{t} e^{-z(u - v(0^-) - \omega(u))}\,\delta(v(u))\, du $$
or, in terms of the excess work $\xi(t) = \omega(t) - t + v(0^-)$,
$$ e^{-z v(t)} = e^{-z\xi(t)} - z\int_0^{t} e^{-z(\xi(t)-\xi(u))}\,\delta(v(u))\, du $$
Taking the expectation of both sides yields
$$ E\big[e^{-z v(t)}\big] = E\big[e^{-z\xi(t)}\big] - z\int_0^{t} E\big[e^{-z(\xi(t)-\xi(u))}\,\delta(v(u))\big]\, du $$
Recall, with (2.34), with the definition of a generating function (2.37) and further with (2.61), that
$$ T = E\big[e^{-z(\xi(t)-\xi(u))}\,\delta(v(u))\big] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-zx}\,\delta(y)\, \frac{\partial^2}{\partial x\,\partial y}\Pr[\xi(t)-\xi(u) \le x,\; v(u) \le y]\; dx\, dy $$
and, with (2.45), conditioning on $v(u)$ and carrying out the integration over $y$ against the Dirac impulse,
$$ T = \int_{-\infty}^{\infty} e^{-zx}\, \frac{d}{dx}\Pr[\xi(t)-\xi(u) \le x \mid v(u) = 0]\; \Pr[v(u)=0]\; dx $$
Combining all of the above leads to
$$ \int_{-\infty}^{\infty} e^{-zx} f_{v(t)}(x)\,dx = \int_{-\infty}^{\infty} e^{-zx} f_{\xi(t)}(x)\,dx \;-\; z\int_0^{t}\!\int_{-\infty}^{\infty} e^{-zx}\,\frac{d}{dx}\Pr[\xi(t)-\xi(u)\le x \mid v(u)=0]\;\Pr[v(u)=0]\; dx\, du $$
By partial integration, we can remove the factor $z$ at both sides. Indeed, since $\Pr[v(t)\le x] = 0$ for $x < 0$,
$$ \int_{-\infty}^{\infty} e^{-zx} f_{v(t)}(x)\, dx = z\int_{-\infty}^{\infty} e^{-zx}\,\Pr[v(t)\le x]\, dx $$
Hence, we arrive at
$$ \int_{-\infty}^{\infty} e^{-zx}\,\Pr[v(t)\le x]\, dx = \int_{-\infty}^{\infty} e^{-zx}\Big[\Pr[\xi(t)\le x] - \int_0^{t} \frac{d}{dx}\Pr[\xi(t)-\xi(u)\le x \mid v(u)=0]\;\Pr[v(u)=0]\, du\Big]\, dx $$
which is equivalent to
$$ \Pr[v(t)\le x] = \Pr[\xi(t)\le x] - \int_0^{t} \frac{d}{dx}\Pr[\xi(t)-\xi(u)\le x \mid v(u)=0]\;\Pr[v(u)=0]\, du \qquad (13.12) $$
This general relation for the distribution of the unfinished work in terms of the excess work is the Beneš equation. If $v(u) = 0$ for all $u \in [0,t)$, this means that during that interval no work arrives and that $\xi(t) - \xi(u) = u - t$, so that $-t \le \xi(t)-\xi(u) \le 0$ for any $u \in [0,t)$. Thus, if we choose $x \in [-t, 0)$ such that the event $\{\xi(t)-\xi(u) = u - t \le x\}$ is possible, the probabilities appearing in the right-hand side are not identically zero, while $\Pr[v(t)\le x] = 0$. Hence, for $x \in [-t, 0)$, the Beneš equation reduces to
$$ \Pr[\xi(t)\le x] = \int_0^{t} \frac{d}{dx}\Pr[\xi(t)-\xi(u)\le x \mid v(u)=0]\;\Pr[v(u)=0]\, du $$
from which the unknown probability of an empty system, $\Pr[v(u)=0]$, can be found⁵ for $t + x \le u \le t$. The Beneš equation translates the problem of finding the time-dependent virtual waiting time or unfinished work into an integral equation that, in principle, can be solved. We further note that the derivation makes hardly any assumptions about the queueing system or about the arrival process, so that the Beneš equation provides the most general description of the unfinished work in any queueing system. Of course, the price for this generality is the considerable complexity of the integral equation to be solved. However, we will see examples⁶ of its use in ATM.

⁵ This relation in the unknown function $f(u) = \Pr[v(u)=0]$ is a Volterra equation of the first kind (see e.g. Morse and Feshbach (1978, Chapter 8)),
$$ g(z) = \int_a^{z} K(z|u)\, f(u)\, du $$
Such integral equations frequently appear in physics, in boundary problems, potential theory and Green's function theory.
⁶ Borovkov (1976) investigates the Beneš method in more detail. He further derives from (13.12) formulae for light and heavy traffic, and for the discrete-time process.

13.3.1 A constant service rate

If the server operates deterministically, as in ATM for example, the amount of work arriving to the queueing system in the interval $[0,t)$ simplifies to $\omega(t) = A(t)$, the number of ATM cells arriving in the interval $[0,t)$, because $x_j = x$ is the time to process one ATM cell, which we take as the time unit, $x = 1$. With this convention, we have that $\xi(t) = A(t) - t$.
After the substitution $u = t - y$, the integral $I$ in (13.12) becomes
$$ I = \int_0^{t} \frac{d}{dx}\Pr[\xi(t)-\xi(t-y)\le x \mid v(t-y)=0]\;\Pr[v(t-y)=0]\, dy $$
and the event $\{\xi(t)-\xi(t-y) \le x\} = \{A(t)-A(t-y) \le x+y\}$. Since $A(t)-A(t-y)$ is a non-negative integer $k$, and thus a discrete random variable, the probability density function is
$$ \frac{d}{dx}\Pr[\xi(t)-\xi(t-y)\le x \mid v(t-y)=0] = \Pr[A(t)-A(t-y)=k \mid v(t-y)=0]\; 1_{\{x+y=k\}} $$
which implies that only the values $y = k - x$ contribute to the integral $I$. Hence, $0 \le y \le t$ implies that $\lceil x\rceil \le k \le \lfloor x+t\rfloor$, where $\lfloor z\rfloor$ (respectively $\lceil z\rceil$) is the largest integer smaller than or equal to $z$ (respectively the smallest integer larger than or equal to $z$),
$$ I = \sum_{k=\lceil x\rceil}^{\lfloor x+t\rfloor} \Pr[A(t)-A(t+x-k)=k \mid v(t+x-k)=0]\;\Pr[v(t+x-k)=0] $$
Hence, for a discrete queue with time slots equal to the constant service time, the Beneš equation reduces to
$$ \Pr[v(t)\le x] = \Pr[A(t)\le\lfloor x+t\rfloor] - \sum_{k=\lceil x\rceil}^{\lfloor x+t\rfloor} \Pr[A(t)-A(t+x-k)=k \mid v(t+x-k)=0]\;\Pr[v(t+x-k)=0] \qquad (13.13) $$

13.3.2 The steady-state distribution of the virtual waiting time

Let us turn to the steady-state distribution $V(x) = \lim_{t\to\infty}\Pr[v(t)\le x]$. Since $\omega(t)$ is the amount of work arriving to the queueing system in the interval $[0,t)$, $\frac{\omega(t)}{t}$ is the mean amount of work arriving per unit time in that interval. The steady-state stability condition, equivalent to $\rho < 1$, requires that
$$ \lim_{t\to\infty}\frac{\omega(t)}{t} = \rho < 1 $$
because the server capacity is 1 unit of work per unit of time. Since $\omega(t)$ is non-decreasing, $\xi(t)$ decreases continuously with slope $-1$ between two arrivals and increases (possibly discontinuously, with jumps) at arrival epochs, as illustrated in Fig. 13.3.
In the stable steady-state regime, where $\rho < 1$ and $\lim_{t\to\infty}\frac{\omega(t)}{t} = \rho < 1$, we find that
$$ \lim_{t\to\infty}\xi(t) = \lim_{t\to\infty} t\Big(\frac{\omega(t)}{t} - 1\Big) = -\infty $$
and thus $\lim_{t\to\infty}\Pr[\xi(t)\le x] = 1$ and $\lim_{t\to\infty}\frac{\xi(t)}{t} = \rho - 1 < 0$. From (13.7), we see that
$$ \lim_{t\to\infty}\frac{\iota(t)}{t} = 1 - \lim_{t\to\infty}\frac{b(t)}{t} = 1-\rho $$
which, with (13.9), suggests that
$$ V(0) = \lim_{t\to\infty}\Pr[v(t)=0] = 1-\rho \qquad (13.14) $$
If the Strong Law of Large Numbers is applicable, which requires that the lengths of the idle periods are independent and identically distributed, the relation $V(0) = 1-\rho$ is proved to be true by Borovkov (1976, pp. 33–34). Hence, for any stationary single-server system with traffic intensity $\rho$, the probability of an empty system at an arbitrary time is $1-\rho$. Taking the limit $t\to\infty$ in (13.12) then yields
$$ V(x) = 1 - \lim_{t\to\infty}\int_0^{t} \frac{d}{dx}\Pr[\xi(t)-\xi(t-y)\le x \mid v(t-y)=0]\;\Pr[v(t-y)=0]\, dy $$
The tail probability $1 - V(x) = \lim_{t\to\infty}\Pr[v(t)>x] = \Pr[v(t_\infty)>x]$ is
$$ \Pr[v(t_\infty)>x] = \int_0^{\infty} \frac{d}{dx}\Pr[\xi(t_\infty)-\xi(t_\infty-y)\le x \mid v(t_\infty-y)=0]\;\Pr[v(t_\infty-y)=0]\, dy $$
This relation shows that, at a point $t_\infty\to\infty$ in the steady-state, the contributions to $V(x)$ are due to arrivals and idle periods in the past. The corresponding steady-state equation for (13.13) is
$$ \Pr[v(t_\infty)>x] = \sum_{k=\lceil x\rceil}^{\infty} \Pr\Big[v(t_\infty+x-k)=0 \;\Big|\; \int_{t_\infty+x-k}^{t_\infty} dN_A(u) = k\Big]\; \Pr\Big[\int_{t_\infty+x-k}^{t_\infty} dN_A(u) = k\Big] \qquad (13.15) $$

13.4 The counting process

A general conservation relation similar to (13.3) can be deduced for the counting process,
$$ N_S(r_{k+1}) = \max\big(N_S(r_k)-1,\; 0\big) + N_A(r_{k+1}) - N_A(r_k) $$
The number of packets in the system at the departure time of the $(k+1)$-th packet equals the number of packets in the system at the departure time of the previous packet $k$, minus that packet itself, but increased by the number of arrivals in the time interval $[r_k, r_{k+1}]$. Similarly, for the queue (which is the system minus the packet currently under service),
$$ N_Q(r_{k+1}) = \max\big(N_Q(r_k) + N_A(r_{k+1}) - N_A(r_k) - 1,\; 0\big) $$
which is the direct analog of (13.3) for the waiting time.
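The conservation relation at departure epochs can be checked against an event-by-event simulation. In the sketch below (our own, not from the book), $N_S(r_k)$ is counted directly from the sample path and is taken just before the $k$-th departure, so that the departing packet is still included; under that convention the relation holds on every FIFO sample path:

```python
import random

def departure_recursion_holds(n_packets, lam, mu, seed=3):
    """Simulate a FIFO M/M/1 sample path and verify, at each departure r_k,
    N_S(r_{k+1}) = max(N_S(r_k) - 1, 0) + N_A(r_{k+1}) - N_A(r_k),
    with N_S counted just before the departure instant."""
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    for _ in range(n_packets):
        t += rng.expovariate(lam)
        arrivals.append(t)
    departures, free_at = [], 0.0
    for a in arrivals:
        free_at = max(a, free_at) + rng.expovariate(mu)
        departures.append(free_at)

    def n_arrived(tt):          # N_A(tt): number of arrivals in [0, tt]
        return sum(1 for a in arrivals if a <= tt)

    def n_system(tt):           # packets present at tt (a departure at tt included)
        return sum(1 for a, d in zip(arrivals, departures) if a <= tt <= d)

    return all(
        n_system(departures[k]) ==
        max(n_system(departures[k - 1]) - 1, 0)
        + n_arrived(departures[k]) - n_arrived(departures[k - 1])
        for k in range(1, n_packets)
    )
```

With continuous interarrival and service times, simultaneous events have probability zero, so the bookkeeping above is unambiguous.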
Whereas the waiting process is more natural to consider in problems where interarrival times are specified, the counting process has more advantages in a discrete-time analysis. In the latter, the queueing system is observed at certain moments in time, for instance at the beginning of a timeslot $k$ that starts immediately⁷ after the departure of the $k$-th packet and is equal to the interval $[r_k, r_{k+1}]$, for all $r_k > 0$ and $r_0 = 0$. It will be convenient to simplify the notation: $S_k = N_S(r_k^+)$ denotes the system content (i.e. the number of occupied queue positions, including the packets currently being served) at the beginning of timeslot $k$, $Q_k = N_Q(r_k^+)$ is the queue content at the beginning of timeslot $k$, and $X_k$ and $A_k$ are the number of served packets and of arriving packets during timeslot $k$, respectively. The system content satisfies the continuity (or balance) equation
$$ S_{k+1} = (S_k - X_k)^+ + A_k \qquad (13.16) $$
whereas the queue content obeys
$$ Q_{k+1} = (Q_k - X_k + A_k)^+ \qquad (13.17) $$
where $(x)^+ \equiv \max(x,0)$. On the other hand, the relation between system and queue content implies that $Q_k = (S_k - X_k)^+$, such that (13.16) can be rewritten as
$$ S_{k+1} = Q_k + A_k \qquad (13.18) $$
The number of packets in the system at the beginning of timeslot $k+1$ is the sum of the number of queued packets at the beginning of the previous timeslot $k$ and the packets newly arrived during timeslot $k$.

⁷ We write $x^+ = x + \epsilon$ and $x^- = x - \epsilon$, where $\epsilon > 0$ is an arbitrarily small, positive real number. The notation $x^+$ should not be confused with $(x)^+ = \max(x,0)$.

13.4.1 Queue observations

It is worthwhile to investigate the relation between observations, at various instances of time, of the queueing process $\{N_S(t),\, t\ge0\}$, which represents the number of packets in the system at time $t$. As seen before and as illustrated in Fig. 13.5, two time instances seem natural: an inspection at departure times, where $N_S(r_n^+)$ describes the number of packets in the system just after the departure of the $n$-th packet, and an observation at arrival times, where $N_S(t_n^-)$ describes the number of packets in the system just before the $n$-th packet enters.
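The slot-based balance equations (13.16)–(13.18) can be iterated directly. A minimal sketch of our own, for a single server that serves at most one packet per slot ($X_k = 1$ for every $k$), iterates (13.16) and (13.17) independently and confirms the relations $Q_k = (S_k - X_k)^+$ and (13.18) along the whole path:

```python
import random

def slot_contents(arrivals_per_slot):
    """System content S_k and queue content Q_k with X_k = 1 in every slot,
    iterated independently via (13.16) and (13.17), starting empty."""
    S, Q = [0], [0]
    for a in arrivals_per_slot:
        S.append(max(S[-1] - 1, 0) + a)   # (13.16) with X_k = 1
        Q.append(max(Q[-1] - 1 + a, 0))   # (13.17) with X_k = 1
    return S, Q

random.seed(5)
arr = [random.randint(0, 2) for _ in range(100)]
S, Q = slot_contents(arr)
```

That the two independent iterations stay consistent, slot after slot, is exactly the invariant expressed by $Q_k = (S_k - X_k)^+$ and $S_{k+1} = Q_k + A_k$.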
Suppose that the $n$-th packet leaves $N_S(r_n^+) = k \le j$ packets behind in the system. This implies that precisely $k$ arrivals after the $n$-th packet have entered the system. Hence, the $(n+k+1)$-th packet sees, just before entering the system, at most $k$ packets, because during the period between $t = r_n$ and $t = t_{n+k+1}$ only departures are possible. Thus $N_S(t_{n+k+1}^-) \le k$ and, clearly, for $j \ge k$, the $(n+j+1)$-th packet observes no more than $N_S(t_{n+j+1}^-) \le j$ packets. Hence, the following implication holds:
$$ \big\{N_S(r_n^+) \le j\big\} \Longrightarrow \big\{N_S(t_{n+j+1}^-) \le j\big\} $$

Fig. 13.5. Relation between queue observations at arrival and at departure epochs.

Consider now the converse. Suppose that the $(n+j+1)$-th packet sees precisely $k \le j$ packets in front of it upon arrival: $N_S(t_{n+j+1}^-) = k$. This implies that the $(n+j+1-k)$-th packet is the first packet that will leave the system after $t = t_{n+j+1}$, and that the $(n+j-k)$-th packet has already left the system. At its departure, at $t = r_{n+j-k} \le t_{n+j+1}$, it has observed at most $k$ packets behind it, because only arrivals are possible in the interval $[r_{n+j-k},\, t_{n+j+1})$.
Hence, $N_S(r_{n+j-k}^+) \le k$ and, setting $k = j$, we have $N_S(r_n^+) \le j$, leading to the implication
$$ \big\{N_S(t_{n+j+1}^-) \le j\big\} \Longrightarrow \big\{N_S(r_n^+) \le j\big\} $$
Combining both implications leads to the equivalence
$$ \big\{N_S(t_{n+j+1}^-) \le j\big\} \Longleftrightarrow \big\{N_S(r_n^+) \le j\big\} $$
or, for any sample path (or realization), it holds for any non-negative integer $j$ that
$$ \Pr\big[N_S(t_{n+j+1}^-) \le j\big] = \Pr\big[N_S(r_n^+) \le j\big] $$
In steady-state, for $n\to\infty$, with $\lim_{n\to\infty} N_S(t_n^-) = N_{S;A}$ and $\lim_{n\to\infty} N_S(r_n^+) = N_{S;D}$, we find that
$$ \Pr[N_{S;A} = j] = \Pr[N_{S;D} = j] \qquad (13.19) $$
In words: in steady-state, the distribution of the number of packets in the system observed by arriving packets is equal to the distribution of the number of packets in the system left behind by departing packets. Of course, we have assumed that the steady-state distribution exists; if one of these distributions exists, the analysis demonstrates that the other must exist as well. Notice that no assumptions about the distributions or about dependence are made, and that (13.19) is a general result which only assumes the existence of a steady-state.

13.5 PASTA

Let us denote by $\lim_{t\to\infty} N_S(t) = N_S$ the steady-state system content, i.e. the number of packets in the system in steady-state. To compute the waiting-time distribution (under a FIFO service discipline), we must take the view of how a typical arriving packet in steady-state finds the queue. Therefore, it is of interest to know when
$$ \Pr[N_{S;A} = j] \stackrel{?}{=} \Pr[N_S = j] \qquad (13.20) $$
The equality would imply that, in steady-state, the probability that an arriving packet finds the system in state $j$ equals the probability that the system is in state $j$. Recall from (6.1) that the existence of these probabilities means that $\Pr[N_S = j]$ also equals the long-run fraction of time the system contains $j$ packets, or is in state $j$. Similarly, $\Pr[N_{S;A} = j]$ also equals the long-run fraction of arriving packets that see the system in state $j$. In general, relation (13.20) is unfortunately not true.
For example, consider a D/D/1 queue with a constant interarrival time $\tau_c$ and a constant service time $x_c < \tau_c$. Clearly, the D/D/1 system has a periodic service cycle: a busy period takes $x_c$ time units and the idle period equals $\tau_c - x_c$ time units. Thus, every arriving packet always finds the system empty and concludes that $\Pr[N_{S;A}=0]=1$, while $\Pr[N_S=1] = \frac{x_c}{\tau_c}$ and $\Pr[N_S=0] = \frac{\tau_c - x_c}{\tau_c}$. The waiting-time computation of the GI/D/c system in Section 14.4.2 is another counterexample. Since the arrival process $\{N_A(t),\, t\ge0\}$ interacts with the system process $\{N_S(t),\, t\ge0\}$ (every arrival increases the system content by one), they are dependent processes.

Relation (13.20) is true for Poisson arrivals, and this property is called "Poisson arrivals see time averages" (PASTA).

Theorem 13.5.1 (PASTA) The long-run fraction of time that a process spends in state $j$ is equal to the long-run fraction of Poisson arrivals that find the process in state $j$,
$$ \Pr[N_{S;A} = j] = \Pr[N_S = j] $$
Proof: See⁸ e.g. Wolff (1982). ∎

⁸ Although Wolff's general proof (Wolff, 1982) spans only two pages, it is based on martingales and on axiomatic probability theory.

The Poisson process has the typical property that future increments are independent of the past, and thus also of the past system history. In a certain sense, Poisson arrivals perform a random sampling that is sufficient to characterize the steady-state of the system exactly. The PASTA property also applies to Markov chains: the transitions in continuous-time Markov chains are Poisson processes if self-transitions are allowed (see Section 10.4.1). For any state $j$, the fraction of Poisson events that see the chain in state $j$ is $\pi_j$, which (see Lemma 6.1.2) also equals the fraction of time the chain spends in state $j$.

13.6 Little's Law

Little's Law is perhaps the simplest of the general queueing formulae.
Theorem 13.6.1 (Little's Law) The average number of packets (customers) in the system, $E[N_S]$ (or in the queue, $E[N_Q]$), equals the average arrival rate $\lambda$ times the average time spent in the system, $E[T]$ (or in the queue, $E[w]$),
$$ E[N_S] = \lambda E[T] \qquad (13.21) $$
$$ E[N_Q] = \lambda E[w] $$
Little's Law holds if two of the three limits
$$ \lim_{t\to\infty}\frac{A(t)}{t} = \lambda \qquad (13.22) $$
$$ \lim_{t\to\infty}\frac1t\int_0^{t} N_S(u)\, du = E[N_S] \qquad (13.23) $$
$$ \lim_{n\to\infty}\frac1n\sum_{k=1}^{n} T_k = E[T] \qquad (13.24) $$
exist.

Fig. 13.6. The arrival (bold) and departure (dotted) processes, together with the system time $T_k$ of each packet in the queueing system.

Proof: Recall that $A(t)$ represents the total number of arrivals in the time interval $[0,t]$. If $N_S(t) = 0$, i.e. the system is idle at time $t$, then
$$ \int_0^{t} N_S(u)\, du = \int_0^{t} \sum_{j=1}^{\infty} 1_{\{u\in[t_j,\, t_j+T_j)\}}\, du = \sum_{j=1}^{A(t)}\int_0^{t} 1_{\{u\in[t_j,\, t_j+T_j)\}}\, du = \sum_{k=1}^{A(t)} T_k $$
The general case, where $N_S(t) \ge 0$, is more complicated, as Fig. 13.6 shows for $t = \tau$, because not all intervals $[t_j,\, t_j+T_j)$ for $1 \le j \le A(\tau)$ are contained in $[0,\tau)$. Hence, $\sum_{k=1}^{A(\tau)} T_k$ counts too much and is an upper bound for $\int_0^{\tau} N_S(u)\,du$. If $D(t)$ denotes the number of departures in $[0,t]$, Fig. 13.6 illustrates that the area (in grey) in an interval $[0,t]$, which equals the total number of packets in the system in that interval, $\int_0^{t}(A(u)-D(u))\,du = \int_0^{t} N_S(u)\,du$, can be bounded for any realization (sample path) and any $t \ge 0$ by
$$ \sum_{k:\, T_k+t_k\le t} T_k \;\le\; \int_0^{t} N_S(u)\, du \;\le\; \sum_{k=1}^{A(t)} T_k $$
where the lower bound only counts the packets that have left the system by time $t$. Dividing by $t$, we have
$$ \frac{A(t)}{t}\,\frac{1}{A(t)}\sum_{k:\, T_k+t_k\le t} T_k \;\le\; \frac1t\int_0^{t} N_S(u)\, du \;\le\; \frac{A(t)}{t}\,\frac{1}{A(t)}\sum_{k=1}^{A(t)} T_k \qquad (13.25) $$
Since we assume that the limit (13.22) exists, we have that $A(t) = O(t)$ for $t\to\infty$.
From the existence of the limit (13.24), we can thus write
$$ \lim_{t\to\infty}\frac{1}{A(t)}\sum_{k=1}^{A(t)} T_k = E[T] $$
When $t\to\infty$ in (13.25), and using the limits defined above, we find that the upper bound converges to $\lambda E[T]$. In order to prove (13.21), it remains to show that the lower bound in (13.25) also converges to the same limit $\lambda E[T]$. Since $A(t) = \lambda t + o(t)$ for $t\to\infty$, it follows for the sequence of arrival times $t_n$ that $t_n\to\infty$ as $n\to\infty$ and that
$$ \frac{n}{t_n} = \frac{A(t_n)}{t_n} \to \lambda \qquad \text{as } n\to\infty $$
The convergence of the series (13.24) implies, for $n\to\infty$, that
$$ \frac{T_n}{n} = \frac1n\sum_{k=1}^{n} T_k - \frac{n-1}{n}\,\frac{1}{n-1}\sum_{k=1}^{n-1} T_k \;\to\; 0 $$
Combining both relations leads to
$$ \frac{T_n}{t_n} = \frac{T_n}{n}\,\frac{n}{t_n} \;\to\; 0 \qquad \text{as } n\to\infty $$
which implies that, for any $\epsilon > 0$, there exists a fixed $m$ such that, for all $k > m$, we have $\frac{T_k}{t_k} < \epsilon$, or $t_k + T_k < (1+\epsilon)t_k$. For $t > t_m$, the lower bound in (13.25) obeys
$$ \sum_{k>m:\, T_k+t_k\le t} T_k + \sum_{k=1}^{m} T_k \;\ge\; \sum_{k=1}^{A(t/(1+\epsilon))} T_k $$
or
$$ \frac1t\sum_{k\ge1:\, T_k+t_k\le t} T_k \;\ge\; \frac{1}{1+\epsilon}\,\frac{A(t/(1+\epsilon))}{t/(1+\epsilon)}\,\frac{1}{A(t/(1+\epsilon))}\sum_{k=1}^{A(t/(1+\epsilon))} T_k $$
In the limit $t\to\infty$, we obtain
$$ \frac1t\sum_{k\ge1:\, T_k+t_k\le t} T_k \;\to\; \frac{\lambda E[T]}{1+\epsilon} $$
Since $\epsilon$ can be made arbitrarily small, this finally proves (13.21). ∎

Although the proof may seem rather technical⁹ for, after all, an intuitive result, it reveals that no assumptions about the distributions of the arrival and service processes are made, apart from steady-state convergence; no probabilistic arguments are used. In essence, Little's Law is proved by showing that two limits exist for any sample path or realization of the process, which guarantees a very general theorem. Moreover, no assumptions are made about the service discipline, about the dependence between the arrival and service processes, or about the number of servers, which means that Little's Law also holds for non-FIFO scheduling disciplines, in fact for any scheduling discipline! Little's Law connects three essential quantities: once two of them are known, the third is determined by (13.21).
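The sample-path character of the proof invites a numerical check. The sketch below (our own, with illustrative numbers) computes both sides of (13.21) on a short FIFO sample path, observed over a window that ends with an empty system; the time-average system content is obtained by numerically integrating $N_S(u)$, independently of the system times:

```python
def little_two_sides(arrivals, services, dt=1e-3):
    """Time-average system content versus lambda * E[T] on a FIFO
    single-server sample path observed over [0, last departure]."""
    departures, free_at = [], 0.0
    for a, x in zip(arrivals, services):
        free_at = max(a, free_at) + x
        departures.append(free_at)
    t_end = departures[-1]
    # Left-hand side: (1/t) * integral of N_S(u) du, by midpoint integration.
    area = 0.0
    for i in range(int(t_end / dt)):
        u = (i + 0.5) * dt
        area += dt * sum(1 for a, d in zip(arrivals, departures) if a <= u < d)
    time_avg = area / t_end
    # Right-hand side: lambda * E[T].
    lam = len(arrivals) / t_end
    mean_T = sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)
    return time_avg, lam * mean_T

lhs, rhs = little_two_sides([0.0, 1.0, 1.5, 4.0], [1.0, 0.8, 0.7, 0.5])
```

The two values agree up to the integration step, regardless of the idle period in the middle of the path, which illustrates that (13.21) is a bookkeeping identity on sample paths rather than a distributional statement.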
Little's Law is very important in operations management, where it relates the average inventory (similar to $E[N_S]$), the average flow rate or throughput $\lambda$, and the average flow time $E[T]$ in a process flow of products or services. Several examples can be found in Chapter 14, in Anupindi et al. (2006) and in Bertsekas and Gallager (1992, pp. 157–162).

⁹ We have chosen a very general proof. Other proofs (e.g. in Ross (1996) and Gallager (1996)) use arguments from renewal reward theory (Section 8.4), which makes them less general because they require that the system has renewals.

14 Queueing models

This chapter presents some of the simplest and most basic queueing models. Unfortunately, most queueing problems are not solvable in analytic form, and many queueing problems require a specific, sometimes tailor-made, solution. Besides the simple and classical queueing models, we also present two other exactly solvable models that have played a key role in the development of the Asynchronous Transfer Mode (ATM). In these ATM queueing systems the service discipline is deterministic and only the arrival process is the distinguishing element. The first is the N*D/D/1 queue (Roberts, 1991, Section 6.2), whose solution relies on the Beneš approach. The arrivals consist of $N$ periodic sources, each with a period of $D$ time slots, but randomly phased with respect to each other. The second model is the fluid flow model of Anick et al. (1982), known as the AMS queue, which considers $N$ on-off sources as input. The solution uses Markov theory: since the Markov transition probability matrix has a special tri-band diagonal structure, the eigenvector and eigenvalue decomposition can be computed analytically.

We would like to refer to a few other models. Norros (1994) succeeded in deriving the asymptotic probability distribution of the unfinished work for a queue with self-similar input, modeled via fractional Brownian motion.
The resulting asymptotic probability distribution turns out to be a Weibull distribution (3.40). Finally, Neuts (1989) has established a matrix-analytic framework and founded the class of Markov-modulated arrival processes and derivatives, such as the Batch Markovian Arrival Process (BMAP).

14.1 The M/M/1 queue

The M/M/1 queue consists of a Poisson arrival process of packets with exponentially distributed interarrival times, a service process with exponentially distributed service times, one server, and an infinitely long queue. The M/M/1 queue is a basic model in queueing theory for several reasons. First, as shown below, the M/M/1 queue can be computed in analytic form, even its transient time behavior. Apart from this computational advantage, the M/M/1 queue possesses the basic feature of queueing systems: the quantities of interest (waiting time, number of packets, etc.) increase monotonically with the traffic intensity $\rho$.

Packets arrive at the M/M/1 queue with interarrival rate $\lambda$ and are served with service rate $\mu$. The M/M/1 queue is precisely described by a constant-rate birth and death process. Any arrival of a packet to the queueing system can be regarded as a birth: the current state $k$, which reflects the number of packets in the M/M/1 system, jumps to state $k+1$ at the arrival of a new packet, and the transition rate equals the interarrival rate $\lambda$; on average, every $\frac{1}{\lambda}$ time units a packet arrives to the system. A packet leaves the M/M/1 system after service, which corresponds to a death: at each departure from the system, the current state is decreased by one, with death rate equal to the service rate $\mu$; on average, every $\frac{1}{\mu}$ time units a packet is served. In the sequel, we concentrate on the steady-state behavior and refer, for the transient behavior, to the discussion of the birth and death process in Section 11.3.3.
14.1.1 The system content in steady-state

From the analogy with the constant-rate birth and death process studied in Section 11.3.3, we immediately obtain the steady-state queueing distribution (11.23) as
$$ \Pr[N_S = j] = (1-\rho)\,\rho^{j} \qquad j \ge 0 \qquad (14.1) $$
where $N_S = \lim_{t\to\infty} N_S(t)$ is the number of packets in the system in the stationary regime. In other words, $\Pr[N_S = j]$ is the probability that the M/M/1 system (queue plus server) contains $j$ packets. It has been shown in Section 11.3 that the M/M/1 queue is ergodic (i.e. that an equilibrium or steady-state exists) if $\rho = \frac{\lambda}{\mu} < 1$, which is a characteristic of a general queueing system. The probability density function of the system content (14.1) is a geometric distribution, reflecting the memoryless property. We observe that the M/M/1 system is empty with probability $\Pr[N_S = 0] = 1-\rho$ and that, of all states, the empty state has the highest probability. Immediately, the chance that there is a packet in the M/M/1 system is precisely equal to the traffic intensity $\rho$, namely $\Pr[N_S > 0] = 1 - \Pr[N_S = 0] = \rho$.

The corresponding probability generating function (2.18) is
$$ \varphi_{N_S}(z) = \sum_{k=0}^{\infty} \Pr[N_S = k]\, z^k = \frac{1-\rho}{1-\rho z} $$
The average number of packets in the M/M/1 system, $E[N_S] = \varphi'_{N_S}(1)$, equals
$$ E\big[N_{S;\mathrm{M/M/1}}\big] = \frac{\rho}{1-\rho} $$
while the variance $\operatorname{Var}[N_S]$ follows from (2.27) as
$$ \operatorname{Var}\big[N_{S;\mathrm{M/M/1}}\big] = \frac{\rho}{(1-\rho)^2} $$
Both the mean and the variance of the number of packets in the system diverge as $\rho\to1$. When the interarrival rate tends to the service rate, the queue grows indefinitely long, with indefinitely large variation. From Little's Law (13.21), the average time spent in the M/M/1 system equals
$$ E\big[T_{\mathrm{M/M/1}}\big] = \frac{E[N_S]}{\lambda} = \frac{1}{\mu(1-\rho)} = \frac{1}{\mu-\lambda} \qquad (14.2) $$
where $\rho < 1$ or, equivalently, $\lambda < \mu$. If $\rho = 0$, there is no load in the system and the average time in the system attains its minimum, equal to the average service time $\frac{1}{\mu}$. In the other limit, $\rho\to1$, the average waiting time grows unboundedly, just as the queue length or the number of packets in the system.
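The closed forms above follow from (14.1) by direct summation, which a short sketch of our own verifies numerically on a truncated geometric series:

```python
def mm1_moments(rho, n_terms=2000):
    """Mean and variance of the M/M/1 system content, summed from (14.1).
    The truncation error is negligible for rho well below 1."""
    probs = [(1 - rho) * rho**j for j in range(n_terms)]
    mean = sum(j * p for j, p in enumerate(probs))
    second = sum(j * j * p for j, p in enumerate(probs))
    return mean, second - mean**2
```

For $\rho = 0.5$ this returns mean $1$ and variance $2$, matching $\frac{\rho}{1-\rho}$ and $\frac{\rho}{(1-\rho)^2}$; dividing the mean by $\lambda$ reproduces (14.2), as Little's Law requires.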
The behavior of the M/M/1 system in the limit $\rho\to1$ is characteristic of the averages of the quantities ($N_S$, $w$, $T$, ...) in many queueing systems: a simple pole at $\rho = 1$. As a remark, the average waiting time in the M/M/1 queue follows, after taking expectations of the general relation $T_n - x_n = w_n$, or $E[w] = E[T] - \frac{1}{\mu}$, as
$$ E\big[w_{\mathrm{M/M/1}}\big] = \frac{1}{\mu(1-\rho)} - \frac{1}{\mu} = \frac{\rho}{\mu(1-\rho)} \qquad (14.3) $$

14.1.2 The virtual waiting time

For the M/M/1 queue, the virtual waiting time $v(t)$ at some time $t$ consists of (a) the residual service time of the packet currently under service and (b) the time needed to serve the $N_Q(t)$ packets in the queue. As mentioned in Section 13.1.3, the virtual waiting time at arrival epochs equals the system time $T_n$. In other words, if a new packet, say the $n$-th, enters the M/M/1 system at $t = t_n$, the total time (system time) that the packet spends in the M/M/1 system equals $v(t_n) = T_n$. At $t = t_n^-$, the number $N_S(t_n^-)$ does not include the new packet at the last position, and the packet "sees" $N_S(t_n^-)$ other packets in the system (queue plus the packet in the server) in front of it. We assume further that the server operates in FIFO (first-in, first-out) order.

Since the service time is exponentially distributed and possesses the memoryless property, the residual or remaining service time of the packet currently under service has the same distribution: it does not matter how long that packet has already been under service. The more general argument is that the PASTA property applies. The system time of the $n$-th packet is thus the sum of $N_S(t_n^-)+1$ exponential i.i.d. random variables. As shown in Section 3.3.1, if $N_S(t_n^-) = k$, the system time has an Erlang distribution, given by (3.24) with $n = k+1$,
$$ f_{T_n}\big(t \mid N_S(t_n^-) = k\big) = \frac{\mu(\mu t)^{k}\, e^{-\mu t}}{k!} $$
Using the law of total probability (2.46), the system time $T_n$ of the $n$-th packet, or the virtual waiting time at time $t = t_n$, becomes
$$f_{T_n}(t) = \frac{d}{dt}\Pr[T_n \le t] = \sum_{k=0}^{\infty} f_{T_n}(t \,|\, N_S(t_n^-) = k)\, \Pr[N_S(t_n^-) = k] = \mu e^{-\mu t} \sum_{k=0}^{\infty} \frac{(\mu t)^k}{k!}\, \Pr[N_S(t_n^-) = k]$$
In Section 11.3.3, $s_k(t_n^-) = \Pr[N_S(t_n^-) = k]$ is computed in (11.27), assuming that the system starts with $j$ packets, i.e. $s_k(0) = \delta_{kj}$. In steady-state, where $t_n \to \infty$, it is shown that $s_k(t_n^-) \to (1-\rho)\rho^k$. In most cases, however, a time-dependent solution is not available in closed form. Fortunately, for Poisson arrivals, the PASTA property circumvents this inconvenience: in steady-state, $\lim_{n\to\infty} \Pr[N_S(t_n^-) = k] = \Pr[N_S = k]$, given by (14.1). The probability density function $f_T(t) = \lim_{n\to\infty} f_{T_n}(t)$ of the steady-state system time $T$ (the total waiting time of a packet in the system) is
$$f_T(t) = \mu e^{-\mu t} \sum_{k=0}^{\infty} \frac{(\mu t)^k}{k!} (1-\rho)\rho^k \quad \text{or} \quad f_T(t) = \mu(1-\rho)\, e^{-\mu(1-\rho)t} \tag{14.4}$$
In summary, the total time spent in the M/M/1 system in steady-state ($\rho < 1$) has an exponential distribution with mean $\frac{1}{\mu(1-\rho)} = \frac{1}{\mu - \lambda}$, which has been found above in (14.2) by Little's law. Similarly¹, the density of the waiting time in the M/M/1 queue is
$$f_w(t) = (1-\rho)\,\delta(t) + \lambda(1-\rho)\, e^{-\mu(1-\rho)t} \tag{14.5}$$
where the first term, with the Dirac function, reflects a zero queueing time when the system is empty, which has probability $\Pr[N_S = 0] = 1-\rho$.
14.1.3 The departure process of the M/M/1 queue

There is a remarkable theorem due to Burke which has far-reaching consequences for networks of M/M/1 queues.

Theorem 14.1.1 (Burke) In a steady-state M/M/1 queue, the departure process is a Poisson process with rate $\lambda$.

Burke's Theorem is equivalent to the statement that the interdeparture times $r_n - r_{n-1}$ in steady-state are i.i.d. exponential random variables with mean $1/\lambda$.

Proof: Let us denote the probability density function of the interdeparture time $r$ by
$$f_r(t) = \frac{d}{dt}\Pr[r \le t]$$
In steady-state, it holds in general that $\Pr[N_{S;A} = j] = \Pr[N_{S;D} = j]$, as shown in Section 13.4.1, while the PASTA property (Theorem 13.5.1) states that $\Pr[N_{S;A} = j] = \Pr[N_S = j]$. Hence, in a steady-state M/M/1 queue, departing packets see the steady-state system content, i.e. $\Pr[N_{S;D} = j] = \Pr[N_S = j]$. Moreover, in steady-state, the departure process can be decomposed into two different situations after the departure of a packet: (a) the departing packet leaves an empty system behind, or (b) the departing packet leaves at least one packet behind, in which case the server immediately starts serving the next packet,
$$\Pr[r \le t] = \Pr[r \le t \,|\, N_S = 0]\Pr[N_S = 0] + \Pr[r \le t \,|\, N_S > 0]\Pr[N_S > 0]$$
In case (a), we must wait for the next packet to arrive and to be served. This total time is the sum of an exponential random variable with rate $\lambda$ and an independent exponential random variable with rate $\mu$. It is more convenient to compute the Laplace transform, as shown in Section 3.3.1,
$$\varphi_{r|N_S=0}(s) = \int_0^{\infty} e^{-st}\, d\Pr[r \le t \,|\, N_S = 0] = \frac{\lambda}{s+\lambda}\,\frac{\mu}{s+\mu}$$
In case (b), the next packet leaves the M/M/1 queue after an exponential service time with rate $\mu$,
$$\varphi_{r|N_S>0}(s) = \int_0^{\infty} e^{-st}\, d\Pr[r \le t \,|\, N_S > 0] = \frac{\mu}{s+\mu}$$
Hence,
$$\varphi_r(s) = \varphi_{r|N_S=0}(s)\Pr[N_S=0] + \varphi_{r|N_S>0}(s)\Pr[N_S>0] = (1-\rho)\frac{\lambda}{s+\lambda}\frac{\mu}{s+\mu} + \rho\frac{\mu}{s+\mu} = \frac{\lambda}{s+\lambda}$$
which proves the theorem. $\square$

Burke's Theorem states that the steady-state arrival and departure processes of the M/M/1 queue are the same! Consequently, the steady-state departure rate equals the steady-state arrival rate $\lambda$.

¹ The Laplace transform of the waiting time in the queue follows from $T_n = w_n + x_n$ as
$$\varphi_w(s) = \frac{\varphi_T(s)}{\varphi_x(s)} = \frac{(1-\rho)(s+\mu)}{s+\mu(1-\rho)} = (1-\rho) + \frac{\lambda(1-\rho)}{s+\mu(1-\rho)}$$
which, after inverse Laplace transformation, gives (14.5).
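Burke's Theorem lends itself to a numerical sanity check. The sketch below (my own illustration, not from the book) simulates a FIFO M/M/1 queue via the recursion $d_n = \max(a_n, d_{n-1}) + x_n$ for departure epochs, and checks that the interdeparture times have the mean and variance of an exponential with rate $\lambda$.

```python
import random

def mm1_departures(lam, mu, n_packets, seed=1):
    """Simulate a FIFO M/M/1 queue and return the interdeparture times.

    Departure of packet n: d_n = max(a_n, d_{n-1}) + x_n, with Poisson
    arrivals (rate lam) and exponential service times (rate mu).
    """
    rng = random.Random(seed)
    t_arr, d_prev, gaps = 0.0, 0.0, []
    for _ in range(n_packets):
        t_arr += rng.expovariate(lam)              # next arrival epoch
        d = max(t_arr, d_prev) + rng.expovariate(mu)
        gaps.append(d - d_prev)
        d_prev = d
    return gaps

gaps = mm1_departures(lam=0.5, mu=1.0, n_packets=200_000)
n = len(gaps)
mean_gap = sum(gaps) / n
var_gap = sum((g - mean_gap) ** 2 for g in gaps) / n
# Burke: departures are Poisson with rate lam, so interdeparture times are
# exponential with mean 1/lam = 2 and variance 1/lam^2 = 4
assert abs(mean_gap - 2.0) < 0.05
assert abs(var_gap - 4.0) < 0.2
```

The variance check is the interesting one: the mean interdeparture time equals $1/\lambda$ by flow conservation alone, but matching the exponential variance $1/\lambda^2$ is a genuine (partial) confirmation of Burke's Theorem.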
14.2 Variants of the M/M/1 queue

A number of variants, readily obtained from the birth-death analogy, are worth considering here. Mainly a steady-state analysis is presented.

14.2.1 The M/M/m queue

Instead of one server, we consider the case with $m$ servers. The buffer is still infinitely long and the arrival process is Poissonian with rate $\lambda$. The M/M/m queue can model a router with $m$ physically different interfaces (or output ports) with the same transmission rate towards the same next hop. All packets destined to that next hop can be transmitted over any of the $m$ interfaces. This type of load balancing frequently occurs in the Internet. As shown in Fig. 14.1, the M/M/m system can still be described by a birth and death process with birth rate $\lambda_k = \lambda$, but with death rate $\mu_k = k\mu$ for $0 \le k \le m$ and $\mu_k = m\mu$ for $k \ge m$. Indeed, if there are $k \le m$ packets in the system, they can all be served and the departure (or death) rate from the system is $k\mu$. If there are $k > m$ packets, only $m$ of them can be served, such that the death rate is limited to the maximum service rate $m\mu$.

Fig. 14.1. The birth-death process corresponding to the M/M/m queue: birth rate $\lambda$ in every state, death rate $k\mu$ in state $k \le m$ and $m\mu$ in states $k \ge m$.

14.2.1.1 System content

From the basic steady-state relations for the birth and death process (11.15) and (11.16), we find
$$\Pr[N_S = 0] = \frac{1}{\sum_{j=0}^{m-1} \frac{\lambda^j}{j!\,\mu^j} + \frac{\lambda^m}{m!\,\mu^m}\frac{1}{1-\rho}} \tag{14.6}$$
$$\Pr[N_S = j] = \Pr[N_S = 0]\, \frac{\lambda^j}{j!\,\mu^j}, \qquad j \le m \tag{14.7}$$
$$\Pr[N_S = j] = \Pr[N_S = 0]\, \frac{m^m}{m!}\,\rho^j, \qquad j \ge m \tag{14.8}$$
The traffic intensity is $\rho = \frac{\lambda}{m\mu}$, the ratio between the average arrival rate and the average (maximum) service rate. Again, $\rho < 1$ corresponds to the stable (ergodic) regime. For the M/M/m system it is of interest to know the probability of queueing. Queueing occurs when an arriving packet finds all servers busy, which happens with probability $\Pr[N_S \ge m]$, or explicitly,
$$\Pr[N_S \ge m] = \Pr[N_S = 0]\, \frac{(\lambda/\mu)^m}{m!}\,\frac{1}{1-\rho} \tag{14.9}$$
This probability also corresponds to the situation in classical telephony where no trunk is available for an arriving call. Relation (14.9) is known as the Erlang C formula.

14.2.1.2 Waiting (or queueing) time

Instead of computing the virtual waiting time (or system time, unfinished work), we now concentrate on the waiting time of a packet in the M/M/m queue. The system time can be deduced from the basic relation $T_n = w_n + x_n$, where $x_n$ is an exponential random variable with rate $\mu$. A packet only experiences queueing if all servers are occupied. This event has probability $\Pr[N_S \ge m]$, specified by the Erlang C formula (14.9). Hence, the queueing time $w$ can be decomposed into two cases: (a) an arriving packet does not queue ($w = 0$) and (b) an arriving packet must wait in the M/M/m queue,
$$\begin{aligned}\Pr[w \le t] &= \Pr[w \le t \,|\, N_S < m]\Pr[N_S < m] + \Pr[w \le t \,|\, N_S \ge m]\Pr[N_S \ge m]\\ &= 1 - \Pr[N_S \ge m] + \Pr[w \le t \,|\, N_S \ge m]\Pr[N_S \ge m]\end{aligned} \tag{14.10}$$
It remains to compute $\Pr[w \le t \,|\, N_S \ge m]$. The reasoning is analogous to that for the M/M/1 queue. An arriving packet must wait for the packets currently under service and for the $j$ packets already in the queue before it. Thus, $w$ equals the sum of $j+1$ exponential random variables with rate $m\mu$, because with all servers busy the aggregate service rate of the M/M/m queue is $m\mu$. Hence (see Section 3.3.1), the distribution of the waiting time $w$ in the queue, given that $j$ packets are in the queue, is an Erlang distribution,
$$f_w(t \,|\, N_Q = j) = \frac{m\mu\,(m\mu t)^j}{j!}\, e^{-m\mu t}$$
Furthermore, when all servers are busy, the number of packets in the queue $N_Q$ in steady-state is related to the system content by $N_S = m + N_Q$. Using the law of total probability (2.46), the waiting time in the queue in steady-state obeys
$$f_w(t \,|\, N_S \ge m) = \sum_{j=0}^{\infty} f_w(t \,|\, N_S = m+j)\, \Pr[N_S = m+j \,|\, N_S \ge m]$$
The conditional probability $\Pr[N_S = m+j \,|\, N_S \ge m]$ follows from (2.44) and (14.8), with $\rho = \frac{\lambda}{m\mu}$, as
$$\Pr[N_S = m+j \,|\, N_S \ge m] = \frac{\Pr[N_S = m+j]}{\Pr[N_S \ge m]} = \frac{\Pr[N_S = 0]\,\frac{m^m}{m!}\rho^{m+j}}{\Pr[N_S \ge m]} = (1-\rho)\rho^j$$
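The Erlang C formula (14.9) is straightforward to evaluate. The sketch below (my own helper, with illustrative names) takes the number of servers $m$ and the offered load $a = \lambda/\mu = m\rho$ in erlangs.

```python
from math import factorial, isclose

def erlang_c(m, a):
    """Erlang C (14.9): probability that an arrival must queue in M/M/m.

    m: number of servers; a = lam/mu = m*rho: offered load in erlangs (a < m).
    """
    rho = a / m
    p0 = 1.0 / (sum(a**j / factorial(j) for j in range(m))
                + a**m / factorial(m) / (1 - rho))      # eq. (14.6)
    return p0 * a**m / factorial(m) / (1 - rho)          # eq. (14.9)

# Single server: Erlang C reduces to Pr[N_S >= 1] = rho, consistent with (14.1)
assert isclose(erlang_c(1, 0.7), 0.7)
# At the same offered load, adding servers reduces the queueing probability
assert erlang_c(4, 2.0) < erlang_c(3, 2.0)
```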
We observe from (14.1) that, if all $m$ servers are busy, the system content of an M/M/m system behaves as that of an M/M/1 system. Thus, the conditional probability density function of the waiting time in the M/M/m queue is also an exponential distribution,
$$f_w(t \,|\, N_S \ge m) = (1-\rho)\, m\mu\, e^{-m\mu t} \sum_{j=0}^{\infty} \frac{(m\mu\rho t)^j}{j!} = (1-\rho)\, m\mu\, e^{-(1-\rho)m\mu t}$$
or $\Pr[w \le t \,|\, N_S \ge m] = 1 - e^{-(1-\rho)m\mu t}$. Substitution in (14.10) finally results in the distribution of the waiting time in the queue of the M/M/m system,
$$F_w(t) = \Pr[w \le t] = 1 - \Pr[N_S \ge m]\, e^{-(1-\rho)m\mu t} \tag{14.11}$$
Since $F_w(0) = 1 - \Pr[N_S \ge m] > 0$ while obviously $F_w(0^-) = 0$, there is probability mass at $t = 0$, which is reflected by a Dirac impulse in the probability density function,
$$f_w(t) = (1 - \Pr[N_S \ge m])\,\delta(t) + (1-\rho)\, m\mu\, \Pr[N_S \ge m]\, e^{-(1-\rho)m\mu t} \tag{14.12}$$
The pdf of the system time $T = w + x$ follows after convolution of (14.12) with $f_x(t) = \mu e^{-\mu t}$ as
$$f_T(t) = (1 - \Pr[N_S \ge m])\,\mu e^{-\mu t} + \Pr[N_S \ge m]\,\frac{(1-\rho)m\mu}{1 - m(1-\rho)}\left(e^{-(1-\rho)m\mu t} - e^{-\mu t}\right) \tag{14.13}$$
and the average system time can be computed from (14.13) with (2.33), or directly from $E[T] = E[w] + E[x]$, as
$$E[T] = \frac{1}{\mu} + \frac{\Pr[N_S \ge m]}{m\mu(1-\rho)} \tag{14.14}$$
Also, in the single-server case ($m = 1$), (14.12) reduces to the pdf (14.5) of the M/M/1 queue. Furthermore, Burke's Theorem 14.1.1 can be extended to the M/M/m queue: the arrival and departure processes of the M/M/m queue are both Poisson processes with rate $\lambda$.

14.2.2 The M/M/m/m queue

The difference with the M/M/m queue is that the number of packets (calls) in the M/M/m/m queue is limited to $m$. Hence, packets (calls) arriving when $m$ are already present are lost. This situation corresponds to classical telephony, where a conversation is possible if no more than $m$ trunks are occupied; otherwise the caller hears a busy tone and the connection cannot be set up. The limitation to $m$ arrivals is modeled in the birth and death process by limiting the arrival rates: $\lambda_k = \lambda$ if $k < m$ and $\lambda_k = 0$ if $k \ge m$.
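The mean system time (14.14) can be checked against the M/M/1 result. The sketch below (illustrative names, not the book's code) evaluates (14.14) and verifies that for $m = 1$ it reduces to $E[T] = 1/(\mu - \lambda)$ of (14.2).

```python
from math import factorial, isclose

def mmm_mean_system_time(lam, mu, m):
    """Average system time (14.14) of the M/M/m queue, valid for rho < 1."""
    a, rho = lam / mu, lam / (m * mu)
    p0 = 1.0 / (sum(a**j / factorial(j) for j in range(m))
                + a**m / factorial(m) / (1 - rho))       # eq. (14.6)
    p_queue = p0 * a**m / factorial(m) / (1 - rho)        # Erlang C (14.9)
    return 1 / mu + p_queue / (m * mu * (1 - rho))        # eq. (14.14)

# For m = 1, (14.14) must reduce to the M/M/1 mean E[T] = 1/(mu - lam) = 2
assert isclose(mmm_mean_system_time(0.5, 1.0, 1), 2.0)
# Pooling the same total load over more servers shortens the mean system time
assert mmm_mean_system_time(1.0, 1.0, 2) < mmm_mean_system_time(0.5, 1.0, 1)
```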
The death rates are the same as in the M/M/m queue, $\mu_k = k\mu$ for $0 \le k \le m$. From the basic steady-state relations for the birth and death process (11.15) and (11.16), we find
$$\Pr[N_S = 0] = \frac{1}{\sum_{j=0}^{m} \frac{\lambda^j}{j!\,\mu^j}} \tag{14.15}$$
$$\Pr[N_S = j] = \Pr[N_S = 0]\,\frac{\lambda^j}{j!\,\mu^j} \quad (j \le m), \qquad \Pr[N_S = j] = 0 \quad (j > m) \tag{14.16}$$
The quantity of interest in the M/M/m/m system is the probability that all trunks (servers) are busy, which is known as the Erlang B formula,
$$\Pr[N_S = m] = \frac{\frac{\lambda^m}{m!\,\mu^m}}{\sum_{j=0}^{m} \frac{\lambda^j}{j!\,\mu^j}} \tag{14.17}$$
In practice, a telephone exchange is dimensioned (i.e. the number of lines $m$ is determined) such that the blocking probability $\Pr[N_S = m]$ is below a certain level, say below $10^{-4}$. In summary, the Erlang B formula (14.17) determines the blocking probability or loss probability (because at most $m$ calls or packets are admitted to the system), while the Erlang C formula (14.9) is the probability that a packet must wait in the (infinitely long) queue because all servers are busy.

Although the Erlang B formula (14.17) has been derived in the context of the M/M/m/m queue, it holds under much weaker assumptions, a fact already known to Erlang, as mentioned by Kelly (1991). Kelly starts his memoir on loss networks with the Erlang B formula, which Erlang obtained from his powerful method of statistical equilibrium; the latter concept is now identified as the steady-state of Markov processes. Kelly further relates the impact of the Erlang B formula from telephony to interacting particle systems and phase transitions in nature (e.g. the famous Ising model). Much effort has been devoted over time to generalizing Erlang's results as far as possible. The Erlang B formula (14.17) holds for the M/G/m/m queue as well, thus for an arbitrary service process, provided the mean service rate (per server) equals $\mu$. The proof by Gnedenko and Kovalenko (1989, pp. 237-240) is long and complicated, whereas the proof of Ross (1996, Section 5.7.2) is more elegant and is based on the time-reversed Markov chain.
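Direct evaluation of (14.17) with factorials overflows for large $m$; the standard numerically stable recursion $B_0 = 1$, $B_k = \frac{a B_{k-1}}{k + a B_{k-1}}$ avoids this. The sketch below (my own, with illustrative names) implements the recursion and checks it against the direct formula.

```python
from math import factorial, isclose

def erlang_b(m, a):
    """Erlang B blocking probability (14.17) for M/M/m/m, offered load a = lam/mu.

    Uses the numerically stable recursion B_k = a*B_{k-1}/(k + a*B_{k-1}),
    rather than raw factorials, so large m poses no overflow problem.
    """
    b = 1.0
    for k in range(1, m + 1):
        b = a * b / (k + a * b)
    return b

# The recursion agrees with a direct evaluation of (14.17)
m, a = 5, 2.0
direct = (a**m / factorial(m)) / sum(a**j / factorial(j) for j in range(m + 1))
assert isclose(erlang_b(m, a), direct)
```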
As a corollary, Ross demonstrates that the departure process (including both lost and served packets) of the M/G/m/m system is a Poisson process with rate $\lambda$.

Example 1 In the limit $m \to \infty$, expressions (14.15) and (14.6) tend to $\lim_{m\to\infty} \Pr[N_S = 0] = \exp\left(-\frac{\lambda}{\mu}\right)$, while (14.16) and (14.7) become
$$\Pr[N_S = j] = \frac{(\lambda/\mu)^j}{j!}\exp\left(-\frac{\lambda}{\mu}\right)$$
This queueing system is denoted as M/M/$\infty$. Thus, the number of packets in the M/M/$\infty$ system (in steady-state) is Poisson distributed with parameter $\lambda/\mu$. Hence, for $m \to \infty$, the average number in the system is $E[N_S] = \lambda/\mu$ (as follows from (3.11)), and the average time in the system follows from Little's theorem (13.21) as $E[T] = 1/\mu$. The fact that, if $m \to \infty$, the mean time in the M/M/$\infty$ system equals the average service time has a consistent explanation: an infinite number of servers implies infinite service capacity, so a packet never waits and the only time it spends in the system is its service time $1/\mu$.

Example 2 Consider two voice over IP (VoIP) gateways connected by a link with capacity $C$. Denote the capacity of a voice call by $C_{\mathrm{voice}}$ (in bit/s). For example, in ISDN, $C_{\mathrm{voice}} = 64$ kb/s; in general, $C_{\mathrm{voice}}$ in VoIP depends on the specifics of the codecs used. The offered voice traffic can be expressed in terms of the number $a$ of call attempts per hour and the mean call duration $d$ (in seconds) as
$$\frac{\lambda}{\mu} = \frac{a \times d}{3600}$$
The number $m$ of calls that the link can carry simultaneously is
$$m = \frac{C}{C_{\mathrm{voice}}}$$
Since the arrival process of voice calls is well modeled by a Poisson process with exponential holding times, the Erlang B formula (14.17) is applicable to compute the blocking probability or grade of service (GoS) as
$$\epsilon = \frac{\frac{r^m}{m!}}{\sum_{j=0}^{m} \frac{r^j}{j!}} \tag{14.18}$$
where $r = \frac{\lambda}{\mu} = m\rho$ and $\rho = \frac{\lambda}{m\mu}$ is the traffic intensity. Relation (14.18) specifies the probability that admission control will have to refuse a call request between the two VoIP gateways because the link is already transporting $m$ calls.
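Dimensioning as in Example 2 amounts to inverting (14.18): find the largest offered load $r$ whose blocking stays below the target GoS. Since the Erlang B formula is strictly increasing in $r$ for fixed $m$, a simple bisection suffices. The following sketch (my own, with illustrative names) reproduces the Example 2 setting of $m = 50$ trunks and a GoS of $10^{-4}$.

```python
def erlang_b(m, a):
    # Numerically stable recursion for the Erlang B formula (14.17)
    b = 1.0
    for k in range(1, m + 1):
        b = a * b / (k + a * b)
    return b

def offered_load_for_gos(m, gos):
    """Invert (14.18) by bisection: offered load r with blocking equal to gos.

    Relies on Erlang B being continuous and strictly increasing in r.
    """
    lo, hi = 0.0, float(m)
    for _ in range(100):
        mid = (lo + hi) / 2
        if erlang_b(m, mid) <= gos:
            lo = mid
        else:
            hi = mid
    return lo

# Example 2: 2 Mb/s link, 40 kb/s codec -> m = 50 trunks, target GoS 1e-4
r = offered_load_for_gos(50, 1e-4)
assert abs(erlang_b(50, r) - 1e-4) < 1e-6
assert 28 < r < 30          # the text finds r ~ 28.87 erlangs
```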
An Internet service provider can make a trade-off between the link capacity $C$ (by hiring more links or a higher-capacity link from a network provider) and the blocking probability or GoS. The latter must be small enough to keep its subscribed customers, but large enough to make a profit. A reasonable value for the GoS seems $\epsilon = 10^{-4}$. If the Internet service provider hires a 2 Mb/s link and offers its customers VoIP software with codec rate 40 kb/s (G.726 standard), then $m = 50$. Since the right-hand side of (14.18) is strictly increasing in $r$, solving equation (14.18) for $r$ yields $r \approx 28.87$, or a traffic intensity $\rho = r/m = 0.5775$. Furthermore, since $\lambda/\mu = m\rho$ and $C_{\mathrm{voice}} = 40$ kb/s, the corresponding offered bit rate is $m\rho\, C_{\mathrm{voice}} \approx 1.155$ Mb/s. If the mean call duration $d$ (in seconds) is known, the number of call attempts per hour follows as $a = 3600\,r/d$. If we assume that a telephone call lasts on average 2 minutes, $d = 120$ s, the number of call attempts per hour that the Internet service provider can handle with a GoS of $10^{-4}$ equals $a \approx 866$.

14.2.3 The M/M/1/K queue

In contrast to the basic M/M/1 queue, the M/M/1/K system cannot contain more than $K$ packets (including the packet in the server). Arriving packets that find the system completely occupied (with $K$ packets) are refused service and are to be considered lost (or marked). In the basic steady-state relations for the birth and death process (11.15), the summation is limited to $K$ instead of infinity, i.e. $\lambda_k = \lambda$ if $k < K$ and $\lambda_k = 0$ if $k \ge K$. Thus, with $\rho = \lambda/\mu$ and
$$\Pr[N_S = 0] = \frac{1}{\sum_{j=0}^{K} \rho^j} = \frac{1-\rho}{1-\rho^{K+1}}$$
the pmf of the system content of the M/M/1/K system becomes
$$\Pr[N_S = j] = \frac{(1-\rho)\rho^j}{1-\rho^{K+1}} \quad (0 \le j \le K), \qquad \Pr[N_S = j] = 0 \quad (j > K) \tag{14.19}$$
The probability that $j$ positions in the M/M/1/K system are occupied is proportional to that in the infinite system (14.1), with proportionality factor $\left(1-\rho^{K+1}\right)^{-1}$.
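The loss probability that follows from (14.19) at $j = K$ is a one-liner; the sketch below (my own helper, illustrative names) also handles the $\rho = 1$ limit, where all $K+1$ states are equally likely, and illustrates how fast loss decays with buffer size.

```python
def mm1k_loss(rho, K):
    """Packet loss probability Pr[N_S = K] of the M/M/1/K system, eq. (14.20).

    For rho = 1 the geometric form degenerates: every state has probability
    1/(K+1), which is the limit of (14.20) by l'Hospital's rule.
    """
    if rho == 1.0:
        return 1.0 / (K + 1)
    return (1 - rho) * rho**K / (1 - rho**(K + 1))

# Loss decreases with buffer size K at fixed load
assert mm1k_loss(0.5, 10) < mm1k_loss(0.5, 5)
# For rho < 1, loss behaves as (1-rho)*rho^K and vanishes geometrically in K
assert mm1k_loss(0.5, 60) < 1e-18
```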
The probability that the system is completely filled with $K$ packets equals
$$\Pr[N_S = K] = \frac{(1-\rho)\rho^K}{1-\rho^{K+1}} \tag{14.20}$$
This probability also equals the loss probability for packets in the M/M/1/K system. Regarding the QoS problem in multimedia over IP networks, a first crude estimate of the packet loss in a router with $K$ buffer positions can be derived from (14.20). The estimate is rather crude because the arrival process of packets in the Internet is likely not a Poisson process, and the variable length of the packets does not necessarily lead to an exponential service time.

14.3 The M/G/1 queue

The most general single-server queueing system with Poisson arrivals is the M/G/1 queue. The service time distribution $F_x(t)$ can be any arbitrary distribution. Due to its importance, we will derive the system content and the waiting time distribution in steady-state. In order to describe the M/G/1 queueing system, special observation points in time must be chosen such that the equations for the evolution of the number of packets in the system are most conveniently deduced. The set of departure times $\{r_n\}$ appears to be a suitable set. Any other set of observation points is likely to lead to a more complex mathematical treatment, mainly because the remaining service time of the packet currently under service is a random variable. If the M/G/1 queue is observed at departure times, the evolution of the number of packets $N_S(r_n^+)$ that departing packets leave behind is a discrete Markov chain, namely the embedded Markov chain of the continuous-time M/G/1 process. Section 13.4.1 has shown that, in steady-state, relation (13.19) tells that the distribution of the number of packets in the system observed by arrivals equals that left behind by departures. In addition, the PASTA Theorem 13.5.1 states that, in steady-state, Poisson arrivals observe the actual distribution of the number of packets in the system.
This PASTA property ensures that the embedded Markov chain, although it observes the system at departure epochs only, nevertheless provides the steady-state solution, since the arrival process is Poisson. Let us concentrate on deriving the transition probabilities that, apart from an initial state distribution, specify this embedded Markov chain entirely. With the notation of Section 13.4, $Y_k = N_S(r_k^+)$ and $A_k$ denote the number of packets in the system at the discrete time $r_k$ and the number of arrivals during the time interval $[r_k, r_{k+1}]$, respectively. The transition probability of the embedded Markov chain is
$$V_{ij} = \Pr[Y_{k+1} = j \,|\, Y_k = i]$$
and the evolution over time follows from (13.16) as
$$Y_{k+1} = (Y_k - 1)^+ + A_k \tag{14.21}$$
Hence, since $A_k \ge 0$, we see that $Y_{k+1} \ge (Y_k - 1)^+$, or that $Y_{k+1} < (Y_k - 1)^+$ is impossible. Thus $V_{ij} = 0$ for $j < i - 1$ and $i > 1$, while for $i > 0$,
$$V_{ij} = \Pr[i - 1 + A_k = j \,|\, Y_k = i] = \Pr[A_k = j - i + 1]$$
The case $i = 0$ results in $V_{0j} = \Pr[A_k = j] = V_{1j}$. Denoting $a_j = \Pr[A_k = j]$, the transition probability matrix becomes²
$$V = \begin{bmatrix} a_0 & a_1 & a_2 & a_3 & \cdots \\ a_0 & a_1 & a_2 & a_3 & \cdots \\ 0 & a_0 & a_1 & a_2 & \cdots \\ 0 & 0 & a_0 & a_1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
and the corresponding transition graph is sketched in Fig. 14.2.

Fig. 14.2. State transition graph of the M/G/1 embedded Markov chain.

The number of Poisson arrivals during a time slot $[r_k, r_{k+1}]$ clearly depends on the length of the service time $x_{k+1} = r_{k+1} - r_k$, which is distributed according to $F_x(t)$, independently of the specific packet $k$. Furthermore, the arrival process is a Poisson process with rate $\lambda$, independent of the state of the queueing process, thus $\Pr[A_k = j] = \Pr[A = j]$. Hence, using the law of total probability (2.46),
$$\Pr[A = j] = \int_0^{\infty} \Pr[A = j \,|\, x = t]\, dF_x(t) = \int_0^{\infty} e^{-\lambda t}\frac{(\lambda t)^j}{j!}\, dF_x(t)$$

² The structure of this transition probability matrix $V$ has been investigated in great depth by Neuts (1989).
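For exponential service times (the M/M/1 case), the last integral can be carried out in closed form: $\int_0^\infty e^{-\lambda t}\frac{(\lambda t)^j}{j!}\,\mu e^{-\mu t}\,dt = \frac{\mu}{\lambda+\mu}\left(\frac{\lambda}{\lambda+\mu}\right)^j$, a geometric law. The sketch below (my own illustration, not from the book) verifies the two basic sanity checks: the $a_j$ sum to one and their mean equals $\lambda E[x] = \rho$, the mean number of arrivals per service time.

```python
def a_geometric(lam, mu, j):
    """Pr[A = j] for exponential service: the integral of
    e^{-lam t}(lam t)^j/j! against mu e^{-mu t} dt reduces to a geometric law."""
    return (mu / (lam + mu)) * (lam / (lam + mu))**j

lam, mu = 0.5, 1.0
aj = [a_geometric(lam, mu, j) for j in range(200)]   # 200 terms: tail is negligible
# Normalization: the a_j form a probability distribution
assert abs(sum(aj) - 1.0) < 1e-12
# Mean number of arrivals during one service time equals lam*E[x] = rho = 0.5
assert abs(sum(j * a for j, a in enumerate(aj)) - lam / mu) < 1e-12
```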
Moreover, $V$ belongs to the class of matrices whose eigenstructure is explicitly given in Appendix A.5.3.

If we denote the Laplace transform of the service time by
$$\varphi_x(s) = \int_0^{\infty} e^{-st}\, dF_x(t)$$
then we observe that
$$a_j = \Pr[A = j] = \frac{\lambda^j}{j!}\int_0^{\infty} e^{-\lambda t}\, t^j\, dF_x(t) = \frac{(-\lambda)^j}{j!}\left.\frac{d^j \varphi_x(s)}{ds^j}\right|_{s=\lambda} \tag{14.22}$$
so that the transition probability matrix $V$ is specified. Since $a_j > 0$ for all $j \ge 0$, Fig. 14.2 indicates that all states are reachable from an arbitrary state $i$ as the Markov process evolves over time; in particular, state 0 is reached from state $i$ in at least $i$ steps. This implies that the Markov chain is irreducible, and the steady-state stability requirement $\rho < 1$ makes it ergodic. The steady-state vector $\pi$ with components $\pi_j = \Pr[N_{S;D} = j]$, where $N_{S;D} = \lim_{k\to\infty} N_S(r_k^+)$, follows from (9.22) as the solution of $\pi = \pi V$, where $V$ is a matrix of infinite dimension.

14.3.1 The system content in steady-state

Rather than pursuing the matrix analysis that is explored by Neuts (1989), we present an alternative method to determine the steady-state distribution $\pi_j$ using generating functions. The generating function approach leads in an elegant way to the celebrated Pollaczek-Khinchin equation. The probability generating function (pgf) $G_k(z)$ of a discrete random variable $G_k$ is defined in (2.18) as
$$G_k(z) = E\left[z^{G_k}\right] = \sum_{j=0}^{\infty} g_k[j]\, z^j \tag{14.23}$$
where $g_k[j] = \Pr[G_k = j]$. From (14.21), we have for the pgf $S_k(z)$ of the system content $Y_k$,
$$S_{k+1}(z) = E\left[z^{(Y_k - 1)^+ + A_k}\right] \tag{14.24}$$
Anticipating the corresponding result (14.31) derived in Section 14.4 for the GI/D/m system in discrete-time, we observe that the generating function $S_k(z)$ satisfies a formally similar equation with $m = 1$ in (14.31). This correspondence points to a more general framework because, by choosing appropriate observation points, the M/G/1 and (discrete-time) GI/D/1 systems formally obey the same equation.
Since the results deduced for the GI/D/m system are more general ($m$ servers instead of one), we content ourselves here with copying the result (14.35) derived below³ in Section 14.4,
$$S(z) = \left(1 - A'(1)\right)\frac{(z-1)A(z)}{z - A(z)}$$
We now introduce into this general equation the details of the M/G/1 queueing system by specifying $A(z) = \sum_{j=0}^{\infty} \Pr[A = j]\, z^j$. With (14.22), we find the Taylor expansion
$$A(z) = \sum_{j=0}^{\infty} \frac{(-\lambda z)^j}{j!}\left.\frac{d^j \varphi_x(s)}{ds^j}\right|_{s=\lambda} = \varphi_x(\lambda - \lambda z) \tag{14.25}$$
and the probability generating function of the system content of the steady-state M/G/1 queueing system,
$$S(z) = \left(1 + \lambda\varphi'_x(0)\right)\frac{(z-1)\,\varphi_x(\lambda - \lambda z)}{z - \varphi_x(\lambda - \lambda z)}$$
But, since $-\varphi'_x(0) = E[x] = \frac{1}{\mu}$, the average service time, we finally arrive, using (13.2), at the famous Pollaczek-Khinchin equation,
$$S(z) = (1-\rho)\frac{(z-1)\,\varphi_x(\lambda - \lambda z)}{z - \varphi_x(\lambda - \lambda z)} \tag{14.26}$$
Let us further investigate what can be concluded from the Pollaczek-Khinchin equation (14.26). First, by executing the limit $z \to 1$, we can verify that $S(1) = 1$, the normalization condition of any probability generating function. More interestingly, the average number of packets in the M/G/1 system follows, after some tedious manipulations using de l'Hospital's rule, as
$$E[N_S] = S'(1) = \rho + \frac{\lambda^2 \varphi''_x(0)}{2(1-\rho)}$$
From (2.42), $\varphi''_x(0) = E\left[x^2\right]$, so that
$$E[N_S] = \rho + \frac{\lambda^2 E\left[x^2\right]}{2(1-\rho)} \tag{14.27}$$
Hence, the average number of packets in the M/G/1 system (in steady-state) is proportional to the second moment of the service time distribution. Since $E\left[x^2\right] = \mathrm{Var}[x] + (E[x])^2$, relation⁴ (14.27) shows that, for equal average service rates, the service process with the highest variability leads to the largest average number of packets in the system.

³ Notice that the notation $A(z)$ here is different from $A(t) = \int_0^t N_A(u)\,du$ used before.
⁴ Sometimes the coefficient of variation of the service time, $C_x = \frac{\sqrt{\mathrm{Var}[x]}}{E[x]}$, is used, such that $E[N_S] = \rho + \rho^2\frac{1 + C_x^2}{2(1-\rho)}$.
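The dependence of (14.27) on the second moment is easy to see numerically. The sketch below (my own helper, illustrative names) evaluates the Pollaczek-Khinchin mean for exponential service, recovering the M/M/1 result, and for deterministic service, which has the smallest possible second moment at a given mean.

```python
from math import isclose

def pk_mean_system(lam, ex, ex2):
    """Pollaczek-Khinchin mean (14.27): E[N_S] = rho + lam^2 E[x^2]/(2(1-rho))."""
    rho = lam * ex
    return rho + lam**2 * ex2 / (2 * (1 - rho))

lam, mu = 0.5, 1.0
# Exponential service: E[x] = 1/mu, E[x^2] = 2/mu^2 -> recovers rho/(1-rho)
mm1 = pk_mean_system(lam, 1 / mu, 2 / mu**2)
assert isclose(mm1, (lam / mu) / (1 - lam / mu))
# Deterministic service: E[x^2] = (1/mu)^2, zero variance -> fewer packets
md1 = pk_mean_system(lam, 1 / mu, 1 / mu**2)
assert md1 < mm1
```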
One of the early successes of Japanese industry was the "just in time" (JIT) principle, which essentially tries to minimize the variability in a manufacturing process. Minimization of variability is also very important in the design of scheduling rules: the less variability, the more efficiently buffer places in a router are used. Since a deterministic server has the lowest variance (namely zero), the M/D/1 queue will occupy on average the lowest number of packets. This design principle was used in ATM, where all service times precisely equal the time needed to serve one ATM cell.

The average time spent in the system follows directly from Little's law (13.21),
$$E[T] = \frac{E[N_S]}{\lambda} = E[x] + \frac{\lambda E\left[x^2\right]}{2(1-\rho)}$$
and, since $E[T] = E[x] + E[w]$, the average waiting time in the queue is
$$E[w] = \frac{\lambda E\left[x^2\right]}{2(1-\rho)} \tag{14.28}$$
Observe again the general property of "averages" in queueing systems: a simple pole at $\rho = 1$. Both the average number of packets in the system (and in the queue) and the average waiting time grow unboundedly as $\rho \to 1$.

14.3.2 The waiting time in steady-state

The derivation of the pgf (14.26) of the steady-state system content $S(z)$ has not made any assumption about the service discipline that determines the order in which packets are served. The waiting time (in the queue) and the system time (total time spent in the M/G/1 queueing system) will, of course, depend on that order. As mentioned earlier, a FIFO service discipline is assumed. At each departure time $r_k$, the number of packets left behind by the $k$-th packet is precisely $N_S(r_k)$. With FIFO, this implies that during the total time $T_k$ that the $k$-th packet has spent in the M/G/1 queueing system, precisely $N_S(r_k)$ packets have arrived. Similarly as in (14.22), we compute the number of Poisson arrivals $A_T$ during $T_k$ (instead of $x_k$) and directly find, in steady-state,
$$\Pr[A_T = j] = \int_0^{\infty} e^{-\lambda t}\frac{(\lambda t)^j}{j!}\, dF_T(t) = \frac{(-\lambda)^j}{j!}\left.\frac{d^j \varphi_T(s)}{ds^j}\right|_{s=\lambda}$$
and the corresponding pgf,
$$A_T(z) = \sum_{j=0}^{\infty} \Pr[A_T = j]\, z^j = \varphi_T(\lambda - \lambda z)$$
where $\varphi_T(s)$ is the Laplace transform of the system time $T$. Since the number of Poisson arrivals $A_T$ during the system time $T$ of a packet in steady-state equals the number of packets left behind by that packet, $\Pr[A_T = j] = \Pr[N_{S;D} = j]$. The PASTA property (Theorem 13.5.1) states that, in steady-state, the number of packets in the system observed at departure or arrival times equals in distribution the actual number of packets in the system, i.e. $\Pr[N_{S;D} = j] = \Pr[N_S = j]$. By considering the pgfs of both sides, $A_T(z) = S(z)$, such that with (14.26),
$$\varphi_T(\lambda - \lambda z) = (1-\rho)\frac{(z-1)\,\varphi_x(\lambda - \lambda z)}{z - \varphi_x(\lambda - \lambda z)}$$
After the change of variable $s = \lambda - \lambda z$, we end up with the result that the Laplace transform of the total system time in steady-state is a function of the Laplace transform of the service time,
$$\varphi_T(s) = (1-\rho)\frac{s\,\varphi_x(s)}{s - \lambda + \lambda\varphi_x(s)} \tag{14.29}$$
Since $T_k = w_k + x_k$ and, in steady-state, $T = w + x$, we have $E\left[e^{-sT}\right] = E\left[e^{-sw}e^{-sx}\right] = E\left[e^{-sw}\right]E\left[e^{-sx}\right]$, where the latter follows from the independence of $x$ and $w$. Hence, $\varphi_T(s) = \varphi_x(s)\varphi_w(s)$, from which the Laplace transform of the waiting time in the queue follows as
$$\varphi_w(s) = (1-\rho)\frac{s}{s - \lambda + \lambda\varphi_x(s)} \tag{14.30}$$
Due to the correspondence with (14.26), these two relations (14.29) and (14.30) are also called Pollaczek-Khinchin equations, for the system time and the waiting time respectively. For example, for an exponential service time with average $1/\mu$, $\varphi_x(s) = \frac{\mu}{s+\mu}$ and (14.29) becomes $\varphi_T(s) = \frac{\mu(1-\rho)}{s+\mu(1-\rho)}$, which is indeed the Laplace transform of the pdf (14.4) of the total system time in the M/M/1 queue.

The relation (14.30) can be written in terms of the residual service time (8.17), whose Laplace transform is $\varphi_r(s) = \frac{1 - \varphi_x(s)}{s\,E[x]}$, as
$$\varphi_w(s) = \frac{1-\rho}{1 - \rho\,\varphi_r(s)}$$
It shows that the dominant tail behavior (see Section 5.7) arises from the pole at $\varphi_r(s) = \frac{1}{\rho}$. By formal expansion into a Taylor series (only valid for $|\rho\,\varphi_r(s)| <$
$1$), we find
$$\varphi_w(s) = (1-\rho)\sum_{k=0}^{\infty} \rho^k\, \varphi_r^k(s)$$
or, after taking the inverse Laplace transform,
$$f_w(t) = \frac{d}{dt}\Pr[w \le t] = \sum_{k=0}^{\infty} (1-\rho)\rho^k\, f_r^{(k*)}(t)$$
The pdf $f_w(t)$ of the waiting time in the queue can thus be interpreted as a sum of $k$-fold convolved residual service time pdfs $f_r^{(k*)}(t)$, weighted by $(1-\rho)\rho^k = \Pr[N_S = k]$, the steady-state probabilities (14.1) of the system content of the M/M/1 system.

14.4 The GI/D/m queue

The analysis of the GI/D/m queue illustrates a discrete-time approach to queueing. Since each of the $m$ servers operates deterministically, serving precisely one packet (an ATM cell or customer) per unit time, the basic time unit in the analysis, also called a time slot, is equal to that service time. Hence, the arrival process is expressed as a counting process: instead of specifying the interarrival distribution, the number of arrivals in each time slot is used. In the sequel, we confine ourselves to a deterministic service discipline that removes during time slot $k$ at most $m$ cells from the queue. Hence, we have $X_k = m$ and $X_k(z) = E\left[z^{X_k}\right] = z^m$. Substituting (13.16) in (14.23) leads to
$$S_{k+1}(z) = E\left[z^{(Y_k - m)^+ + A_k}\right] \tag{14.31}$$
At this point, a further general evaluation of the expression (14.31) is only possible by assuming independence between the number of arrivals $A_k$ and the system content $Y_k$. From (13.18), it then follows that $S_{k+1}(z) = Q_k(z)A_k(z)$. This crucial assumption facilitates the analysis considerably.
For,
$$S_{k+1}(z) = E\left[z^{(Y_k - m)^+}\, z^{A_k}\right] = E\left[z^{(Y_k - m)^+}\right]E\left[z^{A_k}\right] \quad \text{(by independence)}$$
$$= A_k(z)\sum_{j=0}^{\infty} \Pr\left[(Y_k - m)^+ = j\right] z^j \quad \text{(by definition (14.23))}$$
The summation can be worked out as
$$B = \sum_{j=0}^{\infty} \Pr\left[(Y_k - m)^+ = j\right] z^j = \Pr\left[(Y_k - m)^+ = 0\right] + \sum_{j=1}^{\infty} \Pr[Y_k = m+j]\, z^j = \sum_{j=0}^{m} \Pr[Y_k = j] + \sum_{j=m+1}^{\infty} \Pr[Y_k = j]\, z^{j-m}$$
Writing $B$ in terms of the generating function $S_k(z)$ yields
$$B = \sum_{j=0}^{m} \Pr[Y_k = j] + z^{-m}\left(S_k(z) - \sum_{j=0}^{m} \Pr[Y_k = j]\, z^j\right) = z^{-m}\left(S_k(z) - \sum_{j=0}^{m} \Pr[Y_k = j]\left(z^j - z^m\right)\right)$$
Finally, since the $j = m$ term vanishes, we obtain a recursion relation for the generating function of the system content of the GI/D/m system at discrete time $k$,
$$S_{k+1}(z) = A_k(z)\, z^{-m}\left(S_k(z) - \sum_{j=0}^{m-1} \Pr[Y_k = j]\left(z^j - z^m\right)\right) \tag{14.32}$$
In the single-server case $m = 1$, where precisely one cell is served per time slot (provided the queue is not empty), equation (14.32) simplifies, with $S_k(0) = \Pr[Y_k = 0]$, to
$$S_{k+1}(z) = A_k(z)\, z^{-1}\left(S_k(z) - \Pr[Y_k = 0](1-z)z^0\right) = A_k(z)\left[\frac{S_k(z) - S_k(0)}{z} + S_k(0)\right] \tag{14.33}$$
Notice that $S_{k+1}(0) = A_k(0)\left(S'_k(0) + S_k(0)\right)$.

14.4.1 The steady-state of the GI/D/m queue

The steady-state behavior is reached when the system's distributions no longer change in time. With $\lim_{k\to\infty} S_k(z) = S(z)$ and $\lim_{k\to\infty} A_k(z) = A(z)$, (14.32) reduces in steady-state to
$$S(z) = \frac{A(z)\sum_{j=0}^{m-1} \Pr[Y = j]\left(z^m - z^j\right)}{z^m - A(z)}$$
Recall that $X(z) = z^m$ is the generating function of the service process. At this point, we use the same powerful argument from complex analysis as in Section 11.3.3. Since the generating function of a probability distribution is analytic inside and on the unit circle, the possible zeros of $z^m - A(z)$ inside the unit circle must be precisely cancelled by zeros of the numerator. Clearly, $z = 1$ is a zero of $z^m - A(z)$. On the unit circle, excluding the point $z = 1$ where $A(1) = 1$, we know⁵ that $|A(z)| < 1$. The region around $z = 1$ deserves some closer investigation.
From the Taylor expansions
$$A(z) = 1 + \lambda(z-1) + o(z-1), \qquad z^m = 1 + m(z-1) + o(z-1)$$
and the fact that $\lambda < m$ (because a steady-state requires $\rho = \lambda/m < 1$), we substitute $z - 1 = \epsilon e^{i\theta}$, which describes a circle with radius $\epsilon$ around $z = 1$. Along this circle with arbitrarily small radius $\epsilon$, we find that
$$|A(z)| = \left|1 + \lambda\epsilon e^{i\theta} + o(\epsilon)\right| = \sqrt{(1 + \lambda\epsilon\cos\theta + o(\epsilon))^2 + (\lambda\epsilon\sin\theta + o(\epsilon))^2}$$
$$|z^m| = \left|1 + m\epsilon e^{i\theta} + o(\epsilon)\right| = \sqrt{(1 + m\epsilon\cos\theta + o(\epsilon))^2 + (m\epsilon\sin\theta + o(\epsilon))^2}$$
which demonstrates that $|z^m| > |A(z)|$ on this arbitrarily small circle if $\cos\theta > 0$, i.e. $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$. Invoking Rouché's Theorem 11.3.1 with $f(z) = z^m$ and $g(z) = -A(z)$ on the contour $C$, the unit circle indented at $z = 1$ by an arbitrarily small arc of radius $\epsilon$ to the right of $z = 1$ as illustrated in Fig. 14.3, such that $|f(z)| > |g(z)|$ on $C$, shows that $z^m - A(z)$ has precisely $m$ zeros $z_1, z_2, \ldots, z_{m-1}$ and $z_m = 1$ inside that contour $C$.

Fig. 14.3. Details of the contour $C$ in the neighborhood of $z = 1$.

Since $Q(z) = \frac{S(z)}{A(z)}$ is the generating function of the number of occupied buffer positions, it is also analytic inside the unit circle. Therefore, the zeros $\{z_n\}_{1 \le n \le m-1}$ and $z_m = 1$ are also zeros of $p(z) = \sum_{j=0}^{m-1} \Pr[Y = j]\left(z^m - z^j\right)$. This leads to a set of equations, one for each zero $z_n \ne 1$,
$$\sum_{j=0}^{m-1} \Pr[Y = j]\left(z_n^m - z_n^j\right) = 0$$
which, together with the normalization below, determine the unknown probabilities $\Pr[Y = j]$. Since $p(z)$ is a polynomial of degree $m$, it is entirely determined by its zeros as
$$p(z) = \alpha\,(z-1)\prod_{n=1}^{m-1}(z - z_n)$$
The unknown $\alpha$ is determined from the normalization condition $S(1) = Q(1) = 1$, which is explicitly
$$\lim_{z\to 1} \frac{p(z)}{z^m - A(z)} = 1$$
With de l'Hospital's rule,
$$\alpha = \frac{\lim_{z\to 1}\left(m z^{m-1} - A'(z)\right)}{\prod_{n=1}^{m-1}(1 - z_n)} = \frac{m - A'(1)}{\prod_{n=1}^{m-1}(1 - z_n)}$$
Finally, with $A'(1) = \lambda$, we arrive at the generating function of the buffer content via that of the system content $S(z) = A(z)Q(z)$,
$$Q(z) = \frac{(m - \lambda)(z-1)}{z^m - A(z)}\prod_{n=1}^{m-1}\frac{z - z_n}{1 - z_n} \tag{14.34}$$
In the single-server case ($m = 1$), (14.34) reduces to the well-known results for the pgf of the system and buffer content of a GI/D/1 system, respectively
$$S(z) = \left(1 - A'(1)\right)\frac{(z-1)A(z)}{z - A(z)} \tag{14.35}$$
$$Q(z) = (1-\lambda)\frac{z-1}{z - A(z)} \tag{14.36}$$
The probability of an empty buffer, $Q(0) = \Pr[N_Q = 0]$, immediately follows from (14.34). The average queue length for the single server ($m = 1$) is obtained as $Q'(1)$, or
$$E[N_Q] = \frac{A''(1)}{2(1-\lambda)} \tag{14.37}$$

14.4.2 The waiting time in a GI/D/m system

Let $T = w + 1$ denote the steady-state system time of an arbitrary packet, a "test" packet, in units of a time slot. In addition to the system content $Y$, which describes the number of packets in the system at the beginning of a time slot, an additional random variable must be introduced: $F$ denotes the number of packets that arrive in the same time slot as, but just before, the "test" packet. In the $m$-server system, and assuming a FIFO discipline, these $F$ packets will be served before the "test" packet, possibly in the same time slot. The system time of the "test" packet equals
$$T = \left\lfloor \frac{(Y - m)^+ + F}{m} \right\rfloor + 1$$
where $\lfloor x \rfloor$ denotes the largest integer smaller than or equal to $x$. Indeed, $(Y - m)^+ + F$ is the number of packets in the system, just before the arrival of the "test" packet, that must still be served before it. At the beginning of a time slot, at most $m$ packets are served, which explains the integer division. The service itself takes precisely one additional time slot.

⁵ For any probability generating function $\varphi_G(z)$ and $|z| \le 1$, $|\varphi_G(z)| = \left|\sum_{j=0}^{\infty} \Pr[G = j]\, z^j\right| \le \sum_{j=0}^{\infty} \Pr[G = j]\, |z|^j \le \sum_{j=0}^{\infty} \Pr[G = j] = 1$.
Let us simplify the notation by defining $R = (S-m)^+ + F$. From this expression for the system time, we deduce, for each integer $k \ge 1$ (the minimal system time equals 1 timeslot), that

$$\Pr[T = k] = \sum_{j=0}^{m-1}\Pr[R = (k-1)m + j]$$

such that the generating function $T(z)$ of the system time is

$$T(z) = \sum_{k=1}^{\infty}\Pr[T = k]\,z^k = \sum_{j=0}^{m-1}\sum_{k=1}^{\infty}\Pr[R = (k-1)m + j]\,z^k = \sum_{j=0}^{m-1}\sum_{k=0}^{\infty}\Pr[R = km + j]\,z^{k+1}$$

Also,

$$T(z^m) = z^m\sum_{j=0}^{m-1}z^{-j}\sum_{k=0}^{\infty}\Pr[R = km+j]\,z^{km+j} = z^m\sum_{j=0}^{m-1}z^{-j}\sum_{n=0}^{\infty}\Pr[R = n]\,z^n\sum_{k=0}^{\infty}\delta_{n,km+j}$$

where the Kronecker delta $\delta_{k,m} = 1$ if $k = m$, else $\delta_{k,m} = 0$. The sum

$$\sum_{k=0}^{\infty}\delta_{n,km+j} = \sum_{k=-\infty}^{\infty}\delta_{n-j,km} = 1_{m\,|\,n-j}$$

is one if $m$ divides $n - j$, else it is zero. Such an expression can be written as

$$\sum_{k=-\infty}^{\infty}\delta_{n-j,km} = \frac{1}{m}\sum_{k=0}^{m-1}e^{2\pi i k(n-j)/m} = \frac{1}{m}\,\frac{1 - e^{2\pi i(n-j)}}{1 - e^{2\pi i(n-j)/m}}$$

Using the latter summation yields

$$T(z^m) = \frac{z^m}{m}\sum_{k=0}^{m-1}\sum_{j=0}^{m-1}\left(z e^{2\pi i k/m}\right)^{-j}\sum_{n=0}^{\infty}\Pr[R=n]\left(z e^{2\pi i k/m}\right)^n = \frac{z^m}{m}\sum_{k=0}^{m-1}\frac{1 - z^{-m}}{1 - \left(z e^{2\pi i k/m}\right)^{-1}}\,R\!\left(z e^{2\pi i k/m}\right)$$

where we have introduced the generating function $R(z) = \sum_{n=0}^{\infty}\Pr[R = n]\,z^n$. Thus, we arrive at

$$T(z^m) = \frac{z^m - 1}{m}\sum_{k=0}^{m-1}\frac{R\!\left(z e^{2\pi i k/m}\right)}{1 - \left(z e^{2\pi i k/m}\right)^{-1}}$$

The generating function $R(z)$ can be specified further since

$$R(z) = E\!\left[z^{(S-m)^+ + F}\right] = E\!\left[z^F\right]E\!\left[z^{(S-m)^+}\right]$$

where the independence of the arrival process and the queueing process (GI) has been used. From (14.31), the corresponding steady-state relation is

$$S(z) = E\!\left[z^{(S-m)^+ + A}\right] = E\!\left[z^A\right]E\!\left[z^{(S-m)^+}\right] = A(z)\,E\!\left[z^{(S-m)^+}\right]$$

while from $S(z) = A(z)Q(z)$, we observe that $Q(z) = E\!\left[z^{(S-m)^+}\right]$. Hence,

$$R(z) = F(z)\,Q(z)$$

where $Q(z)$ is given by (14.34). We now turn our attention to the determination of $F(z)$, the generating function of the number of packets in front of the "test" packet. The "test" packet has been chosen uniformly out of the total flow of packets arriving at the system.
Let us denote by $A^*$ the number of packets arriving in the same timeslot as the "test" packet. The random variable $A^*$ is not the same as the number of arriving cells $A$ per timeslot. For example, we know that there is at least one arrival in the timeslot of the "test" packet, namely the "test" packet itself; hence, $\Pr[A^* = 0] = 0$. Furthermore, the larger the number of arriving packets in a timeslot, the higher the probability that the "test" packet is chosen out of those packets in this timeslot. Hence, $\Pr[A^* = j]$ is proportional to the number $j$ of arriving packets in a timeslot. In addition, $\Pr[A^* = j]$ is also proportional to $\Pr[A = j]$, which describes how likely a number $j$ of arriving packets is. Combining both shows that

$$\Pr[A^* = j] = \alpha\, j \Pr[A = j]$$

with the proportionality factor equal to $\alpha = \frac{1}{E[A]}$ because $\sum_{j=0}^{\infty}\Pr[A^* = j] = 1$. The "test" packet is uniformly distributed among the $A^*$ arriving packets in the timeslot of the "test" packet (in steady state). The probability of having precisely $k$ packets in front of the "test" packet given $A^* = j \ge 1$ equals

$$\Pr[F = k\,|\,A^* = j] = \frac{1_{k < j}}{j}$$

Indeed, the "test" packet has equal probability $\frac{1}{j}$ of occupying any of the $j$ possible positions. The occupation of position $k+1$ implies precisely $k$ cells in front of the "test" packet in a FIFO discipline.
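This size-biasing argument can be checked numerically. The sketch below (assuming Poisson arrivals with $\lambda = 2$ as an example) builds $\Pr[A^* = j] \propto j\Pr[A=j]$, obtains $\Pr[F=k]$ by conditioning, and finds a mean number of $\lambda/2$ packets in front of the test packet:

```python
import math

# Sketch: size-biased slot of the "test" packet, assuming Poisson(lam) arrivals.
lam, JMAX = 2.0, 80  # JMAX truncates the (fast-decaying) Poisson tail

pA = [math.exp(-lam) * lam**j / math.factorial(j) for j in range(JMAX)]
EA = sum(j * p for j, p in enumerate(pA))        # E[A] = lam

# Pr[A* = j] is proportional to j * Pr[A = j]
pAstar = [j * p / EA for j, p in enumerate(pA)]

# Pr[F = k] = sum_{j>k} Pr[F=k | A*=j] Pr[A*=j], with Pr[F=k | A*=j] = 1/j
pF = [sum(pAstar[j] / j for j in range(k + 1, JMAX)) for k in range(JMAX)]

print(round(sum(pAstar), 6))                          # -> 1.0
print(round(sum(k * p for k, p in enumerate(pF)), 6)) # E[F] = lam/2 = 1.0 here
```

The value $E[F] = \lambda/2$ agrees with differentiating the generating function $F(z)$ derived next, since $F'(1) = A''(1)/(2\lambda) = \lambda/2$ for Poisson arrivals.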
Using the law of total probability (2.46),

$$\Pr[F = k] = \sum_{j=1}^{\infty}\Pr[F = k\,|\,A^* = j]\Pr[A^* = j] = \sum_{j=k+1}^{\infty}\frac{1}{j}\,\frac{j\Pr[A = j]}{E[A]} = \frac{1}{E[A]}\sum_{j=k+1}^{\infty}\Pr[A = j]$$

The generating function $F(z)$ becomes

$$F(z) = \sum_{k=0}^{\infty}\Pr[F = k]\,z^k = \frac{1}{E[A]}\sum_{k=0}^{\infty}\sum_{j=k+1}^{\infty}\Pr[A = j]\,z^k = \frac{1}{E[A]}\sum_{j=1}^{\infty}\Pr[A = j]\sum_{k=1}^{j}z^{k-1} = \frac{1}{E[A]}\sum_{j=1}^{\infty}\Pr[A = j]\,\frac{1 - z^j}{1 - z}$$

$$= \frac{1}{E[A](z-1)}\left(\sum_{j=1}^{\infty}\Pr[A = j]\,z^j - \sum_{j=1}^{\infty}\Pr[A = j]\right) = \frac{A(z) - \Pr[A=0] - 1 + \Pr[A=0]}{E[A](z-1)}$$

Since $E[A] = A'(1) = \lambda$, we finally arrive at

$$F(z) = \frac{A(z) - 1}{(z-1)\,A'(1)}$$

Combining all involved expressions leads to the generating function of the total time spent in the GI/D/m queue,

$$T(z^m) = \frac{(m-\lambda)(z^m - 1)}{\lambda\, m}\sum_{k=0}^{m-1}\frac{A\!\left(z e^{2\pi i k/m}\right) - 1}{\left(z^m - A\!\left(z e^{2\pi i k/m}\right)\right)\left(1 - \left(z e^{2\pi i k/m}\right)^{-1}\right)}\prod_{n=1}^{m-1}\frac{z e^{2\pi i k/m} - \zeta_n}{1 - \zeta_n}$$

For the single-server case ($m = 1$), the generating function of the system time (queueing time plus service time) simplifies considerably to

$$T(z) = \frac{1-\lambda}{\lambda}\,\frac{z\left(A(z) - 1\right)}{z - A(z)}$$

from which $E[T]$ and $\mathrm{Var}[T]$ readily follow. The computation of the pdf for a given arrival process $A(z)$ is more complex, as illustrated for the M/D/1/K queue in the next section.

14.5 The M/D/1/K queue

Suppose we have a buffer of $K$ cells and an aggregate arrival stream consisting of a large number of sources, none of which dominates the others. This input process is well modeled by a Poisson process with arrival rate $\lambda$. The input process as well as the buffer content and the output process have been simulated in Fig. 14.4. Observe the effect of the variations and of the maximum number of cells in the queue and input process!

Fig. 14.4. On the left, the Poisson input process with $\lambda = 0.8$ in terms of the number of cells versus the timeslot.
In the middle, the buffer occupancy for a buffer with $K = 20$ as a function of time. On the right, the M/D/1/K output process in cells served per timeslot.

14.5.1 The pdf of the buffer occupancy in the M/D/1 queue

For a Poisson process, $\Pr[A = k] = \frac{\lambda^k}{k!}e^{-\lambda}$ and $A(z) = e^{\lambda(z-1)}$. The pgf $Q(z)$ of the buffer content immediately follows from (14.36) as

$$Q(z) = \frac{(1-\lambda)(1-z)\,e^{\lambda(1-z)}}{1 - z\,e^{\lambda(1-z)}}$$

The average queue length is obtained from (14.37) as

$$E\!\left[N_{Q;\mathrm{M/D/1}}\right] = E[Q] = \frac{\lambda^2}{2(1-\lambda)}$$

and Little's law (13.21) provides the average waiting time in the queue,

$$\frac{E\!\left[N_{Q;\mathrm{M/D/1}}\right]}{\lambda} = E\!\left[w_{\mathrm{M/D/1}}\right] = \frac{\lambda}{2(1-\lambda)} \qquad (14.38)$$

Since $z\,e^{\lambda(1-z)}$ is analytic everywhere, there always exists a neighborhood (depending on $\lambda$) around $z = 0$ for which $|z\,e^{\lambda(1-z)}| < 1$. Hence, we can use the series expansion of the geometric series to obtain

$$Q(z) = (1-\lambda)(1-z)\,e^{\lambda(1-z)}\sum_{k=0}^{\infty}z^k e^{k\lambda(1-z)} = (1-\lambda)(1-z)\sum_{k=0}^{\infty}z^k e^{(k+1)\lambda(1-z)}$$

Integrating with respect to $\lambda$ removes the factor $(1-z)$,

$$\int\frac{Q(z)}{1-\lambda}\,d\lambda = \sum_{k=0}^{\infty}\frac{z^k}{k+1}\,e^{(k+1)\lambda}e^{-(k+1)\lambda z} = \sum_{k=0}^{\infty}\frac{z^k}{k+1}\,e^{(k+1)\lambda}\sum_{n=0}^{\infty}\frac{(-1)^n(k+1)^n\lambda^n}{n!}z^n$$

$$= \sum_{k=0}^{\infty}\sum_{n=0}^{\infty}\frac{(-1)^n}{n!}\,e^{(k+1)\lambda}(k+1)^{n-1}\lambda^n\, z^{n+k}$$

After a change of variable $m = n + k$ with $m \ge 0$, which implies $k \le m$ since $n = m - k \ge 0$, we have

$$\int\frac{Q(z)}{1-\lambda}\,d\lambda = \sum_{m=0}^{\infty}\left(\sum_{k=0}^{m}\frac{(-1)^{m-k}}{(m-k)!}\,e^{(k+1)\lambda}(k+1)^{m-k-1}\lambda^{m-k}\right)z^m = \sum_{m=0}^{\infty}\left(\sum_{k=1}^{m+1}\frac{(-1)^{m+1-k}}{(m+1-k)!}\,e^{k\lambda}k^{m-k}\lambda^{m+1-k}\right)z^m$$

Differentiation with respect to $\lambda$ gives

$$Q(z) = (1-\lambda)\sum_{m=0}^{\infty}\left(\sum_{k=1}^{m+1}(-1)^{m+1-k}e^{k\lambda}\left[\frac{(k\lambda)^{m+1-k}}{(m+1-k)!} + \frac{(k\lambda)^{m-k}}{(m-k)!}\right]\right)z^m$$

from which we finally deduce the probability $q[m] = \Pr[Q = m]$ that the $m$-th position in the buffer is occupied,^6

$$q[m] = (1-\lambda)\sum_{k=1}^{m+1}(-1)^{m+1-k}e^{k\lambda}\left[\frac{(k\lambda)^{m+1-k}}{(m+1-k)!} + \frac{(k\lambda)^{m-k}}{(m-k)!}\right] \qquad (14.39)$$

where, by the convention $\frac{1}{(-1)!} = 0$, the second term is absent for $k = m+1$.
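Formula (14.39) lends itself to direct numerical evaluation. The sketch below (assuming $\lambda = 0.5$ as an example) checks that the probabilities $q[m]$ sum to one and reproduce the mean queue length $\frac{\lambda^2}{2(1-\lambda)} = 0.25$ of (14.37); because the sum alternates, double precision is reliable only for moderate $m$:

```python
import math

def md1_queue_pmf(lam, m_max):
    """Pr[N_Q = m] for the M/D/1 queue via the alternating sum (14.39).
    Numerically trustworthy only for moderate m (the series alternates)."""
    q = []
    for m in range(m_max + 1):
        s = 0.0
        for k in range(1, m + 2):
            term = (k * lam) ** (m + 1 - k) / math.factorial(m + 1 - k)
            if k <= m:  # the 1/(-1)! = 0 convention drops this part for k = m+1
                term += (k * lam) ** (m - k) / math.factorial(m - k)
            s += (-1) ** (m + 1 - k) * math.exp(k * lam) * term
        q.append((1 - lam) * s)
    return q

lam = 0.5
q = md1_queue_pmf(lam, 20)
print(round(q[0], 6))                                 # = (1-lam)*exp(lam)
print(round(sum(q), 6))                               # -> approx 1
print(round(sum(m * p for m, p in enumerate(q)), 4))  # -> lam^2/(2(1-lam)) = 0.25
```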
One readily observes from the derivation above that the probability $v[m] = \Pr[S = m]$ that $m$ positions in the system are occupied is

$$v[m] = q[m-1]$$

for $m \ge 2$, because the $z$-transform is $S(z) = (1-\lambda)\,\frac{1-z}{1 - z\,e^{\lambda(1-z)}}$. This result is a characteristic property of a deterministic server.

^6 Explicitly, we have for the queue content probabilities
$$q[0] = e^{\lambda}(1-\lambda)\qquad q[1] = e^{\lambda}(1-\lambda)\left(e^{\lambda} - \lambda - 1\right)\qquad q[2] = e^{\lambda}(1-\lambda)\left(e^{2\lambda} - (2\lambda+1)e^{\lambda} + \tfrac{\lambda^2}{2} + \lambda\right)$$
while the system content probabilities are $v[0] = 1-\lambda$, $v[1] = \left(e^{\lambda}-1\right)(1-\lambda)$ and $v[m] = q[m-1]$ for $m \ge 2$.

Next, we rewrite (14.39) as

$$q[m] = (1-\lambda)\left[\sum_{k=1}^{m+1}e^{k\lambda}\frac{(-k\lambda)^{m+1-k}}{(m+1-k)!} - \sum_{k=1}^{m}e^{k\lambda}\frac{(-k\lambda)^{m-k}}{(m-k)!}\right]$$

$$= (1-\lambda)\left[e^{(m+1)\lambda}\,g\!\left(e^{-\lambda};\,m+1\right) - e^{m\lambda}\,g\!\left(e^{-\lambda};\,m\right)\right] \qquad (14.40)$$

where

$$g(x;\,m) = \sum_{k=0}^{m}\frac{x^k\left((k-m)\lambda\right)^k}{k!} \qquad (14.41)$$

Due to the telescoping nature of the differences, we immediately find the cumulative distribution,

$$\sum_{m=1}^{K}q[m] = (1-\lambda)\left[e^{(K+1)\lambda}\,g\!\left(e^{-\lambda};\,K+1\right) - e^{\lambda}\right]$$

and, since $q[0] = (1-\lambda)e^{\lambda}$, we arrive at

$$\Pr[Q \le K] = \sum_{m=0}^{K}q[m] = (1-\lambda)\,e^{(K+1)\lambda}\,g\!\left(e^{-\lambda};\,K+1\right) \qquad (14.42)$$

The expressions in (14.40) are numerically useful only for small $m$ because the series is alternating. This problem may be solved by considering a famous result due to Lagrange (Markushevich, 1985, Vol. 2, Chapter 3, Section 14),

$$e^{\beta z} = 1 + \beta\sum_{n=1}^{\infty}\frac{(\beta + na)^{n-1}}{n!}\left(z e^{-az}\right)^n \qquad (14.43)$$

which converges for $|z| \le a^{-1}$. Differentiation of (14.43) with respect to $w = z e^{-az}$ leads to

$$\frac{e^{(\beta+a)z}}{1 - az} = \sum_{n=0}^{\infty}\frac{(\beta + a + na)^n}{n!}\,z^n e^{-naz}$$

because $\frac{d}{dw}e^{\beta z} = \frac{d e^{\beta z}}{dz}\frac{dz}{dw} = \frac{\beta\,e^{\beta z}}{(1-az)\,e^{-az}}$. Choosing $a = 1$, $\beta + a = -m$ and $z = \lambda$, we obtain

$$\frac{e^{-m\lambda}}{1-\lambda} = \sum_{n=0}^{\infty}\frac{(n-m)^n}{n!}\left(\lambda e^{-\lambda}\right)^n = g\!\left(e^{-\lambda};\,m\right) + \sum_{n=m+1}^{\infty}\frac{(n-m)^n}{n!}\left(\lambda e^{-\lambda}\right)^n$$

and thus

$$g\!\left(e^{-\lambda};\,m\right) = \frac{e^{-m\lambda}}{1-\lambda} - \sum_{n=m+1}^{\infty}\frac{(n-m)^n}{n!}\left(\lambda e^{-\lambda}\right)^n$$

where the infinite series consists of merely positive terms. Substituted in (14.42), this finally yields

$$\Pr[Q > K] = (1-\lambda)\,\lambda^{K+1}\sum_{n=1}^{\infty}\frac{n^{n+K+1}}{(n+K+1)!}\left(\lambda e^{-\lambda}\right)^n \qquad (14.44)$$
In the heavy traffic limit $\rho = \lambda \to 1$, the dominant zero (5.35) $z_0$ of $z\,e^{\lambda(1-z)} - 1$ is approximately equal to $1 + \frac{2(1-\lambda)}{\lambda}$, and the resulting tail asymptotic (5.31) for the buffer occupancy pdf is

$$\Pr[Q > K] \approx \frac{1-\lambda}{\lambda z_0 - 1}\,z_0^{-K} \approx e^{-K\log z_0} \approx e^{-\frac{2K(1-\lambda)}{\lambda}} \qquad (14.45)$$

14.6 The N*D/D/1 queue

The N*D/D/1 queue is a basic model for constant bit rate sources in ATM, as shown in Fig. 14.5. The input process consists of a superposition of $N$ independent periodic sources with the same period $D$, but randomly phased, i.e. arbitrarily shifted in time with respect to each other. The server operates deterministically and serves one ATM cell per timeslot. The buffer size is assumed to be infinite, mainly to enable an exact analytic solution. During a time period $D$, measured in timeslots or server time units, precisely $N$ cells arrive, such that the traffic intensity or load (13.2) equals $\rho = \frac{N}{D}$.

Fig. 14.5. Sketch of an ATM concentrator where $N$ input lines are multiplexed onto a single output line. The N*D/D/1 queue models this basic ATM switching unit accurately.

Whereas the arrivals in the M/D/1 queue are uncorrelated, the successive interarrival times in the N*D/D/1 queue are negatively correlated. For the same average arrival rate, this more regular arrival process results in shorter queues than in the M/D/1 queue, where the higher variability of the arrival process causes longer queues. Due to the dependence of the arrivals over many timeslots, the solution method is based on the Beneš approach and starts from the complementary distribution (13.15) of the virtual waiting time or unfinished work in steady state, $\Pr[v(t_\infty) > x] = \lim_{t\to\infty}\Pr[v(t) > x]$. Applied to the N*D/D/1 queue, the unfinished work equals the number of ATM cells in the system, thus $\Pr[v(t_\infty) > x] = \Pr[N_S > x]$. Hence, in the steady state for $\rho < 1$ or $N < D$,
$$\Pr[N_S > x] = \sum_{k=\lceil x\rceil}^{\infty}\Pr\!\left[v(t_\infty + x - k) = 0\;\middle|\;\int_{t_\infty + x - k}^{t_\infty}N_A(u)\,du = k\right]\times\Pr\!\left[\int_{t_\infty + x - k}^{t_\infty}N_A(u)\,du = k\right]$$

The periodic cell trains with period equal to $D$ timeslots at each input line lead to a periodic aggregate arrival stream of the $N$ input lines, also with period $D$. Each cell train transports precisely one cell per period $D$, which allows us to observe the characteristics of the aggregate arrival process during the time interval $[0, D)$. The computations are most conveniently performed if we choose the steady-state observation point $t_\infty = D$. Each of the $N$ ATM cells arrives uniformly in $[0, D]$ due to the random phasing of each cell train, and the probability that it arrives in $[D + x - k, D]$ is $p = \frac{k-x}{D}$. Hence, the number of arrivals in $[D + x - k, D]$ is a sum of Bernoulli random variables, which is binomially distributed,

$$\Pr\!\left[\int_{D+x-k}^{D}N_A(u)\,du = k\right] = \binom{N}{k}\left(\frac{k-x}{D}\right)^k\left(1 - \frac{k-x}{D}\right)^{N-k}$$

The conditional probability is obtained as follows. The unfinished work at time $D + x - k$ only depends on the past arrivals in the interval $[0, D + x - k]$. Given that the number of arrivals in $[D + x - k, D]$ equals $k$ while there are always precisely $N$ in $[0, D]$, the number of arrivals in $[0, D + x - k]$ equals $N - k$, and the corresponding traffic intensity $\rho' = \frac{N-k}{D+x-k} < 1$, since $N < D$ implies $N - k < D - k \le D + x - k$ for any $k$. From Section 13.3.2, we use the local stationarity result: for any stationary single-server queueing system with traffic intensity $\rho'$, the probability of an empty system at an arbitrary time is $1 - \rho'$. If we take a random point $t \in [0, D + x - k]$, then stationarity implies that $\Pr[v(t) = 0] = 1 - \rho' = \frac{D+x-N}{D+x-k}$. Since $\rho' < 1$, the system is necessarily empty at some instant $t^* \in [0, D + x - k)$. As explained in Section 13.2 and Section 13.3, we may consider that the system restarts from $t^*$ on, ignoring the past.
But the probability $\Pr[v(t) = 0]$ at a random time in the interval $[0, t^*]$ and in $[t^*, D + x - k]$ is the same, which means that $\Pr[v(D + x - k) = 0] = \frac{D+x-N}{D+x-k}$, because we can periodically repeat the system's process in $[0, t^*]$ while omitting any activity in $[t^*, D + x - k]$. With respect to this newly constructed periodic arrival pattern, the point $t = D + x - k$ is arbitrary, such that the local stationarity result is applicable. In summary, we arrive at the overflow probability in an N*D/D/1 system,

$$\Pr[N_S > x] = \sum_{k=\lceil x\rceil}^{N}\frac{D+x-N}{D+x-k}\binom{N}{k}\left(\frac{k-x}{D}\right)^k\left(1 - \frac{k-x}{D}\right)^{N-k} \qquad (14.46)$$

Observe that $\Pr[N_S > N] = 0$. Rewriting (14.46) yields

$$\Pr[N_S > x] = \frac{D+x-N}{D^N}\sum_{k=\lceil x\rceil}^{N}\binom{N}{k}(k-x)^k(D+x-k)^{N-k-1} = \frac{D+x-N}{D^N}\sum_{j=0}^{N-\lceil x\rceil}\binom{N}{j}(N-j-x)^{N-j}(D+x-N+j)^{j-1}$$

$$= \frac{D+x-N}{D^N}\sum_{j=0}^{N}\binom{N}{j}(D+x-N+j)^{j-1}(N-j-x)^{N-j} - \frac{D+x-N}{D^N}\sum_{j=N-\lceil x\rceil+1}^{N}\binom{N}{j}(D+x-N+j)^{j-1}(N-j-x)^{N-j}$$

Applying Abel's identity (Comtet, 1974, p. 128), valid for all $u, y, w$,

$$(u+y)^n = u\sum_{k=0}^{n}\binom{n}{k}(u+kw)^{k-1}(y-kw)^{n-k} \qquad (14.47)$$

with $n = N$, $u = D+x-N$, $y = N-x$ and $w = 1$ shows that the first sum equals $\frac{D^N}{D+x-N}$, and gives

$$\Pr[N_S \le x] = \frac{D+x-N}{D^N}\sum_{j=N-\lceil x\rceil+1}^{N}\binom{N}{j}(D+x-N+j)^{j-1}(N-j-x)^{N-j} \qquad (14.48)$$

demonstrating that, indeed, $\Pr[N_S \le 0] = 0$. For small $x$, relation (14.48) is convenient, while (14.46) is more suited for large $x \to N$. For example, $\Pr[N_S \le 1] = \frac{D+1-N}{D+1}\left(1 + \frac{1}{D}\right)^N$, while $\Pr[N_S > N-1] = D^{-N}$.

In the heavy traffic regime for $\rho = \frac{N}{D} \to 1$, a Brownian motion approximation (Roberts, 1991) is

$$\Pr[N_S > x] \simeq e^{-2x\left(\frac{x}{N} + 1 - \rho\right)} \qquad (14.49)$$

Figure 14.6 compares the exact overflow probability (14.46) and the Brownian approximation (14.49) for $\rho = 0.95$. Observe from (14.45) that

$$\Pr[N_S > x] \simeq e^{-\frac{2x^2}{N}}\Pr\!\left[Q_{\mathrm{M/D/1}} > x\right]$$

which shows that, for sufficiently high $N$, the overflow probability of the N*D/D/1 queue tends to that of the M/D/1 queue. Thus, an arrival process consisting of a superposition of a large number of periodic processes tends to a Poisson arrival process.
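The overflow probability (14.46) is straightforward to evaluate. The sketch below (with assumed values $N = 10$, $D = 20$, so $\rho = 0.5$) also confirms the boundary cases $\Pr[N_S > N] = 0$ and $\Pr[N_S > N-1] = D^{-N}$ mentioned above:

```python
from math import comb, ceil

def ndd1_overflow(N, D, x):
    """Overflow probability Pr[N_S > x] of the N*D/D/1 queue, Eq. (14.46)."""
    total = 0.0
    for k in range(ceil(x), N + 1):
        p = (k - x) / D  # chance that a uniformly phased cell lands in [D+x-k, D]
        total += (D + x - N) / (D + x - k) * comb(N, k) * p**k * (1 - p)**(N - k)
    return total

N, D = 10, 20  # assumed example, rho = 0.5
probs = [ndd1_overflow(N, D, x) for x in range(N + 1)]
print(probs[N])      # -> 0.0: no more than N cells can be present
print(probs[N - 1])  # -> D**-N: only one arrival pattern overflows this far
```

The values decrease monotonically in $x$, as a complementary distribution must, and can be compared directly against the Brownian approximation (14.49) in heavy traffic.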
The decaying factor $e^{-2x^2/N}$ reflects the effect of the negative correlations in the arrival process and shows that a Poisson process overestimates the tail probability in heavy traffic. Comparison of (14.46) and (14.44) for lower loads $\rho = \frac{N}{D}$ illustrates that the Poisson approximation then becomes more accurate.

Fig. 14.6. The overflow probability $\Pr[N_S > x]$ in the N*D/D/1 queue for $\rho = 0.95$ and various numbers of sources $N$ (from $N = 50$ up to $N = 5000$), comparing the exact result with the Brownian approximation; the M/D/1 curve corresponds to $N \to \infty$.

14.7 The AMS queue

Many arrival patterns in telecommunications exhibit active or "on" periods succeeded by silent, inactive or "off" periods. At the burst or flow level, phenomena on the timescale of an on-off period are dominant, and the finer details of the individual packet arrivals within an on-period can be ignored. The stream of packets can then be regarded as a continuous fluid characterized by the flow arrival rate. The AMS queue is perhaps the simplest exactly solvable queueing model that describes the queueing behavior at the burst or flow level. The AMS queue, named after Anick, Mitra and Sondhi (Anick et al., 1982; Mitra, 1988), considers $N$ homogeneous, independent on-off sources in a continuous fluid flow approach. For each source, both the on- and the off-period are exponentially distributed, which makes the model Markovian by Theorem 10.2.3. In the on-period, each source emits a unit amount of information per unit time. Hence, at each moment in time when $r$ sources are in the on-period, $r$ packets (units of information) arrive at the buffer per unit time. The service rate is constant and equal to $c < N$ packets per unit time; if $c > N$, the buffer is always empty. The time unit is chosen as the average duration of an on-period, while the average duration of an off-period is denoted by $\frac{1}{\lambda}$. The buffer size is infinitely long.
The traffic intensity then equals

$$\rho = \frac{N\lambda}{c\,(1+\lambda)}$$

and stability requires $\rho < 1$. The ratio $\frac{\lambda}{1+\lambda}$ is the long-term "on" time fraction of the sources. Suppose the number of on-sources at time $t$ is $i$. During the next time interval $\Delta t$, only two elementary transitions can take place: a new source can turn on with probability $(N-i)\lambda\Delta t$, or a source can turn off with probability $i\Delta t$. Compound events have probabilities $O(\Delta t^2)$. The probability of no change in the arrival process is $1 - [(N-i)\lambda + i]\Delta t$, during which $i$ sources are active and the buffer content changes at rate $i - c$. The AMS queueing process is a birth-death process, where state $i$ describes the number of on-sources and where the birth rate is $\lambda_i = (N-i)\lambda$ and the death rate is $\mu_i = i$. Let $P_i(t, x)$, with $0 \le i \le N$, $t \ge 0$, $x \ge 0$, be the probability that at time $t$, $i$ sources are on and the buffer content does not exceed $x$. Then we have

$$P_i(t+\Delta t, x) = [N - (i-1)]\lambda\Delta t\,P_{i-1}(t, x) + (i+1)\Delta t\,P_{i+1}(t, x) + \left[1 - \{(N-i)\lambda + i\}\Delta t\right]P_i\!\left(t,\, x - (i-c)\Delta t\right) + O(\Delta t^2)$$

Passing to the limit $\Delta t \to 0$ yields

$$\frac{\partial P_i(t,x)}{\partial t} + (i-c)\frac{\partial P_i(t,x)}{\partial x} = (N-i+1)\lambda\,P_{i-1}(t,x) - \left[(N-i)\lambda + i\right]P_i(t,x) + (i+1)\,P_{i+1}(t,x)$$

The time-independent equilibrium probabilities $\pi_i(x) = \lim_{t\to\infty}P_i(t,x)$ reflect the steady state where $i$ sources are on and the buffer content does not exceed $x$. Setting $\frac{\partial P_i(t,x)}{\partial t} = 0$, the steady-state equations become, for $0 \le i \le N$,

$$(i-c)\frac{d\pi_i(x)}{dx} = (N-i+1)\lambda\,\pi_{i-1}(x) - \left[(N-i)\lambda + i\right]\pi_i(x) + (i+1)\,\pi_{i+1}(x) \qquad (14.50)$$

In matrix notation, where $\pi(x)$ is a column vector (as opposed to Markov theory, where $\pi(x)$ is a row vector),

$$D\,\frac{d\pi(x)}{dx} = Q\,\pi(x) \qquad (14.51)$$

with $D = \mathrm{diag}[-c,\,1-c,\,2-c,\,\ldots,\,N-c]$ and $Q$ the tridiagonal $(N+1)\times(N+1)$ matrix

$$Q = \begin{bmatrix} -N\lambda & 1 & 0 & \cdots & 0 & 0\\ N\lambda & -[(N-1)\lambda+1] & 2 & \cdots & 0 & 0\\ 0 & (N-1)\lambda & -[(N-2)\lambda+2] & 3 & \cdots & 0\\ \vdots & & \ddots & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & 2\lambda & -[\lambda+(N-1)] & N\\ 0 & 0 & \cdots & 0 & \lambda & -N \end{bmatrix}$$

The buffer overflow probability is $\Pr[N_S > x] = 1 - \|\pi(x)\|_1 = 1 - \sum_{j=0}^{N}\pi_j(x)$, which implies that $\|\pi(\infty)\|_1 = 1$.
Moreover, $\lim_{x\to\infty}\frac{d\pi(x)}{dx} = 0$, and $Q\,\pi(\infty) = 0$ corresponds to the steady state of the continuous-time Markov chain (arrival process and service process as a whole). Furthermore,

$$\pi_i(\infty) = \binom{N}{i}\frac{\lambda^i}{(1+\lambda)^N} \qquad (14.52)$$

is the probability that $i$ out of $N$ sources are on simultaneously, irrespective of the buffer level in the system. As shown in Section 10.2.2, besides $\pi(x) = e^{D^{-1}Q\,x}\,\pi(0)$, the solution of (14.51) can be expressed in terms of the eigenvalues $\zeta_j$, the corresponding right-eigenvectors $x_j$ and left-eigenvectors $y_j$ of $D^{-1}Q$,

$$\pi(x) = \sum_{j=0}^{N}e^{\zeta_j x}\,x_j\left(y_j^T\,\pi(0)\right)$$

where, as shown in Appendix A.5.2.2, the eigenvalues are labeled in increasing order $\zeta_{N-[c]-1} < \cdots < \zeta_1 < \zeta_0 < \zeta_N = 0 < \zeta_{N-1} < \cdots < \zeta_{N-[c]}$. This way of labeling distinguishes between underload and overload eigenvalues. Only bounded solutions are allowed. As shown in Appendix A.5.2.2, there are precisely $N - [c]$ negative real eigenvalues, namely $\zeta_j$ with $j \in [0,\,N-[c]-1]$. In addition, $j = N$ corresponds to the eigenvalue $\zeta_N = 0$ and the eigenvector $\pi(\infty)$. The general bounded solution of (14.51) is

$$\pi(x) = \pi(\infty) + \sum_{j=0}^{N-[c]-1}a_j\,e^{\zeta_j x}\,x_j \qquad (14.53)$$

where the scalar coefficients $a_j = y_j^T\pi(0)$ still need to be determined. Rather than determining $y_j^T\pi(0)$ as in Appendix A.5.2.2, a more elegant and physical method is used. The eigenvalue solution in Appendix A.5.2.2 has scaled the eigenvectors by setting the $N$-th component equal to 1, hence $(x_j)_N = 1$. Writing the $N$-th component in (14.53) gives, with (14.52),

$$\pi_N(x) = \frac{\lambda^N}{(1+\lambda)^N} + \sum_{j=0}^{N-[c]-1}a_j\,e^{\zeta_j x} \qquad (14.54)$$

The most convenient choice of $x$ is $x = 0$. If the number of on-sources $j$ at any time exceeds the service rate $c$, then the buffer builds up and cannot be empty,

$$\pi_j(0) = 0 \qquad\text{for } [c]+1 \le j \le N$$

This observation provides one equation in (14.54) for the coefficients $a_k$,

$$\sum_{j=0}^{N-[c]-1}a_j = -\frac{\lambda^N}{(1+\lambda)^N}$$

and shows that $N - [c] - 1$ additional equations are needed to determine all coefficients $a_j$.
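The equilibrium distribution (14.52) can be checked against the birth-death structure numerically. The sketch below (with assumed values $N = 10$, $\lambda = 0.5$) computes the stationary distribution from the detailed-balance relations $\pi_{i+1}\,\mu_{i+1} = \pi_i\,\lambda_i$ and compares it with the binomial form (14.52):

```python
from math import comb

N, lam = 10, 0.5  # assumed example: 10 on-off sources, mean off-period 1/lam

# Stationary distribution of the birth-death chain with birth rate (N-i)*lam
# and death rate i, via detailed balance pi[i+1] = pi[i] * lam_i / mu_{i+1}.
pi = [1.0]
for i in range(N):
    pi.append(pi[-1] * (N - i) * lam / (i + 1))
total = sum(pi)
pi = [p / total for p in pi]

# Closed form (14.52): binomial with per-source on-probability lam/(1+lam)
pi_closed = [comb(N, i) * lam**i / (1 + lam)**N for i in range(N + 1)]

print(max(abs(a - b) for a, b in zip(pi, pi_closed)))  # -> ~0
```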
By differentiating (14.54) $p$ times and evaluating at $x = 0$, we find these additional equations,

$$\left.\frac{d^p\pi_N(x)}{dx^p}\right|_{x=0} = \sum_{j=0}^{N-[c]-1}a_j\,\zeta_j^p$$

whose left-hand sides will be determined with the help of the differential equation (14.51). Indeed, for $p = 1$, the differential equation (14.51) gives

$$\left.\frac{d\pi(x)}{dx}\right|_{x=0} = D^{-1}Q\,\pi(0)$$

The important observation is that multiplication by $D^{-1}Q$ decreases the number of zero components in $\pi(0)$ by 1, i.e. $\left.\frac{d\pi_j(x)}{dx}\right|_{x=0} = 0$ for $[c]+2 \le j \le N$. Any additional multiplication by $D^{-1}Q$ has the same effect. Since $\frac{d^p\pi(x)}{dx^p} = \left(D^{-1}Q\right)^p\pi(x)$, we thus find, for $0 \le p \le N-[c]-1$,

$$\left.\frac{d^p\pi_N(x)}{dx^p}\right|_{x=0} = 0$$

We write these $N-[c]$ equations in the unknowns $a_k$ in matrix form,

$$\begin{bmatrix}1 & 1 & 1 & \cdots & 1\\ \zeta_0 & \zeta_1 & \zeta_2 & \cdots & \zeta_{N-[c]-1}\\ \zeta_0^2 & \zeta_1^2 & \zeta_2^2 & \cdots & \zeta_{N-[c]-1}^2\\ \vdots & & & & \vdots\\ \zeta_0^{N-[c]-1} & \zeta_1^{N-[c]-1} & \zeta_2^{N-[c]-1} & \cdots & \zeta_{N-[c]-1}^{N-[c]-1}\end{bmatrix}\begin{bmatrix}a_0\\ a_1\\ a_2\\ \vdots\\ a_{N-[c]-1}\end{bmatrix} = \begin{bmatrix}-\frac{\lambda^N}{(1+\lambda)^N}\\ 0\\ 0\\ \vdots\\ 0\end{bmatrix}$$

and recognize the matrix, denoted by $V$, as a Vandermonde matrix (Section A.1, art. 5) with

$$\det(V) = \prod_{i=0}^{N-[c]-1}\;\prod_{j=i+1}^{N-[c]-1}(\zeta_j - \zeta_i)$$

Since all eigenvalues appearing in the Vandermonde matrix are distinct (Appendix A.5.2.2), $\det(V) \neq 0$, and a unique solution follows for all $0 \le j \le N-[c]-1$ from Cramer's theorem as

$$a_j = -\left(\frac{\lambda}{1+\lambda}\right)^N\prod_{i=0;\,i\neq j}^{N-[c]-1}\frac{\zeta_i}{\zeta_i - \zeta_j} \qquad (14.55)$$

Together with the exact determination of the eigenvalues $\zeta_j$ and corresponding right-eigenvectors $x_j$, explicitly given in Appendix A.5.2.2, the coefficients $a_j$ completely solve the AMS queue.
The buffer overflow probability $\Pr[N_S > x] = 1 - \sum_{j=0}^{N}\pi_j(x)$ becomes, with $\sum_{j=0}^{N}\pi_j(\infty) = 1$,

$$\Pr[N_S > x] = -\sum_{j=0}^{N-[c]-1}a_j\,e^{\zeta_j x}\sum_{l=0}^{N}(x_j)_l$$

Using the explicit form of the generating function (A.44), where the roots $r_1$ and $r_2$ belonging to eigenvalue $\zeta_j$ are specified in (A.42) and $k_j$ in (A.43), the buffer overflow probability is

$$\Pr[N_S > x] = -\sum_{j=0}^{N-[c]-1}a_j\,e^{\zeta_j x}\,(1-r_1)^{k_j}(1-r_2)^{N-k_j} \qquad (14.56)$$

For large $x$, $\Pr[N_S > x]$ will be dominated by the exponential with the largest negative eigenvalue $\zeta_0$ (for which $k_0 = N$),

$$\Pr[N_S > x] \approx -a_0\,e^{\zeta_0 x}\sum_{j=0}^{N}(x_0)_j$$

Writing that largest negative eigenvalue (A.47) in terms of the traffic intensity $\rho$ gives

$$\zeta_0 = -\frac{(1+\lambda)(1-\rho)}{1 - \frac{c}{N}}$$

From (A.49), we have $\sum_{j=0}^{N}(x_0)_j = \left(\frac{N}{c}\right)^N$. Combined with (14.55), the asymptotic formula for the buffer overflow probability becomes

$$\Pr[N_S > x] \approx \rho^N\,e^{\zeta_0 x}\prod_{i=1}^{N-[c]-1}\frac{\zeta_i}{\zeta_i - \zeta_0} \qquad (14.57)$$

Fig. 14.7. The overflow probability (14.56) in the AMS queue versus the buffer level $x$ for fixed $\lambda = \frac{1}{2}$. For each traffic intensity $\rho = 0.5$, $0.7$ and $0.9$, the upper curve corresponds to $N = 40$ and the lower to $N = 100$. The asymptotic formula (14.57) is shown as a dotted line.

Figure 14.7 shows both the exact (14.56) and the asymptotic (14.57) overflow probability as a function of $x$ for various traffic intensities and two sizes of $N$. The average off-period in Fig. 14.7 is twice the average on-period. For large values of $x$ and large traffic intensities $\rho$, the asymptotic formula is adequate; for smaller values, clear differences are observed. As mentioned in Section 5.7, the asymptotic regime that nearly coincides with (14.57) refers to the burst-scale phenomena, while the non-asymptotic regime reflects the smaller-scale variations. The AMS queue allows us to analyze the effect of the burstiness of a source by varying $\lambda$.
14.8 The cell loss ratio

Due to its importance in ATM and in future time-critical communication services, the QoS loss-performance measure, the cell loss ratio, deserves some attention. In designing a switch for time-critical services with strict delay requirements smaller than $\Delta_T$, the buffer size $K$ is dimensioned as follows. The order of magnitude of $\Delta_T$ is about 10 ms, the maximum end-to-end delay for high-quality telephony (worldwide) advised in ITU-T standards. Let $E[H]$ be the average number of hops of a path in an ATM network, which rarely exceeds 10 hops. The buffer size $K$ is determined such that the maximum waiting time of a cell never exceeds $\frac{\Delta_T}{E[H]} \ge \frac{\Delta_T}{10}$; thus, $K \le \frac{\mu\Delta_T}{10}$. For example, for STM-1 links where $C = 155$ Mb/s, we have $\mu \approx 366\,800$ ATM cells/s, such that $K \le \frac{\mu\Delta_T}{10} \approx 366$ ATM cell buffer positions. This first-order estimate shows that the ATM buffer for time-critical traffic consists of a few hundred ATM cell positions. That small number for $K$ indeed assures that the delay constraints are met, but introduces a probability of losing cells. Hence, the QoS parameter to be controlled for time-critical services is the cell loss ratio.

The cell loss ratio $clr$ is defined as the ratio of the long-run average number of cells lost because of buffer overflow to the long-run average number of cells that arrive in steady state. There are typically two different views to describe the cell loss ratio: a conservation-based and a combinatorial one. The conservation law simply states that cells entering the system must also leave it. The average number of entering cells are all those offered per timeslot minus the ones that have been rejected, thus $\lambda(1 - clr)$. On the other side, the average number of cells that leave the system is related to the server activity as $\mu(1 - q[0])$, where $\mu$ is the service rate and $q[j] = \Pr[N_Q = j]$.
Hence, we have

$$\lambda\,(1 - clr) = \mu\left(1 - q[0]\right) \qquad (14.58)$$

In the combinatorial view, only the arrival process is viewed from a position in the buffer, and the number of ways in which cells are lost is counted, leading to

$$clr = \frac{1}{A'(1)}\sum_{n=0}^{\infty}n\sum_{j=0}^{K}q[K-j]\,a[j+n] \qquad (14.59)$$

with $A'(1) = \lambda$ and $a[j] = \Pr[A = j]$. Although equation (14.58) is simple, its practical use is limited, since the quantities involved must be known with extremely high accuracy if $clr$ is of the order of $10^{-10}$, which in practice means a virtually loss-free service. Therefore, we confine ourselves to the combinatorial result and express (14.59) in terms of a generating function as

$$clr \cdot A'(1) = \left.\frac{dY(z)}{dz}\right|_{z=1} \qquad (14.60)$$

where

$$Y(z) = \sum_{n=0}^{\infty}z^n\sum_{j=0}^{K}q[K-j]\,a[j+n] = \sum_{j=0}^{K}q[K-j]\,z^{-j}\sum_{n=0}^{\infty}a[j+n]\,z^{j+n}$$

Rearranging in terms of the generating function $A(z)$ for the arrivals and $Q(z) = \sum_{j=0}^{K}q[j]\,z^j$ for the buffer occupancy, where $q[j] = 0$ for $j > K$, yields

$$Y(z) = \sum_{j=0}^{K}q[K-j]\,z^{-j}\left(A(z) - \sum_{n=0}^{j-1}a[n]\,z^n\right) = z^{-K}A(z)\sum_{j=0}^{K}q[K-j]\,z^{K-j} - \sum_{j=0}^{K}q[K-j]\,z^{-j}\sum_{n=0}^{j-1}a[n]\,z^n$$

$$= z^{-K}A(z)\,Q(z) - z^{-K}\sum_{j=0}^{K}q[j]\,z^{j}\sum_{n=0}^{K-j-1}a[n]\,z^n \qquad (14.61)$$

In order to express the cell loss ratio entirely in terms of the generating functions $A(z)$ and $Q(z)$, we employ (2.20),

$$\sum_{j=0}^{n}y[j]\,z^j = \frac{1}{2\pi i}\oint_{C(0)}Y(\omega)\sum_{j=0}^{n}\frac{z^j}{\omega^{j+1}}\,d\omega = \frac{1}{2\pi i}\oint_{C}\frac{Y(\omega)}{\omega - z}\left(1 - \left(\frac{z}{\omega}\right)^{n+1}\right)d\omega = Y(z) - \frac{1}{2\pi i}\oint_{C}\frac{Y(\omega)}{\omega - z}\left(\frac{z}{\omega}\right)^{n+1}d\omega \qquad (14.62)$$

where $C$ is a contour enclosing the origin and the point $z$ and lying within the convergence region of $Y(z)$.
Combining (14.61) and (14.62), we rewrite $Y(z)$ as

$$Y(z) = z^{-K}A(z)Q(z) - z^{-K}A(z)Q(z) + \frac{1}{2\pi i}\oint_{C}\frac{A(\omega)\,Q(\omega)}{(\omega - z)\,\omega^{K}}\,d\omega = \frac{1}{2\pi i}\oint_{C}\frac{A(\omega)\,Q(\omega)}{(\omega - z)\,\omega^{K}}\,d\omega$$

Finally, our expression for the cell loss ratio in a GI/G/1/K system reads

$$clr = \frac{1}{2\pi i\,A'(1)}\oint_{C}\frac{A(\omega)\,Q(\omega)}{(\omega - 1)^2\,\omega^{K}}\,d\omega \qquad (14.63)$$

where the contour $C$ encloses both the origin and the point $z = 1$ and lies in the convergence region of $A(z)$. Usually, $A(z)$ is known, while $Q(z)$ proves to be more complicated to obtain. The product $Q(z)A(z) = S(z)$ is the pgf of the system content. If $Q(z)$ and $A(z)$ are meromorphic functions^7 and if

$$\lim_{z\to\infty}\left|\frac{A(z)\,Q(z)}{(z-1)^2\,z^{K-1}}\right| = 0$$

the contour $C$ in (14.63) can be closed over the $|\omega| > 1$ plane to get

$$clr = -\frac{1}{A'(1)}\sum_{p}\mathrm{Res}_{\omega\to p}\,\frac{A(\omega)\,Q(\omega)}{\omega^{K}\,(\omega-1)^2} \qquad (14.64)$$

where the $p$ are the poles of $A(z)Q(z)$ outside the unit circle. If these conditions are met, a non-trivial evaluation of the cell loss ratio can be obtained. In case the buffer pgf of the finite system is known, $Q(z)$ is a polynomial of degree at most $K$, so that the only pole of $\frac{Q(z)}{z^K}$ is zero, $\lim_{z\to\infty}\frac{Q(z)}{z^K} = q[K] \le 1$, and the above conditions simplify to $\lim_{z\to\infty}\left|\frac{A(z)}{(z-1)^2}\right| = 0$. Executing (14.63) then leads to

$$clr = -\frac{1}{A'(1)}\sum_{p}\frac{Q(p)}{p^{K}\,(p-1)^2}\,\mathrm{Res}_{\omega\to p}\,A(\omega) \qquad (14.65)$$

where only the poles $p$ of the arrival process $A(z)$ play a role. For example, if the number of arrivals has a geometric distribution $a[k] = (1-\alpha)\alpha^k$ with $0 \le \alpha \le 1$ and generating function (3.6), $A_{\mathrm{geo}}(z) = \frac{1-\alpha}{1-\alpha z}$, then the conditions for (14.65) are satisfied and we obtain

$$clr_{\mathrm{geo}} = \alpha^{K}\,Q\!\left(\frac{1}{\alpha}\right)$$

^7 Functions that only have poles in the complex plane.

An important class excluded from (14.64) consists of the entire functions $A(z)$, which possess a Taylor series expansion converging for all complex $z$. The pgf of a Poisson process with parameter $\lambda$, $A_{\mathrm{Poisson}}(z) = e^{\lambda(z-1)}$, is an important representative of that class.
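For geometric arrivals, the residue result can be cross-checked against the combinatorial definition (14.59) for any valid buffer occupancy pmf. The sketch below uses an arbitrary assumed pmf $q[\cdot]$ with $\alpha = 0.4$ and $K = 4$:

```python
# Cross-check of (14.59) against clr_geo = alpha^K * Q(1/alpha) for geometric
# arrivals a[k] = (1-alpha)*alpha^k. The buffer pmf q[0..K] below is an
# arbitrary (assumed) example; the identity holds for any valid pmf.
alpha, K = 0.4, 4
q = [0.5, 0.2, 0.15, 0.1, 0.05]          # q[j] = Pr[N_Q = j], j = 0..K
a = lambda k: (1 - alpha) * alpha**k     # arrival pmf
lam = alpha / (1 - alpha)                # A'(1) = E[A]

# Combinatorial view (14.59); the n-sum is truncated once alpha^n is negligible
clr_comb = sum(n * sum(q[K - j] * a(j + n) for j in range(K + 1))
               for n in range(1, 200)) / lam

# Residue result (14.65) specialized to geometric arrivals
Qz = sum(qj * (1 / alpha)**j for j, qj in enumerate(q))
clr_res = alpha**K * Qz

print(abs(clr_comb - clr_res))  # -> ~0
```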
For a Poisson arrival process, (14.63) is

$$clr_{\mathrm{Poisson}} = \frac{e^{-\lambda}}{2\pi i\,\lambda}\oint_{C}\frac{e^{\lambda\omega}\,Q(\omega)}{(\omega - 1)^2\,\omega^{K}}\,d\omega$$

Deforming the contour to enclose the half-plane $\mathrm{Re}(\omega) < c$ yields

$$clr_{\mathrm{Poisson}} = \frac{e^{-\lambda}}{2\pi i\,\lambda}\int_{c-i\infty}^{c+i\infty}\frac{e^{\lambda\omega}\,Q(\omega)}{(\omega - 1)^2\,\omega^{K}}\,d\omega$$

where the real number $c$ exceeds unity. This expression is recognized as an inverse Laplace transform, and since the argument of the Laplace transform is a rational function, an exact evaluation is possible, leading, however, again to (14.59). Hence, the combinatorial view does not offer much immediate insight, which suggests considering a conservation-based approach. Indeed, it is well known that, owing to the PASTA property (Theorem 13.5.1), an exact expression (Syski, 1986; Bisdikian et al., 1992; Steyaert and Bruneel, 1994) in continuous time for the cell loss ratio in an M/G/1/K system can be derived, with the result

$$clr_{\mathrm{M/G/1/K;cont}} = \frac{(1-\rho)\Pr[Q > K-1]}{1 - \rho\Pr[Q > K-1]} \qquad (14.66)$$

where, as usual, the traffic intensity $\rho = \frac{\lambda}{\mu}$ and $\Pr[Q > K-1]$ is the overflow probability in the corresponding infinite system M/G/1. Transforming (14.66) to discrete time using $clr_{\mathrm{discr}} = \frac{\rho\,clr_{\mathrm{cont}}}{1 - clr_{\mathrm{cont}}}$ yields

$$clr_{\mathrm{M/G/1/K;discr}} = \frac{\rho\,(1-\rho)\Pr[Q > K-1]}{1 - \Pr[Q > K-1]} \qquad (14.67)$$

14.9 Problems

(i) A router processes data packets 80% of the time. On average, 3.2 packets are waiting for service. What is the mean waiting time of a packet, given that the mean processing time equals $\frac{1}{\mu}$?

(ii) Compute in a M/M/m/m queue the average number of busy servers.

(iii) Let us model a router by a M/M/1 system with average service time equal to 0.5 s.

(a) What is the relation between the average response time (average system time) and the arrival rate $\lambda$?
(b) How many jobs/s can be processed for a given average response time of 2.5 s?
(c) What is the increase in average response time if the arrival rate increases by 10%?

(iv) Assume that a company has a call center with two phone lines for service. During some measurements, it was observed that both lines are busy 10% of the time.
Moreover, the average call holding time was 10 minutes. Calculate the call blocking probability in the case that the average call holding time increases from 10 minutes to 15 minutes. Call arrivals are Poissonian with constant rate.

(v) Consider a queueing network with Poisson arrivals consisting of two infinitely long single-server queues in tandem with exponential service times. We assume that the service times of a customer at the first and second queue are mutually independent, as well as independent of the arrival process. Let the rate of the Poisson arrival process be $\lambda$, and let the mean service rates at queues 1 and 2 be $\mu_1$ and $\mu_2$, respectively. Moreover, assume that $\lambda < \mu_1$ and $\lambda < \mu_2$. Give the probability that in steady state there are $n$ customers at queue 1 and $m$ customers at queue 2.

(vi) Let us consider the following simple design question: which queue of the M/M/m family is most suitable if the arrival rate is $\lambda$ and the required service rate is $k\mu$, with $k > 1$? We have the three options illustrated in Fig. 14.8 at our disposal.

Fig. 14.8. Three different options: (A) one M/M/1 queue with service rate $k\mu$, (B) $k$ M/M/1 queues, each with service rate $\mu$ and arrival rate $\lambda/k$, and (C) one M/M/k queue with service rate $\mu$ per server.

All queues have infinite buffers and the same traffic intensity $\rho = \frac{\lambda}{k\mu}$, and thus the same throughput. The QoS qualifier of interest here is the delay, more precisely the system time $T$ of a packet. Compare the average system time and draw conclusions.

(vii) An aeroplane takes exactly 5 minutes to land after the airport's traffic control has sent the signal to land. Aeroplanes arrive at random with an average rate of 6 per hour. How long can an aeroplane expect to circle before getting the signal to land?
(Only one aeroplane can land at a time.)
(viii) There are two kinds of connection requests arriving at a base station of a mobile telephone network: connection requests generated by new calls (that originate from the same cell as the base station) and handovers (that originate from a different cell, but are transferred to the cell of the base station). The handovers are supposed not to experience blocking. Therefore, the base station has to reject some of the new call connection requests. Every accepted connection request occupies one of the M available channels. During a busy hour, the average measured channel occupation time of a call is 1.64 minutes, irrespective of the type of call. Furthermore, the average number of active calls is 52 and the measured blocking is 2% of the number of all connection requests. The average interarrival time between two consecutive new call connection requests in the cell is 3 seconds.
(a) Calculate the arrival rate (in calls/minute) of the handover calls.
(b) What is the percentage of new calls that are blocked?
(ix) Let N denote the number of Poisson arrivals with rate λ during the service time x (a random variable) of a packet. Assume that the Laplace transform of the service time, φ_x(s) = E[e^{−sx}], is known.
(a) Show that the pgf of N is given by φ_x(λ(1−z)).
(b) What is the pgf if the service time x is exponentially distributed with mean 1/μ? Deduce from this the distribution of N.
(x) A single-server queue has exponential inter-arrival and service times with means λ^{−1} and μ^{−1}, respectively. New customers are sensitive to the length of the queue. If there are i customers in the system when a customer arrives, then that customer will join the queue with probability (i+1)^{−1}; otherwise, he/she departs and does not return. Find the steady-state probability distribution of this queueing system.
(xi) The M/M/m/m/s queue (the Engset formula).
Consider a system with m connections and s customers who all desire to telephone and, hence, need to obtain a connection or line. Each customer can occupy at most one line. The group of s customers consists of two subgroups. When a line has been assigned to a customer, this customer is transferred from the "still demanding subgroup" to the "served group". The number of call attempts decreases with the size of the "served group", whose members all occupy one line. More precisely, the arrival rate in the Engset model is proportional to the size of the "still demanding subgroup" and the interarrival times are exponential. The holding time of a line is also exponentially distributed, with mean 1/μ.
(a) Describe the M/M/m/m/s queue as a birth-death process.
(b) Compute the steady state.
(c) Compute the blocking probability (similar to the blocking in the Erlang model).
(xii) Compare the cell loss ratio of the M/M/1/K queue and of the discrete-time M/D/1/K queue using the dominant pole approximation of Section 5.7. Hint: approximate the cell loss ratio by the overflow probability.

Part III Physics of networks

15 General characteristics of graphs

The structure or interconnection pattern of a network can be represented by a graph. Properties of the graph of a network often relate to performance measures or specific characteristics of that network. For example, routing is an essential functionality in many networks. The computational complexity of shortest path routing depends on the hopcount in the underlying graph. This chapter mainly focuses on general properties of graphs that are of interest to Internet modeling. Mainly driven by the Internet, a large impetus from different fields in science makes the understanding of the growth and the structure of graphs one of the currently most studied and exciting research areas.
The recent books by Barabási (2002) and Dorogovtsev and Mendes (2003) nicely reflect the current state of the art in stochastic graph theory and its applications to, for example, the Internet, the World Wide Web, and social and biological networks.

15.1 Introduction

Network topologies as drawn in Fig. 15.1 are examples of graphs. A graph G is a data structure consisting of a set of V vertices connected by a set of E edges. In stochastic graph theory and communications networking, the vertices and edges are called nodes and links, respectively. In order to differentiate from the expectation operator E[·], the set of links is denoted by L and the number of links by L and, similarly, the set of nodes by N and the number of nodes by N. Thus, the usual notation of a graph G(V, E) in graph theory is here denoted by G(N, L). The full mesh or complete graph K_N consists of N nodes and L = L_max = N(N−1)/2 links, where every node has a link to every other node. The graph that is generated by the statement "any i is directly connected to any j" in a population of N members is a complete graph K_N. [Fig. 15.1. Several types of network topologies or graphs: full mesh (complete graph), star, ring, 2D (square) lattice, and tree (a connected, loopless graph).] Since in K_N the number of links L_max = O(N²) for large N, it demonstrates "Metcalfe's law": the value of networking increases quadratically in the number of connected members. The interconnection pattern of a network with N nodes can be represented by an adjacency matrix A consisting of elements a_ij that are either one or zero depending on whether there is a link between nodes i and j or not. The adjacency matrix is a real symmetric N × N matrix when we assume bi-directional transport over links: if there is a link from i to j (a_ij = 1), then there is a link from j to i (a_ji = 1) for any j ≠ i. Moreover, we exclude self-loops (a_jj = 0) and multiple links between two nodes i and j.
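This adjacency-matrix bookkeeping can be sketched in a few lines of Python (a toy graph with hypothetical node labels 0..4, not an example from the text); the sketch enforces the symmetry a_ij = a_ji, excludes self-loops, and checks the basic law of the degree, Σ_j d_j = 2L, discussed in Section 15.3.

```python
# A small undirected graph on N = 5 nodes stored as an adjacency matrix A:
# A[i][j] = 1 if there is a link between nodes i and j, and 0 otherwise.
N = 5
A = [[0] * N for _ in range(N)]

def add_link(i, j):
    """Bi-directional transport: keep A symmetric; self-loops are excluded."""
    assert i != j
    A[i][j] = A[j][i] = 1

# hypothetical link set
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]:
    add_link(i, j)

L = sum(A[i][j] for i in range(N) for j in range(i + 1, N))  # number of links
degrees = [sum(row) for row in A]                            # degree d_j of node j
assert sum(degrees) == 2 * L  # each link is counted twice in the degree sum
```

The symmetric update in `add_link` is exactly the bi-directional transport assumption of the text.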
More properties of the adjacency matrix of a graph are found in Appendix B. A walk from node A to node B with k−1 hops or links is the node list W_{A→B} = n₁ → n₂ → ⋯ → n_{k−1} → n_k, where n₁ = A and n_k = B. A path from node A to node B with k−1 hops or links is the node list P_{A→B} = n₁ → n₂ → ⋯ → n_{k−1} → n_k, where n₁ = A, n_k = B and n_j ≠ n_i for each pair of indices i and j. Sometimes the shorter notation P_{A→B} = n₁n₂⋯n_{k−1}n_k is used. All links n_i → n_j and all nodes n_j in the path P_{A→B} are different, whereas in a walk W_{A→B} no restriction is put on the node list. If the starting node A equals the destination node B, the path P_{A→A} is called a cycle or loop. In telecommunications networks, paths, and not walks, are the basic entities in connecting two communicating parties. Two paths between A and B are node(link)-disjoint if they have no nodes (links) in common. Apart from the topological structure specified via the adjacency matrix A, the link between nodes i and j is further characterized by a link weight w(i→j), most often a positive real number¹ that reflects the importance of that particular link. Often symmetry in both directions, w(i→j) = w(j→i), is assumed, leading to undirected graphs. Although this assumption seems rather trivial, we point out that, in telecommunications, the transport of information in the up-link and down-link is, in general, not symmetrical. Via measurements in the Internet, Paxson (1997) found in 1995 that about 50% of the paths from A → B were different from those from B → A. Furthermore, it is often assumed that the link metric w(i→j) is independent of w(k→l) for all links (i→j) different from (k→l).

¹ In quality of service routing, a link is specified by a vector w(i→j) with positive components, each reflecting a metric (such as delay, jitter, loss, monetary cost, administrative weight, physical distance, available capacity, priority, etc.).
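The walk/path distinction is easy to make operational; a minimal sketch (a hypothetical 4-node graph with an undirected link set, not one of the book's figures) checks a node list against the links:

```python
def is_walk(node_list, links):
    """A walk only requires consecutive nodes in the list to be linked."""
    linkset = {frozenset(l) for l in links}
    return all(frozenset((a, b)) in linkset for a, b in zip(node_list, node_list[1:]))

def is_path(node_list, links):
    """A path is a walk in which all nodes are different."""
    return is_walk(node_list, links) and len(set(node_list)) == len(node_list)

links = [(0, 1), (1, 2), (2, 0), (2, 3)]   # hypothetical 4-node graph
walk = [0, 1, 2, 0, 2, 3]                  # revisits nodes 0 and 2
assert is_walk(walk, links) and not is_path(walk, links)
assert is_path([0, 1, 2, 3], links)
```

Using `frozenset` link keys encodes the undirected (symmetric weight) assumption of the text.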
In the Internet's intra-domain routing protocol, the Open Shortest Path First (OSPF) protocol, network operators have the freedom² to specify the link weight w(i→j) > 0 on the interfaces of their routers.

15.2 The number of paths with j hops

Let X_j(A→B; N) denote the number of paths with j hops between a source node A and a destination node B. The most general expression for the number of paths with j hops between node A and node B is

X_j(A→B; N) = Σ_{k₁∉{A,B}} Σ_{k₂∉{A,k₁,B}} ⋯ Σ_{k_{j−1}∉{A,k₁,…,k_{j−2},B}} 1_{A→k₁} · 1_{k₁→k₂} ⋯ 1_{k_{j−1}→B}   (15.1)

where 1_x is the indicator function. The number of paths with one hop equals X₁(A→B; N) = 1_{A→B}. The maximum number of j-hop paths is attained in the complete graph K_N, where 1_{k₁→k₂} = 1 for each link k₁ → k₂, and equals

max(X_j(A→B; N)) = (N−2)!/(N−j−1)!   (15.2)

The maximum number of hops in any path is N−1. This maximum occurs, for example, in a line graph, where the path runs from the one extreme node to the other, or in a ring (see Fig. 15.1) between neighboring nodes, where there is a one-hop and an (N−1)-hop path. The total number of paths P_N between two nodes in the complete graph is

P_N = Σ_{j=1}^{N−1} max(X_j(A→B; N)) = Σ_{j=1}^{N−1} (N−2)!/(N−j−1)! = (N−2)! Σ_{k=0}^{N−2} 1/k! = (N−2)! e − R

where

R = (N−2)! Σ_{j=N−1}^{∞} 1/j! = 1/(N−1) + 1/((N−1)N) + 1/((N−1)N(N+1)) + ⋯ < Σ_{j=1}^{∞} (1/(N−1))^j = 1/(N−2)

implying that, for N ≥ 3, R < 1. But P_N is an integer. Hence, the total number of paths in K_N is exactly equal to

P_N = [e(N−2)!]   (15.3)

where e = 2.718281… and [x] denotes the largest integer smaller than or equal to x.

² In Cisco's OSPF implementation, it is suggested to use w(i→j) = 10⁸/B(i→j), where B(i→j) denotes the capacity (in bit/s) of the link between nodes i and j. An approach to optimize the OSPF weights to reflect actual traffic loads is presented by Fortz and Thorup (2000).
Since any graph is a subgraph of the complete graph, the maximum total number of paths between two nodes in any graph is upper bounded by [e(N−2)!].

15.3 The degree of a node in a graph

The degree d_j of a node j in a graph G(N, L) equals the number of its neighboring nodes, and 0 ≤ d_j ≤ N−1. Clearly, the node j is disconnected from the rest of the graph if d_j = 0. Hence, in connected graphs, 1 ≤ d_j ≤ N−1. The basic law for the degree (see also Appendix (B.2)) is

Σ_{j=1}^{N} d_j = 2L

since each link belongs to precisely two nodes and, hence, is counted twice. In directed graphs, the in(out)-degree is defined as the number of in(out)going links at a node, while the sum of the in- and out-degree equals the degree. The minimum nodal degree in the graph G is denoted by d_min = min_{j∈G} d_j. The average degree of a graph is defined as d_a = (1/N) Σ_{j=1}^{N} d_j = 2L/N, which is, for a connected graph, bounded by 2 − 2/N ≤ d_a ≤ N−1. The lower bound is obtained for any spanning tree, a graph that connects all nodes, contains no cycles and has L = L_min = N−1. The upper bound is reached in the complete graph K_N with L_max = N(N−1)/2. Graphs where d_min = d_a, such as K_N and the ring topology in Fig. 15.1, are called regular graphs, since any node has precisely d_a links. Sometimes networks are classified either as dense, if d_a is high, or as sparse, if d_a is small. [Fig. 15.2. Degree graph with τ = 2.4 and N = 300. All nodes are drawn on a circle.] For instance, the Internet is sparse with average degree d_a ≈ 3, although some backbone routers may have a much higher degree, even exceeding 100. The distribution of the degree D_Internet of an arbitrary node in the Internet is shown to be approximately polynomial (Siganos et al., 2003),

Pr[D_Internet = k] ≈ k^{−τ}/ζ(τ)   (15.4)

with³ τ ∈ (2.2, 2.5), where ζ(s) = Σ_{k=1}^{∞} k^{−s} for Re(s) > 1 is the Riemann zeta function (Titchmarsh and Heath-Brown, 1986). A graph of this class is called a degree graph.

³ A more general expression than (15.4) is Pr[d_j = k] = c k^{−τ} g(k), where c is a normalization constant and where g(k) is a slowly varying function (Feller, 1971, pp. 275–284) with the basic property that lim_{t→∞} g(tx)/g(t) = 1 for every x > 0.
Figures 15.2 and 15.3 show two instances of a degree graph. [Fig. 15.3. Degree graph with τ = 2.4 and N = 200. The higher degree nodes are put inside the circle.] Also the web graph, consisting of websites and hyperlinks, features a power law for the in-degree. David Aldous has given the following argument why a power law of the in-degree of the web graph is natural. To a good approximation, the number of websites is growing exponentially at rate λ > 0. This means that the lifetime T of a random website satisfies Pr[T > t] ≈ e^{−λt}. Let l(u) denote the number of links into a site at time u after its creation. At observation time t, the distribution of the number of links X into a random website is, by the law of total probability,

Pr[X > k] = ∫₀^t (d Pr[T ≤ u]/du) Pr[X > k | T = u] du ≈ ∫₀^t λe^{−λu} Pr[X > k | T = u] du
≈ ∫₀^t λe^{−λu} 1_{l(u)>k} du = ∫_{l^{−1}(k)}^t λe^{−λu} du = e^{−λ l^{−1}(k)} − e^{−λt}

Only if l increases exponentially fast, as l(u) ≈ e^{αu} for some α < λ, does a power law behavior of the in-degree, Pr[X > k] ≈ k^{−λ/α}, arise for sufficiently large t. For a polynomial growth l(u) ≈ u^a and large t, Pr[X > k] ≈ e^{−λk^{1/a}}. The large difference in the decrease of Pr[X > k] with k between both examples illustrates the importance of the growth law of l(u). The argument shows that a polynomial scaling law, commonly referred to as a power law, is a natural consequence of exponential growth. An exponential growth possesses the property that dl(u)/du ∝ l(u), which is established by preferential attachment. Preferential attachment means that new links are, on average, added to sites proportionally to their size. The more links a site has, the larger the probability that a new link attaches to this site.
For example, already popular websites are increasingly more often linked to than small or less popular websites. Since many aspects of the Internet, such as the number of IP packets, the number of users, the number of websites, the number of routers, etc., are currently growing approximately exponentially fast, the often observed power laws are more or less expected.

15.4 Connectivity and robustness

A graph G is connected if there is a path between each pair of nodes, and disconnected otherwise. A telecommunication network should be connected. Moreover, it is essential that the network be robust: it should still operate if some of the links between routers or switches are broken or temporarily blocked by other calls. Hence, the network graph should possess a redundancy of links. The minimum number of links to connect all nodes in the network equals N−1. This minimum configuration is called redundancy level 1. In general, a redundancy level of D is defined by Baran (2002) as the link-to-node ratio in an infinite D-lattice⁴. A redundancy level of at least 3 is regarded as a highly robust network. A consequence of this insight has been employed in the design of the early Internet (Arpanet): it would be theoretically possible to build extremely reliable communication networks out of unreliable links by the proper use of redundancy. Another, more timely, application of the same principle is the design of reliable ad-hoc and sensor networks.

⁴ A D-lattice is a graph where each nodal position corresponds to a point with integer coordinates within a D-dimensional hyper-cube with size Z. Apart from the border nodes, each node has the same degree, equal to 2D. The number of nodes equals N = Z^D. From (B.2), the link-to-node ratio follows as

L/N = (1/(2N)) Σ_{j=1}^{N} d_j = D − r

where the correction r = O(N^{−1/D}) is due to the border nodes.
For an infinite D-lattice, we take the limit Z → ∞ (which implies N → ∞) and obtain lim_{Z→∞} L/N = D.

There exist interesting results from graph theory that help to dimension a reliable telecommunication network. Instead of the redundancy level, the edge and vertex connectivity seem more natural quantifiers from which robustness can be derived. The edge connectivity λ(G) of a connected graph G is the smallest number of edges (links) whose removal disconnects G. The vertex connectivity κ(G) of a connected graph different⁵ from the complete graph K_N is the smallest number of vertices (nodes) whose removal disconnects G. [Fig. 15.4. An example of the edge connectivity (λ(G) = 1) and the vertex connectivity (κ(G) = 1) of a graph on the nodes A–F.] These definitions are illustrated in Fig. 15.4. For any connected graph G it holds that

κ(G) ≤ λ(G) ≤ d_min(G)   (15.5)

In particular, if G is the complete graph K_N, then κ(K_N) = λ(K_N) = d_min(K_N) = N−1. Due to the importance of the inequality⁶ (15.5), it deserves some more discussion. Let us concentrate on a connected graph G that is not a complete graph. Since d_min(G) is the minimum degree of a node, say n, in G, removing all links of node n disconnects G. By definition, since λ(G) is the minimum number of links whose removal leads to disconnectivity, it follows that λ(G) ≤ d_min(G), and λ(G) ≤ N−2 because G is not a complete graph, so that the minimum nodal degree is at most N−2. Furthermore, the definition of λ(G) implies that there exists a set S of λ(G) links whose removal splits the graph G into two connected subgraphs G₁ and G₂, as illustrated in Fig. 15.5. Any link of that set S

⁵ The complete graph K_N cannot be disconnected by removing nodes, and we define κ(K_N) = N−1 for N ≥ 3.
⁶ A second general inequality (B.23) relates the second smallest eigenvalue of the Laplacian to the edge and vertex connectivity (see Section B.4).
connects a node in G₁ to a node in G₂. Indeed, adding an arbitrary link of that set again makes G connected. But G can also be disconnected into the same two connected subgraphs by removing nodes in G₁ and/or G₂. Since possible disconnectivity inside either G₁ or G₂ can occur before λ(G) nodes are removed, it follows that κ(G) cannot exceed λ(G), which establishes the inequality (15.5).

[Fig. 15.5. A graph G with N = 16 nodes and L = 32 links. Two connected subgraphs G₁ and G₂ are shown. The graph's connectivity parameters are κ(G) = 1 (removal of node C), λ(G) = 2 (removal of the links from C to G₁), d_min(G) = 3 and d_a = 2L/N = 4.]

Let us proceed to find the number of link-disjoint paths between A and B in a connected graph G. Suppose that H is a set of links whose removal separates A from B. Thus, the removal of all links in the set H destroys all paths from A to B. The maximum number of link-disjoint paths between A and B cannot exceed the number of links in H. However, this property holds for any such set H, and thus also for the set with the smallest possible number of links. A similar argument applies to node-disjoint paths. Hence, we end up with Theorem 15.4.1:

Theorem 15.4.1 (Menger's Theorem) The maximum number of link(node)-disjoint paths between A and B is equal to the minimum number of links (nodes) separating A and B.

Recall that the edge connectivity λ(G) (analogously, the vertex connectivity κ(G)) is the minimum number of links (nodes) whose removal disconnects G. By Menger's Theorem, it follows that there are at least λ(G) link-disjoint paths and at least κ(G) node-disjoint paths between any pair of nodes in G. In order to dimension the graph G of a robust telecommunications network, the goal is to maximize both λ(G) and κ(G). Of course, the most reliable graph is the complete graph; however, it is also the most expensive.
Usually, since the cost of digging and of installing/connecting the fibres is around 70% of the total network cost, the total number of links L is minimized. Since the minimum cannot exceed the average, we have that d_min ≤ d_a = 2L/N. From (15.5), it follows that the best possible reliability is achieved if the network graph is designed such that

κ(G) = λ(G) = d_min(G) = 2L/N

The optimum implies that d_min(G) = d_a = 2L/N, or that each node has the same degree d_j = d_a. Hence, a best possible reliable graph is a regular graph (d_j = d_a), but not every regular graph necessarily obeys κ(G) = λ(G) = d_a. Furthermore, two different graphs with the same parameters N, L, λ(G), κ(G) and d_min(G) are not necessarily equally reliable. Indeed, the edge and vertex connectivity only give the minimum numbers λ(G) and κ(G), respectively, but do not give information about the number of subsets in G that lead to these numbers. It is clear that, if only one vulnerable set of nodes is responsible for a low κ(G), while in another graph there are more such sets, the first graph is more reliable than the second one. In summary, the presented simplified analysis gives some insights, but more details (e.g. the number of vulnerable sets or subgraphs) must be considered in a dimensioning study.

15.5 Graph metrics

An important challenge in the modeling of a network is to determine the class of graphs that best represents the global and local structure of the network. Most of the valuable networks, like the Internet, road infrastructures, neural networks in the human brain, social networks, etc., are large and change over time. In order to classify graphs, a set of distinguishing properties, called metrics, needs to be chosen. These metrics are, in general, functions of the graph's structure G(N, L). Natural metrics of a graph are the degree distribution and the hopcount distribution of an arbitrary path.
Beside quantities such as the diameter and the complexity of a graph defined in algebraic graph theory (Appendix B), some other metrics are the clustering coefficient, the expansion, the resilience, the distortion and the betweenness. The clustering coefficient c_G(v) characterizes the density of connections in the environment of a node v and is defined as the ratio of the number of links y connecting the d_v neighbors of v over the total possible d_v(d_v−1)/2,

c_G(v) = 2y/(d_v(d_v−1))   (15.6)

The expansion e_G(h) of a graph reflects the number of nodes that can be reached in h hops from a node v,

e_G(h) = (1/N²) Σ_{v∈N} |C_v(h)|   (15.7)

where C_v(h) is the set of nodes that can be reached in h hops from a node v and |A| represents the number of elements in the set A. We can interpret C_v(h) geometrically as a ball centered at node v with radius h. The resilience r_G(m) measures the connectivity or robustness of a graph. Let m = |C_v(h)| denote the number of nodes in a ball centered at node v and with radius h, and define l(v, m) as the number of links that need to be removed to split C_v(h) into two sets with roughly equal numbers of nodes (around m/2). The resilience r_G(m) of a graph is

r_G(m) = (1/L) Σ_{v∈N} l(v, m)   (15.8)

The distortion t_G(m) measures how closely the graph resembles a tree and is defined as

t_G(m) = (1/N) Σ_{v∈N} w(C_v(h))   (15.9)

where w(G) is the value of the minimum spanning tree in G with unit link weight w(i→j) = 1 for each link of G. Consider a flow with a unit amount of traffic between each pair of nodes in the graph G. Each flow between a node pair follows the shortest path between that node pair. The betweenness B of a link (node) is defined as the number of shortest paths between all possible pairs of nodes in G that traverse the link (node). If H_{i→j} denotes the number of hops in the shortest path from i → j, then the total number of hops H_G in all shortest paths in G is H_G = Σ_{i=1}^{N} Σ_{j=i+1}^{N} H_{i→j}.
This number is also equal to H_G = Σ_{l=1}^{L} B_l, where B_l is the betweenness of a link l in G. Taking the expectation of both relations gives the average betweenness of a link in terms of the average hopcount,

E[B] = \binom{N}{2} E[H_N]/L ≥ E[H_N]

with equality only for the complete graph.

15.6 Random graphs

Besides the regular topologies in Fig. 15.1, the class of random graphs constitutes an attractive set of topologies to analyze network performance. The theory of random graphs originated from a series of papers by Erdős and Rényi in the late 1950s. There exists an astonishingly large amount of literature on random graphs. The standard work on random graphs is the book by Bollobás (2001). We also mention the work of Janson et al. (1993) on evolutionary processes in random graphs. The two most frequently occurring models for random graphs are the Erdős–Rényi random graphs G_p(N) and G_r(N, L). The class of random graphs denoted by G_p(N) consists of all graphs with N nodes in which the links are chosen independently and with probability p. In the class G_p(N), the total number of links is not deterministic, but on average equal to E[L] = p L_max, where L_max = \binom{N}{2}. Since p = E[L]/L_max, we also call p the link density of G_p(N). An instance of the class G_{0.013}(300) is drawn in Fig. 15.6. Related to G_p(N) are the geometric random graphs G_{p_ij}(N), where the links are still chosen independently, but where the probability of i → j being an edge is p_ij. An example of G_{p_ij}(N) is the Waxman graph (Waxman, 1988; Van Mieghem, 2001) with p_ij = exp(−a|r_i − r_j|), where the vector r_i represents the position of node i and a is a real, non-negative number. Geometric random graphs are good models for ad-hoc wireless networks, where the probability p_ij = f(r_ij) that there is a wireless link between nodes i and j is specified by the radio propagation that is briefly explained at the end of Section 3.5.
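Sampling a G_p(N) instance is a direct transcription of its definition, and the binomial degree law (15.11) can then be checked empirically; the parameters N = 40, p = 0.1 and the tested degree k = 4 in this sketch are arbitrary choices.

```python
import random
from math import comb

random.seed(7)

def sample_gp(N, p):
    """One instance of G_p(N): each of the C(N,2) possible links is present
    independently with probability p."""
    return [(i, j) for i in range(N) for j in range(i + 1, N) if random.random() < p]

N, p, trials = 40, 0.1, 2000
k = 4           # degree value at which the pmf is tested
hits = 0
for _ in range(trials):
    links = sample_gp(N, p)
    degree_of_0 = sum(1 for l in links if 0 in l)   # degree of node 0
    hits += (degree_of_0 == k)

empirical = hits / trials
binomial_pmf = comb(N - 1, k) * p ** k * (1 - p) ** (N - 1 - k)   # (15.11)
assert abs(empirical - binomial_pmf) < 0.05
```

Each of the N−1 potential links at a node is an independent Bernoulli(p) trial, which is exactly why the degree is binomially distributed.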
The class G_r(N, L) is the set of random graphs with N nodes and L links. In total, we can construct \binom{L_max}{L} different graphs, which corresponds to the number of ways we can distribute a set of L ones over the L_max possible places in the upper triangular part above the diagonal of the adjacency matrix A. Each of the possible L_max links has equal probability of belonging to a random graph of the class G_r(N, L). The probability that an element of the adjacency matrix A is a_ij = 1 equals p = L/L_max. As opposed to the class G_p(N), the number of non-zero elements of A in each random graph of G_r(N, L) is precisely 2L (see Appendix B.1, art. 2), which induces a weak dependence between the links in G_r(N, L). The latter also explains why most computations are easier in G_p(N) than in G_r(N, L). The average number of paths with j hops between two arbitrary nodes in G_p(N) follows from (15.1) and (2.13) as

E[X_j] = (N−2)!/(N−j−1)! p^j   (15.10)

for 1 ≤ j ≤ N−1. [Fig. 15.6. A connected random graph G_p(N) with N = 300 and p = 0.013, drawn on a circle.] The average total number of paths between two arbitrary nodes A and B equals

E[Σ_{j=1}^{N−1} X_j] = (N−2)! p^{N−1} Σ_{l=0}^{N−2} p^{−l}/l! ≤ (N−2)! p^{N−1} e^{1/p}

where the latter bound is closely approached for large N. Moreover, when the random graph reduces to the complete graph (p = 1), we again obtain (15.3). Since the degree d_j of a node j is the number of links incident with that node, it follows directly from the definition of G_p(N) that the probability density function of the degree D_rg of an arbitrary node in G_p(N) equals

Pr[D_rg = k] = \binom{N−1}{k} p^k (1−p)^{N−1−k}   (15.11)

The interest in random graphs is fueled by the fact that the topology of the Internet is inaccurately known and also that good models⁷ are lacking. In some sense, the Internet can be regarded as a growing and changing organism. Such complex networks also arise in other fields.
Increased interest from different disciplines in understanding network behavior has resulted in a new wave of science, which may be termed "the physics of networks" and which was recently reviewed by Strogatz (2001). Random graphs are an elegant vehicle to thoroughly analyze the performance of, for example, routing algorithms. Some constructed overlay networks, such as Gnutella, and mobile wireless ad-hoc networks seem reasonably well modeled by G_p(N). However, the class G_p(N) does not describe the Internet topology well; the degree distribution, especially, deviates significantly. The degree distribution (15.11) in G_p(N) is binomial, while that of the Internet is close to a power law (15.4). Hence, there is a discrepancy between Internet measurements and properties of the random graph G_p(N).

15.6.1 The number C(N, L) of connected random graphs in the class G_r(N, L)

From the point of view of telecommunications networks, by far the most interesting graphs are those with a connected topology. This limitation restricts the value of the link density p from below by a critical threshold p_c. For large N, the critical threshold is p_c ∼ log N/N, as shown in Section 15.6.3. In the theory of random graphs, the problem of determining the number of connected random graphs C(N, L) in the class G_r(N, L) has been intensively studied. Gilbert (1956) presented an exact recursion formula for C(N, L) via the technique of enumeration. Erdős and Rényi (1959, 1960) determined the asymptotic behavior of random graphs via the probabilistic method, largely introduced by Erdős himself. Since the analysis⁸ of Gilbert is both exact and simple, we will review his results here and those of Erdős and Rényi in the next section.

⁷ A detailed discussion of difficulties in modeling or simulating the Internet is presented by Floyd and Paxson (2001).
Consider a particular random graph M of the class of random graphs G_r(N+1, L), which is constructed from the class G_r(N, L) by adding one node labelled N+1. Suppose that the node labelled N+1 in the random graph M belongs to a connected component K that possesses v other nodes and some number μ of links. The remaining part of M has N−v nodes and L−μ links. There are \binom{N}{v} ways in which the v nodes of K can be chosen out of the N nodes in G_r(N, L). On the other hand, there are C(v+1, μ) ways of picking a connected random graph K, while there are \binom{\binom{N−v}{2}}{L−μ} ways of constructing the remaining part of M. Hence, since the number of ways we can construct a graph M equals \binom{\binom{N+1}{2}}{L}, we obtain Gilbert's recursion formula,

\binom{\binom{N+1}{2}}{L} = Σ_{v=0}^{N} \binom{N}{v} Σ_{μ=v}^{\binom{v+1}{2}} C(v+1, μ) \binom{\binom{N−v}{2}}{L−μ}   (15.12)

Gilbert (1956) further derives the generating function for C(N, L) as

Σ_{N=1}^{∞} Σ_{L} (C(N, L)/N!) x^N y^L = log( 1 + Σ_{k=1}^{∞} (1+y)^{\binom{k}{2}} x^k/k! )   (15.13)

which converges for −2 ≤ y ≤ 0 and all x. So far, no other explicit formulae for C(N, L) exist. In 1889, Cayley (see Appendix B.1, art. 3) proved that, in the special case where L = N−1, there holds C(N, N−1) = N^{N−2}. In the other extreme, where L = L_max, corresponding to a full mesh, we have C(N, L_max) = 1. Actually, when \binom{N}{2} − N + 1 < L ≤ L_max, the graph is always connected because the adjacency matrix A then necessarily has at least one non-zero element per row. This means that C(N, L) = \binom{\binom{N}{2}}{L} for \binom{N}{2} − N + 1 < L ≤ L_max. In all cases where L < N−1, the random graphs are necessarily disconnected, leading to C(N, L) = 0.

⁸ A different, less accessible approach is found in Goulden and Jackson (1983).
For computational purposes, we rewrite (15.12) by isolating the term v = N,

\binom{\binom{N+1}{2}}{L} = C(N+1, L) + Σ_{v=0}^{N−1} \binom{N}{v} Σ_{μ=v}^{\binom{v+1}{2}} C(v+1, μ) \binom{\binom{N−v}{2}}{L−μ}

Since \binom{\binom{0}{2}}{L−μ} = δ_{μ,L}, we arrive, after the substitution N → N−1, at the recursion formula

C(N, L) = \binom{\binom{N}{2}}{L} − Σ_{v=0}^{N−2} \binom{N−1}{v} Σ_{μ=v}^{\binom{v+1}{2}} C(v+1, μ) \binom{\binom{N−1−v}{2}}{L−μ}   (15.14)

Below we list a few values:

C(2,1) = 1
C(3,2) = 3, C(3,3) = 1
C(4,3) = 16, C(4,4) = 15, C(4,5) = 6, C(4,6) = 1
C(5,4) = 125, C(5,5) = 222, C(5,6) = 205, C(5,7) = 120, C(5,8) = 45, C(5,9) = 10, C(5,10) = 1
C(6,5) = 1296, C(6,6) = 3660, C(6,7) = 5700, C(6,8) = 6165, C(6,9) = 4945, C(6,10) = 2997, C(6,11) = 1365, C(6,12) = 455, C(6,13) = 105, C(6,14) = 15, C(6,15) = 1
C(7,6) = 16807, C(7,7) = 68295, C(7,8) = 156555, C(7,9) = 258125, C(7,10) = 331506, C(7,11) = 343140, C(7,12) = 290745, C(7,13) = 202755, C(7,14) = 116175, C(7,15) = 54257, C(7,16) = 20349, C(7,17) = 5985, C(7,18) = 1330, C(7,19) = 210, C(7,20) = 21, C(7,21) = 1

15.6.2 The Erdős and Rényi asymptotic analysis

In a classical paper, Erdős and Rényi (1959) proved that

lim_{N→∞} Pr[ G_r(N, [½N log N + xN]) is connected ] = e^{−e^{−2x}}   (15.15)

Ignoring the integral part operator [·] and eliminating x using the number of links L = ½N log N + xN gives, for large N,

Pr[G_r(N, L) is connected] ≈ e^{−N e^{−2L/N}}   (15.16)

which should be compared with the exact result,

Pr[G_r(N, L) is connected] = C(N, L)/\binom{\binom{N}{2}}{L}   (15.17)

In contrast to the unattractive computation of the exact C(N, L) via the recursion (15.14), the Erdős and Rényi asymptotic expression (15.16) is simple. Its accuracy for relatively small N is shown in Fig. 15.7.

Fig. 15.7.
Fig. 15.7. The probability that a random graph $G_r(N,L)$ is disconnected: a comparison between the exact result (15.17) and Erdös' asymptotic formula (15.16) for $L=N$, $L=\frac32 N$, $L=2N$ and $L=\frac23 N\log N$.

The key observation of Erdös and Rényi (1959) is that a phase transition in random graphs with $N$ nodes occurs when the number of links $L$ is around $L_c = \frac12 N\log N$. Phase transitions are well-known phenomena in physics. For example, at a certain temperature, most materials possess a solid–liquid transition and at a higher temperature a second liquid–gas transition. Below that critical temperature, most properties of the material are completely different than above that temperature. Some materials are superconductive below a certain critical temperature $T_c$, but normally conductive (or even insulating) above $T_c$. Erdös and Rényi concentrated on the property $A_k$ that a random graph $G_r(N,L_x)$ with $L_x = \left[\frac12 N\log N + xN\right]$ consists of $N-k$ connected nodes and $k$ isolated nodes for fixed $k$. If $A_k^c$ denotes the absence of property $A_k$, they proved that, for all fixed $k$, $\Pr[A_k^c]\to 0$ if $N\to\infty$, which means that, for a large number of nodes $N$, almost all random graphs $G_r(N,L_x)$ possess property $A_k$. This result is equivalent to a result proved in Section 15.6.3, namely that the class of random graphs $G_p(N)$ is almost surely disconnected if the link density $p$ is below $p_c \sim \frac{\log N}{N}$ and connected for $p > p_c$. In view of the analogy with physics, it is not surprising that corresponding sharp transitions are also observed for other properties than just $A_k$. In the sequel, we will show that, for the random graph $G_r(N,L_x)$, the probability that the largest connected component, called the giant component $G_C(N,L_x)$, has $N-k$ nodes is, for large $N$, Poisson distributed with mean $e^{-2x}$,

$$\lim_{N\to\infty}\Pr\left[\text{number of nodes in } G_C(N,L_x) = N-k\right] = \frac{\left(e^{-2x}\right)^k e^{-e^{-2x}}}{k!}$$
(15.18)

If $k=0$, then all nodes belong to the giant component and the graph is completely connected, in which case (15.18) leads to (15.16). The total number of graphs $G_r(N,L_x)$ with $k\ge1$ isolated nodes equals $\binom{N}{k}\binom{\binom{N-k}{2}}{L_x}$, the number of ways in which $k$ isolated nodes can be chosen out of the total of $N$ nodes, multiplied by the number of graphs that can be constructed with $N-k$ nodes and $L_x$ links. Observe that this total number also includes those graphs where not all of the $N-k$ nodes are necessarily connected. In other words, this total number includes the graphs that do not possess property $A_k$. The total number of graphs $T_0$ without isolated nodes follows from the inclusion–exclusion formula (2.10) as

$$T_0(N,L_x) = \sum_{k=0}^{N}(-1)^k\binom{N}{k}\binom{\binom{N-k}{2}}{L_x}$$

where the term with index $k=0$ equals the total number of graphs with $N$ nodes and $L_x$ links, i.e. the total number of elements in the sample space. Evidently, the total number $C(N,L_x)$ of connected random graphs of the class $G_r(N,L_x)$ is smaller than $T_0(N,L_x)$, because all of them must obey property $A_0$ as well. Since⁹

$$\lim_{N\to\infty}\frac{\binom{N}{k}\binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}} = \frac{\left(e^{-2x}\right)^k}{k!}$$

we obtain

$$\lim_{N\to\infty}\frac{T_0(N,L_x)}{\binom{\binom{N}{2}}{L_x}} = \sum_{k=0}^{\infty}(-1)^k\frac{\left(e^{-2x}\right)^k}{k!} = e^{-e^{-2x}}$$

But, if $N\to\infty$, the difference

$$\frac{T_0(N,L_x) - C(N,L_x)}{\binom{\binom{N}{2}}{L_x}} \le \Pr[A_0^c] \to 0$$

which demonstrates (15.18) for $k=0$. The remaining case $k>0$ in (15.18) follows from the observation that the number of graphs in $G_r(N,L_x)$

⁹ It is convenient to take the logarithm of

$$k!\,w_k = k!\,\frac{\binom{N}{k}\binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}} = \prod_{j=0}^{k-1}\left(N-j\right)\prod_{j=0}^{L_x-1}\frac{\binom{N-k}{2}-j}{\binom{N}{2}-j}$$
which, using $\frac{(N-k)(N-1-k)}{N(N-1)} = \left(1-\frac{k}{N}\right)\left(1-\frac{k}{N-1}\right)$, is

$$\log(k!\,w_k) = k\log N + \sum_{j=0}^{k-1}\log\left(1-\frac{j}{N}\right) + L_x\left[\log\left(1-\frac{k}{N}\right)+\log\left(1-\frac{k}{N-1}\right)\right] + \sum_{j=0}^{L_x-1}\left[\log\left(1-\frac{2j}{(N-k)(N-1-k)}\right)-\log\left(1-\frac{2j}{N(N-1)}\right)\right]$$

For large $N$ and fixed $k$, using the expansion $\log(1-z)=-z+O\left(z^2\right)$, we have

$$\log\left(1-\frac{2j}{(N-k)(N-1-k)}\right) = \log\left(1-\frac{2j}{N(N-1)}\right) + O\left(jN^{-3}\right)$$

so that

$$\log(k!\,w_k) = k\log N - \frac{2k}{N}L_x + O\left(N^{-1}\right) + O\left(L_x^2 N^{-3}\right)$$

In order to have a finite limit $\lim_{N\to\infty}\log(k!\,w_k)=c\in\mathbb{R}$, we must require that $k\log N - \frac{2k}{N}L_x = c$, which implies that $L_x = \frac{N\log N}{2} - \frac{cN}{2k}$. For this scaling the order term $O\left(L_x^2N^{-3}\right)$ indeed vanishes if $N\to\infty$. By choosing $x=-\frac{c}{2k}$, we arrive at the correct scaling $L_x = \frac12 N\log N + xN$ postulated above, and $c=-2kx$.

with property $A_k$ is equal to $\binom{N}{k}$ multiplied by the number of connected graphs with $N-k$ nodes and $L_x$ links, which is approximately

$$\frac{\binom{N}{k}\binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}}\cdot\frac{T_0(N-k,L_x)}{\binom{\binom{N-k}{2}}{L_x}} \;\to\; \frac{\left(e^{-2x}\right)^k}{k!}\,e^{-e^{-2x}}$$

where the limit gives the correct result because the small difference between the total number and that without property $A_k$ tends to zero.

15.6.3 Connectivity and degree

There is an interesting relation between the connectivity of a graph, a global property, and the degree $D$ of an arbitrary node, a local property. The implication $\{G\text{ is connected}\}\Longrightarrow\{D_{\min}\ge1\}$, where $D_{\min}=\min_{\text{all nodes}\in G} D$, is always true. The opposite implication is not always true, however, because a network can consist of separate, disconnected clusters, each containing nodes with minimum degree at least 1. A random graph can be generated from a set of $N$ labelled nodes by randomly assigning a link with probability $p$ to each pair of nodes. During this construction process, initially separate clusters originate, but at a certain moment one of those clusters starts dominating (and swallowing) the other clusters. This largest cluster becomes the giant component.
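The size of this giant component is governed, in Section 15.6.4 below, by the fixed-point equation (15.20), $S = 1-e^{-\mu_{rg}S}$, with the Lagrange-series solution (15.21) and the inverse relation (15.22). A minimal numeric sketch in plain Python (function names and the fixed-point iteration are mine, not the book's):

```python
import math

def giant_fraction_fixed_point(mu, tol=1e-12):
    """Solve S = 1 - exp(-mu*S), eq. (15.20), by fixed-point iteration (mu > 1)."""
    S = 1.0
    for _ in range(10_000):
        S_new = 1.0 - math.exp(-mu * S)
        if abs(S_new - S) < tol:
            return S_new
        S = S_new
    return S

def giant_fraction_series(mu, terms=200):
    """Lagrange series (15.21): S = 1 - e^{-mu} * sum_n (n+1)^n/(n+1)! * (mu e^{-mu})^n."""
    x = mu * math.exp(-mu)
    term, total = 1.0, 0.0
    for n in range(terms):
        total += term
        term *= x * (1.0 + 1.0 / (n + 1)) ** n   # ratio t_{n+1}/t_n = x*(1+1/(n+1))^n
    return 1.0 - math.exp(-mu) * total
```

For $\mu_{rg}=2$ both routes give $S\approx0.797$, and $-\log(1-S)/S$ recovers $\mu_{rg}$ as in (15.22); for $\mu_{rg}<1$ the iteration collapses to the only solution $S=0$.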
For large $N$ and a certain $p_N$ which depends on $N$, the implication $\{D_{\min}\ge1\}\Longrightarrow\{G_p(N)\text{ is connected}\}$ is almost surely (a.s.) correct. A rigorous mathematical proof is fairly complex and omitted. Thus, for large random graphs $G_p(N)$, the equivalence $\{G_p(N)\text{ is connected}\}\Longleftrightarrow\{D_{\min}\ge1\}$ holds almost surely, such that $\Pr[G_p(N)\text{ is connected}]=\Pr[D_{\min}\ge1]$ a.s. From (3.32) and (15.11), we have that

$$\Pr[D_{\min}\ge1] = \left(\Pr[D_{rg}\ge1]\right)^N = \left(1-\Pr[D_{rg}=0]\right)^N = \left(1-(1-p)^{N-1}\right)^N$$

which shows that $\Pr[D_{\min}\ge1]$ rapidly tends to one for fixed $0<p<1$ and large $N$. Therefore, the asymptotic behavior of $\Pr[G_p(N)\text{ is connected}]$ requires the investigation of the influence of $p$ as a function of $N$,

$$\Pr[G_p(N)\text{ is connected}] = \exp\left(N\log\left(1-(1-p_N)^{N-1}\right)\right) = \exp\left(-N\sum_{j=1}^{\infty}\frac{(1-p_N)^{j(N-1)}}{j}\right) = e^{-N(1-p_N)^{N-1}}\exp\left(-N\sum_{j=2}^{\infty}\frac{(1-p_N)^{(N-1)j}}{j}\right)$$

If we denote $c_N \triangleq N(1-p_N)^{N-1}$, then

$$N\sum_{j=2}^{\infty}\frac{(1-p_N)^{(N-1)j}}{j} = \sum_{j=2}^{\infty}\frac{c_N^j}{j\,N^{j-1}}$$

can be made arbitrarily small for large $N$ provided we choose $c_N = O\left(N^{\beta}\right)$ with $\beta<\frac12$. Thus, for large $N$, we have that

$$\Pr[G_p(N)\text{ is connected}] = e^{-c_N}\left(1+O\left(N^{2\beta-1}\right)\right)$$

which tends to 0 for $0<\beta<\frac12$ and to 1 for $\beta<0$. Hence, the critical exponent where a sharp transition occurs is $\beta=0$. In that case, $c_N=c$ (a real positive constant) and

$$p_N = 1-\exp\left(-\frac{\log\frac{N}{c}}{N-1}\right) = \frac{\log N}{N} - \frac{\log c}{N} + O\left(\frac{\log N}{N^2}\right)$$

In summary, for large $N$,

$$\Pr[G_p(N)\text{ is connected}] \to \begin{cases} 0 & \text{if } p < \frac{\log N}{N} \\ 1 & \text{if } p > \frac{\log N}{N} \end{cases} \qquad (15.19)$$

with a transition region around $p_c \sim \frac{\log N}{N}$ with a width of $O\left(\frac1N\right)$. Notice the agreement with (15.15), where $p_x = \frac{L_x}{L_{\max}} \simeq \frac{\log N}{N} + \frac{2x}{N}$: for large $x<0$, $e^{-e^{-2x}}\to0$, while for large $x>0$, $e^{-e^{-2x}}\to1$, and the width of the transition region for the link density $p$ is $O\left(\frac1N\right)$.

15.6.4 Size of the giant component

Let $S = \Pr[n\in C]$ denote the probability that a node $n$ in $G_p(N)$ belongs to the giant component $C$. If $n\notin C$, then none of the neighbors of node $n$ belongs to the giant component.
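Before continuing with the size of the giant component, the sharp transition (15.19) can be illustrated numerically through the almost-sure identity $\Pr[G_p(N)\text{ is connected}]=\left(1-(1-p)^{N-1}\right)^N$ used above (a sketch, not from the book):

```python
import math

def prob_connected(N, p):
    """Pr[Gp(N) connected] ~ Pr[D_min >= 1] = (1 - (1-p)^(N-1))^N, for large N."""
    return (1.0 - (1.0 - p) ** (N - 1)) ** N

N = 10_000
pc = math.log(N) / N            # critical link density ~ log N / N
below = prob_connected(N, 0.5 * pc)   # well below threshold: essentially 0
above = prob_connected(N, 2.0 * pc)   # well above threshold: essentially 1
```

With $N=10^4$, half the critical density already gives a connection probability of roughly $e^{-\sqrt{N}}$, while twice the critical density gives a probability within $10^{-4}$ of one, illustrating how narrow the transition region is.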
The number of neighbors of a node $n$ is the degree $d_n$ of that node, such that

$$\Pr[n\notin C] = \Pr[\text{all neighbors of } n\notin C] = \sum_{k\ge0}\Pr[\text{all } k \text{ neighbors of } n\notin C \mid d_n=k]\,\Pr[d_n=k]$$

Since in $G_p(N)$ all neighbors of $n$ are independent¹⁰, the conditional probability becomes, with $1-S = \Pr[n\notin C]$,

$$\Pr[\text{all } k \text{ neighbors of } n\notin C\mid d_n=k] = \left(\Pr[n\notin C]\right)^k = (1-S)^k$$

Moreover, this probability holds for any node $n\in G_p(N)$ such that, writing the random variable $D_{rg}$ instead of an instance $d_n$,

$$1-S = \sum_{k=0}^{\infty}(1-S)^k\,\Pr[D_{rg}=k] = \varphi_{D_{rg}}(1-S)$$

where $\varphi_{D_{rg}}(u) = E\left[u^{D_{rg}}\right]$ is the generating function of the degree $D_{rg}$ in $G_p(N)$. For large $N$, the degree distribution in $G_p(N)$ is Poisson distributed with mean degree $\mu_{rg} = p(N-1)$ and $\varphi_{D_{rg}}(u) \simeq e^{\mu_{rg}(u-1)}$. For large $N$, the fraction $S$ of nodes in the giant component in the random graph satisfies an equation similar to that in (12.13) of the extinction probability in a branching process,

$$S = 1 - e^{-\mu_{rg}S} \qquad (15.20)$$

and the average size of the giant component is $NS$. For $\mu_{rg}<1$ the only solution is $S=0$, whereas for $\mu_{rg}>1$ there is a non-zero solution for the size of the giant component. The solution can be expressed as a Lagrange series using (5.34),

$$S(\mu_{rg}) = 1 - e^{-\mu_{rg}}\sum_{n=0}^{\infty}\frac{(n+1)^n}{(n+1)!}\left(\mu_{rg}e^{-\mu_{rg}}\right)^n \qquad (15.21)$$

By reversing (15.20), the average degree in the random graph can be expressed in terms of the fraction $S$ of nodes in the giant component,

$$\mu_{rg}(S) = -\frac{\log(1-S)}{S} \qquad (15.22)$$

¹⁰ This argument is not valid, for example, for a two-dimensional lattice $\mathbb{Z}_p^2$ in which each link between adjacent nodes at integer-valued coordinates in the plane exists with probability $p$. The critical link density for connectivity in $\mathbb{Z}_p^2$ is $p_c=\frac12$, a famous result proved in the theory of percolation (see, for example, Grimmett (1989)).

15.7 The hopcount in a large, sparse graph with unit link weights

Routers in the Internet forward IP packets to the next-hop router, which is found by routing protocols (such as OSPF and BGP).
Intra-domain routing such as OSPF is based on the Dijkstra shortest path algorithm, while inter-domain routing with BGP is policy-based, which implies that BGP does not minimize a length criterion. Nevertheless, end-to-end paths in the Internet are shortest paths in roughly 70% of the cases. Therefore, we consider the shortest path between two arbitrary nodes because (a) the IP address does not reflect a precise geographical location and (b) uniformly distributed worldwide communication, especially on the web, seems natural since the information stored in servers can be located in places unexpected and unknown to browsing users. The Internet type of communication is different from classical telephony because (a) telephone numbers have a direct binding with a physical location and (b) the intensity of average human interaction rapidly decreases with distance. We prefer to study the hopcount $H_N$ because it is simple to measure via the trace-route utility, it is an integer and dimensionless, and the quality of service (QoS) measures (such as packet delay, jitter and packet loss) depend on the hopcount, the number of traversed routers. In this section, we first investigate the hopcount in a sparse, but connected graph where all links have unit weight. Chapter 16 treats graphs with other link weight structures.

15.7.1 Bi-directional search

The basic idea of a bi-directional search to find the shortest path is to start the discovery process (e.g. using Dijkstra's algorithm) from $A$ and $B$ simultaneously. When both subsections from $A$ and from $B$ meet, the concatenation forms the shortest path from $A$ to $B$. In case all link weights are equal, $w(i\to j)=1$ for any link $i\to j$ in a graph $G$, the shortest path from $A$ to $B$ is found when the discovery process from $A$ and that from $B$ have precisely one node of the graph in common. Denote by $C_A(l)$, respectively $C_B(l)$, the set of nodes that can be reached from $A$, respectively $B$, in $l$ or fewer hops. We define $C_A(0)=\{A\}$ and $C_B(0)=\{B\}$.
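The bi-directional discovery of $C_A(l)$ and $C_B(l)$ can be sketched for unit link weights as follows (plain Python; the adjacency-dict representation and function name are mine, not the book's):

```python
def hopcount_bidirectional(adj, a, b):
    """Grow the discovery sets C_A(l) and C_B(l) one hop at a time;
    with unit link weights, the hopcount is lA + lB as soon as the
    two sets first share a node."""
    if a == b:
        return 0
    CA, CB = {a}, {b}
    frontA, frontB = {a}, {b}
    lA = lB = 0
    while frontA and frontB:
        if len(CA) <= len(CB):            # grow the smaller side by one hop
            frontA = {w for u in frontA for w in adj[u]} - CA
            CA |= frontA
            lA += 1
        else:
            frontB = {w for u in frontB for w in adj[u]} - CB
            CB |= frontB
            lB += 1
        if CA & CB:
            return lA + lB
    return None                            # disconnected

# example: an 8-node ring; opposite nodes are 4 hops apart
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
```

Checking the intersection after every one-hop expansion is what guarantees that the first meeting yields the true shortest hopcount.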
The hopcount is larger than $2l$ if and only if $C_A(l)\cap C_B(l)$ is empty. Conditionally on $|C_A(l)|=n_A$, respectively $|C_B(l)|=n_B$, the sets $C_A(l)$ and $C_B(l)$ do not possess a common node with probability

$$\Pr\left[C_A(l)\cap C_B(l)=\emptyset \,\middle|\, |C_A(l)|=n_A, |C_B(l)|=n_B\right] = \frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}}$$

which consists of the ratio of all combinations in which the $n_B$ nodes around $B$ can be chosen out of the remaining nodes that do not belong to the set $C_A$, over all combinations in which $n_B$ nodes can be chosen in the graph with $N$ nodes except for node $A$. Furthermore,

$$\frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}} = \frac{(N-n_A-1)(N-n_A-2)\cdots(N-n_A-n_B)}{(N-1)(N-2)\cdots(N-n_B)} = \frac{\left(1-\frac{n_A+1}{N}\right)\left(1-\frac{n_A+2}{N}\right)\cdots\left(1-\frac{n_A+n_B}{N}\right)}{\left(1-\frac1N\right)\left(1-\frac2N\right)\cdots\left(1-\frac{n_B}{N}\right)}$$

For large $N$, we apply the Taylor series around $x=0$ of $\log(1-x) = -x-\sum_{j=2}^{\infty}\frac{x^j}{j}$,

$$\log\frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}} = \sum_{k=1}^{n_B}\left[\log\left(1-\frac{n_A+k}{N}\right)-\log\left(1-\frac{k}{N}\right)\right] = -\frac{n_A n_B}{N} - \sum_{j=2}^{\infty}\frac{1}{jN^j}\sum_{k=1}^{n_B}\left[(n_A+k)^j - k^j\right] = -\frac{n_A n_B}{N}\left(1+O\left(\frac{n_A+n_B}{N}\right)\right)$$

After exponentiation,

$$\Pr\left[H_N>2l \,\middle|\, |C_A(l)|=n_A, |C_B(l)|=n_B\right] = e^{-\frac{n_A n_B}{N}}\left(1+O\left(\frac{n_A n_B(n_A+n_B)}{N^2}\right)\right)$$

By the law of total probability (2.47) and up to $O\left(\frac{E\left[|C_A(l)|^2|C_B(l)|^2\right]}{N^2}\right)$ for large $N$, we obtain

$$\Pr[H_N>2l] \approx E\left[\exp\left(-\frac{|C_A(l)|\,|C_B(l)|}{N}\right)\right] \qquad (15.23)$$

This probability (15.23) holds for any large graph with a unit link weight structure provided $E\left[|C_A(l)|^2|C_B(l)|^2\right] = o\left(N^2\right)$. Formula (15.23) becomes increasingly accurate for decreasing $|C_A(l)|$ and $|C_B(l)|$, and so for sparser large graphs.

15.7.2 Sparse large graphs and a branching process

In order to proceed, the number of nodes in the sets $C_A(l)$ and $C_B(l)$ needs to be determined, which is difficult in general.
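The quality of the exponential approximation entering (15.23) is easy to probe numerically: compare the exact ratio $\binom{N-1-n_A}{n_B}/\binom{N-1}{n_B}$ with $e^{-n_A n_B/N}$ (a sketch, not from the book; the product form avoids huge binomials):

```python
import math

def exact_avoidance(N, nA, nB):
    """Pr[C_A and C_B share no node | sizes nA, nB] = C(N-1-nA, nB)/C(N-1, nB),
    computed as a telescoping product of nB factors."""
    r = 1.0
    for k in range(nB):
        r *= (N - 1 - nA - k) / (N - 1 - k)
    return r

N, nA, nB = 1_000_000, 100, 100
exact = exact_avoidance(N, nA, nB)
approx = math.exp(-nA * nB / N)    # the asymptotic form used in (15.23)
```

For $N=10^6$ and $n_A=n_B=100$, the correction term $O\left(n_A n_B(n_A+n_B)/N^2\right)$ is of order $10^{-6}$, so the two values agree to several decimal places.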
Therefore, we concentrate here on a special class of graphs in which the discovery process from $A$ and $B$ is reasonably well modeled by a branching process (Chapter 12). A branching process evolves from a given set $C_A(l-1)$ in the next, $l$-th discovery cycle (or generation) to the set $C_A(l)$ by including only new nodes, not those previously discovered. The application of a branching process implies that the newly discovered nodes do not possess links to any previously discovered node of $C_A(l-1)$ except for their parent node in $C_A(l-1)$. Hence, only for large and sparse graphs or tree-like graphs can this assumption be justified, provided that the number of links that point backwards to earlier discovered nodes in $C_A(l-1)$ is negligibly small. Assuming that a branching process models the discovery process well, we will compute the number of nodes that can be reached from $A$ (and similarly from $B$) in $l$ hops from a branching process with production $Y$ specified by the degree distribution of the nodes in the graph. The additional number of nodes $X_l$ discovered during the $l$-th cycle of a branching process that are included in the set $C_A(l)$ is described by the basic law (12.1). Thus, $|C_A(l)| = \sum_{k=0}^{l}X_k$ with $X_0=1$ (namely node $A$). In terms of the scaled random variable $W_k = \frac{X_k}{\mu^k}$ with unit mean $E[W_k]=1$,

$$|C_A(l)| = \sum_{k=0}^{l}W_k\,\mu^k$$

where $\mu = E[Y]-1 > 1$ denotes the average degree minus 1, i.e. the out-degree, in the graph. Only the root has production $E[Y]$ equal to the mean degree. Immediately, the average size of the set of nodes reached from $A$ in $l$ hops is, with $E[W_k]=1$,

$$E[|C_A(l)|] = \sum_{k=0}^{l}\mu^k = \frac{\mu^{l+1}-1}{\mu-1}$$

which equally holds for $E[|C_B(l)|]$.
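The geometric growth of the generation sizes, $E[X_k]=\mu^k$, can be checked by a seeded simulation. A sketch (not from the book) with Poisson($\mu$) production, sampled with Knuth's multiplicative method since the Python standard library has no Poisson generator:

```python
import math
import random

def poisson(mu, rng):
    """Knuth's multiplicative method for a Poisson(mu) sample (fine for small mu)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def generation_sizes(mu, depth, rng):
    """One run of a branching process: X_0 = 1 (the root A), then each node
    independently produces a Poisson(mu) number of children."""
    sizes = [1]
    for _ in range(depth):
        sizes.append(sum(poisson(mu, rng) for _ in range(sizes[-1])))
    return sizes

rng = random.Random(12345)
runs, mu, depth = 20_000, 2.0, 3
avg = [0.0] * (depth + 1)
for _ in range(runs):
    for k, x in enumerate(generation_sizes(mu, depth, rng)):
        avg[k] += x / runs
```

Averaged over the runs, the generation sizes come out close to $\mu^k = 1, 2, 4, 8$, so the mean cluster size $\sum_k \mu^k = (\mu^{l+1}-1)/(\mu-1)$ follows.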
Applying Jensen's inequality (5.5) to (15.23) yields

$$E\left[\exp\left(-\frac{C_A(l)\,C_B(l)}{N}\right)\right] \ge \exp\left(-\frac{E[C_A(l)]\,E[C_B(l)]}{N}\right)$$

such that

$$\Pr[H_N>k] \gtrsim \exp\left(-\frac{\mu^{k+2}}{N(\mu-1)^2}\right)$$

With the tail probability expression (2.36) for the average, we arrive at the lower bound for the expected hopcount in large graphs,

$$E[H_N] = \sum_{k=0}^{\infty}\Pr[H_N>k] \gtrsim \sum_{k=0}^{\infty}\exp\left(-\frac{\mu^{k+2}}{N(\mu-1)^2}\right)$$

The sum $S_1(t) = \sum_{k=0}^{\infty}\exp\left(-t\mu^k\right)$ can be evaluated exactly¹¹ as

$$S_1(t) = \frac12 - \frac{\log t + \gamma}{\log\mu} + \frac{2}{\log\mu}\sum_{k=1}^{\infty}\sqrt{\frac{\pi}{\frac{2k\pi}{\log\mu}\sinh\frac{2k\pi^2}{\log\mu}}}\;\cos\left(\frac{2k\pi}{\log\mu}\log t - \arg\Gamma\left(\frac{2k\pi i}{\log\mu}\right)\right) + \sum_{n=1}^{\infty}\left(1-e^{-t\mu^{-n}}\right)$$

Furthermore, the amplitude of the fluctuating sum is bounded by

$$b(\mu) = \sum_{k=1}^{\infty}\sqrt{\frac{\pi}{\frac{2k\pi}{\log\mu}\sinh\frac{2k\pi^2}{\log\mu}}}$$

and the corresponding maximal fluctuation $T(\mu) = \frac{2\,b(\mu)}{\log\mu}$ is increasing in $\mu$, but for $1<\mu\le5$ its maximum value $T(5)$ is smaller than 0.0035. Since $t = \frac{\mu^2}{N(\mu-1)^2}$ is small and $\mu>1$, we approximate

$$S_1(t) \approx \frac12 - \frac{\log t + \gamma}{\log\mu} \qquad (15.24)$$

¹¹ For $\sigma = \operatorname{Re}(s)>0$ and $\operatorname{Re}(p)\ge0$, we have

$$\frac{\Gamma(s)}{p^s} = \int_0^{\infty}t^{s-1}e^{-pt}\,dt$$

so that

$$\int_0^{\infty}t^{s-1}S_1(t)\,dt = \Gamma(s)\sum_{k=0}^{\infty}\mu^{-ks} = \Gamma(s)\,\frac{\mu^s}{\mu^s-1}$$

By Mellin inversion, for $c>0$,

$$S_1(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\Gamma(s)\,\frac{\mu^s}{\mu^s-1}\,t^{-s}\,ds$$

By moving the line of integration to the left, we encounter a double pole at $s=0$ from $\Gamma(s)$ and $\frac{\mu^s}{\mu^s-1}$, simple poles at $s=\frac{2k\pi i}{\log\mu}$ from $\frac{\mu^s}{\mu^s-1}$, and simple poles at the negative integers from $\Gamma(s)$. Invoking Cauchy's residue theorem leads to the result.

and arrive, for large $N$, at

$$E[H_N] \gtrsim \frac{\log N + \log\frac{(\mu-1)^2}{\mu^2} - \gamma}{\log\mu} + \frac12$$

This shows that in large, sparse graphs for which the discovery process is well modeled by a branching process, $E[H_N]$ scales as $\frac{\log N}{\log\mu}$, where $\mu = E[Y]-1 > 1$ is the average degree minus 1 in the graph. We can refine the above analysis. Let us now assume that the convergence of $W_k\to W$ is sufficiently fast for large $N$ and that $W>0$, such that

$$|C_A(l)| \approx W_A\sum_{k=0}^{l}\mu^k = W_A\,\frac{\mu^{l+1}-1}{\mu-1} \approx \frac{W_A\,\mu^{l+1}}{\mu-1}$$

is a good approximation (and similarly for $|C_B(l)|$).
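Two properties of $S_1(t)=\sum_{k\ge0}e^{-t\mu^k}$ are easy to verify by direct summation: the exact self-similarity $S_1(t\mu)=S_1(t)-e^{-t}$ (shift the index $k$ by one), which is what produces the slope $-1/\log\mu$ in $\log t$ behind (15.24), and the logarithmic growth itself. A sketch (not from the book):

```python
import math

def S1(t, mu, kmax=200):
    """Direct evaluation of S1(t) = sum_{k>=0} exp(-t mu^k)."""
    total, x = 0.0, t
    for _ in range(kmax):
        total += math.exp(-x)
        x *= mu
        if x > 700.0:      # remaining terms are below exp(-700): negligible
            break
    return total

mu, t = 3.0, 1e-6
# S1(t*mu) == S1(t) - exp(-t): each factor mu in t removes one ~unit term,
# so S1(t) grows like -log t / log mu for small t, consistent with (15.24)
```

The identity holds to machine precision, while the crude estimate $-\log t/\log\mu$ is correct up to an $O(1)$ constant.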
The verification of this approximation is difficult in general. Theorem 12.3.2 states that $\Pr[W=0]=\pi_0$ and, equivalently, $\Pr[W>0]=1-\pi_0$, where the extinction probability $\pi_0$ obeys the equation (12.13). Using this approximation, we find from (15.23)

$$\Pr[H_N>2l] \approx E\left[\exp\left(-\frac{W_A W_B\,\mu^{2l+2}}{N(\mu-1)^2}\right)\,\middle|\, W_A, W_B>0\right]$$

where the condition on $W>0$ is required, for otherwise there are no clusters $C_A(l)$ and $C_B(l)$, nor a path. Since the same asymptotics also holds for odd values of the hopcount, we finally arrive, for $k\ge1$ and large $N$, at

$$\Pr[H_N>k] \approx E\left[\exp\left(-Z\mu^k\right)\,\middle|\, W_A, W_B>0\right]$$

where the random variable

$$Z = \frac{\mu^2\,W_A W_B}{N(\mu-1)^2}$$

and $W_A \stackrel{d}{=} W_B \stackrel{d}{=} W$. A more explicit computation of $\Pr[H_N>k]$ requires the knowledge of the limit random variable $W$, which strongly depends on the nodal degree $Y$. The average hopcount $E[H_N]$ is found similarly as in the analysis above by using (15.24) with $t=Z$,

$$E[H_N] \approx E\left[S_1(Z)\mid W_A, W_B>0\right] = E\left[\frac{\log N + 2\log\frac{\mu-1}{\mu} - \gamma - \log W_A - \log W_B}{\log\mu} + \frac12\,\middle|\, W>0\right] = \frac{\log N + 2\log\frac{\mu-1}{\mu} - \gamma}{\log\mu} - \frac{2\,E[\log W\mid W>0]}{\log\mu} + \frac12$$

In sparse graphs with average degree $E[Y]$ equal to $\mu+1$ and for a large number of nodes $N$, the average hopcount is well approximated¹² by

$$E[H_N] = \frac{\log N + 2\log\frac{\mu-1}{\mu} - \gamma - 2\,E[\log W\mid W>0]}{\log\mu} + \frac12 \qquad (15.25)$$

This expression (15.25) for the average hopcount — which is more refined than the commonly used estimate $E[H_N]\approx\frac{\log N}{\log\mu}$ — contains the curious average $E[\log W\mid W>0]$, where $W$ is the limit random variable of the branching process produced by the graph's degree distribution $Y$.

Application to $G_p(N)$

The above analysis holds for fixed $E[Y]=p(N-1)$ such that, for large $N$, we require that $p\approx\frac{\mu}{N}$, where $\mu$ is approximately equal to the average degree. Since the binomial distribution (15.11) for the degree in $G_p(N)$ is very well approximated by the Poisson distribution $\Pr[D_{rg}=k]\approx\frac{\mu^k}{k!}e^{-\mu}$
for large $N$ and constant $\mu$, formula (15.25) requires the computation of $E[\log W\mid W>0]$ in a Poisson branching process, which is presented in Hooghiemstra and Van Mieghem (2005) but summarized here in Fig. 15.8.

[Figure 15.8]

Fig. 15.8. The quantity $E[\log W\mid W>0]$ of a Poisson branching process versus the average degree $\mu$.

The numerical evaluation of the average hopcount (15.25) in a random graph of the class $G_p(N)$ for small average degree and large $N$ shows that (15.25) is much more accurate than only its first term $\frac{\log N}{\log\mu}$. At the other end of the scale, for a constant link density $p=c<1$, which corresponds to an average degree $E[Y]=c(N-1)$, the above analysis no longer applies for such large values of the average degree $E[Y]$. Fortunately, in that case, an exact asymptotic analysis is possible (see Problem (iii)):

$$\Pr[H_N=1] = p$$
$$\Pr[H_N=2] = (1-p)\left(1-(1-p^2)^{N-2}\right) \qquad (15.26)$$

Values of $H_N$ higher than 2 are extremely unlikely since $\Pr[H_N>2] = (1-p)\left[1-p^2\right]^{N-2}$ tends to zero rapidly for sufficiently large $N$. Hence,

$$E[H_N] \simeq \Pr[H_N=1] + 2\Pr[H_N=2] \simeq 2-p$$

and, similarly, we find $\operatorname{Var}[H_N]\simeq p(1-p)$. This asymptotic analysis even holds for a larger link density regime $p = cN^{-\frac12+\epsilon}$ with $\epsilon>0$, because

$$\lim_{N\to\infty}\Pr[H_N>2] = \lim_{N\to\infty}\left(1-cN^{-\frac12+\epsilon}\right)\left[1-c^2N^{-1+2\epsilon}\right]^{N-2} = 0$$

but for $\epsilon=0$, it holds that $\lim_{N\to\infty}\Pr[H_N>2] = e^{-c^2} > 0$.

¹² A more rigorous derivation that stochastically couples the graph's growth specified by a certain degree distribution to a corresponding branching process is found in van der Hofstad et al. (2005). In particular, the analysis is shown to be valid for any randomly constructed graph with a finite variance of the degree. More details on the result for the average hopcount are presented in Hooghiemstra and Van Mieghem (2005).
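The dense-regime result (15.26) can be checked arithmetically: the three probabilities sum to one exactly, the tail $\Pr[H_N>2]$ is already negligible for moderate $N$, and the mean collapses to $2-p$ (a sketch, not from the book):

```python
# Dense regime, constant link density p: the hopcount is almost surely 1 or 2.
N, p = 100, 0.3
pr1 = p                                            # Pr[H_N = 1], eq. (15.26)
pr2 = (1 - p) * (1 - (1 - p**2) ** (N - 2))        # Pr[H_N = 2], eq. (15.26)
pr_gt2 = (1 - p) * (1 - p**2) ** (N - 2)           # Pr[H_N > 2], vanishing tail
mean_h = pr1 + 2 * pr2                             # E[H_N], ignoring the tail
```

Here `pr1 + pr2 + pr_gt2` equals one by construction, `pr_gt2` is of order $(1-p^2)^{N-2}\approx e^{-(N-2)p^2}$, and `mean_h` is within $10^{-3}$ of $2-p$.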
In summary, if the link density $p$ scales as $p = cN^{-\beta}$ with $\beta\in[0,\frac12)$, the average hopcount $E[H_N]\simeq 2-p$ is constant and very small. If $p=\frac{\mu}{N-1}$, equation (15.25) shows that $E[H_N]\sim\log N$. The regime in between, for $\beta\in[\frac12,1)$, needs other analysis techniques.

15.8 Problems

(i) An extremely regular graph is a $d$-lattice, where each nodal position corresponds to a point with integer coordinates within a $d$-dimensional hyper-cube with size $z$. Apart from border nodes, each node has a constant degree (number of neighbors), precisely equal to $2d$. Assuming that all link metrics are equal to one, compute the probability generating function of the hopcount of the shortest path between two uniformly chosen points.

(ii) If $c_{G_p(N)}$ is the clustering coefficient of the random graph $G_p(N)$, then compute $\Pr\left[c_{G_p(N)}\le x\right]$ and $E\left[c_{G_p(N)}\right]$.

(iii) Derive (15.26) in $G_p(N)$ with unit link weights.

16 The Shortest Path Problem

The shortest path problem asks for the computation of the path from a source to a destination node that minimizes the sum of the positive weights¹ of its constituent links. The related shortest path tree (SPT) is the union of the shortest paths from a source node to a set of $m$ other nodes in the graph with $N$ nodes. If $m=N-1$, the SPT connects all nodes and is termed a spanning tree. The SPT belongs to the fundamentals of graph theory and has many applications. Moreover, powerful shortest path algorithms like that of Dijkstra exist. Section 15.7 studied the hopcount, the number of hops (links) in the shortest path, in sparse graphs with unit link weights. In this chapter, the influence of the link weight structure on the properties of the SPT will be analyzed. Starting from one of the simplest possible graph models, the complete graph with i.i.d. exponential link weights, the characteristics of the shortest path will be derived and compared to Internet measurements. The link weights seriously impact the path properties in QoS routing (Kuipers and Van Mieghem, 2003).
In addition, from a traffic engineering perspective, an ISP may want to tune the weight of each link such that the resulting shortest paths between a particular set of in- and egresses follow the desirable routes in its network. Thus, apart from the topology of the graph, the link weight structure clearly plays an important role. Often, as in the Internet or other large infrastructures, both the topology and the link weight structure are not accurately known. This uncertainty about the precise structure leads us to consider both the underlying graph and each of the link weights as random variables.

¹ A zero link weight is regarded as the coincidence of two nodes (which we exclude), while an infinite link weight means the absence of a link.

16.1 The shortest path and the link weight structure

Since the shortest path is mainly sensitive to the smaller, positive link weights, the probability distribution of the link weights around zero will dominantly influence the properties of the resulting shortest path. A regular link weight distribution $F_w(x) = \Pr[w\le x]$ has a Taylor series expansion around $x=0$,

$$F_w(x) = f_w(0)\,x + O\left(x^2\right)$$

since $F_w(0)=0$ and $F_w'(0)=f_w(0)$ exists. A regular link weight distribution is thus linear around zero. The factor $f_w(0)$ only scales all link weights, but does not influence the shortest path. The simplest distribution of the link weight $w$ with a distinctly different behavior for small values is the polynomial distribution

$$F_w(x) = x^{\alpha}\,1_{x\in[0,1]} + 1_{x\in(1,\infty)}, \qquad \alpha>0 \qquad (16.1)$$

The corresponding density is $f_w(x) = \alpha x^{\alpha-1}\,1_{x\in[0,1]}$. The exponent

$$\alpha = \lim_{x\downarrow0}\frac{\log F_w(x)}{\log x}$$

is called the extreme value index of the probability distribution of $w$, and $\alpha=1$ for regular distributions. By varying the exponent $\alpha$ over all non-negative real values, any extreme value index can be attained and a large class of corresponding SPTs, in short $\alpha$-trees, can be generated.

[Figure 16.1]

Fig. 16.1.
A schematic drawing of the distribution of the link weights for the three different $\alpha$-regimes ($\alpha<1$, $\alpha=1$, $\alpha>1$). The shortest path problem is mainly sensitive to the small region around zero. The scaling-invariant property of the shortest path allows us to divide all link weights by the largest possible one, such that $F_w(1)=1$ for all link weight distributions.

Figure 16.1 illustrates schematically the probability distribution of the link weights in the region $(0,\epsilon]$ around zero, where $\epsilon>0$ is an arbitrarily small, positive real number. The larger link weights in the network will hardly appear in a shortest path, provided the network possesses enough links. These larger link weights are drawn in Fig. 16.1 from the double dotted line to the right. The nice advantage that only small link weights dominantly influence the property of the resulting shortest path tree implies that the remainder of the link weight distribution (denoted by the arrow with larger scale in Fig. 16.1) only plays a second-order role. To some extent, it also explains the success of the simple SPT model based on the complete graph $K_N$ with i.i.d. exponential link weights, which we derive in Section 16.2. A link weight structure effectively thins the complete graph $K_N$ — any other graph is a subgraph of $K_N$ — to the extent that a specific shortest path tree can be constructed. Finally, we assume the independence of link weights, which we deem a reasonable assumption in large networks, such as the Internet with its many independent autonomous systems (ASs). Apart from Section 16.7, we will mainly consider the case $\alpha=1$, which allows an exact analysis.

16.2 The shortest path tree in $K_N$ with exponential link weights

16.2.1 The Markov discovery process

Let us consider the shortest path problem in the complete graph $K_N$, where each node in the graph is connected to each other node.
The problem of finding the shortest path between two nodes $A$ and $B$ in $K_N$ with exponentially distributed link weights with mean 1 can be rephrased in terms of a Markov discovery process. The discovery process evolves as a function of time and stops at a random time $T$ when node $B$ is found. The process is shown in Fig. 16.2. The evolution of the discovery process can be described by a continuous-time Markov chain $X(t)$, where $X(t)$ denotes the number of discovered nodes at time $t$, because the characteristics of a Markov chain (Theorem 10.2.3) are based on the exponential distribution and the memoryless property. Of particular interest here is the property (see Section 3.4.1) that the minimum of $n$ independent exponential variables, each with parameter $\alpha_i$, is again an exponential variable with parameter $\sum_{i=1}^{n}\alpha_i$. The discovery process starts at time $t=T_0$ with the source node $A$, and for the initial distribution of the Markov chain we have $\Pr[X(T_0)=1]=1$. The state space of the continuous-time Markov chain is the set $S_N$ consisting of all positive integers (nodes) $n$ with $n\le N$. For the complete graph $K_N$, the transition rates are given by

$$\lambda_n = n(N-n), \qquad n\in S_N \qquad (16.2)$$

Indeed, initially there is only the source node $A$ with label² 0, hence $n=1$. From this first node $A$, precisely $N-1$ new nodes can be reached in the complete graph $K_N$. Alternatively, one can say that $N-1$ nodes are competing with each other, each with exponentially distributed strength, to be discovered, and the winner amongst them, say $C$ with label 1, is the one reached in the shortest time, which corresponds to an exponential variable with rate $N-1$.

[Figure 16.2]

Fig. 16.2. On the left, the Markov discovery process as a function of time in a graph with $N=9$ nodes.
The circles centered at the discovering node $A$ with label 0 represent equi-time lines, and $v_k$ is the discovery time of the $k$-th node, while $\tau_k = v_k - v_{k-1}$ is the $k$-th inter-attachment time. The set of discovered nodes redrawn per level is shown on the right, where a level gives the number of hops $h$ from the source node $A$. The tree is a uniform recursive tree (URT).

² When continuous measures such as time and weight of a path are computed, the source node is most conveniently labeled by zero, whereas in counting processes, such as the number of hops of a path, the source node is labeled by one.

After having reached $C$ from $A$ at hitting time $v_1$, two nodes ($n=2$) are found and the discovery process restarts from both $A$ and $C$. Although at time $v_1$ we had already progressed a certain distance towards each of the $N-2$ other, not yet discovered, nodes, the memoryless property of the exponential distribution tells us that the remaining distance to these $N-2$ nodes is again exponentially distributed with the same parameter 1. Hence, this allows us to restart the process from $A$ and $C$ by erasing the previously travelled partial distance to any other not yet discovered node, as if we ignore that it was ever travelled. From the discovery time $v_1$ of the first node on, the discovery process has double strength to reach precisely $N-2$ new nodes. Hence, the next winner, say $D$ labeled by 2, is reached at $v_2$ in the minimum time out of $2(N-2)$ traveling times. This node $D$ has equal probability to be attached to $A$ or $C$ because of symmetry. When $D$ is attached to $A$ (the argument below holds similarly for attachment to $C$), symmetry appears to be broken, because $D$ and $C$ have only one link used, whereas $A$ has already two links used.
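As an aside, the rates (16.2) fix the expected discovery times exactly: the $n$-th inter-attachment time $\tau_n$ is exponential with rate $n(N-n)$, so the expected time to discover all $N$ nodes is $\sum_{n=1}^{N-1}\frac{1}{n(N-n)} = \frac{2}{N}H_{N-1}$, by the partial fraction $\frac{1}{n(N-n)}=\frac1N\left(\frac1n+\frac{1}{N-n}\right)$. A small exact-arithmetic check (not from the book):

```python
from fractions import Fraction

def mean_full_discovery_time(N):
    """E[sum of inter-attachment times] with rates lambda_n = n(N-n) from (16.2);
    the mean of an Exp(lambda) variable is 1/lambda."""
    return sum(Fraction(1, n * (N - n)) for n in range(1, N))

def harmonic(N):
    """Harmonic number H_N = sum_{k=1}^N 1/k."""
    return sum(Fraction(1, k) for k in range(1, N + 1))
```

The identity `mean_full_discovery_time(N) == Fraction(2, N) * harmonic(N - 1)` holds for every `N`, confirming that full discovery of $K_N$ takes expected time of order $\frac{2\log N}{N}$.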
However, since we are interested in the shortest path problem, and since the direct link from $A$ to $D$ is shorter than the path $A\to C\to D$, we exclude the latter in the discovery process, hereby re-establishing the full symmetry in the Markov chain. This exclusion also means that the Markov chain maintains single paths from $A$ to each newly discovered node, and this path is also the shortest path. Hence, no cycles are possible. Furthermore, similar to Dijkstra's shortest path algorithm, each newly reached node is withdrawn from the next competition round, which guarantees that the Markov chain eventually terminates. Besides terminating by extinction of all available nodes, after each transition when a new node is discovered, the Markov chain stops with probability equal to $\frac{1}{N-n}$, since each of the $n$ already discovered nodes has precisely 1 possibility out of the remaining $N-n$ to reach $B$, and only one of them is the discoverer. The stopping time $T$ is defined as the infimum over $t\ge0$ at which the destination node $B$ is discovered. In summary, the described Markov discovery process, a pure birth process with birth rate $\lambda_n = n(N-n)$, models the shortest path exactly for all values of $N$.

16.2.2 The uniform recursive tree

A uniform recursive tree (URT) of size $N$ is a random tree rooted at $A$. At each stage a new node is attached uniformly to one of the existing nodes until the total number of nodes is equal to $N$. The hopcount $h_N$ (equivalent to the depth or distance) is the smallest number of links between the root $A$ and a destination chosen uniformly from all nodes $\{1,2,\ldots,N\}$. Denote by $\left\{X_N^{(k)}\right\}$ the $k$-th level set of a tree $T$, which is the set of nodes in the tree $T$ at hopcount $k$ from the root $A$ in a graph with $N$ nodes, and by $X_N^{(k)}$ the number of elements in that set. Then, we have $X_N^{(0)}=1$, because the zeroth level can only contain the root node $A$ itself.
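The URT growth rule — each new node attaches to a uniformly chosen existing node — and its level sets can be sketched directly (plain Python, not from the book; the predecessor representation anticipates (16.5)):

```python
import random

def random_urt(N, rng):
    """Grow a URT: node j (j = 2..N) attaches to a uniform node in [1, j-1]."""
    pred = {1: None}                       # node 1 is the root A
    for j in range(2, N + 1):
        pred[j] = rng.randint(1, j - 1)
    return pred

def level_sizes(pred):
    """X_N^(k): number of nodes at hopcount k from the root."""
    depth = {1: 0}
    for j in sorted(pred):                 # parents have smaller labels, so
        if j != 1:                         # they are always processed first
            depth[j] = depth[pred[j]] + 1
    sizes = {}
    for d in depth.values():
        sizes[d] = sizes.get(d, 0) + 1
    return sizes

rng = random.Random(7)
ls = level_sizes(random_urt(26, rng))
```

Whatever the random draws, the level sizes sum to $N$ (this is (16.3) below), level 0 holds only the root, and the occupied levels are contiguous: an empty level cannot be followed by a non-empty one.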
For all $k > 0$, it holds that $0 \le X_N^{(k)} \le N-1$ and that

$$\sum_{k=0}^{N-1} X_N^{(k)} = N \qquad (16.3)$$

Another consequence of the definition is that, if $X_N^{(n)} = 0$ for some level $n < N-1$, then $X_N^{(j)} = 0$ for all levels $j > n$. In such a case, the longest possible shortest path in the tree has a hopcount of $n$. The level set

$$L_N = \left\{1, X_N^{(1)}, X_N^{(2)}, \ldots, X_N^{(N-1)}\right\}$$

of a tree $T$ is defined as the set containing the number of nodes $X_N^{(k)}$ at each level $k$. An example of a URT organized per level $k$ is drawn on the right in Fig. 16.2 and in Fig. 16.3. A basic theorem for URTs, proved in van der Hofstad et al. (2002b), is the following:

Theorem 16.2.1 Let $\{Y_N^{(k)}\}_{k,N \ge 0}$ and $\{Z_N^{(k)}\}_{k,N \ge 0}$ be two independent copies of the vector of level sets of two sequences of independent URTs. Then

$$\{X_N^{(k)}\}_{k \ge 0} \overset{d}{=} \{Y_{N_1}^{(k-1)} + Z_{N-N_1}^{(k)}\}_{k \ge 0} \qquad (16.4)$$

where on the right-hand side the random variable $N_1$ is uniformly distributed over the set $\{1, 2, \ldots, N-1\}$.

Theorem 16.2.1 also implies that a subtree rooted at a direct child of the root is a URT. For example, in Fig. 16.3, the tree rooted at node 5 is a URT of size 13, and so is the original tree without the tree rooted at node 5. By applying Theorem 16.2.1 to the URT subtrees, any subtree rooted at a member of a URT is also a URT. An arbitrary URT $U$ consisting of $N$ nodes and with the root labeled by 1 can be represented as

$$U = (n_2 \to 2)(n_3 \to 3)\cdots(n_N \to N) \qquad (16.5)$$

where $(n_j \to j)$ means that the $j$-th node is attached to node $n_j \in [1, j-1]$ and $n_2 = 1$. Hence, $n_j$ is the predecessor of $j$ and the predecessor relation is indicated by the arrow "→". Moreover, $n_j$ is a discrete uniform random variable on $[1, j-1]$ and all $n_2, n_3, \ldots, n_N$ are independent.

Fig. 16.3. An instance of a uniform recursive tree with $N = 26$ nodes organized per level $0 \le k \le 4$; the level sizes are $X_N^{(0)} = 1$, $X_N^{(1)} = 5$, $X_N^{(2)} = 9$, $X_N^{(3)} = 7$ and $X_N^{(4)} = 4$.
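The growth law and the level-set identity (16.3) are easy to check in simulation. The following sketch (not from the book) grows a URT via the representation (16.5), attaching node $j$ to a uniform predecessor $n_j \in [1, j-1]$, and verifies that the zeroth level holds only the root and that the level sizes sum to $N$:

```python
import random

def random_urt(N, rng):
    """Grow a URT via (16.5): node j attaches to a uniform node n_j in [1, j-1]."""
    parent = {1: None}
    for j in range(2, N + 1):
        parent[j] = rng.randint(1, j - 1)
    return parent

def level_sizes(parent):
    """X_N^(k): number of nodes at hopcount k from the root."""
    depth = {1: 0}
    for j in sorted(parent):
        if j != 1:
            depth[j] = depth[parent[j]] + 1
    sizes = {}
    for d in depth.values():
        sizes[d] = sizes.get(d, 0) + 1
    return sizes

rng = random.Random(42)
X = level_sizes(random_urt(26, rng))
print(X[0], sum(X.values()))  # 1 26  -- only the root at level 0, and (16.3) holds
```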
The node number (inside the circle) indicates the order in which the nodes were attached to the tree.

Theorem 16.2.2 The total number of URTs with $N$ nodes is $(N-1)!$

Proof: (a) Let the nodes be labeled in the order of attachment to the URT and assign label 1 to the root. The URT growth law indicates that node 2 can only be attached in one way, and node 3 in two ways, namely to node 1 or to node 2 with equal probability. In general, the $k$-th node can be attached to $k-1$ possible nodes. Each of these possible constructions leads to a URT. (b) By summing over all allowable configurations in (16.5), we obtain

$$\sum_{n_2=1}^{1} \sum_{n_3=1}^{2} \cdots \sum_{n_N=1}^{N-1} 1 = (N-1)!$$

and this proves the theorem. □

In general, Cayley's Theorem (Appendix B.1, art. 3) states that $N^{N-2}$ labeled trees are possible. The set of URTs is a subset of the set of all possible labeled trees. Not all labeled trees are URTs, because nodes that are further away from the root must have larger labels.

The shortest path tree from the source or root A to the other nodes in the complete graph is the tree associated with the Markov discovery process, where the number of nodes $X(t)$ at time $t$ is constructed as follows. Just as the discovery process, the associated tree starts at the root A. We now investigate the embedded Markov chain (Section 10.4) of the continuous-time discovery process. After each transition $X(t) \to X(t)+1$ in the continuous-time Markov chain, an edge of unit length is attached randomly to one of the $n$ already discovered nodes in the associated tree, because a new edge is equally likely to be attached to any of the $n$ discovering nodes. Hence, the construction of the tree associated with the Markov discovery process, illustrated on the right in Fig. 16.2, demonstrates that the shortest path tree in the complete graph $K_N$ with exponential link weights is a uniform recursive tree.
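Theorem 16.2.2 can also be confirmed by brute force for small sizes. This sketch (not from the book) enumerates every predecessor vector of the representation (16.5); the number of distinct URTs is the product $1 \cdot 2 \cdots (N-1) = (N-1)!$:

```python
from itertools import product
from math import factorial

def all_urts(N):
    """Enumerate every URT on N labeled nodes via the representation (16.5):
    a URT is fully determined by the predecessor choices n_j in [1, j-1]."""
    choices = [range(1, j) for j in range(2, N + 1)]
    return [tuple(pred) for pred in product(*choices)]

for N in range(2, 7):
    assert len(all_urts(N)) == factorial(N - 1)
print(len(all_urts(6)))  # 120 = 5!
```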
This property of the shortest path tree in $K_N$ with exponential link weights is an important motivation to study the URT. More generally, in van der Hofstad et al. (2001) we have proved that, for a fixed link density $p$ and sufficiently large $N$, the shortest path tree in the class RGU, the class of random graphs $G_p(N)$ with exponentially or uniformly distributed link weights, is a URT. Smythe and Mahmoud (1995) have reviewed a number of results on recursive trees that appeared in the literature from the late 1960s up to 1995.

16.3 The hopcount $h_N$ in the URT

16.3.1 Theory

The hopcount $h_N$ from the root to an arbitrarily chosen node in the URT equals the number of links or hops from the root to that node. We allow the arbitrary node to coincide with the root, in which case $h_N = 0$.

Theorem 16.3.1 The probability generating function of the hopcount in the URT with $N$ nodes is

$$\varphi_{h_N}(z) = E\left[z^{h_N}\right] = \frac{\Gamma(N+z)}{\Gamma(N+1)\,\Gamma(z+1)} \qquad (16.6)$$

Proof: Since the number of nodes at hopcount $k$ from the root (or at level $k$) is $X_N^{(k)}$, a node uniformly chosen out of the $N$ nodes in the URT has probability $E\left[X_N^{(k)}\right]/N$ of having hopcount $k$,

$$\Pr[h_N = k] = \frac{E\left[X_N^{(k)}\right]}{N} \qquad (16.7)$$

If the size of the URT grows from $n$ to $n+1$ nodes, each node at hopcount $k-1$ from the root can generate a node at hopcount $k$ with probability $1/n$. Hence, for $k \ge 1$,

$$E\left[X_N^{(k)}\right] = \sum_{n=k}^{N-1} \frac{E\left[X_n^{(k-1)}\right]}{n}$$

With (16.7), a recursion for $\Pr[h_N = k]$ follows for $k \ge 1$ as

$$\Pr[h_N = k] = \frac{1}{N}\sum_{n=k}^{N-1} \Pr[h_n = k-1]$$

The generating function of $h_N$ equals

$$\varphi_{h_N}(z) = E\left[z^{h_N}\right] = \Pr[h_N = 0] + \sum_{k=1}^{N-1}\Pr[h_N = k]\,z^k = \frac{1}{N} + \frac{1}{N}\sum_{k=1}^{N-1}\sum_{n=k}^{N-1}\Pr[h_n = k-1]\,z^k$$
$$= \frac{1}{N} + \frac{1}{N}\sum_{n=1}^{N-1}\sum_{k=1}^{n}\Pr[h_n = k-1]\,z^k = \frac{1}{N} + \frac{z}{N}\sum_{n=1}^{N-1}\varphi_{h_n}(z)$$

Taking the difference between $(N+1)\varphi_{h_{N+1}}(z)$ and $N\varphi_{h_N}(z)$ results in the recursion

$$(N+1)\,\varphi_{h_{N+1}}(z) = (N+z)\,\varphi_{h_N}(z)$$

Iterating this recursion starting from $\varphi_{h_1}(z) = E\left[z^{h_1}\right] = E\left[z^0\right] = 1$ leads to (16.6).
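The recursion in the proof above lends itself to exact computation. The sketch below (not from the book, exact rational arithmetic) builds $\Pr[h_n = k]$ for $n = 1, \ldots, N$ and checks normalization together with the mean $\sum_{l=2}^{N} 1/l$, in line with (16.9) further on:

```python
from fractions import Fraction

def hopcount_pmf(N):
    """Pr[h_N = k] via the recursion of Theorem 16.3.1:
    Pr[h_n = k] = (1/n) * sum_{m=k}^{n-1} Pr[h_m = k-1], Pr[h_n = 0] = 1/n."""
    pmf = {1: {0: Fraction(1)}}
    for n in range(2, N + 1):
        row = {0: Fraction(1, n)}
        for k in range(1, n):
            row[k] = Fraction(1, n) * sum(pmf[m].get(k - 1, Fraction(0))
                                          for m in range(k, n))
        pmf[n] = row
    return pmf[N]

N = 12
p = hopcount_pmf(N)
mean = sum(k * pk for k, pk in p.items())
harmonic = sum(Fraction(1, l) for l in range(2, N + 1))
print(sum(p.values()) == 1, mean == harmonic)  # True True
```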
□

Corollary 16.3.2 The probability density function of the hopcount in the URT with $N$ nodes is

$$\Pr[h_N = k] = \frac{(-1)^{N-(k+1)}\,S_N^{(k+1)}}{N!} \qquad (16.8)$$

Proof: The probability generating function $\varphi_{h_N}(z)$ in (16.6) is also the generating function of the Stirling numbers $S_N^{(k)}$ of the first kind (Abramowitz and Stegun, 1968, Section 24.1.3), such that the probability that a uniformly chosen node in the URT has hopcount $k$ equals (16.8). □

The explicit form of the generating function shows that the average hopcount $h_N$ in a URT of size $N$ equals

$$E[h_N] = \varphi'_{h_N}(1) = \left.\frac{d}{dz}\log\varphi_{h_N}(z)\right|_{z=1} = \sum_{l=2}^{N}\frac{1}{l} = \psi(N+1) + \gamma - 1 \qquad (16.9)$$

where $\psi(z) = \frac{\Gamma'(z)}{\Gamma(z)}$ is the digamma function (Abramowitz and Stegun, 1968, Section 6.3) and $\gamma = 0.57721\ldots$ is the Euler constant. Similarly, the variance (2.27) follows from the logarithm of the generating function

$$L_{h_N}(z) = \log\Gamma(N+z) - \log\Gamma(N+1) - \log\Gamma(z+1)$$

as

$$\operatorname{Var}[h_N] = \psi'(N+1) - \psi'(2) + \psi(N+1) + \gamma - 1 = \psi(N+1) + \gamma - \frac{\pi^2}{6} + \psi'(N+1)$$

Using the asymptotic formulae for the digamma function leads to

$$E[h_N] = \log N + \gamma - 1 + O\!\left(\frac{1}{N}\right) \qquad (16.10)$$
$$\operatorname{Var}[h_N] = \log N + \gamma - \frac{\pi^2}{6} + O\!\left(\frac{1}{N}\right) \qquad (16.11)$$

For large $N$, we apply an asymptotic formula for the Gamma function (Abramowitz and Stegun, 1968, Section 6.1.47) to the generating function of the hopcount (16.6),

$$\varphi_{h_N}(z) = \frac{N^{z-1}}{\Gamma(z+1)}\left(1 + O\!\left(\frac{1}{N}\right)\right)$$

Introducing the Taylor series $\frac{1}{\Gamma(z)} = \sum_{k=1}^{\infty} c_k z^k$, where the coefficients $c_k$ are listed in Abramowitz and Stegun (1968, Section 6.1.34), we obtain with $N^z = e^{z\log N}$,

$$\varphi_{h_N}(z) = \frac{1}{N}\left(1 + O\!\left(\frac{1}{N}\right)\right)\sum_{k=0}^{\infty}\frac{\log^k N}{k!}z^k \sum_{k=1}^{\infty} c_k z^{k-1} = \frac{1 + O\!\left(\frac{1}{N}\right)}{N}\sum_{k=0}^{\infty} z^k \sum_{m=0}^{k} c_{m+1}\frac{\log^{k-m} N}{(k-m)!}$$

With the definition (2.18) of the probability generating function, we conclude that the asymptotic form of the probability density function (16.8) of the hopcount in the URT is

$$\Pr[h_N = k] = \frac{1 + O\!\left(\frac{1}{N}\right)}{N}\sum_{m=0}^{k} c_{m+1}\frac{\log^{k-m} N}{(k-m)!} \qquad (16.12)$$
Since the coefficients $c_k$ decrease rapidly, approximating the sum in (16.12) by its first term ($m = 0$, with $c_1 = 1$) yields, to first order in $N$,

$$\Pr[h_N = k] \approx \frac{(\log N)^k}{N\,k!} \qquad (16.13)$$

which is recognized as a Poisson distribution (3.9) with mean $\log N$. Hence, for large $N$ and to first order, the average and the variance of the hopcount in the URT are approximately $E[h_N] \approx \operatorname{Var}[h_N] \approx \log N$. The accuracy of the Poisson approximation can be estimated by comparison with the average (16.10) and the variance (16.11) found above, which are correct up to second order in $N$. For example, if the URT has $N = 10^4$ nodes, the Poisson approximation yields $E[h_N] = \operatorname{Var}[h_N] = 9.21034$, while the average (16.10) is $E[h_N] = 8.78756$, accurate up to $10^{-4}$, and the variance (16.11) is $\operatorname{Var}[h_N] = 8.14262$. The exact results are $E[h_N] = 8.78761$ and $\operatorname{Var}[h_N] = 8.14277$.

16.3.2 Application of the URT to the hopcount in the Internet

In the trace-route measurements explained in Van Mieghem (2004a), we are interested in the hopcount $H_N$, denoted with capital $H$, which equals $h_N$ in the URT excluding the event $h_N = 0$. In other words, the source and the destination are different nodes in the graph. Since from (16.8)

$$\Pr[h_N = 0] = \frac{(-1)^{N-1} S_N^{(1)}}{N!} = \frac{1}{N}$$

we obtain, for $1 \le k \le N-1$,

$$\Pr[H_N = k] = \Pr[h_N = k \mid h_N \ne 0] = \frac{\Pr[h_N = k,\ h_N \ne 0]}{\Pr[h_N \ne 0]} = \frac{N}{N-1}\Pr[h_N = k]$$

Using (16.8), we find

$$\Pr[H_N = k] = \frac{N}{N-1}\,\frac{(-1)^{N-(k+1)}\,S_N^{(k+1)}}{N!} \qquad (16.14)$$

with corresponding generating function

$$\varphi_{H_N}(z) = \sum_{k=1}^{N-1}\Pr[H_N = k]\,z^k = \frac{N}{N-1}\sum_{k=0}^{N-1}\Pr[h_N = k]\,z^k - \frac{N}{N-1}\Pr[h_N = 0] = \frac{N}{N-1}\left(\varphi_{h_N}(z) - \frac{1}{N}\right)$$

The average hopcount $E[H_N] = E[h_N \mid h_N \ne 0]$ is

$$E[H_N] = \frac{N}{N-1}\sum_{l=2}^{N}\frac{1}{l} \qquad (16.15)$$

Hence, for large $N$ and in practice, we find that

$$\Pr[H_N = k] = \Pr[h_N = k] + O\!\left(\frac{1}{N}\right)$$

which allows us to use the previously derived expressions (16.12), (16.10) and (16.11).
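The quoted values for $N = 10^4$ can be reproduced with a few lines of arithmetic. This sketch (not from the book) computes the harmonic number directly and approximates the trigamma tail $\psi'(N+1) = \sum_{j>N} 1/j^2$ by Euler–Maclaurin:

```python
import math

N = 10_000
H = sum(1.0 / l for l in range(1, N + 1))        # harmonic number H_N
gamma = 0.5772156649015329                       # Euler's constant
mean_exact = H - 1                               # psi(N+1) + gamma - 1
mean_asymp = math.log(N) + gamma - 1             # (16.10)
# trigamma tail psi'(N+1) = sum_{j > N} 1/j^2, by Euler-Maclaurin
trigamma = 1.0 / N - 1.0 / (2 * N**2) + 1.0 / (6 * N**3)
var_exact = H - math.pi**2 / 6 + trigamma
var_asymp = math.log(N) + gamma - math.pi**2 / 6  # (16.11)
poisson = math.log(N)                            # first-order Poisson value
print(round(poisson, 5), round(mean_asymp, 5), round(mean_exact, 5))
# 9.21034 8.78756 8.78761
print(round(var_asymp, 5), round(var_exact, 5))
# 8.14262 8.14277
```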
The histogram of the number of traversed routers in the Internet, measured between two arbitrary communicating parties, seems reasonably well modeled by the pdf (16.12). Figure 16.4 shows both the histogram of the hopcount deduced from paths in the Internet measured via the trace-route utility and the fit with (16.12).

Fig. 16.4. The histograms of the hopcount derived from the trace-route measurements in three continents (Asia, Europe, USA) from CAIDA in 2004, fitted by the pdf (16.12) of the hopcount in the URT with $\log(N_{\mathrm{Asia}}) = 13.5$, $\log(N_{\mathrm{Europe}}) = 12.6$ and $\log(N_{\mathrm{USA}}) = 12.9$.

From the fit, we find a rather high number of nodes, $e^{12.6} \approx 3\times 10^5 \le N \le e^{13.5} \approx 7\times 10^5$, which points to the approximate nature of modeling the Internet hopcount by that deduced from a URT. The relation between Internet measurements and the properties of the URT is further analyzed in a series of articles (Van Mieghem et al., 2000; van der Hofstad et al., 2001; Van Mieghem et al., 2001b; Janic et al., 2002; van der Hofstad et al., 2002b). At the time of writing, an accurate model of the hopcount in the Internet is not available.

16.4 The weight of the shortest path

The weight (sometimes also called the length) of the shortest path is defined as the sum of the link weights that constitute the shortest path. In Section 16.2.1, the shortest path tree in the complete graph with exponential link weights was shown to be a URT. In this section, we confine ourselves to the same type of graph and require that the source node A (or root) is different from the destination node B. By Theorem 10.2.3 of a continuous-time Markov chain, the discovery time of the $k$-th node from node A equals $v_k = \sum_{n=1}^{k}\tau_n$, where $\tau_1, \tau_2, \ldots, \tau_k$ are independent, exponentially distributed random variables with parameter $\lambda_n = n(N-n)$ for $1 \le n \le k$.
We call $\tau_j$ the interattachment time between the discovery (or the attachment to the URT) of the $(j-1)$-th and the $j$-th node in the graph. The Laplace transform of $v_k$ is

$$E\left[e^{-z v_k}\right] = \int_0^{\infty} e^{-zt}\, d\Pr[v_k \le t]$$

For a sum of independent exponential random variables, using the probability generating function (3.16), we have

$$E\left[e^{-z v_k}\right] = E\left[\exp\left(-z\sum_{n=1}^{k}\tau_n\right)\right] = \prod_{n=1}^{k} E\left[e^{-z\tau_n}\right] = \prod_{n=1}^{k}\frac{n(N-n)}{z + n(N-n)} \qquad (16.16)$$

The probability generating function³ $\varphi_{W_N}(z) = E\left[e^{-zW_N}\right]$ of the weight $W_N$ of the shortest path equals

$$\varphi_{W_N}(z) = \sum_{k=1}^{N-1} E\left[e^{-z v_k}\right]\Pr[\text{B is the } k\text{-th attached node in the URT}] = \frac{1}{N-1}\sum_{k=1}^{N-1}\prod_{n=1}^{k}\frac{n(N-n)}{z + n(N-n)} \qquad (16.17)$$

because any node apart from the root A, but including the destination node B, has equal probability of being the $k$-th attached node. The average weight is

$$E[W_N] = -\left.\frac{d\varphi_{W_N}(z)}{dz}\right|_{z=0} = -\frac{1}{N-1}\sum_{k=1}^{N-1}\left.\frac{d}{dz}\prod_{n=1}^{k}\frac{n(N-n)}{z + n(N-n)}\right|_{z=0}$$

³ If the link weights have mean $\frac{1}{a}$ (instead of 1), then $W_N$ is multiplied by $\frac{1}{a}$, as explained in Sections 16.2.1 and 3.4.1. The weight of the scaled shortest path $W_{N,a}$ has pgf $\varphi_{W_{N,a}}(z) = E\left[e^{-zW_N/a}\right] = \varphi_{W_N}\!\left(\frac{z}{a}\right)$.

Using the logarithmic derivative of the product,

$$\left.\frac{d}{dz}\prod_{n=1}^{k}\frac{n(N-n)}{z + n(N-n)}\right|_{z=0} = \left.\prod_{n=1}^{k}\frac{n(N-n)}{z + n(N-n)}\,\frac{d}{dz}\sum_{n=1}^{k}\log\frac{n(N-n)}{z + n(N-n)}\right|_{z=0} = -\sum_{n=1}^{k}\frac{1}{n(N-n)}$$

gives

$$E[W_N] = \frac{1}{N-1}\sum_{k=1}^{N-1}\sum_{n=1}^{k}\frac{1}{n(N-n)} = \frac{1}{N-1}\sum_{n=1}^{N-1}\frac{1}{n(N-n)}\sum_{k=n}^{N-1} 1 = \frac{1}{N-1}\sum_{n=1}^{N-1}\frac{N-n}{n(N-n)}$$

The average weight is

$$E[W_N] = \frac{1}{N-1}\sum_{n=1}^{N-1}\frac{1}{n} = \frac{\psi(N)+\gamma}{N-1} \qquad (16.18)$$

For large $N$,

$$E[W_N] = \frac{\log N + \gamma}{N} + O\!\left(\frac{1}{N^2}\right)$$

Similarly, the variance is computed (see problem (ii) in Section 16.9) as

$$\operatorname{Var}[W_N] = \frac{3}{N(N-1)}\sum_{n=1}^{N-1}\frac{1}{n^2} - \frac{1}{(N-1)^2 N}\left(\sum_{n=1}^{N-1}\frac{1}{n}\right)^2 \qquad (16.19)$$

and, for large $N$,

$$\operatorname{Var}[W_N] = \frac{\pi^2}{2N^2} + O\!\left(\frac{\log^2 N}{N^3}\right)$$

By inverse Laplace transform of (16.17), the distribution $\Pr[W_N \le t]$ can be computed.
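The interchange of summations that collapses the double sum into (16.18) can be checked exactly. A short sketch (not from the book, exact rational arithmetic):

```python
from fractions import Fraction

def mean_weight_double_sum(N):
    """E[W_N] from differentiating (16.17):
    (1/(N-1)) * sum_{k=1}^{N-1} sum_{n=1}^{k} 1/(n(N-n))."""
    return Fraction(1, N - 1) * sum(Fraction(1, n * (N - n))
                                    for k in range(1, N)
                                    for n in range(1, k + 1))

def mean_weight_closed(N):
    """E[W_N] from (16.18): the harmonic number H_{N-1} divided by N-1."""
    return Fraction(1, N - 1) * sum(Fraction(1, n) for n in range(1, N))

for N in (2, 5, 10, 25):
    assert mean_weight_double_sum(N) == mean_weight_closed(N)
print(float(mean_weight_closed(25)))  # close to (log 25 + gamma)/25
```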
The asymptotic distribution of the weight of the shortest path is (see problem (iii) in Section 16.9)

$$\lim_{N\to\infty}\Pr[N W_N - \log N \le x] = e^{-e^{-x}} \qquad (16.20)$$

A related but slightly more complex analysis is presented in Section 16.5.1, where we study the flooding time. The interest of such an asymptotic analysis is that it often leads to tractable solutions that are physically more appealing to interpret. Moreover, it turns out that results for finite, not too small $N$ are reasonably approximated by the asymptotic law.

Since $W_N$ equals the sum of the link weights of the shortest path from the root to an arbitrary node and since $H_N = h_N \mid h_N > 0$ is the number of links in that shortest path (where the arbitrary destination node is different from the root), one may wonder whether there is a relation between them. Although the shortest path has precisely $H_N$ hops, the destination node of that path is not necessarily the $H_N$-th node attached to the URT grown at the root. The destination node cannot be discovered sooner than the $H_N$-th attached node, otherwise the hopcount of the shortest path would be shorter than $H_N$. Hence, the destination node is the $k$-th discovered node, attached to the URT somewhere in between the $H_N$-th and the last attached node; thus $k \in [H_N, N-1]$. If $k = H_N$, then all previously discovered nodes belong to the shortest path and the $j$-th attached node in the URT is linked to the $(j-1)$-th, for all $j \le k$. If $k > H_N$, precisely $k - H_N$ of the attached nodes do not belong to the shortest path. Hence, $W_N$ equals the discovery time $v_k$, provided that $k - H_N$ of the nodes in the URT discovered so far do not belong to the path and precisely $H_N$ do. The latter condition requires the determination of all structurally favorable possibilities, which is rather complex.

Curiously, the probability that the shortest path consists of the direct link between source and destination is, with (16.14), (16.18) and $S_N^{(2)} = (-1)^N (N-1)!\sum_{k=1}^{N-1}\frac{1}{k}$,
$$\Pr[H_N = 1] = \frac{1}{N-1}\sum_{k=1}^{N-1}\frac{1}{k} = E[W_N]$$

16.5 The flooding time $T_N$

The most commonly used process that informs each node (router) about changes in the network topology is called flooding: the source node initiates the flooding process by sending a packet with topology information to all adjacent neighbors, every router forwards the packet on all interfaces except for the incoming one, and duplicate packets are discarded. Flooding is particularly simple and robust, since it progresses, in fact, along all possible paths from the emitting node to the receiving node. Hence, a flooded packet reaches a node in the network in the shortest possible time (if overheads in routers are ignored). Therefore, an interesting problem lies in the determination of the flooding time $T_N$, which is the minimum time needed to inform all nodes in a network with $N$ nodes. Only after a time $T_N$ are all topology databases at the routers in the network again synchronized, i.e. all routers possess the same topology information. The flooding time $T_N$ is defined as the minimum time needed to reach all $N-1$ remaining nodes from a source node over their respective shortest paths.

We will here consider the flooding time $T_N$ in the complete graph containing $N$ nodes and with independent, exponentially distributed link weights with mean 1. The generalization to the random graph $G_p(N)$ with i.i.d. exponentially (or uniformly⁴) distributed link weights is treated in van der Hofstad et al. (2002a). The flooding time $T_N$ equals the absorption time, starting from state $n = 1$, of the birth process with rates (16.2).
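Since $T_N$ is the sum of the $N-1$ independent stage times $\tau_n \sim \mathrm{Exp}(n(N-n))$ of this birth process, its mean and variance follow by direct summation; the partial-fraction identities used in (16.22) and (16.23) below can be checked exactly. A sketch (not from the book):

```python
from fractions import Fraction

def flooding_mean_var(N):
    """T_N is the absorption time of the birth process with rates
    lambda_n = n(N-n): a sum of N-1 independent exponential stage times."""
    mean = sum(Fraction(1, n * (N - n)) for n in range(1, N))
    var = sum(Fraction(1, (n * (N - n)) ** 2) for n in range(1, N))
    return mean, var

N = 30
mean, var = flooding_mean_var(N)
H1 = sum(Fraction(1, n) for n in range(1, N))       # harmonic number
H2 = sum(Fraction(1, n * n) for n in range(1, N))   # generalized harmonic
assert mean == Fraction(2, N) * H1                                 # (16.22)
assert var == Fraction(2, N * N) * H2 + Fraction(4, N ** 3) * H1   # (16.23)
print(float(mean))  # about 2*log(N)/N, twice the mean path weight (16.18)
```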
The probability generating function follows directly from (16.16) with $k = N-1$,

$$\varphi_{T_N}(x) = E\left[e^{-xT_N}\right] = \int_0^{\infty} e^{-xt} f_{T_N}(t)\,dt = \prod_{n=1}^{N-1}\frac{n(N-n)}{n(N-n)+x} \qquad (16.21)$$

The average flooding time equals

$$E[T_N] = \sum_{n=1}^{N-1} E[\tau_n] = \sum_{n=1}^{N-1}\frac{1}{n(N-n)} = \frac{2}{N}\sum_{n=1}^{N-1}\frac{1}{n} = \frac{2}{N}\left(\psi(N)+\gamma\right) \qquad (16.22)$$

Using the asymptotic expansion (Abramowitz and Stegun, 1968, Section 6.3.18) of the digamma function, we conclude that

$$E[T_N] \approx \frac{2\log N}{N}$$

which demonstrates that the average flooding time in the complete graph with exponential link weights with mean 1 decreases to zero when $N \to \infty$. Also, the average flooding time is about twice as long as the average weight of an arbitrary shortest path (16.18). The variance of $T_N$ equals

$$\operatorname{Var}[T_N] = \sum_{n=1}^{N-1}\operatorname{Var}[\tau_n] = \sum_{n=1}^{N-1}\frac{1}{n^2(N-n)^2} = \frac{2}{N^2}\sum_{n=1}^{N-1}\frac{1}{n^2} + \frac{4}{N^3}\sum_{n=1}^{N-1}\frac{1}{n} \qquad (16.23)$$

For large $N$, we have that $\operatorname{Var}[T_N] = \frac{\pi^2}{3N^2} + O\!\left(\frac{\log N}{N^3}\right)$.

16.5.1 The asymptotic law for the flooding time $T_N$

The exact expression $f_{T_N}(t)$ for the probability density function of the flooding time $T_N$, derived in van der Hofstad et al. (2002a), does not provide much insight. Because we are interested in the flooding time in large networks, we investigate the asymptotic distribution of $T_N$ for large $N$. We rewrite (16.21) as

$$\varphi_{T_N}(x) = \frac{[(N-1)!]^2}{\prod_{n=1}^{N-1}\left(x + \frac{N^2}{4} - \left(\frac{N}{2}-n\right)^2\right)} \qquad (16.24)$$

For $N = 2M$, using $\frac{\Gamma(z+m)}{\Gamma(z+1)} = \prod_{n=1}^{m-1}(n+z)$, we deduce that

$$\varphi_{T_{2M}}(x) = \left(\frac{\Gamma(2M)\,\Gamma\!\left(1+\sqrt{x+M^2}-M\right)}{\Gamma\!\left(M+\sqrt{x+M^2}\right)}\right)^2 \qquad (16.25)$$

For large $M$, there holds $\sqrt{x+M^2} \approx M + \frac{x}{2M}$, provided $|x| < 2M$. After substitution of $x = 2My$ in (16.25), with $|y| <$

⁴ Both the exponential and the uniform distribution are regular distributions with extreme value index 1. This means that the small link weights, which are most likely to be included in the shortest path, are almost identically distributed for all regular distributions with the same $f_w(0)$.
$1$, we obtain

$$\varphi_{T_{2M}}(2My) \approx \frac{\Gamma^2(1+y)\,\Gamma^2(2M)}{\Gamma^2(2M+y)} \approx \Gamma^2(1+y)\,(2M)^{-2y}$$

from which the asymptotic relation follows,

$$\lim_{N\to\infty} N^{2y}\,\varphi_{T_N}(Ny) = \Gamma^2(1+y), \qquad |y| < 1 \qquad (16.26)$$

Equivalently, we have, for $|y| < 1$,

$$\lim_{N\to\infty} E\left[e^{-y(NT_N - 2\log N)}\right] = \lim_{N\to\infty}\frac{1}{N}\int_{-\infty}^{\infty} e^{-yt}\, f_{T_N}\!\left(\frac{t + 2\log N}{N}\right)dt = \Gamma^2(1+y)$$

This limit demonstrates that the probability distribution function of the random variable $NT_N - 2\log N$ converges to a probability distribution with Laplace transform $\Gamma^2(1+y)$. Let us define the normalized density function

$$g_N(t) = \frac{1}{N}\, f_{T_N}\!\left(\frac{t + 2\log N}{N}\right) \qquad (16.27)$$

We can prove convergence in density, i.e. $\lim_{N\to\infty} g_N(t) = g(t)$, and that the latter exists. By the inversion theorem for Laplace transforms we obtain, for $t \in \mathbb{R}$,

$$\lim_{N\to\infty} g_N(t) = \lim_{N\to\infty}\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{yt}\, N^{2y}\varphi_{T_N}(Ny)\,dy$$

where $0 \le c < 1$. Since $\Gamma(z)$ is analytic over the entire complex plane except for simple poles at the points $z = -n$ for $n = 0, 1, 2, \ldots$, we find that $N^{2y}\varphi_{T_N}(Ny)$ is analytic whenever the real part of $y$ is non-negative. Evaluation along the line $\operatorname{Re}(y) = c = 0$ then gives

$$\lim_{N\to\infty} g_N(t) = \lim_{N\to\infty}\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{itu}\, N^{2iu}\varphi_{T_N}(iNu)\,du$$

As dominating function we take

$$\left|e^{itu} N^{2iu}\varphi_{T_N}(iNu)\right| = \left|\varphi_{T_N}(iNu)\right| \le \frac{1+u^2}{u^4} \ \text{ for } |u| > 1, \qquad \left|\varphi_{T_N}(iNu)\right| \le 1 \ \text{ for } |u| \le 1$$

This follows from the first equality in (16.24), using only the factors in the product with $n = 1$ and $n = N-1$, and bounding the other factors by

$$\frac{n(N-n)}{|n(N-n) + iNu|} \le 1$$

The Dominated Convergence Theorem 6.1.4 allows us to interchange the limit and integration operator, such that

$$\lim_{N\to\infty} g_N(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{itu}\lim_{N\to\infty} N^{2iu}\varphi_{T_N}(iNu)\,du = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} e^{ty}\lim_{N\to\infty} N^{2y}\varphi_{T_N}(Ny)\,dy$$
$$= \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} e^{ty}\,\Gamma^2(1+y)\,dy \qquad (16.28)$$

The right-hand side of (16.26) is a perfect square, which indicates that the limit distribution is a two-fold convolution.
Now, the Mellin transform (Titchmarsh, 1948) of the exponential function is

$$e^{-t} = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} t^{-y}\,\Gamma(y)\,dy, \qquad c > 0$$

and thus, with $t = e^{-u}$,

$$\frac{d}{du}\left(e^{-e^{-u}}\right) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{yu}\,\Gamma(y+1)\,dy$$

which shows that (16.28) is the two-fold convolution of the probability density function $\frac{d\Lambda(t)}{dt}$, where $\Lambda(t) = e^{-e^{-t}}$ is the Gumbel distribution (3.37). Furthermore, the two-fold convolution is given by

$$\frac{d\Lambda^{(2*)}}{dt}(t) = \int_{-\infty}^{\infty} e^{-u}e^{-e^{-u}}\, e^{-(t-u)}e^{-e^{-(t-u)}}\,du = e^{-t}\int_{-\infty}^{\infty}\exp\left[-2e^{-t/2}\cosh\left(u - \frac{t}{2}\right)\right]du$$
$$= 2e^{-t}\int_0^{\infty}\exp\left[-2e^{-t/2}\cosh(u)\right]du = 2e^{-t}\,K_0\!\left(2e^{-t/2}\right)$$

where $K_\nu(x)$ denotes the modified Bessel function (Abramowitz and Stegun, 1968, Section 9.6) of order $\nu$. In summary,

$$\lim_{N\to\infty} g_N(t) = g(t) = \frac{d\Lambda^{(2*)}}{dt}(t) = 2e^{-t}\,K_0\!\left(2e^{-t/2}\right) \qquad (16.29)$$

and the corresponding distribution function is

$$\lim_{N\to\infty}\Pr[NT_N - 2\log N \le z] = 2\int_{-\infty}^{z} e^{-t}\,K_0(2e^{-t/2})\,dt = 2e^{-z/2}\,K_1\!\left(2e^{-z/2}\right) \qquad (16.30)$$

The right-hand side of (16.29) is maximal for $t = 0.506357$, which is slightly smaller than $\gamma = 0.577216$, but still in accordance with $E[T_N]$ given by (16.22). The asymmetry shows that the event $\{NT_N \ge 2\log N + z\}$ is much more likely than the event $\{NT_N \le 2\log N - z\}$, which confirms the intuition that the flooding time can be much longer than the average $E[T_N]$, but not so much shorter than $E[T_N]$. Figure 16.5 illustrates the convergence of $g_N(t)$ to the limit in (16.29).

Fig. 16.5. The scaled density $g_N(t)$ for three values of $N = 2M$ ($M = 5, 10, 20$; dotted lines) and the asymptotic result $M \to \infty$ (full line) on a log-lin scale. The insert is drawn on a lin-lin scale.

When comparing (16.26) with the corresponding result (C.6) for the weight of the shortest path, we observe that, for large $N$, the random variable $NT_N - 2\log N$ consists of the sum $(NW_{N;1} - \log N) + (NW_{N;2} - \log N)$, where both $NW_{N;j} - \log N$ are i.i.d. random variables.
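The closed form $2e^{-t}K_0(2e^{-t/2})$ for the two-fold Gumbel convolution can be verified numerically with simple quadrature. This is not from the book; the integration ranges and step counts below are arbitrary choices that are adequate near $t = 1$:

```python
import math

def bessel_k0(x):
    """K_0(x) via its integral representation: integral_0^inf exp(-x cosh u) du."""
    n, umax = 4000, 12.0          # truncation adequate for x around 1
    h = umax / n
    s = 0.5 * (math.exp(-x) + math.exp(-x * math.cosh(umax)))
    for i in range(1, n):
        s += math.exp(-x * math.cosh(i * h))
    return s * h

def gumbel_pdf(t):
    return math.exp(-t - math.exp(-t))

def gumbel_conv2(t):
    """Two-fold convolution of the Gumbel density, by direct quadrature."""
    n, lo, hi = 4000, -15.0, 15.0
    h = (hi - lo) / n
    return sum(gumbel_pdf(lo + i * h) * gumbel_pdf(t - lo - i * h)
               for i in range(n)) * h

t = 1.0
closed = 2 * math.exp(-t) * bessel_k0(2 * math.exp(-t / 2))
assert abs(closed - gumbel_conv2(t)) < 1e-5
print(round(closed, 6))
```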
Intuitively, we can say that the flooding time consists of the time to travel from a left-hand corner of the graph to the center plus the time to travel from the center to a right-hand corner of the graph. The asymptotic distribution (16.30) is a beautiful example of a sum of $N$ independent random variables that clearly does not converge to a Gaussian and, hence, does not obey the (extended) Central Limit Theorem 6.3.1.

16.6 The degree of a node in the URT

Let us denote by $\{\mathcal{D}_N^{(k)}\}$ the set of nodes with degree $k$ in a graph with $N$ nodes and by $D_N^{(k)}$ the cardinality (the number of elements) of this set $\{\mathcal{D}_N^{(k)}\}$. Since each node appears in only one set, it holds for any graph that

$$\sum_{k=1}^{N-1} D_N^{(k)} = N \qquad (16.31)$$

In a probabilistic setting, we may investigate the event that the degree $k$ occurs in a graph of size $N$. The expectation of that event is

$$E\left[D_N^{(k)}\right] = E\left[\sum_{j=1}^{N} 1_{\{d_j = k\}}\right] = \sum_{j=1}^{N} E\left[1_{\{d_j = k\}}\right] = \sum_{j=1}^{N}\Pr[d_j = k] \qquad (16.32)$$

By summing over all $k$, we verify that

$$\sum_{k=1}^{N-1} E\left[D_N^{(k)}\right] = \sum_{j=1}^{N}\sum_{k=1}^{N-1}\Pr[d_j = k] = \sum_{j=1}^{N} 1 = N$$

which is again (16.31).

16.6.1 Recursion for $\Pr\left[D_N^{(k)} = j\right]$ in the URT

The growth law of URTs dictates the way a specific tree of size $N$ transforms into the tree of size $N+1$ by adding the node with label $N+1$ at random. Based on this growth law, the set of nodes with degree $k$ in a specific tree of size $N+1$ consists of:

(i) the same set of nodes with degree $k$ in the ancestor tree of size $N$, provided the new node $n_{N+1}$ is attached neither to any of the nodes of this set nor to any of the nodes with degree $k-1$;
(ii) the same set of nodes with degree $k$ except for one, say node $n_l$, provided the new node $n_{N+1}$ is attached to that node $n_l$;
(iii) the same set of nodes with degree $k$ plus one additional node of the set of degree $k-1$ nodes, provided the new node $n_{N+1}$ is attached to a node of the set of degree $k-1$.
This three-part evolution scenario is generally applicable to any class of trees that possesses a growth law. It does not hold for graphs in general, because only in a tree does a node have one well-defined parent node and in-degree one. Using the law of total probability (2.46) yields

$$\Pr\left[D_{N+1}^{(k)} = j\right] = \Pr\left[D_N^{(k)} = j \,\middle|\, n_{N+1} \notin \{\mathcal{D}_N^{(k)}\}\cup\{\mathcal{D}_N^{(k-1)}\}\right]\Pr\left[n_{N+1} \notin \{\mathcal{D}_N^{(k)}\}\cup\{\mathcal{D}_N^{(k-1)}\}\right]$$
$$+ \Pr\left[D_N^{(k)} = j+1 \,\middle|\, n_{N+1} \in \{\mathcal{D}_N^{(k)}\}\right]\Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k)}\}\right]$$
$$+ \Pr\left[D_N^{(k)} = j-1 \,\middle|\, n_{N+1} \in \{\mathcal{D}_N^{(k-1)}\}\right]\Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k-1)}\}\right]$$

If the process of attaching the new node $N+1$ does not depend on the way the $N$ previous nodes are attached, but only on their number, there holds $\Pr[D_N^{(k)} = j \mid n_{N+1}] = \Pr[D_N^{(k)} = j]$. This property holds for the URT. We obtain a three-point recursion for $k > 1$,

$$\Pr\left[D_{N+1}^{(k)} = j\right] = \Pr\left[D_N^{(k)} = j\right]\Pr\left[n_{N+1} \notin \{\mathcal{D}_N^{(k)}\}\cup\{\mathcal{D}_N^{(k-1)}\}\right]$$
$$+ \Pr\left[D_N^{(k)} = j+1\right]\Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k)}\}\right] + \Pr\left[D_N^{(k)} = j-1\right]\Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k-1)}\}\right]$$

The probability generating function

$$\varphi_D(z, N; k) = E\left[z^{D_N^{(k)}}\right] = \sum_{j=0}^{\infty}\Pr\left[D_N^{(k)} = j\right] z^j$$

is obtained after multiplication by $z^j$ and summing over all $j$,

$$\varphi_D(z, N+1; k) = \Pr\left[n_{N+1} \notin \{\mathcal{D}_N^{(k)}\}\cup\{\mathcal{D}_N^{(k-1)}\}\right]\varphi_D(z, N; k)$$
$$+ \Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k)}\}\right]\frac{\varphi_D(z, N; k) - \varphi_D(0, N; k)}{z} + \Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k-1)}\}\right]z\,\varphi_D(z, N; k)$$

Now $\varphi_D(0, N; k) = \Pr[D_N^{(k)} = 0]$ is the probability of the event that there are no nodes with degree $k$, for $1 \le k \le k_{\max} \le N-1$. Since the normalization of the generating function requires that $\varphi_D(1, N; k) = 1$ and since

$$\Pr\left[n_{N+1} \notin \{\mathcal{D}_N^{(k)}\}\cup\{\mathcal{D}_N^{(k-1)}\}\right] + \Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k)}\}\right] + \Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k-1)}\}\right] = 1$$

it follows that $\Pr[D_N^{(k)} = 0]\,\Pr[n_{N+1} \in \{\mathcal{D}_N^{(k)}\}] = 0$. Further, $\Pr[n_{N+1} \in \{\mathcal{D}_N^{(k)}\}] \ne 0$ for any $k \in [1, k_{\max}]$, because attachment of the node $n_{N+1}$ to any non-empty set is possible; this means that the absence of nodes with degree $k \in [1, k_{\max}]$ cannot occur in URTs, and thus $\Pr[D_N^{(k)} = 0] = 0$.
A consequence is that the probability generating function $\varphi_D(z, N; k)$ is at least $O(z)$ as $z \to 0$ (for $N > 1$). After using $\varphi_D(0, N; k) = 0$ and eliminating $\Pr[n_{N+1} \notin \{\mathcal{D}_N^{(k)}\}\cup\{\mathcal{D}_N^{(k-1)}\}]$, the recursion relation for the probability generating function becomes⁵

$$\varphi_D(z, N+1; k) = \left[1 + \left(\frac{\Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k)}\}\right]}{z} - \Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k-1)}\}\right]\right)(1-z)\right]\varphi_D(z, N; k) \qquad (16.33)$$

The special case for $k = 1$ and $N > 1$, in which the newly attached node itself always enters as a degree-1 node, is

$$\varphi_D(z, N+1; 1) = z\left(1 + \frac{\Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(1)}\}\right](1-z)}{z}\right)\varphi_D(z, N; 1)$$

16.6.2 The average number of degree $k$ nodes in the URT

In the URT, a new node $n_{N+1}$ is attached uniformly to any of the $N$ previously attached nodes, such that

$$\Pr\left[n_{N+1} \in \{\mathcal{D}_N^{(k)}\}\right] = \frac{E\left[D_N^{(k)}\right]}{N}$$

Also, the probability that an arbitrary node in a URT of size $N$ has degree $k$ equals $E\left[D_N^{(k)}\right]/N$. We obtain from (16.33)

$$\varphi_D(z, N+1; k) = \left[1 + \left(\frac{E\left[D_N^{(k)}\right]}{Nz} - \frac{E\left[D_N^{(k-1)}\right]}{N}\right)(1-z)\right]\varphi_D(z, N; k) \qquad (16.34)$$

⁵ With the initialization $\varphi_D(z, m; k) = E\left[z^{D_m^{(k)}}\right] = 1$ for all $m \le k$, because $D_m^{(k)} = 0$ if $1 < m \le k$, iterating (16.33) yields

$$\varphi_D(z, N; k) = \prod_{m=k}^{N-1}\left[1 + \left(\frac{\Pr\left[n_{m+1} \in \{\mathcal{D}_m^{(k)}\}\right]}{z} - \Pr\left[n_{m+1} \in \{\mathcal{D}_m^{(k-1)}\}\right]\right)(1-z)\right]$$

By taking the derivative of both sides in (16.34) with respect to $z$ and evaluating at $z = 1$, a recursion for the average is found,

$$E\left[D_{N+1}^{(k)}\right] = \frac{N-1}{N}E\left[D_N^{(k)}\right] + \frac{E\left[D_N^{(k-1)}\right]}{N} \qquad (16.35)$$

Let $r_N^{(k)} = (N-1)E\left[D_N^{(k)}\right]$; then the recursion, valid for $1 < k \le N-2$, becomes

$$r_{N+1}^{(k)} = r_N^{(k)} + \frac{r_N^{(k-1)}}{N-1} \qquad (16.36)$$

Theorem 16.6.1 In the URT, the average number of degree $k$ nodes is given by

$$E\left[D_N^{(k)}\right] = \frac{N}{2^k} + \frac{(-1)^{N+k-1}S_{N-1}^{(k)}}{(N-1)!} + \frac{(-1)^N}{2^k (N-1)!}\sum_{j=1}^{k-1} S_{N-1}^{(j)}(-2)^j \qquad (16.37)$$

Proof: See Section 16.8. □

For large $N$, and using the asymptotics of the Stirling numbers $S_N^{(k)}$ of the first kind (Abramowitz and Stegun, 1968, Section 24.1.3.III), the asymptotic law is
$$\Pr[D_{\mathrm{URT}} = k] = \frac{E\left[D_N^{(k)}\right]}{N} = \frac{1}{2^k} + O\!\left(\frac{\log^{k-1} N}{N^2}\right) \qquad (16.38)$$

The ratio of the average number of nodes with degree $k$ over the total number of nodes, which equals the probability that an arbitrary node in a URT of size $N$ has degree $k$, decreases exponentially fast with rate $\ln 2$.

The variance $\operatorname{Var}\left[D_N^{(k)}\right]$ is most conveniently computed from the logarithm of the probability generating function with (2.27). By taking the logarithm of both sides in (16.34), differentiating twice and adding (16.35), we obtain

$$\operatorname{Var}\left[D_{N+1}^{(k)}\right] = f(N; k) + \operatorname{Var}\left[D_N^{(k)}\right]$$

where

$$f(N; k) = \frac{E\left[D_N^{(k)}\right]}{N} + \frac{E\left[D_N^{(k-1)}\right]}{N} - \left(\frac{E\left[D_N^{(k)}\right]}{N} - \frac{E\left[D_N^{(k-1)}\right]}{N}\right)^2$$

Since $\operatorname{Var}\left[D_m^{(k)}\right] = 0$ for $m \le k$, the general solution is

$$\operatorname{Var}\left[D_N^{(k)}\right] = \sum_{j=k}^{N-1} f(j; k)$$

For large $N$, using (16.38), we observe that

$$\frac{\operatorname{Var}\left[D_N^{(k)}\right]}{N} = \frac{3}{2^k} - \frac{1}{2^{2k}} + O\!\left(\frac{\log^{2k-2} N}{N^2}\right) \qquad (16.39)$$

In practice, if we use the estimator $D_N^{(k)}/N$ for the probability that the degree of a node equals $k$, then (a) the estimator is unbiased, because its mean $E\left[D_N^{(k)}\right]/N$ equals the correct mean, and (b) the variance $\operatorname{Var}\left[D_N^{(k)}/N\right] = \operatorname{Var}\left[D_N^{(k)}\right]/N^2 \to 0$ as $O\!\left(\frac{1}{N}\right)$ for large $N$.

Fig. 16.6. The histogram of the degree $D_U$ derived from the graph $G_U$ formed by the union of paths measured via trace-route in the Internet. The RIPE data of May–June 2003 ($N = 2574$, $L = 3992$) are fitted on a log-lin plot by $\ln(\Pr[D_U = k]) = 0.44 - 0.67k$ with correlation coefficient $\rho = 0.99$, and the data of Jan.–Feb. 2004 ($N = 3850$, $L = 6743$) by $\ln(\Pr[D_U = k]) = -0.49 - 0.41k$ with $\rho = 0.95$. The correlation coefficient quantifies the quality of the fit.

The law (16.38) is observed in Fig. 16.6, which plots the histogram of the degree $D_U$ in the graph $G_U$.
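The mean recursion (16.35) and the closed form (16.37), in the reconstructed form $E[D_N^{(k)}] = N/2^k + (-1)^{N+k-1}S_{N-1}^{(k)}/(N-1)! + (-1)^N/(2^k(N-1)!)\sum_{j<k}S_{N-1}^{(j)}(-2)^j$, can be cross-checked exactly. A sketch (not from the book; the $k = 1$ step adds the newly attached leaf explicitly):

```python
from fractions import Fraction
from math import factorial

def stirling1(n, k, _memo={}):
    """Signed Stirling numbers of the first kind s(n, k)."""
    if (n, k) in _memo:
        return _memo[(n, k)]
    if n == 0 and k == 0:
        v = 1
    elif n == 0 or k == 0:
        v = 0
    else:
        v = stirling1(n - 1, k - 1) - (n - 1) * stirling1(n - 1, k)
    _memo[(n, k)] = v
    return v

def mean_degrees(N):
    """E[D_N^(k)] via the mean recursion (16.35); the new node always
    enters as a leaf, which supplies the +1 in the k = 1 step."""
    cur = {0: Fraction(1)}            # a single root of degree 0
    for n in range(1, N):             # grow from size n to size n + 1
        nxt = {}
        for k in range(1, n + 1):
            nxt[k] = (Fraction(n - 1, n) * cur.get(k, Fraction(0))
                      + cur.get(k - 1, Fraction(0)) / n)
        nxt[1] += 1                   # the newly attached leaf
        cur = nxt
    return cur

def mean_degree_closed(N, k):
    """Closed form (16.37) as reconstructed above."""
    tail = sum(stirling1(N - 1, j) * (-2) ** j for j in range(1, k))
    return (Fraction(N, 2 ** k)
            + Fraction((-1) ** (N + k - 1) * stirling1(N - 1, k),
                       factorial(N - 1))
            + Fraction((-1) ** N * tail, 2 ** k * factorial(N - 1)))

N = 12
E = mean_degrees(N)
assert sum(E.values()) == N                    # (16.31) in expectation
for k in range(1, 5):
    assert E[k] == mean_degree_closed(N, k)    # matches (16.37)
print(float(E[1] / N))  # close to 1/2, in line with (16.38)
```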
The graph $G_U$ is obtained from the union of trace-routes from each RIPE measurement box to every other box, positioned mainly in the European part of the Internet. For the about 50 measurement boxes in 2003, the correspondence is striking, because the slope of the fit on a log-lin scale equals 0.668, while the law (16.38) gives $\ln 2 = 0.693$. Ignoring in Fig. 16.6 the leaf nodes with $k = 1$ suggests that the graph $G_U$ is URT-like. For the 72 measurement boxes in 2004, which obviously result in a larger graph $G_U$, deviations from the URT law (16.38) are observed. If measurements between a larger full mesh of boxes were possible and if the measurement boxes were more homogeneously spread over the Internet, a power law behavior would likely be expected, as mentioned in Section 15.3. However, the earlier reported trace-route measurements that lead to power law degrees have been performed from a relatively small number of sources to a large number of destinations. These results question the observability of the Internet: how accurate are Internet properties, such as hopcount and degree, that are derived from incomplete measurements, i.e. from a selected small subset of nodes at which measurement boxes are placed?

16.6.3 The degree of the shortest path tree in the complete graph with i.i.d. exponential link weights

In the complete graph $K_N$ with i.i.d. exponential link weights, any node $n$ possesses equal properties in probability because of symmetry. If we denote by $d_n$ the degree of node $n$ in the shortest path tree rooted at that node $n$, the symmetry implies that $\Pr[d_n = k] = \Pr[d_i = k]$ for any nodes $n$ and $i$. In fact, we consider here the degree of a URT as an overlay tree in a complete graph. Concentrating on the node with label 1, we obtain from (16.32)

$$E\left[D_N^{(k)}\right] = N\Pr[d_1 = k] = N\Pr\left[X_N^{(1)} = k\right]$$

The latter follows from the fact that the degree of the root is equal to the number of its direct neighbors, the nodes at level 1, whose number is $X_N^{(1)}$.
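The distribution of the root degree $X_N^{(1)}$, derived below as a sum of independent Bernoulli$(1/j)$ variables, can be checked against the Stirling-number expression (16.40) further on. A sketch (not from the book, exact rational arithmetic):

```python
from fractions import Fraction
from math import factorial

def level1_pmf(N):
    """Pr[X_N^(1) = k]: the level-1 size as a sum of independent
    Bernoulli(1/j), j = 1..N-1, computed by direct convolution."""
    pmf = [Fraction(1)]
    for j in range(1, N):
        p = Fraction(1, j)
        new = [Fraction(0)] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)
            new[k + 1] += q * p
        pmf = new
    return pmf

def stirling1(n, k, _memo={}):
    """Signed Stirling numbers of the first kind s(n, k)."""
    if (n, k) in _memo:
        return _memo[(n, k)]
    if n == 0 and k == 0:
        v = 1
    elif n == 0 or k == 0:
        v = 0
    else:
        v = stirling1(n - 1, k - 1) - (n - 1) * stirling1(n - 1, k)
    _memo[(n, k)] = v
    return v

N = 10
pmf = level1_pmf(N)
# (16.40): Pr[d_n = k] = (-1)^(N-1-k) S_{N-1}^(k) / (N-1)!
for k in range(1, N):
    expected = Fraction((-1) ** (N - 1 - k) * stirling1(N - 1, k),
                        factorial(N - 1))
    assert pmf[k] == expected
print(float(pmf[1]))  # 1/9 = Pr[h_9 = 0]
```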
By definition of the URT, the second node surely belongs to the level set 1, while node 3 has equal probability of being attached to the root or to node 2. In general, when attaching node j to a URT of size j - 1, the probability that node j is attached to the root equals 1/(j - 1). Thus, the number of nodes at level 1 in the URT (constructed upon the complete graph) is in distribution equal to the sum of N - 1 independent Bernoulli random variables, each with a different mean 1/(j - 1),

    X_N^(1) =_d Σ_{j=2}^{N} Bernoulli(1/(j-1)) = Σ_{j=1}^{N-1} Bernoulli(1/j)

because each node in the complete graph is connected to N - 1 neighbors. The generating function is

    E[z^{X_N^(1)}] = E[z^{Σ_{j=1}^{N-1} Bernoulli(1/j)}] = Π_{j=1}^{N-1} E[z^{Bernoulli(1/j)}]

Using the generating function (3.1) of a Bernoulli random variable with probability 1/j, E[z^{Bernoulli(1/j)}] = 1 - 1/j + z/j, yields

    E[z^{X_N^(1)}] = Π_{j=1}^{N-1} (z + j - 1)/j = Γ(z + N - 1)/(Γ(z) Γ(N))

Compared to the generating function (16.6) of the hopcount h_N, we recognize that

    E[z^{X_N^(1)}] = z φ_{h_{N-1}}(z) = Σ_{k=0}^{N-2} Pr[h_{N-1} = k] z^{k+1} = Σ_{k=1}^{N-1} Pr[h_{N-1} = k - 1] z^k

from which we deduce, for 1 <= k <= N - 1,

    Pr[X_N^(1) = k] = Pr[h_{N-1} = k - 1]

Using (16.7), we arrive at the curious result

    Pr[X_N^(1) = k] = E[X_{N-1}^(k-1)]/(N - 1),  for 1 <= k <= N - 1.

The probability that the number of level 1 nodes in the shortest path tree in the complete graph with i.i.d. exponential link weights equals k is the average number of nodes at level k - 1 in a URT of size N - 1, divided by that size N - 1. In other words, the "horizontal" distribution at level 1 is related to the "vertical" distribution of the size of the level sets. In summary(6), in the complete graph with i.i.d. exponential link weights, the probability that an arbitrary node n as root of a shortest path tree has degree k is

    Pr[d_n = k] = Pr[h_{N-1} = k - 1] = (-1)^{N-1-k} S_{N-1}^(k)/(N - 1)!    (16.40)

The degree of an arbitrary node in the union of all shortest path trees in the complete graph K_N with i.i.d.
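The Bernoulli-sum representation of the number of level-1 nodes is easy to probe in simulation. Below is a minimal sketch (function names and parameters are ours, not the book's): it grows URTs directly, draws the Bernoulli sum, and compares both empirical means with the exact value Σ_{j=1}^{N-1} 1/j, the harmonic number H_{N-1}.

```python
import random

def urt_parents(n, rng):
    """Grow a uniform recursive tree on nodes 0..n-1: each new node j
    attaches to a uniformly chosen node among 0..j-1 (node 0 is the root)."""
    return [None] + [rng.randrange(j) for j in range(1, n)]

def level_one_count(parents):
    # Number of level-1 nodes = number of direct children of the root.
    return sum(1 for p in parents[1:] if p == 0)

def bernoulli_sum(n, rng):
    # X_N^(1) in distribution: a sum of N-1 Bernoulli(1/j) variables.
    return sum(1 for j in range(1, n) if rng.random() < 1.0 / j)

rng = random.Random(7)
n, runs = 100, 10000
mean_tree = sum(level_one_count(urt_parents(n, rng)) for _ in range(runs)) / runs
mean_bern = sum(bernoulli_sum(n, rng) for _ in range(runs)) / runs
harmonic = sum(1.0 / j for j in range(1, n))   # exact mean: H_{N-1}
print(mean_tree, mean_bern, harmonic)
```

Both empirical means should agree with H_{N-1} up to Monte Carlo noise, illustrating that the root's degree in the URT is indeed the Bernoulli sum in distribution.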
exponential link weights is also given by (16.40), because in that union each node n is once a root and further plays, by symmetry, the role of the j-th attached node in the URT rooted at any other node in K_N.

(6) This result is due to Remco van der Hofstad (private communication).

16.7 The minimum spanning tree

From an algorithmic point of view, the shortest path problem is closely related to the computation of the minimum spanning tree (MST). The Dijkstra shortest path algorithm is similar to Prim's minimum spanning tree algorithm (Cormen et al., 1991). In this section, we compute the average weight of the MST in a graph with a general link weight structure.

16.7.1 The Kruskal growth process of the MST

Since the link weights in the underlying complete graph are chosen independently and assigned randomly to links in the complete graph, the resulting graph is probabilistically the same if we first order the set of link weights and assign them in increasing order randomly to links in the complete graph. In the latter construction process, only the order statistics or the ranking of the link weights suffice to construct the graph, because the precise link weight can be unambiguously associated to the rank of a link. This observation immediately favors the Kruskal algorithm for the MST over Prim's algorithm. Although the Prim algorithm leads to the same MST, it gives a more complicated, long-memory growth process, where the attachment of each new node depends stochastically on the whole growth history so far. Pietronero and Schneider (1990) illustrate that, in our approach, Prim, in contrast with Kruskal, leads to a very complicated stochastic process for the construction of the MST. The Kruskal growth process described here is closely related to a growth process of the random graph G_r(N, L) with N nodes and L links.
The construction or growth of G_r(N, L) starts from N individual nodes, and in each step an arbitrary, not yet connected random pair of nodes is connected. The only difference with Kruskal's algorithm for the MST is that, in Kruskal, links generating loops are forbidden. Those forbidden links are the links that connect nodes within the same connected component or "cluster". As a result, the internal wiring of the clusters differs, but the cluster size statistics (counted in nodes, not links) is exactly the same as in the corresponding random graph. The metacode of the Kruskal growth process for the construction of the MST is shown in Fig. 16.7. The growth process of the random graph G_p(N), which is asymptotically equal to that of G_r(N, L), is quantified in Section 15.6.4 for large N.

    KruskalGrowthMST
    1. start with N disconnected nodes
    2. repeat until all nodes are connected
    3.     randomly select a node pair (i, j)
    4.     if a path P_{i->j} does not exist
    5.     then connect i to j

    Fig. 16.7. Kruskal growth process

The fraction of nodes S in the giant component of G_p(N) is related by (15.20) to the average degree μ_rg, or to the link density p, because μ_rg = p(N - 1) in G_p(N). For large N, the size of the giant cluster in the forest is thus determined as a function of the number of added links that increase μ_rg.

    Fig. 16.8. Component structure during the Kruskal growth process.

We will now transform the mean degree μ_rg in the random graph G_p(N) into the mean degree μ_MST at the corresponding stage of the Kruskal growth process of the MST. In early stages of the growth, each selected link will be added with high probability, such that μ_MST = μ_rg almost surely. After some time, the probability that a selected link is forbidden increases, and thus μ_rg exceeds μ_MST. In the end, when connectivity of all N nodes is reached, μ_MST = 2 (since it is a tree), while μ_rg = O(log N), as follows from (15.19) and the critical threshold p_c ≈ log N / N.
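The metacode of Fig. 16.7 translates directly into code once line 4's path test is recognized as a disjoint-set (union-find) query. A minimal sketch (the class and function names are ours, not from the book):

```python
import random

class UnionFind:
    """Disjoint-set forest with path halving and union by size."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False          # a path a -> b exists: forbidden link
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

def kruskal_growth(n, rng):
    """Fig. 16.7: repeatedly pick a random node pair and connect it unless
    that would close a loop; stop when the tree spans all N nodes."""
    uf, mst_links = UnionFind(n), []
    while len(mst_links) < n - 1:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j and uf.union(i, j):
            mst_links.append((i, j))
    return mst_links

tree = kruskal_growth(500, random.Random(1))
print(len(tree))  # a spanning tree on N nodes has N - 1 links
```

Rejected (forbidden) pairs are simply skipped, exactly as in the growth process; only the accepted links form the MST.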
Consider now an intermediate stage of the growth, as illustrated in Fig. 16.8. Assume there is a giant component of average size NS and n_c = N(1 - S)/s_c small components of average size s_c each. Then we can distinguish six types of links, labelled a-f in Fig. 16.8. Types a and b are links that have been chosen earlier, in the giant component (a) and in the small components (b), respectively. Types c and d are eligible links, between the giant component and a small component (c) and between small components (d), respectively. Types e and f are forbidden links, connecting nodes within the giant component (e), respectively within a small component (f). For large N, we can enumerate the average number of links L_x of each type x:

    L_a + L_b = (1/2) μ_MST N
    L_c = SN · (1 - S)N
    L_d = (1/2) n_c^2 s_c^2
    L_e = (1/2) (SN)^2 - SN
    L_f = (1/2) n_c s_c (s_c - 1) - n_c (s_c - 1)

To highest order in O(N^2), we have

    L_c = N^2 S(1 - S),   L_d = (1/2) N^2 (1 - S)^2,   L_e = (1/2) N^2 S^2

The probability that a randomly selected link is eligible is

    q = (L_c + L_d)/(L_c + L_d + L_e + L_f)

or, to order O(N^2),

    q = 1 - S^2    (16.41)

In contrast with the growth of the random graph G_p(N), where at each stage a link is added with probability p, in the Kruskal growth of the MST we only succeed in adding one link (with probability 1) per 1/q stages on average. Thus the average number of links added in the random graph corresponding to one link in the MST is 1/q = 1/(1 - S^2). This provides an asymptotic mapping between μ_rg and μ_MST in the form of a differential equation,

    dμ_rg/dμ_MST = 1/(1 - S^2)

By using (15.22), we find

    dμ_MST/dS = (dμ_MST/dμ_rg)(dμ_rg/dS) = (1 + S)(S + (1 - S) log(1 - S))/S^2

Integration with the initial condition μ_MST = 2 at S = 1 finally gives the average degree μ_MST in the MST as a function of the fraction S of nodes in the giant component,

    μ_MST(S) = 2S - (1 - S)^2 log(1 - S)/S    (16.42)

As shown in Fig.
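The asymptotic relation (16.42) can be checked against a single run of the Kruskal growth process by recording, after each accepted link, the pair (μ_MST, S). The sketch below (our own naming and parameter choices; union-find is inlined to keep it self-contained) compares the recorded points with the formula, away from the transition region where finite-size fluctuations dominate.

```python
import math, random

def mst_degree_vs_giant(n, rng):
    """One Kruskal growth run; returns (mu_MST, S) after each added link."""
    parent = list(range(n))
    size = [1] * n
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    links, giant, points = 0, 1, []
    while links < n - 1:
        i, j = rng.randrange(n), rng.randrange(n)
        ri, rj = find(i), find(j)
        if i == j or ri == rj:
            continue                        # forbidden (loop-closing) link
        if size[ri] < size[rj]:
            ri, rj = rj, ri
        parent[rj] = ri
        size[ri] += size[rj]
        giant = max(giant, size[ri])
        links += 1
        points.append((2.0 * links / n, giant / n))
    return points

def mu_mst(s):
    # Eq. (16.42): mean MST degree as a function of the giant fraction S.
    return 2 * s - (1 - s) ** 2 * math.log(1 - s) / s

pts = mst_degree_vs_giant(20000, random.Random(3))
err = max(abs(mu - mu_mst(s)) for mu, s in pts if 0.5 <= s <= 0.9)
print(err)
```

The maximum deviation over the mid-range of S should be small for N of this size, mirroring the agreement reported in Fig. 16.9.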
16.9, the asymptotic result (16.42) agrees well with the simulation (even for a single sample), except in a small region around the transition μ_MST = 1 and for relatively small N.

    Fig. 16.9. The fraction S of nodes in the giant component (i.e. its size divided by N) as a function of the mean degree μ_MST, for N = 1000, 10000 and 25000, together with the theory (16.42). Each simulation for a different number of nodes N consists of one MST sample.

The key observation is that all transition probabilities in the Kruskal growth process asymptotically depend on merely one parameter S, the fraction of nodes in the giant component; S is called an order parameter in statistical physics. In general, the expectation of an order parameter distinguishes the qualitatively different regimes (states) below and above the phase transition. In higher dimensions, fluctuations of the order parameter around the mean can be neglected and the mean value can be computed from a self-consistent mean-field theory. In our problem, the underlying complete (or random) graph topology makes the problem effectively infinite-dimensional. The argument leading to (15.20) is essentially a mean-field argument.

16.7.2 The average weight of the minimum spanning tree

By definition, the weight of the MST is

    W_MST = Σ_{j=1}^{L} w_(j) 1_{j ∈ MST}    (16.43)

where w_(j) is the j-th smallest link weight. The average MST weight is

    E[W_MST] = Σ_{j=1}^{L} E[w_(j) 1_{j ∈ MST}]

The random variables w_(j) and 1_{j ∈ MST} are independent, because the j-th smallest link weight w_(j) only depends on the link weight distribution and the number of links L, while the appearance of the j-th link in the MST only depends on the graph's topology, as shown in Section 16.7.1.
Hence,

    E[w_(j) 1_{j ∈ MST}] = E[w_(j)] E[1_{j ∈ MST}] = E[w_(j)] Pr[j ∈ MST]

such that the average weight of the MST is

    E[W_MST] = Σ_{j=1}^{L} E[w_(j)] Pr[j ∈ MST]    (16.44)

In general, for independent link weights with probability density function f_w(x) and distribution function F_w(x) = Pr[w <= x], the probability density function of the j-th order statistic follows from (3.36) as

    f_{w_(j)}(x) = (j f_w(x)/F_w(x)) C(L, j) (F_w(x))^j (1 - F_w(x))^{L-j}    (16.45)

The factor C(L, j)(F_w(x))^j (1 - F_w(x))^{L-j} is a binomial distribution with mean μ = F_w(x) L and variance σ^2 = L F_w(x)(1 - F_w(x)) that, by the Central Limit Theorem 6.3.1, tends for large L to a Gaussian (1/(σ√(2π))) e^{-(j-μ)^2/(2σ^2)}, which peaks at j = μ. For large N and fixed j/L, we have(7) x_j = E[w_(j)] ≈ F_w^{-1}(j/L). We found before in (16.41) that the link ranked j appears in the MST with probability

    Pr[j ∈ MST] = 1 - S_j^2

where S_j is the fraction of nodes in the giant component during the construction process of the random graph at the stage where the number of links precisely equals j. Since links are added independently, that stage in fact establishes the random graph G_r(N, L = j). Our graph under consideration is the complete graph K_N, such that we add in total L = C(N, 2) links.
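The bulk approximation x_j = E[w_(j)] ≈ F_w^{-1}(j/L) used above is easy to verify numerically. A minimal sketch with illustrative parameters (L, j and the run count are our choices), using exponential link weights for which F_w^{-1}(u) = -ln(1 - u):

```python
import math, random

rng = random.Random(11)
L, j, runs = 400, 200, 2000            # rank j in the bulk: j/L fixed
acc = 0.0
for _ in range(runs):
    weights = sorted(rng.expovariate(1.0) for _ in range(L))
    acc += weights[j - 1]              # the j-th smallest link weight w_(j)
mean_wj = acc / runs
# Exponential link weights: F_w(x) = 1 - exp(-x), so F_w^{-1}(u) = -ln(1-u).
approx = -math.log(1.0 - j / (L + 1.0))
print(mean_wj, approx)
```

Using j/(L+1) rather than j/L reflects the exact uniform order-statistic mean E[U_(j)] = j/(L+1); for L this large the difference is negligible.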
(7) In general, it holds that w_(k) = F_w^{-1}(U_(k)) and E[w_(k)] = E[F_w^{-1}(U_(k))] ≠ F_w^{-1}(E[U_(k)]), but, for a large number L of order statistics, the Central Limit Theorem 6.3.1 leads to E[w_(k)] ≈ F_w^{-1}(E[U_(k)]) ≈ F_w^{-1}(k/L), because for a uniform random variable U on [0,1] the average of the k-th smallest value is exactly E[U_(k)] = k/(L + 1) ≈ k/L.

With (15.22) and μ_rg = 2L/N, it follows that

    - log(1 - S_j)/S_j = 2j/N    (16.46)

Hence,

    E[W_MST] ≈ Σ_{j=1}^{L} F_w^{-1}(j/L)(1 - S_j^2)

We now approximate the sum by an integral,

    E[W_MST] ≈ ∫_1^L F_w^{-1}(u/L)(1 - S_u^2) du

Substituting x = 2u/N (which is the average degree in any graph G(N, u)) yields for large N, where L ≈ N^2/2,

    E[W_MST] ≈ (N/2) ∫_{2/N}^{N-1} F_w^{-1}(x/N)(1 - S^2(x)) dx ≈ (N/2) ∫_0^N F_w^{-1}(x/N)(1 - S^2(x)) dx

It is known (Janson et al., 1993) that, if the number of links in the growth process of the random graph is below N/2, with high probability (and ignoring a small onset region just below N/2) there is no giant component, such that S(x) = 0 for x ∈ [0, 1]. Thus, we arrive at the general formula, valid for large N,

    E[W_MST] ≈ (N/2) ∫_0^1 F_w^{-1}(x/N) dx + (N/2) ∫_1^N F_w^{-1}(x/N)(1 - S^2(x)) dx    (16.47)

The first term is the contribution from the smallest N/2 links in the graph, which are included in the MST almost surely. The remaining part comes from the more expensive links in the graph, which are included with diminishing probability, since 1 - S^2(x) decreases exponentially for large x, as can be deduced from (15.21). The rapid decrease of 1 - S^2(x) makes only relatively small values of the argument F_w^{-1}(x/N) contribute to the second integral. At this point, the specifics of the link weight distribution need to be introduced. The Taylor expansion of (N/2) F_w^{-1}(x/N) for large N to first order is

    (N/2) F_w^{-1}(x/N) = (N/2)(F_w^{-1}(0) + x/(N f_w(0)) + O(1/N^2)) = x/(2 f_w(0)) + O(1/N)

since we require that link weights are positive, such that F_w^{-1}(0) = 0.
This expansion is only useful provided f_w is regular, i.e. f_w(0) is neither zero nor infinity. These cases occur, for example, for polynomial link weights with f_w(x) = α x^{α-1} and α ≠ 1. For polynomial link weights, however, it holds that (N/2) F_w^{-1}(x/N) = (N^{1-1/α}/2) x^{1/α}. Formally, this latter expression reduces to the first order Taylor approach for α = 1, apart from the constant factor 1/f_w(0). Therefore, we will first compute E[W_MST] for polynomial link weights and then return to the case in which the Taylor expansion is useful.

16.7.2.1 Polynomial link weights

The average weight of the MST for polynomial link weights follows(8) from (16.47) as

    E[W_MST(α)] ≈ (N^{1-1/α}/2) ( α/(α+1) + I ),   I = ∫_1^N x^{1/α} (1 - S^2(x)) dx

Let y = S(x) and use (15.22); then x = S^{-1}(y) = -log(1-y)/y and dx = (d/dy)(-log(1-y)/y) dy, while y = S(1) = 0 and y = S(N) = 1, such that

    I = ∫_0^1 (-log(1-y)/y)^{1/α} (1 - y^2) (d/dy)(-log(1-y)/y) dy

After partial integration, we have

    I = -α/(α+1) + (2α/(α+1)) ∫_0^1 y^{-1/α} (-log(1-y))^{1+1/α} dy

Finally, substituting y = 1 - e^{-x}, we end up with

    E[W_MST(α)] ≈ N^{1-1/α} (α/(α+1)) ∫_0^∞ x^{1/α+1} e^{-x} (1 - e^{-x})^{-1/α} dx    (16.48)

(8) Since the average of the k-th smallest link weight can be computed from (3.36) as

    E[w_(k)] = Γ(k + 1/α) Γ(L + 1)/(Γ(k) Γ(L + 1 + 1/α))

the exact formula (16.44) reduces to

    E[W_MST(α)] = Σ_{j=1}^{L} (Γ(j + 1/α) Γ(L + 1)/(Γ(j) Γ(L + 1 + 1/α))) (1 - S_j^2)

Analogously to the above manipulations, after conversion to an integral, substituting x = 2u/N and using (Abramowitz and Stegun, 1968, Section 6.1.47) that, for large z, Γ(z + 1/α)/Γ(z) = z^{1/α}(1 + O(1/z)), we arrive at the same formula.

If α < 1, then E[W_MST(α)] -> 0 for N -> ∞, while for α > 1, E[W_MST(α)] -> ∞. In particular, lim_{α->∞} E[W_MST(α)] = N - 1. Only for α = 1 is E[W_MST(1)] finite for large N.
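This finite α = 1 case can be probed numerically. A minimal sketch (function names and parameters are ours) runs Prim's algorithm on K_n with Uniform(0,1) link weights, a distribution with α = 1 and f_w(0) = 1, whose average MST weight is known to approach ζ(3) ≈ 1.202 (Frieze, 1985). Weights are drawn on demand, which is valid because each link weight is examined exactly once, namely when its first endpoint joins the tree.

```python
import random

def mst_weight(n, rng):
    """Prim's algorithm on K_n with i.i.d. Uniform(0,1) link weights,
    generating each weight the first time its edge is examined."""
    INF = float("inf")
    in_tree = [False] * n
    best = [INF] * n          # cheapest known link into the tree, per node
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]),
                key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                w = rng.random()          # weight of link (u, v)
                if w < best[v]:
                    best[v] = w
    return total

rng = random.Random(9)
avg = sum(mst_weight(300, rng) for _ in range(30)) / 30
print(avg)
```

The printed average should lie close to ζ(3) = 1.202..., with deviations shrinking as n grows.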
More precisely,

    E[W_MST(1)] = ζ(3) = 1.202...    (16.49)

where we have used (Abramowitz and Stegun, 1968, Section 23.2.7) the integral representation of the Riemann Zeta function, ζ(s) Γ(s) = ∫_0^∞ u^{s-1}/(e^u - 1) du, which is convergent for Re(s) > 1. This particular case for α = 1 was proved earlier by Frieze (1985), based on a different method.

16.7.2.2 Generalizations

We now return to the Taylor series, valid for link weights with 0 < f_w(0) < ∞. The above result for α = 1 immediately yields

    E[W_MST] = ζ(3)/f_w(0)    (16.50)

This result is for the complete graph K_N. A random graph G_p(N) with p < 1 and weight density f_w(x) is equivalent to K_N with a fraction 1 - p of infinite link weights. Thus the effective link weight distribution is p f_w(x) + (1 - p) δ_{w,∞}, and we can simply replace f_w(0) by p f_w(0) in the expression (16.50) to obtain the average weight of the MST in the random graph G_p(N).

16.8 The proof of the degree Theorem 16.6.1 of the URT

16.8.1 The case k = N: E[D_N^(N-1)]

If k = N, the recursion (16.35) becomes, with D_N^(N) = 0,

    E[D_{N+1}^(N)] = E[D_N^(N-1)]/N

With the initial value D_2^(1) = 2, the solution

    E[D_N^(N-1)] = 2/(N - 1)!    (16.51)

is readily verified. Since for any URT it holds that Pr[D_N^(k) = j] = 0 for j > N - k, we have that E[D_N^(N-1)] = Pr[D_N^(N-1) = 1]. Since there exist in total (N - 1)! different URTs of size N, this result (16.51) means that there are precisely two possible URTs with a node of degree N - 1. Indeed, one is the root with N - 1 children, and the other is the root with one child of degree N - 1 that in turn possesses N - 2 children. Also,

    r_N^(N-1) = (N - 1) E[D_N^(N-1)] = 2/(N - 2)!    (16.52)
16.8.2 The case k = 1: E[D_N^(1)]

If k = 1 and N >= 3, the recursion (16.35) is slightly different, because the newly attached node n_{N+1} necessarily belongs to the set of degree-1 nodes in the URT of size N + 1, such that

    E[D_{N+1}^(1)] = ((N - 1)/N) E[D_N^(1)] + 1

with D_1^(1) = 1 and D_2^(1) = 2. Hence, E[D_3^(1)] = 2. With r_N^(1) = (N - 1) E[D_N^(1)], the recursion for k = 1 becomes

    r_{N+1}^(1) = r_N^(1) + N

The particular solution is r_{N;p}^(1) = aN^2 + bN + c. Substitution of r_N^(1) = aN^2 + bN + c into the difference equation yields

    aN^2 + (b + 2a)N + a + b + c = aN^2 + (b + 1)N + c

or, by equating corresponding powers in N, we find the conditions b + 2a = b + 1 and a + b = 0, from which a = 1/2, b = -1/2. Thus,

    r_N^(1) = (N - 1) E[D_N^(1)] = N(N - 1)/2 + c

and

    E[D_N^(1)] = N/2 + c/(N - 1)

Using E[D_3^(1)] = 2 shows that c = 1, such that, for N > 2,

    E[D_N^(1)] = N/2 + 1/(N - 1)    (16.53)

16.8.3 The general case: E[D_N^(k)]

Let us denote

    R(x, y) = Σ_{N=3}^∞ Σ_{k=2}^{N-1} r_N^(k) x^k y^N    (16.54)

then the recursion (16.36) is transformed into

    Σ_{N=3}^∞ Σ_{k=2}^{N-1} (N - 1) r_{N+1}^(k) x^k y^N = Σ_{N=3}^∞ Σ_{k=2}^{N-1} (N - 1) r_N^(k) x^k y^N + Σ_{N=3}^∞ Σ_{k=2}^{N-1} r_N^(k-1) x^k y^N

Now, the left-hand side is

    Σ_{N=3}^∞ Σ_{k=2}^{N-1} (N - 1) r_{N+1}^(k) x^k y^N = (1/y) Σ_{N=4}^∞ Σ_{k=2}^{N-2} (N - 2) r_N^(k) x^k y^N = (1/y) Σ_{N=3}^∞ Σ_{k=2}^{N-2} (N - 2) r_N^(k) x^k y^N
    = (1/y) Σ_{N=3}^∞ Σ_{k=2}^{N-1} N r_N^(k) x^k y^N - (1/y) Σ_{N=3}^∞ N r_N^(N-1) x^{N-1} y^N
      - (2/y) Σ_{N=3}^∞ Σ_{k=2}^{N-1} r_N^(k) x^k y^N + (2/y) Σ_{N=3}^∞ r_N^(N-1) x^{N-1} y^N

Using (16.54) yields

    Σ_{N=3}^∞ Σ_{k=2}^{N-1} (N - 1) r_{N+1}^(k) x^k y^N = ∂R(x,y)/∂y - (2/y) R(x,y)
      - (1/y) Σ_{N=3}^∞ N r_N^(N-1) x^{N-1} y^N + (2/y) Σ_{N=3}^∞ r_N^(N-1) x^{N-1} y^N

Invoking (16.52) yields

    (2/y) Σ_{N=3}^∞ r_N^(N-1) x^{N-1} y^N = 4 Σ_{N=3}^∞ (xy)^{N-1}/(N - 2)! = 4xy (e^{xy} - 1)

and

    (1/y) Σ_{N=3}^∞ N r_N^(N-1) x^{N-1} y^N = 2xy Σ_{m=1}^∞ (m + 2)(xy)^m/m!
    = 2 (xy)^2 Σ_{m=0}^∞ (xy)^m/m! + 4xy Σ_{m=1}^∞ (xy)^m/m! = 2 (xy)^2 e^{xy} + 4xy (e^{xy} - 1)

such that

    Σ_{N=3}^∞ Σ_{k=2}^{N-1} (N - 1) r_{N+1}^(k) x^k y^N = ∂R(x,y)/∂y - (2/y) R(x,y) - 2 (xy)^2 e^{xy}

Similarly,

    Σ_{N=3}^∞ Σ_{k=2}^{N-1} r_N^(k-1) x^k y^N = x Σ_{N=3}^∞ Σ_{k=1}^{N-2} r_N^(k) x^k y^N
    = x R(x,y) + x^2 Σ_{N=3}^∞ r_N^(1) y^N - x Σ_{N=3}^∞ r_N^(N-1) x^{N-1} y^N

Using both (16.52) and r_N^(1) = N(N - 1)/2 + 1 leads to

    x^2 Σ_{N=3}^∞ r_N^(1) y^N = x^2 Σ_{N=3}^∞ (N(N - 1)/2) y^N + x^2 Σ_{N=3}^∞ y^N = x^2 y^3 (3 - 3y + y^2)/(1 - y)^3 + x^2 y^3/(1 - y)

and

    x Σ_{N=3}^∞ r_N^(N-1) x^{N-1} y^N = 2 Σ_{N=3}^∞ (xy)^N/(N - 2)! = 2 (xy)^2 (e^{xy} - 1)

Combining all transforms the recursion (16.36) into a first order linear partial differential equation,

    (1 - y) ∂R(x,y)/∂y + (1 - x - 2/y) R(x,y) = x^2 y^3 (3 - 3y + y^2)/(1 - y)^3 + x^2 y^3/(1 - y) + 2 (xy)^2
                                              = x^2 y^2 (1/(1 - y) + 1/(1 - y)^3)

with boundary conditions R(x, 0) = R(0, y) = 0. Further,

    ∫ R(x,y)/y^2 dy = Σ_{N=3}^∞ Σ_{k=2}^{N-1} (r_N^(k)/(N - 1)) x^k y^{N-1} = Σ_{N=3}^∞ Σ_{k=2}^{N-1} E[D_N^(k)] x^k y^{N-1}

Hence, if x = 1,

    ∫ R(1,y)/y^2 dy = Σ_{N=3}^∞ Σ_{k=2}^{N-1} E[D_N^(k)] y^{N-1} = Σ_{N=3}^∞ (N - E[D_N^(1)]) y^{N-1}
    = Σ_{N=3}^∞ (N/2) y^{N-1} - Σ_{N=3}^∞ y^{N-1}/(N - 1) = 1/(2(1 - y)^2) - 1/2 + log(1 - y)

or, after differentiation with respect to y,

    R(1, y) = y^2/(1 - y)^3 - y^2/(1 - y)    (16.55)

It is more convenient to consider the differential equation as an ordinary differential equation in y, and to regard the variable x as a parameter.
The homogeneous differential equation,

    (1 - y) ∂R_h(x,y)/∂y = (x + 2/y - 1) R_h(x,y)

is solved after integration with respect to y,

    ln R_h(x,y) = ∫ (x + 2/y - 1)/(1 - y) dy = (x - 1) ∫ dy/(1 - y) + 2 ∫ dy/(y(1 - y))
    = (1 - x) ln(1 - y) + 2 (ln y - ln(1 - y)) = ln[(1 - y)^{-x-1} y^2]

or R_h(x,y) = (1 - y)^{-x-1} y^2. The particular solution is of the form R(x,y) = C(y) R_h(x,y), where C(y) obeys

    ∂C(y)/∂y = x^2 (1 - y)^x (1/(1 - y)^3 + 1/(1 - y))

or

    C(y) = x^2 ∫ ((1 - y)^{x-3} + (1 - y)^{x-1}) dy + c(x) = x^2 (1 - y)^{x-2}/(2 - x) - x (1 - y)^x + c(x)

where c(x) is a function of x, independent of y, to be determined later. The solution is

    R(x,y) = x^2 y^2/((2 - x)(1 - y)^3) - x y^2/(1 - y) + c(x) (1 - y)^{-x-1} y^2

The initial condition R(0, y) = 0 shows that c(0) = 0, while the boundary condition (16.55) implies that c(1) = 0. Expanding this solution in a power series around x = 0 and y = 0 starts from

    R_h(x,y) = (1 - y)^{-x-1} y^2 = Σ_{N=0}^∞ C(-x-1, N) (-1)^N y^{N+2}

From the generating function of the Stirling numbers of the first kind (Abramowitz and Stegun, 1968, Section 24.1.3),

    Γ(x + 1)/Γ(x + 1 - n) = Σ_{j=0}^{n} S_n^(j) x^j    (16.56)

we observe that

    C(-x-1, N) = Γ(-x)/(N! Γ(-x - N)) = (1/N!) Σ_{k=0}^{N} S_{N+1}^(k+1) (-1)^k x^k

such that

    R_h(x,y) = Σ_{N=0}^∞ Σ_{k=0}^{N} (S_{N+1}^(k+1)/N!) (-1)^{N+k} x^k y^{N+2} = Σ_{N=2}^∞ Σ_{k=0}^{N-2} (S_{N-1}^(k+1)/(N - 2)!) (-1)^{N+k} x^k y^N

Hence,

    R(x,y) = Σ_{N=2}^∞ Σ_{k=2}^∞ C(N, 2) (x^k/2^{k-1}) y^N - x Σ_{N=2}^∞ y^N + c(x) Σ_{N=2}^∞ Σ_{k=0}^{N-2} (S_{N-1}^(k+1)/(N - 2)!) (-1)^{N+k} x^k y^N

It remains to determine c(x) by equating the corresponding powers in x and y at both sides. With the definition (16.54), equating the second power (N = 2) in y yields

    0 = Σ_{k=2}^∞ x^k/2^{k-1} - x + c(x)

which indicates that

    c(x) = x - x^2/(2 - x)

agreeing with c(0) = c(1) = 0. The Taylor series around x = 0 is c(x) = Σ_{k=0}^∞ c_k x^k with c_0 = 0, c_1 = 1 and c_k = -1/2^{k-1} for k > 1. Equating the power y^N for N > 2,

    Σ_{k=2}^{N-1} r_N^(k) x^k = C(N, 2) Σ_{k=2}^∞ x^k/2^{k-1} - x + c(x) Σ_{k=0}^{N-2} (S_{N-1}^(k+1)/(N - 2)!) (-1)^{N+k} x^k
Writing c(x) = Σ_{j=1}^∞ c_j x^j and convolving the two series,

    Σ_{k=2}^{N-1} r_N^(k) x^k = C(N, 2) Σ_{k=2}^∞ x^k/2^{k-1} - x + Σ_{k=1}^∞ ( Σ_{j=0}^{k-1} c_{k-j} (-1)^{N+j} S_{N-1}^(j+1)/(N - 2)! ) x^k

The term with k = 1 equals c_1 (-1)^N S_{N-1}^(1) x/(N - 2)! = x, because S_{N-1}^(1) = (-1)^N (N - 2)! and c_1 = 1, and thus cancels the -x. Finally, equating the corresponding powers in x leads to

    r_N^(k) = C(N, 2)/2^{k-1} + Σ_{j=0}^{k-1} c_{k-j} (-1)^{N+j} S_{N-1}^(j+1)/(N - 2)!
            = C(N, 2)/2^{k-1} + (-1)^{N+k-1} S_{N-1}^(k)/(N - 2)! + ((-1)^N/(2^k (N - 2)!)) Σ_{j=1}^{k-1} S_{N-1}^(j) (-2)^j

or to (16.37). As a check, (16.56) supplies the generating function (-1)^n Γ(n - x)/Γ(-x) = Σ_{j=0}^{n} S_n^(j) x^j, which at x = -2 gives Σ_{j=1}^{N-2} S_{N-1}^(j) (-2)^j = (-1)^{N-1} N! - (-2)^{N-1}, from which

    E[D_N^(N-1)] = r_N^(N-1)/(N - 1) = 2/(N - 1)!

in (16.51) is recovered. Also E[D_N^(1)] = N/2 + 1/(N - 1) is readily verified.

16.9 Problems

(i) Comparison of simulations with exact results. Many of the theoretical results are easily verified by simulations. Consider the following standard simulation: (a) construct a graph of a certain class, e.g. an instance of the random graph G_p(N) with exponentially distributed link weights; (b) determine in that graph a desired property, e.g. the hopcount of the shortest path between two different arbitrary nodes; (c) store the hopcount in a histogram; and (d) repeat the sequence (a)-(c) n times, each time with a different graph instance in (a). Estimate the relative error of the simulated hopcount in G_p(N) with p = 1 for n = 10^4, 10^5 and 10^6.
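A minimal version of the simulation in problem (i) for p = 1 (the complete graph K_N) can be sketched as follows; all function names and parameters are ours. Section 16.2's URT equivalence predicts that the Dijkstra hopcount has the same distribution as the depth of a uniform node in a URT, which the sketch uses as a cross-check. Exponential weights are drawn on demand, which is valid because each link weight is examined exactly once, when its first endpoint is finalized.

```python
import heapq, random

def hopcount(n, rng):
    """Hopcount of the shortest path from node 0 to a uniform other node
    in K_n with i.i.d. exp(1) link weights (Dijkstra)."""
    dist = [float("inf")] * n
    hops = [0] * n
    done = [False] * n
    dist[0] = 0.0
    heap = [(0.0, 0, 0)]                 # (weight, hops, node)
    while heap:
        d, h, u = heapq.heappop(heap)
        if done[u]:
            continue                     # stale heap entry
        done[u] = True
        for v in range(n):
            if not done[v]:
                nd = d + rng.expovariate(1.0)
                if nd < dist[v]:
                    dist[v], hops[v] = nd, h + 1
                    heapq.heappush(heap, (nd, h + 1, v))
    return hops[rng.randrange(1, n)]

def urt_depth(n, rng):
    """Depth of a uniform non-root node in a URT of size n."""
    depth = [0] * n
    for j in range(1, n):
        depth[j] = depth[rng.randrange(j)] + 1
    return depth[rng.randrange(1, n)]

rng = random.Random(2)
n, runs = 50, 400
mean_dijkstra = sum(hopcount(n, rng) for _ in range(runs)) / runs
mean_urt = sum(urt_depth(n, rng) for _ in range(runs)) / runs
print(mean_dijkstra, mean_urt)
```

The two sample means should agree within Monte Carlo noise; storing the samples in a histogram instead of averaging gives the full simulated hopcount law of problem (i).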
(ii) Given the probability generating function (16.17) of the weight of the shortest path in a complete graph with independent exponential link weights, compute the variance of W_N.

(iii) Prove the asymptotic law (16.20) of the weight of the shortest path in a complete graph with i.i.d. exponential link weights.

(iv) In a communication network, often two paths are computed for each important flow, to guarantee sufficient reliability. Apart from the shortest path between a source A and a destination B, a second path between A and B is chosen that does not travel over any intermediate router of the shortest path. We call such a path node-disjoint to the shortest path. Derive a good approximation for the distribution of the hopcount of the shortest node-disjoint path to the shortest path in the complete graph with exponential link weights with mean 1.

17 The efficiency of multicast

The efficiency or gain of multicast in terms of network resources is compared to unicast. Specifically, we concentrate on a one-to-many communication, where a source sends a same message to m different, uniformly distributed destinations along the shortest path. In unicast, this message is sent m times from the source to each destination. Hence, unicast uses on average f_N(m) = m E[H_N] link-traversals or hops, where E[H_N] is the average number of hops to a uniform location in the graph with N nodes. One of the main properties of multicast is that it economizes on the number of link-traversals: the message is only copied at each branch point of the multicast tree towards the m destinations. Let us denote by H_N(m) the number of links in the shortest path tree (SPT) to m uniformly chosen nodes. If we define the multicast gain g_N(m) = E[H_N(m)] as the average number of hops in the SPT rooted at a source to m randomly chosen distinct destinations, then g_N(m) <= f_N(m). The purpose here is to quantify the multicast gain g_N(m).
We present general results valid for all graphs, and more explicit results valid for the random graph G_p(N) and for the k-ary tree. The analysis presented here may be valuable to derive a business model for multicast: "How many customers m are needed to make the use of multicast profitable for a service provider?" Two modeling assumptions are made. First, the multicast process is assumed(1) to deliver packets along the shortest path from a source to each of the m destinations. As most of the current Internet protocols forward packets based on the (reverse) shortest path, the assumption of SPT delivery is quite realistic. The second assumption is that the m multicast group member nodes are chosen uniformly out of the total number of nodes N. This assumption has been discussed by Phillips et al. (1999). They concluded that, if m and N are large, deviations from the uniformity assumption are negligibly small. Also the Internet measurements of Chalmers and Almeroth (2001) seem to confirm the validity of the uniformity assumption.

(1) The assumption ignores shared tree multicast forwarding such as the core-based tree (CBT, see RFC 2201).

17.1 General results for g_N(m)

Theorem 17.1.1 For any connected graph with N nodes,

    m <= g_N(m) <= Nm/(m + 1)    (17.1)

Proof: We need at least one edge for each different user; therefore g_N(m) >= m, and the lower bound is attained in a star topology with the source at the center. We will next show that an upper bound is obtained in a line topology. It is sufficient to consider trees, because multicast only uses shortest paths without cycles. If the tree does not have a line topology, then at least one node has degree 3 or the root has degree 2. Take the node closest to the root with this property and cut one of the branches at this node; we paste that branch to a node at the deepest level. Through this procedure the multicast function g_N(m) stays unaltered or increases.
Continuing in this fashion until we reach a line topology demonstrates the claim. For the line topology, we place the source at the origin and the other nodes at the integers 1, 2, ..., N-1. The links of the graph are given by (i, i+1), i = 0, 1, ..., N-2. The multicast gain g_N(m) equals E[M], where M is the maximum of a sample of size m, drawn without replacement from the integers 1, 2, ..., N-1. Thus,

    Pr[M <= k] = C(k, m)/C(N-1, m),   m <= k <= N-1

from which g_N(m) = E[M] follows as

    g_N(m) = Σ_{k=m}^{N-1} k (C(k, m) - C(k-1, m))/C(N-1, m) = Σ_{k=m}^{N-1} k C(k-1, m-1)/C(N-1, m)
    = m Σ_{k=m}^{N-1} C(k, m)/C(N-1, m) = (mN/(m+1)) Σ_{k=m}^{N-1} C(k, m)/C(N, m+1) = mN/(m+1)

where we have used that Σ_{k=m}^{N-1} C(k, m)/C(N, m+1) = 1, because it is a sum of probabilities over all possible disjoint outcomes. □

Figure 17.1 shows the allowable space for g_N(m).

    Fig. 17.1. The allowable region of g_N(m), between the lower bound m and the upper bound Nm/(m+1). For exponentially growing graphs, E[H_N] = c log N, implying that the allowable region for these graphs is smaller and bounded at the left by the straight line m (c log N).

Theorem 17.1.2 For any connected graph with N nodes, the map m -> g_N(m) is concave, and the map m -> g_N(m)/f_N(m) is decreasing.

Proof: Define Y_m to be the random variable giving the additional number of hops necessary to reach the m-th user when the first m-1 users are already connected. Then we have that

    E[Y_m] = g_N(m) - g_N(m-1)

Moreover, let Y'_m be the random number of additional hops necessary to reach the m-th multicast group member when we discard all extra hops of the (m-1)-st group member. An example is illustrated in Fig. 17.2. The random variable Y'_m has the same distribution as Y_{m-1}, because both the (m-1)-st and the m-th group member are chosen uniformly from the remaining N - m + 1 nodes.
In general, Y'_m ≠ Y_{m-1}, but, for each k, Pr[Y'_m = k] = Pr[Y_{m-1} = k] and, hence,

    E[Y'_m] = E[Y_{m-1}]    (17.2)

Furthermore, we have by construction that Y_m <= Y'_m with probability 1, implying that

    E[Y_m] <= E[Y'_m]    (17.3)

Indeed, attaching the m-th group member to the reduced tree takes at least as many hops as attaching that same group member to the non-reduced tree, because the former is contained in the latter and the extra hops added by the (m-1)-st group member can only help us. Combining (17.2) and (17.3) immediately gives

    g_N(m) - g_N(m-1) = E[Y_m] <= E[Y'_m] = g_N(m-1) - g_N(m-2)    (17.4)

This is equivalent to the concavity of the map m -> g_N(m).

    Fig. 17.2. A multicast session with m = 5 group members, where Y_5 = 1 (namely link C-5). To construct Y'_5, the three dotted lines must be removed, which yields the reduced tree; in it, Y'_5 = 2 (A-C-5). In this example, Y'_5 = Y_4 = 2, because A-C-4 and A-C-5 both consist of 2 hops. In general, they are equal in distribution, because the roles of group members 4 and 5 are identical in the reduced tree.

In order to show that g_N(m)/f_N(m) is decreasing, it suffices to show that m -> g_N(m)/m is decreasing, since f_N(m) is proportional to m. Defining g_N(0) = 0, we can write g_N(m) as a telescoping sum

    g_N(m) = Σ_{k=1}^{m} (g_N(k) - g_N(k-1)) = Σ_{k=1}^{m} x_k

where x_k = g_N(k) - g_N(k-1), k = 1, ..., m. Then,

    g_N(m)/m = (1/m) Σ_{k=1}^{m} x_k

is the mean of a sequence of m positive numbers x_k. By (17.4) the sequence x_k is decreasing and, hence,

    g_N(m)/m = (1/m) Σ_{k=1}^{m} x_k <= (1/(m-1)) Σ_{k=1}^{m-1} x_k = g_N(m-1)/(m-1)

This proves that m -> g_N(m)/m is decreasing. □

Next, we will give a representation for g_N(m) that is valid for all graphs.
Let X_i be the number of joint hops that all i uniformly chosen and different group members have in common; then the following general theorem holds.

Theorem 17.1.3 For any connected graph with N nodes,

    g_N(m) = Σ_{i=1}^{m} (-1)^{i-1} C(m, i) E[X_i]    (17.5)

Note that g_N(1) = f_N(1) = E[X_1] = E[H_N], so that the decrease in average hops, or the "gain", by using multicast over unicast is precisely

    g_N(m) - f_N(m) = Σ_{i=2}^{m} (-1)^{i-1} C(m, i) E[X_i]

However, computing E[X_i] for general graphs is difficult.

Proof of Theorem 17.1.3: Let A_1, A_2, ..., A_m be sets, where A_i consists of all links that constitute the shortest path from the source to multicast group member i. Denote by |A_i| the number of elements in the set A_i. The multicast group members are chosen uniformly from the set of all nodes except for the root. Hence,

    E[X_1] = E[|A_i|],  for 1 <= i <= m

and

    E[X_2] = E[|A_i ∩ A_j|],  for 1 <= i < j <= m

etc. Now, g_N(m) = E[|A_1 ∪ A_2 ∪ ... ∪ A_m|]. Since Q(A) = E[|A|]/C(N, 2) is a probability measure on the set of all links, we obtain from the inclusion-exclusion formula (2.3), applied to Q and multiplied with C(N, 2) afterwards,

    E[|A_1 ∪ A_2 ∪ ... ∪ A_m|] = Σ_{i=1}^{m} E[|A_i|] - Σ_{i<j} E[|A_i ∩ A_j|] + ... + (-1)^{m-1} E[|A_1 ∩ A_2 ∩ ... ∩ A_m|]
    = m E[X_1] - C(m, 2) E[X_2] + ... + (-1)^{m-1} E[X_m]

This proves Theorem 17.1.3. □

Corollary 17.1.4 For any connected graph with N nodes,

    E[X_m] = Σ_{i=1}^{m} (-1)^{i-1} C(m, i) g_N(i)    (17.6)

The corollary is a direct consequence of the inversion formula for the binomial (Riordan, 1968, Chapter 2). Alternatively, in view of the Gregory-Newton interpolation formula (Lanczos, 1988, Chapter 4, Section 2), g_N(m) = Σ_{i=1}^∞ C(m, i) Δ^i g_N(0), where Δ is the difference operator, Δf(0) = f(1) - f(0), we can write E[X_i] = (-1)^{i-1} Δ^i g_N(0).

Corollary 17.1.5 For any connected graph, the multicast efficiency g_N(m) is bounded by

    g_N(m) >= f_N(m)/E[H_N]    (17.7)

where E[H_N] is the average number of hops in unicast.
Proof: We give two demonstrations. (a) From g_N(N-1) = N-1 (all nodes of the graph — the source plus the N-1 destinations — are spanned by a tree consisting of N-1 links) and the monotonicity of m ↦ g_N(m)/f_N(m) (see Theorem 17.1.2), we obtain

g_N(m)/f_N(m) ≥ g_N(N-1)/f_N(N-1) = (N-1)/((N-1) E[H_N]) = 1/E[H_N]

(b) Alternatively, Theorem 17.1.1 indicates that g_N(m) ≥ m, which, with the identity f_N(m) = m E[H_N], immediately leads to (17.7). □

Corollary 17.1.5 means that, for any connected graph, including the graph describing the Internet, the ratio of the unicast over the multicast efficiency is bounded by the expected hopcount in unicast. In other words, the maximum savings in resources that an operator can gain by using multicast (instead of unicast) never exceeds a factor E[H_N], which is roughly 15 in the current Internet.

17.2 The random graph G_p(N)

In this section, we confine ourselves to the class RGU: the random graphs of the class G_p(N) with independent, identically and exponentially distributed link weights w with mean E[w] = 1, where Pr[w ≤ x] = 1 - e^{-x}, x > 0. In Section 16.2, we have shown that the corresponding SPT is, asymptotically, a URT. The analysis below is exact for the complete graph K_N, while asymptotically correct for connected random graphs G_p(N).

17.2.1 The hopcount of the shortest path tree

Based on properties of the URT, the complete probability density function of the number of links H_N(m) in the SPT to m uniformly chosen nodes can be determined. We first derive a recursion for the probability generating function φ_{H_N(m)}(z) = E[z^{H_N(m)}] of the number of links H_N(m) in the SPT to m uniformly chosen nodes in the complete graph K_N.

Lemma 17.2.1 For N > 1 and all 1 ≤ m ≤ N-1,

φ_{H_N(m)}(z) = ((N-m-1)(N-1+mz)/(N-1)²) φ_{H_{N-1}(m)}(z) + (m² z/(N-1)²) φ_{H_{N-1}(m-1)}(z)    (17.8)

Proof: To prove (17.8), we use the recursive growth of URTs: a URT of size N is a URT of size N-1, to which an additional link is attached at a uniformly chosen node.
Fig. 17.3. The several possible cases in which the N-th node can be attached uniformly to the URT of size N-1. The root is shaded dark, while the m multicast member nodes are shaded lightly.

In order to obtain a recursion for H_N(m), we distinguish between the m uniformly chosen nodes all being in the URT of size N-1 or not. The probability that they all belong to the tree of size N-1 equals 1 - m/(N-1) (case A in Fig. 17.3). If they all belong to the URT of size N-1, then we have that H_N(m) = H_{N-1}(m). Thus, we obtain

φ_{H_N(m)}(z) = (1 - m/(N-1)) φ_{H_{N-1}(m)}(z) + (m/(N-1)) E[z^{1+L_{N-1}(m)}]    (17.9)

where L_{N-1}(m) is the number of links in the subtree of the URT of size N-1 spanned by m-1 uniform nodes, and the "one" refers to the link from the added N-th node to its ancestor in the URT of size N-1. We complete the proof by investigating the generating function of L_{N-1}(m). Again, there are two cases. In the first case (B in Fig. 17.3), the ancestor of the added N-th node is one of the m-1 previous nodes (which can only happen if it is unequal to the root); otherwise, we obtain one of the cases C and D in Fig. 17.3. The probability of the first event equals (m-1)/(N-1); the probability of the latter equals 1 - (m-1)/(N-1). If the ancestor of the added N-th node is one of the m-1 previous nodes, then the number of links L_{N-1}(m) equals H_{N-1}(m-1); otherwise, the generating function of the number of additional links equals

(1 - 1/(N-m)) φ_{H_{N-1}(m)}(z) + (1/(N-m)) φ_{H_{N-1}(m-1)}(z)

The first contribution comes from the case where the ancestor of the added N-th node is not the root, and the second from the case where it equals the root, which has probability 1/(N-1-(m-1)) = 1/(N-m). Therefore,

E[z^{L_{N-1}(m)}] = ((m-1)/(N-1)) φ_{H_{N-1}(m-1)}(z) + ((N-m)/(N-1)) [((N-m-1)/(N-m)) φ_{H_{N-1}(m)}(z) + (1/(N-m)) φ_{H_{N-1}(m-1)}(z)]
= (m/(N-1)) φ_{H_{N-1}(m-1)}(z) + ((N-m-1)/(N-1)) φ_{H_{N-1}(m)}(z)    (17.10)

Substitution of (17.10) into (17.9) leads to (17.8).
□

Since g_N(m) = E[H_N(m)] = φ'_{H_N(m)}(1), we obtain the recursion for g_N(m),

g_N(m) = (1 - m²/(N-1)²) g_{N-1}(m) + (m²/(N-1)²) g_{N-1}(m-1) + m/(N-1)    (17.11)

Theorem 17.2.2 For all N > 1 and 1 ≤ m ≤ N-1,

φ_{H_N(m)}(z) = E[z^{H_N(m)}] = (m!(N-1-m)!/((N-1)!)²) \sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} Γ(N+kz)/Γ(1+kz)    (17.12)

Consequently,

Pr[H_N(m) = j] = m! (-1)^{N-(j+1)} S_N^{(j+1)} \mathcal{S}_j^{(m)} / ((N-1)! \binom{N-1}{m})    (17.13)

where S_N^{(j+1)} and \mathcal{S}_j^{(m)} denote the Stirling numbers of the first and second kind (Abramowitz and Stegun, 1968, Section 24.1).

Proof: By iterating the recursion (17.8) for small values of m, the computations given in van der Hofstad et al. (2006a, Appendix) suggest the solution (17.12) for (17.8). One can verify that (17.12) satisfies (17.8). This proves (17.12) of Theorem 17.2.2. Using (Abramowitz and Stegun, 1968, Section 24.1.3.B),

Γ(N+kz)/Γ(1+kz) = \prod_{n=1}^{N-1} (n+kz) = \sum_{j=0}^{N-1} (-1)^{N-(j+1)} S_N^{(j+1)} (kz)^j

the Taylor expansion around z = 0 equals

φ_{H_N(m)}(z) = (m!(N-1-m)!/((N-1)!)²) \sum_{j=1}^{N-1} (-1)^{N-(j+1)} S_N^{(j+1)} [\sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} k^j] z^j

where the j = 0 term vanishes because \sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} = 0 for m ≥ 1. Using the definition of the Stirling numbers of the second kind (Abramowitz and Stegun, 1968, Section 24.1.4.C),

\sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} k^j = m! \mathcal{S}_j^{(m)}

for which \mathcal{S}_j^{(m)} = 0 if j < m, gives

φ_{H_N(m)}(z) = ((m!)²(N-1-m)!/((N-1)!)²) \sum_{j=1}^{N-1} (-1)^{N-(j+1)} S_N^{(j+1)} \mathcal{S}_j^{(m)} z^j

This proves (17.13) and completes the proof of Theorem 17.2.2. □

Figure 17.4 plots the probability density function of H_50(m) for different values of m.

Corollary 17.2.3 For all N > 1 and 1 ≤ m ≤ N-1,

g_N(m) = E[H_N(m)] = (mN/(N-m)) \sum_{k=m+1}^{N} 1/k    (17.14)

and

Var[H_N(m)] = ((N-1+m) g_N(m) - g_N²(m))/(N+1-m) - (m²N²/((N-m)(N+1-m))) \sum_{k=m+1}^{N} 1/k²    (17.15)

The formula (17.14) is proved in two different ways. The earlier proof, presented in Section 17.6 below, relies neither on the recursion in Lemma 17.2.1 nor on Theorem 17.2.2. The shorter proof is presented here.
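Formulas (17.13) and (17.14) can be cross-checked numerically. The sketch below implements the Stirling numbers through their standard recurrences (the function names `s1` and `s2` are ad hoc) and verifies that (17.13) sums to one and reproduces the mean (17.14):

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def s1(n, k):
    # signed Stirling numbers of the first kind: s1(n,k) = s1(n-1,k-1) - (n-1)*s1(n-1,k)
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return s1(n - 1, k - 1) - (n - 1) * s1(n - 1, k)

@lru_cache(maxsize=None)
def s2(n, k):
    # Stirling numbers of the second kind: s2(n,k) = k*s2(n-1,k) + s2(n-1,k-1)
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * s2(n - 1, k) + s2(n - 1, k - 1)

def pdf(N, m, j):
    # Pr[H_N(m) = j] according to (17.13)
    num = factorial(m) * (-1) ** (N - (j + 1)) * s1(N, j + 1) * s2(j, m)
    return Fraction(num, factorial(N - 1) * comb(N - 1, m))

N, m = 12, 3
probs = [pdf(N, m, j) for j in range(N)]
assert sum(probs) == 1              # (17.13) is a proper pdf
mean = sum(j * p for j, p in enumerate(probs))
assert mean == Fraction(m * N, N - m) * sum(Fraction(1, k) for k in range(m + 1, N + 1))  # (17.14)
```

For example, for N = 4 and m = 3 the tree is spanning, and the pdf correctly concentrates all mass at j = 3.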
Formula (17.14) can be expressed in terms of the digamma function ψ(x) as

g_N(m) = (mN/(N-m)) (ψ(N) - ψ(m) + 1/N - 1/m)    (17.16)

Fig. 17.4. The pdf of H_50(m) for m = 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 47.

Proof of Corollary 17.2.3: The expectation and variance of H_N(m) will not be obtained from the explicit probabilities (17.13), but by rewriting (17.12) as

φ_{H_N(m)}(z) = (Γ(m+1)Γ(N-m)/Γ²(N)) \sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} ∂_t^{N-1} [t^{N-1+kz}]_{t=1}
= (Γ(m+1)Γ(N-m)/Γ²(N)) (-1)^m ∂_t^{N-1} [t^{N-1} (1-t^z)^m]_{t=1}    (17.17)

Indeed,

E[H_N(m)] = (Γ(m+1)Γ(N-m)/Γ²(N)) (-1)^m ∂_z ∂_t^{N-1} [t^{N-1} (1-t^z)^m]_{t=z=1}
= (Γ(m+1)Γ(N-m)/Γ²(N)) m(-1)^{m-1} ∂_t^{N-1} [t^N \log t (1-t)^{m-1}]_{t=1}

E[H_N(m)(H_N(m)-1)] = (Γ(m+1)Γ(N-m)/Γ²(N)) (-1)^m ∂_z² ∂_t^{N-1} [t^{N-1} (1-t^z)^m]_{t=z=1}
= (Γ(m+1)Γ(N-m)/Γ²(N)) m(-1)^{m-1} ∂_t^{N-1} [t^N \log² t (1-t)^{m-2} (-(m-1)t + (1-t))]_{t=1}

We start with the former. Using ∂_t^i [(1-t)^j]_{t=1} = j!(-1)^j δ_{i,j} and Leibniz' rule, we find

E[H_N(m)] = (Γ(m+1)Γ(N-m)/Γ²(N)) m! \binom{N-1}{m-1} ∂_t^{N-m} [t^N \log t]_{t=1}

Since

∂_t^k [t^n \log t]_{t=1} = (n!/(n-k)!) \sum_{j=n-k+1}^{n} 1/j

we obtain expression (17.14) for E[H_N(m)]. We now extend the above computation to E[H_N(m)(H_N(m)-1)], which we write as

E[H_N(m)(H_N(m)-1)] = (Γ(m+1)Γ(N-m)/Γ²(N)) (U₁ + U₂)    (17.18)

where

U₁ = m(m-1)(-1)^{m-2} ∂_t^{N-1} [t^{N+1} \log² t (1-t)^{m-2}]_{t=1}
U₂ = m(-1)^{m-1} ∂_t^{N-1} [t^N \log² t (1-t)^{m-1}]_{t=1}

Using

∂_t^k [t^n \log² t]_{t=1} = 2(n!/(n-k)!) \sum_{i=n-k+1}^{n} \sum_{j=i+1}^{n} 1/(ij) = (n!/(n-k)!) [(\sum_{i=n-k+1}^{n} 1/i)² - \sum_{i=n-k+1}^{n} 1/i²]

we obtain

U₁ = m(m-1)(m-2)! \binom{N-1}{m-2} ∂_t^{N-m+1} [t^{N+1} \log² t]_{t=1} = \binom{N-1}{m-2} (N+1)! [(\sum_{k=m+1}^{N+1} 1/k)² - \sum_{k=m+1}^{N+1} 1/k²]

Similarly,

U₂ = m(m-1)! \binom{N-1}{m-1} ∂_t^{N-m} [t^N \log² t]_{t=1} = \binom{N-1}{m-1} N! [(\sum_{k=m+1}^{N} 1/k)² - \sum_{k=m+1}^{N} 1/k²]
Substitution into (17.18) leads to

E[H_N(m)(H_N(m)-1)] = (m²N²/((N+1-m)(N-m))) [(\sum_{k=m+1}^{N} 1/k)² - \sum_{k=m+1}^{N} 1/k²] + (2m(m-1)N/((N+1-m)(N-m))) \sum_{k=m+1}^{N} 1/k

From g_N(m) = E[H_N(m)] and Var[H_N(m)] = E[H_N(m)(H_N(m)-1)] + g_N(m) - g_N²(m), we obtain (17.15). This completes the proof of Corollary 17.2.3. □

For N = 1000, Fig. 17.5 illustrates the typical behavior for large N of the expectation g_N(m) and the standard deviation σ_N(m) = \sqrt{Var[H_N(m)]} for all values of m. For any spanning tree, the number of links H_N(N-1) is precisely N-1, so that Var[H_N(N-1)] = 0.

Fig. 17.5. The average number of hops g_N(m) (left axis) in the SPT and the corresponding standard deviation σ_N(m) (right axis) as a function of the number m of multicast group members in the complete graph with N = 1000.

Figure 17.5 also indicates that the standard deviation σ_N(m) of H_N(m) is much smaller than the average, even for N = 1000. In fact, we obtain from (17.15) that

Var[H_N(m)] ≤ ((N-1+m)/(N+1-m)) g_N(m) ≤ (2N/(N+1-m)) g_N(m) = (2(N-m)/((N+1-m) m \sum_{k=m+1}^{N} 1/k)) g_N²(m) = o(g_N²(m))

This bound implies that, with probability converging to 1, for every m = 1, ..., N-1,

|H_N(m)/g_N(m) - 1| ≤ ε

In van der Hofstad et al. (2006a), the scaled random variable (H_N(m) - g_N(m))/\sqrt{g_N(m)} is proved to tend to a Gaussian random variable, i.e. (H_N(m) - g_N(m))/\sqrt{g_N(m)} →^d N(0,1), for all m = o(\sqrt{N}). For large graphs of the size of the Internet and larger, this observation implies that the mean g_N(m) = E[H_N(m)] is a good approximation of the random variable H_N(m) itself, because the variations of H_N(m) around the mean are small. Consequently, it underlines the importance of g_N(m) as a significant measure for multicast.
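Both moments can be evaluated exactly. The sketch below, assuming only (17.14) and (17.15), confirms that the variance vanishes for the spanning tree and that σ_N(m) stays far below g_N(m), as in Fig. 17.5:

```python
from fractions import Fraction
from math import sqrt

def g(N, m):
    # (17.14)
    return Fraction(m * N, N - m) * sum(Fraction(1, k) for k in range(m + 1, N + 1))

def var_H(N, m):
    # (17.15), kept exact with rational arithmetic
    s2 = sum(Fraction(1, k * k) for k in range(m + 1, N + 1))
    gm = g(N, m)
    return (Fraction(N - 1 + m) * gm - gm ** 2) / (N + 1 - m) \
        - Fraction(m * m * N * N, (N - m) * (N + 1 - m)) * s2

N = 200
assert var_H(N, N - 1) == 0                     # a spanning tree always has N-1 links
for m in (1, 10, 50, 100, 150):
    assert sqrt(var_H(N, m)) < float(g(N, m))   # sigma_N(m) << g_N(m), cf. Fig. 17.5
```

Exact rational arithmetic matters here: the two large terms in (17.15) cancel completely at m = N-1, which floating point would only reproduce approximately.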
17.2.2 The weight of the shortest path tree

In this section, we summarize results on the weight W_N(m) of the SPT; we omit the derivations and refer to van der Hofstad et al. (2006a). For all 1 ≤ m ≤ N-1, the average weight of the SPT is

E[W_N(m)] = \sum_{j=1}^{m} (1/(N-j)) \sum_{k=j}^{N-1} 1/k    (17.19)

In particular, if the shortest path tree spans the whole graph, then for all N ≥ 2,

E[W_N(N-1)] = \sum_{n=1}^{N-1} 1/n²    (17.20)

from which E[W_N(N-1)] < ζ(2) = π²/6 for any finite N. The exact variance Var[W_N(N-1)], a combination of harmonic-type sums of third and fourth order (17.21) given in van der Hofstad et al. (2006a), behaves asymptotically, for large N, as

Var[W_N(N-1)] = 4ζ(3)/N + O(\log N / N²)    (17.22)

Asymptotically for large N, the average weight of a shortest path tree is ζ(2) = 1.645..., while the average weight of the minimum spanning tree, given by (16.49), is ζ(3) < ζ(2). This result has an interesting implication. The Steiner tree is the minimum-weight tree that connects a set of m members out of the N nodes in the graph. The Steiner tree problem is NP-complete, which means that it is infeasible to compute for large N. If m = 2, the weight of the Steiner tree equals that of the shortest path, W_{Steiner,N}(2) = W_N, while for m = N, we have W_{Steiner,N}(N) = W_{MST}. Hence, for any m < N and large N, E[W_{Steiner,N}(m)] ≤ ζ(3), because the weight of the Steiner tree does not decrease if the number of members m increases. The ratio ζ(2)/ζ(3) = 1.368 indicates that the use of the SPT (computationally easy) never performs on average more than 37% worse than the optimal Steiner tree (computationally infeasible). In a broader context, and referring to the concept of the "Price of Anarchy", which is explained at length in Robinson (2004), the SPT used in a communications network is related to the Nash equilibrium, while the Steiner tree gives the hardly achievable global optimum.
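The closed forms above are easy to verify numerically. The sketch below checks that the double sum (17.19) indeed collapses to (17.20) at m = N-1, and evaluates the ratio ζ(2)/ζ(3) quoted above:

```python
from fractions import Fraction
from math import pi

N = 60
tail = [Fraction(0)] * (N + 1)
for k in range(N - 1, 0, -1):
    tail[k] = tail[k + 1] + Fraction(1, k)      # tail[j] = sum_{k=j}^{N-1} 1/k

def EW(m):
    # E[W_N(m)] according to (17.19)
    return sum(Fraction(1, N - j) * tail[j] for j in range(1, m + 1))

# (17.20): the spanning SPT has mean weight sum_{n=1}^{N-1} 1/n^2 < zeta(2)
assert EW(N - 1) == sum(Fraction(1, n * n) for n in range(1, N))
assert float(EW(N - 1)) < pi ** 2 / 6
# average penalty of the SPT relative to the Steiner tree: zeta(2)/zeta(3) ~ 1.368
zeta3 = sum(1.0 / n ** 3 for n in range(1, 20000))
assert abs(pi ** 2 / 6 / zeta3 - 1.368) < 0.01
```

The truncation point 20000 for ζ(3) is arbitrary; the neglected tail is below 2·10⁻⁹.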
Simulations — even for small N, which allow us to cover the entire m-range, as illustrated in Fig. 17.6 — indicate that the pdf of the normalized random variable W*_N(m) = (W_N(m) - E[W_N(m)])/\sqrt{Var[W_N(m)]} lies between a normalized Gaussian N(0,1) and a normalized Gumbel (see Theorem 6.4.1).

Fig. 17.6. The pdf of the normalized random variable W*_N(m) for N = 100 and m = 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, together with a normalized Gumbel and a normalized Gaussian N(0,1).

Fig. 17.6 may suggest that, for all m < N,

lim_{N→∞} Pr[W*_N(m) ≤ x] = e^{-e^{-πx/\sqrt{6} - γ}}    (17.23)

where γ = 0.57721... is Euler's constant. For the particular case of m = 1, the relation to the Gumbel distribution has been shown in Section 16.4, where the correct limit law is given in (16.20). However, van der Hofstad et al. (2006b) show that the weight of the shortest path tree for m = N-1 tends to a Gaussian,

\sqrt{N} (W_N(N-1) - ζ(2)) →^d N(0, σ²_SPT)

with σ²_SPT = 4ζ(3) ≈ 4.80823, as follows from (17.22). This shows that simulations alone may be inadequate to deduce asymptotic behavior. Finally, Janson (1995) gave the related result for the minimum spanning tree. He extended Frieze's result (16.49) by proving that the scaled weight of the minimum spanning tree also tends to a Gaussian for large N,

\sqrt{N} (W_{MST} - ζ(3)) →^d N(0, σ²_MST)

where² σ²_MST = 6ζ(4) - 4ζ(3) ≈ 1.6857.

Fig. 17.7. The left-hand tree (k = 2) has N = 31 and D = 4, while the right-hand tree (k = 5) has N = 31 and D = 2.

17.3 The k-ary tree

In this section, we consider the k-ary tree of depth³ D with the source at the root of the tree and the m receivers at randomly chosen nodes (see Fig. 17.7). In a k-ary tree, the total number of nodes satisfies

N = 1 + k + k² + ... + k^D = (k^{D+1} - 1)/(k - 1)    (17.24)

Theorem 17.3.1 For the k-ary tree,

g_{N,k}(m) = N - 1 - \sum_{j=0}^{D-1} k^{D-j} \binom{N-1-(k^{j+1}-1)/(k-1)}{m} / \binom{N-1}{m}    (17.25)

Proof: See Section 17.7. □
Unfortunately, the j-summation seems difficult to express in closed form. Observe that g_N(N-1) = N-1, because all binomials vanish. The sum extends over all levels j ≤ D-1 for which the remaining number of nodes in the lower levels l (i.e. D ≥ l > j) is larger than m nodes. In some sense, we may regard (17.25) as an (exact) expansion around m = N-1. Explicitly, with a_j = (k^{j+1}-1)/(k-1),

g_{N,k}(m) = N - 1 - k^D (1 - m/(N-1)) - k^{D-1} \prod_{q=0}^{k} (1 - m/(N-1-q)) - \sum_{j=2}^{D-1} k^{D-j} \prod_{q=0}^{a_j - 1} (1 - m/(N-1-q))    (17.26)

which shows that g_{N,k}(m) is a polynomial in m of degree (N-1)/k. Moreover, the terms in the j-sum decrease rapidly; the ratio of consecutive terms equals

(1/k) \prod_{q=a_{j-1}}^{a_j - 1} (1 - m/(N-1-q)) < (1/k) (1 - m/(N-1))^{k^j} ≪ 1

since a_j - a_{j-1} = k^j.

Fig. 17.8. The multicast gain g_N(m) computed for the k-ary tree with four values of k (k = 2, 3, 5, 10), for the random graph (with "effective" k_rg = e = 2.718...), and for the Chuang–Sirbu power law m^{0.8}, for N = 10⁴ on a linear scale, where the prefactor E[H_N] is given by (16.10).

Figure 17.8 indicates that formula (17.25), although derived subject to (17.24), also seems valid when D = ⌊log[1 + N(k-1)]/log k⌋ - 1, where ⌊x⌋ denotes the largest integer smaller than or equal to x. This suggests that the deepest level D need not be filled completely with k^D nodes and that (17.25) may extend to "incomplete" k-ary trees. As further observed from Fig. 17.8, g_{N,k}(m) is monotonically decreasing in k. Hence, it is quite likely that the map k ↦ g_{N,k}(m) is decreasing for k ∈ [1, N-1]. Intuitively, this conjecture can be understood from Fig. 17.7.

² Wästlund (2005) succeeded in computing, in closed form, the triple sum appearing in Janson's original expression for σ²_MST, confirming the value 6ζ(4) - 4ζ(3).
³ The depth D is equal to the number of hops from the root to a node at the leaves.
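Formula (17.25) is straightforward to evaluate. The sketch below checks the endpoint g_{N,k}(N-1) = N-1, reproduces the mean hopcount (17.27) at m = 1, and illustrates the conjectured decrease in k with the two trees of Fig. 17.7:

```python
from math import comb

def g_kary(N, k, D, m):
    # g_{N,k}(m) for the complete k-ary tree of depth D, formula (17.25)
    total = sum(k ** (D - j) * comb(N - 1 - (k ** (j + 1) - 1) // (k - 1), m)
                for j in range(D))
    return N - 1 - total / comb(N - 1, m)

k, D = 2, 4
N = (k ** (D + 1) - 1) // (k - 1)            # N = 31, as in Fig. 17.7
assert g_kary(N, k, D, N - 1) == N - 1       # all binomials vanish at m = N-1
EH = N * D / (N - 1) + D / ((N - 1) * (k - 1)) - 1 / (k - 1)   # (17.27)
assert abs(g_kary(N, k, D, 1) - EH) < 1e-9
# the deeper binary tree yields a larger multicast gain than the k = 5 tree
for m in (1, 5, 10, 20):
    assert g_kary(31, 5, 2, m) < g_kary(31, 2, 4, m)
```

`math.comb` returns 0 whenever the upper index is smaller than m, which is exactly the truncation built into (17.25).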
Both the k = 2 and the k = 5 tree have an equal number of nodes. We observe that the deeper the tree (the larger D, or the smaller k), the more overlap is possible and, hence, the larger g_{N,k}(m). Theorem 17.1.1 can also be deduced from (17.25). The lower bound is attained in a star topology, where k = N-1, D = 1 and E[H_N] = 1. The upper bound is attained in a line topology, where k = 1, D = N-1 and E[H_N] = N/2. Furthermore, for real values of k ∈ [1, N-1], the set of curves specified by (17.25) covers the total allowable space of g_{N,k}(m), as shown in Fig. 17.1. This suggests using (17.25) to estimate k in real topologies. Since g_N(1) = E[H_N], the average hopcount in a k-ary tree follows from (17.25) as

E[H_N] = N - 1 - (1/(N-1)) \sum_{j=0}^{D-1} k^{D-j} (N - 1 - (k^{j+1}-1)/(k-1))
= ND/(N-1) + D/((N-1)(k-1)) - 1/(k-1)    (17.27)

For large N, we find, with

D = ⌊log[1 + N(k-1)]/log k⌋ - 1 = log_k N + log_k(1 - 1/k) + O(1/N)

that

E[H_N] = log_k N + log_k(1 - 1/k) - 1/(k-1) + O(log_k N / N)    (17.28)

Comparing (17.28) with the average hopcount in the random graph (16.10) shows equality to first order if k_rg = e. Moreover, both second-order terms, γ - 1 ≈ -0.42 for the random graph and log(1 - 1/e) - 1/(e-1) ≈ -1.04 for the k-ary tree, are O(1) and independent of N. This shows that the multicast gain in the random graph is well approximated by g_{N,e}(m).

17.4 The Chuang–Sirbu law

We discuss the empirical Chuang–Sirbu scaling law, which states that g_N(m) ≈ E[H_N] m^{0.8} for the Internet. Based on Internet measurements, Chuang and Sirbu (1998) observed this behavior; subsequently, Phillips et al. (1999) dubbed the observation the Chuang–Sirbu law. Corollary 17.1.5 implies that the empirical law of Chuang–Sirbu cannot hold for all m ≤ N. Indeed, if g_N(m) = E[H_N] m^{0.8}, we obtain from the inequality (17.7) and the identity f_N(m) = m E[H_N] that m^{0.2} ≤ E[H_N]. Write m = xN for a fixed 0 < x < 1, with x independent of N.
Hence, we have shown:

Corollary 17.4.1 For all graphs satisfying the condition that E[H_N]/N^{0.2} → 0 for large N, the empirical Chuang–Sirbu law does not hold in the region m = xN, with 0 < x ≤ 1, for sufficiently large N.

The most realistic graph models for the Internet assume that E[H_N] ≈ c log N, since this implies that the number of routers that can be reached from any starting destination grows exponentially with the number of hops. For these realistic graphs, Corollary 17.4.1 states that the empirical Chuang–Sirbu law does not hold for all m. On the other hand, there are more regular graphs (such as a d-lattice, where E[H_N] ≃ (d/3) N^{1/d}) with E[H_N] ≥ N^{0.2+ε} (and ε > 0), for which the mathematical condition m^{0.2} ≤ E[H_N] is satisfied for all m and N. As shown in Van Mieghem et al. (2000), however, these classes of graphs, in contrast to random graphs, do not lead to good models for SPTs in the Internet.

17.4.1 Validity range of the Chuang–Sirbu law

For the random graph G_p(N), the SPT is very close to a URT for large N, and with (16.10) we obtain

f_N(m) ≈ m (log N + γ - 1)

From the exact formula (17.16) for g_N(m) in the random graph G_p(N), the asymptotic form for large N and m follows as

g_N(m) ≈ (mN/(N-m)) (log(N/m) - 1/(2m))    (17.29)

The above scaling explains the empirical Chuang–Sirbu law for G_p(N): for m small with respect to N, the graphs of (log N + γ - 1) m^{0.8} and of (17.29) look very alike on a log-log plot, as illustrated in Fig. 17.9. Using the asymptotic properties of the digamma function ψ, we obtain (17.29) as an excellent approximation for large N (and all m) or, in normalized form with m = xN and 0 < x < 1,

g_N(xN)/N ≈ (x log x + 0.5/N)/(x - 1)    (17.30)

The normalized Chuang–Sirbu law is g_N(xN)/N = (E[H_N]/N^{0.2}) x^{0.8}. It is interesting to note that the Chuang–Sirbu law is "best" if E[H_N]/N^{0.2} = 1, since then both endpoints x = 0 and x = 1 coincide with (17.30).
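The quality of the fit can be probed directly from (17.14). The sketch below compares the exact g_N(m) with the Chuang–Sirbu estimate E[H_N] m^{0.8}: for N = 10⁴ the fit is reasonable for small and moderate m, while for N = 10⁶ the law underestimates even the endpoint g_N(N-1) = N-1:

```python
from math import log

def g_exact(N, m):
    # (17.14); cf. the asymptotic form (17.29)
    return m * N / (N - m) * sum(1.0 / k for k in range(m + 1, N + 1))

N = 10 ** 4
EH = g_exact(N, 1)                          # E[H_N] = g_N(1)
for m in (2, 10, 100, 1000):
    ratio = EH * m ** 0.8 / g_exact(N, m)   # Chuang-Sirbu / exact
    assert 0.7 < ratio < 1.1                # a fair fit for small and moderate m
# for N >= 10^6, even the endpoint m = N-1 (where g_N(N-1) = N-1) is underestimated
N = 10 ** 6
EH = log(N) + 0.5772156649 - 1              # E[H_N] ~ log N + gamma - 1, (16.10)
assert EH * (N - 1) ** 0.8 < N - 1
```

The bounds 0.7 and 1.1 in the loop are illustrative tolerances, not values from the analysis.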
This optimum is achieved when N ≈ 250 000, which is of the order of magnitude of the estimated number of routers in the current Internet. This observation may explain the fairly good correspondence, on a less sensitive log-log scale, with Internet measurements. At the same time, it shows that, for a growing Internet, the fit of the Chuang–Sirbu law will deteriorate. For N ≥ 10⁶, the Chuang–Sirbu law underestimates g_N(m) for all m.

Fig. 17.9. The multicast efficiency g_N(m) for N = 10^j with j = 3, 4, ..., 7, together with the m^{0.8} law. The endpoint of each curve, g_N(N-1) = N-1, determines N. The insert shows the effective power exponent versus N.

17.4.2 The effective power exponent τ(N)

For small to moderate values of m, g_N(m) is very close to a straight line on a log-log plot. This "power law behavior" implies that log g_N(m) ≈ log E[H_N] + τ(N) log m, which is a first-order Taylor expansion of log g_N(m) in log m. This observation suggests the computation⁴ of the effective power exponent τ(N) as

τ(N) = d log g_N(m)/d log m |_{m=1}    (17.31)

Only for a straight line can the differential operator be replaced by the difference operator, such that τ(N) ≈ τ*(N), where

τ*(N) = (log g_N(2) - log E[H_N]) / log 2    (17.32)

In general, for small m, the effective power exponent (17.31) is not a constant 0.8, as in the Chuang–Sirbu law, but depends on N. Since g_N(m) is concave by Theorem 17.1.2, τ(N) is the maximum possible value of d log g_N(m)/d log m at any m ≥ 1. A direct consequence of Theorem 17.1.1 is that the effective power exponent obeys τ(N) ∈ [1/2, 1]. From recent Internet measurements, Chalmers and Almeroth (2001) found that 0.66 ≤ τ(N) ≤ 0.7.
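The difference version τ*(N) of (17.32) is immediate to compute from (17.14). The sketch below shows that it grows with N towards 1, rather than staying near the constant 0.8 of the Chuang–Sirbu law:

```python
from math import log

def g(N, m):
    # (17.14)
    return m * N / (N - m) * sum(1.0 / k for k in range(m + 1, N + 1))

def tau_star(N):
    # (17.32): tau*(N) = log2(g_N(2)/E[H_N]) with E[H_N] = g_N(1)
    return log(g(N, 2) / g(N, 1)) / log(2)

t3, t4, t5 = tau_star(10 ** 3), tau_star(10 ** 4), tau_star(10 ** 5)
assert 0.8 < t3 < t4 < t5 < 1       # increasing with N and approaching 1
```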
The effective power exponent τ(N), as defined in (17.31), equals for the random graph

τ(N) = N (ψ(N) + γ - π²/6 + π²/(6N)) / ((N-1)(ψ(N) + γ - 1 + 1/N))

while, according to the definition (17.32),

τ*(N) = 1 + log₂ [ (N-1)(ψ(N) + γ - 3/2 + 1/N) / ((N-2)(ψ(N) + γ - 1 + 1/N)) ]

The difference τ(N) - τ*(N) decreases monotonically and is largest, about 0.048, at N = 3, while it is about 0.0083 at N = 10⁵ and 0.0037 at N = 10¹⁰. The effective power exponent τ(N) is drawn in the insert of Fig. 17.9, which shows that τ(N) is increasing and not a constant close to 0.8. More interestingly, for large N, we find with (16.10) and (16.11) that τ(N) ≈ Var[H_N]/E[H_N] and that lim_{N→∞} τ(N) = 1. In Van Mieghem et al. (2000), the ratio α = Var[H_N]/E[H_N] pops up naturally as the extreme value index of the distribution of the link weights in a topology. Since measurements of the hopcount in the Internet indicate that Var[H_N]/E[H_N] ≈ 1, which corresponds to a regular distribution, this extreme value index strongly favors the model of the hopcount based on shortest paths in G_p(N), although random graphs do not model the Internet topology well. Thus, if the number of nodes in the Internet keeps growing, we suggest, only for small to moderate values of m, the consideration of a power-law approximation for the multicast gain,

g_N(m) ≈ E[H_N] m^{Var[H_N]/E[H_N]}

instead of the Chuang–Sirbu law. In summary, many properties in nature seem linear on an insensitive log-log scale. However, deriving from such plots simple and attractive power laws for complicated matters seems a little oversimplified⁵.

⁴ Although (17.5) only has meaning for integer m, analytic continuation to a complex variable is possible and, hence, differentiation can be defined.
⁵ Many recent articles devote attention to power-law behavior, but most of them seem prudent: just recall the immense interest (hype?) a few years ago in the long-range dependent and self-similar nature of Internet traffic and its relation to the "simple" power law with only the Hurst parameter (comparable to τ(N) here) in the exponent.

17.5 Stability of a multicast shortest path tree

We now turn to the problem of quantifying the stability of a multicast tree.
Inspired by Poisson arrival processes, we assume that, at a single instant of time, either one or zero group members can leave. In the sequel, we make no further assumptions about the time-dependent process of leaving/joining a multicast group and refrain from dependencies on time. The number of links in the tree that change after one multicast group member leaves the group has been chosen as the measure for the stability of the multicast tree. If we denote this quantity by Δ_N(m), then, by the definition of g_N(m), the average number of changes equals

E[Δ_N(m)] = g_N(m) - g_N(m-1)    (17.33)

Since g_N(m) is concave (Theorem 17.1.2), E[Δ_N(m)] is always positive and decreasing in m. If the scope of m is extended to real numbers, E[Δ_N(m)] ≈ g'_N(m), which simplifies further estimates.

The situation where, on average, less than one link changes if one multicast group member leaves may be regarded as a stable regime. Since E[Δ_N(m)] is always positive and decreasing in m, this stable regime is reached when the group size m exceeds m₁, which satisfies E[Δ_N(m₁)] = 1. For example, for the URT, which is asymptotically the SPT for the class RGU defined in Section 16.2.2, this condition approximately follows from (17.29) as

E[Δ_N(m)] ≈ (mN/(N-m)) log(N/m) - ((m-1)N/(N-m+1)) log(N/(m-1))    (17.34)

Let x = m/N; then 0 < x < 1 and

E[Δ_N(m)] ≈ -(x/(1-x)) log x + ((x - 1/N)/(1 - (x - 1/N))) log(x - 1/N)

After expanding the second term in a Taylor series around x, to first order in 1/N,

E[Δ_N(xN)] ≈ (x - 1 - log x)/(1-x)² + O(1/N)

For large N, E[Δ_N(x₁N)] ≈ 1 occurs when x₁ = 0.3161, which is the solution in x of (x - 1 - log x)/(1-x)² = 1.
For the class RGU, a stable tree as defined above is thus obtained when the multicast group size m is larger than m₁ = 0.3161 N ≈ N/3. In the sequel, since m₁ is high and of less practical interest, we will focus on multicast group sizes smaller than m₁. The computation of m₁ for other graph types turns out to be difficult. Since, as mentioned above, the comparison with Internet measurements (Van Mieghem et al., 2001a) shows that formula (17.29) provides a fairly good estimate, we expect that m₁ ≈ N/3 also approximates the stable regime in the Internet well. The following theorem quantifies the stability in the class RGU.

Theorem 17.5.1 For sufficiently large N and fixed m, the number of changed edges Δ_N(m) in a random graph G_p(N) with uniformly distributed link weights tends to a Poisson distribution,

Pr[Δ_N(m) = k] ≈ e^{-E[Δ_N(m)]} (E[Δ_N(m)])^k / k!    (17.35)

where E[Δ_N(m)] = g_N(m) - g_N(m-1), and g_N(m) is given by (17.16), or approximately by (17.29).

Proof: In Section 16.2.1 we have mentioned that the SPT in the class RGU is a URT for large N. In addition, the random variable for the number of hops H_N from the root to an arbitrary node tends, for large N, to a Poisson random variable with mean E[H_N] ≈ log N + γ - 1, as shown in Section 16.3.1. Now, Δ_N(m) = H_N(m) - H_N(m-1) is the positive discrete random variable that counts the absolute value of the difference between the hopcount h_{R→m} from the root (source) to user m and the hopcount h_{R→m-1} from the root to the user closest in the tree to m, which we here relabel as m-1. The users m and m-1 are not independent, nor are the two random variables h_{R→m} and h_{R→m-1} independent in general, owing to possible overlap of their paths. If the shortest paths from the root to each of the two users m and m-1

Fig. 17.10. A sketch of a uniform recursive tree, where h_{R→m} = 3 and h_{R→m-1} = 4 and the number of links in common is two (shown in bold: Root-A-B).
overlap, there always exists a node in the SPT, say node B as illustrated in Fig. 17.10, that sees the partial shortest paths from itself to m and to m-1 as non-overlapping and independent. Since the SPT is a URT, the subtree rooted at that node B (enclosed by the dotted line in Fig. 17.10) is again a URT, as follows from Theorem 16.2.1. With respect to B, the nodes m and m-1 are uniformly chosen, and the number of links Δ_N(m) that change if the m-th node leaves is just its hopcount with respect to B (instead of the original root). We denote the unknown number of nodes in the subtree rooted at B by ν(m). We have that ν(m) ≤ ν(m-1), because by adding a group member the size of the subtree can only decrease. For large N and small m, ν(m) is large, such that the above-mentioned asymptotic law of the hopcount applies. If both m and N are large, ν(m) will become too small for the asymptotic law to apply. Thus, for fixed m and large N, this implies that Δ_N(m) tends to a Poisson random variable with mean E[Δ_N(m)]. □

Simulations in Van Mieghem and Janic (2002) indicate that the Poisson law seems more widely valid than just in the asymptotic regime (N → ∞). The proof can be extended to a general topology. Assume that, for a certain class of graphs, the pdf of the hopcount Pr[H_N = k] and the multicast efficiency g_N(m) can be computed for all sizes N. The subtree rooted at B is again a SPT in a subcluster of size ν(m), which is an unknown random variable. An argument similar to the one in the proof above shows that

Pr[Δ_N(m) = k] = Pr[H_{ν(m)} = k]

This argument implicitly assumes that all multicast users are uniformly distributed over the graph. By the law of total probability,

Pr[H_{ν(m)} = k] = \sum_{n=1}^{N} Pr[H_{ν(m)} = k | ν(m) = n] Pr[ν(m) = n] = \sum_{n=1}^{N} Pr[H_n = k] Pr[ν(m) = n]

which, unfortunately, shows that the pdf of ν(m) is required to specify Pr[Δ_N(m) = k].
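The quantitative claims of Section 17.5 are easy to reproduce numerically. The sketch below solves the threshold equation (x - 1 - log x)/(1-x)² = 1 derived above by bisection, and builds the approximate Poisson law (17.35) from the mean (17.33), computed with (17.14):

```python
from math import exp, factorial, log

def f(x):
    # large-N limit of E[Delta_N(xN)]
    return (x - 1 - log(x)) / (1 - x) ** 2

lo, hi = 0.01, 0.99                 # f decreases from +inf towards 1/2 on (0,1)
while hi - lo > 1e-10:
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) > 1 else (lo, mid)
x1 = (lo + hi) / 2
assert abs(x1 - 0.3161) < 1e-3      # the stability threshold m_1 ~ N/3

def g(N, m):
    # (17.14)
    return m * N / (N - m) * sum(1.0 / k for k in range(m + 1, N + 1))

N, m = 10 ** 4, 50
mu = g(N, m) - g(N, m - 1)          # E[Delta_N(m)], eq. (17.33)
poisson = [exp(-mu) * mu ** k / factorial(k) for k in range(40)]   # (17.35)
assert mu > 1 and abs(sum(poisson) - 1) < 1e-9
```

Since m = 50 lies far below m₁, the mean number of changed links exceeds one, i.e. the tree is still in the unstable regime.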
However, we can proceed further in an approximate way by replacing the unknown random variable ν(m) by its best estimate, E[ν(m)]. In that approximation, the average size E[ν(m)] of the shortest path subtree rooted at B can be specified, at least in principle, with the use of (17.33). Indeed, since E[H_{E[ν(m)]}] = \sum_{k=1}^{E[ν(m)]-1} k Pr[H_{E[ν(m)]} = k], by equating

E[H_{E[ν(m)]}] = g_N(m) - g_N(m-1)

a relation in the single unknown E[ν(m)] is found and can be solved for E[ν(m)]. In conclusion, we end up with the approximation

Pr[Δ_N(m) = k] ≈ Pr[H_{E[ν(m)]} = k]

which roughly demonstrates that, in general, Pr[Δ_N(m) = k] is likely related to the hopcount distribution in that certain class of graphs. Unfortunately, only for very few types of graphs can both the pdf Pr[H_N = k] and the multicast gain g_N(m) be computed. This fact augments the value of Theorem 17.5.1, although the class RGU is not a good model for the graph of the Internet. Fortunately, the shortest path tree deduced from that class seems a reasonable approximation, as shown in Fig. 16.4, and is sufficient to provide first-order estimates.

17.6 Proof of (17.16): g_N(m) for random graphs

Before embarking on the proof of formula (17.16), we first prove the following lemma.

Lemma 17.6.1 For a > b,

S(a,b) = \sum_{k=1}^{b} ((a-k)!/(b-k)!) (1/k) = (a!/b!) [ψ(a+1) - ψ(a-b+1)]

and

S(b,b) = \sum_{k=1}^{b} 1/k = ψ(b+1) + γ
m=0 (d 3 m)(d 3 m 3 e) 3 4 # d $ d3e d3e e d [ [1 d! C [ 1 1D d! [ 1 d! [ 1 3 3 3 = = e! n=1 n e! n=1 n n=e+1 n e! n=1 n n=1 n V(d> e) = from which the lemma follows. ¤ l k (Q ) Proof of equation (17.16): We will investigate H [[l ] = H [l in the URT with Q nodes. Here H [[l ] is the number of joint hops in a multicast SPT from the root to l uniformly chosen k l nodes in the URT and where all the group member nodes are dierent from the root. Let ˜ l be the same quantity where we allow the group member nodes to be the root. Then, H [ k l ˜ l = Q 3 l H [[l ] H [ Q 1 since there are l possibilities each with probability Q that one of the nodes equals the root, in which case [l = 0. k l ˜ l is deduced from Fig. 17.11, where two clusters are The average number of joint hops H [ shown each with respectively n and Q 3 n nodes. The first cluster with n nodes does not possess the root (dark shaded), but it contains the l multicast group members (light shaded). There is already at least 1 joint hop because the link between the root and node D, that can be viewed as the root of the first cluster, is used by all l group members lying in the first cluster. Given the size n of the first cluster, the probability that all l uniformly chosen group members belong to the first n(n31)···(n3l+1) cluster equals Q (Q 31)···(Q 3l+1) because the probability that the first group member belongs to n that cluster, which is Q , the probability that the second group member also belongs to the first n31 cluster, which is Q 31 and so on. Since the size of the first cluster connected to the root is uniform in between 1 and Q 3 1, the probability that the size is n equals Q 131 . When all l nodes are in that first cluster of size n, [l is at least 1, and the problem restarts, but with Q replaced by n and D being the root. 
Hence, if all $i$ group members belong to the first cluster, the average number of joint hops is $\frac{1}{N-1}\sum_{k=1}^{N-1}\frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}\big(1+E\big[\tilde X_i^{(k)}\big]\big)$, because we must sum over all possible sizes of the first cluster. If not all $i$ group member nodes are in the first cluster, the group member nodes are divided over the two clusters. But, in that case, we have no joint overlap, or $X_i=0$.

Fig. 17.11. The two contributing clusters leading to the $E\big[\tilde X_i^{(N)}\big]$-recursion.

Thus, if not all $i$ group member nodes are in the first cluster, the only way that there can be joint overlap ($X_i>0$) is that all $i$ group member nodes are in the second cluster. However, by removing the first cluster, we are left again with a uniform recursive tree of size $N-k$. The average number of joint hops in this case is $\frac{1}{N-1}\sum_{k=1}^{N-1}\frac{(N-k)(N-k-1)\cdots(N-k-i+1)}{N(N-1)\cdots(N-i+1)}E\big[\tilde X_i^{(N-k)}\big]$. Adding both contributions results in the recursion formula
$$E\big[\tilde X_i^{(N)}\big]=\frac{1}{N-1}\sum_{k=1}^{N-1}\frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}\Big(1+2E\big[\tilde X_i^{(k)}\big]\Big)\qquad(17.36)$$
We next write
$$\tau_i^{(N)}=N(N-1)\cdots(N-i+1)\,E\big[\tilde X_i^{(N)}\big]=\frac{N!}{(N-i)!}E\big[\tilde X_i^{(N)}\big]$$
then the above recurrence equation (17.36) turns into
$$\tau_i^{(N)}=\frac{1}{N-1}\sum_{k=1}^{N-1}\big[k(k-1)\cdots(k-i+1)+2\tau_i^{(k)}\big]=\frac{1}{N-1}\sum_{k=i}^{N-1}k(k-1)\cdots(k-i+1)+\frac{2}{N-1}\sum_{k=1}^{N-1}\tau_i^{(k)}$$
Subtracting the corresponding expression for $(N-2)\tau_i^{(N-1)}$ from $(N-1)\tau_i^{(N)}$ gives
$$(N-1)\tau_i^{(N)}-(N-2)\tau_i^{(N-1)}=\frac{(N-1)!}{(N-i-1)!}+2\tau_i^{(N-1)}$$
from which we obtain
$$\frac{\tau_i^{(N)}}{N}=\frac{\tau_i^{(N-1)}}{N-1}+\frac{(N-2)!}{N\,(N-i-1)!}\qquad(17.37)$$
Iterating (17.37) gives
$$\frac{\tau_i^{(N)}}{N}=\sum_{j=0}^{n-1}\frac{(N-2-j)!}{(N-j)(N-i-1-j)!}+\frac{\tau_i^{(N-n)}}{N-n}$$
Since $\tau_i^{(i)}=i!\,E\big[\tilde X_i^{(i)}\big]=0$, because the root is then always one of the group member nodes, we finally obtain
$$\tau_i^{(N)}=N\sum_{j=0}^{N-i-1}\frac{(N-2-j)!}{(N-j)(N-i-1-j)!}=N\sum_{k=i+1}^{N}\frac{(k-2)!}{k\,(k-i-1)!}\qquad(17.38)$$
It can be shown that, for large $N$,
$$\tau_i^{(N)}\approx\frac{(N-2)!}{(N-i-1)!}\frac{N}{i-1}$$
Because
$$E\big[X_i^{(N)}\big]=\frac{N}{N-i}E\big[\tilde X_i^{(N)}\big]=\frac{(N-i-1)!}{(N-1)!}\tau_i^{(N)}$$
we have that
$$E\big[X_i^{(N)}\big]=\frac{(N-i-1)!\,N}{(N-1)!}\sum_{k=i+1}^{N}\frac{(k-2)!}{k\,(k-i-1)!}\qquad(17.39)$$
and, for large $N$, $E\big[X_i^{(N)}\big]\approx\frac{N}{N-1}\frac{1}{i-1}\approx\frac{1}{i-1}$.

Invoking Theorem 17.1.3, the average number of multicast hops for $m$ uniformly chosen, distinct group members is
$$g_N(m)=\sum_{i=1}^{m}(-1)^{i-1}\binom mi\frac{(N-i-1)!\,N}{(N-1)!}\sum_{k=2}^{N}\frac{(k-2)!}{k\,(k-i-1)!}=\frac{-N}{(N-1)!}\sum_{v=0}^{N-2}\frac{(N-2-v)!}{N-v}\sum_{i=1}^{m}(-1)^{i}\binom mi\frac{(N-i-1)!}{(N-i-1-v)!}$$
The $i$-summation can be executed as follows. Consider
$$x^{N-1}\Big(1-\frac1x\Big)^{m}=\sum_{i=0}^{m}\binom mi(-1)^{i}x^{N-i-1}$$
Differentiating $v$ times yields
$$\sum_{i=0}^{m}\binom mi(-1)^{i}\frac{(N-i-1)!}{(N-i-1-v)!}x^{N-i-v-1}=\frac{d^{v}}{dx^{v}}\Big[x^{N-1-m}(x-1)^{m}\Big]$$
Expanding the right-hand side around $x=1$ gives
$$\frac{d^{v}}{dx^{v}}\Big[x^{N-1-m}(x-1)^{m}\Big]=\sum_{n=0}^{\infty}\binom{N-1-m}{n}\frac{d^{v}}{dx^{v}}(x-1)^{n+m}=\sum_{n=0}^{\infty}\binom{N-1-m}{n}\frac{(n+m)!}{(n+m-v)!}(x-1)^{n+m-v}$$
Evaluation at $x=1$ only leads to a non-zero contribution if $n+m-v=0$. Hence,
$$\sum_{i=1}^{m}(-1)^{i}\binom mi\frac{(N-i-1)!}{(N-i-1-v)!}=v!\binom{N-1-m}{v-m}-\frac{(N-1)!}{(N-1-v)!}$$
and
$$g_N(m)=\frac{-N(N-1-m)!}{(N-1)!}\sum_{v=0}^{N-2}\frac{v!}{(N-v)(v-m)!(N-1-v)}+N\sum_{v=0}^{N-2}\frac{1}{(N-v)(N-1-v)}$$
$$=\frac{-N(N-1-m)!}{(N-1)!}\left[\sum_{k=1}^{N-m-1}\frac{(N-k-1)!}{(N-k-1-m)!\,k}-\sum_{k=2}^{N-m}\frac{(N-k)!}{(N-k-m)!\,k}\right]+N\left[\sum_{n=1}^{N-1}\frac1n-\sum_{n=2}^{N}\frac1n\right]$$
Rewrite the first summation as
$$\sum_{k=1}^{N-m-1}\frac{(N-k-1)!}{(N-k-1-m)!\,k}=\frac{(N-2)!}{(N-2-m)!}+\sum_{k=2}^{N-m}\frac{(N-k-1)!\,(N-k-m)}{(N-k-m)!\,k}=\frac{(N-2)!}{(N-2-m)!}+\sum_{k=2}^{N-m}\frac{(N-k)!}{(N-k-m)!\,k}-m\sum_{k=2}^{N-m}\frac{(N-k-1)!}{(N-k-m)!\,k}$$
Then,
$$g_N(m)=\frac{-N(N-1-m)!}{(N-1)!}\left[\frac{(N-2)!}{(N-2-m)!}-m\sum_{k=2}^{N-m}\frac{(N-k-1)!}{k\,(N-k-m)!}\right]+N-1=\frac{N(m-1)+1}{N-1}+\frac{mN(N-1-m)!}{(N-1)!}\sum_{k=2}^{N-m}\frac{(N-k-1)!}{k\,(N-k-m)!}$$
$$=-1+\frac{mN(N-1-m)!}{(N-1)!}\sum_{k=1}^{N-m}\frac{(N-k-1)!}{k\,(N-k-m)!}$$
Using Lemma 17.6.1 with $a=N-1$ and $b=N-m$,
$$\sum_{k=1}^{N-m}\frac{(N-k-1)!}{(N-k-m)!}\frac1k=\frac{(N-1)!}{(N-m)!}\big[\psi(N)-\psi(m)\big]\qquad(17.40)$$
finally leads to (17.16). □

17.7 Proof of Theorem 17.3.1: $g_N(m)$ for $k$-ary trees

Let $\tilde X_i$ be the number of joint hops for $i$ different multicast group members (we allow the root to be a user, in which case $\tilde X_i=0$). Then,
$$\Pr[\tilde X_i\ge1]=\Pr[\text{all group members belong to the same cluster connected to the root}]$$
$$=k\cdot\Pr[\text{all group members belong to the first cluster connected to the root}]=k\frac{\binom{(N-1)/k}{i}}{\binom Ni}=k\frac{\binom{1+k+\cdots+k^{D-1}}{i}}{\binom{1+k+\cdots+k^{D}}{i}}\qquad(17.41)$$
By the self-similarity of $k$-ary trees we obtain
$$\Pr[\tilde X_i\ge2\,|\,\tilde X_i\ge1]=p_i^{(D-1)}=k\frac{\binom{1+k+\cdots+k^{D-2}}{i}}{\binom{1+k+\cdots+k^{D-1}}{i}}$$
because each cluster extending from the root is itself a $k$-ary tree of depth $D-1$. In general, we have $\Pr[\tilde X_i\ge j]=\Pr[\tilde X_i\ge j\,|\,\tilde X_i\ge j-1]\Pr[\tilde X_i\ge j-1]$. Hence, by iteration,
$$\Pr[\tilde X_i\ge j]=\prod_{n=D-j+1}^{D}p_i^{(n)},\qquad j=1,2,\ldots,D-1\qquad(17.42)$$
Note that for $i\ge2$ the probability $\Pr[\tilde X_i\ge D]=0$, because if $\tilde X_i=D$ some destinations must be identical. From (2.36) we obtain for $i\ge2$,
$$E[\tilde X_i]=\sum_{j=1}^{D-1}\prod_{n=D-j+1}^{D}p_i^{(n)}=\sum_{j=1}^{D-1}k^{j}\frac{\binom{1+\cdots+k^{D-j}}{i}}{\binom{1+\cdots+k^{D}}{i}}\qquad(17.43)$$
Since $E[X_i]=\frac{N}{N-i}E[\tilde X_i]$, we find
$$E[X_i]=\frac{N}{N-i}\sum_{j=1}^{D-1}k^{D-j}\frac{\binom{1+\cdots+k^{j}}{i}}{\binom{1+\cdots+k^{D}}{i}},\qquad i\ge2\qquad(17.44)$$
For the values of $E[\tilde X_1]$ and $E[X_1]$ we find
$$E[\tilde X_1]=\frac1N\sum_{j=1}^{D}k^{j}\big(1+\cdots+k^{D-j}\big)=\frac{Dk^{D+1}-(N-1)}{N(k-1)}$$
and
$$E[X_1]=\frac{1}{N-1}\sum_{j=1}^{D}k^{j}\big(1+\cdots+k^{D-j}\big)=\sum_{j=1}^{D-1}\frac{k^{D-j}\big(1+\cdots+k^{j}\big)}{N-1}+\frac{k^{D}}{N-1}$$
Invoking Theorem 17.1.3 yields
$$g_{N,k}(m)=\frac{mk^{D}}{N-1}-\sum_{i=1}^{m}(-1)^{i}\binom mi\frac{N}{N-i}\sum_{j=1}^{D-1}k^{D-j}\frac{\binom{1+\cdots+k^{j}}{i}}{\binom{1+\cdots+k^{D}}{i}}$$
Writing $A_j=\frac{k^{j+1}-1}{k-1}$ and reversing the $i$- and $j$-summations yields, using (17.24),
$$g_{N,k}(m)=\frac{mk^{D}}{N-1}-N\sum_{j=1}^{D-1}k^{D-j}\frac{A_j!}{N!}\sum_{i=1}^{m}(-1)^{i}\binom mi\frac{(N-i-1)!}{(A_j-i)!}$$
Concentrating on the inner sum with lower summation bound $i=0$, denoted by $S_j$, and substituting $n=m-i$, we have
$$S_j=\sum_{n=0}^{m}(-1)^{m-n}\binom mn\frac{\Gamma(N-m+n)}{\Gamma(A_j-m+n+1)}$$
Invoking the Taylor series of the hypergeometric function (Abramowitz and Stegun, 1968, Section 15.1.1),
$$F(a,b;c;z)=\frac{\Gamma(c)}{\Gamma(a)\Gamma(b)}\sum_{q=0}^{\infty}\frac{\Gamma(a+q)\Gamma(b+q)}{\Gamma(c+q)}\frac{z^{q}}{q!}$$
$S_j$ is, up to the factor $\frac{\Gamma(N-m)}{\Gamma(A_j-m+1)}$, the coefficient of $z^{m}$ in the Cauchy product of
$$(1-z)^{m}=\sum_{n=0}^{m}\binom mn(-1)^{n}z^{n}$$
and
$$F(1,N-m;A_j-m+1;z)=\frac{\Gamma(A_j-m+1)}{\Gamma(N-m)}\sum_{n=0}^{\infty}\frac{\Gamma(N-m+n)}{\Gamma(A_j-m+1+n)}z^{n}$$
Hence,
$$S_j=\frac{1}{m!}\frac{\Gamma(N-m)}{\Gamma(A_j-m+1)}\frac{d^{m}}{dz^{m}}\big[(1-z)^{m}F(1,N-m;A_j-m+1;z)\big]\Big|_{z=0}$$
Invoking the differentiation formula (Abramowitz and Stegun, 1968, Section 15.2.7),
$$\frac{d^{m}}{dz^{m}}\big[(1-z)^{a+m-1}F(a,b;c;z)\big]=(-1)^{m}\frac{\Gamma(a+m)\Gamma(c-b+m)\Gamma(c)}{\Gamma(a)\Gamma(c-b)\Gamma(c+m)}(1-z)^{a-1}F(a+m,b;c+m;z)$$
we have, since $a=1$ and $F(a,b;c;0)=1$,
$$S_j=(-1)^{m}\frac{\Gamma(N-m)\,\Gamma(A_j+1-N+m)}{\Gamma(A_j+1-N)\,\Gamma(A_j+1)}$$
Thus,
$$g_{N,k}(m)=\frac{mk^{D}}{N-1}-N\sum_{j=1}^{D-1}k^{D-j}\frac{A_j!}{N!}\left[(-1)^{m}\frac{(N-m-1)!\,(A_j-N+m)!}{(A_j-N)!\,A_j!}-\frac{(N-1)!}{A_j!}\right]$$
$$=\frac{mk^{D}}{N-1}+\frac{(-1)^{m-1}(N-m-1)!}{(N-1)!}\sum_{j=1}^{D-1}k^{D-j}\frac{(A_j-N+m)!}{(A_j-N)!}+\sum_{j=1}^{D-1}k^{D-j}$$
from which (17.25) is immediate. □

17.8 Problem

(i) Compute the effective power exponent $\tau(N)$ for the $k$-ary tree.

18 The hopcount to an anycast group

In this chapter, the probability density function of the number of hops to the most nearby member of the anycast group consisting of $m$ members (e.g. servers) is analyzed. The results are applied to compute a performance measure of the efficiency of anycast over unicast and to the server placement problem. The server placement problem asks for the number of (replicated) servers $m$ needed such that any user in the network is not more than $j$ hops away from a server of the anycast group with a certain prescribed probability.
As in Chapter 17 on multicast, two types of shortest path trees are investigated: the regular $k$-ary tree and the irregular uniform recursive tree treated in Chapter 16. Since these two extreme cases of trees indicate that the performance measure $\eta\approx1-a\log m$, where the real number $a$ depends on the details of the tree, it is believed that for trees in real networks (such as the Internet) the same logarithmic law applies. An order calculus on exponentially growing trees further supplies evidence for the conjecture that $\eta\approx1-a\log m$ for small $m$.

18.1 Introduction

IPv6 possesses a new address type, anycast, that is not supported in IPv4. The anycast address is syntactically identical to a unicast address. However, when a set of interfaces is specified by the same unicast address, that unicast address is called an anycast address. The advantage of anycast is that a group of interfaces at different locations is treated as one single address. For example, the information on servers is often duplicated over several secondary servers at different locations for reasons of robustness and accessibility. Changes are only performed on the primary servers, and are then copied onto all secondary servers to maintain consistency. If both the primary and all secondary servers have the same anycast address, a query from some source towards that anycast address is routed towards the closest server of the group. Hence, instead of routing the packet to the root server (primary server), anycast is more efficient. Suppose there are $m$ (primary plus all secondary) servers and that these $m$ servers are uniformly distributed over the Internet. The number of hops from the querying device $A$ to the closest server is the minimum number of hops, denoted by $h_N(m)$, of the set of shortest paths from $A$ to these $m$ servers in a network with $N$ nodes. In order to solve the problem, the shortest path tree rooted at node $A$, the querying device, needs to be investigated.
We assume in the sequel that one of the $m$ uniformly distributed servers can possibly coincide with the same router to which the querying machine $A$ is attached. In that case, $h_N(m)=0$. This assumption is also reflected in the notation, small $h$, according to the convention made in Section 16.3.2 that capital $H$ for the hopcount excludes the event that the hopcount can be zero. Clearly, if $m=1$, the problem reduces to the hopcount of the shortest path from $A$ to one uniformly chosen node in the network, and we have that $h_N(1)=h_N$, where $h_N$ is the hopcount of the shortest path in a graph with $N$ nodes. The other extreme, $m=N$, leads to $h_N(N)=0$, because all nodes in the network are servers. In between these extremes, it holds that $h_N(m)\le h_N(m-1)$, since one additional anycast group member (server) can never increase the minimum number of hops from an arbitrary node to that larger group.

The hopcount to an anycast group is a stochastic problem. Even if the network graph is exactly known, an arbitrary node $A$ views the network along a tree. Most often it is a shortest path tree. Although the sequel emphasizes "shortest path trees", the presented theory is equally valid for any type of tree. Node $A$'s perception of the network is very likely different from the view of another node $A'$. Nevertheless, shortest path trees in the same graph possess to some extent related structural properties that allow us to treat the problem by considering certain types or classes of shortest path trees. Hence, instead of varying the arbitrary node $A$ over all possible nodes in the graph and computing the shortest path tree at each different node, we vary the structure of the shortest path tree rooted at $A$ over all possible shortest path trees of a certain type. Of course, the confinement of the analysis then lies in the type of tree that is investigated. We will only consider the regular $k$-ary tree and the irregular URT.
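These defining properties of $h_N(m)$ — $h_N(1)=h_N$, $h_N(N)=0$ and monotonicity in $m$ — are easy to observe in simulation. The following sketch is our own illustration, not from the book (all function names are ours): it grows a uniform recursive tree by attaching each new node to a uniformly chosen earlier node, and evaluates the minimum hopcount over a set of servers.

```python
import random

def random_recursive_tree_depths(n, rng):
    """Depth of each node in a uniform recursive tree on n nodes (node 0 is the root)."""
    depth = [0] * n
    for v in range(1, n):
        parent = rng.randrange(v)          # attach uniformly to an earlier node
        depth[v] = depth[parent] + 1
    return depth

def h(depth, servers):
    """Minimum hopcount from the root to the nearest of the given servers."""
    return min(depth[s] for s in servers)

rng = random.Random(42)
N = 200
depth = random_recursive_tree_depths(N, rng)

# h_N(N) = 0: when every node is a server, the root itself is one.
assert h(depth, range(N)) == 0
# Adding a group member can never increase the minimum hopcount.
servers = rng.sample(range(N), 10)
for m in range(1, 10):
    assert h(depth, servers[: m + 1]) <= h(depth, servers[:m])
```

The same skeleton serves later as a Monte Carlo cross-check of the exact URT results.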
It seems reasonable to assume that "real" shortest path trees in the Internet possess a structure somewhere in between these extremes, and that scaling laws observed in both extreme cases may also apply to the Internet. The presented analysis allows us to address at least two different issues. First, for a given class of trees, the efficiency of anycast over unicast, defined in terms of a performance measure
$$\eta=\frac{E[h_N(m)]}{E[h_N(1)]}\le1$$
is quantified. The performance measure $\eta$ indicates how many hops (or link traversals, or bandwidth consumption) can be saved, on average, by anycast. Alternatively, $\eta$ also reflects the gain in end-to-end delay, or how much faster than unicast anycast finds the desired information. Second, the so-called server placement problem can be treated. More precisely, the question "How many servers $m$ are needed to guarantee that any user request can access the information within $j$ hops with probability $\Pr[h_N(m)>j]\le\varepsilon$, where $\varepsilon$ is a certain level of stringency?" can be answered. The server placement problem is expected to gain increased interest, especially for real-time services where end-to-end QoS (e.g. delay) requirements are desirable. In the most general setting of this server placement problem, all nodes are assumed to be equally important, in the sense that users' requests are generated equally likely at any router in the network with $N$ nodes. As mentioned in Chapter 17, the validity of this assumption has been justified by Phillips et al. (1999). In the case of uniform user requests, the best strategy is to place the servers also uniformly over the network. Computations of $\Pr[h_N(m)>j]<\varepsilon$ for given stringency $\varepsilon$ and hop $j$ allow the determination of the minimum number $m$ of servers. The solution of this server placement problem may be regarded as an instance of the general quality of service (QoS) portfolio of a network operator.
When the number of servers for a major application offered by the service provider is properly computed, the service provider may announce levels of QoS (e.g. via $\Pr[h_N(m)>j]<\varepsilon$) and accordingly price the use of the application.

18.2 General analysis

Let us consider a particular shortest path tree $T$ rooted at node $A$ with the level set $L_N=\big\{X_N^{(k)}\big\}_{1\le k\le N-1}$ as defined in Section 16.2.2. Suppose that the result of uniformly distributing $m$ anycast group members over the graph leads to a number $m^{(k)}$ of those anycast group member nodes that are $k$ hops away from the root. These $m^{(k)}$ distinct nodes all belong to the $k$-th level set $X_N^{(k)}$. Similarly as for $X_N^{(k)}$, some relations are immediate. First, $m^{(0)}=0$ means that none of the $m$ anycast group members coincides with the root node $A$, while $m^{(0)}=1$ means that one of them (and at most one) is attached to the same router $A$ as the querying device. Also, for all $k>0$ it holds that $0\le m^{(k)}\le X_N^{(k)}$, and
$$\sum_{k=0}^{N-1}m^{(k)}=m\qquad(18.1)$$
Given the tree $T$ specified by the level set $L_N$ and the anycast group members specified by the set $\{m^{(0)},m^{(1)},\ldots,m^{(N-1)}\}$, we will derive the lowest non-empty level $m^{(j)}$, which is equivalent to $h_N(m)$. Let us denote by $e_j$ the event that the first $j+1$ levels are not occupied by an anycast group member,
$$e_j=\{m^{(0)}=0\}\cap\{m^{(1)}=0\}\cap\cdots\cap\{m^{(j)}=0\}$$
The probability distribution of the minimum hopcount, $\Pr[h_N(m)=j\,|\,L_N]$, is then equal to the probability of the event $e_{j-1}\cap\{m^{(j)}>0\}$.
Since the event $\{m^{(j)}>0\}=\{m^{(j)}=0\}^{c}$, using the conditional probability yields
$$\Pr[h_N(m)=j\,|\,L_N]=\Pr\big[\{m^{(j)}>0\}\,\big|\,e_{j-1}\big]\Pr[e_{j-1}]=\Big(1-\Pr\big[\{m^{(j)}=0\}\,\big|\,e_{j-1}\big]\Big)\Pr[e_{j-1}]\qquad(18.2)$$
Since $e_j=e_{j-1}\cap\{m^{(j)}=0\}$, the probability of the event $e_j$ can be decomposed as
$$\Pr[e_j]=\Pr\big[\{m^{(j)}=0\}\,\big|\,e_{j-1}\big]\Pr[e_{j-1}]\qquad(18.3)$$
The assumption that all $m$ anycast group members are uniformly distributed enables us to compute $\Pr\big[m^{(j)}=0\,|\,e_{j-1}\big]$ exactly. Indeed, by the uniform assumption, the probability equals the ratio of the favorable possibilities over the total possible. The total number of ways to distribute $m$ items over $N-\sum_{k=0}^{j-1}X_N^{(k)}$ positions — the latter constraint follows from the condition $e_{j-1}$ — equals $\binom{N-\sum_{k=0}^{j-1}X_N^{(k)}}{m}$. Likewise, the favorable number of ways to distribute $m$ items over the remaining levels higher than $j$ leads to
$$\Pr\big[m^{(j)}=0\,\big|\,e_{j-1}\big]=\frac{\binom{N-\sum_{k=0}^{j}X_N^{(k)}}{m}}{\binom{N-\sum_{k=0}^{j-1}X_N^{(k)}}{m}}\qquad(18.4)$$
The recursion (18.3) needs an initialization, given by
$$\Pr[e_0]=\Pr\big[m^{(0)}=0\big]=1-\frac mN$$
which follows from $\Pr\big[m^{(0)}=1\big]=\frac mN$ and equals $\Pr\big[m^{(0)}=0\,|\,e_{-1}\big]$ (although the event $e_{-1}$ is meaningless). Observe that $\Pr\big[m^{(0)}=1\big]=\frac mN$ holds for any tree, such that
$$\Pr[h_N(m)=0]=\frac mN$$
By iteration of (18.3), we obtain
$$\Pr[e_j]=\prod_{v=0}^{j}\frac{\binom{N-\sum_{k=0}^{v}X_N^{(k)}}{m}}{\binom{N-\sum_{k=0}^{v-1}X_N^{(k)}}{m}}=\frac{\binom{N-\sum_{k=0}^{j}X_N^{(k)}}{m}}{\binom Nm}\qquad(18.5)$$
where the convention in summation is that $\sum_{k=a}^{b}i_k=0$ if $a>b$. Finally, combining (18.2) with (18.4) and (18.5), we arrive at the general conditional expression for the minimum hopcount to the anycast group,
$$\Pr[h_N(m)=j\,|\,L_N]=\frac{\binom{N-\sum_{k=0}^{j-1}X_N^{(k)}}{m}-\binom{N-\sum_{k=0}^{j}X_N^{(k)}}{m}}{\binom Nm}\qquad(18.6)$$
Clearly, while $\Pr[h_N(0)=j\,|\,L_N]=0$ since there is no path, we have for $m=1$,
$$\Pr[h_N(1)=j\,|\,L_N]=\frac{X_N^{(j)}}{N}$$
It directly follows from (18.6) that
$$\Pr[h_N(m)\le q\,|\,L_N]=1-\frac{\binom{N-\sum_{k=0}^{q}X_N^{(k)}}{m}}{\binom Nm}\qquad(18.7)$$
If $N-\sum_{k=0}^{q}X_N^{(k)}<m$ or, equivalently, $\sum_{k=q+1}^{N-1}X_N^{(k)}<m$, then equation (18.7) shows that $\Pr[h_N(m)>q\,|\,L_N]=0$. The maximum possible hopcount of a shortest path to an anycast group strongly depends on the specifics of the shortest path tree or the level set $L_N$. A general result is worth mentioning:

Theorem 18.2.1 For any graph, it holds that
$$\Pr[h_N(m)>N-m]=0$$
In words, the longest shortest path to an anycast group with $m$ members can never possess more than $N-m$ hops.

Proof: This general theorem follows from the fact that the line topology is the tree with the longest hopcount $N-1$, and only in the case that all $m$ last positions (with respect to the source or root) are occupied by the $m$ anycast group members is the maximum hopcount $N-m$. □

For the URT, $\Pr[h_N(m)=N-m]$ is computed exactly in (18.12).

Corollary 18.2.2 For any graph, it holds that
$$\Pr[h_N(N-1)=1]=1-\frac1N$$
Proof: This corollary follows from Theorem 18.2.1 and the law of total probability. Alternatively, if there are $N-1$ anycast members in a network with $N$ nodes, the shortest path can only consist of one hop if none of the anycast members coincides with the root node. This probability is precisely $1-\frac1N$. □

Using the tail probability formula (2.36) for the average, it follows from (18.7) that
$$E[h_N(m)\,|\,L_N]=\frac{1}{\binom Nm}\sum_{q=0}^{N-2}\binom{N-\sum_{k=0}^{q}X_N^{(k)}}{m}\qquad(18.8)$$
from which we find
$$E[h_N(1)\,|\,L_N]=\frac1N\sum_{k=1}^{N-1}k\,X_N^{(k)}$$
Thus, given $L_N$, a performance measure for anycast over unicast can be quantified as
$$\eta=\frac{E[h_N(m)\,|\,L_N]}{E[h_N(1)\,|\,L_N]}$$
Using the law of total probability, the distribution of the minimum hopcount to the anycast group is
$$\Pr[h_N(m)=j]=\sum_{\text{all }L_N}\Pr[h_N(m)=j\,|\,L_N]\Pr[L_N]\qquad(18.9)$$
or explicitly,
$$\Pr[h_N(m)=j]=\sum_{\sum_{k=1}^{N-1}x_k=N-1}\frac{\binom{\sum_{k=j}^{N-1}x_k}{m}-\binom{\sum_{k=j+1}^{N-1}x_k}{m}}{\binom Nm}\Pr\big[X_N^{(1)}=x_1,\ldots,X_N^{(N-1)}=x_{N-1}\big]$$
where the integers $x_k\ge0$. This expression explicitly shows the importance of the level structure $L_N$ of the shortest path tree $T$.
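Given a level set $L_N$, the conditional expressions (18.6)–(18.8) are directly computable. The sketch below is our own (the helper names are ours, not the book's); the level list is assumed to start with $X_N^{(0)}=1$ for the root.

```python
from math import comb

def anycast_pmf(levels, m):
    """Pr[h_N(m) = j | L_N] via (18.6); levels[k] = X_N^{(k)}, with levels[0] = 1 (root)."""
    N = sum(levels)
    tail = N                        # N - sum_{k=0}^{j-1} X_N^{(k)} at step j
    pmf = []
    for x in levels:
        pmf.append((comb(tail, m) - comb(tail - x, m)) / comb(N, m))
        tail -= x
    return pmf

def mean_hops(levels, m):
    """E[h_N(m) | L_N] via the tail formula (18.8)."""
    N = sum(levels)
    tail, s = N, 0.0
    for x in levels:
        tail -= x
        s += comb(tail, m) / comb(N, m)
    return s

# Example: a 3-ary tree with D = 3 levels, N = 40.
levels = [1, 3, 9, 27]
pmf = anycast_pmf(levels, 2)
assert abs(sum(pmf) - 1.0) < 1e-12        # a proper distribution
assert abs(pmf[0] - 2 / 40) < 1e-12       # Pr[h_N(m) = 0] = m/N
eta = mean_hops(levels, 2) / mean_hops(levels, 1)
assert 0 < eta < 1                        # anycast with m = 2 beats unicast
```

Since `math.comb(n, k)` returns 0 for `k > n`, the boundary cases of (18.6) need no special handling.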
The level set $L_N$ entirely determines the shape of the tree $T$. Unfortunately, a general form for $\Pr[L_N]$ or $\Pr[h_N(m)=j]$ is difficult to obtain. In principle, via extensive trace-route measurements from several roots, the shortest path tree and $\Pr[L_N]$ can be constructed, such that a (rough) estimate of the level set $L_N$ in the Internet can be obtained.

18.3 The $k$-ary tree

For regular trees, explicit expressions are possible because the summation in (18.9) simplifies considerably. For example, for the $k$-ary tree defined in Section 17.3,
$$X_N^{(j)}=k^{j}$$
Provided the set $L_N$ contains precisely these values of $X_N^{(j)}$ for each $j$, we have that $\Pr[L_N]=1$; else it is zero (because then $L_N$ is not consistent with a $k$-ary tree). Summarizing, for the $k$-ary tree with $N=\frac{k^{D+1}-1}{k-1}$ and $D$ levels, the distribution of the minimum hopcount to the anycast group is
$$\Pr[h_N(m)=j]=\frac{\binom{N-\frac{k^{j}-1}{k-1}}{m}-\binom{N-\frac{k^{j+1}-1}{k-1}}{m}}{\binom Nm}\qquad(18.10)$$
Extension of the integer $k$ to real numbers in the formula (18.10) is expected to be of value, as suggested in Section 17.3. When a $k$-ary tree was used to fit corresponding Internet multicast measurements (Van Mieghem et al., 2001a), a remarkably accurate agreement was found for the value $k\approx3.2$, which is about the average degree of the Internet graph. Hence, if we were to use the $k$-ary tree as a model for the hopcount to an anycast group, we expect that $k\approx3.2$ is the best value for Internet shortest path trees. However, we feel we ought to mention that the shortest path tree between two arbitrary nodes is definitely not a $k$-ary tree, because $\Pr[h_N(1)=j]$ then increases with the hopcount $j$, which is in conflict with Internet trace-route measurements (see, for example, the bell-shaped curve in Fig. 16.4). Figure 18.1 displays $\Pr[h_N(m)\le j]$ for a $k$-ary tree with outdegree $k=3$ and $N=500$.

Fig. 18.1.
The distribution function of $h_{500}(m)$ versus the number of hops $j$ for various sizes of the anycast group in a $k$-ary tree with $k=3$ and $N=500$.

This type of plot allows us to solve the "server placement problem". For example, assuming that the $k$-ary tree is a good model and the network consists of $N=500$ nodes, Fig. 18.1 shows that at least $m=10$ servers are needed to assure that any user is not more than four hops separated from a server with a probability of 93%. More precisely, the equation $\Pr[h_{500}(m)>4]<0.07$ is obeyed if $m\ge10$. Figure 18.2 gives an idea of how the performance measure $\eta$ decreases with the size of the anycast group in $k$-ary trees (all with outdegree $k=3$), but with different sizes $N$. For values of $m$ up to around 20% of $N$, we observe that $\eta$ decreases logarithmically in $m$.

18.4 The uniform recursive tree (URT)

Chapter 16 motivates the interest in the URT. The URT is believed to provide a reasonable, first-order estimate for the hopcount problem to an anycast group in the Internet.

18.4.1 Recursion for $\Pr[h(m)=j]$

A combinatorial approach such as (18.9) is seldom successful for URTs, while structural properties often lead to results. The basic Theorem 16.2.1 of the URT, applied to the anycast minimum hop problem, is illustrated in Fig. 18.3.

Fig. 18.2. The performance measure $\eta$ for several sizes of $k$-ary trees (with $k=3$; $N=100$, $500$, $5000$, $10^5$, $10^6$) as a function of the ratio of anycast nodes over the total number of nodes.

Fig. 18.3. A uniform recursive tree consisting of two subtrees $T_1$ and $T_2$ with $k$ and $N-k$ nodes, respectively. The first cluster contains $i$ anycast members, while the cluster with $N-k$ nodes contains $m-i$ anycast members.

Figure 18.3 shows that any URT can be separated into two subtrees $T_1$ and $T_2$ with $k$ and $N-k$ nodes, respectively.
Moreover, Theorem 16.2.1 states that each subtree is independent of the other and is again a URT. Consider now a specific separation of a URT $T$ into $T_1=t_1$ and $T_2=t_2$, where the tree $t_1$ contains $k$ nodes and $i$ of the $m$ anycast members, and $t_2$ possesses $N-k$ nodes and the remaining $m-i$ anycast members. The event $\{h_T(m)=j\}$ equals the union over all possible sizes $N_1=k$ and subgroups $m_1=i$ of the events $\{h_{t_1}(i)=j-1\}\cap\{h_{t_2}(m-i)\ge j\}$ and $\{h_{t_1}(i)>j-1\}\cap\{h_{t_2}(m-i)=j\}$,
$$\{h_T(m)=j\}=\bigcup_k\bigcup_i\Big[\{h_{t_1}(i)=j-1\}\cap\{h_{t_2}(m-i)\ge j\}\Big]\cup\Big[\{h_{t_1}(i)>j-1\}\cap\{h_{t_2}(m-i)=j\}\Big]$$
Because $h_N(0)$ is meaningless, the relation must be modified for the case $i=0$ to
$$\{h_T(m)=j\}=\{h_{t_2}(m)=j\}$$
and for the case $i=m$ to
$$\{h_T(m)=j\}=\{h_{t_1}(m)=j-1\}$$
This decomposition holds for any URT $T_1$ and $T_2$, not only for the specific ones $t_1$ and $t_2$. The transition towards probabilities becomes
$$\Pr[h_T(m)=j]=\sum_{\text{all }t_1,t_2,k,i}\big(\Pr[h_{t_1}(i)=j-1]\Pr[h_{t_2}(m-i)\ge j]+\Pr[h_{t_1}(i)>j-1]\Pr[h_{t_2}(m-i)=j]\big)\times\Pr[T_1=t_1,T_2=t_2,N_1=k,m_1=i]$$
Since $T_1$ and $T_2$ and also $m_1$ are independent given $N_1$, the last probability simplifies to
$$\Pr[T_1=t_1,T_2=t_2,N_1=k,m_1=i]=\Pr[T_1=t_1\,|\,N_1=k]\Pr[T_2=t_2\,|\,N_1=k]\Pr[m_1=i\,|\,N_1=k]\Pr[N_1=k]$$
Theorem 16.2.1 states that $N_1$ is uniformly distributed over the set with $N-1$ nodes, such that $\Pr[N_1=k]=\frac{1}{N-1}$. The fact that $i$ out of the $m$ anycast members, uniformly chosen out of $N$ nodes, belong to the recursive subtree $T_1$ implies that the $m-i$ remaining anycast members belong to $T_2$. Hence, analogous to a combinatorial problem outlined by Feller (1970, p. 43) that leads to the hypergeometric distribution, we have
$$\Pr[m_1=i\,|\,N_1=k]=\frac{\binom ki\binom{N-k}{m-i}}{\binom Nm}$$
because all favorable combinations are the $\binom ki$ ways to distribute $i$ anycast members in $T_1$ with $k$ nodes, multiplied by the $\binom{N-k}{m-i}$ favorable ways to distribute the remaining $m-i$ in $T_2$ containing $N-k$ nodes.
The total number of ways to distribute $m$ anycast members over $N$ nodes is $\binom Nm$. Finally, we remark that the hopcount of the shortest path to $m$ anycast members in a URT only depends on its size. This means that the sum over all $t_1$ of $\Pr[T_1=t_1\,|\,N_1=k]$, which equals 1, disappears, and likewise also the sum over all $t_2$. Combining the above leads to
$$\Pr[h_N(m)=j]=\frac{1}{(N-1)\binom Nm}\sum_{k=1}^{N-1}\sum_{i=1}^{m-1}\binom ki\binom{N-k}{m-i}\big(\Pr[h_k(i)=j-1]\Pr[h_{N-k}(m-i)\ge j]+\Pr[h_k(i)>j-1]\Pr[h_{N-k}(m-i)=j]\big)$$
$$+\frac{1}{(N-1)\binom Nm}\sum_{k=1}^{N-1}\left[\binom{N-k}{m}\Pr[h_{N-k}(m)=j]+\binom km\Pr[h_k(m)=j-1]\right]$$
By the substitution of $k'=N-k$ and $m'=m-i$, we obtain the recursion
$$\Pr[h_N(m)=j]=\frac{1}{(N-1)\binom Nm}\sum_{k=1}^{N-1}\sum_{i=1}^{m-1}\binom ki\binom{N-k}{m-i}\big(\Pr[h_k(i)=j-1]+\Pr[h_k(i)=j]\big)\sum_{t=j}^{N-k-1}\Pr[h_{N-k}(m-i)=t]$$
$$+\frac{1}{(N-1)\binom Nm}\sum_{k=1}^{N-1}\binom km\big(\Pr[h_k(m)=j]+\Pr[h_k(m)=j-1]\big)\qquad(18.11)$$
This recursion (18.11) is solved numerically for $N=20$. The result is shown in Fig. 18.4, which demonstrates that $\Pr[h(m)>N-m]=0$, or that the path with the longest hopcount to an anycast group of $m$ members consists of $N-m$ links. Since there are $(N-1)!$ possible recursive trees (Theorem 16.2.2) and there is only one line tree with $N-1$ hops, where each node has precisely one child node, the probability of having precisely $N-1$ hops from the root is $\frac{1}{(N-1)!}$ (which also is $\Pr[h_N=N-1]$ given in (16.8)). The longest possible hopcount from a root to $m$ anycast members occurs in the line tree where all $m$ anycast members occupy the last $m$ positions. Hence, the probability of the longest possible hopcount equals
$$\Pr[h_N(m)=N-m]=\frac{m!}{(N-1)!\binom Nm}\qquad(18.12)$$
because there are $m!$ possible ways to distribute the $m$ anycast members over the $m$ last positions in the line tree, while there are $\binom Nm$ possibilities to distribute $m$ anycast members at arbitrary places in the line tree.

Fig. 18.4.
The pdf of $h_N(m)$ in a URT with $N=20$ nodes for all possible $m$. Observe that $\Pr[h_N(m)>N-m]=0$; this relation connects the various curves to the value of $m$.

Figure 18.4 allows us to solve the "server placement problem". For example, consider the scenario in which a network operator announces that any user request will reach a server of the anycast group in no more than $j=4$ hops in 99.9% of the cases. Assuming his network has $N=20$ routers and the shortest path tree is a URT, the network operator has to compute the number of anycast servers $m$ he has to place, uniformly spread over the $N=20$ routers, by solving $\Pr[h_{20}(m)>4]<10^{-3}$. Figure 18.4 shows that the intersection of the line $j=4$ and the line $\Pr[h_{20}(m)=4]=10^{-3}$ lies on the curve for $m=7$. Since the curves for $m\le7$ are exponentially decreasing, $\Pr[h_{20}(m)>4]$ is safely¹ approximated by $\Pr[h_{20}(m)=4]$, which leads to the placing of $m=7$ servers. When following the line $j=4$, we also observe that the curves for $m=5,6,7,8$ lie near to that of $m=7$. This means that placing one server more does not considerably change the situation. It is a manifestation of the law $\eta\approx1-a\log m$, which tells us that, by placing $m$ servers, the gain measured in hops with respect to the single server case increases only slowly, more precisely logarithmically, in $m$. The performance measure $\eta$ for the URT is drawn for several sizes $N$ in Fig. 18.5.

¹ More precisely, since $\Pr[h_{20}(4)>4]=0.00106$ and $\Pr[h_{20}(5)>4]=0.00032$, only $m=5$ servers are sufficient.

18.4.2 Analysis of the recursion relation

The product of two probabilities in the double sum in (18.11) seriously complicates a possible analytic treatment. A relation for a generating function of $\Pr[h_N(m)=j]$ and other mathematical results are derived in Van Mieghem (2004b). Here, we summarize the main results.
Using Pr [kn (l) 1] = 1, the (a) Let us check Pr [kQ (p) = 0] = Q p3l convention that Pr [kn (l) = 1] = 0 and Pr [kQ3n (p l) = 0] = Q 3n , the right hand side of (18.11), denoted by u, simplifies to µ ¶µ ¶ Q31 p XX pl n Q n 1 u= ¡ ¢ Q n l pl (Q 1) Q p n=1 l=0 Q31 X p31 X µn ¶µQ 1 n¶ 1 = ¡ ¢ l p1l (Q 1) Q p n=1 l=0 Q31 X µQ 1¶ p 1 = = ¡ ¢ Q p1 (Q 1) Q p n=1 (b) Observe that Pr [kQ (Q ) = m] = 0 for m A 0. (c) For p = 1, 1 X n Pr [kQ = m] = (Pr[kn = m] + Pr [kn = m 1]) Q 1 Q Q 31 n=1 Multiplying both sides by } m , summing over all m leads to the recursion for the generating function (16.6) (Q + 1)*Q+1 (}) = (} + Q )*Q (}) 430 The hopcount to an anycast group (d) The case p = 2 is solved in Van Mieghem (2004b, Appendix) as µ ¶ m 2(1)Q313m (m+1) 2(1)Q3m X (n+m+1) n 2n1 Pr [kQ (2) = m] = + (1) VQ VQ Q! Q !(Q 1) n n=1 ¶ µ ¶¸ m31 µ 2n + 1 2(1)Q3m X m + n + 1 n+m+1 (1)n VQ + Q !(Q 1) m n n=0 (18.13) In van der Hofstad et al. (2002b) we have demonstrated that the covariance between the number of nodes at level u and m for u m in the URT is µ ¶ u h i (1)Q31 X (u) (m) (n+m+1) n+m 2n + m u (1) H [Q [Q = VQ (Q 1)! n n=0 For m u = 1, the last term in (18.13) is recognized as ¡2n31¢ 1 ¡2n¢ = 2 n , the first sum in (18.13) is n l k (m31) (m) H [Q [Q (Q2 ) . Since ³ ´ ¸ (m) 2 µ ¶ H [Q m (m+1) 2(1)Q3m X 2(1)Q3m31 VQ (n+m+1) n 2n1 (1) = VQ ¡ ¢ Q !(Q 1) (Q 1) Q! n 2 Q2 n=1 Q 313m With 2(31)Q! (m+1) VQ = 2 Pr [kQ = m], we obtain ³ ´ ¸ i h (m) 2 (m31) (m) H [Q H [Q [Q 2Q Pr [kQ = m] + Pr [kQ (2) = m] = ¡Q ¢ ¡ ¢ Q 1 2 Q2 2 ¶ m µ 2(1)Q31 X m + n n+m + (1)n+m VQ Q !(Q 1) n n=1 It would be of interest to find an interpretation for the last sum. Without proof2 , we mention the following exact results: Q µ ¶ X Q p=1 p Pr [kQ (p) = Q 2] = Q1 Xµ p=1 ¶ Q 1 Pr [kQ 1 (p) = Q 3] 1 + Q 1 (Q 2)! p For p Q 3, it holds that p! Pr [kQ (p) = Q p 1] = ¡ ¢ (Q 1)! Q p 2 "µ ¶ # p X Q p+1 + (p 1)(p@2 + 1) + 2 n By substitution into the recursion (18.11), one may verify these relations. 
18.5 Approximate analysis

Since the general solution (18.9) is in many cases difficult to compute, as shown for the URT in Section 18.4, we consider a simplified version of the above problem where each node in the tree has equal probability $p=\frac mN$ to be a server. Instead of having precisely $m$ servers, the simplified version considers $m$ servers on average, and the probability that there are precisely $m$ servers is $\binom Nm p^{m}(1-p)^{N-m}$. In the simplified version, the equations associated with (18.4) and (18.3) are
$$\Pr\big[m^{(j)}=0\,\big|\,e_{j-1}\big]=\Pr\big[m^{(j)}=0\big]=(1-p)^{X_N^{(j)}}$$
$$\Pr[e_j]=\prod_{l=0}^{j}\Pr\big[m^{(l)}=0\big]=(1-p)^{\sum_{l=0}^{j}X_N^{(l)}}$$
which implies that the probability that there are no servers in the tree is $(1-p)^{N}$. Since in that case the hopcount is meaningless, we consider the conditional probability (18.2) of the hopcount given that the level set contains at least one server (which is denoted by $\tilde h_N(m)$),
$$\Pr\big[\tilde h_N(m)=j\,\big|\,L_N\big]=\frac{\Big(1-(1-p)^{X_N^{(j)}}\Big)(1-p)^{\sum_{l=0}^{j-1}X_N^{(l)}}}{1-(1-p)^{N}}$$
Thus,
$$\Pr\big[\tilde h_N(m)\le q\,\big|\,L_N\big]=\frac{1-(1-p)^{\sum_{l=0}^{q}X_N^{(l)}}}{1-(1-p)^{N}}$$
Finally, to avoid the knowledge of the entire level set $L_N$, we use $E\big[X_N^{(l)}\big]=N\Pr[h_N(1)=l]$ from (16.7) as the best estimate for each $X_N^{(l)}$ and obtain the approximate formula
$$\Pr\big[\tilde h_N(m)=j\big]=\frac{\Big(1-(1-p)^{E[X_N^{(j)}]}\Big)(1-p)^{\sum_{l=0}^{j-1}E[X_N^{(l)}]}}{1-(1-p)^{N}}\qquad(18.14)$$
In the dotted lines in Fig. 18.5, we have added the approximate result for the URT, where $E[h_N(m)]$ is computed based on (18.14), but where $E[h_N(1)]$ is computed exactly. For $m=1$, the approximate analysis (18.14) is not well suited: Fig. 18.5 illustrates this deviation in the fact that $\eta_{\mathrm{appr}}(1)=E\big[\tilde h_N(1)\big]/E[h_N(1)]<1$. For higher values of $m$ we observe a fairly good correspondence. We found that the probability (18.14) reasonably approximates the exact result plotted on a linear scale. Only the tail behavior (on log-scale) and the case $m=1$ deviate significantly.
In summary for the URT, the approximation (18.14) for $\Pr[h_N(m) = j]$ is much faster to compute than the exact recursion, and it seems appropriate for the computation of $g_N(m)$ for $m > 1$. However, it is less adequate for solving the server placement problem, which requires the tail probabilities $\Pr[h_N(m) > j]$.

Fig. 18.5. The performance measure $g$ for several sizes $N$ of URTs as a function of the ratio $m/N$, with logarithmic fits $g \approx -0.404\ln(m/N)$ ($N = 10$), $g \approx -0.295\ln(m/N)$ ($N = 20$), $g \approx -0.252\ln(m/N)$ ($N = 30$) and $g \approx -0.210\ln(m/N)$ ($N = 50$).

18.6 The performance measure in exponentially growing trees

In this section, we investigate the observed law $g \approx 1 - \alpha\log m$ for a much larger class of trees, namely the class of exponentially growing trees, to which both the $k$-ary tree and the URT belong. Most trees in the Internet are also exponentially growing trees. A tree is said to grow exponentially in the number of nodes $N$ with degree $\mu$ if $\lim_{j\to\infty}\left(X_N^{(j)}\right)^{1/j} = \mu$ or, equivalently, $X_N^{(j)} \approx \mu^j$ for large $j$. The fundamental problem with this definition is that it only holds for infinite graphs, $N = \infty$. For real (finite) graphs, there must exist some level $j = l$ for which the sequence $X_N^{(l+1)}, X_N^{(l+2)}, \ldots, X_N^{(N-1)}$ ceases to grow, because $\sum_{j=0}^{N-1}X_N^{(j)} = N < \infty$. This boundary effect complicates the definition of exponential growth in finite graphs. The second complication is that, even in the finite set $X_N^{(0)}, X_N^{(1)}, \ldots, X_N^{(l)}$, not necessarily all $X_N^{(j)}$ with $0 \le j \le l$ need to obey $X_N^{(j)} \approx \mu^j$, but "enough" should. Without the limit concept, we cannot specify the precise conditions of exponential growth in a finite shortest path tree. If we assume in finite graphs that $X_N^{(j)} \approx \mu^j$ for $j \le l$, then $\sum_{j=0}^{l}X_N^{(j)} = \alpha N$ with $0 < \alpha < 1$. Indeed, for $\mu > 1$, the highest hopcount level $l$ possesses by far the most nodes, since $\sum_{j=0}^{l}\mu^j = \frac{\mu^{l+1}-1}{\mu-1} \approx \frac{\mu}{\mu-1}\,\mu^{l}$, which cannot be larger than a fraction $\alpha N$ of the total number of nodes.
We now present an order calculus to estimate $g$ for exponentially growing trees, based on relation (18.8). Let us denote

$$y = \frac{\binom{N-x}{m}}{\binom{N}{m}} = \prod_{j=0}^{m-1}\left(1 - \frac{x}{N-j}\right)$$

For large $N$ and fixed $m$,

$$y = \exp\left(-\frac{xm}{N}\right)(1 + o(1))$$

In the case where the tree grows exponentially for $j \le l$ as $X_N^{(j)} = \beta_j\mu^j$, with $\beta_j$ some slowly varying sequence, only very few levels (bounded by a fixed number) around $l$ obey $\sum_{k=0}^{q}X_N^{(k)} = O(N)$ with $q \in [l-c,\,l]$ for some fixed $c$, while for all $q > l$ we have $\sum_{k=0}^{q}X_N^{(k)} = \alpha_q N$ with some sequence $\alpha_q < \alpha_{q+1} \le \max_q \alpha_q = 1$. Applied to (18.8), where $x = \sum_{k=0}^{q}X_N^{(k)} < N$,

$$E[h_N(m)\,|\,L_N] = (1+o(1))\sum_{q=0}^{l}\exp\left(-\frac{m}{N}\,\alpha_q N\right) + \sum_{q=l+1}^{N-2}\frac{\binom{(1-\alpha_q)N}{m}}{\binom{N}{m}}$$

If there are only a few levels more than $l$, the last series is much smaller than 1 and can be omitted. Since the slowly varying sequence $\alpha_q$ is unknown, we approximate $\alpha_q N = \mu^q$ and obtain, with the substitution $u = \frac{m}{N}\mu^q$ and $\mu^l \approx N$,

$$\sum_{q=0}^{l}\exp\left(-\frac{m}{N}\mu^{q}\right) \approx \int_{0}^{l}\exp\left(-\frac{m}{N}\mu^{q}\right)dq = \frac{1}{\log\mu}\int_{m/N}^{m\mu^{l}/N}\frac{e^{-u}}{u}\,du \approx \frac{1}{\log\mu}\int_{m/N}^{m}\frac{e^{-u}}{u}\,du$$
$$= \frac{1}{\log\mu}\left(\log\frac{N}{m} - \gamma - \frac{e^{-m}}{m} + O\left(\frac{m}{N}\right)\right)$$

where $\gamma$ denotes the Euler constant and where, in the last step, a series for the exponential integral (Abramowitz and Stegun, 1968, Section 5.1.11) is used. Thus,

$$g_N(m) = \frac{E[h_N(m)]}{E[h_N(1)]} \approx \frac{\log\frac{N}{m} - \gamma - \frac{e^{-m}}{m} + O\left(\frac{m}{N}\right)}{\log N - \gamma - e^{-1} + O\left(\frac{1}{N}\right)} = (1+o(1))\left(1 - \frac{\log m + \frac{e^{-m}}{m} - e^{-1}}{\log N} + O\left(\frac{1}{\log^2 N}\right)\right)$$

Since by definition $g = 1$ for $m = 1$, we finally arrive at

$$g_N(m) \approx 1 - \frac{\log m + \frac{e^{-m}}{m} - e^{-1}}{\log N} + O\left(\frac{1}{\log^2 N}\right)$$

which supplies evidence for the conjecture $g \approx 1 - \alpha\log m$: exponentially growing graphs possess a performance measure that decreases logarithmically in $m$, which is rather slow. Measurement data in the Internet seem to support this $\log m$-scaling law. Apart from the correspondence with figures in the work of Jamin et al. (2001), Fig. 6 in Krishnan et al. (2000) shows that the relative measured traffic flow reduction decreases logarithmically in the number of caches $m$.

Appendix A

Stochastic matrices

This appendix reviews the matrix theory for Markov chains.
In-depth analyses are found in the classical books by Gantmacher (1959a,b), Wilkinson (1965) and Meyer (2000).

A.1 Eigenvalues and eigenvectors

1. The algebraic eigenproblem consists in the determination of the eigenvalues $\lambda$ and the corresponding eigenvectors $x$ of a matrix $A$ for which the set of $n$ homogeneous linear equations in $n$ unknowns

$$Ax = \lambda x \qquad (A.1)$$

has a non-zero solution. Clearly, the zero vector $x = 0$ is always a solution of (A.1). A non-zero solution of (A.1) is possible if and only if the matrix $A - \lambda I$ is singular, that is,

$$\det(A - \lambda I) = 0 \qquad (A.2)$$

This determinant can be expanded into a polynomial in $\lambda$ of degree $n$,

$$c(\lambda) = (-1)^n\lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0 = 0 \qquad (A.3)$$

which is called the characteristic (eigenvalue) polynomial of the matrix $A$, and whose coefficients are

$$c_k = (-1)^k\sum M_{n-k} \qquad (A.4)$$

where $M_k$ is a principal minor.¹ Since a polynomial of degree $n$ has $n$ complex zeros, the matrix $A$ possesses $n$ eigenvalues $\lambda_k$, not all necessarily distinct.

¹ A principal minor $M_k$ is the determinant of a principal $k\times k$ submatrix $M_{k\times k}$, obtained by deleting the same $n-k$ rows and columns in $A$. Hence, the main diagonal elements $(M_{k\times k})_{ii}$ are $k$ of the main diagonal elements $\{a_{ii}\}_{1\le i\le n}$ of $A$.

In general, the characteristic polynomial can be written as

$$c(\lambda) = \prod_{k=1}^{n}(\lambda_k - \lambda) \qquad (A.5)$$

Since $c(\lambda) = \det(A - \lambda I)$, it follows from (A.3) and (A.5) for $\lambda = 0$ that

$$\det A = c_0 = \prod_{k=1}^{n}\lambda_k \qquad (A.6)$$

Hence, if $\det A = 0$, there is at least one zero eigenvalue. Also,

$$(-1)^{n-1}c_{n-1} = \sum_{k=1}^{n}\lambda_k = \operatorname{trace}(A) \qquad (A.7)$$

If all $\lambda_k \ge 0$, we can apply the general theorem of the arithmetic and geometric mean (5.2) to (A.6) and (A.7) with $q_k = \frac{\alpha_k}{\sum_{j=1}^{n}\alpha_j}$,

$$\prod_{k=1}^{n}\lambda_k^{\,q_k} \le \frac{\sum_{k=1}^{n}\alpha_k\lambda_k}{\sum_{j=1}^{n}\alpha_j}$$

and, by choosing $\alpha_j = 1$, we find the inequality

$$\det A \le \left(\frac{\operatorname{trace}(A)}{n}\right)^{n}$$

To any eigenvalue $\lambda$, the set (A.1) has at least one non-zero eigenvector $x$. Furthermore, if $x$ is a non-zero eigenvector, then $kx$ is also a non-zero eigenvector.
Therefore, eigenvectors are often normalized; for instance, a probabilistic eigenvector has the sum of its components equal to 1, i.e. a norm $\|x\|_1 = 1$ as defined in (A.23). If the rank of $A - \lambda I$ is less than $n-1$, there will be more than one independent eigenvector. Precisely these cases seriously complicate the eigenvalue problem. In the sequel, we omit the discussion of multiple eigenvalues and refer to Wilkinson (1965).

2. The eigenproblem of the transpose $A^T$,

$$A^T y = \lambda y \qquad (A.8)$$

is of singular importance. Since the determinant of a matrix is equal to the determinant of its transpose, $\det\left(A^T - \lambda I\right) = \det(A - \lambda I)$, which shows that the eigenvalues of $A$ and $A^T$ are the same. However, the eigenvectors are, in general, different. Alternatively, we can write (A.8) as

$$y^T A = \lambda y^T \qquad (A.9)$$

The vector $y_j^T$ is therefore called the left-eigenvector of $A$ belonging to the eigenvalue $\lambda_j$, whereas $x_j$ is called the right-eigenvector belonging to the same eigenvalue $\lambda_j$. An important relation between left- and right-eigenvectors of a matrix $A$ is, for $\lambda_j \ne \lambda_k$,

$$y_j^T x_k = 0 \qquad (A.10)$$

Indeed, left-multiplying (A.1) with $\lambda = \lambda_k$ by $y_j^T$ gives

$$y_j^T A x_k = \lambda_k\,y_j^T x_k$$

and, similarly, right-multiplying (A.9) with $\lambda = \lambda_j$ by $x_k$ gives

$$y_j^T A x_k = \lambda_j\,y_j^T x_k$$

Subtraction leads to $0 = (\lambda_k - \lambda_j)\,y_j^T x_k$, and (A.10) follows. Since eigenvectors may, in general, be complex and since $y_j^T x_k = x_k^T y_j$, the expression $y_j^T x_k$ is not an inner product, which is always real and for which $\left(y_j^H x_k\right)^H = x_k^H y_j$ holds. However, (A.10) expresses that the sets of left- and right-eigenvectors are orthogonal if $\lambda_j \ne \lambda_k$.

3. If $A$ has $n$ distinct eigenvalues, then the $n$ eigenvectors are linearly independent and span the whole $n$-dimensional space. The proof is by reductio ad absurdum. Assume that $s$ is the smallest number of linearly dependent eigenvectors, labelled by the first $s$ indices. Linear dependence then means that

$$\sum_{k=1}^{s}\alpha_k x_k = 0 \qquad (A.11)$$

where $\alpha_k \ne 0$ for $1 \le k \le s$.
Left-multiplying by $A$ and using (A.1) yields

$$\sum_{k=1}^{s}\alpha_k\lambda_k x_k = 0 \qquad (A.12)$$

On the other hand, multiplying (A.11) by $\lambda_s$ and subtracting from (A.12) leads to

$$\sum_{k=1}^{s-1}\alpha_k(\lambda_k - \lambda_s)x_k = 0$$

which, because all eigenvalues are distinct, implies that there is a smaller set of $s-1$ linearly dependent eigenvectors. This contradicts the initial hypothesis. This important property has a number of consequences. First, it applies to left- as well as to right-eigenvectors. Relation (A.10) then shows that the sets of left- and right-eigenvectors form a bi-orthogonal system with $y_k^T x_k \ne 0$. For, if $x_k$ were orthogonal to $y_k$ (or $y_k^T x_k = 0$), (A.10) demonstrates that $x_k$ would be orthogonal to all left-eigenvectors $y_j$. Since the set of left-eigenvectors spans the $n$-dimensional vector space, this would mean that the $n$-dimensional vector $x_k$ is orthogonal to the whole $n$-space, which is impossible because $x_k$ is not the null vector. Second, any $n$-dimensional vector can be written in terms of either the left- or the right-eigenvectors.

4. Let us denote by $X$ the matrix with the right-eigenvector $x_j$ in column $j$, and by $Y^T$ the matrix with the left-eigenvector $y_k^T$ in row $k$. If the right- and left-eigenvectors are scaled such that $y_k^T x_k = 1$ for all $1 \le k \le n$, then

$$Y^T X = I \qquad (A.13)$$

or, the matrix $Y^T$ is the inverse of the matrix $X$. Furthermore, since (A.1) holds for every right-eigenvector, we have in matrix form

$$AX = X\operatorname{diag}(\lambda_k) \qquad (A.14)$$

Left-multiplying by $X^{-1} = Y^T$ yields the similarity transform of the matrix $A$,

$$X^{-1}AX = Y^T A X = \operatorname{diag}(\lambda_k) \qquad (A.15)$$

Thus, when the eigenvalues of $A$ are distinct, there exists a similarity transform $H^{-1}AH$ that reduces $A$ to diagonal form. In many applications, similarity transforms are applied to simplify matrix problems. Observe that a similarity transform preserves the eigenvalues because, if $Ax = \lambda x$, then $(H^{-1}AH)\,H^{-1}x = H^{-1}Ax = \lambda\,H^{-1}x$. The eigenvectors are transformed to $H^{-1}x$.
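Articles 3 and 4 are easily illustrated numerically (a Python/NumPy sketch; the matrix below is an arbitrary example with distinct eigenvalues, and the naming is ours): the rows of $X^{-1}$ are the suitably scaled left-eigenvectors, so that (A.13) and (A.15) hold, and (A.6)-(A.7) can be checked at the same time.

```python
import numpy as np

# an arbitrary matrix with three distinct real eigenvalues
A = np.array([[2.0, 1.0, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 1.0, 3.0]])
lam, X = np.linalg.eig(A)     # right-eigenvectors in the columns of X
Y_T = np.linalg.inv(X)        # rows are the scaled left-eigenvectors: Y^T X = I

D = Y_T @ A @ X               # similarity transform (A.15): diag(lambda_k)
```

The checks confirm the bi-orthogonality (A.13), the diagonalization (A.15) and the relations (A.6) and (A.7) between the eigenvalues, determinant and trace.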
When $A$ has multiple eigenvalues, it may be impossible to reduce $A$ to a diagonal form by similarity transforms. Instead of a diagonal form, the most compact form when $A$ has $r$ distinct eigenvalues, each with multiplicity $m_j$ such that $\sum_{j=1}^{r}m_j = n$, is the Jordan canonical form $C$,

$$C = \begin{bmatrix} C_{m_1}(\lambda_1) & & & \\ & \ddots & & \\ & & C_{m_{r-1}}(\lambda_{r-1}) & \\ & & & C_{m_r}(\lambda_r)\end{bmatrix}$$

where $C_m(\lambda)$ is an $m\times m$ submatrix of the form

$$C_m(\lambda) = \begin{bmatrix}\lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda\end{bmatrix}$$

The number of independent eigenvectors is equal to the number of submatrices. If an eigenvalue $\lambda$ has multiplicity $m$, there can be one large submatrix $C_m(\lambda)$, but also a number $k$ of smaller submatrices $C_{b_j}(\lambda)$ such that $\sum_{j=1}^{k}b_j = m$. This illustrates, as mentioned in art. 1, the much higher complexity of the eigenproblem in case of multiple eigenvalues. For more details we refer to Wilkinson (1965).

5. The companion matrix of the characteristic polynomial (A.3) of $A$ is defined as

$$C = \begin{bmatrix}(-1)^{n-1}c_{n-1} & (-1)^{n-1}c_{n-2} & \cdots & (-1)^{n-1}c_1 & (-1)^{n-1}c_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0\end{bmatrix}$$

Expanding $\det(C - \lambda I)$ in cofactors of the first row yields $\det(C - \lambda I) = c(\lambda)$. If $A$ has distinct eigenvalues, both $A$ and $C$ are similar to $\operatorname{diag}(\lambda_i)$. It has been shown that the similarity transform $H$ for $A$ equals $H = X$. The similarity transform for $C$ is the Vandermonde matrix $V(\lambda)$, where

$$V(x) = \begin{bmatrix}x_1^{n-1} & x_2^{n-1} & \cdots & x_{n-1}^{n-1} & x_n^{n-1} \\ x_1^{n-2} & x_2^{n-2} & \cdots & x_{n-1}^{n-2} & x_n^{n-2} \\ \vdots & \vdots & & \vdots & \vdots \\ x_1 & x_2 & \cdots & x_{n-1} & x_n \\ 1 & 1 & \cdots & 1 & 1\end{bmatrix}$$

The Vandermonde matrix $V(\lambda)$ is clearly non-singular if all eigenvalues are distinct. Furthermore,

$$V(\lambda)\operatorname{diag}(\lambda_i) = \begin{bmatrix}\lambda_1^{n} & \lambda_2^{n} & \cdots & \lambda_n^{n} \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \\ \vdots & \vdots & & \vdots \\ \lambda_1^{2} & \lambda_2^{2} & \cdots & \lambda_n^{2} \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n\end{bmatrix}$$

while

$$CV(\lambda) = \begin{bmatrix}(-1)^{n-1}c(\lambda_1)+\lambda_1^{n} & (-1)^{n-1}c(\lambda_2)+\lambda_2^{n} & \cdots & (-1)^{n-1}c(\lambda_n)+\lambda_n^{n} \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \\ \vdots & \vdots & & \vdots \\ \lambda_1^{2} & \lambda_2^{2} & \cdots & \lambda_n^{2} \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n\end{bmatrix}$$

Since $c(\lambda_j) = 0$, it follows that $CV(\lambda) = V(\lambda)\operatorname{diag}(\lambda_i)$, which demonstrates the claim. Hence, the eigenvector $x_k$ of $C$ belonging to the eigenvalue $\lambda_k$ is

$$x_k^T = \begin{bmatrix}\lambda_k^{n-1} & \lambda_k^{n-2} & \cdots & \lambda_k & 1\end{bmatrix}$$

6. When left-multiplying (A.1) by $A$, we obtain $A^2x = \lambda Ax = \lambda^2 x$ or, in general for any integer $k \ge 0$,

$$A^k x = \lambda^k x \qquad (A.16)$$

Since any eigenvalue satisfies its characteristic polynomial, $c(\lambda) = 0$, we directly find from (A.16) that the matrix $A$ satisfies its own characteristic equation,

$$c(A) = 0 \qquad (A.17)$$

This result is the Cayley-Hamilton theorem. There exist several other proofs of the Cayley-Hamilton theorem.

7. Consider an arbitrary matrix polynomial in $\lambda$,

$$F(\lambda) = \sum_{k=0}^{m}F_k\lambda^k$$

where all $F_k$ are $n\times n$ matrices and $F_m \ne O$. Any matrix polynomial $F(\lambda)$ can be right- and left-divided by another (non-zero) matrix polynomial $B(\lambda)$ in a unique way, as proved in Gantmacher (1959a, Chapter IV). Hence, the left-quotient and left-remainder, $F(\lambda) = B(\lambda)Q_L(\lambda) + L(\lambda)$, and the right-quotient and right-remainder, $F(\lambda) = Q_R(\lambda)B(\lambda) + R(\lambda)$, are unique. Let us concentrate on the right-remainder in the case where $B(\lambda) = \lambda I - A$ is a linear polynomial in $\lambda$. Using Euclid's division scheme for polynomials,

$$F(\lambda) = F_m\lambda^{m-1}(\lambda I - A) + (F_m A + F_{m-1})\lambda^{m-1} + \sum_{k=0}^{m-2}F_k\lambda^k$$
$$= \left[F_m\lambda^{m-1} + (F_m A + F_{m-1})\lambda^{m-2}\right](\lambda I - A) + \left(F_m A^2 + F_{m-1}A + F_{m-2}\right)\lambda^{m-2} + \sum_{k=0}^{m-3}F_k\lambda^k$$

and, continuing, we arrive at

$$F(\lambda) = \left[F_m\lambda^{m-1} + \cdots + \lambda^{k-1}\sum_{j=k}^{m}F_j A^{j-k} + \cdots + \sum_{j=1}^{m}F_j A^{j-1}\right](\lambda I - A) + \sum_{j=0}^{m}F_j A^j$$

In summary, $F(\lambda) = Q_R(\lambda)(\lambda I - A) + R(\lambda)$ (and similarly for the left-quotient and left-remainder) with

$$Q_R(\lambda) = \sum_{k=1}^{m}\lambda^{k-1}\sum_{j=k}^{m}F_j A^{j-k}, \qquad Q_L(\lambda) = \sum_{k=1}^{m}\lambda^{k-1}\sum_{j=k}^{m}A^{j-k}F_j$$
$$R(\lambda) = \sum_{j=0}^{m}F_j A^j = F(A), \qquad L(\lambda) = \sum_{j=0}^{m}A^j F_j \qquad (A.18)$$

and where the right-remainder is independent of $\lambda$. The Generalized Bézout Theorem states that the polynomial $F(\lambda)$ is divisible by $(\lambda I - A)$ on the right (left) if and only if $F(A) = O$ ($L(\lambda) = O$).
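Both art. 5 and the Cayley-Hamilton theorem (A.17) are easily checked numerically; a small Python/NumPy sketch with an example polynomial of our own choosing:

```python
import numpy as np

# companion matrix (art. 5) of c(x) = x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3)
C = np.array([[6.0, -11.0, 6.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
roots = np.sort(np.linalg.eigvals(C).real)   # eigenvalues = zeros of c

# Cayley-Hamilton (A.17): C satisfies its own characteristic polynomial
residual = C @ C @ C - 6 * (C @ C) + 11 * C - 6 * np.eye(3)
```

The eigenvalues of the companion matrix reproduce the polynomial zeros 1, 2, 3, and the Cayley-Hamilton residual vanishes.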
By the Generalized Bézout Theorem, the polynomial $F(\lambda) = g(\lambda)I - g(A)$ is divisible by $(\lambda I - A)$ because $F(A) = g(A) - g(A) = O$. If $F(\lambda)$ is an ordinary polynomial, the right- and left-quotient and remainder are equal. The Cayley-Hamilton Theorem (A.17) states that $c(A) = 0$, which indicates that $c(\lambda)I = Q(\lambda)(\lambda I - A)$ and also $c(\lambda)I = (\lambda I - A)Q(\lambda)$. The matrix $Q(\lambda) = (\lambda I - A)^{-1}c(\lambda)$ is called the adjoint matrix of $A$. Explicitly, from (A.18),

$$Q(\lambda) = \sum_{k=1}^{n}\lambda^{k-1}\left(\sum_{j=k}^{n}c_j A^{j-k}\right)$$

and, with (A.6), $Q(0) = (-A)^{-1}\det A = \sum_{j=1}^{n}c_j A^{j-1}$. The main theoretical interest of the adjoint matrix stems from its definition $c(\lambda)I = Q(\lambda)(\lambda I - A) = (\lambda I - A)Q(\lambda)$ in case $\lambda = \lambda_k$ is an eigenvalue of $A$. Then, $(\lambda_k I - A)Q(\lambda_k) = 0$, which indicates by (A.1) that every non-zero column of the adjoint matrix $Q(\lambda_k)$ is an eigenvector belonging to the eigenvalue $\lambda_k$. In addition, by differentiation with respect to $\lambda$, we obtain

$$c'(\lambda)I = (\lambda I - A)Q'(\lambda) + Q(\lambda)$$

This demonstrates that, if $Q(\lambda_k) \ne O$, the eigenvalue $\lambda_k$ is a simple root of $c(\lambda)$ and, conversely, if $Q(\lambda_k) = O$, the eigenvalue $\lambda_k$ has higher multiplicity. The adjoint matrix $Q(\lambda) = (\lambda I - A)^{-1}c(\lambda)$ is computed by observing that, by the Generalized Bézout Theorem, $\frac{c(\lambda)-c(\mu)}{\lambda-\mu}$ is divisible without remainder. By replacing $\lambda$ and $\mu$ in this polynomial by $\lambda I$ and $A$ respectively, $Q(\lambda)$ readily follows, as illustrated in Section A.4.2.

8. Consider an arbitrary polynomial of degree $l$,

$$g(x) = g_0\prod_{j=1}^{l}(x - \zeta_j)$$

Substituting $x$ by $A$ gives

$$g(A) = g_0\prod_{j=1}^{l}(A - \zeta_j I)$$

Since $\det(AB) = \det A\,\det B$ and $\det(kA) = k^n\det A$, we have

$$\det(g(A)) = g_0^n\prod_{j=1}^{l}\det(A - \zeta_j I) = g_0^n\prod_{j=1}^{l}c(\zeta_j)$$

With (A.5),

$$\det(g(A)) = g_0^n\prod_{j=1}^{l}\prod_{k=1}^{n}(\lambda_k - \zeta_j) = \prod_{k=1}^{n}g_0\prod_{j=1}^{l}(\lambda_k - \zeta_j) = \prod_{k=1}^{n}g(\lambda_k)$$

If $h(x) = g(x) - \lambda$, we arrive at the general result: for any polynomial $g(x)$, the eigenvalues of $g(A)$ are $g(\lambda_1), \ldots, g(\lambda_n)$ and the characteristic polynomial is

$$\det(g(A) - \lambda I) = \prod_{k=1}^{n}(g(\lambda_k) - \lambda) \qquad (A.19)$$

which is a polynomial in $\lambda$ of degree $n$.
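The spectral mapping result (A.19) can be illustrated as follows (a Python/NumPy sketch with an arbitrary example matrix and polynomial of our own choosing):

```python
import numpy as np

# spectral mapping (A.19): the eigenvalues of g(A) are g(lambda_k)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])                 # triangular, so eigenvalues 2 and 3
gA = A @ A - 4 * A + 5 * np.eye(2)         # g(x) = x^2 - 4x + 5
gA_eigs = np.sort(np.linalg.eigvals(gA).real)
expected = np.sort(np.array([x * x - 4 * x + 5 for x in (2.0, 3.0)]))
```

Here $g(2) = 1$ and $g(3) = 2$, which the computed spectrum of $g(A)$ reproduces.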
Since the result holds for an arbitrary polynomial, it should not surprise that, under appropriate conditions of convergence, it can be extended to infinite polynomials, in particular to the Taylor series of a complex function. As proved in Gantmacher (1959a, Chapter V), if the power series of a function $f(z)$ around $z = z_0$,

$$f(z) = \sum_{j=0}^{\infty}f_j(z_0)(z - z_0)^j \qquad (A.20)$$

converges for all $z$ in the disc $|z - z_0| < R$, then $f(A) = \sum_{j=0}^{\infty}f_j(z_0)(A - z_0 I)^j$, provided all eigenvalues of $A$ lie within the region of convergence of (A.20), i.e. $|\lambda - z_0| < R$. For example,

$$e^{Az} = \sum_{k=0}^{\infty}\frac{z^k A^k}{k!} \qquad \text{for all } A$$

$$\log A = \sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{k}(A - I)^k \qquad \text{for } |\lambda_k - 1| < 1, \text{ all } 1 \le k \le n$$

and, from (A.19), the eigenvalues of $e^{Az}$ are $e^{\lambda_1 z}, \ldots, e^{\lambda_n z}$. Hence, the knowledge of the eigenstructure of a matrix $A$ allows us to compute any function of $A$ (under the same convergence restrictions as for complex numbers $z$).

A.2 Hermitian and real symmetric matrices

A Hermitian matrix $A$ is a complex matrix that obeys $A^H = (A^*)^T = A$, where $(A^*)_{ij} = a_{ij}^*$ is the complex conjugate of $a_{ij}$. Hermitian matrices possess a number of attractive properties. A particularly interesting subclass of the Hermitian matrices are the real symmetric matrices, which obey $A^T = A$. The inner product of the vectors $y$ and $x$ is defined as $y^H x$ and obeys $\left(y^H x\right)^H = x^H y$. The inner product $x^H x = \sum_{j=1}^{n}|x_j|^2$ is real and positive for all vectors except for the null vector.

9. The eigenvalues of a Hermitian matrix are all real. Indeed, left-multiplying (A.1) by $x^H$ yields

$$x^H A x = \lambda\,x^H x$$

and, since $\left(x^H A x\right)^H = x^H A^H x = x^H A x$, it follows that $\lambda\,x^H x = \lambda^*\,x^H x$, or $\lambda = \lambda^*$, because $x^H x$ is a positive real number. Furthermore, since $A = A^H$, we have $A^H x = \lambda x$. Taking the complex conjugate yields $A^T x^* = \lambda x^*$. In general, the eigenvectors of a Hermitian matrix are complex, but they are real for a real symmetric matrix since $A^H = A^T$. Moreover, the left-eigenvector $y^T$ is the complex conjugate of the right-eigenvector $x$.
Hence, the orthogonality relation (A.10) reduces, after normalization, to an inner product

$$x_k^H x_j = \delta_{kj} \qquad (A.21)$$

where $\delta_{kj}$ is the Kronecker delta, which is zero if $k \ne j$, while $\delta_{kk} = 1$. Consequently, (A.13) reduces to $X^H X = I$, which implies that the matrix $X$ formed by the eigenvectors is a unitary matrix ($X^{-1} = X^H$). For a real symmetric matrix $A$, the corresponding relation $X^T X = I$ implies that $X$ is an orthogonal matrix ($X^{-1} = X^T$). Although the arguments so far (see Section A.1) have assumed that the eigenvalues of $A$ are distinct, the theorem applies in general (as proved in Wilkinson (1965, Section 47)): for any Hermitian matrix $A$, there exists a unitary matrix $U$ such that

$$U^H A U = \operatorname{diag}(\lambda_j), \qquad \lambda_j \text{ real}$$

and for any real symmetric matrix $A$, there exists an orthogonal matrix $U$ such that

$$U^T A U = \operatorname{diag}(\lambda_j), \qquad \lambda_j \text{ real}$$

10. To a real symmetric matrix $A$, a bilinear form $x^T A y$ is associated, which is a scalar defined as

$$x^T A y = y^T A x = \sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_i y_j$$

We call a bilinear form a quadratic form if $y = x$. A necessary and sufficient condition for a quadratic form to be positive definite, i.e. $x^T A x > 0$ for all $x \ne 0$, is that all eigenvalues of $A$ are positive. Indeed, art. 9 shows the existence of an orthogonal matrix $U$ that transforms $A$ to a diagonal form. Let $x = Uz$; then

$$x^T A x = z^T U^T A U z = \sum_{k=1}^{n}\lambda_k z_k^2 \qquad (A.22)$$

which is positive for all $z$ only provided $\lambda_k > 0$ for all $k$. From (A.6), a positive definite quadratic form $x^T A x$ possesses a positive determinant, $\det A > 0$. This analysis shows that the problem of determining an orthogonal matrix $U$ (or the eigenvectors of $A$) is equivalent to the geometrical problem of determining the principal axes of the hyper-ellipsoid

$$\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_i x_j = 1$$

Relation (A.22) illustrates that the eigenvalues $\lambda_k$ are the reciprocals of the squares of the principal semi-axes, whose lengths are $1/\sqrt{\lambda_k}$. A multiple eigenvalue refers to an indeterminacy of the principal axes.
For example, if $n = 3$, an ellipsoid with two equal principal axes means that any section perpendicular to the third axis is a circle. Any two perpendicular diameters of the largest circle orthogonal to the third axis are principal axes of that ellipsoid.

A.3 Vector and matrix norms

Vector and matrix norms, denoted by $\|x\|$ and $\|A\|$ respectively, provide a single number reflecting a "size" of the vector or matrix, and may be regarded as an extension of the concept of the modulus of a complex number. A norm is a certain function of the vector components or matrix elements. All norms, vector as well as matrix norms, satisfy the three distance relations:

(i) $\|x\| > 0$ unless $x = 0$
(ii) $\|\alpha x\| = |\alpha|\,\|x\|$ for any complex number $\alpha$
(iii) $\|x + y\| \le \|x\| + \|y\|$

In general, the Hölder $q$-norm of a vector $x$ is defined as

$$\|x\|_q = \left(\sum_{j=1}^{n}|x_j|^q\right)^{1/q} \qquad (A.23)$$

For example, the well-known Euclidean norm or length of the vector $x$ is found for $q = 2$, and $\|x\|_2^2 = x^H x$. In probability theory, where $x$ denotes a discrete pdf, the law of total probability states that $\|x\|_1 = \sum_{j=1}^{n}x_j = 1$, and we will write $\|x\|_1 = \|x\|$. Finally, $\max_j|x_j| = \lim_{q\to\infty}\|x\|_q = \|x\|_\infty$. The unit spheres $S_q = \{x \mid \|x\|_q = 1\}$ are, in three dimensions ($n = 3$): for $q = 1$, an octahedron; for $q = 2$, a ball; and for $q = \infty$, a cube. Furthermore, $S_1$ fits into $S_2$, which in turn fits into $S_\infty$, which implies that $\|x\|_1 \ge \|x\|_2 \ge \|x\|_\infty$ for any $x$.

The Hölder inequality proved in Section 5.5 states that, for $\frac{1}{p} + \frac{1}{q} = 1$ and real $p, q > 1$,

$$\left|x^H y\right| \le \|x\|_p\,\|y\|_q \qquad (A.24)$$

A special case of the Hölder inequality, with $p = q = 2$, is the Cauchy-Schwarz inequality

$$\left|x^H y\right| \le \|x\|_2\,\|y\|_2 \qquad (A.25)$$

The $q = 2$ norm is invariant under a unitary (hence also orthogonal) transformation $U$, where $U^H U = I$, because $\|Ux\|_2^2 = x^H U^H U x = x^H x = \|x\|_2^2$. Another example of a non-homogeneous vector norm is the quadratic form $\|x\|_A = \sqrt{x^T A x}$, provided $A$ is positive definite.
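The ordering $\|x\|_1 \ge \|x\|_2 \ge \|x\|_\infty$ and the inequalities (A.24)-(A.25) are easily verified numerically (a Python/NumPy sketch on random data; the naming is ours):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(8)
y = rng.standard_normal(8)

# Hölder norms (A.23) for q = 1, 2 and infinity
n1 = np.abs(x).sum()
n2 = np.sqrt((x ** 2).sum())
ninf = np.abs(x).max()

# Hölder inequality (A.24) with p = 3, q = 3/2, and Cauchy-Schwarz (A.25)
p, q = 3.0, 1.5
holder_rhs = (np.abs(x) ** p).sum() ** (1 / p) * (np.abs(y) ** q).sum() ** (1 / q)
cs_rhs = n2 * np.sqrt((y ** 2).sum())
inner = abs(x @ y)
```

Any other random draw obeys the same inequalities, since they hold for all vectors.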
Relation (A.22) shows that, if not all eigenvalues $\lambda_j$ of $A$ are the same, not all components of the vector $x$ are weighted similarly and, thus, in general, $\|x\|_A$ is a non-homogeneous norm. The quadratic form $\|x\|_I$ equals the homogeneous Euclidean norm $\|x\|_2$.

A.3.1 Properties of norms

All norms are equivalent in the sense that there exist positive real numbers $c_1$ and $c_2$ such that, for all $x$,

$$c_1\|x\|_p \le \|x\|_q \le c_2\|x\|_p$$

For example,

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2 \qquad \|x\|_\infty \le \|x\|_1 \le n\,\|x\|_\infty \qquad \|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty$$

By choosing in the Hölder inequality (5.15) $x_j \to \omega_j|x_j|^{v\alpha}$ and $y_j \to \omega_j > 0$ for real $v > 0$, we obtain, with $0 < \alpha < 1$, an inequality for the weighted norm,

$$\left(\frac{\sum_{j=1}^{n}\omega_j|x_j|^{v\alpha}}{\sum_{j=1}^{n}\omega_j}\right)^{\frac{1}{v\alpha}} \le \left(\frac{\sum_{j=1}^{n}\omega_j|x_j|^{v}}{\sum_{j=1}^{n}\omega_j}\right)^{\frac{1}{v}}$$

For $\omega_j = 1$, the weights disappear, such that the inequality for the Hölder norm becomes

$$\|x\|_{v\alpha} \le \|x\|_v\;n^{\frac{1}{v}\left(\frac{1}{\alpha}-1\right)}$$

where $n^{\frac{1}{v}\left(\frac{1}{\alpha}-1\right)} \ge 1$. On the other hand, with $0 < \alpha < 1$ and for real $v > 0$,

$$\frac{\|x\|_v}{\|x\|_{v\alpha}} = \left(\sum_{j=1}^{n}\left(\frac{|x_j|^{v\alpha}}{\sum_{k=1}^{n}|x_k|^{v\alpha}}\right)^{\frac{1}{\alpha}}\right)^{\frac{1}{v}}$$

Since $y_j = \frac{|x_j|^{v\alpha}}{\sum_{k=1}^{n}|x_k|^{v\alpha}} \le 1$ and $\frac{1}{\alpha} > 1$, it holds that $y_j^{1/\alpha} \le y_j$ and

$$\left(\sum_{j=1}^{n}y_j^{\frac{1}{\alpha}}\right)^{\frac{1}{v}} \le \left(\sum_{j=1}^{n}y_j\right)^{\frac{1}{v}} = 1$$

which leads to the opposite inequality (without the normalization $n^{\frac{1}{v}(\frac{1}{\alpha}-1)}$),

$$\|x\|_v \le \|x\|_{v\alpha}$$

In summary, if $p > q > 0$, the general inequality for the Hölder $q$-norm is

$$\|x\|_p \le \|x\|_q \le \|x\|_p\;n^{\frac{1}{q}-\frac{1}{p}} \qquad (A.26)$$

For $m\times n$ matrices $A$, the most frequently used norms are the Euclidean or Frobenius norm

$$\|A\|_F = \left(\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2\right)^{1/2} \qquad (A.27)$$

and the $q$-norm

$$\|A\|_q = \sup_{x\ne 0}\frac{\|Ax\|_q}{\|x\|_q} \qquad (A.28)$$

By the second distance relation, $\frac{\|Ax\|_q}{\|x\|_q} = \left\|A\frac{x}{\|x\|_q}\right\|_q$, which shows that

$$\|A\|_q = \sup_{\|x\|_q=1}\|Ax\|_q \qquad (A.29)$$

Furthermore, the matrix $q$-norm (A.28) implies that

$$\|Ax\|_q \le \|A\|_q\,\|x\|_q \qquad (A.30)$$

Since the vector norm is a continuous function of the vector components and since the domain $\|x\|_q = 1$ is closed, there
must exist a vector $x$ for which the equality $\|Ax\|_q = \|A\|_q\,\|x\|_q$ holds. Since the $i$-th vector component of $Ax$ is $(Ax)_i = \sum_{j=1}^{n}a_{ij}x_j$, it follows from (A.23) that

$$\|Ax\|_q = \left(\sum_{i=1}^{m}\left|\sum_{j=1}^{n}a_{ij}x_j\right|^q\right)^{1/q}$$

For example, for all $x$ with $\|x\|_1 = 1$, we have that

$$\|Ax\|_1 = \sum_{i=1}^{m}\left|\sum_{j=1}^{n}a_{ij}x_j\right| \le \sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}||x_j| = \sum_{j=1}^{n}|x_j|\sum_{i=1}^{m}|a_{ij}| \le \left(\sum_{j=1}^{n}|x_j|\right)\max_j\sum_{i=1}^{m}|a_{ij}| = \max_j\sum_{i=1}^{m}|a_{ij}|$$

Clearly, there exists a vector $x$ for which equality holds: if $k$ is the column in $A$ with maximum absolute column sum, then $x = e_k$, the $k$-th basis vector with all components zero, except for the $k$-th one, which is 1. Similarly, for all $x$ with $\|x\|_\infty = 1$,

$$\|Ax\|_\infty = \max_i\left|\sum_{j=1}^{n}a_{ij}x_j\right| \le \max_i\sum_{j=1}^{n}|a_{ij}||x_j| \le \max_i\sum_{j=1}^{n}|a_{ij}|$$

Again, if $r$ is the row with maximum absolute sum and $x_j = \operatorname{sign}(a_{rj})$, such that $\|x\|_\infty = 1$, then $(Ax)_r = \sum_{j=1}^{n}|a_{rj}| = \max_i\sum_{j=1}^{n}|a_{ij}| = \|Ax\|_\infty$. Hence, we have proved that

$$\|A\|_\infty = \max_i\sum_{j=1}^{n}|a_{ij}| \qquad (A.31)$$
$$\|A\|_1 = \max_j\sum_{i=1}^{m}|a_{ij}| \qquad (A.32)$$

from which $\|A^H\|_1 = \|A\|_\infty$.

The $q = 2$ matrix norm $\|A\|_2$ is obtained differently. Consider

$$\|Ax\|_2^2 = (Ax)^H Ax = x^H A^H A x$$

Since $A^H A$ is a Hermitian matrix, art. 9 shows that all its eigenvalues are real; they are non-negative because a norm $\|Ax\|_2^2 \ge 0$. These ordered eigenvalues are denoted by $\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_n^2 \ge 0$. Applying the theorem in art. 9, there exists a unitary matrix $U$ such that $x = Uz$ yields

$$x^H A^H A x = z^H U^H A^H A U z = z^H\operatorname{diag}\left(\sigma_j^2\right)z \le \sigma_1^2\,z^H z = \sigma_1^2\,\|z\|_2^2$$

Since the $q = 2$ norm is invariant under a unitary (orthogonal) transform, $\|x\|_2 = \|z\|_2$, and by the definition (A.28),

$$\|A\|_2 = \sup_{x\ne 0}\frac{\|Ax\|_2}{\|x\|_2} = \sigma_1 \qquad (A.33)$$

where the supremum is achieved if $x$ is the eigenvector of $A^H A$ belonging to $\sigma_1^2$. Meyer (2000, p. 279) proves the corresponding result for the minimum eigenvalue, provided that $A$ is non-singular,

$$\left\|A^{-1}\right\|_2^{-1} = \sigma_n = \min_{\|x\|_2=1}\|Ax\|_2$$

The non-negative quantity $\sigma_j$ is called the $j$-th singular value and $\sigma_1$ is the largest singular value of $A$. The importance of this result lies in an extension of the eigenvalue problem to non-square matrices, which is called the singular value decomposition. A detailed discussion is found in Golub and Van Loan (1983).

If $A$ has real eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, the above can be simplified and we obtain

$$\lambda_1 = \sup_{x\ne 0}\frac{x^T A x}{x^T x} \qquad (A.34)$$

$$\lambda_n = \inf_{x\ne 0}\frac{x^T A x}{x^T x} \qquad (A.35)$$

because, for any $x$, it holds that $\lambda_n\,x^T x \le x^T A x \le \lambda_1\,x^T x$. The Frobenius norm obeys $\|A\|_F^2 = \operatorname{trace}\left(A^H A\right)$. With (A.7) and the analysis of $A^H A$ above,

$$\|A\|_F^2 = \sum_{k=1}^{n}\sigma_k^2 \qquad (A.36)$$

In view of (A.33), the bounds $\|A\|_2 \le \|A\|_F \le \sqrt{n}\,\|A\|_2$ may be attained.

A.3.2 Applications of norms

(a) Since $\left\|A^k\right\| = \left\|AA^{k-1}\right\| \le \|A\|\,\left\|A^{k-1}\right\|$, we have by induction, for any integer $k$,

$$\left\|A^k\right\| \le \|A\|^k \qquad \text{and} \qquad \lim_{k\to\infty}A^k = 0 \text{ if } \|A\| < 1$$

(b) By taking the norm of the eigenvalue equation (A.1), $\|Ax\| = |\lambda|\,\|x\|$ and, with (A.30),

$$|\lambda| \le \|A\|_q \qquad (A.37)$$

Applied to $A^H A$, for any $q$-norm,

$$\sigma_1^2 \le \left\|A^H A\right\|_q \le \left\|A^H\right\|_q\,\|A\|_q$$

Choosing $q = 1$ and using (A.33),

$$\|A\|_2^2 \le \left\|A^H\right\|_1\,\|A\|_1 = \|A\|_\infty\,\|A\|_1$$

(c) Any matrix $A$ can be transformed by a similarity transform $H$ to a Jordan canonical form $C$ (art. 4) as $A = HCH^{-1}$, from which $A^k = HC^kH^{-1}$. The entries of a typical Jordan submatrix power $(C_m(\lambda))^k$ are of the form $\binom{k}{j}\lambda^{k-j}$ with $0 \le j \le m-1$, i.e. a factor $\lambda^{k-m+1}$ times terms growing at most polynomially in $k$. Hence, for large $k$, $A^k \to 0$ if and only if $|\lambda| < 1$ for all eigenvalues $\lambda$.

A.4 Stochastic matrices

A probability matrix $P$ is reducible if there is a relabeling of the states that leads to

$$\tilde{P} = \begin{bmatrix}P_1 & B \\ O & P_2\end{bmatrix}$$

where $P_1$ and $P_2$ are square matrices. Relabeling amounts to permuting rows and columns in the same fashion. Thus, there exists a similarity transform $H$ such that $P = H\tilde{P}H^{-1}$.

A.4.1 The eigenstructure

In this section, the basic theorem on the eigenstructure of a stochastic, irreducible matrix will be proved.
Lemma A.4.1 If $P$ is an irreducible non-negative matrix and if $v$ is a non-zero, non-negative vector, then the vector $z = (P + I)v$ always has fewer zero components than $v$.

Proof: Denote

$$v = \begin{bmatrix}v_1 \\ 0\end{bmatrix} \qquad \text{and} \qquad z = \begin{bmatrix}z_1 \\ z_2\end{bmatrix} \qquad \text{with } v_1 > 0$$

which is always possible by suitable renumbering of the states, and partition correspondingly

$$P = \begin{bmatrix}P_{11} & P_{12} \\ P_{21} & P_{22}\end{bmatrix}$$

The relation $z = (P + I)v$ is written as

$$\begin{bmatrix}z_1 \\ z_2\end{bmatrix} = \begin{bmatrix}P_{11}v_1 \\ P_{21}v_1\end{bmatrix} + \begin{bmatrix}v_1 \\ 0\end{bmatrix}$$

Since $P$ is irreducible, $P_{21} \ne O$, such that $v_1 > 0$ implies that $P_{21}v_1 \ne 0$, which proves the lemma. □

Observe, in addition, that no component of $z$ is ever smaller than the corresponding component of $v$. Also, transposing does not alter the result.

Theorem A.4.2 (Frobenius) The moduli of all eigenvalues of an irreducible stochastic matrix $P$ are less than or equal to 1. There is only one real eigenvalue $\lambda = 1$, and the corresponding eigenvector has positive components.

Proof: The $q = \infty$ norm (A.31) of a probability matrix $P$ with $n$ states, defined by (9.7) subject to (9.8), precisely equals $\|P\|_\infty = 1$. From (A.37), it follows that all eigenvalues are, in absolute value, smaller than or equal to 1. Since all elements $P_{ij} \in [0,1]$ and because an irreducible matrix has no zero rows, $v^T P$ has positive components if $v^T$ has positive components. Thus, there always exists a scalar $\sigma_v = \min_{1\le k\le n}\frac{\left(v^T P\right)_k}{\left(v^T\right)_k} > 0$ such that $\sigma_v v^T \le v^T P$. By Lemma A.4.1, we can always transform the vector $v$ into a vector $z$ by right-multiplying both sides with $(I + P)$, such that

$$\sigma_v\,v^T(I + P) \le v^T P(I + P) \qquad \Longrightarrow \qquad \sigma_v\,z^T \le z^T P$$

and, by the definition of $\sigma_z$, $\sigma_v \le \sigma_z$, since the components of $z$ are never smaller than those of $v$. Hence, for any arbitrary vector $v$ with positive components, the transform in Lemma A.4.1 leads to an increasing sequence $\sigma_v \le \sigma_z \le \cdots$, which is bounded by 1 because no eigenvalue can exceed 1. This shows that $\lambda = 1$ is the largest eigenvalue and that the corresponding eigenvector $y^T$ has positive components. This eigenvector $y^T$ is unique.
For, if there were another linearly independent eigenvector $w^T$ corresponding to the eigenvalue $\lambda = 1$, any linear combination $z^T = \alpha y^T + \beta w^T$ would also be an eigenvector belonging to $\lambda = 1$. But $\alpha$ and $\beta$ can always be chosen to produce a zero component, which the transform method shows to be impossible. The fact that the eigenvector $y^T$ is the only eigenvector belonging to $\lambda = 1$ implies that the eigenvalue $\lambda = 1$ is a single zero of the characteristic polynomial of $P$. □

The theorem proved for stochastic matrices is a special case of the famous Frobenius theorem for non-negative matrices (see for a proof, e.g., Gantmacher (1959b, Chapter XIII)). We note that, in the theory of Markov chains, the interest lies in the determination of the left-eigenvector $y^T = \pi$ belonging to $\lambda = 1$, because the right-eigenvector $x$ of $P$ belonging to $\lambda = 1$ equals $x = \alpha u$ with $u^T = [1\;1\;\cdots\;1]$, where $\alpha$ is a scalar, because of the constraints (9.8). Recalling (A.10) and (A.13), the proper normalization, $y^T x = 1$, precisely corresponds to the total law of probability. Using the interpretation of Markov chains, an alternative argument is possible. If all eigenvalues obeyed $|\lambda| < 1$, application (c) in Section A.3.2 indicates that the steady state would be non-existent, because $P^k \to 0$ for $k \to \infty$. Since this is impossible, there must be at least one eigenvalue with $|\lambda| = 1$. Furthermore, (9.22) shows that at least one eigenvalue corresponding to the steady state is real and precisely 1.

Corollary A.4.3 An irreducible probability matrix $P$ cannot have two linearly independent eigenvectors with positive components.

Proof: Consider, apart from $y^T = \pi$ belonging to $\lambda = 1$, another left-eigenvector $w^T$ belonging to an eigenvalue $\omega \ne 1$. By (A.10), $w^T u = 0$, which is only possible if not all components of $w^T$ are positive. □

The corollary is important because no eigenvector of $P$ other than $y^T = \pi$ can represent a (discrete) probability density.
Since the null vector is never an eigenvector, the corollary implies that at least one component in each of the other eigenvectors must be negative. Since the characteristic polynomial of $P$ has real coefficients (because $P_{ij}$ is real), the eigenvalues occur in complex conjugate pairs. Since $\lambda = 1$ is an eigenvalue, for an even number of states $n$ there must be at least one other real eigenvalue obeying $-1 \le \lambda < 1$. It has been proved that the boundary of the locations of the eigenvalues inside the unit disc consists of a finite number of points on the unit circle joined by certain curvilinear arcs.

There exists an interesting property of a rank-one update $\bar{P}$ of a stochastic matrix $P$. The lemma is of a general nature and also applies to reducible Markov chains with several eigenvalues $\lambda_j = 1$ for $1 < j \le k$.

Lemma A.4.4 If $\{1, \lambda_2, \lambda_3, \ldots, \lambda_n\}$ are the eigenvalues of the stochastic matrix $P$, then the eigenvalues of $\bar{P} = \alpha P + (1-\alpha)uv^T$, where $v^T$ is any probability vector, are $\{1, \alpha\lambda_2, \alpha\lambda_3, \ldots, \alpha\lambda_n\}$.

Proof: We start from the eigenvalue equation (A.2),

$$\det\left(\bar{P} - \lambda I\right) = \det\left(\alpha P - \lambda I + (1-\alpha)uv^T\right) = \det\left((\alpha P - \lambda I)\left(I + (1-\alpha)(\alpha P - \lambda I)^{-1}uv^T\right)\right)$$
$$= \det(\alpha P - \lambda I)\,\det\left(I + (1-\alpha)(\alpha P - \lambda I)^{-1}uv^T\right)$$

Applying the formula

$$\det\left(I + cd^T\right) = 1 + d^T c \qquad (A.38)$$

which follows, after taking the determinant, from the matrix identity

$$\begin{bmatrix}I & 0 \\ d^T & 1\end{bmatrix}\begin{bmatrix}I + cd^T & c \\ 0 & 1\end{bmatrix}\begin{bmatrix}I & 0 \\ -d^T & 1\end{bmatrix} = \begin{bmatrix}I & c \\ 0 & 1 + d^Tc\end{bmatrix}$$

gives

$$\det\left(\bar{P} - \lambda I\right) = \det(\alpha P - \lambda I)\left(1 + (1-\alpha)\,v^T(\alpha P - \lambda I)^{-1}u\right)$$

Since the row sums of a stochastic matrix $P$ equal 1, we have that $Pu = u$ and, thus, $(\alpha P - \lambda I)u = (\alpha - \lambda)u$, from which $(\alpha P - \lambda I)^{-1}u = (\alpha - \lambda)^{-1}u$. Using this result leads to

$$1 + (1-\alpha)\,v^T(\alpha P - \lambda I)^{-1}u = 1 + \frac{1-\alpha}{\alpha-\lambda}\,v^T u = 1 + \frac{1-\alpha}{\alpha-\lambda} = \frac{1-\lambda}{\alpha-\lambda}$$

because a probability vector is normalized to 1, i.e. $v^T u = 1$. Hence, we end up with

$$\det\left(\bar{P} - \lambda I\right) = \det(\alpha P - \lambda I)\,\frac{1-\lambda}{\alpha-\lambda}$$

Invoking (A.19) yields

$$\det\left(\bar{P} - \lambda I\right) = \frac{1-\lambda}{\alpha-\lambda}\prod_{k=1}^{n}(\alpha\lambda_k - \lambda) = (1-\lambda)\prod_{k=2}^{n}(\alpha\lambda_k - \lambda)$$

which shows that the eigenvalues of $\bar{P}$ are $\{1, \alpha\lambda_2, \alpha\lambda_3, \ldots, \alpha\lambda_n\}$. □
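Lemma A.4.4 can be checked numerically (a Python/NumPy sketch; the stochastic matrix and the probability vector $v^T$ below are arbitrary examples of our own choosing):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])          # stochastic: rows sum to 1
alpha = 0.8
u = np.ones((3, 1))
v = np.array([[0.2, 0.5, 0.3]])          # any probability vector v^T

P_bar = alpha * P + (1 - alpha) * u @ v  # rank-one update; still stochastic

lam = np.sort_complex(np.linalg.eigvals(P))
lam_bar = np.sort_complex(np.linalg.eigvals(P_bar))
# the eigenvalue 1 is preserved; all others are scaled by alpha
```

Since the largest (sorted last) eigenvalue of $P$ is 1, the spectrum of $\bar{P}$ must be $\{1\}\cup\{\alpha\lambda_k\}_{k\ge 2}$.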
$\square$

A similar property holds in the special case where a Markov chain is supplemented by an additional state $N+1$ which connects to every other state and to which every other state is connected (such that $\bar P$ is irreducible). Then,
$$\bar P = \begin{pmatrix} \alpha P & (1-\alpha)\, u \\ v^T & 0 \end{pmatrix}$$
with corresponding eigenvalues $\{1, \alpha\lambda_2, \alpha\lambda_3, \ldots, \alpha\lambda_N, \alpha - 1\}$. This result is proved similarly to Lemma A.4.4 using (Meyer, 2000, p. 475)
$$\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det A\,\det\left(D - C A^{-1} B\right) \qquad (A.39)$$
provided $A^{-1}$ exists, unless $C = 0$.

A.4.2 Example: the two-state Markov chain

The two-state Markov chain is defined by
$$P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}$$
Observe that $\det P = 1-p-q$. The eigenvalues of $P$ satisfy the characteristic polynomial $c(\lambda) = \lambda^2 - (2-p-q)\lambda + \det P = 0$, from which $\lambda_1 = 1$ and $\lambda_2 = 1-p-q = \det P$. The adjoint matrix $C(\lambda)$ is computed (art. 7) via the polynomial $\frac{c(\lambda) - c(\mu)}{\lambda - \mu} = \lambda + \mu - (2-p-q)$ and, after $\lambda \to \lambda I$ and $\mu \to P$,
$$C(\lambda) = \lambda I + P - (2-p-q) I = \begin{pmatrix} \lambda - 1 + q & p \\ q & \lambda - 1 + p \end{pmatrix}$$
The (unscaled) right- (left-) eigenvectors of $P$ follow as the non-zero columns (rows) of $C(\lambda_k)$. For $\lambda_1 = 1$, we find $x_1 = (1, 1)^T$ and $y_1^T = (q, p)$. For $\lambda_2 = 1-p-q$, the eigenvector is $x_2 = (p, -q)^T$ and $y_2^T = (1, -1)$. Normalization (art. 4) requires that $y_k^T x_k = 1$, which is satisfied by scaling $x_1 = \frac{1}{p+q}(1, 1)^T$ and $x_2 = \frac{1}{p+q}(p, -q)^T$. If the eigenvalues are distinct ($p+q \neq 0$), the matrix $P$ can be written as (art. 4) $P = X \operatorname{diag}(\lambda_k)\, Y^T$,
$$P = \begin{pmatrix} 1 & p \\ 1 & -q \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1-p-q \end{pmatrix}\frac{1}{p+q}\begin{pmatrix} q & p \\ 1 & -1 \end{pmatrix}$$
from which any power $P^k$ is immediate as
$$P^k = \begin{pmatrix} 1 & p \\ 1 & -q \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & (1-p-q)^k \end{pmatrix}\frac{1}{p+q}\begin{pmatrix} q & p \\ 1 & -1 \end{pmatrix} = \frac{1}{p+q}\begin{pmatrix} q & p \\ q & p \end{pmatrix} + \frac{(1-p-q)^k}{p+q}\begin{pmatrix} p & -p \\ -q & q \end{pmatrix} \qquad (A.40)$$
The steady-state matrix $P^\infty = \lim_{k\to\infty} P^k$ follows as
$$P^\infty = \frac{1}{p+q}\begin{pmatrix} q & p \\ q & p \end{pmatrix} \qquad (A.41)$$
because $|1-p-q| < 1$. Alternatively, the steady-state vector $\pi = (\pi_1, \pi_2)$ is a solution of (9.25),
$$\begin{pmatrix} -p & q \\ 1 & 1 \end{pmatrix}\begin{pmatrix} \pi_1 \\ \pi_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
Applying Cramer's rule with $D = \det\begin{pmatrix} -p & q \\ 1 & 1 \end{pmatrix} = -(p+q)$, we obtain
$$\pi_1 = \frac{1}{D}\det\begin{pmatrix} 0 & q \\ 1 & 1 \end{pmatrix} = \frac{q}{p+q} \qquad\text{and}\qquad \pi_2 = \frac{1}{D}\det\begin{pmatrix} -p & 0 \\ 1 & 1 \end{pmatrix} = \frac{p}{p+q}$$
or $\pi = \left[\frac{q}{p+q}\;\;\frac{p}{p+q}\right]$, which indeed agrees with (A.41) and (9.37).
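The closed form (A.40) and the steady state (A.41) are easy to sanity-check numerically; a minimal sketch (with arbitrary values of $p$ and $q$) compares $P^k$, computed by repeated multiplication, against the spectral formula:

```python
# Sketch verifying (A.40) and (A.41) for the two-state chain; the values
# of p, q and k below are arbitrary example choices.
p, q, k = 0.3, 0.5, 25

def matmul(A, B):
    return [[sum(A[i][m]*B[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

P = [[1-p, p], [q, 1-q]]
Pk = [[1, 0], [0, 1]]
for _ in range(k):                    # P^k by repeated multiplication
    Pk = matmul(Pk, P)

s = p + q
r = (1 - p - q) ** k                  # the decaying eigenvalue term
formula = [[(q + p*r)/s, (p - p*r)/s],
           [(q - q*r)/s, (p + q*r)/s]]
for i in range(2):
    for j in range(2):
        assert abs(Pk[i][j] - formula[i][j]) < 1e-12
# steady state (A.41): rows of P^k tend to (q/(p+q), p/(p+q))
assert abs(Pk[0][0] - q/s) < 1e-6 and abs(Pk[0][1] - p/s) < 1e-6
```

Since $|1-p-q| < 1$, the term $r$ is negligible already for moderate $k$, which is the geometric convergence discussed in Section A.4.3 below.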
A.4.3 The tendency towards the steady state

A stochastic matrix $P$ and the corresponding Markov chain is called regular if the only eigenvalue with $|\lambda| = 1$ is $\lambda = 1$. It is fully regular if, in addition, $\lambda = 1$ is a simple zero of the characteristic polynomial of $P$. The Frobenius Theorem A.4.2 indicates that a regular matrix is necessarily irreducible. Application (c) in Section A.3.2 demonstrates that the steady state only exists for regular Markov chains. Alternatively, a regular matrix $P$ has the property that $P^k > O$ for some $k$, i.e. all elements are strictly positive.

In the sequel, we concentrate on fully regular stochastic matrices $P$, where all eigenvalues lie within the unit circle, except for the largest one, $\lambda_1 = 1$. If the $N$ eigenvalues of the regular stochastic matrix $P$ are ordered as $1 = \lambda_1 > |\lambda_2| \ge \cdots \ge |\lambda_N| \ge 0$, the second largest eigenvalue $\lambda_2$ will determine the speed of convergence of the Markov chain towards the steady state.

A.4.3.1 Example: the three-state Markov chain

The three-state Markov chain $P$ is defined by (9.7) with $N = 3$. Assuming that $P$ is irreducible, we determine the eigenvalues. Since the Frobenius Theorem A.4.2 already determines one eigenvalue $\lambda_1 = 1$, the remaining two, $\lambda_2$ and $\lambda_3$, are found from (A.6) and (A.7). They obey the equations
$$\lambda_2 \lambda_3 = \det P, \qquad \lambda_2 + \lambda_3 = P_{11} + P_{22} + P_{33} - 1 = \operatorname{trace}(P) - 1$$
or the quadratic equation $x^2 - (\lambda_2 + \lambda_3) x + \lambda_2\lambda_3 = 0$. The explicit solution is
$$\lambda_2 = \tfrac12\left(\operatorname{trace}(P) - 1\right) + \tfrac12\sqrt{\left(\operatorname{trace}(P) - 1\right)^2 - 4\det P}$$
$$\lambda_3 = \tfrac12\left(\operatorname{trace}(P) - 1\right) - \tfrac12\sqrt{\left(\operatorname{trace}(P) - 1\right)^2 - 4\det P}$$
All eigenvalues are real if the discriminant $\left(\operatorname{trace}(P) - 1\right)^2 - 4\det P$ is non-negative, which leads to three cases:

(a) If $\left(\operatorname{trace}(P) - 1\right)^2 > 4\det P$, the eigenvalues obey $1 > \lambda_2 > \lambda_3$, but not necessarily $1 > |\lambda_2| > |\lambda_3|$. The latter inequality is true if $\operatorname{trace}(P) > 1$, in which case the speed of convergence towards the steady state is determined by the decay of $(\lambda_2)^k$ as $k \to \infty$. If $\operatorname{trace}(P) = 1$, then $\lambda_2 = -\lambda_3 = \sqrt{-\det P}$, and if $\operatorname{trace}(P) < 1$, $|\lambda_3|$ determines the speed of convergence.
Notice that $\lambda_2 > \tfrac12\left(\operatorname{trace}(P) - 1\right) \ge -\tfrac12$.

(b) If $\left(\operatorname{trace}(P) - 1\right)^2 < 4\det P$, there are two complex conjugate roots $\lambda_2 = \alpha + i\beta$ and $\lambda_3 = \alpha - i\beta$, both with the same modulus $|\lambda_2| = |\lambda_3| = \sqrt{\alpha^2 + \beta^2} = \sqrt{\lambda_2\lambda_3} = \sqrt{\det P}$ and with real part $\alpha = \tfrac12\left(\operatorname{trace}(P) - 1\right)$. In this case, we have $0 \le \det P < 1$. Hence, the Markov chain converges towards the steady state as $\left(\sqrt{\det P}\right)^k$ in the discrete-time $k$.

(c) If $\left(\operatorname{trace}(P) - 1\right)^2 = 4\det P$, there is a double eigenvalue
$$\lambda = \lambda_2 = \lambda_3 = \tfrac12\left(\operatorname{trace}(P) - 1\right) = \pm\sqrt{\det P}$$
and $P$ cannot be reduced by a similarity transform $H$ to a diagonal matrix (Section A.1, art. 4), but only to the Jordan canonical form $C$, such that $P^k = H^{-1} C^k H$. Since (Meyer, 2000, pp. 599-600)
$$C^k = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}^k = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda^k & k\lambda^{k-1} \\ 0 & 0 & \lambda^k \end{pmatrix}$$
the Markov chain converges towards the steady state as $k\left(\sqrt{\det P}\right)^{k-1}$ in the discrete-time $k$. We observe that $-\tfrac12 \le \lambda_2 = \lambda_3 < 1$ because $0 \le \operatorname{trace}(P) < 3$. If $\operatorname{trace}(P) = 3$, then $P = I$, and $P$ is not irreducible.

The fastest possible convergence occurs when $\lambda_2 = \lambda_3 = 0$, or when $\det P = 0$ and $\operatorname{trace}(P) = 1$, in which case $P$ has rank 1. In any matrix of rank 1, all row vectors are linearly dependent. Since each row sum of a stochastic matrix $P$ is 1 by (9.8), every row in $P$ is precisely the same and (9.6) shows that, after one discrete-time step, the steady state $\pi$ (the common row of $P$) is reached. As shown in Section 9.3.1, a transition probability matrix with constant rows can be regarded as a limit transition probability matrix $A = \lim_{k\to\infty}\breve P^k$ of a Markov process with transition probability matrix $\breve P$.

A.5 Special types of stochastic matrices

A.5.1 Doubly stochastic matrices

A doubly stochastic matrix $P$ has both row and column sums equal to 1,
$$\sum_{k=1}^{N} P_{ik} = \sum_{k=1}^{N} P_{kj} = 1 \qquad\text{for all } i, j$$
If $P$ is symmetric, $P = P^T$, then $P$ is doubly stochastic, but the reverse implication is not true.
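A minimal sketch illustrates the definition (the circulant matrix below is an arbitrary example, not from the text): a doubly stochastic $P$ need not be symmetric, yet its steady-state vector is the uniform vector $\left[\tfrac1N \cdots \tfrac1N\right]$, as discussed next.

```python
# Hypothetical example for Section A.5.1: a circulant P is doubly
# stochastic but not symmetric; its steady state is uniform.
N = 3
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]

assert all(abs(sum(row) - 1) < 1e-12 for row in P)            # row sums = 1
assert all(abs(sum(P[i][j] for i in range(N)) - 1) < 1e-12
           for j in range(N))                                 # column sums = 1
assert P[0][1] != P[1][0]                                     # not symmetric

pi = [1.0, 0.0, 0.0]
for _ in range(500):                                          # pi <- pi P
    pi = [sum(pi[i]*P[i][j] for i in range(N)) for j in range(N)]
assert all(abs(v - 1/N) < 1e-9 for v in pi)                   # uniform
```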
As observed in Section A.4.1, the left-eigenvector $y^T = \pi$ and the right-eigenvector $x = u$ belonging to the eigenvalue $\lambda = 1$ satisfy $y^T u = 1$. For doubly stochastic matrices, the roles of left- and right-eigenvector can be reversed, which leads to $y = \alpha x$ or
$$\pi = \left[\tfrac1N\;\tfrac1N\;\cdots\;\tfrac1N\right]$$
The example in Section A.4.3 illustrates that a steady-state vector equal to $\pi = \left[\tfrac1N\;\cdots\;\tfrac1N\right]$ does not necessarily imply that $P$ is symmetric.

A.5.2 Tri-diagonal band matrices

A.5.2.1 Tri-diagonal Toeplitz band matrix

A Toeplitz matrix has constant entries on each diagonal parallel to the main diagonal. Of particular interest is the $N \times N$ tri-diagonal Toeplitz matrix
$$A = \begin{pmatrix} b & a & & & \\ c & b & a & & \\ & \ddots & \ddots & \ddots & \\ & & c & b & a \\ & & & c & b \end{pmatrix}$$
that arises in the Markov chain of the random walk and in the birth and death process. Moreover, the eigenstructure of the tri-diagonal Toeplitz matrix $A$ can be expressed in analytic form.

An eigenvector $x$ corresponding to the eigenvalue $\lambda$ satisfies $(A - \lambda I) x = 0$ or, written per component,
$$(b - \lambda) x_1 + a x_2 = 0$$
$$c x_{k-1} + (b - \lambda) x_k + a x_{k+1} = 0 \qquad 2 \le k \le N-1$$
$$c x_{N-1} + (b - \lambda) x_N = 0$$
We assume that $a \neq 0$ and $c \neq 0$ and rewrite the set, with $x_0 = x_{N+1} = 0$, as
$$x_{k+2} + \left(\frac{b-\lambda}{a}\right) x_{k+1} + \left(\frac{c}{a}\right) x_k = 0 \qquad 0 \le k \le N-1$$
which are second-order difference equations with constant coefficients. The general solution of these equations is $x_k = \alpha r_1^k + \beta r_2^k$, where $r_1$ and $r_2$ are the roots of the corresponding polynomial $r^2 + \left(\frac{b-\lambda}{a}\right) r + \frac{c}{a} = 0$. If $r_1 = r_2$, the general solution is $x_k = \alpha r_1^k + \beta k r_1^k$, which is impossible since the boundary conditions $x_0 = x_{N+1} = 0$ then force all $x_k = 0$, while an eigenvector is never the zero vector. Thus, we have distinct roots $r_1 \neq r_2$ that satisfy
$$r_1 + r_2 = -\frac{b-\lambda}{a}, \qquad r_1 r_2 = \frac{c}{a}$$
The constants $\alpha$ and $\beta$ follow from the boundary requirement $x_0 = x_{N+1} = 0$ as
$$\alpha + \beta = 0, \qquad \alpha r_1^{N+1} + \beta r_2^{N+1} = 0$$
Rewriting the last equation with $\beta = -\alpha$ yields
$$\left(\frac{r_1}{r_2}\right)^{N+1} = 1 \quad\text{or}\quad \frac{r_1}{r_2} = e^{\frac{2\pi i m}{N+1}}$$
for some $1 \le m \le N$ (the root $m = 0$ must be rejected since $r_1 \neq r_2$).
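Anticipating the result of this subsection, the eigenvalues work out to $\lambda_m = b + 2\sqrt{ac}\cos\frac{\pi m}{N+1}$ with components $x_k \propto (c/a)^{k/2}\sin\frac{\pi m k}{N+1}$. A short numerical sketch (with arbitrary $a$, $b$, $c$) checks each eigenpair directly against the defining recurrence:

```python
import math

# Check the analytic eigenpairs of the N x N tri-diagonal Toeplitz matrix
# (a, b, c are arbitrary example values with a, c > 0):
#   lambda_m = b + 2*sqrt(a*c)*cos(pi*m/(N+1)),
#   x_k      = (c/a)**(k/2) * sin(pi*m*k/(N+1)),  k = 1..N.
N, a, b, c = 7, 0.3, 0.4, 0.2

for m in range(1, N + 1):
    lam = b + 2*math.sqrt(a*c)*math.cos(math.pi*m/(N+1))
    x = [0.0] + [(c/a)**(k/2) * math.sin(math.pi*m*k/(N+1))
                 for k in range(1, N + 1)] + [0.0]   # x_0 = x_{N+1} = 0
    for k in range(1, N + 1):
        residual = c*x[k-1] + (b - lam)*x[k] + a*x[k+1]
        assert abs(residual) < 1e-12
```

The check exploits that the recurrence, not the matrix itself, defines the eigenpair, so no eigenvalue solver is needed.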
Substitution of $r_1 = r_2 e^{\frac{2\pi i m}{N+1}}$ into the product $r_1 r_2 = \frac{c}{a}$ yields
$$r_1 = \sqrt{\frac{c}{a}}\, e^{\frac{i\pi m}{N+1}} \qquad\text{and}\qquad r_2 = \sqrt{\frac{c}{a}}\, e^{-\frac{i\pi m}{N+1}}$$
The first root equation is only possible for special values $\lambda = \lambda_m$ with $1 \le m \le N$, which are the eigenvalues,
$$\lambda_m = b + a\sqrt{\frac{c}{a}}\left(e^{\frac{i\pi m}{N+1}} + e^{-\frac{i\pi m}{N+1}}\right) = b + 2\sqrt{ac}\,\cos\left(\frac{\pi m}{N+1}\right)$$
Since there are precisely $N$ different values of $\lambda_m$, there are $N$ distinct eigenvalues. The components $x_k$ of the eigenvector belonging to $\lambda_m$ are
$$x_k = \alpha\left(\frac{c}{a}\right)^{k/2}\left(e^{\frac{i\pi m k}{N+1}} - e^{-\frac{i\pi m k}{N+1}}\right) = 2 i\alpha\left(\frac{c}{a}\right)^{k/2}\sin\left(\frac{\pi m k}{N+1}\right)$$
The scaling constant follows from the normalization $\|x\|_1 = 1$, or
$$2 i\alpha\sum_{k=1}^{N}\left(\frac{c}{a}\right)^{k/2}\sin\left(\frac{\pi m k}{N+1}\right) = 1$$
Since $\sin\left(\frac{\pi m k}{N+1}\right) = \operatorname{Im}\left[e^{\frac{i\pi m k}{N+1}}\right]$, we have, with $w = \sqrt{\frac{c}{a}}\, e^{\frac{i\pi m}{N+1}}$,
$$\sum_{k=1}^{N}\left(\frac{c}{a}\right)^{k/2}\sin\left(\frac{\pi m k}{N+1}\right) = \operatorname{Im}\left[\sum_{k=1}^{N} w^k\right] = \operatorname{Im}\left[\frac{w - w^{N+1}}{1 - w}\right] = \frac{\sqrt{\frac{c}{a}}\left(1 + (-1)^{m+1}\left(\frac{c}{a}\right)^{\frac{N+1}{2}}\right)\sin\frac{\pi m}{N+1}}{1 - 2\sqrt{\frac{c}{a}}\cos\frac{\pi m}{N+1} + \frac{c}{a}}$$
from which the scaling constant is
$$2 i\alpha = \left[\frac{\sqrt{\frac{c}{a}}\left(1 + (-1)^{m+1}\left(\frac{c}{a}\right)^{\frac{N+1}{2}}\right)\sin\frac{\pi m}{N+1}}{1 - 2\sqrt{\frac{c}{a}}\cos\frac{\pi m}{N+1} + \frac{c}{a}}\right]^{-1}$$
Finally, the components $x_k$ of the eigenvector $x$ belonging to $\lambda_m$ become, for $1 \le k \le N$,
$$x_k = \frac{\left(\frac{c}{a}\right)^{k/2}\sin\frac{\pi m k}{N+1}\left(1 - 2\sqrt{\frac{c}{a}}\cos\frac{\pi m}{N+1} + \frac{c}{a}\right)}{\sqrt{\frac{c}{a}}\left(1 + (-1)^{m+1}\left(\frac{c}{a}\right)^{\frac{N+1}{2}}\right)\sin\frac{\pi m}{N+1}}$$
Observe that for stochastic matrices $a + b + c = 1$ (see the general random walk in Section 11.2) and for the infinitesimal rate matrix $a + b + c = 0$ (see the birth and death process in Section 11.3), which only changes the eigenvalue through $b$.

A.5.2.2 Tri-diagonal AMS matrix

This section computes the exact spectrum of the tri-diagonal AMS matrix specified in (14.51). The analysis bears some resemblance to that of the birth and death process with constant birth and death rates in Section 11.3.3. The eigenvalue equation $D^{-1} Q x = \lambda x$ is rewritten for the $j$-th component of the right-eigenvector belonging to the eigenvalue $\lambda$ as
$$\eta(N-j+1)\, x_{j-1} - \left[(\lambda + 1 - \eta)\, j + \eta N - \lambda c\right] x_j + (j+1)\, x_{j+1} = 0$$
for $0 \le j \le N$.
This difference equation has linear coefficients, whereas those in Section A.5.2.1 are constant. It is most conveniently solved using generating functions. Let $G(z) = \sum_{j=0}^{N} x_j z^j$; then the difference equation is transformed, with $x_j = 0$ if $j \notin [0, N]$, into
$$\eta\left(N z\, G(z) - z^2 G'(z)\right) - (\lambda + 1 - \eta)\, z\, G'(z) - \left[\eta N - \lambda c\right] G(z) + G'(z) = 0$$
from which the logarithmic derivative is
$$\frac{G'(z)}{G(z)} = \frac{\eta N z + \lambda c - \eta N}{\eta z^2 + (\lambda + 1 - \eta) z - 1}$$
The integration of the right-hand side requires a partial fraction decomposition,
$$\frac{\eta N z + \lambda c - \eta N}{\eta z^2 + (\lambda + 1 - \eta) z - 1} = \frac{c_1}{z - r_1} + \frac{c_2}{z - r_2}$$
where $r_1$ and $r_2$ are the roots of the quadratic polynomial $\eta z^2 + (\lambda + 1 - \eta) z - 1$ and $c_1$ and $c_2$ are the residues computed, for $k = 1, 2$, as
$$c_k = \lim_{z \to r_k}\frac{(z - r_k)\left(\eta N z + \lambda c - \eta N\right)}{\eta (z - r_1)(z - r_2)}$$
They obey $c_1 + c_2 = N$ and $c_1 r_2 + c_2 r_1 = N - \frac{\lambda c}{\eta}$. Explicitly,
$$r_1 = \frac{-(\lambda + 1 - \eta) + \sqrt{(\lambda + 1 - \eta)^2 + 4\eta}}{2\eta} > 0$$
$$r_2 = \frac{-(\lambda + 1 - \eta) - \sqrt{(\lambda + 1 - \eta)^2 + 4\eta}}{2\eta} < 0 \qquad (A.42)$$
with $r_1 r_2 = -\frac{1}{\eta}$ and $r_1 + r_2 = \frac{\eta - 1 - \lambda}{\eta}$. Moreover, unless $\lambda = \eta - 1 \pm 2 i\sqrt{\eta}$, in which case $r_1 = r_2 = \mp\frac{i}{\sqrt{\eta}}$, the roots are distinct. The residues are
$$c_1 = \frac{\eta N r_1 + \lambda c - \eta N}{\eta(r_1 - r_2)}, \qquad c_2 = \frac{\eta N r_2 + \lambda c - \eta N}{\eta(r_2 - r_1)} = N - c_1 \qquad (A.43)$$
Integration now yields $\log G(z) = c_1\log(z - r_1) + c_2\log(z - r_2) + b$, or
$$G(z) = e^b\, (z - r_1)^{c_1}(z - r_2)^{N - c_1}$$
The integration constant $b$ is obtained from $\lim_{z\to\infty}\frac{G(z)}{z^N} = x_N$. Thus,
$$x_N = e^b\lim_{z\to\infty}\left(\frac{z - r_1}{z}\right)^{c_1}\left(\frac{z - r_2}{z}\right)^{N - c_1} = e^b$$
The obvious scaling for the eigenvector is to choose $x_N = 1$, and we arrive at
$$G(z) = \sum_{j=0}^{N} x_j z^j = (z - r_1)^{c_1}(z - r_2)^{N - c_1} \qquad (A.44)$$
which shows that $c_1$ must be an integer $k \in [0, N]$ for $G(z)$ to be a polynomial of degree $N$. Expanding the binomials with $c_1 = k$ gives
$$G(z) = \sum_{q=0}^{k}\binom{k}{q} z^q (-r_1)^{k-q}\,\sum_{n=0}^{N-k}\binom{N-k}{n} z^n (-r_2)^{N-k-n}$$
from which the eigenvector components belonging to $\lambda(k)$ are, for $0 \le j \le N$,
$$x_j(k) = (-1)^{N-j}\sum_{q=0}^{j}\binom{k}{q}\binom{N-k}{j-q}\, r_1^{\,k-q}\, r_2^{\,N-k-j+q} \qquad (A.45)$$
The requirement on $c_1$ also leads to equations for the eigenvalues $\lambda$.
Indeed, equating $c_1 = k$ in (A.43) and substituting the explicit expressions for the roots $r_1$ and $r_2$, we obtain after squaring the quadratic equations for the eigenvalue $\lambda(k)$, for $0 \le k \le N$,
$$A(k)\,\lambda^2(k) + B(k)\,\lambda(k) + C(k) = 0 \qquad (A.46)$$
where
$$A(k) = (N/2 - k)^2 - (N/2 - c)^2$$
$$B(k) = 2(1-\eta)(N/2 - k)^2 - N(1+\eta)(N/2 - c)$$
$$C(k) = -(1+\eta)^2\left[(N/2)^2 - (N/2 - k)^2\right]$$
Each of the $N+1$ quadratic equations (A.46) has two roots, $\lambda_1(k)$ and $\lambda_2(k)$, thus $2(N+1)$ in total, while there are only $N+1$ eigenvalues. The coefficients $A(k)$, $B(k)$ and $C(k)$ only depend on $k$ via $(N/2 - k)^2$, which means that the quadratics (A.46) for $k' = N - k$ and for $k$ are identical. This observation reduces the set $\{\lambda_1(k), \lambda_2(k)\}_{0 \le k \le N}$ of roots to precisely $N+1$ and confines the analysis to $0 \le k \le N/2$.

We will show that all roots are real and distinct (except for $k = N/2$). The discriminant $\Delta(k) = B^2(k) - 4 A(k) C(k)$ is, with $y = (N/2 - k)^2 \in \left[0, (N/2)^2\right]$,
$$\Delta(k) = -16\eta\, y^2 + 4(1+\eta)\left[(1+\eta) c^2 - 2\eta c N + \eta N^2\right] y$$
which shows that $\Delta(k)$ is concave in $y$ because $\frac{d^2\Delta}{dy^2} = -32\eta < 0$; for $y = 0$, $\Delta(N/2) = 0$ and, for $y = (N/2)^2$, $\Delta(0) = N^2\left(c(1+\eta) - \eta N\right)^2 > 0$ and, hence, $\Delta(k) \ge 0$ for $k \in [0, N/2]$. This means that, for $0 \le k < N/2$, the roots $\lambda_1(k)$ and $\lambda_2(k)$ are real and distinct and, for $k = N/2$ (only if $N$ is even), where $\Delta(N/2) = 0$,
$$\lambda_1(N/2) = \lambda_2(N/2) = -\frac{B(N/2)}{2 A(N/2)} = \frac{1+\eta}{\frac{2c}{N} - 1}$$
For $l < k \le N/2$, the roots $\{\lambda_1(l), \lambda_2(l)\}$ are different from the roots $\{\lambda_1(k), \lambda_2(k)\}$, because $A(k) z^2 + B(k) z + C(k) \neq A(l) z^2 + B(l) z + C(l)$ for all real $z$. Indeed, $A(l) - A(k) = (N/2 - l)^2 - (N/2 - k)^2 > 0$ and the discriminant
$$\left(B(l) - B(k)\right)^2 - 4\left(A(l) - A(k)\right)\left(C(l) - C(k)\right) = -16\eta\left[(N/2 - l)^2 - (N/2 - k)^2\right]^2 < 0$$
shows that the difference of the two quadratics has no real zeros.

Thus, an extreme eigenvalue occurs for $k = 0$, for which $C(0) = 0$, such that $\lambda_1(0) = 0$ and
$$\lambda_2(0) = -\frac{B(0)}{A(0)} = \frac{N(1+\eta)(\rho - 1)}{N - c} \qquad (A.47)$$
The stability requirement $\rho = \frac{\eta N}{c(1+\eta)} < 1$, together with $c < N$, shows that $\lambda_2(0) < 0$, and thus $\lambda_2(0)$ is the largest negative eigenvalue. The eigenvalues for other $0 < k \le N/2$ are either larger than 0 or smaller than $\lambda_2(0)$. We need to consider two different cases, (a) $c <$
$N/2$ and (b) $c > N/2$, while $C(k) < 0$ for all $k \in (0, N)$.

(a) If $c < N/2$ and $0 \le k < c$, then $A(k) > 0$. Hence, the product $\lambda_1(k)\lambda_2(k) = \frac{C(k)}{A(k)} < 0$, which means that $\lambda_1(k) > 0 > \lambda_2(k)$ and that there are precisely $[c]$ positive eigenvalues. Similarly, $A(k) < 0$ for $c < k \le N/2$, such that $\lambda_1(k)\lambda_2(k) > 0$, while $\lambda_1(k) + \lambda_2(k) = -\frac{B(k)}{A(k)} < 0$ shows that both eigenvalues are negative because $B(k) < 0$. Indeed, if $\eta \ge 1$ and $c < N/2$, both terms in $B(k) = 2(1-\eta)(N/2-k)^2 - N(1+\eta)(N/2-c)$ are non-positive, while if $\eta < 1$ and $c < N/2$, the bound $(N/2-k)^2 < (N/2-c)^2$ leads to
$$B(k) < (N/2 - c)\left[2(1-\eta)(N/2-c) - N(1+\eta)\right] = -2(N/2 - c)\left[c + \eta(N - c)\right] < 0$$

(b) If $c > N/2$, we see that $A(k) > 0$ for $0 \le k < N - c$, leading to $\lambda_1(k) > 0 > \lambda_2(k)$. For $N - c < k \le N/2$, we have $A(k) < 0$ and thus $\lambda_1(k)\lambda_2(k) = \frac{C(k)}{A(k)} > 0$, while their common sign follows from $\lambda_1(k) + \lambda_2(k) = -\frac{B(k)}{A(k)}$, which requires us to consider the sign of $B(k)$. If $\eta \le 1$, then $B(k) > 0$. If $\eta > 1$, then, with $(N/2-k)^2 < (c - N/2)^2$,
$$B(k) = N(1+\eta)(c - N/2) + 2(1-\eta)(N/2-k)^2 > (c - N/2)\left[N(1+\eta) + 2(1-\eta)(c - N/2)\right] = 2(c - N/2)\left[c + \eta(N - c)\right] > 0$$
which shows that $0 < \lambda_2(k) < \lambda_1(k)$. Hence, there are $(N - [c]) + 2\left(\frac{N}{2} - (N - [c])\right) = [c]$ positive eigenvalues.

In summary, there are $[c]$ positive eigenvalues, one eigenvalue $\lambda_1(0) = 0$ and $N - [c]$ negative eigenvalues. Relabel the eigenvalues as $(\lambda_k, \lambda_{N-k}) = (\lambda_1(k), \lambda_2(k))$ in increasing order,
$$\lambda_{N-[c]-1} < \cdots < \lambda_1 < \lambda_0 < \lambda_N = 0 < \lambda_{N-1} < \cdots < \lambda_{N-[c]}$$
This way of writing distinguishes between underload and overload eigenvalues. In terms of the discriminant $\Delta(k) = B^2(k) - 4 A(k) C(k)$, the non-positive eigenvalues are

(a) if $c < N/2$,
$$\lambda_1(k) = \frac{-B(k) - \sqrt{\Delta(k)}}{2 A(k)} \qquad 0 \le k \le [c]$$
$$\lambda_{1,2}(k) = \frac{-B(k) \mp \sqrt{\Delta(k)}}{2 A(k)} \qquad [c] + 1 \le k \le \frac{N}{2}$$

(b) if $c > N/2$,
$$\lambda_1(k) = \frac{-B(k) - \sqrt{\Delta(k)}}{2 A(k)} \qquad 0 \le k \le N - [c] - 1$$

The eigenvector belonging to $\lambda_j$ follows from (A.45), where $r_1$ and $r_2$ are given in (A.42) and $k$ is determined from (A.43) since $k = c_1$. The eigenvectors for $\lambda_1(k)$ and $\lambda_2(k)$ belonging to a same quadratic $k$ must be different.
Especially in this case, the corresponding $k = c_1$ values can be determined from (A.43). For example, for $\lambda_N = 0$, we find $r_1 = 1$, $r_2 = -\frac{1}{\eta}$ and $k = 0$, and the eigenvector belonging to $\lambda_N$ is, with (A.45),
$$x_j(0) = (-1)^{N-j}\binom{N}{j}\, r_2^{\,N-j} = \binom{N}{j}\,\eta^{\,j-N} \qquad (A.48)$$
After renormalization such that $\|x(0)\|_1 = 1$, i.e. by dividing each component by $\sum_{j=0}^{N} x_j(0) = \sum_{j=0}^{N}\binom{N}{j}\eta^{j-N} = \frac{(1+\eta)^N}{\eta^N}$, the steady-state vector (14.52) is obtained. Similarly, for the largest negative eigenvalue $\lambda_0$ in (A.47), we find the roots $1 - \frac{N}{c}$ and $\frac{c}{\eta(N-c)}$, and $k = c_1 = N$, such that
$$x_j(N) = (-1)^{N-j}\binom{N}{j}\left(1 - \frac{N}{c}\right)^{N-j} = \binom{N}{j}\left(\frac{N}{c} - 1\right)^{N-j} \qquad (A.49)$$
The left-eigenvectors $y$ satisfy (A.9), $y^T Q = \lambda\, y^T D$ (equivalently, $y^T Q D^{-1} = \lambda\, y^T$). The above approach is applicable. However, there is a more elegant method, based on the observation that there exists a diagonal matrix $W = \operatorname{diag}(W_0, \ldots, W_N)$ for which $W^{-1} Q W = \left(W^{-1} Q W\right)^T$ is symmetric, namely $W_j = \sqrt{\binom{N}{j}\eta^j}$. Since $M = W^{-1} Q W$ is symmetric, the left- and right-eigenvectors corresponding to the same eigenvalue are the same (Section A.2, art. 9). Because diagonal matrices commute, the left-problem $Q^T y = \lambda D y$ reduces, with $Q^T = W^{-2} Q W^2$, to $M(W y) = \lambda D (W y)$, while the right-problem $Q x = \lambda D x$ reduces to $M\left(W^{-1} x\right) = \lambda D\left(W^{-1} x\right)$. Hence $W y$ and $W^{-1} x$ are proportional, which shows that $x = W^2 y$ or, in vector components, for $0 \le j \le N$,
$$x_j = \binom{N}{j}\,\eta^j\, y_j \qquad (A.50)$$

A.5.2.3 General tri-diagonal matrices

Since tri-diagonal matrices of the form (11.1) frequently occur in Markov theory, we devote this section to illustrating how far the eigen-analysis can be pushed. For an eigenpair (the right-eigenvector $x$ belonging to the eigenvalue $\lambda$), the components in $(P - \lambda I) x = 0$ satisfy
$$(r_0 - \lambda)\, x_0 + p_0\, x_1 = 0$$
$$q_j\, x_{j-1} + (r_j - \lambda)\, x_j + p_j\, x_{j+1} = 0 \qquad 1 \le j < N$$
$$q_N\, x_{N-1} + (r_N - \lambda)\, x_N = 0$$
If $p_j = p$ and $q_j = q$, the matrix $P$ reduces to a Toeplitz form for which the eigenvalues and eigenvectors can be explicitly written, as shown in Appendix A.5.2.1.
Here, we consider the general case and show how orthogonal polynomials enter the scene. Using $r_j = 1 - q_j - p_j$, $r_0 = 1 - p_0$ and $r_N = 1 - q_N$, the set becomes, with $\mu = \lambda - 1$,
$$x_1(\mu) = \frac{p_0 + \mu}{p_0}\, x_0(\mu)$$
$$x_{j+1}(\mu) = \frac{p_j + q_j + \mu}{p_j}\, x_j(\mu) - \frac{q_j}{p_j}\, x_{j-1}(\mu) \qquad 1 \le j < N \qquad (A.51)$$
$$x_N(\mu) = \frac{q_N}{q_N + \mu}\, x_{N-1}(\mu)$$
where the dependence on the eigenvalue $\mu$ is made explicit. Solving (A.51) iteratively for $j < N$,
$$x_2(\mu) = \frac{x_0(\mu)}{p_0 p_1}\left(\mu^2 + (q_1 + p_1 + p_0)\mu + p_1 p_0\right)$$
$$x_3(\mu) = \frac{x_0(\mu)}{p_2 p_1 p_0}\Big(\mu^3 + (q_1 + q_2 + p_2 + p_1 + p_0)\mu^2 + (q_2 q_1 + q_2 p_0 + p_2 q_1 + p_2 p_1 + p_2 p_0 + p_1 p_0)\mu + p_2 p_1 p_0\Big)$$
reveals a polynomial of degree $j$ in the eigenvalue $\mu = \lambda - 1$. By inspection, the general form of $x_j(\mu)$ for $j < N$ is
$$x_j(\mu) = \frac{x_0(\mu)}{\prod_{m=0}^{j-1} p_m}\sum_{k=0}^{j} c_k(j)\,\mu^k \qquad (A.52)$$
with
$$c_j(j) = 1, \qquad c_{j-1}(j) = \sum_{m=0}^{j-1}(p_m + q_m), \qquad c_0(j) = \prod_{m=0}^{j-1} p_m$$
where $q_0 = p_N = 0$. By substituting (A.52) into (A.51),
$$\sum_{k=0}^{j} c_k(j+1)\,\mu^k = \sum_{k=0}^{j}\left[(q_j + p_j)\, c_k(j) - q_j p_{j-1}\, c_k(j-1) + c_{k-1}(j)\right]\mu^k$$
and equating the corresponding powers in $\mu$, a recursion relation for the coefficients $c_k(j)$ ($0 \le k < N$) is obtained with $c_j(j) = 1$,
$$c_k(j+1) = (q_j + p_j)\, c_k(j) - q_j p_{j-1}\, c_k(j-1) + c_{k-1}(j)$$
from which all coefficients can be determined. Finally, for $j = N$, the explicit form of $x_N(\mu)$ follows from (A.51) as
$$x_N(\mu) = \frac{q_N}{q_N + \mu}\, x_{N-1}(\mu) = \frac{q_N}{q_N + \mu}\,\frac{x_0(\mu)}{\prod_{m=0}^{N-2} p_m}\sum_{k=0}^{N-1} c_k(N-1)\,\mu^k$$
We can always scale an eigenvector without affecting the corresponding eigenvalue. If we require the normalization $\|x(\mu)\|_1 = 1$, then $x_0(\mu)$ is uniquely determined,
$$x_0(\mu) = \left[1 + \sum_{j=1}^{N-1}\frac{\left|\sum_{k=0}^{j} c_k(j)\mu^k\right|}{\prod_{m=0}^{j-1} p_m} + \frac{q_N}{|q_N + \mu|}\,\frac{\left|\sum_{k=0}^{N-1} c_k(N-1)\mu^k\right|}{\prod_{m=0}^{N-2} p_m}\right]^{-1}$$
Another scaling consists of choosing $x_0(\mu) = 1$. Hence, apart from the eigenvalue $\mu$, all eigenvector components $x_j(\mu)$ are explicitly determined. If $\lambda = 1$ or $\mu = 0$, the solution is $x_j(0) = x_0(0)$. If $\|x(0)\|_1 = 1$, then $x_j(0) = \frac{1}{N+1}$, which is, after proper scaling by $N+1$ (art. 4 in Section A.1), the right-eigenvector $u = \begin{bmatrix}1 & 1 & \cdots & 1\end{bmatrix}^T$ belonging to the left-eigenvector $\pi$ (see also Section A.4.1).
If $x_0(\mu) = 1$, we immediately obtain $u$. Eigenvectors belonging to different eigenvalues $\mu' \neq \mu$ are linearly independent (art. 3 in Section A.1), but only orthogonal if $P = P^T$, i.e. if $p_j = q_{j+1}$. Only in the latter case (art. 9 in Section A.2), where also all eigenvalues are real, do we have
$$\sum_{j=0}^{N} x_j(\mu)\, x_j(\mu') = \|x(\mu)\|_2^2\,\delta_{\mu\mu'}$$
This orthogonality requirement determines the different eigenvalues $\mu$. Since $\mu = 0$ is an eigenvalue, each other real eigenvalue $\mu \neq 0$ must obey
$$\sum_{j=0}^{N} x_j(\mu) = 0$$
while the normalization enforces $\|x(\mu)\|_1 = \sum_{j=0}^{N}|x_j(\mu)| = 1$. The scaling $x_0(\mu) = 1$ leads to the polynomial $\sum_{k=0}^{N} b_k \mu^k$ of degree $N$, whose $N$ zeros equal the eigenvalues $\mu \neq 0$ and whose coefficients are, with $p_j = q_{j+1}$ and for $2 \le k \le N-2$,
$$b_0 = (N+1)\, q_N$$
$$b_1 = N + q_N\left[\sum_{j=1}^{N-2}\frac{c_1(j)}{\prod_{m=0}^{j-1} p_m} + \frac{2\, c_1(N-1)}{\prod_{m=0}^{N-2} p_m}\right]$$
$$b_k = \sum_{j=k-1}^{N-1}\frac{c_{k-1}(j)}{\prod_{m=0}^{j-1} p_m} + q_N\sum_{j=k}^{N-2}\frac{c_k(j)}{\prod_{m=0}^{j-1} p_m} + \frac{2 q_N\, c_k(N-1)}{\prod_{m=0}^{N-2} p_m}$$
$$b_{N-1} = \frac{p_{N-2} + 2 q_N + c_{N-2}(N-1)}{\prod_{m=0}^{N-2} p_m}, \qquad b_N = \frac{1}{\prod_{m=0}^{N-2} p_m}$$
The Newton identities (B.9) relate these coefficients to the sums of integer powers of the real zeros $\mu \neq 0$.

Proceeding much further in the case that $P$ is not symmetric is difficult. A similarity transform is needed to transform the linearly independent set of vectors $x(\mu)$ for different $\mu$ into an orthogonal set from which the eigenvalues then follow, as in the symmetric case above. Karlin and McGregor (see Schoutens (2000, Chapter 3)) have shown the existence of a set of orthogonal polynomials (similar to our set $x_j(\mu)$) that obey an integral orthogonality condition (similar to Legendre or Chebyshev polynomials) instead of our summation orthogonality condition. Only in particular cases, however, were they able to specify this orthogonal set explicitly.
A.5.3 A triangular matrix complemented with one subdiagonal

The transition probability matrix $P$ has the structure of an upper triangular matrix complemented with one subdiagonal,
$$P = \begin{pmatrix} P_{00} & P_{01} & P_{02} & \cdots & \cdots & P_{0N} \\ P_{10} & P_{11} & P_{12} & \cdots & \cdots & P_{1N} \\ 0 & P_{21} & P_{22} & \cdots & \cdots & P_{2N} \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & P_{N-1,N-2} & P_{N-1,N-1} & P_{N-1,N} \\ 0 & 0 & \cdots & 0 & P_{N,N-1} & P_{NN} \end{pmatrix}$$
Besides the normalization $\|\pi\|_1 = 1$, the steady-state vector $\pi$ obeys the relation $\pi = \pi P$, or per vector component (9.23),
$$\pi_j = \sum_{k=0}^{j+1} P_{kj}\,\pi_k$$
because $P_{kj} = 0$ if $k > j+1$. Immediately we obtain an iterative equation that expresses $\pi_{j+1}$ (for $j < N$) in terms of the $\pi_k$ for $0 \le k \le j$, as
$$\pi_{j+1} = \frac{1 - P_{jj}}{P_{j+1,j}}\,\pi_j - \frac{1}{P_{j+1,j}}\sum_{k=0}^{j-1} P_{kj}\,\pi_k$$
Let us consider the eigenvalue equation (A.1), written for stochastic matrices at $\lambda = 1$ as $(P - I)^T x = 0$. The matrix $(P - I)^T$ is an $(N+1)\times(N+1)$ matrix of rank $N$ because $\det(P - I)^T = 0$ (else all eigenvectors $x$ would be zero). When writing this set of equations in terms of $x_0$, we produce the following set of $N$ equations,
$$\begin{pmatrix} P_{10} & 0 & 0 & \cdots & 0 \\ P_{11}-1 & P_{21} & 0 & \cdots & 0 \\ P_{12} & P_{22}-1 & P_{32} & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ P_{1,N-2} & P_{2,N-2} & \cdots & P_{N-1,N-2} & 0 \\ P_{1,N-1} & P_{2,N-1} & \cdots & P_{N-1,N-1}-1 & P_{N,N-1} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{pmatrix} = \begin{pmatrix} 1 - P_{00} \\ -P_{01} \\ -P_{02} \\ \vdots \\ -P_{0,N-1} \end{pmatrix} x_0$$
Since the coefficient matrix is lower triangular, its determinant equals the product of the diagonal elements, $\prod_{k=0}^{N-1} P_{k+1,k}$. By Cramer's rule, we find that $x_j$ equals $x_0$ times the determinant of the matrix obtained by replacing the $j$-th column of the coefficient matrix by the right-hand side vector, divided by $\prod_{k=0}^{N-1} P_{k+1,k}$.
The determinant in the numerator is of the form (Meyer, 2000, p. 467)
$$\det\begin{pmatrix} A_{j\times j} & O_{j\times(N-j)} \\ B_{(N-j)\times j} & C_{(N-j)\times(N-j)} \end{pmatrix} = \det C\,\det A$$
where $\det C = \prod_{k=j}^{N-1} P_{k+1,k}$. In the determinant $\det A$, we can interchange the $j$-th column with the $(j-1)$-th and, subsequently, the $(j-1)$-th with the $(j-2)$-th and so on, until the last column is permuted to the first column, in total $j-1$ permutations. After changing the sign of that first column, the result is that $\det A = (-1)^j\det\left(P_{j\times j} - I_{j\times j}\right)$, where $P_{j\times j}$ is the original transition probability matrix limited to $j$ states (instead of $N+1$). Hence, for $1 \le j \le N$,
$$x_j = x_0\,\frac{(-1)^j\det\left(P_{j\times j} - I_{j\times j}\right)}{\prod_{k=0}^{j-1} P_{k+1,k}}$$
and the normalization of eigenvectors, $\|x\|_1 = 1$, determines $x_0$ as
$$x_0 = \left[1 + \sum_{j=1}^{N}\frac{(-1)^j\det\left(P_{j\times j} - I_{j\times j}\right)}{\prod_{k=0}^{j-1} P_{k+1,k}}\right]^{-1}$$
If the $N+1$ eigenvalues are known, we observe that all eigenvectors can be expressed in terms of the original matrix $P$ in the same way.

Appendix B
Algebraic graph theory

This appendix reviews the elementary basics of the matrix theory for graphs $G(N, L)$. The book by Cvetkovic et al. (1995) is the current standard work on algebraic graph theory.

B.1 The adjacency and incidence matrix

1. The adjacency matrix $A$ of a graph $G$ with $N$ nodes is an $N \times N$ matrix with elements $a_{ij} = 1$ only if $(i, j)$ is a link of $G$, otherwise $a_{ij} = 0$. Because the existence of a link implies that $a_{ij} = a_{ji}$, the adjacency matrix $A = A^T$ is a real symmetric matrix. It is assumed further that the graph $G$ contains neither self-loops ($a_{ii} = 0$) nor multiple links between two nodes. The complement $G^c$ of the graph $G$ consists of the same set of nodes, but with a link between $(i, j)$ if there is no link $(i, j)$ in $G$, and vice versa. Thus, $(G^c)^c = G$ and the adjacency matrix $A_c$ of the complement $G^c$ is $A_c = J - I - A$, where $J$ is the all-one matrix ($(J)_{ij} = 1$).

Fig. B.1. A graph with $N = 6$ and $L = 9$. The links are lexicographically ordered, $e_1 = 1 \to 2$, $e_2 = 1 \to 3$, $e_3 = 1 \leftarrow 6$, etc.

Information about the direction
of the links is specified by the incidence matrix $B$, an $N \times L$ matrix with elements
$$b_{il} = \begin{cases} 1 & \text{if link } e_l = i \to j \\ -1 & \text{if link } e_l = i \leftarrow j \\ 0 & \text{otherwise} \end{cases}$$
Figure B.1 exemplifies the definition of $A$ and $B$:
$$A = \begin{pmatrix} 0&1&1&0&0&1 \\ 1&0&1&0&1&1 \\ 1&1&0&1&0&0 \\ 0&0&1&0&1&0 \\ 0&1&0&1&0&1 \\ 1&1&0&0&1&0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1&1&-1&0&0&0&0&0&0 \\ -1&0&0&1&1&1&0&0&0 \\ 0&-1&0&-1&0&0&1&0&0 \\ 0&0&0&0&0&0&-1&1&0 \\ 0&0&0&0&-1&0&0&-1&1 \\ 0&0&1&0&0&-1&0&0&-1 \end{pmatrix}$$
(in $B$, the links not listed in the caption of Fig. B.1 are taken here to point from the lower to the higher node number; the choice of orientation does not affect $Q = B B^T$ below).

2. The relation between the adjacency and the incidence matrix is given by the admittance matrix or Laplacian $Q$,
$$Q = B B^T = \Delta - A$$
where $\Delta = \operatorname{diag}(d_1, d_2, \ldots, d_N)$ is the degree matrix. Indeed, if $i \neq j$ and noting that each column of $B$ has only two non-zero elements at different rows,
$$q_{ij} = \left(B B^T\right)_{ij} = \sum_{k=1}^{L} b_{ik} b_{jk} = -a_{ij}$$
which equals $-1$ precisely when $i$ and $j$ are adjacent. If $i = j$, then $\sum_{k=1}^{L} b_{ik}^2 = d_i$, the number of links that have node $i$ in common. Also, by the definition of $A$, the row sum $i$ of $A$ equals the degree $d_i$ of node $i$,
$$d_i = \sum_{k=1}^{N} a_{ik} \qquad (B.1)$$
Consequently, each row sum $\sum_{k=1}^{N} q_{ik} = 0$, which shows that $Q$ is singular, implying that $\det Q = 0$. The Laplacian is symmetric, $Q = Q^T$, because $A$ and $\Delta$ are both symmetric, and the quadratic form defined in Section A.2, art. 10,
$$x^T Q x = x^T B B^T x = \left\|B^T x\right\|_2^2 \ge 0$$
is positive semidefinite, which implies that all eigenvalues of $Q$ are non-negative and that at least one is zero because $\det Q = 0$.

Since $\sum_{i=1}^{N}\sum_{k=1}^{N} a_{ik} = 2L$, the basic law for the degree follows as
$$\sum_{i=1}^{N} d_i = 2L \qquad (B.2)$$
Notice that $P = \Delta^{-1} A$ is a stochastic matrix, because all elements of $P$ lie in the interval $[0, 1]$ and each row sum is 1.

3. Let $J$ denote the all-one matrix with $(J)_{ij} = 1$ and let $\xi(G)$ denote the total number of spanning trees in the graph $G$, also called the complexity of $G$; then
$$\operatorname{adj} Q = \xi(G)\, J \qquad (B.3)$$
where the adjugate obeys $Q\,\operatorname{adj} Q = (\det Q)\, I$. We omit the proof, but apply the relation (B.3) to the complete graph $K_N$, where $Q = N I - J$. Equation (B.3) demonstrates that all elements of $\operatorname{adj} Q$ are equal to $\xi(G)$.
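A minimal numerical sketch of (B.3) for the complete graph $K_4$: any cofactor of the Laplacian $Q = N I - J$ equals the number of spanning trees, $4^{4-2} = 16$ (Cayley's count, derived next).

```python
# Sketch of (B.3) for K_4: the cofactor (adj Q)_{11} of the Laplacian
# Q = N*I - J equals the number of spanning trees N**(N-2).
def det(M):
    """Determinant by Laplace expansion along the first row
    (adequate for tiny integer matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1)**j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(len(M)))

N = 4
Q = [[N - 1 if i == j else -1 for j in range(N)] for i in range(N)]
cofactor11 = det([row[1:] for row in Q[1:]])   # delete first row and column
assert cofactor11 == N**(N - 2)                # 16 spanning trees in K_4
```

The integer arithmetic is exact here, so no tolerance is needed in the comparison.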
Hence, it suffices to compute one suitable element of $\operatorname{adj} Q$, for example $(\operatorname{adj} Q)_{11}$, which is equal to the determinant of the $(N-1)\times(N-1)$ principal submatrix of $Q$ obtained by deleting the first row and column in $Q$,
$$(\operatorname{adj} Q)_{11} = \det\begin{pmatrix} N-1 & -1 & \cdots & -1 \\ -1 & N-1 & \cdots & -1 \\ \vdots & & \ddots & \vdots \\ -1 & -1 & \cdots & N-1 \end{pmatrix}$$
Adding all rows to the first and subsequently adding this new first row to all other rows gives
$$(\operatorname{adj} Q)_{11} = \det\begin{pmatrix} 1 & 1 & \cdots & 1 \\ -1 & N-1 & \cdots & -1 \\ \vdots & & \ddots & \vdots \\ -1 & -1 & \cdots & N-1 \end{pmatrix} = \det\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & N & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & N \end{pmatrix} = N^{N-2}$$
Hence, the total number of spanning trees in the complete graph $K_N$, which is also the total possible number of spanning trees in any graph with $N$ nodes, equals $N^{N-2}$. This is a famous theorem of Cayley, of which many proofs exist (van Lint and Wilson, 1996, Chapter 2).

4. The complexity of $G$ is also given by
$$\xi(G) = \frac{\det(J + Q)}{N^2} \qquad (B.4)$$
Indeed, observe that $J Q = (J B) B^T = 0$ since $J B = 0$. Hence,
$$(N I - J)(J + Q) = N J + N Q - J^2 - J Q = N Q$$
and
$$\operatorname{adj}\big((N I - J)(J + Q)\big) = \operatorname{adj}(J + Q)\,\operatorname{adj}(N I - J) = \operatorname{adj}(N Q)$$
Since $Q_{K_N} = N I - J$ and, as shown in art. 3, $\operatorname{adj}(N I - J) = N^{N-2} J$, and since $\operatorname{adj}(N Q) = N^{N-1}\operatorname{adj} Q = N^{N-1}\xi(G)\, J$, where we have used (B.3),
$$\operatorname{adj}(J + Q)\, J = N\,\xi(G)\, J$$
Left-multiplication with $J + Q$, taking into account that $J Q = 0$ and $J^2 = N J$, finally gives
$$(J + Q)\operatorname{adj}(J + Q)\, J = \det(J + Q)\, J = N^2\,\xi(G)\, J$$
which proves (B.4).

5. A walk of length $k$ from node $i$ to node $j$ is a succession of $k$ arcs of the form $(n_0 \to n_1)(n_1 \to n_2)\cdots(n_{k-1} \to n_k)$, where $n_0 = i$ and $n_k = j$. A path is a walk in which all vertices are different, i.e. $n_l \neq n_m$ for all $0 \le l \neq m \le k$.

Lemma B.1.1 The number of walks of length $k$ from node $i$ to node $j$ is equal to the element $\left(A^k\right)_{ij}$.

Proof (by induction): For $k = 1$, the number of walks of length 1 between node $i$ and node $j$ equals the number of direct links between $i$ and $j$, which is by definition the element $a_{ij}$ in the adjacency matrix $A$. Suppose the lemma holds for $k - 1$.
A walk of length $k$ consists of a walk of length $k-1$ from $i$ to some vertex $r$ which is adjacent to $j$. By the induction hypothesis, the number of walks of length $k-1$ from $i$ to $r$ is $\left(A^{k-1}\right)_{ir}$, and the number of walks with length 1 from $r$ to $j$ equals $a_{rj}$. The total number of walks from $i$ to $j$ with length $k$ then equals $\sum_{r=1}^{N}\left(A^{k-1}\right)_{ir} a_{rj} = \left(A^k\right)_{ij}$ (by the rules of matrix multiplication). $\square$

Explicitly,
$$\left(A^k\right)_{ij} = \sum_{r_1=1}^{N}\sum_{r_2=1}^{N}\cdots\sum_{r_{k-1}=1}^{N} a_{i r_1} a_{r_1 r_2}\cdots a_{r_{k-2} r_{k-1}} a_{r_{k-1} j}$$
As shown in Section 15.2, the number of paths with $k$ hops between node $i$ and node $j$ is
$$X_k(i \to j; N) = \sum_{r_1 \notin \{i, j\}}\;\sum_{r_2 \notin \{i, r_1, j\}}\cdots\sum_{r_{k-1} \notin \{i, r_1, \ldots, r_{k-2}, j\}} a_{i r_1} a_{r_1 r_2}\cdots a_{r_{k-1} j}$$
The definition of a path restricts the first index $r_1$ to $N-2$ possible values, the second $r_2$ to $N-3$, etc., such that the total possible number of paths is
$$\prod_{l=1}^{k-1}(N - 1 - l) = \frac{(N-2)!}{(N-k-1)!}$$
whereas the total possible number of walks clearly is $N^{k-1}$.

A graph is connected if, for each pair of nodes, there exists a walk or, equivalently, if there exists some integer $k > 0$ for which $\left(A^k\right)_{ij} \neq 0$ for each pair $i, j$. The lowest integer $k$ for which $\left(A^k\right)_{ij} \neq 0$ for each pair of nodes $i, j$ is called the diameter of the graph $G$. Lemma B.1.1 demonstrates that the diameter equals the length of the longest shortest hop path in $G$.

B.2 The eigenvalues of the adjacency matrix

In this section, only general results for the eigenvalue spectrum of a graph $G$ are treated. For special types of graphs, there exists a wealth of additional, but specific, properties of the eigenvalues.

1. Since $A$ is a real symmetric matrix, it has $N$ real eigenvalues (Section A.2), which we order as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$. Section A.1, art. 4, shows that, apart from a similarity transform, the set of eigenvalues with corresponding eigenvectors is unique.
A similarity transform consists of a relabeling of the nodes in the graph, which obviously does not alter the structure of the graph but merely expresses the eigenvectors in a different base. The classical Perron-Frobenius Theorem for non-negative square matrices (of which Theorem A.4.2 is a special case) states that $\lambda_1$ is a simple and positive root of the characteristic polynomial in (A.3) possessing the only eigenvector of $A$ with non-negative components. Moreover, it follows from (A.34) that
$$\lambda_1 = \sup_{x \neq 0}\frac{x^T A x}{x^T x} = \max_{x \neq 0}\frac{\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij} x_i x_j}{\sum_{i=1}^{N} x_i^2}$$
The maximum is attained if and only if $x$ is the eigenvector of $A$ belonging to $\lambda_1$ and, for any other vector $y \neq x$, $\lambda_1 \ge \frac{y^T A y}{y^T y}$, as shown in Section A.3. By choosing the vector $y = u = (1, 1, \ldots, 1)^T$, we have, with $\sum_{j=1}^{N} a_{ij} = d_i$ and (B.2),
$$\lambda_1 \ge \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij} = \frac{1}{N}\sum_{i=1}^{N} d_i = \frac{2L}{N} \qquad (B.5)$$
The stochastic matrix $P = \Delta^{-1} A$, where $\Delta = \operatorname{diag}(d_1, d_2, \ldots, d_N)$ is the degree matrix, has the characteristic polynomial $\det\left(\Delta^{-1} A - \lambda I\right) = \frac{\det(A - \lambda\Delta)}{\det\Delta}$, where $\det\Delta = \prod_{j=1}^{N} d_j$. Since the largest eigenvalue of a stochastic matrix equals $\lambda_1 = 1$ (Theorem A.4.2), for a regular graph, where $d_j = r$ for all $j$, the largest adjacency eigenvalue equals $\lambda_1 = r$.

2. Since $a_{ii} = 0$, we have that $\operatorname{trace}(A) = 0$. From (A.7),
$$(-1)^{N-1} c_{N-1} = \sum_{k=1}^{N}\lambda_k = 0 \qquad (B.6)$$

3. The Newton identities for polynomials. Let $p_n(z)$ denote a polynomial of order $n$ defined as
$$p_n(z) = \sum_{k=0}^{n} a_k(n)\, z^k = a_n(n)\prod_{k=1}^{n}\left(z - z_k(n)\right) \qquad (B.7)$$
where $\{z_k(n)\}$ are the $n$ zeros. It follows from (B.7) that $p_n(0) = a_0(n) = a_n(n)\prod_{k=1}^{n}\left(-z_k(n)\right)$. The logarithmic derivative of (B.7) is
$$p_n'(z) = p_n(z)\sum_{k=1}^{n}\frac{1}{z - z_k(n)}$$
For $z > \max_k z_k(n)$ (which is always possible for polynomials, but not for functions), we have that
$$p_n'(z) = p_n(z)\sum_{k=1}^{n}\sum_{j=0}^{\infty}\frac{\left(z_k(n)\right)^j}{z^{j+1}} = p_n(z)\sum_{j=0}^{\infty}\frac{Z_j(n)}{z^{j+1}} \qquad\text{where}\quad Z_j(n) = \sum_{k=1}^{n}\left(z_k(n)\right)^j$$
Thus,
$$\sum_{k=1}^{n} k\, a_k(n)\, z^k = \sum_{k=0}^{n} a_k(n) z^k\sum_{j=0}^{\infty} Z_j(n)\, z^{-j} = \sum_{j=0}^{\infty}\sum_{k=0}^{n} a_k(n)\, Z_j(n)\, z^{k-j} \qquad (B.8)$$
Let $l = k - j$; then $-\infty < l \le n$.
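The Rayleigh bound (B.5) is easily sketched numerically (the graph below is an arbitrary example; $\lambda_1$ is estimated by power iteration, which converges here because the graph is connected and contains a triangle):

```python
# Sketch of (B.5): the largest adjacency eigenvalue lambda_1 of any graph
# is at least the average degree 2L/N (and at most the maximum degree).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (1, 4)]   # arbitrary example
N = 5
A = [[0]*N for _ in range(N)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

x = [1.0]*N
for _ in range(200):                     # power iteration
    y = [sum(A[i][j]*x[j] for j in range(N)) for i in range(N)]
    norm = max(abs(v) for v in y)
    x = [v/norm for v in y]
lam1 = (sum(A[i][j]*x[i]*x[j] for i in range(N) for j in range(N))
        / sum(v*v for v in x))           # Rayleigh quotient at the iterate

L = len(edges)
assert lam1 >= 2*L/N - 1e-9              # (B.5): lambda_1 >= 2L/N = 2.4
assert lam1 <= max(sum(row) for row in A) + 1e-9
```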
Also $j=k-l\geq 0$, such that $k\geq l$. Combined with $0\leq k\leq n$, we have $\max(0,l)\leq k\leq n$. Thus,
$$\sum_{j=0}^{\infty}\sum_{k=0}^{n}a_k(n)Z_j(n)z^{k-j}=\sum_{l=-\infty}^{n}\sum_{k=\max(l,0)}^{n}a_k(n)Z_{k-l}(n)z^l=\sum_{l=-\infty}^{-1}\sum_{k=0}^{n}a_k(n)Z_{k-l}(n)z^l+\sum_{l=0}^{n}\sum_{k=l}^{n}a_k(n)Z_{k-l}(n)z^l$$
Equating the corresponding powers of $z$ in (B.8) yields
$$a_0(n)Z_{-l}(n)+\sum_{k=1}^{n}a_k(n)Z_{k-l}(n)=0,\qquad l<0$$
$$\sum_{k=l+1}^{n}a_k(n)Z_{k-l}(n)=(l-n)a_l(n)$$
The last set of equations, for $0\leq l<n$,
$$a_l(n)=\frac{1}{l-n}\sum_{k=l+1}^{n}a_k(n)Z_{k-l}(n)\qquad\text{(B.9)}$$
are the Newton identities that relate the coefficients of a polynomial to the sums of the positive powers of the zeros. Applied to the characteristic polynomials (A.3) and (A.5) of the adjacency matrix with $z_k(n)=\lambda_k$, $a_k(n)=(-1)^Nc_k$ and $c_{N-1}=0$ (from (B.6)), this yields for the first few values
$$c_{N-2}=-\frac{1}{2}\sum_{k=1}^{N}\lambda_k^2$$
$$c_{N-3}=-\frac{1}{3}\sum_{k=1}^{N}\lambda_k^3$$
$$c_{N-4}=\frac{1}{8}\left(\left(\sum_{k=1}^{N}\lambda_k^2\right)^2-2\sum_{k=1}^{N}\lambda_k^4\right)$$

4. From (A.4), the coefficient of the characteristic polynomial is $c_{N-2}=\sum M_2$, the sum over all principal minors $M_2$. Each principal minor $M_2$ has a principal submatrix of the form $\begin{bmatrix}0&x\\x&0\end{bmatrix}$ with $x\in\{0,1\}$. A minor $M_2$ is non-zero if and only if $x=1$, in which case $M_2=-1$. For each pair of adjacent nodes, there exists such a non-zero minor, which implies that $c_{N-2}=-L$. From art. 3, it follows that the number of links $L$ equals
$$L=\frac{1}{2}\sum_{k=1}^{N}\lambda_k^2\qquad\text{(B.10)}$$

5. Each principal $3\times 3$ submatrix is of the form
$$M_{3\times 3}=\begin{bmatrix}0&x&z\\x&0&y\\z&y&0\end{bmatrix}$$
with determinant $M_3=\det M_{3\times 3}=2xyz$, which is only non-zero for $x=y=z=1$. That form of $M_{3\times 3}$ corresponds to a subgraph of 3 nodes that are fully connected. Hence, $c_{N-3}=-2\times$ the number of triangles in $G$. From art. 3, it follows that
$$\text{the number of triangles in }G=\frac{1}{6}\sum_{k=1}^{N}\lambda_k^3\qquad\text{(B.11)}$$

6.
In general, from (A.4) and by identifying the structure of a minor $M_k$, any coefficient $c_{N-k}$ can be expressed in terms of graph characteristics,
$$(-1)^Nc_{N-k}=\sum_{G'\in\mathcal{G}_k}(-1)^{\mathrm{cycles}(G')}\qquad\text{(B.12)}$$
where $\mathcal{G}_k$ is the set of all subgraphs of $G$ with exactly $k$ nodes and $\mathrm{cycles}(G')$ is the number of cycles in a subgraph $G'\in\mathcal{G}_k$. The minor $M_k$ is the determinant of a $k\times k$ principal submatrix of $A$, defined as
$$\det M_k=\sum_{p}(-1)^{\sigma(p)}a_{1p_1}a_{2p_2}\cdots a_{kp_k}$$
where the sum is over all $k!$ permutations $p=(p_1,p_2,\ldots,p_k)$ of $(1,2,\ldots,k)$ and $\sigma(p)$ is the parity of $p$, i.e. the number of interchanges of $(1,2,\ldots,k)$ needed to obtain $(p_1,p_2,\ldots,p_k)$. Only if all the links $(1,p_1),(2,p_2),\ldots,(k,p_k)$ are contained in $G$ is $a_{1p_1}a_{2p_2}\cdots a_{kp_k}$ non-zero. Since $a_{jj}=0$, the sequence of contributing links $(1,p_1),(2,p_2),\ldots,(k,p_k)$ is a set of disjoint cycles, and $\sigma(p)$ depends on the number of those disjoint cycles. Now, $\det M_k$ is constructed from a specific set $G'\in\mathcal{G}_k$ of $k$ out of $N$ nodes and, in total, there are $\binom{N}{k}$ such sets in $\mathcal{G}_k$. Combining all contributions leads to the expression (B.12).

7. Since $A$ is a symmetric 0-1 matrix, we observe, using (B.1), that
$$\left(A^2\right)_{ii}=\sum_{k=1}^{N}a_{ik}a_{ki}=\sum_{k=1}^{N}a_{ik}^2=\sum_{k=1}^{N}a_{ik}=d_i$$
Hence, with (A.16) or (B.10), (A.7) and the basic law for the degree (B.2) is expressed as
$$\mathrm{trace}(A^2)=\sum_{k=1}^{N}\lambda_k^2=\sum_{k=1}^{N}d_k=2L\qquad\text{(B.13)}$$
Furthermore,
$$\sum_{i=1}^{N}\sum_{j=1;j\neq i}^{N}\left(A^2\right)_{ij}=\sum_{i=1}^{N}\sum_{j=1;j\neq i}^{N}\sum_{k=1}^{N}a_{ki}a_{kj}=\sum_{k=1}^{N}\sum_{i=1}^{N}a_{ki}(d_k-a_{ki})=\sum_{k=1}^{N}\left(d_k\sum_{i=1}^{N}a_{ki}-\sum_{i=1}^{N}a_{ki}^2\right)$$
or
$$\sum_{i=1}^{N}\sum_{j=1;j\neq i}^{N}\left(A^2\right)_{ij}=\sum_{k=1}^{N}d_k(d_k-1)\qquad\text{(B.14)}$$
Lemma B.1.1 states that $\sum_{i=1}^{N}\sum_{j=1;j\neq i}^{N}\left(A^2\right)_{ij}$ equals twice the total number of two-hop walks with different source and destination nodes. In other words, the total number of connected triplets of nodes in $G$ equals half of (B.14).

8.
The total number $N_k$ of walks of length $k$ in a graph follows from Lemma B.1.1 as
$$N_k=\sum_{i=1}^{N}\sum_{j=1}^{N}\left(A^k\right)_{ij}$$
Since any real symmetric matrix (Section A.2, art. 9) can be written as $A=U\,\mathrm{diag}(\lambda_j)\,U^T$, where $U$ is an orthogonal matrix of the (normalized) eigenvectors of $A$, we have that $A^k=U\,\mathrm{diag}(\lambda_j^k)\,U^T$ and $\left(A^k\right)_{ij}=\sum_{n=1}^{N}u_{in}u_{jn}\lambda_n^k$. Hence,
$$N_k=\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{n=1}^{N}u_{in}u_{jn}\lambda_n^k=\sum_{n=1}^{N}\lambda_n^k\left(\sum_{i=1}^{N}u_{in}\right)^2$$

9. Applying the Hadamard inequality for the determinant of any matrix $C_{n\times n}$,
$$|\det C|\leq\prod_{j=1}^{n}\left(\sum_{i=1}^{n}|c_{ij}|^2\right)^{\frac{1}{2}}$$
yields, with $a_{ij}=a_{ji}$ and (B.1),
$$|\det A|\leq\prod_{j=1}^{N}\left(\sum_{i=1}^{N}a_{ij}^2\right)^{\frac{1}{2}}=\prod_{j=1}^{N}\left(\sum_{i=1}^{N}a_{ij}\right)^{\frac{1}{2}}=\prod_{j=1}^{N}\sqrt{d_j}$$
Hence, with (A.6),
$$(\det A)^2=\prod_{k=1}^{N}\lambda_k^2\leq\prod_{j=1}^{N}d_j\qquad\text{(B.15)}$$

10. Applying the Cauchy-Schwarz inequality (5.17),
$$\left(\sum_{k=1}^{n}a_kb_k\right)^2\leq\sum_{k=1}^{n}a_k^2\sum_{k=1}^{n}b_k^2$$
to the vector $(\lambda_2,\ldots,\lambda_N)$ and the vector $(1,1,\ldots,1)$ gives
$$\left(\sum_{k=2}^{N}\lambda_k\right)^2\leq(N-1)\sum_{k=2}^{N}\lambda_k^2$$
Introducing (B.6) and (B.13),
$$\lambda_1^2\leq(N-1)\left(2L-\lambda_1^2\right)$$
leads to the bound for the largest (and positive) eigenvalue $\lambda_1$,
$$\lambda_1\leq\sqrt{\frac{2L(N-1)}{N}}\qquad\text{(B.16)}$$
Alternatively, in terms of the average degree $d_a=\frac{1}{N}\sum_{j=1}^{N}d_j=\frac{2L}{N}$, the largest eigenvalue $\lambda_1$ is bounded by the geometric mean of the average degree and the maximum possible degree, $\lambda_1\leq\sqrt{d_a(N-1)}$. Combining the lower bound (B.5) and the upper bound (B.16) yields
$$\frac{2L}{N}\leq\lambda_1\leq\sqrt{\frac{2L(N-1)}{N}}\qquad\text{(B.17)}$$

11. From the inequality (A.26) for Hölder $q$-norms, we find that, if
$$\sum_{k=1}^{N}|\lambda_k|^q<\lambda^q$$
then
$$\sum_{k=1}^{N}|\lambda_k|^p<\lambda^p$$
for $p>q>0$. Since $\sum_{k=1}^{N}\lambda_k=0$, not all $\lambda_k$ can be positive and, combined with $\left|\sum_{k=1}^{N}\lambda_k^p\right|\leq\sum_{k=1}^{N}|\lambda_k|^p$, we also have that $\left|\sum_{k=1}^{N}\lambda_k^p\right|<\lambda^p$. Applied to the case where $q=2$ and $p=3$, this gives the following implication: if $\sum_{k=2}^{N}\lambda_k^2<\lambda_1^2$, then $\left|\sum_{k=2}^{N}\lambda_k^3\right|<\lambda_1^3$. In that case, the number of triangles given in (B.11) is
$$\text{the number of triangles in }G=\frac{1}{6}\lambda_1^3+\frac{1}{6}\sum_{k=2}^{N}\lambda_k^3\geq\frac{1}{6}\lambda_1^3-\frac{1}{6}\left|\sum_{k=2}^{N}\lambda_k^3\right|>0$$
Hence, if $\sum_{k=2}^{N}\lambda_k^2<$
$\lambda_1^2$, then the number of triangles in $G$ is at least one. Equivalently, in view of (B.10), if $\lambda_1>\sqrt{L}$, then the graph $G$ contains at least one triangle.

12. A theorem of Turán states that

Theorem B.2.1 A graph $G$ with $N$ nodes and more than $\left\lfloor\frac{N^2}{4}\right\rfloor$ links contains at least one triangle.

This theorem is a consequence of the lower bound (B.5) and art. 11. For, using $L>\frac{N^2}{4}$, which is equivalent to $N<2\sqrt{L}$, in the bound (B.5) on the largest eigenvalue,
$$\lambda_1\geq\frac{2L}{N}>\frac{2L}{2\sqrt{L}}=\sqrt{L}$$
and $\lambda_1>\sqrt{L}$ is precisely the condition in art. 11 to have at least one triangle.

13. The eigenvalues of the complete graph $K_N$ are $\lambda_1=N-1$ and $\lambda_2=\cdots=\lambda_N=-1$. This follows by computing the determinant in (A.2) in the same way as in Section B.1, art. 3. Alternatively, the adjacency matrix of the complete graph is $J-I$ and, if $u^T=[1\ 1\ \cdots\ 1]$ is the all-one vector, then $J=u\cdot u^T$. A direct computation yields
$$\det(J-I-\lambda I)=\det\left(u\cdot u^T-(\lambda+1)I\right)=\left(-(\lambda+1)\right)^N\det\left(I-\frac{u\cdot u^T}{\lambda+1}\right)$$
Using (A.38) and $u^Tu=N$,
$$\det(J-I-\lambda I)=\left(-(\lambda+1)\right)^N\left(1-\frac{N}{\lambda+1}\right)=(-1)^{N-1}(\lambda+1)^{N-1}(N-1-\lambda)$$
which gives the eigenvalues of $K_N$. Since the number of links in $K_N$ is $L=\frac{N(N-1)}{2}$, we observe that the equality sign in (B.16) can occur. Since $L\leq\frac{N(N-1)}{2}$ for any graph, the upper bound (B.16) shows that $\lambda_1\leq N-1$ for any graph.

14. The difference between the largest eigenvalue $\lambda_1$ and the second largest $\lambda_2$ is never larger than $N$, i.e.
$$\lambda_1-\lambda_2\leq N\qquad\text{(B.18)}$$
Since $\lambda_1>0$ as indicated by (B.17), it follows from (B.6) that $\sum_{k=2}^{N}\lambda_k=-\lambda_1$. Since $\lambda_2$ is the largest among $\lambda_2,\ldots,\lambda_N$, it is at least equal to their average, such that
$$\lambda_2\geq-\frac{\lambda_1}{N-1}$$
Hence,
$$\lambda_1-\lambda_2\leq\lambda_1+\frac{\lambda_1}{N-1}=\frac{N}{N-1}\lambda_1\leq\frac{N}{N-1}(N-1)=N$$
Art. 13 states that the largest possible eigenvalue, $\lambda_1=N-1$, is attained by the complete graph, which proves (B.18). Again, the equality sign in (B.18) occurs in the case of the complete graph.

15. Regular graphs. Every node $j$ in a regular graph has the same degree $d_j=r$, and relation (B.1) indicates that each row sum of $A$ equals $r$.
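Several of the identities and bounds above are easy to verify numerically. The sketch below (the two test graphs, $K_4$ minus one edge and $K_5$, are our own illustrative choices) checks (B.10), (B.11), the bounds (B.17), and the spectrum of the complete graph from art. 13.

```python
import numpy as np

# K4 minus the edge (2,3): 5 links and 2 triangles.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
lam = np.linalg.eigvalsh(A)            # real eigenvalues, ascending
N = len(A)

L = 0.5 * np.sum(lam ** 2)             # (B.10): L = (1/2) sum lambda_k^2
triangles = np.sum(lam ** 3) / 6.0     # (B.11): number of triangles
lam1 = lam[-1]                         # largest eigenvalue

# (B.17): 2L/N <= lambda_1 <= sqrt(2L(N-1)/N)
assert 2 * L / N <= lam1 <= np.sqrt(2 * L * (N - 1) / N)

# Art. 13: K5 has eigenvalues 4, -1, -1, -1, -1.
K5 = np.ones((5, 5)) - np.eye(5)
mu = np.linalg.eigvalsh(K5)
print(round(L), round(triangles))      # 5 2
```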
Theorem B.2.2 The maximum degree $d_{\max}=\max_{1\leq j\leq N}d_j$ is an eigenvalue of the adjacency matrix $A$ of a connected graph $G$ if and only if the graph is regular (i.e. $d_j=d_{\max}$ for all $j$).

Proof: If $x$ is an eigenvector of $A$ belonging to the eigenvalue $\lambda=d_{\max}$, so is each vector $kx$ for each complex $k$ (Section A.1, art. 1). Thus, we can scale the eigenvector $x$ such that the maximum component, say, $x_m=1$ and $x_k\leq 1$ for all $k$. The eigenvalue equation $Ax=d_{\max}x$ for that maximum component $x_m$ is
$$d_{\max}x_m=d_{\max}=\sum_{j=1}^{N}a_{mj}x_j$$
which implies that $x_j=1$ whenever $a_{mj}=1$, i.e. when node $j$ is adjacent to node $m$. Hence, the degree of node $m$ is $d_m=d_{\max}$. For any node $j$ adjacent to $m$, for which the component $x_j=1$, the same eigenvalue relation holds and thus $d_j=d_{\max}$. Repeating this process shows that every node $k\in G$ has the same degree $d_k=d_{\max}$, because $G$ is connected. Hence, $x=u$, where $u^T=[1\ 1\ \cdots\ 1]$. Conversely, if $G$ is connected and regular, then $\sum_{j=1}^{N}a_{mj}=d_{\max}$ for each $m$, such that $u$ is the eigenvector belonging to the eigenvalue $\lambda=d_{\max}$, and the only possible eigenvector (as follows from art. 1). Hence, there is only one eigenvalue $d_{\max}$. $\Box$

16. The characteristic polynomial of the complement $G^c$ is
$$\det(A^c-\lambda I)=\det\left(J-A-(\lambda+1)I\right)$$
$$=(-1)^N\det\left(\left(A+(\lambda+1)I\right)\left(I-\left(A+(\lambda+1)I\right)^{-1}J\right)\right)$$
$$=(-1)^N\det\left(A+(\lambda+1)I\right)\det\left(I-\left(A+(\lambda+1)I\right)^{-1}u\cdot u^T\right)$$
where we have used that $J=u\cdot u^T$ and $u$ is the all-one vector. Similarly to the proof of Lemma A.4.4, we find
$$\det(A^c-\lambda I)=(-1)^N g(\lambda)\det\left(A+(\lambda+1)I\right)\qquad\text{(B.19)}$$
where
$$g(\lambda)=1-u^T\left(A+(\lambda+1)I\right)^{-1}u$$
In general, $g(\lambda)$ is not a simple function of $\lambda$, although a little more is known. For example, $g(\lambda)=1-\left\|\left(A+(\lambda+1)I\right)^{-\frac{1}{2}}u\right\|_2^2$, which shows that $g(\lambda)\in(-\infty,1]$. Unlike in the proof of Lemma A.4.4, $u$ is generally not an eigenvector of $A$ and we can write (Section A.1, art.
8)
$$\left(A+(\lambda+1)I\right)^{-1}=\frac{1}{\lambda+1}\sum_{k=0}^{\infty}\frac{(-1)^kA^k}{(\lambda+1)^k}$$
where the last sum $\sum_{k=0}^{\infty}A^kz^k$ can be interpreted as the matrix generating function of the number of walks of length $k$ (see Section B.1, art. 5 and art. 8). Since $A^k=U\,\mathrm{diag}\left(\lambda_j^k\right)U^T$ (Section A.2, art. 9), where the orthogonal matrix $U$ consists of the eigenvectors $x_j$ of $A$, the matrix product
$$u^TU=\left[u^Tx_1\ \ u^Tx_2\ \cdots\ u^Tx_N\right]=\sqrt{N}\left[\cos\theta_1\ \ \cos\theta_2\ \cdots\ \cos\theta_N\right]$$
where $\theta_j$ is the angle between the eigenvector $x_j$ and the all-one vector $u$. Hence, $u^TA^ku=u^TU\,\mathrm{diag}\left(\lambda_j^k\right)U^Tu=N\sum_{j=1}^{N}\lambda_j^k\cos^2\theta_j$ and, with $\sum_{k=0}^{\infty}\frac{(-\lambda_j)^k}{(\lambda+1)^{k+1}}=\frac{1}{\lambda+1+\lambda_j}$, we can write
$$g(\lambda)=1-N\sum_{j=1}^{N}\frac{\cos^2\theta_j}{\lambda+1+\lambda_j}$$
With (A.5), we have $c_A(-\lambda-1)=\det\left(A+(\lambda+1)I\right)=\prod_{k=1}^{N}(\lambda_k+1+\lambda)$ and, hence,
$$\det(A^c-\lambda I)=\frac{(-1)^N}{N}\sum_{j=1}^{N}\left(\lambda+1+\lambda_j-N^2\cos^2\theta_j\right)\prod_{k=1;k\neq j}^{N}(\lambda_k+1+\lambda)\qquad\text{(B.20)}$$
which shows that the poles of $g(\lambda)$ are precisely compensated by the zeros of the polynomial $c_A(-\lambda-1)$. Thus, the eigenvalues of $A^c$ are generally different from $\{-\lambda_j-1\}_{1\leq j\leq N}$, where $\lambda_j$ is an eigenvalue of $A$. Only if $u$ is an eigenvector of $A$ corresponding to $\lambda_k$, then $g(\lambda)=\frac{\lambda+1+\lambda_k-N}{\lambda+1+\lambda_k}$ and all eigenvalues of $A^c$ belong to the set $\{-\lambda_j-1\}_{1\leq j\neq k\leq N}\cup\{N-1-\lambda_k\}$. According to art. 15, $u$ is only an eigenvector when the graph is regular.

B.3 The stochastic matrix $P=\Delta^{-1}A$

The stochastic matrix $P=\Delta^{-1}A$, introduced in Section B.2, art. 1, characterizes a random walk on a graph. A random walk is described by a finite Markov chain that is time-reversible. Alternatively, a time-reversible Markov chain can be viewed as a random walk on an undirected graph. Random walks on graphs have many applications in different fields (see e.g. the survey by Lovász (1993)); perhaps the most important application is randomly searching or sampling. The combination of Markov theory and algebra leads to interesting properties of $P=\Delta^{-1}A$.
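As a numerical sketch of what follows: the steady-state vector of this random walk turns out to be $\pi_j=d_j/(2L)$, equation (B.21) below. The code checks this on a small graph of our own choosing (a triangle with one pendant node); the graph is an illustrative assumption, not from the book.

```python
import numpy as np

# Random walk on a 4-node graph: triangle 0-1-2 plus pendant node 3 on 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)            # degree vector
L = d.sum() / 2              # number of links
P = A / d[:, None]           # P = Delta^{-1} A, row-stochastic

pi = d / (2 * L)             # claimed steady state: pi_j = d_j / (2L)
assert np.allclose(pi @ P, pi)              # left eigenvector for mu = 1
assert np.allclose(P @ np.ones(4), np.ones(4))  # right eigenvector u
print(pi)
```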
Sections 9.3.1 and A.4.1 show that the left-eigenvector of $P$ belonging to the eigenvalue $\mu=1$ is the steady-state vector $\pi$ (which is a $1\times N$ row vector) and that the corresponding right-eigenvector is the all-one vector $u$, which essentially follows from (9.8) and which indicates that, at each discrete time step, precisely one transition occurs. These eigenvectors obey the eigenvalue equations $\pi P=\pi$ and $Pu=u$ and the orthogonality relation $\pi u=1$ (Section A.1, art. 3). If $d=(d_1,d_2,\ldots,d_N)$ is the degree vector, then the basic law for the degree (B.2) is written in vector form as $d^Tu=2L$ or $\left(\frac{d}{2L}\right)^Tu=1$. Theorem 9.3.5 states that the steady-state eigenvector is unique, such that the equations $\pi u=1$ and $\left(\frac{d}{2L}\right)^Tu=1$ imply that the steady-state vector is
$$\pi=\left(\frac{d}{2L}\right)^T\qquad\text{or}\qquad\pi_j=\frac{d_j}{2L}\qquad\text{(B.21)}$$
In general, the matrix $P$ is not symmetric but, after a similarity transform $H=\Delta^{1/2}$, a symmetric matrix
$$R=\Delta^{1/2}P\Delta^{-1/2}=\Delta^{-1/2}A\Delta^{-1/2}$$
is obtained whose eigenvalues are the same as those of $P$ (Section A.1, art. 4). The powerful property (Section A.2, art. 9) of symmetric matrices shows that all eigenvalues are real and that $R=X^T\mathrm{diag}(\mu_k)X$, where the columns of the orthogonal matrix $X$ consist of the normalized eigenvectors $v_k$ that obey $v_j^Tv_k=\delta_{jk}$. Explicitly written in terms of these eigenvectors,
$$R=\sum_{k=1}^{N}\mu_kv_kv_k^T$$
where, with the Perron-Frobenius Theorem A.4.2, the real eigenvalues are ordered as $1=\mu_1\geq\mu_2\geq\cdots\geq\mu_N\geq-1$. If we exclude bipartite graphs (where the set of nodes is $\mathcal{N}=\mathcal{N}_1\cup\mathcal{N}_2$ with $\mathcal{N}_1\cap\mathcal{N}_2=\emptyset$ and where each link connects a node in $\mathcal{N}_1$ and a node in $\mathcal{N}_2$) and reducible Markov chains (Section A.4), then $|\mu_k|<1$ for $k>1$. Section A.1, art.
4 shows that the similarity transform $H=\Delta^{1/2}$ maps the steady-state vector into
$$v_1=\frac{\Delta^{-1/2}\pi^T}{\left\|\Delta^{-1/2}\pi^T\right\|_2}$$
and, with (B.21),
$$v_{1j}=\frac{\sqrt{d_j}}{\sqrt{\sum_{j=1}^{N}d_j}}=\sqrt{\frac{d_j}{2L}}=\sqrt{\pi_j}$$
Finally, since $P=\Delta^{-1/2}R\Delta^{1/2}$, the spectral decomposition of the transition probability matrix of a random walk on a graph with adjacency matrix $A$ is
$$P=\sum_{k=1}^{N}\mu_k\Delta^{-1/2}v_kv_k^T\Delta^{1/2}=u\pi+\sum_{k=2}^{N}\mu_k\Delta^{-1/2}v_kv_k^T\Delta^{1/2}$$
The $n$-step transition probability (9.10) is, with $\left(v_kv_k^T\right)_{ij}=v_{ki}v_{kj}$ and (B.21),
$$P_{ij}^n=\frac{d_j}{2L}+\sqrt{\frac{d_j}{d_i}}\sum_{k=2}^{N}\mu_k^nv_{ki}v_{kj}$$
The convergence towards the steady state $\pi_j$ can be estimated from
$$\left|P_{ij}^n-\pi_j\right|\leq\sqrt{\frac{d_j}{d_i}}\sum_{k=2}^{N}|\mu_k|^n|v_{ki}||v_{kj}|<\sqrt{\frac{d_j}{d_i}}\sum_{k=2}^{N}|\mu_k|^n$$
Denoting by $\mu=\max(|\mu_2|,|\mu_N|)$ and by $\mu'$ the largest element of the reduced set $\{|\mu_k|\}\setminus\{\mu\}$ with $2\leq k\leq N$, we obtain
$$\left|P_{ij}^n-\pi_j\right|<\sqrt{\frac{d_j}{d_i}}\,\mu^n+O\left(\mu'^n\right)$$

B.4 Eigenvalues and connectivity

A graph $G$ has $k$ components (or clusters) if there exists a relabeling of the nodes such that the adjacency matrix has the structure
$$A=\begin{bmatrix}A_1&O&\cdots&O\\O&A_2&&\vdots\\\vdots&&\ddots&O\\O&\cdots&O&A_k\end{bmatrix}$$
where the square submatrix $A_m$ is the adjacency matrix of the connected component $m$. Disconnectivity is a special case of reducibility of a stochastic matrix, defined in Section A.4, and expresses that no communication is possible between two states in different components or clusters. Using (A.39) indicates that
$$\det(A-\lambda I)=\prod_{m=1}^{k}\det(A_m-\lambda I)\qquad\text{(B.22)}$$
If $A$ is the adjacency matrix of a regular graph with degree $r$, so is each submatrix $A_m$. Since $A_m$ is connected, Section B.2, art. 15 states that the largest eigenvalue of any $A_m$ equals $r$. Hence, by (B.22), the multiplicity of the largest eigenvalue of $A$ equals the number of components in the regular graph. As shown in Section B.1, art. 2, the Laplacian $Q=\Delta-A$ has non-negative eigenvalues, of which at least one equals zero. In addition, the matrix $(N-1)I-Q=(N-1)I-\Delta+A$ is non-negative with constant row sums, all equal to $N-1$.
Although the matrix $(N-1)I-Q$ is not an adjacency matrix and does not represent a regular graph, the main argument in the proof of Theorem B.2.2 is the property of constant row sums and non-negative matrix elements. Hence, the multiplicity of the largest eigenvalue of $(N-1)I-Q$ is equal to the number of components of $G$. But the largest eigenvalue of $(N-1)I-Q$ corresponds to the smallest eigenvalue of $Q-(N-1)I$ and also of $Q$. Hence, we have proved

Theorem B.4.1 The multiplicity of the smallest eigenvalue $\mu=0$ of the Laplacian $Q$ is equal to the number of components in the graph $G$.

If $Q$ has only one zero eigenvalue with corresponding eigenvector $u$ (because $\sum_{k=1}^{N}q_{ik}=0$ for each $1\leq i\leq N$ is, in vector notation, $Qu=0$), then the graph is connected; it has only one component. Theorem B.4.1 also implies that, if the second smallest eigenvalue $\alpha(Q)=\mu_{N-1}(Q)$ of $Q$ is zero, the graph $G$ is disconnected. Since all eigenvectors of a symmetric matrix are linearly independent, the eigenvector $x_\alpha$ belonging to $\alpha(Q)$ must satisfy $x_\alpha^Tu=0$, since $u$ is the eigenvector belonging to $\mu=0$. By requiring this additional constraint and choosing the scaling of the eigenvector such that $x^Tx=1$, we obtain, similarly to (A.35),
$$\alpha(Q)=\min_{\|x\|_2^2=1\ \text{and}\ x^Tu=0}x^TQx$$
The second smallest eigenvalue $\alpha(Q)$ has many interesting properties that characterize how strongly a graph $G$ is connected. It is interesting to mention the inequality (Cvetkovic et al., 1995, p. 265)
$$2\kappa_e(G)\left(1-\cos\frac{\pi}{N}\right)\leq\alpha(Q)\leq\kappa_v(G)\qquad\text{(B.23)}$$
where $\kappa_v(G)$ and $\kappa_e(G)$ are the vertex and edge connectivity, respectively.

B.5 Random matrix theory

Random matrix theory investigates the eigenvalues of an $N\times N$ matrix $A$ whose elements $a_{ij}$ are random variables with a given joint distribution. Even in the case where all elements $a_{ij}$ are independent, there does not exist a general expression for the distribution of the eigenvalues. However, in some particular cases (such as Gaussian elements $a_{ij}$), nice results exist.
Moreover, if the elements $a_{ij}$ are properly scaled, in various cases the spectrum in the limit $N\to\infty$ seems to converge rapidly to a deterministic limit distribution. The fascinating results of random matrix theory and applications, from nuclear physics to the distribution of the non-trivial zeros of the Riemann Zeta function, are discussed by Mehta (1991). Random matrix theory immediately applies to the adjacency matrix of the random graph $G_p(N)$, where each element $a_{ij}$ is 1 with probability $p$ and zero with probability $1-p$.

B.5.1 The spectrum of the random graph $G_p(N)$

Let $\lambda$ denote an arbitrary eigenvalue of the adjacency matrix of the random graph $G_p(N)$. Clearly, $\lambda$ is a random variable with mean $E[\lambda]=\frac{1}{N}\sum_{k=1}^{N}\lambda_k=0$ because of (B.6). In addition, the variance is $\mathrm{Var}[\lambda]=E[\lambda^2]=\frac{1}{N}\sum_{k=1}^{N}\lambda_k^2$ and, from (B.10),
$$\mathrm{Var}[\lambda]=\frac{2L}{N}=p(N-1)\ \text{on average}$$
This result implies that, for fixed $p$ and large $N$, the eigenvalues of $G_p(N)$ grow as $O\left(\sqrt{N}\right)$, with the exception¹ of the largest eigenvalue $\lambda_1$. The number of links $L$ in $G_p(N)$ is binomially distributed with mean $E[L]=p\frac{N(N-1)}{2}$. Taking the expectation of the bounds (B.17) on the largest eigenvalue gives
$$\frac{2}{N}E[L]\leq E[\lambda_1]\leq E\left[\sqrt{\frac{2(N-1)}{N}L}\right]$$
Using (2.12) yields
$$E\left[\sqrt{\frac{2(N-1)}{N}L}\right]=\sum_{k=0}^{\binom{N}{2}}\sqrt{\frac{2(N-1)}{N}k}\,\binom{\binom{N}{2}}{k}p^k(1-p)^{\binom{N}{2}-k}$$
Unfortunately, the sum cannot be expressed in closed form, but
$$\sum_{k=0}^{\binom{N}{2}}\sqrt{\frac{2(N-1)}{N}k}\,\binom{\binom{N}{2}}{k}p^k(1-p)^{\binom{N}{2}-k}\leq\sqrt{\frac{2(N-1)}{N}E[L]}=\sqrt{p}\,(N-1)$$
with equality for $N\to\infty$. In summary, for any $N$ and $p$,
$$p(N-1)\leq E[\lambda_1]\leq\sqrt{p}\,(N-1)\qquad\text{(B.24)}$$
The degree distribution (15.11) of the random graph is a binomial distribution with mean $E[D_{rg}]=p(N-1)$ and $\mathrm{Var}[D_{rg}]=(N-1)p(1-p)$. The inequality (5.13) indicates that the probability that the degree $D_{rg}$ deviates from the mean $E[D_{rg}]$ converges exponentially fast to zero for fixed $p$ and large $N$, which means that the random graph tends to a regular graph with high probability. Section B.2, art. 1 then states that $\lambda_1\to p(N-1)$ with high probability.
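A small Monte Carlo sketch of (B.24) and of the concentration of $\lambda_1$ around $p(N-1)$; the parameters $N$, $p$ and the number of realizations are arbitrary illustrative choices.

```python
import numpy as np

# Sample adjacency matrices of G_p(N) and average the largest eigenvalue.
rng = np.random.default_rng(0)
N, p, runs = 100, 0.4, 20
lam1 = []
for _ in range(runs):
    U = rng.random((N, N)) < p       # upper-triangular coin flips
    A = np.triu(U, 1)
    A = (A + A.T).astype(float)      # symmetric 0-1 adjacency matrix
    lam1.append(np.linalg.eigvalsh(A)[-1])
mean_lam1 = np.mean(lam1)

# (B.24): p(N-1) <= E[lambda_1] <= sqrt(p)(N-1)
assert p * (N - 1) <= mean_lam1 <= np.sqrt(p) * (N - 1)
print(mean_lam1 / (p * (N - 1)))     # ratio close to 1
```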
Comparison with the bounds (B.24) indicates that the upper bound is less tight than the lower bound and that the upper bound is only sharp when $p\to 1$, i.e. for the complete graph. Section B.2, art. 13 shows that only for the complete graph is the upper bound indeed exactly attained.

¹ It is known that, for large $N$, the second largest eigenvalue of $G_p(N)$ grows as $O\left(N^{\frac{1}{2}+\epsilon}\right)$.

B.5.2 Wigner's Semicircle Law

Wigner's Semicircle Law is the fundamental result in the spectral theory of large random matrices.

Theorem B.5.1 (Wigner's Semicircle Law) Let $A$ be a random $N\times N$ real symmetric matrix with independent and identically distributed elements $a_{ij}$ with $\sigma^2=\mathrm{Var}[a_{ij}]$, and denote by $\lambda(A_N)$ an eigenvalue of the set of the $N$ real eigenvalues of the scaled matrix $A_N=\frac{A}{\sqrt{N}}$. The probability density function $f_{\lambda(A_N)}(x)$ of $\lambda(A_N)$ tends, for $N\to\infty$, to
$$\lim_{N\to\infty}f_{\lambda(A_N)}(x)=\frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2-x^2}\;1_{|x|\leq 2\sigma}\qquad\text{(B.25)}$$

Since Wigner's first proof (Wigner, 1955) of this Theorem and his subsequent generalizations (Wigner, 1957, 1958), many proofs have been published. However, none of them is short and easy enough to include here. Wigner's Semicircle Law illustrates that, for sufficiently large $N$, the distribution of the eigenvalues of $\frac{A}{\sqrt{N}}$ no longer depends on the probability distribution of the elements $a_{ij}$. Hence, Wigner's Semicircle Law exhibits a universal property of a class of large, real symmetric matrices with independent random elements. Mehta (1991) suspects that, for a much broader class of large random matrices, a mysterious, yet unknown law of large numbers must be hidden. The scaling of $A$ by $\frac{1}{\sqrt{N}}$ can be understood from the previous Section B.5.1. The adjacency matrix of the random graph satisfies the conditions of Theorem B.5.1 with $\sigma^2=p(1-p)$, and its eigenvalues (apart from the largest) grow as $O\left(\sqrt{N}\right)$. In order to obtain the finite limit distribution (B.25), scaling by $\frac{1}{\sqrt{N}}$ is necessary.
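A quick numerical illustration of Theorem B.5.1 (our own sketch, with $\pm 1$ entries so that $\sigma^2=1$; the matrix size and the zero diagonal are illustrative choices): the eigenvalues of $A/\sqrt{N}$ should concentrate on the support $[-2\sigma,2\sigma]=[-2,2]$ of the semicircle.

```python
import numpy as np

# Random +-1 symmetric matrix with zero diagonal, off-diagonal variance 1.
rng = np.random.default_rng(1)
N = 400
X = rng.choice([-1.0, 1.0], size=(N, N))
A = np.triu(X, 1)
A = A + A.T                          # symmetric, Var[a_ij] = 1 off-diagonal

lam = np.linalg.eigvalsh(A / np.sqrt(N))   # eigenvalues of the scaled matrix
inside = np.mean(np.abs(lam) <= 2.05)      # fraction within [-2sigma, 2sigma]
print(inside)                               # close to 1 for large N
```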
The spectrum of $G_p(50)$, together with the properly rescaled Wigner Semicircle Law (B.25), is plotted in Fig. B.2. Already for this small value of $N$, we observe that Wigner's Semicircle Law is a reasonable approximation for the intermediate $p$-region. The largest eigenvalue $\lambda_1$ for finite $N$, which is distributed around $p(N-1)$ as demonstrated above and shown in Fig. B.2, but which is not incorporated in Wigner's Semicircle Law, influences the average $E[\lambda]=\frac{1}{N}\sum_{k=1}^{N}\lambda_k=0$ and causes the major bulk of the pdf around $x=0$ to shift leftward compared to Wigner's Semicircle Law, which is perfectly centered around $x=0$. The complement of $G_p(N)$ is $(G_p(N))^c=G_{1-p}(N)$, because a link in $G_p(N)$ is present with probability $p$ and absent with probability $1-p$; hence $(G_p(N))^c$ is also a random graph. For large $N$, there exists a large range of $p$ values for which both $p$ and $1-p$ exceed the connectivity threshold $p_c$, such that both $G_p(N)$ and $(G_p(N))^c$ are connected almost surely.

Fig. B.2. The probability density function of an eigenvalue in $G_p(50)$ for $p=0.1,0.2,\ldots,0.9$. Wigner's Semicircle Law, rescaled and for $p=0.5$ ($\sigma^2=\frac{1}{4}$), is shown in bold. We observe that the spectra for $p$ and $1-p$ are similar, but slightly shifted. The high peak for $p=0.1$ reflects disconnectivity, while the high peak at $p=0.9$ shows the tendency towards the spectrum of the complete graph, where $N-1$ eigenvalues are precisely $-1$.

Figure B.2 shows that the normalized spectra of $G_p(N)$ and $G_{1-p}(N)$ are, apart from a small shift and ignoring the largest eigenvalue, almost identical. Equation (B.20) indicates that the spectrum of a graph and that of its complement tend to each other if $\cos\theta_j\to 0$ (except for the largest eigenvalue, whose eigenvector will tend to $u$).
This seems to suggest that $G_p(N)$ and $G_{1-p}(N)$ tend to regular graphs with degree $p(N-1)$ and $(1-p)(N-1)$, respectively, and that these regular graphs (even for small $N$) have nearly the same spectrum, apart from the largest eigenvalues $p(N-1)$ and $(1-p)(N-1)$, respectively: $\frac{\lambda_{1-p}}{\sqrt{N}}\simeq-\frac{\lambda_p}{\sqrt{N}}$, where $\lambda_p$ is an eigenvalue of $G_p(N)$. Figure B.3 shows the probability density function $f_\lambda(x)$ of the eigenvalues of the adjacency matrix $A$ of $G_p(N)$ with $N=100$, together with the eigenvalues of the corresponding matrix $A_U$, in which all one-elements of the adjacency matrix of $G_p(100)$ are replaced by i.i.d. uniform random variables on $[0,1]$.

Fig. B.3. The spectrum of the adjacency matrix of $G_p(100)$ (full lines) and of the corresponding matrix with i.i.d. uniform elements (dotted lines), for $p=0.2,\ldots,0.9$. The small peaks at higher values of $x$ are due to $\lambda_1$.

Wigner's Semicircle Law provides an already better approximation than for $N=50$. Since the elements of $A_U$ are (with probability 1) smaller than those of $A$, the matrix norm $\|A_U\|_2<\|A\|_2$, which implies by Section B.2, art. 1 that $\lambda_1(A_U)<\lambda_1(A)$. In addition, relation (B.13) shows that $\sum_{k=1}^{N}\lambda_k^2(A_U)<2L$, such that $\mathrm{Var}[\lambda(A_U)]<\mathrm{Var}[\lambda(A)]$, which is manifested by a narrower and higher peaked pdf centered around $x=0$.

Appendix C
Solutions of problems

C.1 Probability theory (Chapter 2)

(i) Using the general formula (2.12) for a non-zero random variable $X$, we have
$$E[\log X]=\sum_{k=1}^{\infty}\log k\,\Pr[X=k]$$
while (2.18), $\varphi_X(z)=\sum_{k=0}^{\infty}\Pr[X=k]z^k$, shows that we need to express $\log k$ in terms of $z^k$. A possible solution starts from the double integral with $0<a\leq b$,
$$\int_a^b dx\int_0^{\infty}dt\,e^{-tx}=\int_0^{\infty}dt\int_a^b dx\,e^{-tx}$$
where the reversal of integration is justified by absolute convergence (Titchmarsh, 1964, Section 1.8).
Since $\int_0^{\infty}e^{-tx}dt=\frac{1}{x}$, the left-hand side integral equals
$$\int_a^b\frac{dx}{x}=\log\frac{b}{a}$$
while the integral on the right-hand side is
$$\int_0^{\infty}dt\int_a^b dx\,e^{-tx}=\int_0^{\infty}\frac{e^{-ta}-e^{-tb}}{t}dt$$
Hence, with $a=1$ and $b=k$,
$$\log k=\int_0^{\infty}\frac{e^{-t}-e^{-tk}}{t}dt$$
Multiplying both sides by $\Pr[X=k]$ and summing over $k$, we obtain (the reversal of operators is justified by absolute convergence)
$$\sum_{k=1}^{\infty}\log k\,\Pr[X=k]=\int_0^{\infty}\frac{dt}{t}\left(\sum_{k=1}^{\infty}\Pr[X=k]\,e^{-t}-\sum_{k=1}^{\infty}e^{-tk}\Pr[X=k]\right)$$
which finally gives, with (2.18),
$$E[\log X]=\int_0^{\infty}\frac{e^{-t}-\varphi_X(e^{-t})+\varphi_X(0)}{t}dt$$

(ii) (a) The pdf of the $k$-th smallest order statistic follows from (3.36) for an exponential distribution as
$$f_{X_{(k)}}(x)=m\binom{m-1}{k-1}\alpha\left(1-e^{-\alpha x}\right)^{k-1}e^{-\alpha(m-k+1)x}$$
The probability generating function (2.37) is
$$\varphi_{X_{(k)}}(z)=E\left[e^{-zX_{(k)}}\right]=m\binom{m-1}{k-1}\alpha\int_0^{\infty}\left(1-e^{-\alpha t}\right)^{k-1}e^{-(z+\alpha(m-k+1))t}dt$$
Let $u=e^{-\alpha t}$ and $\beta=z+\alpha(m-k+1)$; then the integral reduces to the well-known Beta function (Abramowitz and Stegun, 1968, Section 6.2),
$$\int_0^{\infty}\left(1-e^{-\alpha t}\right)^{k-1}e^{-\beta t}dt=\frac{1}{\alpha}\int_0^1(1-u)^{k-1}u^{\beta/\alpha-1}du=\frac{1}{\alpha}B\left(k,\frac{\beta}{\alpha}\right)=\frac{1}{\alpha}\frac{\Gamma(k)\,\Gamma\left(\frac{\beta}{\alpha}\right)}{\Gamma\left(k+\frac{\beta}{\alpha}\right)}$$
Hence,
$$\varphi_{X_{(k)}}(z)=\frac{m!}{(m-k)!}\frac{\Gamma\left(\frac{z}{\alpha}+m+1-k\right)}{\Gamma\left(\frac{z}{\alpha}+m+1\right)}=\frac{m!}{(m-k)!}\prod_{j=0}^{k-1}\frac{1}{\frac{z}{\alpha}+m-j}$$
The mean follows from $E\left[X_{(k)}\right]=-L'_{X_{(k)}}(0)$, where $L_X$ is the logarithm of the generating function (2.41), as
$$E\left[X_{(k)}\right]=\frac{1}{\alpha}\sum_{j=0}^{k-1}\frac{1}{m-j}\qquad\text{(C.1)}$$

(b) For a polynomial probability density function $f_X(x)=\alpha x^{\alpha-1}1_{x\in[0,1]}$ with $\alpha>0$, we have with (3.36), for $x\in[0,1]$, that
$$f_{X_{(k)}}(x)=m\binom{m-1}{k-1}\alpha x^{\alpha k-1}\left(1-x^{\alpha}\right)^{m-k}$$
with mean
$$E\left[X_{(k)}\right]=m\binom{m-1}{k-1}\alpha\int_0^1x^{\alpha k}\left(1-x^{\alpha}\right)^{m-k}dx=m\binom{m-1}{k-1}\int_0^1t^{k+\frac{1}{\alpha}-1}(1-t)^{m-k}dt=\frac{m!}{(k-1)!}\frac{\Gamma\left(k+\frac{1}{\alpha}\right)}{\Gamma\left(m+1+\frac{1}{\alpha}\right)}$$
If $\alpha\to\infty$, then $E\left[X_{(k)}\right]\to 1$, while for $\alpha\to 0$, $E\left[X_{(k)}\right]\to 0$. For a uniform distribution, where $\alpha=1$, the result is $E\left[X_{(k)}\right]=\frac{k}{m+1}$. Indeed, the $m$ independently chosen uniform random variables divide, after ordering, the line segment $[0,1]$ into $m+1$ subintervals.
The length $L$ of each subinterval has the same distribution, which follows more easily by symmetry if the line segment is replaced by a circle of unit perimeter. Since the lengths $L$ of the subintervals are equal in distribution, one can consider the first subinterval $[0,X_{(1)}]$, whose length $L$ exceeds a value $x\in(0,1)$ if and only if all $m$ uniform random variables belong to $[x,1]$. The latter event has probability $(1-x)^m$, such that $\Pr[L>x]=(1-x)^m$ and, with (2.35), $E[L]=\frac{1}{m+1}$.

(iii) If $X$ were a discrete random variable, then $\Pr[X=k]\approx\frac{n_k}{n}$, where $n_k$ is the number of values in the set $\{x_1,x_2,\ldots,x_n\}$ equal to $k$. For a continuous random variable $X$, the values are generally real numbers ranging from $x_{\min}=\min_{1\leq j\leq n}x_j$ up to $x_{\max}=\max_{1\leq j\leq n}x_j$. We first construct a histogram $H$ from the set $\{x_1,x_2,\ldots,x_n\}$ by choosing a bin size $\Delta x=\frac{x_{\max}-x_{\min}}{m}$, where $m$ is the number of bins (abscissa points). The choice of $1<m<n$ is in general difficult to determine. However, most computer packages allow us to experiment with $m$, and the human eye proves sensitive enough to make a good choice of $m$: if $m$ is too small, we lose details, while a high $m$ may lead to strong irregularities due to the stochastic nature of $X$. Once $m$ is chosen, the histogram consists of the set $\{h_0,h_1,\ldots,h_{m-1}\}$, where $h_j$ equals the number of $X$-values in the set $\{x_1,x_2,\ldots,x_n\}$ that lie in the interval $[x_{\min}+j\Delta x,\,x_{\min}+(j+1)\Delta x]$ for $0\leq j\leq m-1$. By construction, $\sum_{j=0}^{m-1}h_j=n$. The histogram $H$ approximates the probability density function $f_X(x)$ after dividing each value $h_j$ by $n\Delta x$, because
$$1=\int_{x_{\min}}^{x_{\max}}f_X(x)dx=\lim_{\Delta x\to 0}\sum_{j=0}^{m-1}f_X(\xi_j)\Delta x\approx\sum_{j=0}^{m-1}\frac{h_j}{n\Delta x}\Delta x=1$$
where, in the Riemann sum, $\xi_j$ denotes a real number $\xi_j\in[x_{\min}+j\Delta x,\,x_{\min}+(j+1)\Delta x]$. Alternatively, from (2.31) we obtain
$$f_X(\xi_j)=\lim_{\Delta x\to 0}\frac{\Pr[\xi_j<X\leq\xi_j+\Delta x]}{\Delta x}\approx\frac{1}{\Delta x}\Pr[x_{\min}+j\Delta x<$$
$$X\leq x_{\min}+(j+1)\Delta x]$$
such that $f_X(\xi_j)\approx\frac{h_j}{n\Delta x}$, which reduces to the discrete case where $\Delta x=1$.

(iv) The density of mobile nodes in the circle with radius $r$ equals $\delta=\frac{N}{\pi r^2}$. Let $R$ denote the (random) position of a mobile node. The probability that there is a mobile node between distance $x$ and $x+dx$ from the center (with $x\leq r$) is
$$\Pr[x\leq R\leq x+dx]=\frac{2x\,dx}{r^2}$$
From (2.31), the pdf of $R$ equals $f_R(x)=\frac{2x}{r^2}1_{x\leq r}$, and the distribution function follows by integration as $F_R(x)=\frac{x^2}{r^2}1_{x\leq r}+1_{x>r}$. The (random) position $R_{(m)}$ of the $m$-th nearest mobile node to the center is given by (3.36),
$$f_{R_{(m)}}(x)=Nf_R(x)\binom{N-1}{m-1}\left(F_R(x)\right)^{m-1}\left(1-F_R(x)\right)^{N-m}$$
Written in terms of the density $\delta$, for $x\leq r$,
$$f_{R_{(m)}}(x)=2\pi\delta x\binom{N-1}{m-1}\left(\frac{\pi\delta x^2}{N}\right)^{m-1}\left(1-\frac{\pi\delta x^2}{N}\right)^{N-m}$$
we recognize, apart from the prefactor $2\pi\delta x$, a binomial distribution (3.3) with $p=\frac{\pi\delta x^2}{N}$. Similarly to the derivation of the law of rare events in Section 3.1.4, this binomial distribution tends, for large $N$ but constant density $\delta$, to a Poisson distribution with mean $\pi\delta x^2$. Hence, asymptotically, the pdf of the position $R_{(m)}$ of the $m$-th nearest mobile node to the center is, for $x\leq r$,
$$f_{R_{(m)}}(x)=2\pi\delta x\,e^{-\pi\delta x^2}\frac{\left(\pi\delta x^2\right)^{m-1}}{(m-1)!}$$

(v) We use the law of total probability (2.46), first assuming that $W$ is discrete,
$$\Pr[V-W\leq x]=\sum_k\Pr[V-W\leq x\,|\,W=k]\Pr[W=k]$$
and, by independence, $\Pr[V-W\leq x\,|\,W=k]=\Pr[V\leq k+x]$. Hence,
$$\Pr[V-W\leq x]=\sum_k\Pr[V\leq k+x]\Pr[W=k]$$
If $W$ is continuous, the general formula is
$$\Pr[V-W\leq x]=\int_{-\infty}^{\infty}\Pr[V\leq x+y]\,\frac{d\Pr[W\leq y]}{dy}dy\qquad\text{(C.2)}$$
from which the pdf follows by differentiation,
$$f_{V-W}(x)=\int_{-\infty}^{\infty}f_V(x+y)f_W(y)dy$$
This resembles the convolution integral (2.62). If both $V$ and $W$ have the same distribution, direct integration of (C.2) yields
$$\Pr[V\leq W]=\Pr[W\leq V]=\frac{1}{2}$$
This equation confirms the intuitive result that two independent random variables with the same density function have equal probability of being larger or smaller than the other.
C.2 Correlation (Chapter 4)

(i) In two dimensions, formula (4.2) becomes
$$E\left[e^{-z_1X-z_2Y}\right]=\exp\left(-z_1\mu_X-z_2\mu_Y+\frac{\sigma_X^2z_1^2+2\rho\sigma_X\sigma_Yz_1z_2+\sigma_Y^2z_2^2}{2}\right)$$
$$=\exp\left(-z_2\mu_Y-\mu_Xz_1+\frac{\sigma_Y^2\left(1-\rho^2\right)}{2}z_2^2+\frac{\left(\sigma_Xz_1+\rho\sigma_Yz_2\right)^2}{2}\right)$$
Hence, with (4.21), the joint probability density function is
$$f_{XY}(x,y;\rho)=\frac{1}{(2\pi i)^2}\int_{c_1-i\infty}^{c_1+i\infty}\int_{c_2-i\infty}^{c_2+i\infty}e^{z_1(x-\mu_X)+z_2(y-\mu_Y)}\,e^{\frac{\sigma_Y^2(1-\rho^2)}{2}z_2^2}\,e^{\frac{(\sigma_Xz_1+\rho\sigma_Yz_2)^2}{2}}\,dz_1dz_2$$
$$=\frac{1}{2\pi i}\int_{c_2-i\infty}^{c_2+i\infty}e^{\frac{\sigma_Y^2(1-\rho^2)}{2}z_2^2}e^{z_2(y-\mu_Y)}\left(\frac{1}{2\pi i}\int_{c_1-i\infty}^{c_1+i\infty}e^{z_1(x-\mu_X)}e^{\frac{(\sigma_Xz_1+\rho\sigma_Yz_2)^2}{2}}\,dz_1\right)dz_2$$
Evaluating the inner integral, denoted by $I$, with $z_1=c_1+it$,
$$I=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{(c_1+it)(x-\mu_X)}\exp\left(\frac{\left(\sigma_X(c_1+it)+\rho\sigma_Yz_2\right)^2}{2}\right)dt$$
Since the integrand is an entire function, the contour can be shifted, which allows substitution as in real analysis. Thus, let $u=t-i\left(c_1+\frac{\rho\sigma_Yz_2}{\sigma_X}\right)$; completing the square then gives
$$I=\frac{e^{-\frac{\rho\sigma_Y}{\sigma_X}z_2(x-\mu_X)}}{2\pi}\exp\left(-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right)\int_{-\infty}^{\infty}\exp\left(-\frac{\sigma_X^2}{2}u^2\right)du$$
By substituting $w=\frac{\sigma_X^2}{2}u^2$, the remaining integral follows with the Gamma function (Abramowitz and Stegun, 1968, Chapter 6),
$$\int_{-\infty}^{\infty}\exp\left(-\frac{\sigma_X^2}{2}u^2\right)du=2\int_0^{\infty}\exp\left(-\frac{\sigma_X^2}{2}u^2\right)du=\frac{\sqrt{2}}{\sigma_X}\int_0^{\infty}e^{-w}w^{-\frac{1}{2}}dw=\frac{\sqrt{2}}{\sigma_X}\Gamma\left(\tfrac{1}{2}\right)=\frac{\sqrt{2\pi}}{\sigma_X}$$
Hence,
$$I=\frac{e^{-\frac{\rho\sigma_Y}{\sigma_X}z_2(x-\mu_X)}}{\sigma_X\sqrt{2\pi}}\exp\left(-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right)$$
and
$$f_{XY}(x,y;\rho)=\frac{\exp\left(-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right)}{\sigma_X\sqrt{2\pi}}\,\frac{1}{2\pi i}\int_{c_2-i\infty}^{c_2+i\infty}e^{\frac{\sigma_Y^2(1-\rho^2)}{2}z_2^2}\,e^{z_2\left(y-\mu_Y-\frac{\rho\sigma_Y}{\sigma_X}(x-\mu_X)\right)}\,dz_2$$
The last integral is recognized with (3.22) as the inverse Laplace transform of a Gaussian with variance $\sigma^2=\sigma_Y^2\left(1-\rho^2\right)$ and mean $\mu=\mu_Y+\frac{\rho\sigma_Y}{\sigma_X}(x-\mu_X)$. Thus,
$$f_{XY}(x,y;\rho)=\frac{\exp\left(-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right)}{\sigma_X\sqrt{2\pi}}\;\frac{\exp\left(-\frac{\left(y-\mu_Y-\frac{\rho\sigma_Y}{\sigma_X}(x-\mu_X)\right)^2}{2\sigma_Y^2(1-\rho^2)}\right)}{\sigma_Y\sqrt{2\pi\left(1-\rho^2\right)}}$$
which finally leads to the joint Gaussian density function (4.4).
Hence, the linear combination method leads to exact results for Gaussian random variables.

C.3 Poisson process (Chapter 7)

(i) Let $Y$ be a binomial random variable with parameters $N$ and $p$, where $N$ is a Poisson random variable with parameter $\lambda$. The probability density function of $Y$ is obtained by applying the law of total probability (2.46),
\[
\Pr[Y=k] = \sum_{n=0}^{\infty}\Pr[Y=k\,|\,N=n]\Pr[N=n]
\]
With (3.3) and (3.9), we have
\[
\Pr[Y=k] = \sum_{n=k}^{\infty}\binom{n}{k}p^kq^{n-k}\,\frac{\lambda^n}{n!}e^{-\lambda}
= \frac{(\lambda p)^k}{k!}e^{-\lambda}\sum_{n=k}^{\infty}\frac{(\lambda q)^{n-k}}{(n-k)!}
= \frac{(\lambda p)^k}{k!}e^{-\lambda+\lambda q}
\]
Since $q=1-p$, we arrive at $\Pr[Y=k] = \frac{(\lambda p)^k}{k!}e^{-\lambda p}$, which means that $Y$ is a Poisson random variable with mean $\lambda p$. If a sufficient sample of the test strings defined above is sent and received, the average number of "one bits" at the receiver divided by the average number of bits at the sender gives the probability $p$ (if errors indeed occur independently).

(ii) Since the counting process of a sum of Poisson processes is again a Poisson counting process with rate equal to $\sum_{j=1}^{4}\lambda_j$, the average number of packets of the four classes in the router's buffers during an interval $T$ is $\alpha = T\sum_{j=1}^{4}\lambda_j$. Hence, the probability density function of the total number $N$ of arrivals is $\Pr[N=n] = \frac{\alpha^n}{n!}e^{-\alpha}$.

(iii) Theorem 7.3.4 states that $X(t)=X_1(t)+X_2(t)$ is a Poisson counting process with rate $\lambda_1+\lambda_2$. Then,
\[
\Pr[X_1(t)=1\,|\,X(t)=1] = \frac{\Pr[\{X_1(t)=1\}\cap\{X(t)=1\}]}{\Pr[X(t)=1]}
= \frac{\Pr[\{X_1(t)=1\}\cap\{X_2(t)=0\}]}{\Pr[X(t)=1]}
= \frac{\Pr[X_1(t)=1]\Pr[X_2(t)=0]}{\Pr[X(t)=1]} = \frac{\lambda_1}{\lambda_1+\lambda_2}
\]
since the Poisson random variables $X_1$ and $X_2$ are independent. As an application, consider a Poissonean arrival flow of packets at a router with rate $\lambda$. If the packets are marked randomly with probability $p=\frac{\lambda_1}{\lambda}$, the resulting flow consists of two types, those marked and those not. Each of these flows is again a Poisson flow, the marked flow with rate $\lambda_1 = p\lambda$ and the non-marked flow with $\lambda_2 = (1-p)\lambda$.
Actually, this procedure decomposes the Poisson process into two independent Poisson processes and is the reverse of Theorem 7.3.4.

(iv) (a) Applying the solution of the previous exercise immediately gives $\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3}$.
(b) Since the three Poisson processes are independent, the total number of cars on the three lanes, denoted by $X$, is also a Poisson process (Theorem 7.3.4) with rate $\lambda = \lambda_1+\lambda_2+\lambda_3$. Hence, $\Pr[X=n] = \frac{\lambda^n}{n!}e^{-\lambda}$.
(c) Let us denote the Poisson process in lane $j$ by $X_j$. Then, using the independence between the $X_j$,
\[
\Pr[X_1=n, X_2=0, X_3=0] = \Pr[X_1=n]\Pr[X_2=0]\Pr[X_3=0]
= \frac{\lambda_1^n}{n!}e^{-\lambda_1}e^{-\lambda_2}e^{-\lambda_3} = \frac{\lambda_1^n}{n!}e^{-\lambda}
\]

(v) (a) The player relies on the fact that during the remaining time there is exactly one arrival. Since the game rules mention that he should identify the last signal in $(0,T)$, signals arriving during $(0,s)$ do not influence his chance to win because of the memoryless property of the Poisson process. The number of arrivals in the interval $(s,T)$ obeys a Poisson distribution with parameter $\lambda(T-s)$. The probability that precisely one signal arrives in the interval $(s,T)$ is $\Pr[N(T)-N(s)=1] = \lambda(T-s)\,e^{-\lambda(T-s)}$.
(b) Maximizing this winning probability with respect to $s$ (by equating the first derivative to zero) yields
\[
\frac{d}{ds}\Pr[N(T)-N(s)=1] = -\lambda e^{-\lambda(T-s)} + \lambda^2(T-s)e^{-\lambda(T-s)} = 0
\]
with solution $\lambda(T-s)=1$ or $s = T-\frac{1}{\lambda}$. This maximum (readily verified by checking that $\frac{d^2}{ds^2}\Pr[N(T)-N(s)=1] < 0$) lies inside the allowed interval $(0,T)$. The maximum probability of winning is $\Pr[N(T)-N(T-1/\lambda)=1] = \frac1e$.

(vi) (a) We apply the general formula (7.1) for the pdf of a Poisson process with mean $E[X(t)] = \lambda t = 1$. Then $\Pr[X(t+s)-X(s)=0] = e^{-\lambda t} = \frac1e$.
(b) $\Pr[X(t+s)-X(s) > 10] = 1-\Pr[X(t+s)-X(s)\le10] = 1-\frac1e\sum_{k=0}^{10}\frac{1}{k!}$.
(c) Each minute is equally probable, as follows from Theorem 7.3.3.
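The optimal waiting time of solution (v) can be confirmed by a grid search. In the sketch below the rate $\lambda=2$ and horizon $T=5$ are illustrative assumptions; the maximum of $\lambda(T-s)e^{-\lambda(T-s)}$ should sit at $s = T-1/\lambda$ with value $1/e$:

```python
from math import exp

lam, T = 2.0, 5.0   # hypothetical rate and game horizon

def win_prob(s):
    # probability of exactly one Poisson arrival in (s, T)
    return lam * (T - s) * exp(-lam * (T - s))

# scan a fine grid of candidate waiting times s in (0, T)
grid = [i * T / 100000 for i in range(100000)]
s_best = max(grid, key=win_prob)

assert abs(s_best - (T - 1 / lam)) < 1e-3            # optimum at s = T - 1/lambda
assert abs(win_prob(T - 1 / lam) - 1 / exp(1)) < 1e-12  # maximal winning chance 1/e
```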
(vii) This exercise is an application of random marking in a Poisson flow, as explained in solution (iii) above. The total flow of packets can be split up into an ACK stream, a Poisson process $N_1$ with rate $\lambda p = 3\ \mathrm{s}^{-1}$, and a data flow, an independent Poisson process $N_2$ with rate $\lambda(1-p) = 7\ \mathrm{s}^{-1}$. Then,
(a) $\Pr[N_1\ge1] = 1-\Pr[N_1=0] = 1-e^{-3}$
(b) The average number is $E[N_1+N_2\,|\,N_1=5] = E[N_1\,|\,N_1=5]+E[N_2\,|\,N_1=5] = 5+E[N_2] = 5+7 = 12$ packets.
(c)
\[
\Pr[N_1=2\,|\,N_1+N_2=8] = \frac{\Pr[N_1=2, N_1+N_2=8]}{\Pr[N_1+N_2=8]}
= \frac{\frac{3^2}{2!}e^{-3}\,\frac{7^6}{6!}e^{-7}}{\frac{10^8}{8!}e^{-10}}
= \binom{8}{2}\frac{3^2\,7^6}{10^8} \approx 29.65\%
\]

(viii) (a) Since the three Poisson arrival processes are independent, the total number of requests is also a Poisson process with parameter $\lambda = \lambda_1+\lambda_2+\lambda_3 = 20$ requests/hour (Theorem 7.3.4). The expected number of requests during an 8-hour working day is $E[N] = \lambda t = 20\times8 = 160$ requests.
(b) If we denote the arrival processes of requests with the different ADSL problems by random variables $X_l$ for $l=1,2,3$, then, due to their mutual independence,
\[
\Pr[X_1=0, X_2=k, X_3=0] = \Pr[X_1=0]\Pr[X_2=k]\Pr[X_3=0]
= e^{-\lambda_1 t}\,\frac{(\lambda_2 t)^k}{k!}e^{-\lambda_2 t}\,e^{-\lambda_3 t}
\]
from which, with the given rates and $k=3$, $\Pr[X_1=0, X_2=3, X_3=0] = 1.7\times10^{-3}$.
(c) If we denote the total number of requests by $X$, then for a quarter of an hour $\Pr[X=0] = e^{-\lambda t} = e^{-\frac{20}{4}} = 6.7\times10^{-3}$.
(d) The precise time is irrelevant for Poisson processes, only the duration of the interval matters. Here the intervals overlap and we need to compute the probability
\[
p = \Pr[\{X(0.2)=1\}\cap\{X(0.5)-X(0.1)=2\}]
= \sum_{k=0}^{1}\Pr[\{X(0.1)=k\}\cap\{X(0.2)-X(0.1)=1-k\}\cap\{X(0.5)-X(0.2)=1+k\}]
\]
\[
= \sum_{k=0}^{1}\Pr[X(0.1)=k]\Pr[X(0.2)-X(0.1)=1-k]\Pr[X(0.5)-X(0.2)=1+k]
= \sum_{k=0}^{1}e^{-2}\frac{2^k}{k!}\,e^{-2}\frac{2^{1-k}}{(1-k)!}\,e^{-6}\frac{6^{1+k}}{(1+k)!}
= 48\,e^{-10} = 2.18\times10^{-3}
\]
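The conditional probability in solution (vii)(c) illustrates a general fact: given the total of two independent Poisson counts, each component is binomial. A small numeric sketch (rates as in the solution above):

```python
from math import comb, exp, factorial

l1, l2 = 3.0, 7.0   # ACK and data rates per second

def pois(k, mean):
    return mean**k * exp(-mean) / factorial(k)

# Pr[N1 = 2 | N1 + N2 = 8] computed from the Poisson pmfs ...
cond = pois(2, l1) * pois(6, l2) / pois(8, l1 + l2)
# ... equals the binomial probability C(8,2) (3/10)^2 (7/10)^6
binom = comb(8, 2) * (l1 / (l1 + l2))**2 * (l2 / (l1 + l2))**6

assert abs(cond - binom) < 1e-12
assert abs(cond - 0.2965) < 5e-4   # about 29.65%
```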
(e) Given that at moment $t+s$ there are $k+m$ requests, the probability that there were $k$ requests at moment $t$ is
\[
\Pr[X(t)=k\,|\,X(t+s)=k+m] = \frac{\Pr[\{X(t)=k\}\cap\{X(t+s)=k+m\}]}{\Pr[X(t+s)=k+m]}
= \frac{\Pr[X(t)=k]\Pr[X(t+s)-X(t)=m]}{\Pr[X(t+s)=k+m]}
\]
\[
= \frac{\frac{(\lambda t)^k}{k!}e^{-\lambda t}\,\frac{(\lambda s)^m}{m!}e^{-\lambda s}}{\frac{(\lambda(t+s))^{k+m}}{(k+m)!}e^{-\lambda(t+s)}}
= \binom{k+m}{k}\left(\frac{t}{t+s}\right)^k\left(\frac{s}{t+s}\right)^m
\]

(ix) (a) The number of attacks arriving at the PC is a Poisson random variable $X(t)$ with rate $\lambda = 6$. The probability of exactly one ($k=1$) attack during one ($t=1$) hour follows from (7.1) as $\Pr[X(1)=1] = 6e^{-6}$.
(b) Applying (7.2), the expected amount of time that the PC has been on is $t = \frac{E[X(t)]}{\lambda} = \frac{60}{6} = 10$ hours.
(c) The arrival time of the fifth attack is denoted by $T$. Given that there are six attacks in one hour ($t=1$), we compute the probability $\Pr[T<t\,|\,X(1)=6]$ that either five attacks arrive in the interval $(0,t)$ and one arrives in $(t,1)$, or all six attacks arrive in $(0,t)$ and none arrives in $(t,1)$. Hence, for $0\le t<1$,
\[
F_T(t) = \Pr[T<t\,|\,X(1)=6]
= \frac{\Pr[\{X(t)=5\}\cap\{X(1)=6\}] + \Pr[\{X(t)=6\}\cap\{X(1)=6\}]}{\Pr[X(1)=6]}
\]
\[
= \frac{\Pr[X(t)=5]\Pr[X(1)-X(t)=1] + \Pr[X(t)=6]\Pr[X(1)-X(t)=0]}{\Pr[X(1)=6]}
= \frac{\frac{(\lambda t)^5}{5!}e^{-\lambda t}\,\lambda(1-t)e^{-\lambda(1-t)} + \frac{(\lambda t)^6}{6!}e^{-\lambda t}\,e^{-\lambda(1-t)}}{\frac{\lambda^6}{6!}e^{-\lambda}}
= 6t^5 - 5t^6
\]
The probability that the fifth attack arrives between 1:30 p.m. and 2 p.m. is $F_T(1)-F_T\!\left(\frac12\right) = 1-\frac{7}{64} = \frac{57}{64}$.
(d) The expectation of $T$ given $X(1)=6$ follows from (2.33) as $E[T\,|\,X(1)=6] = \int_0^1 x\,f_T(x)\,dx$, where $f_T(t) = \frac{dF_T(t)}{dt}$ is derived in (c). Alternatively, the expectation can be computed from (2.35),
\[
E[T\,|\,X(1)=6] = \int_0^1\left(1-\left(6x^5-5x^6\right)\right)dx = \frac57
\]
Hence the expected arrival time of the fifth attack between 1 p.m. and 2 p.m. is about 1:43 p.m.

(x) Let $X$ and $X_j$ denote the lifetime of the system and of subsystem $j$, respectively. For a series connection of subsystems with independent lifetimes $X_j$, the event $\{X>t\} = \bigcap_{j=1}^{n}\{X_j>t\}$ and $\Pr[X>t] = \prod_{j=1}^{n}\Pr[X_j>t]$.
Recall with (3.32) that
\[
\Pr[X>t] = \Pr\!\left[\min_{1\le j\le n}X_j > t\right] = \prod_{j=1}^{n}\Pr[X_j>t]
\]
Using the definition of the reliability function (7.5) then yields
\[
R_{\mathrm{series}}(t) = \prod_{j=1}^{n}R_j(t)
\]

(xi) The lifetime of the system $S$ shown in Fig. 7.6 is determined by the subsystem with the longest lifetime, i.e. $X = \max_{1\le j\le n}X_j$. Invoking relation (3.33) combined with the definition of the reliability function (7.5) leads to
\[
R_{\mathrm{parallel}}(t) = 1-\prod_{j=1}^{n}\left(1-R_j(t)\right)
\]

C.4 Renewal theory (Chapter 8)

(i) The equivalence $\{N(t)\ge n\} \Longleftrightarrow \{W_n\le t\}$ indicates
\[
\Pr\!\left[W_{N(t)}\le x\right] = \sum_{n=0}^{\infty}\Pr[\{W_n\le x\}\cap\{W_{n+1}>t\}]
= \Pr[W_0\le x, W_1>t] + \sum_{n=1}^{\infty}\Pr[W_n\le x, W_{n+1}>t]
\]
The convention $W_0=0$ reduces $\Pr[W_0\le x, W_1>t] = \Pr[W_1>t] = \Pr[\tau_1>t] = 1-F_\tau(t)$. Furthermore, by the law of total probability,
\[
\Pr[W_n\le x, W_{n+1}>t] = \int_0^{\infty}\Pr[W_n\le x, W_{n+1}>t\,|\,W_n=u]\,\frac{d\Pr[W_n\le u]}{du}\,du
= \int_0^{x}\Pr[W_{n+1}>t\,|\,W_n=u]\,d\Pr[W_n\le u]
\]
A renewal process restarts from scratch after each renewal (due to the stationarity and the independent increments of the renewal process). This implies that
\[
\Pr[W_{n+1}>t\,|\,W_n=u] = \Pr[\tau_{n+1}>t-u] = 1-F_\tau(t-u)
\]
because the interarrival times are i.i.d. random variables. Combined,
\[
\Pr\!\left[W_{N(t)}\le x\right] = \Pr[\tau>t] + \int_0^{x}\Pr[\tau>t-u]\,d\!\left(\sum_{n=1}^{\infty}\Pr[W_n\le u]\right)
\]
With the basic equivalence (8.6) and the definition (8.7) of the renewal function $m(t)$, we arrive at
\[
\Pr\!\left[W_{N(t)}\le x\right] = \Pr[\tau>t] + \int_0^{x}\Pr[\tau>t-u]\,dm(u)
\]
This equation holds for all $x$. If $x=t$, we can use the renewal equation,
\[
\int_0^{t}\Pr[\tau>t-u]\,dm(u) = m(t)-\int_0^{t}\Pr[\tau\le t-u]\,dm(u) = m(t)-m(t)+F_\tau(t)
\]
which indeed confirms $\Pr\!\left[W_{N(t)}\le t\right] = 1$.
(ii) The generating function of the number of renewals in the interval $[0,t]$ is, with (8.10),
\[
\varphi_{N(t)}(z) = E\!\left[z^{N(t)}\right] = \sum_{k=0}^{\infty}\Pr[N(t)=k]z^k
= \Pr[N(t)=0] + \int_0^{t}\left(\sum_{k=1}^{\infty}\Pr[N(t-s)=k-1]z^k\right)f_\tau(s)\,ds
\]
\[
= \Pr[N(t)=0] + z\int_0^{t}\left(\sum_{k=0}^{\infty}\Pr[N(t-s)=k]z^k\right)f_\tau(s)\,ds
= \Pr[N(t)=0] + z\int_0^{t}\varphi_{N(t-s)}(z)\,dF_\tau(s)
\]
From (8.6), we have that $\Pr[N(t)=0] = 1-F_\tau(t)$ and
\[
\varphi_{N(t)}(z) = 1-F_\tau(t) + z\int_0^{t}\varphi_{N(t-s)}(z)\,dF_\tau(s)
\]
By derivation with respect to $z$, we arrive at the differential-integral equation for the derivative of the generating function,
\[
\varphi'_{N(t)}(z) = \int_0^{t}\varphi_{N(t-s)}(z)\,dF_\tau(s) + z\int_0^{t}\varphi'_{N(t-s)}(z)\,dF_\tau(s)
= \frac{\varphi_{N(t)}(z)-1+F_\tau(t)}{z} + z\int_0^{t}\varphi'_{N(t-s)}(z)\,dF_\tau(s)
\]
which reduces to the renewal equation (8.9) for $z=1$, since $\varphi'_{N(t)}(1) = m(t)$. The second derivative,
\[
\varphi''_{N(t)}(z) = 2\int_0^{t}\varphi'_{N(t-s)}(z)\,dF_\tau(s) + z\int_0^{t}\varphi''_{N(t-s)}(z)\,dF_\tau(s)
= \frac{2}{z}\varphi'_{N(t)}(z) - \frac{2}{z}\int_0^{t}\varphi_{N(t-s)}(z)\,dF_\tau(s) + z\int_0^{t}\varphi''_{N(t-s)}(z)\,dF_\tau(s)
\]
evaluated at $z=1$, is
\[
\varphi''_{N(t)}(1) = 2m(t)-2F_\tau(t) + \int_0^{t}\varphi''_{N(t-s)}(1)\,dF_\tau(s)
\]
The variance $\operatorname{Var}[N(t)]$ follows from (2.27) as
\[
\operatorname{Var}[N(t)] = \varphi''_{N(t)}(1) + \varphi'_{N(t)}(1) - \left(\varphi'_{N(t)}(1)\right)^2
= 3m(t) - m^2(t) - 2F_\tau(t) + \int_0^{t}\varphi''_{N(t-s)}(1)\,dF_\tau(s)
\]

(iii) Every time an IP packet is launched by TCP, a renewal occurs and the reward is that 2000 km are travelled in each renewal, thus $R_n = 2000$ km. The speed of a trip that suffers from congestion is, on average, 40 000 km/s, while the speed without congestion is 120 000 km/s. Since congestion occurs only in 1 out of 5 cases, the average length (in s) of a renewal period is
\[
E[\tau] = \frac{2000}{120000}\times\frac45 + \frac{2000}{40000}\times\frac15 = \frac{7}{300}
\]
The average speed of an IP packet (in km/s) then follows from (8.20) as
\[
\lim_{t\to\infty}\frac{R(t)}{t} = \frac{E[R]}{E[\tau]} = \frac{2000}{\frac{7}{300}} = 85714.3
\]

(iv) Every transmission of an ATM cell is a renewal with average length of the renewal interval equal to $E[\tau] = N/r$, where $1/r$ is the mean interarrival time of a voice sample.
If $\tau_n$ is the time between the $n$-th and $(n+1)$-th arrival of a sample, then the average total cost per ATM cell transmission equals
\[
E[R] = E\!\left[\sum_{n=1}^{N-1}nc\,\tau_n + K\right] = c\sum_{n=1}^{N-1}nE[\tau_n] + K = \frac{c}{r}\,\frac{N(N-1)}{2} + K
\]
Hence, the average cost per unit time incurred in UMTS is $\frac{E[R]}{E[\tau]} = \frac{c(N-1)}{2} + \frac{Kr}{N}$.

(v) (a) The replacement of a router is a renewal process where the time at which router $j$ is replaced is $W_j = \min(X_j, T)$, and
\[
R_j = \begin{cases} A, & \text{if } X_j \le T \\ B, & \text{if } X_j > T \end{cases}
\]
The average cost per renewal period is $E[R] = A\Pr[X_j\le T] + B\Pr[X_j>T]$ and the average length of a renewal interval equals
\[
E[W] = \int_0^{\infty}\Pr[W_j>t]\,dt = \int_0^{\infty}\Pr[\min(X_j,T)>t]\,dt = \int_0^{T}\Pr[X_j>t]\,dt
\]
The time-average cost rate of the policy ChangeRouter is $C = \frac{E[R]}{E[W]}$.
(b) For $A = 10000$, $B = 7000$, $\Pr[X_j\le T] = 1-e^{-\alpha T}$ with mean lifetime $\frac1\alpha = 10$ years, and $T = 5$, we have
\[
E[R] = 10000\times\left(1-e^{-1/2}\right) + 7000\times e^{-1/2} \simeq 8200
\]
and
\[
E[W] = \int_0^{5}e^{-0.1t}\,dt = 10\times\left(1-e^{-1/2}\right) \simeq 4
\]
such that the time-average cost rate of the policy ChangeRouter is $C \simeq \frac{8200}{4} = 2050$.

C.5 Discrete-time Markov chains (Chapter 9)

(i) (a) The Markov chain is drawn in Fig. C.1 (a three-state birth-death chain with upward probability 0.2 and downward probability 0.8).

Fig. C.1. Three-state Markov chain.

(b) The steady-state vector is computed via (9.24). The sequence $P^{2^n}$ rapidly converges: four matrix multiplications already yield three correct digits,
\[
P^2 = \begin{pmatrix} 0.800 & 0.160 & 0.040 \\ 0.640 & 0.320 & 0.040 \\ 0.640 & 0.160 & 0.200 \end{pmatrix}
\qquad
P^4 = \begin{pmatrix} 0.768 & 0.186 & 0.046 \\ 0.742 & 0.211 & 0.046 \\ 0.742 & 0.186 & 0.072 \end{pmatrix}
\]
\[
P^8 = \begin{pmatrix} 0.762 & 0.190 & 0.048 \\ 0.761 & 0.191 & 0.048 \\ 0.761 & 0.190 & 0.048 \end{pmatrix}
\qquad
P^{16} = \begin{pmatrix} 0.762 & 0.190 & 0.048 \\ 0.762 & 0.190 & 0.048 \\ 0.762 & 0.190 & 0.048 \end{pmatrix}
\]
from which we find that each row of $P^{16}$ equals $\pi = \begin{pmatrix} 0.762 & 0.190 & 0.048 \end{pmatrix}$. The second method consists in solving the set (9.25) by Cramer's method.
Hence, with one balance equation of $\pi P = \pi$ replaced by the normalization $\pi_1+\pi_2+\pi_3 = 1$,
\[
M = \begin{pmatrix} -0.2 & 0.8 & 0 \\ 0.2 & -1.0 & 0.8 \\ 1 & 1 & 1 \end{pmatrix},
\qquad \det M = 0.84
\]
and Cramer's rule yields
\[
\pi_1 = \frac{1}{\det M}\det\begin{pmatrix} 0 & 0.8 & 0 \\ 0 & -1.0 & 0.8 \\ 1 & 1 & 1 \end{pmatrix} = \frac{0.64}{0.84} = 0.762
\]
\[
\pi_2 = \frac{1}{\det M}\det\begin{pmatrix} -0.2 & 0 & 0 \\ 0.2 & 0 & 0.8 \\ 1 & 1 & 1 \end{pmatrix} = \frac{0.16}{0.84} = 0.190
\qquad
\pi_3 = \frac{1}{\det M}\det\begin{pmatrix} -0.2 & 0.8 & 0 \\ 0.2 & -1.0 & 0 \\ 1 & 1 & 1 \end{pmatrix} = \frac{0.04}{0.84} = 0.048
\]
The third method relies on the specific structure of the Markov chain, a discrete birth and death process (general random walk) with constant $p_k = p$ and $q_k = q$. Applying formula (11.9), taking into account that $N=2$ and $\sigma = \frac{p}{q} = \frac{0.2}{0.8} = \frac14$, yields
\[
\pi_1 = \frac{1-\frac14}{1-\left(\frac14\right)^3} = \frac{16}{21} = 0.762
\qquad
\pi_2 = \sigma\pi_1 = \frac{4}{21} = 0.190
\qquad
\pi_3 = \sigma^2\pi_1 = \frac{1}{21} = 0.048
\]

(ii) The Markov chain is shown in Fig. C.2; state 1 is an absorbing state.

Fig. C.2. A recurrent Markov chain with positive drift.

From (9.23), the steady-state vector components are found as
\[
\pi_1 = \sum_{k=1}^{N}\frac{\pi_k}{k}, \qquad \pi_2 = 0, \qquad \pi_j = \frac{j-2}{j-1}\,\pi_{j-1}\quad (j\ge2)
\]
or $\pi_1 = 1$ and $\pi_j = 0$ for $j>1$. Hence, the steady-state vector exists and differs from $\pi = 0$, which demonstrates that the Markov chain is positive recurrent for any number of states $N$. However, because state 1 is absorbing, the drift for $j>1$ is
\[
E[X_{k+1}-X_k\,|\,X_k=j] = 1\cdot\left(1-\frac1j\right) - \frac1j = 1-\frac2j
\]
which is always positive for $j>2$. Hence, given an initial state $j>2$, the Markov chain will, on average, move to the right (towards higher states).

(iii) (a) The Markov chain is shown in Fig. C.3.

Fig. C.3. Markov chain of the growth process of trees in a forest during a period of 15 years.
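The agreement of the three methods in solution (i) can be verified exactly. A sketch (the transition matrix is the birth-death chain above; exact rationals avoid rounding):

```python
from fractions import Fraction as F

# transition matrix of the three-state birth-death chain (up 0.2, down 0.8)
P = [[F(8, 10), F(2, 10), F(0)],
     [F(8, 10), F(0),     F(2, 10)],
     [F(0),     F(8, 10), F(2, 10)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# method 1: repeated squaring, P -> P^2 -> P^4 -> P^8 -> P^16
M = P
for _ in range(4):
    M = matmul(M, M)

# method 3: birth-death solution with sigma = p/q = 1/4
sigma = F(1, 4)
pi = [F(16, 21), F(4, 21), F(1, 21)]

assert pi[1] == sigma * pi[0] and pi[2] == sigma**2 * pi[0]
assert sum(pi) == 1
for row in M:                       # every row of P^16 approaches pi
    for j in range(3):
        assert abs(row[j] - pi[j]) < F(1, 1000)   # three correct digits
```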
(b) The evolution of the Markov process is defined by
\[
\begin{pmatrix} b[k+1] \\ y[k+1] \\ m[k+1] \\ u[k+1] \end{pmatrix}
= \begin{pmatrix} p_b & p_y & p_m & p_o \\ 1-p_b & 0 & 0 & 0 \\ 0 & 1-p_y & 0 & 0 \\ 0 & 0 & 1-p_m & 1-p_o \end{pmatrix}
\cdot\begin{pmatrix} b[k] \\ y[k] \\ m[k] \\ u[k] \end{pmatrix}
\]
(c) The number of trees in each category after 15 years (one period) is
\[
\begin{pmatrix} b[1] \\ y[1] \\ m[1] \\ u[1] \end{pmatrix}
= \begin{pmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.9 & 0 & 0 & 0 \\ 0 & 0.8 & 0 & 0 \\ 0 & 0 & 0.7 & 0.6 \end{pmatrix}
\cdot\begin{pmatrix} 5000 \\ 0 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} 500 \\ 4500 \\ 0 \\ 0 \end{pmatrix}
\]
and after 30 years (two periods)
\[
\begin{pmatrix} b[2] \\ y[2] \\ m[2] \\ u[2] \end{pmatrix}
= \begin{pmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.9 & 0 & 0 & 0 \\ 0 & 0.8 & 0 & 0 \\ 0 & 0 & 0.7 & 0.6 \end{pmatrix}
\cdot\begin{pmatrix} 500 \\ 4500 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} 950 \\ 450 \\ 3600 \\ 0 \end{pmatrix}
\]
(d) The steady-state vector obeys equation (9.22) or, equivalently, (9.25). Applying a variant of (9.25) in which one balance equation is replaced by the normalization, we have
\[
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1-p_b & -1 & 0 & 0 \\ 0 & 1-p_y & -1 & 0 \\ 0 & 0 & 1-p_m & -p_o \end{pmatrix}
\cdot\begin{pmatrix} \pi_b \\ \pi_y \\ \pi_m \\ \pi_u \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\]
The determinant is $\det S = -p_o - (1-p_b)\left(1+2p_o-p_y-p_m+p_yp_m-p_op_y\right)$ and via Cramer's method we have
\[
\pi_b = \frac{1}{\det S}\det\begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 0 \\ 0 & 1-p_y & -1 & 0 \\ 0 & 0 & 1-p_m & -p_o \end{pmatrix}
= \frac{1}{\det S}\det\begin{pmatrix} -1 & 0 & 0 \\ 1-p_y & -1 & 0 \\ 0 & 1-p_m & -p_o \end{pmatrix}
= \frac{-p_o}{\det S}
\]
With the numerical values given in (c), $\pi_b = 0.25773$. After a similar calculation for the other categories, the total number of trees in steady growth is
\[
\begin{pmatrix} 5000\,\pi_b \\ 5000\,\pi_y \\ 5000\,\pi_m \\ 5000\,\pi_u \end{pmatrix}
\simeq \begin{pmatrix} 1289 \\ 1160 \\ 928 \\ 1624 \end{pmatrix}
\]

(iv) (a) The clustered error pattern is modeled as a two-state discrete Markov chain. When a bit is received incorrectly, the system is in state 0, else it is in state 1. The Markov chain is shown in Fig. 9.2, where $p = 1-0.95 = 0.05$ and $q = 1-0.999 = 0.001$. The transition probability matrix is
\[
P = \begin{pmatrix} 0.95 & 0.05 \\ 0.001 & 0.999 \end{pmatrix}
\]
(b) There is only one communicating class because both states 0 and 1 are reachable from each other. The Markov chain is therefore irreducible.
(c) The steady-state vector follows from (9.37) as
\[
\pi = \left(\frac{q}{p+q},\ \frac{p}{p+q}\right) = \left(\frac{1}{51},\ \frac{50}{51}\right) = (0.0196,\ 0.9804)
\]
The fraction of correctly received bits in the long run is 98.04% and the fraction of incorrectly received bits is 1.96%.
(d) After repair, the system operates correctly in 99.9% of the cases, which implies that $\pi_1 = 0.999$ and $\pi_0 = 0.001$. Formula (9.37) indicates that $\frac{p}{p+q} = 0.999$ and $\frac{q}{p+q} = 0.001$, or $p = 999q$. The test sequence shows that
\[
\Pr[X_0=1,\ldots,X_{10}=1] = (1-q)^{10}\Pr[X_0=1] \approx (1-q)^{10} = 0.9999
\]
which leads to $q \approx 10^{-5}$ and thus $p \approx 0.00999$. A correctly (incorrectly) received bit is followed by a next correctly (incorrectly) received bit with probability $1-q = 0.99999$, respectively $1-p = 0.99001$.

C.6 Continuous-time Markov processes (Chapter 10)

(i) (a) The failure rate of each processor is $\lambda = 0.001$ per hour; the repair rate is $\mu = 0.01$ per hour. The Markov chain is shown in Fig. C.4.

Fig. C.4. The Markov chain for the three states: (1) both processors work, (2) one processor is damaged and (3) both processors are damaged.

(b) The infinitesimal generator is
\[
Q = \begin{pmatrix} -2\lambda & 2\lambda & 0 \\ \mu & -(\mu+\lambda) & \lambda \\ 0 & 2\mu & -2\mu \end{pmatrix}
\]
If the state probability vector is denoted by $s(t)$, we can also write $s(t)\,Q = \frac{d}{dt}s(t)$, or
\[
\begin{pmatrix} s_1(t) & s_2(t) & s_3(t) \end{pmatrix}\cdot Q = \begin{pmatrix} s_1'(t) & s_2'(t) & s_3'(t) \end{pmatrix}
\]
(c) The steady state $\pi = \lim_{t\to\infty}s(t)$ obeys equation (10.19), $\pi Q = 0$. Since $\pi_1+\pi_2+\pi_3 = 1$, we find
\[
\pi_1 = \frac{\mu^2}{(\lambda+\mu)^2}, \qquad \pi_2 = \frac{2\lambda\mu}{(\lambda+\mu)^2}, \qquad \pi_3 = \frac{\lambda^2}{(\lambda+\mu)^2}
\]
From the balance equations, we know that the probability flux from state 1 to state 2 must precisely equal that in the opposite direction, $2\lambda\pi_1 = \mu\pi_2$, and similarly for the transitions $2\to3$, $\lambda\pi_2 = 2\mu\pi_3$. Using $\pi_1+\pi_2+\pi_3 = 1$ leads faster to the solution. With $\lambda = 0.001$ and $\mu = 0.01$, the values are $\pi_1 = 0.8264$, $\pi_2 = 0.1653$ and $\pi_3 = 0.0083$.
(d) The availability in case (i) is $\pi_1 = 0.8264$. The availability in case (ii) is $\pi_1+\pi_2 = 0.9917$.
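The two-processor steady state of solution (i) can be recomputed from the flux-balance relations and checked against the generator; a sketch:

```python
lam, mu = 0.001, 0.01   # failure and repair rates per hour

# steady state via flux balance: 2*lam*pi1 = mu*pi2 and lam*pi2 = 2*mu*pi3,
# then normalize
pi1 = 1.0
pi2 = 2 * lam / mu * pi1
pi3 = lam / (2 * mu) * pi2
s = pi1 + pi2 + pi3
pi1, pi2, pi3 = pi1 / s, pi2 / s, pi3 / s

assert abs(pi1 - 0.8264) < 1e-4
assert abs(pi2 - 0.1653) < 1e-4
assert abs(pi3 - 0.0083) < 1e-4

# verify pi * Q = 0 for the generator of Fig. C.4
Q = [[-2*lam, 2*lam, 0], [mu, -(mu + lam), lam], [0, 2*mu, -2*mu]]
for j in range(3):
    flux = pi1 * Q[0][j] + pi2 * Q[1][j] + pi3 * Q[2][j]
    assert abs(flux) < 1e-12
```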
(ii) (a) In state 0 both servers are damaged, state 1 refers to one server down and one operating, while in state 2 both servers are operating. The corresponding Markov chain is shown in Fig. C.5.
(b) The infinitesimal generator is
\[
Q = \begin{pmatrix} -\lambda_B & \lambda_B & 0 \\ \mu_F+\mu_E & -(\mu_F+\mu_E+\lambda) & \lambda \\ \mu_E & 2\mu_H & -(\mu_E+2\mu_H) \end{pmatrix}
\]

Fig. C.5. The Markov chain is specified by $\lambda = \frac{1}{15}\,\mathrm{h}^{-1} = 6.66\times10^{-2}\,\mathrm{h}^{-1}$, $\lambda_B = \frac{1}{20}\,\mathrm{h}^{-1} = 5\times10^{-2}\,\mathrm{h}^{-1}$, $\mu_H = 3\times10^{-4}\,\mathrm{h}^{-1}$, $\mu_F = 7\times10^{-4}\,\mathrm{h}^{-1}$ and $\mu_E = 6\times10^{-5}\,\mathrm{h}^{-1}$.

(c) The steady-state vector obeys (10.19). Since the linear set $\pi Q = 0$ is underdetermined, we remove an arbitrary equation and add the normalization condition $\sum_{j=0}^{2}\pi_j = 1$. The steady-state probabilities are
\[
\pi_2 = \frac{\lambda_B\left(\mu_F+\mu_E+\lambda\right)}{\left(\mu_E+\lambda_B\right)\left(\mu_F+\mu_E+\lambda\right) + 2\mu_H\left(\mu_F+\mu_E+\lambda_B\right)} = 0.9898
\]
\[
\pi_1 = \frac{2\mu_H}{\mu_F+\mu_E+\lambda}\,\pi_2 = 0.0088
\qquad
\pi_0 = 1-\pi_2-\pi_1 = 0.0013
\]
(d) Theorem 10.2.3 states that the average lifetime of state $j$ is $E[\tau_j] = \frac{1}{q_j}$. This yields
\[
E[\tau_0] = \frac{1}{q_0} = \frac{1}{\lambda_B} = 20\ \mathrm{h}
\qquad
E[\tau_1] = \frac{1}{q_1} = \frac{1}{\mu_F+\mu_E+\lambda} = 14.9\ \mathrm{h}
\qquad
E[\tau_2] = \frac{1}{q_2} = \frac{1}{\mu_E+2\mu_H} = 1515\ \mathrm{h}
\]
(e) A repair takes place when the system transfers from state 1 to 2. When the system jumps from state 0 to state 2, two repairs take place. The fraction of time during which both servers are damaged is $\pi_0$ and the fraction of time in which one server is operating is $\pi_1$. The rate of repairs is the rate of changing from state 1 to 2, plus two times the rate of changing from state 0 to state 2:
\[
f_r = \pi_1 q_{12} + 2\pi_0 q_{02} = \pi_1\lambda + 2\pi_0\lambda_B = 7.17\times10^{-4}
\]
If we denote by $X$ the random variable of the number of total failures over the period of one year, then the average value of $X$ is $E[X] = f_r\times24\times365 = 6.28$.

C.7 Continuous-time Markov processes (Chapter 11)

(i) In both cases we apply the general formulae (11.15) and (11.16) for the steady state of a general birth and death process.
(a) Using the notation $\rho = \frac{\lambda}{\mu}$ and the rates $\lambda_m = \frac{\lambda}{m+1}$, $\mu_m = \mu$, we first compute
\[
\prod_{m=0}^{j-1}\frac{\lambda_m}{\mu_{m+1}} = \prod_{m=0}^{j-1}\frac{\lambda}{(m+1)\mu} = \frac{\rho^j}{j!}
\]
Then, with (11.16),
\[
\pi_j = \frac{\frac{\rho^j}{j!}}{1+\sum_{j=1}^{\infty}\frac{\rho^j}{j!}} = \frac{\rho^j}{j!}\,e^{-\rho} \qquad (j\ge0)
\]
which demonstrates that the steady-state probability that the birth and death process is in state $j$ is Poisson distributed with mean $\rho$.
(b) Similarly, we first compute with $\lambda_m = \lambda$ and $\mu_m = m\mu$,
\[
\prod_{m=0}^{j-1}\frac{\lambda_m}{\mu_{m+1}} = \prod_{m=0}^{j-1}\frac{\lambda}{(m+1)\mu} = \frac{\rho^j}{j!}
\]
which leads to precisely the same steady state as in (a). Indeed, the steady state is only a function of the ratios $\frac{\lambda_m}{\mu_{m+1}}$, which are the same in both (a) and (b).

(ii) All stations in slotted ALOHA operate independently and each has probability $p_t = 0.12$ to transmit in a timeslot. A station is successful in one slot with probability $p_s = p_t(1-p_t)^{N-1}$, where the number of stations is $N = 8$. Thus, $p_s = 0.049$. The waiting time $W$ to transmit one packet is a geometric random variable with parameter $p_s$, from which (Section 3.1.3) the mean is $E[W] = \frac{1}{p_s}$. Alternatively, $E[W]$ obeys the equation
\[
E[W] = p_s + (1-p_s)(1+E[W])
\]
because the average waiting time equals 1 timeslot with probability $p_s$, plus 1 timeslot increased with the average waiting time with probability $1-p_s$. Solving that equation again yields $E[W] = \frac{1}{p_s} = 20.39$ timeslots. The average transmission time for 7 packets is $7E[W] = 142.7$ timeslots.

C.8 Queueing (Chapters 13 and 14)

(i) Let us denote the number of packets in the server by $N_x$. Since a router serves either 0 or 1 packet, the problem states that $\Pr[N_x=1] = 0.8$, and also that $E[N_x] = 0.8$. For any queue it holds that $N_S = N_Q + N_x$ and $T = w + x$: the number in the system equals the number in the buffer plus the number that is being served. From Little's Theorem (13.21) it follows, with $E[x] = \frac{1}{\mu}$, that $E[N_x] = \frac{\lambda}{\mu}$, or $\lambda = \mu E[N_x]$.
Substituted into Little's law for the waiting time in the buffer, $E[N_Q] = \lambda E[w]$, and using $E[N_Q] = 3.2$ gives
\[
E[w] = \frac{E[N_Q]}{\mu E[N_x]} = \frac{4}{\mu}
\]

(ii) In a M/M/m/m queue, the number of busy servers equals the number (of packets) in the system $N_S$. From (14.16) and the definition (2.11), the average number of busy servers equals
\[
E[N_S] = \sum_{j=0}^{m}j\Pr[N_S=j] = \frac{1}{\sum_{n=0}^{m}\frac{\rho^n}{n!}}\sum_{j=1}^{m}\frac{\rho^j}{(j-1)!}
\]
The sum can be rewritten as
\[
\sum_{j=1}^{m}\frac{\rho^j}{(j-1)!} = \rho\left(\sum_{j=0}^{m}\frac{\rho^j}{j!} - \frac{\rho^m}{m!}\right)
\]
such that
\[
E[N_S] = \rho\left(1-\frac{\frac{\rho^m}{m!}}{\sum_{j=0}^{m}\frac{\rho^j}{j!}}\right) = \rho\left(1-\Pr[N_S=m]\right)
\]
where the last probability is recognized as the Erlang B formula (14.17).

(iii) (a) Since the average service rate is $\mu = 2\ \mathrm{s}^{-1}$, the average response time (average system time) follows from (14.2) as
\[
E\!\left[T_{M/M/1}\right] = \frac{1}{2-\lambda}
\]
(b) If $E\!\left[T_{M/M/1}\right] = 2.5$ s, then it follows from (a) that $\lambda = 1.6\ \mathrm{s}^{-1}$. Hence, the number of jobs/s that can be processed for a given average response time of 2.5 s equals 1.6 jobs/s.
(c) A 10% increase in arrival rate corresponds to $\lambda = 1.76\ \mathrm{s}^{-1}$ and from (a) we obtain $E\!\left[T_{M/M/1}\right] = \frac{1}{0.24} = 4.17$ s, which is, with respect to 2.5 s, an increase in average response time of 67%.

(iv) We know that when the average call holding time is $1/\mu = 10$ min, the time blocking probability is $P_B^t = \frac{1}{10}$. Additionally, for Poisson call arrivals, the time blocking probability $P_B^t$ equals the call blocking probability $P_B$ by the PASTA property. The number of channels is $m = 2$. The arrival intensity can be calculated from the Erlang B formula (14.17),
\[
P_B = \frac{r^2/2}{1+r+r^2/2}
\]
where $r = \lambda/\mu$. Solving this equation for $r$ and taking the positive root yields
\[
r = \frac{P_B+\sqrt{2P_B-P_B^2}}{1-P_B}
\]
For $P_B = \frac{1}{10}$, we have $r = \frac{1+\sqrt{19}}{9}$, from which $\lambda = \frac{1+\sqrt{19}}{90}$ per minute. The blocking probability (14.17) corresponding to an average call holding time $1/\mu = 15$ min, for which $r = \lambda/\mu = \frac{1+\sqrt{19}}{6}$, is $P_B^t \approx 0.174$.

(v) The queue 1 is a M/M/1 queue.
By Burke's Theorem 14.1.1, the departure process of queue 1 is a Poisson process with rate $\lambda$. By assumption, this departure process, which is the arrival process of the second queue, is also independent of the service process at queue 2. Therefore, queue 2, viewed in isolation, is also a M/M/1 queue. We know that the queueing processes in both queues are stable because the loads $\rho_1 = \frac{\lambda}{\mu_1} < 1$ and $\rho_2 = \frac{\lambda}{\mu_2} < 1$. The steady-state distributions of the number of customers in queue 1 and queue 2 follow from (14.1) as $\Pr[n\text{ at queue 1}] = \rho_1^n(1-\rho_1)$ and $\Pr[m\text{ at queue 2}] = \rho_2^m(1-\rho_2)$. The number of customers presently in queue 1 is independent of the sequence of earlier arrivals at queue 2 and therefore also of the number of customers presently in queue 2. This implies that
\[
\Pr[n\text{ at queue 1}, m\text{ at queue 2}] = \Pr[n\text{ at queue 1}]\Pr[m\text{ at queue 2}] = \rho_1^n(1-\rho_1)\,\rho_2^m(1-\rho_2)
\]

(vi) The average system times $E[T]$ for the three different queueing systems are immediate. From (14.2), for system A we have
\[
E[T_A] = \frac{1}{k\mu(1-\rho)}
\]
For each of the $k$ subqueues of system B, (14.2) gives
\[
E[T_B] = \frac{1}{\mu(1-\rho)}
\]
while for the M/M/k queue, (14.14) yields
\[
E[T_C] = \frac{1}{\mu} + \frac{\Pr[N_S\ge k]}{k\mu(1-\rho)}
\]
Clearly, $E[T_B] = kE[T_A]$ shows that by replacing $k$ small systems by one larger system with the same processing capability, the average system time decreases by a factor $k$. From the relation $E[T_C] = f(k,\rho)E[T_A]$ with $f(k,\rho) = k(1-\rho)+\Pr[N_S\ge k]$, it is more complicated to decide where $f(k,\rho)$ is larger or smaller than 1. The extreme values of $f(k,\rho)$ are known: $f(k,0) = k$ and $f(k,1) = 1$. Since $\frac{\partial f(k,\rho)}{\partial\rho} = -k+\frac{\partial\Pr[N_S\ge k]}{\partial\rho}$ and $\frac{\partial\Pr[N_S\ge k]}{\partial\rho} > 0$, it cannot be concluded that $f(k,\rho)$ is monotonously decreasing from $k$ to 1, in which case we would have $f(k,\rho) > 1$. Assuming $k$ real, we observe that $\frac{\partial f(k,\rho)}{\partial k} = (1-\rho)+\frac{\partial\Pr[N_S\ge k]}{\partial k} > 0$ for all $\rho < 1$, which implies that $f(2,\rho) < f(3,\rho) < \cdots$ and allows us to concentrate only on $f(2,\rho)$. Numerical results show that $f(2,\rho) <$
$1$ if $\rho \ge 0.85$, but $f(3,\rho) \ge 1$. This leads us to the conclusion that for $k > 2$, system A always outperforms system C; only if $k = 2$ and in the heavy-traffic regime $\rho \ge 0.85$ does system C lead to a slightly shorter average system time, of maximum 1.7%. Hence, replacing $k > 2$ processing units (servers) by one with the same processing capability always lowers the total time spent in the system. Of course, all conclusions only apply to systems that can be well modeled as M/M/m queueing systems. To first order, a computing device (processor) may be regarded as a M/M/1 queue. Then, the analysis shows that replacing an old processor by a $k$ times faster one is faster (on average) than installing $k$ old processors in parallel.

(vii) The waiting process for aeroplanes is modeled as a M/D/1 queue: the arrival process is Poissonean with rate $\lambda = \frac{1}{10}$ arrivals/minute, it consists of a single queue as only one aeroplane can land at a time, and the service process (the landing process) takes precisely $x = 5$ minutes (constant service time). Thus $E[x] = 5$ minutes, $\operatorname{Var}[x] = 0$ and $\rho = \frac{5}{10}$. Since the M/D/1 process is a special case of the M/G/1, we can apply the general formula (14.28) for the average waiting time in the queue of an M/G/1 system,
\[
E[w] = \frac{\lambda E[x^2]}{2(1-\rho)} = \frac{\frac{1}{10}\cdot5^2}{2\cdot\frac12} = 2.5\ \text{minutes}
\]

(viii) (a) We know that the arrival intensity of new calls in the cell is $\lambda_f = 20$ calls/min. Let $\lambda_h$ denote the arrival rate of the handover calls. The average time spent by a call in the cell is $E[T] = 1.64$ minutes and the average number of ongoing calls is $E[N] = 52$. Furthermore, the blocking rate is $P_B = 0.02$. The total arrival rate of calls that are carried by the base station is
\[
\lambda_{\mathrm{carried}} = (1-P_B)\,\lambda_{\mathrm{offered}} = (1-P_B)\left(\lambda_f+\lambda_h\right)
\]
Little's formula (13.21) states that $E[N] = \lambda_{\mathrm{carried}}E[T]$. Note that only the carried calls have an influence on the state of the system.
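The M/D/1 result of solution (vii) follows from the Pollaczek-Khinchine mean waiting time, and it also illustrates the factor-two penalty of exponential service noted later in solution (xii); a sketch:

```python
lam = 0.1          # arrivals per minute
x = 5.0            # deterministic landing time, minutes
rho = lam * x      # load, 0.5

# Pollaczek-Khinchine mean waiting time; E[x^2] = x^2 for a deterministic server
Ew = lam * x**2 / (2 * (1 - rho))
assert abs(Ew - 2.5) < 1e-12

# an exponential server with the same mean (M/M/1) has E[x^2] = 2*x^2,
# so the normalized waiting time doubles
Ew_mm1 = lam * 2 * x**2 / (2 * (1 - rho))
assert abs(Ew_mm1 - 2 * Ew) < 1e-12
```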
We can solve the asked $\lambda_h$ from these two equations as
\[
\lambda_h = \frac{E[N]}{E[T]\left(1-P_B\right)} - \lambda_f = 12.35\ \text{calls/minute}
\]
(b) The arrival intensity of lost calls is $\lambda_{\mathrm{lost}} = P_B\left(\lambda_f+\lambda_h\right) = 0.647$ calls/minute. If only the new calls are blocked, the asked blocking rate is
\[
P_{B,f} = \frac{\lambda_{\mathrm{lost}}}{\lambda_f} = 3.24\%
\]

(ix) (a) The derivation is given in the study of the M/G/1 queue in Section 14.3.1, where $A(z)$ given by (14.25) should be replaced by $E\!\left[z^N\right]$.
(b) Use in (14.25) the Laplace transform of an exponential random variable with mean $\frac1\mu$ given in (3.16), $\varphi_x(s) = \frac{\mu}{\mu+s}$. One obtains
\[
E\!\left[z^N\right] = \varphi_x(\lambda(1-z)) = \frac{\mu}{\mu+\lambda(1-z)}
= \frac{\frac{\mu}{\lambda+\mu}}{1-\frac{\lambda}{\lambda+\mu}z}
= \left(1-\frac{\lambda}{\lambda+\mu}\right)\sum_{k=0}^{\infty}\left(\frac{\lambda}{\lambda+\mu}\right)^k z^k
\]
from which the probability density function
\[
\Pr[N=k] = \left(1-\frac{\lambda}{\lambda+\mu}\right)\left(\frac{\lambda}{\lambda+\mu}\right)^k
\]
follows. Thus, $\Pr[N=k]$ is recognized as a geometric random variable with mean $E[N] = \frac{\lambda}{\mu}$.

(x) The queueing system is modeled by a birth and death process. The death rate is obvious and equal to $\mu$. The arrival rate into state $j$ equals the arrival rate $\lambda$ of customers multiplied by the probability $\frac{1}{j+1}$ of really going to that state $j$; hence, $\lambda_j = \frac{\lambda}{j+1}$. The steady state of this birth and death process is Poisson distributed with mean $\rho = \frac{\lambda}{\mu}$, as derived above in Section C.7, solution (i).

(xi) The M/M/m/m/s queue (the Engset formula). The arrival rate in the Engset model is proportional to the size of the "still demanding subgroup" and the interarrival times are exponential. The holding time of a line is exponentially distributed with mean $\frac1\mu$. The Engset model is described as a birth-death process where each state $k$ refers to the size of the "served subgroup". Since the total number of customers is $s$, the "still demanding subgroup" consists of $s-k$ members. The birth rate is $\lambda_k = (s-k)\lambda$ and the death rate is $\mu_k = k\mu$. The proportionality factor $\lambda$ can be interpreted as the arrival rate per "still demanding customer". The Markov graph is depicted in Fig. C.6.

Fig. C.6.
The Markov chain of the Engset loss model.

Application of the general birth-death formulae for the steady-state vector (11.16) yields, with $r = \frac{\lambda}{\mu}$,
\[
\pi_j = \Pr[N_S=j] = \frac{\prod_{k=1}^{j}\frac{(s-k+1)\lambda}{k\mu}}{1+\sum_{n=1}^{m}\prod_{k=1}^{n}\frac{(s-k+1)\lambda}{k\mu}}
= \frac{\binom{s}{j}r^j}{\sum_{n=0}^{m}\binom{s}{n}r^n}
\]
The computation of the blocking probability is more complex than for the Erlang B formula, because the arrival process is not a Poisson process. Indeed, due to the finite number of customers $s$, the largest number of possible arrivals is finite and the arrival rate depends on the state. Hence, the PASTA property cannot be applied. For a small time interval $\Delta t$, the blocking probability $P_B$ equals the ratio of $p_b(\Delta t)$, the probability of blocking in $\Delta t$, over $p_a(\Delta t)$, the probability of an arrival in $\Delta t$. Since the arrival rates depend on the state, the probability of an arrival in $\Delta t$ is not equal to $\lambda m\Delta t$ as for the Erlang B model. Instead, we have
\[
p_a(\Delta t) = \Delta t\sum_{n=0}^{m}(s-n)\lambda\Pr[N_S=n]
= \lambda\Delta t\,\frac{\sum_{n=0}^{m}(s-n)\binom{s}{n}r^n}{\sum_{n=0}^{m}\binom{s}{n}r^n}
= s\lambda\Delta t\,\frac{\sum_{n=0}^{m}\binom{s-1}{n}r^n}{\sum_{n=0}^{m}\binom{s}{n}r^n}
\]
Furthermore, blocking is only caused if $N_S = m$ and if at least one of the $s-m$ customers of the "still demanding group" generates an arrival. However, since the interval $\Delta t$ can be made arbitrarily small¹, a generation of more than one arrival has probability $o(\Delta t)$, such that it suffices to consider only one call attempt. Hence,
\[
p_b(\Delta t) = \Delta t(s-m)\lambda\Pr[N_S=m] = s\lambda\Delta t\,\frac{\binom{s-1}{m}r^m}{\sum_{n=0}^{m}\binom{s}{n}r^n}
\]
The Engset call blocking probability $P_B = \frac{p_b(\Delta t)}{p_a(\Delta t)}$ becomes
\[
P_B = \frac{\binom{s-1}{m}r^m}{\sum_{n=0}^{m}\binom{s-1}{n}r^n} \tag{C.3}
\]
Observe that $P_B = \Pr[N_S=m]$ in a system with $s-1$ instead of $s$ customers: an entering customer observes a system with $s-1$ customers, ignoring himself. At last, if we denote $\alpha = rs$, the Engset call blocking formula (C.3) can be rewritten as
\[
P_B = \frac{\frac{\alpha^m}{m!}}{\sum_{n=0}^{m}\frac{\alpha^n}{n!}\,\frac{(s-1-m)!\,s^{m-n}}{(s-1-n)!}}
\]
The ratio $\frac{(s-1-m)!\,s^{m-n}}{(s-1-n)!}$ is a ratio of polynomials in $s$ of equal degree, such that $\lim_{s\to\infty}\frac{(s-1-m)!\,s^{m-n}}{(s-1-n)!} = 1$. In conclusion, if $\alpha = rs$ is kept constant and $s\to\infty$, the Engset call blocking probability reduces to the Erlang B formula (14.17).

¹ Similar arguments are used in Chapter 7 when studying the Poisson process.
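The limit of the Engset formula (C.3) towards Erlang B can be checked numerically. In the sketch below, the offered traffic $\alpha = 4$ and $m = 6$ lines are illustrative assumptions:

```python
from math import comb, factorial

def engset_blocking(s, m, r):
    # Engset call blocking (C.3): an arriving customer sees s-1 sources
    den = sum(comb(s - 1, n) * r**n for n in range(m + 1))
    return comb(s - 1, m) * r**m / den

def erlang_b(m, a):
    den = sum(a**n / factorial(n) for n in range(m + 1))
    return (a**m / factorial(m)) / den

a, m = 4.0, 6   # hypothetical offered traffic alpha = r*s and number of lines

# finite-source blocking is smaller than Erlang B at the same offered load ...
assert engset_blocking(10, m, a / 10) < erlang_b(m, a)
# ... and tends to Erlang B as the source population s grows
assert abs(engset_blocking(10**4, m, a / 10**4) - erlang_b(m, a)) < 1e-3
```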
(xii) Although for a M/D/1 queue the exact expression (14.44) of the overflow probability exists, this series converges slowly for high traffic intensities $\rho = \frac{\lambda}{\mu}$, so that fast executable expressions are desirable. Substituting (14.45) into (14.67) and using, for sufficiently high loads $\rho > 0.8$, the dominant pole approximation of Section 5.7 gives
\[
\mathrm{clr}_{M/D/1/K} \simeq \frac{(1-\rho)\,\rho^{2K}}{1-\rho^{2K+1}} \tag{C.4}
\]
Comparing with (14.20) for the M/M/1/K queue,
\[
\mathrm{clr}_{M/M/1/K} \simeq \frac{(1-\rho)\,\rho^{K}}{1-\rho^{K+1}}
\]
the M-server (in continuous time) needs approximately twice as many buffer places to guarantee the same cell loss ratio as the corresponding D-server (in discrete time). Further, combining (14.3) and (14.38) shows that
\[
\mu_{M/M/1}E\!\left[w_{M/M/1}\right] = 2\,\mu_{M/D/1}E\!\left[w_{M/D/1}\right]
\]
or: the average waiting time in the queue (normalized to the average service time) for the M/M/1 queue is exactly twice as long as for the M/D/1 queue. The variability of the service in the M-server causes these rather large differences in performance. Furthermore, the simple formula (C.4) is particularly useful to engineer ATM buffers or to dimension simple queueing networks. If the number of individual flows that constitute the aggregate flow is large enough and none of the individual flows is dominant, the aggregate arrival process is quite well approximated by a Poisson process. Given as a QoS requirement a stringent cell loss ratio $\mathrm{clr}^*$, the input flow can be limited such that $\mathrm{clr}_{M/D/1/K} < \mathrm{clr}^*$. Alternatively, the buffer size $K$ can be derived from (C.4) subject to $\mathrm{clr}_{M/D/1/K} = \mathrm{clr}^*$ for an aggregate Poisson input flow with $\rho = 0.9$. As long as the input flow is limited to $\rho < 0.9$, the thus found buffer size $K$ always guarantees a cell loss ratio below $\mathrm{clr}^*$, provided the input flow can be approximated as a Poisson arrival process.
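The buffer-dimensioning use of (C.4) can be sketched as follows (the loss target $10^{-9}$ at $\rho = 0.9$ is an illustrative assumption); it also exhibits the roughly doubled buffer requirement of the M-server:

```python
def clr_md1(rho, K):
    # dominant-pole approximation (C.4) for the M/D/1/K cell loss ratio
    return (1 - rho) * rho**(2 * K) / (1 - rho**(2 * K + 1))

def clr_mm1(rho, K):
    # (14.20): M/M/1/K loss approximation
    return (1 - rho) * rho**K / (1 - rho**(K + 1))

target, rho = 1e-9, 0.9

# smallest buffer size meeting the QoS requirement in each model
K_md1 = next(K for K in range(1, 1000) if clr_md1(rho, K) <= target)
K_mm1 = next(K for K in range(1, 1000) if clr_mm1(rho, K) <= target)

# the M-server needs roughly twice the buffer of the D-server
assert K_mm1 in (2 * K_md1 - 1, 2 * K_md1, 2 * K_md1 + 1)
```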
C.9 General characteristics of graphs (Chapter 15)

(i) In one dimension (d = 1), the hopcount h_N of the shortest path between two uniformly chosen points x_A and x_B equals the distance between x_A and x_B. We allow the hopcount to be zero, which is reflected by the small h, while capital H refers to the case where the source A and the destination B are different. Thus,

\Pr[\,|x_A - x_B| = k\,] = \frac{1}{Z}\, 1_{k=0} + \frac{2(Z-k)}{Z^2}\, 1_{1 \le k \le Z-1}

with corresponding generating function

\varphi_Z(x) = \sum_{k=0}^{Z-1} \Pr[\,|x_A - x_B| = k\,]\, x^k = \frac{Z - Zx^2 + 2x(x^Z - 1)}{Z^2 (x-1)^2}

Since the nodes are uniformly chosen, all coordinate dimensions are independent and the generating function of the hopcount of the shortest path in a d-lattice is² \varphi_Z^d(x). From (2.26) and (2.27), the average number of hops is immediate as E[h_N] = \frac{d(Z^2-1)}{3Z} and the variance as \mathrm{Var}[h_N] = \frac{d(Z^2-1)(Z^2+2)}{18Z^2}. The total number of nodes in the d-lattice is N = Z^d such that, for large N, we obtain

E[h_N] \approx \frac{d}{3}\, N^{1/d} \qquad \mathrm{Var}[h_N] \approx \frac{d}{18}\, N^{2/d}

both increasing in d > 1 (for constant N) as in N (for constant d). For a two-dimensional lattice, the average hopcount scales as O(\sqrt{N}).

(ii) Using the definition (15.6) of the clustering coefficient and applying the law of total probability (2.46) yields

\Pr\left[c_{G_p(N)} \le x\right] = \sum_{k=0}^{N-1} \Pr\left[\frac{2y}{d_v(d_v-1)} \le x \,\Big|\, d_v = k\right] \Pr[d_v = k]

The degree distribution \Pr[d_v = k] in the random graph is given by (15.11) and

\Pr\left[\frac{2y}{d_v(d_v-1)} \le x \,\Big|\, d_v = k\right] = \Pr\left[y \le \binom{k}{2} x \,\Big|\, d_v = k\right] = \sum_{j=0}^{\lfloor \binom{k}{2} x \rfloor} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2} - j}

because y is the number of links between the d_v = k neighbors of v, which is binomially distributed with parameter p.

² If the sizes of the hypercube are not identical, the pgf is \prod_{j=1}^{d} \varphi_{Z_j}(x).
Combined, this gives

\Pr\left[c_{G_p(N)} \le x\right] = \sum_{k=0}^{N-1} \binom{N-1}{k} p^k (1-p)^{N-1-k} \sum_{j=0}^{\lfloor \binom{k}{2} x \rfloor} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}

The average E\left[c_{G_p(N)}\right] is computed via (2.35) as

E\left[c_{G_p(N)}\right] = \int_0^1 \Pr\left[c_{G_p(N)} > x\right] dx = \sum_{k=0}^{N-1} \binom{N-1}{k} p^k (1-p)^{N-1-k} \int_0^1 \sum_{j=\lfloor \binom{k}{2} x \rfloor + 1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}\, dx

Let t = \lfloor \binom{k}{2} x \rfloor; then

\int_0^1 \sum_{j=\lfloor \binom{k}{2} x \rfloor + 1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}\, dx = \frac{1}{\binom{k}{2}} \sum_{t=0}^{\binom{k}{2}-1} \sum_{j=t+1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}

Reversing the t- and j-sum yields

\frac{1}{\binom{k}{2}} \sum_{j=1}^{\binom{k}{2}} \sum_{t=0}^{j-1} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j} = \frac{1}{\binom{k}{2}} \sum_{j=1}^{\binom{k}{2}} j \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j} = p

Hence, we find that E\left[c_{G_p(N)}\right] = p. Along the same lines, we find that the generating function \varphi_c(z) of the clustering coefficient c_{G_p(N)} is

\varphi_c(z) = \sum_{k=0}^{N-1} \binom{N-1}{k} p^k (1-p)^{N-1-k} \left(1 - p + p\, e^{-z/\binom{k}{2}}\right)^{\binom{k}{2}}

The variance is computed from (2.43) as

\mathrm{Var}\left[c_{G_p(N)}\right] = (p - p^2) \sum_{k=2}^{N-1} \frac{\binom{N-1}{k}}{\binom{k}{2}}\, p^k (1-p)^{N-1-k}

(iii) The probability \Pr[H_N = 2] is determined by the intersection of two independent events. First, there is no direct link between node A and B; this event has probability 1 − p. Second, there is at least one path with two hops. All N − 2 possible two-hop paths between A and B have the structure (A \to j)(j \to B) and they have no links in common, i.e. they are mutually independent and independent of the direct link. The probability of the second event equals 1 − S_2, where S_2 is the probability that there is no path with two hops. Hence, we have that \Pr[H_N = 2] = (1-p)(1-S_2), and it remains to compute S_2. The event of no path with two hops is

\left(\bigcup_{j=1}^{N-2} \{(A \to j) \cap (j \to B)\}\right)^c = \bigcap_{j=1}^{N-2} \{(A \to j) \cap (j \to B)\}^c

such that

S_2 = \Pr\left[\bigcap_{j=1}^{N-2} \{(A \to j) \cap (j \to B)\}^c\right] = \prod_{j=1}^{N-2} \left(1 - \Pr[(A \to j) \cap (j \to B)]\right) = \left(1 - p^2\right)^{N-2}

which demonstrates (15.26).
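Both the lattice moments of (i) and formula (15.26) of (iii) can be verified numerically. The sketch below (mine, not from the book; function names are assumptions) checks E[h_N] and Var[h_N] for d = 1 directly from the pmf, and Pr[H_N = 2] by exact enumeration over all graphs on N = 5 nodes:

```python
from itertools import combinations

def pmf_1d(Z):
    """Pr[|x_A - x_B| = k], k = 0..Z-1, for two uniform points on a line of Z nodes."""
    return [1.0 / Z if k == 0 else 2.0 * (Z - k) / Z ** 2 for k in range(Z)]

Z = 50
pk = pmf_1d(Z)
mean = sum(k * q for k, q in enumerate(pk))
var = sum(k * k * q for k, q in enumerate(pk)) - mean ** 2
print(mean, (Z ** 2 - 1) / (3 * Z))                      # E[h_N] for d = 1
print(var, (Z ** 2 - 1) * (Z ** 2 + 2) / (18 * Z ** 2))  # Var[h_N] for d = 1

def prob_H2(N, p):
    """Exact Pr[H_N = 2] for A = 0, B = 1, summing over all graphs on N nodes."""
    edges = list(combinations(range(N), 2))
    prob = 0.0
    for mask in range(1 << len(edges)):
        present = {e for i, e in enumerate(edges) if mask >> i & 1}
        if (0, 1) in present:
            continue                  # direct link present: hopcount is 1
        if any((0, j) in present and (1, j) in present for j in range(2, N)):
            prob += p ** len(present) * (1 - p) ** (len(edges) - len(present))
    return prob

N, p = 5, 0.3
print(prob_H2(N, p), (1 - p) * (1 - (1 - p * p) ** (N - 2)))  # matches (15.26)
```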
C.10 The uniform recursive tree (Chapter 16)

(i) The relative error r, defined as 1 minus the simulated value over the exact value at hop k given in (16.14), versus the number of hops k is shown in Fig. C.7. The inset in Fig. C.7 illustrates that, on a linear scale, the difference between simulation and theory (full line) is not distinguishable for n \ge 10^5 iterations.

Fig. C.7. The relative error of the simulations of the hopcount in the complete graph with exponential link weights versus the hopcount, for 10^4, 10^5 and 10^6 iterations. The inset compares the simulated \Pr[H_{50} = k] with the exact pdf.

The average E[r_n] and standard deviation \sigma[r_n] of the relative error for n iterations versus the hops k are

E[r_{10^4}] = 0.12 \qquad E[r_{10^5}] = 0.047 \qquad E[r_{10^6}] = 0.017
\sigma[r_{10^4}] = 0.17 \qquad \sigma[r_{10^5}] = 0.073 \qquad \sigma[r_{10^6}] = 0.02

where the range of k values has been limited for n = 10^4 to 10 hops, for n = 10^5 to 11 hops, and for n = 10^6 to 12 hops. For larger hops, the simulations return zeros because the tail probability \Pr[H_N > k] decreases as O(1/k!) and simulating such a rare event requires on average at least as many simulations as (\Pr[H_N = k])^{-1}. The table roughly shows that the average error over the non-zero returned values decreases as O(1/\sqrt{n}), which is in agreement with the Central Limit Theorem 6.3.1. Each iteration of the simulation can be regarded as an independent trial and the histogram sums in a particular way the number of these trials.

(ii) Using (2.43), we have

\mathrm{Var}[W_N] = \varphi''_{W_N}(0) - \left(\varphi'_{W_N}(0)\right)^2 = \frac{1}{N-1} \sum_{k=1}^{N-1} \frac{d^2}{dz^2} \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)} \Big|_{z=0} - \left(\frac{1}{N-1} \sum_{n=1}^{N-1} \frac{1}{n}\right)^2

where E[W_N] is given in (16.18). The derivatives of the product

g(z) = \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)}

are elegantly computed via the logarithmic derivative \frac{dg(z)}{dz} = g(z)\, \frac{d \log g(z)}{dz}.
The second derivative is

\frac{d^2 g(z)}{dz^2} = g(z)\left(\frac{d \log g(z)}{dz}\right)^2 + g(z)\, \frac{d^2 \log g(z)}{dz^2}

With

\frac{d \log g(z)}{dz} = \frac{d}{dz} \sum_{n=1}^{k} \log \frac{n(N-n)}{z + n(N-n)} = -\sum_{n=1}^{k} \frac{1}{z + n(N-n)} \qquad \frac{d^2 \log g(z)}{dz^2} = \sum_{n=1}^{k} \frac{1}{(z + n(N-n))^2}

we obtain, since g(0) = 1,

\mathrm{Var}[W_N] = \frac{1}{N-1} \sum_{k=1}^{N-1} \left(\sum_{n=1}^{k} \frac{1}{n(N-n)}\right)^2 + \frac{1}{N-1} \sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2(N-n)^2} - \left(\frac{1}{N-1} \sum_{n=1}^{N-1} \frac{1}{n}\right)^2 \qquad (C.5)

The first sum equals

\sum_{k=1}^{N-1} \left(\sum_{n=1}^{k} \frac{1}{n(N-n)}\right)^2 = \sum_{n=1}^{N-1} \frac{1}{n(N-n)} \sum_{k=n}^{N-1} \sum_{j=1}^{k} \frac{1}{j(N-j)}

and, with \sum_{k=n}^{N-1} \sum_{j=1}^{k} \frac{1}{j(N-j)} = \sum_{k=1}^{N-1} \sum_{j=1}^{k} \frac{1}{j(N-j)} - \sum_{k=1}^{n-1} \sum_{j=1}^{k} \frac{1}{j(N-j)}, repeated interchanging of the k-, n- and j-sums combined with the partial fraction decomposition

\frac{1}{n(N-n)} = \frac{1}{N}\left(\frac{1}{n} + \frac{1}{N-n}\right) \qquad \text{so that} \qquad \sum_{n=1}^{k} \frac{1}{n(N-n)} = \frac{1}{N} \sum_{n=1}^{k} \frac{1}{n} + \frac{1}{N} \sum_{n=N-k}^{N-1} \frac{1}{n}

reduces the first sum to harmonic-type sums of the form \sum_{n=1}^{N-1} \frac{1}{n}, \left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2 and \sum_{j=1}^{N-1} \frac{1}{j} \sum_{n=N-j+1}^{N-1} \frac{1}{n}. Substituted into (C.5), these expressions yield
\mathrm{Var}[W_N] expressed entirely in such harmonic sums. Further,

\sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2(N-n)^2} = \sum_{n=1}^{N-1} \frac{1}{n^2(N-n)^2} \sum_{k=n}^{N-1} 1 = \sum_{n=1}^{N-1} \frac{1}{n^2(N-n)}

The partial fraction expansion \frac{1}{n^2(N-n)} = \frac{1}{N^2 n} + \frac{1}{N n^2} + \frac{1}{N^2(N-n)} then shows that

\sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2(N-n)^2} = \frac{2}{N^2} \sum_{n=1}^{N-1} \frac{1}{n} + \frac{1}{N} \sum_{n=1}^{N-1} \frac{1}{n^2}

Invoking the identity

\sum_{j=1}^{N-1} \frac{1}{N-j} \sum_{n=j}^{N-1} \frac{1}{n} = \sum_{n=1}^{N-1} \frac{1}{n^2}

(which can be verified by induction) to simplify the remaining double sums, we finally arrive at (16.19).

(iii) The limit for N \to \infty of the probability generating function (16.17) of the weight W_N of the shortest path,

\varphi_{W_N}(z) = E\left[e^{-z W_N}\right] = \frac{1}{N-1} \sum_{k=1}^{N-1} \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)}

will be derived, from which the distribution then follows by taking the inverse Laplace transform. Since

z + n(N-n) = \left(\sqrt{\left(\tfrac{N}{2}\right)^2 + z} + \tfrac{N}{2} - n\right)\left(\sqrt{\left(\tfrac{N}{2}\right)^2 + z} - \tfrac{N}{2} + n\right)

we have, with y = \sqrt{\left(\tfrac{N}{2}\right)^2 + z},

\prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)} = \frac{k!\,(N-1)!}{(N-k-1)!} \prod_{n=1}^{k} \frac{1}{y + \tfrac{N}{2} - n} \prod_{n=1}^{k} \frac{1}{y - \tfrac{N}{2} + n}

The products can be written in terms of the Gamma function,

\prod_{n=1}^{k} \frac{1}{y + \tfrac{N}{2} - n} = \frac{\Gamma\left(y + \tfrac{N}{2} - k\right)}{\Gamma\left(y + \tfrac{N}{2}\right)} \qquad \prod_{n=1}^{k} \frac{1}{y - \tfrac{N}{2} + n} = \frac{\Gamma\left(y - \tfrac{N}{2} + 1\right)}{\Gamma\left(y - \tfrac{N}{2} + k + 1\right)}

Thus,

\varphi_{W_N}(z) = (N-2)!\, \frac{\Gamma\left(y - \tfrac{N}{2} + 1\right)}{\Gamma\left(y + \tfrac{N}{2}\right)} \sum_{k=1}^{N-1} \frac{\Gamma(k+1)\, \Gamma\left(y + \tfrac{N}{2} - k\right)}{\Gamma(N-k)\, \Gamma\left(y - \tfrac{N}{2} + k + 1\right)}

Let the number of nodes be even, N = 2M, such that y = \sqrt{M^2 + z} \approx M + \frac{z}{2M} (provided |z| < 2M).
The sum, denoted by S, can be split as

S = \sum_{k=1}^{M} \frac{\Gamma(k+1)\,\Gamma(y+M-k)}{\Gamma(2M-k)\,\Gamma(y-M+k+1)} + \sum_{k=M+1}^{2M-1} \frac{\Gamma(k+1)\,\Gamma(y+M-k)}{\Gamma(2M-k)\,\Gamma(y-M+k+1)}
= \sum_{j=0}^{M-1} \frac{\Gamma(M-j+1)\,\Gamma(y+j)}{\Gamma(M+j)\,\Gamma(y-j+1)} + \sum_{n=1}^{M-1} \frac{\Gamma(M+n+1)\,\Gamma(y-n)}{\Gamma(M-n)\,\Gamma(y+n+1)}
= \sum_{j=-(M-1)}^{M-1} \frac{\Gamma(y+j)\,\Gamma(M-j+1)}{\Gamma(M+j)\,\Gamma(y-j+1)}

and

\varphi_{W_{2M}}(z) = (2M-1)!\, \frac{\Gamma(y-M+1)}{\Gamma(y+M)}\, \frac{1}{2M-1} \sum_{j=-(M-1)}^{M-1} \frac{\Gamma(y+j)\,\Gamma(M-j+1)}{\Gamma(M+j)\,\Gamma(y-j+1)}

For large M, with y \approx M + \frac{z}{2M},

(2M-1)!\, \frac{\Gamma(y-M+1)}{\Gamma(y+M)} \approx \Gamma(2M)\, \frac{\Gamma\left(\frac{z}{2M}+1\right)}{\Gamma\left(2M + \frac{z}{2M}\right)} \approx (2M)^{-\frac{z}{2M}}\, \Gamma\left(\frac{z}{2M}+1\right)

which suggests that we consider z \to 2Mz, since then, using (Abramowitz and Stegun, 1968, Section 6.1.47),

\varphi_{W_{2M}}(2Mz) \approx (2M)^{-z}\, \Gamma(z+1)\, \frac{1}{2M-1} \sum_{j=-(M-1)}^{M-1} \left(1 + O\left(\tfrac{1}{M}\right)\right) \approx (2M)^{-z}\, \Gamma(z+1) \left(1 + O\left(\tfrac{1}{M}\right)\right)

Hence,

\lim_{N \to \infty} N^{z}\, \varphi_{W_N}(Nz) = \Gamma(z+1)

or, equivalently,

\lim_{N \to \infty} E\left[e^{-(N W_N - \log N) z}\right] = \Gamma(z+1) \qquad (C.6)

The inverse Laplace transform of \Gamma(z+1) is a Gumbel distribution (3.37) and we arrive at the asymptotic distribution (16.20) for the weight of the shortest path. Since \Pr[N W_N - \log N \le y] = \Pr\left[W_N \le \frac{y + \log N}{N}\right], it follows, after the substitution x = \frac{y + \log N}{N} and ignoring the limit N \to \infty, that \Pr[W_N \le x] = e^{-N e^{-Nx}}, whose probability density function is found after differentiation as

\tilde{f}_{W_N}(x) = N^2 e^{-N\left(e^{-Nx} + x\right)} \qquad (C.7)

The goodness of this asymptotic distribution (C.7) for finite N is illustrated in Fig. C.8. Observe from Fig. C.8 that f_{W_N}(0) = 1 while \tilde{f}_{W_N}(0) = N^2 e^{-N} \approx 0. Since \varphi_{W_N}(z) = \int_0^{\infty} e^{-zt} f_{W_N}(t)\, dt is a single-sided Laplace transform, integration by parts yields z\, \varphi_{W_N}(z) = f_{W_N}(0) + \int_0^{\infty} e^{-zt} f'_{W_N}(t)\, dt, provided f'_{W_N}(t) exists for all t \ge 0. Hence, we find a well-known limit criterion of single-sided Laplace transforms,

f_{W_N}(0) = \lim_{z \to \infty} z\, \varphi_{W_N}(z) \qquad (C.8)

Applied to (16.17), this leads to f_{W_N}(0) = 1 for all finite N; applied to scaled link weights with mean \frac{1}{a}, for which \varphi_{W_N;a}(z) = \varphi_{W_N}\left(\frac{z}{a}\right), it gives f_{W_N;a}(0) = a.
The interpretation of this property is related to the choice of the link weights. The shortest path almost surely includes the smallest link weights of a regular link weight distribution F_w(x) = f_w(0)\, x + O(x^2), since F_w(0) = 0. Both the exponential (with parameter 1) and the uniform distribution are regular with f_w(0) = 1. Since the smallest values of the weight of the shortest path W_N in K_N occur a.s. for direct links, the distribution of W_N around zero is dominated by the distribution of the link weight w around zero. The contribution cannot be due to an n-hop shortest path with n > 1, since such a path consists of a sum of n exponentials, which has a probability density around x = 0 of the form O(x^{n-1}). Indeed, see (3.24) or apply (C.8) to the pgf of a sum of n exponentials, \varphi_{S_n}(z) = \prod_{k=1}^{n} \frac{\lambda_k}{z + \lambda_k}.

Fig. C.8. The pdf of the weight of the shortest path for various N (N = 50, 100 and 200). Each simulation consists of 10^6 iterations. The bold curves represent the finite-N equivalent (C.7) of the asymptotic result.

(iv) When the intermediate nodes of the shortest path between a source A and a destination B are removed from K_N, we obtain again a complete graph with N − h_N + 1 nodes. The resulting graph contains link weights that are no longer perfectly exponentially distributed, nor are they perfectly independent, because we have removed a special set of nodes and not a random set. But, since we have removed at each node of the shortest path, apart from the shortest link, N − 3 other links, we assume that the dependence between K_N and the reduced graph is negligibly small. Under these assumptions, the shortest node-disjoint path in K_N is a shortest path in K_{N - h_N + 1} with exponential link weights with mean 1.
The distribution of the hopcount h^{nd}_N of that shortest node-disjoint path is

\Pr\left[h^{nd}_N = k\right] \approx \sum_{j=0}^{N-1} \Pr\left[h_{N-j+1} = k \mid h_N = j\right] \Pr[h_N = j]

The hopcount H_N of the shortest path in the complete graph K_N with independent exponential link weights with mean 1 is given in (16.8). With the assumption that \Pr[h_{N-j+1} = k \mid h_N = j] = \Pr[h_{N-j+1} = k], we obtain from (16.8)

\Pr\left[h^{nd}_N = k\right] \approx \frac{(-1)^{k+1}}{N!} \sum_{j=0}^{N-1} \frac{S^{(k+1)}_{N-j+1}\, S^{(j+1)}_{N}}{(N-j+1)!}

For large N, we can use the Poisson approximation (16.13),

\Pr\left[h^{nd}_N = k\right] \approx \frac{1}{N\, k!} \sum_{j=0}^{N-1} \frac{\left(\log(N-j+1)\right)^k}{N-j+1}\, \frac{(\log N)^j}{j!}

Since \left(\log(N-j+1)\right)^k = \log^k N - \frac{k(j-1)}{N} \log^{k-1} N + O\left(\frac{1}{N^2}\right) and \frac{1}{N-j+1} = \frac{1}{N} + O\left(\frac{1}{N^2}\right), we have, to highest order in N,

\Pr\left[h^{nd}_N = k\right] \approx \frac{1}{N\, k!} \sum_{j=0}^{N-1} \frac{(\log N)^j}{j!} \left(\frac{\log^k N}{N} + O\left(\frac{\log^{k-1} N}{N^2}\right)\right) \approx \frac{1}{k!}\left(\frac{\log^k N}{N} + O\left(\frac{\log^{k-1} N}{N^2}\right)\right) \approx \Pr[h_N = k]

For large N, we thus expect that the hopcount of the shortest path and that of the shortest node-disjoint path have about the same distribution. The validity of the assumption is illustrated in Fig. C.9 for relatively small values of N = 50, 100 and 200. Each simulation consisted of n = 10^6 iterations.

Fig. C.9. The pdf of the hopcount of the shortest path (thin line) and of the shortest node-disjoint path (bold line), for N = 50, 100 and 200.

The corresponding weights of the shortest and the node-disjoint shortest path are drawn in Fig. C.10. The weight of the node-disjoint shortest path is evidently always larger than that of the shortest path in the same graph. Nevertheless, for large N, the simulations suggest that both pdfs tend to each other.

Fig. C.10. Pdf of the weight of the shortest path (thin line) and of the node-disjoint shortest path (bold line), for N = 50, 100 and 200.
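The removal construction of (iv) is easy to reproduce. A minimal sketch (mine, not the book's simulation code; function names are assumptions): run Dijkstra on K_N with exponential link weights, delete the intermediate nodes of the resulting path, and rerun Dijkstra on the reduced complete graph:

```python
import heapq
import random

def dijkstra(verts, w, s, t):
    """Shortest-path weight and node sequence from s to t in the complete
    graph on verts, with link weights w[(i, j)] for i < j."""
    dist = {v: float('inf') for v in verts}
    prev = {}
    dist[s] = 0.0
    pq, done = [(0.0, s)], set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v in verts:
            if v != u and d + w[(min(u, v), max(u, v))] < dist[v]:
                dist[v] = d + w[(min(u, v), max(u, v))]
                prev[v] = u
                heapq.heappush(pq, (dist[v], v))
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return dist[t], path[::-1]

random.seed(7)
N = 100
verts = list(range(N))
w = {(i, j): random.expovariate(1.0) for i in verts for j in verts if i < j}
w1, path = dijkstra(verts, w, 0, 1)       # shortest path, A = 0 and B = 1
reduced = [v for v in verts if v in (0, 1) or v not in path]
w2, path2 = dijkstra(reduced, w, 0, 1)    # shortest node-disjoint path
print(len(path) - 1, w1, len(path2) - 1, w2)
```

By construction the second path shares no intermediate nodes with the first and its weight can never be smaller, in line with Fig. C.10.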
C.11 The efficiency of multicast (Chapter 17)

(i) Using (17.25), we obtain for the k-ary tree of depth D

g_{N,k}(2) = N - 1 - \sum_{j=0}^{D-1} k^{D-j}\, \frac{\left(N - 1 - \frac{k^{j+1}-1}{k-1}\right)\left(N - 2 - \frac{k^{j+1}-1}{k-1}\right)}{(N-1)(N-2)}

After evaluating the geometric sums, this reduces for large N to

g_{N,k}(2) \approx 2D - \frac{3}{k-1} + O\left(\frac{\log_k N}{N}\right)

Since g_{N,k}(1) = E[H_N] = D - \frac{1}{k-1} + o(1) \approx \log_k N + \log_k(1 - 1/k) - \frac{1}{k-1}, so that g_{N,k}(2) \approx 2 E[H_N] - \frac{1}{k-1}, the effective power exponent \tau(N), as defined in (17.32), equals for the k-ary tree and large N

\tau(N) \approx 1 + \log_2\left(1 - \frac{1}{2(k-1) E[H_N]}\right) \approx 1 - \frac{1}{(\log 4)(k-1) E[H_N]}

which shows, for large N, that \tau(N) < 1, but that \tau(N) \to 1 if k \to \infty.

Bibliography

Abramowitz, M. and Stegun, I. A. (1968). Handbook of Mathematical Functions. (Dover Publications, Inc., New York).
Allen, A. O. (1978). Probability, Statistics, and Queueing Theory. Computer Science and Applied Mathematics, (Academic Press, Inc., Orlando).
Almkvist, G. and Berndt, B. C. (1988). Gauss, Landen, Ramanujan, the Arithmetic-Geometric Mean, Ellipses, π, and the Ladies Diary. American Mathematical Monthly 95, 585—608.
Anick, D., Mitra, D., and Sondhi, M. M. (1982). Stochastic theory of a data-handling system with multiple sources. The Bell System Technical Journal 61, 8 (October), 1871—1894.
Anupindi, R., Chopra, S., Deshmukh, S. D., Van Mieghem, J. A., and Zemel, E. (2006). Managing Business Process Flows. Principles of Operations Management, 2nd edn. (Prentice Hall, Upper Saddle River).
Barabasi, A.-L. (2002). Linked, The New Science of Networks. (Perseus, Cambridge, MA).
Baran, P. (2002). The beginnings of packet switching - some underlying concepts: The Franklin Institute and Drexel University seminar on the evolution of packet switching and the Internet. IEEE Communications Magazine, 2—8.
Berger, M. A. (1993). An Introduction to Probability and Stochastic Processes. (Springer-Verlag, New York).
Bertsekas, D. and Gallager, R. (1992).
Data Networks, 2nd edn. (Prentice-Hall International Editions, London).
Billingsley, P. (1995). Probability and Measure, 3rd edn. (John Wiley & Sons, New York).
Bisdikian, C., Lew, J. S., and Tantawi, A. N. (1992). On the tail approximation of the blocking probability of single server queues with finite buffer capacity. Queueing Networks with Finite Capacity, Proc. 2nd Int. Conf., 267—280.
Bollobas, B. (2001). Random Graphs, 2nd edn. (Cambridge University Press, Cambridge, UK).
Borovkov, A. A. (1976). Stochastic Processes in Queueing Theory. (Springer-Verlag, New York).
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. (Cambridge University Press, Cambridge).
Brockmeyer, E., Halstrom, H. L., and Jensen, A. (1948). The Life and Works of A. K. Erlang. (Academy of Technical Sciences, Copenhagen).
Chalmers, R. C. and Almeroth, K. C. (2001). Modeling the branching characteristics and efficiency gains in global multicast trees. IEEE INFOCOM2001, Alaska.
Chen, L. Y. (1975). Poisson approximation for dependent trials. The Annals of Probability 3, 3, 534—545.
Chen, W.-K. (1971). Applied Graph Theory. (North-Holland Publishing Company, Amsterdam).
Chuang, J. and Sirbu, M. A. (1998). Pricing multicast communication: A cost-based approach. Proceedings of the INET'98.
Cohen, J. W. (1969). The Single Server Queue. (North-Holland Publishing Company, Amsterdam).
Cohen-Tannoudji, C., Diu, B., and Laloë, F. (1977). Mécanique Quantique. Vol. I and II. (Hermann, Paris).
Comtet, L. (1974). Advanced Combinatorics, revised and enlarged edn. (D. Reidel Publishing Company, Dordrecht, Holland).
Cormen, T. H., Leiserson, C. E., and Rivest, R. L. (1991). An Introduction to Algorithms. (MIT Press, Boston).
Cvetkovic, D. M., Doob, M., and Sachs, H. (1995). Spectra of Graphs, Theory and Applications, third edn. (Johann Ambrosius Barth Verlag, Heidelberg).
Dorogovtsev, S. N. and Mendes, J. F. F. (2003). Evolution of Networks, From Biological Nets to the Internet and WWW.
(Oxford University Press, Oxford).
Embrechts, P., Klüppelberg, C., and Mikosch, T. (2001a). Modelling Extremal Events for Insurance and Finance, 3rd edn. (Springer-Verlag, Berlin).
Embrechts, P., McNeil, A., and Straumann, D. (2001b). Correlation and Dependence in Risk Management: Properties and Pitfalls. Risk Management: Value at Risk and Beyond, ed. M. Dempster and H. K. Moffatt, (Cambridge University Press, Cambridge, UK).
Erdös, P. and Rényi, A. (1959). On random graphs. Publicationes Mathematicae Debrecen 6, 290—297.
Erdös, P. and Rényi, A. (1960). On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutato Int. Kozl. 5, 17—61.
Feller, W. (1970). An Introduction to Probability Theory and Its Applications, 3rd edn. Vol. 1. (John Wiley & Sons, New York).
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, 2nd edn. Vol. 2. (John Wiley & Sons, New York).
Floyd, S. and Paxson, V. (2001). Difficulties in simulating the internet. IEEE Transactions on Networking 9, 4 (August), 392—403.
Fortz, B. and Thorup, M. (2000). Internet traffic engineering by optimizing OSPF weights. IEEE INFOCOM2000.
Frieze, A. M. (1985). On the value of a random minimum spanning tree problem. Discrete Applied Mathematics 10, 47—56.
Gallager, R. G. (1996). Discrete Stochastic Processes. (Kluwer Academic Publishers, Boston).
Gantmacher, F. R. (1959a). The Theory of Matrices. Vol. I. (Chelsea Publishing Company, New York).
Gantmacher, F. R. (1959b). The Theory of Matrices. Vol. II. (Chelsea Publishing Company, New York).
Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimus obnoxiae. Pars prior. Gauss Werke 4, 3—26.
Gilbert, E. N. (1956). Enumeration of labelled graphs. Canadian Journal of Mathematics 8, 405—411.
Gnedenko, B. V. and Kovalenko, I. N. (1989). Introduction to Queuing Theory, second edn. (Birkhauser, Boston).
Golub, G. H. and Loan, C. F. V. (1983). Matrix Computations. (North Oxford Academic, Oxford).
Goulden, I. P.
and Jackson, D. M. (1983). Combinatorial Enumeration. (John Wiley & Sons, New York).
Grimmett, G. R. (1989). Percolation. (Springer-Verlag, New York).
Grimmett, G. R. and Stirzaker, D. (2001). Probability and Random Processes, 3rd edn. (Oxford University Press, Oxford).
Hardy, G. H. (1948). Divergent Series. (Oxford University Press, London).
Hardy, G. H., Littlewood, J. E., and Polya, G. (1999). Inequalities, 2nd edn. (Cambridge University Press, Cambridge, UK).
Hardy, G. H. and Wright, E. M. (1968). An Introduction to the Theory of Numbers, 4th edn. (Oxford University Press, London).
Harris, T. E. (1963). The Theory of Branching Processes. (Springer-Verlag, Berlin).
Harrison, J. M. (1990). Brownian Motion and Stochastic Flow Systems. (Krieger Publishing Company, Malabar, Florida).
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2001). First passage percolation on the random graph. Probability in the Engineering and Informational Sciences (PEIS) 15, 225—237.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2002a). The flooding time in random graphs. Extremes 5, 2 (June), 111—129.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2002b). On the covariance of the level sizes in recursive trees. Random Structures and Algorithms 20, 519—539.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2005). Distances in random graphs with finite variance degrees. Random Structures and Algorithms 27, 1 (August), 76—123.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2006a). Size and weight of shortest path trees with exponential link weights. Combinatorics, Probability and Computing.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2006b). The weight of the shortest path tree. Random Structures and Algorithms.
Hooghiemstra, G. and Koole, G. (2000). On the convergence of the power series algorithm. Performance Evaluation 42, 21—39.
Hooghiemstra, G. and Van Mieghem, P. (2005).
On the mean distance in scale free graphs. Methodology and Computing in Applied Probability (MCAP) 7, 285—306.
Jamin, S., Jin, C., Kurc, A. R., Raz, D., and Shavitt, Y. (2001). Constrained mirror placement on the internet. IEEE INFOCOM'01.
Janic, M., Kuipers, F., Zhou, X., and Van Mieghem, P. (2002). Implications for QoS provisioning based on traceroute measurements. Proceedings of the 3rd International Workshop on Quality of Future Internet Services, QofIS2002, ed. B. Stiller et al., Zurich, Switzerland, Springer Verlag LNCS 2511, 3—14.
Janson, S. (1995). The minimal spanning tree in a complete graph and a functional limit theorem for trees in a random graph. Random Structures and Algorithms 7, 4 (December), 337—356.
Janson, S. (2002). On concentration of probability. Contemporary Combinatorics, ed. B. Bollobás, Bolyai Soc. Math. Stud. 10, János Bolyai Mathematical Society, Budapest, 289—301.
Janson, S., Knuth, D. E., Luczak, T., and Pittel, B. (1993). The birth of the giant component. Random Structures and Algorithms 4, 3, 233—358.
Karlin, S. and Taylor, H. M. (1975). A First Course in Stochastic Processes, 2nd edn. (Academic Press, San Diego).
Karlin, S. and Taylor, H. M. (1981). A Second Course in Stochastic Processes. (Academic Press, San Diego).
Kelly, F. P. (1991). Special invited paper: Loss networks. The Annals of Applied Probability 1, 3, 319—378.
Kleinrock, L. (1975). Queueing Systems. Vol. 1 — Theory. (John Wiley and Sons, New York).
Kleinrock, L. (1976). Queueing Systems. Vol. 2 — Computer Applications. (John Wiley and Sons, New York).
Krishnan, P., Raz, D., and Shavitt, Y. (2000). The cache location problem. IEEE/ACM Transactions on Networking 8, 5 (October), 568—582.
Kuipers, F. A. and Van Mieghem, P. (2003). The Impact of Correlated Link Weights on QoS Routing. IEEE INFOCOM03.
Lanczos, C. (1988). Applied Analysis. (Dover Publications, Inc., New York).
Langville, A. N. and Meyer, C. D. (2005). Deeper inside PageRank.
Internet Mathematics 1, 3 (February), 335—380.
Le Boudec, J.-Y. and Thiran, P. (2001). Network Calculus, A Theory of Deterministic Queuing Systems for the Internet. (Springer Verlag, Berlin).
Leadbetter, M. R., Lindgren, G., and Rootzen, H. (1983). Extremes and Related Properties of Random Sequences and Processes. (Springer-Verlag, New York).
Leon-Garcia, A. (1994). Probability and Random Processes for Electrical Engineering, 2nd edn. (Addison-Wesley, Reading, Massachusetts).
van Lint, J. H. and Wilson, R. M. (1996). A Course in Combinatorics. (Cambridge University Press, Cambridge, UK).
Lovász, L. (1993). Random Walks on Graphs: A Survey. Combinatorics 2, 1—46.
Markushevich, A. I. (1985). Theory of Functions of a Complex Variable. Vol. I — III. (Chelsea Publishing Company, New York).
Mehta, M. L. (1991). Random Matrices, 2nd edn. (Academic Press, Boston).
Meyer, C. D. (2000). Matrix Analysis and Applied Linear Algebra. (Society for Industrial and Applied Mathematics (SIAM), Philadelphia).
Mitra, D. (1988). Stochastic theory of a fluid model of producers and consumers coupled by a buffer. Advances in Applied Probability 20, 646—676.
Morse, P. M. and Feshbach, H. (1978). Methods of Theoretical Physics. (McGraw-Hill Book Company, New York).
Neuts, M. F. (1989). Structured Stochastic Matrices of the M/G/1 Type and Their Applications. (Marcel Dekker Inc., New York).
Norros, I. (1994). A storage model with self-similar input. Queueing Systems 16, 3-4, 387—396.
Pascal, B. (1954). Oeuvres complètes. Bibliothèque de la Pléiade, (Gallimard, Paris).
Paxson, V. (1997). End-to-end Routing Behavior in the Internet. IEEE/ACM Transactions on Networking 5, 5 (October), 601—615.
Phillips, G., Schenker, S., and Tangmunarunkit, H. (1999). Scaling of multicast trees: Comments on the Chuang-Sirbu scaling law. ACM Sigcomm99.
Pietronero, L. and Schneider, W. (1990). Invasion percolation as a fractal growth problem. Physica A 170, 81—104.
Press, W. H., Teukolsky, S.
A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical Recipes in C, 2nd edn. (Cambridge University Press, New York).
Rainville, E. D. (1960). Special Functions. (Chelsea Publishing Company, New York).
Riordan, J. (1968). Combinatorial Identities. (John Wiley & Sons, New York).
Roberts, J. W. (1991). Performance Evaluation and Design of Multiservice Networks. Information Technologies and Sciences, vol. COST 224. (Commission of the European Communities, Luxembourg).
Robinson, S. (2004). The price of anarchy. SIAM News 37, 5 (June), 1—4.
Ross, S. M. (1996). Stochastic Processes, 2nd edn. (John Wiley & Sons, New York).
Royden, H. L. (1988). Real Analysis, 3rd edn. (Macmillan Publishing Company, New York).
Sansone, G. and Gerretsen, J. (1960). Lectures on the Theory of Functions of a Complex Variable. Vol. 1 and 2. (P. Noordhoff, Groningen).
Schoutens, W. (2000). Stochastic Processes and Orthogonal Polynomials. (Springer-Verlag, New York).
Siganos, G., Faloutsos, M., Faloutsos, P., and Faloutsos, C. (2003). Power laws and the AS-level internet topology. IEEE/ACM Transactions on Networking 11, 4 (August), 514—524.
Smythe, R. T. and Mahmoud, H. M. (1995). A survey of recursive trees. Theory of Probability and Mathematical Statistics 51, 1—27.
Steyaert, B. and Bruneel, H. (1994). Analytic derivation of the cell loss probability in finite multiserver buffers, from infinite buffer results. Proceedings of the second workshop on performance modelling and evaluation of ATM networks, Bradford UK, 18.1—11.
Strogatz, S. H. (2001). Exploring complex networks. Nature 410, 8 (March), 268—276.
Syski, R. (1986). Introduction to Congestion Theory in Telephone Systems, 2nd edn. Studies in Telecommunication, vol. 4. (North-Holland, Amsterdam).
Titchmarsh, E. C. (1948). Introduction to the Theory of Fourier Integrals, 2nd edn. (Oxford University Press, Ely House, London W. I).
Titchmarsh, E. C. (1964). The Theory of Functions. (Oxford University Press, Amen House, London).
Titchmarsh, E. C. and Heath-Brown, D. R. (1986). The Theory of the Zeta-function, 2nd edn. (Oxford Science Publications, Oxford).
Van Mieghem, P. (1996). The asymptotic behaviour of queueing systems: Large deviations theory and dominant pole approximation. Queueing Systems 23, 27—55.
Van Mieghem, P. (2001). Paths in the simple random graph and the Waxman graph. Probability in the Engineering and Informational Sciences (PEIS) 15, 535—555.
Van Mieghem, P. (2004a). Data Communications Networking. (Delft University of Technology, Delft).
Van Mieghem, P. (2004b). The Probability Distribution of the Hopcount to an Anycast Group. Delft University of Technology, Report 2003605 (www.nas.ewi.tudelft.nl/people/Piet/teleconference).
Van Mieghem, P. (2005). The limit random variable W of a branching process. Delft University of Technology, Report 20050206 (www.nas.ewi.tudelft.nl/people/Piet/teleconference).
Van Mieghem, P., Hooghiemstra, G., and van der Hofstad, R. (2000). A Scaling Law for the Hopcount in the Internet. Delft University of Technology, Report 2000125 (www.nas.ewi.tudelft.nl/people/Piet/telconference).
Van Mieghem, P., Hooghiemstra, G., and van der Hofstad, R. (2001a). On the efficiency of multicast. IEEE/ACM Transactions on Networking 9, 6 (December), 719—732.
Van Mieghem, P., Hooghiemstra, G., and van der Hofstad, R. W. (2001b). Stochastic model for the number of traversed routers in internet. Proceedings of Passive and Active Measurement: PAM-2001, April 23-24, Amsterdam.
Van Mieghem, P. and Janic, M. (2002). Stability of a multicast tree. Proceedings IEEE INFOCOM2002 2, 1099—1108.
Veres, A. and Boda, M. (2000). The chaotic nature of TCP congestion control. IEEE INFOCOM'2000, Tel-Aviv, Israel.
Walrand, J. (1998). Communication Networks, A First Course, 2nd edn. (McGraw-Hill, Boston).
Wästlund, J. (2005). Evaluation of Janson's constant for the variance in the random minimum spanning tree problem. Linköping Studies in Mathematics.
Series editor: Bengt Ove Turesson, 7 (www.ep.liu.se/ea/lsm/2005/007).
Waxman, B. M. (1988). Routing of multipoint connections. IEEE Journal on Selected Areas in Communications 6, 9 (December), 1617—1622.
Whittaker, E. T. and Watson, G. N. (1996). A Course of Modern Analysis, Cambridge Mathematical Library edn. (Cambridge University Press, Cambridge, UK).
Wigner, E. P. (1955). Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics 62, 3 (November), 548—564.
Wigner, E. P. (1957). Characteristic vectors of bordered matrices with infinite dimensions II. Annals of Mathematics 65, 2 (March), 203—207.
Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices. Annals of Mathematics 67, 2 (March), 325—327.
Wilkinson, J. H. (1965). The Algebraic Eigenvalue Problem. (Oxford University Press, New York).
Wolff, R. W. (1982). Poisson arrivals see time averages. Operations Research 30, 2 (April), 223—231.
Wolff, R. W. (1989). Stochastic Modeling and the Theory of Queues. (Prentice-Hall International Editions, New York).
Index

n-ary tree, 387, 401, 414, 417, 423, 432, 522
adjacency matrix, 320, 471, 488
  eigenvalues, 475
Bayes' rule, 28, 197
Beneš' equation, 261, 301
cell loss ratio (CLR), 309, 512
Central Limit Theorem, 104, 148, 366, 377, 516
Chernoff bound, 88
Chuang–Sirbu scaling law, 404–407
complete graph, 319, 321, 327, 347, 349, 359, 371, 373, 380, 392, 473, 481, 482, 488, 520
conditional distribution function, 28
conditional expectation, 34, 233, 341
conditional probability, 26
conditional probability density function, 28
correlation coefficient, 30, 61, 67, 69, 74, 119
covariance, 29, 71, 78
  matrix, 63, 66
degree
  of a graph, 323
  of a node, 225, 322, 472
disjoint paths, 520
  Menger's Theorem, 327
distribution
  n-th order statistics, 53, 494
  Bernoulli, 37
  binomial, 38, 332, 488
  Cauchy, 54
  chi-square, 50
  Erlang, 48, 125, 274, 278
  exponential, 44, 75
  extremal, 106
  Fréchet, 107
  Gamma, 48, 51, 125
  Gaussian, 46, 103, 400
  geometric, 39, 272
  Gumbel, 54, 107, 400
  joint Gaussian, 64
  lognormal, 57, 77
  Pareto, 56
  Poisson, 40, 116, 129, 335
  polynomial, 44, 348, 494
  regular, 348, 362
  uniform, 43, 74
  Weibull, 55, 107, 132
Engset formula, 314, 511
Erlang B formula, 2, 280
Erlang C formula, 277
event, 10, 53
  mutually exclusive, 10
failure rate, 131
flooding time, 362
giant component, 335, 337, 339
Google, 224
graph connectivity, 325, 486
  edge connectivity, 326, 487
  vertex connectivity, 326, 487
graph metrics
  betweenness, 329
  clustering coefficient, 328, 346, 513
  diameter, 475
  distortion, 329
  expansion, 328
  hopcount, 329
  resilience, 329
histogram, 118, 494
hopcount, 340, 347, 354, 357, 387, 392, 403, 409, 418, 420, 423, 431, 513, 520
incidence matrix, 471
inclusion-exclusion formula, 12, 335, 391
  sieve of Eratosthenes, 15
indicator function, 12, 17, 43, 321
inequality
  Boole, 15
  Cauchy-Schwarz, 90, 91, 480
  Chebyshev, 88
  Gauss, 92
  Hölder, 90, 480
  Jensen, 85, 342
  Markov, 88
  Minkowski, 91
infinitesimal generator, 181
Laplacian (admittance matrix), 472, 486
law of rare events, 41, 128, 495
law of total probability, 27, 123, 142, 159, 204, 205, 238, 255, 274, 278, 284, 295, 324, 341, 367, 410, 422, 445, 495
level set of a tree, 352
Lindley's equation, 255
link weight, 320, 340, 341, 347, 349, 359, 362, 373, 392, 406, 408
Little's law, 267, 273, 275, 281, 287, 297, 508, 510
Markov chain
  absorbing states, 164
  communicating states, 162
  conservative, 181
  continuous-time, 179
  discrete-time, 158
  embedded, 186, 188
  hitting time, 163
  irreducible Markov chain, 161, 226
  periodic and aperiodic, 162
  transient and recurrent states, 165
mean time to failure, 131
memoryless property, 27, 40, 45, 125, 132, 185, 351
Metcalfe's law, 320
minimum spanning tree (MST), 373, 399
modes of convergence, 99
Newton identities for polynomials, 477
order statistics, 52, 127
PageRank (Google), 224
phase transition, 335, 376
Poisson arrivals see time averages (PASTA), 267, 274, 275, 283, 288, 312, 509, 511
Pollaczek-Khinchin equation, 286
power law, 325
probability density function (pdf), 16, 20, 22
  joint, 28, 32
probability generating function (pgf), 18
  logarithm of, 19, 25
  moment generating function, 25, 235
process
  arrival, 248, 270
  birth and death, 208, 304, 351
  branching, 229, 342
    geometric, 244
    Poisson, 246, 345
  counting, 263
  Markov, 180, 253, 349
    balance equation, 187
    Chapman-Kolmogorov equation, 180
    forward and backward equation, 182, 195
    time reversibility, 196
  nonhomogeneous Poisson, 129
  Poisson, 120, 210
  queueing, 250
  renewal, 137
  service, 249, 270
  stochastic, 115
    modeling, 117
  Yule, 212, 230
quality of service (QoS), 2, 249, 283, 309, 340, 419
random graph, 330, 332, 337, 339, 346, 354, 362, 373, 374, 377, 387, 392, 403, 404, 406, 408, 410, 488, 513
random variable
  continuous, 20, 59
  discrete, 16, 58
  expectation, 17, 22
  independent, 28, 29, 32, 34, 47, 49, 51, 78, 97, 104
  normalized, 31, 93, 400
random vector, 62
random walk, 202, 484
redundancy level, 325
regular graphs, 322, 328, 475, 482, 486, 488
reliability function, 131
renewal
  alternating renewal process, 153
  Blackwell's Renewal Theorem, 146
  Elementary Renewal Theorem, 145, 170
  inspection paradox, 152
  Key Renewal Theorem, 146, 151
  renewal equation, 141
  renewal function, 140
  renewal process, 137
  renewal theory, 165
server placement problem, 419, 424, 429
shortest path, 340, 347
  tree (SPT), 387, 392, 399, 407, 419, 428
slotted Aloha, 219
stochastic matrix, 159, 450, 451, 455, 473
total variation distance, 42
transition probability matrix, 159, 190, 201
  spectral decomposition, 184
unfinished work, 256, 261, 263, 301
uniform recursive tree (URT), 354, 380, 392, 404, 407, 411, 417, 419, 422, 424
uniformization, 189
Wald's identity, 34, 145, 154
web graph, 224, 323
Wigner's Semicircle Law, 489