APPLIED PROBABILITY & STATISTICS: SOME CHALLENGES AND OPPORTUNITIES
(Where angels fear to tread?)

V. Ramaswami(1)
AT&T Labs Research
180 Park Avenue, Florham Park, NJ 07932, USA
vramaswami@att.com

Abstract: The last three decades have seen major strides in statistical methodologies. That, combined with the exploitation of increased computing power and the interfaces of statistics with machine learning and artificial intelligence, has enabled the consideration of many problems that were previously set aside for a variety of reasons such as too many parameters, small samples, non-normality, identifiability issues, etc. While many fields have taken advantage of that progress, some traditional fields of applied probability, like queueing, lag far behind. We illustrate the opportunities this offers through some examples of attempts to bring more statistical methodology into our applied probability work at AT&T, and conclude with some personal thoughts on the challenges ahead.

1. Introduction

Even a cursory reading of Professor DasGupta's column [1] in the recent IMS Bulletin will show how the landscape of statistical inference has changed in the last thirty years. He notes that many new-age topics like the bootstrap, EM, MCMC, function estimation, extreme values, etc., have replaced many classical parametric and nonparametric methods, so much so that what is now considered a set of core courses in statistical inference has but a small intersection with what was considered core in the 1970s and earlier.

In addition to the new developments in mathematical statistics, much concurrent development has occurred in the areas of data mining and machine learning. These, combined with exponentially growing computing power, have allowed statistics to make major leaps into problem areas previously considered intractable. Many areas like biostatistics, computational biology, genetics, and mathematical finance have incorporated many of these advances. Yet some traditional areas like queueing are yet to take advantage of them. In the context of stochastic processes, even where statistical methods are incorporated systematically, the real successes are primarily in the area of time series analysis. Statistical problems arising in an area like queueing typically tend to be harder due to the inherent non-linearity of the performance measures of interest as functions of the input distributions and parameters, and these difficulties are further exacerbated by the lack of explicit formulas.

Yet, on the positive side, in many application areas like telecommunications, computer performance and finance, data is available in plenty, and the ability to collect, store and manipulate data has increased enormously. The availability of plentiful data also allows one to validate methods in the more stringent way of using part of the data as a learning set and the rest as a test set. That gives the researcher and the applied probabilist much opportunity to test their models and algorithms and to refine them in realistic scenarios. For the academician in particular, the availability of large data sets over the World Wide Web provides enough of a testing ground for new ideas and methods, no longer forcing the completion of the research cycle through validation to be relegated to industry.

(1) Invited Presentation at the Golden Jubilee of the PSG Technical University, Coimbatore, India.
Our goal is to illustrate the opportunities available through a simple set of examples taken from our recent work at AT&T. We shall also share some personal thoughts on several research problems suggested by this work that may enable fields like queueing to incorporate the newer ideas and become more useful, and that may allow statistics to permeate even more deeply into the domain of stochastic processes.

2. Poisson Regression for the Non-stationary Poisson Process

Wi-Fi hotspots in public venues like coffee shops have become an important aspect of wireless and mobile communications, and there is interest in modeling various aspects of their performance. We have provided a detailed analysis in the Proceedings of the 2011 IEEE INFOCOM conference [3]; here we highlight only the way we modeled the arrival stream of connections over weekdays as a non-stationary Poisson process, based on 15-minute counts of arrivals obtained over a set of five weeks in over 200 coffee shops in New York and San Francisco. We used the first four weeks of data for model fitting and the last week's data for testing the model.

Figure 1 shows the average 15-minute counts of newly starting Wi-Fi connections over two consecutive days after grouping the venues into four subclasses by size. From here on, we concentrate on the "Large" group comprising 51 coffee shops. It is clear that the arrival pattern is non-stationary and exhibits a daily periodicity.

Figure 1: Fifteen minute arrival counts over 2 consecutive days (average number of arrivals per 15-minute bin over two weekdays for the Tiny, Small, Medium and Large groups)

A Poisson process is suggested by the fact that a large number of customers typically arrive at a coffee shop but only a very small fraction log into the Wi-Fi system there, and non-stationarity is suggested by the preliminary graphical analysis above. Let us concentrate on one of the groups, say the "Large" one, and see how we could proceed. A simple way of attempting a fit would be to take the empirical mean curve (the blue curve of Figure 1) averaged over one day and try to fit to it a continuous curve, say a polynomial, yielding the required values λ(t). There are several problems with this approach: (a) it throws away a considerable amount of information in the data; (b) it cannot ensure nonnegativity of the fitted values; and (c) a polynomial of high degree would be needed to mimic the flat regions, peaks and troughs in the data.

With this in mind, I suggested building a methodology based on the Poisson regression model. The idea of Poisson regression, which is part of the GLM methodology in statistics, is to fit a Poisson distribution to data on a random variable Y by assuming that its mean is λ = exp(α + β·x), where x is a possibly vector-valued covariate, and then to estimate the parameters of the model from the observed pairs (x(i), y(i)) by maximum likelihood. The details may be found in most standard statistical texts today and will not be discussed here. This, however, appears to be the first use of Poisson regression in the context of a non-stationary Poisson process. To that end, we need to identify one or more covariates, and since the mean is a function of time, a simple way of achieving this is to choose the covariates so that they determine the time point at which the mean is to be estimated.
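To make the fitting step concrete, the following is a minimal sketch in Python using the statsmodels GLM interface. The array name, the construction of the (I, J) covariates and the synthetic data are our own assumptions for illustration; this is not the code used in [3].

    # A minimal sketch of Poisson regression for 15-minute arrival counts.
    # Assumption: 'counts' is a 1-D array of length 96*D (D weekdays of
    # 96 fifteen-minute slots each), ordered by day and then by slot.
    import numpy as np
    import statsmodels.api as sm

    def fit_poisson_regression(counts, n_slots=96, block_len=12):
        n_days = len(counts) // n_slots
        slot = np.tile(np.arange(n_slots), n_days)   # 0..95 within each day
        I = slot // block_len + 1                    # 3-hour block index, 1..8
        J = slot % block_len + 1                     # position within block, 1..12
        # Design matrix: the first three powers of I and J, as in the initial model.
        X = sm.add_constant(np.column_stack([I, I**2, I**3, J, J**2, J**3]))
        model = sm.GLM(counts, X, family=sm.families.Poisson())  # log link by default
        return model.fit(), X

    # Usage, with synthetic data just to exercise the code:
    rng = np.random.default_rng(0)
    fake_counts = rng.poisson(lam=3.0, size=96 * 20)
    res, X = fit_poisson_regression(fake_counts)
    lam_hat = res.predict(X[:96])    # fitted mean curve over one day

The same scaffolding carries over when the covariates are redefined by clustering, as described next.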
The graph of the mean suggested that it would probably be a good idea to group the 15-minute slots into blocks of, say, 3 hours each starting at 12 AM; that in turn yields 8 blocks, each containing 12 fifteen-minute counts per day. In short, we decided to use the covariate X = (I, J), where I varies over the values 1 to 8 and J over 1 to 12, with the pairs (1,1), (1,2), ..., (8,12) covering the 96 consecutive 15-minute time slots of a day starting at 12 AM. The 3-hour block structure was motivated by the hope that it would help to model the flat regions without much trouble. We initially used the Poisson regression model

    log λ(i, j) = Σ_{k=1..3} (α_k i^k + β_k j^k).

A third degree polynomial appeared to be the minimum needed; we were being optimistic in involving only the first three powers of the covariates. That model did not fare well, and some thought revealed one glaring problem with it: the blocks are not identical with respect to the intra-block behavior of the process. For instance, while in the three-hour block from 12 AM to 3 AM hardly any change occurs in the counts, the block from 6 AM to 9 AM shows a rapid increase. So we decided to include an interaction term, enhancing the model to

    log λ(i, j) = Σ_{k=1..3} (α_k i^k + β_k j^k) + γ (i j).

A significant improvement occurred in the model; however, when we tried the model on the test data spanning the last week, performance was not at all satisfactory.

At this point, I decided to inject yet another method from more modern statistics, namely clustering. Recall that our grouping of the time slots into three-hour blocks was entirely arbitrary, and perhaps we could do something better. So we clustered the 96 time slots of the day into 8 clusters using the k-means clustering algorithm; this chooses eight centroids as cluster means and assigns the individual points to clusters so that the intra-cluster variances are minimized. Several interesting things started happening. Note that cluster 0 includes time slots 1 and 96, showing that the daily cyclical pattern in the data is identified automatically by the clustering scheme. Also, the clustering scheme automatically groups like intervals into a common cluster even if they are not contiguous. Thus, for instance, the time slots 32-34, 38-40 and 64-66, which mark very fast ascents of the counts, are grouped together.

Now, with the variables I and J defined by the above groupings, we re-computed the Poisson regression model with an interaction term. Figure 2 shows the data over the last week, the estimated expected values, and the 95% confidence limits computed from the Poisson distributions.

Figure 2: Comparison of model and observations over test data for arrivals (observed data, model mean, and 2.5%/97.5% quantiles; number of arrivals per 15-minute bin over the 5 weekdays of the test week)

Of the 5 x 96 = 480 observations, very few fall outside the confidence limits; used as a statistical test, the exceedances are so few as to provide no reason to reject the hypothesis that the fit is good. We thus obtained an interesting new method for fitting a non-stationary Poisson process model to time series data of arrivals by exploiting Poisson regression, a much used procedure in modern statistics for static data. Combining it with a clustering algorithm was key to its success in our context.
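The re-grouping step can be sketched as follows, assuming scikit-learn's KMeans for the clustering and the same hypothetical variable names as in the previous snippet. How J is re-defined within a cluster is not spelled out in the text, so the choice below is our own assumption.

    # Sketch: k-means clustering of the 96 daily slots, then Poisson regression
    # with an interaction term, using the cluster-based covariates (I, J).
    import numpy as np
    from sklearn.cluster import KMeans
    import statsmodels.api as sm

    def cluster_slots(daily_mean, n_clusters=8):
        """Group the 96 daily slots by the level of their mean count."""
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        return km.fit_predict(daily_mean.reshape(-1, 1))   # length 96, values 0..7

    def fit_with_interaction(counts, labels, n_slots=96):
        n_days = len(counts) // n_slots
        I = np.tile(labels, n_days) + 1                    # cluster index of each slot
        # J: position of a slot within its cluster (our own convention).
        order = np.zeros(n_slots, dtype=int)
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            order[idx] = np.arange(len(idx)) + 1
        J = np.tile(order, n_days)
        X = sm.add_constant(np.column_stack([I, I**2, I**3, J, J**2, J**3, I * J]))
        return sm.GLM(counts, X, family=sm.families.Poisson()).fit()

    # Usage with the hypothetical 'fake_counts' from the earlier snippet:
    # daily_mean = fake_counts.reshape(-1, 96).mean(axis=0)
    # res = fit_with_interaction(fake_counts, cluster_slots(daily_mean))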
Just to conclude our discussion of this data analysis, we note that we did not stop there. We fitted a heavy tailed distribution to the connection times from the first four weeks and modeled the number of simultaneous connections as an M(t)/G/∞ queue. We then compared the resulting model with the data actually observed in the last week. Figure 3 provides a comparison similar to that in Figure 2.

Figure 3: Comparison of model and test data for simultaneous connections (observed data, model mean m(t), and 2.5%/97.5% quantiles; number of simultaneously present customers per 15-minute bin over the 5 weekdays of the test week)

Thus, we have managed to obtain an interesting way to fit and validate an M(t)/G/∞ model to our data set. To the best of our knowledge, this type of model fitting and validation has not been carried out earlier in the queueing literature. For further details, refer to [3].

3. Modeling Heavy Tailed Distributions

There are many practical situations where the distribution of an underlying variable is heavy tailed. For a quick introduction to heavy tailed distributions, refer to Sigman [8] and the references cited therein. These are distributions whose tail P[X > x] decays slowly with x, and among them an important class is the set of distributions for which, for large x,

    P[X > x] = O(x^(-α)) for some α > 0.

Note that the tail of such a heavy tailed distribution decays very slowly relative, for example, to an exponential distribution, whose tail decays as exp(-λx).

A simple way to recognize a heavy tail of the type noted is to consider a log-log plot of the complementary distribution function P[X > x]; in the heavy tailed case we will see that, for sufficiently large x, the plot decays linearly with slope -α. A heavy tailed distribution with a tail of the form noted above will not have a finite mean if α < 1 and will not have a finite variance unless α > 2. In other words, not all moments may exist finitely. This has serious implications for the use of some common descriptive measures and also for estimating them from data.

To understand the difference between heavy tailed and other distributions even more, suppose X is the repair time of some equipment after a failure and consider a conditional probability such as P[X > 2x | X > x]. For an exponential distribution, this is easily computed to be e^(-λx), which goes to 0 very fast, whereas in the heavy tailed case exemplified above it is, for large x, approximately 2^(-α). Thus, for α = 1/2, this conditional probability is about 0.71 for all large x, which should indicate a very unacceptable situation indeed.

Heavy tailed behavior is undesirable in many contexts. For example, in a queueing model with heavy tailed service times, delays have even heavier tails, the queue becomes subject to large excursions, and often descriptors such as steady state performance measures and other quantities commonly used to characterize and control the system become meaningless, since convergence to them is not obtained or is too slow.

Some well-known examples of heavy tailed distributions are the Pareto, Weibull and log normal distributions. The Pareto distribution is characterized by a complementary cdf of the form P[X > x] = K / x^α, while the heavy tailed Weibull is such that P[X > x] = K exp(-λ x^β) with shape parameter β < 1, where K is a suitable normalizing constant. Finally, we say that X is log normal if log X is normally distributed. There are slightly modified versions of the Pareto and Weibull distributions which allow for more than one parameter for greater flexibility.
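The contrast in the conditional probability P[X > 2x | X > x] is easy to see numerically. The following is a small self-contained illustration of our own (not code from the cited papers): it draws Pareto and exponential samples and compares the empirical conditional tail probabilities.

    # Illustrative contrast of tail behavior: Pareto (heavy tail) vs exponential.
    import numpy as np

    rng = np.random.default_rng(1)
    alpha, lam, n = 0.5, 1.0, 200_000

    pareto = (1.0 - rng.random(n)) ** (-1.0 / alpha)   # P[X > x] = x**(-alpha), x >= 1
    expo = rng.exponential(scale=1.0 / lam, size=n)

    def cond_tail(sample, x):
        """Empirical estimate of P[X > 2x | X > x]."""
        tail = sample[sample > x]
        return np.mean(tail > 2 * x) if len(tail) else float('nan')

    for x in [2.0, 5.0, 10.0]:
        print(f"x={x:5.1f}  Pareto: {cond_tail(pareto, x):.3f}   "
              f"exponential: {cond_tail(expo, x):.3f}")
    # The Pareto column stays near 2**(-alpha), about 0.71 here, however large x
    # gets, while the exponential column collapses toward 0.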
In the literature, it is customary to use one of the above distributions to model heavy tailed random variables. One difficulty with this has always been that, while it allows the tail to be matched fairly well, the head of the distribution is not matched. In many practical examples the heavy tail comes from only a very small fraction of the data, and thus the bulk of the data fails to be represented properly. This, in turn, results in many performance measures of interest not being characterized and predicted accurately. Some authors have attempted to remedy this in an ad hoc manner by forming mixtures of heavy tailed and non-heavy tailed distributions, but as far as we know, no cogent theory has evolved that yields one common class of distributions.

Our own encounters with heavy tails have occurred in three diverse contexts within the last couple of years. The first was the Wi-Fi modeling scenario [3] discussed earlier, where we found the connection durations to be heavy tailed. The second involved repair times in a certain part of our communications network [5]. And the third is the well known example of internet file sizes, which we [4] revisited recently to evaluate the impact of heavy tails on TCP performance.

It is well known that phase type (PH) distributions [6], [7], that is, distributions that can be realized as absorption time distributions in finite Markov chains with an absorbing state, are dense in the set of all distributions on [0, ∞). A difficulty with this class in modeling heavy tailed random variables is that phase type distributions always have an exponentially decaying tail; the decay parameter is the Perron eigenvalue of the submatrix of the infinitesimal generator corresponding to the transient states, although this fact is not very relevant to our limited discussion here. Just note that when we use a PH distribution as a model for X, we get, for large x,

    P[X > x] ≈ K e^(-ηx),

where η is some positive constant.

Using phase type distributions as a basis, we can, however, obtain an interesting class of distributions to model heavy tails. To that end, we consider a family that I will call log PH, in analogy with the log normal, defined by saying that X is a log PH random variable if log(X) has a phase type distribution. It is easy to verify that if log(X) has exponential tail decay parameter η, then for large x,

    P[X > x] ≈ K / x^η.

The Pareto distribution is a trivial special case of this formulation, and the general formulation we have made should provide much greater versatility due to the generality of the class of phase type distributions. With this in mind, I advocated the use of this model class in the three examples where we encountered heavy tails. Since the Wi-Fi example is available in the open literature, I will discuss only some results related to the other two examples.

3a. The Repair Times Example

The motivation for us [5] to fit a distribution to repair time data arose from trying to answer an important question: whether a certain set of observed long repair times, impairing an undesirably large number of transactions, were to be treated as outliers or were somewhat endemic in the system. In the latter case, there exist both the need and the scope for improvement through more stringent vendor management, etc. For proprietary reasons, we are unable to describe the context in greater detail, although I must note that our work did help to catch a potentially damaging situation fairly early and to take various successful corrective steps.
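The tail property of the log PH class is easy to check by simulation. Below is a small illustrative sketch of our own, with an arbitrarily chosen two-phase example rather than any of the distributions fitted in [3]-[5]: it simulates absorption times of a tiny Markov chain, exponentiates them to get log PH samples, and compares the empirical log-log tail slope with the decay parameter η of the underlying PH distribution.

    # Simulate X = exp(Y) where Y is PH with a 2-phase representation
    # (illustrative parameters only), and check that P[X > x] ~ K / x^eta.
    import numpy as np

    rng = np.random.default_rng(2)

    a = np.array([1.0, 0.0])            # initial probability vector
    T = np.array([[-2.0,  1.5],
                  [ 0.0, -0.8]])        # transient generator
    exit_rates = -T.sum(axis=1)         # rates into the absorbing state

    def sample_ph(n):
        """Draw n absorption times of the CTMC (a, T) by direct simulation."""
        out = np.empty(n)
        for i in range(n):
            state, t = rng.choice(len(a), p=a), 0.0
            while state is not None:
                rate = -T[state, state]
                t += rng.exponential(1.0 / rate)
                probs = np.append(np.maximum(T[state], 0.0), exit_rates[state]) / rate
                nxt = rng.choice(len(a) + 1, p=probs)
                state = None if nxt == len(a) else nxt
            out[i] = t
        return out

    Y = sample_ph(100_000)
    X = np.exp(Y)                               # log PH sample

    eta = -np.max(np.linalg.eigvals(T).real)    # tail decay parameter of Y (0.8 here)
    xs = np.sort(X)
    ccdf = 1.0 - np.arange(1, len(xs) + 1) / len(xs)
    mask = (xs > np.quantile(X, 0.9)) & (ccdf > 0)
    slope = np.polyfit(np.log(xs[mask]), np.log(ccdf[mask]), 1)[0]
    print("PH decay eta:", eta, "  empirical log-log tail slope:", slope)  # roughly -eta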
Figure 4 provides the repair times (units are deliberately masked to preserve confidentiality), and Figure 5 provides a log-log plot of the empirical complementary distribution function exhibiting a heavy tail through a linear asymptote.

Figure 4: Year 2010 repair time distribution (sample size = 151)

Figure 5: log-log plot for Year 2010

We attempted to fit a phase type distribution to the logarithms, and after some experimentation with different values of the number of phases, we obtained, using the EM algorithm, a phase type distribution of order 6 characterized by the initial vector

    α = (0.5867, 0.0001, 0.3989, 0.0143)

and the matrix T given by

              [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
    [1,] -2.675229   2.675229   0.000000   0.000000   0.000000   0.000000
    [2,]  0.000000  -2.675230   2.675230   0.000000   0.000000   0.000000
    [3,]  0.000000   0.000000  -2.949639   2.949639   0.000000   0.000000
    [4,]  0.000000   0.000000   0.000000  -2.949639   2.949639   0.000000
    [5,]  0.000000   0.000000   0.000000   0.000000  -2.949639   2.949639
    [6,]  0.000000   0.000000   0.000000   0.000000   0.000000  -3.719682

Note the small number of distinct parameters identified; we found these to be quite stable upon testing with different starting points, etc., in the iterative scheme.

The following two figures compare the empirical and model cdf directly and through a quantile-quantile plot.

Figure 6: Empirical and fitted cdf of 2010 log repair times

Figure 7: Comparison of sample and model quantiles for Year 2010

What the above demonstrated for us, and to the community of our vendors, was that the heavy tailed phenomenon was inherent in the system, and some action had to be taken to make repair time durations more acceptable. That squarely put to rest all arguments invoking "black swans" and the like as things one simply has to live with. As we were doing the modeling, some efforts were ongoing to improve repair times, and we compared data from the first half of 2011 to that of 2010. Figure 8 suggests two interesting inferences. The first is that our tail estimate appears to be very robust. The second, of more interest to the business, is that while the good is getting better, the undesirable continues to remain undesirable. This conclusion is one that should hopefully allow a major change of course that would also eliminate much wasted effort and cost of failures.

Figure 8: A comparison of years 2010 and 2011

3b. Long File Sizes and TCP Performance

Recently, we undertook an effort to re-examine the famous internet data set considered earlier in the literature by Crovella and co-authors, see [2], to see whether, with our new tools, we could obtain a more accurate understanding of the effect of heavy tails on internet performance. Figure 9 shows the histogram of the logarithms of the sizes of 130,000 WWW files downloaded in the measured system in 1995.

Figure 9: Distribution of log file sizes

The complementary distribution shown in Figure 10 demonstrates the presence of a heavy tail.

Figure 10: log-log plot indicating a heavy tail

There was a negligible amount of data (< 0.01%) below 200 bytes. So we decided to take as our variable of interest S/200, where S is the size of the file in bytes, and then took the logarithm after sweeping values below 200 up to the value 200. (Since the data is padded to yield 1024-byte packets, this has no effect on the TCP performance shown later; the procedure also avoids having to fit a bilateral phase type model.) We thus decided to fit a phase type distribution to the variable Y = max(0, log(X/200)), and later to set the original variable X = 200 with probability P(X ≤ 200) and X = 200 exp(Y) with probability P(X > 200).
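For readers who wish to reproduce this kind of comparison, the following sketch shows how the cdf of a fitted phase type distribution can be evaluated via a matrix exponential and compared with data quantiles. The generator is the one reported above for the log repair times; the paper lists only four entries of the initial vector, so padding it with zeros to order 6 is our assumption, and the data array is a hypothetical stand-in for the 151 observed logarithms.

    # Evaluate F(y) = 1 - alpha * expm(T*y) * 1 for the fitted PH distribution
    # of the log repair times, and compare model vs empirical quantiles.
    import numpy as np
    from scipy.linalg import expm

    alpha = np.array([0.5867, 0.0001, 0.3989, 0.0143, 0.0, 0.0])  # zero padding assumed
    T = np.array([
        [-2.675229,  2.675229,  0.0,       0.0,       0.0,       0.0],
        [ 0.0,      -2.675230,  2.675230,  0.0,       0.0,       0.0],
        [ 0.0,       0.0,      -2.949639,  2.949639,  0.0,       0.0],
        [ 0.0,       0.0,       0.0,      -2.949639,  2.949639,  0.0],
        [ 0.0,       0.0,       0.0,       0.0,      -2.949639,  2.949639],
        [ 0.0,       0.0,       0.0,       0.0,       0.0,      -3.719682],
    ])
    ones = np.ones(len(alpha))

    def ph_cdf(y):
        return 1.0 - alpha @ expm(T * y) @ ones

    def ph_quantile(p, lo=0.0, hi=50.0, tol=1e-8):
        # Simple bisection; the PH cdf is continuous and increasing.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if ph_cdf(mid) < p else (lo, mid)
        return 0.5 * (lo + hi)

    # Hypothetical data; in practice this would be the 151 observed log repair times.
    log_repair_times = np.random.default_rng(3).gumbel(1.5, 0.5, size=151)
    probs = (np.arange(1, 152) - 0.5) / 151
    model_q = np.array([ph_quantile(p) for p in probs])
    sample_q = np.sort(log_repair_times)
    # Plotting model_q against sample_q gives a quantile-quantile comparison in the
    # spirit of Figure 7; points near the diagonal indicate a good fit.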
For the resulting set of values Y, our approach with the EM algorithm allowed us to settle on a phase type fit with 5 transient phases, initial vector (0, 0, 0, 0, 1, 0), and infinitesimal generator T for the transient part given by

    -1.31   0      0      1.31   0
     1.31  -1.31   0      0      0
     0      0      0      1.79  -2.11
     0.32   0      0      0     -2.11
     0      0      2.0    0.11  -2.11

In Figure 11 below, we show the empirical histogram and the density of the fitted PH distribution. We decided to live with the inability of the fit to match a couple of peaks in the empirical density, since the area under the curve is what really matters (see the next plot), and also because we did not want to increase the dimension of the fit enormously just to accomplish that.

Figure 11: Empirical histogram and fitted density

The following graph, Figure 12, which provides a comparison of the empirical and model cdfs, shows that the fit does indeed do a very good job. In passing, we also note that a property of the EM fit is that the model mean will match the empirical mean. Hopefully, this phase type fit should then pass muster when used in a queueing model.

Figure 12: Empirical and model cdf plots

We then compared our phase type fit to several of the classical types of fit and show the results in Figure 13. It is clear that the Pareto and Weibull (with additional parameters to also match the mean) fare very poorly. While visually the log normal appears more reasonable in the main body, a more detailed analysis (to be presented in [4]) showed that its tail decays at a rate much faster than what the data would support.

Figure 13: Comparison of models for fit of file size data

In the context of our discussion, the real proof of the pudding is how well the models predict the performance of systems. Certainly, we already have reason to believe that the fit based on the phase type distribution will do very well. But how much better would it do? That is an interesting question. To answer it, we considered the simple example, see Figure 14, of a bottleneck link and a set of clients who download files from a web server.

Figure 14: The network analyzed

We considered several different scenarios: clients acting as a finite source (i.e., once they make a request, they may not make another one until that download is complete) as well as the open system where clients can make additional requests even while prior requests are pending, different system configurations, and many different load levels. We assumed the "think times" to be exponentially distributed and chose different rates for them to obtain the desired load levels. For brevity, we show here only one set of examples corresponding to the finite source case, with a loading of 80% of the link, for a case of 40 clients working under TCP Tahoe with parameters as given by the default settings in NS2. Figure 15 provides a comparison of the queue length distributions obtained from a set of 1000 simulations (using the NS2 simulator implementing TCP Tahoe) based on the different models for file sizes, as well as simulations of file sizes based on sampling the empirical trace of file sizes. Note the remarkable accuracy with which the log PH model is able to predict queue lengths in the output buffer at the server.
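The distributional comparison underlying Figure 13 can be reproduced in outline as follows. This is a sketch with placeholder parameters throughout (we do not have the exact fitted Pareto, Weibull, log normal or log PH parameters from [4]): it evaluates the complementary cdf of a log PH file-size model via a matrix exponential alongside the classical candidates, on log-log axes.

    # Complementary cdfs of candidate file-size models on log-log axes,
    # in the spirit of Figure 13 (placeholder parameter values throughout).
    import numpy as np
    from scipy.linalg import expm
    from scipy import stats
    import matplotlib.pyplot as plt

    # log PH model: X = 200 * exp(Y), Y ~ PH(a, T); illustrative parameters only.
    a = np.array([1.0, 0.0])
    T = np.array([[-1.5,  1.2],
                  [ 0.0, -0.9]])
    ones = np.ones(len(a))

    def logph_ccdf(x):
        y = np.log(np.maximum(x, 200.0) / 200.0)
        return np.array([a @ expm(T * v) @ ones for v in np.atleast_1d(y)])

    x = np.logspace(np.log10(250), 8, 200)              # 250 bytes to 100 MB
    curves = {
        "log PH":     logph_ccdf(x),
        "Pareto":     stats.pareto.sf(x / 200.0, b=1.1),        # shape b assumed
        "Weibull":    stats.weibull_min.sf(x / 5000.0, c=0.4),  # shape c assumed
        "log normal": stats.lognorm.sf(x, s=2.0, scale=5000.0), # s, scale assumed
    }
    for name, ccdf in curves.items():
        plt.loglog(x, ccdf, label=name)
    plt.xlabel("file size (bytes)"); plt.ylabel("P[X > x]"); plt.legend()
    plt.show()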
The poor match of the head of the distribution by the other models has resulted in their curves not even maintaining the right shape, and the underestimation of the tail is very noticeable, particularly for the Weibull and the log normal.

Figure 15: Comparison of fits based on queue lengths

One of the major difficulties with heavy tails is that long run averages of queue statistics converge very slowly, if they converge at all. So, we decided to consider the distribution of the per-second connection throughput, which is a fairly complex functional of the sample paths. In this case, the log PH model still performs quite well, while the other models fail miserably and provide significant overestimates of throughput.

Figure 16: Comparison of fits based on throughput

This, combined with our previous example, will, we hope, at least generate a much stronger interest in phase type and log PH distributions.

4. SOME INTERESTING CHALLENGES

The above has raised a number of research questions of a modeling and mathematical kind, all in the direction of what we consider to be a greater interplay between modern methods of statistics and applied probability:

(a) Can we obtain any mathematical insights on the quality of the estimator of the tail decay parameter? What are good methods to estimate it from data as part of the fitting procedure? Having used one such method, can we force the log PH fit to produce a tail decay equal to the estimated value? (A naive version of such an estimator is sketched after this list.)

(b) What if I were to insist that the fitted means of my non-stationary Poisson process also show some smoothness properties at the knot points? What are good algorithms to achieve that?

(c) Can the method be extended to fit more complicated processes like the Markov modulated Poisson process? How well would a clustering algorithm help to identify the number of different phases to be used in the model and the associated Poisson rates? How well would it work as a fitting technique (i) for the arrival process? (ii) for queueing models?

(d) Is there any need or scope for using zero inflated non-stationary models (as is done in statistics for the static case) to handle situations where, relative to the Poisson, there is an excessive number of intervals with no events? This would, for example, be a meaningful model in software bug estimation for well-tested software, where there would be a preponderance of zero counts. If you are the real stochastic processes type: is there really a continuous time stochastic process generating such a zero inflated process, and what is its structure?

There are many stochastic processes that could lend themselves to an approach based on Poisson regression. An example would be the modeling of software errors, where the covariates could be the specific software module containing the bug, the number of times specific sections of code are invoked, the number of bugs already found and fixed in the block, etc. Although there are some reported instances of the use of Poisson and zero inflated Poisson regression in the software reliability context, we are not aware of their use in the context of a non-stationary Poisson process model with time also taken into account.

Turning to the log PH class of distributions, we believe we have shown it to be an interesting class in its own right. Studying it in some depth from the perspectives of extreme value theory and heavy tailed distributions in probability would therefore be a worthwhile addition to the literature.
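As a concrete, admittedly naive, starting point for question (a), the following sketch estimates the tail decay parameter by ordinary least squares on the upper portion of the empirical log-log complementary distribution plot; the choice of the tail fraction is arbitrary here, and more principled alternatives (e.g., the Hill estimator) would be part of any serious study.

    # One simple estimator of the tail decay parameter alpha: ordinary least
    # squares on the upper tail of the empirical log-log complementary cdf.
    import numpy as np

    def tail_index_loglog(sample, tail_fraction=0.1):
        """Fit log P[X > x] ~ c - alpha * log x over the largest observations."""
        xs = np.sort(np.asarray(sample))
        n = len(xs)
        ccdf = 1.0 - np.arange(1, n + 1) / (n + 1)   # avoids log(0) at the maximum
        k = max(int(n * tail_fraction), 10)          # number of upper-tail points used
        slope, intercept = np.polyfit(np.log(xs[-k:]), np.log(ccdf[-k:]), 1)
        return -slope                                # estimate of alpha

    # Quick check on a synthetic Pareto sample with alpha = 0.9:
    rng = np.random.default_rng(5)
    pareto = (1.0 - rng.random(50_000)) ** (-1.0 / 0.9)
    print("estimated alpha:", tail_index_loglog(pareto))

Whether and how an estimate of this kind should be imposed as a constraint on the EM fit of the PH distribution is precisely the issue taken up next.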
The log PH approach provides a way to re-examine, with improved models, many networking issues which we have considered in the past with various ad hoc procedures.

We used the EM algorithm to fit the phase type distribution to the logarithms. But this may not be the best way to get a good estimate of the tail. Could we obtain an estimate of the tail independently and constrain the estimated PH to conform to it? For instance, we could estimate the tail parameter from a regression model for the log-log plot of the complementary distribution function.

For the TCP example, we used simulation primarily to examine the efficacy of the log PH and EM framework within a nearly realistic context, but even more importantly because, at present, we do not have the necessary tools to handle the queueing model analytically. Developing such tools, and appropriate asymptotics and approximations, should form an interesting set of problems for applied probability.

One of the lessons we learned in the TCP modeling concerns the risk of extrapolating far beyond the data when we use a heavy tailed distribution with infinite support based on data with finite support. Unlike in the case of fast decaying tails, this extrapolation can significantly alter system estimates such as queue length distributions, since they are very sensitive to the tail. The direct use of the log PH and other heavy tailed distributions results in significantly pessimistic results for the queue compared to the trace based simulation. In our case, to obtain a good match we had to rescale our distributions within a finite interval, and we arbitrarily took that interval to cover 1.2 times the maximum data point seen; the total mass to its right was very small, less than 10^(-4). Can we bring more science into this aspect? That is not only a challenging statistical question but also a very important one for practical applications.

We hope that the examples we have presented above from our recent efforts, and the types of questions they raise, drive home our point that there is much that could be done in effecting a greater tie-in for the mutual progress of both applied probability and statistics. This is certainly a topic that has caught our recent fancy. It is indeed unfortunate that we as applied probabilists have not taken advantage of the more recent developments in statistics. And it is equally unfortunate that statisticians mostly content themselves with static problems and do not get involved in making significant contributions that would make applied probability as a field more practicable. Just as Dr. DasGupta [1] noted in the context of a curriculum of statistical inference, we have a challenge to look at even broader issues and see how future generations can be served better by our science. As for our own specific attempts, whether we have foolishly rushed into an area where angels fear to tread or have identified some real opportunities, only the future can tell.

Acknowledgment: I record my heartfelt thanks to my co-authors in [3], [4], and [5] for indulging me with support for my wildly speculative suggestions and for working together to demonstrate them to be practically useful. My thanks are also due to Dr. Soren Asmussen for some useful comments and observations.

References

[1] Anirban DasGupta: Anirban's Angle: A New Core? IMS Bulletin, 40(5), Aug 2011, p. 8. http://bulletin.imstat.org/wp-content/uploads/Bulletin40_5.pdf

[2] M.E. Crovella, M.S. Taqqu & A.
Bestavros: Heavy-tailed probability distributions in the World Wide Web, in A Practical Guide to Heavy Tails: Statistical Techniques and Applications, 1998.

[3] A. Ghosh, R. Jana, V. Ramaswami, J. Rowland & N.K. Shankaranarayanan: Modeling and characterization of large-scale Wi-Fi traffic in public hotspots, IEEE INFOCOM 2011, Shanghai, China.

[4] K. Jain, R. Jana & V. Ramaswami: Modeling heavy tailed file sizes for TCP performance evaluation - in preparation.

[5] Y. Kogan, W. Lai & V. Ramaswami: Modeling heavy tailed repair times - in preparation.

[6] G. Latouche & V. Ramaswami: Introduction to Matrix Analytic Methods in Stochastic Modeling, SIAM/ASA, 1999.

[7] M.F. Neuts: Matrix-Geometric Solutions in Stochastic Models, Johns Hopkins University Press, 1981.

[8] K. Sigman: Appendix: A primer on heavy tailed distributions, QUESTA, Vol. 33, Issue 1/3, 1999.