The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

Printed in the United States of America

ISBN 0-13-178457-9

Pearson Education LTD.
Pearson Education Australia PTY, Limited
Pearson Education Singapore, Pte. Ltd
Pearson Education North Asia Ltd
Pearson Education Canada, Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education -- Japan
Pearson Education Malaysia, Pte. Ltd
Pearson Education, Upper Saddle River, New Jersey

To my father P.D. Stark (in memoriam)
From darkness to light (1941-1945).
Henry Stark

To Harriet.
John W. Woods

Contents

Preface

1 Introduction to Probability
  Introduction: Why Study Probability?
  The Different Kinds of Probability
  A. Probability as Intuition
  B. Probability as the Ratio of Favorable to Total Outcomes (Classical Theory)
  C. Probability as a Measure of Frequency of Occurrence
  D. Probability Based on an Axiomatic Theory
  Misuses, Miscalculations, and Paradoxes in Probability
  Sets, Fields, and Events
  Examples of Sample Spaces
  Axiomatic Definition of Probability
  Joint, Conditional, and Total Probabilities; Independence
  Bayes' Theorem and Applications
  Combinatorics
  Occupancy Problems
  Extensions and Applications
  Bernoulli Trials: Binomial and Multinomial Probability Laws
  Multinomial Probability Law
  Asymptotic Behavior of the Binomial Law: The Poisson Law
  Normal Approximation to the Binomial Law
  Summary
  Problems
  References

2 Random Variables
  Introduction
  Definition of a Random Variable
  Probability Distribution Function
  Probability Density Function (PDF)
  Four Other Common Density Functions
  More Advanced Density Functions
  Continuous, Discrete, and Mixed Random Variables
  Examples of Probability Mass Functions
  Conditional and Joint Distributions and Densities
  Failure Rates
  Summary
  Problems
  References
  Additional Reading

3 Functions of Random Variables
  Introduction
  Functions of a Random Variable (Several Views)
  Solving Problems of the Type Y = g(X)
  General Formula of Determining the pdf of Y = g(X)
  Solving Problems of the Type Z = g(X,Y)
  Solving Problems of the Type V = g(X,Y), W = h(X,Y)
  Fundamental Problem
  Obtaining fVW Directly from fXY
  Additional Examples
  Summary
  Problems
  References
  Additional Reading

4 Expectation and Introduction to Estimation
  Expected Value of a Random Variable
  On the Validity of Equation 4.1-8
  Conditional Expectations
  Conditional Expectation as a Random Variable
  Moments
  Joint Moments
  Properties of Uncorrelated Random Variables
  Jointly Gaussian Random Variables
  Contours of Constant Density of the Joint Gaussian pdf
  Chebyshev and Schwarz Inequalities
  Random Variables with Nonnegative Values
  The Schwarz Inequality
  Moment-Generating Functions
  Chernoff Bound
  Characteristic Functions
  Joint Characteristic Functions
  The Central Limit Theorem
  Estimators for the Mean and Variance of the Normal Law
  Confidence Intervals for the Mean
  Confidence Interval for the Variance
  Summary
  Problems
  References
  Additional Reading

5 Random Vectors and Parameter Estimation
  Joint Distribution and Densities
  Multiple Transformation of Random Variables
  Expectation Vectors and Covariance Matrices
  Properties of Covariance Matrices
  Simultaneous Diagonalization of Two Covariance Matrices and Applications in Pattern Recognition
  Projection
  Maximization of Quadratic Forms
  The Multidimensional Gaussian Law
  Characteristic Functions of Random Vectors
  The Characteristic Function of the Normal Law
  Parameter Estimation
  Estimation of E[X]
  Estimation of Vector Means and Covariance Matrices
  Estimation of μ
  Estimation of the Covariance K
  Maximum Likelihood Estimators
  Linear Estimation of Vector Parameters
  Summary
  Problems
  References
  Additional Reading

6 Random Sequences
  Basic Concepts
  Infinite-Length Bernoulli Trials
  Continuity of Probability Measure
  Statistical Specification of a Random Sequence
  Basic Principles of Discrete-Time Linear Systems
  Random Sequences and Linear Systems
  WSS Random Sequences
  Power Spectral Density
  Interpretation of the PSD
  Synthesis of Random Sequences and Discrete-Time Simulation
  Decimation
  Interpolation
  Markov Random Sequences
  ARMA Models
  Markov Chains
  Vector Random Sequences and State Equations
  Convergence of Random Sequences
  Laws of Large Numbers
  Summary
  Problems
  References

7 Random Processes
  Basic Definitions
  Some Important Random Processes
  Asynchronous Binary Signaling
  Poisson Counting Process
  Alternative Derivation of Poisson Process
  Random Telegraph Signal
  Digital Modulation Using Phase-Shift Keying
  Wiener Process or Brownian Motion
  Markov Random Processes
  Birth-Death Markov Chains
  Chapman-Kolmogorov Equations
  Random Process Generated from Random Sequences
  Continuous-Time Linear Systems with Random Inputs
  White Noise
  Some Useful Classifications of Random Processes
  Stationarity
  Wide-Sense Stationary Processes and LSI Systems
  Wide-Sense Stationary Case
  Power Spectral Density
  An Interpretation of the psd
  More on White Noise
  Stationary Processes and Differential Equations
  Periodic and Cyclostationary Processes
  Vector Processes and State Equations
  State Equations
  Summary
  Problems
  References

8 Advanced Topics in Random Processes
  Mean-Square (m.s.) Calculus
  Stochastic Continuity and Derivatives [8-1]
  Further Results on m.s. Convergence [8-1]
  m.s. Stochastic Integrals
  m.s. Stochastic Differential Equations
  Ergodicity [8-3]
  Karhunen-Loève Expansion [8-5]
  Representation of Bandlimited and Periodic Processes
  Bandlimited Processes
  Bandpass Random Processes
  WSS Periodic Processes
  Fourier Series for WSS Processes
  Summary
  Appendix: Integral Equations
  Existence Theorem
  Problems
  References

9 Applications to Statistical Signal Processing
  Estimation of Random Variables
  More on the Conditional Mean
  Orthogonality and Linear Estimation
  Some Properties of the Operator Ê
  Innovation Sequences and Kalman Filtering
  Predicting Gaussian Random Sequences
  Kalman Predictor and Filter
  Error-Covariance Equations
  Wiener Filters for Random Sequences
  Unrealizable Case (Smoothing)
  Causal Wiener Filter
  Expectation-Maximization Algorithm
  Log-Likelihood for the Linear Transformation
  Summary of the E-M Algorithm
  E-M Algorithm for Exponential Probability Functions
  Application to Emission Tomography
  Log-likelihood Function of Complete Data
  E-step
  M-step
  Hidden Markov Models (HMM)
  Specification of an HMM
  Application to Speech Processing
  Efficient Computation of P[E|M] with a Recursive Algorithm
  Viterbi Algorithm and the Most Likely State Sequence for the Observations
  Spectral Estimation
  The Periodogram
  Bartlett's Procedure: Averaging Periodograms
  Parametric Spectral Estimate
  Maximum Entropy Spectral Density
  Simulated Annealing
  Gibbs Sampler
  Noncausal Gauss-Markov Models
  Compound Markov Models
  Gibbs Line Sequence
  Summary
  Problems
  References

Appendix A Review of Relevant Mathematics
  Basic Mathematics
  Sequences
  Convergence
  Summations
  Z-Transform
  Continuous Mathematics
  Definite and Indefinite Integrals
  Differentiation of Integrals
  Integration by Parts
  Completing the Square
  Double Integration
  Functions
  Residue Method for Inverse Fourier Transformation
  Fact
  Inverse Fourier Transform for psd of Random Sequence
  Mathematical Induction
  Axiom of Induction
  References

Appendix B Gamma and Delta Functions
  Gamma Function
  Dirac Delta Function

Appendix C Functional Transformations and Jacobians
  Introduction
  Jacobians for n = 2
  Jacobian for General n

Appendix D Measure and Probability
  Introduction and Basic Ideas
  Measurable Mappings and Functions
  Application of Measure Theory to Probability
  Distribution Measure

Appendix E Sampled Analog Waveforms and Discrete-time Signals

Index

Preface

The first edition of this book (1986) grew out of a set of notes used by the authors to teach two one-semester courses on probability and random processes at Rensselaer Polytechnic Institute (RPI). At that time the probability course at RPI was required of all students in the Computer and Systems Engineering Program and was a highly recommended elective for students in closely related areas. While many undergraduate students took the course in the junior year, many seniors and first-year graduate students took the course for credit as well. Then, as now, most of the students were engineering students. To serve these students well, we felt that we should be rigorous in introducing fundamental principles while furnishing many opportunities for students to develop their skills at solving problems.

There are many books in this area and they range widely in their coverage and depth. At one extreme are the very rigorous and authoritative books that view probability from the point of view of measure theory and relate probability to rather exotic theorems such as the Radon-Nikodym theorem (see for example Probability and Measure by Patrick Billingsley, Wiley, 1978). At the other extreme are books that usually combine probability and statistics and largely omit underlying theory and the more advanced types of applications of probability.
In the middle are the large number of books that combine probability and random processes, largely avoiding a measure-theoretic approach, preferring to emphasize the axioms upon which the theory is based. It would be fair to say that our book falls into this latter category. Nevertheless this begs the question: why write or revise another book in this area if there are already several good texts out there that use the same approach and provide roughly the same coverage? Of course back in 1986 there were few books that emphasized the engineering applications of probability and random processes and that integrated the latter into one volume. Now there are several such books.

Both authors have been associated (both as students and faculty) with colleges and universities that have demanding programs in engineering and applied science. Thus their experience and exposure have been to superior students that would not be content with a text that furnished a shallow discussion of probability. At the same time, however, the authors wanted to write a book on probability and random processes for engineering and applied science students. A measure-theoretic book, or one that avoided the engineering applications of probability and the processing of random signals, was regarded as not suitable for such students. At the same time the authors felt that the book should have enough depth so that students taking second-year graduate courses in advanced topics such as estimation and detection, pattern recognition, voice and image processing, networking and queuing, and so forth would not be handicapped by insufficient knowledge of the fundamentals and applications of random phenomena.

In a nutshell we tried to write a book that combined rigor with accessibility and had a strong self-teaching orientation.
To that end we included a large number of worked-out examples, MATLAB codes, and special appendices that include a review of the kind of basic math needed for solving problems in probability as well as an introduction to measure theory and its relation to probability. The MATLAB codes, as well as other useful material such as multiple choice exams that cover each of the book's sections, can be found at the book's web site http://

The normal use of this book would be as follows: for a first course in probability at, say, the junior or senior year, a reasonable goal is to cover Chapters 1 through 4. Nevertheless we have found that this may be too much for students not well prepared in mathematics. In that case we suggest a load reduction in which combinatorics in Chapter 1 (parts of Section 1.8), failure rates in Chapter 2 (Section 2.7), and more advanced density functions and the Poisson transform in Chapter 3 are lightly or not covered the first time around. The proof of the Central Limit Theorem, joint characteristic functions, and Section 4.8, which deals with statistics, all in Chapter 4, can likewise be omitted on a first reading.

Chapters 5 to 9 provide the material for a first course in random processes. Normally such a course is taken in the first year of graduate studies and is required for all further study in signal processing, communications, computer and communication networking, controls, and estimation and detection theory. Here what to cover is given greater latitude. If pressed for time, we suggest that the pattern recognition applications and simultaneous diagonalization of two covariance matrices in Chapter 5 be given lower preference than the other material in that chapter. Chapters 6 and 7 are essential for any course in random processes and the coverage of the topics therein should be given high priority. Chapter 9 on signal processing should likewise be given high priority, because it illustrates the applications of the theory to current state-of-art problems. However, within Chapter 9, the instructor can choose among a number of applications and need not cover them all if time pressure becomes an issue. Chapter 8, dealing with advanced topics, is critically important to the more advanced students, especially those seeking further studies toward the Ph.D. Nevertheless it too can be lightly covered or omitted in a first course if time is the critical factor.

Readers familiar with the 2nd edition of this book will find significant changes in the 3rd edition. The changes were the result of numerous suggestions made by lecturers and students alike. To begin with, we modified the title to Probability and Random Processes with Applications to Signal Processing, to better reflect the contents. We removed the two chapters on estimation theory and moved some of this material to other chapters where it naturally fitted in with the material already there. Some of the material on parameter estimation, e.g., the Gauss-Markov Theorem, has been removed, owing to the need for finding space for new material.

In terms of organization, the major changes have been in the random processes part of the book. Many readers preferred seeing discrete-time random phenomena in one chapter and continuous-time phenomena in another chapter. In the earlier editions of the book there was a division along these lines but also a secondary division along the lines of stationary versus non-stationary processes. For some this made the book awkward to teach from. Now all of the material on discrete-time phenomena appears in one chapter (Chapter 6); likewise for continuous-time phenomena (Chapter 7). Another major change is a new Chapter 9 that discusses applications to signal processing.
Included are such topics as: the orthogonality principle, Wiener and Kalman filters, the Expectation-Maximization algorithm, Hidden Markov Models, and simulated annealing. Chapter 8 (Advanced Topics) covers much of the same ground as the old Chapter 9, e.g., stochastic continuity, mean-square convergence, ergodicity, etc., and material from the old Chapter 10 on representation of random processes.

There have been significant changes in the first half of the book also. For example, in Chapter 1 there is an added section on the misuses of probability in ordinary life. Here we were helped by the discussions in Steve Pinker's excellent book How the Mind Works (Norton Publishers, New York, 1997). Chapter 2 (Random Variables) now includes discussions on more advanced distributions such as the Gamma, Chi-square and the Student-t. All of the chapters have many more worked-out examples as well as more homework problems. Whenever convenient we tried to use MATLAB to obtain graphical results. Also, being a book primarily for engineers, many of the worked-out examples and homework problems relate to real-life systems or situations.

We have added several new appendices to provide the necessary background mathematics for certain results in the text and to enrich the reader's understanding of probability. An appendix on Measure Theory falls in the latter category. Among the former are appendices on the delta and gamma functions, probability-related basic math, including the principle of proof-by-induction, Jacobians for n-dimensional transformations, and material on Fourier and Laplace inversion.

For this edition, the authors would like to thank Geoffrey Williamson and Yongyi Yang for numerous insightful discussions and help with some of the MATLAB programs. Also we thank Nikos Galatsanos, Miles Wernick, Geoffrey Chan, Joseph LoCicero, and Don Ucci for helpful suggestions.
We also would like to thank the administrations of Illinois Institute of Technology and Rensselaer Polytechnic Institute for their patience and support while this third edition was being prepared. Of course, in the end, it is the reaction of the students that is the strongest driving force for improvements. To all our students and readers we owe a large debt of gratitude.

Henry Stark
John W. Woods

1 Introduction to Probability

1.1 INTRODUCTION: WHY STUDY PROBABILITY?

One of the most frequent questions posed by beginning students of probability is: “Is anything truly random and if so how does one differentiate between the truly random and that which, because of a lack of information, is treated as random but really isn't?” First, regarding the question of truly random phenomena: “Do such things exist?” A theologian might state the case as follows: “We cannot claim to know the Creator's mind, and we cannot predict His actions because He operates on a scale too large to be perceived by man. Hence there are many things we shall never be able to predict no matter how refined our measurements.”

At the other extreme from the cosmic scale is what happens at the atomic level. Our friends the physicists speak of such things as the probability of an atomic system being in a certain state. The uncertainty principle says that, try as we might, there is a limit to the accuracy with which the position and momentum can be simultaneously ascribed to a particle. Both quantities are fuzzy and indeterminate. Many, including some of our most famous physicists, believe in an essential randomness of nature. Eugen Merzbacher in his well-known textbook on quantum mechanics [1-1] writes:

The probability doctrine of quantum mechanics asserts that the indetermination, of which we have just given an example, is a property inherent in nature and not merely a profession of our temporary ignorance from which we expect to be relieved by a future better and more complete theory.
The conventional interpretation thus denies the possibility of an ideal theory which would encompass the present quantum mechanics but would be free of its supposed defects, the most notorious “imperfection” of quantum mechanics being the abandonment of strict classical determinism.

But the issue of determinism versus inherent indeterminism need never even be considered when discussing the validity of the probabilistic approach. The fact remains that there is, quite literally, a nearly uncountable number of situations where we cannot make any categorical deterministic assertion regarding a phenomenon because we cannot measure all the contributing elements. Take, for example, predicting the value of the current $i(t)$ produced by a thermally excited resistor $R$. Conceivably, we might accurately predict $i(t)$ at some instant $t$ in the future if we could keep track, say, of the $10^{23}$ or so excited electrons moving in each other's magnetic fields and setting up local field pulses that eventually all contribute to producing $i(t)$. Such a calculation is quite inconceivable, however, and therefore we use a probabilistic model rather than Maxwell's equations to deal with resistor noise. Similar arguments can be made for predicting weather, the outcome of a coin toss, the time to failure of a computer, and many other situations.

Thus to conclude: Regardless of which position one takes, that is, determinism versus indeterminism, we are forced to use probabilistic models in the real world because we do not know, cannot calculate, or cannot measure all the forces contributing to an effect. The forces may be too complicated, too numerous, or too faint.

Probability is a mathematical model to help us study physical systems in an average sense.
Thus we cannot use probability in any meaningful sense to answer questions such as: “What is the probability that a comet will strike the earth tomorrow?” or “What is the probability that there is life on other planets?”†

†Nevertheless, certain evangelists and others have dealt with this question rather fearlessly. However, whatever probability system these people use, it is not the system that we shall discuss in this book.

R. A. Fisher and R. Von Mises, in the first third of the twentieth century, were largely responsible for developing the groundwork of modern probability theory. The modern axiomatic treatment upon which this book is based is largely the result of the work by Andrei N. Kolmogorov [1-2].

1.2 THE DIFFERENT KINDS OF PROBABILITY

There are essentially four kinds of probability. We briefly discuss them here.

A. Probability as Intuition

This kind of probability deals with judgments based on intuition. Thus “She will probably marry him,” and “He probably drove too fast,” are in this category. Intuitive probability can lead to contradictory behavior. Joe is still likely to buy an imported Itsibitsi, world famous for its reliability, even though his neighbor Frank has a 19-year-old Buick that has never broken down and Joe's other neighbor, Bill, has his Itsibitsi in the repair shop. Here Joe may be behaving “rationally,” going by the statistics and ignoring, so-to-speak, his personal observation. On the other hand, Joe will be wary about letting his nine-year-old daughter Jane swim in the local pond, if Frank reports that Bill thought that he might have seen an alligator in it. This despite the fact that no one has ever reported seeing an alligator in this pond, and countless people have enjoyed swimming in it without ever having been bitten by an alligator. To give this example some credibility, assume that the pond is in Florida.
Here Joe is ignoring the statistics and reacting to what is essentially a rumor. Why? Possibly because the cost to Joe “just-in-case” there is an alligator in the pond would be too high [1-3].

People buying lottery tickets intuitively believe that certain number combinations like month/day/year of their grandson's birthday are more likely to win than, say, 06-06-06. How many people will bet even odds that a coin that heretofore has behaved “fairly,” that is, in an unbiased fashion, will come up heads on the next toss, if in the last seven tosses it has come up heads? Many of us share the belief that the coin has some sort of memory and that, after seven heads, that coin must “make things right” by coming up with more tails.

A mathematical theory dealing with intuitive probability was developed by B. O. Koopman [1-4]. However, we shall not discuss this subject in this book.

B. Probability as the Ratio of Favorable to Total Outcomes (Classical Theory)

In this approach, which is not experimental, the probability of an event is computed a priori† by counting the number of ways $N_E$ that $E$ can occur and forming the ratio $N_E/N$ where $N$ is the number of all possible outcomes, that is, the number of all alternatives to $E$ plus $N_E$. An important notion here is that all outcomes are equally likely. Since equally likely is really a way of saying equally probable, the reasoning is somewhat circular. Suppose we throw a pair of unbiased dice and ask what is the probability of getting a seven? We partition the outcome into 36 equally likely outcomes as shown in Table 1.2-1 where each entry is the sum of the numbers on the two dice.

Table 1.2-1 Outcomes of Throwing Two Dice

                      1st die
2nd die     1    2    3    4    5    6
   1        2    3    4    5    6    7
   2        3    4    5    6    7    8
   3        4    5    6    7    8    9
   4        5    6    7    8    9   10
   5        6    7    8    9   10   11
   6        7    8    9   10   11   12

†A priori means relating to reasoning from self-evident propositions or presupposed by experience. A posteriori means relating to reasoning from observed facts.
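The counting behind Table 1.2-1 can also be checked by brute-force enumeration. The following Python sketch is not part of the text; the helper name `classical_probability` is an illustrative assumption, and the code simply forms the classical ratio $N_E/N$ under the stated assumption that all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product


def classical_probability(event, outcomes):
    """Return N_E / N, assuming every outcome is equally likely."""
    outcomes = list(outcomes)
    n_e = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(n_e, len(outcomes))


# All ordered (1st die, 2nd die) pairs; the two dice are kept distinct.
dice = list(product(range(1, 7), repeat=2))

# Event: the sum of the two faces is seven.
p_seven = classical_probability(lambda pair: sum(pair) == 7, dice)
print(len(dice), p_seven)
```

Exact fractions are used instead of floating point so that the result can be compared directly with a ratio worked out by hand.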
The total number of outcomes is 36 if we keep the dice distinct. The number of ways of getting a seven is $N_7 = 6$. Hence

$$P[\text{getting a seven}] = \frac{N_7}{N} = \frac{6}{36} = \frac{1}{6}.$$

Example 1.2-1

Throw a fair coin twice (note that since no physical experimentation is involved, there is no problem in postulating an ideal “fair coin”). The possible outcomes are HH, HT, TH, TT. The probability of getting at least one tail T is computed as follows: With $E$ denoting the event of getting at least one tail, the event $E$ is the set of outcomes $E = \{HT, TH, TT\}$. Thus, $E$ occurs whenever the outcome is HT or TH or TT. The number of elements in $E$ is $N_E = 3$; the number of all outcomes, $N$, is four. Hence

$$P[\text{at least one T}] = \frac{N_E}{N} = \frac{3}{4}.$$

The classical theory suffers from at least two significant problems: (1) It cannot deal with outcomes that are not equally likely; and (2) it cannot handle uncountably infinite outcomes without ambiguity (see the example by Athanasios Papoulis [1-5]). Nevertheless, in those problems where it is impractical to actually determine the outcome probabilities by experimentation and where, because of symmetry considerations, one can indeed argue equally likely outcomes, the classical theory is useful.

Historically, the classical approach was the predecessor of Richard Von Mises' [1-6] relative frequency approach developed in the 1930s.

C. Probability as a Measure of Frequency of Occurrence

The relative-frequency approach to defining the probability of an event $E$ is to perform an experiment $n$ times. The number of times that $E$ appears is denoted by $n_E$. Then it is tempting to define the probability of $E$ occurring by

$$P[E] = \lim_{n \to \infty} \frac{n_E}{n}. \tag{1.2-1}$$

Quite clearly since $n_E \le n$ we must have $0 \le P[E] \le 1$. One difficulty with this approach is that we can never perform the experiment an infinite number of times so we can only estimate $P[E]$ from a finite number of trials. Secondly, we postulate that $n_E/n$ approaches a limit as $n$ goes to infinity.
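The finite-trial estimate $n_E/n$ described above can be sketched numerically. The following Python fragment is an illustration only, not the authors' code: the function names, the fixed seed, and the choice of the two-toss experiment of Example 1.2-1 are all assumptions made here. It repeats the experiment $n$ times with a pseudorandom generator and reports the observed ratio for a few values of $n$.

```python
import random


def relative_frequency(event, trial, n, seed=0):
    """Estimate P[E] as n_E / n over n independent repetitions of `trial`."""
    rng = random.Random(seed)  # fixed seed so a run is reproducible
    n_e = sum(1 for _ in range(n) if event(trial(rng)))
    return n_e / n


def two_tosses(rng):
    """One trial: toss a simulated fair coin twice, e.g. 'HT'."""
    return rng.choice("HT") + rng.choice("HT")


def at_least_one_tail(outcome):
    return "T" in outcome


for n in (100, 10_000, 100_000):
    print(n, relative_frequency(at_least_one_tail, two_tosses, n))
```

For large $n$ the printed ratios should settle near the classical value 3/4, but any single run yields only an estimate, which is the first difficulty noted above.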
But consider flipping a fair coin 1000 times. The likelihood of getting exactly 500 heads is very small; in fact, if we flipped the coin 10,000 times, the likelihood of getting exactly 5000 heads is even smaller. As $n \to \infty$, the probability of observing exactly $n/2$ heads becomes vanishingly small. Yet our intuition demands that $P[\text{head}] = \frac{1}{2}$