STATISTICS 330 COURSE NOTES

Cyntha A. Struthers
Department of Statistics and Actuarial Science, University of Waterloo

Spring 2022 Edition

Contents

1. Preview
   1.1 Example
   1.2 Example

2. Univariate Random Variables
   2.1 Probability
   2.2 Random Variables
   2.3 Discrete Random Variables
   2.4 Continuous Random Variables
   2.5 Location and Scale Parameters
   2.6 Functions of a Random Variable
   2.7 Expectation
   2.8 Inequalities
   2.9 Variance Stabilizing Transformation
   2.10 Moment Generating Functions
   2.11 Calculus Review
   2.12 Chapter 2 Problems

3. Multivariate Random Variables
   3.1 Joint and Marginal Cumulative Distribution Functions
   3.2 Bivariate Discrete Distributions
   3.3 Bivariate Continuous Distributions
   3.4 Independent Random Variables
   3.5 Conditional Distributions
   3.6 Joint Expectations
   3.7 Conditional Expectation
   3.8 Joint Moment Generating Functions
   3.9 Multinomial Distribution
   3.10 Bivariate Normal Distribution
   3.11 Calculus Review
   3.12 Chapter 3 Problems

4. Functions of Two or More Random Variables
   4.1 Cumulative Distribution Function Technique
   4.2 One-to-One Transformations
   4.3 Moment Generating Function Technique
   4.4 Chapter 4 Problems

5. Limiting or Asymptotic Distributions
   5.1 Convergence in Distribution
   5.2 Convergence in Probability
   5.3 Weak Law of Large Numbers
   5.4 Moment Generating Function Technique for Limiting Distributions
   5.5 Additional Limit Theorems
   5.6 Chapter 5 Problems

6. Maximum Likelihood Estimation - One Parameter
   6.1 Introduction
   6.2 Maximum Likelihood Method
   6.3 Score and Information Functions
   6.4 Likelihood Intervals
   6.5 Limiting Distribution of Maximum Likelihood Estimator
   6.6 Confidence Intervals
   6.7 Approximate Confidence Intervals
   6.8 Chapter 6 Problems

7. Maximum Likelihood Estimation - Multiparameter
   7.1 Likelihood and Related Functions
   7.2 Likelihood Regions
   7.3 Limiting Distribution of Maximum Likelihood Estimator
   7.4 Approximate Confidence Regions
   7.5 Chapter 7 Problems

8. Hypothesis Testing
   8.1 Test of Hypothesis
   8.2 Likelihood Ratio Tests for Simple Hypotheses
   8.3 Likelihood Ratio Tests for Composite Hypotheses
   8.4 Chapter 8 Problems

9. Solutions to Chapter Exercises
   9.1 Chapter 2
   9.2 Chapter 3
   9.3 Chapter 4
   9.4 Chapter 5
   9.5 Chapter 6
   9.6 Chapter 7
   9.7 Chapter 8

10. Solutions to Selected End of Chapter Problems
   10.1 Chapter 2
   10.2 Chapter 3
   10.3 Chapter 4
   10.4 Chapter 5
   10.5 Chapter 6
   10.6 Chapter 7
   10.7 Chapter 8

11. Summary of Named Distributions

12. Distribution Tables

Preface

In order to provide improved versions of these Course Notes for students in subsequent terms, please email corrections, sections that are confusing, or comments/suggestions to castruth@uwaterloo.ca.

1. Preview

The following examples will illustrate the ideas and concepts discussed in these Course Notes. They also indicate how these ideas and concepts are connected to each other.
1.1 Example

The number of service interruptions in a communications system over 200 separate days is summarized in the following frequency table:

   Number of interruptions   0    1    2    3    4   5   >5   Total
   Observed frequency        64   71   42   18   4   1   0    200

It is believed that a Poisson model will fit these data well. Why might this be a reasonable assumption? (PROBABILITY MODELS)

If we let the random variable X = number of interruptions in a day and assume that the Poisson model is reasonable then the probability function of X is given by

   P(X = x) = \frac{\theta^x e^{-\theta}}{x!}   for x = 0, 1, ...

where \theta is a parameter of the model which represents the mean number of service interruptions in a day. (RANDOM VARIABLES, PROBABILITY FUNCTIONS, EXPECTATION, MODEL PARAMETERS)

Since \theta is unknown we might estimate it using the sample mean

   \bar{x} = \frac{64(0) + 71(1) + \cdots + 1(5)}{200} = \frac{230}{200} = 1.15

(POINT ESTIMATION)

The estimate \hat{\theta} = \bar{x} is the maximum likelihood estimate of \theta. It is the value of \theta which maximizes the likelihood function. (MAXIMUM LIKELIHOOD ESTIMATION)

The likelihood function is the probability of the observed data as a function of the unknown parameter(s) in the model. The maximum likelihood estimate is thus the value of \theta which maximizes the probability of observing the given data.

In this example the likelihood function is given by

   L(\theta) = P(observing 0 interruptions 64 times, ..., >5 interruptions 0 times; \theta)
             = \frac{200!}{64!\,71!\cdots 1!\,0!} \left(\frac{\theta^{0} e^{-\theta}}{0!}\right)^{64} \left(\frac{\theta^{1} e^{-\theta}}{1!}\right)^{71} \cdots \left(\frac{\theta^{5} e^{-\theta}}{5!}\right)^{1} \left(\sum_{x=6}^{\infty} \frac{\theta^{x} e^{-\theta}}{x!}\right)^{0}
             = c\,\theta^{64(0)+71(1)+\cdots+1(5)} e^{-(64+71+\cdots+1)\theta}
             = c\,\theta^{230} e^{-200\theta}   for \theta > 0

where

   c = \frac{200!}{64!\,71!\cdots 1!\,0!} \left(\frac{1}{0!}\right)^{64} \left(\frac{1}{1!}\right)^{71} \cdots \left(\frac{1}{5!}\right)^{1}

The maximum likelihood estimate of \theta can be found by solving

   \frac{dL}{d\theta} = 0   or equivalently   \frac{d \log L}{d\theta} = 0

and verifying that it corresponds to a maximum.

If we want an interval of values for \theta which are reasonable given the data then we could construct a confidence interval for \theta. (INTERVAL ESTIMATION)

To construct confidence intervals we need to find the sampling distribution of the estimator. In this example we would need to find the distribution of the estimator

   \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}

where X_i = number of interruptions on day i, i = 1, 2, ..., 200. (FUNCTIONS OF RANDOM VARIABLES: cumulative distribution function technique, one-to-one transformations, moment generating function technique)

Since X_i \sim Poisson(\theta) with E(X_i) = \theta and Var(X_i) = \theta, the distribution of \bar{X} for large n is approximately N(\theta, \theta/n) by the Central Limit Theorem. (LIMITING DISTRIBUTIONS)

Suppose the manufacturer of the communications system claimed that the mean number of interruptions was 1. Then we would like to test the hypothesis H: \theta = 1. (TESTS OF HYPOTHESIS)

A test of hypothesis uses a test statistic to measure the evidence based on the observed data against the hypothesis. A test statistic with good properties for testing H: \theta = \theta_0 is the likelihood ratio statistic, -2\log\left[L(\theta_0)/L(\hat{\theta})\right]. (LIKELIHOOD RATIO STATISTIC)

For large n the distribution of the likelihood ratio statistic is approximately \chi^2(1) if the hypothesis H: \theta = \theta_0 is true.
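To make the calculations in Example 1.1 concrete, here is a short numerical sketch (in Python; not part of the original Course Notes). It computes the maximum likelihood estimate \hat{\theta} = \bar{x} = 1.15 from the frequency table and the observed value of the likelihood ratio statistic for H: \theta = 1; all variable names are illustrative only.

```python
import math

# Frequency table from Example 1.1: interruptions per day -> number of days
counts = {0: 64, 1: 71, 2: 42, 3: 18, 4: 4, 5: 1}
n = sum(counts.values())                       # 200 days
total = sum(x * f for x, f in counts.items())  # 230 interruptions in total

theta_hat = total / n                          # maximum likelihood estimate = 1.15

def log_likelihood(theta):
    # log L(theta) = 230*log(theta) - 200*theta + log(c); the constant log(c)
    # does not depend on theta and cancels in the likelihood ratio statistic
    return total * math.log(theta) - n * theta

# Likelihood ratio statistic -2*log[L(1)/L(theta_hat)] for H: theta = 1
lr_stat = -2 * (log_likelihood(1.0) - log_likelihood(theta_hat))
print(theta_hat, lr_stat)                      # approximately 1.15 and 4.29
```

The observed statistic would then be compared with the \chi^2(1) distribution, as described above.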
1.2 Example

The following are relief times in hours for 20 patients receiving a pain killer:

   1.1  4.1  1.4  1.8  1.3  1.5  1.7  1.2  1.9  1.4
   1.8  3.0  1.6  1.7  2.2  2.3  1.7  1.6  2.7  2.0

It is believed that the Weibull distribution with probability density function

   f(x) = \frac{\beta x^{\beta-1}}{\theta^{\beta}} e^{-(x/\theta)^{\beta}}   for x > 0, \theta > 0, \beta > 0

will provide a good fit to the data. (CONTINUOUS MODELS, PROBABILITY DENSITY FUNCTIONS)

Assuming independent observations the (approximate) likelihood function is

   L(\theta, \beta) = \prod_{i=1}^{20} \frac{\beta x_i^{\beta-1}}{\theta^{\beta}} e^{-(x_i/\theta)^{\beta}}   for \theta > 0, \beta > 0

where x_i is the observed relief time for the ith patient. (MULTIPARAMETER LIKELIHOODS)

The maximum likelihood estimates \hat{\theta} and \hat{\beta} are found by simultaneously solving

   \frac{\partial \log L}{\partial \theta} = 0   and   \frac{\partial \log L}{\partial \beta} = 0

Since an explicit solution to these equations cannot be obtained, a numerical solution must be found using an iterative method. (NEWTON'S METHOD)

Also, since the maximum likelihood estimators cannot be given explicitly, approximate confidence intervals and tests of hypothesis must be based on the asymptotic distributions of the maximum likelihood estimators. (LIMITING OR ASYMPTOTIC DISTRIBUTIONS OF MAXIMUM LIKELIHOOD ESTIMATORS)

2. Univariate Random Variables

In this chapter we review concepts that were introduced in a previous probability course such as STAT 220/230/240, as well as introducing new concepts. In Section 2.1 the concepts of random experiments, sample spaces, probability models, and rules of probability are reviewed. The concepts of a sigma algebra and a probability set function are also introduced. In Section 2.2 we define a random variable and its cumulative distribution function. It is important to note that the definition of a cumulative distribution function is the same for all types of random variables. In Section 2.3 we define a discrete random variable and review the named discrete distributions (Hypergeometric, Binomial, Geometric, Negative Binomial, Poisson). In Section 2.4 we define a continuous random variable and review the named continuous distributions (Uniform, Exponential, Normal, and Chi-squared). We also introduce new continuous distributions (Gamma, Two Parameter Exponential, Weibull, Cauchy, Pareto). A summary of the named discrete and continuous distributions that are used in these Course Notes can be found in Chapter 11. In Section 2.5 we introduce location and scale parameters. In Section 2.6 we review the cumulative distribution function technique for finding the distribution of a function of a random variable and prove a theorem which can be used in the case of a monotone function. In Section 2.7 we review expectations of functions of random variables. In Sections 2.8 - 2.10 we introduce new material related to expectation such as inequalities, variance stabilizing transformations and moment generating functions. Section 2.11 contains a number of useful calculus results which will be used throughout these Course Notes.

2.1 Probability

To model real life phenomena for which we cannot predict exactly what will happen, we assign numbers, called probabilities, to outcomes of interest which reflect the likelihood of such outcomes. To do this it is useful to introduce the concepts of an experiment and its associated sample space.

Consider some phenomenon or process which is repeatable, at least in theory. We call the phenomenon or process a random experiment and refer to a single repetition of the experiment as a trial. For such an experiment we consider the set of all possible outcomes.

2.1.1 Definition - Sample Space

A sample space S is a set of all the distinct outcomes for a random experiment, with the property that in a single trial, one and only one of these outcomes occurs.

To assign probabilities to the events of interest for a given experiment we begin by defining a collection of subsets of a sample space S which is rich enough to define all the events of interest for the experiment.
We call such a collection of subsets a sigma algebra. 2.1.2 De…nition - Sigma Algebra A collection of subsets of a set S is called a sigma algebra, denoted by B, if it satis…es the following properties: (1) ? 2 B where ? is the empty set (2) If A 2 B then A 2 B (3) If A1 ; A2 ; : : : 2 B then 1 S i=1 Ai 2 B Suppose A1 ; A2 ; : : : are subsets of the sample space S which correspond to events of interest for the experiment. To complete the probability model for the experiment we need to assign real numbers P (Ai ) ; i = 1; 2; : : :, where P (Ai ) is called the probability of Ai . To develop the theory of probability these probabilities must satisfy certain properties. The following Axioms of Probability are a set of axioms which allow a mathematical structure to be developed. 2.1.3 De…nition - Probability Set Function Let B be a sigma algebra associated with the sample space S. A probability set function is a function P with domain B that satis…es the following axioms: (A1) P (A) 0 for all A 2 B (A2) P (S) = 1 (A3) If A1 ; A2 ; : : : 2 B are pairwise mutually exclusive events, that is, Ai \ Aj = ? for all i 6= j, then 1 1 S P P Ai = P (Ai ) i=1 i=1 Note: The probabilities P (A1 ) ; P (A2 ) ; : : : can be assigned in any way as long they satisfy these three axioms. However, if we wish to model real life phenomena we would assign the probabilities such that they correspond to the relative frequencies of events in a repeatable experiment. 2.1. PROBABILITY 2.1.4 7 Example Let B be a sigma algebra associated with the sample space S and let P be a probability set function with domain B. If A; B 2 B then prove the following: (a) P (?) = 0 (b) If A; B 2 B and A and B are mutually exclusive events then P (A [ B) = P (A)+P (B). (c) P A = 1 (d) If A P (A) B then P (A) P (B) Note: A B means a 2 A implies a 2 B. Solution (a) Let A1 = S and Ai = ? for i = 2; 3; : : :. Since it follows that P (S) = P (S) + 1 S Ai = S then by De…nition 2.1.3 (A3) i=1 1 P P (?) i=2 and by (A2) we have 1=1+ 1 P P (?) i=2 By (A1) the right side is a series of non-negative numbers which must converge to the left side which is 1 which is …nite which results in a contradiction unless P (?) = 0 as required. 1 S (b) Let A1 = A, A2 = B, and Ai = ? for i = 3; 4; : : :. Since Ai = A [ B then by (A3) i=1 P (A [ B) = P (A) + P (B) + and since P (?) = 0 by the result (a) it follows that 1 P P (?) i=3 P (A [ B) = P (A) + P (B) (c) Since S = A [ A and A \ A = ? then by (A2) and the result proved in (b) it follows that 1 = P (S) = P A [ A = P (A) + P A or P A =1 P (A) (d) Since B = (A \ B) [ A \ B = A [ A \ B and A \ A \ B = ? then by (b) P (B) = P (A) + P A \ B . But by (A1), P A \ B 0 so it follows that P (B) P (A). 2.1.5 Exercise Let B be a sigma algebra associated with the sample space S and let P be a probability set function with domain B. If A; B 2 B then prove the following: (a) 0 P (A) 1 (b) P A \ B = P (A) P (A \ B) (c) P (A [ B) = P (A) + P (B) P (A \ B) 8 2. UNIVARIATE RANDOM VARIABLES For a given experiment we are sometimes interested in the probability of an event given that we know that the event of interest has occurred in a certain subset of S. For example, the experiment might involve people of di¤erent ages and we may be interest in an event only for a given age group. This leads us to de…ne conditional probability. 2.1.6 De…nition - Conditional Probability Let B be a sigma algebra associated with the sample space S and suppose A; B 2 B. 
The conditional probability of event A given event B is P (AjB) = 2.1.7 P (A \ B) P (B) provided P (B) > 0 Example The following table of probabilities are based on data from the 2011 Canadian census. The probabilities are for Canadians aged 25 34. Employed Unemployed 0:066 0:010 High school diploma or equivalent 0:185 0:016 Postsecondary certi…cate, diploma or degree 0:683 0:040 Highest level of education attained No certi…cate, diploma or degree If a person is selected at random what is the probability the person (a) is employed? (b) has no certi…cate, diploma or degree? (c) is unemployed and has at least a high school diploma or equivalent? (d) has at least a high school diploma or equivalent given that they are unemployed? Solution (a) Let E be the event “employed”, A1 be the event “no certi…cate, diploma or degree”, A2 be the event “high school diploma or equivalent”, and A3 be the event “postsecondary certi…cate, diploma or degree”. P (E) = P (E \ A1 ) + P (E \ A2 ) + P (E \ A3 ) = 0:066 + 0:185 + 0:683 = 0:934 2.1. PROBABILITY 9 (b) P (A1 ) = P (E \ A1 ) + P E \ A1 = 0:066 + 0:010 = 0:076 (c) P (unemployed and have at least a high school diploma or equivalent) = P E \ (A2 [ A3 ) = P E \ A2 + P E \ A3 = 0:016 + 0:040 = 0:056 (d) P A2 [ A3 j E P E \ (A2 [ A3 ) P E 0:056 = 0:066 = 0:848 = If the occurrence of event B does not a¤ect the probability of the event A, then the events are called independent events. 2.1.8 De…nition - Independent Events Let B be a sigma algebra associated with the sample space S and suppose A; B 2 B. A and B are independent events if P (A \ B) = P (A) P (B) 2.1.9 Example In Example 2.1.7 are the events, “unemployed” and “no certi…cate, diploma or degree”, independent events? Solution The events “unemployed”and “no certi…cate, diploma or degree”are not independent since 0:010 = P E \ A1 6= P E P (A1 ) = (0:066) (0:076) 10 2.2 2. UNIVARIATE RANDOM VARIABLES Random Variables A probability model for a random experiment is often easier to construct if the outcomes of the experiment are real numbers. When the outcomes are not real numbers, the outcomes can be mapped to numbers using a function called a random variable. When the observed data are numerical values such as the number of interruptions in a day in a communications system or the length of time until relief after taking a pain killer, random variables are still used in constructing probability models. 2.2.1 De…nition of a Random Variable A random variable X is a function from a sample space S to the real numbers <, that is, X:S!< such that P (X x) is de…ned for all x 2 <. Note: ‘X x’is an abbreviation for f! 2 S : X(!) xg where f! 2 S : X(!) and B is a sigma algebra associated with the sample space S. 2.2.2 xg 2 B Example Three friends Ali, Benita and Chen are enrolled in STAT 330. Suppose we are interested in whether these friends earn a grade of 70 or more. If we let A represent the event “Ali earns a grade of 70 or more”, B represent the event “Benita earns a grade of 70 or more”, and C represent the event “Chen earns a grade of 70 or more” then a suitable sample space is S = ABC; ABC; ABC; AB C; ABC; AB C; AB C; AB C Suppose we are mostly interested in how many of these friends earn a grade of 70 or more. We can de…ne the random variable X = “number of friends who earn a grade of 70 or more”. 
The range of X is f0; 1; 2; 3g with associated mapping X (ABC) = 3 X ABC = X ABC = X AB C = 2 X ABC = X AB C = X AB C = 1 X AB C = 0 An important function associated with random variables is the cumulative distribution function. 2.2. RANDOM VARIABLES 2.2.3 11 De…nition - Cumulative Distribution Function The cumulative distribution function (c.d.f.) of a random variable X is de…ned by F (x) = P (X x) for x 2 < Note: The cumulative distribution function is de…ned for all real numbers. 2.2.4 Properties - Cumulative Distribution Function (1) F is a non-decreasing function, that is, F (x1 ) F (x2 ) for all x1 < x2 (2) lim F (x) = 0 and x! 1 lim F (x) = 1 x!1 (3) F is a right-continuous function, that is, lim F (x) = F (a) x!a+ (4) For all a < b P (a < X b) = P (X b) P (X a) = F (b) F (a) (5) For all b P (X = b) = F (b) 2.2.5 lim F (a) a!b Example Suppose X is a random variable with cumulative distribution function 8 x<0 > > 0 > > > > 0 x<1 > < 0:1 F (x) = P (X (a) Graph the function F (x). (b) Determine the probabilities (i) P (X 1) (ii) P (X 2) (iii) P (X 2:4) (iv) P (X = 2) (v) P (0 < X 2) x) = 0:3 > > > > 0:6 > > > : 1 1 x<2 2 x<3 x 3 12 2. UNIVARIATE RANDOM VARIABLES 1.1 1 0.9 0.8 0.7 0.6 F(x) 0.5 0.4 0.3 0.2 0.1 0 -1 -0.5 0 0.5 1 1.5 2 2.5 3 3.5 x Figure 2.1: Graph of F (x) = P (X (vi) P (0 X x) for Example 2.2.5 2). Solution (a) See Figure 2.1. (b) (i) P (X 1) = F (1) = 0:3 P (X 2) = F (2) = 0:6 (ii) (iii) P (X 2:4) = P (X 2) = F (2) = 0:6 (iv) P (X = 2) = F (2) lim F (x) = 0:6 x!2 0:3 = 0:3 or P (X = 2) = P (X 2) P (X 1) = F (2) F (1) = 0:6 0:3 = 0:3 (v) P (0 < X 2) = P (X 2) P (X 0) = F (2) F (0) = 0:6 0:1 = 0:5 (vi) P (0 X 2) = P (X 2) P (X < 0) = F (2) 0 = 0:6 4 2.2. RANDOM VARIABLES 2.2.6 13 Example Suppose X is a random variable with cumulative distribution function 8 > 0 x 0 > > > > 2 3 < 0<x<1 5x F (x) = P (X x) = 1 2 > 3x 7 1 x<2 > 5 12x > > > : 1 x 2 (a) Graph the function F (x). (b) Determine the probabilities (i) P (X 1) (ii) P (X 2) (iii) P (X 2:4) (iv) P (X = 0:5) (v) P (X = b), for b 2 < (vi) P (1 < X 2:4) (vii) P (1 X 2:4). Solution (a) See Figure 2.2 1 0 .8 0 .6 F (x ) 0 .4 0 .2 0 0 0 .5 1 1 .5 2 x Figure 2.2: Graph of F (x) = P (X x) for Example 2.2.6 14 2. UNIVARIATE RANDOM VARIABLES (b) (i) P (X 1) = F (1) 2 2 = (1)3 = 5 5 = 0:4 (ii) P (X 2) = F (2) = 1 (iii) P (X 2:4) = F (2:4) = 1 (iv) P (X = 0:5) = F (0:5) lim F (x) x!0:5 = F (0:5) F (0:5) = 0 (v) P (X = b) = F (b) lim F (a) a!b = F (b) F (b) = 0 for all b 2 < (vi) P (1 < X 2:4) = F (2:4) = 1 F (1) 0:4 = 0:6 (vii) P (1 X 2:4) = P (1 < X = 0:6 2:4) since P (X = 1) = 0 2.3. DISCRETE RANDOM VARIABLES 2.3 15 Discrete Random Variables A set A is countable if the number of elements in the set is …nite or the elements of the set can be put into a one-to-one correspondence with the positive integers. 2.3.1 De…nition - Discrete Random Variable A random variable X de…ned on a sample space S is a discrete random variable if there is a countable subset A < such that P (X 2 A) = 1. 2.3.2 De…nition - Probability Function If X is a discrete random variable then the probability function (p.f.) of X is given by f (x) = P (X = x) = F (x) lim F (x "!0+ ") for x 2 < The set A = fx : f (x) > 0g is called the support set of X. 2.3.3 Properties - Probability Function (1) f (x) (2) 0 for x 2 < P f (x) = 1 x2A 2.3.4 Example In Example 2.2.5 …nd the support set A, show that X is a discrete random variable and determine its probability function. 
Solution The support set of X is A = f0; 1; 2; 3g which is a countable 8 > 0:1 > > > > < P (X 1) P (X 0) = 0:3 f (x) = P (X = x) = > P (X 2) P (X 1) = 0:6 > > > > : P (X 3) P (X 2) = 1 set. Its probability function is if x = 0 0:1 = 0:2 if x = 1 0:3 = 0:3 if x = 2 0:6 = 0:4 or x f (x) = P (X = x) 0 0:1 1 0:2 2 0:3 3 0:4 Total 1 if x = 3 16 2. UNIVARIATE RANDOM VARIABLES Since P (X 2 A) = 3 P P (X = x) = 1, X is a discrete random variable. x=0 In the next example we review four of the named distributions which were introduced in a previous probability course. 2.3.5 Example Suppose a box containing a red balls and b black balls. For each of the following …nd P the probability function of the random variable X and show that f (x) = 1 where x2A A = fx : f (x) > 0g is the support set of X. (a) X = number of red balls among n balls drawn at random without replacement. (b) X = number of red balls among n balls drawn at random with replacement. (c) X = number of black balls selected before obtaining the …rst red ball if sampling is done at random with replacement. (d) X = number of black balls selected before obtaining the kth red ball if sampling is done at random with replacement. Solution (a) If n balls are selected at random without replacement from a box of a red balls and b black balls then the random variable X = number of red balls has a Hypergeometric distribution with probability function f (x) = P (X = x) = a x b n x a+b n for x = max (0; n By the Hypergeometric identity 2.11.6 P f (x) = x2A a x min(a;n) P x=max(0;n b) = = = 1 1 a+b n a+b n a+b n 1 P x=0 b n x a+b n a b x n x b) ; : : : ; min (a; n) 2.3. DISCRETE RANDOM VARIABLES 17 (b) If n balls are selected at random with replacement from a box of a red balls and b black balls then we have a sequence of Bernoulli trials and the random variable X = number of red balls has a Binomial distribution with probability function n x p (1 x f (x) = P (X = x) = where p = a a+b . p)n x for x = 0; 1; : : : ; n By the Binomial series 2.11.3(1) P f (x) = n P n x p (1 x x20 x2A p)n x p)n = (p + 1 = 1 (c) If sampling is done with replacement then we have a sequence of Bernoulli trials and the random variable X = number of black balls selected before obtaining the …rst red ball has a Geometric distribution with probability function p)x for x = 0; 1; : : : f (x) = P (X = x) = p (1 By the Geometric series 2.11.1 P f (x) = 1 P p (1 p)x p (1 p)] x20 x2A = [1 = 1 (d) If sampling is done with replacement then we have a sequence of Bernoulli trials and the random variable X = number of black balls selected before obtaining the kth red ball has a Negative Binomial distribution with probability function f (x) = P (X = x) = x+k x 1 pk (1 p)x for x = 0; 1; : : : Using the identity 2.11.4(1) x+k x 1 = k ( 1)x x the probability function can be written as f (x) = P (X = x) = k k p (p x 1)x for x = 0; 1; : : : 18 2. UNIVARIATE RANDOM VARIABLES By the Binomial series 2.11.3(2) P 1 P f (x) = x20 x2A = pk k k p (p x 1)x k (p x 1)x 1 P x20 k = p (1 + p 1) k = 1 2.3.6 Example If X is a random variable with probability function f (x) = xe for x = 0; 1; : : : ; x! show that 1 P >0 (2.1) f (x) = 1 x=0 Solution By the Exponential series 2.11.7 1 P f (x) = x=0 1 P x=0 = e xe x! 1 P x=0 = e x x! e = 1 The probability function (2.1) is called the Poisson probability function. 2.3.7 Exercise If X is a random variable with probability function f (x) = show that (1 p)x x log p for x = 1; 2; : : : ; 0 < p < 1 1 P f (x) = 1 x=1 Hint: Use the Logarithmic series 2.11.8. 
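A quick numerical check of identities such as the one in Example 2.3.6 (or Exercise 2.3.7) is to sum the probability function over its support. The sketch below (Python; not part of the original Course Notes) truncates the infinite Poisson sum; the value θ = 1.15 is illustrative only.

```python
import math

def poisson_pf(x, theta):
    # Poisson probability function f(x) = theta^x * exp(-theta) / x!  (Example 2.3.6)
    return theta**x * math.exp(-theta) / math.factorial(x)

theta = 1.15
# The support is {0, 1, 2, ...}; truncating at x = 50 captures essentially all
# of the probability when theta is small.
print(sum(poisson_pf(x, theta) for x in range(51)))   # 1.0 to within rounding error
```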
Important Note: A summary of the named distributions used in these Course Notes can be found in Chapter 11. 2.4. CONTINUOUS RANDOM VARIABLES 2.4 2.4.1 19 Continuous Random Variables De…nition - Continuous Random Variable Suppose X is a random variable with cumulative distribution function F . If F is a continuous function for all x 2 < and F is di¤erentiable except possibly at countably many points then X is called a continuous random variable. Note: The de…nition (2.2.3) and properties (2.2.4) of the cumulative distribution function hold for the random variable X regardless of whether X is discrete or continuous. 2.4.2 Example Suppose X is a random variable with cumulative distribution function 8 > 0 x 1 > > > > 2 1 < 1<x 0 2 (x + 1) + x + 1 F (x) = P (X x) = 2 1 > 1) + x 0<x<1 > 2 (x > > > : 1 x 1 Show that X is a continuous random variable. Solution The cumulative distribution function F is a continuous function for all x 2 < since it is a piecewise function composed of continuous functions and lim F (x) = F (a) x!a at the break points a = 1; 0; 1. The function F is di¤erentiable for all x 6= composed of di¤erentiable functions. Since F ( 1 + h) F ( 1) h F (0 + h) F (0) lim h h!0 F (1 + h) F (1) lim h h!0 lim h!0 1; 0; 1 since it is a piecewise function F ( 1 + h) F ( 1) =1 h F (0 + h) F (0) = 0 = lim + h h!0 F (1 + h) F (1) = 1 6= lim =0 h h!0+ = 0 6= lim h!0+ F is di¤erentiable at x = 0 butnot di¤erentiable at x = 1; 1. The set f 1; 1g is a countable set. Since F is a continuous function for all x 2 < and F is di¤erentiable except for countable many points, therefore X is a continuous random variable. 20 2. UNIVARIATE RANDOM VARIABLES 2.4.3 De…nition - Probability Density Function If X is a continuous random variable with cumulative distribution function F (x) then the probability density function (p.d.f.) of X is f (x) = F 0 (x) if F is di¤erentiable at x. The set A = fx : f (x) > 0g is called the support set of X. Note: At the countably many points at which F 0 (a) does not exist, f (a) may be assigned any convenient value since the probabilities P (X x) will be una¤ected by the choice. We usually choose f (a) 0 and most often we choose f (a) = 0. 2.4.4 Properties - Probability Density Function (1) f (x) 0 for all x 2 < R1 (2) f (x)dx = lim F (x) lim F (x) = 1 x!1 1 F (x+h) F (x) h h!0 x R (3) f (x) = lim (4) F (x) = f (t)dt; 1 (5) P (a < X x! 1 P (x X x+h) h h!0 = lim if this limit exists x2< b) = P (X b) P (X a) = F (b) F (a) = Rb f (x)dx a (6) P (X = b) = F (b) lim F (a) = F (b) F (b) = 0 a!b (since F is continuous). 2.4.5 Example Find and sketch the probability density function of the random variable X with the cumulative distribution function in Example 2.2.6. Solution By taking the derivative of F (x) we obtain F 0 (x) = 8 > > > > > < > > > > > : 0 d dx d 2 3 dx 5 x 12x 3x2 7 5 0 x<0 = = 6 2 5x 1 5 (12 0<x<1 6x) 1<x<2 x>2 We can assign any values to f (0), f (1), and f (2). For convenience we choose f (0) = f (2) = 0 and f (1) = 56 . 2.4. CONTINUOUS RANDOM VARIABLES 21 The probability density function is 0 f (x) = F (x) = 8 > > < > > : 6 2 5x 1 5 (12 0<x 6x) 1 1<x<2 0 otherwise The graph of f (x) is given in Figure 2.3. 1.4 1.2 1 f(x) 0.8 0.6 0.4 0.2 0 -0.5 0 0.5 1 1.5 2 2.5 x Figure 2.3: Graph of probability density function for Example 2.4.5 Note: See Table 2.1 in Section 2.7 for a summary of the di¤erences between the properties of a discrete and continuous random variable. 
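The relationship f(x) = F'(x) used in Example 2.4.5 is easy to check numerically. The sketch below (Python; not part of the original Course Notes) codes the cumulative distribution function of Example 2.2.6 and the probability density function derived from it, confirms that the density integrates to 1, and reproduces a probability computed earlier.

```python
def F(x):
    # Cumulative distribution function from Example 2.2.6
    if x <= 0:
        return 0.0
    if x < 1:
        return 0.4 * x**3                      # (2/5)x^3
    if x < 2:
        return (12 * x - 3 * x**2 - 7) / 5
    return 1.0

def f(x):
    # Probability density function obtained by differentiating F (Example 2.4.5)
    if 0 < x <= 1:
        return 1.2 * x**2                      # (6/5)x^2
    if 1 < x < 2:
        return (12 - 6 * x) / 5
    return 0.0

h = 1e-4                                       # midpoint Riemann sum over the support (0, 2)
print(sum(f((k + 0.5) * h) * h for k in range(int(2 / h))))   # close to 1
print(F(2.4) - F(1))                           # P(1 < X <= 2.4) = 0.6, as in Example 2.2.6(b)(vi)
```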
2.4.6 Example Suppose X is a random variable with cumulative distribution function 8 0 x a > > < x a a<x b F (x) = b a > > : 1 x>b where b > a. (a) Sketch F (x) the cumulative distribution function of X. (b) Find f (x) the probability density function of X and sketch it. (c) Is it possible for f (x) to take on values greater than one? 22 2. UNIVARIATE RANDOM VARIABLES Solution (a) See Figure 2.4 for a sketch of the cumulative distribution function. 1.0 F(x) 0.5 0 a x (a+b)/2 b Figure 2.4: Graph of cumulative distribution function for Example 2.4.6 (b) By taking the derivative of F (x) we obtain 8 0 > > < 1 F 0 (x) = b a > > : 0 x<a a<x<b x>b The derivative of F (x) does not exist for x = a or x = b. For convenience we de…ne f (a) = f (b) = b 1 a so that ( 1 a x b b a f (x) = 0 otherwise See Figure 2.5 for a sketch of the probability density function. Note that we could de…ne f (a) and f (b) to be any values and the cumulative distribution function would remain the same since x = a or x = b are countably many points. The random variable X is said to have a Uniform(a; b) distribution. We write this as X Uniform(a; b). (c) If a = 0 and b = 0:5 then f (x) = 2 for 0 x 0:5. This example illustrates that the probability density function is not a probability and that the probability density function can take on values greater than one. The important restriction for continuous random variables is Z1 f (x) dx = 1 1 2.4. CONTINUOUS RANDOM VARIABLES 23 1/(b-a) f(x) x a (a+b)/2 b Figure 2.5: Graph of the probability density function of a Uniform(a; b) random variable 2.4.7 Example Consider the function f (x) = and 0 otherwise. For what values of for x 1 x +1 is this function a probability density function? Solution Using the result (2.8) from Section 2.11 Z1 f (x) dx = 1 Z1 dx +1 x 1 = lim b!1 Zb 1 x 1 = lim b!1 = 1 = 1 Also f (x) 0 if h x 1 b >0 lim jb1 dx i b!1 if > 0. Therefore f (x) is a a probability density function for all X is said to have a Pareto(1; ) distribution. > 0. 24 2. UNIVARIATE RANDOM VARIABLES A useful function for evaluating integrals associated with several named random variables is the Gamma function. 2.4.8 De…nition - Gamma Function The gamma function, denoted by ( ) for all ( )= Z1 y > 0, is given by 1 e y dy 0 2.4.9 Properties - Gamma Function (1) ( ) = ( 1) ( (2) (n) = (n p (3) ( 21 ) = 1)! n = 1; 2; : : : 2.4.10 1) >1 Example Suppose X is a random variable with probability density function f (x) = x 1 e x= ( ) for x > 0; > 0; >0 and 0 otherwise. X is said to have a Gamma distribution with parameters X Gamma( ; ). (a) Verify that Z1 and and we write f (x)dx = 1 1 (b) What special probability density function is obtained for = 1? (c) Graph the probability density functions for (i) = 1, = 3 (ii) = 2, = 1:5 (iii) = 5, = 0:6 (iv) = 10, = 0:3 on the same graph. Note: See Chapter 11 - Summary of Named Distributions. Note that the notation for parameters used for named distributions is not necessarily the same in all textbooks. This is especially true for distributions with two or more parameters. 2.4. CONTINUOUS RANDOM VARIABLES 25 Solution (a) Z1 Z1 f (x) dx = 1 1 e x= x ( ) dx let y = x= 0 Z1 1 ( ) = 1 (y ) e y dy 0 1 ( ) = Z1 1 y e y dy 0 ( ) ( ) = = 1 (b) If = 1 the probability density function is f (x) = 1 e x= for x > 0; >0 and 0 otherwise which is the Exponential( ) distribution. (c) See Figure 2.6 0.45 α=10, β=0.3 0.4 0.35 α=1, β=3 α=5, β=0.6 0.3 f(x) 0.25 α=2, β=1.5 0.2 0.15 0.1 0.05 0 0 1 2 3 4 5 6 7 8 9 x Figure 2.6: Gamma( ; ) probability density functions 26 2. 
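Before the exercises, here is a small numerical check of Example 2.4.10(a) (Python; not part of the original Course Notes): the Gamma(α, β) density, coded with the built-in gamma function, integrates to 1. The parameter values are illustrative only.

```python
import math

def gamma_pdf(x, alpha, beta):
    # Gamma(alpha, beta) probability density function from Example 2.4.10
    return x**(alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta**alpha)

alpha, beta, h = 2.0, 1.5, 1e-3
# Midpoint Riemann sum over (0, 60); the tail beyond 60 is negligible here.
print(sum(gamma_pdf((k + 0.5) * h, alpha, beta) * h for k in range(int(60 / h))))  # close to 1
```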
UNIVARIATE RANDOM VARIABLES 2.4.11 Exercise Suppose X is a random variable with probability density function f (x) = 1 x (x= ) e for x > 0; > 0; >0 and 0 otherwise. X is said to have a Weibull distribution with parameters X Weibull( ; ). (a) Verify that Z1 and and we write f (x)dx = 1 1 (b) What special probability density function is obtained for = 1? (c) Graph the probability density functions for (i) = 1, = 0:5 (ii) = 2, = 0:5 (iii) = 2, = 1 (iv) = 3, = 1 on the same graph. 2.4.12 Exercise Suppose X is a random variable with probability density function f (x) = x +1 for x > ; > 0; >0 and 0 otherwise. X is said to have a Pareto distribution with parameters X Pareto( ; ). (a) Verify that Z1 f (x)dx = 1 1 (b) Graph the probability density functions for (i) = 1, = 1 (ii) = 1, = 2 (iii) = 0:5, = 1 (iv) = 0:5, = 2 on the same graph. and and we write 2.5. LOCATION AND SCALE PARAMETERS 2.5 27 Location and Scale Parameters In Chapter 6 we will look at methods for constructing con…dence intervals for an unknown parameter . If the parameter is either a location parameter or a scale parameter then a con…dence interval is easier to construct. 2.5.1 De…nition - Location Parameter Suppose X is a continuous random variable with probability density function f (x; ) where is a parameter of the distribution. Let F0 (x) = F (x; = 0) and f0 (x) = f (x; = 0). The parameter is called a location parameter of the distribution if F (x; ) = F0 (x ) for 2< ) for 2< or equivalently f (x; ) = f0 (x 2.5.2 De…nition - Scale Parameter Suppose X is a continuous random variable with probability density function f (x; ) where is a parameter of the distribution. Let F1 (x) = F (x; = 1) and f1 (x) = f (x; = 1). The parameter is called a scale parameter of the distribution if F (x; ) = F1 x for >0 1 x f (x; ) = f1 ( ) for >0 or equivalently 2.5.3 Example Suppose X is a continuous random variable with probability density function f (x) = 1 e (x )= for x ; 2 <; >0 and 0 otherwise. X is said to have a Two Parameter Exponential distribution and we write X Exponential( ; ). (a) If X Two Parameter Exponential( ; 1) show that distribution. Sketch the probability density function for Double is a location parameter for this = 1; 0; 1 on the same graph. (b) If X Two Parameter Exponential(0; ) show that is a scale parameter for this distribution. Sketch the probability density function for = 0:5; 0; 2 on the same graph. 28 2. UNIVARIATE RANDOM VARIABLES Solution (a) For X Two Parameter Exponential( ; 1) the probability density function is f (x; ) = e (x ) for x ; 2< and 0 otherwise. Let f0 (x) = f (x; = 0) = e x for x > 0 and 0 otherwise. Then f (x; ) = e (x ) = f0 (x and therefore ) for all 2< is a location parameter of this distribution. See Figure 2.7 for a sketch of the probability density function for = 1; 0; 1. 1 0.9 0.8 θ=-1 θ=0 θ=1 0.7 0.6 f (x) 0.5 0.4 0.3 0.2 0.1 0 -1 0 1 2 x 3 4 Figure 2.7: Exponential( ; 1) probability density function for (b) For X 5 = 1; 0; 1 Two Parameter Exponential(0; ) the probability density function is 1 f (x; ) = e x= for x > 0; >0 and 0 otherwise which is the Exponential( ) probability density function. 2.5. LOCATION AND SCALE PARAMETERS 29 Let f1 (x) = f (x; = 1) = e x for x > 0 and 0 otherwise. Then 1 f (x; ) = e and therefore x= 1 x = f1 for all >0 is a scale parameter of this distribution. See Figure 2.8 for a sketch of the probability density function for = 0:5; 1; 2. 
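The scale-parameter property of Definition 2.5.2 can also be checked numerically through the cumulative distribution function, since it is equivalent to F(x; θ) = F₁(x/θ). The sketch below (Python; not part of the original Course Notes) verifies this for the Exponential(θ) distribution of part (b); the value θ = 2 is illustrative only.

```python
import math

def exp_cdf(x, theta):
    # Exponential(theta) cumulative distribution function: 1 - exp(-x/theta) for x > 0
    return 1 - math.exp(-x / theta) if x > 0 else 0.0

theta = 2.0
for x in [0.5, 1.0, 3.0]:
    # Scale-parameter property of Definition 2.5.2: F(x; theta) = F_1(x / theta)
    print(exp_cdf(x, theta), exp_cdf(x / theta, 1.0))   # the two values agree
```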
2 1.8 θ=0.5 1.6 1.4 1.2 f(x) θ=1 1 0.8 0.6 θ=2 0.4 0.2 0 0 0.5 1 1.5 x 2 2.5 Figure 2.8: Exponential( ) probability density function for 2.5.4 3 = 0:5; 1; 2 Exercise Suppose X is a continuous random variable with probability density function f (x) = 1 f1 + [(x )= ]2 g for x 2 <; 2 <; >0 and 0 otherwise. X is said to have a two parameter Cauchy distribution and we write X Cauchy( ; ). (a) If X Cauchy( ; 1) then show that is a location parameter for the distribution. Graph the Cauchy( ; 1) probability density function for = 1; 0 and 1 on the same graph. (b) If X Cauchy(0; ) then show that is a scale parameter for the distribution. Graph the Cauchy(0; ) probability density function for = 0:5; 1 and 2 on the same graph. 30 2.6 2. UNIVARIATE RANDOM VARIABLES Functions of a Random Variable Suppose X is a continuous random variable with probability density function f and cumulative distribution function F and we wish to …nd the probability density function of the random variable Y = h(X) where h is a real-valued function. In this section we look at techniques for determining the distribution of Y . 2.6.1 Cumulative Distribution Function Technique A useful technique for determining the distribution of a function of a random variable Y = h(X) is the cumulative distribution function technique. This technique involves obtaining an expression for G(y) = P (Y y), the cumulative distribution function of Y , in terms of F , the cumulative distribution function of X. The corresponding probability density function g of Y is found by di¤erentiating G. Care must be taken to determine the support set of the random variable Y . 2.6.2 Example If Z N(0; 1) …nd the probability density function of Y = Z 2 . What type of random variable is Y ? Solution If Z N(0; 1) then the probability density function of Z is 1 f (z) = p e 2 z 2 =2 for z 2 < Let G (y) = P (Y y) be the cumulative distribution function of Y = Z 2 . Since the support set of the random variable Z is < and Y = Z 2 then the support set of Y is B = fy : y > 0g. For y 2 B G (y) = P (Y y) = P Z2 y p = P( y Z p y) p = Zy p 1 p e 2 z 2 =2 1 p e 2 z 2 =2 dz y p = 2 Zy 0 dz since f (z) is an even function 2.6. FUNCTIONS OF A RANDOM VARIABLE 31 For y 2 B the probability density function of Y is g (y) = 2p Zy d 2 d 6 G (y) = 4 p e dy dy 2 z 2 =2 0 = = p 2 p e ( y) 2 1 p e y=2 2 y 2 =2 1 p 2 y 3 p 2 d p 2 7 dz 5 = p e ( y) =2 ( y) dy 2 (2.2) (2.3) by 2.11.10. We recognize (2.3) as the probability density function of a Chi-squared(1) random vari2 (1). able so Y = Z 2 The above solution provides a proof of the following theorem. 2.6.3 If Z 2.6.4 Theorem N(0; 1) then Z 2 2 (1). Example Suppose X Exponential( ). The cumulative distribution function for X is F (x) = P (X 8 <0 x) = x :1 e x= 0 x>0 Determine the distribution of the random variable Y = F (X) = 1 e X= . Solution Since Y = 1 e X= ,X= log (1 1 (Y Y)=F ) for X > 0 and 0 < Y < 1. For 0 < y < 1 the cumulative distribution function of Y is G (y) = P (Y y) = P 1 = P (X = F( log (1 log (1 = 1 e log(1 y)= = 1 (1 y) = y y)) e y)) X= y 32 If U 2. UNIVARIATE RANDOM VARIABLES Uniform(0; 1) then the cumulative distribution function of U is P (U 8 > 0 u 0 > > < u) = u 0 < u <1 > > > :1 u 1 Therefore the cumulative distribution function of Y = F (X) = 1 e X= is Uniform(0; 1). This is an example of a result which holds more generally as summarized in the following theorem. 
2.6.5 Theorem - Probability Integral Transformation If X is a continuous random variable with cumulative distribution function F then the random variable ZX Y = F (X) = f (t) dt (2.4) 1 has a Uniform(0; 1) distribution. Proof Suppose the continuous random variable X has support set A = fx : f (x) > 0g. For all x 2 A, F is an increasing function since F is a cumulative distribution function. Therefore for all x 2 A the function F has an inverse function F 1 . For 0 < y < 1, the cumulative distribution function of Y = F (X) is G (y) = P (Y y) = P (F (X) = P X = F F F 1 1 y) (y) (y) = y which is the cumulative distribution function of a Uniform(0; 1) random variable. Therefore Y = F (X) Uniform(0; 1) as required. Note: Because of the form of the function (transformation) Y = F (X) in (2.4), this transformation is called the probability integral transformation. This result holds for any cumulative distribution function F corresponding to a continuous random variable. 2.6. FUNCTIONS OF A RANDOM VARIABLE 2.6.6 33 Theorem Suppose F is a cumulative distribution function for a continuous random variable. If U Uniform(0; 1) then the random variable X = F 1 (U ) also has cumulative distribution function F . Proof Suppose that the support set of the random variable X = F cumulative distribution function of X = F 1 (U ) is P (X x) = P F 1 = P (U (U ) 1 (U ) is A. For x 2 A, the x F (x)) = F (x) since P (U u) = u for 0 < u < 1 if U cumulative distribution function F . Uniform(0; 1). Therefore X = F 1 (U ) has Note: The result of the previous theorem is important because it provides a method for generating observations from a continuous distribution. Let u be an observation generated from a Uniform(0; 1) distribution using a random number generator. Then by Theorem 2.6.6, x = F 1 (u) is an observation from the distribution with cumulative distribution function F . 2.6.7 Example Explain how Theorem 2.6.6 can be used to generate observations from a Weibull( ; 1) distribution. Solution If X has a Weibull( ; 1) distribution then the cumulative distribution function is F (x; ) = Zx y 1 e y dy 0 = 1 e x for x > 0 The inverse cumulative distribution function is F 1 (u) = [ log (1 u)]1= for 0 < u < 1 If u is an observation from the Uniform(0; 1) distribution then x = [ log (1 observation from the Weibull( ; 1) distribution by Theorem 2.6.6. u)]1= is an If we wish to …nd the distribution of the random variable Y = h(X) and h is a one-to-one real-valued function then the following theorem can be used. 34 2.6.8 2. UNIVARIATE RANDOM VARIABLES Theorem - One-to-One Transformation of a Random Variable Suppose X is a continuous random variable with probability density function f and support set A = fx : f (x) > 0g. Let Y = h(X) where h is a real-valued function. Let B = fy : g(y) > 0g be the support set of the random variable Y . If h is a one-to-one function from d A to B and dx h (x) is continuous for x 2 A, then the probability density function of Y is 1 g(y) = f (h (y)) d h dy 1 (y) for y 2 B Proof We prove this theorem using the cumulative distribution function technique. d (1) Suppose h is an increasing function and dx h (x) is continuous for x 2 A. Then h 1 (y) d h 1 (y) > 0 for y 2 B. 
The cumulative distribution is also an increasing function and dy function of Y = h(X) is G (y) = P (Y y) = P (h (X) = P X = F h h 1 1 (y) y) since h is an increasing function (y) Therefore d d G (y) = F h 1 (y) dy dy d = F 0 h 1 (y) h 1 (y) by the Chain Rule dy d d = f h 1 (y) h 1 (y) y 2 B since h 1 (y) > 0 dy dy g (y) = d (2) Suppose h is a decreasing function and dx h (x) is continuous for x 2 A. Then h 1 (y) d h 1 (y) < 0 for y 2 B. The cumulative distribution is also a decreasing function and dy function of Y = h(X) is G (y) = P (Y = P X = 1 y) = P (h (X) 1 h F h 1 (y) y) since h is a decreasing function (y) Therefore d d G (y) = 1 F h 1 (y) dy dy d = F 0 h 1 (y) h 1 (y) by the Chain Rule dy d d = f h 1 (y) h 1 (y) y 2 B since h 1 (y) < 0 dy dy g (y) = These two cases give the desired result. 2.6. FUNCTIONS OF A RANDOM VARIABLE 2.6.9 35 Example The following two results were used extensively in your previous probability and statistics courses. (a) If Z (b) If X N(0; 1) then Y = N( ; 2) + Z then Z = N X 2 ; . N(0; 1). Prove these results using Theorem 2.6.8. Solution (a) If Z N(0; 1) then the probability density function of Z is 1 f (z) = p e 2 z 2 =2 for z 2 < Y = + Z = h (Z) is an increasing function with inverse function Z = h Since the support set of Z is A = <, the support set of Y is B = <. 1 (Y )= Y . Since d d y 1 h 1 (y) = = dy dy then by Theorem 2.6.8 the probability density function of Y is g(y) = f (h = 1 (y)) y 1 p e ( 2 d h dy ) 2 (b) If X N( ; 2) N(0; 1) then Y = + Z (y) =2 1 which is the probability density function of an N Therefore if Z 1 ; N for y 2 < 2 random variable. 2 ; . then the probability density function of X is f (x) = p 1 2 x e ( ) 2 =2 for x 2 < Z= X = h (X) is an increasing function with inverse function X = h Since the support set of X is A = <, the support set of Z is B = <. Since d d h 1 (z) = ( + z) = dz dy then by Theorem 2.6.8 the probability density function of Z is g(z) = f (h = 1 (z)) 1 p e 2 z 2 =2 d h dy 1 (z) for z 2 < which is the probability density function of an N(0; 1) random variable. Therefore if X N( ; 2) then Z = Y N(0; 1). 1 (Z) = + Z. 36 2. UNIVARIATE RANDOM VARIABLES 2.6.10 Example Use Theorem 2.6.8 to prove the following relationship between the Pareto distribution and the Exponential distribution. If X P areto (1; ) then Y = log(X) Exponential 1 Solution If X Pareto(1; ) then the probability density function of X is f (x) = for x +1 x 1; >0 Y = log(X) = h (X) is an increasing function with inverse function X = eY = h 1 (Y ). Since the support set of X is A = fx : x 0g, the support set of Y is B = fy : y > 0g. Since d h dy 1 (y) = d y e = ey dy then by Theorem 2.6.8 the probability density function of Y is g(y) = f (h = (ey ) 1 (y)) +1 d h dy ey = e 1 (y) y for y > 0 which is the probability density function of an Exponential 2.6.11 1 random variable as required. Exercise Use Theorem 2.6.8 to prove the following relationship between the Exponential distribution and the Weibull distribution. If X 2.6.12 Exponential(1) then Y = X 1= Weibull( ; ) for y > 0; Exercise Suppose X is a random variable with probability density function f (x) = x 1 for 0 < x < 1; >0 and 0 otherwise. Use Theorem 2.6.8 to prove that Y = log X Exponential 1 . > 0; >0 2.7. EXPECTATION 2.7 37 Expectation In this section we de…ne the expectation operator E which maps random variables to real numbers. These numbers have an interpretation in terms of long run averages for repeated independent trails of an experiment associated with the random variable. 
Much of this section is a review of material covered in a previous probability course. 2.7.1 De…nition - Expectation Suppose h(x) is a real-valued function. If X is a discrete random variable with probability function f (x) and support set A then the expectation of the random variable h (X) is de…ned by P E [h(X)] = h (x) f (x) x2A provided the sum converges absolutely, that is, provided E(jh(X)j) = P x2A jh (x)j f (x) < 1 If X is a continuous random variable with probability density function f (x) then the expectation of the random variable h (X) is de…ned by E [h(X)] = Z1 h (x) f (x)dx 1 provided the integral converges absolutely, that is, provided E(jh(X)j) = Z1 1 jh (x)j f (x)dx < 1 If E(jh(X)j) = 1 then we say that E [h(X)] does not exist. E [h(X)] is also called the expected value of the random variable h (X). 2.7.2 Example Find E(X) if X Geometric(p). Solution If X Geometric(p) then f (x) = pq x for x = 0; 1; : : : ; q = 1 p; 0 < p < 1 38 2. UNIVARIATE RANDOM VARIABLES and P E (X) = xf (x) = x2A 1 P = pq 1 P xpq x = x=0 xq x 1 1 P xpq x x=1 which converges if 0 < q < 1 x=1 pq by 2.11.2(2) (1 q)2 q if 0 < q < 1 p = = 2.7.3 Example Suppose X Pareto(1; ) with probability density function f (x) = x for x +1 and 0 otherwise. Find E(X). For what values of 1; >0 does E(X) exist? Solution E (X) = Z1 xf (x)dx = 1 Z1 = Z1 x x +1 dx 1 1 dx which converges for x > 1 by 2.8 1 = lim b!1 Zb x dx = lim x 1 b!1 +1 b j1 = 1 1 1 b 1 1 = Therefore E (X) = 2.7.4 1 1 for >1 and the mean exists only for > 1. Exercise Suppose X is a nonnegative continuous random variable with cumulative distribution function F (x) and E(X) < 1. Show that E(X) = Z1 [1 F (x)] dx 0 Hint: Use integration by parts with u = [1 F (x)]. 2.7. EXPECTATION 2.7.5 39 Theorem - Expectation is a Linear Operator Suppose X is a random variable with probability (density) function f (x), a and b are real constants, and g(x) and h(x) are real-valued functions. Then E (aX + b) = aE (X) + b E [ag(X) + bh(X)] = aE [g(X)] + bE [h(X)] Proof (Continuous Case) E (aX + b) = Z1 (ax + b) f (x)dx 1 = a Z1 xf (x)dx + b 1 = aE (X) + b (1) Z1 f (x)dx by properties of integrals 1 by De…nition 2.7.1 and Property 2.4.4 = aE (X) + b E [ag(X) + bh(X)] = Z1 [ag(x) + bh(x)] f (x)dx 1 = a Z1 g(x)f (x)dx + b 1 Z1 1 = aE [g(X)] + bE [h(X)] h(x)f (x)dx by properties of integrals by De…nition 2.7.1 as required. The following named expectations are used frequently. 2.7.6 Special Expectations (1) The mean of a random variable E (X) = (2) The kth moment (about the origin) of a random variable E(X k ) (3) The kth moment about the mean of a random variable h i E (X )k 40 2. UNIVARIATE RANDOM VARIABLES (4) The kth factorial moment of a random variable E X (k) = E [X(X 1) (X k + 1)] (5) The variance of a random variable )2 ] = V ar(X) = E[(X 2.7.7 2 where = E(X) Theorem - Properties of Variance 2 = V ar(X) = E(X 2 ) = E[X(X 2 2 1)] + V ar(aX + b) = a2 V ar(X) and E(X 2 ) = 2 + 2 Proof (Continuous Case) )2 ] = E X 2 V ar(X) = E[(X = E X2 = E X 2 2 E (X) + 2 2 = E X2 + 2 X+ 2 2 by Theorem 2.7.5 2 2 Also V ar(X) = E X 2 2 = E [X (X = E [X (X 1)] + E (X) = E [X (X 2 1)] + 1) + X] 2 by Theorem 2.7.5 n o V ar (aX + b) = E [aX + b (a + b)]2 h i h = E (aX a )2 = E a2 (X = a2 E[(X 2 = E(X 2 ) 2 gives E(X 2 ) = )2 i )2 ] by Theorem 2.7.5 = a2 V ar(X) by de…nition Rearranging 2 2 + 2 2.7. EXPECTATION 2.7.8 If X 41 Example Binomial(n; p) then show E(X (k) ) = n(k) pk for k = 1; 2; : : : and thus …nd E (X) and V ar(X). 
Solution E(X (k) ) = = n P x=k n P x(k) n(k) x=k = n(k) nPk n x p (1 p)n x x n k x p (1 p)n x k n nPk n k = n p)n p (p + 1 = n(k) pk p)(n py (1 y y=0 (k) k p)n py+k (1 y y=0 = n(k) pk k k x by 2.11.4(1) y k k) y by 2.11.3(1) for k = 1; 2; : : : For k = 1 we obtain E X (1) = E (X) = n(1) p1 = np For k = 2 we obtain E X (2) = E [X (X 1)] = n(2) p2 1) p2 = n (n Therefore V ar (X) = E[X(X = n (n = 1)] + 1) p2 + np np2 + np = np (1 p) 2 n2 p2 y=x k 42 2. UNIVARIATE RANDOM VARIABLES 2.7.9 Exercise Show the following: k (a) If X Poisson( ) then E(X (k) ) = (b) If X Negative Binomial(k; p) then E(X (j) ) = ( k)(j) (c) If X Gamma( ; ) then E(X p ) = (d) If X Weibull( ; ) then E(X k ) for k = 1; 2; : : : p j for j = 1; 2; : : : ( + p) = ( ) for p > k = p 1 p k +1 . for k = 1; 2; : : : In each case …nd E (X) and V ar(X). Table 2.1 summarizes the di¤erences between the properties of a discrete and continuous random variable. Property Discrete Random Variable F (x) = P (X x) = c.d.f. P P (X = t) t x Continuous Random Variable F (x) = P (X x) = Rx f (t) dt 1 F is a right continuous function for all x 2 < F is a continuous function for all x 2 < p.f./p.d.f. f (x) = P (X = x) f (x) = F 0 (x) 6= P (X = x) = 0 Probability of an event P P (X = x) P (X 2 E) = x2E P f (x) = P (a < X = x2E Total Probability P P (X = x) = x2A P f (x) = 1 x2A where A = support set of X Expectation E [g (X)] = P g (x) f (x) x2A where A = support set of X b) = F (b) Rb F (a) f (x) dx a R1 f (x) dx = 1 1 E [g (X)] = R1 g (x) f (x) dx 1 Table 2.1 Properties of discrete versus continuous random variables 2.8. INEQUALITIES 2.8 43 Inequalities In Chapter 5 we consider limiting distributions of a sequence of random variables. The following inequalities which involve the moments of a distribution are useful for proving limit theorems. 2.8.1 Markov’s Inequality P (jXj E(jXjk ) ck c) for all k; c > 0 Proof (Continuous Case) Suppose X is a continuous random variable with probability density function f (x). Let A= x: x c k 1 = fx : jxj cg since c > 0 Then E jXjk ck X c = E = Z A Z A Z x c k ! = Z1 1 x c Z k f (x) dx x k f (x) dx c A Z x k x k f (x) dx since f (x) dx c c k f (x) dx + x f (x) dx since c A k 0 1 for x 2 A A = P (jXj c) as required. (The proof of the discrete case follows by replacing integrals with sums.) 2.8.2 Chebyshev’s Inequality Suppose X is a random variable with …nite mean k>0 P (jX 2.8.3 j k ) and …nite variance 1 k2 Exercise Use Markov’s Inequality to prove Chebyshev’s Inequality. 2. Then for any 44 2. UNIVARIATE RANDOM VARIABLES 2.9 Variance Stabilizing Transformation In Chapter 6 we look at methods for constructing a con…dence interval for an unknown parameter based on data X. To do this it is often useful to …nd a transformation, g (X), of the data X whose variance is approximately constant with respect to . Suppose X is a random variable with …nite mean E (X) = . Suppose also that X has p …nite variance V ar (X) = 2 ( ) and standard deviation V ar (X) = ( ) also depending on . Let Y = g (X) where g is a di¤erentiable function. By the linear approximation g ( ) + g 0 ( ) (X Y = g (X) ) Therefore E (Y ) E g ( ) + g 0 ( ) (X ) = g( ) since E g 0 ( ) (X ) = g 0 ( ) E [(X )] = 0 Also V ar (Y ) If we want V ar (Y ) V ar g 0 ( ) (X ) = g0 ( ) constant with respect to g0 ( ) 2 2 V ar (X) = g 0 ( ) ( ) 2 (2.5) then we should choose g such that V ar (X) = g 0 ( ) ( ) 2 = constant In other words we need to solve the di¤erential equation k ( ) dg = d where k is a conveniently chosen constant. 
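The effect of a variance stabilizing transformation is easy to see by simulation. The sketch below (Python with NumPy; not part of the original Course Notes) anticipates Example 2.9.1: as θ grows, the sample variance of X grows with it, while the sample variance of √X stays near 1/4.

```python
import numpy as np

rng = np.random.default_rng(1)

for theta in [2, 5, 10, 20]:
    x = rng.poisson(theta, size=100_000)       # simulated Poisson(theta) sample
    # variance of X grows with theta; variance of sqrt(X) is roughly constant
    print(theta, round(x.var(), 2), round(np.sqrt(x).var(), 3))
```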
2.9.1 Example If X Poisson( ) then show that the random variable Y = g (X) = constant variance. p X has approximately Solution p p p If X Poisson( ) then V ar (X) = ( ) = . For g (X) = X, g 0 (X) = p Therefore by (2.5), the variance of Y = g (X) = X is approximately g0 ( ) ( ) 2 = 1 2 1=2 p 2 = 1 1=2 . 2X 1 4 which is a constant. 2.9.2 Exercise If X Exponential( ) then show that the random variable Y = g (X) = log X has approximately constant variance. 2.10. MOMENT GENERATING FUNCTIONS 2.10 45 Moment Generating Functions If we are given the probability (density) function of a random variable X or the cumulative distribution function of the random variable X then we can determine everything there is to know about the distribution of X. There is a third type of function, the moment generating function, which also uniquely determines a distribution. The moment generating function is closely related to other transforms used in mathematics, the Laplace and Fourier transforms. Moment generating functions are a powerful tool for determining the distributions of functions of random variables (Chapter 4), particularly sums, as well as determining the limiting distribution of a sequence of random variables (Chapter 5). 2.10.1 De…nition - Moment Generating Function If X is a random variable then M (t) = E(etX ) is called the moment generating function (m.g.f.) of X if this expectation exists for all t 2 ( h; h) for some h > 0. Important: When determining the moment generating function M (t) of a random variable the values of t for which the expectation exists should always be stated. 2.10.2 Example (a) Find the moment generating function of the random variable X (b) Find the moment generating function of the random variable X Gamma( ; ). Negative Binomial(k; p). Solution (a) If X Gamma( ; ) then M (t) = Z1 = Z1 tx e f (x) dx = 1 Z1 etx x 1 1 x ( ) e x= dx 0 1 ( ) 1 x e t dx which converges for t < 0 1 = ( ) Z1 1 x e 1 x t dx let y = 1 t x 0 1 = ( ) Z1 1 y 1 t e y e y 1 t dy 0 = 1 ( ) 1 1 t Z1 1 y 0 = 1 1 t for t < 1 dy = ( ) ( ) 1 1 t 1 46 (b) If X 2. UNIVARIATE RANDOM VARIABLES Negative Binomial(k; p) then M (t) = 1 P etx x20 = pk 1 P x20 = pk 1 = 2.10.3 k k p ( q)x where q = 1 p x k x which converges for et ( q) x k qet p 1 qet by 2.11.3(2) for et < q qet < 1 1 k for t < log (q) Exercise (a) Show that the moment generating function of the random variable X n is M (t) = q + pet for t 2 <. (b) Show that the moment generating function of the random variable X t M (t) = e (e 1) for t 2 <. Binomial(n; p) Poisson( ) is If the moment generating function of random variable X exists then the following theorem gives us a method for determining the distribution of the random variable Y = aX + b which is a linear function of X. 2.10.4 Theorem - Moment Generating Function of a Linear Function Suppose the random variable X has moment generating function MX (t) de…ned for t 2 ( h; h) for some h > 0. Let Y = aX + b where a; b 2 R and a 6= 0. Then the moment generating function of Y is MY (t) = ebt MX (at) for jtj < h jaj Proof MY (t) = E etY = E et(aX+b) = ebt E eatX which exists for jatj < h h = ebt MX (at) for jtj < jaj as required. 2.10.5 Example (a) Find the moment generating function of Z N(0; 1). (b) Use (a) and the fact that X = + Z of a N( ; 2 ) random variable. 2) N( ; to …nd the moment generating function 2.10. 
MOMENT GENERATING FUNCTIONS 47 Solution (a) The moment generating function of Z is MZ (t) = Z1 1 etz p e 2 = Z1 2 1 p e (z 2 1 1 t2 =2 = e Z1 1 t2 =2 = e z 2 =2 dz 2tz )=2 1 p e 2 dz (z t)2 =2 dz for t 2 < by 2.4.4(2) since 1 p e 2 (z t)=2 is the probability density function of a N (t; 1) random variable. (b) By Theorem 2.10.4 the moment generating function of X = + Z is MX (t) = e t MZ ( t) = e t e( = e 2.10.6 t+ t)2 =2 2 t2 =2 for t 2 < Exercise If X Negative Binomial(k; p) then …nd the moment generating function of Y = X + k, k = 1; 2; : : : 2.10.7 Theorem - Moments from Moment Generating Function Suppose the random variable X has moment generating function M (t) de…ned for t 2 ( h; h) for some h > 0. Then M (0) = 1 and M (k) (0) = E(X k ) for k = 1; 2; : : : where M (k) (t) = is the kth derivative of M (t). dk M (t) dtk 48 2. UNIVARIATE RANDOM VARIABLES Proof (Continuous Case) Note that M (0) = E(X 0 ) = E (1) = 1 and also that dk tx e = xk etx k = 1; 2; : : : (2.6) dtk The result (2.6) can be proved by induction. Now if X is a continuous random variable with moment generating function M (t) de…ned for t 2 ( h; h) for some h > 0 then M (k) (t) = = = dk E etX dtk Z1 dk etx f (x) dx dtk Z1 1 dk tx e f (x) dx dtk k = 1; 2; : : : 1 assuming the operations of di¤erentiation and integration can be exchanged. (This interchange of operations cannot always be done but for the moment generating functions of interest in this course the result does hold.) Using (2.6) we have M (k) (t) = Z1 xk etx f (x) dx 1 = E X k etX t 2 ( h; h) for some h > 0 Letting t = 0 we obtain M (k) (0) = E X k k = 1; 2; : : : as required. 2.10.8 Example If X Gamma( ; ) then M (t) = (1 Theorem 2.10.7. t) , t < 1= . Find E X k , k = 1; 2; : : : using Solution d d M (t) = M 0 (t) = (1 dt dt = (1 t) t) 1 2.10. MOMENT GENERATING FUNCTIONS 49 so E (X) = M 0 (0) = d2 d2 00 M (t) = M (t) = (1 dt2 dt2 = ( + 1) 2 (1 t) 2 t) so E X 2 = M 00 (0) = ( + 1) 2 Continuing in this manner we have dk dk (k) M (t) = M (t) = (1 dtk dtk = ( + 1) ( +k t) 1) k (1 t) k for k = 1; 2; : : : so E Xk = M (k) (0) = 2.10.9 ( + 1) ( +k 1) k =( +k 1)(k) k for k = 1; 2; : : : Important Idea Suppose M (k) (t), k = 1; 2; : : : exists for t 2 ( h; h) for some h > 0, then M (t) has a Maclaurin series given by 1 M (k) (0) P tk k! k=0 where M (0) (0) = M (0) = 1 The coe¢ cient of tk in this power series is equal to M (k) (0) E(X k ) = k! k! Therefore if we can obtain a Maclaurin series for M (t), for example, by using the Binomial series or the Exponential series, then we can …nd E(X k ) by using E(X k ) = k! coe¢ cient of tk in the Maclaurin series for M (t) (2.7) 50 2.10.10 Suppose X M (t) = (1 2. UNIVARIATE RANDOM VARIABLES Example Gamma( ; ). Find E X k t) , t < 1= . by using the Binomial series expansion for Solution M (t) = (1 1 P = = t) k k=0 1 P k k=0 ( t)k ( )k tk for jtj < 1 for jtj < by 2.11.3(2) 1 The coe¢ cient of tk in M (t) is )k ( k for k = 1; 2; : : : Therefore E(X k ) = k! coe¢ cient of tk in the Maclaurin series for M (t) = k! k ( )k )(k) = k! ( )k k! = ( )( 1) ( ( = ( +k 1) ( + k = ( +k 1)(k) k k + 2) ( 2) ( + 1) ( ) k + 1) ( )k k for k = 1; 2; : : : which is the same result as obtained in Example 2.10.8. Moment generating functions are particularly useful for …nding distributions of sums of independent random variables. The following theorem plays an important role in this technique. 
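Before stating that theorem, a brief computational aside (not part of the original notes): the moment formula obtained in Examples 2.10.8 and 2.10.10, $E(X^k) = \alpha(\alpha+1)\cdots(\alpha+k-1)\,\beta^k$ for $X \sim$ Gamma($\alpha, \beta$), can be checked numerically. The sketch below uses NumPy with assumed illustrative values $\alpha = 2$, $\beta = 3$ and compares the formula with a Monte Carlo estimate of $E(X^k)$.

```python
import math
import numpy as np

# Numerical check of Examples 2.10.8 / 2.10.10: for X ~ Gamma(alpha, beta),
# E(X^k) = alpha*(alpha+1)*...*(alpha+k-1) * beta^k = Gamma(alpha+k)/Gamma(alpha) * beta^k.
# alpha = 2, beta = 3 are assumed illustrative values.

alpha, beta = 2.0, 3.0
rng = np.random.default_rng(2022)
x = rng.gamma(shape=alpha, scale=beta, size=1_000_000)

for k in range(1, 4):
    formula = math.gamma(alpha + k) / math.gamma(alpha) * beta**k
    monte_carlo = np.mean(x**k)
    print(f"k = {k}: formula = {formula:8.1f}, Monte Carlo estimate = {monte_carlo:8.1f}")
```

The two columns should agree closely, which is the kind of quick sanity check that is useful whenever a moment formula has been derived from a moment generating function.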
2.10.11 Uniqueness Theorem for Moment Generating Functions Suppose the random variable X has moment generating function MX (t) and the random variable Y has moment generating function MY (t). Suppose also that MX (t) = MY (t) for all t 2 ( h; h) for some h > 0. Then X and Y have the same distribution, that is, P (X s) = FX (s) = FY (s) = P (Y s) for all s 2 <. Proof See Problem 18 for the proof of this result in the discrete case. 2.10. MOMENT GENERATING FUNCTIONS 2.10.12 If X 51 Example Exponential(1) then …nd the distribution of Y = + X where > 0 and 2 <. Solution From Example 2.4.10 we know that if X MX (t) = Exponential(1) then X 1 1 Gamma(1; 1) so for t < 1 t By Theorem 2.10.4 MY (t) = e t MX ( t) for t < 1 e t 1 = for t < 1 t By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a Two Parameter Exponential( ; ) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, Y has a Two Parameter Exponential( ; ) distribution. 2.10.13 If X Example Gamma( ; ), where is a positive integer, then show 2X 2 (2 ). Solution (a) From Example 2.10.2 the moment generating function of X is 1 M (t) = 1 for t < t 1 By Theorem 2.10.4 the moment generating function of Y = 2 MY (t) = MX 2 = 4 = t 1 2 1 1 1 2t t 5 2 for jtj < for jtj < is 1 for jtj < 3 2X ( ) 1 2 1 2 By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a 2 (2 ) random variable if is a positive integer. Therefore by the Uniqueness Theorem for Moment Generating Functions, Y has a 2 (2 ) distribution if is a positive integer. 52 2.10.14 2. UNIVARIATE RANDOM VARIABLES Exercise Suppose the random variable X has moment generating function 2 =2 M (t) = et for t 2 < (a) Use (2.7) and the Exponential series 2.11.7 to …nd E (X) and V ar (X). (b) Find the moment generating function of Y = 2X 2.11 Calculus Review 2.11.1 Geometric Series 1 P atx = a + at + at2 + = x=0 2.11.2 1. What is the distribution of Y ? a 1 for jtj < 1 t Useful Results (1) 1 P 1 tx = 1 x=0 for jtj < 1 t (2) 1 P xtx 1 = x=1 2.11.3 1 t)2 (1 for jtj < 1 Binomial Series (1) For n 2 Z + (the positive integers) (a + b)n = n P n x n a b x x=0 where n x = x n! n(x) = x! (n x)! x! (2) For n 2 Q (the rational numbers) and jtj < 1 (1 + t)n = 1 P x=0 where n x = n(x) n(n = x! 1) n x t x (n x! x + 1) 2.11. CALCULUS REVIEW 2.11.4 53 Important Identities n x (1) x(k) (2) 2.11.5 x+k x 1 = n(k) n x x+k 1 k 1 = k k = ( 1)x k x Multinomial Theorem If n is a positive integer and a1 ; a2 ; : : : ; ak are real numbers, then (a1 + a2 + + ak )n = PP P n! ax1 ax2 x1 !x2 ! xk ! 1 2 axk k where the summation extends over all non-negative integers x1 ; x2 ; : : : ; xk with x1 + x2 + + xk = n. 2.11.6 Hypergeometric Identity 1 P x=0 2.11.7 b n x x2 + + 1! 2! x a+b n = 1 xn P n=0 n! for x 2 < Logarithmic Series ln (1 + x) = log (1 + x) = x 2.11.9 = Exponential Series ex = 1 + 2.11.8 a x x2 x3 + 2 3 for 1<x First Fundamental Theorem of Calculus (FTC1) If f is continuous on [a; b] then the function g de…ned by g (x) = Zx f (t) dt for a x b a is continuous on [a; b], di¤erentiable on (a; b) and g 0 (x) = f (x). 1 54 2. UNIVARIATE RANDOM VARIABLES 2.11.10 Fundamental Theorem of Calculus and the Chain Rule Suppose we want the derivative with respect to x of G (x) where G (x) = h(x) Z f (t) dt for a x b a and h (x) is a di¤erentiable function on [a; b]. If we de…ne g (u) = Zu f (t) dt a then G (x) = g (h (x)). 
Then by the Chain Rule G0 (x) = g 0 (h (x)) h0 (x) = f (h (x)) h0 (x) 2.11.11 (a) If Rb for a < x < b Improper Integrals f (x) dx exists for every number b a then a Z1 f (x) dx = lim b!1 a Zb f (x) dx a provided this limit exists. If the limit exists we say the improper integral converges otherwise we say the improper integral diverges. (b) If Rb f (x) dx exists for every number a b then a Zb f (x) dx = lim a! 1 1 Zb f (x) dx a provided this limit exists. (c) If both R1 a f (x) dx and Ra f (x) dx are convergent then we de…ne 1 Z1 1 where a is any real number. f (x) dx = Za 1 f (x) dx + Z1 a f (x) dx 2.11. CALCULUS REVIEW 2.11.12 55 Comparison Test for Improper Integrals Suppose that f and g are continuous functions with f (x) g (x) 0 for x (a) Z1 Z1 If f (x) dx is convergent then g (x) dx is convergent. a (b) If a Z1 g (x) dx is divergent then a 2.11.13 a. Z1 f (x) dx is divergent. a Useful Result for Comparison Test Z1 1 dx converges if and only if p > 1 xp (2.8) 1 2.11.14 Useful Inequalities 1 1 + yp 1 1 + yp 2.11.15 yp 1 yp for y 1 1 = p p +y 2y 1, p > 0 for y 1, p > 0 Taylor’s Theorem Suppose f is a real-valued function such that the derivatives f (1) ; f (2) ; : : : ; f (n+1) all exist on an open interval containing the point x = a. Then f (x) = f (a)+f (1) (a) (x a)+ for some c between a and x. f (2) (a) (x 2! a)2 + + f (n) (a) (x n! a)n + f (n+1) (c) (x (n + 1)! a)n+1 56 2. UNIVARIATE RANDOM VARIABLES 2.12 Chapter 2 Problems 1. Consider the following functions: (a) f (x) = kx x for x = 1; 2; : : :; 0 < < 1 h i 1 (b) f (x) = k 1 + (x= )2 for x 2 <, >0 (c) f (x) = ke jx (d) f (x) = k (1 (e) f (x) = x) kx2 e (f) f (x) = kx j x for x 2 <, 2< for x > 0, >0 for x >0 for 0 < x < 1, ( +1) (g) f (x) = ke x= (h) f (x) = kx 3 e 1=( x) 1+e x) x x= 2 for x 2 <, for x > 0, (i) f (x) = k (1 + x) (j) f (x) = k (1 1, for x > 0, 1 >0 >0 >0 >1 for 0 < x < 1, >0 In each case: (1) Determine k so that f (x) is a probability (density) function and sketch f (x). (2) Let X be a random variable with probability (density) function f (x). Find the cumulative distribution function of X. (3) Find E(X) and V ar(X) using the probability (density) function. Indicate the values of for which E(X) and V ar(X) exist. (4) Find P (0:5 < X 2) and P (X > 0:5jX 2). In (a) use = 0:3, in (b) use = 1, in (c) use = 0, in (d) use = 5, in (e) use = 1, in (f ) use = 1, in (g) use = 2, in (h) use = 1, in (i) use = 2, in (j) use = 3. 2. Determine if is a location parameter, a scale parameter, or neither for the distributions in (b) (j) of Problem 1. 3. (a) If X Weibull(2; ) then show is a scale parameter for this distribution. (b) If X Uniform(0; ) then show is a scale parameter for this distribution. 4. Suppose X is a continuous random variable with probability density function 8 <ke (x )2 =2 jx j c f (x) = 2 :ke cjx j+c =2 jx j>c (a) Show that where p 2 1 2 = e c =2 + 2 [2 (c) 1] k c is the N(0; 1) cumulative distribution function. 2.12. CHAPTER 2 PROBLEMS 57 (b) Find the cumulative distribution function of X, E (X) and V ar (X). (c) Show that is a location parameter for this distribution. (d) On the same graph sketch f (x) for c = 1, = 0, f (x) for c = 2, N(0; 1) probability density function. What do you notice? = 0 and the 5. The Geometric and Exponential distributions both have a property referred to as the memoryless property. (a) Suppose X Geometric(p). Show that P (X k + jjX k) = P (X j) where k and j are nonnegative integers. Explain why this is called the memoryless property. 
(b) Show that if Y Exponential( ) then P (Y a > 0 and b > 0. a + bjY a) = P (Y b) where 6. Suppose that f1 (x) ; f2 (x) ; : : : ; fk (x) are probability density functions with support sets A1 ; A2 ; : : : ; Ak ; means 1 ; 2 ; : : : ; k ; and …nite variances 21 ; 22 ; : : : ; 2k respeck P tively. Suppose that 0 < p1 ; p2 ; : : : ; pk < 1 and pi = 1. i=1 (a) Show that g (x) = k P pi fi (x) is a probability density function. i=1 (b) Let X be a random variable with probability density function g (x). Find the support set of X, E (X) and V ar (X). 7. (a) If X Gamma( ; ) then …nd the probability density function of Y = eX . (b) If X Gamma( ; ) then show Y = 1=X (c) If X Gamma(k; ) then show Y = 2X= Inverse Gamma( ; ). 2 (2k) for k = 1; 2; : : :. then …nd the probability density function of Y = eX . then …nd the probability density function of Y = X (d) If X N ; 2 (e) If X N ; 2 (f) If X Uniform( (g) If X Pareto( ; ) then show that Y = (h) If X Weibull(2; ) then show that Y = X 2 2; 2) then show that Y = tan X log(X= ) 1. Cauchy(1; 0). Exponential(1). Exponential( 2 ). (i) If X Double Exponential(0; 1) then …nd the probability density function of 2 Y =X . (j) If X t(k) then show that Y = X 2 F(1; k). 58 2. UNIVARIATE RANDOM VARIABLES 8. Suppose T t(n). (a) Show that E (T ) = 0 if n > 1. (b) Show that V ar(T ) = n= (n 9. Suppose X 2) if n > 2. Beta(a; b). (a) Find E X k for k = 1; 2; : : :. Use this result to …nd E (X) and V ar (X). (b) Graph the probability density function for (i) a = 0:7, b = 0:7, (ii) a = 1, b = 3, (iii) a = 2, b = 2, (iv) a = 2, b = 4, and (v) a = 3, b = 1 on the same graph. (c) What special probability density function is obtained for a = b = 1? 10. If E(jXjk ) exists for some integer k > 1, then show that E(jXjj ) exists for j = 1; 2; : : : ; k 1. 11. If X Binomial(n; ), …nd the variance stabilizing transformation g (X) such that V ar [g (X)] is approximately constant. 12. Prove that for any random variable X, 1 P 4 E X4 X2 1 2 13. For each of the following probability (density) functions derive the moment generating function M (t). State the values for which M (t) exists and use the moment generating function to …nd the mean and variance. (a) f (x) = (b) f (x) = n x x p (1 xe =x! (c) f (x) = 1 e (x )= (d) f (x) = 21 e jx j p)n x for x = 0; 1; : : : ; n; 0 < p < 1 for x = 0; 1; : : :; for x > ; for x 2 <; 2 <, >0 >0 2< (e) f (x) = 2x for 0 < x < 1 8 > 0 x 1 > <x (f) f (x) = 2 > > :0 x 1<x 2 otherwise 14. Suppose X is a random variable with moment generating function M (t) = E(etX ) which exists for t 2 ( h; h) for some h > 0. Then K(t) = log M (t) is called the cumulant generating function of X. (a) Show that E(X) = K 0 (0) and V ar(X) = K 00 (0). 2.12. CHAPTER 2 PROBLEMS (b) If X 59 Negative Binomial(k; p) then use (a) to …nd E(X) and V ar(X). 15. For each of the following …nd the Maclaurin series for M (t) using known series. Thus determine all the moments of X if X is a random variable with moment generating function M (t): (a) M (t) = (1 t) 3 (b) M (t) = (1 + t)=(1 (c) M (t) = et =(1 16. Suppose Z for jtj < 1 t) for jtj < 1 t2 ) for jtj < 1 N(0; 1) and Y = jZj. 2 (a) Show that MY (t) = 2 (t) et =2 for t 2 < where ution function of a N(0; 1) random variable. (t) is the cumulative distrib- (b) Use (a) to …nd E (jZj) and V ar (jZj). 2 and Z 17. Suppose X N(0; 1). Use the properties of moment generating func(1) k tions to compute E X and E Z k for k = 1; 2; : : : How are these two related? Is this what you expected? 18. 
Suppose X and Y are discrete random variables such that P (X = j) = pj and P (Y = j) = qj for j = 0; 1; : : :. Suppose also that MX (t) = MY (t) for t 2 ( h; h), h > 0. Show that X and Y have the same distribution. (Hint: Compare MX (log s) and MY (log s) and recall that if two power series are equal then their coe¢ cients are equal.) 19. Suppose X is a random variable with moment generating function M (t) = et =(1 for jtj < 1. (a) Find the moment generating function of Y = (X 1) =2. (b) Use the moment generating function of Y to …nd E (Y ) and V ar (Y ). (c) What is the distribution of Y ? t2 ) 60 2. UNIVARIATE RANDOM VARIABLES 3. Multivariate Random Variables Models for real phenomena usually involve more than a single random variable. When there are multiple random variables associated with an experiment or process we usually denote them as X; Y; : : : or as X1 ; X2 ; : : : . For example, your …nal mark in a course might involve X1 = your assignment mark, X2 = your midterm test mark, and X3 = your exam mark. We need to extend the ideas introduced in Chapter 2 for univariate random variables to deal with multivariate random variables. In Section 3:1 we began be de…ning the joint and marginal cumulative distribution functions since these de…nitions hold regardless of what type of random variable we have. We de…ne these functions in the case of two random variables X and Y . More than two random variables will be considered in speci…c examples in later sections. In Section 3:2 we brie‡y review discrete joint probability functions and marginal probability functions that were introduced in a previous probability course. In Section 3:3 we introduce the ideas needed for two continuous random variables and look at detailed examples since this is new material. In Section 3:4 we de…ne independence for two random variables and show how the Factorization Theorem for Independence can be used. When two random variables are not independent then we are interested in conditional distributions. In Section 3:5 we review the de…nition of a conditional probability function for discrete random variables and de…ne a conditional probability density function for continuous random variables which is new material. In Section 3:6 we review expectations of functions of discrete random variables. We also de…ne expectations of functions of continuous random variables which is new material except for the case of Normal random variables. In Section 3:7 we de…ne conditional expectations which arise from the conditional distributions discussed in Section 3:5. In Section 3:8 we discuss moment generating functions for two or more random variables, and show how the Factorization Theorem for Moment Generating Functions can be used to prove that random variables are independent. In Section 3:9 we review the Multinomial distribution and its properties. In Section 3:10 we introduce the very important Bivariate Normal distribution and its properties. Section 3:11 contains some useful results related to evaluating double integrals. 61 62 3. MULTIVARIATE RANDOM VARIABLES 3.1 Joint and Marginal Cumulative Distribution Functions We begin with the de…nitions and properties of the cumulative distribution functions associated with two random variables. 3.1.1 De…nition - Joint Cumulative Distribution Function Suppose X and Y are random variables de…ned on a sample space S. 
The joint cumulative distribution function of X and Y is given by F (x; y) = P (X 3.1.2 x; Y y) for (x; y) 2 <2 Properties - Joint Cumulative Distribution Function (1) F is non-decreasing in x for …xed y (2) F is non-decreasing in y for …xed x (3) lim F (x; y) = 0 and lim F (x; y) = 0 (4) x! 1 lim (x;y)!( 1; 1) 3.1.3 y! 1 F (x; y) = 0 and lim (x;y)!(1;1) F (x; y) = 1 De…nition - Marginal Distribution Function The marginal cumulative distribution function of X is given by F1 (x) = lim F (x; y) y!1 = P (X x) for x 2 < The marginal cumulative distribution function of Y is given by F2 (y) = lim F (x; y) x!1 = P (Y y) for y 2 < Note: The de…nitions and properties of the joint cumulative distribution function and the marginal cumulative distribution functions hold for both (X; Y ) discrete random variables and for (X; Y ) continuous random variables. Joint and marginal cumulative distribution functions for discrete random variables are not very convenient for determining probabilities. Joint and marginal probability functions, which are de…ned in Section 3.2, are more frequently used for discrete random variables. In Section 3.3 we look at speci…c examples of joint and marginal cumulative distribution functions for continuous random variables. In Chapter 5 we will see the important role of cumulative distribution functions in determining asymptotic distributions. 3.2. BIVARIATE DISCRETE DISTRIBUTIONS 3.2 63 Bivariate Discrete Distributions Suppose X and Y are random variables de…ned on a sample space S. If there is a countable subset A <2 such that P [(X; Y ) 2 A] = 1, then X and Y are discrete random variables. Probabilities for discrete random variables are most easily handled in terms of joint probability functions. 3.2.1 De…nition - Joint Probability Function Suppose X and Y are discrete random variables. The joint probability function of X and Y is given by f (x; y) = P (X = x; Y = y) for (x; y) 2 <2 The set A = f(x; y) : f (x; y) > 0g is called the support set of (X; Y ). 3.2.2 Properties of Joint Probability Function (1) f (x; y) 0 for (x; y) 2 <2 P P f (x; y) = 1 (2) (x;y)2 A (3) For any set R 3.2.3 <2 P [(X; Y ) 2 R] = P P f (x; y) (x;y)2 R De…nition - Marginal Probability Function Suppose X and Y are discrete random variables with joint probability function f (x; y). The marginal probability function of X is given by f1 (x) = P (X = x) P = f (x; y) for x 2 < all y and the marginal probability function of Y is given by f2 (y) = P (Y = y) P = f (x; y) for y 2 < all x 3.2.4 Example In a fourth year statistics course there are 10 actuarial science students, 9 statistics students and 6 math business students. Five students are selected at random without replacement. Let X be the number of actuarial science students selected, and let Y be the number of statistics students selected. 64 3. MULTIVARIATE RANDOM VARIABLES Find (a) the joint probability function of X and Y (b) the marginal probability function of X (c) the marginal probability function of Y (d) P (X > Y ) Solution (a) The joint probability function of X and Y is f (x; y) = P (X = x; Y = y) 10 9 6 x y 5 x y = 25 5 for x = 0; 1; : : : ; 5, y = 0; 1; : : : ; 5, x + y 5 (b) The marginal probability function of X is f1 (x) = P (X = x) 10 9 1 X x y = y=0 = = 10 x 10 x 6 x 5 25 5 15 5 25 5 x 1 X y=0 9 y y 6 x 5 y 15 5 x 15 5 25 5 x for x = 0; 1; : : : ; 5 by the Hypergeometric identity 2.11.6. Note that the marginal probability function of X is Hypergeometric(25; 10; 5). 
This makes sense because, when we are only interested in the number of actuarial science students, we only have two types of objects (actuarial science students and non-actuarial science students) and we are sampling without replacement which gives us the familiar Hypergeometric probability function. 3.2. BIVARIATE DISCRETE DISTRIBUTIONS 65 (c) The marginal probability function of Y is f2 (y) = P (Y = y) 10 9 1 X x y = x=0 9 y = 9 y = 5 25 5 16 5 y 25 5 16 5 y 25 5 1 X x=0 6 x 10 x y 6 x 5 16 5 y y for y = 0; 1; : : : ; 5 by the Hypergeometric identity 2.11.6. The marginal probability function of Y is Hypergeometric(25; 9; 5). (d) P (X > Y ) = P P f (x; y) (x;y): x>y = f (1; 0) + f (2; 0) + f (3; 0) + f (4; 0) + f (5; 0) +f (2; 1) + f (3; 1) + f (4; 1) +f (3; 2) 3.2.5 Exercise The Hardy-Weinberg law of genetics states that, under certain conditions, the relative frequencies with which three genotypes AA; Aa and aa occur in the population will be 2 ; 2 (1 ) and (1 )2 respectively where 0 < < 1. Suppose n members of a very large population are selected at random. Let X be the number of AA types selected and let Y be the number of Aa types selected. Find (a) the joint probability function of X and Y (b) the marginal probability function of X (c) the marginal probability function of Y (d) P (X + Y = t) for t = 0; 1; : : :. 66 3.3 3. MULTIVARIATE RANDOM VARIABLES Bivariate Continuous Distributions Probabilities for continuous random variables can also be specifed in terms of joint probability density functions. 3.3.1 De…nition - Joint Probability Density Function Suppose that F (x; y) is a continuous function and that f (x; y) = @2 F (x; y) @x@y exists and is a continuous function except possibly along a …nite number of curves. Suppose also that Z1 Z1 f (x; y)dxdy = 1 1 1 Then X and Y are said to be continuous random variables with joint probability density function f . The set A = f(x; y) : f (x; y) > 0g is called the support set of (X; Y ). Note: We will arbitrarily de…ne f (x; y) to be equal to 0 when although we could de…ne it to be any real number. 3.3.2 @2 @x@y F (x; y) does not exist Properties - Joint Probability Density Function (1) 0 for all (x; y) 2 <2 f (x; y) (2) P [(X; Y ) 2 R] = ZZ f (x; y)dxdy for R <2 R = the volume under the surface z = f (x; y) and above the region R in the xy 3.3.3 plane Example Suppose X and Y are continuous random variables with joint probability density function f (x; y) = x + y for 0 < x < 1; 0 < y < 1 and 0 otherwise. The support set of (X; Y ) is A = f(x; y) : 0 < x < 1; 0 < y < 1g. 3.3. BIVARIATE CONTINUOUS DISTRIBUTIONS 67 The joint probability function for (x; y) 2 A is graphed in Figure 3.1. We notice that the surface is the portion of the plane z = x + y lying above the region A. 2 1.5 1 0.5 0 1 1 0.8 0.5 0.6 0.4 0.2 0 0 Figure 3.1: Graph of joint probability density function for Example 3.3.3 (a) Show that Z1 Z1 1 (b) Find (i) P X 1 3; Y (ii) P (X Y) 1 2 1 2 (iii) P X + Y (iv) P XY 1 2 . 1 f (x; y)dxdy = 1 68 3. MULTIVARIATE RANDOM VARIABLES Solution (a) A graph of the support set for (X; Y ) is given in Figure 3.2. Such a graph is useful for determining the limits of integration of the double integral. 1 y A 0 1 x Figure 3.2: Graph of the support set of (X; Y ) for Example 3.3.3 Z1 Z1 1 f (x; y)dxdy = 1 Z Z (x + y) dxdy (x;y) 2A = Z1 Z1 (x + y) dxdy = Z1 1 2 x + xy j10 dy 2 = Z1 0 0 0 1 + y dy 2 0 1 1 y + y 2 j10 2 2 1 1 = + 2 2 = 1 = 3.3. 
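As a computational aside (not in the original notes), the integral in part (a) can also be confirmed numerically. The sketch below uses scipy.integrate.dblquad on the joint probability density function $f(x, y) = x + y$ of Example 3.3.3; the same call is reused to estimate one of the probabilities treated in part (b).

```python
from scipy.integrate import dblquad

# Numerical check of Example 3.3.3. dblquad integrates func(y, x) with the
# x limits given first and the y limits given as functions of x.
f = lambda y, x: x + y

# Part (a): total probability over the support A = (0,1) x (0,1); should be 1.
total, _ = dblquad(f, 0, 1, lambda x: 0.0, lambda x: 1.0)
print("integral of f over A:", round(total, 6))

# Part (b)(i): P(X <= 1/3, Y <= 1/2); compare with the exact value 5/72
# derived analytically below.
prob, _ = dblquad(f, 0, 1/3, lambda x: 0.0, lambda x: 0.5)
print("P(X <= 1/3, Y <= 1/2):", round(prob, 6))
```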
BIVARIATE CONTINUOUS DISTRIBUTIONS 69 (b) (i) A graph of the region of integration is given in Figure 3.3 y 1 1/2 B 1/3 0 x 1 Figure 3.3: Graph of the integration region for Example 3.3.3(b)(i) P X 1 ;Y 3 1 2 = Z Z (x + y) dxdy (x;y) 2B Z1=2Z1=3 (x + y) dxdy = 0 = 0 Z1=2 1 2 1=3 x + xy j0 dy 2 0 = Z1=2" 1 2 1 3 2 0 = = = # 1 + y dy 3 1 1 1=2 y + y 2 j0 18 6 1 18 5 72 1 2 + 1 6 1 2 2 70 3. MULTIVARIATE RANDOM VARIABLES (ii) A graph of the region of integration is given in Figure 3.4. Note that when the region is not rectangular then care must be taken with the limits of integration. y y=x 1 C (y,y) x 0 1 Figure 3.4: Graph of the integration region for Example 3.3.3(b)(ii) P (X Z Y)= = Z (x + y) dxdy (x;y) 2C Z1 Zy (x + y) dxdy y=0 x=0 = Z1 1 2 x + xy jy0 dy 2 0 = Z1 1 2 y + y 2 dy 2 Z1 3 2 1 y dy = y 3 j10 2 2 0 = 0 = 1 2 Alternatively P (X Y)= Z1 Z1 (x + y) dydx x=0 y=x Why does the answer of 1=2 make sense when you look at Figure 3.1? 3.3. BIVARIATE CONTINUOUS DISTRIBUTIONS 71 (iii) A graph of the region of integration is given in Figure 3.5. y 1 1/2 (x,1/2-x) D 1/2 0 x 1 x+y=1/2 Figure 3.5: Graph of the region of integration for Example 3.3.3(b)(iii) P 1 2 X +Y = Z Z (x + y) dxdy (x;y) 2D 1 = 1 2 Z2 Z x (x + y) dydx x=0 y=0 1 = Z2 1 1 xy + y 2 j02 2 x dx 0 Z2 " 1 = 1 2 x 1 x + 2 1 2 2 x 0 # dx 1 = Z2 x2 1 + 2 8 dx = x3 1 1=2 + x j0 6 8 0 = 2 48 Alternatively 1 P X +Y 1 2 = 1 2 Z2 Z y (x + y) dxdy y=0 x=0 Why does this small probability make sense when you look at Figure 3.1? 72 3. MULTIVARIATE RANDOM VARIABLES (iv) A graph of the region of integration E is given in Figure 3.6. In this example the integration can be done more easily by integrating over the region F . y 1 (x,1/(2x)) F 1/2 xy=1/2 E x | 1/2 0 1 Figure 3.6: Graph of the region of integration for Example 3.3.3(b)(iv) P XY 1 2 = Z Z (x;y) 2E = 1 Z (x + y) dxdy Z (x + y) dxdy (x;y) 2F = 1 Z1 Z1 Z1 1 xy + y 2 j11 dx 2x 2 (x + y) dydx 1 x= 12 y= 2x = 1 1 2 = 1 Z1 ( 1 x+ 2 1 2 = 1 Z1 " x 1 8x2 dx 1 2 1 x + 2 8x j11 x 1 2 = 1 = 3 4 2 1 2x 1 + 2 1 2x 2 #) dx 3.3. BIVARIATE CONTINUOUS DISTRIBUTIONS 3.3.4 73 De…nition of Marginal Probability Density Function Suppose X and Y are continuous random variables with joint probability density function f (x; y). The marginal probability density function of X is given by f1 (x) = Z1 f (x; y)dy 1 for x 2 < and the marginal probability density function of Y is given by f2 (y) = Z1 1 3.3.5 f (x; y)dx for y 2 < Example For the joint probability density function in Example 3.3.3 determine: (a) the marginal probability density function of X and the marginal probability density function of Y (b) the joint cumulative distribution function of X and Y (c) the marginal cumulative distribution function of X and the marginal cumulative distribution function of Y Solution (a) The marginal probability density function of X is f1 (x) = Z1 f (x; y)dy = 1 Z1 1 1 xy + y 2 j10 = x + 2 2 (x + y) dy = for 0 < x < 1 0 and 0 otherwise. Since both the joint probability density function f (x; y) and the support set A are symmetric in x and y then by symmetry the marginal probability density function of Y is f2 (y) = y + 1 2 for 0 < y < 1 and 0 otherwise. (b) Since P (X x; Y y) = Zy Zx 0 = = (s + t) dsdt = 0 Zy 1 2 s + st jx0 dt = 2 0 1 2 1 x t + xt2 jy0 2 2 1 2 x y + xy 2 2 for 0 < x < 1, 0 < y < 1 Zy 0 1 2 x + xt dt 2 74 3. 
MULTIVARIATE RANDOM VARIABLES P (X x; Y y) = Zx Z1 (s + t) dtds = 1 2 x +x 2 for 0 < x < 1, y y) = Zy Z1 (s + t) dsdt = 1 2 y +y 2 for x 0 P (X x; Y 0 0 1, 0 < y < 1 0 the joint cumulative distribution function of X and Y is F (x; y) = P (X x; Y y) = 8 > > > > > > > < > > > > > > > : 0 1 2 x2 y 1 2 1 2 + x xy 2 0 or y 0 0 < x < 1, 0 < y < 1 x2 + x 0 < x < 1, y y2 x +y 1 1 1, 0 < y < 1 x 1, y 1 (c) Since the support set of (X; Y ) is A = f(x; y) : 0 < x < 1; 0 < y < 1g then F1 (x) = P (X = x) lim F (x; y) = lim P (X y!1 x; Y y!1 = F (x; 1) 1 2 x +x = 2 y) for 0 < x < 1 Alternatively F1 (x) = P (X = 1 x) = 1 2 x +x 2 Zx f1 (s) ds = 1 Zx s+ 1 2 0 for 0 < x < 1 In either case the marginal cumulative distribution function of X is 8 0 x 0 > > < 1 2 0<x<1 F1 (x) = 2 x +x > > : 1 x 1 By symmetry the marginal cumulative distribution function of Y is 8 0 y 0 > > < 1 2 0<y<1 F2 (y) = 2 y +y > > : 1 y 1 ds 3.3. BIVARIATE CONTINUOUS DISTRIBUTIONS 3.3.6 75 Exercise Suppose X and Y are continuous random variables with joint probability density function f (x; y) = k (1 + x + y)3 for 0 < x < 1; 0 < y < 1 and 0 otherwise. (a) Determine k and sketch f (x; y). (b) Find (i) P (X 1; Y (ii) P (X Y ) (iii) P (X + Y 2) 1) (c) Determine the marginal probability density function of X and the marginal probability density function of Y . (d) Determine the joint cumulative distribution function of X and Y . (e) Determine the marginal cumulative distribution function of X and the marginal cumulative distribution function of Y . 3.3.7 Exercise Suppose X and Y are continuous random variables with joint probability density function f (x; y) = ke x y for 0 < x < y < 1 and 0 otherwise. (a) Determine k and sketch f (x; y). (b) Find (i) P (X 1; Y (ii) P (X Y ) (iii) P (X + Y 2) 1) (c) Determine the marginal probability density function of X and the marginal probability density function of Y . (d) Determine the joint cumulative distribution function of X and Y . (e) Determine the marginal cumulative distribution function of X and the marginal cumulative distribution function of Y . 76 3.4 3. MULTIVARIATE RANDOM VARIABLES Independent Random Variables Suppose we are modeling a phenomenon involving two random variables. For example suppose X is your …nal mark in this course and Y is the time you spent doing practice problems. We would be interested in whether the distribution of one random variable a¤ects the distribution of the other. The following de…nition de…nes this idea precisely. The de…nition should remind you of the de…nition of independent events (see De…nition 2.1.8). 3.4.1 De…nition - Independent Random Variables Two random variables X and Y are called independent random variables if and only if P (X 2 A and Y 2 B) = P (X 2 A)P (Y 2 B) for all sets A and B of real numbers. De…nition 3.4.1 is not very convenient for determining the independence of two random variables. The following theorem shows how to use the marginal and joint cumulative distribution functions or the marginal and joint probability (density) functions to determine if two random variables are independent. 3.4.2 Theorem - Independent Random Variables (1) Suppose X and Y are random variables with joint cumulative distribution function F (x; y). Suppose also that F1 (x) is the marginal cumulative distribution function of X and F2 (y) is the marginal cumulative distribution function of Y . 
Then X and Y are independent random variables if and only if F (x; y) = F1 (x) F2 (y) for all (x; y) 2 <2 (2) Suppose X and Y are random variables with joint probability (density) function f (x; y). Suppose also that f1 (x) is the marginal probability (density) function of X with support set A1 = fx : f1 (x) > 0g and f2 (y) is the marginal probability (density) function of Y with support set A2 = fy : f2 (y) > 0g. Then X and Y are independent random variables if and only if f (x; y) = f1 (x) f2 (y) for all (x; y) 2 A1 A2 where A1 A2 = f(x; y) : x 2 A1 ; y 2 A2 g. Proof (1) For given (x; y), let Ax = fs : s xg and let By = ft : t X and Y are independent random variables if and only if yg. Then by De…nition 3.4.1 P (X 2 Ax and Y 2 By ) = P (X 2 Ax ) P (Y 2 By ) 3.4. INDEPENDENT RANDOM VARIABLES 77 for all (x; y) 2 <2 . But P (X 2 Ax and Y 2 By ) = P (X P (X 2 Ax ) = P (X x; Y y) = F (x; y) x) = F1 (x) and P (Y 2 By ) = F2 (y) Therefore X and Y are independent random variables if and only if F (x; y) = F1 (x) F2 (y) for all (x; y) 2 <2 as required. (2) (Continuous Case) From (1) we have X and Y are independent random variables if and only if F (x; y) = F1 (x) F2 (y) for all (x; y) 2 <2 . Now @ @x F1 (x) @2 exists for x 2 A1 and @ @y F2 (y) (3.1) exists for y 2 A2 . Taking the partial derivative @x@y of both sides of (3.1) where the partial derivative exists implies that X and Y are independent random variables if and only if @2 @ @ F (x; y) = F1 (x) F2 (y) @x@y @x @y for all (x; y) 2 A1 A2 or f (x; y) = f1 (x) f2 (y) for all (x; y) 2 A1 A2 as required. Note: The discrete case can be proved using an argument similar to the one used for (1). 3.4.3 Example (a) In Example 3.2.4 determine if X and Y are independent random variables. (b) In Example 3.3.3 determine if X and Y are independent random variables. Solution (a) Since the total number of students is …xed, a larger number of actuarial science students would imply a smaller number of statistics students and we would guess that the random variables are not independent. To show this we only need to …nd one pair of values (x; y) 78 3. MULTIVARIATE RANDOM VARIABLES for which P (X = x; Y = y) 6= P (X = x) P (Y = y). Since 10 0 9 6 0 5 25 5 10 15 0 5 = 25 5 9 16 0 5 = 25 5 P (X = 0; Y = 0) = P (X = 0) = P (Y = 0) = 6 5 25 5 = 15 5 25 5 16 5 25 5 and P (X = 0; Y = 0) = 6 5 25 5 6= P (X = 0) P (Y = 0) = 15 5 25 5 16 5 25 5 therefore by Theorem 3.4.2, X and Y are not independent random variables. (b) Since f (x; y) = x + y f1 (x) = x + 1 2 for 0 < x < 1; 0 < y < 1 for 0 < x < 1, f2 (y) = y + 1 2 for 0 < y < 1 it would appear that X and Y are not independent random variables. To show this we only need to …nd one pair of values (x; y) for which f (x; y) 6= f1 (x) f2 (y). Since f 2 1 ; 3 3 = 1 6= f1 2 3 f2 1 3 = 7 6 5 6 therefore by Theorem 3.4.2, X and Y are not independent random variables. 3.4.4 Exercise In Exercises 3.2.5, 3.3.6 and 3.3.7 determine if X and Y independent random variables. In the previous examples we determined whether the random variables were independent using the joint probability (density) function and the marginal probability (density) functions. The following very useful theorem does not require us to determine the marginal probability (density) functions. 3.4. INDEPENDENT RANDOM VARIABLES 3.4.5 79 Factorization Theorem for Independence Suppose X and Y are random variables with joint probability (density) function f (x; y). 
Suppose also that A is the support set of (X; Y ), A1 is the support set of X, and A2 is the support set of Y . Then X and Y are independent random variables if and only if there exist non-negative functions g (x) and h (y) such that f (x; y) = g (x) h (y) for all (x; y) 2 A1 A2 Notes: (1) If the Factorization Theorem for Independence holds then the marginal probability (density) function of X will be proportional to g and the marginal probability (density) function of Y will be proportional to h. (2) Whenever the support set A is not rectangular the random variables will not be independent. The reason for this is that when the support set is not rectangular it will always be possible to …nd a point (x; y) such that x 2 A1 with f1 (x) > 0, and y 2 A2 with f2 (y) > 0 so that f1 (x) f2 ( y) > 0, but (x; y) 2 = A so f (x; y) = 0. This means there is a point (x; y) such that f (x; y) 6= f1 (x) f2 ( y) and therefore X and Y are not independent random variables. (3) The above de…nitions and theorems can easily be extended to the random vector (X1 ; X2 ; : : : ; Xn ). Proof (Continuous Case) If X and Y are independent random variables then by Theorem 3.4.2 f (x; y) = f1 (x) f2 (y) for all (x; y) 2 A1 A2 Letting g (x) = f1 (x) and h (y) = f2 (y) proves there exist g (x) and h (y) such that f (x; y) = g (x) h (y) for all (x; y) 2 A1 A2 If there exist non-negative functions g (x) and h (y) such that f (x; y) = g (x) h (y) then f1 (x) = Z1 f (x; y)dy = Z1 f (x; y)dx = 1 and f2 (y) = 1 for all (x; y) 2 A1 A2 Z g (x) h (y) dy = cg (x) for x 2 A1 Z g (x) h (y) dy = kh (y) for y 2 A2 y2A2 x2A1 80 3. MULTIVARIATE RANDOM VARIABLES Now 1 = Z1 Z1 1 = Z f (x; y)dxdy 1 Z f1 (x) f2 (y) dxdy y2A2 x2A1 = Z Z cg (x) kh (y) dxdy y2A2 x2A1 = ck Since ck = 1 f (x; y) = g (x) h (y) = ckg (x) h (y) = cg (x) kh (y) = f1 (x) f2 (y) for all (x; y) 2 A1 A2 and by Theorem 3.4.2 X and Y are independent random variables. 3.4.6 Example Suppose X and Y are discrete random variables with joint probability function x+y f (x; y) = e 2 x!y! for x = 0; 1 : : : ; y = 0; 1; : : : (a) Determine if X and Y independent are random variables. (b) Determine the marginal probability function of X and the marginal probability function of Y . Solution (a) The support set of (X; Y ) is A = f(x; y) : x = 0; 1; : : : ; y = 0; 1; : : :g which is rectangular. The support set of X is A1 = fx : x = 0; 1; : : :g, and the support set of Y is A2 = fy : y = 0; 1; : : :g. Let x y e e g (x) = and h (y) = x! y! Then f (x; y) = g(x)h(y) for all (x; y) 2 A1 A2 . Therefore by the Factorization Theorem for Independence X and Y are independent random variables. (b) By inspection we can see that g (x) is the probability function for a Poisson( ) random variable. Therefore the marginal probability function of X is x f1 (x) = e x! for x = 0; 1 : : : 3.4. INDEPENDENT RANDOM VARIABLES 81 Similarly the marginal probability function of Y is y e y! f2 (y) = and Y 3.4.7 for y = 0; 1 : : : Poisson( ). Example Suppose X and Y are continuous random variables with joint probability density function 3 f (x; y) = y 1 2 x2 for 1 < x < 1; 0 < y < 1 and 0 otherwise. (a) Determine if X and Y independent are random variables. (b) Determine the marginal probability function of X and the marginal probability function of Y . Solution (a) The support set of (X; Y ) is A = f(x; y) : 1 < x < 1; 0 < y < 1g which is rectangular. The support set of X is A1 = fx : 1 < x < 1g, and the support set of Y is A2 = fy : 0 < y < 1g. 
Let 3 g (x) = 1 x2 and h (y) = y 2 Then f (x; y) = g(x)h(y) for all (x; y) 2 A1 A2 . Therefore by the Factorization Theorem for Independence X and Y are independent random variables. (b) Since the marginal probability density function of Y is proportional to h (y) we know f2 (y) = kh (y) for 0 < y < 1 where k is determined by 1=k Z1 3 3k 2 1 3k ydy = y j0 = 2 4 4 0 Therefore k = 4 3 and f2 (y) = 2y for 0 < y < 1 and 0 otherwise. Since X and Y are independent random variables f (x; y) = f1 (x) f2 (y) or f1 (x) = f (x; y)=f2 (y) for x 2 A1 . Therefore the marginal probability density function of X is f1 (x) = = and 0 otherwise. 3 y 1 x2 f (x; y) = 2 f2 (y) 2y 3 1 x2 for 1<x<1 4 82 3.4.8 3. MULTIVARIATE RANDOM VARIABLES Example Suppose X and Y are continuous random variables with joint probability density function f (x; y) = 2 for 0 < x < and 0 otherwise. p 1 y2, 1<y<1 (a) Determine if X and Y independent are random variables. (b) Determine the marginal probability function of X and the marginal probability function of Y . Solution (a) The support set of (X; Y ) which is n p A = (x; y) : 0 < x < 1 y2; 1<y<1 is graphed in Figure 3.7. o y 1 2 x=sqrt(1-y ) A 0 1 x -1 Figure 3.7: Graph of the support set of (X; Y ) for Example 3.4.8 The set A can also be described as n A = (x; y) : 0 < x < 1; p 1 x2 < y < p 1 x2 o The support set A is not rectangular. Note that the surface has constant height on the region A. 3.4. INDEPENDENT RANDOM VARIABLES 83 The support set for X is A1 = fx : 0 < x < 1g and the support set for Y is A2 = fy : 1 < y < 1g If we choose the point (0:9; 0:9) 2 A1 A2 , f (0:9; 0:9) = 0 but f1 (0:9) > 0 and f2 (0:9) > 0 so f (0:9; 0:9) 6= f1 (0:9) f2 (0:9) and therefore X and Y are not independent random variables. (b) When the support set is not rectangular care must be taken to determine the marginal probability functions. To …nd the marginal probability density function of X we use the description of the support set in which the range of X does not depend on y which is n o p p A = (x; y) : 0 < x < 1; 1 x2 < y < 1 x2 The marginal probability density function of X is f1 (x) = Z1 f (x; y)dy 1 p = p = and 0 otherwise. Z1 x2 2 dy 1 x2 4p 1 x2 for 0 < x < 1 To …nd the marginal probability density function of Y which use the description of the support set in which the range of Y does not depend on x which is n o p A = (x; y) : 0 < x < 1 y 2 ; 1 < y < 1 The marginal probability density function of Y is f2 (y) = Z1 f (x; y)dx 1 = p Z1 y2 2 dx 0 = and 0 otherwise. 2p 1 y2 for 1<y<1 84 3.5 3. MULTIVARIATE RANDOM VARIABLES Conditional Distributions In Section 2.1 we de…ned the conditional probability of event A given event B as P (AjB) = P (A \ B) P (B) provided P (B) > 0 The concept of conditional probability can also be extended to random variables. 3.5.1 De…nition - Conditional Probability (Density) Function Suppose X and Y are random variables with joint probability (density) function f (x; y), and marginal probability (density) functions f1 (x) and f2 (y) respectively. Suppose also that the support set of (X; Y ) is A = f(x; y) : f (x; y) > 0g. The conditional probability (density) function of X given Y = y is given by f1 (xjy) = f (x; y) f2 (y) (3.2) for (x; y) 2 A provided f2 (y) 6= 0. The conditional probability (density) function of Y given X = x is given by f2 (yjx) = f (x; y) f1 (x) for (x; y) 2 A provided f1 (x) 6= 0. Notes: (1) If X and Y are discrete random variables then f1 (xjy) = P (X = xjY = y) P (X = x; Y = y) = P (Y = y) f (x; y) = f2 (y) and P x Similarly for f2 (yjx). 
P f (x; y) x f2 (y) 1 P = f (x; y) f2 (y) x f2 (y) = f2 (y) = 1 f1 (xjy) = (3.3) 3.5. CONDITIONAL DISTRIBUTIONS 85 (2) If X and Y are continuous random variables Z1 Z1 f1 (xjy)dx = 1 1 f (x; y) dx f2 (y) 1 f2 (y) = f2 (y) f2 (y) = 1 Z1 f (x; y)dx 1 = Similarly for f2 (yjx). (3) If X is a continuous random variable then f1 (x) 6= P (X = x) and P (X = x) = 0 for all x. Therefore to justify the de…nition of the conditional probability density function of Y given X = x when X and Y are continuous random variables we consider P (Y yjX = x) as a limit P (Y = lim h!0 yjX = x) = lim P (Y h!0 x+h R Ry x d dh = = lim h!0 lim h!0 1 limh!0 x + h) f (u; v) dvdu f1 (u) du x x+h R Ry f (u; v) dvdu x 1 x+h R by L’Hôpital’s Rule f1 (u) du x f (x + h; v) dv by the Fundamental Theorem of Calculus f1 (x + h) Ry f (x + h; v) dv 1 = = X 1 x+h R d dh Ry yjx limh!0 f1 (x + h) Ry 1 f (x; v) dv f1 (x) assuming that the limits exist and that integration and the limit operation can be interchanged. If we di¤erentiate the last term with respect to y using the Fundamental Theorem of Calculus we have d f (x; y) P (Y yjX = x) = dy f1 (x) 86 3. MULTIVARIATE RANDOM VARIABLES which gives us a justi…cation for using f2 (yjx) = f (x; y) f1 (x) as the conditional probability density function of Y given X = x. (4) For a given value of x, call it x , we can think of obtaining the conditional probability density function of Y given X = x geometrically in the following way. Think of the curve of intersection which is obtained by cutting through the surface z = f (x; y) with the plane x = x which is parallel to the yz plane. The curve of intersection is z = f (x ; y) which is a curve lying in the plane x = x . The area under the curve z = f (x ; y) and lying above the xy plane is not necessarily equal to 1 and therefore z = f (x ; y) is not a proper probability density function. However if we consider the curve z = f (x ; y) =f1 (x ), which is just a rescaled version of the curve z = f (x ; y), then the area lying under the curve z = f (x ; y) =f1 (x ) and lying above the xy plane is equal to 1. This is the probability density function we want. 3.5.2 Example In Example 3.4.8 determine the conditional probability density function of X given Y = y and the conditional probability density function of Y given X = x. Solution The conditional probability density function of X given Y = y is f1 (xjy) = f (x; y) f2 (y) 2 = = p 2 1 y2 p 1 p for 0 < x < 1 1 y2 y2; 1<y<1 Note that for each y 2 ( 1; 1), the conditional probability density function of X given p Y = y is Uniform 0; 1 y 2 . This makes sense because the joint probability density function is constant on its support set. The conditional probability density function of Y given X = x is f2 (yjx) = f (x; y) f1 (x) 2 = = p 4 1 x2 1 p for 2 1 x2 p 1 x2 < y < p 1 x2 ; 0 < x < 1 3.5. CONDITIONAL DISTRIBUTIONS 87 Note that for each x 2 (0; 1), the conditional probability density function of Y given X = x p p is Uniform 1 x2 ; 1 x2 . This again makes sense because the joint probability density function is constant on its support set. 3.5.3 Exercise In Exercise 3.2.5 show that the conditional probability function of Y given X = x is Binomial n x; 2 (1 1 ) 2 Why does this make sense? 3.5.4 Exercise In Example 3.3.3 and Exercises 3.3.6 and 3.3.7 determine the conditional probability density function of X given Y = y and the conditional probability density function of Y given X = x. 
Be sure to check that Z1 1 f1 (xjy)dx = 1 and Z1 f2 (yjx)dy = 1 1 When choosing a model for bivariate data it is sometimes easier to specify a conditional probability (density) function and a marginal probability (density) function. The joint probability (density) function can then be determined using the Product Rule which is obtained by rewriting (3.2) and (3.3). 3.5.5 Product Rule Suppose X and Y are random variables with joint probability (density) function f (x; y), marginal probability (density) functions f1 (x) and f2 (y) respectively and conditional probability (density) function’s f1 (xjy) and f2 (yjx). Then f (x; y) = f1 (xjy)f2 (y) = f2 (yjx)f1 (x) 3.5.6 Example In modeling survival in a certain insect population it is assumed that the number of eggs laid by a single female follows a Poisson( ) distribution. It is also assumed that each egg has probability p of surviving independently of any other egg. Determine the probability function of the number of eggs that survive. 88 3. MULTIVARIATE RANDOM VARIABLES Solution Let Y = number of eggs laid and let X = number of eggs that survive. Then Y Poisson( ) and XjY = y Binomial(y; p). We want to determine the marginal probability function of X. By the Product Rule the joint probability function of X and Y is f (x; y) = f1 (xjy) f2 (y) y x = p (1 p)y x = px e y! y x (1 x! ye x y p) (y x)! with support set is A = f(x; y) : x = 0; 1; : : : ; y; y = 0; 1; : : :g which can also be written as A = f(x; y) : y = x; x + 1; : : : ; x = 0; 1; : : :g (3.4) The marginal probability function of X can be obtained using f1 (x) = P f (x; y) all y Since we are summing over y we need to use the second description of the support set given in (3.4). So f1 (x) = = = = = 1 px e P x! y=x px e x! (1 x P 1 y=x p)y x (y x)! (1 y p)y x y (y x)! x let u = y x x P 1 [ (1 px e p)]u x! u! u=0 x x p e e (1 p) by the Exponential series 2.11.7 x! (p )x e p for x = 0; 1; : : : x! which we recognize as a Poisson(p ) probability function. 3.5.7 Example Determine the marginal probability function of X if Y distribution of X given Y = y is Weibull(p; y 1=p ). Gamma( ; 1 ) and the conditional 3.5. CONDITIONAL DISTRIBUTIONS 89 Solution Since Y Gamma( ; 1 ) y 1e y f2 (y) = for y > 0 ( ) and 0 otherwise. Since the conditional distribution of X given Y = y is Weibull(p; y f1 (xjy) = pyxp 1 e yxp 1=p ) for x > 0 By the Product Rule the joint probability density function of X and Y is f (x; y) = f1 (xjy) f2 (y) = pyxp p = 1 e 1e y yxp y ( ) xp 1 y e ( ) y( +xp ) The support set is A = f(x; y) : x > 0; y > 0g which is a rectangular region. The marginal probability function of X is f1 (x) = Z1 f (x; y) dy 1 = p xp 1 ( ) Z1 xp 1 ( ) Z1 y e y( +xp ) let u = y ( + xp ) dy 0 = p u + xp e u 1 + xp u e u du 0 = p xp 1 ( ) p xp 1 1 + xp 1 +1 Z du 0 = = ( ) p xp ( + xp ) 1 + xp +1 ( + 1) by 2.4.8 1 +1 since ( + 1) = ( ) for x > 0 and 0 otherwise. This distribution is a member of the Burr family of distributions which is frequently used by actuaries for modeling household income, crop prices, insurance risk, and many other …nancial variables. 90 3. MULTIVARIATE RANDOM VARIABLES The following theorem gives us one more method for determining whether two random variables are independent. 3.5.8 Theorem Suppose X and Y are random variables with marginal probability (density) functions f1 (x) and f2 (y) respectively and conditional probability (density) functions f1 (xjy) and f2 (yjx). Suppose also that A1 is the support set of X, and A2 is the support set of Y . 
Then X and Y are independent random variables if and only if either of the following holds f1 (xjy) = f1 (x) for all x 2 A1 or f2 (yjx) = f2 (y) for all y 2 A2 3.5.9 Example Suppose the conditional distribution of X given Y = y is f1 (xjy) = e 1 x e y for 0 < x < y and 0 otherwise. Are X and Y independent random variables? Solution Since the conditional distribution of X given Y = y depends on y then f1 (xjy) = f1 (x) cannot hold for all x in the support set of X and therefore X and Y are not independent random variables. 3.6 Joint Expectations As with univariate random variables we de…ne the expectation operator for bivariate random variables. The discrete case is a review of material you would have seen in a previous probability course. 3.6.1 De…nition - Joint Expectation Suppose h(x; y) is a real-valued function. If X and Y are discrete random variables with joint probability function f (x; y) and support set A then P P E[h(X; Y )] = h(x; y)f (x; y) (x;y) 2A provided the joint sum converges absolutely. 3.6. JOINT EXPECTATIONS 91 If X and Y are continuous random variables with joint probability density function f (x; y) then Z1 Z1 E[h(X; Y )] = h(x; y)f (x; y)dxdy 1 1 provided the joint integral converges absolutely. 3.6.2 Theorem Suppose X and Y are random variables with joint probability (density) function f (x; y), a and b are real constants, and g(x; y) and h(x; y) are real-valued functions. Then E[ag(X; Y ) + bh(X; Y )] = aE[g(X; Y )] + bE[h(X; Y )] Proof (Continuous Case) E[ag(X; Y ) + bh(X; Y )] = Z1 Z1 1 = a [ag(x; y) + bh(x; y)] f (x; y)dxdy 1 Z1 Z1 1 g(x; y)f (x; y)dxdy + b 1 Z1 Z1 1 by properties of double integrals h(x; y)f (x; y)dxdy 1 = aE[g(X; Y )] + bE[h(X; Y )] by De…nition 3.6.1 3.6.3 Corollary (1) E(aX + bY ) = aE(X) + bE(Y ) = a where X = E(X) and Y X +b Y = E(Y ). (2) If X1 ; X2 ; : : : ; Xn are random variables and a1 ; a2 ; : : : ; an are real constants then E n P i=1 where i = E(Xi ). ai Xi = n P ai E(Xi ) = i=1 n P i=1 ai i (3) If X1 ; X2 ; : : : ; Xn are random variables with E(Xi ) = , i = 1; 2; : : : ; n then E X = n n 1 P 1 P E (Xi ) = n i=1 n i=1 = n = n 92 3. MULTIVARIATE RANDOM VARIABLES Proof of (1) (Continuous Case) E(aX + bY ) = Z1 Z1 1 = a = a 1 1 Z 1 Z1 (ax + by) f (x; y)dxdy 2 x4 Z1 1 3 f (x; y)dy 5 dx + b xf1 (x) dx + b 1 Z1 3.6.4 1 2 y4 Z1 1 3 f (x; y)dx5 dy yf2 (y) dy 1 = aE(X) + bE(Y ) = a Z1 X +b Y Theorem - Expectation and Independence (1) If X and Y are independent random variables and g(x) and h(y) are real valued functions then E [g (X) h (Y )] = E [g (X)] E [h (Y )] (2) More generally if X1 ; X2 ; : : : ; Xn are independent random variables and h1 ; h2 ; : : : ; hn are real valued functions then E n Q hi (Xi ) = i=1 n Q E [hi (Xi )] i=1 Proof of (1) (Continuous Case) Since X and Y are independent random variables then by Theorem 3.4.2 f (x; y) = f1 (x) f2 (y) for all (x; y) 2 A where A is the support set of (X; Y ). Therefore E [g(X)h(Y )] = Z1 Z1 1 = Z1 1 g(x)h(y)f1 (x) f2 (y) dxdy 1 2 h(y)f2 (y) 4 = E [g (X)] Z1 1 Z1 1 g(x)f1 (x) dx5 dy h(y)f2 (y) dy = E [g (X)] E [h (Y )] 3 3.6. JOINT EXPECTATIONS 3.6.5 93 De…nition - Covariance The covariance of random variables X and Y is de…ned by Cov(X; Y ) = E[(X X )(Y Y )] If Cov(X; Y ) = 0 then X and Y are called uncorrelated random variables. 3.6.6 Theorem - Covariance and Independence If X and Y are random variables then Cov(X; Y ) = E(XY ) X Y If X and Y are independent random variables then Cov(X; Y ) = 0. 
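Before the proof, a small simulation aside (not part of the notes) illustrating both statements of Theorem 3.6.6: the definition of covariance and the shortcut $E(XY) - E(X)E(Y)$ give the same value, and for independent variables the estimated covariance is close to zero. The distributions below are assumed toy choices made only for illustration.

```python
import numpy as np

# Illustration of Theorem 3.6.6 with simulated data (toy distributions).

rng = np.random.default_rng(42)
n = 500_000

# A dependent pair: Y = X + noise, so Cov(X, Y) = Var(X) = 1 here.
x = rng.exponential(scale=1.0, size=n)
y = x + rng.normal(size=n)
cov_definition = np.mean((x - x.mean()) * (y - y.mean()))   # E[(X - mu_X)(Y - mu_Y)]
cov_shortcut   = np.mean(x * y) - x.mean() * y.mean()       # E(XY) - E(X)E(Y)
print("dependent pair  :", round(cov_definition, 3), round(cov_shortcut, 3))

# An independent pair: the covariance estimate should be near 0.
u = rng.exponential(scale=1.0, size=n)
v = rng.normal(size=n)
print("independent pair:", round(np.mean(u * v) - u.mean() * v.mean(), 3))
```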
Proof Cov(X; Y ) = E [(X X ) (Y = E (XY Y )] XY = E(XY ) X X E(Y Y ) + X Y) Y E(X) = E(XY ) E(X)E(Y ) = E(XY ) E(X)E(Y ) + X Y E(Y )E(X) + E(X)E(Y ) Now if X and Y are independent random variables then by Theorem 3.6.4 E (XY ) = E(X)E(Y ) and therefore Cov(X; Y ) = 0. 3.6.7 Theorem - Variance of a Linear Combination (1) Suppose X and Y are random variables and a and b are real constants then V ar(aX + bY ) = a2 V ar(X) + b2 V ar(Y ) + 2abCov(X; Y ) = a2 2 X + b2 2 Y + 2abCov(X; Y ) (2) Suppose X1 ; X2 ; : : : ; Xn are random variables with V ar(Xi ) = real constants then V ar n P i=1 ai Xi = n P i=1 a2i 2 i +2 nP1 n P 2 i and a1 ; a2 ; : : : ; an are ai aj Cov (Xi ; Xj ) i=1 j=i+1 (3) If X1 ; X2 ; : : : ; Xn are independents random variables and a1 ; a2 ; : : : ; an are real constants then n n P P V ar ai Xi = a2i 2i i=1 i=1 94 3. MULTIVARIATE RANDOM VARIABLES (4) If X1 ; X2 ; : : : ; Xn are independent random variables with V ar(Xi ) = then V ar X = V ar n 1 P n2 i=1 = n 1 P Xi n i=1 2 = 1 n = n 2 n2 2 n P 2, i = 1; 2; : : : ; n V ar (Xi ) i=1 2 = n Proof of (1) h i V ar (aX + bY ) = E (aX + bY a X b Y )2 n o 2 = E [a (X ) + b (Y )] X Y h 2 2 2 = E a2 (X X ) + b (Y Y ) + 2ab (X X ) (Y h i h i 2 2 = a2 E (X + b2 E (Y + 2abE [(X X) Y) = a2 3.6.8 2 X + b2 2 Y i ) Y X ) (Y + 2abCov (X; Y ) De…nition - Correlation Coe¢ cient The correlation coe¢ cient of random variables X and Y is de…ned by (X; Y ) = Cov(X; Y ) X Y 3.6.9 Example For the joint probability density function in Example 3.3.3 …nd (X; Y ). Solution E (XY ) = Z1 Z1 1 = = Z1 0 = 1 Z1 Z1 0 1 3 xyf (x; y)dxdy = Z1 Z1 0 0 x2 y + xy 2 dxdy = Z1 0 1 1 y + y 2 dy = 3 2 xy (x + y) dxdy 1 1 3 x y + x2 y 2 j10 dy 3 2 0 1 2 1 3 1 1 1 y + y j0 = + 6 6 6 6 Y )] 3.6. JOINT EXPECTATIONS E (X) = Z1 95 xf1 (x)dx = 1 = = E X 2 = = 1 x x+ 2 dx = 0 Z1 1 x2 + x dx 2 0 1 3 1 2 1 x + x j0 3 4 1 1 7 + = 3 4 12 Z1 2 x f1 (x)dx = 1 = Z1 Z1 2 x 1 x+ 2 dx = Z1 1 x3 + x2 dx 2 0 0 1 4 1 3 1 x + x j0 4 6 1 1 3+2 5 + = = 4 6 12 12 [E (X)]2 = V ar (X) = E X 2 = 7 12 5 12 2 60 49 11 = 144 144 By symmetry E (Y ) = 7 11 and V ar (Y ) = 12 144 Therefore Cov (X; Y ) = E (XY ) E (X) E (Y ) = 7 12 1 3 7 12 = 48 49 1 = 144 144 and (X; Y ) = = = = 3.6.10 Cov(X; Y ) q X Y 1 144 11 144 11 144 1 144 144 11 1 11 Exercise For the joint probability density function in Exercise 3.3.7 …nd (X; Y ). 96 3.6.11 3. MULTIVARIATE RANDOM VARIABLES Theorem If (X; Y ) is the correlation coe¢ cient of random variables X and Y then 1 (X; Y ) 1 (X; Y ) = 1 if and only if Y = aX + b for some a > 0 and (X; Y ) = Y = aX + b for some a < 0. 1 if and only if Proof Let S = X + tY , where t 2 <. Then E (S) = V ar(S) = E (S S) = E (X = t2 2 Y and 2 = Ef[(X + tY ) = Ef[(X S ( X +t X ) + t(Y 2 X ) + 2t(X + 2Cov(X; Y )t + 2 Y )] g 2 Y )] g X )(Y Y) + t2 (Y Y) 2 2 X Now V ar(S) 0 for any t 2 < implies that the quadratic equation V ar(S) = t2 2Y + 2Cov(X; Y )t + 2X in the variable t must have at most one real root. To have at most one real root the discrimant of this quadratic equation must be less than or equal to zero. Therefore [2Cov(X; Y )]2 4 2X 2Y 0 or [Cov(X; Y )]2 or j (X; Y )j = 2 2 X Y Cov(X; Y ) 1 X Y and therefore 1 (X; Y ) 1 To see that (X; Y ) = 1 corresponds to a linear relationship between X and Y , note that (X; Y ) = 1 implies jCov(X; Y )j = X Y and therefore [2Cov(X; Y )]2 4 2 2 X Y =0 which corresponds to a zero discriminant in the quadratic equation. 
This means that there exists one real number t for which V ar(S) = V ar(X + t Y ) = 0 But V ar(X + t Y ) = 0 implies X + t Y must equal a constant, that is, X + t Y = c. Thus X and Y satisfy a linear relationship. 3.7. CONDITIONAL EXPECTATION 3.7 97 Conditional Expectation Since conditional probability (density) functions are also probability (density) function, expectations can be de…ned in terms of these conditional probability (density) functions as in the following de…nition. 3.7.1 De…nition - Conditional Expectation The conditional expectation of g(X) given Y = y is given by E [g(X)jy] = P g(x)f1 (xjy) x if Y is a discrete random variable and E[g(X)jy] = Z1 g(x)f1 (xjy)dx 1 if Y is a continuous random variable provided the sum/integral converges absolutely. The conditional expectation of h(Y ) given X = x is de…ned in a similar manner. 3.7.2 Special Cases (1) The conditional mean of X given Y = y is denoted by E (Xjy). (2) The conditional variance of X given Y = y is denoted by V ar (Xjy) and is given by n V ar (Xjy) = E [X = E X 2 jy 3.7.3 E (Xjy)]2 jy o [E (Xjy)]2 Example For the joint probability density function in Example 3.4.8 …nd E (Y jx) the conditional mean of Y given X = x, and V ar (Y jx) the conditional variance of Y given X = x. Solution From Example 3.5.2 we have 1 f2 (yjx) = p 2 1 x2 for p 1 x2 < y < p 1 x2 ; 0 < x < 1 98 3. MULTIVARIATE RANDOM VARIABLES Therefore E (Y jx) = Z1 = Z1 yf2 (yjx)dy 1 p p x2 1 y p dy 2 1 x2 1 x2 p = = 1 p 2 1 x2 p p ydy 1 x2 p 1 x2 y2j p 1 x2 1 x2 1 x2 Z1 = 0 Since E (Y jx) = 0 2 V ar (Y jx) = E Y jx = p = p Z1 Z1 1 x2 1 dy y2 p 2 1 x2 1 x2 p = = = y 2 f2 (yjx)dy 1 p 2 1 x2 p Z1 x2 y 2 dy 1 x2 p 1 1 x2 p y3j p 1 x2 6 1 x2 1 1 x2 3 Recall that the conditional distribution of Y given X = x is Uniform The results above can be veri…ed by noting that if U and V ar (U ) = 3.7.4 p 1 x2 ; Uniform(a; b) then E (U ) = (b a)2 12 . Exercise In Exercises 3.5.3 and 3.3.7 …nd E (Y jx), V ar (Y jx), E (Xjy) and V ar (Xjy). 3.7.5 p 1 Theorem If X and Y are independent random variables then E [g (X) jy] = E [g (X)] and E [h (Y ) jx] = E [h (Y )]. x2 . a+b 2 3.7. CONDITIONAL EXPECTATION 99 Proof (Continuous Case) E[g(X)jy] = Z1 g(x)f1 (xjy)dx = Z1 g(x)f1 (x)dx by Theorem 3.5.8 1 1 = E [g (X)] as required. E [h (Y ) jx] = E [h (Y )] follows in a similar manner. 3.7.6 De…nition E [g (X) jY ] is the function of the random variable Y whose value is E [g (X) jy] when Y = y. This means of course that E [g (X) jY ] is a random variable. 3.7.7 Example In Example 3.5.6 the joint model was speci…ed by Y Poisson( ) and XjY = y Binomial(y; p) and we showed that X Poisson(p ). Determine E (XjY = y), E (XjY ), E[E(XjY )], and E (X). What do you notice about E[E(XjY )] and E (X). Solution Since XjY = y Binomial(y; p) E (XjY = y) = py and E (XjY ) = pY which is a random variable. Since Y Poisson( ) and E (Y ) = E [E (XjY )] = E (pY ) = pE (Y ) = p Now since X Poisson(p ) then E (X) = p . We notice that E [E (XjY )] = E (X) The following theorem indicates that this result holds generally. 100 3.7.8 3. 
MULTIVARIATE RANDOM VARIABLES Theorem Suppose X and Y are random variables then E fE [g (X) jY ]g = E [g (X)] Proof (Continuous Case) 2 E fE [g (X) jY ]g = E 4 Z1 = 1 = Z1 Z1 = Z1 = Z1 1 1 1 Z1 g (x) f1 (xjy) dx5 1 Z1 2 4 3 1 3 g (x) f1 (xjy) dx5 f2 (y) dy g (x) f1 (xjy) f2 (y) dxdy 2 g (x) 4 Z1 1 3 f (x; y) dy 5 dx g (x) f1 (x) dx 1 = E [g (X)] 3.7.9 Corollary - Law of Total Expectation Suppose X and Y are random variables then E[E(XjY )] = E(X) Proof Let g (X) = X in Theorem 3.7.8 and the result follows. 3.7.10 Theorem - Law of Total Variance Suppose X and Y are random variables then V ar(X) = E[V ar(XjY )] + V ar[E(XjY )] Proof V ar (X) = E X 2 [E (X)]2 = E E X 2 jY = E E X 2 jY fE [E (XjY )]g2 by Theorem 3.7.8 n o n o E [E (XjY )]2 + E [E(XjY )]2 fE [E(XjY )]g2 = E[V ar(XjY )] + V ar[E(XjY )] 3.7. CONDITIONAL EXPECTATION 101 When the joint model is speci…ed in terms of a conditional distribution XjY = y and a marginal distribution for Y then Theorems 3.7.8 and 3.7.10 give a method for …nding expectations for functions of X without having to determine the marginal distribution for X. 3.7.11 Example Suppose P Uniform(0; 0:1) and Y jP = p Binomial(10; p). Find E(Y ) and V ar(Y ). Solution Since P Uniform(0; 0:1) E (P ) = 0 + 0:1 1 (0:1 0)2 1 = , V ar (P ) = = 2 20 12 1200 and E P2 = Since Y jP = p 1 + 1200 1 1+3 4 1 + = = = 1200 400 1200 1200 = V ar (P ) + [E (P )]2 = 1 20 1 300 2 Binomial(10; p) E (Y jp) = 10p, E (Y jP ) = 10P and V ar (Y jp) = 10p (1 p) , V ar (Y jP ) = 10P (1 P2 P ) = 10 P Therefore E (Y ) = E [E (Y jP )] = E (10P ) = 10E (P ) = 10 1 20 = 1 2 and V ar(Y ) = E[V ar(Y jP )] + V ar[E(Y jP )] = E 10 P P2 + V ar (10P ) = 10 E (P ) E P 2 + 100V ar (P ) 1 1 11 1 + 100 = = 10 20 300 1200 20 3.7.12 Exercise In Example 3.5.7 …nd E (X) and V ar (X) using Corollary 3.7.9 and Theorem 3.7.10. 3.7.13 Exercise Suppose P Beta(a; b) and Y jP = p Geometric(p). Find E(Y ) and V ar(Y ). 102 3.8 3. MULTIVARIATE RANDOM VARIABLES Joint Moment Generating Functions Moment generating functions can also be de…ned for bivariate and multivariate random variables. As mentioned previously, moment generating functions are a powerful tool for determining the distributions of functions of random variables (Chapter 4), particularly sums, as well as determining the limiting distribution of a sequence of random variables (Chapter 5). 3.8.1 De…nition - Joint Moment Generating Function If X and Y are random variables then M (t1 ; t2 ) = E et1 X+t2 Y is called the joint moment generating function of X and Y if this expectation exists (joint sum/integral converges absolutely) for all t1 2 ( h1 ; h1 ) for some h1 > 0, and all t2 2 ( h2 ; h2 ) for some h2 > 0. More generally if X1 ; X2 ; : : : ; Xn are random variables then M (t1 ; t2 ; : : : ; tn ) = E exp n P ti Xi i=1 is called the joint moment generating function of X1 ; X2 ; : : : ; Xn if this expectation exists for all ti 2 ( hi ; hi ) for some hi > 0, i = 1; 2; : : : ; n. If the joint moment generating function is known that it is straightforward to obtain the moment generating functions of the marginal distributions. 
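The conditional-expectation results above lend themselves to a quick numerical check. The following is a minimal simulation sketch (written in Python with NumPy; it is not part of the original notes, and the seed and number of replications are arbitrary choices) for Example 3.7.11, in which P ~ Uniform(0, 0.1) and Y | P = p ~ Binomial(10, p). By the Law of Total Expectation and the Law of Total Variance, E(Y) = 1/2 and Var(Y) = 11/20, and the sample mean and sample variance of the simulated Y values should be close to these values.

import numpy as np

rng = np.random.default_rng(2022)     # arbitrary seed
m = 1_000_000                         # number of simulated pairs (P, Y)

# Stage 1: draw P ~ Uniform(0, 0.1)
p = rng.uniform(0.0, 0.1, size=m)
# Stage 2: draw Y | P = p ~ Binomial(10, p)
y = rng.binomial(10, p)

print(y.mean())   # Law of Total Expectation: E(Y) = E[E(Y | P)] = 10 E(P) = 0.5
print(y.var())    # Law of Total Variance: Var(Y) = E[Var(Y | P)] + Var[E(Y | P)] = 11/20 = 0.55

The same two-stage sampling idea (draw the mixing variable first, then the response from its conditional distribution) can be used to check Exercise 3.7.13 numerically as well.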
3.8.2 Important Note If M (t1 ; t2 ) exists for all t1 2 ( h1 ; h1 ) for some h1 > 0, and all t2 2 ( h2 ; h2 ) for some h2 > 0, then the moment generating function of X is given by MX (t) = E(etX ) = M (t; 0) for t 2 ( h1 ; h1 ) and the moment generating function of Y is given by MY (t) = E(etY ) = M (0; t) for t 2 ( h2 ; h2 ) 3.8.3 Example Suppose X and Y are continuous random variables with joint probability density function f (x; y) = e and 0 otherwise. y for 0 < x < y < 1 3.8. JOINT MOMENT GENERATING FUNCTIONS 103 (a) Find the joint moment generating function of X and Y . (b) What is the moment generating function of X and what is the marginal distribution of X? (c) What is the moment generating function of Y and what is the marginal distribution of Y? Solution (a) The joint moment generating function is M (t1 ; t2 ) = E et1 X+t2 Y Z1 Z1 = et1 x+t2 y f (x; y) dxdy 1 = 1 Z1 Zy et1 x+t2 y e y=0 x=0 = Z1 et2 y = Z1 et2 y 1 t1 Z1 et2 y 1 t1 Z1 e y 0 y 0 = = 0 @ Zy 0 y dxdy 1 et1 x dxA dy 1 t1 x y e j0 dy t1 y et1 y 1 dy 0 (1 t1 t2 )y e (1 t2 )y dy 0 which converges for t1 + t2 < 1 and t2 < 1 Therefore M (t1 ; t2 ) = = = = = 1 1 1 lim e (1 t1 t2 )y jb0 + e (1 t2 )y jb0 t1 b!1 1 t1 t2 1 t2 h i 1 1 1 h (1 lim e (1 t1 t2 )b 1 + e t1 b!1 1 t1 t2 1 t2 1 1 1 t1 1 t1 t2 1 t2 1 (1 t2 ) (1 t1 t2 ) t1 (1 t1 t2 ) (1 t2 ) 1 for t1 + t2 < 1 and t2 < 1 (1 t1 t2 ) (1 t2 ) t2 )b i 1 104 3. MULTIVARIATE RANDOM VARIABLES (b) The moment generating function of X is MX (t) = E(etX ) = M (t; 0) = = 1 0) (1 (1 t 0) 1 for t < 1 1 t By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a Exponential(1) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, X has a Exponential(1) distribution. (c) The moment generating function of Y is MY (t) = E(etY ) = M (0; t) = = 1 0 t) (1 (1 1 (1 t)2 t) for t < 1 By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a Gamma(2; 1) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, Y has a Gamma(2; 1) distribution. 3.8.4 Example Suppose X and Y are continuous random variables with joint probability density function f (x; y) = e x y for x > 0, y > 0 and 0 otherwise. (a) Find the joint moment generating function of X and Y . (b) What is the moment generating function of X and what is the marginal distribution of X? (c) What is the moment generating function of Y and what is the marginal distribution of Y? 3.8. JOINT MOMENT GENERATING FUNCTIONS 105 Solution (a) The joint moment generating function is M (t1 ; t2 ) = E e = Z1 Z1 0 0 = @ = = t1 X+t2 Y 1 et1 x+t2 y e 0 Z1 = e y(1 x y et1 x+t2 y f (x; y) dxdy 1 dxdy 10 1 Z t2 ) A @ e dy x(1 t1 ) 0 0 e (1 lim b!1 y(1 t2 ) t2 ) 1 1 Z1 Z1 jb0 1 t1 1 t2 ! lim b!1 1 dxA e (1 x(1 t1 ) t1 ) which converges for t1 < 1, t2 < 1 jb0 ! for t1 < 1, t2 < 1 (b) The moment generating function of X is MX (t) = E(etX ) = M (t; 0) 1 = (1 t) (1 0) 1 = for t < 1 1 t By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a Exponential(1) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, X has a Exponential(1) distribution. 
(c) The moment generating function of Y is MY (t) = E(etY ) = M (0; t) 1 = (1 0) (1 t) 1 = for t < 1 1 t By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a Exponential(1) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, Y has a Exponential(1) distribution. 106 3. MULTIVARIATE RANDOM VARIABLES 3.8.5 Theorem If X and Y are random variables with joint moment generating function M (t1 ; t2 ) which exists for all t1 2 ( h1 ; h1 ) for some h1 > 0, and all t2 2 ( h2 ; h2 ) for some h2 > 0 then E(X j Y k ) = @ j+k @tj1 @tk2 M (t1 ; t2 )j(t1 ;t2 )=(0;0) Proof See Problem 11(a). 3.8.6 Independence Theorem for Moment Generating Functions Suppose X and Y are random variables with joint moment generating function M (t1 ; t2 ) which exists for all t1 2 ( h1 ; h1 ) for some h1 > 0, and all t2 2 ( h2 ; h2 ) for some h2 > 0. Then X and Y are independent random variables if and only if M (t1 ; t2 ) = MX (t1 )MY (t2 ) for all t1 2 ( h1 ; h1 ) and t2 2 ( h2 ; h2 ) where MX (t1 ) = M (t1 ; 0) and MY (t2 ) = M (0; t2 ). Proof See Problem 11(b). 3.8.7 Example Use Theorem 3.8.6 to determine if X and Y are independent random variables in Examples 3.8.3 and 3.8.4. Solution For Example 3.8.3 M (t1 ; t2 ) = (1 t1 1 t2 ) (1 t2 ) for t1 + t2 < 1 and t2 < 1 1 MX (t1 ) = for t1 < 1 t1 1 for t2 < 1 (1 t2 )2 1 MY (t2 ) = Since M 1 1 ; 4 4 = 1 1 1 4 1 4 1 1 4 = 8 6= MX 3 1 4 MY 1 4 = 1 1 1 1 4 then by Theorem 3.8.6 X and Y are not independent random variables. 1 1 2 4 = 4 3 3 3.8. JOINT MOMENT GENERATING FUNCTIONS 107 For Example 3.8.4 M (t1 ; t2 ) = MX (t1 ) = MY (t2 ) = 1 1 1 t1 1 t2 1 for t1 < 1 1 t1 1 for t2 < 1 1 t2 for t1 < 1, t1 < 1 Since M (t1 ; t2 ) = 1 1 1 t1 1 = MX (t1 ) MY (t2 ) t2 for all t1 < 1, t1 < 1 then by Theorem 3.8.6 X and Y are independent random variables. 3.8.8 Example Suppose X1 ; X2 ; : : : ; Xn are independent and identically distributed random variables each with moment generating function M (t), t 2 ( h; h) for some h > 0. Find M (t1 ; t2 ; : : : ; tn ) the joint moment generating function of X1 ; X2 ; : : : ; Xn . Find the moment generating n P function of T = Xi . i=1 Solution Since the Xi ’s are independent random variables each with moment generating function M (t), t 2 ( h; h) for some h > 0, the joint moment generating function of X1 ; X2 ; : : : ; Xn is n P ti Xi M (t1 ; t2 ; : : : ; tn ) = E exp i=1 = E n Q eti Xi i=1 = = n Q i=1 n Q E eti Xi M (ti ) i=1 for ti 2 ( h; h) ; i = 1; 2; : : : ; n for some h > 0 The moment generating function of T = n P Xi is i=1 MT (t) = E etT = E exp n P tXi i=1 = M (t; t; : : : ; t) n Q = M (t) i=1 = [M (t)]n for t 2 ( h; h) for some h > 0 108 3. MULTIVARIATE RANDOM VARIABLES 3.9 Multinomial Distribution The discrete multivariate distribution which is the most widely used is the Multinomial distribution which was introduced in a previous probability course. We give its joint probability function and its important properties here. 3.9.1 De…nition - Multinomial Distribution Suppose (X1 ; X2 ; : : : ; Xk ) are discrete random variables with joint probability function f (x1 ; x2 ; : : : ; xk ) = k P for xi = 0; 1; : : : ; n; i = 1; 2; : : : ; k; n! px1 px2 x1 !x2 ! xk ! 1 2 xi = n; 0 pi pxk k 1; i = 1; 2; : : : ; k; i=1 Notes: (1) Since k P pi = 1 i=1 Then (X1 ; X2 ; : : : ; Xk ) is said to have a Multinomial distribution. We write (X1 ; X2 ; : : : ; Xk ) k P Multinomial(n; p1 ; p2 ; : : : ; pk ). 
Xi = n, the Multinomial distribution is actually a joint distribution i=1 for k 1 random variables which can be written as f (x1 ; x2 ; : : : ; xk n! 1) = x1 !x2 ! xk 1! kP1 n x px1 1 px2 2 pk k 11 1 xi ! i=1 for xi = 0; 1; : : : ; n; i = 1; 2; : : : ; k 1; kP1 xi n; 0 pi kP1 n pi 1; i = 1; 2; : : : ; k 1; (2) If k = 2 we obtain the familiar Binomial distribution = kP1 i=1 n! px1 (1 x1 ! (n x1 )! 1 n x1 p (1 p1 )n x1 1 for x1 = 0; 1; : : : ; n; 0 p1 )n x1 x1 p1 1 (2) If k = 3 we obtain the Trinomial distribution f (x1 ; x2 ) = n! x1 !x2 ! (n x1 for xi = 0; 1; : : : ; n; i = 1; 2, x1 + x2 x2 )! px1 1 px2 2 (1 n and 0 p1 pi p2 )n xi i=1 i=1 i=1 f (x1 ) = kP1 x1 x2 1; i = 1; 2; p1 + p2 1 pi 1 3.9. MULTINOMIAL DISTRIBUTION 3.9.2 109 Theorem - Properties of the Multinomial Distribution If (X1 ; X2 ; : : : ; Xk ) (1) (X1 ; X2 ; : : : ; Xk Multinomial(n; p1 ; p2 ; : : : ; pk ), then 1) has joint moment generating function M (t1 ; t2 ; : : : ; tk 1) = E et1 X1 +t2 X2 + p1 et1 + p2 et2 = for (t1 ; t2 ; : : : ; tk 1) 2 <k +tk 1 Xk 1 + pk 1e tk 1 + pk n (3.5) 1. (2) Any subset of X1 ; X2 ; : : : ; Xk also has a Multinomial distribution. In particular Xi Binomial(n; pi ) for i = 1; 2; : : : ; k (3) If T = Xi + Xj ; i 6= j; then T Binomial (n; pi + pj ) (4) Cov (Xi ; Xj ) = npi pj for i 6= j (5) The conditional distribution of any subset of (X1 ; X2 ; : : : ; Xk ) given the remaining of the coordinates is a Multinomial distribution. In particular the conditional probability function of Xi given Xj = xj ; i 6= j; is Xi jXj = xj Binomial n xj ; pi 1 pj (6) The conditional distribution of Xi given T = Xi + Xj = t; i 6= j; is Xi jXi + Xj = t 3.9.3 Binomial t; pi pi + pj Example Suppose (X1 ; X2 ; : : : ; Xk ) (a) Prove (X1 ; X2 ; : : : ; Xk Multinomial(n; p1 ; p2 ; : : : ; pk ) 1) has joint moment generating function M (t1 ; t2 ; : : : ; tk for (t1 ; t2 ; : : : ; tk 1) (b) Prove (X1 ; X2 ) (c) Prove Xi 2 <k 1) = p1 et1 + p2 et2 1. Multinomial(n; p1 ; p1 ; 1 p1 Binomial(n; pi ) for i = 1; 2; : : : ; k. (d) Prove T = X1 + X2 + pk Binomial(n; p1 + p2 ). p2 ). tk 1e 1 + pk n 110 3. MULTIVARIATE RANDOM VARIABLES Solution (a) Let A = (x1 ; x2 ; : : : ; xk ) : xi = 0; 1; : : : ; n; i = 1; 2; : : : ; k; k P xi = n then i=1 M (t1 ; t2 ; : : : ; tk = P P (x1 ;x2 ; :::;xk ) = P P (x1 ;x2 ; :::;xk ) = p1 et1 + p2 et2 P 1) = E et1 X1 +t2 X2 + et1 x1 +t2 x2 + +tk 1 xk 1 2A P n! p1 et1 x !x ! x ! k 2A 1 2 + pk 1e tk 1 + pk n +tk 1 Xk 1 n! p x1 p x2 x1 !x2 ! xk ! 1 2 x1 p2 et2 x2 pk for (t1 ; t2 ; : : : ; tk pkxk 1 1 pxk k tk 1e 1) 1 xk 1 pxk k 2 <k by the Multinomial Theorem 2.11.5. (b) The joint moment generating function of (X1 ; X2 ) is M (t1 ; t2 ; 0; : : : ; 0) = p1 et1 + p2 et2 + (1 p1 p2 ) n for (t1 ; t2 ) 2 <2 which is of the form 3.5 so by the Uniqueness Theorem for Moment Generating Functions, (X1 ; X2 ) Multinomial(n; p1 ; p1 ; 1 p1 p2 ). (c) The moment generating function of Xi is M (0; 0; : : : ; t; 0; : : : ; 0) = pi eti + (1 pi ) n for ti 2 < for i = 1; 2; : : : ; k which is the moment generating function of a Binomial(n; pi ) random variable. By the Uniqueness Theorem for Moment Generating Functions, Xi Binomial(n; pi ) for i = 1; 2; : : : ; k. (d) The moment generating function of T = X1 + X2 is MT (t) = E etT = E et(X1 +X2 ) = E etX1 +tX2 = M (t; t; 0; 0; : : : ; 0) = p1 et + p2 et + (1 p1 p2 ) = (p1 + p2 ) et + (1 p1 p2 ) n n for t 2 < which is the moment generating function of a Binomial(n; p1 + p2 ) random variable. By the Uniqueness Theorem for Moment Generating Functions, T Binomial(n; p1 + p2 ). 
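Several of the properties listed in Theorem 3.9.2 are easy to illustrate by simulation; the exercise below then asks for a formal proof of property (3). The following is a minimal sketch in Python with NumPy (not part of the original notes; the values n = 20, (p1, p2, p3) = (0.2, 0.3, 0.5), the seed, and the number of replications are arbitrary choices) which compares sample moments of simulated Multinomial vectors with properties (2), (3) and (4).

import numpy as np

rng = np.random.default_rng(330)                    # arbitrary seed
n, p = 20, np.array([0.2, 0.3, 0.5])
x = rng.multinomial(n, p, size=500_000)             # each row is one realization of (X1, X2, X3)

# Property (2): X1 ~ Binomial(n, p1), so E(X1) = n p1 = 4 and Var(X1) = n p1 (1 - p1) = 3.2
print(x[:, 0].mean(), x[:, 0].var())

# Property (4): Cov(X1, X2) = -n p1 p2 = -1.2
print(np.cov(x[:, 0], x[:, 1])[0, 1])

# Property (3): T = X1 + X2 ~ Binomial(n, p1 + p2), so E(T) = 10 and Var(T) = 5
t = x[:, 0] + x[:, 1]
print(t.mean(), t.var())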
3.9.4 Exercise Prove property (3) in Theorem 3.9.2. 3.10. BIVARIATE NORMAL DISTRIBUTION 3.10 111 Bivariate Normal Distribution The best known bivariate continuous distribution is the Bivariate Normal distribution. We give its joint probability density function written in vector notation so we can easily introduce the multivariate version of this distribution called the Multivariate Normal distribution in Chapter 7. 3.10.1 De…nition - Bivariate Normal Distribution (BVN) Suppose X1 and X2 are random variables with joint probability density function f (x1 ; x2 ) = where x= h 1 2 j j1=2 x1 x2 i 1 (x 2 exp ; = h 1 1 ) 2 i (x ; )T = for (x1 ; x2 ) 2 <2 " 2 1 1 2 2 2 1 2 # and is an nonsingular matrix. Then X = (X1 ; X2 ) is said to have a bivariate normal distribution. We write X BVN( ; ). The Bivariate Normal distribution has many special properties. 3.10.2 If X Theorem - Properties of the BVN Distribution BVN( ; ), then (1) X1 ; X2 has joint moment generating function M (t1 ; t2 ) = E et1 X1 +t2 X2 = E exp XtT 1 = exp tT + t tT for all t = (t1 ; t2 ) 2 <2 2 (2) X1 N( 1; 2) 1 (3) Cov (X1 ; X2 ) = and X2 1 2 N( 2; 2 ). 2 and Cor (X1 ; X2 ) = where 1 1. (4) X1 and X2 are independent random variables if and only if = 0. (5) If c = (c1 ; c2 ) is a nonzero vector of constants then c1 X1 + c2 X2 (6) If A is a 2 cT ; c cT N 2 nonsingular matrix and b is a 1 XA + b BVN 2 vector then A + b; AT A (7) X2 jX1 = x1 N 2 + 2 (x1 1 )= 1 ; 2 2 (1 2 ) 112 3. MULTIVARIATE RANDOM VARIABLES and X1 jX2 = x2 (8) (X ) )T 1 (X N 1 + 1 (x2 2 1 (1 2 )= 2 ; 2 ) 2 (2) Proof For proofs of properties (1) (4) and (6) (7) see Problem 13. (5) The moment generating function of c1 X1 + c2 X2 is E et(c1 X1 +c2 X2 ) = E e(c1 t)X1 +(c2 t)X2 " # " #! h i c t i 1h c1 t 1 + = exp c1 t c2 t 1 2 2 c2 t c2 t " # " # ! h i c i 1h c1 1 t+ t2 = exp c1 c2 1 2 2 c2 c2 cT t + = exp 1 c cT t2 2 for t 2 < where c = (c1 ; c2 ) which is the moment generating function of a N cT ; c cT random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, c1 X1 + c2 X2 N cT ; c cT . The BVN joint probability density function is graphed in Figures 3.8 - 3.10. 0.2 f (x, y) 0.15 0.1 0.05 0 3 3 2 2 1 1 0 0 -1 -1 -2 -2 y -3 -3 Figure 3.8: Graph of BVN p.d.f. with The graphs all have the same mean vector matrices . The axes all have the same scale. x = h 0 0 i and = " 1 0 0 1 # = [0 0] but di¤erent variance/covariance 3.10. BIVARIATE NORMAL DISTRIBUTION 113 0.2 f(x,y) 0.15 0.1 0.05 0 3 3 2 2 1 1 0 0 -1 -1 -2 -2 -3 y -3 Figure 3.9: Graph of BVN p.d.f. with x h = i 0 0 and = " 1 0:5 0:5 1 # " 0:6 0:5 0:5 1 # 0.2 f(x,y) 0.15 0.1 0.05 0 3 3 2 2 1 1 0 0 -1 -1 -2 -2 y -3 Figure 3.10: Graph of BVN p.d.f. with -3 x = h 0 0 i and = 114 3.11 3. MULTIVARIATE RANDOM VARIABLES Calculus Review Consider the region R in the xy-plane in Figure 3.11. y=g(x) y R y=h(x) x=a x=b x Figure 3.11: Region 1 Suppose f (x; y) 0 for all (x; y) 2 <2 . The graph of z = f (x; y) is a surface in 3-space lying above or touching the xy-plane. The volume of the solid bounded by the surface z = f (x; y) and the xy-plane above the region R is given by Volume = Zb h(x) Z f (x; y)dydx x=a y=g(x) y=b y x=g(y) R x=h(y) y=a x Figure 3.12: Region 2 3.11. CALCULUS REVIEW 115 If R is the region in Figure 3.12 then the volume is given by Volume = Zd h(y) Z f (x; y)dxdy y=c x=g(y) Give an expression for the volume of the solid bounded by the surface z = f (x; y) and the xy-plane above the region R = R1 [ R2 in Figure 3.13. 
b3 y=g(x) y x=h(y) b2 R2 R 1 b1 a1 a2 x Figure 3.13: Region 3 a3 116 3.12 3. MULTIVARIATE RANDOM VARIABLES Chapter 3 Problems 1. Suppose X and Y are discrete random variables with joint probability function f (x; y) = kq 2 px+y for x = 0; 1; : : : ; y = 0; 1; : : : ; 0 < p < 1; q = 1 p (a) Determine the value of k. (b) Find the marginal probability function of X and the marginal probability function of Y . Are X and Y independent random variables? (c) Find P (X = xjX + Y = t). 2. Suppose X and Y are discrete random variables with joint probability function f (x; y) = e x!(y 2 x)! for x = 0; 1; : : : ; y; y = 0; 1; : : : (a) Find the marginal probability function of X and the marginal probability function of Y . (b) Are X and Y independent random variables? 3. Suppose X and Y are continuous random variables with joint probability density function f (x; y) = k(x2 + y) for 0 < y < 1 x2 ; 1<x<1 (a) Determine k: (b) Find the marginal probability density function of X and the marginal probability density function of Y . (c) Are X and Y independent random variables? (d) Find P (Y X + 1). 4. Suppose X and Y are continuous random variables with joint probability density function f (x; y) = kx2 y for x2 < y < 1 (a) Determine k. (b) Find the marginal probability density function of X and the marginal probability density function of Y . (c) Are X and Y independent random variables? (d) Find P (X Y ). (e) Find the conditional probability density function of X given Y = y and the conditional probability density function of Y given X = x. 3.12. CHAPTER 3 PROBLEMS 117 5. Suppose X and Y are continuous random variables with joint probability density function f (x; y) = kxe y for 0 < x < 1; 0 < y < 1 (a) Determine k. (b) Find the marginal probability density function of X and the marginal probability density function of Y . (c) Are X and Y independent random variables? (d) Find P (X + Y t). 6. Suppose each of the following functions is a joint probability density function for continuous random variables X and Y . (a) f (x; y) = k for 0 < x < y < 1 for 0 < x2 < y < 1 (b) f (x; y) = kx (c) f (x; y) = kxy for 0 < y < x < 1 (d) f (x; y) = k (x + y) for 0 < x < y < 1 (e) f (x; y) = k x for 0 < y < x < 1 (f) f (x; y) = kx2 y (g) f (x; y) = ke for 0 < x < 1; 0 < y < 1; 0 < x + y < 1 x 2y for 0 < y < x < 1 In each case: (i) Determine k. (ii) Find the marginal probability density function of X and the marginal probability density function of Y . (iii) Find the conditional probability density function of X given Y = y and the conditional probability density function of Y given X = x. (iv) Find E (Xjy) and E (Y jx). 7. Suppose X Uniform(0; 1) and the conditional probability density function of Y given X = x is 1 f2 (yjx) = for 0 < x < y < 1 1 x Determine: (a) the joint probability density function of X and Y (b) the marginal probability density function of Y (c) the conditional probability density function of X given Y = y. 118 3. MULTIVARIATE RANDOM VARIABLES 8. Suppose X and Y are continuous random variables. Suppose also that the marginal probability density function of X is f1 (x) = 1 (1 + 4x) 3 for 0 < x < 1 and the conditional probability density function of Y given X = x is f2 (yjx) = 2y + 4x 1 + 4x for 0 < x < 1; 0 < y < 1 Determine: (a) the joint probability density function of X and Y (b) the marginal probability density function of Y (c) the conditional probability density function of X given Y = y. 9. Suppose that Beta(a; b) and Y j Binomial(n; ). Find E(Y ) and V ar(Y ). 10. 
Assume that Y denotes the number of bacteria in a cubic centimeter of liquid and that Y j Poisson( ). Further assume that varies from location to location and Gamma( ; ). (a) Find E(Y ) and V ar(Y ). (b) If is a positive integer then show that the marginal probability function of Y is Negative Binomial. 11. Suppose X and Y are random variables with joint moment generating function M (t1 ; t2 ) which exists for all jt1 j < h1 and jt2 j < h2 for some h1 ; h2 > 0. (a) Show that E(X j Y k ) = @ j+k @tj1 @tk2 M (t1 ; t2 )j(t1 ;t2 )=(0;0) (b) Prove that X and Y are independent random variables if and only if M (t1 ; t2 ) = MX (t1 )MY (t2 ). (c) If (X; Y ) Multinomial(n; p1 ; p2 ; 1 p1 p2 ) …nd Cov(X; Y ). 12. Suppose X and Y are discrete random variables with joint probability function f (x; y) = e x!(y 2 x)! for x = 0; 1; : : : ; y; y = 0; 1; : : : (a) Find the joint moment generating function of X and Y . (b) Find Cov(X; Y ). 3.12. CHAPTER 3 PROBLEMS 13. Suppose X = (X1 ; X2 ) 119 BVN( ; ). (a) Let t = (t1 ; t2 ). Use matrix multiplication to verify that (x = [x 1 ) (x ( + t )] 1 )T 2xtT [x ( + t )]T 2 tT t tT Use this identity to show that the joint moment generating function of X1 and X2 is M (t1 ; t2 ) = E et1 X1 +t2 X2 = E exp XtT 1 = exp tT + t tT for all t = (t1 ; t2 ) 2 <2 2 (b) Use moment generating functions to show X1 N( 1; 2) 1 (c) Use moment generating functions to show Cov(X1 ; X2 ) = result in Problem 11(a). and X2 1 2. N( 2; 2 ). 2 Hint: Use the (d) Use moment generating functions to show that X1 and X2 are independent random variables if and only if = 0. (e) Let A be a 2 2 nonsingular matrix and b be a 1 generating function to show that XA + b BVN 2 vector. Use the moment A + b; AT A (f) Verify that (x ) 1 (x )T 2 x1 1 1 = 1 2 (1 2 2) x2 2+ 2 1 2 (x1 1) and thus show that the conditional distribution of X2 given X1 = x1 is 2 2 )). Note that by symmetry the conditional N( 2 + 2 (x1 1 )= 1 ; 2 (1 2 2 )). distribution of X1 given X2 = x2 is N( 1 + 1 (x2 2 )= 2 ; 1 (1 14. Suppose X and Y are continuous random variables with joint probability density function f (x; y) = 2e x y for 0 < x < y < 1 (a) Find the joint moment generating function of X and Y . (b) Determine the marginal distributions of X and Y . (c) Find Cov(X; Y ). 120 3. MULTIVARIATE RANDOM VARIABLES 4. Functions of Two or More Random Variables In this chapter we look at techniques for determining the distributions of functions of two or more random variables. These techniques are extremely important for determining the distributions of estimators such as maximum likelihood estimators, the distributions of pivotal quantities for constructing con…dence intervals, and the distributions of test statistics for testing hypotheses. In Section 4:1 we extend the cumulative distribution function technique introduced in Section 2.6 to a function of two or more random variables. In Section 4:2 we look at a method for determining the distribution of a one-to-one transformation of two or more random variables which is an extension of the result in Theorem 2.6.8. In particular we show how the t distribution, which you would have used in your previous statistics course, arises as the ratio of two independent Chi-squared random variables each divided by their degrees of freedom. In Section 4:3 we see how moment generating functions can be used for determining the distribution of a sum of random variables which is an extension of Theorem 2.10.4. 
In particular we prove that a linear combination of independent Normal random variables has a Normal distribution. This is a result which was used extensively in previous probability and statistics courses. 4.1 Cumulative Distribution Function Technique Suppose X1 ; X2 ; : : : ; Xn are continuous random variables with joint probability density function f (x1 ; x2 ; : : : ; xn ). The probability density function of Y = h (X1 ; X2 ; : : : ; Xn ) can be determined using the cumulative distribution function technique that was used in Section 2.6 for the case n = 1. 4.1.1 Example Suppose X and Y are continuous random variables with joint probability density function f (x; y) = 3y for 0 < x < y < 1 and 0 otherwise. Determine the probability density function of T = XY . 121 122 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES Solution The support set of (X; Y ) is A = f(x; y) : 0 < x < y < 1g which is the union of the regions E and F shown in Figure 4.1 y 1 F x=y x=t/y E √t - x 1 0 Figure 4.1: Support set for Example 4.1.1 For 0 < t < 1 G (t) = P (T t) = P (XY t) = Z Z 3ydxdy (x;y) 2 E Due to the shape of the region E, the double integral over the region E would have to be written as the sum of two double integrals. It is easier to …nd G (t) using G (t) = Z Z 3ydxdy (x;y) 2 E = 1 Z Z 3ydxdy (x;y) 2 F Z1 = 1 Zy Z1 3ydxdy = 1 p y= t x=t=y = 1 Z1 p 3y y p t y = 1 y = 3t 2t3=2 3ty j1p dy = 1 p t =1 for 0 < t < 1 t Z1 t 3 3y xjyt=y dy 1 3y 2 3t dy t 3t t3=2 3t3=2 4.1. CUMULATIVE DISTRIBUTION FUNCTION TECHNIQUE The cumulative distribution function for T 8 > > < 3t G (t) = > > : 123 is 0 t 0 2t3=2 0 < t < 1 1 t 1 Now a cumulative distribution function must be a continuous function for all real values. Therefore as a check we note that lim 3t 2t3=2 = 0 = G (0) lim 3t 2t3=2 = 1 = G (1) t!0+ and t!1 so indeed G (t) is a continuous function for all t 2 <. Since d dt G (t) = 0 for t < 0 and t > 0, and d d G (t) = 3t 2t3=2 dt dt = 3 3t1=2 for 0 < t < 1 the probability density function of T is g (t) = 3 3t1=2 for 0 < t < 1 and 0 otherwise. 4.1.2 Exercise Suppose X and Y are continuous random variables with joint probability density function f (x; y) = 3y for 0 x y 1 and 0 otherwise. Find the probability density function of S = Y =X. 4.1.3 Example Suppose X1 ; X2 ; : : : ; Xn are independent and identically distributed continuous random variables each with probability density function f (x) and cumulative distribution function F (x). Find the probability density function of U = max (X1 ; X2 ; : : : ; Xn ) = X(n) and V = min (X1 ; X2 ; : : : ; Xn ) = X(1) . 124 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES Solution For u 2 <, the cumulative distribution function of U is G (u) = P (U u) = P [max (X1 ; X2 ; : : : ; Xn ) = P (X1 u; X2 u; : : : ; Xn = P (X1 u) P (X2 u] u) u) : : : P (Xn u) since X1 ; X2 ; : : : ; Xn are independent random variables n Q = P (Xi i=1 n Q = u) F (u) since X1 ; X2 ; : : : ; Xn are identically distributed i=1 = [F (u)]n Suppose A is the support set of Xi , i = 1; 2; : : : ; n. The probability density function of U is d d G (u) = [F (u)]n du du = n [F (u)]n 1 f (u) for u 2 A g (u) = and 0 otherwise. 
For v 2 <, the cumulative distribution function of V is H (v) = P (V v) = P [min (X1 ; X2 ; : : : ; Xn ) v] = 1 P [min (X1 ; X2 ; : : : ; Xn ) > v] = 1 P (X1 > v; X2 > v; : : : ; Xn > v) = 1 P (X1 > v) P (X2 > v) : : : P (Xn > v) since X1 ; X2 ; : : : ; Xn are independent random variables n Q = 1 P (Xi > v) = 1 i=1 n Q [1 F (v)] since X1 ; X2 ; : : : ; Xn are identically distributed i=1 = 1 [1 F (v)]n Suppose A is the support set of Xi , i = 1; 2; : : : ; n. The probability density function of V is d d H (v) = f1 [1 F (v)]n g dv dv = n [1 F (v)]n 1 f (v) for v 2 A h (v) = and 0 otherwise. 4.2. ONE-TO-ONE TRANSFORMATIONS 4.2 125 One-to-One Transformations In this section we look at how to determine the joint distribution of a one-to-one transformation of two or more random variables. We concentrate on the bivariate case for ease of presentation. The method does extend to more than two random variables. See Problems 12 and 13 at the end of this chapter for examples of one-to-one transformations of three random variables. We begin with some notation and a theorem which gives su¢ cient conditions for determining whether a transformation is one-to-one in the bivariate case followed by the theorem which gives the joint probability density function for the two new random variables. Suppose the transformation S de…ned by u = h1 (x; y) v = h2 (x; y) is a one-to-one transformation for all (x; y) 2 RXY and that S maps the region RXY into the region RU V in the uv plane. Since S : (x; y) ! (u; v) is a one-to-one transformation there exists a inverse transformation T de…ned by x = w1 (u; v) y = w2 (u; v) such that T = S T is 1 : (u; v) ! (x; y) for all (u; v) 2 RU V . The Jacobian of the transformation @(x; y) = @(u; v) where 4.2.1 @(u;v) @(x;y) @x @u @y @u @x @v @y @v = @(u; v) @(x; y) 1 is the Jacobian of the transformation S. Inverse Mapping Theorem Consider the transformation S de…ned by u = h1 (x; y) v = h2 (x; y) Suppose the partial derivatives @u @u @v @x , @y , @x and @v @y are continuous functions @(u;v) @(x;y) 6= 0 at the point (a; b). bourhood of the point (a; b). Suppose also that a neighbourhood of the point (a; b) in which S has an inverse. in the neighThen there is Note: These are su¢ cient but not necessary conditions for the inverse to exist. 126 4.2.2 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES Theorem - One-to-One Bivariate Transformations Let X and Y be continuous random variables with joint probability density function f (x; y) and let RXY = f(x; y) : f (x; y) > 0g be the support set of (X; Y ). Suppose the transformation S de…ned by U = h1 (X; Y ) V = h2 (X; Y ) is a one-to-one transformation with inverse transformation X = w1 (U; V ) Y = w2 (U; V ) Suppose also that S maps RXY into RU V . Then g(u; v), the joint joint probability density function of U and V , is given by g(u; v) = f (w1 (u; v); w2 (u; v)) @(x; y) @(u; v) for all (u; v) 2 RU V . (Compare Theorem 2.6.8 for univariate random variables.) 4.2.3 Proof We want to …nd g(u; v), the joint probability density function of the random variables U and V . Suppose S 1 maps the region B RU V into the region A RXY then P [(U; V ) 2 B] ZZ = g(u; v)dudv (4.1) B = P [(X; Y ) 2 A] ZZ = f (x; y)dxdy A = ZZ f (w1 (u; v); w2 (u; v)) @(x; y) dudv @(u; v) (4.2) B where the last line follows by the Change of Variable Theorem. Since this is true for all B RU V we have, by comparing (4.1) and (4.2), that the joint probability density function of U and V is given by g(u; v) = f (w1 (u; v); w2 (u; v)) @(x; y) @(u; v) for all (u; v) 2 RU V . 
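Densities obtained either by the cumulative distribution function technique of Section 4.1 or by Theorem 4.2.2 can always be checked by simulation. As an illustration, here is a minimal Monte Carlo sketch (Python with NumPy; not part of the original notes, and the seed and sample size are arbitrary) for Example 4.1.1. It samples (X, Y) from f(x, y) = 3y, 0 < x < y < 1, by drawing Y from its marginal probability density function 3y^2 using the inverse cumulative distribution function Y = U^{1/3}, and then drawing X | Y = y ~ Uniform(0, y). The simulated values of T = XY are then compared with G(t) = 3t - 2t^{3/2}.

import numpy as np

rng = np.random.default_rng(123)      # arbitrary seed
m = 1_000_000

# Sample (X, Y) with joint pdf f(x, y) = 3y on 0 < x < y < 1
y = rng.uniform(size=m) ** (1 / 3)    # marginal pdf of Y is 3y^2 on (0, 1)
x = rng.uniform(size=m) * y           # X | Y = y ~ Uniform(0, y)
t = x * y

# Check against G(1/4) = 3(1/4) - 2(1/4)^{3/2} = 1/2
print(np.mean(t <= 0.25))
# Check against E(T) = integral over (0, 1) of t(3 - 3 t^{1/2}) dt = 3/10
print(t.mean())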
In the following example we see how Theorem 4.2.2 can be used to show that the sum of two independent Exponential(1) random variables is a Gamma random variable. 4.2. ONE-TO-ONE TRANSFORMATIONS 4.2.4 127 Example Suppose X Exponential(1) and Y Exponential(1) independently. Find the joint probability density function of U = X + Y and V = X. Show that U Gamma(2; 1). Solution Since X Exponential(1) and Y density function of X and Y is Exponential(1) independently, the joint probability f (x; y) = f1 (x) f2 (y) = e = e x e y x y with support set RXY = f(x; y) : x > 0; y > 0g which is shown in Figure 4.2. y . . . . . . ... x 0 Figure 4.2: Support set RXY for Example 4.2.6 The transformation S : U =X +Y, V =X has inverse transformation X =V, Y =U V Under S the boundaries of RXY are mapped as (k; 0) ! (k; k) (0; k) ! (k; 0) for k 0 for k 0 and the point (1; 2) is mapped to the point (3; 1). Thus S maps RXY into RU V = f(u; v) : 0 < v < ug 128 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES v v=u ... u 0 Figure 4.3: Support set RU V for Example 4.2.4 as shown in Figure 4.3. The Jacobian of the inverse transformation is @ (x; y) = @ (u; v) @x @u @y @u @x @v @y @v = 0 1 1 = @y @v 1 Note that the transformation S is a linear transformation and so we would expect the Jacobian of the transformation to be a constant. The joint probability density function of U and V is given by @(x; y) @(u; v) g (u; v) = f (w1 (u; v); w2 (u; v)) = f (v; u = e u v) j 1j for (u; v) 2 RU V and 0 otherwise. To …nd the marginal probability density functions for U we note that the support set RU V is not rectangular and the range of integration for v will depend on u. The marginal probability density function of U is g1 (u) = Z1 1 = ue g (u; v) dv = e u Zu dv v=0 u for u > 0 and 0 otherwise which is the probability density function of a Gamma(2; 1) random variable. Therefore U Gamma(2; 1). 4.2. ONE-TO-ONE TRANSFORMATIONS 129 In the following exercise we see how the sum and di¤erence of two independent Exponential(1) random variables give a Gamma random variable and a Double Exponential random variable respectively. 4.2.5 Exercise Suppose X Exponential(1) and Y Exponential(1) independently. Find the joint probability density function of U = X + Y and V = X Y . Show that U Gamma(2; 1) and V Double Exponential(0; 1). In the following example we see how the Gamma and Beta distributions are related. 4.2.6 Example Suppose X Gamma(a; 1) and Y Gamma(b; 1) independently. Find the joint probability X density function of U = X + Y and V = X+Y . Show that U Gamma(a + b; 1) and V X X+Y Beta(a; b) independently. Find E (V ) by …nding E . Solution Since X Gamma(a; 1) and Y function of X and Y is Gamma(b; 1) independently, the joint probability density f (x; y) = f1 (x) f2 (y) = = xa 1e x yb (a) 1e y (b) xa 1 y b 1 e x y (a) (b) with support set RXY = f(x; y) : x > 0; y > 0g which is the same support set as shown in Figure 4.2. The transformation S : U =X +Y, V = X X +Y has inverse transformation X = U V , Y = U (1 V) Under S the boundaries of RXY are mapped as (k; 0) ! (k; 1) (0; k) ! (k; 0) for k 0 for k 0 and the point (1; 2) is mapped to the point 3; 13 . Thus S maps RXY into RU V = f(u; v) : u > 0; 0 < v < 1g as shown in Figure 4.4. 130 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES v 1 ... 
u 0 Figure 4.4: Support set RU V for Example 4.2.6 The Jacobian of the inverse transformation is @ (x; y) = @ (u; v) v 1 u u v = uv u + uv = u The joint probability density function of U and V is given by g (u; v) = f (uv; u (1 = (uv) = ua+b a 1 1 e v)) j uj [u (1 v)]b 1 e (a) (b) uv uj v)b (a) (b) a 1 (1 uj 1 for (u; v) 2 RU V and 0 otherwise. To …nd the marginal probability density functions for U and V we note that the support set of U is B1 = fu : u > 0g and the support set of V is B2 = fv : 0 < v < 1g. Since a+b 1 u g (u; v) = u | {z e } h1 (u) va | v)b 1 (a) (b) {z } 1 (1 h2 (v) for all (u; v) 2 B1 B2 then, by the Factorization Theorem for Independence, U and V are independent random variables. Also by the Factorization Theorem for Independence the probability density function of U must be proportional to h1 (u). By writing g (u; v) = ua+b 1 e u (a + b) (a + b) a v (a) (b) 1 (1 v)b 1 we note that the function in the …rst square bracket is the probability density function of a Gamma(a + b; 1) random variable and therefore U Gamma(a + b; 1). It follows that the 4.2. ONE-TO-ONE TRANSFORMATIONS 131 function in the second square bracket must be the probability density function of V which is a Beta(a; b) probability density function. Therefore U Gamma(a + b; 1) independently of V Beta(a; b). In Chapter 2, Problem 9 the moments of a Beta random variable were found by integration. Here is a rather clever way of …nding E (V ) using the mean of a Gamma random variable. In Exercise 2.7.9 it was shown that the mean of a Gamma( ; ) random variable is . Now E (U V ) = E (X + Y ) since X X = E (X) = (a) (1) = a (X + Y ) Gamma(a; 1). But U and V are independent random variables so a = E (U V ) = E (U ) E (V ) But since U Gamma(a + b; 1) we know E (U ) = a + b so a = E (U ) E (V ) = (a + b) E (V ) Solving for E (V ) gives a a+b Higher moments can be found in a similar manner using the higher moments of a Gamma random variable. E (V ) = 4.2.7 Exercise Suppose X Beta(a; b) and Y Beta(a + b; c) independently. Find the joint probability density function of U = XY and V = X. Show that U Beta(a; b + c). In the following example we see how a rather unusual transformation can be used to transform two independent Uniform(0; 1) random variables into two independent N(0; 1) random variables. This transformation is referred to as the Box-Muller Transformation after the two statisticians George E. P. Box and Mervin Edgar Muller who published this result in 1958. 4.2.8 Example - Box-Muller Transformation Suppose X Uniform(0; 1) and Y bility density function of Uniform(0; 1) independently. Find the joint proba- U = ( 2 log X)1=2 cos (2 Y ) V = ( 2 log X)1=2 sin (2 Y ) Show that U N(0; 1) and V N(0; 1) independently. Explain how you could use this result to generate independent observations from a N(0; 1) distribution. 132 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES Solution Since X Uniform(0; 1) and Y function of X and Y is Uniform(0; 1) independently, the joint probability density f (x; y) = f1 (x) f2 (y) = (1) (1) = 1 with support set RXY = f(x; y) : 0 < x < 1; 0 < y < 1g. Consider the transformation S : U = ( 2 log X)1=2 cos (2 Y ) , V = ( 2 log X)1=2 sin (2 Y ) To determine the support set of (U; V ) we note that 0 < y < 1 implies 1 < cos (2 y) < 1. Also 0 < x < 1 implies 0 < ( 2 log X)1=2 < 1. Therefore u = ( 2 log x)1=2 cos (2 y) takes on values in the interval ( 1; 1). By a similar argument v = ( 2 log x)1=2 sin (2 y) also takes on values in the interval ( 1; 1). 
Therefore the support set of (U; V ) is RU V = <2 . The inverse of the transformation S can be determined. In particular we note that since h i2 h i2 U 2 + V 2 = ( 2 log X)1=2 cos (2 Y ) + ( 2 log X)1=2 sin (2 Y ) = ( 2 log X) cos2 (2 Y ) + sin2 (2 Y ) = 2 log X and sin (2 Y ) V = = tan (2 Y ) U cos (2 Y ) the inverse transformation is X=e 1 2 (U 2 +V 2 ) , Y = 1 arctan 2 V U To determine the Jacobian of the inverse transformation it is simpler to use the result 2 3 1 @u @u 1 @(x; y) @(u; v) @x @y = = 4 @v @v 5 @(u; v) @(x; y) @x @y Since @u @x @v @x = = = 1 x 1 x @u @y @v @y ( 2 log x) ( 2 log x) 1=2 cos (2 y) 1=2 sin (2 y) 2 cos2 (2 y) + sin2 (2 y) x 2 x 2 ( 2 log x) 1=2 sin (2 y) 2 ( 2 log x) 1=2 cos (2 y) 4.2. ONE-TO-ONE TRANSFORMATIONS 133 Therefore @(x; y) @(u; v) @(u; v) 1 = @(x; y) x 2 1 2 2 1 e 2 (u +v ) 2 = = = 1 2 x The joint probability density function of U and V is g (u; v) = f (w1 (u; v); w2 (u; v)) 1 e 2 = (1) = 1 e 2 1 2 1 2 @(x; y) @(u; v) (u2 +v2 ) (u2 +v2 ) for (u; v) 2 <2 The support set of U is < and the support set of V is <. Since g (u; v) can be written as g (u; v) = 1 p e 2 1 2 u 2 1 p e 2 1 2 v 2 for all (u; v) 2 < < = <2 , therefore by the Factorization Theorem for Independence, U and V are independent random variables. We also note that the joint probability density function is the product of two N(0; 1) probability density functions. Therefore U N(0; 1) and V N(0; 1) independently. Let x and y be two independent Uniform(0; 1) observations which have been generated using a random number generator. Then from the result above we have that u = ( 2 log x)1=2 cos (2 y) v = ( 2 log x)1=2 sin (2 y) are two independent N(0; 1) observations. The result in the following theorem is one that was used (without proof) in a previous statistics course such as STAT 221/231/241 to construct con…dence intervals and test hypotheses regarding the mean in a N ; 2 model when the variance 2 is unknown. 4.2.9 Theorem - t Distribution If X 2 (n) independently of Z N(0; 1) then Z T =p X=n t(n) 134 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES Proof The transformation T = p Z X=n is not a one-to-one transformation. However if we add the variable U = X to complete the transformation and consider the transformation Z S: T =p , U =X X=n then this transformation has an inverse transformation given by U n X = U, Z = T Since X X and Z is 2 (n) independently of Z f (x; z) = f1 (x) f2 (z) = = 1=2 N(0; 1) the joint probability density function of 2n=2 1 xn=2 (n=2) 1 p xn=2 2(n+1)=2 (n=2) 1 e x=2 1 e e x=2 1 p e 2 z 2 =2 z 2 =2 with support set RXZ = f(x; z) : x > 0; z 2 <g. The transformation S maps RXZ into RT U = f(t; u) : t 2 <; u > 0g. The Jacobian of the inverse transformation is @ (x; z) = @ (t; u) @x @t @z @t @x @u @z @u 0 = 1 u 1=2 n @z @u = u n 1=2 The joint probability density function of U and V is given by g (t; u) = f = = t u n 2(n+1)=2 2(n+1)=2 1=2 u n ;u 1=2 u 1=2 1 2 p un=2 1 e u=2 e t u=(2n) n (n=2) 2 1 p u(n+1)=2 1 e u(1+t =n)=2 for (t; u) 2 RT U (n=2) n and 0 otherwise. To determine the distribution of T we need to …nd the marginal probability density function for T . g1 (t) = Z1 g (t; u) du 1 = 2(n+1)=2 1 p (n=2) n Z1 0 u(n+1)=2 1 e u(1+t2 =n)=2 du 4.2. ONE-TO-ONE TRANSFORMATIONS 2 135 1 2 g1 (t) = 2(n+1)=2 1 p (n=2) n Z1 " 0 = 2(n+1)=2 1 p 2(n+1)=2 (n=2) n = 1 p (n=2) n 1+ = n+1 2p 1+ (n=2) n t2 " e y # 1 (n+1)=2 Z1 t2 1+ n dy. 
Note that when " 1 t2 2 1+ n y (n+1)=2 1 e y # dy dy 0 (n+1)=2 n+1 2 n t2 n # 1 (n+1)=2 1 t2 2y 1 + n 1 t2 n Let y = u2 1 + tn so that u = 2y 1 + tn and du = 2 1 + u = 0 then y = 0, and when u ! 1 then y ! 1. Therefore (n+1)=2 for t 2 < which is the probability density function of a random variable with a t(n) distribution. Therefore Z T =p X=n as required. 4.2.10 t(n) Example Use Theorem 4.2.9 to …nd E (T ) and V ar (T ) if T t(n). Solution If X 2 (n) independently of Z N(0; 1) then we know from the previous theorem that Z T =p X=n t(n) Now E (T ) = E Z p X=n ! = p nE (Z) E X 1=2 since X and Z are independent random variables. Since E (Z) = 0 it follows that E (T ) = 0 136 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES as long as E X 1=2 E X 2 (n) exists. Since X k = Z1 xk 2n=2 1 xn=2 (n=2) 1 e x=2 dx 0 = 1 (n=2) Z1 xk+n=2 1 (n=2) Z1 (2y)k+n=2 2k+n=2 2n=2 (n=2) Z1 y k+n=2 2n=2 1 e x=2 dx let y = x=2 0 = 2n=2 1 e y (2) dy 0 = 1 e y dy 0 = 2k (n=2 + k) (n=2) (4.3) which exists for n=2 + k > 0. If k = 1=2 then the integral exists for n=2 > 1=2 or n > 1. Therefore E (T ) = 0 for n > 1 Now V ar (T ) = E T 2 = E T2 and E T2 = E Since Z Z2 X=n [E (T )]2 since E (T ) = 0 = nE Z 2 E X 1 N(0; 1) then E Z2 = V ar (Z) + [E (Z)]2 = 1 + 02 = 1 Also by (4.3) E X 1 = = 2 1 (n=2 1) 1 = (n=2) 2 (n=2 1) 1 n 2 which exists for n > 2. Therefore V ar (T ) = E T 2 = nE Z 2 E X 1 = n (1) n 2 n = for n > 2 n 2 1 4.3. MOMENT GENERATING FUNCTION TECHNIQUE 137 The following theorem concerns the F distribution which is used in testing hypotheses about the parameters in a multiple linear regression model. 4.2.11 If X Theorem - F Distribution 2 (n) independently of Y 2 (m) U= 4.2.12 then X=n Y =m F(n; m) Exercise (a) Prove Theorem 4.2.11. Hint: Complete the transformation with V = Y . (b) Find E(U ) and V ar(U ) and note for what values of n and m that they exist. Hint: Use the technique and results of Example 4.2.10. 4.3 Moment Generating Function Technique The moment generating function technique is particularly useful in determining the distribution of a sum of two or more independent random variables if the moment generating functions of the random variables exist. 4.3.1 Theorem Suppose X1 ; X2 ; : : : ; Xn are independent random variables and Xi has moment generating function Mi (t) which exists for t 2 ( h; h) for some h > 0. The moment generating function n P of Y = Xi is given by i=1 MY (t) = n Q Mi (t) i=1 for t 2 ( h; h). If the Xi ’s are independent and identically distributed random variables each with n P moment generating function M (t) then Y = Xi has moment generating function i=1 MY (t) = [M (t)]n for t 2 ( h; h). Proof The moment generating function of Y = n P i=1 Xi is 138 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES MY (t) = E etY = E exp t n P Xi i=1 = = n Q i=1 n Q E etXi Mi (t) i=1 since X1 ; X2 ; : : : ; Xn are independent random variables for t 2 ( h; h) If X1 ; X2 ; : : : ; Xn are identically distributed each with moment generating function M (t) then n Q MY (t) = M (t) i=1 = [M (t)]n for t 2 ( h; h) as required. Note: This theorem in conjunction with the Uniqueness Theorem for Moment Generating Functions can be used to …nd the distribution of Y . Here is a summary of results about sums of random variables for the named distributions. 
4.3.2 (1) If Xi Special Results Binomial(ni ; p), i = 1; 2; : : : ; n independently, then n P Xi Binomial i=1 (2) If Xi Poisson( i ), i = 1; 2; : : : ; n independently, then n P Xi Poisson n P i i=1 Negative Binomial(ki ; p), i = 1; 2; : : : ; n independently, then n P Xi Negative Binomial i=1 (4) If Xi ni ; p i=1 i=1 (3) If Xi n P n P ki ; p i=1 Exponential( ), i = 1; 2; : : : ; n independently, then n P Xi Gamma(n; ) i=1 (5) If Xi Gamma( i ; ), i = 1; 2; : : : ; n independently, then n P i=1 Xi Gamma n P i=1 i; 4.3. MOMENT GENERATING FUNCTION TECHNIQUE (6) If Xi 2 (k ), i i = 1; 2; : : : ; n independently, then n P n P 2 Xi i=1 (7) If Xi N ; 139 2 ki i=1 , i = 1; 2; : : : ; n independently, then n P 2 Xi 2 (n) i=1 Proof (1) Suppose Xi function of Xi is Binomial(ni ; p), i = 1; 2; : : : ; n independently. The moment generating ni Mi (t) = pet + q for t 2 < for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = n P Xi is i=1 MY (t) = = n Q i=1 n Q Mi (t) pet + q ni i=1 n P pet + q = ni i=1 which is the moment generating function of a Binomial for t 2 < n P ni ; p random variable. There- i=1 fore by the Uniqueness Theorem for Moment Generating Functions n P Xi Binomial ni ; p . i=1 i=1 (2) Suppose Xi tion of Xi is n P Poisson( i ), i = 1; 2; : : : ; n independently. The moment generating funct Mi (t) = e i (e 1) for t 2 < for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = n P Xi is i=1 MY (t) = = n Q i=1 n Q Mi (t) t e i (e i=1 = e n P i=1 i 1) ! (et 1) which is the moment generating function of a Poisson for t 2 < n P i=1 i by the Uniqueness Theorem for Moment Generating Functions random variable. Therefore n P i=1 Xi Poisson n P i=1 i . 140 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES (3) Suppose Xi Negative Binomial(ki ; p), i = 1; 2; : : : ; n independently. The moment generating function of Xi is p 1 qet Mi (t) = ki for t < log (q) for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = n P Xi is i=1 MY (t) = = n Q i=1 n Q i=1 Mi (t) 1 p qet n P p 1 qet = ki ki i=1 for t < log (q) which is the moment generating function of a Negative Binomial n P ki ; p random vari- i=1 able. Therefore by the Uniqueness Theorem for Moment Generating Functions n n P P ki ; p . Xi Negative Binomial i=1 i=1 (4) Suppose Xi Exponential( ), i = 1; 2; : : : ; n independently. The moment generating function of each Xi is 1 1 for t < M (t) = 1 t n P for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = Xi is i=1 n 1 MY (t) = [M (t)]n = 1 = t for t < 1 which is the moment generating function of a Gamma(n; ) random variable. Therefore by n P the Uniqueness Theorem for Moment Generating Functions Xi Gamma(n; ). i=1 (5) Suppose Xi function of Xi is Gamma( i ; ), i = 1; 2; : : : ; n independently. The moment generating Mi (t) = i 1 1 for t < t 1 for i = 1; 2; : : : ; n. By Theorem 4.3.1 the moment generating function of Y = n P i=1 MY (t) = n Q Mi (t) = i=1 i=1 = 1 1 n Q n P i=1 t i 1 1 t i for t < 1 Xi is 4.3. MOMENT GENERATING FUNCTION TECHNIQUE n P which is the moment generating function of a Gamma 141 i; random variable. There- i=1 fore by the Uniqueness Theorem for Moment Generating Functions n P Xi Gamma i=1 (6) Suppose Xi of Xi is 2 (k i ), n P i; i=1 i = 1; 2; : : : ; n independently. The moment generating function Mi (t) = ki 1 1 1 2 for t < 2t for i = 1; 2; : : : ; n. 
By Theorem 4.3.1 the moment generating function of Y = n P Xi is i=1 MY (t) = n Q Mi (t) i=1 = n Q 1 i=1 = ki 1 1 1 2t n P ki i=1 2t 2 which is the moment generating function of a for t < n P ki 1 2 random variable. Therefore by i=1 the Uniqueness Theorem for Moment Generating Functions n P Xi i=1 (7) Suppose Xi Theorem 2.6.3 N ; 2 2 n P ki . i=1 , i = 1; 2; : : : ; n independently. Then by Example 2.6.9 and 2 Xi 2 (1) for i = 1; 2; : : : ; n and by (6) n P Xi 2 2 (n) i=1 4.3.3 Exercise Suppose X1 ; X2 ; : : : ; Xn are independent and identically distributed random variables with moment generating function M (t), E (Xi ) = , and V ar (Xi ) = 2 < 1. Give an exn P p pression for the moment generating function of Z = n X = where X = n1 Xi in i=1 terms of M (t). The following theorem is one that was used in your previous probability and statistics courses without proof. The method of moment generating functions now allows us to easily proof this result. . 142 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES 4.3.4 Theorem - Linear Combination of Independent Normal Random Variables If Xi N( i ; 2 ), i i = 1; 2; : : : ; n independently, then n P ai Xi N i=1 Proof Suppose Xi Xi is 2 ), i N( i ; n P i=1 n P ai i ; i=1 a2i 2 i i = 1; 2; : : : ; n independently. The moment generating function of Mi (t) = e 2 2 i t+ i t =2 for t 2 < for i = 1; 2; : : : ; n. The moment generating function of Y = n P ai Xi is i=1 MY (t) = E etY = E exp t n P ai Xi i=1 = = = n Q E e(ai t)Xi i=1 n Q i=1 n Q since X1 ; X2 ; : : : ; Xn are independent random variables Mi (ai t) e 2 2 2 i ai t+ i ai t =2 i=1 = exp n P i=1 ai i n P t+ i=1 a2i 2 i t2 =2 which is the moment generating function of a N for t 2 < n P i=1 ai i ; n P i=1 a2i 2 i random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions n P ai Xi N i=1 4.3.5 Corollary If Xi N( ; 2 ), n P i=1 ai i ; n P i=1 a2i i = 1; 2; : : : ; n independently then n P Xi 2 N n ;n i=1 and X= n 1 P Xi n i=1 2 N ; n 2 i 4.3. MOMENT GENERATING FUNCTION TECHNIQUE 143 Proof To prove that n P Xi 2 N n ;n i=1 let ai = 1, i 2 i = , and = 2 in Theorem 4.3.4 to obtain n P Xi n P N i=1 or n P ; i=1 n P Xi 2 i=1 2 N n ;n i=1 To prove that X= we note that n 1 P Xi n i=1 n P X= i=1 Let ai = n1 , i = , and X= 2 i = n P i=1 or 2 1 n 2 N ; 1 n n Xi in Theorem 4.3.4 to obtain Xi n P N 1 n i=1 ; n P i=1 1 n 2 2 ! 2 X N ; n The following identity will be used in proving Theorem 4.3.8. 4.3.6 Useful Identity n P i=1 4.3.7 (Xi )2 = n P Xi X 2 +n X 2 i=1 Exercise Prove the identity 4.3.6 As mentioned previously the t distribution is used to construct con…dence intervals and test hypotheses regarding the mean in a N ; 2 model. We are now able to prove the theorem on which these results are based. 144 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES 4.3.8 Theorem If Xi N( ; 2 ), i = 1; 2; : : : ; n independently then 2 X independently of 1) S 2 (n 2 N n P ; Xi n 2 X i=1 = where S2 = 2 2 n P Xi (n 1) 2 X i=1 n 1 Proof For a proof that X and S 2 are independent random variables please see Problem 16. By identity 4.3.6 n P )2 = (Xi i=1 Dividing both sides by 2 n P Xi X 2 +n X 2 i=1 gives n P |i=1 2 Xi = {z | } Y (n 1) S 2 2 {z 2 + } U X p = n | {z } V Since X and S 2 are independent random variables, it follows that U and V are independent random variables. By 4.3.2(7) Y = n P 2 Xi 2 (n) i=1 with moment generating function MY (t) = (1 X N ; 2 n 2t) n=2 for t < 1 2 was proved in Corollary 4.3.5. By Example 2.6.9 X p = n N (0; 1) and by Theorem 2.6.3 V = X p = n 2 2 (1) (4.4) 4.3. 
MOMENT GENERATING FUNCTION TECHNIQUE 145 with moment generating function MV (t) = (1 1=2 2t) for t < 1 2 (4.5) Since U and V are independent random variables and Y = U + V then MY (t) = E etY = E et(U +V ) = E etU E etV = MU (t) MV (t) (4.6) Substituting (4.4) and (4.5) into (4.6) gives (1 2t) n=2 = MU (t) (1 2t) 1=2 for t < 1 2 or 1 2 2 which is the moment generating function of a (n 1) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions MU (t) = (1 U= 4.3.9 Theorem If Xi N( ; 2 ), (n (n 1)=2 2t) 1) S 2 2 2 for t < (n 1) i = 1; 2; : : : ; n independently then X p S= n t (n 1) Proof X p X Z = n p =r =q S= n (n 1)S 2 U 2 n 1 n 1 where Z= X p = n (n 1) S 2 N (0; 1) independently of U= 2 2 (n 1) Therefore by Theorem 4.2.9 X p S= n t (n 1) The following theorem is useful for testing the equality of variances in a two sample Normal model. 146 4.3.10 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES Theorem Suppose X1 ; X2 ; : : : ; Xn are independent N 1 ; 21 random variables, and independently Y1 ; Y2 ; : : : ; Ym are independent N 2 ; 22 random variables. Let S12 = Then 4.3.11 n P Xi X i=1 n and S22 = 1 S12 = S22 = 2 2 1 2 2 F (n 1; m m P Yi Y i=1 m 1) Exercise Prove Theorem 4.3.10. Hint: Use Theorems 4.3.8 and 4.2.11. 1 2 4.4. CHAPTER 4 PROBLEMS 4.4 147 Chapter 4 Problems 1. Show that if X and Y are independent random variables then U = h (X) and V = g (Y ) are also independent random variables where h and g are real-valued functions. 2. Suppose X and Y are continuous random variables with joint probability density function f (x; y) = 24xy for 0 < x + y < 1; 0 < x < 1; 0 < y < 1 and 0 otherwise. (a) Find the joint probability density function of U = X + Y and V = X. Be sure to specify the support set of (U; V ). (b) Find the marginal probability density function of U and the marginal probability density function of V . Be sure to specify their support sets. 3. Suppose X and Y are continuous random variables with joint probability density function f (x; y) = e y for 0 < x < y < 1 and 0 otherwise. (a) Find the joint probability density function of U = X + Y and V = X. Be sure to specify the support set of (U; V ). (b) Find the marginal probability density function of U and the marginal probability density function of V . Be sure to specify their support sets. 4. Suppose X and Y are nonnegative continuous random variables with joint probability density function f (x; y). Show that the probability density function of U = X + Y is given by Z1 g (u) = f (v; u v) dv 0 Hint: Consider the transformation U = X + Y and V = X. 5. Suppose X and Y are continuous random variables with joint probability density function f (x; y) = 2 (x + y) for 0 < x < y < 1 and 0 otherwise. (a) Find the joint probability density function of U = X and V = XY . Be sure to specify the support set of (U; V ). 148 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES (b) Are U and V independent random variables? (c) Find the marginal probability density function’s of U and V . Be sure to specify their support sets. 6. Suppose X and Y are continuous random variables with joint probability density function f (x; y) = 4xy for 0 < x < 1; 0 < y < 1 and 0 otherwise. (a) Find the probability density function of T = X + Y using the cumulative distribution function technique. (b) Find the joint probability density function of S = X and T = X + Y . Find the marginal probability density function of T and compare your answer to the one you obtained in (a). 
(c) Find the joint probability density function of U = X 2 and V = XY . Be sure to specify the support set of (U; V ). (d) Find the marginal probability density function’s of U and V: (e) Find E(V 3 ): (Hint: Are X and Y independent random variables?) 7. Suppose X and Y are continuous random variables with joint probability density function f (x; y) = 4xy for 0 < x < 1; 0 < y < 1 and 0 otherwise. (a) Find the joint probability density function of U = X=Y and V = XY: Be sure to specify the support set of (U; V ). (b) Are U and V independent random variables? (c) Find the marginal probability density function’s of U and V . Be sure to specify their support sets. 8. Suppose X and Y are independent Uniform(0; ) random variables. Find the probability density function of U = X Y: (Hint: Complete the transformation with V = X + Y:) 9. Suppose Z1 N(0; 1) and Z2 X1 = where 1< 1; 2 1 + < 1, (a) Show that (X1 ; X2 )T N(0; 1) independently. Let 1 Z1 , 1, X2 = 2 > 0 and BVN( ; ). 2 + 2[ 1< Z1 + 1 < 1. 2 1=2 Z2 ] 4.4. CHAPTER 4 PROBLEMS 149 (b) Show that (X )T 1 (X where Z = (Z1 ; Z2 )T . 10. Suppose X V =X Y. N ; 2 2 (2). ) and Y N ; 2 Hint: Show (X )T 1 (X ) = ZT Z independently. Let U = X + Y and (a) Find the joint moment generating function of U and V . (b) Use (a) to show that U and V are independent random variables. 11. Let X and Y be independent N(0; 1) random variables and let U = X=Y . (a) Show that U Cauchy(1; 0). (b) Show that the Cauchy(1; 0) probability density function is the same as the t(1) probability density function 12. Let X1 ; X2 ; X3 be independent Exponential(1) random variables. Let the random variables Y1 ; Y2 ; Y3 be de…ned by Y1 = X1 ; X1 + X2 Y2 = X1 + X2 ; X1 + X2 + X3 Y3 = X1 + X2 + X3 Show that Y1 ; Y2 ; Y3 are independent random variables and …nd their marginal probability density function’s. 13. Let X1 ; X2 ; X3 be independent N(0; 1) random variables. Let the random variables Y1 ; Y2 ; Y3 be de…ned by X1 = Y1 cos Y2 sin Y3; X2 = Y1 sin Y2 sin Y3 ; X3 = Y1 cos Y3 for 0 < y1 < 1; 0 < y2 < 2 ; 0 < y3 < Show that Y1 ; Y2 ; Y3 are independent random variables and …nd their marginal probability density function’s. 14. Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Poisson( ) distribution. Find n P the conditional probability function of X1 ; X2 ; : : : ; Xn given T = Xi = t. i=1 2 (n), X + Y 2 (m), m > n and X and Y independent random 15. Suppose X variables. Use the properties of moment generating functions to show that 2 (m Y n). 150 4. FUNCTIONS OF TWO OR MORE RANDOM VARIABLES 16. Suppose X1 ; X2 ; : : : ; Xn is a random sample from the N( ; 2 ) distribution. In this n P (Xi X)2 are independent random variables. problem we wish to show that X and i=1 Note that this implies that X and S2 = 1 n 1 variables. Let U = (U1 ; U2 ; : : : ; Un ) where Ui = Xi n P (Xi X)2 are also independent random i=1 X; i = 1; 2; : : : ; n and let M (s1 ; s2 ; : : : ; sn ; s) = E[exp( n P si Ui + sX)] i=1 be the joint moment generating function of U and X. (a) Let ti = si s + s=n; i = 1; 2; : : : ; n where s = E[exp( n P n P si Ui + sX)] = E[exp( i=1 si . Show that i=1 ti Xi )] = exp N( ; 2) ti = s and i=1 n P i=1 (c) Use (a) and (b) to show that t2i = n P n P ti + ti + 2 n P i=1 i=1 E[exp(ti Xi )] = exp n P n P i=1 Hint: Since Xi (b) Verify that 1 n t2i =2 2 2 ti =2 s)2 + s2 =n. (si i=1 M (s1 ; s2 ; : : : ; sn ; s) = exp[ s + ( 2 =n)(s2 =2)] exp[ 2 n P (si s)2 =2] i=1 (d) Show that the random variable X is independent of the random vector U and n P thus X and (Xi X)2 are independent. 
Hint: MX (s) = M (0; 0; : : : ; 0; s) and i=1 MU (s1 ; s2 ; : : : ; sn ) = M (s1 ; s2 ; : : : ; sn ; 0). 5. Limiting or Asymptotic Distributions In a previous probability course the Poisson approximation to the Binomial distribution n x p (1 x p)n x (np)x e x! np for x = 0; 1; : : : ; n if n is large and p is small was used. As well the Normal approximation to the Binomial distribution P (Xn n x p (1 x x) = P Z p)n x for x = 0; 1; : : : ; n ! x np p np (1 p) where Z N (0; 1) if n is large and p is close to 1=2 (a special case of the very important Central Limit Theorem) was used. These are examples of what we will call limiting or asymptotic distributions. In this chapter we consider a sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : and look at the de…nitions and theorems related to determining the limiting distribution of such a sequence. In Section 5:1 we de…ne convergence in distribution and look at several examples to illustrate its meaning. In Section 5:2 we de…ne convergence in probability and examine its relationship to convergence in distribution. In Section 5:3 we look at the Weak Law of Large Numbers which is an important theorem when examining the behaviour of estimators of unknown parameters (Chapter 6). In Section 5:4 we use the moment generating function to …nd limiting distributions including a proof of the Central Limit Theorem. The Central Limit Theorem was used in STAT 221/231/241 to construct an approximate con…dence interval for an unknown parameter. In Section 5:5 additional limit theorems for …nding limiting distributions are introduced. These additional theorems allow us to determine new limiting distributions by combining the limiting distributions which have been determined from de…nitions, the Weak Law of Large Numbers, and/or the Central Limit Theorem. 151 152 5.1 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS Convergence in Distribution In calculus you studied sequences of real numbers a1 ; a2 ; : : : ; an ; : : : and learned theorems which allowed you to evaluate limits such as lim an . In this course we are interested in n!1 a sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : and what happens to the distribution of Xn as n ! 1. We do this be examining what happens to Fn (x) = P (Xn x), the cumulative distribution function of Xn , as n ! 1. Note that for a …xed value of x, the sequence F1 (x) ; F2 (x) ; : : : ; Fn (x) : : : is a sequence of real numbers. In general we will obtain a di¤erent sequence of real numbers for each di¤erent value of x. Since we have a sequence of real numbers we will be able to use limit theorems you have used in your previous calculus courses to evaluate lim Fn (x). We will need to take care in determining n!1 how Fn (x) behaves as n ! 1 for all real values of x. To formalize these ideas we give the following de…nition for convergence in distribution of a sequence of random variables. 5.1.1 De…nition - Convergence in Distribution Let X1 ; X2 ; : : : ; Xn ; : : : be a sequence of random variables and let F1 (x) ; F2 (x) ; : : : ; Fn (x) ; : : : be the corresponding sequence of cumulative distribution functions, that is, Xn has cumulative distribution function Fn (x) = P (Xn x). Let X be a random variable with cumulative distribution function F (x) = P (X x). We say Xn converges in distribution to X and write Xn !D X if lim Fn (x) = F (x) n!1 at all points x at which F (x) is continuous. We call F the limiting or asymptotic distribution of Xn . 
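To see the pointwise convergence in this definition numerically, the following short R sketch compares the cumulative distribution function of $X_n \sim$ Binomial$(n, \lambda/n)$ with the Poisson$(\lambda)$ cumulative distribution function, the Poisson approximation to the Binomial mentioned at the start of this chapter; the particular values of $\lambda$, $n$ and $x$ below are arbitrary choices used only for illustration.

# compare Fn(x) = P(Xn <= x) for Xn ~ Binomial(n,lambda/n) with the
# Poisson(lambda) cumulative distribution function at several points x
lambda<-2
x<-0:8
for (n in c(10,50,1000))
{Fn<-pbinom(x,size=n,prob=lambda/n)
F0<-ppois(x,lambda)
cat("n =",n," max |Fn(x)-F(x)| =",max(abs(Fn-F0)),"\n")}

The maximum difference between the two cumulative distribution functions over the chosen points shrinks as $n$ increases, which is exactly the pointwise convergence required by the definition.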
Note: (1) Although we say the random variable Xn converges in distribution to the random variable X, the de…nition of convergence in distribution is de…ned in terms of the pointwise convergence of the corresponding sequence of cumulative distribution functions. (2) This de…nition holds for both discrete and continuous random variables. (3) One way to think about convergence in distribution is that, if Xn !D X, then for large n Fn (x) = P (Xn x) F (x) = P (X x) if x is a point of continuity of F (x). How good the approximation is will depend on the values of n and x. The following theorem and corollary will be useful in determining limiting distributions. 5.1. CONVERGENCE IN DISTRIBUTION 5.1.2 Theorem - e Limit If b and c are real constants and lim n!1 lim n!1 5.1.3 153 (n) = 0 then 1+ b + n (n) n cn = ebc Corollary If b and c are real constants then lim 1+ n!1 5.1.4 cn b n = ebc Example Let Yi Exponential(1) ; i = 1; 2; : : : independently. Consider the sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : where Xn = max (Y1 ; Y2 ; : : : ; Yn ) log n. Find the limiting distribution of Xn . Solution Since Yi Exponential(1) P (Yi y) = ( 0 1 y e y 0 y>0 for i = 1; 2; : : :. Since the Yi ’s are independent random variables Fn (x) = P (Xn x) = P (max (Y1 ; Y2 ; : : : ; Yn ) = P (max (Y1 ; Y2 ; : : : ; Yn ) = P (Y1 x + log n; Y2 n Q = P (Yi x + log n) = i=1 n Q e (x+log n) x n 1 log n x + log n) x + log n; : : : ; Yn for x + log n > 0 i=1 = As n ! 1, log n ! 1 e n for x > n!1 lim n!1 = e by 5.1.3. log n 1 so lim Fn (x) = e x x) 1+ ( e x) n for x 2 < n x + log n) 154 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS Consider the function e F (x) = e x for x 2 < x Since F 0 (x) = e x e e > 0 for all x 2 <, therefore F (x) is a continuous, increasing x x function for x 2 <. Also lim F (x) = lim e e = 0, and lim F (x) = lim e e = 1. x! 1 x!1 x! 1 x!1 Therefore F (x) is a cumulative distribution function for a continuous random variable. Let X be a random variable with cumulative distribution function F (x). Since lim Fn (x) = F (x) n!1 for all x 2 <, that is at all points x at which F (x) is continuous, therefore Xn !D X In Figure 5.1 you can see how quickly the curves Fn (x) approach the limiting curve F (x). 1 n= 1 n= 2 n= 5 n= 10 n= infinity 0.9 0.8 0.7 Fn(x) 0.6 0.5 0.4 0.3 0.2 0.1 0 -3 -2 -1 0 x Figure 5.1: Graphs of Fn (x) = 1 5.1.5 1 e 2 x n n 3 4 for n = 1; 2; 5; 10; 1 Example Let Yi Uniform(0; ), i = 1; 2; : : : independently. Consider the sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : where Xn = max (Y1 ; Y2 ; : : : ; Yn ). Find the limiting distribution of Xn . Solution Since Yi Uniform(0; ) P (Yi y) = 8 0 > > < y > > : 1 y 0 0<y< y 5.1. CONVERGENCE IN DISTRIBUTION 155 for i = 1; 2; : : :. Since the Yi ’s are independent random variables Fn (x) = P (Xn = P (Y1 8 n Q > > 0 > > > i=1 > > Q n < x = > i=1 > > n > Q > > 1 > : 8 > > < = x) i=1 x 0 0<x< x i=1 > > : Therefore x) = P (max (Y1 ; Y2 ; : : : ; Yn ) x) n Q P (Yi x; Y2 x; : : : ; Yn x) = 0 x n x 0 0<x< 1 x lim Fn (x) = n!1 ( 0 x< 1 x = F (x) In Figure 5.2 you can see how quickly the curves Fn (x) approach the limiting curve F (x). 1 0.9 0.8 0.7 n= 1 n= 2 n= 5 n= 10 n= 100 n= infinity 0.6 Fn(x) 0.5 0.4 0.3 0.2 0.1 0 -0.5 0 0.5 Figure 5.2: Graphs of Fn (x) = 1 x x n for 1.5 2 2.5 = 2 and n = 1; 2; 5; 10; 100; 1 It is straightforward to check that F (x) is a cumulative distribution function for the discrete random variable X with probability function ( 1 y= f (x) = 0 otherwise 156 5. 
LIMITING OR ASYMPTOTIC DISTRIBUTIONS Therefore Xn !D X Since X only takes on one value with probability one, X is called a degenerate random variable. When Xn converges in distribution to a degenerate random variable we also call this convergence in probability to a constant as de…ned in the next section. 5.1.6 Comment Suppose X1 ; X2 ; : : : ; Xn ; : : :is a sequence of random variables such that Xn !D X. Then for large n we can use the approximation P (Xn x) P (X x) If X is degenerate at b then P (X = b) = 1 and this approximation is not very useful. However, if the limiting distribution is degenerate then we could use this result in another way. In Example 5.1.5 we showed that if Yi Uniform(0; ), i = 1; 2; : : : ; n independently then Xn = max (Y1 ; Y2 ; : : : ; Yn ) converges in distribution to a degenerate random variable X with P (X = ) = 1. This result is rather useful since, if we have observed data y1 ; y2 ; : : : ; yn from a Uniform(0; ) distribution and is unknown, then this suggests using y(n) = max (y1 ; y2 ; : : : ; yn ) as an estimate of if n is reasonably large. We will discuss this idea in more detail in Chapter 6. 5.2 Convergence in Probability De…nition 5.1.1 is useful for …nding the limiting distribution of a sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : when the sequence of corresponding cumulative distribution functions F1 ; F2 ; : : : ; Fn ; : : : can be obtained. In other cases we may use the following de…nition to determine the limiting distribution. 5.2.1 De…nition - Convergence in Probability A sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : converges in probability to a random variable X if, for all " > 0, lim P (jXn Xj ") = 0 n!1 or equivalently lim P (jXn n!1 Xj < ") = 1 We write Xn !p X Convergence in probability is a stronger former of convergence than convergence in distribution in the sense that convergence in probability implies convergence in distribution as stated in the following theorem. However, if Xn converges in distribution to X, then Xn may or may not converge in probability to X. 5.2. CONVERGENCE IN PROBABILITY 5.2.2 157 Theorem - Convergence in Probability Implies Convergence in Distribution If Xn !p X then Xn !D X. In Example 5.1.5 the limiting distribution was degenerate. When the limiting distribution is degenerate we say Xn converges in probability to a constant. The following de…nition, which follows from De…nition 5.2.1, can be used for proving convergence in probability. 5.2.3 De…nition - Convergence in Probability to a Constant A sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : converges in probability to a constant b if, for all " > 0, lim P (jXn bj ") = 0 n!1 or equivalently lim P (jXn bj < ") = 1 n!1 We write Xn !p b 5.2.4 Example Suppose X1 ; X2 ; : : : ; Xn ; : : : is a sequence of random variables with E(Xn ) = V ar(Xn ) = 2n . If lim n = a and lim 2n = 0 then show Xn !p a. n!1 n and n!1 Solution To show Xn !p a we need to show that for all " > 0 lim P (jXn aj < ") = 1 lim P (jXn aj n!1 or equivalently n!1 ") = 0 Recall Markov’s Inequality. For all k; c > 0 Xk ck Therefore by Markov’s Inequality with k = 2 and c = " we have P (jXj 0 Now E jXn aj 2 lim P (jXn n!1 h = E (Xn h = E (Xn E c) aj 2 a) ") lim n!1 i i 2 ) + 2( n n E jXn aj2 (5.1) "2 a) E (Xn n) +( n a)2 158 5. 
LIMITING OR ASYMPTOTIC DISTRIBUTIONS Since lim E [Xn n] n!1 2 = lim V ar (Xn ) = lim n!1 n!1 2 n =0 and lim ( n!1 a) = 0 since n therefore lim n!1 n h aj2 = lim E (Xn lim E jXn n!1 n!1 which also implies lim i a)2 = 0 aj2 E jXn n!1 =a =0 "2 (5.2) Thus by (5.1), (5.2) and the Squeeze Theorem lim P (jXn n!1 aj ") = 0 for all " > 0 as required. The proof in Example 5.2.4 used De…nition 5.2.3 to prove convergence in probability. The reason for this is that the distribution of the Xi ’s was not speci…ed. Only conditions on E(Xn ) and V ar(Xn ) were speci…ed. This means that the result in Example 5.2.4 holds for any sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : satisfying the given conditions. If the sequence of corresponding cumulative distribution functions F1 ; F2 ; : : : ; Fn ; : : : can be obtained then the following theorem can also be used to prove convergence in probability to a constant. 5.2.5 Theorem Suppose X1 ; X2 ; : : : ; Xn ; : : : is a sequence of random variables such that Xn has cumulative distribution function Fn (x). If lim Fn (x) = lim P (Xn n!1 n!1 x) = then Xn !p b. 8 <0 :1 x<b x>b Note: We do not need to worry about whether lim Fn (b) exists since x = b is a point of n!1 discontinuity of the limiting distribution (see De…nition 5.1.1). 5.3. WEAK LAW OF LARGE NUMBERS 5.2.6 159 Example Let Yi Exponential( ; 1), i = 1; 2; : : : independently. Consider the sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : where Xn = min (Y1 ; Y2 ; : : : ; Yn ). Show that Xn !p . Solution Since Yi Exponential( ; 1) P (Yi > y) = ( e (y ) y> 1 y for i = 1; 2; : : :. Since Y1 ; Y2 ; : : : ; Yn are independent random variables Fn (x) = 1 P (Xn > x) = 1 P (min (Y1 ; Y2 ; : : : ; Yn ) > x) n Q P (Y1 > x; Y2 > x; : : : ; Yn > x) = 1 P (Yi > x) = 1 8 n Q > > 1 1 > < i=1 = n Q > > e (x > : 1 i=1 x ) x> i=1 = ( 0 1 e n(x ) x x> Therefore lim Fn (x) = n!1 ( 0 x 1 x> which we note is not a cumulative distribution function since the function is not rightcontinuous at x = . However ( 0 x< lim Fn (x) = n!1 1 x> and therefore by Theorem 5.2.5, Xn !p . In Figure 5.3 you can see how quickly the limit is approached. 5.3 Weak Law of Large Numbers In this section we look at a very important result which we will use in Chapter 6 to show that maximum likelihood estimators have good properties. This result is called the Weak Law of Large Numbers. Needless to say there is another law called the Strong Law of Large Numbers but we will not consider this law here. Also in this section we will look at some simulations to illustrate the theoretical result in the Weak Law of Large Numbers. 160 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS 1 0.9 0.8 0.7 F n(x) 0.6 0.5 n =1 n =2 n =5 n =10 n=100 n =infinity 0.4 0.3 0.2 0.1 0 1.5 2 2.5 3 x Figure 5.3: Graphs of Fn (x) = 1 5.3.1 3.5 e n(x ) for 4 =2 Weak Law of Large Numbers Suppose X1 ; X2 ; : : : are independent and identically distributed random variables with E(Xi ) = and V ar(Xi ) = 2 < 1. Consider the sequence of random X1 ; X2 ; : : : ; Xn ; : : : where n 1 P Xn = Xi n i=1 Then Xn !p Proof Using De…nition 5.2.3 we need to show lim P n!1 Xn " = 0 for all " > 0 We apply Chebyshev’s Theorem (see 2.8.2) to the random variable Xn where E Xn = (see Corollary 3.6.3(3)) and V ar Xn = 2 =n (see Theorem 3.6.7(4)) to obtain Let k = p n" P Xn 0 P k p 1 k2 n for all k > 0 . Then 2 Xn " Since n" 2 lim n!1 n" =0 for all " > 0 5.3. WEAK LAW OF LARGE NUMBERS 161 therefore by the Squeeze Theorem lim P n!1 " = 0 for all " > 0 Xn as required. 
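The following short R sketch illustrates this result by plotting the running sample mean $\bar{x}_n$ against $n$ for simulated data, in the same spirit as the simulations described in the notes and examples below; the Exponential$(1)$ distribution (so that $\mu = 1$), the sample size and the seed are arbitrary choices used only for illustration.

# plot the running sample mean of n independent Exponential(1) observations
# and compare it with the population mean mu = 1
set.seed(123)
n<-500
x<-rexp(n,rate=1)
xbar<-cumsum(x)/(1:n)
plot(1:n,xbar,"l",xlab="n",ylab="sample mean")
abline(h=1,lty=2)       # horizontal line at the population mean mu = 1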
Notes: (1) The proof of the Weak Law of Large Numbers does not actually require that the random variables be identically distributed, only that they all have the same mean and variance. As well the proof does not require knowing the distribution of these random variables. (2) In words the Weak Law of Large Numbers says that the sample mean Xn approaches the population mean as n ! 1. 5.3.2 If X Example Pareto(1; ) then X has probability density function f (x) = for x 1 x +1 and 0 otherwise. X has cumulative distribution function 8 <0 if x < 1 F (x) = 1 :1 for x 1 x and inverse cumulative distribution function F 1 (x) = (1 x) E (X) = 8 <1 Also and : 1= for 0 < x < 1 if 0 < 1 if 1 >1 V ar (X) = ( 1)2 ( 2) Suppose X1 ; X2 ; : : : ; Xn are independent and identically distributed Pareto(1; ) random variables. By the Weak Law of Large Numbers n 1 P Xn = Xi !p E (X) = for > 1 n i=1 1 If Ui Uniform(0; 1), i = 1; 2; : : : ; n independently then by Theorem 2.6.6 Xi = F = (1 1 (Ui ) Ui ) 1= Pareto (1; ) i = 1; 2; : : : ; n independently. If we generate Uniform(0; 1) observations u1 ; u2 ; : : : ; un using a random number generator and then let xi = (1 ui ) 1= then x1 ; x2 ; : : : ; xn are observations from the Pareto(1; ) distribution. 162 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS The points (i; xi ), i = 1; 2; : : : ; 500 for one simulation of 500 observations from a Pareto(1; 5) distribution are plotted in Figure 5.4. 4.5 4 3.5 3 xi 2.5 2 1.5 1 0.5 0 100 200 300 400 500 i Figure 5.4: 500 observations from a Pareto(1; 5) distribution Figure 5.5 shows a plot of the points (n; xn ), n = 1; 2; : : : ; 500 where xn = 1 n n P xi is i=1 the sample mean. We note that the sample mean xn is approaching the population mean = E (X) = 5 5 1 = 1:25 as n increases. 1.5 1.45 1.4 1.35 sample mean 1.3 µ=1.25 1.25 1.2 1.15 0 100 200 300 400 500 n Figure 5.5: Graph of xn versus n for 500 Pareto(1; 5) observations 5.3. WEAK LAW OF LARGE NUMBERS 163 If we generate a further 1000 values of xi , and plot (i; xi ) ; i = 1; 2; : : : ; 1500 we obtain the graph in Figure 5.6. 5.5 5 4.5 4 3.5 xi 3 2.5 2 1.5 1 0.5 0 500 1000 1500 i Figure 5.6: 1500 observations from a Pareto(1; 5) distribution The corresponding plot of (n; xn ), n = 1; 2; : : : ; 1500 in shown in Figure 5.7. We note that the sample mean xn stays very close to the population mean = 1:25 for n > 500. 1.5 1.45 1.4 1.35 sample mean 1.3 µ=1.25 1.25 1.2 1.15 0 500 1000 1500 n Figure 5.7: Graph of xn versus n for 1500 Pareto(1; 5) observations Note that these …gures correspond to only one set of simulated data. If we generated another set of data using a random number generator the actual data points would change. However what would stay the same is that the sample mean for the new data set would still approach the mean value E (X) = 1:25 as n increases. 164 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS The points (i; xi ), i = 1; 2; : : : ; 500 for one simulation of 500 observations from a Pareto(1; 0:5) distribution are plotted in Figure 5.8. For this distribution E (X) = Z1 x x +1 dx = 1 Z1 1 dx which diverges to 1 x 1 Note in Figure 5.8 that the are some very large observations. In particular there is one observation which is close to 40; 000. 4 x10 4 3.5 3 2.5 xi 2 1.5 1 0.5 0 0 100 200 i 300 400 500 Figure 5.8: 500 observations from a Pareto(1; 0:5) distribution The corresponding plot of (n; xn ), n = 1; 2; : : : ; 500 for these data is given in Figure 5.9. Note that the mean xn does not appear to be approaching a …xed value. 
700 600 500 sample mean 400 300 200 100 0 0 100 200 n 300 400 500 Figure 5.9: Graph of xn versus n for 500 Pareto(1; 0:5) observations 5.3. WEAK LAW OF LARGE NUMBERS 165 In Figure 5.10 the points (n; xn ) for a set of 50; 000 observations generated from a Pareto(1; 0:5) distribution are plotted. Note that the mean xn does not approach a …xed value and in general is getting larger as n gets large. This is consistent with E (X) diverging to 1. x10 12 4 10 8 sample mean 6 4 2 0 0 1 2 n 3 4 5 x10 4 Figure 5.10: Graph of xn versus n for 50000 Pareto(1; 0:5) observations 5.3.3 If X Example Cauchy(0; 1) then f (x) = 1 (1 + x2 ) for x 2 < The probability density function, shown in Figure 5.11, is symmetric about the y axis and the median of the distribution is equal to 0. 0.35 0.3 0.25 0.2 f(x) 0.15 0.1 0.05 0 -5 0 x 5 Figure 5.11: Cauchy(0; 1) probability density function 166 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS The mean of a Cauchy(0; 1) random variable does not exist since 1 Z0 x dx diverges to 1 + x2 1 and 1 Z1 1 (5.3) x dx diverges to 1 1 + x2 (5.4) 0 The cumulative distribution function of a Cauchy(0; 1) random variable is F (x) = 1 arctan x + 1 2 for x 2 < and the inverse cumulative distribution function is F 1 (x) = tan 1 2 x for 0 < x < 1 If we generate Uniform(0; 1) observations u1 ; u2 ; : : : ; uN using a random number generator and then let xi = tan ui 12 , then the xi ’s are observations from the Cauchy(0; 1) distribution. The points (n; xn ) for a set of 500; 000 observations generated from a Cauchy(0; 1) distribution are plotted in Figure 5.12. Note that xn does not approach a …xed value. However, unlike the Pareto(1; 0:5) example in which xn was getting larger as n got larger, we see in Figure 5.12 that xn drifts back and forth around the line y = 0. This behaviour, which is consistent with (5.3)and (5.4), continues even if more observations are generated. 8 6 4 2 0 sample mean -2 -4 -6 -8 -10 -12 0 1 2 3 n 4 5 x10 5 Figure 5.12: Graph of xn versus n for 50000 Cauchy(0; 1) observations 5.4. MOMENT GENERATING FUNCTION TECHNIQUE FOR LIMITING DISTRIBUTIONS167 5.4 Moment Generating Function Technique for Limiting Distributions We now look at the moment generating function technique for determining a limiting distribution. Suppose we have a sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : and M1 (t) ; M1 (t) ; : : : ; Mn (t) ; : : : is the corresponding sequence of moment generating functions. For a …xed value of t, the sequence M1 (t) ; M1 (t) ; : : : ; Mn (t) ; : : : is a sequence of real numbers. In general we will obtain a di¤erent sequence of real numbers for each different value of t. Since we have a sequence of real numbers we will be able to use limit theorems you have used in your previous calculus courses to evaluate lim Mn (t). We will n!1 need to take care in determining how Mn (t) behaves as n ! 1 for an interval of values of t containing the value 0. Of course this technique only works if the the moment generating function exists and is tractable. 5.4.1 Limit Theorem for Moment Generating Functions Let X1 ; X2 ; : : : ; Xn ; : : : be a sequence of random variables such that Xn has moment generating function Mn (t). Let X be a random variable with moment generating function M (t). 
If there exists an h > 0 such that lim Mn (t) = M (t) for all t 2 ( h; h) n!1 then Xn !D X Note: (1) The sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : converges if the corresponding sequence of moment generating functions M1 (t) ; M1 (t) ; : : : ; Mn (t) ; : : : converges pointwise. (2) This de…nition holds for both discrete and continuous random variables. Recall from De…nition 5.1.1 that Xn !D X if lim Fn (x) = F (x) n!1 at all points x at which F (x) is continuous. If X is a discrete random variable then the cumulative distribution function is a right continuous function. The values of x of main interest for a discrete random variable are exactly the points at which F (x) is discontinuous. The following theorem indicates that lim Fn (x) = F (x) holds for the values of x at n!1 which F (x) is discontinuous if Xn and X are non-negative integer-valued random variables. The named discrete distributions Bernoulli, Binomial, Geometric, Negative Binomial, and Poisson are all non-negative integer-valued random variables. 168 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS 5.4.2 Theorem Suppose Xn and X are non-negative integer-valued random variables. If Xn !D X then lim P (Xn x) = P (X x) holds for all x and in particular n!1 lim P (Xn = x) = P (X = x) for x = 0; 1; : : : : n!1 5.4.3 Example Consider the sequence of random variables X1 ; X2 ; : : : ; Xk ; : : : where Xk Negative Binomial(k; p). Use Theorem 5.4.1 to determine the limiting distribution of Xk as k ! 1, p ! 1 such that kq=p = remains constant where q = 1 p. Use this limiting distribution and Theorem 5.4.2 to give an approximation for P (Xk = x). Solution If Xk Negative Binomial(k; p) then If k p 1 qet Mk (t) = E etXk = for t < log q (5.5) = kq=p then k +k p= and q = (5.6) +k Substituting 5.6 into 5.5 and simplifying gives k +k Mk (t) = 1 " = " = Now lim k!1 " 1 +k e t !k 1 1 1 (et 1) k #k et 1 k et 1 k # 1 = # +k k et !k k for t < log +k k t = e (e 1) for t < 1 t by Corollary 5.1.3. Since M (t) = e (e 1) for t 2 < is the moment generating function of a Poisson( ) random variable then by Theorem 5.4.1, Xk !D X Poisson( ). By Theorem 5.4.2 P (Xk = x) = k k p ( q)x x kq p x e x! kq=p for x = 0; 1; : : : 5.4. MOMENT GENERATING FUNCTION TECHNIQUE FOR LIMITING DISTRIBUTIONS169 5.4.4 Exercise - Poisson Approximation to the Binomial Distribution Consider the sequence of random variables X1 ; X2 ; : : : ; Xn ; : : : where Xn Binomial(n; p). Use Theorem 5.4.1 to determine the limiting distribution of Xn as n ! 1, p ! 0 such that np = remains constant. Use this limiting distribution and Theorem 5.4.2 to give an approximation for P (Xn = x). In your previous probability and statistics courses you would have used the Central Limit Theorem (without proof!) for approximating Binomial and Poisson probabilities as well as constructing approximate con…dence intervals. We now give a proof of this theorem. 5.4.5 Central Limit Theorem Suppose X1 ; X2 ; : : : are independent and identically distributed random variables with E (Xi ) = and V ar (Xi ) = 2 < 1. Consider the sequence of random variables Z1 ; Z2 ; : : : ; Zn ; : : : p n P n(Xn ) where Zn = and Xn = n1 Xi . Then i=1 Zn = p n Xn !D Z N(0; 1) Proof We can write Zn as n 1 P Zn = p (Xi n i=1 ) Suppose that for i = 1; 2; : : :, Xi has moment generating function MX (t), t 2 ( h; h) for some h > 0. Then for i = 1; 2; : : :, (Xi ) has moment generating function t M (t) = e MX (t), t 2 ( h; h) for some h > 0. 
Note that M (0) = 1, M 0 (0) = E (Xi and h M 00 (0) = E (Xi ) = E (Xi ) i )2 = V ar (Xi ) = =0 2 Also by Taylor’s Theorem (see 2.11.15) for n = 2 we have 1 M (t) = M (0) + M 0 (0) t + M 00 (c) t2 2 1 00 2 = 1 + M (c) t 2 for some c between 0 and t. (5.7) 170 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS Since X1 ; X2 ; : : : ; Xn are independent and identically distributed, the moment generating function of Zn is Mn (t) = E etZn = E exp = t p M Using (5.7) in (5.8) gives " 1 Mn (t) = 1 + M 00 (cn ) 2 ( 1 t p 1+ = 2 n t p n n t P p (Xi n i=1 n 2 t p for n <h n (5.8) # 2 n 2 M 00 (cn ) for some cn between 0 and But M 00 (0) = ) 1 M 00 (0) + 2 t p t p 2 n M 00 (0) )n n so ( Mn (t) = 1 2 2t 1+ t2 2 2 + n [M 00 (cn ) n for some cn between 0 and t p Since cn is between 0 and n M 00 (0)] t p )n n , cn ! 0 as n ! 1. Since M 00 (t) is continuous on ( h; h) lim M 00 (cn ) = M 00 n!1 and lim n!1 lim cn = M 00 (0) = [M 00 (cn ) t2 2 2 2 n!1 M 00 (0)] n =0 Therefore by Theorem 5.1.2, with t2 2 2 (n) = we have lim Mn (t) = n!1 lim n!1 1 2 = e2t ( 1+ M 00 (cn ) 1 2 2t n + t2 2 2 M 00 (0) [M 00 (cn ) M 00 (0)] n )n for jtj < 1 which is the moment generating function of a N(0; 1) random variable. Therefore by Theorem 5.4.1 Zn !D Z N(0; 1) 5.4. MOMENT GENERATING FUNCTION TECHNIQUE FOR LIMITING DISTRIBUTIONS171 as required. Note: Although this proof assumes that the moment generating function of Xi , i = 1; 2; : : : exists, it does not make any assumptions about the form of the distribution of the Xi ’s. There are other more general proofs of the Central Limit Theorem which only assume the existence of the variance 2 (which implies the existence of the mean ). 5.4.6 Example - Normal Approximation to the 2 Distribution 2 (n), n = 1; 2; : : :. Consider the sequence of random variables Z ; Z ; : : : ; Z ; : : : Suppose Yn 1 2 n p where Zn = (Yn n) = 2n. Show that Yn n Zn = p !D Z 2n N(0; 1) Solution 2 (1), i = 1; 2; : : : independently. Since X ; X ; : : : are independent and identically Let Xi 1 2 distributed random variables with E (Xi ) = 1 and V ar (Xi ) = 2, then by the Central Limit Theorem p n Xn 1 p !D Z N(0; 1) 2 n P But Xn = n1 Xi so i=1 p n 1 n n P p where Sn = n P Xi 1 i=1 2 Sn = p n 2n Xi . Therefore i=1 Sn p Now by 4.3.2(6), Sn that 5.4.7 2 (n) n !D Z 2n N(0; 1) and therefore Yn and Sn have the same distribution. It follows Yn n Zn = p !D Z 2n N(0; 1) Exercise - Normal Approximation to the Binomial Distribution Suppose Yn Binomial(n; p); n = 1; 2; : : : Consider the sequence of random variables p Z1 ; Z2 ; : : : ; Zn ; : : : where Zn = (Yn np) = np(1 p). Show that Hint: Let Xi Yn np Zn = p !D Z np(1 p) N(0; 1) Binomial(1; p); i = 1; 2; : : : ; n independently. 172 5.5 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS Additional Limit Theorems Suppose we know the limiting distribution of one or more sequences of random variables by using the de…nitions and/or theorems in the previous sections of this chapter. The theorems in this section allow us to more easily determine the limiting distribution of a function of these sequences. 5.5.1 Limit Theorems (1) If Xn !p a and g is continuous at x = a then g(Xn ) !p g(a). (2) If Xn !p a, Yn !p b and g(x; y) is continuous at (a; b) then g(Xn ; Yn ) !p g(a; b). (3) (Slutsky’s Theorem) If Xn !p a, Yn !D Y and g(a; y) is continuous for all y 2 support set of Y then g(Xn ; Yn ) !D g(a; Y ). Proof of (1) Since g is continuous at x = a then for every " > 0 there exists a > 0 such that jx implies jg (x) g (a)j < ". 
By Example 2.1.4(d) this implies that P (jg (Xn ) g (a)j < ") P (jX But Xn !p a it follows that for every " > 0 there exists a lim P (jg (Xn ) n!1 g (a)j < ") aj < aj < ) > 0 such that lim P (jX aj < ) = 1 n!1 But lim P (jg (Xn ) g (a)j < ") lim P (jg (Xn ) g (a)j < ") = 1 n!1 1 so by the Squeeze Theorem n!1 and therefore g(Xn ) !p g(a). 5.5.2 Example If Xn !p a > 0, Yn !p b 6= 0 and Zn !D Z of each of the following: p (b) Xn (b) Xn + Yn (c) Yn + Zn (d) Xn Zn (e) Zn2 N(0; 1) then …nd the limiting distributions 5.5. ADDITIONAL LIMIT THEOREMS 173 Solution p (a) Let g (x) = x which is a continuous function for all x 2 <+ . Since Xn !p a then by p p p p 5.5.1(1), Xn = g (Xn ) !p g (a) = a or Xn !p a. (b) Let g (x; y) = x + y which is a continuous function for all (x; y) 2 <2 . Since Xn !p a and Yn !p b then by 5.5.1(2), Xn + Yn = g (Xn ; Yn ) !p g (a; b) = a + b or Xn + Yn !p a + b. (c) Let g (y; z) = y + z which is a continuous function for all (y; z) 2 <2 . Since Yn !p b and Zn !D Z N(0; 1) then by 5.5.1(3), Yn + Zn = g (Yn ; Zn ) !D g (b; z) = b + Z or Yn + Zn !D b + Z where Z N(0; 1). Since b + Z N(b; 1), therefore Yn + Zn !D b + Z N(b; 1) (d) Let g (x; z) = xz which is a continuous function for all (x; z) 2 <2 . Since Xn !p a and Zn !D Z N(0; 1) then by Slutsky’s Theorem, Xn Zn = g (Xn ; Zn ) !D g (a; z) = aZ or Xn Zn !D aZ where Z N(0; 1). Since aZ N 0; a2 , therefore Xn Zn !D aZ N 0; a2 (e) Let g (x; z) = z 2 which is a continuous function for all (x; z) 2 <2 . Since Zn !D Z N(0; 1) then by Slutsky’s Theorem, Zn2 = g (Xn ; Zn ) !D g (a; z) = Z 2 or Zn2 !D Z 2 where 2 (1), therefore Z 2 ! Z 2 2 (1) Z N(0; 1). Since Z 2 D n 5.5.3 Exercise If Xn !p a > 0, Yn !p b 6= 0 and Zn !D Z of each of the following: N(0; 1) then …nd the limiting distributions (a) Xn2 (b) Xn Yn (c) Xn =Yn (d) Xn 2Zn (e) 1=Zn In Example 5.5.2 we identi…ed the function g in each case. As with other limit theorems we tend not to explicitly identify the function g once we have a good idea of how the theorems work as illustrated in the next example. 5.5.4 Example Suppose Xi Poisson( ), i = 1; 2; : : : independently. Consider the sequence of random variables Z1 ; Z2 ; : : : ; Zn ; : : : where p n Xn p Zn = Xn Find the limiting distribution of Zn . 174 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS Solution Since X1 ; X2 ; : : : are independent and identically distributed random variables with E (Xi ) = and V ar (Xi ) = , then by the Central Limit Theorem p n Xn Wn = !D Z N(0; 1) (5.9) p and by the Weak Law of Large Numbers Xn !p By (5.10) and 5.5.1(1) Un = s Xn (5.10) !p r =1 (5.11) Now Zn = = = p n Xn p Xn p n(Xn ) p q Xn Wn Un By (5.9), (5.11), and Slutsky’s Theorem Zn = 5.5.5 Wn Z !D =Z Un 1 N(0; 1) Example Suppose Xi Uniform(0; 1), i = 1; 2; : : : independently. Consider the sequence of random variables U1 ; U2 ; : : : ; Un ; : : : where Un = max (X1 ; X2 ; : : : ; Xn ). Show that (a) Un !p 1 (b) eUn !p e (c) sin (1 Un ) !p 0 (d) Vn = n (1 (e) 1 e Vn Un ) !D V !D 1 2 (f ) (Un + 1) [n (1 e V Exponential(1) Uniform(0; 1) Un )] !D 4V Exponential(4) 5.5. 
ADDITIONAL LIMIT THEOREMS 175 Solution (a) Since X1 ; X2 ; : : : ; Xn are Uniform(0; 1) random variables then for i = 1; 2; : : : 8 0 x 0 > > < x 0<x<1 P (Xi x) = > > : 1 x 1 Since X1 ; X2 ; : : : ; Xn are independent random variables Fn (x) = P (Un u) = P (max (X1 ; X2 ; : : : ; Xn ) n Q = P (Xi u) u) i=1 8 0 > > < un = > > : 1 Therefore u 0 0<u<1 u lim Fn (u) = n!1 ( 1 0 u<1 1 u 1 and by Theorem 5.2.5 Un = max (X1 ; X2 ; : : : ; Xn ) !p 1 (5.12) (b) By (5.12) and 5.5.1(1) eUn !p e1 = e (c) By (5.12) and 5.5.1(1) sin (1 Un ) !p sin (1 1) = sin (0) = 0 (d) The cumulative distribution function of Vn = n (1 Gn (x) = P (Vn Un ) is v) = P (n (1 max (X1 ; X2 ; : : : ; Xn )) 1 P max (X1 ; X2 ; : : : ; Xn ) 1 = P max (X1 ; X2 ; : : : ; Xn ) = 1 n Q = 1 = ( v) v n P Xi v n 1 i=1 1 0 1 v v n n 0 v>0 v n 176 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS Therefore lim Gn (v) = n!1 ( 0 1 v v e 0 v>0 which is the cumulative distribution function of an Exponential(1) random variable. Therefore by De…nition (5.1.1) Vn = n (1 Un ) !D V Exponential(1) (5.13) (e) By (5.13) and Slutsky’s Theorem 1 Vn e !D 1 V e where V The probability density function of W = 1 V is d [ log (1 dw for 0 < w < 1 g (w) = elog(1 = 1 e Exponential(1) w) w)] which is the probability of a Uniform(0; 1) random variable. Therefore 1 e Vn !D 1 e V Uniform (0; 1) (f ) By (5.12) and 5.5.1(1) (Un + 1)2 !p (1 + 1)2 = 4 (5.14) By (5.13), (5.14), and Slutsky’s Theorem (Un + 1)2 [n (1 where V Exponential(1). If V (Un + 1)2 [n (1 5.5.6 Un )] !D 4V Exponential(1) then 4V Un )] !D 4V Exponential(4) so Exponential (4) Delta Method Let X1 ; X2 ; : : : ; Xn ; : : : be a sequence of random variables such that nb (Xn a) !D X for some b > 0. Suppose the function g(x) is di¤erentiable at a and g 0 (a) 6= 0. Then nb [g(Xn ) g(a)] !D g 0 (a)X (5.15) 5.5. ADDITIONAL LIMIT THEOREMS 177 Proof By Taylor’s Theorem (2.11.15) we have g (Xn ) = g (a) + g 0 (cn ) (Xn a) g (a) = g 0 (cn ) (Xn a) or g (Xn ) (5.16) where cn is between a and Xn . From (5.15) it follows that Xn !p a. Since cn is between Xn and a, therefore cn !p a and by 5.5.1(1) g 0 (cn ) !p g 0 (a) (5.17) Multiplying (5.16) by nb gives nb [g (Xn ) g (a)] = g 0 (cn ) nb (Xn a) (5.18) Therefore by (5.15), (5.17), (5.18) and Slutsky’s Theorem nb [g(Xn ) 5.5.7 g(a)] !D g 0 (a)X Example Suppose Xi Exponential( ), i = 1; 2; : : : independently. Find the limiting distributions of each of the following: (a) Xn (b) Un = p n Xn n(Xn ) (c) Zn = p Xn (d) Vn = n log(Xn ) p log Solution (a) Since X1 ; X2 ; : : : are independent and identically distributed random variables with E (Xi ) = and V ar (Xi ) = 2 , then by the Weak Law of Large Numbers Xn !p (5.19) (b) Since X1 ; X2 ; : : : are independent and identically distributed random variables with E (Xi ) = and V ar (Xi ) = 2 , then by the Central Limit Theorem p n Xn Wn = !D Z N (0; 1) (5.20) Therefore by Slutsky’s Theorem Un = p n Xn !D Z N 0; 2 (5.21) 178 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS (c) Zn can be written as Zn = p p n(Xn n Xn Xn = ) Xn By (5.19) and 5.5.1(1), Xn !p =1 (5.22) By (5.20), (5.22), and Slutsky’s Theorem Zn !D Z =Z 1 N(0; 1) (d) Let g (x) = log x, a = , and b = 1=2. Then g 0 (x) = (5.21) and the Delta Method n1=2 log(Xn ) 5.5.8 !D 1 ( Z) = Z and g 0 (a) = g 0 ( ) = 1 . By N (0; 1) Exercise Suppose Xi Poisson( ), i = 1; 2; : : : independently. 
Show that p Un = and Vn 5.5.9 log 1 x n Xn p p = n Xn !D U N (0; ) 1 p !D V N 0; 4 Theorem Let X1 ; X2 ; : : : ; Xn ; : : : be a sequence of random variables such that p n(Xn a) !D X N 0; 2 (5.23) Suppose the function g(x) is di¤erentiable at a. Then p n[g(Xn ) g(a)] !D W N 0; g 0 (a) 2 2 provided g 0 (a) 6= 0. Proof Suppose g(x) is a di¤erentiable function at a and g 0 (a) 6= 0. Let b = 1=2. Then by (5.23) and the Delta Method it follows that p n[g(Xn ) g(a)] !D W N 0; g 0 (a) 2 2 5.6. CHAPTER 5 PROBLEMS 5.6 179 Chapter 5 Problems 1. Suppose Yi butions of Exponential( ; 1), i = 1; 2; : : : independently. Find the limiting distri- (a) Xn = min (Y1 ; Y2 ; : : : ; Yn ) (b) Un = Xn = (c) Vn = n(Xn (d) Wn = n2 (Xn ) ) 2. Suppose X1 ; X2 ; : : : ; Xn are independent and identically distributed continuous random variables with cumulative distribution function F (x) and probability density function f (x). Let Yn = max (X1 ; X2 ; : : : ; Xn ). Show that Zn = n[1 F (Yn )] !D Z Exponential(1) 3. Suppose Xi Poisson( ), i = 1; 2; : : : independently. Find Mn (t) the moment generating function of p Yn = n(Xn ) Show that lim log Mn (t) = n!1 1 2 t 2 What is the limiting distribution of Yn ? 4. Suppose Xi Exponential( ), i = 1; 2; : : : independently. Show that the moment generating function of n p P Xi n = n Zn = i=1 is h Mn (t) = e p t= n p i t= n 1 n Find lim Mn (t) and thus determine the limiting distribution of Zn . n!1 5. If Z N(0; 1) and Wn Show that 2 (n) independently then we know Z Tn = p Wn =n Tn !D Y t(n) N(0; 1) 180 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS 6. Suppose X1 ; X2 ; : : : are independent and identically distributed random variables with E(Xi ) = , V ar(Xi ) = 2 < 1, and E(Xi4 ) < 1. Let Xn = Sn2 = and n 1 P Xi n i=1 n 1 P Xi n 1 i=1 p Tn = Xn 2 n Xn Sn Show that Sn ! p and Tn !D Z 7. Let Xn N(0; 1) Binomial(n; ). Find the limiting distributions of Xn n Xn n (a) Tn = (b) Un = (c) Wn = p (d) Zn = Wn p Un (e) Vn = p n 1 Xn n Xn n n arcsin q Xn n arcsin p (f) Compare the variances of the limiting distributions of Wn , Zn and Vn and comment. 8. Suppose Xi Geometric( ), i = 1; 2; : : : independently. Let Yn = n P i=1 Find the limiting distributions of (a) Xn = Yn n (b) Wn = p (c) Vn = (d) Zn = n Xn 1 1+Xn p ) n p n(V Vn2 (1 Vn ) (1 ) Xi 5.6. CHAPTER 5 PROBLEMS 9. Suppose Xi 181 Gamma(2; ), i = 1; 2; : : : independently. Let Yn = n P Xi i=1 Find the limiting distributions of p 2p Yn n and Xn = 2 p n(Xn 2 ) p Wn = and 2 2 p n(Xn 2 ) Zn = X =p2 n (a) Xn = (b) (c) (d) Un = p n log(Xn ) Vn = p n Xn 2 log(2 ) (e) Compare the variances of the limiting distributions of Zn and Un . 10. Suppose X1 ; X2 ; : : : ; Xn ; : : : is a sequence of random variables with E(Xn ) = and V ar(Xn ) = a np Show that Xn !p for p > 0 182 5. LIMITING OR ASYMPTOTIC DISTRIBUTIONS 6. Maximum Likelihood Estimation - One Parameter In this chapter we look at the method of maximum likelihood to obtain both point and interval estimates of one unknown parameter. Some of this material was introduced in a previous statistics course such as STAT 221/231/241. In Section 6:2 we review the de…nitions needed for the method of maximum likelihood estimation, the derivations of the maximum likelihood estimates for the unknown parameter in the Binomial, Poisson and Exponential models, and the important invariance property of maximum likelihood estimates. You will notice that we pay more attention to verifying that the maximum likelihood estimate does correspond to a maximum using the …rst derivative test. 
Example 6.2.9 is new and illustrates how the maximum likelihood estimate is found when the support set of the random variable depends on the unknown parameter. In Section 6:3 we de…ne the score function, the information function, and the expected information function. These functions play an important role in the distribution of the maximum likelihood estimator. These functions are also used in Newton’s Method which is a method for determining the maximum likelihood estimate in cases where there is no explicit solution. Although the maximum likelihood estimates in nearly all the examples you saw previously could be found explicitly, this is not true in general. In Section 6:4 we review likelihood intervals. Likelihood intervals provide a way to summarize the uncertainty in an estimate. In Section 6:5 we give a theorem on the limiting distribution of the maximum likelihood estimator. This important theorem tells us why maximum likelihood estimators are good estimators. In Section 6:6 we review how to …nd a con…dence interval using a pivotal quantity. Con…dence intervals also give us a way to summarize the uncertainty in an estimate. We also give a theorem on how to obtain a pivotal quantity using the maximum likelihood estimator if the parameter is either a scale or location parameter. In Section 6:7 we review how to …nd an approximate con…dence interval using an asymptotic pivotal quantity. We then show how to use asymptotic pivotal quantities based on the limiting distribution of the maximum likelihood estimator to construct approximate con…dence intervals. 183 184 6.1 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Introduction Suppose the random variable X (possibly a vector of random variables) has probability function/probability density function f (x; ). Suppose also that is unknown and 2 where is the parameter space or the set of possible values of . Let X be the potential data that is to be collected. In your previous statistics course you learned how numerical and graphical summaries as well as goodness of …t tests could be used to check whether the assumed model for an observed set of data x was reasonable. In this course we will assume that the …t of the model has been checked and that the main focus now is to use the model and the data to determine point and interval estimates of . 6.1.1 De…nition - Statistic A statistic, T = T (X), is a function of the data X which does not depend on any unknown parameters. 6.1.2 Example Suppose X = (X1 ; X2 ; : : : ; Xn ) is a random sample, that is, X1 ; X2 ; : : : ; Xn are independent and identically distributed random variables, from a distribution with E(Xi ) = and 2 2 V ar(Xi ) = where and are unknown. n n P P 2 The sample mean X = n1 Xi , the sample variance S 2 = n 1 1 Xi X , and the i=1 i=1 sample minimum X(1) = min (X1 ; X2 ; : : : ; Xn ) are statistics. The random variable Xp = n is not a statistic since it is not only a function of the data X but also a function of the unknown parameters 6.1.3 and 2. De…nition - Estimator and Estimate A statistic T = T (X) that is used to estimate ( ), a function of , is called an estimator of ( ) and an observed value of the statistic t = t(x) is called an estimate of ( ). 6.1.4 Example Suppose X = (X1 ; X2 ; : : : ; Xn ) are independent and identically distributed random variables from a distribution with E(Xi ) = and V ar(Xi ) = 2 , i = 1; 2; : : : ; n. Suppose x = (x1 ; x2 ; : : : ; xn ) is an observed random sample from this distribution. The random variable X is an estimator of . 
The number x is an estimate of . The random variable S is an estimator of . The number s is an estimate of . 6.2. MAXIMUM LIKELIHOOD METHOD 6.2 185 Maximum Likelihood Method Suppose X is discrete random variable with probability function P (X = x; ) = f (x; ), 2 where the scalar parameter is unknown. Suppose also that x is an observed value of the random variable X. Then the probability of observing this value is, P (X = x; ) = f (x; ). With the observed value of x substituted into f (x; ) we have a function of the parameter only, referred to as the likelihood function and denoted L ( ; x) or L( ). In the absence of any other information, it seems logical that we should estimate the parameter using a value most compatible with the data. For example we might choose the value of which maximizes the probability of the observed data or equivalently the value of which maximizes the likelihood function. 6.2.1 De…nition - Likelihood Function: Discrete Case Suppose X is a discrete random variable with probability function f (x; ), where is a scalar and 2 and x is an observed value of X. The likelihood function for based on the observed data x is L( ) = L ( ; x) = P (observing the data x; ) = P (X = x; ) = f (x; ) for 2 If X = (X1 ; X2 ; : : : ; Xn ) is a random sample from a distribution with probability function f (x; ) and x = (x1 ; x2 ; : : : ; xn ) are the observed data then the likelihood function for based on the observed data x is L( ) = L ( ; x) = P (observing the data x; ) = P (X1 = x1 ; X2 = x2 ; : : : ; Xn = xn ; ) n Q = f (xi ; ) for 2 i=1 6.2.2 De…nition - Maximum Likelihood Estimate and Estimator The value of that maximizes the likelihood function L( ) is called the maximum likelihood estimate. The maximum likelihood estimate is a function of the observed data x and we write ^ = ^ (x). The corresponding maximum likelihood estimator, which is a random variable, is denoted by ~ = ~(X). The shape of the likelihood function and the value of at which it is maximized are not a¤ected if L( ) is multiplied by a constant. Indeed it is not the absolute value of the likelihood function that is important but the relative values at two di¤erent values of the 186 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER parameter, for example, L( 1 )=L( 2 ). This ratio can be interpreted as how much more or less consistent the data are with the parameter 1 as compared to 2 . The ratio L( 1 )=L( 2 ) is also una¤ected if L( ) is multiplied by a constant. In view of this the likelihood may be de…ned as P (X = x; ) or as any constant multiple of it. To …nd the maximum likelihood estimate we usually solve the equation dd L ( ) = 0. If the value of which maximizes L ( ) occurs at an endpoint of , of course this does not provide the value at which the likelihood is maximized. In Example 6.2.9 we see that solving d d L ( ) = 0 does not give the maximum likelihood estimate. Most often however we will …nd ^ by solving dd L ( ) = 0. Since the log function (Note: log = ln) is an increasing function, the value of which maximizes the likelihood L ( ) also maximizes log L( ), the logarithm of the likelihood function. Since it is usually simpler to …nd the derivative of the sum of n terms rather than the product, it is often easier to determine the maximum likelihood estimate of by solving dd log L ( ) = 0. It is important to verify that ^ is the value of which maximizes L ( ) or equivalently l ( ). This can be done using the …rst derivative test. Recall that the second derivative test checks for a local extremum. 
6.2.3 De…nition - Log Likelihood Function The log likelihood function is de…ned as l( ) = l ( ; x) = log L( ) for 2 where x are the observed data and log is the natural logarithmic function. 6.2.4 Example Suppose in a sequence of n Bernoulli trials the probability of success is equal to and we have observed x successes. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of and the maximum likelihood estimator of . Solution The likelihood function for based on x successes in n trials is L( ) = P (X = x; ) n x = (1 )n x x for 0 1 or more simply L( ) = x (1 )n x for 0 1 6.2. MAXIMUM LIKELIHOOD METHOD 187 Suppose x 6= 0 and x 6= n. The log likelihood function is l( ) = x log + (n x) log (1 ) for 0 < <1 with derivative d l( ) = d x n 1 x (1 = x ) (1 x n (1 ) = (n ) x) for 0 < <1 The solution to dd l( ) = 0 is = x=n which is the sample proportion. Since dd l( ) > 0 if 0 < < x=n and dd l( ) < 0 if x=n < < 1 then, by the …rst derivative test, l( ) has a absolute maximum at = x=n. If x = 0 then L( ) = (1 )n for 0 1 which is a decreasing function of = 0 or = 0=n. If x = n then on the interval [0; 1]. L( ) is maximized at the endpoint n L( ) = for 0 1 which is a increasing function of on the interval [0; 1]. L( ) is maximized at the endpoint = 1 or = n=n. In all cases the value of which maximizes the likelihood function is the sample proportion = x=n. Therefore the maximum likelihood estimate of is ^ = x=n and the maximum likelihood estimator is ~ = x=n. 6.2.5 Example Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Poisson( ) distribution. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of and the maximum likelihood estimator of . Solution The likelihood function is L( ) = = n Q i=1 n Q i=1 = f (xi ; ) = n Q P (Xi = xi ; ) i=1 xi e xi ! n 1 Q i=1 xi ! n P i=1 xi e n for 0 188 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER or more simply L( ) = nx e n for 0 Suppose x 6= 0. The log likelihood function is l ( ) = n (x log ) for >0 with derivative d x l( ) = n d n = (x 1 ) for >0 The solution to dd l( ) = 0 is = x which is the sample mean. Since dd l( ) > 0 if 0 < < x and dd l( ) < 0 if > x then, by the …rst derivative test, l( ) has a absolute maximum at = x. If x = 0 then L( ) = e which is a decreasing function of = 0 or = 0 = x. n for 0 on the interval [0; 1). L( ) is maximized at the endpoint In all cases the value of which maximizes the likelihood function is the sample mean = x. Therefore the maximum likelihood estimate of is ^ = x and the maximum likelihood estimator is ~ = X. 6.2.6 Likelihood Functions for Continuous Models Suppose X is a continuous random variable with probability density function f (x; ). For continuous continuous random variable, P (X = x; ) is unsuitable as a de…nition of the likelihood function since this probability always equals zero. For continuous data we usually observe only the value of X rounded to some degree of precision, for example, data on waiting times is rounded to the closest second or data on heights is rounded to the closest centimeter. The actual observation is really a discrete random variable. For example, suppose we observe X correct to one decimal place. Then 1:15 Z P (we observe 1:1 ; ) = f (x; )dx (0:1)f (1:1; ) 1:05 assuming the function f (x; ) is reasonably smooth over the interval. 
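As a quick numerical check of this approximation, the following R sketch compares the exact probability with $(0.1) f(1.1; \theta)$ for an Exponential$(\theta)$ density with mean $\theta$; the choice of the Exponential distribution and the value $\theta = 2$ are arbitrary and used only for illustration.

# compare P(1.05 <= X <= 1.15) with (0.1)*f(1.1;theta) for the Exponential(theta)
# density f(x;theta) = (1/theta)*exp(-x/theta), x > 0, using theta = 2
theta<-2
exact<-pexp(1.15,rate=1/theta)-pexp(1.05,rate=1/theta)
approximate<-0.1*dexp(1.1,rate=1/theta)
cat("exact =",exact," approximation =",approximate,"\n")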
More generally, suppose x1 ; x2 ; : : : ; xn are the observations from a random sample from the distribution with probability density function f (x; ) which have been rounded to the nearest which is assumed to be small. Then P (X1 = x1 ; X2 = x2 ; : : : ; Xn = xn ; ) n Q i=1 f (xi ; ) = n n Q i=1 f (xi ; ) 6.2. MAXIMUM LIKELIHOOD METHOD 189 If we assume that the precision does not depend on the unknown parameter , then the term n can be ignored. This argument leads us to adopt the following de…nition of the likelihood function for a random sample from a continuous distribution. 6.2.7 De…nition - Likelihood Function: Continuous Case If x = (x1 ; x2 ; : : : ; xn ) are the observed values of a random sample from a distribution with probability density function f (x; ), then the likelihood function is de…ned as L ( ) = L ( ; x) = n Q f (xi ; ) for 2 i=1 6.2.8 Example Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Exponential( ) distribution. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of and the maximum likelihood estimator of . Solution The likelihood function is L( ) = n Q f (xi ; ) = i=1 = = 1 n n 1 Q e xi = i=1 n P exp xi = i=1 n e nx= for >0 The log likelihood function is l( ) = n log + x for >0 with derivative d l( ) = d = n n 2 (x 1 x 2 ) Now dd l ( ) = 0 for = x. Since dd l( ) > 0 if 0 < < x and dd l( ) < 0 if > x then, by the …rst derivative test, l( ) has a absolute maximum at = x. Therefore the maximum likelihood estimate of is ^ = x and the maximum likelihood estimator is ~ = X. 6.2.9 Example Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Uniform(0; ) distribution. Find the likelihood function, the maximum likelihood estimate of and the maximum likelihood estimator of . 190 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Solution The probability density function of a Uniform(0; ) random variable is f (x; ) = 1 for 0 x and zero otherwise. The support set of the random variable X is [0; ] which depends on the unknown parameter . In such examples care must be taken in determining the maximum likelihood estimate of . The likelihood function is L( ) = = n Q i=1 n Q f (xi ; ) 1 if 0 xi , i = 1; 2; : : : ; n, and >0 i=1 = To determine the value of 1 n if 0 xi , i = 1; 2; : : : ; n, and >0 which maximizes L( ) we note that L( ) can be written as 8 <0 if 0 < < x(n) L( ) = : 1n if x(n) where x(n) = max (x1 ; x2 ; : : : ; xn ) is the maximum of the sample. To see this remember that in order to observe the sample x1 ; x2 ; : : : ; xn the value of must be larger than all the observed xi ’s. L( ) is a decreasing function of on the interval [x(n) ; 1). Therefore L( ) is maximized at = x(n) . The maximum likelihood estimate of is ^ = x(n) and the maximum likelihood estimator is ~ = X(n) . Note: In this example there is no solution to dd l ( ) = dd ( n log ) = 0 and the maximum likelihood estimate of is not found by solving dd l ( ) = 0. One of the reasons the method of maximum likelihood is so widely used is the invariance property of the maximum likelihood estimate under one-to-one transformations. 6.2.10 Theorem - Invariance of the Maximum Likelihood Estimate If ^ is the maximum likelihood estimate of of g ( ). then g(^) is the maximum likelihood estimate Note: The invariance property of the maximum likelihood estimate means that if we know the maximum likelihood estimate of then we know the maximum likelihood estimate of any function of . 6.3. 
SCORE AND INFORMATION FUNCTIONS 6.2.11 191 Example In Example 6.2.8 …nd the maximum likelihood estimate of the median of the distribution and the maximum likelihood estimate of V ar(~). Solution If X has an Exponential( ) distribution then the median m is found by solving 0:5 = Zm 1 e x= dx 0 to obtain m= log (0:5) By the Invariance of the Maximum Likelihood Estimate the maximum likelihood estimate of m is m ^ = ^ log (0:5) = x log (0:5). Since Xi has an Exponential( ) distribution with V ar (Xi ) = 2 , i = 1; 2; : : : ; n independently, the variance of the maximum likelihood estimator ~ = X is V ar ~ = V ar(X) = 2 n By the Invariance of the Maximum Likelihood Estimate the maximum likelihood estimate of V ar(~) is (^)2 =n = (x)2 =n. 6.3 Score and Information Functions The derivative of the log likelihood function plays an important role in the method of maximum likelihood. This function is often called the score function. 6.3.1 De…nition - Score Function The score function is de…ned as S( ) = S( ; x) = d d l( ) = log L ( ) d d for 2 where x are the observed data. Another function which plays an important role in the method of maximum likelihood is the information function. 192 6.3.2 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER De…nition - Information Function The information function is de…ned as d2 l( ) = d 2 I( ) = I( ; x) = d2 log L( ) for d 2 2 where x are the observed data. I(^) is called the observed information. In Section 6:7 we will see how the observed information I(^) can be used to construct an approximate con…dence interval for the unknown parameter . I(^) also tells us about the concavity of the log likelihood function l ( ). 6.3.3 Example Find the observed information for Example 6.2.5. Suppose the maximum likelihood estimate of was ^ = 2. Compare I(^) = I (2) if n = 10 and n = 25. Plot the function r ( ) = l ( ) l(^) for n = 10 and n = 25 on the same graph. Solution From Example 6.2.5, the score function is S( ) = d n l ( ) = (x d ) for >0 Therefore the information function is d2 l( ) = d 2 = d h x n d nx 1 i = nx 2 2 and the observed information is I(^) = = nx n^ = ^2 ^2 n ^ If n = 10 then I(^) = I (2) = n=^ = 10=2 = 5. If n = 25 then I(^) = I (2) = 25=2 = 12:5. See Figure 6.1. The function r ( ) = l ( ) l(^) is more concave and symmetric for n = 25 than for n = 10. As the number of observations increases we have more “information” about the unknown parameter . Although we view the likelihood, log likelihood, score and information functions as functions of , they are also functions of the observed data x. When it is important to emphasize the dependence on the data x we will write L( ; x); S( ; x); and I( ; x). When we wish to determine the sampling distribution of the corresponding random variables we will write L( ; X), S( ; X), and I( ; X). 6.3. SCORE AND INFORMATION FUNCTIONS 193 0 -1 n=10 -2 -3 n=25 r(θ) -4 -5 -6 -7 -8 -9 -10 1 1.5 2 θ 2.5 3 3.5 Figure 6.1: Poisson Log Likelihoods for n = 10 and n = 25 Here is one more function which plays an important role in the method of maximum likelihood. 6.3.4 If De…nition - Expected Information Function is a scalar then the expected information function is given by J( ) = E [I( ; X)] d2 = E l( ; X) d 2 for 2 where X is the potential data. Note: If X = (X1 ; X2 ; : : : ; Xn ) is a random sample from f (x; ) then J( ) = E = nE d2 l( ; X) d 2 d2 log f (X; ) d 2 where X has probability density function f (x; ). for 2 194 6.3.5 6. 
MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Example For each of the following …nd the observed information I(^) and the expected information J ( ). Compare I(^) and J(^). Determine the mean and variance of the maximum likelihood estimator ~. Compare the expected information with the variance of the maximum likelihood estimator. (a) Example 6.2.4 (Binomial) (b) Example 6.2.5 (Poisson) (c) Example 6.2.8 (Exponential) Solution (a) From Example 6.2.4, the score function based on x successes in n Bernoulli trials is S( ) = d x n l( ) = d (1 ) for 0 < <1 Therefore the information function is d2 d x n x 2l( ) = d 1 d x n x for 0 < < 1 2 + (1 )2 I( ) = = Since the maximum likelihood estimate is ^ = x n = x 2 n (1 x )2 the observed information is x x n x n x = + 2 + 2 2 x 2 ^ (1 ^) 1 nx n " # x 1 nx 1 1 n =n x + = n + 2 2 x 1 1 nx n n n = x 1 nx n n = ^(1 ^) I(^) = x n If X has a Binomial(n; ) distribution then E (X) = n and V ar (X) = n (1 fore the expected information is J( ) = E [I( ; X)] = E = = X n [(1 )+ ] (1 ) n for 0 < (1 ) We note that I(^) = J(^). 2 + n (1 <1 X n n (1 2 = 2 + ) (1 ) 2 ) ). There- 6.3. SCORE AND INFORMATION FUNCTIONS 195 The maximum likelihood estimator is ~ = X=n with X n E(~) = E = 1 n E (X) = = n n and X n V ar(~) = V ar = 1 n (1 V ar (X) = 2 n n2 ) (1 n = ) 1 J( ) = (b) From Example 6.3.3 the information function based on Poisson data x1 ; x2 ; : : : ; xn is I( ) = nx 2 for >0 Since the maximum likelihood estimate is ^ = x the observed information is nx n I(^) = 2 = ^ ^ We note that I(^) = J(^). Since Xi has a Poisson( ) distribution with E (Xi ) = V ar (Xi ) = V ar X = =n. Therefore the expected information is J( ) = E [I( ; X1 ; X2 ; : : : ; Xn )] = E = n for nX 2 = then E X = n 2 ( ) >0 The maximum likelihood estimator is ~ = X with E(~) = E X = and V ar(~) = V ar X = n = 1 J( ) (c) From Example 6.2.8 the score function based on Exponential data x1 ; x2 ; : : : ; xn is S( ) = d l( ) = d n 1 x 2 for Therefore the information function is d2 d 1 x 2 l ( ) = nd 2 d 1 2x = n for > 0 2 + 3 I( ) = >0 and 196 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Since the maximum likelihood estimate is ^ = x the observed information is ! ^ 1 2 n I(^) = n = 2 + 3 ^ ^ ^ ( )2 Since Xi has a Exponential( ) distribution with E (Xi ) = and V ar (Xi ) = 2 E X = and V ar X = =n. Therefore the expected information is 1 J( ) = E [I( ; X1 ; X2 ; : : : ; Xn )] = nE = n 2 for 2 + 2X 3 =n 1 2 + 2 then 2 2 >0 We note that I(^) = J(^). The maximum likelihood estimator is ~ = X with E(~) = E X = and V ar(~) = V ar X = 2 n = 1 J( ) In all three examples we have I(^) = J(^), E(~) = , and V ar(~) = [J ( )] 1 . In the three previous examples, we observed that E(~) = and therefore ~ was an unbiased estimator of . This is not always true for maximum likelihood estimators as we see in the next example. However, maximum likelihood estimators usually have other good properties. Suppose ~n = ~n (X1 ; X2 ; : : : ; Xn ) is the maximum likelihood estimator based on a sample of size n. If lim E(~n ) = then ~n is an asymptotically unbiased estimator of n!1 . If ~n !p then ~n is called a consistent estimator of . 6.3.6 Example Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the distribution with probability density function f (x; ) = x 1 for 0 x 1 for > 0 (6.1) (a) Find the score function, the maximum likelihood estimator, the information function, the observed information, and the expected information. 
n P (b) Show that T = log Xi Gamma n; 1 i=1 (c) Use (b) and 2.7.9 to show that ~ is not an unbiased estimator of . Show however that ~ is an aysmptotically unbiased estimator of . (d) Show that ~ is a consistent estimator of . 6.3. SCORE AND INFORMATION FUNCTIONS 197 (e) Use (b) and 2.7.9 to …nd V ar(~). Compare V ar(~) with the expected information. Solution (a) The likelihood function is n Q L( ) = f (xi ; ) = n Q n 1 xi i=1 1 i=1 = n Q for xi >0 i=1 or more simply n Q n L( ) = for xi >0 i=1 The log likelihood function is l( ) = n log + n P log xi i=1 = n log where t = n P t for >0 log xi . The score function is i=1 S( ) = = = d l( ) d n t 1 (n t) for >0 Now dd l ( ) = 0 for = n=t. Since dd l( ) > 0 if 0 < < n=t and dd l( ) < 0 if > n=t then, by the …rst derivative test, l( ) has a absolute maximum at = n=t. Therefore the maximum likelihood estimate of is ^ = n=t and the maximum likelihood estimator is n ~ = n=T where T = P log Xi . i=1 The information function is I( ) = = and the observed information is d2 l( ) d 2 n for > 0 2 n I(^) = 2 ^ The expected information is J( ) = E [I( ; X1 ; X2 ; : : : ; Xn )] = E = n 2 for >0 n 2 198 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER (b) From Exercise 2.6.12 we have that if Xi has the probability density function (6.1) then Yi = log Xi 1 Exponential for i = 1; 2; : : : ; n (6.2) From 4.3.2(4) have that T = n P i=1 (c) Since T n P log Xi = Yi Gamma n; 1 i=1 Gamma n; 1 then from 2.7.9 we have (n + p) p (n) E (T p ) = for p > n Assuming n > 1 we have E T 1 (n = 1) = n (n) 1 1 so E ~ n T = E = nE = nE T = n = n 1 1 T 1 1 1 n 6= and therefore ~ is not an unbiased estimator of . Now ~ is an estimator based on a sample of size n. Since lim E ~ = lim n!1 n!1 1 1 n = therefore ~ is an asymptotically unbiased estimator of . (d) By (6.2), Y1 ; Y2 ; : : : ; Yn are independent and identically distributed random variables with E (Yi ) = 1 and V ar (Yi ) = 12 < 1. Therefore by the Weak Law of Large Numbers and by the Limit Theorems n T 1 P 1 = Yi !p n n i=1 Thus ~ is a consistent estimator of . ~ = n !p T 6.3. SCORE AND INFORMATION FUNCTIONS 199 (e) To determine V ar(~) we note that n V ar(~) = V ar T 1 = n2 V ar T ( " = n2 E n = n2 E T Since 2 E T (n = 2 1 T # 2 E 1 E T 2 o ) 2 2) = (n (n) 2 2 1 T 1) (n 2) then n V ar(~) = n2 E T " = n 2 (n 2 2 1 2 2 = n2 E T 1) (n 2) n o 1 2 # (n 1) (n 2) (n 1)2 (n 2) 2 = 1 2 (n n 1 We note that V ar ~ 6= 6.3.7 2 n = 1 J( ) , 2) however for large n, V ar ~ 1 J( ) . Example Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Weibull( ; 1) distribution with probability density function f (x; ) = x 1 e x for x > 0; >0 Find the score function and the information function. How would you …nd the maximum likelihood estimate of ? Solution The likelihood function is L( ) = n Q f (xi ; ) = i=1 1 i=1 = n n Q i=1 n Q xi xi exp 1 e xi n P i=1 xi for >0 200 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER or more simply L( ) = n n Q exp xi i=1 i=1 The log likelihood function is l( ) = n log + n P log xi i=1 The score function is S( ) = = n P xi n P i=1 xi d l( ) d n P n +t xi log xi for for >0 >0 for >0 i=1 where t = n P log xi . i=1 Notice that S( ) = 0 cannot be solved explicitly. The maximum likelihood estimate can only be determined numerically for a given sample of data x1 ; x2 ; : : : ; xn . Since d S( )= d n 2 + n P i=1 xi (log xi )2 for >0 is negative for all values of > 0 then we know that the function S ( ) is always decreasing and therefore there is only one solution to S( ) = 0. 
The solution to $S(\theta) = 0$ gives the maximum likelihood estimate. The information function is
$$I(\theta) = -\frac{d^{2}}{d\theta^{2}}\,l(\theta) = \frac{n}{\theta^{2}} + \sum_{i=1}^{n} x_i^{\theta}(\log x_i)^{2} \qquad \text{for } \theta > 0$$

To illustrate how to find the maximum likelihood estimate for a given sample of data, we randomly generate 20 observations from the Weibull($\theta$, 1) distribution. To do this we use the result of Example 2.6.7 in which we showed that if $u$ is an observation from the Uniform(0, 1) distribution then $x = [-\log(1-u)]^{1/\theta}$ is an observation from the Weibull($\theta$, 1) distribution. The following R code generates the data, plots the likelihood function, finds $\hat{\theta}$ by solving $S(\theta) = 0$ using the R function uniroot, and determines $S(\hat{\theta})$ and the observed information $I(\hat{\theta})$.

# randomly generate 20 observations from a Weibull(theta,1)
# using a random theta value between 0.5 and 1.5
set.seed(20086689)    # set the seed so results can be reproduced
truetheta<-runif(1,min=0.5,max=1.5)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((-log(1-runif(20)))^(1/truetheta),2))
x
#
# function for calculating Weibull likelihood for data x and theta=th
WBLF<-function(th,x)
{n<-length(x)
L<-th^n*prod(x)^th*exp(-sum(x^th))
return(L)}
#
# function for calculating Weibull score for data x and theta=th
WBSF<-function(th,x)
{n<-length(x)
t<-sum(log(x))
S<-(n/th)+t-sum(log(x)*x^th)
return(S)}
#
# function for calculating Weibull information for data x and theta=th
WBIF<-function(th,x)
{n<-length(x)
I<-(n/th^2)+sum(log(x)^2*x^th)
return(I)}
#
# plot the Weibull likelihood function
th<-seq(0.25,0.75,0.01)
L<-sapply(th,WBLF,x)
plot(th,L,"l",xlab=expression(theta),
 ylab=expression(paste("L(",theta,")")),lwd=3)
#
# find thetahat using uniroot function
thetahat<-uniroot(function(th) WBSF(th,x),lower=0.4,upper=0.6)$root
cat("thetahat = ",thetahat)   # display value of thetahat
# display value of Score function at thetahat
cat("S(thetahat) = ",WBSF(thetahat,x))
# calculate observed information
cat("Observed Information = ",WBIF(thetahat,x))

The generated data are

0.01 0.01 0.05 0.07 0.10 0.11 0.23 0.28 0.44 0.46
0.64 1.07 1.16 1.25 2.40 3.03 3.65 5.90 6.60 30.07

The likelihood function is graphed in Figure 6.2. The solution to $S(\theta) = 0$ determined by uniroot was $\hat{\theta} = 0.4951607$. The score at this value is $S(\hat{\theta}) = 2.468574 \times 10^{-5}$, which is close to zero, and the observed information is $I(\hat{\theta}) = 181.8069$, which is positive and indicates a local maximum. We have already shown in this example that the solution to $S(\theta) = 0$ is the unique maximum likelihood estimate.

Figure 6.2: Likelihood function for Example 6.3.7

Note that the interval $[0.25, 0.75]$ used to graph the likelihood function was determined by trial and error. The values lower=0.4 and upper=0.6 used for uniroot were determined from the graph of the likelihood function. From the graph it is easy to see that the value of $\hat{\theta}$ lies in the interval $[0.4, 0.6]$.

Newton's Method, a numerical method for finding the roots of an equation which is usually discussed in first year calculus, can also be used to find the maximum likelihood estimate. Newton's Method usually works quite well for finding maximum likelihood estimates because the log likelihood function is often approximately quadratic in shape near its maximum.

6.3.8 Newton's Method

Let $\theta^{(0)}$ be an initial estimate of $\theta$.
The estimate (i+1) = (i) + S( (i) I( (i) ) ) (i) can be updated using for i = 0; 1; : : : Notes: (1) The initial estimate, (0) , may be determined by graphing L ( ) or l ( ). (2) The algorithm is usually run until the value of (i) no longer changes to a reasonable number of decimal places. When the algorithm is stopped it is always important to check that the value of obtained does indeed maximize L ( ). (3) This algorithm is also called the Newton-Raphson Method. (4) I ( ) can be replaced by J ( ) for a similar algorithm which is called the Method of Scoring. (5) If the support set of X depends on (e.g. Uniform(0; )) then ^ is not found by solving S( ) = 0. 6.3.9 Example Use Newton’s Method to …nd the maximum likelihood in Example 6.3.7. Solution Here is R code for Newton’s Method for the Weibull Example # Newton’s Method for Weibull Example NewtonWB<-function(th,x) {thold<-th thnew<-th+0.1 while (abs(thold-thnew)>0.00001) {thold<-thnew thnew<-thold+WBSF(thold,x)/WBIF(thold,x) print(thnew)} return(thnew)} # thetahat<-NewtonWB(0.2,x) Newton’s Method converges after four iterations and the value of thetahat returned is ^ = 0:4951605 which is the same value to six decimal places as was obtained above using the uniroot function. 204 6.4 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Likelihood Intervals In your previous statistics course, likelihood intervals were introduced as one approach to constructing an interval estimate for the unknown parameter . 6.4.1 De…nition - Relative Likelihood Function The relative likelihood function R( ) is de…ned by R( ) = R ( ; x) = L( ) L(^) for 2 where x are the observed data. The relative likelihood function takes on values between 0 and 1 and can be used to rank parameter values according to their plausibilities in light of the observed data. If R( 1 ) = 0:1, for example, then 1 is rather an implausible parameter value because the data are ten times more probable when = ^ than they are when = 1 . However, if R( 1 ) = 0:5, then 1 is a fairly plausible value because it gives the data 50% of the maximum possible probability under the model. 6.4.2 De…nition - Likelihood Interval The set of values for which R( ) p is called a 100p% likelihood interval for . Values of inside a 10% likelihood interval are referred to as plausible values in light of the observed data. Values of outside a 10% likelihood interval are referred to as implausible values given the observed data. Values of inside a 50% likelihood interval are very plausible and values of outside a 1% likelihood interval are very implausible in light of the data. 6.4.3 De…nition - Log Relative Function The log relative likelihood function is the natural logarithm of the relative likelihood function: r( ) = r ( ; x) = log[R( )] for 2 where x are the observed data. Likelihood regions or intervals may be determined from a graph of R( ) or r( ). Alternatively, they can be found by solving R ( ) p = 0 or r( ) log p = 0. In most cases this must be done numerically. 6.4. LIKELIHOOD INTERVALS 6.4.4 205 Example Plot the relative likelihood function for intervals for . in Example 6.3.7. Find 10% and 50% likelihood Solution Here is R code to plot the relative likelihood function for the Weibull Example with lines for determining 10% and 50% likelihood intervals for as well as code to determine these intervals using uniroot. The R function WBRLF uses the R function WBLF from Example 6.3.7. 
# function for calculating Weibull relative likelihood function WBRLF<-function(th,thetahat,x) {R<-WBLF(th,x)/WBLF(thetahat,x) return(R)} # # plot the Weibull relative likelihood function th<-seq(0.25,0.75,0.01) R<-sapply(th,WBRLF,thetahat,x) plot(th,R,"l",xlab=expression(theta), ylab=expression(paste("R(",theta,")")),lwd=3) # add lines to determine 10% and 50% likelihood intervals abline(a=0.10,b=0,col="red",lwd=2) abline(a=0.50,b=0,col="blue",lwd=2) # # use uniroot to determine endpoints of 10%, 15%, and 50% likelihood intervals uniroot(function(th) WBRLF(th,thetahat,x)-0.1,lower=0.3,upper=0.4)$root uniroot(function(th) WBRLF(th,thetahat,x)-0.1,lower=0.6,upper=0.7)$root uniroot(function(th) WBRLF(th,thetahat,x)-0.5,lower=0.35,upper=0.45)$root uniroot(function(th) WBRLF(th,thetahat,x)-0.5,lower=0.55,upper=0.65)$root uniroot(function(th) WBRLF(th,thetahat,x)-0.15,lower=0.3,upper=0.4)$root uniroot(function(th) WBRLF(th,thetahat,x)-0.15,lower=0.6,upper=0.7)$root The graph of the relative likelihood function is given in Figure 6.3. The upper and lower values used for uniroot were determined using this graph. The 10% likelihood interval for is [0:34; 0:65]. The 50% likelihood interval for is [0:41; 0:58]. For later reference the 15% likelihood interval for is [0:3550; 0:6401]. 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER 0.0 0.2 0.4 R(θ) 0.6 0.8 1.0 206 0.3 0.4 0.5 0.6 0.7 θ Figure 6.3: Relative Likelihood function for Example 6.3.7 6.4.5 Example Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Uniform(0; ) distribution. Plot the relative likelihood function for if n = 20 and x(20) = 1:5. Find 10% and 50% likelihood intervals for . Solution From Example 6.2.9 the likelihood function for n = 10 and ^ = x(10) = 0:5 is 8 <0 if 0 < < 0:5 L( ) = : 1 if 0:5 10 The relative likelihood function is R( ) = 8 <0 : if 0 < 0:5 10 if < 0:5 0:5 is graphed in Figure 6.4 along with lines for determining 10% and 50% likelihood intervals. To determine the value of at which the horizontal line R = p intersects the graph of 10 R( ) we solve 0:5 = p to obtain = 0:5p 1=10 . Since R( ) = 0 if 0 < < 0:5 then a 100p% likelihood interval for is of the form 0:5; 0:5p 1=10 . 6.4. LIKELIHOOD INTERVALS 207 A 10% likelihood interval is h 0:5; 0:5 (0:1) A 50% likelihood interval is h 0:5; 0:5 (0:5) 1=10 1=10 i = [0:5; 0:629] i = [0:5; 0:536] 1 0.9 R(θ) 0.7 0.5 0.3 0.1 0 0 0.2 0.4 θ 0.6 0.8 1 Figure 6.4: Relative likelihood function for Example 6.4.5 More generally for an observed random sample x1 ; x2 ; : : : ; xn from the Uniform(0; ) distribution a 100p% likelihood interval for will be of the form x(n) ; x(n) p 1=n . 6.4.6 Exercise Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Two Parameter Exponential(1; ) distribution. Plot the relative likelihood function for if n = 12 and x(1) = 2. Find 10% and 50% likelihood intervals for . 208 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER 6.5 Limiting Distribution of Maximum Likelihood Estimator If there is no explicit expression for the maximum likelihood estimator ~, as in Example 6.3.7, then its sampling distribution can only be obtained by simulation. This makes it di¢ cult to determine the properties of the maximum likelihood estimator. In particular, to determine how good an estimator is, we look at its mean and variance. The mean indicates whether the estimator is close, on average, to the true value of , while the variance indicates the uncertainty in the estimator. The larger the variance the more uncertainty in our estimation. 
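To make the simulation point above concrete, here is a small sketch for the Weibull($\theta$, 1) model of Example 6.3.7, where $\tilde{\theta}$ has no closed form. Each simulated sample is generated as in Example 6.3.7 and $\hat{\theta}$ is found by solving $S(\theta) = 0$ with uniroot, reusing the function WBSF from that example; the sample mean and variance of the simulated estimates then approximate $E(\tilde{\theta})$ and $Var(\tilde{\theta})$. The true value of $\theta$, the seed, and the number of simulated samples are arbitrary choices.

# approximate the sampling distribution of the Weibull(theta,1) maximum
# likelihood estimator by simulation (uses WBSF from Example 6.3.7)
set.seed(123)           # arbitrary seed
truetheta<-0.5          # arbitrary true value of theta
nrep<-1000              # number of simulated samples
thetahats<-numeric(nrep)
for (i in 1:nrep)
{x<-(-log(1-runif(20)))^(1/truetheta)   # new Weibull(theta,1) sample of size 20
 thetahats[i]<-uniroot(function(th) WBSF(th,x),lower=0.01,upper=20)$root}
mean(thetahats)         # approximates E(theta tilde)
var(thetahats)          # approximates Var(theta tilde)
hist(thetahats,xlab=expression(hat(theta)),main="")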
To determine how good an estimator is we also look at how the mean and variance behave as the sample size n ! 1. We saw in Example 6.3.5 that E(~) = and V ar(~) = [J ( )] 1 ! 0 as n ! 1 for the Binomial, Poisson, Exponential models. For each of these models the maximum likelihood estimator is an unbiased estimator of . In Example 6.3.6 we saw that E(~) ! and V ar(~) ! [J ( )] 1 ! 0 as n ! 1. The maximum likelihood estimator ~ is an asymptotically unbiased estimator. In Example 6.3.7 we are not able to determine E(~) and V ar(~). The following theorem gives the limiting distribution of the maximum likelihood estimator in general under certain restrictions. 6.5.1 Theorem - Limiting Distribution of Maximum Likelihood Estimator Suppose Xn = (X1 ; X2 ; : : : ; Xn ) be a random sample from the probability (density) function f (x; ) for 2 . Let ~n = ~n (Xn ) be the maximum likelihood estimator of based on Xn . Then under certain (regularity) conditions ~n !p (~n )[J( )]1=2 !D Z 2 log R( ; Xn ) = for each 2 log (6.3) N(0; 1) L( ; Xn ) !D W L(~n ; Xn ) (6.4) 2 (1) (6.5) 2 . The proof of this result which depends on applying Taylor’s Theorem to the score function is beyond the scope of this course. The regularity conditions are a bit complicated but essentially they are a set of conditions which ensure that the error term in Taylor’s Theorem goes to zero as n ! 1. One of the conditions is that the support set of f (x; ) does not depend on . Therefore, for example, this theorem cannot be applied to the maximum likelihood estimator in the case of a random sample from the Uniform(0; ) distribution. 6.5. LIMITING DISTRIBUTION OF MAXIMUM LIKELIHOOD ESTIMATOR 209 This is actually not a problem since the distribution of the maximum likelihood estimator in this case can be determined exactly. Since (6.3) holds ~n is called a consistent estimator of . Theorem 6.5.1 implies that for su¢ ciently large n, ~n has an approximately N( ; 1=J( )) distribution. Therefore for large n E(~n ) and the maximum likelihood estimator is an asymptotically unbiased estimator of . Since ~n has an approximately N( ; 1=J( )) distribution this also means that for su¢ ciently large n 1 V ar(~n ) J( ) 1=J( ) is called the asymptotic variance of ~n . Of course J( ) is unknown because is unknown. By (6.3), (6.4), and Slutsky’s Theorem q ~ ( n ) J(~n ) !D Z N(0; 1) (6.6) which implies that the asymptotic variance of ~n can be estimated using 1=J(^n ). Therefore for su¢ ciently large n we have 1 V ar(~n ) J(^n ) By the Weak Law of Large Numbers 1 I( ; Xn ) = n n d2 1 P l( ; Xi ) !p E n i=1 d 2 d2 l( ; Xi ) d 2 Therefore by (6.3), (6.4), (6.7) and the Limit Theorems it follows that q ~ ( n ) I(~n ; Xn ) !D Z N(0; 1) (6.7) (6.8) which implies the asymptotic variance of ~n can be also be estimated using 1=I(^n ) where I(^n ) is the observed information. Therefore for su¢ ciently large n we have V ar(~n ) 1 I(^n ) Results (6.6), (6.8) and (6.5) can be used to construct approximate con…dence intervals for . In Chapter 8 we will see how result (6.5) can be used in a test of hypothesis. Although we will not prove Theorem 6.5.1 in general we can prove the results in a particular case. The following example illustrates how techniques and theorems from previous chapters can be used together to obtained the results of interest. It is also a good review of several ideas covered thus far in these Course Notes. 210 6.5.2 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Example (a) Suppose X Weibull(2; ). Show that E X 2 = 2 and V ar X 2 = 4 . 
(b) Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Weibull(2; ) distribution. Find the maximum likelihood estimator ~ of , the information function I ( ), the observed information I(^), and the expected information J ( ). (c) Show that ~n !p (d) Show that (~n )[J( )]1=2 !D Z N(0; 1) Solution (a) From Exercise 2.7.9 we have if X k +1 2 k E Xk = for k = 1; 2; : : : (6.9) Weibull(2; ). Therefore E X2 = 2 E X4 = 4 and V ar X 2 = E (b) The likelihood function is L( ) = n Q h 2 X2 i = 2 E X2 f (xi ; ) = i=1 2 +1 2 4 +1 2 n Q L( ) = n 1 P 2xi exp where t = n P i=1 2n 2 t exp 4 =2 2 2 = 4 2 i=1 or more simply 4 =2 (xi = )2 n 2x e Q i i=1 2n 2 = i=1 x2i for 2 for >0 for >0 >0 x2i . The log likelihood function is l( ) = t 2n log for 2 >0 The score function is S( )= d l( ) = d 2n + 2t 3 = 2 2 t n 2 6.5. LIMITING DISTRIBUTION OF MAXIMUM LIKELIHOOD ESTIMATOR Now d d l( ) = 0 t 1=2 then, n for = t 1=2 . n d d Since l( ) > 0 if 0 < t 1=2 n < and d d 211 l( ) < 0 if 1=2 by the …rst derivative test, l( ) has a absolute maximum at = nt . 1=2 t Therefore the maximum likelihood estimate of is ^ = n . The maximum likelihood estimator is 1=2 n P ~= T where T = Xi2 n i=1 > The information function is d2 l( ) d 2 6T 2n I( ) = = 4 for 2 >0 and the observed information is 2 2n 6(n^ ) = ^2 ^4 6t ^4 4n ^2 I(^) = = 2n ^2 To …nd the expected information we note that, from (a), E Xi2 = n P E (T ) = E i=1 Therefore the expected information is J( ) = E = (c) To show that ~n !p 4n 2 6T 2n 4 2 for = Xi2 4 and thus 2 =n 6E (T ) 2 2n 2 = 6n 4 2 2n 2 >0 we need to show that ~= T n 1=2 n 1 P X2 n i=1 i = 1=2 !p Since X12 ; X22 ; : : : ; Xn2 are independent and identically distributed random variables with E Xi2 = 2 and V ar Xi2 = 4 for i = 1; 2; : : : ; n then by the Weak Law of Large Numbers n T 1 P = X 2 !p 2 n n i=1 i and by 5.5.1(1) ~= as required. T n 1=2 !p (6.10) 212 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER (d) To show that (~n )[J( )]1=2 !D Z N(0; 1) we need to show that [J( )]1=2 (~n ) p " 2 n T 1=2 = n # !D Z N(0; 1) Since X12 ; X22 ; : : : ; Xn2 are independent and identically distributed random variables with E Xi2 = 2 and V ar Xi2 = 4 for i = 1; 2; : : : ; n then by the Central Limit Theorem p n 2 T n !D Z 2 p Let g (x) = x and a = Method we have 2 p . Then n q d dx g (x) p T n = 1 p 2 x N(0; 1) (6.11) and g 0 (a) = 1 2 . By (6.11) and the Delta 2 !D 2 1 Z 2 N 0; 1 4 2 or p n q (2 ) p " 2 n T = n T n 2 1=2 (6.12) # !D Z N(0; 1) (6.13) as required. 6.5.3 Example Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Uniform(0; ) distribution. Since the support set of Xi depends on Theorem 6.5.1 does not hold. Show however that the maximum likelihood estimator ~n = X(n) is a consistent estimator of . Solution In Example 5.1.5 we showed that ~n = max (X1 ; X2 ; : : : ; Xn ) !p consistent estimator of . and therefore ~n is a 6.6. CONFIDENCE INTERVALS 6.6 213 Con…dence Intervals In your previous statistics course a con…dence interval was used to summarize the available information about an unknown parameter. Con…dence intervals allow us to quantify the uncertainty in the unknown parameter. 6.6.1 De…nition - Con…dence Interval Suppose X is a random variable (possibly a vector) whose distribution depends on , and A(X) and B(X) are statistics. If P [A(X) B(X)] = p for 0 < p < 1 then [a(x); b(x)] is called a 100p% con…dence interval for where x are the observed data. Con…dence intervals can be constructed in a straightforward manner if a pivotal quantity exists. 
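Before turning to pivotal quantities, here is a small simulation sketch of the limiting result verified in Example 6.5.2(d). It draws repeated samples from the Weibull(2, $\theta$) distribution, computes $\tilde{\theta} = (T/n)^{1/2}$ for each, and standardizes by $[J(\theta)]^{1/2} = 2\sqrt{n}/\theta$; a Normal qq-plot of the standardized values should be close to a straight line. It is assumed here that R's rweibull with shape 2 and scale $\theta$ matches the Weibull(2, $\theta$) parameterization of these Course Notes; the seed, $\theta$, $n$, and the number of simulated samples are arbitrary choices.

# simulate the standardized maximum likelihood estimator of Example 6.5.2:
# (theta tilde - theta)*sqrt(J(theta)) where J(theta) = 4n/theta^2
set.seed(456)            # arbitrary seed
theta<-2                 # arbitrary true value of theta
n<-50                    # sample size
nrep<-2000               # number of simulated samples
z<-numeric(nrep)
for (i in 1:nrep)
{x<-rweibull(n,shape=2,scale=theta)       # assumed Weibull(2,theta) parameterization
 thetahat<-sqrt(mean(x^2))                # theta tilde = (T/n)^(1/2)
 z[i]<-(thetahat-theta)*2*sqrt(n)/theta}  # standardize by sqrt(J(theta))
qqnorm(z)
abline(0,1)
mean(z)    # should be close to 0
var(z)     # should be close to 1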
6.6.2 De…nition - Pivotal Quantity Suppose X is a random variable (possibly a vector) whose distribution depends on . The random variable Q(X; ) is called a pivotal quantity if the distribution of Q does not depend on . Pivotal quantities can be used for constructing con…dence intervals in the following way. Since the distribution of Q(X; ) is known we can write down a probability statement of the form P (q1 Q(X; ) q2 ) = p where q1 and q2 do not depend on . If Q is a monotone function of can be rewritten as P [A(X) B(X)] = p then this statement and the interval [a(x); b(x)] is a 100p% con…dence interval. 6.6.3 Example Suppose X = (X1 ; X2 ; : : : ; Xn ) is a random sample from the Exponential( ) distribution. Determine the distribution of n P 2 Xi i=1 Q(X; ) = and thus show Q(X; ) is a pivotal quantity. Show how Q(X; ) can be used to construct a 100p% equal tail con…dence interval for . 214 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Solution Since X = (X1 ; X2 ; : : : ; Xn ) is a random sample from the Exponential( ) distribution then by 4.3.2(4) n P Xi Gamma (n; ) Y = i=1 and by Chapter 2, Problem 7(c) Q(X; ) = 2Y 2 n P Xi i=1 = 2 (2n) and therefore Q(X; ) is a pivotal quantity. Find values a and b such that P (W where W 1 a) = p 2 and P (W b) = 1 p 2 2 (2n). Since P (a W b) = p then 0 B PB @a or Therefore 2 n P 1 Xi C bC A=p i=1 0 P n 2 X B i=1 i B P@ b 2 n P Xi i=1 a 3 2 P n n P xi 2 xi 2 7 6 i=1 i=1 7 6 4 b ; a 5 1 C C=p A is a 100p% equal tail con…dence interval for . The following theorem gives the pivotal quantity in the case in which or scale parameter. 6.6.4 is either a location Theorem Let X = (X1 ; X2 ; : : : ; Xn ) be a random sample from f (x; ) and let ~ = ~(X) be the maximum likelihood estimator of the scalar parameter based on X. (1) If is a location parameter of the distribution then Q(X; ) = ~ is a pivotal quantity. (2) If is a scale parameter of the distribution then Q(X; ) = ~= is a pivotal quantity. 6.6. CONFIDENCE INTERVALS 6.6.5 215 Example Suppose X = (X1 ; X2 ; : : : ; Xn ) is a random sample from the Weibull(2; ) distribution. Use Theorem 6.6.4 to …nd a pivotal quantity Q(X; ). Show how the pivotal quantity can be used to construct a 100p% equal tail con…dence interval for . Solution From Chapter 2, Problem 3(a) we know that is a scale parameter for the Weibull(2; ) distribution. From Example 6.5.2 the maximum likelihood estimator of is ~= 1=2 T n = Therefore by Theorem 6.6.4 n 1 P X2 n i=1 i 1=2 n 1 P X2 n i=1 i 1=2 ~ Q(X; ) = 1 = is a pivotal quantity. To construct a con…dence interval for we need to determine the distribution of Q(X; ). This looks di¢ cult at …rst until we notice that ~ !2 = n 1 P Xi2 n 2 i=1 This form suggests looking for the distribution of n P i=1 Xi2 which is a sum of independent and identically distributed random variables X12 ; X22 ; : : : ; Xn2 . From Chapter 2, Problem 7(h) we have that if Xi Xi2 Therefore n P i=1 Weibull(2; ), i = 1; 2; : : : ; n then Exponential( 2 ) for i = 1; 2; : : : ; n Xi2 is a sum of independent Exponential( 2 ) random variables. By 4.3.2(4) n P i=1 and by Chapter 2, Problem 7(h) Xi2 Gamma n; 2 Q1 (X; ) = n P Xi2 i=1 2 and therefore Q1 (X; ) is a pivotal quantity. 2 2 (2n) 216 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Now ~ Q1 (X; ) = 2n !2 = 2n [Q(X; )]2 is a one-to-one function of Q(X; ). 
To see that Q1 (X; ) and Q(X; ) generate the same con…dence intervals for we note that P (a 0 Q1 (X; ) b) 1 !2 ~ = P @a 2n bA = P = P " " a 2n 1=2 a 2n 1=2 ~ b 2n 1=2 # 1=2 b 2n Q(X; ) # To construct a 100p% equal tail con…dence interval we choose a and b such that P (W where W 2 (2n). a) = 1 p 2 " " = P ~ a 2n 6.6.6 n P i=1 ~ 1=2 2n b a 100p% equal tail con…dence interval is " ^ 2n b 1 n b) = 1 p 2 Since p = P where ^ = and P (W b 2n 1=2 ~ 1=2 ; ^ 2n a 1=2 1=2 2n a # 1=2 # # 1=2 x2i Example Suppose X = (X1 ; X2 ; : : : ; Xn ) is a random sample from the Uniform(0; ) distribution. Use Theorem 6.6.4 to …nd a pivotal quantity Q(X; ). Show how the pivotal quantity can be used to construct a 100p% con…dence interval for of the form [^; a^]. 6.6. CONFIDENCE INTERVALS 217 Solution From Chapter 2, Problem 3(b) we know that is a scale parameter for the Uniform(0; ) distribution. From Example 6.2.9 the maximum likelihood estimator of is ~ = X(n) Therefore by Theorem 6.6.4 ~ Q(X; ) = = X(n) is a pivotal quantity. To construct a con…dence interval for distribution of Q(X; ). ! ~ P (Q (X; ) q) = P q = P X(n) q n Q = P (Xi q ) i=1 n Q = q i=1 n = q since P (Xi for 0 q = P 1 = P 1 a or a = (1 p) where ^ = x(n) . 1=n P = 1 a for 0 ~ a x 1 a~ =P Q (X; ) = P (Q (X; ) = 1 x of the form [^; a^] we need to choose a such To construct a 100p% con…dence interval for that p = P ~ x) = we need to determine the 1 a ~ 1 ! 1 1) P Q (X; ) 1 a Q (X; ) n . The 100p% con…dence interval for h i ^; (1 p) 1=n ^ is 1 a 218 6.7 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Approximate Con…dence Intervals A pivotal quantity does not always exist. For example there is no pivotal quantity for the Binomial or the Poisson distributions. In these cases we use an asymptotic pivotal quantity to construct approximate con…dence intervals. 6.7.1 De…nition - Asymptotic Pivotal Quantity Suppose Xn = (X1 ; X2 ; : : : ; Xn ) is a random variable whose distribution depends on . The random variable Q(Xn ; ) is called an asymptotic pivotal quantity if the limiting distribution of Q(Xn ; ) as n ! 1 does not depend on . In your previous statistics course, approximate con…dence intervals for the Binomial and Poisson distribution were justi…ed using a Central Limit Theorem argument. We are now able to clearly justify the asymptotic pivotal quantity using the theorems of Chapter 5. 6.7.2 Example Suppose Xn = (X1 ; X2 ; : : : ; Xn ) is a random sample from the Poisson( ) distribution. Show that p n Xn p Q(Xn ; ) = Xn is an asymptotic pivotal quantity. Show how Q(Xn ; ) can be used to construct an approximate 100p% equal tail con…dence interval for . Solution In Example 5.5.4, the Weak Law of Large Numbers, the Central Limit Theorem and Slutsky’s Theorem were all used to prove p n Xn p Q(Xn ; ) = !D Z N(0; 1) (6.14) Xn and thus Q(Xn ; ) is an asymptotic pivotal quantity. Let a be the value such that P (Z have p P = P a) = (1 + p) =2 where Z a Xn p n Xn p Xn r Xn a n ! N(0; 1). Then by (6.14) we a Xn r a and an approximate 100p% equal tail con…dence interval for " r r # xn xn xn a ; xn + a n n Xn n is ! 6.7. APPROXIMATE CONFIDENCE INTERVALS or 2 4^n since ^n = xn . 6.7.3 s a ^n n ; ^n + a s 219 ^n n 3 5 (6.15) Exercise Suppose Xn Binomial(n; ). Show that p n Q(Xn ; ) = q Xn n Xn n 1 Xn n is an asymptotic pivotal quantity. Show that an approximate 100p% equal tail con…dence interval for based on Q(Xn ; ) is given by s s 2 3 ^n (1 ^n ) ^n (1 ^n ) 4^n a 5 ; ^n + a (6.16) n n where ^n = 6.7.4 xn n . 
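A small simulation can be used to check how well the approximate interval (6.15) performs for finite $n$. The sketch below generates repeated Poisson($\theta$) samples, forms the interval $\hat{\theta}_n \pm a\sqrt{\hat{\theta}_n/n}$ with $a = 1.96$, and records the proportion of intervals that contain $\theta$; this proportion should be close to $0.95$. The values of $\theta$, $n$, the seed, and the number of simulated samples are arbitrary choices.

# coverage check for the approximate 95% confidence interval (6.15)
set.seed(789)          # arbitrary seed
theta<-3               # arbitrary true value of theta
n<-30                  # sample size
a<-1.96                # since P(Z <= 1.96) = 0.975 = (1+p)/2 for p = 0.95
nrep<-5000             # number of simulated samples
covered<-logical(nrep)
for (i in 1:nrep)
{x<-rpois(n,theta)
 thetahat<-mean(x)
 lower<-thetahat-a*sqrt(thetahat/n)
 upper<-thetahat+a*sqrt(thetahat/n)
 covered[i]<-(lower<=theta)&(theta<=upper)}
mean(covered)          # proportion of intervals containing theta; should be near 0.95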
Approximate Con…dence Intervals and the Limiting Distribution of the Maximum Likelihood Estimator The limiting distribution of the maximum likelihood estimator ~n = ~n (X1 ; X2 ; : : : ; Xn ) can also be used to construct approximate con…dence intervals. This is particularly useful in cases in which the maximum likelihood estimate cannot be found explicitly. Since h J(~n ) i1=2 (~n ) !D Z N(0; 1) h i1=2 then J(~n ) (~n ) is an asymptotic pivotal quantity. An approximate 100p% con…dence interval based on this asymptotic pivotal quantity is given by " # s s s 1 1 1 ^n a = ^n a ; ^n + a (6.17) J(^n ) J(^n ) J(^n ) where P (Z a) = 1+p 2 and Z N(0; 1). Similarly since [I(~n ; X)]1=2 (~n ) !D Z N(0; 1) then [I(~n ; X)]1=2 (~n ) is an asymptotic pivotal quantity. An approximate 100p% con…dence interval based on this asymptotic pivotal quantity is given by " # s s s 1 1 1 ^n a = ^n a ; ^n + a (6.18) I(^n ) I(^n ) I(^n ) 220 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER where I(^n ) is the observed information. Notes: (1) One drawback of these intervals is that we don’t know how large n needs to be to obtain a good approximation. (2) These approximate con…dence intervals are both symmetric about ^n which may not be a reasonable summary of the plausible values of likelihood intervals below. in light of the observed data. See (3) It is possible to obtain approximate con…dence intervals based on a given data set which contain values which are not valid for . For example, an interval may contain negative values although must be positive. 6.7.5 Example Use the results from Example 6.3.5 to determine the approximate 100p% con…dence intervals based on (6.17) and (6.18) in the case of Binomial data and Poisson data. Compare these intervals with the intervals in (6.16) and (6.15). Solution From Example 6.3.5F we have that for Binomial data I(^n ) = J(^n ) = n ^n (1 ^n ) so (6.17) and (6.18) both give the intervals 2 4^n s a ^n (1 ^n ) n s ; ^n + a ^n (1 n which is the same interval as in (6.16). From Example 6.3.5F we have that for Poisson data n I(^n ) = J(^n ) = ^n so (6.17) and (6.18) both give the intervals 2 4^n a which is the same interval as in (6.15). s ^n n s ; ^n + a ^n ) ^n n 3 5 3 5 6.7. APPROXIMATE CONFIDENCE INTERVALS 6.7.6 221 Example For Example 6.3.7 construct an approximate 95% con…dence interval based on (6.18). Compare this with the 15% likelihood interval determined in Example 6.4.4. Solution From Example 6.3.7 we have ^ = 0:4951605 and I(^) = 181:8069. Therefore an approximate 95% con…dence interval based on (6.18) is s 1 ^ 1:96 I(^) r 1 = 0:4951605 1:96 181:8069 = 0:4951605 0:145362 = [0:3498; 0:6405] From Example 6.4.4 the 15% likelihood interval is [0:3550; 0:6401] which is very similar. We expect this to happen since the relative likelihood function in Figure 6.3 is very symmetric. 6.7.7 Approximate Con…dence Intervals and Likelihood Intervals In your previous statistics course you learned that likelihood intervals are also approximate con…dence intervals. 6.7.8 Theorem If a is a value such othat p = 2P (Z a) 1 where Z N (0; 1), then the likelihood interval n 2 : R( ) e a =2 is an approximate 100p% con…dence interval. Proof By Theorem 6.5.1 2 log R( ; Xn ) = 2 log L( ; Xn ) !D W L(~n ; Xn ) 2 (1) (6.19) where Xn = (X1 ;oX2 ; : : : ; Xn ). The con…dence coe¢ cient corresponding to the interval n 2 : R( ) e a =2 is P L( ; Xn ) L(~n ; Xn ) e a2 =2 = P a2 P W a2 where W 2 = 2P (Z a) 1 where Z N (0; 1) = p as required. 2 log R( ; Xn ) (1) by 6.19 222 6.7.9 6. 
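The interval arithmetic in Example 6.7.6 and the connection in Theorem 6.7.8 are easy to reproduce in R. The sketch below uses the values $\hat{\theta} = 0.4951605$ and $I(\hat{\theta}) = 181.8069$ reported in Example 6.7.6; the last two lines reuse the function WBRLF and the data x from Example 6.4.4 to find the endpoints of the matching likelihood interval.

# approximate 95% confidence interval (6.18) for Example 6.7.6
thetahat<-0.4951605
Ithetahat<-181.8069
thetahat+c(-1,1)*1.96/sqrt(Ithetahat)     # approximately [0.3498, 0.6405]
#
# corresponding likelihood interval cutoff (Theorem 6.7.8)
p<-exp(-1.96^2/2)                         # approximately 0.1465
# endpoints of the likelihood interval, reusing WBRLF and x from Example 6.4.4
uniroot(function(th) WBRLF(th,thetahat,x)-p,lower=0.3,upper=0.45)$root
uniroot(function(th) WBRLF(th,thetahat,x)-p,lower=0.55,upper=0.7)$root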
MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Example Since 0:95 = 2P (Z 1:96) 1 where Z N (0; 1) and e (1:96)2 =2 therefore a 15% likelihood interval for . 6.7.10 1:9208 =e 0:1465 0:15 is also an approximate 95% con…dence interval for Exercise (a) Show that a 1% likelihood interval is an approximate 99:8% con…dence interval. (b) Show that a 50% likelihood interval is an approximate 76% con…dence interval. Note that while the con…dence intervals given by (6.17) or (6.18) are symmetric about the point estimate ^n , this is not true in general for likelihood intervals. 6.7.11 Example For Example 6.3.7 compare a 15% likelihood interval with the approximate 95% con…dence interval in Example 6.7.6. Solution From Example 6.3.7 the 15% likelihood interval is [0:3550; 0:6401] and from Example 6.7.6 the approximate 95% con…dence interval is [0:3498; 0:6405] These intervals are very close and agree to 2 decimal places. The reason for this is because the likelihood function (see Figure 6.3) is very symmetric about the maximum likelihood estimate. The approximate intervals (6.17) or (6.18) will be close to the corresponding likelihood interval whenever the likelihood function is reasonably symmetric about the maximum likelihood estimate. 6.7.12 Exercise Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Logistic( ; 1) distribution with probability density function f (x; ) = e (x 1+e (x ) ) 2 for x 2 <; 2< 6.7. APPROXIMATE CONFIDENCE INTERVALS 223 (a) Find the likelihood function, the score function, and the information function. How would you …nd the maximum likelihood estimate of ? (b) Show that if u is an observation from the Uniform(0; 1) distribution then x= log 1 u 1 is an observation from the Logistic( ; 1) distribution. (c) Use the following R code to randomly generate 30 observations from a Logistic( ; 1) distribution. # randomly generate 30 observations from a Logistic(theta,1) # using a random theta value between 2 and 3 set.seed(21086689) # set the seed so results can be reproduced truetheta<-runif(1,min=2,max=3) # data are sorted and rounded to two decimal places for easier display x<-sort(round((truetheta-log(1/runif(30)-1)),2)) x (d) Use R to plot the likelihood function for (e) Use Newton’s Method and R to …nd ^. based on these data. (f ) What are the values of S(^) and I(^)? (g) Use R to plot the relative likelihood function for based on these data. (h) Compare the 15% likelihood interval with the approximate 95% con…dence interval (6.18). 224 6.8 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER Chapter 6 Problems 1. Suppose X Binomial(n; ). Plot the log relative likelihood function for if x = 3 is observed for n = 100. On the same graph plot the log relative likelihood function for if x = 6 is observed for n = 200. Compare the graphs as well as the 10% likelihood interval and 50% likelihood interval for . 2. Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Discrete Uniform(1; ) distribution. Find the likelihood function, the maximum likelihood estimate of and the maximum likelihood estimator of . If n = 20 and x20 = 33, …nd a 15% likelihood interval for . 3. Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Geometric( ) distribution. (a) Find the score function and the maximum likelihood estimator of . (b) Find the observed information and the expected information. (c) Find the maximum likelihood estimator of E(Xi ). (d) If n = 20 and 20 P xi = 40 then …nd the maximum likelihood estimate of and a i=1 15% likelihood interval for . 
Is = 0:5 a plausible value of ? Why? 4. Suppose (X1 ; X2 ; X3 ) Multinomial(n; 2 ; 2 (1 ) ; (1 )2 ). Find the maximum likelihood estimator of , the observed information and the expected information. 5. Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Pareto(1; ) distribution. (a) Find the score function and the maximum likelihood estimator of . (b) Find the observed information and the expected information. (c) Find the maximum likelihood estimator of E(Xi ). (d) If n = 20 and 20 P log xi = 10 …nd the maximum likelihood estimate of and a i=1 15% likelihood interval for . Is = 0:1 a plausible value of ? Why? (e) Show that Q(X; ) = 2 n P log Xi i=1 is a pivotal quantity. (Hint: What is the distribution of log X if X Pareto(1; )?) Use this pivotal quantity to determine a 95% equal tail con…dence interval for the data in (d). Compare this interval with the 15% likelihood interval. 6.8. CHAPTER 6 PROBLEMS 225 6. The following model is proposed for the distribution of family size in a large population: k P (k children in family; ) = for k = 1; 2; : : : 1 2 1 P (0 children in family; ) = The parameter is unknown and 0 < < 12 . Fifty families were chosen at random from the population. The observed numbers of children are given in the following table: No. of children 0 1 2 3 4 Total Frequency observed 17 22 7 3 1 50 (a) Find the likelihood, log likelihood, score and information functions for . (b) Find the maximum likelihood estimate of and the observed information. (c) Find a 15% likelihood interval for . (d) A large study done 20 years earlier indicated that for these data? = 0:45. Is this value plausible (e) Calculate estimated expected frequencies. Does the model give a reasonable …t to the data? 7. Suppose x1 ; x2 ; : : : ; xn is a observed random sample from the Two Parameter Exponential( ; 1) distribution. Show that is a location parameter and ~ = X(1) is the maximum likelihood estimator of . Show that P (~ and thus show that and q) = 1 ^ + 1 log (1 n ^ + 1 log n 1 p 2 e nq for q 0 p) ; ^ 1 ; ^ + log n 1+p 2 are both 100p% con…dence intervals for . Which con…dence interval seems more reasonable? 8. Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Gamma (a) Find ~n , the maximum likelihood estimator of . (b) Justify the statement ~n !p . (c) Find the maximum likelihood estimator of V ar(Xi ). 1 1 2; distribution. 226 6. MAXIMUM LIKELIHOOD ESTIMATION - ONE PARAMETER (d) Use moment generating functions to show that Q = 2 n P Xi 2 (n). If n = 20 i=1 and 20 P xi = 6, use the pivotal quantity Q to construct an exact 95% equal tail i=1 con…dence interval for . Is = 0:7 a plausible value of ? h i1=2 !D Z N(0; 1). Use this asymptotic pivotal (e) Verify that (~n ) J(~n ) quantity to construct an approximate 95% con…dence interval for . Compare this interval with the exact con…dence interval from (d) and a 15% likelihood interval for . What do the approximate con…dence interval and the likelihood interval indicate about the plausibility of the value = 0:7? 9. The number of coliform bacteria X in a 10 cubic centimeter sample of water from a section of lake near a beach has a Poisson( ) distribution. (a) If a random sample of n specimen samples is taken and X1 ; X2 ; : : : ; Xn are the respective numbers of observed bacteria, …nd the likelihood function, the maximum likelihood estimator and the expected information for : 20 P (b) If n = 20 and xi = 40, obtain an approximate 95% con…dence interval for i=1 and a 15% likelihood interval for . Compare the intervals. 
(c) Suppose there is a fast, simple test which can detect whether there are bacteria present, but not the exact number. If Y is the number of samples out of n which have bacteria, show that P (Y = y) = n (1 y e )y (e )n y for y = 0; 1; : : : ; n (d) If n = 20 and we found y = 17 of the samples contained bacteria, use the likelihood function from part (c) to get an approximate 95% con…dence interval for . Hint: Let = 1 e and use the likelihood function for to get an approximate con…dence interval for and then transform this to an approximate con…dence interval for = log(1 ). 7. Maximum Likelihood Estimation - Multiparameter In this chapter we look at the method of maximum likelihood to obtain both point and interval estimates for the case in which the unknown parameter is a vector of unknown parameters = ( 1 ; 2 ; : : : ; k ). In your previous statistics course you would have seen the N ; 2 model with two unknown parameters = ; 2 and the simple linear regression model N + x; 2 with three unknown parameters = ; ; 2 . Although the case of k parameters is a natural extension of the one parameter case, the k parameter case is more challenging. For example, the maximum likelihood estimates are usually found be solving k nonlinear equations in k unknowns 1 ; 2 ; : : : ; k . In most cases there are no explicit solutions and the maximum likelihood estimates must be found using a numerical method such as Newton’s Method. Another challenging issue is how to summarize the uncertainty in the k estimates. For one parameter it is straightforward to summarize the uncertainty using a likelihood interval or a con…dence interval. For k parameters these intervals become regions in <k which are di¢ cult to visualize and interpret. In Section 7:1 we give all the de…nitions related to …nding the maximum likelihood estimates for k unknown parameters. These de…nitions are analogous to the de…nitions which were given in Chapter 6 for one unknown parameter. We also give the extension of the invariance property of maximum likelihood estimates and Newton’s Method for k variables. In Section 7:2 we de…ne likelihood regions and show how to …nd them for the case k = 2. In Section 7:3 we introduce the Multivariate Normal distribution which is the natural extension of the Bivariate Normal distribution discussed in Section 3.10. We also give the limiting distribution of the maximum likelihood estimator of which is a natural extension of Theorem 6.5.1. In Section 7:4 we show how to obtain approximate con…dence regions for based on the limiting distribution of the maximum likelihood estimator of and show how to …nd them for the case k = 2. We also show how to …nd approximate con…dence intervals for individual parameters and indicate that these intervals must be used with care. 227 228 7.1 7.1.1 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER Likelihood and Related Functions De…nition - Likelihood Function Suppose X is a (vector) random variable with probability (density) function f (x; ), where = ( 1 ; 2 ; : : : ; k ) 2 and is the parameter space or set of possible values of . Suppose also that x is an observed value of X. 
The likelihood function for based on the observed data x is L( ) = L( ; x) = f (x; ) for 2 (7.1) If X = (X1 ; X2 ; : : : ; Xn ) is a random sample from a distribution with probability function f (x; ) and x = (x1 ; x2 ; : : : ; xn ) are the observed data then the likelihood function for based on the observed data x1 ; x2 ; : : : ; xn is L( ) = L ( ; x) n Q = f (xi ; ) for i=1 2 Note: If X is a discrete random variable then L( ) = P (observing the data x; ). If X is a continuous random variable then an argument similar to the one in 6.2.6 can be made to justify the use of (7.1). 7.1.2 De…nition - Maximum Likelihood Estimate and Estimator The value of that maximizes the likelihood function L( ) is called the maximum likelihood estimate. The maximum likelihood estimate is a function of the observed data x and we write ^ = ^ (x). The corresponding maximum likelihood estimator which is a random vector is denoted by ~ = ~(X). As in the case of k = 1 it is frequently easier to work with the log likelihood function which is maximized at the same value of as the likelihood function. 7.1.3 De…nition - Log Likelihood Function The log likelihood function is de…ned as l( ) = l ( ; x) = log L( ) for 2 where x are the observed data and log is the natural logarithmic function. The maximum likelihood estimate of , ^ = (^1 ; ^2 ; : : : ; ^k ) is usually found by solving @l( ) @ j = 0; j = 1; 2; : : : ; k simultaneously. See Chapter 7, Problem 1 for an example in which the maximum likelihood estimate is not found in this way. 7.1. LIKELIHOOD AND RELATED FUNCTIONS 7.1.4 229 De…nition - Score Vector If = ( 1 ; as 2; : : : ; k ) then the score vector (function) is a 1 S( ) = S( ; x) = @l( ) @l( ) @l( ) ; ;:::; @ 1 @ 2 @ k k vector of functions de…ned for 2 where x are the observed data. We will see that, as in the case of one parameter, the information matrice will provides information about the variance of the maximum likelihood estimator. 7.1.5 De…nition - Information Matrix If = ( 1 ; 2 ; : : : ; k ) then the information matrix (function) I( ) = I ( ; x) is a k symmetric matrix of functions whose (i; j) entry is given by @ 2 l( ) @ i@ j for k 2 where x are the observed data. I(^) is called the observed information matrix. 7.1.6 De…nition - Expected Information Matrix If = ( 1 ; 2 ; : : : ; k ) then the expected information matrix (function) J( ) is a k symmetric matrix of functions whose (i; j) entry is given by E @ 2 l( ; X) @ i@ j for k 2 The invariance property of the maximum likelihood estimator also holds in the multiparameter case. 7.1.7 Theorem - Invariance of the Maximum Likelihood Estimate If ^ = (^1 ; ^2 ; : : : ; ^k ) is the maximum likelihood estimate of is the maximum likelihood estimate of g ( ). 7.1.8 = ( 1; 2; : : : ; k ) then g(^) Example Suppose x1 ; x2 ; : : : ; xn is a observed random sample from the N( ; 2 ) distribution. Find the score vector, the information matrix, the expected information matrix and the maximum likelihood estimator of = ; 2 . Find the observed information matrix I ^ ; ^ 2 and thus verify that ^ ; ^ 2 is the maximum likelihood estimator of ; 2 . What is the maximum likelihood estimator of the parameter = ; 2 = = which is called the coe¢ cient of variation? 230 7. 
MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER Solution The likelihood function is L ; 2 n Q = i=1 p = (2 ) 1 2 1 exp n=2 n=2 2 )2 (xi 2 2 exp n 1 P 2 2 or more simply L ; 2 = 2 n=2 exp n 1 P 2 2 The log likelihood function is l ; 2 = = = n log 2 n log 2 n log 2 2 2 1 2 1 2 2 2 n 1 P 2 1 2 1 n 2 2 (xi )2 for 2 <, 1 2 <, 2 + n P x)2 + n (x (xi h (n n 1 i=1 2 2 h i for x)2 (xi 2 )=n 1 2 )2 1) s2 + n (x n P 1 @l = 0; @ (n 1 (x ) )2 1) s2 + n (x @l =0 @ 2 are solved simultaneously for 2 Since @2l @ 2 @2l @( 2 )2 = = >0 >0 )2 The equations = x and 2 )2 i=1 @l n = 2 (x @ @l = @ 2 for i=1 (xi 2 s2 = and )2 i=1 where Now (xi i=1 = n 1 P (xi n i=1 x)2 = (n 1) n @2l n (x ) = 2 4 @ @ n 1 1 h (n 1) s2 + n (x + 6 2 4 s2 n ; 2 )2 i i 2 <, 2 >0 7.1. LIKELIHOOD AND RELATED FUNCTIONS 231 the information matrix is I ; 2 Since 2 =4 n(x n 2 n(x ) n 1 2 4 4 + h (n 1 6 ) 4 )2 1) s2 + n (x 3 i 5 n n2 2 > 0 and det I ^ ; ^ = >0 ^2 2^ 6 then by the second derivative test the maximum likelihood estimates of of I11 ^ ; ^ 2 = n 1 P (xi n i=1 ^ = x and ^ 2 = x)2 = (n 1) n and s2 and the maximum likelihood estimators are n 1 P Xi n i=1 ~ = X and ~ 2 = The observed information is 2 I ^; ^2 = 4 Now E n n = 2 2 ; E 2 X n ^2 0 0 n 2^ 4 " (n = 1) n S2 3 5 # n X 4 =0 Also n 1 1 h + (n 1) S 2 + n X 6 2 4 h n 1 1 n 2 + (n 1) E(S ) + nE X 6 2 4 1 n 1 + 6 (n 1) 2 + 2 2 4 n 2 4 E = = = since E X = 0; E h 2 X i 2 = V ar X = Therefore the expected information matrix is " J 2 ; = n n 2 4 0 and the inverse of the expected information matrix is 2 2 J ; 2 1 =4 # 0 2 n n 0 0 2 4 n 3 5 2 i 2 io and E S 2 = 2 2 are 232 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER Note that 2 V ar X = V ar ^ 2 = V ar n n 1 P Xi n i=1 X 2 = 2(n 1) 2 4 n 4 n2 and n P 1 Cov(X; Xi n i=1 Cov(X; ^ 2 ) = since X and n P Xi X 2 2 X )=0 are independent random variables. i=1 By the invariance property of maximum likelihood estimators the maximum likelihood estimator of = = is ^ = ^ =^ . Recall from your previous statistics course that inferences for the independent pivotal quantities X p S= n t (n 1) and See Figure 7.1 for a graph of R ; (n 1)S 2 2 2 (n and 2 are made using 1) for n = 50; ^ = 5 and ^ 2 = 4. 2 1 0.8 0.6 0.4 0.2 0 8 6 6 5.5 4 5 2 σ 2 4.5 0 4 µ Figure 7.1: Normal Relative Likelihood Function for n = 50; ^ = 5 and ^ 2 = 4 7.1. LIKELIHOOD AND RELATED FUNCTIONS 7.1.9 233 Exercise Suppose Yi N( + xi ; 2 ), i = 1; 2; : : : ; n independently where the xi are known constants. Show that the maximum likelihood estimators of , and 2 are given by ^x ~ = Y n P (xi x) Yi Y ~ = i=1 n P (xi x)2 i=1 ~ 2 n 1 P (Yi n i=1 = ~ xi )2 ~ Note: ~ and ~ are also the least squares estimators of 7.1.10 and . Example Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Beta(a; b) distribution with probability density function (a + b) a x (a) (b) f (x; a; b) = 1 x)b (1 1 for 0 < x < 1, a > 0; b > 0 Find the likelihood function, the score vector, and the information matrix and the expected information matrix. How would you …nd the maximum likelihood estimates of a and b? Solution The likelihood function is L(a; b) = n Q (a + b) a x (a) (b) i i=1 = (a + b) (a) (b) n (a + b) (a) (b) The log likelihood function is l(a; b) = n [log (a + b) where t1 = n n Q n Q xi )b (1 a 1 xi i=1 or more simply L(a; b) = 1 a xi i=1 log (a) 1 n Q for a > 0; b > 0 b 1 (1 xi ) i=1 n Q b (1 xi ) for a > 0; b > 0 i=1 log (b) + at1 + bt2 ] n n 1 P 1 P log xi and t2 = log(1 n i=1 n i=1 for a > 0; b > 0 xi ) 234 7. 
Let
$$\psi(z) = \frac{d}{dz} \log \Gamma(z) = \frac{\Gamma'(z)}{\Gamma(z)}$$
which is called the digamma function. The score vector is
$$S(a, b) = \left[ \frac{\partial l}{\partial a} \;\; \frac{\partial l}{\partial b} \right] = n \left[ \psi(a+b) - \psi(a) + t_1 \;\; \psi(a+b) - \psi(b) + t_2 \right]$$
for $a > 0$, $b > 0$. $S(a, b) = (0, 0)$ must be solved numerically to find the maximum likelihood estimates of $a$ and $b$.

Let
$$\psi'(z) = \frac{d}{dz} \psi(z)$$
which is called the trigamma function. The information matrix is
$$I(a, b) = n \begin{bmatrix} \psi'(a) - \psi'(a+b) & -\psi'(a+b) \\ -\psi'(a+b) & \psi'(b) - \psi'(a+b) \end{bmatrix} \qquad \text{for } a > 0,\ b > 0$$
which is also the expected information matrix.

7.1.11 Exercise

Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Gamma($\alpha$, $\beta$) distribution. Find the likelihood function, the score vector, the information matrix, and the expected information matrix. How would you find the maximum likelihood estimates of $\alpha$ and $\beta$?

Often $S(\theta_1, \theta_2, \ldots, \theta_k) = (0, 0, \ldots, 0)$ must be solved numerically using a method such as Newton's Method.

7.1.12 Newton's Method

Let $\theta^{(0)}$ be an initial estimate of $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. The estimate $\theta^{(i)}$ can be updated using
$$\theta^{(i+1)} = \theta^{(i)} + S(\theta^{(i)}) \left[ I(\theta^{(i)}) \right]^{-1} \qquad \text{for } i = 0, 1, \ldots$$

Note: The initial estimate $\theta^{(0)}$ may be determined by calculating $L(\theta)$ for a grid of values to determine the region in which $L(\theta)$ attains its maximum.

7.1.13 Example

Use the following R code to randomly generate 35 observations from a Beta($a$, $b$) distribution.

# randomly generate 35 observations from a Beta(a,b)
set.seed(32086689)    # set the seed so results can be reproduced
# use randomly generated a and b values
truea<-runif(1,min=2,max=3)
trueb<-runif(1,min=1,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(rbeta(35,truea,trueb),2))
x

Use Newton's Method and R to find $(\hat{a}, \hat{b})$. What are the values of $S(\hat{a}, \hat{b})$ and $I(\hat{a}, \hat{b})$?

Solution
The generated data are

0.08 0.19 0.21 0.25 0.28 0.29 0.29 0.30 0.30 0.32
0.34 0.36 0.39 0.45 0.45 0.47 0.48 0.49 0.54 0.54
0.55 0.55 0.56 0.56 0.61 0.63 0.64 0.65 0.69 0.69
0.73 0.77 0.79 0.81 0.85

The maximum likelihood estimates of $a$ and $b$ can be found using Newton's Method given by
$$\left[ a^{(i+1)} \;\; b^{(i+1)} \right] = \left[ a^{(i)} \;\; b^{(i)} \right] + S(a^{(i)}, b^{(i)}) \left[ I(a^{(i)}, b^{(i)}) \right]^{-1}$$
for $i = 0, 1, \ldots$ until convergence. Here is R code for Newton's Method for the Beta Example.

# function for calculating Beta score for a and b and data x
BESF<-function(a,b,x)
{S<-length(x)*c(digamma(a+b)-digamma(a)+mean(log(x)),
 digamma(a+b)-digamma(b)+mean(log(1-x)))
return(S)}
#
# function for calculating Beta information for a and b
# (depends on the data only through the sample size length(x), taken from the workspace)
BEIF<-function(a,b)
{I<-length(x)*cbind(c(trigamma(a)-trigamma(a+b),-trigamma(a+b)),
 c(-trigamma(a+b),trigamma(b)-trigamma(a+b)))
return(I)}
#
# Newton's Method for Beta Example
NewtonBE<-function(a,b,x)
{thold<-c(a,b)
thnew<-thold+0.1
while (sum(abs(thold-thnew))>0.0000001)
{thold<-thnew
thnew<-thold+BESF(thold[1],thold[2],x)%*%solve(BEIF(thold[1],thold[2]))
print(thnew)}
return(thnew)}
thetahat<-NewtonBE(2,2,x)

The maximum likelihood estimates are $\hat{a} = 2.824775$ and $\hat{b} = 2.97317$. The score vector evaluated at $(\hat{a}, \hat{b})$ is
$$S(\hat{a}, \hat{b}) = \left[ 3.108624 \times 10^{-14} \;\; 7.771561 \times 10^{-15} \right]$$
which is numerically zero, indicating that $(\hat{a}, \hat{b})$ is a critical point of the log likelihood function. The observed information matrix is
$$I(\hat{a}, \hat{b}) = \begin{bmatrix} 8.249382 & -6.586959 \\ -6.586959 & 7.381967 \end{bmatrix}$$
Note that since $\det[I(\hat{a}, \hat{b})] = (8.249382)(7.381967) - (6.586959)^{2} > 0$ and $[I(\hat{a}, \hat{b})]_{11} = 8.249382 > 0$, then by the second derivative test we have found the maximum likelihood estimates. (A numerical cross-check of these estimates using R's optim function is sketched immediately below.)
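As a quick numerical cross-check of the Newton results above, the same estimates can be obtained by minimizing the negative log likelihood with R's general-purpose optimizer optim, using dbeta to evaluate the Beta log density. The sketch below assumes the data vector x generated in Example 7.1.13 is still in the workspace; the function name negll and the starting value are arbitrary choices.

# cross-check of Example 7.1.13: minimize the negative log likelihood with optim
negll<-function(theta,x)
{if (any(theta<=0)) return(Inf)           # keep the search inside a > 0, b > 0
 -sum(dbeta(x,theta[1],theta[2],log=TRUE))}
fit<-optim(c(2,2),negll,x=x,hessian=TRUE)
fit$par        # should be very close to (2.824775, 2.97317)
fit$hessian    # numerical approximation to the observed information I(ahat,bhat)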
7.1.14 Exercise Use the following R code to randomly generate 30 observations from a Gamma( ; ) distribution # randomly generate 35 observations from a Gamma(a,b) set.seed(32067489) # set the seed so results can be reproduced # use randomly generated a and b values truea<-runif(1,min=1,max=3) trueb<-runif(1,min=3,max=5) # data are sorted and rounded to two decimal places for easier display x<-sort(round(rgamma(30,truea,scale=trueb),2)) x Use Newton’s Method and R to …nd (^ ; ^ ). What are the values of S(^ ; ^ ) and I(^ ; ^ )? 7.2. LIKELIHOOD REGIONS 7.2 237 Likelihood Regions For one unknown parameter likelihood intervals provide a way of summarizing the uncertainty in the maximum likelihood estimate by providing an intervals of values which are plausible given the observed data. For k unknown parameter summarizing the uncertainty is more challenging. We begin with the de…nition of a likelihood region which is the natural extension of a likelihood interval. 7.2.1 De…nition - Likelihood Regions The set of values for which R( ) p is called a 100p% likelihood region for . A 100p% likelihood region for two unknown parameters = ( 1 ; 2 ) is given by f( 1 ; 2 ) ; R ( 1 ; 2 ) pg. These regions will be elliptical in shape. To show this we note that for ( 1 ; 2 ) su¢ ciently close to (^1 ; ^2 ) we have " # " # h i ^1 ^1 1 1 1 ^1 ^2 L ( 1; 2) L(^1 ; ^2 ) + (^1 ; ^2 ) + I(^1 ; ^2 ) 1 2 ^2 ^2 2 2 2 " # h i ^ 1 ^ 1 1 ^2 I(^1 ; ^2 ) = L(^1 ; ^2 ) + since S(^1 ; ^2 ) = 0 1 1 2 ^2 2 2 Therefore R ( 1; 2) = L ( 1; 2) L(^1 ; ^2 ) 1 = 1 h 1 ^1 2L(^1 ; ^2 ) h i 1h 2L(^1 ; ^2 ) ( The set of points ( 1 ; ( 1 ; 2 ) which satisfy ( 1 2) ^1 )2 I^11 + 2( 1 1 ^2 ^1 )( 2 " ^1 )2 I^11 + 2 which satisfy R ( 1 ; 1 2 i 2) ^2 )I^12 + ( I^11 I^12 I^12 I^22 1 #" ^1 ( 2 ^1 1 ^2 2 # ^2 )I^12 + ( 2 ^2 )2 I^22 i = p is approximately the set of points 2 ^2 )2 I^22 = 2 (1 p) L(^1 ; ^2 ) which we recognize as the points on an ellipse centred at (^1 ; ^2 ). Therefore a 100p% likelihood region for two unknown parameters = ( 1 ; 2 ) will be the set of points on and inside a region which will be approximately elliptical in shape. A similar argument can be made to show that the likelihood regions for three unknown parameters = ( 1 ; 2 ; 3 ) will be approximate ellipsoids in <3 . 238 7.2.2 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER Example (a) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for the parameters (a; b) in Example 7.1.13. Comment on the shapes of the regions. (b) Is the value (2:5; 3:5) a plausible value of (a; b)? Solution (a) The following R code generates the required likelihood regions. # function for calculating Beta relative likelihood function # for parameters a,b and data x BERLF<-function(a,b,that,x) {t1<-prod(x) t2<-prod(1-x) n<-length(x) ah<-that[1] bh<-that[2] L<-<-((gamma(a+b)*gamma(ah)*gamma(bh))/ (gamma(a)*gamma(b)*gamma(ah+bh)))^n*t1^(a-ah)*t2^(b-bh) return(L)} # a<-seq(0.5,5.5,0.01) b<-seq(0.5,6,0.01) R<-outer(a,b,FUN = BERLF,thetahat,x) contour(a,b,R,levels=c(0.01,0.05,0.10,0.50,0.9),xlab="a",ylab="b",lwd=2) The 1%, 5%, 10%, 50%, and 90% likelihood regions for (a; b) are shown in Figure 7.2. The likelihood contours are approximate ellipses but they are not symmetric about the maximum likelihood estimates (^ a; ^b) = (2:824775; 2:97317). The likelihood regions are more stretched for larger values of a and b. The ellipses are also skewed relative to the ab coordinate axes. The skewness of the likelihood contours relative to the ab coordinate axes is determined by the value of I^12 . 
If the value of I^12 is close to zero the skewness will be small. (b) Since R (2:5; 3:5) = 0:082 the point (2:5; 3:5) lies outside a 10% likelihood region so it is not a very plausible value of (a; b). Note however that a = 2:5 is a plausible value of a for some values of b, for example, a = 2:5, b = 2:5 lies inside a 50% likelihood region so (2:5; 2:5) is a plausible value of (a; b). We see that when there is more than one parameter then we need to determine whether a set of values are jointly plausible given the observed data. 239 1 2 3 b 4 5 6 7.2. LIKELIHOOD REGIONS 1 2 3 4 5 a Figure 7.2: Likelihood regions for Beta example 7.2.3 Exercise (a) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for the parameters ( ; ) in Exercise 7.1.14. Comment on the shapes of the regions. (b) Is the value (3; 2:7) a plausible value of ( ; )? (c) Use the R code in Exercise 7.1.14 to generate 100 observations from the Gamma( ; ) distribution. (d) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ) for the data generated in (c). Comment on the shapes of these regions as compared to the regions in (a). 240 7.3 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER Limiting Distribution of Maximum Likelihood Estimator To discuss the asymptotic properties of the maximum likelihood estimator in the multiparameter case we need to de…ne convergence in probability and convergence in distribution of a sequence of random vectors. 7.3.1 De…nition - Convergence of a Sequence of Random Vectors Let X1 ; X2 ; : : : ; Xn ; : : : be a sequence of random vectors where Xn = (X1n ; X2n ; : : : ; Xkn ). Let X = (X1 ; X2 ; : : : ; Xk ) be a random vector and x = (x1 ; x2 ; : : : ; xk ). (1) If Xin !p Xi for i = 1; 2; : : : ; k, then Xn !p X (2) Let Fn (x) = P (X1n x1 ; X2n x2 ; : : : ; Xkn xk ) be the cumulative distribution function of Xn . Let F (x) = P (X1 x1 ; X2 x2 ; : : : ; Xk xk ) be the cumulative distribution function of X. If lim Fn (x) = F (x) n!1 at all points of continuity of F (x) then Xn !D X = (X1 ; X2 ; : : : ; Xk ) To discuss the asymptotic properties of the maximum likelihood estimator in the multiparameter case we also need the de…nition and properties of the Multivariate Normal Distribution. The Multivariate Normal distribution is the natural extension of the Bivariate Normal distribution which was discussed in Section 3.10. 7.3.2 De…nition - Multivariate Normal Distribution (MVN) Let X = (X1 ; X2 ; : : : ; Xk ) be a 1 k random vector with E(Xi ) = i and Cov(Xi ; Xj ) = ij ; i; j = 1; 2; : : : ; k. (Note: Cov(Xi ; Xi ) = ii = V ar(Xi ) = 2i .) Let = ( 1; 2; : : : ; k ) be the mean vector and be the k k symmetric covariance matrix whose (i; j) entry is 1 , exists. If the joint probability density ij . Suppose also that the inverse matrix of , function of (X1 ; X2 ; : : : ; Xk ) is given by f (x1 ; x2 ; : : : ; xk ) = 1 (2 )k=2 j j1=2 exp 1 (x 2 ) 1 (x )T for x 2 <k where x = (x1 ; x2 ; : : : ; xk ) then X is said to have a Multivariate Normal distribution. We write X MVN( ; ). The following theorem gives some important properties of the Multivariate Normal distribution. These properties are a natural extension of the properties of the Bivariate Normal distribution found in Theorem 3.10.2. 7.3. LIMITING DISTRIBUTION OF MAXIMUM LIKELIHOOD ESTIMATOR 7.3.3 241 Theorem - Properties of MVN Distribution Suppose X = (X1 ; X2 ; : : : ; Xk ) MVN( ; ). 
Then (1) X has joint moment generating function 1 tT + t tT 2 M (t) = exp for t = (t1 ; t2 ; : : : ; tk ) 2 <k (2) Any subset of X1 ; X2 ; : : : ; Xk also has a MVN distribution and in particular Xi N i ; 2i ; i = 1; 2; : : : ; k. (3) (X 1 ) )T (X 2 (k) (4) Let c = (c1 ; c2 ; : : : ; ck ) be a nonzero vector of constants then XcT = k P ci Xi cT ; c cT N i=1 (5) Let A be a k p vector of constants of rank p k then N( A; AT A) XA (6) The conditional distribution of any subset of (X1 ; X2 ; : : : ; Xk ) given the rest of the coordinates is a MVN distribution. In particular the conditional probability density function of Xi given Xj = xj ; i 6= j; is Xi jXj = xj N( i + ij i (xj j )= j ; (1 2 2 ij ) i ) The following theorem gives the asymptotic distribution of the maximum likelihood estimator in the multiparameter case. This theorem looks very similar to Theorem 6.5.1 with the scalar quantities replaced by the appropriate vectors and matrices. 7.3.4 Theorem - Limiting Distribution of the Maximum Likelihood Estimator Suppose Xn = (X1 ; X2 ; : : : ; Xn ) is a random sample from f (x; ) where = ( 1 ; 2 ; : : : ; k ) 2 and the dimension of is k. Let ~n = ~n (X1 ; X2 ; : : : ; Xn ) be the maximum likelihood estimator of based on Xn . Let 0k be a 1 k vector of zeros, Ik be the k k identity matrix, and [J( )]1=2 be a matrix such that [J( )]1=2 [J( )]1=2 = J( ). Then under certain (regularity) conditions ~n !p (~n ) [J( )]1=2 !D Z 2 log R( ; Xn ) = 2[l(~n ; Xn ) (7.2) MVN(0k ; Ik ) l( ; Xn )] !D W (7.3) 2 (k) (7.4) 242 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER for each 2 . Since ~n !p , ~n is a consistent estimator of . Theorem 7.3.4 implies that for su¢ ciently large n, ~n has an approximately MVN( ; [J( 0 )] 1 ) distribution. Therefore for su¢ ciently large n E(~n ) and therefore ~n is an asymptotically unbiased estimator of . Also V ar(~n ) 1 [J( )] where [J( )] 1 is the inverse matrix of of the matrix J( ). (Since J( ) is a k k symmetric matrix, [J( )] 1 also a k k symmetric matrix.) [J( )] 1 is called the asymptotic variance/covariance matrix of ~n . Of course J( ) is unknown because is unknown. But (7.2), (7.3) and Slutsky’s Theorem imply that (~n )[J(~n )]1=2 !D Z MVN(0k ; Ik ) Therefore for su¢ ciently large n we have V ar(~n ) h i where J(^n ) 1 h i J(^n ) 1 is the inverse matrix of J(^n ). It is also possible to show that (~n )[I(~n ; X)]1=2 !D MVN(0k ; Ik ) so that for su¢ ciently large n we also have V ar(~n ) h i where I(^n ) 1 h I(^n ) i 1 is the inverse matrix of the observed information matrix I(^n ). These results can be used to construct approximate con…dence regions for in the next section. as shown Note: These results do not hold if the support set of X depends on . 7.4 7.4.1 Approximate Con…dence Regions De…nition - Con…dence Region A 100p% con…dence region for satis…es = ( 1; 2; : : : ; k ) based on X is a region R(X) <k which P [ 2 R(X)] = p Exact con…dence regions can only be obtained in a very few special cases such as Normal linear models. More generally we must rely on approximate con…dence regions based on the results of Theorem 7.3.3. 7.4. APPROXIMATE CONFIDENCE REGIONS 7.4.2 243 Asymptotic Pivotal Quantities and Approximate Con…dence Regions The limiting distribution of ~n can be used to obtain approximate con…dence regions for . 
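The key step is property (3) of Theorem 7.3.3, which converts a MVN limit into a chi-squared pivotal quantity. The following simulation sketch illustrates that property; the mean vector and covariance matrix used here are arbitrary values chosen only for the illustration.
# illustrate Theorem 7.3.3(3) by simulation: if X ~ MVN(mu,Sigma) then
# (X-mu) Sigma^(-1) (X-mu)^T has a chi-squared(k) distribution
# (sketch only; mu and Sigma below are arbitrary illustrative values)
library(MASS) # for mvrnorm
set.seed(12345)
mu<-c(1,2)
Sigma<-matrix(c(2,1,1,3),nrow=2)
X<-mvrnorm(10000,mu,Sigma)
D<-sweep(X,2,mu) # subtract the mean vector from each row
Q<-rowSums((D%*%solve(Sigma))*D) # quadratic form for each simulated vector
mean(Q<=qchisq(0.95,df=2)) # should be close to 0.95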
Since (~n )[J( )]1=2 !D Z MVN(0k ; Ik ) it follows from Theorem 7.3.3(3) and Limit Theorems that (~n )J(~n )(~n )T !D W 2 (k) An approximate 100p% con…dence region for based on the asymptotic pivotal quantity (~n )J(~n )(~n )T is the set of all vectors in the set f : (^n where c is the value such that P (W )J(^n )(^n )T cg 2 (k). c) = p and W Similarly since (~n )[I(~n ; Xn )]1=2 !D Z MVN(0k ; Ik ) it follows from Theorem 7.3.3(3) and Limit Theorems that (~n )I(~n ; Xn )(~n )T !D W 2 (k) An approximate 100p% con…dence region for based on the asymptotic pivotal quantity (~n )I(~n ; Xn )(~n )T is the set of all vectors in the set f : (^n )I(^n )(^n )T cg where I(^n ) is the observed information. Finally since 2 2 log R( ; Xn ) !D W an approximate 100p% con…dence region for the set of all vectors satisfying f : (k) based on this asymptotic pivotal quantity is 2 log R( ; x) cg where x = (x1 ; x2 ; : : : ; xn ) are the observed data. Since f : 2 log R( ; x) cg n o = : R( ; x) e c=2 we recognize that this interval is actually a likelihood region. 244 7.4.3 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER Example Use R and the results from Examples 7.1.10 and 7.1.13 to graph approximate 90%, 95%, and 99% con…dence regions for (a; b). Compare these approximate con…dence regions with the likelihood regions in Example 7.2.2. Solution From Example 7.1.10 that for a random sample from the Beta(a; b) distribution the information matrix and the expected information matrix are given by " # 0 (a) 0 (a + b) 0 (a + b) I(a; b) = n = J(a; b) 0 (a + b) 0 (b) 0 (a + b) Since h a ~ a ~b b i J(~ a; ~b) " a ~ ~b a b # 2 !D W an approximate 100p% con…dence region for (a; b) is given by " # h i a ^ a ^ f(a; b) : a a; b) ^ ^ a ^b b J(^ b b where P (W using c) = p. Since 2 (2) (2) cg = Gamma(1; 2) = Exponential(2), c can be determined p = P (W c) = Zc 1 e 2 x=2 dx = 1 e c=2 0 which gives c= For p = 0:95, c = 2 log(1 p) 2 log(0:05) = 5:99; an approximate 95% con…dence region is given by " # h i a ^ a f(a; b) : a 5:99g a; ^b) ^ ^ a ^b b J(^ b b If we let J(^ a; ^b) = " J^11 J^12 J^12 J^22 # then the approximate con…dence region can be written as f(a; b) : (^ a a)2 J^11 + 2 (^ a a) (^b b)J^12 + (^b b)2 J^22 5:99g We note that the approximate con…dence region is the set of points on and inside the ellipse (^ a a)2 J^11 + 2 (^ a a) (^b b)J^12 + (^b b)2 J^22 = 5:99 7.4. APPROXIMATE CONFIDENCE REGIONS 245 which is centred at (^ a; ^b). For the data in Example 7.1.13, a ^ = 2:824775, ^b = 2:97317 and " # 8:249382 6:586959 I(^ a; ^b) = J(^ a; ^b) = 6:586959 7:381967 Approximate 90% ( 2 log(0:1) = 4:61), 95% ( 2 log(0:05) = 5:99), and 99% ( 2 log(0:01) = 9:21) con…dence regions are shown in Figure 7.3. The following R code generates the required approximate con…dence regions. 1 2 3 b 4 5 6 # function for calculating values for determining confidence regions ConfRegion<-function(a,b,th,info) {c<-(th[1]-a)^2*info[1,1]+2*(th[1]-a)* (th[2]-b)*info[1,2]+(th[2]-b)^2*info[2,2] return(c)} # # graph approximate confidence regions a<-seq(1,5.5,0.01) b<-seq(1,6,0.01) c<-outer(a,b,FUN = ConfRegion,thetahat,Ithetahat) contour(a,b,c,levels=c(4.61,5.99,9.21),xlab="a",ylab="b",lwd=2) # 1 2 3 4 5 a Figure 7.3: Approximate con…dence regions for Beta(a; b) example 246 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER A 10% likelihood region for (a; b) is given by f(a; b) : R(a; b; x) 2 2 log R(a; b; Xn ) !D W 0:1g. 
Since (2) = Exponential (2) we have P [R(a; b; X) 0:1] = P [ 2 log R(a; b; X) P (W 2 log (0:1)] 2 log (0:1)) [ 2 log(0:1)]=2 = 1 e = 1 0:1 = 0:9 and therefore a 10% likelihood region corresponds to an approximate 90% con…dence region. Similarly 1% and 5% likelihood regions correspond to approximate 99% and 95% con…dence regions respectively. If we compare the likelihood regions in Figure 7.2 with the approximate con…dence regions shown in Figure 7.3 we notice that the con…dence regions are exact ellipses centred at the maximum likelihood estimates whereas the likelihood regions are only approximate ellipses not centered at the maximum likelihood estimates. We notice that there are values inside an approximate 99% con…dence region but which are outside a 1% likelihood region. The point (a; b) = (1; 1:5) is an example. There were only 35 observations in this data set. The di¤erences between the likelihood regions and the approximate con…dence regions indicate that the Normal approximation might not be good. In this example the likelihood regions provide a better summary of the uncertainty in the estimates. 7.4.4 Exercise Use R and the results from Exercises 7.1.11 and 7.1.14 to graph approximate 90%, 95%, and 99% con…dence regions for (a; b). Compare these approximate con…dence regions with the likelihood regions in Exercise 7.2.3. Since likelihood regions and approximate con…dence regions cannot be graphed or easily interpreted for more than two parameters, we often construct approximate con…dence intervals for individual parameters. Such con…dence intervals are often referred to as marginal con…dence intervals. These con…dence intervals must be used with care as we will see in Example 7.4.6. Approximate con…dence intervals can also be constructed for a linear combination of parameters. An illustration is given in Example 7.4.6. 7.4. APPROXIMATE CONFIDENCE REGIONS 7.4.5 Let i 247 Approximate Marginal Con…dence Intervals be the ith entry in the vector (~n = ( 1; 2 ; : : : ; k ). )[J( )]1=2 !D Z Since MVN(0k ; Ik ) it follows that an approximate 100p% marginal con…dence interval for p p [^i a v^ii ; ^i + a v^ii ] i is given by where ^i is the ith entry in the vector ^n , v^ii is the (i; i) entry of the matrix [J(^n )] a is the value such that P (Z a) = 1+p N(0; 1). 2 where Z 1, and Similarly since (~n )[I(~n ; Xn )]1=2 !D Z MVN(0k ; Ik ) it follows that an approximate 100p% con…dence interval for p p [^i a v^ii ; ^i + a v^ii ] where v^ii is now the (i; i) entry of the matrix [I(^n )] 7.4.6 i is given by 1. Example Using the results from Examples 7.1.10 and 7.1.13 determine approximate 95% marginal con…dence intervals for a, b, and an approximate con…dence interval for a + b. Solution Let Since h i J(^ a; ^b) h a ~ a ~b b i 1 = " [J(~ a; ~b)]1=2 !D Z v^11 v^12 v^12 v^22 # BVN h 0 0 then for large n, V ar(~ a) v^11 , V ar(~b) v^22 and Cov(~ a; ~b) mate 95% con…dence interval for a is given by p p ^ + 1:96 v^11 ] [^ a 1:96 v^11 ; a and an approximate 95% con…dence interval for b is given by p p [^b 1:96 v^22 ; ^b + 1:96 v^22 ] i ; " 1 0 0 1 #! v^12 . Therefore an approxi- 248 7. 
MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER For the data in Example 7.1.13, a ^ = 2:824775, ^b = 2:97317 and h I(^ a; ^b) i 1 i 1 J(^ a; ^b) " # 8:249382 6:586959 = 6:586959 7:381967 " # 0:4216186 0:3762120 = 0:3762120 0:4711608 = h 1 An approximate 95% marginal con…dence interval for a is p p [2:824775 + 1:96 0:4216186; 2:824775 1:96 0:4216186] = [1:998403; 3:651148] and an approximate 95% con…dence interval for b is p p [2:97317 1:96 0:4711608; 2:97317 + 1:96 0:4711608] = [2:049695; 3:896645] Note that a = 2:1 is in the approximate 95% marginal con…dence interval for a and b = 3:8 is in the approximate 95% marginal con…dence interval for b and yet the point (2:1; 3:8) is not in the approximate 95% joint con…dence region for (a; b). Clearly these marginal con…dence intervals for a and b must be used with care. To obtain an approximate 95% marginal con…dence interval for a + b we note that V ar(~ a + ~b) = V ar(~ a) + V ar(~b) + 2Cov(~ a; ~b) v^11 + v^22 + 2^ v12 = v^ so that an approximate 95% con…dence interval for a + b is given by p p [^ a + ^b 1:96 v^; a ^ + ^b + 1:96 v^] For the data in Example 7.1.13 a ^ + ^b = 2:824775 + 2:824775 = 5:797945 v^ = v^11 + v^22 + 2^ v12 = 0:4216186 + 0:4711608 + 2(0:3762120) = 1:645203 and an approximate 95% marginal con…dence interval for a + b is p p [5:797945 + 1:96 1:645203; 5:797945 1:96 1:645203] = [2:573347; 9:022544] 7.4.7 Exercise Using the results from Exercises 7.1.11 and 7.1.14 determine approximate 95% marginal con…dence intervals for a, b, and an approximate con…dence interval for a + b. 7.5. CHAPTER 7 PROBLEMS 7.5 249 Chapter 7 Problems 1. Suppose x1 ; x2 ; : : : ; xn is a observed random sample from the distribution with cumulative distribution function F (x; 1; 2) =1 1 2 for x x Find the maximum likelihood estimators of 1 1; and 1 > 0; 2 >0 2. 2. Suppose (X1 ; X2 ; X3 ) Multinomial(n; 1 ; 2 ; 3 ). Verify that the maximum likelihood estimators of 1 and 2 are ~1 = X1 =n and ~2 = X2 =n. Find the expected information for 1 and 2 . 3. Suppose x11 ; x12 ; : : : ; x1n1 is an observed random sample from the N( 1 ; 2 ) distribution and independently x21 ; x22 ; : : : ; x2n2 is an observed random sample from the N( 2 ; 2 ) distribution. Find the maximum likelihood estimators of 1 ; 2 ; and 2 . 4. In a large population of males ages 40 50, the proportion who are regular smokers is where 0 1 and the proportion who have hypertension (high blood pressure) is where 0 1. Suppose that n men are selected at random from this population and the observed data are Category Frequency SH x11 SH x12 SH x21 SH x22 where S is the event the male is a smoker and H is the event the male has hypertension. (a) Assuming the events S and H are independent determine the likelihood function, the score vector, the maximum likelihood estimates, and the information matrix for and . (b) Determine the expected information matrix and its inverse matrix. What do you notice regarding the diagonal entries of the inverse matrix? 5. Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Logistic( ; ) distribution. (a) Find the likelihood function, the score vector, and the information matrix for and . How would you …nd the maximum likelihood estimates of and ? (b) Show that if u is an observation from the Uniform(0; 1) distribution then x= log 1 u 1 is an observation from the Logistic( ; ) distribution. 250 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER (c) Use the following R code to randomly generate 30 observations from a Logistic( ; ) distribution. 
# randomly generate 30 observations from a Logistic(mu,beta) # using a random mu and beta values set.seed(21086689) # set the seed so results can be reproduced truemu<-runif(1,min=2,max=3) truebeta<-runif(1,min=3,max=4) # data are sorted and rounded to two decimal places for easier display x<-sort(round((truemu-truebeta*log(1/runif(30)-1)),2)) x (d) Use Newton’s Method and R to …nd (^ ; ^ ). Determine S(^ ; ^ ) and I(^ ; ^ ). (e) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ). (f) Use R to graph approximate 90%, 95%, and 99% con…dence regions for ( ; ). Compare these approximate con…dence regions with the likelihood regions in (e). (g) Determine approximate 95% marginal con…dence intervals for , , and an approximate con…dence interval for + . 6. Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Weibull( ; ) distribution. (a) Find the likelihood function, the score vector, and the information matrix for and . How would you …nd the maximum likelihood estimates of and ? (b) Show that if u is an observation from the Uniform(0; 1) distribution then x= [ log (1 u)]1= is an observation from the Weibull( ; ) distribution. (c) Use the following R code to randomly generate 40 observations from a Weibull( ; ) distribution. # randomly generate 40 observations from a Weibull(alpha,beta) # using random values for alpha and beta set.seed(21086689) # set the seed so results can be reproduced truealpha<-runif(1,min=2,max=3) truebeta<-runif(1,min=3,max=4) # data are sorted and rounded to two decimal places for easier display x<-sort(round(truebeta*(-log(1-runif(40)))^(1/truealpha),2)) x (d) Use Newton’s Method and R to …nd (^ ; ^ ). Determine S(^ ; ^ ) and I(^ ; ^ ). (e) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ). 7.5. CHAPTER 7 PROBLEMS 251 (f) Use R to graph approximate 90%, 95%, and 99% con…dence regions for ( ; ). Compare these approximate con…dence regions with the likelihood regions in (e). (g) Determine approximate 95% marginal con…dence intervals for , , and an approximate con…dence interval for + . 7. Suppose Yi Binomial(1; pi ), i = 1; 2; : : : ; n independently where pi = 1 + e and the xi are known constants. xi 1 (a) Determine the likelihood function, the score vector, and the expected information matrix for and . (b) Explain how you would use Newton’s method to …nd the maximum likelihood estimates of and . 8. Suppose x1 ; x2 ; : : : ; xn is an observed random sample from the Three Parameter Burr distribution with probability density function f (x; ; ; ) = (x= ) 1 [1 + (x= ) ] +1 for x > 0, > 0; > 0; >0 (a) Find the likelihood function, the score vector, and the information matrix for , , and . How would you …nd the maximum likelihood estimates of , , and ? (b) Show that if u is an observation from the Uniform(0; 1) distribution then h i1= x= (1 u) 1= 1 is an observation from the Three Parameter Burr distribution. (c) Use the following R code to randomly generate 60 observations from the Three Parameter Burr distribution. # randomly generate 60 observations from the 3 Parameter Burr # distribution using random values for alpha, beta and gamma set.seed(21086689) # set the seed so results can be reproduced truea<-runif(1,min=2,max=3) trueb<-runif(1,min=3,max=4) truec<-runif(1,min=3,max=4) # data are sorted and rounded to 2 decimal places for easier display x<-sort(round(trueb*((1-runif(60))^(-1/truec)-1)^(1/truea),2)) x (d) Use Newton’s Method and R to …nd (^ ; ^ ; ^ ). Determine S(^ ; ^ ; ^ ) and I(^ ; ^ ; ^ ). 
Use the second derivative test to verify that (^ ; ^ ; ^ ) are the maximum likelihood estimates. (e) Determine approximate 95% marginal con…dence intervals for , , and , and an approximate con…dence interval for + + . 252 7. MAXIMUM LIKELIHOOD ESTIMATION - MULTIPARAMETER 8. Hypothesis Testing Point estimation is a useful statistical procedure for estimating unknown parameters in a model based on observed data. Interval estimation is a useful statistical procedure for quantifying the uncertainty in these estimates. Hypothesis testing is another important statistical procedure which is used for deciding whether a given statement is supported by the observed data. In Section 8:1 we review the de…nitions and steps of a test of hypothesis. Much of this material was introduced in a previous statistics course such as STAT 221/231/241. In Section 8:2 we look at how the likelihood function can be use to construct a test of hypothesis when the model is completely speci…ed by the hypothesis of interest. The material in this section is mostly a review of material covered in a previous statistics course. In Section 8:3 we look at how the likelihood function function can be use to construct a test of hypothesis when the model is not completely speci…ed by the hypothesis of interest. The material in this section is mostly new material. 8.1 Test of Hypothesis In order to analyse a set of data x we often assume a model f (x; ) where 2 and is the parameter space or set of possible values of . A test of hypothesis is a statistical procedure used for evaluating the strength of the evidence provided by the observed data against an hypothesis. An hypothesis is a statement about the model. In many cases the hypothesis can be formulated in terms of the parameter as H0 : 2 0 where 0 is some subset of . H0 is called the null hypothesis. When conducting a test of hypothesis there is usually another statement of interest which is the statement which re‡ects what might be true if H0 is not supported by the observed data. This statement is called the alternative hypothesis and is denoted HA or H1 . In many cases HA may simply take the form HA : 2 = 0 In constructing a test of hypothesis it is useful to distinguish between simple and composite hypotheses. 253 254 8.1.1 8. HYPOTHESIS TESTING De…nition - Simple and Composite Hypotheses If the hypothesis completely speci…es the model including any parameters in the model then the hypothesis is simple otherwise the hypothesis is composite. 8.1.2 Example For each of the following indicate whether the null hypothesis is simple or composite. Specify and 0 and determine the dimension of each. (a) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample from a Poisson( ) distribution. The hypothesis of interest is H0 : = 0 where 0 is a speci…ed value of . (b) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample from a Gamma( ; ) distribution. The hypothesis of interest is H0 : = 0 where 0 is a speci…ed value of . (c) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample from an Exponential( 1 ) distribution and independently the observed data y = (y1 ; y2 ; : : : ; ym ) represent a random sample from an Exponential( 2 ) distribution. The hypothesis of interest is H0 : 1 = 2 . 
(d) It is assumed that the observed data x = (x1 ; x2 ; : : : ; xn ) represent a random sample from a N( 1 ; 21 ) distribution and independently the observed data y = (y1 ; y2 ; : : : ; ym ) represent a random sample from a N( 2 ; 22 ) distribution. The hypothesis of interest is H0 : 1 = 2 ; 21 = 22 . Solution (a) This is a simple hypothesis since the model and the unknown parameter are completely speci…ed. = f : > 0g which has dimension 1 and 0 = f 0 g which has dimension 0. (b) This is a composite hypothesis since is not speci…ed by H0 . = f( ; ) : which has dimension 2 and 0 = f( 0 ; ) : > 0g which has dimension 1. > 0; > 0g (c) This is a composite hypothesis since 1 and 2 are not speci…ed by H0 . = f( 1 ; 2 ) : 1 > 0; 2 > 0g which has dimension 2 and 0 = f( 1 ; 2 ) : 1 = 2 ; 1 > 0; 2 > 0g which has dimension 1. (d) This is a composite hypothesis since 1 , 2 2 2 > 0; = 2 2 <; 1 ; 1 ; 2 ; 2 : 1 2 <; 1 2 2 2 = 2; and 0 = ; ; ; : = ; 1 1 2 1 2 2 1 2 dimension 2. 2, 2, 1 2 >0 2 1 and 22 are not speci…ed by H0 . which has dimension 4 2 <; 21 > 0; 2 2 <; 22 > 0 which has To measure the evidence against H0 based on the observed data we use a test statistic or discrepancy measure. 8.1. TEST OF HYPOTHESIS 8.1.3 255 De…nition - Test Statistic or Discrepancy Measure A test statistic or discrepancy measure D is a function of the data X that is constructed to measure the degree of “agreement” between the data X and the null hypothesis H0 . A test statistic is usually chosen so that a small observed value of the test statistic indicates close agreement between the observed data and the null hypothesis H0 while a large observed value of the test statistic indicates poor agreement. The test statistic is chosen before the data are examined and the choice re‡ects the type of departure from the null hypothesis H0 that we wish to detect as speci…ed by the alternative hypothesis HA . A general method for constructing test statistics can be based on the likelihood function as we will see in the next two sections. 8.1.4 Example For Example 8.1.2(a) suggest a test statistic which could be used if the alternative hypothesis is HA : 6= 0 . Suggest a test statistic which could be used if the alternative hypothesis is HA : > 0 and if the alternative hypothesis is HA : < 0 . Solution If H0 : = 0 is true then E X = 0 . If the alternative hypothesis is HA : reasonable test statistic which could be used is D = X 0 . 6= 0 then a If the alternative hypothesis is HA : used is D = X 0. > 0 then a reasonable test statistic which could be If the alternative hypothesis is HA : used is D = 0 X. < 0 then a reasonable test statistic which could be After the data have been collected the observed value of the test statistic is calculated. Assuming the null hypothesis H0 is true we compute the probability of observing a value of the test statistic at least as great as that observed. This probability is called the p-value of the data in relation to the null hypothesis H0 . 8.1.5 De…nition - p-value Suppose we use the test statistic D = D (X) to test the null hypothesis H0 . Suppose also that d = D (x) is the observed value of D. The p-value or observed signi…cance level of the test of hypothesis H0 using test statistic D is p-value = P (D d; H0 ) The p-value is the probability of observing such poor agreement using test statistic D between the null hypothesis H0 and the data if the null hypothesis H0 is true. If the p-value 256 8. 
HYPOTHESIS TESTING is very small, then such poor agreement would occur very rarely if the null hypothesis H0 is true, and we interpret this to mean that the observed data are providing evidence against the null hypothesis H0 . The smaller the p-value the stronger the evidence against the null hypothesis H0 based on the observed data. A large p-value does not mean that the null hypothesis H0 is true but only indicates a lack of evidence against the null hypothesis H0 based on the observed data and using the test statistic D. The following table gives a rough guideline for interpreting p-values. These are only guidelines. The interpretation of p-values must always be made in the context of a given study. Table 10.1: Guidelines for interpreting p-values p-value Interpretation p-value > 0:10 No evidence against H0 based on the observed data. 0:05 < p-value 0:10 0:01 < p-value 0:05 0:001 < p-value 0:01 p-value 0:001 Weak evidence against H0 based on the observed data. 8.1.6 Evidence against H0 based on the observed data. Strong evidence against H0 based on the observed data. Very strong evidence against H0 based on the observed data. Example For Example 8.1.4 suppose x = 5:7, n = 25 and 0 = 5. Determine the p-value for both HA : 6= 0 and HA : > 0 . Give a conclusion in each case. Solution For x = 5:7, n = 25; 0 = 5, and HA : d = j5:7 5j = 0:7, and p-value = P X = P (jT 5 125j 6= 5 the observed value of the test statistic is 0:7; H0 : 17:5) =5 where T = 25 P Xi Poisson (125) i=1 = P (T 107:5) + P (T = P (T 107) + P (T 142:5) 143) = 0:05605429 + 0:06113746 = 0:1171917 calculated using R. Since p-value > 0:1 there is no evidence against H0 : the data. For x = 5:7, n = 25; 0 = 5, and HA : = 5 based on > 5 the observed value of the test statistic is 8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES d = 5:7 257 5 = 0:7, and p-value = P X = P (T 5 0:7; H0 : 125 17:5) =5 where T = 25 P Xi Poisson (125) i=1 = P (T 143) = 0:06113746 Since 0:05 < p-value 8.1.7 0:1 there is weak evidence against H0 : = 5 based on the data. Exercise Suppose in a Binomial experiment 42 successes have been observed in 100 trials and the hypothesis of interest is H0 : = 0:5. (a) If the alternative hypothesis is HA : the p-value and give a conclusion. 6= 0:5, suggest a suitable test statistic, calculate (b) If the alternative hypothesis is HA : the p-value and give a conclusion. < 0:5, suggest a suitable test statistic, calculate 8.2 Likelihood Ratio Tests for Simple Hypotheses In Examples 8.1.4 and 8.1.7 it was reasonably straightforward to suggest a test statistic which made sense. In this section we consider a general method for constructing a test statistic which has good properties in the case of a simple hypothesis. The test statistic we use is the likelihood ratio test statistic which was introduced in your previous statistics course. Suppose X = (X1 ; X2 ; : : : ; Xn ) is a random sample from f (x; ) where 2 and the dimension of is k. Suppose also that the hypothesis of interest is H0 : = 0 where the elements of 0 are completely speci…ed. H0 can also be written as H0 : 2 0 where 0 consists of the single point 0 . The dimension of 0 is zero. H0 is a simple hypothesis since the model and all the parameters are completely speci…ed. The likelihood ratio test statistic for this simple hypothesis is (X; 0) = 2 log R ( 0 ; X) " # L ( 0 ; X) = 2 log L(~; X) h i = 2 l(~; X) l ( 0 ; X) where ~ = ~ (X) is the maximum likelihood estimator of . 
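In R the observed value of this statistic is just twice the difference between the log likelihood evaluated at the maximum likelihood estimate and at the null value theta0. The following minimal sketch uses the Poisson numbers xbar = 6, n = 25 and theta0 = 5 that are worked through algebraically in Example 8.2.2 below; the helper function loglik and the chi-squared p-value line anticipate the approximation derived in the remainder of this section.
# observed value of the likelihood ratio statistic for a Poisson sample
# (sketch only, using xbar = 6, n = 25, theta0 = 5 as in Example 8.2.2)
n<-25
xbar<-6
theta0<-5
loglik<-function(theta) {n*xbar*log(theta)-n*theta} # Poisson log likelihood
thetahat<-xbar # maximum likelihood estimate
lambda<-2*(loglik(thetahat)-loglik(theta0))
lambda # 4.6965
# approximate p-value based on the chi-squared limit discussed below
pchisq(lambda,df=1,lower.tail=FALSE) # 0.0302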
Note that this test statistic implicitly assumes that the alternative hypothesis is HA : 6= 0 or HA : 2 = 0. 258 8. HYPOTHESIS TESTING Let the observed value of the likelihood ratio test statistic be h i (x; 0 ) = 2 l(^; x) l ( 0 ; x) where x = (x1 ; x2 ; : : : ; xn ) are the observed data. The p-value is p-value = P [ (X; 0) (x; 0 ); H0 ] Note that the p-value is calculated assuming H0 : = 0 is true. In general this p-value is di¢ cult to determine exactly since the distribution of the random variable (X; 0 ) is usually intractable. We use the result from Theorem 7.3.4 which says that under certain (regularity) conditions 2 log R( ; Xn ) = 2[l(~n ; Xn ) 2 l( ; Xn )] !D W (k) (8.1) for each 2 where Xn = (X1 ; X2 ; : : : ; Xn ) and ~n = ~n (X1 ; X2 ; : : : ; Xn ). Therefore based on the asymptotic result (8.1) and assuming H0 : = 0 is true, the p-value for testing H0 : = 0 using the likelihood ratio test statistic can be approximated using p-value 8.2.1 P [W 0 )] (x; 2 where W (k) Example Suppose X1 ; X2 ; : : : ; Xn is a random sample from the N ; 2 distribution where known. Show that, in this special case, the likelihood ratio test statistic for testing H0 : = 0 has exactly a 2 (1) distribution. Solution From Example 7.1.8 we have that the likelihood function of L( ) = or more simply n exp 2 n 1 P 2 i=1 L( ) = exp n(x 2 )2 for 2 )2 n(x 2 x)2 exp (xi is 2 2< The corresponding log likelihood function is l( ) = n(x 2 )2 for 2 Solving dl n(x = d ) 2 =0 2< for 2< 2 is 8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES 259 gives = x. Since l( ) is a quadratic function which is concave down we know that ^ = x is the maximum likelihood estimate. The corresponding maximum likelihood estimator of is n 1 P ~=X= Xi n i=1 Since L(^ ) = 1, the relative likelihood function is R( ) = L( ) L(^ ) n(x 2 = exp )2 for 2 2< The likelihood ratio test statistic for testing the hypothesis H0 : ( 0 ; X) L ( 0 ; X) L (~ ; X) n(X = 2 log exp 2 2 n(X 0) = 2 = = If H0 : = 0 is true then X = 0 is 2 log X 2 0) since ~ = X 2 2 p 0 = n N 0; X 2 and 2 p 0 = n 2 (1) as required. 8.2.2 Example Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Poisson( ) distribution. (a) Find the likelihood ratio test statistic for testing H0 : ratio statistic takes on large values if ~ > 0 or ~ < 0 . = 0. Verify that the likelihood (b) Suppose x = 6 and n = 25. Use the likelihood ratio test statistic to test H0 : Compare this with the test in Example 8.1.6. = 5. Solution (a) From Example 6.2.5 we have the likelihood function L( ) = nx e n for 0 and maximum likelihood estimate ^ = x. The relative likelihood function can be written as n^ L( ) ^ R( ) = = en( ) for 0 ^ L(^) 260 8. HYPOTHESIS TESTING The likelihood ratio test statistic for H0 : ( 0 ; X) = = 0 is 2 log R ( 0 ; X) " ~ = 2 log ~ ~ log = 2n 0 = 2n~ n 0 0 ~ 1 ~ n(~ e + 0 ) # ~ 0 log 0 (8.2) ~ To verify that the likelihood ratio statistic takes on large values if ~ > equivalently if ~0 < 1 or ~0 > 1, consider the function g (t) = a [(t 1) log (t)] for t > 0 and a > 0 0 or ~ < 0 or (8.3) We note that g (t) ! 1 as t ! 0+ and t ! 1. Now g 0 (t) = a 1 1 t =a t 1 t for t > 0 and a > 0 Since g 0 (t) < 0 for 0 < t < 1, and g 0 (t) > 0 for t > 1 we can conclude that the function g (t) is a decreasing function for 0 < t < 1 and an increasing function for t > 1 with an absolute minimum at t = 1. Since g (1) = 0, g (t) is positive for all t > 0 and t 6= 0. 
Therefore if we let t = of t = 0 ~ 0 ~ in (8.2) then we see that < 1 or large values of t = (b) If x = 6, n = 25, and H0 : statistic is 0 ~ ( 0 ; X) will be large for small values > 1. = 5 then the observed value of the likelihood ratio test (5; x) = = 2 log R ( 0 ; X) " 5 25(5:6) 25(6 2 log e 6 5) # = 4:6965 The parameter space is approximate p-value is p-value = f : > 0g which has dimension 1 and thus k = 1. The 2 P (W 4:6965) where W (1) h i p = 2 1 P Z 4:6965 where Z N (0; 1) = 0:0302 calculated using R. Since 0:01 < p-value 0:05 there is evidence against H0 : = 5 based on the data. Compared with the answer in Example 8.1.6 for HA : 6= 5 we note that the p-values are slightly di¤erent by the conclusion is the same. 8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES 8.2.3 261 Example Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Exponential( ) distribution. (a) Find the likelihood ratio test statistic for testing H0 : ratio statistic takes on large values if ~ > 0 or ~ < 0 . = 0. Verify that the likelihood (b) Suppose x = 6 and n = 25. Use the likelihood ratio test statistic to test H0 : = 5. (c) From Example 6.6.3 we have 2 Q(X; ) = n P Xi i=1 = 2n~ 2 (2n) is a pivotal quantity. Explain how this pivotal quantity could be used to test H0 : if (i) HA : < 0 , (ii) HA : > 0 , and (iii) HA : 6= 0 . (d) Suppose x = 6 and n = 25. Use the test statistic from (c) for HA : H0 : = 5. Compare the answer with the answer in (b). 6= 0 = 0 to test Solution (a) From Example 6.2.8 we have the likelihood function L( ) = n e nx= for >0 and maximum likelihood estimate ^ = x. The relative likelihood function can be written as !n ^ L( ) ^ R( ) = = en(1 )= for 0 ^ L( ) The likelihood ratio test statistic for H0 : ( 0 ; X) = = = 0 is 2 log R ( 0 ; X) !n " ~ 2 log en(1 = 2n " ~ ! 1 0 ~)= # ~ log 0 !# To verify that the likelihood ratio statistic takes on large values if ~ > 0 or ~ < 0 or ~ ~ equivalently if 0 < 1 and 0 > 1 we note that ( 0 ; X) is of the form 8.3 so an argument similar to Example 8.2.2(a) can be used with t = (b) If x = 6, n = 25, and H0 : statistic is ~ 0 . = 5 then the observed value of the likelihood ratio test (5; x) = = 2 log R ( 0 ; X) " 6 25 25(1 2 log e 5 = 0:8839222 6)=5 # 262 8. HYPOTHESIS TESTING The parameter space is approximate p-value is p-value = f : > 0g which has dimension 1 and thus k = 1. The 2 P (W 0:8839222) where W (1) h i p = 2 1 P Z 0:8839222 where Z N (0; 1) = 0:3471 calculated using R. Since p-value > 0:1 there is no evidence against H0 : the data. (c) (i) If HA : > 0 we could let D = ~ ~ 0 . If H0 : = 0 = 5 based on is true then since E(~) = 0 we would expect observed values of D = 0 to be close to 1. However if HA : > 0 is true ~ then E(~) = > 0 and we would expect observed values of D = 0 to be larger than 1 and therefore large values of D provide evidence against H0 : = 0 . The corresponding p-value would be ! ~ ^ p-value = P ; H0 0 = P 0 2n^ W 0 (ii) If HA : < 0 we could still let D = ~ ~ 0 ! where W . If H0 : = 0 2 (2n) is true then since E(~) = 0 we would expect observed values of D = 0 to be close to 1. However if HA : < 0 is true ~ then E(~) = < 0 and we would expect observed values of D = 0 to be smaller than 1 and therefore small values of D provide evidence against H0 : = 0 . The corresponding p-value would be ! ~ ^ p-value = P ; H0 0 = P 0 2n^ W 0 (iii) If HA : 6= 0 we could still let D = ~ ~ 0 ! where W . If H0 : = 0 2 (2n) is true then since E(~) = 0 we would expect observed values of D = 0 to be close to 1. 
However if HA : 6= 0 is true ~ then E(~) = 6= 0 and we would expect observed values of D = 0 to be either larger or smaller than 1 and therefore both large and small values of D provide evidence against H0 : = 0 . If a large (small) value of D is observed it is not simple to determine exactly which small (large) values should also be considered. Since we are not that concerned about the exact p-value, the p-value is usually calculated more simply as ! !! 2n^ 2n^ 2 p-value = min 2P W ; 2P W where W (2n) 0 0 8.2. LIKELIHOOD RATIO TESTS FOR SIMPLE HYPOTHESES (d) If x = 6, n = 25, and H0 : = 5 then the observed value of the D = (50) 6 5 = min (1:6855; 0:3145) p-value = min 2P 263 ; 2P W (50) 6 5 W ~ is d = 0 2 where W 6 5 with (50) = 0:3145 calculated using R. Since p-value > 0:1 there is no evidence against H0 : the data. h ~ ~ We notice the test statistic D = 0 and ( 0 ; X) = 2n 1 log 0 functions of 8.2.4 ~ 0 = 5 based on ~ 0 i are both . For this example the p-values are similar and the conclusions are the same. Example The following table gives the observed frequencies of the six faces in 100 rolls of a die: Face: j Observed Frequency: xj 1 16 2 15 3 14 4 20 5 22 6 13 Total 100 Are these observations consistent with the hypothesis that the die is fair? Solution The model for these data is (X1 ; X2 ; : : : ; X6 ) Multinomial(100; 1 ; 2 ; : : : ; 6 ) and the hypothesis of interest is H0 : 1 = 2 = = 6 = 61 . Since the model and parameters are 6 P completely speci…ed this is a simple hypothesis. Since j = 1 there are really only k = 5 j=1 parameters. The relative likelihood function for ( 1 ; L ( 1; 2; : : : ; 5) = n! x1 !x2 ! x5 !x6 ! 2; : : : ; 5) x1 x2 1 2 x5 5 x5 5 1 (1 is 1 5) 2 or more simply L ( 1; for 0 j 2; : : : ; 5) x1 x2 1 2 = 1 for j = 1; 2; : : : ; 5 and 5 P (1 5) 2 x6 1. The log likelihood function is j j=1 l ( 1; 2; : : : ; 5) = 5 P xj log j + x6 log (1 1 2 5) j=1 Now @l @ j = xj j = xj (1 n 1 1 x1 x2 x5 1 2 5 for j = 1; 2; : : : ; 5 5) 2 j (1 1 j 2 (n x1 x2 5) x5 ) x6 264 8. HYPOTHESIS TESTING x5 . We could solve @@lj = 0 for j = 1; 2; : : : ; 5 simultaneously. In the Binomial case we know ^ = nx . It seems reasonable that the maximum likelihood x x estimate of j is ^j = nj for j = 1; 2; : : : ; 5. To verify this is true we substitute j = nj for j = 1; 2; : : : ; 5 into @@lj to obtain since x6 = n x1 x2 xj P xi n i6=j xj xj P xi = 0 n i6=j xj + Therefore the maximum likelihood estimator of Xj n is ~j = for j = 1; 2; : : : ; 5. Note also 5 P ~j = X6 . that by the invariance property of maximum likelihood estimators ~6 = 1 n j j=1 Therefore we can write 6 P Xj log l(~1 ; ~2 ; ~3 ; ~4 ; ~5 ; X) = Xj n i=1 Since the null hypothesis is H0 : 1 l( = 2 = 0 ; X) = = 6 P 6 = 1 6 1 6 Xj log i=1 so the likelihood ratio test statistic is h (X; 0 ) = 2 l(~; X) = 2 6 P l( Xj n Xj log i=1 = 2 6 P Xj log i=1 0 ; X) Xj Ej i 6 P Xj log i=1 1 6 where Ej = n=6 is the expected frequency for outcome j. This test statistic is the likelihood ratio Goodness of Fit test statistic introduced in your previous statistics course. For these data the observed value of the likelihood ratio test statistic is (x; 0) = 2 6 P xj log i=1 = 2 16 log xj 100=6 16 + 15 log 100=6 15 100=6 + + 13 log 13 100=6 = 3:699649 The approximate p-value is p-value P (W 3:699649) where W 2 (5) = 0:5934162 calculated using R. Since p-value> 0:1 there is no evidence based on the data against the hypothesis of a fair die. 8.3. 
LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES 265 Note: (1) In this example the data (X1 ; X2 ; : : : ; X6 ) are not a random sample. The conditions for (8.1) hold by thinking of the experiment as a sequence of n independent trials with 6 outcomes on each trial. (2) You may recall from your previous statistics course that the 2 approximation is reasonable if the expected frequency Ej in each category is at least 5. 8.2.5 Exercise In a long-term study of heart disease in a large group of men, it was noted that 63 men who had no previous record of heart problems died suddenly of heart attacks. The following table gives the number of such deaths recorded on each day of the week: Day of Week No. of Deaths Mon. 22 Tues. 7 Wed. 6 Thurs. 13 Fri. 5 Sat. 4 Sun. 6 Test the hypothesis of interest that the deaths are equally likely to occur on any day of the week. 8.3 Likelihood Ratio Tests for Composite Hypotheses Suppose X = (X1 ; X2 ; : : : ; Xn ) is a random sample from f (x; ) where 2 and has dimension k. Suppose we wish to test H0 : 2 0 where 0 is a subset of of dimension q where 0 < q < k. The hypothesis H0 is a composite hypothesis since all the values of the unknown parameters are not speci…ed. For testing composite hypotheses we use the likelihood ratio test statistic 2 3 max L( ; X) 2 0 5 (X; 0 ) = 2 log 4 max L( ; X) 2 = 2 l(~; X) max l( ; X) 2 0 where ~ = ~ (X1 ; X2 ; : : : ; Xn ) is the maximum likelihood estimator of . Note that this test statistic implicitly assumes that the alternative hypothesis is HA : 2 = 0. Let the observed value of the likelihood ratio test statistic be (x; 0) = 2 l(^; x) max l ( ; x) 2 0 where x = (x1 ; x2 ; : : : ; xn ) are the observed data. The p-value is p-value = P [ (X; 0) (x; 0 ); H0 ] Note that the p-value is calculated assuming H0 : 2 0 is true. In general this p-value is di¢ cult to determine exactly since the distribution of the random variable (X; 0 ) is 266 8. HYPOTHESIS TESTING usually intractable. Under certain (regularity) conditions it can be shown that, assuming the hypothesis H0 : 2 0 is true, (X; 0) = 2 l(~n ; Xn ) 2 max l( ; Xn ) !D W 2 (k q) 0 where Xn = (X1 ; X2 ; : : : ; Xn ) and ~n = ~n (X1 ; X2 ; : : : ; Xn ). The approximate p-value is given by p-value P [W (x; 0 )] where x = (x1 ; x2 ; : : : ; xn ) are the observed data. Note: The number of degrees of freedom is the di¤erence between the dimension of and the dimension of 0 . The degrees of freedom can also be determined as the number of parameters estimated under the model with no restrictions minus the number of parameters estimated under the model with restrictions imposed by the null hypothesis H0 . 8.3.1 Example (a) Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Gamma( ; ) distribution. Find the likelihood ratio test statistic for testing H0 : = 0 where is unknown. Indicate how to …nd the approximate p-value. (b) For the data in Example 7.1.14 test the hypothesis H0 : = 2. Solution (a) From Example 8.1.2(b) we have = f( ; ) : > 0; > 0g which has dimension k = 2 and 0 = f( 0 ; ) : > 0g which has dimension q = 1 and the hypothesis is composite. 
From Exercise 7.1.11 the likelihood function is L( ; ) = [ ( ) ] n Q n xi n 1 P exp i=1 xi for > 0; >0 i=1 where and the log likelihood function is l ( ; ) = log L ( ; ) = n log ( ) n log + n P n 1 P log xi i=1 xi for > 0; i=1 The maximum likelihood estimators cannot be found explicitly so we write l(~ ; ~ ; X) = n log (~ ) n~ log ~ + ~ n P log Xi i=1 If = 0 then the log likelihood function is l( 0; ) = n log ( 0) n 0 log + 0 n P i=1 log xi n P i=1 n 1 P X ~ i=1 i xi for >0 >0 8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES which is only a function of . To determine 267 max l( ; ; X) we note that ( ; )2 0 n P and d d l( 0; xi d n 0 i=1 l ( 0; ) = + 2 d n P = n1 0 xi = x0 and therefore ) = 0 for i=1 max l( ; ; X) = ( ; )2 n log ( 0) n 0 log X + 0 0 +n log ( n~ log ~ + ~ n P log Xi 0) +n 0 log X 0 0 (~ ( 0) + (~ ) n = 2n log 0) n P 0 X ~ ^ log ^ + 0 x ^ n 1 P X ~ i=1 i log Xi + n 0] i=1 n P log Xi + 0 log X 0 i=1 with corresponding observed value 0) = 2n log q=2 ~ log ~ + 0 0 i=1 Since k n max l( ; ; X) ( ; )2 n log (~ ) (x; log Xi 0) = 2 l(~ ; ~ ; X) = 2[ n P i=1 The likelihood ratio test statistic is (X; 0 (^ ( 0) + (^ ) n 0) log xi + i=1 1=1 p-value n P 0 log x 0 2 P [W (x; 0 )] where W (1) h i p = 2 1 P Z (x; 0 ) where Z N (0; 1) The degrees of freedom can also be determined by noticing that, under the full model two parameters ( and ) were estimated, and under the null hypothesis H0 : = 0 only one parameter ( ) was estimated. Therefore 2 1 = 1 are the degrees of freedom. (b) For H0 : = 2 and the data in Example 7.1.14 we have n = 30, x = 6:824333, n P 1 log xi = 1:794204, ^ = 4:118407, ^ = 1:657032. The observed value of the likelihood n i=1 ratio test statistic is (x; p-value 0) = 6:886146 with 2 P (W 6:886146) where W (1) i h p 6:886146 where Z = 2 1 P Z N (0; 1) = 0:008686636 calculated using R. Since p-value on the data. 0:01 there is strong evidence against H0 : = 2 based 268 8. HYPOTHESIS TESTING 8.3.2 Example (a) Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Exponential( 1 ) distribution and independently Y1 ; Y2 ; : : : ; Ym is a random sample from the Exponential( 2 ) distribution. Find the likelihood ratio test statistic for testing H0 : 1 = 2 . Indicate how to …nd the approximate p-value. 10 P (b) Find the approximate p-value if the observed data are n = 10, xi = 22, m = 15, i=1 15 P yi = 40. What would you conclude? i=1 Solution (a) From Example 8.1.2(c) we have k = 2 and 0 = f( 1 ; 2 ) : 1 = 2 ; hypothesis is composite. = f( 1 ; 2 ) : 1 > 0; 2 > 0g which has dimension > 0; 2 > 0g which has dimension q = 1 and the 1 From Example 6.2.8 the likelihood function for an observed random sample x1 ; x2 ; : : : ; xn from an Exponential( 1 ) distribution is n L1 ( 1 ) = 1 e nx= for 1 1 >0 with maximum likelihood estimate ^1 = x. Similarly the likelihood function for an observed random sample y1 ; y2 ; : : : ; ym from an Exponential( 2 ) distribution is L2 ( 2 ) = m 2 e my= for 2 2 >0 with maximum likelihood estimate ^2 = y. Since the samples are independent the likelihood function for ( 1 ; 2 ) is L ( 1; 2) = L1 ( 1 )L2 ( 2 ) for 1 > 0; 2 >0 and the log likelihood function l ( 1; 2) = n log nx m log 1 1 my 2 for 1 > 0; 2 >0 2 The independence of the samples also implies the maximum likelihood estimators are still ~1 = X and ~2 = Y . Therefore l(~1 ; ~2 ; X; Y) = If 1 = 2 = n log X m log Y (n + m) then the log likelihood function is l( ) = (n + m) log (nx + my) for >0 8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES which is only a function of . 
To determine d l( ) = d and d d l ( ) = 0 for ( nx+my n+m = l( 1 ; max 1 ; 2 )2 ( max 1 ; 2 )2 (n + m) + l( 1 ; 2 ; X; Y) 269 we note that 0 (nx + my) 2 and therefore 2 ; X; Y) = nX + mY n+m (n + m) log 0 (n + m) The likelihood ratio test statistic is (X; Y; 0) = 2 l(~1 ; ~2 ; X; Y) = 2 n log X ( max 1 ; 2 )2 m log Y = 2 (n + m) log l( 1 ; 2 ; X; Y) 0 (n + m) + (n + m) log nX + m Y n+m n log X nX + mY n+m + (n + m) m log Y with corresponding observed value (x; y; Since k q=2 = 2 (n + m) log nx + my n+m n log x m log y 1=1 p-value (b) For n = 10, 0) 10 P 2 P [W (x; y; 0 )] where W (1) h i p = 2 1 P Z (x; y; 0 ) where Z xi = 22, m = 15, i=1 test statistic is (x; y; p-value 15 P N (0; 1) yi = 40 the observed value of the likelihood ratio i=1 0) = 0:2189032 and 2 P (W 0:2189032) where W (1) h i p = 2 1 P Z 0:2189032 where Z N (0; 1) = 0:6398769 calculated using R. Since p-value > 0:5 there is no evidence against H0 : the observed data. 8.3.3 1 = 2 based on Exercise (a) Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Poisson( 1 ) distribution and independently Y1 ; Y2 ; : : : ; Ym is a random sample from the Poisson( 2 ) distribution. Find the 270 8. HYPOTHESIS TESTING likelihood ratio test statistic for testing H0 : p-value. 1 = 2. Indicate how to …nd the approximate (b) Find the approximate p-value if the observed data are n = 10, 10 P xi = 22, m = 15, i=1 15 P yi = 40. What would you conclude? i=1 8.3.4 Example In a large population of males ages 40 50, the proportion who are regular smokers is where 0 1 and the proportion who have hypertension (high blood pressure) is where 0 1. Suppose that n men are selected at random from this population and the observed data are Category SH S H SH S H Frequency x11 x12 x21 x22 where S is the event the male is a smoker and H is the event the male has hypertension. Find the likelihood ratio test statistic for testing H0 : events S and H are independent. Indicate how to …nd the approximate p-value. Solution The model for these data is (X1 ; X2 ; : : : ; X6 ) parameter space ( = ( 11 ; 12 ; 21 ; 22 ) :0 ij Multinomial(100; 2 P 2 P 1 for i; j = 1; 2 11 ; 12 ; 21 ; 22 ) ij j=1 i=1 1 with ) which has dimension k = 3. Let P (S) = 12 = (1 and P (H) = ), 21 = (1 0 21 = f( = (1 then the hypothesis of interest can be written as H0 : ) , 22 = (1 ) (1 ) and 11 ; 12 ; 21 ; 22 ) ) ; 22 : = (1 = 11 ; ) (1 12 = (1 ); 0 11 = , ); 1; 0 1g which has dimension q = 2 and the hypothesis is composite. From Example 8.2.4 we can see that the relative likelihood function for ( L( 11 ; 12 ; 21 ; 22 ) = x11 x12 x21 x22 11 12 21 22 The log likelihood function is l( 11 ; 12 ; 21 ; 22 ) = 2 P 2 P j=1 i=1 xij log ij 11 ; 12 ; 21 ; 22 ) is 8.3. LIKELIHOOD RATIO TESTS FOR COMPOSITE HYPOTHESES and the maximum likelihood estimate of ij is ^ij = xij n 271 for i; j = 1; 2. 
Therefore 2 P 2 P Xij log l(~11 ; ~12 ; ~21 ; ~22 ; X) = j=1 i=1 Xij n If the events S and H are independent events then from Chapter 7, Problem 4 we have that the likelihood function is x11 +x12 = )x21 +x22 (1 ) (1 )]x22 )x12 +x22 for 0 ) ]x21 [(1 )]x12 [(1 )x11 [ (1 L( ; ) = ( x11 +x21 (1 1, 0 1 The log likelihood is l( ; ) = (x11 + x12 ) log + (x21 + x22 ) log(1 + (x11 + x21 ) log for 0 < < 1, 0 < + (x12 + x22 ) log(1 ) ) <1 and the maximum likelihood estimates are ^= x11 + x12 x11 + x21 and ^ = n n therefore max l( 11 ; 12 ; 21 ; 22 )2 ( 11 ; 12 ; 21 ; 22 ; X) 0 X11 + X12 X21 + X22 + (X21 + X22 ) log n n X11 + X21 X12 + X22 + (X11 + X21 ) log + (X12 + X22 ) log n n = (X11 + X12 ) log The likelihood ratio test statistic can be written as (X; 0) = 2 l(~11 ; ~12 ; ~21 ; ~22 ; X) = 2 2 P 2 P j=1 i=1 RC Xij log ( max 11 ; 12 ; 21 ; 22 )2 l( 11 ; 12 ; 21 ; 22 ; X) 0 Xij Eij where Eij = in j , Ri = Xi1 + Xi2 , Cj = X1j + X2j for i; j = 1; 2. Eij is the expected frequency if the hypothesis of independence is true. The corresponding observed value is (x; 0) =2 2 P 2 P j=1 i=1 where eij = ri cj n , xij log xij eij ri = xi1 + xi2 , cj = x1j + x2j for i; j = 1; 2. 272 Since k 8. HYPOTHESIS TESTING q=3 2=1 p-value 2 P [W (x; 0 )] where W (1) h i p = 2 1 P Z (x; 0 ) where Z N (0; 1) This of course is the usual test of independence in a two-way table which was discussed in your previous statistics course. 8.4. CHAPTER 8 PROBLEMS 8.4 273 Chapter 8 Problems 1. Suppose X1 ; X2 ; : : : ; Xn is a random sample from the distribution with probability density function f (x; ) = x 1 for 0 x 1 for >0 (a) Find the likelihood ratio test statistic for testing H0 : (b) If n = 20 and 20 P log xi = = 0. 25 …nd the approximate p-value for testing H0 : =1 i=1 using the asymptotic distribution of the likelihood ratio statistic. What would you conclude? 2. Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Pareto(1; ) distribution. (a) Find the likelihood ratio test statistic for testing H0 : …nd the approximate p-value. (b) For the data n = 25 and 25 P = 0. Indicate how to log xi = 40 …nd the approximate p-value for testing i=1 H0 : 0 = 1. What would you conclude? 3. Suppose X1 ; X2 ; : : : ; Xn is a random sample from the distribution with probability density function 2 x 1 f (x; ) = 2 e 2 (x= ) for x > 0; > 0 (a) Find the likelihood ratio test statistic for testing H0 : …nd the approximate p-value. (b) If n = 20 and 20 P i=1 = 0. Indicate how to x2i = 10 …nd the approximate p-value for testing H : = 0:1. What would you conclude? 4. Suppose X1 ; X2 ; : : : ; Xn is a random sample from the Weibull(2; 1 ) distribution and independently Y1 ; Y2 ; : : : ; Ym is a random sample from the Weibull(2; 2 ) distribution. Find the likelihood ratio test statistic for testing H0 : 1 = 2 . Indicate how to …nd the approximate p-value. 5. Suppose (X1 ; X2 ; X3 ) Multinomial(n; 1 ; 2 ; 3 ). (a) Find the likelihood ratio test statistic for testing H0 : how to …nd the approximate p-value. 1 = 2 (b) Find the likelihood ratio test statistic for testing H0 : 1 = 2 ; )2 . Indicate how to …nd the approximate p-value. 3 = (1 = 2 3. Indicate = 2 (1 ); 274 8. HYPOTHESIS TESTING 6. Suppose X1 ; X2 ; : : : ; Xn is a random sample from the N( 1 ; 21 ) distribution and independently Y1 ; Y2 ; : : : ; Ym is a random sample from the N( 2 ; 22 ) distribution. Find the likelihood ratio test statistic for testing H0 : 1 = 2 ; 21 = 22 . Indicate how to …nd the approximate p-value. 9. Solutions to Chapter Exercises 275 276 9.1 9. 
SOLUTIONS TO CHAPTER EXERCISES Chapter 2 Exercise 2.1.5 (a) P (A) 0 follows from De…nition 2.1.3(A1). From Example 2.1.4(c) we have P A = 0 and therefore P (A) 1. 1 P (A). But from De…nition 2.1.3(A1) P A (b) Since A = (A \ B) [ A \ B and (A \ B) \ A \ B = ? then by Example 2.1.4(b) P (A) = P (A \ B) + P A \ B or P A \ B = P (A) P (A \ B) as required. (c) Since A [ B = A \ B [ (A \ B) [ A \ B is the union of three mutually exclusive events then by De…nition 2.1.3(A3) and Example 2.1.4(a) we have P (A [ B) = P A \ B + P (A \ B) + P A \ B (9.1) By the result proved in (b) P A \ B = P (A) P (A \ B) (9.2) P A \ B = P (B) P (A \ B) (9.3) and similarly Substituting (9.2) and (9.3) into (9.1) gives P (A [ B) = P (A) P (A \ B) + P (A \ B) + P (B) = P (A) + P (B) as required. P (A \ B) P (A \ B) 9.1. CHAPTER 2 277 Exercise 2.3.7 By 2.11.8 x2 x3 + 2 3 log (1 + x) = x Let x = p for 1<x 1 1 to obtain log p = (p which holds for 0 < p as 1)2 (p 1) + 2 1)3 (p 3 (9.4) 2 and therefore also hold for 0 < p < 1. Now (9.4) can be written log p = = (1 1 P Therefore 1 P p)3 (1 2 p)x (1 3 for 0 < p < 1 x x=1 p)2 (1 p) (1 p)x x log p x=1 P 1 1 (1 p)x = log p x=1 x log p =1 = log p 1 P f (x) = x=1 which holds for 0 < p < 1. Exercise 2.4.11 (a) Z1 1 f (x) dx = Z1 x 1 e (x= ) dx 0 1 Let y = (x= ) . Then dy = x dx. When x = 0, y = 0 and as x ! 1, y ! 1. Therefore Z1 Z1 x 1 (x= ) f (x) dx = e dx 1 0 = Z1 e y dy = (1) = 0! = 1 0 (b) If = 1 then f (x) = 1 e x= for x > 0 which is the probability density function of an Exponential( ) random variable. (c) See Figure 9.1 278 9. SOLUTIONS TO CHAPTER EXERCISES 2 α=1,β=0.5 1.8 α=2,β=0.5 1.6 1.4 1.2 f(x) α=3,β=1 1 0.8 0.6 0.4 α=2,β=1 0.2 0 0 0.5 1 1.5 x 2 2.5 3 Figure 9.1: Graphs of Weibull probability density function Exercise 2.4.12 (a) Z1 f (x) dx = 1 = Z1 x lim b!1 = 1 +1 h dx = lim b!1 x j b i Zb 1 x 1 =1 1 lim = 1 since b!1 b dx i h lim b b!1 >0 (b) See Figure 9.2 4 3.5 α=0.5, β=2 3 2.5 f(x) 2 α=0.5, β=1 α=1, β=2 1.5 1 α=1, β=1 0.5 0 0 0.5 1 1.5 x 2 2.5 3 Figure 9.2: Graphs of Pareto probability density function 9.1. CHAPTER 2 279 Exercise 2.5.4 (a) For X Cauchy( ; 1) the probability density function is f (x; ) = h 1 1 + (x )2 i for x 2 <; 2< and 0 otherwise. See Figure 9.3 for a sketch of the probability density function for = 1; 0; 1. 0.35 0.3 θ=1 θ=0 θ=-1 0.25 0.2 f(x) 0.15 0.1 0.05 0 -6 -4 -2 0 x 2 4 6 Figure 9.3: Cauchy( ; 1) probability density functions for Let f0 (x) = f (x; = 0) = 1 (1 + x2 ) = 1; 0; 1 for x 2 < and 0 otherwise. Then f (x; ) = and therefore (b) For X h 1 1 + (x ) 2 i = f0 (x ) for x 2 <; 2< is a location parameter of this distribution. Cauchy(0; ) the probability density function is f (x; ) = h 1 1 + (x= )2 i for x 2 <; >0 and 0 otherwise. See Figure 9.4 for a sketch of the probability density function for = 0:5; 1; 2. Let 1 for x 2 < f1 (x) = f (x; = 1) = (1 + x2 ) 280 9. SOLUTIONS TO CHAPTER EXERCISES 0.7 0.6 θ=0.5 0.5 0.4 f(x) 0.3 θ=1 0.2 0.1 θ=2 0 -5 0 x 5 Figure 9.4: Cauchy(0; ) probability density function for = 0:5; 1; 2 and 0 otherwise. Then f (x; ) = and therefore h 1 1 + (x= ) 2 1 x i = f1 is a scale parameter of this distribution. for x 2 <; >0 9.1. CHAPTER 2 281 Exercise 2.6.11 If X Exponential(1) then the probability density function of X is x f (x) = e for x 0 Y = X 1= = h (X) for > 0, > 0 is an increasing function with inverse function X = (Y = ) = h 1 (Y ). Since the support set of X is A = fx : x 0g, the support set of Y is B = fy : y 0g. 
Since d h dy 1 d dy y (y) = = y 1 then by Theorem 2.6.8 the probability density function of Y is g(y) = f (h (y)) 1 y = 1 d h dy 1 (y= ) e (y) for y 0 which is the probability density function of a Weibull( ; ) random variable as required. Exercise 2.6.12 X is a random variable with probability density function 1 f (x) = x for 0 < x < 1; >0 and 0 otherwise. Y = log X is a decreasing function with inverse function X = e Y = h 1 (Y ). Since the support set of X is A = fx : 0 < x < 1g, the support set of Y is B = fy : y > 0g. Since d h dy 1 (y) = = d e dy e y y then by Theorem 2.6.8 the probability density function of Y is g(y) = f (h 1 (y)) = y 1 = e e y d h dy for y e 1 (y) y 0 which is the probability density function of a Exponential 1 random variable as required. 282 9. SOLUTIONS TO CHAPTER EXERCISES Exercise 2.7.4 Using integration by parts with u = 1 Z1 [1 F (x) and dv = dx gives du = f (x) dx, v = x and F (x)] dx = x [1 (x)] j1 0 F 0 + Z1 xf (x) dx 0 = lim x x!1 Z1 f (t) dt + E (X) x The desired result holds if lim x x!1 Z1 f (t) dt = 0 (9.5) x Since 0 x Z1 f (t) dt x Z1 tf (t) dt x then (9.5) holds by the Squeeze Theorem if lim x!1 Z1 tf (t) dt = 0 x Since E (X) = R1 xf (x) dx exists then G (x) = 0 Rx tf (t) dt exists for all x > 0 and lim G (x) = x!1 0 E (X). By the First Fundamental Theorem of Calculus 2.11.9 and the de…nition of an improper integral 2.11.11 Z1 tf (t) dt = lim G (b) G (x) = E (X) G (x) b!1 x Therefore lim x!1 Z1 tf (t) dt = lim [E (X) x!1 G (x)] = E (X) x = E (X) = 0 and (9.5) holds. E (X) lim G (x) x!1 9.1. CHAPTER 2 283 Exercise 2.7.9 (a) If X Poisson( ) then 1 P E(X (k) ) = x e x! x(k) x=0 = e k = e k k = x k 1 P k = e let y = x y=0 (x y y! e by 2.11.7 x=k 1 P k)! k for k = 1; 2; : : : Letting k = 1 and k = 2 we have E X (1) = E (X) = and E X (2) = E [X (X 2 1)] = so V ar (X) = E [X (X 2 = (b) If X [E (X)]2 1)] + E (X) 2 + = Negative Binomial(k; p) then E(X (j) ) = 1 P k k p (p x x(j) x=0 = pk (p 1)j ( k)(j) 1)x 1 P x=j = pk (p 1)j ( k)(j) 1 P x=j = pk (p 1)j ( k)(j) k j (p x j 1)x j by 2.11.4(1) k j (p x j 1)x j let y = x k j (p y 1)y 1 P y=0 k = p (p j (j) p j 1) ( k) = ( k)(j) 1 p (1 + p k j 1) j by 2.11.3(1) for j = 1; 2; : : : Letting k = 1 and k = 2 we have E X (1) = E (X) = ( k)(1) p E X = E [X (X (2) 1)] = ( k) = p and (2) 1 1 p 1 p k (1 p) p 2 = k (k + 1) 1 p p 2 284 9. SOLUTIONS TO CHAPTER EXERCISES so V ar (X) = E [X (X 1)] + E (X) 1 = k (k + 1) k (1 = + p 2 k (1 + p2 p) p k (1 = p) k (1 p) 2 p p k (1 p) (1 p2 p + p) k (1 p) p2 = (c) If X p) 2 p [E (X)]2 Gamma( ; ) then p E(X ) = Z1 x px 1 e x= ( ) dx = 0 x +p 1 e x= ( ) dx let y = x 0 1 = Z1 ( ) Z1 ( y) Z1 y +p 1 e y dy 0 +p = ( ) +p 1 e y dy which converges for +p>0 0 = p ( + p) ( ) for p > Letting k = 1 and k = 2 we have ( ) ( ) ( + 1) = ( ) E (X) = = and E X2 = E X2 = = ( + 1) 2 ( + 2) = ( ) 2 ( + 1) ( ) ( ) 2 so V ar (X) = E X 2 = 2 [E (X)]2 = ( + 1) 2 ( )2 9.1. CHAPTER 2 (c) If X 285 Weibull( ; ) then E(X ) = Z1 = Z1 k x xk 1 e (x= ) x dx let y = 0 y k 1= e y k dy = 0 y k= e y dy 0 k k = Z1 +1 for k = 1; 2; : : : Letting k = 1 and k = 2 we have 1 E (X) = +1 and 2 2 E X2 = E X2 = +1 so V ar (X) = E X 2 ( 2 = [E (X)]2 = 2 2 2 1 +1 1 +1 ) 2 +1 2 +1 Exercise 2.8.3 From Markov’s inequality we know P (jY j c) E jY jk for all k; c > 0 ck Since we are given that X is a random variable with …nite mean then h i 2 = E (X )2 = E jX j2 Substituting Y = X (9.6) and …nite variance , k = 1, and c = k into (9.6) we obtain P (jX j k ) E jX (k ) 2 or P (jX for all k > 0 as required. 
j k ) j2 2 = 1 k2 2 (k ) = 1 k2 2 286 9. SOLUTIONS TO CHAPTER EXERCISES Exercise 2.9.2 p For a Exponential( ) random variable V ar (X) = ( ) = . For g (X) = log X, g 0 (X) = X1 . Therefore by (2.5), the variance of Y = g (X) = log X is approximately g0 ( ) ( ) 2 1 = 2 ( ) = 1 which is a constant. Exercise 2.10.3 (a) If X Binomial(n; p) then M (t) = = 1 P n x p (1 x etx x20 1 P x20 n x et p x p) k = (pet + 1 (b) If X p)n x p)n (1 x which converges for t 2 < by 2.11.3(1) Poisson( ) then M (t) = 1 P etx x=0 = e 1 P x=0 et x e x! et x! x which converges for t 2 < = e e by 2.11.7 = e + et for t 2 < Exercise 2.10.6 If X Negative Binomial(k; p) then MX (t) = p 1 qet k for t < log q By Theorem 2.10.4 the moment generating function of Y = X + k is MY (t) = ekt MX (t) = pet 1 qet k for t < log q 9.1. CHAPTER 2 287 Exercise 2.10.14 (a) By the Exponential series 2.11.7 2 t2 =2 t2 =2 M (t) = e =1+ + + 1! 2! 1 1 4 = 1 + t2 + t for t 2 < 2 2!22 t2 =2 Since E(X k ) = k! coe¢ cient of tk in the Maclaurin series for M (t) we have E (X) = 1! (0) = 0 and E X 2 = 2! 1 2 =1 and so V ar (X) = E X 2 [E (X)]2 = 1 (b) By Theorem 2.10.4 the moment generating function of Y = 2X MY (t) = e t MX (2t) t (2t)2 =2 = e e = e t+(2t)2 =2 1 is for t 2 < for t 2 < By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a N( 1; 4) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, Y has a N( 1; 4) distribution. 288 9. SOLUTIONS TO CHAPTER EXERCISES 9.2 Chapter 3 Exercise 3.2.5 (a) The joint probability function of X and Y is f (x; y) = P (X = x; Y = y) h n! 2 x [2 (1 )]y (1 = x!y! (n x y)! for x = 0; 1; : : : ; n; y = 0; 1; : : : ; n; x + y n 2 which is a Multinomial n; ; 2 (1 )2 in x y )2 distribution or the trinomial distribution. ) ; (1 (b) The marginal probability function of X is n Xx f1 (x) = P (X = x) = y=0 = = = and so X 2 x!y! (n 2 x x y)! h )]y (1 [2 (1 )2 in x y h in x y (n x)! [2 (1 )]y (1 )2 y! (n x y)! y=0 h in x x 2 (1 ) + (1 )2 by the Binomial Series 2.11.3(1) 1 X 2 x n! x! (n x)! n x n x n! 2 x Binomial n; 1 2 2 n x for x = 0; 1; : : : ; n . (c) In a similar manner to (b) the marginal probability function of Y can be shown to be Binomial(n; 2 (1 )) since P (Aa) = 2 (1 ). (d) P (X + Y = t) = P P (x;y): x+y=t = t X x=0 = = = x! (t t P f (x; y) = f (x; t x) x=0 n! x)! (n in n h (1 )2 t in n h (1 )2 t n 2 + 2 (1 t 2 x t)! t tX t x x=0 t 2 ) t 2 x + 2 (1 h (1 )]t [2 (1 [2 (1 ) )2 in t t x h (1 )]t )2 in t x by the Binomial Series 2.11.3(1) for t = 0; 1; : : : ; n Thus X + Y Binomial n; 2 + 2 (1 ) which makes sense since X + Y is counting the number of times an AA or Aa type occurs in n trials and the probability of success is P (AA) + P (Aa) = 2 + 2 (1 ). 9.2. CHAPTER 3 289 Exercise 3.3.6 (a) 1 = Z1 Z1 1 = f (x; y)dxdy = k 1 k 2 Z1 k 2 Z1 k 2 Z1 0 lim a!1 0 = lim a!1 0 = 0 = = = Z1 Z1 0 1 ja0 (1 + x + y)2 1 dxdy (1 + x + y)3 dy 1 1 2 + (1 + a + y) (1 + y)2 dy 1 dy (1 + y)2 k 1 a lim j 2 a!1 (1 + y) 0 k 1 lim +1 2 a!1 (1 + a) k 2 Therefore k = 2. A graph of the joint probability density function is given in Figure 9.5. 2 1.5 1 0.5 0 0 0 1 1 2 2 3 3 4 4 Figure 9.5: Graph of joint probability density function for Exercise 3.3.6 290 9. 
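The value k = 2 found in part (a) can be confirmed numerically in R by checking that the joint probability density function f(x, y) = 2/(1 + x + y)^3, x >= 0, y >= 0, integrates to 1:

f <- function(x, y) 2 / (1 + x + y)^3
# integrate over x for each fixed y, then integrate the result over y
inner <- function(y) sapply(y, function(yy) integrate(function(x) f(x, yy), 0, Inf)$value)
integrate(inner, 0, Inf)$value   # total probability, should equal 1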
SOLUTIONS TO CHAPTER EXERCISES (b) (i) P (X 1; Y 2) = Z2 Z1 0 = 0 1 j10 dy (1 + x + y)2 0 Z2 1 1 dy = 2 + (2 + y) (1 + y)2 1 4 1 3 0 = Z2 2 dxdy = (1 + x + y)3 1 1 +1= (3 2 12 4 (ii) Since f (x; y) and the support set A = f(x; y) : x and y, P (X Y ) = 0:5 1 (2 + y) 6 + 12) = 0; y 1 (1 + y) 5 12 0g are both symmetric in x (iii) P (X + Y 1) = 1 Z1 Z 0 = y 0 Z1 2 dxdy = (1 + x + y)3 0 1 1 dy = 2 + (2) (1 + y)2 0 1 +1 2 1 +0 4 = Z1 = 1 j10 (1 + x + y)2 1 1 yj 4 0 f (x; y)dy = 1 Z1 1 4 0 = 2 dy = lim a!1 (1 + x + y)3 1 (1 + x)2 for x 1 ja0 (1 + x + y)2 0 the marginal probability density function of X is f1 (x) = 8 < : 0 1 (1+x)2 x<0 x 0 By symmetry the marginal probability density function of Y is f2 (y) = 8 < : 0 1 (1+y)2 y<0 y 0 y 1 j1 (1 + y) 0 (c) Since Z1 j20 dy 9.2. CHAPTER 3 291 (d) Since P (X x; Y y) = Zy Zx 0 = 0 Zy 2 dsdt = (1 + s + t)3 1 1 1+x+t 1+t 1 1 1+x+y 1+y = 1 x 2 j0 dt (1 + s + t) 0 1 1 dt 2 + (1 + x + t) (1 + t)2 0 = Zy jy0 1 + 1 for x 1+x the joint cumulative distribution function of X and Y is 8 < 0 F (x; y) = P (X x; Y y) = 1 1 : 1 + 1+x+y 1+y lim F (x; y) = lim y!1 = 1 1+ 1 1+x 1 1+x+y for x 1 1+y x 1 1+x 0 the marginal cumulative distribution function of X is ( 0 F1 (x) = P (X x) = 1 1 1+x x<0 x 0 Check: d F1 (x) = dx d dx 1 1 1+x 1 (1 + x)2 = f1 (x) for x 0 x < 0 or y < 0 1 1+x (e) Since y!1 0, y = 0 By symmetry the marginal cumulative distribution function of Y is 8 < 0 y<0 F2 (y) = P (Y y) = 1 : 1 1+y y 0 0, y 0 292 9. SOLUTIONS TO CHAPTER EXERCISES Exercise 3.3.7 (a) Since the support set is A = f(x; y) : 0 < x < y < 1g 1 = Z1 Z1 1 1 Z1 = k f (x; y)dxdy = k Z1 Zy 0 e x y y j0 e 2y e x y dxdy 0 dy 0 = k Z1 +e y dy 0 = k = k = lim a!1 lim a!1 1 e 2 1 e 2 2y e y 2a e a ja0 1 +1 2 k 2 and therefore k = 2. A graph of the joint probability density function is given in Figure 9.6 2 1.5 1 0.5 0 0 0 0.5 0.5 1 1 1.5 1.5 Figure 9.6: Graph of joint probability density function for Exercise 3.3.7 9.2. CHAPTER 3 293 (b) (i) The region of integration is shown in Figure 9.7. P (X 1; Y 2) = 2 Z1 Z2 x y e dydx = 2 x=0 y=x = 2 Z1 x e e y 2 jx dx x=0 x e Z1 2 e +e x dx = 2 x=0 Z1 e 2 3 1 e 2 2 e x +e 2x 2 1 2 dx x=0 2 = 2 e x e = 1 + 2e 1 e 2 3 3e 2x j10 = 2 e e + 2 y 2 y=x B (x,x) x 1 0 Figure 9.7: Region of integration for Exercise 3.3.7(b)(i) (ii) Since the support set A = f(x; y) : 0 < x < y < 1g contains only values for which x < y then P (X Y ) = 1. (iii) The region of integration is shown in Figure 9.8 P (X + Y 1) = 2 1 Z1=2 Z x e x y dydx = 2 x=0 y=x = 2 Z1 e Z1 e x e y 1 x jx dx x=0 x e x 1 +e x dx = 2 x=0 Z1 e 1 +e 2x dx x=0 = 2 e 1 = 1 2e 1 x 1 e 2 2x 1=2 j0 =2 e 1 1 2 1 e 2 1 +0+ 1 2 294 9. SOLUTIONS TO CHAPTER EXERCISES y 1 y=x (x,1-x) C y=1-x (x,x) x 1/2 0 1 Figure 9.8: Region of integration for Exercise 3.3.7(b)(iii) (c) The marginal probability density function of X is f1 (x) = Z1 f (x; y)dy = 2 Z1 x y e dy = 2e x lim y a jx e a!1 x 1 = 2e 2x for x > 0 and 0 otherwise which we recognize is an Exponential(1=2) probability density function. The marginal probability density function of Y is Z1 f (x; y)dy = 2 1 Zy e x y dx = 2e y Zx Zy s t e x y j0 = 2e y 1 e y for y > 0 0 and 0 otherwise. (d) Since P (X x; Y y) = 2 e s=0 t=s Zx = 2 e dtds = 2 Zx e s e t jys ds 0 s e y +e s e s +e 2s ds 0 = 2 Zx e y ds = 2 e y e s 1 e 2 0 = 2e x y e 2x 2e y + 1 for x 0, x < y 2s jx0 9.2. 
CHAPTER 3 295 P (X x; Y y) = 2 Zy Zy s t e s=0 t=s 2y = e y 2e dtds + 1 for x > y > 0 and P (X x; Y y) = 0 for x 0 or y x therefore the joint cumulative distribution function of X and Y is 8 2e x y e 2x 2e y + 1 x > 0, y > x > > < e 2y 2e y + 1 x>y>0 F (x; y) = P (X x; Y y) = > > : 0 x 0 or y x (e) Since the support set is A = f(x; y) : 0 < x < y < 1g and lim F (x; y) = lim 2e y!1 x y e y!1 = 1 2x e x) = y 2e +1 for x > 0 marginal cumulative distribution function of X is ( F1 (x) = P (X 2x 0 1 e x 2x 0 x>0 which we also recognize as the cumulative distribution function of an Exponential(1=2) random variable. Since lim F (x; y) = x!1 lim 2e x!y 2y = e x y 2e y e 2x 2e y + 1 = 2e 2y 2y e 2e y +1 + 1 for y > 0 the marginal cumulative distribution function of Y is ( 0 F2 (y) = P (Y y) = 1 + e 2y y 2e y 0 y 0 Check: P (Y y) = Zy t e +1 1 2 +1 = 2e f2 (t) dt = 2 0 = 2 or d d F2 (y) = e dy dy Zy e 2t dt = 2 e t 1 + e 2 2t jy0 0 1 e y+ e 2 2y 2e y 2y =1+e 2y + 2e y 2y 2e = f2 (y) y for y > 0 for y > 0 296 9. SOLUTIONS TO CHAPTER EXERCISES Exercise 3.4.4 From the solution to Exercise 3.2.5 we have n! f (x; y) = 2 x [2 (1 x!y! (n x y)! for x = 0; 1; : : : ; n; y = 0; 1; : : : ; n; x + y n x f1 (x) = and f2 (y) = 2 x n [2 (1 y 2 n x 1 )]y [1 )2 n in x y for x = 0; 1; : : : ; n )]n 2 (1 h )]y (1 y for y = 0; 1; : : : ; n Since 2 n )2n 6= f1 (0) f2 (0) = 1 f (0; 0) = (1 [1 )]n 2 (1 therefore X and Y are not independent random variables. From the solution to Exercise 3.3.6 we have 2 (1 + x + y)3 for x 0, y f1 (x) = 1 (1 + x)2 for x 0 f2 (y) = 1 (1 + y)2 for y 0 f (x; y) = and 0 Since f (0; 0) = 2 6= f1 (0) f2 (0) = (1) (1) therefore X and Y are not independent random variables. From the solution to Exercise 3.3.7 we have f (x; y) = 2e x y f1 (x) = 2e for 0 < x < y < 1 2x for x > 0 and f2 (y) = 2e y 1 e y for y > 0 Since f (1; 2) = 2e 3 6= f1 (1) f2 (2) = 2e 1 2e therefore X and Y are not independent random variables. 2 1 e 2 9.2. CHAPTER 3 297 Exercise 3.5.3 P (Y = yjX = x) = = P (X = x; Y = y) P (X = x) n! x!y!(n x y)! 2 x [2 (1 2 x n! x!(n x)! y [2 (1 = = = = = for y = 0; 1; : : : ; (n (n x)! y! (n x y)! " n x 2 (1 y 1 " 2 (1 n x y 1 " 2 (1 n x y 1 " 2 (1 n x y 1 1 )] 2 y 1 ) 2 ) 2 ) 2 ) 2 #y " #y " #y " h )]y (1 )2 2 n x h (1 ) )2 (1 1 2 #(n 2 + 2 1 1 #y " 1 in x y x y 2 n x y 1 1 2 in 2 1 2 (1 1 2 x) y #(n 2 +2 x) y 2 2 ) 2 #(n #(n x) which is the probability function of a Binomial n x) y x) y x; 2 1(1 2 ) ( ) as required. We are given that there are x genotypes of type AA. Therefore there are only n x members (trials) whose genotype must be determined. The genotype can only be of type Aa or type aa. In a population with only these two types the proportion of type Aa would be 2 (1 ) 2 (1 ) 2 = 2 1 2 (1 ) + (1 ) and the proportion of type aa would be (1 2 (1 )2 ) + (1 ) 2 = (1 1 )2 2 =1 2 (1 1 ) 2 Since we have (n x) independent trials with probability of Success (type Aa) equal to 2 (1 ) then it follows that the number of Aa types, given that there are x members of type (1 2 ) AA, would follow a Binomial n x; 2 1(1 2 ) ( ) distribution. 298 9. 
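The probabilities in Exercise 3.3.7(b) can also be checked by simulation in R. Since the marginal probability density function of X found in (c) is 2e^(-2x) (Exponential with mean 1/2) and f(x, y)/f1(x) = e^(x - y) for y > x, a pair (X, Y) with joint probability density function 2e^(-x-y), 0 < x < y, can be generated by taking X ~ Exponential(1/2) and Y = X + E where E ~ Exponential(1) independently of X:

set.seed(1)
m <- 10^6
x <- rexp(m, rate = 2)          # marginal of X: Exponential with mean 1/2
y <- x + rexp(m, rate = 1)      # given X = x, Y - x is Exponential with mean 1
mean(x <= 1 & y <= 2)           # Monte Carlo estimate of P(X <= 1, Y <= 2)
1 - 3 * exp(-2) + 2 * exp(-3)   # exact value from (b)(i)
mean(x + y <= 1)                # Monte Carlo estimate of P(X + Y <= 1)
1 - 2 * exp(-1)                 # exact value from (b)(iii)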
SOLUTIONS TO CHAPTER EXERCISES Exercise 3.5.4 For Example 3.3.3 the conditional probability density function of X given Y = y is f1 (xjy) = f (x; y) x+y = f2 (y) y + 12 for 0 < x < 1 for each 0 < y < 1 Check: Z1 f1 (xjy) dx = 1 Z1 0 1 x+y 1 dx = y+2 y+ 1 2 x + xyj10 2 1 2 = 1 y+ 1 2 1 +y 2 0 =1 By symmetry the conditional probability density function of Y given X = x is f2 (yjx) = x+y x + 21 and for 0 < y < 1 for each 0 < x < 1 Z1 f2 (yjx) dy = 1 1 For Exercise 3.3.6 the conditional probability density function of X given Y = y is f (x; y) f1 (xjy) = = f2 (y) 2 (1+x+y)3 1 (1+y)2 = 2 (1 + y)2 (1 + x + y)3 for x > 0 for each y > 0 Check: Z1 f1 (xjy) dx = 1 Z1 0 1 2 (1 + y)2 2 1 lim 3 dx = (1 + y) a!1 2 j0 (1 + x + y) (1 + x + y) = (1 + y)2 lim a!1 1 1 1 2 =1 2 + 2 = (1 + y) (1 + a + y) (1 + y) (1 + y)2 By symmetry the conditional probability density function of Y given X = x is f2 (yjx) = and 2 (1 + x)2 (1 + x + y)3 Z1 for y > 0 for each x > 0 f2 (yjx) dy = 1 1 For Exercise 3.3.7 the conditional probability density function of X given Y = y is f1 (xjy) = f (x; y) 2e x y = f2 (y) 2e y (1 e y) = e 1 x e y for 0 < x < y for each y > 0 9.2. CHAPTER 3 299 Check: Z1 Zy f1 (xjy) dx = 1 x e 1 e dx = y 1 1 e x y j0 e y = 1 1 y e e y =1 0 The conditional probability density function of Y given X = x is f2 (yjx) = 2e x y f (x; y) = = ex f1 (x) 2e 2x y for y > x for each x > 0 Check: Z1 Z1 f2 (yjx) dy = ex y dy = ex lim e y a jx Z1 Zy x y dxdy a!1 x 1 = ex 0 + e x =1 Exercise 3.6.10 E (XY ) = Z1 Z1 1 1 Z1 ye = 2 xyf (x; y)dxdy = 2 y 0 = 2 Z1 ye 0 @ Zy x xe 0 y 0 dxA dy = 2 y ye 0 1 e y xye Z1 y ye = 2 0 Z1 2 Z1 Z1 y2e 2y + 1 dy = dy (2y) e dy (2y) e y2e 2y y2e 2y + ye jy0 dy x 2 Z1 2y ye y dy 2 y e 2 Z1 u 2 = 2 1 2 ye y dy 0 2 2y 2y dy + 2 (2) dy Z1 (2y) e but (2) = 1! = 1 du Z1 ue Z1 ue u 2y dy let u = 2y or y = 1 u so dy = du 2 2 0 2 e u1 2 0 1 4 dy + 2 Z1 0 Z1 = 2 2y 0 0 = 2 e 0 Z1 0 = 2 x 0 0 = xe u1 2 du 0 u2 e u 0 du 1 2 Z1 0 1 =1 2 du = 2 1 (3) 4 1 (2) 2 but (3) = 2! = 2 300 9. SOLUTIONS TO CHAPTER EXERCISES Since X Exponential(1=2), E (X) = 1=2 and V ar (X) = 1=4. Z1 E (Y ) = yf2 (y)dy = 2 1 y ye y e + 1 dy 0 Z1 = Z1 2ye 2y dy + 2 0 Z1 y ye Z1 dy = 0 = 2 Z1 = 2 Z1 2y 2ye dy 2ye 2y dy + 2 (2) 0 let u = 2y 0 ue u1 2 1 2 du = 2 E Y = Z1 1 = 2 2 y f2 (y)dy = 2 Z1 u 1 (2) = 2 2 du = 2 1 3 = 2 2 y2e y e y + 1 dy 0 Z1 2 y e 2y dy + 2 0 = 4 ue 0 0 2 Z1 Z1 2 y e y dy = 2 0 2 Z1 y2e 2 Z1 u 2 2y dy Z1 y2e 2y dy + 2 (3) 0 let u = 2y 0 = 4 2 e u1 2 du = 4 1 4 Z1 u2 e u 1 (3) = 4 4 du = 4 0 0 V ar (Y ) = E Y 2 [E (Y )]2 = 7 2 2 3 2 = 14 9 4 = 5 4 Therefore Cov (X; Y ) = E (XY ) and (X; Y ) = 1 2 E (X) E (Y ) = 1 Cov(X; Y ) X Y =q 1 4 1 4 5 4 3 2 1 =p 5 =1 3 1 = 4 4 1 7 = 2 2 9.2. CHAPTER 3 301 Exercise 3.7.12 Since Y Gamma( ; 1 ) 1 E (Y ) = = 2 1 V ar (Y ) = = 2 and E Y k = Z1 yk 1 y y e ( ) dy let u = y 0 = ( ) Z1 u k+ 1 e u1 Z1 k du = ( ) 0 k Since XjY = y Weibull(p; y E (Xjy) = y for +k >0 1=p ) 1=p 1+ 1 p , E (XjY ) = Y 1=p 1+ and V ar (Xjy) = y V ar (XjY ) = Y 1=p 2 1+ 2=p 1+ 2 p 2 2 p 2 1+ 1+ 1 p 1 p Therefore E (X) = E [E (XjY )] = = = 1 0 ( + k) ( ) = uk+ 1=p 1 1+ p 1=p 1+ 1+ 1 p 1 p ( ) 1 p 1 p ( ) E Y 1=p 1 p e u du 302 9. 
SOLUTIONS TO CHAPTER EXERCISES and V ar(X) = E[V ar(XjY )] + V ar[E(XjY )] 2 1 1 2 1+ = E Y 2=p 1+ + V ar Y 1=p 1+ p p p 1 1 2 2 1+ E Y 2=p + 2 1 + V ar Y 1=p = 1+ p p p 2 1 2 = 1+ 1+ E Y 2=p p p h i2 2 1 + 2 1+ E Y 1=p E Y 1=p p 1 2 2 E Y 2=p 1+ E Y 2=p = 1+ p p i2 1 1 h 2 + 2 1+ 1+ E Y 2=p E Y 1=p p p i2 2 1 h 2 = 1+ 1+ E Y 2=p E Y 1=p p p 2 32 2=p 1p 2 1 p p 2 1 2 4 5 = 1+ 1+ p ( ) p ( ) 8 2 32 9 > > 2 2 1 1 < 1+ p = 1+ p p p 2=p 4 5 = > > ( ) ( ) : ; Exercise 3.7.13 Since P Beta(a; b) E (P ) = ab a , V ar (P ) = a+b (a + b + 1) (a + b)2 and E P k = Z1 pk (a + b) a p (a) (b) 1 p)b (1 1 dp 0 = (a + b) (a) (b) Z1 pa+k 1 p)b (1 1 dp 0 = (a + b) (a + k) (a) (a + k + b) Z1 (a + k + b) k+a p (a + k) (b) 1 0 = (a + b) (a + k) (1) = (a) (a + k + b) (a + b) (a + k) (a) (a + k + b) (1 p)b 1 dp 9.2. CHAPTER 3 303 provided a + k > 0 and a + k + b > 0. Since Y jP = p Geometric(p) E (Y jp) = and V ar (Y jp) = 1 p p 1 p p2 , E (Y jP ) = 1 P P , V ar (Y jP ) = = 1 P 1 1 P 1 = 2 P2 P 1 =E P 1 P Therefore E (Y ) = E [E (Y jP )] = E 1 P 1 (a + b) (a 1) 1 (a) (a 1 + b) (a 1 + b) (a 1 + b) (a 1) (a 1) (a 1) (a 1 + b) a 1+b b 1= a 1 a 1 = = = 1 1 provided a > 1 and a + b > 1. Now V ar(Y ) = E[V ar(Y jP )] + V ar[E(Y jP )] = E 1 P2 = E 1 P2 = E 1 P2 = 2E P = 2 1 P 2 E 1 P 1 1 P " 1 +E P E 1 P 1 P2 + V ar E P +E 1 (a + b) (a 2) (a) (a 2 + b) = 1 # 2E 1 E 1 P 1 P +1 provided a > 2 and a + b > 2. 2 1 E 1 P 2 + 2E 2 (a + b) (a 1) (a) (a 1 + b) (a + b) (a 1) (a) (a 1 + b) (a + b 2) (a + b 1) a + b 1 a+b 1 (a 2) (a 1) a 1 a 1 a+b 1 a+b 2 a+b 1 2 1 a 1 a 2 a 1 ab (a + b 1) (a 1)2 (a 2) = 2 = E P 2 2 2 1 P 1 304 9. SOLUTIONS TO CHAPTER EXERCISES Exercise 3.9.4 If T = Xi + Xj ; i 6= j; then T Binomial(n; pi + pj ) The moment generating function of T = Xi + Xj is M (t) = E etT + E et(Xi +Xj ) = E etXi +tXj = M (0; : : : ; 0; t; 0; : : : ; 0; t; 0; : : : ; 0) = = p1 + + pi et + (pi + pj ) et + (1 + pj et + pi pj ) n + pk 1 for t 2 < + pk n for t 2 < which is the moment generating function of a Binomial(n; pi + pj ) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, T has a Binomial(n; pi + pj ) distribution. 9.3. CHAPTER 4 9.3 305 Chapter 4 Exercise 4.1.2 The support set of (X; Y ) is A = f(x; y) : 0 < x < y < 1g which is the union of the regions E and F shown in Figure 9.3For s > 1 y x=y/s x=y 1 F E 1 0 G (s) = P (S = Z Y s = P (Y sX) = P (Y sX X Z Z1 Zy Z1 3ydxdy = 3ydxdy = 3y xjyy=s dy s) = P y=0 x=y=s (x;y) 2 E = Z1 3y y y dy = s 1 0) 0 Z1 1 s 3y 2 dy = 1 1 s y 3 j10 = 1 1 s 0 0 The cumulative distribution function for S is ( G (s) = As a check we note that lim 1 s!1+ 1 s 1 0 s 1 s 1 s>1 = 0 = G (1) and lim 1 s!1 continuous function for all s 2 <. For s > 1 g (s) = d d G (s) = ds ds 1 1 s The probability density function of S is g (s) = and 0 otherwise. x 1 s2 for s > 1 = 1 s2 1 s = 1 so G (s) is a 306 9. SOLUTIONS TO CHAPTER EXERCISES Exercise 4.2.5 Since X Exponential(1) and Y density function of X and Y is Exponential(1) independently, the joint probability x f (x; y) = f1 (x) f2 (y) = e = e e y x y with support set RXY = f(x; y) : x > 0; y > 0g which is shown in Figure 9.9.The transfor. . . y . . . ... x 0 Figure 9.9: Support set RXY for Example 4.2.5 mation S : U =X +Y, V =X has inverse transformation Y U +V U V , Y = 2 2 are mapped as X= Under S the boundaries of RXY (k; 0) ! (k; k) (0; k) ! (k; k) for k for k 0 0 and the point (1; 2) is mapped to and (3; 1). Thus S maps RXY into RU V = f(u; v) : as shown in Figure 9.10. u < v < u; u > 0g 9.3. CHAPTER 4 307 v=u v ... u 0 ... 
v=-u Figure 9.10: Support set RU V for Example 4.2.5 The Jacobian of the inverse transformation is @ (x; y) = @ (u; v) @x @u @y @u @x @v @y @v = 1 2 1 2 1 2 1 2 = 1 2 The joint probability density function of U and V is given by g (u; v) = f = 1 e 2 u+v u v ; 2 2 u 1 2 for (u; v) 2 RU V and 0 otherwise. To …nd the marginal probability density functions for U we note that the support set RU V is not rectangular and the range of integration for v will depend on u. g1 (u) = Z1 g (u; v) dv 1 = 1 e 2 u Zu dv v= u 1 ue u (2) 2 = ue u for u > 0 = and 0 otherwise which is the probability density function of a Gamma(2; 1) random variable. Therefore U Gamma(2; 1). 308 9. SOLUTIONS TO CHAPTER EXERCISES To …nd the marginal probability density functions for V we need to consider the two cases v 0 and v < 0. For v 0 g2 (v) = Z1 g (u; v) du 1 = Z1 1 2 e u du u=v = = = 1 2 1 2 1 e 2 lim e u b jv lim e b b!1 e b!1 v v For v < 0 g2 (v) = Z1 g (u; v) du 1 2 Z1 1 = e u du u= v = = = 1 2 1 2 1 v e 2 lim e u b lim e b j b!1 b!1 v ev Therefore the probability density function of V is ( 1 v v<0 2e g2 (v) = 1 v v 0 2e which is the probability density function of Double Exponential(0; 1) random variable. Therefore V Double Exponential(0; 1). 9.3. CHAPTER 4 309 Exercise 4.2.7 Since X Beta(a; b) and Y function of X and Y is Beta(a + b; c) independently, the joint probability density (a + b) a 1 x (1 (a) (b) (a + b + c) a x (a) (b) (c) f (x; y) = = x)b 1 1 (a + b + c) a+b y (a + b) (c) x)b (1 1 y a+b 1 (1 y)c 1 (1 y)c 1 1 with support set RXY = f(x; y) : 0 < x < 1; 0 < y < 1g as shown in Figure 9.11. y 1 x 1 0 Figure 9.11: Support RXY for Exercise 4.2.5 The transformation S : U = XY , V = X has inverse transformation X =V, Y = U=V Under S the boundaries of RXY are mapped as (k; 0) ! (0; k) 0 k 1 (0; k) ! (0; 0) 0 k 1 0 k 1 (k; 1) ! (k; k) 0 k 1 (1; k) ! (k; 1) and the point 1 1 2; 3 is mapped to and 1 1 6; 2 . Thus S maps RXY into RU V = f(u; v) : 0 < u < v < 1g as shown in Figure 9.12 310 9. SOLUTIONS TO CHAPTER EXERCISES v 1 u 1 0 Figure 9.12: Support set RU V for Exercise 4.2.5 The Jacobian of the inverse transformation is @ (x; y) = @ (u; v) 0 1 1 v @y @v 1 v = The joint probability density function of U and V is given by u 1 v v u (a + b + c) a 1 v (1 v)b 1 (a) (b) (c) v (a + b + c) a+b 1 u (1 v)b 1 v (a) (b) (c) g (u; v) = f v; = = a+b 1 b 1 u v 1 u v 1 c 1 c 1 1 v for (u; v) 2 RU V and 0 otherwise. To …nd the marginal probability density functions for U we note that the support set RU V is not rectangular and the range of integration for v will depend on u. g (u) = Z1 g (u; v) dv = (a + b + c) a+b u (a) (b) (c) Z1 (1 v)b 1 u v 1 v b 1 v=u 1 = 1 (a + b + c) a+b u (a) (b) (c) 1 Z1 b+1 v v)b (1 1 c 1 1 dv v2 u = (a + b + c) a u (a) (b) (c) 1 Z1 1 Z1 ub 1 1 v b 1 1 1 u v c 1 u = (a + b + c) a u (a) (b) (c) u u v u b 1 1 u v c 1 u dv v2 u dv v2 1 u v c 1 dv 9.3. CHAPTER 4 311 To evaluate this integral we make the substitution u v t= u u 1 Then u v 1 u = (1 u) t u = 1 u (1 v u = (1 u) dt v2 u) t = (1 u) (1 t) When v = u then t = 1 and when v = 1 then t = 0. Therefore g1 (u) = (a + b + c) a u (a) (b) (c) 1 Z1 u) t]b [(1 1 [(1 t)]c u) (1 1 (1 u) dt 0 = (a + b + c) a u (a) (b) (c) 1 (1 b+c 1 u) Z1 tb 1 (1 t)c 1 dt 0 = = (b) (c) (a + b + c) a 1 u (1 u)b+c 1 (a) (b) (c) (b + c) (a + b + c) a 1 u (1 u)b+c 1 for 0 < u < 1 (a) (b + c) and 0 otherwise which is the probability density function of a Beta(a; b + c) random variable. Therefore U Beta(a; b + c). 
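The result of Exercise 4.2.7 can also be checked by simulation in R: a Kolmogorov-Smirnov test comparing simulated values of U = XY with the Beta(a, b + c) cumulative distribution function should show no evidence of a difference (the parameter values below are arbitrary choices for illustration):

set.seed(2)
a <- 2; b <- 3; c <- 4
u <- rbeta(10^5, a, b) * rbeta(10^5, a + b, c)    # U = XY with X ~ Beta(a, b), Y ~ Beta(a + b, c)
ks.test(u, "pbeta", shape1 = a, shape2 = b + c)   # compare with the Beta(a, b + c) distribution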
Exercise 4.2.12 (a) Consider the transformation S: U= X=n , V =Y Y =m which has inverse transformation X= Since X of X and Z is 2 (n) independently of Y f (x; y) = f1 (x) f2 (y) = = 2(n+m)=2 2n=2 n UV , Y = V m 2 (m) then the joint probability density function 1 xn=2 (n=2) 1 xn=2 (n=2) (m=2) 1 1 e e x=2 2m=2 x=2 m=2 1 y 1 y m=2 (m=2) e 1 e y=2 y=2 with support set RXY = f(x; y) : x > 0; y > 0g. The transformation S maps RXY into RU V = f(u; v) : u > 0; v > 0g. 312 9. SOLUTIONS TO CHAPTER EXERCISES The Jacobian of the inverse transformation is @ (x; y) = @ (u; v) @x @u @y @u @x @v @y @v n m = v 0 @x @v n v m = 1 The joint probability density function of U and V is given by n n uv; v v m m 1 n 2(n+m)=2 (n=2) (m=2) m g (u; v) = f = = n=2 1 (uv)n=2 1 n e ( m )uv=2 v m=2 n n=2 n=2 1 u nu m v (n+m)=2 1 e v( m +1)=2 (n+m)=2 2 (n=2) (m=2) 1 e v=2 n v m for (u; v) 2 RU V and 0 otherwise. To determine the distribution of U we still need to …nd the marginal probability density function for U . g1 (u) = Z1 g (u; v) dv 1 = Z1 n n=2 n=2 1 u nu m v (n+m)=2 1 e v( m +1)=2 dv (n+m)=2 2 (n=2) (m=2) 0 1 n n u so that v = 2y 1 + m u and dv = 2 1 + Let y = v2 1 + m v = 0 then y = 0, and when v ! 1 then y ! 1. Therefore g1 (u) = = Z1 n n=2 n=2 1 u m 2(n+m)=2 (n=2) (m=2) 0 n n=2 n=2 1 (n+m)=2 u 2 m 2(n+m)=2 (n=2) (m=2) 2y 1 + n u m 1 (n+m)=2 1 Z1 1 (n+m)=2 n 1+ u m 1 n mu e y y (n+m)=2 dy. Note that when 2 1+ 1 e y n u m 1 dy dy 0 = = n n=2 n=2 1 u m (n=2) (m=2) n n=2 m n+m 2 (n=2) (m=2) 1+ un=2 n u m 1 (n+m)=2 1+ n u m n+m 2 (n+m)=2 for u > 0 and 0 otherwise which is the probability density function of a random variable with a F(n; m) distribution. Therefore U = YX=n F(n; m). =m (b) To …nd E (U ) we use E (U ) = E X=n Y =m = m E (X) E Y n 1 9.3. CHAPTER 4 313 2 (n) independently of Y 2 (m). From Example 4.2.10 we know that if since X 2 W (k) then 2p (k=2 + p) k E (W p ) = for + p > 0 (k=2) 2 Therefore 2 (n=2 + 1) =n (n=2) E (X) = 1 E Y = 2 1 (m=2 1) 1 = (m=2) 2 (m=2 and E (U ) = m (n) E n 1 m = 2 1) = 1 m m m 2 if m > 2 2 if m > 2 To …nd V ar (U ) we need E U2 Now E X2 = E Y 2 = 2 2 " (X=n)2 =E (Y =m)2 # = m2 E X2 E Y n2 22 (n=2 + 2) n =4 +1 (n=2) 2 (m=2 2) = (m=2) 4 m 2 1 1 m 2 2 2 n = n (n + 2) 2 = (m 1 2) (m 4) for m > 4 and E U2 = m2 n (n + 2) n2 (m 1 2) (m 4) = n+2 n (m m2 2) (m 4) Therefore V ar (U ) = E U 2 = = = = = [E (U )]2 2 n+2 m2 m n (m 2) (m 4) m 2 2 m n+2 1 m 2 n (m 4) m 2 m2 (n + 2) (m 2) n (m 4) m 2 n (m 4) (m 2) 2 m 2 (n + m 2) m 2 n (m 4) (m 2) 2m2 (n + m 2) for m > 4 n (m 2)2 (m 4) for m > 4 314 9. SOLUTIONS TO CHAPTER EXERCISES Exercise 4.3.3 X1 ; X2 ; : : : ; Xn are independent and identically distributed random variables with moment generating function M (t), E (Xi ) = , and V ar (Xi ) = 2 < 1. p The moment generating function of Z = n X = is p MZ (t) = E etZ = E et = e = e = e = e = e n(X )= p n t n1 P E exp Xi n i=1 n t P t p E exp Xi n i=1 n t t Q p Xi E exp since X1 ; X2 ; : : : ; Xn are independent n i=1 n t t Q p M since X1 ; X2 ; : : : ; Xn are identically distributed n i=1 n t t p M n p n = t p n = p n = p n = p n = Exercise 4.3.7 n P (Xi )2 = i=1 = = n P i=1 n P i=1 n P Xi X +X Xi X 2 2 n P +2 X Xi X + i=1 Xi X X = 2 i=1 2 +n X i=1 since n P Xi i=1 = = n P i=1 n P i=1 n P i=1 = 0 Xi n P X= i=1 Xi Xi n n P i=1 n P i=1 n 1 P Xi n i=1 Xi n P Xi nX X 2 9.3. 
CHAPTER 4 315 Exercise 4.3.11 Since X1 ; X2 ; : : : ; Xn are independent N 1 ; 21 random variables then by Theorem 4.3.8 n P 2 Xi X 2 (n 1) S1 2 U= = i=1 (n 1) 2 2 1 1 Since Y1 ; Y2 ; : : : ; Ym are independent N V = (m 1) S22 2 2 = 2; m P 2 2 random variables then by Theorem 4.3.8 Yi Y 2 i=1 2 2 2 (m 1) U and V are independent random variables since X1 ; X2 ; : : : ; Xn are independent of Y1 ; Y2 ; : : : ; Ym . Now S12 = S22 = 2 1 2 2 2 (n 1)S1 2 1 n 1 = 2 (m 1)S2 2 2 m 1 = U= (n V = (m 1) 1) Therefore by Theorem 4.2.11 S12 = S22 = 2 1 2 2 F (n 1; m 1) 316 9. SOLUTIONS TO CHAPTER EXERCISES 9.4 Chapter 5 Exercise 5.4.4 If Xn Binomial(n; p) then n Mn (t) = E etXn = pet + q If for t 2 < (9.7) = np then p= and q = 1 n Substituting 9.8 into 9.7 and simplifying gives " Mn (t) = Now lim n!1 et + 1 n " n 1+ n et 1 n = 1+ #n (9.8) n et 1 n t = e (e 1) #n for t 2 < for t < 1 t by Corollary 5.1.3. Since M (t) = e (e 1) for t 2 < is the moment generating function of a Poisson( ) random variable then by Theorem 5.4.1, Xn !D X Poisson( ). By Theorem 5.4.2 P (Xn = x) = n x n p (q) x x (np)x e x! np for x = 0; 1; : : : Exercise 5.4.7 Let Xi Binomial(1; p), i = 1; 2; : : : independently. Since X1 ; X2 ; : : : are independent and identically distributed random variables with E (Xi ) = p and V ar (Xi ) = p (1 p), then p n Xn p p !D Z N(0; 1) p (1 p) by the Central Limit Theorem. Let Sn = n P Xi . Then i=1 S np p n = np (1 p) Now by 4.3.2(1) Sn It follows that n 1 n n P Xi p p p n p (1 p) i=1 p n Xn p = p !D Z p (1 p) N(0; 1) Binomial(n; p) and therefore Yn and Sn have the same distribution. Yn np Zn = p !D Z np (1 p) N(0; 1) 9.4. CHAPTER 5 317 Exercise 5.5.3 (a) Let g (x) = x2 which is a continuous function for all x 2 <. Since Xn !p a then by 5.5.1(1), Xn2 = g (Xn ) !p g (a) = a2 or Xn2 !p a2 . (b) Let g (x; y) = xy which is a continuous function for all (x; y) 2 <2 . Since Xn !p a and Yn !p b then by 5.5.1(2), Xn Yn = g (Xn ; Yn ) !p g (a; b) = ab or Xn Yn !p ab. (c) Let g (x; y) = x=y which is a continuous function for all (x; y) 2 <2 ; y 6= 0. Since Xn !p a and Yn !p b 6= 0 then by 5.5.1(2), Xn =Yn = g (Xn ; Yn ) !p g (a; b) = a=b , b 6= 0 or Xn =Yn !p a=b, b 6= 0. (d) Let g (x; z) = x 2z which is a continuous function for all (x; z) 2 <2 . Since Xn !p a and Zn !D Z N(0; 1) then by Slutsky’s Theorem, Xn 2Zn = g (Xn ; Zn ) !D g (a; Z) = a 2Z or Xn 2Zn !D a 2Z where Z N(0; 1). Since a 2Z N(a; 4), therefore Xn 2Zn !D a 2Z N(a; 4) (e) Let g (x; z) = 1=z which is a continuous function for all (x; z) 2 <2 ; z 6= 0. Since Zn !D Z N(0; 1) then by Slutsky’s Theorem, 1=Zn = g (Xn ; Zn ) !D g (a; z) = 1=Z or 1=Zn !D 1=Z where Z N(0; 1). Since h (z) = 1=z is a decreasing function for all z 6= 0 then by Theorem 2.6.8 the probability density function of W = 1=Z is 1 f (w) = p e 2 1=(2w2 ) 1 w2 for z 6= 0 Exercise 5.5.8 By (5.9) p n Xn p !D Z N(0; 1) and by Slutsky’s Theorem Un = p n Xn !D p p Let g (x) = x, a = , and b = 1=2. Then g 0 (x) = and the Delta Method n1=2 p Xn p Z 1 p 2 x N(0; ) (9.9) and g 0 (a) = g 0 ( ) = 1 p 1 !D p ( Z) = Z 2 2 N 0; 1 4 1 p 2 . By (9.9) 318 9. SOLUTIONS TO CHAPTER EXERCISES 9.5 Chapter 6 Exercise 6.4.6 The probability density function of a Exponential(1; ) random variable is (x f (x; ) = e ) for x and 2< and zero otherwise. The support set of the random variable X is [ ; 1) which depends on the unknown parameter . 
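Before continuing with Exercise 6.4.6, the limiting result of Exercise 5.4.4 can be illustrated numerically in R: for large n with p = mu/n the Binomial(n, p) probabilities are close to the Poisson(mu) probabilities, which is the approximation given by Theorem 5.4.2.

n <- 200; mu <- 3; p <- mu / n
x <- 0:8
# compare Binomial(n, p) and Poisson(mu) probabilities
round(cbind(x, binomial = dbinom(x, n, p), poisson = dpois(x, mu)), 4)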
The likelihood function is n Q f (xi ; ) L( ) = = = i=1 n Q e i=1 n Q (xi e xi ) if xi en , i = 1; 2; : : : ; n and if xi and 2< i=1 or more simply L( ) = 8 <0 :en 2< if > x(1) if x(1) where x(1) = min (x1 ; x2 ; : : : ; xn ) is the maximum of the sample. (Note: In order to observe the sample x1 ; x2 ; : : : ; xn the value of must be smaller than all the observed xi ’s.) L( ) is a increasing function of on the interval ( 1; x(1)] ]. L( ) is maximized at = x(1) . The maximum likelihood estimate of is ^ = x(1) and the maximum likelihood estimator is ~ = X(1) . Note that in this example there is no solution to dd l ( ) = dd (n ) = 0 and the maximum likelihood estimate of is not found by solving dd l ( ) = 0. If n = 12 and x(1) = 2 8 <0 if > 2 L( ) = :e12 if 2 The relative likelihood function is R( ) = 8 <0 :e12( 2) if >2 if 2 is graphed in Figure 9.13 along with lines for determining 10% and 50% likelihood intervals. To determine the value of at which the horizontal line R = p intersects the graph of R( ) we solve e12( 2) = p to obtain = 2 + log p=12. Since R( ) = 0 if > 2 then a 100p% likelihood interval for is of the form [2 + log (p) =12; 2]. For p = 0:1 we obtain the 10% likelihood interval [2 + log (0:1) =12; 2] = [1:8081; 2]. For p = 0:5 we obtain the 50% likelihood interval [2 + log (0:5) =12; 2] = [1:9422; 2]. 9.5. CHAPTER 6 319 1 0.9 0.8 0.7 R(θ) 0.6 0.5 0.4 0.3 0.2 0.1 0 1.5 1.6 1.7 1.8 θ 1.9 2 2.1 2.2 Figure 9.13: Relative likelihood function for Exercise 6.4.6 Exercise 6.7.3 By Chapter 5, Problem 7 we have p n Q(Xn ; ) = q Xn n Xn n 1 Xn n !D Z N(0; 1) (9.10) and therefore Q(Xn ; ) is an asymptotic pivotal quantity. Let a be the value such that P (Z a) = (1 + p) =2 where Z N(0; 1). Then by (9.10) we have 1 0 p Xn n n aA p P@ a q Xn Xn 1 n n s s ! Xn a Xn Xn Xn a Xn Xn p = P 1 +p 1 n n n n n n n n and an approximate 100p% equal tail con…dence interval for is r r xn a xn xn xn a xn xn p 1 ; +p 1 n n n n n n n n or where ^ = xn =n. 2 6 6^ 4 a v u u^ 1 t n ^ 3 v u u^ 1 ^ 7 t 7 ;^ + a 5 n 320 9. SOLUTIONS TO CHAPTER EXERCISES Exercise 6.7.12 (a) The likelihood function is L( ) = n Q f (xi ; ) = n Q i=1 i=1 n P = exp xi en or more simply L( ) = en n Q i=1 The log likelihood function is l( ) = n 2 1+e n Q n P ) ) 2 (x 1 1+e i=1 i=1 (x e (xi 1 log 1 + e for ) 2 (xi 1+e (xi ) for ) 2 2< for i=1 The score function is 2< 2< n P e (xi ) d l( ) = n 2 (xi ) d i=1 1 + e n P 1 = n 2 for 2 < xi 1 + e i=1 S( ) = Notice that S( ) = 0 cannot be solved explicitly. The maximum likelihood estimate can only be determined numerically for a given sample of data x1 ; x2 ; : : : ; xn . Note that since " # n P d e xi S( )= 2 for 2 < 2 d i=1 (1 + exi ) is negative for all values of > 0 then we know that S ( ) is always decreasing so there is only one solution to S( ) = 0. Therefore the solution to S( ) = 0 gives the maximum likelihood estimate. The information function is I( ) = = 2 d2 l( ) d 2 n P e xi i=1 (b) If X 2 (1 + exi ) for 2< Logistic( ; 1) then the cumulative distribution function of X is F (x; ) = 1 1 + e (x ) for x 2 <; Solving u= 1 1 + e (x ) 2< 9.5. CHAPTER 6 321 gives x= 1 u log 1 Therefore the inverse cumulative distribution function is F 1 (u) = log 1 u 1 for 0 < u < 1 If u is an observation from the Uniform(0; 1) distribution then vation from the Logistic( ; 1) distribution by Theorem 2.6.6. 
log 1 u 1 is an obser- (c) The generated data are 0:18 1:58 2:47 0:05 1:60 2:59 0:32 1:68 2:76 0:78 1:68 2:78 1:04 1:71 2:87 1:11 1:89 2:91 1:26 1:93 4:02 1:41 2:02 4:52 1:50 2:2:5 5:25 1:57 2:40 5:56 (d) Here is R code for calculating and graphing the likelihood function for these data. # function for calculating Logistic information for data x and theta=th LOLF<-function(th,x) {n<-length(x) L<-exp(n*th)*(prod(1+exp(th-x)))^(-2) return(L)} th<-seq(1,3,0.01) L<-sapply(th,LOLF,x) plot(th,L,"l",xlab=expression(theta), ylab=expression(paste("L(",theta,")")),lwd=3) 30000 0 10000 L(θ) 50000 The graph of the likelihood function is given in Figure 9.5.(e) Here is R code for Newton’s 1.0 1.5 2.0 θ 2.5 3.0 322 9. SOLUTIONS TO CHAPTER EXERCISES Method. It requires functions for calculating the score and information # function for calculating Logistic score for data x and theta=th LOSF<-function(th,x) {n<-length(x) S<-n-2*sum(1/(1+exp(x-th))) return(S)} # # function for calculating Logistic information for data x and theta=th LOIF<-function(th,x) {n<-length(x) I<-2*sum(exp(x-th)/(1+exp(x-th))^2) return(I)} # # Newton’s Method for Logistic Example NewtonLO<-function(th,x) {thold<-th thnew<-th+0.1 while (abs(thold-thnew)>0.00001) {thold<-thnew thnew<-thold+LOSF(thold,x)/LOIF(thold,x) print(thnew)} return(thnew)} # use Newton’s Method to find the maximum likelihood estimate # use the mean of the data to begin Newton’s Method # since theta is the mean of the distribution thetahat<-NewtonLO(mean(x),x) cat("thetahat = ",thetahat) The maximum estimate found using Newton’s Method is ^ = 2:018099 (f ) Here is R code for determining the values of S(^) and I(^). # calculate Score(thetahat) and the observed information Sthetahat<-LOSF(thetahat,x) cat("S(thetahat) = ",Sthetahat) Ithetahat<-LOIF(thetahat,x) cat("Observed Information = ",Ithetahat) The values of S(^) and I(^) are S(^) = 3:552714 I(^) = 11:65138 10 15 9.5. CHAPTER 6 323 (g) Here is R code for plotting the relative likelihood function for based on these data. # function for calculating Logistic relative likelihood function LORLF<-function(th,thetahat,x) {R<-LOLF(th,x)/LOLF(thetahat,x) return(R)} # # plot the Logistic relative likelihood function #plus a line to determine the 15% likelihood interval th<-seq(1,3,0.01) R<-sapply(th,LORLF,thetahat,x) plot(th,R,"l",xlab=expression(theta), ylab=expression(paste("R(",theta,")")),lwd=3) abline(a=0.15,b=0,col="red",lwd=2) 0.6 0.0 0.2 0.4 R(θ) 0.8 1.0 The graph of the relative likelihood function is given in Figure 9.5. 1.0 1.5 2.0 2.5 3.0 θ (h) Here is R code for determining the 15% likelihood interval and the approximate 95% con…dence interval (6.18) # determine a 15% likelihood interval using uniroot uniroot(function(th) LORLF(th,thetahat,x)-0.15,lower=1,upper=1.8)$root uniroot(function(th) LORLF(th,thetahat,x)-0.15,lower=2.2,upper=3)$root # calculate an approximate 95% confidence intervals for theta L95<-thetahat-1.96/sqrt(Ithetahat) U95<-thetahat+1.96/sqrt(Ithetahat) cat("Approximate 95% confidence interval = ",L95,U95) # display values 324 9. SOLUTIONS TO CHAPTER EXERCISES The 15% likelihood interval is [1:4479; 2:593964] The approximate 95% con…dence interval is [1:443893; 2:592304] which are very close due to the symmetric nature of the likelihood function. 9.6. 
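The approximate 95% confidence interval in (h) can be reproduced directly in R from the maximum likelihood estimate and observed information reported in (e) and (f):

thetahat <- 2.018099
Ithetahat <- 11.65138
thetahat + c(-1, 1) * 1.96 / sqrt(Ithetahat)   # compare with [1.443893, 2.592304]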
CHAPTER 7 9.6 325 Chapter 7 Exercise 7.1.11 If x1 ; x2 ; :::; xn is an observed random sample from the Gamma( ; ) distribution then the likelihood function for ; is n Q f (xi ; ; ) L( ; ) = = i=1 n Q i=1 xi 1 e ( ) = [ ( ) xi = > 0; >0 1 n Q n ] for xi t2 exp for > 0; >0 i=1 where t2 = n P xi i=1 or more simply L( ; ) = [ ( ) ] n Q n xi t2 exp for > 0; >0 i=1 The log likelihood function is l ( ; ) = log L ( ; ) = n log ( ) n log where t1 = n P t2 + t1 for > 0; log xi i=1 The score vector is S( ; ) = = where h h @l @ n ( ) + t1 (z) = is the digamma function. The information matrix is i @l @ 2 = 4 @2l @ 2 @2l @ @ n 0 @2l @ @ @2l @ 2 n ( ) 2t2 n = n4 0 5 2 3 2 3 1 ( ) 1 3 n 3 2 2 d log (z) dz 2 I( ; ) = 4 t2 n log 2x 3 5 5 n i >0 326 9. SOLUTIONS TO CHAPTER EXERCISES where 0 (z) = d dz (z) is the trigamma function. The expected information matrix is J ( ; ) = E [I ( ; ) ; X1 ; :::; Xn ] 2 0 1 ( ) = n4 2E (X; ; ) 1 3 2 = n4 0 1 ( ) 1 2 2 2 3 2 3 5 5 since E X; ; = . S ( ; ) = (0 0) must be solved numerically to …nd the maximum likelihood estimates of and . 9.6. CHAPTER 7 327 Exercise 7.1.14 The data are 1:58 5:47 7:93 2:78 5:52 7:99 2:81 5:87 8:14 3:29 6:07 8:31 3:45 6:11 8:72 The maximum likelihood estimates of h (i+1) (i+1) i = h (i) (i) i 3:64 6:12 9:26 and +S 3:81 6:26 10:10 4:69 6:42 12:82 4:89 6:74 15:22 5:37 7:49 17:82 can be found using Newton’s Method (i) ; (i) h I (i) ; (i) i 1 for i = 0; 1; : : : Here is R code for Newton’s Method for the Gamma Example. # function for calculating Gamma score for a and b and data x GASF<-function(a,b,x)) {t1<-sum(log(x)) t2<-sum(x) n<-length(x) S<-c(t1-n*(digamma(a)+log(b)),t2/b^2-n*a/b) return(S)} # # function for calculating Gamma information for a and b and data x GAIF<-function(a,b,x) {I<-length(x)*cbind(c(trigamma(a),1/b),c(1/b,2*mean(x)/b^3-a/b^2)) return(I)} # # Newton’s Method for Gamma Example NewtonGA<-function(a,b,x) {thold<-c(a,b) thnew<-thold+0.1 while (sum(abs(thold-thnew))>0.0000001) {thold<-thnew thnew<-thold+GASF(thold[1],thold[2],x)%*%solve(GAIF(thold[1],thold[2])) print(thnew)} return(thnew)} thetahat<-NewtonGA(2,2,x) 328 9. SOLUTIONS TO CHAPTER EXERCISES The maximum likelihood estimates are ^ = 4:118407 and ^ = 1:657032. The score vector evaluated at (^ ; ^ ) is S(^ ; ^ ) = h 0 1:421085 which indicates we have obtained a local extrema. 10 14 i The observed information matrix is " # 8:239505 18:10466 ^ I(^ ; ) = 18:10466 44:99752 Note that since det[I(^ ; ^ )] = (8:239505) (44:99752) (18:10466)2 > 0 and [I(^ ; ^ )]11 = 8:239505 > 0 then by the second derivative test we have found the maximum likelihood estimates. 9.6. CHAPTER 7 329 Exercise 7.2.3 (a) The following R code generates the required likelihood regions. # function for calculating Gamma relative likelihood for parameters a and b and data x GARLF<-function(a,b,that,x) {t<-prod(x) t2<-sum(x) n<-length(x) ah<-that[1] bh<-that[2] L<-((gamma(ah)*bh^ah)/(gamma(a)*b^a))^n *t^(a-ah)*exp(t2*(1/bh-1/b)) return(L)} a<-seq(1,8.5,0.02) b<-seq(0.2,4.5,0.01) R<-outer(a,b,FUN = GARLF,thetahat,x) contour(a,b,R,levels=c(0.01,0.05,0.10,0.50,0.9),xlab="a",ylab="b",lwd=2) 1 2 b 3 4 The 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ) are shown in Figure 9.14. 2 4 6 8 a Figure 9.14: Likelihood regions for Gamma for 30 observations 330 9. SOLUTIONS TO CHAPTER EXERCISES The likelihood contours are not very elliptical in shape. The contours suggest that large values of together with small values of or small values of together with large values of are plausible given the observed data. 
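As a check on Newton's Method, the maximum likelihood estimates in Exercise 7.1.14 can also be obtained by direct numerical minimization of the negative log likelihood in R. The data vector below is the 30 observations as listed above, and dgamma is used with shape and scale arguments to match the Gamma density with shape alpha and scale beta used in these Course Notes:

x <- c(1.58, 5.47, 7.93, 2.78, 5.52, 7.99, 2.81, 5.87, 8.14, 3.29,
       6.07, 8.31, 3.45, 6.11, 8.72, 3.64, 6.12, 9.26, 3.81, 6.26,
       10.10, 4.69, 6.42, 12.82, 4.89, 6.74, 15.22, 5.37, 7.49, 17.82)
# negative log likelihood for (shape, scale); large penalty keeps the search in the parameter space
negll <- function(par) {
  if (any(par <= 0)) return(1e10)
  -sum(dgamma(x, shape = par[1], scale = par[2], log = TRUE))
}
optim(c(2, 2), negll)$par   # compare with (4.118407, 1.657032)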
(b) Since R (3; 2:7) = 0:14 the point (3; 2:7) lies inside a 10% likelihood region so it is a plausible value of ( ; ). 3.0 2.5 2.0 1.5 b 3.5 4.0 4.5 (d) The 1%, 5%, 10%, 50%, and 90% likelihood regions for ( ; ) for 100 observations are shown in Figure 9.15. We note that for a larger number of observations the likelihood regions are more elliptical in shape. 2.0 2.5 3.0 3.5 4.0 4.5 5.0 a Figure 9.15: Likelihood regions for Gamma for 100 observations 9.6. CHAPTER 7 331 Exercise 7.4.4 The following R code graphs the approximate con…dence regions. The function ConfRegion was used in Example 7.4.3. 1 2 b 3 4 # graph approximate confidence regions c<-outer(a,b,FUN = ConfRegion,thetahat,Ithetahat) contour(a,b,c,levels=c(4.61,5.99,9.21),xlab="a",ylab="b",lwd=2) 2 4 6 8 a Figure 9.16: Approximate con…dence regions for Gamma for 30 observations These approximate con…dence regions which are ellipses are very di¤erent than the likelihood regions in Figure 9.14. In particular we note that ( ; ) = (3; 2:7) lies inside a 10% likelihood region but outside a 99% approximate con…dence region. There are only 30 observations and these di¤erences suggest the Normal approximation is not very good. The likelihood regions are a better summary of the uncertainty in the estimates. 332 9. SOLUTIONS TO CHAPTER EXERCISES Exercise 7.4.7 Let Since h h ~ ~ i I(^ ; ^ ) i 1 = " [J(~ ; ~ )]1=2 !D Z v^11 v^12 v^12 v^22 # BVN h 0 0 then for large n, V ar(~ ) v^11 , V ar( ~ ) v^22 and Cov(~ ; ~ ) mate 95% con…dence interval for a is given by p p [^ 1:96 v^11 ; ^ + 1:96 v^11 ] i ; " 1 0 0 1 #! v^12 . Therefore an approxi- and an approximate 95% con…dence interval for b is given by p p [ ^ 1:96 v^22 ; ^ + 1:96 v^22 ] For the data in Exercise 7.1.14, ^ = 4:118407 and ^ = 1:657032 and h i 1 I(^ ; ^ ) " # 1 8:239505 18:10466 = 18:10466 44:99752 " # 1:0469726 0:4212472 = 0:4212472 0:1917114 An approximate 95% marginal con…dence interval for is p p [4:118407 + 1:96 1:0469726; 4:118407 1:96 1:0469726] = [2:066341; 6:170473] An approximate 95% con…dence interval for is p p [1:657032 1:96 0:1917114; 1:657032 + 1:96 0:1917114] = [1:281278; 2:032787] 9.6. CHAPTER 7 333 To obtain an approximate 95% marginal con…dence interval for + we note that V ar(~ + ~ ) = V ar(~ ) + V ar( ~ ) + 2Cov(~ ; ~ ) v^11 + v^22 + 2^ v12 = v^ so that an approximate 95% con…dence interval for + is given by p p [^ + ^ 1:96 v^; ^ + ^ + 1:96 v^] For the data in Example 7.1.13 ^ + ^ = 4:118407 + 1:657032 = 5:775439 v^ = v^11 + v^22 + 2^ v12 = 1:0469726 + 0:1917114 + 2( 0:4212472) = 0:3961895 and an approximate 95% marginal con…dence interval for + is p p [5:775439 + 1:96 0:3961895; 5:775439 1:96 0:3961895] = [4:998908; 6:551971] 334 9.7 9. SOLUTIONS TO CHAPTER EXERCISES Chapter 8 Exercise 8.1.7 (a) Let X = number of successes in n trails. Then X Binomial(n; ) and E (X) = n . If the null hypothesis is H0 : = 0 and the alternative hypothesis is HA : 6= 0 then a suitable test statistic is D = jX n 0 j. For n = 100, x = 42, and The p-value is 0 = 0:5 the observed value of D is d = jx P (jX n 0j jx n 0 j ; H0 : = P (jX 50j 8) where X = P (X 42) + P (X = n 0 j = j42 50j = 8. 0) Binomial (100; 0:5) 58) = 0:06660531 + 0:06660531 = 0:1332106 calculated using R. Since p-value > 0:1 there is no evidence based on the data against H0 : = 0:5. (b) If the null hypothesis is H0 : a suitable test statistic is D = n For n = 100, x = 42, and The p-value is 0 = 0 0 and the alternative hypothesis is HA : 0 X = P (50 X = P (X 42) 0 then X. 
= 0:5 the observed value of D is d = n P (n < n 8) 0 x; H0 : where X = 0 x = 50 42 = 8. 0) Binomial (50; 0:5) = 0:06660531 calculated using R. Since 0:05 < p-value < 0:1 there is weak evidence based on the data against H0 : = 0:5. Exercise 8.2.5 The model for these data is (X1 ; X2 ; : : : ; X7 ) hypothesis of interest is H0 : 1 = 2 = = Multinomial(63; 1 ; 2 ; : : : ; 7 ) and the 1 = 7 7 . Since the model and parameters 7 P are completely speci…ed this is a simple hypothesis. Since j = 1 there are only k = 6 j=1 parameters. The likelihood ratio test statistic, which can be derived in the same way as Example 8.2.4, is 7 P Xj (X; 0 ) = 2 Xj log Ej i=1 where Ej = 63=7 is the expected frequency for outcome j. 9.7. CHAPTER 8 335 For these data the observed value of the likelihood ratio test statistic is (x; 0) = 2 7 P xj log i=1 = 2 22 log xj 63=7 22 + 7 log 63=7 7 63=7 + + 6 log 6 63=7 = 23:27396 The approximate p-value is p-value P (W 23:27396) 2 where W (6) = 0:0007 calculated using R. Since p-value < 0:001 there is strong evidence based on the data against the hypothesis that the deaths are equally likely to occur on any day of the week. Exercise 8.3.3 (a) = f( 1 ; 2 ) : 1 > 0; = f( 1 ; 2 ) : 1 = 2 ; 0 H0 : 1 = 2 is composite. > 0g which has dimension k = 2 and > 0; 2 > 0g which has dimension q = 1 and the hypothesis 2 1 From Example 6.2.5 the likelihood function for an observed random sample x1 ; x2 ; : : : ; xn from an Poisson( 1 ) distribution is L1 ( 1 ) = nx n 1 e 1 for 0 1 with maximum likelihood estimate ^1 = x. Similarly the likelihood function for an observed random sample y1 ; y2 ; : : : ; ym from an Poisson( 2 ) distribution is L2 ( 2 ) = ny n 2 e 2 for 0 2 with maximum likelihood estimate ^2 = y. Since the samples are independent the likelihood function for ( 1 ; 2 ) is L ( 1; 2) = L1 ( 1 )L2 ( 2 ) for 0; 1 0 2 and the log likelihood function l ( 1; 2) = nx log 1 n 1 + my log 2 m 2 for 1 > 0; 2 >0 The independence of the samples also implies the maximum likelihood estimators are ~1 = X and ~2 = Y . Therefore l(~1 ; ~2 ; X; Y) = nX log X nX + mY log Y = nX log X + mY log Y mY nX + mY 336 If 1 9. SOLUTIONS TO CHAPTER EXERCISES = 2 = then the log likelihood function is l ( ) = (nx + my) log which is only a function of . To determine (n + m) ( max l( 1 ; 1 ; 2 )2 d d l ( ) = 0 for ( max 1 ; 2 )2 nx+my n+m = l( 1 ; >0 2 ; X; Y) we note that 0 d nx + my l( ) = d and for (n + m) and therefore 2 ; X; Y) = nX + mY log 0 nX + mY n+m nX + mY The likelihood ratio test statistic is (X; Y; 0) = 2 l(~1 ; ~2 ; X; Y) ( max 1 ; 2 )2 l( 1 ; 2 ; X; Y) 0 = 2 nX log X + mY log Y nX + mY = 2 nX log X + mY log Y nX + mY log nX + mY log nX + mY n+m + nX + mY nX + m Y n+m with corresponding observed value (x; y; Since k q=2 0) (nx + my) log nx + my n+m 1=1 p-value (b) For n = 10, = 2 nx log x + my log y 10 P 2 P [W (x; y; 0 )] where W (1) h i p = 2 1 P Z (x; y; 0 ) where Z xi = 22, m = 15, i=1 test statistic is (x; y; p-value 15 P N (0; 1) yi = 40 the observed value of the likelihood ratio i=1 0) = 0:5344026 and 2 P [W 0:5344026] where W (1) h i p = 2 1 P Z 0:5344026 where Z N (0; 1) = 0:4647618 calculated using R. Since p-value > 0:1 there is no evidence against H0 : the data. 1 = 2 based on 10. Solutions to Selected End of Chapter Problems 337 338 10.1 10. 
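The p-values quoted as calculated using R in Exercises 8.1.7(a) and 8.3.3(b) can be reproduced, for example, as follows:

# Exercise 8.1.7(a): X ~ Binomial(100, 0.5) and d = |42 - 50| = 8
2 * pbinom(42, 100, 0.5)                   # p-value = 0.1332106, using symmetry about 50
# Exercise 8.3.3(b): n = 10, sum of x's = 22, m = 15, sum of y's = 40
n <- 10; sx <- 22; m <- 15; sy <- 40
lam <- 2 * (sx * log(sx / n) + sy * log(sy / m) - (sx + sy) * log((sx + sy) / (n + m)))
lam                                        # observed likelihood ratio statistic, 0.5344026
pchisq(lam, df = 1, lower.tail = FALSE)    # approximate p-value, 0.4647618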
SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS Chapter 2 1(a) Starting with 1 P x = x=0 it can be shown that 1 P x = x2 x 1 = x3 x 1 = x x=1 1 P x=1 1 P x=1 1 for j j < 1 1 for j j < 1 (1 )2 1+ for j j < 1 (1 )3 1+4 + 2 for j j < 1 (1 )4 (10.1) (10.2) (10.3) (1) and therefore 1 P 1 = x k x=1 x = f (x) = (1 )2 (1 )2 x using (10.1) gives k = x 1 The graph of f (x) in Figure 10.1 is for for x = 1; 2; :::; 0 < )2 (1 <1 = 0:3. 0.5 0.45 0.4 0.35 0.3 f(x) 0.25 0.2 0.15 0.1 0.05 0 1 2 3 4 5 6 7 8 x Figure 10.1: Graph of f (x) (2) 8 > <0 x F (x) = P > : (1 t=1 if x < 1 )2 t t 1 =1 (1 + x x ) x for x = 1; 2; ::: 10.1. CHAPTER 2 339 Note that F (x) is speci…ed by indicating its value at each jump point. (3) 1 P E (X) = )2 x x (1 x 1 )2 = (1 x=1 x 1 (1 + ) (1 )3 )2 using (10.2) (1 + ) (1 ) = 1 P = )2 xx x2 (1 1 )2 = (1 x=1 (1 + 4 + 2 ) (1 )4 (1 + 4 + 2 ) (1 )2 = )2 [E (X)]2 = V ar(X) = E(X 2 ) 1 P x3 x 1 x=1 = (1 (4) Using x2 x=1 = (1 E X2 1 P using (10.3) (1 + 4 + 2 ) (1 )2 (1 + )2 2 2 = (1 ) (1 )2 = 0:3, P (0:5 < X 2) = P (X = 1) + P (X = 2) = 0:49 + (0:49)(2)(0:3) = 0:784 P (X > 0:5; X P (X 2) P (X = 1) + P (X = 2) =1 P (X 2) P (X > 0:5jX = 2) = 2) 1:(b) (1) 1 k = Z1 1 = 2 1 dx = 2 1 + (x= )2 Z1 1 dy 1 + y2 Z1 0 1 dx because of symmetry 1 + (x= )2 1 let y = x; dy = dx 0 = 2 lim arctan (b) = 2 b!1 Thus k = 1 2 = and f (x) = h 1 1 + (x= )2 i for x 2 <; >0 340 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS The graphs for = 0:5, 1 and 2 are plotted in Figure 10.2. The graph for each di¤erent value of is obtained from the graph for = 1 by simply relabeling the x and y axes. That is, on the x axis, each point x is relabeled x and on the y axis, each point y is relabeled y= . The graph of f (x) below is for = 1: Note that the graph of f (x) is symmetric about the y axis. 0.7 0.6 θ=0.5 0.5 0.4 f (x) 0.3 0.2 θ=0 0.1 θ=2 0 -5 0 x 5 Figure 10.2: Cauchy probability density functions (2) F (x) = Zx 1 = 1 h 1 1 + (t= ) x arctan + 2 i dt = 1 2 1 lim arctan b! 1 t jxb for x 2 < (3) Consider the integral Z1 0 Since the integral x h i dx = 1 + (x= )2 Z1 0 does not converge, the integral Z1 1 Z1 t dt = 1 + t2 lim b!1 0 h x 1 + (x= )2 h i dx x 1 + (x= )2 i dx 1 ln 1 + b2 = +1 2 10.1. CHAPTER 2 341 does not converge absolutely and E (X) does not exist. Since E (X) does not exist V ar (X) does not exist. (4) Using = 1, P (0:5 < X 2) = F (2) F (0:5) 1 = [arctan (2) arctan (0:5)] 0:2048 P (X > 0:5; X 2) P (X 2) P (0:5 < X 2) F (2) F (0:5) = P (X 2) F (2) arctan (2) arctan (0:5) 0:2403 arctan (2) + 2 P (X > 0:5jX = = 2) = 1:(c) (1) 1 k = Z1 = Z1 e jx e jyj j dx 1 dy = 2 1 let y = x Z1 e y dy , then dy = dx by symmetry 0 = 2 (1) = 2 (0!) = 2 Thus k = 1 2 and 1 f (x) = e jx j for x 2 <; 2 < 2 The graphs for = 1, 0 and 2 are plotted in Figure 10.3. The graph for each di¤erent value of is obtained from the graph for = 0 by simply shifting the graph for = 0 to the right units if is positive and shifting the graph for = 0 to the left units if is negative. Note that the graph of f (x) is symmetric about the line x = . (2) 8 x R 1 t > > x > < 1 2 e dt F (x) = R 1 t Rx 1 t+ > > > e dt + dt x > : 2 2e = 8 < 1 ex 1 x 2 :1 + 2 1 t+ jx 2 e =1 1 x+ 2e x> 342 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS 1 0.9 θ=0 θ=-1 0.8 θ=2 0.7 0.6 f(x) 0.5 0.4 0.3 0.2 0.1 0 -5 0 x 5 Figure 10.3: Double Exponential probability density functions (3) Since the improper integral Z1 converges, the integral (x E X = 1 2 Z1 1 Z1 = 1 2 = Z1 )e 1 2 R1 x2 e jx j dx jx j dx ye y dy = (2) = 1! 
= 1 y converges absolutely and by the symmetry of f (x) jyj let y = x 1 dy = 2 1 dy + 0 + 2 Z1 Z1 , then dy = dx y 2 + 2y + 2 e jyj dy 1 e y dy using the properties of even/odd functions 0 (3) + = 2+ dx = Z1 1 (y + ) e 0 = xe 2 y2e ) 0 we have E (X) = . 2 (x 2 (1) = 2! + 2 2 Therefore V ar(X) = E(X 2 ) (4) Using [E (X)]2 = 2 + 2 2 =2 2 ) = 0, P (0:5 < X 2) = F (2) 1 F (0:5) = (e 2 0:5 e 0:2356 10.1. CHAPTER 2 343 P (X > 0:5jX = P (X > 0:5; X P (X 2) 2 e ) 0:2527 2 2) = 1 0:5 2 (e 1 12 e 2) = P (0:5 < X 2) F (2) F (0:5) = P (X 2) F (2) 1:(e) (1) 1 k = Z1 x2 e x dx let y = x; 1 dy = dx 0 = 1 3 Z1 y2e y dy = 1 (3) = 3 2! 3 = 2 3 0 Thus k = 3 2 and 1 3 2 x x e for x 0; > 0 2 The graphs for = 0:5, 1 and 2 are plotted in Figure 10.4. The graph for each di¤erent value of is obtained from the graph for = 1 by simply relabeling the x and y axes. That is, on the x axis, each point x is relabeled x= and on the y axis, each point y is relabeled y. f (x) = 0.3 θ=0.5 θ=1 θ=2 0.25 0.2 f(x) 0.15 0.1 0.05 0 0 2 4 6 8 10 x Figure 10.4: Gamma probabilty density functions (2) F (x) = 8 > <0 > : 21 if x Rx 0 3 2 t e t dt 0 if x > 0 344 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS Using integration by parts twice we have 1 2 Zx 3 2 t e t dt = 0 = = = = Therefore F (x) = 2 3 Zx 14 ( t)2 e t jx0 + 2 te t dt5 2 80 93 2 Zx < = 14 ( x)2 e x + 2 ( t) e t jx0 + e t dt 5 : ; 2 0 n oi 1h ( x)2 e x 2 ( x) e x 2 e t jx0 2 i 1h ( x)2 e x 2 ( x) e x 2e x + 2 2 1 1 e x 2 x2 + 2 x + 2 for x > 0 2 8 <0 if x :1 1 2e 2 2 x x +2 x+2 0 if x > 0 (3) E (X) = 1 2 Z1 x 3 3 x e dx 1 let y = x; dy = dx 0 = 1 2 Z1 y3e y dy = 1 2 (4) = 3! 3 = 2 0 E X 2 = 1 2 Z1 x 3 4 x e dx let y = x; 1 dy = dx 0 = 1 2 2 Z1 y4e y dy = 1 4! 12 2 (5) = 2 = 2 2 2 0 and V ar (X) = E X 2 2 [E (X)] = 12 2 3 2 = 3 2 10.1. CHAPTER 2 (4) Using 345 = 1, P (0:5 < X 2) = F (2) F (0:5) 1 x 2 = 1 e x + 2x + 2 j20:5 2 1 2 1 = 1 e (4 + 4 + 2) 1 e 2 2 13 1 e 0:5 e 2 (10) = 2 4 0:3089 1 +1+2 4 0:5 P (X > 0:5; X 2) P (X 2) F (2) F (0:5) P (0:5 < X 2) = = P (X 2) F (2) 1 13 0:5 2 e e (10) 4 = 2 1 2 1 2 e (10) 0:9555 P (X > 0:5jX 2) = 1:(g) (1) Since ex= f ( x) = 2 1 + ex= ex= = e2x= 1+e 2 x= = e x= 1+e x= 2 = f (x) therefore f is an even function which is symmetric about the y axis. 1 k = Z1 f (x)dx = 1 Z1 = 2 0 = 2 Z1 1 e e x= x= 1+e 2 dx x= 1+e 2 dx x= 1 b!1 1 + e lim x= by symmetry 1 b!1 1 + e jb0 = 2 lim b= 1 =2 2 Therefore k = 1= . (2) F (x) = Zx 1 = = e t= 1+e 1 1+e 1 1+e x= x= t= 2 dt a! 1 11+e lim a! 1 11+e = lim for x 2 < a= t= jxa 1 2 = 346 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS 0.25 0.2 0.15 f(x) 0.1 0.05 0 -10 -5 0 x 5 10 Figure 10.5: Graph of f (x) for =2 (3) Since f is a symmetric function about x = 0 then if E (jXj) exists then E (X) = 0. Now E (jXj) = 2 Z1 x= xe 1+e 0 x= 2 dx Since Z1 x e x= Z1 dx = 0 y ye dy = (2) 0 converges and x e x= xe 1+e x= x= 2 for x 0 therefore by the Comparison Test for Integrals the improper integral E (jXj) = 2 Z1 0 converges and thus E (X) = 0. xe 1+e x= x= 2 dx 10.1. CHAPTER 2 347 By symmetry E X2 Z1 = 1 = 2 x2 e x= 1+e x= Z0 1 = 2 x2 e x= 1+e x= Z0 2 2 dx 1 2 dx let y = x= y y2e (1 + e dy y )2 Using integration by parts with u = y2; du = 2ydy; dv = e y ; (1 + e y )2 v= 1 (1 + e y) we have Z0 1 y2e (1 + e y y2 dy = lim y )2 a! 1 (1 + e j0 y) a a2 = lim a! 1 (1 + e a ) = = 2a e a lim a! 1 2 lim e a! 
1 = 0 = 2 2 Z0 Z0 a 2 2 2y 1+e y dy Z0 y 1+e y dy 1 2 Z0 Z0 Z0 1 y 1+e y dy y 1+e y dy by L’Hospital’s Rule dy y multiply by 1 yey dy 1 + ey Let u = ey ; du = ey dy; log u = y Z0 1 yey dy = 1 + ey Z1 0 =1 1 1 y 1+e a by L’Hospital’s Rule 1 to obtain lim e a! 1 log u du = 1+u 2 12 ey ey 348 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS This de…nite integral can be found at https://en.wikipedia.org/wiki/List_of_de…nite_integrals. Z1 2 log u du = 1+u 12 0 Therefore E X2 = 2 2 2 2 12 2 2 = 3 and V ar (X) = E X 2 [E (X)]2 = 2 2 3 02 2 2 = 3 (4) P (0:5 < X 2) = F (2) F (0:5) 1 1 = 2=2 1+e 1 + e 0:5=2 1 1 = 1 1+e 1 + e 0:25 0:1689 P (X > 0:5jX 2) = P (X > 0:5; X P (X 2) 2) P (0:5 < X P (X 2) F (2) F (0:5) = F (2) 1 1 = 1 1+e 1+e 0:2310 = using 0:25 1+e =2 2) 1 10.1. CHAPTER 2 349 2:(a) Since )2 x f (x; ) = (1 x 1 therefore f0 (x) = f (x; = 0) = 0 and f1 (x) = f (x; = 1) = 0 Since f (x; ) 6= f0 (x therefore ) 1 x and f (x; ) 6= f1 is neither a location nor scale parameter. 2: (b) Since f (x; ) = therefore 1 h i 1 + (x= )2 1 (1 + x2 ) f1 (x) = f (x; = 1) = Since therefore for x 2 <; 1 x f (x; ) = f1 >0 for x 2 < for all x 2 <; >0 is a scale parameter for this distribution. 2: (c) Since 1 f (x; ) = e 2 therefore jx j for x 2 <; 1 f0 (x) = f (x; = 0) = e 2 jxj 2< for x 2 < Since f (x; ) = f0 (x therefore ) for x 2 <; 2< is a location parameter for this distribution. 2: (e) Since f (x; ) = therefore 1 2 3 2 x e x for x 1 f1 (x) = f (x; = 1) = x2 e 2 0; x >0 for x Since f (x; ) = f1 ( x) for x therefore 1= is a scale parameter for this distribution. 0; >0 0 350 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS 4: (a) Note that f (x) can be written as 8 2 c =2 ec(x ) > > <ke 2 f (x) = ke (x ) =2 > > :kec2 =2 e c(x x< c c ) x +c x> +c Therefore 1 k = e c2 =2 = 2ec Z 2 =2 Z+c ) dx + e c ec(x 1 Z1 = 2e = = ) =2 (x dx + e c2 =2 c e cu du + p 2 c c2 =2 2 lim b!1 e 1 p e 2 cu b jc + p z 2 =2 2 P (jZj 2 c2 =2 c2 p e e + 2 [2 (c) 1] c 2 c2 =2 p e + 2 [2 (c) 1] c where c) where Z F (x) = ke Zx ec(u x Z ecy dy ) du; 1 = kec 2 =2 1 = kec = and F ( c) = kc e c2 =2 . 2 =2 dx lim a! 1 k c2 =2+c(x e c ) 1 cy x e ja c N (0; 1) is the N (0; 1) c.d.f. c then c2 =2 ) dz as required. 4: (b) If x < c(x +c Zc c 1 e c Z1 let y = u 10.1. CHAPTER 2 If c x 351 + c then F (x) = k e c c2 =2 x p Z 1 p e +k 2 2 (u )2 =2 z 2 =2 dz du let z = u c = x p Z 1 p e +k 2 2 k e c c2 =2 k e c k e c c2 =2 p + k 2 [ (x ) ( c)] c2 =2 p + k 2 [ (x )+ (c) c = = 1] : If x > + c then F (x) = 1 ke c2 =2 Z1 e c(u ) cy dy du let y = u x ke c2 =2 = 1 ke c2 =2 = 1 k c2 =2 e c = 1 Z1 e x 1 e c lim b!1 c(x cy b jx ) Therefore 8 2 k c =2+c(x ) > > <ce p 2 F (x) = kc e c =2 + k 2 [ (x > > :1 k ec2 =2 c(x ) x< )+ (c) c x +c x> +c c Since 1] c is a location parameter (see part (c)) E X k = Z1 x f (x) dx = = Z1 (y + )k f0 (y) dy 1 1 k Z1 1 xk f0 (x ) dx let y = u (10.4) 352 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS In particular E (X) = Z1 (y + ) f0 (y) dy = = Z1 yf0 (y) dy + (1) = Z1 yf0 (y) dy + Z1 1 Z1 yf0 (y) dy + 1 f0 (y) dy 1 since f0 (y) is a p.d.f. 
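The density in 1(g) is that of a Logistic distribution with location 0 and scale theta, which is the parameterization used by R's dlogis and plogis, so the answers in (3) and (4) can be checked numerically (using theta = 2 as in part (4)):

theta <- 2
plogis(2, 0, theta) - plogis(0.5, 0, theta)                        # P(0.5 < X <= 2), compare with 0.1689
integrate(function(x) x^2 * dlogis(x, 0, theta), -Inf, Inf)$value  # E(X^2) numerically
pi^2 * theta^2 / 3                                                 # value from part (3)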
1 1 Now 1 k Z1 c2 =2 yf0 (y) dy = e 1 let y = = Z c cy ye dy + Zc ye y 2 =2 c2 =2 dy + e ye cy dy Z1 ye cy c c 1 Z1 u in the …rst integral Zc Z1 cu c2 =2 ue du + ye e c y 2 =2 c2 =2 dy + e dy c c By integration by parts Z1 cy ye dy = c = 2 1 lim 4 ye b!1 c lim b!1 = Also since g (y) = ye about 0 y 2 =2 1 ye c 1+ 1 c2 e cy b jc cy b jc + 1 c Zb cy e c 1 e c2 cy b jc 3 dy 5 c2 (10.5) is a bounded odd function and [ c; c] is a symmetric interval Z+c ye y 2 =2 dy = 0 c Therefore 1 k Z1 yf0 (y) dy = 1+ 1 c2 e c2 +0+ 1+ 1 c2 1 and E (X) = Z1 1 yf0 (y) dy + = 0 + = e c2 =0 10.1. CHAPTER 2 353 To determine V ar (X) we note that V ar (X) = E X 2 [E (X)]2 Z1 = (y + )2 f0 (y) dy 1 2 = Z1 y f0 (y) dy + 2 = Z1 y 2 f0 (y) dy + 2 (0) + = Z1 y 2 f0 (y) dy 2 1 Z1 using (10:4) 2 yf0 (y) dy + 1 Z1 f0 (y) dy 2 1 2 2 (1) 1 1 Now 1 k Z1 1 2 y f0 (y) dy = e c2 =2 Z c 2 cy y e dy + 2 y e y 2 =2 c 2 =2 y 2 =2 dy + e y2e cy dy + c Zc y2e c2 =2 y 2 =2 dy c By integration by parts and using (10.5) we have Z1 c y2e cy dy = 2 1 lim 4 y 2 e b!1 c cy b jc + 2 1 = ce + 1+ 2 c c 2 2 2 = c+ + 3 e c c c c2 y2e Z1 c c Z1 Z1 cy dy c u in the …rst integral) Z1 Zc c2 =2 2 cu = e u e du + y 2 e = 2ec dy + e c2 =2 c 1 (let y = Zc 2 c Zb ye c e c2 cy 3 dy 5 y2e cy dy 354 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS Also Z+c y2e y 2 =2 c dy = 2 Zc y2e y 2 =2 dy 0 2 = 2 4 ye n y 2 =2 c j0 p + 2 Zc 0 1 p e 2 y 2 =2 o = 2 ce + 2 [ (c) 0:5] p 2 2 [2 (c) 1] 2ce c =2 = p c2 =2 3 dy 5 using integration by parts Therefore V ar (X) = Z1 y 2 f0 (y) dy 1 p 2 2 2 2 = k 2 c + + 3 e c + 2 [2 (c) 1] 2ce c =2 c c ( ) 1 p = 2 c2 =2 + 2 [2 (c) 1] ce p 2 2 2 2 2 c + + 3 e c + 2 [2 (c) 1] 2ce c =2 c c 4: (c) Let f0 (x) = f (x; = 0) ( 2 ke x =2 = 2 ke cjxj+c =2 if jxj c if jxj > c Since f0 (x ) = ( ke ke = f (x) therefore (x cjx )2 =2 j+c2 =2 if jx if jx is a location parameter for this distribution. j c j>c 10.1. CHAPTER 2 355 4: (d) On the graph in Figure 10.6 we have graphed f (x) for c = 1, = 0 (red), f (x) for c = 2, = 0 (blue) and the N(0; 1) probability density function (black). 0.4 N(0,1) p.d.f. (black) 0.35 0.3 0.25 c=2(blue) f(x) 0.2 0.15 0.1 c=1(red) 0.05 0 -5 Figure 10.6: Graphs of f (x) for c = 1, (black). 0 x = 0 (red), c = 2, 5 = 0 and the N(0; 1) p.d.f. We note that there is very little di¤erence between the graphs of the N(0; 1) probability density function and the graph for f (x) for c = 2, = 0 however as c becomes smaller (c = 1) the “tails”of the probability density function become much “fatter”relative to the N(0; 1) probability density function. 356 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS 5: (a) Since X P (X Geometric(p) k) = 1 P p (1 p)k p) = 1 (1 p) x p (1 x=k = (1 p)k by the Geometric Series for k = 0; 1; : : : (10.6) Therefore P (X k + jjX k) = = P (X k + j; X P (X k) p)k+j (1 = P (X k + j) P (X k) by (10:6) p)k (1 k) p)j = (1 = P (X j) for j = 0; 1; : : : (10.7) Suppose we have a large number of items which are to be tested to determine if they are defective or not. Suppose a proportion p of these items are defective. Items are tested one after another until the …rst defective item is found. If we let the random variable X be the number of good items found before observing the …rst defective item then X Geometric(p) and (10.6) holds. 
Now P (X j) is the probability we …nd at least j good items before observing the …rst defective item and P (X k + jjX k) is the probability we …nd at least j more good items before observing the …rst defective item given that we have already observed at least k good items before observing the …rst defective item. Since these probabilities are the same for all nonnegative integers by (10.7), this implies that, no matter how many good items we have already observed before observing the …rst defective item, the probability of …nding at least j more good items before observing the …rst defective item is the same as when we …rst began testing. It is like we have “forgotten” that we have already observed at least k good items before observing the …rst defective item. In other words, conditioning on the event that we have already observed at least k good items before observing the …rst defective item does not a¤ect the probability of observing at least j more good items before observing the …rst defective item. 5: (b) If Y Exponential( ) then P (Y a) = Z1 1 y= e dy = e a= for a > 0 a Therefore P (Y a + bjY a) = = P (Y e = e as required. a + b; Y P (Y a) a) = P (Y a + b) P (Y a) (a+b)= e b= a= = P (Y by (10.8) b) for all a; b > 0 (10.8) 10.1. CHAPTER 2 357 6: Since f1 (x) ; f2 (x) ; : : : ; fk (x) are probability density functions with support sets A1 ; A2 ; : : : ; Ak then we know that fi (x) > 0 for all x 2 Ai ; i = 1; 2; : : : ; k. Also since k k P P 0 < p1 ; p2 ; : : : ; pk 1 with pi = 1, we have that g (x) = pi fi (x) > 0 for all i=1 x2A= k S i=1 Ai and A = support set of X. Also i=1 Z1 g(x)dx = k X Z1 pi i=1 1 fi (x) dx = k X pi (1) = i=1 1 k X pi = 1 i=1 Therefore g (x) is a probability density function. Now the mean of X is given by E (X) = Z1 xg(x)dx = k X pi Z1 2 x g(x)dx = k X pi Z1 pi i=1 1 = k X xfi (x) dx 1 i i=1 As well E X 2 = i=1 1 = k X 2 i + pi Z1 x2 fi (x) dx 1 2 i i=1 since Z1 x2 fi (x) dx = 1 = = Z1 (x 1 2 i + 2 i + i) 2 2 i 2 fi (x) dx + 2 i Z1 xfi (x) dx 1 2 i 1 2 i Thus the variance of X is V ar (X) = k X i=1 pi 2 i + 2 i k X i=1 2 i Z1 pi i !2 fi (x) dx 358 7:(a) Since X 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS Gamma( ; ) the probability density function of X is f (x) = 1 e x= x for x > 0 ( ) and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = ex = h(x) is a one-to-one function on A and h maps the set A to the set B = fy : y > 1g. Also x=h 1 d h dy (y) = log y and 1 (y) = 1 y The probability density function of Y is = = d h dy 1 (y) (log y) 1 e ( ) (log y) 1 g (y) = f h y 1 (y) log y= 1= 1 y 1 for y 2 B ( ) and 0 otherwise. 7:(b) Since X Gamma( ; ) the probability density function of X is f (x) = 1 e x= x ( ) for x > 0 and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = 1=x = h(x) is a one-to-one function on A and h maps the set A to the set B = fy : y > 0g. Also x=h 1 (y) = 1 d and h y dy 1 (y) = 1 y2 The probability density function of Y is g (y) = f h = y 1 (y) d h dy 1 e 1=( y) ( ) 1 (y) for y 2 B and 0 otherwise. This is the probability density function of an Inverse Gamma( ; ) random variable. Therefore Y = X 1 Inverse Gamma( ; ). 7:(c) Since X Gamma(k; ) the probability density function of X is f (x) = xk 1 e x= (k) k for x > 0 10.1. CHAPTER 2 359 and 0 otherwise. Let A = fx : f (x) > 0g = fx : x > 0g. Now y = 2x= = h(x) is a one-to-one function on A and h maps the set A to the set B = fy : y > 0g. 
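Looking back at Problem 6 for a moment, the formulas for the mean and variance of a finite mixture can be verified by simulation in R; the two Normal components used below are arbitrary choices for illustration:

set.seed(3)
p <- c(0.3, 0.7); mu <- c(-1, 2); sigma <- c(1, 0.5)
i <- sample(1:2, 10^6, replace = TRUE, prob = p)       # choose a component
x <- rnorm(10^6, mean = mu[i], sd = sigma[i])          # draw from the chosen component
c(mean(x), sum(p * mu))                                # E(X) and sum of p_i mu_i
c(var(x), sum(p * (sigma^2 + mu^2)) - sum(p * mu)^2)   # Var(X) and the formula from Problem 6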
7.(c) Since $X \sim$ Gamma$(k,\theta)$ the probability density function of $X$ is $f(x) = \frac{x^{k-1}e^{-x/\theta}}{\Gamma(k)\theta^{k}}$ for $x>0$ and 0 otherwise. Let $A = \{x : x>0\}$. Now $y = 2x/\theta = h(x)$ is a one-to-one function on $A$ and maps $A$ to $B = \{y : y>0\}$. Also $x = h^{-1}(y) = \theta y/2$ and $\frac{d}{dy}h^{-1}(y) = \theta/2$. The probability density function of $Y$ is
$g(y) = f\!\left(h^{-1}(y)\right)\left|\frac{d}{dy}h^{-1}(y)\right| = \frac{y^{k-1}e^{-y/2}}{\Gamma(k)2^{k}} \quad \text{for } y \in B$
and 0 otherwise, for $k = 1,2,\dots$, which is the probability density function of a $\chi^2(2k)$ random variable. Therefore $Y = 2X/\theta \sim \chi^2(2k)$.

7.(d) Since $X \sim$ N$(\mu,\sigma^2)$ the probability density function of $X$ is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2\sigma^2}(x-\mu)^2}$ for $x \in \Re$. Let $A = \Re$. Now $y = e^x = h(x)$ is one-to-one on $A$ and maps $A$ to $B = \{y : y>0\}$, with $x = h^{-1}(y) = \log y$ and $\frac{d}{dy}h^{-1}(y) = 1/y$. The probability density function of $Y$ is
$g(y) = \frac{1}{y\sigma\sqrt{2\pi}}e^{-\frac{1}{2\sigma^2}(\log y - \mu)^2} \quad \text{for } y \in B.$
Note this distribution is called the Lognormal distribution.

7.(e) With $X \sim$ N$(\mu,\sigma^2)$ as in (d), $y = x^{-1} = h(x)$ is one-to-one on $A$ and maps $A$ to $B = \{y : y \ne 0,\ y \in \Re\}$, with $x = h^{-1}(y) = 1/y$ and $\frac{d}{dy}h^{-1}(y) = -1/y^2$. The probability density function of $Y$ is
$g(y) = \frac{1}{\sigma\sqrt{2\pi}\,y^2}\exp\!\left[-\frac{1}{2\sigma^2}\left(\frac{1}{y}-\mu\right)^2\right] \quad \text{for } y \in B.$

7.(f) Since $X \sim$ Uniform$\left(-\frac{\pi}{2},\frac{\pi}{2}\right)$ the probability density function of $X$ is $f(x) = \frac{1}{\pi}$ for $-\frac{\pi}{2}<x<\frac{\pi}{2}$ and 0 otherwise. Let $A = \{x : -\frac{\pi}{2}<x<\frac{\pi}{2}\}$. Now $y = \tan x = h(x)$ is one-to-one on $A$ and maps $A$ to $B = \{y : -\infty<y<\infty\}$, with $x = h^{-1}(y) = \arctan y$ and $\frac{d}{dy}h^{-1}(y) = \frac{1}{1+y^2}$. The probability density function of $Y$ is
$g(y) = \frac{1}{\pi}\cdot\frac{1}{1+y^2} \quad \text{for } y \in \Re$
and 0 otherwise. This is the probability density function of a Cauchy$(1,0)$ random variable, so $Y = \tan X \sim$ Cauchy$(1,0)$.

7.(g) Since $X \sim$ Pareto$(\alpha,\beta)$ the probability density function of $X$ is $f(x) = \frac{\beta\alpha^{\beta}}{x^{\beta+1}}$ for $x \ge \alpha$; $\alpha,\beta>0$, and 0 otherwise. Let $A = \{x : f(x)>0\}$. Now $y = \beta\log(x/\alpha) = h(x)$ is one-to-one on $A$ and maps $A$ to $B = \{y : 0 \le y < \infty\}$, with $x = h^{-1}(y) = \alpha e^{y/\beta}$ and $\frac{d}{dy}h^{-1}(y) = \frac{\alpha}{\beta}e^{y/\beta}$. The probability density function of $Y$ is
$g(y) = \frac{\beta\alpha^{\beta}}{(\alpha e^{y/\beta})^{\beta+1}}\cdot\frac{\alpha}{\beta}e^{y/\beta} = e^{-y} \quad \text{for } y \in B$
and 0 otherwise. This is the probability density function of an Exponential$(1)$ random variable, so $Y = \beta\log(X/\alpha) \sim$ Exponential$(1)$.

7.(h) If $X \sim$ Weibull$(2,\theta)$ the probability density function of $X$ is $f(x) = \frac{2x}{\theta^2}e^{-(x/\theta)^2}$ for $x>0$, $\theta>0$, and 0 otherwise. Let $A = \{x : x>0\}$. Now $y = x^2 = h(x)$ is one-to-one on $A$ and maps $A$ to $B = \{y : y>0\}$, with $x = h^{-1}(y) = y^{1/2}$ and $\frac{d}{dy}h^{-1}(y) = \frac{1}{2}y^{-1/2}$. The probability density function of $Y$ is
$g(y) = \frac{2\sqrt{y}}{\theta^2}e^{-y/\theta^2}\cdot\frac{1}{2\sqrt{y}} = \frac{1}{\theta^2}e^{-y/\theta^2} \quad \text{for } y \in B$
and 0 otherwise, which is the probability density function of an Exponential$(\theta^2)$ random variable. Therefore $Y = X^2 \sim$ Exponential$(\theta^2)$.

7.(i) Since $X \sim$ Double Exponential$(0,1)$ the probability density function of $X$ is $f(x) = \frac{1}{2}e^{-|x|}$ for $x \in \Re$. The cumulative distribution function of $Y = X^2$ is
$G(y) = P(Y \le y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = \int_{-\sqrt{y}}^{\sqrt{y}}\frac{1}{2}e^{-|x|}\,dx = \int_{0}^{\sqrt{y}}e^{-x}\,dx \quad \text{by symmetry, for } y>0.$
By the First Fundamental Theorem of Calculus and the chain rule the probability density function of $Y$ is
$g(y) = \frac{d}{dy}G(y) = e^{-\sqrt{y}}\,\frac{d}{dy}\sqrt{y} = \frac{1}{2\sqrt{y}}e^{-\sqrt{y}} \quad \text{for } y>0$
and 0 otherwise.
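As a numerical check (not part of the original solutions), two of the transformation results above can be verified by simulation using only standard moment facts: a $\chi^2(2k)$ random variable has mean $2k$ and variance $4k$, and an Exponential$(\theta^2)$ random variable has mean $\theta^2$ and variance $\theta^4$. The parameter values below are arbitrary.

```python
import numpy as np

# Simulation check of the transformation results in Problems 7(c) and 7(h).
rng = np.random.default_rng(2)
N, k, theta = 10**6, 3, 1.7

# 7(c): X ~ Gamma(k, theta) (scale parameterization); Y = 2X/theta ~ chi^2(2k),
# so E(Y) = 2k and Var(Y) = 4k.
Y = 2 * rng.gamma(k, theta, N) / theta
print(Y.mean(), 2 * k, Y.var(), 4 * k)

# 7(h): X ~ Weibull(2, theta), i.e. f(x) = (2x/theta^2) exp(-(x/theta)^2);
# numpy's standard Weibull(shape=2) scaled by theta has exactly this density.
X = theta * rng.weibull(2, N)
W = X ** 2                                  # should be Exponential(theta^2)
print(W.mean(), theta ** 2, W.var(), theta ** 4)
```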
7.(j) Since $X \sim$ t$(k)$ the probability density function of $X$ is
$f(x) = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)\sqrt{k\pi}}\left(1+\frac{x^2}{k}\right)^{-(k+1)/2} \quad \text{for } x \in \Re,$
which is an even function. The cumulative distribution function of $Y = X^2$ is, for $y>0$,
$G(y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = 2\int_{0}^{\sqrt{y}} f(x)\,dx \quad \text{by symmetry}.$
By the First Fundamental Theorem of Calculus and the chain rule the probability density function of $Y$ is
$g(y) = \frac{d}{dy}G(y) = 2f(\sqrt{y})\,\frac{1}{2\sqrt{y}} = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)\Gamma\!\left(\frac{1}{2}\right)\sqrt{k}}\, y^{-1/2}\left(1+\frac{y}{k}\right)^{-(k+1)/2} \quad \text{for } y>0,$
since $\Gamma\!\left(\frac{1}{2}\right) = \sqrt{\pi}$, and 0 otherwise. This is the probability density function of an F$(1,k)$ random variable, so $Y = X^2 \sim$ F$(1,k)$.

8.(a) Let $c_n = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\sqrt{n\pi}}$. If $T \sim$ t$(n)$ then $T$ has probability density function
$f(t) = c_n\left(1+\frac{t^2}{n}\right)^{-(n+1)/2} \quad \text{for } t \in \Re;\ n=1,2,\dots$
Since $f(-t) = f(t)$, $f$ is an even function whose graph is symmetric about the $y$ axis, so if $E(|T|)$ exists then, by symmetry, $E(T)=0$. To determine when $E(|T|)$ exists we only need to determine, again by symmetry, for which $n$ the integral
$\int_0^{\infty} t\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt$
converges. There are two cases: $n=1$ and $n>1$. For $n=1$,
$\int_0^{\infty} t(1+t^2)^{-1}dt = \lim_{b\to\infty}\frac{1}{2}\ln(1+t^2)\Big|_0^b = \lim_{b\to\infty}\frac{1}{2}\ln(1+b^2) = \infty,$
and therefore $E(T)$ does not exist. For $n>1$,
$\int_0^{\infty} t\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt = \lim_{b\to\infty}\left[-\frac{n}{n-1}\left(1+\frac{t^2}{n}\right)^{-(n-1)/2}\right]_0^b = \frac{n}{n-1},$
and the integral converges. Therefore $E(T)=0$ for $n>1$.
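As an illustration (not part of the original solution) of why the mean fails to exist for $n=1$: the running sample mean of t$(1)$ (Cauchy) observations never settles down, while for t$(3)$ it converges to $0$. The sketch below uses numpy's t generator with arbitrary sample sizes.

```python
import numpy as np

# Illustration of Problem 8(a): the t(1) (Cauchy) mean does not exist, so the
# running sample mean does not settle down; for t(3) it converges to 0.
rng = np.random.default_rng(3)
n = 10**6
for df in (1, 3):
    x = rng.standard_t(df, n)
    running_mean = np.cumsum(x) / np.arange(1, n + 1)
    # running mean after 10^3, 10^5 and 10^6 observations
    print(df, running_mean[[999, 99999, n - 1]])
```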
8.(b) To determine whether $Var(T) = E(T^2)$ (since $E(T)=0$) exists we need to determine for which $n$ the integral
$\int_0^{\infty} t^2\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt = \int_0^{\sqrt{n}} t^2\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt + \int_{\sqrt{n}}^{\infty} t^2\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt$
converges. The first integral is finite since it is the integral of a finite function over the finite interval $[0,\sqrt{n}]$. For the second, letting $y=t/\sqrt{n}$ gives
$\int_{\sqrt{n}}^{\infty} t^2\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt = n^{3/2}\int_1^{\infty} y^2(1+y^2)^{-(n+1)/2}dy. \qquad (10.9)$
For $n=1$, $\frac{y^2}{1+y^2} \ge \frac{y^2}{y^2+y^2} = \frac{1}{2}$ for $y \ge 1$ and $\int_1^{\infty}\frac{1}{2}dy$ diverges, so by the Comparison Test for Improper Integrals $(10.9)$ diverges for $n=1$. (For $n=1$ we could also argue that $Var(T)$ does not exist since $E(T)$ does not exist.) For $n=2$, $\frac{y^2}{(1+y^2)^{3/2}} \ge \frac{y^2}{(y^2+y^2)^{3/2}} = \frac{1}{2^{3/2}y}$ for $y \ge 1$ and $\frac{1}{2^{3/2}}\int_1^{\infty}\frac{dy}{y}$ diverges, so $(10.9)$ diverges for $n=2$.

Now for $n>2$,
$E(T^2) = c_n\int_{-\infty}^{\infty} t^2\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt = 2c_n\int_0^{\infty} t^2\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt$
since the integrand is an even function. Integrate by parts using $u=t$, $dv = t\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}dt$, so $du = dt$ and $v = -\frac{n}{n-1}\left(1+\frac{t^2}{n}\right)^{-(n-1)/2}$. The boundary term $\lim_{b\to\infty} b\left(1+\frac{b^2}{n}\right)^{-(n+1)/2} = 0$ by L'Hopital's Rule, so
$E(T^2) = c_n\,\frac{n}{n-1}\int_{-\infty}^{\infty}\left(1+\frac{t^2}{n}\right)^{-(n-1)/2}dt,$
where we have used symmetry on the remaining integral. Substituting $\frac{y}{\sqrt{n-2}} = \frac{t}{\sqrt{n}}$ gives
$E(T^2) = c_n\,\frac{n}{n-1}\sqrt{\frac{n}{n-2}}\int_{-\infty}^{\infty}\left(1+\frac{y^2}{n-2}\right)^{-((n-2)+1)/2}dy = \frac{c_n}{c_{n-2}}\cdot\frac{n}{n-1}\sqrt{\frac{n}{n-2}},$
since the integrand is, apart from the constant $c_{n-2}$, the p.d.f. of a t$(n-2)$ random variable. Finally
$\frac{c_n}{c_{n-2}} = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\sqrt{n\pi}}\cdot\frac{\Gamma\!\left(\frac{n-2}{2}\right)\sqrt{(n-2)\pi}}{\Gamma\!\left(\frac{n-1}{2}\right)} = \frac{n-1}{n-2}\sqrt{\frac{n-2}{n}},$
using $\Gamma\!\left(\frac{n+1}{2}\right) = \frac{n-1}{2}\Gamma\!\left(\frac{n-1}{2}\right)$ and $\Gamma\!\left(\frac{n}{2}\right) = \frac{n-2}{2}\Gamma\!\left(\frac{n-2}{2}\right)$. Therefore for $n>2$
$Var(T) = E(T^2) = \frac{n}{n-1}\sqrt{\frac{n}{n-2}}\cdot\frac{n-1}{n-2}\sqrt{\frac{n-2}{n}} = \frac{n}{n-2}.$

9.(a) To find $E(X^k)$ we first note that since
$f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1} \quad \text{for } 0<x<1$
and 0 otherwise integrates to one,
$\int_0^1 x^{a-1}(1-x)^{b-1}dx = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} \quad \text{for } a>0,\ b>0. \qquad (10.10)$
Therefore
$E(X^k) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\int_0^1 x^{a+k-1}(1-x)^{b-1}dx = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\cdot\frac{\Gamma(a+k)\Gamma(b)}{\Gamma(a+b+k)} = \frac{\Gamma(a+b)\Gamma(a+k)}{\Gamma(a)\Gamma(a+b+k)} \quad \text{for } k=1,2,\dots$
by $(10.10)$. For $k=1$, $E(X) = \frac{a}{a+b}$. For $k=2$, $E(X^2) = \frac{a(a+1)}{(a+b+1)(a+b)}$. Therefore
$Var(X) = E(X^2) - [E(X)]^2 = \frac{a(a+1)}{(a+b+1)(a+b)} - \left(\frac{a}{a+b}\right)^2 = \frac{ab}{(a+b)^2(a+b+1)}.$

9.(b) [Figure 10.7: Graphs of Beta probability density functions for $(a,b) = (1,3),\ (3,1),\ (0.7,0.7),\ (2,4),\ (2,2)$.]

9.(c) If $a=b=1$ then $f(x)=1$ for $0<x<1$ and 0 otherwise. This is the Uniform$(0,1)$ probability density function.

10. We prove this result assuming $X$ is a continuous random variable; the proof for a discrete random variable follows in a similar manner with integrals replaced by sums. Suppose $X$ has probability density function $f(x)$ and $E(|X|^k)$ exists for some integer $k>1$, so that the improper integral $\int_{-\infty}^{\infty}|x|^k f(x)\,dx$ converges. Let $A = \{x : |x| \le 1\}$ and let $\bar{A}$ be its complement. Then
$\int_{-\infty}^{\infty}|x|^k f(x)\,dx = \int_A |x|^k f(x)\,dx + \int_{\bar{A}}|x|^k f(x)\,dx.$
Since $|x|^k f(x) \le f(x)$ for $x \in A$,
$0 \le \int_A |x|^k f(x)\,dx \le \int_A f(x)\,dx = P(X \in A) \le 1. \qquad (10.11)$
Convergence of $\int_{-\infty}^{\infty}|x|^k f(x)\,dx$ and $(10.11)$ imply the convergence of $\int_{\bar{A}}|x|^k f(x)\,dx$. Now for $j=1,2,\dots,k-1$,
$\int_{-\infty}^{\infty}|x|^j f(x)\,dx = \int_A |x|^j f(x)\,dx + \int_{\bar{A}}|x|^j f(x)\,dx, \qquad (10.12)$
and $0 \le \int_A |x|^j f(x)\,dx \le 1$ by the same argument as in $(10.11)$. Since $\int_{\bar{A}}|x|^k f(x)\,dx$ converges and $|x|^j f(x) \le |x|^k f(x)$ for $x \in \bar{A}$, $j=1,2,\dots,k-1$, by the Comparison Theorem for Improper Integrals both integrals on the right side of $(10.12)$ exist, and therefore $E(|X|^j) = \int_{-\infty}^{\infty}|x|^j f(x)\,dx$ exists for $j=1,2,\dots,k-1$.

11. If $X \sim$ Binomial$(n,\theta)$ then $E(X) = n\theta$ and $Var(X) = n\theta(1-\theta)$. Let $W = X/n$; then $E(W) = \theta$ and $Var(W) = \frac{\theta(1-\theta)}{n}$. From the result in Section 2.9 we wish to find a transformation $Y = g(W) = g(X/n)$ such that $Var(Y) \approx$ constant; we need $g$ such that
$\frac{dg}{d\theta} = \frac{k}{\sqrt{\theta(1-\theta)}}$
where $k$ is chosen for convenience, that is, we need to solve the separable differential equation
$\int dg = k\int\frac{d\theta}{\sqrt{\theta(1-\theta)}}. \qquad (10.13)$
Since
$\frac{d}{dx}\arcsin\sqrt{x} = \frac{1}{\sqrt{1-x}}\cdot\frac{1}{2\sqrt{x}} = \frac{1}{2\sqrt{x(1-x)}},$
the solution to $(10.13)$ is $g(\theta) = 2k\arcsin(\sqrt{\theta}) + C$. Letting $k=1/2$ and $C=0$ we have $g(\theta) = \arcsin(\sqrt{\theta})$. Therefore if $X \sim$ Binomial$(n,\theta)$ and $Y = \arcsin(\sqrt{X/n})$ then
$Var(Y) = Var\!\left[\arcsin\!\left(\sqrt{X/n}\right)\right] \approx \text{constant}.$
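As a quick check (not part of the original solution) of the variance-stabilizing transformation in Problem 11, the sketch below compares $n\,Var(X/n) = \theta(1-\theta)$, which changes with $\theta$, to $4n\,Var[\arcsin(\sqrt{X/n})]$, which should stay near 1 for every $\theta$. The values of $n$, $\theta$ and the replication count are arbitrary.

```python
import numpy as np

# Check of Problem 11: Var[arcsin(sqrt(X/n))] is roughly 1/(4n) for every theta,
# while Var(X/n) = theta(1-theta)/n changes with theta.
rng = np.random.default_rng(4)
n, N = 100, 10**5
for theta in (0.1, 0.3, 0.5, 0.8):
    W = rng.binomial(n, theta, N) / n
    print(theta, W.var() * n, np.arcsin(np.sqrt(W)).var() * 4 * n)
    # first column varies with theta, second stays near 1
```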
13.(b) If $X \sim$ Poisson$(\theta)$ then
$M(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx}\,\frac{\theta^x e^{-\theta}}{x!} = e^{-\theta}\sum_{x=0}^{\infty}\frac{(\theta e^t)^x}{x!} = e^{\theta(e^t-1)} \quad \text{for } t \in \Re$
by the Exponential Series. Then
$M'(t) = e^{\theta(e^t-1)}\theta e^t,\qquad E(X) = M'(0) = \theta,$
$M''(t) = e^{\theta(e^t-1)}(\theta e^t)^2 + e^{\theta(e^t-1)}\theta e^t,\qquad E(X^2) = M''(0) = \theta^2 + \theta,$
and $Var(X) = E(X^2) - [E(X)]^2 = \theta$.

13.(c) For the two-parameter exponential distribution with probability density function $f(x) = \frac{1}{\theta}e^{-(x-\mu)/\theta}$ for $x>\mu$,
$M(t) = E(e^{tX}) = \int_{\mu}^{\infty}e^{tx}\,\frac{1}{\theta}e^{-(x-\mu)/\theta}dx,$
which converges for $\frac{1}{\theta}-t>0$, that is, $t<\frac{1}{\theta}$. Letting $y = (x-\mu)\!\left(\frac{1}{\theta}-t\right)$ gives
$M(t) = \frac{e^{\mu t}}{\theta\!\left(\frac{1}{\theta}-t\right)}\int_0^{\infty}e^{-y}dy = \frac{e^{\mu t}}{1-\theta t} \quad \text{for } t<\frac{1}{\theta}.$
Then
$M'(t) = \frac{e^{\mu t}[\mu(1-\theta t)+\theta]}{(1-\theta t)^2},\qquad E(X) = M'(0) = \mu+\theta,$
and a similar computation gives $E(X^2) = M''(0) = (\mu+\theta)^2 + \theta^2$, so
$Var(X) = E(X^2) - [E(X)]^2 = \theta^2.$

13.(d) For the Double Exponential$(\mu,1)$ distribution with $f(x) = \frac{1}{2}e^{-|x-\mu|}$ for $x \in \Re$,
$M(t) = E(e^{tX}) = \frac{1}{2}\left[\int_{-\infty}^{\mu}e^{tx}e^{x-\mu}dx + \int_{\mu}^{\infty}e^{tx}e^{-(x-\mu)}dx\right] = \frac{1}{2}\left[\frac{e^{\mu t}}{t+1} + \frac{e^{\mu t}}{1-t}\right] \quad \text{for } t+1>0 \text{ and } 1-t>0$
$= \frac{e^{\mu t}}{1-t^2} \quad \text{for } t \in (-1,1).$
Then
$M'(t) = \frac{e^{\mu t}[\mu(1-t^2)+2t]}{(1-t^2)^2},\qquad E(X) = M'(0) = \mu,$
$E(X^2) = M''(0) = 2+\mu^2,$ and $Var(X) = E(X^2) - [E(X)]^2 = 2$.

13.(e) For $f(x) = 2x$ on $0<x<1$,
$M(t) = E(e^{tX}) = \int_0^1 2xe^{tx}dx.$
Since $\int xe^{tx}dx = \frac{x}{t}e^{tx} - \frac{1}{t^2}e^{tx} + C$ for $t \ne 0$,
$M(t) = 2\left[\frac{1}{t}e^{t} - \frac{1}{t^2}e^{t} + \frac{1}{t^2}\right] = \frac{2[(t-1)e^t+1]}{t^2} \quad \text{for } t \ne 0.$
For $t=0$, $M(0) = E(1) = 1$. Therefore
$M(t) = \begin{cases} 1 & \text{if } t=0 \\ \dfrac{2[(t-1)e^t+1]}{t^2} & \text{if } t \ne 0. \end{cases}$
Note that
$\lim_{t\to 0}\frac{2[(t-1)e^t+1]}{t^2} = \lim_{t\to 0}\frac{2te^t}{2t} = \lim_{t\to 0}e^t = 1 = M(0)$
(two applications of L'Hopital's Rule on the $\frac{0}{0}$ indeterminate form), so $M(t)$ exists and is continuous for all $t \in \Re$. Using the Exponential Series we have, for $t \ne 0$,
$\frac{2[(t-1)e^t+1]}{t^2} = \frac{2}{t^2}\left[(t-1)\sum_{i=0}^{\infty}\frac{t^i}{i!}+1\right] = \frac{2}{t^2}\sum_{i=1}^{\infty}\left[\frac{1}{i!}-\frac{1}{(i+1)!}\right]t^{i+1} = 2\sum_{i=0}^{\infty}\left[\frac{1}{(i+1)!}-\frac{1}{(i+2)!}\right]t^{i},$
and since this expression equals 1 at $t=0$, it is the Maclaurin series representation of $M(t)$ for all $t \in \Re$. Since $E(X^k) = k!\times$ (the coefficient of $t^k$ in the Maclaurin series of $M(t)$),
$E(X) = (1!)(2)\left[\frac{1}{2!}-\frac{1}{3!}\right] = \frac{2}{3},\qquad E(X^2) = (2!)(2)\left[\frac{1}{3!}-\frac{1}{4!}\right] = \frac{1}{2},$
and therefore $Var(X) = E(X^2) - [E(X)]^2 = \frac{1}{2}-\frac{4}{9} = \frac{1}{18}$.
Alternatively $E(X) = M'(0)$ can be found using the limit definition of the derivative:
$M'(0) = \lim_{t\to 0}\frac{M(t)-M(0)}{t} = \lim_{t\to 0}\frac{2[(t-1)e^t+1]-t^2}{t^3} = \frac{2}{3} \quad \text{by L'Hopital's Rule},$
and similarly $E(X^2) = M''(0) = \lim_{t\to 0}\frac{M'(t)-M'(0)}{t}$, where $M'(t) = \frac{d}{dt}\!\left(\frac{2[(t-1)e^t+1]}{t^2}\right)$ for $t \ne 0$.
13.(f) For $f(x) = x$ on $0<x\le 1$ and $f(x) = 2-x$ on $1<x<2$ (and 0 otherwise),
$M(t) = E(e^{tX}) = \int_0^1 xe^{tx}dx + \int_1^2(2-x)e^{tx}dx.$
Using $\int xe^{tx}dx = \frac{x}{t}e^{tx}-\frac{1}{t^2}e^{tx}+C$ for $t \ne 0$ gives
$M(t) = \frac{e^{2t}-2e^t+1}{t^2} = \left(\frac{e^t-1}{t}\right)^2 \quad \text{for } t \ne 0,$
and $M(0) = E(1) = 1$. Therefore
$M(t) = \begin{cases} 1 & \text{if } t=0 \\ \dfrac{e^{2t}-2e^t+1}{t^2} & \text{if } t \ne 0. \end{cases}$
Note that
$\lim_{t\to 0}\frac{e^{2t}-2e^t+1}{t^2} = \lim_{t\to 0}\frac{2e^{2t}-2e^t}{2t} = \lim_{t\to 0}\frac{4e^{2t}-2e^t}{2} = 1 = M(0)$
(two applications of L'Hopital's Rule), so $M(t)$ exists and is continuous for $t \in \Re$. Using the Exponential Series we have, for $t \ne 0$,
$\frac{e^{2t}-2e^t+1}{t^2} = \frac{1}{t^2}\left[\left(1+2t+\frac{(2t)^2}{2!}+\frac{(2t)^3}{3!}+\cdots\right) - 2\left(1+t+\frac{t^2}{2!}+\frac{t^3}{3!}+\cdots\right)+1\right] = 1+t+\frac{7}{12}t^2+\cdots \qquad (10.14)$
and since this equals 1 at $t=0$, $(10.14)$ is the Maclaurin series representation of $M(t)$ for $t \in \Re$. Since $E(X^k) = k!\times$ (the coefficient of $t^k$),
$E(X) = 1!\cdot 1 = 1,\qquad E(X^2) = 2!\cdot\frac{7}{12} = \frac{7}{6},\qquad Var(X) = E(X^2)-[E(X)]^2 = \frac{7}{6}-1 = \frac{1}{6}.$
Alternatively $E(X) = M'(0)$ can be found using the limit definition of the derivative,
$M'(0) = \lim_{t\to 0}\frac{M(t)-M(0)}{t} = \lim_{t\to 0}\frac{e^{2t}-2e^t+1-t^2}{t^3} = 1,$
and similarly $E(X^2) = M''(0) = \lim_{t\to 0}\frac{M'(t)-M'(0)}{t}$ with $M'(t) = \frac{d}{dt}\!\left(\frac{e^{2t}-2e^t+1}{t^2}\right)$ for $t \ne 0$.

14.(a) Let $K(t) = \log M(t)$. Then
$K'(t) = \frac{M'(t)}{M(t)},\qquad K'(0) = \frac{M'(0)}{M(0)} = \frac{E(X)}{1} = E(X) \quad \text{since } M(0)=1,$
and
$K''(t) = \frac{M(t)M''(t)-[M'(t)]^2}{[M(t)]^2},\qquad K''(0) = \frac{M(0)M''(0)-[M'(0)]^2}{[M(0)]^2} = E(X^2)-[E(X)]^2 = Var(X).$

14.(b) If $X \sim$ Negative Binomial$(k,p)$ then
$M(t) = \left(\frac{p}{1-qe^t}\right)^k \quad \text{for } t<-\log q,\ q=1-p.$
Therefore
$K(t) = \log M(t) = k\log p - k\log(1-qe^t),$
$K'(t) = \frac{kqe^t}{1-qe^t},\qquad E(X) = K'(0) = \frac{kq}{1-q} = \frac{kq}{p},$
$K''(t) = kq\left[\frac{e^t(1-qe^t)+qe^{2t}}{(1-qe^t)^2}\right] = \frac{kqe^t}{(1-qe^t)^2},\qquad Var(X) = K''(0) = \frac{kq}{(1-q)^2} = \frac{kq}{p^2}.$

15.(b)
$M(t) = \frac{1+t}{1-t} = (1+t)\sum_{k=0}^{\infty}t^k \quad \text{for } |t|<1 \text{ by the Geometric Series} = \sum_{k=0}^{\infty}t^k + \sum_{k=0}^{\infty}t^{k+1} = 1 + \sum_{k=1}^{\infty}2t^k. \qquad (10.15)$
Since
$M(t) = \sum_{k=0}^{\infty}\frac{M^{(k)}(0)}{k!}t^k = \sum_{k=0}^{\infty}\frac{E(X^k)}{k!}t^k, \qquad (10.16)$
matching coefficients in the two series $(10.15)$ and $(10.16)$ gives $\frac{E(X^k)}{k!} = 2$, that is, $E(X^k) = 2\,k!$ for $k=1,2,\dots$

15.(c)
$M(t) = \frac{e^t}{1-t^2} = \left(\sum_{i=0}^{\infty}\frac{t^i}{i!}\right)\left(\sum_{k=0}^{\infty}t^{2k}\right) \quad \text{for } |t|<1 \text{ by the Geometric Series}$
$= \left(1+\frac{t}{1!}+\frac{t^2}{2!}+\frac{t^3}{3!}+\cdots\right)\left(1+t^2+t^4+\cdots\right) = 1+\frac{1}{1!}t+\left(1+\frac{1}{2!}\right)t^2+\left(\frac{1}{1!}+\frac{1}{3!}\right)t^3+\left(1+\frac{1}{2!}+\frac{1}{4!}\right)t^4+\left(\frac{1}{1!}+\frac{1}{3!}+\frac{1}{5!}\right)t^5+\cdots$
for $|t|<1$. Matching coefficients with $M(t) = \sum_{k=0}^{\infty}\frac{E(X^k)}{k!}t^k$ gives
$E(X^{2k}) = (2k)!\sum_{i=0}^{k}\frac{1}{(2i)!} \quad \text{and} \quad E(X^{2k+1}) = (2k+1)!\sum_{i=0}^{k}\frac{1}{(2i+1)!} \quad \text{for } k=1,2,\dots$

16.(a)
$M_Y(t) = E(e^{tY}) = E(e^{t|Z|}) = \int_{-\infty}^{\infty}e^{t|z|}\frac{1}{\sqrt{2\pi}}e^{-z^2/2}dz = 2\int_0^{\infty}e^{tz}\frac{1}{\sqrt{2\pi}}e^{-z^2/2}dz \quad \text{since } e^{t|z|}e^{-z^2/2} \text{ is an even function}$
$= 2\int_0^{\infty}\frac{1}{\sqrt{2\pi}}e^{-(z^2-2zt)/2}dz = 2e^{t^2/2}\int_0^{\infty}\frac{1}{\sqrt{2\pi}}e^{-(z-t)^2/2}dz = 2e^{t^2/2}\int_{-t}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-y^2/2}dy \quad \text{(letting } y=z-t\text{)}$
$= 2e^{t^2/2}\,\Phi(t) \quad \text{for } t \in \Re,$
where $\Phi$ is the N$(0,1)$ cumulative distribution function.
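As a numerical check (not part of the original solution) of the cumulant calculation in 14(b): numpy's negative binomial generator counts failures before the $k$-th success, which matches the parameterization with MGF $[p/(1-qe^t)]^k$, so the sample mean and variance should be close to $kq/p$ and $kq/p^2$. The values of $k$ and $p$ are arbitrary.

```python
import numpy as np

# Check of Problem 14(b): for X ~ Negative Binomial(k, p) (failures before the
# k-th success), E(X) = kq/p and Var(X) = kq/p^2 with q = 1 - p.
rng = np.random.default_rng(5)
k, p, N = 5, 0.4, 10**6
q = 1 - p
X = rng.negative_binomial(k, p, N)
print(X.mean(), k * q / p)        # both near 7.5
print(X.var(), k * q / p ** 2)    # both near 18.75
```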
16.(b) To find $E(Y) = E(|Z|)$ we first note that $\Phi(0) = \frac{1}{2}$ and $\frac{d}{dt}\Phi(t) = \phi(t) = \frac{1}{\sqrt{2\pi}}e^{-t^2/2}$, so $\phi(0) = \frac{1}{\sqrt{2\pi}}$. Then
$\frac{d}{dt}M_Y(t) = \frac{d}{dt}\left[2e^{t^2/2}\Phi(t)\right] = 2te^{t^2/2}\Phi(t) + 2e^{t^2/2}\phi(t),$
so
$E(Y) = E(|Z|) = \frac{d}{dt}M_Y(t)\Big|_{t=0} = 0 + \frac{2}{\sqrt{2\pi}} = \sqrt{\frac{2}{\pi}}.$
To find $Var(Y) = Var(|Z|)$ we note that $\phi'(t) = -t\phi(t)$ and $\phi'(0)=0$. Therefore
$\frac{d^2}{dt^2}M_Y(t) = 2\frac{d}{dt}\left\{e^{t^2/2}[t\Phi(t)+\phi(t)]\right\} = 2\left\{te^{t^2/2}[t\Phi(t)+\phi(t)] + e^{t^2/2}[\Phi(t)+t\phi(t)+\phi'(t)]\right\}$
and
$E(Y^2) = \frac{d^2}{dt^2}M_Y(t)\Big|_{t=0} = 2[0+\Phi(0)+0+\phi'(0)] = 2\Phi(0) = 1.$
Therefore
$Var(Y) = Var(|Z|) = E(Y^2)-[E(Y)]^2 = 1-\frac{2}{\pi}.$

18. Since
$M_X(t) = E(e^{tX}) = \sum_{j=0}^{\infty}e^{jt}p_j \quad \text{for } |t|<h,\ h>0,$
we have
$M_X(\log s) = \sum_{j=0}^{\infty}e^{j\log s}p_j = \sum_{j=0}^{\infty}s^j p_j \quad \text{for } |\log s|<h,$
which is a power series in $s$. Similarly $M_Y(\log s) = \sum_{j=0}^{\infty}s^j q_j$ for $|\log s|<h$, which is also a power series in $s$. We are given that $M_X(t) = M_Y(t)$ for $|t|<h$, $h>0$. Therefore $M_X(\log s) = M_Y(\log s)$ for $|\log s|<h$ and
$\sum_{j=0}^{\infty}s^j p_j = \sum_{j=0}^{\infty}s^j q_j \quad \text{for } |\log s|<h.$
Since two power series are equal if and only if their coefficients are all equal, $p_j = q_j$, $j=0,1,\dots$, and therefore $X$ and $Y$ have the same distribution.

19.(a) Since $X$ has moment generating function $M(t) = \frac{e^t}{1-t^2}$ for $|t|<1$, the moment generating function of $Y = (X-1)/2$ is
$M_Y(t) = E(e^{tY}) = E\!\left[e^{t(X-1)/2}\right] = e^{-t/2}E\!\left[e^{(t/2)X}\right] = e^{-t/2}M\!\left(\frac{t}{2}\right) = \frac{e^{-t/2}e^{t/2}}{1-\frac{t^2}{4}} = \frac{1}{1-\frac{1}{4}t^2} \quad \text{for } \left|\frac{t}{2}\right|<1, \text{ i.e. } |t|<2.$

19.(b)
$M_Y'(t) = \frac{t}{2}\left(1-\frac{1}{4}t^2\right)^{-2},\qquad E(Y) = M_Y'(0) = 0,$
$M_Y''(t) = \frac{1}{2}\left(1-\frac{1}{4}t^2\right)^{-2} + \frac{t^2}{2}\left(1-\frac{1}{4}t^2\right)^{-3},\qquad E(Y^2) = M_Y''(0) = \frac{1}{2},$
so $Var(Y) = E(Y^2)-[E(Y)]^2 = \frac{1}{2}$.

19.(c) Since the moment generating function of a Double Exponential$(\mu,\theta)$ random variable is
$M(t) = \frac{e^{\mu t}}{1-\theta^2 t^2} \quad \text{for } |t|<\frac{1}{\theta},$
and the moment generating function of $Y$ is $M_Y(t) = \frac{1}{1-\frac{1}{4}t^2}$ for $|t|<2$, by the Uniqueness Theorem for Moment Generating Functions $Y$ has a Double Exponential$\left(0,\frac{1}{2}\right)$ distribution.

10.2 Chapter 3

1.(a)
$\frac{1}{k} = \sum_{x=0}^{\infty}\sum_{y=0}^{\infty}q^2 p^{x+y} = q^2\left(\sum_{y=0}^{\infty}p^y\right)\left(\sum_{x=0}^{\infty}p^x\right) = q^2\cdot\frac{1}{1-p}\cdot\frac{1}{1-p} \quad \text{by the Geometric Series, since } 0<p<1$
$= \frac{q^2}{q^2} = 1 \quad \text{since } q=1-p.$
Therefore $k=1$.

1.(b) The marginal probability function of $X$ is
$f_1(x) = P(X=x) = \sum_y f(x,y) = q^2p^x\sum_{y=0}^{\infty}p^y = \frac{q^2p^x}{1-p} = qp^x \quad \text{for } x=0,1,\dots$
by the Geometric Series. By symmetry the marginal probability function of $Y$ is $f_2(y) = qp^y$ for $y=0,1,\dots$ The support set of $(X,Y)$ is $A = \{(x,y): x=0,1,\dots;\ y=0,1,\dots\}$. Since $f(x,y) = f_1(x)f_2(y)$ for $(x,y) \in A$, $X$ and $Y$ are independent random variables.

1.(c)
$P(X=x \mid X+Y=t) = \frac{P(X=x,\ X+Y=t)}{P(X+Y=t)} = \frac{P(X=x,\ Y=t-x)}{P(X+Y=t)}.$
Now
$P(X+Y=t) = \sum_{(x,y):\,x+y=t}q^2p^{x+y} = q^2p^t\sum_{x=0}^{t}1 = q^2p^t(t+1) \quad \text{for } t=0,1,\dots$
Therefore
$P(X=x \mid X+Y=t) = \frac{q^2p^x p^{t-x}}{q^2p^t(t+1)} = \frac{1}{t+1} \quad \text{for } x=0,1,\dots,t.$

2.(a)
$f(x,y) = \frac{e^{-2}}{x!(y-x)!} \quad \text{for } x=0,1,\dots,y;\ y=0,1,\dots$
or equivalently for $y=x,x+1,\dots$; $x=0,1,\dots$ The marginal probability function of $X$ is
$f_1(x) = \sum_y f(x,y) = \sum_{y=x}^{\infty}\frac{e^{-2}}{x!(y-x)!} = \frac{e^{-2}}{x!}\sum_{k=0}^{\infty}\frac{1}{k!} \quad \text{(let } k=y-x\text{)} = \frac{e^{-2}e^{1}}{x!} = \frac{e^{-1}}{x!} \quad \text{for } x=0,1,\dots$
by the Exponential Series. Note that $X \sim$ Poisson$(1)$. The marginal probability function of $Y$ is
$f_2(y) = \sum_x f(x,y) = \sum_{x=0}^{y}\frac{e^{-2}}{x!(y-x)!} = \frac{e^{-2}}{y!}\sum_{x=0}^{y}\binom{y}{x} = \frac{e^{-2}}{y!}(1+1)^y = \frac{2^y e^{-2}}{y!} \quad \text{for } y=0,1,\dots$
by the Binomial Series. Note that $Y \sim$ Poisson$(2)$.
2.(b) Since (for example)
$f(1,2) = e^{-2} \ne f_1(1)f_2(2) = e^{-1}\cdot\frac{2^2e^{-2}}{2!} = 2e^{-3},$
$X$ and $Y$ are not independent random variables. Alternatively, the support set of $X$ is $A_1 = \{x: x=0,1,\dots\}$, the support set of $Y$ is $A_2 = \{y: y=0,1,\dots\}$, and the support set of $(X,Y)$ is $A = \{(x,y): x=0,1,\dots,y;\ y=0,1,\dots\}$. Since $A \ne A_1\times A_2$, by the Factorization Theorem for Independence $X$ and $Y$ are not independent random variables.

3.(a) The support set of $(X,Y)$ is
$A = \{(x,y): 0<y<1-x^2,\ -1<x<1\} = \{(x,y): -\sqrt{1-y}<x<\sqrt{1-y},\ 0<y<1\}.$
[Figure 10.8: Support set for Problem 3(a).]
$1 = k\int_{-1}^{1}\int_0^{1-x^2}(x^2+y)\,dy\,dx = k\int_{-1}^{1}\left[x^2(1-x^2)+\frac{(1-x^2)^2}{2}\right]dx = \frac{k}{2}\int_{-1}^{1}(1-x^4)\,dx = \frac{k}{2}\cdot\frac{8}{5} = \frac{4k}{5},$
and thus $k = \frac{5}{4}$. Therefore $f(x,y) = \frac{5}{4}(x^2+y)$ for $(x,y) \in A$.

3.(b) The marginal probability density function of $X$ is
$f_1(x) = \frac{5}{4}\int_0^{1-x^2}(x^2+y)\,dy = \frac{5}{4}\left[x^2(1-x^2)+\frac{(1-x^2)^2}{2}\right] = \frac{5}{8}(1-x^4) \quad \text{for } -1<x<1$
and 0 otherwise; the support set of $X$ is $A_1 = \{x: -1<x<1\}$. The marginal probability density function of $Y$ is
$f_2(y) = \frac{5}{4}\int_{-\sqrt{1-y}}^{\sqrt{1-y}}(x^2+y)\,dx = \frac{5}{2}\int_0^{\sqrt{1-y}}(x^2+y)\,dx \quad \text{because of symmetry}$
$= \frac{5}{2}\left[\frac{(1-y)^{3/2}}{3}+y(1-y)^{1/2}\right] = \frac{5}{6}(1-y)^{1/2}(1+2y) \quad \text{for } 0<y<1$
and 0 otherwise; the support set of $Y$ is $A_2 = \{y: 0<y<1\}$.

3.(c) The support set $A$ of $(X,Y)$ is not rectangular. To show that $X$ and $Y$ are not independent random variables we only need to find $x\in A_1$ and $y\in A_2$ such that $(x,y)\notin A$. Let $x=\frac{3}{4}$ and $y=\frac{1}{2}$. Since
$f\!\left(\tfrac{3}{4},\tfrac{1}{2}\right) = 0 \ne f_1\!\left(\tfrac{3}{4}\right)f_2\!\left(\tfrac{1}{2}\right) > 0,$
$X$ and $Y$ are not independent random variables.

3.(d) [Figure 10.9: Region of integration for Problem 3(d).] The line $y=x+1$ meets $y=1-x^2$ at $x=-1$ and $x=0$, so the region $C$ where $Y>X+1$ inside $A$ is $\{(x,y): x+1<y<1-x^2,\ -1<x<0\}$. Therefore
$P(Y\le X+1) = 1 - \int\!\!\int_C \frac{5}{4}(x^2+y)\,dy\,dx = 1 - \frac{5}{8}\int_{-1}^{0}\left[2x^2(1-x^2)+(1-x^2)^2-2x^2(x+1)-(x+1)^2\right]dx$
$= 1 - \frac{5}{8}\int_{-1}^{0}\left(-x^4-2x^3-3x^2-2x\right)dx = 1 - \frac{5}{8}\cdot\frac{3}{10} = 1-\frac{3}{16} = \frac{13}{16}.$

4.(a) The support set of $(X,Y)$ is
$A = \{(x,y): x^2<y<1,\ -1<x<1\} = \{(x,y): -\sqrt{y}<x<\sqrt{y},\ 0<y<1\}.$
[Figure 10.10: Support set for Problem 4(a).]
$1 = k\int_{-1}^{1}\int_{x^2}^{1}x^2y\,dy\,dx = \frac{k}{2}\int_{-1}^{1}x^2(1-x^4)\,dx = \frac{k}{2}\left(\frac{2}{3}-\frac{2}{7}\right) = \frac{4k}{21}.$
Therefore $k = \frac{21}{4}$ and $f(x,y) = \frac{21}{4}x^2y$ for $(x,y)\in A$, 0 otherwise.

4.(b) The marginal probability density function of $X$ is
$f_1(x) = \frac{21}{4}x^2\int_{x^2}^{1}y\,dy = \frac{21x^2}{8}\left(1-x^4\right) \quad \text{for } -1<x<1$
and 0 otherwise; the support set of $X$ is $A_1 = \{x: -1<x<1\}$. The marginal probability density function of $Y$ is
$f_2(y) = \frac{21}{4}y\int_{-\sqrt{y}}^{\sqrt{y}}x^2\,dx = \frac{21}{2}y\int_0^{\sqrt{y}}x^2\,dx \quad \text{because of symmetry} = \frac{7}{2}y^{5/2} \quad \text{for } 0<y<1$
and 0 otherwise; the support set of $Y$ is $A_2 = \{y: 0<y<1\}$. The support set $A$ of $(X,Y)$ is not rectangular.
To show that $X$ and $Y$ are not independent random variables we only need to find $x\in A_1$ and $y\in A_2$ such that $(x,y)\notin A$. Let $x=\frac{1}{2}$ and $y=\frac{1}{10}$. Since
$f\!\left(\tfrac{1}{2},\tfrac{1}{10}\right) = 0 \ne f_1\!\left(\tfrac{1}{2}\right)f_2\!\left(\tfrac{1}{10}\right) > 0,$
$X$ and $Y$ are not independent random variables.

4.(c) [Figure 10.11: Region of integration for Problem 4(c).]
$P(X\ge Y) = \int\!\!\int_B \frac{21}{4}x^2y\,dy\,dx \quad \text{where } B = \{(x,y): x^2<y<x,\ 0<x<1\}$
$= \frac{21}{8}\int_0^1 x^2(x^2-x^4)\,dx = \frac{21}{8}\left(\frac{1}{5}-\frac{1}{7}\right) = \frac{3}{20}.$

4.(d) The conditional probability density function of $X$ given $Y=y$ is
$f_1(x|y) = \frac{f(x,y)}{f_2(y)} = \frac{\frac{21}{4}x^2y}{\frac{7}{2}y^{5/2}} = \frac{3}{2}x^2y^{-3/2} \quad \text{for } -\sqrt{y}<x<\sqrt{y},\ 0<y<1$
and 0 otherwise. Check: $\int_{-\sqrt{y}}^{\sqrt{y}}\frac{3}{2}x^2y^{-3/2}\,dx = y^{-3/2}\cdot y^{3/2} = 1$. The conditional probability density function of $Y$ given $X=x$ is
$f_2(y|x) = \frac{f(x,y)}{f_1(x)} = \frac{2y}{1-x^4} \quad \text{for } x^2<y<1,\ -1<x<1$
and 0 otherwise. Check: $\frac{1}{1-x^4}\int_{x^2}^1 2y\,dy = \frac{1-x^4}{1-x^4} = 1$.

6.(d)(i) The support set of $(X,Y)$ is $A = \{(x,y): 0<x<y<1\}$. [Figure 10.12: Support set for Problem 6(d)(i).]
$1 = k\int_0^1\int_0^y(x+y)\,dx\,dy = k\int_0^1\left(\frac{y^2}{2}+y^2\right)dy = k\int_0^1\frac{3}{2}y^2\,dy = \frac{k}{2}.$
Therefore $k=2$ and $f(x,y) = 2(x+y)$ for $(x,y)\in A$, 0 otherwise.

6.(d)(ii) The marginal probability density function of $X$ is
$f_1(x) = \int_x^1 2(x+y)\,dy = 2x(1-x)+1-x^2 = 1+2x-3x^2 \quad \text{for } 0<x<1$
and 0 otherwise; $A_1 = \{x: 0<x<1\}$. The marginal probability density function of $Y$ is
$f_2(y) = \int_0^y 2(x+y)\,dx = y^2+2y^2 = 3y^2 \quad \text{for } 0<y<1$
and 0 otherwise; $A_2 = \{y: 0<y<1\}$.

6.(d)(iii) The conditional probability density function of $X$ given $Y=y$ is
$f_1(x|y) = \frac{2(x+y)}{3y^2} \quad \text{for } 0<x<y<1.$
Check: $\frac{1}{3y^2}\int_0^y 2(x+y)\,dx = \frac{3y^2}{3y^2} = 1$. The conditional probability density function of $Y$ given $X=x$ is
$f_2(y|x) = \frac{2(x+y)}{1+2x-3x^2} \quad \text{for } 0<x<y<1$
and 0 otherwise. Check: $\frac{1}{1+2x-3x^2}\int_x^1 2(x+y)\,dy = \frac{1+2x-3x^2}{1+2x-3x^2} = 1$.

6.(d)(iv)
$E(X|y) = \int_0^y x\,\frac{2(x+y)}{3y^2}\,dx = \frac{1}{3y^2}\left[\frac{2}{3}y^3+y^3\right] = \frac{5y}{9} \quad \text{for } 0<y<1,$
$E(Y|x) = \int_x^1 y\,\frac{2(x+y)}{1+2x-3x^2}\,dy = \frac{x(1-x^2)+\frac{2}{3}(1-x^3)}{1+2x-3x^2} = \frac{2+3x-5x^3}{3(1+2x-3x^2)} \quad \text{for } 0<x<1.$
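As a numerical check (not part of the original solution) of the conditional mean in 6(d)(iv), the sketch below draws from $f(x,y)=2(x+y)$ on the triangle $0<x<y<1$ by rejection sampling (accepting a uniform point with probability proportional to $x+y$) and compares $E(X\mid Y\approx y_0)$ with $5y_0/9$. The band width and sample size are arbitrary, so the agreement is only approximate.

```python
import numpy as np

# Rejection-sampling check of Problem 6(d): draw (X, Y) from f(x,y) = 2(x+y)
# on 0 < x < y < 1 and compare E(X | Y near y0) with 5*y0/9.
rng = np.random.default_rng(6)
N = 2 * 10**6
x = rng.uniform(0, 1, N)
y = rng.uniform(0, 1, N)
# keep points in the triangle, accepted with probability (x+y)/2 (proportional to f)
keep = (x < y) & (rng.uniform(0, 1, N) < (x + y) / 2)
x, y = x[keep], y[keep]
for y0 in (0.3, 0.6, 0.9):
    band = np.abs(y - y0) < 0.01
    print(y0, x[band].mean(), 5 * y0 / 9)
```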
6.(f)(i) The support set of $(X,Y)$ is
$A = \{(x,y): 0<y<1-x,\ 0<x<1\} = \{(x,y): 0<x<1-y,\ 0<y<1\}.$
[Figure 10.13: Support set for Problem 6(f)(i).]
$1 = k\int_0^1\int_0^{1-x}x^2y\,dy\,dx = \frac{k}{2}\int_0^1 x^2(1-x)^2\,dx = \frac{k}{2}\left(\frac{1}{3}-\frac{1}{2}+\frac{1}{5}\right) = \frac{k}{60}.$
Therefore $k=60$ and $f(x,y) = 60x^2y$ for $(x,y)\in A$, 0 otherwise.

6.(f)(ii) The marginal probability density function of $X$ is
$f_1(x) = 60x^2\int_0^{1-x}y\,dy = 30x^2(1-x)^2 \quad \text{for } 0<x<1$
and 0 otherwise; $A_1 = \{x: 0<x<1\}$. Note that $X \sim$ Beta$(3,3)$. The marginal probability density function of $Y$ is
$f_2(y) = 60y\int_0^{1-y}x^2\,dx = 20y(1-y)^3 \quad \text{for } 0<y<1$
and 0 otherwise; $A_2 = \{y: 0<y<1\}$. Note that $Y \sim$ Beta$(2,4)$.

6.(f)(iii) The conditional probability density function of $X$ given $Y=y$ is
$f_1(x|y) = \frac{60x^2y}{20y(1-y)^3} = \frac{3x^2}{(1-y)^3} \quad \text{for } 0<x<1-y,\ 0<y<1.$
Check: $\frac{1}{(1-y)^3}\int_0^{1-y}3x^2\,dx = \frac{(1-y)^3}{(1-y)^3} = 1$. The conditional probability density function of $Y$ given $X=x$ is
$f_2(y|x) = \frac{60x^2y}{30x^2(1-x)^2} = \frac{2y}{(1-x)^2} \quad \text{for } 0<y<1-x,\ 0<x<1$
and 0 otherwise. Check: $\frac{1}{(1-x)^2}\int_0^{1-x}2y\,dy = \frac{(1-x)^2}{(1-x)^2} = 1$.

6.(f)(iv)
$E(X|y) = \frac{3}{(1-y)^3}\int_0^{1-y}x^3\,dx = \frac{3}{4}(1-y) \quad \text{for } 0<y<1,$
$E(Y|x) = \frac{2}{(1-x)^2}\int_0^{1-x}y^2\,dy = \frac{2}{3}(1-x) \quad \text{for } 0<x<1.$

6.(g)(i) The support set of $(X,Y)$ is $A = \{(x,y): 0<y<x\}$. [Figure 10.14: Support set for Problem 6(g)(i).]
$1 = k\int_0^{\infty}\int_y^{\infty}e^{-x-2y}\,dx\,dy = k\int_0^{\infty}e^{-2y}\lim_{b\to\infty}\left[-e^{-x}\big|_y^b\right]dy = k\int_0^{\infty}e^{-3y}\,dy = \frac{k}{3},$
since $3e^{-3y}$ is the probability density function of an Exponential$\left(\frac{1}{3}\right)$ random variable and therefore integrates to one. Therefore $k=3$ and $f(x,y) = 3e^{-x-2y}$ for $(x,y)\in A$, 0 otherwise.

6.(g)(ii) The marginal probability density function of $X$ is
$f_1(x) = \int_0^x 3e^{-x-2y}\,dy = \frac{3}{2}e^{-x}\left(1-e^{-2x}\right) \quad \text{for } x>0$
and 0 otherwise; $A_1 = \{x: x>0\}$. The marginal probability density function of $Y$ is
$f_2(y) = \int_y^{\infty}3e^{-x-2y}\,dx = 3e^{-2y}e^{-y} = 3e^{-3y} \quad \text{for } y>0$
and 0 otherwise; $A_2 = \{y: y>0\}$. Note that $Y \sim$ Exponential$\left(\frac{1}{3}\right)$.

6.(g)(iii) The conditional probability density function of $X$ given $Y=y$ is
$f_1(x|y) = \frac{f(x,y)}{f_2(y)} = \frac{3e^{-x-2y}}{3e^{-3y}} = e^{-(x-y)} \quad \text{for } x>y>0.$
Note that $X|Y=y \sim$ Two Parameter Exponential$(y,1)$. The conditional probability density function of $Y$ given $X=x$ is
$f_2(y|x) = \frac{f(x,y)}{f_1(x)} = \frac{3e^{-x-2y}}{\frac{3}{2}e^{-x}(1-e^{-2x})} = \frac{2e^{-2y}}{1-e^{-2x}} \quad \text{for } 0<y<x$
and 0 otherwise.
Check:
$\int_{-\infty}^{\infty}f_2(y|x)\,dy = \frac{1}{1-e^{-2x}}\int_0^x 2e^{-2y}\,dy = \frac{1-e^{-2x}}{1-e^{-2x}} = 1.$

6.(g)(iv)
$E(X|y) = \int_y^{\infty}x e^{-(x-y)}\,dx = e^y\lim_{b\to\infty}\left[-(x+1)e^{-x}\big|_y^b\right] = e^y(y+1)e^{-y} = y+1 \quad \text{for } y>0,$
$E(Y|x) = \frac{1}{1-e^{-2x}}\int_0^x 2ye^{-2y}\,dy = \frac{1}{1-e^{-2x}}\left[\frac{1}{2}-\left(x+\frac{1}{2}\right)e^{-2x}\right] = \frac{1-(2x+1)e^{-2x}}{2(1-e^{-2x})} \quad \text{for } x>0.$

7.(a) Since $X \sim$ Uniform$(0,1)$, $f_1(x) = 1$ for $0<x<1$. The joint probability density function of $X$ and $Y$ is
$f(x,y) = f_2(y|x)f_1(x) = \frac{1}{1-x}\cdot 1 = \frac{1}{1-x} \quad \text{for } 0<x<y<1$
and 0 otherwise.

7.(b) The marginal probability density function of $Y$ is
$f_2(y) = \int_0^y\frac{1}{1-x}\,dx = -\log(1-x)\big|_0^y = -\log(1-y) \quad \text{for } 0<y<1$
and 0 otherwise.

7.(c) The conditional probability density function of $X$ given $Y=y$ is
$f_1(x|y) = \frac{f(x,y)}{f_2(y)} = \frac{-1}{(1-x)\log(1-y)} \quad \text{for } 0<x<y<1$
and 0 otherwise.

9. Since $Y|\theta \sim$ Binomial$(n,\theta)$,
$E(Y|\theta) = n\theta \quad \text{and} \quad Var(Y|\theta) = n\theta(1-\theta).$
Since $\theta \sim$ Beta$(a,b)$,
$E(\theta) = \frac{a}{a+b},\qquad Var(\theta) = \frac{ab}{(a+b+1)(a+b)^2},$
and
$E(\theta^2) = Var(\theta)+[E(\theta)]^2 = \frac{a(a+1)}{(a+b+1)(a+b)}.$
Therefore
$E(Y) = E[E(Y|\theta)] = nE(\theta) = \frac{na}{a+b}$
and
$Var(Y) = E[Var(Y|\theta)] + Var[E(Y|\theta)] = n\left[E(\theta)-E(\theta^2)\right] + n^2Var(\theta) = \frac{nab}{(a+b)(a+b+1)} + \frac{n^2ab}{(a+b+1)(a+b)^2} = \frac{nab(a+b+n)}{(a+b+1)(a+b)^2}.$

10.(a) Since $Y|\theta \sim$ Poisson$(\theta)$, $E(Y|\theta) = \theta$, and since $\theta \sim$ Gamma$(\alpha,\beta)$, $E(\theta) = \alpha\beta$, so
$E(Y) = E[E(Y|\theta)] = E(\theta) = \alpha\beta.$
Since $Y|\theta \sim$ Poisson$(\theta)$, $Var(Y|\theta) = \theta$, and since $\theta \sim$ Gamma$(\alpha,\beta)$, $Var(\theta) = \alpha\beta^2$, so
$Var(Y) = E[Var(Y|\theta)] + Var[E(Y|\theta)] = E(\theta)+Var(\theta) = \alpha\beta+\alpha\beta^2.$

10.(b) Since $Y|\theta \sim$ Poisson$(\theta)$ and $\theta \sim$ Gamma$(\alpha,\beta)$,
$f_2(y|\theta) = \frac{\theta^y e^{-\theta}}{y!} \quad \text{for } y=0,1,\dots;\ \theta>0 \quad \text{and} \quad f_1(\theta) = \frac{\theta^{\alpha-1}e^{-\theta/\beta}}{\Gamma(\alpha)\beta^{\alpha}} \quad \text{for } \theta>0,$
and by the Product Rule
$f(\theta,y) = f_2(y|\theta)f_1(\theta) = \frac{\theta^{y+\alpha-1}e^{-\theta(1+1/\beta)}}{y!\,\Gamma(\alpha)\beta^{\alpha}} \quad \text{for } y=0,1,\dots;\ \theta>0.$
The marginal probability function of $Y$ is
$f_2(y) = \int_0^{\infty}f(\theta,y)\,d\theta = \frac{1}{y!\,\Gamma(\alpha)\beta^{\alpha}}\int_0^{\infty}\theta^{y+\alpha-1}e^{-\theta(1+1/\beta)}\,d\theta.$
Letting $x = \theta\!\left(1+\frac{1}{\beta}\right) = \theta\,\frac{\beta+1}{\beta}$ gives
$f_2(y) = \frac{1}{y!\,\Gamma(\alpha)\beta^{\alpha}}\left(\frac{\beta}{1+\beta}\right)^{y+\alpha}\int_0^{\infty}x^{y+\alpha-1}e^{-x}\,dx = \frac{\Gamma(y+\alpha)}{y!\,\Gamma(\alpha)}\left(\frac{\beta}{1+\beta}\right)^{y}\left(\frac{1}{1+\beta}\right)^{\alpha} \quad \text{for } y=0,1,\dots$
If $\alpha$ is a nonnegative integer then we recognize this as the probability function of a Negative Binomial$\left(\alpha,\frac{1}{1+\beta}\right)$ random variable.
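As a numerical check (not part of the original solution) of the Poisson–Gamma mixture in Problem 10, the sketch below simulates $\theta \sim$ Gamma$(\alpha,\beta)$ and $Y|\theta \sim$ Poisson$(\theta)$ and compares the sample mean, variance and $P(Y=0)$ with $\alpha\beta$, $\alpha\beta+\alpha\beta^2$ and $(1+\beta)^{-\alpha}$. The parameter values are arbitrary.

```python
import numpy as np

# Check of Problem 10: theta ~ Gamma(alpha, beta), Y | theta ~ Poisson(theta).
rng = np.random.default_rng(7)
alpha, beta, N = 3.0, 2.0, 10**6
theta = rng.gamma(alpha, beta, N)     # shape alpha, scale beta
Y = rng.poisson(theta)
print(Y.mean(), alpha * beta)                       # ~6
print(Y.var(), alpha * beta + alpha * beta ** 2)    # ~18
# Marginal P(Y=0) should equal (1/(1+beta))^alpha:
print(np.mean(Y == 0), (1 / (1 + beta)) ** alpha)
```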
11.(a) First note that
$\frac{\partial^{j+k}}{\partial t_1^j\,\partial t_2^k}\,e^{t_1x+t_2y} = x^jy^k e^{t_1x+t_2y}.$
Therefore
$\frac{\partial^{j+k}}{\partial t_1^j\,\partial t_2^k}M(t_1,t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x^jy^k e^{t_1x+t_2y}f(x,y)\,dx\,dy = E\!\left(X^jY^k e^{t_1X+t_2Y}\right)$
(assuming the operations of integration and differentiation can be interchanged), and
$\frac{\partial^{j+k}}{\partial t_1^j\,\partial t_2^k}M(t_1,t_2)\Big|_{(t_1,t_2)=(0,0)} = E(X^jY^k)$
as required. This proof is for $(X,Y)$ continuous; the proof for $(X,Y)$ discrete follows in a similar manner with integrals replaced by summations.

11.(b) Suppose $X$ and $Y$ are independent random variables. Then
$M(t_1,t_2) = E\!\left(e^{t_1X+t_2Y}\right) = E\!\left(e^{t_1X}e^{t_2Y}\right) = E\!\left(e^{t_1X}\right)E\!\left(e^{t_2Y}\right) = M_X(t_1)M_Y(t_2).$
Conversely, suppose $M(t_1,t_2) = M_X(t_1)M_Y(t_2)$ for all $|t_1|<h$, $|t_2|<h$ for some $h>0$. Then
$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{t_1x+t_2y}f(x,y)\,dx\,dy = \int_{-\infty}^{\infty}e^{t_1x}f_1(x)\,dx\int_{-\infty}^{\infty}e^{t_2y}f_2(y)\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{t_1x+t_2y}f_1(x)f_2(y)\,dx\,dy.$
But by the Uniqueness Theorem for Moment Generating Functions this can only hold if $f(x,y) = f_1(x)f_2(y)$ for all $(x,y)$, and therefore $X$ and $Y$ are independent random variables. Thus $X$ and $Y$ are independent if and only if $M(t_1,t_2) = M_X(t_1)M_Y(t_2)$. (Again the discrete proof replaces integrals by summations.)

11.(c) If $(X,Y) \sim$ Multinomial$(n;p_1,p_2,p_3)$ then
$M(t_1,t_2) = \left(p_1e^{t_1}+p_2e^{t_2}+p_3\right)^n \quad \text{for } t_1\in\Re,\ t_2\in\Re.$
By 11(a),
$E(XY) = \frac{\partial^2}{\partial t_1\partial t_2}\left(p_1e^{t_1}+p_2e^{t_2}+p_3\right)^n\Big|_{(0,0)} = n(n-1)p_1e^{t_1}p_2e^{t_2}\left(p_1e^{t_1}+p_2e^{t_2}+p_3\right)^{n-2}\Big|_{(0,0)} = n(n-1)p_1p_2.$
Also $M_X(t) = M(t,0) = \left(p_1e^t+1-p_1\right)^n$ for $t\in\Re$, so
$E(X) = \frac{d}{dt}M_X(t)\Big|_{t=0} = np_1\left(p_1e^t+1-p_1\right)^{n-1}e^t\Big|_{t=0} = np_1,$
and similarly $E(Y) = np_2$. Therefore
$Cov(X,Y) = E(XY)-E(X)E(Y) = n(n-1)p_1p_2 - n^2p_1p_2 = -np_1p_2.$

13.(a) Note that since $\Sigma$ is a symmetric matrix, $\Sigma^{-1}$ is symmetric, $\Sigma\Sigma^{-1} = I$ (the $2\times 2$ identity matrix), and $\mu t^T = t\mu^T$ and $xt^T = tx^T$ since these are scalars. Then
$[x-(\mu+t\Sigma)]\,\Sigma^{-1}\,[x-(\mu+t\Sigma)]^T = (x-\mu)\Sigma^{-1}(x-\mu)^T - 2xt^T + 2\mu t^T + t\Sigma t^T,$
that is,
$(x-\mu)\Sigma^{-1}(x-\mu)^T - 2xt^T = [x-(\mu+t\Sigma)]\,\Sigma^{-1}\,[x-(\mu+t\Sigma)]^T - 2\mu t^T - t\Sigma t^T,$
as required. Now
$M(t_1,t_2) = E\!\left(e^{t_1X_1+t_2X_2}\right) = E\!\left[\exp(Xt^T)\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{1}{2\pi|\Sigma|^{1/2}}\exp(xt^T)\exp\!\left[-\frac{1}{2}(x-\mu)\Sigma^{-1}(x-\mu)^T\right]dx_1dx_2$
$= \exp\!\left(\mu t^T+\frac{1}{2}t\Sigma t^T\right)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{1}{2\pi|\Sigma|^{1/2}}\exp\!\left\{-\frac{1}{2}[x-(\mu+t\Sigma)]\Sigma^{-1}[x-(\mu+t\Sigma)]^T\right\}dx_1dx_2$
$= \exp\!\left(\mu t^T+\frac{1}{2}t\Sigma t^T\right) \quad \text{for all } t=(t_1,t_2)\in\Re^2,$
since the integrand in the last integral is a BVN$(\mu+t\Sigma,\Sigma)$ probability density function and therefore the integral equals one.

13.(b) Since $M_{X_1}(t) = M(t,0) = \exp\!\left(\mu_1t+\frac{1}{2}\sigma_1^2t^2\right)$ for $t\in\Re$, which is the moment generating function of a N$(\mu_1,\sigma_1^2)$ random variable, by the Uniqueness Theorem for Moment Generating Functions $X_1 \sim$ N$(\mu_1,\sigma_1^2)$. By a similar argument $X_2 \sim$ N$(\mu_2,\sigma_2^2)$.

13.(c) Since
$\frac{\partial^2}{\partial t_1\partial t_2}M(t_1,t_2) = \frac{\partial}{\partial t_1}\left[M(t_1,t_2)\left(\mu_2+\rho\sigma_1\sigma_2t_1+\sigma_2^2t_2\right)\right] = M(t_1,t_2)\left[\left(\mu_1+\sigma_1^2t_1+\rho\sigma_1\sigma_2t_2\right)\left(\mu_2+\rho\sigma_1\sigma_2t_1+\sigma_2^2t_2\right)+\rho\sigma_1\sigma_2\right],$
we have $E(X_1X_2) = \mu_1\mu_2+\rho\sigma_1\sigma_2$. From (b) we know $E(X_1)=\mu_1$ and $E(X_2)=\mu_2$, therefore
$Cov(X_1,X_2) = E(X_1X_2)-E(X_1)E(X_2) = \rho\sigma_1\sigma_2.$

13.(d) By Theorem 3.8.6, $X_1$ and $X_2$ are independent random variables if and only if $M(t_1,t_2) = M_{X_1}(t_1)M_{X_2}(t_2)$, that is, if and only if
$\exp\!\left[\mu_1t_1+\mu_2t_2+\frac{1}{2}\left(\sigma_1^2t_1^2+2\rho\sigma_1\sigma_2t_1t_2+\sigma_2^2t_2^2\right)\right] = \exp\!\left(\mu_1t_1+\frac{1}{2}\sigma_1^2t_1^2\right)\exp\!\left(\mu_2t_2+\frac{1}{2}\sigma_2^2t_2^2\right) \quad \text{for all } (t_1,t_2)\in\Re^2,$
or $\rho\sigma_1\sigma_2t_1t_2 = 0$ for all $(t_1,t_2)\in\Re^2$, which is true if and only if $\rho=0$. Therefore $X_1$ and $X_2$ are independent random variables if and only if $\rho=0$.
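As a numerical check (not part of the original solution) of the covariance found in 11(c), the sketch below simulates multinomial counts and compares the sample covariance of the first two categories with $-np_1p_2$. The parameter values are arbitrary.

```python
import numpy as np

# Check of Problem 11(c): for (X, Y) Multinomial(n; p1, p2, p3),
# Cov(X, Y) = -n * p1 * p2.
rng = np.random.default_rng(8)
n, p, N = 20, [0.2, 0.5, 0.3], 10**6
counts = rng.multinomial(n, p, size=N)
X, Y = counts[:, 0], counts[:, 1]
print(np.cov(X, Y)[0, 1], -n * p[0] * p[1])   # both near -2
```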
13.(e) Since
$E\!\left[\exp(Xt^T)\right] = \exp\!\left(\mu t^T+\frac{1}{2}t\Sigma t^T\right) \quad \text{for } t\in\Re^2,$
therefore
$E\!\left\{\exp\!\left[(XA+b)t^T\right]\right\} = \exp(bt^T)\,E\!\left\{\exp\!\left[X\!\left(tA^T\right)^T\right]\right\} = \exp(bt^T)\exp\!\left[\mu\!\left(tA^T\right)^T+\frac{1}{2}\left(tA^T\right)\Sigma\left(tA^T\right)^T\right]$
$= \exp\!\left[(\mu A+b)t^T+\frac{1}{2}t\!\left(A^T\Sigma A\right)t^T\right] \quad \text{for } t\in\Re^2,$
which is the moment generating function of a BVN$\left(\mu A+b,\,A^T\Sigma A\right)$ random variable. By the Uniqueness Theorem for Moment Generating Functions, $XA+b \sim$ BVN$\left(\mu A+b,\,A^T\Sigma A\right)$.

13.(f) First note that
$|\Sigma|^{1/2} = \sigma_1\sigma_2\left(1-\rho^2\right)^{1/2}$
and, writing $x = (x_1,x_2)$ and $\mu = (\mu_1,\mu_2)$,
$(x-\mu)\Sigma^{-1}(x-\mu)^T = \frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]$
$= \frac{1}{\sigma_2^2(1-\rho^2)}\left[x_2-\mu_2-\rho\,\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)\right]^2 + \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2.$
The conditional probability density function of $X_2$ given $X_1=x_1$ is therefore
$\frac{f(x_1,x_2)}{f_1(x_1)} = \frac{1}{\sigma_2\sqrt{2\pi(1-\rho^2)}}\exp\!\left\{-\frac{\left[x_2-\mu_2-\rho\,\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)\right]^2}{2\sigma_2^2(1-\rho^2)}\right\},$
which is the probability density function of a N$\!\left(\mu_2+\rho\,\frac{\sigma_2}{\sigma_1}(x_1-\mu_1),\ \sigma_2^2(1-\rho^2)\right)$ random variable.

14.(a)
$M(t_1,t_2) = E\!\left(e^{t_1X+t_2Y}\right) = \int_0^{\infty}\int_0^y 2e^{t_1x+t_2y}e^{-x-y}\,dx\,dy = 2\int_0^{\infty}e^{-(1-t_2)y}\int_0^y e^{-(1-t_1)x}\,dx\,dy$
$= \frac{2}{1-t_1}\int_0^{\infty}e^{-(1-t_2)y}\left[1-e^{-(1-t_1)y}\right]dy \quad \text{if } t_1<1$
$= \frac{2}{1-t_1}\left[\frac{1}{1-t_2}-\frac{1}{2-t_1-t_2}\right] \quad \text{which converges if } t_1<1,\ t_2<1$
$= \frac{2}{(1-t_2)(2-t_1-t_2)} \quad \text{for } t_1<1,\ t_2<1.$

14.(b)
$M_X(t) = M(t,0) = \frac{2}{2-t} = \frac{1}{1-\frac{1}{2}t} \quad \text{for } t<1,$
which is the moment generating function of an Exponential$\left(\frac{1}{2}\right)$ random variable, so by the Uniqueness Theorem for Moment Generating Functions $X \sim$ Exponential$\left(\frac{1}{2}\right)$.
$M_Y(t) = M(0,t) = \frac{2}{(1-t)(2-t)} = \frac{2}{2-3t+t^2} \quad \text{for } t<1,$
which is not a moment generating function we recognize. We find the probability density function of $Y$ directly:
$f_2(y) = \int_0^y 2e^{-x-y}\,dx = 2e^{-y}\left(1-e^{-y}\right) \quad \text{for } y>0$
and 0 otherwise.

14.(c) $E(X) = \frac{1}{2}$ since $X \sim$ Exponential$\left(\frac{1}{2}\right)$; alternatively
$E(X) = \frac{d}{dt}\frac{2}{2-t}\Big|_{t=0} = \frac{2}{(2-t)^2}\Big|_{t=0} = \frac{1}{2}.$
Similarly
$E(Y) = \frac{d}{dt}\frac{2}{2-3t+t^2}\Big|_{t=0} = \frac{-2(-3+2t)}{(2-3t+t^2)^2}\Big|_{t=0} = \frac{6}{4} = \frac{3}{2}.$
Since
$\frac{\partial^2}{\partial t_1\partial t_2}M(t_1,t_2) = \frac{\partial}{\partial t_2}\frac{2}{(1-t_2)(2-t_1-t_2)^2} = \frac{2}{(1-t_2)^2(2-t_1-t_2)^2} + \frac{4}{(1-t_2)(2-t_1-t_2)^3},$
we obtain
$E(XY) = \frac{\partial^2}{\partial t_1\partial t_2}M(t_1,t_2)\Big|_{(t_1,t_2)=(0,0)} = \frac{2}{4}+\frac{4}{8} = 1.$
Therefore
$Cov(X,Y) = E(XY)-E(X)E(Y) = 1-\frac{1}{2}\cdot\frac{3}{2} = \frac{1}{4}.$
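As a numerical check (not part of the original solution) of 14(c): since $f_1(x) = 2e^{-2x}$ and $f(x,y)/f_1(x) = e^{-(y-x)}$ for $y>x$ (a conditional not written out above, but immediate from the densities), a pair $(X,Y)$ can be simulated as $X \sim$ Exponential(mean $\tfrac12$) and $Y = X + $ Exponential(mean $1$).

```python
import numpy as np

# Check of Problem 14(c): for f(x,y) = 2 e^{-x-y} on 0 < x < y,
# E(X) = 1/2, E(Y) = 3/2, Cov(X, Y) = 1/4.
rng = np.random.default_rng(9)
N = 10**6
X = rng.exponential(0.5, N)            # marginal of X from part (b)
Y = X + rng.exponential(1.0, N)        # conditional Y - x | X = x ~ Exponential(1)
print(X.mean(), Y.mean(), np.cov(X, Y)[0, 1])   # ~0.5, ~1.5, ~0.25
```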
10.3 Chapter 4

1. We are given that $X$ and $Y$ are independent random variables and we want to show that $U = h(X)$ and $V = g(Y)$ are independent random variables. We assume $X$ and $Y$ are continuous; the proof for discrete random variables is obtained by replacing integrals by sums. Suppose $X$ has probability density function $f_1(x)$ and support set $A_1$, and $Y$ has probability density function $f_2(y)$ and support set $A_2$. Then the joint probability density function of $X$ and $Y$ is
$f(x,y) = f_1(x)f_2(y) \quad \text{for } (x,y)\in A_1\times A_2.$
Now for any $(u,v)\in\Re^2$,
$P(U\le u,\ V\le v) = P(h(X)\le u,\ g(Y)\le v) = \int\!\!\int_B f_1(x)f_2(y)\,dx\,dy$
where $B = \{(x,y): h(x)\le u,\ g(y)\le v\}$. Let $B_1 = \{x: h(x)\le u\}$ and $B_2 = \{y: g(y)\le v\}$; then $B = B_1\times B_2$. Since
$P(U\le u,\ V\le v) = \int\!\!\int_{B_1\times B_2}f_1(x)f_2(y)\,dx\,dy = \int_{B_1}f_1(x)\,dx\int_{B_2}f_2(y)\,dy = P(h(X)\le u)\,P(g(Y)\le v) = P(U\le u)\,P(V\le v)$
for all $(u,v)\in\Re^2$, $U$ and $V$ are independent random variables.

2.(a) The transformation $S$: $U = X+Y$, $V = X$ has inverse transformation $X = V$, $Y = U-V$. The support set of $(X,Y)$ is
$R_{XY} = \{(x,y): 0<x+y<1,\ 0<x<1,\ 0<y<1\}.$
[Figure 10.15: $R_{XY}$ for Problem 2(a).] Under $S$,
$(k,0)\to(k,k),\quad (k,1-k)\to(1,k),\quad (0,k)\to(k,0),\qquad 0<k<1,$
and thus $S$ maps $R_{XY}$ into $R_{UV} = \{(u,v): 0<v<u<1\}$. [Figure 10.16: $R_{UV}$ for Problem 2(a).] The Jacobian of the inverse transformation is
$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix}0&1\\1&-1\end{vmatrix} = -1.$
The joint probability density function of $U$ and $V$ is
$g(u,v) = f(v,u-v)\,|-1| = 24v(u-v) \quad \text{for } (u,v)\in R_{UV}$
and 0 otherwise.

2.(b) The marginal probability density function of $U$ is
$g_1(u) = \int_0^u 24v(u-v)\,dv = 12uv^2-8v^3\big|_0^u = 4u^3 \quad \text{for } 0<u<1$
and 0 otherwise.

6.(a) [Figure 10.17: Region of integration for $0\le t\le 1$.] For $0\le t\le 1$,
$G(t) = P(T\le t) = P(X+Y\le t) = \int_0^t\int_0^{t-x}4xy\,dy\,dx = \int_0^t 2x(t-x)^2\,dx = t^4-\frac{4}{3}t^4+\frac{1}{2}t^4 = \frac{1}{6}t^4.$
[Figure 10.18: Region of integration for $1\le t\le 2$.] For $1\le t\le 2$ we use $G(t) = 1-P(X+Y>t)$, where
$P(X+Y>t) = \int_{t-1}^{1}\int_{t-x}^{1}4xy\,dy\,dx = \int_{t-1}^{1}2x\left[1-(t-x)^2\right]dx,$
which gives
$G(t) = t^2-\frac{4}{3}t+\frac{1}{2}-\frac{1}{6}(t-1)^3(t+3) \quad \text{for } 1\le t\le 2.$
The probability density function of $T = X+Y$ is
$g(t) = \frac{dG(t)}{dt} = \begin{cases}\frac{2}{3}t^3 & \text{if } 0\le t\le 1\\[4pt] \frac{2}{3}\left(-t^3+6t-4\right) & \text{if } 1<t\le 2\end{cases}$
and 0 otherwise.

6.(c) The transformation $S$: $U = X^2$, $V = XY$ has inverse transformation $X = \sqrt{U}$, $Y = V/\sqrt{U}$. The support set of $(X,Y)$ is $R_{XY} = \{(x,y): 0<x<1,\ 0<y<1\}$. [Figure 10.19: Support set of $(X,Y)$ for Problem 6(c).] Under $S$,
$(k,0)\to(k^2,0),\quad (0,k)\to(0,0),\quad (1,k)\to(1,k),\quad (k,1)\to(k^2,k),\qquad 0<k<1,$
and thus $S$ maps $R_{XY}$ into
$R_{UV} = \{(u,v): 0<v<\sqrt{u},\ 0<u<1\} = \{(u,v): v^2<u<1,\ 0<v<1\}.$
[Figure 10.20: Support set of $(U,V)$ for Problem 6(c).] The Jacobian of the inverse transformation is
$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix}\frac{1}{2\sqrt{u}}&0\\[3pt]-\frac{v}{2u^{3/2}}&\frac{1}{\sqrt{u}}\end{vmatrix} = \frac{1}{2u}.$
The joint probability density function of $U$ and $V$ is
$g(u,v) = f\!\left(\sqrt{u},\,v/\sqrt{u}\right)\frac{1}{2u} = 4\sqrt{u}\,\frac{v}{\sqrt{u}}\cdot\frac{1}{2u} = \frac{2v}{u} \quad \text{for } (u,v)\in R_{UV}$
and 0 otherwise.
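As a numerical check (not part of the original solution) of the piecewise density found in 6(a): since a random variable with pdf $2x$ on $(0,1)$ can be generated as $\sqrt{U}$ with $U\sim$ Uniform$(0,1)$, a histogram of simulated sums can be compared with $g(t)$.

```python
import numpy as np

# Check of Problem 6(a): for X, Y independent with pdf 2x on (0,1),
# the density of T = X + Y is (2/3)t^3 on [0,1] and (2/3)(-t^3 + 6t - 4) on (1,2].
rng = np.random.default_rng(10)
N = 10**6
T = np.sqrt(rng.uniform(0, 1, N)) + np.sqrt(rng.uniform(0, 1, N))
hist, edges = np.histogram(T, bins=40, range=(0, 2), density=True)
mid = (edges[:-1] + edges[1:]) / 2
g = np.where(mid <= 1, (2 / 3) * mid**3, (2 / 3) * (-mid**3 + 6 * mid - 4))
print(np.max(np.abs(hist - g)))   # small (histogram noise only)
```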
6.(d) The marginal probability density function of $U$ is
$g_1(u) = \int_0^{\sqrt{u}}\frac{2v}{u}\,dv = \frac{1}{u}\,v^2\Big|_0^{\sqrt{u}} = 1 \quad \text{for } 0<u<1$
and 0 otherwise. Note that $U \sim$ Uniform$(0,1)$. The marginal probability density function of $V$ is
$g_2(v) = \int_{v^2}^{1}\frac{2v}{u}\,du = 2v\log u\Big|_{v^2}^{1} = -2v\log v^2 = -4v\log v \quad \text{for } 0<v<1$
and 0 otherwise.

6.(e) The support set of $X$ is $A_1 = \{x: 0<x<1\}$ and the support set of $Y$ is $A_2 = \{y: 0<y<1\}$. Since $f(x,y) = 4xy = (2x)(2y)$ for all $(x,y)\in R_{XY} = A_1\times A_2$, by the Factorization Theorem for Independence $X$ and $Y$ are independent random variables with $f_1(x) = 2x$ for $x\in A_1$ and $f_2(y) = 2y$ for $y\in A_2$, so $X$ and $Y$ have the same distribution. Therefore
$E(V^3) = E\!\left[(XY)^3\right] = E(X^3)E(Y^3) = \left[\int_0^1 x^3(2x)\,dx\right]^2 = \left(\frac{2}{5}\right)^2 = \frac{4}{25}.$

7.(a) The transformation $S$: $U = X/Y$, $V = XY$ has inverse transformation $X = \sqrt{UV}$, $Y = \sqrt{V/U}$. The support set of $(X,Y)$ is $R_{XY} = \{(x,y): 0<x<1,\ 0<y<1\}$. [Figure 10.21: Support set of $(X,Y)$ for Problem 7(a).] Under $S$,
$(k,0)\to(\infty,0),\quad (0,k)\to(0,0),\quad (1,k)\to\left(\tfrac{1}{k},k\right),\quad (k,1)\to(k,k),\qquad 0<k<1,$
and thus $S$ maps $R_{XY}$ into
$R_{UV} = \left\{(u,v): v<u<\frac{1}{v},\ 0<v<1\right\}.$
[Figure 10.22: Support set of $(U,V)$ for Problem 7(a).] The Jacobian of the inverse transformation is
$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix}\frac{1}{2}\sqrt{\frac{v}{u}}&\frac{1}{2}\sqrt{\frac{u}{v}}\\[3pt]-\frac{\sqrt{v}}{2u^{3/2}}&\frac{1}{2\sqrt{uv}}\end{vmatrix} = \frac{1}{4u}+\frac{1}{4u} = \frac{1}{2u}.$
The joint probability density function of $U$ and $V$ is
$g(u,v) = f\!\left(\sqrt{uv},\,\sqrt{v/u}\right)\frac{1}{2u} = 4\sqrt{uv}\,\sqrt{\frac{v}{u}}\cdot\frac{1}{2u} = \frac{2v}{u} \quad \text{for } (u,v)\in R_{UV}$
and 0 otherwise.

7.(b) The support set $R_{UV} = \{(u,v): v<u<\frac{1}{v},\ 0<v<1\}$ is not rectangular. The support set of $U$ is $A_1 = \{u: 0<u<\infty\}$ and the support set of $V$ is $A_2 = \{v: 0<v<1\}$. Consider the point $\left(\frac{1}{2},\frac{3}{4}\right)\notin R_{UV}$, so $g\!\left(\frac{1}{2},\frac{3}{4}\right) = 0$. Since $\frac{1}{2}\in A_1$, $g_1\!\left(\frac{1}{2}\right)>0$, and since $\frac{3}{4}\in A_2$, $g_2\!\left(\frac{3}{4}\right)>0$. Therefore
$g\!\left(\tfrac{1}{2},\tfrac{3}{4}\right) = 0 \ne g_1\!\left(\tfrac{1}{2}\right)g_2\!\left(\tfrac{3}{4}\right)$
and $U$ and $V$ are not independent random variables.

7.(c) The marginal probability density function of $V$ is
$g_2(v) = 2v\int_v^{1/v}\frac{1}{u}\,du = 2v\left[\log\frac{1}{v}-\log v\right] = -4v\log v \quad \text{for } 0<v<1$
and 0 otherwise. The marginal probability density function of $U$ is
$g_1(u) = \begin{cases}\displaystyle\frac{1}{u}\int_0^u 2v\,dv = u & \text{if } 0<u<1\\[8pt] \displaystyle\frac{1}{u}\int_0^{1/u}2v\,dv = \frac{1}{u^3} & \text{if } u\ge 1\end{cases}$
and 0 otherwise.
8. Since $X \sim$ Uniform$(0,\theta)$ and $Y \sim$ Uniform$(0,\theta)$ independently, the joint probability density function of $X$ and $Y$ is
$f(x,y) = \frac{1}{\theta^2} \quad \text{for } (x,y)\in R_{XY}$
where $R_{XY} = \{(x,y): 0<x<\theta,\ 0<y<\theta\}$. [Figure 10.23: Support set of $(X,Y)$ for Problem 8.] The transformation $S$: $U = X-Y$, $V = X+Y$ has inverse transformation $X = \frac{1}{2}(U+V)$, $Y = \frac{1}{2}(V-U)$. Under $S$,
$(k,0)\to(k,k),\quad (0,k)\to(-k,k),\quad (\theta,k)\to(\theta-k,\theta+k),\quad (k,\theta)\to(k-\theta,k+\theta),\qquad 0<k<\theta,$
so $S$ maps $R_{XY}$ into
$R_{UV} = \{(u,v): -u<v<2\theta+u,\ -\theta<u\le 0\}\cup\{(u,v): u<v<2\theta-u,\ 0<u<\theta\}.$
[Figure 10.24: Support set of $(U,V)$ for Problem 8.] The Jacobian of the inverse transformation is
$\frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix}\frac{1}{2}&\frac{1}{2}\\[3pt]-\frac{1}{2}&\frac{1}{2}\end{vmatrix} = \frac{1}{4}+\frac{1}{4} = \frac{1}{2}.$
The joint probability density function of $U$ and $V$ is
$g(u,v) = f\!\left(\frac{u+v}{2},\,\frac{v-u}{2}\right)\frac{1}{2} = \frac{1}{2\theta^2} \quad \text{for } (u,v)\in R_{UV}$
and 0 otherwise. The marginal probability density function of $U$ is
$g_1(u) = \begin{cases}\displaystyle\int_{-u}^{2\theta+u}\frac{1}{2\theta^2}\,dv = \frac{\theta+u}{\theta^2} & \text{for } -\theta<u\le 0\\[8pt] \displaystyle\int_{u}^{2\theta-u}\frac{1}{2\theta^2}\,dv = \frac{\theta-u}{\theta^2} & \text{for } 0<u<\theta\end{cases}$
that is, $g_1(u) = \dfrac{\theta-|u|}{\theta^2}$ for $-\theta<u<\theta$, and 0 otherwise.

9.(a) The joint probability density function of $(Z_1,Z_2)$ is
$f(z_1,z_2) = \frac{1}{\sqrt{2\pi}}e^{-z_1^2/2}\,\frac{1}{\sqrt{2\pi}}e^{-z_2^2/2} = \frac{1}{2\pi}e^{-(z_1^2+z_2^2)/2} \quad \text{for } (z_1,z_2)\in\Re^2,$
with support set $\Re^2$. The transformation
$X_1 = \mu_1+\sigma_1Z_1,\qquad X_2 = \mu_2+\sigma_2\left[\rho Z_1+\left(1-\rho^2\right)^{1/2}Z_2\right]$
has inverse transformation
$Z_1 = \frac{X_1-\mu_1}{\sigma_1},\qquad Z_2 = \frac{1}{\left(1-\rho^2\right)^{1/2}}\left[\frac{X_2-\mu_2}{\sigma_2}-\rho\,\frac{X_1-\mu_1}{\sigma_1}\right].$
The Jacobian of the inverse transformation is
$\frac{\partial(z_1,z_2)}{\partial(x_1,x_2)} = \begin{vmatrix}\frac{1}{\sigma_1}&0\\ \frac{\partial z_2}{\partial x_1}&\frac{1}{\sigma_2(1-\rho^2)^{1/2}}\end{vmatrix} = \frac{1}{\sigma_1\sigma_2\left(1-\rho^2\right)^{1/2}} = \frac{1}{|\Sigma|^{1/2}}.$
Note that
$z_1^2+z_2^2 = \frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2-2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}+\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right] = (x-\mu)\Sigma^{-1}(x-\mu)^T,$
where $x = (x_1,x_2)$ and $\mu = (\mu_1,\mu_2)$. Therefore the joint probability density function of $X = (X_1,X_2)$ is
$g(x) = \frac{1}{2\pi|\Sigma|^{1/2}}\exp\!\left[-\frac{1}{2}(x-\mu)\Sigma^{-1}(x-\mu)^T\right] \quad \text{for } x\in\Re^2,$
and thus $X \sim$ BVN$(\mu,\Sigma)$.

9.(b) Since $Z_1 \sim$ N$(0,1)$ and $Z_2 \sim$ N$(0,1)$ independently, we know $Z_1^2 \sim \chi^2(1)$, $Z_2^2 \sim \chi^2(1)$, and $Z_1^2+Z_2^2 \sim \chi^2(2)$. From (a), $Z_1^2+Z_2^2 = (X-\mu)\Sigma^{-1}(X-\mu)^T$. Therefore $(X-\mu)\Sigma^{-1}(X-\mu)^T \sim \chi^2(2)$.

10.(a) The joint moment generating function of $U$ and $V$ is
$M(s,t) = E\!\left(e^{sU+tV}\right) = E\!\left[e^{s(X+Y)+t(X-Y)}\right] = E\!\left[e^{(s+t)X}\right]E\!\left[e^{(s-t)Y}\right] \quad \text{since } X \text{ and } Y \text{ are independent}$
$= M_X(s+t)M_Y(s-t) = e^{\mu(s+t)+\sigma^2(s+t)^2/2}\,e^{\mu(s-t)+\sigma^2(s-t)^2/2} \quad \text{since } X \sim \text{N}(\mu,\sigma^2),\ Y \sim \text{N}(\mu,\sigma^2)$
$= e^{2\mu s+(2\sigma^2)s^2/2}\,e^{(2\sigma^2)t^2/2} \quad \text{for } s\in\Re,\ t\in\Re.$

10.(b) The moment generating function of $U$ is
$M_U(s) = M(s,0) = e^{2\mu s+(2\sigma^2)s^2/2} \quad \text{for } s\in\Re,$
which is the moment generating function of a N$(2\mu,2\sigma^2)$ random variable. The moment generating function of $V$ is
$M_V(t) = M(0,t) = e^{(2\sigma^2)t^2/2} \quad \text{for } t\in\Re,$
which is the moment generating function of a N$(0,2\sigma^2)$ random variable. Since $M(s,t) = M_U(s)M_V(t)$ for all $s,t\in\Re$, by Theorem 3.8.6 $U$ and $V$ are independent random variables, and by the Uniqueness Theorem for Moment Generating Functions $U \sim$ N$(2\mu,2\sigma^2)$ and $V \sim$ N$(0,2\sigma^2)$.
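As a numerical check (not part of the original solution) of 10(b), the sketch below simulates independent normal pairs and verifies the means, variances and near-zero correlation of $U = X+Y$ and $V = X-Y$. The parameter values are arbitrary.

```python
import numpy as np

# Check of Problem 10(b): for X, Y independent N(mu, sigma^2),
# U = X + Y ~ N(2*mu, 2*sigma^2), V = X - Y ~ N(0, 2*sigma^2), independent.
rng = np.random.default_rng(11)
mu, sigma, N = 1.0, 2.0, 10**6
X = rng.normal(mu, sigma, N)
Y = rng.normal(mu, sigma, N)
U, V = X + Y, X - Y
print(U.mean(), U.var(), V.mean(), V.var())    # ~2, ~8, ~0, ~8
print(np.corrcoef(U, V)[0, 1])                 # ~0 (here U and V are also independent)
```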
12. The transformation defined by
$Y_1 = \frac{X_1}{X_1+X_2},\qquad Y_2 = \frac{X_1+X_2}{X_1+X_2+X_3},\qquad Y_3 = X_1+X_2+X_3$
has inverse transformation
$X_1 = Y_1Y_2Y_3,\qquad X_2 = Y_2Y_3(1-Y_1),\qquad X_3 = Y_3(1-Y_2).$
Let $R_X = \{(x_1,x_2,x_3): x_1>0,\ x_2>0,\ x_3>0\}$ and $R_Y = \{(y_1,y_2,y_3): 0<y_1<1,\ 0<y_2<1,\ y_3>0\}$. The transformation from $(X_1,X_2,X_3)$ to $(Y_1,Y_2,Y_3)$ maps $R_X$ into $R_Y$, and the Jacobian of the inverse transformation is
$\frac{\partial(x_1,x_2,x_3)}{\partial(y_1,y_2,y_3)} = \begin{vmatrix}y_2y_3&y_1y_3&y_1y_2\\ -y_2y_3&y_3(1-y_1)&y_2(1-y_1)\\ 0&-y_3&1-y_2\end{vmatrix} = y_2y_3^2.$
Since $X_1,X_2,X_3$ are independent Exponential$(1)$ random variables, the joint probability density function of $(X_1,X_2,X_3)$ is $f(x_1,x_2,x_3) = e^{-x_1-x_2-x_3}$ for $(x_1,x_2,x_3)\in R_X$, and $x_1+x_2+x_3 = y_3$, so the joint probability density function of $(Y_1,Y_2,Y_3)$ is
$g(y_1,y_2,y_3) = y_2y_3^2\,e^{-y_3} \quad \text{for } (y_1,y_2,y_3)\in R_Y$
and 0 otherwise. Let $A_1 = \{y_1: 0<y_1<1\}$, $A_2 = \{y_2: 0<y_2<1\}$, $A_3 = \{y_3: y_3>0\}$, and define
$g_1(y_1) = 1 \ \text{for } y_1\in A_1,\qquad g_2(y_2) = 2y_2 \ \text{for } y_2\in A_2,\qquad g_3(y_3) = \frac{1}{2}y_3^2e^{-y_3} \ \text{for } y_3\in A_3.$
Since $g(y_1,y_2,y_3) = g_1(y_1)g_2(y_2)g_3(y_3)$ for all $(y_1,y_2,y_3)\in A_1\times A_2\times A_3$, by the Factorization Theorem for Independence $Y_1,Y_2,Y_3$ are independent random variables, and since $\int_{A_i}g_i(y_i)\,dy_i = 1$ for $i=1,2,3$, the marginal probability density function of $Y_i$ equals $g_i(y_i)$. Note that $Y_1 \sim$ Uniform$(0,1) =$ Beta$(1,1)$, $Y_2 \sim$ Beta$(2,1)$, and $Y_3 \sim$ Gamma$(3,1)$ independently.

13. Let $R_Y = \{(y_1,y_2,y_3): y_1>0,\ 0<y_2<2\pi,\ 0<y_3<\pi\}$. Consider the transformation from $(X_1,X_2,X_3)$ to $(Y_1,Y_2,Y_3)$ defined by
$X_1 = Y_1\cos Y_2\sin Y_3,\qquad X_2 = Y_1\sin Y_2\sin Y_3,\qquad X_3 = Y_1\cos Y_3.$
The Jacobian of this transformation is
$\frac{\partial(x_1,x_2,x_3)}{\partial(y_1,y_2,y_3)} = \begin{vmatrix}\cos y_2\sin y_3&-y_1\sin y_2\sin y_3&y_1\cos y_2\cos y_3\\ \sin y_2\sin y_3&y_1\cos y_2\sin y_3&y_1\sin y_2\cos y_3\\ \cos y_3&0&-y_1\sin y_3\end{vmatrix} = -y_1^2\sin y_3.$
Since the entries of $\frac{\partial(x_1,x_2,x_3)}{\partial(y_1,y_2,y_3)}$ are continuous functions for $(y_1,y_2,y_3)\in R_Y$ and the determinant is nonzero on $R_Y$, by the Inverse Mapping Theorem the transformation has an inverse in the neighbourhood of each point in $R_Y$. Since $X_1,X_2,X_3$ are independent N$(0,1)$ random variables,
$f(x_1,x_2,x_3) = (2\pi)^{-3/2}\exp\!\left[-\frac{1}{2}\left(x_1^2+x_2^2+x_3^2\right)\right] \quad \text{for } (x_1,x_2,x_3)\in\Re^3,$
and
$x_1^2+x_2^2+x_3^2 = y_1^2\left[\cos^2y_2\sin^2y_3+\sin^2y_2\sin^2y_3+\cos^2y_3\right] = y_1^2,$
so the joint probability density function of $(Y_1,Y_2,Y_3)$ is
$g(y_1,y_2,y_3) = (2\pi)^{-3/2}e^{-y_1^2/2}\,y_1^2\sin y_3 \quad \text{for } (y_1,y_2,y_3)\in R_Y$
and 0 otherwise. Let $A_1 = \{y_1: y_1>0\}$, $A_2 = \{y_2: 0<y_2<2\pi\}$, $A_3 = \{y_3: 0<y_3<\pi\}$ and define
$g_1(y_1) = \sqrt{\frac{2}{\pi}}\,y_1^2e^{-y_1^2/2} \ \text{for } y_1\in A_1,\qquad g_2(y_2) = \frac{1}{2\pi} \ \text{for } y_2\in A_2,\qquad g_3(y_3) = \frac{1}{2}\sin y_3 \ \text{for } y_3\in A_3.$
Since $g(y_1,y_2,y_3) = g_1(y_1)g_2(y_2)g_3(y_3)$ for all $(y_1,y_2,y_3)\in A_1\times A_2\times A_3$, by the Factorization Theorem for Independence $Y_1,Y_2,Y_3$ are independent random variables, and since $\int_{A_i}g_i(y_i)\,dy_i = 1$ for $i=1,2,3$, the marginal probability density function of $Y_i$ equals $g_i(y_i)$.

15. Since $X \sim \chi^2(n)$, the moment generating function of $X$ is
$M_X(t) = \frac{1}{(1-2t)^{n/2}} \quad \text{for } t<\frac{1}{2}.$
Since $U = X+Y \sim \chi^2(m)$, the moment generating function of $U$ is
$M_U(t) = \frac{1}{(1-2t)^{m/2}} \quad \text{for } t<\frac{1}{2}.$
The moment generating function of $U$ can also be obtained as
$M_U(t) = E\!\left(e^{tU}\right) = E\!\left[e^{t(X+Y)}\right] = E\!\left(e^{tX}\right)E\!\left(e^{tY}\right) = M_X(t)M_Y(t) \quad \text{since } X \text{ and } Y \text{ are independent}.$
Rearranging $M_U(t) = M_X(t)M_Y(t)$ gives
$M_Y(t) = \frac{M_U(t)}{M_X(t)} = \frac{(1-2t)^{-m/2}}{(1-2t)^{-n/2}} = \frac{1}{(1-2t)^{(m-n)/2}} \quad \text{for } t<\frac{1}{2},$
which is the moment generating function of a $\chi^2(m-n)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions $Y \sim \chi^2(m-n)$.
16.(a) Note that $\sum_{i=1}^n\left(X_i-\bar{X}\right) = \sum_{i=1}^nX_i - n\bar{X} = 0$. Let $U_i = X_i-\bar{X}$, let $\bar{s} = \frac{1}{n}\sum_{i=1}^n s_i$, and set $t_i = s_i-\bar{s}+\frac{s}{n}$. Then
$\sum_{i=1}^n t_iX_i = \sum_{i=1}^n(s_i-\bar{s})X_i + \frac{s}{n}\sum_{i=1}^nX_i = \sum_{i=1}^n(s_i-\bar{s})\left(X_i-\bar{X}\right) + \bar{X}\sum_{i=1}^n(s_i-\bar{s}) + s\bar{X} = \sum_{i=1}^n s_iU_i + s\bar{X}, \qquad (10.17)$
since $\sum_{i=1}^n(s_i-\bar{s}) = 0$ and $\sum_{i=1}^n(s_i-\bar{s})U_i = \sum_{i=1}^n s_iU_i$ (because $\sum_{i=1}^nU_i = 0$). Also, since $X_1,X_2,\dots,X_n$ are independent N$(\mu,\sigma^2)$ random variables,
$E\!\left[\exp\!\left(\sum_{i=1}^n t_iX_i\right)\right] = \prod_{i=1}^nE\!\left[\exp(t_iX_i)\right] = \exp\!\left(\mu\sum_{i=1}^n t_i + \frac{1}{2}\sigma^2\sum_{i=1}^n t_i^2\right). \qquad (10.18)$

16.(b)
$\sum_{i=1}^n t_i = \sum_{i=1}^n(s_i-\bar{s}) + s = s \qquad (10.20)$
and
$\sum_{i=1}^n t_i^2 = \sum_{i=1}^n\left(s_i-\bar{s}+\frac{s}{n}\right)^2 = \sum_{i=1}^n(s_i-\bar{s})^2 + \frac{2s}{n}\sum_{i=1}^n(s_i-\bar{s}) + \frac{s^2}{n} = \sum_{i=1}^n(s_i-\bar{s})^2 + \frac{s^2}{n}. \qquad (10.21)$

16.(c)
$M(s_1,\dots,s_n,s) = E\!\left[\exp\!\left(\sum_{i=1}^n s_iU_i + s\bar{X}\right)\right] = E\!\left[\exp\!\left(\sum_{i=1}^n t_iX_i\right)\right] \quad \text{by } (10.17)$
$= \exp\!\left(\mu\sum_{i=1}^n t_i + \frac{1}{2}\sigma^2\sum_{i=1}^n t_i^2\right) \quad \text{by } (10.18)$
$= \exp\!\left(\mu s + \frac{\sigma^2s^2}{2n}\right)\exp\!\left(\frac{1}{2}\sigma^2\sum_{i=1}^n(s_i-\bar{s})^2\right) \quad \text{by } (10.20) \text{ and } (10.21).$

16.(d) Since
$M_{\bar{X}}(s) = M(0,\dots,0,s) = \exp\!\left(\mu s + \frac{\sigma^2s^2}{2n}\right) \quad \text{and} \quad M_U(s_1,\dots,s_n) = M(s_1,\dots,s_n,0) = \exp\!\left(\frac{1}{2}\sigma^2\sum_{i=1}^n(s_i-\bar{s})^2\right),$
we have $M(s_1,\dots,s_n,s) = M_{\bar{X}}(s)M_U(s_1,\dots,s_n)$. By the Independence Theorem for Moment Generating Functions, $\bar{X}$ and $U = (U_1,U_2,\dots,U_n)$ are independent random variables. Therefore, by Chapter 4, Problem 1, $\bar{X}$ and $\sum_{i=1}^nU_i^2 = \sum_{i=1}^n\left(X_i-\bar{X}\right)^2$ are independent random variables.

10.4 Chapter 5

1.(a) Since $Y_i \sim$ Exponential$(\theta,1)$ (two-parameter exponential with location $\theta$ and scale 1), $i=1,2,\dots$, independently,
$P(Y_i>x) = \int_x^{\infty}e^{-(y-\theta)}\,dy = e^{-(x-\theta)} \quad \text{for } x>\theta,\ i=1,2,\dots \qquad (10.22)$
and for $x>\theta$,
$F_n(x) = P(X_n\le x) = P(\min(Y_1,\dots,Y_n)\le x) = 1-P(Y_1>x,\dots,Y_n>x) = 1-\prod_{i=1}^nP(Y_i>x) \quad \text{since } Y_1,\dots,Y_n \text{ are independent}$
$= 1-e^{-n(x-\theta)} \quad \text{using } (10.22). \qquad (10.23)$
Since
$\lim_{n\to\infty}F_n(x) = \begin{cases}1 & \text{if } x>\theta\\ 0 & \text{if } x<\theta,\end{cases}$
therefore $X_n \rightarrow_p \theta$. $\qquad (10.24)$

1.(b) By $(10.24)$ and the Limit Theorems, $U_n \rightarrow_p 1$.

1.(c) Now
$P(V_n\le v) = P(n(X_n-\theta)\le v) = P\!\left(X_n\le\frac{v}{n}+\theta\right) = 1-e^{-n(v/n+\theta-\theta)} = 1-e^{-v} \quad \text{for } v\ge 0,\ \text{using } (10.23),$
which is the cumulative distribution function of an Exponential$(1)$ random variable. Therefore $V_n \sim$ Exponential$(1)$ for every $n=1,2,\dots$, which implies $V_n \rightarrow_D V \sim$ Exponential$(1)$.

1.(d) Since
$P(W_n\le w) = P\!\left(n^2(X_n-\theta)\le w\right) = P\!\left(X_n\le\frac{w}{n^2}+\theta\right) = 1-e^{-w/n} \quad \text{for } w\ge 0,$
$\lim_{n\to\infty}P(W_n\le w) = 0$ for all $w\in\Re$, which is not a cumulative distribution function. Therefore $W_n$ has no limiting distribution.
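As a numerical check (not part of the original solution) of 1(c): $n(X_n-\theta)$ should have mean 1, variance 1 and $P(V_n\le 1) = 1-e^{-1}$ for every $n$, since its distribution is exactly Exponential$(1)$. The values of $\theta$, $n$ and the replication count below are arbitrary.

```python
import numpy as np

# Check of Chapter 5, Problem 1(c): with Y_i ~ Exponential(theta, 1) (location
# theta, scale 1) and X_n = min(Y_1, ..., Y_n), n(X_n - theta) ~ Exponential(1).
rng = np.random.default_rng(12)
theta, n, N = 5.0, 50, 10**5
Y = theta + rng.exponential(1.0, (N, n))
V = n * (Y.min(axis=1) - theta)
print(V.mean(), V.var())                       # both ~1
print(np.mean(V <= 1.0), 1 - np.exp(-1.0))     # ~0.632
```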
10.4 Chapter 5

1: (a) Since $Y_i \sim$ Exponential$(\theta,1)$, $i = 1,2,\ldots$ independently,
$$P\left(Y_i > x\right) = \int_x^{\infty} e^{-(y-\theta)}\,dy = e^{-(x-\theta)} \quad \text{for } x > \theta, \ i = 1,2,\ldots \qquad (10.22)$$
and for $x > \theta$
$$F_n(x) = P\left(X_n \le x\right) = P\left(\min\left(Y_1,\ldots,Y_n\right) \le x\right) = 1 - \prod_{i=1}^{n} P\left(Y_i > x\right) = 1 - e^{-n(x-\theta)} \qquad (10.23)$$
since $Y_1,\ldots,Y_n$ are independent random variables, using (10.22). Since
$$\lim_{n\to\infty} F_n(x) = \begin{cases} 1 & \text{if } x > \theta \\ 0 & \text{if } x < \theta \end{cases} \qquad (10.24)$$
therefore $X_n \rightarrow_p \theta$.
(b) By (10.24) and the Limit Theorems, $U_n \rightarrow_p 1$.
(c) Now $P\left(V_n \le v\right) = P\left[n\left(X_n - \theta\right) \le v\right] = P\left(X_n \le \frac{v}{n} + \theta\right) = 1 - e^{-n(v/n + \theta - \theta)} = 1 - e^{-v}$ for $v \ge 0$, using (10.23), which is the cumulative distribution function of an Exponential$(1)$ random variable. Therefore $V_n \sim$ Exponential$(1)$ for $n = 1,2,\ldots$, which implies $V_n \rightarrow_D V \sim$ Exponential$(1)$.
(d) Since $P\left(W_n \le w\right) = P\left[n^2\left(X_n - \theta\right) \le w\right] = P\left(X_n \le \frac{w}{n^2} + \theta\right) = 1 - e^{-w/n}$ for $w \ge 0$, therefore $\lim_{n\to\infty} P\left(W_n \le w\right) = 0$ for all $w \in \Re$, which is not a cumulative distribution function. Therefore $W_n$ has no limiting distribution.

2: We first note that
$$P\left(Y_n \le y\right) = P\left(X_1 \le y, \ldots, X_n \le y\right) = \prod_{i=1}^{n} P\left(X_i \le y\right) = [F(y)]^n \quad \text{for } y \in \Re \qquad (10.25)$$
since $X_1,\ldots,X_n$ are independent random variables. Since $F$ is a cumulative distribution function, $F$ takes on values between $0$ and $1$, so the function $n[1 - F(\cdot)]$ takes on values between $0$ and $n$. Thus $G_n(z) = P\left(Z_n \le z\right)$, the cumulative distribution function of $Z_n = n\left[1 - F\left(Y_n\right)\right]$, equals $0$ for $z \le 0$ and equals $1$ for $z \ge n$. Let $A$ be the support set of $X_i$. $F(x)$ is an increasing function for $x \in A$ and therefore has an inverse, $F^{-1}$, which is defined on the interval $(0,1)$. Therefore for $0 < z < n$
$$G_n(z) = P\left(F\left(Y_n\right) \ge 1 - \frac{z}{n}\right) = P\left(Y_n \ge F^{-1}\!\left(1 - \frac{z}{n}\right)\right) = 1 - \left[F\!\left(F^{-1}\!\left(1 - \frac{z}{n}\right)\right)\right]^n = 1 - \left(1 - \frac{z}{n}\right)^n$$
by (10.25). Since $\lim_{n\to\infty}\left(1 - \frac{z}{n}\right)^n = e^{-z}$ for $z > 0$,
$$\lim_{n\to\infty} G_n(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 - e^{-z} & \text{if } z > 0 \end{cases}$$
which is the cumulative distribution function of an Exponential$(1)$ random variable. Therefore by the definition of convergence in distribution, $Z_n \rightarrow_D Z \sim$ Exponential$(1)$.

3: The moment generating function of a Poisson$(\theta)$ random variable is $M(t) = \exp\left[\theta\left(e^t - 1\right)\right]$ for $t \in \Re$. The moment generating function of $Y_n = \sqrt{n}\left(\overline{X}_n - \theta\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i - \sqrt{n}\,\theta$ is
$$M_n(t) = E\!\left(e^{tY_n}\right) = e^{-\sqrt{n}\,\theta t}\prod_{i=1}^{n} E\!\left[\exp\!\left(\frac{t}{\sqrt{n}}X_i\right)\right] = e^{-\sqrt{n}\,\theta t}\left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \exp\!\left[-\sqrt{n}\,\theta t + n\theta\left(e^{t/\sqrt{n}} - 1\right)\right] \quad \text{for } t \in \Re,$$
since $X_1,\ldots,X_n$ are independent random variables, and therefore $\log M_n(t) = -\sqrt{n}\,\theta t + n\theta\left(e^{t/\sqrt{n}} - 1\right)$. By Taylor's Theorem, $e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!}e^c$ for some $c$ between $0$ and $x$, so
$$e^{t/\sqrt{n}} = 1 + \frac{t}{\sqrt{n}} + \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}e^{c_n}$$
for some $c_n$ between $0$ and $t/\sqrt{n}$. Therefore
$$\log M_n(t) = -\sqrt{n}\,\theta t + n\theta\left[\frac{t}{\sqrt{n}} + \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}e^{c_n}\right] = \frac{\theta t^2}{2} + \frac{\theta t^3}{3!\sqrt{n}}e^{c_n}.$$
Since $\lim_{n\to\infty} c_n = 0$, it follows that $\lim_{n\to\infty} e^{c_n} = e^0 = 1$. Therefore $\lim_{n\to\infty}\log M_n(t) = \frac{1}{2}\theta t^2$ and $\lim_{n\to\infty} M_n(t) = e^{\theta t^2/2}$ for $t \in \Re$, which is the moment generating function of a N$(0,\theta)$ random variable. Therefore by the Limit Theorem for Moment Generating Functions, $Y_n \rightarrow_D Y \sim$ N$(0,\theta)$.

4: The moment generating function of an Exponential$(\theta)$ random variable is $M(t) = (1 - \theta t)^{-1}$ for $t < \frac{1}{\theta}$. Since $X_1,\ldots,X_n$ are independent random variables, the moment generating function of $Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i - \sqrt{n}\,\theta = \sqrt{n}\left(\overline{X}_n - \theta\right)$ is
$$M_n(t) = E\!\left(e^{tZ_n}\right) = e^{-\sqrt{n}\,\theta t}\left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = e^{-\sqrt{n}\,\theta t}\left(1 - \frac{\theta t}{\sqrt{n}}\right)^{-n} = \left[e^{\theta t/\sqrt{n}}\left(1 - \frac{\theta t}{\sqrt{n}}\right)\right]^{-n} \quad \text{for } t < \frac{\sqrt{n}}{\theta}.$$
By Taylor's Theorem, $e^{\theta t/\sqrt{n}} = 1 + \frac{\theta t}{\sqrt{n}} + \frac{\theta^2 t^2}{2n} + \frac{\theta^3 t^3}{3!\,n^{3/2}}e^{c_n}$ for some $c_n$ between $0$ and $\theta t/\sqrt{n}$. Multiplying out gives
$$e^{\theta t/\sqrt{n}}\left(1 - \frac{\theta t}{\sqrt{n}}\right) = 1 - \frac{\theta^2 t^2}{2n} + \frac{\gamma(n)}{n},$$
where every term of $\gamma(n)$ contains a factor $n^{-1/2}$ or smaller together with $e^{c_n}$. Since $\lim_{n\to\infty} c_n = 0$ implies $\lim_{n\to\infty} e^{c_n} = 1$, it follows that $\lim_{n\to\infty}\gamma(n) = 0$. Thus
$$M_n(t) = \left[1 - \frac{\theta^2 t^2}{2n} + \frac{\gamma(n)}{n}\right]^{-n}$$
and by Theorem 5.1.2, $\lim_{n\to\infty} M_n(t) = e^{\theta^2 t^2/2}$ for $t \in \Re$, which is the moment generating function of a N$\left(0,\theta^2\right)$ random variable. Therefore by the Limit Theorem for Moment Generating Functions, $Z_n \rightarrow_D Z \sim$ N$\left(0,\theta^2\right)$.
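The N$(0,\theta^2)$ limit in Problem 4 can be examined by simulation in R. This sketch is not part of the original solution; the values $\theta = 2$ and $n = 200$ are arbitrary choices for illustration.

set.seed(4)
theta <- 2; nsim <- 10000; n <- 200
# Z_n = (sum of X_i - n*theta)/sqrt(n) for X_i ~ Exponential(theta) with mean theta
zn <- replicate(nsim, (sum(rexp(n, rate = 1 / theta)) - n * theta) / sqrt(n))
c(mean(zn), sd(zn))                      # approximately 0 and theta = 2
ks.test(zn, "pnorm", 0, theta)$p.value   # consistent with the N(0, theta^2) limit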
6: We can rewrite $S_n^2$ as
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}_n\right)^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[\left(X_i - \mu\right) - \left(\overline{X}_n - \mu\right)\right]^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}\left(X_i - \mu\right)^2 - n\left(\overline{X}_n - \mu\right)^2\right]$$
$$= \sigma^2\left(\frac{n}{n-1}\right)\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 - \left(\frac{\overline{X}_n - \mu}{\sigma}\right)^2\right]. \qquad (10.28)$$
Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E\left(X_i\right) = \mu$ and $Var\left(X_i\right) = \sigma^2 < \infty$, then $\overline{X}_n \rightarrow_p \mu$ (10.29) by the Weak Law of Large Numbers, and by (10.29) and the Limit Theorems
$$\left(\frac{\overline{X}_n - \mu}{\sigma}\right)^2 \rightarrow_p 0. \qquad (10.30)$$
Let $W_i = \left(\frac{X_i - \mu}{\sigma}\right)^2$, $i = 1,2,\ldots$, with $E\left(W_i\right) = 1$ and $Var\left(W_i\right) < \infty$ since $E\left(X_i^4\right) < \infty$. Since $W_1, W_2, \ldots$ are independent and identically distributed random variables with $E\left(W_i\right) = 1$ and $Var\left(W_i\right) < \infty$, then
$$\overline{W}_n = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 \rightarrow_p 1 \qquad (10.31)$$
by the Weak Law of Large Numbers. By (10.28), (10.30), (10.31) and the Limit Theorems, $S_n^2 \rightarrow_p \sigma^2(1)(1 - 0) = \sigma^2$, and therefore $S_n/\sigma \rightarrow_p 1$ (10.32) by the Limit Theorems. Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E\left(X_i\right) = \mu$ and $Var\left(X_i\right) = \sigma^2 < \infty$, then
$$\frac{\sqrt{n}\left(\overline{X}_n - \mu\right)}{\sigma} \rightarrow_D Z \sim \text{N}(0,1) \qquad (10.33)$$
by the Central Limit Theorem. By (10.32), (10.33) and Slutsky's Theorem,
$$T_n = \frac{\sqrt{n}\left(\overline{X}_n - \mu\right)}{S_n} = \frac{\sqrt{n}\left(\overline{X}_n - \mu\right)/\sigma}{S_n/\sigma} \rightarrow_D \frac{Z}{1} = Z \sim \text{N}(0,1).$$

7: (a) Let $Y_1, Y_2, \ldots$ be independent Binomial$(1,\theta)$ random variables with $E\left(Y_i\right) = \theta$ and $Var\left(Y_i\right) = \theta(1-\theta)$ for $i = 1,2,\ldots$. By 4.3.2(1), $\sum_{i=1}^{n} Y_i \sim$ Binomial$(n,\theta)$. Since $Y_1, Y_2, \ldots$ are independent and identically distributed random variables with $E\left(Y_i\right) = \theta$ and $Var\left(Y_i\right) = \theta(1-\theta) < \infty$, then by the Weak Law of Large Numbers $\frac{1}{n}\sum_{i=1}^{n} Y_i = \overline{Y}_n \rightarrow_p \theta$. Since $X_n$ and $\sum_{i=1}^{n} Y_i$ have the same distribution,
$$T_n = \frac{X_n}{n} \rightarrow_p \theta. \qquad (10.34)$$
(b) By (10.34) and the Limit Theorems,
$$U_n = \frac{X_n}{n}\left(1 - \frac{X_n}{n}\right) \rightarrow_p \theta(1-\theta). \qquad (10.35)$$
(c) Since $Y_1, Y_2, \ldots$ are independent and identically distributed random variables with $E\left(Y_i\right) = \theta$ and $Var\left(Y_i\right) = \theta(1-\theta) < \infty$, then by the Central Limit Theorem $\frac{\sqrt{n}\left(\overline{Y}_n - \theta\right)}{\sqrt{\theta(1-\theta)}} \rightarrow_D Z \sim$ N$(0,1)$. Since $X_n$ and $\sum_{i=1}^{n} Y_i$ have the same distribution,
$$S_n = \frac{\sqrt{n}\left(\frac{X_n}{n} - \theta\right)}{\sqrt{\theta(1-\theta)}} \rightarrow_D Z \sim \text{N}(0,1). \qquad (10.36)$$
By (10.36) and Slutsky's Theorem,
$$W_n = \sqrt{n}\left(\frac{X_n}{n} - \theta\right) = \sqrt{\theta(1-\theta)}\,S_n \rightarrow_D W = \sqrt{\theta(1-\theta)}\,Z \sim \text{N}\!\left(0, \theta(1-\theta)\right). \qquad (10.37)$$
(d) By (10.35), (10.37) and Slutsky's Theorem,
$$Z_n = \frac{W_n}{\sqrt{U_n}} \rightarrow_D \frac{W}{\sqrt{\theta(1-\theta)}} = \frac{\sqrt{\theta(1-\theta)}\,Z}{\sqrt{\theta(1-\theta)}} = Z \sim \text{N}(0,1).$$
(e) To determine the limiting distribution of $V_n = \sqrt{n}\left[\arcsin\!\left(\sqrt{\tfrac{X_n}{n}}\right) - \arcsin\!\left(\sqrt{\theta}\right)\right]$, let $g(x) = \arcsin\left(\sqrt{x}\right)$ and $a = \theta$. Then
$$g'(x) = \frac{1}{\sqrt{1-x}}\cdot\frac{1}{2\sqrt{x}} = \frac{1}{2\sqrt{x(1-x)}} \quad\text{and}\quad g'(a) = g'(\theta) = \frac{1}{2\sqrt{\theta(1-\theta)}}.$$
By (10.37) and the Delta Method,
$$V_n \rightarrow_D \frac{1}{2\sqrt{\theta(1-\theta)}}\,\sqrt{\theta(1-\theta)}\,Z = \frac{Z}{2} \sim \text{N}\!\left(0, \tfrac{1}{4}\right).$$
(f) The limiting variance of $W_n$ is $\theta(1-\theta)$, which depends on $\theta$. The limiting variance of $Z_n$ is $1$, which does not depend on $\theta$. The limiting variance of $V_n$ is $1/4$, which does not depend on $\theta$. The transformation $g(x) = \arcsin\left(\sqrt{x}\right)$ is a variance-stabilizing transformation for the Binomial distribution.
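The variance-stabilizing property in Problem 7(f) can be seen by simulation in R. This sketch is not part of the original solution; $n = 100$ and the grid of $\theta$ values are arbitrary choices.

set.seed(5)
n <- 100; nsim <- 20000
vstab <- function(theta) {
  xb <- rbinom(nsim, n, theta) / n                 # X_n / n
  c(raw = n * var(xb),                             # approx theta*(1 - theta), depends on theta
    arcsine = n * var(asin(sqrt(xb))))             # approx 1/4 for every theta
}
sapply(c(0.1, 0.3, 0.5, 0.8), vstab)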
8: $X_1, X_2, \ldots$ are independent Geometric$(\theta)$ random variables with $E\left(X_i\right) = \frac{1-\theta}{\theta}$ and $Var\left(X_i\right) = \frac{1-\theta}{\theta^2}$, $i = 1,2,\ldots$
(a) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E\left(X_i\right) = \frac{1-\theta}{\theta}$ and $Var\left(X_i\right) = \frac{1-\theta}{\theta^2} < \infty$, then by the Weak Law of Large Numbers
$$\overline{X}_n = \frac{Y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i \rightarrow_p \frac{1-\theta}{\theta}. \qquad (10.38)$$
(b) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E\left(X_i\right) = \frac{1-\theta}{\theta}$ and $Var\left(X_i\right) = \frac{1-\theta}{\theta^2} < \infty$, then by the Central Limit Theorem
$$\frac{\sqrt{n}\left(\overline{X}_n - \frac{1-\theta}{\theta}\right)}{\sqrt{\frac{1-\theta}{\theta^2}}} \rightarrow_D Z \sim \text{N}(0,1). \qquad (10.39)$$
By (10.39) and Slutsky's Theorem,
$$W_n = \sqrt{n}\left(\overline{X}_n - \frac{1-\theta}{\theta}\right) \rightarrow_D W = \sqrt{\frac{1-\theta}{\theta^2}}\,Z \sim \text{N}\!\left(0, \frac{1-\theta}{\theta^2}\right). \qquad (10.40)$$
(c) By (10.38) and the Limit Theorems,
$$V_n = \frac{1}{1 + \overline{X}_n} \rightarrow_p \frac{1}{1 + \frac{1-\theta}{\theta}} = \theta.$$
(d) To find the limiting distribution of $Z_n = \dfrac{\sqrt{n}\left(V_n - \theta\right)}{\sqrt{V_n^2\left(1 - V_n\right)}}$ we first note that
$$\sqrt{n}\left(\frac{1}{V_n} - \frac{1}{\theta}\right) = \sqrt{n}\left(1 + \overline{X}_n - \frac{1}{\theta}\right) = \sqrt{n}\left(\overline{X}_n - \frac{1-\theta}{\theta}\right) = W_n \qquad (10.41)$$
and therefore by (10.40)
$$\sqrt{n}\left(\frac{1}{V_n} - \frac{1}{\theta}\right) \rightarrow_D W \sim \text{N}\!\left(0, \frac{1-\theta}{\theta^2}\right). \qquad (10.42)$$
Next we determine the limiting distribution of $\sqrt{n}\left(V_n - \theta\right)$, which is the numerator of $Z_n$. Let $g(x) = \frac{1}{x}$ and $a = \frac{1}{\theta}$. Then $g'(x) = -\frac{1}{x^2}$ and $g'(a) = -\theta^2$. By (10.42) and the Delta Theorem,
$$\sqrt{n}\left(V_n - \theta\right) \rightarrow_D -\theta^2\,W = -\theta^2\sqrt{\frac{1-\theta}{\theta^2}}\,Z = -\theta\sqrt{1-\theta}\,Z \sim \text{N}\!\left(0, \theta^2(1-\theta)\right) \qquad (10.43)$$
and therefore, by the symmetry of the N$(0,1)$ distribution,
$$\frac{\sqrt{n}\left(V_n - \theta\right)}{\sqrt{\theta^2(1-\theta)}} \rightarrow_D Z \sim \text{N}(0,1). \qquad (10.44)$$
Since
$$Z_n = \frac{\sqrt{n}\left(V_n - \theta\right)}{\sqrt{V_n^2\left(1 - V_n\right)}} = \frac{\sqrt{n}\left(V_n - \theta\right)}{\sqrt{\theta^2(1-\theta)}}\cdot\sqrt{\frac{\theta^2(1-\theta)}{V_n^2\left(1 - V_n\right)}}$$
and, by (c) and the Limit Theorems,
$$\sqrt{V_n^2\left(1 - V_n\right)} \rightarrow_p \sqrt{\theta^2(1-\theta)} \qquad (10.45) \qquad\text{so that}\qquad \sqrt{\frac{\theta^2(1-\theta)}{V_n^2\left(1 - V_n\right)}} \rightarrow_p 1, \qquad (10.46)$$
by (10.44), (10.45), (10.46) and Slutsky's Theorem, $Z_n \rightarrow_D Z\cdot 1 = Z \sim$ N$(0,1)$.

9: $X_1, X_2, \ldots, X_n$ are independent Gamma$(2,\theta)$ random variables with $E\left(X_i\right) = 2\theta$ and $Var\left(X_i\right) = 2\theta^2$ for $i = 1,2,\ldots,n$.
(a) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E\left(X_i\right) = 2\theta$ and $Var\left(X_i\right) = 2\theta^2 < \infty$, then by the Weak Law of Large Numbers
$$\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \rightarrow_p 2\theta. \qquad (10.47)$$
By (10.47) and the Limit Theorems,
$$\frac{\overline{X}_n}{\sqrt{2}} \rightarrow_p \frac{2\theta}{\sqrt{2}} = \sqrt{2}\,\theta \quad\text{and}\quad \frac{\sqrt{2}\,\theta}{\overline{X}_n/\sqrt{2}} = \frac{2\theta}{\overline{X}_n} \rightarrow_p 1. \qquad (10.48)$$
(b) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E\left(X_i\right) = 2\theta$ and $Var\left(X_i\right) = 2\theta^2 < \infty$, then by the Central Limit Theorem
$$W_n = \frac{\sqrt{n}\left(\overline{X}_n - 2\theta\right)}{\sqrt{2}\,\theta} \rightarrow_D Z \sim \text{N}(0,1). \qquad (10.49)$$
By (10.49) and Slutsky's Theorem,
$$V_n = \sqrt{n}\left(\overline{X}_n - 2\theta\right) \rightarrow_D \sqrt{2}\,\theta Z \sim \text{N}\!\left(0, 2\theta^2\right). \qquad (10.50)$$
(c) By (10.48), (10.49) and Slutsky's Theorem,
$$Z_n = \frac{\sqrt{n}\left(\overline{X}_n - 2\theta\right)}{\overline{X}_n/\sqrt{2}} = \left[\frac{\sqrt{n}\left(\overline{X}_n - 2\theta\right)}{\sqrt{2}\,\theta}\right]\left[\frac{\sqrt{2}\,\theta}{\overline{X}_n/\sqrt{2}}\right] \rightarrow_D Z\cdot(1) = Z \sim \text{N}(0,1).$$
(d) To determine the limiting distribution of $U_n = \sqrt{n}\left[\log\overline{X}_n - \log(2\theta)\right]$, let $g(x) = \log x$ and $a = 2\theta$. Then $g'(x) = 1/x$ and $g'(a) = g'(2\theta) = 1/(2\theta)$. By (10.50) and the Delta Method,
$$U_n \rightarrow_D \frac{1}{2\theta}\,\sqrt{2}\,\theta Z = \frac{Z}{\sqrt{2}} \sim \text{N}\!\left(0, \tfrac{1}{2}\right).$$
(e) The limiting variance of $Z_n$ is $1$, which does not depend on $\theta$. The limiting variance of $U_n$ is $1/2$, which does not depend on $\theta$. The transformation $g(x) = \log x$ is a variance-stabilizing transformation for the Gamma distribution.
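The conclusion of Problem 9(e) can be checked by simulation in R. This sketch is not part of the original solution; $n = 200$ and the grid of $\theta$ values are arbitrary choices. Note that R's rgamma with shape = 2 and scale = theta matches the Gamma$(2,\theta)$ parameterization used above (mean $2\theta$, variance $2\theta^2$).

set.seed(6)
n <- 200; nsim <- 20000
check <- function(theta) {
  xbar <- replicate(nsim, mean(rgamma(n, shape = 2, scale = theta)))
  c(raw = n * var(xbar),            # approx 2*theta^2, depends on theta
    log = n * var(log(xbar)))       # approx 1/2 for every theta
}
sapply(c(0.5, 1, 3), check)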
10.5 Chapter 6

3: (a) If $X_i \sim$ Geometric$(\theta)$ then $f(x;\theta) = \theta(1-\theta)^x$ for $x = 0,1,\ldots$; $0 < \theta < 1$. The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f\left(x_i;\theta\right) = \prod_{i=1}^{n}\theta(1-\theta)^{x_i} = \theta^n(1-\theta)^t \quad \text{for } 0 < \theta < 1, \text{ where } t = \sum_{i=1}^{n} x_i.$$
The log likelihood function is $l(\theta) = \log L(\theta) = n\log\theta + t\log(1-\theta)$ for $0 < \theta < 1$. The score function is
$$S(\theta) = l'(\theta) = \frac{n}{\theta} - \frac{t}{1-\theta} = \frac{n - (n+t)\theta}{\theta(1-\theta)} \quad \text{for } 0 < \theta < 1,$$
so $S(\theta) = 0$ for $\theta = \frac{n}{n+t}$. Since $S(\theta) > 0$ for $0 < \theta < \frac{n}{n+t}$ and $S(\theta) < 0$ for $\frac{n}{n+t} < \theta < 1$, by the first derivative test the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \frac{n}{n+t}$ and the maximum likelihood estimator is $\tilde{\theta} = \frac{n}{n+T}$ where $T = \sum_{i=1}^{n} X_i$. The information function is
$$I(\theta) = -S'(\theta) = -l''(\theta) = \frac{n}{\theta^2} + \frac{t}{(1-\theta)^2} \quad \text{for } 0 < \theta < 1.$$
Since $I(\theta) > 0$ for all $0 < \theta < 1$, the graph of $l(\theta)$ is concave down, and this also confirms that $\hat{\theta} = n/(n+t)$ is the maximum likelihood estimate.

3: (b) The observed information is
$$I(\hat{\theta}) = \frac{n}{\hat{\theta}^2} + \frac{t}{(1-\hat{\theta})^2} = \frac{(n+t)^2}{n} + \frac{(n+t)^2}{t} = \frac{(n+t)^3}{nt} = \frac{n}{\hat{\theta}^2(1-\hat{\theta})}.$$
The expected information is
$$J(\theta) = E\!\left[\frac{n}{\theta^2} + \frac{T}{(1-\theta)^2}\right] = \frac{n}{\theta^2} + \frac{n(1-\theta)/\theta}{(1-\theta)^2} = \frac{n\left[(1-\theta) + \theta\right]}{\theta^2(1-\theta)} = \frac{n}{\theta^2(1-\theta)} \quad \text{for } 0 < \theta < 1.$$

3: (c) Since $\mu = E\left(X_i\right) = \frac{1-\theta}{\theta}$, by the invariance property of maximum likelihood estimators the maximum likelihood estimator of $\mu = E\left(X_i\right)$ is
$$\tilde{\mu} = \frac{1-\tilde{\theta}}{\tilde{\theta}} = \frac{T}{n} = \overline{X}.$$

3: (d) If $n = 20$ and $t = \sum_{i=1}^{20} x_i = 40$ then the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \frac{20}{20+40} = \frac{1}{3}$. The relative likelihood function of $\theta$ is given by
$$R(\theta) = \frac{L(\theta)}{L(\hat{\theta})} = (3\theta)^{20}\left[\tfrac{3}{2}(1-\theta)\right]^{40} \quad \text{for } 0 \le \theta \le 1.$$
[Figure 10.25: Relative likelihood function for Problem 3]
A 15% likelihood interval is found by solving $R(\theta) = 0.15$. The 15% likelihood interval is $[0.2234, 0.4570]$. $R(0.5) = 0.03344$ implies that $\theta = 0.5$ is outside a 10% likelihood interval and we would conclude that $\theta = 0.5$ is not a very plausible value of $\theta$ given the data. A numerical check of these values is given after Problem 4 below.

4: Since $(X_1,X_2,X_3) \sim$ Multinomial$\left(n;\ \theta^2,\ 2\theta(1-\theta),\ (1-\theta)^2\right)$, the likelihood function is
$$L(\theta) = \frac{n!}{x_1!\,x_2!\,(n-x_1-x_2)!}\left(\theta^2\right)^{x_1}\left[2\theta(1-\theta)\right]^{x_2}\left[(1-\theta)^2\right]^{n-x_1-x_2} = \frac{n!}{x_1!\,x_2!\,(n-x_1-x_2)!}\,2^{x_2}\,\theta^{2x_1+x_2}(1-\theta)^{2n-2x_1-x_2}$$
for $0 < \theta < 1$. The log likelihood function is
$$l(\theta) = \log\frac{n!}{x_1!\,x_2!\,(n-x_1-x_2)!} + x_2\log 2 + \left(2x_1+x_2\right)\log\theta + \left(2n-2x_1-x_2\right)\log(1-\theta) \quad \text{for } 0 < \theta < 1.$$
The score function is
$$S(\theta) = \frac{2x_1+x_2}{\theta} - \frac{2n-\left(2x_1+x_2\right)}{1-\theta} = \frac{\left(2x_1+x_2\right) - 2n\theta}{\theta(1-\theta)} \quad \text{for } 0 < \theta < 1,$$
so $S(\theta) = 0$ if $\theta = \frac{2x_1+x_2}{2n}$. Since $S(\theta) > 0$ for $0 < \theta < \frac{2x_1+x_2}{2n}$ and $S(\theta) < 0$ for $\frac{2x_1+x_2}{2n} < \theta < 1$, by the first derivative test $l(\theta)$ has an absolute maximum at $\theta = \frac{2x_1+x_2}{2n}$. Thus the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \frac{2x_1+x_2}{2n}$ and the maximum likelihood estimator of $\theta$ is $\tilde{\theta} = \frac{2X_1+X_2}{2n}$.
The information function is
$$I(\theta) = \frac{2x_1+x_2}{\theta^2} + \frac{2n-\left(2x_1+x_2\right)}{(1-\theta)^2} \quad \text{for } 0 < \theta < 1$$
and the observed information is
$$I(\hat{\theta}) = \frac{(2n)^2}{2x_1+x_2} + \frac{(2n)^2}{2n-\left(2x_1+x_2\right)} = \frac{2n}{\hat{\theta}\left(1-\hat{\theta}\right)}.$$
Since $X_1 \sim$ Binomial$\left(n,\theta^2\right)$ and $X_2 \sim$ Binomial$\left(n, 2\theta(1-\theta)\right)$, $E\left(2X_1+X_2\right) = 2n\theta^2 + n\left[2\theta(1-\theta)\right] = 2n\theta$. The expected information is
$$J(\theta) = E\!\left[\frac{2X_1+X_2}{\theta^2} + \frac{2n-\left(2X_1+X_2\right)}{(1-\theta)^2}\right] = \frac{2n\theta}{\theta^2} + \frac{2n - 2n\theta}{(1-\theta)^2} = \frac{2n}{\theta} + \frac{2n}{1-\theta} = \frac{2n}{\theta(1-\theta)} \quad \text{for } 0 < \theta < 1.$$
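The numerical values in Problem 3(d) can be reproduced in R. This sketch is not part of the original solution; it simply uses the data $n = 20$, $t = 40$ given above and solves $R(\theta) = 0.15$ on either side of $\hat{\theta}$ with uniroot.

n <- 20; t <- 40
thetahat <- n / (n + t)                                    # 1/3
R <- function(theta) (theta / thetahat)^n * ((1 - theta) / (1 - thetahat))^t
lower <- uniroot(function(th) R(th) - 0.15, c(1e-6, thetahat))$root
upper <- uniroot(function(th) R(th) - 0.15, c(thetahat, 1 - 1e-6))$root
round(c(lower, upper), 4)                                  # approximately 0.2234 0.4570
R(0.5)                                                     # approximately 0.0334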
6: (a) Given
$$P(k \text{ children in family};\theta) = \theta^k \ \text{ for } k = 1,2,\ldots, \qquad P(0 \text{ children in family};\theta) = \frac{1-2\theta}{1-\theta}, \qquad \text{for } 0 < \theta < \tfrac{1}{2},$$
and the observed data

No. of children:      0    1    2    3    4    Total
Frequency observed:   17   22   7    3    1    50

the appropriate likelihood function for $\theta$ is based on the multinomial model:
$$L(\theta) = \frac{50!}{17!\,22!\,7!\,3!\,1!}\left(\frac{1-2\theta}{1-\theta}\right)^{17}\theta^{22}\left(\theta^2\right)^{7}\left(\theta^3\right)^{3}\left(\theta^4\right)^{1} = \frac{50!}{17!\,22!\,7!\,3!\,1!}\left(\frac{1-2\theta}{1-\theta}\right)^{17}\theta^{49} \quad \text{for } 0 < \theta < \tfrac{1}{2},$$
or more simply $L(\theta) = \left(\frac{1-2\theta}{1-\theta}\right)^{17}\theta^{49}$ for $0 < \theta < \tfrac{1}{2}$. The log likelihood function is
$$l(\theta) = 17\log(1-2\theta) - 17\log(1-\theta) + 49\log\theta \quad \text{for } 0 < \theta < \tfrac{1}{2}.$$
The score function is
$$S(\theta) = -\frac{34}{1-2\theta} + \frac{17}{1-\theta} + \frac{49}{\theta} = \frac{98\theta^2 - 164\theta + 49}{\theta(1-\theta)(1-2\theta)} \quad \text{for } 0 < \theta < \tfrac{1}{2}.$$
The information function is
$$I(\theta) = \frac{68}{(1-2\theta)^2} - \frac{17}{(1-\theta)^2} + \frac{49}{\theta^2} \quad \text{for } 0 < \theta < \tfrac{1}{2}.$$

6: (b) $S(\theta) = 0$ if $98\theta^2 - 164\theta + 49 = 0$, or $\theta^2 - \frac{82}{49}\theta + \frac{1}{2} = 0$, which gives
$$\theta = \frac{41}{49} \pm \sqrt{\left(\frac{41}{49}\right)^2 - \frac{1}{2}} = \frac{41}{49} \pm \frac{1}{98}\sqrt{6724 - 4802} = \frac{41}{49} \pm \frac{1}{98}\sqrt{1922}.$$
Since $0 < \theta < \tfrac{1}{2}$ and $\frac{41}{49} + \frac{1}{98}\sqrt{1922} > 1$, we choose $\theta = \frac{41}{49} - \frac{1}{98}\sqrt{1922}$. Since $S(\theta) > 0$ for $0 < \theta < \frac{41}{49} - \frac{1}{98}\sqrt{1922}$ and $S(\theta) < 0$ for $\frac{41}{49} - \frac{1}{98}\sqrt{1922} < \theta < \tfrac{1}{2}$, the maximum likelihood estimate of $\theta$ is
$$\hat{\theta} = \frac{41}{49} - \frac{1}{98}\sqrt{1922} = 0.389381424147286 \approx 0.3894.$$
The observed information for the given data is
$$I(\hat{\theta}) = \frac{68}{(1-2\hat{\theta})^2} - \frac{17}{(1-\hat{\theta})^2} + \frac{49}{\hat{\theta}^2} \approx 1666.88.$$

6: (c) [Figure 10.26: Relative likelihood function for Problem 6] A 15% likelihood interval for $\theta$ is $[0.34, 0.43]$.

6: (d) Since $R(0.45) \approx 0.0097$, $\theta = 0.45$ is outside a 1% likelihood interval and therefore $\theta = 0.45$ is not a plausible value of $\theta$ for these data.

6: (e) The expected frequencies are calculated using
$$e_0 = 50\left(\frac{1-2\hat{\theta}}{1-\hat{\theta}}\right) \quad\text{and}\quad e_k = 50\,\hat{\theta}^k \quad \text{for } k = 1,2,\ldots$$
The observed and expected frequencies are

No. of children:             0       1       2      3      4      Total
Observed frequency f_k:      17      22      7      3      1      50
Expected frequency e_k:      18.12   19.47   7.58   2.95   1.15   50

We see that the agreement between observed and expected frequencies is very good and the model gives a reasonable fit to the data. (An R calculation of these values follows Problem 8(a) below.)

7: Show $\hat{\theta} = x_{(1)}$ is the maximum likelihood estimate. By Example 2.5.3(a), $\theta$ is a location parameter for this distribution. By Theorem 6.6.4, $Q(\mathbf{X};\theta) = \tilde{\theta} - \theta = X_{(1)} - \theta$ is a pivotal quantity. Since $P\left(X_i > x\right) = e^{-(x-\theta)}$ for $x > \theta$,
$$P\left(Q(\mathbf{X};\theta) \le q\right) = P\left(X_{(1)} - \theta \le q\right) = 1 - P\left(X_{(1)} > q + \theta\right) = 1 - \prod_{i=1}^{n} e^{-(q+\theta-\theta)} = 1 - e^{-nq} \quad \text{for } q \ge 0.$$
Since
$$P\left(\tilde{\theta} + \tfrac{1}{n}\log(1-p) \le \theta \le \tilde{\theta}\right) = P\left(0 \le Q(\mathbf{X};\theta) \le -\tfrac{1}{n}\log(1-p)\right) = 1 - e^{\log(1-p)} = 1 - (1-p) = p,$$
the interval $\left[\hat{\theta} + \tfrac{1}{n}\log(1-p),\ \hat{\theta}\right]$ is a 100$p$% confidence interval for $\theta$. Similarly,
$$P\left(\tilde{\theta} + \tfrac{1}{n}\log\tfrac{1-p}{2} \le \theta \le \tilde{\theta} + \tfrac{1}{n}\log\tfrac{1+p}{2}\right) = P\left(-\tfrac{1}{n}\log\tfrac{1+p}{2} \le Q(\mathbf{X};\theta) \le -\tfrac{1}{n}\log\tfrac{1-p}{2}\right) = \left(1 - \tfrac{1-p}{2}\right) - \left(1 - \tfrac{1+p}{2}\right) = p,$$
so $\left[\hat{\theta} + \tfrac{1}{n}\log\tfrac{1-p}{2},\ \hat{\theta} + \tfrac{1}{n}\log\tfrac{1+p}{2}\right]$ is also a 100$p$% confidence interval for $\theta$. The interval $\left[\hat{\theta} + \tfrac{1}{n}\log(1-p),\ \hat{\theta}\right]$ is a better choice since it contains $\hat{\theta}$, while the interval $\left[\hat{\theta} + \tfrac{1}{n}\log\tfrac{1-p}{2},\ \hat{\theta} + \tfrac{1}{n}\log\tfrac{1+p}{2}\right]$ does not.

8:(a) If $x_1, x_2, \ldots, x_n$ is an observed random sample from the Gamma$\left(\tfrac{1}{2}, \tfrac{1}{\theta}\right)$ distribution then the likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f\left(x_i;\theta\right) = \prod_{i=1}^{n}\frac{\theta^{1/2}x_i^{-1/2}e^{-\theta x_i}}{\Gamma\!\left(\tfrac{1}{2}\right)} = \left[\Gamma\!\left(\tfrac{1}{2}\right)\right]^{-n}\left(\prod_{i=1}^{n} x_i^{-1/2}\right)\theta^{n/2}e^{-\theta t} \quad \text{where } t = \sum_{i=1}^{n} x_i,$$
or more simply $L(\theta) = \theta^{n/2}e^{-\theta t}$ for $\theta > 0$. The log likelihood function is $l(\theta) = \log L(\theta) = \frac{n}{2}\log\theta - \theta t$ for $\theta > 0$ and the score function is
$$S(\theta) = \frac{d}{d\theta}l(\theta) = \frac{n}{2\theta} - t = \frac{n - 2\theta t}{2\theta} \quad \text{for } \theta > 0,$$
so $S(\theta) = 0$ for $\theta = \frac{n}{2t}$. Since $S(\theta) > 0$ for $0 < \theta < \frac{n}{2t}$ and $S(\theta) < 0$ for $\theta > \frac{n}{2t}$, by the first derivative test $l(\theta)$ has an absolute maximum at $\theta = \frac{n}{2t}$. Thus $\hat{\theta} = \frac{n}{2t} = \frac{1}{2\bar{x}}$ is the maximum likelihood estimate of $\theta$ and $\tilde{\theta} = \frac{1}{2\overline{X}} = \frac{n}{2T}$ is the maximum likelihood estimator of $\theta$.
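The maximum likelihood estimate, observed information, and expected frequencies in Problem 6 can be reproduced numerically in R. This sketch is not part of the original solution; it maximizes the log likelihood over $(0.01, 0.49)$ with optimize rather than solving the quadratic.

freq <- c(17, 22, 7, 3, 1)                     # observed counts for 0, 1, 2, 3, 4 children
n0 <- freq[1]; s <- sum((0:4) * freq)          # 17 and 49
loglik <- function(th) n0 * log(1 - 2 * th) - n0 * log(1 - th) + s * log(th)
thetahat <- optimize(loglik, c(0.01, 0.49), maximum = TRUE)$maximum
info <- 4 * n0 / (1 - 2 * thetahat)^2 - n0 / (1 - thetahat)^2 + s / thetahat^2
c(thetahat, info)                              # approximately 0.3894 and 1666.88
ek <- c(50 * (1 - 2 * thetahat) / (1 - thetahat), 50 * thetahat^(1:4))
round(ek, 2)                                   # 18.12 19.47 7.58 2.95 1.15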
Thus 456 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS is the maximum likelihood estimator of . 8:(b) If Xi Gamma 1 1 2; ; i = 1; 2; : : : ; n independently then E (Xi ) = 1 2 and V ar (Xi ) = 1 2 2 and by the Weak Law of Large Numbers X !p 1 2 and by the Limit Theorems ~ = 1 !p 1 2X 2 21 = as required. 8:(c) By the Invariance Property of Maximum Likelihood Estimates the maximum likelihood estimate of = V ar (Xi ) = 1 2 2 is ^= 1 1 2 = n 2 2t 2^ 2 4t2 =2 2n2 = t n 2 = 2x2 8:(d) The moment generating function of Xi is 1 M (t) = t 1=2 1 n P The moment generating function of Q = 2 for t < Xi is i=1 MQ (t) = E etQ = E exp 2t n P Xi i=1 = n Q E [exp (2tXi )] = i=1 = = !n=2 1 2t 1 1 (1 n=2 2t) n Q M (2t ) i=1 for 2t < for t < 1 2 which is the moment generating function of a 2 (n) random variable. Therefore by the 2 (n). Uniqueness Theorem for Moment Generating Functions, Q 10.5. CHAPTER 6 457 To construct a 95% equal tail con…dence interval for P (Q a) = 0:025 = P (Q > b) so that we …nd a and b such that P (a < Q < b) = P (a < 2T < b) = 0:95 or a < 2T P so that a b 2t ; 2t < b 2T = 0:95 is a 95% equal tail con…dence interval for . For n = 20 we have P (Q For t = 20 P 9:59) = 0:025 = P (Q > 34:17) xi = 6 a 95% equal tail con…dence interval for is i=1 9:59 34:17 ; = [0:80; 2:85] 2(6) 2(6) Since = 0:7 is not in the 95% con…dence interval it is not a plausible value of the data. in light of 8:(e) The information function is I( )= n d S( )= 2 d 2 for >0 The expected information is n 2 2 J ( ) = E [I ( ; X1 ; : : : ; Xn )] = E and h By the Central Limit Theorem p n X 21 p1 2 = p 2n Let g(x) = 1 2 X g 0 (x) = g(a) = g !D Z 1 1 and a = 2x 2 then 1 2 n 2 2 p n =p 2~ p n ~ =p 2~ i1=2 ~ J(~) = 1 2x2 = 1 2 1 2 = for >0 1 2X N (0; 1) (10.51) 458 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS and g 0 (a) = g 0 1 2 1 = 2 1 2 2 = 2 2 By the Delta Theorem and (10.51) p or 2n 1 2X p ~ 2n 2 2Z !D !D 2 2Z 4 N 0; 4 N 0; 4 4 By Slutsky’s Theorem But if Z Since ~ !p p 2n ~ !D Z N (0; 1) 2 2 N(0; 1) then Z N(0; 1) and thus p n ~ p !D Z N (0; 1) 2 (10.52) then by the Limit Theorems ~ !p 1 (10.53) Thus by (10.52), (10.53) and Slutsky’s Theorem h = p ~ n p = 2~ i1=2 ~ J(~) p pn 2 ~ ~ !D Z =Z 1 An approximate 95% con…dence interval is given by 2 For n = 20 and t = 20 P 6 6^ 4 N (0; 1) 3 1:96 7 1:96 7 r ; ^+ r 5 J ^ J ^ xi = 6; i=1 ^= 1 2 6 20 = n 20 5 and J ^ = 2 = 3 2 53 2^ 2 and an approximate 95% con…dence interval is given by 5 3 1:96 5 1:96 p ; = [0:63; 2:70] +p 3 3:6 3:6 = 3:6 10.5. CHAPTER 6 For n = 20 and t = 459 20 P xi = 6 the relative likelihood function of is given by i=1 R( ) = L( ) = L(^) 10 6 e 5 10 e 10 3 = 3 5 10 e10 6 for >0 0.6 0.0 0.2 0.4 R(θ) 0.8 1.0 A graph of R ( ) is given in Figure 10.27.A 15% likelihood interval is found by solving 1 2 3 4 θ Figure 10.27: Relative likelihood function for Problem 8 R ( ) = 0:15. The 15% likelihood interval is [0:84; 2:91]. The exact 95% equal tail con…dence interval [0:80; 2:85], the approximate 95% con…dence interval [0:63; 2:70] ; and the 15% likelihood interval [0:84; 2:91] are all approximately of the same width. The exact con…dence interval and likelihood interval are skewed to the right while the approximate con…dence interval is symmetric about the maximum likelihood estimate ^ = 5=3. Approximate con…dence intervals are symmetric about the maximum likelihood estimate because they are based on a Normal approximation. Since n = 20 the approximation cannot be completely trusted. 
Therefore for these data the exact con…dence interval and the likelihood interval are both better interval estimates for . R (0:7) = 0:056 implies that = 0:7 is outside a 10% likelihood interval so based on the likelihood function we would conclude that = 0:7 is not a very plausible value of given the data. Previously we noted that = 0:7 is also not contained in the exact 95% con…dence interval. Note however that = 0:7 is contained in the approximate 95% con…dence interval and so based on the approximate con…dence interval we would conclude that = 0:7 is a reasonable value of given the data. Again the reason for the disagreement is because n = 20 is not large enough for the approximation to be a good one. 460 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS 10.6 Chapter 7 1: Since Xi has cumulative distribution function F (x; 1; 2) 2 1 =1 for x x 1; 1 > 0; 2 >0 the probability density function of Xi is f (x; 1; 2) d F (x; dx = = 2 1 x x 1; 2) 2 for x 1; 1 > 0; 2 >0 The likelihood function is L ( 1; 2) = n Q f (xi ; 1; 2) i=1 = n Q i=1 = 2 1 xi xi n n 2 1 2 n Q 2 if 0 < 2 1 xi ; i = 1; 2; : : : ; n and 2 >0 1 xi if 0 < x(1) and 1 >0 2 i=1 For each value of 2 the likelihood function is maximized over 1 by taking 1 to be as large as possible subject to 0 < 1 x(1) . Therefore for …xed 2 the likelihood is maximized for 1 = x(1) . Since this is true for all values of 2 the value of ( 1 ; 2 ) which maximizes L ( 1 ; 2 ) will necessarily have 1 = x(1) . To …nd the value of 2 which maximizes L x(1) ; L2 ( 2 ) = L x(1) ; = 2 n n 2 2 x(1) and its logarithm l2 ( 2 ) = log L2 ( 2 ) = n log 2 +n n Q 2 consider the function 2 1 xi for 2 >0 i=1 2 log x(1) ( 2 + 1) n P i=1 Now n d l2 ( 2 ) = l20 ( 2 ) = + n log x(1) d 2 2 n P n xi log = x(1) 2 i=1 n = t 2 = n 2t 2 n P i=1 log xi log xi 10.6. CHAPTER 7 461 where t= n P xi x(1) log i=1 Now l20 ( 2 ) = 0 for 2 = n=t. Since l20 ( 2 ) > 0 for 0 < 2 < n=t and l20 ( 2 ) < 0 for 2 > n=t therefore by the …rst derivative test l2 ( 2 ) is maximized for 2 = n=t = ^2 . Therefore L2 ( 2 ) = L x(1) ; 2 is also maximized for 2 = ^2 : Therefore the maximum likelihood estimators of 1 and 2 are given by ^1 = X(1) and ^2 = n n P log i=1 Xi X(1) 4: (a) If the events S and H are independent events then P (S \ H) = P (S)P (H) = P (S \ H) = P (S)P (H) = (1 ), etc. , The likelihood function is L( ; ) = n! ( x11 !x12 !x21 !x22 ! )x11 [ (1 )]x12 [(1 or more simply (ignoring constants with respect to L( ; ) = x11 +x12 )x21 +x22 (1 x11 +x21 (1 ) ]x21 [(1 ) (1 )]x22 and ) )x12 +x22 for 0 1, 0 1 The log likelihood is l( ; ) = (x11 + x12 ) log for 0 < < 1, 0 < + (x21 + x22 ) log(1 ) + (x11 + x21 ) log + (x12 + x22 ) log(1 <1 Since @l @ @l @ = = x11 + x12 x11 + x21 x21 + x22 x11 + x12 = 1 x12 + x22 x11 + x21 = 1 n n (x11 + x12 ) x11 + x12 n = 1 (1 ) (x11 + x21 ) x11 + x21 n = 1 (1 ) the score vector is S( ; ) = for 0 < h x11 +x12 n (1 ) < 1, 0 < x11 +x21 n (1 ) <1 Solving S( ; ) = (0; 0) gives the maximum likelihood estimates ^= x11 + x12 x11 + x21 and ^ = n n i ) 462 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS The information matrix is 2 I( ; ) = 4 2 @2l @ 2 @2l @ @ x11 +x12 2 = 4 3 @2l @ @ + 5 @2l @ 2 n (x11 +x12 ) (1 )2 0 x11 +x21 0 + 2 n (x11 +x21 ) (1 )2 3 5 4:(b) Since X11 + X12 = the number of times the event S is observed and P (S) = , then the distribution of X11 + X12 is Binomial(n; ). 
4:(b) Since $X_{11}+X_{12}$ is the number of times the event $S$ is observed and $P(S) = \alpha$, the distribution of $X_{11}+X_{12}$ is Binomial$(n,\alpha)$. Therefore $E\left(X_{11}+X_{12}\right) = n\alpha$ and
$$E\!\left[\frac{X_{11}+X_{12}}{\alpha^2} + \frac{n - \left(X_{11}+X_{12}\right)}{(1-\alpha)^2}\right] = \frac{n}{\alpha} + \frac{n}{1-\alpha} = \frac{n}{\alpha(1-\alpha)}.$$
Since $X_{11}+X_{21}$ is the number of times the event $H$ is observed and $P(H) = \beta$, the distribution of $X_{11}+X_{21}$ is Binomial$(n,\beta)$. Therefore $E\left(X_{11}+X_{21}\right) = n\beta$ and
$$E\!\left[\frac{X_{11}+X_{21}}{\beta^2} + \frac{n - \left(X_{11}+X_{21}\right)}{(1-\beta)^2}\right] = \frac{n}{\beta(1-\beta)}.$$
Therefore the expected information matrix and its inverse are
$$J(\alpha,\beta) = \begin{bmatrix} \dfrac{n}{\alpha(1-\alpha)} & 0 \\ 0 & \dfrac{n}{\beta(1-\beta)} \end{bmatrix}, \qquad
\left[J(\alpha,\beta)\right]^{-1} = \begin{bmatrix} \dfrac{\alpha(1-\alpha)}{n} & 0 \\ 0 & \dfrac{\beta(1-\beta)}{n} \end{bmatrix}.$$
Also $Var(\tilde{\alpha}) = \frac{\alpha(1-\alpha)}{n}$ and $Var(\tilde{\beta}) = \frac{\beta(1-\beta)}{n}$, so the diagonal entries of $\left[J(\alpha,\beta)\right]^{-1}$ give us the variances of the maximum likelihood estimators.

7:(a) If $Y_i \sim$ Binomial$\left(1, p_i\right)$ where $p_i = \left(1 + e^{-\alpha-\beta x_i}\right)^{-1}$ and $x_1, x_2, \ldots, x_n$ are known constants, the likelihood function for $(\alpha,\beta)$ is
$$L(\alpha,\beta) = \prod_{i=1}^{n} p_i^{y_i}\left(1 - p_i\right)^{1-y_i}$$
and the log likelihood function is
$$l(\alpha,\beta) = \log L(\alpha,\beta) = \sum_{i=1}^{n}\left[y_i\log p_i + \left(1 - y_i\right)\log\left(1 - p_i\right)\right].$$
Note that
$$\frac{\partial p_i}{\partial\alpha} = \frac{\partial}{\partial\alpha}\left(1 + e^{-\alpha-\beta x_i}\right)^{-1} = \frac{e^{-\alpha-\beta x_i}}{\left(1 + e^{-\alpha-\beta x_i}\right)^2} = p_i\left(1 - p_i\right)
\qquad\text{and}\qquad
\frac{\partial p_i}{\partial\beta} = \frac{x_i\,e^{-\alpha-\beta x_i}}{\left(1 + e^{-\alpha-\beta x_i}\right)^2} = x_i\,p_i\left(1 - p_i\right).$$
Therefore
$$\frac{\partial l}{\partial\alpha} = \sum_{i=1}^{n}\left[\frac{y_i}{p_i} - \frac{1-y_i}{1-p_i}\right]\frac{\partial p_i}{\partial\alpha} = \sum_{i=1}^{n}\left[y_i\left(1-p_i\right) - \left(1-y_i\right)p_i\right] = \sum_{i=1}^{n}\left(y_i - p_i\right)$$
and similarly
$$\frac{\partial l}{\partial\beta} = \sum_{i=1}^{n} x_i\left(y_i - p_i\right),$$
so the score vector is
$$S(\alpha,\beta) = \begin{bmatrix} \dfrac{\partial l}{\partial\alpha} \\ \dfrac{\partial l}{\partial\beta} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n}\left(y_i - p_i\right) \\ \sum_{i=1}^{n} x_i\left(y_i - p_i\right) \end{bmatrix}.$$
To obtain the expected information we first note that
$$-\frac{\partial^2 l}{\partial\alpha^2} = \sum_{i=1}^{n}\frac{\partial p_i}{\partial\alpha} = \sum_{i=1}^{n} p_i\left(1-p_i\right), \qquad
-\frac{\partial^2 l}{\partial\alpha\partial\beta} = \sum_{i=1}^{n}\frac{\partial p_i}{\partial\beta} = \sum_{i=1}^{n} x_i\,p_i\left(1-p_i\right), \qquad
-\frac{\partial^2 l}{\partial\beta^2} = \sum_{i=1}^{n} x_i\frac{\partial p_i}{\partial\beta} = \sum_{i=1}^{n} x_i^2\,p_i\left(1-p_i\right).$$
The information matrix is
$$I(\alpha,\beta) = \begin{bmatrix} \sum_{i=1}^{n} p_i\left(1-p_i\right) & \sum_{i=1}^{n} x_i\,p_i\left(1-p_i\right) \\ \sum_{i=1}^{n} x_i\,p_i\left(1-p_i\right) & \sum_{i=1}^{n} x_i^2\,p_i\left(1-p_i\right) \end{bmatrix},$$
which is a constant function of the random variables $Y_1, Y_2, \ldots, Y_n$, and therefore the expected information is $J(\alpha,\beta) = I(\alpha,\beta)$.

7:(b) The maximum likelihood estimates of $\alpha$ and $\beta$ are found by solving the equations
$$S(\alpha,\beta) = \left[\sum_{i=1}^{n}\left(y_i - p_i\right),\ \sum_{i=1}^{n} x_i\left(y_i - p_i\right)\right] = [0,\ 0],$$
which must be done numerically. Newton's method is given by
$$\left(\alpha,\beta\right)^{(i+1)} = \left(\alpha,\beta\right)^{(i)} + S\!\left(\alpha^{(i)},\beta^{(i)}\right)\left[I\!\left(\alpha^{(i)},\beta^{(i)}\right)\right]^{-1}, \qquad i = 0, 1, \ldots \text{ until convergence},$$
where $\left(\alpha^{(0)},\beta^{(0)}\right)$ is an initial estimate of $(\alpha,\beta)$.
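Newton's method for Problem 7(b) can be implemented directly in R using the score vector and information matrix derived in part (a). This sketch is not part of the original solution; the data are simulated with arbitrary values $\alpha = 0.5$, $\beta = 1.5$, $n = 200$, and the result is compared with R's glm for reference.

set.seed(9)
n <- 200
x <- runif(n, -2, 2)
pr <- 1 / (1 + exp(-0.5 - 1.5 * x))         # true alpha = 0.5, beta = 1.5
y <- rbinom(n, 1, pr)

theta <- c(0, 0)                            # starting value (alpha, beta)
for (i in 1:25) {
  p  <- 1 / (1 + exp(-theta[1] - theta[2] * x))
  sc <- c(sum(y - p), sum(x * (y - p)))                         # score vector
  info <- matrix(c(sum(p * (1 - p)),     sum(x * p * (1 - p)),
                   sum(x * p * (1 - p)), sum(x^2 * p * (1 - p))), 2, 2)
  step <- solve(info, sc)                   # I^{-1} S
  theta <- theta + step
  if (max(abs(step)) < 1e-8) break
}
theta
coef(glm(y ~ x, family = binomial))         # should agree with theta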
The 2 P (W ( 0 ; x)) where W (1) h i p = 2 1 P Z ( 0 ; x) where Z log xi = N (0; 1) = 1 then ^ = 20=25 = 0:8 the observed value 25 and H0 : i=1 of the likelihood ratio test statistic is ( 0 ; x) = 2 log R ( 0 ; X) 1 = 2 (20) 1 log 0:8 = 40 [0:25 log (1:25)] 1 0:8 = 1:074258 and p value 2 P (W 1:074258) where W (1) h i p = 2 1 P Z 1:074258 where Z N (0; 1) = 0:2999857 calculated using R. Since p the data. value > 0:1 there is no evidence against H0 : = 1 based on 4: Since = f( 1 ; 2 ) : 1 > 0; 2 > 0g which has dimension k = 2 and 0 = f( 1 ; 2 ) : 1 = 2 ; 1 > 0; 2 > 0g which has dimension q = 1 and the hypothesis is composite. From Example 6.5.2 the likelihood function for an observed random sample x1 ; x2 ; : : : ; xn from an Weibull(2; 1 ) distribution is L1 ( 1 ) = 2n 1 n 1 P x2i 2 1 i=1 exp with maximum likelihood estimate ^1 = 1 n n P i=1 for 1 >0 1=2 x2i . Similarly the likelihood function for an observed random sample y1 ; y2 ; : : : ; ym from a Weibull(2; 2 ) distribution is L2 ( 2 ) = 2m 2 m 1 P yi2 2 2 i=1 exp with maximum likelihood estimate ^2 = 1 m m P i=1 yi2 for 1=2 . 2 >0 10.7. CHAPTER 8 467 Since the samples are independent the likelihood function for ( 1 ; L ( 1; 2) = L1 ( 1 )L2 ( 2 ) for > 0; 1 2) is >0 2 and the log likelihood function l ( 1; 2) = 2n log n 1 P 1 2 1 i=1 x2i 2m log m 1 P yi2 2 2 i=1 2 for 1 > 0; 2 >0 The independence of the samples implies the maximum likelihood estimators are n 1 P X2 n i=1 i ~1 = Therefore l(~1 ; ~2 ; X; Y) = If 1 = 2 = n log 1=2 n 1 P X2 n i=1 i d l( ) = d and 2 i=1 max 1 ; 2 )2 ( 2 (n + m) x2i + 3 n P 1 n+m i=1 and therefore max 1 ; 2 )2 l( 1 ; 2 ; X; Y) = (n + m) log 0 m P yi2 for 2 ; X; Y) >0 we note that i=1 x2i + m P i=1 yi2 1=2 yi2 1 n+m The likelihood ratio test statistic is (X; Y; i=1 l( 1 ; i=1 x2i + m P (n + m) 0 n P 2 + l ( ) = 0 for = ( n P 1 2 (n + m) log which is only a function of . To determine d d m 1 P Y2 m i=1 i m log then the log likelihood function is l( ) = 1=2 m 1 P Y2 m i=1 i ~2 = n P i=1 Xi2 + m P i=1 Yi2 0) = 2 l(~1 ; ~2 ; X; Y) ( max 1 ; 2 )2 l( 1 ; 2 ; X; Y) 0 n m 1 P 1 P Xi2 m log Y2 (n + m) n i=1 m i=1 i n m P P 1 + (n + m) log Xi2 + Yi2 + (n + m) ] n + m i=1 i=1 n m P P 1 = 2[ (n + m) log Xi2 + Yi2 n + m i=1 i=1 n m 1 P 1 P n log Xi2 m log Y2 ] n i=1 m i=1 i = 2[ n log (n + m) 468 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS with corresponding observed value (x; y; 0) = 2[ (n + m) log n log Since k q=2 1=1 p-value 1 n+m n 1 P x2 n i=1 i n P i=1 m log x2i + m P i=1 yi2 m 1 P y2 ] m i=1 i 2 P [W (x; y; 0 )] where W (1) h i p = 2 1 P Z (x; y; 0 ) where Z N (0; 1) 11. Summary of Named Distributions 469 Summary of Discrete Distributions Notation and Parameters Function fx Discrete Uniforma, b b≥a a, b integers x a, a 1, … , b xr N−r n−x N 1, 2, … Nn r 0, 1, … , N Binomialn, p 0 ≤ p ≤ 1, q 1 − p n 1, 2, … x max 0, n − N r, nx p x q n−x x 0, 1, … , n p x q 1−x 0 ≤ p ≤ 1, q 1 − p x 0, 1 Negative Binomialk, p k x xk−1 x p q −k x k p −q k 1, 2, … x 0, 1, … Geometricp pq x 0 p ≤ 1, q 1 − p x 0, 1, … Poisson x e − x! ≥0 x 0, 1, … Multinomialn; p 1 , p 2 , … , p k 0 ≤ pi ≤ 1 i 1, 2, … , k k and ∑ p i 1 i1 Variance Generating EX VarX Function b−a1 2 −1 12 ab 2 1 b−a1 b ∑ e tx xa t∈ nr N nr N 1 − r N N−n N−1 Not tractable … , minr, n Bernoullip 0 p ≤ 1, q 1 − p Mean Mt 1 b−a1 HypergeometricN, r, n n 0, 1, … , N Moment Probability x np npq pe t q n p pq pe t q kq p kq p2 q p q p2 fx 1 , x 2 , … , x k n! x 1 !x 2 !x k ! 
11. Summary of Named Distributions

Summary of Discrete Distributions. For each distribution the probability function $f(x)$, mean $E(X)$, variance $Var(X)$ and moment generating function $M(t)$ are listed, with $q = 1 - p$ where applicable.

Discrete Uniform$(a,b)$, $a \le b$ integers: $f(x) = \frac{1}{b-a+1}$, $x = a, a+1, \ldots, b$; $E(X) = \frac{a+b}{2}$; $Var(X) = \frac{(b-a+1)^2 - 1}{12}$; $M(t) = \frac{1}{b-a+1}\sum_{x=a}^{b} e^{tx}$, $t \in \Re$.

Hypergeometric$(N,r,n)$, $n = 0,1,\ldots,N$, $r = 0,1,\ldots,N$: $f(x) = \dfrac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$, $x = \max(0, n-N+r), \ldots, \min(r,n)$; $E(X) = \frac{nr}{N}$; $Var(X) = \frac{nr}{N}\left(1 - \frac{r}{N}\right)\frac{N-n}{N-1}$; $M(t)$ not tractable.

Binomial$(n,p)$, $0 \le p \le 1$, $n = 1,2,\ldots$: $f(x) = \binom{n}{x}p^x q^{n-x}$, $x = 0,1,\ldots,n$; $E(X) = np$; $Var(X) = npq$; $M(t) = \left(pe^t + q\right)^n$, $t \in \Re$.

Bernoulli$(p)$, $0 \le p \le 1$: $f(x) = p^x q^{1-x}$, $x = 0,1$; $E(X) = p$; $Var(X) = pq$; $M(t) = pe^t + q$, $t \in \Re$.

Negative Binomial$(k,p)$, $0 < p \le 1$, $k = 1,2,\ldots$: $f(x) = \binom{x+k-1}{x}p^k q^x = \binom{-k}{x}p^k(-q)^x$, $x = 0,1,\ldots$; $E(X) = \frac{kq}{p}$; $Var(X) = \frac{kq}{p^2}$; $M(t) = \left(\frac{p}{1-qe^t}\right)^k$, $t < -\ln q$.

Geometric$(p)$, $0 < p \le 1$: $f(x) = pq^x$, $x = 0,1,\ldots$; $E(X) = \frac{q}{p}$; $Var(X) = \frac{q}{p^2}$; $M(t) = \frac{p}{1-qe^t}$, $t < -\ln q$.

Poisson$(\theta)$, $\theta \ge 0$: $f(x) = \frac{\theta^x e^{-\theta}}{x!}$, $x = 0,1,\ldots$; $E(X) = \theta$; $Var(X) = \theta$; $M(t) = e^{\theta\left(e^t - 1\right)}$, $t \in \Re$.

Multinomial$\left(n; p_1, p_2, \ldots, p_k\right)$, $0 \le p_i \le 1$, $\sum_{i=1}^{k} p_i = 1$: $f\left(x_1,\ldots,x_k\right) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$, $x_i = 0,1,\ldots,n$ with $\sum_{i=1}^{k} x_i = n$; $E\left(X_i\right) = np_i$ and $Var\left(X_i\right) = np_i\left(1-p_i\right)$, $i = 1,\ldots,k$; $M\left(t_1,\ldots,t_{k-1}\right) = \left(p_1e^{t_1} + \cdots + p_{k-1}e^{t_{k-1}} + p_k\right)^n$, $t_i \in \Re$.

Summary of Continuous Distributions. Here $\Gamma(a) = \int_0^\infty x^{a-1}e^{-x}\,dx$, and "DNE" means the quantity does not exist.

Uniform$(a,b)$, $a < b$: $f(x) = \frac{1}{b-a}$, $a < x < b$; $E(X) = \frac{a+b}{2}$; $Var(X) = \frac{(b-a)^2}{12}$; $M(t) = \frac{e^{bt}-e^{at}}{(b-a)t}$ for $t \ne 0$, $M(0) = 1$.

Beta$(a,b)$, $a > 0$, $b > 0$: $f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}$, $0 < x < 1$; $E(X) = \frac{a}{a+b}$; $Var(X) = \frac{ab}{(a+b)^2(a+b+1)}$; $M(t) = 1 + \sum_{k=1}^{\infty}\left(\prod_{i=0}^{k-1}\frac{a+i}{a+b+i}\right)\frac{t^k}{k!}$, $t \in \Re$.

N$\left(\mu,\sigma^2\right)$, $\mu \in \Re$, $\sigma^2 > 0$: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}$, $x \in \Re$; $E(X) = \mu$; $Var(X) = \sigma^2$; $M(t) = e^{\mu t + \sigma^2 t^2/2}$, $t \in \Re$.

Lognormal$\left(\mu,\sigma^2\right)$, $\mu \in \Re$, $\sigma^2 > 0$: $f(x) = \frac{1}{x\sigma\sqrt{2\pi}}e^{-(\log x-\mu)^2/(2\sigma^2)}$, $x > 0$; $E(X) = e^{\mu+\sigma^2/2}$; $Var(X) = e^{2\mu+\sigma^2}\left(e^{\sigma^2}-1\right)$; $M(t)$ DNE.

Exponential$(\theta)$, $\theta > 0$: $f(x) = \frac{1}{\theta}e^{-x/\theta}$, $x \ge 0$; $E(X) = \theta$; $Var(X) = \theta^2$; $M(t) = \frac{1}{1-\theta t}$, $t < \frac{1}{\theta}$.

Two-Parameter Exponential$(\mu,\theta)$, $\mu \in \Re$, $\theta > 0$: $f(x) = \frac{1}{\theta}e^{-(x-\mu)/\theta}$, $x \ge \mu$; $E(X) = \mu + \theta$; $Var(X) = \theta^2$; $M(t) = \frac{e^{\mu t}}{1-\theta t}$, $t < \frac{1}{\theta}$.

Double Exponential$(\mu,\theta)$, $\mu \in \Re$, $\theta > 0$: $f(x) = \frac{1}{2\theta}e^{-|x-\mu|/\theta}$, $x \in \Re$; $E(X) = \mu$; $Var(X) = 2\theta^2$; $M(t) = \frac{e^{\mu t}}{1-\theta^2t^2}$, $|t| < \frac{1}{\theta}$.

Extreme Value$(\mu,\theta)$, $\mu \in \Re$, $\theta > 0$: $f(x) = \frac{1}{\theta}\exp\!\left[\frac{x-\mu}{\theta} - e^{(x-\mu)/\theta}\right]$, $x \in \Re$; $E(X) = \mu - \theta\gamma$ where $\gamma \approx 0.5772$ is Euler's constant; $Var(X) = \frac{\pi^2\theta^2}{6}$; $M(t) = e^{\mu t}\Gamma(1+\theta t)$, $t > -\frac{1}{\theta}$.

Gamma$(\alpha,\beta)$, $\alpha > 0$, $\beta > 0$: $f(x) = \frac{x^{\alpha-1}e^{-x/\beta}}{\Gamma(\alpha)\beta^{\alpha}}$, $x > 0$; $E(X) = \alpha\beta$; $Var(X) = \alpha\beta^2$; $M(t) = (1-\beta t)^{-\alpha}$, $t < \frac{1}{\beta}$.

Inverse Gamma$(\alpha,\beta)$, $\alpha > 0$, $\beta > 0$: $f(x) = \frac{x^{-\alpha-1}e^{-1/(\beta x)}}{\Gamma(\alpha)\beta^{\alpha}}$, $x > 0$; $E(X) = \frac{1}{\beta(\alpha-1)}$ for $\alpha > 1$; $Var(X) = \frac{1}{\beta^2(\alpha-1)^2(\alpha-2)}$ for $\alpha > 2$; $M(t)$ DNE.

$\chi^2(k)$, $k = 1,2,\ldots$: $f(x) = \frac{x^{k/2-1}e^{-x/2}}{2^{k/2}\Gamma(k/2)}$, $x > 0$; $E(X) = k$; $Var(X) = 2k$; $M(t) = (1-2t)^{-k/2}$, $t < \frac{1}{2}$.

Weibull$(\beta,\theta)$, $\beta > 0$, $\theta > 0$: $f(x) = \frac{\beta x^{\beta-1}}{\theta^{\beta}}e^{-(x/\theta)^{\beta}}$, $x > 0$; $E(X) = \theta\,\Gamma\!\left(1+\tfrac{1}{\beta}\right)$; $Var(X) = \theta^2\left[\Gamma\!\left(1+\tfrac{2}{\beta}\right) - \Gamma^2\!\left(1+\tfrac{1}{\beta}\right)\right]$; $M(t)$ not tractable.

Pareto$(\alpha,\beta)$, $\alpha > 0$, $\beta > 0$: $f(x) = \frac{\beta\alpha^{\beta}}{x^{\beta+1}}$, $x \ge \alpha$; $E(X) = \frac{\alpha\beta}{\beta-1}$ for $\beta > 1$; $Var(X) = \frac{\alpha^2\beta}{(\beta-1)^2(\beta-2)}$ for $\beta > 2$; $M(t)$ DNE.

Logistic$(\mu,\theta)$, $\mu \in \Re$, $\theta > 0$: $f(x) = \frac{e^{-(x-\mu)/\theta}}{\theta\left[1+e^{-(x-\mu)/\theta}\right]^2}$, $x \in \Re$; $E(X) = \mu$; $Var(X) = \frac{\pi^2\theta^2}{3}$; $M(t) = e^{\mu t}\,\Gamma(1-\theta t)\Gamma(1+\theta t)$, $|t| < \frac{1}{\theta}$.

Cauchy$(\theta,\beta)$, $\theta \in \Re$, $\beta > 0$: $f(x) = \frac{1}{\pi\beta\left[1+\left(\frac{x-\theta}{\beta}\right)^2\right]}$, $x \in \Re$; $E(X)$, $Var(X)$ and $M(t)$ DNE.

Student $t(k)$, $k = 1,2,\ldots$: $f(x) = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\!\left(\frac{k}{2}\right)}\left(1+\frac{x^2}{k}\right)^{-(k+1)/2}$, $x \in \Re$; $E(X) = 0$ for $k = 2,3,\ldots$ (DNE for $k=1$); $Var(X) = \frac{k}{k-2}$ for $k = 3,4,\ldots$ (DNE otherwise); $M(t)$ DNE.

F$\left(k_1,k_2\right)$, $k_1, k_2 = 1,2,\ldots$: $f(x) = \frac{\Gamma\!\left(\frac{k_1+k_2}{2}\right)}{\Gamma\!\left(\frac{k_1}{2}\right)\Gamma\!\left(\frac{k_2}{2}\right)}\left(\frac{k_1}{k_2}\right)^{k_1/2}\frac{x^{k_1/2-1}}{\left(1+\frac{k_1x}{k_2}\right)^{(k_1+k_2)/2}}$, $x > 0$; $E(X) = \frac{k_2}{k_2-2}$ for $k_2 \ge 3$; $Var(X) = \frac{2k_2^2\left(k_1+k_2-2\right)}{k_1\left(k_2-2\right)^2\left(k_2-4\right)}$ for $k_2 \ge 5$; $M(t)$ DNE.

BVN$(\boldsymbol{\mu},\Sigma)$: $\mathbf{X} = \left(X_1,X_2\right)^T$, $\boldsymbol{\mu} = \left(\mu_1,\mu_2\right)^T$ with $\mu_1,\mu_2 \in \Re$, and $\Sigma = \begin{bmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{bmatrix}$ with $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$:
$$f\left(x_1,x_2\right) = \frac{1}{2\pi|\Sigma|^{1/2}}\exp\!\left[-\tfrac{1}{2}\left(\mathbf{x}-\boldsymbol{\mu}\right)^T\Sigma^{-1}\left(\mathbf{x}-\boldsymbol{\mu}\right)\right], \quad x_1, x_2 \in \Re;$$
$E(\mathbf{X}) = \boldsymbol{\mu}$, $Var(\mathbf{X}) = \Sigma$; $M\left(t_1,t_2\right) = \exp\!\left(\boldsymbol{\mu}^T\mathbf{t} + \tfrac{1}{2}\mathbf{t}^T\Sigma\,\mathbf{t}\right)$, $t_1, t_2 \in \Re$.
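A cautionary note on using this summary with R (this remark and code are not part of the original notes): the Geometric$(p)$ and Negative Binomial$(k,p)$ entries above count failures before the required successes, which matches R's dgeom and dnbinom, but the Exponential$(\theta)$ entry is parameterized by its mean $\theta$ while R's rexp and dexp use the rate $1/\theta$.

p <- 0.3; k <- 5
all.equal(dgeom(0:10, p), p * (1 - p)^(0:10))                        # TRUE
all.equal(dnbinom(0:10, size = k, prob = p),
          choose(0:10 + k - 1, 0:10) * p^k * (1 - p)^(0:10))         # TRUE
theta <- 2
mean(rexp(1e5, rate = 1 / theta))                                    # approximately theta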
Distribution Tables 473 N(0,1) Cumulative Distribution Function This table gives values of F(x) = P(X ≤ x) for X ~ N(0,1) and x ≥ 0 x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 0.00 0.50000 0.53983 0.57926 0.61791 0.65542 0.69146 0.72575 0.75804 0.78814 0.81594 0.84134 0.86433 0.88493 0.90320 0.91924 0.93319 0.94520 0.95543 0.96407 0.97128 0.97725 0.98214 0.98610 0.98928 0.99180 0.99379 0.99534 0.99653 0.99744 0.99813 0.99865 0.99903 0.99931 0.99952 0.99966 0.99977 0.01 0.50399 0.54380 0.58317 0.62172 0.65910 0.69497 0.72907 0.76115 0.79103 0.81859 0.84375 0.86650 0.88686 0.90490 0.92073 0.93448 0.94630 0.95637 0.96485 0.97193 0.97778 0.98257 0.98645 0.98956 0.99202 0.99396 0.99547 0.99664 0.99752 0.99819 0.99869 0.99906 0.99934 0.99953 0.99968 0.99978 0.02 0.50798 0.54776 0.58706 0.62552 0.66276 0.69847 0.73237 0.76424 0.79389 0.82121 0.84614 0.86864 0.88877 0.90658 0.92220 0.93574 0.94738 0.95728 0.96562 0.97257 0.97831 0.98300 0.98679 0.98983 0.99224 0.99413 0.99560 0.99674 0.99760 0.99825 0.99874 0.99910 0.99936 0.99955 0.99969 0.99978 N(0,1) Quantiles: p 0.5 0.6 0.7 0.8 0.9 0.00 0.0000 0.2533 0.5244 0.8416 1.2816 0.01 0.0251 0.2793 0.5534 0.8779 1.3408 0.02 0.0502 0.3055 0.5828 0.9154 1.4051 0.03 0.51197 0.55172 0.59095 0.62930 0.66640 0.70194 0.73565 0.76730 0.79673 0.82381 0.84849 0.87076 0.89065 0.90824 0.92364 0.93699 0.94845 0.95818 0.96638 0.97320 0.97882 0.98341 0.98713 0.99010 0.99245 0.99430 0.99573 0.99683 0.99767 0.99831 0.99878 0.99913 0.99938 0.99957 0.99970 0.99979 0.04 0.51595 0.55567 0.59483 0.63307 0.67003 0.70540 0.73891 0.77035 0.79955 0.82639 0.85083 0.87286 0.89251 0.90988 0.92507 0.93822 0.94950 0.95907 0.96712 0.97381 0.97932 0.98382 0.98745 0.99036 0.99266 0.99446 0.99585 0.99693 0.99774 0.99836 0.99882 0.99916 0.99940 0.99958 0.99971 0.99980 0.05 0.51994 0.55962 0.59871 0.63683 0.67364 0.70884 0.74215 0.77337 0.80234 0.82894 0.85314 0.87493 0.89435 0.91149 0.92647 0.93943 0.95053 0.95994 0.96784 0.97441 0.97982 0.98422 0.98778 0.99061 0.99286 0.99461 0.99598 0.99702 0.99781 0.99841 0.99886 0.99918 0.99942 0.99960 0.99972 0.99981 0.06 0.52392 0.56356 0.60257 0.64058 0.67724 0.71226 0.74537 0.77637 0.80511 0.83147 0.85543 0.87698 0.89617 0.91309 0.92785 0.94062 0.95154 0.96080 0.96856 0.97500 0.98030 0.98461 0.98809 0.99086 0.99305 0.99477 0.99609 0.99711 0.99788 0.99846 0.99889 0.99921 0.99944 0.99961 0.99973 0.99981 0.07 0.52790 0.56749 0.60642 0.64431 0.68082 0.71566 0.74857 0.77935 0.80785 0.83398 0.85769 0.87900 0.89796 0.91466 0.92922 0.94179 0.95254 0.96164 0.96926 0.97558 0.98077 0.98500 0.98840 0.99111 0.99324 0.99492 0.99621 0.99720 0.99795 0.99851 0.99893 0.99924 0.99946 0.99962 0.99974 0.99982 0.08 0.53188 0.57142 0.61026 0.64803 0.68439 0.71904 0.75175 0.78230 0.81057 0.83646 0.85993 0.88100 0.89973 0.91621 0.93056 0.94295 0.95352 0.96246 0.96995 0.97615 0.98124 0.98537 0.98870 0.99134 0.99343 0.99506 0.99632 0.99728 0.99801 0.99856 0.99896 0.99926 0.99948 0.99964 0.99975 0.99983 0.09 0.53586 0.57535 0.61409 0.65173 0.68793 0.72240 0.75490 0.78524 0.81327 0.83891 0.86214 0.88298 0.90147 0.91774 0.93189 0.94408 0.95449 0.96327 0.97062 0.97670 0.98169 0.98574 0.98899 0.99158 0.99361 0.99520 0.99643 0.99736 0.99807 0.99861 0.99900 0.99929 0.99950 0.99965 0.99976 0.99983 This table gives values of F-1(p) for p ≥ 0.5 0.03 0.0753 0.3319 0.6128 0.9542 1.4758 0.04 0.1004 0.3585 0.6433 0.9945 1.5548 0.05 0.1257 0.3853 0.6745 1.0364 1.6449 0.06 0.1510 0.4125 
0.7063 1.0803 1.7507 0.07 0.1764 0.4399 0.7388 1.1264 1.8808 0.075 0.1891 0.4538 0.7554 1.1503 1.9600 0.08 0.2019 0.4677 0.7722 1.1750 2.0537 0.09 0.2275 0.4959 0.8064 1.2265 2.3263 0.095 0.2404 0.5101 0.8239 1.2536 2.5758 Chi‐Squared Quantiles This table gives values of x for p = P(X ≤ x) = F(x) df\p 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 40 50 60 70 80 90 100 0.005 0.000 0.010 0.072 0.207 0.412 0.676 0.989 1.344 1.735 2.156 2.603 3.074 3.565 4.075 4.601 5.142 5.697 6.265 6.844 7.434 8.034 8.643 9.260 9.886 10.520 11.160 11.808 12.461 13.121 13.787 20.707 27.991 35.534 43.275 51.172 59.196 67.328 0.01 0.000 0.020 0.115 0.297 0.554 0.872 1.239 1.647 2.088 2.558 3.054 3.571 4.107 4.660 5.229 5.812 6.408 7.015 7.633 8.260 8.897 9.542 10.196 10.856 11.524 12.198 12.879 13.565 14.256 14.953 22.164 29.707 37.485 45.442 53.540 61.754 70.065 0.025 0.001 0.051 0.216 0.484 0.831 1.237 1.690 2.180 2.700 3.247 3.816 4.404 5.009 5.629 6.262 6.908 7.564 8.231 8.907 9.591 10.283 10.982 11.689 12.401 13.120 13.844 14.573 15.308 16.047 16.791 24.433 32.357 40.482 48.758 57.153 65.647 74.222 0.05 0.004 0.103 0.352 0.711 1.146 1.635 2.167 2.733 3.325 3.940 4.575 5.226 5.892 6.571 7.261 7.962 8.672 9.391 10.117 10.851 11.591 12.338 13.091 13.848 14.611 15.379 16.151 16.928 17.708 18.493 26.509 34.764 43.188 51.739 60.391 69.126 77.929 0.1 0.016 0.211 0.584 1.064 1.610 2.204 2.833 3.490 4.168 4.865 5.578 6.304 7.042 7.790 8.547 9.312 10.085 10.865 11.651 12.443 13.240 14.041 14.848 15.659 16.473 17.292 18.114 18.939 19.768 20.599 29.051 37.689 46.459 55.329 64.278 73.291 82.358 0.9 2.706 4.605 6.251 7.779 9.236 10.645 12.017 13.362 14.684 15.987 17.275 18.549 19.812 21.064 22.307 23.542 24.769 25.989 27.204 28.412 29.615 30.813 32.007 33.196 34.382 35.563 36.741 37.916 39.087 40.256 51.805 63.167 74.397 85.527 96.578 107.570 118.500 0.95 3.842 5.992 7.815 9.488 11.070 12.592 14.067 15.507 16.919 18.307 19.675 21.026 22.362 23.685 24.996 26.296 27.587 28.869 30.144 31.410 32.671 33.924 35.172 36.415 37.652 38.885 40.113 41.337 42.557 43.773 55.758 67.505 79.082 90.531 101.880 113.150 124.340 0.975 5.024 7.378 9.348 11.143 12.833 14.449 16.013 17.535 19.023 20.483 21.920 23.337 24.736 26.119 27.488 28.845 30.191 31.526 32.852 34.170 35.479 36.781 38.076 39.364 40.646 41.923 43.195 44.461 45.722 46.979 59.342 71.420 83.298 95.023 106.630 118.140 129.560 0.99 6.635 9.210 11.345 13.277 15.086 16.812 18.475 20.090 21.666 23.209 24.725 26.217 27.688 29.141 30.578 32.000 33.409 34.805 36.191 37.566 38.932 40.289 41.638 42.980 44.314 45.642 46.963 48.278 49.588 50.892 63.691 76.154 88.379 100.430 112.330 124.120 135.810 0.995 7.879 10.597 12.838 14.860 16.750 18.548 20.278 21.955 23.589 25.188 26.757 28.300 29.819 31.319 32.801 34.267 35.718 37.156 38.582 39.997 41.401 42.796 44.181 45.559 46.928 48.290 49.645 50.993 52.336 53.672 66.766 79.490 91.952 104.210 116.320 128.300 140.170 Student t Quantiles This table gives values of x for p = P(X ≤ x ) = F (x ), for p ≥ 0.6 df \ p 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 40 50 60 70 80 90 100 >100 0.6 0.3249 0.2887 0.2767 0.2707 0.2672 0.2648 0.2632 0.2619 0.2610 0.2602 0.2596 0.2590 0.2586 0.2582 0.2579 0.2576 0.2573 0.2571 0.2569 0.2567 0.2566 0.2564 0.2563 0.2562 0.2561 0.2560 0.2559 0.2558 0.2557 0.2556 0.2550 0.2547 0.2545 0.2543 0.2542 0.2541 0.2540 0.2535 0.7 0.7265 0.6172 0.5844 0.5686 0.5594 0.5534 0.5491 0.5459 0.5435 0.5415 0.5399 0.5386 0.5375 0.5366 0.5357 0.5350 0.5344 0.5338 
0.5333 0.5329 0.5325 0.5321 0.5317 0.5314 0.5312 0.5309 0.5306 0.5304 0.5302 0.5300 0.5286 0.5278 0.5272 0.5268 0.5265 0.5263 0.5261 0.5247 0.8 1.3764 1.0607 0.9785 0.9410 0.9195 0.9057 0.8960 0.8889 0.8834 0.8791 0.8755 0.8726 0.8702 0.8681 0.8662 0.8647 0.8633 0.8620 0.8610 0.8600 0.8591 0.8583 0.8575 0.8569 0.8562 0.8557 0.8551 0.8546 0.8542 0.8538 0.8507 0.8489 0.8477 0.8468 0.8461 0.8456 0.8452 0.8423 0.9 3.0777 1.8856 1.6377 1.5332 1.4759 1.4398 1.4149 1.3968 1.3830 1.3722 1.3634 1.3562 1.3502 1.3450 1.3406 1.3368 1.3334 1.3304 1.3277 1.3253 1.3232 1.3212 1.3195 1.3178 1.3163 1.3150 1.3137 1.3125 1.3114 1.3104 1.3031 1.2987 1.2958 1.2938 1.2922 1.2910 1.2901 1.2832 0.95 6.3138 2.9200 2.3534 2.1318 2.0150 1.9432 1.8946 1.8595 1.8331 1.8125 1.7959 1.7823 1.7709 1.7613 1.7531 1.7459 1.7396 1.7341 1.7291 1.7247 1.7207 1.7171 1.7139 1.7109 1.7081 1.7056 1.7033 1.7011 1.6991 1.6973 1.6839 1.6759 1.6706 1.6669 1.6641 1.6620 1.6602 1.6479 0.975 12.7062 4.3027 3.1824 2.7764 2.5706 2.4469 2.3646 2.3060 2.2622 2.2281 2.2010 2.1788 2.1604 2.1448 2.1314 2.1199 2.1098 2.1009 2.0930 2.0860 2.0796 2.0739 2.0687 2.0639 2.0595 2.0555 2.0518 2.0484 2.0452 2.0423 2.0211 2.0086 2.0003 1.9944 1.9901 1.9867 1.9840 1.9647 0.99 31.8205 6.9646 4.5407 3.7469 3.3649 3.1427 2.9980 2.8965 2.8214 2.7638 2.7181 2.6810 2.6503 2.6245 2.6025 2.5835 2.5669 2.5524 2.5395 2.5280 2.5176 2.5083 2.4999 2.4922 2.4851 2.4786 2.4727 2.4671 2.4620 2.4573 2.4233 2.4033 2.3901 2.3808 2.3739 2.3685 2.3642 2.3338 0.995 0.999 0.9995 63.6567 318.3088 636.6192 9.9248 22.3271 31.5991 5.8409 10.2145 12.9240 4.6041 7.1732 8.6103 4.0321 5.8934 6.8688 3.7074 5.2076 5.9588 3.4995 4.7853 5.4079 3.3554 4.5008 5.0413 3.2498 4.2968 4.7809 3.1693 4.1437 4.5869 3.1058 4.0247 4.4370 3.0545 3.9296 4.3178 3.0123 3.8520 4.2208 2.9768 3.7874 4.1405 2.9467 3.7328 4.0728 2.9208 3.6862 4.0150 2.8982 3.6458 3.9651 2.8784 3.6105 3.9216 2.8609 3.5794 3.8834 2.8453 3.5518 3.8495 2.8314 3.5272 3.8193 2.8188 3.5050 3.7921 2.8073 3.4850 3.7676 2.7969 3.4668 3.7454 2.7874 3.4502 3.7251 2.7787 3.4350 3.7066 2.7707 3.4210 3.6896 2.7633 3.4082 3.6739 2.7564 3.3962 3.6594 2.7500 3.3852 3.6460 2.7045 3.3069 3.5510 2.6778 3.2614 3.4960 2.6603 3.2317 3.4602 2.6479 3.2108 3.4350 2.6387 3.1953 3.4163 2.6316 3.1833 3.4019 2.6259 3.1737 3.3905 2.5857 3.1066 3.3101