STAT 350 - An Introduction to Statistics
Named Continuous Distributions
Jeremy Troisi

1 Uniform Distribution - $X \sim U(a, b)$

Probability is uniform, or the same, over an interval from $a$ to $b$: $X \sim U(a, b)$, $a < b$, where $a$ is the beginning of the interval and $b$ is the end of the interval. The Uniform Distribution derives 'naturally' from Poisson Processes, and how it does so will be covered in the Poisson Process notes. For these Named Continuous Distribution notes, we will simply discuss its various properties.

1.1 Probability Density Function (PDF) - $f_X(x) = \frac{1}{b-a} : a < x < b$

$$f_X(x) = \begin{cases} \frac{1}{b-a} & a < x < b \\ 0 & \text{else} \end{cases}$$

1.1.1 Rules

1. $a < b \Rightarrow \frac{1}{b-a} > 0$ for $a < x < b$, and $0 \geq 0$ else $\Rightarrow f_X(x) \geq 0$ for all ($\forall$) $x$.

2. $\lim_{n \uparrow \infty} \int_{-n}^{n} f_X(x)\,dx = \lim_{n \uparrow \infty} \left[ \int_{-n}^{a} 0\,dx + \int_{a}^{b} \frac{1}{b-a}\,dx + \int_{b}^{n} 0\,dx \right] = 0 + \frac{1}{b-a}[x]_a^b + 0 = \frac{b-a}{b-a} = 1$

Therefore, $f_X(x)$ is a valid PDF.

1.2 Cumulative Distribution Function (CDF) - $F_X(x) = \frac{x-a}{b-a} : a < x < b$

$$F_X(x) = P(X \leq x) = \int_{-\infty}^{x} f_X(t)\,dt = \int_{-\infty}^{a} 0\,dt + \int_{a}^{x} \frac{1}{b-a}\,dt = 0 + \frac{1}{b-a}[t]_a^x = \frac{x-a}{b-a}$$

$$\Rightarrow F_X(x) = \begin{cases} 0 & x \leq a \\ \frac{x-a}{b-a} & a < x < b \\ 1 & x \geq b \end{cases}$$

Thus, geometrically, we take the $(x-a)$ portion of the entire $(b-a)$ length of the interval (or rectangle).

1.3 Percentile, $p$: $x^* = a(1-p) + bp$

For $0 < p < 1$:

$$F_X(x^*) = p \Rightarrow \frac{x^*-a}{b-a} = p \Rightarrow x^* - a = p(b-a) \Rightarrow x^* = p(b-a) + a = a(1-p) + bp$$

Thus, the $p$ percentile of a Uniform Distribution takes proportion $p$ of the right endpoint value $b$ and the remaining proportion $(1-p)$ of the left endpoint value $a$.

1.4 Probability - $S_X(x) = \frac{b-x}{b-a} : a < x < b$

$$S_X(x) = P(X > x) = 1 - P((X > x)^C) = 1 - P(X \leq x) = 1 - F_X(x) = \begin{cases} 1 & x \leq a \\ \frac{b-x}{b-a} & a < x < b \\ 0 & x \geq b \end{cases}$$

Thus, geometrically, we take the $(b-x)$ portion of the entire $(b-a)$ length of the interval (or rectangle).

AND

$$x_1 < x_2 \Rightarrow P(x_1 < X < x_2) = F_X(x_2) - F_X(x_1) = \begin{cases} 0 & x_1 < x_2 \leq a < b \\ \frac{x_2-a}{b-a} & x_1 \leq a < x_2 < b \\ \frac{x_2-x_1}{b-a} & a < x_1 < x_2 < b \\ \frac{b-x_1}{b-a} & a < x_1 < b \leq x_2 \\ 0 & a < b \leq x_1 < x_2 \end{cases}$$

Thus, geometrically, we take the $(x_2-x_1)$ portion of the entire $(b-a)$ length of the interval (or rectangle).

1.5 Mean/Expected Value - $\mu = E[X] = \frac{a+b}{2}$

$$\mu = E[X] = \lim_{n \uparrow \infty} \int_{-n}^{n} x f_X(x)\,dx = \lim_{n \uparrow \infty} \left[ \int_{-n}^{a} x(0)\,dx + \int_{a}^{b} \frac{x}{b-a}\,dx + \int_{b}^{n} x(0)\,dx \right] = 0 + \frac{1}{2(b-a)}[x^2]_a^b + 0 = \frac{b^2 - a^2}{2(b-a)} = \frac{(b-a)(b+a)}{2(b-a)} = \frac{a+b}{2}$$

An expected value is a 'center of gravity' from physics: the mean is the location that will support the weight of our density and prevent it from toppling over in one direction or the other. Since the weight is distributed uniformly, or even more generally, symmetrically (so long as the mean exists; see the Cauchy Distribution for a symmetric distribution that does not possess a mean), this place is the midpoint of $a$ and $b$, namely $\frac{a+b}{2}$, and we have constructed a perfect 'teeter-totter'.

Keep in mind, this value will always need to be at least the smallest possible value and at most the largest possible value. Beyond that, some basic logic can narrow down further whether your answer is plausible or has an error somewhere:

$$a < \frac{a+b}{2} = E[X] < b$$
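The properties above map directly onto base R's uniform-distribution functions (dunif, punif, qunif, runif). A minimal sketch; the endpoints $a = 2$, $b = 10$ and the evaluation points are arbitrary illustration values, not from the notes:

    # Illustrative endpoints (arbitrary choices, not from the notes)
    a <- 2; b <- 10

    # PDF: f_X(x) = 1/(b - a) on (a, b); here 1/8 = 0.125
    dunif(5, min = a, max = b)

    # CDF: F_X(x) = (x - a)/(b - a); here (5 - 2)/(10 - 2) = 0.375
    punif(5, min = a, max = b)

    # Percentile: x* = a(1 - p) + b*p; here 2(0.25) + 10(0.75) = 8
    p <- 0.75
    qunif(p, min = a, max = b)

    # Mean and variance of simulated draws agree with (a + b)/2 = 6
    # and (b - a)^2/12 = 5.333..., up to simulation noise
    x <- runif(1e6, min = a, max = b)
    mean(x); var(x)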
1.6 Variance - $\sigma^2 = Var[X] = \frac{(b-a)^2}{12}$

$$\sigma^2 = Var[X] = E[X^2] - \mu^2 = \lim_{n \uparrow \infty} \int_{-n}^{n} x^2 f_X(x)\,dx - \left(\frac{a+b}{2}\right)^2 = \left(0 + \frac{1}{3(b-a)}[x^3]_a^b + 0\right) - \frac{a^2+2ab+b^2}{4} = \frac{b^3-a^3}{3(b-a)} - \frac{(a+b)^2}{4}$$

$$= \frac{(b-a)(b^2+ab+a^2)}{3(b-a)} - \frac{a^2+2ab+b^2}{4} = \frac{4(a^2+ab+b^2) - 3(a^2+2ab+b^2)}{12} = \frac{a^2-2ab+b^2}{12} = \frac{(b-a)^2}{12}$$

Intuitively, the variance should be based on the length of the interval, $(b-a)$. The furthest a point can be from the mean/midpoint on this interval is $\frac{b-a}{2}$. Squaring this value, as the variance operator does, yields $\frac{(b-a)^2}{4}$. The 'averaging' of this quantity divides by a further 3, because we are integrating a quadratic term: $\int x^2\,dx = \frac{x^3}{3}$. Thus, the result is $\frac{(b-a)^2}{12}$.

Keep in mind, this value will always be bigger than zero and less than the square of the largest distance from the center, or mean:

$$0 < \sigma^2 = \frac{(b-a)^2}{12} = \frac{(b-a)^2/4}{3} < \frac{(b-a)^2}{4}$$

1.6.1 Standard Deviation - $\sigma = \frac{(b-a)\sqrt{3}}{6}$

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{(b-a)^2}{12}} = \frac{b-a}{\sqrt{12}} = \frac{b-a}{2\sqrt{3}} = \frac{(b-a)\sqrt{3}}{6}$$

Keep in mind, this value will always be bigger than zero and less than the largest distance from the center, or mean:

$$0 < \sigma = \frac{(b-a)\sqrt{3}}{6} = \frac{(b-a)/2}{\sqrt{3}} < \frac{b-a}{2}$$

2 Exponential Distribution - $X \sim Exp(\lambda)$

The Exponential Distribution is the random variable (r.v.) that models the waiting time (distance, or other continuous metric) until the next rare event. The Poisson Distribution, a discrete distribution, counts the number of rare events over a continuous-metric interval. The Exponential Distribution is interwoven with the Poisson Distribution by measuring the length of the continuous metric until the next count. This interwoven nature is known as the Poisson Process and will be treated in its own independent set of notes.

$X \sim Exponential(\lambda)$, where $\lambda > 0$ is the rate at which counts occur per unit of waiting 'time', just as it was in the $Poisson(\lambda)$ Distribution.

The Exponential Distribution derives from the Geometric Distribution in the limit as $p \downarrow 0$, with the discrete 'trial' metric extended to a continuous metric, making the Exponential Distribution a continuous analog of the Geometric Distribution. This will be demonstrated with a little 'hand waving'. Provided the following two items, we will derive the Survival Function $S_X(x) = P(X > x)$:

1. $Y \sim geo(p) \Rightarrow S_Y(y) = P(Y > y) = (1-p)^y$

2. $n \uparrow \infty$, $p \downarrow 0$, such that (s.t.) $np \to \lambda > 0$

Dividing each unit of the continuous metric into $n$ trials, so that a wait of length $x$ spans $nx$ trials:

$$S_X(x) \stackrel{1}{=} \lim_{n \uparrow \infty,\, p \downarrow 0,\, np \to \lambda} (1-p)^{nx} = \lim_{n \uparrow \infty,\, p \downarrow 0,\, np \to \lambda} \left[(1-p)^n\right]^x \stackrel{2}{=} \lim_{n \uparrow \infty} \left[\left(1 - \frac{\lambda}{n}\right)^n\right]^x = \left[e^{-\lambda}\right]^x = e^{-\lambda x}$$

$$\Rightarrow S_X(x) = P(X > x) = \begin{cases} 1 & x \leq 0 \\ e^{-\lambda x} & x > 0 \end{cases}$$

Provided the Survival Function $S_X(x)$, we are able to derive all of the other properties of the Exponential Distribution.

2.1 Probability Density Function (PDF) - $f_X(x) = \lambda e^{-\lambda x} : x > 0$

$$f_X(x) = \frac{d}{dx}[-S_X(x)] = \frac{d}{dx}[-e^{-\lambda x}] = \lambda e^{-\lambda x} \Rightarrow f_X(x) = \begin{cases} 0 & x \leq 0 \\ \lambda e^{-\lambda x} & x > 0 \end{cases}$$

2.1.1 Rules

1. $\lambda > 0$ and $e^{-\lambda x} > 0 \Rightarrow \lambda e^{-\lambda x} > 0$ for $x > 0$, AND $0 \geq 0$ for $x \leq 0 \Rightarrow f_X(x) \geq 0, \forall x$.

2. With the substitution $u = -\lambda x$:
$$\lim_{n \uparrow \infty} \int_{-n}^{n} f_X(x)\,dx = \lim_{n \uparrow \infty} \left[ \int_{-n}^{0} 0\,dx + \int_{0}^{n} \lambda e^{-\lambda x}\,dx \right] = 0 + \lim_{n \uparrow \infty} \lambda \int_{0}^{-\lambda n} e^{u} \left(\frac{du}{-\lambda}\right) = -\left(\lim_{n \uparrow \infty} e^{-\lambda n} - e^0\right) = -(0 - 1) = 1$$

Therefore, $f_X(x)$ is a valid PDF.
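Both the 'hand waving' limit and the PDF rules can be checked numerically with base R's exponential functions (dexp, and integrate for Rule 2). A minimal sketch; the rate $\lambda = 0.5$, the evaluation point, and the value of $n$ are arbitrary illustrative choices:

    lambda <- 0.5                 # arbitrary illustrative rate
    x <- 3

    # Geometric-limit check: [(1 - lambda/n)^n]^x approaches e^(-lambda*x)
    n <- 1e6
    ((1 - lambda/n)^n)^x          # approx 0.22313
    exp(-lambda * x)              # e^(-1.5) = 0.22313...

    # PDF f_X(x) = lambda * e^(-lambda*x) matches base R's dexp()
    dexp(x, rate = lambda)
    lambda * exp(-lambda * x)

    # Rule 2: the density integrates to 1 over (0, infinity)
    integrate(dexp, lower = 0, upper = Inf, rate = lambda)$value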
2.2 Cumulative Distribution Function (CDF) - $F_X(x) = 1 - e^{-\lambda x} : x > 0$

$$F_X(x) = P(X \leq x) = 1 - S_X(x) = 1 - e^{-\lambda x} \Rightarrow F_X(x) = \begin{cases} 0 & x \leq 0 \\ 1 - e^{-\lambda x} & x > 0 \end{cases}$$

2.3 Percentile, $p$: $x^* = -\frac{1}{\lambda}\ln(1-p)$

For $0 < p < 1$:

$$F_X(x^*) = p \Leftrightarrow 1 - e^{-\lambda x^*} = p \Leftrightarrow S_X(x^*) = e^{-\lambda x^*} = 1 - p \Leftrightarrow \ln\left(e^{-\lambda x^*}\right) = -\lambda x^* = \ln(1-p) \Leftrightarrow x^* = -\frac{1}{\lambda}\ln(1-p)$$

Thus, the $p$ percentile of an Exponential Distribution inverts the exponential function to find the needed metric value.

2.4 Probability - $S_X(x) = e^{-\lambda x} : x > 0$

$$S_X(x) = P(X > x) = \begin{cases} 1 & x \leq 0 \\ e^{-\lambda x} & x > 0 \end{cases}$$

AND

$$x_1 < x_2 \Rightarrow P(x_1 < X < x_2) = F_X(x_2) - F_X(x_1) = \begin{cases} 0 & x_1 < x_2 \leq 0 \\ 1 - e^{-\lambda x_2} & x_1 \leq 0 < x_2 \\ e^{-\lambda x_1} - e^{-\lambda x_2} & 0 < x_1 < x_2 \end{cases}$$

2.4.1 Memoryless Property (as with the Geometric Distribution) - $P(X > s+t \mid X > t) = P(X > s)$

$$P(X > s+t \mid X > t) = \frac{P(X > s+t \cap X > t)}{P(X > t)} = \frac{P(X > s+t)}{P(X > t)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda t}} \stackrel{\text{property of exponentials}}{=} e^{-\lambda s} = S_X(s)$$

Similarly,

$$P(X \leq s+t \mid X > t) = \frac{P(X \leq s+t \cap X > t)}{P(X > t)} = \frac{P(t < X \leq s+t)}{P(X > t)} = \frac{e^{-\lambda t} - e^{-\lambda(s+t)}}{e^{-\lambda t}} = 1 - e^{-\lambda s} = F_X(s)$$

2.5 Mean/Expected Value - $\mu = E[X] = \frac{1}{\lambda}$

$$\mu = E[X] = \lim_{n \uparrow \infty} \int_{-n}^{n} x f_X(x)\,dx = \lim_{n \uparrow \infty} \left[ \int_{-n}^{0} x(0)\,dx + \int_{0}^{n} x \left(\lambda e^{-\lambda x}\right)\,dx \right]$$

By Integration by Parts (IbP), with $u = x$ and $dv = \lambda e^{-\lambda x}\,dx$, so that $v = -e^{-\lambda x}$:

$$= \lim_{n \uparrow \infty} \left( \left[-x e^{-\lambda x}\right]_0^n + \int_0^n e^{-\lambda x}\,dx \right) = (0 - 0) + \lim_{n \uparrow \infty} \left[-\frac{1}{\lambda} e^{-\lambda x}\right]_0^n = -\frac{1}{\lambda}(0 - 1) = \frac{1}{\lambda}$$

2.6 Variance - $\sigma^2 = Var[X] = \frac{1}{\lambda^2}$

$$\sigma^2 = Var[X] = E[X^2] - \mu^2 = \lim_{n \uparrow \infty} \int_{0}^{n} x^2 \left(\lambda e^{-\lambda x}\right)\,dx - \frac{1}{\lambda^2}$$

By IbP, with $u = x^2$ and $dv = \lambda e^{-\lambda x}\,dx$:

$$= \lim_{n \uparrow \infty} \left( \left[-x^2 e^{-\lambda x}\right]_0^n + 2\int_0^n x e^{-\lambda x}\,dx \right) - \frac{1}{\lambda^2} = 0 + \frac{2}{\lambda} \lim_{n \uparrow \infty} \int_0^n x \left(\lambda e^{-\lambda x}\right)\,dx - \frac{1}{\lambda^2} = \frac{2}{\lambda}\,\mu - \frac{1}{\lambda^2} = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$

where the remaining integral is exactly the mean computed in Section 2.5.

2.6.1 Standard Deviation - $\sigma = \frac{1}{\lambda} = \mu$

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{\lambda^2}} = \frac{1}{\lambda} = \mu$$

For the Exponential Distribution, the standard deviation equals the mean.

3 Normal Distribution - $X \sim N(\mu, \sigma^2)$

The Normal Distribution is most utilized as an approximate distribution for sums and averages of a large number, $n$, of Independent, Identically Distributed (iid) r.v.s, a result derived from the Central Limit Theorem (CLT). As "Limit" is in CLT, a calculus limit as $n \uparrow \infty$ tells us such sums and averages approach a Normal Distribution. However, statistically there is no way to take a sample of infinite size, so only the approximation is utilized in statistics (a small simulation of this approximation follows the list below). Further, the assumptions described above can be relaxed:

1. Identical: The distributions need not be identical; it is merely the most common circumstance that statisticians find themselves in when taking samples, and the precision of the approximation will be very difficult to determine without this property.

2. Independent: The distributions need not be independent, but the degree of dependency must continue to reduce and approach 0 as the sample size approaches infinity. The strictness of this requirement is beyond the scope of this course, so this course shall only utilize the CLT approximation for independent distributions.
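The simulation promised above: averages of $n$ iid draws from the (skewed) Exponential Distribution of Section 2, standardized with its $\mu = \sigma = \frac{1}{\lambda}$, pile up in the familiar bell shape. A minimal base R sketch; the seed, $n$, rate, and replicate count are all arbitrary illustration choices:

    set.seed(350)                         # arbitrary seed for reproducibility
    n <- 50; reps <- 10000; lambda <- 1   # illustrative settings

    # Each replicate: the mean of n iid Exp(lambda) draws
    xbar <- replicate(reps, mean(rexp(n, rate = lambda)))

    # Standardize using the exponential's mu = sigma = 1/lambda
    z <- (xbar - 1/lambda) / ((1/lambda) / sqrt(n))

    # The standardized means are approximately N(0, 1)
    hist(z, breaks = 50, freq = FALSE, main = "CLT: standardized sample means")
    curve(dnorm(x), add = TRUE, lwd = 2)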
Further, the Normal Distribution is known as a Location-Scale Distribution, meaning that one need only know its location, in this case the measure of center, the mean, and its scale, in this case the measure of spread, the variance (the average squared distance away from the mean/center). Many Location-Scale Distributions, including the Normal Distribution, possess the very useful property of being able to be standardized into a simpler form:

Standard Normal Distribution: $Z = \frac{X - \mu}{\sigma} \sim Normal(0, 1)$

We will always transform any generic Normal Distribution $X$ to the Standard Normal Distribution: with the transformation $z = \frac{x - \mu}{\sigma}$ for purposes of computing probability, OR $x = \mu + z\sigma$ if one wishes to solve a percentile problem.

Finally, like the Uniform Distribution, the Normal Distribution is symmetric, so the mean = median. These values also happen to be the same as the mode.

3.1 PDF - $f_X(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} : x \in \Re$

$$f_X(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}} : x \in \Re \quad \Rightarrow \quad f_{Z = \frac{X-\mu}{\sigma}}(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}} : z \in \Re$$

3.1.1 Rules

1. $\sigma > 0 \Rightarrow \frac{1}{\sqrt{2\pi}\sigma} > 0$, and $e^{-\frac{(x-\mu)^2}{2\sigma^2}} > 0 \Rightarrow f_X(x) \geq 0, \forall x$.

2. With the substitution $z = \frac{x-\mu}{\sigma}$:
$$\lim_{n \uparrow \infty} \int_{-n}^{n} f_X(x)\,dx = \lim_{n \uparrow \infty} \int_{-n}^{n} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{2\pi}\sigma} \lim_{n \uparrow \infty} \int_{\frac{-n-\mu}{\sigma}}^{\frac{n-\mu}{\sigma}} e^{-\frac{z^2}{2}}(\sigma\,dz) = \frac{1}{\sqrt{2\pi}} \lim_{n \uparrow \infty} \cdots = 1$$

The completion of the "$\cdots$" part of the proof requires a transformation to polar coordinates (after squaring the integral), prompting knowledge of multiple integration from Multivariate Calculus, which is not required for this course. Thus, the rest of the proof is omitted. If interested, feel free to attempt the problem on your own or ask anyone working for STAT 350.

Therefore, $f_X(x)$ is a valid PDF.

3.2 CDF - $F_X(x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$

$$F_X(x) = P(X \leq x) = \lim_{n \uparrow \infty} \int_{-n}^{x} f_X(t)\,dt = \frac{1}{\sqrt{2\pi}\sigma} \lim_{n \uparrow \infty} \int_{\frac{-n-\mu}{\sigma}}^{\frac{x-\mu}{\sigma}} e^{-\frac{z^2}{2}}(\sigma\,dz) = \frac{1}{\sqrt{2\pi}} \lim_{n \uparrow \infty} \int_{-n}^{\frac{x-\mu}{\sigma}} e^{-\frac{z^2}{2}}\,dz = F_Z\left(\frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right) = \ldots \text{Z-TABLE/R}$$

3.3 Percentile, $p$

For $0 < p < 1$:

$$F_X(x^*) = p \Leftrightarrow F_Z\left(\frac{x^* - \mu}{\sigma}\right) = F_Z(z^*) = p$$

Use the Z-table (or R) to find $z^*$ and use it to solve for $x^*$:

$$z^* = \frac{x^* - \mu}{\sigma} \Rightarrow x^* = \mu + z^*\sigma$$

3.4 Probability

$$P(X > x) = 1 - P((X > x)^C) = 1 - P(X \leq x) = 1 - F_X(x) = 1 - F_Z\left(\frac{x-\mu}{\sigma}\right) = 1 - \Phi\left(\frac{x-\mu}{\sigma}\right) = \ldots \text{Z-TABLE/R}$$

AND

$$x_1 < x_2 \Rightarrow P(x_1 < X < x_2) = P\left(\frac{x_1-\mu}{\sigma} < Z < \frac{x_2-\mu}{\sigma}\right) = F_Z\left(\frac{x_2-\mu}{\sigma}\right) - F_Z\left(\frac{x_1-\mu}{\sigma}\right) = \Phi\left(\frac{x_2-\mu}{\sigma}\right) - \Phi\left(\frac{x_1-\mu}{\sigma}\right) = \ldots \text{Z-TABLE/R}$$

3.5 Mean/Expected Value - $\mu = E[X]$

Writing $x = (x - \mu) + \mu$ splits the integral in two:

$$E[X] = \lim_{n \uparrow \infty} \int_{-n}^{n} x f_X(x)\,dx = \lim_{n \uparrow \infty} \int_{-n}^{n} (x - \mu) \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx + \mu \lim_{n \uparrow \infty} \int_{-n}^{n} f_X(x)\,dx$$

For the first integral, substitute $u = -\frac{(x-\mu)^2}{2\sigma^2}$, so that $du = -\frac{x-\mu}{\sigma^2}\,dx$:

$$\lim_{n \uparrow \infty} \int_{-n}^{n} (x - \mu) \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = -\frac{\sigma}{\sqrt{2\pi}} \lim_{n \uparrow \infty} \left[e^{u}\right]_{-\frac{(n+\mu)^2}{2\sigma^2}}^{-\frac{(n-\mu)^2}{2\sigma^2}} = -\frac{\sigma}{\sqrt{2\pi}}(0 - 0) = 0$$

The second integral is $\mu(1)$ by Rule 2, so $E[X] = 0 + \mu(1) = \mu$. This mathematics is thoroughly unnecessary, though, as in defining the Normal Distribution the mean is necessarily defined as well.

3.6 Variance - $\sigma^2 = Var[X]$

$$\sigma^2 = Var[X] = \lim_{n \uparrow \infty} \int_{-n}^{n} (x-\mu)^2 f_X(x)\,dx = \lim_{n \uparrow \infty} \int_{-n}^{n} (x-\mu)^2 \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \cdots = \sigma^2$$

Again, the completion of the proof requires mathematical knowledge in the area of probability known as Moment Generating Functions (MGFs) or Characteristic Functions, which is not required for this course. Thus, the rest of the proof is omitted. Further, this mathematics is again thoroughly unnecessary, as in defining the Normal Distribution the variance is necessarily defined as well.
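Although the closed-form proofs in Sections 3.5 and 3.6 are omitted, both moments can be checked numerically with base R's integrate and dnorm. A minimal sketch, with arbitrary illustrative parameters $\mu = 10$ and $\sigma = 2$:

    mu <- 10; sigma <- 2          # arbitrary illustrative parameters

    # E[X] = integral of x * f_X(x): should return mu = 10
    integrate(function(x) x * dnorm(x, mean = mu, sd = sigma),
              lower = -Inf, upper = Inf)$value

    # Var[X] = integral of (x - mu)^2 * f_X(x): should return sigma^2 = 4
    integrate(function(x) (x - mu)^2 * dnorm(x, mean = mu, sd = sigma),
              lower = -Inf, upper = Inf)$value

Likewise, the "$\ldots$ Z-TABLE/R" steps of Sections 3.2-3.4 correspond to base R's pnorm and qnorm; the evaluation points below are again arbitrary illustrations:

    mu <- 10; sigma <- 2          # same illustrative parameters as above

    # P(X <= x) via standardization: Phi((x - mu)/sigma)
    x <- 13
    pnorm((x - mu) / sigma)           # Phi(1.5) = 0.9332...
    pnorm(x, mean = mu, sd = sigma)   # same value, no hand standardizing

    # P(x1 < X < x2) = Phi(z2) - Phi(z1)
    pnorm(14, mu, sigma) - pnorm(8, mu, sigma)

    # Percentile: z* from the 'Z-table', then x* = mu + z*sigma
    p <- 0.95
    z.star <- qnorm(p)                # approx 1.645
    mu + z.star * sigma               # approx 13.29
    qnorm(p, mean = mu, sd = sigma)   # same value directly

Standardizing by hand and letting pnorm/qnorm take mean and sd arguments give identical answers, which makes a useful self-check on Z-table work.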