Exam #2 Review

Chapter 3 Functions of Random Variables 151
  3.1 Introduction 151
  3.2 Functions of a Random Variable (FRV): Several Views 154
      Solving Problems of the Type Y = g(X) 155
      General Formula of Determining the pdf of Y = g(X) 166

Chapter 4 Expectation and Moments 215
  4.1 Expected Value of a Random Variable 215
  4.2 Conditional Expectations 232
  4.3 Moments of Random Variables 242
      Joint Moments 246
  4.4 Chebyshev and Schwarz Inequalities 255
      Markov Inequality 257
      The Schwarz Inequality 258
  4.5 Moment-Generating Functions 261
  4.6 Chernoff Bound 264
  4.7 Characteristic Functions 266
      The Central Limit Theorem 276

Chapter 6 Statistics: Part 1, Parameter Estimation (sections 6.1 to 6.4, 6.7) 340
  Introduction 341
  Independent, Identically Distributed (i.i.d.) Observations 343
  Estimation of Probabilities 346
  Estimators 348
  Estimation of the Mean 349
  Properties of the Mean-Estimator Function (MEF) 352
  Procedure for Getting a δ-Confidence Interval on the Mean of a Normal Random Variable When σX Is Known 352
  Confidence Interval for the Mean of a Normal Distribution When σX Is Not Known 355
  Procedure for Getting a δ-Confidence Interval Based on n Observations on the Mean of a Normal Random Variable When σX Is Not Known 355
  Interpretation of the Confidence Interval 355
  Estimation of the Variance and Covariance 357
  Confidence Interval for the Variance of a Normal Random Variable 359
  Estimating the Standard Deviation Directly 360
  Estimating the Covariance
  Maximum Likelihood Estimators 365

Notes and figures are based on or taken from materials in the course textbook: Probabilistic Methods of Signal and System Analysis (3rd ed.) by George R. Cooper and Clare D. McGillem; Oxford Press, 1999. ISBN: 0-19-512354-9. B.J.
Bazuin, Spring 2015, ECE 3800

Chapter 7 Statistics: Part 2, Hypothesis Testing 391
  7.1 Bayesian Decision Theory 396
  7.2 Likelihood Ratio Test 402
  7.3 Composite Hypotheses; Generalized Likelihood Ratio Test (GLRT) 403
  7.4 Goodness of Fit 417

Homework problems
Previous homework problem solutions as examples: Dr. Severance's Skill Examples, Skills #2, Skills #3, Skills #4
Read through the 2015 Exam #2 …

Three of the four to five problems will be:
1. You have earned the chance to do another sum-of-probabilities problem. (Exam #1, problem #3, except alternate pdfs or even pmfs are likely to be selected: discrete convolution.)
2. You will be given a joint density function. Compute the marginal density functions. Compute all the means, 2nd moments, variances, and the cross-correlation coefficient. (2015 Exam #2, problem #1)
3. A confidence interval computation will be required, based on a probabilistic variance value being known (Gaussian normal distribution) AND based on a statistically generated variance value being known (Student's t-distribution).

And now for a quick chapter review … the important information without the rest!
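The discrete convolution named in exam problem 1 can be sketched numerically. This is a minimal sketch with hypothetical pmfs (two fair four-sided dice), not the exam's actual distributions: the pmf of Z = X + Y for independent discrete random variables is the discrete convolution of the two pmfs.

```python
# Hypothetical example: pmf of Z = X + Y by discrete convolution,
# assuming X and Y are independent discrete RVs with small supports.
def convolve_pmfs(pmf_x, pmf_y):
    """Return the pmf of Z = X + Y as a dict {z: Pr[Z = z]}."""
    pmf_z = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            pmf_z[x + y] = pmf_z.get(x + y, 0.0) + px * py
    return pmf_z

# Two fair four-sided dice: values 1..4, probability 1/4 each
die = {v: 0.25 for v in range(1, 5)}
pmf_sum = convolve_pmfs(die, die)
```

For these dice, Pr[Z = 5] = 4/16 = 0.25 (four of the sixteen equally likely pairs sum to 5), and the resulting pmf sums to 1.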
Chapter 3 Functions of Random Variables 151

The sum of two random variables, Z = X + Y.

We consider the CDF defined for Z = X + Y:
F_Z(z) = \Pr[Z \le z], \quad f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(z-y,\,y)\,dy

If X and Y are independent, f_{X,Y}(x,y) = f_X(x)\,f_Y(y), and
f_Z(z) = \int_{-\infty}^{\infty} f_X(z-y)\,f_Y(y)\,dy

A note: the convolution can be performed in either of the two forms:
f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z-x)\,dx = \int_{-\infty}^{\infty} f_X(z-y)\,f_Y(y)\,dy

Extending the convolution result: for Z = aX + bY, assume two changes of variables, V = aX and W = bY. Then the new marginal pdfs can be defined as
f_V(v) = \frac{1}{|a|} f_X\!\left(\frac{v}{a}\right), \quad f_W(w) = \frac{1}{|b|} f_Y\!\left(\frac{w}{b}\right)
and we have
f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|a|} f_X\!\left(\frac{v}{a}\right) \frac{1}{|b|} f_Y\!\left(\frac{z-v}{b}\right) dv = \int_{-\infty}^{\infty} \frac{1}{|a|} f_X\!\left(\frac{z-w}{a}\right) \frac{1}{|b|} f_Y\!\left(\frac{w}{b}\right) dw

4.1 Expected Value of a Random Variable 215

This is commonly called the expectation operator or expected value and is mathematically described as:
\bar{X} = E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx
For discrete random variables, the integration becomes a summation:
\bar{X} = E[X] = \sum_x x\,\Pr(X = x)

General concept of an expected value. In general, the expected value of a function is:
E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx, \quad E[g(X)] = \sum_x g(x)\,\Pr(X = x)

Estimating a parameter: if we know the expected value, we have a simple estimate of future expected outcomes:
\hat{x} = \bar{X} = E[X], \quad \text{or for } y = g(x), \quad \hat{y} = E[y] = E[g(X)]

Example 4.1-2 Expected value of a Gaussian
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\bar{X})^2}{2\sigma^2}\right), \text{ for } -\infty < x < \infty; \quad E[X] = \bar{X}

Moments
The moments of a random variable are defined as the expected value of the powers of the measured output:
E[X^n] = \int_{-\infty}^{\infty} x^n\,f_X(x)\,dx
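The expectation definitions above are easy to check by simulation. A minimal Monte Carlo sketch (the mean 3.0 and sigma 2.0 are hypothetical values chosen for illustration): the sample average of Gaussian draws approaches E[X], as in Example 4.1-2.

```python
# Monte Carlo sketch: the sample average of N(mean, sigma) draws
# approaches the expected value E[X] = mean (values are hypothetical).
import random

random.seed(4)
mean, sigma, n = 3.0, 2.0, 100_000
est = sum(random.gauss(mean, sigma) for _ in range(n)) / n
```

With n = 100,000 draws the standard error is sigma/sqrt(n), about 0.006 here, so `est` lands very close to 3.0.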
For discrete random variables:
E[X^n] = \sum_x x^n\,\Pr(X = x)
Therefore, the mean or average is sometimes called the first moment.

Expected Mean Squared Value or Second Moment
The mean square value or second moment is
E[X^2] = \int_{-\infty}^{\infty} x^2\,f_X(x)\,dx, \quad E[X^2] = \sum_x x^2\,\Pr(X = x)

Central Moments and Variance
The central moments are the moments of the difference between a random variable and its mean:
E[(X-\bar{X})^n] = \int_{-\infty}^{\infty} (x-\bar{X})^n\,f_X(x)\,dx
The second central moment is referred to as the variance of the random variable:
\sigma^2 = E[(X-\bar{X})^2] = \int_{-\infty}^{\infty} (x-\bar{X})^2\,f_X(x)\,dx
The square root of the variance is the standard deviation, \sigma.
Note that the variance may also be computed as:
\sigma^2 = E[X^2] - \bar{X}^2

Another estimate of future outcomes is the value that minimizes the mean squared error:
\min_{\hat{x}} E[\text{error}^2] = \min_{\hat{x}} E[(X-\hat{x})^2]

Power and Energy Considerations
DC power/energy is related to the squared mean, \bar{X}^2. AC power/energy is related to the variance, \sigma^2. Notice that the 2nd moment has both AC and DC power terms:
E[X^2] = \sigma^2 + \bar{X}^2

Means and Variances of Defined Density Functions

Linearity of the expected values: a linear operator. Linearity, the "linear operator" characteristic, can be interpreted as allowing the expected value of a sum of functions to be the sum of the expected values of the individual functions:
E\!\left[\sum_{i=1}^{N} g_i(X)\right] = \sum_{i=1}^{N} E[g_i(X)]
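The identity \sigma^2 = E[X^2] - \bar{X}^2 above can be verified directly. A minimal sketch with a hypothetical three-point pmf:

```python
# Check sigma^2 = E[X^2] - mean^2 on a small discrete pmf (hypothetical values).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}
mean = sum(x * p for x, p in pmf.items())                  # E[X] = 1.1
second_moment = sum(x**2 * p for x, p in pmf.items())      # E[X^2] = 1.7
variance = sum((x - mean)**2 * p for x, p in pmf.items())  # 0.49
```

Here variance equals second_moment minus mean squared (0.49 = 1.7 - 1.21), matching the identity.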
As another example, E[X+Y] = E[X] + E[Y].

Example 4.1-7 Variance of a Gaussian
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\bar{X})^2}{2\sigma^2}\right), \text{ for } -\infty < x < \infty; \quad E[(X-\bar{X})^2] = \sigma^2

Example 4.1-15 Geometric Distribution
\Pr(X = x) = (1-a)\,a^x, \text{ for } x = 0, 1, 2, \ldots, \; 0 < a < 1
E[X] = \frac{a}{1-a}, \quad E[X^2] = \frac{2a^2}{(1-a)^2} + \frac{a}{1-a}, \quad \sigma^2 = E[X^2] - E[X]^2 = \frac{a}{(1-a)^2}

4.2 Conditional Expectations 232

Expected Values for Conditional Probability
The conditional expectation is defined as
E[X \mid B] = \int_{-\infty}^{\infty} x\,f_{X|B}(x \mid B)\,dx, \quad E[X \mid B] = \sum_i x_i\,\Pr(X = x_i \mid B)

Example 4.2-1 Conditional expectation of a uniform R.V.
f_X(x) = \frac{1}{x_1 - x_0}, \quad F_X(x) = \frac{x - x_0}{x_1 - x_0}, \quad \text{for } x_0 \le x \le x_1
The condition is B = \{X > a\}. Therefore:
f_X(x \mid B) = \frac{f_X(x)}{1 - F_X(a)} = \frac{1}{x_1 - a}, \quad \text{for } a < x \le x_1
F_X(x \mid B) = \frac{F_X(x) - F_X(a)}{1 - F_X(a)} = \frac{x - a}{x_1 - a}, \quad \text{for } a < x \le x_1

Expected Values for Joint Density Functions and Conditional Probability
Definition 4.2-3: the expected value of a conditional probability. For the joint density function f_{X,Y}(x,y), we want to know the expected value E[Y \mid X = x]. We know from before that
f_{Y|X}(y \mid X = x) = \frac{f_{X,Y}(x,y)}{f_X(x)}
so that
E[Y \mid X = x] = \int_{-\infty}^{\infty} y\,f_{Y|X}(y \mid X = x)\,dy

Usefulness or application:
E[Y] = \int\!\!\int y\,f_{X,Y}(x,y)\,dy\,dx
But f_{X,Y}(x,y) = f_{Y|X}(y \mid x)\,f_X(x), so
E[Y] = \int \left[\int y\,f_{Y|X}(y \mid x)\,dy\right] f_X(x)\,dx = \int E[Y \mid X = x]\,f_X(x)\,dx

Properties of Conditional Expectations:
Property I: expected value of a conditional expected value: E[Y] = E[E[Y \mid X]]
Property II: if X and Y are independent (X should not matter): E[Y] = E[Y \mid X]
Property III: a conditional chain rule: E[Z \mid X] = E[E[Z \mid X, Y] \mid X]
Based on this equation, all values of Y have been considered. Therefore,
E[E[Z \mid X, Y] \mid X] = \int z\,f_{Z|X}(z \mid X)\,dz = E[Z \mid X]

4.3 Moments of Random Variables 242

Example 4.3-1 Binomial R.V.
\Pr(X = x) = \binom{n}{x} p^x (1-p)^{n-x}
E[X] = np, \quad E[X^2] = n(n-1)p^2 + np
Determine the variance:
\sigma^2 = E[X^2] - E[X]^2 = np(1-p) = npq

Joint Moments 246
When we have multiple random variables, additional moments can be defined:
E[g(X,Y)] = \int\!\!\int g(x,y)\,f_{X,Y}(x,y)\,dx\,dy
All expected values may be computed using the joint pdf. There are some "new" relationships.

Correlation and Covariance between Random Variables
The definition of correlation was given as
E[XY] = \int\!\!\int x\,y\,f_{X,Y}(x,y)\,dx\,dy
But most of the time we are not interested in products of mean values (observed when X and Y are independent) but in what results when they are removed prior to the computation. Developing values where the random variable means have been extracted is defined as computing the covariance:
\mathrm{COV}(X,Y) = E[(X - \bar{X})(Y - \bar{Y})] = \int\!\!\int (x-\bar{X})(y-\bar{Y})\,f_{X,Y}(x,y)\,dx\,dy = \rho\,\sigma_X\sigma_Y
This gives rise to another factor when the random variable means and variances are used to normalize the correlation/covariance computation.
For example, the following definition, the correlation coefficient, is based on the normalized covariance:
\rho = E\!\left[\left(\frac{X-\bar{X}}{\sigma_X}\right)\!\left(\frac{Y-\bar{Y}}{\sigma_Y}\right)\right] = \int\!\!\int \frac{(x-\bar{X})(y-\bar{Y})}{\sigma_X\,\sigma_Y}\,f_{X,Y}(x,y)\,dx\,dy = \frac{\mathrm{COV}(X,Y)}{\sigma_X\,\sigma_Y}
Also
E[XY] = \rho\,\sigma_X\sigma_Y + \bar{X}\,\bar{Y}

The short derivation:
\rho\,\sigma_X\sigma_Y = E[(X-\bar{X})(Y-\bar{Y})] = E[XY - \bar{X}Y - \bar{Y}X + \bar{X}\,\bar{Y}]
The expected value is a linear operator; constants remain constants and sums are sums:
\rho\,\sigma_X\sigma_Y = E[XY] - \bar{X}\,E[Y] - \bar{Y}\,E[X] + \bar{X}\,\bar{Y} = E[XY] - \bar{X}\,\bar{Y}

Properties of Uncorrelated Random Variables 248
For "standardized" random variables, the correlation coefficient can be solved for as the correlation value (standardized R.V.s are zero mean, unit variance):
\rho = \frac{E[xy] - 0 \cdot 0}{1 \cdot 1} = E[xy]
For either X or Y a zero-mean variable,
\rho = \frac{E[XY]}{\sigma_X\,\sigma_Y}
For independent random variables, E[XY] = E[X]\,E[Y] = \bar{X}\,\bar{Y}, so
\rho = \frac{E[XY] - \bar{X}\,\bar{Y}}{\sigma_X\,\sigma_Y} = 0

For X and Y, letting Z = X + Y:
\mathrm{VAR}(Z) = E[(Z - \bar{Z})^2] = \sigma_X^2 + 2\rho\,\sigma_X\sigma_Y + \sigma_Y^2
For uncorrelated X and Y (\rho = 0):
\mathrm{VAR}(Z) = \sigma_Z^2 = \sigma_X^2 + \sigma_Y^2

A special note: independent random variables are uncorrelated, since E[XY] = \bar{X}\,\bar{Y} forces \rho = 0. However, uncorrelated random variables are not necessarily independent.

Example 4.3-4 Linear Prediction: mean square error
Under a linear assumption of the relationship between X and Y:
\hat{y} = aX + b, \quad \varepsilon = Y - \hat{y} = Y - aX - b
The error can be defined as
E[\varepsilon^2] = E[Y^2] - 2a\,E[XY] - 2b\,\bar{Y} + a^2\,E[X^2] + 2ab\,\bar{X} + b^2
Minimization says to take the derivative with respect to a and set it to zero, and then the derivative with respect to b and set it to zero:
\frac{\partial E[\varepsilon^2]}{\partial a} = -2\,E[XY] + 2a\,E[X^2] + 2b\,\bar{X} = 0
\frac{\partial E[\varepsilon^2]}{\partial b} = -2\,\bar{Y} + 2a\,\bar{X} + 2b = 0
Solving:
a = \frac{E[XY] - \bar{X}\,\bar{Y}}{E[X^2] - \bar{X}^2} = \frac{\rho\,\sigma_X\sigma_Y}{\sigma_X^2} = \rho\,\frac{\sigma_Y}{\sigma_X}, \quad b = \bar{Y} - a\,\bar{X}
The linear predictor becomes
\hat{y} = \bar{Y} + \rho\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{X})
with minimum mean square error
E[\varepsilon^2] = \sigma_Y^2\,(1 - \rho^2)

Jointly Gaussian Random Variables 251
If two R.V.s are jointly Gaussian:
f_{X,Y}(x,y) = \frac{1}{2\pi\,\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\!\left\{ \frac{-1}{2(1-\rho^2)} \left[ \frac{(x-\bar{X})^2}{\sigma_X^2} - \frac{2\rho\,(x-\bar{X})(y-\bar{Y})}{\sigma_X\,\sigma_Y} + \frac{(y-\bar{Y})^2}{\sigma_Y^2} \right] \right\}
If \rho = 0:
f_{X,Y}(x,y) = \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\!\left(-\frac{(x-\bar{X})^2}{2\sigma_X^2}\right) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_Y} \exp\!\left(-\frac{(y-\bar{Y})^2}{2\sigma_Y^2}\right)
that is, the joint density factors into the product of the marginals.

4.4 Chebyshev and Schwarz Inequalities 255
The Chebyshev inequality furnishes a bound on the probability of how much an R.V. can deviate from its mean value.

Chebyshev inequality (Theorem 4.4-1). Let X be an arbitrary R.V. with known mean and variance. Then for any \varepsilon > 0:
P(|X - \bar{X}| \ge \varepsilon) \le \frac{\sigma_X^2}{\varepsilon^2}
If we consider the complement of the probability described:
P(|X - \bar{X}| < \varepsilon) \ge 1 - \frac{\sigma_X^2}{\varepsilon^2}
It may be convenient to define \varepsilon in terms of a multiple of the standard deviation, \varepsilon = k\sigma. Then the Chebyshev inequality becomes
P(|X - \bar{X}| \ge k\sigma) \le \frac{1}{k^2}, \quad P(|X - \bar{X}| < k\sigma) \ge 1 - \frac{1}{k^2}

Example 4.4-1 Deviation from the mean for a Normal R.V.
The Gaussian Normal CDF is
\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{v^2}{2}\right) dv, \quad z = \frac{x - \bar{X}}{\sigma_X}
Therefore
P(|X - \bar{X}| < k\,\sigma_X) = \Phi(k) - \Phi(-k) = 2\Phi(k) - 1 \ge 1 - \frac{1}{k^2}
and
P(|X - \bar{X}| \ge k\,\sigma_X) = 2\int_{k}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{v^2}{2}\right) dv = 2\,Q(k) \le \frac{1}{k^2}
so the exact Gaussian tail 2Q(k) can be compared against the Chebyshev bound.

Markov Inequality 257
The Markov inequality focuses on the mean value and states: for a non-negative R.V.,
f_X(x) = 0 \text{ for } x < 0: \quad P(X \ge a) \le \frac{E[X]}{a}, \quad \text{for any } a > 0

The Schwarz Inequality 258
The Schwarz inequality bounds the magnitude of the covariance of two R.V.s:
|\mathrm{Cov}(X,Y)| \le \sigma_X\,\sigma_Y, \quad \mathrm{Cov}(X,Y)^2 \le \sigma_X^2\,\sigma_Y^2
You have already experienced the Schwarz inequality in other settings. For the inner product
\langle h, g \rangle = \int h(x)\,g^*(x)\,dx: \quad |\langle h, g \rangle| \le \|h\|\,\|g\|
You may also see the integral form
\left| \int h(x)\,g^*(x)\,dx \right|^2 \le \int h(x)\,h^*(x)\,dx \cdot \int g(x)\,g^*(x)\,dx

Law of Large Numbers
Now that we have discussed the Chebyshev inequality, we can provide a proof of the Law of Large Numbers. The discussion here provides the condition under which the sample mean converges to the ensemble mean, that is, the statistical mean equals the R.V. ensemble mean.

Example 4.4-3 The sample mean converges to the expected value
For a large enough number of samples, we say that
\hat{\mu}_X = \frac{1}{n} \sum_{i=1}^{n} X_i
E[\hat{\mu}_X] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \bar{X}, \quad \mathrm{Var}(\hat{\mu}_X) = \frac{\sigma_X^2}{n}
Therefore, using the Chebyshev inequality, we state that
P(|\hat{\mu}_X - \bar{X}| \ge \delta) \le \frac{\sigma_X^2}{n\,\delta^2}
Then for any fixed value of \delta,
\lim_{n\to\infty} P(|\hat{\mu}_X - \bar{X}| \ge \delta) \le \lim_{n\to\infty} \frac{\sigma_X^2}{n\,\delta^2} = 0

4.5 Moment-Generating Functions 261

The moment-generating function (MGF) is the two-sided Laplace transform of the probability density function (pdf). If the MGF exists, there is a forward and inverse relationship between the MGF and the pdf. The MGF is defined based on the expected value as
M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} f_X(x)\,e^{tx}\,dx
If you like s better than t in your Laplace transforms:
M_X(s) = \int_{-\infty}^{\infty} f_X(x)\,e^{sx}\,dx
For discrete R.V.s we perform a discrete version:
M_X(t) = \sum_i \Pr(X = x_i)\,e^{t x_i}

Why do we do this?
1. It enables a convenient computation of the higher-order moments.
2. It can be used to estimate f_X(x) from experimental measurements of the moments.
3. It can be used to solve problems involving the computation of sums of R.V.s.
4. It is an important analytical instrument that can be used to demonstrate results and establish additional bounds (the Chernoff bound and the Central Limit Theorem).

It enables a convenient computation of the higher-order moments:
m_k = \left. \frac{\partial^k M_X(t)}{\partial t^k} \right|_{t=0}
The solution is done by performing a two-sided Laplace transform and differentiation!

Example 4.5-1 The MGF of a Gaussian
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\bar{X})^2}{2\sigma^2}\right) \;\Rightarrow\; M_X(t) = \exp\!\left(\bar{X}\,t + \frac{\sigma^2 t^2}{2}\right)

Example 4.5-2 MGF of a Binomial
\Pr(X = x) = \binom{n}{x} p^x (1-p)^{n-x} \;\Rightarrow\; M_X(t) = \left(1 - p + p\,e^{t}\right)^n

Example 4.5-3 MGF of a Geometric Distribution
\Pr(X = x) = (1-a)\,a^x, \; x \ge 0 \;\Rightarrow\; M_X(t) = \frac{1-a}{1 - a\,e^{t}}

The MGF of an Exponential
f_X(x) = \lambda\,e^{-\lambda x}, \text{ for } 0 \le x \;\Rightarrow\; M_X(t) = \frac{\lambda}{\lambda - t}, \text{ for } t < \lambda

The MGF of a Uniform U(a,b)
f_X(x) = \frac{1}{b-a}, \text{ for } a \le x \le b \;\Rightarrow\; M_X(t) = \frac{e^{tb} - e^{ta}}{t\,(b-a)}

4.6 Chernoff Bound 264
The Chernoff bound is based on concepts related to the MGF:
P(X \ge a) \le e^{-at}\,M_X(t)
To minimize the bound, the value of t that provides the smallest value should be used. Therefore, differentiate with respect to t, find the value of t for the minimum, and use the derived value in the bound.

Example 4.6-1 Chernoff Bound for a Gaussian
For the Gaussian, M_X(t) = \exp(\bar{X}\,t + \sigma^2 t^2/2), so the bound becomes
P(X \ge a) \le e^{-at} \exp\!\left(\bar{X}\,t + \frac{\sigma^2 t^2}{2}\right)
Minimizing over t gives t = (a - \bar{X})/\sigma^2, and
P(X \ge a) \le \exp\!\left(-\frac{(a - \bar{X})^2}{2\sigma^2}\right), \quad \text{for } a > \bar{X}
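The bound in Example 4.6-1 can be checked against the exact Gaussian tail. A minimal sketch (a standard normal is assumed, and the test points a are arbitrary): the exact tail Q(a) = 0.5·erfc(a/√2) never exceeds exp(-(a - X̄)²/2σ²).

```python
# Check the Chernoff bound for a standard Gaussian:
# P(X >= a) = Q(a) <= exp(-(a - mean)^2 / (2 sigma^2)), for a > mean.
import math

mean, sigma = 0.0, 1.0
for a in (0.5, 1.0, 2.0, 3.0):
    exact_tail = 0.5 * math.erfc((a - mean) / (sigma * math.sqrt(2)))
    chernoff = math.exp(-(a - mean)**2 / (2 * sigma**2))
    assert exact_tail <= chernoff  # the bound holds, loosely
```

At a = 2, for example, the exact tail is about 0.023 while the Chernoff bound gives about 0.135: valid but not tight.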
4.7 Characteristic Functions 266

The characteristic function is
\Phi_X(\omega) = E[e^{j\omega X}] = \int_{-\infty}^{\infty} f_X(x)\,e^{j\omega x}\,dx
For discrete R.V.s:
\Phi_X(\omega) = \sum_i \Pr(X = x_i)\,e^{j\omega x_i}
Note that there is a relationship between the MGF and CF:
M_X(t) = E[e^{tX}] \text{ and } \Phi_X(\omega) = E[e^{j\omega X}] \;\Rightarrow\; \Phi_X(\omega) = M_X(j\omega)

For sums of R.V.s, Z = X + Y, as previously shown, the density function is the convolution of the two density functions in X and Y:
f_Z(z) = \int f_X(x)\,f_Y(z-x)\,dx = \int f_X(z-y)\,f_Y(y)\,dy
But this can also be performed in the "frequency domain" by multiplication:
\Phi_Z(\omega) = \Phi_X(\omega)\,\Phi_Y(\omega)
As a result, it becomes much easier to contemplate Z = X_1 + X_2 + \cdots + X_N, which becomes
\Phi_Z(\omega) = \Phi_{X_1}(\omega)\,\Phi_{X_2}(\omega)\cdots\Phi_{X_N}(\omega)

Moment Generation with the Characteristic Function
As with the MGF, the CF can generate moments by differentiation:
\left. \frac{\partial^k \Phi_X(\omega)}{\partial \omega^k} \right|_{\omega=0} = j^k\,m_k, \quad \text{or} \quad m_k = \frac{1}{j^k}\,\Phi_X^{(k)}(0)

Joint MGF and CF
Joint MGF and CF functions can be and are defined. As may be expected, they can be used to compute "cross-products" of random variables just as they generate moments. The joint MGF is defined as
M_{X_1,\ldots,X_N}(t_1,\ldots,t_N) = E[\exp(t_1 X_1 + t_2 X_2 + \cdots + t_N X_N)]
The joint CF is defined as
\Phi_{X_1,\ldots,X_N}(\omega_1,\ldots,\omega_N) = E[\exp(j\omega_1 X_1 + j\omega_2 X_2 + \cdots + j\omega_N X_N)]
All the joint and marginal moments can then be computed based on
m_{l,n} = \left. \frac{\partial^{l+n} M_{X,Y}(t_1,t_2)}{\partial t_1^{\,l}\,\partial t_2^{\,n}} \right|_{t_1 = t_2 = 0}
or
\left. \frac{\partial^{l+n} \Phi_{X,Y}(\omega_1,\omega_2)}{\partial \omega_1^{\,l}\,\partial \omega_2^{\,n}} \right|_{\omega_1 = \omega_2 = 0} = j^{l+n}\,m_{l,n}

The Central Limit Theorem
The central limit theorem states that the normalized sum of a large number of mutually independent R.V.s
with zero mean and finite variance tends to the Gaussian normal CDF, provided that the individual variances are small compared to the sum of the variances.

Chapter 6 Statistics: Part 1 Parameter Estimation

Statistics definition: the science of assembling, classifying, tabulating, and analyzing data or facts.
Descriptive statistics: collecting, grouping, and presenting data in a way that can be easily understood or assimilated.
Inductive statistics or statistical inference: using data to draw conclusions about, or estimate parameters of, the environment from which the data came.

Definitions
Population: the collection of data being studied; N is the size of the population.
Sample: a random sample is the part of the population selected; all members of the population must be equally likely to be selected! n is the size of the sample.
Sample mean: the average of the numerical values that make up the sample.
Sample set: S = \{x_1, x_2, x_3, x_4, x_5, \ldots, x_n\}

Sampling Theory: the sample mean (the mean itself, not the value of X)
\hat{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad \text{where the } X_i \text{ are random variables with a pdf}
E[\hat{X}] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \frac{1}{n} \sum_{i=1}^{n} \bar{X} = \bar{X}

Variance of the sample mean (the mean itself, not the value of X). For N \gg n or N \to \infty (or even a known pdf), using the collected samples:
\mathrm{Var}(\hat{X}) = E\!\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \bar{X}\right)^{\!2}\right] = \frac{1}{n^2}\,n\,\sigma_X^2 = \frac{\sigma_X^2}{n}
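The Var(X̂) = σ²/n result above shows up clearly in simulation. A minimal sketch (a uniform(0,1) population with σ² = 1/12, and the sample size n = 25, are hypothetical choices):

```python
# Sketch: the variance of the sample mean of n i.i.d. uniform(0,1)
# draws is close to sigma^2 / n = (1/12) / n.
import random

random.seed(5)
n, reps = 25, 20_000
means = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]
grand = sum(means) / reps
var_of_mean = sum((m - grand)**2 for m in means) / reps
expected = (1.0 / 12.0) / n   # about 0.00333
```

Across 20,000 repeated samples, the empirical variance of the sample mean lands within a few percent of σ²/n.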
Sampling Theory: the Sample Variance
The sample variance of the population (stdevp) is defined as:
S^2 = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \hat{X}\right)^2, \quad E[S^2] = \frac{n-1}{n}\,\sigma^2
where \sigma^2 is the true variance of the random variable.
Note: the sample variance is not equal to the true variance; it is a biased estimate! To create an unbiased estimator, scale by the biasing factor to compute (stdev):
\tilde{S}^2 = \frac{n}{n-1}\,S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \hat{X}\right)^2, \quad E[\tilde{S}^2] = \frac{n}{n-1}\,E[S^2] = \sigma_X^2

Variance of the sample variance
As before, the variance of the variance can be computed. (Instead of deriving the values, it is given.) It is defined as
\mathrm{Var}(S^2) = \frac{\mu_4 - \sigma^4}{n}
where \mu_4 is the fourth central moment of the population, defined by \mu_4 = E[(X - \bar{X})^4].
For the unbiased variance, the result is
\mathrm{Var}(\tilde{S}^2) = \left(\frac{n}{n-1}\right)^{\!2} \mathrm{Var}(S^2) = \frac{n^2}{(n-1)^2} \cdot \frac{\mu_4 - \sigma^4}{n} = \frac{n\,(\mu_4 - \sigma^4)}{(n-1)^2}

Statistical Mean and Variance Summary
For taking samples and estimating the mean and variance:
Mean: \hat{X} = \frac{1}{n}\sum_{i=1}^{n} X_i; an unbiased estimate, E[\hat{X}] = \bar{X}; variance of the estimate \mathrm{Var}(\hat{X}) = \sigma_X^2/n.
Variance (biased): S^2 = \frac{1}{n}\sum (X_i - \hat{X})^2; a biased estimate, E[S^2] = \frac{n-1}{n}\sigma_X^2; \mathrm{Var}(S^2) = (\mu_4 - \sigma_X^4)/n.
Variance (unbiased): \tilde{S}^2 = \frac{1}{n-1}\sum (X_i - \hat{X})^2; an unbiased estimate, E[\tilde{S}^2] = \sigma_X^2; \mathrm{Var}(\tilde{S}^2) = n\,(\mu_4 - \sigma_X^4)/(n-1)^2.
In both cases \mu_4 = E[(X - \bar{X})^4].

Building a confidence interval
Using the Chebyshev inequality,
P(|X - \bar{X}| \ge \varepsilon) \le \frac{\sigma_X^2}{\varepsilon^2}
It may be convenient to define \varepsilon in terms of a multiple of the standard deviation.
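Before the ε = kσ form is developed further, the Chebyshev bound can be compared with the exact Gaussian case from Example 4.4-1. A minimal sketch:

```python
# Chebyshev gives P(|X - mean| >= k*sigma) <= 1/k^2 for ANY distribution;
# for a Gaussian the exact probability is 2*Q(k) = erfc(k / sqrt(2)).
import math

for k in (1.0, 2.0, 3.0):
    chebyshev = 1.0 / k**2
    gaussian = math.erfc(k / math.sqrt(2))
    assert gaussian <= chebyshev  # Gaussian tails sit far inside the bound
```

At k = 2 the Chebyshev bound is 0.25 while the Gaussian value is about 0.0455: the bound is loose, but it is distribution-free.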
Letting \varepsilon = k\,\sigma_X:
P(|X - \bar{X}| \ge k\,\sigma_X) \le \frac{1}{k^2}
Flipping the bounds on the inequality:
P(|X - \bar{X}| < k\,\sigma_X) \ge 1 - \frac{1}{k^2}
Expanding the absolute value and adding the mean:
P(\bar{X} - k\,\sigma_X < X < \bar{X} + k\,\sigma_X) \ge 1 - \frac{1}{k^2}
The confidence interval for the statistical average becomes
P\!\left(\bar{X} - k\,\frac{\sigma_X}{\sqrt{n}} < \hat{X} < \bar{X} + k\,\frac{\sigma_X}{\sqrt{n}}\right) \ge 1 - \frac{1}{k^2}
If we assume that the mean estimate has a Gaussian distribution, an exact value can be computed for this probability:
P\!\left(-k \le \frac{\hat{X} - \bar{X}}{\sigma_X/\sqrt{n}} \le k\right) = \Phi(k) - \Phi(-k) = 2\Phi(k) - 1
A confidence interval often chosen is 95% or 0.95:
\Phi(k) = \frac{1 + 0.95}{2} = 0.975 \;\Rightarrow\; k = 1.96
and
P\!\left(-1.96 \le \frac{\hat{X} - \bar{X}}{\sigma_X/\sqrt{n}} \le 1.96\right) = 0.95

Gaussian Confidence Intervals
For a "two-sided" confidence interval, we want
P\!\left(-k \le \frac{\hat{X} - \bar{X}}{\sigma_X/\sqrt{n}} \le k\right) = 2\Phi(k) - 1 = \text{C.I.}
(A: textbook steps 1 and 2) Find the appropriate value k for the confidence interval selected.
(B: textbook steps 3 and 4) Based on the known Gaussian mean and variance for the "estimated" R.V. and the known number of samples, compute the bounds on the inequality.
Summary:
P\!\left(\bar{X} - k\,\frac{\sigma_X}{\sqrt{n}} \le \hat{X} \le \bar{X} + k\,\frac{\sigma_X}{\sqrt{n}}\right) = 2\Phi(k) - 1 = \text{C.I.}
Compute what you need.

Looking at the numbers: the higher the confidence interval, the wider the bounds; the tighter the bounds, the smaller the confidence interval.
Gaussian "confidence":
+/- one standard deviation: 68.3%
+/- two standard deviations: 95.44%
+/- three standard deviations: 99.74%
90%: k = +/-1.64 standard deviations
95%: k = +/-1.96 standard deviations
99%: k = +/-2.58 standard deviations

Confidence intervals when we do not know the actual variance: we have the statistically computed, non-biased variance estimate.
Define the new "estimated random variable mean" function as
T_{n-1} = \frac{\hat{X} - \bar{X}}{\tilde{S}/\sqrt{n}}, \quad \text{where} \quad \tilde{S} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \hat{X}\right)^2}
To simplify the textbook description involving the chi-squared distribution: this is the basis for Student's t-distribution with n-1 degrees of freedom. The Student's t probability density function (letting v = n-1, the degrees of freedom) is defined as
f_T(t) = \frac{\Gamma\!\left(\frac{v+1}{2}\right)}{\sqrt{\pi v}\;\Gamma\!\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}}
where \Gamma is the gamma function.

For a "two-sided" confidence interval, we want
P\!\left(-t \le \frac{\hat{X} - \bar{X}}{\tilde{S}/\sqrt{n}} \le t\right) = F_{T_{n-1}}(t) - F_{T_{n-1}}(-t) = 2\,F_{T_{n-1}}(t) - 1 = \text{C.I.}
(A: textbook steps 1 and 2) Find the appropriate value t based on v = n-1 for the confidence interval selected. (Hint: the tables say x and n, but you are looking up v = n-1, not n, and finding t = x based on F_T.)
(B: textbook steps 3 and 4) Based on the computed variance for the "estimated" R.V. and the known number of samples, compute the bounds on the inequality:
P\!\left(-t\,\frac{\tilde{S}}{\sqrt{n}} \le \hat{X} - \bar{X} \le t\,\frac{\tilde{S}}{\sqrt{n}}\right) = \text{C.I.}
There are estimators that minimize the mean-squared error for the sample values. 6.7 Maximum Likelihood Estimators The likelihood function can be “properly” defined as n L ; x1 , x 2 , , x n f X xi | i 1 we are interested in finding the value of theta, , that maximizes this function! To solve … take the derivative, set to zero, solve and determine the minima and maxima. Pick the global maxima! Easy … right ?! Notes and figures are based on or taken from materials in the course textbook: Probabilistic Methods of Signal and System Analysis (3rd ed.) by George R. Cooper and Clare D. McGillem; Oxford Press, 1999. ISBN: 0-19-512354-9. B.J. Bazuin, Spring 2015 26 of 36 ECE 3800 6.8 Ordering, Ranking and Percentiles Percentile: https://en.wikipedia.org/wiki/Percentile “A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value (or score) below which 20 percent of the observations may be found.” Textbook: The u-th percentile of X is the number xu such that FX(xu)=u. u FX xu One would say that the result xu is in the uth percentile. Example 6.8-1 Assume a person’s IQ is distributed as N(100,100). That is , a Gaussian normal with mean of 100, a variance of 100 and a standard deviation of 10. Then an IQ of 115 would be defined at what percentile of the popolations? z 115 100 1.5 10 1.5 0.9332 The individual is in the 93rd percentile for IQ. Median: The median of a population is defined where half of the population is above and half is below the value. FX xmedian 0.5 For some of the distributions described, the mean and the median are not equal! For example, the exponential distribution. xmedian ln0.5 0.69 whereas X 1 Notes and figures are based on or taken from materials in the course textbook: Probabilistic Methods of Signal and System Analysis (3rd ed.) by George R. Cooper and Clare D. McGillem; Oxford Press, 1999. 
Chapter 7 Statistics: Part 2 Hypothesis Testing

Hypothesis testing is statistical decision making. A statement is made in terms of a hypothesis. The goal of interpreting the measured values is to accept or reject the hypothesis. When there is only one hypothesis, it is referred to as the null hypothesis (H0). When there are potentially multiple hypotheses, we can generate criteria to accept one over another (establishing thresholds for decision making).

7.1 Bayesian Decision Theory

Bayes Hypothesis Testing. Consider a simple binary hypothesis problem where we are to decide between two hypotheses based on a random sample X^n = (X_1, X_2, \ldots, X_n), and we assume that we know that H_0 occurs with probability p_0 and H_1 with probability p_1 = 1 - p_0.
There are four possible outcomes of the hypothesis test, and we assign a cost to each outcome as a measure of its relative importance:
1. H_0 true and decide H_0: cost C_{00}
2. H_0 true and decide H_1 (Type I error): cost C_{01}
3. H_1 true and decide H_0 (Type II error): cost C_{10}
4. H_1 true and decide H_1: cost C_{11}
It is reasonable to assume that the cost of a correct decision is less than that of an erroneous decision, that is, C_{00} < C_{01} and C_{11} < C_{10}. Our objective is to find the decision rule that minimizes the average cost \bar{C}:
\bar{C} = \left[C_{00}\,P(H_0 \mid H_0) + C_{01}\,P(H_1 \mid H_0)\right] p_0 + \left[C_{11}\,P(H_1 \mid H_1) + C_{10}\,P(H_0 \mid H_1)\right] p_1
Note: in detection and estimation, "optimal" is typically defined based on a cost function. The cost functions are often readily related to the operation or parameter (e.g., minimum mean-square-error, maximum likelihood, etc.), but they can include arbitrary terms, as long as they can be readily defined based on the measured values, differences, statistics, or probability.
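For given densities, the minimum-average-cost rule of Section 7.1 reduces to thresholding a likelihood ratio, the test formalized in Section 7.2. A minimal sketch with hypothetical hypotheses H0: X ~ N(0,1) and H1: X ~ N(1,1) and threshold k = 1:

```python
# Likelihood-ratio decision between H0: X ~ N(0,1) and H1: X ~ N(1,1).
# With threshold k = 1, the rule f(x|H1)/f(x|H0) >= 1 reduces to x >= 0.5.
import math

def gaussian_pdf(x, mu, sigma=1.0):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def decide(x, k=1.0):
    ratio = gaussian_pdf(x, 1.0) / gaussian_pdf(x, 0.0)
    return 1 if ratio >= k else 0   # 1 => choose H1, 0 => choose H0
```

Taking the log of the ratio gives x - 0.5, so the threshold on the observation itself is the midpoint between the two means; costs and priors would only move the value of k.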
The following theorem identifies the decision rule that minimizes the average cost.

7.2 Likelihood Ratio Test

When we don't have underlying probabilities or a cost function but do have the conditional probabilities, we define a likelihood ratio test as
\frac{L(X;\theta_1)}{L(X;\theta_2)} = \frac{f(x_1;\theta_1)\,f(x_2;\theta_1)\cdots f(x_n;\theta_1)}{f(x_1;\theta_2)\,f(x_2;\theta_2)\cdots f(x_n;\theta_2)} \ge k \;\Rightarrow\; \text{accept } \theta_1 \text{ as the choice}
\frac{L(X;\theta_1)}{L(X;\theta_2)} < k \;\Rightarrow\; \text{reject } \theta_1 \text{ as the choice}
where the threshold k is determined from criteria that could be based on any decision theory desired.

Hypothesis Testing

Null Hypothesis Testing
A significance test based on a decision rule must be determined. The significance test establishes a level, potentially the confidence level or confidence interval, used to determine whether to accept or reject the hypothesis. This is stated as a decision rule:
Accept H0: if the computed value "passes" the significance test.
Reject H0: if the computed value "fails" the significance test.
In general, this involves a significance test that defines an equation or function that can be computed from the measured data (statistics). A performance threshold is then defined that sets an accept/reject or pass/fail boundary: inside or outside the confidence interval; acceptably meeting the desired criteria or not.
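A significance test on a mean with an estimated variance uses the t-based interval from Chapter 6. A minimal sketch (the ten observations and the table value t = 2.262, for v = 9 degrees of freedom at 95% two-sided, are assumptions for illustration):

```python
# Sketch: two-sided t confidence interval X_hat +/- t * S_tilde / sqrt(n).
import math
import statistics

data = [9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.0, 9.6]  # hypothetical
n = len(data)
x_hat = statistics.mean(data)
s_tilde = statistics.stdev(data)     # unbiased (n-1) sample standard deviation
t = 2.262                            # Student-t table, v = n-1 = 9, 95% two-sided
half_width = t * s_tilde / math.sqrt(n)
lo, hi = x_hat - half_width, x_hat + half_width
```

A claimed mean falling outside [lo, hi] would be rejected at this significance level, exactly the logic of the capacitor example that follows.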
Example: (p. 174) A capacitor manufacturer claims that its capacitors have a mean breakdown voltage of 300 V or greater. We establish a significance test at a 99% confidence level. (In this case we are looking for values above the minimum confidence level as acceptable: a one-sided test.) In testing, 100 capacitors are tested (note this is a destructive test), yielding a mean value of 290 V with an unbiased sample standard deviation of 40 V. Is the hypothesis accepted or rejected?

The significance test at the 99% confidence level requires

X̂ ≥ X̃ − t S̃/√n

Using ν = 99 and F = 0.99 (one-sided test) on Table G-4, t = 2.358 (using the ν = 120 row). Therefore

t S̃/√n = 2.358 × 40/√100 = 9.432

300 − 9.432 ≤ X̂, or 290.568 ≤ X̂

The measured mean of 290 V falls below 290.568 V, so the measurement results cause us to reject the hypothesis. Therefore, we would say that the claimed mean is not valid.

Final note: a 99% confidence level was selected for the significance test in this example. If a 99.5% level had been selected, the hypothesis would have been accepted (300 − 10.468 = 289.532).

To alleviate this confusion, a "level of significance" can be defined as 1.0 minus the confidence level. The test above was therefore at a 1% level of significance: the hypothesis is rejected at a 1% level of significance but would be accepted at a 0.5% level of significance.

Example: Testing a Fair Coin

The binomial random variable allows us to develop a test of whether a coin is fair when flipped. We count the number of heads that occur and test at a 5% level of significance (95% confidence level).

Accept H0: if the number of heads is inside the 95% confidence interval, the coin is fair.
Reject H0: if the number of heads is outside the 95% confidence interval, the coin is not fair.
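The capacitor example reduces to simple arithmetic once the t value is read from the table. A quick check (function name illustrative; t = 2.358 is the table entry quoted above, assumed rather than computed):

```python
import math

# Hedged recomputation of the capacitor example. The t value comes from the
# notes' Student-t table (G-4) and is taken as given, not derived here.

def one_sided_lower_bound(claimed_mean, t, s, n):
    """Smallest sample mean consistent with the claim at the chosen level."""
    return claimed_mean - t * s / math.sqrt(n)

bound_99 = one_sided_lower_bound(300, 2.358, 40, 100)   # 290.568
# Measured mean 290 < 290.568, so the 300 V claim is rejected at 99%.
```

Raising the confidence level widens the margin t·S/√n, which is why the same data pass at 99.5% but fail at 99%.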
We assume that the number of trials is large enough that the statistics have become Gaussian, combined with a priori knowledge of the mean and variance expected for a fair coin (p = 0.5) using the binomial random variable. For this distribution, the mean and variance are

E[X] = X̄ = n p  and  Var(X) = σ² = n p (1 − p)

Assuming 100 trials and that the statistics have become Gaussian, the acceptance region is

X̄ − k n p (1 − p)/√n ≤ X̂ ≤ X̄ + k n p (1 − p)/√n

We use a two-sided test, so k = 1.96, and

k n p (1 − p)/√n = 1.96 × 100 × 0.5 × 0.5/√100 = 4.9

Therefore the test region for the hypothesis is

50 − 4.9 ≤ X̂ ≤ 50 + 4.9, or 45.1 ≤ X̂ ≤ 54.9

Now you have a criterion to establish whether a coin is fair.

From Matlab: Pr(46 ≤ x ≤ 54) = 0.631798

(Editorial note: the Matlab value shows that this region actually covers only about 63% of outcomes. The standard deviation of the count is σ = √(n p (1 − p)) = 5, so a true 95% region would be 50 ± 1.96 × 5 = 50 ± 9.8.)

Example: Hypothesis Testing in Communications

Signal plus noise forms the input to the receiver. A "digital symbol" receiver outputs a value corresponding to a symbol plus noise. Hypothesis testing establishes the rules for selecting one symbol over another. Incorrect selections result in symbol errors, and in digital bit errors once the symbols are translated into bits.

r(t) = s_i(t) + n(t), for iT ≤ t ≤ (i+1)T

Bernard Sklar, "Digital Communications: Fundamentals and Applications," Prentice Hall PTR, Second Edition, 2001, Appendix B.

For each symbol, a detection statistic is generated over the symbol period T:

z(T) = a_i(T) + n_0(T)

Hypothesis testing then determines the estimated symbol value from the detection statistic. The number of hypotheses equals the number of possible transmitted symbols.
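The exact binomial probability quoted from Matlab in the coin example can be reproduced directly from the binomial pmf (helper name illustrative):

```python
from math import comb

# Exact binomial check of the coin-fairness region. The region
# 45.1 <= X <= 54.9 corresponds to integer head counts 46..54.

def binom_prob(n, p, lo, hi):
    """P(lo <= X <= hi) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

prob = binom_prob(100, 0.5, 46, 54)   # about 0.6318, matching the Matlab value
```

This exact sum is a useful sanity check on any Gaussian-approximation interval: if the coverage probability is far from the target confidence level, the interval was mis-sized.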
7.3 Composite Hypotheses

Generalized Likelihood Ratio Test (GLRT)

A generalized likelihood ratio test can be used when multiple hypotheses exist. In this case, we can think of each hypothesis as involving a different subset or subspace of the possible outcomes. Again, a test statistic is generated to decide inclusion or exclusion, such as

λ(z) = L_LM(θ*; z) / L_GM(θ̂; z)

The numerator term involves a local maximization of the likelihood over the restricted range of the parameter θ, while the denominator involves a global maximization over the full range of θ. Multiple values are computed, with the search restricted to the region desired for each of the local maxima. This is closely associated with detecting multiple signal levels in a multi-level transmission, providing detection statistics within each expected region for possible symbol identification.

7.4 Goodness of Fit

The Pearson test checks for valid statistical measures (flipping coins, rolling dice, etc.).

A critical aspect of simulations is that the data actually have the characteristics defined or desired. This is particularly important when creating random variables with mathematically defined densities and distributions.

The general model sorts the data into l bins and compares the estimated probabilities with the specified probabilities. If all of the estimates are near the desired results, the hypothesized population density is likely correct. If they differ in two or more bins, it is likely not the correct density function.

In performing this test, the number and widths of the bins are important. For simple discrete tests, all bins can be considered. For continuous tests, the number and size of the bins drive the complexity, the amount of data that must be taken, and the overall accuracy.

The indicator random variable is defined as

X_ij = 1 if the jth observation of X falls in bin i, and X_ij = 0 otherwise.
The probability is P(X_ij = 1) = p_i. We then form the sum random variable, the number of outcomes in bin i:

Y_i = Σ_{j=1}^{n} X_ij

This leads to a multinomial probability function:

P(Y_1 = r_1, Y_2 = r_2, ..., Y_l = r_l) = [n! / (r_1! r_2! ⋯ r_l!)] p_1^{r_1} p_2^{r_2} ⋯ p_l^{r_l}

The large-sample approximation is

P(Y_1 = r_1, ..., Y_l = r_l) ≈ exp(−(1/2) Σ_{i=1}^{l} (r_i − n p_i)² / (n p_i)) / √((2πn)^{l−1} p_1 p_2 ⋯ p_l)

Skipping the maximum likelihood discussion in the text: under the large-sample assumption we can approximate each count as Gaussian, Y_i ≈ N(n p_oi, n p_oi), and define

U_i = (Y_i − n p_oi) / √(n p_oi)

The statistical test, known as the Pearson test statistic, becomes

V = Σ_{i=1}^{l} (Y_i − n p_oi)² / (n p_oi)

Pearson's test statistic has the form of a chi-squared random variable with l − 1 degrees of freedom. (Note: chi-squared is based on a sum of squared Gaussian random variables.) Therefore, in assigning regions of acceptance or probability, the chi-squared table must be used (G-5, Table 3).

Example 7.4-1: Coin Fairness

Assume heads and tails each have probability 0.50, with 100 flips, at a level of significance α = 0.05. If we observe 61 heads:

V = Σ_{i=1}^{2} (Y_i − n p_oi)² / (n p_oi) = [(61 − 50)² + (39 − 50)²] / (100 × 0.5) = (121 + 121)/50 = 4.84

Using the table with l − 1 = 1 degree of freedom at 0.95, z = 3.84:

F_{χ²}(0.95, 1) = 3.84 < 4.84 = V

The criterion is not passed. If there were 41 tails and 59 heads:

V = (9² + 9²)/50 = 3.24 < 3.84, and the criterion would be passed.

Example 7.4-2: Fair Die

Assume 1000 rolls with the following results: 152, 175, 165, 180, 159, 171.
V = Σ_{i=1}^{6} (Y_i − n p_oi)² / (n p_oi), with n p_oi = 1000 × (1/6) ≈ 167:

V = [(152 − 167)² + (175 − 167)² + (165 − 167)² + (180 − 167)² + (159 − 167)² + (171 − 167)²] / 167
  = (225 + 64 + 4 + 169 + 64 + 16) / 167 = 3.25

Using the table with l − 1 = 5 degrees of freedom at 0.95, z = 11.1. Therefore

F_{χ²}(0.95, 5) = 11.1 > 3.25 = V

The criterion is passed.
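Both goodness-of-fit examples follow the same Pearson computation, which can be sketched in a few lines (the critical values 3.84 for 1 degree of freedom and 11.1 for 5 are the table entries quoted above, assumed rather than computed):

```python
# Hedged sketch of the Pearson chi-squared statistic from Section 7.4.

def pearson_statistic(counts, probs, n):
    """V = sum over bins of (Y_i - n*p_i)^2 / (n*p_i)."""
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))

# Example 7.4-1: 61 heads and 39 tails in 100 flips.
V_coin = pearson_statistic([61, 39], [0.5, 0.5], 100)   # 4.84 > 3.84, reject

# Example 7.4-2: die-face counts with expected count 1000/6 per face.
V_die = pearson_statistic([152, 175, 165, 180, 159, 171], [1 / 6] * 6, 1000)
# about 3.25 < 11.1, accept
```

Using the exact expected count 1000/6 rather than the rounded 167 gives V ≈ 3.26, which leads to the same accept decision.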