Chapter 4 Slides (PPT)

Random Variables and Probability Distributions
• Random Variables - Random outcomes corresponding to subjects randomly selected from a population
• Probability Distributions - A listing of the possible outcomes and their probabilities (discrete r.v.s) or their densities (continuous r.v.s)
• Normal Distribution - Bell-shaped continuous distribution widely used in statistical inference
• Sampling Distributions - Distributions corresponding to sample statistics (such as mean and proportion) computed from random samples

Discrete Probability Distributions
• Discrete RV - Random variable that can take on a finite (or countably infinite) set of discontinuous possible outcomes (Y)
• Discrete Probability Distribution - Listing of outcomes and their corresponding probabilities (y, P(y))
  $0 \le P(y) \le 1 \qquad \sum_{\text{all } y} P(y) = 1$

Example - Supreme Court Vacancies
• Supreme Court Vacancies by Year 1837-1975
• Y = # Vacancies in randomly selected year

  # Vacancies (y)   Frequency (# of Years)   Proportion (P(y))
  0                 81                       81/139 = .5827
  1                 43                       43/139 = .3094
  2                 14                       14/139 = .1007
  3                 1                        1/139 = .0072
  >3                0                        0/139 = .0000
  Total             139                      1.0000

Source: R.J. Morrison (1977), “FDR and the Supreme Court: An Example of the Use of Probability Theory in Political History”, History and Theory, Vol. 16, pp 137-146

Parameters of a P.D.
• Mean (aka Expected Value) - Long run average outcome
  $\mu = E(Y) = \sum y P(y)$
• Standard Deviation - Measure of the “typical” distance of an outcome from the mean
  $\sigma = \sqrt{E\left[(Y-\mu)^2\right]} = \sqrt{\sum (y-\mu)^2 P(y)} = \sqrt{\sum y^2 P(y) - \mu^2}$

Example - Supreme Court Vacancies

  y       P(y)     yP(y)    y^2 P(y)
  0       .5827    .0000    .0000
  1       .3094    .3094    .3094
  2       .1007    .2014    .4028
  3       .0072    .0216    .0648
  Total   1.0000   .5324    .7770

  $\mu = \sum y P(y) = .5324$
  $\sigma = \sqrt{\sum y^2 P(y) - \mu^2} = \sqrt{.7770 - (.5324)^2} = \sqrt{.4936} = .7025$

Normal Distribution
• Bell-shaped, symmetric family of distributions
• Classified by 2 parameters: Mean ($\mu$) and standard deviation ($\sigma$).
These represent location and spread
• Random variables that are approximately normal have the following properties with respect to individual measurements:
  – Approximately half (50%) fall above (and below) mean
  – Approximately 68% fall within 1 standard deviation of mean
  – Approximately 95% fall within 2 standard deviations of mean
  – Virtually all fall within 3 standard deviations of mean
• Notation when Y is normally distributed with mean $\mu$ and standard deviation $\sigma$: $Y \sim N(\mu, \sigma)$

Normal Distribution
  $P(Y \ge \mu) = 0.50$
  $P(\mu - \sigma \le Y \le \mu + \sigma) \approx 0.68$
  $P(\mu - 2\sigma \le Y \le \mu + 2\sigma) \approx 0.95$

Example - Heights of U.S. Adults
• Female and Male adult heights are well approximated by normal distributions: $Y_F \sim N(63.7, 2.5)$, $Y_M \sim N(69.1, 2.6)$
[Figure: two histograms of adult heights in inches. Females (INCHESF): Mean = 63.7, Std. Dev = 2.48. Males (INCHESM): Mean = 69.1, Std. Dev = 2.61. Cases weighted by PCTF and PCTM; panel N values 99.68 and 99.23.]
Source: Statistical Abstract of the U.S. (1992)

Standard Normal (Z) Distribution
• Problem: Unlimited number of possible normal distributions ($-\infty < \mu < \infty$, $\sigma > 0$)
• Solution: Standardize the random variable to have mean 0 and standard deviation 1
  $Y \sim N(\mu, \sigma) \;\Rightarrow\; Z = \dfrac{Y - \mu}{\sigma} \sim N(0, 1)$
• Probabilities of certain ranges of values and specific percentiles of interest can be obtained through the standard normal (Z) distribution

Standard Normal (Z) Distribution
• Standard Normal Distribution Characteristics:
  – $P(Z \ge 0) = P(Y \ge \mu) = 0.5000$
  – $P(-1 \le Z \le 1) = P(\mu - \sigma \le Y \le \mu + \sigma) = 0.6826$
  – $P(-2 \le Z \le 2) = P(\mu - 2\sigma \le Y \le \mu + 2\sigma) = 0.9544$
  – $P(Z \ge z_a) = P(Z \le -z_a) = a$ (using Z-table)

  a       z_a
  0.500   0.000
  0.100   1.282
  0.050   1.645
  0.025   1.960
  0.010   2.326
  0.005   2.576

Finding Probabilities of Specific Ranges
• Step 1 - Identify the normal distribution of interest (i.e. its mean ($\mu$) and standard deviation ($\sigma$))
• Step 2 - Identify the range of values whose probability you wish to determine, $(Y_L, Y_U)$, where often the upper or lower bound is $\infty$ or $-\infty$
• Step 3 - Transform $Y_L$ and $Y_U$ into Z-values:
  $Z_L = \dfrac{Y_L - \mu}{\sigma} \qquad Z_U = \dfrac{Y_U - \mu}{\sigma}$
• Step 4 - Obtain $P(Z_L \le Z \le Z_U)$ from Z-table

Example - Adult Female Heights
• What is the probability a randomly selected female is 5’10” or taller (70 inches)?
• Step 1 - Y ~ N(63.7, 2.5)
• Step 2 - $Y_L = 70.0$, $Y_U = \infty$
• Step 3 - $Z_L = \dfrac{70.0 - 63.7}{2.5} = 2.52$, $Z_U = \infty$
• Step 4 - $P(Y \ge 70) = P(Z \ge 2.52) = .0059$ ($\approx 1/170$)

  Upper-tail probabilities $P(Z \ge z)$ (Z-table excerpt):
  z      .00     .01     .02     .03
  2.4    .0082   .0080   .0078   .0075
  2.5    .0062   .0060   .0059   .0057
  2.6    .0047   .0045   .0044   .0043

Finding Percentiles of a Distribution
• Step 1 - Identify the normal distribution of interest (i.e. its mean ($\mu$) and standard deviation ($\sigma$))
• Step 2 - Determine the percentile of interest 100p% (e.g. the 90th percentile is the cut-off where 90% of scores are below and 10% are above)
• Step 3 - Turn the percentile of interest into a tail probability a and corresponding z-value ($z_p$):
  – If $100p \ge 50$ then $a = 1 - p$ and $z_p = z_a$
  – If $100p < 50$ then $a = p$ and $z_p = -z_a$
• Step 4 - Transform $z_p$ back to original units: $Y_p = \mu + z_p \sigma$

Example - Adult Male Heights
• Above what height do the tallest 5% of males lie?
• Step 1 - Y ~ N(69.1, 2.6)
• Step 2 - Want to determine 95th percentile (p = .95)
• Step 3 - Since 100p > 50, $a = 1 - p = 0.05$ and $z_p = z_a = z_{.05} = 1.645$
• Step 4 - $Y_{.95} = 69.1 + (1.645)(2.6) = 73.4$

  Upper-tail probabilities $P(Z \ge z)$ (Z-table excerpt):
  z      .03     .04     .05     .06
  1.5    .0630   .0618   .0606   .0594
  1.6    .0516   .0505   .0495   .0485
  1.7    .0418   .0409   .0401   .0392

Statistical Models
• When making statistical inference it is useful to write random variables in terms of model parameters and random errors
  $Y = \mu + (Y - \mu) = \mu + \varepsilon \qquad \varepsilon = Y - \mu$
• Here $\mu$ is a fixed constant and $\varepsilon$ is a random variable
• In practice $\mu$ will be unknown, and we will use sample data to estimate or make statements regarding its value

Sampling Distributions and the Central Limit Theorem
• Sample statistics based on random samples are also random variables and have sampling distributions that are probability distributions for the statistic (outcomes that would vary across samples)
• When samples are large and measurements independent then many estimators have normal sampling distributions (CLT):
  – Sample Mean: $\bar{Y} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
  – Sample Proportion: $\hat{\pi} \sim N\left(\pi, \sqrt{\dfrac{\pi(1-\pi)}{n}}\right)$

Example - Adult Female Heights
• Random samples of n = 100 females to be selected
• For each sample, the sample mean is computed
• Sampling distribution: $\bar{Y} \sim N\left(63.5, \dfrac{2.5}{\sqrt{100}}\right) = N(63.5, 0.25)$
• Note that approximately 95% of all possible random samples of 100 females will have sample means between 63.0 and 64.0 inches
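
The sampling-distribution example for female heights can be checked numerically. What follows is a minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the helper name `sampling_dist_of_mean` is hypothetical, and the inputs (mean 63.5, standard deviation 2.5, n = 100) are simply the values quoted on the slide.

```python
from math import sqrt
from statistics import NormalDist


def sampling_dist_of_mean(mu, sigma, n):
    """Approximate (CLT) sampling distribution of the sample mean: N(mu, sigma/sqrt(n))."""
    return NormalDist(mu, sigma / sqrt(n))


# Values quoted on the slide; the helper itself is illustrative, not from the slides.
ybar = sampling_dist_of_mean(mu=63.5, sigma=2.5, n=100)

# Interval of two standard errors either side of the mean.
low = ybar.mean - 2 * ybar.stdev
high = ybar.mean + 2 * ybar.stdev

# Share of sample means expected to fall inside that interval.
coverage = ybar.cdf(high) - ybar.cdf(low)

print(f"standard error = {ybar.stdev:.2f}")
print(f"interval = ({low:.1f}, {high:.1f})")
print(f"coverage = {coverage:.4f}")
```

The interval comes out as 63.0 to 64.0 inches with coverage close to 95%, matching the slide's statement. Changing `n` shows how the standard error shrinks as the sample grows.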
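
The two height examples earlier in the deck (probability a female is 70 inches or taller; 95th percentile of male heights) follow the four-step recipes for ranges and percentiles. The sketch below mirrors those steps in Python, assuming `statistics.NormalDist` from the standard library stands in for the printed Z-table. The function names `upper_tail` and `percentile` are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def upper_tail(mu, sigma, y_low):
    """Steps 3-4 for a range (y_low, infinity): standardize, then take the upper tail."""
    z_low = (y_low - mu) / sigma
    return z_low, 1.0 - Z.cdf(z_low)


def percentile(mu, sigma, p):
    """Steps 3-4 for a percentile: tail probability a, z-value z_p, back to original units."""
    a = 1 - p if p >= 0.5 else p
    z_a = Z.inv_cdf(1 - a)            # value with upper-tail probability a
    z_p = z_a if p >= 0.5 else -z_a
    return mu + z_p * sigma


# Female heights, Y ~ N(63.7, 2.5): P(Y >= 70)
z_low, prob = upper_tail(63.7, 2.5, 70.0)
print(f"Z_L = {z_low:.2f}, P(Y >= 70) = {prob:.4f}")  # about 2.52 and .0059

# Male heights, Y ~ N(69.1, 2.6): 95th percentile
y95 = percentile(69.1, 2.6, 0.95)
print(f"Y_.95 = {y95:.1f}")  # about 73.4
```

Because `inv_cdf` does not round z to table precision (it uses roughly 1.6449 where the table gives 1.645), results can differ from table-based answers in the last digit.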
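
Returning to the Supreme Court vacancies example, the mean and standard deviation of a discrete distribution can be computed directly from the frequency table. This is a small sketch under the assumption that the frequencies (81, 43, 14, 1 over 139 years) are as listed in the deck; the variable names are illustrative only.

```python
from math import sqrt

# Frequencies from the vacancies table: y -> number of years with y vacancies.
counts = {0: 81, 1: 43, 2: 14, 3: 1}
total_years = sum(counts.values())  # 139

# P(y) as proportions of years.
probs = {y: c / total_years for y, c in counts.items()}

# mu = sum of y * P(y)
mu = sum(y * p for y, p in probs.items())

# sigma = sqrt( sum of y^2 * P(y) - mu^2 )
second_moment = sum(y * y * p for y, p in probs.items())
variance = second_moment - mu ** 2
sigma = sqrt(variance)

print(f"mu = {mu:.4f}")
print(f"sum y^2 P(y) = {second_moment:.4f}")
print(f"variance = {variance:.4f}")
print(f"sigma = {sigma:.4f}")
```

Working from the raw counts instead of the rounded proportions gives the same four-decimal results as the slide's table (.5324, .7770, .4936, .7025).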
