Lecture 2: Statistical Principles

Probability
• A random variable X is a variable whose numerical value is determined by chance, the outcome of a random phenomenon.
  – A discrete random variable has a countable number of possible values, such as 0, 1, and 2.
  – A continuous random variable, such as time or distance, can take on any value in an interval.
• A probability distribution P[X_i] for a discrete random variable X assigns probabilities to the possible values X_1, X_2, and so on.
• For example, when a fair six-sided die is rolled, there are six equally likely outcomes, each with a probability of 1/6.

Mean, Variance, and Standard Deviation
• The expected value (or mean) of a discrete random variable X is a weighted average of all possible values of X, using the probability of each X value as weights:

    \mu_X = E[X] = \sum_{i=1}^{N} X_i \, P[X_i]

• When all weights are equal (as in our example of throwing a die), this simplifies to:

    \mu_X = E[X] = \frac{1}{N} \sum_{i=1}^{N} X_i

• The variance of a discrete random variable X is a weighted average, over all possible values of X, of the squared difference between X and its expected value, using the probability of each X value as weights:

    \sigma_X^2 = E[(X - \mu_X)^2] = \sum_{i=1}^{N} (X_i - \mu_X)^2 \, P[X_i]

• When all weights are equal (as in our example of throwing a die), this simplifies to:

    \sigma_X^2 = E[(X - \mu_X)^2] = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu_X)^2

Standardized Variables
• To standardize a random variable X, we subtract its mean \mu_X and then divide by its standard deviation \sigma_X:

    Z_i = \frac{X_i - \mu_X}{\sigma_X}

• No matter what the initial units of X, the standardized random variable Z has a mean of 0 and a standard deviation of 1.
• The standardized variable Z measures how many standard deviations X is above or below its mean:
  – If X is equal to its mean, Z is equal to 0.
  – If X is one standard deviation above its mean, Z is equal to 1.
  – If X is two standard deviations below its mean, Z is equal to –2.

Example: The throw of a die
[Figure]

Example: The throw of a die (cont.)
[Figure]
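The equal-weights formulas above can be checked directly for a fair six-sided die. The following Python snippet is an illustrative sketch (not part of the lecture): it computes the mean, variance, and standard deviation of the six faces, then standardizes each face.

```python
from math import sqrt

# The six equally likely outcomes of a fair die
faces = [1, 2, 3, 4, 5, 6]
N = len(faces)

# Equal-weights formulas: mu = (1/N) * sum(X_i),
# sigma^2 = (1/N) * sum((X_i - mu)^2)
mu = sum(faces) / N
var = sum((x - mu) ** 2 for x in faces) / N
sigma = sqrt(var)

# Standardize each face: Z_i = (X_i - mu) / sigma
z = [(x - mu) / sigma for x in faces]

print(mu)     # 3.5
print(var)    # 2.9166666666666665  (= 35/12)
print(sigma)  # ~ 1.7078
print(sum(z) / N)  # ~ 0: the standardized variable has mean 0
```

As the theory predicts, Z has mean 0, and the faces 1 and 6 lie about 1.46 standard deviations below and above the mean, respectively.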
Probability Distribution (Density Curve) for 10 Six-Sided Dice, Using Standardized Z
Now let X be the sum of the numbers when rolling 10 six-sided dice. The next figure illustrates the standardized random variable Z for that case.
[Figure]

The Normal Distribution
• The density curve of Z for many rolls of dice approaches the normal distribution (graphed in the next figure).
• The central limit theorem (CLT) states: "If Z is a standardized sum of N independent, identically distributed random variables with a finite, nonzero standard deviation, then the probability distribution of Z approaches the normal distribution as N increases."
• In other words, the CLT says that the sum (or mean) of many random variables is distributed according to the normal distribution, e.g.:
  – the weights of humans, dogs, and tomatoes
  – scores on IQ, SAT, and GRE tests
  – many economic variables.

The Normal Distribution
[Figure]

The Normal Distribution (cont.)
• A special feature of the normal distribution: the probability that the value of Z falls in a specified interval is given by the corresponding area under the density curve.
  – These areas can be determined with statistical software, such as GRETL, or from a table, such as Table B-7 in Appendix B of the textbook.
  – As a rule of thumb:

      P[-1 < Z < 1] = 0.6826
      P[-2 < Z < 2] = 0.9544

Moments of a Distribution
• The mean of a distribution is called its first moment, and the variance its second central moment. When a statistical distribution has a strong central tendency, it is useful to characterize it by its moments.
• The third and fourth standardized moments of a distribution are skewness and kurtosis.
• Skewness is a measure of the asymmetry of a distribution:

    \text{Skewness} = \frac{E[(X - \mu_X)^3]}{\sigma_X^3}

  – Skewness = 0: the distribution is symmetric.
  – Skewness > (<) 0: the distribution has a long right (left) tail.

Moments of a Distribution (cont.)
• Kurtosis measures the mass in the tails of a distribution.
  It is a measure of the probability of large values:

    \text{Kurtosis} = \frac{E[(X - \mu_X)^4]}{\sigma_X^4}

  – Kurtosis = 3: normal distribution
  – Kurtosis > 3: heavy tails
• The kurtosis of a distribution is a measure of how much mass is in the tails and, therefore, of how much of the variance of X arises from extreme values (outliers).

Moments of a Distribution (cont.)
[Figure]
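The skewness and kurtosis formulas above can be applied directly to the fair-die example from earlier in the lecture. The Python sketch below is illustrative (the helper function `moments` is our own, not from the lecture): a fair die is symmetric, so its skewness is 0, and its flat (uniform) distribution puts less mass in the tails than the normal distribution, so its kurtosis is below 3.

```python
from math import sqrt

def moments(values):
    """Population mean, std dev, skewness, and kurtosis of a list,
    using the equal-weights formulas from the lecture."""
    n = len(values)
    mu = sum(values) / n
    var = sum((x - mu) ** 2 for x in values) / n
    sigma = sqrt(var)
    skew = sum((x - mu) ** 3 for x in values) / n / sigma ** 3
    kurt = sum((x - mu) ** 4 for x in values) / n / sigma ** 4
    return mu, sigma, skew, kurt

# Fair six-sided die: symmetric, lighter-tailed than the normal
faces = [1, 2, 3, 4, 5, 6]
mu, sigma, skew, kurt = moments(faces)
print(skew)  # 0.0  (symmetric distribution)
print(kurt)  # ~ 1.73  (< 3: lighter tails than the normal)

# A hypothetical right-skewed sample: one large value pulls the
# third moment positive (long right tail)
skewed_sample = [1, 1, 1, 2, 2, 10]
_, _, skew2, _ = moments(skewed_sample)
print(skew2 > 0)  # True
```

Note that these are population moments; statistical packages such as GRETL typically report sample versions with small-sample corrections, so results on real data may differ slightly.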