Midterm Review
Econ 240A

The Big Picture

The Classical Statistical Trail
[Roadmap diagram linking Descriptive Statistics, Probability, Discrete Random Variables (the Binomial), Inferential Statistics, and the Application to Rates & Proportions; Power 4, #4: Discrete Probability Distributions; Moments]

Descriptive Statistics (Power One, Lab One)

Concepts
- Central tendency: mode, median, mean
- Dispersion: range, inter-quartile range, standard deviation (variance)
- Are central tendency and dispersion enough descriptors?

Concepts
- Normal distribution
  – Central tendency: mean, or average
  – Dispersion: standard deviation
- Non-normal distributions
[Figures: density function for the standardized normal variate (density vs. standard deviations); "Draw a Histogram" example (frequency of bills, binned from 15 to 120)]

The Classical Statistical Trail
[Roadmap diagram repeated, with classical vs. modern branches]

Exploratory Data Analysis
- Stem and leaf diagrams
- Box and whiskers plots

Weight Data
Males: 140 145 160 190 155 165 150 190 195 138 160 155 153 145 170 175 175 170 180 135 170 157 130 185 190 155 170 155 215 150 145 155 155 150 155 150 180 160 135 160 130 155 150 148 155 150 140 180 190 145 150 164 140 142 136 123 155
Females: 140 120 130 138 121 125 116 145 150 112 125 130 120 130 131 120 118 125 135 125 118 122 115 102 115 150 110 116 108 95 125 133 110 150 108

Box Diagram
- Median
- First (lowest) quartile: 25% of observations below
- Upper (highest) quartile: 25% of observations above
- Upper fence (male weights): 3rd quartile + 1.5·IQR = 156 + 46.5 = 202.5; the first data value below this fence is 195

The Classical Statistical Trail
[Roadmap diagram repeated]

Probability (Power Three, Lab Two)

Operations on events
- The event A and the event B both occur: A ∩ B
- Either the event A or the event B occurs, or both do: A ∪ B
- The event A does not occur, i.e. not A: Ā

Probability statements
- Probability of either event A or event B: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
  – If the events are mutually exclusive, then P(A ∩ B) = 0
- Probability of not B: P(B̄) = 1 − P(B)

Conditional Probability
- Example: in rolling two dice, what is the probability of getting a one on the red die given that you rolled a one on the white die, P(R1 | W1)?
- P(R1 | W1) = P(R1 ∩ W1) / P(W1) = (1/36) / (1/6) = 1/6
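One way to check the arithmetic in the dice example is to enumerate the 36 equally likely outcomes directly. The short Python sketch below is not from the course materials; the variable names are my own, and it simply counts outcomes to confirm P(R1 | W1) = (1/36)/(1/6) = 1/6.

```python
from itertools import product

# Enumerate the 36 equally likely (red, white) outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

p_w1 = sum(1 for red, white in outcomes if white == 1) / 36                       # P(W1) = 6/36
p_r1_and_w1 = sum(1 for red, white in outcomes if red == 1 and white == 1) / 36   # P(R1 and W1) = 1/36

p_r1_given_w1 = p_r1_and_w1 / p_w1   # definition of conditional probability
print(p_r1_given_w1)                 # 0.1666..., i.e. 1/6, which equals P(R1)
```

Since P(R1 | W1) = P(R1), the two dice are independent, which is exactly the case covered next.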
Independence of two events
- P(A | B) = P(A), i.e. the probability of event A is not conditional on event B
- Then P(A ∩ B) = P(A) · P(B)

The Classical Statistical Trail
[Roadmap diagram repeated]

Power 4 – Lab Two

Three flips of a coin: 8 elementary outcomes
[Tree diagram: each flip lands heads with probability p and tails with probability 1 − p; the eight branches yield 3, 2, 2, 1, 2, 1, 1, and 0 heads]

The Probability of Getting k Heads
- The probability of getting k heads along a given branch in n trials is p^k · (1 − p)^(n−k)
- The number of branches with k heads in n trials is given by C_n(k)
- So the probability of k heads in n trials is Prob(k) = C_n(k) · p^k · (1 − p)^(n−k)
- This is the discrete binomial distribution, where k can only take on the discrete values 0, 1, …, n

Expected Value of a Discrete Random Variable
- E(x) = Σᵢ x(i) · p[x(i)]
- The expected value of a discrete random variable is the weighted average of the observations, where the weight is the frequency of that observation

Variance of a Discrete Random Variable
- VAR(x) = Σᵢ {x(i) − E(x)}² · p[x(i)]
- The variance of a discrete random variable is the weighted sum of squared deviations of each observation from its expected value, where the weight is the frequency of that observation

Lab Two
- The binomial distribution, numbers & plots
  – Coin flips: one, two, …, ten
  – Die throws: one, ten, twenty
- The normal approximation to the binomial
  – As n → ∞, p(k) → N[np, np(1 − p)]
  – Sample fraction of successes: p̂ = k/n, E(p̂) = np/n = p, Var(p̂) = np(1 − p)/n² = p(1 − p)/n, so p̂ ~ N[p, p(1 − p)/n]

Lab Three and Powers 5 and 6
- Standardized normal density: f(z) = [1/√(2π)] · e^(−½[(z − 0)/1]²), with Z ~ N(0, 1)
- Prob(−1.96 ≤ z ≤ 1.96) = 0.95
[Figure: density function for the standardized normal variate, with 2.5% in each tail beyond z = ±1.96]
- Standardize the sample proportion: z = [p̂ − E(p̂)]/σ̂_p̂, where σ̂_p̂ = √[p̂(1 − p̂)/n]
- prob(−1.96 ≤ [p̂ − E(p̂)]/σ̂_p̂ ≤ 1.96) = 0.95
- prob(−1.96·σ̂_p̂ ≤ p̂ − p ≤ 1.96·σ̂_p̂) = 0.95
- prob(−1.96·σ̂_p̂ ≤ p − p̂ ≤ 1.96·σ̂_p̂) = 0.95
- prob(p̂ − 1.96·σ̂_p̂ ≤ p ≤ p̂ + 1.96·σ̂_p̂) = 0.95

Hypothesis Testing: Rates & Proportions (one-tailed test)
- Step 1: state the hypotheses, H₀: p = f versus Hₐ: p > f
- Step 2: choose the test statistic, z = [p̂ − E(p̂)]/σ̂_p̂ = (p̂ − f)/σ̂_p̂, with σ̂_p̂ = √[p̂(1 − p̂)/n]
- Step 3: choose the size of the Type I error, e.g. α = 5%, so the critical value is z = 1.645
- Step 4: this determines the rejection region for H₀; reject if (p̂ − f)/σ̂_p̂ > 1.645
[Figure: standard normal density with the 5% rejection region to the right of z = 1.645]
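As a concrete illustration of steps 1–4, here is a minimal Python sketch. The numbers (k = 60 successes in n = 100 trials, null value f = 0.5) are made up for the example and are not from the course labs; it uses the normal approximation and the estimated standard error σ̂_p̂ exactly as above.

```python
import math

# Hypothetical example: k successes in n trials (illustrative numbers, not course data).
k, n = 60, 100
f = 0.5                      # value of p under the null hypothesis H0: p = f
p_hat = k / n                # sample fraction of successes

# Normal approximation to the binomial: p_hat ~ N[p, p(1 - p)/n].
se_p_hat = math.sqrt(p_hat * (1 - p_hat) / n)

# Step 2: one-tailed test statistic; Steps 3-4: compare with the 5% critical value 1.645.
z = (p_hat - f) / se_p_hat
decision = "reject H0" if z > 1.645 else "do not reject H0"
print(f"z = {z:.2f} -> {decision} at the 5% level")

# Two-sided 95% confidence interval for p, as derived above.
lower = p_hat - 1.96 * se_p_hat
upper = p_hat + 1.96 * se_p_hat
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```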
Remaining Topics
- Interval estimation and hypothesis testing for population means, using sample means
- Decision theory
- Regression
  – Estimators: OLS, maximum likelihood, method of moments
  – ANOVA

Midterm Review (continued)
Econ 240A

Last Time

The Classical Statistical Trail
[Roadmap diagram repeated]

Remaining Topics (as listed above)

Lab Three, Power 7

Population and sample
- Population: random variable x with distribution f(μ, σ²); the form of f is unknown
- Sample statistic: x̄ ~ N(μ, σ²/n)
- Sample statistic: s² = Σᵢ (xᵢ − x̄)² / (n − 1)

Example: f(x) is uniform
- X ~ U(0.5, 1/12), i.e. uniform with E(x) = 0.5 and Var(x) = 1/12
- Nonetheless, from the central limit theorem, the sample mean has a normal density: x̄ ~ N[0.5, (1/12)/n]
- z = [x̄ − E(x̄)]/σ_x̄ = [x̄ − 0.5] / √[(1/12)/n]
[Figures: the uniform density f(x); density function for the standardized normal variate]

Histogram of 50 Sample Means, Uniform U(0.5, 1/12)
[Figure: histogram of the 50 sample means, binned from 0.05 to 0.95 and above]
- Average of the 50 sample means: 0.4963

Inference in general
- x ~ f(μ, σ²); from the central limit theorem we know x̄ ~ N[μ, σ²/n]
- So z = [x̄ − μ] / (σ/√n)
- Prob(−1.96 ≤ z ≤ 1.96) = 0.95
- Prob(−1.96 ≤ (x̄ − μ)/(σ/√n) ≤ 1.96) = 0.95
- Prob(−1.96·(σ/√n) ≤ x̄ − μ ≤ 1.96·(σ/√n)) = 0.95
- Prob(−1.96·(σ/√n) ≤ μ − x̄ ≤ 1.96·(σ/√n)) = 0.95
- Prob(x̄ − 1.96·(σ/√n) ≤ μ ≤ x̄ + 1.96·(σ/√n)) = 0.95
[Figure: standard normal density with 2.5% in each tail beyond z = ±1.96]

Confidence Intervals
- If the population variance is known, use the normal distribution and z: z = (x̄ − μ)/(σ/√n)
- If the population variance is unknown, use Student's t-distribution and t: t = (x̄ − μ)/(s/√n), where s = √[Σᵢ (xᵢ − x̄)²/(n − 1)]

t-distribution (text p. 253)
- Normal compared to t: the t-distribution approaches the normal as the sample size grows
- Critical values: Appendix B, Table 4, p. B-9

Hypothesis tests for the population mean
- Step 1: state the hypotheses, H₀: μ = v versus Hₐ: μ ≠ v (a two-tailed test; you choose v)
- Step 2: choose the test statistic, z = [x̄ − E(x̄)]/σ_x̄ = [x̄ − v]/(σ/√n)
- Step 3: choose the size of the Type I error, α = 0.05
- Step 4: reject the null hypothesis if the test statistic is in the rejection region
[Figure: standard normal density with 2.5% rejection regions beyond z = ±1.96]

Decision Theory (true state of nature: p = 0.5 or p > 0.5)
- Accept null when p = 0.5: no error, probability 1 − α
- Reject null when p = 0.5: Type I error, probability α, cost C(I)
- Accept null when p > 0.5: Type II error, probability β, cost C(II)
- Reject null when p > 0.5: no error, probability 1 − β
- Expected cost: E[C] = C(I)·α + C(II)·β

Regression Estimators
- Minimize the sum of squared residuals (OLS)
- Maximum likelihood of the sample
- Method of moments

Minimize the sum of squared residuals
- Min Σᵢ êᵢ² = Σᵢ (yᵢ − ŷᵢ)² = Σᵢ (yᵢ − â − b̂·xᵢ)²
- ∂(Σᵢ êᵢ²)/∂â = 0 gives ȳ = â + b̂·x̄

Maximum likelihood
- ∂ ln Lik/∂σ² = 0 gives σ̂² = [Σᵢ êᵢ²]/n

Method of moments
- b̂ = [Σᵢ (yᵢ − ȳ)(xᵢ − x̄)/n] / [Σᵢ (xᵢ − x̄)²/n] = Σᵢ (yᵢ − ȳ)(xᵢ − x̄) / Σᵢ (xᵢ − x̄)²

Inference in Regression: interval estimation
- t = [b̂ − E(b̂)]/σ̂_b̂ = [b̂ − b]/σ̂_b̂
- prob(t.025 ≤ [b̂ − b]/σ̂_b̂ ≤ t.975) = 0.95
- prob(σ̂_b̂·t.025 ≤ b̂ − b ≤ σ̂_b̂·t.975) = 0.95
- prob(σ̂_b̂·t.025 ≤ b − b̂ ≤ σ̂_b̂·t.975) = 0.95 (by symmetry, since t.025 = −t.975)
- prob(b̂ + σ̂_b̂·t.025 ≤ b ≤ b̂ + σ̂_b̂·t.975) = 0.95

Estimated Coefficients, Power 8
- t_â = [â − E(â)]/σ̂_â = (−1.024 − 0)/0.728 ≈ −1.41

                     Coefficients    Standard Error   t Stat     P-value       Lower 95%      Upper 95%
  Intercept (â)      -1.02377776     0.727626534      -1.40701   0.167999648   -2.499472762   0.451917
  X Variable 1 (b̂)   0.06565026      0.001086328      60.43316   8.58311E-38   0.063447085    0.067853

- Critical values: Appendix B, Table 4, p. B-9

Inference in Regression: hypothesis testing
- Step 1: state the hypotheses, H₀: b = 0 versus Hₐ: b ≠ 0
- Step 2: choose the test statistic, t = [b̂ − b]/σ̂_b̂
- Step 3: choose the size of the Type I error, α
- Step 4: reject the null hypothesis if the test statistic is in the rejection region
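To tie the regression slides together, here is a minimal Python sketch that computes the least-squares / method-of-moments slope and intercept, the residual variance, the standard error of b̂, and the t-statistic for H₀: b = 0. The (x, y) data are made-up illustrative numbers, not the Power 8 dataset, and for inference the sum of squared residuals is divided by n − 2 (the usual OLS degrees-of-freedom correction) rather than by the maximum-likelihood divisor n shown above.

```python
import math

# Hypothetical (x, y) data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares / method-of-moments slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b_hat = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
a_hat = y_bar - b_hat * x_bar          # from the first-order condition y_bar = a_hat + b_hat * x_bar

# Residuals and estimated error variance (n - 2 degrees of freedom in the bivariate model).
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
s_squared = sum(e ** 2 for e in residuals) / (n - 2)

# Standard error of b_hat and the t-statistic for H0: b = 0.
se_b_hat = math.sqrt(s_squared / sxx)
t_stat = (b_hat - 0) / se_b_hat
print(f"b_hat = {b_hat:.3f}, a_hat = {a_hat:.3f}, t = {t_stat:.1f}")
```

The resulting t-statistic is compared with the Student's t critical values (Appendix B, Table 4), just as in the Power 8 output reproduced above.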