Final Review
Econ 240A

Outline
- The Big Picture
- Processes to remember (and habits to form) for your quantitative career (FYQC)
- Concepts to remember FYQC
- Discrete Distributions
- Continuous Distributions
- Central Limit Theorem
- Regression

The Classical Statistical Trail
[Diagram labels: Descriptive Statistics; Probability; Discrete Random Variables; Discrete Probability Distributions, Moments; Application: Binomial; Rates & Proportions; Inferential Statistics]

Where Do We Go From Here?
[Diagram labels: Contingency Tables; Regression; Properties; Assumptions; Violations; Diagnostics; Modeling; Probability; Count; ANOVA]

Processes to Remember
- Exploratory Data Analysis
  - Distribution of the random variable
    - Histogram (Lab 1)
    - Stem and leaf diagram (Lab 1)
    - Box plot (Lab 1)
  - Time series plot: plot of random variable y(t) vs. time index t
  - X-y plots: y vs. x1, y vs. x2, etc.
- Diagnostic Plots
  - Actual, fitted and residual
  - Cross-section data: heteroskedasticity, White test
  - Time series data: autocorrelation, Durbin-Watson statistic

Time Series
[Figure: UC's Share of the CA General Fund, 1969-70 through 2009-10; UCBUDSHARE vs. TIMEX]

Autoregressive error
- UCBudsh(t) = a + b*timex(t) + e(t)
- e(t) = 0.68*e(t-1) + u(t)
- Lag the first equation and multiply by 0.68:
  0.68*UCBudsh(t-1) = 0.68*a + b*0.68*timex(t-1) + 0.68*e(t-1)
- Subtract, using e(t) - 0.68*e(t-1) = u(t):
  [UCBudsh(t) - 0.68*UCBudsh(t-1)] = [(1-0.68)*a] + b*[timex(t) - 0.68*timex(t-1)] + u(t)
- Y(t) = a* + b*x(t) + u(t), where Y(t) = UCBudsh(t) - 0.68*UCBudsh(t-1), a* = (1-0.68)*a, and x(t) = timex(t) - 0.68*timex(t-1)
- Called autoregressive (auto-correlated) error

Concepts to Remember
- Random variable: takes on values with some probability
- Repeated independent Bernoulli trials
  - Flipping a coin
  - Flipping a coin twice or more
- Random sample
- Likelihood of a random sample
  - Prob(e1^e2 ... ^en) = Prob(e1)*Prob(e2)* ... *Prob(en)

Discrete Distributions
- Discrete random variables
  - Probability mass function: Prob(x = x*)
  - Cumulative distribution function, CDF: $F(x^*) = \sum_{x = x_1}^{x^*} \mathrm{Prob}(x)$
- Equi-probable or uniform
  - E.g. x = 1, 2, 3
  - Prob(x=1) = 1/3 = Prob(x=2) = Prob(x=3)

Discrete Distributions
- Binomial: $\mathrm{Prob}(k) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$
  - E(k) = n*p, Var(k) = n*p*(1-p)
  - Simulated sample binomial random variable (Lab 2)
  - Rates and proportions
    - $\hat{p} = k/n$
    - $E(\hat{p}) = n p / n = p$
    - $\mathrm{Var}(\hat{p}) = n p (1-p) / n^2 = p (1-p) / n$
- Poisson

Continuous Distributions
- Continuous random variables
  - Density function, f(x)
  - Cumulative distribution function: $F(x^*) = \int_{-\infty}^{x^*} f(x)\,dx$
  - Survivor function: S(x*) = 1 - F(x*)
  - Hazard function: h(t) = f(t)/S(t)
  - Cumulative hazard function: $H(t^*) = \int_0^{t^*} h(t)\,dt$

Continuous Distributions
- Simple moments
  - E(x) = mean = expected value: $E(x) = \int x\, f(x)\,dx$
  - $E(x^2)$
- Central moments
  - $E[x - E(x)] = 0$
  - $E[x - E(x)]^2 = \mathrm{Var}\, x$
  - $E[x - E(x)]^3$, a measure of skewness
  - $E[x - E(x)]^4$, a measure of kurtosis

Continuous Distributions
- Normal distribution
  - Simulated sample random normal variable (Lab 3)
  - Approximation to the binomial, n*p >= 5, n*(1-p) >= 5
  - Standardized normal variate: $z = (x - \mu)/\sigma$
- Exponential distribution
- Weibull distribution (scale $\alpha$, shape $\beta$)
  - Cumulative hazard function: $H(t) = (1/\alpha)^{\beta}\, t^{\beta}$
  - Logarithmic transform: $\ln H(t) = \beta \ln(1/\alpha) + \beta \ln t$

Density Function for the Standardized Normal Variate
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}[(z-0)/1]^2}$
[Figure: Density vs. Standard Deviations, from -5 to 5]

Cumulative Distribution Function for a Standardized Normal Variate
[Figure: Probability vs. Standard Deviations, from -5 to 5]

Central Limit Theorem
- Sample mean: $\bar{x} = \sum_{i=1}^{n} x_i / n$

Population and Sample
- Population: random variable x, distribution $f(\mu, \sigma^2)$, f?
- Sample statistic: $\bar{x} \sim N(\mu, \sigma^2/n)$
- Sample statistic: $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)$

The Sample Variance, $s^2$
- $s^2 = \sum_{i=1}^{n} [x(i) - \bar{x}]^2 / (n-1)$
- $(n-1)\, s^2 / \sigma^2$ is distributed chi square with n-1 degrees of freedom (text, 12.2 "Inference about a population variance") (text, pp. 266-270, Chi-Squared distribution)
- $(n-1)\, s^2 / \sigma^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / \sigma^2 = \sum_{i=1}^{n} z_i^2$, with $z_i = (x_i - \bar{x})/\sigma$

Regression Models
- Statistical distributions and tests
  - Student's t
  - F
  - Chi Square
- Assumptions
- Pathologies

Regression Models
- Time series
  - Linear trend model: y(t) = a + b*t + e(t) (Lab 4)
  - Exponential trend model: y(t) = exp[a + b*t + e(t)]
    - Natural logarithmic transformation: ln y(t) = a + b*t + e(t) (Lab 4)
- Linear rates of change: y_i = a + b*x_i + e_i
  - dy/dx = b
  - Returns generating process: $[r_i(t) - r_f^0] = \alpha + \beta\,[r_M(t) - r_f^0] + e_i(t)$ (Lab 6)

Regression Models
- Percentage rates of change, elasticities
  - Cross-section: ln assets_i = a + b*ln revenue_i + e_i (Lab 5)
  - dln assets/dln revenue = b = [dassets/drevenue]/[assets/revenue] = marginal/average

Linear Trend Model
- Linear trend model: y(t) = a + b*t + e(t) (Lab 4)

Lab 4
[Figure: UC Budget Share of General Fund Expenditure, 1968-69 through 2005-06; Percent vs. Year; fitted line y = -0.0009x + 0.0691, R^2 = 0.8449; means: 5.22%, 18.5 yr.]

Lab Four
SUMMARY OUTPUT

Regression Statistics
  Multiple R           0.9191666
  R Square             0.8448673
  Adjusted R Square    0.840558
  Standard Error       0.0044164
  Observations         38

ANOVA
               df   SS            MS         F          Significance F
  Regression    1   0.003824089   0.003824   196.0593   3.872E-16
  Residual     36   0.000702171   1.95E-05
  Total        37   0.00452626

                 Coefficients   Standard Error   t Stat      P-value    Lower 95%   Upper 95%
  Intercept       0.0690865     0.00140505        49.17012   1.32E-34    0.0662369    0.071936
  X Variable 1   -0.000915      6.53335E-05      -14.00212   3.87E-16   -0.001047    -0.000782

RESIDUAL OUTPUT
  Observation   Predicted Y   Residuals
  1             0.0690865     0.005433868
  2             0.0681717     0.005728697
  3             0.0672569     0.001943805
  4             0.066342      0.005271241

- F-test: F(1,36) = [R^2/1]/{[1-R^2]/36} = 196 = explained mean square/unexplained mean square
- t-test: H0: b = 0, HA: b ≠ 0; t = [-0.000915 - 0]/0.0000653 = -14

Lab 4
[Figure: X Variable 1 Residual Plot; Residuals vs. X Variable 1]

Lab 4
[Figure: Student's t-distribution for 36 degrees of freedom; DENSITY vs. STUDENT; 2.5% lower-tail critical value -2.03 marked, along with the observed t statistic of -14]

Lab Four
[Figure: F-distribution, 1,36 degrees of freedom; FDENSITY vs. FSTAT; 5% critical value 4.12 marked, along with the observed F statistic of 196]

Exponential Trend Model
- Exponential trend model: y(t) = exp[a + b*t + e(t)]
  - Natural logarithmic transformation: ln y(t) = a + b*t + e(t) (Lab 4)

Lab Four
[Figure: UC Budget in Billions, 1968-69 through 2005-06; $ vs. Year; fitted curve y = 0.3949e^{0.0637x}, R^2 = 0.9079]

Lab Four
[Figure: UC Budget in Billions, 1968-69 through 2005-06, in logarithms; Logarithm vs. Year; fitted line y = 0.0637x - 0.929, R^2 = 0.9079]

Percentage Rates of Change, Elasticities
- Percentage rates of change, elasticities
  - Cross-section: ln assets_i = a + b*ln revenue_i + e_i (Lab 5)
  - dln assets/dln revenue = b = [dassets/drevenue]/[assets/revenue] = marginal/average

Lab Five
- Elasticity b = 0.778
- H0: b = 1, HA: b < 1
- t(25) = [0.778 - 1]/0.148 = -1.5
- t-crit(5%) = -1.71
- Since -1.5 > -1.71, H0: b = 1 is not rejected at the 5% level

Linear Rates of Change
- Linear rates of change: y_i = a + b*x_i + e_i
  - dy/dx = b
  - Returns generating process: $[r_i(t) - r_f^0] = \alpha + \beta\,[r_M(t) - r_f^0] + e_i(t)$ (Lab 6)

Watch Excel on xy plots!
[Figure: Excel x-y scatter plot with fitted line y = 1.0601x - 0.106, R^2 = 0.9136; labeled point (-13.35, 16.09) for UCnet, S&Pnet; note on the plot: true x axis is UC Net]

Lab Six
SUMMARY OUTPUT: rGE = a + b*rSP500 + e

Regression Statistics
  Multiple R           0.6362898
  R Square             0.4048647
  Adjusted R Square    0.391927
  Standard Error       0.0340527
  Observations         48

ANOVA
               df   SS            MS         F          Significance F
  Regression    1   0.036287438   0.036287   31.29335   1.17E-06
  Residual     46   0.053341113   0.00116
  Total        47   0.089628551

                 Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
  Intercept      0.0065263      0.005659195      1.153229   0.254774   -0.00487    0.0179177
  X Variable 1   1.0926736      0.195327967      5.594046   1.17E-06    0.699499   1.4858484

RESIDUAL OUTPUT
  Observation   Predicted Y   Residuals
  1             0.014493      -0.00718303
  2             0.0213124     -0.044534406
  3             0.0297096      0.037520397

Lab Six
[Figure: X Variable 1 Line Fit Plot; Y and Predicted Y vs. X Variable 1]

Lab Six
[Figure: X Variable 1 Residual Plot; Residuals vs. X Variable 1]

View/Residual tests/Histogram-Normality Test

Linear Multivariate Regression
- House price, # of bedrooms, house size, lot size
- P_i = a + b*bedrooms_i + c*house_size_i + d*lot_size_i + e_i

Lab Six
[Figure: variables price, bedrooms, House_size, Lot_size]

Price = a*dummy2 + b*dummy34 + c*dummy5 + d*house_size01 + e

Lab Six
- C captures three and four bedroom houses

Regression Models
- How to handle zeros?
  - Labs Six and Seven: lottery data file
  - Linear probability model: dependent variable is zero-one
  - Logit: dependent variable is zero-one
  - Probit: dependent variable is zero-one
  - Tobit: dependent variable is lottery
  - See PowerPoint application to lottery with Bern variable

Regression Models
- Failure time models
  - Exponential
    - Survivor: $S(t) = \exp[-\lambda t]$, $\ln S(t) = -\lambda t$
    - Hazard rate: $h(t) = \lambda$
    - Cumulative hazard function: $H(t) = \lambda t$
  - Weibull (scale $\alpha$, shape $\beta$)
    - Hazard rate: $h(t) = f(t)/S(t) = (\beta/\alpha)(t/\alpha)^{\beta - 1}$
    - Cumulative hazard function: $H(t) = (1/\alpha)^{\beta}\, t^{\beta}$
    - Logarithmic transform: $\ln H(t) = \beta \ln(1/\alpha) + \beta \ln t$

Applications: Discrete Distributions
- Binomial: rates & proportions, small samples, e.g. voting polls
- Equi-probable or uniform: if I asked a question every day, without replacement, what is the chance I will ask you a question today?
- Poisson: approximates the binomial where p→0

Applications: Discrete Distributions
- Multinomial: more than two outcomes, e.g. each face of the die, or 6 outcomes

Applications: Continuous Distributions
- Normal: rates & proportions, np > 5, n(1-p) > 5; tests about population means given $\sigma^2$
- Equi-probable or uniform
- Student's t: tests about population means, $\sigma^2$ not known; test regression parameter = 0

Applications: Continuous Distributions
- F: regression, ratio of explained mean square to unexplained mean square, i.e. [R^2/k] ÷ [(1-R^2)/(n-k)]; test dropping 2 or more variables (Wald test)
- Chi-Square, $\chi^2$: contingency table analysis; likelihood ratio tests (Wald test)

Applications: Continuous Distributions
- Exponential: failure (survival) time with constant hazard rate
- Weibull: failure time analysis; test whether hazard rate is constant or increasing or decreasing

Labs 7, 8, 9
- Lab 7: Failure Time Analysis
- Lab 8: Contingency Table Analysis
- Lab 9: One-Way and Two-Way ANOVA
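
Sketch: Weibull logarithmic transform (failure time analysis)
The logarithmic transform of the Weibull cumulative hazard from the failure time slides can be tried out numerically. What follows is only a minimal sketch under stated assumptions, not the Lab 7 procedure: it assumes the parametrization H(t) = (t/α)^β with no censored observations, estimates H(t) with a Nelson-Aalen style running sum, and regresses ln H(t) on ln t so that the slope estimates β. The function name `fit_weibull_by_log_hazard`, the simulated data, the seed, and the "true" parameter values are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_weibull_by_log_hazard(times):
    """Estimate Weibull shape (beta) and scale (alpha) from uncensored
    failure times by OLS of ln H(t) on ln t.

    Assumes H(t) = (t / alpha) ** beta, so that
    ln H(t) = beta * ln(1 / alpha) + beta * ln t.
    """
    t = np.sort(np.asarray(times, dtype=float))
    n = t.size
    # Nelson-Aalen style estimate of the cumulative hazard (no censoring):
    # just before the i-th ordered failure, n - i + 1 units are still at risk.
    at_risk = n - np.arange(n)
    cum_hazard = np.cumsum(1.0 / at_risk)
    # Straight-line fit: slope = beta, intercept = beta * ln(1 / alpha).
    slope, intercept = np.polyfit(np.log(t), np.log(cum_hazard), 1)
    beta = slope
    alpha = np.exp(-intercept / beta)
    return beta, alpha


# Hypothetical simulated failure times: shape 1.5 (increasing hazard), scale 10.
rng = np.random.default_rng(240)
true_beta, true_alpha = 1.5, 10.0
sample = true_alpha * rng.weibull(true_beta, size=2000)

beta_hat, alpha_hat = fit_weibull_by_log_hazard(sample)
print(f"estimated shape beta = {beta_hat:.3f}, estimated scale alpha = {alpha_hat:.3f}")
```

An estimated slope near 1 would point to a constant hazard (the exponential case), a slope above 1 to an increasing hazard, and a slope below 1 to a decreasing hazard. A formal test would also need the standard error of the slope, which this sketch does not compute.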