Final Review

Econ 240A
Outline







The Big Picture
Processes to remember (and habits to form) for your quantitative career (FYQC)
Concepts to remember FYQC
Discrete Distributions
Continuous distributions
Central Limit Theorem
Regression
The Classical Statistical Trail
Descriptive Statistics
Probability
Discrete Random Variables
Discrete Probability Distributions; Moments
Binomial Application
Rates & Proportions
Inferential Statistics
Where Do We Go From Here?
Contingency Tables
Regression
Properties
Assumptions
Violations
Diagnostics
Modeling
Probability
Count
ANOVA
Processes to Remember

Exploratory Data Analysis

Distribution of the random variable






Histogram (Lab 1)
Stem-and-leaf diagram (Lab 1)
Box plot (Lab 1)
Time series plot: plot of the random variable y(t) vs. the time index t
X-y plots: y vs. x1, y vs. x2, etc.
Diagnostic Plots



Actual, fitted and residual
Cross-section data: heteroskedasticity (White test)
Time series data: autocorrelation (Durbin-Watson statistic)
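As a rough illustration of the autocorrelation diagnostic, the Durbin-Watson statistic is the sum of squared first differences of the residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The residual series below are made up for the sketch and are not from the course data.

```python
def durbin_watson(residuals):
    """Sum of squared first differences of the residuals over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals (assumptions for illustration only).
persistent = [1.0, 0.8, 0.9, 0.7, -0.2, -0.6, -0.8, -0.5, 0.1, 0.6]   # slow sign changes
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]                        # sign flips every period

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)

print(f"persistent residuals:  DW = {dw_persistent:.3f}")
print(f"alternating residuals: DW = {dw_alternating:.3f}")
```

The persistent series gives a statistic below 2 and the alternating series gives one above 2, which is the pattern to look for in a residual plot.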
Time Series
[Figure: UC's Share of the CA General Fund, 1969-70 through 2009-10; UCBUDSHARE (0.02 to 0.08) vs. TIMEX (0 to 50)]
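UCbudsh(t) = a + b*timex(t) + e(t)
e(t) = 0.68*e(t-1) + u(t)
Lag the first equation one period and multiply by 0.68:
0.68*UCbudsh(t-1) = 0.68*a + b*0.68*timex(t-1) + 0.68*e(t-1)
Subtract this from the first equation:
[UCbudsh(t) – 0.68*UCbudsh(t-1)] = [(1-0.68)*a] + b*[timex(t) – 0.68*timex(t-1)] + u(t)
Y(t) = a* + b*X(t) + u(t), where Y(t) = UCbudsh(t) – 0.68*UCbudsh(t-1), X(t) = timex(t) – 0.68*timex(t-1), and a* = (1-0.68)*a
The error e(t) is called an autoregressive (autocorrelated) error.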
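The transformation above can be sketched in a few lines. This is a minimal illustration on simulated data, not the UC budget series: the true intercept, slope, and noise scale are invented, and only the coefficient 0.68 comes from the slide. Regressing the quasi-differenced y on the quasi-differenced time index should recover b directly and a after dividing the intercept by (1-0.68).

```python
import random

RHO = 0.68  # autoregressive coefficient taken from the slide

def quasi_difference(series, rho=RHO):
    """Return series[t] - rho*series[t-1] for t = 1, ..., len(series)-1."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

def ols(x, y):
    """Simple bivariate least squares; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Simulated data with hypothetical parameters (assumptions, not estimates).
random.seed(0)
a_true, b_true = 0.07, -0.001
timex = list(range(40))
e = [0.0]
for _ in range(1, len(timex)):
    e.append(RHO * e[-1] + random.gauss(0.0, 0.001))  # e(t) = 0.68*e(t-1) + u(t)
y = [a_true + b_true * t + et for t, et in zip(timex, e)]

a_star, b_hat = ols(quasi_difference(timex), quasi_difference(y))
a_hat = a_star / (1 - RHO)  # undo a* = (1-0.68)*a

print(f"b_hat = {b_hat:.5f}, a_hat = {a_hat:.4f}")
```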
Concepts to Remember

Random Variable: takes on values with some probability


Repeated Independent Bernoulli Trials



Flipping a coin
Flipping a coin twice or more
Random Sample
Likelihood of a random sample

Prob(e1 ^ e2 ^ … ^ en) = Prob(e1)*Prob(e2)*…*Prob(en)
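A small sketch of the product rule for independent Bernoulli trials: the likelihood of the whole sample is the product of the probabilities of the individual outcomes. The sample of coin flips and the candidate values of p are hypothetical.

```python
import math

def bernoulli_likelihood(sample, p):
    """Likelihood of independent 0/1 outcomes: product of p for each 1 and (1-p) for each 0."""
    return math.prod(p if x == 1 else 1 - p for x in sample)

sample = [1, 0, 1, 1]  # hypothetical flips: heads, tails, heads, heads

like_half = bernoulli_likelihood(sample, 0.5)           # 0.5**4
like_three_quarters = bernoulli_likelihood(sample, 0.75)  # 0.75**3 * 0.25

print(like_half, like_three_quarters)
```

The likelihood is higher at p = 0.75, the sample proportion of heads, than at p = 0.5.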
Discrete Distributions

Discrete Random Variables


Probability density function: Prob(x=x*)
Cumulative distribution function, CDF: $F(x^*) = \sum_{x = x_1}^{x^*} \mathrm{Prob}(x)$

Equi-Probable or Uniform

E.g., x = 1, 2, 3: Prob(x=1) = Prob(x=2) = Prob(x=3) = 1/3
Discrete Distributions

Binomial: $\mathrm{Prob}(k) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}$

E(k) = n*p, Var(k) = n*p*(1-p)
Simulated sample binomial random variable (Lab 2)
Rates and proportions
$\hat{p} = k/n$
$E(\hat{p}) = n p / n = p$
$\mathrm{Var}(\hat{p}) = n p (1-p)/n^2 = p(1-p)/n$

Poisson
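The binomial moments can be checked numerically by summing over the probability function. The values n = 10 and p = 0.3 are arbitrary choices for the sketch.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Prob(k) = [n!/(k!(n-k)!)] * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3  # hypothetical values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean_k = sum(k * pk for k, pk in enumerate(pmf))                 # should equal n*p
var_k = sum((k - mean_k) ** 2 * pk for k, pk in enumerate(pmf))  # should equal n*p*(1-p)
var_phat = var_k / n ** 2                                        # should equal p*(1-p)/n

print(f"E(k) = {mean_k:.4f}, Var(k) = {var_k:.4f}, Var(p-hat) = {var_phat:.4f}")
```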
Continuous Distributions

Continuous random variables


Density function, f(x)
Cumulative distribution function: $F(x^*) = \int_{-\infty}^{x^*} f(x)\,dx$

Survivor function: S(x*) = 1 – F(x*)
Hazard function: h(t) = f(t)/S(t)
Cumulative hazard function, H(t): $H(t^*) = \int_{0}^{t^*} h(t)\,dt$
Continuous Distributions

Simple moments

E(x) = mean = expected value: $E(x) = \int_{-\infty}^{\infty} x\, f(x)\,dx$

$E(x^2)$
Central Moments

$E[x - E(x)] = 0$
$E\{[x - E(x)]^2\} = \mathrm{Var}(x)$
$E\{[x - E(x)]^3\}$, a measure of skewness
$E\{[x - E(x)]^4\}$, a measure of kurtosis
Continuous Distributions

Normal Distribution





Simulated sample random normal variable (Lab 3)
Approximation to the binomial, n*p>=5, n*(1-p)>=5
Standardized normal variate: $z = (x - \mu)/\sigma$
Exponential Distribution
Weibull Distribution

Cumulative hazard function: $H(t) = (1/\alpha^{\beta})\, t^{\beta}$
Logarithmic transform: $\ln H(t) = \ln(1/\alpha^{\beta}) + \beta \ln t$
Density Function for the Standardized Normal Variate
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}[(z - 0)/1]^2}$
[Figure: Density (0 to 0.45) vs. Standard Deviations (-5 to 5)]
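The standardized normal density and its cumulative distribution function can be evaluated with the standard library alone. This is a sketch: the function names are illustrative, and the CDF uses the error function rather than numerical integration.

```python
import math

def normal_pdf(z):
    """Density of the standardized normal variate (mean 0, standard deviation 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_cdf(z):
    """Cumulative distribution function, written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

peak = normal_pdf(0.0)                              # height of the density at z = 0
two_tail_95 = normal_cdf(1.96) - normal_cdf(-1.96)  # probability within 1.96 standard deviations

print(f"f(0) = {peak:.4f}")
print(f"Prob(-1.96 < z < 1.96) = {two_tail_95:.4f}")
```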
Cumulative Distribution Function for a Standardized Normal Variate
[Figure: Probability (0 to 1) vs. Standard Deviations (-5 to 5)]
Central Limit Theorem

Sample mean: $\bar{x} = \sum_{i=1}^{n} x_i / n$
Population
Random variable x
Distribution $f(\mu, \sigma^2)$, f?
Pop. → Sample
Sample statistic: $\bar{x} \sim N(\mu, \sigma^2/n)$
Sample statistic: $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n-1)$
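A quick simulation can illustrate the sampling distribution of the mean. The sketch below draws from a uniform population on (0, 1), which has mean 1/2 and variance 1/12; the choice of population, sample size, number of replications, and seed are all assumptions made for illustration. The sample means should center on the population mean with variance close to the population variance divided by n.

```python
import random
import statistics

random.seed(240)   # arbitrary seed so the run is repeatable
N_OBS = 30         # observations per sample (hypothetical)
N_SAMPLES = 5000   # number of repeated samples (hypothetical)

def sample_mean(n):
    """Mean of n independent draws from the uniform(0, 1) population."""
    return sum(random.random() for _ in range(n)) / n

means = [sample_mean(N_OBS) for _ in range(N_SAMPLES)]

pop_mean, pop_var = 0.5, 1 / 12
mean_of_means = statistics.mean(means)
var_of_means = statistics.variance(means)

print(f"mean of sample means: {mean_of_means:.4f} (population mean {pop_mean})")
print(f"variance of sample means: {var_of_means:.5f} (sigma^2/n = {pop_var / N_OBS:.5f})")
```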
The Sample Variance, $s^2$
$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n-1)$
$(n-1)\, s^2/\sigma^2$ is distributed chi-square, $\chi^2$, with n-1 degrees of freedom (text, 12.2, "Inference about a population variance"; text, pp. 266-270, Chi-Squared distribution)
$(n-1)\, s^2/\sigma^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/\sigma^2 = \sum_{i=1}^{n} z_i^2$
Regression


Models
Statistical distributions and tests





Student’s t
F
Chi Square
Assumptions
Pathologies
Regression Models

Time Series


Linear trend model: y(t) = a + b*t + e(t) (Lab 4)
Exponential trend model: y(t) = exp[a + b*t + e(t)]

Natural logarithmic transformation: ln y(t) = a + b*t + e(t) (Lab 4)
Linear rates of change: yi = a + b*xi + ei

dy/dx = b
Returns generating process:

$[r_i(t) - r_f^0] = \alpha + \beta\,[r_M(t) - r_f^0] + e_i(t)$ (Lab 6)
Regression Models

Percentage rates of change, elasticities

Cross-section

ln assets_i = a + b*ln revenue_i + e_i (Lab 5)
d ln assets/d ln revenue = b = [d assets/d revenue]/[assets/revenue] = marginal/average
Linear Trend Model

Linear trend model: y(t) = a + b*t + e(t) (Lab 4)
Lab 4
[Figure: UC Budget Share of General Fund Expenditure, 1968-69 through 2005-06; Percent (0.00% to 8.00%) vs. Year (0 to 40); means: 5.22%, 18.5 yr.; fitted line y = -0.0009x + 0.0691, R^2 = 0.8449]
Lab Four
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9191666
R Square             0.8448673
Adjusted R Square    0.840558
Standard Error       0.0044164
Observations         38

F-test: F(1,36) = [R^2/1]/{[1-R^2]/36} = 196 = Explained mean square/Unexplained mean square

ANOVA
              df    SS             MS          F           Significance F
Regression    1     0.003824089    0.003824    196.0593    3.872E-16
Residual      36    0.000702171    1.95E-05
Total         37    0.00452626

                Coefficients    Standard Error    t Stat       P-value     Lower 95%    Upper 95%
Intercept       0.0690865       0.00140505        49.17012     1.32E-34    0.0662369    0.071936
X Variable 1    -0.000915       6.53335E-05       -14.00212    3.87E-16    -0.001047    -0.000782

RESIDUAL OUTPUT
Observation    Predicted Y    Residuals
1              0.0690865      0.005433868
2              0.0681717      0.005728697
3              0.0672569      0.001943805
4              0.066342       0.005271241

t-test:
H0: b=0
HA: b≠0
t = [-0.000915 – 0]/0.0000653 = -14
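The F and t statistics in the Lab Four output can be reproduced from the numbers printed in the table. The sketch below only redoes that arithmetic; the variable names are illustrative. Because the slope coefficient is printed to six decimal places, the recomputed t differs slightly from the table's -14.00212, and t squared matches F only approximately.

```python
# Values copied from the Lab Four summary output.
r_square = 0.8448673
df_regression, df_residual = 1, 36
ss_regression, ss_residual = 0.003824089, 0.000702171
slope, slope_se = -0.000915, 6.53335e-05

# F from R-square: [R^2/1] / {[1 - R^2]/36}
f_from_r2 = (r_square / df_regression) / ((1 - r_square) / df_residual)

# F as explained mean square over unexplained mean square
f_from_ms = (ss_regression / df_regression) / (ss_residual / df_residual)

# t statistic for H0: b = 0
t_stat = (slope - 0) / slope_se

print(f"F from R^2:          {f_from_r2:.2f}")
print(f"F from mean squares: {f_from_ms:.2f}")
print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
```

With a single explanatory variable, the F-test and the two-sided t-test on the slope are the same test, which is why t squared is close to F.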
Lab 4
[Figure: X Variable 1 Residual Plot; Residuals (-0.015 to 0.01) vs. X Variable 1 (0 to 40)]
Lab 4
[Figure: Student's t-distribution for 36 degrees of freedom; DENSITY (0.0 to 0.4) vs. STUDENT (-3 to 3); 2.5% in the lower tail below -2.03; t statistic of -14 marked]
Lab Four
[Figure: F-distribution, 1 and 36 degrees of freedom; FDENSITY vs. FSTAT; 5% in the upper tail above 4.12; F statistic of 196 marked]
Exponential Trend Model

Exponential trend model: y(t) = exp[a + b*t + e(t)]

Natural logarithmic transformation: ln y(t) = a + b*t + e(t) (Lab 4)
Lab Four
[Figure: UC Budget in Billions, 1968-69 through 2005-06; $ (0 to 5) vs. Year (0 to 40); fitted exponential trend y = 0.3949e^{0.0637x}, R^2 = 0.9079]
Lab Four
[Figure: UC Budget in Billions, 1968-69 through 2005-06, in logarithms; Logarithm (-1.5 to 2) vs. Year (0 to 40); fitted line y = 0.0637x - 0.929, R^2 = 0.9079]
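The two fitted trends reported for the UC budget are the same model in different units: taking logs of y = 0.3949e^{0.0637x} gives a line with slope 0.0637 and intercept ln(0.3949). The sketch below checks this on a noiseless series generated from the reported coefficients. That series is an illustration only and is not the actual budget data.

```python
import math

def ols(x, y):
    """Simple bivariate least squares; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

A, B = 0.3949, 0.0637                           # coefficients reported on the exponential trend chart
years = list(range(38))                         # 38 fiscal years, indexed 0 to 37
budget = [A * math.exp(B * t) for t in years]   # noiseless illustrative series (assumption)

intercept, slope = ols(years, [math.log(v) for v in budget])
growth_rate = math.exp(slope) - 1               # implied annual growth rate

print(f"intercept = {intercept:.3f}, slope = {slope:.4f}")
print(f"implied annual growth = {growth_rate:.2%}")
```

The recovered intercept agrees with the -0.929 shown on the logarithmic chart.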
Percentage Rates of Change, Elasticities

Percentage rates of change, elasticities

Cross-section

ln assets_i = a + b*ln revenue_i + e_i (Lab 5)
d ln assets/d ln revenue = b = [d assets/d revenue]/[assets/revenue] = marginal/average
Lab Five
Elasticity b = 0.778
H0: b=1
HA: b<1
t25 = [0.778 – 1]/0.148 = -1.5
t-crit(5%) = -1.71
Since -1.5 > -1.71, H0: b=1 is not rejected at the 5% level.
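The one-tailed test on the elasticity can be written out as a short calculation. The estimate, standard error, and critical value are the ones quoted on the slide; the variable names are illustrative.

```python
b_hat = 0.778      # estimated elasticity (Lab Five)
b_null = 1.0       # H0: b = 1
std_error = 0.148  # standard error of the estimate
t_crit = -1.71     # 5% lower-tail critical value, 25 degrees of freedom (from the slide)

t_stat = (b_hat - b_null) / std_error

# One-tailed test against HA: b < 1, so reject only if t falls below the critical value.
reject_h0 = t_stat < t_crit

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```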
Linear Rates of Change

Linear rates of change: yi = a + b*xi + ei

dy/dx = b
Returns generating process:

$[r_i(t) - r_f^0] = \alpha + \beta\,[r_M(t) - r_f^0] + e_i(t)$ (Lab 6)
Watch Excel on xy plots!
[Figure: xy plot; fitted line y = 1.0601x - 0.106, R^2 = 0.9136; labeled point -13.35, 16.09 (UC net, S&P net); true x axis: UC Net]
Lab Six
SUMMARY OUTPUT
rGE = a + b*rSP500 + e

Regression Statistics
Multiple R           0.6362898
R Square             0.4048647
Adjusted R Square    0.391927
Standard Error       0.0340527
Observations         48

ANOVA
              df    SS             MS          F           Significance F
Regression    1     0.036287438    0.036287    31.29335    1.17E-06
Residual      46    0.053341113    0.00116
Total         47    0.089628551

                Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept       0.0065263       0.005659195       1.153229    0.254774    -0.00487     0.0179177
X Variable 1    1.0926736       0.195327967       5.594046    1.17E-06    0.699499     1.4858484

RESIDUAL OUTPUT
Observation    Predicted Y    Residuals
1              0.014493       -0.00718303
2              0.0213124      -0.044534406
3              0.0297096      0.037520397
Lab Six
[Figure: X Variable 1 Line Fit Plot; Y and Predicted Y vs. X Variable 1]
Lab Six
[Figure: X Variable 1 Residual Plot; Residuals vs. X Variable 1]
View/Residual tests/Histogram-Normality Test
Linear Multivariate Regression

House price, # of bedrooms, house size, lot size

P_i = a + b*bedrooms_i + c*house_size_i + d*lot_size_i + e_i
Lab Six
Variables: price, bedrooms, House_size, Lot_size
Price = a*dummy2 + b*dummy34 + c*dummy5 + d*house_size01 + e
Lab Six
C captures three- and four-bedroom houses
Regression Models

How to handle zeros?

Labs Six and Seven: Lottery data-file




Linear probability model: dependent variable: zero-one
Logit: dependent variable: zero-one
Probit: dependent variable: zero-one
Tobit: dependent variable: lottery
See PowerPoint application to lottery with Bern variable
Regression Models

Failure time models

Exponential




Survivor: $S(t) = \exp[-\lambda t]$, $\ln S(t) = -\lambda t$
Hazard rate: $h(t) = \lambda$
Cumulative hazard function: $H(t) = \lambda t$
Weibull

Hazard rate: $h(t) = f(t)/S(t) = (\beta/\alpha)(t/\alpha)^{\beta - 1}$
Cumulative hazard function: $H(t) = (1/\alpha^{\beta})\, t^{\beta}$
Logarithmic transform: $\ln H(t) = \ln(1/\alpha^{\beta}) + \beta \ln t$
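The logarithmic transform is what makes the Weibull model testable with a regression: ln H(t) is linear in ln t, and the slope tells whether the hazard is constant, increasing, or decreasing. The sketch below assumes the common scale-and-shape parameterization, with scale alpha and shape beta, since the Greek symbols did not survive in the slide text. The parameter values are hypothetical.

```python
import math

def weibull_hazard(t, alpha, beta):
    """Hazard rate h(t) = (beta/alpha) * (t/alpha)**(beta - 1); alpha = scale, beta = shape (assumed names)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1)

def weibull_cum_hazard(t, alpha, beta):
    """Cumulative hazard H(t) = (t/alpha)**beta, the integral of h from 0 to t."""
    return (t / alpha) ** beta

alpha, beta = 2.0, 1.5  # hypothetical parameters
t1, t2 = 1.0, 4.0

# Slope of ln H(t) against ln t between two time points; equals the shape parameter.
log_slope = (math.log(weibull_cum_hazard(t2, alpha, beta))
             - math.log(weibull_cum_hazard(t1, alpha, beta))) / (math.log(t2) - math.log(t1))

# With beta = 1 the Weibull collapses to the exponential: constant hazard 1/alpha.
constant_hazards = [weibull_hazard(t, alpha, 1.0) for t in (0.5, 1.0, 3.0)]

print(f"slope of ln H on ln t = {log_slope:.3f}")
print(f"hazards with beta = 1: {constant_hazards}")
```

A slope of 1 corresponds to a constant hazard, a slope above 1 to an increasing hazard, and a slope below 1 to a decreasing hazard.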
Applications: Discrete Distributions

Binomial: rates & proportions, small samples, e.g. voting polls

Equi-probable or uniform: if I asked a question every day, without replacement, what is the chance I will ask you a question today?

Poisson: approximates the binomial where p→0
Applications: Discrete Distributions

Multinomial: more than two outcomes, e.g. each face of the die, or 6 outcomes
Applications: Continuous Distributions

Normal: rates & proportions, np>5, n(1-p)>5; tests about population means given $\sigma^2$

Equi-probable or uniform

Student's t: tests about population means, $\sigma^2$ not known; test regression parameter = 0
Applications: Continuous Distributions

F: regression, ratio of explained mean square to unexplained mean square, i.e. [R^2/k]÷[(1-R^2)/(n-k)]; test dropping 2 or more variables (Wald test)

Chi-Square, $\chi^2$: contingency table analysis; likelihood ratio tests (Wald test)
Applications: Continuous Distributions


Exponential: failure (survival) time with constant hazard rate

Weibull: failure time analysis; test whether the hazard rate is constant, increasing, or decreasing
Labs 7, 8, 9

Lab 7 Failure Time Analysis

Lab 8 Contingency Table Analysis

Lab 9 One-Way and Two-Way ANOVA