FUNCTIONAL FORM OF REGRESSION RELATION

Sometimes theory indicates the appropriate functional form. EX: Concentration of a drug in the blood as a function of time after
intake often follows a negative exponential curve. More often the functional form is derived from the data. The regression function
may be approximated by a linear function, a quadratic, a higher degree polynomial, or a combination of linear functions ("splines").
In the next example the relationship is clearly not linear, but a straight line can still represent a useful first approximation of the
relationship.
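For intuition, here is a minimal Python sketch (not from the text; the data are simulated) of approximating a curved relation with polynomials of increasing degree, one of the options mentioned above:

import numpy as np

# Simulated data following a negative exponential curve plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3 * np.exp(-0.3 * x) + rng.normal(0, 0.1, x.size)

# Fit polynomials of degree 1 (straight line), 2, and 3 by least squares
for degree in (1, 2, 3):
    coefs = np.polyfit(x, y, degree)
    sse = np.sum((y - np.polyval(coefs, x)) ** 2)
    print(degree, sse)   # SSE shrinks as the approximation improves

Even the straight line (degree 1) captures the general downward trend, illustrating how a linear function can serve as a useful first approximation.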
DATA FOR REGRESSION ANALYSIS
1. Observational Data
Data obtained from nonexperimental studies so that values of X are not controlled. Observational data do not directly offer strong
support for causal interpretations.
2. Experimental Data
Experimental units are randomly assigned to treatments (i.e., different values of the independent variable(s)). Experimental data
allow stronger causal inferences.

ML Estimation of the Mean 
Assume Y is normally distributed with known standard deviation (σ = 10); estimate the mean μ from a sample with n = 3 (Y1 = 250, Y2 = 265, Y3 = 259).
Given an estimate of μ, the likelihood of the sample (Y1, Y2, Y3) is the product of the probability densities of the Yi given the value of the estimated mean.
The normal probability density function is f(Y) = (1/(σ SQRT(2π))) exp(-(1/2)((Y-μ)/σ)^2) (see NKNW A.34 in Appendix A).
EX: If μ = 230, f(Y1) = .005399, f(Y2) = .000087, f(Y3) = .000595, so that the likelihood L(μ = 230) = (.005399)(.000087)(.000595) = .279 x 10^-9.
Similarly, L(μ = 259) = (.026609)(.033322)(.039894) = .0000354. The likelihood of μ = 259 is greater than the likelihood of μ = 230.
One could calculate the likelihood of the sample over a range of closely-spaced values of μ and graph the resulting likelihood function of μ.
The value of μ corresponding to the maximum of the likelihood function (here μ = 258, the sample mean) is the ML estimate of μ. In practice the
value(s) of the parameter(s) that maximize the likelihood function are found by iterative numerical optimization methods.
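A minimal Python sketch of this grid-search idea, assuming numpy and scipy are available (the text itself uses SYSTAT; the grid limits are arbitrary):

import numpy as np
from scipy.stats import norm

y = np.array([250.0, 265.0, 259.0])   # the sample
sigma = 10.0                          # known standard deviation

# Evaluate the likelihood (product of densities) over closely-spaced mu values
grid = np.linspace(230, 290, 601)
lik = np.array([norm.pdf(y, mu, sigma).prod() for mu in grid])

print(grid[lik.argmax()])   # 258.0, the sample mean, as stated above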
8. USING SYSTAT'S CALCULATOR IN LIEU OF STATISTICAL TABLES
SYSTAT provides cumulative, density, inverse, and random variate functions for the 13 distributions listed in the table below. The
functions have systematic three-letter names ending in the suffix -CF, -DF, -IF, or -RN, according to the type of function.
Cumulative distribution functions (suffix -CF) compute the probability that a random value from the specified distribution falls
below or is equal to the given value.
Density functions (suffix -DF) return the height at x of the density curve (the ordinate) of the specified distribution.
Inverse (cumulative) distribution functions (suffix -IF) take a specified alpha (a probability value between zero and one) and return
the critical value below which lies that proportion of the specified distribution.
Random variate functions (suffix -RN) generate pseudo-random variates from the specified distribution.
Exhibit: Table of SYSTAT's distribution functions (SYSTAT 6.0 DATA p. 143)
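For readers working outside SYSTAT, scipy.stats exposes an analogous quartet of methods for each distribution; the correspondence below (cdf ~ -CF, pdf ~ -DF, ppf ~ -IF, rvs ~ -RN) is an assumed parallel, not part of SYSTAT:

from scipy.stats import t, norm

print(t.cdf(1.79, 25))     # like tcf(1.79,25): cumulative probability
print(norm.pdf(-0.9))      # like zdf(-0.9): density ordinate at x
print(t.ppf(0.975, 15))    # like tif(0.975,15): inverse CDF (critical value)
print(norm.rvs(100, 15))   # like zrn(100,15): pseudo-random variate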
1. Cumulative Distribution Functions
Use the -CF distributions to obtain probabilities (i.e., p-values) associated with observed sample statistics.
EX: calculate the 2-sided p-value of the slope of a simple regression model (i.e., with 1 independent variable plus a constant) with t* = 1.79 and n = 27. Use the Student t distribution with n-2 = 25 df:
calc 2*(1-tcf(1.79,25))
0.085575
This calculates the 2-sided p-value as twice the area under the curve above 1.79.
EX: calculate the 2-sided p-value of a regression coefficient in a multiple regression model with p-1 = 5 independent variables plus a
constant (so that p=6), with t* = 1.79 and n=27. Use the Student t distribution with n-p = 27-6 = 21 df:
calc 2*(1-tcf(1.79,21))
0.087885
Note this is slightly larger than in the previous example because of fewer df (21 versus 25).
EX: calculate the 2-sided p-value of a regression coefficient in a regression model (simple or multiple, it doesn't matter) when n is large (say n = 100), with t* = 1.79. Use the standard normal distribution:
calc 2*(1-zcf(1.79))
0.073454
Note that you can use the t distribution tcf with any n. The tcf result will automatically converge to the zcf result when n becomes
large.
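A quick scipy check of this convergence claim (an illustration, not SYSTAT output):

from scipy.stats import t, norm

for df in (25, 100, 1000):
    print(df, 2 * (1 - t.cdf(1.79, df)))   # decreases toward the limit below
print('z', 2 * (1 - norm.cdf(1.79)))       # 0.073454, the zcf result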
EX: calculate the p-value for an F test. F* is 4.14; the F distribution has 3 and 7 df.
calc 1-fcf(4.14,3,7)
0.055480
Here the result is not multiplied by 2 because the F test is one-sided.
EX: calculate the one-sided p-value of a regression coefficient from a multiple regression when n is large (say above 100), so one can
assume the sampling distribution is normal, t* = -2.033, and the research hypothesis is that the regression coefficient is negative:
calc zcf(-2.033)
0.021026
Here the result is not multiplied by 2, because one wants the one-sided p-value. Note also that (unlike normal tables) the zcf function
returns the probability for negative values of the sample statistic.
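Both p-values above can be checked in scipy (assumed equivalents of fcf and zcf):

from scipy.stats import f, norm

print(1 - f.cdf(4.14, 3, 7))   # 0.055480: one-sided F-test p-value
print(norm.cdf(-2.033))        # 0.021026: lower-tail p-value, as zcf returns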
2. Density Functions
Use the -DF density functions to calculate the probability density at x.
EX: NKNW (pp. 30-31) calculate the likelihoods of the sample observations in the ML estimation example above, assuming normally distributed errors. For Y1 = 250, with μ = 259 and σ = 10, the likelihood of Y1 is
calc zdf((250-259)/10)
0.266085
This is the standard normal ordinate at z = -0.9; dividing by σ = 10 gives the density of Y1 itself, .026609, as in the ML example.
Note that the values given by NKNW (p. 31) are incorrect. They are all shifted one decimal to the right.
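A scipy sketch of the same calculation, making the 1/σ factor explicit (norm.pdf here plays the role of zdf):

from scipy.stats import norm

sigma = 10.0
z = (250 - 259) / sigma
print(norm.pdf(z))                 # 0.266085, the standard normal ordinate
print(norm.pdf(z) / sigma)         # 0.026609, the density of Y1 itself
print(norm.pdf(250, 259, sigma))   # same value, computed directly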
3. Inverse (Cumulative) Distribution Functions
Use the -IF inverse cumulative distribution functions to calculate critical values given alpha and to construct confidence intervals.
EX: in a simple regression with n = 17, what is the critical value of the t-ratio t* such that a |t*| greater than this value indicates that the regression coefficient is significantly different from zero at the .05 level (2-tailed)? Use the inverse t distribution with n-2 = 15 df.
calc tif(0.975,15)
2.131450
OR
calc tif(0.025,15)
-2.131450
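The equivalent scipy call is ppf (an assumed parallel to tif):

from scipy.stats import t

print(t.ppf(0.975, 15))   #  2.131450
print(t.ppf(0.025, 15))   # -2.131450, the symmetric lower critical value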
EX: in a multiple regression analysis a regression coefficient is 3.77 with s.e. 0.23. Calculate the 95% CI assuming that n is large, so the sampling distribution can be assumed normal.
calc 3.77 + zif(0.975)*0.23
4.220792
calc 3.77 - zif(0.975)*0.23
3.319208
EX: same thing, but now n = 24, and there are p-1 = 3 independent variables plus a constant term, so that p = 4. You now need the t distribution with 24-p = 20 df:
calc 3.77 + tif(0.975,20)*0.23
4.249772
calc 3.77 - tif(0.975,20)*0.23
3.290228
The CI is a bit wider, as one would expect.
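Both confidence intervals can be recomputed in scipy, with ppf standing in for zif/tif:

from scipy.stats import norm, t

b, se = 3.77, 0.23
z = norm.ppf(0.975)               # large-sample (normal) case
print(b - z * se, b + z * se)     # 3.319208, 4.220792
tc = t.ppf(0.975, 20)             # n = 24, p = 4, so 20 df
print(b - tc * se, b + tc * se)   # 3.290228, 4.249772 (wider, as noted)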
4. Random Variate Functions
The -RN functions generate pseudo-random numbers distributed according to the particular distribution. They are mostly useful to
generate large samples of random observations for Monte Carlo studies, using SYSTAT's programming language. However, there
may be situations when you want to get single random values.
EX: pick a uniformly distributed random number between 0 and 1
calc urn(0,1)
0.179755
EX: assign yourself a random IQ score (with mean = 100 and sd = 15)
calc zrn(100,15)
95.609537
(ouch!)
EX: flip a coin
calc nrn(1,0.5)
1.000000
(nrn(1,0.5) is the binomial function with 1 trial and p=0.5)
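The three examples above have scipy analogues (assumed correspondences to urn, zrn, and nrn):

from scipy.stats import uniform, norm, binom

print(uniform.rvs(0, 1))   # like urn(0,1): uniform on [0, 1)
print(norm.rvs(100, 15))   # like zrn(100,15): a random 'IQ score'
print(binom.rvs(1, 0.5))   # like nrn(1,0.5): one coin flip, 0 or 1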
Module 4 - MATRIX REPRESENTATION OF REGRESSION MODEL

Using matrices we can represent the simple and multiple regression model in a compact form.

Definition of a Matrix: A matrix is a rectangular array of elements arranged in rows and columns; e.g., A = [aij] for i = 1, 2, 3; j = 1, 2. (In the expression aij the first subscript always refers to the row index and the second subscript to the column index.)

A square matrix is a matrix with the same number of rows and columns.

A (column) vector is a matrix with one column. A row vector is a matrix with one row. "Vector" alone refers to a column
vector.

The transpose of a matrix A = [aij] is the matrix A' = [aji] in which the row and column indexes have been exchanged. An alternative notation for the transpose A' is A^T.

Equality of Matrices: The matrices A and B are equal if they have the same dimensions and all corresponding elements are
equal.

The matrices involved in addition or subtraction must have the same dimensions. In general, with Arxc = [aij] and Brxc = [bij]:
A + B = [aij + bij]rxc    A - B = [aij - bij]rxc

Multiplication by a Scalar. In general, if A = [aij] and λ is a scalar (= an ordinary number, or a symbol representing an ordinary number), then λA = Aλ = [λaij]. To multiply by a scalar, multiply each element of the matrix by the scalar. Note that one can factor out a scalar that is a common factor of every matrix element; the order of multiplication by a scalar does not matter.

Multiplication of a Matrix by a Matrix: The product AB is defined only when the number of columns of A equals the number of rows of B. Element (i,j) of AB is the sum of the products of the elements of row i of A with the corresponding elements of column j of B.

Symmetric Matrix: A is symmetric if A = A' In regression analysis, many expressions of the type X'X or Y'Y are symmetric.

Diagonal Matrix: Only diagonal elements are non-zero.

Identity Matrix I3x3. Diagonal elements are 1, everything else is 0. In general AI = IA = A, so that I can be dropped in simplifying
expressions.

A scalar matrix is a diagonal matrix with all diagonal elements equal to the same scalar λ. Multiplying by λI is the same as multiplying by the scalar λ.

Vectors and Matrices with All Elements Unity. lrx1 is a column vector with all elements 1.

Jrxr is a square matrix with all elements 1
These matrices may seem strange but they are very useful for representing sums and means in matrix notation. EX: l'l = n (where l is
n x 1); ll' = J (where l is n x 1 and J is n x n)

Zero Vector: A vector 0 composed entirely of zeroes.
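A short numpy sketch of several of these definitions (the matrix X is made up for illustration):

import numpy as np

X = np.array([[1., 2.], [1., 5.], [1., 7.]])   # a 3 x 2 matrix

print(X.T)                             # transpose X'
print(X.T @ X)                         # X'X: square and symmetric
print(np.allclose(X @ np.eye(2), X))   # AI = A

l = np.ones((3, 1))   # column vector of 1s
print(l.T @ l)        # l'l = n = 3
print(l @ l.T)        # ll' = J, the 3 x 3 matrix of 1s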
Module 5
4. Historical Note.
The use of multiple regression analysis as a means of controlling for possible confounding factors that may spuriously produce an
apparent relationship between two variables was first proposed by G. Udny Yule in the late 1890s. In a pathbreaking 1899 paper "An
Investigation into the Causes of Changes in Pauperism in England, chiefly in the last Two Intercensal Decades" Yule investigated the
effect on change in pauperism (poverty rate) of a change in the ratio of the poor receiving relief outside as opposed to inside the
poorhouses (the "out-relief ratio"). This was a hot topic of policy debate in Great Britain at the time. Charles Booth had argued that
increasing the proportion of the poor receiving relief outside the poorhouses did not increase pauperism in a union (unions are British
administrative units). Using correlation coefficients (a then entirely novel technique that had just been developed by his colleague and
mentor Karl Pearson), Yule had discovered that there was a strong association between change in the out-relief ratio and change in
pauperism, contrary to Booth's impression. The 1899 paper is the first published use of multiple regression analysis. In it Yule uses
multiple regression to confirm the relationship between pauperism and out-relief by controlling for other possible causes of the
apparent association, specifically change in proportion of the old (to control for the greater incidence of poverty among the elderly)
and change in population (using population increase in a union as an indicator of prosperity). It is hard to improve on Yule's
description of the logic of the method. (See also Stigler 1986: 345-361.)
Exhibit (forthcoming): Algebra of specification bias.
Module 6

EX: the model with response function E{Y} = 1,740 - 4x1^2 - 3x2^2 - 3x1x2 yields the following quadratic response surface. The following exhibit is another example of a quadratic surface.
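The response function can be evaluated over a grid to trace the surface; a minimal numpy sketch (grid range chosen arbitrarily):

import numpy as np

x1, x2 = np.meshgrid(np.linspace(-5, 5, 5), np.linspace(-5, 5, 5))
ey = 1740 - 4 * x1**2 - 3 * x2**2 - 3 * x1 * x2
print(ey)   # E{Y} at each (x1, x2) grid point; the maximum is at (0, 0)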