Document 11168781

advertisement
l o s t when t h e mean o f EACH v a r i a b l e i s c a l c u l a t e d .
b. A c o r r e l a t i o n o f k.35 i s n o t v e r y l i k e l y t o occur by chance!
(In
t h i s sample o f o n l y 30 cases i t was almost s u f f i c i e n t l y l a r g e t o
r e j e c t Ho. )
Thus i f one v a r i a b l e e x p l a i n s (1 i n e a r l y ) about 15% o f
t h e v a r i a n c e i n another, a sample o f around 30 w i l l l i k e l y be l a r g e
enough t o d e t e c t i t . T h i s should g i v e you some i n t u i t i v e " f e e l " f o r
the r e l a t i v e streugth o f
Irl
=
.35
.
W. F i n d i n g a 95% c o n f i d e n c e i n t e r v a l f o r r
RECALL t h a t
or2
=
1-
L
$ .
Thus when ( p 1 i s small, you would have a 1 a r g e r v a r i a n c e t h a n when
large!
Ip 1
is
As a r e s u l t , y o u r confidence i n t e r v a l must be " l o p s i d e d " w i t h a
l a r g e r d e v i a t i o n from r toward z e r o and a s m a l l e r d e v i a t i o n from r toward
k1.
To e s t i m a t e these d e v i a t i o n s we use Table E, which can be found a t t h e
end o f t h e S y l l a b u s .
Here a r e THE MECHANICS:
1. Table E p r o v i d e s a t r a n s f o r m a t i o n ( i . e . ,
v a r i a b l e w i t h a normal d i s t r i b u t i o n - a
u n r e l a t e d t o p ' s magnitude.
to a
variable with a distribution
Specifically,
2. I n t h e knee brace problem we have
confidence i n t e r v a l i s
r
a "mapping") of
r
=
.35
and
n
=
30
,
t h u s t h e 95%
3. From Table E we f i n d t h a t
T(.35)
=
.3654
.
Thus t h e c o n f i d e n c e
interval i s
4. BUT t h i s i s a c o n f i d e n c e i n t e r v a l around T ( r ) !
t h e s e two values back t o r ' s .
~-l(.743)
(-.012,
=
.631
.
This y i e l d s
So we must c o n v e r t
T-I(-,012)
=
-.012
r
Thus t h e 95% c o n f i d e n c e i n t e r v a l f o r
=
and
.35
is
.631).
5. N o t i c e how these c o n f i d e n c e bounds l o o k on a number l i n e :
I n p a r t i c u l a r , n o t i c e t h a t t h e bounds o f t h e i n t e r v a l a r e " l o p s i d e d . "
That i s , n o t e t h a t
1
.35 - (-.012)
A
X. The d i s t r i b u t i o n o f a when
x
=
0
1
=
.362 > .281
=
( .35 - .631
1 .
.
A
1. L e t a,
equal t h e c o n s t a n t i n a b i v a r i a t e r e g r e s s i o n e q u a t i o n i n which
t h e mean on t h e independent v a r i a b l e equals zero.
When X i s transformed
by s u b t r a c t i n g o u t i t s mean ( i . e . ,
then
A
when
=
0 ),
a
=
and
A
t h u s a,
i s an e s t i m a t e o f py.
NOTE: V a r i a b l e s a r e "centered" i f t h e i r means equal zero.
A
s u b s c r i p t s on a,
and Xc a r e reminders t h a t
Xc
I
0
The " c "
.
A
2. The standard e r r o r o f a,
i s u s u a l l y much s m a l l e r t h a n t h a t a s s o c i a t e d
4. We can now add t o t h i s graph t h e 95% p r e d i c t i o n i n t e r v a l around t h e
p r e d i c t i o n o f 999 r o b b e r i e s i n LA next year:
OR
(218 t o 1780)
-
a VERY wide p r e d i c t i o n i n t e r v a l !
5. F i n a l l y , note t h a t t h e smallest p r e d i c t i o n i n t e r v a l i s a t
OR
( 5.06,
12.94 )
OR
X
:
=
from 5 t o 13 r o b b e r i e s
AB. CONCLUSIONS:
a. The f u r t h e r t h e b a s i s o f your p r e d i c t i o n ( i . e . ,
b. t h e s m a l l e r t h e sample s i z e
X
P
) i s from
AND
AND
c. t h e smaller t h e estimated p o p u l a t i o n variance o f X,
THE LESS PRECISE YOUR PREDICTION w i l l be.
These conclusions f o l l o w as a d i r e c t consequence o f t h e formula f o r a
prediction interval :
NOTE: A c r i t i c a l value o f t ( r a t h e r than Z) w i l l be needed i n computing
both confidence and p r e d i c t i o n i n t e r v a l s whenever
k
=
n
-k
- 1 < 30
,
t h e number o f independent v a r i a b l e s i n t h e r e g r e s s i o n equation.
i n t h e b i v a r i a t e case, t h e number o f degrees o f freedom f o r t i s
where
Thus
n - 2
.
Stat 404
Assumptions Underlying Regression Analysis (continued)
A. Assumption 5: The design matrix, X, is fixed, or measured without error.
1. A researcher “fixes” values on an independent variable when subjects are assigned to
particular groups (e.g., as in a psychological experiment), or when “fixed” numbers of
subjects are sampled within strata (e.g., 50 males and 50 females).
2. When X has been fixed, one speaks of its variables as having “fixed effects.” If the
values of the independent variables in X are the unforeseen results of randomly sampled
cases, these variables are said to have “random effects” on the dependent variable. Even
though you may not have fixed X, it is important to be aware that you are assuming X to
be fixed (or measured without error) when you do regression analysis.
3. When a researcher has less control over the values of the independent variables (i.e.,
when their effects are random ones), she must assume that these variables are measured
without error. If one thinks of ex as the total influences that lead to inaccurate
measurement of the independent variable, x, one might depict this assumption as follows:
ex
=0
x
4. Note that we shall NOT deal with issues of measurement error in Stat 404.
1
B. Assumption 6: X is not correlated with errors in the measurement of Y.
1. Put differently, this assumption suggests that variances in X and Y are not related to any
third effect (e.g., time) such that they covary due to this third effect. A depiction of this
assumption for a particular independent variable, x, would be as follows:
eY
0=
x
Y
2. For our purposes the matrix, Y, is simply a vector of values for a single dependent
variable, y. Thus, ey is simply the error terms (i.e., the ei) from the regression of y on x.
3. What does it mean to say that the xi are uncorrelated with the ei?
a. To answer this question, let’s consider a theory of cognitive development according
to which children learn self-confidence by following the example of (or by
“modeling”) self-confident parents.
b. You assemble data to test this theory. However, only after collecting your data do
you realize (say, after spending a bit more time perusing other relevant literature) that
self-confidence is also associated with physical stature. The taller the child, the more
self-confident she is. Thus a child’s self-confidence has both psychological and
physiological origins. Given that physiology is genetically inherited, we might sketch
out the relations among parents’ and child’s stature and self-confidence as follows:
2
Parents’
physical
stature
Child’s
physical
stature
Parents’
selfconfidence
Child’s
selfconfidence
c. Unfortunately, you have no data on parents’ or child’s physiology, so you are left
with a causal model of the following form:
Other
causes (eY)
0≠
Parents’ selfconfidence
(x)
Child’s selfconfidence
(Y)
d. Since here child’s physical stature is among the “other causes” excluded from the
model and since this stature and parents’ self-confidence have a common origin in
parents’ physical stature, you could NOT assume that parents’ self-confidence is
uncorrelated with the errors from a regression of child’s self-confidence on parents’
self-confidence.
C. Assumption 7: No independent variable (i.e., no column in the design matrix, X) may
have all of its variance explained by any subset of the remaining independent variables.
The assumption can be put mathematically as follows:
3
Rxi . x1 ... xi−1 xi+1 ... xk < 1, ∀i
We shall soon discover that when this assumption is not met, your statistics program will
be unable to calculate any regression coefficients.
4
Download