Lab 5

Stat401E
Fall 2010
1. You have developed an Index of Leadership Potential (ILP)
with scores that range on a 13-point scale from -6 to 6.
Values on this scale are counts of each respondent's positive or
negative ILP-responses to each of six questionnaire items. (For
example, a score of 6 means that a respondent gave a positive
ILP-response on each of the six items.) A negative score on the ILP
indicates a deficiency in leadership potential, whereas a positive ILP
score indicates an abundance of leadership potential. To validate the
ILP as a measure of leadership potential, you administer the ILP to a
sample of 8 business executives, each of whose supervisors has attested
to the executive's stellar (i.e., abundant) leadership potential. The
eight business executives' ILP scores are as follows:
4
2
4
2
2
-2
0
2
a. In this validation of the ILP you wish to test whether evidence
of abundant leadership potential can be demonstrated based on ILP
scores from a sample of individuals with unquestionably abundant
leadership potential. State the null and alternative hypotheses
needed to make this test. (Hints: Note that in this validation of
the ILP you are NOT at all concerned with testing whether
deficiencies in leadership potential can be demonstrated. Also
remember to give the numerical value of the parameter-of-interest
under the null hypothesis.)
b. Using the ILP data given above, what is the p-value associated
with the statistic that is evaluated in the hypothesis test? (Hint:
You may only be able to give an approximate p-value or to say that
the p-value falls within a specific range of values. Also, you may
assume that the underlying distribution of ILP scores is normally
distributed among business executives.)
c. Could one conclude at the .05 significance level that ILP scores
afford accurate reflections of abundant leadership potential?
Explain your answer.
d. In testing the hypothesis stated in part a, how large a sample
would have been required to obtain a precision equal to "one-half of
an ILP-point"? (Hint: Use the .05 significance level and the same
estimate of the population variance as would be used in testing the
hypotheses stated in part a.)
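As a minimal sketch of the arithmetic behind part b, assuming the intended test is a one-sample t-test of the mean ILP score against a null value of zero, the R code below computes the test statistic and an upper-tail p-value from the eight scores listed above. The variable names (ilp, mu0, t_stat, p_value) are illustrative and are not part of the lab; with a printed t table you would instead bracket the p-value between two tabled values.

```r
# ILP scores of the eight executives (from the problem statement)
ilp <- c(4, 2, 4, 2, 2, -2, 0, 2)

n    <- length(ilp)   # sample size
xbar <- mean(ilp)     # sample mean
s    <- sd(ilp)       # sample standard deviation (n - 1 in the denominator)
se   <- s / sqrt(n)   # estimated standard error of the mean

# Assumption: the null-hypothesis value of the population mean is 0
mu0    <- 0
t_stat <- (xbar - mu0) / se

# Upper-tail p-value from the t distribution with n - 1 degrees of freedom
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Cross-check against R's built-in one-sample t-test
check <- t.test(ilp, mu = mu0, alternative = "greater")

print(c(t = t_stat, df = n - 1, p = p_value))
```

The hand computation and t.test() should agree; if they do not, the most likely cause is a two-sided rather than a one-sided alternative.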
2. We shall be using data from the 1984 General Social
Survey (GSS84)--a U.S. national probability sample assembled
by the National Opinion Research Center (NORC). You can download
these data onto your PC via the GSS84 link in our class web
site’s “Assignments page.” Using this link, please save the
file as “gss84.txt” into a convenient folder on your hard
drive. After the file is saved, change the file’s extension
from “txt” to “por”. (WARNING: Do NOT save the file as
“gss84.por” from your browser, because the file will then be saved in
the wrong [i.e., a nontext] format.) To access the data, start SPSS-PC
then select “File” then “Open” then “Data,” and then change “File of
type:” to be “SPSS Portable (*.por).” If you then navigate to the
just-mentioned “convenient folder” you should see “gss84.por” there.
Clicking on the file name and pressing the “Open” button will place the
data into SPSS’s Data Editor. You can now conduct statistical analyses.
You are strongly advised to run your statistics lab problems using
SPSS’s Syntax Editor, and NOT to run them using the program’s pull-down
windows. The reason for this recommendation is that programs run from
the Syntax Editor can be saved, whereas ones run from pull-down windows
cannot be saved. If you are analyzing data (let’s say, as part of a
thesis or dissertation) you may be asked how you obtained the numbers,
for example, in Table 14. If you were to have saved the corresponding
program (from the Syntax Editor) into a file called “table14.sps” you
would be able to answer this question very easily. Were you to have
generated the output for Table 14 using pull-down windows, you might
never be able to discover where the numbers in Table 14 came from. The
Moral: Save your programs in syntax files or you will someday regret
not having done so.
Now open the Syntax Editor by selecting “File” then “New” then
“Syntax.” Once within the Syntax Editor type the following one-line
program (not forgetting the period at the end of the line):
frequencies vars = age / statistics = mean,stddev.
Next choose “Edit” then “Select all” then press the button with the
black arrow-head pointing toward the right. The mean and standard
deviation (plus individual data on the 1467 subjects with valid data on
the age variable) will appear in the SPSS Viewer. You will need these
two numbers (plus the “valid sample size” of 1467) in answering parts a
and b below.
a. Assume that the NORC data are from a random sample of United
States residents. (Actually the data comprise a multistage
stratified cluster sample.) The variable, AGE, gives values that
correspond to subjects' responses to the question, "How old are
you?" Have the computer calculate the mean and standard
deviation for this variable. (This will already have been
done if you followed the above directions.) Then using the
standard deviation estimate from your output to estimate
the population standard deviation, give the 95%
confidence interval for the average age in the US
population.
b. If you wanted to estimate the average age in the US population to
within one year of the actual population mean, how large a sample
size would you need to do this at the .05 level of significance?
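The two calculations in parts a and b can be sketched in R as follows, assuming the usual large-sample (normal) formulas: the interval is the sample mean plus or minus z times s / sqrt(n), and the required sample size is (z * s / margin)^2 rounded up. The function names ci_mean and n_for_margin are hypothetical helpers, and the demonstration values are placeholders, NOT the GSS84 results; substitute the mean, standard deviation, and valid sample size from your own output.

```r
# Normal-approximation confidence interval for a population mean
ci_mean <- function(xbar, s, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)        # e.g., about 1.96 for 95%
  half_width <- z * s / sqrt(n)
  c(lower = xbar - half_width, upper = xbar + half_width)
}

# Smallest sample size whose interval half-width is at most `margin`
n_for_margin <- function(s, margin, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  ceiling((z * s / margin)^2)
}

# Hypothetical placeholder values, NOT taken from the GSS84 data
xbar_demo <- 50
s_demo    <- 10
n_demo    <- 100

demo_ci <- ci_mean(xbar_demo, s_demo, n_demo)
demo_n  <- n_for_margin(s_demo, margin = 1)

print(demo_ci)
print(demo_n)
```

Rounding up with ceiling() matters here: rounding down would give an interval slightly wider than the requested margin.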
3. Solve both of the following for k. (Hints: Assume that k is a
constant. Part b has a solution for k that is a real number; part a does
not. Also, you may wish to (optionally) check your solutions by
substituting them back into the original equations.):
a. $-4\sum_{i=1}^{n}(X_i + k)^2 = 8k\sum_{i=1}^{n}(-X_i) - \sum_{i=1}^{n}(3k)^2$

b. $k = \dfrac{\sum_{i=1}^{2}\sum_{j=1}^{n}(-1)^i (X_{ij} - 3)^2 - \sum_{j=1}^{n}\left(X_{2j}^2 - X_{1j}^2\right)}{\sum_{j=1}^{n}X_{1j} - \sum_{j=1}^{n}X_{2j}}$
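The hint that k is a constant is what makes these problems tractable: a constant summed n times equals n times the constant, constants factor out of a sum, and a squared binomial inside a sum expands term by term. The R sketch below checks these three rules numerically. The data vector x and the value of k are made up purely for illustration; the same substitution approach can be used to check a solution for k.

```r
# Made-up data and constant, used only to illustrate the summation rules
x <- c(1, 4, -2, 7, 3)
k <- 2.5
n <- length(x)

# Rule 1: summing a constant n times gives n * k
lhs1 <- sum(rep(k, n))
rhs1 <- n * k

# Rule 2: a constant multiplier factors out of a sum
lhs2 <- sum(k * x)
rhs2 <- k * sum(x)

# Rule 3: a squared binomial expands inside the sum
lhs3 <- sum((x + k)^2)
rhs3 <- sum(x^2) + 2 * k * sum(x) + n * k^2

# Each comparison should print TRUE
print(isTRUE(all.equal(lhs1, rhs1)))
print(isTRUE(all.equal(lhs2, rhs2)))
print(isTRUE(all.equal(lhs3, rhs3)))
```

all.equal() is used instead of == so that floating-point rounding does not produce a spurious mismatch.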
ANNOUNCEMENT: No more summation-notation problems after Lab 5!!!
Below please find R and SAS code for problem 2:
### R
### Directions:
###   Copy the below R code into the "R Editor" window (accessed by
###   selecting "New script" under the "File" pull-down menu), swipe the
###   code, and press F5.
### Code:
### read lab5data.txt into "gss", and call its first column "age"
gss<-read.table('http://www.public.iastate.edu/~carlos/401/labs/lab5data.txt')
age<-gss[,1]
### get rid of missing data cases when age=99
age<-age[age!=99]
### results: mean, standard deviation, and sample size
mean(age)
sd(age)
length(age)
* SAS
* Directions:
* Copy lab5data.txt into the C-drive's root (i.e., into "C:/").
* Copy the below SAS code into the "Editor" window,
*   and press the button with the figure of a little guy running.
* Code:
* read lab5data.txt into "gss";
data gss;
infile 'C:\lab5data.txt';
input age rincome sex fear papres16 prestige educ agewed xnorcsiz;
run;
* remove missing data while copying "gss" into "gssnew";
data gssnew;
set gss;
if (age=99 | age=98 | age=0 | age=-1) then delete;
run;
* results (the "means" procedure generates mean, standard deviation,
etc. for the variable, age);
proc means data=gssnew;
var age;
run;