Stat401E Fall 2010 Lab 5

1. You have developed an Index of Leadership Potential (ILP) with scores that range on a 13-point scale from -6 to 6. Values on this scale are counts of each respondent's positive or negative ILP-responses to each of six questionnaire items. (For example, a score of 6 means that a respondent gave a positive ILP-response on each of the six items.) A negative score on the ILP indicates a deficiency in leadership potential, whereas a positive ILP score indicates an abundance of leadership potential. To validate the ILP as a measure of leadership potential, you administer the ILP to a sample of 8 business executives, each of whose supervisors has attested to the executive's stellar (i.e., abundant) leadership potential. The eight business executives' ILP scores are as follows:

4 2 4 2 2 -2 0 2

a. In this validation of the ILP you wish to test whether evidence of abundant leadership potential can be demonstrated based on ILP scores from a sample of individuals with unquestionably abundant leadership potential. State the null and alternative hypotheses needed to make this test. (Hints: Note that in this validation of the ILP you are NOT at all concerned with testing whether deficiencies in leadership potential can be demonstrated. Also remember to give the numerical value of the parameter-of-interest under the null hypothesis.)

b. Using the ILP data given above, what is the p-value associated with the statistic that is evaluated in the hypothesis test? (Hint: You may only be able to give an approximate p-value or to say that the p-value falls within a specific range of values. Also, you may assume that the underlying distribution of ILP scores is normally distributed among business executives.)

c. Could one conclude at the .05 significance level that ILP scores afford accurate reflections of abundant leadership potential? Explain your answer.

d.
In testing the hypothesis stated in part a, how large a sample would have been required to obtain a precision equal to "one-half of an ILP-point"? (Hint: Use the .05 significance level and the same estimate of the population variance as would be used in testing the hypotheses stated in part a.)

2. We shall be using data from the 1984 General Social Survey (GSS84)--a U.S. national probability sample assembled by the National Opinion Research Center (NORC). You can download these data onto your PC via the GSS84 link in our class web site’s “Assignments” page. Using this link, please save the file as “gss84.txt” into a convenient folder on your hard drive. After the file is saved, change the file’s extension from “txt” to “por”. (WARNING: Do NOT save the file as “gss84.por” from your browser, because the file will then be saved in the wrong [i.e., a nontext] format.)

To access the data, start SPSS-PC then select “File” then “Open” then “Data,” and then change “File of type:” to be “SPSS Portable (*.por).” If you then navigate to the just-mentioned “convenient folder” you should see “gss84.por” there. Clicking on the file name and pressing the “Open” button will place the data into SPSS’s Data Editor. You can now conduct statistical analyses.

You are strongly recommended to run your statistics lab problems using SPSS’s Syntax Editor, and NOT to run them using the program’s pull-down windows. The reason for this recommendation is that programs run from the Syntax Editor can be saved, whereas ones run from pull-down windows cannot be saved. If you are analyzing data (let’s say, as part of a thesis or dissertation) you may be asked how you obtained the numbers, for example, in Table 14. If you were to have saved the corresponding program (from the Syntax Editor) into a file called “table14.sps” you would be able to answer this question very easily.
Were you to have generated the output for Table 14 using pull-down windows, you might never be able to discover where the numbers in Table 14 came from. The Moral: Save your programs in syntax files or you will someday regret not having done so.

Now open the Syntax Editor by selecting “File” then “New” then “Syntax.” Once within the Syntax Editor type the following one-line program (not forgetting the period at the end of the line):

    frequencies vars = age / statistics = mean,stddev.

Next choose “Edit” then “Select all” then press the button with the black arrow-head pointing toward the right. The mean and standard deviation (plus individual data on the 1467 subjects with valid data on the age variable) will appear in the SPSS Viewer. You will need these two numbers (plus the “valid sample size” of 1467) in answering parts a and b below.

a. Assume that the NORC data are from a random sample of United States residents. (Actually the data comprise a multistage stratified cluster sample.) The variable, AGE, gives values that correspond to subjects' responses to the question, "How old are you?" Have the computer calculate the mean and standard deviation for this variable. (This will already have been done if you followed the above directions.) Then using the standard deviation estimate from your output to estimate the population standard deviation, give the 95% confidence interval for the average age in the US population.

b. If you wanted to estimate the average age in the US population to within one year of the actual population mean, how large a sample size would you need to do this at the .05 level of significance?

3. Solve both of the following for k. (Hints: Assume that k is a constant. Part b has a solution for k that is a real number; part a does not. Also, you may wish to (optionally) check your solutions by substituting them back into the original equations.):

n a. –4 n n Xi + k 2 = 8k i=1 2 –Xi – 3k 2 i=1 i=1 n n –1 Xij – 3 – X2j – X1j i 2 i=1j=1 2 2 j=1 b.
k = ---------------------------------------------------------------------------------------------------------n n X1j – X2j j=1 j=1

ANNOUNCEMENT: No more summation-notation problems after Lab 5!!!

Below please find R and SAS code for problem 2:

### R ###
###
### Directions: Copy the below R code into the "R Editor" window
### (accessed by selecting "New script" under the "File" pull-down
### menu), swipe the code, and press F5.
###
### Code:
### read lab5data.txt into "gss", and call its first column "age"
gss<-read.table('http://www.public.iastate.edu/~carlos/401/labs/lab5data.txt')
age<-gss[,1]
### get rid of missing data cases when age=99
age<-age[age!=99]
### results: mean, standard deviation, and sample size
mean(age)
sd(age)
length(age)

* SAS
* Directions:
* Copy lab5data.txt into the C-drive's root (i.e., into "C:/").
* Copy the below SAS code into the "Editor" window,
* and press the button with the figure of a little guy running.
* Code:
* read lab5data.txt into "gss";
data gss;
  infile 'C:\lab5data.txt';
  input age rincome sex fear papres16 prestige educ agewed xnorcsiz;
run;
* remove missing data while copying "gss" into "gssnew";
data gssnew;
  set gss;
  if (age=99 | age=98 | age=0 | age=-1) then delete;
run;
* results (the "means" procedure generates mean, standard deviation, etc. for the variable, age);
proc means data=gssnew;
  var age;
run;
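As a rough illustration of the arithmetic that follows once the mean, standard deviation, and sample size are in hand, the R sketch below computes a large-sample 95% confidence interval for a mean and the sample size needed for a chosen margin of error. The values of xbar, s, and n are hypothetical placeholders (they are NOT the GSS84 results), and mean_sample_size is a helper name introduced here only for illustration. The sketch assumes the normal approximation, with the sample standard deviation standing in for the population standard deviation.

```r
### Hypothetical summary statistics (NOT the GSS84 results);
### replace them with the output of mean(age), sd(age), and length(age).
xbar <- 50    # sample mean (assumed value)
s    <- 20    # sample standard deviation (assumed value)
n    <- 400   # valid sample size (assumed value)

### Large-sample 95% confidence interval for the population mean,
### using s as the estimate of the population standard deviation.
z  <- qnorm(0.975)     # two-sided critical value, about 1.96
se <- s / sqrt(n)      # standard error of the mean
ci <- c(lower = xbar - z * se, upper = xbar + z * se)
print(ci)

### Smallest sample size whose margin of error is at most E
### at the given confidence level (hypothetical helper).
mean_sample_size <- function(s, E, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  ceiling((z * s / E)^2)
}
print(mean_sample_size(s, E = 1))
```

With the actual values from mean(age), sd(age), and length(age) substituted for the placeholders, the same two steps apply to parts a and b of problem 2.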