Fall 2013 Statistics 479 Assignment #2 (40 points)

Note: You must provide hand-written (or typed) answers to these questions. Do not turn in SAS output as the answer to any part.

1. In each of the following cases, show the observations written to the SAS data set (variable names and corresponding data values) when the given data lines are read by the given INPUT statement:

   (a) input Id Gender $ Age GPA SAT;

       1906 M 18 3.15 720
       3045 F . 3.85 683
       2119 M 23 3.72 775

   (b) input (Id Quiz1-Quiz4) (4. 4*3.);

       3490432 16798 2 197857 36 84

   (c) input Id $1-4 @8 Pulse 3. +2 Weight 4.1 Pushups;

       L4WP236 85 92517 28 M5XQ 47 1423 32 158 79 81145 19

   (d) input Name :$11. Id 4. Visit : mmddyy8.;

       Wilson 3974 11/25/47
       Worthington 1598 7/16/86

   NOTE: The symbol denotes one space.

2. Sketch the output resulting from executing the following SAS program. Describe in your own words the flow of operations in the DATA step that creates this data set.

    data class;
       input Score @@;
       if Score > 50 then do;
          Grade = "Pass";
          Rating = (Score - 50)/10;
       end;
       else if Score < 50 then do;
          Grade = "Fail";
       end;
       datalines;
    47 49 50 52 55 .
    ;
    proc print data=class;
    run;

3. A DATA step reads the following data lines:

       1 4. 5. 5.68 1.25 4.57 2 1 2.34 3 3.46

   What would the resulting SAS data set look like if each of the following INPUT statements were used? Briefly explain what you think takes place in each DATA step.

   (a) input X Y;
   (b) input X Y @@;
   (c) input X;
   (d) input X Y Z;

4. The program data vector in a SAS DATA step has variables with the values shown below:

       Code = 'VLC'   Size = 'M'   V1 = 2   V2 = 3   V3 = 7   V4 = .

   Determine the results of the following SAS expressions (as stored internally):

   (a) (V1 + V2 - V3)/3
   (b) V3 - V2/V1
   (c) V1*V2 - V3
   (d) V2*V3/V1
   (e) V1**2 + V2**2
   (f) Code = 'VLC'
   (g) Code = 'VLC' & Size = 'M'
   (h) Code = 'VLC' | size = 'M'
   (i) Code = 'VLC' & V4 ^= .
   (j) (V3 = .) + (V2 = 3)
   (k) V1 + V2 + V3 ^= 12
   (l) Code = 'VLC' | (Size = 'M' & V1 = 3)
   (m) 3 < V2 < 5

   [Hint: Recall that logical expressions evaluate to 1 ('TRUE') or 0 ('FALSE').]

5. Show the values of the variable Miles that will be stored in the SAS data set distance:

    data distance;
       input Miles 5.2;
       datalines;
    1
    12
    123
    1234
    12345
    1.
    12.
    12.3
    1234.5
    ;

6. Display the printed output produced by executing the following SAS program. Show what is in the program data vector immediately after the first line of data is processed. (5 points)

    data carmart;
       input Dept $ Id $ P82 P83 P84;
       Drop P82 P83 P84;
       Year=1982; Sales=P82; output;
       Year=1983; Sales=P83; output;
       Year=1984; Sales=P84; output;
       datalines;
    parts 176 3500 2500 800
    parts 217 2644 3500 3000
    tools 124 5672 6100 7400
    tools 45 1253 4698 9345
    repairs 26 9050 5450 8425
    repairs 142
    ;
    proc print;
    run;

7. Sketch the printed output produced by executing the following SAS program. Display the contents of the program data vector at the point when the first observation is about to be written to the SAS data set.

    data one;
       input X1-X3 @@;
       X3=3*X3-X1**2;
       X4=sqrt(X2);
       drop X1 X2;
       datalines;
    3 4 5 -2 9 3 -3
    1 4 . 16 8
    ;
    proc print data=one;
    run;

8. Study the following program:

    data tests;
       input Name $ Score1 Score2 Score3 Team $;
       datalines;
    Joe 11 32 76 red
    Michael 13 29 82 blue
    Susan 14 27 74 green
    ;
    proc print;
    run;

   (a) Sketch the printed output produced by executing this program.
   (b) What would the printed output be if the INPUT statement were changed to the following?
       input name $ Score1 Score2 Score3;
   (c) How would you modify the above program if the data value of Score2 were missing for Michael?
   (d) Would the above INPUT statement still work if the data lines were of the form given below? Explain why or why not.
       Joe 11 32 76 red Michael ...
   (e) Using the SAS function sum() or otherwise, write a single SAS assignment statement that creates a new variable called Total containing the total test score for each individual.
       Where would you insert this statement in the above program?
   (f) If the raw data were available in the text file test.txt (with the data lines entered in the same format) in the folder M:\stat479\class, modify the above DATA step to create the SAS data set tests from this file.

9. A local high school collects data on student performance in grades 9 through 12. In grades 9 and 10, data were collected for Science and English only, while in grades 11 and 12, Math scores were also recorded. Unfortunately, the data were recorded as described below, which resulted in two completely different data layouts. Write a SAS program that creates a temporary SAS data set called perform. The data set should contain observations for all four years of high school, read from raw data (possibly containing hundreds of data lines) laid out as described below. Turn in your SAS program.

   Sample Data Lines:

                      1         2         3
    Columns: 123456789012345678901234567890
             ______________________________
             0962432736578
             118091315736792
             0945712817859
             125916294847689
             1076543057182
             112479329697883

   Data Description (note the two types of data lines, depending on grade):

   Grades 9 and 10:

    Field  Variable    Columns  Type
    1      Grade       1-2      char (09 or 10)
    2      Student Id  3-6      char
    3      GPA         7-9      numeric (with 2 decimals)
    4      Science     10-11    numeric (whole number)
    5      English     12-13    numeric (whole number)

   Grades 11 and 12:

    Field  Variable    Columns  Type
    1      Grade       1-2      char (11 or 12)
    2      Student Id  3-6      char
    3      GPA         7-9      numeric (with 2 decimals)
    4      Science     10-11    numeric (whole number)
    5      Math        12-13    numeric (whole number)
    6      English     14-15    numeric (whole number)

Due Thursday, September 19, 2013