Statistics 6250    Fall 2012    Prof. Fan

Name: __________________ (print: first last)    NetID #: ________________

Midterm Two

Instructions: This is an in-class, open-book midterm. No internet access (except our class website) is allowed! You must write your answers in the spaces provided.

Multiple Choice (there may be more than one correct answer)

1. The following program is submitted:

    data work.firsthalf work.thirdqtr work.misc;
       set sashelp.retail;
       if 1<=month<=6 then output work.firsthalf;
       else if 7<=month<=9 then output work.thirdqtr;
    run;

   Which of the following statements is true regarding the previous program with an observation having month equal to 12?
   a. The observation will be output to the work.firsthalf data set.
   b. The observation will be output to the work.thirdqtr data set.
   c. The observation will be output to the work.misc data set.
   d. The observation will not be output to any data set.
   Answer: D

2. Given the input data set products:

    CODE   PRODUCT
    A123   Sandal
    A234   Slipper
    B345   Boot
    B456   Sneaker

   Given the input data set costs:

    CODE   COST
    A123   19.99
    A234    9.99
    B456   25.99

   The following program is submitted:

    data prodcost;
       merge products(in=p) costs(in=c);
       by code;
       if p and c;
    run;

   Which of the following are the results?
   a. The program fails execution because of invalid IN= syntax.
   b. The program runs with(out) warnings that the subsetting IF statement is incomplete.
   c. The program runs without errors or warnings and produces a data set with three observations and three variables.
   d. The program runs without errors or warnings and produces a data set with four observations and three variables.
   Answer: B or C

3. The following program is submitted:

    data personnel;
       hired='01MAR2003'd;
       name='William Smith';
    run;

   Which of the following is true regarding the variables created with the assignment statements?
   a. The variables hired and name are both 8 bytes.
   b. The variable hired is 8 bytes and name is 13 bytes.
   c. The variables hired and name are both character.
   d. The variable hired is numeric and name is character.
   Answer: B, D

4. Given the SAS data set birth:

    NAME   STATE
    Tim    CA
    Sue    IN
    Bill   NY

   The following SAS program is submitted:

    data birthregion;
       set birth;
       if state='CA' then do;
          region='West';
       end;
       else if state='NY' then do;
          region='East';
    run;

   What is the result?
   a. The program fails execution because of invalid DO block syntax.
   b. The program fails execution because there is not a DO block for the state value of IN.
   c. The program runs without errors or warnings and produces a data set with two observations and three variables.
   d. The program runs without errors or warnings and produces a data set with three observations and three variables.
   Answer: D

5. Which of the following is true regarding the sum statement?
   a. The sum statement can only be used for variables being read in from a SET statement.
   b. The sum statement initializes the variable to zero before the first iteration of the DATA step.
   c. The sum statement automatically retains the variable value without using a RETAIN statement.
   d. The sum statement produces an error if a missing value is added to the accumulator variable.
   Answer: B, C

6. Which of the following ARE valid syntax for SELECT and WHEN statements?
   a. select(salary);
         when (<100000) status='Non-Exec';
         when (>=100000) status='Exec';
      end;
   b. select(salary);
         when salary<100000 status='Non-Exec';
         when salary>=100000 status='Exec';
      end;
   c. select;
         when (salary<100000) status='Non-Exec';
         when (salary>=100000) status='Exec';
      end;
   d. select;
         when salary<100000 status='Non-Exec';
         when salary>=100000 status='Exec';
      end;
   Answer: C

Question One (8 points)

Three SAS data sets, test1, test2, and test3, contain information on a test of five questions. test1 contains the scores on the first three questions for the first three subjects, together with the test dates. test2 contains the scores on the first three questions for the next three subjects. test3 contains the scores on the last two questions for these subjects.
We would like to combine all the information in the three files into one data file step by step as follows.

    data test1;
       input ID $ date $10. Q1-Q3;
       datalines;
    02 02/04/2008 4 1 3
    01 03/05/2008 3 5 4
    03 06/03/2008 9 8 7
    ;

    data test2;
       input NO $ Q1-Q3;
       datalines;
    04 3 6 4
    05 6 7 7
    06 8 3 5
    ;

    data test3;
       input ID $ Q4-Q5;
       datalines;
    01 7 4
    03 8 8
    06 6 9
    05 5 7
    ;

(a) [2 points] The variable “date” in test1 was incorrectly read as a character variable. Without reading the data again, fix the problem and write your SAS code.

Answer:

    data test1;
       set test1(rename=(date=char_date));
       date=input(char_date, mmddyy10.);
       format date mmddyy10.;
       drop char_date;
    run;

(b) [2 points] Create a SAS data file called “five” which contains all information in test1, test2 and test3, i.e. combine the three files and call it “five”. Print your data file “five”; make sure date data are printed by mm/dd/yyyy format. Write your SAS code and the PROC PRINT output of columns of Q1, date and Q5 here.

Answer:

    proc sort data=test1; by ID;
    proc sort data=test2; by NO;
    proc sort data=test3; by ID;
    run;

    data five;
       merge test1 test2(rename=(NO=ID)) test3;
       by ID;
    run;

    proc print data=five noobs;
       var Q1 date Q5;
    run;

    Q1    date          Q5
     3    03/05/2008     4
     4    02/04/2008     .
     9    06/03/2008     8
     3             .     .
     6             .     7
     8             .     9

(c) [2 points] Add two variables into the data file five: 1) mean_score, the mean score of the non-missing questions (of each ID) and 2) counts, the number of nonmissing questions (of each ID). Print this data file and copy the columns of ID, counts and mean_score here. Also write your SAS code here.

Answer:

    data five;
       set five;
       counts=n(of Q1-Q5);
       mean_score=mean(of Q1-Q5);
    run;

    proc print data=five;
       var id counts mean_score;
    run;

    Obs    ID    counts    mean_score
     1     01      5        4.60000
     2     02      3        2.66667
     3     03      5        8.00000
     4     04      3        4.33333
     5     05      5        6.40000
     6     06      5        6.20000

(d) Draw the plot illustrating the relation between counts and mean_score. Sketch your plot and describe the relation. Write your SAS code here.
Answer:

    proc gplot data=five;
       plot mean_score*counts;
    run;

The plot shows a positive (increasing) association:

    [Sketch: scatter plot of mean_score (vertical axis, 1 to 8) against counts (horizontal axis, 3 to 5)]

Question Two (8 points)

Data set Study (in the library “learn”) is shown below:

    data study;
       input Subj : $3. Group : $1. Dose : $4. Weight : $8. Subgroup;
       datalines;
    001 A Low 220lbs. 2
    002 A High 90Kg. 1
    003 B Low 88kg 1
    004 B High 165lbs. 2
    005 A Low 88kG 1
    ;

(a) [2 points] Create a new SAS data file (study1) in which a variable “DoseGroup” is added by putting Dose and Group together, separated by a slash (/), and then Dose and Group are both dropped. Make sure there are no blanks in this value. Use “PROC CONTENTS” to test this and copy the output of “Alphabetic List of Variables and Attributes”.

Answer:

    data study1;
       set learn.study;
       *DoseGroup = catx('/',Dose,Group);
       DoseGroup=strip(Dose)||'/'||Strip(Group);
       drop group dose;
    run;

    title "Listing of STUDY";
    proc contents data=study1;
    run;

    #    Variable     Type    Len
    4    DoseGroup    Char      6
    3    Subgroup     Num       8
    1    Subj         Char      3
    2    Weight       Char      8

(b) [6 points] We will clean the weight data in this part. As seen in the data list, the units of weight are not consistent. Create a new SAS data file (study2) with a numeric variable called Wtkg that represents weight in kilograms, rounded to the nearest kilogram. Print your study2 file and copy the columns of Subj and Wtkg here. Also write your SAS code here. Note: 1 kilogram = 2.2 pounds.

Answer:

    data study2;
       set study1;
       Weightkg = input(compress(Weight,,'kd'),8.);
       if find(Weight,'KG','i') then Weightkg = round(Weightkg,1);
       else if find(Weight,'LB','i') then Weightkg = round(Weightkg/2.2,1);
    run;

    proc print data=study2 noobs;
    run;

    Subj    Weight     Subgroup    DoseGroup    Weightkg
    001     220lbs.       2        Low/A           100
    002     90Kg.         1        High/A           90
    003     88kg          1        Low/B            88
    004     165lbs.       2        High/B           75
    005     88kG          1        Low/A            88

Question Three

Briefly explain the difference between the two SAS statements: “MERGE file1 file2;” and “UPDATE file1 file2;”. Use examples to illustrate your points.
Answer: Both statements combine the contents of the two files by an ID variable, and both replace the information in file1 with the information in file2 when the variable names and ID values match. However, UPDATE does not replace a value in file1 with a corresponding missing value in file2, whereas MERGE makes the replacement regardless.

Example: (Only the data for ID 04 in test4 differ from those in test2.)

    data test4;
       input ID $ Q1-Q3;
       datalines;
    04 4 . .
    05 6 7 7
    06 8 3 5
    ;

    proc sort data=test2; by NO;   /* test2 stores its ID in the variable NO */
    proc sort data=test4; by ID;
    run;

    data result1;
       merge test2(rename=(NO=ID)) test4;
       by ID;
    run;

    data result2;
       update test2(rename=(NO=ID)) test4;
       by ID;
    run;

Result1 data (MERGE):

    ID    Q1    Q2    Q3
    04     4     .     .
    05     6     7     7
    06     8     3     5

Result2 data (UPDATE):

    ID    Q1    Q2    Q3
    04     4     6     4
    05     6     7     7
    06     8     3     5
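
One way to confirm where the two statements diverge is to compare their outputs directly. The following is only a sketch under stated assumptions: the data set names master, trans, merged, and updated are hypothetical and not part of the exam, and the two small input data sets are entered already sorted by ID so that no PROC SORT step is needed.

```sas
/* Hypothetical data: master holds complete scores, trans holds a partial correction. */
data master;
   input ID $ Q1-Q3;
   datalines;
04 3 6 4
05 6 7 7
;

data trans;
   input ID $ Q1-Q3;
   datalines;
04 4 . .
;

/* MERGE: every value in trans, missing or not, overwrites the master value. */
data merged;
   merge master trans;
   by ID;
run;

/* UPDATE: missing values in trans leave the master values unchanged. */
data updated;
   update master trans;
   by ID;
run;

/* PROC COMPARE reports the values that differ between the two results. */
proc compare base=merged compare=updated;
   id ID;
run;
```

With these inputs, the comparison should flag Q2 and Q3 for ID 04, the two values that MERGE set to missing and UPDATE kept. Note also that UPDATE requires a BY statement, whereas MERGE without BY performs a one-to-one merge.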