Statistics 3900

Statistics 6250
Fall 2012
Prof. Fan
Name:__________________ (print: first last)
NetID #:________________
Midterm Two
Instructions: This is an in-class, open-book midterm. No internet access (except our
class website) is allowed. You must write your answers in the spaces provided.
Multiple Choice (there may be more than one correct answer)
1. The following program is submitted:
data work.firsthalf work.thirdqtr work.misc;
set sashelp.retail;
if 1<=month<=6 then output work.firsthalf;
else if 7<=month<=9 then output work.thirdqtr;
run;
Which of the following statements is true regarding the previous program with an
observation having month equal to 12?
a. The observation will be output to the work.firsthalf data set.
b. The observation will be output to the work.thirdqtr data set.
c. The observation will be output to the work.misc data set.
d. The observation will not be output to any data set.
Answer: D
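Note: a minimal sketch of why D holds. Once a DATA step contains an explicit OUTPUT statement, the implicit output at the end of the step is turned off, so an observation that matches no condition is written to no data set. The data set work.demo below is a hypothetical stand-in for sashelp.retail, and the final ELSE is an addition (not part of the question) showing one way to route such observations to work.misc.

```sas
/* work.demo is a hypothetical stand-in for sashelp.retail */
data work.demo;
  input month;
  datalines;
3
8
12
;
run;

data work.firsthalf work.thirdqtr work.misc;
  set work.demo;
  if 1<=month<=6 then output work.firsthalf;
  else if 7<=month<=9 then output work.thirdqtr;
  else output work.misc;  /* without this line, month=12 is written to no data set */
run;
```

With the extra ELSE, each of the three data sets should receive one observation; without it, work.misc is created but stays empty.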
2. Given the input data set products:
CODE PRODUCT
A123 Sandal
A234 Slipper
B345 Boot
B456 Sneaker
Given the input data set costs:
CODE COST
A123 19.99
A234 9.99
B456 25.99
The following program is submitted:
data prodcost;
merge products(in=p) costs(in=c);
by code;
if p and c;
run;
Which of the following are the results?
a. The program fails execution because of invalid IN= syntax.
b. The program runs with(out) warnings that the subsetting IF statement is incomplete.
c. The program runs without errors or warnings and produces a data set with three
observations and three variables.
d. The program runs without errors or warnings and produces a data set with four
observations and three variables.
Answer: B or C
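Note: the result in option c can be checked with a short sketch that recreates the two input data sets from the listings above. The INPUT statements are an assumption about how the data were read; the merge step is the one from the question.

```sas
data products;
  input code $ product $;
  datalines;
A123 Sandal
A234 Slipper
B345 Boot
B456 Sneaker
;
run;

data costs;
  input code $ cost;
  datalines;
A123 19.99
A234 9.99
B456 25.99
;
run;

/* Both inputs are already in CODE order, so no PROC SORT is needed here */
data prodcost;
  merge products(in=p) costs(in=c);
  by code;
  if p and c;   /* keep only codes present in both data sets */
run;

proc print data=prodcost noobs;
run;
```

Code B345 has no match in costs, so it is dropped by the subsetting IF. The IN= variables p and c are temporary and are not written to prodcost, which leaves the three variables CODE, PRODUCT, and COST.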
3. The following program is submitted:
data personnel;
hired='01MAR2003'd;
name='William Smith';
run;
Which of the following is true regarding the variables created with the assignment
statements?
a. The variables hired and name are both 8 bytes.
b. The variable hired is 8 bytes and name is 13 bytes.
c. The variables hired and name are both character.
d. The variable hired is numeric and name is character.
Answer: B, D
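Note: one way to confirm the types and lengths is to run PROC CONTENTS on the data set from the question. A date constant such as '01MAR2003'd is stored as a number, and a character variable created by an assignment takes its length from the first value assigned to it.

```sas
data personnel;
  hired='01MAR2003'd;      /* date constant: stored as a numeric value */
  name='William Smith';    /* 13 characters, so name gets length 13 */
run;

proc contents data=personnel;
run;
```

The variable list should show hired as Num with length 8 and name as Char with length 13.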
4. Given the SAS data set birth:
NAME  STATE
Tim   CA
Sue   IN
Bill  NY
The following SAS program is submitted:
data birthregion;
set birth;
if state='CA' then do;
region='West';
end;
else if state='NY' then do;
region='East';
end;
run;
What is the result?
a. The program fails execution because of invalid DO block syntax.
b. The program fails execution because there is not a DO block for the state value of IN.
c. The program runs without errors or warnings and produces a data set with two
observations and three variables.
d. The program runs without errors or warnings and produces a data set with three
observations and three variables.
Answer: D
5. Which of the following is true regarding the sum statement?
a. The sum statement can only be used for variables being read in from a SET statement.
b. The sum statement initializes the variable to zero before the first iteration of the DATA
step.
c. The sum statement automatically retains the variable value without using a RETAIN
statement.
d. The sum statement produces an error if a missing value is added to the accumulator
variable.
Answer: B, C
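Note: a minimal sketch of the sum statement, using a made-up data set (totals and amount are hypothetical names). It illustrates options b and c, and also why d is false: a missing value is skipped rather than causing an error.

```sas
data totals;
  input amount;
  total + amount;   /* sum statement: total starts at 0 and is retained */
  datalines;
10
.
5
;
run;

proc print data=totals noobs;
run;
```

The running total should read 10, 10, 15: the missing amount in the second observation leaves total unchanged.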
6. Which of the following ARE valid syntax for SELECT and WHEN statements?
a.
select(salary);
when (<100000) status='Non-Exec';
when (>=100000) status='Exec';
end;
b.
select(salary);
when salary<100000 status='Non-Exec';
when salary>=100000 status='Exec';
end;
c.
select;
when (salary<100000) status='Non-Exec';
when (salary>=100000) status='Exec';
end;
d.
select;
when salary<100000 status='Non-Exec';
when salary>=100000 status='Exec';
end;
Answer: C
Question One (8 points)
Three SAS data sets, test1, test2, and test3, contain the results of a five-question test.
test1 contains the first three scores of the first three subjects, together with the test
dates. test2 contains the first three scores of the next three subjects. test3 contains the
last two scores of these subjects. We would like to combine all the information in the
three data sets into one data file, step by step, as follows.
data test1;
input ID $ date $10. Q1-Q3;
datalines;
02 02/04/2008 4 1 3
01 03/05/2008 3 5 4
03 06/03/2008 9 8 7
;
data test2;
input NO $ Q1-Q3;
datalines;
04 3 6 4
05 6 7 7
06 8 3 5
;
data test3;
input ID $ Q4-Q5;
datalines;
01 7 4
03 8 8
06 6 9
05 5 7
;
(a) [2 points] The variable “date” in test1 was incorrectly read as a character variable.
Without reading the data again, fix the problem and write your SAS code.
Answer:
data test1;
set test1(rename=(date=char_date));
date=input(char_date, mmddyy10.);
format date mmddyy10.;
drop char_date;
run;
(b) [2 points] Create a SAS data file called “five” that combines all the information in
test1, test2, and test3. Print “five”, making sure the dates are printed in mm/dd/yyyy
format. Write your SAS code and the PROC PRINT output for the columns Q1, date, and
Q5 here.
Answer:
proc sort data=test1;
by ID;
proc sort data=test2;
by NO;
proc sort data=test3;
by ID;
run;
data five;
merge test1 test2(rename=(NO=ID)) test3;
by ID;
run;
proc print data=five noobs;
var Q1 date Q5;
run;
Q1  date        Q5
3   03/05/2008  4
4   02/04/2008  .
9   06/03/2008  8
3   .           .
6   .           7
8   .           9
(c) [2 points] Add two variables to the data file five: 1) mean_score, the mean
score of the non-missing questions (for each ID), and 2) counts, the number of
non-missing questions (for each ID). Print this data file and copy the columns ID,
counts, and mean_score here. Also write your SAS code here.
Answer:
data five;
set five;
counts=n(of Q1-Q5);
mean_score=mean(of Q1-Q5);
run;
proc print data=five;
var id counts mean_score;
run;
Obs  ID  counts  mean_score
1    01  5       4.60000
2    02  3       2.66667
3    03  5       8.00000
4    04  3       4.33333
5    05  5       6.40000
6    06  5       6.20000
(d) Draw the plot illustrating the relation between counts and mean_score. Sketch
your plot and describe the relation. Write your SAS code here.
Answer:
proc gplot data=five;
plot mean_score*counts;
run;
Relation: positive/increasing association.
[Plot: mean_score (vertical axis, ticks 1 to 8) versus counts (horizontal axis, ticks 3 to 5)]
Question Two (8 points)
Data set Study (in the library “learn”) is shown below:
data study;
input Subj   : $3.
      Group  : $1.
      Dose   : $4.
      Weight : $8.
      Subgroup;
datalines;
001 A Low 220lbs. 2
002 A High 90Kg. 1
003 B Low 88kg 1
004 B High 165lbs. 2
005 A Low 88kG 1
;
(a) [2 points] Create a new SAS data file (study1) in which a variable “DoseGroup”
is added by putting Dose and Group together, separated by a slash (/), and then
Dose and Group are both dropped. Make sure there are no blanks in this value.
Use “PROC CONTENTS” to test this and copy the output of “Alphabetic List of
Variables and Attributes”.
Answer:
data study1;
set learn.study;
*DoseGroup = catx('/',Dose,Group);
DoseGroup=strip(Dose)||'/'||Strip(Group);
drop group dose;
run;
title "Listing of STUDY";
proc contents data=study1;
run;
#  Variable   Type  Len
4  DoseGroup  Char  6
3  Subgroup   Num   8
1  Subj       Char  3
2  Weight     Char  8
(b) [6 points] We will clean the weight data in this part. As seen in the data list, the
units of weight are not consistent. Create a new SAS data file (study2) with a
numeric variable called Wtkg that represents weight in kilograms, rounded to the
nearest kilogram. Print your study2 file and copy the columns of Subj and Wtkg
here. Also write your SAS code here. Note: 1 kilogram = 2.2 pounds.
data study2;
set study1;
Weightkg = input(compress(Weight,,'kd'),8.);
if find(Weight,'KG','i') then Weightkg = round(Weightkg,1);
else if find(Weight,'LB','i') then Weightkg =
round(Weightkg/2.2,1);
run;
proc print data=study2 noobs;
run;
Subj  Weight   Subgroup  DoseGroup  Weightkg
001   220lbs.  2         Low/A      100
002   90Kg.    1         High/A     90
003   88kg     1         Low/B      88
004   165lbs.  2         High/B     75
005   88kG     1         Low/A      88
Question Three
Briefly explain the difference between the two SAS statements: “MERGE file1 file2;”
and “UPDATE file1 file2;”. Use examples to illustrate your points.
Answer:
Both statements combine two data sets by a BY variable (here an ID) and, where variable
names and ID values match, replace the values from file1 with those from file2. The
difference is in how missing values are handled: by default, UPDATE does not overwrite a
value in file1 with a missing value from file2, whereas MERGE overwrites it regardless.
UPDATE also requires a BY statement.
Example (test4 differs from test2 only in the row for ID 04):
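data test4;
input ID $ Q1-Q3;
datalines;
04 4 . .
05 6 7 7
06 8 3 5
;
* Assumption: test2 still names its key NO as in Question One, so rename it to ID;
data test2;
set test2(rename=(NO=ID));
run;
proc sort data=test2;
by ID;
proc sort data=test4;
by ID;
run;
data result1;
merge test2 test4;
by ID;
run;
data result2;
update test2 test4;
by ID;
run;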
Result1 data (MERGE):
ID  Q1  Q2  Q3
04  4   .   .
05  6   7   7
06  8   3   5
Result2 data (UPDATE):
ID  Q1  Q2  Q3
04  4   6   4
05  6   7   7
06  8   3   5