Statistics 479 Assignment #2 (40 points)

Fall 2013
Note: You must provide hand-written (or typed) answers to these questions. Do not turn in SAS output as
the answer to any part.
1. In each of the following cases show the observations written to the SAS data set (variable names and
corresponding data values) when the given lines of data are read by the given INPUT statement:
(a) input Id Gender $ Age GPA SAT;
1906 M 18 3.15 720
3045 F . 3.85 683
2119 M 23
3.72
775
(b) input (Id Quiz1-Quiz4) (4. 4*3.);
3490432 16798 2
197857
36
84
(c) input Id $1-4 @8 Pulse 3. +2 Weight 4.1 Pushups;
L4WP236 85 92517 28
M5XQ 47
1423 32
158 79 81145 19
(d) input Name :$11. Id 4. Visit : mmddyy8.;
Wilson 3974 11/25/47
Worthington 1598 7/16/86
NOTE: The symbol denotes one space.
2. Sketch the output resulting from executing the following SAS program. Describe in your own words the
flow of operations in the data step in creating this data set.
data class;
input Score @@;
if Score > 50 then do;
Grade = "Pass";
Rating = (Score -50)/10;
end;
else if Score < 50 then do;
Grade = "Fail";
end;
datalines;
47 49 50 52 55 .
;
proc print data=class; run;
3. A data step reads the following data lines:
1
4.
5.
5.68
1.25
4.57
2
1
2.34
3
3.46
What would the SAS data set look like if the input statement used were each of the following? Write
a brief explanation of what you think takes place in each data step.
(a) input X Y;
(b) input X Y @@;
(c) input X;
(d) input X Y Z;
4. The program data vector in a SAS data step has variables with values as shown below:
Code = 'VLC'
Size = 'M'
V1 = 2
V2 = 3
V3 = 7
V4 = .
Determine the results of the following SAS expressions (as stored internally):
(a) (V1 + V2 -V3)/3
(b) V3 - V2/V1
(c) V1*V2 - V3
(d) V2*V3/V1
(e) V1**2 + V2**2
(f) Code = 'VLC'
(g) Code = 'VLC' & Size = 'M'
(h) Code = 'VLC' | Size = 'M'
(i) Code = 'VLC' & V4 ^= .
(j) (V3 = .) + (V2 = 3)
(k) V1 + V2 + V3 ^= 12
(l) Code = 'VLC' | (Size = 'M' & V1 = 3)
(m) 3 < V2 < 5
[Hint: Recall that logical expressions evaluate to 1 (‘TRUE’) or 0 (‘FALSE’)]
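As a small illustration of the hint, the following sketch shows a comparison being stored as a number and then used in arithmetic. The variable names A, B, Flag and Count and their values are hypothetical and are not taken from the problem above.

```sas
/* Hypothetical values, not taken from the problem above. */
data _null_;
   A = 4;
   B = .;
   Flag  = (A > 3);            /* a true comparison is stored as 1 */
   Count = (A > 3) + (B = .);  /* both comparisons are true: 1 + 1 */
   put Flag= Count=;           /* writes Flag=1 Count=2 to the log */
run;
```

Because the comparisons produce ordinary numeric values, they can be added, multiplied, or assigned like any other number.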
5. Show the values for the variable Miles that will be stored in the SAS dataset distance:
data distance;
input Miles 5.2;
datalines;
1
12
123
1234
12345
1.
12.
12.3
1234.5
;
6. Display the printed output produced from executing the following SAS program. Show what is in the
program data vector immediately after processing the first line of data. (5 points)
data carmart;
input Dept $ Id $ P82 P83 P84;
Drop P82 P83 P84;
Year=1982; Sales=P82; output;
Year=1983; Sales=P83; output;
Year=1984; Sales=P84; output;
datalines;
parts 176 3500 2500 800
parts 217 2644 3500 3000
tools 124 5672 6100 7400
tools 45 1253 4698 9345
repairs 26 9050 5450 8425
repairs 142
;
proc print; run;
7. Sketch the printed output produced by executing the following SAS program. Display the contents of the
program data vector at the point the first observation is to be written to the SAS data set.
data one;
input X1-X3 @@;
X3=3*X3-X1**2;
X4=sqrt(X2);
drop X1 X2;
datalines;
3
4
5
-2
9
3
-3
;
proc print data=one; run;
1
4
.
16
8
8. Study the following program:
data tests;
input Name $ Score1 Score2 Score3 Team $ ;
datalines;
Joe 11 32 76 red
Michael 13 29 82 blue
Susan 14 27 74 green
;
proc print; run;
(a) Sketch the printed output produced from executing this program.
(b) What would be the printed output if the input statement is changed to the following:
input name $ Score1 Score2 Score3;
(c) What would you do to modify the above program if the data value for the variable Score2 was
missing for Michael?
(d) Would the above input statement still work if the data lines were of the form given below? Explain
why or why not.
Joe
11 32 76
red
Michael
...
(e) Using the SAS function sum() or otherwise, write a single SAS assignment statement to create a
new variable called Total that contains the total test score for each individual. Where would you
insert this statement in the above program?
(f) If the raw data were available in the text file test.txt (with the data lines entered in the same
format) in the folder M:\stat479\class, modify the above data step to create the SAS data set
tests using this file.
9. A local high school collects data on student performance in grades 9 through 12. In grades 9 and 10, data
were collected for Science and English only, while for grades 11 and 12 Math scores were also recorded.
Unfortunately, the data were recorded as described below, resulting in two completely
different data layouts. Write a SAS program to create a temporary SAS data set called perform that
contains observations for all four years of high school by reading raw data (possibly containing hundreds of
data lines) laid out as described below. Turn in your SAS program.
Sample Data Lines:
                 1         2         3
Columns:123456789012345678901234567890
        ______________________________
        0962432736578
        118091315736792
        0945712817859
        125916294847689
        1076543057182
        112479329697883
Data Description: (note the two types of data lines depending on grade)

Grades 09 and 10:
Field  Variable Description  Columns  Type
1      Grade                 1-2      char (09 or 10)
2      Student Id            3-6      char
3      GPA                   7-9      numeric (with 2 decimals)
4      Science               10-11    numeric (whole number)
5      English               12-13    numeric (whole number)

Grades 11 and 12:
Field  Variable Description  Columns  Type
1      Grade                 1-2      char (11 or 12)
2      Student Id            3-6      char
3      GPA                   7-9      numeric (with 2 decimals)
4      Science               10-11    numeric (whole number)
5      Math                  12-13    numeric (whole number)
6      English               14-15    numeric (whole number)

Due Thursday, September 19, 2013