Statistics 479 Assignment #1 (30 points)

Fall 2013
1. Start a SAS session under SAS System for Windows 9.3, enter the SAS program given in
Example A5 in the textbook into the editor window, and submit it for execution. (Alternatively, you may cut and paste the program from the Download SAS Programs webpage or
drag and drop the program from the Stat479 folder on the Gilman 2272 lab Desktop.) Edit this
program to add the ODS statements needed to direct the output to an rtf file saved in your
U: drive folder. Submit your program for execution, examine the resulting log window
and the results viewer, and compare them to the results shown in the textbook. (Resubmit
the program with corrections if necessary.) Open the rtf file in WORD and add the
contents of both the program and SAS log windows obtained from executing the final version
of the program (by cutting and pasting). Print this page (it should be just one page) on a
printer like those in Gilman 2272. Label it with your name and problem # and turn in this
single page.
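Hint: the ODS statements typically bracket the program whose output is to be captured. The skeleton below is only a sketch; the folder and file name are assumed placeholders, so substitute the actual path of your own U: drive folder.

```sas
/* Assumption: U:\Stat479\exampleA5.rtf is a placeholder path and file name. */
ods rtf file="U:\Stat479\exampleA5.rtf";

/* ... the statements of the Example A5 program go here ... */

/* Closing the destination writes the rtf file to disk. */
ods rtf close;
```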
2. Provide written answers to the following questions on a separate sheet:
(a) Name the SAS statement that must follow the data statement when creating a SAS
data set from external data (i.e., raw data). Note carefully that this is *not* the infile
statement.
(b) Name the SAS statement that must immediately precede lines of data when external
data (i.e., raw data) is included instream (i.e., when the data is included within the
program) instead of being read from a file.
(c) Attributes are information saved in a SAS data set other than data values. Name five
attributes of a SAS variable. Give at least two SAS statements that allow you to
specify attributes for SAS variables.
(d) Suppose there is a data file called survey.txt in your Stat479 folder on the U drive.
Write an infile statement needed to read this data using a data step in a SAS program.
(e) Three styles of data input available in SAS are list input, formatted input, and column
input. Write SAS input statements for reading data values for variables Id, Name, Age,
Height, and Weight to illustrate each of these styles of input. Show a sample data line
to be read by each input statement. (The answer is similar to the examples illustrating these
input styles in the PowerPoint notes.)
(f) When a new SAS data set is created in a data step using data from an existing SAS data
set as input, the set statement is used to name the old data set. Write SAS statements
to create a new SAS data set named reorder from the observations in the SAS data set
sales for which Region is equal to ‘Midwest’ and Quantity is less than
3476.25.
(g) How many data steps are there in the SAS program in Example B1? Name the SAS
data sets created in each of these steps. Are these data sets permanent or temporary?
Give the names of the variables and the number of observations in each of these data
sets.
(h) Describe how date and time values are represented in the computer memory in the SAS
system (refer to the relevant online manual pages to answer this).
(i) Explain clearly the difference between the delete and drop statements. You may use
examples to illustrate the effects of these statements.
(j) Read pages 339–343 of the SAS 9.3 Language Reference: Concepts (Use the PDF
Version of the SAS online documentation). [These pages are in Chapter 18: DATA Step
Processing in the section titled Processing a DATA Step: A Walkthrough ]
i. Sketch how the input buffer and the program data vector would appear after
the third line of data in the sample data step on page 339 has been processed but
just before the observation is written to the SAS data set. (Your answer should look
like Figure 18.4.)
ii. Show the 3rd observation as it is written to the SAS data set. (Your answer should
look like Figure 18.5.)
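As an illustration of the kind of example that part (i) permits, the sketch below uses a small made-up data set (demo, with variables Name, Age, and Height, none of which come from the textbook). It shows one statement acting on observations and the other acting on variables.

```sas
/* Hypothetical data set used only for illustration. */
data demo;
   input Name $ Age Height;
   datalines;
Ann 21 64
Bob 17 70
Cal 30 68
;
run;

data adults;
   set demo;
   if Age < 18 then delete;   /* removes entire observations (rows)      */
   drop Height;               /* removes a variable (column) from adults */
run;

proc print data=adults;
run;
```

With these three data lines, adults keeps the observations for Ann and Cal and contains only the variables Name and Age.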
3. Ms. Anderson wants to use a SAS program to compute the total score, assign letter grades,
and compute summary statistics for her college Stat 101 class. A maximum of 50 points each
could be earned for the quizzes, 100 points each for the midterm exams and the labs, and 200
points for the final exam. Data for the entire semester are available and a subset of the data
is shown below:
Id    Major  Year  Quiz  Exam1  Exam2  Lab  Final
5109  Psych  4     50    93     93     98   162
7391  Econ   4     49    95     98     97   175
4720  Math   4     39    63     84     95    95
4587  Stat   3     46    92     96     88   150
 :     :     :      :     :      :      :    :
3907  CE     4     44    80     99     99   134
4013  Econ   2     48    86     87     96   165
4456  Acct   4     36    83     88     91   154
7324  Psych  3     42    78     98     95   102
0746  Chem   4     48    84     84     97   154
Note: Year = the year in school, Quiz = the total for 5 quizzes,
Lab = the total for 10 labs
Do not run the SAS program you are required to construct until you have completed all of
the steps described below in parts a) to e).
a) Write the SAS statements necessary to create a SAS data set named stat101. Name
your variables Id, Major, Year, Quiz, Exam1, Exam2, Lab and Final, respectively. Enter the data instream, with at least one blank between data values, and use the
list input style to read the data in. (You may cut and paste the data from the data file
into your program.) [This is the first data step in your program.]
b) Write SAS statement(s) to be added to the above data step to create
i. a new numeric variable Score containing the value of the course percentage, based
on weighting the points obtained for the quizzes by 10%, each of the two midterms
by 20%, the lab total by 10% and the final by 40% computed for each student.
ii. a new character variable Grade containing letter grades A, B, C, D, and F, using 90,
80, 70, 60 percent cutoffs, respectively. You may use the variable Score you created
in part (i) above, in the SAS statements needed for this part.
[These would modify the first data step.]
c) Add a proc step to provide a SAS listing of the data set stat101. [This would be the
first proc step in your program.]
d) Students who are juniors and seniors and obtain A’s from this class will qualify for
applying to a research internship next summer. Create a SAS data set named intern
containing only those juniors and seniors earning a letter grade A, using the observations
from the data set stat101. Provide a SAS listing of the new data set that shows only the
variables Id, Major, Year and Score. Include a statement to suppress the observation
number from this listing, instead identifying the students in the output by their Id
number. [You would be adding a second data step and a second proc step to
your program.]
e) Obtain exactly the same listing as in part d), without creating a new SAS data set to do
it. Instead use the SAS statement where within a proc print procedure step to select
the subset of observations to be processed. (See Example A3 of your text or go to the
SAS 9.3 Language Reference: Concepts link on the Stat479 course page, click on
the Search tab and perform a search on the word where). [This would be a third
proc step.]
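Hint: before assembling the full program, the arithmetic in part b) can be checked on a single student. The sketch below is one possible reading, not the required solution. It assumes the maxima are 50 for Quiz, 100 for each of Exam1, Exam2, and Lab, and 200 for Final, and it uses a made-up data set name (check) holding only the first data line. It also shows a where statement inside proc print, as in part e).

```sas
/* Sketch only: a one-observation data set named check (hypothetical). */
data check;
   input Id Major $ Year Quiz Exam1 Exam2 Lab Final;
   /* Assumed maxima: Quiz 50, Exam1 100, Exam2 100, Lab 100, Final 200 */
   Score = 100*(0.10*Quiz/50 + 0.20*Exam1/100 + 0.20*Exam2/100
                + 0.10*Lab/100 + 0.40*Final/200);
   length Grade $ 1;
   if Score >= 90 then Grade = 'A';
   else if Score >= 80 then Grade = 'B';
   else if Score >= 70 then Grade = 'C';
   else if Score >= 60 then Grade = 'D';
   else Grade = 'F';
   datalines;
5109 Psych 4 50 93 93 98 162
;
run;

title 'Check of Score and Grade for one student';
proc print data=check;
   where Grade = 'B';      /* selects observations without a new data set */
   id Id;                  /* Id replaces the Obs column in the listing   */
   var Major Year Score;
run;
```

Under these assumptions the score for student 5109 works out to about 89.4, which is a B with the stated cutoffs.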
Notes for Problem 3:
• SAS code to accomplish all of the above parts must be in a single SAS program. Put
different titles on each listing produced (i.e., use appropriate title statements in the
proc steps) to identify the outputs properly. Run the program when
you have completed all the steps.
• You will probably make plenty of errors and therefore have to debug your
program many times. Remember to save the current version every time you edit your
program.
• Turn in a printed copy of the SAS program and the final output. Use WORD
to organize the output neatly into 1 or 2 printed pages, as shown in class. (You may use
ODS statements to produce WORD output, as you learned in class Quiz #1.)
• Use the SAS Explorer to look in the work folder in the Libraries for the temporary
SAS data set named work.stat101. Open this file as a viewtable by clicking on the
icon named stat101. Check that the data set was created as you intended.
You do not need to turn this in.
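In addition to the viewtable, the variable names, types, and number of observations can be checked from within a program. The step below is a minimal sketch and assumes the data step that creates stat101 has already run in the current SAS session.

```sas
/* Assumes work.stat101 already exists in this session. */
proc contents data=work.stat101;
run;
```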
Due Tuesday September 10, 2013