Fall 2013
Statistics 479
Assignment #1 (30 points)

1. Start a SAS session under SAS System for Windows 9.3, enter the SAS program given in Example A5 in the textbook into the editor window, and submit it for execution. (Alternatively, you may cut and paste the program from the Download SAS Programs webpage or drag and drop the program from the Stat479 folder on the Gilman 2272 lab desktop.) Edit this program to add the ODS statements needed to direct the output to an rtf file saved in your U: drive folder. Submit your program for execution, examine the resulting log window and the results viewer, and compare with the results shown in the textbook. (Correct and resubmit the program if necessary.) Open the rtf file in WORD and add the contents of both the program and SAS log windows obtained from executing the final version of the program (by cutting and pasting). Print this page (it should be just one page) on a printer such as those in Gilman 2272. Label it with your name and the problem number and turn in this single page.

2. Provide written answers to the following questions on a separate sheet:

(a) Name the SAS statement that must follow the data statement when creating a SAS data set from external data (i.e., raw data). Note carefully that this is *not* the infile statement.

(b) Name the SAS statement that must immediately precede lines of data when external data (i.e., raw data) are included instream (i.e., when the data are included within the program) instead of being read from a file.

(c) Attributes are information saved in a SAS data set other than data values. Name five attributes of a SAS variable. Give at least two SAS statements that allow you to specify attributes for SAS variables.

(d) Suppose there is a data file called survey.txt in your Stat479 folder on the U: drive. Write the infile statement needed to read these data using a data step in a SAS program.
(e) Three styles of data input available in SAS are list input, formatted input, and column input. Write SAS input statements for reading data values for the variables Id, Name, Age, Height, and Weight to illustrate each of these styles of input. Show a sample data line to be read by each input statement. (The answer is similar to the examples illustrating these input styles in the PowerPoint notes.)

(f) When a new SAS data set is created in a data step using data from an existing SAS data set as input, the set statement is used to name the old data set. Write SAS statements to create a new SAS data set named reorder from the observations in the SAS data set sales for which Region is equal to ‘Midwest’ and Quantity is less than 3476.25.

(g) How many data steps are there in the SAS program in Example B1? Name the SAS data sets created in each of these steps. Are these data sets permanent or temporary? Give the names of the variables and the number of observations in each of these data sets.

(h) Describe how date and time values are represented in computer memory in the SAS system (refer to the relevant online manual pages to answer this).

(i) Explain clearly the difference between the delete and drop statements. You may use examples to illustrate the effects of these statements.

(j) Read pages 339–343 of the SAS 9.3 Language Reference: Concepts (use the PDF version of the SAS online documentation). [These pages are in Chapter 18, DATA Step Processing, in the section titled Processing a DATA Step: A Walkthrough.]

i. Sketch how the input buffer and the program data vector would appear after the third line of data in the sample data step on page 339 has been processed but just before the observation is written to the SAS data set. (Your answer should look like Figure 18.4.)

ii. Show the 3rd observation as it is written to the SAS data set. (Your answer should look like Figure 18.5.)

3. Ms.
Anderson wants to use a SAS program to compute the total score, assign letter grades, and compute summary statistics for her college Stat 101 class. A maximum of 50 points each could be earned for the quizzes, 100 points each for the midterm exams and the labs, and 200 points for the final exam. Data for the entire semester are available and a subset of the data is shown below:

Id    Major  Year  Quiz  Exam1  Exam2  Lab  Final
5109  Psych  4     50    93     93     98   162
7391  Econ   4     49    95     98     97   175
4720  Math   4     39    63     84     95    95
4587  Stat   3     46    92     96     88   150
...   ...    ...   ...   ...    ...    ...  ...
3907  CE     4     44    80     99     99   134
4013  Econ   2     48    86     87     96   165
4456  Acct   4     36    83     88     91   154
7324  Psych  3     42    78     98     95   102
0746  Chem   4     48    84     84     97   154

Note: Year = the year in school, Quiz = the total for 5 quizzes, Lab = the total for 10 labs.

Do not run the SAS program you are required to construct until you have completed all of the steps described below in parts a) to e).

a) Write the SAS statements necessary to create a SAS data set named stat101. Name your variables Id, Major, Year, Quiz, Exam1, Exam2, Lab, and Final, respectively. Enter the data instream, with at least one blank between data values, and use the list input style to read the data in. (You may cut and paste the data from the data file into your program.) [This is the first data step in your program.]

b) Write SAS statement(s) to be added to the above data step to create

i. a new numeric variable Score containing the course percentage for each student, computed by weighting the points obtained for the quizzes by 10%, each of the two midterms by 20%, the lab total by 10%, and the final by 40%.

ii. a new character variable Grade containing the letter grades A, B, C, D, and F, using 90, 80, 70, and 60 percent cutoffs, respectively. You may use the variable Score you created in part (i) above in the SAS statements needed for this part.

[These would modify the first data step.]

c) Add a proc step to provide a SAS listing of the data set stat101.
[This would be the first proc step in your program.]

d) Students who are juniors or seniors and obtain an A in this class will qualify to apply for a research internship next summer. Create a SAS data set named intern containing only those juniors and seniors earning a letter grade of A, using the observations from the data set stat101. Provide a SAS listing of the new data set that shows only the variables Id, Major, Year, and Score. Include a statement to suppress the observation number from this listing, instead identifying the students in the output by their Id number. [You would be adding a second data step and a second proc step to your program.]

e) Obtain exactly the same listing as in part d) without creating a new SAS data set. Instead, use the SAS statement where within a proc print step to select the subset of observations to be processed. (See Example A3 of your text, or go to the SAS 9.3 Language Reference: Concepts link on the Stat479 course page, click on the Search tab, and search for the word where.) [This would be a third proc step.]

Notes for Problem 3:

• SAS code to accomplish all of the above parts must be in a single SAS program. Put a different title on each listing produced (i.e., use appropriate title statements in the proc steps) so that the outputs can be identified properly. Run the program when you have completed all the steps.

• You will find that you have made plenty of errors and therefore have to debug your program many times. Remember to save the current version every time you edit your program.

• Turn in a printed copy of the SAS program and the final output. Use WORD to organize the output neatly into 1 or 2 printed pages, as shown in class. You may use ODS statements to produce WORD output, as you learned in class (Quiz #1).

• Use the SAS Explorer to look in the work folder in the Libraries for the temporary SAS data set named work.stat101.
Open this file as a viewtable by clicking on the icon named stat101. Check that the data set was created as you intended. You do not need to turn this in.

Due Tuesday September 10, 2013
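
The notes for Problem 3 mention using ODS statements to produce WORD output. As a reminder of the general pattern only, here is a minimal sketch. The output path and the title are hypothetical placeholders, and sashelp.class is a sample data set shipped with SAS, not the data for this assignment.

```sas
/* Sketch only: the file path and title below are hypothetical placeholders. */
ods rtf file="U:\Stat479\example_output.rtf";   /* open the RTF destination */

title "Listing of a sample data set";
proc print data=sashelp.class;                  /* sample data set shipped with SAS */
run;

ods rtf close;                                  /* close the file so WORD can open it */
```

Any procedure output produced between the two ods rtf statements is written to the named file, which can then be opened and edited in WORD.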