Study Guide: SAS Chapters 1-6 (Oct 2012)

Chapter 2.1-2.2
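1. How is SAS different from other programming languages?
SAS compiles and executes a program one step at a time: each DATA or PROC step is compiled and then executed before SAS moves on to the next step.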
2. When does SAS know it has reached the end of a step?
• A RUN statement (for DATA steps and most procedures)
• A QUIT statement (for some procedures, such as PROC SQL)
• The beginning of another step (a DATA or PROC statement)
3. On what three operating systems (or platforms) can SAS run?
Windows, UNIX, and mainframe
Chapter 2.3-2.4
1. What does the descriptor portion contain?
• General information about the data set (such as its name and the number of observations)
• Variable attributes (such as name, type, and length)
2. How can you see the descriptor portion of a SAS data set?
PROC CONTENTS
3. Write sample code using PROC CONTENTS to display the descriptor portion of the SAS data set called events.homecoming.
Proc contents data= events.homecoming;
Run;
4. What are the characteristics of the two types of variables?
a. character
• Contain any value: letters, numbers, special characters, and blanks.
• Character values are stored with a length of 1 to 32,767 bytes.
• One byte equals one character.
• Values are left-aligned in reports.
b. numeric
• Stored as floating-point numbers in 8 bytes of storage by default.
• Eight bytes of floating-point storage provide space for 16 or 17 significant digits.
• You are not restricted to 8 digits.
• Values are right-aligned in reports.
5. What are the rules for SAS naming conventions for both data sets and variables?
• Names can be up to 32 characters long.
• Names can be uppercase, lowercase, or mixed case.
• Names must start with a letter or underscore. Subsequent characters can be letters, numbers, or underscores (no special characters or blanks).
6. How does SAS handle dates?
SAS stores date values as numeric values.
A SAS date value is stored as the number of days between January 1, 1960, and a specific date.
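A quick way to check this is a minimal sketch like the following (the data set name work.date_demo and its variable names are made up for illustration). Because no format is applied, PROC PRINT shows the day counts SAS actually stores:

```sas
/* Date constants ('ddMONyyyy'd) are converted to days since 01JAN1960 */
data work.date_demo;
   Day0      = '01JAN1960'd;   /* stored as 0 */
   NextDay   = '02JAN1960'd;   /* stored as 1 */
   DayBefore = '31DEC1959'd;   /* stored as -1: dates before 1960 are negative */
   NewYear11 = '01JAN2011'd;   /* stored as 18628 */
run;

/* No FORMAT statement, so the report shows the stored numbers */
proc print data=work.date_demo;
run;
```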
7. Write the correct SAS code to display the data portion of the SAS data set called SPORTS.mens_soccer.
Proc print data= sports.mens_soccer;
Run;
8. What does it mean that SAS statements are “free-format”?
• One or more blanks or special characters can be used to separate words.
• Statements can begin and end in any column.
• A single statement can span multiple lines.
• Several statements can be on the same line.
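9. Write the correct SAS code to add a comment with your name and the date on one line, then on two lines using a different method.
* Name Date;
/* Name
   Date */
(The first method is a comment statement, which ends at the semicolon; the second is a block comment, which runs from /* to */.)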
10. What does SAS do when it encounters a syntax error? What is written in the SAS log?
SAS underlines the place where it thinks the error occurred and writes the following information to the SAS log:
• the word ERROR or WARNING
• the location of the error
• an explanation of the error
11. What happens if you highlight a section of code, then submit?
Only the highlighted code is submitted.
12. What are the rules for using quotation marks?
• Quotation marks must be used in pairs.
• The opening and closing marks must match (single with single, double with double).
• You can use single or double quotation marks; most of the time SAS doesn't care which.
13. What is the result of submitting a SAS program with unbalanced quotation marks?
• Most of the code is purple in color.
• The code is echoed in the log with no notes, warnings, or errors, because all of the SAS statements after the unmatched quotation mark have become part of the quoted string.
• The banner on the window indicates the step is still running, because the RUN statement was not recognized.
14. How can we fix unbalanced quotation marks programmatically?
Submit the following, then correct and resubmit the program:
*'; *"; run;
15. Explain the function and importance of the buffer when submitting your SAS programs.
a. SAS adds each program you submit to the buffer, so code you submitted earlier can be brought back with the RECALL command.
b. The last program added to the buffer is the first one recalled (last in, first out, just like a stack of plates).
16. You have submitted five programs and need the third program you submitted back. How many times will you
need to submit the RECALL command? Where will you find your code, at the bottom or at the top?
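Three times (the first RECALL brings back the fifth program, the second the fourth, and the third the program you need).
At the top of the Editor window.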
Chapter 3
1. What is the work library and how does it differ from the sasuser library?
Ans:
The Work library is a temporary library: its files are deleted when the SAS session ends.
The Sasuser library is a permanent library: its files are not deleted when the session ends.
2. What is the general form of the LIBNAME statement? Write an example.
Ans:
LIBNAME libref 'physical-location-of-folder';
libname eprom 'c:\events\prom';
3. What are the rules for naming a libref?
Ans:
1. must be 8 characters or fewer
2. must begin with a letter or underscore
3. remaining characters are letters, numbers, or underscores.
4. How long does a libref remain in effect?
Ans: A libref stays in effect until you
1. change it and resubmit the LIBNAME statement, OR
2. delete the connection, OR
3. exit SAS.
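A minimal sketch of a libref's life cycle. It uses a hypothetical libref named demo, and to keep it runnable on any machine it points demo at the folder that already holds the WORK library (found with the PATHNAME function); in class you would use a real folder such as 'c:\events\prom'.

```sas
/* Assign the libref. %sysfunc(pathname(work)) returns the folder that */
/* holds WORK; it stands in for a real path such as 'c:\events\prom'.  */
libname demo "%sysfunc(pathname(work))";   /* global statement: no RUN needed */

/* Two-level name: libref.data-set-name */
data demo.example;
   x = 1;
run;

/* List the members of the library without descriptor portions */
proc contents data=demo._all_ nods;
run;

/* Delete the connection: the libref ends, but the file stays in the folder */
libname demo clear;
```

After LIBNAME demo CLEAR, the libref no longer works, but example is still in the folder, because clearing a libref only removes the connection.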
5. What does it mean that the libname statement is global?
Ans:
It is a standalone statement: it does not need to go inside a step.
It does not need a RUN statement!
6. What happens when you submit a libname statement?
Ans:
A connection is made between the libref (nickname) and the physical location of the files on your operating system.
7. What does it mean that SAS uses two-level names?
Ans:
The first name (libref) refers to the library.
The second name (data set name) refers to the file in the library, as in events.homecoming.
8. Write the SAS statements to explore the descriptor portion of the SAS data set called homecoming in the events library. Don’t forget to create a libref. The events library is stored at c:\yr2010\events.
Ans:
libname events 'c:\yr2010\events';
proc contents data=events.homecoming;
run;
9. Re-write #8 to list all the SAS files in the library, and the descriptor portion of every data set in that library.
Ans:
libname events 'c:\yr2010\events';
proc contents data=events._all_;
run;
10. Re-write # 8 to get a list of all the SAS files in the library and no descriptor portions.
Ans:
libname events 'c:\yr2010\events';
proc contents data=events._all_ nods;
run;
11. Write an appropriate statement to assign the library located at c:\mydocs\data to the libref mydata.
Ans:
libname mydata 'c:\mydocs\data';
Chapter 4.1
1. Write the syntax to create a list report with an underline where the data set name would appear.
Ans:
proc print data=____________________;
run;
2. What is the purpose of the VAR statement?
Ans:
The VAR statement enables you to
• select variables to include in the report
• define the order of the variables in the report.
3. In the program below, what data set is specified? From what libref?
libname mygrades 'c:\class\mygrades';
proc print data=mygrades.fall2010;
run;
Ans:
data set is fall2010
libref is mygrades
4. Will the following two segments of code produce the same results? Why?
proc print data=ia.empdata;
var JobCode EmpID Salary;
run;
proc print data=ia.empdata;
var jobcode salary empid;
run;
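Ans:
No. Both reports contain the same variables, but the columns appear in the order listed in each VAR statement (JobCode, EmpID, Salary in the first; JobCode, Salary, EmpID in the second). The capitalization difference does not matter, because variable names are not case-sensitive.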
5. Does the VAR statement change what is stored in the data set?
Ans:
No
6. What are Operands and Operators?
Ans:
Operands include
• variables (Salary, LastName, JobCode, EmpID…)
• constants (the actual values the WHERE statement is comparing to), for example: 5, 'Jones', 50000, 'PILOT'
Operators include
• comparison operators (=, >=, <, >, <= …)
• logical operators (AND, OR, NOT)
• special operators (BETWEEN-AND, CONTAINS)
• functions (discussed in chapter 7)
7. With what statement can you control which observations are printed?
Ans:
WHERE statement
8. When you use a WHERE statement, is the actual data set modified? Why?
Ans:
No. The WHERE statement only selects which observations appear in the report; PROC PRINT does not modify the data set.
9. Will the following two segments of code produce the same results? Why?
proc print data=orion.empdata2;
   where JobCode='PILOT';
run;
proc print data=orion.empdata2;
   where JobCode='Pilot';
run;
Ans:
No. PILOT does not equal Pilot, because character comparisons are case-sensitive. Character constants go inside matching quotation marks.
10. Would a StudentID of 0209457 be a character or numeric variable and why do you think that?
Ans:
It should be a character variable:
• It begins with a 0. If it were numeric, SAS would drop the leading 0.
• You will not be doing calculations with this number.
• In a WHERE statement, the value needs quotation marks because it is a character string, but the variable name does not!
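The leading-zero problem can be seen in a small sketch (the data set work.ids and both variable names are hypothetical):

```sas
/* The same ID stored both ways, to compare */
data work.ids;
   StudentID_num  = 0209457;     /* numeric: the leading 0 is dropped (209457) */
   StudentID_char = '0209457';   /* character: all 7 characters are kept */
run;

proc print data=work.ids;
   where StudentID_char = '0209457';   /* quotes around the value, not the variable name */
run;
```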
11. What would be the output of the following code?
proc print data=ia.empdata2;
   where Salary=.;
run;
Ans:
where Salary=.; selects the observations where Salary is missing, so the report lists only the employees whose salary is missing.
12. Write three different WHERE statements that will give you universities in North Carolina and South Carolina.
• Format 1: where Variable in ('option1' 'option2');
• Format 2: where Variable in ('option1', 'option2');
• Format 3: where Variable operator value or Variable operator value;
Ans:
where univ in ('North Carolina' 'South Carolina');
where univ in ('North Carolina', 'South Carolina');
where univ eq 'North Carolina' or univ eq 'South Carolina';
(Use OR, not AND: a single value can never equal both strings at once.)
13. Write a WHERE statement that will give you universities not in New York or New Jersey.
Ans:
where univ not in ('New York' 'New Jersey');
where univ ^ in ('New York', 'New Jersey');
(The values in an IN list are separated by blanks or commas, not by OR.)
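A sketch of the IN and NOT IN statements above, assuming a hypothetical data set where each university's state is stored in a variable named State (the answers above call it univ). It uses DATA steps with SET so the subsets can be counted; the same WHERE statements work in PROC PRINT.

```sas
/* Hypothetical data: one university per row, with its state */
data work.schools;
   length Univ $ 30 State $ 15;
   Univ = 'UNC';       State = 'North Carolina'; output;
   Univ = 'Clemson';   State = 'South Carolina'; output;
   Univ = 'Cornell';   State = 'New York';       output;
   Univ = 'Princeton'; State = 'New Jersey';     output;
   Univ = 'UVA';       State = 'Virginia';       output;
run;

/* IN list: values separated by blanks or commas */
data work.carolinas;
   set work.schools;
   where State in ('North Carolina' 'South Carolina');
run;

/* NOT IN: everything except the listed values */
data work.not_ny_nj;
   set work.schools;
   where State not in ('New York', 'New Jersey');
run;
```

work.carolinas ends up with 2 observations and work.not_ny_nj with 3.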
14. Write a WHERE statement that will display grades between 93 and 100.
Ans: where grades between 93 and 100;
15. What is the mnemonic equivalent to CONTAINS?
Ans: ?
16. Write a WHERE statement that will display all values in the LastName variable that contains the string
‘in’. Give five examples of Last Names that would be displayed.
Ans:
where LastName ? 'in';
where LastName contains 'in';
Loudin, Grindy, Blankinship, Krindin, Quinn…
17. Write a WHERE statement that selects observations where the value of Univ begins with a U, followed by a single character, followed by a C, followed by any number of characters. Give examples of what might be displayed.
Ans:
where Univ like 'U_C%';
(The underscore matches exactly one character; the percent sign matches any number of characters, including none.)
UNC, USC, UNCW, …
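A sketch of the CONTAINS and LIKE answers, using made-up names and schools in a hypothetical data set work.names:

```sas
/* Hypothetical data for the CONTAINS and LIKE examples */
data work.names;
   length LastName $ 15 Univ $ 10;
   LastName = 'Loudin'; Univ = 'UNC';  output;
   LastName = 'Quinn';  Univ = 'USC';  output;
   LastName = 'Ingram'; Univ = 'UNCW'; output;
   LastName = 'Smith';  Univ = 'Duke'; output;
run;

/* CONTAINS (or ?) is case-sensitive: 'Ingram' has 'In', not 'in' */
data work.has_in;
   set work.names;
   where LastName ? 'in';
run;

/* LIKE: _ matches exactly one character, % matches any number */
data work.u_c;
   set work.names;
   where Univ like 'U_C%';
run;
```

work.has_in keeps Loudin and Quinn but not Ingram, and work.u_c keeps UNC, USC, and UNCW but not Duke.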
Chapter 4.2
1. What is the function of PROC SORT?
Ans: The SORT procedure
• rearranges the observations in a SAS data set
• can create a new SAS data set containing the rearranged observations
• can sort on multiple variables
• can sort variable values in ascending (default) or descending order
• does not generate printed (displayed) output
• treats missing values as the smallest possible value
2. How can we create a new dataset from sorting an existing data set?
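Ans: Use the OUT= option in the PROC SORT statement. Without OUT=, PROC SORT replaces the original data set with the sorted version.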
3. Write the by statement that would put the variables age and id in descending order.
Ans: by descending age descending id;
4. Write the by statement that would put the variable age in ascending order and id in descending order.
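Ans: by age descending id;
(DESCENDING applies only to the variable that immediately follows it, so age stays in ascending order.)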
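The OUT= option and the BY statement above can be combined in one step. A minimal sketch with made-up data (work.people and work.people_sorted are hypothetical names):

```sas
/* Hypothetical data */
data work.people;
   input id age;
   datalines;
1 30
2 25
3 30
4 25
;
run;

/* OUT= writes the sorted rows to a new data set; work.people is unchanged */
proc sort data=work.people out=work.people_sorted;
   by age descending id;   /* age ascending, then id descending within each age */
run;

proc print data=work.people_sorted;
run;
```

The sorted order is id 4, 2 (age 25) and then 3, 1 (age 30), while work.people keeps its original order.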
5. What is the difference between using a where statement in the proc sort versus proc print?
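Ans: In the PROC SORT step, the WHERE statement controls which observations are written to the sorted data set, so only the rows that meet the condition are stored. In the PROC PRINT step, the WHERE statement only controls which rows appear in the report; no data set is changed.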
6. How can you create subtotals in a proc print?
Ans: Using a BY statement and a SUM statement together in a PROC PRINT step produces
subtotals and grand totals.
7. Which code segment is correct to put each group on a separate page (page breaks)? Why?
Segment 1:
proc print data=work.empdata;
   by JobCode;
   pageby JobCode;
   sum Salary;
run;
Segment 2:
proc print data=work.empdata;
   pageby JobCode;
   sum Salary;
run;
Ans: The first one is correct. You need a PAGEBY statement, and PAGEBY works only together with a BY statement that names the same variable.
8. If you want to use a BY statement, what must your data be?
Ans: sorted
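Putting questions 6 through 8 together: a sketch with hypothetical employee data that sorts first, then prints one page per JobCode with subtotals and a grand total:

```sas
/* Hypothetical employee data */
data work.empdata;
   input EmpID $ JobCode $ Salary;
   datalines;
E01 PILOT 85000
E02 FLTAT 40000
E03 PILOT 90000
E04 FLTAT 42000
;
run;

/* BY-group processing requires the data to be sorted by JobCode */
proc sort data=work.empdata;
   by JobCode;
run;

proc print data=work.empdata;
   by JobCode;       /* groups the report by JobCode */
   pageby JobCode;   /* starts a new page for each JobCode */
   sum Salary;       /* subtotal for each group plus a grand total */
run;
```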
Chapter 5.1
1. What does it mean that titles and footnotes are global?
Ans:
They remain in effect until they are changed, cancelled, or you end your SAS session.
2. How can you cancel all titles?
Ans:
title;
3. How many titles and footnotes can you have?
Ans:
up to 10 titles and 10 footnotes
4. Write a statement that would set the title to ‘SAS Programming’ on the first line of the output, then skip
a line and have a subtitle of ‘Fall 2006’.
Ans:
title 'SAS Programming'; or title1 'SAS Programming';
title3 'Fall 2006';
(Because no TITLE2 statement is submitted, line 2 is left blank.)
5. In the following program, which output will have the title?
Proc print data=colleges.pref;
Title 'My Preferred Colleges';
Run;
Proc print data = colleges.other;
Run;
Ans:
Both. Titles are global, so a TITLE statement stays in effect for later steps even when it is written inside a PROC step.
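6. Complete the table. Each proc step is submitted within the same SAS session.
Step 1:
proc print data=vacation.spring;
   title 'My Vacation Choices';
   title3 'Spring 2011';
run;
Title line 1: My Vacation Choices
Title line 3: Spring 2011
Step 2:
proc print data=vacation.spring;
   title3 'Spring Break 2011';
run;
Title line 1: My Vacation Choices
Title line 3: Spring Break 2011
Step 3:
proc print data=vacation.spring;
   title1 'My Vacation Possibilities';
run;
Title line 1: My Vacation Possibilities
Title line 3: (blank, because a TITLE1 statement cancels all higher-numbered titles)
Step 4:
proc print data=vacation.spring;
   title3 'Senior Spring Break';
run;
Title line 1: My Vacation Possibilities
Title line 3: Senior Spring Break
Step 5:
proc print data=vacation.spring;
   title;
run;
No titles (a TITLE statement with no text cancels all titles)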
7. Do labels change variable names in the data set? If not, what do they do?
Ans:
No. Labels replace the variable names as column headings in the output; the variable names in the data set do not change.
8. Differentiate the label option and the split= option.
Ans:
Label Option
Goes on the PROC Print statement
Label Option tells SAS to use the labels
Split= Option
Tells SAS where to split the label into multiple lines
Goes on the PROC Print statement with the split character identified
Can be used instead of the LABEL option, because SPLIT= also tells PROC PRINT to use labels
9. Write the SAS statements to print the data set univ.public and assign the label ‘University Name’ to the variable name. Have SAS put the words University and Name on two different lines.
Ans: proc print data=univ.public split=' ';
label name='University Name';
run;
10. Write a SAS statement to suppress the date.
Ans: options nodate;
11. Write a SAS statement to set the page number to 100.
Ans: options pageno=100;
12. Write a SAS statement to suppress the page number.
Ans: options nonumber;
Chapter 5.2
1. Describe each part of the following format statement.
format Savings dollar9.2;
Ans:
FORMAT is the statement keyword.
Savings is the variable name.
DOLLAR tells SAS you want a dollar sign and commas.
9 is the total width: the value takes up 9 columns in the output, including the dollar sign, commas, decimal point, and decimal digits.
2 tells SAS that you want 2 decimal places.
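A minimal sketch applying dollar9.2 to two made-up balances in a hypothetical data set work.bank:

```sas
/* Hypothetical balances */
data work.bank;
   input Savings;
   datalines;
1234.56
87.5
;
run;

proc print data=work.bank;
   format Savings dollar9.2;   /* $, commas, 9 columns total, 2 decimal places */
run;
```

1234.56 is displayed as $1,234.56, which uses exactly the 9 columns, and 87.5 as $87.50, right-aligned in the same width.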
2. What are the three most common things forgotten when programming in SAS?
Ans:
1. Semicolon ;
2. Label (or split=) option in PROC Print
3. The dot . on the format name
3. What happens if the width is not big enough?
Ans:
SAS tries to preserve the value of the number:
1. First SAS removes what it considers least essential, such as commas, then the dollar sign.
2. Then SAS rounds the decimal places.
3. Then SAS switches to exponential (E) notation.
4. If the value still does not fit, SAS displays asterisks (***).
4. What happens if you forget the . after the format name
Ans:
SAS treats the format name as a variable name. When it can't find a variable with that name, SAS writes a message to the log.
5. Using today’s date, complete the table. (The displayed values below are for January 1, 2011.)
Format | Displayed Value
MMDDYY6. | 010111
MMDDYY8. | 01/01/11
MMDDYY10. | 01/01/2011
DATE7. | 01JAN11
DATE9. | 01JAN2011
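To check the table, a sketch that writes one date in each format to the log. It uses January 1, 2011 to match the table; replacing the constant with today() shows today's date instead.

```sas
data _null_;
   d = '01JAN2011'd;    /* use d = today(); for today's date */
   put d mmddyy6.;      /* 010111     */
   put d mmddyy8.;      /* 01/01/11   */
   put d mmddyy10.;     /* 01/01/2011 */
   put d date7.;        /* 01JAN11    */
   put d date9.;        /* 01JAN2011  */
run;
```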
6. When creating your own format, what are the constraints of the following:
a. Labels
i. can be up to 32,767 characters in length
ii. are typically enclosed in quotes, although it is not required (good practice to do so)
b. Range(s)
i. can be single values
ii. can be ranges of values
c. Format-name
i. names the format you are creating
ii. cannot be more than 32 characters in SAS 9
iii. cannot be the name of a SAS-supplied format (e.g., COMMAw.d)
7. Summarize the rules for format names.
Ans:
names the format you are creating
cannot be more than 32 characters in SAS 9
for character values, must have a
dollar sign ($) as the first character,
letter or underscore as the second character
for numeric values, must have a letter or underscore as the first character
cannot end in a number
cannot be the name of a SAS format
does not end with a period in the VALUE statement.
8. What are the steps to creating user-defined formats?
Ans:
i. Create the format with PROC FORMAT and a VALUE statement.
ii. Apply the format with a FORMAT statement.
9. a). Write a SAS program to create a numeric format for grades as follows:
93 – 100 A
85 – 92 B
77 – 84 C
70 – 76 D
0 – 69 F
Ans:
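proc format;
   value grformat
      0 - 69   = 'F'
      70 - 76  = 'D'
      77 - 84  = 'C'
      85 - 92  = 'B'
      93 - 100 = 'A';
run;
(No $ in front of the format name: this is a numeric format, and the $ prefix is only for character formats.)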
b). You are creating a list report of a data set containing your grades called fallgr, which is stored in the school
library. You should apply the format you created in part a to the variable MYGRADE.
Ans:
Proc print data=school.fallgr;
Format mygrade grformat. ;
Run;
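Parts a and b together, as one sketch that can be run on its own. Since the school library isn't available here, it assumes a hypothetical stand-in data set work.fallgr with made-up grades:

```sas
proc format;
   value grformat            /* numeric format: no $ prefix */
      0 - 69   = 'F'
      70 - 76  = 'D'
      77 - 84  = 'C'
      85 - 92  = 'B'
      93 - 100 = 'A';
run;

/* Hypothetical stand-in for school.fallgr, stored in WORK */
data work.fallgr;
   input Course $ MyGrade;
   datalines;
ENG101 95
MAT110 81
BIO120 68
;
run;

proc print data=work.fallgr;
   format MyGrade grformat.;   /* the period is required */
run;
```

Grades that fall between the ranges (for example, 92.5) match no range and are displayed unformatted, so the ranges would need adjusting if grades can have decimals.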
Chapter 6
1. What are the three statements needed to create a data set from a raw data file?
Ans: 1. DATA statement - starts a DATA step and names the SAS data set being created
2. INFILE statement - identifies the location and name of the raw data file to read
3. INPUT statement - describes how to read the data fields from the raw data file.
2. Write a DATA statement that creates a permanent SAS data set called example in the ORION library.
Ans: data orion.example;
3. What is the purpose of the INFILE statement?
Ans: The INFILE statement tells SAS that you are reading from a raw data file, and which file to read.
4. If you do not specify a path in your INFILE statement, where will SAS go to get the file? How can you
change the location besides putting the full path in the INFILE statement?
Ans: The current working directory, found in the bottom right corner of your SAS session. You
can change the current working directory by double clicking on the path in the bottom right
corner of your SAS session and navigating to the correct location of the file.
5. In the statement INPUT input-specifications;, what is the purpose of input-specifications?
Ans: Input-specifications
• name the SAS variables
• identify the variables as character or numeric
• specify the locations of the fields in the raw data
• can be specified as column, formatted, or list (delimited) input. We will look at list input later in this chapter.
6. What are standard numerics?
Ans:
• positive and negative numbers
• numbers with decimals
• numbers in exponential notation
7. What are the two phases of the DATA step? What is created by each phase?
Ans: 1. Compilation – Descriptor portion
2. Execution – Data portion
8. What is created at compile time when reading from a raw data file?
Ans:
• the input buffer, which holds the current raw data record being processed
• the program data vector (PDV), which contains a spot for each variable and will eventually hold the current SAS observation
• the descriptor portion of the output data set
9. What is the purpose of the INFILE statement at compile time?
Ans: Tells SAS that it will be reading from a particular data file and to create the input buffer.
10. What is the purpose of the INPUT statement at compile time?
Ans: SAS “walks” through the INPUT statement, creating a spot in memory for each variable.
This is called the Program Data Vector (PDV).
11. When does SAS create the descriptor portion of the SAS data set?
Ans: At the end of the compilation phase, when SAS reaches the end of the DATA step (the RUN statement) and before any data is read.
12. What is automatic output?
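Ans: Each time SAS reaches the end of the DATA step (the RUN statement), it automatically writes the contents of the PDV to the data set as a new observation, then returns to the top of the step.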
13. When should you use formatted input to read data?
Ans:
1. data in fixed columns
2. standard and nonstandard character and numeric data
3. calendar values to be converted to SAS date values.
14. How does the INPUT statement read data?
Ans:
Formatted input reads data values by
1. moving the input pointer to the starting position of the field (for example, @13)
2. specifying a variable name
3. specifying an informat.
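A sketch of formatted input reading fixed-column records. The records and column positions are made up for illustration; the data lines must start in column 1 so that the @ pointers line up:

```sas
/* Hypothetical fixed-column layout:                          */
/* cols 1-2 ID, 3-12 LastName, 13-21 GradDate, 23-25 Credits  */
data work.students;
   input @1  ID        $2.
         @3  LastName  $10.
         @13 GradDate  date9.
         @23 Credits   3.;
   format GradDate date9.;
   datalines;
07KING      15MAY2013 120
08LEE       20DEC2012 118
;
run;
```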
18. When does SAS detect a data error?
Ans: 1. the INPUT statement encounters invalid data in a field
2. illegal arguments are used in functions (We will talk about this in the next
chapter)
3. impossible mathematical operations are requested.
19. What happens when SAS detects a data error?
Ans: 1. a note that describes the error is printed in the SAS log
2. the input record being read is displayed in the SAS log (contents of the
input buffer)
3. the values in the SAS observation being created are displayed in the SAS
log (contents of the PDV)
4. a missing value is assigned to the appropriate SAS variable
5. execution continues.
20. What are _ERROR_ and _N_?
Ans: _ERROR_ and _N_ are automatic variables that SAS creates in the PDV.
They are not written to the data set.
_N_ is the number of times SAS has looped through the DATA step (it starts at 1).
_ERROR_ can have a value of 0 or 1; it indicates whether an error occurred during the current iteration.
In SAS, 0 means false and 1 means true.
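A sketch that makes _N_ and _ERROR_ visible. Because the automatic variables are not written to the data set, it copies them into two hypothetical variables, Iteration and ErrorFlag. The second record reuses the invalid value 8p.25 from question 21:

```sas
/* _N_ and _ERROR_ are dropped from the output data set, */
/* so copy them into ordinary variables to keep them     */
data work.grades;
   input ID $ Grade;
   Iteration = _N_;
   ErrorFlag = _ERROR_;
   datalines;
01 88.5
02 8p.25
03 91
;
run;
```

The log shows an invalid-data note for the second record; in work.grades that observation has a missing Grade and ErrorFlag=1, while the others have ErrorFlag=0.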
21. Answer the questions
i. Which observation has the error?
Ans: 7
ii. What are the contents of the input buffer for that observation?
Ans: 07KING
DANNY
FRESHMAN
8p.25 47
iii. What are the contents of the PDV?
Ans: ID=07 LastName=KING FirstName=DANNY Class=FRESHMAN
Grade=. _ERROR_=1 _N_=7
iv. What variable has the problem?
Ans: Grade
v. What was assigned to the variable?
Ans: .
vi. What type of variable is it? (Character or Numeric) How do you know?
Ans: Numeric. Missing numeric values are represented by a period.
vii. Where does SAS tell you to look in the buffer? (Column numbers)
Ans: 43-47
viii. What is the value SAS found?
Ans: 8p.25
ix. How did the error affect your output?
Ans: SAS placed a period where the grade should be, so Danny King's grade is missing in the output.
1. Differentiate what is assigned and what is not assigned when a variable is created in a DATA step.
Ans: name, type, and length of the variable are automatically assigned
remaining attributes such as label and format are not automatically assigned.
2. How does the Descriptor portion of the data set change if you permanently assign attributes, such as a
label?
Ans: The permanently assigned attributes will be stored in the descriptor portion
3. You permanently assign labels in the school.events data set. Write a PROC PRINT step to display the
data with the labels.
Ans: proc print data=school.events label;
run;
4. How can you override a permanently assigned format?
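Ans: Use a FORMAT statement in the PROC step (for example, in PROC PRINT). The new format applies only to that step; the format stored in the data set does not change.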
5. What is free format?
Ans: Raw data with fields that are not in fixed columns
6. How can you read free format data?
Ans: Use list input, also known as delimited input, to read free-format data.
7. Name three common delimiters.
Ans: blanks, commas, tabs
8. Differentiate formatted input and list input.
Ans: Formatted input (like column input) is used when data values are in fixed columns.
List input is used when values are separated by a delimiter and the data is free format.
9. Write the input statement for the following variables: fname, lname, age.
Ans: input fname $ lname $ age;
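A sketch of that INPUT statement reading space-delimited in-stream data (work.roster and the values are made up):

```sas
data work.roster;
   input fname $ lname $ age;   /* $ marks character variables */
   datalines;
Ann Lee 34
Bob Smith 41
;
run;
```

With list input, character variables get a default length of 8, so a longer last name would be truncated unless a LENGTH statement comes first, as question 12 shows.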
10. How do you specify an informat in the input statement?
Ans: To specify an informat, use the colon (:) format modifier in the INPUT statement between
the variable name and the informat.
11. Why is the colon important?
Ans: The colon tells SAS you want LIST input, not FORMATTED input.
12. Add a length statement setting the ID to 9.
data airplanes;
length ID $ 9;
infile 'raw-data-file';
input ID $
GradDate : date9.
Credits GPA;
run;
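The same step in a form that runs without a raw data file: the INFILE statement is replaced by in-stream DATALINES, and the records are made up for illustration:

```sas
data work.airplanes;
   length ID $ 9;                            /* must come before INPUT to set the length */
   input ID $ GradDate : date9. Credits GPA; /* colon: list input with an informat */
   format GradDate date9.;
   datalines;
A12345678 15MAY2012 120 3.45
B98765432 20DEC2011 118 3.10
;
run;
```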
13. What happens if the colon is left out of the informat?
Ans: If the colon is omitted, SAS begins reading one column to the right of the pointer. It reads the
length of the informat, which may cause it to read past the end of the field, or only part of the
data.
Basically, forgetting the colon tells SAS you are using FORMATTED input, without the @ pointer control.
14. What do you think SAS does if you use more than one character in the DLM= option?
Ans: SAS treats each character as a delimiter. For example, dlm=',;' makes both the comma and the semicolon delimiters.
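A sketch with two delimiter characters, using made-up in-stream records that mix commas and semicolons:

```sas
data work.mixed;
   infile datalines dlm=',;';   /* comma and semicolon are each a delimiter */
   input Name $ Score1 Score2;
   datalines;
Ann,90;85
Bob;88,79
;
run;
```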
12. By default, when there is missing data at the end of a row, what does SAS do?
Ans:
1. SAS loads the next record to finish the observation
2. a note is written to the log
3. SAS loads a new record at the top of the DATA step and continues
processing.
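A sketch of the problem and the MISSOVER fix, with made-up test scores where Bob's record is short:

```sas
data work.scores;
   infile datalines missover;   /* short record: remaining variables set to missing */
   input Name $ Test1 Test2 Test3;
   datalines;
Ann 90 85 77
Bob 88
Cal 70 75 80
;
run;
```

With MISSOVER, Bob gets missing values for Test2 and Test3 and all three records become observations. Without it, SAS would move on to Cal's record to finish Bob's observation.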
13. What is the DSD option?
Ans:
The DSD option
• sets the default delimiter to a comma
• treats consecutive delimiters as missing values
• enables SAS to read values with embedded delimiters if the value is surrounded by double quotation marks.
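A sketch of DSD with made-up records: the first has a comma inside a quoted value, the second has two consecutive commas:

```sas
data work.contacts;
   infile datalines dsd;   /* comma delimiter, ,, means missing, quotes protect commas */
   input Name : $20. City : $15. Age;
   datalines;
"Lee, Ann",Boston,34
Bob,,41
;
run;
```

Name is read as Lee, Ann (the quotation marks are removed), and Bob's City is missing.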
14. Complete the table.
Problem | Option
Setting delimiters | DLM='delimiter'
Missing data at end of row | MISSOVER
End-of-record marker inside a fixed-column data value | PAD or TRUNCOVER
Missing data represented by consecutive delimiters, and/or embedded delimiters where values are surrounded by double quotes | DSD
15. How do you use an INFILE statement option (like DLM= or MISSOVER) with CARDS or DATALINES?
Ans:
Write an INFILE statement with the keyword DATALINES (or CARDS) in place of the raw data file name, then add the option, for example: infile datalines dlm=',';