Homework 05

UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
Homework 5 (Due Tuesday, Sept. 8th)
Example of a SAS Program without a Data Step
The SAS program for this homework will not include a Data Step, because we don’t need to create any
new variables or modify our variables. We’ll just be working with the variables as they are in dataset
FewerVariables.xls.
Import Dataset “FewerVariables.xls” and Run Proc Contents
• Write a SAS program that uses Proc Import to import the “FewerVariables.xls” dataset that you created
and saved in your ECN377 folder in Homework 4. Use Proc Import to name the dataset “dataset01”.
• Run Proc Contents on dataset01. Based on the results of Proc Contents in the output window, answer
these questions: How many rows of data are in dataset01? How many variables are in dataset01? Which
of these variables are character/text variables?
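As a rough sketch of what these two steps might look like, the commands below import the spreadsheet and then list its contents. The folder path is a placeholder (an assumption; substitute the location of your own ECN377 folder), and dbms=xls assumes that your SAS installation can read .xls files directly.

```sas
/* Sketch only: the file path below is hypothetical; point it at your own ECN377 folder. */
proc import datafile="C:\ECN377\FewerVariables.xls"
    out=dataset01      /* name of the SAS dataset to create */
    dbms=xls           /* assumes .xls support is installed */
    replace;           /* overwrite dataset01 if it already exists */
    getnames=yes;      /* read variable names from the first row */
run;

/* List the number of observations, the variables, and each variable's type */
proc contents data=dataset01;
run;
```

In the Proc Contents output, the number of observations is the number of rows of data, and the Type column shows Char for character/text variables and Num for numeric variables.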
Using Proc Sort and Proc Print
1. Continuing to add commands to your SAS program, use Proc Sort to sort dataset01 by CntyName.
2. Use Proc Print with dataset01 to print the data for the following variables (only these variables) to the
Output window in SAS: CntyName, PopCens, ManfJobs, and UnempRate. (Reminder: Don’t put
commas between the variable names.)
Note: If you run your SAS program multiple times, which is fine, SAS will add the results of the
new run to the bottom of the output window below the results from any prior runs. So, the
newest results will always be at the bottom of the output window. If you want, you can clear the
output window each time before you run your program to clear out any results from prior runs.
The same is true for the log window. SAS adds log info from new runs beneath any log info
from old runs. So, the newest log info is always at the bottom of the log window.
3. Look at the output from Proc Print in the output window (not the output from Proc Contents, but the
output from Proc Print, which will be at the bottom of the output window). Answer these questions:
Which NC county name is first in alphabetical order? Which is last? What was the population in each of
these two counties in 2000? What about the unemployment rate in each of these two counties in 2000?
4. Use a new Proc Sort command to sort dataset01 again, this time by variable PopCens.
5. Use a new Proc Print command to print the data again for variables CntyName and PopCens (only) to the
output window in SAS. Look at the output (closest to the bottom of the output window) to answer these
questions: Which NC county had the largest population in 2000? The smallest? Which had population
(PopCens) equal to 160,307?
6. Use another Proc Sort command to sort dataset01 again, this time by variable ManfJobs.
7. Use another Proc Print to print the data again for variables CntyName and ManfJobs (only) to the output
window in SAS. Which NC county had the largest value of ManfJobs in year 2000? The smallest? What
was the value of ManfJobs for NewHanover county?
8. Use another Proc Sort command to sort dataset01 again by variable GeoRegion and then by variable
ManfJobs within each region, like this:
proc sort data=dataset01;
by georegion manfjobs;
run;
9. Use another Proc Print command to print the data from dataset01 for variables CntyName, GeoRegion,
and ManfJobs (only) to the Output window in SAS. Which NC county in the "coast" region had the
largest manufacturing employment per 1000 population in year 2000, and what was the employment?
Which NC county in the "mountain" region had the largest manufacturing employment per 1000
population in year 2000, and what was the employment?
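For reference, a sort followed by a print of selected variables might look like the sketch below, shown here for steps 1 and 2. It assumes dataset01 was created by Proc Import as described earlier and that the variable names are spelled as in this handout. The same pattern, with a different BY variable and VAR list, applies to the later steps.

```sas
/* Sort dataset01 in place by county name (ascending order by default) */
proc sort data=dataset01;
    by CntyName;
run;

/* Print only the listed variables; names are separated by spaces, not commas */
proc print data=dataset01;
    var CntyName PopCens ManfJobs UnempRate;
run;
```

Because Proc Sort sorts in ascending order by default, the smallest value of the BY variable appears in the first printed row and the largest in the last.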
Using Proc Means
1. Use a Proc Means command with dataset01 to calculate the following statistics: N, MAX, MIN, MEAN,
MEDIAN, CV, showing 2 decimal places to the right of the decimal, for the following variables:
PopCens, ManfJobs, ConstJobs, ServJobs, FarmJobs, UnempRate. Proc Means automatically sends the
output to the output window; you don’t need to use a separate Proc Print command.
2. Based on the output from Proc Means, what was the smallest unemployment rate in year 2000 in a North
Carolina county? What was the largest? What was the mean county unemployment rate? What about the
median? Is the mean different from the median? If so, what is this telling you?
3. What is the coefficient of variation (CV) of unemployment rate across NC counties? What is this telling
you?
4. Compare the CVs of ManfJobs, ConstJobs, ServJobs, and FarmJobs. What does the comparison of CVs tell
you?
5. Why can’t we use Proc Means to analyze variables CntyName, GeoRegion and UnempCat?
6. Use another Proc Means command to calculate the MAX, MEAN and CV of ManfJobs by GeoRegion.
Before you use Proc Means to do this, you must use Proc Sort again to sort your data by
GeoRegion. Looking at the output of this second Proc Means command in the output window (scroll
down to the bottom of the output window—remember, newest results at the bottom), which GeoRegion
had the county with the largest value of ManfJobs, and what was the value? The counties of which
GeoRegion had the largest mean value of ManfJobs? Which GeoRegion had the greatest variation of
ManfJobs across the counties in the region?
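A minimal sketch of the two Proc Means commands described above follows. It assumes the variable names are spelled exactly as in this handout. Note that SAS reports CV as a percentage: 100 times the standard deviation divided by the mean.

```sas
/* Statistics for all counties together, shown with 2 decimal places */
proc means data=dataset01 n max min mean median cv maxdec=2;
    var PopCens ManfJobs ConstJobs ServJobs FarmJobs UnempRate;
run;

/* A BY statement requires the data to be sorted by the BY variable first */
proc sort data=dataset01;
    by GeoRegion;
run;

/* Statistics for ManfJobs, reported separately for each GeoRegion */
proc means data=dataset01 max mean cv;
    var ManfJobs;
    by GeoRegion;
run;
```

If the Proc Sort step is skipped and the data are not already in GeoRegion order, SAS stops the second Proc Means with an error in the log saying the data are not sorted.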
Using Proc Gchart
We can’t use Proc Means to describe character/categorical/text variables like GeoRegion and UnempCat, but we
can use Proc Gchart to describe these variables by creating frequency distributions and percentage distributions:
1. Use a Proc Gchart command with dataset01 to make a vertical frequency Gchart for GeoRegion and a
vertical percentage Gchart for variable UnempCat. You need two “vbar” statements in the command,
one for each chart. How many counties (frequency) are in the "mountain" region of North Carolina?
What percentage of counties is in the “high” UnempCat category?
2. Use another Proc Gchart command with dataset01 to make a horizontal frequency distribution of the
UnempCat variable. By default, SAS will print the categories in alphabetical order along the Gchart axis,
but it makes more sense to order the categories from 'Low' to 'Med' to 'High'. Recall that you can control
the order of the categories using the "midpoints" option, as shown below. Put the category names in
single quotes:
proc Gchart data=dataset01;
hbar UnempCat / midpoints = 'Low' 'Med' 'High';
run;
How many counties (frequency) are in the ‘Low’ category of UnempCat? What percentage of counties is
in the ‘High’ category? What does the cumulative percent value for the ‘Med’ category of UnempCat tell
us?
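For item 1, one possible form of the command is sketched below. The type= option selects what the bar height represents, and each vbar statement produces its own chart. Treat this as an illustration under the handout's variable names rather than the only acceptable answer.

```sas
proc gchart data=dataset01;
    vbar GeoRegion / type=freq;      /* bar height = number of counties */
    vbar UnempCat  / type=percent;   /* bar height = percentage of counties */
run;
quit;   /* Proc Gchart stays active after run; quit ends the procedure */
```

Since type=freq is the default for a chart with no summary variable, the first vbar statement would produce the same chart without the option; it is written out here only to make the contrast with type=percent explicit.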
Save Your Program and Write up Your Homework
After you run your SAS program and verify that it is working correctly, save the SAS program as HW05.sas.
When this homework asks you to answer specific questions about the results, answer in complete
sentences, in addition to giving the appropriate numbers. Don’t forget to include the answers to the
multiple-choice questions below. Also, include a printout of your SAS program, “HW05.sas”. Finally, be sure to put your
name, ECN377, your section, and “Homework 5” at the top of your homework.
Multiple Choice and Matching Section
1) Suppose you want to summarize the values of a numerical measurement variable. Which descriptive
statistics should be used?
a) frequency distributions, mean, standard deviation
b) histograms, mean, standard deviation
c) mean, mode, t-test
d) mean, median, frequency distributions
2) Suppose you want to summarize the values of a nominal character/text variable. Which descriptive
statistics should be used?
a) skew, mean, histograms
b) mean, median, frequency distributions
c) frequency distributions, mode
d) histograms, mean, mode
3) The procedure used to calculate descriptive statistics for numerical measurement variables in SAS is:
a) proc contents
b) proc sort
c) proc means
d) proc ttest
4) Suppose you have a dataset with variables State, Year, Revenues and Expenses, and suppose you
want to calculate mean Revenues and mean Expenses by State. One must first use ________ to sort the
data before using _________ to calculate the means by State.
a) Proc Print, Proc Sort
b) Proc Sort, Proc Means
c) Proc Means, Proc Print
d) Proc Print, Proc Sort
5) Suppose you are trying to describe the income of a typical household in a small, poor country that has
many poor households and a few very rich households. Which measure of central tendency would be
better to use?
a) range
b) mean
c) median
d) coefficient of variation
6) Suppose you are trying to describe the variation in stock price data, and you need to describe the
variation using different measurement units (different currencies) for different clients. Which measure
of variation would produce results that are comparable across the different measurement units?
a) variance
b) standard deviation
c) coefficient of variation
d) sum of deviations
7) In SAS, Proc ______ is used to create frequency distribution graphs.
a) sort
b) print
c) gchart
d) graph
8) Suppose you have descriptive statistics for the variables in a client’s dataset (but you don’t have the
actual data), and you are trying to determine which of the patterns below best describes the values of one
of the variables in the dataset. Which descriptive statistic would help you?
a) mean
b) standard deviation
c) skewness
d) kurtosis
9) Suppose you have descriptive statistics for the variables in a client’s dataset (but you don’t have the
actual data), and you are trying to determine which of the patterns below best describes the values of one
of the variables in the dataset. Which descriptive statistic would help you?
a) mean
b) standard deviation
c) skewness
d) kurtosis
10) When should a histogram be used to describe the distribution of the values of a variable instead of a
frequency distribution?
a) when the variable is a text/character variable
b) when the variable is an ordinal numerical variable
c) when the variable is a measurement numerical variable
d) when the variable gives the names of the key epochs in European history
11) Suppose you see the following commands in a colleague’s SAS program. What do these commands
produce?
proc gchart data=dataset02;
vbar Revenue / levels=7;
run;
a) a frequency distribution with 7 categories for variable “levels”
b) a histogram with 7 bars for variable Revenue
c) descriptive statistics for the 7 sub-types of Revenue in the dataset
d) a printout summary of the Revenue data in a table with 7 columns
12) Suppose you want to use Proc Print to print the data from dataset02 for only variables A, B and C to the
output window of SAS, and you want to print only the rows of data for which B = 4. Which Proc Print command
below would accomplish this?
a)
proc print data=dataset01;
where A B C;
var B=4;
run;
b)
proc print data=dataset02;
var A B C;
where B=4;
run;
c)
proc print data=dataset02;
where A, B=2, C;
run;
d)
proc print data=dataset02;
where A B C;
var B=4;
run;
e)
proc print data=dataset01;
var A B C;
var B=4;
run;
f)
proc print data=dataset01;
var A B C;
where B=4;
run;
g)
proc print data=dataset02;
var A=all, B=2, C=all;
run;
h)
proc print data=dataset02;
where C=2;
run;
i)
proc print data=dataset01;
where A B C;
where B=4;
run;