UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas

Homework 5 Solutions

Import the Data and Run Proc Contents

See the SAS program at the end of this HW05 solutions handout. Based on the results of Proc Contents in the output window:
  o How many rows of data are in dataset01? 100
  o How many variables are in dataset01? 10
  o Which of these variables are character/text variables? CntyName, GeoRegion and UnempCat

Using Proc Sort and Proc Print

Which NC county name is first in alphabetical order? Alamance
Which is last? Yancey
What was the population in each of these two counties in 2000? Alamance, 130,800; Yancey, 17,774.
What about the unemployment rate in each of these two counties in 2000? Alamance, 2.8; Yancey, 3.9.
Which NC county had the largest population in 2000? Mecklenburg
The smallest? Tyrrell
Which county had population (PopCens) equal to 160,307? New Hanover
Which NC county had the largest value of ManfJobs in year 2000? Catawba
The smallest? Currituck, and possibly Graham and Gates. Graham and Gates have missing values for ManfJobs in the dataset, and Proc Sort places missing numeric values before all numbers, so they appear at the top of the sorted list. These could have been zeros that the person entering the data mistakenly left blank. This leaves us wondering whether these counties actually had zero ManfJobs or whether their data are actually missing, which seems unlikely. Notice the importance of “zero” vs. “missing” in a data set.
What was the value of ManfJobs for New Hanover County? 57.4 (this might seem low, but recall that ManfJobs is the number of manufacturing jobs per 1000 people in the county)
Which NC county in the "coast" region had the largest manufacturing employment per 1000 population in year 2000, and what was the employment? Bertie, 151.87
Which NC county in the "mountain" region had the largest manufacturing employment per 1000 population in year 2000, and what was the employment? Caldwell, 220.61

Using Proc Means

What was the smallest unemployment rate in year 2000 in a North Carolina county? 1.30
What was the largest? 13.00
What was the mean county unemployment rate? 4.62
What about the median? 4.10
Is the mean different from the median? Yes.
If so, what is this telling you? The mean (4.62) is above the median (4.10) because a few counties with very high unemployment pull the mean up, away from the median. The distribution of county unemployment rates is skewed to the right.
What is the coefficient of variation (CV) of unemployment rate across NC counties? 47.43
What is this telling you? SAS reports the CV as a percentage of the mean. So, a CV of 47.43 means that the standard deviation of the unemployment rates is 47.43% of the mean unemployment rate.
Compare the CVs of ManfJobs, ConstJobs, ServJobs and FarmJobs. What does the comparison of CVs tell you? Recall that the CV is a measure of variation, or spread, in the values of a variable. The relatively high CV for FarmJobs means that there is relatively large variation in the number of farm jobs (per 1000 population) across counties: some counties have many farm jobs and some have few, whereas other types of jobs vary less across counties. The relatively low CV for ConstJobs means that there is relatively little variation in construction jobs, so construction jobs (per 1000 population) are spread more evenly across counties.
Why can’t we use Proc Means to analyze variables CntyName, GeoRegion and UnempCat? These three are text/character variables, and Proc Means computes statistics only for numeric variables.
Which GeoRegion had the county with the largest value of ManfJobs, and what was the value? The “Plain” region, 310.53 ManfJobs per 1000 population.
The counties of which GeoRegion had the largest mean value of ManfJobs? The “Plain” region, 110.11 ManfJobs per 1000 population.
Which GeoRegion had the greatest variation of ManfJobs across the counties in the region? The “Coast” region, because this region had the highest coefficient of variation (CV).
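The CV answer can be written out as a formula. This is a minimal worked sketch, assuming the standard definition of the CV as the standard deviation s divided by the mean x̄, scaled by 100 (the percentage form SAS reports). Back-solving with the rounded values reported above gives the implied standard deviation of county unemployment rates; this is derived arithmetic, not a number taken from the SAS output.

```latex
\[
\mathrm{CV} = 100 \times \frac{s}{\bar{x}}
\quad\Longrightarrow\quad
s = \frac{\mathrm{CV} \times \bar{x}}{100}
  = \frac{47.43 \times 4.62}{100} \approx 2.19
\]
```

So the standard deviation of county unemployment rates is roughly 2.19 percentage points around a mean of 4.62.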
Using Proc Gchart

What number (frequency) of counties is in the "mountain" region of North Carolina? NOT GRADED. [Note: There was a typo in the homework questions. The questions should have asked you to make a vertical frequency chart for GeoRegion rather than UnempCat. If you had made such a chart for GeoRegion, you could see from the chart that the number of counties in the “mountain” region of North Carolina is 23.]
What percentage of counties is in the “high” UnempCat? 26% of counties are in the “high” UnempCat category.
How many counties (frequency) are in the ‘Low’ category of UnempCat? 47 counties
What percentage of counties is in the ‘High’ category? 26%
What does the cumulative percent value for the ‘Med’ category of UnempCat tell us? The cumulative percent value of 74% for the ‘Med’ category of UnempCat tells us that 74% of North Carolina counties are in either the Low or the Med category of UnempCat.

Multiple Choice Answers

1) b
2) c
3) c
4) b
5) c
6) c
7) c
8) c
9) d
10) c
11) b
12) b or e (mistake in question; the answer should have been only b)

SAS Program HW05.sas

/* SOFTWARE: SAS Statistical Software program, version 9.2
   AUTHOR: Dr. Chris Dumas, UNC-Wilmington, Sept. 2015.
   TITLE: "HW05.sas" program. */

options helpbrowser=sas;
options number pageno=1 nodate nolabel font="SAS Monospace" 10;
options leftmargin=1.00 in rightmargin=1.00 in;
options topmargin=1.00 in bottommargin=1.00 in;

/* Read the Excel file into a SAS dataset */
proc import datafile="v:\ECN377\FewerVariables.xls" dbms=xls out=dataset01 replace;
run;

/* Number of rows, number of variables, and variable types */
proc contents data=dataset01;
run;

/* Alphabetical order of counties */
proc sort data=dataset01;
  by CntyName;
run;
proc print data=dataset01;
  var CntyName PopCens ManfJobs UnempRate;
run;

/* Counties from smallest to largest population */
proc sort data=dataset01;
  by PopCens;
run;
proc print data=dataset01;
  var CntyName PopCens;
run;

/* Counties from smallest to largest ManfJobs (missing values sort first) */
proc sort data=dataset01;
  by ManfJobs;
run;
proc print data=dataset01;
  var CntyName ManfJobs;
run;

/* Within each GeoRegion, counties from smallest to largest ManfJobs */
proc sort data=dataset01;
  by GeoRegion ManfJobs;
run;
proc print data=dataset01;
  var CntyName GeoRegion ManfJobs;
run;

/* Summary statistics for the numeric variables */
proc means data=dataset01 maxdec=2 N MAX MIN MEAN MEDIAN CV;
  var PopCens ManfJobs ConstJobs ServJobs FarmJobs UnempRate;
run;

/* BY-group processing requires the data to be sorted by GeoRegion */
proc sort data=dataset01;
  by GeoRegion;
run;
proc means data=dataset01 maxdec=2 MAX MEAN CV;
  var ManfJobs;
  by GeoRegion;
run;

/* Vertical frequency chart of GeoRegion and percent chart of UnempCat */
proc Gchart data=dataset01;
  vbar GeoRegion;
  vbar UnempCat / type=pct;
run;

/* Horizontal bar chart of UnempCat with bars in Low, Med, High order */
proc Gchart data=dataset01;
  hbar UnempCat / midpoints='Low' 'Med' 'High';
run;
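The cumulative percent for ‘Med’ in the Proc Gchart answers can be checked by hand. This is a minimal worked sketch, assuming all 100 counties (the row count from Proc Contents) have a non-missing UnempCat and that the bars are ordered Low, Med, High, as set by the MIDPOINTS= option in the last Proc Gchart step. With 47 of 100 counties in ‘Low’ and 26% in ‘High’, ‘Med’ holds the remainder, and the cumulative percent through ‘Med’ adds the Low and Med shares.

```latex
\[
\%\,\text{Med} = 100 - 47 - 26 = 27,
\qquad
\text{cumulative } \%\ \text{through Med} = 47 + 27 = 74 = 100 - 26
\]
```

The 27% share for ‘Med’ is implied by these figures rather than reported directly in the answers.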