SAS for Categorical Data

Copyright © 2004 Leland Stanford Junior University. All rights reserved.

Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

SAS

SAS is a huge integrated data management and analysis suite. It takes years to master 20% of SAS. Most people take weeks, if not months, to get comfortable working with it.

The course I teach has online slides that demonstrate how to do categorical data analyses as well as data management.
http://www.stanford.edu/class/hrp223/
Topic 0 has information on using SAS as a calculator.

Using SAS

When you start SAS in a windowing environment, you automatically have access to at least four windows.

The (enhanced) program editor is where you type instructions to SAS.
The log window gives you feedback on how SAS interprets your work.
The output window displays any printed results from your request.
There is also a two-paned window. One pane, Explorer, lets you look at data sets. The other, Results, acts like a hyperlinked table of contents for the output window.

Telling SAS what to do

You type instructions in the program editor and then push the run button. The instructions you will use for this class are data steps (to create data sets) and procedures (to analyze data sets).

Data steps

In data steps you can create variables (a variable is like a box that can hold either numbers or letters). You can do math on variables, including using functions that are built into SAS.

    data work.someData;
       theAnswer = 1 + 1;
    run;

After you type the instructions, you have to tell SAS to actually do the work. Push the running person icon to do this.

Data steps

The above code will create a data set that will exist until you quit SAS.
You can view it as if it were a spreadsheet by double-clicking on Libraries, then the Work library, and finally the data set inside the SAS Explorer window.

Functions

SAS has thousands of functions built in:

    data work.blah;
       numberOne = 1;
       someTrigThing = sin(numberOne);
    run;

I have tried to document the ones that students frequently need in Lecture 2 of 223. Take a look at the slides labeled Frequently Used Functions.

Finding functions

… or you can look up the function in the SAS online documentation. One of the links in the useful links section of the class website
http://www.stanford.edu/class/hrp223/2002f/usefulLinks.html
is the SAS online documentation.

The URL of SAS OnLineDoc is:
http://v9doc.sas.com/sasdoc/
If you enter a bad password three times, it will take you to the registration page. Access to the documentation is free.

Example of a Function

If you roll a die 50 times, what's the chance that you'll get more than 10 "6"s?

    data work.pfft;
       /* P(X > 10) = 1 - P(X <= 10), where X ~ Binomial(n = 50, p = 1/6) */
       x = 1 - CDF('BINOMIAL', 10, 1/6, 50);
    run;

Procedures

SAS has many built-in statistical analysis procedures.
The ones you will use for this class are:

proc freq – contingency tables (see 223 topics 12 and 13)
proc logistic – logistic regression (see 223 topics 14 and 15)

Real data looks like this:

    data work.epi;
       /* $ marks a character variable; with no length statement it is
          stored as 8 characters, so notExposed is kept as notExpos */
       input subjectID exposure $ disease $;
    datalines;
    1 exposed Diseased
    2 exposed Diseased
    3 exposed Diseased
    4 exposed Diseased
    5 exposed notDiseased
    6 notExposed notDiseased
    7 exposed Diseased
    8 exposed Diseased
    9 exposed Diseased
    10 notExposed notDiseased
    11 exposed notDiseased
    12 exposed Diseased
    13 notExposed Diseased
    14 notExposed notDiseased
    15 exposed Diseased
    16 exposed Diseased
    17 exposed notDiseased
    18 notExposed notDiseased
    19 exposed Diseased
    20 exposed Diseased
    21 exposed Diseased
    22 notExposed notDiseased
    23 exposed notDiseased
    24 exposed Diseased
    25 notExposed notDiseased
    26 exposed notDiseased
    27 notExposed notDiseased
    28 exposed Diseased
    ;
    run;

Contingency tables

You can get a frequency table like this:

    proc freq data = work.epi;
       tables exposure * disease;
    run;

Contingency tables analysis

    proc freq data = epi;
       tables exposure * disease / chisq;
    run;

Grouped Data

You will get grouped data in statistics classes…

In a case-control study of 50 patients with pancreatic cancer and 50 hospital controls, 15 patients and 25 controls are non-coffee-drinkers, 15 patients and 10 controls are mid-level coffee drinkers, and 20 patients and 15 controls are high-octane coffee addicts. What are the odds ratios for the association between coffee drinking and pancreatic cancer (comparing high to low, high to none, low to none, and any to none)?
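The odds ratios asked for in the coffee problem can be checked by hand before handing the counts to a procedure. The data step below is a minimal sketch of that arithmetic using the cross-product ratio (exposed cases × unexposed controls) / (exposed controls × unexposed cases). The data set name work.oddsByHand and all variable names are made up for this illustration; only the counts come from the problem statement.

```sas
/* Hand check of the odds ratios from the coffee case-control counts. */
/* Data set and variable names are hypothetical, for illustration.    */
data work.oddsByHand;
   /* counts from the problem statement: cases and controls by coffee level */
   caseNone = 15;  ctrlNone = 25;
   caseLow  = 15;  ctrlLow  = 10;
   caseHigh = 20;  ctrlHigh = 15;

   /* cross-product ratios, second group named is the reference */
   orHighVsLow  = (caseHigh * ctrlLow)  / (ctrlHigh * caseLow);   /* 200/225 */
   orHighVsNone = (caseHigh * ctrlNone) / (ctrlHigh * caseNone);  /* 500/225 */
   orLowVsNone  = (caseLow  * ctrlNone) / (ctrlLow  * caseNone);  /* 375/150 */

   /* "any" coffee pools the low and high groups */
   orAnyVsNone  = ((caseLow + caseHigh) * ctrlNone)
                / ((ctrlLow + ctrlHigh) * caseNone);              /* 875/375 */
run;

proc print data = work.oddsByHand;
   var orHighVsLow orHighVsNone orLowVsNone orAnyVsNone;
run;
```

These hand values are useful as a sanity check on the odds ratios that proc freq reports with the relrisk option.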
Grouped Data

    data work.epi;
       input exposure $ disease $ people;
    datalines;
    notExposed diseased 15
    notExposed notDiseased 25
    little diseased 15
    little notDiseased 10
    lots diseased 20
    lots notDiseased 15
    ;
    run;

Problems…

    /* without a weight statement every row counts as one person */
    proc freq data = epi;
       tables exposure * disease;
    run;

Weighted data

    proc freq data = epi;
       weight people;
       tables exposure * disease;
    run;

Analysis of weighted data

    proc freq data = epi;
       weight people;
       tables exposure * disease / relrisk;
       /* exposure is stored as 8 characters, so notExposed appears as notExpos */
       where exposure in ("notExpos", "lots");
    run;

Other groups

Just copy and paste the proc freq and pick different groups. To get the combined groups, use a character format (if you took 223) or just add the two exposed groups by hand.

Formats in Freq

    proc format;
       value $coffee
          "lots"     = "Exposed"
          "little"   = "Exposed"
          "notExpos" = "notExpos"
       ;
    run;

    proc freq data = epi;
       weight people;
       format exposure $coffee.;
       tables exposure * disease / relrisk;
    run;

Analysis of Formatted Grouped Data
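As the Other groups slide suggests, the remaining pairwise comparisons come from copying the proc freq and changing the where statement. The following is a minimal sketch for the low versus none comparison, not output from the original slides. It repeats the grouped data step so that it runs on its own; the data set name work.coffee is hypothetical, chosen only so the sketch does not overwrite work.epi.

```sas
/* Sketch: low-level coffee drinkers versus non-drinkers.            */
/* work.coffee is a hypothetical name; the counts are the grouped    */
/* counts from the coffee problem.                                   */
data work.coffee;
   /* no length statement, so character values are stored as 8 characters */
   input exposure $ disease $ people;
datalines;
notExposed diseased 15
notExposed notDiseased 25
little diseased 15
little notDiseased 10
lots diseased 20
lots notDiseased 15
;
run;

proc freq data = work.coffee;
   weight people;                              /* each row stands for "people" subjects */
   tables exposure * disease / relrisk;        /* relrisk needs a 2 x 2 table */
   where exposure in ("notExpos", "little");   /* notExposed is stored as notExpos */
run;
```

Swapping "little" for "lots" in the where statement gives the high versus none comparison, and listing "little" and "lots" together gives high versus low.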