Statistical Methods
Lynne Stokes
Department of Statistical Science

Lecture 7: Introduction to SAS Programming Language

Preliminaries
• Create a Folder: c:/Stat6337
  – Send to the Desktop
• Access Blackboard
• Download the Eysenck Data File
• Download the lecture7Eysenck.sas File
• Download the lecture7class.sas File
• Download the lecture7SASSummary.doc File

Eysenck’s Data File

                          Recall Condition
Age Group   Counting   Rhyming   Adjective   Imagery   Intentional
old             9         7         11         12          10
old             8         9         13         11          19
old             6         6          8         16          14
old             8         6          6         11           5
old            10         6         14          9          10
old             4        11         11         23          11
old             6         6         13         12          14
old             5         3         13         10          15
old             7         8         10         19          11
old             7         7         11         11          11
young           8        10         14         20          21
young           6         7         11         16          19
young           4         8         18         16          17
young           6        10         14         15          15
young           7         4         13         18          22
young           6         7         22         16          16
young           5        10         17         20          22
young           7         6         16         22          22

Open the SAS Program
• Double-click the lecture7.sas File
  – Press the Run Icon (Runner Image)
• Editor
  – Create and Modify SAS Command Files
  – Can Save in the Stat 6337 Folder: File / Save As …
• Log
  – Messages about the Compilation and Execution of the SAS Program
  – Contains Error Messages (in red), if any
  – Can Save in the Stat 6337 Folder: File / Save As …
• Output
  – Results of the Execution of the SAS Program
  – Can Save in the Stat 6337 Folder: File / Save As …
To Erase the Contents of the Log or Output Files: Right Click, Select “Clear All”

SAS Structure
• DATA Step
  – Describe the data, provide names for variables, define new or transformed variables
• PROCs: SAS Procedures
  – Descriptive Statistics: Proc Univariate, Proc Means
  – Graphics: Proc Chart, Proc Plot
  – Regression: Proc Reg
  – Two-sample t-tests: Proc Ttest
  – Analysis of Variance: Proc Anova, Proc GLM, Proc Mixed
  – Specialized Data Operations: Proc Sort
  – etc.

SAS Syntax
• Every command MUST end with a semicolon
  – Commands can continue over two or more lines
  – This WILL be Your #1, #2 & #3 Mistakes !!!!
• Variable names are 1-8 characters (the traditional limit; SAS version 7 and later accept up to 32). Use letters and numerals, beginning with a letter or underscore, with no blanks or special characters
  – Note: values for character variables can exceed 8 characters
• Comments
  – Begin with *, end with ;
  – Can comment several lines: begin with /* and end with */

Data Input in the SAS File
• Data fname ;
  – creates a temporary file with the data that are described in the data step
• Input name . . . name $ . . . ;
  – list input: lists the variable names (1 – 8 characters/letters); name is assumed to be a quantitative variable
  – name MUST be followed by $ if name is a character variable
  – alternatives: comma separated, column specified
• Datalines (or Cards) ;
  – indicates that the data follow, line by line
• ;
  – indicates that the last line of data has been input; the semicolon is on a line by itself
• Example: lecture7class.sas
  – Open lecture7class.sas
    » Change filename, if necessary
  – Clear output and log files; Run lecture7class.sas

Data Input with Multiple Responses on a Single Line of the Data File
• SAS Requires that Each Response Value be on a Separate Line of Data
• When n Responses are on One Line of Data
  – Input y1 y2 … yn ;
  – y = y1; output;
  – y = y2; output;
  – ...
  – y = yn; output;
  Creates n Data Lines with 1 Response Value on Each Line
• If y1 … yn Represent Responses for n Levels of a Factor
  – Input y1 y2 … yn ;
  – factor = 'Level 1'; y = y1; output;
  – factor = 'Level 2'; y = y2; output;
  – ...
  – factor = 'Level n'; y = yn; output;
  Creates n Data Lines with 1 Factor & Response Value on Each Line
• Example: lecture7.sas
  – Data Flow2

Data Input from an External File
• Filename fn 'complete directory/file specification' ;
  – e.g., filename eysdata 'c:/Stat6337/EysenckRecall.dat' ;
  – Be Careful with Spaces in Directories and File Names !!!
• Data fname ;
  – creates a temporary file with the data that are described in the data step
• Infile fn ;
  – inputs the data from the file labeled fn
• Input name . . . name $ . . . ;
  – lists the variable names (1 – 8 characters/letters); name is assumed to be a quantitative variable
  – name MUST be followed by $ if name is a character variable
• Run ;
  – indicates that the data step is completed
• Example: lecture7class.sas
  – Data Recall

Program Data Vector
• One line of data is stored, as indicated on the Input statement of the Data Step
• Any calculations, deletions, etc. in the Data Step are performed on that line of data
• When the Data Step is completed, the variables in the Program Data Vector are output to a temporary (work) file
• Can force data lines to be written at any time with the Output statement

Operations in the Data Step
• Arithmetic Operations
  – x = u + v ;
• Transformations
  – x = log(y) ;
• Logical
  – If x > 0 then z = y/x ;
• Recoding
  – If gender = 'm' then gender = 'Male'; else if gender = 'f' then gender = 'Female';
  – Note: SAS sets the length of a character variable from the first value it encounters, so longer values assigned later are truncated
  – To force a length (e.g., for a character variable), use a Length statement before the variable is first used

Titles and Labels
• Title# '…' ;
  – Up to 10 title lines: title# 'include your title here';
  – Can be placed in Data Steps or Procs
  – Changing Title# replaces that title and eliminates Titlex, where x > #
• Label name = '…' ;
  – Can be in a Data Step or Proc Print

Some Useful PROCs
• Proc Chart – vertical or horizontal bar charts
• Proc Freq – frequency distributions, cross tabs
• Proc Means – select summary statistics
• Proc Plot – scatterplots
• Proc Print – prints data files
• Proc Sort – sorts data files by the values of one or more variables
• Proc Univariate – a wide range of summary statistics, box plots

General Form of PROCs
PROC xxxx data=fname options;
   by groups;
   proc-specific statements;
   title . . . ;
   output out = fn . . . ;
run ;

Printing to the Output File
• Proc Print data = fname ;
  – var . . . ;   lists the variables to be printed (can be omitted)
  – run ;   indicates the print commands are complete

Group Analyses
• Sort the Groups
  – Proc Sort data= … ;
  – by group;
  – run;
• Execute the Proc, by Group
  – Proc xxx data= … ;
  – by group;
  – ...
  – run;

Summarize the Recall Data
• Calculate the average, standard deviation, minimum, and maximum to 2 decimal places
  – Proc Means
• Graph a histogram of the recall data
  – Proc Chart
• Calculate frequencies for each condition/group and each age
  – Proc Freq

Summarize the Recall Data (continued)
• Calculate descriptive statistics for each condition/group
  – Proc Means, Proc Univariate
  – Note: Sort First, then Use the BY Command
• Graph Average Recall for All Combinations of Recall Condition/Group and Age; Use a Group Identifier as the Plotting Symbol
  – Proc Plot

Proc Anova
• Only for Complete Factorial Experiments in Completely Randomized Designs
  – Otherwise: Proc GLM
• MUST have an Equal Number of Repeats for Each Factor-Level Combination

Proc Anova (continued)
• Proc Anova data = fn ;
  – By … ;
    » Separate ANOVA Fits for Each Value of the BY variable(s)
  – Class … ;
    » List all the factors
  – Model … / options;
    » e.g., model recall = age group age*group ;
      • factors: list individually; e.g., age group
      • interactions: connect with asterisk(s); e.g., age*group
  – Means … / options;
    » e.g., means age group age*group / t bon;
  – Run;

Eysenck’s Study of Incidental Learning
• Make analysis of variance calculations, using only recall condition as the factor
• Calculate factor-level averages, with the t option

Effect of Cocaine Usage on Newborn Infant Body Lengths
Usage Groups: First Trimester, Throughout Pregnancy, Drug-Free
Research Question: Do Mean Body Lengths (cm) Differ by Cocaine Usage?
Effect of Cocaine Usage on Newborn Infant Body Lengths (data)

Case      First Trimester   Throughout Pregnancy   Drug-Free
1              45.1                 40.2              44.3
2              45.7                 41.3              45.3
3              45.8                 41.7              46.9
4              46.7                 41.9              47.0
5              47.3                 43.4              47.2
Average        46.12                41.70             46.14

Assignment
• Create a Data File
• Input the Data File into a SAS Program
• Cocaine Usage Groups
  – Calculate Averages and Standard Deviations
  – Make Comparative Box Plots
  – Test the Equality of the Group Means
• Email Me ONLY the FINAL .log File
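
As a starting point for the first and third tasks, the steps from this lecture can be strung together roughly as follows. This is only a sketch under stated assumptions, not the expected solution: the data set name (cocaine) and the variable names (y1, y2, y3, group, bodylen) are made up for illustration, and the data are typed in with Datalines rather than read from a data file with Filename and Infile. It uses the multiple-responses-per-line pattern to create one line per infant, then sorts, summarizes by group, and fits a one-factor analysis of variance.

```sas
/* Sketch only: data set and variable names are illustrative assumptions */
data cocaine;
   length group $ 10;        /* force the length before the first assignment */
   input case y1 y2 y3;      /* three responses on each line of data */
   group = 'First';      bodylen = y1; output;
   group = 'Throughout'; bodylen = y2; output;
   group = 'Drug-Free';  bodylen = y3; output;
   drop y1 y2 y3;            /* keep one response per output line */
   datalines;
1 45.1 40.2 44.3
2 45.7 41.3 45.3
3 45.8 41.7 46.9
4 46.7 41.9 47.0
5 47.3 43.4 47.2
;
run;

/* Sort first, then use the BY command */
proc sort data=cocaine;
   by group;
run;

/* Averages and standard deviations for each usage group */
proc means data=cocaine mean std maxdec=2;
   by group;
   var bodylen;
   title1 'Body Length (cm) by Cocaine Usage Group';
run;

/* One-factor ANOVA; the design is balanced, with 5 infants per group */
proc anova data=cocaine;
   class group;
   model bodylen = group;
   means group / t;
run;
quit;                        /* Proc Anova stays active until quit or the next step */
```

The group means printed by Proc Means should agree with the Average row of the data table (46.12, 41.70, 46.14), which gives a quick check that the data were entered correctly. Comparative box plots are not included in this sketch.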