Quantify the Example Data
- First, code and quantify the data (assign column locations and variable names)
- Use the sample data to create a data set from the first 10 counties
- Include: ID, County, Number of reporting units (v1), Number of employees (v2), Payroll (v3)
- Save to your flash drive as ‘countydata’

The SAS® System: Statistical Analysis Programming

Introduction to SAS®
- Arguably the most popular computer software for conducting statistical data analysis
- Does both data management and statistical analysis
- Useful for managing even the most complex data sets
- Uses its own programming language
- Open the SAS® window
- You essentially have 4 windows within SAS:
  - The Explorer sidebar window
  - The Log window
  - The Editor window
  - The Output window
- You can resize and reconfigure these windows, and minimize and maximize them as you would in any Windows-based program
- The Editor window is for constructing and running programs
- “Programming” in SAS involves writing out step-by-step instructions, in the correct order, in a format the SAS System can understand
- The program you write must be syntactically correct; when it is not, SAS gives you error messages

SAS® Programming
- Three major components to most SAS programs:
  - Input
  - Manipulation
  - Output

SAS® Programming: Input
- Most of the time, data are placed into a data file and read into the program
- The program tells the system which variables are located in which columns
- [Screenshot: input data and column locations]

SAS® Programming: Manipulation
- Data are then manipulated to accomplish the tasks for which the program was written: transforming or combining variables, or conducting statistical or other analyses
- [Screenshot: manipulate the data]

SAS® Programming: Output
- Program output: the results of the program are written to the Output window
- You must save these results
- Log
- [Screenshots: Output window; Log window]

SAS® Programming: Basic Input Statement = “DATA Step”
- Begins with an “options”
statement that formats what the output page will look like
- Names the temporary data set: “data1,” “data2,” etc., or a text name (8 characters max in older releases; up to 32 in SAS Version 7 and later)
- Tells SAS where to find your actual data set: the file location
- Gives the “input,” that is, the column locations for your variables
- [Screenshot callouts: Options; Temporary Data Set; Data Location; Input Column Locations]
- After your input statement, you add statements to transform or manipulate the data
- Add statements to perform analysis procedures
- Ends with a RUN statement
- [Screenshot callouts: Data Manipulations & Transformations; Analysis Procedure]

SAS® Programming: Syntax
- SAS statements: commands or instructions that can be interpreted by the SAS System
- These commands appear as blue text in the Enhanced Editor window: DATA, PROC, PUT, INPUT, RUN, etc.
- Every SAS statement must end in a semicolon (;). This is how the system knows the statement is complete
- One of the most common errors is omitting semicolons
- Comments begin with an asterisk (*) and, like any other statement, end with a semicolon
- In the Enhanced Editor:
  - Plain text is black
  - Numerical values are teal
  - SAS statements are blue
  - Errors are red
- Basic arithmetic operators can be used (+, -, *, /)

SAS® Programming: Logical Operators
  Symbol      Abbreviation  Operation
  =           eq            equal to
  ^= or ~=    ne            not equal to
  >           gt            greater than
  <           lt            less than
  >= or =>    ge            greater than or equal to
  <= or =<    le            less than or equal to
  &           and           and
  |           or            or

Building a SAS® Program
1. Open the SAS program and click inside the Editor window
2. Add your “options” statement: options nocenter nonumber nodate linesize=88 pagesize=72;
3. Add the “data” statement, then the name of your first temporary data set (data1)
4. Add the “infile” statement, then the file location where your data is stored
5. Add the “input” statement, then each variable name followed by its column location
   - A dollar sign ($) after a variable name signifies that the variable contains character (text) data
   - It is recommended that you enter data in 80-column lines; #2 in the input statement signals that the variables after it are read from the second line of the record
6. Add statements for data management or statistical analysis
   - SAS statements vary based on the task to be accomplished
   - Data management: create new variables, change values, etc.
   - Statistical procedures: frequencies, correlations, crosstabulations, regression, etc.

Hands-On Exercise 1: Build a Basic SAS Program
- Using SAS, write a basic program for the county data set you created
- For your analysis, run a “print” command: Proc Print; var county v1 v2 v3;

SAS® Procedures: PROC Commands
- SAS procedures that perform different operations use “PROC” commands
- There are many PROC commands; we’ll touch on a few of the most used
  - Some for data management
  - Some for statistical analysis

SAS® Procedures: PROC PRINT
- Prints the data you have in your temporary SAS data set
- Will print the variables you designate (either those from your initial INPUT statement or variables you create)
- Helps you better understand your data set and spot errors
- Proc Print; var v1 v2 v3;
  This statement tells SAS to print the data for v1, v2, and v3
- If you run “PROC PRINT” without any variables designated, it will print ALL of your variables
- You should run a PROC PRINT when you transform variables or create new variables to ensure that the transformations were done correctly
- Example: create a new variable by adding two others:
  newvar = v1+v2;
  Proc print; var v1 v2 newvar;
- Check the output to ensure that the operation is correct

Variable Manipulations
- SAS will permit you to perform many different types of variable manipulations
- Add variables: newvar1 = v1+v2+v3;
- Subtract variables: newvar2 = v3 - v2;
- Multiply variables and divide variables (respectively):
  newvar3 = v2 * v3;
  newvar4 = v2/v1;
- More complex transformations can be done following the basic rules for arithmetic operations: newvar5 = (v1+v2/v3)*4;
- You can also use your new variables in other transformations: newvar6 = newvar4*newvar5;
- Create categorical variables
- You can reformat your data into new variables
  - If you have a survey question with responses showing ‘year of birth’ you can convert it to ‘age’
- For example, if you have a series of data for a variable:
  - Variable name: “vexample”
  - Values: 1 2 3 4 5 6 7 8 9 10
- We want to create a categorical variable with the categories and corresponding values of:
  - Low = 1
  - Medium = 2
  - High = 3
- Give your new variable a name like “newvexample” or “vexamplecat”
- Your new categorical variable would be created with this if/then syntax: [Screenshot: if/then syntax]
- If your data is not as simple as 1 2 3 4 5 6 and so on, you can use the “PROC SORT” command to help you sort your data set
- Run a PROC SORT for v2, and then run a PROC PRINT to show the variable rearranged in ascending order
- Now, create a new variable “newv2” with the following categories:
  - Low = 1 (values less than 100)
  - Medium = 2 (values 100 to 500)
  - High = 3 (values more than 500)
- Run a PROC PRINT and PROC FREQ to check your transformations

IF/THEN Statements
- In the previous exercise, you saw how if/then statements can be used to create new variables
- If/then statements are very powerful and can be used in a number of ways to help you manage your data

IF/THEN Statements: Segmenting Data Sets with the IF Statement
- Simple IF statements
- The SAS “IF” command can be used to segment or partition your data set
- For example, suppose you only want to examine certain cases in your data set: only females, only people over age 55, only Florida counties with populations greater than
500,000, etc.
- You can segment in this way using the IF statement. If we only want to examine the number of reporting units in our sample for counties with a “low” number of employees, “if newv2 is low” looks like this in SAS language: IF newv2=1;

IF/THEN Statements: Combining IF Statements with the DATA Command
- It is very useful to combine the IF command, which segments data, with the DATA command we learned earlier
- Recall that your initial data step started with the command: data data1;
- This created the initial temporary SAS data set
- The temporary data set “data1” contained all of the cases that you entered into your data set
- If you now want to examine only a subset of those cases, you can do that in a second data set:
  data data2; set data1;
- This creates a second temporary data set called “data2” (remember, SAS allows a large number of data sets)
- We can now use an IF statement to segment the data in our set “data2”
- Let’s create a second data set that includes only counties with a “medium” number of employees
- Run a PROC PRINT to check the output
- The PROC PRINT shows us that the temporary data set we’re now dealing with has only the 5 counties with a “medium” number of employees

Hands-On Exercise
Use the commands we’ve just learned to:
1. Create a new variable for high, medium, and low payroll amounts (newv3)
2.
   Use the DATA and IF statements to create a new data set that contains only those counties with the highest payroll for gasoline service stations, then run a PROC PRINT to check your results

IF/THEN Statements
- The IF and THEN commands are most often used together with the logical operators we talked about before (eq, ne, gt, lt, ge, le, and, or)
- More complex IF statements: multiple conditions can be connected using “and” or “or” to make more complex statements:
  if v1 eq 2 or v2 gt 5 and v3 ne 2 then newvar = 1;
  (SAS evaluates “and” before “or”; add parentheses to make the intended grouping explicit)
- Using IF and THEN statements: the general form of this command (for creating new variables, separating data sets, etc.) is:
  IF variable condition exists (comparison abbreviation: eq, ne, gt, lt, ge, le) THEN new variable condition (numeric value)
  IF v2 eq 5 then newv2 = 1;
- Again, you can combine conditions for more complex statements

Add Variables & Cases
- Two other important data management functions that SAS can perform are adding cases (observations) and adding new variables

Add Variables & Cases: Adding Cases
- The term for adding cases or observations is “concatenation”
- This allows you to add new cases to the bottom of your existing data set
- You simply create a second data set and add it to your initial data set
- [Diagram labels: Initial Data Set; Additional Cases; Merged Set]

Hands-On Exercise
You have already created one data set of 10 counties
1. Create a new data set containing information for the next 4 counties (Collier, Columbia, De Soto, and Dixie)
2. Add these cases to your existing data set
3.
   Do a PROC PRINT for data3 to verify

Add Variables & Cases: Adding Variables
- Adding variables to your existing data is simple as well
- Again, you will need to create a second data set that will essentially add a column or columns to your initial data set
- The second data set will contain the new variable you are adding and one variable that exactly matches a variable in your initial data set, usually the sequential ID number (similar to Access)
- To make sure that the data sets are properly combined, you must SORT the initial and second data sets by the matching variable
- The syntax looks like this: [Screenshot callouts: Initial Data Set; Added Variables; Merge]

SAS® Statistical Procedures

Descriptive Procedure for Continuous Data: PROC UNIVARIATE;
- Proc Univariate will provide basic descriptive information for continuous variables
- The syntax looks like this: [Screenshot: PROC UNIVARIATE syntax]

Descriptive Procedure for Categorical Data: PROC FREQ;
- Proc Freq will provide basic descriptive information for categorical or ordinal variables
- The syntax looks like this: [Screenshot: PROC FREQ syntax]

Analytical Procedures for Continuous Data: PROC CORR;
- Proc Corr provides an analysis of the association between two continuous variables
- Computes a correlation coefficient that shows the level of association, as well as a p-value showing the significance of that association
- The syntax looks like this: [Screenshot callouts: Correlation coefficient; p-value]

Analytical Procedures for Categorical Data: PROC FREQ;
- Proc Freq can also be used to calculate the level of association between two categorical or nominal variables
- A chi-square (χ²) test can be added to assess the significance of that association
- The syntax looks like this: [Screenshot callouts: DV; IV; Crosstab Table; Chi-square analysis]
- PROC FREQ can also be used in conjunction with DEVIATION to analyze
the deviation of each cell frequency from its expected value (the DEVIATION option does not report a standard deviation)
- Many SAS procedures like this have additional analyses that can be added in this way

SAS® Statistical Procedures: Multivariate Analysis
- PROC REG; computes the association between a continuous dependent variable and numerous independent variables
- PROC LOGISTIC; computes the association between a categorical dependent variable and numerous independent variables

Regression Analysis: PROC REG;
- Uses the “model” command
- Construct your model with your dependent variable first, then your independent variables
- The syntax looks like this: [Screenshot: PROC REG syntax]

These are only a few examples of the analyses you can do with SAS. SAS can also do:
- Time series analysis
- Factor analysis
- ANOVA
- T-tests
- …and more!
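
Putting the pieces together: the steps covered in these slides (options, DATA step, if/then recoding, a subsetting IF, PROC SORT, PROC PRINT, PROC FREQ, PROC UNIVARIATE, PROC CORR, and PROC REG) might be combined roughly as in the sketch below. This is only an illustration under stated assumptions, not the program shown on the original slides. The county names and all numbers are invented placeholders rather than the sample data, and the values are typed inline with DATALINES instead of being read with INFILE, because the actual file location is not given here.

```sas
* Hypothetical sketch. All data values below are invented for illustration;
options nocenter nonumber nodate linesize=88 pagesize=72;

* Input: read id, county (character), and v1 to v3 from inline lines;
data data1;
  input id county $ v1 v2 v3;
  datalines;
1 Alpha 12 85 1400
2 Bravo 30 240 5200
3 Carol 55 610 13100
4 Delta 8 60 900
5 Echo 41 450 9800
;
run;

* Manipulation: recode v2 into low (1), medium (2), high (3);
data data2;
  set data1;
  if v2 lt 100 then newv2 = 1;
  else if v2 le 500 then newv2 = 2;
  else newv2 = 3;
run;

* Subsetting IF: keep only the medium counties in a third data set;
data data3;
  set data2;
  if newv2 = 2;
run;

* Sort by v2, then print to check the recode;
proc sort data=data2;
  by v2;
run;

proc print data=data2;
  var county v1 v2 v3 newv2;
run;

* Output: descriptive and analytical procedures;
proc freq data=data2;
  tables newv2;
run;

proc univariate data=data2;
  var v3;
run;

proc corr data=data2;
  var v1 v2;
run;

* Regression: dependent variable first, then the independent variables;
proc reg data=data2;
  model v3 = v1 v2;
run;
quit;
```

Naming the data set on each procedure with DATA= is a choice made for this sketch; the slides run procedures such as Proc Print without it, in which case SAS uses the most recently created data set.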