You will need to download all of the files from http://www.calstatela.edu/faculty/pthomas/CIS301/W12/PASW in order to do the exercises here.

Many Excel users are not familiar with, or are intimidated by, Pivot Tables, one of the most powerful features in Excel. A pivot table is a reporting tool that sorts and sums independently of the original data layout in the spreadsheet.

First, create some data in A1:D50 like this, with 4 or 5 different names, 4 or 5 different activities, and a little variety of week numbers and expenses:

Who     Week   What   Amount
Joe     3      Beer   18
Beth    4      Food   17
Janet   5      Beer   14
Joe     3      Food   12
Joe     4      Beer   19
Janet   5      Car    12
Joe     3      Food   19
Beth    4      Beer   15
Janet   5      Beer   19
Joe     3      Car    20
Joe     4      Beer   16
Beth    5      Food   12
Beth    3      Beer   16
Joe     4      Food   17
Joe     5      Beer   14
Janet   3      Car    19
Joe     4      Food   17
Beth    5      Beer   20
Janet   3      Food   18
Joe     4      Beer   14
Joe     5      Food   12
Janet   3      Beer   18
Janet   4      Car    17
Janet   5      Food   12

Add as many rows as you can stand; around 50 will do.

Now choose any cell in this table, select the Insert tab on the Ribbon, and then select Pivot Table. Excel asks for the data source and suggests this table. Use the defaults and click OK. The next question is the data range. Excel suggests the table. If you expect to add data in the future, set the data range to include as many rows as you think you will ever need. Rather than A1:D50, you may want to specify $A$1:$D$500.

Now comes the layout wizard, shown below. Drag the headers Who, Week and What into the Row area, and the Amount header into the Data area. (Leave the Column area blank for now.) If the Amount tag does not show "Sum of Amount", double-click it and choose the Sum option.

Now you have your table, and it looks very much like a sorted version of the original data list, except for the automatic subtotals.

Now comes the cool stuff: Grab the What header in the table and drag it all the way to the left.
When you drop it here, the table re-sorts and re-sums; you have a table of beer costs by person instead. Now drag the Week header to the left and you have a weekly report. Double-clicking the headers gives options for showing or hiding specific data (such as Empty and Beer, which may come in handy) and for removing subtotals for that column. Right-clicking gives other options, among them Hide Detail and Show Detail, for reading totals only.

Here comes another useful pivot, made from the same list. Select any item in the original data list and choose the Pivot Table wizard again. This time, drag Who into the Row field, What into the Column field and Amount into the Data field.

The only tricky thing is this: a pivot table does not update itself. You have to update the table manually from the Data menu. If this becomes boring, here is some macro code that makes the tables update when the worksheet is selected:

    Sub Auto_Open()
        Application.OnSheetActivate = "UpdateIt"
    End Sub

    Sub UpdateIt()
        Dim iP As Integer
        Application.DisplayAlerts = False
        For iP = 1 To ActiveSheet.PivotTables.Count
            ActiveSheet.PivotTables(iP).RefreshTable
        Next
        Application.DisplayAlerts = True
    End Sub

SPSS Crosstabs

A crosstab (short for cross tabulation) is a summary table, with the emphasis on summary. Here’s an example: Notice that the rows contain one set of categories (employment category) while the columns contain another (gender). In this crosstab, the cells contain counts, but in others you can use percentages, means, standard deviations, and the like.

Here’s the important part: crosstabs are used only for categorical (discrete) data, that is, groups like employment categories or gender. You can’t use a crosstab for continuous data like temperature or dosage or income. BUT, you can change data like temperature or dosage or income into categories by creating groups, like income less than $25,000, income between $25,000 and $49,999, and income of $50,000 or higher.
We’ll discuss these data conversions, known as transformations or recodes, later. For now, you just need to understand that crosstabs deal with groups or categories.

Now that you’ve seen the various windows you’ll be using, we’ll move on to the techniques you’ll use in SPSS for managing your data files.

Creating primary reference lists

There is one set of outputs you’ll create that is more important than anything else, and that is the set of primary references. Primary references describe your overall data set. In other words, how many in all? How many in each category? What are the maximums and minimums? Means? Standard deviations?

Here’s our rule: list out the summaries and put them somewhere where you can refer to them quickly. You may not always want to print out all the details of your data set. For example, printing out every single income for a data set of one million people would not be useful, economical, or nice to either your printer or the trees. So here are the basic rules: print frequencies for categorical variables and descriptive (also called univariate) statistics for continuous variables.

In this exercise, we’ll use the sample Employee data.sav file.

Frequencies

1. If it’s not already open, navigate to the location where you saved the Employee data.sav file and open it.
2. From the menu, select File > New > Output.
3. From the menu, select Analyze > Descriptive Statistics > Frequencies.
4. Double-click Gender, Employment Category, and Minority Classification to move them to the Variables list.
5. Click the check box labeled Display frequency tables.
6. Click Statistics.
7. Make sure all the check boxes are cleared (not checked).
8. Click Continue.
9. Click Charts.
10. If it is not already selected, select None by clicking it.
11. Click Continue.
12. Click OK.
13. From the menu, select File > Save As.
14. Navigate to the location where you are saving your files and save the file as AllFreqs.
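The Frequency, Percent, and Cumulative Percent columns that the Frequencies procedure reports can be reproduced by hand. The following is a minimal sketch in Python (not part of the SPSS workflow), assuming the category counts reported in this exercise's output (216 Female and 258 Male; 363 Clerical, 27 Custodial, and 84 Manager). The function name frequency_table is illustrative only.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percent, cumulative percent) rows.

    Categories are listed in sorted order; percents are rounded to one
    decimal place, the precision shown in the SPSS output.
    """
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    running = 0  # running count used for the cumulative percent
    for category in sorted(counts):
        running += counts[category]
        percent = round(100 * counts[category] / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((category, counts[category], percent, cumulative))
    return rows


# Rebuild the raw category columns from the reported counts.
gender = ["Female"] * 216 + ["Male"] * 258
employment = ["Clerical"] * 363 + ["Custodial"] * 27 + ["Manager"] * 84

for name, column in (("Gender", gender), ("Employment Category", employment)):
    print(name)
    for category, count, percent, cumulative in frequency_table(column):
        print(f"  {category:<10} {count:>4} {percent:>6} {cumulative:>6}")
```

With no missing values, Percent and Valid Percent coincide, which is why the sketch computes only one of them.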
Statistics

                Gender   Employment Category   Minority Classification
N   Valid       474      474                   474
    Missing     0        0                     0

FREQUENCIES

Gender

                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Female   216         45.6      45.6            45.6
        Male     258         54.4      54.4            100.0
        Total    474         100.0     100.0

Employment Category

                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Clerical    363         76.6      76.6            76.6
        Custodial   27          5.7       5.7             82.3
        Manager     84          17.7      17.7            100.0
        Total       474         100.0     100.0

Minority Classification

                Frequency   Percent   Valid Percent   Cumulative Percent
Valid   No      370         78.1      78.1            78.1
        Yes     104         21.9      21.9            100.0
        Total   474         100.0     100.0

Descriptive statistics: descriptives (univariate)

The next step is to print the descriptive or univariate statistics for the continuous variables.

1. From the menu, select File > New > Output.
2. From the menu, select Analyze > Descriptive Statistics > Descriptives.
3. Click Reset to clear any previous selections.
4. Double-click Current Salary, Beginning Salary, Months since Hire, and Previous Experience to move them to the Variables list.
5. Click Options.
6. In the Descriptives: Options window, click Mean, Std. deviation, Variance, Range, Minimum, Maximum, Kurtosis, and Skewness.
7. Click Continue.
8. Click OK.
9. When the resulting table is displayed, notice that the variables you selected are listed as rows, while the statistics are listed in columns.
10. From the menu, select File > Save As.
11. Navigate to the location where you saved your files and save the file as AllDescriptives.
12. Notice that the statistic Range displays the difference (distance) between the minimum and maximum.

Descriptives

Descriptive Statistics

                               N     Range      Minimum   Maximum    Mean
Current Salary                 474   $119,250   $15,750   $135,000   $34,419.57
Beginning Salary               474   $70,980    $9,000    $79,980    $17,016.09
Months since Hire              474   35         63        98         81.11
Previous Experience (months)   474   476        0         476        95.86
Valid N (listwise)             474

Descriptive Statistics (continued)

                               Std. Deviation   Variance    Skewness               Kurtosis
                                                            Statistic  Std. Error  Statistic  Std. Error
Current Salary                 $17,075.661      2.916E8     2.125      .112        5.378      .224
Beginning Salary               $7,870.638       6.195E7     2.853      .112        12.390     .224
Months since Hire              10.061           101.223     -.053      .112        -1.153     .224
Previous Experience (months)   104.586          10938.281   1.510      .112        1.696      .224

Measuring differences

Typically, differences in one or more continuous dependent variables based on differences in one or more categorical variables are evaluated using a t-test or an analysis of variance. If we have only one continuous dependent variable and only one categorical independent variable with no more than two values, the t-test can be used to look for differences. If we have more than one dependent continuous variable, or more than two values across the categorical independent variables, or both categorical and continuous independent variables, we need to use an analysis of variance (ANOVA).

T-Tests

There are three types of t-tests:

An Independent Samples t-test is used when cases are randomly assigned to one of two groups. After a differential treatment has been applied to the two groups, a measurement is taken which is related to the effect of the treatment. The t-test is calculated to determine if any difference between the two groups is statistically significant.

A Paired Samples t-test can be used to evaluate differences between two groups who have been matched on one or more characteristics, or to evaluate differences in before/after measures on the same person. If you want to use pre/post measures, make sure the post-test is the same as the pre-test.

A One Sample t-test is used to evaluate whether the mean of a continuous dependent variable is different from zero. To test whether the mean is different from some other value, subtract that value from each observation and then test whether the mean of the new values is zero.

In the independent samples t-test, the purpose of randomly assigning the people to groups is to control for the effect of other differences between people that might have affected the effect we’re measuring.
Using the matched pairs has the same goal, but now the theory is that we have matched the subjects on the concomitant variables.

What if you can’t randomly assign people to groups and you can’t match them? For example, what if you want to examine salary differences based on gender? One thing you can do is assume that the only difference between people that could result in a difference in income is gender. With that assumption in hand, we can do an independent samples t-test to see if there’s a gender difference in income. If you’re one of those folks who think that assumption doesn’t pass the “smell” test, you can go back to our earlier friend the correlation, put gender in as a binary variable (where 1 is female and 0 is not female), and look at the correlation. This does not mean that gender has caused the difference in salaries, but you can measure how strongly gender is associated with income level. More sophisticated analyses could be done by using partial correlations or multiple regression to control for concomitant factors.

How to:

In this exercise, you’ll split a file based upon a single variable and test the set of cases for each value of the variable as a separate sample. First you’ll instruct SPSS to treat each group (in this case, each result from a specific machine) as a separate sample; this procedure is called splitting the file. Then you’ll run the t-test to determine whether all the machines are meeting the production specifications.

1. From the menu, select File > Open > Data.
2. Navigate to the folder where you downloaded the brakes.sav file and open it.
3. From the menu, select Data > Split File to open the Split File window.
4. Click Compare Groups.
5. Select Machine Number and move it to the Groups Based on pane.
6. Click OK. Even though you haven’t created separate files, you have instructed SPSS to treat each group within the file as if it were a separate sample, with a group being defined by the machine number.
7. From the menu, select Analyze > Compare Means > One-Sample T Test. You’re going to test against a known value, which in this case is the diameter of the disc brake.
8. Select Disc Brake Diameter and move it to the Test Variables pane.
9. Select the text in the Test Value field and type: 322
10. Click Options to set the confidence level.
11. Change the confidence interval to 90.
12. Click Continue.
13. Click OK. The test results are displayed in the output window.

    GET FILE='C:\Documents and Settings\pthomas\My Documents\Brakes.sav'.
    DATASET NAME DataSet1 WINDOW=FRONT.
    SORT CASES BY machine.
    SPLIT FILE LAYERED BY machine.
    T-TEST
      /TESTVAL=322
      /MISSING=ANALYSIS
      /VARIABLES=brake
      /CRITERIA=CI(.90).

One-Sample Statistics: Disc Brake Diameter (mm)

Machine Number   N    Mean         Std. Deviation   Std. Error Mean
1                16   321.998514   .0111568         .0027892
2                16   322.014263   .0106913         .0026728
3                16   321.998283   .0104812         .0026203
4                16   321.995435   .0069883         .0017471
5                16   322.004249   .0092022         .0023005
6                16   322.002452   .0086440         .0021610
7                16   322.006181   .0093303         .0023326
8                16   321.996699   .0077085         .0019271

One-Sample Test: Disc Brake Diameter (mm), Test Value = 322

                                                                     90% Confidence Interval
                                                                     of the Difference
Machine Number   t        df   Sig. (2-tailed)   Mean Difference    Lower      Upper
1                -.533    15   .602              -.0014858          -.006375   .003404
2                5.336    15   .000              .0142629           .009577    .018948
3                -.655    15   .522              -.0017174          -.006311   .002876
4                -2.613   15   .020              -.0045649          -.007628   -.001502
5                1.847    15   .085              .0042486           .000216    .008282
6                1.134    15   .274              .0024516           -.001337   .006240
7                2.650    15   .018              .0061813           .002092    .010270
8                -1.713   15   .107              -.0033014          -.006680   .000077

Analysis of Variance

If we have more than one dependent continuous variable, or more than two values across the categorical independent variables, or both categorical and continuous independent variables, we need to use an analysis of variance to measure differences.

There are a number of particularly useful special cases of the general analysis of variance model. If we have only one dependent continuous variable and one independent categorical variable, we can use a One-Way Analysis of Variance or One-Way ANOVA. If we have only one dependent continuous variable, but more than one independent categorical variable, we can use a general Univariate Analysis of Variance or Univariate ANOVA. If we have one or more independent continuous variables, we can use the Univariate Analysis of Covariance or Univariate ANCOVA.

More advanced models are available for more than one dependent continuous variable. If the model has one or more independent categorical variables, we would use a Multivariate Analysis of Variance or MANOVA. If the model also included one or more continuous independent variables, we would use a Multivariate Analysis of Covariance or MANCOVA.
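Before moving on to the one-way case, it may help to see where the numbers in the One-Sample Test output above come from. The following is a minimal sketch in Python (not SPSS syntax) that recomputes t and the 90% confidence interval of the difference for machine 2 from its One-Sample Statistics row. The function name one_sample_t is illustrative, and the critical value 1.753 is an assumption taken from a standard t table (two-sided 90%, 15 degrees of freedom) rather than computed.

```python
import math


def one_sample_t(mean, sd, n, test_value, t_crit):
    """Return (t, lower, upper) for a one-sample t-test from summary statistics.

    t_crit is the two-sided critical t value for the chosen confidence
    level and n - 1 degrees of freedom, looked up in a t table.
    """
    std_error = sd / math.sqrt(n)      # standard error of the mean
    difference = mean - test_value     # the "Mean Difference" column
    t = difference / std_error
    margin = t_crit * std_error
    return t, difference - margin, difference + margin


# Machine 2 from the One-Sample Statistics table.
T_CRIT_90_DF15 = 1.753  # assumed table value, not computed here
t, lower, upper = one_sample_t(322.014263, 0.0106913, 16, 322, T_CRIT_90_DF15)

print(f"t = {t:.3f}")
print(f"90% CI of the difference: {lower:.6f} to {upper:.6f}")
```

The sketch does not compute the Sig. (2-tailed) column, which requires the cumulative t distribution. A confidence interval that excludes zero, as it does for machine 2, corresponds to a significant difference from the test value at the matching level.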
One-Way ANOVA

One-way analysis of variance is an extension of the t-test. In a t-test you have two groups, one that received a treatment and one that did not, while in a one-way analysis of variance you have more than two groups, each of which received a different variation of the same treatment. For example, a treatment factor could be whether you fry donuts in vegetable oil, butter, or lard.

How to:

In this exercise, you’re going to identify the most effective number of training hours to achieve a given result, with results scored between zero and 100, 100 being the optimal score.

1. From the menu, select File > Open > Data. Navigate to the folder where you saved the Training.sav file.
2. First you’ll graph the means and standard error, so select Graphs > Legacy Dialogs > Error Bar.
3. You’ll use the default choices, so click Error bar graph: defining summaries for groups of cases.
4. Select Skills Training group and move it to the Category Axis field.
5. Select Score on training exam and move it to the Variable field.
6. In the Bars represent field, open the drop-down list and select Standard error of mean.
7. Click OK. The results are displayed in the output window. Note that the three bars do not represent equal variance among the three groups; instead, variance decreases as days of training increase. These data may not be appropriate for ANOVA. In the next step, you’ll run the ANOVA to test the data.
8. From the menu, select Analyze > Compare Means > One-Way ANOVA.
9. Move Score on training exam to the Dependent List.
10. Move Skills training group to the Factor list.
11. Click Options.
12. Select Descriptive and Homogeneity of variance test by clicking their check boxes.
13. Click Continue.
14. Click OK. The results appear in the output window.

Descriptives: Score on training exam

                                                   95% Confidence Interval for Mean
        N    Mean      Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
1       20   63.5798   13.50858         3.02061      57.2576       69.9020       32.68     86.66
2       20   73.5677   10.60901         2.37225      68.6025       78.5328       47.56     89.65
3       20   79.2792   4.40754          .98556       77.2165       81.3420       71.77     89.69
Total   60   72.1422   12.00312         1.54960      69.0415       75.2430       32.68     89.69

Test of Homogeneity of Variances: Score on training exam

Levene Statistic   df1   df2   Sig.
4.637              2     57    .014

ANOVA: Score on training exam

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   2525.691         2    1262.846      12.048   .000
Within Groups    5974.724         57   104.820
Total            8500.415         59
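The ANOVA output above can be rebuilt from the group rows of the Descriptives output alone. The following is a minimal sketch in Python (not SPSS syntax), assuming only the N, Mean, and Std. Deviation values reported for the three training groups. The function name one_way_anova is illustrative, and small differences from the SPSS figures are expected because the inputs are rounded.

```python
def one_way_anova(groups):
    """Compute one-way ANOVA quantities from per-group summary statistics.

    groups is a list of (n, mean, sd) tuples, one per group.
    Returns (ss_between, ss_within, df_between, df_within, f).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Between-groups: spread of the group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups: pooled spread of the scores around their own group mean.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


# (N, Mean, Std. Deviation) for training groups 1, 2, and 3.
training = [
    (20, 63.5798, 13.50858),
    (20, 73.5677, 10.60901),
    (20, 79.2792, 4.40754),
]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(training)

print(f"Between Groups: SS = {ss_b:.3f}, df = {df_b}, MS = {ss_b / df_b:.3f}")
print(f"Within Groups:  SS = {ss_w:.3f}, df = {df_w}, MS = {ss_w / df_w:.3f}")
print(f"F = {f_stat:.3f}")
```

The sketch stops at F; the Sig. column requires the cumulative F distribution. It also does not address the Levene result (Sig. = .014), which suggests the equal-variance assumption is doubtful for these data, as noted in step 7.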