EXCEL-PASW

You will need to download all of the files from http://www.calstatela.edu/faculty/pthomas/CIS301/W12/PASW in order to do
the exercises here.
Many Excel users are not familiar with, or are intimidated by, Pivot Tables, one of the most powerful features in Excel.
A pivot table is a great reporting tool that sorts and sums your data independently of its original layout in the spreadsheet.
First, create some data in A1:D50 like this, with 4 or 5 different names, 4 or 5 different activities, and a little variety in week
numbers and expenses:
Who    Week  What  Amount
Joe    3     Beer  18
Beth   4     Food  17
Janet  5     Beer  14
Joe    3     Food  12
Joe    4     Beer  19
Janet  5     Car   12
Joe    3     Food  19
Beth   4     Beer  15
Janet  5     Beer  19
Joe    3     Car   20
Joe    4     Beer  16
Beth   5     Food  12
Beth   3     Beer  16
Joe    4     Food  17
Joe    5     Beer  14
Janet  3     Car   19
Joe    4     Food  17
Beth   5     Beer  20
Janet  3     Food  18
Joe    4     Beer  14
Joe    5     Food  12
Janet  3     Beer  18
Janet  4     Car   17
Janet  5     Food  12
Add as many rows as you can stand -- around 50 will do.
Now select any cell in this table, click the Insert tab on the Ribbon, and then click PivotTable. Excel asks for the data source
and suggests this table. Accept the defaults and click OK.
The next question is the data range, and again Excel suggests the table. If you expect to add data in the future, set the data range
to include as many rows as you think you will ever need: rather than A1:D50, you may want to specify $A$1:$D$500.
Now comes the layout wizard, shown below.
Drag the headers Who, Week and What into the ROW area, and the Amount header into the Data area. (Leave the Column area
blank for now.) If the Amount tag does not show "Sum of Amount", double-click it and choose the Sum option.
Now you have your table, and it looks very much like a sorted version of the original data list, except for the automatic subtotals.
Now comes the cool stuff:
Grab the What header in the table and drag it all the way to the left. When you drop it there, the table re-sorts and re-sums; you
now have a table of costs by activity (beer, car, food), broken down by person. Now drag the Week header to the left and you have a
weekly report.
Double-clicking a header gives options for showing or hiding specific items (hiding Empty or Beer, for example, may come in handy)
and for removing subtotals for that field. Right-clicking gives other options, among them Hide Detail and Show Detail, which let you
read the totals only.
Here is another useful pivot table made from the same list. Select any cell in the original data list and start the PivotTable wizard
again. This time, drag Who into the Row field, What into the Column field, and Amount into the Data field.
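Behind the scenes, a pivot table with Who in the Row field, What in the Column field, and Amount in the Data field is just a grouped sum. The Python sketch below (an illustration only; it is not how Excel works internally) rebuilds that table from the 24 sample rows listed earlier. The names `pivot_sum` and `print_pivot` are hypothetical helpers.

```python
from collections import defaultdict

# The 24 sample rows from the data table above: (Who, Week, What, Amount).
ROWS = [
    ("Joe", 3, "Beer", 18), ("Beth", 4, "Food", 17), ("Janet", 5, "Beer", 14),
    ("Joe", 3, "Food", 12), ("Joe", 4, "Beer", 19), ("Janet", 5, "Car", 12),
    ("Joe", 3, "Food", 19), ("Beth", 4, "Beer", 15), ("Janet", 5, "Beer", 19),
    ("Joe", 3, "Car", 20), ("Joe", 4, "Beer", 16), ("Beth", 5, "Food", 12),
    ("Beth", 3, "Beer", 16), ("Joe", 4, "Food", 17), ("Joe", 5, "Beer", 14),
    ("Janet", 3, "Car", 19), ("Joe", 4, "Food", 17), ("Beth", 5, "Beer", 20),
    ("Janet", 3, "Food", 18), ("Joe", 4, "Beer", 14), ("Joe", 5, "Food", 12),
    ("Janet", 3, "Beer", 18), ("Janet", 4, "Car", 17), ("Janet", 5, "Food", 12),
]
WHO, WEEK, WHAT, AMOUNT = range(4)  # column positions in each row


def pivot_sum(rows, row_field, col_field, value_field):
    """Sum value_field for every (row_field, col_field) pair, like 'Sum of Amount'."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_field]][row[col_field]] += row[value_field]
    return table


def print_pivot(table):
    """Print the grid with a Total column; empty cells are printed as 0."""
    cols = sorted({c for cells in table.values() for c in cells})
    print(f"{'':8}" + "".join(f"{c:>7}" for c in cols) + f"{'Total':>8}")
    for key in sorted(table):
        cells = [table[key].get(c, 0) for c in cols]
        print(f"{key:8}" + "".join(f"{v:>7}" for v in cells) + f"{sum(cells):>8}")


print_pivot(pivot_sum(ROWS, WHO, WHAT, AMOUNT))
```

Swapping the first two field arguments, for example `pivot_sum(ROWS, WHAT, WHO, AMOUNT)`, mimics dragging What to the row position.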
The only tricky thing is this: a pivot table does not update itself when the source data changes, so you have to refresh it manually
from the Data menu. If this becomes tedious, here is some macro code that refreshes the pivot tables on a worksheet whenever that
worksheet is selected:
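' Runs automatically when the workbook is opened (put it in a standard module).
Sub Auto_Open()
    ' Run UpdateIt whenever any worksheet is activated.
    Application.OnSheetActivate = "UpdateIt"
End Sub

' Refreshes every pivot table on the worksheet that was just activated.
Sub UpdateIt()
    Dim pt As PivotTable
    Application.DisplayAlerts = False   ' suppress prompts during the refresh
    For Each pt In ActiveSheet.PivotTables
        pt.RefreshTable
    Next pt
    Application.DisplayAlerts = True
End Sub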
SPSS
Crosstabs
A crosstab (short for cross tabulation) is a summary table, with the emphasis on summary. Here’s an example:
Notice that the rows contain one set of categories (employment category) while the columns contain another (gender). In this
crosstab, the cells contain counts, but in others you can use percentages, means, standard deviations, and the like.
Here’s the important part: crosstabs are used only for categorical (discrete) data, that is, groups such as employment categories or
gender. You can’t use a crosstab for continuous data like temperature, dosage, or income. BUT you can change continuous data like
temperature, dosage, or income into categories by creating groups, such as income less than $25,000, income from $25,000 to
$49,999, and income of $50,000 or more. We’ll discuss these data conversions, known as transformations or recodes, later. For now,
you just need to understand that crosstabs deal with groups or categories.
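A minimal Python sketch of that idea, assuming hypothetical records of (gender, income): `income_group` recodes income into the three bands described above, and counting the resulting (gender, band) pairs gives the cells of a crosstab. The records and helper names are made up for illustration.

```python
from collections import Counter


def income_group(income):
    """Recode a continuous income into one of three categories."""
    if income < 25_000:
        return "< $25,000"
    if income < 50_000:
        return "$25,000 to $49,999"
    return "$50,000 or more"


def crosstab(records):
    """Count each (row category, column category) pair, as a crosstab's cells do."""
    return Counter((gender, income_group(income)) for gender, income in records)


# Hypothetical (gender, income) records.
people = [("f", 21_000), ("m", 48_000), ("f", 52_000),
          ("m", 30_000), ("f", 26_000), ("m", 75_000)]

cells = crosstab(people)
bands = ["< $25,000", "$25,000 to $49,999", "$50,000 or more"]
print(f"{'':8}" + "".join(f"{b:>20}" for b in bands))
for gender in ("f", "m"):
    # A Counter returns 0 for combinations that never occur.
    print(f"{gender:8}" + "".join(f"{cells[(gender, b)]:>20}" for b in bands))
```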
Now that you’ve seen the various windows you’ll be using, we’ll move on to the techniques you’ll use in SPSS for managing your
data files.
Creating primary reference lists
There is one set of outputs you’ll create that is more important than anything else: the set of primary references. Primary
references describe your overall data set. In other words, how many cases are there in all? How many in each category? What are
the maximums and minimums? Means? Standard deviations? Here’s our rule: list out the summaries and put them somewhere you can
refer to them quickly.
You may not always want to print out all the details of your data set. For example, printing out every single income for a data set of
one million people would not be useful, economical, or nice to either your printer or the trees. So here are the basic rules: print
frequencies for categorical variables and descriptive (also called univariate) statistics for continuous variables.
In this exercise, we’ll use the sample Employee data.sav file.
Frequencies
1. If it’s not already open, navigate to the location where you saved the Employee data.sav file and open it.
2. From the menu, select File > New > Output.
3. From the menu, select Analyze > Descriptive Statistics > Frequencies.
4. Double-click Gender, Employment Category, and Minority Classification to
move them to the Variables list.
5. Make sure the Display frequency tables check box is selected.
6. Click Statistics.
7. Make sure all the check boxes are cleared (not checked).
8. Click Continue.
9. Click Charts.
10. If it is not already selected, select None by clicking it.
11. Click Continue.
12. Click OK.
13. From the menu, select File > Save As.
14. Navigate to the location where you are saving your files and save the file as AllFreqs.
Statistics
                 Gender   Employment Category   Minority Classification
N    Valid        474           474                     474
     Missing        0             0                       0

FREQUENCIES

Gender
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Female        216        45.6        45.6              45.6
        Male          258        54.4        54.4             100.0
        Total         474       100.0       100.0

Employment Category
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Clerical      363        76.6        76.6              76.6
        Custodial      27         5.7         5.7              82.3
        Manager        84        17.7        17.7             100.0
        Total         474       100.0       100.0

Minority Classification
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   No            370        78.1        78.1              78.1
        Yes           104        21.9        21.9             100.0
        Total         474       100.0       100.0
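The Percent, Valid Percent, and Cumulative Percent columns are simple arithmetic: Percent divides by all cases, Valid Percent divides by non-missing cases only, and Cumulative Percent is a running total of Valid Percent. The Python sketch below reproduces the Gender table from its counts; treating `None` as the missing-value code and sorting categories alphabetically are assumptions of the sketch, and `frequency_table` is a hypothetical helper.

```python
from collections import Counter


def frequency_table(values, missing=(None,)):
    """Rows of (category, frequency, percent, valid percent, cumulative percent)."""
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for category in sorted(counts):
        n = counts[category]
        valid_pct = 100 * n / len(valid)   # share of non-missing cases
        cumulative += valid_pct            # running total of valid percent
        rows.append((category, n, 100 * n / len(values), valid_pct, cumulative))
    return rows


# Rebuild the Gender column from the counts in the table above.
gender = ["Female"] * 216 + ["Male"] * 258
for category, n, pct, valid_pct, cum_pct in frequency_table(gender):
    print(f"{category:8}{n:6}{pct:8.1f}{valid_pct:8.1f}{cum_pct:8.1f}")
```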
Descriptive statistics: descriptives (univariate)
The next step is to print the descriptive or univariate statistics for the continuous variables.
1. From the menu, select File > New > Output.
2. From the menu, select Analyze > Descriptive Statistics > Descriptives.
3. Click Reset to clear any previous selections.
4. Double-click Current Salary, Beginning Salary, Months since Hire, and Previous
Experience to move them to the Variables list.
5. Click Options.
6. In the Descriptives: Options window, click Mean, Std. deviation, Variance,
Range, Minimum, Maximum, Kurtosis, and Skewness.
7. Click Continue.
8. Click OK.
9. When the resulting table is displayed, notice that the variables you selected are
listed as rows, while the statistics are listed in columns.
10. From the menu, select File > Save As.
11. Navigate to the location where you saved your files and save the file as AllDescriptives.
12. Notice that the statistic Range displays the difference (distance) between the minimum and maximum.
Descriptives

Descriptive Statistics
                               N    Range      Minimum   Maximum    Mean        Std. Deviation  Variance    Skewness            Kurtosis
                                                                                                            Statistic  Std. Error  Statistic  Std. Error
Current Salary                474  $119,250   $15,750   $135,000   $34,419.57  $17,075.661     2.916E8     2.125      .112        5.378      .224
Beginning Salary              474  $70,980    $9,000    $79,980    $17,016.09  $7,870.638      6.195E7     2.853      .112        12.390     .224
Months since Hire             474  35         63        98         81.11       10.061          101.223     -.053      .112        -1.153     .224
Previous Experience (months)  474  476        0         476        95.86       104.586         10938.281   1.510      .112        1.696      .224
Valid N (listwise)            474
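For reference, here is a Python sketch of how these statistics can be computed. It uses the bias-adjusted sample formulas for skewness and kurtosis (the same ones Excel's SKEW and KURT use), which we assume match SPSS's; the standard-error formulas depend only on N, and for N = 474 they reproduce the .112 and .224 shown in the table above. The function names are hypothetical.

```python
import math


def descriptives(xs):
    """Summary statistics for one continuous variable.

    Needs at least 4 values that are not all equal.
    """
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    m2 = sum(d ** 2 for d in devs) / n   # central moments with an n denominator
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    variance = m2 * n / (n - 1)          # sample variance (n - 1 denominator)
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return {
        "N": n,
        "Range": max(xs) - min(xs),
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Mean": mean,
        "Std. Deviation": math.sqrt(variance),
        "Variance": variance,
        # Adjusted Fisher-Pearson skewness and excess kurtosis.
        "Skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "Kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
    }


def se_skewness(n):
    """Standard error of skewness; depends only on N."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of kurtosis; depends only on N."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


print(descriptives([1, 2, 3, 4, 5]))
print(round(se_skewness(474), 3), round(se_kurtosis(474), 3))
```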
Measuring differences
Typically, differences in one or more continuous dependent variables based on differences in one or more categorical independent
variables are evaluated using a t-test or an analysis of variance. If we have only one continuous dependent variable and only one
categorical independent variable with no more than two values, the t-test can be used to look for differences. If we have more than
one continuous dependent variable, more than two values across the categorical independent variables, or both categorical and
continuous independent variables, we need to use an analysis of variance (ANOVA).
T-Tests
There are three types of t-tests:
An Independent Samples t-test is used when cases are randomly assigned to one of two groups. After a differential treatment has
been applied to the two groups, a measurement is taken which is related to the effect of the treatment. The t-test is calculated to
determine if any difference between the two groups is statistically significant.
A Paired Samples t-test can be used to evaluate differences between two groups whose members have been matched on one or more
characteristics, or to evaluate differences in before/after measures on the same person. If you want to use pre/post measures, make
sure the post-test is the same as the pre-test.
A One Sample t-test is used to evaluate whether the mean of a continuous dependent variable is different from zero. To test whether
the mean is different from some other value, subtract that value from each observation and then test whether the mean of the new
values is zero.
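The three t statistics can be sketched in a few lines of Python using only the standard library. This is an illustration, not SPSS's code: p-values are omitted because they need the t distribution (available in, for example, SciPy), and the function names and sample numbers are hypothetical. The independent-samples version returns both the pooled-variance (equal variances assumed) and Welch (equal variances not assumed) statistics.

```python
import math
from statistics import mean, stdev, variance


def independent_t(a, b):
    """Two independent groups. Returns (pooled t, pooled df, Welch t, Welch df)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)                # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t_pooled = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    qa, qb = va / na, vb / nb
    t_welch = (mean(a) - mean(b)) / math.sqrt(qa + qb)
    df_welch = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return t_pooled, na + nb - 2, t_welch, df_welch


def one_sample_t(xs, test_value=0.0):
    """Is the mean of xs different from test_value? Returns (t, df)."""
    se = stdev(xs) / math.sqrt(len(xs))
    return (mean(xs) - test_value) / se, len(xs) - 1


def paired_t(before, after):
    """Paired samples: a one-sample t-test of the differences against zero."""
    return one_sample_t([y - x for x, y in zip(before, after)])


# Hypothetical data.
print(independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))   # pooled and Welch
print(paired_t([10, 12, 14, 16], [11, 14, 15, 18]))
# Subtracting the test value first gives the same t as testing against it directly.
xs = [1, 2, 3, 4, 5]
print(one_sample_t(xs, 2), one_sample_t([x - 2 for x in xs]))
```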
In the independent samples t-test, the purpose of randomly assigning people to groups is to control for other differences between
people that might influence the effect we’re measuring. Using matched pairs has the same goal, but now the theory is that we have
matched the subjects on the concomitant variables.
What if you can’t randomly assign people to groups and you can’t match them? For example, what if you want to examine salary
differences based on gender? One thing you can do is assume that the only difference between people that could result in a
difference in income is gender. With that assumption in hand, we can do an independent samples t-test to see if there’s a gender
difference in income.
If you’re one of those folks who think that assumption doesn’t pass the “smell” test, you can go back to our earlier friend the
correlation, put gender in as a binary variable (where 1 is female and 0 is not female), and look at the correlation. This does not
mean that gender has caused the difference in salaries, but you can measure how well gender is associated with income level. More
sophisticated analyses could be done by using partial correlations or multiple regression to control for concomitant factors.
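Correlating a 0/1 variable with a continuous one (a point-biserial correlation) is closely tied to the independent-samples t-test: t = r * sqrt((n - 2) / (1 - r^2)) gives exactly the equal-variances t statistic. The Python sketch below checks that identity on made-up salary figures (in thousands of dollars); the data and function names are hypothetical.

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(a, b):
    """Independent-samples t statistic, equal variances assumed."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical salaries in $1000s.
female = [30, 32, 35, 31]
male = [36, 40, 38, 42]
is_female = [1] * len(female) + [0] * len(male)   # 1 = female, 0 = not female
salary = female + male

r = pearson_r(is_female, salary)
n = len(salary)
t_from_r = r * math.sqrt((n - 2) / (1 - r ** 2))
print(round(r, 3), round(t_from_r, 3), round(pooled_t(female, male), 3))
```

Because the two numbers always agree, the correlation approach and the t-test answer the same question; the correlation simply reports it as a strength of association.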
How to: In this exercise, you’ll split a file based upon a single variable and test the set of cases for each value of that variable as a
separate sample. First you’ll instruct SPSS to treat each group (in this case, the results from each machine) as a separate sample;
this procedure is called splitting the file. Then you’ll run a one-sample t-test to determine whether all the machines are meeting the
production specifications.
1. From the menu, select File > Open > Data.
2. Navigate to the folder where you downloaded the brakes.sav file.
3. From the menu, select Data > Split File to open the Split File window.
4. Click Compare Groups.
5. Select Machine Number and move it to the Groups Based on pane.
6. Click OK. Even though you haven’t created separate files, you have instructed
SPSS to treat each group within the file as if it were a separate sample, with a
group being defined by the machine number.
7. From the menu, select Analyze > Compare Means > One-Sample T Test. You’re going to test against a known value, which in this
case is the specified diameter of the disc brake.
8. Select Disc Brake Diameter and move it to the Test Variables pane.
9. Select the text in the Test Value field and type: 322
10. Click Options to set the confidence level.
11. Change the confidence interval percentage to 90.
12. Click Continue.
13. Click OK. The test results are displayed in the output window.
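* Open the data file (edit the path to point to your copy of Brakes.sav).
GET
  FILE='C:\Documents and Settings\pthomas\My Documents\Brakes.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
* Split File expects the cases to be sorted by the grouping variable.
SORT CASES BY machine.
* Compare groups: results for each machine are reported together (layered).
SPLIT FILE LAYERED BY machine.
* One-sample t-test of brake diameter against 322 mm with a 90% confidence interval.
T-TEST
  /TESTVAL=322
  /MISSING=ANALYSIS
  /VARIABLES=brake
  /CRITERIA=CI(.90).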
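Split File simply runs the same analysis once per group. A Python sketch of that idea, assuming hypothetical records with `machine` and `brake` fields (the real brakes.sav data are not reproduced here): the hypothetical `split_file` helper groups the cases in sorted order, and a one-sample t-test against 322 is run on each group.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def split_file(records, key):
    """Group records by the value of key, in sorted order (like SORT CASES + SPLIT FILE)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(sorted(groups.items()))


def one_sample_t(xs, test_value):
    """Return (t, df) for a one-sample t-test against test_value."""
    se = stdev(xs) / math.sqrt(len(xs))
    return (mean(xs) - test_value) / se, len(xs) - 1


# Hypothetical cases: three brake diameters (mm) from each of two machines.
cases = [
    {"machine": 1, "brake": 321.99}, {"machine": 1, "brake": 322.00},
    {"machine": 1, "brake": 322.01}, {"machine": 2, "brake": 322.01},
    {"machine": 2, "brake": 322.02}, {"machine": 2, "brake": 322.03},
]

results = {}
for machine, group in split_file(cases, "machine").items():
    results[machine] = one_sample_t([rec["brake"] for rec in group], 322)
    print(machine, results[machine])
```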
One-Sample Statistics
Machine Number                             N    Mean         Std. Deviation   Std. Error Mean
1               Disc Brake Diameter (mm)   16   321.998514   .0111568         .0027892
2               Disc Brake Diameter (mm)   16   322.014263   .0106913         .0026728
3               Disc Brake Diameter (mm)   16   321.998283   .0104812         .0026203
4               Disc Brake Diameter (mm)   16   321.995435   .0069883         .0017471
5               Disc Brake Diameter (mm)   16   322.004249   .0092022         .0023005
6               Disc Brake Diameter (mm)   16   322.002452   .0086440         .0021610
7               Disc Brake Diameter (mm)   16   322.006181   .0093303         .0023326
8               Disc Brake Diameter (mm)   16   321.996699   .0077085         .0019271
One-Sample Test
Test Value = 322
                                                                                         90% Confidence Interval
                                                                                         of the Difference
Machine Number                            t        df   Sig. (2-tailed)   Mean Difference   Lower      Upper
1               Disc Brake Diameter (mm)  -.533    15   .602              -.0014858         -.006375   .003404
2               Disc Brake Diameter (mm)  5.336    15   .000              .0142629          .009577    .018948
3               Disc Brake Diameter (mm)  -.655    15   .522              -.0017174         -.006311   .002876
4               Disc Brake Diameter (mm)  -2.613   15   .020              -.0045649         -.007628   -.001502
5               Disc Brake Diameter (mm)  1.847    15   .085              .0042486          .000216    .008282
6               Disc Brake Diameter (mm)  1.134    15   .274              .0024516          -.001337   .006240
7               Disc Brake Diameter (mm)  2.650    15   .018              .0061813          .002092    .010270
8               Disc Brake Diameter (mm)  -1.713   15   .107              -.0033014         -.006680   .000077
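Every column of the One-Sample Test table follows from the One-Sample Statistics table: the mean difference is the mean minus 322, t is the mean difference divided by the standard error of the mean, and the 90% interval is the mean difference plus or minus t(0.95, 15) times the standard error. The Python sketch below recomputes them; the critical value 1.753 is taken from a standard t table rather than from the SPSS output, and `one_sample_from_summary` is a hypothetical helper.

```python
# (mean, std. error of mean) for each machine, from the One-Sample Statistics table.
SUMMARY = {
    1: (321.998514, .0027892), 2: (322.014263, .0026728),
    3: (321.998283, .0026203), 4: (321.995435, .0017471),
    5: (322.004249, .0023005), 6: (322.002452, .0021610),
    7: (322.006181, .0023326), 8: (321.996699, .0019271),
}
TEST_VALUE = 322
T_CRIT_90_DF15 = 1.753  # two-sided 90% critical value for 15 df (t table)


def one_sample_from_summary(mean, se, test_value=TEST_VALUE, t_crit=T_CRIT_90_DF15):
    """Return (t, mean difference, CI lower, CI upper)."""
    diff = mean - test_value
    return diff / se, diff, diff - t_crit * se, diff + t_crit * se


off_spec = []
for machine, (mean, se) in SUMMARY.items():
    t, diff, lower, upper = one_sample_from_summary(mean, se)
    print(f"{machine}  t={t:7.3f}  diff={diff:+.7f}  90% CI=({lower:+.6f}, {upper:+.6f})")
    if lower > 0 or upper < 0:          # interval excludes zero
        off_spec.append(machine)
print("Intervals excluding 0:", off_spec)
```

The machines whose interval excludes zero (2, 4, 5, and 7) are exactly those whose Sig. (2-tailed) value in the table is below .10.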
Analysis of Variance
If we have more than one dependent continuous variable or more than two values across the categorical independent variables or
we have both categorical and continuous independent variables, we need to use an analysis of variance to measure differences.
There are a number of particularly useful special cases for the general analysis of variance model.
If we have only one continuous dependent variable and one categorical independent variable, we can use a One-Way Analysis of
Variance or One-Way ANOVA. If we have only one continuous dependent variable but more than one categorical independent
variable, we can use a general Univariate Analysis of Variance or Univariate ANOVA. If we also have one or more continuous
independent variables, we can use the Univariate Analysis of Covariance or Univariate ANCOVA.
More advanced models are available for more than one dependent continuous variable.
If the model has one or more independent categorical variables, we would use a Multivariate Analysis of Variance or MANOVA. If the
model also included one or more continuous independent variables, we would use a Multivariate Analysis of Covariance or
MANCOVA.
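The selection rules in this section and the previous one can be summarized as a small decision function. This Python sketch only restates the text; the function name and argument names are hypothetical.

```python
def choose_analysis(n_dependent, factor_levels, n_covariates=0):
    """Name the analysis described in the text.

    n_dependent:   number of continuous dependent variables
    factor_levels: number of categories of each categorical independent variable
    n_covariates:  number of continuous independent variables
    """
    if n_dependent < 1 or not (factor_levels or n_covariates):
        raise ValueError("need a continuous dependent variable and at least one predictor")
    if n_dependent > 1:
        return "MANCOVA" if n_covariates else "MANOVA"
    if n_covariates:
        return "Univariate ANCOVA"
    if len(factor_levels) == 1:
        return "t-test" if factor_levels[0] <= 2 else "One-Way ANOVA"
    return "Univariate ANOVA"


# One score and one three-level factor, as in the training exercise.
print(choose_analysis(1, [3]))  # One-Way ANOVA
```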
One-Way ANOVA
One-way analysis of variance is an extension of the t-test. In a t-test you have two groups, one that received a treatment and one
that did not, while in a one-way analysis of variance you have more than two groups, where the groups received different variations
of the same treatment. For example, the treatment factor could be whether you fry donuts in vegetable oil, butter, or lard.
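Using the donut example, here is a minimal Python sketch of the one-way ANOVA computation: the F ratio compares the variation between group means with the variation within groups. The taste scores and the `one_way_anova` name are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F) for a list of groups."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


# Hypothetical taste scores for donuts fried in three fats.
scores = {
    "vegetable oil": [6, 7, 8, 7],
    "butter": [8, 9, 9, 10],
    "lard": [5, 6, 5, 6],
}
print(one_way_anova(list(scores.values())))
```

With exactly two groups, F equals the square of the equal-variances t statistic, which is why one-way ANOVA can be described as an extension of the t-test.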
How to: In this exercise, you’re going to identify the most effective number of training hours to achieve a given result, with results
scored between zero and 100, 100 being the optimal score.
1. From the menu, select File > Open > Data. Navigate to the folder where you saved the Training.sav file.
2. First you’ll graph the means and standard errors, so select Graphs > Legacy Dialogs > Error Bar.
3. You’ll use the default choices (Simple, Summaries for groups of cases), so click Define.
4. Select Skills Training group and move it to the Category Axis field.
5. Select Score on training exam and move it to the Variable field.
6. In the Bars represent field, open the drop-down list and select Standard error of mean.
7. Click OK. The results are displayed in the output window.
Note that the three bars do not show equal variance among the three groups; instead, variance decreases as days of training
increase. These data may therefore not be appropriate for ANOVA, which assumes equal variances. In the next steps, you’ll run the
ANOVA together with a homogeneity-of-variance test to check this.
8. From the menu, select Analyze > Compare Means > One-Way ANOVA.
9. Move Score on training exam to the Dependent List.
10. Move Skills training group to the Factor list.
11. Click Options.
12. Select Descriptive and Homogeneity of variance test by clicking their check
boxes.
13. Click Continue.
14. Click OK. The results appear in the output window.
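Step 12 asks for Levene's test of homogeneity of variance. In its classic form, Levene's statistic is simply a one-way ANOVA F computed on each case's absolute deviation from its own group mean, so a large value (small Sig.) means the group spreads differ. A minimal Python sketch of that mean-based form, on hypothetical numbers (`levene` is a hypothetical helper name):

```python
from statistics import mean


def levene(groups):
    """Mean-based Levene statistic W with its degrees of freedom (df1, df2)."""
    group_means = [mean(g) for g in groups]
    # Absolute deviation of every case from its own group's mean.
    z = [[abs(x - m) for x in g] for g, m in zip(groups, group_means)]
    all_z = [d for zg in z for d in zg]
    grand = mean(all_z)
    z_means = [mean(zg) for zg in z]
    between = sum(len(zg) * (m - grand) ** 2 for zg, m in zip(z, z_means))
    within = sum((d - m) ** 2 for zg, m in zip(z, z_means) for d in zg)
    df1, df2 = len(groups) - 1, len(all_z) - len(groups)
    return (between / df1) / (within / df2), df1, df2


# Hypothetical scores: two groups with very different spreads.
print(levene([[1, 2, 3], [0, 5, 10]]))
```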
Descriptives
Score on training exam
                                                      95% Confidence Interval for Mean
        N    Mean      Std. Deviation   Std. Error    Lower Bound   Upper Bound   Minimum   Maximum
1       20   63.5798   13.50858         3.02061       57.2576       69.9020       32.68     86.66
2       20   73.5677   10.60901         2.37225       68.6025       78.5328       47.56     89.65
3       20   79.2792   4.40754          .98556        77.2165       81.3420       71.77     89.69
Total   60   72.1422   12.00312         1.54960       69.0415       75.2430       32.68     89.69

Test of Homogeneity of Variances
Score on training exam
Levene Statistic   df1   df2   Sig.
4.637              2     57    .014

ANOVA
Score on training exam
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   2525.691         2    1262.846      12.048   .000
Within Groups    5974.724         57   104.820
Total            8500.415         59
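The Std. Error, confidence-interval, and ANOVA columns can all be recomputed from the group sizes, means, and standard deviations in the Descriptives table, which is a useful check on your reading of the output. In the Python sketch below, the critical value 2.093 for 19 degrees of freedom comes from a standard t table (SPSS does not print it), the helper names are hypothetical, and small differences from the SPSS figures are due to rounding of the inputs.

```python
import math

# Group sizes, means and standard deviations from the Descriptives table.
NS = [20, 20, 20]
MEANS = [63.5798, 73.5677, 79.2792]
SDS = [13.50858, 10.60901, 4.40754]
T_CRIT_95_DF19 = 2.093  # two-sided 95% critical value for 19 df (t table)


def mean_ci(n, mean, sd, t_crit):
    """Standard error of the mean and the interval mean +/- t_crit * SE."""
    se = sd / math.sqrt(n)
    return se, mean - t_crit * se, mean + t_crit * se


def anova_from_summary(ns, means, sds):
    """One-way ANOVA sums of squares, degrees of freedom and F from group summaries."""
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = len(ns) - 1, n_total - len(ns)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


for n, m, s in zip(NS, MEANS, SDS):
    print(["%.4f" % v for v in mean_ci(n, m, s, T_CRIT_95_DF19)])
print(anova_from_summary(NS, MEANS, SDS))
```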