Mgt 486/Wm Ross - McGrath Research Group

MGT/William Ross
Word file: c:eeo~spss~Spring~2015
For Data file (SPSS for Windows 22)
See: Vt2015486.sav
Exercise last revised: Feb. 13, 2015
PART III OF HRIS PROJECT:
EQUAL EMPLOYMENT OPPORTUNITY IN SELECTION &
DIFFERENCES IN VALIDITY
Recall from Part II of this project that Metro Hospital is planning to expand its Call Center
Representatives (CCR) staff. You wonder if your new selection system is fair to Asian-American
applicants. Executives are eager to ensure that the selection method(s) used in your validity/decision
accuracy exercise are fair to all groups. Therefore, you want to compare Whites (“Mainstream”) and
Asian-Americans to ensure that the selection methods are fair to both groups.
You will need to remember what predictor(s) you chose for Part II and their regression equation as you
complete Part III. You will use the same data file as Part II. The same housekeeping information from Part
II (e.g., how to get a listing of your variables, how to print your output, etc.) also applies to this exercise.
Recall that you had the following predictor choices, scaled out of 100 (be sure to name them in the intro to
this third report – sometimes students only recommend a subset of the possible predictors in Part II, but in
Part III, they forget to name the ones they recommended, leaving me to guess what is being used!):
(a) X1: Customer Service Aptitude Profile (CSAP) test.
(b) X2: A Weighted Application Blank (“Biodata”) Inventory of Service Orientation (SO).
(c) X3: A five-minute Typing Skill Test (TST) score, administered via either PC or mobile app.
In your literature review for Part III, you will want to examine whether the above selection methods show
adverse impact and are fair to minority groups (assuming you can find relevant literature on those topics).
Your data set also has information about the following categories:
Variable Name    Value Code               % in surrounding 3-mile area
AGE              1=Under 40               70
                 2=40 or Older            30
SEX              1=Ladies                 50
                 2=Gentlemen              50
RACE             1=Asian-American         35
                 2=White (Mainstream)     65
In your literature review, you should discuss discrimination among these demographic groups in hiring.
Part “A”: Underutilization Analysis (Geographic Adverse Impact Analysis)
There are many ways to assess adverse impact. One way is to look at Geographic Area Comparisons. With
this approach, the percentage of workers from each ethnic/racial/age/ability group hired by the firm in a
particular job is compared to that group's percentage in the surrounding area. Note: For this
particular analysis, use just the subset of people who will be “hired” (e.g., n=14) with your new system and
NOT the larger set (e.g., N=70 or whatever your total sample size is).
In SPSS, move your cursor to the “toolbar” (horizontal menu) at the top of the screen. Click on
Data…
Select Cases…
Select those 14 people who were Hired under your new system in Part II.
[If you did not save your data set with that information (“hired” vs. “not hired”),
you will need to re-compute that information from Part II (“Validity”) around
pages 21-22.]
Analyze... (or “Statistics” in some versions of SPSS…)
Summarize… (or “Descriptive Statistics”…)
Frequencies...
Suppose you want to compare Asian-Americans and Whites. First, get a frequencies output for the
RACE/ETHNICITY variable. For this hypothetical example, the race variable will refer to whether the
job candidate is Asian-American (1) or White (2). In the Frequencies dialog box, put RACE in the
“variable” box, then click [OK].
Next, using a hand-held calculator, please compare each group's percentage (from your SPSS output) to the
percentage in the surrounding three-mile area (see the previous page). So you will compute two
percentages – one for those in the group = 1 and another for those in the group = 2 category.
If you take the percentage of, say, Asian-American employees in the firm in this particular job (obtained
from your frequency printout) and then divide it by the percentage of Asian-American workers in the
surrounding area, what is the resulting ratio? Is it less than .80? If so, then Asian-American workers are
underutilized. If it is .80 or greater, then they are not underutilized. Next, do this analysis for the White
group. Is either group underutilized by Metro in this position?
Now repeat this underutilization analysis for males and females. Is either group underutilized by Metro?
Finally, repeat this underutilization analysis for older and younger employees. Is either group
underutilized by Metro Hospital for this position?
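If you would like to double-check your calculator work, the underutilization ratio can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the hire counts (3 Asian-American and 11 White hires out of 14) are hypothetical, the function name is made up for this sketch, and the area percentages (35 and 65) are the RACE figures listed earlier in this exercise.

```python
def utilization_ratio(pct_hired: float, pct_area: float) -> float:
    """Group's percentage among those hired divided by its percentage in the area."""
    if pct_area <= 0:
        raise ValueError("area percentage must be positive")
    return pct_hired / pct_area


# Hypothetical counts among the 14 people "hired" (your SPSS output will differ).
hired_counts = {"Asian-American": 3, "White": 11}
# Percentages in the surrounding three-mile area for RACE.
area_pct = {"Asian-American": 35.0, "White": 65.0}

total_hired = sum(hired_counts.values())
results = {}
for group, count in hired_counts.items():
    pct_hired = 100.0 * count / total_hired
    ratio = utilization_ratio(pct_hired, area_pct[group])
    # A ratio below .80 means the group is underutilized.
    results[group] = (round(ratio, 2), ratio < 0.80)
    print(f"{group}: {pct_hired:.1f}% hired, ratio = {ratio:.2f}, "
          f"underutilized = {ratio < 0.80}")
```

The same function can be reused for the SEX and AGE comparisons by swapping in those counts and area percentages.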
An aside: Another way to assess adverse impact is to use the "Flow Analysis: 80% selection ratio rule"
(described later, when you compute decision accuracy statistics for each group). When you write your
report, you will want to consider whether the “underutilization results” (geographic analysis results)
correspond with or differ from the “selection ratio” (flow analysis) adverse impact statistics.
Don’t forget to include your results for the above analyses in your report! (It is easy to forget them
because these results were computed with a calculator and you won’t have SPSS printouts for them.)
Reviewing Some Concepts
Before you continue this exercise, you may want to ask yourself: What makes a selection system “fair”?
You may want to consult the readings for this course before beginning this part of the project. Part III of the
project gives results that can be used with several models of test fairness. These models are reviewed
briefly here, using a variety of groups as examples (some of these groups are the same as those from your
project, some aren’t – the purpose is to illustrate the models of fairness):
One definition of “unfairness” is that the validity (e.g., the correlation coefficient between a predictor
measure and performance appraisal score) is significantly different from zero for one group but is not
significantly different from zero for the other group. This is called Single-Group Validity. For example, if
Predictor X had a validity of .60 (probability < .05 that a correlation that large would occur by chance alone)
for Anglo applicants, but only .17 (not significantly different from zero) for
Hispanic applicants, then you could conclude that Predictor X should only be used for Anglo applicants and
not for Hispanic applicants. Alternatively, you might conclude that you should look for a different predictor
that was valid for both groups. You hope to find validity for both groups; you hope you don’t find Single-Group Validity.
Caution: The Single-Group Validity test is strongly influenced by sample size. The smaller the sample
size, the larger the correlation (regression) value that is needed to be statistically significant. This is noted
by Arvey & Faley in their book, Fairness in Selecting Employees.
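To see how strongly sample size matters, here is a rough Python sketch of the smallest correlation that would reach two-tailed significance at p < .05 for several sample sizes. Please treat it as an approximation only: it uses a normal approximation based on Fisher's transformation rather than the exact test SPSS reports, and the function name and the sample sizes are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def approx_critical_r(n: int, alpha: float = 0.05) -> float:
    """Approximate smallest |r| that is significant (two-tailed) with n cases.

    Normal approximation via Fisher's transformation; not the exact t-based value.
    """
    if n <= 3:
        raise ValueError("need more than 3 cases")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    return math.tanh(z_crit / math.sqrt(n - 3))


for n in (10, 25, 50, 100):
    print(f"n = {n:3d}: r must be at least about {approx_critical_r(n):.2f}")
```

The required correlation shrinks as n grows, which is why a small subgroup can "fail" the Single-Group Validity test even when its correlation looks respectable.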
Another definition is Differential Validity. Here, you'd compare validity (regression) coefficients for the
two groups. For example, suppose the validity for Men was .70 and the validity for Women was only .30.
Although both of those might be different from zero, the predictor was significantly better (i.e., more
accurate) at predicting job success for Men than for Women. It is fair to use for Men only.
A third definition of fairness is the Regression (Cleary) Model. Here, you compare the regression lines for
the two groups. If, for example, the regression line for Younger employees (under age 40) is:
Predicted Y = 12.05 + .69*X1 + .25*X2 + .10*X3
and the regression line for Older employees (those age 40 or older) is:
Predicted Y = 15.09 + .20*X1 + .45*X2 + .70*X3
then you have different lines of prediction for the two groups. Thus, you should use separate regression
lines. Why? The weights are different! If the weights had been similar, you might have suggested using
only one common regression line for the entire sample. {Note: These are all hypothetical results – your
results will differ.}
It is often instructive to look at a graph of the regression lines along with the common (overall sample)
regression line. Sometimes you can readily tell from the graph whether the lines appear to be quite different
from each other. If they are different, you should use separate lines for each group.
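To make the Regression (Cleary) Model concrete, the short Python sketch below plugs the same set of predictor scores into the two hypothetical regression lines given above for Younger and Older employees. The predictor scores (50 on each test) and the function names are assumptions for illustration; the weights are the hypothetical ones from this section, not results from your data.

```python
def predict_younger(x1: float, x2: float, x3: float) -> float:
    """Hypothetical regression line for Younger employees (under age 40)."""
    return 12.05 + 0.69 * x1 + 0.25 * x2 + 0.10 * x3


def predict_older(x1: float, x2: float, x3: float) -> float:
    """Hypothetical regression line for Older employees."""
    return 15.09 + 0.20 * x1 + 0.45 * x2 + 0.70 * x3


# Assumed scores: the same applicant profile evaluated with each group's line.
scores = (50, 50, 50)
y_younger = predict_younger(*scores)
y_older = predict_older(*scores)

print(f"Predicted Y using the Younger line: {y_younger:.2f}")
print(f"Predicted Y using the Older line:   {y_older:.2f}")
print(f"Difference: {abs(y_older - y_younger):.2f}")
```

When identical scores lead to noticeably different predictions, a single common line would over-predict for one group and under-predict for the other, which is the concern this model addresses.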
By the time you finish this exercise, you should be able to:
a. perform single-group validity analysis for two groups (Asian-Americans and Whites).
b. perform differential validity analysis for these same two demographic groups of workers.
c. test the regression model of fairness for these same two demographic groups of workers.
d. compute decision accuracy statistics for these same two demographic groups of workers.
SELECTING DIFFERENT GROUPS OF WORKERS
For most of the subsequent parts of the exercise, you will need to analyze each group separately out of the
full data set (e.g., N = 70). Thus, it is useful to remember how to select the cases from a particular group.
Rather than repeat these instructions throughout the exercise, I will put them here for your reference, using
the first group as an example.
Selecting the First Group of Workers:
For this part of the exercise, you will select the Asian-American workers' records from the full sample.
You will use these to determine whether the selection system is valid for this specific group.
Please click on the Data menu:
File   Edit   View   Data   Transform   Statistics   Graphs   Utilities   Windows   Help
From this menu, click on Select Cases to see the "Select Cases" Dialogue Box. This box consists of a list
of variables in the left column and a list of selection options in the right column.
Click on the circle to the left of the If condition is satisfied option.
Next, click on the [if...] box.
This brings you to the "Select Cases: If" Dialogue Box, containing your list of variables in the left column
and room for your "select if" equation in the right column.
Use the down-arrow key or mouse to highlight the RACE or ETHNICITY variable, then click on the [>]
box to put this variable in the equation box. Next click on the [=] key. Then type 1. You should see an
equation similar to the following:
RACE or ETHNICITY = 1
What does this mean? It means you are going to first investigate the cases who are Asian-American.
{Quick, ask yourself: how would you write an equation to select cases from the other group? Don't write
such an equation now; I just want you to think about how you would do it because you will do it later. If
you said you'd write "RACE [or ETHNICITY] =” then you are on the right track! Just plug in the correct
number!}.
Click [ continue ]
Click [ OK ]
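If it helps to see the logic behind "Select Cases," here is a small Python sketch of the same idea: keep only the records where RACE equals 1. It is an analogy, not SPSS syntax. The records, their values, and the function name are all made up for illustration.

```python
# Made-up records standing in for rows of the data file (RACE: 1 or 2).
records = [
    {"id": 1, "RACE": 1, "y": 72},
    {"id": 2, "RACE": 2, "y": 65},
    {"id": 3, "RACE": 1, "y": 80},
    {"id": 4, "RACE": 2, "y": 58},
]


def select_cases(rows, variable, value):
    """Return only the rows where the given variable equals the given value."""
    return [row for row in rows if row[variable] == value]


group_1 = select_cases(records, "RACE", 1)  # the "RACE = 1" equation
print([row["id"] for row in group_1])
```

Selecting the other group only requires changing the value in the final argument, just as you would change the number in the SPSS equation.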
SECTION B: SINGLE-GROUP VALIDITY MODEL
Is the model that you recommended in Part II valid for each subgroup?
You want to know whether the model you recommended in Part II is valid for each group.
Regression Analysis for Asian-American Workers
You need to perform a regression analysis for this group of workers. You will use the predictor(s) (a.k.a.
“tests” or “selection measures”) that you chose as your final recommendation for Part II of the project.
Ideally, you saved the Part II data set that contains y-predict (if not, you’ll need to
quickly redo that part of Part II, using the equation you identified there, to create a y-predict
variable). If you recommended only one predictor, then use that (X) predictor for this subgroup. If you
recommended more than one predictor, then you will use y-predict for this subgroup.
The goal of this analysis is to see if the predictor(s) you recommended for everyone (in PART II) also
works well for JUST THIS PARTICULAR GROUP of workers.
Please click on the Analyze (or Statistics) menu:
File   Edit   Data   Transform   Analyze (or Statistics)   Graphs   Utilities   Windows   Help
Select Regression from the following menu (older versions of SPSS names are in parentheses):
Reports
Descriptive Statistics (or Summarize)
Compare Means
General Linear Model (ANOVA Models)
Correlate
Regression
Loglinear
Classify
Data Reduction
Scale
Nonparametric Tests
Survival
Multiple Response
There are different types of regressions, so please select Linear...
This opens the "Linear Regression" Dialogue Box:
Variables:
  id
  x1 (or the name you gave it)
  x2 (or the name you gave it)
  x3 (or the name you gave it)
  y
  :
  :
[>] Dependent Variable:       [            ]
[ Previous ]  Block 1 of 1  [ Next ]
[>] Independent Variable(s):  [            ]
Method:                       [            ]
[ WLS>> ]
[ OK ]  [ Paste ]  [ Reset ]  [ Cancel ]  [ Help ]
[ Statistics ]  [ Plot ]  [ Save ]  [ Options ]
***If you recommended using ONE predictor when you completed Part II of this project, please complete
the operations below. If you recommended using MORE THAN ONE predictor when you completed Part
II, please skip to page 9 (the section that begins “If you recommended MORE THAN ONE predictor”).***
If you recommended ONLY ONE predictor in Part II...
Select the first of the demographic subgroups using the Data… Select Cases… dropdown menus.
You want to predict y. So y will be your dependent variable. Use the mouse to move the cursor over the y
variable in the variable list. Click on the [>] key to put y in the Dependent Variable Box. Next, put your
chosen x? (or whatever name you gave your chosen variable) in the Independent Variables box for your
first analysis. You should see the following:
Variables:
  id
  x1 (or the name you gave it)
  x2 (or the name you gave it)
  x3 (or the name you gave it)
  y
  :
  :
[>] Dependent Variable:       [ y          ]
[ Previous ]  Block 1 of 1  [ Next ]
[>] Independent Variable(s):  [ x?         ]
Method:                       [ Enter      ]
[ WLS>> ]
[ OK ]  [ Paste ]  [ Reset ]  [ Cancel ]  [ Help ]
[ Statistics ]  [ Plot ]  [ Save ]  [ Options ]
Next, click on the [ Statistics ] button. From this, select [ ] Estimates, [ ] Descriptives, and [ ] Model Fit
then click on [Continue].
You are now ready to run the regression for your predictor. Click on [ OK ]. You will see a printout with
information like this (only with different numbers for your printout):
What it is: Multiple R   .70
What it Means: This is the same as a correlation coefficient when there is only one predictor. It is
properly written as "rxy" and varies from .00 (no relationship) to 1.00 (a perfect linear relationship).

What it is: R Squared   .49
What it Means: This means that x is explaining 49% of the variation in y. If the predictor was perfectly
related to y, it would explain 100%.

What it is: Adjusted R Square   .47
What it Means: This says "if you were to test this predictor on a second sample, what is the estimate of
the R Square?" It is usually a bit lower than the actual R square reported on the previous line because
SPSS assumes that you probably capitalized on chance covariation.

What it is: Standard Error   1.09
What it Means: This is like a standard deviation, only it is for the regression line. Imagine a regression
line (line that best fits the data) with two parallel lines near it (one above and one below the regression
line) that are 1.09 y-units away. Such lines would tell you where most scores will fall if you plot x
against y. The larger the Standard Error, the less confidence we have in our ability to predict.
Analysis of Variance:
F = 17.92     Sig. F = .001
This tells us that there is, indeed, a relationship between x and y. The probability
(significance or "sig.") that such a large F statistic would emerge by
chance is very small -- only one time in one-thousand (.001). If it is not
due to chance, it must be a real effect. Here, it tells us that the correlation
(r) is different from zero. So in this example there is a significant
correlation: x predicts job performance (y). Note that the T value in the
equation below also tells you the same thing for x. When there is more
than one predictor, the F-statistic tells you whether the overall equation
predicts y, and the T-statistics tell you whether each individual component
is significant (and belongs in the equation).
--------------------------- Variables in the Equation ---------------------------
Variable Name    B       SE B    Beta    T       Sig. T
X                1.14    .11     .70     8.96    .0001
(Constant)       3.14    .04             99      .0001
What does this mean? It means that the equation for the regression line for Group #1 is:
predicted y = 1.14 (x) + 3.14
1.14 is the "B" value for x and represents the weight multiplied by x. It is the slope of the line.
3.14 represents the intercept of the line. It is the "B" value for the intercept (which is also called the constant).
If a job applicant from “Group #1,” that is, an Asian-American job applicant, took the x test, we would
predict his performance appraisal score using this regression equation. If we had several applicants and one
opening, we could give them all the test, use the equation to calculate predicted performance, and then hire
the one with the best predicted performance appraisal score.
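The "give them all the test, then hire the best predicted score" idea can be sketched in Python as follows. The slope and intercept are the hypothetical values from the sample printout above; the applicant names and test scores are invented for illustration, and your own equation will have different numbers.

```python
def predicted_y(x: float, slope: float = 1.14, intercept: float = 3.14) -> float:
    """Predicted performance appraisal score from the sample regression line."""
    return slope * x + intercept


# Invented applicants and their scores on the x test.
applicants = {"Applicant A": 62, "Applicant B": 75, "Applicant C": 68}

# Rank applicants from highest to lowest predicted performance.
ranked = sorted(applicants, key=lambda name: predicted_y(applicants[name]),
                reverse=True)
best = ranked[0]

for name in ranked:
    print(f"{name}: test score {applicants[name]}, "
          f"predicted y = {predicted_y(applicants[name]):.2f}")
print(f"With one opening, hire: {best}")
```

Because the slope here is positive, the ranking on predicted y matches the ranking on the test itself; with several predictors combined into y-predict, the two rankings could differ.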
Note that your printouts will look somewhat different from what is shown above in their format.
Specifically, they will show specific models (e.g. Model 1 might contain X3, and Model 2 might contain,
say X1 and X3) in a “box tables” format. See the instructions for Part II if you don’t recall how to interpret
such printouts.
Now, look at your data. How large is your R? Is it different from zero? Does it explain a large percentage
of the variation in y? What is your regression equation?
Now, repeat this procedure for White applicants. Don't forget to first use Data... Select Cases... to select
just the White cases (RACE/ETHNICITY = 2).
In order to know whether you have validity for each group, look at the correlation (the r-value) and
accompanying p values for the Asian-American and the White applicants. To determine whether your test
is fair using this model, answer the following questions:
Is the r for Asian-Americans valid -- different from zero (p < .05)?
•If "yes" then you can use the selection method (test) with applicants from this group.
•If "no" then it is not fair to use the method with applicants from this group.
Is the r for Whites valid -- different from zero (p < .05)?
•If "yes" then you can use the selection method (test) with applicants from this group.
•If "no" then it is not fair to use the method with applicants from this group.
Are both correlations valid? That is, is the validity different from zero for BOTH groups?
•If your answer is "yes," then it is fair to use with both groups.
•If "it is only valid for one group" then you have Single-Group Validity and the
test can only be used with that particular group.
•If "it is not valid for either group" then you have a poor selection system and
it should not be used at all. Tell that test salesman to give you your money back!
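The decision rules above boil down to a small piece of logic, sketched here in Python. The function name, the wording of the returned labels, and the example p values are assumptions for illustration; you would plug in the p values from your own two regression printouts.

```python
def single_group_validity(p_group1: float, p_group2: float,
                          alpha: float = 0.05) -> str:
    """Classify the result from the two subgroups' significance levels."""
    valid1 = p_group1 < alpha  # is group 1's validity different from zero?
    valid2 = p_group2 < alpha  # is group 2's validity different from zero?
    if valid1 and valid2:
        return "valid for both groups"
    if valid1 or valid2:
        return "single-group validity"
    return "not valid for either group"


# Example p values (made up): significant for one group only.
print(single_group_validity(0.30, 0.001))
```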
***After completing these analyses, please skip ahead to page 11 (Section C: Do you have “Differential Validity”?).***
If you recommended MORE THAN ONE predictor at the end of your PART II report...
Select one of the demographic subgroups (Asian-Americans) using the Data… Select Cases… dropdown
menus.
Click on Analyze… Regression…
In the "Linear Regression" Dialogue Box, you will see the following:
Variables:
  id
  x1 (or the variable name)
  x2 (or the variable name)
  x3 (or the variable name)
  y
  :
  :
  :
  ypredict
[>] Dependent Variable:       [            ]
[ Previous ]  Block 1 of 1  [ Next ]
[>] Independent Variable(s):  [            ]
Method:                       [ Enter      ]
[ WLS>> ]
[ OK ]  [ Paste ]  [ Reset ]  [ Cancel ]  [ Help ]
[ Statistics ]  [ Plot ]  [ Save ]  [ Options ]
Highlight Y and click on the [>] to move it into the Dependent Variable box.
Select y-predict as your Independent Variable; there will only be one “Block” in this analysis. Next, click
on the [ Statistics ] button. From this, select [x] Estimates, [x] Descriptive statistics, and
[x] Model Fit, then click on [Continue]. You are now ready to run the regression for your predictor. Click
on [ OK ].
You will see a printout containing this type of information (only with different numbers for your printout):
What it is: Multiple R   .70
What it Means: This is like a correlation coefficient and varies from .00 (no relationship) to 1.00
(a perfect linear relationship).

What it is: R Squared   .49
What it Means: This means that y-predict is explaining 49% of the variation in y. If the predictor was
perfectly related to y, it would explain 100%.

What it is: Adjusted R Square   .47
What it Means: This says "if you were to test this predictor on a second sample, what is the estimate of
the R Square?"

What it is: Standard Error   1.09
What it Means: This is like a standard deviation, only it is for the regression line. Imagine a regression
line (line that best fits the data) with two parallel lines near it (one above and one below the regression
line) that are 1.09 y-units away. Such lines would tell you where most scores will fall if you plot
y-predict against y. You hope the Standard Error is small.
Analysis of Variance:
F = 17.92     Sig. F = .001
This tells us that there is, indeed, a significant relationship between y and y-predict.
The probability (significance or "sig.") is < .05. The T-value below tells
you the same thing about your predictor (y-predict).

                   UNSTANDARDIZED      STANDARDIZED
MODEL              B        SE B       BETA        T       SIG. T
1   y-predict      1.14     .11        .70         8.96    .0001
    (constant)     3.14     .04
---------------------------------------------------------------------------------
Footnotes:
Predictor(s) [in Model 1] is/are: y-predict
Dependent Variable = Overall job performance measure (Y)
R            .40
R-squared    16%
What does this mean? It means that the equation for the regression line in the model, for this group of
applicants, is:
predicted y = 1.14 (ypredict) + 3.14
1.14 is the "B" value for y-predict and represents the weight multiplied by y-predict. It is the slope of the line.
3.14 represents the intercept of the line. It is the "B" value for the intercept (which is also
called the constant). It can be positive or negative.
Now, look at your data. How large is your R? Is it different from zero? Does it explain a large percentage
of the variation in y? What is your regression equation?
Now, repeat this entire procedure for White applicants. Don't forget to first use Data... Select Cases...
to select just the White cases (RACE/ETHNIC Group = 2).
In order to know whether you have validity for each group, look at the R value and accompanying p values
for the Asian-American and the White applicants. To determine whether your test is fair using this model,
answer the following questions:
Is the R for Asian-Americans valid -- different from zero (p < .05)?
•If "yes" then you can use the selection method (test) with applicants from this group.
•If "no" then it is not fair to use the method with applicants from this group.
Is the R for Whites valid -- different from zero (p < .05)?
•If "yes" then you can use the selection method (test) with applicants from this group.
•If "no" then it is not fair to use the method with applicants from this group.
Are both Rs valid? That is, is the validity different from zero for BOTH groups?
•If your answer is "yes," then it is fair to use with both groups.
•If "it is only valid for one group" then you have Single-Group Validity and the
test can only be used with that particular group.
•If "it is not valid for either group" then you have a poor selection system and
it should not be used at all. Tell that test salesman to give you your money back!
Section C: Do you have “Differential Validity”?
In order to assess this, you will use "Fisher's r --> z’ statistic." This data transformation allows you to
determine precisely whether one correlation is significantly larger than another. This procedure is described
in detail below:
How to compare two correlation coefficients, using Fisher’s r --> z’ statistic, in order to determine whether
you have “differential validity”:
Because Pearson correlation coefficients (r) are not “normally distributed,” you cannot compare two
samples’ r values by using conventional statistics such as t tests. You must first transform the
two r values so that they follow a normal distribution. This is accomplished by using Fisher’s r --> z’
transformation.
Step # Operation to be Performed:
1   Convert the correlation coefficients into the closest z’ scores using Table 1 (see next page).
2   Compare the two z’ values by using the following formula. This formula will yield a standard
    normal distribution (Z – not to be confused with z’).

    $$ Z = \frac{z'_a - z'_b}{\sqrt{\dfrac{1}{n_a - 3} + \dfrac{1}{n_b - 3}}} $$

    where z'_a is the z’ value for group “a” (the group with the larger correlation), z'_b is the z’ value
    for group “b” (the group with the smaller correlation), and n = the number of applicants from that
    particular demographic group.
3   After taking the square root and finding the Z value, look in a z-table (such as that found in the back
    of most statistics books) to see if the value is significant. For this exercise, we will use the cutoffs
    found in Table 2 here (see next page). If your R value is between two values, you can interpolate
    by using the average of the two appropriate z’ values (e.g., R = .03; z’ = (.020 + .040)/2 = .030).
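If you do not have the conversion table handy, the r --> z’ conversion in Step 1 can be approximated in Python, because Fisher's transformation is the inverse hyperbolic tangent of r. This is a minimal sketch; the function name is an assumption, and the three r values are simply examples. Tabled values are rounded, so they may differ from the computed ones in the third decimal place.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's r --> z' transformation (inverse hyperbolic tangent of r)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return math.atanh(r)


for r in (0.12, 0.40, 0.66):
    print(f"r = {r:.2f}  -->  z' = {fisher_z(r):.3f}")
```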
Table 1: r --> z’

R (or r)           R (or r)           R (or r)           R (or r)
value:  z’ value   value:  z’ value   value:  z’ value   value:  z’ value
.00     .000       .31     .321       .61     .709       .91     1.531
.01     .010       .32     .332       .62     .725       .92     1.589
.02     .020       .33     .343       .63     .742       .93     1.664
.03     .030       .34     .354       .64     .758       .94     1.738
.04     .040       .35     .366       .65     .776       .95     1.842
.05     .050       .36     .377       .66     .793       .96     1.946
.06     .060       .37     .389       .67     .811       .97     2.122
.07     .070       .38     .400       .68     .829       .98     2.298
.08     .080       .39     .412       .69     .848       .99     2.649
.09     .090       .40     .424       .70     .867       1.00    3.000
.10     .100       .41     .436       .71     .888
.11     .111       .42     .448       .72     .908
.12     .121       .43     .460       .73     .929
.13     .131       .44     .472       .74     .950
.14     .141       .45     .485       .75     .973
.15     .151       .46     .497       .76     .996
.16     .161       .47     .510       .77     1.021
.17     .172       .48     .523       .78     1.045
.18     .182       .49     .536       .79     1.072
.19     .193       .50     .549       .80     1.099
.20     .203       .51     .563       .81     1.128
.21     .214       .52     .576       .82     1.157
.22     .224       .53     .590       .83     1.189
.23     .235       .54     .604       .84     1.221
.24     .245       .55     .619       .85     1.257
.25     .256       .56     .633       .86     1.293
.26     .266       .57     .648       .87     1.335
.27     .277       .58     .662       .88     1.376
.28     .288       .59     .678       .89     1.424
.29     .299       .60     .693       .90     1.472
.30     .310
Table 2: Significance Thresholds for Z values

Two-tailed Probability Level      Z value needed to reach significance
.05                          =    1.96 or larger*
.01                          =    2.55 or larger
.001                         =    3.50 or larger

One-tailed Probability Level      Z value needed to reach significance
.05                          =    1.65 or larger
.01                          =    2.32 or larger
.001                         =    3.10 or larger

* For your project, use the 1.96 threshold.
Note: Generally, use the two-tailed probability when you have NO hypothesis that the test should be more valid for one
group than another, based on the published literature. If you DO have a hypothesis that the test should be more valid
for one group than another, based on the published literature, use the one-tailed probability levels.
**If the Z-value is “significant” (e.g., larger than 1.96) then that means the test is unfair because it is
“significantly more valid” for one group than another.
**If the Z-values are smaller than the threshold values (e.g., suppose the Z-value was 1.59) then that means
the test is fair because it is equally valid for both groups.
**So, you really hope to find a SMALL Z-value so that you cannot conclude the test is unfair.
Just to make sure that you understand how Fisher’s r --> z’ transformation works, here’s an example:
rxy for 100 applicants from “group a” (e.g., Anglos) = .66.
rxy for 25 applicants from “group b” (e.g., Hispanics) = .12.
Before going any further, ask yourself: Does the correlation for one group LOOK larger than the
correlation for the other group? (In this example, it is pretty obvious that it does!)
Step # Operation to be Performed:
1   Convert the correlation coefficients into z’ scores using Table 1 (see previous page).
    What is the z’ value for group a (Anglos)? ______
    What is the z’ value for group b (Hispanics)? ______
2   Compare the two z’ values by using the following formula. This formula will yield a standard
    normal distribution (Z – not to be confused with z’).

    $$ Z = \frac{z'_a - z'_b}{\sqrt{\dfrac{1}{n_a - 3} + \dfrac{1}{n_b - 3}}} $$

    where z'_a is the z’ value for group “a” (the group with the larger correlation), z'_b is the z’ value
    for group “b” (the group with the smaller correlation), and n = the number of applicants from that
    particular demographic group.
    “Plug in” the numbers from the problem above into this formula. What Z value did you get?
    (use the space below to do your work with this example problem; the answer is on the next page)
$$ Z = \frac{z'_a - z'_b}{\sqrt{\dfrac{1}{n_a - 3} + \dfrac{1}{n_b - 3}}} $$

where z'_a is the z’ value for group “a” (the group with the larger correlation), z'_b is the z’ value
for group “b” (the group with the smaller correlation), and n = the number of applicants from that
particular demographic group.

$$
\begin{aligned}
Z &= \frac{.793 - .121}{\sqrt{\dfrac{1}{100 - 3} + \dfrac{1}{25 - 3}}} \\
  &= \frac{.672}{\sqrt{\dfrac{1}{97} + \dfrac{1}{22}}} \\
  &= \frac{.672}{\sqrt{.0103 + .0455}} \\
  &= \frac{.672}{\sqrt{.0558}} \\
  &= \frac{.672}{.2361} \\
  &\approx 2.845
\end{aligned}
$$

3   After taking the square root and finding the Z value, look in the z-table cutoffs (Table 2 on the
    previous page). Since we have no a priori hypothesis that the test is more valid for one group than
    the other, we will use the two-tailed probabilities and the p = .01 threshold.
    Is there a significant difference in the validities? (Yes: Z = 2.845 is larger than 2.55, so p < .01.)
    The test is unfair to Hispanic applicants. Based on the validities, you know that you can predict job
    performance well for Anglo applicants (r² = 43.6% of the variation in job performance is explained
    by the test), but not well at all for Hispanic applicants (r² = 1.44%). To use the test to predict
    Hispanic applicant performance scores would be little better than rolling dice.
Please try this analysis for your SPSS data!
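As a cross-check on your hand calculations, the whole differential validity comparison can be sketched in Python. The function name is an assumption, and the sketch uses the exact transformation (the inverse hyperbolic tangent) instead of looking up rounded table values, so its Z may differ slightly from a table-based answer. The inputs shown are the Anglo/Hispanic example from this section (r = .66 with n = 100, and r = .12 with n = 25); replace them with your own subgroup correlations and sample sizes.

```python
import math


def compare_correlations(r_a: float, n_a: int, r_b: float, n_b: int) -> float:
    """Z statistic for the difference between two independent correlations."""
    z_a = math.atanh(r_a)  # Fisher's z' for group a
    z_b = math.atanh(r_b)  # Fisher's z' for group b
    standard_error = math.sqrt(1 / (n_a - 3) + 1 / (n_b - 3))
    # abs() gives the same result as "larger minus smaller" in the formula.
    return abs(z_a - z_b) / standard_error


z_value = compare_correlations(0.66, 100, 0.12, 25)
differential_validity = z_value >= 1.96  # two-tailed .05 cutoff for the project

print(f"Z = {z_value:.2f}")
print(f"Differential validity at the 1.96 threshold: {differential_validity}")
```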
SECTION D: THE REGRESSION (CLEARY) MODEL
Selecting Everyone:
Do this exercise for the predictors (X-variables) identified in Part II. For this portion of the exercise, you
want to use everyone’s data. If you are continuing the exercise from the earlier sections (where you selected
each demographic group and analyzed the data separately), then you need to select all of the cases. To do
this, select Data from the main menu. Then Select Cases... This opens the "Select Cases" Dialogue Box:
Variable Names:
Age
ID
:
:
:
Select:
( ) All Cases
( ) If Condition is satisfied:
[ If ]
( ) Random Sample of Cases
[ Sample ]
( ) Based on Time or Case Range:
( ) Use Filter Variable
Unselected cases are:
(•) Filtered
( ) Deleted
If they are not already selected, please click on these options:
Select:
(•) All Cases
Unselected cases are:
(•) Filtered
The easiest way to see if your overall regression equation fits both subgroups is to plot the data for the two
groups. Regardless of whether you recommended one variable or more than one in Part II, you will need to
plot your results for each subgroup. Such scatter plots can really help you understand your data.
SPSS keeps changing the way it does graphs, which makes it very hard to write instructions that consistently
work! On the following pages, you will find instructions for three different methods of creating graphs; I
don’t guarantee that every method works with every version of SPSS. Try your best to get plots with the
subgroup regression equations. If worst comes to worst, you can always plot the regression lines by hand
because you know the regression equations for both subgroups and for the overall sample from your
regression printouts.
From the “horizontal menu” at the top of the screen, select Graphs. You should use one of three options:
*(Interactive) Chart Builder (pg. 16) – Use one of the other two methods; this doesn’t easily work correctly
when it comes to plotting the regression lines for the subgroups.
*Legacy Dialogues (pg. 20) – Personally, I found THIS method easiest to use with SPSS. Works smoothly.
*Regression Plots (pg. 23) – This is easier to use than the Interactive Chart Builder too. Works smoothly.
You can create graphs from any of these. You only have to graph your results using ONE method.
Graphing Method #1: The Interactive Chart Builder
From the horizontal menu at the top of the screen select Graphs… and Chart Builder.
You may see an introductory dialog box with the options [OK] or [Define your variable properties]. Select
[OK]. The main Chart Builder box will appear.
Within the “Gallery” tab select the following: Scatter/Dot. If you hold the mouse over the boxes, you’ll
find one called “Grouped Scatter” that has both blue and green circles in the box.
Next, click on the box that reflects “Grouped Scatter” (the small scatterplot picture), then ‘drag’ that box up to the main chart area.
Using the “virtual hand” (or cursor) drag the Y variable into the “Y-Axis” box. Drag either Y-Predict or
your chosen X variable into the X-axis box. Drag the appropriate demographic group variable name to the
“Set Color” (or “Set Symbol”) box. Then select [OK].
Chart Builder (dialog box layout)
Variables: Age, ID, Race, Sex, X1, X2, X3, Y
Drop zones in the chart preview: Set Color, Y-Axis, X-Axis
Tabs: Gallery | Basic Elements | Group/Point ID | Titles/Footnotes
Buttons: Element Properties, Options
Gallery, Choose From: Bar, Line, Area, Pie, Scatter/Dot, Histogram, High-Low …
[Simple Scatter] [Group Scatter] [Simple 3-D Scatter] [Grouped 3-D Scatter]
[Summary Point Plot] [Simple Dot Plot] [Scatter Plot Matrix] [Drop Line Plot]
[OK] [Paste] [Reset] [Cancel] [Help]
17
You will see a graph with different symbols for the two demographic groups in your output. If you double-click on the graph in your output, you can edit it.
Here are two different ways to edit the graph to add overall and subgroup regression lines. One of these
may work:
Method #1:
If you look at the Chart Editor box, there are four rows at the top, above the graph. Of those four rows, the
bottom row has several symbols (icons). One is:
:.:
.: :
If you click on that, it will add the overall regression line. Another box with the line “Properties” should
appear to the right of the Chart Editor box. If the line is not a linear line, select “linear” from the options
and click “apply.”
Another icon on that Chart Editor is:
.:::. ::::
: .::::
If you hold the cursor over it, it will say [‘Add Fit Line at subgroups’]. Click on it to add two more
regression lines – one for each demographic group. (Note: A “Properties” dialogue box may also open – if
you have selected a linear line, then you can just close that dialogue box).
If this worked properly, you should now have two different-colored symbols for your two demographic
groups and you should have three regression lines: An overall line and one line for each ethnic group.
Method #2 [Frankly, this one gave me some trouble]:
Click on Edit…Properties to bring up another dialogue box (the Properties box, illustrated below).
Select the Variables tab (at the top of the box)… You will see that the first “element type” is called
“Markers.” That is fine. What you want to do is to add a second element type – your regression lines for
the groups.
Step 1: Look at the demographic variable (e.g., Asian-American). Does it say “Group” to the right?
If it DOES, go to Step 2. If not, click on the area where “Group” should be.
A drop-down menu will appear. Select “Group”.
Note: It is important that the demographic group variable that you have chosen be set to
“Group” so that you can get regression lines for each demographic group.
Step 2: In the row of buttons (near the bottom) labeled [Previous] [Delete] [New] [Next] select [New].
Step 3: Now, where it says ELEMENT TYPE (near the top of the dialog box):
use the drop down menu to select Fit Line.
(If it asks what type of line, select Regression line from the options.)
Step 4: Go to the bottom row and click [Apply].
Properties (dialog box layout)
Tabs: [Chart Size] [Variables]
Element Type: [Fit Line ]
Variables:
   Case Number
   Frequency
   Demographic Group #1          Group
   Demographic Group #2          Style: Border Color
   Per cent
   X (or Y-predict)              X-axis
   Y=Overall job Performance     Y-axis
[Previous] [Delete] [New] [Next]
[Apply] [Cancel] [Help]
19
If you look at your graph now, you will see two lines (with different colors) appear for the two demographic
groups you are graphing. Are the two lines close together or far apart?
But what about the OVERALL regression line?
To add a third element type (the overall regression line for the entire sample), you will repeat the steps from
above, but first exclude the demographic grouping variable.
Step 1: In the row of buttons (near the bottom) labeled [Previous] [Delete] [New] [Next] select [New].
Step 2: Now, where it says ELEMENT TYPE (near the top of the dialog box):
use the drop down menu to select Fit Line.
(If it asks what type of line, select Regression line from the options.)
Step 3: Where the demographic variable still says the word “Group,” click on the word “Group.” A dropdown menu will appear. Select “Exclude” from that menu.
Step 4: Go to the bottom row and click [Apply]. You should now see a regression line for the entire
sample.
Properties (dialog box layout)
Tabs: [Chart Size] [Variables]
Element Type: [Fit Line ]
Variables:
   Case number
   Frequency
   Demographic Group #1          Group (change this to Exclude)
   Demographic Group #2          Style: Border Color
   Percent
   X (or Y-predict)              X-Axis
   Y= Overall job Performance    Y-Axis
Choices in the drop-down menu for a variable: X-axis, Y-axis, Z-axis, Matrix, Slice
[Previous] [Delete] [New] [Next]
[Apply] [Cancel] [Help]
Are the three lines close together near the right edge of the graph? If so, perhaps you can use the overall
regression line when hiring new employees. If the demographic group lines are far apart from each other
(and far apart from the overall regression line) in that part of the graph, perhaps it would be more accurate to
use the separate group regression lines to predict job performance.
When finished, please skip ahead to page 24.
20
Graphing Method #2: Legacy Dialogues
If you are using “Legacy” Graphs for SPSS:
These instructions assume you are using a color printer. If you are using a B & W printer, you may want to
set SPSS so that the subgroups are indicated by different symbols and each group’s regression line uses a
different dash style from the overall regression line. These changes are
also useful if you plan to photocopy your report using a B & W photocopier. See the handout entitled
“Distinguishing Groups using a Black & White Printer” on D2L (and/or found at my personal website,
http://websites.uwlax.edu/wross/mgt486.htm ) if interested.
Plotting your Results
Different versions of SPSS Legacy Dialogues “do graphs” differently. The following instructions apply to
several recent versions:
From the main menu, select Graphs. Then select Scatter/Dot...
You will see a dialogue box with the following options:
[x] Simple
[ ] Overlay
[ ] Matrix
[ ] 3-D
[ ] Dot
[ Define ]   [ Cancel ]   [ Help ]
Note that your dialogue box may have pictures of the types of graphs with a double-lined border around the
picture that is currently selected. If Simple is not currently selected, please click on that word or picture or
type an x in the appropriate box, depending upon your version of SPSS for Windows. Then click on the
[ Define ] button. You will see a box that looks like this:
Variable Names: ID, X1, X2, X3, Y, Y-Predict, Age Group, Sex Group, Racial Group, Filter
[>] Y-Axis:          [          ]
[>] X-Axis:          [          ]
[>] Set Markers By:  [          ]
[>] Label Cases By:  [          ]
[Titles] [Options]
[OK] [Paste] [Reset] [Cancel] [Help]
Select Y as your Y-Axis Variable. For your X-Axis Variable select your chosen predictor (X1, X2, or X3 - or whatever you renamed them -- IF you picked ONE predictor; YPREDICT if you used more than one).
In the Set Markers By: [ ] box, please select the demographic variable that you are using (age or sex or
whatever groups YOU are comparing). Leave the "Label Cases by [ ]" option blank.
21
Adding Titles & Footnotes
Click [Titles] and add a title to your graph. You will see two lines where you can add titles. What
variables are you plotting? What subgroups are you comparing? Your answers to these questions should
give you clues as to what you could type in your title. For your subtitle, type your name(s).
If you know it, you can write out your overall regression equation (and/or your two subgroup equations if
you know them) in the footnotes. However, you don’t have to.
For example:
Footnote 1 [ y’ Anglos = 3.34 + .03(x)]   <-- Use YOUR equations for your groups!
Footnote 2 [ y’ Hispanics = 3.21 + .04(x)]
Click [OK] to plot the graph.
Editing your Graph
You will need to edit your graph. Double-click on the EDGE of the graph. The chart will reappear in a
“Chart Editor” window with a different horizontal menu above the chart:
File   Edit   View   Options   Elements   Transform   Help
Adding an overall regression line:
For your graph, you will want to plot both the regression line for the total sample (equivalent to the line you
computed from Part II) and the regression lines for each of the subgroups. Why? If the two subgroups' lines
are very close to the total sample line, then you can probably justify using the total sample line. If the two
lines are not close to the total sample line (either because of a very different intercept or because of very
different slopes) then that tells you that the overall line is inappropriate.
You can do this one of two ways. The easy way is to look at the horizontal menu within the Chart Editor
(Remember? Click on the edge of the graph in your output to bring up the Chart Editor). There are three
rows of menu items. You will see a symbol near the right on the bottom menu row that looks like this:
:.:
.: :
If you hold the cursor over it, it will say [‘Fit Line at total’]. Click on it to add an overall regression line.
[Note: A “Properties” dialogue box may also open – click the “fit line” upper tab. Then for the type of line
(“fit method”), select a linear line rather than, say, a “quadratic” line or a Loess line, and then close that
dialogue box].
The less easy way to do this is to click on Elements and then select Fit Line at Total from the drop-down menu.
Another possibility is this: After you plot both subgroup lines, click to the right of Asian-American and
White on the “Properties” menu of the Chart Editor. Then select Exclude and click Apply. SPSS will then ignore
the fact that there are two subgroups and apply the ‘line of best fit’ to the whole graph.
22
Adding demographic group regression lines:
These graphing instructions worked with some versions of SPSS. When I tried this with SPSS version 15, it
should have worked, but the two subgroup lines – while circled on the screen -- were invisible! If it doesn’t
work for you, you can print it and draw them using a ruler. By playing with it (trial & error) I finally got the
lines to appear in color, but not dashed. So don’t get anxious over it if it doesn’t work…
You can add two more regression lines in one of two ways. The easy way is to look at the bottom row in the
horizontal menu within the Chart Editor. You will see a symbol near the right on the third row of the menu
that looks like this:
.:::. ::::
: .::::
If you hold the cursor over it, it will say [‘Add Fit Line at subgroups’]. Click on it to add two more
regression lines – one for each demographic group. (Note: A “Properties” dialogue box may also open – if
you have selected a linear line, then you can just close that dialogue box).
The less easy way to do this is to click on Elements and then select Fit Line at Subgroups from the drop-down menu. If necessary, make sure they are linear.
Note that in addition to seeing the two regression lines, you may also see the R-squared value for each
subgroup. Feel free to click on the R-Squared value and edit the label to make it clear what group has what
R-squared value if you wish.
Once your plots look correct, skip ahead to page 24.
23
Graphing Method #3: Regression Plots
From the top (horizontal) menu, select Graphs…Regression Variable Plots…
This will bring up a dialog box.
Move your Y variable to the Vertical-Axis box using the appropriate [>] button.
Move your X variable (or Y-predict if using more than one predictor) to the Horizontal-Axis box.
Move your demographic variable (e.g., Race) to the Shapes by: box.
When your output appears, double-click on the graph to open the Chart Editor box.
If you look at the Chart Editor box, there are four rows at the top, above the graph. Of those four rows, the
bottom row has several symbols (icons). One is:
:.:
.: :
If you click on that, it will add the overall regression line. Another box with the line “Properties” should
appear to the right of the Chart Editor box. If the line is not a linear line, select “linear” from the options
and click “apply.”
Another icon on that Chart Editor is:
.:::. ::::
: .::::
If you hold the cursor over it, it will say [‘Add Fit Line at subgroups’]. Click on it to add two more
regression lines – one for each demographic group. (Note: A “Properties” dialogue box may also open – if
you have selected a linear line, then you can just close that dialogue box; if not, change it to “Linear”).
If this worked properly, you should now have two different-styled symbols for your two demographic
groups and you should have three regression lines: An overall line and one line for each ethnic group.
The only problem with this is that the overall line is a solid line and so is one of the lines for one of the
ethnic groups (the other line is dashed). You can change one of these solid lines to a different line style by
selecting “lines” in the “Properties” dialog box and then under “Style” click the drop-down arrow and select
a style (e.g., dots ………) that is different from either of the other two lines. Click [Apply]. Now you should
have three different lines with three different styles, which makes it easy to tell the lines apart if you are using a black &
white printer.
You will also notice that this method gives you a box above and to the side of the graph. It looks like:
|--------[////////|\\\\\\\]-------|
^
What is that all about?
The end markers show the range of scores |------------|.
The shaded box shows the range between the first and third quartiles.
The line near the middle of the shaded box (in this example, above the ^ symbol) shows the median score.
You may (or may not) find all of that information interesting and useful.
Proceed to the next page (pg. 24), please.
24
Adding A Horizontal Reference Line:
Sometimes we want to see how many members of each group can successfully perform the job.
{Quick, what is the minimum score on y that indicates job success?}
Double-click on the EDGE of the graph. The chart will reappear in a “Chart Editor” window.
Within the Chart Editor, on the second row of the horizontal menu, you will see an icon that looks like this:
{Or you can click on either the [Elements] option or the [Option] option (depending on which version of
SPSS you are using) and select Y-Axis Reference Line from a drop-down menu.}
You will see a “Properties” box with the following tabs: Chart Size, Lines, Reference Line
The Reference Line screen may look like either of the following (depending on your version of SPSS):
Version 1 (Scale Axis):
   Axis Title: Y: Overall Job Performance
   Y-Axis Line Position: [ 4 ]
   Minimum: 1
   Maximum: 7
Version 2 (Category Axis):
   Variable: Y: Overall performance
   Position: [ 1 “Quite poor perf.” ▼]
   (click on the ▼ and select the value that is as close as you can get to 3.411)
Change the “Y-Axis Line Position” value to the minimum level of acceptable performance that you were
told when you completed Part II of your project (e.g. 3.411). Click Apply and Close. You should now see
a horizontal line across your graph. If you bother to count the points above the line, you will see that it
equals the number you used to compute your Base Rate from Part II. Without counting everything, it is
instructive to see if, for example, one demographic group is disproportionately below the line. That is
useful knowledge because, if, for example, left-handed people can’t do the job, then a selection system that
shows “adverse impact” against left-handed people isn’t really so bad, is it? It is simply telling you not to
hire people who can’t do the job anyway. On the other hand, if both left and right-handed people can do the
job in similar proportions and your selection system only results in right-handed people being hired, then
you may have problems with a selection system that is biased against left-handed people.
Adding a Vertical Reference Line:
Sometimes we also want to see how many members of each group would have been hired using the common
cutoff score.
Do the same steps as for the horizontal reference line, only this time, insert an X-Axis Reference Line.
Within the Chart Editor, on the second row of the horizontal menu, you will see an icon that looks like this:
{Or you can click on the [Elements] option and select X-Axis Reference Line from a drop-down menu.}
For the Position of the Line:[ ] box, please type ??? (the number that was between the fourteenth-highest and fifteenth-highest predictor scores; i.e., the Y-predict score or the X score that you
selected for your overall hiring cutoff score when you completed PART II of the project).
Next, click on the [ Add ] and then click on the [ OK ] button.
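If you have lost track of your Part II cutoff, the idea behind it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores and group labels are made up, only 3 people are "hired" to keep the example short (the exercise hires 14), and taking the midpoint between the two adjacent scores is just one reasonable choice of "a number between" them.

```python
# Hypothetical predictor scores with group labels; not the course data.
applicants = [
    (91, "W"), (88, "A"), (84, "W"), (79, "A"),
    (75, "W"), (70, "W"), (66, "A"), (60, "W"),
]
N_HIRED = 3  # the exercise hires 14; 3 keeps this toy example short


def cutoff_between(scores, n_hired):
    """Midpoint between the n-th and (n+1)-th highest scores (one possible choice)."""
    ranked = sorted(scores, reverse=True)
    return (ranked[n_hired - 1] + ranked[n_hired]) / 2


cutoff = cutoff_between([score for score, _ in applicants], N_HIRED)

# Count how many members of each group fall to the right of the vertical line.
hired_by_group = {}
for score, group in applicants:
    if score > cutoff:
        hired_by_group[group] = hired_by_group.get(group, 0) + 1

print(cutoff, hired_by_group)
```

The per-group counts are the same numbers you would get by counting the points to the right of the vertical reference line on your graph.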
25
Interpreting your graphs:
Once you have graphed the regression lines for each subgroup, compare them. Are the slopes similar for
each of the two subgroups? Are the slopes similar to those of the Overall (entire sample) regression line?
Are the y-intercepts for the two subgroups similar (within say 2.0 of each other on y)?
Stated differently, if you were to use the regression formula (that you derived for the entire sample) that
employed (all of) your predictors, would it be fair to both groups, given that you will be hiring from the
right-hand part of the graph?
If the answer is "yes" to all of these questions, then it is probably OK to use the overall regression line. The
test(s) is/are fair using the Cleary (Regression) model of test fairness. Go on to the next page.
If the answer to any one of these is "no" then the overall line is not fair and if you plan to use this
predictor(s) then you are safer recommending that you use the separate regression lines for each group.
Note: If you concluded on pages 8-12 that you need to use different predictors for each group, you may
decide that you want to re-run these Cleary analyses using those predictors (for extra credit).
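The comparison described above (does the overall line predict about the same performance as each group's own line in the part of the graph where you will be hiring?) can also be checked with simple arithmetic. The Python sketch below is only an illustration: the two subgroup equations reuse the example footnote values given earlier in this handout, while the overall equation and the cutoff of 70 are made up.

```python
# Hypothetical equations: the two subgroup lines reuse the illustrative footnote
# values from this handout; the overall line and the cutoff are made up.
def predict(line, x):
    """Predicted y' for a line given as (intercept, slope)."""
    intercept, slope = line
    return intercept + slope * x


overall = (3.28, 0.035)
subgroups = {"Group 1": (3.34, 0.03), "Group 2": (3.21, 0.04)}
cutoff = 70  # assumed hiring cutoff on a predictor scaled out of 100

gaps = {}
for name, line in subgroups.items():
    # Positive gap: the overall line predicts higher performance than the group's own line.
    gaps[name] = [
        round(predict(overall, x) - predict(line, x), 2)
        for x in range(cutoff, 101, 10)
    ]
    print(name, gaps[name])
```

A gap that stays on one side of zero across the hiring range means the overall line consistently over-predicts (or under-predicts) for that group. How large a gap matters is a judgment call that you should tie back to your performance scale.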
26
SECTION E: DECISION ACCURACY STATISTICS FOR EACH GROUP
AND ADVERSE IMPACT STATISTICS
Remember the decision accuracy statistics you computed for Part II of this exercise? Now, compute
decision accuracy statistics for each racial group. The raw numbers for these statistics can be obtained in
one of three ways:
(1) Since you have a small sample, you can count the number of cases that fall within each of the four
quadrants on your scatterplot. From this, you can determine your decision accuracy statistics (e.g.,
among Asian-American applicants, the percent hired who would be successful).
(2) You can print the data and look at each racial group’s OKWorker and Hire scores (these
variables were created in Part II). From these, you can determine how many members from each
racial group are in each of the four quadrants. It is hard to tell visually if points are close together
near the cutoff line.
(3) You can use Data... Select Cases... to select just one group. Then you can use
Analyze/Statistics...Descriptives... Crosstabs... to compare OKWorker and Hire for that group
(with OKWorker in the rows and Hire in the columns). This assumes you saved the data set with
these variables included. If you don't remember how to do a crosstabs, please consult your
instructions from Part II. This is my preferred method of the three and is best if you have very large
data sets.
After completing your first crosstabs, please select the other racial/ethnic subgroup using the
Data... Select Cases... procedure and then compute the crosstabs for the second group.
When you have finished, you should have the frequency counts for your 2 x 2 table upon which you
can base your decision accuracy statistics. You can "check your work" in this way. If you add up
the total number of people hired from the two 2 x 2 tables, it should equal 14 people. You don’t
have to compute all of those good statistics that you computed for Part II. Just compute “hit rate,”
“success rate,” and “Selection Ratio” for each demographic group separately.
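Once you have the four cell counts for a group, the three statistics are simple arithmetic. The Python sketch below shows one way to compute them. It assumes OKWorker and Hire are coded 1 = yes and 0 = no, and it uses common textbook definitions (hit rate = correct decisions divided by all cases, success rate = successful hires divided by all hires, selection ratio = hires divided by applicants). If your Part II instructions define these differently, use those definitions. The cases and the function name decision_stats are made up.

```python
# Hypothetical cases for ONE demographic group: (ok_worker, hire), coded 1 = yes, 0 = no.
def decision_stats(cases):
    """Return hit rate, success rate, and selection ratio for one group."""
    n = len(cases)
    hired = sum(hire for _, hire in cases)
    true_pos = sum(1 for ok, hire in cases if ok == 1 and hire == 1)
    true_neg = sum(1 for ok, hire in cases if ok == 0 and hire == 0)
    return {
        "hit_rate": (true_pos + true_neg) / n,
        # Undefined if nobody from this group is hired.
        "success_rate": true_pos / hired if hired else None,
        "selection_ratio": hired / n,
    }


group_a = [(1, 1), (1, 1), (0, 1), (1, 0), (0, 0), (0, 0)]
print(decision_stats(group_a))
```

Run it once per group, just as you run the crosstabs once per group.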
Testing for Adverse Impact (AI) using “Flow” analysis:
Does the 2 x 2 table suggest that the test shows adverse impact? That is, are more people from one
racial/ethnic group hired (“flowing into the organization”) than people from the other racial/ethnic group?
Remember that adverse impact, by itself does not prove the test is biased, and it is possible to continue using
a test that produces adverse impact. However, adverse impact does indicate that you need to ensure validity
and fairness (using one of the above models). In other words, we are checking for an "apparent problem" -- ideally, you would do this part first, but because of the nature of the computer exercise, it is easier to do this
after doing the other computer work. You might report this early in your “Results” section of your report,
though.
How do you check for Adverse Impact (AI)?
•You need to find the selection ratio for White applicants. This is:
White applicants hired with your new system ÷ the no. of White applicants.
•You need to also determine the selection ratio for Asian-American applicants.
??? ÷ the total no. of Asian-American job applicants.
Note that for this project, you are assuming throughout that future
samples of applicants will be very similar demographically to the current sample.
•Then you need to take the "ratio of the ratios." For example, suppose I was concerned about discrimination
against Jewish applicants. If I hired 6 Jews and 14 Gentiles for 20 vacancies, and I had 25 Jews and 35
Gentiles who applied, then I would compute this ratio:
(6/25) / (14/35) = .24 / .40 = .60
The critical value here is whether the final number is above or below .80. This "80% rule" is the threshold
the Equal Employment Opportunity Commission (EEOC) uses to determine whether adverse impact occurs.
It is possible to also check for "reverse discrimination" in this example by putting the Gentile ratio on top
and the Jewish ratio on the bottom of the equation. Now you try it for Asian-American and White
applicants.
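The "ratio of the ratios" arithmetic can be double-checked with a short Python sketch. The numbers below are the ones from the worked example above; the function names are made up for illustration, and you would substitute your own counts for Asian-American and White applicants.

```python
def selection_ratio(hired, applicants):
    """Proportion of a group's applicants who were hired."""
    return hired / applicants


def impact_ratio(focal_hired, focal_applicants, ref_hired, ref_applicants):
    """Selection ratio of the focal group divided by that of the reference group."""
    return selection_ratio(focal_hired, focal_applicants) / selection_ratio(
        ref_hired, ref_applicants
    )


# Worked example from the text: 6 of 25 hired in one group, 14 of 35 in the other.
ratio = impact_ratio(6, 25, 14, 35)
verdict = "below .80" if ratio < 0.80 else "at or above .80"
print(f"{ratio:.2f} ({verdict})")

# The "reverse" check simply swaps which group is on top.
reverse_ratio = impact_ratio(14, 35, 6, 25)
print(f"{reverse_ratio:.2f}")
```

Keeping the function arguments in a fixed order (focal group first, reference group second) makes it harder to flip the ratio by accident.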
After completing Part III of this project (so far), be sure to think about the following questions. Some of
them may be relevant to your “Discussion.”
-Do your Part II recommendations change?
-Is your predictor(s) acceptable, with little Adverse Impact, using the “Underutilization” approach?
How might the geographic area that was chosen affect one’s conclusions?
-Is your predictor(s) fair and valid for both groups?
-Does your system show Adverse Impact using the “80% Rule” approach?
-What else might be done to improve the fairness of this – or any – selection system?
28
SECTION F: COMPARING PC-BASED TESTING WITH iPAD APP TESTING
Recall that X3 was administered by either a PC or by iPad. In Part II you checked to see if the mean scores
were different for the two types of test administration. Now you want to see if the two types of testing are
fair to all groups. We could use several fairness models, but to keep this simple, we are going to use the
single-group validity model: That is, we want to see if the two types of tests are similarly valid for each of
the racial groups. You can report the differential validity model and/or the Cleary (regression) model for
extra credit if you wish (but I’m not going to tell you how – use the earlier pages of these instructions as
your guide; up to 4% per model).
You’ll be selecting four different groups:
1. Using Data…Select Cases...If Condition is Satisfied [IF]…, select just those who took the test
on a PC (App = 1) who were Asian-Americans (Race = 1). This is done by selecting App in the
variable list and clicking the [>] button to move it into the appropriate box. Type “=1 & ”. Next,
select Race from the variable list and move it into the appropriate box. Type “=1”. When
completed you should see “App=1 & Race = 1”. Filter out the remaining cases and select just that
group.
Select Analyze from the main horizontal menu. When the drop-down menu appears, select
Correlate…Bivariate… Make sure that Pearson, Two-Tailed, and Flag Significant
Correlations are all checked.
“Move” (using the [>] button) X3 and Y into the “Variables to correlate” box. Click [Options]…
Check Means & Standard Deviations, and Exclude Cases Pairwise. Click [Continue]...[OK].
What sort of correlation did you get between X3 and job performance (Y) for this demographic
group who took the test via PC? Was it statistically significant? Was the sample size large or
small? How does that impact your confidence in the findings? How does sample size impact
whether the correlation was significant? Also, what were the mean and SD?
2. Select only Whites who took the X3 test via PC (“App=1 & Race = 2”).
Repeat the Correlate…Bivariate… analysis.
3. Select only Asian-Americans who took the X3 test via mobile iPad app (“App=2 & Race = 1”).
Repeat the Correlate…Bivariate… analysis.
4. Select only Whites who took the X3 test via mobile iPad app (“App=2 & Race = 2”).
Repeat the Correlate…Bivariate… analysis.
Think about the questions asked earlier (this page) for each of the four groups. For which groups is X3 a
valid predictor of job performance? Does that mean that X3 is (or is not) “fair” (and to whom)? Does that
affect the conclusions you drew earlier regarding whether the selection system you recommended from Part
II is fair?
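The four select-and-correlate runs above follow one pattern: filter on App and Race, then correlate X3 with Y. The Python sketch below shows that pattern with made-up rows; the codes (App 1 = PC, 2 = iPad; Race 1 = Asian-American, 2 = White) follow the instructions above, but the data values and the function name pearson_r are hypothetical. It does not compute significance levels, so rely on the SPSS output for those.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical rows: (app, race, x3, y). Not the course data set.
rows = [
    (1, 1, 60, 3.1), (1, 1, 72, 3.9), (1, 1, 85, 4.8),
    (1, 2, 58, 3.4), (1, 2, 70, 3.6), (1, 2, 90, 5.0),
    (2, 1, 65, 4.2), (2, 1, 75, 3.8), (2, 1, 88, 4.5),
    (2, 2, 55, 2.9), (2, 2, 68, 4.0), (2, 2, 82, 4.4),
]

results = {}
for app in (1, 2):
    for race in (1, 2):
        # Same idea as the SPSS filter "App=... & Race=...".
        xs = [x for a, r, x, _ in rows if a == app and r == race]
        ys = [y for a, r, _, y in rows if a == app and r == race]
        results[(app, race)] = (len(xs), pearson_r(xs, ys), mean(xs), stdev(xs))

for (app, race), (n, r, m, sd) in results.items():
    print(f"App={app} Race={race}: n={n}, r={r:.2f}, X3 mean={m:.1f}, SD={sd:.1f}")
```

With only a handful of cases per cell, even a large r can fail to reach significance, which is why the sample-size questions above matter.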
29
Extra Credit: Testing Every Individual Predictor for Each Racial Group
Finding the Single Best Predictor for each Group with Correlations
The above analysis tells you whether the one predictor (X1, X2, or X3, or the linear combination of some of
those expressed as Y-predict) that you recommended from Part II (based on the overall group) fits each of
the subgroups too. Whether it does or it doesn’t, you may want to know whether any of your other
individual predictors are “better” (e.g., more valid) for specific subgroups than the one variable you
recommended. To investigate this possibility, do the following with all predictors and Y:
Step 1: Select one of the demographic subgroups using the Data… Select Cases… dropdown menus.
Step 2: Click Analyze… Correlate… Bivariate. Run your Pearson correlation analysis for
X1, X2, X3, and Y.
Step 3: Select the other demographic subgroup using the Data… Select Cases… dropdown menus.
Step 4: Repeat Step 2
Compare the correlations for the two groups. Are the largest correlations for one group (e.g., the correlation
of X1 and Y for Asian-Americans) the same as the largest correlation coefficients for the other group? Or
do you find that different variables are highly correlated with Y for the different groups (e.g., X1 for Asian-Americans and X3 for Whites)?
If you find that the same X variable is the largest (and shows no single-group or differential validity –
hint: test for those -- but rather is similarly valid for both groups), then ask yourself: Is this X-variable
“better” than the system I recommended from Part II? Is it the same variable that you recommended in Part
II (if you recommended only one X variable)? If so, that gives you confidence that you picked the most fair
variable of the three predictors, doesn’t it?
If you find that different predictors are better for the different groups (e.g., X1 for Asian-Americans and
X3 for Whites), then you will have to make a tough decision before you make a recommendation: Do you
want to give Asian-American applicants one type of predictor and White applicants a second type of
predictor based on the validity statistics? Why or why not? Is there a second predictor that is “close” in
magnitude of validity that shows no differential validity problems? What will you do?
30
Extra Credit: Are Some Combinations of Predictors Only Valid for Specific Demographic Groups?
To investigate whether a combination of predictors is valid for each of the two groups separately:
Step 1: Select one of your two demographic groups using the Data… Select Cases… dropdown menus
Step 2: Click Analyze…Regression…Linear…
Step 3: Select Y as your Dependent Variable.
Recall that when you analyzed the data in Part II, you entered the X variables in blocks, based on
their validity (largest to smallest). You used the “Enter” feature with three different “blocks.”
This time, put all three predictors in the “Independent Variable(s)” box in one block and click on
the drop-down menu where “Enter” appears. Select Stepwise. The “Stepwise” procedure allows
the computer to select only the significant variables. It will exclude non-significant X variables.
Step 4: Select the other demographic group using the Data… Select Cases… dropdown menus.
Step 5: Repeat the Multiple Regression Analysis (Steps 2 & 3), again using the Stepwise procedure.
Step 6: Look at the two sets of printouts. Did the computer pick the same predictor(s) for each group?
If you find that the same combination of X variables is significant for both groups (and the Multiple R
value shows no single-group or differential validity problems), and they are the ones that you recommended
from Part II, then that gives you great confidence that what you recommended earlier is indeed the “best set”
of predictors. They are valid and fair. This is what you hope for!
If you find that different combinations of predictors are better for different groups (e.g., X1, X2, & X3 for
Anglo employees and only X2 & X3 for Hispanic employees), then you will have to make a tough
decision before you make a recommendation. Do you want to give Anglo applicants one set of predictors (in
this example, all three predictors) and Hispanic applicants a separate set of predictors (in this example, just
X2 & X3) based on the Multiple Regression validity statistics? Why or why not? Is there a “subset” of
predictors (in this example, just X2 & X3) that is “close” in magnitude of validity for each group that shows
no differential validity problems? (In the above example, you would probably want to re-run the regression
for Anglo subjects specifying the subset of variables you thought might work well for both groups – X2 &
X3 in this example). Generally, if you find that the same predictors do not work similarly well for the
different demographic groups, what will you do? What will you recommend?
Other Extra Credit Ideas
1. Using the graphs… scatterplot… and chart editor… features, address the issue: Do quadratic
lines (curves) explain more of the variance than linear lines (see p. 21)? Does that approach
change your conclusions about the fairness of your system?
2. Repeat some or all of the analyses you did in Part III (underutilization, validities, Multiple Rs,
80% rule, etc.) for different demographic groups, analyzed separately. For example:
   Is your selection system fair to both men and women?
   Is your selection system fair to both older and younger applicants?
   Do these analyses suggest the same predictors? Or different predictors?
3. Read about and apply any or all of the following “alternative” Adverse Impact statistics to your data
set: Upton’s Corrected Chi Square, the Reverse-one rule, and/or the 2SD rule. These are described
in Collins, M. W., & Morris, S. B. (2008). Testing for adverse impact when sample size is small.
Journal of Applied Psychology, 93 (2), 463-471.
4. Re-do your overall selection system fairness analyses using just those who took PC-based testing
for X3 (it has a larger sample size than the iPad group). Is this system fair and valid?
Be sure to identify your extra credit printouts and to discuss the extra work in the body of your paper
(otherwise, I might miss seeing your work)! (10% maximum)
31
Names ___________________________________________________
Equal Employment Opportunity Project Grading Rubric (2015):
Checklist:
____Mentioned job & racial groups
____Mentioned three predictors briefly
Purpose/Overview
Literature Review &
Explanation & Logic; Rationale leads to your study
____Did you cite ten sources?
____Mention models of fairness
____Relate lit review to type of job
____Relate lit review to relevant demographic groups
____Mention EEO laws or Uniform Guidelines
____Mention relevant court cases
____Mention Adverse Impact
____Mention predictor 1 (CSAP) & fairness
____Mention predictor 2 (WAB) & fairness
____Mention predictor 3 (TST) & fairness
____Mention PC vs. mobile app testing & fairness
Method
____Geo. Adv. Imp. (underutilization) formula given
____Mention single-group validity
____Differential validity (r-z)
____Cleary (Regression) model
____Discuss “flow” analysis (80% Rule)
Results
____ Geographic. (Stock/underutilization) Analysis
____ ____ Multiple Rs (for single-group validity)
____ Comparing Rs (differential validity)
____ Regression Lines examined
____ ____ Hit Rate for each Group
____ ____ Success Rate for each Group
____ Selection Ratios & Adverse Impact (Flow)
____ Validity comparisons of PC vs. mobile app testing.
Discussion and Recommendations
____Discusses results
____Recommends selection system
____Additional Policies related to EEO
____Discusses limits of the study & Directions for future research
Reference Page:
____Ten+ complete references
Correct Figures/Results/Printouts In an Appendix:
____ Results are correct
____ Graphs appear correct
____ HR policy is included, clear, and reasonable
EXTRA CREDIT: Additional analyses, etc. (10% max)
Sex (1% each):
____ Single Group validity
____ Differential validity
____ Regression model
____ Flow Analysis
____ Geo. (Stock)
____ Hit & Success Rate
Age (1% each):
____ Single Group validity
____ Differential validity
____ Regression model
____ Flow Analysis
____ Geo. (Stock)
____ Hit & Success Rate
Other extra credit:
___ Used rxy to find best predictor for ea. gp.
___ Stepwise Mult R. for each group
___ Quadratic regression
___ Upton’s Corrected Chi-Square
___ Reverse-One Rule
___ 2SD adverse impact test
___ Differential validity or Cleary w. PC vs. App
___ Checked system fairness w. just PC X3 group in the sample.