1
MGT/William Ross
Word file: c:eeo~spss~Spring~2015
For Data file (SPSS for Windows 22) See: Vt2015486.sav
Exercise last revised: Feb. 13, 2015

PART III OF HRIS PROJECT: EQUAL EMPLOYMENT OPPORTUNITY IN SELECTION & DIFFERENCES IN VALIDITY

Recall from Part II of this project that Metro Hospital is planning to expand its Call Center Representatives (CCR) staff. You wonder if your new selection system is fair to Asian-American applicants. Executives are eager to ensure that the selection method(s) used in your validity/decision accuracy exercise are fair to all groups. Therefore, you want to compare Whites (“Mainstream”) and Asian-Americans to ensure that the selection methods are fair to both groups.

You will need to remember what predictor(s) you chose for Part II and their regression equation as you complete Part III. You will use the same data file as Part II. The same housekeeping information from Part II (e.g., how to get a listing of your variables, how to print your output, etc.) also applies to this exercise.

Recall that you had the following predictor choices, scaled out of 100. (Be sure to name them in the intro to this third report. Sometimes students recommend only a subset of the possible predictors in Part II, but in Part III they forget to name the ones they recommended, leaving me to guess what is being used!)

(a) X1. Customer Service Aptitude Profile (CSAP) test.
(b) X2. A Weighted Application Blank (“Biodata”) Inventory of Service Orientation (SO).
(c) X3. A five-minute Typing Skill Test (TST) score administered via either PC or mobile app.

In your literature review for Part III, you will want to examine whether the above selection methods show adverse impact and are fair to minority groups (assuming you can find relevant literature on those topics).
Your data set also has information about the following categories:

Variable Name   Value Code              % in surrounding 3-mile area
AGE             1=Under 40              70
                2=40 or Older           30
SEX             1=Ladies                50
                2=Gentlemen             50
RACE            1=Asian-American        35
                2=White (Mainstream)    65

In your literature review, you should discuss discrimination among these demographic groups in hiring.

2

Part “A”: Underutilization Analysis (Geographic Adverse Impact Analysis)

There are many ways to assess adverse impact. One way is to look at Geographic Area Comparisons. With this approach, the percentage of workers of various ethnic/racial/age/ability groups hired by the firm in a particular job is compared to the percentage of that group in the surrounding area.

Note: For this particular analysis, use just the subset of people who will be “hired” (e.g., n=14) with your new system and NOT the larger set (e.g., N=70 or whatever your total sample size is).

In SPSS, move your cursor to the “toolbar” (horizontal menu) at the top of the screen. Click on
Data…
Select Cases…
Select those 14 people who were Hired under your new system in Part II. [If you did not save your data set with that information (“hired” vs. “not hired”), you will need to re-compute that information from Part II (“Validity”) around pages 21-22.]

Analyze... (or “Statistics” in some versions of SPSS…)
Summarize… (or “Descriptive Statistics”…)
Frequencies...

Suppose you want to compare Asian-Americans and Whites. First, get a frequencies output for the RACE/ETHNICITY variable. For this hypothetical example, the race variable will refer to whether the job candidate is Asian-American (1) or White (2). In the Frequencies dialog box put RACE in the “variable” box. [OK]

Next, using a hand-held calculator, please compare each group's percentage (from your SPSS output) to the percentage in the surrounding three-mile area (see the previous page). So you will compute two percentages: one for those in the group = 1 category and another for those in the group = 2 category.
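The hand calculation in this part can be double-checked with a short script. The following is only a sketch in Python (it is not part of the SPSS exercise): it divides each group's share of those hired by that group's share of the surrounding area and flags any ratio below the .80 cutoff used in this exercise. The hired counts are invented for illustration, the area percentages are the RACE figures listed above for the surrounding three-mile area, and the function names are hypothetical.

```python
def utilization_ratio(hired_pct, area_pct):
    """Group's percentage among those hired divided by its percentage in the area."""
    return hired_pct / area_pct


def is_underutilized(hired_pct, area_pct, threshold=0.80):
    """True when the ratio falls below the .80 cutoff used in this exercise."""
    return utilization_ratio(hired_pct, area_pct) < threshold


# Hypothetical frequencies for the 14 people hired (your SPSS output will differ).
hired_counts = {"Asian-American": 3, "White": 11}
# Percentages in the surrounding three-mile area, as listed for RACE.
area_pcts = {"Asian-American": 35.0, "White": 65.0}

total_hired = sum(hired_counts.values())
for group, count in hired_counts.items():
    hired_pct = 100.0 * count / total_hired
    ratio = utilization_ratio(hired_pct, area_pcts[group])
    flag = "underutilized" if is_underutilized(hired_pct, area_pcts[group]) else "not underutilized"
    print(f"{group}: {hired_pct:.1f}% of hires / {area_pcts[group]:.0f}% of area = {ratio:.2f} ({flag})")
```

The same two functions can be reused for the SEX and AGE comparisons by swapping in those counts and area percentages.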
If you take the percentage of, say, Asian-American employees in the firm in this particular job (obtained from your frequency printout) and then divide it by the percentage of Asian-American workers in the surrounding area, what is the resulting ratio? Is it less than .80? If so, then Asian-American workers are underutilized. If it is greater than .80, then they are not underutilized. Next, do this analysis for the White group. Is either group underutilized by Metro in this position? Now repeat this underutilization analysis for males and females. Is either group underutilized by Metro? Finally, repeat this underutilization analysis for older and younger employees. Is either group underutilized by Metro Hospital for this position? An aside: Another way to assess adverse impact is to use the "Flow Analysis: 80% selection ratio rule" (described later, when you compute decision accuracy statistics for each group). When you write your report, you will want to consider whether the “underutilization results” (geographic analysis results) correspond with or differ from the “selection ratio” (flow analysis) adverse impact statistics. Don’t forget to include your results for the above analyses in your report! (It is easy to forget them because these results were computed with a calculator and you won’t have SPSS printouts for them.) 3 Reviewing Some Concepts Before you continue this exercise, you may want to ask yourself: What makes a selection system “fair”? You may want to consult the readings for this course before beginning this part of the project. Part III of the project gives results that can be used with several models of test fairness. 
These models are reviewed briefly here, using a variety of groups as examples (some of these groups are the same as those from your project, some aren’t; the purpose is to illustrate the models of fairness):

One definition of “unfairness” is that the validities (e.g., correlation coefficients between a predictor measure and performance appraisal score) are significantly different from zero for one group and are not significantly different from zero for the other group. This is called Single-Group Validity. For example, if Predictor X had a validity of .60 (probability < .05 that a correlation that large would occur by chance alone) for Anglo applicants, but only .17 (not significantly different from zero) for Hispanic applicants, then you could conclude that Predictor X should only be used for Anglo applicants and not for Hispanic applicants. Alternatively, you might conclude that you should look for a different predictor that is valid for both groups. You hope to find validity for both groups; you hope you don’t find Single-Group Validity.

Caution: The Single-Group Validity test is strongly influenced by sample size. The smaller the sample size, the larger the correlation (regression) value that is needed to be statistically significant. This is noted by Arvey & Faley in their book, Fairness in Selecting Employees.

Another definition is Differential Validity. Here, you'd compare validity (regression) coefficients for the two groups. For example, suppose the validity for Men was .70 and the validity for Women was only .30. Although both of those might be different from zero, if the difference between them is statistically significant, then the predictor is significantly better (i.e., more accurate) at predicting job success for Men than for Women. Under this definition, it is fair to use for Men only.

A third definition of fairness is the Regression (Cleary) Model. Here, you compare the regression lines for the two groups.
If, for example, the regression line for Younger employees (under age 40) is:

Predicted Y = 12.05 + .69*X1 + .25*X2 + .10*X3

and the regression line for Older employees (age 40 or older) is:

Predicted Y = 15.09 + .20*X1 + .45*X2 + .70*X3

then you have different lines of prediction for the two groups. Thus, you should use separate regression lines. Why? The weights are different! If the weights had been similar, you might have suggested using only one common regression line for the entire sample. {Note: These are all hypothetical results; your results will differ.}

It is often instructive to look at a graph of the subgroup regression lines along with the common (overall sample) regression line. Sometimes you can readily tell from the graph whether the lines appear to be quite different from each other. If they are different, you should use separate lines for each group.

By the time you finish this exercise, you should be able to:
a. perform single-group validity analysis for two groups (Asian-Americans and Whites).
b. perform differential validity analysis for these same two demographic groups of workers.
c. test the regression model of fairness for these same two demographic groups of workers.
d. compute decision accuracy statistics for these same two demographic groups of workers.

4

SELECTING DIFFERENT GROUPS OF WORKERS

For most of the subsequent parts of the exercise, you will need to analyze each group separately out of the full data set (e.g., N = 70). Thus, it is useful to remember how to select the cases from a particular group. Rather than repeat these instructions throughout the exercise, I will put them here for your reference, using the first group as an example.

Selecting the First Group of Workers:

For this part of the exercise, you will select the Asian-American workers' records from the full sample. You will use these to determine whether the selection system is valid for this specific group.
Please click on the Data menu: File Edit View Data Transform Statistics Graphs Utilities Windows Help From this menu, click on Select Cases to see the "Select Cases" Dialogue Box. This box consists of a list of variables in the left column and a list of selection options in the right column. Click on the circle to the left of the If condition is satisfied option. Next, click on the [if...] box. This brings you to the "Select Cases: If" Dialogue Box, containing your list of variables in the left column and room for your "select if" equation in the right column. Use the down-arrow key or mouse to highlight the RACE or ETHNICITY variable , then click on the [>] box to put this variable in the equation box. Next click on the [=] key. Then type 1. You should see an equation similar to the following: RACE or ETHNICITY = 1 What does this mean? It means you are going to first investigate the cases who are Asian-American. {Quick, ask yourself: how would you write an equation to select cases from the other group? Don't write such an equation now; I just want you to think about how you would do it because you will do it later. If you said you'd write "RACE [or ETHNICITY] =” then you are on the right track! Just plug in the correct number!}. Click [ continue ] Click [ OK ] 5 SECTION B: SINGLE-GROUP VALIDITY MODEL Is the model that you recommended in Part II valid for each subgroup? You want to know whether the model you recommended in Part II is valid for each group. Regression Analysis for Asian-American Workers You need to perform a regression analysis for this group of workers. You will use the predictor(s) (a.k.a. “tests” or “selection measures”) that you chose as your final recommendation for Part II of the project. Hopefully you saved the data set from Part II that contains y-predict in your data set (if not, you’ll need to quickly redo that part of the Part II project using the equation you identified in Part II to create a y-predict variable). 
If you recommended only one predictor, then use that (X) predictor for this subgroup. If you recommended more than one predictor, then you will use y-predict for this subgroup. The goal of this analysis is to see if the predictor(s) you recommended for everyone (in PART II) also work well for JUST THIS PARTICULAR GROUP of workers.

Please click on the Analyze (or Statistics) menu:

File Edit Data Transform Analyze (or Statistics) Graphs Utilities Windows Help

Select Regression from the following menu (names used in older versions of SPSS are in parentheses):

Reports
Descriptive Statistics (or Summarize)
Compare Means
General Linear Model (ANOVA Models)
Correlate
Regression
Loglinear
Classify
Data Reduction
Scale
Nonparametric Tests
Survival
Multiple Response

There are different types of regressions, so please select Linear...

6

This opens the "Linear Regression" Dialogue Box:

[Figure: the "Linear Regression" dialogue box. The variable list on the left shows id, x1, x2, x3 (or the names you gave them), and y. On the right are the empty Dependent Variable and Independent Variable(s) boxes, each filled by using its [>] button, along with the "Block 1 of 1" controls ([Previous], [Next]), the Method drop-down, and the buttons [OK], [Paste], [Reset], [Cancel], [Help], [WLS>>], [Statistics], [Plot], [Save], and [Options].]

***If you recommended using ONE predictor when you completed Part II of this project, please complete the operations below. If you recommended using MORE THAN ONE predictor when you completed Part II, please skip to page 9.***

If you recommended ONLY ONE predictor in Part II...

Select the first of the demographic subgroups using the Data… Select Cases… dropdown menus.

You want to predict y. So y will be your dependent variable. Use the mouse to move the cursor over the y variable in the variable list. Click on the [>] key to put y in the Dependent Variable Box. Next, put your chosen x? (or whatever name you gave your chosen variable) in the Independent Variables box for your first analysis.
You should see the following:

[Figure: the "Linear Regression" dialogue box as before, now with y in the Dependent Variable box, x? in the Independent Variable(s) box, and Method set to Enter.]

Next, click on the [ Statistics ] button. From this, select [ ] Estimates, [ ] Descriptives, and [ ] Model Fit, then click on [Continue]. You are now ready to run the regression for your predictor. Click on [ OK ]. You will see a printout with information like this (only with different numbers for your printout):

7

What it is: Multiple R .70
What it means: This is the same as a correlation coefficient when there is only one predictor. It is properly written as "rxy" and varies from .00 (no relationship) to 1.00 (a perfect linear relationship).

What it is: R Squared .49
What it means: This means that x is explaining 49% of the variation in y. If the predictor were perfectly related to y, it would explain 100%.

What it is: Adjusted R Square .47
What it means: This says "if you were to test this predictor on a second sample, what is the estimate of the R Square?" It is usually a bit lower than the actual R square reported on the previous line because SPSS assumes that you probably capitalized on chance covariation.

What it is: Standard Error 1.09
What it means: This is like a standard deviation, only it is for the regression line. Imagine a regression line (the line that best fits the data) with two parallel lines near it (one above and one below the regression line) that are 1.09 y-units away. Such lines would tell you where most scores will fall if you plot x against y. The larger the Standard Error, the less confidence we have in our ability to predict.

What it is: Analysis of Variance: F = 17.92, Sig. F = .001
What it means: This tells us that there is, indeed, a relationship between x and y.
The probability (significance or "sig.") that such a large F statistic would emerge by chance is very small: only one time in one thousand (.001). If it is not due to chance, it must be a real effect. Here, it tells us that the correlation (r) is different from zero. So in this example there is a significant correlation: x predicts job performance (y). Note that the T value in the equation below also tells you the same thing for x. When there is more than one predictor, the F-statistic tells you whether the overall equation predicts y, and the T-statistics tell you whether each individual component is significant (and belongs in the equation).

------------------- Variables in the Equation -------------------
Variable Name    B       SE B    Beta    T       Sig. T
X                1.14    .11     .70     8.96    .0001
(Constant)       3.14    .04             99      .0001

What does this mean? It means that the equation for the regression line is:

predicted y = 1.14 (x) + 3.14 for Group #1

1.14 is the "B" value for x and represents the weight multiplied by x. It is the slope of the line.
3.14 represents the intercept of the line. It is the "B" value for the intercept (which is also called the constant).

If a job applicant from “Group #1,” that is, an Asian-American job applicant, took the x test, we would predict his performance appraisal score using this regression equation. If we had several applicants and one opening, we could give them all the test, use the equation to calculate predicted performance, and then hire the one with the best predicted performance appraisal score.

8

Note that your printouts will look somewhat different in format from what is shown above. Specifically, they will show specific models (e.g., Model 1 might contain X3, and Model 2 might contain, say, X1 and X3) in a “box tables” format. See the instructions for Part II if you don’t recall how to interpret such printouts.

Now, look at your data. How large is your R? Is it different from zero? Does it explain a large percentage of the variation in y?
What is your regression equation? Now, repeat this procedure for White applicants. Don't forget to first use Data... Select Cases... to select just the White cases (RACE/ETHNICITY = 2). In order to know whether you have validity for each group, look at the correlation (the r-value) and accompanying p values for the Asian-American and the White applicants. To determine whether your test is fair using this model, answer the following questions: Is the r for Asian-Americans valid -- different from zero (p < .05)? •If "yes" then you can use the selection method (test) with applicants from this group. •If "no" then it is not fair to use the method with applicants from this group. Is the r for Whites valid -- different from zero (p < .05)? •If "yes" then you can use the selection method (test) with applicants from this group. •If "no" then it is not fair to use the method with applicants from this group. Are both correlations valid? That is, is the validity different from zero for BOTH groups? •If your answer is "yes," then it is fair to use with both groups. •If "it is only valid for one group" then you have Single-Group Validity and the test can only be used with that particular group. •If "it is not valid for either group" then you have a poor selection system and it should not be used at all. Tell that test salesman to give you your money back! ***After completing these analyses, please skip ahead to page 11. *** 9 If you recommended MORE THAN ONE predictor at the end of your PART II report... Select one of the demographic subgroups (Asian-Americans) using the Data… Select Cases… dropdown menus. 
Click on Analyze… Regression… In the "Linear Regression" Dialogue Box, you will see the following:

[Figure: the "Linear Regression" dialogue box. The variable list shows id, x1, x2, x3 (or the variable names), y, and ypredict. The Dependent Variable and Independent Variable(s) boxes are empty, "Block 1 of 1" is shown, Method is set to Enter, and the buttons are [OK], [Paste], [Reset], [Cancel], [Help], [WLS>>], [Statistics], [Plot], [Save], and [Options].]

Highlight Y and click on the [>] to move it into the Dependent Variable box. Select y-predict as your Independent Variable; there will only be one “Block” in this analysis.

Next, click on the [ Statistics ] button. From this, select [x] Estimates, [x] Descriptive statistics, and [x] Model Fit, then click on [Continue]. You are now ready to run the regression for your predictor. Click on [ OK ]. You will see a printout containing this type of information (only with different numbers for your printout):

What it is: Multiple R .70
What it means: This is like a correlation coefficient and varies from .00 (no relationship) to 1.00 (a perfect linear relationship).

What it is: R Squared .49
What it means: This means that y-predict is explaining 49% of the variation in y. If the predictor were perfectly related to y, it would explain 100%.

What it is: Adjusted R Square .47
What it means: This says "if you were to test this predictor on a second sample, what is the estimate of the R Square?"

What it is: Standard Error 1.09
What it means: This is like a standard deviation, only it is for the regression line. Imagine a regression line (the line that best fits the data) with two parallel lines near it (one above and one below the regression line) that are 1.09 y-units away. Such lines would tell you where most scores will fall if you plot y-predict against y. You hope the Standard Error is small.

What it is: Analysis of Variance: F = 17.92, Sig. F = .001
What it means: This tells us that there is, indeed, a significant relationship between y and y-predict. The probability (significance or "sig.") is < .05.
The T-value below tells you the same thing about your predictor (y-predict).

10

                   UNSTANDARDIZED     STANDARDIZED
MODEL              B       SE B       BETA          T       SIG. T
1  y-predict       1.14    .11        .70           8.96    .0001
   (constant)      3.14    .04
------------------------------------------------------------------
Footnotes:
Predictor(s) [in Model 1] is/are: y-predict
Dependent Variable = Overall job performance measure (Y)
R .40    R-squared 16%

What does this mean? It means that the equation for the regression line in the model is:

predicted y = 1.14 (ypredict) + 3.14 for this group of Applicants

1.14 is the "B" value for y-predict and represents the weight multiplied by y-predict. It is the slope of the line.
3.14 represents the intercept of the line. It is the "B" value for the intercept (which is also called the constant). It can be positive or negative.

Now, look at your data. How large is your R? Is it different from zero? Does it explain a large percentage of the variation in y? What is your regression equation?

Now, repeat this entire procedure for White applicants. Don't forget to first use Data... Select Cases... to select just the White cases (RACE/ETHNIC Group = 2).

In order to know whether you have validity for each group, look at the R value and accompanying p values for the Asian-American and the White applicants. To determine whether your test is fair using this model, answer the following questions:

Is the R for Asian-Americans valid -- different from zero (p < .05)?
•If "yes" then you can use the selection method (test) with applicants from this group.
•If "no" then it is not fair to use the method with applicants from this group.

Is the R for Whites valid -- different from zero (p < .05)?
•If "yes" then you can use the selection method (test) with applicants from this group.
•If "no" then it is not fair to use the method with applicants from this group.

Are both Rs valid? That is, is the validity different from zero for BOTH groups?
•If your answer is "yes," then it is fair to use with both groups.
•If "it is only valid for one group" then you have Single-Group Validity and the test can only be used with that particular group.
•If "it is not valid for either group" then you have a poor selection system and it should not be used at all. Tell that test salesman to give you your money back!

11

Section C: Do you have “Differential Validity”?

In order to assess this, you will use "Fisher's r → z’ statistic." This data transformation allows you to determine precisely whether one correlation is significantly larger than another. This procedure is described in detail below:

How to compare two correlation coefficients, using Fisher’s r → z’ statistic, in order to determine whether you have “differential validity”:

Because Pearson correlation coefficients (r) are not “normally distributed,” you cannot compare two samples’ r values by using conventional statistics such as t tests. You must first transform the values of the two rs so that they follow a normal distribution. This is accomplished by using Fisher’s r → z’ transformation.

Step 1: Convert the correlation coefficients into the closest z’ scores using Table 1 (see next page).

Step 2: Compare the two z’ values by using the following formula. This formula will yield a standard normal distribution (Z, not to be confused with z’).

\[
Z = \frac{z'_a - z'_b}{\sqrt{\dfrac{1}{n_a - 3} + \dfrac{1}{n_b - 3}}}
\]

where z’ for group “a” belongs to the group with the larger correlation, z’ for group “b” belongs to the group with the smaller correlation, and n = the number of applicants from that particular demographic group.

Step 3: After taking the square root and finding the Z value, look in a z-table (such as that found in the back of most statistics books) to see if the value is significant. For this exercise, we will use the cutoffs found in Table 2 here (see next page).
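Step 2 is easy to get wrong on a calculator (the square root is the usual casualty), so here is a minimal Python sketch of that step only. It assumes you have already looked up the two z’ values in Table 1; the function name and the sample inputs are hypothetical, and the 1.96 cutoff is the two-tailed .05 threshold.

```python
import math


def fisher_z_statistic(z_prime_a, n_a, z_prime_b, n_b):
    """Z statistic for the difference between two Fisher z' values.

    z_prime_a / z_prime_b: z' values looked up in Table 1 for each group.
    n_a / n_b: number of applicants in each group (each must exceed 3).
    """
    if n_a <= 3 or n_b <= 3:
        raise ValueError("each group needs more than 3 applicants")
    standard_error = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    # abs() plays the role of "larger correlation minus smaller correlation".
    return abs(z_prime_a - z_prime_b) / standard_error


# Hypothetical inputs: r = .50 (z' = .549) for 40 applicants in one group,
# r = .30 (z' = .310) for 30 applicants in the other.
z_value = fisher_z_statistic(0.549, 40, 0.310, 30)
verdict = "significant" if z_value >= 1.96 else "not significant"
print(f"Z = {z_value:.3f} ({verdict} at the two-tailed .05 level)")
```

Passing the z’ values rather than the raw correlations keeps the sketch parallel to the hand procedure: Table 1 first, then the formula.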
If your R value is between two values, you can interpolate by using the average of the two appropriate z’ values (e.g., R = .03; z’ = (.020 + .040)/2 = .030).

12

Table 1: r → z’

R (or r)  z’       R (or r)  z’       R (or r)  z’       R (or r)  z’
.00       .000     .31       .321     .61       .709     .91       1.531
.01       .010     .32       .332     .62       .725     .92       1.589
.02       .020     .33       .343     .63       .742     .93       1.664
.03       .030     .34       .354     .64       .758     .94       1.738
.04       .040     .35       .366     .65       .776     .95       1.842
.05       .050     .36       .377     .66       .793     .96       1.946
.06       .060     .37       .389     .67       .811     .97       2.122
.07       .070     .38       .400     .68       .829     .98       2.298
.08       .080     .39       .412     .69       .848     .99       2.649
.09       .090     .40       .424     .70       .867     1.00      3.000
.10       .100     .41       .436     .71       .888
.11       .111     .42       .448     .72       .908
.12       .121     .43       .460     .73       .929
.13       .131     .44       .472     .74       .950
.14       .141     .45       .485     .75       .973
.15       .151     .46       .497     .76       .996
.16       .161     .47       .510     .77       1.021
.17       .172     .48       .523     .78       1.045
.18       .182     .49       .536     .79       1.072
.19       .193     .50       .549     .80       1.099
.20       .203     .51       .563     .81       1.128
.21       .214     .52       .576     .82       1.157
.22       .224     .53       .590     .83       1.189
.23       .235     .54       .604     .84       1.221
.24       .245     .55       .619     .85       1.257
.25       .256     .56       .633     .86       1.293
.26       .266     .57       .648     .87       1.335
.27       .277     .58       .662     .88       1.376
.28       .288     .59       .678     .89       1.424
.29       .299     .60       .693     .90       1.472
.30       .310

Table 2: Significance Thresholds for Z values

Two-tailed                                 One-tailed
Probability   Z value needed               Probability   Z value needed
Level         to reach significance        Level         to reach significance
.05           1.96 or larger*              .05           1.65 or larger
.01           2.55 or larger               .01           2.32 or larger
.001          3.50 or larger               .001          3.10 or larger

* For your project use the 1.96 threshold.

Note: Generally, use the two-tailed probability when you have NO hypothesis that the test should be more valid for one group than another, based on the published literature. If you DO have a hypothesis that the test should be more valid for one group than another, based on the published literature, use the one-tailed probability levels.
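If you would rather compute a z’ value than look it up, Fisher's transformation has a closed form: z’ = ½ · ln((1 + r) / (1 − r)). The Python sketch below applies that formula to a few r values; the function name is hypothetical. Table 1 is printed to three decimals, so a computed value can differ from the tabled one in the last digit for some entries, and the formula is undefined at r = 1.00.

```python
import math


def r_to_z_prime(r):
    """Fisher's r -> z' transformation: 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


# Spot-check a few correlations against Table 1.
for r in (0.12, 0.40, 0.66, 0.90):
    print(f"r = {r:.2f}  ->  z' = {r_to_z_prime(r):.3f}")
```

For your report, stay with Table 1 unless told otherwise, so that your z’ values match the ones used in this exercise.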
13

**If the Z-value is “significant” (e.g., larger than 1.96) then that means the test is unfair because it is “significantly more valid” for one group than another.
**If the Z-value is smaller than the threshold value (e.g., suppose the Z-value was 1.59) then that means the test is fair because it is equally valid for both groups.
**So, you really hope to find a SMALL Z-value so that you cannot conclude the test is unfair.

Just to make sure that you understand how Fisher’s r → z’ transformation works, here’s an example:

rxy for 100 applicants from “group a” (e.g., Anglos) = .66.
rxy for 25 applicants from “group b” (e.g., Hispanics) = .12.

Before going any further, ask yourself: Does the correlation for one group LOOK larger than the correlation for the other group? (In this example, it is pretty obvious that it does!)

Step 1: Convert the correlation coefficients into z’ scores using Table 1 (see previous page).
What is the z’ value for group a (Anglos)? ______
What is the z’ value for group b (Hispanics)? ______

Step 2: Compare the two z’ values by using the following formula. This formula will yield a standard normal distribution (Z, not to be confused with z’).

\[
Z = \frac{z'_a - z'_b}{\sqrt{\dfrac{1}{n_a - 3} + \dfrac{1}{n_b - 3}}}
\]

where z’ for group “a” belongs to the group with the larger correlation, z’ for group “b” belongs to the group with the smaller correlation, and n = the number of applicants from that particular demographic group.

“Plug in” the numbers from the problem above into this formula. What Z value did you get? (Use the space below to do your work with this example problem; the answer is on the next page.)

14

\[
Z = \frac{z'_a - z'_b}{\sqrt{\dfrac{1}{n_a - 3} + \dfrac{1}{n_b - 3}}}
\]

where n = the number of applicants from that particular demographic group.
\[
\begin{aligned}
Z &= \frac{.793 - .121}{\sqrt{\dfrac{1}{100 - 3} + \dfrac{1}{25 - 3}}} \\
  &= \frac{.672}{\sqrt{\dfrac{1}{97} + \dfrac{1}{22}}} \\
  &= \frac{.672}{\sqrt{.0103 + .0455}} \\
  &= \frac{.672}{\sqrt{.0558}} \\
  &= \frac{.672}{.2361} \\
  &= 2.845
\end{aligned}
\]

Step 3: After taking the square root and finding the Z value, look in the z-table cutoffs (Table 2 on the previous page). Since we have no a priori hypothesis that the test is more valid for one group than the other, we will use the two-tailed probabilities and the p = .01 threshold. Is there a significant difference in the validities? (Yes, Z = 2.845 is larger than 2.55, so p < .01.) The test is unfair to Hispanic applicants.

Based on the validities, you know that you can predict job performance well for Anglo applicants (r² = 43.6%; that is, 43.6% of the variation in job performance is explained by the test), but not well at all for Hispanic applicants (r² = 1.44%). To use the test to predict Hispanic applicant performance scores would be little better than rolling dice.

Please try this analysis for your SPSS data!

15

SECTION D: THE REGRESSION (CLEARY) MODEL

Selecting Everyone:

Do this exercise for the predictors (X-variables) identified in Part II. For this portion of the exercise, you want to use everyone’s data. If you are continuing the exercise from the earlier sections (where you selected each demographic group and analyzed the data separately), then you need to select all of the cases. To do this, select Data from the main menu. Then Select Cases...
This opens the "Select Cases" Dialogue Box: Variable Names: Age ID : : : Select: ( ) All Cases ( ) If Condition is satisfied: [ If ] ( ) Random Sample of Cases [ Sample ] ( ) Based on Time or Case Range: ( ) Use Filter Variable Unselected cases are: (•) Filtered ( ) Deleted If they are not already selected, please click on these options: Select: (•) All Cases Unselected cases are: (•) Filtered The easiest way to see if your overall regression equation fits both subgroups is to plot the data for the two groups. Regardless of whether you recommended one variable or more than one in Part II, you will need to plot your results for each subgroup. Such scatter plots can really help you understand your data. SPSS keeps changing the way it does graphs, which makes it very hard to write instructions that consistently work! On the following pages, you will find instructions for two different methods of creating graphs; I don’t guarantee that every method works with every version of SPSS. Try your best to get plots with the subgroup regression equations. If worst comes to worst, you can always plot the regression lines by hand because you know the regression equations for both subgroups and for the overall sample from your regression printouts. From the “horizontal menu” at the top of the screen, select Graphs. You should use one of three options: *(Interactive) Chart Builder (pg. 16) – Use one of the other two methods; this doesn’t easily work correctly when it comes to plotting the regression lines for the subgroups. *Legacy Dialogues (pg. 20) – Personally, I found THIS method easiest to use with SPSS. Works smoothly. *Regression Plots (pg. 23) – This is easier to use than the Interactive Chart Builder too. Works smoothly. You can create graphs from any of these. You only have to graph your results using ONE method. 16 Graphing Method #1: The Interactive Chart Builder From the horizontal menu at the top of the screen select Graphs… and Chart Builder. 
You may see an introductory dialog box with the options [OK] or [Define your variable properties]. Select [OK]. The main Chart Builder box will appear. Within the “Gallery” tab select the following: Scatter/Dot. If you hold the mouse over the boxes, you’ll find one called “Grouped Scatter” that has both blue and green circles in the box. Next, click on the box | 0 | 0 | o |o_______ that reflects “Grouped Scatter” then ‘drag’ that box up to the main chart area. Using the “virtual hand” (or cursor) drag the Y variable into the “Y-Axis” box. Drag either Y-Predict or your chosen X variable into the X-axis box. Drag the appropriate demographic group variable name to the “Set Color” (or “Set Symbol”) box. Then select [OK]. Chart Builder Age Variables ID Set Color: Y-Axis Race Sex X1 X2 X3 Y X-Axis Bar Gallery Line Choose From: Area Pie Scatter/Dot Histogram High-Low … Gallery Basic Elements Group/Point ID Titles/Footnotes Element Properties Options [Simple Scatter] [Group Scatter] [Simple 3-D Scatter] [Grouped 3-D Scatter] [Summary Point Plot] [Simple Dot Plot] [Scatter Plot Matrix] [Drop Line Plot] [OK] [Paste] [Reset] [Cancel] [Help] 17 You will see a graph with different symbols for the two demographic groups in your output. If you doubleclick on the graph in your output, you can edit it. Here are two different ways to edit the graph to add overall and subgroup regression lines. One of these may work: Method #1: If you look at the Chart Editor box, there are four rows at the top, above the graph. Of those four rows, the bottom row has several symbols (icons). One is: :.: .: : If you click on that, it will add the overall regression line. Another box with the line “Properties” should appear to the right of the Chart Editor box. If the line is not a linear line, select “linear” from the options and click “apply.” Another icon on that Chart Editor is: .:::. :::: : .:::: If you hold the cursor over it, it will say [‘Add Fit Line at subgroups’]. 
Click on it to add two more regression lines – one for each demographic group. (Note: A “Properties” dialogue box may also open – if you have selected a linear line, then you can just close that dialogue box.) If this worked properly, you should now have two different-colored symbols for your two demographic groups and you should have three regression lines: an overall line and one line for each ethnic group.

Method #2 [Frankly, this one gave me some trouble]: Click on Edit…Properties to bring up another dialogue box (the Properties box, illustrated below). Select the Variables tab (at the top of the box)… You will see that the first “element type” is called “Markers.” That is fine. What you want to do is to add a second element type – your regression lines for the groups.

Step 1: Look at the demographic variable (e.g., Asian-American). Does it say “Group” to the right? If it DOES, go to Step 2. If not, click on the area where “Group” should be. A drop-down menu will appear. Select “Group”. Note: It is important that the demographic group variable that you have chosen be set to “Group” so that you can get regression lines for each demographic group.

Step 2: In the row of buttons (near the bottom) labeled [Previous] [Delete] [New] [Next] select [New].

Step 3: Now, where it says ELEMENT TYPE (near the top of the dialog box): use the drop-down menu to select Fit Line. (If it asks what type of line, select Regression line from the options.)

18
Step 4: Go to the bottom row and click [Apply].
   Properties
   [Chart Size] [Variables]
   Element Type: [Fit Line ]
   Variables:
     Case Number
     Frequency
     Demographic Group #1           Group
     Demographic Group #2           Style: Border Color
     Percent
     X (or Y-predict)               X-axis
     Y = Overall job Performance    Y-axis
   [Previous] [Delete] [New] [Next]
   [Apply] [Cancel] [Help]

19
If you look at your graph now, you will see two lines (with different colors) appear for the two demographic groups you are graphing. Are the two lines close together or far apart?

But what about the OVERALL regression line? To add a third element type (the overall regression line for the entire sample), you will repeat the steps from above, but first exclude the demographic grouping variable.

Step 1: In the row of buttons (near the bottom) labeled [Previous] [Delete] [New] [Next] select [New].

Step 2: Now, where it says ELEMENT TYPE (near the top of the dialog box): use the drop-down menu to select Fit Line. (If it asks what type of line, select Regression line from the options.)

Step 3: Where the demographic variable still says the word “Group,” click on the word “Group.” A drop-down menu will appear. Select “Exclude” from that menu.

Step 4: Go to the bottom row and click [Apply]. You should now see a regression line for the entire sample.

   Properties
   [Chart Size] [Variables]
   Element Type: [Fit Line ]
   Variables:
     Case number
     Frequency
     Demographic Group #1           [drop-down open: Exclude / Group / Style: Border Color / X-axis / Y-axis / Z-axis / Matrix / Slice]
     Demographic Group #2
     Percent
     X (or Y-predict)               X-Axis
     Y = Overall job Performance    Y-Axis
   [Previous] [Delete] [New] [Next]
   [Apply] [Cancel] [Help]

Are the three lines close together near the right edge of the graph? If so, perhaps you can use the overall regression line when hiring new employees.
If the demographic group lines are far apart from each other (and far apart from the overall regression line) in that part of the graph, perhaps it would be more accurate to use the separate group regression lines to predict job performance. When finished, please skip ahead to page 24.

20
Graphing Method #2: Legacy Dialogues

If you are using “Legacy” Graphs for SPSS: These instructions assume you are using a color printer. If you are using a B & W printer, you may want to set up SPSS so that the subgroups are indicated by different symbols and each group’s regression line uses a different style of dashes from the overall regression line. These changes are useful if you plan to photocopy your report using a B & W photocopier. See the handout entitled “Distinguishing Groups using a Black & White Printer” on D2L (and/or found at my personal website, http://websites.uwlax.edu/wross/mgt486.htm ) if interested.

Plotting your Results

Different versions of SPSS Legacy Dialogues “do graphs” differently. The following instructions apply to several recent versions: From the main menu, select Graphs. Then select Scatter/Dot... You will see a dialogue box with the following options:

   [x] Simple   [ ] Overlay   [ ] Matrix   [ ] 3-D   [ ] Dot
   [ Define ]   [ Cancel ]   [ Help ]

Note that your dialogue box may have pictures of the types of graphs with a double-lined border around the picture that is currently selected. If Simple is not currently selected, please click on that word or picture or type an x in the appropriate box, depending upon your version of SPSS for Windows. Then click on the [ Define ] button. You will see a box that looks like this:

   Variable Names:
     ID                [>] Y-Axis:          [        ]
     X1                [>] X-Axis:          [        ]
     X2                [>] Set Markers By:  [        ]
     X3                [>] Label Cases By:  [        ]
     Y
     Y-Predict
     Age Group
     Sex Group
     Racial Group
     Filter

   [OK] [Paste] [Reset] [Cancel] [Help]        [Titles] [Options]

Select Y as your Y-Axis Variable.
For your X-Axis Variable, select your chosen predictor (X1, X2, or X3, or whatever you renamed them, IF you picked ONE predictor; YPREDICT if you used more than one). In the Set Markers By: [ ] box, please select the demographic variable that you are using (age or sex or whatever groups YOU are comparing). Leave the "Label Cases by [ ]" option blank.

21
Adding Titles & Footnotes

Click [Titles] and add a title to your graph. You will see two lines where you can add titles. What variables are you plotting? What subgroups are you comparing? Your answers to these questions should give you clues as to what you could type in your title. For your subtitle, type your name(s). If you know it, you can write out your overall regression equation (and/or your two subgroup equations if you know them) in the footnotes. However, you don’t have to. For example (use YOUR equations for your groups!):

   Footnote 1 [ y' Anglos = 3.34 + .03(x) ]
   Footnote 2 [ y' Hispanics = 3.21 + .04(x) ]

Click [OK] to plot the graph.

Editing your Graph

You will need to edit your graph. Double-click on the EDGE of the graph. The chart will reappear in a “Chart Editor” window with a different horizontal menu above the chart:

   File   Edit   View   Options   Elements   Transform   Help

Adding an overall regression line: For your graph, you will want to plot both the regression line for the total sample (equivalent to the line you computed from Part II) and the regression lines for each of the subgroups. Why? If the two subgroups' lines are very close to the total sample line, then you can probably justify using the total sample line. If the two lines are not close to the total sample line (either because of a very different intercept or because of very different slopes), then that tells you that the overall line is inappropriate.

You can do this one of two ways. The easy way is to look at the horizontal menu within the Chart Editor (Remember? Double-click on the edge of the graph in your output to bring up the Chart Editor).
There are three rows of menu items. You will see a symbol near the right on the bottom menu row that looks like this:   :.: .: :   If you hold the cursor over it, it will say [‘Fit Line at total’]. Click on it to add an overall regression line. [Note: A “Properties” dialogue box may also open – click the “fit line” upper tab. Then for the type of line (“fit method”), select a linear line rather than, say, a “quadratic” line or a Loess line, and then close that dialogue box.]

The less easy way to do this is to click on Elements and then select Fit Line at Total from the drop-down menu.

Another possibility is this: After you plot both subgroup lines, click to the right of Asian-American and White on the “Properties” menu of the Chart Editor. Then select Exclude and hit Apply. SPSS will then ignore the two subgroups and apply the ‘line of best fit’ to the whole graph.

22
Adding demographic group regression lines: These graphing instructions worked with some versions of SPSS. When I tried this with SPSS version 15, it should have worked, but the two subgroup lines, while circled on the screen, were invisible! If it doesn’t work for you, you can print the graph and draw the lines using a ruler. By playing with it (trial & error) I finally got the lines to appear in color, but not dashed. So don’t get anxious over it if it doesn’t work…

You can add two more regression lines in one of two ways. The easy way is to look at the bottom row in the horizontal menu within the Chart Editor. You will see a symbol near the right on the third row of the menu that looks like this:   .:::. :::: : .::::   If you hold the cursor over it, it will say [‘Add Fit Line at subgroups’]. Click on it to add two more regression lines – one for each demographic group. (Note: A “Properties” dialogue box may also open – if you have selected a linear line, then you can just close that dialogue box.)
The less easy way to do this is to click on Elements and then select Fit Line at Subgroups from the drop-down menu. If necessary, make sure the lines are linear. Note that in addition to seeing the two regression lines, you may also see the R-squared value for each subgroup. Feel free to click on the R-squared value and edit the label to make it clear which group has which R-squared value if you wish. Once your plots look correct, skip ahead to page 24.

23
Graphing Method #3: Regression Plots

From the top (horizontal) menu, select Graphs…Regression Variable Plots… This will bring up a dialog box. Move your Y variable to the Vertical-Axis box using the appropriate [>] button. Move your X variable (or Y-predict if using more than one predictor) to the Horizontal-Axis box. Move your demographic variable (e.g., Race) to the Shapes by: box. When your output appears, double-click on the graph to open the Chart Editor box.

If you look at the Chart Editor box, there are four rows at the top, above the graph. Of those four rows, the bottom row has several symbols (icons). One is:   :.: .: :   If you click on that, it will add the overall regression line. Another box with the title “Properties” should appear to the right of the Chart Editor box. If the fit line is not linear, select “linear” from the options and click “apply.” Another icon on that Chart Editor is:   .:::. :::: : .::::   If you hold the cursor over it, it will say [‘Add Fit Line at subgroups’]. Click on it to add two more regression lines – one for each demographic group. (Note: A “Properties” dialogue box may also open – if you have selected a linear line, then you can just close that dialogue box; if not, change it to “Linear”.) If this worked properly, you should now have two different-styled symbols for your two demographic groups and you should have three regression lines: an overall line and one line for each ethnic group.
The only problem with this is that the overall line is a solid line and so is one of the lines for one of the ethnic groups (the other line is dashed). You can change one of these solid lines to a different line style by selecting “lines” in the “Properties” dialog box and then, under “Style,” clicking the drop-down arrow and selecting a style (e.g., dots ………) that is different from either of the other two lines. Click [Apply]. Now you should have three different lines with three different styles, which makes it easy to tell the lines apart if you are using a black & white printer.

You will also notice that this method gives you a box above and to the side of the graph. It looks like:

   |--------[////////|\\\\\\\]-------|
                     ^

What is that all about? The end markers show the range of scores |------------|. The shaded box shows the range between the first and third quartiles. The line near the middle of the shaded box (in this example, above the ^ symbol) shows the median score. You may (or may not) find all of that information interesting and useful. Proceed to the next page (pg. 24), please.

24
Adding A Horizontal Reference Line: Sometimes we want to see how many members of each group can successfully perform the job. {Quick, what is the minimum score on y that indicates job success?} Double-click on the EDGE of the graph. The chart will reappear in a “Chart Editor” window.
Within the Chart Editor, on the second row of the horizontal menu, you will see an icon for adding a Y-axis reference line. {Or you can click on either the [Elements] option or the [Options] option (depending on which version of SPSS you are using) and select Y-Axis Reference Line from a drop-down menu.} You will see a “Properties” box with the following tabs:

   Chart Size | Lines | Reference Line

The Reference Line screen may look like either of the following (depending on your version of SPSS):

   Scale Axis version:
     Axis Title: Y: Overall Job Performance
     Y-Axis Line Position: [ 4 ]
     Minimum: 1     Maximum: 7

   Category Axis version:
     Variable: Y: Overall performance
     Position: [ 1 “Quite poor perf.” ▼ ]
     (click on the ▼ and select the value that is as close as you can get to 3.411)

Change the “Y-Axis Line Position” value to the minimum level of acceptable performance that you were told when you completed Part II of your project (e.g., 3.411). Click Apply and Close. You should now see a horizontal line across your graph. If you bother to count the points above the line, you will see that the count equals the number you used to compute your Base Rate from Part II.

Without counting everything, it is instructive to see if, for example, one demographic group is disproportionately below the line. That is useful knowledge because if, for example, left-handed people can’t do the job, then a selection system that shows “adverse impact” against left-handed people isn’t really so bad, is it? It is simply telling you not to hire people who can’t do the job anyway. On the other hand, if both left-handed and right-handed people can do the job in similar proportions and your selection system only results in right-handed people being hired, then you may have problems with a selection system that is biased against left-handed people.

Adding a Vertical Reference Line: Sometimes we also want to see how many members of each group would have been hired using the common cutoff score.
Do the same steps as for the horizontal reference line, only this time, insert an X-Axis Reference Line. Within the Chart Editor, on the second row of the horizontal menu, you will see an icon for adding an X-axis reference line. {Or you can click on the [Elements] option and select X-Axis Reference Line from a drop-down menu.} For the Position of the Line: [ ] box, please type ??? (the number that was between the fourteenth-highest and fifteenth-highest predictor scores; i.e., the Y-predict score or the X score that you selected for your overall hiring cutoff score when you completed PART II of the project). Next, click on the [ Add ] button and then click on the [ OK ] button.

25
Interpreting your graphs: Once you have graphed the regression lines for each subgroup, compare them.

Are the slopes similar for each of the two subgroups?
Are the slopes similar to those of the Overall (entire sample) regression line?
Are the y-intercepts for the two subgroups similar (within, say, 2.0 of each other on y)?
Stated differently, if you were to use the regression formula (that you derived for the entire sample) that employed (all of) your predictors, would it be fair to both groups, given that you will be hiring from the right-hand part of the graph?

If the answer is "yes" to all of these questions, then it is probably OK to use the overall regression line. The test(s) is/are fair using the Cleary (Regression) model of test fairness. Go on to the next page.

If the answer to any one of these is "no," then the overall line is not fair, and if you plan to use this predictor(s), then you are safer recommending that you use the separate regression lines for each group.

Note: If you concluded on pages 8-12 that you need to use different predictors for each group, you may decide that you want to re-run these Cleary analyses using those predictors.
(Re-running the Cleary analyses in that way is for extra credit.)

26
SECTION E: DECISION ACCURACY STATISTICS FOR EACH GROUP AND ADVERSE IMPACT STATISTICS

Remember the decision accuracy statistics you computed for Part II of this exercise? Now, compute decision accuracy statistics for each racial group. The raw numbers for these statistics can be obtained in one of three ways:

(1) Since you have a small sample, you can count the number of cases that fall within each of the four quadrants on your scatterplot. From this, you can determine your decision accuracy statistics (e.g., among Asian-American applicants, the percent hired who would be successful).

(2) You can print the data and look at each racial group’s OKWorker and Hire scores (these variables were created in Part II). From these, you can determine how many members from each racial group are in each of the four quadrants. This helps when it is hard to tell visually whether points that are close together fall on one side of the cutoff line or the other.

(3) You can use Data... Select Cases... to select just one group. Then you can use Analyze/Statistics... Descriptives... Crosstabs... to compare OKWorker and Hire for that group (with OKWorker in the rows and Hire in the columns). This assumes you saved the data set with these variables included. If you don't remember how to do a crosstabs, please consult your instructions from Part II. This is my preferred method of the three and is best if you have very large data sets. After completing your first crosstabs, please select the other racial/ethnic subgroup using the Data... Select Cases... procedure and then compute the crosstabs for the second group.

When you have finished, you should have the frequency counts for your 2 x 2 table upon which you can base your decision accuracy statistics. You can "check your work" in this way: If you add up the total number of people hired from the two 2 x 2 tables, it should equal 14 people. You don’t have to compute all of those good statistics that you computed for Part II.
Just compute “hit rate,” “success rate,” and “Selection Ratio” for each demographic group separately.

Testing for Adverse Impact (AI) using “Flow” analysis: Does the 2 x 2 table suggest that the test shows adverse impact? That is, is a larger proportion of applicants from one racial/ethnic group hired (“flowing into the organization”) than from the other racial/ethnic group? Remember that adverse impact, by itself, does not prove the test is biased, and it is possible to continue using a test that produces adverse impact. However, adverse impact does indicate that you need to ensure validity and fairness (using one of the above models). In other words, we are checking for an "apparent problem." Ideally, you would do this part first, but because of the nature of the computer exercise, it is easier to do this after doing the other computer work. You might report this early in the “Results” section of your report, though.

How do you check for Adverse Impact (AI)?

•You need to find the selection ratio for White applicants. This is:
     the no. of White applicants hired with your new system ÷ the total no. of White applicants.

•You also need to determine the selection ratio for Asian-American applicants. This is:
     ??? ÷ the total no. of Asian-American job applicants.

27
Note that for this project, you are assuming throughout that future samples of applicants will be very similar demographically to the current sample.

•Then you need to take the "ratio of the ratios." For example, suppose I was concerned about discrimination against Jewish applicants. If I hired 6 Jews and 14 Gentiles for 20 vacancies, and I had 25 Jews and 35 Gentiles who applied, then I would compute this ratio:

      6/25       .24
     -------  = -----  = .60
      14/35      .40

The critical value here is whether the final number is above or below .80. This "80% rule" is the threshold the Equal Employment Opportunity Commission (EEOC) uses to determine whether adverse impact occurs. Because .60 is below .80, this hypothetical example would show adverse impact against Jewish applicants.
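The "ratio of the ratios" arithmetic above can also be checked with a few lines of code. The Python sketch below is only an illustration (Python is not part of this exercise, and the function names `selection_ratio` and `impact_ratio` are made up for this example). It reproduces the hypothetical 6-of-25 versus 14-of-35 example and then flips the two groups, which is the same calculation with the other group's ratio on top.

```python
def selection_ratio(hired: int, applicants: int) -> float:
    """Proportion of a group's applicants who were hired."""
    if applicants <= 0:
        raise ValueError("applicants must be a positive count")
    return hired / applicants


def impact_ratio(focal_hired: int, focal_applicants: int,
                 reference_hired: int, reference_applicants: int) -> float:
    """Focal group's selection ratio divided by the reference group's.

    Assumes the reference group had at least one hire; otherwise the
    division below is undefined.
    """
    return (selection_ratio(focal_hired, focal_applicants)
            / selection_ratio(reference_hired, reference_applicants))


# Hypothetical example from the text: 6 of 25 versus 14 of 35.
ratio = impact_ratio(6, 25, 14, 35)
print(round(ratio, 2), "-> below .80?", ratio < 0.80)

# Same numbers with the groups swapped (other group's ratio on top).
reverse = impact_ratio(14, 35, 6, 25)
print(round(reverse, 2), "-> below .80?", reverse < 0.80)
```

To use it for your own report, substitute your hired and applicant counts for each group from your two 2 x 2 tables.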
It is possible to also check for "reverse discrimination" in this example by putting the Gentile ratio on top and the Jewish ratio on the bottom of the equation. Now you try it for Asian-American and White applicants. After completing Part III of this project (so far), be sure to think about the following questions. Some of them may be relevant to your “Discussion.” -Do your Part II recommendations change? -Is your predictor(s) acceptable, with little Adverse Impact, using the “Underutilization” approach? How might the geographic area that was chosen affect one’s conclusions? -Is your predictor(s) fair and valid for both groups? -Does your system show Adverse Impact using the “80% Rule” approach? -What else might be done to improve the fairness of this – or any – selection system? 28 SECTION F: COMPARING PC-BASED TESTING WITH iPAD APP TESTING Recall that X3 was administered by either a PC or by iPad. In Part II you checked to see if the mean scores were different for the two types of test administration. Now you want to see if the two types of testing are fair to all groups. We could use several fairness models, but to keep this simple, we are going to use the single-group validity model: That is, we want to see if the two types of tests are similarly valid for each of the racial groups. You can report the differential validity model and/or the Cleary (regression) model for extra credit if you wish (but I’m not going to tell you how – use the earlier pages of these instructions as your guide; up to 4% per model). You’ll be selecting four different groups: 1. Using the Data…Select Cases...If Condition is Satisfied [IF]… select just those who took the test on a PC (App =1) who were Asian-Americans (Race = 1). This is done by selecting App in the variable list and clicking the [>] button to move it into the appropriate box. Type “=1 & ”. Next, select Race from the variable list and move it into the appropriate box. Type “=1”. 
When completed you should see, “App=1 & Race = 1”. Filter out the remaining cases and select just that group. Select Analyze from the main horizontal menu. When the drop-down menu appears, select Correlate…Bivariate… Make sure that Pearson, Two-Tailed, and Flag Significant Correlations are all checked. “Move” (using the [>] button) X3 and Y into the “Variables to correlate” box. Click [Options]… Check Means & Standard Deviations, and Exclude Cases Pairwise. Click [Continue]...[OK]. What sort of correlation did you get between X3 and job performance (Y) for this demographic group who took the test via PC? Was it statistically significant? Was the sample size large or small? How does that impact your confidence in the findings? How does sample size impact whether the correlation was significant? Also, what were the mean and SD? 2. Select only Whites who took the X3 test via PC (“App=1 & Race = 2”). Repeat the Correlate…Bivariate… analysis. 3. Select only Asian-Americans who took the X3 test via mobile iPad app (“App=2 & Race = 1”). Repeat the Correlate…Bivariate… analysis. 4. Select only Whites who took the X3 test via mobile iPad app. Repeat the Correlate…Bivariate… analysis. Think about the questions asked earlier (this page) for each of the four groups. For which groups is X3 a valid predictor of job performance? Does that mean that X3 is (or is not) “fair” (and to whom)? Does that affect the conclusions you drew earlier regarding whether the selection system you recommended from Part II is fair? 29 Extra Credit: Testing Every Individual Predictor for Each Racial Group Finding the Single Best Predictor for each Group with Correlations The above analysis tells you whether the one predictor (X1, X2, or X3, or the linear combination of some of those expressed as Y-predict) that you recommended from Part II (based on the overall group) fits each of the subgroups too. 
Whether it does or it doesn’t, you may want to know whether any of your other individual predictors are “better” (e.g., more valid) for specific subgroups than the one variable you recommended. To investigate this possibility, do the following with all predictors and Y:

Step 1: Select one of the demographic subgroups using the Data… Select Cases… drop-down menus.
Step 2: Click Analyze… Correlate… Bivariate. Run your Pearson correlation analysis for X1, X2, X3, and Y.
Step 3: Select the other demographic subgroup using the Data… Select Cases… drop-down menus.
Step 4: Repeat Step 2.

Compare the correlations for the two groups. Are the largest correlations for one group (e.g., the correlation of X1 and Y for Asian-Americans) the same as the largest correlation coefficients for the other group? Or do you find that different variables are highly correlated with Y for the different groups (e.g., X1 for Asian-Americans and X3 for Whites)?

If you find that the same X variable is the largest for both groups (and shows no single-group or differential validity problems; hint: test for those, but rather is similarly valid for both groups), then ask yourself: Is this X variable “better” than the system I recommended from Part II? Is it the same variable that you recommended in Part II (if you recommended only one X variable)? If so, that gives you confidence that you picked the most fair variable of the three predictors, doesn’t it?

If you find that different predictors are better for the different groups (e.g., X1 for Asian-Americans and X3 for Whites), then you will have to make a tough decision before you make a recommendation: Do you want to give Asian-American applicants one type of predictor and White applicants a second type of predictor based on the validity statistics? Why or why not? Is there a second predictor that is “close” in magnitude of validity that shows no differential validity problems? What will you do?
30 Extra Credit: Are Some Combinations of Predictors Only Valid for Specific Demographic Groups? To investigate whether a combination of predictors is valid for each of the two groups separately: Step 1: Select one of your two demographic groups using the Data… Select Cases… dropdown menus Step 2: Click Analyze…Regression…Linear… Step 3: Select Y as your Dependent Variable. Recall that when you analyzed the data in Part II, you entered the X variables in blocks, based on their validity (largest to smallest). You used the “Enter” feature with three different “blocks.” This time, put all three predictors in the “Independent Variable(s)” box in one block and click on the drop-down menu where “Enter” appears. Select Stepwise. The “Stepwise” procedure allows the computer to select only the significant variables. It will exclude non-significant X variables. Step 4: Select the other demographic group using the Data… Select Cases… dropdown menus. Step 5: Repeat the Multiple Regression Analysis (Steps 2 & 3), again using the Stepwise procedure. Step 6: Look at the two sets of printouts. Did the computer pick the same predictor(s) for each group? If you find that the same combination of X variables are significant for both groups (and the Multiple R value shows no single-group or differential validity problems), and they are the ones that you recommended from Part II, then that gives you great confidence that what you recommended earlier is indeed the “best set” of predictors. They are valid and fair. This is what you hope for! If you find that different combinations of predictors are better for different groups (e.g., X1, X2, & X3 for Anglo employees and only X2 & X3 for Hispanic employees), then you will have to make a tough decision before you make a recommendation. 
Do you want to give Anglo applicants one set of predictors (in this example, all three predictors) and Hispanic applicants a separate set of predictors (in this example, just X2 & X3) based on the Multiple Regression validity statistics? Why or why not? Is there a “subset” of predictors (in this example, just X2 & X3) that is “close” in magnitude of validity for each group and that shows no differential validity problems? (In the above example, you would probably want to re-run the regression for Anglo subjects specifying the subset of variables you thought might work well for both groups, X2 & X3 in this example.) Generally, if you find that the same predictors do not work similarly well for the different demographic groups, what will you do? What will you recommend?

Other Extra Credit Ideas

1. Using the graphs… scatterplot… and chart editor… features, address the issue: Do quadratic lines (curves) explain more of the variance than linear lines (see p. 21)? Does that approach change your conclusions about the fairness of your system?

2. Repeat some or all of the analyses you did in Part III (underutilization, validities, Multiple Rs, 80% rule, etc.) for different demographic groups, analyzed separately. For example: Is your selection system fair to both men and women? Is your selection system fair to both older and younger applicants? Do these analyses suggest the same predictors? Or different predictors?

3. Read about and apply any or all of the following “alternative” Adverse Impact statistics to your data set: Upton’s Corrected Chi Square, the Reverse-one rule, and/or the 2SD rule. These are described in Collins, M. W., & Morris, S. B. (2008). Testing for adverse impact when sample size is small. Journal of Applied Psychology, 93 (2), 463-471.

4. Re-do your overall selection system fairness analyses using just those who took PC-based testing for X3 (it has a larger sample size than the iPad group). Is this system fair and valid?
Be sure to identify your extra credit printouts and to discuss the extra work in the body of your paper (otherwise, I might miss seeing your work)! (10% maximum)

31
Names ___________________________________________________

Equal Employment Opportunity Project Grading Rubric (2015): Checklist

Purpose/Overview
____ Mentioned job & racial groups
____ Mentioned three predictors briefly

Literature Review & Explanation & Logic; Rationale leads to your study
____ Did you cite ten sources?
____ Mention models of fairness
____ Relate lit review to type of job
____ Relate lit review to relevant demographic groups
____ Mention EEO laws or Uniform Guidelines
____ Mention relevant court cases
____ Mention Adverse Impact
____ Mention predictor 1 (CSAP) & fairness
____ Mention predictor 2 (WAB) & fairness
____ Mention predictor 3 (TST) & fairness
____ Mention PC vs. mobile app testing & fairness

Method
____ Geo. Adv. Imp. (underutilization) formula given
____ Mention single-group validity
____ Differential validity (r-z)
____ Cleary (Regression) model
____ Discuss “flow” analysis (80% Rule)

Results
____ Geographic (Stock/underutilization) Analysis
____ ____ Multiple Rs (for single-group validity)
____ Comparing Rs (differential validity)
____ Regression Lines examined
____ ____ Hit Rate for each Group
____ ____ Success Rate for each Group
____ Selection Ratios & Adverse Impact (Flow)
____ Validity comparisons of PC vs. mobile app testing

Discussion and Recommendations
____ Discusses results
____ Recommends selection system
____ Additional Policies related to EEO
____ Discusses limits of the study & Directions for future research

Reference Page:
____ Ten+ complete references

Correct Figures/Results/Printouts
____ Results are correct
____ Graphs appear correct

In an Appendix:
____ HR policy is included, clear, and reasonable

EXTRA CREDIT: Additional analyses, etc. (10% max)

   Sex (1% each):                     Age (1% each):
   ____ Single Group validity         ____ Single Group validity
   ____ Differential validity         ____ Differential validity
   ____ Regression model              ____ Regression model
   ____ Flow Analysis                 ____ Flow Analysis
   ____ Geo. (Stock)                  ____ Geo. (Stock)
   ____ Hit & Success Rate            ____ Hit & Success Rate

Other extra credit:
___ Used rxy to find best predictor for ea. gp.
___ Stepwise Mult R. for each group
___ Quadratic regression
___ Upton’s Corrected Chi-Square
___ Reverse-One Rule
___ 2SD adverse impact test
___ Differential validity or Cleary w. PC vs. App
___ Checked system fairness w. just PC X3 group in the sample.