Psychology 5130
Lecture 3
Summated Scale Construction

Psychological Scales

Scale: An instrument, usually a questionnaire, designed to measure a person's position on some dimension. Can be responded to by oneself or by others, yielding what is called self-report or other-report. Can assess ability, attitudes, or personality.

Just a few of the dimensions which are important: the Big 5 dimensions of Extroversion, Agreeableness, Conscientiousness, Stability, and Openness; Narcissism; Cognitive ability; Job satisfaction; Organizational commitment; Positive affectivity; Negative affectivity; Intent to leave; Job embeddedness; Depression; Integrity; Self-esteem; . . .

Do we have enough psychological scales? Here's what one author has said . . .

Focus here will be on summated scales to measure personality-related constructs.

Historical Perspective

Three types: Guttman, Thurstone, Likert (pronounced Likkert). Now, most scales are Likert.

P513 Lecture 3: Scale Construction - 1    3/16/2016

Likert Scale / Summated Scale

A set of statements regarding the construct being measured is presented to respondents. Respondents indicate the extent of agreement with each statement on a response scale consisting of from 2 to 11 alternatives. Each response is assigned a numeric value. The respondent's position on the dimension being measured, that is, the respondent's score, is the sum or mean of the values of the responses to the statements related to that dimension.

An example questionnaire from which 5 scale scores are extracted follows on the next two pages. It's the sample 50-item Big Five questionnaire taken from the web site of the International Personality Item Pool (IPIP) (http://ipip.ori.org/ipip/). The items on the web site have been modified so that each is a complete sentence. For example, item 1 on the web site is "Am the life of the party." Here it is "I am the life of the party." Even-numbered items have been shaded.
I have no evidence that such shading is beneficial. The IPIP web site recommends a 5-point response scale. I prefer a 7-point response scale. If you need a 50-item Big Five questionnaire, you may copy and use what follows.

Items are:
E: 1,6,11,16,21,26,31,36,41,46
A: 2,7,12,17,22,27,32,37,42,47
C: 3,8,13,18,23,28,33,38,43,48
S: 4,9,14,19,24,29,34,39,44,49
O: 5,10,15,20,25,30,35,40,45,50

Questionnaire                ID__________________________

Circle the number that represents how accurately the statement describes you.
7 = Completely Accurate
6 = Very Accurate
5 = Probably Accurate
4 = Sometimes Accurate, Sometimes Inaccurate
3 = Probably Inaccurate
2 = Very Inaccurate
1 = Completely Inaccurate

1. I am the life of the party. 1 2 3 4 5 6 7
2. I feel little concern for others. 1 2 3 4 5 6 7
3. I am always prepared. 1 2 3 4 5 6 7
4. I get stressed out easily. 1 2 3 4 5 6 7
5. I have a rich vocabulary. 1 2 3 4 5 6 7
6. I don't talk a lot. 1 2 3 4 5 6 7
7. I am interested in people. 1 2 3 4 5 6 7
8. I leave my belongings around. 1 2 3 4 5 6 7
9. I am relaxed most of the time. 1 2 3 4 5 6 7
10. I have difficulty understanding abstract ideas. 1 2 3 4 5 6 7
11. I feel comfortable around people. 1 2 3 4 5 6 7
12. I insult people. 1 2 3 4 5 6 7
13. I pay attention to details. 1 2 3 4 5 6 7
14. I worry about things. 1 2 3 4 5 6 7
15. I have a vivid imagination. 1 2 3 4 5 6 7
16. I keep in the background. 1 2 3 4 5 6 7
17. I sympathize with others' feelings. 1 2 3 4 5 6 7
18. I make a mess of things. 1 2 3 4 5 6 7
19. I seldom feel blue. 1 2 3 4 5 6 7
20. I am not interested in abstract ideas. 1 2 3 4 5 6 7
21. I start conversations. 1 2 3 4 5 6 7
22. I am not interested in other people's problems. 1 2 3 4 5 6 7
23. I get chores done right away. 1 2 3 4 5 6 7
24. I am easily disturbed. 1 2 3 4 5 6 7
25. I have excellent ideas.
1 2 3 4 5 6 7
26. I have little to say. 1 2 3 4 5 6 7
27. I have a soft heart. 1 2 3 4 5 6 7
28. I often forget to put things back in their proper place. 1 2 3 4 5 6 7
29. I get upset easily. 1 2 3 4 5 6 7
30. I do not have a good imagination. 1 2 3 4 5 6 7
31. I talk to a lot of different people at parties. 1 2 3 4 5 6 7
32. I am not really interested in others. 1 2 3 4 5 6 7
33. I like order. 1 2 3 4 5 6 7
34. I change my mood a lot. 1 2 3 4 5 6 7
35. I am quick to understand things. 1 2 3 4 5 6 7
36. I don't like to draw attention to myself. 1 2 3 4 5 6 7
37. I take time out for others. 1 2 3 4 5 6 7
38. I shirk my duties. 1 2 3 4 5 6 7
39. I have frequent mood swings. 1 2 3 4 5 6 7
40. I use difficult words. 1 2 3 4 5 6 7
41. I don't mind being the center of attention. 1 2 3 4 5 6 7
42. I feel others' emotions. 1 2 3 4 5 6 7
43. I follow a schedule. 1 2 3 4 5 6 7
44. I get irritated easily. 1 2 3 4 5 6 7
45. I spend time reflecting on things. 1 2 3 4 5 6 7
46. I am quiet around strangers. 1 2 3 4 5 6 7
47. I make people feel at ease. 1 2 3 4 5 6 7
48. I am exacting in my work. 1 2 3 4 5 6 7
49. I often feel blue. 1 2 3 4 5 6 7
50. I am full of ideas. 1 2 3 4 5 6 7

Example - Overall Job Satisfaction Scale from Michelle Hinton Watson's Thesis

The items in this scale are presented as questions. In other instances, they are presented as statements. If presented as statements, the responses would represent amount of agreement. If you need an overall job satisfaction scale, you may use this.

For each statement, please put a check ( ) in the space showing how you feel about the following aspects of your job.
This time, indicate how satisfied you are with the following things about your job.

(1) Very Dissatisfied (VD)
(2) Moderately Dissatisfied (MD)
(3) Slightly Dissatisfied (SD)
(4) Neither Satisfied nor Dissatisfied (N)
(5) Slightly Satisfied (SS)
(6) Moderately Satisfied (MS)
(7) Very Satisfied (VS)

                                                          VD MD SD  N SS MS VS
                                                           1  2  3  4  5  6  7
Overall Satisfaction
27. How satisfied do you feel with your chances for getting ahead in this organization?  ( ) ( ) ( ) ( ) ( ) ( ) ( )
32. All in all, how satisfied are you with the persons in your work group?  ( ) ( ) ( ) ( ) ( ) ( ) ( )
35. All in all, how satisfied are you with your supervisor?  ( ) ( ) ( ) ( ) ( ) ( ) ( )
37. All in all, how satisfied are you with this organization, compared to most others?  ( ) ( ) ( ) ( ) ( ) ( ) ( )
43. How satisfied do you feel with the progress you have made in this organization up to now?  ( ) ( ) ( ) ( ) ( ) ( ) ( )
45. Considering your skills and the effort you put into the work, how satisfied are you with your pay?  ( ) ( ) ( ) ( ) ( ) ( ) ( )
50. All in all, how satisfied are you with your job?  ( ) ( ) ( ) ( ) ( ) ( ) ( )

Why do we have multi-item scales?

1. Precision. Consider the first item in the above scale. Suppose the true amounts of satisfaction of several respondents are represented by the positions of the top arrows:

[Figure: arrows marking several respondents' true positions along the continuum above the VD-VS response categories (1-7) for the item "All in all, how satisfied are you with your job?"]

The response labels represent points on a continuum. They're like 1-foot marks on a scale of height. So, each respondent above except the rightmost one would respond MS, which would be assigned the value 6. But the rightmost respondent, whose actual position on the dimension is close to his/her nearest neighbor's, would pick VS, creating a considerable "response" distance from that neighbor.
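The category quantization just described, and the reason averaging several items helps (discussed below), can be sketched in Python. This is a hypothetical illustration, not part of the original example: forcing a continuous true position into the nearest integer category creates rounding error, and the errors from several items partly cancel.

```python
def respond(true_position):
    """Map a continuous true position (1-7) onto the nearest response category."""
    return max(1, min(7, int(true_position + 0.5)))

# A single response can miss the true position by up to half a category width:
print(respond(6.55))  # -> 7
print(respond(6.45))  # -> 6

# With several items, the individual misses partly cancel.
# The offsets below are hypothetical item-to-item wobble around the true position.
true = 6.55
offsets = [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9]
responses = [respond(true + off) for off in offsets]
scale_mean = sum(responses) / len(responses)
print(scale_mean)  # -> 6.5, much closer to 6.55 than the single response of 7
```

The single-item response misses by .45; the ten-item mean misses by only .05 in this sketch.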
Since each respondent can pick only one of the response CATEGORIES, any response made may miss the respondent's true amount of satisfaction by about 7 percent on a 7-point scale, and by about 10 percent on a 5-point scale. Note the wide range of actual feelings which would all be represented by a 6 above.

Consider that two persons very close in their actual feelings about the job could get scores which were 7% apart. E.g., a person whose actual feeling is 6.55 would check 7, but a person whose actual feeling was 6.45 would check 6. The difference of 1 in scores would be much greater than the actual difference of .1 in feeling.

[Figure: red arrows at 6.45 and 6.55 on the continuum; the first maps to response 6, the second to response 7.]

This situation is analogous to one that most students have strong feelings about: the use of 5 grades to represent performance in a course. We all remember those instances in which we missed the next higher grade mark by a 10th of a point. The use of a single item with just a few response categories is analogous.

Solution: Use multiple items. While each one may miss its mark considerably, some of the misses will be positive and some will be negative, so they will tend to cancel each other out, and the average of the responses will be very close to the respondent's true position on the continuum.

Conclusion: Having multiple items and computing scale scores by summing responses to the multiple items increases accuracy of identification of the respondent's true position on a dimension.

2. Reliability. Since a single categorical item response involves only a gross approximation to the actual (true) feeling, on repeated measurement a person giving only one response might get a very different score (6 vs. 7, for example) on a single item. This reduces reliability. Reduced reliability reduces estimated validity. Reduced estimated validity reduces your chances of getting published.
Conclusion: Summing or averaging responses to multiple items results in a measure that is inherently more reliable.

3. Ability to use internal consistency to assess reliability. It is possible to assess the reliability of a multiple-item scale in a single administration of the scale to a group by computing coefficient alpha. That is not possible with a single-item scale.

Conclusion: Using multiple items and basing the scale score on the sum or mean greatly facilitates our ability to estimate the reliability of the scale score.

4. Insulation from the effects of idiosyncratic items. Sometimes a respondent will have a unique reaction to the wording of a single item. This reaction may be based on the respondent's history or understanding of that item. If that item is the only item in the scale, then the respondent's position on the dimension will be greatly distorted by that reaction.

Conclusion: Including multiple items and using the sum or mean of responses to them diminishes the influence of any one idiosyncratic item.

Come on!!! What's not to like about using multiple items???
1. Test length.
2. Overestimation of reliability by alpha from using too-similar items.

Issues in development of Likert/summated scales

1. Do you have to ask for agreement? The original idea was to assess agreement. But now other ordered characteristics are used, e.g., level of satisfaction, strength of feeling, accuracy with which a statement describes you, etc.

2. How many response categories should be employed? Anywhere from 2 to 11 have been used. Seven or more is preferable. Spector on p. 21 recommends 5-9. There are 3 reasons to use 7 or more.

a. If your study will involve level shifts between conditions, you should allow plenty of room to shift, which means using 7- or 9-point scales.

b.
If you plan to apply confirmatory factor analysis or structural equation models to your data, seven or more response options per item is preferred for reasons associated with factor analysis.

c. We've obtained results suggesting that inconsistency of responding is relatively independent of response level when 7-point scales are used. Below are correlations of inconsistency with level for 5-point scales (left) and 7-point scales (right). In the original scatterplots, the X-axis was individual person scores and the Y-axis was standard deviations of the items making up each score.

5-point response scales:          7-point response scales:
Nhung Honest:     r = -.35        Incentive Honest:  r = -.03
Nhung Faking:     r = -.43        Incentive Faking:  r = +.08
Vikus:            r = -.49        FOR Study Gen:     r = -.04
Bias Study IPIP:  r = -.32        FOR Study FOR:     r = +.08
Bias Study NEO-FFI: r = -.34      Worthy Thesis:     r = -.18

3. Should there be a neutral category? I am not familiar with a clear-cut, strong argument either way. I prefer one. If you analyze the data using confirmatory factor analysis or structural equation modeling, it doesn't matter. My guess (and it's just a guess) is that you'll get a few more failures to respond without one, from people who just can't make up their minds. And variability of responses might be slightly smaller with one, from those same people responding in the middle. But I'm not aware of a meta-analysis on this issue.

4. What numeric values should be assigned to each response possibility for analyses based on sums or means? Although at one time there were arguments for scaling the various response alternatives, now almost everyone who analyzes the data traditionally uses successive, equally spaced integers.
They need not be successive, but everyone uses successive integers, as opposed to, for example, every other integer. For example:

Strongly Disagree  1
Disagree           2
Neutral            3
Agree              4
Strongly Agree     5

Or

Strongly Disagree    1
Moderately Disagree  2
Disagree             3
Neutral              4
Agree                5
Moderately Agree     6
Strongly Agree       7

Newer confirmatory factor analysis and structural equation modeling analyses that treat the data as "ordered categorical" require simply that the response categories be ordered. No numeric assignment is required.

5. If the analyses are based on sums or means, which integers should be used? Answer: Any set of successive integers will do:
1 to 5 or 1 to 7
0 to 4 or 0 to 6
-2 to +2 or -3 to +3

6. Does agreement have to be high or low numbers? Yes, the God of Statistics will strike you down if you make small numbers indicate more of ANY construct. Being a golfer will not save you. In fact, after the Tiger Woods debacle, it will make the strike from above even worse. I strongly prefer assigning numbers so that a bigger response value represents more of the construct as it is named. I'm sure it's what the God of Statistics intended.

7. What about including negatively worded items, perhaps better labeled as "opposite idea" items? Negatively worded items may be included, although there is no guarantee that responses to negatively worded items will be the actual negation of what the responses to a positively worded counterpart would have been. Consider "I like my supervisor" vs. "I dislike my supervisor." Responses to these two items should be perfectly negatively correlated, but often they are not. Some studies have found that negatively worded items are responded to similarly to other negatively worded items, regardless of content or dimension, presumably due to the negativity of the items rather than their main content. We have found this in seven datasets.
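On question 5 above ("any set of successive integers will do"): switching between codings such as 1-5 and -2 to +2 is a linear recoding, and Pearson correlations are unchanged by linear recodings, so analyses based on correlations give identical answers. A small Python sketch with made-up responses (the data and variable names are hypothetical):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

agree_1to5 = [1, 2, 3, 4, 5, 2, 4]             # responses coded 1..5
agree_neg2to2 = [a - 3 for a in agree_1to5]    # same responses coded -2..+2
other_scale = [2, 3, 3, 5, 4, 1, 5]            # some other hypothetical variable

# The recoding leaves any correlation involving the scale unchanged.
r1 = pearson_r(agree_1to5, other_scale)
r2 = pearson_r(agree_neg2to2, other_scale)
```

The two codings correlate 1.0 with each other, so r1 and r2 are identical.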
Data on the existence of "wording" biases, from Biderman, Nguyen, Cunningham, & Ghorbani (2011): We administered the IPIP and NEO-FFI Big Five questionnaires to 200+ students and correlated corresponding factors from the two questionnaires.

Data showing convergent validity of IPIP and NEO Big Five domain factor scores:
Correlation of "purified" IPIP Extraversion with "purified" NEO Extraversion: .624
Correlation of "purified" IPIP Agreeableness with "purified" NEO Agreeableness: .492
Correlation of "purified" IPIP Conscientiousness with "purified" NEO Conscientiousness: .688
Correlation of "purified" IPIP Stability with "purified" NEO Stability: .664
Correlation of "purified" IPIP Openness with "purified" NEO Openness: .525

Data showing how much response tendencies correlate with each other:
Correlation of IPIP Positive Wording Bias with NEO Positive Wording Bias: .739
Correlation of IPIP Negative Wording Bias with NEO Negative Wording Bias: .752

These data, along with other data, suggest that 1) there are consistent individual differences in tendencies to respond to items based on whether the item is positively worded or negatively worded, and 2) the individual differences in the tendency to respond in a specific fashion to positively worded items are separate and different from the tendency to respond in a specific fashion to negatively worded items. Moreover, the above data suggest that the bias tendencies are as reliable from one questionnaire to the other as are the Big Five characteristics.

Recommendation:
Best: Design and analyze your questionnaire so that it permits estimation of the bias tendencies. Nobody does this now, because the discovery of such wording-related response tendencies is so recent.
Expedient: Have an equal number of positively worded and negatively worded items and average across wordings to cancel out differences in response tendencies associated with wording.

8.
If negatively worded items are included, how should they be scored? Typically, negatively worded items are reverse-scored and then treated as if they had been positively worded. Example for items with 5 categories and values 1, 2, 3, 4, and 5:

Original:  1 2 3 4 5
Reversed:  5 4 3 2 1

Suppose Q1 = "I like my job" and Q7 = "I don't like to come to work in the morning," a negatively worded item for job satisfaction.

Data matrix:

Person          1      2      3      4
Q1              5      4      1      2
Q7              2      1      5      4
Q7R             4      5      1      2
Scale as sum    9=5+4  9=4+5  2=1+1  4=2+2
Scale as mean   4.5    4.5    1      2

9. Should the scale score be the sum or the mean of items? If there are no missing values, the sum and the mean will be perfectly correlated; they're mathematically equivalent, so you can use either. The mean is more easily related to the questionnaire items if they all have the same response format. SPSS's RELIABILITY procedure computes only the sum. If there are missing values, use the mean of available items or use the imputation techniques described next to impute missing values, after which it won't matter whether you use the mean or sum.

10. What about missing responses? There are several possibilities.

a. Listwise deletion. Generally not preferred, but if only a couple of cases out of 200 are missing, use it.

Person  1  2  3  4        Substituted:  1  3  4
Q1      5  2  3  1        Q1            5  3  1
Q2      4  _  2  2        Q2            4  2  2
Q3      5  3  4  1        Q3            5  4  1

Problem: Can decimate the dataset. You may be left with only highly conscientious, agreeable participants, because that is the only kind of participant that will respond to all the items.

b. Item mean substitution. Substitute the mean of other persons' responses to the same item for the missing response. Not recommended.

Person  1  2    3  4
Q1      5  2    3  1
Q2      4  2.7  2  2   <- Mean of 4, 2, & 2 substituted (or, rounded, 3)
Q3      5  3    4  1
c. Person scale mean substitution. Substitute the mean of the other items from the same scale that the person responded to for the missing item. Assume Q1, Q2, and Q3 are three items forming a scale. Not recommended.

Person  Q1  Q2   Q3
1       5   4    5
2       2   2.5  3    <- Mean of 2 & 3 substituted
3       3   2    4
4       1   2    1

d. Use a more sophisticated imputation technique. Several are available in SPSS. I have often used SPSS's imputation techniques.

e. The conventional wisdom is changing on issues of missing values. Many modern statistical techniques are designed to work with all available data. These techniques do not include REGRESSION and GLM.

11. Writing the items. Spector, p. 23 . . .

a. Each item should involve only one idea. (Bad: "The death penalty should be abolished because it's against religious law," which mixes abolition with religious law.)
b. Avoid colloquialisms and jargon. ("I am the life of the party." "I shirk my duties.")
c. Consider the reading level of the respondent.
d. Avoid using "not" to create negatively worded items. Good: "Communication in my organization is poor." Bad: "Communication in my organization is not good."
e. Avoid items that might trigger emotional responses in certain samples.

Self-presentation biases when the same method is used to assess two or more constructs

Suppose two independent constructs are being measured using summated rating scales. Suppose each construct was measured with a two-item scale using a 6-valued response format consisting of the values 1 through 6. Suppose 16 persons participated, giving the following matrix of responses.

Person       1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
Construct 1
Q1           2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5
Q2           2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5
Construct 2
Q3           2  3  4  5  2  3  4  5  2  3  4  5  2  3  4  5
Q4           2  3  4  5  2  3  4  5  2  3  4  5  2  3  4  5

For these hypothetical data, Q1 and Q2 are perfectly correlated, as are Q3 and Q4. Obviously, items within the same scale are not perfectly correlated in real life. But Q1+Q2 is uncorrelated with Q3+Q4.
The constructs are independent.

Syntax to create the construct scale scores and compute the true correlation between the constructs, C1 and C2:

compute C1=mean(Q1,Q2).
compute C2=mean(Q3,Q4).
correlate C1 with C2.

Result: Pearson correlation of C1 with C2 = .000, Sig. (2-tailed) = 1.000, N = 16.

Now suppose that the odd-numbered participants were people who preferred the low end of whatever response scale they were filling out, while the even-numbered participants preferred the high end. Obviously, our participants don't separate into odd-even groups like this, but they do separate. There ARE people who prefer the high end of the response scale and there ARE people who prefer the low end. For example, many personality items have valence: agreeing indicates you're "good"; disagreeing indicates you're "not good." We think that those who are feeling good about themselves will tend to choose the agreement end of the response scale and those who are not feeling so good about themselves will tend to choose the disagreement end.

Assume that the response tendency results in a bias of 1 response value down in the case of those who tend to disagree, or up in the case of those who tend to agree. The resulting data matrix would look like the following.

True values:

Person       1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
True Q1      2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5
True Q2      2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5
True Q3      2  3  4  5  2  3  4  5  2  3  4  5  2  3  4  5
True Q4      2  3  4  5  2  3  4  5  2  3  4  5  2  3  4  5

Biased values:

Person       1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
Biased Q1    1  3  1  3  2  4  2  4  3  5  3  5  4  6  4  6
Biased Q2    1  3  1  3  2  4  2  4  3  5  3  5  4  6  4  6
Biased Q3    1  4  3  6  1  4  3  6  1  4  3  6  1  4  3  6
Biased Q4    1  4  3  6  1  4  3  6  1  4  3  6  1  4  3  6

Now the correlation of Q1+Q2 with Q3+Q4 is .555, a value that is statistically significant.
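The two correlations just reported, .000 for the true scores and .555 for the biased scores, can be checked with a short Python sketch using the same hypothetical data matrices (since Q1 = Q2 and Q3 = Q4, each construct score equals its items):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# True responses from the data matrix above
true_c1 = [2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5]
true_c2 = [2, 3, 4, 5] * 4

# Biased responses: odd-numbered persons shifted down 1, even-numbered up 1
biased_c1 = [1, 3, 1, 3, 2, 4, 2, 4, 3, 5, 3, 5, 4, 6, 4, 6]
biased_c2 = [1, 4, 3, 6] * 4

print(round(pearson_r(true_c1, true_c2), 3))      # -> 0.0
print(round(pearson_r(biased_c1, biased_c2), 3))  # -> 0.555
```

The shared response tendency alone moves the correlation from exactly zero to .555.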
The point of this is that differences in participants' response tendencies (e.g., the tendency of some to use only the upper part of a response scale while others use the lower part) can result in positive correlations between constructs that are, in fact, uncorrelated. Each observed response can be thought of as X = T + B + e, where T is the true value, B is the bias, and e is random error. This is the method bias problem that plagues the use of summated rating scales.

Syntax and result for the biased scale scores:

compute biasedC1=mean(biasedq1,biasedq2).
compute biasedC2=mean(biasedq3,biasedq4).
correlate biasedc1 with biasedc2.

Result: Pearson correlation of biasedC1 with biasedC2 = .555, Sig. (2-tailed) = .026, N = 16.

The process of creating a summated scale

1. Define/conceptualize the construct to be measured.
2. See if someone else has already created a scale measuring that construct. If so, and if it appears OK, don't re-invent the wheel. Sources: faculty, the Buros Institute, the IPIP web site, Google. Remember . . .
3. If you must develop your own, begin by generating items.
4. Have a sample of SMEs rate the extent to which each item represents the construct. Keep only the best.
5. Administer the items to a pilot sample from the population of interest.
6. Perform an item analysis of the responses of the pilot sample.
   a. Assess reliability.
   b. Identify bad items, those that reduce reliability, and eliminate them.
   c. Assess dimensionality using exploratory factor analysis.
7. Perform a validation study assessing convergent and discriminant validity using the population of interest, perhaps using the pilot sample.
   a. Administer other similar scales.
   b. Administer other discriminating scales.
8. Administer the scale to a sample from the population of interest along with the other scales that are part of your research project. Assess the theoretical relationships of interest to you.
9. Publish the scale and get rich.
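The reverse-scoring rule from question 8 above can be written as a one-line function. Here is a minimal Python sketch using the hypothetical Q1/Q7 job-satisfaction data from that question:

```python
def reverse_score(response, n_categories=5):
    """Reverse a Likert response: with 5 categories, 1<->5, 2<->4, and 3 stays 3."""
    return (n_categories + 1) - response

q1 = [5, 4, 1, 2]   # "I like my job" (positively worded)
q7 = [2, 1, 5, 4]   # "I don't like to come to work in the morning" (negatively worded)

q7r = [reverse_score(r) for r in q7]          # -> [4, 5, 1, 2]
scale_sum = [a + b for a, b in zip(q1, q7r)]  # -> [9, 9, 2, 4]
scale_mean = [s / 2 for s in scale_sum]       # -> [4.5, 4.5, 1.0, 2.0]
```

These values reproduce the "Scale as sum" and "Scale as mean" rows of the data matrix in question 8.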
Example of processing items of a scale in SPSS

This example is taken from an independent study project conducted by Lyndsay Wrensen examining factors related to faking of the Big 5 personality inventory. She administered the IPIP Big 5 inventory twice: once under instructions to respond honestly and again (counterbalanced) under instructions to respond as if seeking a customer service job. The data here are the honest-condition responses to the Extroversion scale. Participants read each item and indicated how accurately it described them using 1 = Very inaccurate to 5 = Very accurate. Some of the items were negatively worded. (We would now use a 7-point response scale; this project was done almost 10 years ago.)

[Screenshot: Extroversion item responses before reverse-scoring the negatively worded items.]

1. Reverse-score the negatively worded items.

SPSS syntax to reverse-score negatively worded items:

recode he2 he4 he6 he8 he10 (1=5)(2=4)(3=3)(4=2)(5=1) into he2r he4r he6r he8r he10r.
execute.

Or you can do the reverse scoring manually or using pull-down menus. However you do it, put the reverse-scored values in columns that are different from the originals.

[Screenshot: Extroversion item responses after reverse-scoring the negatively worded items.]

2. Deal with missing data. (Not illustrated here.) For example, use SPSS's imputation features. Set up a time with me, and I'll walk you through the process.

3. Perform reliability analyses: Analyze -> Scale -> Reliability Analysis. Note that the reverse-scored items are the ones that are included in the RELIABILITY analysis.

Reliability

Warnings: The covariance matrix is calculated and used in the analysis.

Case Processing Summary
              N     %
Valid         179   90.4
Excluded(a)   19    9.6
Total         198   100.0
a. Listwise deletion based on all variables in the procedure.
P513 Lecture 3: Scale Construction - 18 3/16/2016 Re liabil ity Statisti cs Cro nbach's Alp ha B ased on Sta ndardized Cro nbach's A lpha Ite ms .85 9 .86 0 N o f Item s 10 Ite m Sta tistic s Me an Std . Deviatio n 3.1 3 1.1 22 he 1 N 17 9 he 3 3.9 7 .90 8 17 9 he 5 3.7 2 1.0 93 17 9 he 7 3.3 4 1.2 77 17 9 he 9 3.4 1 1.2 16 17 9 he 2r 3.5 6 1.2 54 17 9 he 4r 3.2 7 1.1 36 17 9 he 6r 3.7 9 1.1 10 17 9 he 8r 2.7 4 1.2 24 17 9 he 10r 2.7 0 1.2 85 17 9 Inter-Ite m Correla tion M atrix he 1 he 1 1.0 00 he 3 .33 4 he 5 .24 5 he 7 he 3 he 5 .24 5 .51 8 .54 2 he 2r .36 7 he 4r .43 5 he 6r .21 1 he 8r .33 6 he 10r .29 6 1.0 00 .47 3 .37 2 .33 1 .35 4 .36 7 .36 8 .17 0 .38 3 .47 3 1.0 00 .55 3 .32 9 .42 1 .40 7 .48 8 .24 2 .51 9 .51 8 .37 2 .55 3 1.0 00 .42 7 .39 1 .40 4 .44 6 .19 8 .37 5 he 9 .54 2 .33 1 .32 9 .42 7 1.0 00 .38 2 .49 6 .25 8 .46 1 .29 5 he 2r .36 7 .35 4 .42 1 .39 1 .38 2 1.0 00 .55 0 .57 2 .33 1 .44 8 he 4r .43 5 .36 7 .40 7 .40 4 .49 6 .55 0 1.0 00 .37 5 .37 1 .45 0 he 6r .21 1 .36 8 .48 8 .44 6 .25 8 .57 2 .37 5 1.0 00 .24 1 .41 7 he 8r .33 6 .17 0 .24 2 .19 8 .46 1 .33 1 .37 1 .24 1 1.0 00 .21 0 he 10r .29 6 .38 3 .51 9 .37 5 .29 5 .44 8 .45 0 .41 7 .21 0 1.0 00 .33 4 he 7 he 9 Th e covarian ce m atrix is cal culate d an d use d in the a nalysis. Summa ry Ite m Sta tistic s Item Me ans Item Va riance s Inte r-Item Co rrelati ons Me an 3.3 63 Min imum 2.6 98 Ma ximu m 3.9 72 Ra nge 1.2 74 Ma ximu m / Min imum 1.4 72 Va riance .18 0 N o f Item s 10 1.3 63 .82 5 1.6 50 .82 5 2.0 00 .06 5 10 .38 1 .17 0 .57 2 .40 2 3.3 63 .01 0 10 Th e covarian ce ma trix i s calculate d and use d in th e an alysis. 
Item-Total Statistics
        Scale Mean if   Scale Variance if   Corrected Item-     Squared Multiple   Cronbach's Alpha
        Item Deleted    Item Deleted        Total Correlation   Correlation        if Item Deleted
he1     30.50           50.162              .547                .456               .848
he3     29.66           52.496              .516                .312               .851
he5     29.92           49.504              .612                .504               .842
he7     30.29           47.724              .610                .501               .842
he9     30.22           48.691              .586                .449               .844
he2r    30.07           47.501              .638                .489               .839
he4r    30.36           48.546              .649                .455               .839
he6r    29.84           50.080              .560                .443               .847
he8r    30.89           51.309              .417                .273               .859
he10r   30.93           48.501              .557                .375               .847

New Directions in Measurement of Psychological Constructs

1) Measurement using factor scores from factor analyses

Factor scores are measures obtained from factor analyses of items. Factor scores are computed by differentially weighting each item according to its contribution to the indication of the dimension. Items which are not highly correlated with the dimension are given little weight; those which are highly correlated with the dimension are given more weight. The loadings of the items on the factor are used to determine the weights. Note that summated scale scores, by contrast, are computed by equally weighting each item that is thought to be relevant.

Advantages of factor scores: Factor score methods are thought by some to be more refined than the equal weighting that is part of traditional summated scores, because each item is weighted according to how much it represents the factor. They probably better capture the dimension of interest. They're probably more highly correlated with the dimension than the simple sum of items. They can be computed taking into account other factors that might influence the items, and thus may be uncontaminated by those other factors.

Disadvantages of factor scores: They are harder to compute, requiring a factor analysis program.
The weights will differ from sample to sample, so a weighting scheme based on your sample will differ from a weighting scheme based on my sample.

2) Taking self-presentation tendencies into account

The examples given above, in which the effects of method bias (self-presentation tendency) on correlations were shown, illustrate this. I believe that in the future, scores on scales such as those of the Big Five will be computed as factor scores, with the effects of self-presentation tendencies removed. Right now, that's not happening, except in our own research.

3) Using the whole buffalo: Measuring other aspects of responses to questionnaires, such as inconsistency

Virtually all scales represent the mean of responses to items representing a dimension. So a Conscientiousness score is the mean of a person's responses to the Conscientiousness items in a questionnaire. What about the variability of a person's responses to the C items? We've been exploring the relationships of inconsistency of responding, as measured by the standard deviation of a person's responses to items from the same dimension.

Recent data (Reddock, Biderman, & Nguyen, 2011): Overall UGPA was the criterion. Conscientiousness and Inconsistency were the predictors. Results with Conscientiousness scale scores from the FOR condition:

[Path diagram: FOR Conscientiousness predicting UGPA with standardized coefficient .250; Inconsistency predicting UGPA with standardized coefficient -.219; R = .315.]

Both standardized coefficients are significant at p < .01. These data suggest that inconsistency of responding may be a valid predictor of certain types of performance.

References

Reddock, C. M., Biderman, M. D., & Nguyen, N. T. (2011). The relationship of reliability and validity of personality tests to frame-of-reference instructions and within-person inconsistency. International Journal of Selection and Assessment, 19, 119-131.
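Inconsistency as used above is simply the standard deviation of one person's responses across the items of a single dimension. A minimal Python sketch with hypothetical data (the two response vectors are invented for illustration):

```python
from statistics import stdev

# Hypothetical responses of two people to the same five Conscientiousness items,
# on a 1-7 response scale. Both have the same scale mean (5.8).
person_a = [6, 6, 5, 6, 6]   # responds consistently
person_b = [7, 2, 7, 6, 7]   # same mean level, but responds inconsistently

def inconsistency(responses):
    """Within-person inconsistency: SD of one person's responses
    to the items of a single dimension."""
    return stdev(responses)
```

A summated scale score cannot distinguish these two people; the within-person SD can, which is what makes it a separate, potentially valid predictor.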
For example, if you give a Big 5 questionnaire to a group of respondents, you can measure the following 11 attributes:
Extraversion
Agreeableness
Conscientiousness
Stability
Openness
General Affect
Positive Wording Bias
Negative Wording Bias
Inconsistency
Extreme Response Tendency
Acquiescent Response Tendency

4) Techniques based on Item Response Theory (IRT)

There is a burgeoning literature on test construction based on item response theory techniques that threatens to displace Likert/summated-scale techniques. Although at the present time the vast majority of personality tests are based on summated responses, this will change in the coming years. IRT-based tests make use of items that have themselves been scaled, much as we now score people, so that each item has a "scale" value, just as each person has a scale score. Items are selected so that they represent a range of scale values, and a person's score on a test constructed from such items is based on which items the person endorsed or got correct, not simply on how many the person endorsed or got correct.