Ideal Answers to Chapter 5 Questions

QUESTION 5.1. I decided to name this factor “traditionalism” because all of the value items that load heavily on this factor seem to involve traditional values, such as politeness, religiosity, and respect for American power.

QUESTION 5.2. I ran a principal components factor analysis. I chose the six survey items that loaded most highly on the factor with the highest eigenvalue. In so doing, I made sure that none of the six items loaded very highly on the other two factors that had eigenvalues greater than 1.0. The reliability of this measure of traditional values was high, especially for a measure with only six items (α = .83). This means that people who agreed with any one item on the scale tended, on average, to agree with the other five items. Further, deleting any one of these six items would have reduced the overall reliability of the scale.

QUESTION 5.3. People who scored high in traditionalism tended to be older (r = .23), more politically conservative (r = .27), and more opposed to abortion (r = .29), all ps < .01. These associations are pretty intuitive, and they support my interpretation of “trism” as “traditionalism.” Older people tend to be more traditional, either because people tend to become more traditional as they age (e.g., because they become better served by the status quo), or because earlier U.S. cohorts were simply more traditional. It is also highly intuitive that conservative political attitudes go hand in hand with traditional values. Finally, more traditional people tend to be more opposed to abortion, perhaps because, historically speaking, abortion is a recent newcomer to the human scene and perhaps because religious conservatives tend to be strongly opposed to abortion.

QUESTION 5.4. Because there were only 2 items in this scale, the item-total correlation for the (recoded) shyness item and the outgoing item had to be identical.
That is, shyness correlates with outgoingness in exactly the same way that outgoingness correlates with shyness.

QUESTION 5.5a. 2b. The Spearman-Brown formula says that r = .650 becomes a reliability estimate of .788 for this extraversion scale. Two items with individual estimated reliabilities of .65 combine to form a scale whose overall estimated reliability is .788.

QUESTION 5.5b. Cronbach’s alpha for this scale is also exactly .788. That makes sense because alpha is based on the average correlation of every item in a scale with every other item in the scale. In the simplest possible case of a 2-item scale, the average of a single correlation is that correlation.

QUESTION 5.6. The average correlation between the 10 individual items in the self-esteem scale and the outspoken variable was .26, with a range of r = -.07 to r = .43. It certainly makes sense that, on average, these correlations were positive. People high in self-esteem should be more confident in social situations and thus more willing to speak up.

QUESTION 5.7. Yes, the scale was internally consistent, α = .80 (standardized item α = .83). This is a high reliability score for a 10-item scale. I didn’t delete any items from the scale because they all had good item-total correlations. Most important, as shown by the statistics under “alpha if item deleted,” each individual item contributed positively to the scale’s reliability. Deleting any one item would have reduced the reliability of the scale to at least a small degree.

QUESTION 5.8a. The correlation between the composite self-esteem scale and the outspoken variable was r = .46, p < .05. This was much higher than the average correlation for the 10 individual items (r = .26). Presumably, this happened because when you average multiple items together, the noise or error associated with any one specific item is reduced or negated by the unique source of error associated with a different specific item.
Thus, noise or error tends to average out in the composite score, making that composite score superior. In this case reliability seems to contribute to validity in the sense that the validity correlation between self-esteem and outspokenness increased from an average of r = .26 for the individual items to r = .46 for the total self-esteem score.

QUESTION 5.8b. This self-esteem scale is temporally stable; that is, it has a high test-retest correlation, r = .76 (which converts to a Spearman-Brown reliability estimate of .86). People who say that they are high (or low) in self-esteem at time 1 tend to say the same thing at time 2. If self-esteem is an individual difference variable, we would hope this would be true.

QUESTION 5.8c. When I averaged together the self-esteem scores from time 1 and time 2, the correlation between self-esteem and outspokenness went up even further (r = .50). This is another example of how multiple measurement increases reliability. For example, if someone were having a bad week when they filled out the time 1 self-esteem scale, then this wouldn’t likely be true again at time 2, and the average score would better reflect the person’s true, chronic level of self-esteem. All else being equal, reliability does seem to contribute to validity.

QUESTION 5.9. I ran a reliability analysis and found that the items “gregarious,” “pleasant,” and “convivial” each reduced the overall reliability of the scale. Thus, I deleted them. I’m guessing that many people in this pilot study simply did not know the meaning of obscure terms such as gregarious and convivial. Presumably, they all knew what pleasant meant, but being pleasant would seem to have little to do with extraversion. It might load better on “agreeableness,” which is a different Big 5 personality trait. Alternatively, this item might have little validity on any scale because it is so vague and socially desirable that everyone feels it describes them.
In other words, “pleasant” may have also been a bad item because of a ceiling effect. The mean score of 8.7 (on a 9-point scale) for the “pleasant” item is highly consistent with this idea! After deleting the three bad items, the resulting 4-item scale was highly reliable for such a brief measure, α = .91. To reduce any concerns about a “yea-saying” bias inflating the reliability of this scale, I could simply add a few items that indicate a lack of extraversion (e.g., shy, quiet), and then recode these items. People who tend to say yes to everything would presumably tend to say yes to these items as well. If we had as many negative as positive indicators of extraversion, we could thus balance out this yea-saying bias.

QUESTION 5.10. Jessica’s ratings are quite a bit higher than those of the other five raters. Specifically, she seems to be more generous than anyone else, in the sense that she is giving higher average delay of gratification ratings. Without looking more closely, though, it’s hard to say whether she is doing a poor or a stellar job.

QUESTION 5.11a. Jessica is arguably the most reliable rater in the group. Her corrected “item-total correlation” (actually a rater-total correlation) was the highest of any rater. That is, her individual ratings for each child agreed very well with the average group rating received by each child (they agreed better, in fact, than those of any other rater). Deleting her ratings would reduce the reliability of the total EQM score. The alpha for the six raters was .87, but if we were to delete Jessica, the alpha would drop to .81. In contrast, Jolie seems to have been doing a very poor job. Her corrected item-total correlation was only .15, and deleting her ratings from the group would substantially improve the reliability of the ratings (the alpha would increase to .91!).

QUESTION 5.11b. Although Jessica’s ratings correlated very well with those of the other raters, there was a problem with her ratings.
Specifically, she seems to be overly generous in her ratings across the board. Her rater mean was clearly the highest in the group. She should undergo a short re-training session, the goal of which would be to cut about 1 point off her average rating without influencing the rank order of her ratings. Because I am guessing that the absolute score (not merely the relative score) for the quiet mouse game has a lot of meaning, I would be concerned about the elevation in Jessica’s ratings. Thus, although elevation won’t change the correlation between the QM ratings and some other outcome we might care about, elevation could be a big problem if we want to say something about the mean level of these kids’ quiet mouse scores and compare them with kids in other samples.

QUESTION 5.11c. At first glance, it looks like GS won. However, if you delete Jolie’s ratings (as you should because she is highly unreliable), the winner would be IB, with a mean score of 5.38.

QUESTION 5.12. Although a few of these items had very low item-total correlations, Cronbach’s alpha for this 27-item scale was quite acceptable, α = .82. Based on this reliability score, few people would presumably be inclined to question the internal consistency of the scale.

QUESTION 5.13. Despite the high Cronbach’s alpha level of this scale, a principal components analysis of these 27 items clearly suggested that the data are more consistent with a 2-factor than with a 1-factor solution. Specifically, there were two factors with eigenvalues much higher than 1.0. Component 1 had an eigenvalue of 8.81 and accounted for 32.62% of the variance in the items. Component 2 had an eigenvalue of 4.96 and accounted for an additional 18.36% of the variance. Collectively, then, these two factors accounted for slightly more than half of the variance in the 27 items. The item loadings for the two factors make it extremely clear that the first 15 items load more heavily on the first factor than on any other.
These 15 items seem to have to do with a hedonistic form of life satisfaction that involves enjoying oneself and getting pleasure from life. The last 12 items actually seem to load negatively on this factor. Unlike the first set of items, these items seem to have to do with feeling that one’s life is meaningful or serves an important purpose. I thus labeled the two factors pleasure and meaning, respectively.

QUESTION 5.14. The first 15 items map very well onto what I called pleasure, and the last 12 items map very well onto what I called meaning. These two factors actually correlate negatively with one another, r(58) = -.27, p < .05. This is certainly not what you’d expect of two subsets of the 27 items that seem to make up an internally consistent scale. Further, this suggests very strongly that a solid Cronbach’s alpha score for a scale does not guarantee that the scale is unidimensional. Finally, these results also suggest quite clearly that there are at least two very different kinds of life satisfaction. In fact, they are so different that people who tend to score high on one kind of life satisfaction tend to score low on the other kind.
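
For readers who want to check the arithmetic behind several of these answers by hand, here is a minimal Python sketch. It is not the procedure used to produce the answers above, and the function names are my own, made up for illustration. It assumes the standardized form of Cronbach's alpha (based on the mean inter-item correlation) and a principal components analysis run on the correlation matrix, so that the total variance equals the number of items (27). The within-cluster and between-cluster correlations in the last part (.60 and -.10) are hypothetical values chosen only to illustrate the point of Question 5.14; they are not taken from the data.

```python
def spearman_brown(r, k=2):
    """Predicted reliability when a measure is lengthened to k parallel parts
    that intercorrelate r (k=2 gives the familiar 2r / (1 + r))."""
    return k * r / (1 + (k - 1) * r)


def standardized_alpha(mean_r, k):
    """Standardized Cronbach's alpha from the mean inter-item correlation
    of a k-item scale."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def percent_variance(eigenvalue, n_items):
    """Percent of total variance for one component, assuming the analysis
    was run on the correlation matrix (total variance = n_items)."""
    return 100 * eigenvalue / n_items


def two_cluster_mean_r(n_a, n_b, r_within, r_between):
    """Mean inter-item correlation for a scale made of two item clusters.
    HYPOTHETICAL helper: assumes one common r within clusters and one
    common r between clusters."""
    pairs_within = n_a * (n_a - 1) // 2 + n_b * (n_b - 1) // 2
    pairs_between = n_a * n_b
    total_pairs = pairs_within + pairs_between
    return (r_within * pairs_within + r_between * pairs_between) / total_pairs


# Questions 5.5a and 5.5b: with two items, standardized alpha and the
# Spearman-Brown estimate are the same formula.
sb_extraversion = spearman_brown(0.65)
alpha_two_items = standardized_alpha(0.65, 2)

# Question 5.8b: test-retest r = .76 stepped up with Spearman-Brown.
sb_retest = spearman_brown(0.76)

# Question 5.13: eigenvalues converted to percent of variance (27 items).
pct_component_1 = percent_variance(8.81, 27)
pct_component_2 = percent_variance(4.96, 27)

# Question 5.14 (HYPOTHETICAL correlations): 15 and 12 items that correlate
# .60 within a cluster and -.10 across clusters still give a high alpha.
hyp_mean_r = two_cluster_mean_r(15, 12, r_within=0.60, r_between=-0.10)
hyp_alpha = standardized_alpha(hyp_mean_r, 27)

print(round(sb_extraversion, 3), round(alpha_two_items, 3))
print(round(sb_retest, 2))
print(round(pct_component_1, 2), round(pct_component_2, 2))
print(round(hyp_mean_r, 3), round(hyp_alpha, 3))
```

The first two printed lines reproduce the .788 and .86 values reported in Questions 5.5 and 5.8b. The percentages agree with those reported in Question 5.13 to within rounding, since the eigenvalues above are themselves rounded to two decimals. The last line shows that, under the hypothetical correlations, a 27-item scale can reach an alpha of about .90 even though its two halves correlate negatively, which is the sense in which a high alpha does not guarantee unidimensionality.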