AOM PAPER 13912 Comprehensive Assessment of Team Member Effectiveness: A Behaviorally Anchored Rating Scale Abstract Instructors who use team learning methods often use self and peer evaluations to assess individual contributions to team projects to provide developmental feedback or adjust team grades based on individual contributions. This paper describes the development of a behaviorally anchored rating scale (BARS) version of the Comprehensive Assessment of Team Member Effectiveness (CATME). CATME is a published, Likert-scale instrument for self and peer evaluation that measures team-member behaviors that the teamwork literature has shown to be effective. It measures students’ performance in five areas of team-member contributions: contributing to the team’s work, interacting with teammates, keeping the team on track, expecting quality, and having relevant knowledge, skills, and abilities. A BARS version has several advantages for instructors and students, especially that it can help students to better understand effective and ineffective team-member behaviors and it requires fewer rating decisions. Two studies were conducted to evaluate the instrument. The first shows that the CATME BARS instrument measures the same dimensions as the CATME Likert instrument (linking the BARS version to the theoretical underpinnings of the Likert version). The second provides evidence of convergent and predictive validity for the CATME BARS. Ways to use the instrument and areas for future research are discussed. KEYWORDS: Peer Evaluation, Team-member effectiveness, Teamwork AOM PAPER 13912 College faculty in many disciplines are integrating team-based and peer-based learning methods into their courses in an effort to help students develop the interpersonal and teamwork skills that employers strongly desire but students often lack (Boni, Weingart, & Evenson, S., 2009; Cassidy, 2006; Verzat, Byrne, & Fayolle, 2009). Further, students are often asked to work in teams as part of experiential, service learning, research-based learning, and simulation-based learning projects, which aim to give students more hands-on practice and make them more active participants in their education than traditional lecture-based coursework (Kolb & Kolb, 2005; Loyd, Kern, & Thompson, 2005; Raelin, 2006; Zantow, Knowlton, & Sharp, 2005). As a result, there is a recognized need for students to learn teamwork skills as part of their college education and teamwork is increasingly integrated into the curriculum, yet “teamwork competencies and skills are rarely developed” (Chen, Donahue, & Klimoski, 2004: 28). Teamwork presents challenges for both students and faculty. Student teams often have difficulty working effectively together and may have problems such as poor communication, conflict, and social loafing (Burdett, 2003; Verzat et al., 2009). For instructors, challenges in managing teamwork include preventing or remediating these student issues and assigning fair grades to students based on their individual contributions (Chen & Lou, 2004). Evaluating students’ individual contributions to teamwork in order to develop team skills and deter students from free riding on the efforts of teammates is an important part of team-based instructional methods (Michaelsen, Knight, & Fink, 2004). This paper describes the development and testing of an instrument for self and peer evaluation of individual member’s contributions to the team for use in educational settings where team-based instructional methods are used. 
The results of two studies provide support for the use of the instrument. 1 AOM PAPER 13912 LITERATURE REVIEW There are several reasons why teamwork is prevalent in higher education. First, educational institutions need to teach students to work effectively in teams to prepare them for careers because many graduates will work in organizations where team work methods are used. About half of U.S. companies and 81% of Fortune 500 companies use team-based work structures because they are often more effective or efficient than individual work (Hoegl & Gemuenden, 2001; Lawler, Mohrman, & Benson, 2001). Second, team-based or collaborative learning is being used with greater frequency in many disciplines to achieve the educational benefits of students working together on learning tasks (Michaelsen, et al., 2004). Team learning makes students active in the learning process, allows them to learn from one another, makes students responsible for doing their part to facilitate learning, and increases students’ satisfaction with the experience compared to lecture classes (Michaelsen, et. al, 2004). Finally, accreditation standards in some disciplines require that students demonstrate an ability to work in teams (ABET, 2000; Commission on Accreditation for Dietetics Education, 2008). Self- and Peer Evaluations Many team-based learning advocates agree that the approach works best if team grades are adjusted for individual performance. Although instructors could use various methods to determine grade adjustments, peer evaluations of teammates’ contributions are the most commonly used method. Millis and Cottell (1998) give strong reasons for using peer evaluation to adjust student grades. Among these are that students, as members of a team, are better able to observe and evaluate members’ contributions than are instructors, who are outsiders to the team. In addition, peer evaluations increase students’ accountability to their teammates and reduce free riding. Furthermore, students learn from the process of completing peer evaluations. Self and 2 AOM PAPER 13912 peer evaluations may also be used to provide feedback in order to improve team skills (Chen, Donahue, & Klimoski, 2004). There is a fairly large body of management research that examines peer ratings, which are used in work contexts, such as 360-degree performance appraisal systems and assessment center selection techniques, in addition to executive education and traditional educational settings (Fellenz, 2006; Hooijberg & Lane, 2009; London, Smither, & Adsit, 1997; Saavedra & Kwun, 1993; Shore, Shore, & Thornton, 1992). Several meta-analytic studies have examined the inter-rater reliability of peer evaluation scores and their correlations with other rater sources and have found that peer ratings are positively correlated with other rating sources and have good predictive validity for various performance criteria (Conway & Huffcutt, 1997; Harris & Schaubroeck, 1988; Schmitt, Gooding, Noe, & Kirsch, 1984; Viswesvaran, Ones, & Schmidt, 1996; Viswesvaran et al., 2005). In work and educational settings, peer evaluations are often used as part of a comprehensive assessment of individuals’ job performance to take advantage of information that only peers are in a position to observe. Thus, peer ratings are often one part of a broader performance assessment that also includes ratings by supervisors or instructors or other sources (Viswesvaran, Schmidt, & Ones, 2005). 
In addition to assessing performance, peer evaluations of teamwork reduce the tendency of some team members to free-ride on the efforts of others. Completing peer evaluations also helps individuals to understand the performance criteria and how everyone will be evaluated (Sheppard, Chen, Schaeffer, Steinbeck, Neumann, & Ko, 2004). Self appraisals are valuable because they involve ratees in the ratings process, provide information to facilitate a discussion about an individual’s performance, and in the performance appraisal literature, research has shown that ratees feel that self appraisals should be included in 3 AOM PAPER 13912 the evaluation process and that ratees want to provide information about their performance (Inderrieden, Allen, & Keaveny, 2004). Self-appraisals are, however, vulnerable to leniency errors (Inderrieden, et al., 2004). An additional problem that plagues self-appraisals is that people who are unskilled in an area are often unable to recognize deficiencies in their own skills or performance because their lack of competence prevents them from having the metacognitive skills to recognize their deficiencies (Kruger & Dunning, 1999). Therefore, people with poor teamwork skills would be likely to overestimate their own abilities and contributions to the team. A number of problems are associated with both self and peer ratings (Saavedra & Kwun, 1993). One is that many raters, particularly average and below-average performers, do not differentiate in their ratings of team members when it is warranted. Furthermore, when rating the performance of team members, raters tend to use a social comparison framework and rate members relative to one another rather than using objective, independent criteria. Also, some raters worry that providing accurate ratings would damage social relations in the team. In spite of the limitations of self and peer evaluations, instructors commonly use them because instructors need a measure of students’ teamwork contributions that can be used to adjust students’ grades in order to effectively implement team-based learning. However, although the use of peer evaluations is commonplace, there is not widespread agreement about what a system for self and peer evaluation should look like. This paper describes the development of a behaviorally anchored rating scale (BARS) instrument for the self and peer evaluation of students’ contributions to teamwork in higher education classes that is based on research about what types of contributions are necessary for team-members to be effective. CATME Likert-Scale Instrument 4 AOM PAPER 13912 Although a large body of literature examines individual attributes and behaviors associated with effective team performance (Stewart, 2006), there have been no generally accepted instruments for assessing individuals’ contributions to the team. Some peer evaluation instruments, or peer evaluation systems, such as students dividing points among team members, have been published, but they have not become standard in college classes (e.g,. Fellenz, 2006; Gatfield, 1999; Gueldenzoph & May, 2002; McGourty & De Meuse, 2001; Moran, Musselwhite, & Zenger, 1996; Rosenstein & Dickinson, 1996; Sheppard, Chen, Schaeffer, Steinbeck, Neumann, & Ko, 2004; Taggar & Brown, 2001; Van Duzer & McMartin, 2000; Wheelan, 1999). Further, not all published peer evaluation instruments were designed to align closely with the literature on team-member effectiveness. 
To meet the need for a research-based, peer evaluation instrument for use in college classes, Loughry, Ohland, and Moore (2007) developed the Comprehensive Assessment of Team Member Effectiveness (CATME). The researchers used the teamwork literature to identify the ways by which team members can help their team to be effective. They drew heavily on the work of Anderson and West (1998), Campion, Medsker, and Higgs (1993), Cannon-Bowers, Tannenbaum, Salas, and Volpe (1995), Erez, LePine, and Elms (2002), Guzzo, Yost, Campbell, and Shea (1993), Hedge, Bruskiewicz, Logan, Hanson, and Buck (1999), Hyatt and Ruddy (1997), McAllister (1995), McIntyre and Salas (1995), Niehoff and Moorman (1993), Spich and Keleman (1985), Stevens & Campion (1994 & 1996), and Weldon, Jehn, and Pradhan (1991). Based on the literature, they created a large pool of potential items, which they then tested using two large surveys of college students. They used exploratory factor analysis and confirmatory factor analysis to select the items for the final instrument. The researchers found 29 specific types of team-member contributions that clustered into five broad categories (contributing to the team’s work, interacting with teammates, 5 AOM PAPER 13912 keeping the team on track, expecting quality, and having relevant knowledge, skills, and abilities). The full (87-item) version of the instrument uses three items to measure each of the 29 types of team member contributions with high internal consistency. The short (33-item) version uses a subset of the items to measure the five broad categories (these items are presented in Appendix A). Raters rate each teammate on each item using Likert scales (strongly disagree strongly agree). Although the CATME instrument is solidly rooted in the literature on team effectiveness, instructors who wish to use peer evaluation to adjust their students’ grades in team-based learning projects may need a simpler instrument. Even the short version of the CATME requires that students read 33 items and make judgments about each of the items for each of their teammates. If there are four-person teams and the instructor requires a self-evaluation, each student must make 132 independent decisions to complete the evaluation. To carefully consider all of these decisions may require more effort than students are willing to put forth. Then, in order for instructors to use the instrument to adjust grades within the team, the instructor must review 528 data points (132 for each of the four students on the team) before making the computations necessary to determine an adjustment factor for the four students’ grades. For instructors who have large numbers of students and heavy work loads, this may not be feasible. Other published peer evaluation systems, such as those cited earlier, also require substantial effort, which may be one reason why they have not been widely adopted. Another concern about the CATME instrument is that students using the Likert-scale format to make their evaluations may have very different perceptions about what rating a teammate who behaved in a particular way deserves. This is because the response choices (strongly disagree - strongly agree) are not behavioral descriptions. Therefore, although the 6 AOM PAPER 13912 CATME Likert-format instrument has a number of strengths, some instructors who need to use peer evaluation may prefer a shorter, behaviorally anchored version of the instrument. 
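To make the grading arithmetic above concrete, the following is a minimal sketch, in Python, of the kind of adjustment-factor computation an instructor might perform on the 528 collected data points just described. The function name, the team-member names, and the ratio-to-team-average rule are illustrative assumptions rather than a formula prescribed by the instrument; they simply mirror the idea, used by the web-based system described later in the paper, of comparing each student's average rating to the team average.

```python
# Illustrative sketch only: the paper does not prescribe this exact formula here.
# It mirrors the later description of the web system, which compares each
# student's average rating to the team average to suggest a grade adjustment.

def adjustment_factors(ratings):
    """Compute a suggested grade-adjustment factor for each team member.

    ratings: dict mapping ratee name -> list of scores that member received
             (e.g., each rater's mean across the five CATME categories).
    Returns: dict mapping ratee name -> adjustment factor, where 1.0 means
             the member was rated exactly at the team average.
    """
    member_means = {name: sum(scores) / len(scores) for name, scores in ratings.items()}
    team_mean = sum(member_means.values()) / len(member_means)
    return {name: round(mean / team_mean, 2) for name, mean in member_means.items()}


# Hypothetical four-person team: ratings received by each member on a 1-5 scale.
example = {
    "Avery": [4.6, 4.4, 4.8],
    "Blake": [3.0, 2.8, 3.2],
    "Casey": [4.2, 4.0, 4.4],
    "Devon": [4.8, 5.0, 4.6],
}
print(adjustment_factors(example))
# {'Avery': 1.11, 'Blake': 0.72, 'Casey': 1.01, 'Devon': 1.16}
```

An instructor using such a factor would multiply each student's team grade by the factor, so that members rated below the team average receive less than the team grade and members rated above it receive more.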
In the remainder of this paper, we describe our efforts to develop a behaviorally anchored rating scale (BARS) version of the CATME instrument. We believe that a BARS version will have two main advantages that, in some circumstances, may make it more appropriate than the original CATME for instructors who want to adjust individual grades in team learning environments. First, because the BARS instrument will assess only the five broad areas of teammember contribution, raters will only need to make five decisions about each person they rate. In a four-person team, this would reduce the burden for students who fill out the survey from 132 decisions to 20. For the instructor, it reduces the number of data points to review for the team from 528 to 80. In addition, by providing descriptions of the behaviors that a team member would display to warrant any given rating, a BARS version of the instrument could help students to better understand the standards by which both they and their peers are being evaluated. If students learn from reading the behavioral descriptions what constitutes good performance and poor performance, they may try to display more effective team behaviors and refrain from behaviors that harm the team’s effectiveness. Behaviorally Anchored Ratings Scales Behaviorally anchored rating scales provide a way to measure how much an individual’s behavior in various performance categories contributes to achieving the goals of the team or organization of which the individual is a member (Campbell, Dunnette, Arvey, & Hellervick, 1973). The procedure for creating BARS instruments was developed in the early 1960s and became more popular in the 1970s (Smith & Kendall, 1963). Subject-matter experts (SMEs), who fully understand the job for which the instrument is being developed, provide input for its 7 AOM PAPER 13912 creation. More specifically, SMEs provide examples of actual performance behaviors, called “critical incidents,” and help to classify whether the examples represent high, medium, or low performance in the category in question. The scales provide descriptions of specific behaviors that people at various levels of performance would typically display. Research on the psychometric advantages of BARS scales has been mixed (MacDonald & Sulsky, 2009). However, there is research to suggest that BARS scales have a number of advantages over Likert ratings scales, such as greater inter-rater reliability and less leniency error (Campbell, et al., 1973; Ohland, Layton, Loughry, & Yuhasz 2005). In addition, instruments with descriptive anchors may generate more positive rater and ratee reactions, have more face validity, be more defensible in court, and have strong advantages for raters from collectivist cultures (MacDonald & Sulsky, 2009). In addition, training both raters and ratees using a BARS scale prior to using it to evaluate performance may make the rating process easier and more comfortable for raters and ratees (MacDonald & Sulsky, 2009). Using the BARS process for developing anchors also greatly facilitates frame-of-reference (FOR) training, which has been shown to be the best approach to rater training (Woehr & Huffcutt, 1994). DEVELOPMENT OF THE CATME BARS INSTRUMENT A cross-disciplinary team of nine university faculty members with expertise in management, education, education assessment, various engineering disciplines, and engineering education research collaborated to develop a BARS version of the CATME instrument. 
We used the “critical incident methodology” described in Hedge, et al. (1999) to develop behavioral anchors for the five broad categories of team-member contribution measured by the original CATME instrument. This method is useful for developing behavioral descriptions to anchor a rating scale, and requires the identification of examples of a wide range of behaviors from poor to excellent (rather than focusing only on extreme cases). Critical 8 AOM PAPER 13912 incidents include observable behaviors related to what is being assessed, context, and the result of the behavior, and must be drawn from the experience of subject-matter experts who also categorize and translate the list of critical incidents. All members of the research team can be considered as subject-matter experts on team learning and team-member contributions. All have published research relating to student learning teams and all have used student teams in their classrooms. Collectively, they have approximately 90 years of teaching experience. We began by working individually to generate examples of behaviors that we have observed or heard described by members of teams. These examples came from our students’ teams, other teams that we have encountered in our research and experience, and teams (such as faculty committees) of which we have been members. We noted which of the five categories of team contribution we thought that the observation represented. We exchanged our lists of examples and then tried to generate more examples after seeing each other’s ideas. We then held several rounds of discussions to discuss our examples and arrive at a consensus about which behaviors described in the critical incidents best represented each of the five categories in the CATME Likert instrument. We came to an agreement that we wanted fivepoint response scales with anchors for high, medium and low team-member performance in each category. We felt that having more than five possible ratings per category would force raters to make fine-grain distinctions in their ratings that would increase the cognitive effort required to perform the ratings conscientiously. This would have been in conflict with our goal of creating an instrument that was easier and less time-consuming to use than the CATME-Likert version. We also agreed that we wanted the instrument’s medium level of performance (3 on the 5-point scale) to describe satisfactory team-member performance. Therefore, the behaviors that anchored the “3” rating would describe typical or average team-member contributions. The 9 AOM PAPER 13912 behavioral descriptions that would anchor the “5” rating would describe excellent team contributions, or things that might be considered as going well above the requirements of being a fully satisfactory team member. The “1” anchors would describe behaviors that are commonly displayed by poor or unsatisfactory team members and therefore are frequently heard complaints about team members. The “2” and “4” ratings do not have behavioral anchors, but provide an option for students who want to give a rating in-between the levels described by the anchors. Having agreed on the types of behaviors that belonged in each of the five categories and our goal to describe outstanding, satisfactory, and poor performance in each, we then worked to reach consensus about the descriptions that we would use to anchor each category. We first made long lists of descriptive statements that everyone agreed fairly represented the level of performance for the category. 
Then we developed more precise wording for our descriptions, and prioritized the descriptions to determine which behaviors were most typical of excellent, satisfactory, and poor performance in each category. There was considerable discussion about how detailed the descriptions for each anchor should be. Longer, more-detailed descriptions that listed more aspects of behavior would make it easier for raters using the instrument to recognize that the ratee’s performance was described by the particular anchor. Thus, raters could be more confident that they were rating accurately. However, longer descriptions require more time to read and take more space on a page, and thus are at odds with our goal of designing an instrument that is simple and quick to complete. Our students have told us often, and in no uncertain terms, that the longer an instrument is, the less likely that they are to read it and put full effort into answering each question. We therefore decided to create an instrument that could fit on one page. We agreed that three bulleted lines of description would be appropriate for each of the five categories of behavior. The instrument we 10 AOM PAPER 13912 developed after many rounds of exchanging drafts among the team members and gathering feedback from students and other colleagues is presented in Appendix B. We pilot-tested the instrument in several settings, including a junior-level course with 30 sections taught by five different graduate assistants, in order to improve both the instrument and our understanding of the factors that are important for a smooth administration. For example, to lessen peer pressure, students must not be seated next to their teammates when they complete the instrument. Because the BARS format is less familiar to students than is the Likert format, we found that it is helpful to students to explain how to use the BARS scale, which takes about 10 minutes. It then takes about 10 minutes for students in 3- to 4-member teams to conscientiously complete the self-and peer evaluation. After our testing, we decided that it would be beneficial to develop a web-based administration of the CATME BARS instrument. A web administration provides confidentiality for the students and makes it easier for instructors to gather the self and peer evaluation data, eliminates the need for instructors to type the data into a spreadsheet, and makes it easier for instructors to interpret the results. After extensive testing to make sure that the web interface worked properly, we conducted two studies of the CATME BARS instrument. STUDY 1: EXAMINING THE PSYCHOMETRIC PROPERTIES OF THE CATME BARS MEASURE The primary goal of Study 1 was to evaluate the psychometric characteristics of the webbased CATME BARS instrument relative to the paper-based CATME-Likert instrument. We therefore administered both measures to a sample of undergraduate students engaged in teambased course-related activity. We then used an application of generalizability (G) theory (Brennan, 1994; Shavelson & Webb, 1991) to evaluate consistency and agreement for each of 11 AOM PAPER 13912 the two rating forms. G theory is a statistical theory developed by Cronbach, Glesser, Nanda, & Rajaratnam (1972) about the consistency of behavioral measures. It uses the logic of analysis of variance (ANOVA) to differentiate multiple systemic sources of variance (as well as random measurement error) in a set of scores. 
In addition, it allows for the calculation of a summary coefficient (i.e., generalizability coefficient) that reflects the dependability of measurement. This generalizability coefficient is analogous to the reliability coefficient in classical test theory. A second summary coefficient (i.e., dependability coefficient) provides a measure of absolute agreement across various facets of measurement. Thus, G theory provides the most appropriate approach for examining the psychometric characteristics of the CATME instruments. In this study, each participant rated each of their teammates using each of the two CATME instruments (at two different times). Three potential sources of variance in ratings were examined: person effects, rater effects, and scale effects. In the G theory framework, person effects represent ‘true scores,’ while rater and scale effects represent potential sources of systematic measurement error. In addition to these main effects, variance estimates were also obtained for person by rater, person by scale, and scale by rater two-way interactions, and the three-way person by scale by rater interaction (It is important to note that in G theory, the highest level interaction is confounded with random measurement error). All of the interaction terms are potential sources of measurement error. This analysis allowed for a direct examination of the impact of rating scale as well as a comparison of the measurement characteristics of each scale. Method Participants and Procedure Participants were 86 students in two sections of a sophomore-level chemical engineering course (each course was taught by a different instructor). Participants were assigned to one of 17 12 AOM PAPER 13912 teams, consisting of 4 to 5 team members each. The teams were responsible for completing a series of nine group homework assignments worth 20% of their final grade. Participants were randomly assigned to one of two measurement protocols. Participants in Protocol A (n = 48) completed the BARS instrument at Time 1 and the Likert instrument at Time 2 (approximately 6 weeks later). Participants were told that they would be using two different survey formats for the mid-semester and end-of-semester surveys, to help determine which format was more effective. Participants in Protocol B (n = 38) completed the Likert instrument at Time 1 and the BARS at Time 2. All participants rated each of their teammates at both measurement occasions. However, 13 participants did not turn in ratings for their teammates; thus, ratings were obtained from 73 raters (56 participants received ratings from 4 team members, 16 participants received ratings from 3 team members, and 1 participant received ratings from only 1 team member). Thus, the total data set consisted of a set of 273 ratings ([56 x 4] + [16 x 3] + 1). This set of ratings served as input into the G theory analyses. Measures All participants completed both the Likert-type and web-based BARS versions of the CATME (as presented in the Appendices). Ratings on the Likert-type scale were made on a 5 point scale of agreement (i.e., 1 = Strongly Disagree, 5 = Strongly Agree). Results Descriptive statistics for each dimension on each scale are provided in Table 1. Reliabilities (Cronbach’s alpha) for each Likert dimension are also provided in Table 1. 
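As a concrete illustration of how the dimension reliabilities in Table 1 can be obtained, the short sketch below computes Cronbach's alpha from a ratings matrix for a single Likert dimension. The data values and function name are hypothetical; the paper does not publish its analysis code.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a matrix of scores on one CATME Likert dimension.

    item_scores: 2-D array, rows = ratings received (one per rater-ratee pair),
                 columns = the Likert items belonging to that dimension.
    """
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]                           # number of items in the dimension
    item_vars = x.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = x.sum(axis=1).var(ddof=1)    # variance of the item-sum score
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical ratings on the four "Expecting Quality" items for six ratings received.
demo = [[4, 5, 4, 5],
        [3, 3, 4, 3],
        [5, 5, 5, 4],
        [2, 3, 2, 3],
        [4, 4, 5, 5],
        [3, 4, 3, 3]]
print(round(cronbach_alpha(demo), 2))
```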
This index of reliability could not be calculated for the BARS instrument due to the fact that each team member was only rated on one item per dimension, per rater; however, G theory analysis does provide a form of reliability (discussed in more detail later) that can be calculated for both the Likert-type and BARS scales.

Correlations among the BARS dimensions and Likert-type dimensions are provided in Table 2. Cross-scale, within-dimension correlations (i.e., convergent validity estimates) were modest, with the exception of those for the dimensions of Interacting with teammates (.74) and Keeping the team on track (.74), which were adequate. Also of interest from the correlation matrix is the fact that correlations among dimensions for the BARS scale are lower than correlations among dimensions for the Likert-type scale. This may indicate that the BARS scale offers better discrimination among dimensions than the Likert-type scale.

G theory analyses were conducted using the MIVQUE0 method via the SAS VARCOMP procedure. This method makes no assumptions regarding the normality of the data, can be used to analyze unbalanced designs, and is efficient (Brennan, 2000). Results of the G theory analysis are provided in Table 3. These results support the convergence of the BARS and Likert-type scales. Specifically, the type of scale used explained relatively little variance in responses (e.g., 0% for Interacting with teammates and Keeping the team on track, and 5%, 10%, and 18% for Having relevant KSAs, Contributing to the team's work, and Expecting quality, respectively). Also, ratee effects explained a reasonable proportion of the variance in responses (e.g., 31% for Contributing to the team's work). Generalizability coefficients calculated for each dimension were adequate (Table 4). Note that Table 4 presents two coefficients: ρ (rho), which is also called the generalizability coefficient (the ratio of universe score variance to itself plus relative error variance) and is analogous to a reliability coefficient in Classical Test Theory (Brennan, 2000); and φ (phi), also called the dependability coefficient (the ratio of universe score variance to itself plus absolute error variance), which provides an index of absolute agreement.

It is important to note that in this study, rating scale format is confounded with time of measurement. Thus, the variance associated with scale (i.e., the two-way interactions as well as the main effect) may reflect true performance differences (i.e., differences between the two measurement occasions). As a result, the previous analyses may underestimate the consistency of measurement associated with the CATME. Consequently, we conducted a second set of G theory analyses in which we analyzed the data separately for each scale. The results from the second G theory analysis are presented in Table 5. These results were fairly consistent with the variance estimates provided by the first set of analyses. Generalizability coefficients (similar to estimates of reliability) calculated for each dimension on each scale are provided in Table 6. These estimates indicate similar characteristics for both rating scale formats.

Discussion

The primary goal of Study 1 was to examine the psychometric characteristics of the BARS version of the CATME relative to that of the previously developed Likert version. Results indicate that both scales provide relatively consistent measures of the teamwork dimensions assessed by the CATME.
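For reference, the coefficients reported in Tables 4 and 6 follow the standard generalizability-theory definitions. Assuming the fully crossed person (p) x rater (r) x scale (s) design described above, with n_r raters and n_s scale formats, the textbook forms are as follows (the unbalanced-design MIVQUE0 analysis supplies the variance-component estimates that enter these ratios):

```latex
\begin{align*}
\sigma^2(X_{prs}) &= \sigma^2_p + \sigma^2_r + \sigma^2_s
  + \sigma^2_{pr} + \sigma^2_{ps} + \sigma^2_{rs} + \sigma^2_{prs,e} \\[4pt]
\sigma^2_{\mathrm{rel}} &= \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{ps}}{n_s}
  + \frac{\sigma^2_{prs,e}}{n_r n_s}, \qquad
\sigma^2_{\mathrm{abs}} = \sigma^2_{\mathrm{rel}}
  + \frac{\sigma^2_{r}}{n_r} + \frac{\sigma^2_{s}}{n_s} + \frac{\sigma^2_{rs}}{n_r n_s} \\[4pt]
\rho &= \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{\mathrm{rel}}}, \qquad
\varphi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{\mathrm{abs}}}
\end{align*}
```

Here σ²_p is the universe-score (ratee) variance; ρ supports relative (rank-order) decisions about ratees, whereas φ supports absolute decisions and therefore also penalizes rater and scale main effects.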
The results of Study 1 do not, however, address the validity of the rating scale relative to external variables of interest. Therefore, we conducted a second study. STUDY 2: EXAMINING THE VALIDITY OF THE CATME BARS MEASURE The objective of Study 2 was to examine the validity of the CATME BARS rating instrument with respect to course grades as well as another published peer evaluation measure (the instrument created by Van Duzer and McMartin, 2000). Here we expect a high degree of convergence between the two peer evaluation measures, both in terms of overall ratings and individual items. In addition, we expect that, to the extent that the peer evaluation measures 15 AOM PAPER 13912 capture effective team performance, they should predict course grades in courses requiring team activity. Thus both measures should be significantly related to final course grades. Method Participants and Procedure Participants in this study were 104 students enrolled in four discussion sections of a freshman-level Introduction to Engineering course. We chose this course because 40% of each student’s grade is based on team-based projects. All participants were assigned to 5-member teams based on a number of criteria including grouping students who lived near one another and matching team-member schedules. Six students were excluded from the final data analysis due to missing data, thus the effective sample size was 98. As part of their coursework, students worked on a series of team projects over the course of the semester. All participants were required to provide peer evaluation ratings of each of their teammates at four points during the semester. Participants used the CATME BARS measure to provide ratings at Times 1 and 3 and the Van Duzer and McMartin measure at Times 2 and 4. Measures CATME BARS. Participants used the web-based CATME BARS measure presented in Appendix B to provide two sets of peer evaluations. As in study 1 there was a high degree of intercorrelation among the 5 CATME performance dimensions (mean r = .76). Consequently, we formed a composite index representing an overall rating as the mean rating across the 5 dimensions (coefficient alpha = .94). In addition, the correlation between the CATME rating composite across the two administrations was .44. We averaged these two measures to form an overall CATME rating composite. 16 AOM PAPER 13912 Van Duzer and McMartin (VM) Scale. The peer evaluation measure presented by Van Duzer and McMartin (2000) is comprised of 11 rating items. Respondents provided ratings on 4point scales of agreement (1= Disagree, 2 = Tend to Disagree, 3= Tend to Agree, 4 = Agree). We formed a composite score as the mean rating across the 11 items. The coefficient alpha for the composite was .91. The composite scores from first and second VM administrations had a correlation of .55. Thus, we averaged these two measures to form an overall VM rating composite. Course Grades. Final course grades were based on a total of 16 requirements. Each requirement was differentially weighted such that the total number of points obtainable was 1000. Of the total points, up to 50 points were assigned on the basis of class participation, which was directly influenced by the peer evaluations. Thus, we excluded the participation points from the final course score, resulting in a maximum possible score of 950. Results Descriptive statistics and intercorrelations for all study variables are presented in Table 7. 
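A minimal sketch of the composite-scoring steps described above is shown below, assuming a hypothetical long-format data layout; the column names, student IDs, and point totals are invented for illustration and are not the study data.

```python
import pandas as pd

# Hypothetical long-format peer ratings: one row per rating received, per administration.
ratings = pd.DataFrame(
    [   # ratee, time, contributing, interacting, keeping, quality, ksas
        ("s01", 1, 5, 4, 4, 5, 4), ("s01", 3, 5, 5, 4, 5, 4),
        ("s02", 1, 3, 3, 2, 3, 3), ("s02", 3, 3, 4, 3, 4, 3),
        ("s03", 1, 4, 4, 4, 4, 5), ("s03", 3, 4, 4, 3, 4, 4),
    ],
    columns=["ratee", "time", "contributing", "interacting",
             "keeping_on_track", "expecting_quality", "relevant_ksas"],
)
dims = ["contributing", "interacting", "keeping_on_track",
        "expecting_quality", "relevant_ksas"]

# Overall rating = mean across the five CATME dimensions, averaged over the
# ratings received within an administration, then across the two administrations.
per_admin = (ratings.assign(overall=ratings[dims].mean(axis=1))
                    .groupby(["ratee", "time"])["overall"].mean())
catme_composite = per_admin.groupby(level="ratee").mean()

# Relate the composite to final course points (illustrative values, maximum 950).
course_points = pd.Series({"s01": 880, "s02": 760, "s03": 830})
print(catme_composite)
print(catme_composite.corr(course_points))  # Pearson r, the statistic used in Study 2
```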
We next examined the correlation of each of the two overall rating composites with final course grade. As expected, both the VM measure and the CATME measure were significantly related to final course grades (r = .51 for the CATME; r = .60 for the VM). It is also interesting to note that the correlation between the composite peer ratings and the final course score increases as the semester progresses. The peer rating correlates with final course score .27 at Time 1 (CATME BARS), .38 at Time 2 (VM scale), .54 at Time 3 (CATME BARS), and .61 at Time 4 (VM scale). In addition, the overall mean ratings decrease over time while rating variability increases.

Our results indicate a high level of convergence between the CATME BARS and VM composites (r = .64, p < .01). In addition, the five components that make up the CATME ratings demonstrated differential relationships with the 11 individual VM items, with correlations ranging from -.02 to .58 (M = .34, Mdn = .39, SD = .15). Thus, we also examined this set of correlations for coherence. As can be seen in Table 8, the pattern of correlations was highly supportive of convergence across the two measures.

Discussion

The results of Study 2 provide evidence for both the convergent and predictive validity of the CATME BARS peer evaluation scale. As expected, the CATME scale demonstrates a high degree of convergence with another commonly used peer evaluation measure and a significant relationship with final course grades in a course requiring a high level of team interaction. It should also be noted that the CATME BARS measure offers a far more efficient measure of peer performance than does the longer Likert version of the CATME or the measure provided by Van Duzer and McMartin (2000).

GENERAL DISCUSSION

In this paper, we presented the development and initial testing of a behaviorally anchored rating scale version of the Comprehensive Assessment of Team Member Effectiveness. This instrument is for peer evaluation and self-evaluation by team members and may be useful to instructors who use team learning methods and need a simple but valid way to adjust team grades for individual students. Because this instrument measures the same categories as the Likert-scale version, it is based on the literature on team effectiveness and assesses five major ways by which members can contribute to their teams. It uses simple behavioral descriptions to anchor three levels of performance in each category.

In general, our results provide a high level of support for the CATME BARS instrument as a peer evaluation measure. Specifically, the CATME BARS instrument demonstrates psychometric characteristics equivalent to those of the much longer Likert version. In addition, the CATME scale demonstrates a high degree of convergence with another commonly used peer evaluation measure and a significant relationship with final course grades in a course requiring a high level of team interaction.

Because the initial testing of the CATME BARS instrument showed that the instrument was viable and that an on-line administration was useful, we believed that enhancing the on-line administration was worthwhile, and we created additional features for the on-line system to help students and instructors. Security features were added so that team members can complete the survey from any computer with internet access using a password-protected log-in, increasing both convenience and privacy for students.
A comments field was added so that students can type in comments that are visible only to their instructor. Instructors who wish to do so can insist that their students use the comments field to provide reasons that justify their ratings, in order to increase students' accountability for providing accurate ratings. These comments can also facilitate discussions between instructors and students. Because the ratings are all done electronically, instructors do not need to re-key the data in order to see all of the ratings.

To further lessen the time required for instructors to review and interpret the data, we added features to the system that calculate average ratings for each student in each of the five CATME categories, compare those to the team average, and compute a suggested adjustment factor for instructors who want to adjust team grades based on individual contributions. These adjustment factors are displayed in two ways: one that includes the self-rating and the peer ratings, and one that uses only the peer ratings. In addition, we developed a program that analyzes the rating patterns and flags exceptional conditions with color-coded messages that may indicate the need for the instructor's attention. These can indicate team-level situations, such as where the rating patterns indicate that the team has split into cliques, or individual-level situations, such as low-performing team members. We also created a secure system that instructors can use to electronically provide feedback to students indicating their self-rating, the average of how their teammates rated them, and the team average in each of the five categories measured by the CATME BARS instrument.

Because the BARS rating format in general, and the CATME BARS instrument in particular, are unfamiliar to students, we developed a demonstration instrument that instructors can use to help students understand the five areas of teamwork included in the instrument and how they are measured. This practice rating exercise asks users to rate four fictitious team members whose performance is described in written scenarios. After raters complete the rating practice, they receive feedback that shows how they rated each team member in each CATME category and how that rating compares to the expert ratings assigned by the researchers who created the rating scale.

Challenges, Limitations, and Future Research Needs

Our results indicate several problems that are familiar in peer evaluation research. Although we designed the instrument so that a “3” score would indicate satisfactory performance and a “5” would indicate excellent performance, the average score on all five dimensions was approximately 4.2. Students in these studies did not appear to use the full range of the scale, resulting in a restriction-of-range problem with the data. Although this is a common problem in peer evaluation research for a variety of reasons, including social pressures to give high ratings (Taggar & Brown, 2006), it reduces the accuracy of the peer ratings and makes it difficult to obtain strong correlations with predicted criteria.

There were also high correlations among the five factors. Research finds that team members who are highly skilled in areas related to the team's work also display better social skills and contribute more to team discussions (Sonnetag & Volmer, 2009).
Thus, there is probably a substantial amount of real correlation among CATME categories due to some team members being stronger or weaker contributors in many or all of the five areas measured by the instrument. For example, team members with strong relevant knowledge, skills, and abilities are more likely to contribute highly to the team’s work than students who lack the necessary skills to contribute. However, the high correlations among the CATME dimensions may also indicate the presence of halo error. A halo effect occurs when peers’ perceptions of a teammate as a good or bad team member affect their ratings in specific areas (Harris & Barnes-Farrell, 1997). A recent meta-analysis found that correlations among different dimensions of job performance that are rated by peers are inflated by 63% due to halo error (Viswesvaran et al., 2005). Instructors may be able to help students learn to rate more accurately by training students about the dimensions of teamwork represented in the rating scale that they use and by giving students practice using the ratings instrument before they use the instrument to rate their teammates. Frame-of-reference training is an established technique for training raters in performance appraisal settings that may be able to be adapted to help make self and peer appraisals more accurate in educational settings (Woehr & Huffcutt, 1994; Woehr, 1994). Although training may help students develop the ability to accurately rate teammates’ performance, training will not motivate students to rate accurately. To improve the accuracy of peer ratings, and thus improve the reliability and validity of peer evaluation scores, future research must identify ways to change the conditions that motivate team members to rate their teammates based on considerations other than the peers’ contributions to the team. Past research 21 AOM PAPER 13912 has found that both students and employees in organizations are reluctant to evaluate their peers, particularly for administrative purposes such as adjusting grades or determining merit-based pay (Bettenhausen & Fedor, 1997; Sheppard et al., 2004). Individuals who are required to participate in peer evaluation systems often resist the systems because they are concerned that peer evaluations will be biased by friendships, popularity, jealousy, or revenge. Recent research suggests that these concerns may be well-founded (Taggar & Brown, 2006). In particular, students who received positive peer ratings then liked their teammates better and rated them higher on subsequent peer evaluations, whereas students who received negative peer ratings liked their teammates less and gave them lower ratings on subsequent peer evaluations. This was the case even though only aggregated feedback from multiple raters was provided. Future research needs to identify the social pressures and moral codes that influence students’ rating patterns and what, if anything, instructors can do to make students want to provide accurate peer ratings. In our pilot tests and numerous discussions with students and other faculty, we found that many students do not attempt to complete peer evaluations carefully and accurately. This adds a large amount of random rater error or halo error to peer evaluation data. In our conversations with students, they have told us candidly that they often do not take peer evaluations seriously. 
They advised us to keep our surveys short, but also told us that regardless of the survey length, their answers to peer evaluations are usually governed by social concerns, a desire to avoid conflict with teammates, and unspoken norms about how one should rate a teammate. For many students, there is an unspoken rule that teammates should be given the highest rating unless the teammate’s performance falls seriously below the group’s expectations. Future research needs to examine the informal peer pressures that students use to 22 AOM PAPER 13912 reward or punish team teammates, and how these interact with the peer evaluation mechanisms that instructors put in place. References ABET 2000. Criteria for Accrediting Engineering Programs. Published by The Accreditation Board for Engineering and Technology (ABET), Baltimore, Maryland. Anderson, N.R., & West, M. A. 1998. Measuring climate for work group innovation: Development and validation of the team climate inventory. Journal of Organizational Behavior, 19: 235-258. Bettenhausen, K. L., & Fedor, D. B. 1997. Peer and upward appraisals: A comparison of their benefits and problems. Group & Organization Management, 22: 236-263. Boni, A. A., Weingart, L. R., & Evenson, S. 2009. Innovation in an academic setting: Designing and leading a business through market-focused, interdisciplinary teams. Academy of Management Learning & Education, 8: 407-417. Brennan, R. L. 1994. Variance components in generalizability theory. In C. R. Reynolds (Ed.), Cognitive assessment: A multidisciplinary perspective: 175–207. NY: Plenum Press. Brennan, R. L. 2000. Performance assessments from the perspective of generalizability theory. Applied Psychological Measurement, 24: 339-353. Burdett, J. 2003. Making groups work: University students’ perceptions. International Education Journal, 4: 177-190. Campbell, J. P., Dunnette, M. D., Arvey, R. D., and Hellervik, L. V. 1973. The development and evaluation of behaviorally based rating scales. Journal of Applied Psychology, 57: 15-22. 23 AOM PAPER 13912 Campion, M. A.,Medsker, G. J.,& Higgs, A. C. 1993. Relations between work group characteristics and effectiveness: Implications for designing effective work groups. Personnel Psychology, 46 : 823-850. Cassidy, S. 2006. Developing employability skills: Peer assessment in higher education. Education + Training, 48: 508-517. Chen, G., Donahue, L. M., & Klimoski, R. J. 2004. Training undergraduates to work in organizational teams. Academy of Management Learning and Education, 3: 27-40. Chen, Y., & Lou, H. 2004. Students’ perceptions of peer evaluation: An expectancy perspective. Journal of Education for Business, 79: 275-282. Commission on Accreditation for Dietetics Education 2008. 2008 Eligibility requirements and accreditation standards for coordinated programs in Dietetics (CP). Retrieved April 9, 2009 from http://www.eatright.org/ada/files/CP_Final_ERAS_Document_7-08.pdf. Conway, J. M., & Huffcutt, A. I. 1997. Psychometric properties of multisource performance ratings: A meta-analysis of subordinate, supervisory, peer, and self-ratings. Human Performance, 10: 331-360. Cronbach, L. J., Glesser, G. C., Nanda, H., & Rajaratnam, N. 1972. The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: John Wiley & Sons, Inc. Erez, A., LePine, J. A., & Elms, H. 2002. Effects of rotated leadership and peer evaluation on the functioning and effectiveness of self-managed teams: A quasi-experiment. Personnel Psychology, 55: 929-248. Fellenz, M. 
R. 2006. Toward fairness in assessing student groupwork: A protocol for peer evaluation of individual contributions. Journal of Management Education, 30: 570591. 24 AOM PAPER 13912 Gatfield, T. 1999. Examining student satisfaction with group projects and peer assessment. Assessment & Evaluation in Higher Education, 24: 365-377. Guzzo, R. A., Yost, P. R., Campbell, R. J., & Shea, G. P. 1993. Potency in groups: Articulating a construct. British Journal of Social Psychology, 32: 87-106. Gueldenzoph, L. E., & May, G. L. 2002. Collaborative peer evaluation: Best practices for group member assessments. Business Communication Quarterly, 65: 9-20. Harris, T. C., & Barnes-Farrell, J. L. 1997. Components of teamwork: Impact on evaluations of contributions to work team effectiveness. Journal of Applied Social Psychology, 27: 1694-1715. Harris, M. M., & Schaubroeck, J. 1988. A meta-analysis of self-supervisor, self-peer, and peersupervisor ratings. Personnel Psychology, 41: 43-62. Hedge, J. W., Bruskiewicz, K. T., Logan, K. K., Hanson, M. A., & Buck, D. 1999. Crew resource management team and individual job analysis and rating scale development for air force tanker crews (Technical Report No. 336). Minneapolis, MN: Personnel Decisions Research Institutes, Inc. Hoegl, M., & Gemuenden, H. G. 2001. Teamwork quality and the success of innovative projects: A theoretical concept and empirical evidence. Organization Science, 12: 435449. Hooijberg, R. & Lane, N. 2009. Using multisource feedback coaching effectively in executive education. Academy of Management Learning & Education, 8, 483-493. Hyatt, D. E., & Ruddy, T. M. 1997. An examination of the relationship between work group characteristics and performance: Once more into the breech. Personnel Psychology, 50: 553-585. 25 AOM PAPER 13912 Inderrieden, E. J., Allen, R. E., & Keaveny, T. J. 2004. Managerial discretion in the use of selfratings in an appraisal system: The antecedents and consequences. Journal of Managerial Issues, 16: 460-482. Kolb, A. Y., & Kolb, D. A. 2005. Learning styles and learning spaces: Enhancing experiential learning in higher Education. Academy of Management Learning and Education, 4: 193-212. Kruger, J., & Dunning, D. 1999. Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77: 1121-1134. Lawler, E. E., Mohrman, S. A., Benson, G. 2001. Organizing for high performance: Employee Involvement, TQM, Reengineering, and Knowledge Management in the Fortune 1000. San Francisco, CA: Jossey-Bass. Loyd, D. L., Kern, M. C., & Thompson, L. 2005. Classroom research: Bridging the ivory divide. Academy of Management Learning & Education, 4: 8-21. London, M., Smither, J. W., & Adsit, D. J. 1997. Accountability: The Achilles’ heel of multisource feedback. Group & Organization Management, 22: 162-184. Loughry, M. L., Ohland, M.W., and Moore, D.D. 2007. “Development of a Theory-Based Assessment of Team Member Effectiveness”. Educational and Psychological Measurement, 67: 505-524. MacDonald, H. A., & Sulsky, L. M. 2009. Rating formats and rater training redux: A contextspecific approach for enhancing the effectiveness of performance management. Canadian Journal of Behavioural Science, 41: 227-240 26 AOM PAPER 13912 McAllister, D. J. 1995. Affect-and cognition-based trust as foundations for interpersonal cooperation in organizations. Academy of Management Journal, 38: 24-59. McGourty, J., & De Meuse, K. P. 2001. 
The Team Developer: An assessment and skill building program. Student guidebook. New York: John Wiley & Sons, Inc. McIntyre, R. M., & Salas, E. 1995. Measuring and managing for team performance: Emerging principles from complex environments. In R. A. Guzzo & E. Salas (Eds.), Team effectiveness and decision making in organizations (pp. 9-45). San Francisco: JosseyBass. Michaelsen, L. K., Knight, A. B., & Fink, L. D. (Eds.). 2004. Team-based learning: A transformative use of small groups in college teaching. Sterling, VA: Stylus Publishing. Millis, B.J. and P.G. Cottell, Jr. 1998. Cooperative Learning for Higher Education Faculty. Phoenix, AZ: American Council on Education/Oryx Press. Moran, L., Musselwhite, E., & Zenger, J. H. 1996. Keeping teams on track: What to do when the going gets rough. Chicago, IL: Irwin Professional Publishing. Niehoff, B. P., & Moorman, R. H. 1993. Justice as a mediator of the relationship between methods of monitoring and organizational citizenship behavior. Academy of Management Journal, 36: 527-556. Ohland, M. W., Layton, R. A., Loughry, M. L., & Yuhasz, A. G. 2005. Effects of Behavioral Anchors on Peer Evaluation Reliability. Journal of Engineering Education, 94: 319-326. Raelin, J. 2006. Does action learning promote collaborative leadership? Academy of Management Learning and Education, 5: 152-168. Rosenstein, R., & Dickinson, T.L. (1996, August). The teamwork components model: An 27 AOM PAPER 13912 analysis using structural equation modeling.. In R.M. McIntyre (Chair), Advances in definitional team research. Symposium conducted at the annual meeting of the American Psychological Association, Toronto. Saavedra, R. & Kwun, S. K. 1993. Peer evaluation in self-managing work groups. Journal of Applied Psychology, 78: 450-462. Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. 1984. Metaanalyses of validity studies published between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37: 407-422. Shavelson, R. J., & Webb, N. M. 1991. Generalizability Theory. Newbury Park: Sage Publications, Inc. Sheppard, S., Chen, H. L. Schaeffer, E., Steinbeck, R., Neumann, H., & Ko, P. 2004. Peer assessment of student collaborative processes in undergraduate engineering education. Final Report to the National Science Foundation, Award Number 0206820, NSF Program 7431 CCLI-ASA. Shore, T. H., Shore, L. M., & Thornton III, G. C. 1992. Construct validity of self-and peer evaluations of performance dimensions in an assessment center. Journal of Applied Psychology, 77: 42-54. Smith, P. C., & Kendall, L. M. 1963. Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47: 149-155. Sonnetag, S. & Volmer, J. 2009. Individual-level predictors of task-related teamwork processes: The role of expertise and self-efficacy in team meetings. Group & Organization Management, 34, 37-66. 28 AOM PAPER 13912 Spich, R. S., & Keleman, K. 1985. Explicit norm structuring process: A strategy for increasing task-group effectiveness. Group & Organization Studies, 10: 37-59. Stevens, M. J., & Campion, M. A. 1994. The knowledge, skill, and ability requirements for teamwork: Implications for human resource management. Journal of Management, 20: 503-530. Stewart, G. L. 2006. A meta-analytic review of relationships between team design features and team performance. Journal of Management, 32: 29-55. Stevens, M. J., & Campion, M. A. 1999. 
Staffing work teams: Development and validation of a selection test for teamwork settings. Journal of Management, 25: 207-228. Taggar, S., & Brown, T. C. 2006. Interpersonal affect and peer rating bias in teams. Small Group Research, 37: 87-111. Taggar, S., & Brown, T. C. 2001. Problem-solving team behaviors: Development and validation of BOS and a hierarchical factor structure. Small Group Research, 32: 698726. Van Duzer, E., & McMartin, F. 2000. Methods to improve the validity and sensitivity of a self/peer assessment instrument. IEEE Transactions on Education, 43: 153-158. Verzat, Byrne, & Fayolle, 2009. Tangling with spaghetti: Pedagogical lessons from games. Academy of Management Learning and Education, 8, 356-369. Viswesvaran, C., Ones, D. S., & Schmidt, F. L. 1996. Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81: 557-574. Viswesvaran, C., Schmidt, F. L., & Ones, D. S. 2005. Is there a general factor in ratings of job performance? A meta-analytic framework for disentangling substantive and error influences. Journal of Applied Psychology, 90: 108-131. 29 AOM PAPER 13912 Weldon, E., Jehn, K. A., & Pradhan, P. 1991. Processes that mediate the relationship between a group goal and improved group performance. Journal of Personality and Social Psychology, 61: 555-569. Wheelan, S. A. 1999. Creating effective teams: A guide for members and leaders. Thousand Oaks, CA: Sage Publications. Woehr, D. J. 1994. Understanding frame-of-reference training: The impact of training on the recall of performance information. Journal of Applied Psychology, 79: 525-534. Woehr, D. J., & Huffcutt, A. I. 1994. Rater training for performance appraisal: A quantitative review. Journal of Occupational and Organizational Psychology, 67: 189-206. Zantow, K., Knowlton, D. S., & Sharp, D. C. 2005. More than fun and games: Reconsidering the virtues of strategic management simulations. Academy of Management Learning and Education 4: 451-458. 30 AOM PAPER 13912 TABLE 1 Descriptive Statistics for CATME BARS and Likert-type Scale Dimensions in Study 1 BARS Scale Contributing to the Team’s Work Interacting With Teammates Keeping the Team on Track Expecting Quality Having Relevant KSAs Likert-type Scale Contributing to the Team’s Work Interacting With Teammates Keeping the Team on Track Expecting Quality Having Relevant KSAs N Min Max Mean SD α 260 264 264 264 264 2.00 1.00 1.00 1.00 2.00 5.00 5.00 5.00 5.00 5.00 4.59 4.33 4.20 4.18 4.24 .62 .75 .78 .75 .71 ------ 273 273 273 273 273 1.88 2.20 2.14 2.50 2.25 5.00 5.00 5.00 5.00 5.00 4.31 4.32 4.19 4.60 4.44 .56 .54 .59 .49 .54 .90 .91 .87 .81 .78 31 AOM PAPER 13912 TABLE 2 Correlations among CATME BARS and Likert-type Scale Dimensions in Study 1 1 1 2 3 4 5 6 7 8 9 10 BARS Scale Contributing to the Team’s Work Interacting With Teammates Keeping the Team on Track Expecting Quality Having Relevant KSAs -.37** .26** .35** .35** BARS Scale 2 3 4 -.44** .51** .47** -.38** .50** -.52** Likert-type Scale Contributing to the Team’s Work .59** .63** .55** .51** Interacting With Teammates .40** .74** .66** .54** Keeping the Team on Track .47** .66** .74** .71** Expecting Quality .43** .50** .52** .41** Having Relevant KSAs .73** .48** .38** .44** Notes. N = 260-273; Convergent validity estimates are in bold. ** p < .01. 
TABLE 2
Correlations among CATME BARS and Likert-type Scale Dimensions in Study 1

                                          1      2      3      4      5      6      7      8      9     10
BARS Scale
  1. Contributing to the Team's Work     --
  2. Interacting With Teammates         .37**   --
  3. Keeping the Team on Track          .26**  .44**   --
  4. Expecting Quality                  .35**  .51**  .38**   --
  5. Having Relevant KSAs               .35**  .47**  .50**  .52**   --
Likert-type Scale
  6. Contributing to the Team's Work    .59**  .63**  .55**  .51**  .65**   --
  7. Interacting With Teammates         .40**  .74**  .66**  .54**  .58**  .71**   --
  8. Keeping the Team on Track          .47**  .66**  .74**  .71**  .78**  .80**  .81**   --
  9. Expecting Quality                  .43**  .50**  .52**  .41**  .50**  .68**  .63**  .65**   --
 10. Having Relevant KSAs               .73**  .48**  .38**  .44**  .45**  .73**  .57**  .64**  .59**   --

Notes. N = 260-273. Convergent validity estimates are the correlations between the BARS and Likert-type versions of the same dimension (variable pairs 1-6, 2-7, 3-8, 4-9, and 5-10). ** p < .01.

TABLE 3
Variance Estimates Provided by Generalizability Theory Models Tested on Each CATME Dimension in Study 1

                      Contributing to     Interacting With    Keeping the         Expecting           Having Relevant
                      the Team's Work     Teammates           Team on Track       Quality             KSAs
Variance Component    Estimate    %       Estimate    %       Estimate    %       Estimate    %       Estimate    %
Ratee                   .12     31.45%      .09     20.51%      .07     15.19%      .03      6.94%      .06     15.65%
Rater                   .04     11.02%      .14     33.10%      .19     39.45%      .06     12.24%      .05     11.98%
Scale                   .04      9.95%      .00      0.00%      .00      0.00%      .09     18.37%      .02      5.13%
Ratee x Rater           .04     10.48%      .06     14.92%      .07     15.61%      .05     10.20%      .06     14.67%
Ratee x Scale           .02      6.18%      .03      7.69%      .02      4.43%      .01      1.22%      .02      4.40%
Rater x Scale           .05     13.44%      .04      9.32%      .05     10.55%      .17     34.69%      .12     30.07%
Error                   .07     17.47%      .06     14.45%      .07     14.77%      .08     16.33%      .07     18.09%
Total Variance          .37                 .43                 .47                 .49                 .41

Note. Scale = BARS vs. Likert-type.

TABLE 4
Coefficients for Generalizability Theory Models Tested on Each CATME Dimension in Study 1

            Contributing to    Interacting With    Keeping the      Expecting    Having Relevant
            the Team's Work    Teammates           Team on Track    Quality      KSAs
ρ (rho)          .85                .76                 .75             .70           .75
φ (phi)          .71                .64                 .59             .31           .58

TABLE 5
Variance Estimates Provided by Generalizability Theory Models Tested on Each CATME Dimension for Each Scale in Study 1

                      Contributing to     Interacting With    Keeping the         Expecting           Having Relevant
                      the Team's Work     Teammates           Team on Track       Quality             KSAs
Variance Component    Estimate    %       Estimate    %       Estimate    %       Estimate    %       Estimate    %
BARS Scale
  Ratee                 .14     36.86%      .17     29.31%      .08     12.77%      .05      8.93%      .05     10.52%
  Rater                 .11     30.89%      .20     35.00%      .31     50.90%      .35     60.42%      .27     52.58%
  Ratee x Rater         .12     32.25%      .21     35.69%      .22     36.33%      .18     30.65%      .19     36.90%
  Total Variance        .37                 .58                 .61                 .57                 .50
Likert-type Scale
  Ratee                 .15     47.60%      .06     20.43%      .09     26.63%      .03     10.64%      .10     37.82%
  Rater                 .08     25.88%      .17     60.57%      .17     51.18%      .13     53.19%      .09     31.64%
  Ratee x Rater         .08     26.52%      .05     19.00%      .08     22.19%      .09     36.17%      .08     30.55%
  Total Variance        .31                 .28                 .34                 .24                 .28

TABLE 6
Coefficients for Generalizability Theory Models Tested on Each CATME Dimension for Each Scale in Study 1

                     Contributing to    Interacting With    Keeping the      Expecting    Having Relevant
                     the Team's Work    Teammates           Team on Track    Quality      KSAs
BARS Scale
  ρ (rho)                 .90                .87                 .74             .70           .70
  φ (phi)                 .82                .77                 .54             .44           .48
Likert-type Scale
  ρ (rho)                 .93                .90                 .91             .70           .91
  φ (phi)                 .88                .67                 .74             .49           .83
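For readers less familiar with generalizability theory, the coefficients in Tables 4 and 6 are conventionally computed from the variance components in Tables 3 and 5. The display below is a minimal sketch of the textbook formulas (e.g., Shavelson & Webb, 1991) for a fully crossed ratee x rater x scale design, where n_r is the number of raters and n_s the number of scale formats averaged over; it is offered as orientation rather than a restatement of the study's exact computations.

\[
\rho = \frac{\sigma^2_{\text{ratee}}}{\sigma^2_{\text{ratee}} + \sigma^2_{\delta}}
\qquad
\phi = \frac{\sigma^2_{\text{ratee}}}{\sigma^2_{\text{ratee}} + \sigma^2_{\Delta}}
\]

\[
\sigma^2_{\delta} = \frac{\sigma^2_{\text{ratee}\times\text{rater}}}{n_r}
 + \frac{\sigma^2_{\text{ratee}\times\text{scale}}}{n_s}
 + \frac{\sigma^2_{\text{error}}}{n_r n_s}
\qquad
\sigma^2_{\Delta} = \sigma^2_{\delta}
 + \frac{\sigma^2_{\text{rater}}}{n_r}
 + \frac{\sigma^2_{\text{scale}}}{n_s}
 + \frac{\sigma^2_{\text{rater}\times\text{scale}}}{n_r n_s}
\]

For the single-scale models in Tables 5 and 6 the scale terms drop out, leaving \(\sigma^2_{\delta} = \sigma^2_{\text{ratee}\times\text{rater}}/n_r\) and \(\sigma^2_{\Delta} = (\sigma^2_{\text{rater}} + \sigma^2_{\text{ratee}\times\text{rater}})/n_r\). Because only absolute error includes rater main-effect variance, the dimensions with the largest rater variance in Table 5 show the largest gaps between ρ and φ in Table 6.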
TABLE 7
Descriptive Statistics for and Correlations among All Study Variables

                       Mean      SD        1       2       3       4       5       6      7
1. CATME Time 1        4.20      .38      --
2. CATME Time 3        4.07      .59     .44**    --
3. CATME composite     4.14      .41     .77**   .91**    --
4. VM Time 2           3.72      .19     .53**   .53**   .62**    --
5. VM Time 4           3.65      .37     .30**   .58**   .55**   .55**    --
6. VM composite        3.69      .25     .42**   .63**   .64**   .79**   .95**    --
7. Course Points     822.73    53.55     .27**   .54**   .51**   .38**   .61**   .60**   --

** p < .01.

TABLE 8
Correlations between VM Items and CATME BARS Dimensions in Study 2

                                                                      Contributing to    Interacting with    Keeping the      Expecting    Having
VM Items                                                              the team's work    teammates           team on track    quality      relevant KSAs
 1. Failed to do an equal share of the work. (R)                           .45**              .32**               .41**           .37**        .15
 2. Kept an open mind/was willing to consider others' ideas.               .20*               .14                 .14             .40**        .19
 3. Was fully engaged in discussions during meetings.                      .44**              .33**               .42**           .35**        .41**
 4. Took a leadership role in some aspects of the project.                 .58**              .38**               .52**           .44**        .52**
 5. Helped group overcome differences to reach effective solutions.        .49**              .40**               .49**           .45**        .52**
 6. Often tried to excessively dominate group discussions. (R)            -.02                .04                 .02             .03          .06
 7. Contributed useful ideas that helped the group succeed.                .48**              .34**               .47**           .43**        .45**
 8. Encouraged group to complete the project on a timely basis.            .41**              .24*                .40**           .32**        .34**
 9. Delivered work when promised/needed.                                   .42**              .20*                .27**           .18          .37**
10. Had difficulty negotiating issues with members of the group. (R)       .27**              .39**               .16             .30**        .20*
11. Communicated ideas clearly/effectively.                                .54**              .42**               .53**           .43**        .52**

Notes. (R) = Reverse scored. * p < .05. ** p < .01.

APPENDIX A
Comprehensive Assessment of Team Member Effectiveness - Likert Short Version

Contributing to the Team's Work
Did a fair share of the team's work.
Fulfilled responsibilities to the team.
Completed work in a timely manner.
Came to team meetings prepared.
Did work that was complete and accurate.
Made important contributions to the team's final product.
Kept trying when faced with difficult situations.
Offered to help teammates when it was appropriate.

Interacting With Teammates
Communicated effectively.
Facilitated effective communication in the team.
Exchanged information with teammates in a timely manner.
Provided encouragement to other team members.
Expressed enthusiasm about working as a team.
Heard what teammates had to say about issues that affected the team.
Got team input on important matters before going ahead.
Accepted feedback about strengths and weaknesses from teammates.
Used teammates' feedback to improve performance.
Let other team members help when it was necessary.

Keeping the Team on Track
Stayed aware of fellow team members' progress.
Assessed whether the team was making progress as expected.
Stayed aware of external factors that influenced team performance.
Provided constructive feedback to others on the team.
Motivated others on the team to do their best.
Made sure that everyone on the team understood important information.
Helped the team to plan and organize its work.

Expecting Quality
Expected the team to succeed.
Believed that the team could produce high-quality work.
Believed that the team should achieve high standards.
Cared that the team produced high-quality work.

Having Relevant Knowledge, Skills, and Abilities
Had the skills and expertise to do excellent work.
Had the skills and abilities that were necessary to do a good job.
Had enough knowledge of teammates' jobs to be able to fill in if necessary.
Knew how to do the jobs of other team members.

APPENDIX B
Comprehensive Assessment of Team Member Effectiveness - BARS Version

Your name
Write the names of the people on your team, including your own name.

This self and peer evaluation asks about how you and each of your teammates contributed to the team during the time period you are evaluating. For each way of contributing, please read the behaviors that describe a "1," "3," and "5" rating. Then confidentially rate yourself and your teammates.

The rating form presents a scale of 5, 4, 3, 2, or 1 for each team member in each of five areas: Contributing to the Team's Work, Interacting with Teammates, Keeping the Team on Track, Expecting Quality, and Having Relevant Knowledge, Skills, and Abilities. The behavioral anchors for each area follow.
Contributing to the Team's Work
5: Does more or higher-quality work than expected. Makes important contributions that improve the team's work. Helps to complete the work of teammates who are having difficulty.
4: Demonstrates behaviors described in both 3 and 5.
3: Completes a fair share of the team's work with acceptable quality. Keeps commitments and completes assignments on time. Fills in for teammates when it is easy or important.
2: Demonstrates behaviors described in both 1 and 3.
1: Does not do a fair share of the team's work. Delivers sloppy or incomplete work. Misses deadlines. Is late, unprepared, or absent for team meetings. Does not assist teammates. Quits if the work becomes difficult.

Interacting with Teammates
5: Asks for and shows an interest in teammates' ideas and contributions. Improves communication among teammates. Provides encouragement or enthusiasm to the team. Asks teammates for feedback and uses their suggestions to improve.
4: Demonstrates behaviors described in both 3 and 5.
3: Listens to teammates and respects their contributions. Communicates clearly. Shares information with teammates. Participates fully in team activities. Respects and responds to feedback from teammates.
2: Demonstrates behaviors described in both 1 and 3.
1: Interrupts, ignores, bosses, or makes fun of teammates. Takes actions that affect teammates without their input. Does not share information. Complains, makes excuses, or does not interact with teammates. Accepts no help or advice.

Keeping the Team on Track
5: Watches conditions affecting the team and monitors the team's progress. Makes sure that teammates are making appropriate progress. Gives teammates specific, timely, and constructive feedback.
4: Demonstrates behaviors described in both 3 and 5.
3: Notices changes that influence the team's success. Knows what everyone on the team should be doing and notices problems. Alerts teammates or suggests solutions when the team's success is threatened.
2: Demonstrates behaviors described in both 1 and 3.
1: Is unaware of whether the team is meeting its goals. Does not pay attention to teammates' progress. Avoids discussing team problems, even when they are obvious.

Expecting Quality
5: Motivates the team to do excellent work. Cares that the team does outstanding work, even if there is no additional reward. Believes that the team can do excellent work.
4: Demonstrates behaviors described in both 3 and 5.
3: Encourages the team to do good work that meets all requirements. Wants the team to perform well enough to earn all available rewards. Believes that the team can fully meet its responsibilities.
2: Demonstrates behaviors described in both 1 and 3.
1: Satisfied even if the team does not meet assigned standards. Wants the team to avoid work, even if it hurts the team. Doubts that the team can meet its requirements.

Having Relevant Knowledge, Skills, and Abilities
5: Demonstrates the knowledge, skills, and abilities to do excellent work. Acquires new knowledge or skills to improve the team's performance. Able to perform the role of any team member if necessary.
4: Demonstrates behaviors described in both 3 and 5.
3: Has sufficient knowledge, skills, and abilities to contribute to the team's work. Acquires knowledge or skills needed to meet requirements. Able to perform some of the tasks normally done by other team members.
2: Demonstrates behaviors described in both 1 and 3.
1: Missing basic qualifications needed to be a member of the team. Unable or unwilling to develop knowledge or skills to contribute to the team. Unable to perform any of the duties of other team members.