IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 48, NO. 1, MARCH 2005

A Framework for Analyzing Levels of Analysis Issues in Studies of E-Collaboration

—MICHAEL J. GALLIVAN AND RAQUEL BENBUNAN-FICH

Abstract—There has been a proliferation of competing explanations regarding the inconsistent results reported in the e-collaboration literature since its inception. This study advances another possible explanation by investigating the range of multilevel issues that can be encountered in research on the use of synchronous or asynchronous group support systems. We introduce concepts of levels of analysis from the management literature and then examine all empirical studies of e-collaboration from seven information systems journals for the period 1999–2003. We identified a total of 54 studies of e-collaboration in these journals, and after excluding 18 nonconforming studies—those that were primarily conceptual, qualitative, or exploratory only—we analyzed the levels of analysis issues in the remaining 36 empirical studies. Based on our analysis and classification of these studies into six different clusters according to their levels of analysis, we found that a majority of these studies contain one or more problems of levels incongruence that cast doubt on the validity of their results. It is indeed possible that these methodological problems are in part responsible for the inconsistent results reported in this literature, especially since researchers' frequent decisions to analyze data at the individual level—even when the theory was formulated at the group level and the research setting featured individuals working in groups—may well have artificially inflated the authors' chances of finding statistically significant results. Based on our discussion of levels of analysis concepts, we hope to provide guidance to empirical researchers who study e-collaboration.

Index Terms—E-collaboration, group support systems (GSS), levels of analysis.

Manuscript received March 22, 2004; revised July 26, 2004. M. J. Gallivan is with the Department of Computer Information Systems, Robinson College of Business, Georgia State University, Atlanta, GA 30302 USA (email: mgallivan@gsu.edu). R. Benbunan-Fich is with the Computer Information Systems Department, Zicklin School of Business, Baruch College, City University of New York, New York, NY 10010 USA (email: raquel_benbunanfich@baruch.cuny.edu).

Understanding and enhancing the value of information technology (IT) within organizations is, arguably, the primary research objective of the information systems (IS) literature. Over the past 30 years, beginning with early studies by Lucas, researchers have sought to identify when and why computer technology delivers benefits to organizational members [1]. While this endeavor has evolved into distinct research streams examining the use of computer and communication technologies by individuals, groups, organizations, and interorganizational supply chains, the issues and insights from each research stream have important implications for one another. Because explanations regarding whether and how IT creates benefits carry value from one level of analysis to another, it is critical that IS researchers bear levels of analysis issues in mind as they formulate their theories, design their studies, and analyze their data. Our objective is to evaluate IS research conducted on electronic collaboration (henceforth e-collaboration) to examine levels of
analysis issues, and to assess the extent to which these concerns are appropriately addressed.

One example of the importance of consciously reflecting upon levels of analysis issues appears in the IT payoff literature. Although studies of IT payoff are generally conducted at the organizational level of analysis, whereas e-collaboration is usually studied at the group level, the problems that accompany misspecification of the appropriate level of analysis are important for all researchers to recognize and resolve [2]. In order to illustrate our topic, we offer the following analogy to issues that have plagued researchers in the IT payoff literature over the past decade.

Within the organizational IT payoff literature, there has been considerable emphasis on the so-called productivity paradox, a phenomenon first mentioned in the late 1980s by economist Stephen Roach [3], [4]. Over the subsequent 15 years, many studies investigating IT payoffs initially supported Roach's productivity paradox [5], [6], but more recent studies have rejected it and, instead, have demonstrated the considerable value of IT investments to firm-level performance [7]–[9]. Among the advances that have led to better insights into the consequences of IT investments have been studies that urged researchers to more clearly specify the levels of analysis at which their data are collected and analyzed (i.e., at the industry, firm, or business unit level) [5]. A second set of improvements has resulted from developing more precise construct definitions and statistical procedures for detecting so-called payoffs from IT investments—for example, specifying whether the benefits appear in terms of greater productivity, profitability, or consumer welfare [10], and whether a time lag exists between when the funds are invested and when payoffs appear [5]. A decade ago, Brynjolfsson underscored the importance of levels of analysis when he noted the difference between analyzing payoffs from IT investments at the firm level versus the industry level of analysis:

IT may be beneficial to individual firms, but unproductive from the standpoint of the industry as a whole or the economy as a whole: IT rearranges the shares of the pie without making it any bigger . . . [E]conomists have recognized for some time that . . . one firm's gain comes entirely at the expense of others, instead of by creating new wealth . . . IS researchers would draw very different conclusions from studies that examine industry-level benefits from IT investment (which may be absent or negative) vs. studies that examine firm-level benefits (which may be positive). Thus, misspecification of the appropriate level of analysis regarding where the benefits accrue would lead to incorrect conclusions regarding the value of IT investments. [5, p. 75]

Moreover, other recent studies have urged researchers to seek more complex theories to explain the conflicting results that characterize this area of study [11], such as process models [12], or mediating variables that link IT effects on specific processes to overall firm performance [13]. Based on these advances in research methods and in the precision of theoretical formulation, IT payoff studies in recent years have been able to consistently identify firm-level benefits of IT spending [7], [8], [14], thus refuting the productivity paradox.
Without a doubt, research on IT usage at the group level is an important research domain within the IS literature, yet researchers often fail to notice the parallels between group- and organizational-level research. Chan has noted that researchers at different levels of analysis often "talk past each other," and she claims that, with regard to these levels of analysis issues, the "schisms are getting more noticeable over time" [2, p. 241]. Most group support systems (GSS) research is plagued by seemingly contradictory findings that sometimes advocate the use of these technologies and at other times report little or no benefit. If any smoking gun exists (i.e., a study that challenges the value of GSS technologies, as Roach did when he first noted the productivity paradox), it is the study by Pinsonneault, Barki, Gallupe, and Hoppen, in which the authors concluded that the use of GSS for supporting group brainstorming created an "illusion of productivity" [15, p. 110]. Aside from this particular study, there appear to be no other studies that challenge the value of GSS technologies or the GSS research program as a whole. Yet the most optimistic conclusion one might offer regarding the past 20 years of GSS research is that the findings have been steadfastly inconsistent. It is unclear whether the fault lies in the use of overly deterministic epistemologies [16], [17], one-shot research approaches that neglect to consider group history and changes over time [16], [18], adherence to deterministic theories that assume an unproblematic "logic of determination" [11], failure to ensure task-technology fit within the research context studied [19], or other possible explanatory factors.

Our objective in this paper is to draw attention to another potential explanation in the literature on group-level IT usage: we argue that ongoing neglect of levels of analysis issues, which have been discussed in the management literature for over 20 years, may contribute to the confusion and inconsistent results concerning GSS usage and other forms of e-collaboration among users [20], [21]. We do so by attempting to bridge the gap, or schism, between researchers who focus on IT use and its impacts at different levels of analysis. We believe that the earlier problems and insights derived from research on IT payoffs (conducted at the firm level) indeed have important implications for IS researchers who study issues related to group-level IT use and impacts. We proceed by drawing attention to recent contributions to the levels of analysis debate within the management literature [22]–[24], arguing that the insights offered there have important implications for IS researchers studying e-collaboration in terms of how we theorize, operationalize our constructs, collect data, and conduct our analyses. Based on a review of 54 studies of e-collaboration published in seven leading IS journals during the period 1999–2003, we find that there has been insufficient attention to ensuring a good fit between the levels at which the theory is formulated and the levels at which data are collected and analyzed. In his recent commentary on the IT payoff literature, Kohli noted that:

past studies that had been looking for IT payoff at the economy or industry level should have been examining the impacts at the firm-level . . . [Given the prior history of] mixed or negative results . . . the business value of IT, or IT payoff, literature appears to face challenges to move from the macro level to micro level. [25, p.
1467]

In an analogous fashion, we believe that inadequate attention to levels of analysis concerns may be responsible for the contradictory findings in studies of GSS use and other forms of e-collaboration. This is also consistent with Poole and Van de Ven's advice that

many problems and solutions apparent at one level of organization manifest themselves in different and contradictory ways at other levels . . . Key dynamics can often be explained and understood at one level of organization as a result of processes occurring at another. [26, pp. 570–571]

Through our review of the GSS literature, we hope to make IS researchers aware of the levels of analysis problems that have characterized much research on GSS use and impacts, and to offer guidelines for redressing these problems in the future.

LITERATURE REVIEW

There is increasing evidence that, despite constant effort on the part of IS researchers over the past two decades, there has been little in the way of a cumulative body of findings on the use of IT to support e-collaboration. Within the past few years alone, several studies have reviewed and even meta-analyzed the literature on GSS use and its impacts [19], [27]–[29]. Among the problems identified in such analyses are the conclusions of Dennis and Wixom, who noted that:

For almost 20 years, researchers have been studying the effectiveness and efficiency of systems that support synchronous and asynchronous teams . . . Unfortunately, drawing some overall conclusions from this collective body of research about the general effect of GSS has not been easy because GSS findings have been relatively inconsistent . . . (Based on findings from two meta-analyses), GSS use was found to improve decision quality and the number of alternatives or ideas generated across studies, but . . . to increase the time taken to complete the task and reduce participant satisfaction . . . [In their] comprehensive nonstatistical analysis of more than 230 articles . . . [Fjermestad and Hiltz] found that in most cases, use of GSS led to no improvements, even in the applications believed to be most suited to GSS use ([namely] idea generation) . . . [27, pp. 235–236]

Gopal and Prasad, who advocated alternative epistemologies and longitudinal studies to follow groups over time, described the problems that such inconsistent results create for IS researchers:

There is wide acknowledgment that GDSS research results have been either inconsistent or nonexistent . . . Rather than narrowing down to some "truth," . . . there appears to be little accord among researchers . . . The inconsistency and the lack of significant results appear to have been particularly disturbing to the GDSS research community, with almost every new voice within the literature drawing attention to them and attempting to explain how the problem might be solved. The solutions proposed, unfortunately, have resulted in the proliferation of theories and models and in the fragmentation of views within the community. [16, p. 510]

In addition to calls for alternate research epistemologies and methods [16], several authors have called attention to the need for researchers to understand whether and how group members appropriate the various features that support group collaboration [30], to examine the level of task-technology fit [31], or to carefully attend to both sets of concerns [27].
Other proposed solutions have been to develop new constructs such as faithfulness of appropriation [32] and consensus on appropriation [33], or to employ theories that recognize inconsistency and a "logic of opposition" in their basic assumptions, rather than to expect consistency and technological determinism [11]. In this paper, we offer a different explanation for the inconsistency in research findings that characterizes previous studies of e-collaboration: we propose that researchers have neglected to attend to levels of analysis issues, and that the validity and trustworthiness of several prior findings may therefore be called into question. While we recognize that our claim may be controversial, our arguments are well supported by extensive research in the fields of psychology and management, and these issues are finally beginning to receive attention from IS researchers [25], [34]. Despite these nascent attempts to raise such issues, both within the traditional IS outlets and beyond the boundaries of the IS discipline, we believe that they merit increased attention within the IS community. Hence, we seek to convey our message to IS researchers who study e-collaboration through this special issue on "Expanding the Boundaries of E-Collaboration."

In reviewing studies of GSS use and other forms of e-collaboration, we examined a broader set of studies than have been discussed in previous review papers [19], [27]–[29]. While the set of papers we reviewed is more circumscribed in terms of the time period covered (compared to these prior review studies), we sought to review not only the traditional studies of electronic meeting systems or group decision support systems (GDSS) (i.e., technologies that support same-time, same-place meetings) but also studies of distributed teams and communities using asynchronous support tools. While the earlier review studies found a very small proportion of studies of asynchronous technologies among the overall set of GSS studies (only 8% of all GSS lab studies [28] and 16% of GSS field studies [29] focused on asynchronous support technologies), we deliberately searched for studies featuring distributed teams and asynchronous support tools to complement the traditional studies of GSS, GDSS, and electronic meeting systems. We included studies that examined the use of a broad range of group-related technologies, regardless of the specific labels employed by the authors (GSS, GDSS, electronic meeting systems, asynchronous collaboration, distributed or virtual teams, virtual communities, etc.).

Below, we introduce the core concepts and terminology related to levels of analysis issues and emphasize that these insights are valuable for all IS researchers who study e-collaboration among individuals and groups. Although the concepts that we introduce and discuss are often labeled as multilevel theory concerns, we strongly emphasize that awareness of these issues should not be restricted to researchers who regard themselves as multi-level researchers, or to those who explicitly conduct multi-level or cross-level research, but rather should extend to all IS researchers. Throughout the paper, we draw parallels between refinements in theory and methods that can help to improve the consistency of e-collaboration research and analogous advances that were previously achieved in the domain of firm-level IT payoff studies.
These advances have encouraged researchers who study IT payoffs at the firm or industry levels to more clearly specify the levels at which they expect the hypothesized effects to occur, and to follow appropriate statistical procedures for analyzing IT payoffs [6], [13].

DEFINITIONS AND LEVELS OF ANALYSIS CONCEPTS

Over the past several years, the set of issues that fall under the label of levels of analysis has become a niche research area in management and psychology, with scholars such as Katherine Klein [22]–[24], [35], [36], Steve Kozlowski [37], [38], and Fred Dansereau [22], [39] among the leading thinkers and writers. While there has been increasing interest in multi-level research and levels of analysis concerns in the management, psychology, and educational research literature in recent years, the underlying concepts were first explicated over 20 years ago by Denise Rousseau [20], [21]. In addition to such seminal studies of key multilevel concepts [40], statistical approaches for ensuring that data collected at the individual level can be properly aggregated to the group level have been in existence for over 20 years [41], [42].

Despite the longevity of these concepts and statistical methods concerning levels of analysis, we believe that there are two key reasons why these important ideas have been almost entirely neglected in the IS literature (with one notable exception [34]). First, there has been a proliferation of recent studies comparing and critiquing the various statistical approaches for examining measures of intercoder reliability and within-group homogeneity [37], [43]–[45]. In addition to being very complex—in terms of the level of mathematical sophistication required to understand them—these papers have inadvertently created the illusion that the primary issues are statistical ones. This is an unfortunate state of affairs because the key issues in understanding and correctly specifying the levels of analysis for research are actually conceptual rather than statistical concerns [22]. It is critical for all researchers to clearly understand and specify the levels of analysis at which their theories apply, even before they delve into the details of appropriate statistical techniques to ensure that their data collection and analytic methods conform to the level of their theories [22], [23], [38]. A second potential reason for the general neglect of this levels of analysis literature is that many authors consider it relevant only for researchers who explicitly develop multi-level or cross-level theories. This second misunderstanding is easily explained, since the first publications to explicate the key issues involved in levels of analysis featured titles such as "multi-level and cross-level perspectives," which implied that this topic was of concern only to multi-level or cross-level theorists [21].

While both misconceptions have contributed greatly to the lack of attention to levels of analysis issues in organizational research, we are not the first authors to acknowledge these contributing problems. A decade ago, psychologist Katherine Klein and her colleagues provided a cogent, nonstatistical explanation of the theoretical issues involved in specifying proper levels of analysis [22]. In numerous publications, and assisted by various co-authors, she has led the charge to develop a greater awareness of levels of analysis issues through special journal issues [24], focused monographs [23], and other research studies [36].
In particular, with regard to the misconception that levels of analysis issues should be of concern only to those researchers who regard their work as multi-level, Klein et al. acknowledge that prior guidelines on levels of analysis issues:

. . . create the inadvertent impression that attention to levels is only a priority for scholars who undertake mixed-level theory. But, precise articulation of the level of one's constructs is an important priority for all organizational scholars whether they propose single- or mixed-level theories . . . [There are] profound implications of specifying a given level or levels of a theory. Greater appreciation and recognition of these implications will . . . enhance the clarity, testability, comprehensiveness, and creativity of [all] organizational theories. [22, p. 196]

We acknowledge that the concepts we discuss below are not novel, but have been discussed by Klein and her colleagues [22], [36], Rousseau and her colleagues [20], [21], [46], and James and his colleagues [41], [42], [47] in years past. We believe, however, that while our observations below may not be groundbreaking to methodological experts within the IS field—those Cortina labels the "methods folks" [48, p. 339]—we are the first to bring these issues to the attention of a broader group of IS researchers, specifically those studying the use of synchronous and asynchronous tools for e-collaboration.

Levels of analysis concerns are important for all researchers to understand and address. Seminal works on levels issues [20], [22] have identified three domains at which levels issues may be considered: the level at which the theory is conceptualized, the level at which data are collected, and the level at which data are analyzed. While the level of data collection and the level of data analysis are largely self-explanatory, the notion of the level at which theory is conceptualized—the FOCAL UNIT—is not so straightforward. According to Rousseau [20]:

The level to which generalizations are made is the focal unit. In practice, the focal unit often is not identical to either the level of measurement or the level of [data] analysis. Researchers [may seek to] . . . measure an individual's sense of autonomy and the number of formal rules and regulations in an individual's job and conclude that an organization's technology affects its structure. [20, p. 4]

In both seminal articles, the authors explain that the level at which data are analyzed must conform to the focal unit of the theory [20], [22]. Where the focal unit is incongruent with the level of data analysis, problems of misspecification occur, leading to cross-level fallacies, contextual fallacies, and aggregation biases. Contrary to what most researchers believe, Klein et al. argue that it is not necessary that the level of data collection match the level of data analysis—and, in fact, they urge researchers to collect data at multiple levels of analysis [22]. While such advice may appear counterintuitive, Klein et al. argue that it is the level at which data are analyzed that must match the focal unit (but the level at which data are collected may be different—ideally, collecting data at a more "micro" level than the level at which data will be analyzed).
Thus, it is possible, and even desirable, for researchers to specify group-level hypotheses as their focal unit, but then to collect data at the individual level and analyze their data at the group level. In this regard, the focal unit indeed conforms to the level of data analysis, since both are at the group level, but the focal unit (i.e., the group level) is at a higher level of analysis than the level at which data are collected. This is both appropriate and desirable [22]. Researchers must, however, demonstrate that their data meet specific criteria before analyzing individual-level data at the group level of analysis; otherwise, problems of aggregation bias will occur [20], leading to findings that are statistically significant but possibly invalid [34].

Not only is it permissible for data to be collected one level lower than the level at which they are to be analyzed, but Klein et al. strongly encourage researchers to do so, because such lower-level data can be statistically analyzed to ensure that they exhibit the necessary attributes to be aggregated to the higher, group level [22]. In contrast, researchers who collect their data only at the group level (e.g., in order to test a group-level theory) have no choice but to assume that their constructs are valid at the group level; they cannot test this assumption statistically. According to Klein et al., it is always better for researchers to test the assumption that individual-level data can be statistically aggregated to the group level, rather than simply assume it to be true [22]. Thus, for example, in a study whose focal unit is the group level, it is desirable for the researcher to collect data at both the individual and the group levels, and then to statistically test whether individuals are more similar within groups than would occur by chance, thus demonstrating that the data can be analyzed at the group level. There are several statistical methods for doing so (which we briefly review in the next section). The important conclusion is that it is acceptable for the researcher to use individual-level data for testing group-level theories, but only if the data can be statistically shown to meet the criteria for conducting analyses at the group level (generally known as within-group homogeneity of variance or interrater reliability). Klein et al. explain why a demonstration of within-group homogeneity is such a critical threshold for researchers to meet when testing their theories at the group (or unit) level:

The very definition of such [group] constructs asserts that unit members agree in their perceptions of the relevant characteristics of the unit. In the absence of substantial within-unit agreement, the unit-level construct is untenable, moot . . . For example, in the absence of substantial agreement among the members of a unit about the unit's norms, the unit simply has no shared norms. [22, p. 4]

Below, we distinguish between three types of group-level constructs: global (measured at the group level), shared (measured at the individual level, but homogeneous within groups), and fuzzy or fictional (constructs that arguably do not exist at the group level) [43]. (There is another category, not discussed here, known as the "frog-pond effect" [22]. This concerns data that are measured individually but shown to be heterogeneous among members of a group.
Such constructs may be useful for showing how members' divergence from their group's mean can explain other constructs. An example is research on organizational demography, which shows how age, gender, or racial diversity within a management team influences group- or corporate-level performance.)

If a construct is measured at the group level, and there is no individual-level analog, then the measure is a global measure. Some examples are the group's mission and the quality of the group's output or performance. Other measures that are typically assumed to be global measures include group size and the number of unique ideas proposed by the group during a brainstorming task. If the construct is actually measured at the individual level and then aggregated up to the group level (usually through averaging or summing the individual-level data within each group), the construct is shared rather than global. In order for a construct to be shared at the group level, however, the individuals within each group must be relatively homogeneous in terms of their individual scores. One way to operationalize such within-group homogeneity is by requiring that the statistical variation of the scores within each given group be less than the variance among the individual scores across the various groups.

A number of complex statistical metrics have been proposed to measure within-group homogeneity (sometimes labeled intercoder agreement), with names such as eta-squared [44], rwg (within-group agreement [19]), the intra-class correlation coefficient [44], [45] (of which there are two versions, known as ICC-1 and ICC-2), and within-and-between analysis (WABA) [49]. The statistical methods for assessing such within-group homogeneity are complex, and have been explicated in more than a dozen papers in psychology and management journals dating back some 20 years [41], [42]. Some useful studies have recently been published that compare and contrast the various metrics for assessing within-group homogeneity [35], [45], examining issues such as how group size affects the various metrics [44]. Most of these metrics are used to assess whether it makes sense to use aggregated, individual-level data by examining the data for all groups on a specific construct (these include eta-squared, ICC-1, and ICC-2). Other metrics are used for testing whether it is appropriate to aggregate the data for a single group on a particular construct (rwg), in which case a separate rwg score can be derived for each group on each construct [42], [47]. One other metric, within-and-between analysis, examines the entire set of constructs for all groups, indicating whether it is permissible for the overall data to be aggregated to the group level [35], [45]. We note that there is no metric for assessing within-group homogeneity of a single group on multiple constructs. Thus, it makes no sense for a researcher to state that a single group was homogeneous across many constructs, or to report a single metric for inter-coder reliability for several constructs—although one recent study did just that [82].
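Although a full mathematical treatment of these metrics is beyond our scope, a minimal sketch may help make the simplest of them concrete. The fragment below (in Python) computes the single-item rwg index associated with James and his colleagues [41], [42] for one group on one construct; the ratings, the 5-point response scale, and the function name are hypothetical illustrations rather than data or code drawn from any of the studies reviewed here.

    import numpy as np

    def rwg_single_item(scores, n_options=5):
        # Within-group agreement (r_wg) for one group on one item: compares the
        # observed within-group variance to the variance expected if members had
        # responded at random (a uniform null distribution over the scale points).
        expected_var = (n_options ** 2 - 1) / 12.0   # variance of a discrete uniform null
        observed_var = np.var(scores, ddof=1)        # within-group sample variance
        rwg = 1.0 - (observed_var / expected_var)
        return max(rwg, 0.0)                         # negative values are conventionally treated as no agreement

    # Hypothetical ratings of "process satisfaction" by five members of one group (1-5 scale)
    print(rwg_single_item([4, 4, 5, 4, 4]))   # high agreement: r_wg near 1
    print(rwg_single_item([1, 5, 2, 4, 3]))   # low agreement: r_wg near 0

A commonly cited rule of thumb treats rwg values of roughly .70 or above as sufficient agreement to justify aggregation, and, as noted above, a separate value must be computed for each group on each construct.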
While it is beyond the scope of this paper to define these measures or to explain their mathematical formulae, two points are worth bearing in mind: (1) much contemporary writing already exists on the legitimacy of, and statistical methods for, aggregating individual-level data to the group level; and (2) if researchers provide no metrics to support their claim of within-group homogeneity, then it makes no sense to aggregate individual data up to the group level. The mere reporting of Cronbach's alpha or Cohen's kappa coefficients is inadequate, since these measures have no bearing on the decision to aggregate individual-level data to the group level. Without some statistical evidence for within-group homogeneity, the meaning of data that have been averaged or summed to the group level is unclear [38]. Bliese labels such constructs "fuzzy" [43, p. 369]. The mere fact that some aggregated measure of individual perceptions may be statistically significant in explaining other group-level constructs is not sufficient proof that the construct truly exists at the group level. Klein et al. are very resolute in their statement that "within-group agreement is a prerequisite for the aggregation of the individual-level data to the group level. In these models, within-group agreement is more than a statistical hurdle. It is an integral element in the definition of the group level construct" [36, p. 4]. This is a critical point. Even if researchers wish to use aggregated individual data to test a group-level theory, if the individual-level scores cannot be shown to exhibit within-group homogeneity, then the construct makes no sense at the group level of analysis. Klein et al. assert that:

if the level of statistical analysis matches the level of theory [e.g., the group level], yet the data do not conform to the predicted level of theory, a researcher may draw erroneous conclusions from the data. The importance of conformity of the data to theories predicting within-group homogeneity is relatively well known and well understood. [22, p. 199]

With this overview of levels of analysis issues and statistical techniques for assessing within-group homogeneity, the following section describes our research methods for examining recent literature on e-collaboration to determine the extent to which these levels issues are appropriately managed.

RESEARCH METHODS

To examine the levels of analysis issues described above, we analyzed recently published research on e-collaboration. We selected the three leading IS journals in North America (Information Systems Research, MIS Quarterly, and Journal of MIS); two top European journals (European Journal of IS and Information Systems Journal); and two specialized outlets: a technical journal (Decision Support Systems) and an e-commerce journal (International Journal of Electronic Commerce). We examined all papers published in these journals from 1999–2003. Although some recent review studies focused on just the three North American journals (for example, [2], [34]), we included a more comprehensive set of outlets where studies of e-collaboration are published. Our data collection period was one year longer than that of Walczuch and Watson, who conducted a similar review of whether researchers had appropriately taken group-level factors into account in their studies of GSS usage, based on four years of empirical studies [34].
To identify relevant articles, we used ABI/Inform and Business Source Premier to search the seven journals, using terms and phrases such as GSS, electronic meeting systems, groupware, virtual teams, and collaborative technology. Next, both authors reviewed each paper retrieved from the search to ensure that it was related to the use of IT for collaboration among individuals in groups, teams, or communities. We used three attributes of Fjermestad and Hiltz's definition to identify studies of e-collaboration, to wit:

First, the study had to be published in a refereed journal . . . Second, [those we included] . . . were studies of groups, which we defined as comprising at least three members . . . Third, they used a computer-based GDSS or GCSS [group communication support system] with at least minimal features designed to support group communication and decision-making processes. [28, p. 9]

We specifically did not follow the fourth criterion specified by Fjermestad and Hiltz—namely, that the study had to be a controlled experiment [28]. Instead, we deliberately sought diversity in terms of the research methods employed. In fact, similar to a recent study that identified three primary research methodologies that have been employed to investigate e-collaboration [50], the studies we retrieved employed a range of experimental, survey, and case research methods. Despite the variety of research methods and group technologies represented, we excluded studies conducted at the organizational level of analysis (e.g., firm-level case studies of electronic markets, alliances, and supply chain management [51]–[53], or studies that examined the adoption of interorganizational collaborative technologies such as EDI and B2B e-commerce [54]–[57]).

We found a total of 54 articles that met our search criteria. This included 37 papers from the three leading North American journals, seven papers from the European journals, and ten from the specialized publications. The total number of research articles published in these seven outlets during 1999–2003 was 989. This figure includes only research articles and research notes, and excludes research commentaries, issues and opinions pieces, editorial comments, and book reviews. A total of 388 articles were published in the leading North American journals, from which nearly 10% (37 papers) matched our search criteria. The distribution of this set of 37 articles was 57% from JMIS, 30% from MISQ, and 13% from ISR. The ratios of selected articles to total articles in these three North American general IS outlets were 4.5% for ISR, 12% for JMIS, and 11% for MISQ. These figures are similar to the proportion of studies that Vessey et al. identified at the group level of analysis in the same journals for an earlier time period (10.6%, 15.3%, and 8% for these three journals, respectively) [58]. Moreover, the larger number of studies on e-collaboration that we retrieved from JMIS (21 studies) than from ISR (5 studies) or MISQ (11 studies) is consistent with the findings of Vessey et al. [58], who showed that JMIS published more group-level studies than the other journals they examined. It was surprising to find so few qualifying studies in the European journals (just 7 out of 200), in the e-commerce journal (1 out of 100), and in the specialized technical journal (9 out of 301). Table I presents the list of the studies that met our search criteria, organized by publication.
TABLE I. Research articles on e-collaboration published from 1999–2003, by journal title

For each of the 54 studies that we identified, we coded the following information: the type of collaborative technology examined; the number of groups studied; group size; total number of individuals studied; the duration of the study; independent, dependent, and mediating variables; and the analytic methods employed. In terms of levels of analysis issues, we identified three domains, consistent with the arguments summarized above [21], [22]: (1) whether the initial theory and hypotheses were specified at the group or individual level, (2) the level at which data were collected, and (3) the level at which data were analyzed. If the researchers aggregated individual-level data to the group level, we also noted whether they included some metrics to establish within-group homogeneity or interrater reliability prior to aggregation. Similarly, if the researchers analyzed their data at the individual level, we noted whether they controlled for the group to which each subject was assigned, as recommended by Walczuch and Watson, and whether the subjects actually interacted with each other within their groups or instead worked alone.

In this regard, we noted some confusion in the terminology employed by researchers. In much of the empirical GSS research, the term group is employed to refer to the context in which task performance occurs (e.g., electronic brainstorming or other problem solving). Some researchers, however, employ the term group when they are simply referring to the treatment condition to which subjects were assigned. For example, in some studies, the treatment conditions were access to videoconferencing versus access to simple textual data. We consider these to be two different treatment conditions rather than two groups per se. In some studies, there were multiple groups within each treatment condition; in other studies, however, there were no groups, because the treatment was administered to individual subjects without any group interaction or communication. These subjects co-existed within a classroom setting or virtual community but did not have to produce a specific joint output (e.g., a group decision or group report) [34].

Of the total 54 studies, there were 18 "nonconforming studies," to use Fjermestad and Hiltz's terminology, that we were unable to analyze for levels of analysis issues [28, p. 77]. These studies met our definition of e-collaboration, but for various reasons, we were unable to examine the levels of analysis issues in the same manner as with the other published studies. This included two quantitative meta-analyses [19], [27]; two qualitative literature reviews [28], [29]; two conceptual papers [59], [60]; two methodological reviews [61], [62]; a qualitative, comparative case study analysis [63]; and six case studies that were conducted within a single group, project team, or firm [16], [64]–[68]. We also excluded three exploratory studies for which no a priori theory or hypotheses were stated [69]–[71], and in which only descriptive statistics were presented, rather than the results of any multivariate analysis or hypothesis testing. The last of these [71] collected quantitative and qualitative data from several groups, with the goal of showing how positivist and interpretive analyses of the same data can reveal different insights.
Although these three exploratory studies did collect and analyze quantitative data from multiple groups using a GSS, they stated no theory or hypotheses; we classified them as "nonconforming" because only descriptive statistics were reported, without any hypothesis testing. We excluded these 18 studies because quantitative meta-analyses combine data from many prior studies, and because qualitative case studies are not susceptible to the levels of analysis concerns that we described above [19], [27]. In this regard, we concur with Larsen, who noted that qualitative research has many advantages "due to its rich description, [but] research developed using quantitative approaches offer a higher degree of formalization in application of methods" [72, p. 170]. Finally, for the studies that examined only a single group or team [16], [64]–[68], it is meaningless to attempt to compute measures of within-group homogeneity for a single group [22]. After omitting these 18 nonconforming studies, 36 empirical studies of e-collaboration remained, which both authors read and coded [28].

RESULTS

We analyzed the sample of articles by identifying two sets of issues regarding congruence of levels. First, what is the level at which the theory is stated (the focal unit, per Rousseau [20]), and does it match the level at which the data were analyzed? Second, is the level at which the data were analyzed congruent with the nature of the data themselves? For instance, a group-level theory is congruent with group measures that are collected globally (e.g., group size, quality of group output, completion time, or number of unique ideas within a brainstorming exercise) or with individual-level data that can be statistically shown to be shared (i.e., homogeneous) among group members. As Klein et al. described, it is feasible to posit hypotheses at the group level and then aggregate individual-level data up to the group level, but such analyses make sense only if the researchers can first demonstrate that group members are more similar to their peers within the group than to members of other groups [22]. The authors must demonstrate such within-group homogeneity separately on each measure for which they wish to aggregate individual-level data up to the group level. There are a host of techniques for documenting within-group homogeneity of variance, as described above (e.g., [35], [43], [45]), and doing so is a necessary prerequisite to conducting the analysis at the group level.

Based on our review of these two sets of questions, we identified six distinct clusters of research articles. Table II summarizes the criteria describing each cluster, as well as any levels incongruence, and the studies corresponding to each cluster. Of the six clusters of articles that we identified, the findings from three of these clusters are valid and trustworthy (clusters 1, 2, and 5), because there is congruence between the level of analysis at which the theory is stated (the focal unit), the level at which data are analyzed, and the nature of the data themselves [20]. For the other clusters, there is some form of incongruence, whether between the focal unit and the level of data analysis (cluster 4) or between the nature of the data themselves and the methods used to analyze them (clusters 3 and 6).
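Before turning to the individual clusters, a second minimal sketch may help make the aggregation prerequisite concrete. The fragment below (in Python, with hypothetical data and column names) estimates ICC(1), one of the metrics mentioned earlier [44], [45], from a one-way decomposition of variance, and aggregates individual responses to group means only when group membership accounts for a nontrivial share of the variance. The .12 cutoff is purely illustrative, since the literature offers no single fixed threshold, and the calculation assumes roughly equal group sizes.

    import pandas as pd

    def icc1(df, group_col, y_col):
        # ICC(1): the proportion of variance in an individual-level measure that is
        # attributable to group membership, estimated from a one-way ANOVA.
        grand_mean = df[y_col].mean()
        groups = df.groupby(group_col)[y_col]
        n_total, n_groups = len(df), groups.ngroups
        k = n_total / n_groups                                     # average group size
        ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
        ss_within = groups.apply(lambda g: ((g - g.mean()) ** 2).sum()).sum()
        ms_between = ss_between / (n_groups - 1)
        ms_within = ss_within / (n_total - n_groups)
        return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

    def aggregate_if_homogeneous(df, group_col, y_col, cutoff=0.12):
        # Aggregate individual scores to group means only when the data support it;
        # otherwise the resulting group-level construct would be "fuzzy."
        if icc1(df, group_col, y_col) < cutoff:
            raise ValueError("Insufficient within-group homogeneity to aggregate.")
        return df.groupby(group_col)[y_col].mean()

    # Hypothetical example: three four-person groups rating "process satisfaction"
    df = pd.DataFrame({
        "group": ["g1"] * 4 + ["g2"] * 4 + ["g3"] * 4,
        "satisfaction": [4, 4, 5, 4, 2, 3, 2, 2, 5, 5, 4, 5],
    })
    print(icc1(df, "group", "satisfaction"))                  # about .87 for these data
    print(aggregate_if_homogeneous(df, "group", "satisfaction"))

In a real study, such a check (or the per-group rwg check shown earlier) would be reported for every construct that the researcher intends to aggregate, and a failure of within-group homogeneity would signal that the group-level construct is, in Bliese's terms, fuzzy.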
Below, we describe and give examples of each cluster, explaining how these studies exhibit levels of analysis congruence or incongruence.

Cluster 1 consists of studies where the level of analysis is unquestionably group level. This means that the focal unit at which the theory and hypotheses are stated, the nature of the data collected, and the analysis methods are all at the group level. To belong to this cluster, all of the data collected and analyzed should be global constructs measured at the group level—such as the amount of time to solve a problem or reach a decision, the total number of unique ideas generated during brainstorming, or the quality of the group's solution. The type of global performance measure depends upon the nature of the task the group is performing. For tasks with objectively correct answers, the accuracy of the group's decision is appropriate; for brainstorming tasks, the total number of unique ideas generated is a meaningful, global construct; and for judgment tasks, the quality of the group's output may be used. Conclusions drawn from such studies are valid because there is a strong fit between the group focal unit and the levels at which data were collected and analyzed. Based on our search, three studies conformed to cluster 1. Adkins et al. [73] and Benbunan-Fich et al. [74] partially corresponded to this cluster because most of their hypotheses (except for one in each case) were at the group level. We also placed the study by Pinsonneault et al. in this cluster because the theory, the data collection, and the data analysis were all at the group level [15]. Another study, by Barkhi, also partially conformed to cluster 1 [75]; however, only two of its hypotheses were stated at the group level of analysis, while four were formulated at the individual level, so we coded this study as belonging to cluster 6.

Cluster 2 consists of valid and trustworthy studies, where the focal unit of the theory was at the group level and all data were collected at the individual level, and then appropriately aggregated up to the group level before testing group-level hypotheses. By "appropriately aggregated," we mean that the authors specifically conducted and reported tests of within-group homogeneity of variance to show that subjects were indeed more similar to their peers within a given group than to other subjects across groups, and that such individual-level data could be safely aggregated to the group level—usually by averaging or summing individual survey responses. We found just two studies belonging to cluster 2: Piccoli and Ives [76] and Yoo and Alavi [77]. In both studies, the researchers examined and reported specific measures of within-group homogeneity (based on James's rwg, a form of inter-coder reliability) before they averaged the individual survey responses up to the group level [41]. Results from these studies can be considered valid and trustworthy, at least in terms of the levels of analysis issues discussed here.

TABLE II. Cluster classification of research articles on e-collaboration

Cluster 3 represents the first set of problematic studies, whose results are not guaranteed to be valid and trustworthy. These are studies where the focal unit was the group, but where the data were inappropriately aggregated from the individual level to the group level before testing a group-level theory or hypothesis.
By "inappropriately aggregated," we mean that the authors neglected to conduct or report any statistical tests of within-group homogeneity, which are needed to justify aggregation of individual-level data to the group level through averages or sums of individual-level scores. In the absence of such evidence of within-group homogeneity, we cannot be certain whether the researchers' use of group averages or sums of individual subjects' scores yields valid representations of the actual data. There were eight such studies in which the researchers averaged (or summed) the individual-level data to the group level without providing any justification for doing so: Huang and Wei [78]; Huang et al. [79]; Kahai and Cooper [80], [81]; Kayworth and Leidner [82]; Limayem and DeSanctis [83]; Tan et al. [84]; and Townsend et al. [85]. While in many cases the authors found support for their theories or hypotheses, such conclusions are questionable. In this regard, we reiterate the observation of Klein et al. regarding the problematic nature of any such conclusions: "if the level of statistical analysis matches the level of theory (i.e., both are at the group level), yet the data do not conform to the predicted level of theory, a researcher may draw erroneous conclusions" [22, p. 199].

Cluster 4 represents another set of problematic studies whose results are not guaranteed to be valid or trustworthy. These are studies in which the focal unit was the group, but in which all data were collected and analyzed at the individual level. There is a mismatch between the level of the focal unit (because the hypotheses were formulated at the group level) and the level at which the data were collected and analyzed (the individual level, without controlling for the different groups). The problem is that the authors have anthropomorphized some phenomenon, claiming that a given behavior occurs at the group level while only providing evidence of a different (but related) phenomenon at the individual level. Given this divergence between the group-level theory and the individual-level data and analysis, a "misspecification" or "fallacy of the wrong level" exists [21, p. 5]. Such incongruence may easily be remedied in one of two ways. First, the authors may restate their theory and hypotheses to refer to individuals or "individuals within groups," rather than trying to theorize about the behavior of group entities as a whole [22, p. 198]. The second way is to retain the theory at the group level and to continue analyzing individual-level data, but also to include a dummy variable representing each group in the statistical analysis, to detect possible differences between groups. This is the solution advocated by Walczuch and Watson, who argued that individual-level ANOVA analyses are improper for evaluating GSS data unless the researchers recognize and control for the manner in which subjects were clustered into groups [34]. We found three studies corresponding to cluster 4 in our review: Burke and Chidambaram [86], Grise and Gallupe [87], and Tung and Quaddus [88]. In the first study, the authors stated 21 group-level hypotheses, but then collected and analyzed individual-level data using ANOVA, ignoring any group-level effects—despite the fact that subjects worked and interacted in four-person groups [86].
Similarly, the other two studies formulated their theories at the group level, but they employed ANOVA analyses to test only individual-level data, again neglecting to take into account the fact that individuals had been assigned to small groups and interacted within those groups [87], [88]. Such individual-level analytic methods (e.g., ANOVA) treat all individuals as independent observations, thus ignoring the fact that they worked within different groups—and thus may have been subject to specific group-level effects. A simple dummy variable added to the ANOVA analysis (sometimes called a nested-ANOVA analysis) could easily resolve the problem. Without controlling for possible group-level effects, individual-level differences may be statistically supported even though, in some cases, the effect would be absent if the researchers had controlled for group membership. According to Walczuch and Watson [34], the distortion that results from ignoring group-level effects when analyzing such individual-level data is more problematic for larger groups than for smaller groups [44].

Cluster 5 consists of studies where the level is unquestionably individual, even though the study examines collaborative technologies. This means that the level at which the theory and hypotheses are stated, the nature of the data collected, and the analysis methods all appropriately occur at the individual level. The results from these studies may be considered valid and trustworthy. To belong to this cluster, all of the data collected and analyzed must be individual-level constructs, and the authors must have explicitly demonstrated that individual-level analysis is appropriate (either because the subjects were not assigned to work together in groups—and thus did not interact with each other—or because the subjects did work in teams, but the authors tested for and showed that within-group homogeneity was absent). Given the differences between these two conditions, we opted to split this cluster into two sub-clusters: cluster 5a (studies in which the subjects were not assigned to groups, and thus had no interaction with other members in the study) and cluster 5b (studies in which subjects were assigned to work in small groups, but where within-group homogeneity was explicitly tested for and shown to be absent). We found six studies conforming to cluster 5a (Garfield et al. [89], Hilmer and Dennis [90], Khalifa and Kwok [91], Koh and Kim [92], Piccoli et al. [93], and Reinig and Shin [18]) and just one study, by Alavi et al. [94], conforming to cluster 5b. The study by Khalifa and Kwok consisted of two separate empirical studies, one in which individual subjects did not interact within a group (corresponding to cluster 5a) and another in which the subjects did interact within their groups (thus corresponding to cluster 6). Since this article comprised two empirical studies within the same paper, we counted it as half a paper corresponding to cluster 5a and another half paper corresponding to cluster 6.

Regarding the studies corresponding to cluster 5a, some novel techniques employed by researchers to justify their individual levels of analysis are worth noting. For example, in a recent study conducted by Garfield et al., the subjects believed that they were collaborating with other team members in their group; however, the presumed members were fictitious [89].
Instead, the researchers employed a technology called a group simulator, which:

looks and acts like a groupware system, but instead of sharing ideas among participants, [the simulator] . . . presents participants with comments that appear to be from other participants, but which are, in fact, drawn from a database of preset ideas. Simulators increase experimental control by enabling a very specific and precise experimental environment. [89, p. 327]

Despite subjects' beliefs that they were interacting with other team members via a groupware technology, these subjects were exposed to controlled, identical feedback via the group simulator. All subjects were truly individuals in the experiment, and thus no group-level effects were possible. In the remaining studies corresponding to cluster 5a, subjects were not assigned to groups, but rather were studied as individuals working alone, often in classroom settings. In the single remaining study, which we classified as belonging to cluster 5b, Alavi et al. assigned students to 7–10 member student teams [94], yet an examination of within-group homogeneity statistics (based on James's rwg metric [41]) showed that subjects within each group were no more similar on the measured constructs than subjects were across groups. Having shown that an individual-level analysis was appropriate, these researchers analyzed all data at the individual level (using ANOVA), an approach that conformed to their individual-level hypotheses. In this example, Alavi et al. stated one hypothesis (H3) at the individual level and stated two hypotheses (H1, H2) in such a manner that either an individual- or a group-level analysis was possible [94]. The results of the studies corresponding to clusters 5a and 5b are valid, because the hypotheses, data collection, and analysis were all at the individual level. In summary, there were a total of 6.5 studies corresponding to cluster 5 (including clusters 5a and 5b).

The studies comprising cluster 6 are similar to those in cluster 5, with the exception that subjects did communicate and interact within groups, and thus the researchers should have taken group-level effects into account. Like the cluster 5 studies, the cluster 6 studies were those in which the theory was formulated at the individual level, and individual-level data were collected and analyzed. The problem with the studies corresponding to cluster 6 (which distinguishes them from cluster 5) is that subjects were assigned to work in groups, and thus any individual-level analysis ignores group-level effects (similar to the problems noted in cluster 4). What distinguishes the studies in cluster 6 from those in cluster 4 is that for studies in cluster 6, the overall theory is formulated at the individual level (e.g., "members of GSS-supported groups will exhibit more of some attribute, compared to members of face-to-face groups"), whereas the hypotheses in cluster 4 studies are formulated at the group level. In the studies corresponding to cluster 4, the group-level theory or hypotheses often took the form of statements such as "GSS groups will exhibit more of [some outcome variable], compared to face-to-face groups." Such cluster 4-type hypotheses concern the attributes or performance of the group entity as a whole, rather than of the individual members comprising the group.
In summary, there were a total of 6.5 studies corresponding to cluster 5 (clusters 5a and 5b combined).

The studies comprising cluster 6 are similar to those in cluster 5, except that subjects did communicate and interact within groups, and thus the researchers should have taken group-level effects into account. Like the cluster 5 studies, the cluster 6 studies are those in which the theory was formulated at the individual level, and individual-level data were collected and analyzed. The problem with the studies corresponding to cluster 6 (which distinguishes them from cluster 5) is that subjects were assigned to work in groups, and thus any individual-level analysis ignores group-level effects (similar to the problems noted in cluster 4). What distinguishes the studies in cluster 6 from those in cluster 4 is that in cluster 6 the overall theory is formulated at the individual level (e.g., "members of GSS-supported groups will exhibit more of some attribute, compared to members of face-to-face groups"), whereas the hypotheses in cluster 4 studies are formulated at the group level. In the studies corresponding to cluster 4, the group-level theory or hypotheses often took the form of statements such as "GSS groups will exhibit more of [some outcome variable], compared to face-to-face groups." Such cluster 4-type hypotheses concern the attributes or performance of the group entity as a whole, rather than of the individual members comprising the group.

For the cluster 6 studies, it is uncertain whether the individual-level data should have been analyzed at the individual or the group level, because the authors provided no evidence regarding within-group homogeneity that would settle the question. Since no tests of within-group homogeneity are reported, we cannot know whether the researchers' decision to analyze data at the individual level, although congruent with their focal unit (the individual), is appropriate given the properties of their data. We consider these studies problematic because they ignore possible group-level effects. Because these studies do state their theory at the individual level and conduct their analyses at the individual level, there is (at least) partial congruence. The problem is that, lacking evidence of within-group homogeneity or heterogeneity, it is unclear whether the appropriate level at which to test the theory is the group or the individual level. By including such evidence, as Alavi et al. did in the one study corresponding to cluster 5b, the researchers could have justified that their individual level of analysis was appropriate, but they failed to do so [94].

There were 12 studies that fully corresponded to cluster 6, over one-third of the total empirical studies of e-collaboration. These included studies by Dennis et al. [95]; Dennis and Garfield [96]; Hayne et al. [97]; Hender et al. [98]; Karahanna et al. [99]; Kwok et al. [100]; Kwok, Ma, and Vogel [101]; Lou et al. [102]; Miranda and Bostrom [103]; Reinig [104]; Sia et al. [105]; and Warkentin and Beranek [106]. In addition, we counted the paper by Khalifa and Kwok [91] as corresponding one-half to cluster 6, because one of its two embedded studies featured subjects interacting within groups. Also, the study by Barkhi mentioned above corresponded primarily to cluster 6, because most of its hypotheses were stated at the individual level and its data were analyzed at the individual level, despite the fact that subjects interacted within groups [75]. Like the other studies in cluster 6, the study by Barkhi [75] and the relevant embedded study within Khalifa and Kwok's paper [91] neglected to test for, or rule out, possible group-level effects (by examining within-group homogeneity or by adding group dummy variables to their ANOVAs). Thus, there were a total of 13.5 studies corresponding to cluster 6, or 37.5% of the empirical studies.

DISCUSSION AND CONTRIBUTIONS

Given the nature of e-collaboration as a field of inquiry, research in this area naturally encounters multilevel challenges that differ from those of other IS research domains. Some of the most widely used and tested theories are formulated at the group level, but empirical data are often collected at the individual level and then aggregated to the group level through simple means and sums. Often, researchers neglect to report whether their individual-level data are sufficiently homogeneous within groups to be suitable candidates for aggregation to the group level. We identified two clusters of what we consider to be highly problematic studies, which we labeled clusters 3 and 4. The results of the studies corresponding to these clusters cannot be considered valid and trustworthy, but for different reasons.
The eight studies corresponding to cluster 3 provide no evidence of within-group homogeneity, and therefore the researchers' decision to aggregate their individual-level data to the group level is inappropriate. Indeed, even speaking of a group-level construct (e.g., process satisfaction) may be untenable or moot if evidence of within-group homogeneity is lacking [22]. The three studies in cluster 4 are also open to challenge as valid and trustworthy, although here the problem is somewhat different, namely a mismatch between the level of the theory (the group level) and the level of the data collection and analysis (both at the individual level). Taken together, the studies corresponding to clusters 3 and 4 run counter to the warnings by Rousseau and by Klein et al. to ensure that the level of theory is congruent with the level at which data are analyzed [20], [22]. In total, almost one-third of the studies that we found (11 out of 36) corresponded to these highly problematic clusters.

However, judging by the number of studies classified as belonging to cluster 6, the most widely encountered multilevel issue occurs when researchers seek to test theories formulated at the individual level and collect and analyze individual-level data without examining potential group-level effects. Although the studies in this cluster do not suffer from the level-mismatch problems that Rousseau warned against, data collected from subjects working in groups violate one of the key assumptions for using ANOVA or regression analysis: the scores of members within a group are likely to be correlated, which violates the assumption of statistical independence [20]. When individuals are assigned to groups (and thus interact or communicate within these groups), the data are no longer independent but are subject to common group-level effects. When researchers erroneously analyze such data at the individual level, they artificially inflate the degrees of freedom, and hence the likelihood of rejecting the null hypothesis and finding apparent effects even when true individual-level effects are absent [34], [44]. In such cases, researchers should account for group-level effects in order for their results to be valid and trustworthy.

While we do not consider the 13.5 studies corresponding to cluster 6 to be problematic in exactly the same manner as those in clusters 3 and 4 (because with cluster 6, at least the level of theory matches the level of data analysis), these researchers are nevertheless violating the assumption of statistical independence in their data. To remedy this problem, they should test whether group-level effects exist before assuming that individual-level analyses are appropriate. As we described above, there are two ways to do this. One way is to examine within-group homogeneity and show that it is absent, as did Alavi et al. [94]. A second way is to conduct a nested ANOVA, adding group dummy variables to control for possible group-level effects [34]. Over one-third (13.5 out of 36) of the empirical studies of e-collaboration that we identified corresponded to this moderately problematic cluster 6, and their results should not be considered reliable and trustworthy.
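The degree to which grouping undermines an individual-level analysis can be approximated with the standard design-effect formula for equal-sized groups, DEFF = 1 + (n - 1) * ICC, where n is the group size and ICC is the intraclass correlation. The short sketch below is our own illustration of this point (it is not drawn from [34] or [44]), and the sample sizes and ICC value are hypothetical:

    # Effective sample size under grouping (design effect); hypothetical numbers for illustration.
    def effective_sample_size(num_subjects, group_size, icc):
        deff = 1.0 + (group_size - 1) * icc        # design effect for equal-sized groups
        return num_subjects / deff                 # number of effectively independent observations

    # 120 subjects in 24 five-person groups with a modest group-level effect (ICC = 0.20):
    print(effective_sample_size(120, 5, 0.20))     # approx. 66.7, far fewer than the nominal 120

Holding the ICC constant, a larger group size yields a larger design effect, which is consistent with the observation above that the distortion from ignoring group-level effects is more severe for larger groups [34], [44].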
Given that about one-third of the studies corresponded to the highly problematic clusters (clusters 3 and 4), and over one-third (37.5%) corresponded to the moderately problematic cluster 6, this leaves just under one-third (32%) of the empirical e-collaboration studies from the seven journals that we reviewed as exhibiting appropriate congruence between the level of theory and the levels of data collection and analysis. These relatively few studies (11.5 papers, to be exact) were divided among several clusters: those in which the focal unit and data were all at the group level (cluster 1, consisting of three studies); those in which the focal unit was at the group level, but where the authors collected individual-level data and justified their aggregation to the group level (cluster 2, consisting of two studies); and those in which the focal unit was the individual, and the subjects either worked independently or the authors statistically demonstrated a lack of within-group homogeneity (cluster 5a, consisting of 5.5 studies, and cluster 5b, consisting of one study, respectively). All of the studies corresponding to clusters 1, 2, and 5 may be considered valid and trustworthy, at least in the sense that they exhibit no levels of analysis incongruence.

The fact that just three studies correspond to cluster 1 indicates that researchers are rarely able to restrict their empirical measures of group behavior solely to global group constructs (i.e., constructs where the property exists for the group as a whole, and not for its individual members). While it is common for GSS researchers to feature some global constructs in their models (e.g., group size, decision quality, or completion time), very few studies consist only of such global constructs. More commonly, researchers include at least some individual-level constructs (e.g., individual process satisfaction) to complement the global measures. Regardless of the level of their theory or their data analysis, researchers are thus more inclined to collect individual-level data than to measure all group constructs globally: although they may posit theories and hypotheses at the group level, they are also likely to gather individual-level data. Of course, how researchers choose to treat such data is critical: whether they first justify that the data exhibit within-group homogeneity and thus can be appropriately aggregated to the group level (as in cluster 2), or instead neglect levels of analysis concerns by simply assuming that the data can be aggregated (as in cluster 3). According to our classification, only one study totally conformed to cluster 1, while two others partially conformed to this cluster. This means that few studies of e-collaboration can afford to limit themselves to global group-level constructs, and it underscores the importance of researchers understanding what they can and cannot do with the individual-level data that they collect.
In such cases, researchers may follow the appropriate guidelines from the management literature: either they ensure that their data can be appropriately aggregated to the group level in order to test group-level hypotheses (cluster 2), or they show that their study scenario and data are free from group-level effects and instead examine individual-level hypotheses (cluster 5). Before considering what we learned from the three problematic clusters (clusters 3, 4, and 6), we reiterate that all of the empirical studies collected at least some individual-level data; it is therefore very important for researchers to understand how to treat these data so that they are congruent with the level of data analysis and correspond to the focal unit of their theories. All researchers who collect individual-level data when studying the use of collaboration technologies by individuals, groups, and communities should bear these insights in mind.

Overall, over two-thirds of the studies that we analyzed (24.5 out of 36 studies, or 68%) corresponded to one of the three problematic clusters (3, 4, or 6), for which the validity of results and conclusions should be considered questionable. Within this group of studies, however, we have distinguished between studies that are highly problematic (30.5%, corresponding to clusters 3 and 4) and those that are moderately problematic (37.5%, corresponding to cluster 6). Given this surprisingly large fraction of studies whose statistical results are open to question, it seems likely that the persistent confusion and inconsistent results that have characterized the e-collaboration literature over the past two decades are due, at least in part, to the multilevel problems described above. By creating awareness of these multilevel issues, we hope to prompt the IS research community to pay greater attention to the important conceptual issues involving levels of analysis [20], [22], [24], and to the available statistical techniques for measuring within-group homogeneity [35], [41], [42], [45], [47], some of which have been in circulation for two decades or more. Just as the IS literature on firm-level payoffs from IT investments began to exhibit greater consistency once the levels of analysis confusion described by Brynjolfsson was resolved (along with other methodological advances), we believe that IS researchers who study group-level IT use may benefit from the insights described here [5].

The contributions of this paper are threefold. First, we have offered an alternative explanation for why studies of e-collaboration present divergent and inconsistent results. Unlike other voices in the literature that have attempted to explain how the problem of inconsistent findings might be solved, we do not propose alternative epistemologies [16], new theoretical lenses [30], [31], or new constructs [32], [33], nor do we advocate longitudinal studies of groups [16], [18]. Instead, we link the problem of inconsistent results in e-collaboration research to various forms of incongruence between the level at which theory is stated (the focal unit) and the levels at which data are collected and analyzed.
The second contribution of our work is to demonstrate empirically the extent of the problem of levels incongruence: we found that only 32% of the empirical studies in seven leading international IS journals (11.5 studies out of 36) were free of levels of analysis concerns, while nearly one-third of the studies were highly problematic (11 out of 36) and over one-third were moderately problematic (13.5 out of 36, or 37.5%). We consider this strong evidence of the severity of the problem of levels incongruence in e-collaboration research, and we strongly encourage IS researchers who study e-collaboration to read the seminal studies that explain the conceptual issues involving levels of analysis, particularly the classic pieces by Rousseau and her colleagues [40], [46], [76] and by Klein and her colleagues [22]–[24]. Of course, the insights from our work may be of value not only to researchers who empirically study e-collaboration but also to peer reviewers and journal editors who evaluate the merits of this research. We hope that this special issue of IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION can serve as a useful vehicle for our third contribution: a framework for classifying studies of GSS and e-collaboration that will be of value to reviewers and journal editors, as well as to researchers.

CONCLUSION

There has been a proliferation of competing explanations regarding the inconsistent results reported in the e-collaboration literature since its inception. This study advances another possible explanation by investigating the range of multilevel issues that can be encountered in this research. In order to avoid problems of levels incongruence, the level of the theory (the focal unit), the level of the data analysis, and the level at which data are collected must be consistent with one another. Our analysis of 36 studies of e-collaboration published in seven IS journals over the last five years found that over two-thirds of these studies contain one or more problems of levels incongruence that cast doubt on the validity of their findings. It is indeed possible that these methodological problems are in part responsible for the inconsistency of the results reported in this literature, especially since a researcher's decision to analyze data at the individual level, even when the research setting features individuals working in groups, will artificially increase the likelihood of finding artifactual or inaccurate results [34]. Such an outcome may have occurred in some of the studies that we classified as belonging to cluster 6. Conversely, the studies that inappropriately analyzed their data at the group level (based on aggregated individual-level data) would be less likely to find support for their hypotheses than if the data had been analyzed at the individual level (e.g., the studies corresponding to cluster 3). While we cannot definitively state whether the results of any specific study that we classified into clusters 3, 4, or 6 are valid, the possibility that the researchers reached inappropriate conclusions due to levels of analysis incongruence means that such findings should be interpreted cautiously, and scholars should not be surprised by the lack of consistency across studies [11], [16].
By reflecting more consciously on the levels of analysis issues described in this paper, IS researchers who study IT use for e-collaboration will, we believe, be better able to build a more solid, consistent, and trustworthy body of findings in the future.

REFERENCES

[1] H. C. Lucas, Why Information Systems Fail. New York: Columbia Univ. Press, 1974.
[2] Y. E. Chan, "IT value: The great divide between qualitative and quantitative and individual and organizational measures," J. Manage. Inform. Syst., vol. 16, no. 4, pp. 225–261, 2000.
[3] S. S. Roach, "America's technology dilemma: A profile of the information economy," Morgan Stanley Special Economic Study, Apr. 1987.
[4] S. S. Roach, "Services under siege: The restructuring imperative," Harvard Bus. Rev., pp. 82–92, Sept.–Oct. 1991.
[5] E. Brynjolfsson, "The productivity paradox of information technology," Commun. ACM, vol. 36, no. 12, pp. 67–77, 1993.
[6] G. M. Loveman, "An assessment of the productivity of information technology," in Information Technology and the Corporation of the 1990s, T. Allen and M. S. S. Morton, Eds. Cambridge, MA: MIT Press, 1994.
[7] E. Brynjolfsson and L. M. Hitt, "Paradox lost? Firm-level evidence on the returns to information systems spending," Manage. Sci., vol. 42, no. 4, pp. 541–558, 1996.
[8] E. Brynjolfsson and L. M. Hitt, "Beyond computation: Information technology, organizational transformation, and business performance," Journal of Economic Perspectives, vol. 14, no. 4, pp. 23–48, 2000.
[9] J. Dedrick, V. Gurbaxani, and K. L. Kraemer, "Information technology and economic performance: A critical review of the empirical evidence," ACM Computing Surveys, vol. 35, no. 1, pp. 1–28, 2003.
[10] L. M. Hitt and E. Brynjolfsson, "Productivity, business profitability, and consumer surplus: Three different measures of information technology value," MIS Quart., vol. 20, no. 2, pp. 121–142, 1996.
[11] D. Robey and M.-C. Boudreau, "Accounting for the contradictory organizational consequences of information technology: Theoretical directions and methodological implications," Inform. Syst. Res., vol. 10, no. 2, pp. 167–186, 1999.
[12] C. Soh and L. M. Markus, "How IT creates business value: A process theory synthesis," in Proc. 16th Int. Conf. Inform. Syst., J. I. DeGross, G. Ariav, C. Beath, R. Hoyer, and K. Kemerer, Eds., Amsterdam, The Netherlands, Dec. 1995, pp. 29–41.
[13] A. Barua, C. H. Kriebel, and T. Mukhopadhyay, "Information technologies and business value: An analytic and empirical investigation," Inform. Syst. Res., vol. 6, no. 2, pp. 3–23, 1995.
[14] T. F. Bresnahan, E. Brynjolfsson, and L. M. Hitt, "Information technology, workplace organization, and the demand for skilled labor: Firm-level evidence," Quart. J. Econ., vol. 117, no. 1, pp. 339–370, 2002.
[15] A. Pinsonneault, H. Barki, R. B. Gallupe, and N. Hoppen, "Electronic brainstorming: The illusion of productivity," Inform. Syst. Res., vol. 10, no. 2, pp. 110–133, 1999.
[16] A. Gopal and P. Prasad, "Understanding GDSS in symbolic context: Shifting the focus from technology to interaction," MIS Quart., vol. 24, no. 3, pp. 509–546, 2000.
[17] M. L. Markus and D. Robey, "Information technology and organizational change: Causal structure in theory and research," Manage. Sci., vol. 34, no. 5, pp. 583–598, 1988.
[18] B. A. Reinig and B. Shin, "The dynamic effects of group support systems on group meetings," J. Manage. Inform. Syst., vol. 19, no. 2, pp. 303–325, 2002.
[19] A. R. Dennis, B. H. Wixom, and R. J. Vandenberg, "Understanding fit and appropriation effects in group support systems via meta-analysis," MIS Quart., vol. 25, no. 2, pp. 167–193, 2001.
[20] D. M. Rousseau, "Issues of level in organizational research: Multi-level and cross-level perspectives," in Research in Organizational Behavior, L. L. Cummings and B. M. Staw, Eds. Greenwich, CT: JAI Press, 1985, vol. 7, pp. 1–37.
[21] D. M. Rousseau, "Characteristics of departments, positions and individuals: Contexts for attitudes and behavior," Admin. Sci. Quart., vol. 23, no. 4, pp. 521–540, 1978.
[22] K. J. Klein, F. Dansereau, and R. J. Hall, "Levels issues in theory development, data collection, and analysis," Acad. Manage. Rev., vol. 19, no. 2, pp. 195–229, 1994.
[23] K. J. Klein and S. W. J. Kozlowski, Eds., Multilevel Theory, Research, and Methods in Organizations. San Francisco, CA: Jossey-Bass, 2000.
[24] K. J. Klein, H. Tosi, and A. A. Cannella, "Multilevel theory building: Benefits, barriers and new developments," Acad. Manage. Rev., vol. 24, no. 2, pp. 243–248, 1999.
[25] R. Kohli, "In search of IT business value: Do measurement levels make a difference?," in Proc. 9th Amer. Conf. Inform. Syst., 2003, pp. 1465–1468.
[26] M. S. Poole and A. Van de Ven, "Using paradox to build management and organization theories," Acad. Manage. Rev., vol. 14, no. 4, pp. 562–580, 1989.
[27] A. R. Dennis and B. R. Wixom, "Investigating the moderators of the group support systems use with meta-analysis," J. Manage. Inform. Syst., vol. 18, no. 3, pp. 235–258, 2001/2002.
[28] J. Fjermestad and S. R. Hiltz, "An assessment of group support systems experimental research: Methodology and results," J. Manage. Inform. Syst., vol. 15, no. 3, pp. 7–149, 1998/1999.
[29] J. Fjermestad and S. R. Hiltz, "Group support systems: A descriptive evaluation of case and field studies," J. Manage. Inform. Syst., vol. 17, no. 3, pp. 115–159, 2000/2001.
[30] G. DeSanctis and M. S. Poole, "Capturing the complexity in advanced technology use: Adaptive structuration theory," Org. Sci., vol. 5, no. 2, pp. 121–147, 1994.
[31] I. Zigurs and B. K. Buckland, "A theory of task/technology fit and group support systems effectiveness," MIS Quart., vol. 22, no. 3, pp. 313–334, 1998.
[32] W. W. Chin, A. Gopal, and W. D. Salisbury, "Advancing the theory of adaptive structuration: Development of a scale to measure faithfulness of appropriation," Inform. Syst. Res., vol. 8, no. 4, pp. 342–367, 1997.
[33] W. D. Salisbury, W. W. Chin, A. Gopal, and P. R. Newsted, "Better theory through measurement: Developing a scale to capture consensus on appropriation," Inform. Syst. Res., vol. 13, no. 1, pp. 91–105, 2002.
[34] R. M. Walczuch and R. T. Watson, "Analyzing group data in MIS research: Including the effect of the group," Group Decision Negot., vol. 10, no. 1, pp. 83–94, 2001.
[35] K. J. Klein, P. D. Bliese, S. W. Kozlowski, F. Dansereau, M. B. Gavin, M. A. Griffin, D. A. Hofmann, L. R. James, F. J. Yammarino, and M. C. Bligh, "Multilevel analytical techniques: Commonalities, differences, and continuing questions," in Multilevel Theory, Research, and Methods in Organizations, K. J. Klein and S. W. Kozlowski, Eds. San Francisco, CA: Jossey-Bass, 2000, pp. 512–553.
[36] K. J. Klein, A. B. Conn, D. B. Smith, and J. S. Sorra, "Is everyone in agreement? An exploration of within-group agreement in employee perceptions of the work environment," J. Appl. Psych., vol. 86, no. 1, pp. 3–14, 2001.
[37] S. W. Kozlowski and K. Hattrup, "A disagreement about within-group agreement: Disentangling issues of consistency versus consensus," J. Appl. Psych., vol. 77, no. 2, pp. 161–167, 1992.
[38] S. W. Kozlowski and K. J. Klein, "A multilevel approach to theory and research in organizations," in Multilevel Theory, Research, and Methods in Organizations, K. J. Klein and S. W. J. Kozlowski, Eds. San Francisco, CA: Jossey-Bass, 2000, pp. 3–90.
[39] F. Dansereau, F. J. Yammarino, and J. C. Kohles, "Multiple levels of analysis from a longitudinal perspective," Acad. Manage. Rev., vol. 24, no. 2, pp. 346–357, 1999.
[40] K. H. Roberts, C. L. Hulin, and D. M. Rousseau, Developing an Interdisciplinary Science of Organizations. San Francisco, CA: Jossey-Bass, 1978.
[41] L. R. James, "Aggregation bias in estimates of perceptual agreement," J. Appl. Psych., vol. 67, no. 2, pp. 219–229, 1982.
[42] L. R. James, R. G. Demaree, and G. Wolf, "Estimating within-group interrater reliability with and without response bias," J. Appl. Psych., vol. 69, no. 1, pp. 85–98, 1984.
[43] P. D. Bliese, "Within-group agreement, nonindependence, and reliability: Implications for data aggregation and analysis," in Multilevel Theory, Research, and Methods in Organizations, K. J. Klein and S. W. J. Kozlowski, Eds. San Francisco, CA: Jossey-Bass, 2000, pp. 349–381.
[44] P. D. Bliese and R. H. Halverson, "Group size and measures of group-level properties: An examination of eta-squared and ICC values," J. Manage., vol. 24, no. 2, pp. 157–172, 1998.
[45] S. L. Castro, "Data analytic methods for the analysis of multilevel questions: A comparison of intraclass correlation coefficients, rwg, hierarchical linear modeling, WABA, and random group resampling," Leadership Quart., vol. 13, no. 1, pp. 69–93, 2002.
[46] R. House, D. M. Rousseau, and M. Thomas-Hunt, "The meso paradigm: A framework for integration of micro and macro organizational behavior," in Research in Organizational Behavior, vol. 17, L. L. Cummings and B. Staw, Eds. Greenwich, CT: JAI Press, 1995, pp. 71–114.
[47] L. R. James, R. G. Demaree, and G. Wolf, "rwg: An assessment of within-group interrater agreement," J. Appl. Psych., vol. 78, no. 2, pp. 306–309, 1993.
[48] J. M. Cortina, "Big things have small beginnings: An assortment of 'minor' methodological misunderstandings," J. Manage., vol. 28, no. 3, pp. 339–362, 2002.
[49] D. A. Waldman and F. J. Yammarino, "CEO charismatic leadership: Levels-of-management and levels-of-analysis effects," Acad. Manage. Rev., vol. 24, no. 2, pp. 266–285, 1999.
[50] N. Kock, "Action research: Lessons learned from a multi-iteration study of computer-mediated communication in groups," IEEE Trans. Profess. Commun., vol. 46, no. 2, pp. 105–120, 2003.
[51] E. Christiaanse and N. Venkatraman, "Beyond Sabre: An empirical test of expertise exploitation in electronic channels," MIS Quart., vol. 26, no. 1, pp. 15–38, 2002.
[52] H. G. Lee, T. Clark, and K. Y. Tam, "Research report: Can EDI benefit adopters?," Inform. Syst. Res., vol. 10, no. 2, pp. 186–195, 1999.
[53] G. E. Truman, "Integration in electronic exchange environments," J. Manage. Inform. Syst., vol. 17, no. 1, pp. 209–244, 2000.
[54] P. Chwelos, I. Benbasat, and A. S. Dexter, "Research report: Empirical test of an EDI adoption model," Inform. Syst. Res., vol. 12, no. 3, pp. 304–321, 2001.
[55] R. J. Kauffman, J. McAndrews, and Y.-M. Wang, "Opening the 'black box' of network externalities in network adoption," Inform. Syst. Res., vol. 11, no. 1, pp. 61–94, 2000.
[56] H. H. Teo, K. K. Wei, and I. Benbasat, "Predicting intention to adopt interorganizational linkages: An institutional perspective," MIS Quart., vol. 27, no. 1, pp. 19–50, 2003.
[57] Y. A. Au and R. J. Kauffman, "Should we wait? Network externalities, compatibility, and electronic billing adoption," J. Manage. Inform. Syst., vol. 18, no. 2, pp. 47–63, 2001.
[58] I. Vessey, V. Ramesh, and R. L. Glass, "Research in information systems: An empirical study of diversity in the discipline and its journals," J. Manage. Inform. Syst., vol. 19, no. 2, pp. 129–174, 2002.
[59] R. B. Johnston and S. Gregor, "A theory of industry-level activity for understanding the adoption of interorganizational systems," Eur. J. Inform. Syst., vol. 9, no. 4, pp. 243–251, 2000.
[60] A. Morton, F. Ackerman, and V. Belton, "Technology-driven and model-driven approaches to group decision support: Focus, research philosophy, and key concepts," Eur. J. Inform. Syst., vol. 12, no. 2, pp. 110–126, 2003.
[61] M. Mandviwalla and S. Khan, "Collaborative object workspaces (COWS): Exploring the integration of collaboration technology," Decision Support Syst., vol. 27, no. 3, pp. 241–254, 1999.
[62] M. J. McQuaid, T.-H. Ong, H. Chen, and J. F. Nunamaker, "Multidimensional scaling for group memory visualization," Decision Support Syst., vol. 27, no. 1–2, pp. 163–176, 1999.
[63] A. R. Dennis, T. Carte, and G. G. Kelly, "Breaking the rules: Success and failure in groupware-supported business process reengineering," Decision Support Syst., vol. 36, no. 1, pp. 31–47, 2003.
[64] R. O. Briggs, M. Adkins, D. Mittleman, J. Kruse, S. Miller, and J. F. Nunamaker, "A technology transition model derived from field investigation of GSS use," J. Manage. Inform. Syst., vol. 15, no. 3, pp. 151–196, 1998/1999.
[65] R. O. Briggs, G.-J. De Vreede, and J. F. Nunamaker, "Collaboration engineering with thinklets to pursue sustained success with group support systems," J. Manage. Inform. Syst., vol. 19, no. 4, pp. 31–64, 2003.
[66] A. Majchrzak, R. Rice, A. Malhotra, and N. King, "Technology adaptation: The case of a computer-supported inter-organizational virtual team," MIS Quart., vol. 24, no. 4, pp. 569–600, 2000.
[67] A. Malhotra, A. Majchrzak, R. Carman, and V. Lott, "Radical innovation without collocation: A case study at Boeing-Rocketdyne," MIS Quart., vol. 25, no. 2, pp. 229–249, 2001.
[68] J. Scott, "Facilitating interorganizational learning with information technology," J. Manage. Inform. Syst., vol. 17, no. 2, pp. 81–113, 2000.
[69] G.-J. De Vreede, N. Jones, and R. J. Mgaya, "Exploring the application and acceptance of group support systems in Africa," J. Manage. Inform. Syst., vol. 15, no. 3, pp. 197–234, 1998/1999.
[70] A. P. Massey, M. M. Montoya-Weiss, and Y. Hung, "Because time matters: Temporal coordination in global virtual project teams," J. Manage. Inform. Syst., vol. 19, no. 4, pp. 129–155, 2003.
[71] E. M. Trauth and L. M. Jessup, "Understanding computer-mediated discussions: Positivist and interpretive analyses of group support system use," MIS Quart., vol. 24, no. 1, pp. 43–79, 2000.
[72] K. R. T. Larsen, "A taxonomy of antecedents of information systems success: Variable analysis studies," J. Manage. Inform. Syst., vol. 20, no. 2, pp. 160–246, 2003.
[73] M. Adkins, M. Burgoon, and J. F. Nunamaker, "Using group support systems for strategic planning with the United States Air Force," Decision Support Syst., vol. 34, no. 3, pp. 315–337, 2003.
[74] R. Benbunan-Fich, S. R. Hiltz, and M. Turoff, "A comparative content analysis of face-to-face vs. asynchronous group decision making," Decision Support Syst., vol. 34, no. 4, pp. 457–469, 2003.
[75] R. Barkhi, "The effects of decision guidance and problem modeling on group decision-making," J. Manage. Inform. Syst., vol. 18, no. 3, pp. 259–283, 2001.
[76] G. Piccoli and B. Ives, "Trust and the unintended effects of behavior control in virtual teams," MIS Quart., vol. 27, no. 3, pp. 365–393, 2003.
[77] Y. Yoo and M. Alavi, "Media and group cohesion: Relative influences on social presence, task participation, and group consensus," MIS Quart., vol. 25, no. 3, pp. 371–390, 2001.
[78] W. W. Huang and K. K. Wei, "An empirical investigation of the effects of group support systems (GSS) and task type on group interactions from an influence perspective," J. Manage. Inform. Syst., vol. 17, no. 2, pp. 181–206, 2000.
[79] W. W. Huang, K. K. Wei, R. T. Watson, and B. Tan, "Supporting virtual team-building with a GSS: An empirical investigation," Decision Support Syst., vol. 34, no. 4, pp. 359–367, 2003.
[80] S. S. Kahai and R. B. Cooper, "Exploring the core concepts of media richness theory: The impact of cue multiplicity and feedback immediacy on decision quality," J. Manage. Inform. Syst., vol. 20, no. 1, pp. 263–300, 2003.
[81] S. S. Kahai and R. B. Cooper, "The effect of computer-mediated communication on agreement and acceptance," J. Manage. Inform. Syst., vol. 16, no. 1, pp. 165–188, 1999.
[82] T. R. Kayworth and D. E. Leidner, "Leadership effectiveness in global virtual teams," J. Manage. Inform. Syst., vol. 18, no. 3, pp. 7–40, 2001.
[83] M. Limayem and G. DeSanctis, "Providing decisional guidance for multicriteria decision making in groups," Inform. Syst. Res., vol. 11, no. 4, pp. 386–401, 2000.
[84] B. Tan, K. K. Wei, and J.-E. Lee-Partridge, "Effects of facilitation and leadership on meeting outcomes in a group support system environment," Eur. J. Inform. Syst., vol. 8, no. 4, pp. 233–246, 1999.
[85] A. M. Townsend, S. M. Demarie, and A. R. Hendrickson, "Desktop video conferencing in virtual workgroups: Anticipation, system evaluation and performance," Inform. Syst. J., vol. 11, no. 3, pp. 213–227, 2001.
[86] K. Burke and L. Chidambaram, "How much bandwidth is enough? A longitudinal examination of media characteristics and group outcomes," MIS Quart., vol. 23, no. 4, pp. 557–579, 1999.
[87] M. Grise and B. Gallupe, "Information overload: Addressing the productivity paradox in face-to-face electronic meetings," J. Manage. Inform. Syst., vol. 16, no. 3, pp. 157–185, 1999.
[88] L. L. Tung and M. A. Quaddus, "Cultural differences explaining the differences in results in GSS: Implications for the next decade," Decision Support Syst., vol. 33, no. 2, pp. 177–199, 2002.
[89] M. J. Garfield, N. J. Taylor, A. R. Dennis, and J. W. Satzinger, "Modifying paradigms: Individual differences, creativity techniques, and exposure to ideas in group idea generation," Inform. Syst. Res., vol. 12, no. 3, pp. 322–333, 2001.
[90] K. Hilmer and A. R. Dennis, "Stimulating thinking: Cultivating better decisions with groupware through categorization," J. Manage. Inform. Syst., vol. 17, no. 3, pp. 93–114, 2000.
[91] M. Khalifa and R. Kwok, "Remote learning technologies: Effectiveness of hypertext and GSS," Decision Support Syst., vol. 26, no. 3, pp. 195–207, 1999.
[92] J. Koh and Y.-G. Kim, "Sense of virtual community: A conceptual framework and empirical validation," Int. J. Electron. Commerce, vol. 8, no. 2, pp. 75–93, 2003/2004.
[93] G. Piccoli, R. Ahmad, and B. Ives, "Web-based virtual learning environments: A research framework and a preliminary assessment of effectiveness in basic IT skills training," MIS Quart., vol. 25, no. 4, pp. 401–426, 2001.
[94] M. Alavi, G. M. Marakas, and Y. Yoo, "A comparative study of distributed learning environments on learning outcomes," Inform. Syst. Res., vol. 13, no. 4, pp. 404–415, 2002.
[95] A. R. Dennis, J. E. Aronson, W. G. Heninger, and E. D. Walker, "Structuring time and task in electronic brainstorming," MIS Quart., vol. 23, no. 1, pp. 95–108, 1999.
[96] A. R. Dennis and M. J. Garfield, "The adoption and use of GSS in project teams: Toward more participative processes and outcomes," MIS Quart., vol. 27, no. 2, pp. 289–323, 2003.
[97] S. C. Hayne, C. E. Pollard, and R. E. Rice, "Identification of comment authorship in anonymous group support systems," J. Manage. Inform. Syst., vol. 20, no. 1, pp. 301–330, 2003.
[98] J. M. Hender, D. L. Dean, T. L. Rodgers, and J. F. Nunamaker, "An examination of the impact of stimuli type and GSS structure on creativity: Brainstorming versus nonbrainstorming techniques in a GSS environment," J. Manage. Inform. Syst., vol. 18, no. 4, pp. 59–86, 2002.
[99] E. Karahanna, M. Ahuja, M. Srite, and J. Galvin, "Individual differences and relative advantage: The case of GSS," Decision Support Syst., vol. 32, no. 4, pp. 327–341, 2002.
[100] R. Kwok, J.-N. Lee, M. Huynh, and S.-M. Pi, "Role of GSS on collaborative problem-based learning: A study on knowledge externalization," Eur. J. Inform. Syst., vol. 11, no. 2, pp. 98–107, 2002.
[101] R. Kwok, J. Ma, and D. R. Vogel, "Effects of group support systems and content facilitation on knowledge acquisition," J. Manage. Inform. Syst., vol. 19, no. 3, pp. 185–229, 2002.
[102] H. Lou, W. Luo, and D. Strong, "Perceived critical mass effect on groupware acceptance," Eur. J. Inform. Syst., vol. 9, no. 2, pp. 91–103, 2000.
[103] S. M. Miranda and R. P. Bostrom, "Meeting facilitation: Process versus content interventions," J. Manage. Inform. Syst., vol. 15, no. 4, pp. 89–114, 1999.
[104] B. A. Reinig, "Toward an understanding of satisfaction with the process and outcomes of teamwork," J. Manage. Inform. Syst., vol. 19, no. 4, pp. 65–83, 2003.
[105] C. L. Sia, B. C. Y. Tan, and K. K. Wei, "Group polarization and computer-mediated communication: Effect of communication cues, social presence and anonymity," Inform. Syst. Res., vol. 13, no. 1, pp. 70–90, 2002.
[106] M. Warkentin and P. M. Beranek, "Training to improve virtual team communication," Inform. Syst. J., vol. 9, no. 4, pp. 271–289, 1999.

Michael Gallivan is an Associate Professor in the Computer Information Systems Department at Georgia State University's Robinson College of Business. He conducts research on human resource practices for managing IT professionals, as well as strategies for managing effective IT implementation, IT outsourcing, and interorganizational alliances. He received his Ph.D. from the MIT Sloan School of Management. His research has appeared in Database for Advances in IS, Information Systems Journal, Information Technology & People, Information & Management, Information and Organization, and IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION.

Raquel Benbunan-Fich is at the Computer Information Systems Department of the Zicklin School of Business, Baruch College, City University of New York.
Her research interests include computer-mediated communication, group collaboration, and e-commerce. She received her Ph.D. from Rutgers University. She has published articles in Communications of the ACM, Decision Support Systems, Information & Management, International Journal of Electronic Commerce, IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, and other journals.