CONTENTS

LIST OF CONTRIBUTORS
REVIEWER ACKNOWLEDGMENTS
EDITOR’S COMMENTS
EDITORIAL POLICY AND SUBMISSION GUIDELINES

PART I: ACCOUNTING BEHAVIORAL RESEARCH

A STRUCTURAL EQUATION MODEL OF AUDITORS’ PROFESSIONAL COMMITMENT: THE INFLUENCE OF FIRM SIZE AND POLITICAL IDEOLOGY
John T. Sweeney, Jeffrey J. Quirin and Dann G. Fisher

AN ANALYSIS OF GROUP INFLUENCES ON GOING CONCERN AUDITOR JUDGMENTS
Sunita S. Ahlawat and Timothy J. Fogarty

INVESTIGATING ERROR PROJECTION AMONG STATE AUDITORS: THE IMPACT OF INTENTIONAL AND SYSTEMATIC MISSTATEMENTS
John T. Reisch, Karen S. McKenzie and Alan H. Friedberg

HOW DOES NEGATIVE SOURCE CREDIBILITY AFFECT COMMERCIAL LENDERS’ DECISIONS?
Philip R. Beaulieu and Andrew J. Rosman

EARNINGS MANAGEMENT AND FRAMING: THE SPECIFIC CASE OF OBSOLETE INVENTORY
Marybeth M. Murphy and Joanne P. Healy

THE EFFECTS OF INCENTIVE STRUCTURE AND GOAL DIFFICULTY ON TIME PLANNING DECISIONS WITHIN A BALANCED SCORECARD FRAMEWORK
Brad Tuttle and Mark J. Ullrich

THE EFFECT OF FAIRNESS IN CONTRACTING ON THE CREATION OF BUDGETARY SLACK
Theresa Libby

PART II: PERSPECTIVES ON RESEARCH PRODUCTIVITY

A TOBIT ANALYSIS OF ACCOUNTING FACULTY PUBLISHING PRODUCTIVITY IN AUSTRALIAN AND NEW ZEALAND UNIVERSITIES
Brett R. Wilkinson, Chris H. Durden and Katherine J. Wilkinson

PART III: METHODOLOGICAL ISSUES IN BEHAVIORAL RESEARCH

CLASSIFICATION OF CUSTOMIZED ASSURANCE SERVICES BY DECISION MAKERS: THE CASE OF SysTrust™
Philip R. Beaulieu

LIST OF CONTRIBUTORS

Sunita S. Ahlawat, School of Business, The College of New Jersey, USA
Philip R. Beaulieu, Haskayne School of Business, University of Calgary, Canada
Chris H. Durden, Department of Accounting, University of Southern Queensland, Australia
Dann G. Fisher, Department of Accounting, Kansas State University, USA
Timothy J. Fogarty, Weatherhead School of Management, Case Western Reserve University, USA
Alan H. Friedberg, School of Accounting, Florida Atlantic University, USA
Joanne P. Healy, College of Business Administration, Kent State University, USA
Theresa Libby, School of Business and Economics, Wilfrid Laurier University, Canada
Karen S. McKenzie, School of Accounting, Florida Atlantic University, USA
Marybeth M. Murphy, College of Business Administration, Kent State University, USA
Jeffrey J. Quirin, Barton School of Business, Wichita State University, USA
John T. Reisch, School of Business, East Carolina University, USA
Andrew J. Rosman, School of Business, University of Connecticut, USA
John T. Sweeney, School of Accounting, Information Systems & Business Law, Washington State University, USA
Brad Tuttle, Moore School of Business, University of South Carolina, USA
Mark J. Ullrich (Deceased), Graduate School of Business & Public Policy, Naval Post Graduate School, USA
Brett R. Wilkinson, Hankamer School of Business, Baylor University, USA
Katherine J. Wilkinson, Rawls College of Business, Texas Tech University, USA

REVIEWER ACKNOWLEDGMENTS

The Editor and Associate Editors at AABR would like to thank the many excellent reviewers who have volunteered their time and expertise to make this an outstanding publication. Publishing quality papers in a timely manner would not be possible without their efforts.

Elizabeth Dreike Almer, Portland State University, USA
John C. Anderson, San Diego State University, USA
Philip R. Beaulieu, University of Calgary, Canada
Jean Bedard, Northeastern University, USA
James Bierstaker, University of Massachusetts, Boston, USA
Dennis M. Bline, Bryant College, USA
Robert H. Chenhall, Monash University, Australia
Freddie Choo, San Francisco State University, USA
Christie L. Comunale, Long Island University – C.W. Post Campus, USA
Charles Cullinan, Bryant College, USA
Elizabeth Davis, Baylor University, USA
Roger Debreceny, Nanyang Technological University, Singapore
William N. Dilla, Iowa State University, USA
Alan S. Dunk, University of Tasmania, Australia
Jennifer D. Goodwin, University of Queensland, Australia
Glen Gray, California State University, Northridge, USA
Heather Hermanson, Kennesaw State University, USA
Mary Callahan Hill, Kennesaw State University, USA
Karen L. Hooks, Florida Atlantic University, USA
James E. Hunton, Bentley College, USA
Mike Kirschenheiter, Columbia University, USA
Stacy Kovar, Kansas State University, USA
Kip R. Krumwiede, Brigham Young University, USA
Theresa Libby, Wilfrid Laurier University, Canada
Daryl Lindsay, University of Saskatchewan, Canada
Timothy J. Louwers, Louisiana State University, USA
Nace Magner, Western Kentucky University, USA
James Maroney, Northeastern University, USA
Lokman Mia, Griffith University – Gold Coast, Australia
Venky Nagar, University of Michigan, USA
Marcus Odom, Southern Illinois University, USA
Ed O’Donnell, Arizona State University, USA
William R. Pasewark, Texas Tech University, USA
Laurie Pant, Suffolk University, USA
Robert J. Parker, University of South Florida, USA
Will Quilliam, University of South Florida, USA
John Reisch, East Carolina University, USA
Michael Roberts, University of Alabama, USA
Andrew J. Rosman, University of Connecticut, USA
Steve G. Sutton, University of Connecticut, USA and University of Melbourne, Australia
Linda Thorne, York University, Canada
Sandra Vera-Munoz, University of Notre Dame, USA
Sally A. Webber, Northern Illinois University, USA
Kristin Wentzel, La Salle University, USA
Patrick Wheeler, University of Missouri, USA
Stephen W. Wheeler, University of the Pacific, USA

EDITOR’S COMMENTS

Welcome to Volume 6 of Advances in Accounting Behavioral Research. This issue contains an eclectic collection of behavioral research papers that examine several very important issues.
Several of the papers focus on various aspects of auditors’ decisions such as professional commitment in public accounting firms, mitigating bias via group decision making, and appropriately using sample information to estimate errors in governmental auditing. The decisions of other professionals that use accounting information such as commercial lenders and divisional managers are also examined. Two papers examine how accounting information impacts the behaviors of individuals within an organization under various incentive structures. Two other papers provide perspectives on overall research with one developing a classification scheme for new assurance services and the other examining factors that impact research productivity of accounting faculty members. Overall, this is a very enlightening group of papers that provide insight into the behaviors of various users of accounting information.

Vicky Arnold
Editor

EDITORIAL POLICY AND SUBMISSION GUIDELINES

Advances in Accounting Behavioral Research (AABR) publishes articles encompassing all areas of accounting that incorporate theory from and contribute new knowledge and understanding to the fields of applied psychology, sociology, management science, and economics. The Research Annual is primarily devoted to original empirical investigations; however, literature review papers, theoretical analyses, and methodological contributions are welcome. AABR is receptive to replication studies, provided they investigate important issues and are concisely written. The Research Annual especially welcomes manuscripts that integrate accounting issues with organizational behavior, human judgment/decision making, and cognitive psychology. Manuscripts will be blind-reviewed by two reviewers and an associate editor.
The recommendations of the reviewers and associate editor will be used to determine whether to accept the paper as is, accept the paper with minor revisions, reject the paper or invite the authors to revise and resubmit the paper.

MANUSCRIPT SUBMISSION

Manuscripts should be forwarded to the editor, Vicky Arnold, at Vicky. [email protected] via e-mail. All text, tables, and figures should be incorporated into a Word document prior to submission. The manuscript should also include a title page containing the name and address of all authors and a concise abstract. Also, include a separate Word document with any experimental materials or survey instruments. If you are unable to submit electronically, please forward the manuscript along with the experimental materials to the following address:

Vicky Arnold, Editor
Advances in Accounting Behavioral Research
Department of Accounting U41A
School of Business
University of Connecticut
Storrs, CT 06269-2041, USA

References should follow the APA (American Psychological Association) standard. References should be indicated by giving (in parentheses) the author’s name followed by the date of the journal or book; or with the date in parentheses, as in “suggested by Earley (2000).” In the text, use the form Rosman et al. (1995) where there are more than two authors, but list all authors in the references. Quotations of more than one line of text from cited works should be indented and citation should include the page number of the quotation; e.g. (Dunbar, 2001, p. 56). Citations for all articles referenced in the text of the manuscript should be shown in alphabetical order in the reference list at the end of the manuscript. Only articles referenced in the text should be included in the reference list. Format for references is as follows:

For Journals
Dunn, C. L., & Gerard, G. J. (2001). Auditor efficiency and effectiveness with diagrammatic and linguistic conceptual model representations. International Journal of Accounting Information Systems, 2(3), 1–40.

For Books
Ashton, R. H., & Ashton, A. H. (1995). Judgment and decision-making research in accounting and auditing. New York, NY: Cambridge University Press.

For a Thesis
Smedley, G. A. (2001). The effects of optimization on cognitive skill acquisition from intelligent decision aids. Unpublished doctoral dissertation, University.

For a Working Paper
Thorne, L., Massey, D. W., & Magnan, M. (2000). Insights into selection-socialization in the audit profession: An examination of the moral reasoning of public accountants in the United States and Canada. Working paper: York University, North York, Ontario.

For Papers From Conference Proceedings, Chapters From Book, etc.
Messier, W. F. (1995). Research in and development of audit decision aids. In: R. H. Ashton & A. H. Ashton (Eds), Judgment and Decision Making in Accounting and Auditing (207–230). New York: Cambridge University Press.

A STRUCTURAL EQUATION MODEL OF AUDITORS’ PROFESSIONAL COMMITMENT: THE INFLUENCE OF FIRM SIZE AND POLITICAL IDEOLOGY

John T. Sweeney, Jeffrey J. Quirin and Dann G. Fisher

ABSTRACT

This study models auditors’ professional commitment as the product of socialization forces operating within the public accounting profession. The results of a structural equation analysis from a sample of 349 auditors representing international, national and regional firms indicate that firm size is inversely related to professional commitment. Furthermore, the findings indicate that a strong relationship exists between an auditor’s political ideology and professional commitment. Politically conservative auditors, reflecting the dominant ideology in public accounting, reported significantly higher professional commitment than politically liberal auditors.
Advances in Accounting Behavioral Research, Volume 6, 3–25
Copyright © 2003 by Elsevier Ltd.
All rights of reproduction in any form reserved
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06001-0

INTRODUCTION

The accounting scandals that have marked the dawn of the 21st century, such as Enron, MCI, and Global Crossing, have damaged the credibility of the audit report and the reputation of the public accounting industry. Perhaps more than ever, commitment to the ideals and standards of the auditing profession is vital to maintaining stakeholders’ confidence in the integrity of the audit report and in the reliability of financial statement representations. The purpose of this research effort is to develop and test a comprehensive model of auditors’ professional commitment with the objective of furthering our understanding of this attitude so essential to maintaining public trust.

A primary contribution of this study to the auditing research literature is the inclusion of variables not previously considered in models of professional commitment, namely audit firm size and political ideology. Firm size proxies for differences in organizational culture (Pratt & Beaulieu, 1992) and the results indicate that auditors’ professional commitment is directly and inversely affected by firm size. As a profession, the culture of public accounting is predominantly politically conservative (Sweeney, 1995). In this study, political ideology is modeled as a socializing variable. The findings indicate that auditors whose ideology is consistent with the prevailing conservative doctrine are more committed to the profession than auditors who are politically liberal.

The results of this study have important implications for the public accounting profession.
First, the inverse relationship between firm size and professional commitment is cause for concern, as larger firms, and especially the international firms, dominate the market for audit services. Larger firms also dominate and increasingly emphasize the more lucrative consulting and non-audit service areas. Perhaps as a result of the metamorphosis from traditional accounting firms to diverse service organizations, auditors from larger firms may have lessened their identification with and commitment to the ideals of the accounting profession. Second, the model indicates that political ideology directly influences commitment, perhaps due to conservative auditors more readily embracing the conservative values traditionally associated with the profession. Political ideology also influences perceptions of success in public accounting, as conservative auditors report a significantly higher probability of attaining partnership in their firms than liberal auditors.

This paper proceeds in the following manner. The next section reviews the literature relevant to the development of a model of professional commitment. Hypotheses are then advanced, followed by sections discussing the methodology and analysis. The final section consists of a summary and discussion.

LITERATURE REVIEW AND HYPOTHESES DEVELOPMENT

Professional commitment, representing the extent to which one identifies and is willing to exert effort in support of a profession (Aranya et al., 1981; Aranya & Ferris, 1984), has been conceptualized as a socialization process where emphasis is given to cultivating professional values (Jeffery & Weatherholt, 1996; Larson, 1977). For the auditing profession, these values include honoring the public interest, independence, integrity, and objectivity.
An assumption underlying the attitude of professional commitment is that the stronger an individual’s identification with and loyalty to the public accounting profession, the less likely he or she will subrogate professional responsibilities (Farmer, 1993). A strong commitment to the ideals of the profession is considered a prerequisite for independent professional judgments (Aranya et al., 1981; Gaffney et al., 1993). The development of auditors’ professional commitment is generally assumed to precede the development of their organizational commitment (Aranya et al., 1982; Aranya & Ferris, 1984). Anticipatory professional socialization often begins in college, when the choice of accounting as an undergraduate major and career is made, while organizational commitment commences upon entrance to the firm (Fogerty, 1992). Early conceptualizations of the professional-organizational dynamic viewed the two constructs in conflict, as the demands of the employing bureaucracy were perceived to be in competition with professional loyalties (Sorenson & Sorenson, 1974). The conflict between organizational and professional socialization occurs when behaviors concordant with organizational norms and goals are inconsistent with the profession’s code of conduct. Violation of organizational norms may result in internal sanctions levied against the auditor. Violation of professional standards, such as Arthur Andersen’s obstruction of justice in the Enron audit, can result not only in penalties levied against the perpetrator and his or her firm but may also diminish the prestige of the auditing profession and the public’s perception of the assurance provided by the audit report. More recent research has not viewed organizational and professional commitment as inherently incompatible, finding instead a positive association between the two constructs (Aranya et al., 1981, 1982). 
When the professional and organizational commitments of public accountants are in conflict, however, researchers have found lower job satisfaction and higher turnover intentions (Aranya & Ferris, 1984; Sorenson & Sorenson, 1974). In order to preserve the role of the audit function in maintaining capital markets, it is essential that auditors’ commitment to the profession take priority over loyalties to the organization (Schroeder & Imdieke, 1977).1

Prior research has generally focused on the consequences of professional commitment and has consistently found a significant association with important outcome variables. Professional commitment has had a positive influence on public accountants’ job satisfaction (Aranya et al., 1982; Bline et al., 1991) and organizational commitment (Aranya et al., 1982; Aranya & Ferris, 1984) and a negative association with turnover/migration tendencies (Aranya et al., 1982; Bline et al., 1991) and organizational-professional conflict (Aranya & Ferris, 1984). Professional commitment has also been associated with auditors’ judgment regarding client retention decisions (Farmer, 1993).

A Model of Auditors’ Professional Commitment

The objective of this research is to model auditors’ professional commitment. The changing landscape and demographics of the accounting profession suggest that an understanding of the socializing factors leading to high levels of professional commitment is needed. Siegal et al. (1991, p. 58) define professional socialization as the “acquisition of the values, attitudes, skills and knowledge of a professional subculture.” Fogerty (1992) contends that socialization within public accounting organizations is to a large extent a coercive process, as new initiates are inculcated to adopt the values of the dominant culture.
Our model of professional commitment examines two socialization factors not previously considered in prior published research: firm size and political ideology.

Firm Size

Pratt and Beaulieu (1992) asserted that differences in firm size proxy for differences in culture. They concluded that larger firms have more rigid control systems than smaller firms, resulting in the large firms being more structured and mechanistic than the smaller firms. Wheeler et al. (1987) found that the nature of the work environment, the organizational structure, performance evaluations, compensation and promotion procedures in large firms differed substantially from those of smaller firms. Goetz et al. (1991) contended that the more structured and bureaucratic environment of larger firms resulted in less individual voice in determining rules of conduct within the firm. Ponemon (1992) claims that such a strong firm culture effectively results in the organization weeding out those persons who fail to conform. These factors imply that the loyalty of accountants in the larger firms must be first to the organization and then to the profession. Goetz et al. (1991) support this premise and assert that because smaller firms have “less stand-alone credibility” than do larger firms, practitioners in the smaller firms need the profession more than practitioners in the larger firms. Larger firms are more visible and prestigious, endowing their members with an identity separate from the profession. This suggests that auditors in smaller firms may identify more readily with the profession, vis-à-vis the organization, than auditors in larger firms and correspondingly develop a greater sense of commitment to the profession.

H1. Firm size is inversely related to auditors’ professional commitment.
Political Ideology

Socialization encourages persons “to become similar to their profession, not only as it is embodied by other organizational members, but also as it is defined by the profession’s espoused ideals” (Fogerty, 1992, p. 139). This description of the socialization process implies the existence of a prototypic public accountant embodying desirable characteristics, values and attitudes. The more effective the socialization processes, the greater the correspondence between the prototype and the professional member. Some values and attitudes (e.g. commitment, identification) may be more readily influenced and inculcated by the socialization process than others (e.g. religious preferences). It is also possible that some prototypic characteristics are not amenable to socialization (e.g. gender, race).

A particularly appropriate theory for examining the influence of prototypes on socialization processes in the auditing profession is self-categorization theory (SCT) (Chatman et al., 1998; Hogg & Terry, 2000; Tajfel & Turner, 1985).2 SCT focuses on the process whereby individuals define their self-concept in relation to their membership in social groups. Prototype-based comparisons, whereby social categorization of the individual into favorable in-group or unfavorable out-group membership occurs, lie “at the heart” of SCT processes (Hogg & Terry, 2000, p. 122). Prototypes are cognitive representations of the defining and stereotypical features of in-groups, embodying exemplary or ideal types and capturing characteristics that differentiate them from other groups. These characteristics include demographic attributes, behaviors, attitudes and values. Critical to the notion of prototypes is that they accentuate similarities within and differences between groups (Hogg & Terry, 2000).
For example, because the prototypical partner in public accounting is male, an in-group characteristic may be masculinity and an out-group characteristic femininity (Maupin, 1993; Maupin & Lehman, 1994).3

Prototype-based self-categorization is relevant for modeling professional commitment as a socialization process directed towards cultivating professional values (Jeffery & Weatherholt, 1996; Larson, 1977) for several reasons. First, in-group members, reflecting prototypic characteristics, are more likely to cooperate with each other and to compete with out-group members (Chatman et al., 1998). Second, in-group members are likely to receive favorable treatment compared to out-group members (Ashforth & Mael, 1989). This favoritism may be reflected in work assignments, performance evaluations, receipt of voluntary mentoring, or through informal signals of preference relative to out-group members. As a result, in-group members are likely to maintain more favorable attitudes towards their profession and be more readily socialized than out-group members. Third, SCT implies that a prototypically homogeneous audit profession is likely to develop, which may facilitate socialization by reducing uncertainty regarding appropriate attitudes and behaviors (Hogg & Terry, 2000).

Chatman (1991) utilized a person-organization fit approach, defined as the congruence between organizational and individual values, in examining socialization processes in public accounting. She found that socialization is facilitated by the extent that new auditors possess or inculcate values similar to the prototypical organizational values. Recruits whose values were aligned with the prevalent organizational values had greater satisfaction and lower turnover than recruits who maintained dissimilar values.
Kanter (1977) contends that in-group conformity is a prerequisite for advancement in organizations and that promotion largely depends upon presenting political views as well as sex-role characteristics that are similar to the dominant or prototypic upper-level managers. Sweeney and Fisher (1999) propose that conservative political ideology represents a normative set of shared values in public accounting and is an important socialization factor. In his analysis of the influence of social class on political orientation, Burns (1992) identified several dimensions collectively predictive of conservative ideology. The dimensions identified as explaining a conservative/Republican political orientation included engaging in mental (versus manual) labor, self-employment, individualistic (versus collective) economic orientation, white race and male.4 These dimensions are generally descriptive of the prototypic audit firm partner. There has been little research to date examining the political orientation of public accountants. In a broad sample of public accountants, Sweeney (1995) found that approximately 80% identified themselves as politically conservative. Further testimony to the conservative orientation of public accounting is reflected in political party contributions over the last election cycle (1999–2000). The combined contributions of the American Institute of Certified Public Accountants and the Big 5 international firms to the conservatively oriented Republican Party ($3,358,746) were approximately twice those to the more liberal Democratic Party ($1,708,220) (FECInfo, 2001). If political ideology is an important socializing variable in public accounting, then conservative auditors are most likely to inculcate and embrace the prototypic politically conservative values of the profession. Politically liberal auditors may feel disenfranchised by the conservative orientation of public accounting and have difficulty identifying with the dominant political values. 
As a result, it is likely that politically conservative auditors would be more readily socialized, and therefore be more committed to the profession, than their politically liberal counterparts.

H2. Politically conservative auditors will have greater professional commitment than politically liberal auditors.

Control Paths

Prior research has documented that partners in public accounting are typically male (Hooks & Cheramy, 1994; Hull & Umansky, 1997) and, on average, have developed to the conventional level of moral reasoning (Sweeney, 1995). Researchers have suggested that masculinity (Maupin, 1993; Maupin & Lehman, 1994) and conventional moral reasoning (Ponemon, 1992) represent prototypes in public accounting. Since the influence of both gender and moral reasoning on professional commitment has been examined in prior research, these variables are included as control paths in the model of professional commitment.

Although the literature suggests that gender barriers in public accounting may preclude women from attaining the same level of commitment to the profession as men (Maupin, 1993; Maupin & Lehman, 1994), the results of empirical research have been equivocal. Gaffney et al. (1993) found that family obligations increased the professional commitment of men in public accounting but had no effect on women’s professional commitment. Street et al. (1993), after controlling for positional level, did not find a difference in professional commitment between female and male public accountants. Covaleski et al. (1998) contend that although women may have “broken the glass ceiling” to attaining partnership in Big 6 firms, there is still a paucity of high-level female partners. Women who are unable or unwilling to adopt masculine characteristics required by the male-dominated culture of public accounting may encounter obstacles in making partner (Maupin & Lehman, 1994).
Given the predominance of the male partners and the difficulties that women may encounter in adopting in-group male qualities, women in public accounting may represent an out-group and have correspondingly less professional commitment than men.

H3. Male auditors will have greater professional commitment than will female auditors.

Ethics researchers in accounting have consistently found that the ethical development of auditors, as measured by the P score of the Defining Issues Test (DIT) (Rest, 1986, 1993), most commonly reflected conventional reasoning and was inversely related to positional level (Lampe & Finn, 1992; Ponemon & Gabhart, 1993; Shaub, 1994). This result seemingly contradicts Kohlberg’s (1969) moral development theory, which holds that development is sequential and progressive but not regressive. Ponemon (1992) contended that the inverse relationship between P scores and rank in public accounting organizations was the result of a selection-socialization process whereby firms prefer to hire and then promote individuals with a shared set of ethical values and beliefs. He found that conventional reasoning auditors, as measured by DIT P scores, were more likely to be favorably evaluated and promoted and less likely to turn over than principled reasoning auditors. Ponemon (1992, p. 244) asserted that “individuals with too high a level of ethical reasoning may experience difficulty progressing to the upper echelons in the accounting firms’ formal hierarchy.” In other words, accountants who reason at a conventional level are most likely to accept, embrace, and support the prototypical norms of the firm and the profession, increasing their acceptance within the organization and opportunities for promotion. Thus, it appears that conventional reasoning, as opposed to higher order principled reasoning, is representative of the ethical value prototype of the audit profession.

Dwyer et al.
(2000) examined the relationship between practicing accountants’ professional commitment and DIT P scores. Their results suggested that accountants’ ethical development influenced their interpretation of the professional commitment construct, although the authors did not indicate a directional relationship. Shaub et al. (1993) found that auditors’ professional commitment was influenced by their ethical orientation, with ethical idealism positively related to and ethical relativism negatively related to commitment. Jeffery and Weatherholt (1996) posited a link between an accountant’s ethical development, as measured by DIT P scores, and his or her professional commitment. Consistent with Ponemon’s (1992) selection-socialization hypothesis, Jeffery and Weatherholt found that conventional reasoning accountants had higher professional commitment than principled reasoning accountants.

H4. Professional commitment will be inversely related to auditors’ ethical development, as measured by the P score of the DIT.

The relationship between positional level and professional commitment has also been examined in prior research and is included as a control path in our model. Advancement within public accounting organizations is largely a result of socialization processes, whereby individuals who reflect the dominant culture and values of the organization are more likely to be promoted (Fogerty, 1992; Ponemon, 1992; Pratt & Beaulieu, 1992). Early research on commitment in public accounting organizations (Schroeder & Imdieke, 1977; Sorenson, 1967; Sorenson & Sorenson, 1974) suggested that partners were more organizationally oriented and less professionally committed than were staff members. More recent research has not supported this contention. These studies instead found that professional commitment is positively associated with rank in the firm (Adler & Aranya, 1984; Aranya et al., 1981; Aranya & Ferris, 1984; Jeffery & Weatherholt, 1996; Norris & Niebuhr, 1983). Goetz et al.
(1991) speculated that experience and tenure heighten professional commitment. This is consistent with defining professional commitment as a socialization process. If the socialization process is successful, then it follows that those who have been in the profession the longest should display the strongest commitment. Turnover-survivorship processes would also suggest that professional commitment should be stronger at higher positional levels. Individuals who are more committed to the profession would be more likely to remain, which may explain the inverse relationship between professional commitment and turnover (Aranya et al., 1982; Bline et al., 1991).

H5. Professional commitment of auditors will increase with rank in the firm.

Hypothesis 3 and Hypothesis 5 posit that gender and rank will have an effect on professional commitment. Prior studies involving public accountants have indicated a strong relationship between gender and rank, with females being underrepresented at higher ranks (Collins, 1993; Hooks & Cheramy, 1994; Maupin, 1993; Maupin & Lehman, 1994; Sweeney, 1995). As a result, it is necessary to control for the influence of positional level when assessing the relationship between gender and professional commitment.

Prior research assessing the ethical development of public accountants has generally found the DIT P scores of females to be higher than the scores of males (Bernardi & Arnold, 1997; Enyon et al., 1997; Shaub, 1994; Sweeney, 1995). The gender effect on P scores appears to hold regardless of firm size. As a result, the ethical development of female auditors is expected, on average, to be more advanced than that of male auditors. Therefore, the influence of gender must be controlled for in assessing the effect of ethical development, as measured by DIT P scores, on auditors’ professional commitment (H4).
Sweeney and Fisher (1998, 1999) and Fisher and Sweeney (2002) contend that the DIT contains an imbedded political content biasing the measurement of test-takers’ ethical development. Although Rest et al. (1999) dispute this claim, they concede that as much as 40% of the variance in DIT P scores is explained by political ideology. A priori, the political content of the DIT will result in an upward bias in the P scores of politically liberal auditors and a downward bias in the P scores of politically conservative auditors (Sweeney & Fisher, 1998, 1999). Therefore, the influence of political ideology must be controlled in assessing the effect of ethical development on professional commitment (H4) (see Note 5).

In summary, we hypothesize that auditors’ professional commitment is directly impacted by the following variables: firm size (H1), political ideology (H2), gender (H3), ethical development, as measured by DIT P scores (H4), and positional level (H5). The model of professional commitment also includes the following control paths: positional level on gender, gender on ethical development, and political ideology on DIT P scores. Figure 1 presents our model of auditors’ professional commitment.

Fig. 1. Theoretical Model of Professional Commitment.

METHOD

Sample

Prior to collecting data, management representatives from offices of multiple public accounting firms agreed to participate in the study and to provide auditor subjects. Three international firms (Big 5: “large”), two national firms (“medium”), and six local or regional firms (“small”) participated in the study. The appropriate office representative indicated the approximate number of available auditor subjects. The office representative was then provided with the required number of research instruments to distribute to the participants.
Each research instrument consisted of a questionnaire, the six-story DIT and instructions enclosed in a stamped, return envelope addressed to the researchers. Participation was voluntary and subjects were assured of anonymity. Participants provided demographic data but did not otherwise identify themselves. A total of 383 research instruments were received by the researchers, resulting in a response rate of approximately 72%. From this initial sample, 27 subjects failed to pass the internal reliability checks of the DIT, two subjects did not indicate their political ideology, and five did not complete the professional commitment section. These respondents were purged from the sample. The final sample consisted of a cross-section of 349 auditors, of which 66% were male and 82% were politically conservative. Descriptive statistics for the sample are given in Table 1.

Table 1. Descriptive Statistics for Sample.

                       Firm Size
Position      Small    Medium    Large    Totals
Staff            23        22       55       100
Senior           15        10       63        88
Supervisor       11         8       19        38
Manager          10        14       39        63
Partner          29         9       22        60
Totals           88        63      198       349

Males: 230            Females: 119
Liberals: 63          Conservatives: 286

                            Mean     S.D.     Range
Professional Commitment     75.51    11.73    41–103
P Score                     42.14    12.53    8.3–73.3

Average age: 30.3 years (S.D. = 8.1)
Average experience: 7.3 years (S.D. = 7.1)

Measures

Professional commitment (PC) was measured with the 15-item scale adapted by Aranya et al. (1981) from the Porter et al. (1974) organizational commitment questionnaire. This scale has been utilized extensively by accounting researchers to measure professional commitment (Aranya et al., 1982; Gaffney et al., 1993; Harrell et al., 1986; Jeffery & Weatherholt, 1996; Street et al., 1993). Researchers have indicated that the scale has good internal consistency, with Cronbach’s alpha reported in the high 0.80s (Aranya et al., 1981; Aranya & Ferris, 1984; Bline et al., 1991). Bline et al. (1991), in an extensive examination of the psychometric properties of the professional commitment questionnaire, report that the scale measures a construct distinct from organizational commitment. Their tests indicated that the professional commitment scale has adequate reliability and validity. Furthermore, the professional commitment construct correlated positively with job satisfaction and negatively with intent to leave the profession. Other accounting researchers have reported negative correlations between the professional commitment scale and organizational-professional conflict (Aranya et al., 1981; Harrell et al., 1986) and positive correlations with favorable work attitudes in public accounting (Aranya et al., 1982) (see Note 6).

Ethical development was measured by the sample respondents’ P score from the six-story DIT (Rest, 1979, 1986, 1993). The P score is a continuous measure, ranging from 0 to 95, reflecting the relative importance a subject gives to principled moral reasoning in resolving moral dilemmas (Rest et al., 1997, p. 498). Rest (1993) reports an average P score of 45 for college graduates, although accounting researchers have generally found that public accountants score lower than adults from the general population at similar educational levels (Ponemon, 1992; Sweeney, 1995). Rest (1986, pp. 176–179) contends that the P score correlates most strongly with educational level but only weakly with gender, intelligence and ethnic background. Gender, however, appears to have a stronger influence on accountants’ P scores than it does in the general population, with females attaining significantly higher scores (Bernardi & Arnold, 1997; Enyon et al., 1997; Shaub, 1994; Sweeney, 1995). The DIT has been subjected to extensive reliability and validity tests with generally good results (Rest, 1979, 1986; Rest et al., 1999).
Some researchers (Emler et al., 1983), however, contend that the DIT contains a political bias. In studies with accounting subjects, Sweeney and Fisher (1998, 1999) found that the DIT contained an imbedded political content that tended to overstate the scores of political liberals and to understate the scores of political conservatives. They suggest that researchers utilizing the DIT control for subjects’ political ideology in order to more clearly interpret the relationship between P scores and the variable of interest. Subjects indicated their political ideology in response to the following question: “Regarding important social and political issues, would you classify your opinion or perspective as primarily conservative or liberal?” Forcing subjects to identify their positions as primarily liberal or conservative is consistent with prior research (Sweeney, 1995) and eliminates the ambiguity of a political “moderate” classification.

EMPIRICAL RESULTS

Correlations

Table 2 presents correlation coefficients for professional commitment and variables of interest. Subjects’ professional commitment is negatively associated with the size of their respective firm and positively associated with their positional level. Political ideology and gender are associated with professional commitment and DIT P scores. Political ideology is not correlated with gender, position, or firm size. The significant association between gender and position results from the underrepresentation of female auditors at the higher ranks. The association between firm size and position is an apparent artifact of the non-random sample selection process.

Structural Equation Modeling

Structural equation modeling was used to evaluate the proposed hypotheses. The structural equation model utilized to test the hypotheses corresponds to the model in Fig. 1. Each link between the variables in Fig.
1 has a path coefficient that measures the impact of the antecedent variable in explaining the variance in the outcome variable. For example, the path coefficient for the link between political ideology and P score indicates the increase in P score, measured in standard deviations, associated with a one standard deviation increase in political ideology. The goal of structural equation modeling is to evaluate whether associations proposed in theory, or in prior research, fit the present data set. Evidence of proper fit is provided by various other fit indices. However, measures of proper fit can be problematic since several of the commonly used fit indices are sample size dependent. For this reason, multiple measures of overall model fit are reported in this study.

Table 2. Correlation Matrix.

                                (1)        (2)        (3)        (4)        (5)        (6)
Professional Commitment (1)    1.000    −0.246**   −0.132**   −0.017      0.234**   −0.116*
Firm Size (2)                            1.000      0.087     −0.080     −0.146**    0.054
Political Ideology (3)                              1.000      0.194**   −0.046      0.055
P Score (4)                                                    1.000     −0.105*     0.205**
Position (5)                                                              1.000     −0.353**
Gender (6)                                                                           1.000

N = 349.
* p < 0.05 (one-tailed significance).
** p < 0.01 (one-tailed significance).

The Normed Fit Index (NFI) (Bentler & Bonett, 1980) has an index range from 0 to 1, with values over 0.9 indicating a good fit. This index may be viewed as the percentage of observed-measure covariation explained by a given model. The disadvantage of the NFI is that it can underestimate goodness-of-fit in small samples. Bentler’s (1990) revised Normed Comparative Fit Index (CFI) is based upon the Bentler and Bonett (1980) NFI but with a correction for sample-size dependency. CFI values always lie between 0 and 1, with values over 0.9 indicating a relatively good fit (Bentler, 1990). Finally, the Adjusted Goodness of Fit Index (AGFI), devised by Joreskog and Sorbom (1984), is an additional fit index that ranges from 0 to 1, with values above 0.9 indicating acceptable fit. Specifically, in addition to the traditional Goodness of Fit Index (GFI), the Adjusted Goodness of Fit Index (AGFI), the Normed Fit Index (NFI), and the Comparative Fit Index (CFI) are reported in this study. This lends some assurance that the measures of fit produced are not spurious.

Figurative depictions of the results of the structural equation analysis are presented in Fig. 2. With GFI, AGFI, NFI, and CFI values exceeding 0.9 in all instances, the theoretical model appears to provide a very good fit with the dataset. Tabular results of the structural equation analysis including a listing of each hypothesis and its corresponding path coefficient are presented in Table 3. Consistent with the relatively high model fit indices, results in Table 3 indicate that an overwhelming majority of the associations hypothesized in the current study and suggested by prior literature were significant, providing further support for the proposed theoretical model of professional commitment.

Fig. 2. Structural Equation Model with Path Coefficients.

Table 3. Structural Equation Modeling Results.

Dependent    Independent           Associated    Path
Variable     Variable              Hypothesis    Coefficient    t-Value    p-Value
PC           Firm size             H1            −0.236         −4.65      0.001
PC           Political ideology    H2            −0.154         −2.98      0.002
PC           Gender                H3            −0.042         −0.76      0.224
PC           P Score               H4             0.059          1.13      0.132
PC           Position              H5             0.184          3.43      0.001
Position     Gender                –             −0.353         −7.04      0.001
P Score      Gender                –              0.195          3.78      0.001
P Score      Political ideology    –              0.183          3.55      0.001

N = 349. PC = Professional Commitment.

Tests of Hypotheses

Hypothesis 1 predicts a negative relationship between firm size and professional commitment. The path coefficient for this theoretical link is −0.236 and is significant at the p < 0.001 level. Thus, smaller firms tend to have employees who possess higher levels of professional commitment.
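As an illustration of the NFI and CFI described in the structural equation modeling discussion, the following minimal sketch applies their standard chi-square-based definitions. The chi-square and degrees-of-freedom inputs are hypothetical, since the study reports only that each index exceeded 0.9, and the function and variable names are assumptions introduced for this example.

```python
def normed_fit_index(chi2_model: float, chi2_null: float) -> float:
    """NFI: proportional reduction in chi-square relative to the
    null (independence) model."""
    return (chi2_null - chi2_model) / chi2_null


def comparative_fit_index(chi2_model: float, df_model: int,
                          chi2_null: float, df_null: int) -> float:
    """CFI: same idea as the NFI, but computed on noncentrality
    (chi-square minus degrees of freedom), truncated at zero."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    if d_null == 0.0:
        # Neither model shows any misfit beyond its degrees of freedom.
        return 1.0
    return 1.0 - d_model / d_null


# Hypothetical inputs, NOT taken from the study.
nfi = normed_fit_index(chi2_model=12.0, chi2_null=250.0)
cfi = comparative_fit_index(chi2_model=12.0, df_model=7,
                            chi2_null=250.0, df_null=15)
print(round(nfi, 3), round(cfi, 3))
```

Because the CFI subtracts the degrees of freedom before forming the ratio, it penalizes a model less than the NFI when the model chi-square is close to its expected value, which is the sample-size correction noted in the text.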
Hypothesis 2 predicts that conservative auditors will demonstrate higher professional commitment than liberal auditors. For the full sample, a one-tailed t-test indicated that the professional commitment of politically conservative auditors was higher than that of liberal auditors (76.2 vs. 72.2; p < 0.017). The path coefficient for this theoretical link is −0.154 and is significant at the p < 0.002 level. This result provides support for H2 and implies that political ideology is an influential socialization variable in public accounting.

Male auditors are predicted in H3 to have higher professional commitment than female auditors. For the full sample, males reported higher commitment than females (76.5 vs. 73.6; p < 0.016), but the association between positional level and gender must be considered before drawing any conclusions regarding the gender-professional commitment relationship. The control path between gender and position has a coefficient of −0.353 and is significant at the p < 0.001 level, implying that male auditors in the sample are more likely to inhabit higher level positions. After controlling for the influence of positional level, the path coefficient linking gender and professional commitment is −0.042 and insignificant. This result suggests that gender does not play a direct role in the development of an auditor’s professional commitment.

Hypothesis 4 predicts that there is an inverse relationship between ethical development, as measured by DIT P scores, and professional commitment. In order to unambiguously interpret this path, the associations between P score and gender and P score and political ideology must be considered. The corresponding coefficient for the path between gender and P score is 0.195 and significant (p < 0.001). This result suggests that female auditors attain higher P scores than their male counterparts.
Additionally, the path coefficient linking political ideology with P score is 0.183 and also significant (p < 0.001), suggesting that politically liberal auditors attain higher P scores than politically conservative auditors. After controlling for the influence of gender and political ideology, the path coefficient between P score and professional commitment is 0.059 and insignificant. H4 is therefore rejected, as an auditor’s ethical development does not appear to directly influence his or her professional commitment.

Hypothesis 5 predicts that there is a positive relationship between position and professional commitment. The path coefficient linking these two constructs is 0.184 and is significant at the p < 0.001 level. This result provides support for H5 and suggests that auditors employed at higher levels within their respective firms exhibit higher levels of professional commitment, although the relationship is not necessarily linear. Furthermore, it is not clear from the analysis whether auditors with higher levels of professional commitment are more likely to be promoted, or whether auditors develop higher professional commitment as they advance within the profession.

Additional Analysis

Table 4 examines the influence of the significant main effects on auditors’ professional commitment, partitioned by firm size, position and political ideology. Professional commitment scores are highest in the regional firms and, as expected, at the partner level. Senior auditors in regional and national firms also demonstrate relatively high commitment. The influence of political ideology is evident, as conservative auditors have higher commitment than liberal auditors for every cell containing at least two liberal auditors.

Table 4. Summary of Professional Commitment Levels By Political Ideology and Position for Each Firm Size.

Firm Size      n     Mean PC
Small          88    80.02
Regional       63    76.44
Big 6         198    73.21

                                              Conservative        Liberal
Firm Size    Position       N    Mean PC      n    Mean PC        n    Mean PC
Small        Staff         23    75.57       17    77.06          6    71.33
Small        Senior        15    80.20       11    81.27          4    77.25
Small        Supervisor    11    77.91        9    80.11          2    68.00
Small        Manager       10    76.90        8    77.13          2    76.00
Small        Partner       29    85.34       21    86.38          8    82.63
Regional     Staff         22    76.64       16    78.31          6    72.17
Regional     Senior        10    80.3         9    78.89          1    93.00
Regional     Supervisor     8    72.75        8    72.75          0    –
Regional     Manager       14    72.93       13    71.92          1    86.00
Regional     Partner        9    80.44        8    79.25          1    90.00
Big 6        Staff         55    73.47       48    74.13          7    69.00
Big 6        Senior        63    69.08       48    70.04         15    66.00
Big 6        Supervisor    19    73.89       17    75.35          2    61.50
Big 6        Manager       39    72.23       32    73.31          7    67.29
Big 6        Partner       22    85.55       21    85.10          1    95.00

An objective of socialization is to ensure that management promotes those individuals who reflect the culture and values of the organization (Fogerty, 1992; Kanter, 1977; Ponemon, 1992). If conservative ideology is a strongly held value in the culture of public accounting, then politically conservative auditors should perceive greater opportunities for advancement than politically liberal auditors. To provide further evidence of the socializing influence of political ideology in public accounting, subjects who were not partners were asked to respond to the following question: “Please indicate what you believe are your chances (likelihood) of making partner in your present firm.” The Likert response scale for the question ranged from 1 (very low) to 7 (very high).
Conservative auditors, on average, perceived their opportunities for advancement to partner as significantly greater than did liberal auditors (3.68 vs. 2.96; p < 0.0003).

LIMITATIONS AND DISCUSSION

The credibility auditors confer upon financial statements is jointly dependent upon their technical expertise and their commitment to the professional ideal of independence (Watts & Zimmerman, 1986). In this study, auditors’ professional commitment was modeled as the collective product of operant socializing influences in public accounting firms. Two subsets of socializing forces were examined: professional (firm size and position) and individual characteristics affecting group membership (political ideology, gender, and ethical development). The model was tested on a large sample of auditors representing all positional levels in international, national and regional firms. The results of the structural equation analysis were generally supportive of the model.

The results support the contention (Pratt & Beaulieu, 1992) that the culture of the public accounting profession can be differentiated by firm size, showing a significant negative relationship between firm size and professional commitment. Fogerty (1995, p. 46) suggests that firms “may differ in the balance they encourage between commitment to the firm and to the profession.” Larger firms, perhaps due to their stronger culture and identity separate from the profession (Goetz et al., 1991), may be more likely to shift this balance towards organizational commitment. Furthermore, auditors from larger firms may identify less with the profession and more with the organization because of its greater prestige and economic significance, while auditors from smaller firms may correspondingly attach greater significance to professional membership. The findings also indicate that rank or position, symbolic of status and compensation level, was positively related to professional commitment.
The process of socialization implies that membership within the dominant group conveys benefits. Feelings of inclusion or exclusion from the controlling group are likely manifested in important job-related attitudes, such as professional commitment (Chatman, 1991; Ponemon, 1992).

A major contribution of this study to the research literature is the inclusion of political ideology as a socializing force in public accounting organizations. Politically conservative auditors, representing the dominant ideology, had a greater commitment to the profession than did liberal auditors. The public accounting profession has, in recent years, increasingly emphasized the recruitment of under-represented socio-economic groups; however, a truly diverse workplace is open to disparate opinions and viewpoints. Although public accountants have traded their green eyeshades for laptop computers, they appear to still embrace a politically conservative ideology. Firm management can benefit from this research in understanding that the public accounting profession may be so doctrinally conservative that it could be effectively excluding a significant segment of society, political liberals, whose perspectives may be valuable in understanding a rapidly changing world. Efforts directed towards changing the traditionally conservative image of the public accounting profession may be beneficial in attracting new members with alternative viewpoints.

Although male auditors, on average, reported a stronger commitment to the profession than female auditors, gender was not a significant direct factor in the model after controlling for the influence of positional level. Gender, however, did have an indirect impact on professional commitment.
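The indirect impact of gender mentioned above can be illustrated with the usual product-of-coefficients rule for path models: an indirect effect along a chain of paths is the product of the standardized coefficients on that chain. The sketch below is only an illustration using the coefficients reported in Table 3. The study itself does not report these products, the helper name `indirect_effect` is hypothetical, and the route from gender through P score is left out because its final link (P score to professional commitment) was not significant.

```python
# Standardized path coefficients as reported in Table 3.
GENDER_TO_PC = -0.042        # direct path (H3), not significant
GENDER_TO_POSITION = -0.353  # control path
POSITION_TO_PC = 0.184       # H5


def indirect_effect(*path_coefficients: float) -> float:
    """Multiply the coefficients along one chain of paths."""
    product = 1.0
    for coefficient in path_coefficients:
        product *= coefficient
    return product


# Gender -> Position -> Professional Commitment
indirect_via_position = indirect_effect(GENDER_TO_POSITION, POSITION_TO_PC)

# Direct path plus this single indirect route (other routes ignored).
direct_plus_indirect = GENDER_TO_PC + indirect_via_position

print(round(indirect_via_position, 3))  # -0.065
print(round(direct_plus_indirect, 3))   # -0.107
```

Under these assumptions, most of the gender difference in commitment runs through positional level rather than through the direct path.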
After controlling for political ideology and gender, the previously reported relationship between auditors’ ethical development and professional commitment (Jeffery & Weatherholt, 1996) was not supported.

The limitations of this research need to be recognized. First, the sample selection process was non-random, which may limit generalizability. Second, as the data were drawn from survey questionnaires, reliability is dependent upon the truthful responses of the participants. Third, the dichotomous measure of political ideology did not reflect the intensity of the subject’s commitment to conservative or liberal positions. A more comprehensive measure may better contribute to our understanding of the impact of political ideology as a socializing force in public accounting organizations.

Potential extensions of this research include examining further the impact of political ideology in accounting organizations. The relationship between political ideology and important job-related attitudes, such as satisfaction, organizational commitment and turnover intentions, may advance our understanding of professional socialization. Political ideology may also affect other important processes in public accounting, such as recruitment and audit team dynamics.

NOTES

1. Former Securities and Exchange Commission (SEC) Chairman Arthur Levitt questioned whether the expansion into more lucrative services compromises the traditional audit function (Covaleski, 1999). Suggesting that the audit has merely become a conduit for selling other services, Levitt contends that auditors may not be sufficiently committed to societal expectations and professional standards.

2. SCT is an extension of social identity theory (SIT) (Ashforth & Mael, 1989; Brown, 2000; Tajfel & Turner, 1985).
SIT maintains that one’s social identity is derived primarily from group membership, that people strive to maintain a positive identity, and that this positive identity largely results from favorable comparisons between relevant in-groups and out-groups (Ashforth & Mael, 1989).

3. Fogerty (2000, p. 13) described the socializing influence of prototypes in public accounting firms when he stated: “Experienced organizational members selectively provide reinforcement, communicate the approved range for action, and serve as examples of achievement.”

4. An individualist orientation supports the notion of capitalism in viewing people as independent economic actors, as opposed to a collectivist orientation that is more aligned with a socialist perspective (Burns, 1992, p. 352).

5. After controlling for political ideology and gender, Sweeney (1995) did not find a significant relationship between rank and DIT P scores. Therefore, we do not control for the influence of rank on ethical development.

6. Dwyer et al. (2000) examined the dimensionality of the Aranya et al. (1981) professional commitment scale with a broad sample of practicing accountants and concluded that the 15-item scale could be parsimoniously reduced to a five-item measure. In light of this research, we performed a principal components, orthogonal rotation factor analysis of the instrument. Results of the factor analysis indicated that 14 of the 15 items possessed loadings of 0.40 or greater on a single factor. Item 7 of the instrument, which possessed a loading of 0.15, was the lone item not contributing to the factor. The resulting eigenvalue for the 14-item factor was 5.49. The Cronbach alpha for the 15-item measure was 0.88. Supplemental analyses utilizing the reduced 5-item scale from Dwyer et al. (2000) were also performed and the results were essentially identical to those incorporating the full scale.
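The Cronbach alpha cited in Note 6 is a standard internal-consistency statistic. A minimal sketch of its textbook formula follows, using only the Python standard library. The response matrix is made up for illustration (three items rather than the 15 on the professional commitment scale), and the function name `cronbach_alpha` is an assumption, not something taken from the study.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(responses[0])  # number of items on the scale
    item_variances = [
        variance([row[i] for row in responses]) for i in range(k)
    ]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four respondents answering three items.
toy_responses = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 2, 2],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3))  # 0.912
```

Alpha approaches 1 as the items move together across respondents, which is why values in the high 0.80s are read as good internal consistency.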
ACKNOWLEDGMENTS

We gratefully acknowledge the helpful comments of the participants in the 2001 Annual Meeting of the Accounting, Behavior & Organizations Section, the 2002 Critical Perspectives in Accounting Conference, and the accounting research workshops at the Australian National University and at Washington State University.

REFERENCES

Adler, A., & Aranya, A. N. (1984). Comparison of the work needs, attitudes and preferences of professional accountants at different career stages. Journal of Vocational Behavior (August), 45–57.
Aranya, N., & Ferris, K. R. (1984). A re-examination of accountants’ organizational-professional conflict. The Accounting Review, 59(October), 1–15.
Aranya, N., Lachman, R., & Amernic, J. (1982). Accountant’s job satisfaction: A path analysis. Accounting, Organizations and Society (3), 201–215.
Aranya, N., Pollock, J., & Amernic, J. (1981). An examination of professional commitment in public accounting. Accounting, Organizations and Society (4), 271–280.
Ashforth, B. E., & Mael, F. A. (1989). Social identity theory and the organization. Academy of Management Review, 18, 20–39.
Bentler, P. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238–246.
Bentler, P., & Bonett, D. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.
Bernardi, R. A., & Arnold, D. F. (1997). An examination of moral development within public accounting by gender, staff level, and firm. Contemporary Accounting Research, 14, 653–668.
Bline, D. M., Duchon, D., & Meixner, W. F. (1991). The measurement of organizational and professional commitment: An examination of the psychometric properties of two commonly used instruments. Behavioral Research in Accounting, 3, 79–96.
Brown, A. D. (2000). Organization studies and identity: Towards a research agenda. Human Relations, 54(1), 113–121.
Burns, T. J.
(1992). Class dimensions, individualism, and political orientation. Social Spectrum, 12, 349–362.
Chatman, J. A. (1991). Matching people and organizations: Selection and socialization in public accounting firms. Administrative Science Quarterly, 36, 459–484.
Chatman, J. A., Polzer, J. T., Barsade, S. G., & Neal, M. A. (1998). Being different yet feeling similar: The influence of demographic composition and organizational culture on work processes and outcomes. Administrative Science Quarterly, 43, 749–780.
Collins, K. M. (1993). Stress and departures from the public accounting profession: A study of gender differences. Accounting Horizons, 7(March), 29–38.
Covaleski, J. M. (1999). SEC chief lashes out against auditors. Electronic Accountant (October 8th).
Covaleski, M. A., Dirsmith, M. W., Heian, J. B., & Samuel, S. (1998). The calculated and the avowed: Techniques of discipline and struggles over identity in Big 6 public accounting firms. Administrative Science Quarterly, 43, 293–327.
Dwyer, P. D., Welker, R. B., & Friedberg, A. H. (2000). A research note concerning the dimensionality of the professional commitment scale. Behavioral Research in Accounting, 12, 279–296.
Emler, N. P., Renwick, S., & Malone, B. (1983). The relationship between moral reasoning and political orientation. Journal of Personality and Social Psychology, 45(5), 1072–1080.
Enyon, G., Hill, N. T., & Stevens, K. T. (1997). Factors that influence the moral reasoning abilities of accountants: Implications for universities and the profession. Journal of Business Ethics, 16, 1297–1309.
Farmer, T. A. (1993). An examination of organizational commitment and professional commitment in an auditing context. Journal of Managerial Issues (Winter), 503–516.
FECInfo (2001). Federal Election Commission: Final Report: U.S. Senate and House Campaigns. Washington, DC: Federal Election Commission.
Fisher, D. G., & Sweeney, J. T. (2002). Morality vs. ideology: Implications for accounting ethics research.
Advances in Accounting Behavioral Research, 5, 141–160.
Fogerty, T. J. (1992). Organizational socialization in accounting firms: A theoretical framework and agenda for future research. Accounting, Organizations and Society, 17, 129–149.
Fogerty, T. J. (1995). Questioning the assumed homogeneity of the behavioural environment of accounting firms: Some exploratory empirical research. British Accounting Review, 27, 45–59.
Fogerty, T. J. (2000). Socialization and organizational outcomes in large public accounting firms. Journal of Managerial Issues, 12(Spring), 12–33.
Gaffney, M. A., McEwen, R. A., & Welsh, M. J. (1993). Gender effects on commitment of public accountants: A test of competing sociological models. Advances in Public Interest Accounting, 5, 45–73.
Goetz, J. F., Morrow, P. C., & McElroy, J. C. (1991). The effect of accounting firm size and member rank on professionalism. Accounting, Organizations and Society, 16, 159–165.
Harrell, A., Chewning, E., & Taylor, M. (1986). Organizational-professional conflict and the job satisfaction and the turnover intentions of internal auditors. Auditing: A Journal of Practice and Theory, 5(Spring), 109–121.
Hogg, M. A., & Terry, D. J. (2000). Social identity and self-categorization processes in organizational contexts. Academy of Management Review, 25(1), 121–140.
Hooks, K. L., & Cheramy, S. J. (1994). Facts and myths about women CPAs. Journal of Accountancy, 178(October), 79–86.
Hull, R. P., & Umansky, P. H. (1997). An examination of gender stereotyping as an explanation for vertical job segregation in public accounting. Accounting, Organizations and Society, 22(6), 507–528.
Jeffery, C., & Weatherholt, N. (1996). Ethical development, professional commitment, and rule observance attitudes: A study of CPAs and corporate accountants. Behavioral Research in Accounting, 8, 8–31.
Joreskog, K., & Sorbom, D. (1984). LISREL – VI users guide (4th ed.).
Mooresville, IN: Scientific Software. Kanter, R. (1977). Men and women of the corporation. New York: Basic Books. Kohlberg, L. (1969). Stage and sequence: The cognitive developmental approach to socialization. In: D. A. Goslin (Ed.), Handbook of Socialization Theory and Research (pp. 347–480). Chicago: Rand McNally. Lampe, J., & Finn, D. (1992). A model of auditors’ ethical decision process. Auditing: A Journal of Practice & Theory (Suppl.), 1–21. Larson, M. S. (1977). Rise of professionalism: A sociological analysis. Berkley: University of California Press. Maupin, R. J. (1993). How can women’s lack of upward mobility in accounting organizations be explained? Group and Organization Management, 18(June), 132–152. Maupin, R. J., & Lehman, C. R. (1994). Talking heads: Stereotypes, status, sex-roles and satisfaction of female and male auditors. Accounting, Organizations and Society, 19, 427–437. Norris, D. R., & Niebuhr, R. E. (1983). Professionalism, organizational commitment and job satisfaction in an accounting organization. Accounting, Organizations and Society, 9, 49–59. Ponemon, L. A. (1992). Ethical reasoning and selection-socialization in accounting. Accounting, Organizations and Society, 17, 239–258. Ponemon, L. A., & Gabhart, D. (1993). Ethical reasoning in accounting and auditing. Vancouver, Canada: Canadian General Accountants’ Research Foundation. Porter, L. W., Steers, R. M., Mowday, R. T., & Boulian, P. V. (1974). Organizational commitment, job satisfaction, and turnover among psychiatric technicials. Journal of Applied Psychology, 59(October), 603–609. Pratt, J., & Beaulieu, P. (1992). Organizational culture in public accounting: Size, technology, rank, and functional area. Accounting, Organizations and Society, 17, 667–684. Rest, J. R. (1979). Development in judging moral issues. Minneapolis, MN: University of Minnesota Press. A Structural Equation Model of Auditors’ Professional Commitment 25 Rest, J. R. (1986). 
Moral development: Advances in research and theory. New York: Prager Press. Rest, J. R. (1993). Guide for the deﬁning issues test. Version 1.3. Minneapolis, MN: University of Minnesota. Rest, J., Narvaez, D., Bebeau, M. J., & Thoma, S. J. (1999). Postconventional moral thinking: A neo-kohlbergian approach. New Jersey: Lawrence Erlbaum Associates. Rest, J., Thoma, S. J., & Edwards, L. (1997). Designing and validating a measure of moral judgment: Stage preferences and stage consistency approaches. Journal of Educational Psychology, 89(1), 5–28. Schroeder, R. G., & Imdieke, L. F. (1977). Local-cosmopolitan and bureaucratic perceptions in public accounting firms. Accounting, Organizations and Society, 1, 39–45. Shaub, M. (1994). An analysis of factors affecting the cognitive moral development of auditors and auditing students. Journal of Accounting Education, 12, 1–26. Shaub, M., Finn, D., & Munter, P. (1993). The effects of auditors’ ethical orientation on commitment and ethical sensitivity. Behavioral Research in Accounting, 5, 145–169. Siegal, P., Blank, M., & Rigsby, J. (1991). Socialization of the accounting professional: Evidence of the effect of educational structure on subsequent auditor retention and advancement. Accounting, Auditing and Accountability Journal, 4, 58–70. Sorenson, J. E. (1967). Professional and bureaucratic organization in the public accounting firm. The Accounting Review, 42(July), 553–565. Sorenson, J. E., & Sorenson, T. C. (1974). The conflict of professionals in bureaucratic organizations. Administrative Science Quarterly (March), 98–106. Street, D. L., Schroeder, R. G., & Schwartz, B. (1993). The central life interests and organizational professional commitment of men and women employed by public accounting firms. Advances in Public Interest Accounting, 5, 201–229. Sweeney, J. T. (1995). The moral expertise of auditors: An explanatory analysis. Research on Accounting Ethics, 1, 213–234. Sweeney, J. T., & Fisher, D. G. (1998). 
An examination of the validity of a new measure of moral judgment. Behavioral Research in Accounting, 10, 138–158. Sweeney, J. T., & Fisher, D. G. (1999). Politics, faking, and self-presentation: How valid is the P score of the Defining Issues Test? Research on Accounting Ethics, 5, 51–75. Tajfel, H., & Turner, J. C. (1985). The social identity theory of intergroup behavior. In: S. Worchel & W. G. Austin (Eds), Psychology of Intergroup Relations (2nd ed., pp. 7–24). Chicago: NelsonHall. Watts, R. L., & Zimmerman, J. L. (1986). Positive accounting theory. Englewood Cliffs, NJ: PrenticeHall. Wheeler, R., Felsig, R. M., & Reilly, T. (1987). Large or small CPA firms: A practitioner’s perspective. CPA Journal (April), 29–33. AN ANALYSIS OF GROUP INFLUENCES ON GOING CONCERN AUDITOR JUDGMENTS Sunita S. Ahlawat and Timothy J. Fogarty ABSTRACT Studies that have indicated that the processing of audit evidence results in judgment bias may be the result of the study of individual decision-making. Building on work that suggests important differences between individual and group decision-making, this paper evaluates decision-making attributes of audit groups. Experienced auditors from ofﬁces of Big-Five ﬁrms in the U.S. served as the participants in an experiment involving the going concern judgment. Results show that recency does affect the judgments of individual auditors but disappears as an important effect when groups make judgments. Group responses are less extreme and exhibit greater conﬁdence than those of individuals. INTRODUCTION The descriptive theory of belief updating proposed by Hogarth and Einhorn (1992) posits that the order in which evidence is received has a significant and predictable influence on a person’s final judgment. Most of the attention generated by this discovery has focused around recency effects. Recency refers to the tendency to place a greater weight on evidence received later in a sequence. 
Accordingly, an over-reliance on information presented last may occur. A number of experimental studies utilizing various conditions suggest that significant recency effects exist in accountants’ and auditors’ belief revisions (e.g. Asare, 1992; Ashton & Ashton, 1988; Dillard et al., 1991; Pei et al., 1992; Trotman & Wright, 1996; Tubbs et al., 1990).

Advances in Accounting Behavioral Research, Volume 6, 27–51. © 2003 Published by Elsevier Ltd. ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06002-2

However, recent research has questioned the prevalence of recency in auditing. Cushing and Ahlawat (1996) suggested that such effects may not be common in audit practice. Other studies also have produced evidence that recency effects do not always occur, or occur only under certain circumstances (Kennedy, 1993; Messier & Tubbs, 1994; Trotman & Wright, 1996). This paper builds on the growing recognition that contextual factors (e.g. accountability, cognitive involvement, experience, and task realism) might mitigate judgment bias in audit judgment. Another potential factor is group influence. Many auditing situations involve either formal or informal group consultation (Gibbins & Emby, 1985). For example, a team of audit staff and seniors typically conducts audit fieldwork. The group expands as managers and partners review this work prior to the issuance of an audit report. However, the growing recognition that cognitive heuristics and biases in auditors’ judgments can lead to different outcomes, including different types of audit reports (e.g. Asare, 1992), has developed with little consideration of group influences. This research investigates the potential for group processes to overcome weaknesses in accountants’ judgment.
In addition to the recency bias, this paper also examines the related attributes of decision confidence and belief revision that vary between audit groups and individual auditors. This research finds fundamental differences between groups and individuals in their exposure to recency effects, the nature of their belief revision processes, and their confidence in decisions.

Four sections follow. The first reviews the literature on group decision-making and judgment biases as a prelude to stating the research hypotheses. The second describes the empirical study. The last two sections present the results and discuss their implications and limitations.

LITERATURE REVIEW AND RESEARCH HYPOTHESES

Groups and Group Decision-Making

The unique condition of the group in business settings has been studied for some time. Early studies measured the impact of social cues and interpersonal opinions on performance and cognitive investment (Weiss & Shaw, 1979; White et al., 1977). As this area matured, interactive effects between group conditions and individual attributes were recognized (e.g. Vance & Biddle, 1985). Apart from these more generic aspects, groups also were found to influence decision-making. Although individuals come to the group with some degree of pre-discussion preferences and unique decision-relevant information that continue to influence group decisions (Winquist & Larson, 1998), the group resists reduction to the sum of its members. Groups are believed to produce substantively different decisions than individuals (Hill, 1982; Miner, 1984). The improved accuracy of groups that has been reported in many areas may be attributable not only to the increased perspectives contributed by members, but also to the heightened caution as consensus processes tend to eschew extreme solutions (Myers & Lamm, 1976).
Although the balance of evidence suggests net gains for group decisions over those of individuals, a full explanation of their origin remains elusive. The extent to which groups are effective at reducing the random error associated with individual choice may depend on the effectiveness with which feedback can be incorporated. Group advantages may also center on the reduction of individual variability. However, the importance of these conditions varies with the context of the decision.

Group Decision-Making in Accounting and Auditing

Solomon’s (1987) review of the literature on multi-auditor decision-making has not resulted in a critical mass of work on audit groups. Notwithstanding the paucity of academic treatments, the audit process resolutely remains the result of group deliberations. Evidence gathered by auditors continues to reflect team processes. Work done by staff members still requires a consensus distillation of conclusions. Work reviewed by supervisors, and then by partners, indicates a group orientation toward the work.1 The computerization of the audit may have changed the medium for group interaction but it has not altered the necessity for a meeting of the minds by auditors.

A going concern decision involves aspects of both individual and multi-person decision-making. The decision is based on many pieces of evidence that may have been gathered and initially reviewed by selected individuals. Because predicting the going concern status is critical, it is unlikely to be made by an individual without extensive consultation with the audit team and other audit firm members. While the decision itself is likely to be made by a group, individual opinions also are important since pivotally-situated individuals (managers, seniors, and staff), who themselves have weighed the evidence, make recommendations and suggestions.
Consultation with other auditors prior to important decisions (such as going concern) conforms to the requisites of professional auditing standards (Reckers & Schultz, 1993). If group judgments are significantly different from individual judgments, the practical implications of going concern studies involving individual judgment alone may be somewhat limited.

The going concern judgment has been characterized as a series of belief revisions, where each revision is the weighted average of the previous judgment and the value of the current evidence (Asare, 1992; Cushing & Ahlawat, 1996). The final revised belief is then compared to the threshold for substantial doubt for issuing an unqualified opinion (Asare, 1992). Thus, unlike most other audit decisions, the going concern matter goes right to the “bottom line” for both clients and auditors. The evaluation of going concern status is regarded as critical, difficult, and complex by most partners (Chow et al., 1987). This necessitates some consideration of how such a decision is made. Cushing and Ahlawat (1996) asserted that in order to effectively revise beliefs the auditor must: (1) read and comprehend all information cues provided; (2) adequately recall relevant information provided in prior stages of the sequential task; (3) give sufficient attention to relevant prior and new information at each stage; (4) effectively relate all of this information to his or her existing knowledge structure; and (5) develop a problem representation sufficient to complete the task effectively. Failure to carry out any one or more of these activities could contribute to the recency effect (Cushing & Ahlawat, 1996). However, these requisites also imply that recency can be reduced with greater effort or attention. A number of studies have encouraged active involvement in the above activities.
These include studies that examined the effects of accountability (Kennedy, 1993; Tetlock, 1983), documentation (Cushing & Ahlawat, 1996), explanation (Anderson & Sechler, 1986), and commitment (Church, 1991). Tetlock (1983) and Kennedy (1993) reported that judgments were less prone to order effects when participants were told that they might subsequently have to justify or explain their conclusions to others, such as their superiors. Apparently, the mere prospect of accountability was sufficient to produce more desirable information processing. In contrast, Cushing and Ahlawat (1996) and Church (1991) required participants to prepare a memorandum documenting the rationale for an audit decision. Similar results of reduced recency were reported. A common objective underlying these manipulations was to produce greater cognitive involvement and effort among participants.

Although many decisions in the audit process are important, few match the consequences of the going concern decision. Accordingly, the audit firm would like to be highly confident that it has made the correct decision. The measurement of endogenous and exogenous levels of confidence has been part of the study of groups for some time (e.g. Zarnoth & Sniezek, 1997). In situations that lack clear correct answers, such as the going concern area, confidence and accuracy are not redundant (Luus & Wells, 1994). The formation of groups to make decisions may be a means to increase confidence levels. However, at this point it is unclear if audit groups are more confident about such decisions than individuals would be when making the same decision.

In sum, group dynamics may provide opportunities for more complete problem analysis of the going concern decision. The group process may be another form of cognitive investment that people put into a decision.
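As a rough illustration of why evidence order matters under the weighted-average view of belief revision discussed earlier in this section (Asare, 1992; Cushing & Ahlawat, 1996), the following minimal sketch updates a 0–100 likelihood judgment as a weighted average of the previous judgment and the value of the current evidence. The constant weight (0.3), the starting anchor (50), and the evidence values (80 for a mitigating item, 20 for a contrary item) are illustrative assumptions, not parameters reported in this study.

```python
def revise_beliefs(anchor, evidence_values, weight=0.3):
    """Return the judgment held after each evidence item.

    Each revision is a weighted average of the previous judgment and the
    value of the current evidence. `weight` is a hypothetical constant.
    """
    judgments = []
    belief = anchor
    for value in evidence_values:
        belief = (1 - weight) * belief + weight * value
        judgments.append(belief)
    return judgments


MITIGATING = 80  # assumed value of a mitigating item on the 0-100 scale
CONTRARY = 20    # assumed value of a contrary item on the 0-100 scale

mmmccc = revise_beliefs(50, [MITIGATING] * 3 + [CONTRARY] * 3)
cccmmm = revise_beliefs(50, [CONTRARY] * 3 + [MITIGATING] * 3)

# With a constant weight, later items count more than earlier ones, so the
# same six items yield different final judgments depending on their order.
print(f"MMMCCC final judgment: {mmmccc[-1]:.2f}")
print(f"CCCMMM final judgment: {cccmmm[-1]:.2f}")
```

Under these assumptions, the final judgment is lower when the contrary items come last than when the mitigating items come last, which is the pattern a recency effect would produce.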
Focusing primarily upon the tendency towards recency effects in such an environment allows us to evaluate the impact of the group. However, other group differences must also be considered to gain a broader picture of how groups compare to individuals in the auditing context.

Hypotheses

The studies discussed above suggest that the tools that enhance cognitive involvement can mitigate order effects. Group decision-making can serve to enhance effort and involvement. Group assistance can also be useful in lessening task demands. Groups have collective experience to draw from, whereas individuals work alone. Studies in social psychology have found that livelier interaction among group members was associated with superior performance (e.g. Valacich & Schwenk, 1995). Interacting groups also reduced belief perseverance (Wright et al., 1990). These findings suggest that the interaction process itself may have a positive effect on judgment.

Two aspects of group process could contribute to superior performance. The group tends to broaden the information set that is brought to bear upon a choice (Stasser, 1992). This information set includes perspectives on what factual data means and what limitations it possesses. Group processes also reduce individual inconsistency or extremity (Schultz & Reckers, 1981). As information exchange between members occurs, group interaction becomes a “corrective function” when individual members have initially incomplete or biased information (Stasser & Titus, 1985) and are encouraged to alter opinions in order to reach a collective judgment (Stasser & Davis, 1981).

The complexities of some audits make group processes even more salient. Auditors are aware of the importance of group work and the need to share and integrate expertise (Schultz & Reckers, 1981). The audit requires considerable knowledge about industries and competitive factors in order to ascertain the consequences of account balance fluctuations.
Fisher and Ellis (1990) suggested that social pressures created by the group interaction process would moderate extreme or divergent views held by group members as they work to accommodate each other’s views. In an audit setting, groups may be useful in preventing anecdotal experience about certain business conditions from being overly generalized. Groups may collectively recognize patterns and relationships that individuals working alone may not.

Group discussion can lead to a more complete problem analysis resulting in improved judgment quality. Judgment quality is a function of capacity, effort, internal data (i.e. memory and knowledge), and external data access (Kennedy, 1993). Because of potential pooling of resources, correction of errors, and use of qualitatively different learning strategies, process gains from interaction are possible (Hill, 1982). Estimation biases observed with individual judgments can be reduced considerably through group interaction (Sniezek & Henry, 1989). Similarly, group interaction also may mitigate recency due to some combination of enhanced capability, experience, and cognitive involvement.2

This study specifically examines the consequences of group interaction as a means to overcome the limitations of the study of individuals engaging in acts that are more likely performed by groups. By holding the amount of information that decision-makers have more constant than would be true in an actual audit, this study enables a focus upon the judgment process. Since groups can increase cognitive effort, reduce complexity, and capitalize on experience, they should exhibit less recency bias than individuals. Recency may not be a serious problem in practice if audit groups are less susceptible to the order in which evidence is presented. Based on the preceding discussion, the following hypothesis is tested:

H1.
Audit groups will exhibit fewer recency effects in their going concern judgments than will individual auditors.

Over the last fifteen years, many researchers have recognized that individual confidence is an important dimension of group interaction. Unlike accuracy, confidence can be made explicit at the time judgments are made. Therefore, confidence is a key indicator of the extent to which uncertainty is perceived to be inherent in a task. High levels of uncertainty suggest that a decision is unusually sensitive to differences in judgment. This condition may make judgment biases more consequential to the decision. Accordingly, confidence and accuracy can be affected by different factors (Luus & Wells, 1994).

Sniezek and Henry (1989, 1990) postulate a two-stage process for groups to reach a consensus judgment: (1) the revision process; and (2) the weighting process. At the revision stage, individual judgments are voluntarily revised in light of information exchanged during interaction. At the weighting stage, group members use some implicit or explicit rule to combine divergent views and negotiate their individual judgment to form a single group judgment. This process is sufficiently engaging and explicit so that when group members adopt a single group judgment, they may have higher confidence in that group judgment than they would have had in their own individual judgment (Sniezek & Henry, 1990). After the weighting process, group members should express higher confidence about their decision because it takes into account a wide set of perspectives on importance. Lower confidence would be inconsistent with the social pressures that support the participatory consensus formation around the group’s choice. As such, the group interaction process may lead to higher group confidence compared to the individual members’ pre-group confidence (Sniezek & Henry, 1989, 1990).
The greater confidence may also reflect individuals’ recognition that groups can potentially recognize, evaluate, and process more information than individuals.3 In an accounting study, Bloomfield et al. (1996) showed that interaction that inspired group confidence contributed to group performance. In a different vein, Allwood and Granhag (1996) found that groups inspired not only confidence, but also realistic confidence.

The level of confidence is particularly important for the going concern decision made by auditors. The evaluation of business survival is inherently oriented toward the future and therefore is more uncertain than most auditing decisions. Since the going concern decision has distinct adverse consequences for the client, high levels of confidence are called for to withstand the client resistance that is likely to result. Accordingly, the following hypothesis will be considered:

H2. Audit groups will exhibit greater confidence than individual auditors about going concern decisions.

Research over the last thirty years has identified many reasons to depart from the belief that the direction of influence in decision-making is symmetrical. Human beings are not bound to strict mathematical consistency when dealing with information that points to one conclusion relative to information that leads to an opposite result. Pivoting around a baseline (zero), positive movements and negative cues of equal magnitude have often been shown to be processed in a qualitatively different way. However, the reasons that individuals are influenced by these frames of reference are imperfectly understood (Newman, 1980). If group-based reasoning is capable of integrating more information and wider perspectives, it also may be capable of altering the tendency to treat categories of cues in ways that are inconsistent with Bayesian logic.
The more varied experiences available to the group as input to their decision may work against the tendency to overweight the negative or the positive. If framing effects are psychological in nature, forcing them into open discussion may have the effect of exposing their inconsistency. In other words, there may be more balance in how groups react to positive and negative types of information than there would be in how individuals react to that same information.

Auditing has been described as the attempt to confirm a series of interrelated hypotheses about the client’s accounting records (Church & Schneider, 1993). Evidence that the accounts are correct as stated therefore can be logically opposed to evidence of material error. The going concern decision appears to be a special case of this bifurcation of evidence, here contrasting pro-survival and anti-survival implications for the business entity. Whereas the former mitigates a going concern problem, the latter type tends to confirm such a problem. What individual auditors may do in the consideration of these types may not be the same as what groups would do. Individual auditors may not be as able to recognize that they are acting in a way that systematically overweights either positive or negative information. Groups may therefore be less likely to overreact to either good news about audit client viability or bad news about doubtful continuation. A two-part hypothesis that pinpoints the possibilities of difference would be:

H3a. Audit groups will revise their beliefs about going concern in response to confirmatory going concern evidence less than individual auditors.

H3b. Audit groups will revise their beliefs about going concern in response to mitigating going concern evidence less than individual auditors.

In sum, four specific effects that differentiate groups and individuals are expected.
Groups should be less influenced by the order of the evidence that they consider in a going concern decision context. They should also exhibit higher levels of confidence about the accuracy of their determinations. Groups are expected to be more temperate in their reactions to incremental positive and negative information. Together, the hypotheses suggest that groups will make less biased and more confident going concern decisions.

THE EXPERIMENT

An experiment was designed to test the hypotheses in a context where auditors are asked to evaluate a client’s ability to continue as a going concern. This type of context has been employed frequently in prior studies of recency effects in audit judgment. The specific task in the experiment involves making a series of judgments about a firm’s going-concern status and a recommendation about the type of audit report to issue.

The experiment was conducted in the offices of the participating international public accounting firm over a four-week period. In each office, arrangements were made for subjects to participate as individuals or as members of three-person groups. Judgments were made privately by individuals or collaboratively in groups. Although the assignment of participants to conditions was random, group composition was subject to member availability at the pre-established time for the exercise.4 The only qualifying stipulations were that participants be primarily engaged in the auditing activities of the firm and have at least two years of experience. A researcher distributed and collected all materials in person. For groups, the researcher was present outside the meeting room for the duration of the deliberations. Individuals completed the task in their offices, but without the physical proximity of the researcher.

Task and Procedure

Each participant was provided with case material.
Although each member of the group was given a copy of the case, groups were instructed to respond collectively on a single response sheet. Group members were encouraged to discuss the case prior to reaching a consensus. Each group designated one member to record the group response. A cover letter accompanying the case materials suggested that the task should take about 60 minutes to complete. Whereas letters to groups emphasized the importance of working collectively, letters to individuals stressed the need for independent work. Both types of letters asked participants to proceed through the materials in one sitting. All participants were guaranteed anonymity, assured that there were no right or wrong answers, and told that most of the questions dealt with matters of professional judgment. Participants were asked to read the case assuming that they were performing a review of preliminary results from the current year’s audit engagement. The case was previewed for realism and relevance by audit professionals other than the participants and was revised in accordance with their suggestions. The experimental materials consisted of a set of instructions and a case booklet. The case booklet contained background and financial information for a hypothetical client. The background information included a detailed description of the industry and a company, its operations, economic environment, and the type of audit opinion it had received in the last two years. The financial information comprised audited financial statements for the past three years and the current year. This information included the balance sheet, income statement, selected financial ratios, footnotes, statement of changes in financial position, and schedule of working capital changes. The experimental materials were designed to create a case in which the audit decision was not an obvious unqualified or modified (going concern) opinion. 
Figure 1 depicts the sequence of procedures required of the auditors for the experiment. The case consisted of four tasks. Participants were asked to complete each task in the order given to capture belief revision. They were instructed to return the task to the appropriately labeled envelope, and to seal the envelope at the end of each task.

Fig. 1. Procedure for the Experiment.

In Task 1, participants were first asked to provide their general threshold level for substantial doubt, such that a modified audit opinion would be recommended for any entity whose likelihood for continued existence fell below the threshold level. This established, in quantified terms, participants’ baseline threshold for substantial doubt before they considered the hypothetical client in particular. Group members had to agree to a single baseline. The scale used for pinpointing participants’ threshold levels ranged from 0 to 100, with endpoints labeled “certain not to continue” (0) and “certain to continue” (100).

Participants then dealt with case-specific questions. They were asked to: (1) assess the likelihood of the client’s continued existence through the end of the current fiscal year; (2) recommend the type of audit report to be issued; and (3) indicate their confidence in the audit report recommended. The same 0–100 scale, with end points labeled “certain not to continue” (0) and “certain to continue” (100), was used for this and each subsequent likelihood judgment. A similar 0–100 scale with end points labeled “not confident at all” (0) and “very confident” (100) was used to elicit participants’ confidence level. The audit report categories were Unqualified, Modified, and Disclaimer. Under U.S. auditing standards, the modified opinion would be appropriate if there were substantial doubt about the entity’s continuation (AU 341, AICPA, 1990).
At this point, participants did not know that they would receive additional information or have an opportunity to revise their previous judgments. In addition to familiarizing the participants with the client’s overall operations and financial conditions, Task 1 allowed them to set their own decisional anchor points.

Task 2 of the case sequentially presented six additional pieces of evidence. Three of the evidence items were classified as “Contrary” with regard to the going concern status of the hypothetical company. Contrary information is defined as any evidence or issue that raises doubts about the entity’s ability to continue in existence. Specifically, the contrary items related to: (1) the upcoming expiration of a patent that had consistently generated approximately 25% of total sales; (2) the departure of one of the company’s key sales executives; and (3) the non-renewal of the company’s line of credit. The other three evidentiary items could be considered “Mitigating” in nature, since they might quell traditional auditor going concern doubts. The mitigating factors were: (1) the receipt of a favorable marketing research report on a new product line; (2) the successful deferment of an account payable over a three-year period; and (3) a successfully concluded contract negotiation with an employee labor union. Following the presentation of each of these pieces of evidence, participants were asked to provide a revised assessment of the likelihood that the client would continue in existence through the end of the current fiscal year. After providing the last of these assessments, participants were again asked to recommend the type of audit report to be issued and to indicate their confidence in the appropriateness of that report.

The six items were presented in two orders. In the condition labeled MMMCCC on Fig.
1, the three mitigating factors (MMM) were presented first, followed by the three pieces of contrary information (CCC). The order of evidence was reversed in the second condition, labeled CCCMMM. This variation in the order of cues was the recency manipulation. Each item was presented on a new page contained in an envelope. Before examining the next item of evidence, participants were asked to complete and seal a new assessment, on the 0–100 scale, of the hypothetical company’s continuation as a going concern. After the last piece of evidence was revealed, participants were again asked about their confidence in the opinion type they recommended, with a question identical to that used in Task 1. Task 3 of the case required all participants to complete a questionnaire regarding their background and auditing experience. Since these questions concerned their individual attributes, all participants, even those who had worked in groups for Tasks 1 and 2, were asked to work alone on Task 3. Task 4 obtained data for a manipulation check. Nine pieces of evidence (including the six items presented in the experiment) were used to check respondents’ perceptions. They were asked to classify these nine items as contrary, mitigating, or neither, in relation to a going concern question. Individuals who had worked in groups for Task 1 and Task 2 also performed this task collectively, in keeping with the intent to study the difference between groups and individuals (see Note 5).

Participants

Ninety-one auditors from a Big-Five CPA firm participated in the experiment. Of the 91 auditors, 49 were managers and 42 were seniors. There were 21 groups, each consisting of one manager and two seniors. The 28 people who worked as individuals were all managers. This design feature was motivated by a desire to have at least one experienced individual in each decision-making unit (see Note 6). Table 1 presents auditor experience by rank and treatment conditions.
On average, managers had 8.45 years of experience (range 5–15 years) while seniors had 3.26 years of experience (range 2–5 years). The sample of individuals had, on average, more experience (7.93 years) than auditors in the group condition (5.24 years). However, the groups had managers with more experience (9.19 years) as members. The extent to which group members had previous experience working with each other on actual audit engagements was not available.

Table 1. Descriptive Information: Average Auditor Experience by Rank and Treatment Condition (Standard Deviation in Parentheses).

                                                  Decision Unit and Information Order (a)
Audit Experience                     Rank         Group CCCMMM     Group MMMCCC     Individual CCCMMM   Individual MMMCCC
Experience (in years)                Manager      9.09 (1.38)      9.30 (3.09)      8.23 (1.09)         7.67 (1.40)
                                     Senior       3.36 (0.85)      3.15 (0.99)      –                   –
No. of audits since working          Manager      106.4 (73.99)    139.5 (122.91)   102.3 (69.63)       105.67 (70.12)
as an auditor                        Senior       22.3 (15.13)     26.4 (17.27)     –                   –
No. of audits in which an opinion    Manager      3.27 (1.79)      3.60 (3.89)      0.38 (2.87)         2.47 (1.99)
other than unqualified was issued    Senior (b)   0.68 (0.99)      1.15 (1.27)      –                   –

(a) Respondents in the CCCMMM (MMMCCC) condition received three items of contrary (mitigating) evidence, followed by three items of mitigating (contrary) evidence.
(b) 7 of 49 managers and 21 of 42 seniors indicated they had not been on any engagements in which the going concern opinion was in fact issued.

Most participants indicated that, as members of audit teams, they had been involved in engagements in which an opinion other than unqualified was either seriously considered (81 of 91), or actually issued (63 of 91). This suggests that participants were familiar with non-standard audit reports in the “real world” of audit practice. Of the 63 who had been on audits in which a going concern opinion was issued, 42 were managers and 21 were seniors.
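The pooled experience figures quoted in the text can be recomputed from the cell means in Table 1. The sketch below is a rough check rather than the authors' procedure: it assumes the cell sizes reported elsewhere in the paper (11 and 10 groups, each with one manager and two seniors, and 13 and 15 individual managers), and the helper name `pooled_mean` is hypothetical.

```python
def pooled_mean(cells):
    """N-weighted mean of (n, mean) pairs."""
    total_n = sum(n for n, _ in cells)
    if total_n == 0:
        raise ValueError("no observations")
    return sum(n * mean for n, mean in cells) / total_n


# (number of auditors, mean years of experience), transcribed from Table 1.
# Assumption: every group contains exactly one manager and two seniors.
group_managers = [(11, 9.09), (10, 9.30)]       # CCCMMM, MMMCCC
group_seniors = [(22, 3.36), (20, 3.15)]        # two seniors per group
individual_managers = [(13, 8.23), (15, 7.67)]  # CCCMMM, MMMCCC

print("group managers:", round(pooled_mean(group_managers), 2))
print("individuals:", round(pooled_mean(individual_managers), 2))
print("seniors:", round(pooled_mean(group_seniors), 2))
print("all group members:", round(pooled_mean(group_managers + group_seniors), 2))
```

Under these assumptions the weighted means come out at about 9.19, 7.93, 3.26 and 5.24 years, which agrees with the figures given in the text.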
Experiment Design

Participants were assigned to one of four experimental conditions according to a 2 (decision unit) × 2 (order of evidence) design. Thus, the four treatment conditions for the first hypothesis were: Individual, CCCMMM; Individual, MMMCCC; Group, CCCMMM; and Group, MMMCCC. The dependent variable for the first hypothesis (H1) and the third hypothesis (H3) was the change in the assessed likelihood of the client’s continued existence. The change was measured based on assessments made after the initial review of the case in Task 1 (labeled J0) and after the review of all six additional items of evidence in Task 2 of the experiment (labeled J6). Thus, belief revision was computed as the percent change between the revised and initial likelihood judgments, (J6 − J0)/J0, for H1, and sequentially for the separate informational cues for H3 (see Note 7). Confidence assessments elicited from the participants were used to analyze hypothesis H2. These were obtained after the initial (labeled Ci) and final (labeled Cf) recommendations for the type of audit report to be issued in Tasks 1 and 2, respectively.

RESULTS

Descriptive Results

The results of the manipulation check in Task 4 were very satisfactory. Participants overwhelmingly reacted in the expected direction. Only 3 (1.02%) of the 294 possible cases (6 items each from 28 individuals and 21 groups) were incorrectly classified. Despite this small misclassification, participants always revised their probability assessments in the expected direction during Task 2 (downward in response to contrary information and upward in response to mitigating factors). The average likelihood judgments (J0 through J6) are reported in Table 2. The average initial judgment (J0) by individuals (68.92 points) and groups (69.05 points) was not significantly different (p > 0.10).
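The belief-revision measure defined in the Experiment Design section can be illustrated with the J0 and J6 cell means from Table 2. This is a sketch under stated assumptions: the function names are hypothetical, and applying the formula to cell means gives the ratio of means, which need not equal the mean of participant-level ratios used in the reported ANOVA.

```python
# (J0, J6) cell means transcribed from Table 2.
CELL_MEANS = {
    ("Individual", "CCCMMM"): (66.92, 52.84),
    ("Individual", "MMMCCC"): (70.67, 34.27),
    ("Group", "CCCMMM"): (69.54, 51.36),
    ("Group", "MMMCCC"): (68.50, 47.00),
}


def raw_revision(j0: float, j6: float) -> float:
    """Change in likelihood points between the initial and final judgment."""
    return j6 - j0


def percent_revision(j0: float, j6: float) -> float:
    """Percent-change belief revision, (J6 - J0) / J0."""
    if j0 == 0:
        raise ValueError("J0 must be non-zero for a percent change")
    return (j6 - j0) / j0


for (unit, order), (j0, j6) in CELL_MEANS.items():
    print(unit, order, round(raw_revision(j0, j6), 2), round(percent_revision(j0, j6), 4))
```

The gap between the two orders is much wider for individuals (a drop of 36.40 versus 14.08 points) than for groups (21.50 versus 18.18 points), which is the pattern the recency hypothesis concerns.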
Table 2 shows how each subsequent informational unit altered the progressive going concern estimation in the predicted direction. The average downward belief revision for contrary information was 39.16 points. The average upward belief revision was 15.82 points for mitigating information. This magnitude difference is consistent with prior findings that auditors are particularly sensitive to disconfirming evidence (Ashton & Ashton, 1988; McMillan & White, 1993). The average downward revision for contrary information was smaller for groups (31 points) than for individuals (45 points). Similarly, the average upward revision for mitigating information was 11 points for groups and 19 points for individuals. Consistent with the literature that suggests that groups function to taper extreme member positions, group responses were less polarized than individual responses in both the positive and the negative direction in this audit context.

Tests of Hypotheses

The first hypothesis specified that groups would exhibit weaker recency effects than individuals. A 2 (decision unit) × 2 (order) ANOVA was conducted with percent-change cumulative belief revision, (J6 − J0)/J0, as the dependent variable.
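The averages quoted above for contrary and mitigating information can be approximately recovered from the Table 2 cell means. The sketch below assumes that each reported figure is the total revision across the relevant block of three cues, weighted by cell size. That reading is an inference from the numbers, not something the paper states, and the function names are hypothetical.

```python
# (N, J0, J3, J6) cell means transcribed from Table 2; J3 is the judgment
# after the first block of three cues, J6 after the second block.
CELLS = {
    ("Individual", "CCCMMM"): (13, 66.92, 23.46, 52.84),
    ("Individual", "MMMCCC"): (15, 70.67, 81.00, 34.27),
    ("Group", "CCCMMM"): (11, 69.54, 38.82, 51.36),
    ("Group", "MMMCCC"): (10, 68.50, 78.50, 47.00),
}


def block_revisions(order, j0, j3, j6):
    """Return (downward revision to contrary cues, upward revision to mitigating cues)."""
    if order == "CCCMMM":
        return j0 - j3, j6 - j3
    if order == "MMMCCC":
        return j3 - j6, j3 - j0
    raise ValueError(f"unknown order: {order}")


def weighted_revisions(unit=None):
    """N-weighted mean (contrary, mitigating) revisions, optionally for one decision unit."""
    total_n = 0
    contrary_sum = 0.0
    mitigating_sum = 0.0
    for (cell_unit, order), (n, j0, j3, j6) in CELLS.items():
        if unit is not None and cell_unit != unit:
            continue
        contrary, mitigating = block_revisions(order, j0, j3, j6)
        total_n += n
        contrary_sum += n * contrary
        mitigating_sum += n * mitigating
    if total_n == 0:
        raise ValueError(f"no cells for unit: {unit}")
    return contrary_sum / total_n, mitigating_sum / total_n


for label in ("Individual", "Group", None):
    contrary, mitigating = weighted_revisions(label)
    print(label or "All", round(contrary, 2), round(mitigating, 2))
```

Under this reading the recomputed values agree with the reported 45, 31 and 39.16 points for contrary information and 19, 11 and 15.82 points for mitigating information to within rounding of the published cell means.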
Table 2. Descriptive Information: Analysis of Belief Assessments by Treatment Conditions.

Mean (Standard Deviation) of Initial (J0) and Revised (J1 Through J6) Likelihood Assessments

Decision Unit         Order (a)   J0              J1              J2              J3              J4              J5              J6
Group (N = 11)        CCCMMM      69.54 (22.63)   59.54 (24.54)   47.72 (26.77)   38.82 (27.76)   42.73 (26.49)   45.91 (25.18)   51.36 (23.88)
Group (N = 10)        MMMCCC      68.50 (16.67)   71.50 (18.86)   73.50 (15.47)   78.50 (14.35)   64.50 (17.55)   54.20 (22.75)   47.00 (21.24)
Individual (N = 13)   CCCMMM      66.92 (21.27)   41.92 (22.03)   34.85 (16.62)   23.46 (16.88)   36.92 (16.40)   52.07 (20.38)   52.84 (18.76)
Individual (N = 15)   MMMCCC      70.67 (14.12)   76.53 (14.89)   76.73 (13.23)   81.00 (8.70)    60.00 (8.45)    41.80 (15.36)   34.27 (14.28)

(a) Respondents in the CCCMMM (MMMCCC) condition received three items of contrary (mitigating) evidence, followed by three items of mitigating (contrary) evidence.

Table 3. Tests of Hypothesis H1: Analysis of Variance: Order × Decision Unit with Belief Revision (J6 − J0)/J0 as the Dependent Variable.

Panel A: Analysis of Variance
Source                   df    Mean Square   F       Sig. of F
Order                    1     0.464         9.085   0.004
Decision unit            1     0.033         0.654   0.423
Order × Decision-unit    1     0.278         5.432   0.024
Residual                 45    0.051

Panel B: Mean Belief Revision
Order      Individual   Group
CCCMMM     −0.1921      −0.2921
MMMCCC     −0.5180      −0.3133
t          4.13         0.20
p          0.000        0.847

The results are presented in Panel A of Table 3. The significance of the order variable (F = 9.085, p < 0.01) shows that recency effects are present in auditors’ going concern decisions. More importantly, however, the results reveal a significant interaction (F = 5.43, p < 0.05) between order and decision unit. This result suggests that judgments were influenced not only by the order in which evidence was evaluated, but also by whether judgments were made individually or in groups.
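Each F statistic in Panel A of Table 3 is the ratio of an effect's mean square to the residual mean square. The sketch below recomputes those ratios from the published, rounded mean squares, so small discrepancies in the second decimal place are expected. The dictionary layout and function name are hypothetical.

```python
# (mean square, reported F) for each effect, transcribed from Table 3, Panel A.
EFFECTS = {
    "Order": (0.464, 9.085),
    "Decision unit": (0.033, 0.654),
    "Order x Decision unit": (0.278, 5.432),
}
RESIDUAL_MEAN_SQUARE = 0.051  # residual with 45 degrees of freedom


def f_ratio(effect_mean_square: float, residual_mean_square: float) -> float:
    """F statistic for a one-degree-of-freedom effect in a fixed-effects ANOVA."""
    if residual_mean_square <= 0:
        raise ValueError("residual mean square must be positive")
    return effect_mean_square / residual_mean_square


for name, (mean_square, reported_f) in EFFECTS.items():
    recomputed = f_ratio(mean_square, RESIDUAL_MEAN_SQUARE)
    print(f"{name}: recomputed F = {recomputed:.3f}, reported F = {reported_f}")
```

Because only the interaction and the order main effect yield large ratios, the recomputation is in line with the reading that decision unit matters through its interaction with order rather than directly.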
The decision unit does not have a direct effect and is important only in terms of altering the impact of order effects. This suggests that groups act as a “debiaser” in eliminating recency in auditor going concern judgments. H1 is supported. Another test of recency among individual auditors shows that individuals in the MMMCCC condition made a greater average downward adjustment in their going-concern likelihood judgments (from 70.67 to 34.27, a change of 36.40 points) than individuals in the CCCMMM condition (from 66.92 to 52.84, a change of 14.08 points). This difference in average belief revisions was significant (t = 3.96, p < 0.001). In contrast to the individual results, likelihood judgments of audit groups exhibited no recency. Here, the average downward adjustment was 21.50 points (from 68.50 to 47.00) for the MMMCCC condition and an almost identical 18.18 points (from 69.54 to 51.36) for the CCCMMM condition. This difference was not significant (t = 0.47, p > 0.65; see Note 8). Hence, as expected, groups mitigated the recency effect. These results also support H1.

The second hypothesis asserted a relationship between decision unit and going concern judgment confidence. Specifically, audit groups were predicted to have greater confidence in their going concern decisions. For these purposes, decision confidence at the end of the case was used as the dependent variable. Final confidence is important because it reflects the processing of all the information in the case, either by groups or individually.

Table 4. Tests of Hypothesis H2: Analysis of Variance: Order × Decision Unit with Final Confidence as the Dependent Variable.

Panel A: Analysis of Variance
Source                   df    Mean Square   F       Sig. of F
Order                    1     138.641       0.623   0.434
Decision unit            1     923.415       4.149   0.048
Order × Decision-unit    1     0.025         0.000   0.992
Residual                 45    22.587

Panel B: Average Confidence
Decision Unit   Initial   Final
Individual      63.57     71.25
Group           75.57     80.23
t               2.27      2.25
p               0.028     0.029
Table 4 offers an ANOVA to test the second hypothesis. Information order and decision unit are included as possible effects upon final confidence, consistent with H2. The significance of decision unit at p < 0.05 suggests that groups have higher levels of confidence (see Note 9). Neither the order effect nor the interaction between order and decision unit was significant, which suggests that only the structure of the decision-making unit influenced confidence. Although H2 pertains to the existence of group differences, the change in confidence that occurred during the experiment was also considered. Groups exhibited significantly higher initial confidence than individuals (t = 2.27, p < 0.03). A 2 × 2 ANCOVA with final confidence as the dependent variable, initial confidence as the covariate, and decision unit and order as the independent variables was conducted. In results not shown, the initial confidence covariate was significant (p < 0.05). Neither of the two main effects nor their interaction was significant. This suggests that the differential confidence in the final decision was driven by the initial differences, and not by the differential processing of information. Nonetheless, groups maintained a significant difference in confidence over individuals throughout the entire process of belief revision. Groups begin more confidently and stay that way as further information is made known about relevant events. However, the group does not progressively become significantly more confident. The confidence difference appears to attach to the mere existence of the group, rather than its continued information handling abilities.

Table 5. Tests of Hypothesis H3: Analysis of Responses to Contrary and Mitigating Information.

Mean (Standard Deviation)
                                     Individuals     Groups          t      p
Response to contrary information     45.21 (13.44)   31.09 (17.05)   3.13   0.003
Response to mitigating information   19.18 (18.66)   11.33 (15.19)   1.57   0.122
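The summary statistics in Table 5 are enough to sketch how a two-sample t statistic is formed. The paper does not say which variant of the test was used, so the block below computes both the pooled-variance and the unequal-variance (Welch) forms. The sample sizes of 28 individuals and 21 groups are taken from the participant description, and the function names are hypothetical.

```python
from math import sqrt


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic assuming equal population variances."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def welch_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic without the equal-variance assumption."""
    return (m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)


N_INDIVIDUALS, N_GROUPS = 28, 21

# (mean, standard deviation) for individuals and groups, from Table 5.
RESPONSES = {
    "contrary": ((45.21, 13.44), (31.09, 17.05)),
    "mitigating": ((19.18, 18.66), (11.33, 15.19)),
}

for cue, ((m_i, s_i), (m_g, s_g)) in RESPONSES.items():
    args = (m_i, s_i, N_INDIVIDUALS, m_g, s_g, N_GROUPS)
    print(f"{cue}: pooled t = {pooled_t(*args):.2f}, Welch t = {welch_t(*args):.2f}")
```

With these inputs the Welch form gives roughly 3.13 for contrary information and the pooled form roughly 1.57 for mitigating information, which matches the two values in Table 5. This is consistent with, but does not establish, the use of different variance assumptions for the two comparisons.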
The final hypothesis concerns different processing by groups and individuals of the contrary and mitigating information. In the test of H3, the six opportunities provided to participants to revise their probability beliefs were separated into contrary and mitigating types. As shown in Table 5, there is a significant difference between individual and group responses to contrary information (t = 3.13, p < 0.01), with individuals reacting more severely. This is consistent with H3a. No significant differences exist between audit groups and individual auditors when presented with mitigating information (t = 1.57, p > 0.12). This does not support H3b.

Other Analyses

In Hypothesis H1, the dependent variable was the revision of the assessment of the likelihood that the client firm will continue as a going concern. As Asare (1992) points out, it is also important to learn whether the differences in audit judgments induced by the recency effect are likely to lead to differences in substantive audit decisions. Accordingly, an additional analysis was performed to examine whether judgment differences were sufficient to influence the audit report decisions in this particular case setting. Table 6 reports the recommended audit opinion of participants in each of the four treatment conditions, both at the initial stage (Task 1) of the experiment, and after reviewing all six additional items of information (the conclusion of Task 2). Since none of the groups or individuals selected the “disclaimer of an opinion” recommendation at any point in the experiment, the audit opinion variable was binary. At the initial point, individuals are no more likely than groups to recommend a modified opinion (χ² = 0.92, p > 0.50). However, individuals show a stronger tendency to switch to a modified opinion during the course of the case. When final decisions are considered, individuals are more likely than groups to recommend a modified opinion (χ² = 5.029, p < 0.05). In results not shown, individuals in the CCCMMM condition tended to recommend more unqualified and fewer modified opinions than individuals in the MMMCCC condition at the end of the experiment. This comparison, however, is not significant (χ² = 2.24, p > 0.05). A comparison of the distribution of final recommended opinions to the distribution of initial opinions shows that 4 of 8 individuals in the CCCMMM condition changed their recommendation from unqualified to modified, while 9 of 11 in the MMMCCC condition changed from unqualified to modified. A much less severe pattern existed for groups. Only 6 of 21 groups (3 in each order condition) changed their recommendation from unqualified to modified. However, neither of these comparisons is significant (χ² = 0.962, p > 0.05 and χ² = 0.829, p > 0.05 for individuals and groups, respectively). Contrary to the expected effect of recency on audit opinions, the number of modified opinions increased in both individual and group CCCMMM conditions. Although revisions of belief toward modified opinions may align with the aforementioned heightened sensitivity of auditors to adverse news, these results also suggest possible differences between binary (unqualified, modified) and continuous (percentage probability) outcomes (see Note 10).

Table 6. Additional Analyses: Audit Opinion by Treatment Conditions.

Panel A: Recommended Audit Opinion
                               Initial Opinion           Final Opinion
Unit         Order     N       Unqualified   Modified    Unqualified   Modified
Individual   CCCMMM    13      8             5           4             9
Individual   MMMCCC    15      11            4           2             13
Groups       CCCMMM    11      9             2           6             5
Groups       MMMCCC    10      7             3           4             6
Total                  49      35            14          16            33

Panel B: Opinion Chosen vis-à-vis Opinion Indicated by Threshold
                                               Opinion Chosen
                                               Initial                   Final
              Opinion According to Threshold   Unqualified   Modified    Unqualified   Modified
Individuals   Unqualified                      18 (a)        2           4 (a)         3
              Modified                         1             7 (a)       2             19 (a)
Groups        Unqualified                      14 (a)        –           9 (a)         –
              Modified                         2             5 (a)       1             11 (a)

(a) Indicates agreement between threshold and actual opinion issued.

SUMMARY AND DISCUSSION

A considerable discrepancy seems to exist between decision-making in auditing practice and its academic study. Whereas audits are group efforts that utilize the contributions of many differently situated individuals, academic research has been mostly the study of autonomous individuals. If groups are different from individuals, aggregating the latter to form implications about the work of the former may not be appropriate. Many studies of auditors suggest that bias and inconsistency exist in individual judgments. However, interventions that increase individuals’ cognitive effort and engagement may reduce bias. Extensive research on decision-making suggests that forming groups is an effective means of increasing the cognitive effort and engagement of individuals. Accordingly, the grouping of auditors provides a means of simultaneously increasing our appreciation of the nature of decision-making and adding to the realism of the experimental evidence. The task used in this research involved assessments of the going concern status of a hypothetical audit client. In many ways, this may be an atypical judgment due to its extreme nature. However, the descriptive evidence suggested that adequate familiarity with the issue existed among the participants. The frequency with which this issue has been studied by auditing researchers also allows more direct comparisons to be made. From a practice standpoint, the need for a strong going concern evaluation cannot be denied. This paper provides empirical support for the proposition that group decision processes differ from those of individuals. The results suggest that when auditors work in groups, judgments are less likely to be influenced by the order in which evidence is received and evaluated.
The recency effects reported in several experiments with audit practitioners and reproduced in this study with respect to individual decision-makers were not present for the same judgments that were made by groups. This study builds upon Kennedy’s (1993) and Cushing and Ahlawat’s (1996) work by explaining how other factors relevant to an auditing environment mitigate recency effects. Together, this line of work implies that recency effects may be overstated by studies that lack external validity. The results of this study imply that recency may be less onerous for the profession than others have suggested. If audit decisions are made in groups, less recency bias appears to be present. This study offers some interesting results regarding judgment confidence. Audit groups start with more confidence in their decisions than do individuals. Participants may intuitively appreciate the superior power of the collective to make an informed evaluation, or may just appreciate the help that others provide when making difficult decisions. Faced with post hoc inconsistent evidence, groups tend to sustain, but not significantly increase, their confidence advantage over individuals. This suggests that the advantages of the group mode in an audit setting occur early in the deliberative process. The fact that the confidence of groups did not increase over time also may indicate that this collective mode is not necessarily prone to overconfidence. The results suggest that one of the main differences that groups may offer is their willingness to reduce extreme reactions to particular pieces of information that push toward extreme solutions. In the going concern situation, further evidence of financial distress would logically make the going concern question more salient. However, the contribution toward this conclusion for groups is relatively small.
Groups appear to be more willing to suspend judgment or to put each additional piece of information in a broader context. Individuals demonstrate more sensitivity to “bad” news by making larger belief revisions. This difference between groups and individuals is not observed for information that tended to lessen the going concern problem. Individuals did not react more strongly to facts that suggested that the hypothetical business would remain financially viable. Further research is needed to test possible reasons that the two decision units processed good news and bad news differently. The results should redirect the attention of auditing organizations and academic accountants to group dynamics. Groups appear to process information in ways less affected by its order. Groups are also more confident about decisions and less likely to overreact to “bad” news about a client. Auditing firms should be comfortable about the ability of groups to avoid recency bias but be somewhat concerned about the tendency to perhaps react too little to going concern issues. In light of recent sudden corporate bankruptcies, the latter tendency needs to be guarded against. This research did not attempt to evaluate the importance of degrees of confidence. The superior confidence of groups does not necessarily imply that groups made more technically correct decisions about the going concern status of the hypothetical client. The hypothetical nature of the client prevents any proof of superiority. A necessary prelude to the confidence that constituents might have about auditing outcomes is the confidence that auditors themselves have in auditing inputs. Nonetheless, subsequent research should be directed at the specific value of confidence in auditing judgments. The findings of this study are subject to certain limitations. One stems from the unavailability of data regarding the extent to which group members actually had experience working together on previous engagements.
The effectiveness of group processes may depend on such experience, as individuals learn to systematically respect or discount the judgments of others. The importance of working histories of groups may not be as high in auditing as in other business settings. As firms get larger and centralize control over their human resources, individual assignments become less predictable and stable. No attention was given to hierarchical differences among the participants who were assigned to groups. In the attempt to ensure sufficient going concern expertise, auditors of different ranks were mixed in the groups. No evidence exists on the question of whether participants of higher rank dominated group decisions. A more systematic attempt to isolate the power of more highly ranked individuals would have been necessary to shed light on this question (see Note 11). Another potential limitation stems from the fact that the managers in the group condition were, on average, more experienced (9.19 years) than the managers who worked individually (7.93 years). Although the groups also included auditors with less experience than those who worked individually, an experience effect may have resulted if the more experienced group member dominated the group decisions.

NOTES

1. The professional nature of the work mitigates the fact that these groups often consist of individuals at different levels within the organization. However, the empirical regularities created by this professionalism need further investigation.
2. The expected ability of groups to make better-informed decisions does not take into account situations where individuals first make judgments and then enter groups for the reevaluation of the decision. This may cause groups to move towards more extreme positions, as shown by Marxen (1990).
3. Group confidence might be lowered by cases where individuals strongly disagreed with group positions.
Therefore, the expectation that group confidence will be higher than individual confidence implicitly asserts that these situations will be rare. This study does not measure the degree to which satisfaction is related to confidence.
4. Group composition could be very important to the dynamics of group decision-making. Since this research could not tightly control the composition dimension, interpersonal issues such as charisma and persuasiveness could not be measured. On more objective dimensions such as experience and rank, a suitable mixture of people was achieved. See Table 1 and the Participants section.
5. The researcher did not inquire about the decision processes of the groups after the experiment was completed. Investigating this in a way that did it justice would require another study.
6. This choice on group composition creates an alternative interpretation about the extent of influence lower level employees can have on higher ones. See Graen and Uhl-Bien (1995).
7. The measure J0–J6 was also examined in raw change terms. Since no differences in the substantive results occurred, these were not shown.
8. Other tests were conducted to clarify the interpretation of the results presented in Table 3. An ANCOVA with experience as the covariate (p > 0.05) was considered. A significant order/decision-unit interaction (F = 4.674, p < 0.05) again resulted. This suggests that these findings are not attributable to an experience effect. Another analysis used J6 as the dependent variable, J0 as the covariate, and order and decision-unit as the independent variables. This model captures belief revision in a different way by more explicitly controlling for the initial anchoring point (J0). It also shows results similar to those that are reported above. Specifically, the interaction between order effects and decision-unit was significant (F = 4.35, p < 0.05).
Another covariate that could be important is the threshold for substantial doubt. The point at which the decision-maker is confronted with a reportable going concern issue may present a matter independent from the quantifiable belief revision variable. Using the probability estimate for this general threshold specifically collected from the participants in Task 1 as a covariate, the order effect/decision unit interaction term was again significant (F = 4.89, p < 0.05). The results suggest the acceptance of the first hypothesis. Audit groups making going concern decisions are less prone than individual auditors to recency effects.
9. As shown in Panel B of Table 4, this relationship was also analyzed using t-tests. The results show that the difference between the final confidence of individuals (71.25) and groups (80.23) was significant (t = 2.25, p < 0.03). This result is consistent with the expectation in H2.
10. The bottom portion of Table 6 reports whether the participants’ recommended opinions were consistent with their likelihood ratings (initially and after the final rating, J6) when compared with the threshold judgment they provided at the beginning of Task 1, apart from the consideration of case materials. An auditor’s opinion type decision was considered consistent if the likelihood rating was below the threshold judgment and a modified report was chosen. Alternatively, consistency could also be achieved with the recommendation that the opinion be unqualified if likelihood was above the given threshold. Table 6 reports the results of these comparisons. In total, only 7% (3 of 42) of the group recommendations of audit opinions were inconsistent. About twice that share, 14% (8 of 56), of the individual recommendations were inconsistent. An even more telling process unfolds when initial and final likelihood positions are differentiated. Groups become more consistent with their original threshold over time. Initially, 90% of the groups are consistent.
This increases to 95% consistency after the last piece of information has been processed. Individuals become less consistent. The percent of individuals that are consistent changes from 89 to 82% over the course of the decision-making.
11. Conversations with practitioners about this did not reveal any consistent practice. Some firms had a more hierarchical approach than others, almost to the point of resting this decision on the engagement partner after the other auditors had collected the relevant information and suggested an outcome. Other firms had a more participatory process wherein the decision cascaded from the lower levels to the top.

REFERENCES

Allwood, C. M., & Granhag, P. (1996). Realism in confidence judgments as a function of working in dyads or alone. Organizational Behavior and Human Decision Processes, 64, 277–289. American Institute of Certified Public Accountants (1990). Statement on auditing standards No. 59: The auditor’s consideration of an entity’s ability to continue as a going concern. (AU 341) New York, NY: AICPA. Anderson, C. A., & Sechler, E. (1986). Effects of explanation and counter-explanation on the development and use of social theories. Journal of Personality and Social Psychology, 50, 24–34. Asare, S. K. (1992). The auditor’s going-concern decision: Interaction of task variables and the sequential processing of evidence. The Accounting Review, 67, 379–393. Ashton, A. H., & Ashton, R. (1988). Sequential belief revision in auditing. The Accounting Review, 63, 623–641. Bloomfield, R., Libby, R., & Nelson, M. (1996). Communication of confidence as a determinant of group judgment accuracy. Organizational Behavior and Human Decision Processes, 68, 287–300. Chow, C., McNamee, A., & Plumlee, D. (1987). Practitioners’ perceptions of audit step difficulty and criticalness: Implications for audit research. Auditing: A Journal of Practice & Theory, 6, 123–133. Church, B. (1991).
An examination of the effect that commitment to a hypothesis has on auditors’ evaluations of confirming and disconfirming evidence. Contemporary Accounting Research, 7, 513–534. Church, B., & Schneider, A. (1993). Auditor generation of diagnostic hypotheses in response to a superior’s suggestion: Influence effects. Contemporary Accounting Research, 10, 333–350. Cushing, B., & Ahlawat, S. (1996). Mitigation of recency bias in audit judgment: The effect of documentation. Auditing: A Journal of Practice & Theory, 16, 134–146. Dillard, J. N., Kauffman, N., & Spires, E. (1991). Evidence order and belief revision in management accounting decisions. Accounting, Organizations and Society, 16, 619–633. Fisher, B. A., & Ellis, D. (1990). Small group decision-making: Communication and the group process. New York, NY: McGraw-Hill. Gibbins, M., & Emby, C. (1985). Evidence on the nature of professional judgment in public accounting. In: A. R. Abdel-khalik & I. Solomon (Eds), Auditing Research Symposium (pp. 181–212). Champaign, IL: University of Illinois. Graen, G. B., & Uhl-Bien, M. (1995). Relationship-based approach to leadership: Development of leader-member exchange (LMX) theory of leadership over 25 years: Applying a multi-level multi-domain perspective. Leadership Quarterly, 6, 219–247. Hill, G. W. (1982). Group versus individual performance: Are n + 1 heads better than one? Psychological Bulletin, 91, 517–539. Hogarth, R. M., & Einhorn, H. (1992). Order effects in belief updating: The belief adjustment model. Cognitive Psychology, 24, 1–55. Kennedy, J. (1993). Debiasing audit judgment with accountability: A framework and experimental results. Journal of Accounting Research, 31, 231–245. Luus, C. A. E., & Wells, G. (1994). The malleability of eyewitness confidence: Co-witness and perseverance effects. Journal of Applied Psychology, 79, 714–723. Marxen, D. (1990). A behavioral investigation of time budget preparation in a competitive audit environment.
Accounting Horizons, 4, 47–57. McMillan, J., & White, R. (1993). Auditors’ belief revisions and evidence search: The effect of hypothesis frame, confirmation bias, and professional skepticism. The Accounting Review, 68, 443–465. Messier, W., & Tubbs, R. (1994). Mitigating recency effects in belief revision: The impact of audit experience and the review process. Auditing: A Journal of Practice & Theory, 14, 57–72. Miner, F. (1984). Group versus individual decision-making: An investigation of performance measures, decision strategies and process. Organizational Behavior and Human Performance, 39, 112–124. Myers, D., & Lamm, H. (1976). The group polarization phenomenon. Psychological Bulletin, 82, 602–627. Newman, D. (1980). Prospect theory: Implications for information evaluation. Accounting, Organizations and Society, 5, 217–230. Pei, B. K., Reed, S., & Koch, B. (1992). Auditor belief revisions in a performance auditing setting: An application of the belief-adjustment model. Accounting, Organizations, and Society, 17, 169–183. Reckers, P. M. J., & Schultz, J. (1993). The effect of fraud signals, evidence order, and group-assisted counsel on independent auditor judgment. Behavioral Research in Accounting, 5, 124–144. Schultz, J. J., & Reckers, P. (1981). The impact of group processing on selected audit disclosure decisions. Journal of Accounting Research, 19, 482–501. Sniezek, J. A., & Henry, R. A. (1989). Accuracy and confidence in group judgment. Organizational Behavior and Human Decision Processes, 43, 1–28. Sniezek, J. A., & Henry, R. (1990). Revision, weighting, and commitment in consensus group judgment. Organizational Behavior and Human Decision Processes, 45, 66–84. Solomon, I. (1987). Multi-auditor judgment/decision-making research. Journal of Accounting Literature, 6, 1–25. Stasser, G. (1992). Information salience and the discovery of hidden profiles by decision-making groups?
A “thought experiment”. Organizational Behavior and Human Decision Processes, 52, 156–181. Stasser, G., & Davis, J. (1981). Group decision-making and social influence: A social interaction sequence model. Psychological Review, 88, 523–551. Stasser, G., & Titus, W. (1985). Pooling of unshared information in group decision-making: Biased information sampling during discussion. Journal of Personality and Social Psychology, 48, 1467–1478. Tetlock, P. (1983). Accountability and the perseverance of first impressions. Social Psychology Quarterly, 46, 285–292. Trotman, K., & Wright, A. (1996). Recency effects: Task complexity, decision-mode, and task-specific experience. Behavioral Research in Accounting, 8, 175–193. Tubbs, R., Messier, W., Jr., & Knechel, W. (1990). Recency effects in the auditor’s belief-revision process. The Accounting Review, 65, 452–460. Valacich, J. S., & Schwenk, C. (1995). Devil’s advocacy and dialectical inquiry effects on face-to-face and computer-mediated group decision-making. Organizational Behavior and Human Decision Processes, 63, 158–173. Vance, R., & Biddle, T. (1985). Task experience and social cues: Interactive effects on attitudinal reaction. Organizational Behavior and Human Performance, 35, 252–265. Weiss, H., & Shaw, J. (1979). Social influences in judgments about task. Organizational Behavior and Human Performance, 24, 126–140. White, S., Mitchell, T., & Bell, C. (1977). Goal setting, evaluation apprehension and social cues as determinants of job performance and job satisfaction in a simulated organization. Journal of Applied Psychology, 52, 665–673. Winquist, J., & Larson, J. (1998). Information pooling: When it impacts group decision-making. Journal of Personality and Social Psychology, 74, 371–378. Wright, E., Luus, C., & Christie, S. (1990). Does group discussion facilitate the use of consensus information in making causal attribution? Journal of Personality and Social Psychology, 59, 261–269. Zarnoth, P., & Sniezek, J. (1997). 
The social influence of confidence in group decision-making. Journal of Experimental Social Psychology, 33, 345–367.

INVESTIGATING ERROR PROJECTION AMONG STATE AUDITORS: THE IMPACT OF INTENTIONAL AND SYSTEMATIC MISSTATEMENTS
John T. Reisch, Karen S. McKenzie and Alan H. Friedberg
ABSTRACT
This paper investigates state auditors’ decisions regarding the isolation or projection of sample misstatements to underlying sample populations. Seventy-eight state auditors completed four treatment cases that incorporate the complete 2 × 2 manipulation of intentional/unintentional and systematic/non-systematic misstatements in different case scenarios, enabling a test of the independent variables both across and within case scenarios. The results indicate that both across and within case scenarios, auditors tend to project systematic misstatements more often than they project non-systematic misstatements. However, the auditors’ isolation/projection decisions are generally not influenced by whether the sample misstatements are intentional or unintentional.
Advances in Accounting Behavioral Research, Volume 6, 53–77. Copyright © 2003 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06003-4
INTRODUCTION
In 2000, state and local governments in the U.S. generated over $1.2 trillion in revenues; they also spent over $1.1 trillion, accounting for over 9% of the U.S. gross domestic product (28% of gross domestic product when the federal government is included) (OMB, 2001). The magnitude of this economic activity accentuates the need for proper oversight of the sources and uses of the funds, including the audits of state and local governments.
Despite the extent to which state and local government activity impacts the economy, relatively little behavioral auditing research has been conducted on the effectiveness and efficiency of the auditors employed by these entities. This study addresses the issue of auditor effectiveness by empirically testing the professional judgments of state auditors in a context-rich environment; specifically, it examines the subjective assessment of sample evidence by state auditors. Sampling is one area where the evaluation of evidence may be largely affected by subjective differences in auditors’ judgments. The auditing standards explicitly state that in a variables sampling context, the “auditor should project the misstatement results of the sample to the items from which the sample was selected” (AICPA, 2001, AU§350.26).1 However, in addition to the quantitative task of projecting sample misstatements, the standards also note that auditors should consider the qualitative aspects of the misstatements (AICPA, 2001, AU§350.27), and that the “actions that might be taken in light of the nature and cause of particular misstatements” is left to the discretion of the auditor (AICPA, 2001, AU§350.06). Thus, some discord exists as to whether misstatements should always be projected; and if not, under what conditions they should be isolated. If an auditor inappropriately isolates misstatements found in a sample, the likelihood of a non-representative, or biased, estimate of the account balance being tested increases. More specifically, failure to project sample misstatements generally results in an underestimation of the aggregate misstatement in the underlying population, thereby increasing the auditor’s risk of incorrect acceptance. In the case of state auditors, this implies a failure to satisfy an essential element of public control and accountability. 
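To make the underestimation concrete, the following minimal sketch contrasts isolating a sample misstatement with projecting it by ratio estimation. Ratio projection is only one common projection approach and is an assumption here, as are the function names and every dollar amount; none of them come from the study or from the cited standards.

```python
def isolated_misstatement(sample_misstatement):
    """Isolation: treat the misstatement as unique, so only the amount
    actually found in the sample is carried forward."""
    return sample_misstatement


def ratio_projection(sample_misstatement, sample_book_value, population_book_value):
    """Ratio projection (assumed method): apply the sample's misstatement
    rate to the book value of the whole population."""
    if sample_book_value <= 0:
        raise ValueError("sample_book_value must be positive")
    rate = sample_misstatement / sample_book_value
    return rate * population_book_value


# Hypothetical figures for illustration only.
found = 1_200              # misstatement found in the sample
sample_bv = 60_000         # book value of the sampled items
population_bv = 3_000_000  # book value of the account being tested

isolated = isolated_misstatement(found)
projected = ratio_projection(found, sample_bv, population_bv)
print(f"isolated estimate:  {isolated:,.0f}")
print(f"projected estimate: {projected:,.0f}")
```

Under these assumed figures the projected estimate is fifty times the isolated one, which is the sense in which isolation tends to understate the aggregate misstatement in the population.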
The extent to which state auditors do not project sample misstatements of account balances and the potential consequences of inappropriately isolating misstatements are important research topics. State auditors often conduct financial statement audits, the results of which are used in a variety of ways, including the allocation of resources among programs and personnel, monitoring compliance with fiscal laws, and even bond ratings.
This study focuses on non-sampling risk,2 and extends existing literature in three ways. The first contribution is the finding that systematic misstatements in sampling data significantly affect state auditors’ decisions to project misstatements to the account population, while the impact of intentional misstatements is generally not significant. Although the impact of intentional misstatements on auditors’ projection decisions has been indirectly investigated in prior studies (Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson, 1997), the effect of systematic misstatements has not been previously examined.
A second contribution of this study is the methodology used. Most other studies (Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson, 1997) have tested for factors impacting auditors’ isolation/projection decisions using incomplete designs that do not isolate the effects of the variables in all combinations of the treatment variables.3 In this study, participants completed four separate cases representing the 2 × 2 manipulation of the two independent variables, intentional or unintentional misstatements and systematic or non-systematic misstatements. In addition, the study’s design centers on an important aspect of sampling research: the extent to which projection decisions are context specific; that is, how different case scenarios (e.g. inventory versus receivables) and types of misstatements can influence auditors’ decisions.
The design utilized in this study allows for analysis of the data across treatments (the complete 2 × 2 manipulation) as well as within individual case scenarios because each participant completed four separate cases, each representing an individual cell in the 2 × 2 design.
The third contribution of this study is the use of state auditors rather than external auditors to test whether auditors project or isolate misstatements found in samples. None of the prior studies investigating auditors’ evaluation of sample findings use governmental auditors in their empirical tests; thus, a secondary objective of the paper is to specifically investigate state auditors’ sample evaluation decisions. As Green (1992, p. 62) notes, “Applying generic psychological and/or informational processing theories fails to recognize that there are unique characteristics in governmental and non-profit settings that could have different (and perhaps contradictory) influences on behavior in those settings.” One cannot simply project research findings from the for-profit environment onto state government auditors. Thus, by using state auditors as study participants, this study extends the literature investigating auditors’ isolation/projection decisions to this important setting.
A review of the existing literature investigating behavioral implications of audit sampling is presented immediately below, leading to the research hypotheses and experimental methodology. The results of the study are presented and analyzed in the penultimate section, followed by concluding comments, limitations of the study, and future research possibilities.
BACKGROUND AND HYPOTHESES DEVELOPMENT
Several empirical studies (Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson, 1997; Wheeler et al., 1997) have investigated the behavioral implications of auditors analyzing sample results.
These studies focus on the potential biases auditors may have when sampling evidence. In particular, the studies find that auditors sometimes inappropriately isolate sample misstatements from the population being tested; that is, auditors may not project misstatements to the population. The manner in which auditors respond to sample data varies so much that it was “deemed a major problem” over a decade ago (Akresh & Tatum, 1988), and anecdotal evidence suggests little has changed since. A recent case filed by the SEC against two former Coopers & Lybrand auditors accused of negligence during an audit of California Micro Devices poignantly demonstrates the problem of evaluating sample evidence (MacDonald, 2000). According to the SEC, the auditors ignored serious issues raised from one-third of the returned accounts receivable confirmations. The SEC asserts that although the audit firm did require adjustments for those confirmations, the auditors did not investigate further to see whether other revenue problems existed. Apparently, the auditors did not project the results of the sampled confirmations to the overall receivable balance. To examine the potential problems auditors may have when evaluating sample results, the study investigates how two factors affect auditors’ isolation/projection decisions: whether the sample misstatements are intentional or unintentional, and whether the misstatements are systematic or non-systematic. The primary objective in testing these factors is to provide new insight into the biases that influence auditors’ decisions when deciding whether to isolate a sample misstatement or project it to the underlying population from which it was drawn. Testing the two factors concurrently fulfills a call by Dusenbury et al. (1994) for finer partitioning of misstatements in researching auditors’ projection decisions. 
The Uniqueness of Misstatements
Auditors may isolate rather than project sample misstatements when they assume that the misstatements do not exist elsewhere in the population. One explanation for the lack of projection is that the auditors view the misstatements as being unique or unusual and, therefore, not truly representative of the underlying population being tested. Empirical evidence indicates that the uniqueness perception of misstatements is highly significant in determining whether auditors isolate misstatements found in sample data (Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson, 1997; Wheeler et al., 1997).
Burgstahler and Jiambalvo (1986) suggest that the positive correlation between the projection of sample misstatements and the auditors’ perception of similarity of the misstatement to other errors might be explained by the representativeness heuristic (Kahneman & Tversky, 1972). According to Kahneman and Tversky, individuals make inferences about a situation by comparing it to attributes of similar situations the individual has encountered previously. When applied to a sampling context involving a misstatement, the theory predicts that an auditor will compare the characteristics of the sample error to characteristics of some prototypical error(s) the auditor has categorized in memory. Thus, errors lacking the prototypical characteristics of other, more common errors are more likely to be determined unique by the auditor. Taking this one step further, when a sample error is perceived as unique, the auditor may think that similar errors are unlikely to exist in the underlying population and, therefore, will have a tendency to isolate the error. Conversely, errors that occur more commonly will have a greater likelihood of being projected since the auditor will recognize characteristics of the misstatement when compared to error attributes held in memory.
The auditor will then determine that the sample error is not unique and could likely occur again in the population.
The general notion of prototype matching explained by the representativeness heuristic is partially supported by Dusenbury et al. (1994) in their investigation of error containment. Containment is a process that involves a restratification strategy whereby certain qualitative characteristics associated with the initial sample misstatement are identified. All transactions in the population that meet the qualitative criteria are then segregated, ex post, into a narrow stratum. If no other misstatements are found after examining the entire stratum, the initial misstatement does not need to be projected to the rest of the population since the stratum has been thoroughly tested. Dusenbury et al. (1994) found that in the absence of containment information, less frequent errors were isolated more often than frequently occurring errors. However, when containment information was provided to the participating auditors, more frequent errors were isolated more often than less frequent errors, all of which involved irregularities.
Intentional Nature of Misstatements
A number of accounting studies report the incidence of misstatements, in general, to be low (Ashton, 1991; Libby, 1985; Libby & Frederick, 1990). Intentional misstatements (i.e. fraudulent misstatements) occur even less often. The representativeness heuristic might suggest that the low frequency of intentional misstatements in audit populations would prompt auditors to isolate an irregularity rather than project it to the underlying population being tested. However, auditing standards suggest that irregularities or intentional misstatements warrant additional consideration when uncovered.
In fact, the standards specifically state: Generally, an isolated, immaterial error in processing accounting data or applying accounting principles is not significant to the audit. In contrast, when fraud is detected, the auditor should consider the implications for the integrity of management or employees and the possible effect on other aspects of the audit (AICPA, 2001, AU§312.08). Thus, if an auditor uncovers an intentional misstatement in a sample, s/he may be more inclined to project the sample than if the discovered misstatement was unintentional. This premise is supported, in part, by Dusenbury et al. (1994), who found that in the presence of containment information, intentional and less frequently occurring errors were more likely to be projected than unintentional, more frequent misstatements. Dusenbury et al. (1994, p. 262) suggest that “The discovery of an irregularity, while normally rare and thus dissimilar to other errors, should induce additional caution (AU§350.27) and might result in higher projection rates.” Several studies (Anderson & Maletta, 1994; Ashton & Ashton, 1988; Kida, 1984; Trotman & Sng, 1989) have found that auditors place more importance on negative evidence (evidence that counters client assertions) than on positive evidence. Although auditors will likely view any sample misstatement as negative evidence, an irregularity might be considered more negative evidence than a similar, unintentional misstatement. If this assertion holds, it suggests that auditors would tend to be more conservative in their handling of an irregularity than they would an unintentional error. Thus, auditors may have more of an inclination to project an intentional sample misstatement to the population than they would an unintentional error. 
Intentional Versus Unintentional Misstatements Hypothesis
Research (Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson, 1997; Wheeler et al., 1997) indicates that external auditors generally have a tendency to isolate less frequently occurring errors. Given that intentional misstatements occur very infrequently, these research findings suggest that irregularities discovered in samples are likely to be isolated by auditors because the misstatements are perceived as being unique. Auditing standards and studies on negative evidence, however, suggest that intentional misstatements warrant more consideration than unintentional misstatements. In a sampling context, this additional consideration may involve the projection of a sample misstatement, a more conservative approach than isolation; this is consistent with the findings of Dusenbury et al. (1994) who found that intentional misstatements, in the presence of containment information, were more likely to be projected than unintentional misstatements. Increased attention to intentional misstatements may be particularly warranted in the case of governmental auditors since generally accepted governmental auditing standards state that the threshold for audit risk may be lower in governmental audits than in audits of commercial entities (GAO, 1994, §4.9), and because various legal and regulatory requirements faced by governmental auditors may require reporting on any intentional misstatement, regardless of its materiality. Thus, the following research hypothesis is proposed:
H1. The propensity of state auditors to project intentional sample misstatements to the underlying population being tested will be greater than their propensity to project unintentional misstatements.
Systematic Versus Non-systematic Misstatements Hypothesis
Misstatements, whether intentional or not, may occur systematically or non-systematically.
Systematic misstatements can be defined as those that are likely to be repeated because of some characteristic(s) associated with a transaction or class of transactions. Systematic misstatements may occur frequently or infrequently depending on the persistence of the underlying cause. However, the presence of a systematic error in an audit sample, by definition, implies that other errors may be present in the underlying population due to the same causal condition. Thus, normatively, systematic misstatements discovered in a sample should be projected to the entire population. The following research hypothesis related to auditors’ behavior in the evaluation of sample findings is proposed:
H2. The propensity of state auditors to project systematic sample misstatements to the underlying population being tested will be greater than their propensity to project non-systematic misstatements.
EXPERIMENTAL METHODOLOGY
Experimental Task
To test the hypotheses, a series of sampling cases (see Appendix) that incorporated the experimental manipulations was developed. These cases enabled analysis both across scenarios and within each scenario. Burgstahler and Jiambalvo’s (1986) cases served as a basis for comparability with other studies; however, precise replication of cases used in these other studies was not feasible given the manipulations employed. In addition, although comparisons can be made between the results of this study and other studies on sample projections, the pressures and incentives encountered by governmental auditors are believed to vary from those encountered by external auditors. Until research addresses environmental factors, perceptions of differences are all that distinguish the governmental auditor from the non-governmental auditor.
The experimental instrument included four treatment cases, each representing a different account balance (e.g.
sales and accounts receivable) and sampling situation; thus, we are able to capture the four treatments of the 2 × 2 design in which we manipulate two independent variables: (1) the type of misstatement as either intentional or unintentional (INT); and (2) the nature of the misstatement in terms of potential recurrence, operationalized as being either systematic or non-systematic (SYS).4 Each of the four cases should be projected according to the guidance provided by SAS No. 39, “Audit Sampling.” When noted in the cases, the employees responsible for the misstatements were intentionally kept at lower levels (e.g. clerical employees or warehouse workers) to reduce the saliency of the individuals involved, especially for manipulations containing intentional misstatements. The dollar amounts of the misstatements were made immaterial since most misstatements discovered by auditors are not individually material (Elder & Allen, 1998). In addition, keeping the materiality of the misstatements constant (i.e. immaterial) enhanced control over the manipulations being tested by minimizing potential confounding effects from the materiality of the misstatements. The cases used were pretested at a chapter meeting of the Institute of Internal Auditors. Most of the internal auditors in this chapter were governmental auditors, and no feedback was received that indicated a problem understanding any of the case scenarios. In addition, the results of the pretest suggested that the experimental manipulations worked as intended. This study is based on a repeated measure design in which each subject received every possible combination of the 2 × 2 manipulation of INT and SYS. Each combination was incorporated randomly into one of the four different treatment scenarios. Each case scenario in the experimental instrument was included on an individual page and participants were requested not to return to a scenario after it was completed. 
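The assignment procedure for a single participant can be sketched as follows. This is an illustrative reconstruction, not the authors' actual procedure: the function name, the seed, and the use of Python's `random` module are assumptions. The scenario labels follow the four case settings named in the study, and the order of the pages is also shuffled, which the study reports doing to limit order effects.

```python
import random
from itertools import product

# Scenario labels follow the four case settings named in the study.
SCENARIOS = ["sales", "inventory", "receivables", "unknown receipts"]

# The four cells of the 2 x 2 design as (INT, SYS) pairs;
# 1 = intentional / systematic, 0 = unintentional / non-systematic.
CELLS = list(product((0, 1), repeat=2))


def build_instrument(rng):
    """Return one participant's pages as (scenario, INT, SYS) tuples.

    Hypothetical helper: each scenario is paired with a different cell
    of the design, so the participant sees every cell exactly once.
    """
    cells = CELLS[:]
    rng.shuffle(cells)  # random pairing of cells with scenarios
    pages = [(scenario, i, s) for scenario, (i, s) in zip(SCENARIOS, cells)]
    rng.shuffle(pages)  # random presentation order
    return pages


instrument = build_instrument(random.Random(2003))  # arbitrary seed
for scenario, intentional, systematic in instrument:
    print(f"{scenario}: INT={intentional}, SYS={systematic}")
```

Because every participant sees each cell once and each scenario once, the data can be analyzed either by cell (ignoring scenario) or by scenario (comparing the cells that different participants received for it).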
The presentation of the case scenarios was randomized to minimize potential order effects. After reading each scenario, participants made a decision as to whether they would project the sample to the account population being tested or isolate the sample result from the population. Subjects were then asked to complete a ten-point Likert-type scale, which measured the comfort level of their decision. Subjects were instructed to consider each scenario independently and assume that the samples were selected at random from the populations being tested.
[Fig. 1. Illustration of Data Analysis Across Cases (Direct Comparison of the Four Experimental Manipulations Without Taking the Individual Case Scenarios into Consideration). The Two Manipulated Variables Were: (1) Intentional or Unintentional Misstatement; and (2) Systematic or Non-systematic Misstatement. The Four Case Scenarios Involved Misstatements in Sales, Inventory, Receivables, and Unknown Receipts.]
The differences in the auditors’ decisions are analyzed both across all cases and by individual case, as illustrated in Fig. 1. The analysis across the four treatment cases tests the experimental conditions of the 2 × 2 manipulation of INT and SYS, with each subject receiving one each of the four conditions in the 2 × 2 design. In the analysis by individual case, all like cases (e.g. all sales scenarios, Case 1 in Appendix) received by the participating auditors are tested in the aggregate.
A major difference between the experimental design used in this study and the research designs of most other studies investigating the isolation/projection decisions of auditors (e.g. Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson, 1997) is that in this study, the effects of the two independent variables are isolated in the four combinations of the 2 × 2 design.
The other studies tested factors that affect auditors’ decisions by observing differences across case scenarios. For example, auditors’ decisions regarding an infrequent sample misstatement in a sales scenario were directly compared to decisions about a more frequently occurring error in a case scenario involving inventory. Thus, while appropriate comparisons can be made between the cases, many extraneous variables that are not measured could also be affecting the results (e.g. different levels of audit risk may be associated with different types of accounts). The complete, multi-cell design of this study controls for potential confounding effects associated with the direct comparisons of different case scenarios because each case scenario in this study has four treatment combinations. In addition, this study provides a robust test of the effects of intentional and systematic misstatements on auditors’ decisions since the effects are also tested in four different case settings.
Subjects
A total of 100 experimental instruments were distributed to governmental auditors from ten different state audit departments, representing all regions of the United States. Seventy-eight were completed and returned. Department managers of the participating states administered the distribution of instruments to members of their audit departments, and collected and returned the completed instruments to the authors. Each participant was assured complete anonymity.
As Table 1 indicates, 65 of the 78 participants (83%) had some professional certification (e.g. CGFM, CPA), with the CPA designation being the most common (62% of the participants). Further analysis shows that only seven participants without some professional certification are from states hiring entry level audit staff from a variety of educational backgrounds (i.e.
the state auditors are not required to have an accounting education or minimum number of accounting hours). Thus, a maximum of 7 subjects may have been responding to the instrument as newly hired audit staff without accounting knowledge (8.9%).5 The appropriateness of the participating auditors for the experimental task was further indicated by their response to a question asking the auditors how frequently they use sampling procedures on audit engagements, with 80% being the mean response. In addition, only four participants indicated that they did not use sampling or they did not respond to this particular inquiry. Analyses of the data excluding these four auditors indicate no significant differences from the results reported below.

Table 1. Subject Profiles.

Panel A: Frequencies

Region of U.S.(a)             Number of Subjects
Central                       15
North Central                 11
Northeast                      7
Northwest                     15
South                         15
Southeast                     20
Total                         78

Professional Certification    Number of Subjects(b)
CPA                           48
CGFM                          21
CIA                            3
Other                          7
None                          13

Panel B: Means

Item                                             Mean    Standard Deviation
Frequency of sampling (% of audit engagements)   79.9    25.8
When sampling is used:
  (1) Frequency of statistical sampling (%)      26.3    30.7
  (2) Frequency of judgmental sampling (%)       73.3    30.7

(a) States categorized according to Webster’s College Dictionary (1991).
(b) Does not total 78 (the number of subjects) since some subjects hold more than one certification.

At first glance, the case context might seem inappropriate for the governmental environment investigated (e.g. “it is company policy to . . .” and “a review of the company’s internal audit workpapers reveal . . .”); however, the context is appropriate because auditors of state or local government (SLG) financial statements must be knowledgeable of generally accepted accounting principles for governments, which require two different accounting models.
“General government” type activities of SLGs employ a current financial resources measurement focus on a modified accrual basis, whereas “proprietary” type activities of SLGs employ an economic resources measurement focus on a full accrual basis. The latter activity type is very similar to private sector accounting, although within the realm of government controls. All state governments employ general government activities in the provision of a general basket of services to all constituents. Most states also employ proprietary activities, linking user fees (charges for services) to costs of services much like a private business enterprise. Some states rely on Public Benefit Corporations (PBC) to provide services the state would have accounted for as proprietary activity, but give audit responsibility over the PBC to state auditors. Each of the state audit offices selected for participation had audit responsibilities over such proprietary activities, whether the activities were part of the state proper or activities of a PBC. In addition, those activities included the focal issues of this study’s experimental cases (e.g. sales, receivables, and inventory). The study emphasized the proprietary activities for comparability to prior studies, which have all targeted private sector oriented subjects.
RESULTS AND ANALYSIS
Manipulation Checks
Prior to testing the hypotheses, manipulation checks were performed on INT and SYS to determine whether subjects interpreted the independent variables as intended. In the post-experimental questionnaire, subjects were asked to return to each case and indicate whether the case misstatement was: (1) intentional or unintentional; and (2) systematic or non-systematic. Subjects were asked to do this without changing their initial case responses, and there was no evidence of subjects changing their responses (e.g. erasures or overwritings).
Across the four treatment cases, subjects were able to correctly identify the misstatement intention 94.2% of the time (p < 0.0001 for all four treatment cases), and subjects agreed with the systematic manipulations 62.6% of the time (p < 0.10 for Cases 2–4; p = 1.00 for Case 1 (sales)). Subjects did not interpret the manipulation of systematic misstatements in Case 1 as anticipated; only 49.3% of the subjects agreed with the intended manipulation of the sales scenario. The results of the systematic manipulation check do not render the experiment invalid; in fact, they enhance the findings. In testing the hypotheses, the analyses were run two ways: (1) using the independent variables INT and SYS in the models; and (2) replacing INT and SYS with the subjects’ assessments of intentional and systematic misstatements in the cases. This enabled interpretation of the effects of the variables on the isolation/projection decision from two angles. First, an analysis was performed on the auditors’ decisions based on our manipulations of the two independent variables. Second, an analysis was performed on the subjects’ decisions according to how they interpreted whether the misstatements were intentional and systematic.⁶

Analysis Across Treatment Cases

The data are first analyzed across cases with all of the cases collapsed into a single group; that is, the individual case scenarios are ignored, yielding a repeated measures design in which each subject receives all four treatment combinations of the 2 × 2 manipulation. Since each scenario is expected to elicit the same response, multiple scenarios are used so the results are not too dependent on any one particular scenario. To the extent that one or more of the scenarios would produce a different response in the dependent variable (the decision of isolating or projecting the sample misstatement), the analysis would be biased against finding a significant result; thus, collapsing the scenarios into a single group is a conservative approach to the data analysis.

Table 2. Logistic Regression Results of Isolation/Projection Decisionsᵃ (Across Treatment Cases).

Independent Variable   Parameter Estimate   Odds Ratio   Wald χ²   p-Value

Panel A: Using manipulated variables
Intercept              −0.6771                           10.71     0.0011
INT                     0.2995              1.26          1.59     0.2077
SYS                     1.1864              3.28         24.90     0.0001
Model χ² = 27.47; p-Value = 0.0001; c = 0.660

Panel B: Using subjects’ assessments of manipulated variables
Intercept              −1.1578                           24.30     0.0001
INTCK                   0.3187              1.38          1.51     0.2186
SYSCK                   1.9360              6.93         55.77     0.0001
Model χ² = 64.28; p-Value = 0.0001; c = 0.741

ᵃ The dichotomous dependent variable is the auditors’ decisions with regard to sample misstatement findings, defined as 0 = Isolate or 1 = Project. The independent variable INT is manipulated as an intentional or unintentional misstatement, and SYS is manipulated as a systematic or non-systematic misstatement. INTCK and SYSCK refer to the participants’ assessments of intentional and systematic misstatements.

Table 2 presents the results of the logistic regression with all of the cases collapsed into one group. Panel A shows the results using the manipulated variables INT and SYS. In Panel B, the manipulated explanatory variables INT and SYS have been replaced in the logistic regression model with INTCK and SYSCK, the subjects’ assessments of whether the misstatements were intentional or systematic, respectively. In both across treatment models, the chi-square statistics are significant at the 0.01 level, suggesting the models are good predictors of the auditors’ propensity to isolate or project sample misstatements.
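As a rough illustration of how the Panel A estimates in Table 2 translate into projection probabilities, the sketch below applies the inverse logit to each cell of the 2 × 2 design. It assumes INT and SYS are coded 1 for the intentional and systematic conditions and 0 otherwise, which the table does not state explicitly; the function name is hypothetical, and the output is a model-implied approximation, not a result reported in the study.

```python
import math

# Panel A parameter estimates from Table 2 (across treatment cases).
INTERCEPT = -0.6771
B_INT = 0.2995
B_SYS = 1.1864

def prob_project(intentional: int, systematic: int) -> float:
    """Model-implied probability of projecting (decision coded 1).

    Assumes 0/1 coding with 1 = intentional / systematic (an assumption,
    not stated in the table).
    """
    log_odds = INTERCEPT + B_INT * intentional + B_SYS * systematic
    return 1.0 / (1.0 + math.exp(-log_odds))

# Print the implied probability for each cell of the 2 x 2 design.
for intentional in (0, 1):
    for systematic in (0, 1):
        p = prob_project(intentional, systematic)
        print(f"INT={intentional} SYS={systematic}: P(project) = {p:.3f}")
```

Under these assumptions the implied probability of projection is about 0.34 for a non-systematic, unintentional misstatement and about 0.69 for a systematic, intentional one, with most of the difference attributable to SYS.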
In addition, goodness of fit for the logistic regression models was obtained by the c statistic, which is somewhat analogous to the coefficient of multiple correlation (Kane et al., 1996).⁷ In both models, the c statistic is greater than 0.65. The analysis across treatments does not support H1 regardless of whether INT or INTCK is included in the regression models, indicating that no difference exists in the auditors’ isolation/projection decisions whether the sample misstatements are intentional or not. The independent variable SYS, which is used to operationalize the manipulation of systematic misstatements, is highly significant as indicated in Table 2. As a result, H2 is supported; that is, auditors have more of a propensity to project systematic sample misstatements to the underlying population being tested than they do non-systematic misstatements. This is also supported by the odds ratio presented in the table. Since logistic regression provides the log of odds,⁸ the model also provides the odds of projecting the sample misstatements versus isolating the misstatements (Stokes et al., 1995). Across all treatment cases, participants were 3.28 times more likely to project systematic misstatements than non-systematic misstatements. The findings discussed above for the analyses across treatments strongly support H2, suggesting that state auditors have a greater propensity to project systematic misstatements than non-systematic misstatements. However, the rate at which systematic misstatements are isolated appears to be symptomatic of a lack of understanding of sampling, including the potential impact of not projecting a misstatement that may occur again in similar circumstances. As indicated in Panel A of Table 3, the participating auditors chose to isolate systematic misstatements 35% of the time.
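The isolation rate and the direction of the SYS effect can be checked directly against the Panel A counts in Table 3. The sketch below is an unadjusted calculation from the 2 × 2 frequencies; it ignores INT and the repeated measures structure, so its odds ratio is expected to be close to, but not equal to, the regression-based 3.28. Variable and function names are illustrative.

```python
# Projection/isolation counts by SYS manipulation (Table 3, Panel A).
projected = {"systematic": 102, "non_systematic": 58}
isolated = {"systematic": 54, "non_systematic": 98}

def isolation_rate(group: str) -> float:
    """Share of decisions in a group where the misstatement was isolated."""
    return isolated[group] / (projected[group] + isolated[group])

def odds_of_projecting(group: str) -> float:
    """Odds of projecting versus isolating within a group."""
    return projected[group] / isolated[group]

# Unadjusted (crude) odds ratio: systematic versus non-systematic.
crude_odds_ratio = (
    odds_of_projecting("systematic") / odds_of_projecting("non_systematic")
)

print(f"Isolation rate, systematic: {isolation_rate('systematic'):.1%}")
print(f"Crude odds ratio: {crude_odds_ratio:.2f}")
```

The crude ratio comes out slightly below the adjusted estimate, which is consistent with the regression also controlling for INT.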
The isolation rate for systematic misstatements decreases somewhat based on the subjects’ assessments of whether the misstatements were systematic. As shown in Panel B of Table 3, the participating auditors isolated misstatements they deemed to be systematic 28% of the time. While this is a decrease from the isolation rate in Panel A, it nonetheless indicates that auditors frequently isolate sample misstatements even if they believe the misstatements are apt to recur.⁹ The auditors’ decision frequencies in Table 3, Panels A and B, also show that across the treatment cases, the systematic/intentional manipulations had the largest number of projection decisions, while the non-systematic/unintentional manipulation generally had the least. In addition to testing for the effects of the independent variables in the logistic regressions across treatments, tests were performed for interactions in every model used in this study. Overall, interactive effects were not found to be significant. Tests were also conducted to determine whether the presentation order of the case scenarios, professional certification of the auditors, and task-related knowledge (measured by the frequency with which sampling is performed on engagements) had an impact on the auditors’ projection decisions. No significant findings were noted for any of these factors on any of the tests performed in this study.

Table 3. Projection/Isolation Decision Frequencies.

                                        Projected     Isolated

Panel A: Using manipulated variables
INT manipulation
  Intentional                           85 (54%)      71 (46%)
  Unintentional                         75 (48%)      81 (52%)
  Total                                 160 (51%)     152 (49%)
SYS manipulation
  Systematic                            102 (65%)     54 (35%)
  Non-systematic                        58 (37%)      98 (63%)
  Total                                 160 (51%)     152 (49%)
INT and SYS manipulations
  Systematic, Intentional               51 (65%)      27 (35%)
  Systematic, Unintentional             51 (65%)      27 (35%)
  Non-systematic, Intentional           34 (44%)      44 (56%)
  Non-systematic, Unintentional         24 (31%)      54 (69%)
  Total                                 160 (51%)     152 (49%)

Panel B: Using subjects’ assessments of manipulated variables
INTCK manipulation
  Intentional                           88 (55%)      73 (45%)
  Unintentional                         68 (46%)      79 (54%)
  Total                                 156 (51%)     152 (49%)
SYSCK manipulation
  Systematic                            113 (72%)     44 (28%)
  Non-systematic                        39 (27%)      106 (73%)
  Total                                 152 (50%)     150 (50%)
INTCK and SYSCK manipulations
  Systematic, Intentional               65 (77%)      19 (23%)
  Systematic, Unintentional             48 (66%)      25 (34%)
  Non-systematic, Intentional           20 (27%)      54 (73%)
  Non-systematic, Unintentional         19 (27%)      52 (73%)
  Total                                 152 (50%)     150 (50%)

Number (%) of projection and isolation decisions across treatment cases.

Analyses of Individual Cases

Table 4 presents logistic regression results for each case treatment rather than for a single aggregate group across the treatment cases. Each case and each treatment was completely randomized in the experiment to minimize any potential order effects.

Table 4. Logistic Regression Results of Isolation/Projection Decisionsᵃ (by Individual Case).

Independent Variable   Parameter Estimate   Odds Ratio   Wald χ²   p-Value

Panel A: Case 1 (sales)
Using manipulated variables
Intercept              −1.0627                            4.79     0.0287
INT                    −1.3655              0.26          5.82     0.0158
SYS                     2.1975              9.00         13.41     0.0003
Model χ² = 20.72; p-Value = 0.0001; c = 0.778
Using subjects’ assessments of manipulated variables
Intercept              −0.6399                            2.10     0.1475
INTCK                  −0.7632              0.47          2.24     0.1345
SYSCK                   1.2085              3.35          5.63     0.0177
Model χ² = 9.58; p-Value = 0.0083; c = 0.696

Panel B: Case 2 (inventory)
Using manipulated variables
Intercept              −2.6914                           11.95     0.0005
INT                     2.5369              12.64        10.21     0.0014
SYS                     3.0749              21.65        14.81     0.0001
Model χ² = 33.87; p-Value = 0.0001; c = 0.827
Using subjects’ assessments of manipulated variables
Intercept              −3.1318                           15.25     0.0001
INTCK                   2.1819              8.86          8.41     0.0037
SYSCK                   3.2703              26.32        20.99     0.0001
Model χ² = 40.42; p-Value = 0.0001; c = 0.872

Panel C: Case 3 (receivables)
Using manipulated variables
Intercept              −0.1947                            0.24     0.6207
INT                     0.4356              1.55          0.78     0.3766
SYS                     0.7499              2.12          2.43     0.1164
Model χ² = 3.01; p-Value = 0.2225; c = 0.610
Using subjects’ assessments of manipulated variables
Intercept              −0.6816                            2.65     0.1475
INTCK                   0.1420              1.15          0.07     0.1345
SYSCK                   1.6502              5.21          9.75     0.0177
Model χ² = 11.61; p-Value = 0.0030; c = 0.700

Panel D: Case 4 (unknown receipts)
Using manipulated variables
Intercept              −0.2045                            0.29     0.5913
INT                     0.4865              1.63          1.09     0.2974
SYS                     0.4549              1.58          0.95     0.3291
Model χ² = 2.33; p-Value = 0.3113; c = 0.596
Using subjects’ assessments of manipulated variables
Intercept              −1.0347                            5.25     0.0220
INTCK                   0.5288              1.70          1.03     0.3107
SYSCK                   1.8864              6.60         13.13     0.0003
Model χ² = 15.80; p-Value = 0.0004; c = 0.747

ᵃ The dichotomous dependent variable is the auditors’ decisions with regard to sample misstatement findings, defined as 0 = Isolate or 1 = Project. The independent variable INT is manipulated as an intentional or unintentional misstatement, and SYS is manipulated as a systematic or non-systematic misstatement. INTCK and SYSCK refer to the participants’ assessments of intentional and systematic misstatements.
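The p-values in Table 4 can be approximately recovered from the reported chi-square statistics. The sketch below assumes each Wald χ² has one degree of freedom and each model χ² has two (one per explanatory variable), which the table does not state; under those assumptions the chi-square tail probabilities have closed forms, so only the standard library is needed. Function names are hypothetical.

```python
import math

def wald_p_value(wald_chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 df (assumed)."""
    return math.erfc(math.sqrt(wald_chi2 / 2.0))

def model_p_value(model_chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with 2 df (assumed)."""
    return math.exp(-model_chi2 / 2.0)

# Case 1 (sales), manipulated variables: Wald statistics from Table 4, Panel A.
for name, wald in [("Intercept", 4.79), ("INT", 5.82), ("SYS", 13.41)]:
    print(f"{name}: Wald = {wald:.2f}, p = {wald_p_value(wald):.4f}")

# Case 1 (sales), subjects' assessments: model chi-square of 9.58.
print(f"Model: chi-square = 9.58, p = {model_p_value(9.58):.4f}")
```

Small differences from the tabled p-values are to be expected because the chi-square statistics are reported to only two decimal places.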
The logistic regression results for the sales scenario (Case 1) are located in Panel A of Table 4. The manipulation INT is significant (p < 0.05), but unlike the other three treatment cases, the auditors had slightly more of a propensity to isolate the intentional sales misstatement and project the unintentional misstatement. However, the subjects’ assessment of whether the misstatement was intentional, INTCK, is not significant. Both the systematic manipulation (SYS) and the subjects’ assessment of whether the misstatement was systematic (SYSCK) are significant in the Case 1 regressions (p < 0.01 and 0.05, respectively). The odds ratios indicate that a systematic sales misstatement is 9.00 and 3.35 times more likely to be projected than a non-systematic misstatement when the variables SYS and SYSCK are used, respectively. For the inventory scenario (Case 2), both manipulated variables INT and SYS are significant (p < 0.01) as shown in Panel B of Table 4, as are the subjects’ assessments of intentional and systematic misstatements, INTCK and SYSCK. The model fits the second case best for both the manipulated and subjects’ assessed variables, as evidenced by high odds ratios for the intentional and systematic variables, the large model chi-square statistics, and the relatively high c statistics (above 0.825). Using the subjects’ assessments of the manipulated variables for the inventory scenario, the results indicate that systematic misstatements are over 26 times more likely to be projected than non-systematic misstatements, while intentional misstatements are nearly nine times more likely to be projected than unintentional misstatements. In Case 3, the receivables scenario, neither INT nor INTCK is significant in its respective model (Panel C of Table 4).
Of the two systematic misstatement variables, only SYSCK is significant, as evidenced by a p-value of 0.0117 and an odds ratio indicating that systematic misstatements are over five times more likely to be projected than non-systematic misstatements. Panel D of Table 4 shows the results for the unknown receipts scenario (Case 4). Regression results indicate that none of the factors are significant for the model containing the manipulated variables INT and SYS. This may be attributable to a weak fit of the model for the unknown receipts scenario (model χ² = 2.33, c = 0.596). When the subjects’ assessed variables are included in the model (INTCK and SYSCK), the model fits the data much better (model χ² = 15.80, c = 0.749) and the results are more meaningful. In this model, only SYSCK is significant (p < 0.01). The odds ratio for the auditors’ assessment of a systematic misstatement indicates that assessed systematic misstatements of the unknown receipts are over six times more likely to be projected than assessed non-systematic misstatements. Overall, the analysis of the individual case scenarios indicates that auditors’ isolation/projection decisions are not significantly affected by whether sample misstatements are intentional or not; thus, the within case analyses do not support H1. In addition, the results suggest that auditors are more likely to project systematic sample misstatements to the underlying population than they are non-systematic misstatements. While the finding is largely applicable in the logistic regression models containing the variable SYS, the results are even stronger when the subjects’ assessments of systematic misstatements, SYSCK, are included in the regression models.

Discussion of Findings

Across case treatments, intentional misstatements were not projected by the auditors more frequently than they were isolated, failing to support H1.
Tests performed using the within case analyses also suggest that overall, the auditors’ isolation/projection decisions are not influenced by whether or not the sample misstatements were intentional. The finding that intentional misstatements are generally not isolated contradicts the representativeness heuristic (Kahneman & Tversky, 1972) and prior empirical evidence that suggests the uniqueness perception of misstatements is highly significant in auditors’ decisions to isolate or project misstatements (e.g. Burgstahler & Jiambalvo, 1986; Hermanson, 1997). In practice, intentional misstatements occur infrequently relative to unintentional errors. Application of the representativeness heuristic suggests that intentional misstatements should be isolated more than they should be projected because auditors, having seen few, if any, intentional misstatements, may not have a category of irregularity attributes in memory. Authoritative standards appear to counteract the heuristic; generally accepted governmental auditing standards direct the state auditors to “. . . design the audit to provide reasonable assurance of detecting material misstatements resulting from noncompliance with provisions of contracts or grant agreements that have a direct and material effect on the determination of financial statement amounts” (GAO, 1994, §4.13). In addition, government auditors apply the AICPA’s generally accepted auditing standards, including SAS No. 99, “Consideration of Fraud in a Financial Statement Audit.” The analyses performed both across and within case treatments suggest that auditors tend to project systematic misstatements more often than they isolate them, providing support for H2. While this finding was highly significant for the analyses using the systematic manipulation (SYS), the results were even stronger when tests were run based on the subjects’ perceptions of systematic misstatements (SYSCK).
This is evidenced by the overall higher odds ratios and Wald χ² values in Tables 2 and 4.

CONCLUDING COMMENTS

In this study, two factors posited to affect governmental auditors’ sample projection decisions were tested: whether sample misstatements are intentional and/or systematic. The study’s research design allowed for the testing of these two independent variables both across case scenarios and within case scenarios. Prior studies (Burgstahler & Jiambalvo, 1986; Dusenbury et al., 1994; Hermanson, 1997) found evidence of factors affecting the auditors’ decisions as to whether or not sample errors should be projected to the population from which they were drawn; however, those findings were aggregated across cases and the impact of the factors was not examined within specific case scenarios (i.e. the studies did not manipulate variables within case scenarios). In analyses performed both across and within case treatments, the results of the study indicate that the state auditors did not generally project intentional misstatements more frequently than unintentional misstatements. However, the results suggest that auditors’ isolation/projection decisions are significantly influenced by whether or not the sample misstatements were systematic; specifically, auditors tend to project systematic misstatements more often than they isolate them, providing support for H2. This study also breaks new ground by bringing state auditors into the existing research performed on auditors’ projection decisions regarding the evaluation of sample findings. No other study investigating auditors’ evaluation of sample findings uses governmental auditors in its empirical tests. Prior research has focused exclusively on external auditors whose pressures and incentives are perceived to differ from those of the state auditors that participated in this study.
For example, external auditors have greater litigation risk than do governmental auditors, while governmental auditors may have lower thresholds for audit risk and materiality due to various legal and regulatory reporting requirements. One unexpected finding of the study is the frequency with which the misstatements in the experimental cases were isolated, especially considering that every manipulation in each of the four treatment cases, normatively, should have been projected. Across all four treatment cases, auditors isolated 49% of the sample misstatements. Even when auditors perceived the misstatements to be systematic, approximately one-quarter of the sample misstatements were not projected to the underlying populations from which they were drawn. This finding may indicate that state auditors do not fully understand audit sampling, including how noted misstatements should be projected to the underlying sample population. The inappropriate isolation of sample misstatements may impair the effectiveness of state auditors, which in turn could adversely impact the allocation of scarce resources by various state agencies. If a state auditor fails to project sample misstatements, the likelihood of a biased estimate of the account balance being tested increases, which could result in an underestimation of the aggregate misstatement in the population. A better understanding of biases present in sampling environments resulting from this study may lead to improved judgments in governmental auditors’ projection decisions, and perhaps, more informed decisions by lawmakers based on higher quality financial information.

Limitations and Extensions

A limitation of this study is the manner in which the instruments were distributed to the subjects.
As noted previously, the instruments were sent to the audit directors of the participating state audit departments, which could have prevented a random distribution of the instrument to the state auditors in each location; that is, the audit directors may have selected the most diligent auditors in the office to complete the task rather than distributing the instrument randomly across all auditors in the office, limiting the external validity of the study. To reduce potential confounding factors, subjects were told that none of the misstatements presented in the case scenarios were material, and subjects were faced with a dichotomous decision task – to project or isolate the sample misstatements. The participants were not given the opportunity to contain the misstatements. The containment process has been posited as an explanation for choosing isolation over projection of misstatements (Dusenbury et al., 1994; Wheeler et al., 1997), and is a common practice among external auditors (Elder & Allen, 1998). Although this study was not designed to test containment effects, very few of the participants’ comments suggested their decisions were based on perceived containment, or lack thereof. Of all the subjects’ comments, only a few expressed a desire for containment information. The lack of options available to the subjects may have weakened the generalizability of the study, and leaves an avenue open for future research. Although the systematic misstatements in all four cases are likely to be repeated because of some characteristic(s) associated with a transaction or class of transactions, they are not operationalized in the same manner.
For example, in Case 2 (inventory), the systematic manipulation is operationalized as a control issue over inventory, indicated by whether or not there were past inventory problems, whereas in Case 3 (receivables), the systematic manipulation is also a control issue, but it involves a computer system malfunction. The lack of uniformity in the operationalization of systematic misstatements is a weakness of the study. However, the results were analyzed using both the initial manipulations of systematic misstatements and the participating auditors’ assessments of whether the misstatements were systematic, and in both analyses, the systematic manipulations are almost all highly significant in explaining the auditors’ isolation/projection decisions. Future research could address how auditors recognize and interpret the systematic nature of misstatements and how that affects the auditors’ decision processes. The case scenarios were set up in random order to minimize potential order effects. Once the order of the scenarios had been selected for each participant, a specific manipulation of the two independent variables was assigned to each of the four treatment cases in a manner that ensured every participant received each of the four combinations of the 2 × 2 design (as illustrated in Fig. 1). While the process of randomizing the research instrument in this manner should have minimized any order effects, our ability to test for order effects was limited given that 28 different combinations of the research instruments were distributed. Tests conducted that compared the results of the different instrument combinations did not indicate the presence of any order effects. In addition, ad hoc measures were developed that compared the decisions among the different instruments in multiple ways (e.g. compared the results based on which case was presented first without regard to the remaining order).
These tests are admittedly imprecise; however, no effects resulting from the order of the case presentation were noted and the randomization of both the cases and treatment combinations should have minimized potential order effects. Nevertheless, the low power of the tests for order effects is a limitation of the study. Finally, the use of state auditors as the subject pool limits the comparability of this study to others that used non-governmental auditors as subjects. While both governmental and non-governmental auditors must decide whether to isolate or project sample misstatements to the population being evaluated, the experimental manipulations may have affected the state auditors’ isolation/projection decisions differently than they would have affected non-governmental auditors. Future research should investigate the differences in audit environments between governmental and non-governmental employers and the impact of those differences on the actions of the auditors.

NOTES

1. Generally accepted governmental auditing standards (GAGAS) incorporate AICPA standards relevant to financial statement audits unless the General Accounting Office (GAO) excludes them by formal announcement (GAO, 1994, p. 32).
2. Auditing standards (AICPA, 2001, AU§350.11) divide the risk that a sample may be non-representative of the population into sampling risk and non-sampling risk. Sampling risk is the inherent risk of sampling that arises simply because less than the entire population is examined. Non-sampling risk consists of risks not due to the sample selected but instead involves risks associated with evaluating the sample, such as an auditor’s failure to recognize exceptions in the sample selected, and the auditor’s inappropriate or ineffective application of audit procedures.
3. Only one study conducted on auditors’ isolation/projection decisions (Wheeler et al., 1997) used a complete design.
They used a full 3 × 2 design to test the impact of containment information on auditors’ sampling decisions and did not test either factor (intentional or systematic misstatement) investigated in this study. In addition, Wheeler et al. used a single case scenario in their study whereas we use four different case scenarios.
4. In this study, the delineation between systematic and non-systematic may be more precisely described as “more systematic” and “less systematic,” because almost every misstatement will have certain characteristics that could be construed as systematic.
5. Analyses of the data excluding the seven auditors who may potentially lack a background in accounting indicate no significant differences from the reported results.
6. Because the manipulation of whether a misstatement is intentional is rather well-defined, analyses were also conducted that excluded the participants that initially failed the INT manipulation check. The results were substantially similar to those presented in the paper.
7. The c statistic is derived by comparing the number of paired responses (of observed and predicted responses) in the data set. It is defined by the equation $c = (n_c + 0.5(t - n_c - n_d))/t$, where $t$ is the total number of pairs with different responses, $n_c$ is the number of concordant response pairs, $n_d$ is the number of discordant response pairs, and $t - n_c - n_d$ is the number of ties between the response pairs.
8. The odds ratio is calculated by exponentiating the parameter estimates (variable coefficients), that is, by raising e, the base of the natural logarithm, to the power of the estimate (Stokes et al., 1995). For example, if the parameter estimate is 1.2528, then the odds ratio is 3.50 ($e^{1.2528} = 3.500$).
9. The data were also analyzed using repeated measures analysis of variance (ANOVA) by combining the auditors’ isolate/project decision with the comfort of their decision. The resulting analysis yielded very similar results to the logistic regression presented.
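Notes 7 and 8 can be made concrete with a short sketch. The code below exponentiates a parameter estimate to obtain an odds ratio and computes the c statistic from concordant, discordant, and tied pairs, following the formula in Note 7. The function names and the small data set in the last line are hypothetical and are not taken from the study; the sketch also assumes at least one observation of each response, so that t is not zero.

```python
import math
from itertools import product
from typing import Sequence

def odds_ratio(estimate: float) -> float:
    """Odds ratio implied by a logistic regression coefficient (Note 8)."""
    return math.exp(estimate)

def c_statistic(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """c = (nc + 0.5 * (t - nc - nd)) / t, as defined in Note 7."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    t = len(events) * len(non_events)  # pairs with different responses
    # Concordant: the observed 1 has the higher predicted probability.
    nc = sum(1 for e, n in product(events, non_events) if e > n)
    # Discordant: the observed 1 has the lower predicted probability.
    nd = sum(1 for e, n in product(events, non_events) if e < n)
    return (nc + 0.5 * (t - nc - nd)) / t

print(round(odds_ratio(1.2528), 2))  # worked example from Note 8
print(round(odds_ratio(1.1864), 2))  # SYS estimate from Table 2, Panel A

# Hypothetical predicted probabilities and observed decisions (1 = project).
print(c_statistic([0.2, 0.4, 0.4, 0.7, 0.9], [0, 0, 1, 1, 1]))
```

A value of c near 0.5 indicates predictions no better than chance, while values approaching 1 indicate that projected cases almost always receive higher predicted probabilities than isolated ones.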
ACKNOWLEDGMENTS

We appreciate the helpful comments of Richard Dusenbury, Randy Elder, David Gilbertson, Julia Higgs, Bill Hopwood, Dennis O’Reilly, Steve Wheeler, participants at the 1999 Southeast Regional AAA and 2000 Auditing Section Midyear meetings, two anonymous reviewers, and the editor.

REFERENCES

Akresh, A., & Tatum, K. (1988). Audit sampling – dealing with the problems. Journal of Accountancy (December), 58–64.
American Institute of Certified Public Accountants (AICPA) (2001). AICPA Professional Standards as of June 30th, 2000 (Vol. 1). New York, NY: AICPA.
Anderson, B. H., & Maletta, M. (1994). Auditor attendance to negative and positive information: The effect on experience-related differences. Behavioral Research in Accounting (6), 1–20.
Ashton, A. H. (1991). Experience and error frequency knowledge as potential determinants of audit experience. The Accounting Review (April), 218–239.
Ashton, A. H., & Ashton, R. H. (1988). Sequential belief revision in auditing. The Accounting Review (October), 623–641.
Burgstahler, D., & Jiambalvo, J. (1986). Sample error characteristics and projection of error to audit populations. The Accounting Review (April), 233–248.
Dusenbury, R., Reimers, J., & Wheeler, S. (1994). The effect of containment information and error frequency on projection of sample errors to audit populations. The Accounting Review (January), 257–264.
Elder, R. S., & Allen, R. D. (1998). An empirical investigation of the auditor’s decision to project errors. Auditing: A Journal of Practice and Theory (Fall), 71–87.
General Accounting Office (GAO) (1994). Government Auditing Standards: 1994 Revision. Washington, DC: Comptroller General of the United States.
Green, S. L. (1992). Behavioral research in governmental and nonprofit accounting: An assessment of the past and suggestions for the future. Research in Governmental and Non-profit Accounting (7), 53–78.
Hermanson, H. M. (1997). The effects of audit structure and experience on auditors’ decisions to isolate errors. Behavioral Research in Accounting, Suppl. (9), 76–93.
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology (July), 430–454.
Kane, G. D., Richardson, F. M., & Graybill, P. (1996). Recession-induced stress and the prediction of corporate failure. Contemporary Accounting Research, 13(2), 631–642.
Kida, T. (1984). The impact of hypothesis-testing strategies on auditors’ use of judgment data. Journal of Accounting Research (Spring), 332–340.
Libby, R. (1985). Availability and the generation of hypotheses in analytical review. Journal of Accounting Research (Autumn), 648–667.
Libby, R., & Frederick, D. M. (1990). Experience and the ability to explain audit findings. Journal of Accounting Research (Autumn), 348–367.
MacDonald, E. (2000). ‘What’s Wevenue?’ Auditors Miss a Fraud and SEC tries to put them out of business – scam at California Micro was well-hidden, says lawyer for Coopers duo – CFO’s misleading resume. Wall Street Journal (January 6), A1.
Office of Management and Budget (OMB) (2001). A citizens’ guide to the federal budget, fiscal year 2002. Washington, DC: U.S. Government Printing Office.
Random House Webster’s College Dictionary (1991). New York, NY: McGraw-Hill.
Stokes, M. E., Davis, C. S., & Koch, G. G. (1995). Categorical data analysis using the SAS system. Cary, NC: SAS Institute.
Trotman, K. T., & Sng, J. (1989). The effect of hypothesis framing, prior expectations and cue diagnosticity on auditors’ information choice. Accounting, Organizations and Society, 14(5/6), 565–576.
Wheeler, S., Dusenbury, R., & Reimers, J. (1997). Projecting sample misstatement to audit populations: Theoretical, professional, and empirical considerations. Decision Sciences (Spring), 261–278.

APPENDIX

The treatment cases included in the experimental instrument are given below. Cases 1–4 represent the four combinations of the complete 2 × 2 design that tests two sample misstatement manipulations: intentional or unintentional misstatement and systematic or non-systematic misstatement. The unintentional and non-systematic misstatement manipulations are italicized first, followed by the manipulations for intentional and systematic misstatements that are also italicized but in parentheses.

Case 1 (Sales)

Sales Account No. 77491 was understated by $945.16. It was determined that a temporary clerical employee, who worked during a two week period in April, mistakenly (deliberately) misfooted sales invoices for the account. The client’s controller indicated that this was the only temporary employee (one of 25 temporary employees) used to process sales transactions.

Case 2 (Inventory)

During a physical inventory observation, it was discovered that inventory item No. 245-0672 (cleaning chemicals) was understated by 23 items valued at $50 each. Further investigation revealed that a warehouse employee temporarily placed the items in the breakroom to restock the company’s supplies closet (temporarily placed the items in the breakroom with the intent of taking them home for his personal use) (the breakroom is adjacent to the company’s supplies closet). A review of the company’s internal audit workpapers for the last two years, which report on periodic surprise inventory test counts, revealed no similar instances (revealed several similar instances) in which inventory was improperly segregated by warehouse workers.

Case 3 (Receivables)

Receivables Account No. 16788 was overstated by $59. The misstatement was discovered when the auditor compared the price on the selected sales invoice to the client’s approved master price list in effect at the date of the sale.
An investigation into the matter revealed that a salesperson overcharged the customer for the item when she inadvertently read the price of the next item on the master price list (to increase her sales commission). The client’s accounting system was temporarily down when the item was ordered and the transaction had to be manually processed. When the system is operating, it cannot process transactions (it allows overrides of transactions) if the price of the item is not within the approved master price range. It was estimated that the system was down 3–5% of the time during the year.

Case 4 (Unknown Receipts)

It is company policy to place unidentified receipts into a temporary account “Unknown Receipts.” For example, if cash or a check is received for payment without a remittance advice or other means of identifying the account holder, the transaction is recorded as a debit to cash and a credit to “Unknown Receipts.” When the payor is later identified, an entry is made to debit “Unknown Receipts” and credit the proper account receivable. Transactions involving the “Unknown Receipts” account are carefully reviewed (are not carefully reviewed). A sample of transactions indicated that cash was understated by $155.04. An investigation revealed that an employee unintentionally (intentionally) reversed the entry for an unknown receipt of cash by crediting cash and debiting “Unknown Receipts.”

HOW DOES NEGATIVE SOURCE CREDIBILITY AFFECT COMMERCIAL LENDERS’ DECISIONS?

Philip R. Beaulieu and Andrew J. Rosman

ABSTRACT

Data were collected from loan officers using a computerized process-tracing program to help shed some light on how source credibility impacts the judgments made by loan officers. Loan officers did not structure loans more restrictively regardless of whether they were in the positive or negative character condition or whether they approved or denied the loan.
Negative source credibility affected decision process effort but did not produce the tradeoff between loan approval and loan structure that is suggested in the literature. Although significantly more (fewer) loans were denied when character information was negative (positive), a majority of loan officers in the negative character condition approved the loan. While most loan officers were aware of negative source credibility, they did not react by denying loans or adjusting loan structure.

Advances in Accounting Behavioral Research, Volume 6, 79–94
© 2003 Published by Elsevier Ltd.
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06004-6

INTRODUCTION

While many agree that source credibility is important to lending decisions, how negative source credibility impacts lender decisions is less understood. Some suggest that loan structure (i.e. collateral and covenants) can be used to compensate for negative source credibility (e.g. Mather, 1999; Oldham, 1998), while others maintain that loan officers should not trade off perceived weaknesses in source credibility with tighter structure. The risk of attempting to counterbalance flawed character with loan structure is too great; a safer approach would be to avoid a business relationship rather than to trust the applicant’s financial representations (e.g. Pace & Simonson, 1977).

Research on whether lenders compensate for perceived weakness in source credibility by imposing tighter loan structure requires joint study of loan approval and loan structure decisions. However, the literature on how loan officers react to negative source credibility has focused on loan approval (e.g. Beaulieu, 1994) or loan structure (Mather, 1999), but not both. Thus, the primary contribution of this paper is to determine whether the tradeoff exists.
Source credibility was manipulated in the experiment to be either positive (suggesting a credible source) or negative (suggesting a non-credible source). Data were collected using a computerized process-tracing program, which collected information on decision effort, perceptions of the credibility of projected accounting information, loan approval/denial, and loan structure. The results indicate that loan officers will deny loans to less credible clients rather than restructure the conditions of the loan, and that they will not structure loans more restrictively regardless of whether they were in the positive or negative character condition or whether they approved or denied the loan.

LITERATURE REVIEW AND HYPOTHESIS DEVELOPMENT

Defining Source Credibility

In capital markets, source credibility refers to whether managers who direct the preparation of financial statements inspire belief in the statements. Source credibility is particularly important in today’s environment, as a number of prominent companies, including several of their CEOs and CFOs, were accused of falsifying documents and manipulating accounting information to hide poor financial results. Source credibility is distinct from credibility conferred by attestation services offered by external auditors. While both forms of credibility are important, source credibility, which has received relatively little attention in finance and accounting literature, is the focus of this paper. In a post-Enron world, new research will likely address interactions between source credibility and attestation services.

Source credibility is important whenever resource providers lack complete information and must rely on others to provide fair and accurate information.
Source credibility is particularly important in commercial lending, which lacks the analyst following, observable firm valuation in share prices, and visible disciplinary managerial labor markets affecting large publicly traded firms. In equity markets, for example, there is a public record of the accuracy of management forecasts that can be used to evaluate source credibility (Hirst et al., 1999). In commercial lending, on the other hand, loan officers lack access to the prior accuracy of borrowers’ financial projections (since the media do not report their projections). They must observe the behavior of borrowers, such as their willingness to disclose information, in order to draw conclusions about source credibility. Lenders refer to source credibility as the character of borrowers. Character is usually defined as a borrower’s determination to repay debt, but also connotes honesty and integrity, which are to be considered when evaluating all statements (financial or otherwise) made by borrowers.

Research in many different contexts has established that source credibility affects information use. It has been shown to affect job choice intentions (Coleman & Irving, 1997), purchase intentions (Gotlieb & Sarel, 1991; Grewal et al., 1994) and commercial loan officers’ loan decisions (Beaulieu, 1994; Mather, 1999). In all of these contexts, people reduce the influence of information from non-credible sources on their judgments and action choices, a process called “discounting” (Beach et al., 1978; Kelley, 1972).

Most research in source credibility has defined and manipulated it as the prior accuracy of sources of information. An example is Hirst et al. (1999), who studied the effects of source credibility and the form of management earnings forecasts on investors’ judgments and confidence.
Participants in an experiment were asked to assume the role of investors evaluating the common stock of a manufacturing firm. Source credibility was manipulated as high (low) forecast accuracy, and subjects were told that “any differences between (the company’s) prior forecasts and the actual realizations were generally very small (very large).”¹ Another example of this research is Maines (1990), who examined whether the prior accuracy of individual earnings forecasts affected judgments of the expected accuracy of consensus forecasts that were based on the individual predictions. Research conducted in this manner does not allow participants to make their own judgments of source credibility; it simplifies the issue by defining it as prior accuracy and specifying what that is.

In contrast, Beaulieu (1994, 1996) defined source credibility as whether sources of information inspire belief in their representations. This more general definition encompasses prior accuracy, but also includes many other indications of source credibility, which may be referred to collectively as prior behaviors of the source. Relevant prior behaviors depend on the context of information usage; an example in commercial lending is the ability or failure of a borrower to provide documentation when promised (Beaulieu, 1994). In adopting Beaulieu’s definition and manipulation, source credibility is treated as a more subtle, complex, and practical issue than is done in most prior research and participants are given more freedom to judge source credibility.

Prior research in lending has found that source credibility affects loan approval (Beaulieu, 1994) and loan structure (Mather, 1999) judgments. However, loan approval and loan structure are not separate, independent judgments even though they have been examined separately.
To more completely understand the effect of source credibility on lending decisions, both judgments need to be simultaneously examined. Doing so provides a more comprehensive understanding of how loan officers react to negative source credibility, and in particular, whether they compensate for negative source credibility by restrictive loan structure or whether they simply deny the loan request at the outset. This is our basic research question.

This research question is important because it focuses on shortcomings in the current literature and helps resolve the debate on how loan officers react to negative source credibility. Framing the research question in terms of a tradeoff between loan approval and structure allows us to investigate whether Mather’s findings (1999) that source credibility affects loan structure would hold if loan officers were permitted to deny loans. Similarly, while Beaulieu (1994) documented that more loan officers denied loan applicants with negative source credibility than those with positive source credibility, there is no evidence whether loan candidates with negative source credibility who were approved received more restrictive loan structures than those who were denied or those who had positive source credibility. If lenders do not structure approved loan candidates with negative source credibility more restrictively, then there is no consequence to candidates with negative source credibility that would protect lenders.

Loan Approval/Denial and Loan Structure

Commercial lending experts recommend that loan officers evaluate source credibility, in the form of a character judgment, as soon as contact with a prospective borrower has been made. If character is not of sufficient quality, then analyzing credit further or considering alternative loan structures may not be worthwhile. This preliminary character judgment is the first hurdle of lending (McDonald & McKinley, 1981; Pace & Simonson, 1977).
Stephens (1980) confirmed that loan officers want information about the applicant before examining the details of the loan. This position can also be inferred from Eisenreich (1981, p. 9):

Since the majority of information will come from the borrower . . . the lender must have confidence in the raw material of the judgment. If not or if critical facts cannot be verified, the lender cannot make the decision. It would be a gamble rather than a calculated risk.

Loan officers may be tempted to work with potential borrowers, as is suggested by positive accounting literature (Watts & Zimmerman, 1986), even when they observe negative character. If the apparent risk of financial loss to the bank resulting from the actions of borrowers with negative character could be managed through loan structure, loan officers could acquire profitable clients. Suppose, for example, that a borrower fails to disclose information about a relatively small liability. As the following scenario suggests, the loan officer may still attempt to work with the borrower. The italicized sentence becomes a rationalization for making the loan.

Use of the word “undisclosed” is usually just another way of saying that you have been lied to, absent a brilliant excuse for amnesia. At this level, a withhold is usually the same as a lie and is a serious character flaw. Regardless, if the nature and circumstance of an undisclosed issue can be overcome, then the daunting task of managing any remaining financial risk is still left to deal with. If what remains is a quantifiable financial issue, then this may be manageable through the loan structure. Otherwise, walk away (Oldham, 1998, p. 64, emphasis added).

The above direct quote conflicts with the advice offered by other commercial lenders cited earlier (Eisenreich, 1981; McDonald & McKinley, 1981; Pace & Simonson, 1977).
It seems to advocate both screening borrowers of questionable credibility and using loan structure to work with them. While prudently this should be the exception rather than the rule, loan officers may use the exception to rationalize loan approvals. Which reaction is more likely to occur is an open issue.

Beaulieu (1994) found that character had a significant main effect on loan officers’ loan decisions (approval or denial) and that it interacted with accounting information to affect both decisions and estimates of risk of nonpayment. Specifically, loan decisions and risk estimates responded significantly to a change in the strength of accounting information when character was positive, but not when it was negative. Participants in Beaulieu’s study were told to assume, in a loan application case, that structure of the proposed loan would be determined by the bank’s policy at competitive terms and that collateral would be available to meet the bank’s guidelines for that type of loan. They had no opportunity to adjust loan covenants or collateral. In contrast, Mather (1999) instructed his subjects that loans had already been approved, so that only the loan structure task was required. Under these conditions, Mather found that loan officers set more restrictive loan structure when credibility was unknown than when it was positive.

An objective of the current study is to help to resolve the debate by providing evidence as to whether lenders simply deny a loan (H1) consistent with Beaulieu (1994) or select collateral and covenant levels to compensate for weaknesses in source credibility (H2) consistent with Mather (1999). Essentially, H1 and H2 are competing hypotheses. Because the guidance in the literature is at odds, the hypotheses are stated in the null form.

H1.
There will be no difference in the proportion of loan officers who will approve loans when character of the borrower is positive versus when character is negative.

H2. There will be no difference in proposed loan structure between loan officers receiving negative and positive character information.

Process Effort

Loan officers make a critical decision regarding how much effort to expend when they evaluate a loan candidate. Rosman and Bedard (1999) find evidence that lenders will structure loans more restrictively when they expend less effort. However, Rosman and Bedard do not consider the relationship between effort and loan structure restrictiveness in light of weaknesses in a potential borrower’s character. When character is perceived to be weak but not entirely non-credible, the lender may pour more effort into the file to check on the initial negative impression of character and to relate character judgments to other information provided, especially accounting information. This possibility is motivated by the fact that initial impressions of character and personality can be incorrect (Korem, 1997). That is, loan officers may consider approving a loan if no aspect of presentation in the financial statements encourages caution, even though assessments of management’s credibility raise doubts about their character.² Increasing decision effort in such situations reduces concerns raised by initial negative character judgments that do not push loan officers past a threshold where they feel that they must deny loans.

Increased processing effort, as a response to negative (but not extremely so) character information, is consistent with Shaub’s (1996) finding that auditors lacking trust in a client will recommend more work in their audit plans. It is also consistent with Beaulieu (2001), in which recommended evidence collection was negatively related to a CFO’s integrity.
The other option available to loan officers when character judgments are sufficiently negative is to deny loans because such credits do not clear the “first hurdle” of commercial lending (Pace & Simonson, 1977). This implies that information processing will be terminated quickly when the character of borrowers is so negative that they are considered non-credible.

Options one and two (checking initial impressions of character and relating it to accounting information, and denying loans without checking) require more and less processing effort, respectively, than an average or baseline credit with positive character information. It may not be obvious to loan officers whether the loan applications of borrowers, perceived at first to have weak character, deserve more or less analysis. Thus, we expect the following:

H3. Variance of effort should be greater when loan officers receive negative character information than when they receive positive information.

METHOD

Procedure

Decision process and outcome data were collected using Search Monitor, which is a computerized process-tracing program (Biggs et al., 1993; Brucks, 1988; Rosman & Bedard, 1999). Search Monitor is interactive, menu-driven software that presents case materials to participants and captures a complete trace of selected processes including cue acquisition, acquisition order, and time to examine cues.

Subjects were advised at the beginning of the Search Monitor task that a commercial loan applicant was seeking a loan package that included short- and medium-term financing. The case used in this study integrates the case materials used by Beaulieu (1994), which validated the source credibility measures, and Rosman and Bedard (1999), which validated the realism of the lending task and related measures. The loan applicant, a manufacturer of chemical products, was briefly described, including the contact person with the firm, its CFO.
Further information about the firm was accessed via a menu having six categories of financial and qualitative data: profitability, inventory turnover, liquidity, and financial leverage & capital structure (financial); and management and industry & product (qualitative). Each of the four categories of financial data consisted of three ratios (and the dollar values of numerators and denominators), divided into historical (years −2, −1 and 0) and projected (years +1 and +2) information. Case information indicated that the historical information was given a clean audit opinion, while no opinion had been expressed regarding the projected figures. For example, the following menu was presented to participants who selected profitability information.

(1) Historical net income/average equity
(2) Projected net income/average equity
(3) Historical net income/average total assets
(4) Projected net income/average total assets
(5) Historical net income/net sales
(6) Projected net income/net sales

The order of the six cues was randomized differently each time a participant returned to the menu. Participants could move both within each of the six categories of information and between categories as they wished. When they indicated that they had finished selecting and viewing information, they were given a series of screens to register their recommendations about the loan. Approval or denial of the loan was requested, assuming an interest rate set at one percentage point above prime, followed by loan structure recommendations.³ Participants who recommended denial were told that although they did not recommend approval, they had been asked to provide input on how to structure the loan in the event that the loan committee recommended approval. This step was necessary so that H2 could be examined.
That is, even if a loan did not pass the initial character judgment hurdle (see Pace & Simonson, 1977, discussed earlier), this step ensures a test of the tradeoff between structure and character that is suggested by some of the literature, including positive accounting theory. Combined, H1 and H2 provide a stronger test of the two competing points of view that have been expressed in the literature.

Four loan structure recommendations were requested (see below). Twelve responses were provided for each, corresponding to ranges of percentages that varied, depending on the item.⁴,⁵

(1) Percentage of loan principal for which an equivalent amount of assets will be collateralized.
(2) Level of profitability (ratio of net income to average equity) to be maintained.
(3) Level of liquidity (ratio of cash flows to fixed cash commitments) to be maintained.
(4) Level of leverage (ratio of total liabilities to equity) to be maintained.

The loan structure recommendations were followed by a question asking participants to indicate confidence in their structure judgments on a nine-point scale. Finally, two questions asked participants to rate the credibility of historical financial information and management’s financial projections, also on nine-point scales.

The character information used in this study was adapted from Beaulieu (1994), which contains a complete description of the development and validation process. As shown in Table 1, character was manipulated between-subjects in two places in the Search Monitor program. First, either positive or negative character information regarding the CFO was provided in an introductory screen and was seen by all participants in either condition of the experiment. Second, participants could select more information about the CFO via the management information menu. Those selecting the additional information received either a positive or negative description, depending on the condition to which they had been assigned.
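
The tracing behavior described in the Procedure (cue order re-randomized on each return to a menu, and a record of which cues were acquired, in what order, and for how long) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not Search Monitor’s actual implementation, which the text does not describe: the class name `CueTrace`, its methods, and the viewing times are hypothetical. The two summary methods correspond to the aggregate effort measures the study reports (total time on task and total visits to information screens).

```python
import random
from dataclasses import dataclass, field

# Cue labels copied from the profitability menu shown in the text.
PROFITABILITY_CUES = [
    "Historical net income/average equity",
    "Projected net income/average equity",
    "Historical net income/average total assets",
    "Projected net income/average total assets",
    "Historical net income/net sales",
    "Projected net income/net sales",
]


@dataclass
class CueTrace:
    """Hypothetical process trace: cues viewed, their order, and viewing time."""
    rng: random.Random
    events: list = field(default_factory=list)  # (cue, seconds), in acquisition order

    def present_menu(self, cues):
        """Return the cues in a fresh random order for this visit to the menu."""
        shuffled = list(cues)
        self.rng.shuffle(shuffled)
        return shuffled

    def record_view(self, cue, seconds):
        """Log one cue acquisition together with the time spent examining it."""
        self.events.append((cue, seconds))

    def total_visits(self):
        """Number of information screens visited."""
        return len(self.events)

    def total_minutes(self):
        """Total time spent examining cues, in minutes."""
        return sum(seconds for _, seconds in self.events) / 60.0


# Illustrative session with made-up viewing times.
trace = CueTrace(rng=random.Random(0))
menu = trace.present_menu(PROFITABILITY_CUES)
trace.record_view(menu[0], 45.0)
trace.record_view(menu[3], 75.0)
print(trace.total_visits(), trace.total_minutes())
```

Passing in a seeded `random.Random` is a design choice made here so that a session can be replayed exactly; an actual tracing program could equally use an unseeded generator.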
Table 1. Character Conditions and Locations in Search Monitor.ᵃ

| Location of Information | Positive Character | Negative Character |
|---|---|---|
| Introductory screen (viewed by all participants) | When you visited the business, the CFO had available all the documentation that he had promised to provide. Among the items you examined during your initial evaluation are the following. The loan application stated that the firm had not been a defendant in legal actions in the last three years. A background check confirmed this. At your meeting, you said that a decision on the loan would be made within two weeks. The CFO accepted this time frame, and did not urge you to reach a decision earlier. | When you visited the business, the CFO did not have available all the documentation that he had promised to provide. However, the following information did become available to you during your initial evaluation. The loan application stated that the firm had not been a defendant in legal actions in the last three years. A background check showed that a former senior officer of the firm has filed a wrongful dismissal suit. The suit has recently been settled out of court. At your meeting, you said that a decision on the loan would be made within two weeks. The CFO accepted this time frame and did not urge you to reach a decision earlier. |
| Management menu, CFO (viewed only if selected) | The CFO’s work history is provided, then the following: At your first meeting the CFO answered your questions patiently, and volunteered additional information. He is an active member of several local community service organizations. | The CFO’s work history is provided, then the following: During your meeting with him, Mr. Butler ignored your suggestions for improving his firm’s operations and said that he did not need business advice. He has changed the firm’s public accountant twice in the last five years. Disagreements with the former public accountants were reported. |

ᵃ The sentences in italics were rated as neutral, not providing information about character, in Beaulieu (1994). They were not written in italics in Search Monitor.

Participants

Twenty-five bankers representing 11 banks in New England participated in the study. There were no statistically significant differences between the 14 bankers in the positive source credibility condition and the 11 bankers in the negative condition on the following dimensions: years in banking, education level, and loan size experience. On average, the 25 participants had 17.8 years of banking experience (range of 8–26 years). All but three of the bankers had a college education. The bankers had experience with loans that ranged from $1,000,000 to $50,000,000 (mean = $8,730,000), but the most typical loans encountered by the participants during their normal course of business ranged from $90,000 to $1,000,000 (mean = $363,000).

RESULTS

Manipulation Check

The potential for source credibility to impact the perception of the credibility of accounting projections is important because projected accounting information is a standard component of loan applications (Danos et al., 1989), and is not audited. This type of credibility judgment is different than other credibility judgments that are made in equity markets, because the latter are objective assessments of the accuracy of management forecasts (e.g. Hirst et al., 1999). In contrast, source credibility in the lending context is a subjective consideration of the prior behavior of management that is made because there is no objective public record of management forecast accuracy. The credibility of projected unaudited information is a judgment that precedes loan approval and loan structure and is used to assess the success of the manipulation.
Mean source credibility ratings of projected accounting information were evaluated on a nine-point scale (1 = low, 9 = high). Subjects rated the credibility of projected information to be higher in the positive condition than in the negative condition (5.43 vs. 4.18, t = 1.63, p = 0.06, one-tailed). Credibility of the historical, audited financial information was also judged on the same nine-point scale. The mean ratings were 6.27 in the negative character condition and 7.14 in the positive condition (t = −1.13, p = 0.27).⁶ Therefore, any effects of the manipulation of information about the CFO’s character on loan decisions, structure recommendations, and processing effort result from changes in the credibility of projected, rather than historical, accounting information.

Hypothesis Testing

H1 investigated whether loan officers would simply deny loans if they become sufficiently concerned about character and source credibility. All loan officers given the positive character information about the CFO approved the loan (100% of 14), as did 8 of the 11 given the negative version (73%). The χ² statistic is 4.34 (p = 0.037). Thus, the null hypothesis is rejected.

H2 investigated whether loan officers would adjust loan structure to compensate for negative source credibility. Table 2 reports the four mean loan structure recommendations (collateral and three covenants) individually and in total. While structure requirements appear greater in the negative character condition in one case (profitability) and greater in the positive condition in three cases (collateral, liquidity and leverage), none of the differences are statistically significant, whether individual or aggregated. Thus, the null form of H2 cannot be rejected. The results regarding H1 and H2 support the contention of those in practice who recommend that loan officers deny loans when character is suspect, and imply that doing so is preferable to handling negative character through loan structure.

Table 2. Mean Structure Recommendations.ᵃ

| | Negative Character | Positive Character | t-Statistic (p) |
|---|---|---|---|
| Percentage of principal collateralized | 10.5 | 10.9 | 0.60 (0.56) |
| Covenants: | | | |
| Profitability to be maintained | 3.6 | 3.1 | 0.82 (0.42) |
| Liquidity to be maintained | 7.5 | 8.7 | 1.08 (0.29) |
| Leverage to be maintainedᵇ | 5.0 | 6.1 | 0.84 (0.41) |
| Total of covenant recommendations | 16.1 | 17.9 | 0.94 (0.36) |

ᵃ As in Rosman and Bedard (1999), each of the four scales (one for collateral and three for covenants) consisted of twelve responses. Each response represented a range of percentages, for example 10–20% of assets collateralized, but the ranges of percentages differed among the four scales. For all four scales, response 1 indicated 0% and 12 indicated the maximum percentage. Thus, the maximum total score of collateral and covenants possible is 48 (4 × 12).
ᵇ These scores have been converted as described in Note 4.

Table 3 presents two sets of additional analyses of loan structure judgments: (1) loan officers within the negative condition who did not deny the loan are compared to those who did deny the loan; and (2) loan officers who did not deny the loan are compared across conditions. None of the differences in means are statistically significant for either set of analyses. Thus, deniers and approvers structure loans equally restrictively and those who approved loans structure them equally restrictively regardless of whether they were in the positive or negative condition.

H3 investigated whether the uncertainty about the usefulness of decision-making effort would cause loan officers in the negative character condition to exhibit greater variance in effort than those in the positive condition. Table 4 reports the results for two aggregate measures of processing effort: total time spent on the task (Panel A) and total number of visits to information screens (Panel B).
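
As a check on the arithmetic, the H1 test statistic can be recomputed from the approval counts given above (14 of 14 approvals in the positive condition, 8 of 11 in the negative condition), and the variance ratios that H3 calls for can be recomputed from the standard deviations reported in Table 4. The sketch below assumes an uncorrected Pearson chi-square on the 2 × 2 table and a simple ratio of the larger to the smaller sample variance; the paper does not state the exact procedure or software used, and the function names are illustrative only.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, the upper-tail chi-square probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def variance_ratio(sd_1, sd_2):
    """F statistic formed as the larger sample variance over the smaller one."""
    larger, smaller = max(sd_1, sd_2), min(sd_1, sd_2)
    return (larger / smaller) ** 2


# H1: rows are the character conditions, columns are (approved, denied).
chi2, p = pearson_chi2_2x2(14, 0, 8, 3)

# H3: standard deviations (negative, positive condition) as reported in Table 4.
f_time = variance_ratio(14.3, 7.5)     # Panel A: total minutes on task
f_visits = variance_ratio(8.4, 10.5)   # Panel B: total visits to screens

print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
print(f"F (minutes) = {f_time:.2f}, F (visits) = {f_visits:.2f}")
```

Under these assumptions the chi-square value and the time-on-task ratio agree with the reported figures. The ratio for total visits, computed from the rounded standard deviations, comes out slightly above the reported 1.55, a gap that is plausibly due to rounding. The p-values for the F tests are not reproduced here because they require the F distribution, which is not available in the Python standard library.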
In Panel A, variance of time spent on the task is much greater in the negative character condition; the standard deviation is almost double that in the positive condition (14.3 vs. 7.5). The F test is significant (F = 3.64, p = 0.032). In Panel B, variance of total visits is similar in the negative and positive conditions, the standard deviations being 8.4 and 10.5, respectively (F = 1.55, p = 0.490). Thus, the variance of effort choices was similar with respect to the quantity of information examined, but not with respect to time spent examining it.

Table 3. Additional Analyses: Mean Structure Recommendations for those Who did not Deny the Loan.a

Within Negative Condition
                                          Did Not Deny (n = 8)   Deny (n = 3)   t-Statistic (p)
Percentage of principal collateralized    10.3                   11.0           −0.66 (0.53)
Covenants
  Profitability to be maintained          3.5                    3.7            −0.19 (0.85)
  Liquidity to be maintained              8.1                    6.0            1.21 (0.26)
  Leverage to be maintained b             4.7                    5.7            0.40 (0.70)
Total of covenant recommendations         16.4                   15.3           0.35 (0.73)
Total of collateral and covenants         26.6                   26.3           0.09 (0.93)

Within Did-Not-Deny Decision
                                          Negative Character (n = 8)   Positive Character (n = 14)   t-Statistic (p)
Percentage of principal collateralized    10.3                         10.9                          −0.66 (0.53)
Covenants
  Profitability to be maintained          3.5                          3.1                           0.64 (0.53)
  Liquidity to be maintained              8.1                          8.7                           −0.53 (0.60)
  Leverage to be maintained b             4.7                          6.1                           0.96 (0.35)
Total of covenant recommendations         16.4                         17.9                          −0.72 (0.48)
Total of collateral and covenants         26.6                         28.8                          −0.85 (0.40)

a As in Rosman and Bedard (1999), each of the four scales (one for collateral and three for covenants) consisted of twelve responses. Each response represented a range of percentages, for example 10–20% of assets collateralized, but the ranges of percentages differed among the four scales. For all four scales, response 1 indicated 0% and 12 indicated the maximum percentage. Thus, the maximum total score of collateral and covenants possible is 48 (4 × 12).
b These scores have been converted as described in Note 4.

Table 4. Measures of Processing Effort.

                                                Negative Character   Positive Character
Panel A: Total minutes on task
  Mean                                          18.3                 21.3
  Standard deviation                            14.3                 7.5
  Range                                         6.3–52.6             10.7–35.2
  H0: variances are equal
    F                                           3.64
    Prob > F                                    0.032
Panel B: Total visits to information screens
  Mean                                          20.1                 21.2
  Standard deviation                            8.4                  10.5
  Range                                         7–33                 9–47
  H0: variances are equal
    F                                           1.55
    Prob > F                                    0.490

DISCUSSION AND CONCLUSIONS

In an experiment where loan officers used process-tracing technology called Search Monitor to evaluate a commercial loan application, two results were found that help researchers and practitioners more fully understand the role of source credibility in affecting loan officers’ decision behavior. First, loan officers dealing with negative-character borrowers were less likely to approve loans than those in the positive character condition (H1); and second, they did not compensate for negative source credibility by structuring loans more restrictively (H2). These results suggest that loan officers tend to deny loans rather than compensate for negative character in loan structure. They did so even though the manipulation check and results for variance of processing time in the negative condition (H3) showed that they were aware of and sensitive to character issues.

However, a large number of loan officers in the negative condition (8 out of 11) did not deny the loan. How then did those who did not deny the loan in the negative condition react to negative source credibility? Combining the results for H1 and H2 with the additional analyses leads to the following conclusion: a minority of loan officers react to negative source credibility, and they do so by denying loans, while the majority do not react in terms of the final decisions to approve a loan or to structure it restrictively. In short, proportionally few loan officers reacted to negative source credibility, but when they did, they denied loans rather than accept the loan and handle their concerns with loan structure.

In hindsight, these results mirror the reaction of the stock analyst community to Enron. Those analysts who doubted Enron a year before its bankruptcy were few and far between, but they did so by using their assessment of source credibility as the lens through which to analyze the numbers. Enron’s management was notorious for dealing arrogantly with analysts and being unable to produce financial information. This created an environment of distrust in which patterns of transactions that were questionable could be pieced together. The advice of one analyst who sold Enron stock short was simple: “Test what a company says; don’t take it at face value.” In other words, it is necessary to assess the credibility of the source of the information in order to be able to understand the information itself (Bailey, 2001, p. F1).

As is true of experimental research, the ability to generalize results both to other tasks and other financial statement users (in commercial lending and elsewhere) is limited. In particular, although the indicators of character used in this experiment have been validated in other research (Beaulieu, 1994, 1996), subtle changes in the apparent financial strength of firms, task or context may encourage financial statement users to select other signals of source credibility. Other sources of credibility, especially external audits, may become relatively more or less important, depending upon task and context. For example, concerns about accounting for intangible assets may upset the current balance of users’ reliance upon source credibility vs.
credibility derived from audits. Our objective is to encourage thought and research about this balance, and about the type of credibility information that different users employ.

NOTES

1. Hirst et al. (1999) did not explain to participants how forecast accuracy was calculated.

2. An example of a presentation that encourages caution is writing off all bad debts in a single period, making it difficult to chart profitability (Ruth, 1987).

3. We do not examine pricing, that is, charging interest sufficiently above prime rates to accommodate even the worst credit risks. It is difficult for loan officers in the United States to price-protect themselves, because the commercial lending market is very competitive and there is as little as a two-point spread separating prime from high-risk borrowers (Emmanuel, 1989).

4. Consistent with Rosman and Bedard (1999), collateral was represented to the lenders on a 12-point scale, which ranged from “0%” to “more than 100%” in 10% increments. Profitability ranged from “0%” to “more than 50%” of the ratio of net income to average equity, identified in 5% increments. Liquidity ranged from “0%” to “more than 150%” of the ratio of cash flows to fixed cash commitments, in 15% increments. Leverage ranged from “0%” to “more than 70%” of the ratio of total liabilities to equity, in 7% increments. The upper bounds differ due to variation in the normal range of these ratios. The leverage covenant was converted to a revised measure (i.e. 13 − x, where “x” is the value selected by the participant) so that the direction of each scale was similar.

5. In contrast, Mather (1999) asked subjects to make judgments as to the number of covenants they would seek and how tightly they would be imposed. However, the nature of the covenants was not specified.

6. A potential concern regarding the experiment is that some participants may not have seen all of the character information.
As explained in Table 1, two facts in each condition of the experiment were viewed only if selected. If a number of participants did not select the additional screen about the CFO, the strength of the character manipulation would not have been consistent. Ten of the 11 participants in the negative condition and 13 of 14 in the positive condition accessed the optional CFO information. In total, 23 of 25 participants investigated the CFO, evidence that the character manipulation was consistent across conditions, and that character and source credibility were important to the participants. Both participants who did not access the additional character information, one in the negative condition and one in the positive condition, approved the loan.

ACKNOWLEDGMENTS

The authors thank Jean Bedard, Karla Johnstone, Marlys Lipe, Inshik Seol, Kathy Wilkicki and two anonymous reviewers.

REFERENCES

Bailey, S. (2001). Right on the money. The Boston Globe (December 5th), F1.
Beach, L. R., Mitchell, T., Deaton, M., & Prothero, J. (1978). Information relevance, content and source credibility in the revision of opinions. Organizational Behavior and Human Performance, 21, 1–16.
Beaulieu, P. (1994). Commercial lenders’ use of accounting information in interaction with source credibility. Contemporary Accounting Research, 10(Spring), 557–585.
Beaulieu, P. (1996). A note on the role of memory in commercial loan officers’ use of accounting and character information. Accounting, Organizations and Society, 21(August), 515–528.
Beaulieu, P. (2001). The effects of judgments of new clients’ integrity upon risk judgments, audit evidence, and fees. Auditing: A Journal of Practice & Theory (Fall), 85–99.
Biggs, S., Rosman, A., & Sergenian, G. (1993). Methodological issues in judgment and decisionmaking research: Concurrent verbal protocol validity and simultaneous trace of process. Journal of Behavioral Decision Making, 6, 187–206.
Brucks, M. (1988). Search monitor: An approach for computer-controlled experiments involving consumer information search. Journal of Consumer Research, 15, 117–121.
Coleman, D., & Irving, G. (1997). The influence of source credibility attributions on expectancy theory predictions of organizational choice. Canadian Journal of Behavioural Science, 29(April), 122–131.
Danos, P., Holt, D., & Imhoff, E. (1989). The use of accounting information in bank lending decisions. Accounting, Organizations and Society, 14, 235–246.
Eisenreich, D. (1981). Credit analysis: Tying it all together – Part I. Journal of Commercial Bank Lending (December), 2–13.
Emmanuel, C. (1989). Limiting exposure to fraudulent financial reporting. The Journal of Commercial Bank Lending (September), 16–27.
Gotlieb, J., & Sarel, D. (1991). Comparative advertising effectiveness: The role of involvement and source credibility. Journal of Advertising, 20(1), 38–45.
Grewal, D., Gotlieb, J., & Marmorstein, H. (1994). The moderating effects of message framing and source credibility on the perceived price-risk relationship. Journal of Consumer Research, 21(June), 145–153.
Hirst, D. E., Koonce, L., & Miller, J. (1999). The joint effect of management’s forecast accuracy and the form of its financial forecasts on investor judgment. Journal of Accounting Research, 37, 101–123.
Kelley, H. (1972). Attribution in social interaction. Morristown, NJ: General Learning Press.
Korem, D. (1997). The art of profiling: Reading people right the first time. Richardson, TX: International Focus Press.
Maines, L. (1990). The effect of forecast redundancy on judgments of a consensus forecast’s expected accuracy. Journal of Accounting Research, 28(Suppl.), 29–47.
Mather, P. (1999). Financial covenants and related contracting processes in the Australian private debt market: An experimental study. Accounting and Business Research, 30(1), 29–42.
McDonald, J., & McKinley, J. (1981). Corporate banking: A practical approach to lending. Washington, DC: American Bankers Association.
Oldham, J. (1998). The “killer” character component. The Secured Lender, 54(November/December), 62–66.
Pace, E., & Simonson, D. (1977). The four hurdles of lending. The Journal of Commercial Bank Lending (March), 10–15.
Rosman, A., & Bedard, J. (1999). Lenders’ strategy selection in loan structure decisions. Journal of Business Research, 83–94.
Ruth, G. (1987). Commercial lending. Washington, DC: American Bankers Association.
Shaub, M. (1996). Trust and suspicion: The effects of situational and dispositional factors on auditors’ trust of clients. Behavioral Research in Accounting, 8, 154–174.
Stephens, R. (1980). Uses of financial information in bank lending decisions. Ann Arbor, MI: UMI Research Press.
Watts, R., & Zimmerman, J. (1986). Positive accounting theory. Englewood Cliffs, NJ: Prentice-Hall.

EARNINGS MANAGEMENT AND FRAMING: THE SPECIFIC CASE OF OBSOLETE INVENTORY

Marybeth M. Murphy and Joanne P. Healy

ABSTRACT

Recent events have shown that earnings management is a significant problem in the business world and that the culture in place in many organizations may encourage managers to manipulate earnings. While prior research has shown that earnings management exists at the corporate level, it has not examined whether managers at the divisional level are motivated to manage earnings. The purpose of this study is to examine whether divisional managers will be more inclined to manage earnings in order to maximize personal wealth. The secondary research objective is to examine whether the information frame will impact discretionary management accounting decisions. Members of the Institute of Management Accountants participated in an earnings management study in which two conditions were manipulated. First, the annual compensation of subjects was contingent on whether target income was met or not met.
Second, information about a potentially obsolete inventory item was framed as either positive or negative. Subjects were asked the likelihood they would write off the potentially obsolete inventory. Research findings support the earnings management hypothesis and indicate that managers are less likely to write off obsolete inventory when their compensation is impacted by the write-off. Study results also reveal that the manner in which the inventory information is framed may affect managers’ write-off decision. These results are important as they may indicate that earnings management is more pervasive throughout the organization than previously shown.

Advances in Accounting Behavioral Research, Volume 6, 95–119
Copyright © 2003 by Elsevier Ltd.
All rights of reproduction in any form reserved
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06005-8

1. INTRODUCTION

Arthur Levitt, while chair of the Securities and Exchange Commission (SEC), announced a focus on firms that manage earnings (Levitt, 1998). He unfolded an action plan to address earnings management. Initiatives included better accounting practices, standards and interpretative guidelines, stricter SEC focus on earnings management, a review of audit practices, and a call for a cultural change in the business world regarding the acceptance of earnings manipulations. While the SEC can address most of these concerns with better standards and practices, changing the culture of business is more complex. It involves changing the behavior of individuals. Research needs to be conducted that addresses why individuals manage earnings. Such research is important to future accounting practices.

The purpose of this study was threefold. First, earnings management was experimentally examined in a managerial accounting setting.
Previous empirical research has examined earnings management at the corporate level indirectly through the analysis of financial results.1 Researchers typically study discretionary management decisions (e.g. the write-down of impaired assets) via publicly available information and infer whether earnings management has occurred based on a comparison of actual financial results to some expectation (Rees et al., 1996; Zucca & Campbell, 1992). Rather than taking this approach in identifying earnings management behavior, this study behaviorally examines whether bonus plans influence managers’ decisions.

The second purpose of the study was to investigate earnings management at the divisional level rather than the overall corporate view, looking at what occurs within the firm.2 A survey by Buck Consultants of Fortune 1000 companies found that 61% of U.S. companies offer variable compensation plans below the executive level, and another 27% are considering them (Wilson, 2001). This increase in bonus-type plans creates greater incentive for earnings management. Earnings management occurs at the corporate level due, in part, to managers’ efforts to achieve incentive-compensation-based targets (Watts & Zimmerman, 1978). Schipper (1989) states, “Clearly, compensation schemes and divisional managers’ private information create a potential incentive to manipulate internal managerial accounting reports.” If performance of managers at the lower levels of the firm is also measured based on these types of targets, then the possibility exists that earnings management could occur at these levels. Managers could use various means to manipulate earnings, from writing off low-value inventory items to controlling the timing of shipments to customers. The outcome of some of these methods could be “buried” in the results of normal operations and therefore might not be obvious at the corporate level.
Alternatively, the consolidation of this manipulated divisional income could result in significantly greater earnings management at the corporate level than previously estimated. This division-level earnings management could be a potential intervening variable, which has led to conflicting results in at least one published earnings management study.3

The third purpose was to examine the effects of framing on earnings management. Subjects were presented with information pertinent to a discretionary managerial decision from both a negative and positive viewpoint. Kahneman and Tversky (1979) theorize that the way information is framed can impact decision-making. This study looks at the potential impact of the information frame on the decision to write off inventory.

The results support both earnings management and framing hypotheses. Findings suggest that management accountants are more apt to write off inventory when: (1) their personal wealth is unaffected; and (2) information is framed negatively. An important contribution of this research is the fact that information framing can have an impact on the earnings management decision. The probability of writing off inventory was higher, although not significantly so, for participants with negatively framed information, even though their personal wealth decreased, than for those with positively framed information who were not eligible for a bonus. The management accounting implication of these results is that managers’ decisions could be influenced by the way information is presented.

This paper is organized in the following manner. Background and hypotheses are developed and presented in Section 2. The research design and methodologies used to test the hypotheses are presented in Section 3. Results are shown in Section 4 and finally, Section 5 presents contributions and implications for further study.

2. THEORY AND HYPOTHESES DEVELOPMENT

Managers are faced with many different types of risky decisions each day.
Many of these decisions are made at the discretion of management with potentially far reaching implications. They impact not only management and those internal to the firm, but also potentially affect the wealth of the shareholders, since many of these decisions affect a firm’s cash flows and reported net income. The following hypotheses examine the decision making behavior of managers.

2.1. Hypothesis 1 – Earnings Management (EM) Hypothesis

Previous empirical research suggests that an incentive exists for managers to manipulate earnings to achieve personal goals.4 Healy (1985) theorizes that managers make discretionary accounting decisions that maximize the value of their bonuses. As shown in Fig. 1, Healy hypothesizes that if income is above targeted levels, management decisions will reflect an income maximizing strategy and no discretionary accruals will be booked, but if income is below targeted levels, managers will take income decreasing actions. Managers may elect income decreasing accruals when their firm’s income is either above an upper bound or below a lower threshold. In these situations, the reduction of income has little or no effect on managers’ compensation in the current year. Managers would take no action if income were at or above the desired level (but below the upper bound) since this would negatively impact their bonus. Healy (1985) provides empirical support for this theory.

While previous research has tested earnings management indirectly through the analysis of financial data, little research has directly tested the behavior of managers (Burgstahler & Dichev, 1997; Cahan et al., 1997; Wu, 1997). Based on Healy’s bonus maximization theory as a basis for predicting earnings management, the following hypothesis is set forth:

H1. If earnings before discretionary accruals are less than the lower threshold, the manager is more likely to take income-decreasing actions than when earnings are just above the lower threshold.

Fig. 1. Predicted Outcomes Based on Bonus Maximization Theory. Note: Adapted from Healy (1985).

2.2. Hypothesis 2 – Framing Hypothesis

Prospect theory (Kahneman & Tversky, 1979) proposes that information presentation impacts the editing process involved in decision making. Subtle changes in the wording of facts of a situation can alter an individual’s reference point (the point at which a decision is made), and ultimately their final decision. For example, stating probabilities as a 25% chance of gain (positive frame) versus a 75% chance of loss (negative frame) has been found to affect decision-making (Kahneman & Tversky, 1984). This framing of information can directly impact the decision by altering the context or frame of reference in a way that is irrelevant, sometimes leading to sub-optimal decisions.

Some framing research in accounting has occurred in auditing. Shields et al. (1987) examined the effects of framing on an auditor’s uncertainty judgments of account valuation. The sample space for accounts was framed as either book value misstatements or audit values. They found no effect of the frame on the auditor’s judgments of account accuracy. However, Ayers and Kaplan (1993) found auditors exhibited confirming tendencies when assigned a misstatement (non-misstatement) frame by selecting more misstatement (non-misstatement) cues as relevant to explaining financial statement ratios. Beeler and Hunton (2002) found evidence that the existence of non-audit revenues creates a predecisional distortion of client related information, thereby suggesting a potential impairment of independence.

Framing research has also examined managerial accounting issues.
Lipe (1993) studied framing in an analysis of the variance investigation decision and the subsequent performance evaluation of the investigation manager. She found that when investigation expenditures were framed as a cost, managers were evaluated more favorably than when the same expenditures were framed as a loss. In another study, Rutledge (1995) examined the interaction between recency effects and framing. He found that recency effects might be tempered by the framing of decision relevant information.

The above research indicates that framing may impact accounting decisions. The way managers perceive information may influence their propensity to manage earnings, leading to the following hypothesis:

H2. Managers will be more likely to take income-decreasing action when relevant decision information is framed in a negative manner than when that information is framed in a positive manner.

Presentation framing may also impact the decision to write off an inventory item. This study uses a potentially obsolete inventory item to operationalize the accounting decision. If information regarding an inventory item is presented in a positive manner, the manager is expected to be less likely to classify the item as obsolete and write it off. Conversely, when information about the item is presented from a negative viewpoint, the item is more likely to appear obsolete and be written off.

2.3. Hypothesis 3 – Interaction of Income Level and Framing

When considered together, H1 and H2 offer some interesting potential outcomes. Figure 2 outlines the four possible combinations. At one end of the spectrum, if income is less than the target and the frame is negative (Treatment D), the highest level of income decreasing actions would be predicted. Both earnings management and framing theories would suggest that managers would take actions to decrease income.
If income is already below the bonus level, writing off an obsolete inventory item and decreasing income further would have no impact on the employee’s compensation. A negative frame of the write-off item would suggest to managers that the likelihood of future sale is low.

H3a. Managers will be most likely to take income-decreasing action when relevant information is framed in a negative manner and when earnings before discretionary accruals are just below the lower threshold.

Conversely, if income is greater than the target and the frame is positive (Treatment A), no income decreasing actions are likely. Bonus maximization theory would predict that managers do not desire to decrease income below the level necessary to receive a bonus. Prospect theory would suggest that a positive frame about a potential write-off would influence managers to view the item in question more favorably, thereby making them less likely to consider the item obsolete and write it off.

H3b. Managers will be least likely to take income-decreasing action when relevant information is framed in a positive manner and when earnings before discretionary accruals are just above the lower threshold.

The actions taken for the remaining two treatments (Treatment B – Income greater than target, negative frame and Treatment C – Income less than target, positive frame) are not as easy to predict. In each case, the two variables involved would predict that opposite behavior would occur. The outcome of the experiment depends on which variable has the dominant effect. If neither variable dominated, treatment groups should fall equally into groups taking action and those not. Analyses of results will provide information as to which of these two effects is greater.

Fig. 2. Predicted Interactions Between Earnings Management and Framing Theories.

H3c. There will be no difference in income decreasing actions of managers when relevant information is framed in a positive manner and earnings before discretionary accruals are just below the lower threshold and when relevant information is framed in a negative manner and earnings are just above the lower threshold.

3. RESEARCH DESIGN AND METHODOLOGIES

3.1. Subjects

The intent of the study was to determine if earnings management would occur in a bonus situation and whether the frame would influence a manager’s decision process. Members of the Institute of Management Accountants (IMA) were chosen as subjects, since these individuals are typically in managerial positions that involve decisions such as writing off obsolete inventory. A randomly selected sample of 1000 actively employed members was obtained from the IMA.5

3.2. Instrument Design

To test earnings management at lower levels, the experiment was designed to examine the behavior of divisional managers. This was accomplished by creating a setting at the plant level of the firm. An experimental manipulation, which consisted of a short scenario with questions (shown in Appendix) and a demographic questionnaire, was used to conduct the experiment. The experimental manipulation was developed using an adaptation of Puto’s (1987) model of the buying decision process (Fig. 3). The two boxes on the right side of the model represent those items that are manipulated in the experiment, while the information on the left side of the model is held constant.

3.2.1. EM Hypothesis Operationalization

In the experimental manipulation, the EM hypothesis was operationalized by presenting managers with information concerning their plant’s position with respect to a goal – budgeted operating income. Subjects were told that their bonus compensation was dependent upon meeting budgeted income levels. If budgeted income levels were met, managers received a bonus of 0.5% of income; otherwise they received nothing.
This simulates Healy’s lower threshold. To create a positive initial reference point, managers were informed that the latest estimate of operations indicated that budgeted net income would just be met. Net income was set at $1,502,000, just above the threshold, so that an inventory write-off would reduce net income below the threshold, eliminating the manager’s bonus. With the negative initial reference point, a statement was included implying that early results indicated actual income would be lower than budgeted income. Net income was set at $1,400,000, well below the threshold, to remove any possibility that the threshold could be achieved. This operationalization simulated a situation where the manager had the opportunity to reduce current year’s earnings with no impact on personal wealth and improve prospects for subsequent years. The italicized line in the second paragraph of the scenario shown in Appendix indicates where these statements were placed in the survey instrument, with the negative initial reference point shown in brackets. The first independent variable, INCOME, was a result of the manipulation of this initial reference point. Analysis of this variable was conducted as a between-subjects design.

Fig. 3. Theoretical Model Inventory Write-Off Decision.

3.2.2. Selection of the Discretionary Decision

Previous research has hypothesized that many types of discretionary decisions are used to manage earnings at the corporate level. Studies have examined the timing of recognition of extraordinary items (Barnea et al., 1976; Ronen & Sadan, 1975; Walsh et al., 1991), write down of impaired assets (Zucca & Campbell, 1992), the provision for bad debts (McNichols & Wilson, 1988) and non-recurring charges (Elliott & Shaw, 1988). All have found support for earnings management at the corporate level.
Hepworth (1953) suggests that the inventory valuation process can be used as a less obvious method of income smoothing. In a study of business unit managers, Guidry et al. (1999) tested an inventory model of earnings management along with two other previously tested models. Evidence of earnings management was found to be the strongest in the analysis of the inventory reserve account. They suggest that this occurs due to information asymmetry that exists between these managers and upper level management related to inventory valuation.

The decision to write off inventory involves considerable management discretion. Accounting Research Bulletin 43 (FASB, 1992) addresses the inventory write-off in the following manner:

    Thus, in accounting for inventories, a loss should be recognized whenever the utility of goods is impaired by damage, deterioration, obsolescence, changes in price levels, or other causes. The measurement of such losses is accomplished by applying the rule of pricing inventories at cost or market, whichever is lower (Stmt. 5, Para. 8).

This guideline can be difficult to follow in some circumstances. Often, especially for internally manufactured products, no ready market exists. This creates problems for the application of the lower of cost or market rule. In addition, someone must make the determination when damage, deterioration or obsolescence has occurred and to what extent the inventory value has been affected. The effects of these write-offs or reductions in inventory value are often not easily discernible to the user of the financial statements, since they are included as part of the calculation of cost of goods sold. Managers’ decision to write off immaterial amounts has not been previously studied.
These actions are important because if managers across the firm (all trying to achieve income targets) decide to write off small-valued inventory items, the write-offs have the potential to be material in the aggregate. In addition, they are probably the most common method of writing off inventory (Hepworth, 1953). Therefore, immaterial inventory write-offs were selected as the discretionary decision in this study.

To operationalize this inventory write-off, the value of the inventory item involved was set at $15,000. A number of factors were considered when setting the dollar level of the potential discretionary decision. The amount was set at about 4% of inventory, considered immaterial in value. An immaterial value was chosen for a number of reasons. First, if written off, the amount would be buried in cost of goods sold. Therefore it would not be obvious to outsiders, and probably not be detected by auditors. These decisions would be the type described by Hepworth (1953). Second, most managers would be expected to act conservatively and write off the amount. Therefore, differences in decision-making would be due chiefly to bonus implications, information framing, or both. In addition to materiality considerations, if the write-off took place, the amount is large enough to cause the income level to fall below budget expectations for those receiving information that income was above the threshold. For those below the threshold, the write-off would have no impact on bonuses this year.

3.2.3. Expected Payoffs

The independent variables were designed with specific expected payoffs in mind. In the case where income is greater than budget, estimated net income was specifically established at a level at which the write-off of the inventory item would result in net income falling below budgeted levels, thus eliminating the manager’s bonus.
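The payoff logic of the scenario can be sketched in a few lines. The sketch assumes a budgeted net income of $1,500,000, a figure inferred from the $7,500 bonus at 0.5% rather than stated directly in this section, and it assumes the bonus is computed on income after any write-off. The function and constant names are illustrative only.

```python
BONUS_RATE = 0.005            # bonus is 0.5% of plant net income
BUDGETED_INCOME = 1_500_000   # assumption: inferred from a $7,500 bonus at 0.5%
WRITE_OFF = 15_000            # value of the potentially obsolete inventory item


def bonus(net_income: float, write_off: bool) -> float:
    """Bonus is paid only if income after any write-off still meets the budget."""
    income = net_income - (WRITE_OFF if write_off else 0)
    return income * BONUS_RATE if income >= BUDGETED_INCOME else 0.0


# The two INCOME conditions of the experiment, with and without the write-off.
for label, income in [("above budget", 1_502_000), ("below budget", 1_400_000)]:
    for write_off in (False, True):
        amount = bonus(income, write_off)
        print(f"{label}, write-off={write_off}: bonus = ${amount:,.0f}")
```

Under these assumptions only the above-budget manager who does not write off the item receives a bonus, so that is the one cell in which a write-off carries a personal cost.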
Where income is less than budget, since budgeted net income had not been achieved, the write-off of inventory would have no impact on the bonus. Figure 4 indicates the values of these expected payoffs in the year depicted in the scenario. These payoffs are based on the bonus equal to 0.5% of plant net income ($7,500 if budgeted net income is met). This dollar value was selected to approximate bonus compensation for plant controllers.6 The only subjects to receive a bonus in the current year were those with income greater than budget who did not write off the inventory. Since their expected payoff is $7,510 ($1,502,000 at 0.5%), this group had the greatest opportunity cost from writing off the inventory. Based on this payoff, these subjects were expected to be the least likely to write off inventory, in accordance with the bonus maximization theory.

Fig. 4. Bonus Payoff Based on Current Year Inventory Write-Off Decision.

3.2.4. Framing Operationalization

In general, previous framing research suggests that management may consider variables that may be unrelated to the actual decision at hand (Johnson et al., 1991; Kahneman & Tversky, 1984; Lipe, 1993; O’Clock & Devine, 1995; Puto, 1987). As part of the decision to write off inventory, managers in this study were presented with various pieces of information about the current status of inventory. Because framing has been found to affect the decision-making process, it could also play a role in the decision to write off obsolete materials. Each subject was given an inventory statement to review. The second independent variable, INVENTORY, was operationalized as a statement about inventory expressed in either a positive frame or a negative frame. The italicized sentence in the Appendix that is part of the item description indicates the frame. The negative frame is shown in brackets.
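The payoff structure described in Section 3.2.3 can be sketched in a few lines. This is only an illustration of the arithmetic, not part of the experimental instrument: the dollar figures come from the Appendix scenario, while the function and constant names are hypothetical.

```python
# Illustrative sketch of the bonus payoffs in Section 3.2.3.
# Dollar amounts are taken from the Appendix scenario; all names are hypothetical.
BUDGETED_NET_INCOME = 1_500_000  # budget that must be achieved or exceeded
BONUS_PER_THOUSAND = 5           # bonus of 0.5% of net income, kept as an integer rate
WRITE_OFF = 15_000               # value of the inventory item in question
TOTAL_INVENTORY = 350_000        # plant inventory including the item


def bonus(net_income: int) -> float:
    """Bonus is 0.5% of net income when the budget is met, otherwise zero."""
    if net_income >= BUDGETED_NET_INCOME:
        return net_income * BONUS_PER_THOUSAND / 1000
    return 0.0


def payoffs(estimated_net_income: int) -> dict:
    """Bonus with and without the write-off, and the bonus forgone by writing off."""
    keep = bonus(estimated_net_income)
    write_off = bonus(estimated_net_income - WRITE_OFF)
    return {"keep": keep, "write_off": write_off, "opportunity_cost": keep - write_off}


above_budget = payoffs(1_502_000)  # income greater than budget
below_budget = payoffs(1_400_000)  # income less than budget
materiality = WRITE_OFF / TOTAL_INVENTORY

print(above_budget)   # bonus of 7510.0 is lost entirely if the item is written off
print(below_budget)   # no bonus either way
print(f"write-off as share of inventory: {materiality:.1%}")
```

Only the above-budget group gives up anything by writing off the item, which is why that group was expected to be the least likely to do so.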
The manipulation of the inventory information presentation frame was also conducted as a between-subject design.

3.2.5. Statistical Analysis

These two independent variables, INCOME and INVENTORY, result in subjects being assigned to one of four possible treatments shown previously in Fig. 2. The dependent variable selected for this experiment measured the percent likelihood that the subject would recommend the write-off of inventory (PROBWO). A 2 × 2 ANOVA and/or the Kruskal–Wallis Multiple Comparison Tests were utilized to conduct the analyses of the hypotheses.

3.3. Pretesting

Prior to mailing the survey instruments, two pretests were conducted to provide evidence for content validity as well as to improve the experimental task. The first pretest was conducted during the monthly meeting of a local IMA chapter. Comments provided by the participants were incorporated to improve the scenario before the second pretest was undertaken. In general, participants of the initial test found the scenario to be incomplete. Specifically, they requested additional information on the relationship between the value of the write-off and total inventory. Participants also inquired if the parts could be resold as replacement parts. A line was added that indicated that no such market existed. The second pretest was conducted during the monthly meeting of another IMA chapter five months later. Again, all comments were considered and minor grammatical changes were made to the experiment. The responses from these 38 pretest participants have not been included in the final sample.

3.4. Procedure

Dillman’s “Total Design Method” (1978) was employed in the design and mailing of the questionnaires. Each envelope and cover letter was printed with the individual’s name and address to make the request more personal. Questionnaires were numerically coded to determine which subjects had responded to the mailing.
The cover letter indicated that this coding was for mailing purposes only and individual responses would not be associated with names of subjects. Participants were asked to complete the experiment and were provided with a stamped, self-addressed envelope for its return.

4. RESULTS

4.1. Response Rate

Table 1 indicates the number of responses. There were 242 (24.2%) responses from the initial mailing. A second mailing was sent to the non-respondents; an additional 149 questionnaires were returned, increasing the overall response rate to 39.1%. Of the total 391 responses received, 27 were returned either unanswered or incomplete. Another 48 were from non-accountants. The remaining 316 were used for the analyses.

Table 1. Description of Questionnaire Responses.

                         PP     NP     PN     NN    Total
Total respondents        85     86    105    115      391
Returned to sender        0      0      2      3        5
Returned incomplete       6      4      6      6       22
Non accountants          14     10      8     16       48
Total usable             65     72     89     90      316

PP means income greater than target and a positive inventory frame.
NP means income less than target and a positive inventory frame.
PN means income greater than target and a negative inventory frame.
NN means income less than target and a negative inventory frame.

4.2. Test for Non-response Bias

Tests for non-response bias were conducted on the final sample of 316 participants. Mean responses to the participants’ probability of write-off question from the first mailing were compared to those of the second. Kruskal–Wallis tests indicated no significant differences between the two mailings (Prob > χ² = 0.236). t-Tests were conducted for years of experience, number of certifications, firm type, type of degree, and years on the current job. No significant differences were noted.

4.3. Manipulation Checks

The manipulation of the earnings management situation was tested utilizing the response to the question, “Did you achieve the operating budget prior to the inventory write-off decision?” This ensured that the subjects knew the position of estimated net income relative to the budget. Approximately 86% of the respondents answered the manipulation check for the operating budget correctly.7

The success of the manipulation of the inventory frame was confirmed by analyzing the subjects’ response to the following question, “How risky do you feel it is for the inventory to remain on the books?” Subjects were asked to respond on a 7 point Likert-type scale with “Very Risky” and “Not Risky” at opposite ends. A Mann–Whitney Test found significant differences between the mean of the positive (3.88) and negative (3.49) inventory frames at the 5% probability level, indicating that the frame manipulation had succeeded (p = 0.0458).

4.4. Demographics

Table 2 provides overall information about the respondents in this study. The respondents held positions in a fairly diversified number of industries, with 46.9% of respondents employed by manufacturing firms. Subjects employed by service-oriented firms composed the next largest group (12.7%), followed by those from public accounting firms (10.4%). The remaining subjects (30.0%) worked in a variety of environments from banking, retailing, non-profits, consulting, to distribution.

Table 2. Demographic Characteristics of Respondents.

                                        Number of      Percent
                                        Respondents    of Total
Panel A: Responses by industry
  Manufacturing                             148          46.9
  Services                                   40          12.7
  Public accounting                          33          10.4
  Consulting                                  8           2.5
  Retailing                                  12           3.8
  Government                                 11           3.5
  Financial services                          6           1.9
  Banking                                     8           2.5
  Other                                      50          15.8
  Total                                     316         100.0

Panel B: Responses by educational level
  Less than bachelors                         8           2.5
  Bachelors                                 185          58.6
  Masters or above                          122          38.6
  No response to question                     1           0.3
  Total                                     316         100.0

Panel C: Responses by certification
  0 Certifications                          149          47.2
  1 Certifications                          124          39.3
  2 Certifications                           39          12.3
  3 Certifications                            3           0.9
  4 Certifications                            1           0.3
  Total                                     316         100.0

Panel D: Responses by managerial level
  Owner/self-employed                        20           6.3
  Staff/individual contributor               56          17.7
  First level management                     44          13.9
  Mid level management                      100          31.7
  Top level management                       95          30.1
  No response                                 1           0.3
  Total                                     316         100.0

The respondents were well educated: 97.2% held at least a bachelor’s degree, and 38.6% held advanced degrees. More than half the group possessed some form of certification. Thirty-nine percent held one certification, while 13.5% had obtained two or more. The most common certifications were the Certified Public Accountant (CPA) and the Certified Management Accountant (CMA).

Significant accounting experience was a strong component of the subjects’ background, with 16.3 years average experience as a practicing accountant. Sixty-nine percent had 10 or more years experience in accounting related fields, with 4 subjects having 40 or more years of experience. Another 22% had between 5 and 10 years experience. Only 9% of the sample had less than five years experience. This experience was further demonstrated by the average length of time that the subjects held their current position (5.5 years). Thirty-three percent of the subjects had held their current position more than five years. The experience of the sample was also evident by the subjects’ position in the firm.
Sixty-two percent held top or mid-level management positions with titles such as Corporate Controller, Corporate Vice President of Finance and Corporate Financial Officer. Thirty-two percent held positions at the individual contributor or first level management position. Six percent of the subjects were self-employed.

In addition to being experienced in their profession, the subjects were well matched with the study characteristics. Sixty percent of the respondents received bonus compensation as part of their pay. The subjects were also familiar with the decision to write off inventory. Eighty-two percent of the respondents indicated that sometime during their careers they had been involved in the decision to write off inventory, while 66% held positions that currently had input into the write-off decision-making process.

4.5. Overall Results

Table 3 reports the number of respondents, the percent likelihood of inventory write-off, and the standard deviation for each experimental condition.

Table 3. Average Likelihood (Standard Deviation) of Write-Off by Experimental Condition.

                                   Inventory Frame
                           Positive          Negative          Total
Income target met          57.5 (4.12)       67.64 (3.92)      63.1
                           n = 65            n = 72            n = 137
Income target not met      63.13 (3.54)      76.88 (3.50)      69.9
                           n = 89            n = 90            n = 179
Total                      60.5              73.1
                           n = 154           n = 162

When the income target was met, the probability of writing off inventory was 57.5% for the positive inventory frame compared to 67.64% for the negative frame. The average likelihood of write-off when the income target was met was 63.1%. In the condition where income targets were not met, respondents who received the positive inventory frame indicated that there was a 63.13% likelihood that they would write off inventory, whereas those who received the negative frame indicated a 76.88% likelihood. The average likelihood of write-off when the income target was not met was 69.9%. The results can also be viewed from the inventory frame.
For the positive frame, the average likelihood of inventory write-off was 60.5%. For the negative frame, the average likelihood of write-off was 73.1%.

4.6. Tests of the Hypotheses

All hypotheses were tested using a 2 × 2 ANOVA and/or Kruskal–Wallis Multiple Comparison Test. The results of the ANOVA are shown in Table 4.

4.6.1. Earnings Management Hypothesis

H1 predicts that when income is below target, managers would be more likely to take income decreasing discretionary actions to manage earnings. Overall, subjects reported a 67% likelihood that they would write off the inventory. However, significant differences were found between groups. The mean response is 73.1% for those respondents who were presented with a scenario that suggested that the budgeted net income would not be met and only 60.5% for those who would likely meet their profit targets (Table 4). These probabilities are significantly different (p = 0.001). While a high percentage of subjects wrote off the inventory item, there were still significant differences between the income manipulation groups. This result strongly supports H1 and is consistent with previous empirical accounting research that suggests some managers manipulate earnings when they are below budgeted threshold levels (Healy, 1985).

Table 4. ANOVA for Percent Likelihood of Write-Off.

Variable        DF     Sum-Squares     F-Ratio     Prob > F
Income           1        11875.56       10.88       0.0010
Inventory        1         3780.09        3.46       0.0627
Interaction      1          374.19        0.34       0.5582
Error          312         1091.31
Total          315       357089.80

4.6.2. Framing Hypothesis

H2 hypothesizes that the way the information is framed will influence the write-off decision, i.e., that managers presented with negatively framed information about the inventory item will be more likely to write it off than those with positively framed information. Results provide marginal support for this hypothesis (p = 0.0627).
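As a rough consistency check, the F-ratios in Table 4 can be recomputed from the other reported figures. The sketch below rests on one assumption that the source does not state: the Error-row entry (1091.31) is read as the error mean square rather than a sum of squares, even though the column is headed Sum-Squares. The variable and function names are hypothetical.

```python
# Recompute the Table 4 F-ratios from the reported figures.
# Assumption (not stated in the source): the Error-row entry is the error
# MEAN SQUARE, not a sum of squares.
EFFECT_SUM_SQUARES = {
    "Income": 11875.56,
    "Inventory": 3780.09,
    "Interaction": 374.19,
}
EFFECT_DF = 1                 # each effect in a 2 x 2 design has 1 degree of freedom
ERROR_MEAN_SQUARE = 1091.31   # assumed reading of the Error row


def f_ratio(sum_squares: float, df: int, error_mean_square: float) -> float:
    """F = (effect sum of squares / effect df) / error mean square."""
    return (sum_squares / df) / error_mean_square


f_ratios = {
    name: round(f_ratio(ss, EFFECT_DF, ERROR_MEAN_SQUARE), 2)
    for name, ss in EFFECT_SUM_SQUARES.items()
}

for name, value in f_ratios.items():
    print(f"{name:<12} F = {value:.2f}")
```

Under that reading the recomputed values round to the F-ratios reported in Table 4 (10.88, 3.46 and 0.34), which suggests the reported figures are internally consistent.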
Participants who received negatively framed information responded that the likelihood of writing off inventory was 69.9%, while those receiving the positive frame would write off the item only 63.1% of the time. This suggests that information framing may have an impact on discretionary decisions. These results also support previous research showing that framing can play a role in accounting decision making (Johnson et al., 1991; Lipe, 1993; O’Clock & Devine, 1995).

4.6.3. Interaction Hypotheses

First, the data were tested to determine if an interaction existed between the variables. The results of the 2 × 2 ANOVA (Table 4) indicated that no interaction exists (p = 0.5582). While this interaction was not significant, analysis of differences between respondent groups could provide further insight. To conduct this analysis, the responses were divided into four groups based on the experimental condition (refer to Table 5). The comparisons of the resulting values of these groups create the foundation for analysis of H3a to H3c.

H3a predicts managers will be most likely to take income-decreasing action when relevant information is framed in a negative manner and when earnings before discretionary accruals are less than the lower threshold. The mean response for Cell D was 76.88% (Table 5), the highest of the four cells. A Kruskal–Wallis Multiple Comparison Test was run to determine if the results of Cell D were significantly different from each of the other cells in Table 5. The percent likelihood of inventory write-off in Cell D was larger than and significantly different from each of the other cells. This indicates that the cumulative effects of INCOME and INVENTORY were responsible for the level of Cell D. H3a is accepted.

H3b predicts that managers will be least likely to take income-decreasing action when relevant information is framed in a positive manner and when earnings before discretionary accruals are just above the lower threshold (Cell A).
The mean value of Cell A is 57.5, the lowest of all cells. Based on the Kruskal–Wallis test, Cell A is significantly different from Cell D and marginally significantly different from Cell B. However, there is no significant difference between Cell A and Cell C.

An interesting result is the difference between Cell B and Cell C (H3c). Respondents in Cell B would lose their bonus if they wrote off inventory, while those in Cell C were below the threshold, so their personal wealth would be unaffected by an inventory write-off. However, the probability of a write-off is 67.64% in Cell B and 63.13% in Cell C. The direction of the difference seems to suggest that when information is framed negatively, managers are more likely to write off inventory even when there would be an adverse effect on their income, than when information is framed positively and their income would be unaffected by their decision. However, since the Kruskal–Wallis z-value comparing these two groups was not significant (z = 1.2152), H3c cannot be supported.

Table 5. Kruskal–Wallis Multiple-Comparison z-Value Test.

                                             (A)         (B)         (C)         (D)
Positive Inventory, Positive Income (A)    0.0000      1.9465**    0.8606      4.0379*
Negative Inventory, Positive Income (B)    1.9465      0.0000      1.2152      2.0507*
Positive Inventory, Negative Income (C)    0.8606      1.2152      0.0000      3.4575*
Negative Inventory, Negative Income (D)    4.0379*     2.0507*     3.4575*     0.0000

* Medians significantly different at 0.05 level if z-value > 1.96.
** Medians marginally significantly different at 0.10 level if z-value is > 1.645.

4.7. Further Analyses of Results

Further tests of the association between INCOME and INVENTORY and PROBWO were conducted using OLS regression analysis with bonus pay, firm type, number of certifications, and years of experience as control variables. The results were similar to the ANOVA previously presented: INCOME remained significant (p = 0.002), and INVENTORY remained marginally significant (p = 0.061). Years of experience was found to have a positive association with PROBWO and was the only control variable to approach significance (p = 0.078). This may indicate that more experienced managers had a greater propensity to write off inventory and decrease earnings.

4.8. Discussion

Previous earnings management research has been conducted empirically at the corporate level, inferring managerial actions from financial data. This study examines the theory in a behavioral setting at a divisional or plant level. As predicted by the results of empirical research, earnings management appeared to have occurred. Plant managers in this scenario were more apt to write off inventory if its negative impact on income did not unfavorably affect their bonus. These subjects had already missed their income targets, and therefore their bonuses, so they risked little from a personal financial perspective by the write-off of the item. Subjects who were above budgeted levels (and still had a bonus at stake) were more cautious about their willingness to write off the inventory and therefore miss their income target and risk losing their bonus. The results strongly support the empirical research that suggests that some managers manipulate earnings by expensing costs in fiscal years where net income expectations are not realized.

The impact of the inventory frame creates the potential for interesting research.
Many might have suggested that the results of the test of earnings management would be a “given.” However, the probability of a write-off is actually higher when the information is framed negatively and the executive would lose his/her bonus by taking the write-off (mean = 67.64), than when the information is framed positively and there is no chance the executive would receive a bonus (mean = 63.13). While the difference in results is not statistically significant and could be the result of random fluctuation, it does have interesting behavioral implications. The differences in information frames were designed to be subtle and not necessarily meant to mislead the reader. If these small changes in information presentation could yield differences in decision making in a situation where outcomes were more or less expected, how might the information frame affect other, less obvious decisions? This suggests that the impact of framing could possibly be important in other accounting decisions as well. Managers receive and communicate information about decisions every day. If the frame impacts the decision making in such a seemingly predictable decision as in the earnings management situation, it has the potential to impact other decisions. Are managers aware of the potential impact of framing on their decision making? What should they be alert to in the decision-making analysis?

5. CONTRIBUTIONS AND IMPLICATIONS FOR FURTHER STUDY

This study makes some interesting contributions to the body of accounting research. First, it is a behavioral test of Healy’s bonus maximization theory. Previous work has examined the theory indirectly by empirically testing the relationship between various operating results. This study directly investigates the decision making of practicing accountants, many of whom are in a position to make decisions on the write-off of inventory.
The scenario places them in a position to manage earnings by the write-off of inventory. Results show support for this theory.

Second, the study examines the earnings management question in a managerial accounting setting at an operational level. Most previous studies have viewed earnings management from a company wide perspective. But earnings management could occur throughout the firm. Lower level managers could be manipulating earnings to achieve their established goals. This could be accomplished with or without their superiors’ knowledge and in line with or in opposition to firm goals. Earnings management at various levels of the firm could lead to overall corporate earnings management, but could also be performed in isolation. No previous research has tested this possibility. Additional research should be conducted in this area to confirm earnings management at the divisional level and to determine what motivates managers to make these types of decisions. Perhaps the segment reporting requirements will provide additional opportunities to explore earnings management at divisional levels.

Another interesting outcome was the effect of the frame on the write-off decision. It is interesting to note how something as simple and as indirect as the frame of the information presentation (inventory frame) could have a measurable, if only marginally significant, impact on results. These results raise the question of what other behavioral factors could influence managers’ decisions to manage earnings and provide a basis for future research into the effects of framing on discretionary managerial decisions.

NOTES

1. Burgstahler and Dichev (1997), Wu (1997), Cahan et al. (1997), Rees et al. (1996), Healy (1996), Amir and Livnat (1996), Bernard and Skinner (1996), Dechow et al. (1996), Dechow et al. (1995) are just a few of the most recent examples.
2. Schipper (1989) states that although there is a potential incentive for earnings management at the divisional level, research in that area is “sparse to non-existent.”
3. White (1970) found no evidence of earnings management.
4. Watts and Zimmerman (1978) suggest that political costs and debt violations also affect managers’ motivations to manipulate earnings. These factors would most likely impact earnings management at the corporate level. The current research examines earnings management at the plant level, and does not explicitly test for these other factors.
5. Student members or members reporting their employment status as retired were excluded from the population.
6. Based on 1998 salaries and total compensation reported by Schroeder and Reichardt (1999).
7. ANOVA tests were conducted on the sample excluding those individuals who answered this question incorrectly. Results did not differ greatly from the entire test sample. The p-value for the variable INCOME was p = 0.0005 for this group and 0.0010 for the full sample; for INVENTORY p = 0.0671 and 0.0627 respectively.

ACKNOWLEDGMENTS

We would especially like to thank Elizabeth Cole, Tim Fogarty, Pete Poznanski, Ray Stephens and Linda Zucca for their helpful comments and assistance, and we are grateful for the support received from the Institute of Management Accountants. We gratefully acknowledge the financial support received from the Research Council of Kent State University.

REFERENCES

Amir, E., & Livnat, J. (1996). Multiperiod analysis of adoption motives: The case of SFAS No. 106. The Accounting Review, 71(4), 539–553.
Ayers, S., & Kaplan, S. E. (1993). An examination of the effect of hypothesis framing on auditors’ information choices in an analytical task. Abacus, 29(2), 113–131.
Barnea, A., Ronen, J., & Sadan, S. (1976). Classificatory smoothing of income with extraordinary items. The Accounting Review, 52(2), 110–122.
Beeler, J. D., & Hunton, J. E. (2002). Contingent economic rents: Insidious threats to audit independence. Advances in Accounting Behavioral Research, 5, 21–50.
Bernard, V. L., & Skinner, D. J. (1996). What motivates managers’ choice of discretionary accruals? Journal of Accounting and Economics, 22(1–3), 313–325.
Burgstahler, D., & Dichev, I. (1997). Earnings management to avoid earnings decreases and losses. Journal of Accounting and Economics, 24(1), 99–126.
Cahan, S. F., Chavis, B. M., & Elemendorf, R. G. (1997). Earnings management of chemical firms in response to political costs from environmental legislation. Journal of Accounting, Auditing & Finance, 12(1), 37–65.
Dechow, P. M., Sloan, R. G., & Sweeney, A. P. (1995). Detecting earnings management. The Accounting Review, 70(2), 193–225.
Dechow, P. M., Sloan, R. G., & Sweeney, A. P. (1996). Causes and consequences of earnings manipulations: An analysis of firms subject to enforcement actions by the SEC. Contemporary Accounting Research, 13(1), 1–36.
Dillman, D. A. (1978). Mail and telephone surveys – The total design method. New York, NY: Wiley.
Elliott, J. A., & Shaw, W. H. (1988). Write offs as accounting procedures to manage earnings. Journal of Accounting Research, 26(Suppl.), 91–119.
Financial Accounting Standards Board (1992). Original pronouncements – accounting standards – Volume II. Norwalk, CT.
Guidry, F., Leone, A. J., & Rock, S. (1999). Earnings-based bonus plans and earnings management by business unit managers. Journal of Accounting and Economics, 26(1–3), 113–142.
Healy, P. M. (1985). The effect of bonus schemes on accounting decisions. Journal of Accounting & Economics, 7(1–3), 85–107.
Healy, P. M. (1996). Discussion of a market-based evaluation of discretionary accrual models. Journal of Accounting Research, 34(3), 107–115.
Hepworth, S. R. (1953). Smoothing periodic income. The Accounting Review (January), 32–39.
Johnson, P. E., Jamal, K., & Berryman, R. G. (1991). Effects of framing on auditor decisions. Organizational Behavior and Human Decision Processes, 53(2), 75–105.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.
Kahneman, D., & Tversky, A. (1984). Choices, values and frames. American Psychologist (April), 341–350.
Levitt, A. (1998). The numbers game (September 28th). New York, NY: NYU Center for Law and Business.
Lipe, M. G. (1993). Analyzing the variance investigation decision: The effects of outcomes, mental accounting and framing. The Accounting Review, 68(4), 748–764.
McNichols, M., & Wilson, G. P. (1988). Evidence of earnings management from the provision for bad debts. Journal of Accounting Research, 26(Suppl.), 1–31.
O’Clock, P., & Devine, K. (1995). An investigation of framing and firm size on the auditor’s going concern decision. Accounting and Business Research, 25(99), 197–201.
Puto, C. P. (1987). The framing of buying decisions. Journal of Consumer Research, 14(3), 301–315.
Rees, L., Gill, S., & Gore, R. (1996). An investigation of asset write-downs and concurrent abnormal accruals. Journal of Accounting Research, 34(3), 157–169.
Ronen, J., & Sadan, S. (1975). Classificatory smoothing: Alternative income models. Journal of Accounting Research, 3(4), 133–149.
Rutledge, R. W. (1995). The ability to moderate recency effects through framing of management accounting information. Journal of Mathematical Economics, 11(2), 27–40.
Schipper, K. (1989). Commentary on earnings management. Accounting Horizons, 3(4), 91–102.
Schroeder, D., & Reichardt, K. (1999). IMA 98 Salary Guide. Strategic Finance, 8(20), 28–41.
Shields, M. D., Solomon, I., & Waller, W. S. (1987). Effects of alternative sample space representation on the accuracy of auditor’s uncertainty judgments. Accounting, Organizations and Society, 12(4), 375–385.
Walsh, P., Craig, R., & Clarke, F. (1991). Big bath accounting using extraordinary items adjustments: Australian empirical evidence. Journal of Business Finance and Accounting, 18(2), 173–189.
Watts, R., & Zimmerman, J. (1978). Towards a positive theory of the determination of accounting standards. Accounting Review, 53(1), 112–134.
White, G. E. (1970). Discretionary accounting decisions and income normalization. Journal of Accounting Research, 8(2), 260–273.
Wilson, T. B. (2001). What’s hot and what’s not: Key trends in total compensation. Compensation & Benefits Management, 17(2), 45–50.
Wu, Y. W. (1997). Management buyouts and earnings management. Journal of Accounting, Auditing, and Finance, 12(4), 373–389.
Zucca, L. J., & Campbell, D. R. (1992). A closer look at discretionary write-downs of impaired assets. Accounting Horizons, 6(3), 30–41.

APPENDIX

Scenario

You are the plant accountant for a Cleveland area plant of the Spring Wire Company. The responsibilities of your position include the processing of payroll, payments to vendors (accounts payable), inventory accounting, preparation of budget and estimates, and analysis of actual plant operating results. All members of the plant staff (including yourself) are given a bonus contingent on achieving or exceeding the plant’s operating budgeted net income of $1,500,000. If the budgeted operating income is achieved, 0.5% of the current year’s net income will be paid to you in the form of a bonus. (e.g. if net income is $1,510,000, your bonus would be $7,550.)

It is January 1, and you have received estimated net income for the year of $1,502,000 [$1,400,000]. In past years, these early results have proved to be accurate, with few unexpected adjustments made after this date. You have one last chance to review the status of your inventory that was taken on December 31st to determine if any potentially obsolete inventory items should be written off.
You are presented with the following information from the Inventory Earnings Management and Framing 119 and Materials Manager (also a staff manager) concerning the inventory item in question. Part Number PX23415 is sold to computer manufacturers. It has a current inventory of 5,000 units on hand with a total current inventory value of $15,000. Your plant’s total inventory including Part Number PX23415 is $350,000. The demand for this product is 15% of last year’s demand [Industry sales of this product have demonstrated an 85% decline in both volume and dollar amounts in the last year]. The inventory turnover ratio for this item has declined substantially from the prior year. Of the original market for the product, about 20% of your competitors remain [Approximately 80% of your competitors in the market for this product have ceased production and sales]. No sales occurred during the months of November or December for your company. Because of the nature of this product, the potential for this part to be sold in the replacement parts market does not exist. (1) Please indicate the percent probability in your opinion that this inventory will be sold. (0–100%) (2) Please indicate the percent probability that you would write off Part Number PX23415 from inventory. (0–100%) For questions 3 through 5, place an “X” on the box that best indicates your opinion. (3) How risky do you feel it is for the inventory to remain on the books? Very Risky Not Risky (1) (2) (3) (4) (5) (6) (7) (4) Indicate on the scale below your perception of what is occurring in the marketplace to the demand for this part? Significantly Decreased No Change (1) (2) (3) (4) (5) (6) (7) (5) Would you consider the bonus an important component of your income? Very Important Not Important (1) (2) (3) (4) (5) (6) (7) (6) Did you achieve the operating budget prior to the inventory write-off decision? 
    Yes or No

THE EFFECTS OF INCENTIVE STRUCTURE AND GOAL DIFFICULTY ON TIME PLANNING DECISIONS WITHIN A BALANCED SCORECARD FRAMEWORK

Brad Tuttle and Mark J. Ullrich

ABSTRACT

Recent innovations in management control systems, such as the Balanced Scorecard System, reflect today’s complex business environment by accounting for performance in multiple areas. When individuals must allocate their time between multiple areas that compete for their time, the manner in which incentives are structured is hypothesized to influence their decisions differently depending on goal difficulty. A decision-making experiment was conducted to test this proposition. When incentives were structured so that each area of the Balanced Scorecard is rewarded separately, challenging goals received more planned attention than easy or unattainable goals following previous findings. When incentives were structured so that goals in all areas must be achieved together, the influence of goal difficulty on the time planning decision diverges from previous findings such that areas having unattainable goals receive the same planned attention as areas having challenging goals. The results suggest that companies must consider how performance is rewarded within a Balanced Scorecard framework.

Advances in Accounting Behavioral Research, Volume 6, 121–144
Copyright © 2003 by Elsevier Ltd. All rights of reproduction in any form reserved
ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06006-X

INTRODUCTION

This study is motivated by today’s competitive business environment that requires individuals to give their attention to many areas, all of which compete for their time.
Recent innovations in management accounting control systems, such as Kaplan and Norton’s (1992, 1996) Balanced Scorecard, reflect this situation and attempt to influence individuals to balance their time among multiple areas through the establishment of goals, incentives, and accounting systems. While a great deal of research has been conducted regarding the effects of incentives and goal difficulty in relation to a single task (cf. Bonner, Hastie, Sprinkle & Young, 2000; Camerer & Hogarth, 1999; Cameron & Pierce, 1994; Jenkins, Gupta, Mitra & Shaw, 1998; Wood & Locke, 1990), very little is known about the effects of these variables on behavior in relation to accomplishing multiple tasks (Ashford & Northcraft, in press; Locke & Latham, 1990) as addressed by the Balanced Scorecard. Research into the effects of incentives and goal difficulty on behavior within a Balanced Scorecard framework is needed for several reasons. Foremost is the fact that the kinds of incentive structures that are possible when multiple tasks are involved have received scant attention in the literature. For instance, incentives associated with a Balanced Scorecard can be structured so that rewards are received only after meeting the goals in all areas. Or, Balanced Scorecard areas can be decoupled so that rewards are provided after meeting goals associated with individual areas. Furthermore, achieving the goals in one area may be easy while it may be very challenging in another. The combinations of these possibilities add a level of complexity to the Balanced Scorecard environment that has received scant attention in the existing literature. For these reasons, Ashford and Northcraft (in press) call for more research into decision-making when multiple tasks compete for an individual’s time and attention. The use of the Balanced Scorecard as a management tool has increased the need for this research. 
Naylor, Pritchard and Illgen (1980) posit a theory, hereafter NPI theory, suggesting that when individuals are faced with multiple objectives, how they allocate their time among the areas that compete for their time is more important to achieving overall satisfactory results than the total amount of time spent working on all of their goals. This distinction has been termed direction of effort versus level of effort (Blau, 1986, 1993). Because the many studies that examine goal difficulty and incentives typically use only a single goal and single task, they address only level of effort. The effects of incentives and goal difficulty on direction of effort remain largely unexplored.1

As a practical issue, organizations would benefit from a better understanding of how incentives and goal difficulty interact to influence how individuals expect to use their time among their areas of responsibility. The Balanced Scorecard System (Kaplan & Norton, 1992, 1996) is based on the premise that overall performance is improved when goals in all areas are reached together. Failure on one dimension cannot be completely compensated by success in others. Conceptually then, organizations may desire to reward individuals only when they achieve satisfactory performance in all of the Balanced Scorecard areas. One finding of goal research, however, is that while challenging goals generally motivate more effort than easy goals (Wood & Locke, 1990), unattainable goals often do not and can sometimes have large negative consequences (Fatseas & Hirst, 1992; Lee, Locke & Phan, 1997; Mowen, Middlemist & Luther, 1981; Wright, 1992). This being the case, basing rewards on areas coupled together via a comprehensive control system may produce unintended consequences when information suggests that goals in one or more areas are unattainable. Research is clearly needed to answer these types of practical questions.
A theoretical justification for this study is that, for Balanced Scorecard systems to work, they must affect the plans of individuals. Without premeditated, goal-directed planning, individuals do not control their environments but are controlled by them. This notion is consistent with the idea that closely related constructs like goal commitment, goal motivation, and intentions affect goal-related performance (Locke, Latham & Erez, 1988). If the Balanced Scorecard does not motivate individuals sufficiently to alter their plans about where they will spend their time, it is difficult to argue that they are committed to it (cf. Naylor & Illgen, 1984, p. 98), and achieving its objectives is unlikely. Hence, this study builds on the theoretical foundation from prior studies by looking at the time planning decisions of the subjects.

Studies that examine the effects of planning and intentions on performance generally conclude that these variables have a stronger effect than most other variables. For instance, Chesney and Locke (1991) find that identifying an appropriate strategy for completing a complex task in the initial planning stage has a greater effect on performance than does goal difficulty. Early, Wojnaroski and Prest (1987) find that planning is positively associated with performance in both the laboratory and the field. In a study by Cotton and Tuttle (1986), intentions predicted subsequent behavior more reliably than any other variable they identified in the literature. McAllister, Mitchell and Beach (1979) find that individuals who planned to spend more time on a task actually did spend more time on it and thus conclude that intentions are positively related to performance.

Also from a theoretical viewpoint, this research extends the findings of many goal studies that employ tasks having a production-line orientation to a context that more closely resembles those encountered by individuals in management roles.
Managers operate in environments that inherently place many demands on their time at once. Although prior goal research has examined complex as well as simple tasks (Chesney & Locke, 1991; Wood & Locke, 1990; Wood, Mento & Locke, 1987), subjects have typically worked towards only a single goal. Settings characterized by single objectives are more characteristic of unskilled or process-oriented jobs – not management level positions. On the other hand, the task of allocating one’s individual time and attention between various demands is highly consonant with what managers do. That is, a manager’s time is his most valuable and scarce resource and how that resource is allocated likely makes the most difference to what gets accomplished (Miodonski, 1999; Plack, 2000). Few studies have looked into factors that influence time allocation between tasks in a managerial context.

Investigating the difficulty of the goal is also important in a Balanced Scorecard framework. Information about goal difficulty is an integral, if not a necessary, component to the successful achievement of most important goals (Wood & Locke, 1990) and is a major rationale for the existence of the Balanced Scorecard. Simply put, having a “goal” without also having the ability to assess one’s position relative to it is not much of a goal. Notwithstanding, only a very small portion of the goal literature examines behavior in a setting where information about the level of goal difficulty in one area permits subjects to shift their time to or from other relevant, work-related areas. Yet, this is exactly what is possible within a Balanced Scorecard system.

BACKGROUND AND HYPOTHESES

Organizations set goals with the purpose of influencing their members to spend their time on the areas that are deemed important to the organization. Considerable theory, as well as anecdotal evidence from the field, suggests that goals influence the way individuals allocate their time.
Gollwitzer and Bargh’s (1996) action phases model stresses planning, including how much time to allocate, as an antecedent of goal-directed effort. Ajzen’s (1987) theory of planned behavior explicitly identifies intentions as leading to purposeful, goal-directed behavior (Ajzen & Madden, 1986). In the popular literature, Stephen R. Covey’s (1989) The Seven Habits of Highly Effective People stresses the importance of planning one’s time to achieve specific goals. His time management planning forms explicitly ask individuals to clearly define and record their goals directly on the time forms. Interestingly, Covey asserts that multiple areas should be addressed and that individuals should think about how much time they are willing to devote to each one.

Two generally accepted findings of goal research are that challenging goals are more motivating than easy goals, and that impossible goals are often rejected and, therefore, are less motivating than easy or challenging goals (Fatseas & Hirst, 1992; Lee, Locke & Phan, 1997; Mowen et al., 1981; Wright, 1992). Erez, Gopher and Arzi (1990) partially extend these conclusions to multiple tasks and find that proportionately more attention is allocated to more difficult tasks. To the extent that these findings generalize to time planning decisions by individuals, they suggest that individuals will plan to spend more time on challenging goals and less time on easy or unattainable goals.

Incentive Structure and Goal Difficulty

Bonner, Hastie, Sprinkle and Young (2000) refer to incentives, within the context of a management control system, as the presence or absence of motivators linked to performance. They differentiate incentives from incentive type, which refers to how pay is tied to performance, and provide the following major categories: flat rate, piece rate, variable ratio, quota (or goal), and tournament.
As such, incentive type refers to how incentives are tied to performance that is generally associated with a single task. This differs from incentive structure, which is used in this paper to refer to the way incentives are structured between tasks as in the multiple areas of the Balanced Scorecard. An incentive structure may consist of just one incentive type or of multiple incentive types across various performance measures associated with different tasks and goals.

Organizations often implement monetary incentives to motivate goal congruent behavior. These incentives are designed to motivate individuals to increase their goal-related effort by making the goal more attractive to attain (Vroom, 1964), by reinforcing performance (Komaki, Coombs & Schepman, 1996), by motivating individuals to set more or higher goals (Wright, 1991) or by increasing the acceptance of difficult goals (Locke, Latham & Erez, 1988). Given the importance of time planning to goal accomplishment, incentives should motivate individuals to plan sufficient time to meet their goals. Evidence shows that performance-based incentives increase the amount of time individuals spend on a task (Awasthi & Pratt, 1990; Libby & Lipe, 1992; Sprinkle, 2000; Stone & Zeibart, 1995; Tuttle & Burton, 1999; Tuttle & Harrell, 2001). Some research, however, suggests that the relationship between incentives and behavior is not direct but is contingent upon the type of incentive being offered (Bonner et al., 2000) and the difficulty of the goal (Wright, 1991).

Using NPI theory as a framework, Wright (1991) suggests that goal difficulty and the structure of incentives interact to determine effort. Wright argues that incentives will have a negative effect when effort is costly and does not result in extrinsic rewards. To illustrate, consider the case where an individual is paid a salary with the possibility of earning a bonus if a high level of customer satisfaction is achieved.
Under these conditions, individuals who just miss their goal, and thus receive no bonus, may feel punished. These individuals may be less motivated than had they been provided the same goal but no monetary incentive. Mowen et al. (1981) and Fatseas and Hirst (1992) find that goal-contingent incentives produce significantly less performance than either piece rate or fixed-pay incentives. Wright (1991) concludes that reward contingency (i.e. the way in which incentives are tied to performance) may have a direct effect on an individual’s personal goals, commitment to assigned goals and performance. When individuals are faced with demands on their time in multiple areas, the manner in which incentives are structured among those areas is likely to interact with goal difficulty on how they plan their time. To illustrate, consider first an incentive structure in which each individual area of the Balanced Scorecard is associated with its own monetary incentive and in which individuals can adjust their time both within an area and also between areas. A manager in this situation would not feel a need to allocate a lot of time to sure winners to achieve results and can be expected to reallocate their efforts to goals in areas that are achievable but challenging – thus increasing their overall expectation of remuneration. In this case, the negative effects predicted by Wright (1991) are isolated to areas with unattainable goals and are likely to be very strong since other Balanced Scorecard areas compete for the manager’s attention. The manager is predicted to shift his/her time to achieving challenging but attainable goals in other areas. Now consider an alternative incentive structure in which a monetary incentive is received only if the goals associated with all areas are reached. In this case, as in the first, easy goals may not need much attention; and some of the manager’s time may be shifted toward more challenging areas. 
However, the presence of an unattainable goal eliminates the chance to receive the monetary incentive regardless of performance in other areas. Given the arguments by Wright (1991) based on NPI theory and the results of prior studies, we predict that this situation will have negative motivational effects on all areas, not just on the areas in which satisfactory performance is unattainable. Both situations suggest that the shift in attention will be greater when incentives are based on goal attainment in each area separately than when incentives are based on goal attainment for all areas as a set.

Often, goals are set to be challenging (but attainable) from the outset. However, environmental factors, such as changes in competition, can cause goal difficulty to change midstream. Likewise, the manager may initially spend too much or too little time in a particular area so that ultimately achieving satisfactory performance in that area is almost certain or very unlikely. Under these conditions, sustaining the same attention to these areas is counterproductive and adjustments are necessary. For this reason, organizations develop accounting systems that provide continuous information to individuals regarding goal difficulty (Anthony & Govindarajan, 1998). Thus, information produced by the Balanced Scorecard is intended to help individuals adjust their plans for achieving goals (Wood & Locke, 1990). However, the previous findings should also apply to information regarding goal challenge as communicated by the Balanced Scorecard. Thus, regarding managers’ time planning decisions, the following hypotheses are proposed:

H1. Managers will shift time from Balanced Scorecard areas that information indicates are easy to areas that information indicates are challenging.

H2. Managers will shift time from Balanced Scorecard areas that information indicates are unattainable to areas that information indicates are challenging.

H3. Shifts in time from Balanced Scorecard areas that information indicates are easy to areas that information indicates are challenging will be greater when incentives are based on goal attainment in each area separately than when incentives are based on goal attainment for all areas as a set.

H4. Shifts in time from Balanced Scorecard areas that information indicates are unattainable to areas that information indicates are challenging will be greater when incentives are based on goal attainment in each area separately than when incentives are based on goal attainment for all areas as a set.

Note that H1 and H2 combine to suggest that a manager’s time allocation will follow an inverted U-shaped function in relation to goal difficulty for a single Balanced Scorecard area. Furthermore, as a result of shifting time to and from competing areas, the time allocated to these other areas will resemble a righted U-shaped function in relation to the goal difficulty of the single target area (holding goal difficulty constant for the other competing areas). Figure 1 expresses this relationship and is consistent with several models of motivation beginning as early as Atkinson (1958). H3 and H4 imply that both U-shaped functions will be flatter when incentives are provided only when goals are achieved in all areas in comparison to when incentives are based on goal attainment for each area of the Balanced Scorecard individually.

Fig. 1. Predicted U-Shaped Functions.

METHOD

Experimental Design

A decision-making experiment was conducted in which participants were randomly assigned to one of six cells in a 2 × 3 design. Incentive structure was manipulated between subjects at two levels by making the likelihood of promotion and receiving a 20% bonus contingent upon achieving goals in four Balanced Scorecard areas either: (1) individually; or (2) as a set.
Goal difficulty was manipulated between subjects at three levels as being: (1) easy; (2) challenging but attainable; or (3) not attainable.

Decision Task and Materials

As shown in the Appendix, all participants were projected into the role of a unit-level manager who was to plan his/her time among four areas corresponding to a typical Balanced Scorecard: Customer, Financial, Internal Business, and Learning & Growth. Example performance measures for each area were presented along with information about goal difficulty. These four goals and associated performance measures were derived from the Balanced Scorecard used by Mobil Oil’s domestic marketing and oil refining division (Kaplan, 1997a, b). All subjects received the same four areas and goals.

The participants were informed that they were being considered for a promotion and that the corporation offered a performance bonus of up to 20% of their salary, both of which were linked to their goals. About half of the participants were informed that the likelihood of promotion and bonus depended upon “how many goals you achieve” while the remaining participants were informed that their promotion and bonus depended upon “achieving all four goals together.” This constituted the incentive structure manipulation with one group’s bonus based on achieving goals in individual areas and the other group’s bonus based on achieving goals in the entire set of Balanced Scorecard areas. Thus, all subjects were provided with a possibility to achieve the same reward; only the manner in which the incentive was structured varied between groups.

Goal difficulty was manipulated for the Customer area by providing the subjects with “reliable feedback suggesting that the Customer goal is” easily attainable, challenging but attainable, or not attainable, depending on their experimental condition. This resulted in three conditions: Easy, Challenging, and Unattainable.
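The six cells implied by the 2 × 3 between-subjects design can be enumerated in a short sketch. The Python below is illustrative only: the label strings, the function names `design_cells` and `assign_participants`, and the use of simple independent randomization with a fixed seed are assumptions, since the paper states only that participants were randomly assigned to cells.

```python
from itertools import product
import random

# Factor levels as described in the text (the label strings are illustrative).
INCENTIVE_STRUCTURES = ("separate", "set")
GOAL_DIFFICULTIES = ("easy", "challenging", "unattainable")


def design_cells():
    """Return the six (incentive structure, goal difficulty) cells."""
    return list(product(INCENTIVE_STRUCTURES, GOAL_DIFFICULTIES))


def assign_participants(participant_ids, seed=0):
    """Assign each participant to one cell by simple randomization.

    Assumption: the paper does not report the randomization procedure,
    so independent uniform draws are used here purely for illustration.
    """
    rng = random.Random(seed)
    cells = design_cells()
    return {pid: rng.choice(cells) for pid in participant_ids}


if __name__ == "__main__":
    for cell in design_cells():
        print(cell)
```

Simple randomization of this kind generally yields unequal cell sizes, which is compatible with the unequal cell sizes reported in Table 1, although the procedure actually used is not documented.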
Goal difficulty was held constant for the other three areas (Financial, Internal Business, and Learning & Growth) at a “challenging but attainable level” for all participants.

All participants were informed that they could work as many hours per week as they wished and that they were free to allocate their work hours as they saw fit except that they must spend 15 hours per week on tasks unrelated to their four goals.2 The rest of their time at work was to be devoted to achieving the goals in the four areas. The participants were then asked how they would allocate their hours at work to achieve the goals in each of the four Balanced Scorecard areas. Thus, the hours-per-week the subjects intended to work were collected for each goal resulting in planned time to spend on Customer, Financial, Internal Business, and Learning & Growth goals. The sum of these four responses is the total goal related hours per week.

To measure the relative amounts of time allocated to achieving the various goals, the difference in time allocated to the (manipulated) customer area and the average time allocated to the three other areas was computed. Positive numbers reflect more time to the customer goal in relation to the average time allocated to the other three goals. Negative numbers reflect more time to the three competing goals, on average, than to the customer goal. Hence, this measure reflects the relative emphasis that the subjects placed on the manipulated goal in comparison to other challenging goals that are competing for their time.3

After the dependent measures were collected, subjects responded to a goal difficulty manipulation check in which they selected the goal difficulty information that they received in the case from among the three possibilities. Likewise, in order to check the incentive manipulation, the participants selected the incentive
manipulation statement that they received in the case. Next, the participants were asked two questions regarding the valence of the incentives and their effort-to-performance expectancy. The valence question asked how attractive the bonus and promotion were using a nine point Likert scale anchored by “1 = very unattractive” and “9 = very attractive.” The effort-to-performance expectancy question asked the subjects to rate how likely they would be to accomplish all four goals if they exerted maximum effort using a nine point Likert scale anchored by “1 = very unlikely” and “9 = very likely.”4 Finally, the participants were asked to provide demographic information. The data were gathered during regularly scheduled classes. Participation was voluntary and anonymous and the experiment took about 15 minutes to complete.5

Participants

One hundred and ninety-three Professional MBA students participated in the study. Of these, 10 provided incomplete responses and 18 failed one or both manipulation checks and were deleted, leaving 165 usable responses. Because the professional MBA requires an undergraduate degree in business and work experience, and is a 12-month program, the subjects were quite homogeneous and well qualified to perform the task. About 26% of the participants were women. The typical participant was 30.3 years old with 7.8 years of work experience. On average, the participants had supervised a maximum of 22 people. The participants tended to agree with the statement, “My advancement and/or compensation at work is contingent upon achieving a goal or goals . . .” in the same four areas used in the experimental materials. On a scale of 1 = disagree and 7 = agree, the mean response for this question is 4.7 for the customer area, 4.9 for the financial area, 5.3 for the internal business process area, and 4.9 for the learning and growth area. Approximately 42% of the subjects stated that they are paid a bonus in addition to their salary.
These data suggest that the subjects have had exposure to the kinds of goals, incentives and issues in the decision task and that the participants were, as a result of their education and work experience, capable of providing meaningful responses.

RESULTS

Preliminary Analysis

The demographic variables were tested to determine if systematic differences exist across cells. Chi-square tests show no significant differences across treatment conditions (all p-values > 0.10) for any of the three categorical demographic variables: gender, educational degree program, and current compensation plan. Separate 2 × 3 ANOVAs were conducted for each continuous demographic variable: the number of years of work experience, the maximum number of individuals supervised, and whether compensation at work was contingent upon achieving a goal or goals in each of the four business areas. Incentive structure at two levels and goal difficulty at three levels served as the independent variables. The ANOVA results show no significant differences (all p > 0.10) for any continuous demographic variable across cells. Thus, results from the analysis of the demographic data suggest that randomization was effective and that the subjects are homogeneous across treatment conditions.

The attractiveness of the incentives and the expectancy of accomplishing the goals in all four areas (given maximum effort) should not differ between incentive structures. The data support this proposition in that the attractiveness of the incentives based on each area separately (mean = 8.11) does not differ from the attractiveness of the incentives based on achieving the goals in all areas (mean = 8.28, t = 1.0441, p = 0.2980). Incentive structure was not predicted to affect goal challenge but to interact with goal challenge to affect motivation.
Consistent with this notion, the expectancy of accomplishing the goals in all four areas when incentives were based on each area (mean = 6.94) does not differ significantly from when incentives were based on all areas (mean = 7.24, t = 0.8874, p = 0.3762). The expectancy of accomplishing all goals, however, should differ by goal difficulty so that the expectancy should decrease with goal difficulty. The results generally support this proposition in that easy and challenging goals (means 7.84 and 7.81, respectively) are seen as more likely to be accomplished (p = 0.0001) than unattainable goals (mean = 5.78).

The total amount of time the subjects planned to work in a week, as shown in Table 1, was not affected by incentives or goal difficulty. The finding that subjects do not adjust their workweek for incentives or goal difficulty is consistent with Naylor, Pritchard and Illgen (1980) who assert that total work effort is stable across most conditions other than those associated with individual differences.

Hypotheses Testing

The first hypothesis predicts that individuals will shift time from goals that information indicates are easy to goals that information indicates are challenging. Recall that goal difficulty was manipulated only for the customer goal and that goal difficulty was held constant (i.e. challenging but attainable) for the other three goals.

Table 1. Total Planned Hours Summed Across All Four Areas.

Panel A: ANOVA
Source                                                     df     F       p
Goal difficulty (Easy vs. Challenging vs. Unattainable)    2      0.14    0.8714
Incentive structure (Separate vs. Set)                     1      1.75    0.1882
Goal difficulty × Incentive structure                      2      0.74    0.4771
Model                                                      5      0.66    0.6521
Error                                                      159

Panel B: Mean (Standard Deviation), Cell Size
Goal Difficulty: Easy, Challenging, Incentive average
Incentive Structure: Each Goal Evaluated Separately, All Goals Evaluated As a Set
Cell entries: 38.73 (7.12) 30; 40.68 (7.51) 25; 37.74 (6.57) 27; 37.23 (7.04) 26; 38.79 (6.21) 28; 38.69 (9.12) 29
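The dependent measure used in the hypothesis tests, the hours planned for the customer area minus the mean hours planned for the other three areas, can be sketched in a few lines of Python. The function names and the example hours below are hypothetical and are not taken from the study's data; only the definition of the measure comes from the text.

```python
def total_goal_hours(customer, financial, internal_business, learning_growth):
    """Total goal-related hours per week: the sum of the four responses."""
    return customer + financial + internal_business + learning_growth


def relative_emphasis(customer, financial, internal_business, learning_growth):
    """Customer hours minus the mean of the other three areas.

    Positive values mean relatively more planned time on the manipulated
    customer goal; negative values mean relatively more planned time on
    the three competing goals.
    """
    others_mean = (financial + internal_business + learning_growth) / 3
    return customer - others_mean


if __name__ == "__main__":
    # Hypothetical participant who favors the customer area.
    print(relative_emphasis(12, 9, 9, 9))     # 3.0
    # Hypothetical participant who shifts time away from the customer area.
    print(relative_emphasis(6, 10, 11, 12))   # -5.0
    print(total_goal_hours(6, 10, 11, 12))    # 39
```

Because the measure is a difference from the mean of the competing areas, it captures direction of effort and is unaffected when a participant scales all four areas up or down by the same number of hours.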
To measure the relative amounts of time allocated to achieving the various goals, the difference in time allocated to the (manipulated) customer area and the average time allocated to the three other areas was computed. The hypothesis predicts that the difference in time allocations should be larger (more positive) when the customer goal is challenging compared to when it is easy. The hypothesis was tested using a 2 × 2 ANOVA with the difference in time allocated between the customer goal and the average of the other three goals as the dependent measure. Goal difficulty (easy versus challenging but attainable) and incentive structure (separate versus set) served as the independent variables. As can be seen from Panel A in Table 2, the main effect for goal difficulty is highly significant (F = 33.82, p = 0.0001). When the customer goal is challenging, then all four goals are challenging. In the situation where all goals are challenging, Panel B of Table 2 shows that the subjects allocated more time to the customer area than to the other areas (mean difference = +1.77 hours), possibly reflecting a bias towards taking care of customers or a belief that this area requires a greater time commitment. In contrast, the subjects shift time toward accomplishing the three other challenging goals (mean difference = −3.10 hours) when the customer goal is easy. Hypothesis 1 is strongly supported.

Table 2. Difference^a in Time Allocated to Balanced Scorecard Areas Having Easy or Challenging Customer Goals and Other Areas.

Panel A: ANOVA
Source                                    df     F        p
Goal difficulty (Easy vs. Challenging)    1      33.82    0.0001
Incentive structure (Separate vs. Set)    1      2.32     0.1304
Goal difficulty × Incentive structure     1      0.01     0.9243
Model                                     3      12.19    0.0001
Error                                     104

Panel B: Means
                     Incentive Structure
Goal Difficulty      Each Goal Evaluated Separately    All Goals Evaluated As a Set    Average
Easy                 −2.51                             −3.69                           −3.10
Challenging          2.44                              1.10                            1.77
Incentive average    −0.04                             −1.29

^a The difference is calculated as the time allocated to the customer area less the average time allocated to other areas so that positive numbers reflect more relative time spent in the manipulated customer area.

The second hypothesis predicts that individuals will shift time from areas whose goals information indicates are unattainable to areas whose goals information indicates are challenging. The second hypothesis was tested in a like manner to H1 using a 2 × 2 ANOVA with the difference in time allocated between the customer area and the average of the other three areas as the dependent measure. For this test, goal difficulty (challenging versus unattainable) and incentive structure (separate versus set) served as the independent variables. As can be seen from Panel A in Table 3, the main effect for goal difficulty is highly significant (F = 10.14, p = 0.0019). This result is modified, however, by a significant goal difficulty by incentive interaction as discussed below.

Table 3. Difference^a in Time Allocated to Balanced Scorecard Areas Having Challenging or Unattainable Customer Goals and Other Areas.

Panel A: ANOVA
Source                                            df     F        p
Goal difficulty (Challenging vs. Unattainable)    1      10.14    0.0019
Incentive structure (Separate vs. Set)            1      2.96     0.0882
Goal difficulty × Incentive structure             1      8.93     0.0035
Model                                             3      7.44     0.0001
Error                                             104

Panel B: Means
                     Incentive Structure
Goal Difficulty      Each Goal Evaluated Separately    All Goals Evaluated As a Set    Average
Challenging          2.44                              1.10                            1.77
Unattainable         −4.07                             0.90                            −1.59
Incentive average    −0.82                             1.00

^a The difference is calculated as the time allocated to the customer area less the average time allocated to other areas so that positive numbers reflect more relative time spent in the manipulated customer area.

Overall results for H1 and H2 support the prediction that subjects shift the time they are willing to spend from one area of responsibility to another due to goal difficulty as described in Fig. 1. Figure 2 shows the results from the study in the same graphic form as Fig. 1. Recall that the information indicated that all non-customer goals (i.e. goals from competing areas) are challenging. As can be seen, individuals react to goal difficulty information by shifting their time from areas associated with easy goals to those associated with challenging goals, and from unattainable goals to challenging goals in a manner that supports our overall prediction.

Fig. 2. Observed U-Shaped Functions.

H3 and H4 suggest that incentive structure modifies the relationship between goal difficulty and planned time, leading to the prediction that the interaction terms reported in Tables 2 and 3 should be significant. As can be seen in Panel A of Table 2, incentive structure does not interact with goal difficulty (F = 0.01, p = 0.9243), thus failing to support H3. Hence, the data do not suggest that incentive structure modifies the amount of time subjects plan to spend on Balanced Scorecard areas associated with easy versus challenging goals. Panel B of Table 3 shows that when incentives are based on each goal separately, information indicating that the customer goal is unattainable caused individuals to allocate more time to achieving competing goals (mean difference = −4.07).
This is compared to when information indicates that the customer goal is challenging but attainable (mean difference = +2.44). These two conditions differ significantly (p = 0.0001). However, the relative time allocated to the customer area and its competing areas does not differ between the unattainable (mean difference = +0.90) and challenging (mean difference = +1.10) conditions when incentives are based on attaining all goals as a set (p = 0.8894). This suggests that incentive structure can modify the effects of goal difficulty, thus supporting H4 while qualifying H2. Hence, the data suggest that incentive structure modifies the amount of time subjects plan to spend in each Balanced Scorecard area when those areas differ in terms of whether their associated goals are unattainable versus challenging or easy.

Supplemental Analysis

Predicting separate differential effects for the manipulations on the time allocated to each of the three non-customer goals is not possible. Nevertheless, in the spirit of the study’s main premise that individuals consider all areas of the Balanced Scorecard together as they plan their time, supplemental analysis of these data is reported. Panel A of Table 4 shows the results of a MANOVA in which hours allocated to the Financial goal, the Internal Business goal, and the Learning and Growth goal are dependent variables, with goal constituting a within-subject variable. Customer goal difficulty at three levels (easy, challenging, and unattainable) and incentive structure at two levels (separate versus set) served as the independent variables. As can be seen from the table, the analysis shows a three-way interaction between goal, customer goal difficulty, and incentive structure (F(4, 318) = 2.44, p = 0.0468), making the interpretation of other effects difficult.
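The difference measure used to test H1 through H4 can be illustrated with a short sketch. It assumes the cell means of mean hours reported in Panel B of Table 4 and recomputes, for each condition, customer hours minus the average of the three competing areas. Because the measure is linear, the difference of cell means should match the cell means of the difference reported in Panel B of Tables 2 and 3, up to rounding of the published figures. The names `MEAN_HOURS`, `REPORTED`, and `difference_score` are illustrative and do not come from the study.

```python
# Mean hours from Panel B of Table 4, keyed by (incentive structure, customer goal difficulty).
# Each tuple is (customer, financial, internal business, learning & growth).
MEAN_HOURS = {
    ("separate", "easy"): (7.8, 11.5, 10.5, 9.0),
    ("separate", "challenging"): (12.0, 10.5, 9.9, 8.4),
    ("separate", "unattainable"): (6.6, 9.6, 11.3, 11.3),
    ("set", "easy"): (6.7, 10.4, 11.2, 9.5),
    ("set", "challenging"): (10.1, 8.6, 9.9, 8.6),
    ("set", "unattainable"): (10.3, 9.6, 9.9, 8.9),
}

# Cell means of the difference measure as reported in Panel B of Tables 2 and 3.
REPORTED = {
    ("separate", "easy"): -2.51,
    ("separate", "challenging"): 2.44,
    ("separate", "unattainable"): -4.07,
    ("set", "easy"): -3.69,
    ("set", "challenging"): 1.10,
    ("set", "unattainable"): 0.90,
}


def difference_score(customer_hours, other_hours):
    """Customer hours minus the average hours of the competing areas."""
    return customer_hours - sum(other_hours) / len(other_hours)


# Recompute the difference measure for every cell from the Table 4 means.
recomputed = {
    cell: difference_score(hours[0], hours[1:]) for cell, hours in MEAN_HOURS.items()
}

for cell in MEAN_HOURS:
    print(cell, "recomputed:", round(recomputed[cell], 2), "reported:", REPORTED[cell])
```

Small gaps between the recomputed and reported values are expected, since Table 4 reports hours to one decimal place only.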
Some insights into the interaction of these variables are possible by examining the mean hours allocated towards attaining each of the three non-customer goals, as well as hours allocated to the customer goal, as shown in Panel B of Table 4. Consider first the case in which incentives are based on achieving each goal separately. Here, the time allocated to the customer (manipulated) goal follows the predicted inverted-U shaped pattern (Fig. 1) and the time allocated to each competing goal generally follows the predicted righted-U shaped pattern. As hours are shifted to and from the customer goal according to its difficulty, the hours are spread relatively consistently across the three competing goals.

In contrast, consider the case in which incentives are based on achieving all goals as a set. Here, no perceptible difference in time allocation occurs between the challenging and unattainable conditions across any of the four goals. That is, when the customer goal is challenging, the subjects allocated 10.1 hours to this goal and the like figure, 10.3 hours, when the goal is unattainable. When the customer goal is challenging versus unattainable, hours allocated to the other three goals correspond closely as well: financial goal = 8.6 and 9.6; internal business goal = 9.9 and 9.9; and learning and growth goal = 8.6 and 8.9, respectively.

These observations suggest that the pattern of results shown in Fig. 2 is driven by the condition in which incentives reward each goal separately, a conclusion that is consistent with H4. This incentive structure more closely resembles those used in the prior studies upon which the predictions were based (in contrast to incentives in which rewards are received only after achieving an entire set of distinct, competing goals).

Table 4. Time Allocated to Balanced Scorecard Areas Other than the Customer Area.

Panel A: MANOVA

Source                                                    df    F      p
Between subjects effects
  Customer goal difficulty                                2     3.03   0.0513
  Incentive structure (Separate vs. Set)                  1     2.71   0.1015
  Customer goal difficulty × Incentive structure          2     1.20   0.3040
Within subjects effects
  Goal                                                    2     6.18   0.0023
  Goal × Customer goal difficulty                         4     2.11   0.0794
  Goal × Incentive structure                              2     0.71   0.4940
  Goal × Customer goal difficulty × Incentive structure   4     2.44   0.0468

Panel B: Mean Hours Allocated to Balanced Scorecard Areas

Incentive Based on Attaining Each Goal Separately
Customer Area    Customer    Financial    Internal Business    Learning & Growth
Easy             7.8         11.5         10.5                 9.0
Challenging      12.0        10.5         9.9                  8.4
Unattainable     6.6         9.6          11.3                 11.3
Average          8.7         10.5         10.5                 9.5

Incentive Based on Attaining All Goals as a Set
Customer Area    Customer    Financial    Internal Business    Learning & Growth
Easy             6.7         10.4         11.2                 9.5
Challenging      10.1        8.6          9.9                  8.6
Unattainable     10.3        9.6          9.9                  8.9
Average          9.1         9.5          10.3                 9.0

DISCUSSION

Some strengths and limitations to the study should be mentioned before discussing its findings. The study was conducted in the laboratory using a written exercise designed to capture the essentials of managers’ time allocation decisions. As such, care should be taken when extrapolating the results to other contexts and situations. On the other hand, the study employs a strong design that contributes to its internal validity and allows us to examine the proposed causal relationships. It also benefits from a high level of experimenter control and uses a task that corresponds more closely to the kinds of tasks performed by managers than many of the previous goal studies. Furthermore, the materials that the subjects used are based on the Balanced Scorecard of an actual company. These factors increase the study’s external validity. This is one of a very few studies to examine the effects of goal difficulty and the effects of incentive structure in a Balanced Scorecard context where multiple demands vie for the subjects’ time.
Based on Naylor, Pritchard and Illgen’s (1980) NPI theory, we predicted that subjects would shift their time between areas based on the goal difficulty information they received and as influenced by their incentives. These predictions were generally supported. Although we found considerable support that incentive structure and goal difficulty affect how individuals allocate their time between areas, we found no evidence that either influences the total amount of time the subjects said they would work to achieve satisfactory results in all the Balanced Scorecard areas. This also supports NPI theory’s assumption that in a work-related situation, people do not change their total level of effort except under very unusual situations. Rather, individuals shift their time from easy goals to more challenging goals and from unattainable goals to challenging goals. These findings suggest that organizations should consider incentives and management control variables such as goal difficulty information as ways to change or refocus individuals’ time and not as ways to induce more effort. This has implications for the kinds of effort attributions that are sometimes made during performance evaluation. Evaluators should be careful about attributing negative performance to a lack of effort unless they have first ruled out misdirected effort. It also highlights the importance of receiving timely and accurate information in order for individuals to appropriately direct their time. The findings also imply that individuals are sensitive to variables that are under organizational control and which are susceptible to manipulation. Organizations could improve appropriate goal directed behavior by making sure that their incentives and reporting systems focus individuals’ time on their important organizational goals. One of the major insights of the study is that individuals react differently to goal difficulty under different incentive structures. 
When a particular goal is easy compared to challenging, the time allocated to achieving goals that are competing for the manager’s attention does not differ according to incentive structure. However, when information indicates that one goal is unattainable, the incentive structure makes a substantial difference. When monetary incentives are based on the extent to which the subjects met each goal individually, the subjects shifted approximately 6.51 hours from the area with unattainable goals to alternative areas. However, when the monetary incentives are provided only upon achieving the goals in all areas, the subjects did not plan to shift any hours from the area with unattainable goals to alternative areas. Envisioning situations in which either result is desirable is certainly possible. If goals are somewhat arbitrary, in that just missing a goal is still beneficial, then basing rewards on individual goal achievement could be counterproductive. Once missing a goal becomes obvious, individuals will dramatically decrease their planned effort in that area and redirect it towards meeting challenging but still attainable goals. On the other hand, there are situations in which organizations want to discourage individuals from working on unattainable goals. In this case, they should base monetary rewards on attaining individual goals rather than all goals. We note that individuals will plan to spend time working on a goal despite receiving reliable information that the goal is unattainable. This suggests that individuals consider more than goal difficulty when planning their time. For instance, individuals may continue to be psychologically committed to goals that they have previously accepted despite receiving negative goal difficulty information.
In addition, they may feel a need to justify their actions and believe that missing a goal is easier to justify if effort has been expended than if one quits altogether. They may also wish to come as close as possible to achieving the goal in order to preserve their reputations as best they can; coming close may not be viewed as badly as being way off the mark. Also, individuals know that in most cases, goal achievement in future periods is tied to the level of effort exerted this period. Hence, they may be reluctant to completely cease working on an unattainable goal in order to avoid beginning in a hopeless situation the next period. These conjectures are fruitful topics for future research. Together, the findings from the study strongly suggest that when multiple areas compete for attention, as in the Balanced Scorecard, the way incentives are structured influences how individuals plan their time between areas rather than their total level of effort. We have argued that planning one’s time to be successful in multiple areas is a crucial aspect of what individuals, and particularly managers, do. For these reasons, this study represents an important contribution to knowledge about ways incentives can be structured in a Balanced Scorecard framework to help organizations achieve their goals. Hopefully others will find the approach taken by this study useful in examining these issues.

NOTES

1. Effort includes both time and intensity components; however, Larson and Callahan (1990) argue that individuals are more likely to differentially allocate their time than vary their intensity between tasks. They argue that individuals “groove in” to an overall level of intensity, which they strive to maintain over time.
2. In pretests, subjects were concerned about duties other than those directly tied to Balanced Scorecard areas.
Inclusion of the 15 hours per week on tasks unrelated to their four goals controls for differences in the amount of time that subjects would otherwise have assumed needed to be spent on these tasks.
3. One reviewer suggested analyzing the data using proportions rather than difference scores. The results are equivalent using either method (cf. Tuttle & Harrell, 2001).
4. The manipulation checks were presented with the original case materials, and the subjects were asked not to look back. A stronger test would have been to administer the post-experimental materials separately from the case.
5. A small number of students, which we did not count, chose not to participate. No monetary incentive was provided.

ACKNOWLEDGMENTS

The authors would like to thank workshop participants at the University of Utah and the University of South Carolina for their helpful comments.

REFERENCES

Ajzen, I. (1987). Attitudes, traits, and actions: Dispositional prediction of behavior in personality and social psychology. In: L. Berkowitz (Ed.), Advances in Experimental Social Psychology (Vol. 20, pp. 1–63). San Diego, CA: Academic Press.
Ajzen, I., & Madden, T. J. (1986). Prediction of goal-directed behavior: Attitudes, intentions, and perceived behavioral control. Journal of Experimental Social Psychology, 22(5), 453–474.
Anthony, R., & Govindarajan, V. (1998). Management control systems. Homewood, IL: Irwin/McGraw-Hill.
Ashford, & Northcraft, G. (2002). Robbing Peter to pay Paul: Feedback environments and enacted priorities in response to competing task demands. Human Resource Management Review, forthcoming.
Atkinson, J. W. (1958). Motives in fantasy, action, and society: A method of assessment and study. Princeton, NJ: Van Nostrand.
Awasthi, V., & Pratt, J. (1990). The effects of monetary incentives on effort and decision performance: The role of cognitive characteristics. The Accounting Review, 65(4), 797–811.
Blau, G. (1986).
The relationship of management level to effort level, direction of effort, and managerial performance. Journal of Vocational Behavior, 29, 226–239.
Blau, G. (1993). Operationalizing direction and level of effort and testing their relationship to individual job performance. Organizational Behavior and Human Decision Processes, 55, 152–170.
Bonner, S. E., Hastie, R., Sprinkle, G. B., & Young, S. M. (2000). A review of the effects of financial incentives on performance in laboratory tasks: Implications for management accounting. Journal of Management Accounting Research, 12, 19–64.
Camerer, C. F., & Hogarth, R. M. (1999). The effects of financial incentives in experiments: A review and capital-labor-production framework. Journal of Risk and Uncertainty, 19(1–3), 7–42.
Cameron, J., & Pierce, W. D. (1994). Reinforcement, reward, and intrinsic motivation: A meta-analysis. Review of Educational Research, 64, 363–423.
Chesney, A. A., & Locke, E. A. (1991). Relationships among goal difficulty, business strategies, and performance on a complex management simulation task. Academy of Management Journal, 34(2), 400–424.
Cotton, J. L., & Tuttle, J. M. (1986). Employee turnover: A meta-analysis and review with implications for research. Academy of Management Review, 11(1), 55–70.
Covey, S. R. (1989). The seven habits of highly effective people: Restoring the character ethic. New York, NY: Simon and Schuster.
Early, P. C., Wojnaroski, P., & Prest, W. (1987). Task planning and energy expended: Exploration of how goals influence performance. Journal of Applied Psychology, 72, 107–114.
Erez, M., Gopher, D., & Arzi, N. (1990). Effects of goal difficulty, self-set goals, and monetary rewards on dual task performance. Organizational Behavior & Human Decision Processes, 47(2), 247–270.
Fatseas, V. A., & Hirst, M. K. (1992). Incentive effects of assigned goals and compensation schemes on budgetary performance.
Accounting and Business Research, 22(88), 347–355.
Gollwitzer, P. M., & Bargh, J. A. (1996). The psychology of action. New York, NY: Guilford Press.
Jenkins, G. D., Gupta, N., Mitra, A., & Shaw, J. D. (1998). Are financial incentives related to performance? A meta-analytic review of empirical research. Journal of Applied Psychology, 83(5), 777–787.
Kaplan, R. S. (1997a). Mobil USM&R (A): Linking the balanced scorecard. Boston, MA: Harvard Business School Publishing.
Kaplan, R. S. (1997b). Mobil USM&R (B): New England sales and distribution. Boston, MA: Harvard Business School Publishing.
Kaplan, R. S., & Norton, D. P. (1992). The balanced scorecard: Measures that drive performance. Harvard Business Review (January–February), 71–79.
Kaplan, R. S., & Norton, D. P. (1996). Translating strategy into action: The balanced scorecard. Boston, MA: Harvard Business School Publishing.
Komaki, J. L., Coombs, T., & Schepman, S. (1996). Motivational implications of reinforcement theory. In: R. M. Steers, L. W. Porter & G. A. Bigley (Eds), Motivation and Leadership at Work (pp. 34–52). New York, NY: McGraw-Hill.
Larson, J. R., Jr., & Callahan, C. (1990). Performance monitoring: How it affects work productivity. Journal of Applied Psychology, 75(5), 530–538.
Lee, T. W., Locke, E. A., & Phan, S. H. (1997). Explaining the assigned goal-incentive interaction: The role of self-efficacy and personal goals. Journal of Management, 23(4), 541–559.
Libby, R., & Lipe, M. G. (1992). Incentives, effort, and the cognitive processes involved in accounting-related judgments. Journal of Accounting Research, 30(2), 249–273.
Locke, E. A., & Latham, G. P. (1990). A theory of goal setting and task performance. Englewood Cliffs, NJ: Prentice-Hall.
Locke, E. A., Latham, G. P., & Erez, M. (1988). The determinants of goal acceptance and commitment. Academy of Management Review, 13, 23–39.
McAllister, D. W., Mitchell, T. R., & Beach, L. R. (1979).
The contingency model for the selection of decision strategies: An empirical test of the effects of significance, accountability, and reversibility. Organizational Behavior and Human Decision Processes, 24(2), 228–244.
Miodonski, B. (1999). Time management is key to juggling multiple jobs. Contractor, 46(2), 5.
Mowen, J., Middlemist, R., & Luther, D. (1981). Joint effects of assigned goal level and incentive structure on task performance: A laboratory study. Journal of Applied Psychology, 66, 598–603.
Naylor, J., & Illgen, D. (1984). Goal setting: A theoretical analysis of a motivation technology. Research in Organizational Behavior, 6, 95–140.
Naylor, J., Pritchard, R., & Illgen, D. (1980). A theory of behavior in organizations. New York, NY: Academic Press.
Plack, H. (2000). Managing time can be crucial. Baltimore Business Journal, 17(40), 27.
Sprinkle, G. B. (2000). The effect of incentive contracts on learning and performance. The Accounting Review, 75(3), 299–326.
Stone, D. N., & Zeibart, D. A. (1995). A model of financial incentive effects in decision making. Organizational Behavior and Human Decision Processes, 61(3), 250–261.
Tuttle, B., & Burton, F. G. (1999). The effects of a modest incentive on information overload in an investment analysis task. Accounting, Organizations and Society, 24, 673–687.
Tuttle, B., & Harrell, A. M. (2001). The impact of unit goal priorities, economic incentives, and interim feedback on the planned effort of information systems professionals. Journal of Information Systems, 15(2), 81–98.
Vroom, V. H. (1964). Work and motivation. New York, NY: Wiley.
Wood, R. E., & Locke, E. A. (1990). Goal setting and strategy effects on complex tasks. Research in Organizational Behavior, 12, 73–109.
Wood, R. E., Mento, A. J., & Locke, E. A. (1987). Task complexity as a moderator of goal effects: A meta-analysis. Journal of Applied Psychology, 72(3), 416–425.
Wright, P. M. (1991).
Goals as mediators of the relationship between monetary incentives and performance: A review and NPI theory examination. Human Resource Management Review, 1(1), 1–22.
Wright, P. M. (1992). An examination of the relationships among monetary incentives, goal level, goal commitment, and performance. Journal of Management, 18(4), 677–693.

APPENDIX

Sample Decision Case

Columbia Corporation

Assume that you are a unit level manager employed by the Columbia Corporation. Columbia’s senior management has identified a competitive strategy that is linked to goals in four important business areas. All unit managers have the same goals. In addition, performance measures were developed for each business area as follows:

Area                 Example performance measures                   Feedback
Customer             Mystery shopper ratings                        Goal is easily attainable
                     Customer complaints
                     Customer compliments
Financial            Return on capital employed (ROCE)              Goal is challenging but attainable
                     Net margin
                     Profit per business unit
                     Sales & growth rate
Internal Business    Number of inventory stock-outs                 Goal is challenging but attainable
                     Quality assessment score
Learning & Growth    Employee attitude survey                       Goal is challenging but attainable
                     Employee skill development
                     Timely access to decision making information

Notice that you have received reliable interim feedback suggesting that the Customer goal is easily attainable and that the other three goals are challenging but attainable.

Bonus and Promotion: Two items are of particular interest. First, a division manager is retiring and you are being considered for his replacement. Second, Columbia provides a performance bonus of up to 20% of your salary. Both your promotion and bonus depend on how many goals you achieve. The more goals you achieve the greater your bonus and likelihood of promotion.

Decision: Like most managers, assume that you can work as many hours as you want and you can allocate the hours as you see fit.
Further, assume that during the next performance evaluation period, you must spend 15 hours per week working on administrative and other responsibilities that are not directly related to achieving your goals in the four business areas (e.g. personnel issues, travel). Also, assume that you will devote all your remaining work time towards achieving your goals in the four business areas. Given the information in the case, please indicate below how you would allocate your hours at work to achieve the goals in each business area:

Goal Area                 Hours of Work Effort Allocated Each Week to Achieve the Goals in Each Business Area
Customer                  ____ Hours/week
Financial                 ____ Hours/week
Internal business         ____ Hours/week
Learning & growth         ____ Hours/week
Administrative & other    15 Hours/week
Total work hours          ____ Hours/week

THE EFFECT OF FAIRNESS IN CONTRACTING ON THE CREATION OF BUDGETARY SLACK

Theresa Libby

ABSTRACT

This paper explores the relationship between fairness in contracting and the creation of budgetary slack. A laboratory experiment was performed in which privately informed subjects were compensated under either a truth-inducing or slack-inducing incentive contract. Contracting processes were either fair or unfair as defined by procedural justice theory (Leventhal, 1980; Lind & Tyler, 1988). Under the slack-inducing contract, subjects exposed to the fair contracting process created significantly less slack than subjects exposed to the unfair contracting process. Slack created by subjects compensated under the truth-inducing contract was low and insensitive to the fairness or unfairness of the contracting process employed.

INTRODUCTION

In large, decentralized organizations, accounting information often forms the basis for budget estimates used in strategic planning, in coordinating work between organizational divisions, and in setting targets used in performance evaluation (Merchant, 1985).
The accuracy of budget estimates is key to the effectiveness of these short-run and long-run planning activities. Even so, prior research indicates budget estimates are rarely accurate (Otley, 1985). The lack of accuracy of budget estimates may be the result of the manager’s inability to forecast accurately operational input-output relationships due to uncertainty inherent in the task. In addition, the organization may operate in an environment characterized by uncertainty. The manager may respond by building a buffer against uncertainty in the environment or in the task into his or her budget estimate (Davila & Wouters, 2000).1

Advances in Accounting Behavioral Research, Volume 6, 145–169. Copyright © 2003 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06007-1

Alternatively, inaccuracy in budget estimates may be motivated by budget-constrained performance evaluation and reward systems (Jensen, 2001). Results of several studies in the accounting literature indicate that budget-constrained performance evaluation systems that emphasize variances in budget-to-actual results lead to budget gaming (Bart, 1988; Hopwood, 1972; Merchant, 1985; Walker & Johnson, 1999). One form of budget gaming that has been the focus of significant study is the creation of budgetary slack (Young & Lewis, 1995). Budgetary slack is defined as the intentional incorporation of budget amounts that make the budget easier to attain (Dunk, 1993). Budgetary slack is created when managers build excess resources into their budgets or knowingly understate their productive capabilities (Baiman & Evans, 1983; Young & Lewis, 1995). Budgetary slack is often manifested through overstated expenses or understated revenues and production plans (Kren & Liao, 1988).
While budgetary slack may play a positive role by facilitating flexibility in dealing with uncertainty (Cyert & March, 1963; Van der Stede, 2000), this paper focuses on the alternative negative role budgetary slack plays when budgets are used to set targets for performance evaluation.2 Budgetary slack created when budget estimates are intentionally set at a level that is easy to attain can be detrimental to management control system effectiveness, especially when responsibility center managers are held accountable for meeting budget targets and these targets are used to coordinate activities between organizational divisions and to compensate managers for high performance. According to Jensen (2001) and Murphy (2000), the typical pay-for-performance compensation contract includes a fixed salary plus a bonus increasing in performance above a pre-specified budget target. When a manager is compensated under this type of contract, holds private information about the productive capability of his/her division and participates in setting his/her own budget target, incentives for slack creation exist. Consequently, this type of contract has been labeled slack-inducing (Waller, 1988). A significant stream of research has developed using the agency framework to test the ability of other forms of budget-based incentive contracts to encourage managers to reveal their private information while limiting the amount of budgetary slack managers create (Baiman, 1982). These types of contracts have been labeled truth-inducing (Waller, 1988). Truth-inducing contracts typically include a penalty for performance that differs from a participatively set budget target (Weitzman, 1976). Although theoretically sound, truth-inducing contracts are rarely used in practice (Baker et al., 1988), perhaps because the costs of implementation outweigh their benefits.
An alternative to truth-inducing contracts generating slack-reducing effects would therefore be valuable. The objective of the present study is to determine whether the utilization of fair contracting processes combined with an otherwise slack-inducing incentive contract provides a feasible alternative.3 For the purposes of this study, fair contracting processes are defined according to procedural justice theory (Leventhal, 1980; Lind & Tyler, 1988). Procedural justice theory suggests organizational members will perceive a process to be fairer the greater the degree to which the decision-maker creates a “positive atmosphere of cooperation and compromise” even when “. . . the values, desires and concerns of the decision-maker and affected parties may not always agree” (Hunton, 1996, p. 650). This paper describes the results of an experiment in which subjects performed a production task and received compensation under a budget-based incentive contract of either the slack-inducing or truth-inducing form. In addition, subjects received information about a contracting process designed to be either fair or unfair. Results indicated subjects compensated under the slack-inducing contract and assigned to the fair contracting process condition created significantly less budgetary slack than subjects assigned to the unfair contracting process condition. While subjects compensated under the truth-inducing contract, on average, created less slack than subjects compensated under the slack-inducing contract, the fairness of the contracting process had no effect on the amount of slack they created. The remainder of the paper is organized as follows. In the next section, hypotheses are developed followed by a description of the experimental design and experimental method. The results of the statistical analyses are then reported followed by discussion of the experimental findings, their limitations and their implications for future research. 
RELATED LITERATURE AND HYPOTHESIS DEVELOPMENT

The incentive contracting literature in accounting has developed in response to the need for incentive contracts that motivate subordinates to be productive and to truthfully communicate local private information to improve centralized allocation and coordination decisions. In general, these studies recommend the use of budget-based incentive contracts and participation in budgeting to motivate managers to set accurate budget targets (Demski & Feltham, 1978; Melumad & Reichelstein, 1989; Namazi, 1985). A major concern of this literature is that participation in setting budget targets allows for information sharing, but also increases the potential for the creation of budgetary slack if managers are then compensated based on meeting or exceeding the budget that was participatively set (Antle & Eppen, 1985). Truth-inducing contracts have been constructed to address this problem. Truth-inducing contracts impose a penalty for misrepresentation, usually scaled by the difference between budgeted and actual performance, providing an incentive for subordinates to reveal their private information through the budget targets they set (Kirby et al., 1991; Weitzman, 1976). The particular form of truth-inducing contract studied here was developed theoretically by Reichelstein and Osband (1984) and adapted to the budgeting context by Kirby et al. (1991). The contract was further adapted by Kirby (1992) to a context in which the manager selects a budget target and focuses effort on maximizing output to meet or exceed that target. The contract is of the following form:

$$H(A, B) = v(B) + w(B)(A - B)$$

subject to $v(B)$ being increasing and convex ($v' > 0$, $v'' > 0$) and $w(B) = v'(B)$ for all $B$. In this context, $H(A, B)$ represents the manager’s total compensation, $B$ represents the productivity estimate (or budget target) for the period, and $A$ represents the actual level of productivity for the period.
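The truth-inducing property of this contract form can be sketched numerically. The example below assumes a hypothetical payment schedule v(B) = B²/100, which is increasing and convex for positive B, and sets w(B) equal to its derivative as the contract requires. For contrast it also includes a hypothetical slack-inducing contract (fixed salary plus a bonus for output above budget) with made-up parameters. It treats actual productivity A as known to the manager when the budget is chosen, which is a simplification; none of the numbers come from the study.

```python
def v(budget):
    """Hypothetical ex ante payment: increasing and convex for budget > 0."""
    return budget ** 2 / 100.0


def w(budget):
    """Bonus/penalty rate, set equal to the derivative of v as the contract requires."""
    return 2 * budget / 100.0


def truth_inducing_pay(actual, budget):
    """H(A, B) = v(B) + w(B)(A - B)."""
    return v(budget) + w(budget) * (actual - budget)


def slack_inducing_pay(actual, budget, salary=10.0, bonus_rate=0.5):
    """Hypothetical contract: fixed salary plus a bonus for output above the budget."""
    return salary + bonus_rate * max(0.0, actual - budget)


ACTUAL = 50                 # productivity the manager privately expects (assumed)
BUDGETS = range(30, 71, 5)  # candidate budget targets the manager could report

# Budget target that maximizes pay under each contract, holding ACTUAL fixed.
best_truth = max(BUDGETS, key=lambda b: truth_inducing_pay(ACTUAL, b))
best_slack = max(BUDGETS, key=lambda b: slack_inducing_pay(ACTUAL, b))

print("truth-inducing contract: best budget =", best_truth)
print("slack-inducing contract: best budget =", best_slack)
print("slack under slack-inducing contract =", ACTUAL - best_slack)
```

Under these assumptions the truth-inducing schedule pays most when the reported budget equals actual productivity, because a convex v lies on or above each of its tangent lines, whereas the bonus-only contract pays most at the lowest budget on offer.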
The manager’s total compensation (H) is therefore made up of an ex ante payment, v(B), and a bonus or penalty, w(B)(A − B), whose value depends on the variance between budget and actual performance. The truth-inducing properties of this contract have been tested empirically by Kirby (1992), Reichelstein (1992), and Chow et al. (2000). While the theoretical design of the contract relies on the assumption that managers are strict utility maximizers, Kirby (1992) finds the contract maintains its truth-inducing properties even when this assumption is relaxed. Reichelstein (1992) reports a successful application of this contract form by the German Department of Defense. Finally, Chow et al. (2000) experimentally test several mechanisms designed to motivate truthful upward communication of private information including this contract form. They find this truth-inducing contract led to significantly less misrepresentation of private information than a slack-inducing linear profit sharing scheme. Accordingly, in the context of the current study, individuals compensated under this form of truth-inducing contract are expected to create a relatively low amount of budgetary slack. This prediction is stated formally as follows:

H1. Individuals compensated under a truth-inducing contract will create less budgetary slack than individuals compensated under a slack-inducing contract.

Although truth-inducing contract forms have been widely examined in the academic literature, the feasibility of implementing truth-inducing contracts in real organizations has been challenged on the basis of complexity (Atkinson, 1985; Jennergren, 1980; Loeb & Magat, 1978) and cost (Evans et al., 2001; Evans & Sridhar, 1996; Luft, 1994). For example, Evans and Sridhar (1996) find the tighter the controls placed on the manager, the higher the risk premium the manager will demand before accepting a particular contract.
Evans et al. (2001) find contracts derived on the assumption that all managers will create budgetary slack are costlier than contracts developed assuming some managers derive utility from honestly revealing their private information. In addition, Luft (1994) finds individuals prefer bonus-framed to penalty-framed contracts for several reasons including the perception that bonus-framed contracts are fairer, more rewarding, and more motivational than penalty-framed contracts. Consequently, employees may demand higher (i.e. more costly to the employer) potential payoffs from penalty-framed truth-inducing contracts than bonus-framed slack-inducing contracts to exert the same amount of effort. If this is the case, then an alternative method of reducing the amount of budgetary slack subordinates create while continuing to motivate them under a bonus-framed budget-based incentive contract would be useful. Empirical results of prior studies reviewed below suggest the use of fair processes in allocating organizational resources may provide such an alternative.

Fair Process and Social Exchange Theory

Social exchange theory defines behavior in terms of two types of exchange, economic exchange and social exchange (Blau, 1964). Economic exchange motivates behavior intended to fulfil the formal economic employment contract. Employers offer a “fair day’s pay” and expect employees to provide a “fair day’s work.” Social exchange, on the other hand, is based on a psychological or implicit contract that defines obligations on the part of both the organization and the employee (Rousseau & Parks, 1993). Employees may go beyond the specific duties laid out in the employment contract if they feel the organization “values their contributions and cares for their general well-being” (Eisenberger et al., 1990, p. 51). The division of employee behavior into these two separate, but related categories mirrors the theoretical predictions of organizational justice theory (OJT).
OJT suggests individuals’ overall perceptions of fairness in an organizational setting are based on the combination of judgments about the fairness of the actual amount of resources allocated by the organization to subordinates, known as distributive fairness, and judgments about the fairness of the processes used to make allocation decisions, known as procedural fairness (Folger & Cropanzano, 1998). Employees’ judgments about the fairness of the actual amount of organizational resources distributed to them may be related to the economic exchange relationship between the employee and the organization, while judgments about the fairness of the allocation processes may be related to the social exchange (non-economic) relationship.

Prior research has demonstrated the relationship between procedural fairness and social exchange. Specifically, when organizational processes and procedures are perceived by the employee to be fair, positive consequences result including subordinates’ increased trust in superiors, increased commitment to organizational goals, increased willingness to act in the organization’s best interests, reduced turnover, and improved performance (Kim & Mauborgne, 1993; Moorman et al., 1998; Naumann & Bennett, 2000). Naumann and Bennett (2000) report the results of a survey of bank employees in which they find a significant positive correlation between procedural justice climate, organizational commitment and extra-role behavior. Procedural justice climate is defined as an organizational climate with a high emphasis on the fairness of organizational procedures. Employees who perceived a high procedural justice climate performed significantly more tasks that were in the organization’s best interests although these tasks fell outside of their employment contracts. Moorman et al. (1998) study the link between procedural fairness, perceived organizational support, and organizational citizenship behaviors.
Their results support the hypothesis that procedural fairness leads to an increase in subordinates’ perceptions of a supportive organizational environment. In addition, they found that when subordinates felt their organization was supportive of them and valued their contributions, subordinates reciprocated by increasing extra-role behaviors. Kim and Mauborgne (1993), in a longitudinal study of multinational organizations, found divisional managers who perceived they had been fairly treated by head office in resource allocation decisions were more likely to comply with head office requests and reported higher commitment to the organization and higher trust in head office management than managers who reported being unfairly treated. These positive outcomes may be a result of subordinates reciprocating the fair treatment they have received from the organization.

While these studies consider the effect of fair and unfair organizational processes on subsequent behavior, they do not consider the organizational consequences of combining fair or unfair processes with various incentive contract forms. In the current study, the consequence of combining fair and unfair contracting processes with budget-based incentive contracts of the slack-inducing and truth-inducing forms is examined. When both contract and process are considered, a reduction in budgetary slack created when contracting processes are perceived as fair may indicate subordinates’ preference for non-economic benefits derived from reciprocating the fair treatment they have received from their organization over economic benefits that may be derived from slack-creation. An interesting question then becomes under what conditions will economic and non-economic benefits be traded off? This question is addressed to some degree in the accounting literature by Luft (1994) and in the economics literature by Fehr et al.
(2001) and Fehr and Gachter (2002). These studies are reviewed below. Luft (1994) argues that bonus-framed contracts, like the slack-inducing contract considered here, provide employees with non-monetary payoffs in the form of approval and appreciation which are not communicated by penalty-framed contracts. Approval and appreciation are non-monetary payoffs received through the social exchange relationship between the employee and the organization. Penalty-framed contracts, on the other hand, tend to focus the employee’s attention on the monetary, or economic exchange relationship between the employee and the organization. In addition, Luft (1994) suggests employers allow bonus-framed contracts to remain purposefully incomplete in terms of the amount of bonus that will be received. That is, the employee trusts a bonus will be paid at some stipulated time in the future assuming the employee exerts a pre-determined level of effort. The nature of this “trust contract” implies a longer term relationship between the employee and the organization. Fehr et al. (2001) test the use of fairness as an enforcement device when contracts are left incomplete. They demonstrate, both theoretically and experimentally, that an incomplete bonus-based contract can be more efficient in motivating agents to exert effort than a more complete, penalty-based contract because the incomplete contract leaves room for reciprocity between the principal and the agent. They find fair treatment is reciprocated under the bonus-based incomplete contract, even in a one-period world where the principal contracts with the agent only once and therefore has an economic incentive not to pay the promised bonus. In the context of the current study, individuals compensated under a bonus-based slack-inducing contract have the opportunity to act in their own best interests by misinforming the organization of their actual productive capability (i.e. creating budgetary slack). 
The individual receives a short-term economic benefit from doing so in the form of a larger bonus. The organization is worse off if the inaccurate information provided is also used for planning and coordination across the organization. When budgeting processes are fair, the literature reviewed above suggests the bonus-framed slack-inducing contract allows room for reciprocity between the employee and the organization. Consequently, the social exchange relationship between the individual and the organization becomes salient under a bonus-framed slack-inducing contract and individuals should therefore respond to fair budgeting processes by creating less budgetary slack. The opposite effect will occur when budgeting processes are unfair; that is, employees who perceive budgeting processes to be unfair will reciprocate this unfair treatment by acting in their own rather than the organization’s best interests by creating a relatively high amount of budgetary slack.

Penalty-framed truth-inducing contracts tend to be completely specified in economic terms at the beginning of the period due to difficulties in enforcing the penalty after the fact (Luft, 1994). As a result, a shorter-term economic exchange relationship between the individual and the organization may become salient under the penalty-framed truth-inducing contract. If so, procedural fairness becomes less important and employees may then focus on the economic benefits obtainable in the current period to a greater degree than on any future benefits that may accrue. Fehr and Gachter (2002) refer to this effect as a “crowding out” of agents’ incentives to voluntarily cooperate. Results of their experimental study indicate that incentive contracts that include a penalty for shirking (i.e.
the agent provides less than the agreed-upon level of effort) are less efficient than a fixed-fee contract because they discourage agents from focusing on the longer term employment relationship and therefore, reduce the agent’s interest in reciprocating fair treatment. Thus, individuals compensated under the penalty-framed truth-inducing contract may not respond to the fairness or unfairness of budgeting processes when selecting a budget target because economic incentives imbedded in the contract will be most salient to them.

In summary, this review of the literature implies that the relation between fairness in contracting and budgetary slack creation is moderated by the form of budget-based incentive contract employed. That is, fairness in contracting will influence the amount of budgetary slack individuals create when they are compensated under a slack-inducing, but not a truth-inducing incentive contract. This line of reasoning leads to the following hypothesis:

H2. When a slack-inducing contract is employed, budgetary slack will be lower when the contracting process is fair than when the contracting process is unfair; however, when a truth-inducing contract is employed, budgetary slack will be low irrespective of the contracting process employed.

EXPERIMENTAL DESIGN AND METHOD

Participants

The hypotheses were tested in an experiment in which contract type (slack-inducing vs. truth-inducing) and contracting process (fair vs. unfair) were manipulated between subjects in a 2 × 2 full-factorial design. Subjects were recruited from those enrolled in a required first-year undergraduate business course. A total of 181 students took part in the experiment (96 male and 85 female) in 12 small groups of 12 to 20 subjects. Subjects within each group were randomly assigned to one of the four experimental conditions.
The study took approximately forty minutes to complete and the twelve groups took part in the study over a one-week period. To control for information leakage over that week, subjects were asked not to discuss the details of the experiment with their peers and were debriefed and received feedback on their performance only after all subjects had completed the experiment.

Experimental Task

Subjects acted as employees of the translation division of a book publisher. Subjects performed a production task that involved translating symbols into alphabetic characters using a translation key. The symbols were grouped into words of different lengths and were presented to subjects on worksheets with ten words per page. The words were groups of symbols that did not represent actual English words. The task is a variation of the task developed by Chow (1983) and adapted by Waller (1988).4 While uncertainty was designed into the experimental task, the level of uncertainty was controlled across experimental conditions. Specifically, the lengths of words appearing on each worksheet page were varied between five and nine symbols with the following probabilities: 0.15 for five-letter words, 0.50 for seven-letter words, and 0.35 for nine-letter words. Subjects were aware of this distribution when setting their budget targets.

Contract Type

Incentive contracts used to compensate subjects were based on the slack-inducing contractual form used by Waller and Chow (1985) and the truth-inducing contractual form developed by Kirby et al. (1991). Subjects earned tickets that were entered in a raffle for one of twelve cash prizes of $150. The more tickets subjects earned, the greater their chance to win one of these prizes. The two types of incentive contracts were operationalized as follows:5

Slack-inducing contract

$$\text{Payment} = \begin{cases} 3 \text{ tickets} + 0.30 \text{ tickets} \times (\text{Actual} - \text{Budget}) & \text{if Actual} > \text{Budget}, \\ 3 \text{ tickets} & \text{if Actual} \le \text{Budget}. \end{cases}$$

Truth-inducing contract

$$\text{Payment} = \frac{(\text{Budget})^2}{100} + \frac{2(\text{Budget})}{100}(\text{Actual} - \text{Budget})$$

Subjects assigned the truth-inducing contract were provided a table in which the total compensation under this contract for various pairs of budgeted and actual outcomes was calculated. This table is reproduced in Appendix A. All subjects were given sample budget and actual amounts and asked to calculate the related compensation that would be received. They then checked these calculations to ensure that they understood the relationship between their payment, their budget and their actual performance.

Fair and Unfair Contracting Processes

The fair and unfair contracting processes were operationalized through scenarios designed to reflect allocation processes and procedures prior research has indicated most individuals would judge to be fair or unfair (see for example, Lind et al., 1990; Moorman et al., 1998; Naumann & Bennett, 2000). Specifically, judgments of procedural fairness in this study depended on allowing individuals the opportunity to voice their opinions during the allocation process (Hunton & Beeler, 1997; Lind et al., 1990) and on procedures that would encourage subjects to evaluate the manager as trustworthy (Tyler, 1989; Tyler & Lind, 1992).6 In addition, the procedures were designed to ensure that decision making was unbiased (Greenberg, 1986) and that the decision was based on full and complete information (Greenberg, 1987). In general, subjects were told a new incentive contract had been implemented in the division in which they worked. Subjects were asked to assume they were new to the organization, but had collected the information included in the scenario during the course of their employment to date. The scenarios used to operationalize the fair and unfair contracting processes are included in Appendix B.
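The two payment rules described under Contract Type can be sketched in a few lines of Python. This is an illustrative sketch, not the experimental software: the function names, the helper `best_budget`, and the example output of 25 words (the pre-test average reported in the notes) are assumptions chosen for the illustration, and the sketch treats actual output as known with certainty and unaffected by the budget chosen.

```python
def slack_inducing_payment(actual: float, budget: float) -> float:
    """Tickets earned: 3 fixed, plus 0.30 per word produced above budget."""
    bonus = 0.30 * (actual - budget) if actual > budget else 0.0
    return 3.0 + bonus


def truth_inducing_payment(actual: float, budget: float) -> float:
    """Tickets earned: Budget^2/100 + (2*Budget/100) * (Actual - Budget)."""
    return budget ** 2 / 100 + (2 * budget / 100) * (actual - budget)


def best_budget(payment, actual, candidates):
    """Hypothetical helper: the candidate budget that maximizes payment
    for a fixed, known level of actual output."""
    return max(candidates, key=lambda b: payment(actual, b))


# Assumed example: a subject who will translate exactly 25 words.
ACTUAL = 25
BUDGETS = range(0, 39)  # candidate budget targets, 0 to 38 words

slack_choice = best_budget(slack_inducing_payment, ACTUAL, BUDGETS)
truth_choice = best_budget(truth_inducing_payment, ACTUAL, BUDGETS)
print("Pay-maximizing budget, slack-inducing contract:", slack_choice)
print("Pay-maximizing budget, truth-inducing contract:", truth_choice)
```

Under these assumptions the slack-inducing rule pays most for the lowest possible budget, while the truth-inducing rule pays most when the budget equals actual output, which is the economic contrast the experiment relies on.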
Experimental Procedures

Subjects first completed a five-minute practice period to become familiar with the translation task. They earned a piece rate of one raffle ticket for every three words correctly translated. At the end of this practice period, the subjects verified their work and calculated the number of words that they had correctly translated in the practice period.7 After practicing the task and being informed of the probability distribution of words of different lengths, but before experimental manipulations were introduced, subjects recorded their best estimate of next period performance; that is, their best estimate of the number of words they expected to be able to translate if given another five minutes in which to work. Subjects placed this completed Best Estimate of Production sheet in an envelope and sealed it. Subjects kept this sealed envelope with them until the experiment was complete and consequently, this information was unknown to the researcher until subjects had completed the experiment.

Subjects then read a description of the incentive contract under which they were to work and the information about the fair or unfair contracting process. They provided the researcher, acting as the division manager, with the budget they wished to use in calculating the number of tickets earned in the work period. Subjects were told the budget would also be used by the division manager to co-ordinate production between divisions. Information asymmetry was controlled at a relatively high level by informing all subjects they were new to the organization and their manager was therefore unsure of their productive capability and did not have access to the Best Estimate of Production forms. Subjects wrote down their budgets and then performed the translation task for five minutes. The third part of the experiment involved filling out a post-experimental questionnaire.
The experimental materials were then collected and one week later, subjects received a performance report and the tickets that they had earned. Tickets were collected and placed in a container from which one of the subjects drew a winning ticket in each group. A cash prize of $150 was paid to the winning subject in each group and the goals of the experiment were discussed.8 These experimental procedures are summarized in Fig. 1.

Dependent Variable – Budgetary Slack

The dependent variable was the amount of budgetary slack subjects created. Slack was measured as the difference between the best estimate of next period performance subjects provided before they were given contract and process information (i.e. prior to the introduction of the experimental manipulations) and the budget subjects set after the incentive contract and process information was provided to them. The pre-manipulation best estimate of next period performance proxied for subjects’ private information about their own productive capability. This information was known only to the subject until the experiment was complete. Budget slack should therefore represent the intentional understatement of subjects’ productive capabilities motivated by the budget-based incentive contract and/or the contracting process employed. This method of measuring budget slack is similar to the method used in prior experimental studies including Young (1985), Waller (1988), and Chow et al. (1988).

[Fig. 1. Experimental Procedures.]

Twelve subjects, approximately equally distributed across cells, failed to provide information necessary to calculate budget slack and were therefore dropped from the final sample. In addition, twenty-eight subjects (twenty-three of whom were assigned the truth-inducing contract) unexpectedly chose a budget target higher than their expected future performance.
The economic incentives imbedded in the budget-based contracts used in this study were meant to encourage subjects to choose a budget lower than (slack-inducing case) or equal to (truth-inducing case) their own best estimate of next period’s performance in order to maximize their compensation. It is unlikely that subjects misunderstood the incentives imbedded in the contracts given they were trained in the effects of varying levels of budget and actual performance under the specific contract assigned to them. An analysis of data collected in the post-experimental questionnaire indicates this result may be due to differences in subjects’ risk tolerance. Following Young (1985), subjects’ attitudes toward risk were measured through their responses to a standard gamble. The mean score on this risk aversion measure for these 28 subjects was 0.55 (std. dev. = 0.28; theoretical range between 0 and 1) compared to a mean score of 0.65 (std. dev. = 0.20) for all other subjects taking part in the experiment. The difference between these means is significant, F(1, 167) = 4.78 (p < 0.05), indicating these 28 subjects were significantly less risk averse than other subjects taking part in the experiment. The effect is most pronounced in the truth-inducing contract condition. This group of subjects appears to be responding to incentives other than those anticipated and consequently, they were dropped from the sample leaving a final usable sample of one hundred and forty-two subjects.
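The slack measure and the two screening rules described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study; the function names and the four example records are hypothetical.

```python
from typing import List, Optional, Tuple

# (best estimate before manipulation, budget set after manipulation)
Record = Tuple[Optional[int], Optional[int]]


def budgetary_slack(best_estimate: Optional[int],
                    budget: Optional[int]) -> Optional[int]:
    """Slack = pre-manipulation best estimate minus the budget chosen.
    Returns None when either figure is missing."""
    if best_estimate is None or budget is None:
        return None
    return best_estimate - budget


def usable_slack(records: List[Record]) -> List[int]:
    """Apply the two screening rules described in the text."""
    kept = []
    for best_estimate, budget in records:
        slack = budgetary_slack(best_estimate, budget)
        if slack is None:
            continue  # information needed to compute slack is missing
        if slack < 0:
            continue  # budget set above the subject's own best estimate
        kept.append(slack)
    return kept


# Hypothetical records, for illustration only.
records = [(25, 22), (30, 30), (None, 20), (24, 27)]
usable = usable_slack(records)
print(usable)
```

In this made-up example the third record is dropped for missing data and the fourth because the budget exceeds the best estimate, leaving slack values of 3 and 0 words.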
RESULTS

Manipulation Check for Contracting Process

To ensure subjects assigned the scenarios designed to represent fair and unfair incentive contracting processes actually perceived these processes to be fair or unfair respectively, subjects were asked to answer the following questions on a scale of one (completely unfair) to five (completely fair): “How fair would you judge the procedures used to set the formula on which your earnings were based?” and “How fair would you judge the process of setting the budget used to calculate your earnings?” These questions were based on measures reported in Tyler and Lind (1992).9 Each subject’s score was their mean score across the two questions included in the scale. The overall mean score on this scale was 3.60 (std. dev. = 0.81, Cronbach’s alpha = 0.67). Means and standard deviations for perceived fairness of the contracting process are presented in Table 1, Panel A.

A 2 × 2 analysis of variance was performed on subjects’ perceptions of the fairness of the contracting process (see Table 1, Panel B). Results indicated a significant difference in subjects’ perceptions of the fairness of the contracting process depending on whether they read the scenario describing the contracting process designed to be fair or unfair, F(1, 138) = 5.75 (p < 0.05), but perceptions of fairness did not differ depending on contract type, F(1, 138) = 0.38, or the contract by process interaction, F(1, 138) = 1.08, indicating subjects responded to the manipulation of process fairness as expected.10

Table 1. Procedural Fairness by Contract Type and Contracting Process.

Panel A: Mean (Standard Deviation) of Procedural Fairness

                             Slack-Inducing Contract   Truth-Inducing Contract   Marginals
Fair contracting process     3.71 (0.63), n = 42       3.77 (0.79), n = 31       3.73 (0.70), n = 73
Unfair contracting process   3.41 (0.99), n = 41       3.56 (0.75), n = 28       3.47 (0.90), n = 69
Marginals                    3.55 (0.84), n = 83       3.67 (0.77), n = 59       3.60 (0.81), n = 142

Panel B: Analysis of Variance of Procedural Fairness

Source                SS       df    MS     F
Contract type         0.30     1     0.30   0.38
Contracting process   4.48     1     4.48   5.75**
Contract × process    0.84     1     0.84   1.08
Error                 107.46   138   0.78

** p < 0.05.

In the fair contracting process condition, perceived fairness may also have manifested itself as a felt social pressure to adhere to the existing norms or culture of fairness within this organizational division (Naumann & Bennett, 2000). This perspective may be implied in subjects’ response to a post-experimental question asking them to rate the fairness of the work environment. A 2 × 2 analysis of variance indicated a significant main effect of contracting process on subjects’ evaluation of the fairness of the work environment, F(1, 138) = 54.58 (p < 0.001), with subjects in the fair process condition rating the work environment as significantly fairer (mean = 3.34, std. dev. = 1.03) than subjects in the unfair process condition (mean = 2.35, std. dev. = 1.08). No significant differences were found based on contract form or the contract by process interaction.

Hypothesis Tests

A 2 × 2 analysis of variance with adjustment for non-orthogonality (regression approach) was used to determine the statistical significance of the differences in the mean amount of slack created in each experimental condition. The one hundred and forty-two usable observations were included in this analysis. Cell means for slack created by experimental condition are presented in Table 2 (Panel A) and Fig. 2. Results of the analysis of variance are presented in Table 2 (Panel B). Based on previous theoretical and empirical results, H1 predicted subjects assigned the truth-inducing contract would create less budget slack than subjects assigned the slack-inducing contract.
Results of the analysis of variance reveal a significant main effect of contract type, F(1, 138) = 26.99 (p < 0.001), indicating subjects assigned the slack-inducing contract created significantly more slack on average (3.12 words) than subjects assigned the truth-inducing contract (0.76 words). These results provide support for H1.

Table 2. Budgetary Slack by Contract Type and Contracting Process.

Panel A: Mean (Standard Deviation) of Budgetary Slack

                             Slack-Inducing Contract   Truth-Inducing Contract
Fair contracting process     2.40 (2.38), n = 42       0.80 (1.47), n = 31
Unfair contracting process   3.85 (3.97), n = 41       0.71 (1.49), n = 28

Panel B: Analysis of Variance of Budgetary Slack Creation

Source                SS       df    MS       F
Contract type         193.21   1     193.21   26.99***
Contracting process   15.85    1     15.85    2.21
Contract × process    20.44    1     20.44    2.86*
Error                 987.79   138   7.16

Panel C: Simple Effects Analysis

                                                       F      df   Significance
Fair vs. unfair process for slack-inducing contract   4.09   81   p < 0.05
Fair vs. unfair process for truth-inducing contract   0.06   57   p = 0.94

* p < 0.10. *** p < 0.001.

[Fig. 2. Mean Budgetary Slack by Experimental Condition.]

H2 predicted that less slack would be created under a fair contracting process than an unfair contracting process when the incentive contract was of the slack-inducing type. However, slack creation would be insensitive to the contracting process when the incentive contract was of the truth-inducing type. Results of simple effects analysis employed to test this hypothesis are presented in Table 2 (Panel C). No significant difference between slack created under the fair or unfair contracting process was found within the truth-inducing contract condition, F(1, 57) = 0.06.
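The figures reported for Table 2 can be cross-checked with simple arithmetic: each F ratio is the effect mean square divided by the error mean square, and each contract-type mean quoted in the text is the sample-size-weighted average of its two cell means. The sketch below uses only numbers printed in Table 2; the function and variable names are assumptions made for the illustration, and small discrepancies in the last digit reflect rounding in the published values.

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F statistic as the ratio of effect to error mean squares."""
    return ms_effect / ms_error


def weighted_mean(cells):
    """Mean across cells given (cell mean, cell n) pairs."""
    total_n = sum(n for _, n in cells)
    return sum(mean * n for mean, n in cells) / total_n


MS_ERROR = 7.16  # Table 2, Panel B
mean_squares = {
    "contract type": 193.21,
    "contracting process": 15.85,
    "contract x process": 20.44,
}
f_values = {name: f_ratio(ms, MS_ERROR) for name, ms in mean_squares.items()}

# Table 2, Panel A: (mean slack, n) for the fair and unfair process cells.
slack_inducing_mean = weighted_mean([(2.40, 42), (3.85, 41)])
truth_inducing_mean = weighted_mean([(0.80, 31), (0.71, 28)])

for name, value in f_values.items():
    print(f"F for {name}: {value:.2f}")
print(f"Slack-inducing mean: {slack_inducing_mean:.2f} words")
print(f"Truth-inducing mean: {truth_inducing_mean:.2f} words")
```

The recomputed values agree with the reported F statistics to within rounding and reproduce the 3.12 and 0.76 word averages cited for H1.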
Differences in the amount of slack created depending on the fairness of the contracting process were significant within the slack-inducing contract condition, F(1, 81) = 4.09 (p < 0.05), with subjects in the unfair contracting process condition creating significantly more slack than subjects in the fair contracting process condition. These results provide support for H2.

DISCUSSION

This study explores the relationship between fair and unfair contracting processes, budget-based compensation contracts, and the creation of budgetary slack. Prior research examines the effectiveness of a variety of forms of truth-inducing contracts in reducing budgetary slack. The current study contributes to the literature by examining the effectiveness of two specific contract forms when combined with fair or unfair contracting processes in reducing the amount of budgetary slack subjects create.

Results were as predicted. Consistent with results of prior studies in this area, subjects compensated under the truth-inducing contract created significantly less slack than subjects compensated under the slack-inducing contract. Subjects compensated under a slack-inducing contract and exposed to an unfair contracting process created more budgetary slack than subjects compensated under a slack-inducing contract and exposed to a fair contracting process. Finally, the amount of budgetary slack created by subjects compensated under the truth-inducing incentive contract was insensitive to the fairness or unfairness of the contracting process employed.

It is interesting to note that the benefit of fairness in contracting is realized only for the slack-inducing contract form. In addition, the truth-inducing contract form was more effective than the combination of the slack-inducing contract and fair contracting process in reducing slack creation.
The problem is that truth-inducing contracts do not appear to be widely utilized in practice, can be costly to implement, are considered less fair, and are overall less preferred than bonus-framed contracts like the slack-inducing contract studied here (Luft, 1994). Results of the present study indicate that fairness in contracting may represent a relatively effective alternative to the implementation of a truth-inducing contract. Whether organizations currently utilize this means of reducing slack creation behavior warrants further investigation. The generalizability of these results to real managers is limited by the use of student subjects. Consequently, the ability to transfer what has been learned here to managers in real organizations may be limited. While this threat to external validity cannot be ruled out, the elements of the theory of procedural justice on which the hypotheses are based are not manager-specific, but have been found to apply equally well in a variety of settings. In other words, individuals do not need managerial experience to be affected by the fairness treatment in the study described in this paper. In addition, the incentive contracts by which subjects were compensated are rooted in the basic economic premise that individuals prefer more money to less. These contracts should therefore retain their motivational qualities regardless of the degree of managerial experience of the subjects. Future research is also required to test directly the process by which fairness perceptions are translated into reductions in the amount of slack subordinates create. Cropanzano and Folger (1991) suggest perceptions of fairness lead to increases in organizational commitment which in turn lead to positive organizational outcomes. 
Future studies, perhaps in field settings, are required to determine whether organizational commitment also moderates the relationship between fair contracting process and the creation of budgetary slack in the incentive-contracting setting studied here.

NOTES

1. Task and environmental uncertainty are fundamental issues faced by managers (Thompson, 1967). Task uncertainty refers to the difficulty of the task, its degree of variability and the extent to which successful completion of the task depends on the successful completion of other tasks (Tushman & Nadler, 1978). Environmental uncertainty has many dimensions, the most important of which may be the degree to which the organization is connected to and relies on other entities in its environment for information and/or resources and the extent to which these other entities are undergoing change (Lawrence & Lorsch, 1967).
2. While it may be difficult to distinguish between slack created as a buffer against uncertainty and slack created to game the performance evaluation system in real organizations, the current study benefits from the control allowed in the laboratory environment. Specifically, in the laboratory setting described here, uncertainty is held constant across experimental conditions allowing for an analysis of slack created for budget gaming purposes.
3. While some attention has been paid to the fairness construct in previous accounting-related studies (Ehlen & Welker, 1996; Hunton & Gibson, 1999; Libby, 1999; Lindquist, 1995; Magner & Welker, 1994; Moser et al., 1995), a search of the literature failed to indicate any other studies examining the relevance of the fairness construct to the creation of budgetary slack.
4. Although this task is relatively simple, it is not unlike the typical simple, repetitive production task for which a piece-rate and/or bonus compensation would be paid in actual organizations in order to motivate performance.
The simplicity of the task means that it is easily understood by subjects and is easy for them to learn in a relatively short period of time. Therefore, the task gains in terms of experimental realism what may be lost in terms of mundane realism.
5. The compensation parameters in both the slack-inducing and truth-inducing contracts were set based on the results of a pre-test in which average output on the experimental task for subjects similar in background to those taking part in the experiment was 25 words translated in five minutes (minimum of 14 words, maximum of 38 words).
6. Voice is a generic term indicating the ability for subordinates to communicate their interests to their superiors in an organization in order to exert some influence over the decisions their superiors make (Folger, 1977). Budget participation could be viewed as a context-specific form of voice defined as the process by which managers communicate information about their productive capabilities to their superiors in order to influence the setting of targets in their budget-based incentive contracts (Kren, 1992).
7. Before subjects were paid, their practice period and work period performance was verified and their total compensation was recalculated.
8. Note that subjects’ probability of winning the prize is dependent not just on their own performance, but on the performance of other subjects in the group. Due to the one-period nature of the experiment and the setting in which the experiment took place, subjects had no opportunity to collude or act in any strategic way. Also, note that the perceived attainability of the target is important. If the target had been viewed as unattainable, subjects would have conserved energy by not performing the task and taking the fixed portion of the payment available under each of the incentive schemes.
No subjects took this strategy, indicating that the compensation scheme was motivational and that subjects viewed the target as difficult, but attainable.
9. This scale also includes questions about outcome fairness. The outcome-related questions were adapted from Tyler and Lind (1992) as “How would you judge the formula itself that will be used to calculate your earnings for the work period?” and “How fair would you judge the budget itself?” Perceptions of outcome fairness did not differ depending on contract type, F(1, 138) = 2.08, process, F(1, 138) = 1.50, or the contract by process interaction, F(1, 138) = 0.06.
10. As an additional check on subjects’ perceptions of fairness, subjects were asked to answer the following question: Think about the information you received about the negotiation process between the workers and managers in this organization that was involved in setting the earnings formula. On a scale of 1 to 5, where 1 means completely unfair and 5 means completely fair, how fair would you judge the negotiation process? Results of a 2 × 2 analysis of variance indicated a significant main effect of contracting process on subjects’ evaluations of the fairness of the negotiation process, F(1, 138) = 39.67, p < 0.001, with subjects in the fair process condition rating the negotiation process as fairer (mean = 3.71, std. dev. = 0.86) than subjects in the unfair process condition (mean = 2.68, std. dev. = 1.09). No significant differences were found based on contract form or the contract by process interaction.

ACKNOWLEDGMENTS

I would like to thank John Waterhouse, Bill Scott, Duane Kennedy, and Jane Webster for their guidance in the development and execution of this project.
I also wish to thank Glenn Feltham, Joseph Fisher, Kathryn Kadous, Kevin Kelloway, Robert Mathieu, Don Moser, Steve Salterio, participants at the 1999 Management Accounting Research Conference, and the accounting research workshops at HEC (Montreal) and the University of Alberta for their many helpful comments and suggestions. I gratefully acknowledge the School of Accountancy, University of Waterloo and CGA-Canada for their financial support of this project. Data available from the author upon request.

REFERENCES

Antle, R., & Eppen, G. D. (1985). Capital rationing and organizational slack in capital budgeting. Management Science (Feb), 163–174.
Atkinson, A. (1985). Truth-inducing schemes in budgeting and resource allocation. Cost & Management (May/June), 38–42.
Baiman, S. (1982). Agency research in management accounting: A survey. Journal of Accounting Literature, 1, 154–213.
Baiman, S., & Evans, J. H. (1983). Pre-decision information and participative management control systems. Journal of Accounting Research, 21, 371–395.
Baker, G. P., Jensen, M. C., & Murphy, K. J. (1988). Compensation and incentives: Practice vs. theory. Journal of Finance, 43(3), 593–617.
Bart, C. (1988). Budgeting gamesmanship. Academy of Management Executive, 285–294.
Blau, P. (1964). Exchange and power in social life. New York, NY: Wiley.
Chow, C. W. (1983). The effects of job standard tightness and compensation scheme on performance: An exploration of linkages. The Accounting Review, 58, 667–685.
Chow, C. W., Cooper, J. C., & Waller, W. S. (1988). Participative budgeting effects of truth inducing pay schemes. The Accounting Review, 63, 111–123.
Chow, C. W., Hwang, R. N., & Liao, W. (2000). Motivating truthful upward communication of private information: An experimental study of mechanisms from theory and practice. Abacus, 36(2), 160–179.
Cropanzano, R., & Folger, R. (1991). Procedural justice and worker motivation. In: R. M. Staw & L. W.
Porter (Eds), Motivation and Work Behavior (5th ed., pp. 131–143). New York, NY: McGraw-Hill.
Cyert, R. M., & March, J. G. (1963). A behavioral theory of the firm. Englewood Cliffs, NJ: Prentice-Hall.
Davila, T., & Wouters, M. (2000). Meeting budgets: Budget emphasis and the release of budgetary slack. Working Paper: Stanford University, Stanford, CA.
Demski, J., & Feltham, G. (1978). Economic incentives in budgetary control systems. The Accounting Review, 53, 336–359.
Dunk, A. S. (1993). The effect of budget emphasis and information asymmetry on the relation between budgetary participation and slack. The Accounting Review, 68(2), 400–410.
Ehlen, C. R., & Welker, R. B. (1996). Procedural fairness in the peer and quality review programs. Auditing: A Journal of Practice and Theory, 15(1), 38–52.
Eisenberger, R., Fasolo, P., & Davis-LaMastro, V. (1990). Perceived organizational support and employee diligence, commitment and innovation. Journal of Applied Psychology, 75, 51–59.
Evans, J. H., Hannan, R. L., Krishnan, R., & Moser, D. V. (2001). Honesty in managerial reporting. The Accounting Review, 76(4).
Evans, J. H., & Sridhar, S. S. (1996). Multiple control systems, accrual accounting, and earnings management. Journal of Accounting Research, 24(1), 45–65.
Fehr, E., & Gachter, A. (2002). Do incentive contracts crowd out voluntary cooperation? Institute for Empirical Research in Economics, Working Paper No. 34, University of Zurich.
Fehr, E., Klein, A., & Schmidt, K. M. (2001). Fairness, incentives and contractual incompleteness. CESifo Working Paper No. 445: Center for Economic Studies, Munich.
Folger, R. (1977). Distributive and procedural justice: Combined impact of voice and improvement on experienced inequity. Journal of Personality and Social Psychology, 35, 108–119.
Folger, R., & Cropanzano, R. (1998). Organizational justice and human resource management. Thousand Oaks, CA: Sage Publications.
Greenberg, J. (1986).
Determinants of perceived fairness of performance evaluations. Journal of Applied Psychology, 71(2), 340–342.
Greenberg, J. (1987). Reactions to procedural injustice in payment distributions: Do the means justify the ends? Journal of Applied Psychology, 72(1), 55–61.
Hopwood, A. G. (1972). An empirical study of the role of accounting data in performance evaluation. Journal of Accounting Research, 10, 156–182.
Hunton, J. (1996). Involving information system users in defining system requirements: The influence of procedural justice perceptions on user attitudes and performance. Decision Sciences, 27(4), 647–671.
Hunton, J., & Beeler, J. D. (1997). Effects of user participation in systems development: A longitudinal field experiment. MIS Quarterly, 21(4), 359–388.
Hunton, J., & Gibson, D. (1999). Soliciting user-input during the development of an accounting information system: Investigating the efficacy of group discussion. Accounting, Organizations and Society, 24, 597–618.
Jennergren, L. P. (1980). On the design of incentives in Soviet firms: A survey of some research. Management Science (Feb), 193–197.
Jensen, M. C. (2001). Corporate budgeting is broken – Let’s fix it. Harvard Business Review, 79(10), 94–101.
Kim, W. C., & Mauborgne, R. A. (1993). Procedural justice, attitudes, and subsidiary top-management compliance with multinationals’ corporate strategic decisions. Academy of Management Journal, 36(3), 502–526.
Kirby, A. J. (1992). Incentive compensation schemes: Experimental calibration of the rationality hypothesis. Contemporary Accounting Research, 8, 374–408.
Kirby, A. J., Reichelstein, S., Sen, P. K., & Paik, T. (1991). Participation, slack, and budget-based performance evaluation. Journal of Accounting Research, 29, 109–128.
Kren, L. (1992). Budgetary participation and managerial performance: The impact of information and environmental volatility.
The Accounting Review, 67(3), 511–526.
Kren, L., & Liao, W. M. (1988). The role of accounting information in the control of organizations: A review of the evidence. Journal of Accounting Literature, 7, 280–309.
Lawrence, P. R., & Lorsch, J. W. (1967). Organization and environment: Managing differentiation and integration. Boston: Graduate School of Business Administration, Harvard University.
Leventhal, G. S. (1980). What should be done with equity theory? In: K. J. Gergen, M. S. Greenberg & R. H. Willis (Eds), Social Exchange: Advances in Theory and Research. NY: Plenum Press.
Libby, T. (1999). The influence of voice and explanation on performance in a participative budgeting setting. Accounting, Organizations and Society, 24(2), 125–138.
Lind, E. A., Kanfer, R., & Earley, P. C. (1990). Voice, control and procedural justice: Instrumental and non-instrumental concerns in fairness judgments. Journal of Personality and Social Psychology, 59(5), 952–959.
Lind, E. A., & Tyler, T. R. (1988). The social psychology of procedural justice. NY: Plenum.
Lindquist, T. M. (1995). Fairness as an antecedent to participative budgeting: Examining the effects of distributive justice, procedural justice and referent cognitions on satisfaction and performance. Journal of Management Accounting Research, 7, 122–147.
Loeb, M., & Magat, W. (1978). Soviet success indicators and the evaluation of divisional management. Journal of Accounting Research (Spring), 103–121.
Luft, J. (1994). Bonus and penalty incentives: Contract choice by employees. Journal of Accounting and Economics, 18, 181–206.
Magner, N., & Welker, R. B. (1994). Responsibility center managers’ reactions to justice in budgetary resource allocation. Advances in Management Accounting (Vol. 3, pp. 237–253). Greenwich, CT: JAI Press.
Melumad, N. D., & Reichelstein, S. (1989). Value of communication in agencies. Journal of Economic Theory, 47, 334–368.
Merchant, K. A. (1985).
Budgeting and the propensity to create budgetary slack. Accounting, Organizations and Society, 10(2), 201–210.
Moorman, R. H., Blakely, G. L., & Niehoff, B. P. (1998). Does perceived organizational support mediate the relationship between procedural justice and organizational citizenship behavior? Academy of Management Journal, 41, 351–368.
Moser, D. V., Evans, J. H., III, & Kim, C. K. (1995). The effects of horizontal and exchange inequity on tax reporting decisions. The Accounting Review, 70(4), 619–634.
Murphy, K. J. (2000). Performance standards in incentive contracts. Journal of Accounting and Economics, 30(3), 245–278.
Namazi, M. (1985). Theoretical developments of principal-agent employment contract in accounting: The state of the art. Journal of Accounting Literature, 4, 113–163.
Naumann, S. E., & Bennett, N. (2000). A case for procedural justice climate: Development and test of a multilevel model. Academy of Management Journal, 43(5), 881–889.
Otley, D. T. (1985). The accuracy of budgetary estimates: Some statistical evidence. Journal of Business Finance and Accounting, 12(3), 415–425.
Reichelstein, S. (1992). Constructing incentive schemes for government contracts: An application of agency theory. The Accounting Review, 67, 712–731.
Reichelstein, S., & Osband, K. (1984). Incentives in government contracts. Journal of Public Economics, 24, 257–270.
Rousseau, D. M., & Parks, J. M. (1993). The contracts of individuals and organizations. In: L. L. Cummings & B. M. Staw (Eds), Research in Organizational Behavior (Vol. 15). JAI Press.
Thompson, J. D. (1967). Organizations in action. New York: McGraw-Hill.
Tushman, M. L., & Nadler, D. A. (1978). Information processing as an integrating concept in organizational design. Academy of Management Review, 3(3), 613.
Tyler, T. R. (1989). The quality of dispute resolution processes and outcomes: Measurement problems and possibilities. Denver University Law Review, 66, 419–436.
Tyler, T. R., & Lind, E. A. (1992).
A relational model of authority in groups. In: L. Berkowitz (Ed.), Advances in Experimental Social Psychology (Vol. 25, pp. 115–191). Academic Press.
Van der Stede, W. A. (2000). The relationship between two consequences of budgetary controls: Budgetary slack creation and managerial short-term orientation. Accounting, Organizations and Society, 25(6), 609–622.
Waller, W. S. (1988). Slack in participative budgeting: The joint effects of a truth-inducing pay scheme and risk preferences. Accounting, Organizations and Society, 87–98.
Waller, W. S., & Chow, C. W. (1985). The self-selection and effort effects of standard-based employment contracts: A framework and some empirical evidence. The Accounting Review, 60(3), 458–476.
Walker, K. B., & Johnson, E. N. (1999). The effects of budget-based incentive compensation scheme on the budgeting behavior of managers and subordinates. Journal of Management Accounting Research, 11, 1–28.
Weitzman, M. (1976). The new Soviet incentive model. Bell Journal of Economics (Spring), 251–257.
Young, S. M. (1985). Participative budgeting: The effects of risk aversion and asymmetric information on budgetary slack. Journal of Accounting Research, 23(2), 829–842.
Young, S. M., & Lewis, B. (1995). Experimental incentive contracting research in management accounting. In: R. H. Ashton & A. H. Ashton (Eds), Judgment and Decision-making Research in Accounting and Auditing (pp. 55–75). Cambridge, NY: Cambridge University Press.

APPENDIX A

Sample Payments Under the Truth-inducing Contract

Cells of the table below represent the number of tickets earned under different combinations of budgeted and actual performance. Diagonal cells were shaded to emphasize that the maximum payments would be earned when the budgets subjects selected were equal to their actual performance.
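To illustrate why the diagonal cells carry the largest payments, the following minimal sketch assumes a scheme of the general form proposed by Weitzman (1976), in which output up to the budget earns a base rate, output above the budget earns a lower rate, and shortfalls are penalized at a higher rate. The rate values and the function names `truth_inducing_pay` and `best_budget` are hypothetical and chosen only for illustration; they are not the ticket amounts used in the experiment. The candidate budgets span the 14 to 38 word range reported for the pre-test in Note 5.

```python
def truth_inducing_pay(budget: int, actual: int,
                       base: float = 2.0,
                       over: float = 1.0,
                       under: float = 3.0) -> float:
    """Pay under an assumed Weitzman-style scheme.

    The rates are hypothetical and must satisfy under > base > over > 0
    for the scheme to reward accurate budgets.
    """
    if actual >= budget:
        # Budget met: base rate on the budget, lower rate on the excess.
        return base * budget + over * (actual - budget)
    # Budget missed: base rate on the budget, larger penalty on the shortfall.
    return base * budget - under * (budget - actual)


def best_budget(actual: int, candidates) -> int:
    """Return the candidate budget that maximizes pay for a given actual output."""
    return max(candidates, key=lambda b: truth_inducing_pay(b, actual))


if __name__ == "__main__":
    budgets = range(14, 39)  # 14 to 38 words, the pre-test range in Note 5
    for actual in (20, 25, 30):
        print(actual, best_budget(actual, budgets))
```

With these assumed rates, under-budgeting forgoes the higher base rate and over-budgeting incurs the larger penalty rate, so for any realized output the payment peaks where the budget equals actual performance.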
APPENDIX B

Fair and Unfair Contracting Process Scenarios

Fair Process:

You have learned that your supervisor has held this supervisory position for many years. You have also noted that your supervisor appears to be very popular with your co-workers. Your supervisor’s philosophy is that the employees of the division are the experts when it comes to the work that they do and that much can be learned from listening to their suggestions. The formula that is used to calculate your earnings, as was described above, is a relatively new innovation within this division. The form of the contract was agreed upon approximately one year ago based on negotiations between representatives of the employee group and management. The management negotiation group was headed by your supervisor. Although you have been told that the negotiation process led to a degree of tension between your co-workers and your supervisor, your co-workers seem to be fully supportive of the contract as it now stands. You have been told by one of your co-workers, whose opinion you respect, that this is mainly due to strong communication between the employee and management groups during the negotiation process. You have also noticed that the majority of your co-workers with whom you have talked about the negotiation process feel that the management team was sincerely interested in their opinions about the earnings formula. Before the formula was finalized, the management team performed an informal poll of the employees who would be affected by it and found that the majority supported it. Whenever an issue came up on which there was disagreement, the worker and management groups were able to talk out their differences and come to a satisfactory solution, although the management group also offered to allow any unresolved issues to be passed on to an objective third-party decision maker. Many of the employees of this division have held positions within the division for many years.
While increasing their overall pay is, of course, very important to your co-workers, providing accurate budgets to management and increasing overall production efficiency in order to ensure the long-term survival of the organization also seems to be high on their list of priorities. You have heard four or five of them say that they would have to be given a pretty large raise in pay before they would be willing to move to a job in another division mainly due to the positive atmosphere between employees and managers in this division.

Unfair Process:

You have learned that your supervisor has held this supervisory position for many years. You have also noted that, although your co-workers are polite and do as the supervisor asks, he does not seem to be very popular with them. The supervisor’s philosophy is that employees should work hard to receive higher pay and leave all other decisions to him. You have been told that the supervisor feels that his long-term position as supervisor of the division makes him the best judge of how the work should be done and he is not really interested in receiving feedback or suggestions from the employees that he supervises. The formula that is used to calculate your earnings, as was described above, is a relatively new innovation within this division. The form of the contract was agreed upon approximately one year ago based on negotiations between representatives of the employee group and management. The management negotiation group was headed by your supervisor. You have been told that the negotiation process led to a great deal of tension between your co-workers and your supervisor. Your co-workers seem to be quite bitter about the contract as it now stands. You have been told by one of your co-workers, whose opinion you respect, that this is mainly due to the lack of communication between the employee and management groups during the negotiation process.
You have also noticed that the majority of your co-workers with whom you have talked about the negotiation process feel that the management team appeared to be completely uninterested in their opinions about the earnings formula. Before the formula was finalized, the employee group suggested that an informal poll be taken of employees who would be affected by it to measure their degree of support. This suggestion was ignored by the management group. Whenever an issue came up on which there was disagreement, the worker and manager groups found it difficult to come to a satisfactory solution and generally, the solution was imposed by the person in charge of the management group, who happens to have been your supervisor. Many of the employees of this division have held positions within the division for only a year or two. Receiving the highest possible earnings at the end of each work period seems to be of utmost importance to your co-workers. You have heard four or five of them say that they view their position as only a “stepping stone” to a better position within another division of the organization. Increasing production efficiency and the long-term health of the division by providing accurate budgets to management does not seem to be high on their list of priorities. A few of your co-workers have commented that they would not have to be given a very large raise in pay, or any raise at all, to convince them to move to a job in another division of the organization where the atmosphere between the workers and the supervisor was more positive.

A TOBIT ANALYSIS OF ACCOUNTING FACULTY PUBLISHING PRODUCTIVITY IN AUSTRALIAN AND NEW ZEALAND UNIVERSITIES

Brett R. Wilkinson, Chris H. Durden and Katherine J. Wilkinson

ABSTRACT

This study examines the research behavior of Australian and New Zealand accounting faculty to determine the characteristics that influence research productivity.
University reputations are integrally linked with research performance and determining the qualities that predict research behavior may be of particular value in the selection and recruitment process. The study finds that two key factors significantly impact performance: holding a Ph.D. and having an academe-oriented rather than profession-oriented background. These results may be interpreted as affirming the U.S. model of developing specialist academic researchers through doctoral education programs rather than employing faculty with strong professional experience.

Advances in Accounting Behavioral Research, Volume 6, 173–186. Copyright © 2003 by Elsevier Ltd. All rights of reproduction in any form reserved. ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06008-3

1. INTRODUCTION

Research is an integral function of any university and a key determinant of academic reputation (Baden-Fuller et al., 2000). Primarily, a university’s research productivity is measured by the quantity and quality of published outputs achieved by its faculty. In this sense publication output is an important method of assessing academic performance for promotion and tenure decisions (Englebrecht et al., 1994; Zivney, Bertin & Gavin, 1995). It enables research students, prospective faculty members, and the academic community in general to make better-informed decisions about the standing of a particular institution or department (Bairam, 1996; Cargile & Bublitz, 1986; Demski & Zimmerman, 2000; Hull & Wright, 1990). Additionally, the quantum of publishing activity may influence the level of funding a department receives (Doyle & Arthurs, 1995; Gray & Helliar, 1994). The importance of published research was aptly highlighted by Cargile and Bublitz (1986) with their reference to a statement by Davidson (1957, p.
117) that: The effectiveness or efficiency of a faculty is indeed difficult to measure, and I would not deny the important faculty function performed by the non-writing researcher. However, I think he is likely to be a relative “rare bird.” For the great majority of faculty members, it seems to me that we must continue to emphasize the place of research and publication in their programs. Only by this procedure can we hope to have accounting remain a vital and stimulating force in business education and management. Understanding the factors that impact the level of research (measured in publications) achieved by faculty members is very important from a university perspective. This is particularly relevant when recruiting new and inexperienced faculty, where the existing faculty must rely on indicators of possible future publication success rather than on an observed publication stream. The issue is even more salient given that “Most academics publish very little, or not at all” (Demski & Zimmerman, 2000, p. 346). However, research studies in accounting investigating factors that indicate future publishing output levels are relatively limited (e.g. Cargile & Bublitz, 1986; Gee & Gray, 1989; Gray & Helliar, 1994; Maranto & Streuly, 1994). These studies suggest that various factors, such as the institutional setting of the researcher and possessing a Ph.D., impact the level of research output. A difficulty with conducting this form of research is that only a relatively small number of the factors that are likely to influence publication output are “observable” (e.g. research interests, Ph.D. qualification). Other relevant factors are likely to be more difficult to accurately measure (e.g. ability, ambition) (Gray & Helliar, 1994). This study examines the research behavior of Australian and New Zealand accounting faculty to determine the characteristics that influence research productivity. 
In essence, the study asks what factors will predict the desired research behavior, namely papers published in quality academic journals. It builds on the work of Wilkinson and Durden (1998) and Durden et al. (1999) who measured research outputs in an attempt to enable comparisons of performances across universities. Those studies served to provide a basis for ranking university departments, but did not seek to explain in any comprehensive sense the observed differences between individual faculty performances.

This study develops a Tobit model to explain publishing output behavior. The findings indicate that two key factors contribute to publishing performance – holding a Ph.D. qualification and having an academe-orientation and background rather than an extensive professional background. Other indicators of publishing productivity were having stated research interests in the financial accounting, managerial accounting and auditing fields. This may also reflect a bias in the higher-ranked journals toward these areas of interest. That is to say, researchers may focus their research efforts in financial accounting, managerial accounting and auditing because the more highly ranked journals are more open to accepting research in these areas than in newer subdisciplines. This is consistent with Daigle and Arnold’s (2000) suggestion that many of the accounting information systems researchers are forced to develop and promote research interests in other subdisciplines because research in these other areas (financial, managerial and tax accounting) is more likely to result in the highly-ranked journal publications required for tenure purposes.

The remainder of the paper is organized as follows. Section 2 develops the hypotheses in the context of the extant literature. Section 3 outlines the model development and data analysis.
Results are shown in Section 4 and conclusions and limitations are discussed in Section 5.

2. HYPOTHESIS DEVELOPMENT

Based on an analysis of prior literature several important characteristics appear to impact research output. First, possessing a Ph.D. impacts research productivity. Since the Ph.D. comprises by definition an intensive research preparation process, a positive relationship likely exists between research productivity and possession of a Ph.D. degree. Arguments about the importance of the Ph.D. are often based on theories of human capital (Long et al., 1998; Maranto & Streuly, 1994). In this sense the Ph.D. provides students with higher levels of intellectual capital which should result in higher levels of research output and career success. This may exist among graduates from a range of Ph.D. programs rather than being restricted only to those with high status academic origins (Long et al., 1998). Other research has also indicated an association between holding a Ph.D. and research productivity (Gray et al., 1987; Gray & Helliar, 1994). The Australian and New Zealand context provides an opportunity to further explore the role of the Ph.D. because only a relatively small proportion of faculty in these two countries hold a Ph.D. At the time this study was undertaken only 24% of faculty members were Ph.D. qualified. H1, in the alternate form, is as follows:

H1. There is a positive relationship between Ph.D. qualification and research productivity.

Other factors that would seemingly be significant explanators of publishing productivity include ambition and motivation (Demski & Zimmerman, 2000; Gray & Helliar, 1994; Long et al., 1998; Maranto & Streuly, 1994). Because explicitly measuring these constructs is difficult, appropriate proxy measures must be sought.
Here, the tenure/position confirmation process and promotion structure should motivate publishing activity in the early years of employment (Demski & Zimmerman, 2000). This may decrease in importance as the number of years of employment at a given institution significantly increases. Furthermore, periods of low productivity are expected to impede promotion opportunities to other similarly ranked institutions (Zivney et al., 1995). Accordingly, H2 is stated as follows:

H2. There is an inverse relationship between years of employment at a given institution and publishing productivity.

Closely related to the above concepts of motivation and ambition is that of motivation for, or interest in, research (Demski & Zimmerman, 2000). Here, faculty with a predominantly professional background and focus may be less likely to prioritize the research process. These faculty members may have entered academe predominantly to become teachers rather than to pursue an interest in research. Various accounting faculty who have commenced in academe after spending considerable time in practice have commented on the attraction of teaching as a reason for making the career change (e.g. see Beresford, 2001; Meyer & Titard, 2000). Conversely, those with predominantly academic backgrounds who started a research career early are likely to have self-selected into such a career owing to a research orientation. This is also reinforced by the observation that increasing numbers of accounting faculty no longer hold professional qualifications (Newell et al., 1996; Otley, 2002). Accounting faculty pursuing an academic career from an early age are expected to have a strong research, rather than teaching or practice, orientation (Abdolmohammadi et al., 1985; Imhoff, 1988; Mautz, 1988). Further, in a non-U.S. context where the pursuit of the Ph.D. has been less prevalent, it is now increasingly expected that university staff should have a completed Ph.D.
as a prerequisite to an academic career (Blaxter et al., 1998). This also coincides with greater recognition of the pursuit of research, rather than teaching, as the primary purpose of a university career (Blaxter et al., 1998). The emphasis on research early in a university career is reflective of the theory of accumulative advantage where an “Initial advantage or disadvantage compounds over time. A premium is placed on a quick start, and ‘late blooming’ is penalized” (Maranto & Streuly, 1994, p. 388). Two measures are employed as proxies for research interest or orientation: membership of a professional accounting body, and employment background.1 H3 and H4 are stated as follows:

H3. There is an inverse relationship between research productivity and professional body membership.

H4. There is an inverse relationship between research productivity and substantial experience outside academe (five years or more).

Measures of faculty research interests are included as control variables (Gray & Helliar, 1994). No predictions are made as to the significance of these interests, although there is a possibility that highly ranked journal subdiscipline preferences may impact the extent to which a researcher can achieve highly ranked publications in a given field of interest (Hasselback et al., 2000). These measures serve as controls for journal biases toward certain fields. As a final control, a measure of the region of highest qualification is included. Researchers trained in the U.S. may have greater access to U.S. journals, many of which rank highly in a range of quality indexes (e.g. Brinn et al., 1996; Hasselback et al., 2000; Nobes, 1985). This measure may also proxy for research interest, since Australian and New Zealand accounting faculty who have chosen to pursue Ph.D. qualifications in the U.S.
are likely to have done so on the basis of an orientation toward research rather than toward the profession.2

3. MODEL DEVELOPMENT AND DATA ANALYSIS

The data used in this study are sourced from Wilkinson and Durden (1998) and Durden et al. (1999). These data were collected from several sources. The Jacaranda Wiley Directory of Accounting: 1998–1999 for Australia and New Zealand was used to derive names and basic data for all Australian and New Zealand accounting faculty, at the lecturer level and above.3 After deletion of non-accounting faculty4 included in the directory (e.g. business law and finance faculty included within accounting departments) and those below the rank of lecturer, the total sample size is 716 faculty members.5 The basic data collected include research interests, the length of time of employment at the current institution, qualifications held and prior employment details.

The research output of each faculty member for the five-year period 1992 to 1997 was derived from an electronic database (ABI Inform) and from hard copies of the Accounting and Tax Index. While these indexes provide a relatively comprehensive coverage of the literature, they do not include several key Australian and New Zealand journals. Failure to include these potentially biases the results since Australian and New Zealand accounting faculty might be expected to focus their publishing efforts in these journals. Accordingly, the data obtained from these indexes are supplemented with publications from all Australian and New Zealand accounting journals not included in either the Accounting and Tax Index or ABI Inform. Only research published in journals was utilized.6

The weighted measure of publication output developed by Wilkinson and Durden (1998) and Durden et al. (1999), which uses Zeff’s (1996) library holdings measures as a quality index, is used as a measure of research productivity.
This weighted measure adjusts each author’s allocation for the number of coauthors (half an article is attributed to each author of a published paper with two authors, one third to each for a published paper with three authors and so on). The measure also allocates a quality weighting based on Zeff’s (1996) library survey. Zeff (1996) conducted a survey of 12 major libraries, five in the U.S., five in the U.K. and two in Australia and measured the library holdings of 77 accounting research journals. Journals are then effectively given a rating from zero to 12 based on the number of the 12 libraries that held the journal. The study uses a quality weighting based on this rating (e.g. a journal held by all 12 libraries was weighted as 1 and a journal held by 6 of the libraries was weighted as 0.5). The Zeff (1996) results are comparable to studies using faculty surveys to measure quality. The 15 journals that were held by 11 or 12 of the 12 libraries reviewed by Zeff (1996) compared closely with the top journals identified using surveys of accounting faculty (e.g. Brown & Huefner, 1994; Hull & Wright, 1990). A key benefit of using Zeff’s ratings is that his research is international in nature and thus likely provides a more appropriate measure of quality for faculty in the Australian-New Zealand environment. The Zeff (1996) ratings for the journals in which faculty in this study had published are shown in Table 1. Interested readers are referred to Zeff (1996) for the ratings of the full 77 journal set. Descriptive statistics for the data used in the model are shown in Table 2. Although there may be a possibility of correlation amongst the independent variables, due to these variables proxying for underlying but unmeasurable qualities, there is no evidence of unreasonably high correlations, as shown in Table 3. 
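The weighting scheme described above can be illustrated with a minimal sketch. It assumes that the coauthor share (1/n for n authors) and the Zeff quality weight (libraries holding the journal divided by 12) are combined by multiplication; the text does not state the combination rule explicitly, so the product is an assumption. The function names and the sample articles are hypothetical and are not drawn from the study's data.

```python
from fractions import Fraction

N_LIBRARIES = 12  # Zeff (1996) surveyed 12 libraries


def article_credit(n_authors: int, libraries_holding: int) -> Fraction:
    """Credit one author receives for a single article.

    Assumption: coauthor share and quality weight are multiplied.
    """
    if n_authors < 1:
        raise ValueError("an article needs at least one author")
    if not 0 <= libraries_holding <= N_LIBRARIES:
        raise ValueError("library holdings must lie between 0 and 12")
    coauthor_share = Fraction(1, n_authors)                    # 1/2 for two authors, 1/3 for three
    quality_weight = Fraction(libraries_holding, N_LIBRARIES)  # 12/12 = 1, 6/12 = 0.5
    return coauthor_share * quality_weight


def weighted_publications(articles) -> Fraction:
    """Sum the credits over (n_authors, libraries_holding) pairs."""
    return sum((article_credit(n, h) for n, h in articles), Fraction(0))


# Hypothetical publication record: (number of authors, libraries holding the journal)
example_record = [(1, 12), (2, 6), (3, 12)]
total = weighted_publications(example_record)
print(total, float(total))
```

Exact fractions are used so that thirds and twelfths accumulate without rounding error; a journal held by none of the 12 libraries contributes zero under this scheme.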
The estimated model is as follows:

\[
\begin{aligned}
\text{Weighted publications} = {} & \beta_0 + \beta_1\,\text{years employed} + \beta_2\,\text{financial accounting} + \beta_3\,\text{managerial accounting} \\
& + \beta_4\,\text{auditing} + \beta_5\,\text{tax} + \beta_6\,\text{theory} + \beta_7\,\text{education} + \beta_8\,\text{other} \\
& + \beta_9\,\text{U.S. qualified} + \beta_{10}\,\text{Ph.D.} + \beta_{11}\,\text{membership} + \beta_{12}\,\text{academe} + \varepsilon
\end{aligned}
\]

where:
years employed = years employed at current institution;
financial, managerial, auditing, tax, theory, education and other = stated faculty areas of research interest;
U.S. qualified = a dummy variable taking the value of 1 if the highest educational qualification is from a U.S. institution and zero otherwise;
Ph.D. = a dummy variable taking the value of 1 if a Ph.D., DBA or D.Phil. qualification is held and zero otherwise;
Membership = a dummy variable taking the value of 1 if professional body membership is held and zero otherwise;
Academe = a dummy variable taking the value of 1 if the individual has less than 5 years of experience outside academe and zero otherwise.

Table 1. Ratings Derived from Zeff (1996) for Journals in the Sample.

Journal                                                                Rating
Abacus                                                                 12
Accounting and business research                                       12
Accounting and finance                                                 7
Accounting, auditing and accountability journal                        8
Accounting forum                                                       2
Accounting historians journal                                          10
Accounting history                                                     0
Accounting horizons                                                    12
Accounting, organizations and society                                  12
The Accounting review                                                  12
Advances in accounting                                                 7
Advances in international accounting                                   6
Advances in management accounting                                      3
Asian review of accounting                                             0
Auditing: A journal of theory and practice                             10
Australian accounting review                                           3
Behavioral research in accounting                                      7
British accounting review                                              11
Contemporary accounting research                                       11
Financial accountability and management                                6
The international journal of accounting                                11
Issues in accounting education                                         11
Journal of accounting and economics                                    12
Journal of accounting and public policy                                11
Journal of accounting, auditing and finance                            11
Journal of accounting education                                        6
Journal of accounting research                                         12
Journal of business finance and accounting                             12
Journal of cost management                                             9
Journal of international accounting, auditing and taxation             3
Journal of international financial management and accounting           7
Management accounting research                                         7
Pacific accounting review                                              3
Research in accounting in emerging economies                           1

Table 2. Descriptive Statistics for Variables Used in the Study.

Panel A: Continuous Variables
Variable                                 N      Mean    Standard Deviation   Minimum   Maximum
Weighted publications                    716    0.29    0.68                 0         5.625
Years employed at current institution    716    9.61    6.77                 0         39.000

Panel B: Dummy Variables
Interest                                   Frequency   Percent
Financial accounting research interest     160         22.4
Managerial accounting research interest    126         17.6
Auditing research interest                 84          11.7
Taxation research interest                 41          5.7
Accounting education research interest     68          9.5
Accounting theory research interest        17          2.4
Other research interests                   398         55.6
Qualifications from US                     30          4.2
Ph.D.                                      172         24.0
Professional membership                    551         77.0
Academic orientation                       466         65.1

Since a large number of faculty members had no publications during the period of measurement, there is a high proportion of zeros in the weighted publications measure. Thus, while data for the independent variables are available, the data for the dependent variable are of a censored nature. One possibility would be to estimate the model via OLS using only those faculty for whom the dependent variable is nonzero. However, as noted by Judge et al. (1988), this results in biased and inconsistent estimators. A more appropriate approach is to estimate a Tobit regression model. McDonald and Moffitt (1980, p.
318) identify the Tobit model as assuming “an underlying, stochastic index equal to $(X_t\beta + u_t)$ which is observed only when it is positive, and hence qualifies as an unobserved, latent variable.” They express the stochastic model as follows:

\[
Y_t =
\begin{cases}
X_t\beta + u_t & \text{if } X_t\beta + u_t > 0 \\
0 & \text{if } X_t\beta + u_t \le 0
\end{cases}
\qquad t = 1, 2, \ldots, N.
\]

A Tobit model is estimated via the SAS LIFEREG procedure, using the normal probability distribution for the error term. As noted by McDonald and Moffitt (1980), the estimated regression parameters cannot be interpreted in the usual sense. They will, however, enable us to ascertain the independent variables that significantly impact publishing performance. As noted later in the paper, the Tobit model can be interpreted as providing a probability of publishing measure for individuals with a given set of characteristics.

Table 3. Pearson Correlation Coefficients (p Values) Between Key Independent Variables.

                       Years      U.S. Qualifications   Membership           Ph.D.                 Academe
Years                  1.00000    −0.02101 (0.5747)     0.11230 (0.0026)     −0.04369 (0.2429)     0.01701 (0.6496)
U.S. Qualifications               1.00000               −0.06764 (0.0705)    0.15979 (<0.0001)     0.06543 (0.0802)
Membership                                              1.00000              −0.01835 (0.6241)     −0.13644 (0.0003)
Ph.D.                                                                        1.00000               0.11011 (0.0032)
Academe                                                                                            1.00000

4. ANALYSIS AND RESULTS

The results of the Tobit model estimation are shown in Table 4. A comparison of this model against an intercept only Tobit model indicates that the model is statistically significant, that is, that the estimated coefficients are jointly different from zero. The estimated model suggests that several factors are highly significant in determining weighted publications achieved. Consistent with H1, whether an individual holds a Ph.D. or not is a critical determinant in the level of weighted publications obtained. Those with Ph.D.s were substantially more likely to have achieved weighted publications than those without.
This lends support to the current trend in Australia and New Zealand toward requiring a Ph.D. for entry to accounting academia. It is also consistent with the U.S. experience where Ph.D.s have been required since the 1960s, a move that was expressly designed to foster greater research outputs within business schools.

The results also support H4, relating to an individual’s orientation toward academe or the profession. Essentially, whether an individual’s background included an extended time (5 years or more) of experience outside academia was significant in determining publications. Those who had no such experience (and were deemed to have an academic/research orientation) were significantly more likely to have achieved weighted publications than those who had a professional experience background. This raises some issues about the type of faculty that universities should seek to hire. Although there may be some tendency in Australia and New Zealand to attach value to individuals with professional experience, this study calls into question the extent to which such individuals will be likely to achieve quality research outputs, a critical determinant of a university’s reputation.

Table 4. Results of the Tobit Model Estimation.

Variable                 Parameter Estimate   Chi-square   p Value
Intercept                −0.67                57.02        <0.0001
Years employed           0.003                0.12         0.72
Financial accounting     0.57                 18.26        <0.0001
Managerial accounting    0.57                 13.63        0.0002
Auditing                 0.46                 7.44         0.006
Tax                      −0.42                2.00         0.16
Theory                   0.38                 1.30         0.25
Education                0.01                 0.00         0.96
Other                    0.17                 2.06         0.15
US qualified             0.11                 0.22         0.64
Membership               0.09                 0.48         0.49
Ph.D.                    1.29                 114.74       <0.0001
Academic                 0.55                 20.07        <0.0001
Scale (σ-hat)            1.15
Log likelihood           −606.08

Note: The estimated model is

\[
\begin{aligned}
\text{Weighted publications} = {} & \beta_0 + \beta_1\,\text{years employed} + \beta_2\,\text{financial accounting} + \beta_3\,\text{managerial accounting} \\
& + \beta_4\,\text{auditing} + \beta_5\,\text{tax} + \beta_6\,\text{theory} + \beta_7\,\text{education} + \beta_8\,\text{other} \\
& + \beta_9\,\text{US qualified} + \beta_{10}\,\text{Ph.D.} + \beta_{11}\,\text{membership} + \beta_{12}\,\text{academe} + \varepsilon
\end{aligned}
\]
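To make the censoring structure behind the Table 4 estimates concrete, the following sketch writes out the log-likelihood of a Tobit model that is left-censored at zero with normally distributed errors. It is an illustration only and is not the SAS LIFEREG routine used in the study; the function name, argument layout and the two-observation example are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for both pdf and cdf


def tobit_loglik(y, X, beta, sigma):
    """Log-likelihood of a Tobit model censored from below at zero.

    y     : observed outcomes (zero when censored)
    X     : list of regressor rows, each the same length as beta
    beta  : coefficient vector
    sigma : standard deviation of the normal error term (> 0)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for y_t, x_t in zip(y, X):
        index = sum(b * x for b, x in zip(beta, x_t))  # X_t * beta
        if y_t > 0:
            # Uncensored observation: normal density of the residual
            z = (y_t - index) / sigma
            total += math.log(_STD_NORMAL.pdf(z)) - math.log(sigma)
        else:
            # Censored observation: probability that the latent index is <= 0
            total += math.log(_STD_NORMAL.cdf(-index / sigma))
    return total


# Hypothetical data: one faculty member with no publications, one with output 1.0
y_example = [0.0, 1.0]
X_example = [[1.0, 0.0], [1.0, 1.0]]  # intercept column plus one dummy
ll_example = tobit_loglik(y_example, X_example, beta=[0.0, 1.0], sigma=1.0)
print(ll_example)
```

Maximizing this function over beta and sigma yields Tobit estimates. Zero-publication faculty contribute through the cumulative distribution term rather than being dropped, which is why the approach avoids the bias of fitting OLS to the nonzero observations alone.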
H2 (years of employment at current institution) and H3 (membership of professional body) were not supported. The failure of professional membership to explain productivity may be related to the fact that a high number of faculty hold such membership. Faculty may derive significant benefits from such membership (for example, insurance benefits) such that even faculty with a low professional interest may maintain membership. The insignificance of years of employment is surprising but may indicate that faculty with a strong research interest maintain that interest over time and may derive sufficient reward within their own institutions (Gray & Helliar, 1994).

Table 5. Probability Distribution for an Individual with 5 Years’ Employment, an Interest in Financial Accounting, with a U.S. Qualification and Classed as an Academic.

                 Weighted Publication Level
                 Zero    Zero to One    One to Two    More than Two
Without Ph.D.    0.64    0.27           0.08          0.01
With Ph.D.       0.23    0.37           0.29          0.10

Also of note were the significant coefficients for faculty research interests in financial accounting, managerial accounting and auditing. This may reflect a bias in the higher-ranked journals toward these areas of interest (Hasselback et al., 2000). Some concerns could be raised with respect to the poor performance of faculty with stated tax research interests. Here, there is a negative, though not significant, relationship between an expressed interest in taxation and weighted publications. This may reflect the tendency, particularly in Australia, for tax faculty to be concentrated in law/business law disciplines. Tax publishing has accordingly trended toward legal based research rather than empirical accounting research. This is consistent with comments by Schulman et al.
(1996) concerning the low level of empirical research into the policy implications of tax integration, a reform that has been implemented in Australia, New Zealand, Canada and the U.K., along with a range of other countries outside the U.S. The holding of U.S. qualifications was also non-significant. This may be the result of the low number of individuals holding such qualifications (30 out of 716 faculty).7

As noted earlier, the Tobit model parameters cannot be interpreted in the same manner as those derived from ordinary least squares. However, the Tobit model can be used to estimate the probability that an individual with a given set of characteristics will publish at a certain level. In fact, an entire probability distribution can be developed for an individual with a given set of characteristics. For example, consider an individual with 5 years of employment at their current institution who has a stated interest in financial accounting, is not a member of a professional organization, has less than five years’ experience outside the academic environment, and is U.S. qualified with no Ph.D. For this individual, the probability distribution shown in Table 5, row 1 would arise.8 If, by way of contrast, an otherwise equivalent individual who also possesses a Ph.D. is considered, the probability distribution shown in Table 5, row 2 arises. Thus, the model predicts a higher probability of increased publishing performance across the board, and a much reduced probability of having no publications, for an individual with a Ph.D. relative to one without.

5. CONCLUSIONS, LIMITATIONS AND SUGGESTIONS FOR FURTHER RESEARCH

This study is focused on developing a model that predicts the likelihood of current faculty or potential faculty publishing at various levels. Using a measure of Australian and New Zealand faculty publishing productivity over a five year period, the study provides evidence that two key factors significantly impact performance: holding a Ph.D. and having an academe-oriented rather than profession-oriented background. These findings are revealing in terms of the types of faculty that universities should consider recruiting, to the extent that research productivity is perceived as being important.

As with all models, the Tobit analysis represents a simplification of reality. The publication data on which the model is based most likely contain errors of measurement, resulting in less precise estimates. Further, the extent to which the reported research interests reflect genuine active research interests cannot be ascertained from the data. The model’s applicability is limited to the Australian and New Zealand context from which the data were obtained. Generalization to other contexts, such as North America, Europe and Asia, may be problematic given institutional and cultural differences. The findings are, however, broadly consistent with the results of Gray and Helliar (1994) in the U.K. context. We encourage research exploring similarities and differences in factors explaining research productivity across different cultural settings in order to facilitate a greater understanding. Analysis of the extent to which the model is robust to different measures of research productivity would also be worthwhile.

NOTES

1. Employment background was coded as “professional” for individuals with 5 years or more experience in a non-academic role, and as “academic” for those with less than 5 years experience outside academe.

2. This assessment ignores migration of U.S. citizens already holding Ph.D. qualifications to Australia and New Zealand, about which no a priori belief is held. Further, the study uses “highest qualification from U.S.” rather than Ph.D. specifically. A subsequent test using only U.S. Ph.D. qualification resulted in no qualitative differences.

3.
The directory also included part time doctoral teaching assistants and assistant lecturers, neither of whom would be considered permanent faculty, and were excluded accordingly. Tutors were also excluded on the basis that their role is explicitly teaching based, and on grounds that they also tend not to be regarded as permanent faculty.

4. Although the directory is primarily accounting specific, it does include some non-accounting faculty. Where possible, such faculty were identified and eliminated based on qualifications, teaching responsibilities and research interests. It is possible, however, that in some instances non-accounting faculty may not have been identifiable as such and hence were included. For example, finance faculty listed in the directory that held professional accounting memberships might not have been clearly distinguishable from accounting faculty. It is likely, however, that most departments registered only accounting faculty in the directory and that most non-accounting faculty that were included were identified and deleted.

5. Limited other deletions were made including the deletion of a dean. Details can be found in Wilkinson and Durden (1998) and in Durden et al. (1999).

6. Only published articles were included. Hence, published book reports and monographs were excluded from the study.

7. As a further check, this was restricted to U.S. Ph.D. qualifications. The estimated coefficient was negative but not significant and there was no qualitative change in the other estimated coefficients.

8. Probabilities are calculated as follows:

\[
P(\text{publications} \le WP) = P\!\left(Z \le \frac{WP - X_t\beta}{\sigma}\right)
\]

For example, the probability that the individual in Table 5 without a Ph.D. will publish zero publications is:

\[
P(\text{publications} = 0) = P\!\left(Z < \frac{0 - \bigl(-1.66455 + 0.00279 \times 5\,(\text{YEARS}) + 0.54979\,(\text{FINANCIAL}) + 0.11835\,(\text{U.S. QUALIFIED}) + 0.54851\,(\text{ACADEMIC})\bigr)}{1.15438}\right)
\]

or P(Z < 0.376) = 0.647.
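The calculation in note 8 can be retraced with a short sketch. It uses only the coefficient values quoted in the note and the standard normal distribution from the Python standard library; the constant and function names are hypothetical.

```python
from statistics import NormalDist

# Coefficient values as quoted in note 8
INTERCEPT = -1.66455
B_YEARS = 0.00279
B_FINANCIAL = 0.54979
B_US_QUALIFIED = 0.11835
B_ACADEMIC = 0.54851
SIGMA = 1.15438  # scale parameter


def prob_at_most(level: float, index: float, sigma: float = SIGMA) -> float:
    """P(publications <= level) = P(Z <= (level - X*beta) / sigma)."""
    return NormalDist().cdf((level - index) / sigma)


# Individual from Table 5 without a Ph.D.: 5 years employed, financial
# accounting interest, U.S. qualified, academic background, no membership.
index_no_phd = (INTERCEPT + B_YEARS * 5 + B_FINANCIAL
                + B_US_QUALIFIED + B_ACADEMIC)

z_zero = (0 - index_no_phd) / SIGMA
p_zero = prob_at_most(0, index_no_phd)
print(f"z = {z_zero:.3f}, P(publications = 0) = {p_zero:.3f}")
```

The same function gives the cumulative probability at any other weighted publication level by changing the first argument.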
ACKNOWLEDGMENTS

The authors wish to thank Peter Westfall for his assistance with the methodological development. We also thank the editor, Vicky Arnold, and an anonymous reviewer for helpful comments and suggestions in revising the paper.

REFERENCES

Abdolmohammadi, M. J., Menon, K., Oliver, T. W., & Umapathy, S. (1985). The role of the doctoral dissertation in accounting research careers. Issues in Accounting Education, 3, 59–76.
Baden-Fuller, C., Ravazzolo, F., & Schweizer, T. (2000). Making and measuring reputations: The research rankings of European business schools. Long Range Planning, 33(5), 621–650.
Bairam, E. I. (1996). Research productivity in New Zealand university economics departments, 1988–1995. New Zealand Economics Papers, 30, 229–241.
Beresford, D. R. (2001). Guest editorial: If I could do it over again . . .. The CPA Journal, 71(7), 80.
Blaxter, L., Hughes, C., & Tight, M. (1998). Writing on academic careers. Studies in Higher Education, 23(3), 281–295.
Brinn, T., Jones, M. J., & Pendlebury, M. (1996). U.K. accountants’ perceptions of research journal quality. Accounting and Business Research, 26(3), 265–278.
Brown, L. D., & Huefner, R. J. (1994). The familiarity with and perceived quality of accounting journals: Views of senior accounting faculty in leading U.S. MBA programs. Contemporary Accounting Research, 11(1), 223–250.
Cargile, B. R., & Bublitz, B. (1986). Factors contributing to published research by accounting faculties. The Accounting Review, 61(1), 158–178.
Daigle, R., & Arnold, V. (2000). An analysis of the research productivity of AIS faculty. International Journal of Accounting Information Systems, 1, 106–122.
Davidson, S. (1957). Research and publication by the accounting faculty. The Accounting Review, 32(1), 114–118.
Demski, J. S., & Zimmerman, J. L. (2000). On Research vs. Teaching: A long-term perspective. Accounting Horizons, 14(4), 343–352.
Doyle, J. R., & Arthurs, A. J. (1995). Judging the quality of research in business schools: The U.K. as a case study. Omega International Journal of Management Science, 23(3), 257–270.
Durden, C. H., Wilkinson, B. R., & Wilkinson, K. J. (1999). Publishing productivity of Australian accounting ‘units’ based on current faculty composition. Pacific Accounting Review, 11(1), 1–27.
Englebrecht, T. D., Govind, S. I., & Patterson, D. M. (1994). An empirical investigation of the publication productivity of promoted accounting faculty. Accounting Horizons, 8(1), 45–68.
Gee, K. P., & Gray, R. H. (1989). Consistency and stability of U.K. academic publication output criteria in accounting. British Accounting Review, 21(1), 43–54.
Gray, R. H., Haslam, J., & Prodham, B. K. (1987). Academic departments of accounting in the U.K.: A note on publication output. British Accounting Review, 19(1), 53–71.
Gray, R., & Helliar, C. (1994). U.K. accounting academics and publication: An exploration of observable variables associated with publication output. British Accounting Review, 26(3), 235–254.
Hasselback, J. R., Reinstein, A., & Schwan, E. S. (2000). Benchmarks for evaluating the research productivity of accounting faculty. Journal of Accounting Education, 18(2), 79–97.
Hull, R. P., & Wright, G. B. (1990). Faculty perceptions of journal quality: An update. Accounting Horizons, 4(1), 77–97.
Imhoff, E. A. (1988). Planning academic accounting careers. Issues in Accounting Education, 3(2), 286–301.
Judge, G. G., Hill, R. C., Griffiths, W. E., Lutkepohl, H., & Lee, T.-C. (1988). Introduction to the theory and practice of econometrics (2nd ed.). New York: Wiley.
Long, R. G., Bowers, W. P., Barnett, T., & White, M. C. (1998). Research productivity of graduates in management: Effects of academic origin and academic affiliation. Academy of Management Journal, 41(6), 704–714.
Maranto, C. L., & Streuly, C. A. (1994). The Determinants of accounting professors’ publishing productivity – The early career. Contemporary Accounting Research, 10(2), 387–407.
Mautz, R. K. (1988). Editorial: Fifty years of accounting. Accounting Horizons, 2(1), 126–129.
McDonald, J. R., & Moffitt, R. A. (1980). The uses of Tobit analysis. Review of Economics and Statistics, 62(2), 318–321.
Meyer, M. J., & Titard, P. L. (2000). Those who can . . . teach. Journal of Accountancy, 190(1), 49–58.
Newell, G., Langsam, S., & Kreuze, J. (1996). Accounting faculty profiles: Demographics and perceptions of academia. Journal of Education for Business, 72(2), 87–94.
Nobes, C. W. (1985). International variations in perceptions of accounting journals. The Accounting Review, 60(4), 702–705.
Otley, D. (2002). British research in accounting and finance (1996–2000): The 2001 research assessment exercise. British Accounting Review, 34(4), 387–417.
Schulman, C. G., Thomas, D. W., Sellers, K. F., & Kennedy, D. B. (1996). Effects of tax integration and capital gains tax on corporate leverage. National Tax Journal, 46(1), 31–54.
Wiley, J. (1998). Jacaranda Wiley directory of accounting: 1998–1999. Brisbane, Australia: Jacaranda Wiley.
Wilkinson, B. R., & Durden, C. H. (1998). A study of accounting faculty publishing productivity in New Zealand. Pacific Accounting Review, 10(2), 75–95.
Zeff, S. A. (1996). A study of academic research journals in accounting. Accounting Horizons, 10(3), 158–177.
Zivney, T. L., Bertin, W. J., & Gavin, T. A. (1995). A comprehensive examination of faculty publishing. Issues in Accounting Education, 10(1), 1–25.

CLASSIFICATION OF CUSTOMIZED ASSURANCE SERVICES BY DECISION MAKERS: THE CASE OF SysTrust™

Philip R. Beaulieu

ABSTRACT

When decision makers encounter new assurance services that can be customized for individual clients, they must include them in their pre-existing categorization of assurance, a cognitive task known as postclassification.
This paper draws upon three literatures (classification research in accounting, theory of assurance, and cognitive psychology) in order to suggest how this task might be modeled and studied empirically, using the example of SysTrust™. The role of a necessary condition for successful postclassification called the category use effect (Ross, 2000), in which decision makers are reminded of pre-existing categories when they learn to use new categories, is explained.

Advances in Accounting Behavioral Research, Volume 6, 189–215. © 2003 Published by Elsevier Ltd. ISSN: 1474-7979/doi:10.1016/S1474-7979(03)06009-5

1. INTRODUCTION

New forms of assurance1 provided by public accountants have proliferated in the last decade due to both supply and demand factors. On the supply side, public accounting firms have sought to generate revenue in growth areas of assurance and related consulting activities because growth opportunities in the mature market for traditional financial statement assurance are limited. Demand for innovation in assurance stems partly from technological innovation, which has led to concerns about the reliability of electronic processes (AICPA, 2000) and electronic reporting of information (Lymer et al., 1999). The result of these supply and demand pressures is customized assurance geared towards non-traditional (at least for public accounting firms) market segments.

Customized assurance services, for example services applied to the reliability of systems, share some features with traditional financial statement assurance but also differ from it in significant ways. A major challenge facing public accounting firms is to communicate these similarities and differences between traditional and customized assurance to their clients and other decision makers who rely upon their services (AICPA, 2000).
Decision makers face the cognitive task of revising their previous classification of assurance services by incorporating new categories for customized services. Before, they may have placed assurance into one or two simple categories representing all financial statement attestation, or attestation broken down into audit and review level assurance. They now need new subcategories of assurance, classified either as attestation or some other form of assurance, in which to classify customized assurance. If decision makers cannot consistently recognize distinguishing features of all forms of assurance, especially the level of assurance, the risk is that some features will be inaccurately attributed to customized assurance categories, and that public accountants will incur a negative public reaction. For instance, decision makers may assume that assurance regarding reliability of systems is at the same level as an audit of financial statements, whereas practitioners impose qualifications on the assurance related to the criteria by which systems are defined. This would not be an expectations gap about one type of attestation (audit), as has been discussed before (e.g. Houston & Taylor, 1999), but a set of multiple expectations gaps regarding multiple assurance categories. The purposes of this paper are to facilitate understanding of customized assurance classification and suggest avenues for future empirical research in customized assurance services by combining three literatures: behavioral classification research in accounting and auditing (discussed in Section 2), theory of assurance classification (Sections 3 and 4), and recent behavioral research in cognitive psychology (Section 5). Specific research opportunities are identified in Section 6. 
The first key source mentioned in the paper is Cohen (2000), who proposed two evidential requirements for hierarchical classification systems that are used to organize the paper: non-empirical logical evidence and empirical behavioral evidence. The primary sources of assurance classification are various publications of the American Institute of Certified Public Accountants (AICPA) and Kinney (2000), and the important behavioral research is Ross (1996, 1997, 1999, 2000). Ross studied postclassification, where people revise pre-existing classification systems to include new categories; this is relevant to the expansion of assurance services beyond traditional audit, review, and compilation of financial statements.

SysTrust™ Version 2.0, initially released as an exposure draft in 2000 and effective in examination periods beginning August 31st, 2002, is used to illustrate classification of customized assurance services. Jointly issued by the AICPA and the Canadian Institute of Chartered Accountants (CICA), SysTrust™ is intended to provide assurance to a firm’s management, investors, and partners regarding the reliability of its systems. SysTrust™ is described in Section 4 as a customized service because it encompasses attestation and non-attestation engagements, as well as many types of systems. This range of services within the SysTrust™ brand name complicates the task of classification of the services by decision makers, making the case of SysTrust™ ideally suited to the purposes of this paper.

2. COGNITIVE MODELS OF CLASSIFICATION IN ACCOUNTING

Classification systems can in principle be based on the world, culture, language, or the mind (Dahlgren, 1995). Although these perspectives are related, the mental basis is the primary interest of this paper because relevant literature in both accounting (especially Bonner et al., 1997) and cognitive psychology (Ross, 1996, 1997, 1999, 2000) is available.
This section presents a brief review of research in accounting with a cognitive approach to classification, and an explanation of two evidential requirements for classification systems proposed by Cohen (2000) that will be used to structure the remainder of the paper. A discussion of literature in cognitive psychology is reserved for Section 5.
In accounting, cognitive models of classification are commonly referred to as knowledge structures or schemas, and are used in the fields of auditing and decision-making uses of financial statements.2 The heaviest use of knowledge structures as classification systems has been in auditing, where a variety of dependent variables (tasks) have been studied. Frederick et al. (1994) and Nelson et al. (1995) used card-sorting tasks to obtain evidence of transaction cycle-dominant and audit objective-dominant knowledge structures; Nelson et al. (1995) also included conditional probability estimation tasks. Bonner et al. (1997) also examined knowledge structures based on transaction cycles and audit objectives, but selected error frequency and audit planning tasks. Choo and Trotman (1991) used recall and predictions of the probability of company failure by auditors to test whether their knowledge structures encode the typicality of a going-concern firm. Recall was also the dependent variable used by Moeckel (1990) and Libby and Trotman (1993); see Libby (1995) for a review of research in auditing involving knowledge structures and memory.
Memory models and recall have been featured in research concerning external users of financial statements, although less has been done in this area than in auditing. Beaulieu (1996) posited that commercial loan officers use a classification system based on the Five Cs of Credit (character, capacity, capital, conditions and collateral), a classification system used to teach loan officers to process information and make loan decisions.
Greater recall of decision-consistent character and accounting (capacity and capital) information than decision-inconsistent information provided evidence that the classification system resided in long-term memory and biased recall in favor of decision-consistent information. Another example of research involving users of financial statements is Kida et al. (1998), who proposed that managers making stock investment and financial difficulty decisions encode (classify) accounting information according to affect, a positive or negative response to numerical data. Recall and decision results supported the existence of an affect-based classification system in long-term memory. A fair question to ask is whether auditing and accounting classification systems really exist in the minds of auditors and financial statement users, psychologically and neurologically, or whether they exist only as conventions that are convenient for research purposes. Cohen (2000, p. 2) proposed two types of evidential requirements – logical and behavioral – for hierarchical classification systems, in which “lower level items inherit the properties of higher level items.” Logical evidence requires a convincing argument that a hierarchy is more efficient than alternative methods of organizing and accessing knowledge. The argument for an efficient system asserts that it enables economical storage and access to information, and that “representation of factual knowledge at different levels of generality facilitates the identification of useful analogies” (p. 5). Behavioral evidence consists of experiments in which different hierarchical levels are presented, causing effects in response times, error rates, and quality of responses.3 Bonner et al. (1997) illustrates how these two criteria can be used to evaluate potential classification systems. Bonner et al. (1997) studied how accounting students learn to estimate the frequency of financial statement errors. 
Subjects in their experiment were taught either: (1) the relationship between financial statement errors and three categories of transaction cycles: sales and receipts, inventory/purchases, and investments; or (2) the relationship between errors and three categories of audit objectives: proper cutoff, validity, and valuation. Subjects then observed a sequence of errors and finally were asked for frequency estimates. The first hypothesis of Bonner et al. (p. 391) was:
    Subjects receiving transaction cycle (audit objective) category instruction prior to experiencing frequencies will make frequency estimates which more closely reflect experienced error frequencies for transaction cycle (audit objective) categories than for audit objective (transaction cycle) categories.
The logical evidence required by Cohen (2000) to justify this hypothesis is in the domain of accounting and auditing; classification systems may cause similar effects in other contexts, but a hypothesis worded as specifically as this ought to be supported by a reasonable accounting story. Bonner et al. (1997) argued that accounting students lack consistent education regarding categories of financial statement errors and that in their early experience they learn slowly about errors because they seldom encounter them. Thus, the first piece of logical evidence supporting the hypothesis is that accounting students lack any classification system for financial statement errors. The second piece of logical evidence drawn from the accounting domain is that financial statement errors can be arrayed in a matrix (three-by-three in the experiment, Table 2 in Bonner et al.) with transaction cycles as columns and audit objectives as rows. The two classification systems are alternatives, not sub- and super-categories of the same hierarchical system, and knowledge of one system does not help a person understand the logic of the other.
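The matrix arrangement can be made concrete with a short sketch. The category labels below are those named in the text; the individual cell counts are invented purely for illustration and are not data from Bonner et al. (1997), and the function names are hypothetical. The point is only that the same set of errors yields two different sets of category frequencies, depending on whether it is summed by row or by column.

```python
# Category labels as named in the text; every count below is hypothetical.
TRANSACTION_CYCLES = ("sales and receipts", "inventory/purchases", "investments")
AUDIT_OBJECTIVES = ("proper cutoff", "validity", "valuation")

# Rows are audit objectives, columns are transaction cycles.
ERROR_MATRIX = [
    [5, 3, 1],  # proper cutoff
    [4, 6, 2],  # validity
    [2, 3, 2],  # valuation
]


def totals_by_audit_objective(matrix):
    """Sum each row: error frequency per audit objective."""
    return {obj: sum(row) for obj, row in zip(AUDIT_OBJECTIVES, matrix)}


def totals_by_transaction_cycle(matrix):
    """Sum each column: error frequency per transaction cycle."""
    return {cyc: sum(col) for cyc, col in zip(TRANSACTION_CYCLES, zip(*matrix))}


if __name__ == "__main__":
    print(totals_by_audit_objective(ERROR_MATRIX))
    print(totals_by_transaction_cycle(ERROR_MATRIX))
```

Both views account for every error exactly once, yet neither set of totals can be derived from the other without the full matrix, which is one way to read the claim that the two systems are alternatives rather than levels of one hierarchy.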
Cohen (2000) also requires behavioral evidence to support the view that a classification system is a psychological phenomenon. Bonner et al. (1997) obtained results supporting the hypothesis; for example, when subjects received 28 actual errors they estimated 19.18 errors when asked to estimate according to the same categories they were taught (either transaction cycles or audit objectives), and 15.54 errors when asked to estimate according to different categories. Bonner et al. ruled out alternative explanations of their results not based on the transaction and audit objective classification systems, for example by testing differences in linear trends of frequency estimates rather than mean differences. The results of Bonner et al. are consistent with research in other contexts, but Bonner et al. claim that they are interesting because classification in accounting and auditing involves relatively ill-defined categories, compared to natural categories used elsewhere in cognitive psychology. The combination of logical evidence from the accounting domain and statistically significant behavioral evidence does more than create interesting results; it inspires belief that two alternative classification systems have psychological, and possibly neurological, reality.
The following two sections address the logical evidence requirement of Cohen (2000) in the context of customized assurance services, citing assurance literature to build cases for two alternative classification systems. Section 5 on the category use effect will cite cognitive psychology literature in order to suggest behavioral tests of the system.
3. CLASSIFICATION OF ASSURANCE SERVICES
The term “assurance services” came into use in the 1990s and was formally defined by the AICPA Special Committee on Assurance Services (the “Elliott Committee”) in 1996 as “independent professional services that improve the quality of information, or its context, for decision makers” (AICPA, 1996).
The term was intended to include auditing as a subcategory, as indicated in the following quote, which refers to the Special Committee’s conceptual framework for assurance services. The framework’s primary objective is to provide a consistent view of assurance services. It provides guidelines that will enhance consistency and quality in the performance of services. It can also help establish a common public perception of the CPA’s function and value. Assurance services evolve naturally from attestation services, which in turn evolved from audits. The roots of all three are in independent verification. However, the form and content of the services differ. The earlier services are highly structured services considered to be relevant to the greatest number of users. The newer ones are more customized and targeted services intended to be highly useful in more limited circumstances (AICPA, 2000, p. 1). The term assurance services was not part of the auditing lexicon prior to the 1990s. For example, the classic book on the philosophy of auditing by Mautz and Sharaf (1961) does not mention levels or categories of audit services in any of its eight postulates of auditing or five primary concepts of auditing (evidence, due audit care, fair presentation, independence and ethical conduct), let alone mention assurance services. The term appeared in auditing textbooks after the AICPA definition in 1996 – a year later in the case of Arens and Loebbecke (1997). Around that time, audit partners began calling themselves assurance partners. The meaning of new concepts is adjusted by usage until a generally accepted meaning is established. The most relevant examples for the purposes of this paper are the concepts of review and compilation services defined in 1978 by the AICPA. Statement on Standards for Accounting and Review Services (SSARS) No. 
1 stated that in a review engagement, the CPA’s report would indicate “limited assurance,” or negative assurance, that nothing came to the attention of the CPA indicating a material misstatement (Kinney, 2000). A compilation was defined as providing no opinion and no assurance regarding departures from GAAP, although the CPA is still associated with the financial statements and has some responsibility (Kinney, 2000).
Research regarding the financial statement users most affected by this classification system, commercial lenders, has provided mixed evidence on their understanding and use of reviews and compilations. Bandyopadhyay and Francis (1995) found that loan officers’ interest rate recommendations and loan decisions were affected by the level of attestation (including audit, review, and compilation). Martin et al. (1988) reported that lenders do not generally differentiate between audits and reviews, but their acceptance of compilations depends on a number of factors, including the level of owners’ equity and term of the loan. Johnson et al. (1983) found that level of attestation (audit, review, compilation, and no attestation) did not affect loan decisions; Wright and Davidson (2000) similarly found no effect on loan risk assessments.
In the United States, a gap between users’ and practitioners’ expectations of audits led to the adoption of many Statements on Auditing Standards (SAS), including SAS Nos 52–60, as well as SAS No. 82 on consideration of fraud in a financial statement audit. Thus, in addition to the research conducted between 1983 and 2000 on financial statement users’ perceptions of audit, review, and compilation services, other papers addressed the expectations gap related solely to audit-level attestation. Some of this research suggests that expectations gap standards might effectively narrow the gap (e.g.
Bamber & Stratton, 1997; Campbell & Mutchler, 1988; Jennings et al., 1993; Kinney & Nelson, 1996). However, a paper by Houston and Taylor (1999) on WebTrust indicated that users of that assurance service incorrectly inferred that additional assurance regarding product quality was provided. Although the research cited in the preceding paragraphs offers the hope that users can be educated in order to calibrate their expectations of assurance services consistently with practitioners, it also discourages the assumption that decision makers have any particular classification system in mind. To be conservative, this paper will assume nothing about the classification hierarchies that decision makers might have adopted since 1996 to accommodate customized assurance. Instead, two theoretical classification systems, the AICPA (2000) and Kinney (2000), will be examined for their potential in assisting decision makers to classify customized assurance efficiently. In addition to defining assurance services in terms of improvements to the quality and context of information, the Special Committee (AICPA, 2000) related them to attestation and consulting services in a framework of categories. Attestation is a subcategory of assurance with detailed standards, whereas there is some overlap between the categories of assurance and consulting activities. The primary distinction between assurance and consulting is the goal of the service; assurance improves decision-makers’ output indirectly, through provision of better information, whereas consulting aims to aid decision makers directly through research and findings. The AICPA’s positioning of the assurance, attestation, and management consulting categories is shown in Fig. 1. Essential features of these categories are described in Table 1; the hierarchical relationship between attestation and assurance is evident in the table. 
For example, the objective of assurance is better decision making, which subsumes the narrower objective of attestation, reliable information. The level of assurance is defined as examination, review, or agreed-upon procedures in the attestation category, but the assurance category is flexible with regard to levels, which may range from explicit assurance about the usefulness of information for specific purposes to implicit assurance resulting from CPA involvement.
Fig. 1. Universe of CPA Services (Reproduced from AICPA, 2000, p. 8).
Table 1. Types of Services (Reproduced from AICPA, 2000, p. 7).
Result
    Attestation: Written conclusion about the reliability of the written assertions of another party.
    Assurance: Better information for decision-makers. Recommendations might be a byproduct.
    Consulting: Recommendations based on the objectives of the engagement.
Objective
    Attestation: Reliable information.
    Assurance: Better decision making.
    Consulting: Better outcomes.
Parties to the engagement
    Attestation: Not specified, but generally three (the third party is usually external); CPA generally paid by the preparer.
    Assurance: Generally three (although the other two might be employed by the same entity); CPA paid by the preparer or user.
    Consulting: Generally two; CPA paid by the user.
Independence
    Attestation: Required by standards.
    Assurance: Included in definition.
    Consulting: Not required.
Substance of CPA output
    Attestation: Conformity with established or stated criteria.
    Assurance: Assurance about reliability or relevance of information. Criteria might be established, stated, or unstated.
    Consulting: Recommendations; not measured against formal criteria.
Form of CPA output
    Attestation: Written.
    Assurance: Some form of communication.
    Consulting: Written or oral.
Critical information developed by
    Attestation: Asserter.
    Assurance: Either CPA or asserter.
    Consulting: CPA.
Information content determined by
    Attestation: Preparer (client).
    Assurance: Preparer, CPA, or user.
    Consulting: CPA.
Level of assurance
    Attestation: Examination, review, or agreed-upon procedures.
    Assurance: Flexible, for example, it might be compilation level, explicit assurance about usefulness of the information for intended purposes, or implicit from CPA involvement.
    Consulting: No explicit assurance.
The test of logical evidence advocated by Cohen (2000) requires that the hierarchical system in Fig. 1 be more efficient than alternative classification systems in terms of information storage and access, and identification of useful analogies. The system is economical in that there are just seven categories at three levels; a hierarchy that could accommodate the complexity and variety of assurance services in fewer categories is hard to conceive. The attestation category is parsimonious because when decision makers encounter a service that they expect is attestation, they only have to consider coding it as audit examination, review, or agreed-upon procedures. The system might help decision makers think of useful analogies when they encounter a new assurance service by identifying its relationship to familiar services, particularly the audit and review levels of attestation. Unfortunately, a drawback is that the boundaries of management consulting overlap those of assurance and attestation (bisecting the agreed-upon procedures category). Therefore, according to Fig. 1, a management consulting service may be categorized as exclusively consulting, a non-attestation assurance, an agreed-upon procedure, or some other type of attestation. Table 1 is somewhat inconsistent with Fig. 1 because it states that consulting engagements offer no explicit assurance, and thus would not overlap attestation as in Fig. 1.
Fig. 2. Information Quality Assurance Services (Adapted from Kinney, 2000, p. 12). Source: This figure is reproduced from Information Quality Assurance and Internal Control for Management Decision Making (2000, Irwin/McGraw-Hill) by W. Kinney and is reproduced with permission of The McGraw-Hill Companies.
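Cohen's description of a hierarchy, in which lower level items inherit the properties of higher level items, can be illustrated with a small data structure. The sketch below is a simplification under stated assumptions: it models only the assurance and attestation branch of the AICPA system, it ignores the overlap with management consulting, and the class name, method names, and feature keys are hypothetical. The feature values are taken from Table 1. A narrower category is assumed to override an inherited feature when it specifies its own value.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Category:
    """A node in a classification hierarchy (hypothetical illustration)."""
    name: str
    parent: Optional["Category"] = None
    own_features: Dict[str, str] = field(default_factory=dict)

    def features(self) -> Dict[str, str]:
        # Inherit the parent's features, then apply this category's own values.
        inherited = self.parent.features() if self.parent else {}
        return {**inherited, **self.own_features}

    def level(self) -> int:
        # The root category is level 1.
        return 1 if self.parent is None else self.parent.level() + 1


# Simplified assumption: consulting and its overlaps are left out.
assurance = Category("assurance", own_features={
    "objective": "better decision making",
    "independence": "included in definition",
    "form_of_output": "some form of communication",
})
attestation = Category("attestation", parent=assurance, own_features={
    "objective": "reliable information",
    "independence": "required by standards",
    "form_of_output": "written",
})
examination = Category("audit examination", parent=attestation)
review = Category("review", parent=attestation)
agreed_upon = Category("agreed-upon procedures", parent=attestation)

if __name__ == "__main__":
    print(examination.level(), examination.features())
```

Storing a feature once at the attestation level, rather than repeating it for each of the three subcategories, is the kind of economy of storage and access that the logical evidence test asks of a hierarchy.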
Cohen (2000) suggests the logical test of comparing a hierarchical classification system to alternative systems. The AICPA’s system provides a starting point because it has the force of the CPA brand for customized assurance services behind it, but other systems are possible. An alternative provided by Kinney (2000, p. 12) is adapted as Fig. 2. This system contains nine categories at three levels, similar in size and depth to the AICPA’s system. Otherwise, Kinney’s system is, at least superficially, more complex. Two concepts are introduced subordinate to assurance services: relevance improvement, information that helps “the decision maker form a better mental image of real-world conditions” (Kinney, 2000, p. 11); and reliability improvement and compliance, which increase the decision maker’s confidence in the application and display of measurement results. These two subcategories are the keys to Kinney’s system; cognitive effort expended in comprehending them might allow a decision maker to classify new assurance services quickly into the third-level categories. This system shares a difficulty with the AICPA’s system in that two categories, internal control design/operation and information origination services, intersect two higher-order categories. Kinney’s system is different in that the attestation levels of review and agreed-upon procedures, and the non-attestation service of compilation, are not specified.
Arguing that either classification system is superior on the basis of a logical argument is difficult. The Kinney system places the decision maker using it under a heavier conceptual burden up front because its two key categories have no single real-world referent, such as an audit or review report. There is a potential difficulty at the third level of the hierarchy because one category is “audits” of internal control; what is an “audit” in quotation marks supposed to be?
Unlike the AICPA’s system, the only category of attestation or assurance specified unequivocally is the audit. Decision makers searching either classification for an analogy to a freshly encountered assurance service are assisted differently in each system. Figure 1 (AICPA) has concrete examples available; and if a customized service does not match any of them on essential dimensions, it might be placed in an open space under assurance services, essentially constituting its own category. Using Fig. 2 (Kinney), decision makers would presumably make an initial classification with respect to relevance or reliability improvement; and if the former (latter) is chosen, decision makers would make a second classification choice regarding measurement design or context improvement (audit or non-audit service). This approach enables efficient storage and retrieval of information, but decision makers with different cognitive styles might possibly prefer either classification system, the concrete (AICPA) or the conceptual (Kinney).
In reality, decision makers likely have many different classification systems for assurance services, in the extreme one unique system per person. Audit, review, and compilation services are the types of assurance with which most decision makers would be familiar, and many of them might classify these services in a fashion similar to that intended by the AICPA (Fig. 1). Kinney’s (2000, Fig. 2) system appeared in a book that explains assurance services to decision makers, but is probably less well known. Regardless of the exact numbers of decision makers who might have encountered either system, they can be seen as ideal, comprehensive hierarchies capable of accommodating almost any customized assurance service. With the admission that these systems are ideals that decision makers may adapt to their individual circumstances, we proceed in the next section to show how they could be revised to include SysTrust™, an example of customized assurance.
4. CLASSIFICATION OF CUSTOMIZED ASSURANCE: THE EXAMPLE OF SysTrust™
SysTrust™, as described in Exposure Draft Version 2.0, is intended to “increase the confidence of management, customers, and business partners in systems that support a business or particular activity” (AICPA/CICA, 2000, p. 4). Elsewhere, Version 2.0 defines the set of SysTrust™ users more broadly:
    Potential users of this service are shareholders, creditors, bankers, business partners, third-party users who outsource functions to other entities, stakeholders, and anyone who in some way relies on the continued availability, security, integrity, and maintainability of a system (AICPA/CICA, 2000, p. 4).
The four principles used to judge whether a system is reliable, mentioned in the above quotation, are defined as follows (AICPA/CICA, 2000, pp. 11–13).
    Availability: The system is available for operation and use at times set forth in service-level statements or agreements.
    Security: The system is protected against unauthorized physical and logical access. This principle also addresses privacy concerns related to use of confidential information.
    Integrity: System processing is complete, accurate, timely, and authorized.
    Maintainability: The system can be updated when required in a manner that continues to provide for system availability, security, and integrity.
In a SysTrust™ engagement, a practitioner collects evidence about the effectiveness of controls over the principles for a defined period.4 Version 2.0 lists over 200 illustrative controls, covering all four principles, whose effectiveness practitioners may test. The result is a report on whether management maintained effective controls over the SysTrust™ principles addressed by the engagement, or on management’s assertion about the effectiveness of controls.
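As a rough illustration of what a report on "the principles addressed by the engagement" implies for a reader, the sketch below represents an engagement as a system description plus the subset of the four principles it covers. This is a hypothetical data structure for exposition only; the class and field names are assumptions and do not come from the SysTrust™ documents.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Principle(Enum):
    """The four SysTrust principles named in the text."""
    AVAILABILITY = "availability"
    SECURITY = "security"
    INTEGRITY = "integrity"
    MAINTAINABILITY = "maintainability"


@dataclass(frozen=True)
class Engagement:
    """Hypothetical record of what one engagement covers."""
    system_description: str
    principles_addressed: FrozenSet[Principle]

    def principles_not_addressed(self) -> FrozenSet[Principle]:
        # Everything outside the addressed set carries no assurance.
        return frozenset(Principle) - self.principles_addressed


if __name__ == "__main__":
    engagement = Engagement(
        system_description="data center",  # illustrative value
        principles_addressed=frozenset({Principle.INTEGRITY}),
    )
    print(sorted(p.value for p in engagement.principles_not_addressed()))
```

A decision maker classifying such an engagement has to track both sets: what the report covers and, equally, what it leaves out.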
Any system may be addressed by a SysTrust™ engagement, not just Internet-related systems as in the case of WebTrust™. For example, a corporation’s financial services system may be defined for the purposes of a SysTrust™ engagement as its data center, including infrastructure such as a CPU and peripherals, software, data, employees, and procedures.
SysTrust™ engagements are generally considered attestation because they are performed under Statement on Standards for Attestation Engagements (SSAE) No 1, found in Section 100 of the AICPA’s Professional Standards. However, other customized engagements are permitted under SysTrust™ Version 2.0, as described below.
    This document so far has described how the SysTrust™ Principles and Criteria may be used in examination/audit level attestation engagements for systems in production. The SysTrust™ Principles and Criteria may also be used in other types of engagements that meet client needs, as long as the applicable professional standards and the SysTrust™ licensing agreement are observed. Following are examples of other types of SysTrust™ engagements a practitioner might perform (AICPA/CICA, 2000, p. 19).
The examples that follow this quote are reporting on selected SysTrust™ principles, engagements for systems in the preimplementation phase, agreed-upon procedures engagements, and consulting engagements (review level assurance is not allowed). Thus, Exposure Draft Version 2.0 enables practitioners to customize SysTrust™ assurance in several ways to meet specific client needs, but these adjustments require a great deal of diligence on the part of decision makers to understand. First, management defines the boundaries of the system in question in a System Description attached to the management assertion regarding the effectiveness of its controls, which in turn is attached to the assurance report.
Management can choose to define a system in any way it sees fit; the system might be narrowly defined, as in the case of a data center, or broadly defined, as in the case of an outsourced finance and accounting function or ERP system.
A second significant aspect of customization is that Version 2.0 allows reporting on any one of the four SysTrust™ principles. Thus, an engagement could address only the integrity principle, and provide no assurance regarding availability, security, or maintainability. The accountant’s report would list all four principles and state that integrity is the sole principle covered, but it would be left to decision makers to search for a definition of the integrity principle. As defined by SysTrust™, integrity consists of complete, accurate, timely, and authorized processing, but the auditor’s report refers the user to the AICPA (or CICA) Web site for the definition; it does not appear in the report itself.
Customization under SysTrust™ (Version 2.0) extends even further than the definition of system boundaries and reporting on selected principles. There can also be engagements for systems in the pre-implementation phase, i.e. systems that have not yet been placed in operation. Here, the practitioner tests the suitability of the design of controls at a point in time, rather than the operating effectiveness of controls for a period of time, as is the case for other SysTrust™ reports. For pre-implementation phase engagements, the system description attached to the practitioner’s report would require additional detail, such as the version of the system and “other appropriate identifiers” (AICPA/CICA, 2000, p. 20).
There are few limits to customization of assurance under the proposed SysTrust™, making it relatively difficult to perceive as a single product. However, it has been trademarked and servicemarked in the United States and Canada, and the brand appears in independent accountants’ or auditors’ reports, as in the phrase “SysTrust™ Principles and Criteria.” SysTrust™ users have some alternatives when they consider how to integrate it into their existing conceptual frameworks, a process called postclassification in the cognitive psychology literature (Ross, 1999). They range from creating a single category for SysTrust™, with features of all customized options attached, to creating many SysTrust™ categories under pre-existing assurance categories with features matching customization. These choices are considered below in three possible postclassifications, two using the AICPA’s classification system and one based on Kinney’s (2000) hierarchy.
Fig. 3. Postclassification of Assurance Services – AICPA Option 1.
Figure 3 revises Fig. 1 (the AICPA system) by including a category for SysTrust™ that spans the attestation and management consulting categories, and the attestation subcategories of audit and agreed-upon procedures (excluding review), as defined by Exposure Draft Version 2.0. It might be a challenge for decision makers to add the SysTrust™ category because it intersects different levels and types of assurance, but at least there are concrete reference points (audit, agreed-upon procedures, and consulting) in the classification system. This postclassification is also likely to foster the brand-name awareness of SysTrust™ among decision makers that the AICPA desires by creating a single category for it. A difficulty with this postclassification is that the subject matter and customization features of SysTrust™, such as reporting on selective system reliability principles, are not primary identifiers of the category.
Fig. 4. Postclassification of Assurance Services – AICPA Option 2.
An alternative postclassification to that shown in Fig. 3 would be to create three separate SysTrust™ subcategories for audit examination, agreed-upon procedures, and management consulting, as shown in Fig. 4. This option allows decision makers to compare the subject matter of SysTrust™ with other forms of assurance, matched according to level of assurance. For example, within the category of audit examination, the categories of financial statements and SysTrust™ explicitly recognize that assertions regarding financial information and systems are involved. However, breaking SysTrust™ down into three subcategories of other concepts might sacrifice brand recognition among decision makers, and since SysTrust™ is distributed among several subcategories, increase the cognitive effort required to classify each new SysTrust™ engagement. Using the single-category approach (Fig. 3), more effort is likely expended initially in identifying the breadth of the category, but less effort might be needed to store and access new information once postclassification is complete.
Fig. 5. Postclassification of Assurance Services – Based on Kinney (2000).
Postclassification of SysTrust™ according to the Kinney (2000) system is pictured in Fig. 5, which is restricted to the reliability improvement category, the relevant portion of Fig. 2. SysTrust™ would be excluded from the category of audits of financial statements and would constitute a subcategory of “audits” of internal control quality, business processes, etc. The meaning of “audits” in quotations would necessarily expand to include both true audits and quasi-audits. There is less emphasis on levels of assurance at the top of the Kinney hierarchy than in the AICPA’s classification system, so decision makers would be required to recognize them at a lower point in the hierarchy, perhaps constructing subcategories of SysTrust™ for audit, agreed-upon procedures and consulting (not shown in Fig. 5).
Kinney’s system is similar to the single-category approach based on the AICPA’s system in that there is a relatively high initial postclassification cost in creating a comprehensive category having many features. The cost may be even greater under Kinney’s system because analogs of financial statement assurance levels are further removed from SysTrust™. The advantage of Kinney’s classification system is that decision makers could quickly classify SysTrust™ as an assurance service that improves reliability of business processes (systems).
5. BEHAVIORAL EVIDENCE AND THE CATEGORY USE EFFECT
The preceding section presented three alternative postclassifications (Figs 3–5) that decision makers could use to incorporate SysTrust™ in initial classification systems (Figs 1 and 2) for assurance services. An argument can be made in favor of each system’s ability to help decision makers store and access information regarding key assurance concepts, but Cohen (2000) requires behavioral evidence to support the assertion that any of these postclassifications has psychological reality.
Bonner et al. (1997) pointed out that research attention in cognitive psychology is directed towards natural categories, making it difficult to find combinations of theory and method appropriate for acquiring behavioral evidence in the less concrete domain of assurance. For instance, Rosch et al. (1976) use musical instruments, fruit, clothing, furniture, trees, fish, and birds as taxonomy stimuli. Johnson and Mervis (1997) studied the effect of expertise on categorization of songbirds. In addition to employing natural categories, the focus of most research in cognitive psychology is how people initially form mental categories to aid them in problem-solving or classification tasks (e.g. Malt et al., 1995; Osherson et al., 1990), rather than the effect of using a given classification system on subsequent revision of it, such as subclassification.
Ross (1996, 1997, 1999, 2000) is an exception to these trends because his research concerns abstract classification systems and revisions to pre-existing systems, which he refers to as postclassification. He claims that these learning situations are common to many practical category uses (Ross, 2000); this would include the situation faced by decision makers who must attempt to classify new customized assurance services. Ross performed several experiments illustrating the category use effect, but the one most relevant to customized assurance is the second experiment in Ross (2000). It must be described in detail in order to explain the category use effect.

In the initial classification learning phase of this experiment, summarized in Table 2, non-medical student subjects were instructed to learn a classification system consisting of two fictitious diseases. They learned the system by studying a series of patient cards, each “patient” consisting of a list of three symptoms. There were twelve symptoms in all (e.g. cough, skin rash, sore muscles), but only eight symptoms predicted the two diseases, four symptoms for each disease. The symptoms were perfectly predictive; whenever they appeared, the disease was present. For each patient card, subjects diagnosed one of the two diseases, then received feedback. Diagnoses continued until a criterion level of learning was achieved.

Table 2. Summary of Ross (2000), Experiment 2.

Classification learning
  Classification required during learning of use: Classify “patients” of 3 symptoms into 1 of 2 disease categories.
  Classification not required during learning of use: Same as other condition.

Learning of use
  Classification required during learning of use: Classify patients into disease categories. Classify patients into 1 of 4 subcategories (treatments), based upon symptoms. Relevant-use symptoms determine treatment. Irrelevant-use symptoms are irrelevant to treatment. A sheet with the two diseases and the two treatments relevant for each is visible.
  Classification not required during learning of use: Subjects not required to classify patients into disease categories. Classification into treatments and the role of relevant-use and irrelevant-use symptoms are the same as in the other condition. A sheet listing only two treatments for one of the diseases is shown. After symptoms have been listed, a second sheet listing only two treatments for the second disease is shown and symptoms are listed again. Order of treatments shown was counterbalanced.

Final task
  Classification required during learning of use: List symptoms that a person would be likely to have, for both diseases.
  Classification not required during learning of use: Same as other condition.

Result
  Classification required during learning of use: Ratio of correct to incorrect relevant-use symptoms higher than the ratio for irrelevant-use symptoms.
  Classification not required during learning of use: Ratio of correct to incorrect symptoms not significantly different for relevant-use and irrelevant-use symptoms.

Conclusion
  During additional learning of a classification system (learning of use), the original categories must be activated so that they can be modified (subcategories added).

The learning of use phase of the experiment had two conditions, one where additional classification by disease was required during this phase, and one where classification by disease was not required. In the former condition, subjects were asked to diagnose (classify) each patient as before in the initial learning phase, and decide which of four drug treatments (two for each disease) should be given. The drug treatments were effective only when specific symptoms, which Ross called relevant-use symptoms, appeared. There were four relevant-use symptoms. The other four symptoms that indicated a particular disease gave no indication as to which drug treatment would be effective in curing the disease; Ross termed them irrelevant-use symptoms. To help the subjects learn, they were allowed to look at a sheet listing the two diseases and the two treatments for each disease as they worked.
Subjects were given feedback on their diagnoses and treatment decisions for each patient, and continued until a criterion level of learning was achieved. Essentially, subjects learned a subclassification of the disease categories in this phase, consisting of symptoms of each disease that would or would not respond to treatment. In summary, there were a total of 12 symptoms: four did not indicate a disease, eight indicated a disease (four for each disease), and of the eight indicative symptoms, four indicated which of four treatments would be effective (with two treatments available for each disease).

In the other condition of the learning of use phase, where additional classification by disease was not required, subjects did not classify patients into disease categories before recommending a treatment. Each subject saw patient cards indicating symptoms of only one of the two diseases and was able to look at a sheet listing only the two drug treatments corresponding to that disease – not the name of the disease. Later, subjects performed the same task with the second disease (the order of diseases was counterbalanced). Thus, in this condition, subjects were given no direct opportunity to learn how to use the original disease classification system.

The final task in both conditions of the experiment was a feature generation task; specifically, subjects were asked to list the symptoms that a person with a disease would be likely to have. The difference was that the group not required to classify by disease during learning of use performed the feature generation task separately for each disease. A symptom would be scored correct if it did diagnose the disease, and incorrect if it indicated the other disease or was one of the four symptoms that did not diagnose either disease.
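The symptom structure and scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the symptom identifiers are placeholders (the source names just a few examples), the even split of relevant-use symptoms across the two diseases is assumed, and `score_listing` is a hypothetical helper rather than anything taken from Ross (2000).

```python
from collections import Counter

# Hypothetical symptom labels. The source names only a few examples
# (cough, skin rash, sore muscles), so these identifiers are placeholders.
DISEASE_SYMPTOMS = {
    "disease_A": {"a1", "a2", "a3", "a4"},
    "disease_B": {"b1", "b2", "b3", "b4"},
}
NON_DIAGNOSTIC = {"n1", "n2", "n3", "n4"}

# Assumption: the four relevant-use symptoms split evenly, two per disease.
RELEVANT_USE = {"a1", "a2", "b1", "b2"}

ALL_SYMPTOMS = NON_DIAGNOSTIC.union(*DISEASE_SYMPTOMS.values())


def score_listing(disease, listed):
    """Tally one subject's listed symptoms for one disease.

    Returns a Counter keyed by (use_type, outcome), where use_type is
    "relevant", "irrelevant" or "non_diagnostic" and outcome is
    "correct" or "incorrect".
    """
    counts = Counter()
    for symptom in set(listed):  # ignore duplicate mentions
        if symptom not in ALL_SYMPTOMS:
            raise ValueError(f"unknown symptom: {symptom}")
        if symptom in NON_DIAGNOSTIC:
            # Symptoms that diagnose neither disease are always incorrect.
            counts[("non_diagnostic", "incorrect")] += 1
            continue
        use_type = "relevant" if symptom in RELEVANT_USE else "irrelevant"
        outcome = "correct" if symptom in DISEASE_SYMPTOMS[disease] else "incorrect"
        counts[(use_type, outcome)] += 1
    return counts
```

Calling `score_listing("disease_A", ["a1", "a3", "b1", "n2"])` yields one tally per listed symptom. How these tallies are combined into the ratios reported for the experiment is not specified here, so the sketch stops at the counts.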
In the condition where classification by disease was required in the learning of use phase, the ratio of correct to incorrect relevant-use symptoms listed (0.80) was significantly higher than the corresponding ratio for irrelevant-use symptoms (0.58). In the condition where classification was not required during learning of use, the ratios of correct to incorrect symptoms were lower and did not differ significantly between relevant-use (ratio = 0.40) and irrelevant-use (ratio = 0.38) symptoms. This result indicates a category use effect; using the disease categories while learning subclassifications of the system – relevant-use versus irrelevant-use symptoms – improved the ability of subjects to list symptoms in general, but particularly symptoms critical to the subclassification being learned. The critical condition for the category use effect to occur, identified in this experiment, is that the original categories must be activated so that they can be revised. In plain language, people must be reminded of original categories while they learn to use new, related categories in order for their use of the original categories to be changed in the correct or intended manner.

In four other experiments, Ross (2000) ruled out alternative explanations of the category use effect and found that it applied to a reverse-order task, in which subjects were given one or two symptoms and asked to name the disease most likely for a patient with the symptom(s). In other research, Ross found that the category use effect applies to a problem-solving task in which formulas must be learned (Ross, 1999). In short, the effect is robust across variables, measures, and tasks, although the experiment described above is most relevant to the task of learning features of customized assurance reports.

Applied to assurance services, the category use effect requires that decision makers be reminded of initial categories of assurance as they encounter new services, including SysTrust™.
The AICPA assumes that initial categories will be related to the CPA brand name in some fashion (refer to the quote in Section 3). In the AICPA’s initial classification system (Fig. 1), the relevant categories are attestation, including the subcategories of audit examination and agreed-upon procedures, and management consulting. Presumably, this reminder would heighten awareness among decision makers of the customization inherent in SysTrust™ with regard to level of attestation, regardless of whether postclassification involved single (Fig. 3) or multiple (Fig. 4) categories for SysTrust™.

If decision makers were taught Kinney’s (2000) classification system initially (Fig. 2), the one essential category would be “audits” of internal control quality, business process, etc., because SysTrust™ is entirely contained in that category. However, reminders about three higher levels of the hierarchy – audits of financial statements, reliability improvement, and relevance improvement – may help decision makers define SysTrust™ by contrasting it with different assurance services. In contrast with category use in the AICPA’s classification system, in Kinney’s system the focus is on the types of decisions involved and measurement systems supporting them rather than on levels of assurance. However, if decision makers were required to review the audits of financial statements category they might be reminded of similarities and differences between assurance levels implied by SysTrust™ and traditional audits.

This postulated category use effect in the context of assurance services resembles an aspect of the study of audit category knowledge by Bonner et al. (1997).
In the first phase of their experiment subjects learned either a transaction cycle or audit objective classification system, and in the second frequency-learning phase they were shown nine errors and asked to classify them according to their assigned system. The second phase of the experiment had a category use component; there was an aided frequency knowledge test in which subjects were reminded of error categories just before the test. There was some evidence in the results that the reminder improved frequency knowledge, offering some support for a category use effect in assurance services. However, Bonner et al. studied two static classification systems and their concern was the effectiveness of category learning before category use, not category learning concurrent with category use. Thus, Bonner et al. encourage inquiry into the category use effect in assurance, but do not address it directly as a postclassification phenomenon. The following section offers suggestions as to how the category use effect (Ross, 2000) can be tested in the field of customized assurance services, specifically SysTrust™.

6. RESEARCH IMPLICATIONS

Behavioral evidence supporting the psychological existence of any classification system will most likely be found in controlled experiments similar to Ross (2000). More importantly, if the category use effect is studied, then this methodology must be used. This section begins with a detailed explanation of one possible experiment, then considers variations in the design.

An experiment very similar to the one by Ross (2000), summarized in Table 2, would involve classification of assurance engagements instead of diseases. The initial classification learning task would consist of learning relevant features of an assurance classification system as applied to traditional financial statement assurance, such as the level of assurance implied by each category.
An associated characteristic would be the user’s risk level, the risk that “an assertion accompanied by a favorable attest report is materially misstated” (Kinney, 2000, p. 270). Risk level might be rated on a four-point scale including low, medium, high, and very high. In the case of the AICPA’s system (Fig. 1), audit examination would be labeled as low risk, review as low to moderate, and agreed-upon procedures as low to very high (Kinney, 2000). After being taught the classification system, subjects would be given a series of two-part engagement descriptions (corresponding to patients in Ross, 2000), the first part containing a description of the firm, its industry, its general financial position and performance, and the second part consisting of an independent accountant’s report. Firms would be described as belonging to different industries, and their financial condition would vary, so that there would be some uncertainty regarding the risk of using the accounting information for an investment decision. Subjects would be asked to make an investment decision and rate user’s risk for each firm until a criterion level of agreement with the classification system’s ratings was achieved, similar to criterion achievement in diagnosis in Ross (2000).

In the learning of use task, where the manipulation would occur, all subjects would learn the essential features of SysTrust™, such as the decision situations in which it could be used, levels of assurance, and customization options. Next, they would all see cases similar to those seen in the first part of the experiment, except that these would describe various fictitious SysTrust™ engagements with different customization features. However, only subjects in the treatment group would be asked to classify each SysTrust™ case according to the initial assurance classification system and would be able to see a picture of the entire system (e.g. Fig. 1), perhaps with some description of the categories.
This classification task would likely be more difficult than the corresponding task in Ross (2000) because it is a challenge to perceive relationships between SysTrust™ engagements addressing the reliability of systems and traditional forms of assurance concerning accounting information used in investment decisions. No criterion level of “achievement” would be sought at this point; the purpose is to demonstrate to subjects through experience the similarities and differences between SysTrust™ and all other assurance services, and the range of engagements possible within SysTrust™. Subjects in the control group would read the same descriptions of SysTrust™ engagements in order to show them specific examples of the service, but they would not be required to classify the engagements according to the initial system and would not see a picture of it.

Still in the learning of use stage of the experiment, subjects would be taught a postclassification of assurance services that includes SysTrust™, for instance AICPA option 1 as pictured in Fig. 3. Subjects would be shown where in the revised system various types of engagements would be placed, according to levels of assurance and customization provided.

The final task of the experiment would require subjects to read examples of SysTrust™ engagements (including the independent accountant’s reports), rate the reliability of the systems described, and rate the user’s risk for each of them. The treatment group would observe the entire postclassification system (e.g. Fig. 3), whereas the control group would be shown only the part of the postclassification system showing SysTrust™. In the case of Fig. 3, this would be only the oval containing SysTrust™ and the subcategories of audit examination, agreed-upon procedures, and management consulting.
The categories of review, attestation, compilation, and assurance in the initial classification system would not be shown to the control group.

In this design, judgments of user’s risk replace disease symptoms listed (Ross, 2000) as the dependent variable, but the design follows Ross (2000) in all other respects as closely as possible. Evidence of a category use effect would be that subjects in the treatment group rate user’s risk closer to the levels intended by the AICPA (e.g. consistently low risk for audit examination engagements) than the control group’s ratings. The category use effect would have been caused first by the treatment group’s having attempted to classify SysTrust™ with the initial system and being prompted to recall various levels of user’s risk associated with analogous assurance services. Also, they would have been able to observe the entire postclassification system, not just SysTrust™ categories, when making user’s risk judgments. Stated as a formal hypothesis, the category use effect would predict:

H1. Decision makers asked to classify SysTrust™ engagements according to an initial classification system when learning to use SysTrust™, and to use a complete postclassification system when asked to rate the user’s risk of systems, will give ratings closer to those intended by the AICPA than decision makers not asked to use an initial classification system, or a complete postclassification system, during learning of use tasks.

The experiment described above could provide behavioral evidence of a category use effect, but fails to identify a more (or the most) efficient assurance classification system because subjects are only required to learn one system in the classification learning task.
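One way the "closer to those intended" comparison in H1 might be operationalized is sketched below in Python. Everything beyond the four-point scale and the three risk labels is an assumption made for illustration: the numeric coding of the scale, the reading of "moderate" as "medium", the treatment of each intended level as a range, and the helper names `deviation` and `mean_deviation` are all hypothetical.

```python
from statistics import mean

# Four-point scale from the text; the numeric coding is an assumption.
SCALE = {"low": 1, "medium": 2, "high": 3, "very high": 4}

# Intended ranges (min, max) per engagement type, following the labels the
# text attributes to Kinney (2000). Treating "moderate" as "medium" is an
# assumption.
INTENDED = {
    "audit examination": ("low", "low"),
    "review": ("low", "medium"),
    "agreed-upon procedures": ("low", "very high"),
}


def deviation(engagement_type, rating):
    """Distance (in scale points) from a rating to the intended range."""
    low, high = (SCALE[label] for label in INTENDED[engagement_type])
    value = SCALE[rating]
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0  # rating falls inside the intended range


def mean_deviation(judgments):
    """Average deviation over (engagement_type, rating) pairs for one group."""
    return mean(deviation(kind, rating) for kind, rating in judgments)
```

Under this sketch, H1 would be supported if `mean_deviation` for the treatment group were reliably smaller than for the control group. The significance test itself, and any weighting of engagement types, is left outside the example.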
In the absence of a compelling logical argument, as required by Cohen (2000), that there exists an assurance classification system that is more efficient than alternative methods of organizing and accessing knowledge, additional behavioral evidence is needed to address the question of cognitive efficiency. To answer this question, the research could be extended by teaching another classification system, such as Kinney (2000, Fig. 2), in the classification learning task and comparing user’s risk judgments to results given the AICPA’s system. Other dependent variables measuring cognitive efficiency, for instance recall of the detailed information about the customized features of individual SysTrust™ engagements, could be added to the design. If one wished to predict that a relatively conceptual classification system is more efficient than a concrete system, then one possible alternative hypothesis would be:

H2. Among decision makers given the opportunity to use complete initial assurance classification systems when learning postclassification systems that include SysTrust™, those using systems based on Kinney (2000) will later recall more information about SysTrust™ engagements than those using AICPA-based systems.

Designing research in assurance classification is inherently more complex than the experiments conducted by Ross (2000). Not only are there alternative initial classification systems, there are alternative postclassification systems even for the same initial system, as shown in Figs 3 and 4. When subjects begin participating they might (e.g. commercial loan officers) or might not (e.g. students not having taken an auditing course) have already learned an assurance classification system. Those in the former group might find it difficult to ignore their preconceptions if they contradict what is taught in the classification learning task.
One means of dealing with pre-existing classification systems among user groups would be to survey them regarding concepts such as levels of assurance and to adjust the systems taught in experiments for the results. Research could be extended beyond the strictly cognitive domain by introducing a dependent variable not used in extant classification research – price. Experimental markets could be employed to measure the willingness of subjects to pay for customized assurance services, although some abstraction from the details of specific services such as SysTrust™ might be necessary. A category use effect would be evident if subjects who were reminded in some way of an original, generic assurance classification system as they learned to use customized assurance were willing to pay more for a customized service than subjects not reminded of the initial system. This result would certainly please the AICPA and lend support to their hope that the CPA brand can be extended to a broader spectrum of assurance services. Finally, research could be extended to other customized assurance services offered under the AICPA brand, such as ElderCare. Services offered by other providers could also be included. For example, experiments using recall, user’s risk or price as dependent variables could require subjects to classify websites having either WebTrust™ or BBB Online seals as to the level of assurance provided. Regardless of the assurance services and dependent variables addressed, the difficulty remains that an initial classification system must either be assumed or taught to participants in the research. Consensus is more likely with relatively homogeneous user groups. 
Thus, a sample composed of either commercial loan officers or institutional investors will be more likely to share a common classification system than a sample containing both types of decision makers, or including trading partners as well as investors. If actual decision makers of any type are sampled, rather than students without prior beliefs about classification of assurance services, then care must be taken to identify as precisely as possible the intended consumers of each service.

In conclusion, an anecdote may illustrate the importance of understanding how decision makers classify customized assurance services. In 1998 an accountant in one of North America’s largest independent oil and gas producers mentioned to me that a partner of a Big 5 firm had left him a business card containing the title “assurance partner.” The accountant knew that the Big 5 firm was his company’s auditor, but he was confused by the title. Did it mean that the partner was also involved in insurance in some capacity? I explained that the partner was indeed an auditor. Even though assurance is a more familiar term now, that simple explanation could be misleading when describing practitioners who provide customized assurance services.

NOTES

1. Assurance will be defined in this paper as defined by the AICPA (2000, p. 1): “independent professional services that improve the quality of information, or its context, for decision makers.” Attestation, including audits, is a subcategory of assurance (see Fig. 1), and at times in the paper attestation services will be referred to as a type of assurance.

2. The fields of auditing and decision-making uses of accounting information are most relevant to this paper, but cognitive models of classification also appear in accounting education literature, e.g. Butler and Mautz (1996) and Bagranoff et al. (1994).
There is considerable discussion of ontologies in information systems literature concerning databases and artificial intelligence, e.g. Dahlgren (1995), Parsons and Wand (1997), Terenziani (1995), and Wand and Wang (1996). However, much of this work is based on culture and language, for example in reproducing users’ classifications in artificial systems, rather than psychology and cognition (the focus of this paper).

3. Cohen (2000) discusses two other types of evidence that are less relevant to this paper. Neuropsychological evidence shows that damage affects hierarchical levels differently. Ontogenetic evidence shows that children acquire some hierarchical levels before others.

4. Boritz (2001) pointed out that SysTrust™ assurance does not pertain to system reliability itself – it pertains to effectiveness of controls over principles. He questioned whether this could cause an expectations gap.

ACKNOWLEDGMENTS

Thanks to Karla Johnstone, Janet Morrill, Steve Salterio, Mike Stein, Michael Wright, and two anonymous reviewers.

REFERENCES

AICPA (1996). Report of the AICPA Special Committee on Assurance Services. http://www.aicpa.org/assurance/index.htm

AICPA (2000). Assurance Services – Definition and Interpretive Commentary. http://www.aicpa.org/assurance/scas/comstud/defincom/index.htm

AICPA/CICA (2000). SysTrust™ Principles and Criteria for Systems Reliability Exposure Draft, Version 2.0. http://www.aicpa.org or http://www.cica.ca

Arens, A., & Loebbecke, J. (1997). Auditing: An integrated approach (8th ed.). Upper Saddle River, NJ: Prentice-Hall.

Bagranoff, N., Houghton, K., & Hronsky, J. (1994). The structure of meaning in accounting: A cross-cultural experiment. Behavioral Research in Accounting (Suppl.), 35–57.

Bamber, E. M., & Stratton, R. (1997). The information content of the uncertainty-modified audit report: Evidence from bank loan officers. Accounting Horizons (June), 1–11.

Bandyopadhyay, S., & Francis, J. (1995). The economic effect of differing levels of auditor assurance on bankers’ lending decisions. Canadian Journal of Administrative Sciences, 12, 238–249.

Beaulieu, P. (1996). A note on the role of memory in commercial loan officers’ use of accounting and character information. Accounting, Organizations and Society (August), 515–528.

Bonner, S., Libby, R., & Nelson, M. (1997). Audit category knowledge as a precondition to learning from experience. Accounting, Organizations and Society (July), 387–410.

Boritz, J. E. (2001). Information systems assurance. In: V. Arnold & S. G. Sutton (Eds), Researching Accounting as an Information Systems Discipline. Sarasota, FL: American Accounting Association (forthcoming).

Butler, J., & Mautz, R. D., Jr. (1996). Multimedia presentations and learning: A laboratory experiment. Issues in Accounting Education (Fall), 259–280.

Campbell, J., & Mutchler, J. (1988). The expectations gap and going-concern uncertainties. Accounting Horizons (March), 42–49.

Choo, F., & Trotman, K. (1991). The relationship between knowledge structure and judgments for experienced and inexperienced auditors. The Accounting Review (July), 464–485.

Cohen, G. (2000). Hierarchical models in cognition: Do they have psychological reality? European Journal of Cognitive Psychology, 12(1), 1–36.

Dahlgren, K. (1995). A linguistic ontology. International Journal of Human-Computer Studies, 43(5–6), 809–818.

Frederick, D., Heiman-Hoffman, V., & Libby, R. (1994). The structure of auditors’ knowledge of financial statement errors. Auditing: A Journal of Practice and Theory (Spring), 1–21.

Houston, R., & Taylor, G. (1999). Consumer perceptions of CPA WebTrust assurances: Evidence of an expectation gap. International Journal of Auditing, 3, 89–105.

Jennings, M., Kneer, D., & Reckers, P. (1993). The significance of audit decision aids and precise jurists’ attitudes on perceptions of audit firm culpability and liability. Contemporary Accounting Research (Spring), 489–507.

Johnson, D., Pany, K., & White, R. (1983). Audit reports and the loan decision: Actions and perceptions. Auditing: A Journal of Practice and Theory (Spring), 38–51.

Johnson, K., & Mervis, C. (1997). Effects of varying levels of expertise on the basic level of categorization. Journal of Experimental Psychology (September), 248–277.

Kida, T., Smith, J., & Maletta, M. (1998). The effects of encoded memory traces for numerical data on accounting decision making. Accounting, Organizations and Society (July/August), 451–466.

Kinney, W. (2000). Information quality assurance and internal control for management decision making. Boston: Irwin/McGraw-Hill.

Kinney, W., & Nelson, M. (1996). Outcome information and the “expectation gap”: The case of loss contingencies. Journal of Accounting Research (Autumn), 281–294.

Libby, R. (1995). The role of knowledge and memory in audit judgment. In: R. Ashton & A. H. Ashton (Eds), Judgment and Decision-Making Research in Accounting and Auditing. Cambridge: Cambridge University Press.

Libby, R., & Trotman, K. (1993). The review process as a control for differential recall of evidence in auditor judgments. Accounting, Organizations and Society (August), 559–574.

Lymer, A., Debreceny, R., Gray, G., & Rahman, A. (1999). Business reporting on the internet (Discussion Paper). London: International Accounting Standards Committee (IASC).

Malt, B., Ross, B., & Murphy, G. (1995). Category coherence in cross-cultural perspective. Cognitive Psychology, 29, 85–148.

Martin, C., Handorf, W., & Clewell, W. (1988). Small business lending and levels of report assurance. Akron Business and Economic Review (Summer), 69–84.

Mautz, R., & Sharaf, H. A. (1961). The philosophy of auditing. Sarasota, FL: American Accounting Association.

Moeckel, C. (1990). The effect of experience on auditors’ memory traces. Journal of Accounting Research (Autumn), 368–387.

Nelson, M., Libby, R., & Bonner, S. (1995). Knowledge structures and the estimation of conditional probabilities in audit planning. Accounting Review (January), 27–47.

Osherson, D., Smith, E., Wilkie, O., Lopez, A., & Shafir, E. (1990). Category-based induction. Psychological Review, 97, 185–200.

Parsons, J., & Wand, Y. (1997). Choosing classes in conceptual modeling. Communications of the ACM, 40(6), 63–69.

Rosch, E., Mervis, C., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8(3), 382–439.

Ross, B. (1996). Category representations and the effects of interacting with instances. Journal of Experimental Psychology: Learning, Memory and Cognition, 22, 1249–1265.

Ross, B. (1997). The use of categories affects classification. Journal of Memory and Language, 37(August), 240–267.

Ross, B. (1999). Postclassification category use: The effects of learning to use categories after learning to classify. Journal of Experimental Psychology: Learning, Memory and Cognition, 25(May), 743–757.

Ross, B. (2000). The effects of category use on learned categories. Memory and Cognition, 28(January), 51–63.

Terenziani, P. (1995). Towards a causal ontology coping with the temporal constraints between causes and effects. International Journal of Human-Computer Studies, 43(5–6), 847–863.

Wand, Y., & Wang, R. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39(11), 86–95.

Wright, M., & Davidson, R. (2000). The effect of auditor attestation and tolerance for ambiguity on commercial lending decisions. Auditing: A Journal of Practice and Theory (Fall), 67–81.