Statistical Software for Students: Academic Practices and Employer Expectations
William C. Adams, Donna Lind Infeld, & Carli M. Wulff
Trachtenberg School ● George Washington University
Adams, Infeld, & Wulff — Page 1 of 31

Over the past several decades, students of public administration, public policy, and public affairs have regularly been taught statistics using various computer packages, such as SAS, SPSS, or Stata. However, there has been little published exploration or discussion of which statistical software packages are of greatest benefit to these students. To explore this topic, we conducted a multi-method study addressing the following research questions:

1. Is there any available evidence that a particular statistical software program is superior?
2. Which statistical software programs are most widely integrated into MPA, MPP, and related masters programs?
3. What statistical software skills, if any, do relevant employers specify in job announcements? And for those continuing in academe, are there trends in software use?

Answering these questions required a canvass of our peers and our students’ prospective employers, as well as research on the merits of the competing software. Prior studies have confirmed that most MPA, MPP, and related masters programs require an introductory course in statistics and one in budget and finance, and often offer more advanced courses as well (Infeld & Adams 2011; Koven, Goetzke, & Brennan 2008; Morçöl & Ivanova 2010; NASPAA 2009). But in the data-rich, computer-based world of the 21st century, what is being taught in those quantitative courses regarding specific statistical software? No prior systematic report of either academic practices or relevant employer needs could be found. Before exploring this uncharted territory, however, we searched for evaluations of the comparative merits of prominent software options.
Software Merits

Despite the current enormous emphasis on outcome evaluations in education in general, and in public affairs education in particular (Newcomer & Allen 2010; Aristigueta & Gomes 2006; Castleberry 2006; Durant 2002; Fitzpatrick & Miller-Stevens 2009; Powell 2009; Roberts & Pavlak 2002; Williams 2002), remarkably few published articles, as reviewed below, assess the comparative advantages of competing software for students, and only one outcome experiment was identified. This literature summary highlights Excel, SAS, SPSS, and Stata because the findings on those packages are especially relevant to the subsequent analysis.

Features. The only recent comparison of functionality that could be found lists the features of many statistical programs (excluding Excel) for regression analysis, time series analysis, ANOVA, and selected other statistics (Wikipedia 2011). In these areas, leading software such as SAS, SPSS,¹ and Stata all run the most common and many arcane statistical procedures, although a few exotic options are missing (e.g., SPSS lacks quantile regression, autoregressive conditional heteroscedasticity analysis, and generalized autoregressive conditional heteroscedasticity analysis).

Accuracy. Most users take for granted that the computations used by statistical software yield precise and identical results. Unfortunately, that is not necessarily the case. Because software performance is so often measured in terms of speed, programming shortcuts that gain speed can sacrifice exactitude. Altman & McDonald (2001) obtained mixed results when comparing Excel, SAS, SPSS, Stata, and several other software packages. The good news was that statistical software packages “typically” (though not always) provide “correct answers, to at least the fourth significant digit, for univariate statistics, regression problems, low-difficulty analysis of variance problems, and low difficulty nonlinear regression programs” (p. 684).
The bad news was that many programs were unreliable for nonlinear regression and were “unable to return accurate ANOVA results for problems of even average difficulty” (p. 684). Excel 1997 could not even properly calculate standard deviations for data involving eight or more digits.²

Later, using large, complex datasets, researchers at the National Center for Health Statistics performed various procedures using SAS, SPSS, Stata, and a less widely used program, SUDAAN. They obtained “identical results” and concluded that, given this equally high level of precision, software choices should be driven by other factors such as cost, ease of use, and data management capabilities (Siller & Tompkins 2006). However, a more extensive analysis (Keeling & Pavur 2007) of nine software packages (including Excel 2003, SAS 9.1, SPSS 12.0, and Stata 8.1) did detect various shortcomings (especially with nonlinear regression and autocorrelation calculations) but found notable improvements over earlier versions in almost every area.

Since 2007, Excel’s computations have continued to draw criticism; however, no studies were found that gauged the accuracy of later versions of the other statistical software. Indeed, Excel has a history of nontrivial computational errors (Sawitzki 1994; McCullough & Wilson 1999, 2002, 2005; Knüsel 1998, 2002, 2005; Altman, Gill, & McDonald 2004). A more recent examination of Excel 2007 reached disturbing conclusions (McCullough & Heiser 2008, p. 4570):

Excel 2007, like its predecessors, fails a standard set of intermediate-level accuracy tests.... [I]t is not safe to assume that Microsoft Excel’s statistical procedures give the correct answer. Persons who wish to conduct statistical analyses should use some other package.
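The standard-deviation failure mentioned above is a classic example of catastrophic cancellation. The Python sketch below is a hypothetical illustration, not a reconstruction of Excel’s actual code: it assumes a one-pass “calculator” formula, which subtracts two huge, nearly equal sums, and compares it with a two-pass formula that centers the data on the mean first. The function names naive_sd and two_pass_sd are our own. The exact number of digits at which such a formula breaks down depends on the data and on floating-point details, so the nine-digit values here should not be read as reproducing the Excel 1997 result.

```python
import math
import statistics


def naive_sd(xs):
    """One-pass 'calculator' formula: sqrt((sum(x^2) - sum(x)^2 / n) / (n - 1)).

    For large values both sums are huge and nearly equal, so subtracting
    them can cancel away every significant digit (catastrophic cancellation).
    """
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    var = (total_sq - total * total / n) / (n - 1)
    return math.sqrt(max(var, 0.0))  # rounding can push var slightly below zero


def two_pass_sd(xs):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


# Nine-digit values whose true sample standard deviation is exactly 1.
data = [100000001.0, 100000002.0, 100000003.0]

print("one-pass:", naive_sd(data))
print("two-pass:", two_pass_sd(data))
print("statistics.stdev:", statistics.stdev(data))
```

With standard IEEE 754 double-precision arithmetic, the one-pass version returns 0.0 for these values, while the two-pass version and Python’s statistics.stdev return the correct 1.0. The two-pass design costs a second pass over the data but never subtracts two large, nearly equal quantities.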
Among the many errors that McCullough & Heiser identified are a flawed linear regression algorithm and output, erroneous nonlinear regression results, a nonrandom random number generator, and inaccurate t-tests (incorrect results, especially when data are missing; wrong p-values; and even mistaken labels). Also regarding Excel 2007, Yalta (2008) found that “the accuracy of various statistical functions range from unacceptably bad to acceptable but significantly inferior” to alternative implementations. Others identified serious flaws in Excel’s polynomial trend line equations (Hargreaves & McWilliams 2010) as well as in other areas (Almiron et al. 2010). Some statisticians argue against using Excel (or any other spreadsheet) as a vehicle for teaching statistics (Nash 2008), especially given what they view as misleading and confusing default charts (Su 2008). No scholarly studies defending the precision of Excel could be found. To date, there seem to be no published critiques of Excel 2010, but McCullough & Heiser were pessimistic about future prospects (2008, p. 4570):

Microsoft occasionally fixes errors, more often ignores them, and sometimes fixes them incorrectly. Consequently, every time there is a new version of Excel, the [accuracy] tests must be repeated.

McCullough (personal communication, August 21, 2011) stated that Microsoft still does not document its “unsupported claims of accuracy” for Excel with any sort of actual “test code with known inputs and outputs.”

Usability. No systematic studies of relative user-friendliness, accessibility, or learning curves could be found in the published literature. SPSS and Stata offer both a command line interface and a menu-driven graphical user interface, while SAS is only command-based and Excel has no command line option. All four run on Windows, and SAS, SPSS, and Stata also run on Linux; Excel, SPSS, and Stata offer Mac versions, although SAS does not.
Two personal evaluations argued that Stata was superior to SAS and SPSS, but both authors had written books about Stata (Acock 2005; Mitchell 2005).

Costs. Expense matters, and it gives Excel a structural advantage: Excel is often preinstalled on computers or seemingly complimentary as part of the Microsoft Office suite. Also at no extra cost, as of 2011, “SAS OnDemand for Academics” allows online, cloud-based access to a wide range of SAS applications and is free of charge to college instructors and their students. The price of the latest student version of SPSS fell sharply in 2011; it was available on Amazon.com for less than $28, limited to a 13-month license, 1,500 cases, and 50 variables. Stata, which not long ago was the least expensive, is now the most expensive for students: the “Small Stata” product costs $49 and is limited to one year, 1,200 cases, and 99 variables.³ Site licenses covering many academic lab computers quickly run into thousands of dollars; they often involve complex fee formulas and volume discounts, and they vary further depending on the feature sets included. While exact site license calculations are beyond the scope of this study, they cannot be dismissed as irrelevant.

Pedagogical value. By the 1990s, the dominant pedagogical model for statistics had decisively moved from a passive lecture approach to more active student engagement using a variety of activities, such as creative problem solving, practical applications, discussions, small groups, and original research, plus more interpretation and analysis beyond rote calculations (Moore 1997; National Research Council 1990, 1991). Expository texts and classroom lectures were not jettisoned entirely, but they came to be seen as insufficient on their own to foster active learning. Consequently, involving students in active data exploration via software became an especially valuable tool for the new pedagogy.
Which software is most suitable for this task? As Moore observed (1997, p. 131), “Software designed for doing statistics is not necessarily well structured for learning statistics.” But beyond a few scattered personal reflections and anecdotes, there is little published research on the classroom advantages of any particular software. Certainly, debates can reach the intensity of religious differences, given some people’s attachment to and investment in their preferred statistics software package. The only randomized, controlled test of software was conducted among 24 undergraduates (mostly criminal justice majors) in an introductory statistics course at Indiana University (Proctor 2002). The dozen who used Excel scored higher than the dozen who used SPSS in computational knowledge and slightly higher in conceptual knowledge. Given the cost of software licenses, the time spent in computer laboratories, and the large investment in instructional communication, one might have expected more systematic, comparative outcome evaluations to have been conducted by now in order to optimize software selection for quantitative courses.

Issues of accuracy notwithstanding, there are hints in the literature that Excel is making inroads beyond its traditional domain of budget, finance, and business. Articles are appearing about ways to use Excel, usually with its Analysis ToolPak add-in, to teach applied statistics in fields as varied as psychology (Warner & Meehan 2001), nursing (DiMaria-Ghalili & Ostrow 2009), and engineering (Prvan, Reid, & Petocz 2002), as well as business (Bell 2000). Likewise, some statistics textbooks have begun to focus on Excel (e.g., Dretzke 2011; Carlberg 2011). All in all, prior research does not offer much guidance regarding the best software to employ in our quantitative classes.
It does raise questions about Excel’s precision, but otherwise we seem to be left in the dark with only our personal anecdotes and experiences. Yet academic programs must still decide which software packages to require of their students. We therefore sought to discover what programs across the country currently require of masters students.

Program Practices

A nationwide online poll of 260 eligible⁴ NASPAA representatives was conducted May-July 2011. Completed surveys covered a total of 131 masters programs, both accredited and not accredited. Responses encompassed over half (n=98; 52%) of the Master of Public Administration (MPA) programs, over half (n=16; 53%) of the Master of Public Policy (MPP) programs, most (n=5; 71%) of the Master of Public Affairs programs, and twelve of the many dozens of varied other public sector-related masters degrees. The responding MPP programs also represent over one fourth of those that have institutional membership in the Association for Public Policy Analysis and Management (APPAM).

Insert Table 1 and Figure 1 here.

Of the 131 programs, only two do not offer an introductory statistics course. (See Figure 1 and Table 1.) Only four programs teach statistics without employing a software program. Before these exceptions are dismissed, it should be noted that comments volunteered by a few respondents from MPA programs that do use software show skepticism or ambivalence.

“We’re having a lively debate about whether any statistics package should be used. Do MPAs really need to be able to run regressions and such? If we train public leaders, is this a quality they need? … [Some alumni say] these software programs got them their first job. Others say it was a waste of 3 credit hours.”

“The administrator is far better having a general knowledge… and ensuring the expertise is in place than knowing one form of software well unless he or she plans to specialize in some way.
If an administrator has time to sit and analyze data, he or she is probably not doing the job especially well.”

“Most of our students don't use stats on the job. Some use Excel on the job but don't need training from us. They get their real training on the job.”

Despite reservations from a few representatives, the overwhelming majority of these degree programs use statistical software. In the introductory statistics course, 97% of the MPA programs employ software (excluding the two programs without such a course), as do 100% of the MPP programs and 94% of the other masters programs. Among programs using software in this course, most offer a companion computer lab (74% of MPA programs, 88% of MPP programs, and 81% of other masters programs).⁵ Follow-up statistics courses are offered by a majority of MPA programs (59%) and a very large majority of MPP and other masters programs (94%); such courses almost always use statistical software and often include a computer lab. (See Figure 1 and Table 1.)

An introductory budget and finance course is offered by large majorities (92% of MPA programs, 75% of MPP programs, 88% of all others) and typically employs statistical software (89% of the MPA programs offering the course; 92% of MPP programs; 82% of others), although in this case most do not have an accompanying computer lab (only 30% of MPA programs using software in this course add a lab; 45% of MPP programs; 14% of others). Subsequent budget and finance courses are less common but, if offered, usually use statistical software without accompanying computer labs. (See Figure 1 and Table 1.)

Thus, many students in these MPA, MPP, and other masters programs are likely to have worked with statistical software in at least two courses and perhaps several, depending on their degree program, fields, and elective choices. Courses in program evaluation and policy analysis, and certainly capstone projects, sometimes entail the use of statistical software as well.
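The survey counts and percentages reported for these programs can be cross-checked with simple arithmetic. The Python sketch below backs out the approximate number of programs of each type from a response count and its share, assuming each reported percentage is rounded to the nearest point, and confirms that the category counts sum to 131. The function name implied_population is our own, and the final ratio is only a rough response rate because it assumes one completed survey per representative.

```python
# Responding programs and the reported share of each degree type they represent.
reported = {
    "MPA": (98, 0.52),
    "MPP": (16, 0.53),
    "Master of Public Affairs": (5, 0.71),
}
OTHER_DEGREES = 12   # "twelve of the many dozens" of other masters degrees
ELIGIBLE_REPS = 260  # eligible NASPAA representatives polled


def implied_population(count, share):
    """Approximate number of programs of one type, given respondents and their share."""
    return count / share


for degree, (count, share) in reported.items():
    # The shares are rounded, so these totals are approximate.
    print(f"{degree}: roughly {implied_population(count, share):.0f} programs")

total = sum(count for count, _ in reported.values()) + OTHER_DEGREES
print("Total responding programs:", total)
print(f"Completed surveys per eligible representative: {total / ELIGIBLE_REPS:.1%}")
```

This suggests roughly 188 MPA, 30 MPP, and 7 Master of Public Affairs programs. Because 52% could stand for anything from 51.5% to 52.5%, the MPA figure could plausibly fall anywhere from about 187 to 190.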
What software are these programs featuring? For budget and finance courses, the answer is easy: Excel. Having long ago vanquished rivals such as Lotus 1-2-3, Quattro Pro, and PlanPerfect, Excel dominates the academic spreadsheet world without any major competitor. (See Figure 2 and Table 1.) Only a handful of these masters programs do not use Excel alone in their budget and finance course(s); most of these exceptions add SPSS, Stata, or some other software to Excel, and two outliers use SPSS alone.

Insert Figure 2 here.

For introductory statistics courses, software choices are less uniform. SPSS is the most widely taught, but it lacks the near monopoly that Excel has in budget and finance courses. A large majority of MPA programs (70%) use SPSS, as do majorities of MPP programs (63%) and other masters programs (59%). (See Table 1 and Figures 2 and 3.) A plurality of MPA programs (42%) and MPP programs (44%) use SPSS alone, but SPSS is also used in conjunction with Excel, especially in MPA programs (28%). In MPP programs, Stata supplants Excel as the main rival to SPSS. Other software packages (R, JMP, gretl, and Crystal Ball) were rarely mentioned.

Insert Figure 3 and Figure 4 here and ideally on the same page for easy comparison.

For subsequent statistics courses, Stata emerges as a stronger contender, markedly surpassing SPSS among MPP programs and showing a nontrivial presence in MPA and other masters programs. (See Table 1 and Figures 2 and 4.) SPSS continues to be the software of choice among MPA programs, used in over two-thirds of the later statistics classes, sometimes along with Excel or other software.
Excluding Excel, respondents were asked to rank the demand for SAS, SPSS, and Stata among employers: “How would you estimate the current usage of these three software packages at the jobs your Masters students seek?” As shown in Figure 5, three out of four academics believe that SPSS is the most widely used by relevant employers, with opinions split as to whether SAS or Stata is the runner-up.

Insert Figure 5 and Figure 6 here.

In terms of trends, however, the position of SPSS is not so secure. Respondents were asked if they had “noticed any trends in the popularity of these software programs over the past decade.” As shown in Figure 6, Stata is the clear winner in perceived momentum, with 52% saying Stata has gained popularity and only 6% saying it has lost popularity. That net positive of 46 percentage points compares with a net negative of 15 points for SAS and a net negative of 11 points for SPSS. Of course, this question asked about relative trends, not absolute standing, where SPSS still comes out ahead in both university practices and projected employer needs. Yet the widespread impressions of Stata’s strides (indeed, three out of ten said it had gained “a lot”) suggest that SPSS, and the programs that teach it, ought not to rest on their laurels. In this vein, one respondent commented:

“We relied on SPSS as our stat package for years, particularly in the intro course. However costs and difficulty in licensing are making us consider alternatives. Since some of us use Stata in our own work, and it is higher education friendly, I think we will be moving more and more in that direction.”

In both questions about the perceived usage and trends of these three statistical software packages, our assumption was that Excel, as a spreadsheet, is extremely important but falls in a somewhat different category. However, some respondents volunteered that this trio was inconsequential because nothing really matters but Excel.
“Most of our students are using Excel or similar, job specific applications, not these stats software.”

“I cannot answer [those] questions because I believe that our employers use none of the three statistics packages mentioned.”

“In polling our students and graduates only Excel is used in the work world.”

Overall, these 131 masters programs overwhelmingly employ Excel in their budget and finance courses. For statistics courses in MPA programs, SPSS dominates but sometimes shares the spotlight with Excel. Among MPP programs, SPSS fends off Stata in the first-round statistics course, but in subsequent courses Stata wins, with Excel on the sidelines. Other masters programs fall in between, with SPSS on top but with both Stata and Excel often used as well. SAS, R, and other programs are rarely used.

Job Listings

Even if all other things (such as cost and ease of instruction) are not equal, perhaps extra consideration should still be given to statistical software that will be more valuable after graduation. Why focus on software that is rarely used outside one’s halls of ivy? The literature on diffusion and network effects⁶ repeatedly indicates that, when it comes to adopting software and hardware, people place an enormous premium on connectivity and on minimizing transaction costs (Brock 1975; Katz & Shapiro 1985; Economides 1996). Working collaboratively, exchanging files, and discussing issues are all enormously facilitated when everyone is using a common language, network, and system. Once interconnectivity is established with a shared framework, a new, improved framework must be dramatically superior, not just modestly better, to dislodge the advantages of network connectivity and overcome switching costs (Farrell & Saloner 1986; Katz & Shapiro 1986).
Network pressures may push products to consolidate on a single standard (such as VHS over Betamax, Blu-ray over HD DVD, and cassette tapes over eight-track tapes), but sometimes, after most competitors are winnowed out, a couple of entrenched networks may endure (e.g., PCs and Macs).

Surely familiarity with a specific software package (sufficient to put on one’s resume) is an asset when applying for a job with an employer who uses that same software. Already knowing Stata and speaking Spanish might still be of some value even if the organization uses SPSS and often works in Brazil, but in this example SPSS and Portuguese would be even better. While on-the-job training may be easier for software than for a foreign language, job applicants who are already conversant must have at least some advantage, especially when the employer advertises that such a specific skill is preferred.

Which statistics software do relevant employers most often mention? We reported above the impressions of academics regarding software used by employers, but what software skills do employers explicitly seek? Job announcements were collected from the following online sites (April 20-30, 2011):

USAJobs.gov (listing federal government jobs)
Idealist.org (listing jobs with nonprofit organizations)
State governments (ten largest states)⁷
City governments (ten largest cities)⁸
Major consulting firms (Booz Allen, Deloitte, ICF, LMI, and PricewaterhouseCoopers)
Major policy research organizations (Brookings, Mathematica, RAND, Urban Institute)

While hiring had not rebounded to pre-recession levels, searches for full-time jobs using the keywords SAS, SPSS, Stata, and Excel (always checking to confirm that “Excel” was not a verb) yielded a total of 409 job listings. A close inspection of the listings found that few other statistical packages were requested, so the summary here is confined to these four.
Eleven job listings were eliminated as specialized positions outside the fields appropriate for typical MPA and MPP students (e.g., actuary, statistician, information technology programmer). Of the remaining 398 openings, only a dozen specified “proficiency” or “mastery” of at least one of these packages. Most descriptions used vaguer, less demanding wording such as “familiarity with,” “skill with,” “knowledge of,” or often simply “experience with.”

Insert Figure 7 here.

Because the precise distribution of cited software is likely to shift somewhat from week to week, the value of this snapshot lies in the big picture it provides, and that picture is quite clear:

1. Relevant employer requests for experience with Excel (n=290) surpassed those for the other three statistical packages combined (n=108 positions for SAS, SPSS, or Stata).
2. In the race for second place behind Excel, no single software package dominates. (See Figure 7.)
3. SPSS and SAS do appear to outrank Stata (especially since half of the “Stata only” listings came from just one organization, the Urban Institute).
4. Familiarity with SAS alone fits about two-thirds of the listings citing one of these packages (68%), SPSS alone is sufficient for almost two-thirds (64%), and Stata alone satisfies about four out of ten (38%).
5. Familiarity and experience with any two of these three packages would qualify an applicant for at least eight out of ten of the openings where familiarity with a nonspreadsheet package is sought.

Subsequent searches of other databases (conducted May 10, 2011) repeated these same basic findings. Again, Excel excelled. Among the others, none was dominant, but SAS and SPSS were consistently ahead of Stata:

NACElink.com (under “government”): Excel 105, SAS 9, SPSS 7, Stata 3.
PolicyJobs.net (with Anglosphere international listings): Excel 133, SAS 49, SPSS 32, Stata 26.
The authors’ school’s “Career Central” web site, listing jobs from (mostly local) nonprofits, think tanks, consulting firms, and government: Excel 205, SPSS 20, SAS 16, Stata 7.

Data from these additional sites were not incorporated into the findings shown in Figure 7 because of some duplicate listings and a slightly different time period, but the results reflected the same general pattern.

The authors’ school’s “Career Central” web site was also helpful for another purpose. Because all of its listings were well suited to MPA and MPP graduates, it allowed us to compute the ratio (about 1:3) of listings requesting statistical software familiarity (n=248) to listings with no such mention (n≈733). Thus, about three fourths of the positions did not appear to require any software background, or took it for granted. However, most posts that did cite a software preference seem to have been serious about it. In theory, these job announcements might mention specific statistical software as examples, not as an exclusive list, in which case a candidate familiar with a rival package would still be on equal footing. If so, employers could easily avoid needlessly narrowing their recruitment by simply inserting “such as,” “e.g.,” or “etc.” before naming merely illustrative software. In fact, only 22% of the listings that mentioned SAS, SPSS, or Stata suggested any openness to alternatives, while a large majority (78%) did not indicate any flexibility. Thus, it seems fair to conclude that, absent any hint otherwise, an advertisement recruiting someone familiar with SPSS, for example, is indeed intended to hire someone familiar with SPSS. Fewer than 1% of the job ads requesting Excel familiarity could be construed as open to alternative software.

In addition to gauging the software preferences of relevant employers, we also examined the statistical software used in academic social science research.
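Returning briefly to the job-listing figures: the coverage claims (single-package shares that together exceed 100%, and pairs of packages that reach eight in ten) follow from simple set logic, and the Career Central ratio is straightforward division. The Python sketch below illustrates both. The eight listings are hypothetical toy data, not the study’s, and the coverage function assumes, as the findings appear to, that an applicant fits a listing when they know at least one package it names. The function name coverage is our own.

```python
from itertools import combinations

PACKAGES = ("SAS", "SPSS", "Stata")

# HYPOTHETICAL listings for illustration only (not the study's data):
# each set holds the nonspreadsheet packages that one job ad names.
listings = [
    {"SAS"}, {"SAS"}, {"SPSS"}, {"Stata"},
    {"SAS", "SPSS"}, {"SAS", "Stata"}, {"SPSS", "Stata"},
    {"SAS", "SPSS", "Stata"},
]


def coverage(skills, listings):
    """Share of listings an applicant fits, assuming any one named package suffices."""
    fits = sum(1 for named in listings if named & skills)
    return fits / len(listings)


# Single packages: listings overlap, so these shares can sum to more than 100%.
for pkg in PACKAGES:
    print(f"{pkg} alone: {coverage({pkg}, listings):.1%}")

# Pairs of packages: the union covers more listings than either package alone.
for pair in combinations(PACKAGES, 2):
    print(f"{' + '.join(pair)}: {coverage(set(pair), listings):.1%}")

# Career Central counts reported in the text (the second count is approximate).
with_software, without_software = 248, 733
total_ads = with_software + without_software
print(f"Ratio with:without is about 1:{without_software / with_software:.1f}")
print(f"Share of ads citing software: {with_software / total_ads:.1%}")
```

In this toy set, SAS alone fits 62.5% of listings, while SAS plus SPSS fits 87.5%, mirroring the pattern in the findings. With the reported Career Central counts, about 25% of ads cite software, consistent with the roughly 1:3 ratio.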
Academic practice matters as well, because some masters graduates go on to earn doctorates and pursue academic careers. Using Google Scholar’s extensive collection of books and journal articles, we ran full-text searches for publications from 1996 through 2010.⁹ While journal articles no longer routinely mention which software was used for calculations, many still do.

Insert Figure 8 here.

Across the three five-year periods, while SPSS made some gains, the most striking trend is the surge in citations of Stata in both social science journals and administration and policy journals.¹⁰ (See Figure 8.) This provides empirical validation for the notion among many academics, discussed previously and summarized in Figure 6, that the use of Stata has grown markedly over the past decade. Far more sources have been digitized over time, but Stata’s share of mentions among these four statistical software packages has grown from single digits to 27%. Stata’s gains have come at the expense of Excel and SAS. SPSS has held steady in the social science area and gained in the administration, policy, and economics area, garnering three out of ten mentions and holding the plurality position in the most recent period studied.

Summary

Only one small randomized outcome study comparing the effectiveness of statistical software could be found, and the limited literature on the comparative merits of leading software is mostly impressionistic. An exception is a series of studies of computational accuracy indicating that, although considerable progress has been made over the past decade, concerns remain: especially when high levels of precision are required with large datasets, when more advanced statistics such as nonlinear regression are used, and with several aspects of Excel. Overall, for purely pedagogical purposes in graduate courses, no objective basis was found for favoring any particular one of the leading statistical software packages.
Our large nationwide survey of MPA, MPP, and related masters programs found that Excel has a near monopoly in budget and finance courses. For statistics courses in MPA programs, SPSS dominates, although Excel is also often used. Among MPP programs, SPSS is the most widely used in the initial statistics course, but Stata is used more often in later courses. Other masters programs fall in between, with SPSS again on top but with both Stata and Excel also popular. SAS, R, and other programs are rarely used in these masters courses. Relevant employers of these students request familiarity with Excel far more than with any other quantitative software. Among nonspreadsheet options, SAS and SPSS skills were most widely sought.

For the sole purpose of teaching statistical concepts, career-market considerations are, of course, irrelevant; the choice would instead be driven by a benefit-cost estimate of the comparative effectiveness of various software packages in engaging students to explore and understand data analysis. Ideally, we might strive to do both: strengthen students’ statistical literacy while also providing statistical software experience that will be of value both in applying for jobs and on the job. Our findings about current academic practices and employer preferences can inform decisions about future program directions. The findings summarized below are framed as responses to various views about statistical software skills.

“No statistical software is needed.” To be sure, a majority of relevant employers did not advertise that they required any statistical software familiarity. Perhaps they assume that good candidates these days would at least be acquainted with spreadsheets, or perhaps they do not care.
The “no statistical software” view requires a gamble: that our graduates can find good jobs without being eligible for the roughly one fourth of positions that call for specific software skills, that statistical software familiarity would be a trivial asset on their resumes, or that they can acquire sufficient statistical literacy without active learning through engagement with software.

“Statistical software choices are interchangeable.” In fact, specific software skills do matter. Only about one fifth of the job advertisements mentioning statistical software indicated that substitutes for the preferred software would be acceptable. In today’s “buyer’s market,” employers can easily screen for particular skills and avoid the costs of remedial on-the-job training.

“Excel training should surpass all else.” This argument is actually not extreme. Employer requests for Excel experience far surpassed those for all nonspreadsheet statistical software combined. While Excel is used in nearly all introductory budget and finance courses (often with no computer lab) and in many statistics courses, perhaps more graduate programs should add computer labs to further strengthen Excel skills. Despite its shortcomings in some advanced calculations, Excel’s accuracy does seem to be improving.

“SPSS is still the best nonspreadsheet statistics package.” A good case can be made for SPSS. SPSS and SAS are the nonspreadsheet packages most requested by employers, and, unlike SAS, SPSS offers a menu interface that may help account for its status as the leading statistical software being taught in introductory statistics courses. Moreover, SPSS has gained or held its own in academic research citations, while SAS and Excel have declined.

“SAS should rule.” SAS supporters will assert that, despite its surprisingly rare use in MPA and MPP graduate programs, it is functionally powerful enough to be worth any extra effort.
Moreover, our study did find that relevant employers tended to seek familiarity with SAS slightly more often than with SPSS and clearly more often than with Stata.

“Stata is the wave of the future.” Advocates of Stata can point to its impressive trend line: its academic citations more than tripled in one decade. It is also noteworthy that, after the introductory statistics course, a majority of the MPP programs surveyed, along with many MPA and related masters programs, turn to Stata in intermediate and advanced statistics classes. Yet, while Stata's advances suggest an increasingly important future, it may be premature to jettison the older packages that remain decidedly more popular among relevant employers.

“Give MPAs familiarity with Excel and SPSS and give MPPs familiarity with Excel, SPSS, and Stata.” This mixed approach, found to be the dominant practice nationwide, has a strong rationale and should open up the widest range of job opportunities to graduates. Few employers request mastery or proficiency, so experience with Excel plus exposure to at least one menu-driven statistical package such as SPSS appears to be the optimal configuration for MPAs. The additional statistics course(s) that MPPs typically take offer an opportunity to add Stata or SAS to their repertoire. Because MPP graduates often aim for research-oriented careers, that larger analytical tool chest should be of special value to them.

For curriculum decision-making purposes, this research offers some provocative and useful findings, although not a blueprint. The software used to teach statistics must be determined through careful assessment by individual programs. Given the considerable time, expense, and energy devoted to teaching quantitative courses and the potential impact on graduates' job opportunities, these decisions are consequential.

Excel is a registered trademark of Microsoft Corporation, Redmond, Washington.
SAS is a registered trademark of SAS Institute Inc., Cary, North Carolina. SPSS is a registered trademark of IBM Corporation, Armonk, New York. Stata is a registered trademark of StataCorp LP, College Station, Texas.

Authors:

William C. Adams, professor of Public Policy and Public Administration at the Trachtenberg School at The George Washington University, teaches research methods and applied statistics courses to both MPA and MPP students. His most recent book is Election Night News and Voter Turnout: Solving the Projection Puzzle. Contact: adams@gwu.edu

Donna Lind Infeld is a professor at the Trachtenberg School of Public Policy and Public Administration at The George Washington University, where she is director of the doctoral program. She teaches graduate courses in policy analysis and research methods. Dr. Infeld's research often focuses on aging and long-term care. Contact: dlind@gwu.edu

Carli M. Wulff is an MPA candidate specializing in social policy at the Trachtenberg School at The George Washington University and formerly served as a Peace Corps volunteer in Kyrgyzstan (2006–2008).

The authors wish to thank Stephanie Celinni, Dylan Conger, and Laura Minnichelli for their comments and suggestions. An earlier version of this paper was presented at the Association for Public Policy Analysis and Management's Teaching Workshop, November 2, 2011, Washington, DC.

References

Acock, A.C. (2005). SAS, Stata, SPSS: A comparison. Journal of Marriage and Family, 67(4), 1093–1095.
Adams, W.C. (2010). Using the Internet. In J. Wholey, H. Hatry, & K. Newcomer (Eds.), Handbook of Practical Program Evaluation (3rd ed.). San Francisco: Jossey-Bass.
Almiron, M.G., Lopes, B., Oliveira, A.L.C., Medeiros, A.C., & Frery, A.C. (2010). On the numerical accuracy of spreadsheets. Journal of Statistical Software, 34(4), 1–29.
Altman, M., Gill, J., & McDonald, M. (2004). Numerical issues in statistical computing for the social scientist. Hoboken, NJ: John Wiley & Sons.
Altman, M., & McDonald, M. (2001). Choosing reliable statistical software. PS: Political Science and Politics, 34(3), 681–687.
Aristigueta, M., & Gomes, K.M.B. (2006). Assessing performance in NASPAA graduate programs. Journal of Public Affairs Education, 12(1), 1–18.
Bell, P.C. (2000). Teaching business statistics with Microsoft Excel. INFORMS Transactions on Education, 1(1), 18–26.
Brock, G. (1975). Competition, standards, and self-regulation in the computer industry. In R.E. Caves & M.J. Roberts (Eds.), Regulating the Product: Quality and Variety. Cambridge: Ballinger.
Carlberg, C.G. (2011). Statistical Analysis: Microsoft Excel 2010. Indianapolis, IN: Que.
Castleberry, T.E. (2006). Student learning outcome assessment within the Texas State University MPA program. Applied Research Projects, Paper 182. http://ecommons.txstate.edu/arp/182
DiMaria-Ghalili, R.A., & Ostrow, C.L. (2009). Using Microsoft Excel to teach statistics in a graduate advanced practice nursing program. Journal of Nursing Education, 48(2), 106–110.
Dretzke, B.J. (2011). Statistics with Microsoft Excel. Upper Saddle River, NJ: Pearson/Prentice Hall.
Durant, R.F. (2002). Toward becoming a learning organization: Outcomes assessment, NASPAA accreditation, and mission-based capstone courses. Journal of Public Affairs Education, 8(3), 193–208.
Economides, N. (1996). The economics of networks. International Journal of Industrial Organization, 14(2), 673–699.
Farrell, J., & Saloner, G. (1986). Installed base and compatibility: Innovation, product preannouncements, and predation. American Economic Review, 76, 940–955.
Fitzpatrick, J.L., & Miller-Stevens, K. (2009). A case study of measuring outcomes in an MPA program. Journal of Public Affairs Education, 15(1), 17–31.
Hargreaves, B.R., & McWilliams, T.P. (2010). Polynomial trendline function flaws in Microsoft Excel. Computational Statistics & Data Analysis, 54, 1190–1196.
Infeld, D.L., & Adams, W.C. (2011). MPA and MPP students: Twins, siblings, or distant cousins? Journal of Public Affairs Education, 17(2), 277–303.
Katz, M., & Shapiro, C. (1985). Network externalities, competition, and compatibility. American Economic Review, 75(3), 424–440.
________. (1986). Technology adoption in the presence of network externalities. Journal of Political Economy, 94, 822–841.
Keeling, K.B., & Pavur, R.J. (2007). A comparative study of the reliability of nine statistical software packages. Computational Statistics & Data Analysis, 51(7), 3811–3831.
Knüsel, L. (1998). On the accuracy of statistical distributions in Microsoft Excel 97. Computational Statistics & Data Analysis, 26(3), 375–377.
________. (2002). On the reliability of Microsoft Excel XP for statistical purposes. Computational Statistics & Data Analysis, 39(1), 109–110.
________. (2005). On the accuracy of statistical distributions in Microsoft Excel 2003. Computational Statistics & Data Analysis, 48(3), 445–449.
Koven, S.G., Goetzke, F., & Brennan, M. (2008). Profiling public affairs programs: The view from the top. Administration and Society, 40(7), 691–710.
Kraemer, K.L., Bergin, T., Bretschneider, S., Duncan, G., Foss, T., Gorr, W., Northrup, A., Rubin, B., & Wish, N.B. (1986). Curriculum recommendations for public management education in computing. Public Administration Review, 46(Special Issue), 595–602.
McCullough, B.D., & Heiser, D.A. (2008). On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis, 52(10), 4570–4578.
McCullough, B.D., & Wilson, B. (1999). On the accuracy of statistical procedures in Microsoft Excel 97. Computational Statistics & Data Analysis, 31(1), 27–37.
________. (2002). On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel XP. Computational Statistics & Data Analysis, 40(4), 713–721.
________. (2005). On the accuracy of statistical procedures in Microsoft Excel 2003. Computational Statistics & Data Analysis, 49(4), 1244–1252.
Mitchell, M.N. (2005). Strategically using general purpose statistics packages: A look at Stata, SAS and SPSS. Statistical Consulting Group: UCLA Academic Technology Services (Technical Report Series, Report 1).
Moore, D.S. (1997). New pedagogy and new content: The case of statistics. International Statistical Review, 65(2), 123–165.
Morçöl, G., & Ivanova, N.P. (2010). Methods taught in public policy programs. Journal of Public Affairs Education, 16(2), 255–277.
Nash, J. (2008). Teaching statistics with Excel 2007 and other spreadsheets. Computational Statistics & Data Analysis, 52(10), 4602–4606.
NASPAA (2009). NASPAA Standards: 2009. http://naspaa.org/accreditation/doc/NS2009FinalVote10.16.2009.pdf
National Research Council (1990). Reshaping school mathematics: A philosophy and framework for curriculum. Washington, DC: National Academy Press.
________. (1991). Moving beyond myths: Revitalizing undergraduate mathematics. Washington, DC: National Academy Press.
Newcomer, K.E., & Allen, H.A. (2010). Public service education: Adding value in the public interest. Journal of Public Affairs Education, 16(2), 207–229.
Powell, D.C. (2009). How do we know what they know? Evaluating student-learning outcomes in an MPA program. Journal of Public Affairs Education, 15(3), 269–287.
Proctor, J.L. (2002). SPSS vs. Excel: Computing software, criminal justice students, and statistics. Journal of Criminal Justice Education, 13(2), 433–442.
Prvan, T., Reid, A., & Petocz, P. (2002). Statistical laboratories using Minitab, SPSS, and Excel: A practical comparison. Teaching Statistics, 24(2), 68–75.
Roberts, G.E., & Pavlak, T. (2002). The design and implementation of an integrated values and competency-based MPA core curriculum. Journal of Public Affairs Education, 8(2), 115–129.
Sawitzki, G. (1994). Report on the numerical reliability of data analysis systems. Computational Statistics & Data Analysis, 18(2), 289–301.
Siller, A.B., & Tompkins, L. (2006). The big four: Analyzing complex sample survey data using SAS, SPSS, STATA, and SUDAAN. Paper 172-31 in Proceedings of the Thirty-first Annual SAS Users Group International Conference, San Francisco. SAS Institute Inc. http://www2.sas.com/proceedings/sugi31/172-31.pdf
Su, Y. (2008). It's easy to produce chartjunk using Microsoft Excel 2007 but hard to make good graphs. Computational Statistics & Data Analysis, 52(10), 4594–4601.
Warner, C.B., & Meehan, A.M. (2001). Microsoft Excel as a tool for teaching basic statistics. Teaching of Psychology, 28(4), 295–298.
Wikipedia (2011). Comparison of statistical packages. http://en.wikipedia.org/wiki/Comparison_of_statistical_packages Accessed May 11, 2011.
Williams, D.G. (2002). Seeking the Holy Grail: Assessing outcomes of MPA programs. Journal of Public Affairs Education, 8(1), 45–56.
Yalta, A.T. (2008). The accuracy of statistical distributions in Microsoft Excel 2007. Computational Statistics & Data Analysis, 52(10), 4579–4586.
Table 1: Statistical Software Use in Graduate Courses
Introductory Statistics Course MPA Programs No course No software SPSS only SPSS + Excel Excel only Stata only Stata + Excel SAS only All other¹ Total MPA Programs No course No software SPSS only SPSS + Excel Excel only Stata only Stata + Excel SAS only All other¹ Total 40 1 22 10 6 10 41% 1% 22% 10% 6% 10% 1 1 2 6% 6% 13% 8 1 50% 6% 1 4 4 2 2 6% 24% 24% 12% 12% 4 17 24% 100% Other Masters 6% 5 2 1 4 29% 12% 6% 24% 2 2% 1 7 7% 3 19% 3 98 100% 16 100% 17 Introductory Budget & Finance Course 6% 18% 100% MPP Programs 9 9% 4 25% 10 10% 1 6% 2 2% 4 4% 70 71% 10 63% 1 1% 2 2% 1 6% 98 100% 16 100% Later Budget & Finance Courses MPA Programs No course No software SPSS only SPSS + Excel Excel only Stata + Excel All other¹ Total MPP Programs Other Masters 1 MPA Programs No course No software SPSS only SPSS + Excel Excel only Stata + Excel All other¹ Total MPP Programs 2 2% 3 3% 41 42% 7 44% 27 28% 1 6% 16 16% 3 3% 4 25% 3 3% 1 6% 1 1% 2 2% 3 19% 98 100% 16 100% Later Statistics Courses 48 9 1 5 32 1 2 98 49% 9% 1% 5% 33% 1% 2% 100% MPP Programs Other Masters 2 1 12% 6% 13 76% 1 17 6% 100% Other Masters 6 2 38% 13% 2 2 12% 12% 7 44% 12 71% 1 16 6% 100% 1 17 6% 100%
¹ “All other” includes a few mentions of other software (R, JMP, gretl, and Crystal Ball) and/or other combinations of two or three programs (such as SAS and SPSS, or SPSS and Stata).
Figure 1: Programs with Quantitative Classes, Software, and Labs
Figure 2: Software Used in Quantitative Classes
Figure 3: Software Used in the Introductory Statistics Course
Figure 4: Software Used in Later Statistics Courses
Figure 5: Perceived Software Use by Relevant Employers (Among SAS, SPSS, and Stata Only)
Figure 6: Perceived Trends in Software Popularity during the Past Decade (Among SAS, SPSS, and Stata Only)
Figure 7: Relevant Job Announcements Requesting Software Familiarity (SAS, SPSS, and Stata Only)
Figure 8: Statistical Software References in Google Scholar, 1996–2010

Endnotes

1 During its acquisition by IBM, SPSS was known for a while during 2009 and 2010 as PASW (Predictive Analytics SoftWare) but has since reverted to its SPSS name, which once stood for “Statistical Package for the Social Sciences.”
2 For Excel 2003, Keeling & Pavur (2007) and McCullough & Wilson (2005) found “substantive improvement” in its calculation of standard deviations.
3 Stata/IC is available with a perpetual license (no expiration) for $179 and handles virtually unlimited cases and up to 2,047 variables. For faculty, doctoral students, and perhaps some masters students, that could be an especially attractive value compared with the competitors' regular basic packages.
4 This figure excludes a total of 19 representatives who were from NASPAA affiliates outside the United States, had bounced or invalid email addresses, or had previously opted out of all online surveys conducted via SurveyMonkey.com.
Eligible representatives who were not certain about software issues were asked to invite an “appropriate faculty colleague” to participate in the short survey.
5 Four programs conduct statistical software training in computer workshops that, while not formally linked to a specific course, were added to the “computer lab” total.
6 With direct network effects, the value of an interactive good increases as more people purchase the same good (see Katz & Shapiro 1985). Common examples include telephones and fax machines: if few people have them, they are not nearly as valuable as when many people do.
7 Of the ten most populous states, four (Georgia, North Carolina, Pennsylvania, and Texas) lack keyword-searchable listings for state government jobs or do not appear to run full-text keyword searches. With states running large budget deficits and undergoing major cutbacks, state government hiring in 2010 was not strong.
8 Six of the ten most populous cities had online job listings that either could be searched by keyword or contained so few jobs that each listing could be reviewed individually.
9 See Adams (2010) for more on the advantages and disadvantages of Google Scholar.
10 Google Scholar categorizes most public administration and public policy journals into its subject area called “Business, Administration, Finance, and Economics.” Political science journals are found, as expected, under “Social Sciences, Arts, and Humanities.” Also note that counts of “Excel” citations were inexact because many articles use the word as a verb. A sample of 150 articles using the word “Excel” or “excel” found that 56% referred to the software, so that proportion was used to adjust the estimate of articles mentioning Excel.
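The adjustment described in endnote 10 can be illustrated with a short calculation. The sketch below assumes the adjustment was a simple proportional scaling of the raw keyword hit count by the share of sampled articles that referred to the software; the function name `adjusted_mention_count` and the raw hit count of 1,000 are hypothetical, chosen only for illustration, and are not figures from the study. Note that 56% of a 150-article sample corresponds to 84 articles.

```python
def adjusted_mention_count(raw_hits: int, relevant_in_sample: int, sample_size: int) -> float:
    """Scale a raw keyword hit count by the share of sampled hits that are relevant.

    Assumption: a simple proportional adjustment, as endnote 10 describes in prose.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= relevant_in_sample <= sample_size:
        raise ValueError("relevant_in_sample must lie between 0 and sample_size")
    # Multiply before dividing so integer inputs stay exact as long as possible.
    return raw_hits * relevant_in_sample / sample_size


# 56% of the 150 sampled articles equals 84 articles that referred to the software.
SAMPLE_SIZE = 150
SOFTWARE_MENTIONS = 84
print(SOFTWARE_MENTIONS / SAMPLE_SIZE)  # -> 0.56

# Hypothetical raw Google Scholar hit count for "Excel" (illustration only).
raw_excel_hits = 1000
print(adjusted_mention_count(raw_excel_hits, SOFTWARE_MENTIONS, SAMPLE_SIZE))  # -> 560.0
```

Under this reading, every yearly "Excel" count would be multiplied by the same 0.56 factor, which assumes the share of verb usages stayed roughly constant across years and subject areas.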