Adams-Infeld-Wulff APPAM Statistical Software for Students

Statistical Software for Students:
Academic Practices and
Employer Expectations
William C. Adams, Donna Lind Infeld, & Carli M. Wulff
Trachtenberg School ● George Washington University
Adams, Infeld, & Wulff — Page 1 of 31
Over the past several decades, students of public administration, public policy, and
public affairs have regularly been taught statistics using various computer packages,
such as SAS, SPSS, or Stata. However, there has been little published exploration or
discussion of which statistical software packages are of greatest benefit to these students.
To explore this topic, we conducted a multi-method study addressing the following
research questions:
1. Is there any available evidence that indicates a particular statistical software
program is superior?
2. Which statistical software programs are most widely integrated into MPA, MPP and
related masters programs?
3. What statistical software skills, if any, do relevant employers specify in job
announcements? And for those continuing in academe, are there trends in software
use?
Answering these questions required a canvass of our peers and our students’ prospective
employers as well as research on the merits of the competing software.
Prior studies have confirmed that most MPA, MPP, and related masters programs do
require an introductory course in statistics and in budget and finance, and often offer more
advanced courses as well (Infeld & Adams 2011; Koven, Goetzke, & Brennan 2008; Morçöl
& Ivanova 2010; NASPAA 2009). But, in the data-rich, computer-based world of the 21st
century, what is being taught in those quantitative courses regarding specific statistical
software? No prior systematic report of either academic practices or relevant employer
needs could be found. However, before exploring this uncharted territory, we searched
for evaluations of the comparative merits of prominent software options.
Software Merits
Despite the current enormous emphasis on outcome evaluations in education in general,
and public affairs education in particular (Newcomer & Allen, 2010; Aristigueta & Gomes
2006; Castleberry 2006; Durant 2002; Fitzpatrick & Miller-Stevens 2009; Powell 2009;
Roberts & Pavlak 2002; Williams 2002), as reviewed below, remarkably few published
articles assess the comparative advantages of competing software for students and only
one outcome experiment was identified. This literature summary highlights Excel, SAS,
SPSS, and Stata because those findings are especially relevant to the subsequent analysis.
Features. The sole recent comparison of functionality that could be found lists the features
of many statistical programs (excluding Excel) regarding regression analysis, time series
analysis, ANOVA, and selected other statistics (Wikipedia, 2011). In these areas, leading
software such as SAS, SPSS,1 and Stata all run the most common and many arcane statistical
procedures, although a few exotic options are missing (e.g., SPSS lacks quantile regression,
autoregressive conditional heteroscedasticity analysis, and generalized autoregressive
conditional heteroscedasticity analysis).
Accuracy. Most users take for granted that the computations used by statistical software
yield precise and identical results. Unfortunately, that is not necessarily the case. Because
software performance is so often measured in terms of speed, programming shortcuts that
gain speed can sacrifice exactitude.
Altman & McDonald (2001) obtained mixed results when comparing Excel, SAS, SPSS, Stata,
and several other software packages. The good news was that statistical software packages
“typically” (though not always) provide “correct answers, to at least the fourth significant
digit, for univariate statistics, regression problems, low-difficulty analysis of variance
problems, and low difficulty nonlinear regression programs” (p. 684). The bad news was
that many programs were unreliable for nonlinear regression and were “unable to return
accurate ANOVA results for problems of even average difficulty” (p. 684). Excel 1997
could not even properly calculate standard deviations for data involving eight digits or
more.2
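A common explanation for failures of this kind is the one-pass "calculator" formula for the variance, which subtracts two large, nearly equal sums. The Python sketch below is illustrative only: whether any version of Excel used exactly this formula is an assumption made here, not a finding of the studies cited, and the function names `naive_sd` and `two_pass_sd` are hypothetical. Ten-digit values are used because, in the double-precision arithmetic that Python uses, that is roughly where this toy dataset makes the one-pass formula break down.

```python
import math

def naive_sd(xs):
    """One-pass formula: sqrt((sum(x^2) - (sum x)^2 / n) / (n - 1)).

    Numerically fragile: the two terms being subtracted are huge and
    nearly equal, so rounding error can swamp the true difference.
    """
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    # max(..., 0.0) guards against a small negative result from rounding.
    return math.sqrt(max(total_sq - total * total / n, 0.0) / (n - 1))

def two_pass_sd(xs):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

# Toy data (an assumption for illustration): large values with a tiny spread.
# The true sample standard deviation is exactly 1.
data = [1_000_000_001.0, 1_000_000_002.0, 1_000_000_003.0]

print("one-pass:", naive_sd(data))     # far from the true value of 1
print("two-pass:", two_pass_sd(data))  # matches the true value
```

The two-pass version avoids the problem because it subtracts the mean before squaring, so the quantities being summed stay small.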
Later, using large, complex datasets, researchers at the National Center for Health Statistics
performed various procedures using SAS, SPSS, Stata, and a less-widely used program,
SUDAAN. They obtained “identical results” and concluded that, given this equally high level
of precision, software choices should be driven by other factors such as cost, ease of use,
and data management capabilities (Siller & Tompkins 2006). However, a more extensive
analysis (Keeling & Pavur 2007) of nine software packages (including Excel 2003, SAS 9.1,
SPSS 12.0, and Stata 8.1) did detect various shortcomings (especially with nonlinear
regression and autocorrelation calculations) but found notable improvements from earlier
versions in almost every area.
Since 2007, Excel’s computations have continued to draw criticism; however, no studies
were found that gauged the accuracy of later versions of other statistical software. Indeed,
Excel has a history of nontrivial computational errors (Sawitzki 1994; McCullough &
Wilson 1999, 2002, 2005; Knüsel 1998, 2002, 2005; Altman, Gill, & McDonald 2004). A
more recent examination of Excel 2007 reached disturbing conclusions (McCullough &
Heiser 2008, p. 4570):
Excel 2007, like its predecessors, fails a standard set of intermediate-level
accuracy tests.... [I]t is not safe to assume that Microsoft Excel’s statistical
procedures give the correct answer. Persons who wish to conduct statistical
analyses should use some other package.
Among the many errors that McCullough & Heiser identified are a flawed linear regression
algorithm and output, erroneous nonlinear regression results, a nonrandom random
number generator, and inaccurate t-tests (incorrect results especially when missing data
are involved; wrong p-values, and even mistaken labels). Also regarding Excel 2007, Yalta
(2008) found that “the accuracy of various statistical functions range from unacceptably
bad to acceptable but significantly inferior” to alternative implementations. Others
identified serious flaws in Excel’s polynomial trend line equations (Hargreaves &
McWilliams 2010) as well as other areas (Almiron et al. 2010). Some statisticians argue
against using Excel (or any other spreadsheet) as a vehicle for teaching statistics (Nash
2008), especially given what they view as misleading and confusing default charts (Su
2008). No scholarly studies defending the precision of Excel could be found.
To date, there seem to be no published critiques of Excel 2010, but McCullough & Heiser
were pessimistic about future prospects (2008, p. 4570):
Microsoft occasionally fixes errors, more often ignores them, and sometimes
fixes them incorrectly. Consequently, every time there is a new version of
Excel, the [accuracy] tests must be repeated.
McCullough (personal communication, August 21, 2011) stated that Microsoft still does not
document its “unsupported claims of accuracy” for Excel with any sort of actual “test code
with known inputs and outputs.”
Usability. No systematic studies of the relative user-friendliness, accessibility or learning
curve could be found in the published literature. SPSS and Stata offer both a command line
interface and a menu-driven, graphical user interface, while SAS is only command based
and Excel does not have a command line option. All four run on Windows and Linux; Excel,
SPSS, and Stata also offer Mac versions, although SAS does not. Two personal evaluations
argued that Stata was superior to SAS and SPSS, but both authors had written books about
Stata (Acock 2005; Mitchell 2005).
Costs. Expense matters and certainly gives Excel a structural advantage, often pre-installed
on computers or seemingly complimentary as part of the Microsoft Office suite. Also at no
extra cost, as of 2011, “SAS OnDemand for Academics” allows online, cloud-based access to
a wide range of SAS applications and is free of charge to college instructors and their
students. The price of the latest student version of SPSS fell sharply in 2011 and was
available on Amazon.com for less than $28, with its limitations of a 13-month license, 1500
cases, and 50 variables. Stata, which not long ago was the least expensive, is now the most
expensive for students with the “Small Stata” product costing $49 and limited to one year,
1200 cases, and 99 variables.3 Site licenses for many academic lab computers quickly run
into thousands of dollars, often entail complex fee formulas and volume discounts, and vary
further depending on the feature sets to be included. While those exact site license
calculations are beyond the scope of this study, they cannot be dismissed as irrelevant.
Pedagogical value. By the 1990s, the dominant pedagogical model for statistics had
decisively moved from a passive lecture approach to more active student engagement
using a variety of activities, such as creative problem solving, practical applications,
discussions, small groups, original research, plus more interpretation and analysis beyond
rote calculations (Moore 1997; National Research Council, 1990, 1991). Not that
expository texts and classroom lecture have been jettisoned entirely, but they came to be
seen as insufficient alone to successfully engage active learning. Consequently, involving
students in active data exploration via software became an especially valuable tool for the
new pedagogy.
Which software is most suitable for this task? As Moore observed (1997, p. 131), “Software
designed for doing statistics is not necessarily well structured for learning statistics.” But
there is little published research outside of a few scattered personal reflections and
anecdotes about classroom advantages of using any particular software. Certainly arguments can reach the level of religious differences given some people’s attachment to and
investment in their preferred statistics software package.
The only randomized, controlled test of software was conducted among 24 undergraduates
(mostly criminal justice majors) in an introductory statistics course at Indiana University
(Proctor 2002). The dozen who used Excel scored higher than the dozen who used SPSS in
terms of computational knowledge and slightly higher in conceptual knowledge. One might
have expected that more systematic, comparative outcome evaluations would have been
conducted by now – given the cost of software licenses, the time spent in computer
laboratories, and the large investment in instructional communication – in order to
optimize the software selection for quantitative courses.
Issues of accuracy notwithstanding, there are hints in the literature that Excel is making
inroads beyond its traditional domain of budget, finance, and business. Articles are
appearing about ways to use Excel, usually incorporating its Analysis ToolPak add-in, to
teach applied statistics in fields as varied as psychology (Warner & Meehan 2001), nursing
(DiMaria-Ghalili & Ostrow 2009), and engineering (Prvan, Reid, & Petocz 2002) as well as
business (Bell 2000). Likewise, some statistics textbooks have begun to focus on Excel (e.g.,
Dretzke, 2011; Carlberg 2011).
All in all, prior research does not offer much guidance regarding the best software to
employ in our quantitative classes. It does raise questions about Excel’s precision, but
otherwise we seem to be left in the dark with only our personal anecdotes and experiences.
Yet academic programs must still make decisions about what software packages to require
of their students. We therefore sought to discover what programs across the country
currently require for masters students.
Program Practices
A nationwide online poll of 260 eligible4 NASPAA representatives was conducted May-July,
2011. Completed surveys covered a total of 131 accredited and nonaccredited masters’
programs. Responses encompassed over half (n=98; 52%) of the Master of Public
Administration (MPA) programs, over half (n=16; 53%) of the Master of Public Policy
(MPP) programs, most (n=5; 71%) of the Master of Public Affairs programs, and twelve of
the many dozens of varied other public sector-related masters degrees. Responses from
MPP programs also represent over one fourth of those that have institutional membership
in the Association for Public Policy Analysis and Management (APPAM).
Insert Table 1 and Figure 1 here.
Of the 131 programs, only two do not offer an introductory statistics course. (See Figure 1
and Table 1.) Only four programs teach statistics without employing a software program.
Before dismissing these exceptions, it should be noted that comments volunteered by a few
who do use software in MPA programs show skepticism or ambivalence.
“We’re having a lively debate about whether any statistics package should be
used. Do MPAs really need to be able to run regressions and such? If we train
public leaders, is this a quality they need? … [Some alumni say] these software programs got them their first job. Others say it was a waste of 3 credit
hours.”
“The administrator is far better having a general knowledge… and ensuring
the expertise is in place than knowing one form of software well unless he or
she plans to specialize in some way. If an administrator has time to sit and
analyze data, he or she is probably not doing the job especially well.”
“Most of our students don't use stats on the job. Some use Excel on the job
but don't need training from us. They get their real training on the job.”
Despite reservations from a few representatives, the overwhelming majority of these
degree programs utilize statistical software. In the introductory statistics course, 97% of
the MPA programs employ software (excluding the two programs without such a course),
as do 100% of the MPP programs and 94% of the other masters programs. Among those
programs using software in this course, most offer a companion computer lab (74% of MPA
programs, 88% of the MPP programs, and 81% of other masters programs).5 Follow-up
statistics courses are offered by a majority of MPA programs (59%) and a very large
majority of MPP and other masters programs (94%); such courses almost always use
statistical software and often include a computer lab. (See Figure 1 and Table 1.)
An introductory budget and finance course is offered by large majorities (92% of MPA
programs, 75% MPPs, 88% all others) and typically employs statistical software (89% of
those MPA programs offering a course; 92% MPPs; 82% others) although, in this case, most
do not have an accompanying computer lab (only 30% of MPA programs using software in
this course add a lab; 45% MPPs; 14% others). Subsequent budget and finance courses are
less common, but, if offered, usually use statistical software without accompanying
computer labs. (See Figure 1 and Table 1.)
Thus, many students in these MPA, MPP, and other masters programs are likely to have
worked with statistical software in at least two courses and perhaps several, depending on
their degree program, fields, and elective choices. Courses in program evaluation, policy
analysis, and certainly capstone projects sometimes entail the use of statistical software as
well. What software are these programs featuring?
For budget and finance courses, the answer is easy: Excel. Having long ago vanquished its
foes like Lotus 1-2-3, Quattro Pro, and PlanPerfect, Excel dominates the academic spreadsheet world without any major rival. (See Figure 2 and Table 1.) Only a handful of these
masters programs do not use Excel alone in their budgeting and finance course(s); most of
these exceptions added SPSS, Stata, or some other software to Excel; two outliers use SPSS
alone.
Insert Figure 2 here.
For introductory statistics courses, software choices are less uniform. SPSS is most widely
taught, but it lacks the near monopoly that Excel has in budget and finance courses. In MPA
programs, a large majority (70%) use SPSS as do a majority in MPP (63%) and other
masters programs (59%). (See Table 1 and Figures 2 and 3.) A plurality in MPA programs
(42%) and MPP programs (44%) use SPSS alone, but it is also used in conjunction with
Excel, especially in MPA programs (28%). In MPP programs, Stata supplants Excel as a
main rival to SPSS. Other software packages (R, JMP, gretl, and Crystal Ball) were rarely mentioned.
Insert Figure 3 and Figure 4 here and ideally on the same page for easy comparison.
For subsequent statistics courses, Stata emerges as a stronger contender, markedly
surpassing SPSS among MPP programs and showing a nontrivial presence in MPA and
other masters programs. (See Table 1 and Figures 2 and 4.) SPSS continues to be the
software of choice among MPA programs, used in over two-thirds of the later statistics
classes, sometimes along with Excel or other software.
Excluding Excel, respondents were asked to rank the demand for SAS, SPSS, and Stata
among employers: “How would you estimate the current usage of these three software
packages at the jobs your Masters students seek?” As shown in Figure 5, three out of four
academics believe that SPSS is most widely used by relevant employers, with opinions split
as to whether SAS or Stata is the runner-up.
Insert Figure 5 and Figure 6 here.
In terms of trends, however, the position of SPSS is not so secure. Respondents were asked
if they had “noticed any trends in the popularity of these software programs over the past
decade.” As shown in Figure 6, Stata is the clear winner in perceived momentum with 52%
saying Stata has gained and only 6% saying it has lost popularity. That net positive 46%
compares with a net negative 15% for SAS and a net negative 11% for SPSS. Of course this
question asked about relative trends, not absolute standing where SPSS still comes out
ahead in both university practices and projected employer needs. Yet, the widespread
impressions of Stata’s strides — indeed three out of ten said it had gained “a lot” — suggest
that SPSS (and programs that teach it) ought not to rest on its laurels. In this vein, one
respondent commented:
“We relied on SPSS as our stat package for years, particularly in the intro
course. However costs and difficulty in licensing are making us consider
alternatives. Since some of us use Stata in our own work, and it is higher
education friendly, I think we will be moving more and more in that
direction.”
In both questions about perceived usage and trends of these three statistical software
packages, our assumption was that, as a spreadsheet, Excel is extremely important but falls
in a somewhat different category. However, some respondents volunteered that this trio
was inconsequential because nothing really matters but Excel.
“Most of our students are using Excel or similar, job specific applications, not
these stats software.”
“I cannot answer [those] questions because I believe that our employers use
none of the three statistics packages mentioned.”
“In polling our students and graduates only Excel is used in the work world.”
Overall, these 131 Masters programs overwhelmingly employ Excel in their budget and
finance courses. For statistics courses in MPA programs, SPSS dominates, but sometimes
shares the spotlight with Excel. Among MPP programs, SPSS fights off Stata in the first
round statistics course, but in subsequent courses Stata wins, with Excel on the sidelines.
Other masters programs were in-between, with SPSS on top but with both Stata and Excel
often used as well. SAS, R, and other programs are rarely used.
Job Listings
Even if all other things (such as cost and ease of instruction) are not equal, perhaps extra
consideration should still be given to statistical software that will be more valuable after
graduation. Why focus on software that is rarely used outside one’s halls of ivy?
The literature on diffusion and network effects6 repeatedly indicates that — when it comes
to adopting software and hardware — people place an enormous premium on connectivity
and minimizing transaction costs (Brock 1975; Katz & Shapiro 1985; Economides 1996).
Working collaboratively, exchanging files, and discussing issues are all enormously
facilitated when everyone is using a common language, network, and system. Once interconnectivity is established with a shared framework, a new improved framework must be
dramatically superior, not just modestly better, to dislodge the advantages of network
connectivity and overcome switching costs (Farrell & Saloner 1986; Katz & Shapiro 1986).
Network pressures may push products to consolidate on a single standard (such as VHS
over Betamax, Blu-ray over HD-DVD, cassette tapes over eight-track tapes), but sometimes
after winnowing out most competitors a couple of entrenched networks may endure (e.g.,
PCs and Macs).
Surely having familiarity with a specific software package (sufficient to put on one’s
resume) is an asset when applying for a job with an employer who uses that same software.
Already knowing Stata and speaking Spanish might still be of some value even if the
organization uses SPSS and often works in Brazil, but in this example SPSS and Portuguese
would be even better. While on-the-job training may be easier for software than for a
foreign language, job applicants who are already conversant must have at least some
advantage, especially when the employer advertises that such a specific skill is preferred.
Which statistics software do relevant employers most often mention? We reported above
the impressions of academics regarding software used by employers, but what software
skills are explicitly sought by employers? Job announcements were collected from the
following online sites (April 20-30, 2011):

• USAJobs.gov (listing federal government jobs)
• Idealist.org (listing jobs with nonprofit organizations)
• State governments (ten largest states)7
• City governments (ten largest cities)8
• Major consulting firms (Booz Allen, Deloitte, ICF, LMI, and PricewaterhouseCoopers)
• Major policy research organizations (Brookings, Mathematica, RAND, Urban Institute)
While hiring had not rebounded to pre-recession levels, searches for full-time jobs using
the keywords SAS, SPSS, Stata, and Excel (always checking to confirm “Excel” was not a
verb) yielded a total of 409 job listings. A close inspection of listings found that few other
statistical packages were requested so the summary here is confined to these four. Eleven
job listings were eliminated as specialized outside an area appropriate for typical MPA and
MPP students (e.g., actuary, statistician, information technology programmer). Of the
remaining 398 openings, only a dozen specified “proficiency” or “mastery” of at least one of
these packages. Most descriptions used vague but less demanding words such as
“familiarity with,” “skill with,” “knowledge of,” or often simply “experience with.”
Insert Figure 7 here.
Because the precise distribution of cited software is likely to shift somewhat from week to
week, the value of this snapshot is the big picture that it provides, and that picture is quite
clear:
1. Relevant employer requests for experience with Excel (n=290) surpassed those for the
other three statistical software packages combined (n=108 positions for SAS, SPSS, or Stata).
2. In the race for the second-place ranking behind Excel, no single software package
dominates. (See Figure 7.)
3. SPSS and SAS do appear to outrank Stata (especially with half of the “Stata only”
listings due to its popularity at just one organization, the Urban Institute).
4. Familiarity with SAS alone fits two-thirds of the listings citing one of these packages
(68%); SPSS alone is sufficient for almost two-thirds (64%); and Stata alone satisfies
about four out of ten (38%).
5. Familiarity and experience with any two of these three packages would qualify an
applicant for at least eight out of ten of these openings where familiarity with a
“nonspreadsheet” package is sought.
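As an illustration only, the kind of tally behind points 4 and 5 can be sketched in Python. The listings below are hypothetical toy data, not the 108 job announcements examined in this study, and the function name `coverage` is an assumption introduced here. The sketch also assumes that an applicant "fits" a listing when he or she knows at least one of the packages the listing names.

```python
from itertools import combinations

# Hypothetical toy data (NOT the study's listings): each set holds the
# packages named in one job announcement.
listings = [
    {"SAS"},
    {"SPSS"},
    {"Stata"},
    {"SAS", "SPSS"},
    {"SAS", "SPSS", "Stata"},
]

def coverage(known, ads):
    """Share of ads naming at least one package the applicant knows."""
    known = set(known)
    matches = sum(1 for ad in ads if ad & known)
    return matches / len(ads)

packages = ["SAS", "SPSS", "Stata"]

# Coverage for an applicant who knows exactly one package.
single = {p: coverage({p}, listings) for p in packages}

# Coverage for an applicant who knows any two of the three packages.
pairs = {pair: coverage(pair, listings) for pair in combinations(packages, 2)}

for name, share in single.items():
    print(f"{name} alone: {share:.0%}")
for pair, share in pairs.items():
    print(f"{' + '.join(pair)}: {share:.0%}")
```

Because many listings name more than one package, the single-package shares can sum to well over 100%, which is why the three percentages reported in point 4 do so.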
Subsequent searches (conducted May 10, 2011) of other databases repeated these same
basic findings: Again, Excel excelled. Among the others, none was dominant, but SAS and
SPSS were consistently ahead of Stata:

• NACElink.com (under “government”): Excel 105, SAS 9, SPSS 7, Stata 3.
• PolicyJobs.net (with Anglosphere international listings): Excel 133, SAS 49, SPSS 32,
Stata 26.
• The authors’ school’s “Career Central” web site listing jobs from (mostly local area)
nonprofits, think tanks, consulting firms, and government: Excel 205, SPSS 20, SAS
16, and Stata 7.
Data from these additional sites were not incorporated into the findings shown in Figure 7
because of some duplicate listings and a slightly different time period, but the results
reflected the same general pattern. The authors’ school’s “Career Central” web site was
also helpful for another purpose. Because all of the listings on this site were ideal for MPA
and MPP graduates, it allowed us to compute a ratio (1:3) of listings requesting statistical
software familiarity (n=248) to listings with no such mentions (n≅733). Thus, about three
fourths of the positions did not appear to require any software background or took it for
granted.
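The arithmetic behind the 1:3 ratio and the "three fourths" figure can be checked with a few lines of Python. The two counts come from the text above (the second is approximate in the source); the variable names are assumptions introduced for illustration.

```python
# Counts reported in the text for the "Career Central" listings.
with_software = 248     # listings requesting statistical software familiarity
without_software = 733  # listings with no such mention (approximate in the source)

total = with_software + without_software

share_with = with_software / total        # fraction mentioning software
share_without = without_software / total  # fraction with no mention
ratio = without_software / with_software  # "1 : ratio" form

print(f"with software:    {share_with:.1%}")
print(f"without software: {share_without:.1%}")
print(f"ratio:            1:{ratio:.1f}")
```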
However, most posts that did cite a software preference seem to have been serious about it.
In theory, these job announcements might mention specific statistical software as examples
not as an exclusive list, and thus a candidate who was familiar with a rival package would
still be on equal footing. If so, employers could easily avoid needlessly narrowing their
recruitment by simply inserting “such as,” “e.g.,” or “etc.” before naming merely illustrative
software. In fact, only 22% of those who mentioned SAS, SPSS, or Stata suggested any
openness to alternatives, while a large majority (78%) did not indicate any flexibility.
Thus, it seems fair to conclude that, in the absence of any hint otherwise, an advertisement
recruiting someone with familiarity with SPSS, for example, is indeed intended for hiring
someone with familiarity with SPSS. Fewer than 1% of the job ads requesting Excel
familiarity could be construed as open to alternative software.
In addition to gauging the software preferences of relevant employers, we also examined
statistical software used in academic social science research. After all, some masters
graduates go on to earn doctorates and pursue academic careers. Using Google Scholar’s
extensive collection of books and journal articles, full text searches were run for publications from 1996 through 2010.9 While journal articles no longer routinely mention which
software was used for calculations, many still do.
Insert Figure 8 here.
Across the three five-year periods, while SPSS made some gains, the most striking trend is
the surge in citations of Stata in both social science journals and in administration and
policy journals.10 (See Figure 8.) This provides empirical validation for the notion among
many academics, discussed previously and summarized in Figure 6, that the use of Stata
has grown markedly over the past decade. Over time far more sources have been digitized,
but the relative proportion of mentions of Stata has grown from single digits to 27% among
these four statistical software packages. Stata’s gains have come at the expense of Excel
and SAS. SPSS has held steady in the social science area and gained in the administration,
policy, and economics area to garner three out of ten mentions and hold the plurality
position in the most recent period studied.
Summary
Only one small randomized outcome study comparing the effectiveness of statistical
software could be found and the limited literature on the comparative merits of leading
software is mostly impressionistic. An exception is a series of studies of computational
accuracy indicating that considerable progress has been made over the past decade but still
raises concerns, especially when high levels of precision are required with large datasets,
when using some more advanced statistics such as nonlinear regression, and about several
aspects of Excel. Overall, for purely pedagogical purposes in graduate courses, no objective
basis was found promoting any particular one of the leading statistical software packages.
Our large nationwide survey of MPA, MPP, and related masters programs found that Excel
has a near monopoly in budget and finance courses. For statistics courses in MPA
programs, SPSS dominates, although Excel is also often used. Among MPP programs, SPSS
is the most widely used in the initial statistics course, but Stata is used more often in later
courses. Other masters programs were in-between, with SPSS again on top but with both
Stata and Excel also popular. SAS, R, and other programs are rarely used in these masters’
courses. Relevant employers of these students request familiarity with Excel far more than
with any other quantitative software. Among nonspreadsheet options, SAS and SPSS skills
were most widely sought.
For the pure purpose of teaching statistical concepts, career market considerations are
irrelevant of course; the choice would be driven instead by a benefit-cost estimate of the
comparative effectiveness of various software packages in engaging students to explore
and understand data analysis. Ideally, we might strive to do both: strengthening students’
statistical literacy while simultaneously providing statistical software experience that will
be of value in applying for and on the job. Our findings about current academic practices
and employer preferences can provide input for decision-making about future program
directions. The findings summarized below are framed as responses to various views
about statistical software skills.
“No statistical software is needed.” To be sure, a majority of relevant employers did not
advertise that they required any statistical software familiarity. Perhaps they assume that
good candidates these days would at least be acquainted with spreadsheets, but perhaps
they do not care. The “no statistical software” view requires a gamble that our graduates
can find good jobs without being eligible for the roughly one fourth of the positions that call
for specific software skills, that statistical software familiarity would be a trivial asset on their
resumes, or that they can acquire sufficient statistical literacy without active learning
through engagement with software.
“Statistical software choices are interchangeable.” In fact, specific software skills do matter.
Only about one fifth of the job advertisements mentioning statistical software indicated
that substitutions for the preferred software would be acceptable. In today’s “buyer’s
market,” employers can easily screen for particular skills and avoid the costs of remedial
on-the-job training.
“Excel training should surpass all else.” This argument is actually not extreme. Employer
requests for Excel experience far surpassed those for all nonspreadsheet statistical software
combined. While Excel is used in nearly all introductory budget and finance courses (often
with no computer lab) and in many statistics courses, perhaps more graduate programs
should add computer labs to further strengthen Excel skills. Despite its shortcomings in
some advanced calculations, Excel’s accuracy does seem to be improving.
“SPSS is still the best nonspreadsheet statistics package.” A good case can be made for SPSS.
SPSS and SAS are the nonspreadsheet packages most requested by employers, and, unlike
SAS, SPSS has an available menu format that may help account for its status as the leading
statistical software being taught in introductory statistics courses. Moreover, SPSS has
gained or held its own in academic research citations, while SAS and Excel have declined.
“SAS should rule.” SAS supporters will assert that, despite its surprisingly rare use in MPA
and MPP graduate programs, it is functionally powerful enough that it is worth any extra
effort. Moreover, our study did find that relevant employers tended to seek familiarity with
SAS slightly more than SPSS and clearly more than Stata.
“Stata is the wave of the future.” Advocates of Stata can point to its impressive trend line,
more than tripling its academic citations in one decade. It is also noteworthy that after the
introductory statistics courses, a majority of the MPP programs surveyed, along with many
MPA and related masters programs, turn to Stata in the intermediate/advanced statistics
class. Yet, while Stata’s advances suggest an increasingly important future, it may be
premature to jettison the older packages that are still decidedly more popular among
relevant employers.
“Give MPAs familiarity with Excel and SPSS and give MPPs familiarity with Excel, SPSS, and
Stata.” This mixed-methods approach, found to be the dominant practice nationwide, has a
strong rationale and should open up the widest range of job opportunities to graduates.
Few employers request mastery or proficiency, so experience with Excel plus exposure to
at least one menu-driven statistical software package like SPSS appears to be the optimal
configuration for MPAs. The additional statistics course(s) that MPPs typically take offer an
opportunity to add Stata or SAS to their repertoire. That larger analytical tool chest should be
of special value to MPP graduates aiming for research-oriented careers.
For curriculum decision-making purposes, this research offers some provocative and useful
findings, although not a blueprint. The software used to teach statistics must be
determined by the careful assessments of individual programs. Given the considerable
time, expense, and energy devoted to teaching quantitative courses and the potential
impact on job opportunities of graduates, these decisions are consequential ones.
Excel is a registered trademark of Microsoft Corporation, Redmond, Washington.
SAS is a registered trademark of the SAS Institute Inc., Cary, North Carolina.
SPSS is a registered trademark of IBM Corporation, Armonk, New York.
Stata is a registered trademark of StataCorp LP, College Station, Texas.
Authors:
William C. Adams, professor of Public Policy and Public Administration at The Trachtenberg School at The George Washington University, teaches research methods and applied
statistics courses to both MPA and MPP students. His most recent book is Election Night
News and Voter Turnout: Solving the Projection Puzzle. Contact: adams@gwu.edu
Donna Lind Infeld is professor at the Trachtenberg School of Public Policy and Public
Administration at The George Washington University where she is director of the doctoral
program. She teaches graduate courses in policy analysis and research methods. Dr. Infeld’s
research is often in the area of aging and long-term care. Contact: dlind@gwu.edu
Carli M. Wulff is an MPA candidate specializing in social policy at The Trachtenberg School
at The George Washington University and formerly served as a Peace Corps volunteer in
Kyrgyzstan, 2006-2008.
The authors wish to thank Stephanie Celinni, Dylan Conger, and Laura Minnichelli for their
comments and suggestions. An earlier version of this paper was presented at the
Association for Public Policy Analysis & Management’s Teaching Workshop, November 2, 2011,
Washington, DC.
References
Acock, A. C. (2005). SAS, Stata, SPSS: A comparison. Journal of Marriage and Family, 67(4),
1093-1095.
Adams, W.C. (2010). Using the Internet. In J. Wholey, H. Hatry, and K. Newcomer, eds.,
Handbook of Practical Program Evaluation, 3e. San Francisco: Jossey-Bass.
Altman, M., Gill, J., & McDonald, M. (2004). Numerical issues in statistical computing for the
social scientist. Hoboken, New Jersey: John Wiley & Sons.
Altman, M., & McDonald, M. (2001). Choosing reliable statistical software. PS: Political
Science and Politics, 34(3), 681-687.
Almiron, M.G., Lopes, B., Oliveira, A.L.C., Medeiros, A.C., & Frery, A.C. (2010). On the
numerical accuracy of spreadsheets. Journal of Statistical Software, 34(4), 1-29.
Aristigueta, M., & Gomes, K.M.B. (2006). Assessing performance in NASPAA graduate
programs. Journal of Public Affairs Education, 12(1), 1–18.
Bell, P.C. (2000). Teaching business statistics with Microsoft Excel. INFORMS: Transactions
on Education, 1(1), 18-26.
Brock, G. (1975). Competition, standards, and self-regulation in the computer industry. In
R. E. Caves and M. J. Roberts, eds., Regulating the Product: Quality and Variety,
Cambridge: Ballinger.
Castleberry, T. E. (2006). Student learning outcome assessment within the Texas State
University MPA program. Applied Research Projects. Paper 182.
http://ecommons.txstate.edu/arp/182
Carlberg, C.G. (2011). Statistical Analysis: Microsoft Excel 2010. Indianapolis, IN: Que.
DiMaria-Ghalili, R.A., & Ostrow, C.L. (2009). Using Microsoft Excel to teach statistics in a
graduate advanced practice nursing program. Journal of Nursing Education 48(2),
106-110.
Dretzke, B.J. (2011). Statistics with Microsoft Excel. Upper Saddle River, NJ: Pearson/
Prentice Hall.
Durant, Robert F. (2002). Toward becoming a learning organization: Outcomes assessment,
NASPAA accreditation, and mission-based capstone courses. Journal of Public Affairs
Education, 8(3), 193-208.
Economides, Nicholas. (1996). The economics of networks. International Journal of
Industrial Organization, 14(2), 673-699.
Farrell, J., & Saloner, G. (1986). Installed base and compatibility: Innovation, product
preannouncements, and predation. American Economic Review, 76, 940-55.
Fitzpatrick, J.L., & Miller-Stevens, K. (2009). A case study of measuring outcomes in an MPA
program. Journal of Public Affairs Education, 15(1), 17-31.
Hargreaves, B.R., & McWilliams, T.P. (2010). Polynomial trendline function flaws in
Microsoft Excel. Computational Statistics & Data Analysis, 54, 1190-1196.
Infeld, D.L., & Adams, W.C. (2011). MPA and MPP students: Twins, siblings, or distant
cousins. Journal of Public Affairs Education, 17(2), 277–303.
Katz, M., & Shapiro, C. (1985). Network externalities, competition, and compatibility.
American Economic Review, 75(3), 424-440.
_______. (1986). Technology adoption in the presence of network externalities. Journal of
Political Economy, 94, 822-41.
Keeling, K.B., & Pavur, R.J. (2007). A comparative study of the reliability of nine statistical
software packages. Computational Statistics & Data Analysis, 51(7), 3811-3831.
Knüsel, L. (1998). On the accuracy of statistical distributions in Microsoft Excel 97.
Computational Statistics & Data Analysis, 26(3), 375-377.
________. (2002). On the reliability of Microsoft Excel XP for statistical purposes.
Computational Statistics & Data Analysis, 39(1), 109-110.
________. (2005). On the accuracy of statistical distributions in Microsoft Excel 2003.
Computational Statistics & Data Analysis, 48(3), 445-449.
Koven, S. G., Goetzke, F., & Brennan, M. (2008). Profiling public affairs programs: The view
from the top. Administration and Society, 40(7), 691–710.
Kraemer, K.L.; Bergin, T.; Bretschneider, S.; Duncan, G.; Foss, T.; Gorr, W.; Northrup, A.;
Rubin, B.; & Wish, N.B. (1986). Curriculum recommendations for public
management education in computing. Public Administration Review, 46 (Special
Issue), 595-602.
McCullough, B.D., & Wilson, B. (1999). On the accuracy of statistical procedures in Microsoft
Excel 97. Computational Statistics & Data Analysis, 31(1), 27-37.
________. (2002). On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel
XP. Computational Statistics & Data Analysis, 40(4), 713-721.
________. (2005). On the accuracy of statistical procedures in Microsoft Excel 2003.
Computational Statistics & Data Analysis, 49(4), 1244-1252.
McCullough, B.D., & Heiser, D.A. (2008). On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis, 52(10), 4570–4578.
Mitchell, M. N. (2005). Strategically using general purpose statistics packages: A look at
Stata, SAS and SPSS. Statistical Consulting Group: UCLA Academic Technology
Services (Technical Report Series, Report 1).
Moore, D.S. (1997). New pedagogy and new content: The case of statistics. International
Statistical Review, 65(2), 123-165.
Morçöl, G., & Ivanova, N. P. (2010). Methods taught in public policy programs. Journal of
Public Affairs Education, 16(2), 255–277.
Nash, J. (2008). Teaching statistics with Excel 2007 and other spreadsheets. Computational
Statistics & Data Analysis, 52(10), 4602-4606.
NASPAA (2009). NASPAA Standards: 2009.
http://naspaa.org/accreditation/doc/NS2009FinalVote10.16.2009.pdf
National Research Council (1990). Reshaping school mathematics: A philosophy and
framework for curriculum. Washington, DC: National Academy Press.
________. (1991). Moving beyond myths: Revitalizing undergraduate mathematics.
Washington, DC: National Academy Press.
Newcomer, K.E., & Allen, H.A. (2010). Public service education: Adding value in the public
interest. Journal of Public Affairs Education, 16(2), 207–229.
Powell, D.C. (2009). How do we know what they know? Evaluating student-learning
outcomes in an MPA program. Journal of Public Affairs Education, 15(3), 269–287.
Proctor, J.L. (2002). SPSS vs. Excel: Computing software, criminal justice students, and
statistics. Journal of Criminal Justice Education, 13(2), 433-442.
Prvan, T., Reid, A. & Petocz, P. (2002). Statistical laboratories using Minitab, SPSS, and
Excel: A practical comparison. Teaching Statistics, 24(2), 68–75.
Roberts, Gary E., & Pavlak, T. (2002). The design and implementation of an integrated
values and competency-based MPA core curriculum. Journal of Public Affairs
Education, 8(2), 115–129.
Sawitzki, G. (1994). Report on the numerical reliability of data analysis systems.
Computational Statistics & Data Analysis, 18(2), 289–301.
Siller, A.B., & Tompkins, L. (2006). The big four: Analyzing complex sample survey data
using SAS, SPSS, STATA, and SUDAAN. Paper 172-31 in Proceedings of the Thirty-first
Annual SAS Users Group International Conference: San Francisco. SAS Institute
Inc. http://www2.sas.com/proceedings/sugi31/172-31.pdf
Su, Y. (2008). It's easy to produce chartjunk using Microsoft Excel 2007 but hard to make
good graphs. Computational Statistics & Data Analysis, 52(10), 4594-4601.
Warner, C.B., & Meehan, A.M. (2001). Microsoft Excel as a tool for teaching basic statistics.
Teaching of Psychology, 28(4), 295-298.
Wikipedia (2011). http://en.wikipedia.org/wiki/Comparison_of_statistical_packages
Accessed May 11, 2011.
Williams, David G. (2002). Seeking the Holy Grail: Assessing outcomes of MPA programs. Journal
of Public Affairs Education, 8(1), 45–56.
Yalta, A.T. (2008). The accuracy of statistical distributions in Microsoft Excel 2007.
Computational Statistics & Data Analysis, 52(10), 4579-4586.
Table 1: Statistical Software Use in Graduate Courses
Introductory Statistics Course
MPA Programs
No course
No software
SPSS only
SPSS + Excel
Excel only
Stata only
Stata + Excel
SAS only
All other¹
Total
MPA Programs
No course
No software
SPSS only
SPSS + Excel
Excel only
Stata only
Stata + Excel
SAS only
All other¹
Total
40
1
22
10
6
10
41%
1%
22%
10%
6%
10%
1
1
2
6%
6%
13%
8
1
50%
6%
1
4
4
2
2
6%
24%
24%
12%
12%
4
17
24%
100%
Other Masters
6%
5
2
1
4
29%
12%
6%
24%
2
2%
1
7
7%
3
19%
3
98
100% 16
100% 17
Introductory Budget & Finance Course
6%
18%
100%
MPP Programs
9
9%
4
25%
10
10%
1
6%
2
2%
4
4%
70
71% 10
63%
1
1%
2
2%
1
6%
98
100% 16
100%
Later Budget & Finance Courses
MPA Programs
No course
No software
SPSS only
SPSS + Excel
Excel only
Stata + Excel
All other¹
Total
MPP Programs
Other Masters
1
MPA Programs
No course
No software
SPSS only
SPSS + Excel
Excel only
Stata + Excel
All other¹
Total
MPP Programs
2
2%
3
3%
41
42%
7
44%
27
28%
1
6%
16
16%
3
3%
4
25%
3
3%
1
6%
1
1%
2
2%
3
19%
98
100% 16
100%
Later Statistics Courses
48
9
1
5
32
1
2
98
49%
9%
1%
5%
33%
1%
2%
100%
MPP Programs
Other Masters
2
1
12%
6%
13
76%
1
17
6%
100%
Other Masters
6
2
38%
13%
2
2
12%
12%
7
44%
12
71%
1
16
6%
100%
1
17
6%
100%
¹ “All other” includes a few mentions of other software (R, JMP, gretl, and
Crystal Ball) and/or other combinations of two or three programs (such
as SAS and SPSS; SPSS and Stata).
Figure 1: Programs with Quantitative Classes, Software, and Labs
Figure 2: Software Used in Quantitative Classes
Figure 3: Software Used in the Introductory Statistics Course
Figure 4: Software Used in Later Statistics Courses
Figure 5: Perceived Software Use by Relevant Employers
(Among SAS, SPSS, and Stata Only)
Figure 6: Perceived Trends in Software Popularity during
the Past Decade (Among SAS, SPSS, and Stata Only)
Figure 7: Relevant Job Announcements Requesting
Software Familiarity (SAS, SPSS, and Stata Only)
Figure 8: Statistical Software References
in Google Scholar, 1996-2010
Endnotes
1 During its acquisition by IBM, SPSS was known for a while during 2009 and 2010 as PASW
(Predictive Analytics SoftWare) but has reverted to its SPSS name, which once stood for
“Statistical Package for the Social Sciences.”
2 In Excel 2003, Keeling & Pavur (2007) and McCullough & Wilson (2005) found “substantive
improvement” in its standard deviations.
3 Stata/IC is available (without a license expiration) for $179, with virtually unlimited cases, and
up to 2,047 variables. Certainly for faculty, doctoral students, and perhaps some masters
students, that could be an especially attractive value surpassing the regular basic packages from
the competition.
4 This figure excludes a total of 19 representatives who were from NASPAA affiliates outside the
United States, had bounced/invalid email addresses, or had previously opted out of all online
surveys conducted via SurveyMonkey.com. Eligible representatives who were not certain about
software issues were asked to invite an “appropriate faculty colleague” to participate in the short
survey.
5 Four programs conduct statistical software training in computer workshops that, while not
formally linked to a specific course, were added to the “computer lab” total.
6 With direct network effects, the value of an interactive good increases as more people purchase
the same good. (See Katz & Shapiro 1985.) Common examples include telephones and fax
machines. If few people have telephones or fax machines, they are not nearly as valuable as
when many people have them.
7 Of the ten most populous states, four (Georgia, North Carolina, Pennsylvania, and Texas) lack
keyword searchable listings for state government jobs or do not seem to run full-text searches
for any keywords. With states running large budget deficits and undergoing major cutbacks, state
government hiring in 2010 was not strong.
8 Six of the ten most populous cities had online job listings that could either be searched for
keywords or had so few jobs listed that they could be searched individually.
9 See Adams (2010) for more on the advantages and disadvantages of Google Scholar.
10 Google Scholar categorizes most public administration and public policy journals into its subject
area called “Business, Administration, Finance, and Economics.” Political science journals are
found, as expected, under “Social Sciences, Arts, and Humanities.” Also note that “Excel” citations
were inexact because many articles use the word as a verb. A sample of 150 articles using the
word “Excel” or “excel” found that 56% were referring to the software, so that proportion was
used to adjust the estimate of articles mentioning Excel.
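
The adjustment described in endnote 10 amounts to scaling a raw count of search hits by the sampled share of true software references. The sketch below illustrates that arithmetic only; the function name and the raw hit count are hypothetical and are not taken from the study, and the rounding to a whole number of articles is an assumption.

```python
def adjust_excel_hits(raw_hits, software_share=0.56):
    """Estimate how many articles matching "Excel" refer to the software.

    software_share is the proportion reported in endnote 10 (56% of a
    sample of 150 articles). raw_hits is whatever count a search returns;
    any value passed in here is illustrative, not a figure from the study.
    """
    if raw_hits < 0:
        raise ValueError("raw_hits must be non-negative")
    if not 0.0 <= software_share <= 1.0:
        raise ValueError("software_share must be between 0 and 1")
    # Round to a whole number of articles (an assumption of this sketch).
    return round(raw_hits * software_share)


# Hypothetical raw count, chosen only to show the calculation.
hypothetical_raw_hits = 10_000
print(adjust_excel_hits(hypothetical_raw_hits))
```

Applied to the sample itself, the same calculation gives 84 of the 150 sampled articles referring to the software.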