STAT582 HW7
Review of Two Statistical Software Packages – Minitab and SPSS
Yan Sun
As a statistician, have you ever gotten stuck in front of your computer, trying to figure out the correct syntax of a command to type into the little programming window, and just could not get it right? At that moment, I am sure you wished there were a magic button you could click to make things work the way they should.
Well, magic does not happen every day. However, better choices can make life easier. Some statistical software packages avoid typed command lines and are much easier to use because they have a menu-driven interface. Such software is like a well-organized control panel: each task you need to perform is controlled by a button somewhere on the panel. Once you are familiar with the layout of the panel, the actual work can be quite enjoyable. Several good menu-driven statistical packages are available. Among them, Minitab and SPSS are two of the most widely used.
This report serves as an introduction to these two software packages. For each of them, it discusses the software's specialties, advantages, and suitability. It also introduces some important functions, how to use them, and how to program in the package. The report also lists 'helpful resources', which I personally found very useful in learning and using the two packages.
1. Minitab
History
Minitab was originally developed by Barbara F. Ryan, Thomas A. Ryan, Jr., and Brian L. Joiner at Pennsylvania State University in 1972. Today it is a commercial product distributed by Minitab Inc. The latest version of the software is Minitab 15, which was released in 2008 (1).
This report is based on this latest version.
Specialties and advantages
Compared with other statistical software, Minitab has several very attractive advantages:
(1) Easy to learn and easy to use. You do not need to memorize a complicated programming language to work with Minitab. All regular statistical procedures can be performed with one or a few clicks in the pull-down menus. In addition, the menus are organized very intuitively, so it is easy to remember where to find each function. These features make Minitab very accessible to first-time learners. Many statistics educators in higher education prefer Minitab as the main software for teaching intermediate or some advanced-level statistics courses (2).
(2) Quality control functionality. Minitab can perform a large number of statistical procedures, ranging from basic statistics to much more complicated multivariate analysis. However, what makes Minitab stand out among statistical packages is its strength in statistical quality control. Minitab is equipped with almost all of the widely used tools for process control, including analysis methods, graphics, and design of experiments. In fact, Minitab is one of the leading software packages used by quality improvement professionals in all kinds of industries around the world. According to the Minitab Inc. website (3), its clients include GE, TOSHIBA, Bank of America, and SAMSUNG. In addition, Minitab Inc. offers two complementary software packages, Quality Trainer and Quality Companion, that further strengthen its quality improvement offering.
(3) Better graphical output. Most Minitab users are impressed by the variety and quality of the graphs the software generates. Minitab can produce many kinds of statistical graphs, and they are very easy to edit and customize. The quality of the graphs is superior to that of many other packages (see Figure 2 for an example).
Suitability
For the most sophisticated statistical computation, Minitab is not as powerful as packages such as Stata and R. So for academic research that involves intensive and very complicated statistical computation and analysis, Minitab might not be the right choice. However, Minitab is usually fully capable of meeting users' needs for most work in education, research, business, and industrial process control that requires intermediate or some advanced statistical analysis.
Functionalities
The next part of the report focuses on some of the major functions in Minitab and how to use them.
(1) Data import and general data manipulation. Minitab stores data in the format of a 'worksheet'. To create a worksheet, you can type in data directly or import existing data (File>Open). Data in text and Excel formats can be imported directly into a Minitab worksheet. Conveniently, if you check the 'merge' option when you import a dataset into an existing worksheet, the new data are placed side by side with the old data to create a merged dataset (Figure 1a).
Data in a Minitab worksheet are very easy to edit and manipulate. To delete rows or columns, select them, right-click, and choose 'Delete Cells'. Missing values can be filled in directly. The rows and columns of a dataset can be transposed with 'Data>Transpose Columns'. Because Minitab has a menu-driven interface, you do not need to write code to edit data. Most of the data manipulation functions are in the 'Data' pull-down menu.
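The Python sketch below is a rough illustration of what the 'merge' and transpose operations do to a worksheet. It assumes a worksheet can be modelled as a mapping from column names to equal-length lists of values. The function names, column names, and numbers are hypothetical, and this is not how Minitab itself stores or processes worksheets.

```python
# Illustrative only: a worksheet modelled as {column name: list of values}.
# Mimics the idea of side-by-side merging and transposing; not Minitab's internals.

def merge_side_by_side(left, right):
    """Return a new worksheet with the columns of `right` placed after those of `left`."""
    clash = set(left) & set(right)
    if clash:
        # Choice made for this sketch: refuse to overwrite existing columns.
        raise ValueError(f"duplicate column names: {sorted(clash)}")
    merged = dict(left)   # dicts keep insertion order, so column order is preserved
    merged.update(right)
    return merged


def transpose(worksheet):
    """Swap rows and columns: row i of the input becomes column 'Row i' of the output."""
    columns = list(worksheet.values())
    rows = zip(*columns)  # assumes equal-length columns (zip stops at the shortest)
    return {f"Row{i}": list(row) for i, row in enumerate(rows, start=1)}


# Hypothetical data
old = {"C1": [1.0, 2.0, 3.0], "C2": [4.0, 5.0, 6.0]}
new = {"C3": [7.0, 8.0, 9.0]}
merged = merge_side_by_side(old, new)
flipped = transpose(old)
```

Refusing duplicate column names is a design choice of this sketch only. It keeps the merge from silently discarding data.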
(2) Simple statistics and testing.
Most of the statistical functions are in the 'Stat' pull-down menu. For instance, within 'Stat':
- the 'Basic Statistics' sub-menu contains 'Display Descriptive Statistics', '1-sample z test', '2-sample t test', and others;
- the 'Tables' sub-menu contains the chi-square test;
- the 'Power and Sample Size' sub-menu makes it easy to calculate sample size or power.

During calculation and testing, parameters are specified by choosing different options in dialog boxes. Figure 1b shows the options one can choose in 'Descriptive Statistics' to analyze a dataset.
Figure 1a. Old and new data can be merged by checking the 'merge' option in the 'open worksheet' window.
Figure 1b. Options for descriptive statistics in Minitab.
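The Python sketch below shows how a few of the statistics offered in the descriptive-statistics dialog are defined: N, mean, standard deviation, standard error of the mean, minimum, median, and maximum. It uses only Python's standard library on made-up numbers, and the function name `describe` is hypothetical. Minitab's own options (for example, its quartile method) are not reproduced.

```python
import math
import statistics


def describe(values):
    """Basic descriptive statistics for one column of numbers."""
    data = sorted(values)
    n = len(data)
    sd = statistics.stdev(data)            # sample standard deviation (divisor n - 1)
    return {
        "N": n,
        "Mean": statistics.mean(data),
        "StDev": sd,
        "SE Mean": sd / math.sqrt(n),      # standard error of the mean
        "Minimum": data[0],
        "Median": statistics.median(data),
        "Maximum": data[-1],
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical data
```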
(3) Statistical analysis using different models.
Minitab can analyze data using different models. Under the 'Stat' menu, you can choose among different methods to analyze your data, including 'Regression', 'ANOVA', 'Multivariate', and 'Time Series'.
After selecting the method, you can specify the model by choosing further options in the dialog boxes. For instance, to fit a general linear model that involves a random effect, click 'Stat>ANOVA>General linear model'. Then specify the response, the model, and the random effect in the dialog box, and Minitab will perform the analysis according to your specifications.
Many more types of statistical analysis can be done in Minitab. Examples include response surface analysis for data that combine continuous and categorical variables (Stat>DOE>Response surface) and nested ANOVA for data that fit a nested model (Stat>ANOVA>Fully nested ANOVA). Minitab's built-in StatGuide manual contains very detailed information that can help you choose an appropriate method for your data and interpret the results.
(4) Graphics.
Minitab has a wide variety of graphing functions, including histograms, QQ-plots, residual plots, and much more. Figure 2 shows a QQ-plot, a residual plot, a histogram, and an ordered-data plot generated in Minitab from the 'diesel' data we used in class (4). They were produced by checking the 'graphs' option when the regression analysis was performed (Stat>Regression). The plots can be shown in separate windows or arranged in one window, depending on your choice ('individual plots' vs. 'Four in one'). To edit a graph, right-click the item you want to change and select the appropriate option, such as 'edit title', 'edit symbols', or 'copy graph'. Most graphing functions are either incorporated into the statistical analysis procedures (Stat) as options or gathered under the 'Graph' pull-down menu.
Figure 2. Graphs generated from the 'diesel' data ('plots for diesel data'). Panels: normal probability plot of the residuals; residual plot (residual vs. fitted value); histogram of the residuals; ordered-data plot (residual vs. observation order).
Programming in Minitab
Although most of the statistical functions in Minitab are accessible through menus, they can also be performed through programming. Minitab programs are written as 'session commands'. First activate the command prompt in the 'session' window with 'Editor>Enable commands' ('MTB>' appears in the window). You can then type in commands directly and have the software perform the procedures. The following output results from the command MTB > regress 'ignition' 1 'alcohol', applied to the 'diesel' data:
The regression equation is
ignition = 0.737 + 0.00486 alcohol

Predictor       Coef   SE Coef      T      P
Constant     0.73720   0.06419  11.49  0.000
alcohol     0.004863  0.001421   3.42  0.001

S = 0.247874   R-Sq = 20.6%   R-Sq(adj) = 18.9%
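The Python sketch below shows how the columns of this regression output relate to one another using simple least-squares regression. It computes Coef, SE Coef, T (= Coef / SE Coef), S, R-Sq, and R-Sq(adj) from textbook formulas. For example, in the output above 0.004863 / 0.001421 ≈ 3.42. The data are hypothetical (the diesel data are not reproduced here) and the function name is made up. P-values are left out because Python's standard library has no t distribution.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x. Returns (Coef, SE Coef, T) for each term
    plus S, R-Sq and R-Sq(adj). P-values are omitted (they need a t distribution
    with n - 2 degrees of freedom)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

    b1 = sxy / sxx                          # slope
    b0 = ybar - b1 * xbar                   # intercept ("Constant")
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))            # residual standard error "S"

    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + xbar ** 2 / sxx)
    r_sq = 1 - sse / syy
    r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - 2)
    return {
        "Constant": (b0, se_b0, b0 / se_b0),
        "Slope": (b1, se_b1, b1 / se_b1),
        "S": s,
        "R-Sq": r_sq,
        "R-Sq(adj)": r_sq_adj,
    }


# Hypothetical data (not the diesel data set)
fit = simple_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```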
Subcommands can also be used to specify how a command is carried out. If your work involves running the same program repeatedly, you can save the series of Minitab commands as an 'Exec' and re-run the program in the future. Minitab's built-in help manual contains detailed instructions on the syntax of almost all session commands, and on how to use 'Execs' and other more complicated macros. The link in (5) is also a good source for quick reference.
Helpful resources
Minitab is well known for its user-friendliness. This is due not only to its straightforward menu interface but also to its excellent help facilities and resources. The following are some of the resources that I found very helpful in learning and using this software:
(1) 'Meet Minitab 15' (6): the first thing you should read to learn about Minitab. This 142-page PDF file is practically a beginner's guide to the most commonly used features in Minitab. The book is very well written, with examples and screenshots, so it is quite fun to read. A good way to read it is to have Minitab open on your computer at the same time, so that you can practice as you go through the book. 'Meet Minitab 15' can be downloaded from the Minitab website (6).
(2) Minitab built-in electronic manuals: Minitab's built-in manuals are among the best of all statistical software packages. If you have questions while using Minitab, you can usually find the answers here. I personally found two manuals extremely helpful: the 'Help' manual, which helps you use the software, and the 'StatGuide' manual, which helps you understand the analysis results. Other built-in manuals include 'Tutorial' and 'Methods and Formulas', which should also be very useful.
(3) Many websites offer helpful information for specific needs in Minitab. Most of the time, you can find them with a simple Google search.
Reference
All information from web links is current as of the time this report was written (April 2010).
(1) http://en.wikipedia.org/wiki/Minitab
(2) Using Minitab for teaching statistics in higher education. John Eales and Julian Stander. MSOR Connections, Vol. 9, No. 3.
(3) www.minitab.com
(4) http://www.stat.purdue.edu/~jennings/stat582/datasets/index.html
(5) http://www.austincc.edu/mparker/1342/tf/mm/Appendix.pdf
(6) http://www.minitab.com/en-US/products/minitab/documentation.aspx?langType=1033
2. SPSS
Introduction
SPSS was developed by Norman H. Nie and C. Hadlai Hull at Stanford University. When it was first released in 1968, the package was mainly focused on academic research, and 'SPSS' stands for 'Statistical Package for the Social Sciences'.
Today, SPSS is one of the most widely used statistical packages in the world. It is used by survey companies, market researchers, health researchers, education researchers, governments, and others (1). According to the SPSS Inc. website, its customers include all 50 U.S. state governments, 100% of the top U.S. universities, 22 top global commercial banks, 18 top property and casualty insurance companies in the U.S., and 12 top global pharmaceutical companies (2). SPSS has evolved from its academic origins into a leading analytical tool for enterprises around the world.
SPSS is modular software. Its 'base' system module covers most regular data management and statistical analysis, such as descriptive statistics, commonly used tests, linear regression, and ANOVA. More sophisticated procedures, such as multivariate GLM and logistic regression, require SPSS add-on modules. The link in (3) gives a good summary of the statistical procedures covered by SPSS Base and by each add-on module.
Specialties and advantages
(1) Easy to learn and easy to use. Because SPSS is menu-driven, it is very easy to use. As in Minitab, most SPSS functions are organized into pull-down menus in a very intuitive way. In my own experience, the learning curves for SPSS and Minitab are similar. In fact, a study comparing user satisfaction with SPSS and Minitab among college students found no significant difference (4).
(2) Strength in data management. A major advantage that sets SPSS apart and has helped it succeed in the social sciences is its user-friendly approach to data management. SPSS can handle large amounts of data. Data attributes can be specified or changed with a few clicks, and variables and values can easily be labeled for future reference. The data management functions of SPSS are explored further later in this report.
Suitability
Compared with Minitab, I would say SPSS is generally stronger in statistical analysis, especially in specific areas such as ANOVA-related procedures. The add-on modules give SPSS further flexibility and potential to extend its capabilities. However, for cutting-edge statistical analysis, SPSS is still not as strong a candidate as Stata and R. So SPSS is most suitable if your work involves large datasets, frequent data management, and intermediate or partially advanced statistical analysis (5).
Functionalities
(1) Data import and general data manipulation.
SPSS data files look very much like a spreadsheet in Excel or a worksheet in Minitab. The files usually have the extension '.sav' and are displayed in the 'Data editor' window. SPSS can import datasets in almost all common formats, including spreadsheets (e.g. Excel), databases (e.g. Access), and text. Excel files can be imported directly from the menu (File>Open>Data), while database and text files can be brought in through import wizards.
SPSS makes it very easy to manipulate data. Each data editor window has two views. In the 'data view', you can edit your dataset by filling in missing values, deleting rows or columns, transposing the dataset (Data>Transpose), and so on. In the 'variable view', the attributes of each variable are listed and can be edited directly. For instance, you can specify labels for the variables or for their values. Later, in the 'data view', you can then see what the variables are and what their values of '0' or '1' mean. You can also change the variable name, the variable type, and the number of decimal places in the 'variable view'.
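Conceptually, value labels are metadata kept separately from the stored codes. The Python sketch below illustrates that idea with a hypothetical variable 'smoker' coded 0/1. The dictionary layout and the helper name are assumptions for illustration only, not the format of an SPSS .sav file.

```python
# Hypothetical variable metadata, loosely analogous to one row of the
# "variable view": the data column stores codes, the labels live separately.
variable = {
    "name": "smoker",
    "label": "Current smoker",
    "value_labels": {0: "No", 1: "Yes"},
}
codes = [0, 1, 1, 0, 1]   # what is actually stored in the "data view"


def show_with_labels(values, value_labels):
    """Display each stored code by its value label; unlabelled codes are shown as text."""
    return [value_labels.get(v, str(v)) for v in values]


displayed = show_with_labels(codes, variable["value_labels"])
```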
Most data manipulation functions are collected in the 'Data' and 'Transform' menus. For instance, you can transform any variable in your dataset by clicking 'Transform>Compute variable' and specifying the transformation. For more information on data management, please refer to the SPSS user's guide or the built-in help manual.
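The Python sketch below illustrates the effect of 'Compute variable': a new column calculated case by case from existing columns. The dataset, variable names, and the helper `compute_variable` are hypothetical. The sketch mimics only the idea, not SPSS's own implementation.

```python
import math

# Hypothetical dataset: one list per variable, cases aligned by position.
dataset = {"weight_kg": [60.0, 72.5, 80.0], "height_m": [1.60, 1.75, 1.80]}


def compute_variable(ds, new_name, func, *inputs):
    """Add column `new_name`, computed case by case by applying `func`
    to the values of the `inputs` columns (modifies `ds` in place)."""
    columns = [ds[name] for name in inputs]
    ds[new_name] = [func(*case) for case in zip(*columns)]
    return ds


compute_variable(dataset, "bmi", lambda w, h: w / h ** 2, "weight_kg", "height_m")
compute_variable(dataset, "log_weight", math.log, "weight_kg")
```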
(2) Simple statistics and testing.
Most of the simple statistical functions and testing procedures can be found in the 'Analyze' pull-down menu. For instance, descriptive statistics such as the mean and variance of the data can be calculated through 'Analyze>Descriptive Statistics>Descriptives'. A chi-square test of association is available under 'Analyze>Descriptive Statistics>Crosstabs'.
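The chi-square test of association compares observed counts in a crosstab with the counts expected under independence (row total × column total / grand total). The Python sketch below computes the Pearson chi-square statistic for a table of counts. It also includes a p-value helper that is valid only for 1 degree of freedom (a 2 × 2 table), where the statistic is the square of a standard normal variable. The counts are made up, and SPSS reports additional statistics that this sketch does not compute.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df,
    using P(Z**2 > c) = erfc(sqrt(c / 2)) for standard normal Z."""
    return math.erfc(math.sqrt(chi2 / 2))


chi2, df = pearson_chi_square([[10, 20], [30, 40]])   # hypothetical 2 x 2 counts
p = p_value_df1(chi2) if df == 1 else None
```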
(3) Statistical analysis using different models.
SPSS can analyze data using a linear or nonlinear regression model (nonlinear regression requires the 'regression' add-on module). These functions are in the 'Analyze>Regression>Linear' and 'Analyze>Regression>Nonlinear' menus. For instance, to build a linear regression model, you specify the dependent variable and the independent variable(s) in the 'linear regression' dialog box. You can also specify the model selection method (forward, backward, stepwise, etc.) and the WLS weight there. Once these modeling criteria are set, SPSS returns the results, including the parameter estimates and their significance.
To perform an ANOVA with categorical factors, go to 'Analyze>General Linear Model>Univariate', where you can specify the 'dependent variable', 'fixed factors', 'interactions', and so on. You can also customize your model by typing it directly into the dialog box.
Other, more complicated models can also be used in SPSS. For instance, a random-effects model can be fitted through the 'Analyze>Mixed models>Linear' menu. In the dialog box, you can specify the subject variables for the random effects and the repeated-measures variables to build the model.
Some sophisticated models are not covered by the menus, and you have to run commands to fit them. For instance, to build a nested ANOVA model, first set up the model using the menus, specifying all the criteria you want except the nested variable. Then click 'paste', and the procedure you just set up will appear in the syntax window as commands. You can then specify the nested variable in the '/design' subcommand and run the edited program to fit the nested ANOVA model.
(4) Graphics.
SPSS can generate a large variety of graphs for data exploration and result presentation. For example, to explore whether a group of values is normally distributed, you can generate a Q-Q plot of those values. Figure 3a shows a Q-Q plot of the 'ignition delay' variable in the 'diesel' data, generated through the 'Analyze>Descriptive statistics>QQ plots' menu.
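A Q-Q plot pairs each sorted observation with the normal quantile expected at its rank; points lying roughly on a straight line suggest normality. The Python sketch below computes those pairs using Blom's plotting position (i - 3/8)/(n + 1/4). That choice is an assumption here: packages differ, and this is not necessarily what SPSS uses. The data are made up.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Return (expected standard-normal quantile, observed value) pairs for a
    normal Q-Q plot, using Blom's plotting position (i - 3/8) / (n + 1/4)."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist(0.0, 1.0)
    return [
        (std_normal.inv_cdf((i - 0.375) / (n + 0.25)), x)
        for i, x in enumerate(data, start=1)
    ]


points = normal_qq_points([0.9, 1.4, 1.1, 1.2, 1.0])   # hypothetical values
```

Plotting these pairs with the expected quantile on one axis and the observed value on the other gives the Q-Q plot.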
Some graphing functions are incorporated into analysis procedures. For instance, consider a linear regression with 'alcohol' as the independent variable and 'ignition delay' as the dependent variable. You can ask SPSS to produce graphs for checking the normality assumption by selecting the corresponding options in the dialog box. Figure 3b shows the histogram of residuals from this regression.
Figure 3a. Q-Q plot for the 'ignition delay' variable in the 'diesel' data.
Figure 3b. Histogram for assumption checking, based on the 'diesel' data.
Graphs are easy to edit in SPSS. Double-clicking a graph in the 'statistics viewer' window opens a separate window called the 'chart editor'. There you can make all kinds of modifications to the graph, such as changing text and color or adding footnotes and data labels.
Programming in SPSS
SPSS can also be run with programmed commands. In fact, although most of the functions driven by SPSS commands are also accessible through pull-down menus, some procedures and options are available only through commands. A further advantage of commands is that you can save the program and re-run it in the future.
SPSS commands are written and edited in a separate window called the 'syntax editor'. As an example, let's regress 'ignition delay' on 'alcohol', as we did with Minitab in part 1 of this report. Open a syntax editor window through 'File>New>Syntax' and type in the following commands:
Regression
  /dependent ignition
  /method=enter alcohol.
In this syntax, 'regression' is the command, followed by two subcommands (each introduced by a '/'). The response variable is specified after 'dependent', and the predictor is specified after 'method=enter'. The following is part of the SPSS output produced by these commands:
Coefficients(a)
                       Unstandardized Coefficients   Standardized Coefficients
Model                  B          Std. Error         Beta                        t        Sig.
1  (Constant)          .737       .064                                           11.485   .000
   alcohol (mass %)    .005       .001               .454                        3.422    .001
a. Dependent Variable: ignition delay (Cao)
The parameter estimates and their significance levels obtained here are the same as those obtained with Minitab in part 1, up to rounding.
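The 'Beta' column is the slope after both variables are standardized. In simple regression this equals Pearson's correlation r, so Beta squared equals R-Sq. Here .454² ≈ 0.206, consistent with the R-Sq of 20.6% in the Minitab output. The Python sketch below checks this identity on made-up data; the function name is hypothetical.

```python
import math


def slope_beta_r(x, y):
    """Return (b1, Beta, r) for the simple regression of y on x:
    b1 is the unstandardized slope, Beta = b1 * s_x / s_y is the
    standardized slope, and r is Pearson's correlation."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    beta = b1 * math.sqrt(sxx / syy)   # s_x / s_y; the n - 1 divisors cancel
    r = sxy / math.sqrt(sxx * syy)
    return b1, beta, r


b1, beta, r = slope_beta_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])   # hypothetical data
```

Doubling every y value doubles b1 but leaves Beta unchanged, which is why Beta can be compared across variables measured in different units.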
Another useful feature of SPSS syntax is auto-completion. After a command, type the subcommand indicator '/' and press 'Control+Spacebar'. The available subcommand options will then appear for you to choose from (Figure 4). This feature is very helpful when you are unsure about the syntax of the procedure you want to run.
Figure 4. The auto-completion feature in SPSS programming.
Helpful Resources
(1) SPSS Statistics Base 17.0 User's Guide (6). This is the guide I started with when learning SPSS. I found it very helpful because it introduced me to the most commonly used procedures in the software. The guide comes in two versions: a long, detailed one and a brief one. My suggestion is to start with the brief one to get a taste of what the software is like, and later look into the detailed guide for specifics.
(2) SPSS built-in help manual. For help with specific questions or problems while you work with SPSS, the built-in electronic manual is very handy. The 'Help' menu offers three useful submenus. The 'topics' submenu covers the functions and use of most procedures in the software. The 'Tutorial' submenu illustrates how to use the basic features. The 'Statistics Coach' submenu asks what you want to do and helps you choose the procedure that best meets your needs.
(3) Online resources:
• comp.soft-sys.stat.spss newsgroup (http://groups.google.com/group/comp.soft-sys.stat.spss/topics?gvc=2): an active Google group where people ask and answer questions about specific problems encountered while using SPSS.
• Raynald's SPSS Tools (http://www.spsstools.net/): a very useful website set up by Raynald Levesque, the author of 'SPSS Programming and Data Management' published by SPSS (7). Some of its links no longer work, but it still contains a large amount of useful information. This website is especially helpful if you are interested in programming in SPSS.
• UCLA Academic Technology Services (http://www.ats.ucla.edu/stat/spss/): a great website for SPSS learners, with many data analysis examples.
Reference
All information from web links is current as of the time this report was written (April 2010).
(1) http://en.wikipedia.org/wiki/SPSS
(2) http://www.spss.com/success/
(3) http://faculty.chass.ncsu.edu/garson/PA765/spssmodules.htm
(4) An empirical comparison of student user-satisfaction between SPSS and Minitab. M. Feinberg and J. Siekpe. College Student Journal, Dec. 2003.
(5) SAS, Stata, SPSS: A Comparison. A.C. Acock. Journal of Marriage and Family, 2005, 67(4), 1093-1095.
(6) http://support.spss.com/ProductsExt/SPSS/Documentation/SPSSforWindows/index.html
(7) http://www.spss.com/sites/dm-book/