Dan Dillon
Homework Problem #7 – Part 2
STAT 582, Statistical Consulting and Collaboration
Dr. Jennings
Evaluation of STATISTICA Software
STATISTICA is published by StatSoft, Inc. (Tulsa, OK), a company founded in 1984. The
company claims to be “one of the largest global providers of analytic software worldwide”,
with 23 offices around the world.1
The latest version of the software is STATISTICA 9. The software targets large organizations:
StatSoft describes its product lines as “enterprise-wide, scalable, fully Web-enabled distributed
processing systems”1 used in Manufacturing, Power Generation, Semiconductors, Pharmaceutical,
Chemical, Petrochemical, Food Processing, Automotive, Heavy Equipment, Insurance, Telecom, and
R&D.1 The software particularly targets FDA-regulated industries, stating that it complies
with 21 CFR 11 (important regulatory requirements for software used in FDA-regulated
environments).1 Further evidence that large users are the target is that no prices are listed in the
catalogue; instead the reader is told to “Request Price from StatSoft”.2
Twenty-five different versions are available, ranging from STATISTICA Base to STATISTICA
Multivariate Statistical Process Control (MSPC) to WebSTATISTICA Server Applications.2 I was
unable to determine the system requirements, but the trial version ran on my approximately
8-year-old, low-end laptop computer running Windows XP. The software is available in 32-bit and
64-bit versions.
Major Statistical Topics
The STATISTICA Base package includes:2
• Descriptive Statistics, Breakdowns, and Exploratory Data Analysis
• Correlations
• Interactive Probability Calculator
• T-Tests (and other tests of group differences)
• Frequency Tables, Crosstabulation Tables, Stub-and-Banner Tables, Multiple Response Analysis
• Multiple Regression Methods
• Nonparametric Statistics
• Distribution Fitting
• Enhanced graphics technology
• Powerful query tools
• Flexible data management
• ANOVA [supports 4 between factors and 1 within (repeated measure) factor]
• STATISTICA Visual Basic Language, and more.

1 http://www.statsoft.com/company/. Accessed April 30, 2010.
2 http://www.statsoft.com/products/statistica-product-catalog/. Accessed April 30, 2010.
The STATISTICA Advanced package includes everything in the base package, plus:2
• STATISTICA Multivariate Exploratory Techniques
    • Cluster Analysis Techniques
    • Factor Analysis and Principal Components
    • Canonical Correlation Analysis
    • Reliability/Item Analysis
    • Classification Trees
    • Correspondence Analysis
    • Multidimensional Scaling
    • Discriminant Analysis
    • General Discriminant Analysis Models
• STATISTICA Advanced Linear/Nonlinear Models
    • Automatic model selection
    • Variance components and time series methods
    • Distribution and Simulation
    • Variance Components and Mixed Model ANOVA/ANCOVA
    • Survival/Failure Time Analysis
    • General Nonlinear Estimation (and Logit/Probit)
    • Log-Linear Analysis
    • Time Series Analysis, Forecasting
    • Structural Equation Modeling/Path Analysis (SEPATH)
    • General Linear Models (GLM)
    • General Regression Models (GRM)
    • Generalized Linear/Nonlinear Models (GLZ)
    • Partial Least Squares (PLS)
• STATISTICA Power Analysis and Interval Estimation
    • Power Calculations
    • Sample Size Calculations
    • Interval Estimation
    • Probability Distribution Calculators, and more.
User Interface and Output
The opening screen mimics a spreadsheet. See Figure 1.
Figure 1. Opening screen – spreadsheet.
The toolbar has many general-purpose buttons that are identical to those in Microsoft Word, but
the purpose of many of the others is unclear. Figure 2 is a close-up of the right-hand side of the
toolbar.
Figure 2. Close-up of right side of tool-bar.
One can enter data directly or import it. Importing is done with the File-Open
command rather than a separate Import command. The following file formats are
supported: *.css, *.csv, dBase, Excel (including *.xlsx), *.htm, JMP, Lotus, Minitab, Quattro,
*.rtf, SAS, SPSS, *.scr, *.smx, *.sta, *.str, and *.txt. In addition, one may run native R
programs from inside STATISTICA.3
I tested three different basic routines – graphing a scatterplot, linear regression and ANOVA – to
get a feel for the software.
Graphing
I found the graphical interface difficult to use. I chose to graph a scatterplot with multiple
categories overlaid. Selecting Graphs-Scatterplot displays a tabbed dialog box (Figure 3).
3 http://www.statsoft.com/Portals/0/Support/Download/Brochures/STATISTICA.pdf. Accessed April 30, 2010.
Figure 3. Scatterplot dialog box.
It was not obvious to me that I should choose neither Graph type: multiple nor By Group.
Instead, I needed to go to the Categorized tab, select the variable that represented the
category (in this case, Var 1), and choose other values, as shown in Figure 4. The help features
were less helpful than they should be, and the interface and terminology are not intuitive.
Figure 4. Scatterplot dialog box – Categorized tab.
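Conceptually, a categorized scatterplot splits the rows by the category variable and draws each group as its own overlaid series. The short Python sketch below illustrates only that grouping step. It is not STATISTICA code, and the data values and the function name group_points are invented for illustration.

```python
from collections import defaultdict


def group_points(rows):
    """Split (category, x, y) rows into one (xs, ys) series per category."""
    series = defaultdict(lambda: ([], []))
    for category, x, y in rows:
        xs, ys = series[category]
        xs.append(x)
        ys.append(y)
    return dict(series)


# Invented data: the first field plays the role of the category variable
# (Var 1 in the text); the other two fields are the plotted x and y values.
rows = [("A", 1.0, 2.1), ("B", 1.5, 1.0), ("A", 2.0, 3.9), ("B", 2.5, 2.2)]
series = group_points(rows)

# Each category would be drawn as a separate overlaid series.
for category in sorted(series):
    xs, ys = series[category]
    print(category, xs, ys)
```

Each (xs, ys) pair could then be handed to any plotting tool as a separate series with its own marker.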
The resulting graph (Figure 5) was serviceable, but the title section included several lines of
unrequested and marginally useful information. (The first line of the graph title was added
using the Options 1 tab in the Scatterplot dialog box.)
Figure 5. Scatterplot – first draft.
Figure 6 shows the same graph after reformatting various features. The reformatting interaction
is similar to that in Microsoft Excel (click on an object and a dialog box opens), but it is not
as versatile and offers fewer options.
Figure 6. Scatterplot – final draft.
Simple Linear Regression
The data for linear regression were imported from a comma-delimited text file. Choosing
Statistics-Multiple Regression from the menu (there is no option for Simple Linear Regression)
displays the following dialog box (Figure 7):
Figure 7. Multiple regression dialog box.
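Using the Multiple Regression menu for this task is reasonable, because simple linear regression is just multiple regression with a single independent variable. As a cross-check on what the analysis computes, here is a minimal Python sketch of the least-squares fit for one predictor. It is an illustration only, not STATISTICA's implementation; the data are invented, and the names var1 and var2 merely echo the variable names used in this section.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and sum of cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Invented data lying exactly on the line y = 1 + 2x.
var1 = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent variable
var2 = [3.0, 5.0, 7.0, 9.0, 11.0]  # dependent variable
intercept, slope, r_squared = simple_linear_regression(var1, var2)
print(intercept, slope, r_squared)
```

The sketch reports only the intercept, slope, and R-squared; standard errors and significance tests are omitted.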
Clicking on Variables allows one to choose the dependent and independent variables (Var2 and
Var1, in this case). Clicking on OK runs the analysis. The resulting display is rather sparse and
not particularly attractive. See Figure 8.
Figure 8. First regression results.
However, clicking on the Advanced tab yields buttons for additional results. See Figure 9.
Figure 9. Advanced tab on regression results display.
Figure 10 is a screenshot after several of the buttons have been clicked.
Figure 10. Multiple-tabbed results output.
Note that the tabs at the bottom of the screen allow viewing of different results from the analysis.
The far left of the screen has an appearance similar to SAS output screens. Figure 11 shows that
the output can be sent to a Microsoft Word document.
Figure 11. Output Manager.
However, it was not obvious how to access the document. I managed to find it by closing
the worksheet. I had another Word document open at the time, but the STATISTICA document
appeared to be contained within the program. Note in Figure 12 that the menu options
and status bar are not typical of Microsoft Word.
Figure 12. Microsoft Word output.
ANOVA
One-way ANOVA worked similarly. See Figure 13 for the opening dialog box after choosing
Statistics-ANOVA from the main menu. ANOVA and MANOVA are combined.
Figure 13. ANOVA/MANOVA opening screen.
After selecting One-way ANOVA and choosing the variables (using a dialog box similar to the one in
Figure 7), the following dialog box appears (Figure 14).
Figure 14. One-way ANOVA dialog box.
Figure 15 shows the output when All effects/Graphs is selected. Figure 16 shows the output
when Univariate results is chosen under the Summary tab.
Figure 15. All effects/Graphs output.
Figure 16. Univariate results output.
Further detail is unnecessary, as the basic feel and processes are similar to those for
linear regression.
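For reference, the core quantities of a one-way ANOVA table (sums of squares, degrees of freedom, and the F statistic) can be reproduced by hand. The Python sketch below shows that arithmetic on invented data. It is not STATISTICA code, the function name one_way_anova is hypothetical, and it stops short of the p-value, which requires the F distribution.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups variation: group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups variation: observations around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# Invented data: three groups of three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(groups)
print(ss_between, ss_within, df_between, df_within, f_stat)
```

With these values the between-groups mean square is 13 and the within-groups mean square is 1, so the F statistic is 13 on 2 and 6 degrees of freedom.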
Conclusion
The STATISTICA program has a strong advantage over SAS in that the initial presentation has the
look and feel of Microsoft Excel: menus and spreadsheets allow for easy initial selection of
tasks. However, producing specific output was tedious and not very intuitive. Also, the output is
piecemeal; one must select every option beyond the most basic analysis. I would much prefer
to receive more analyses and output by default and then select or read what I wanted, rather than
choosing each piece of analysis and output every time. The analyses performed for this paper were
relatively simple. I suspect that the program has a great deal of power and allows for much
customization, probably at the level of SAS, but accessing it appears difficult.