Dan Dillon
Homework Problem #7 – Part 2
STAT 582, Statistical Consulting and Collaboration
Dr. Jennings

Evaluation of STATISTICA Software

STATISTICA is published by StatSoft, Inc. (Tulsa, OK), a company founded in 1984. The company claims to be “one of the largest global providers of analytic software worldwide”, with 23 offices worldwide.[1] The latest version of the software is STATISTICA 9.

The software targets large users: the company states that its product lines are “enterprise-wide, scalable, fully Web-enabled distributed processing systems”[1] used in Manufacturing, Power Generation, Semiconductors, Pharmaceutical, Chemical, Petrochemical, Food Processing, Automotive, Heavy Equipment, Insurance, Telecom, and R&D.[1] It particularly targets FDA-regulated industries by stating that it complies with 21 CFR Part 11 (important regulatory requirements for software used in FDA-regulated environments).[1] Further evidence that the software targets large users is that no prices are listed in the catalogue; instead, the reader is told to “Request Price from StatSoft”.[2] Twenty-five different versions are available, ranging from STATISTICA Base to STATISTICA Multivariate Statistical Process Control (MSPC) to WebSTATISTICA Server Applications.[2]

I was unable to determine the system requirements, but the trial version worked on my approximately 8-year-old, low-end laptop computer running Windows XP. The software is available in 32-bit and 64-bit versions.

[1] http://www.statsoft.com/company/. Accessed April 30, 2010.
[2] http://www.statsoft.com/products/statistica-product-catalog/. Accessed April 30, 2010.

Major Statistical Topics

The STATISTICA Base package includes:[2]
- Descriptive Statistics, Breakdowns, and Exploratory Data Analysis
- Correlations
- Interactive Probability Calculator
- T-Tests (and other tests of group differences)
- Frequency Tables, Crosstabulation Tables, Stub-and-Banner Tables, Multiple Response Analysis
- Multiple Regression Methods
- Nonparametric Statistics
- Distribution Fitting
- Enhanced graphics technology
- Powerful query tools
- Flexible data management
- ANOVA [supports 4 between factors and 1 within (repeated measure) factor]
- STATISTICA Visual Basic Language, and more.

The STATISTICA Advanced package includes everything in the Base package, plus:[2]
- STATISTICA Multivariate Exploratory Techniques
- STATISTICA Advanced Linear/Nonlinear Models
- Cluster Analysis Techniques
- Factor Analysis and Principal Components
- Canonical Correlation Analysis
- Reliability/Item Analysis
- Classification Trees
- Correspondence Analysis
- Multidimensional Scaling
- Discriminant Analysis
- General Discriminant Analysis Models
- Automatic model selection
- Variance components and time series methods; Distribution and Simulation
- Variance Components and Mixed Model ANOVA/ANCOVA
- Survival/Failure Time Analysis
- General Nonlinear Estimation (and Logit/Probit)
- Log-Linear Analysis
- Time Series Analysis, Forecasting
- Structural Equation Modeling/Path Analysis (SEPATH)
- General Linear Models (GLM)
- General Regression Models (GRM)
- Generalized Linear/Nonlinear Models (GLZ)
- Partial Least Squares (PLS)
- STATISTICA Power Analysis and Interval Estimation
- Power Calculations
- Sample Size Calculations
- Interval Estimation
- Probability Distribution Calculators, and more.

User Interface and Output

The opening screen mimics a spreadsheet. See Figure 1.

Figure 1. Opening screen – spreadsheet.

The toolbar has many general-purpose buttons that are identical to those in Microsoft Word, but the purpose of many of the others is unclear. Figure 2 is a close-up of the right-hand side of the toolbar.

Figure 2. Close-up of right side of toolbar.

One can enter the data directly or import it. Importing is accomplished with the File-Open command, rather than a separate Import command.
The following file formats are supported: *.css, *.csv, dBase, Excel (including *.xlsx), *.htm, JMP, Lotus, Minitab, Quattro, *.rtf, SAS, SPSS, *.scr, *.smx, *.sta, *.str, and *.txt. In addition, one may run native R programs from inside STATISTICA.[3]

[3] http://www.statsoft.com/Portals/0/Support/Download/Brochures/STATISTICA.pdf. Accessed April 30, 2010.

I tested three basic routines (graphing a scatterplot, linear regression, and ANOVA) to get a feel for the software.

Graphing

I found the graphical interface difficult to use. I chose to graph a scatterplot with multiple categories overlaid. Selecting Graphs-Scatterplot displays a tabbed dialog box (Figure 3).

Figure 3. Scatterplot dialog box.

It was not obvious to me that I should not choose Graph type: Multiple and that I should not choose By Group. Instead, I needed to go to the Categorized tab, select the variable that represented the category (in this case, Var1), and choose the other values as shown in Figure 4. The help features were not as helpful as they should be, and the interface and terminology are not intuitive.

Figure 4. Scatterplot dialog box – Categorized tab.

The resulting graph (Figure 5) was serviceable, but the title section included several lines of unrequested and marginally useful information. (The first line of the graph's title was added using the Options 1 tab in the Scatterplot dialog box.)

Figure 5. Scatterplot – first draft.

Figure 6 shows the same graph after reformatting various features. Reformatting works much as it does in Microsoft Excel (clicking on an object opens a dialog box), but it is less versatile and offers fewer options.

Figure 6. Scatterplot – final draft.

Simple Linear Regression

The data for linear regression were imported from a comma-delimited text file.
Choosing Statistics-Multiple Regression from the menu (there is no option for Simple Linear Regression) displays the dialog box shown in Figure 7.

Figure 7. Multiple regression dialog box.

Clicking on Variables allows one to choose the dependent and independent variables (Var2 and Var1, in this case). Clicking on OK runs the analysis. The resulting display is rather sparse and not particularly attractive. See Figure 8.

Figure 8. First regression results.

However, clicking on the Advanced tab yields buttons for additional results. See Figure 9.

Figure 9. Advanced tab on regression results display.

Figure 10 is a screenshot taken after several of the buttons have been clicked.

Figure 10. Multiple-tabbed results output.

Note that the tabs at the bottom of the screen allow viewing of different results from the analysis. The far-left portion of the window resembles SAS output screens.

Figure 11 shows that the output can be sent to a Microsoft Word document.

Figure 11. Output Manager.

However, it was not obvious how to access the document. I managed to find it by closing the worksheet. I had another Word document open at the time, but the STATISTICA document appeared to be contained within the program. Note in Figure 12 that the menu options and status bar are not typical for Microsoft Word.

Figure 12. Microsoft Word output.

ANOVA

One-way ANOVA worked similarly. See Figure 13 for the opening dialog box displayed after choosing Statistics-ANOVA from the main menu. ANOVA and MANOVA are combined.

Figure 13. ANOVA/MANOVA opening screen.

After selecting One-way ANOVA and choosing the variables (using a dialog box similar to the one in Figure 7), the dialog box shown in Figure 14 appears.

Figure 14. One-way ANOVA dialog box.

Figure 15 shows the output when All effects/Graphs is selected.
Figure 16 shows the output when Univariate results is chosen under the Summary tab.

Figure 15. All effects/Graphs output.

Figure 16. Univariate results output.

Further detail is unnecessary, because the basic feel and workflow are similar to those for linear regression.

Conclusion

The STATISTICA program has a strong advantage over SAS in that its initial presentation has the look and feel of Microsoft Excel: menus and spreadsheets allow for easy initial selection of tasks. However, producing specific output was tedious and not very intuitive. The output is also piecemeal: one must individually select every option beyond the most basic analysis. I would much prefer to receive more analyses and output by default and then select or read what I wanted, rather than choosing each piece of analysis and output every time.

The analyses performed for this paper were relatively simple. I suspect that the program has a great deal of power and allows for much customization, probably at the level of SAS, but accessing that power appears difficult.