David Young, STAT 582, 5-1-10

STATISTICA

Introduction:
STATISTICA is a statistics and analytics software package produced by StatSoft. Its applications include data analysis, data management, data mining, and data visualization. The base version of STATISTICA contains standard statistical procedures, including descriptive and summary statistics, exploratory data analysis, correlations, probability calculators, group difference tests (t-tests, ANOVA, and non-parametric tests), frequency tables and cross-tabs, multiple response analysis, regression methods (including multiple regression and logistic regression), other non-parametric methods, and distribution fitting. Beyond the base version, STATISTICA offers many additional modules with more advanced features, including cluster analysis, simulation, power calculations, neural networks, quality control, design of experiments, and data mining. In addition to numerical results and tables, STATISTICA lets its users produce two- and three-dimensional graphics for data exploration, analysis, and presentation.

One advantage of STATISTICA is that it combines a user-friendly platform with powerful statistical computation. It is also one of the more cost-effective and easily customized statistical software solutions available. Starting an analysis simply involves loading a table of data and applying the easy-to-navigate functions, which are driven by pull-down menus. Within these menus users tell STATISTICA which variables to use and which types of analysis to conduct, then explore the results in graphical or tabular output.

In this paper we explore several of the functions that STATISTICA handles readily. The list below contains the categories of statistical data analysis that we shall investigate.
Descriptive and Basic Statistics
Tests of Differences (ANOVA and Kruskal-Wallis test)
Regression
Power Analysis
Quality Control

Descriptive and Basic Statistics:
Basic statistics that any researcher will need to explore include variable distributions and correlations. These, along with many other descriptive measures, are provided within STATISTICA. Once users have entered their data table, they simply choose the statistical summary reports and graphs they would like to see, and STATISTICA produces them. Figure 1a below presents the density and distribution function of a variable. Figure 1b shows a histogram of the data to give the user a feel for how the data are distributed. Figure 1c shows an example of scatterplots used to gauge the pairwise correlations among a set of variables.

Figure 1a
Figure 1b
Figure 1c

Tests of Differences:
One of the main methods for testing differences between groups is analysis of variance (ANOVA). To perform a one-way ANOVA in STATISTICA, a user can follow the steps below:

Step 1: Choose the Statistics option from the menu bar
Step 2: Choose Statistics/Tables
Step 3: Choose Breakdown and One-Way ANOVA and click “OK”
Step 4: Go to the Individual Tables tab and click “Variables”
Step 5: Select dependent and grouping variables and click “OK”
Step 6: Click on Analysis of Variance

Following these steps provides the user with an ANOVA table. Other options within this menu also produce plots. Figure 2 below shows an example of an interaction plot for the set of variables.

Figure 2

The Kruskal-Wallis test is an example of a non-parametric test of differences. To perform the Kruskal-Wallis test in STATISTICA for a one-way test of differences, the user may follow the steps below:

Step 1: Choose the Statistics option from the menu bar
Step 2: Choose Nonparametrics
Step 3: Choose Comparing multiple indep. Samples (Groups)
Step 4: Click “Variables”
Step 5: Select dependent and grouping variables and click “OK”
Step 6: Click on Summary

Regression:
To perform ordinary least squares regression on a set of data, a user can follow the steps below:

Step 1: Choose the Statistics option from the menu bar
Step 2: Choose Multiple Regression
Step 3: Choose dependent and independent variables and click “OK”
Step 4: Go to the Quick tab and click on “Summary: Regression results”

This process produces output that includes parameter estimates with standard errors and p-values, along with an ANOVA table. Figure 3 below shows a scatterplot with a trend line and confidence intervals, along with other important regression information.

Figure 3

Power Analysis:
Power analysis is important to consider when designing an experiment. Whether the researcher needs to plan a sufficiently large sample size or to run a post-hoc analysis to help determine why significant results are not being achieved, a power analysis is often worthwhile. STATISTICA lets a user compute sample sizes and power estimates after a quick entry of the appropriate inputs. Figure 4 below shows a power curve for various sample sizes and power levels, based on given levels of expected differences, type I error rates, etc.

Figure 4

Quality Control:
The final type of analysis we investigate in this primer is quality control. In quality control we are often interested in how the average value and the range of a specific measurement change over time and across samples. Figure 5 below shows both an x-bar chart and an R chart for a specific variable. These charts also include histograms showing the distribution of the sample averages and ranges.

Figure 5

Another important aspect of quality control involves the exploration of operating characteristic (OC) curves.
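To make concrete what an OC curve for an x-bar chart plots, the short Python sketch below (an illustration only, not STATISTICA code or output) computes the probability that a single subgroup mean still falls inside the control limits after the process mean has shifted. It assumes normally distributed measurements, a known process standard deviation, and three-sigma limits; the function name xbar_oc and its parameter names are hypothetical and chosen only for this example.

```python
from statistics import NormalDist


def xbar_oc(shift_in_sigmas, subgroup_size, limit_width=3.0):
    """Probability that one subgroup mean stays inside the control limits.

    shift_in_sigmas: shift of the process mean, in process standard deviations.
    subgroup_size:   number of observations averaged in each subgroup.
    limit_width:     half-width of the limits in standard errors (3.0 = three-sigma).
    Assumes normal data and a known process standard deviation.
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Express the shift in units of the standard error of the subgroup mean.
    scaled_shift = shift_in_sigmas * subgroup_size ** 0.5
    upper = std_normal.cdf(limit_width - scaled_shift)
    lower = std_normal.cdf(-limit_width - scaled_shift)
    return upper - lower


if __name__ == "__main__":
    # Tabulate the curve for subgroups of size 5 over a range of shifts.
    for shift in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
        print(f"shift = {shift:.1f} sigma  P(inside limits) = {xbar_oc(shift, 5):.4f}")
```

Plotting these probabilities against the size of the shift gives the OC curve; one minus each value is the chance that a single sample signals the shift, and larger subgroups make the curve drop faster.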
Operating characteristic curves are used when a researcher or quality control engineer wants to investigate how likely it is that a sample falls outside a set of calculated control limits. Figure 6 below shows an example of an OC curve for an x-bar chart for a set of data.

Figure 6

Conclusion:
Beyond the types of analyses discussed in this paper, STATISTICA provides a range of statistical analyses comparable in breadth to that of SPSS. The ease and speed with which STATISTICA produces statistical results make it a practical and helpful tool. Its one downside is that, compared with its contemporaries SPSS and Minitab, documentation is not as readily available.

Sources Used:
http://www.statsoft.com/textbook/
http://en.wikipedia.org/wiki/Statistica
http://www.ats.ucla.edu/stat/Statistica/notes/default.htm