Statistica

David Young, STAT 582, 5-1-10
Introduction:
STATISTICA is a software package that deals with statistics and analytics, produced by StatSoft.
Some of the applications that STATISTICA provides include data analysis, data management, data
mining, and data visualization. The base version of STATISTICA contains standard statistical procedures
including descriptive and summary statistics, exploratory data analysis, correlations, probability
calculators, group difference tests including t-tests, ANOVA, and non-parametric tests, frequency tables
and cross-tabs, multiple response analysis, regression methods including multiple regression and logistic
regression, non-parametric studies, and distribution fitting. Beyond the base version, STATISTICA
offers many additional modules with more advanced features, including cluster analysis,
simulation, power calculations, neural networks, quality control, design of experiments, and data
mining. In addition to numerical results and tables, STATISTICA can produce two- and
three-dimensional graphics for data exploration, analysis, and
presentation.
One advantage of STATISTICA is that it combines a user-friendly platform with powerful
statistical computation. It is also one of the more cost-effective and easily customized statistical
software solutions available. Using the program simply involves loading a table of data and applying
easy-to-navigate functions driven by pull-down menus. Within these menus, users tell
STATISTICA which variables to use and which type of analysis to conduct, then explore the results in
graphical or tabular output.
Throughout this paper we shall explore several of the functions that STATISTICA is readily able to
handle. The list below contains the categories of statistical data analysis that we shall investigate.
• Descriptive and Basic Statistics
• Tests of Differences (ANOVA and Kruskal-Wallis test)
• Regression
• Power Analysis
• Quality Control
Descriptive and Basic Statistics:
Some basic statistics any researcher will need to explore include variable distributions and
correlations. These along with many other descriptive measures are provided in the framework of
STATISTICA. Once users load their data table, they can choose whichever summary reports and
graphs they would like to see, and STATISTICA will produce them. Figure 1a below presents the
density and distribution function of a variable. Figure 1b shows a histogram of the data to give
the user a feel for how the data are distributed. Figure 1c shows example scatterplots for gauging
the pairwise correlations among a set of variables.
Figure 1a
Figure 1b
Figure 1c
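STATISTICA produces these summaries from its menus. As a rough, non-authoritative illustration of what two of them compute, the Python sketch below (standard library only) calculates a basic descriptive summary and the Pearson correlation coefficient that a scatterplot like Figure 1c helps visualize. The function names and example data are hypothetical and are not part of STATISTICA.

```python
import math
import statistics


def describe(values):
    """Basic descriptive summary of one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length variables."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products of deviations
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example data (not from the paper)
height = [150, 160, 165, 170, 180]
weight = [50, 56, 61, 66, 74]
print(describe(height))
print(round(pearson_r(height, weight), 3))
```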
Tests of Differences:
Analysis of variance (ANOVA) is one of the main methods for testing differences between group means. To perform a
one-way ANOVA in STATISTICA, a user can simply follow the steps provided:
Step 1: Choose the Statistics Option from the Menu Bar
Step 2: Choose Statistics/Tables
Step 3: Choose Breakdown and One-Way ANOVA and click “OK”
Step 4: Go to Individual Tables tab and click “Variables”
Step 5: Select dependent and grouping variables and click “OK”
Step 6: Click on Analysis of Variance
Following these simple steps will provide the user with an ANOVA table. Choosing other options within
this menu can also provide plots. Figure 2 below shows an example of an interaction plot for the set of
variables.
Figure 2
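To show what the ANOVA table summarizes, here is a minimal Python sketch of the one-way ANOVA F statistic. It is not STATISTICA's implementation. The input format (one list of observations per group) and the function name are assumptions. A p-value would come from comparing F with an F distribution with (k - 1, N - k) degrees of freedom, for example using `scipy.stats.f.sf`; that step is left out so the sketch needs only the standard library.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: a list of lists, one list of observations per level of the
    grouping variable (hypothetical input format for this sketch).
    """
    all_obs = [x for g in groups for x in g]
    grand_mean = mean(all_obs)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_obs)

    # Between-groups sum of squares: group size times squared deviation of the group mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: deviations of observations from their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three groups of three observations each
f_stat, df1, df2 = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F({df1}, {df2}) = {f_stat:.2f}")  # prints "F(2, 6) = 27.00"
```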
The Kruskal-Wallis test is an example of a non-parametric test of differences. To perform a one-way
Kruskal-Wallis test in STATISTICA, the user may follow the steps below:
Step 1: Choose the Statistics Option from the Menu Bar
Step 2: Choose Nonparametrics
Step 3: Choose Comparing multiple indep. Samples (Groups)
Step 4: Click “Variables”
Step 5: Select dependent and grouping variables and click “OK”
Step 6: Click on Summary
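The Kruskal-Wallis test replaces the raw observations with their ranks in the pooled sample. The Python sketch below computes the H statistic with the usual correction for ties. It is an illustration under stated assumptions (hypothetical function names, one list per group), not STATISTICA's code. Under the null hypothesis, H is approximately chi-square distributed with k - 1 degrees of freedom.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for idx in order[i:j + 1]:
            ranks[idx] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Sum over groups of (rank sum)^2 / group size
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N); undefined if every value is tied
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - tie_term / (n_total ** 3 - n_total))


print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 3))  # prints 7.2
```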
Regression:
To perform ordinary least squares regression on a set of data, a user can follow the steps below:
Step 1: Choose the Statistics Option from the Menu Bar
Step 2: Choose Multiple Regression
Step 3: Choose dependent and independent variables and click “OK”
Step 4: Go to Quick tab and click on “Summary: Regression results”
This process will provide output that includes parameter estimates with standard errors and p-values,
along with an ANOVA table.
Figure 3 below shows a scatter plot with a trend line and confidence intervals, along with other
important regression information.
Figure 3
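STATISTICA's Multiple Regression module handles any number of predictors. To keep the arithmetic visible without matrix algebra, the Python sketch below restricts ordinary least squares to a single predictor. It reports the kind of quantities described above: parameter estimates, their standard errors, the slope's t statistic, R², and the regression F from the ANOVA table. The function name and data are hypothetical. The sketch assumes at least three points that do not lie exactly on a line, because a perfect fit gives a zero standard error.

```python
import math
from statistics import mean


def simple_ols(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x with standard errors."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx

    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # residual sum of squares
    sst = sum((b - my) ** 2 for b in y)                         # total sum of squares
    mse = sse / (n - 2)                                         # residual variance estimate

    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1 / n + mx ** 2 / sxx))
    return {
        "b0": b0, "b1": b1,
        "se_b0": se_b0, "se_b1": se_b1,
        "t_b1": b1 / se_b1,          # t statistic for H0: b1 = 0, with n - 2 df
        "r_squared": 1 - sse / sst,
        "f": (sst - sse) / mse,      # regression ANOVA F with (1, n - 2) df
    }


fit = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print({k: round(v, 3) for k, v in fit.items()})
```

With one predictor, the ANOVA F equals the square of the slope's t statistic. This gives a quick consistency check on output like STATISTICA's regression summary.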
Power Analysis:
Power analysis is important to consider when designing an experiment. Whether the researcher
needs to make sure an experiment has a large enough sample size, or needs a post-hoc analysis
to help determine why significant results are not being achieved, it is often
worthwhile to conduct a power analysis. STATISTICA lets a user compute sample sizes and power
estimates after quickly filling in the appropriate input table. Figure 4 below shows a power curve for various
sample sizes and power levels, based on chosen values of the expected difference, type I error rate, and other parameters.
Figure 4
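The relationship between sample size, power, the expected difference, and the type I error rate can be sketched in a few lines. The Python example below is not STATISTICA's procedure. It uses the normal (z) approximation for a two-sided, two-sample comparison of means with a known common standard deviation, a simplification that gives slightly smaller sample sizes than t-based methods. The function names and design values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z test for a true mean difference delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / (sigma * math.sqrt(2 / n_per_group))  # standardized true difference
    # Probability the test statistic lands in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def n_per_group(delta, sigma, power=0.80, alpha=0.05):
    """Per-group sample size from the usual closed-form normal approximation."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical design: detect a difference of 1 unit when sigma = 2 (effect size 0.5)
n = n_per_group(delta=1.0, sigma=2.0)
print(n, round(power_two_sample(1.0, 2.0, n), 3))  # prints 63 0.801
```

Evaluating `power_two_sample` over a range of `n_per_group` values traces out a power curve like the one in Figure 4.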
Quality Control:
The final type of analysis we will investigate in this primer is quality control. In quality
control we are often interested in how the average value and range of a specific measurement change
over time and across samples. Figure 5 below shows both an x-bar chart and an R chart for a specific variable.
These charts are accompanied by histograms showing the distribution of the sample averages and ranges.
Figure 5
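For readers who want to see where x-bar and R chart limits come from, the Python sketch below computes them with the classical Shewhart formulas: x-bar limits at the grand mean ± A2·R̄, and R limits at D3·R̄ and D4·R̄. The constants are the standard published table values for subgroup sizes 2 to 5. STATISTICA may offer other estimation options, so treat this as an illustration only. The function name and data are hypothetical.

```python
from statistics import mean

# Shewhart chart constants (A2, D3, D4) for subgroup sizes 2-5, from standard SQC tables
CHART_CONSTANTS = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
    5: (0.577, 0.0, 2.114),
}


def xbar_r_limits(subgroups):
    """Return (LCL, center line, UCL) for the x-bar chart and the R chart."""
    n = len(subgroups[0])
    if any(len(s) != n for s in subgroups):
        raise ValueError("all subgroups must have the same size")
    a2, d3, d4 = CHART_CONSTANTS[n]

    xbars = [mean(s) for s in subgroups]           # subgroup averages
    ranges = [max(s) - min(s) for s in subgroups]  # subgroup ranges
    grand_mean, r_bar = mean(xbars), mean(ranges)

    return {
        "xbar": (grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar),
        "R": (d3 * r_bar, r_bar, d4 * r_bar),
    }


# Hypothetical measurements: three subgroups of size 3
print(xbar_r_limits([[10, 12, 11], [11, 13, 12], [9, 11, 10]]))
```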
Another important aspect of quality control is the exploration of operating characteristic (OC) curves.
These are used when a researcher or quality control engineer wants to investigate how likely a sample is to fall
outside a set of calculated control limits after the process has shifted. An OC curve plots, against the size of the
shift, the probability that a sample still falls inside the limits (so that the shift goes undetected).
Figure 6 below shows an example of an OC curve for an x-bar chart for a set of data.
Figure 6
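Assuming normally distributed measurements and a known process standard deviation, the points on an x-bar chart's OC curve follow a simple textbook formula. For a mean shift of k standard deviations, subgroup size n, and L-sigma limits, the probability of a subgroup mean staying inside the limits is Φ(L - k√n) - Φ(-L - k√n). The Python sketch below tabulates it. It is an illustration, not STATISTICA's routine, and the function name and subgroup size are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def xbar_oc(shift_sigma, n, L=3.0):
    """Probability that one subgroup mean still falls inside L-sigma x-bar limits
    after the process mean shifts by shift_sigma process standard deviations."""
    offset = shift_sigma * math.sqrt(n)  # shift measured in standard errors of the mean
    return STD_NORMAL.cdf(L - offset) - STD_NORMAL.cdf(-L - offset)


# Tabulate an OC curve for hypothetical subgroups of size 5
for k in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    beta = xbar_oc(k, n=5)
    print(f"shift {k:.1f} sigma: P(inside) = {beta:.4f}, P(outside) = {1 - beta:.4f}")
```

With no shift, the probability of staying inside 3-sigma limits is about 0.9973. It falls quickly as the shift grows, which is the shape an OC curve like Figure 6 displays.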
Conclusion:
Beyond the types of analyses discussed in this paper, STATISTICA provides a range of
statistical analyses comparable in breadth to that of SPSS. The speed and ease with which STATISTICA produces
statistical results make this software a practical and helpful tool. The one downside of STATISTICA is
that, compared with its contemporaries SPSS and Minitab, its documentation is less readily available.