Dan Dillon
Homework Problem #7 – Part 1
STAT 582, Statistical Consulting and Collaboration
Dr. Jennings

Evaluation of NCSS Software

NCSS is published by NCSS, LLC (Kaysville, Utah), a company founded in 1981. The company appears to be relatively small, or at least to place a high value on customer service, because its website states, “When you call NCSS, chances are Dr. Jerry Hintze [Ph.D. statistician, NCSS President] will answer your questions”.1 The latest version of the software is NCSS 2007.

One might suppose that the software is targeted to applied statisticians in the life sciences, because the other two products the company offers are PASS 2008 (Power Analysis and Sample Size, “…the best power analysis and sample size tool on the market. No other sample size program covers as many statistical procedures.”2) and GESS 2006 (Gene Expression Statistical System for Microarrays). In addition, the software has specific functions for real estate appraisers, business analytics (e.g., forecasting), industrial quality control and pharmaceutical development (e.g., bioequivalence).

The software appears to target single users or small group purchases (as opposed to “enterprise” systems): the price list does not mention bulk discounts (although it does mention “custom quotes” and “multi-user discounts”) and instead focuses on “a price much lower than the competition’s.”3 The licenses are single-user, perpetual licenses with lifetime phone and email support.4

The system uses a Windows-style GUI (menus and buttons) and is compatible with Windows versions from 95 to Vista, on both 32- and 64-bit operating systems. It requires a Pentium-class processor, 32 MB of RAM, 200 MB of hard disk space and Adobe Reader® version 7.3

1 http://www.ncss.com/about_ncss.html. Accessed April 27, 2010.
2 http://www.ncss.com/index.htm. Accessed April 27, 2010.
3 http://www.ncss.com/ncss.html. Accessed April 27, 2010.
4 https://www.ncssorders.com/ncss_pricelist.asp?Pricing=Academic. Accessed April 27, 2010.

Major Statistical Topics in NCSS

The following are the major areas covered by NCSS:5

Analysis of Variance      Descriptive Statistics   Multivariate I         Repeated Measures
Appraisal Methods         Design of Experiments    Multivariate II        ROC Curves
Binary Diagnostic Tests   Forecasting              Proportions            Survival Analysis
Charts and Graphs         General Linear Models    Quality Control        Time Series Analysis
Cross Tabulation          Meta-Analysis            Regression Analysis    T-Tests
Curve Fitting             Mixed Models             Reliability Analysis

There are over 180 distinct statistical and graphics procedures. Some of the more important ones (from my standpoint) are:

- Numerous charts (including bar, error bar, and pie) and plots (including box, dot, histograms, percentile, scatter, and surface).
- Curve and distribution fitting functions (ratio of polynomials, exponential smoothing, extreme value fitting, beta, gamma, loglinear, lognormal).
- ANOVA tools (one-way, multiple and repeated measures, mixed models, general linear models).
- Various regression analyses (linear, logistic, multinomial logistic, Poisson, lognormal, “all-possible” regression search, non-linear, principal components, ridge, variable selection).
- Time-related analyses (survival (including Kaplan-Meier), life-table, longitudinal mixed models, time series).
- Various statistical tests (Chi-square, equivalence, Fisher's Exact, Mann-Whitney, Mantel-Haenszel, multiple comparison, normality, one-sample t-tests, paired t-tests, two-sample t-tests).
- Multivariate tools (including analysis of covariance, canonical correlation, clustering (hierarchical and K-means), discriminant analysis, factor analysis, MANOVA, multivariate and principal components analysis).
- Design tools (power calculations, balanced incomplete block, case-control, central composite designs, factorial, fractional factorial, Latin square, matched case-control, Plackett-Burman, response surface and screening designs).

5 http://www.ncss.com/ncss_procedures.html. Accessed April 27, 2010.

User Interface and Output

The opening screen mimics a spreadsheet. See Figure 1.

Figure 1. Opening Screen – Spreadsheet.

One can enter the data directly or import the data. File formats from the following programs are supported: Access, BMDP, DBase, Epi Info, Excel (including *.xlsx), Gauss, JMP, LimDep, Lotus, MatLab, Minitab, Paradox, Quattro, SAS, SigmaPlot, Solo Dos, SPlus, SPSS, Stata, Statistica, Symphony, Systat, *.txt (ASCII fixed and delimited), *.html, and ODBC.

Figure 2 is a close-up of the tool-bar. It appears intuitive for a moderately-trained statistician: separate buttons lead to different types of analysis (linear regression, multiple regression, 1-way ANOVA, GLM ANOVA, etc.).

Figure 2. Close-up of tool-bar.

I tested three different basic routines (graphing a scatterplot, linear regression and ANOVA) to get a feel for the software. The graphics interface was easy to use and produced crisp graphics. See Figure 3. I especially liked the fact that one could directly edit and reformat the text on the upper part of the page and rescale the main graph as one saw fit.

Figure 3. Graphing Output.

Simple linear regression was easily accomplished. Data were imported from a comma-delimited file. Figure 4 shows the screen that appears after the Linear Regression button is pressed.

Figure 4. Tabbed Dialog Box for Linear Regression.

Fourteen different tabs are available, covering features such as which variables are to be analyzed, formatting of output, and types of graphs to be included. Bootstrap resampling is also included.
The right-hand side of the screen includes context-sensitive help: it varies not only with the tab chosen but also with the box being selected (e.g., the Dependent Variable box). The Guide Me button will automatically move the cursor through most of the major options for all of the tabs and finally prompt the user to press Run. (The graphing routine described above had a similar tabbed dialog box, but with tabs and features unique to it.)

Once the user has chosen all the desired options, pressing the Run button executes the procedure. The test run was done without much customization of the output. Ten pages of output were immediately produced, including several graphs and a three-paragraph summary. Figure 5 is one page of the output.

Page/Date/Time   5  4/29/2010 10:15:37 PM
Database         C:\DOCUMENTS AND SETTINGS\OWNER\DESKTOP\TEMP2.S0
Y = C1  X = C2

Analysis of Variance Section

Source        DF   Sum of Squares   Mean Square   F-Ratio   Prob Level   Power (5%)
Intercept      1   8405.813         8405.813
Slope          1   793.2805         793.2805      40.3478   0.0000       1.0000
Error         25   491.5262         19.66105
Lack of Fit   13   332.8328         25.60253      1.9360    0.1311
Pure Error    12   158.6933         13.22444
Adj. Total    26   1284.807         49.41564
Total         27   9690.62

s = Square Root(19.66105) = 4.434078

Notes: The above report shows the F-Ratio for testing whether the slope is zero, the degrees of freedom, and the mean square error. The mean square error, which estimates the variance of the residuals, is used extensively in the calculation of hypothesis tests and confidence intervals.

Figure 5. Linear Regression Output.

The ANOVA procedure worked in much the same way but, not surprisingly, offered fewer options: there were 5 tabs, entitled Variables, Reports, Box Plot, Means Plot and Template. Figure 6 contains excerpts from the output.
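As a quick sanity check on the linear regression output shown in Figure 5 (Linear Regression Output), the F-ratios and s can be recomputed from the printed sums of squares and degrees of freedom. The Python sketch below is not part of NCSS and is not how NCSS computes its report; the variable names are illustrative, and it assumes only the usual textbook definitions (mean square = sum of squares / DF, slope F = MS slope / MS error, lack-of-fit F = MS lack of fit / MS pure error).

```python
import math

# Sums of squares and degrees of freedom transcribed from the
# Analysis of Variance Section of the regression output (Figure 5).
# Dictionary keys are illustrative names, not NCSS identifiers.
sum_sq = {
    "slope": 793.2805,
    "error": 491.5262,
    "lack_of_fit": 332.8328,
    "pure_error": 158.6933,
}
deg_free = {"slope": 1, "error": 25, "lack_of_fit": 13, "pure_error": 12}

# Mean square = sum of squares / degrees of freedom.
mean_sq = {term: sum_sq[term] / deg_free[term] for term in sum_sq}

# F-ratio for testing whether the slope is zero.
f_slope = mean_sq["slope"] / mean_sq["error"]

# F-ratio for the lack-of-fit test (lack of fit vs. pure error).
f_lack_of_fit = mean_sq["lack_of_fit"] / mean_sq["pure_error"]

# s is the square root of the mean square error.
s = math.sqrt(mean_sq["error"])

# The error line should split into lack of fit plus pure error
# (up to rounding in the printed report).
ss_error_parts = sum_sq["lack_of_fit"] + sum_sq["pure_error"]

print(f"F (slope)       = {f_slope:.4f}")
print(f"F (lack of fit) = {f_lack_of_fit:.4f}")
print(f"s               = {s:.6f}")
print(f"SS error check  = {ss_error_parts:.4f} vs {sum_sq['error']:.4f}")
```

The recomputed values can be compared against the F-Ratio column and the s line of the report; small differences in the last digit are expected because the report prints rounded sums of squares.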
Analysis of Variance Report

Page/Date/Time   1  4/30/2010 9:29:25 PM
Database         C:\DOCUMENTS AND SETTINGS\OWNER\DESKTOP\TEMP3.S0
Response         C1

[Ed.: output deleted…]

Box Plot Section

[Box plot of C1 (vertical axis, 20.00 to 45.00) by C2 (levels 1, 2, 3).]

[Ed.: output deleted…]

Analysis of Variance Table

Source                    Sum of     Mean                    Prob        Power
Term               DF     Squares    Square      F-Ratio     Level       (Alpha=0.05)
A: C2               2     672        336         16.96       0.000041*   0.998982
S(A)               21     416        19.80952
Total (Adjusted)   23     1088
Total              24
* Term significant at alpha = 0.05

[Ed.: output deleted…]

Plots of Means Section

[Plot of the means of C1 (vertical axis, 20.00 to 40.00) by C2 (levels 1, 2, 3).]

Figure 6. ANOVA Output.

Conclusion

I found the software very easy to manipulate and very intuitive, with excellent graphics. It was my first experience with a Windows-style statistical package. I’m very familiar with SAS and its coding requirements and somewhat familiar with R and its interactive features. I could easily unlearn both and learn this software for most of my needs.