JMP Statistical Software Package

JMP was developed by SAS Institute Inc. as a user-friendly, interactive statistical package. It allows users with minimal statistics knowledge and training to explore their data effectively. JMP focuses on visualizing data through graphical representation. It uses a point-and-click interface with pull-down menus and is quite easy to work with. Some of the menu bar options can be seen below in Figure 1.

Figure 1. Menu picture from the “Introduction of Statistics” website.

This report gives an overview of some of the key features of the JMP package. The methods may differ depending on which version you are using, and the report discusses only some of the basic statistical techniques. It should be used as a briefing on some of the many capabilities of JMP, not as a sole source. We will focus on four basic areas of statistical analysis: Data Set-up, Data Summary, Graphics, and Basic Regression. A more detailed explanation of these features can be found in the resources listed in the References section.

Creating, Importing, Saving, and Manipulating Data

Before you can conduct an analysis you must first provide JMP with a dataset using the data table window. This will be the main window used throughout your analysis. There are several ways to provide data: you can import it (e.g., from a text file, JMP file, or Excel file) or create your own from an empty table. Once the data is entered, you must classify each variable into one of three categories: continuous, nominal, or ordinal. This can be done by clicking the column header of each variable and selecting Column Info, where you can assign the category. Classify variables as continuous (right-aligned in the data window) if they are to be analyzed as numeric measurements, and as ordinal or nominal (left-aligned in the data window) if they are categorical measures.
Specify a variable as ordinal if it has ordered categories, whether the values are numeric or character, and as nominal if its numeric or character values are unordered classifications. After classifying your variables as numeric or categorical, you can also specify whether they are predictor or response variables. This does not have to be done at the beginning; you can always do it when you are building your regression model. This is also the point at which you can name your variables by clicking and typing in the column header of each one.

The data table window also allows you to transform variables (that is, create new ones) using the JMP formula editor. With this feature you can create new columns such as logs and quadratics of existing variables. From the data table window you can also combine, sort, edit, add, rearrange, and delete values, columns, and rows, as well as carry out various other manipulations of the data. In addition, you can create a subset of your data to analyze or to save for later. If your data is not formatted correctly, you can use the options in the data table window to reformat it. Missing values in the data are usually denoted by a dot. Specific data points or variables can also be left out of, or returned to, your analysis and graphics with the Exclude/Unexclude and Hide/Unhide commands, so you do not have to keep deleting and re-adding them.

Once you have the data in the required format, you can save it by clicking Save in the File menu. During your analysis JMP will produce several outputs and reports. You can save these, and append additional reports to the current one, using the Journal window.
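As a rough illustration of what a formula column does, the sketch below builds log and squared columns from an existing column in plain Python. It is not JMP's formula editor or its scripting language; the table layout, the column names ("dose", "log10_dose", "dose_sq"), and the "excluded" flag are all made up for the example. Missing values are carried through as missing, and excluded rows stay in the table but are left out of the analysis set.

```python
import math

# Hypothetical data table: one dict per row. None marks a missing value
# (shown as a dot for numeric columns in the data table described above).
rows = [
    {"dose": 1.0, "excluded": False},
    {"dose": 10.0, "excluded": False},
    {"dose": None, "excluded": False},
    {"dose": 100.0, "excluded": True},
]


def add_formula_column(table, source, target, formula):
    """Add a derived column; a missing input gives a missing output."""
    for row in table:
        value = row[source]
        row[target] = None if value is None else formula(value)


# Two transformed columns, analogous to a log and a quadratic term.
add_formula_column(rows, "dose", "log10_dose", math.log10)
add_formula_column(rows, "dose", "dose_sq", lambda x: x ** 2)

# Excluded rows remain in the table but are skipped during analysis.
analysis_rows = [row for row in rows if not row["excluded"]]

for row in rows:
    print(row)
```

Keeping the exclusion as a flag, rather than deleting the row, mirrors the point made above: the row can be brought back later without re-entering it.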
Summarizing Data with Continuous and Categorical Variable Analysis

Once your data is formatted, you can get a quick understanding of it using numeric and graphical summaries. JMP allows you to create graphical displays such as charts, boxplots, histograms, and scatterplots. For a categorical variable, JMP will create bar charts and frequency tables. For a quantitative variable it will generate histograms, stem and leaf plots, and information about the measures of central tendency, quantiles, and standard deviation.

To create a bar chart for a categorical variable, you must first order your values so that they appear properly in the chart. Next, use the Chart command in the Graph menu. In the dialog box you must specify which variables you wish to graph (selecting from the variables you classified as continuous) and how to group them (selecting from the variables you classified as categorical). After selecting what you would like plotted, you can also label the axes, specify an axis scale, provide a plot title, and add footnotes to the plot. You are not limited to a bar chart; you can select any of the other chart types listed as well.

To get an idea of the general distribution of the data, select the Distribution command and then identify which variable to group by and which variable serves as the counts. This will display a screen that contains a normal bar chart, a stacked bar chart, and a frequency table. An example is shown below in Figure 2.

Figure 2. JMP output from the Distribution command for a categorical variable. Image from the “Introduction of Statistics” website.

Figure 3. JMP output from the Distribution command for a continuous variable. Image from the “Introduction of Statistics” website.

If you do not like the orientation of the bar chart, you can switch between horizontal and vertical layouts by clicking the red triangle.
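The frequency table reported for a categorical variable amounts to a count and a proportion for each level. A minimal sketch in plain Python follows; the data values and the function name frequency_table are hypothetical, and the layout only approximates what a JMP report looks like.

```python
from collections import Counter

# Hypothetical nominal column.
colors = ["red", "blue", "red", "green", "blue", "red"]


def frequency_table(values):
    """Return (level, count, proportion) rows, sorted by level name."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(level, n, n / total) for level, n in sorted(counts.items())]


table = frequency_table(colors)

print(f"{'Level':<8}{'Count':>6}{'Prob':>8}")
for level, n, prob in table:
    print(f"{level:<8}{n:>6}{prob:>8.3f}")
```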
When looking at a quantitative variable, the values must be grouped into intervals to show the overall distribution. The Distribution command does this when it draws the histogram mentioned previously. In the dialog box you only need to specify a Y variable. Instead of frequencies, the final report shows quantiles and moments. The graphics include both a histogram and a boxplot, as shown in Figure 3 above. Again, if you wish to view the histogram and boxplot vertically, this can be done as previously mentioned. You can also adjust the number of bars in the histogram through the Tools menu. As mentioned, stem and leaf plots can be made by clicking the red triangle and selecting that option; the plot is attached to the original output. Other plots, such as time and line plots, can be made using the options in the Chart command.

To get more detailed numerical summaries, click the red triangle and select the More Moments command in the Display options. This command provides additional statistics such as the variance and skewness. If you do not like the boxplot displayed above the histogram, you can generate a normal boxplot by selecting the Quantile Box Plot command in the Display options.

After looking at the summary statistics, we can turn to other types of analysis for both continuous and categorical data. When analyzing our variables we can also perform a one- or two-sample t-test. The one-sample test is available by clicking the red triangle in the output from the Distribution command and selecting the Test Mean option. A dialog box will ask you to specify the hypothesized mean under the null hypothesis, and JMP will then generate the results of the t-test. From the same Distribution output you can also create a Q-Q plot by selecting the Normal Quantile Plot option.
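The one-sample test of the mean described above rests on a short calculation: the difference between the sample mean and the hypothesized mean, divided by the standard error. The sketch below shows that calculation in plain Python under the usual textbook definitions. The data and the function name one_sample_t are hypothetical, and the p-value is left out because it needs the t distribution, which the Python standard library does not provide.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Moments and t statistic for H0: population mean equals mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_dev = statistics.stdev(sample)   # sample SD, n - 1 denominator
    std_err = std_dev / math.sqrt(n)     # standard error of the mean
    t_stat = (mean - mu0) / std_err
    return {
        "n": n,
        "mean": mean,
        "std_dev": std_dev,
        "std_err": std_err,
        "t": t_stat,
        "df": n - 1,
    }


# Hypothetical measurements tested against a hypothesized mean of 5.0.
weights = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2]
result = one_sample_t(weights, mu0=5.0)

for name, value in result.items():
    print(f"{name:>8}: {value}")
```

The t statistic and its degrees of freedom are what you would compare against a t table, or hand to statistical software, to obtain the p-value that a report lists.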
In conjunction with running tests on the distribution, you can also determine the sample size and power needed for your analysis. This is done with the Sample Size and Power command in the DOE menu. After you select the case that applies, a dialog box will prompt you to enter basic information such as the significance level, the population standard deviation, the population mean, and then either the power or the sample size (JMP computes whichever of the two you do not enter). From this information it can generate a Power vs. Sample Size graph.

Polynomial Regression and Diagnostics

Simple Case: One X and One Y

After an initial look at the data, we are usually interested in finding a model that fits it. JMP offers several ways to do this. The first step in finding an appropriate relationship is to launch the Fit Y by X command in the Analyze menu. You will be prompted to specify which variables to plot as X and Y. JMP will then create a scatterplot of these variables, provided that X and Y are both continuous numeric variables. To fit a linear regression model to your data, click the red triangle and select the Fit Line command. To fit a polynomial, select the Fit Polynomial command and specify the degree you want in the dialog box. For a transformation, select the Fit Special command and choose a transformation for X and/or Y from those listed. If the transformation or polynomial fit that you want is not listed, you can create it in the data table using the formula editor, as previously mentioned, and select the new variable directly.

For each fit you select, JMP will not only add it to the scatterplot but also create three tables with summary information about the fit. The first table is the Summary of Fit table. This table provides information summarizing the fit.
It includes the regression equation, RSquare, the Root Mean Square Error, and the mean of the response. The second table, the Analysis of Variance table, provides information about how well the model fits the data points. It contains the standard ANOVA quantities: the degrees of freedom, the sums of squares, the mean squares, the F-test, and its p-value. The last table is the Parameter Estimates table. It gives the estimates, their standard errors, and their p-values.

If there are points of interest on the graph, you can mark, exclude, or include them directly from the plot by clicking on them. When several fits are plotted at once, you can remove a particular fit by clicking the red triangle to the left of it and selecting that option. You may also want to add a smoothing spline to the plot. You can do this by selecting the Fit Spline command and choosing a lambda for the spline. The spline is added to the graph automatically.

Multivariate Regression

When dealing with multiple X variables, the analysis is similar to the one described above. However, instead of the Fit Y by X command we use the Fit Model command in the Analyze menu. Here we can select a Y variable and multiple X variables. If you are interested in testing interactions, select the variables you want to interact and then choose the Factorial to Degree command from the Macros command in the Fit Model dialog box. If you only want the interaction of two variables, select them and click the Cross button while still in the Fit Model dialog box.

A multivariate analysis gives results similar to those before, along with some new options. The Correlation Matrix appears by default. To see the Covariance matrix, click the red triangle in the resulting output and request it.
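For readers who want to see what the correlation and covariance matrices contain, the sketch below computes both for a few columns in plain Python, using the sample covariance (n - 1 denominator) and the Pearson correlation. The column names and values are hypothetical, and this is an illustration of the definitions rather than of how JMP computes them.

```python
import math


def covariance(x, y):
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def pairwise_matrix(columns, pair_fn):
    """Apply pair_fn to every pair of columns; returns a nested dict."""
    return {
        a: {b: pair_fn(columns[a], columns[b]) for b in columns}
        for a in columns
    }


# Hypothetical continuous columns.
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [2.0, 1.0, 4.0, 3.0],
    "y": [4.0, 3.0, 2.0, 1.0],
}

cov = pairwise_matrix(columns, covariance)
corr = pairwise_matrix(columns, correlation)

names = list(columns)
print("Correlations")
for a in names:
    print(f"{a:<4}" + "".join(f"{corr[a][b]:>8.3f}" for b in names))
```

The diagonal of the correlation matrix is always 1, and the diagonal of the covariance matrix holds each column's variance.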
To get a scatterplot matrix, you can also request it from the red triangle in the output. To include a nested or random effect in your model, specify it in the Fit Model dialog box: use the Nest command for a nested effect and the Attributes command for a random effect.

In both simple linear and multivariate regression, JMP provides various diagnostics if you request them by right-clicking the Parameter Estimates box. These include residuals, predicted values, confidence intervals, Cook’s distance, leverage values, and variance inflation factors. It also provides plots for many of these values. JMP also allows you to use chi-squared testing to check your model.

To investigate a nonlinear relationship in an experimental design, you can use the Nonlinear Design command in the DOE menu. This function seeks the optimal design for a model that is nonlinear in its parameters. In general, when working with an experimental design, the DOE menu is worth exploring in depth; it offers various features to assist in an adequate analysis.

Graphical Representation

JMP is especially known for its graphical interface. It produces user-friendly, easy-to-read graphics and numeric results in a format suitable for publication. Figures 3 and 4 below show some examples of numerical and graphical results from JMP output. Overall we find that JMP provides sufficient graphics for data interpretation.

Other Notes about JMP

Another highlight of JMP is that it allows you to run stepwise regression. You can do this in the Fit Model dialog box by changing the Personality option to “Stepwise.” You can then enter all of the X variables to be considered, including desired interactions and functional forms.
Once the Stepwise dialog box appears, change the Direction box to “Mixed” and enter the probabilities for variable entry and removal in the corresponding boxes. After JMP selects an appropriate model, you can view it by selecting the Make Model command.

You can also perform logistic regression in JMP. For this you first need to select a binary Y variable, using the Fit Model command if you have multiple X variables or the Fit Y by X command if you have a single X variable. From the output of the command, click the red triangle to select any options required for your regression.

Using the Fit Model command you can also perform various other analyses, such as factorial and response surface analysis. With factorial analysis you can generate leverage plots to investigate interactions and main effects. Principal component analysis (PCA) can be done using the Multivariate Methods command in the Analyze menu. In the PCA dialog box you can select each of the variables you wish to analyze.

Overall, JMP appears to be user-friendly, easy to use, and adequate for statistical analysis.

Figure 3. Images from the “JMP Introductory Guide” of JMP numerical analysis output.

Figure 4. Images from the “JMP Introductory Guide” of JMP graphical visuals.

References

1. Ramirez, Jose G., and Brenda S. Ramírez. "Analyzing and Interpreting ... - Google Books." Google Books. N.p., n.d. Web. 30 Apr. 2010. <http://books.google.com/books?id=H8YiTo8gNU8C&pg=PA326&lpg=PA326&dq=how+to+calculate+sample+size+in+jmp&source=bl&ots=eJiiyNPCoJ&sig=iKK4XJzm5dCCQMJM83bPx0TCmI0&hl=en&ei=pr_dS8OsMoze9ASEt4WiBw&sa=X&oi=book_result&ct=result&resnum=4&ved=0CB0Q6AEwAw#v=onepage&q=how%20to%20calculate%20sample%20size%20in%20jmp&f=false>.

2. Moore, David, and George McCabe. "Introduction To The Practice Of Statistics, Fourth Edition." default. N.p., n.d. Web. 30 Apr. 2010. <http://bcs.whfreeman.com/iphttp://www.bibme.org/s4e/default.asp?s=&n=&i=&v=&o=&ns=0&t=&uid=0&rau=0>.

3. Mauromoustakos, Andy. "Getting Familiar with JMP." University of Arkansas. N.p., n.d. Web. 30 Apr. 2010. <www.uark.edu/misc/.../Getting%20Familiar%20with%20JMP%208.ppt>.

4. "Introductory Guide Second Edition to JMP 8.0." JMP Software - Data Analysis - Statistics - Six Sigma - DOE. N.p., n.d. Web. 30 Apr. 2010. <http://www.jmp.com/support/downloads/pdf/jmp8/jmp_introductory_guide.pdf>.

5. Fricker, Ronald. "JMP Notes." Naval Postgraduate School. N.p., n.d. Web. 30 Apr. 2010. <faculty.nps.edu/rdfricke/Business_Stats/JMP%20Handout.pdf>.