Minitab Reference Manual: A Gentle Overview

Table of Contents
1. Introduction to Minitab
   Importing Data
2. Data Analysis and Statistical Concepts
   Concept 1 – Measurements of Central Tendency
   Concept 2 – Measurements of Dispersion
   Concept 3 – Visualization of Univariate Data
   Concept 4 – Visualization of Multivariate Data
   Concept 5 – Random Number Generation and Simple Sampling
   Concept 6 – Confidence Intervals

Developed and maintained by the Center for Statistics and Analytical Services of Kennesaw State University

These reference manuals have been developed to assist students in the basics of statistical computing, sort of a “Statistical Computing for Dummies”. It is not our intention to use this manual to teach statistical concepts, but rather to demonstrate how to apply previously taught statistical and data analysis concepts the way professionals and practitioners apply them: with the able assistance of computing. Proficiency in software allows students to focus more on the interpretation of the output and on the application of results than on the mathematical computations.

We should pause here and strongly make the point that computers should serve to speed up calculation, not as a substitute for the ability to carry out a calculation.

In the Basic Concepts manual, we present statistical concepts, context for their use, and formulas where appropriate. We provide exercises to execute these concepts by hand. Then, in each subsequent manual, the concepts are applied in a consistent manner using each of the five major statistical computing packages: Excel, SPSS, Minitab, R and SAS. Readers of this manual are assumed to have completed an introductory statistics course. For individuals wishing to review statistical concepts, we recommend Intro Stats by De Veaux, Velleman and Bock.
Minitab

What is Minitab?

Minitab was first developed in 1972 at Penn State University (Go Nittany Lions!). It was initially developed as a teaching tool to help professors teach basic statistics, and it is still used for that purpose at more than 4,000 colleges and universities around the world. One reason for its popularity in this venue is its user-friendly, menu-driven interface, much like that of SPSS. Because it offers accurate and customizable analysis tools for quality control and other important business and industry functions, it is now also widely used by companies of all sizes. It is currently the package of choice at many large manufacturing companies, including Ford Motor Company, 3M, Honeywell International, and Samsung.

Getting data into Minitab

Before executing any of the statistical concepts from the Basic Concepts Manual, we first need to get the WidgeOne.xls dataset into Minitab and convert it into a Minitab file. When you open Minitab you should see the following screen:

[Screenshot: Minitab start-up screen, with the Session Window and the Data Window labeled]

As shown above, the display consists of two windows. The Session Window is at the top; it is where commands and results are displayed. The Data Window is at the bottom; it is the worksheet where the data are displayed in spreadsheet format.

You now need to open the WidgeOne dataset in Minitab. In the File menu, choose Open Worksheet as shown below:

You will next see a typical window for opening a file. Remember that the WidgeOne.xls dataset is an Excel file.
By default, the window looks for Minitab files only, but you have the option of looking for files of any type. The window below has been set to look for Excel files, and the WidgeOne dataset has been selected:

Click Open.

You should then see the following display:

The three worksheets in the Excel file WidgeOne.xls have each been converted to a separate Minitab worksheet, named Attendance, Employees and Plant_Survey. The statistical analysis in this guide uses only the Plant_Survey worksheet, so you can close the others now. Make sure to go to the File menu and save the Plant_Survey worksheet for future use.

Concept 1: Using Minitab for Measurements of Central Tendency

Minitab is a menu-based system, so it is only a matter of finding what you want to do on the menus and customizing your request. For most computations, you should find Minitab (like SPSS) easier than Excel.

To find the two most common measures of central tendency (the mean and median), we start in the Stat menu. Within that menu, choose Basic Statistics and Display Descriptive Statistics as shown:

Next you will see the following:

We need to choose the variables for which we want the mean and median. We will choose only the quantitative variables: JOBGRADE, PRDCTY, SOCREL, YRONJOB and JOBSAT. We make these selections by clicking on the variables while holding down the Control key and then clicking the Select button. This button will appear darker once a variable has been highlighted.
This interface is common to almost every function in Minitab. Note that the Select button will not activate until at least one variable has been highlighted for selection:

After you have selected the variables for analysis, your screen should look like this:

Now click the Statistics button. This will show you the statistics selected for display. There are many more statistics on this list than you have been exposed to in this course.

The following dialog box will appear:

We can select several different descriptive statistics. Statistics are selected by clicking in the box next to each until a check mark appears. In this case, we have selected only the mean and median. Once your selections are made, click OK, and then click OK again in the Display Descriptive Statistics window.

We obtain the following display, containing the means and medians of our five variables, in our Minitab Session Window:

Results for: Plant_Survey

Descriptive Statistics: JOBGRADE, PRDCTY, SOCREL, YRONJOB, JOBSAT

Variable   N   Mean  Median
JOBGRADE  40  6.600   6.500
PRDCTY    40  84.58   84.81
SOCREL    40  5.500   5.000
YRONJOB   40  8.290   8.350
JOBSAT    40  6.850   6.600

Notice that these figures are consistent with what we generated using Excel and SPSS and what we computed by hand. Again, it is nice when numbers match!
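For readers who want to check the arithmetic outside Minitab, here is a minimal Python sketch of the same two statistics using the standard-library statistics module. The data values are hypothetical stand-ins, not the WidgeOne data, so the results will not match the Minitab output above.

```python
import statistics

# Hypothetical years-on-job values, for illustration only (NOT the WidgeOne data).
yronjob = [2.5, 4.0, 7.5, 8.0, 8.5, 9.0, 12.5, 15.0]

mean_value = statistics.mean(yronjob)      # sum of the values divided by n
median_value = statistics.median(yronjob)  # n is even, so this averages the two middle values

print(f"N = {len(yronjob)}, Mean = {mean_value:.3f}, Median = {median_value:.3f}")
# prints: N = 8, Mean = 8.375, Median = 8.250
```

Replacing the largest value (15.0) with 150.0 would pull the mean up to 25.25 while leaving the median at 8.25, which is one reason both measures are worth reporting.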
Before we go on to look at subsets of the data, let’s recode the values of the qualitative variables we will be using with meaningful labels. The variables Plant and Gender are coded with single letters (N = Norcross, D = Dallas, etc.). We wish to recode these variables so our displays and graphs will have more descriptive names. These are the steps we use to accomplish this.

In the Data menu, select Code and then Text-to-Text as shown below:

We choose Text-to-Text because we wish to change text values (like D) to other text values (like Dallas).

We see a box like the one below. In this example, we have chosen the variable Plant both as the column to code the data from and as the column to code the data into. This means we will recode the data within the same column rather than writing the recoded values to another column. Fill in the rest of the box as below:

Click OK. The Minitab Data Window should now look like this:

Perform the same type of recode for the Gender variable (M = Male, F = Female). Your data will then appear as follows:

Now we are ready to look at subsets of the data determined by these qualitative variables. For example, what if we wanted to know the measurements of central tendency of these variables by gender and by plant? We proceed exactly as before: Stat>Basic Statistics>Display Descriptive Statistics. We again choose the same five variables.
This time we will click inside the box called By variables. Once we click inside this box, the list of variable choices grows to include the qualitative variables from our dataset. Minitab recognizes that means and medians cannot be calculated for qualitative variables, so it did not include them in the earlier variable list; when we wish to subset the data, however, these variables become available. Please note that you should only place qualitative variables in the By variables box.

For the first analysis, choose Plant and then click the Select button. You should see the following display:

Click the Statistics button to choose Mean and Median again. Also check the N Total box this time, so we get the frequency of each subset. Click OK and OK.

The following display will appear in your Session Window:

Descriptive Statistics: JOBGRADE, PRDCTY, SOCREL, YRONJOB, JOBSAT

Variable  Plant      N   Mean  Median
JOBGRADE  Dallas    23  6.870   7.000
          Norcross  17  6.235   6.000
PRDCTY    Dallas    23  88.34   90.42
          Norcross  17  79.49   79.86
SOCREL    Dallas    23  5.522   5.000
          Norcross  17  5.471   5.000
YRONJOB   Dallas    23  8.104   8.000
          Norcross  17  8.541   9.000
JOBSAT    Dallas    23  7.148   7.000
          Norcross  17  6.447   6.300

Follow the same series of steps, only this time select Gender for the By variables box.
Your output should look like this:

Descriptive Statistics: JOBGRADE, PRDCTY, SOCREL, YRONJOB, JOBSAT

Variable  Gender   N   Mean  Median
JOBGRADE  Female  20  6.300   6.000
          Male    20  6.900   7.000
PRDCTY    Female  20  83.97   84.90
          Male    20  85.19   84.81
SOCREL    Female  20  6.000   6.000
          Male    20  5.000   5.000
YRONJOB   Female  20   8.18    8.50
          Male    20  8.395   8.350
JOBSAT    Female  20  6.980   6.700
          Male    20  6.720   6.400

As a rule, we do not use the mode as a Measurement of Central Tendency for quantitative data. If the data are qualitative (Plant, Gender, Position), the mode is the ONLY Measurement of Central Tendency available. We can determine the mode of variables such as these by choosing the Stat menu and, within that menu, selecting Tables and Tally Individual Variables as shown here:

You will see the window below. Select the variables Plant, Gender and Position as shown:

Click OK. This output appears in the Session Window:

Tally for Discrete Variables: Plant, Gender, POSITION

Plant  Count  Percent
D         23    57.50
N         17    42.50
N=        40

Gender  Count  Percent
F          20    50.00
M          20    50.00
N=         40

POSITION  Count  Percent
HRLY         20    50.00
MGT          20    50.00
N=           40

It is easy to see that Dallas (D) is the modal value for the Plant variable. It is also easy to see that neither of the other two variables has a mode in this example, because each has two categories with equal counts.

Concept 2: Using Minitab for Measurements of Dispersion

To represent the dispersion of a quantitative variable (Measurements of Dispersion are not relevant for qualitative variables), we typically report the standard deviation. To do this in Minitab, we return to the Stat menu.
Again we choose Basic Statistics and Display Descriptive Statistics. Select the five quantitative variables as before. Do not select any variables in the By variables box. Click Statistics and select Standard deviation:

Click OK and then OK.

Here is the output you should see in the Session Window:

Descriptive Statistics: JOBGRADE, PRDCTY, SOCREL, YRONJOB, JOBSAT

Variable   N  StDev
JOBGRADE  40  1.549
PRDCTY    40   7.26
SOCREL    40  1.468
YRONJOB   40  4.257
JOBSAT    40  1.021

We could, of course, have included many more statistics in our analysis simply by choosing them on the Statistics screen.

The second Measurement of Dispersion discussed in The Basic Concepts Manual was the frequency table. In that manual and the Excel manual, when we created a frequency table for the job tenure variable, we created three categories: less than 5 years, 5 to 10 years and more than 10 years. To create these same categories in Minitab, we need to recode our YRONJOB variable into a new variable called JOBTEN.

Go to the Data menu and choose Code>Numeric to Text:

We have selected Numeric to Text because we are changing the numerical variable YRONJOB into a qualitative variable that we will call JOBTEN.

You will see a screen like the one below:

Select the YRONJOB variable, as it is the one to be recoded. Type the name of the new variable, JOBTEN, in the Into Columns box (it is a new name, so it is not a choice to be selected from the list on the left). Then fill in the old and new values as we have them above. Note that values of YRONJOB between 0 and 4.9 will be coded as New, values between 5 and 10 as Experienced, and values over 10 as Mature. Click OK.
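As a cross-check outside Minitab, the Python sketch below computes a sample standard deviation (dividing by n − 1, as Minitab's StDev does) and applies the same three-way recode of years on the job. The data values and the helper name job_tenure are hypothetical. One assumption differs slightly from the dialog: the sketch tests "less than 5" rather than "0 to 4.9" for New, so a value such as 4.95 cannot fall between the categories.

```python
import statistics

# Hypothetical years-on-job values, for illustration only (NOT the WidgeOne data).
yronjob = [0.5, 3.2, 4.9, 5.0, 7.4, 10.0, 10.3, 17.1]

# Sample standard deviation: square root of sum((x - mean)**2) / (n - 1).
sd = statistics.stdev(yronjob)


def job_tenure(years: float) -> str:
    """Hypothetical helper: recode years on the job into a JOBTEN category.

    Cut points follow the manual (under 5, 5 to 10, over 10), written as
    comparisons so that every non-negative value lands in exactly one group.
    """
    if years < 5:
        return "New"
    if years <= 10:
        return "Experienced"
    return "Mature"


jobten = [job_tenure(y) for y in yronjob]

print(f"StDev = {sd:.3f}")
for y, label in zip(yronjob, jobten):
    print(f"{y:>5.1f}  {label}")
```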
Your Data Window should now have the new text variable JOBTEN in it:

Now we can easily generate a frequency table for the new variable JOBTEN. Once more go to the Stat menu and select Tables>Tally Individual Variables. Select the variable JOBTEN and click OK. You should see output like this in your Session Window:

Tally for Discrete Variables: JOBTEN

JOBTEN       Count  Percent
Experienced     16    40.00
Mature          15    37.50
New              9    22.50
N=              40

Well done!

Concept 3: Using Minitab for Visualization of Univariate Data

As stated previously, for professional presentations or formal documents, we recommend the use of a graphics package (e.g., Microsoft PowerPoint). However, Minitab (like SPSS) has some nice graphs available in the Graph menu, which can be used less formally.

To replicate the pie chart developed in The Basic Concepts Manual, go to the Graph menu and select Pie Chart. Our first choice is whether we are charting counts of unique values (raw data) or values from a table. Our data are unique values (meaning they come straight from the dataset), so we check this box. We must then click inside the Categorical variables box. Once this is done, the box on the far left will fill with variable choices from our dataset (yes, we know…this is one of the more annoying aspects of Minitab). Select JOBTEN as your variable for this graph.

Here is the screen just before we click OK to draw our pie chart for JOBTEN:

Select the Labels button. Under the Titles tab, give your pie chart a meaningful title.
Then select the Slice Labels tab:

Specify that you want the slices to be labeled with the Category name and the Percent (remember that the reason we create pie charts is to represent proportions visually). Select OK and OK.

You should see the following chart:

[Pie chart: Job Tenure of Employees at the Dallas & Norcross Plants. Legend order: Experienced, Mature, New. Slices: Experienced 40.0%, Mature 37.5%, New 22.5%]

Nice…except you will notice that the tenure categories are in alphabetical order, not their natural order. We would prefer to see the order New, Experienced, Mature.

To reorder the values, go back to the Plant_Survey worksheet. Click on any value in the JOBTEN column, then right-click and select Column>Value Order:

You should now see this screen:

Reorder these values manually into your preferred order, then click OK.

Now your graph looks like this:

[Pie chart: Job Tenure of Employees at the Dallas & Norcross Plants. Legend order: New, Experienced, Mature. Slices: New 22.5%, Experienced 40.0%, Mature 37.5%]

Much better! You will find that, of all of the packages, Minitab has some of the strongest graphics.
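The tally-and-percent arithmetic behind this pie chart, and the effect of choosing a value order, can be sketched in a few lines of Python. The counts come from the JOBTEN and Gender tally outputs earlier in this manual; the variable names are illustrative. statistics.multimode returns every most-frequent category, so a tie (as with Gender) shows up as more than one value, which is why we say such a variable has no single mode.

```python
import statistics
from collections import Counter

# Rebuild the raw categories from the Tally counts reported earlier.
jobten = ["New"] * 9 + ["Experienced"] * 16 + ["Mature"] * 15
gender = ["Female"] * 20 + ["Male"] * 20

counts = Counter(jobten)
n = sum(counts.values())

# Print the frequency table in the natural (ordinal) order rather than
# alphabetical order; this mirrors what Column > Value Order does for the chart.
value_order = ["New", "Experienced", "Mature"]
for level in value_order:
    percent = 100 * counts[level] / n
    print(f"{level:<12}{counts[level]:>4}{percent:>7.1f}%")

# The mode is the most frequent category; a tie returns several categories.
print("JOBTEN mode(s):", statistics.multimode(jobten))
print("Gender mode(s):", statistics.multimode(gender))
```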
This graphic can be copied into a Word document by right-clicking on it and selecting “Copy Graph.” (If you are saying “Hey…how do I get back to my datasheet?!”, just go to Window>Plant_Survey.)

If you need to create a pie chart to understand a quantitative variable (e.g., productivity) relative to a qualitative variable (e.g., Plant), in Minitab you must begin by computing summary statistics for the quantitative variable, using the qualitative one to subset it. You can do this by selecting Stat>Basic Statistics>Store Descriptive Statistics. This process stores (saves) the descriptive statistics as a small table in your Data Window, so the Pie Chart command can use the results to make a chart.

Replicate the window below. We are asking for statistics on PRDCTY by Plant. Click the Statistics button and choose the statistic Sum. Click OK.

You will see the following in your Data Window:

You can think of this information as a little table within your worksheet that tells you the total (sum) of productivity attributed to each plant.

Now you are ready to make the pie chart for these data. Choose Graph>Pie Chart. This time, indicate that your chart values are in a table. The categorical variable for your table was named ByVar1 (change this in the worksheet if you need to). The summary variable was named Sum1 (again, change it if you need to).

Fill out the screen as below:

Click Labels and title the chart “Chart of Productivity by Plant”. Warning, warning, Will Robinson: if you do not do this, labels made for other charts will likely display on this new one.
It happens to the best of us (well…not us…but other people)! Select the Slice Labels as necessary. Click OK.

You should see a pie chart similar to the following:

[Pie chart: Chart of Productivity by Plant. Slices: Dallas 60.1%, Norcross 39.9%]

Next, we wish to replicate the bar chart from The Basic Concepts Manual, which displayed the frequency count for each value of the variable JOBTEN.

Again, go to the Graph menu. This time choose Bar Chart. This action will produce:

Select the Simple chart as above and click OK. Select JOBTEN as the categorical variable. Provide a title by clicking the Labels button. Then click OK.

You should see something like this:

[Bar chart: Bar Chart of Employee Tenure. Vertical bars of Count (axis 0 to 18) for JOBTEN categories New, Experienced, Mature]

If you need a different style of bar chart, such as one with horizontal bars, you can experiment with the options in the Bar Chart panel. For example, to obtain horizontal bars, click the Scale button before clicking OK. On the Axes and Ticks tab, select Transpose value and category scales as shown below:

Click OK and OK.

You should see something like this:

[Bar chart: Bar Chart of Employee Tenure with horizontal bars. JOBTEN categories New, Experienced, Mature on the vertical axis; Count (0 to 18) on the horizontal axis]

To create a stem-and-leaf display for the variable YRONJOB, go to the Graph menu and select Stem-and-Leaf (notice that only the quantitative variables are available for this graphic).
Select YRONJOB as your variable. Click OK.

You will get a stem-and-leaf display like the one below in your Session Window:

Stem-and-Leaf Display: YRONJOB

Stem-and-leaf of YRONJOB  N = 40
Leaf Unit = 1.0

   2   0  01
   7   0  22233
  12   0  44555
  16   0  6777
 (8)   0  88888999
  16   1  0000111
   9   1  2333
   5   1  4445
   1   1  7

A quick note on interpreting this messy output: with a leaf unit of 1.0, each stem is the tens digit and each leaf is the units digit, and each stem is split across up to five lines (leaves 0–1, 2–3, 4–5, 6–7 and 8–9). The numbers in the first column are depths: cumulative counts of observations from the nearer end of the distribution. The (8) is in parentheses to signify that this line contains the median; the number in parentheses is simply the count of leaves on that line. Here those values are 8.x, 8.x, 8.x, 8.x, 8.x, 9.x, 9.x, 9.x years of service (the decimals are truncated).

To get a boxplot for YRONJOB, go to the Graph menu and select Boxplot. You will see a screen like the one below, where you can choose the style of boxplot you need. Choose Simple for this first one. Click OK. Choose YRONJOB for your variable. Click OK again.

You will see the following:

[Boxplot: Boxplot of YRONJOB, vertical axis YRONJOB from 0 to 18]

Recall that in a boxplot, the “box” represents the middle 50% of the dataset (the bottom of the box is Q1 and the top is Q3), and the line inside the box represents Q2, the median.

Concept 4: Using Minitab for Visualization/Organization of Multivariate Data

Contingency tables, stacked bar charts, 100% stacked bar charts and scatter plots can be easily generated in Minitab.

To use Minitab to reproduce the contingency table examining plant and gender from The Basic Concepts Manual, go to the Stat menu. Choose Tables, then choose Cross Tabulation and Chi-Square (although we are not actually calculating the chi-square statistics, some of the information that we need is under this option).
Choose Plant for your rows and Gender for your columns. The Cross Tabulation function in Minitab is quite flexible. If you wish to include more than just the frequency counts in the cells of your table, place a check next to Row percents, Column percents and Total percents, as we have below:

Click OK. The contingency table below should appear in your Session Window:

Tabulated statistics: Plant, Gender

Rows: Plant   Columns: Gender

            Female    Male     All
Dallas          13      10      23
             56.52   43.48  100.00
             65.00   50.00   57.50
             32.50   25.00   57.50

Norcross         7      10      17
             41.18   58.82  100.00
             35.00   50.00   42.50
             17.50   25.00   42.50

All             20      20      40
             50.00   50.00  100.00
            100.00  100.00  100.00
             50.00   50.00  100.00

Cell Contents:  Count
                % of Row
                % of Column
                % of Total

Notice the key at the bottom indicating that each cell contains the count (frequency) on top, followed by the row percent, column percent and total percent. Wow…look how much output was created in a single table! That was so much easier than Excel!

The output table contains the probabilities described in The Basic Concepts Manual: the row and column percents are conditional probabilities, and the total percents are joint probabilities. In the first cell (the intersection of Dallas and Female) we have four pieces of information. We know that 13 women work in Dallas. We know that of all of the Dallas employees, 56.52% are female. We know that of all of the women, 65% are in Dallas. Finally, we know that of all employees, 32.50% are women in Dallas.

If you need to subset this information further (e.g., by Job Tenure), there is an easy way to do that. Go back to the Stat>Tables>Cross Tabulation and Chi-Square screen. This table will be a little busy, so let’s choose just the Counts this time.
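Before making those selections, it may help to see exactly where the four numbers in each cell of the two-way table come from. This Python sketch recomputes the row, column and total percents from the four cell counts in the Plant by Gender table; the dictionary layout and the helper cell_percents are illustrative choices, not anything Minitab exposes.

```python
# Cell counts from the Plant x Gender contingency table.
counts = {
    ("Dallas", "Female"): 13, ("Dallas", "Male"): 10,
    ("Norcross", "Female"): 7, ("Norcross", "Male"): 10,
}
plants = ["Dallas", "Norcross"]
genders = ["Female", "Male"]

# Marginal totals: the "All" row and column of the Minitab table.
row_totals = {p: sum(counts[(p, g)] for g in genders) for p in plants}
col_totals = {g: sum(counts[(p, g)] for p in plants) for g in genders}
grand_total = sum(counts.values())


def cell_percents(plant: str, gender: str) -> tuple[float, float, float]:
    """Return (% of row, % of column, % of total) for one cell."""
    c = counts[(plant, gender)]
    return (
        100 * c / row_totals[plant],   # P(gender | plant), e.g. 56.52 for Dallas/Female
        100 * c / col_totals[gender],  # P(plant | gender), e.g. 65.00
        100 * c / grand_total,         # P(plant and gender), e.g. 32.50
    )


for p in plants:
    for g in genders:
        row_pct, col_pct, tot_pct = cell_percents(p, g)
        print(f"{p:<9} {g:<7} {counts[(p, g)]:>3} {row_pct:7.2f} {col_pct:7.2f} {tot_pct:7.2f}")
```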
Make your selections of the three variables as follows:

Click OK.

The table below will appear in the Session Window:

Tabulated statistics: Plant, Gender, JOBTEN

Results for JOBTEN = Experienced
Rows: Plant   Columns: Gender

            Female   Male   All
Dallas           3      5     8
Norcross         3      5     8
All              6     10    16

Cell Contents: Count

Results for JOBTEN = Mature
Rows: Plant   Columns: Gender

            Female   Male   All
Dallas           6      3     9
Norcross         2      4     6
All              8      7    15

Cell Contents: Count

Results for JOBTEN = New
Rows: Plant   Columns: Gender

            Female   Male   All
Dallas           4      2     6
Norcross         2      1     3
All              6      3     9

Cell Contents: Count

Notice that the same Plant and Gender counts have now been provided for each level of Job Tenure: Experienced, Mature and New (the levels are reported in alphabetical order rather than in their natural order).

The stacked bar charts developed in The Basic Concepts Manual can easily be created in Minitab. Start in the Graph menu and choose Bar Chart. You will see:

Make sure you choose the Stacked option as shown and click OK.

Then select the variables so that the category axis is Plant and the bars are stacked by Gender. This is done by selecting Gender last and making sure the “stack categories of last categorical variable” box is checked. See below:

Add a title if you like. Then, as always, click OK.
You should see this:

[Stacked bar chart: Bar Chart of Gender by Plant. Count (axis 0 to 25) for Dallas and Norcross, each bar stacked by Gender (Male, Female)]

The 100% stacked bar chart is a little less straightforward to generate. Select Graph>Bar Chart>Stack>OK as before. Assign the variables Plant and Gender as before. Then select Chart Options. You will see the following screen:

To scale the bars to 100% within each plant, set the Y-axis to be shown as a percent and accumulate the values within each category. Select OK.

You should see the following chart:

[Stacked bar chart: 100% Bar Chart of Gender by Plant. Percent (0 to 100) for Dallas and Norcross, each bar stacked by Gender (Male, Female); note: Percent within levels of Plant]

Another multivariate visualization technique is the scatter plot. Again, Minitab gives us the flexibility to subset our analysis if needed. Consider the relationship between Job Satisfaction and Productivity, as we did with SPSS in Chapter 5. This plot can be replicated in Minitab by going to the Graph menu and choosing Scatterplot. A choice of scatterplot types follows. Choose the Simple scatterplot:

Then click OK.

Next, choose PRDCTY for the Y-axis variable and JOBSAT for the X-axis variable as shown below:

Click OK. Click the Labels button and add an appropriate title. Click OK and OK.
Here is the associated output:

[Scatterplot: Productivity versus Job Satisfaction. PRDCTY (70 to 100) on the vertical axis; JOBSAT (5 to 9) on the horizontal axis]

There appears to be a weak positive relationship between these two variables: employees reporting higher job satisfaction tend to have somewhat higher productivity.

The side-by-side histograms from the Basic Concepts manual can be created much like the univariate histogram. Go to Graph>Histogram, select the Simple histogram and click OK. Highlight the quantitative variable YRONJOB and click Select. You should see this:

Click the Multiple Graphs button.

The following dialog box will open. Be sure to click “In separate panels of the same graph” because we want side-by-side histograms of YRONJOB. After that, click the “By Variables” tab to select the qualitative variable by which we want to stratify the histograms.

At this screen we can select which variable to group by. Highlight Plant, click Select, and then click OK.

You should get the following histograms of job tenure stratified by plant:

[Histogram panels: Histogram of Widge One Employee Job Tenure by Plant. Panels Dallas and Norcross; Years on Job (0 to 16) on the horizontal axis, Frequency (0 to 5) on the vertical axis; Panel variable: Plant]

Great job!

The side-by-side boxplots can be generated in much the same way. Go to Graph>Boxplot, select Simple, and then click OK.
Once again, highlight YRONJOB, click Select, and then click the Multiple Graphs button. You should see this:

Just as we did with the side-by-side histograms, choose “In separate panels of the same graph” and click the “By Variables” tab to select Plant. Click OK and OK.

You should see this:

[Boxplot panels: Boxplots of Widge One Employee Job Tenure by Plant. Panels Dallas and Norcross; Years on Job (0 to 18) on the vertical axis; Panel variable: Plant]

Fantastic! Side-by-side graphics are a great way to visualize multivariate relationships. A word of warning: this is only a multivariate analysis if the graphics show one variable grouped by another. A histogram with one panel showing years on job and another showing productivity scores is NOT a multivariate visualization.

Concept 5: Using Minitab for Random Number Generation and Simple Random Sampling

Like the other software applications, Minitab normally chooses the starting point of its random number sequence from the computer’s internal clock (the time of day). As a result, every time Minitab is told to generate a set of random numbers, a different set will be produced.

Sometimes, however, you may wish to control where Minitab starts its sequence, for example to repeat an analysis by generating the same set of random data. In this case, the BASE command tells the random number generator where to start. The generator will use this base until you set a new BASE or exit Minitab.

If you need to set the base so you can replicate your results, go to the Calc menu and choose the Set Base option. You should see the following screen:

Here, we have not chosen a base.
We could have chosen a positive integer as our base. Doing so would let us replicate our results at any time by going back in and resetting the base to that value.

To create a string of random numbers uniformly distributed between 0 and 1, go to Calc > Random Data > Uniform. Note that Minitab, like SPSS, offers a lengthy list of distributions from which random samples can be generated, which makes this sort of procedure easy and versatile.

We will generate 40 values from this uniform distribution, one value for every observation, and name the new data column “Group”. Every distribution has parameters that must be specified. For the uniform distribution, the only parameters are the two values between which we want our random numbers to fall; we choose 0 and 1. Fill in your window like the one below:

Click OK. The new variable Group should appear in your Data Window.

Here is what a typical result would look like:

Remember that your results will vary, since this variable was randomly generated.

One of the primary reasons for generating random numbers is to assign observations to statistically independent groups. Using the random numbers, let’s assign the 40 observations to 3 groups. Here is one way to accomplish this: choose Data > Code > Numeric to Text.

The Code - Numeric to Text screen should look like this:

Note that since the random numbers are uniformly distributed, every interval of the same length between 0 and 1 is equally likely to contain a given value. This is very useful for assigning groups.
If you want groups of approximately equal size, assign the values from 0 up to 1/3 (about .33) to one group, from 1/3 up to 2/3 (about .67) to the second, and from 2/3 to 1 to the third, so that no value falls into a gap between ranges. If you want the first group to contain approximately 25% of the observations, assign the random values from 0 up to .25 to the first group, and so on. Cool. You should see something like the following in your Data Window:

Again, remember that results will vary due to the randomness (pun intended) of this procedure. This procedure has taken the 40 observations and assigned them to 3 groups based on the random numbers created in the previous procedure. Each of the 40 employees is now in one of these randomly assigned, independent groups.

Selecting a simple random sample from a dataset is just as common, and Minitab offers a very straightforward way to do it. Because we will be subsetting this dataset, go ahead NOW and save the Minitab file you are working in with File > Save Current Worksheet As (this also lets you save it as an Excel spreadsheet again).

Now, suppose we wish to select a simple random sample of 30 individuals from this dataset. Go to Calc > Random Data > Sample from Columns. You should see a screen like the following:

Specify that you wish to select a random sample of 30 cases. In the From columns box, identify all of the variables. Then identify all of the variables again in the Store samples in box. Select OK. Because the sample is stored back into the original columns, this takes a random sample of size 30 from our dataset and discards the observations that were not selected (did you save your file?). You should now be left with 30 observations.
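For readers who want to see the logic behind these menu steps, here is a minimal sketch in Python. It uses Python’s standard random module as a stand-in for Minitab’s generator, so the same base will not reproduce Minitab’s numbers; the seed plays the role of the base. The functions uniform_column, code_to_groups and simple_random_sample, the base value 12345, the group names and the employee IDs are all hypothetical illustrations, not Minitab commands or data. The cut points [1/3, 2/3] implement the equal-sized, gap-free three-group coding described above.

```python
import bisect
import random

def uniform_column(n, seed=None):
    """Return n values drawn uniformly from [0, 1).

    The seed plays the role of Minitab's base: the same seed always
    reproduces the same sequence. With seed=None, Python seeds the
    generator from the operating system (or the clock), so each run differs.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def code_to_groups(values, cutpoints, labels):
    """Code numeric values in [0, 1) into text group labels.

    cutpoints holds the upper bound of every group except the last,
    e.g. [1/3, 2/3] splits [0, 1) into three equal-width ranges with no gaps.
    A value exactly equal to a cut point goes into the higher group.
    """
    if len(labels) != len(cutpoints) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect.bisect_right(cutpoints, v)] for v in values]

def simple_random_sample(rows, k, seed=None):
    """Draw k distinct rows without replacement (a simple random sample)."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

if __name__ == "__main__":
    base = 12345  # hypothetical base value
    group_values = uniform_column(40, seed=base)
    assert group_values == uniform_column(40, seed=base)  # same base, same numbers

    labels = ["Group 1", "Group 2", "Group 3"]  # hypothetical group names
    groups = code_to_groups(group_values, [1/3, 2/3], labels)
    for label in labels:
        print(label, groups.count(label))

    employee_ids = list(range(1, 41))  # hypothetical IDs standing in for the 40 rows
    sample = simple_random_sample(employee_ids, 30, seed=base)
    print(len(sample), "employees sampled")
```

Changing the cut points to [0.25, 0.5], for example, would give the first group roughly 25% of the observations. Sampling without replacement means each employee can appear in the sample at most once, which is what makes it a simple random sample.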
Concept 6: Using Minitab for Confidence Intervals

Generating confidence intervals in Minitab is very easy. For example, to compute a 95% confidence interval for the mean Job Satisfaction rating of all employees, go to Stat > Basic Statistics > One-Sample T (see footnote 2). You will see the following screen, on which you can choose the variable(s) to include in the analysis. This time we choose JOBSAT as the variable to analyze:

Click the Options button. The default setup, shown below, uses a 95% confidence level with a two-sided (“not equal”) alternative, so it produces a complete, two-sided 95% confidence interval.

2 T-tests are very common tests used to determine whether two sample means differ significantly or whether one sample mean differs from some established value. For more detailed information on t-tests, we suggest Statistical Methods and Data Analysis by Ott and Longnecker.

Click OK and then OK. You will see the following output in your Session Window:

One-Sample T: JOBSAT

Variable   N    Mean   StDev   SE Mean   95% CI
JOBSAT    40   6.850   1.021     0.161   (6.524, 7.176)

As stated in previous manuals, these results would be reported as: “Based on a representative sample of 40 employees, we are 95% confident that the mean job satisfaction among all employees is between 6.52 and 7.18.” The 95% describes the method, not this particular interval: if we repeated the sampling many times, about 95% of the intervals built this way would contain the true (unknown) mean job satisfaction of all employees, and about 5% would miss it, lying entirely below or above it. Once an interval has been computed, the true mean either lies between 6.52 and 7.18 or it does not.

Another option here, which is only available for the 95% interval (the most common), is the Interval Plot.
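Before moving on to the interval plot, it may help to see where the One-Sample T interval comes from: it is the sample mean plus or minus a t critical value (with n - 1 degrees of freedom) times the standard error s/sqrt(n). The sketch below uses only Python’s standard library and assumes nothing beyond the summary statistics printed in the Session Window. Because the standard library has no t distribution, the hypothetical helpers t_cdf and t_quantile approximate it numerically (Simpson’s rule plus bisection); this is an illustration, not how Minitab computes it. An interval plot applies the same formula group by group, and ci_from_values sketches that for one group’s raw values, assuming each group’s interval uses only that group’s own standard deviation.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids the overflow gamma() would hit for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    if x == 0:
        return 0.5
    b = abs(x)
    h = b / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df):
    """Inverse of t_cdf for 0.5 < p < 1, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t_ci(mean, sd, n, conf=0.95):
    """Return (SE Mean, (lower, upper)) for a two-sided t interval."""
    se = sd / math.sqrt(n)
    t_crit = t_quantile(1 - (1 - conf) / 2, n - 1)
    return se, (mean - t_crit * se, mean + t_crit * se)

def ci_from_values(values, conf=0.95):
    """The same interval computed from one group's raw values."""
    return one_sample_t_ci(statistics.mean(values), statistics.stdev(values),
                           len(values), conf)

if __name__ == "__main__":
    # Summary statistics copied from the One-Sample T: JOBSAT output
    se, (lower, upper) = one_sample_t_ci(6.850, 1.021, 40)
    print(f"SE Mean = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the printed StDev is already rounded to 1.021, the endpoints from this sketch may differ from Minitab’s (6.524, 7.176) in the third decimal place.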
Let’s look at the 95% interval graphic for Job Satisfaction by Plant. Go to Graph > Interval Plot. Since we have one quantitative variable (Job Satisfaction) that we want to evaluate across the two groups of a qualitative variable (Plant), select One Y With Groups:

Select OK. Assign JOBSAT to Graph variables and Plant to Categorical variables for grouping:

Add a title to your graphic as appropriate through the Labels button. Select OK.

You should see the following plot:

[Figure: “Interval Plot of JOBSAT”, 95% CI for the Mean; JOBSAT by Plant (Dallas, Norcross)]

Now that’s what I’m talking about!

Minitab Lagniappe

What is a “lagniappe”? The word derives from New World Spanish la ñapa, “the gift”. It came into the Creole dialect of New Orleans, where it acquired a French spelling. It is still used in the Gulf States, especially southern Louisiana, to denote a little bonus that a friendly shopkeeper might add to a purchase. Our lagniappe for readers includes the extra and interesting things we have learned to do with these software packages that might not be easily found or well known. A little extra information at no extra cost!

We’ve mentioned before that Minitab is very strong in graphics, and the lagniappe we’re giving you is about those graphics. Recall the 95% confidence interval plot of job satisfaction by Plant. What if we wanted to create the same graph but stratify by gender instead of plant? We could go through the whole process of creating the graph from scratch again. Minitab does help here, because it retains your previous entries the next time you open the dialogue box.
But we would still have to go in, select gender, and change the title to say gender, and, well, we (and now you) are more consummate Minitab users than that. We can do better.

Minitab has a very useful Make Similar Graph option. Go to Editor > Make Similar Graph.

The following dialogue box will appear:

Now make your changes by clicking the box where you want your new variable, highlighting the variable in the list on the left, and then clicking Select.

The new graph looks like this:

[Figure: “Interval Plot for Job Satisfaction”, 95% CI for the Mean; JOBSAT by Gender (Female, Male)]

Another great attribute of graphics in Minitab is how easy it is to customize colors, patterns, fonts, and so on. Simply double-click the part of the graph you want to change, and an editing dialogue box will open. From there you have numerous options for customizing your graph (you can even change titles and axis labels without recreating the entire graph).

Congratulations: you are well on your way to becoming a STAT geek! Be proud.