CHAPTER 2

Sections 2.1 and 2.2
Specifying types of data and how they are treated in the program

Use File > Open Worksheet to open the file pennstate1.mtw. The file contains observations of the variables Sex, HrsSleep, SQpick, Height, RandNumb, Fastest, RtSpan, LftSpan, and Form for 190 college students. The survey questions that produced these observations are listed in Section 2.1. The first seven observations of the worksheet appear below, along with the variable names. Note that Minitab adds a “T” to the column heading of “text” variables (for example, C1-T).

Section 2.3
1. Summarizing one or two categorical variables – Example 2.1

Name column 1 (C1) Letter Picked and column 2 (C2) Frequency in your Minitab worksheet. Under the categorical variable Letter Picked (C1-T), enter S in the first cell and Q in the second cell. Under the Frequency column (C2), enter 61 in the first cell and 31 in the second cell. Your worksheet should look like the following worksheet:

The worksheet displays the number of individuals in each category (S or Q). To calculate the percent in each category, use Stat > Tables > Cross Tabulation and Chi-Square… and enter ‘Letter Picked’ in the For rows: box. Check Counts and Total percents, then enter the name of the column that contains the counts (frequencies) in the Frequencies are in: box. Click OK to display the results. The output in the Session window should look like the following:

Tabulated statistics: Letter Picked

Using frequencies in Frequency

Rows: Letter Picked

            Count   % of Total
    Q          31        33.70
    S          61        66.30
    All        92       100.00

Note that Minitab has sorted the categories alphabetically (Q before S), whereas Table 2.1 lists S first. To change this default and make the output look like Table 2.1, place the cursor on a cell in column 1, use Editor > Column > Value Order, and select Order of occurrence in worksheet.
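Each entry in the % of Total column is a count divided by the grand total. For example, 61/92 = 66.30%. The short Python sketch below reproduces that column from the counts entered in the worksheet. total_percents is a hypothetical helper written for this illustration; it is not a Minitab function.

```python
def total_percents(counts):
    """Return each category's percent of the grand total, rounded to two
    decimals to match the "% of Total" column in the Minitab output."""
    grand_total = sum(counts.values())
    return {category: round(100 * n / grand_total, 2)
            for category, n in counts.items()}


# Counts from Example 2.1, in the order they were entered (S first).
letter_picked = {"S": 61, "Q": 31}

if __name__ == "__main__":
    for category, pct in total_percents(letter_picked).items():
        print(f"{category:>3}  {letter_picked[category]:>5}  {pct:8.2f}")
    print(f"{'All':>3}  {sum(letter_picked.values()):>5}  {100:8.2f}")
```

Python dictionaries keep insertion order. The categories therefore print in order of occurrence (S, then Q), as in Table 2.1, rather than alphabetically.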
After selecting Order of occurrence in worksheet and clicking OK, use Stat > Tables > Cross Tabulation and Chi-Square again, as described above, to make the output look like Table 2.1.

For Table 2.2 of the book, name column 1 (C1) Listed First, column 2 (C2) Letter Picked and column 3 (C3) Frequency in your Minitab worksheet. Enter the data as illustrated in the next worksheet.

To calculate the percent in each category, use Stat > Tables > Cross Tabulation and Chi-Square. Enter ‘Listed First’ in the For rows: box and ‘Letter Picked’ in the For columns: box. Check Counts and Row percents, enter the name of the column that contains the counts (frequencies) in the Frequencies are in: box, and click OK. The output in the Session window should look like the following:

Tabulated statistics: Listed First, Letter Picked

Using frequencies in Frequency

Rows: Listed First   Columns: Letter Picked

                  Q Picked   S Picked      All

Q Listed First          53         45       98
                     54.08      45.92   100.00
S Listed First          31         61       92
                     33.70      66.30   100.00
All                     84        106      190
                     44.21      55.79   100.00

Cell Contents:  Count
                % of Row

If S was listed first, then 61/92, or 66.30%, of the students picked the letter S. If Q was listed first, then 53/98, or 54.08%, of the students picked the letter Q. Overall, (45 + 61)/190 = 106/190, or 55.79%, of the students picked the letter S.

Note that Minitab has sorted both the row and the column categories alphabetically. To change this default and make the output look like Table 2.2 in Section 2.3 of the book, highlight both columns, ‘Listed First’ and ‘Letter Picked’. Then use Editor > Column > Value Order and select Order of occurrence in worksheet. Proceed as before using Stat > Tables > Cross Tabulation and Chi-Square.

2. Pie Chart & Bar Graph for one categorical variable – Example 2.3

Open the file pennstate1.mtw. To produce the pie chart of the variable RandNumb (Figure 2.1), use Graph > Pie Chart and enter RandNumb in the Categorical variables: box. Select Labels > Slice Labels and check Category name and Percent.
(Optional) Customizing the graph. You can add a title to the graph. For example, type Pie chart of numbers picked in the Labels > Title/Footnotes dialog box. Clicking OK will produce the pie chart below. Notice that 29.5% of the students surveyed chose the number 7, while only 1.1% chose the number 1.

[Pie chart of numbers picked. Slice percents by RandNumb category: 1: 1.1%, 2: 4.7%, 3: 11.6%, 4: 11.1%, 5: 9.5%, 6: 12.1%, 7: 29.5%, 8: 10.0%, 9: 7.4%, 10: 3.2%]

To produce a bar chart for the variable RandNumb (Figure 2.2), use Graph > Histogram and select Simple. Select RandNumb from the variable list box in one of two ways: double-click the name of the variable, or highlight it and click Select. Clicking OK will produce the histogram below.

[Histogram of RandNumb: x-axis RandNumb, y-axis Frequency]

The bar chart shows the frequency of each value of RandNumb on the vertical axis.

(Optional) Customizing the graph. To add a title to your graph, click Labels > Title/Footnotes.

3. Bar Chart for two categorical variables – Example 2.4 (data in Example 2.2), as in Figure 2.3

To obtain the bar chart in Figure 2.3 with the data from Example 2.2, name column 1 (C1) Slept with:, column 2 (C2) Eyesight and column 3 (C3) Frequency in your Minitab worksheet. Enter the data as below:

To produce a bar chart for myopia and nighttime lighting, use Graph > Bar Chart, select Cluster, and click OK. Enter the two categorical variables in the dialog box as shown below. Select Bar Chart Options… and check Show Y as Percent and Within categories at level 1 (outermost). Click OK. Next, select Data Options > Frequency and enter Frequency in the dialog box. Clicking OK produces the bar chart below.

[Chart of Slept with:, Eyesight: clustered bars for Eyesight (High, None, Some) within each level of Slept with: (Dark, Full Light, Nightlight); y-axis Percent; footnote: Percent within levels of Slept with:.]
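With Show Y as Percent and Within categories at level 1 (outermost), the bars within each level of Slept with: add up to 100%. This is the same arithmetic as the % of Row entries in a cross tabulation. The Example 2.2 counts are not listed in the text of this guide, so the Python sketch below uses the Table 2.2 counts instead. within_level_percents is a hypothetical helper, not part of Minitab.

```python
def within_level_percents(table):
    """table maps each outer level to {inner level: count}. Return the same
    shape with each count expressed as a percent of its outer level's total,
    rounded to two decimals."""
    result = {}
    for outer, inner_counts in table.items():
        level_total = sum(inner_counts.values())
        result[outer] = {inner: round(100 * n / level_total, 2)
                         for inner, n in inner_counts.items()}
    return result


# Counts from Table 2.2 (outer level = which letter was listed first).
listed_first = {
    "S Listed First": {"S Picked": 61, "Q Picked": 31},
    "Q Listed First": {"S Picked": 45, "Q Picked": 53},
}

if __name__ == "__main__":
    for outer, pcts in within_level_percents(listed_first).items():
        print(outer, pcts)
```

Dividing by each level's own total, not the grand total of 190, makes the groups comparable even though the levels contain different numbers of students.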
We can edit the previous graph to add some color and to order the categories as they appear in Figure 2.3. Right-click the graph and select Update Graph Automatically. Then right-click any of the bars, select Edit bars > Groups, and enter Eyesight in the dialog box. Clicking OK produces the bar chart below.

[Chart of Slept with:, Eyesight: bars colored by Eyesight (legend: High, None, Some) within each level of Slept with: (Dark, Full Light, Nightlight); y-axis Percent; footnote: Percent within levels of Slept with:.]

Highlight the two columns that contain the categorical variables. Then select Editor > Column > Value Order and choose Order of occurrence in worksheet. The graph will automatically update to look like the following.

[Chart of Slept with:, Eyesight: Eyesight ordered None, Some, High within each level of Slept with: ordered Dark, Nightlight, Full Light; y-axis Percent; footnote: Percent within levels of Slept with:.]

If you want the Eyesight and Slept with: labels to appear horizontally, double-click Eyesight or Slept with:, select Alignment and change Text angle to 0. This will produce the following bar chart.

[Chart of Slept with:, Eyesight: as above, with horizontal labels None, Some, High under each of Dark, Nightlight, Full Light; footnote: Percent within levels of Slept with:.]

Section 2.4
Five-number summary – Example 2.5

The data are in the worksheet pennstate1.mtw. Click Stat > Basic Statistics > Display Descriptive Statistics and enter RtSpan in the Variables: box. Enter the variable Sex in the By variables (optional): box. Click OK. The output in the Session window gives the summary statistics for right handspan (RtSpan) for each sex.
The extremes (low and high) are under the headings Minimum and Maximum. Q1 is the lower (or first) quartile (roughly, the median of the lower half of the values). Q3 is the upper (or third) quartile (roughly, the median of the upper half of the values). Minitab may give slightly different quartiles than the ones you find by hand. Several conventions exist for locating quartiles, and Minitab interpolates between neighboring data values. Minitab also gives other descriptive statistics:

- the number of data values (N)
- the number of missing values (N*)
- the mean
- the standard deviation (StDev)
- the standard error of the mean (SE Mean), which equals StDev/sqrt(N)

More descriptive statistics are available by clicking Statistics… in the previous dialog box.

Section 2.5

For the following, use women’s right handspans. The data are in the file pennstate1F.

1. Histogram – Figure 2.4

From the main menu, click Graph > Histogram > Simple and enter RtSpan in the Graph Variables: box. Clicking OK will produce a histogram similar to Figure 2.4. The histogram below is based on Minitab’s default settings.

[Histogram of RtSpan: x-axis RtSpan, y-axis Frequency]

The histogram is skewed left, with two possible outliers.

Note: To display percents rather than counts, click Scale > Y-Scale Type > Percent. To control how many intervals are used, double-click a bar of the graph and select Binning. You can also change the x-axis label (RtSpan) by double-clicking it and typing a new label in the text box.

2. Stem-and-leaf plot – Figure 2.5

From the main menu, click Graph > Stem-and-Leaf and enter RtSpan in the Graph Variables: box. To display all of the data, uncheck Trim outliers. Click OK.
The output in your Session window should look like the following:

Stem-and-Leaf Display: RtSpan

Stem-and-leaf of RtSpan  N = 103
Leaf Unit = 0.10

    1   12  5
    2   13  0
    2   14
    2   15
    3   16  0
    6   17  005
   17   18  00000005558
   36   19  0000000000000005555
  (29)  20  00000000000000000000022555555
   38   21  0000000000000000055557
   16   22  00000000555555
    2   23  02

The plot is skewed left, with two possible outliers (12.5 and 13.0).

Notes: The stem-and-leaf display, also called a stemplot, is much like a histogram, except that all the values are displayed, in order. Each observation occupies its own space in the display. As a result, every value is visible and also contributes to the shape of the distribution.

Each observation is split into two parts, the stem and the leaf. For example, consider the values 23.0 and 23.2 in the data set. The number 23 is the stem and the two leaves are 0 and 2. The stem can have several digits if necessary, but the leaf typically has only one digit per observation. In this example, we read 23|2 as 23.2 because the display says ‘Leaf Unit = 0.10’ at the top. If it had displayed ‘Leaf Unit = 1’, then 23|2 would mean 232; if ‘Leaf Unit = 10’, then 23|2 would mean 2320. Always check the leaf unit before interpreting a stem-and-leaf display.

Stem-and-leaf displays are very revealing because they show any gaps in the distribution of the observations. In the example above, there are no observations in the interval 14.0-15.9 because there are no leaves to the right of the stems 14 and 15.

Minitab does not include the vertical line that divides the stem from the leaf; the display in the book does include this line. The numbers in the leftmost column of the display above do not appear in the display in the text. Minitab adds these values, called depths, to a stem-and-leaf plot.
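Both the leaf unit and the depth column can be reproduced with a little arithmetic. In this Python sketch, leaf_value and depths are hypothetical helpers written for illustration. leaf_value assumes one-digit leaves and non-negative values. depths counts from the nearer extreme and shows the leaf count of the line holding the median in parentheses. The leaf counts are read off the RtSpan display above.

```python
def leaf_value(stem, leaf, leaf_unit):
    """Value shown as stem|leaf when each leaf is worth leaf_unit
    (one-digit leaves, non-negative data)."""
    return (stem * 10 + leaf) * leaf_unit


def depths(leaf_counts):
    """Depth label for each line of a stem-and-leaf display, given the
    number of leaves on each line from the lowest stem to the highest."""
    n = sum(leaf_counts)
    # Position(s) of the median: one position if n is odd, two if n is even.
    lower_mid, upper_mid = (n + 1) // 2, n // 2 + 1
    labels = []
    below = 0                      # observations on earlier lines
    for count in leaf_counts:
        through = below + count    # observations up to and including this line
        if below < lower_mid and through >= upper_mid:
            labels.append(f"({count})")                # line holding the median
        else:
            labels.append(str(min(through, n - below)))  # nearer extreme
        below = through
    return labels


# Leaves per line in the RtSpan display, stems 12 through 23.
rtspan_leaf_counts = [1, 1, 0, 0, 1, 3, 11, 19, 29, 22, 14, 2]

if __name__ == "__main__":
    print(round(leaf_value(23, 2, 0.1), 1))   # 23|2 with Leaf Unit = 0.10
    print(leaf_value(23, 2, 1), leaf_value(23, 2, 10))
    print(depths(rtspan_leaf_counts))
```

The computed labels match the depth column Minitab printed. Sometimes the two middle values of an even-sized data set fall on different lines. In that case the sketch omits the parentheses, following the rule described for Minitab in this section.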
The depth of an observation is its position counted from the nearer extreme. For example, the depth of 23.0 is 2 because it is the second value counting from the upper extreme. In the same way, 13.0 also has depth 2 because it is the second value from the lower extreme. The depth reported for each line is the maximum depth of the values on that line, which helps with counting. For example, the depth 16 for the line with stem 22 means that 16 observations have a value of 22.0 or higher. Likewise, the depth 17 for the line with stem 18 means that 17 observations have a value of 18.8 or lower. The line that contains the center (median) of the distribution does not report a depth. Instead, it shows the number of observations on that line, in parentheses. The parentheses are omitted if the median falls between two lines of the display.

3. Dotplot – Figure 2.6

From the main menu, click Graph > Dotplot > Simple and enter RtSpan in the Graph Variables: box. Click OK.

[Dotplot of RtSpan: x-axis RtSpan]

Note: The plot is skewed left, with two possible outliers.

Section 2.6

For the following, use Example 2.10, fastest driving speeds for men. The data are in the file pennstate1M.

1. Summary measures, including mean, median, quartiles, low and high values, range, and interquartile range

From the main menu, click Stat > Basic Statistics > Display Descriptive Statistics and enter Fastest in the Variables: box. Clicking the Statistics… button opens the following window. Check all desired statistics. Clicking OK returns you to the previous window; clicking OK again displays the selected statistics in the Session window:

Descriptive Statistics: Fastest

Variable   N  N*    Mean  SE Mean  StDev
Fastest   87   0  107.40     1.87  17.43

Variable  Minimum     Q1  Median      Q3  Maximum  Range    IQR
Fastest     55.00  95.00  110.00  120.00   150.00  95.00  25.00

2.
Boxplot

From the main menu, click Graph > Boxplot > Simple and enter Fastest in the Graph Variables: box. Clicking OK will produce a boxplot, which is a picture of the five-number summary. If a separate boxplot is needed for each group of data, for example one for males and one for females, use Graph > Boxplot > With Groups instead.

[Boxplot of Fastest: vertical axis Fastest]

Minitab marks with an asterisk any point that lies more than 1.5 × IQR beyond either edge of the box (the lower and upper quartiles). Here the lower limit is 95 - 1.5 × 25 = 57.5 mph, so there is an outlier at approximately 55 mph. ‘Distant outliers’ are values above the upper quartile + 3 × IQR or below the lower quartile - 3 × IQR.

Note: To compare a quantitative variable across categories, use Graph > Boxplot > With Groups. For example, Figure 2.8 can be reproduced from pennstate1. Enter RtSpan under Graph Variables: and Sex under Categorical variables for grouping (1-4, outermost first):.

Section 2.7

1. Drawing a histogram with a superimposed normal curve

Use women’s right handspans (picture not in text). The data are in the file pennstate1F. Click Graph > Histogram > With Fit and enter RtSpan in the Graph Variables: box. Clicking OK produces a histogram with a superimposed normal curve:

[Histogram of RtSpan with fitted normal curve: x-axis RtSpan, y-axis Frequency; legend: Normal, Mean 20.02, StDev 1.764, N 103]

Note: Because the histogram is skewed left, the normal curve does not follow the bars of the histogram very well.

2. Calculating standard deviation and variance

Use the small pulse rate data set from Section 2.7. Enter the numbers 62, 68, 74, 76 into a column of a worksheet. You can name that column x. From the main menu, click Stat > Basic Statistics > Display Descriptive Statistics and enter x in the Variables: box. Click the Statistics… button and select Standard deviation and Variance. Click OK in each dialog box to get your results.
Descriptive Statistics: x

Variable  StDev  Variance
x          6.32     40.00

Note: StDev is the standard deviation, s. The text rounds the standard deviation to 6.3.
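The StDev and Variance above can be checked by hand. The mean is 70, the squared deviations are 64, 4, 16 and 36, and 120/(4 - 1) = 40, so s = sqrt(40), about 6.32. The Python sketch below does this arithmetic. It also recomputes the SE Mean, Range, IQR and 1.5 × IQR limits from the Fastest summary in Section 2.6. The raw pennstate1M data are not reproduced here, so those inputs are copied from the Minitab output. The sketch uses only the standard library. Treating the "exclusive" method of statistics.quantiles as Minitab's quartile rule is an assumption.

```python
import math
import statistics

# Section 2.7: standard deviation and variance of the pulse rates.
pulse = [62, 68, 74, 76]

mean = sum(pulse) / len(pulse)                    # 70.0
squared_devs = [(x - mean) ** 2 for x in pulse]   # 64, 4, 16, 36
variance = sum(squared_devs) / (len(pulse) - 1)   # 120 / 3 = 40.0
sd = math.sqrt(variance)                          # about 6.32

# The standard library's sample (n - 1) versions give the same numbers.
assert math.isclose(variance, statistics.variance(pulse))
assert math.isclose(sd, statistics.stdev(pulse))

# Quartiles from (n + 1)-based positions with linear interpolation;
# assumed here to follow the same convention as Minitab's Q1 and Q3.
q1, median, q3 = statistics.quantiles(pulse, n=4, method="exclusive")

# Section 2.6: quantities derived from the Fastest summary output.
fastest = {"N": 87, "StDev": 17.43, "Minimum": 55.0,
           "Q1": 95.0, "Q3": 120.0, "Maximum": 150.0}

se_mean = fastest["StDev"] / math.sqrt(fastest["N"])   # about 1.87
data_range = fastest["Maximum"] - fastest["Minimum"]   # 95.0
iqr = fastest["Q3"] - fastest["Q1"]                    # 25.0
lower_limit = fastest["Q1"] - 1.5 * iqr                # 57.5
upper_limit = fastest["Q3"] + 1.5 * iqr                # 157.5

if __name__ == "__main__":
    print(f"pulse: variance {variance:.2f}, StDev {sd:.2f}, "
          f"quartiles {q1}, {median}, {q3}")
    print(f"Fastest: SE Mean {se_mean:.2f}, Range {data_range}, IQR {iqr}, "
          f"limits {lower_limit} and {upper_limit}")
```

The minimum speed of 55 mph lies below the lower limit of 57.5 mph. The maximum of 150 mph is inside the upper limit. This is consistent with the single low outlier in the boxplot of Fastest.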