Using JMP Scripts in Introductory Statistics*

Amy G. Froelich, Iowa State University
William M. Duckworth, Creighton University

Concepts from Introductory Statistics:
• Relationship Between Mean and Median
• Normal Quantile Plots
• Regression and Residual Plots
• Sampling Distributions
• Central Limit Theorem
• Normal Distribution vs. t Distribution
• Confidence Intervals
    • Variability
    • Relationship Between Sample Size and Width of CI
    • Connection Between Coverage Rates and Confidence Level
• Hypothesis Testing
    • Connections Between CIs and Two-Sided Testing
    • Connections Between Rejection Rates and α
    • Relationships Among Testing Conditions and Power
        • Sample Size
        • Alpha Level
        • Difference Between True and Hypothesized Parameter
• Contingency Table
    • Test for Two Proportions
    • Fisher’s Exact Test

Advantages over Java applets available on web:
• Script output in same format as JMP data analysis output.
• Flexibility to create script to match and expand different activities.
• Internet access not required.

Disadvantages over Java applets available on web:
• Programming knowledge of JMP scripting language required.
• Dependent upon JMP platform.

Some Resources for JMP Scripts:
• JMP Scripting Library
  http://www.jmp.com/support/downloads/jmp_scripting_library/
• Statistics Education Materials Repository at Iowa State University
  http://stated.stat.iastate.edu/
• Commercially Available Scripts
  Predictum Management Sciences (www.predictum.com)

*This material is based upon work supported by the National Science Foundation under Grant No. 0231322.

Inference for the Mean

Hands-on Activity:

Population of 200 Female Heights

     0   1   2   3   4   5   6   7   8   9
00  68  67  72  74  63  68  67  68  68  65
01  67  64  63  67  66  64  67  68  69  66
02  70  69  62  60  65  66  67  64  63  72
03  69  66  73  62  68  72  66  69  66  64
04  64  67  65  69  67  68  61  63  65  70
05  66  66  68  69  62  69  66  62  61  65
06  64  68  62  67  62  65  66  66  61  62
07  62  63  68  64  58  66  71  67  66  67
08  65  64  64  62  64  72  63  71  68  66
09  62  63  68  65  64  67  62  64  64  63
10  67  68  70  64  70  67  63  68  70  63
11  67  67  65  72  70  64  62  66  66  70
12  68  64  70  65  64  69  61  66  62  67
13  62  65  65  60  63  61  67  67  64  65
14  67  65  67  61  63  66  71  64  60  61
15  65  67  65  66  65  63  64  69  69  66
16  60  71  69  62  60  67  66  68  62  67
17  70  67  60  70  63  70  67  61  65  64
18  63  63  64  66  65  66  65  64  68  69
19  65  60  70  62  67  68  63  65  63  63

Random samples from this population
• Sample sizes = 10 and 20
• Two samples of each size per group
• Sample Mean Height
• Calculate 90% CIs for Population Mean Height
• Conduct Hypothesis Test for Population Mean Height Under True Null Hypothesis (α = 0.1)

Learning Outcomes
• Discover variability of CI
• Discover effect of sample size on CI width
• Hypothesize about meaning of confidence
• Hypothesize about Type I error and alpha level

Confidence Interval Script:

Replicates Hands-on Activity
• Sample from Larger Population
• 80%, 90%, 95% CIs
• Population Mean Height of Females

Example Coverage Rates
• 80% CI: 84/100
• 90% CI: 91/100
• 95% CI: 95/100

[Figure: 100 95% CIs for the Population Mean Height, plotted by Rows. 95 out of 100 CIs Contain the True Population Mean Height.]

Hypothesis Testing Script:

Type I error
Replicates Hands-on Activity
Ho: μ = μTRUE vs. Ha: μ ≠ μTRUE
Vary:
• sample size (5, 25, 50)
• alpha level (0.1, 0.05, 0.01)

[Figure: Histogram (Count) of 100 z-test statistics with sample size = 25 and α = 0.05. 4 out of 100 z-test statistics will reject Ho.]

Power
Ho: μ = μFALSE vs. Ha: μ ≠ μFALSE
Vary:
• sample size (5, 25, 50)
• alpha level (0.1, 0.05, 0.01)
• Value of μFALSE

[Figure: Histogram (Count) of 100 z-test statistics with sample size = 25 and α = 0.05. 29 out of 100 z-test statistics will reject Ho.]

Inference for the Proportion

Hands-on Activity:

Population of 200 Eye Colors

[Table: Population of 200 Eye Colors, rows 00-19 by columns 0-9; each entry is Blue, Brown, Green, Hazel, or Other.]

Random samples from this population
• Sample sizes = 10 and 20
• Two samples of each size per group
• Proportion of each sample with Blue Eyes
• Calculate 90% confidence intervals for Proportion in population with Blue Eyes

Learning Outcomes
• Discover variability of CI
• Discover effect of sample size on CI width
• Hypothesize about meaning of confidence

Confidence Interval Script:

Replicates Hands-on Activity
• Sample from Larger Population
• 90%, 95%, 99% CI
• Proportion in Population with Blue Eye Color

Example Coverage Rates
• 89/100: 90% CI
• 97/100: 95% CI
• 98/100: 99% CI

[Figure: 100 90% Confidence Intervals for Proportion of Population with Blue Eye Color, plotted by Rows. 89 of the 100 Confidence Intervals Contain the True Proportion of Population with Blue Eye Color.]

Plus 4 Method Confidence Interval Script:

• Sample from Larger Population
• Sample Size = 10
• 95% CI for Proportion in Population with Hazel Eye Color

Compare Two Methods
• Traditional
• Plus 4 Method

Example Coverage Rates
• Traditional: 81/100
• Plus 4 Method: 91/100

[Figure: 100 95% Traditional CIs for Proportion of Population with Hazel Eye Color, plotted by Rows. 81 of the 100 Traditional CIs Contain the True Proportion of Population with Hazel Eye Color.]

[Figure: 100 95% Plus 4 Method CIs for Proportion of Population with Hazel Eye Color, plotted by Rows. 91 of the 100 Plus 4 Method CIs Contain the True Proportion of Population with Hazel Eye Color.]

Randomization in the Design of Experiments

Hands-on Activity*: Comparison of Mean Yields of Two Corn Varieties

Convenience Assignment

A 130  A 149  A 139  B 155  B 137  B 145
A 149  A 133  A 152  B 131  B 147  B 136
A 141  A 156  A 137  B 146  B 132  B 148
A 150  A 142  A 155  B 136  B 152  B 133
A 139  A 155  A 139  B 147  B 137  B 153
A 155  A 138  A 150  B 137  B 145  B 136

No significant difference in mean yields between two varieties.

Alternating Assignment

A 130  B 137  A 139  B 155  A 149  B 145
B 137  A 133  B 140  A 143  B 147  A 148
A 141  B 144  A 137  B 146  A 144  B 148
B 138  A 142  B 143  A 148  B 152  A 145
A 139  B 143  A 139  B 147  A 149  B 153
B 143  A 138  B 138  A 149  B 145  A 148

No significant difference in mean yields between two varieties.

The Importance of Random Assignment:

The “True” Yields Per Plot for Each Variety

A=130 B=118   A=149 B=137   A=139 B=127   A=167 B=155   A=149 B=137   A=157 B=145
A=149 B=137   A=133 B=121   A=152 B=140   A=143 B=131   A=159 B=147   A=148 B=136
A=141 B=129   A=156 B=144   A=137 B=125   A=158 B=146   A=144 B=132   A=160 B=148
A=150 B=138   A=142 B=130   A=155 B=143   A=148 B=136   A=164 B=152   A=145 B=133
A=139 B=127   A=155 B=143   A=139 B=127   A=159 B=147   A=149 B=137   A=165 B=153
A=155 B=143   A=138 B=126   A=150 B=138   A=149 B=137   A=157 B=145   A=148 B=136

Variety A > Variety B by 12 bushels in each plot.

One Random Assignment of Varieties to Plots

B 118  A 149  A 139  A 167  A 149  A 157
B 137  B 121  B 140  A 143  B 147  B 136
B 129  B 144  A 137  B 146  B 132  A 160
B 138  A 142  A 155  B 136  A 164  A 145
A 139  A 155  A 139  A 159  B 137  A 165
B 143  B 126  B 138  B 137  B 145  A 148

Significant difference in mean yields between two varieties.

Hypothesis Testing Script**:

Replicates Hands-on Activity
• Random Assignments of Varieties to Plots
• Distribution of Sample Mean Differences Between Varieties
• Number of Rejections of Null Hypothesis of Equal Means
• Vary:
    • alpha level (0.05, 0.01)
    • true difference between Varieties A and B

Example Rejection Rates (α = 0.05)
• 99/100: True Difference = 12
• 43/100: True Difference = 6
• 13/100: True Difference = 3

[Figure: Histogram (Count) of 100 t-test statistics when true difference = 12 bushels. 99 out of 100 t-test statistics will reject Ho.]

[Figure: Histogram (Count) of 100 t-test statistics when true difference = 6 bushels. 43 out of 100 t-test statistics will reject Ho.]

[Figure: Histogram (Count) of 100 t-test statistics when true difference = 3 bushels. 13 out of 100 t-test statistics will reject Ho.]

* Original Activity Developed by W. Robert Stephenson and Hal Stern. See their article in STATS, Spring 2000, No. 28, 23-27.
** Programming Assistance provided by Mark Bailey, SAS Institute, Inc.
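
The comparison of Traditional and Plus 4 Method intervals for a sample of size 10 can also be checked without simulation. The sketch below is written in Python rather than JSL and is not the poster's script. It assumes the usual textbook definitions: the traditional interval is p̂ ± z*·sqrt(p̂(1 − p̂)/n), and the Plus 4 interval adds two successes and two failures before applying the same formula. It also assumes that sampling from the larger population is well approximated by a binomial model. The true proportion p = 0.15 is a hypothetical value chosen for illustration; the poster does not state the proportion with Hazel eye color.

```python
from math import comb, sqrt
from statistics import NormalDist


def traditional_interval(x, n, z):
    """Traditional interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def plus4_interval(x, n, z):
    """Plus 4 interval: add 2 successes and 2 failures, then use the same formula."""
    p_tilde = (x + 2) / (n + 4)
    half_width = z * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - half_width, p_tilde + half_width


def exact_coverage(interval, n, p, level=0.95):
    """Probability that the interval contains p when X ~ Binomial(n, p).

    Sums the binomial probability of every count x whose interval covers p.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    coverage = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n, z)
        if lower <= p <= upper:
            coverage += comb(n, x) * p**x * (1 - p) ** (n - x)
    return coverage


# Hypothetical setting: n matches the poster, p is an assumed value.
n, p = 10, 0.15
traditional_cov = exact_coverage(traditional_interval, n, p)
plus4_cov = exact_coverage(plus4_interval, n, p)
print(f"Traditional coverage: {traditional_cov:.3f}")
print(f"Plus 4 coverage:      {plus4_cov:.3f}")
```

Under these assumptions the traditional interval covers p about 79% of the time and the Plus 4 interval about 95% of the time, which is the same direction as the example coverage rates of 81/100 and 91/100. The traditional method loses most of its coverage when the sample contains no successes, because the interval collapses to the single point 0.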