A Collection of Demonstration Applets for Introductory Statistics

The following is a description of a collection of applets that help demonstrate some of the important ideas in introductory statistics. Most of the applets can be used for simple in-class demonstrations in classrooms with computer projection. In addition to the descriptions, teaching strategies are suggested for some applets. Three of the applets have an associated investigation activity that students can complete either in a lab setting or on their own.

Histograms and Bin Widths
http://www.rossmanchance.com/applets
Click on Histogram Bin Width.

Pick a data set from the drop-down menu. Use the red slider to change the bin width. Notice how changing the bin width can produce histograms that look substantially different. As a result, students should be cautioned against drawing conclusions about the shape of a distribution from a single histogram. New data can be graphed by clicking the Edit Data button, pressing Delete to clear the existing data, and entering new data values one by one.

Least Squares Regression Demonstration
http://www.dynamicgeometry.com/javasketchpad/gallery/pages/least_squares.php

This applet can be used to demonstrate the least-squares criterion and to help students see that only one line “best fits” the data. The six points labeled P1 through P6 represent data points. A line is drawn through the points, and from each data point a vertical segment is drawn to the line and a square is constructed on that segment. The red square in the lower right corner represents the total area of the six squares (i.e., the sum of the squared vertical distances). By clicking on the red dots labeled “y-intercept” and “slope”, students can adjust the y-intercept and slope of the line in an attempt to find the line that minimizes the total area of the squares. The line that yields the absolute minimum is the least squares regression line for the data.
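The least-squares criterion the applet illustrates can also be checked numerically. The short Python sketch below uses six hypothetical points standing in for P1 through P6 (the applet's actual coordinates are not given here). It computes the total area of the squares for any candidate line, finds the least-squares line from the standard formulas slope = Sxy/Sxx and intercept = ȳ − slope·x̄, and confirms that nudging either the intercept or the slope away from those values increases the total area.

```python
def sum_of_squares(xs, ys, intercept, slope):
    """Total area of the squares: sum of squared vertical distances to y = intercept + slope*x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Six hypothetical points standing in for P1-P6 (not the applet's actual coordinates).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 8]

a, b = least_squares_line(xs, ys)
best = sum_of_squares(xs, ys, a, b)
print(f"least-squares line: y = {a:.3f} + {b:.3f}x, total area = {best:.3f}")

# Moving the intercept or the slope away from the least-squares values
# always increases the total area of the squares.
for da, db in [(0.5, 0), (-0.5, 0), (0, 0.2), (0, -0.2)]:
    print(sum_of_squares(xs, ys, a + da, b + db) > best)  # True each time
```

The total area is a quadratic function of the intercept and slope with a single minimum, which is why only one line “best fits” the data.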
(This applet does not have an option to display the true least squares line.) It is also possible to move points P1 through P6 by clicking on the red dot on each square.

Regression by Eye
http://onlinestatbook.com/stat_sim/reg_by_eye/index.html
In general, you can get to this and other demonstration applets by going to http://onlinestatbook.com/rvls

Michael Legacy NCSSM Summer Writing Project 2007

This applet lets you estimate the regression line by placing the mouse at a starting position, holding the button down, and drawing a line. When you release the mouse button, the mean square error MSE (the average squared deviation of the points from the line) is displayed. You can draw another line and see whether you can lower the MSE. Each line you draw appears in a different color, as does its matching MSE. At any time, you can see the MSE for the best-fitting line by selecting the "Show Minimum MSE" option. To display the best-fitting line (by the least-squares criterion), select the "Draw regression line" option. This line is drawn in black.

Five possible values for the correlation are listed. Students can guess which one is the correlation for the data displayed in the scatterplot. To see the correct value, click the "Show r" button. Click the "New Data" button to try again with a new data set.

Teaching strategy: Let students attempt to draw the least squares line; they often begin by drawing a line along the major axis of the ellipse formed by the data. Repeated attempts show that making the line less steep, so that it better tracks the mean y-value for each x-value, reduces the MSE.

Demonstrating Influential Points
http://www.stat.sc.edu/~west/javahtml/Regression.html

The applet is designed to help students see how adding (or moving) a point can affect the regression line. Points whose addition or movement substantially changes the line are called influential. The original points are shown along with their least-squares regression line and correlation (in black).
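To see numerically how a single added point can move the regression line, here is a minimal Python sketch using a small hypothetical data set (not the applet's points). The helper names fit and leverage are hypothetical. fit returns the least-squares intercept, slope, and correlation; leverage uses the standard simple-regression formula h = 1/n + (x − x̄)²/Sxx, which depends only on the x-values.

```python
import math


def fit(points):
    """Least-squares (intercept, slope) and correlation r for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / math.sqrt(sxx * syy)


def leverage(points):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx for each point (x-values only)."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return [1 / n + (x - x_bar) ** 2 / sxx for x, _ in points]


# Hypothetical "original" data, roughly y = 2x (not taken from the applet).
base = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
a0, b0, r0 = fit(base)

# One hypothetical added point per scenario.
scenarios = {
    "center of x-range, far above the line": (3, 12),
    "edge of x-range, far from the line": (9, 5),
    "on the line, outside the x-range": (10, a0 + b0 * 10),
}

print(f"original: y = {a0:.2f} + {b0:.2f}x, r = {r0:.3f}")
for label, new_point in scenarios.items():
    a, b, r = fit(base + [new_point])
    print(f"{label}: y = {a:.2f} + {b:.2f}x, r = {r:.3f}")

# The far-out point (x = 9) has by far the largest leverage.
print([round(h, 3) for h in leverage(base + [(9, 5)])])
```

With these numbers, the point added at the center of the x-range leaves the slope unchanged and only shifts the line upward. The point far out in x pulls the slope sharply down. The point placed on the existing line beyond the data leaves the line unchanged while increasing r.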
A new point can be added to the plot by clicking the mouse button anywhere within the graph. The resulting changes to the line, the regression equation, and the correlation are shown in red.

Teaching strategy: Let students experiment with adding points (or add the points yourself in a classroom setting) and have them summarize the effect on the slope and the correlation when points are added near the center of the data range versus far from the center in the x-direction. Students should see that adding a point near the regression line barely changes the existing line. Next, add a point near the center of the data range (in the x-direction) but fairly far above the existing line. Notice that the line moves upward but stays roughly parallel to the existing line. Then add a point at the edge of the data (near the minimum or maximum x-value) and far from the line. This additional point can substantially change the slope of the regression line. Finally, have students add a point in line with the existing regression line but outside the data range. Does the correlation increase or decrease with the addition of this point? Does the slope of the regression line change substantially? [The slope should not change much, but the correlation should increase.]

Scatterplots, Correlation, and Influential Points (leverage)
http://noppa5.pc.helsinki.fi/koe/flash/corr/ch16i.html

This applet lets you vary several features of a scatterplot. The left slider sets the sample size, and the right slider sets the correlation. Note: It is very useful to check the rollover help box, which provides information about the sliders and points. Placing the pointer on a point, for example, displays its coordinates. The coordinates of all points have been standardized as z-values.

Some suggested explorations:
1. Set the sample size slider at 50 to start and click New Sample. Move the correlation slider up and down to watch the data set change. Vary the sample size.
2. Move the correlation slider to some value, say 0.8, and repeatedly click the New Sample button. Note that many different samples can have the same correlation.
3. Click on a point and drag it. Note how the correlation slider moves to reflect the change. The correlation value is given above the graph.
4. Check the Leverage box below the graph. Leverage measures how much potential influence a point has on the location of the line; points drawn with larger circles are more influential. As you move points farther out in the x-direction, they become more influential. A leverage value is given in the rollover box for each point. This value depends on the x-values only, not on the y-values. (Students do not need to know how to calculate it.)

Probability (long-run relative frequency)
http://bcs.whfreeman.com/pbs/
Click on Statistical Applets and then Probability.

When you toss a coin, there are only two possible outcomes, heads or tails. In this applet you can toss a coin many times and observe the long-run behavior of the proportion of heads. Set the number of tosses and click the Toss button to toss the coin. The vertical heads/tails bar shows the cumulative proportions, and the graph shows the cumulative proportion of heads after each toss. Notice how the proportion of heads (or tails) can be quite variable at first but settles down near the true probability as the number of tosses grows. Note as well that there can be long runs of one outcome; for example, a fairly long run of heads is not unusual in 1000 tosses of a coin. You can also change the probability of getting heads by entering a new value and clicking Reset.
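The same long-run experiment can be sketched in a few lines of Python. This is not the applet's code. It simulates fair-coin tosses with a fixed, arbitrary seed, prints the cumulative proportion of heads at several checkpoints, and reports the longest run of heads. This lets students compare the early variability with the later stability and see that long runs arise naturally.

```python
import random


def simulate_tosses(n_tosses, p_heads=0.5, seed=None):
    """Return a list of outcomes: True for heads, False for tails."""
    rng = random.Random(seed)
    return [rng.random() < p_heads for _ in range(n_tosses)]


def cumulative_proportions(outcomes):
    """Proportion of heads after each toss (what the applet's graph plots)."""
    props, heads = [], 0
    for i, is_head in enumerate(outcomes, start=1):
        heads += is_head
        props.append(heads / i)
    return props


def longest_head_run(outcomes):
    """Length of the longest run of consecutive heads."""
    best = current = 0
    for is_head in outcomes:
        current = current + 1 if is_head else 0
        best = max(best, current)
    return best


tosses = simulate_tosses(1000, seed=2007)   # arbitrary seed for reproducibility
props = cumulative_proportions(tosses)
for k in (10, 50, 100, 500, 1000):
    print(f"after {k:4d} tosses: proportion of heads = {props[k - 1]:.3f}")
print("longest run of heads:", longest_head_run(tosses))
```

Changing p_heads plays the same role as entering a new probability of heads in the applet.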
Using a Normal Approximation for the Binomial Distribution
http://www.whfreeman.com/tps3e
Click on Statistical Applets and then Central Limit Theorem Normal Approximation to Binomial Distributions

As n increases, the binomial distribution with n trials and probability of success p can be approximated better and better by a normal distribution. This applet uses sliders to change both n and p; click and drag a slider with the mouse. The histogram shows the binomial probabilities. The vertical black line marks the mean µ = np. The red curve is the normal density curve with the same mean and standard deviation as the B(n, p) distribution (µ = np and σ = √(np(1 − p))). Use the slider to try both low and high values of p.

Teaching strategy: Let students investigate various values of n and p. What values of n and p yield a binomial distribution that can be reasonably approximated by a normal distribution? Students should notice that the binomial probability distribution becomes more skewed as p moves farther from 0.5. For the binomial probability histogram to look more like a normal curve, students should discover that n must increase as p becomes more extreme. This activity helps explain why the large-sample condition for inference on proportions requires verifying that np > 10 and n(1 − p) > 10. Note that some books use 5 or 15 instead of 10.

Investigating Sampling Distributions
This investigation directs students to explore various aspects of sampling distributions using the Rice Virtual Lab in Statistics simulation at www.onlinestatbook.com/rvls. See the activity titled Investigations of sample statistic distributions using an on-line simulator.

Investigating a Sampling Distribution of Sample Proportions
http://www.rossmanchance.com/applets
Click on Reese's Pieces.

This applet lets you build a sampling distribution of sample proportions using Reese's Pieces.
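The sampling distribution this applet builds can also be simulated directly. In the minimal Python sketch below, theta stands for θ, the proportion of orange candies in the population (set here to the hypothetical value 0.6). Each sample contains n candies, and 2000 samples are drawn for each n. The sketch compares the mean and standard deviation of the simulated p̂ values with θ and √(θ(1 − θ)/n). It also reports whether the np > 10 and n(1 − p) > 10 condition from the binomial discussion holds.

```python
import math
import random
import statistics


def sample_proportions(theta, n, num_samples, seed=None):
    """Draw num_samples samples of n candies; return the proportion orange in each."""
    rng = random.Random(seed)
    return [sum(rng.random() < theta for _ in range(n)) / n for _ in range(num_samples)]


theta = 0.6   # hypothetical population proportion of orange candies
for n in (10, 25, 100):
    p_hats = sample_proportions(theta, n, 2000, seed=1)
    # Large-sample condition in the form used above (np > 10 and n(1 - p) > 10).
    large_enough = n * theta > 10 and n * (1 - theta) > 10
    print(f"n = {n:3d}: mean p-hat = {statistics.mean(p_hats):.3f}, "
          f"SD = {statistics.stdev(p_hats):.3f}, "
          f"theory SD = {math.sqrt(theta * (1 - theta) / n):.3f}, "
          f"condition met: {large_enough}")
```

As n grows, the simulated standard deviation shrinks in step with √(θ(1 − θ)/n), and the distribution of p̂ becomes easier to approximate with a normal curve.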
The parameter θ represents the proportion of orange candies in the population of candies in the machine. To start, set θ = 0.6, the sample size to n = 10, and num samples (the number of samples to draw) to 1. Check the Animate box and then click the Draw Samples button. Ten candies will fall from the machine into the bins below, sorted by color. The red p̂ is the proportion of the sample that is orange; this sample proportion is also shown on the dotplot. Click Draw Samples a few more times to begin building the sampling distribution of sample proportions. (Animation slows the procedure down so that students can actually see the distribution being built.) Change num samples to 10 and click Draw Samples. Once students are clear on the process, deselect the Animate box and change num samples to 50. Also select the plot normal curve box. Click Draw Samples several times. How closely is the sampling distribution of sample proportions approximated by the normal curve?

Teaching strategy: Vary the investigation with different values of n and θ. Click Reset each time you want to clear the sampling distribution dotplot. Have students determine which values of n and θ yield a sampling distribution that is approximately normal.

Understanding Confidence Level
http://www.whfreeman.com/tps3e
Click on Statistical Applets and then Confidence Interval.

Note: Students should understand the concept of a sampling distribution before they are introduced to confidence intervals. The two previous investigations are good activities for developing that understanding.

This applet can be used to develop a basic understanding of the confidence level of a confidence interval for the mean (µ) of a population. Set the desired confidence level by clicking the radio buttons to the left of the plot. Start with 80%. Click the Sample button to draw a single SRS.
Do this a few times. Each click represents drawing one sample. The dot marks the sample mean, which is the center of the interval, and the lines on each side of the dot span the confidence interval. Emphasize that each sample drawn from the population produces its own x̄ along with an interval estimate of the population mean µ, and that a different sample will yield a different estimate of µ. Note that once in a while an interval “misses” the mean µ; such a miss shows up in red. The accumulator on the right-hand side records the total number of samples drawn and the number of times the interval contains the population mean µ.

Click the Sample 50 button enough times to accumulate a few hundred samples. Note that the percentage of “hits” hovers around 80%. In this example, in repeated sampling, about 80% of the samples drawn yield an interval that contains the population mean.

Caution: Students are tempted to phrase this as a probability about a particular interval. Not so! Once an interval has been computed, it either contains µ or it does not, so the probability that it is correct is either 0 or 1. Since µ is unknown, there is no way of knowing which is the case. That is why the result is expressed as a level of confidence in the estimate.

Have students conjecture about the effect of increasing the confidence level to 90%, 95%, and 99%. Will the true mean be captured more often, less often, or equally often? Using the same screen, click through the radio buttons, increasing the confidence level as you go, and observe how the lengths of the confidence intervals change. Have students work through the previous exercise using different confidence levels.
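The repeated-sampling interpretation can be checked with a short simulation. The Python sketch below assumes a normal population with known σ and uses hypothetical values µ = 0, σ = 1, and samples of size n = 20; the applet's own settings may differ. For each sample it builds the z-interval x̄ ± z*·σ/√n, then reports the fraction of intervals that contain µ and the interval width at each confidence level. The critical value z* comes from statistics.NormalDist in the Python standard library (Python 3.8 or later).

```python
import random
from statistics import NormalDist


def coverage(conf_level, mu=0.0, sigma=1.0, n=20, num_intervals=2000, seed=None):
    """Fraction of z-intervals x_bar +/- z* sigma/sqrt(n) that contain mu, and their width."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.5 + conf_level / 2)   # about 1.28 for 80%
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(num_intervals):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        hits += (x_bar - margin <= mu <= x_bar + margin)
    return hits / num_intervals, 2 * margin


for level in (0.80, 0.90, 0.95, 0.99):
    cov, width = coverage(level, seed=2007)
    print(f"{level:.0%} intervals: {cov:.1%} contain mu, width = {width:.3f}")
```

Higher confidence levels capture µ more often, but the intervals become wider.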
Investigating Errors and Power in Significance Tests for Means (σ is known)
Starting with a problem situation about a packaging machine, students simulate the probabilities of committing Type I and Type II errors using a power applet at http://wise.cgu.edu/power_applet.asp. Students also investigate how the power of a test changes with the α-level, the sample size, and the difference between the hypothesized mean and the true mean. See the activity Investigating Errors and Power in Significance Tests for Means (σ is known).

Sampling Regression Lines Activity
This investigation lets students examine the sampling distribution of sample slopes in linear regression. The applet first lets the student look at possible sample slopes one at a time. The student then draws 100 samples to examine the sampling distribution of sample slopes. The student varies the sample size, the population standard deviation σ, and the spread of the x-values to determine their effect on the variability of the sample slopes. See the activity Sampling Regression Lines Using an On-Line Simulation.
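The quantities behind these two activities can also be sketched in Python. The first function computes the power of a one-sided z-test of H0: µ = µ0 against Ha: µ > µ0 with σ known, using power = 1 − Φ(z* − (µa − µ0)√n/σ), where z* is the upper-α critical value. The one-sided form and all the numbers are assumptions, since the packaging-machine problem's details are in the separate activity. At µa = µ0 this function returns α, the Type I error rate, and 1 − power is the Type II error rate. The second function simulates the standard deviation of least-squares slopes for a hypothetical true line with evenly spaced x-values and compares it with the theoretical value σ/√Sxx.

```python
import math
import random
import statistics


def power_z_test(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 vs Ha: mu > mu0 (sigma known)."""
    z_star = statistics.NormalDist().inv_cdf(1 - alpha)   # reject H0 when z >= z_star
    shift = (mu_a - mu0) * math.sqrt(n) / sigma            # mean of z when mu = mu_a
    return 1 - statistics.NormalDist().cdf(z_star - shift)


def slope_sd(n, sigma, x_spread, num_samples=1000, beta0=1.0, beta1=2.0, seed=None):
    """Simulated SD of least-squares slopes, plus the theoretical value sigma/sqrt(Sxx).

    Hypothetical model: y = beta0 + beta1*x + error, error ~ N(0, sigma),
    with n evenly spaced x-values covering [0, x_spread].
    """
    rng = random.Random(seed)
    xs = [x_spread * i / (n - 1) for i in range(n)]
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slopes = []
    for _ in range(num_samples):
        ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
        y_bar = sum(ys) / n
        slopes.append(sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx)
    return statistics.stdev(slopes), sigma / math.sqrt(sxx)


# Power rises with sample size (hypothetical mu0 = 16, mu_a = 16.5, sigma = 1).
for n in (5, 10, 20, 40):
    print(f"n = {n:2d}: power = {power_z_test(16, 16.5, 1, n):.3f}")

# Doubling n, sigma, or the x-spread (one at a time) from a baseline.
for n, sigma, spread in [(10, 1, 10), (20, 1, 10), (10, 2, 10), (10, 1, 20)]:
    sim, theory = slope_sd(n, sigma, spread, seed=42)
    print(f"n={n}, sigma={sigma}, spread={spread}: simulated SD {sim:.4f}, theory {theory:.4f}")
```

In the slope simulation, a larger sample size or a wider spread of x-values makes the sample slopes less variable, while a larger σ makes them more variable.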