Demonstration Applets for Introductory Statistics

A Collection of Demonstration Applets for Introductory Statistics
The following is a description of a collection of applets that help demonstrate some of the
important ideas in introductory statistics. Most applets can be used for simple in-class
demonstrations in classrooms with computer projection capability. In addition to the
descriptions, teaching strategies have been suggested for some applets. Three of the
applets have an associated investigation activity that can be completed by students either
in a lab setting or working on their own.
Histograms and Bin Widths
http://www.rossmanchance.com/applets
Click on Histogram Bin Width
Pick a data set from the drop-down menu. Use the red slider to manipulate the bin width.
Notice how changing the bin width can result in substantially different-looking
histograms. As a result, students should be cautioned about drawing conclusions
concerning the shape of a distribution based on a single histogram alone.
New data can be graphed by clicking the Edit Data button, pressing delete to clear the
existing data, and entering new data values one by one.
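The effect of bin width can also be reproduced away from the applet. The following is a minimal Python sketch, not the applet's own code; the data set and the function name bin_counts are made up for illustration. It prints a text histogram of the same values at three bin widths.

```python
from collections import Counter

def bin_counts(data, bin_width, start=0.0):
    """Count the values in each bin [start + k*w, start + (k+1)*w)."""
    counts = Counter(int((x - start) // bin_width) for x in data)
    return {start + k * bin_width: counts[k] for k in sorted(counts)}

# Hypothetical data set (not one of the applet's data sets).
data = [1, 2, 2, 3, 3, 3, 4, 9, 10, 10, 11, 11, 11, 12]

for width in (1, 4, 8):
    print(f"bin width {width}")
    for left_edge, count in bin_counts(data, width).items():
        # Empty bins are simply not listed.
        print(f"  [{left_edge:>4.0f}, {left_edge + width:>4.0f}) {'*' * count}")
```

With the widest bins these values look like two equal bars, while narrower bins show two separate clusters, which is the caution the applet is meant to illustrate.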
Least Squares Regression Demonstration
http://www.dynamicgeometry.com/javasketchpad/gallery/pages/least_squares.php
This applet can be used to demonstrate the least-squares criterion and to help students
visualize how only one line “best fits” the data. The six points labeled P1 through P6
represent data points. A line is drawn through the points, and from each data point a
vertical segment is drawn to the line and a square is constructed. The red square in the
lower right corner represents the total area of the six squares (i.e., the sum of the squared
vertical distances). By clicking on the red dots labeled “y-intercept” and “slope”, the
student can adjust the y-intercept and slope of the line in an attempt to find the one that
minimizes the sum of the areas of the squares. The line that yields the absolute minimum
is the least squares regression line for the data. (This applet does not have the option to
display the true least squares line.)
It is also possible to move the points P1 through P6 by clicking on the red dot on the square.
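Since the applet does not display the true least squares line, students' best attempts can be checked by hand or with a short program. The following is a minimal Python sketch under stated assumptions: the six points are hypothetical (they are not the applet's P1 through P6), and the function names are illustrative. It computes the sum of squared vertical distances for a few candidate lines and for the line given by the standard closed-form least squares formulas.

```python
def sum_of_squares(points, intercept, slope):
    """Total area of the squares: the sum of squared vertical distances."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

def least_squares_line(points):
    """Closed-form intercept and slope that minimize sum_of_squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Six hypothetical data points.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 7)]

best_intercept, best_slope = least_squares_line(points)
best_total = sum_of_squares(points, best_intercept, best_slope)
print(f"least squares: y = {best_intercept:.3f} + {best_slope:.3f}x, total = {best_total:.3f}")

# Candidate lines a student might try by dragging the two controls.
candidates = [(0.0, 1.0), (2.0, 0.8), (1.0, 1.2)]
for intercept, slope in candidates:
    total = sum_of_squares(points, intercept, slope)
    print(f"candidate:     y = {intercept:.3f} + {slope:.3f}x, total = {total:.3f}")
```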
Regression by Eye
http://onlinestatbook.com/stat_sim/reg_by_eye/index.html
In general, you can get to this and other demonstration applets by going to
http://onlinestatbook.com/rvls
Michael Legacy
NCSSM Summer Writing Project 2007
This applet lets you estimate the regression line by pressing the mouse button at a starting
position, holding it down, and drawing a line. When you release the mouse button, the
mean square error (MSE), which is the average squared deviation of the points from the line,
is displayed. You can draw another line and see if you can lower the MSE. Each line
you draw is in a different color, as is the matching MSE. At any time, you can see the
MSE for the best-fitting line by selecting the "Show Minimum MSE" option.
To see the best-fitting line displayed (by the criterion of least squares), select the "Draw
regression line" option. This line will be in black.
Five possible values for the correlation are listed. Students can guess which one of them
is the correlation for the data displayed in the scatterplot. To see the correct value, click
on the "Show r" button.
Click the "New Data" button to try again with a new data set.
Teaching strategy: Let students attempt to draw the least squares line; often they will
begin by drawing a line that approximates the major axis of the ellipse formed by the
data. Repeated efforts demonstrate that making the line less steep so that it better
approximates the mean y-value for each x-value reduces the MSE.
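The point of this teaching strategy can be checked numerically. The sketch below is an illustration in Python, not part of the applet; the data are hypothetical, and the "SD line" (slope equal to the ratio of the standard deviations) is used here as a stand-in for the major axis that students tend to draw, which is an assumption. The least squares slope is r times that ratio, so it is less steep whenever the correlation is not perfect, and its MSE is lower.

```python
import math

def mse(points, intercept, slope):
    """Average squared vertical deviation of the points from the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points) / len(points)

# Hypothetical scatterplot data.
points = [(1, 3), (2, 1), (3, 4), (4, 3), (5, 6), (6, 4), (7, 7), (8, 5)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in points) / n)
sd_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in points) / n)
r = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n * sd_x * sd_y)

sd_slope = sd_y / sd_x        # steep guess along the long axis of the cloud
ls_slope = r * sd_y / sd_x    # least squares slope, less steep when |r| < 1

# Both lines pass through the point of means.
mse_sd = mse(points, mean_y - sd_slope * mean_x, sd_slope)
mse_ls = mse(points, mean_y - ls_slope * mean_x, ls_slope)

print(f"r = {r:.3f}")
print(f"SD line:            slope = {sd_slope:.3f}, MSE = {mse_sd:.3f}")
print(f"least squares line: slope = {ls_slope:.3f}, MSE = {mse_ls:.3f}")
```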
Demonstrating Influential Points
http://www.stat.sc.edu/~west/javahtml/Regression.html
The applet is designed to help students visualize how adding (or moving) a point can
affect the regression line. Points whose presence substantially changes the line are called
influential. Original points are
given along with the resulting least-squares regression line and the correlation (in black).
A new point may be added to the plot by clicking the mouse button somewhere within the
graph space. Resulting changes to the line, the regression equation, and correlation are
given in red.
Teaching strategy: Let students play with adding points (or you add points if you are
doing this in a classroom setting) and have them summarize the results (effect on slope
and correlation) when points are added near the center of the data range versus when
points are added far from the center in the x-direction. The students should see that
adding a point near the regression line barely changes the existing line. Now add a point
near the center of the data range (in the x-direction), but fairly far above the existing line.
Notice that the line moves upward but stays somewhat parallel to the existing line. Now
add a point at the edge of the data (in the region of minimum and maximum x-values) and
far from the line. This additional point can substantially change the slope of the
regression line.
Also have students add a point in line with the existing regression line, but outside the
data range. Does the correlation become stronger or weaker with the addition of this point? Does
the slope of the regression line change substantially? [The slope should not change much
but the correlation should become stronger.]
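The three situations described in this teaching strategy can be reproduced with a short calculation. The following Python sketch is illustrative only: the five original points and the three added points are hypothetical, and the function name fit is not taken from the applet. It refits the least squares line after each single point is added and reports the slope, intercept, and correlation.

```python
import math

def fit(points):
    """Return (slope, intercept, r) for the least squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)

# Hypothetical original data.
original = [(1, 1.2), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]
slope0, intercept0, r0 = fit(original)
print(f"original: slope = {slope0:.3f}, intercept = {intercept0:.3f}, r = {r0:.3f}")

# One added point per scenario from the teaching strategy.
added_points = {
    "near the center, far above the line": (3, 8),
    "at the edge of the x-range, far from the line": (5, 0),
    "in line with the fit, outside the x-range": (10, intercept0 + slope0 * 10),
}
results = {label: fit(original + [point]) for label, point in added_points.items()}
for label, (slope, intercept, r) in results.items():
    print(f"{label}: slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.3f}")
```

For these made-up values, the point above the center shifts the line upward without changing its slope, the point at the edge changes the slope substantially, and the in-line point leaves the line alone while strengthening the correlation.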
Scatterplots, Correlation, and Influential Points (leverage)
http://noppa5.pc.helsinki.fi/koe/flash/corr/ch16i.html
This applet lets you vary the sample size, the correlation, and the positions of individual
points. The left slider allows you to choose the sample size and the right slider lets you
select the correlation. Note: It is very useful to check the rollover help box, which provides
information about the sliders and points. Hovering the pointer over a point, for example,
displays its coordinates. Note that the coordinates of all points have been standardized as
z-values.
Some suggested explorations:
1. Set the sample size slider at 50 to start. Click new sample. Move the correlation
slider up and down to watch the changes in the data set. Vary the sample sizes.
2. Move the correlation slider to some value, say 0.8. Repeatedly click the new
sample button. Note that lots of different samples can have the same correlation.
3. Click on a point and drag it. Note how the correlation slider moves to reflect the
change. The correlation value is given above the graph.
4. Check the Leverage box below the graph. The leverage indicates how much
potential influence a point has on the location of the line. Note that points with larger
circles are more influential. As you move points farther out in the x-direction,
they become more influential. A leverage value is given in the rollover box for
each point. This value is a function of the x-location. (It is not important to know
how to calculate this value.)
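Students do not need the calculation, but instructors who are curious can see why leverage depends only on the x-location. The Python sketch below assumes the applet reports the usual leverage for simple linear regression, 1/n plus the squared distance of x from the mean of x divided by the sum of squared x-deviations; that the applet uses exactly this formula is an assumption, and the x-values are hypothetical.

```python
def leverages(xs):
    """Simple-regression leverage for each x: 1/n + (x - mean)^2 / Sxx."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Hypothetical standardized x-values; the last point sits far out in the x-direction.
xs = [-1.5, -0.5, 0.0, 0.4, 0.6, 3.0]

for x, h in zip(xs, leverages(xs)):
    print(f"x = {x:>5.1f}   leverage = {h:.3f}")
```

The y-values never enter the calculation, and the point farthest from the mean of x receives the largest value.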
Probability (long-run relative frequency)
http://bcs.whfreeman.com/pbs/
Click on Statistical Applets and then Probability
When you toss a coin, there are only two possible outcomes, heads or tails. In this applet
you can toss a coin multiple times and observe the long run behavior of the proportion of
heads. Set the number of tosses and click the Toss button to toss the coin that many times. The
vertical heads/tails bar shows you the cumulative proportions. The graph shows the
cumulative proportion of heads after each toss.
Notice how the proportion of tosses that produce heads (or tails) can be quite variable at
first, but will eventually converge to the true probability. Note as well that there can be
long runs of one outcome. For example, it would not be all that unusual for a long run of
heads to occur in 1000 tosses of a coin. You can also change the probability of getting
heads by entering a new value and clicking Reset.
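The same long-run behavior can be simulated without the applet. The following is a minimal Python sketch, not the applet's code; the function names and the seed value are arbitrary choices for illustration. It tosses a simulated coin 1000 times, reports the cumulative proportion of heads at a few checkpoints, and finds the longest run of identical outcomes.

```python
import random

def toss_coin(n_tosses, p_heads=0.5, seed=None):
    """Return a list of booleans, True for heads."""
    rng = random.Random(seed)
    return [rng.random() < p_heads for _ in range(n_tosses)]

def cumulative_proportions(tosses):
    """Proportion of heads after each toss."""
    heads = 0
    proportions = []
    for count, is_heads in enumerate(tosses, start=1):
        heads += is_heads
        proportions.append(heads / count)
    return proportions

def longest_run(tosses):
    """Length of the longest streak of identical outcomes."""
    longest = current = 0
    previous = None
    for outcome in tosses:
        current = current + 1 if outcome == previous else 1
        longest = max(longest, current)
        previous = outcome
    return longest

tosses = toss_coin(1000, p_heads=0.5, seed=2007)  # the seed only makes the run repeatable
proportions = cumulative_proportions(tosses)
for n in (10, 100, 1000):
    print(f"after {n:>4} tosses: proportion of heads = {proportions[n - 1]:.3f}")
print(f"longest run of one outcome: {longest_run(tosses)}")
```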
Using a Normal Approximation for the Binomial Distribution
http://www.whfreeman.com/tps3e
Click on Statistical Applets and then Central Limit Theorem Normal Approximation to
Binomial Distributions
As n increases, the binomial distribution with n trials and probability of success p can be
better and better approximated by a normal distribution. This applet uses sliders to
change both n and p. You can click and drag a slider with the mouse.
The histogram shows the binomial probabilities. The vertical black line marks the mean µ
= np. The red curve is the normal density curve with the same mean and standard
deviation as the B(n,p) distribution (µ = np and σ = √(np(1 − p))). Use the slider to set
lower and higher values of p.
Teaching strategy: Let students investigate various values for n and p. What values of n
and p seem to yield a binomial distribution that can be reasonably approximated by a
normal distribution? Students should note that the binomial probability distribution
becomes more skewed as p moves farther away from 0.5. In order for the binomial
probability histogram to look more like a normal curve, students should discover the need
to increase n as p becomes more extreme.
This activity helps explain why the large-sample condition for inference
on proportions involves verifying that np > 10 and n(1 − p) > 10. Note that some books
suggest checking against 5 or 15 instead of 10.
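The comparison the applet draws can also be made numerically. The sketch below is an illustration in Python under stated assumptions: the pairs of n and p are arbitrary, and the "largest gap" between a binomial probability and the height of the matching normal curve is just one simple way to measure closeness, not a measure the applet reports. It also evaluates the np > 10 and n(1 − p) > 10 condition for each pair.

```python
import math

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_density(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def max_gap(n, p):
    """Largest gap between a binomial probability and the normal curve."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(n, p, k) - normal_density(k, mu, sigma))
               for k in range(n + 1))

for n, p in [(10, 0.1), (30, 0.5), (50, 0.1), (200, 0.1)]:
    condition_met = n * p > 10 and n * (1 - p) > 10
    print(f"n = {n:>3}, p = {p}: largest gap = {max_gap(n, p):.4f}, "
          f"np > 10 and n(1 - p) > 10: {condition_met}")
```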
Investigating Sampling Distributions
This investigation directs students to explore various aspects of sampling distributions
using the Rice Virtual Lab in Statistics simulation at www.onlinestatbook.com/rvls. See
the activity titled Investigations of sample statistic distributions using an on-line simulator.
Investigating a Sampling Distribution of Sample Proportions
http://www.rossmanchance.com/applets
Click on Reeses Pieces
This applet lets you build a sampling distribution of sample proportions using Reeses
Pieces. The parameter θ represents the proportion of orange candies in the population in
the candy machine. To start, set θ = 0.6, the sample size to n = 10, and the num samples to
be drawn to 1. Click the Animate box and then click the Draw Samples button. Ten
candies will fall from the machine into the bins below sorted by color. The red p̂ is the
proportion of the sample that is orange. The proportion of orange candies in the sample is
also shown on the dotplot. Click Draw Samples a few more times to begin building the
sampling distribution of sample proportions. (Animation slows the procedure down so that
students can actually see the distribution being built.)
Change the num samples to 10 and click Draw Samples. Once students are clear on the
process, deselect the Animate button and change num samples to 50. Also select the plot
normal curve box. Click Draw Samples several times. How closely is the sampling
distribution of sample proportions approximated by the normal curve?
Teaching strategy: Vary the investigation with different values of n and θ. Click Reset
each time you want to clear the sampling distribution dotplot. Have students determine
what values of n and θ yield a sampling distribution that is approximately normal.
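The candy-machine process can be imitated in a few lines of code. The following Python sketch is an assumption-laden illustration rather than the applet's implementation: each candy is treated as an independent draw that is orange with probability θ, and the function name, seed, and number of samples are arbitrary. It builds 2000 sample proportions and compares their mean and standard deviation with θ and √(θ(1 − θ)/n).

```python
import math
import random

def sample_proportions(theta, n, num_samples, seed=None):
    """Draw num_samples samples of n candies; return each sample's proportion orange."""
    rng = random.Random(seed)
    return [sum(rng.random() < theta for _ in range(n)) / n
            for _ in range(num_samples)]

theta, n = 0.6, 10
p_hats = sample_proportions(theta, n, num_samples=2000, seed=1)

mean_p_hat = sum(p_hats) / len(p_hats)
sd_p_hat = math.sqrt(sum((p - mean_p_hat) ** 2 for p in p_hats) / len(p_hats))
print(f"mean of sample proportions = {mean_p_hat:.3f} (theta = {theta})")
print(f"SD of sample proportions   = {sd_p_hat:.3f} "
      f"(theory: {math.sqrt(theta * (1 - theta) / n):.3f})")

# Rough text dotplot: one star per 20 samples at each observed proportion.
for value in sorted(set(p_hats)):
    print(f"{value:.1f} {'*' * (p_hats.count(value) // 20)}")
```

Changing theta and n in the sketch mirrors the investigation: small n or θ near 0 or 1 gives a visibly skewed pile of stars.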
Understanding Confidence Level
http://www.whfreeman.com/tps3e
Click on Statistical Applets and then Confidence Interval
Note: It is important that students understand the concept of a sampling distribution
before introducing them to confidence intervals. Good activities for understanding
sampling distributions are the two previous investigations.
This applet can be used to develop a basic understanding of a confidence level for
constructing a confidence interval to estimate the mean (µ) of a population. Set the
desired confidence level by clicking on the radio buttons to the left of the plot. Start with
80%. Click the Sample button for a single SRS. Do this a few times. Each click
represents drawing one sample. The dot marks the sample mean, which is the center of
the interval. The lines on each side of the dot span the confidence interval. Emphasize
that each sample drawn from the population results in a sample mean x̄ for that sample along
with an interval estimate of the population mean µ and that a different sample will yield a
different estimate of µ.
Note that once in a while an interval “misses” the mean (µ). This miss will show up in
red. The accumulator on the right-hand side will record the total number of samples
drawn and the number of times the interval correctly contains the population mean µ.
Click the Sample 50 button. Do this enough times to accumulate a few hundred samples.
Note that the percent of “hits” will hover around 80%. For this example, on average, in
repeated sampling, 80% of the samples drawn will yield an interval that contains
the mean of the population. Caution: Students are tempted to phrase this as a
probability about one particular interval. Not so! Once an interval has been computed, the
probability that it is correct is either 0 or 1 (you have captured µ or you have not). Since µ
is unknown, there is no way of knowing for sure. That is why the result is expressed as a
level of confidence in the estimate.
Have students conjecture about the effect of increasing the confidence level to 90%, 95%,
and 99%. Will the true mean be captured more often, less often, or will the frequency
remain the same? Using the same screen you currently have, click through the radio
buttons, increasing the confidence level as you go to observe how the lengths of the
confidence intervals change. Have students work through the previous exercise using
different confidence levels.
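The repeated-sampling idea behind the confidence level can be simulated directly. The sketch below is a Python illustration under several assumptions that do not come from the applet: the population is normal with a hypothetical mean and standard deviation, σ is treated as known so that z-intervals are used, and the sample size, seed, and number of samples are arbitrary. It reports the proportion of intervals that capture µ and the margin of error at each confidence level.

```python
import math
import random
from statistics import NormalDist

def coverage(level, mu=50.0, sigma=10.0, n=20, num_samples=2000, seed=None):
    """Return (fraction of z-intervals containing mu, margin of error)."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf((1 + level) / 2)   # critical value for this level
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_samples):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        hits += abs(x_bar - mu) <= margin            # interval is x_bar +/- margin
    return hits / num_samples, margin

# Using the same seed at every level reuses the same samples, much like
# clicking through the radio buttons on one screen of the applet.
for level in (0.80, 0.90, 0.95, 0.99):
    hit_rate, margin = coverage(level, seed=7)
    print(f"{level:.0%} level: margin of error = {margin:.2f}, "
          f"proportion of hits = {hit_rate:.3f}")
```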
Investigating Errors and Power in Significance Tests for Means (σ is known)
Starting with a problem situation about a packaging machine, students simulate the
probability of committing Type I and II errors by using a power applet at
http://wise.cgu.edu/power_applet.asp. In addition, students investigate the power of a test
by changing the α-value, the sample size, and the difference between the hypothesized
mean and the true mean. See the activity Investigating Errors and Power in Significance
Tests for Means (σ is known).
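The way power responds to α, the sample size, and the difference between the hypothesized and true means can also be computed exactly when σ is known. The following Python sketch assumes an upper-tailed z-test; the packaging-machine numbers (a hypothesized mean of 16 and σ = 0.5) are hypothetical and are not taken from the activity or the applet.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu0, mu_true, sigma, n, alpha):
    """Power of the upper-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    standard_normal = NormalDist()
    critical_z = standard_normal.inv_cdf(1 - alpha)   # reject H0 when z > critical_z
    shift = (mu_true - mu0) / (sigma / sqrt(n))       # true difference in standard errors
    return 1 - standard_normal.cdf(critical_z - shift)

# Hypothetical packaging-machine numbers, chosen only for illustration.
mu0, sigma = 16.0, 0.5

settings = [(16.2, 25, 0.05), (16.2, 25, 0.01), (16.2, 100, 0.05), (16.4, 25, 0.05)]
for mu_true, n, alpha in settings:
    power = power_upper_tail(mu0, mu_true, sigma, n, alpha)
    print(f"true mean {mu_true}, n = {n}, alpha = {alpha}: "
          f"power = {power:.3f}, P(Type II error) = {1 - power:.3f}")
```

When the true mean equals the hypothesized mean, the function returns α, the probability of a Type I error.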
Sampling Regression Lines Activity
This investigation lets students examine the sampling distribution of sample slopes for a
linear regression. The applet allows the student to first look at possible sample slopes one
at a time. The student then draws 100 samples to examine the sampling distribution of
sample slopes. The student varies the sample size, the population standard deviation σ, and
the spread of the x-values, to determine their effect on the variability of the sample
slopes. See the activity Sampling Regression Lines Using an On-Line Simulation.
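The effects students look for in this investigation can be previewed with a small simulation. The sketch below is a Python illustration, not the on-line simulator: the true line, the x-values, the seed, and the use of 1000 samples (rather than the 100 in the activity) are all assumptions made for the example. It draws repeated samples from a linear model with normal errors and compares the standard deviation of the sample slopes with the theoretical value σ/√Sxx.

```python
import math
import random

def sample_slopes(xs, intercept, slope, sigma, num_samples, seed=None):
    """Least squares slopes from repeated samples of y = intercept + slope*x + error."""
    rng = random.Random(seed)
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slopes = []
    for _ in range(num_samples):
        ys = [intercept + slope * x + rng.gauss(0, sigma) for x in xs]
        mean_y = sum(ys) / len(ys)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slopes.append(sxy / sxx)
    return slopes

def sd(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Hypothetical settings: (description, x-values, sigma).
scenarios = [
    ("baseline: x = 1..10, sigma = 2", list(range(1, 11)), 2.0),
    ("larger sigma: sigma = 4", list(range(1, 11)), 4.0),
    ("wider x spread: x = 2, 4, ..., 20", list(range(2, 21, 2)), 2.0),
    ("larger sample: x = 1..10 twice", list(range(1, 11)) * 2, 2.0),
]
for label, xs, sigma in scenarios:
    slopes = sample_slopes(xs, intercept=1.0, slope=0.5, sigma=sigma,
                           num_samples=1000, seed=3)
    mean_x = sum(xs) / len(xs)
    theory = sigma / math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    print(f"{label}: SD of slopes = {sd(slopes):.3f} (theory {theory:.3f})")
```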