Statistics Applets on the Web. Compiled by Dex Whittinghill, Department of Mathematics, Rowan
University. Checked February 4, 2011.
Terminology has roots in Moore & McCabe’s Introduction to the Practice of Statistics, 3rd or 4th editions
(Freeman), but I tried to keep it general. I tried to follow the standard introductory courses (elementary
statistics and statistics I, as well as statistics II), save maybe the location of regression.
‘Parent’ sites from GAISE panel at Salt Lake City in August 2007, and more
Rice Virtual Lab in Statistics: http://onlinestatbook.com/rvls.html
http://onlinestatbook.com/stat_sim/index.html.
and
Rossman/Chance Applet Collection: http://www.rossmanchance.com/applets/index.html.
Stat Istics: http://istics.net/stat/
University of Illinois – Urbana Campus/CUWU Statistics Program:
http://www.stat.uiuc.edu/courses/stat100/cuwu/
Web Interface for Statistics Education (WISE): http://wise.cgu.edu/applets.asp.
Moore’s Basic Practice of Statistics (4th edition) collection: http://bcs.whfreeman.com/bps4e. There is
probably one for the 5th edition, but as I am not teaching Elementary Statistics this semester, I feel
guilty telling you this much!
Data
Distributions of variables
Displaying Distributions with Graphs
Categorical variables
Bar Graphs
Pie Graphs
Numerical variables
Boxplots
Stemplots
Histograms
Check the effect of changing the bin/interval size on the ‘look’ of a histogram. The
data are from the Old Faithful geyser. (Webster West - S. Carolina)
http://www.stat.sc.edu/~west/javahtml/Histogram.html
Check the effect of changing the bin/interval size on the ‘look’ of a histogram. The
first URL bounces you to West’s site. There are two sets of data for the second.
(Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
http://www.rossmanchance.com/applets/Histogram.html
XXXXXXX I might not have the latest JAVA on my office machine
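A minimal Python sketch of the same bin-width idea, for when the applets will not load. The waiting times are made up for illustration (they are NOT the Old Faithful data), and bin_counts is a hypothetical helper name, not anything taken from the applets.

```python
from collections import Counter

def bin_counts(data, width, start=0.0):
    """Count the values falling in each interval [start + k*width, start + (k+1)*width)."""
    counts = Counter(int((x - start) // width) for x in data)
    return {start + k * width: counts[k] for k in sorted(counts)}

# Made-up waiting times in minutes (assumption: illustrative only).
waits = [51, 54, 55, 58, 62, 76, 78, 79, 80, 82, 83, 85, 88, 90]

for width in (20, 5):
    print(f"bin width {width}")
    for left, n in bin_counts(waits, width, start=40).items():
        # One star per observation gives a crude sideways histogram.
        print(f"  {left:>5.0f}-{left + width:<5.0f} {'*' * n}")
```

With the wide bins the data look like one lump; with the narrow bins a gap between two clusters shows up.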
Describing Distributions with Numbers
Measures of Center/Location
Mean
Median
Mode
Mean vs. Median
Compare these measures of center as you add and subtract data, and see what the
standard deviation is. This is visually more complicated. (Rice Virtual Lab in
Statistics)
http://www.ruf.rice.edu/~lane/stat_sim/descriptive/index.html
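A small sketch, assuming a made-up data set, of what the applet lets you do by hand: add one extreme point and watch the mean and standard deviation move much more than the median does.

```python
import statistics

base = [2, 3, 3, 4, 5]        # hypothetical data set
with_outlier = base + [40]    # the same data plus one extreme point

for label, xs in (("base", base), ("with outlier", with_outlier)):
    print(f"{label:>12}: mean={statistics.mean(xs):.2f} "
          f"median={statistics.median(xs):.2f} "
          f"sd={statistics.stdev(xs):.2f}")
```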
Measures of Spread/Variability/Variation
Range
Interquartile Range
Standard Deviation
Interquartile Range vs. Standard Deviation
The Normal Distribution
Get different normal curves by changing the mean and the standard deviation.
(Balasubramanian Narasimhan, of Stanford U., 1996)
http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html
Relationships between variables
Numerical vs. Categorical
Back-to-back stemplot
Side-by-side boxplots
Numerical vs. numerical
Scatterplots
Correlation
Guessing Correlations (matching) Game. Click on ‘New Plots,’ do the matching
question, and then see how well you did. Note that you will eventually get two of the
correlations that are very close (or even exactly the same!). (CUWU Statistics Program
at/of the U. of Illinois, Champaign-Urbana.)
http://istics.net/stat/Correlations
Guess the correlation. Get a set of points on a scatterplot (-gram) and guess the
correlation. Click on ‘New Data’ and put your guess in the window. This is harder than
the previous applet … but you don’t get graded! (Rossman/Chance Applet Collection.
Programmers: Garcia, Lima, Chance.)
http://www.rossmanchance.com/applets/guesscorrelation/GuessCorrelation.html
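To check a guess without the applets, the correlation can be computed directly. This is only a sketch using the usual formula for Pearson's r; the function name pearson_r and the five points are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-products over the root of the product of sums of squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]   # made-up points
ys = [2, 1, 4, 3, 5]
print(round(pearson_r(xs, ys), 3))
```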
Linear Regression or Linear Fit
Draw the least-squares/regression line ‘by eye’ and compare to the actual least-squares/regression
line, or first draw several regression lines and see how small you can
make the sum-of-squares error (SSE). Click ‘down’ on the left, drag the mouse to the
right, and let go to get your guessed regression line. Your mean square error (MSE) will
appear on the right. (Rice Virtual Lab in Statistics. Note: Netscape 4.06 or better is
required for Java 1.1)
http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html
Draw the least-squares/regression line ‘by eye’ and compare to the actual least-squares/regression
line. When you click on the “Your line” square and the “Move Line”
button you move the line around and see the sum of the absolute value of the
errors/residuals (SAE, click ‘Show residuals’), and the sum of the squared errors (SSE,
click ‘Show squared residuals’). When you are ready click “Done” under “Your line”
and then click on the “Regression line” square so you can see the SAE, SSE, the
Correlation, and R² = r². You can also add or remove points. This visually shows the
principle of least squares nicely. (Rossman/Chance Applet Collection. Programmers:
Garcia, Lima, Chance.)
http://www.rossmanchance.com/applets/Reg/index.html
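The comparison the two ‘by eye’ applets make can be sketched in a few lines. This assumes made-up points and a made-up guessed line; the helper names (least_squares, sse, sae) are hypothetical and not taken from either applet.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def sse(xs, ys, slope, intercept):
    """Sum of squared residuals for the line y = intercept + slope*x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def sae(xs, ys, slope, intercept):
    """Sum of absolute residuals for the same line."""
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # made-up data
ys = [2, 4, 5, 4, 5]
ls_slope, ls_intercept = least_squares(xs, ys)
guess_slope, guess_intercept = 1.0, 1.0   # a line drawn 'by eye' (assumption)

print("least squares:", sse(xs, ys, ls_slope, ls_intercept), sae(xs, ys, ls_slope, ls_intercept))
print("your line:    ", sse(xs, ys, guess_slope, guess_intercept), sae(xs, ys, guess_slope, guess_intercept))
```

No guessed line can beat the least-squares line on SSE. It can occasionally beat it on SAE, since least squares does not minimize that quantity.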
Exponential Fit
Spearman’s Rank Correlation
Nonparametric Fits
Categorical vs. Categorical
Simpson’s paradox - cautions about concluding causation
Producing Data
Introduction: Available Data, Observational Studies, Sampling, Experiments
Designing Samples
Examples of Bad Sampling
The Simple Random Sample
Other Good Sampling Techniques
Designing Experiments
Randomized Comparative Experiments
Block Designs and Two-Factor Designs
Probability
The Study of Randomness
Law of Large Numbers
Probability Models
Find standard normal probabilities (scroll down a bit), instead of using a standard normal (or
z-table). This gives you the ability to enter numbers more accurate than the hundredths digit, and
your answer will be accurate (I think) to more than four (4) decimal places. (Duke U. Java
Applets, from various locations)
http://www-stat.stanford.edu/~naras/jsm/FindProbability.html
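The same lookup can be done without a z-table or an applet. A minimal sketch, assuming only the standard identity between the normal cdf and the error function; normal_cdf is a hypothetical helper name.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# z can be entered to more than two decimal places, unlike a printed table.
for z in (-1.0, 0.0, 1.645, 1.96):
    print(f"P(Z <= {z}) = {normal_cdf(z):.6f}")
```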
Simulate the sampling distribution for the sample sum when you roll a set of dice.
Unfortunately you can’t “Reset” this one. (Webster West - S. Carolina)
http://www.stat.sc.edu/~west/javahtml/CLT.html
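A rough stand-in for the dice applet, assuming fair six-sided dice and a fixed random seed so the run can be repeated (which also gets around the missing “Reset”). The function name simulate_dice_sums is hypothetical.

```python
import random
import statistics

def simulate_dice_sums(n_dice, n_rolls, seed=0):
    """Roll n_dice fair dice n_rolls times and return the list of sums."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(n_dice)) for _ in range(n_rolls)]

sums = simulate_dice_sums(n_dice=5, n_rolls=5000)
# Theory: the mean of the sum is 3.5 * n_dice and its variance is (35/12) * n_dice.
print("simulated mean:", statistics.mean(sums), " theoretical:", 3.5 * 5)
print("simulated sd:  ", round(statistics.stdev(sums), 3),
      " theoretical:", round((35 / 12 * 5) ** 0.5, 3))
```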
Compare the t-distribution and the normal distribution for 1 to 49 degrees of freedom. Notice
that the t-distribution is not that different. (Duke U. Java Applets, from Balasubramanian
Narasimhan, of Stanford U.)
http://www-stat.stanford.edu/~naras/jsm/TDensity/TDensity.html
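To put numbers on the t vs. normal comparison, here is a sketch that evaluates both densities from their textbook formulas. The helper names are hypothetical; the degrees of freedom shown are just a few values from the applet's 1 to 49 range.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

for df in (1, 5, 49):
    # Compare heights at the center and out in the tail.
    print(f"df={df:>2}: t(0)={t_pdf(0, df):.4f} vs normal {normal_pdf(0):.4f}; "
          f"t(3)={t_pdf(3, df):.4f} vs normal {normal_pdf(3):.4f}")
```

The t curve is lower at the center and heavier in the tails, and the gap shrinks as the degrees of freedom grow.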
Random Variables
Look at the ‘spike diagram’ of a binomial distribution for n = 1 to 99 (20 practically speaking)
and p = .01 to .99. Note that the graph only shows you the probabilities for k = 0 to 20, so it is
most useful if you stick to n < 20. (Duke U. Java Applets, from various locations)
http://www-stat.stanford.edu/~naras/jsm/example5.html
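A text-only ‘spike diagram’, as a sketch of what the applet draws. It assumes the usual binomial formula; binomial_pmf is a hypothetical helper name, and n = 10, p = 0.3 are arbitrary choices.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3   # arbitrary illustration values
for k in range(n + 1):
    prob = binomial_pmf(k, n, p)
    # One '#' per 0.01 of probability, so taller spikes print longer bars.
    print(f"k={k:>2}  {prob:.4f}  {'#' * round(prob * 100)}")
```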
[On Probation] Normal approximation to a Binomial r.v. See how well a normal distribution
matches the shape of the ‘spike diagram’ for a binomial distribution. (Rice Virtual Lab in
Statistics).
http://www.ruf.rice.edu/~lane/stat_sim/normal_approx/index.html
Means and Variances of Random Variables
From Probability to Inference
Sampling Distributions for Counts and Proportions
See the sampling distribution of the proportion of successes, using orange Reese’s Pieces
as the successes. Adjust the true proportion (theta), the size of the sample, and the number of
repetitions of the sample. (Rossman/Chance Applet Collection. Programmers: Garcia, Lima,
Chance.)
http://statweb.calpoly.edu/chance/applets/Reeses/ReesesPieces.html
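The three settings in the Reese’s Pieces applet (true proportion, sample size, number of repetitions) map directly onto a short simulation. This is a sketch under the assumption of independent draws; the function name and the particular values are hypothetical.

```python
import random
import statistics

def simulate_proportions(theta, n, reps, seed=0):
    """Return reps sample proportions, each from n independent draws with success chance theta."""
    rng = random.Random(seed)
    return [sum(rng.random() < theta for _ in range(n)) / n for _ in range(reps)]

theta, n, reps = 0.45, 25, 2000   # illustrative settings
p_hats = simulate_proportions(theta, n, reps)
print("mean of p-hats:", round(statistics.mean(p_hats), 4), " theta:", theta)
print("sd of p-hats:  ", round(statistics.stdev(p_hats), 4),
      " theory:", round((theta * (1 - theta) / n) ** 0.5, 4))
```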
The Sampling Distribution of a Sample Mean
Simulate the sampling distribution for the sample mean with the date on a US penny
sampled from a fixed population. Use this to supplement the/our class exercise from
Activity-Based Statistics, by Scheaffer, Gnanadesikan, Watkins & Witmer.
(Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
http://statweb.calpoly.edu/chance/applets/SampleData/SampleData.html
Simulate the sampling distribution for the sample mean (or median, or s.d.). (Rice Virtual
Lab in Statistics).
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
The Sampling Distribution of a Sample Proportion or a Sample Mean
Simulate the sampling distribution for the mean (mean number of years in Senate) and
proportion (proportion of Republicans and proportion of females). (Rossman/Chance Applet
Collection. Programmers: Garcia, Lima, Chance.)
http://statweb.calpoly.edu/chance/applets/senators/samplesenators.html
Simulate the sampling distribution for the sample mean or sum from a distribution of your
choice (including the rolls of a die), or the sample proportion (i.e., a coin toss where you
pick the P(head)). This is complicated enough so that I will give you extra instructions. (The
CUWU Statistics Program)
http://www.stat.uiuc.edu/courses/stat100//cuwu/
Click on Box Models.
For proportions. To simulate spinning a US penny from the 70’s, say, click on the radio
button for Coin. Then use the window to the right to choose your “% of heads.” You
will see that the software uses 0 for tails (failure) and 1 for heads (success), respectively.
Click on Sums and Averages. The “# of Draws per Experiment” is the n (try 50). The
“# Experiments” is the number of repetitions of n = 50 you do (try 100).
For means. Click on the radio button for Other and put the values of the data in the
Value column. Then put numbers in the Count column to define the relative proportions
(if you Click on the Die radio button, you will see what I mean). Click on Sums and
Averages. The “# of Draws per Experiment” is the n. The “# of Experiments” is the
number of repetitions of n.
The Central Limit Theorem (see Sampling Distributions above)
Inference
Inference for the mean of a population
Confidence Intervals (CIs)
[On Probation] Simulate confidence intervals for the population proportion. When you
click on “Begin” you will get to choose the number of data in your sample for calculating the
confidence interval (n, but called N), the true proportion of successes in the population (Pi),
the number of confidence intervals (or “Simulations”), and the confidence level (90, 95 or 99%).
There is no useful visual display, but you get the proportion of confidence intervals that
contain Pi. (Rice Virtual Lab in Statistics).
http://www.ruf.rice.edu/~lane/stat_sim/normal_approx_conf/index.html
Simulate confidence intervals for the population proportion. First click on “Box
Models”, and then on the “Other” radio button. Suppose you want the probability of a
success, p, to be .47 (or 47%). In the first yellow “Count” box type “47”. In the “Value” box
just to the right, type “1”. In the second yellow “Count” box type “53” (from 100-47). In the
“Value” box just to the right, type “0”. Click on the blue “Accept Box” button at the bottom.
This will give you the distribution of the population with 47% successes, and 53% failures.
Click on the gray “Confidence Intervals” button at the top to get to the actual simulator.
Choose the sample size (#Draws per Experiment), the confidence level (the next box to
the right), and the number of repetitions of ‘taking the sampling size’ (# Experiments).
(CUWU Statistics Program)
http://www.stat.uiuc.edu/courses/stat100//cuwu/
Simulate confidence intervals for the population mean. First click on “Box Models”, and
then on the “Die” radio button (unless you want to choose other and put in a custom
distribution). This will give you the equivalent of rolling a fair die, with each value 1, 2, 3, 4,
5 and 6 having the same weights (each has count 1). Click on the blue “Accept Box” button
at the bottom. Then click on the gray “Confidence Intervals” button at the top to get to the
actual simulator.
Choose the sample size (#Draws per Experiment), the confidence level (the next box to
the right), and the number of repetitions of ‘taking the sampling size’ (# Experiments).
(CUWU Statistics Program)
http://www.stat.uiuc.edu/courses/stat100//cuwu/
Simulate confidence intervals for the population mean. Repeat taking a sample of size 10,
15 or 20 from a (normal?) population with mean 50 and standard deviation 10, and see what
proportion of the resulting confidence intervals contain the true mean. Both 95% and 99%
confidence intervals are calculated together. (Rice Virtual Lab in Statistics).
http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/index.html
Simulate confidence intervals for the population proportion (Wald interval) or the
population mean. You can take up to 300 samples of size n (wide choice of values), from a
population with the proportion of success of your choice (using the Wald interval), or a
(normal?) population with mean and standard deviation of your choice. See what proportion
of the resulting confidence intervals contain the true proportion/mean. Choose your own
confidence level. The confidence interval for the proportion is not the classical one.
(Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
http://statweb.calpoly.edu/chance/applets/Confsim/Confsim.html
and a newer version
http://www.rossmanchance.com/applets/NewConfsim/Confsim.html
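What all of the confidence-interval simulators above report is the proportion of intervals that capture the true value. A sketch of that calculation for the Wald interval for a proportion, p-hat ± z·sqrt(p-hat(1 − p-hat)/n), follows. The function name wald_coverage is hypothetical, and p = .47 with n = 100 simply echoes the box-model example; none of this is the applets' own code.

```python
import math
import random
from statistics import NormalDist

def wald_coverage(p, n, reps, level=0.95, seed=0):
    """Fraction of reps Wald intervals, each from a sample of size n, that contain the true p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # e.g. about 1.96 for 95%
    hits = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= p <= p_hat + half_width:
            hits += 1
    return hits / reps

coverage = wald_coverage(p=0.47, n=100, reps=2000)
print("proportion of 95% intervals containing p:", coverage)
```

Try a small n or a p near 0 or 1 to see the coverage fall short of the nominal level.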
Tests of Significance/Hypothesis test
Type I and Type II errors, their probabilities, and the Power of the test. This site allows
you to look at the Type I error and its probability α (shaded in green), the Type II error and
its probability β (shaded in red) and the Power of a test (= 1-β, shaded in blue). The
hypothesized mean is μ0 = 10, the alternative hypothesis is “greater than 10” (or right tailed).
You can alter α, the sample size n, and the distance Δ between the hypothesized mean μ0 =
10 and a “true mean” greater than 10 (μ1).
The theoretical details on the “cover” page are worth reading, even if you skip that set of
probability formulas that appear after the paragraph beginning with the word “Power”.
Notice the large two-by-two table with the “State of Nature” at the top (with the two
possibilities for the truth about μ), and the “Decision” on the left (with the two possibilities
for what the data will tell us to conclude about μ).
Scroll to the bottom of this page to the little window where it says “Select an applet.”
Choose “All of the above,” and click on the “Open!” button. You will get a new window
with two blank spaces, and three slider bars on the right, marked DELTA (Δ), ALPHA (α)
and SAMPLE SIZE (n). [If at any time you want to be restricted to changing only one of Δ,
α or n, you will see that you have that option.]
Top Window = Null Distribution. This will give the distribution of the sample statistic,
x̄, assuming that H0 is true, and μ0 = 10. [This will not change in location, but will change
shape as you change n.]
Bottom Window = Alternative Distribution. This will give the distribution of the sample
statistic, x̄, assuming that Ha is true, i.e., the population mean, μ, is some value > 10. The
particular value of μ under consideration will be called μ1. [This will change in location as
you change Δ and change in shape as you change n.]
http://www.amstat.org/publications/jse/v11n3/java/Hypothesis/
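The three sliders (Δ, α, n) can be tied together with the usual power formula for a right-tailed z test: Power = 1 − Φ(z_α − Δ√n/σ). The sketch below is not the applet's code; the function name is hypothetical, and σ = 1 is an assumption, since the population standard deviation is not stated here.

```python
import math
from statistics import NormalDist

def power_right_tailed(delta, alpha, n, sigma=1.0):
    """Power of a right-tailed z test when the true mean is delta above the hypothesized mean.

    sigma is the population standard deviation (assumed to be 1 by default).
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)   # cutoff on the standardized scale
    return 1 - z.cdf(critical - delta * math.sqrt(n) / sigma)

for n in (5, 10, 25):
    print(f"n={n:>2}: power = {power_right_tailed(delta=0.5, alpha=0.05, n=n):.4f}")
```

When Δ = 0 the power equals α, and it rises as any of Δ, α or n increases.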
Type I and Type II Errors, their probabilities, and the Power of the test. In the top of
the left frame you will see the distribution of the hypothesized population (individuals, with
mean μ0 = 100) outlined in blue, and the distribution of the population (individuals) under a
particular mean (μ1) in the region of the alternative hypothesis outlined in red. The bottom
of the left frame has the sampling distributions of the sample mean x̄ for each situation,
shaded in the appropriate color. The dark blue represents the probability of the Type I error
(α), and the dark red represents the probability of the Type II error (β). The light red
represents the Power (1-β).
This site allows you to change the mean (μ1) in the region of the alternative hypothesis by
dragging the red normal curve in the left pane, and changing the sample size (n) in the right
pane. This is really no more flexible than the previous applet, and maybe more complicated
(stay away from the “Sample” button for now). Is it cuter? (WISE: Web Interface for
Statistics Education):
http://wise.cgu.edu/powermod/power_applet.asp
Comparing Two Means
Comparing Two proportions
Inference for Counts
One-way Chi-square tests (including goodness of fit)
Two-way Chi-square tests
Inference for Regression
Simple Linear Regression
The true/population line VS. the estimated regression line. Choose a sample of points
from the population of points to make/calculate an estimated regression line. The population
points are blue and the true/population regression line is yellow. When a sample of points is
drawn they are red, and the corresponding estimated regression line is red.
Set the yellow population regression line first by choosing the line slope and intercept. Set
the center of the cloud with “x mean”. Set how much the population of points is stretched
from right-to-left with “x std”. Now set the true standard deviation of the errors, σ, with
“sigma.” This controls the fatness (up-and-down) of the cloud. Experiment. Note that if you
stretch the cloud too much right-to-left, and points spill out on the right, they won’t go away!
(Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
http://www.rossmanchance.com/applets/regcoeff/regcoeff.html (Simulates the sampling
Multiple Regression
Logistic Regression
Influence Points
See the influence of adding a single point to a simple linear regression. (Webster West – U.
of S. Carolina).
http://www.stat.sc.edu/~west/javahtml/Regression.html
Analysis of Variance
One-way ANOVA
‘See’ the F/test-statistic change as you change the raw data by clicking and dragging the
points. You are given the group means and sums of squares, and you can also compare the
‘between’ and ‘within’ sums-of-squares. (Rice Virtual Lab in Statistics).
http://www.ruf.rice.edu/~lane/stat_sim/one_way/index.html
‘See’ the F/test-statistic change as you change the true means, the common standard
deviation, and the sample size. You are given the boxplots (or dotplots) for the groups, and
you can also compare the ‘between’ and ‘within’ sums-of-squares. (Rossman/Chance Applet
Collection. Programmers: Garcia, Lima, Chance.)
http://www.rossmanchance.com/applets/Anova/Anova.html
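The ‘between’ and ‘within’ sums-of-squares that both one-way ANOVA applets display can be computed by hand. This is a sketch with three made-up groups; one_way_f is a hypothetical helper name.

```python
def one_way_f(groups):
    """Return (SS between, SS within, F) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between: how far each group mean sits from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within: how far each observation sits from its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # made-up data
ss_b, ss_w, f_stat = one_way_f(groups)
print(f"SS between = {ss_b:.2f}, SS within = {ss_w:.2f}, F = {f_stat:.2f}")
```

For these numbers SS between is 26, SS within is 6, and F = (26/2)/(6/6) = 13. Move a single value, as in the Rice applet, and rerun to see F change.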
Block designs
Two-way ANOVA
‘See’ the changes in the F-statistics for treatment A, treatment B, and the interaction (AB)
as you manipulate the “graph/figure” of means (and change the cell means and marginal
means). (Rice Virtual Lab in Statistics).
http://www.ruf.rice.edu/~lane/stat_sim/two_way/index.html
Nonparametric Tests
Sign Test
Wilcoxon Rank Sum Test
The Wilcoxon Signed Rank Test
The Kruskal-Wallis Test
The Friedman Test
The Runs Test
Bootstrap and simulation methods
Miscellaneous
Monty Hall problem:
Webster West, U. of S. Carolina.
http://www.stat.sc.edu/~west/javahtml/LetsMakeaDeal.html
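If the applet will not run, the game is easy to simulate. This sketch assumes the standard rules (three doors, and the host always opens a non-prize door that you did not pick); the function names are hypothetical.

```python
import random

def play(switch, rng):
    """Play one game; return True if the final pick is the prize door."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=20000, seed=0):
    """Proportion of wins over many games with a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Staying should win about 1/3 of the time and switching about 2/3.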
“The Doghouse.” Stuff that I almost deleted, but not just yet.
‘See’ the Power of a test shaded, sort of. The software asks for “True mean” and “Hyp.
mean” but shows the pictures on a normalized scale. (R. Todd Ogden, Dept. of Statistics, U.
of S. Carolina)
http://www.stat.sc.edu/~ogden/javahtml/power/power.html