Statistics Applets on the Web
Compiled by Dex Whittinghill, Department of Mathematics, Rowan University. Checked February 4, 2011.

Terminology has roots in Moore & McCabe’s Introduction to the Practice of Statistics, 3rd or 4th edition (Freeman), but I tried to keep it general. I tried to follow the standard introductory courses (Elementary Statistics and Statistics I, as well as Statistics II), save maybe the location of regression.

‘Parent’ sites from the GAISE panel at Salt Lake City in August 2007, and more
  Rice Virtual Lab in Statistics: http://onlinestatbook.com/rvls.html and http://onlinestatbook.com/stat_sim/index.html
  Rossman/Chance Applet Collection: http://www.rossmanchance.com/applets/index.html
  Stat Istics: http://istics.net/stat/
  University of Illinois – Urbana Campus/CUWU Statistics Program: http://www.stat.uiuc.edu/courses/stat100/cuwu/
  Web Interface for Statistics Education (WISE): http://wise.cgu.edu/applets.asp
  Moore’s Basic Practice of Statistics (4th edition) collection: http://bcs.whfreeman.com/bps4e
  There is probably one for the 5th edition, but as I am not teaching Elementary Statistics this semester, I feel guilty telling you this much!

Data
Distributions of variables
Displaying Distributions with Graphs
Categorical variables
Bar Graphs
Pie Graphs
Numerical variables
Boxplots
Stemplots
Histograms
  Check the effect of changing the bin/interval size on the ‘look’ of a histogram. The data are from the Old Faithful geyser. (Webster West, U. of S. Carolina)
  http://www.stat.sc.edu/~west/javahtml/Histogram.html

  Check the effect of changing the bin/interval size on the ‘look’ of a histogram. The first URL bounces you to West’s site. There are two sets of data for the second. (Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
  http://www.rossmanchance.com/applets/Histogram.html
  XXXXXXX I might not have the latest Java on my office machine.

Describing Distributions with Numbers
Measures of Center/Location
Mean
Median
Mode
Mean vs. Median
  Compare these measures of center as you add and remove data, and see what the standard deviation is. This is visually more complicated. (Rice Virtual Lab in Statistics)
  http://www.ruf.rice.edu/~lane/stat_sim/descriptive/index.html

Measures of Spread/Variability/Variation
Range
Interquartile Range
Standard Deviation
Interquartile Range vs. Standard Deviation
The Normal Distribution
  Get different normal curves by changing the mean and the standard deviation. (Balasubramanian Narasimhan, of Stanford U., 1996)
  http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html

Relationships between variables
Numerical vs. Categorical
Back-to-back stemplot
Side-by-side boxplots
Numerical vs. numerical
Scatterplots
Correlation
  Guessing Correlations (matching) Game. Click on ‘New Plots,’ do the matching question, and then see how well you did. Note that you will eventually get two correlations that are very close (or even exactly the same!). (CUWU Statistics Program at/of the U. of Illinois, Champaign-Urbana)
  http://istics.net/stat/Correlations

  Guess the correlation. Get a set of points on a scatterplot (or scattergram) and guess the correlation. Click on ‘New Data’ and put your guess in the window. This is harder than the previous applet … but you don’t get graded! (Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
  http://www.rossmanchance.com/applets/guesscorrelation/GuessCorrelation.html

Linear Regression or Linear Fit
  Draw the least-squares/regression line ‘by eye’ and compare it to the actual least-squares/regression line, or first draw several regression lines and see how small you can make the sum-of-squares error (SSE). Click ‘down’ on the left, drag the mouse to the right, and let go to get your guessed regression line. Your mean square error (MSE) will appear on the right. (Rice Virtual Lab in Statistics. Note: Netscape 4.06 or better is required for Java 1.1.)
  http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye/index.html

  Draw the least-squares/regression line ‘by eye’ and compare it to the actual least-squares/regression line. When you click on the “Your line” square and the “Move Line” button, you can move the line around and see the sum of the absolute values of the errors/residuals (SAE; click ‘Show residuals’) and the sum of the squared errors (SSE; click ‘Show squared residuals’). When you are ready, click “Done” under “Your line” and then click on the “Regression line” square so you can see the SAE, SSE, the correlation, and R² = r². You can also add or remove points. This visually shows the principle of least squares nicely. (Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
  http://www.rossmanchance.com/applets/Reg/index.html

Exponential Fit
Spearman’s Rank Correlation
Nonparametric Fits
Categorical vs. Categorical
Simpson’s paradox - cautions about concluding causation
Producing Data
Introduction: Available Data, Observational Studies, Sampling, Experiments
Designing Samples
Examples of Bad Sampling
The Simple Random Sample
Other Good Sampling Techniques
Designing Experiments
Randomized Comparative Experiments
Block Designs and Two-Factor Designs
Probability
The Study of Randomness
Law of Large Numbers
Probability Models
  Find standard normal probabilities (scroll down a bit), instead of using a standard normal table (or z-table). This gives you the ability to enter numbers more precise than the hundredths digit, and your answer will be accurate (I think) to more than four (4) decimal places. (Duke U. Java Applets, from various locations)
  http://www-stat.stanford.edu/~naras/jsm/FindProbability.html

  Simulate the sampling distribution for the sample sum when you roll a set of dice. Unfortunately you can’t “Reset” this one. (Webster West, U. of S. Carolina)
  http://www.stat.sc.edu/~west/javahtml/CLT.html

  Compare the t-distribution and the normal distribution for 1 to 49 degrees of freedom. Notice that the t-distribution is not that different. (Duke U. Java Applets, from Balasubramanian Narasimhan, of Stanford U.)
  http://www-stat.stanford.edu/~naras/jsm/TDensity/TDensity.html

Random Variables
  Look at the ‘spike diagram’ of a binomial distribution for n = 1 to 99 (20, practically speaking) and p = .01 to .99. Note that the graph only shows you the probabilities for k = 0 to 20, so it is most useful if you stick to n < 20. (Duke U. Java Applets, from various locations)
  http://www-stat.stanford.edu/~naras/jsm/example5.html

  [On Probation] Normal approximation to a binomial r.v. See how well a normal distribution matches the shape of the ‘spike diagram’ for a binomial distribution. (Rice Virtual Lab in Statistics)
  http://www.ruf.rice.edu/~lane/stat_sim/normal_approx/index.html

Means and Variances of Random Variables
From Probability to Inference
Sampling Distributions for Counts and Proportions
  See the sampling distribution of the proportion of successes, using orange Reese’s Pieces as the successes. Adjust the true proportion (theta), the size of the sample, and the number of repetitions of the sample. (Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
  http://statweb.calpoly.edu/chance/applets/Reeses/ReesesPieces.html

The Sampling Distribution of a Sample Mean
  Simulate the sampling distribution for the sample mean with the date on a US penny sampled from a fixed population. Use this to supplement the/our class exercise from Activity-Based Statistics, by Scheaffer, Gnanadesikan, Watkins & Witmer. (Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
  http://statweb.calpoly.edu/chance/applets/SampleData/SampleData.html

  Simulate the sampling distribution for the sample mean (or median, or s.d.). (Rice Virtual Lab in Statistics)
  http://onlinestatbook.com/stat_sim/sampling_dist/index.html

The Sampling Distribution of a Sample Proportion or a Sample Mean
  Simulate the sampling distribution for the mean (mean number of years in the Senate) and the proportion (proportion of Republicans and proportion of females). (Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
  http://statweb.calpoly.edu/chance/applets/senators/samplesenators.html

  Simulate the sampling distribution for the sample mean or sum from a distribution of your choice (including the rolls of a die), or the sample proportion (i.e., a coin toss where you pick the P(head)). This is complicated enough that I will give you extra instructions. (The CUWU Statistics Program)
  http://www.stat.uiuc.edu/courses/stat100//cuwu/
  Click on Box Models.
  For proportions (for spinning a US penny from the ’70s, say): Click on the radio button for Coin. Then use the window to the right to choose your “% of heads.” You will see that the software uses 0 for tails (failure) and 1 for heads (success). Click on Sums and Averages. The “# of Draws per Experiment” is the n (try 50). The “# Experiments” is the number of repetitions of n = 50 you do (try 100).
  For means: Click on the radio button for Other and put the values of the data in the Value column. Then put numbers in the Count column to define the relative proportions (if you click on the Die radio button, you will see what I mean). Click on Sums and Averages. The “# of Draws per Experiment” is the n. The “# of Experiments” is the number of repetitions of n.

The Central Limit Theorem (see Sampling Distributions above)
Inference
Inference for the mean of a population
Confidence Intervals (CIs)
  [On Probation] Simulate confidence intervals for the population proportion.
  When you click on “Begin” you will get to choose the number of data in your sample for calculating the confidence interval (n, but called N), the true proportion of successes in the population (Pi), the number of confidence intervals (or “Simulations”), and the confidence level (90, 95 or 99%). There is no useful visual display, but you get the proportion of confidence intervals that contain Pi. (Rice Virtual Lab in Statistics)
  http://www.ruf.rice.edu/~lane/stat_sim/normal_approx_conf/index.html

  Simulate confidence intervals for the population proportion. First click on “Box Models”, and then on the “Other” radio button. Suppose you want the probability of a success, p, to be .47 (or 47%). In the first yellow “Count” box type “47”. In the “Value” box just to the right, type “1”. In the second yellow “Count” box type “53” (from 100 - 47). In the “Value” box just to the right, type “0”. Click on the blue “Accept Box” button at the bottom. This will give you the distribution of the population with 47% successes and 53% failures. Click on the gray “Confidence Intervals” button at the top to get to the actual simulator. Choose the sample size (#Draws per Experiment), the confidence level (the next box to the right), and the number of repetitions of ‘taking a sample of that size’ (# Experiments). (CUWU Statistics Program)
  http://www.stat.uiuc.edu/courses/stat100//cuwu/

  Simulate confidence intervals for the population mean. First click on “Box Models”, and then on the “Die” radio button (unless you want to choose “Other” and put in a custom distribution). This will give you the equivalent of rolling a fair die, with each value 1, 2, 3, 4, 5 and 6 having the same weight (each has count 1). Click on the blue “Accept Box” button at the bottom. Then click on the gray “Confidence Intervals” button at the top to get to the actual simulator.
  Choose the sample size (#Draws per Experiment), the confidence level (the next box to the right), and the number of repetitions of ‘taking a sample of that size’ (# Experiments). (CUWU Statistics Program)
  http://www.stat.uiuc.edu/courses/stat100//cuwu/

  Simulate confidence intervals for the population mean. Repeatedly take a sample of size 10, 15 or 20 from a (normal?) population with mean 50 and standard deviation 10, and see what proportion of the resulting confidence intervals contain the true mean. Both 95% and 99% confidence intervals are calculated together. (Rice Virtual Lab in Statistics)
  http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/index.html

  Simulate confidence intervals for the population proportion (Wald interval) or the population mean. You can take up to 300 samples of size n (wide choice of values) from a population with the proportion of success of your choice (using the Wald interval), or from a (normal?) population with mean and standard deviation of your choice. See what proportion of the resulting confidence intervals contain the true proportion/mean. Choose your own confidence level. The confidence interval for the proportion is not the classical one. (Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
  http://statweb.calpoly.edu/chance/applets/Confsim/Confsim.html
  and a newer version: http://www.rossmanchance.com/applets/NewConfsim/Confsim.html

Tests of Significance/Hypothesis test
  Type I and Type II errors, their probabilities, and the Power of the test. This site allows you to look at the Type I error and its probability α (shaded in green), the Type II error and its probability β (shaded in red), and the Power of a test (= 1-β, shaded in blue). The hypothesized mean is μ0 = 10, and the alternative hypothesis is “greater than 10” (or right-tailed). You can alter the α, the sample size n, and the distance between the hypothesized mean μ0 = 10 and a “true mean” greater than 10 (μ1), with the distance being Δ.
  The theoretical details on the “cover” page are worth reading, even if you skip the set of probability formulas that appear after the paragraph beginning with the word “Power”. Notice the large two-by-two table with the “State of Nature” at the top (with the two possibilities for the truth about μ), and the “Decision” on the left (with the two possibilities for what the data will tell us to conclude about μ). Scroll to the bottom of this page to the little window where it says “Select an applet.” Choose “All of the above,” and click on the “Open!” button. You will get a new window with two blank spaces, and three slider bars on the right, marked DELTA (Δ), ALPHA (α) and SAMPLE SIZE (n). [If at any time you want to be restricted to changing only one of Δ, α or n, you will see that you have that option.]
  Top Window = Null Distribution. This will give the distribution of the sample statistic, x̄, assuming that H0 is true and μ0 = 10. [This will not change in location, but will change shape as you change n.]
  Bottom Window = Alternative Distribution. This will give the distribution of the sample statistic, x̄, assuming that Ha is true, that is, that the population mean, μ, is some value > 10. The particular value of μ under consideration will be called μ1. [This will change in location as you change Δ and change in shape as you change n.]
  http://www.amstat.org/publications/jse/v11n3/java/Hypothesis/

  Type I and Type II Errors, their probabilities, and the Power of the test. In the top of the left frame you will see the distribution of the hypothesized population (individuals, with mean μ0 = 100) outlined in blue, and the distribution of the population (individuals) under a particular mean (μ1) in the region of the alternative hypothesis outlined in red. The bottom of the left frame has the sampling distributions of the sample mean x̄ for each situation, shaded in the appropriate color.
  The dark blue represents the probability of the Type I error (α), and the dark red represents the probability of the Type II error (β). The light red represents the Power (1-β). This site allows you to change the mean (μ1) in the region of the alternative hypothesis by dragging the red normal curve in the left pane, and to change the sample size (n) in the right pane. This is really no more flexible than the previous applet, and maybe more complicated (stay away from the “Sample” button for now). Is it cuter? (WISE: Web Interface for Statistics Education)
  http://wise.cgu.edu/powermod/power_applet.asp

Comparing Two Means
Comparing Two proportions
Inference for Counts
One-way Chi-square tests (including goodness of fit)
Two-way Chi-square tests
Inference for Regression
Simple Linear Regression
  The true/population line vs. the estimated regression line. Choose a sample of points from the population of points to make/calculate an estimated regression line. The population points are blue and the true/population regression line is yellow. When a sample of points is drawn they are red, and the corresponding estimated regression line is red. Set the yellow population regression line first by choosing the line slope and intercept. Set the center of the cloud with “x mean”. Set how much the population of points is stretched from right-to-left with “x std”. Now set the true standard deviation of the errors, σ, with “sigma.” This controls the fatness (up-and-down) of the cloud. Experiment. Note that if you stretch the cloud too much right-to-left, and points spill out on the right, they won’t go away! (Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
  http://www.rossmanchance.com/applets/regcoeff/regcoeff.html (Simulates the sampling

Multiple Regression
Logistic Regression
Influence Points
  See the influence of adding a single point to a simple linear regression. (Webster West – U. of S. Carolina)
  http://www.stat.sc.edu/~west/javahtml/Regression.html

Analysis of Variance
One-way ANOVA
  ‘See’ the F/test-statistic change as you change the raw data by clicking and dragging the points. You are given the group means and sums of squares, and you can also compare the ‘between’ and ‘within’ sums-of-squares. (Rice Virtual Lab in Statistics)
  http://www.ruf.rice.edu/~lane/stat_sim/one_way/index.html

  ‘See’ the F/test-statistic change as you change the true means, the common standard deviation, and the sample size. You are given the boxplots (or dotplots) for the groups, and you can also compare the ‘between’ and ‘within’ sums-of-squares. (Rossman/Chance Applet Collection. Programmers: Garcia, Lima, Chance.)
  http://www.rossmanchance.com/applets/Anova/Anova.html

Block designs
Two-way ANOVA
  ‘See’ the changes in the F-statistics for treatment A, treatment B, and the interaction (AB) as you manipulate the “graph/figure” of means (and change the cell means and marginal means). (Rice Virtual Lab in Statistics)
  http://www.ruf.rice.edu/~lane/stat_sim/two_way/index.html

Nonparametric Tests
Sign Test
Wilcoxon Rank Sum Test
The Wilcoxon Signed Rank Test
The Kruskal-Wallis Test
The Friedman Test
The Runs Test
Bootstrap and simulation methods
Miscellaneous
  Monty Hall problem: Webster West, U. of S. Carolina.
  http://www.stat.sc.edu/~west/javahtml/LetsMakeaDeal.html

“The Doghouse.” Stuff that I almost deleted, but not just yet.
  ‘See’ the Power of a test shaded, sort of. The software asks for “True mean” and “Hyp. mean” but shows the pictures on a normalized scale. (R. Todd Ogden, Dept. of Statistics, U. of S. Carolina)
  http://www.stat.sc.edu/~ogden/javahtml/power/power.html
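
Offline companion to the confidence-interval simulators listed under Confidence Intervals (CIs): the sketch below repeats the same experiment in Python, drawing many samples of size n from a 0/1 "box" with a chosen proportion of successes and reporting what fraction of the resulting Wald intervals contain the true proportion. It is only an illustration under stated assumptions, not the code behind any of the applets; the function names wald_interval and coverage, and the settings p = 0.47, n = 50, and 1000 experiments, are hypothetical choices made for this example.

```python
import math
import random
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Return (lower, upper) for the Wald interval p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for level = 0.95
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(p, n, experiments, level=0.95, seed=None):
    """Fraction of simulated Wald intervals that contain the true proportion p."""
    rng = random.Random(seed)  # a seed makes the run repeatable (assumption: optional)
    hits = 0
    for _ in range(experiments):
        # One "experiment": n draws from a box with a proportion p of 1s (successes).
        successes = sum(rng.random() < p for _ in range(n))
        lower, upper = wald_interval(successes, n, level)
        if lower <= p <= upper:
            hits += 1
    return hits / experiments


if __name__ == "__main__":
    # Hypothetical settings echoing the p = .47 box-model example.
    print(coverage(p=0.47, n=50, experiments=1000, level=0.95, seed=1))
```

The printed fraction should land near the chosen confidence level, though usually not exactly on it; changing n or p shows how the Wald interval's actual coverage drifts, which is the same thing the applets let you see by repetition.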