Statistics 312 – Uebersax
Course page: www.john-uebersax.com/stat312
09 Probability Theory (more) & Graphs (more)

Old Business
- Copies of textbook on reserve
- Correction: BINOMDIST
- Fermat's theorem
- Why Use (n-1) for Sample Variance?

New Business
- Probability as relative frequency
- Conditional probability example
- Pascal's triangle (combinations)
- Box-and-whisker plots
- Five-number summaries

1. Excel Function BINOMDIST

The format is =BINOMDIST(k, n, p, cumul), where k is the number of successes, n is the number of trials, p is the probability of success on each trial, and cumul is TRUE for the cumulative probability (k or fewer successes) or FALSE for the probability of exactly k successes. In the last lecture I mistakenly reversed the order of k and n.

2. Fermat's Last Theorem (optional)

See separate handout.

3. Why Use (n – 1) for Sample Variance

See proof on separate handout.

4. Probability as Relative Frequency

Ball-and-Urn Problem

You have an 'urn' (a large jar) filled with red and black marbles or balls.

Suppose there are 70 black and 30 red balls in the urn (urn on right). Draw 1 ball at random. What is the probability (p) that the ball will be red?

One way to define probability is as the relative frequency of a target event in the population/sample.

Relative frequency = no. of target events / no. of all possible events.

Answer: 30 target events (red balls) / 100 possible events (all balls) = 30/100 = .30 = p.

Application to Conditional Probability

Remember that the conditional probability of B given A is

Pr(B|A) = Pr(A and B) / Pr(A).

The concept of relative frequency helps us understand how this formula works.

Example: A village in Siberia has snow at the following rates during 'winter':

          No. snow days    No. of days
Dec.            7               31
Jan.           15               31
Feb.            8               28
Total          30               90

Probability of snow on a randomly selected day in this period:
= Pr(snow) = no. of snow days / no. of days = 30/90 = 1/3 = .333.

Conditional probability of snow given that it's January:
= Pr(snow|Jan.) = no. of snow days in Jan. / no. of days in Jan. = 15/31 = .484.

In terms of the formula: Pr(snow and Jan.) = 15/90 and Pr(Jan.) = 31/90, so Pr(snow|Jan.) = (15/90) / (31/90) = 15/31. The total number of days cancels, leaving the relative frequency of snow within January.
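The relative-frequency calculations in Section 4 can be checked with a short Python sketch. This is only an illustration, not part of the course materials: the helper name relative_frequency and the variable names are assumptions made for this example. The counts are taken from the urn (70 black and 30 red balls, so 100 balls in all) and from the snow table.

```python
from fractions import Fraction


def relative_frequency(target_events, all_events):
    """Probability as relative frequency: target events / all possible events.

    Hypothetical helper, written for this illustration only.
    """
    return Fraction(target_events, all_events)


# Urn example: 30 red balls among 70 black + 30 red = 100 balls.
p_red = relative_frequency(30, 70 + 30)

# Snow example: month -> (no. of snow days, no. of days), from the table.
winter = {"Dec": (7, 31), "Jan": (15, 31), "Feb": (8, 28)}
snow_days = sum(snow for snow, _ in winter.values())    # 30
total_days = sum(days for _, days in winter.values())   # 90

# Pr(snow) over the whole period.
p_snow = relative_frequency(snow_days, total_days)

# Pr(snow|Jan.) computed directly as a relative frequency within January...
jan_snow, jan_days = winter["Jan"]
p_snow_given_jan = relative_frequency(jan_snow, jan_days)

# ...and via the formula Pr(B|A) = Pr(A and B) / Pr(A).
p_snow_and_jan = relative_frequency(jan_snow, total_days)
p_jan = relative_frequency(jan_days, total_days)
assert p_snow_given_jan == p_snow_and_jan / p_jan

print(float(p_red), round(float(p_snow), 3), round(float(p_snow_given_jan), 3))
# prints: 0.3 0.333 0.484
```

Exact fractions are used so that the two routes to Pr(snow|Jan.) can be compared for equality without rounding error.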
5. Pascal's Triangle

Pascal's (1623–1662) triangle is a simple method to compute the binomial coefficient, i.e., an alternative to the formula:

C(n, k) = n! / (k! (n – k)!)

How it works: coefficients in lower rows are produced by adding two adjacent coefficients in the row above:

            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1

1. For each row, the first and last values are always 1.
2. (Therefore,) the first two rows contain all 1's.
3. Starting with the third row, each interior value is produced by adding the two numbers in the row above, to its left and right.
4. Example: in row 3, the "2" is produced by adding 1 (above it, left) and 1 (above it, right).

6. Box-and-Whisker Plots

An easy way to characterize properties of a data distribution.

Can be used to detect skewness and other distributional shapes.

Reading: pp. 123-125, pp. 117-119 (Review 'Five-Number Summary')

Box-and-Whisker Plots with JMP

Instructions here: http://web.utk.edu/~cwiek/201Tutorials/SideBySideBoxPlots/
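
A box-and-whisker plot is drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum). As a rough sketch, the summary can be computed in Python as shown below. The function name five_number_summary and the sample data are hypothetical, and the quartile rule used here (the "inclusive" method of Python's statistics.quantiles, available in Python 3.8 and later) is an assumption: the textbook and JMP may use a slightly different convention and give slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Hypothetical helper for illustration. Quartiles use the "inclusive"
    method, which is one of several common conventions.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical data, chosen only to make the arithmetic easy to follow.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(sample))
# prints: (1, 3.0, 5.0, 7.0, 9)
```

In the plot, the box runs from Q1 to Q3 with a line at the median. A median that sits well off-center in the box, or whiskers of very unequal length, suggests skewness.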