CCSS ALGEBRA 2 STATISTICS ESSENTIALS: The Stat Skills and Background You’ll Need
Rob Gerver, Ph.D.
North Shore HS, Glen Head, NY
gerverr@northshoreschools.org

STAT TOPICS IN CCSS ALGEBRA 2: Content Standards S-ID, S-IC, S-CP, S-MD
• Descriptive statistics: Interpreting categorical data, and univariate and bivariate quantitative data
• Probability: Conditional probability, independence, expected value
• Inferential statistics: Sampling, making inferences, and justifying conclusions
There is an emphasis on understanding, interpreting, and critiquing. It’s not just students boxing numerical answers!

211 PAGES OF SAMPLE PROBLEMS!
https://www.engageny.org/resource/algebra-ii-module-4

YOU’LL NEED TO KNOW MORE THAN JUST WHAT IS IN THE CURRICULUM!

CCSS ALGEBRA 2 STATISTICS ESSENTIALS
• Descriptive Statistics: Univariate Data
• Descriptive Statistics: Bivariate Data
• Probability and Independence
• Inferential Statistics: Unbiased Estimators
• Inferential Statistics: Sampling Distributions
• Inferential Statistics: Experimental Design
• Writing Projects
• Balloon Help Tutorials

DISPLAYING UNIVARIATE DATA
Box-and-whisker plots, histograms, dot plots

UNIVARIATE STATISTICAL MEASURES
Central tendency:
• Mean
• Median: resistant to outliers
• Mode
Dispersion:
• Range: ignores spread except for high and low
• Quartiles, IQR (colleges’ middle 50%)
• Mean deviation, mean absolute deviation
• Variance, standard deviation in descriptive and inferential statistics: σ vs. s

BIVARIATE MEASURES
• Correlation coefficient
• Regression slope as a rate; interpreting r as strong, moderate, or weak
• Interpreting the relationship as linear or not, using the scatterplot
• Making predictions; extrapolation

BIVARIATE NUANCES
• Causation vs. correlation
• Lurking variables
• Confounding variables

USE THE SCATTERPLOT!
L1 | L2 | L3 | L4 | L5 | L6
10 | 8.04 | 9.14 | 7.46 | 8 | 6.58
8 | 6.95 | 8.14 | 6.77 | 8 | 5.76
13 | 7.58 | 8.74 | 12.74 | 8 | 7.71
9 | 8.81 | 8.77 | 7.11 | 8 | 8.84
11 | 8.33 | 9.26 | 7.81 | 8 | 8.47
14 | 9.96 | 8.1 | 8.84 | 8 | 7.04
6 | 7.24 | 6.13 | 6.08 | 8 | 5.25
4 | 4.26 | 3.1 | 5.39 | 19 | 12.5
12 | 10.84 | 9.13 | 8.15 | 8 | 5.56
7 | 4.82 | 7.26 | 6.42 | 8 | 7.91
5 | 5.68 | 4.74 | 5.73 | 8 | 6.89

SCATTERPLOTS HELP TELL THE STORY!
Scatterplot | x-value | y-value | Correlation coefficient (3 decimal places)
I | L1 | L2 | .816
II | L1 | L3 | .816
III | L1 | L4 | .816
IV | L5 | L6 | .816
[Figure: scatterplots I, II, III, and IV]

PERMUTATIONS AND COMBINATIONS
• nCr and nPr
• Calculator commands
• Using formulas by hand
• Understanding “ordered” vs. not
• nCr gives the number of possible samples of size r drawn from n without replacement

WHY IS PROBABILITY ALWAYS TEAMED WITH STATISTICS??!!
Probability is the basis of statistical inference.

THEORETICAL AND EMPIRICAL PROBABILITY

MAKE A CARD DECK POSTER!
Great for explaining independence and conditional probability!

UNDERSTANDING INDEPENDENCE
K = king; F = face; D = diamond
P(K | D) = 1/13
P(K) = 1/13
P(K | F) = 1/3

DECLARING INDEPENDENCE
The conditional probability definition is
P(A | B) = P(A ∩ B) / P(B)
If P(A | B) = P(A), A and B are independent events. So a test for independence is
P(A ∩ B) = P(A) × P(B)
If A and B are disjoint events, P(A ∩ B) = 0.
The following formula holds whether or not A and B are disjoint:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

DISJOINT? EXHAUSTIVE? INDEPENDENT?
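As a rough check of the card-deck claims above, the following Python sketch enumerates a standard 52-card deck and applies the conditional probability definition, the independence test, and the addition rule directly. The helper names (`prob`, `cond_prob`, `independent`, `disjoint`) are illustrative assumptions and do not come from the slides.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) pairs


def prob(event):
    """P(event) for one card drawn at random from the full deck."""
    favorable = sum(1 for card in DECK if event(card))
    return Fraction(favorable, len(DECK))


def both(a, b):
    """The event 'A and B' (the intersection of A and B)."""
    return lambda card: a(card) and b(card)


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); assumes P(B) > 0."""
    return prob(both(a, b)) / prob(b)


def independent(a, b):
    """Independence test: P(A and B) == P(A) * P(B)."""
    return prob(both(a, b)) == prob(a) * prob(b)


def disjoint(a, b):
    """A and B are disjoint when P(A and B) == 0."""
    return prob(both(a, b)) == 0


def king(card):
    return card[0] == "K"


def face(card):
    return card[0] in ("J", "Q", "K")


def diamond(card):
    return card[1] == "diamonds"


print("P(K)     =", prob(king))
print("P(K | D) =", cond_prob(king, diamond))
print("P(K | F) =", cond_prob(king, face))
print("K and D independent?", independent(king, diamond))
print("K and F independent?", independent(king, face))
print("K and D disjoint?   ", disjoint(king, diamond))

# Addition rule: P(K or D) = P(K) + P(D) - P(K and D)
p_union = prob(lambda card: king(card) or diamond(card))
print("P(K or D) =", p_union)
```

Exact fractions are used instead of decimals so that the equality in the independence test is not affected by rounding. Kings and diamonds come out independent but not disjoint, which is a useful contrast for students who confuse the two ideas.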
CONDITIONAL PROBABILITY USING TWO-WAY TABLES
How California Baseball Fans Watch Their Team’s Games
 | Watches on Cable | Watches on Dish | Doesn’t Watch | TOTALS
Giant Fans | 14 | 10 | 6 | 30
Angel Fans | 6 | 18 | 11 | 35
Dodger Fans | 4 | 20 | 5 | 29
Athletics Fans | 5 | 19 | 6 | 30
Padre Fans | 15 | 7 | 12 | 34
TOTALS | 44 | 74 | 40 | 158
Categorical (qualitative) variables cannot be ordered, but you can look at whether or not they are associated with each other, and whether they are independent.

CONDITIONAL PROBABILITY USING TWO-WAY TABLES
Smoking Status: | Current | Former | Never | Totals
4-Year Degree | 51 | 92 | 68 | 211
2-Year Degree | 22 | 21 | 9 | 52
No Degree | 43 | 28 | 22 | 93
Totals | 116 | 141 | 99 | 356
Students need to have dexterity with the tables!
P(person is a former smoker) = _____
P(person is a former smoker, given that they have no degree) = _____
P(person has a degree) = _____
P(person has a 4-year degree, given that they are a current smoker) = _____
Students need to be able to use and interpret the algebraic formulas!

CONDITIONAL PROBABILITY OPTION: SIMPSON’S PARADOX
PLAYER | PITCHER | HITS | AT-BATS | AVERAGE
Julie | Righty | 40 | 100 | .400
Julie | Lefty | 80 | 400 | .200
Jordan | Righty | 120 | 400 | .300
Jordan | Lefty | 10 | 100 | .100
Julie is better against both righties and lefties, but Jordan is the better hitter overall (.260 vs. .240)!! Unreal!

EXPECTED VALUE
A carnival game called “Take Five” involves the rolling of a die. If it lands on 5, the player receives $5. If it lands on 1, 2, or 3, the player receives $1. If it lands on 4 or 6, the player receives nothing. If the carnival organizers charge $2 to play this game, what is their expected profit if 1000 people play?
Die Face | 1, 2, 3 | 5 | 4 or 6
Payout X | $1 | $5 | $0
Probability P(X) | 1/2 | 1/6 | 1/3
E(X) = 1(1/2) + 5(1/6) + 0(1/3) ≈ $1.33, and this is the average payout per play.
(2 − 1.33)(1000) ≈ $670 is the expected profit (about $667 if E(X) is not rounded).

EXPECTED VALUE
A life insurance company charges $250 annually for a $100,000 five-year term policy. What is their expected profit on this policy?
Age at Death | 21 | 22 | 23 | 24 | 25 | Survives
Profit X | ?? | ?? | ?? | ?? | ?? | ??
P(X) | .00183 | .00186 | .00189 | .00191 | .00194 | ????
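One possible way to fill in the missing entries is sketched below in Python. It rests on assumptions that the slide does not state outright: a $250 premium is collected in every year the policyholder starts alive (including the year of death), the last column is the outcome "survives all five years", and its probability is 1 minus the sum of the listed death probabilities. The function names `profit_table` and `expected_value` are hypothetical.

```python
PREMIUM = 250      # annual premium in dollars (from the slide)
PAYOUT = 100_000   # death benefit in dollars (from the slide)

# P(death at each age), as listed on the slide.
DEATH_PROBS = {21: 0.00183, 22: 0.00186, 23: 0.00189, 24: 0.00191, 25: 0.00194}


def profit_table(premium, payout, death_probs):
    """Build (outcome, company profit, probability) rows.

    Assumption: one premium is collected per year up to and including
    the year of death, and the benefit is paid out on death.
    """
    rows = []
    ages = sorted(death_probs)
    for years_paid, age in enumerate(ages, start=1):
        profit = years_paid * premium - payout
        rows.append((f"dies at {age}", profit, death_probs[age]))
    # Assumption: survival is the only remaining outcome.
    p_survive = 1 - sum(death_probs.values())
    rows.append(("survives", len(ages) * premium, p_survive))
    return rows


def expected_value(rows):
    """E(X) = sum of x * P(x) over all outcomes."""
    return sum(value * p for _, value, p in rows)


rows = profit_table(PREMIUM, PAYOUT, DEATH_PROBS)
for outcome, profit, p in rows:
    print(f"{outcome:12s} profit {profit:>9,d}  P = {p:.5f}")
print(f"Expected profit per policy: ${expected_value(rows):.2f}")

# The same helper reproduces the "Take Five" average payout of about $1.33.
take_five = [("1, 2, 3", 1, 1 / 2), ("5", 5, 1 / 6), ("4 or 6", 0, 1 / 3)]
print(f"Take Five expected payout: ${expected_value(take_five):.2f}")
```

Under these assumptions the expected profit works out to roughly $302 per policy over the five-year term. A different premium-timing assumption would change the profit row, so students should be asked to state theirs.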
Age at Death | 21 | 22 | 23 | 24 | 25 | Survives
Profit X | -99,750 | -99,500 | -99,250 | -99,000 | -98,750 | +1,250
P(X) | .00183 | .00186 | .00189 | .00191 | .00194 | ????

THE MEAN: AN UNBIASED ESTIMATOR?
Is the average of all possible sample means the same as the actual population mean? If so, the mean would be an unbiased estimator. If not, the mean is biased. Let’s find out!
Here is a population of 6 people’s scores: 1, 4, 5, 16, 17, 23.
What is the population’s mean? _________
What is the mean of all the sample means? ________
Conjecture:

THE MEDIAN: AN UNBIASED ESTIMATOR?
Is the average of all possible sample medians the same as the actual population median? If so, the median would be an unbiased estimator. If not, the median is biased. Let’s find out!
Here is a population of 6 people’s scores: 1, 4, 5, 16, 17, 23.
What is the population’s median? _________
What is the mean of all the sample medians? ________
Conjecture:

THE RANGE: AN UNBIASED ESTIMATOR?
Is the average of all possible sample ranges the same as the actual population range? If so, the range would be an unbiased estimator. If not, the range is biased. Let’s find out!
Here is a population of 6 people’s scores: 1, 4, 5, 16, 17, 23.
What is the population’s range? _________
What is the mean of all the sample ranges? ________
Conjecture:

THE SAMPLE VARIANCE s²: AN UNBIASED ESTIMATOR?
Is the average of all possible sample variances the same as the actual population variance? If so, the sample variance would be an unbiased estimator.
What is the population’s variance σ²? _________
What is the mean of all the sample variances? ________
Conjecture:

THE VARIANCE σ²: AN UNBIASED ESTIMATOR?
Is the average of all possible σ² variances, each computed from a sample, the same as the actual population variance? If so, σ² computed from a sample would be an unbiased estimator.
What is the population’s variance σ²? _________
What is the mean of all the σ² variances from the samples? ________
Conjecture:

p̂: AN UNBIASED ESTIMATOR OF p?
The following Y’s and N’s are Yes/No responses to a question, from a population of 8:
Y, Y, Y, N, Y, N, Y, Y
1. What is the population proportion p of Y’s? ____
2. Imagine taking samples of size 2 with replacement. Make them here, using the grid. In each cell, enter the proportion p̂ of Y’s in that sample.
3. What is the mean of all the sample proportions? _____
4. Conjecture:

BIAS AND VARIABILITY: STRIVING FOR LOW BIAS AND LOW VARIABILITY
• Bias describes how near the sample statistics come to estimating the population parameter.
• Variability describes how scattered the sample statistics are.
• The sample mean and sample proportion have low bias and low variability.

What is a Density Curve?
• The area underneath the curve, and above the x-axis, is 1, representing 100%.
• The area under the curve, in any specified interval, represents a percent of the total area.
• The most famous density curve is the standard normal curve.

What is a Sampling Distribution?
• A density curve that represents a distribution of a selected statistic from all possible samples of a given size, taken from a specific population, with replacement.
• The area under any interval represents a percent of the samples.
• The most famous sampling distribution is the standard normal curve.

VIOLATING REPLACEMENT
The population of Oyster Bay is 293,214.
Let’s say you wanted to select a sample of size 500 from this town.
What is the probability, written as a fraction, that Bruno will be selected first? 1/293,214
Convert this to an 8-place decimal carefully. .000003410478354
Let’s say 499 subjects were already picked and not replaced, and Bruno is not one of them. What is the probability he will be picked next, as a fraction? 1/292,715
Convert this to a decimal. .000003416292298
Compare the probability that Bruno was picked first to the probability that he was picked 500th. What do you notice? _____
Since the population is so much larger than the sample size, Bruno’s probability is essentially the same whenever he is picked, giving us the independence we need to use the graphs and formulas.
Large populations allow you to pick larger samples, and larger samples are usually more representative of the population.

CHOOSING SUBJECTS RANDOMLY
If you “seed” your calculator, you’ll get the same sequence of random numbers each time you use the same seed.

EXPERIMENTAL DESIGN BASICS
• Control: set up a comparison group.
• Replication: use high n for samples and also repeat the experiment in different settings.
• Randomness: use the correct sampling technique, reliable and valid instruments, no lurking or confounding variables, correct design.
• Factor: the treatment
• Response variable: the quantified result after treatment
• Independence: subjects picked independently

THE LANGUAGE OF EXPERIMENTAL DESIGN
• Descriptive statistics
• Population
• Observational study
• Experimental study
• Delimitations
• Sampling
• Voluntary response sample
• Nonresponse
• Undercoverage
• Matched pairs
• Hawthorne effect
• Replication
• Statistical significance
• Inferential statistics
• Sample
• Limitations
• Control
• Simulation
• Census
• Convenience sample
• Learning effect
• Placebo
• Double-blind
• Randomness
• Placebo effect

THE LOGIC BEHIND HYPOTHESIS TESTING
Use a binomial-probability example with rolls of one die:
binompdf(50, 1/6, 21) = .00001551
This is the probability of getting a given face exactly 21 times in 50 rolls of a fair die.
What do you choose to believe? Are you ever 100% sure you are correct?

HYPOTHESIS TESTS
• Null hypothesis: the hypothesis of “no difference.”
• A sampling distribution is created, based on the null hypothesis.
• A sample is taken.
• Data from the sample are judged as probable or improbable when compared to the sampling distribution.

CONFIDENCE INTERVALS
• Used to explore, or get a handle on, some unknown numerical quantity.
• Interval estimates vs. point estimates.
• Build a margin of error around a sample statistic.
• Margin of error is based on sample size.
• Increase n or lower the confidence level to shrink the interval.
• Can follow up a hypothesis test when the null hypothesis is rejected.

LIMITATIONS AND DELIMITATIONS
• Limitations: time, money, effort, accessibility, geographical proximity, release time from work. Limitations are imposed on you.
• Delimitations: deliberate limitations you impose on your experiment.

TYPES OF SAMPLES
• SRS (simple random sample): each possible sample of a given size has the same chance of being selected.
• Systematic random sample: stadium view, school room heat
• Stratified random sample: mimic population %’s
• Cluster sample
• Convenience (opportunistic) sample

COMPLETELY RANDOMIZED DESIGN
The completely randomized design takes a randomly selected group of subjects and splits them randomly into groups that receive different treatments.
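The two random steps in a completely randomized design (select subjects, then assign treatments) can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function name, the subject ID numbers, and the treatment labels "placebo" and "new drug" are all hypothetical, and the optional seed mirrors the idea of seeding a calculator so the same random choices can be reproduced.

```python
import random


def completely_randomized_design(population, n_subjects, treatments, seed=None):
    """Randomly select subjects, then randomly split them among treatments.

    population: list of subject identifiers
    n_subjects: how many subjects to select (without replacement)
    treatments: list of treatment labels
    seed:       optional seed; the same seed reproduces the same design
    """
    rng = random.Random(seed)

    # Step 1: simple random sample of subjects, without replacement.
    subjects = rng.sample(population, n_subjects)

    # Step 2: random assignment. sample() already returns the subjects in
    # random order; the shuffle only makes the second random step explicit.
    rng.shuffle(subjects)

    # Deal the shuffled subjects out to the treatments in turn, so group
    # sizes differ by at most one.
    groups = {treatment: [] for treatment in treatments}
    for i, subject in enumerate(subjects):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


# Hypothetical example: 200 numbered students, 30 chosen, two treatments.
population = list(range(1, 201))
design = completely_randomized_design(
    population, 30, ["placebo", "new drug"], seed=2024
)
for treatment, members in design.items():
    print(treatment, sorted(members))
```

Dealing subjects out in turn keeps the groups the same size, which is a design choice rather than a requirement. Flipping a coin for each subject would also be random but could leave the groups unbalanced.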
CRITIQUING STUDIES
• Matched pairs vs. two-sample designs: Which is preferable? When is matched pairs impossible?
• Learning effect
• Hawthorne effect
• Poor sampling: design, sample size, instruments
• Poor design
• Violating assumptions of the statistical test
• Lurking variables
• Confounding variables
• Influential points
• Outliers and resistance

What Are Balloon-Help Tutorials?
• Designed to gradually break that old math habit: “boxing” numerical answers devoid of any verbal explanation.
• They require students to explain selected (or all) aspects of a solution to a problem, enhancing it with anything they feel helps explain the problem.

Benefits of Balloon-Help Tutorials
• Gets students in the habit of writing original, complete sentences more often.
• “If you can’t say it, you don’t know it.”
• Keeps the writing practice frequent, consistent, and spaced out through the year.
• Writing practice translates to better free-response answers.
• An alternative form of assessment.
• The grade from these projects can be used in many ways.
• Can be used for extra credit options, pinpointed on specific student trouble areas.
• Makes for a great showcase or bulletin board.
• A by-product of trying to teach them the writing skills is that they learn the math they are working on.
Excerpt from Sample Annotation Written By Students
“High correlation does not imply any causation. In the example with the number of drownings correlated to ice-cream sales, we found that each of those variables was highly correlated with the temperature. The relationship between ice cream sales and temperature is probably causal, as is the relationship between # drownings and temperature.”

Assessing the Projects
Although they are graded for mathematical accuracy and creativity of annotations, students are welcome to employ their artistic side. However, color is to be used to improve the explanation of a mathematical point, not for the sake of “glitz.”

BALLOON HELP TUTORIALS GRADING SHEET: 1 – 10 in each category
1. _____ The mathematics is correct.
2. _____ The full-sentence explanations are correct.
3. _____ The topic/problem is comprehensive and complete.
4. _____ All crucial points are addressed verbally.
5. _____ Color is used with discretion to improve the explanation of a statistical point.
6. _____ Mathematical and statistical notation and terminology are used correctly.
7. _____ Captions for figures are descriptive and formatted correctly.
8. _____ Table headings are descriptive and formatted correctly.
9. _____ The physical layout of the project (text, diagrams, tables) is of high quality.
10. _____ Appropriate and sufficient examples are given.
11. _____ Diagrams and/or tables are graduated where necessary.
12. _____ The project does a clearer job of explaining the topic than the original notes do.
13. _____ The depth and quality of the project are commensurate with the student’s ability.