CCSS ALGEBRA 2 STATISTICS ESSENTIALS:
The Stat Skills and Background You’ll Need
Rob Gerver, Ph.D.
North Shore HS
Glen Head, NY
gerverr@northshoreschools.org
STAT TOPICS IN CCSS ALGEBRA 2:
Content Standards S-ID, S-IC, S-CP, S-MD
• Descriptive statistics: Interpreting categorical and
bivariate and univariate quantitative data
• Probability: Conditional probability, independence,
expected value
• Inferential Statistics: Sampling, making inferences,
and justifying conclusions
There is an emphasis on understanding,
interpreting and critiquing—it’s not just
students boxing numerical answers!
211 PAGES OF SAMPLE PROBLEMS!
https://www.engageny.org/resource/algebra-ii-module-4
YOU’LL NEED TO KNOW MORE THAN
JUST WHAT IS IN THE CURRICULUM!
CCSS ALGEBRA 2 STATISTICS ESSENTIALS
• Descriptive Statistics: Univariate Data
• Descriptive Statistics: Bivariate Data
• Probability and Independence
• Inferential Statistics: Unbiased Estimators
• Inferential Statistics: Sampling Distributions
• Inferential Statistics: Experimental Design
• Writing Projects
• Balloon Help Tutorials
DISPLAYING UNIVARIATE DATA
Box and whisker plots, histograms, dot plots
UNIVARIATE STATISTICAL MEASURES
Central tendency:
• Mean
• Median: resistant to outliers
• Mode
Dispersion:
• Range: ignores spread except for the high and low values
• Quartiles, IQR (e.g., the “middle 50%” that colleges report)
• Mean deviation, mean absolute deviation
• Variance, Standard deviation in descriptive and
inferential statistics: σ vs. s
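The σ vs. s distinction can be seen numerically. The sketch below is only an illustration: it borrows the six scores used in the estimator slides later in the deck and relies on Python's standard `statistics` module, where `pstdev` divides by n (σ) and `stdev` divides by n - 1 (s).

```python
import statistics

# Six scores, treated first as a whole population and then as a sample.
scores = [1, 4, 5, 16, 17, 23]

sigma = statistics.pstdev(scores)  # population standard deviation (divide by n)
s = statistics.stdev(scores)       # sample standard deviation (divide by n - 1)

print(f"sigma = {sigma:.3f}, s = {s:.3f}")
```

Because s divides by a smaller number, it is always at least as large as σ for the same data.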
BIVARIATE MEASURES
• Correlation coefficient
• Regression slope as a rate; interpreting r as strong, moderate, or weak
• Interpreting a relationship as linear or not, using a scatterplot
• Making predictions--extrapolation
BIVARIATE NUANCES
• Causation vs correlation
• Lurking variables
• Confounding variables
USE THE SCATTERPLOT!
L1    L2      L3     L4      L5    L6
10    8.04    9.14   7.46    8     6.58
8     6.95    8.14   6.77    8     5.76
13    7.58    8.74   12.74   8     7.71
9     8.81    8.77   7.11    8     8.84
11    8.33    9.26   7.81    8     8.47
14    9.96    8.1    8.84    8     7.04
6     7.24    6.13   6.08    8     5.25
4     4.26    3.1    5.39    19    12.5
12    10.84   9.13   8.15    8     5.56
7     4.82    7.26   6.42    8     7.91
5     5.68    4.74   5.73    8     6.89
SCATTERPLOTS HELP TELL THE STORY!
Scatterplot   x-value   y-value   Correlation Coefficient (3 decimal places)
I             L1        L2        .816
II            L1        L3        .816
III           L1        L4        .816
IV            L5        L6        .816
[Figure: scatterplots I, II, III, and IV]
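To check the correlation coefficients without a calculator, here is a minimal sketch that computes Pearson's r by hand for the four list pairings. The function name `pearson_r` is just a helper invented for this illustration; the data are the six lists L1 through L6 from the slide.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

L1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
L2 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
L3 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]
L4 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
L5 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
L6 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]

pairings = {"I": (L1, L2), "II": (L1, L3), "III": (L1, L4), "IV": (L5, L6)}
r_values = {name: pearson_r(xs, ys) for name, (xs, ys) in pairings.items()}

for name, r in r_values.items():
    print(name, f"{r:.3f}")
```

All four values land very close to 0.816 even though the scatterplots look nothing alike, which is the point of plotting before trusting r.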
PERMUTATIONS AND COMBINATIONS
• nCr and nPr
• Calculator commands
• Using formulas by hand
• Understanding “ordered” vs. not
• nCr gives the number of samples without replacement
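As a quick cross-check of calculator output against the hand formula, the following sketch uses Python's built-in `math.perm` and `math.comb` (available in Python 3.8 and later). The values n = 5 and r = 2 are arbitrary choices for illustration.

```python
import math

n, r = 5, 2  # illustrative values: choose 2 items from 5

ordered = math.perm(n, r)    # nPr: order matters
unordered = math.comb(n, r)  # nCr: order does not matter

# The nCr formula by hand: n! / (r! (n - r)!)
by_hand = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(ordered, unordered, by_hand)
```

Each unordered selection of 2 items can be arranged in 2! ways, which is why nPr here is exactly twice nCr.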
WHY IS PROBABILITY ALWAYS
TEAMED WITH STATISTICS??!!
Probability is the basis of statistical inference.
THEORETICAL AND EMPIRICAL PROBABILITY
MAKE A CARD DECK POSTER!
Great for explaining independence and
conditional probability!
UNDERSTANDING INDEPENDENCE
K = king; F = face card; D = diamond
P(K | D) = 1/13
P(K) = 1/13
P(K | F) = 1/3
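These three probabilities can be verified by listing the deck, much like the card deck poster. The sketch below assumes a standard 52-card deck with jack, queen, and king as the face cards; the helper `prob` is a name made up for this example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def is_king(card):
    return card[0] == "K"

def is_face(card):
    return card[0] in ("J", "Q", "K")

def is_diamond(card):
    return card[1] == "diamonds"

def prob(event, given=None):
    """P(event | given), found by counting cards in the restricted deck."""
    space = [c for c in deck if given is None or given(c)]
    favorable = [c for c in space if event(c)]
    return Fraction(len(favorable), len(space))

p_k = prob(is_king)
p_k_given_d = prob(is_king, is_diamond)
p_k_given_f = prob(is_king, is_face)

print(p_k, p_k_given_d, p_k_given_f)
```

Since P(K | D) equals P(K), knowing the card is a diamond tells you nothing about whether it is a king, so K and D are independent. P(K | F) differs from P(K), so K and F are not.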
DECLARING INDEPENDENCE
The conditional probability definition is
P(A | B) = P(A ∩ B) / P(B)
If P(A | B) = P(A), A and B are independent events.
So a test for independence is P(A ∩ B) = P(A) × P(B)
If A and B are disjoint events, P(A ∩ B) = 0
This formula holds if A and B are disjoint or not:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
DISJOINT? EXHAUSTIVE? INDEPENDENT?
CONDITIONAL PROBABILITY USING TWO-WAY TABLES
How California Baseball Fans Watch Their Team’s Games
                 Watches on Cable   Watches on Dish   Doesn’t Watch   TOTALS
Giant Fans       14                 10                6               30
Angel Fans       6                  18                11              35
Dodger Fans      4                  20                5               29
Athletics Fans   5                  19                6               30
Padre Fans       15                 7                 12              34
TOTALS           44                 74                40              158
Categorical (qualitative) variables cannot be ordered,
but you can look at whether or not they are associated
with each other, and whether they are independent.
CONDITIONAL PROBABILITY USING TWO-WAY TABLES
Smoking Status   4-Year Degree   2-Year Degree   No Degree   Totals
Current          51              22              43          116
Former           92              21              28          141
Never            68              9               22          99
Totals           211             52              93          356
Students need to have dexterity with the tables!
P(person is a former smoker) =
P(person is a former smoker, given that they have no degree) =
P(person has a degree) =
P(person has a 4-year degree, given that they are a current smoker) =
Students need to be able to use and interpret the algebraic formulas!
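One way to practice this dexterity is to store the smoking table as nested counts and read each probability off as "cell or margin over the right total." The sketch below is an illustration only; the dictionary layout and helper names are assumptions, while the counts are the ones in the smoking status table.

```python
from fractions import Fraction

# Rows: smoking status. Columns: education level.
counts = {
    "Current": {"4-Year": 51, "2-Year": 22, "None": 43},
    "Former":  {"4-Year": 92, "2-Year": 21, "None": 28},
    "Never":   {"4-Year": 68, "2-Year": 9,  "None": 22},
}

def row_total(status):
    return sum(counts[status].values())

def col_total(degree):
    return sum(row[degree] for row in counts.values())

grand_total = sum(row_total(status) for status in counts)

# Marginal probability: row total over grand total.
p_former = Fraction(row_total("Former"), grand_total)
# Conditional probability: the condition shrinks the denominator to one column.
p_former_given_no_degree = Fraction(counts["Former"]["None"], col_total("None"))
# "Has a degree" combines the two degree columns.
p_degree = Fraction(col_total("4-Year") + col_total("2-Year"), grand_total)
# Here the condition shrinks the denominator to one row.
p_4yr_given_current = Fraction(counts["Current"]["4-Year"], row_total("Current"))

print(p_former, p_former_given_no_degree, p_degree, p_4yr_given_current)
```

The key habit is noticing that "given" changes the denominator from the grand total to a single row or column total.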
CONDITIONAL PROBABILITY OPTION: SIMPSON’S PARADOX
PLAYER   PITCHER   HITS   AT-BATS   AVERAGE
Julie    Righty    40     100       .400
Julie    Lefty     80     400       .200
Jordan   Righty    120    400       .300
Jordan   Lefty     10     100       .100
Julie is better against both righties and
lefties, but Jordan is the better
hitter overall (.260 vs. .240)!! Unreal!
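The reversal can be confirmed by pooling the hits and at-bats. This is a small sketch using the slide's numbers; the data structure and function names are invented for the example.

```python
from fractions import Fraction

# (hits, at-bats) for each player against each type of pitcher
stats = {
    "Julie":  {"Righty": (40, 100),  "Lefty": (80, 400)},
    "Jordan": {"Righty": (120, 400), "Lefty": (10, 100)},
}

def split_average(player, pitcher):
    hits, at_bats = stats[player][pitcher]
    return Fraction(hits, at_bats)

def overall_average(player):
    hits = sum(h for h, _ in stats[player].values())
    at_bats = sum(ab for _, ab in stats[player].values())
    return Fraction(hits, at_bats)

for pitcher in ("Righty", "Lefty"):
    print(pitcher,
          float(split_average("Julie", pitcher)),
          float(split_average("Jordan", pitcher)))

print("Overall",
      float(overall_average("Julie")),
      float(overall_average("Jordan")))
```

The paradox comes from the weights: most of Julie's at-bats are against lefties, where both hitters do worse, while most of Jordan's are against righties.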
EXPECTED VALUE
A carnival game called “Take Five” involves the rolling of
a die. If it lands on 5, the winner gets $5. If it lands on 1,
2, or 3, the player receives $1. If it lands on 4 or 6, the
player receives nothing. If the carnival organizers charge
$2 to play this game, what is their expected profit if 1000
people play?
Die Face   Payout X   Probability P(X)
1, 2, 3    $1         1/2
5          $5         1/6
4 or 6     $0         1/3
E(X) = 1(1/2) + 5(1/6) + 0(1/3) ≈ $1.33, and this is the average payout.
(2 - 1.33)(1000) = $670 is the expected profit (about $667 if the payout is not rounded).
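The "Take Five" computation can be done with exact fractions to avoid the rounding in $1.33. This is a minimal sketch; the variable names are chosen for this example.

```python
from fractions import Fraction

# payout in dollars -> probability of that payout on one roll
payout_distribution = {
    1: Fraction(1, 2),  # faces 1, 2, 3
    5: Fraction(1, 6),  # face 5
    0: Fraction(1, 3),  # faces 4 or 6
}

expected_payout = sum(x * p for x, p in payout_distribution.items())

fee = 2          # dollars charged per play
players = 1000
expected_profit = (fee - expected_payout) * players

print(float(expected_payout), float(expected_profit))
```

Exact arithmetic gives an expected payout of 4/3 dollars per play, so the organizers expect to keep 2/3 of a dollar per player.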
EXPECTED VALUE
A life insurance company charges $250 annually for a $100,000 five-year term policy. What is their expected profit on this policy?
Age at Death   Profit X   P(X)
21             ??         .00183
22             ??         .00186
23             ??         .00189
24             ??         .00191
25             ??         .00194
Survives       ??         ????

Age at Death   Profit X   P(X)
21             -99,750    .00183
22             -99,500    .00186
23             -99,250    .00189
24             -99,000    .00191
25             -98,750    .00194
Survives       +1,250     ????
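A sketch of how the table could be completed and the expected profit computed. Two assumptions are made here that the slide leaves open: the missing probability is taken to be the complement of the five death probabilities (the policyholder survives the term), and a death in policy year k means k premiums were collected before the payout, which matches the profits listed in the table.

```python
premium = 250        # dollars per year
benefit = 100_000    # paid out if the policyholder dies during the term
death_probs = [0.00183, 0.00186, 0.00189, 0.00191, 0.00194]  # ages 21 to 25

outcomes = []  # list of (profit, probability) pairs
for year, p in enumerate(death_probs, start=1):
    profit = premium * year - benefit  # premiums collected minus the payout
    outcomes.append((profit, p))

# Assumption: the remaining probability is surviving all five years.
p_survive = 1 - sum(death_probs)
outcomes.append((premium * len(death_probs), p_survive))

expected_profit = sum(x * p for x, p in outcomes)
print(round(p_survive, 5), round(expected_profit, 2))
```

Under these assumptions the company expects to make roughly $300 per policy, even though any single policy either loses close to $100,000 or earns $1,250.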
THE MEAN: AN UNBIASED ESTIMATOR?
Is the average of all possible sample means the same as the actual population
mean? If so, the mean would be an unbiased estimator. If not, the mean is
biased. Let’s find out!
Here is a population of 6 people’s scores: 1, 4, 5, 16, 17, 23.
What is the population’s mean?_________
What is the mean of all the sample means?________
Conjecture:
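The slide does not state a sample size, so the sketch below assumes samples of size 2 drawn with replacement and lists all 36 ordered samples. It then averages the sample means and compares the result with the population mean.

```python
from fractions import Fraction
from itertools import product

population = [1, 4, 5, 16, 17, 23]
n = 2  # assumed sample size; the slide leaves this open

# Every ordered sample of size n, drawn with replacement (6 ** 2 = 36 samples)
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(sample), n) for sample in samples]

population_mean = Fraction(sum(population), len(population))
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(population_mean, mean_of_sample_means)
```

The two numbers agree, which supports the conjecture that the sample mean is an unbiased estimator of the population mean.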
THE MEDIAN: AN UNBIASED ESTIMATOR?
Is the average of all possible sample medians the same as the actual population
median? If so, the median would be an unbiased estimator. If not, the median is
biased. Let’s find out!
Here is a population of 6 people’s scores: 1, 4, 5, 16, 17, 23.
What is the population’s median?_________
What is the mean of all the sample medians?________
Conjecture:
THE RANGE: AN UNBIASED ESTIMATOR?
Is the average of all possible sample ranges the same as the actual population
range? If so, the range would be an unbiased estimator. If not, the range is
biased. Let’s find out!
Here is a population of 6 people’s scores: 1, 4, 5, 16, 17, 23.
What is the population’s range?_________
What is the mean of all the sample ranges?________
Conjecture:
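The same enumeration idea can be applied to the median and the range. As before, the sample size of 2 with replacement is an assumption made for this sketch; note that with only two values per sample, the sample median is simply the midpoint of the pair.

```python
from itertools import product
from statistics import mean, median

population = [1, 4, 5, 16, 17, 23]
samples = list(product(population, repeat=2))  # assumed: size 2, with replacement

population_median = median(population)
population_range = max(population) - min(population)

mean_of_sample_medians = mean(median(sample) for sample in samples)
mean_of_sample_ranges = mean(max(sample) - min(sample) for sample in samples)

print(population_median, mean_of_sample_medians)
print(population_range, round(mean_of_sample_ranges, 2))
```

In this setup neither average matches its population value. The sample range falls well short of the population range, since a small sample rarely contains both the lowest and highest scores.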
THE SAMPLE VARIANCE s²: AN UNBIASED ESTIMATOR?
Is the average of all possible sample variances the same as the actual
population variance? If so, the sample variance would be an unbiased
estimator.
What is the population’s variance σ²?_________
What is the mean of all the sample variances?________
Conjecture:
THE VARIANCE σ²: AN UNBIASED ESTIMATOR?
Is the average of all possible variances, computed with the σ² formula on each
sample, the same as the actual population variance? If so, the σ² formula
applied to samples would be an unbiased estimator.
What is the population’s variance σ²?_________
What is the mean of all the σ² variances from the samples?________
Conjecture:
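Both variance questions can be explored with one helper that divides by either n - 1 (the s² formula) or n (the σ² formula). The sketch assumes the same six-score population as the earlier slides and, as an assumption, samples of size 2 with replacement.

```python
from fractions import Fraction
from itertools import product

population = [1, 4, 5, 16, 17, 23]

def variance(data, subtract_from_n):
    """Sum of squared deviations from the mean, divided by (n - subtract_from_n)."""
    n = len(data)
    center = Fraction(sum(data), n)
    return sum((x - center) ** 2 for x in data) / (n - subtract_from_n)

population_variance = variance(population, 0)  # sigma squared

samples = list(product(population, repeat=2))  # assumed: size 2, with replacement

# Average of s^2 (divide by n - 1) over all samples
mean_of_s2 = sum(variance(s, 1) for s in samples) / len(samples)
# Average of the sigma^2 formula (divide by n) over all samples
mean_of_sigma2 = sum(variance(s, 0) for s in samples) / len(samples)

print(population_variance, mean_of_s2, mean_of_sigma2)
```

Dividing by n - 1 recovers the population variance on average, while dividing by n underestimates it. That is the usual motivation for the n - 1 in the s formula.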
p̂: AN UNBIASED ESTIMATOR OF p?
The following Y’s and N’s are Yes/No responses to a question, from a population of 8:
Y, Y, Y, N, Y, N, Y, Y
1. What is the population proportion p of Y’s?____
2. Imagine taking samples of size 2 with replacement. Make them here, using the grid. In
each cell enter the proportion of Y’s, p̂, in that sample.
3. What is the mean of all the sample proportions?_____
4. Conjecture:
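The 8-by-8 grid of samples can be generated mechanically. This sketch follows the slide's setup (samples of size 2, with replacement) and uses exact fractions; the variable names are chosen for the example.

```python
from fractions import Fraction
from itertools import product

population = ["Y", "Y", "Y", "N", "Y", "N", "Y", "Y"]
p = Fraction(population.count("Y"), len(population))

# Each cell of the grid is one ordered sample of size 2 (8 * 8 = 64 cells).
grid = list(product(population, repeat=2))
p_hats = [Fraction(sample.count("Y"), 2) for sample in grid]

mean_of_p_hats = sum(p_hats) / len(p_hats)
print(p, mean_of_p_hats)
```

The mean of the 64 sample proportions equals the population proportion, consistent with p̂ being an unbiased estimator of p.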
BIAS AND VARIABILITY:
STRIVING FOR LOW BIAS AND LOW VARIABILITY
• Bias describes how near the sample statistics come
to estimating the population parameter.
• Variability describes how scattered the sample
statistics are.
• The sample mean and sample proportion have low
bias and low variability.
What is a Density Curve?
• Area underneath curve, and above x-axis, is 1,
representing 100%.
• Area under the curve, in any specified interval,
represents a percent of the total area.
• Most famous density curve is the standard
normal curve.
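To make "area in an interval is a percent" concrete, the sketch below uses `statistics.NormalDist` from the Python standard library to find areas under the standard normal curve. The intervals chosen (within one and two standard deviations of the mean) are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_between(low, high):
    """Area under the standard normal curve between low and high."""
    return standard_normal.cdf(high) - standard_normal.cdf(low)

within_one_sd = area_between(-1, 1)
within_two_sd = area_between(-2, 2)

print(f"{within_one_sd:.1%} within 1 SD, {within_two_sd:.1%} within 2 SD")
```

These are the familiar "68 and 95" figures from the empirical rule, read directly as areas under the density curve.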
What is a Sampling Distribution?
• A density curve that represents a distribution
of a selected statistic from all possible samples
of a given size, taken from a specific
population, with replacement.
• Area under any interval represents a percent
of the samples.
• Most famous sampling distribution is the
standard normal curve.
VIOLATING REPLACEMENT
The population of Oyster Bay is 293,214. Let’s say you wanted to
select a sample of size 500 from this town. What is the probability,
written as a fraction, that Bruno will be selected first? 1/293,214.
Convert this to an 8-place decimal carefully: .000003410478354. Let’s
say 499 subjects were already picked and not replaced, and Bruno is
not one of them. What is the probability he will be picked next, as a
fraction? 1/292,715. Convert this to a decimal: .000003416292298.
Compare the probability that Bruno was picked first to the probability
he was picked 500th. What do you notice?_____. Since the population
is so much larger than the sample size, Bruno’s probability is
essentially the same whenever he is picked, giving us the
independence we need to use the graphs and formulas. Large
populations allow you to pick larger samples, and larger samples are
usually more representative of the population.
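The Bruno comparison can be quantified as a relative change. This is a small sketch using the numbers from the slide; the variable names are made up for the example.

```python
from fractions import Fraction

population = 293_214
already_picked = 499  # picked without replacement, Bruno not among them

p_first = Fraction(1, population)
p_500th = Fraction(1, population - already_picked)

# How much larger is the 500th-pick probability, relative to the first?
relative_change = float((p_500th - p_first) / p_first)

print(float(p_first), float(p_500th), f"{relative_change:.3%}")
```

The probability rises by less than two tenths of one percent, which is why sampling without replacement from a large population is treated as if the picks were independent.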
CHOOSING SUBJECTS RANDOMLY
If you “seed” your calculator, you’ll get the same random numbers.
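The same seeding behavior can be demonstrated off the calculator. The sketch below uses Python's `random` module; the roster of 30 ID numbers and the seed value 2024 are hypothetical choices for illustration.

```python
import random

roster = list(range(1, 31))  # hypothetical class of 30 student ID numbers

random.seed(2024)                      # set the seed...
first_draw = random.sample(roster, 5)  # ...and pick 5 subjects

random.seed(2024)                      # same seed again
second_draw = random.sample(roster, 5)

print(first_draw)
print(second_draw)
```

Both draws are identical, which is useful when a whole class needs to follow the same example and a nuisance when every student is supposed to get a different sample.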
EXPERIMENTAL DESIGN BASICS
• Control: set up a comparison group.
• Replication: use high n for samples and also
repeat the experiment in different settings.
• Randomness: use correct sampling technique,
reliable and valid instruments, no lurking or
confounding variables, correct design.
• Factor: the treatment.
• Response variable: quantified result after treatment.
• Independence: subjects picked independently.
THE LANGUAGE OF EXPERIMENTAL DESIGN
Descriptive statistics
Population
Observational study
Experimental study
Delimitations
Sampling
Voluntary response sample
Nonresponse
Undercoverage
Matched pairs
Hawthorne effect
Replication
Statistical significance
Inferential statistics
Sample
Limitations
Control
Simulation
Census
Convenience sample
Learning effect
Placebo
Double-blind
Randomness
Placebo effect
THE LOGIC BEHIND HYPOTHESIS TESTING
Use a binomial die-roll example (50 rolls, success probability 1/6, exactly 21 successes):
Binompdf(50, 1/6, 21) = .00001551
What do you choose to believe?
Are you ever 100% sure you are correct?
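The calculator result can be reproduced from the binomial probability formula. In this sketch, `binom_pmf` is a helper written for the example (not a library function), taking its arguments in the same order as the calculator's Binompdf.

```python
from math import comb

def binom_pmf(n, p, k):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

prob_21_of_50 = binom_pmf(50, 1 / 6, 21)
print(f"{prob_21_of_50:.8f}")
```

With a fair die you would expect about 50/6, or roughly 8, successes, so seeing 21 is the kind of evidence that makes you question whether the die is fair.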
HYPOTHESIS TESTS
• Null hypothesis—the hypothesis of “no
difference.”
• A sampling distribution is created, based on
the null hypothesis.
• A sample is taken.
• Data from the sample is analyzed as probable,
or improbable when compared to the
sampling distribution.
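These four steps can be acted out with a simulation. The sketch below is one possible illustration, not a formal test procedure: it assumes the null hypothesis "the die is fair," counts sixes as successes, and asks how often 50 fair rolls produce a count at least as large as a hypothetical observed count of 21. The seed and the number of trials are arbitrary.

```python
import random

random.seed(1)      # arbitrary seed so the run is repeatable
TRIALS = 10_000     # number of simulated samples
ROLLS = 50          # rolls per sample
OBSERVED = 21       # hypothetical observed number of sixes

def count_sixes(rolls):
    """Roll a fair die `rolls` times and count the sixes."""
    return sum(1 for _ in range(rolls) if random.randint(1, 6) == 6)

# Approximate sampling distribution of the count, built under the null hypothesis
null_distribution = [count_sixes(ROLLS) for _ in range(TRIALS)]

# Fraction of simulated samples at least as extreme as the observed one
as_extreme = sum(1 for count in null_distribution if count >= OBSERVED)
p_value_estimate = as_extreme / TRIALS

print(max(null_distribution), p_value_estimate)
```

If almost none of the simulated fair-die samples reach the observed count, the sample is "improbable" under the null hypothesis, and the fair-die claim becomes hard to believe.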
CONFIDENCE INTERVALS
• To explore, get a handle on, some unknown
numerical quantity.
• Interval estimates vs. point estimates.
• Build a margin of error around a sample statistic.
• Margin of error based on sample size.
• Increase n or lower confidence to shrink interval.
• Can follow up a hypothesis test when null
hypothesis is rejected.
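The effect of n and the confidence level on the interval can be seen with a margin of error formula. The slides do not give one, so this sketch assumes the common large-sample formula for a proportion, z* × sqrt(p̂(1 - p̂)/n), with p̂ = 0.5 and the sample sizes chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """Assumed large-sample margin of error for a sample proportion."""
    # Critical value z* leaving (1 - confidence) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

baseline = margin_of_error(0.5, 100, 0.95)
larger_sample = margin_of_error(0.5, 400, 0.95)
lower_confidence = margin_of_error(0.5, 100, 0.90)

print(round(baseline, 3), round(larger_sample, 3), round(lower_confidence, 3))
```

Under this formula, quadrupling n cuts the margin of error in half, and dropping from 95% to 90% confidence also narrows the interval.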
LIMITATIONS AND DELIMITATIONS
• Limitations—time, money, effort, accessibility,
geographical proximity, release time from
work. Limitations are imposed on you.
• Delimitations—deliberate limitations you
impose on your experiment.
TYPES OF SAMPLES
• SRS (simple random sample): each possible sample of the
same size has the same chance of being selected.
• Systematic random sample: stadium view,
school room heat
• Stratified random sample: mimic population %’s
• Cluster sample
• Convenience (opportunistic) sample
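Three of these sampling methods can be sketched in a few lines. Everything specific here is hypothetical: a roster of 100 students, a sample size of 10, and two strata split 60/40. The point is only to contrast how each method picks its subjects.

```python
import random

random.seed(7)  # arbitrary seed so the run is repeatable
roster = [f"student{i:03d}" for i in range(1, 101)]  # hypothetical population of 100
n = 10

# Simple random sample: every group of 10 is equally likely.
srs = random.sample(roster, n)

# Systematic sample: random starting point, then every 10th name on the list.
step = len(roster) // n
start = random.randrange(step)
systematic = roster[start::step]

# Stratified sample: sample each stratum in proportion to its share of the population.
strata = {"underclass": roster[:60], "upperclass": roster[60:]}  # hypothetical strata
stratified = []
for members in strata.values():
    share = round(n * len(members) / len(roster))
    stratified.extend(random.sample(members, share))

print(srs, systematic, stratified, sep="\n")
```

The stratified sample always contains 6 underclass and 4 upperclass students, mirroring the population percentages, while the simple random sample may not.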
COMPLETELY RANDOMIZED DESIGN
The completely randomized design takes a
randomly-selected group of subjects and splits
them randomly into groups that receive
different treatments.
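Random assignment in a completely randomized design can be sketched as "shuffle, then deal." The 20 subjects and the two group labels below are hypothetical.

```python
import random

random.seed(3)  # arbitrary seed so the run is repeatable
subjects = [f"subject{i}" for i in range(1, 21)]  # hypothetical 20 selected subjects
treatments = ["treatment", "control"]             # hypothetical group labels

# Shuffle a copy of the subject list, then deal subjects out to the groups in turn.
shuffled = subjects[:]
random.shuffle(shuffled)
groups = {
    label: shuffled[i::len(treatments)]
    for i, label in enumerate(treatments)
}

for label, members in groups.items():
    print(label, members)
```

Because chance alone decides who lands in each group, lurking variables tend to be spread evenly between the groups rather than lined up with the treatment.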
CRITIQUING STUDIES
• Matched pairs vs. two sample designs: Which is
preferable? When is matched pairs impossible?
• Learning effect
• Hawthorne effect
• Poor sampling--design, sample size, instruments
• Poor design
• Violating assumptions of statistical test
• Lurking variables
• Confounding variables
• Influential points
• Outliers and resistance
What Are Balloon-Help Tutorials?
• Designed to gradually break that
old math habit: “boxing”
numerical answers devoid of any
verbal explanation.
• They require students to explain
selected (or all) aspects of a
solution to a problem; enhancing it
with anything they feel helps
explain the problem.
Benefits of Balloon-Help Tutorials
• Gets students in the habit of writing original, complete sentences
more often.
• “If you can’t say it, you don’t know it.”
• Keeps the writing practice frequent, consistent, and spaced out
through the year.
• Writing practice translates to better free response answers.
• An alternative form of assessment.
• The grade from these projects can be used in many ways.
• Can be used for extra credit options, pinpointed on specific
student trouble areas.
• Makes for a great showcase or bulletin board.
• A by-product of trying to teach them the writing skills is that they learn
the math they are working on.
Excerpt from Sample Annotation
Written By Students
“High correlation does not imply any
causation. In the example with the number of
drownings correlated to ice-cream sales, we
found that each of those variables was highly
correlated with the temperature. The
relationship between ice cream sales and
temperature is probably causal, as is the
relationship between # drownings and
temperature.”
Assessing the Projects
Although they are graded for mathematical
accuracy, and creativity of annotations,
students are welcome to employ their
artistic side. However, color is to be used to
improve the explanation of a mathematical
point; not for the sake of “glitz.”
BALLOON HELP TUTORIALS GRADING SHEET:
1 – 10 in each category
1._____The mathematics is correct.
2._____The full-sentence explanations are correct.
3._____The topic/problem is comprehensive and complete.
4._____All crucial points are addressed verbally.
5._____Color is used with discretion to improve the explanation of a statistical point.
6._____Mathematical and statistical notation and terminology are used correctly.
7._____Captions for figures are descriptive and formatted correctly.
8._____Table headings are descriptive and formatted correctly.
9._____The physical layout of the project (text, diagrams, tables) is high quality.
10._____Appropriate and sufficient examples are given.
11._____Diagrams and/or tables are graduated where necessary.
12._____The project does a clearer job of explaining the topic than the original notes do.
13._____The depth and quality of the project are commensurate with the student’s ability.