GAISEing into the Common
Core Standards – Day 1
A Professional Development Seminar sponsored by the
Ann Arbor Chapter of the ASA
The Core
Summarize and describe distributions.
Summarize, represent, and interpret data on a
single count or measurement variable
CCSS.Math.Content.6.SP.B.4 Display numerical data in
plots on a number line, including dot plots, histograms,
and box plots.
CCSS.Math.Content.HSS-ID.A.1 Represent data with plots
on the real number line (dot plots, histograms, and box
plots).
CCSS.Math.Content.6.SP.B.5 Summarize numerical data
sets in relation to their context, such as by:
CCSS.Math.Content.6.SP.B.5a Reporting the number of
observations.
CCSS.Math.Content.6.SP.B.5b Describing the nature of the
attribute under investigation, including how it was
measured and its units of measurement.
CCSS.Math.Content.6.SP.B.5c Giving quantitative
measures of center (median and/or mean) and variability
(interquartile range and/or mean absolute deviation), as
well as describing any overall pattern and any striking
deviations from the overall pattern with reference to the
context in which the data were gathered.
CCSS.Math.Content.HSS-ID.A.2 Use statistics appropriate
to the shape of the data distribution to compare center
(median, mean) and spread (interquartile range, standard
deviation) of two or more different data sets.
CCSS.Math.Content.6.SP.B.5d Relating the choice of
measures of center and variability to the shape of the
data distribution and the context in which the data were
gathered.
CCSS.Math.Content.HSS-ID.A.3 Interpret differences in
shape, center, and spread in the context of the data sets,
accounting for possible effects of extreme data points
(outliers).
Getting to Know You …
Let’s Collect Some Data!
• Go around the room and enter data
for yourself on the charts.
• Men: blue markers
• Women: red markers
Let’s think about our data
• What are the different types of variables that
we measured?
• How did you measure each of the variables?
• Were any of these hard to measure?
• What were the units for each variable?
• What might the context be for each of these
charts?
Quantitative vs Categorical data
• Height (in)
• Number of letters in your first name
• Number of siblings (not including yourself)
• Favorite color
• Do you currently have a dog?
• How many pets do you currently have?
• Travel time to this workshop? (min)
• How many years have you been teaching?
Quantitative vs Categorical data
Applet:
http://mathnstats.com/applets/Categorical-Quantitative.html
Quantitative vs Categorical data
• Common Misconceptions:
– Histograms vs Bar charts
– Don’t discuss shape
for bar charts!
– Zip code?
Common Shapes of Distributions
Shape
• Skewed right (positive) / left (negative)
Matching Shapes and Characteristics
Distribution 1
Characteristic =
Distribution 2
Characteristic =
Distribution 3
Characteristic =
Distribution 4
Characteristic =
Characteristics:
1. Distribution of age for the population of the United States in the year 1980.
Describe and explain the shape of the distribution.
2. Distribution of miles of coastline for the 50 United States.
Describe and explain the shape of the distribution.
Which states do you think would be in the last class furthest to the right?
3. Distribution of the number of miles traveled to work, that is,
commuting distance for employed adults in a city.
Describe and explain the shape of the distribution.
4. Distribution of age at death for the population of the United States (year 1980).
Describe and explain the shape of the distribution.
Measures of Center
• What is a typical value in a given situation?
– Tallest bar: mode
– Middle Value: median
• Median: differs for odd and even sample sizes
– Show it on your hand!
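A minimal sketch of the odd/even distinction, using Python's standard `statistics.median`. The two small samples are made-up values for illustration only, not workshop data.

```python
from statistics import median

# Made-up values for illustration only (not workshop data).
odd_sample = [3, 5, 8, 9, 12]        # n = 5
even_sample = [3, 5, 8, 9, 12, 20]   # n = 6

# Odd n: the median is the single middle value of the sorted data.
median_odd = median(odd_sample)

# Even n: the median is the average of the two middle values (8 and 9 here).
median_even = median(even_sample)

print(median_odd)   # 8
print(median_even)  # 8.5
```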
Measures of Center
• Mean:
– Add and divide
– “fair share”
– Pencil activity
– Block activity
– Glass/beaker activity
Measures of Center
Applets for comparing medians and means:
http://onlinestatbook.com/stat_sim/descriptive/index.html
http://www.stat.tamu.edu/~west/ph/
http://bcs.whfreeman.com/ips4e/cat_010/applets/meanmedian.html
Measures of Center: Misconceptions
• When is the mean not a good measure of center?
• The mean doesn’t have to be a value in the data set.
– The mean number of children per household
is 2.5 children!
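One way to show when the mean is a poor measure of center is to add a single extreme value and watch the mean move while the median barely changes. The salary figures below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars (illustration only).
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]   # add one very large value

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean is pulled toward the extreme value; the median barely moves.
print(mean_before, median_before)            # 35 35
print(round(mean_after, 1), median_after)    # 95.8 36.5
```

Note that the mean after adding the outlier (about 95.8) is not a value anyone in the data set actually earns.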
Why do we need measures of Spread?
Midterms are returned and the “average”
was reported as 76 out of 100.
You received a score of 88.
How should you feel?
Measures of Spread
• Look at the data: discuss spread.
• Range = Max – Min = Spread of 100% of data
• Interquartile Range =
IQR = Q3 – Q1 = Spread of middle 50% of data
– Needed for boxplots, it is the length of the box
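As a sketch, the range and IQR can be computed for the Amoxicillin-group ages that appear later in these slides. Quartile conventions differ between textbooks and software; this example assumes the default "exclusive" method of Python's `statistics.quantiles`, which happens to agree with the hand-computed quartiles for this data set.

```python
from statistics import quantiles

# Ages from the Amoxicillin group used later in this workshop (n = 11).
ages = [8, 9, 9, 10, 10, 11, 11, 12, 14, 14, 17]

# Range: spread of 100% of the data.
data_range = max(ages) - min(ages)

# Quartiles via the default 'exclusive' method (an assumption; other
# conventions can give slightly different values).
q1, q2, q3 = quantiles(ages, n=4)

# IQR: spread of the middle 50% of the data (the length of the box).
iqr = q3 - q1

print(data_range)  # 9
print(q1, q3)      # 9.0 14.0
print(iqr)         # 5.0
```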
Measures of Spread
• Mean Absolute Deviation
– $\mathrm{MAD} = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
– Average distance of values from the mean
• Standard Deviation
– $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
– Interpretation is similar to MAD
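The two formulas can be compared side by side in a short sketch. The data values are made up for illustration; the helper name `mean_absolute_deviation` is hypothetical, while `statistics.stdev` is the standard-library sample standard deviation (divisor n − 1).

```python
from statistics import mean, stdev

def mean_absolute_deviation(values):
    """Average distance of the values from their mean (divisor n)."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)

# Made-up values for illustration only; their mean is 5.
data = [2, 4, 4, 6, 9]

mad = mean_absolute_deviation(data)   # (3 + 1 + 1 + 1 + 4) / 5
sd = stdev(data)                      # sqrt((9 + 1 + 1 + 1 + 16) / 4)

print(mad)            # 2.0
print(round(sd, 3))   # 2.646
```

Both numbers describe a typical distance from the mean; the standard deviation is somewhat larger here because squaring gives extra weight to the value farthest from the mean.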
Increasing Spread
Consider the following three data sets.
I: 20 20 20
II: 18 20 22
III: 17 20 23
(a) Which data set will have the smallest standard deviation?
(b) Which data set will have the largest standard deviation?
(c) Find the standard deviation for each data set
and check your answers to (a) and (b).
Increasing Spread ~ Instructor Side
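For the instructor side, part (c) can be checked with a few lines of Python. This is a quick sketch using the standard-library sample standard deviation; the three data sets are the ones given in the exercise.

```python
from statistics import stdev

# The three data sets from the exercise.
data_sets = {
    "I": [20, 20, 20],
    "II": [18, 20, 22],
    "III": [17, 20, 23],
}

# Sample standard deviation (divisor n - 1) for each data set.
results = {name: stdev(values) for name, values in data_sets.items()}

for name, s in results.items():
    print(name, s)
# I 0.0
# II 2.0
# III 3.0
```

All three sets share the same mean of 20, so the ordering of the standard deviations reflects only how far the values sit from that common center.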
Different Graphs, Same Data
Bin Sizes in Histograms
Applet:
http://www.stat.sc.edu/~west/javahtml/Histogram.html
Boxplots and Symmetry
Back to the Core
• CCSS.Math.Content.HSS-ID.A.4
Use mean and standard deviation of a data set to fit it to a
normal distribution and to estimate population percentages.
Recognize that there are data sets for which such a procedure
is not appropriate. Use calculators, spreadsheets, and tables
to estimate areas under the normal curve.
Normally Distributed Data
• Bell/Mound Shape
– Symmetric
– Mean ~ median
• Z-scores
– $z = \dfrac{\text{observation} - \text{mean}}{\text{standard deviation}}$
– Empirical Rule as Frame of Reference
– Take them to calculator or table to get probability
Empirical Rule
For bell-shaped histograms, approximately …
• 68% of values fall within 1 standard deviation
of mean in either direction.
• 95% of values fall within 2 standard deviations of mean in
either direction.
• 99.7% of values fall within 3 standard deviations of mean in
either direction.
A very useful frame of reference!
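The 68/95/99.7 figures are rounded areas under the normal curve. A small sketch with the standard-library `statistics.NormalDist` shows where they come from; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```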
Exam Scores
Scores on final exam have approximately a bell-shaped
distribution with a mean score of 70 points and a
standard deviation of 10 points.
Sketch a picture…
Exam Scores
Scores on final exam have approximately a bell-shaped
distribution with a mean score of 70 points and a standard
deviation of 10 points.
Suppose you scored 80 points on the exam.
How many standard deviations from the mean is your score?
Standard Score or z-score
$z = \dfrac{\text{observed value} - \text{mean}}{\text{standard deviation}}$
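Applied to the exam example (mean 70, standard deviation 10, score of 80), the calculation is a one-liner. The function name `z_score` is a hypothetical helper for this sketch.

```python
def z_score(observed, mean, standard_deviation):
    """Number of standard deviations the observed value lies from the mean."""
    return (observed - mean) / standard_deviation

# Exam example from the slides: mean 70, SD 10, your score 80.
your_z = z_score(80, 70, 10)
print(your_z)  # 1.0
```

A score of 80 is therefore one standard deviation above the mean.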
Empirical Rule (in terms of z-scores)
For bell-shaped curves, approximately…
• 68% of the values have z-scores between –1 and 1.
• 95% of the values have z-scores between –2 and 2.
• 99.7% of the values have z-scores between –3 and 3.
Exam Scores
Scores on final exam have approximately a bell-shaped
distribution with a mean score of 70 points and a standard
deviation of 10 points.
Suppose Rob’s score was 2 standard deviations above the mean.
What was Rob’s score?
What can you say about the proportion of students
who scored higher than Rob?
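A sketch of one way to work through Rob's question: convert the z-score back to a raw score, then estimate the upper tail two ways. The Empirical Rule figure is a rounded approximation; the second figure assumes the scores follow an exactly normal model, which the slides only describe as approximately bell-shaped.

```python
from statistics import NormalDist

mean_score, sd_score = 70, 10

# Rob is 2 standard deviations above the mean.
rob_score = mean_score + 2 * sd_score

# Empirical Rule: about 95% within 2 SDs, so about 5% outside,
# split evenly between the two tails.
empirical_above = (1 - 0.95) / 2

# Assuming an exactly normal model for comparison.
normal_above = 1 - NormalDist(mean_score, sd_score).cdf(rob_score)

print(rob_score)                  # 90
print(round(empirical_above, 3))  # 0.025
print(round(normal_above, 4))     # 0.0228
```

Either way, only about 2 to 2.5 percent of students scored higher than Rob.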
Check for Nonnormal Features
• Are these normal?
• Why/Why not?
Comparing Distributions
Comparing Distributions ~ Instructor Side
Are you a Good Timer?
• Quick Experiment:
– Close your eyes
– When you hear the “START”,
begin counting off seconds in your head
– When you hear the “STOP”, write down the
number you reached
Are you a Good Timer?
• Come up and Graph the results
• What do we see?
• Keep your result – we will revisit it later…
Back to the Core
Draw informal comparative inferences about two populations.
• CCSS.Math.Content.7.SP.B.3 Informally assess the degree of visual overlap of two
numerical data distributions with similar variabilities, measuring the difference
between the centers by expressing it as a multiple of a measure of variability. For
example, the mean height of players on the basketball team is 10 cm greater than
the mean height of players on the soccer team, about twice the variability (mean
absolute deviation) on either team; on a dot plot, the separation between the two
distributions of heights is noticeable.
• CCSS.Math.Content.7.SP.B.4 Use measures of center and measures of variability for
numerical data from random samples to draw informal comparative inferences
about two populations. For example, decide whether the words in a chapter of a
seventh-grade science book are generally longer than the words in a chapter of a
fourth-grade science book.
• CCSS.Math.Content.HSS-IC.B.5 Use data from a randomized experiment to
compare two treatments; use simulations to decide if differences between
parameters are significant.
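The idea in 7.SP.B.3 of expressing a difference in centers as a multiple of a measure of variability can be sketched as follows. The heights are hypothetical, chosen so that the numbers mirror the standard's example (a 10 cm difference, about twice the mean absolute deviation); they are not real team data.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)

# Hypothetical heights in cm (illustration only).
soccer = [168, 172, 178, 182]
basketball = [178, 182, 188, 192]

difference = mean(basketball) - mean(soccer)   # difference in centers
variability = mean_absolute_deviation(soccer)  # same MAD for both teams here
multiple = difference / variability

print(difference)   # 10
print(variability)  # 5.0
print(multiple)     # 2.0
```

A separation of about two MADs is the kind of gap the standard describes as noticeable on a dot plot.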
Parallel Graphs
• Use ideas from before to compare:
– Shape
– Center
– Spread
• Be sure to use the same scale!
Parallel Graphs
What do you see?
Revisit the Timer Experiment
• How else might we explore this data?
• What would be some interesting comparisons
to make?
• Website about parallel plots, you can enter
data for 2+ groups and graphs made for you:
http://www.physics.csbsju.edu/stats/box2.html
Balancing your Design
• Study collects data on which treatment group the
subject was assigned, the main response (time to
cure), and also other variables like age.
• They want to compare the responses for the two
treatment groups, but are concerned that age might
also be related to the response.
• Should check to see that age is balanced for the two
treatment groups before looking for differences in
the response by treatment.
Comparing Data: Usefulness of Randomization
Study to compare two antibiotics for treating strep throat in children, Amoxicillin and Cefadroxil.
At one center, 23 children were randomly assigned to one of two treatment groups.
One concern is that age of the child might influence the effectiveness of the antibiotics.
The ages of the children in each treatment group are given below.
How do the two groups compare with respect to age?
Give the five-number summary for each group. Comment on your results.
Amoxicillin Group (n=11): 8 9 9 10 10 11 11 12 14 14 17
Five-number summary:
Cefadroxil Group (n=12): 7 8 9 9 9 10 10 11 12 13 14 16
Five-number summary:
Make side-by-side boxplots for the antibiotic study data.
Comparing Data: Usefulness of Randomization
~ Instructor Side
Give the five-number summary for each of the two treatment groups.
Comment on your results.
Amoxicillin Group (n=11): 8 9 9 10 10 11 11 12 14 14 17
Five-number summary: min=8, Q1=9, median=11, Q3=14, max=17
Cefadroxil Group (n=12): 7 8 9 9 9 10 10 11 12 13 14 16
Five-number summary: min=7, Q1=9, median=10, Q3=12.5, max=16
[Figure: side-by-side boxplots of age for the Amoxicillin and Cefadroxil groups; vertical axis from 6 to 16]
How long? 10 minutes
How might it be done? Ask students to work through this exercise with a partner -- one person
can do it for the Amoxicillin data and the other for the Cefadroxil data. Then discuss the results.
You could have students start with all 23 children and perform the randomization themselves with
a partner. Then each group will have a different answer and the class can see the effect of
randomization overall. There may be a group for which the randomization did not do so well; randomization does not guarantee balance.
How important? Important. It reinforces the concept of why randomization is a useful technique.
A complete exercise for comparing two groups and assessing if the researchers need to control for
age differences in evaluating the effectiveness of the two antibiotics.
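The five-number summaries above can be reproduced with a short sketch. It assumes the median-of-halves rule for quartiles (the median is excluded from both halves when n is odd), which matches the values on this slide; other quartile conventions can give slightly different results. The function name `five_number_summary` is a hypothetical helper.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values below the median position
    upper = data[len(data) - half:]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Ages from the antibiotic study in the slides.
amoxicillin = [8, 9, 9, 10, 10, 11, 11, 12, 14, 14, 17]
cefadroxil = [7, 8, 9, 9, 9, 10, 10, 11, 12, 13, 14, 16]

amox_summary = five_number_summary(amoxicillin)
cef_summary = five_number_summary(cefadroxil)

print("Amoxicillin:", amox_summary)  # (8, 9, 11, 14, 17)
print("Cefadroxil:", cef_summary)    # (7, 9.0, 10.0, 12.5, 16)
```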
Which is Most Convincing?
Study 1
Study 2
Study 3
Is there a Difference?
Using Simulations
• Background of study here: how this amounts to taking many samples of size 10 from the same population
Is there a Difference?
Using Simulations
Experiment: measuring the effect
of caffeine (0 mg vs 200 mg)
and deciding whether the two doses
have differing effects on the number of finger
taps per minute (2 hours later)
Resampling Applet
• Applet for resampling:
http://lock5stat.com/statkey/
• We will see more of this on Day 3
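For readers who want to see the resampling idea in code rather than in the applet, here is a minimal randomization-test sketch. The slides do not include the caffeine data, so this example reuses the ages from the antibiotic exercise and asks whether the observed difference in mean age is larger than random assignment alone would typically produce. The function name, the number of repetitions, and the seed are all assumptions made for illustration.

```python
import random
from statistics import mean

# Ages from the antibiotic exercise earlier in the slides.
amoxicillin = [8, 9, 9, 10, 10, 11, 11, 12, 14, 14, 17]
cefadroxil = [7, 8, 9, 9, 9, 10, 10, 11, 12, 13, 14, 16]

observed = mean(amoxicillin) - mean(cefadroxil)

def randomization_p_value(group_a, group_b, reps=5000, seed=1):
    """Share of random re-assignments whose difference in means is at
    least as large (in absolute value) as the one actually observed."""
    rng = random.Random(seed)            # fixed seed so runs are repeatable
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    actual = abs(mean(group_a) - mean(group_b))
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)              # re-randomize all 23 children
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= actual:
            extreme += 1
    return extreme / reps

p_value = randomization_p_value(amoxicillin, cefadroxil)
print(round(observed, 2))  # 0.7
print(p_value)
```

A large proportion here would suggest that the observed age difference is the kind of imbalance random assignment produces routinely.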
Day 1 Wrap Up
• What surprised you today?
• What did you find interesting?
• How might you bring these ideas to your class?
• What would you change?
• Other activities/ideas to share with the group?