class intervals

Last Time
• Hypothesis Testing
– 1-sided vs. 2-sided Paradox
• Big Picture Goals
– Hypothesis Testing
– Margin of Error
– Sample Size Calculations
• Visualization
– Histograms
Administrative Matters
Midterm I, coming Tuesday, Feb. 24
• Excel notation to avoid actual calculation
– So no computers or calculators
• Bring sheet of formulas, etc.
• No blue books needed
(will just write on my printed version)
Administrative Matters
Midterm I, coming Tuesday, Feb. 24
• Material Covered:
HW 1 – HW 5
– Note: due Thursday, Feb. 19
– Will ask grader to return Mon. Feb. 23
– Can pick up in my office (Hanes 352)
– So today’s HW not included
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 261-262, 9-14, 270-276, 30-34
Approximate Reading for Next Class:
Pages 279-282, 34-43
Big Picture
• Hypothesis Testing
(Given dist’n, answer “yes-no”)
• Margin of Error
(Find dist’n, use to measure error)
• Choose Sample Size
(for given amount of error)
Need better prob. tools
Big Picture
• Margin of Error
• Choose Sample Size
Need better prob tools
Start with visualizing probability distributions
(key to “alternate representation”)
Histograms
Idea: show rectangles, where area represents:
(a) Distributions: probabilities
(b) Lists (of numbers):
# of observations
Note: will study these in parallel for a while
(several concepts apply to both)
Caution: There are variations not based on
areas, see bar graphs in text
But eye perceives area, so sensible to use it
Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
a. Prob. dist’ns:
If possible values are: x = 0, 1, … , n,
get good picture from choice:
[-½, ½), [½, 1.5), [1.5, 2.5), … , [n-½, n+½)
where [1.5, 2.5) is “all #s ≥ 1.5 and < 2.5”
(called a “half open interval”)
Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
a. Prob. dist’ns
b. Lists:
e.g. 2.3, 4.5, 4.7, 4.8, 5.1
Start with [1,3), [3,7)
• As above use half open intervals (to break ties)
• Can use anything for class intervals
• But some choices better than others…
Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
2. Find “probabilities” or “relative frequencies”
for each class
(a) Probs: use f(x) for [x-½, x+½), etc.
(b) Lists: [1,3): rel. freq. = 1/5 = 20%
[3,7): rel. freq. = 4/5 = 80%
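Step 2 for a list can be checked with a short Python sketch. This is not part of the class materials (which use Excel); the function name `relative_frequencies` is made up for illustration. It counts observations in half open intervals, so a value on a shared edge lands in the bin to its right.

```python
def relative_frequencies(data, bins):
    """Fraction of observations in each half open bin [lo, hi)."""
    counts = [0] * len(bins)
    for x in data:
        for k, (lo, hi) in enumerate(bins):
            if lo <= x < hi:  # lo is included, hi is excluded (breaks ties)
                counts[k] += 1
                break
    return [c / len(data) for c in counts]


if __name__ == "__main__":
    data = [2.3, 4.5, 4.7, 4.8, 5.1]
    bins = [(1, 3), (3, 7)]
    print(relative_frequencies(data, bins))  # [0.2, 0.8]
```

The two values match the 20% and 80% found by hand for [1,3) and [3,7).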
Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
2. Find “probabilities” or “relative frequencies”
for each class
3. Above each interval, draw rectangle where
area represents class frequency
Histograms
3. Above each interval, draw rectangle where
area represents class frequency
(a) Probs: If width = 1, then
area = width x height = height
So get area = f(x), by taking height = f(x)
E.g. Binomial Distribution
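A minimal Python sketch of a Binomial probability histogram, assuming f(x) is the usual Binomial probability of x successes in n trials. The function names and the values n = 10, p = 0.3 are illustrative choices, not taken from Class Example 5. Each bin [x-½, x+½) has width 1, so the bar height equals the area f(x).

```python
from math import comb


def binomial_pmf(n, p):
    """f(x) = C(n, x) * p^x * (1-p)^(n-x) for x = 0, 1, ..., n."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def prob_histogram(n, p):
    """(interval, height) pairs; width-1 bins, so height = area = f(x)."""
    return [((x - 0.5, x + 0.5), fx) for x, fx in enumerate(binomial_pmf(n, p))]


def text_plot(n, p, scale=50):
    """Crude text version of the bar plot: one row of '#' per class interval."""
    for (lo, hi), h in prob_histogram(n, p):
        print(f"[{lo:5.1f}, {hi:5.1f})  {h:.3f}  {'#' * round(h * scale)}")


if __name__ == "__main__":
    text_plot(10, 0.3)  # hypothetical n and p
```

Because every bar has width 1, the heights add up to 1 (total area 100%).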
Binomial Prob. Histograms
From Class Example 5
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg5.xls
Construct Prob. Histo:
• Create column of x values
• Compute f(x) values
• Make bar plot
Binomial Prob. Histograms
• Make bar plot
– “Insert” tab
– Choose “Column”
– Right Click – Select Data
(Horizontal – x’s, “Add series”, Probs)
– Resize, and move by dragging
– Delete legend
– Click and change title
– Right Click on Bars, Format Data Series:
• Border Color, Solid Line, Black
• Series Options, Gap Width = 0
• Make several, for interesting comparison
Binomial Prob. Histograms
From Class Example 5a
Compare Different p:
• Surprisingly similar “mound” shape
(will exploit this fact)
Binomial Prob. Histograms
From Class Example 5a
Compare Different p:
• Centerpoint moves as p grows
(will quantify, and use this, too)
Binomial Prob. Histograms
Important point:
Binomial shows common shape across p
Mound Shape
(like dumping dirt out of a truck)
What about n?
Binomial Prob. Histograms
From Class Example 5b
Compare Different n:
• Again very similar mound shape
(will exploit this fact)
Binomial Prob. Histograms
From Class Example 5b
Compare Different n:
• Center does not appear to move, but check axes!
(will quantify, and use this, too)
Binomial Prob. Histograms
From Class Example 5b
Compare Different n:
• But width of bump does seem to change
(will quantify, and use this, too)
Binomial Prob. Histograms
Important point:
Binomial shows common shape across p & n
Mound Shape
(like dumping dirt out of a truck)
Question for later: How can we put this to work?
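The comparisons across p and n can be explored numerically with a small Python sketch. It is only an illustration: the helper names, the "at least half as tall as the tallest bar" width measure, and the n and p values are assumptions, not the settings used in Class Examples 5a and 5b.

```python
from math import comb


def binomial_pmf(n, p):
    """Binomial probabilities f(0), ..., f(n)."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def tallest_bar(n, p):
    """x value under the tallest bar: a rough stand-in for the 'centerpoint'."""
    f = binomial_pmf(n, p)
    return max(range(n + 1), key=lambda x: f[x])


def bars_above(n, p, fraction=0.5):
    """Number of bars at least `fraction` as tall as the tallest: a crude 'width'."""
    f = binomial_pmf(n, p)
    top = max(f)
    return sum(1 for fx in f if fx >= fraction * top)


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):  # hypothetical p values, n fixed at 20
        print("p =", p, "tallest bar at x =", tallest_bar(20, p))
    for n in (10, 100):  # hypothetical n values, p fixed at 0.5
        print("n =", n, "bars in the bump:", bars_above(n, 0.5))
```

The tallest bar shifts right as p grows, and the bump covers more bars as n grows, in line with what the plots suggest.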
And now for something
(sort of) different
Recall survey from first class meeting
Display Results?
Use “bar graph”
And now for something
(sort of) different
Bar Graph from Survey, on major
[Figure: bar graph of counts by major; vertical axis 0 to 35]
• Business biggest (true for many years)
• Biology 2nd (fairly new)
• Variety of others
• Labels, not Class Intervals
• Thin bars now OK
• Study counts, not rel. freq. (not areas)
And now for something
(sort of) different
Bar Graph from Survey, on year
[Figure: bar graph of counts by year (Freshman, Sophomore, Junior, Senior, Grad); vertical axis 0 to 60]
• Distribution makes sense?
• Different color stresses different data
• Shorter & fewer labels appear as horizontal
Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
2. Find “probabilities” or “relative frequencies”
for each class
3. Above each interval, draw rectangle where
area represents class frequency
Histograms
HW: 5.21b (make & print an Excel plot)
Histograms
3. Above each interval, draw rectangle where
area represents class frequency
(a) Probs
(b) Lists:
e.g. 2.3, 4.5, 4.7, 4.8, 5.1
same e.g. as above
Histograms
Rectangles - area represents class frequency
2.3, 4.5, 4.7, 4.8, 5.1, Class Intervals [1,3), [3,7)
From above discussion (will see: not very good)
Total Frequency = 100%
So each observation is 20%
[1,3), 1 observation: 20% = Area = 2 * ht = 2 * (10% / unit)
[3,7), each of 4 observations: 20% = Area = 4 * ht = 4 * (5% / unit)
(stacking the 4 gives height 4 * 5 = 20% / unit, area 80%)
[Figure: histogram with bars over [1,3) and [3,7); horizontal axis 1 to 7, vertical axis “% per unit” with ticks 5 to 20]
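The area arithmetic for the class intervals [1,3) and [3,7) can be checked with a short Python sketch. The function name `histogram_heights` is made up for illustration; the idea is simply height = (relative frequency in %) / (bin width), so that area = relative frequency.

```python
def histogram_heights(data, bins):
    """Bar height in '% per unit' for each half open bin [lo, hi)."""
    heights = []
    for lo, hi in bins:
        count = sum(1 for x in data if lo <= x < hi)
        rel_freq_pct = 100 * count / len(data)   # area of the bar, in %
        heights.append(rel_freq_pct / (hi - lo))  # height = area / width
    return heights


if __name__ == "__main__":
    data = [2.3, 4.5, 4.7, 4.8, 5.1]
    print(histogram_heights(data, [(1, 3), (3, 7)]))  # [10.0, 20.0]
```

The bar over [1,3) has height 10% per unit (area 20%), and the bar over [3,7) has height 20% per unit (area 80%).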
Histograms
Note: This histogram hides structure in data:
2.3, 4.5, 4.7, 4.8, 5.1
[Figure: histogram over [1,3) and [3,7); horizontal axis 1 to 7, vertical axis “% per unit” with ticks 5 to 20]
• Quite sparse region
• Quite dense region
• Endpoints way off
General Major Challenge:
Choice of Class Intervals
Histograms
Try for “better” choice:
2.3, 4.5, 4.7, 4.8, 5.1
Class Intervals [2,4), [4,5), [5,6)
Now build histogram as above (areas):
[Figure: histogram over [2,4), [4,5), [5,6); horizontal axis 1 to 7, vertical axis “% per unit” with ticks at 30 and 60]
Note: much better visual impression
Histogram better reflects “structure in data”
Histograms
General Comments:
• Total area under histogram is 100%
• So label vertical axis as “% per unit”
• Synonym for “Class Interval” is “bin”
(think of relative frequency as counting
observations that “fall into bins”)
• Choice of bins is critical
• Common Simplification: Equally spaced
• But still have choice of binwidth
(also very challenging)
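A minimal Python sketch of equally spaced bins, using the small list 2.3, 4.5, 4.7, 4.8, 5.1 from earlier. The function names and the `start` and `width` parameters are illustrative assumptions. It shows that the total area stays at 100% whatever the binwidth, while the bar heights (and the visual impression) change.

```python
from math import floor


def equal_width_histogram(data, start, width):
    """Bars (lo, hi, height in % per unit) for bins [start + k*width, start + (k+1)*width)."""
    n = len(data)
    # Index k of the bin that holds each observation.
    indices = [floor((x - start) / width) for x in data]
    bars = []
    for k in range(min(indices), max(indices) + 1):  # keep empty bins in between
        count = indices.count(k)
        height = 100 * count / n / width
        bars.append((start + k * width, start + (k + 1) * width, height))
    return bars


def total_area(bars):
    """Sum of width * height over all bars, in %."""
    return sum((hi - lo) * height for lo, hi, height in bars)


if __name__ == "__main__":
    data = [2.3, 4.5, 4.7, 4.8, 5.1]
    for start, width in ((2, 1), (1, 2)):  # two hypothetical binwidth choices
        bars = equal_width_histogram(data, start, width)
        print(width, bars, total_area(bars))
```

With binwidth 1 the empty bin [3,4) separates 2.3 from the cluster near 5; with binwidth 2 that gap disappears.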
Histograms
HW:
C15 For the data:
0.8, 2.1, 2.6, 0.9, 2.2, 0.8, 2.2, 0.9
a) Make histograms using the bins:
i. [0,1), [1,2), [2,3)
ii. [0.5,1.5), [1.5,2.5), [2.5,3.5)
iii. [0,1), [1,3)
(Interesting to look at differences)
b) Why are bins [0,2), [1,3) inappropriate here?
c) Why are bins [1,2), [2,5) inappropriate here?
Histogram Real Data Example
Buffalo Snow Fall Data
• Annual totals (in inches)
• For Buffalo, N.Y.
• 63 years, ranging from ~30 to ~120
• A lot of snow, due to “lake effect”
• Any patterns in data?
Histogram Real Data Example
Buffalo Snow Fall Data
• Data Available in Class Example 6
• Left hand column of spreadsheet:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg6.xls
• Now do histogram analysis
• Using Excel
Histogram Real Data Example
Buffalo Snow Fall Data – Excel Default Histo
• Data Tab
• Push Data Analysis Button
• Pulls up:
• Choose:
Histogram Real Data Example
Buffalo Snow Fall Data – Excel Default Histo
• Pulls Up:
• Link input data
• Empty for default
• Choose here
• And location
• Get Histo Plot
Histogram Real Data Example
Buffalo Snow Fall Data – Excel Default Histo
• Manually Chart Result???
• Twiddle Output (similar to above):
• Delete Series Legend
• Format Data Series – Gap Width → 0
• Format Data Series – Border Color → Black
• Chart Tools – Design – Choose Titled
• Type in Title
Histogram Real Data Example
Buffalo Snow Fall Data – Excel Default Histo
• Result:
[Figure: histogram titled “Excel Default Bins”; vertical axis counts 0 to 18]
• Unround numbers for bin edges
• Hard to interpret
• Data centered around 90
• Most data between 50 and 130
• Asymmetric distribution
Histogram Real Data Example
Buffalo Snow Fall Data – Smaller binwidth
[Figure: histogram titled “Manual Bins - Small Binwidth”; vertical axis counts 0 to 8]
• Chosen by me
• Binwidth = 5, << ~13 from EXCEL default
• Nicer edge numbers
• Data centered around 84 (now more precise)
• Bar graph rougher (fewer points in each bin)
• Suggests 3 main groups
(called “modes” or “clusters”)
(can’t see this above: bin width is important)
Histogram Real Data Example
Buffalo Snow Fall Data – Larger binwidth
[Figure: histogram titled “Manual Bins - Large Binwidth”; 5 bins, vertical axis counts 0 to 30]
• Chosen by me
• Binwidth = 30, >> ~13 from EXCEL default
• Bar graph is “smooth”
(since many points in each bin)
• Only one mode (cluster)???
• Quite symmetric?
(different from above: bin width is important)
Histogram Real Data Example
HW:
1.28 [data in ta01_005.xls] ((c) loses bump near 50)
1.36 [data in ex01_036.xls] ((a) 4 (b) 2 (c) 1)
1.37
1.39
Research Corner
Histo Bin Width
(serious issue)
Interesting Data Set: Hidalgo Stamps
• Famous among postage stamp collectors
• Printed in Mexico, 1800’s, over ~70 years
• Very different paper thicknesses…
• How many paper sources?
• Unknown, since records are lost
• Study histogram of stamp thicknesses
Research Corner
Movie over binwidth
Shows very wide range
(much different visual impressions)
How many bumps?
Answer published in literature: 2, 3, 5, 7, 10
Very challenging question
Research Corner
How many bumps?
Believe in 2? 3? 5? 7? 10?
Big Picture
• Margin of Error
• Choose Sample Size
Need better prob tools
Start with visualizing probability distributions,
Next exploit constant shape property of Binom’l
Centerpoint feels p
Spread feels n
Now quantify these ideas, to put them to work
Notions of Center
Will later study “notions of spread”
Notions of Center
Textbook: Sections 4.4 and 1.2
Recall parallel development:
(a) Probability Distributions
(b) Lists of Numbers (study 1st, since easier)
Notions of Center
(b) Lists of Numbers
“Average” or “Mean” of $x_1, x_2, \ldots, x_n$:
$\text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}$
($\bar{x}$ is common notation)
(as before) Greek sigma $\sum$ for sum:
$\sum_{i=1}^{n}$ means “sum over i = 1,…,n”
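As a quick illustration, the mean of a list is the sum divided by the number of entries. The Python sketch below is not part of the class materials (which use the Excel function AVERAGE); the function name `mean` and the reuse of the earlier five-number list are choices made here for illustration.

```python
def mean(xs):
    """Sum over i = 1, ..., n of x_i, divided by n."""
    return sum(xs) / len(xs)


if __name__ == "__main__":
    print(mean([2.3, 4.5, 4.7, 4.8, 5.1]))  # about 4.28
```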
Notions of Center
HW:
C16: for the data of 1.57, find the mean using
the Excel function AVERAGE
(10.03)
Notions of Center
Generalization of Mean:
“Weighted Average”
Idea: allow non-equal weights on the $x_i$s:
$\sum_{i=1}^{n} w_i x_i$
where $w_i \ge 0$, $\sum_{i} w_i = 1$
E.g.: ordinary mean has each $w_i = \frac{1}{n}$
(constant weights)
Notions of Center
Generalization of Mean:
“Weighted Average”
Intuition: Corresponds to finding balance
point of weights on number line
[Figure: weights sitting at x1, x2, x3 on a number line]
Notions of Center
HW:
C17: Calculate (and think about as “balance point”) weighted average of 1, 2, 3, 10 for the weights:
a. ¼, ¼, ¼, ¼ (ordinary avg.) (4)
b. 0.1, 0.1, 0.1, 0.7 (more on 10) (7.6)
c. 0.3, 0.3, 0.3, 0.1 (less on 10) (2.8)
d. 1/3, 1/3, 1/3, 0 (none on 10) (2)
e. 0, 1, 0, 0 (all on 2) (2)
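The weighted average can be sketched in a few lines of Python, assuming weights that are non-negative and sum to 1. The function name `weighted_average` and the tolerance `tol` are illustrative choices. The numbers are those of HW C17, so the printed values can be compared with the answers given there.

```python
def weighted_average(xs, ws, tol=1e-9):
    """Sum of w_i * x_i, for weights w_i >= 0 that add up to 1."""
    if len(xs) != len(ws):
        raise ValueError("need one weight per value")
    if any(w < 0 for w in ws):
        raise ValueError("weights must be non-negative")
    if abs(sum(ws) - 1) > tol:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(ws, xs))


if __name__ == "__main__":
    xs = [1, 2, 3, 10]
    for ws in ([0.25] * 4, [0.1, 0.1, 0.1, 0.7], [0.3, 0.3, 0.3, 0.1],
               [1 / 3, 1 / 3, 1 / 3, 0], [0, 1, 0, 0]):
        print(ws, round(weighted_average(xs, ws), 6))
```

Putting more weight on 10 pulls the balance point toward 10; putting none on it leaves the ordinary average of 1, 2, 3.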