Last Time
• Hypothesis Testing
  – 1-sided vs. 2-sided Paradox
• Big Picture Goals
  – Hypothesis Testing
  – Margin of Error
  – Sample Size Calculations
• Visualization
  – Histograms

Administrative Matters
Midterm I, coming Tuesday, Feb. 24
• Excel notation to avoid actual calculation
  – So no computers or calculators
• Bring sheet of formulas, etc.
• No blue books needed (will just write on my printed version)
• Material Covered: HW 1 – HW 5
  – Note: due Thursday, Feb. 19
  – Will ask grader to return Mon. Feb. 23
  – Can pick up in my office (Hanes 352)
  – So today’s HW not included

Reading In Textbook
Approximate Reading for Today’s Material: Pages 261-262, 9-14, 270-276, 30-34
Approximate Reading for Next Class: Pages 279-282, 34-43

Big Picture
• Hypothesis Testing (Given dist’n, answer “yes-no”)
• Margin of Error (Find dist’n, use to measure error)
• Choose Sample Size (for given amount of error)
Need better prob. tools
Start with visualizing probability distributions (key to “alternate representation”)

Histograms
Idea: show rectangles, where area represents:
(a) Distributions: probabilities
(b) Lists (of numbers): # of observations
Note: will study these in parallel for a while (several concepts apply to both)
Caution: There are variations not based on areas, see bar graphs in text
But eye perceives area, so sensible to use it

Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
   a. Prob. dist’ns: If possible values are x = 0, 1, … , n, get good picture from choice:
      [-½, ½), [½, 1.5), [1.5, 2.5), … , [n-½, n+½)
      where [1.5, 2.5) is “all #s ≥ 1.5 and < 2.5” (called a “half open interval”)
   b. Lists: e.g. 2.3, 4.5, 4.7, 4.8, 5.1
      Start with [1,3), [3,7)
      • As above use half open intervals (to break ties)
      • Can use anything for class intervals
      • But some choices better than others…
2. Find “probabilities” or “relative frequencies” for each class
   (a) Probs: use f(x) for [x-½, x+½), etc.
   (b) Lists: [1,3): rel. freq. = 1/5 = 20%
              [3,7): rel. freq. = 4/5 = 80%
3. Above each interval, draw rectangle where area represents class frequency
   (a) Probs: If width = 1, then area = width x height = height
       So get area = f(x), by taking height = f(x)
       E.g. Binomial Distribution

Binomial Prob. Histograms
From Class Example 5
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg5.xls
Construct Prob. Histo:
• Create column of x values
• Compute f(x) values
• Make bar plot
  – “Insert” tab
  – Choose “Column”
  – Right Click – Select Data (Horizontal – x’s, “Add series”, Probs)
  – Resize, and move by dragging
  – Delete legend
  – Click and change title
  – Right Click on Bars, Format Data Series:
    • Border Color, Solid Line, Black
    • Series Options, Gap Width = 0
• Make several, for interesting comparison

Binomial Prob. Histograms
From Class Example 5a
Compare Different p:
• Surprisingly similar “mound” shape (will exploit this fact)
• Centerpoint moves as p grows (will quantify, and use this, too)

Binomial Prob. Histograms
Important point: Binomial shows common shape across p
Mound Shape (like dumping dirt out of a truck)
What about n?

Binomial Prob. Histograms
From Class Example 5b
Compare Different n:
• Again very similar mound shape (will exploit this fact)
• Center does not appear to move, but check axes! (will quantify, and use this, too)
• But width of bump does seem to change (will quantify, and use this, too)

Binomial Prob. Histograms
Important point: Binomial shows common shape across p & n
Mound Shape (like dumping dirt out of a truck)
Question for later: How can we put this to work?

And now for something (sort of) different
Recall survey from first class meeting
Display Results?
Use “bar graph”

And now for something (sort of) different
Bar Graph from Survey, on major
[Figure: bar graph of counts by major; vertical axis 0 to 35]
• Business biggest (true for many years)
• Biology 2nd (fairly new)
• Variety of others: Welcome!
• Labels, not Class Intervals
• Thin bars now OK
• Study counts, not rel. freq. (not areas)

And now for something (sort of) different
Bar Graph from Survey, on year
[Figure: bar graph of counts by year (Freshman, Sophomore, Junior, Senior, Grad); vertical axis 0 to 60]
• Distribution makes sense?
• Different color stresses different data
• Shorter & fewer labels appear as horizontal

Histograms
Steps for Constructing Histograms:
1. Pick class intervals that contain full dist’n
2. Find “probabilities” or “relative frequencies” for each class
3. Above each interval, draw rectangle where area represents class frequency

Histograms
HW: 5.21b (make & print an Excel plot)

Histograms
3. Above each interval, draw rectangle where area represents class frequency
   (a) Probs
   (b) Lists: e.g. 2.3, 4.5, 4.7, 4.8, 5.1 (same e.g. as above)

Histograms
Rectangles - area represents class frequency
2.3, 4.5, 4.7, 4.8, 5.1, Class Intervals [1,3), [3,7)
From above discussion (will see: not very good)
[Figure: histogram; horizontal axis 1 to 7, vertical axis “% per unit” with ticks 5, 10, 15, 20]
• Total Frequency = 100%, so each of the 5 observations is 20%
• In [1,3) (width 2, 1 observation): 20% = Area = 2 * ht = 2 * (10% / unit)
• In [3,7) (width 4, 4 observations): for each observation, 20% = Area = 4 * ht = 4 * (5% / unit), so the 4 observations stack to a height of 20% / unit

Histograms
Note: This histogram hides structure in data: 2.3, 4.5, 4.7, 4.8, 5.1
• Quite sparse region
• Quite dense region
• Endpoints way off
General Major Challenge: Choice of Class Intervals

Histograms
Try for “better” choice: 2.3, 4.5, 4.7, 4.8, 5.1
[2,4) [4,5) [5,6)
Now build histogram as above (areas)
[Figure: histogram; horizontal axis 1 to 7, vertical axis “% per unit” with ticks 30, 60]
Note: much better visual impression
Histogram better reflects “structure in data”

Histograms
General Comments:
• Total area under histogram is 100%
• So label vertical axis as “% per unit”
• Synonym for “Class Interval” is “bin” (think of relative frequency as counting observations that “fall into bins”)
• Choice of bins is critical
• Common Simplification: Equally spaced
• But still have choice of binwidth (also very challenging)

Histograms
HW: C15 For the data: 0.8, 2.1, 2.6, 0.9, 2.2, 0.8, 2.2, 0.9
a) Make histograms using the bins:
   i. [0,1), [1,2), [2,3)
   ii. [0.5,1.5), [1.5,2.5), [2.5,3.5)
   iii. [0,1), [1,3)
   (Interesting to look at differences)
b) Why are bins [0,2), [1,3) inappropriate here?
c) Why are bins [1,2), [2,5) inappropriate here?

Histogram Real Data Example
Buffalo Snow Fall Data
• Annual totals (in inches)
• For Buffalo, N.Y.
• 63 years, ranging from ~30 to ~120
• A lot of snow, due to “lake effect”
• Any patterns in data?
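Going back to the small list used earlier (2.3, 4.5, 4.7, 4.8, 5.1), the area rule can be sketched in a few lines of Python. This is only an illustration and not part of the class Excel examples: the function name `histogram_heights` is hypothetical, and bins are passed as half-open intervals [left, right).

```python
def histogram_heights(data, bins):
    """Return bar heights in '% per unit' for half-open bins [left, right).

    Each bar's area (width * height) equals the relative frequency of
    its bin, so the areas add up to 100% when the bins cover all data.
    """
    n = len(data)
    heights = []
    for left, right in bins:
        # Half-open interval: left endpoint included, right endpoint excluded.
        count = sum(1 for x in data if left <= x < right)
        rel_freq = 100 * count / n  # percent of observations in this bin
        heights.append(rel_freq / (right - left))
    return heights


data = [2.3, 4.5, 4.7, 4.8, 5.1]
coarse = histogram_heights(data, [(1, 3), (3, 7)])
better = histogram_heights(data, [(2, 4), (4, 5), (5, 6)])
print(coarse)
print(better)
```

With bins [1,3), [3,7) the heights are 10 and 20 % per unit; with [2,4), [4,5), [5,6) they are 10, 60 and 20 % per unit, which agrees with the two pictures in the small-list example.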
Histogram Real Data Example
Buffalo Snow Fall Data
• Data Available in Class Example 6
• Left hand column of spreadsheet:
http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg6.xls
• Now do histogram analysis
• Using Excel

Histogram Real Data Example
Buffalo Snow Fall Data – Excel Default Histo
• Data Tab
• Push Data Analysis Button
• Pulls up: [dialog shown on slide]
• Choose: [selection shown on slide]
• Pulls Up: [dialog shown on slide, with callouts]
  – Link input data
  – Empty for default
  – Choose here
  – And location
  – Get Histo Plot
• Manually Chart Result???
• Twiddle Output (similar to above):
  – Delete Series Legend
  – Format Data Series – Gap Width 0
  – Format Data Series – Border Color Black
  – Chart Tools – Design – Choose Titled
  – Type in Title

Histogram Real Data Example
Buffalo Snow Fall Data – Excel Default Histo
Result:
[Figure: histogram titled “Excel Default Bins”; vertical axis 0 to 18]
• Unround numbers for bin edges
• Hard to interpret
• Data centered around 90
• Most data between 50 and 130
• Asymmetric Distribution

Histogram Real Data Example
Buffalo Snow Fall Data – Smaller binwidth
[Figure: histogram titled “Manual bins, Small Binwidth”; vertical axis 0 to 8]
• Chosen by me
• Binwidth = 5, << ~13 from EXCEL default
• Nicer edge numbers
• Data centered around 84 (now more precise)
• Bar graph rougher (fewer points in each bin)
• Suggests 3 main groups (called “modes” or “clusters”)
  (can’t see this above: bin width is important)

Histogram Real Data Example
Buffalo Snow Fall Data – Larger binwidth
[Figure: histogram titled “Manual Bins - Large Binwidth”; vertical axis 0 to 30, bins labeled 1 to 5]
• Chosen by me
• Binwidth = 30, >> ~13 from EXCEL default
• Bar graph is “smooth” (since many points in each bin)
• Only one mode (cluster)???
• Quite symmetric?
(different from above: bin width is important)

Histogram Real Data Example
HW:
1.28 [data in ta01_005.xls] ((c) loses bump near 50)
1.36 [data in ex01_036.xls] ((a) 4 (b) 2 (c) 1)
1.37
1.39

Research Corner
Histo Bin Width (serious issue)
Interesting Data Set: Hidalgo Stamps
• Famous among postage stamp collectors
• Printed in Mexico, 1800’s, over ~70 years
• Very different paper thicknesses…
• How many paper sources?
• Unknown, since records are lost
• Study histogram of stamp thicknesses
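How strongly the apparent number of bumps depends on binwidth can be sketched on the small list from the histogram discussion (the stamp thicknesses themselves are not reproduced in these notes). The helper names `bin_counts` and `count_bumps`, the bin starting points, and the simple definition of a "bump" (a bin strictly taller than both neighbors) are assumptions made for this illustration; a flat-topped bump would not be counted by this rule.

```python
import math


def bin_counts(data, start, width):
    """Count observations in equal-width half-open bins.

    Bin k is [start + k*width, start + (k+1)*width).
    Assumes every observation is >= start.
    """
    n_bins = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return counts


def count_bumps(counts):
    """Count bins strictly taller than both neighbors (outside edges count as 0)."""
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


data = [2.3, 4.5, 4.7, 4.8, 5.1]
wide = bin_counts(data, start=1, width=2)      # bins [1,3), [3,5), [5,7)
narrow = bin_counts(data, start=2, width=0.5)  # bins [2,2.5), ..., [5,5.5)
print(wide, count_bumps(wide))
print(narrow, count_bumps(narrow))
```

On this list the wide bins show a single bump, while the narrow bins separate the isolated 2.3 from the cluster near 4.5 to 5 and show two. The same data give different answers to "how many bumps?" depending only on the binwidth.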
Research Corner
Movie over binwidth
Shows very wide range (much different visual impressions)
How many bumps?
Answer published in literature: 2, 3, 5, 7, 10
Very challenging question

Research Corner
How many bumps?
Believe in 2? In 3? In 5? In 7? In 10?

Big Picture
• Margin of Error
• Choose Sample Size
Need better prob tools
Start with visualizing probability distributions,
Next exploit constant shape property of Binom’l
• Centerpoint feels p
• Spread feels n
Now quantify these ideas, to put them to work

Notions of Center
Will later study “notions of spread”
Textbook: Sections 4.4 and 1.2
Recall parallel development:
(a) Probability Distributions
(b) Lists of Numbers (study 1st, since easier)

Notions of Center
(b) Lists of Numbers
“Average” or “Mean” of $x_1, x_2, \ldots, x_n$:
$\text{Mean} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}$
• $\bar{x}$ is common notation
• Greek sigma for sum (as before) means “sum over i = 1,…,n”

Notions of Center
HW: C16: for the data of 1.57, find the mean using the Excel function AVERAGE (10.03)

Notions of Center
Generalization of Mean: “Weighted Average”
Idea: allow non-equal weights on the $x_i$’s:
$\sum_{i=1}^{n} w_i x_i$, where $w_i \ge 0$ and $\sum_{i=1}^{n} w_i = 1$
E.g.: ordinary mean has each $w_i = \frac{1}{n}$ (constant weights)
Intuition: Corresponds to finding balance point of weights on number line
[Figure: number line with weights placed at x1, x2, x3]

Notions of Center
HW: C17: Calculate (and think about as “balance point”) weighted average of 1, 2, 3, 10 for the weights:
a. ¼, ¼, ¼, ¼ (ordinary avg.) (4)
b. 0.1, 0.1, 0.1, 0.7 (more on 10) (7.6)
c. 0.3, 0.3, 0.3, 0.1 (less on 10) (2.8)
d. 1/3, 1/3, 1/3, 0 (none on 10) (2)
e. 0, 1, 0, 0 (all on 2) (2)
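The weighted average can also be checked outside Excel. The following is a minimal Python sketch; the function name `weighted_average` and the tolerance used to test that the weights sum to 1 are choices made for this illustration only. It uses the values 1, 2, 3, 10 and the weight sets from C17.

```python
def weighted_average(values, weights):
    """Return sum of w_i * x_i, requiring w_i >= 0 and sum of w_i = 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:  # small tolerance for rounding error
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))


values = [1, 2, 3, 10]
weight_sets = {
    "a": [0.25, 0.25, 0.25, 0.25],  # ordinary average
    "b": [0.1, 0.1, 0.1, 0.7],      # more weight on 10
    "c": [0.3, 0.3, 0.3, 0.1],      # less weight on 10
    "d": [1 / 3, 1 / 3, 1 / 3, 0],  # no weight on 10
    "e": [0, 1, 0, 0],              # all weight on 2
}
results = {k: weighted_average(values, w) for k, w in weight_sets.items()}
for label, value in results.items():
    print(label, round(value, 4))
```

Up to rounding, the results agree with the answers listed in C17 (4, 7.6, 2.8, 2, 2). Moving weight toward 10 pulls the balance point to the right, and removing it pulls the balance point back toward the other three values.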