Course Notes: Probability and Statistics – Mrs. Leahy
Unit 1: Organizing Data

Day 1: Collecting & Organizing Data; Frequency Distributions; Histograms & Relative Frequency Histograms

A. Collecting & Organizing Data

______________________ is the study of how to collect, organize, analyze, and interpret numerical information from data.

This data can represent information that is:
“Qualitative”/ ____________________________ or “Quantitative”/ __________________________.
Quantitative data MUST HAVE ___________.

Example A1: BOX in data that is “Qualitative”. STAR beside data that is “Quantitative”.
A survey is conducted at a local library collecting the following data from patrons:
  Age
  Marital Status
  Gender
  Distance you live from library
  Number of Children in Household
  Favorite Book Genre

Example A2: Yellow Textbook pg 13 “Just Checking”

Goal in this chapter:
1. Examine data & describe the distribution of the data
2. Choose the best way to organize/display the data
3. Create (by hand) the most common data displays
4. Read/Interpret data displays

B. Histograms & Frequency Tables

Have a large set of quantitative data? Organize into smaller intervals called ________________.
A histogram uses ______________ to show __________________________ of classes.
A relative frequency histogram uses bars to show the _____________________ of cases in each class.

Basic Construction:
[Diagram: blank histogram frame with TITLE at the top, "Freq/Rel Freq./Etc." on the vertical axis, and "Class Boundaries" on the horizontal axis]

Characteristics of a Histogram:
1. Used for high volume quantitative data
2. Bars equal width
3. Bars touch
4. Class limits/class boundaries on x-axis
5. Class frequency/relative frequency on y-axis
6. Classes cannot overlap or be open-ended
7. Use 4-15 classes. (some sources say 5-15)

Example B1: Textbook page 45

Example B2:
This histogram has _____ CLASSES.
The CLASS BOUNDARIES of this bar are _______ to ______.
The FREQUENCY of this class is ___________.

OK. Sounds good. Now how do we make a histogram? I knew you were going to ask that. Here we go…

Example B3: Time on Hold, in minutes
An irate customer called the Dollar Day Mail Order Company 40 times during the last two weeks to see why his order had not arrived. Each time he called, he recorded the length of time he was put “on hold” before being allowed to talk to a customer service representative.

   1   5   5   6   7   4   8   7   6   5
   5   6   7   6   6   5   8   9   9  10
   7   8  11   2   4   6   5  12  13   6
   3   7   8   8   9   9  10   8   9   9

We are going to use five classes to organize our data. (The number of classes will be given to you for homework.) We need to determine how big each interval should be. This is called the “Class width.”

Step 1: Determine Class Width
1. Compute: Class width = (Largest data value − Smallest data value) / Number of Classes
   In our example:
2. ROUND UP to next whole number.

Step 2: Determine the Data Range for each class: The Class Limits
Start with Lower Class Limits (LL) (the lowest value in the data class).
Lowest data value = Lowest Class Limit. Add Class Width to get next lowest limit, etc.
Fill in Upper Class Limits (UL) (the highest value that fits in the class).

Step 3: Determine the Class Boundaries
Upper Class Boundary = Upper limit + 0.5
Lower Class Boundary = Lower limit – 0.5

Step 4: Determine the Frequency of each class
Class Frequency = # of data values in class (count)

Step 5: Find the Class Midpoint
Class Midpoint = Average of Lower and Upper Limits

Class Limits | Class Boundaries | Tally (optional) | Frequency | Cumulative Frequency | Midpoints

A table (like the one we just made) that shows the classes and corresponding frequencies is called a ______________________________________ or _____________________________________

Example B4: Use the frequency table from Example B3 to construct a histogram.
Step 1: With a ruler, draw a vertical and horizontal axis. Give vertical axis appropriate scale for frequency. Give horizontal axis appropriate scale for Class Boundaries.
Step 2: Draw in the bars
Step 3: Give Histogram a title and label the axes

Relative Frequency Table
The relative frequency of a class is the proportion (or percentage) of all data values in that class. It helps us compare the amount of data in each class.

Step 1: Fill in your class limits and frequencies (from our last example)
Step 2: Compute the Relative Frequency
1. Find the total frequency (sum)
2. Rel. Frequency = f / n = class freq. / total freq.

***NOTATION*** ∑ = “sum”; n = ∑ f = total frequency

Class Limits    Frequency    Relative Frequency
1 – 3           3
4 – 6           15
7 – 9           17
10 – 12         4
13 – 15         1
Total:          ∑ f =        ∑ rel. freq. =

Example B5: Given a data set of numbers {1, 7, 8, 4, 4, 5, 6, 3, 8, 7, 1, 1, 8, 1} and using four classes
a) Find the class width
b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, and relative frequencies
c) Make a histogram.
d) Make a relative frequency histogram.

Class Limits | Class Boundaries | Frequency | Relative Frequency | Midpoints

Day 2: Distribution Shapes; Frequency Polygons; Dot Plots

Recall a __________________________ can be used to represent a Frequency Distribution.
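As a recap of Day 1, the frequency-table steps from Example B3 (class width, class limits, class boundaries, frequency, midpoint) and the relative frequency f / n can be sketched in Python. This is only an illustration under stated assumptions: the data are whole numbers, the function name `frequency_table` and the dictionary keys are made up for this sketch, and "round up to the next whole number" is read as "always go to the next whole number, even when the quotient is already whole" so that the last class reaches the largest value.

```python
# Sketch of the frequency-table steps from Example B3 (whole-number data assumed).
# Function and key names are illustrative; they do not come from the course notes.

hold_times = [
    1, 5, 5, 6, 7, 4, 8, 7, 6, 5,
    5, 6, 7, 6, 6, 5, 8, 9, 9, 10,
    7, 8, 11, 2, 4, 6, 5, 12, 13, 6,
    3, 7, 8, 8, 9, 9, 10, 8, 9, 9,
]


def frequency_table(data, num_classes):
    """Return (class_width, rows) for whole-number data."""
    smallest, largest = min(data), max(data)
    # Step 1: (largest - smallest) / number of classes, raised to the next
    # whole number. Assumption: an exact whole-number quotient is raised too,
    # so the last class always covers the largest value.
    width = (largest - smallest) // num_classes + 1

    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = smallest + i * width   # Step 2: lower class limit
        upper = lower + width - 1      # Step 2: upper class limit
        freq = sum(1 for x in data if lower <= x <= upper)  # Step 4: count
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # Step 3
            "midpoint": (lower + upper) / 2,           # Step 5
            "frequency": freq,
            "cumulative": cumulative,
            "relative": freq / len(data),              # f / n
        })
    return width, rows


width, rows = frequency_table(hold_times, 5)
print("class width:", width)  # class width: 3
for row in rows:
    print(row["limits"], row["boundaries"], row["midpoint"],
          row["frequency"], row["cumulative"], row["relative"])
```

For the hold-time data this reproduces the frequencies 3, 15, 17, 4, 1 shown in the relative frequency table, which is a quick way to check a hand tally. The same function can be called on the Example B5 data with four classes.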
A: Distribution Shapes

[Figure: example histograms labeled Symmetric, Uniform, Bimodal, Skewed Left, Skewed Right]
Copyright © Cengage Learning. All rights reserved.

Symmetric/Mound/Bell Shaped: two sides are symmetrical with respect to a vertical line that goes through the middle of the graph
Uniform: every class has the same frequency
Bimodal: histogram shows _______ peaks separated by at least one shorter bar
Unimodal: histogram shows ______ peak
Skewed Left: More bars on the left side of the peak… “tail” on the left is longer than on the right
Skewed Right: More bars on the right side of the peak… “tail” on the right is longer than on the left

Often a ______________________ distribution is caused by collecting data from a group of individuals that could have been classified better into two separate groups for that particular data.
Example: heights from a mixed group of men and women

Significant gaps between bars at the left or right can be caused by _______________________. These are values that are significantly higher or lower than the rest of your data.
Example: salaries of employees at a major corporation where the CEO makes three times as much as the rest of the workers.

Example A1: Look at Distributions – Textbook pages 50-51
Example A2: Name that Distribution! PowerPoint

B: Dot Plots
(Similar to a histogram)
horizontal axis = shows an appropriate scale for the quantitative data
vertically = one dot per occurrence of a particular value

Example B1: A handful of pennies were examined and the year of minting was recorded. The information is recorded on the following dotplot.
In which year were the most pennies minted?
How many pennies were minted after 1996?
How many pennies were there total in this handful?
Describe (using a year range) when you think the majority of the pennies in this handful were minted.

A dotplot can be created like this too:
A dotplot can be used to tell a story, much like a histogram.

Example B2: Create a dotplot for the following data set.
12, 15, 16, 16, 14, 12, 14, 18, 19, 14, 15, 18, 16, 13, 15, 16, 13, 10, 18, 16
How many numbers? ______
Lowest? _______ Start with: _______
Highest? _______ End with: _______

Example B3: Create a dotplot for the data.

C: Frequency Polygon
Sometimes, we are interested in a frequency polygon.
Start with your histogram data. Instead of a bar, use a line graph with a dot at the midpoint of the class.

Example C1: Construct a frequency polygon for the following data.
Example C2: Construct a cumulative frequency polygon for the following data.

Day 3: Stem-and-Leaf Displays

A. Exploratory Data Analysis (EDA)
Exploratory Data Analysis techniques are used to explore a data set, to detect patterns and extreme data values, to raise new questions, or to pursue leads in many directions.
Useful when data has been gathered for ______________________________.
For example: Ages of Applicants of Graduate Programs
Key: 1 | 2 = 12

B. Stem-and-Leaf Display
Used for _________________________ data. Best with small to medium size sets.
A stem-and-leaf display is used to ___________ order and arrange data into groups.
The _____________ are aligned vertically from smallest to largest. A vertical line is drawn to the right of the stems.
The ____________ with the same stem are placed in the same row as the stem, arranged in ___________________ order.
A label (Key) is used to indicate the magnitude of the numbers in the display.
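The construction just described (stems listed smallest to largest with none omitted, leaves in increasing order, and a key) can be sketched in Python. This is a minimal sketch, assuming non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf; the function name `stem_and_leaf` is hypothetical, and the sample values are the ten temperatures listed in Example B1.

```python
# Sketch of a stem-and-leaf display for non-negative whole numbers.
# Assumption: stem = tens (value // 10), leaf = ones digit (value % 10).
# The function name is illustrative, not from the course notes.

def stem_and_leaf(data):
    """Return the display as a list of text lines, one per stem."""
    leaves_by_stem = {}
    for value in sorted(data):  # sorting puts leaves in increasing order
        leaves_by_stem.setdefault(value // 10, []).append(value % 10)

    lines = []
    # List every stem from smallest to largest, with no omissions,
    # even if a stem has no leaves.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


temperatures = [56, 44, 62, 36, 39, 53, 50, 65, 45, 40]
display = stem_and_leaf(temperatures)
print("Key: 3 | 6 = 36")
for line in display:
    print(line)
# 3 | 6 9
# 4 | 0 4 5
# 5 | 0 3 6
# 6 | 2 5
```

Three-digit data such as 106 or 112 work unchanged, because `value // 10` then yields a two-digit stem.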
Example B1: A study on peanut butter reported the following optimal consumption temperatures for various brands:
56 44 62 36 39 53 50 65 45 40
Make a stem-and-leaf display for this data.
Step 1: Identify appropriate stem values. List smallest to largest. No omissions!
Step 2: List leaves with corresponding stems, in numeric order smallest to largest!
Step 3: Include Key and Title

Example B2: For the following data, use the first two digits as the stem to make a stem-and-leaf display.
106 94 112 96 89 113 90 85 85 100

Example B3: Describe the distributions of the following stem/leaf displays.

C: Stem & Leaf Special Cases

Splitting the Stems
idea: for lots of data, use TWO (or more) intervals instead of one for the stem.
Consider: 0 | 0 1 2 3 3 4 5 5 7 7 8 9 9 9
Using only one stem “0” would give us an overcrowded graph. Instead of using an interval of 0-9, maybe we could use TWO intervals.
source: http://learnalgebrafaster.com/split-stem-and-leaf-plot/

Example C1: Make a stem-and-leaf display using
a) Two intervals: 0-4, 5-9
b) Five intervals: 0-1, 2-3, 4-5, 6-7, 8-9

Example C2: Britney is a swimmer training for a competition. The number of 50-meter laps she swam each day for 30 days is as follows:
a) Prepare a stem-and-leaf plot.
b) Redraw the stem-and-leaf plot using two unit intervals.
c) Make a comment on what these plots show.

Back-To-Back Stem-and-Leaf Plots
If you are comparing two sets of data, you can use a back-to-back stem-and-leaf plot.

Example D1: The following class sizes were reported in Economics 101 and Math 151:
Econ 101: 20, 34, 27, 15, 24, 35, 38, 28
Math 151: 14, 18, 21, 34, 29, 13, 32, 23
Make a back-to-back stem-and-leaf plot for the data.

Day 4: Qualitative/Categorical Data

A: Bar Graph

Features of a Bar Graph
• Bars can be vertical or horizontal.
• Bars are of uniform width and uniformly spaced.
• Lengths represent values of variables being displayed, the frequency of occurrence, or the percentage of occurrence. The same measurement scale is used for the length of each bar.
• The graph is well annotated with title, labels for each bar, and vertical scale or actual value for the length of each bar.

Clustered Bar Graph: two or more bars for each value on the horizontal axis; clusters are uniformly spaced

Example A1: Make a clustered bar graph for the following data.

Month        Ave. High Temp Ft. Myers, FL    Ave. High Temp Indianapolis, IN
January      75                              34
March        80                              49
May          89                              72
July         92                              84
September    91                              76
November     81                              51

Another type of bar graph is a SEGMENTED BAR GRAPH: In this graph each bar is a whole and is divided proportionally based on the conditional distributions for each variable.

Example A2: Use the contingency table to construct a segmented bar graph.

B: Pareto Chart
bars arranged by frequency, highest to lowest

Example B1: A sandwich shop records the number of each kind of sandwich sold last Friday. The numbers are recorded in the chart:
Design a Pareto Chart below for the types of sandwiches sold last Friday.

C: Pie Charts/Circle Graphs
Wedges visually display proportional parts of the total population as a percentage or as a portion of 360°
Good for qualitative/categorical data
The graph should have a title and wedges should be well labeled or have a key/legend.
Josh Sundquist’s Pie Charts for Math Nerds: http://youtu.be/LhfGPqW2xkM

How do you make a circle graph by hand?
Step 1: Determine your grand total (if it’s not given)
Step 2: Determine the PERCENTAGE represented by each category
Percentage in each category = (# in category / Total) × 100%
Step 3: Determine the number of DEGREES represented by each category
Degrees of category = Percentage of category (written as a decimal) × 360°
Step 4: Use a PROTRACTOR to mark off the correct number of degrees, one wedge at a time

Example C1: Let’s start easy. Make a circle graph for the following data.

Elementary Teachers of Local Schools
Year      1995
Male      15
Female    40

Example C2: Make a circle graph for the following data.

D: Time Series Graph (Line Graph)
Data are plotted in order of occurrence at regular intervals over time. Dots are connected using line segments.

Example D1: Make a time series graph for the following data.

Year          1990   1995   2000   2005   2010
Enrollment    30     34     32     40     52

E: Displaying Data
Determine whether the statement is true or false.
A. In a bar graph, the bars do not have to be of uniform width.
B. The bars in a bar graph can be vertical or horizontal.
C. The lengths of the bars in a bar graph stand for certain values of the variable being displayed.
D. When two or more variables are displayed together, the bar graph is called a clustered bar graph (or a comparative bar graph).
E. In a Pareto chart, the bars are arranged from left to right according to increasing height.
F. A circle graph is also called a pie chart.
G. Circle graphs are usually used to display percentages.
H. Time series data contain the values of a variable taken at regular intervals over a certain time period.

THINK ABOUT IT.

Display              | Best for what kind of data? | What can you “see” from the display?
Bar Graph            |                             |
Pareto Chart         |                             |
Circle Graphs        |                             |
Time-Series Graph    |                             |
Histogram            |                             |
Dotplot              |                             |
Stem & Leaf Plot     |                             |

ALL GRAPHS: Provide a title, label the axes, and identify units of measure.
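The circle-graph steps from section C (grand total, percentage per category, degrees per category) can be checked with a short Python sketch. It uses the counts from Example C1; the function name `circle_graph_wedges` and the dictionary keys are illustrative assumptions, and the protractor step is of course still done by hand.

```python
# Sketch of the circle-graph arithmetic (Steps 1-3) using Example C1's counts.
# Function and key names are illustrative, not from the course notes.

def circle_graph_wedges(counts):
    """Map each category to its percentage of the total and its wedge angle."""
    total = sum(counts.values())          # Step 1: grand total
    wedges = {}
    for category, count in counts.items():
        share = count / total             # proportion of the whole
        wedges[category] = {
            "percent": share * 100,       # Step 2: percentage
            "degrees": share * 360,       # Step 3: portion of 360 degrees
        }
    return wedges


teachers_1995 = {"Male": 15, "Female": 40}
wedges = circle_graph_wedges(teachers_1995)
for category, wedge in wedges.items():
    print(f"{category}: {wedge['percent']:.1f}% -> {wedge['degrees']:.1f} degrees")
# Male: 27.3% -> 98.2 degrees
# Female: 72.7% -> 261.8 degrees
```

A useful hand check is that the percentages add to 100% and the wedge angles add to 360°, up to rounding.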