Unit 1 Notes - blank copy - Avon Community School Corporation

Course Notes: Probability and Statistics – Mrs. Leahy
Unit 1: Organizing Data
Day 1:
Collecting & Organizing Data; Frequency Distributions;
Histograms & Relative Frequency Histograms
A. Collecting & Organizing Data
______________________ is the study of how to collect, organize, analyze, and interpret numerical
information from data. This data can represent information that is:
“Qualitative”/ ____________________________ or “Quantitative”/ __________________________.
Quantitative data MUST HAVE ___________.
Example A1:
BOX in data that is “Qualitative”
STAR beside data that is “Quantitative”
A survey is conducted at a local library collecting the following data from patrons:
Age
Marital Status
Gender
Distance you live from library
Number of Children in Household
Favorite Book Genre
Example A2: Yellow Textbook pg 13 “Just Checking”
Goal in this chapter:
1. Examine data & describe the distribution of the data
2. Choose the best way to organize/display the data
3. Create (by hand) the most common data displays
4. Read/Interpret data displays
B. Histograms & Frequency Tables
Have a large set of quantitative data? Organize into smaller intervals called ________________.
A histogram uses ______________ to show __________________________ of classes.
A relative frequency histogram uses bars to show the _____________________ of cases in each class.
Basic Construction: [diagram of a histogram with a TITLE on top, Freq/Rel Freq./Etc. on the vertical axis, and Class Boundaries on the horizontal axis]
Characteristics of a Histogram:
1. Used for high volume quantitative data
2. Bars equal width
3. Bars touch
4. Class limits/class boundaries on x-axis
5. Class frequency/relative frequency on y-axis
6. Classes cannot overlap or be open-ended
7. Use 4-15 classes. (some sources say 5-15)
Example B1: Textbook page 45
Example B2:
This histogram has _____ CLASSES.
The CLASS BOUNDARIES of this bar are _______ to ______.
The FREQUENCY of this class is ___________.
OK. Sounds good. Now how do we make a histogram?
I knew you were going to ask that. Here we go…
Example B3:
An irate customer called the Dollar Day Mail Order Company 40 times during the last two weeks to see why
his order had not arrived. Each time he called, he recorded the length of time he was put “on hold” before
being allowed to talk to a customer service representative.
Time on Hold, in minutes
1 5 5 6 7 4 8 7 6 5
5 6 7 6 6 5 8 9 9 10
7 8 11 2 4 6 5 12 13 6
3 7 8 8 9 9 10 8 9 9
We are going to use five classes to organize our data. (The number of classes will be given to you for
homework.) We need to determine how big each interval should be. This is called the “Class width.”
Step 1: Determine Class Width
1. Compute: (Largest data value − Smallest data value) / (Number of Classes)
In our example:
2. ROUND UP to next whole number.
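Want to check your Step 1 arithmetic? Here is a minimal Python sketch, assuming the 40 hold times from Example B3. The name `class_width` is made up for illustration, and `math.ceil` stands in for "round up." Note that some textbooks bump the width up even when the division comes out to an exact whole number; this sketch does not do that.

```python
import math

# Hold times (minutes) from Example B3.
hold_times = [
    1, 5, 5, 6, 7, 4, 8, 7, 6, 5,
    5, 6, 7, 6, 6, 5, 8, 9, 9, 10,
    7, 8, 11, 2, 4, 6, 5, 12, 13, 6,
    3, 7, 8, 8, 9, 9, 10, 8, 9, 9,
]

def class_width(data, num_classes):
    """(largest - smallest) / number of classes, rounded up."""
    raw = (max(data) - min(data)) / num_classes
    return math.ceil(raw)

print(class_width(hold_times, 5))  # (13 - 1) / 5 = 2.4, which rounds up to 3
```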
Step 2: Determine the Data Range for each class: The Class Limits
Start with Lower Class Limits (LL) (the lowest value in the data class)
Lowest data value = Lowest Class Limit.
Add Class Width to get next lowest limit, etc.
Fill in Upper Class Limits (UL) (the highest value that fits in the class)
Step 3: Determine the Class Boundaries
Upper Class Boundary = Upper limit + 0.5
Lower Class Boundary = Lower limit – 0.5
Step 4: Determine the Frequency of each class
Class Frequency = # of data values in class (count)
Step 5: Find the Class Midpoint
Class Midpoint = Average of Lower and Upper Limits
Table to fill in:
Class Limits    Class Boundaries    Tally (optional)    Frequency    Cumulative Frequency    Midpoints
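Steps 1 through 5 can also be sketched in Python as a way to check a hand-made table. This is only an illustration, assuming whole-number data like the Example B3 hold times; the function name `frequency_table` and the dictionary keys are made up for this sketch.

```python
import math

# Hold times (minutes) from Example B3.
hold_times = [
    1, 5, 5, 6, 7, 4, 8, 7, 6, 5,
    5, 6, 7, 6, 6, 5, 8, 9, 9, 10,
    7, 8, 11, 2, 4, 6, 5, 12, 13, 6,
    3, 7, 8, 8, 9, 9, 10, 8, 9, 9,
]

def frequency_table(data, num_classes):
    """Return one dict per class, following Steps 1-5 (whole-number data)."""
    width = math.ceil((max(data) - min(data)) / num_classes)  # Step 1
    rows = []
    cumulative = 0
    lower = min(data)                      # lowest data value = lowest class limit
    for _ in range(num_classes):
        upper = lower + width - 1          # highest whole number that fits
        freq = sum(lower <= x <= upper for x in data)          # Step 4
        cumulative += freq
        rows.append({
            "limits": (lower, upper),                          # Step 2
            "boundaries": (lower - 0.5, upper + 0.5),          # Step 3
            "frequency": freq,
            "cumulative": cumulative,
            "midpoint": (lower + upper) / 2,                   # Step 5
        })
        lower += width                     # next lower class limit
    return rows

for row in frequency_table(hold_times, 5):
    print(row)
```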
A table (like the one we just made) that shows the classes and corresponding frequencies is called a
______________________________________ or _____________________________________
Example B4: Use the frequency table from Example B3 to construct a histogram.
Step 1: With a ruler, draw a vertical
and horizontal axis. Give vertical
axis appropriate scale for
frequency. Give horizontal axis
appropriate scale for Class
Boundaries.
Step 2: Draw in the bars
Step 3: Give Histogram a title and
label the axes
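In class we draw histograms by hand, but a rough text version can help you check the bar heights. The sketch below assumes the Example B3 classes (width 3, five classes) and draws the bars sideways, one `#` per data value. The function name `text_histogram` is made up for illustration.

```python
# Class boundaries and frequencies for Example B3 (width 3, five classes).
classes = [
    ((0.5, 3.5), 3),
    ((3.5, 6.5), 15),
    ((6.5, 9.5), 17),
    ((9.5, 12.5), 4),
    ((12.5, 15.5), 1),
]

def text_histogram(classes, title):
    """Build a sideways histogram: one row per class, one '#' per count."""
    lines = [title]
    for (low, high), freq in classes:
        label = f"{low:>4} to {high:<4}"
        lines.append(f"{label} | {'#' * freq} ({freq})")
    return "\n".join(lines)

print(text_histogram(classes, "Time on Hold (minutes)"))
```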
Relative Frequency Table
The relative frequency of a class is the proportion (or percentage) of all data values in that class. It helps
us compare the amount of data in each class.
Step 1: Fill in your class limits and frequencies (from our last example)
Step 2: Compute the Relative Frequency
1. Find the total frequency (sum)    ***NOTATION*** ∑ =
2. Rel. Frequency = f/n = (class freq.) / (total freq.)

Class Limits    Frequency    Relative Frequency
1 – 3           3
4 – 6           15
7 – 9           17
10 – 12         4
13 – 15         1
Total:          ∑ freq =     ∑ rf =
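To check a relative frequency column, the f/n computation can be sketched in Python. This assumes the class limits and frequencies listed in the table for Example B3; the name `relative_frequencies` is made up for illustration.

```python
# Class limits and frequencies from the Example B3 table.
class_limits = ["1-3", "4-6", "7-9", "10-12", "13-15"]
frequencies = [3, 15, 17, 4, 1]

def relative_frequencies(freqs):
    """Divide each class frequency f by the total frequency n."""
    n = sum(freqs)
    return [f / n for f in freqs]

rel = relative_frequencies(frequencies)
for limits, f, rf in zip(class_limits, frequencies, rel):
    print(f"{limits:>7}  f = {f:>2}  rf = {rf:.3f}")
print("total f =", sum(frequencies), " total rf =", round(sum(rel), 3))
```

The relative frequencies should always add up to 1 (or 100%), apart from small rounding differences.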
Example B5: Given a data set of numbers {1, 7, 8, 4, 4, 5, 6, 3, 8, 7, 1, 1, 8, 1} and using four classes
a) Find the class width
b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, and
relative frequencies
c) Make a histogram.
d) Make a relative frequency histogram.
Class Limits    Class Boundaries    Frequency    Relative Frequency    Midpoints
Day 2:
Distribution shapes; Frequency Polygons; Dot Plots
Recall a __________________________ can be used to represent a Frequency Distribution.
A: Distribution Shapes
[Figure: example histograms of the distribution shapes Symmetric, Uniform, Skewed Left, Skewed Right, and Bimodal. Copyright © Cengage Learning. All rights reserved.]
Symmetric/Mound/Bell Shaped: two sides are symmetrical with respect to a vertical line that goes
through the middle of the graph
Uniform: every class has the same frequency
Bimodal: histogram shows _______ peaks separated by at least one shorter bar
Unimodal: histogram shows ______ peak
Skewed Left: More bars on the left side of the peak… “tail” on the left is longer than right
Skewed Right: More bars on the right side of the peak… “tail” on right is longer than left
Often a ______________________ distribution is caused by collecting data from a group of individuals
that could have been classified better into two separate groups for that particular data.
Example: heights from a mixed group of men and women
Significant gaps between bars at the left or right can be caused by _______________________.
These are values that are significantly higher or lower than the rest of your data.
Example: salaries of employees at a major corporation where the CEO makes three times as
much as the rest of the workers.
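The unimodal/bimodal idea (peaks separated by at least one shorter bar) can be sketched as a peak count over the class frequencies. This is a naive illustration, not a formal test: real data often has small bumps that a person would ignore, and a perfectly uniform histogram shows up here as one flat "peak." The name `count_peaks` and the second frequency list are made up for this sketch.

```python
def count_peaks(freqs):
    """Count bars (or flat runs of equal bars) taller than both neighbors."""
    # Collapse runs of equal heights so a flat-topped peak counts once.
    heights = [f for i, f in enumerate(freqs) if i == 0 or f != freqs[i - 1]]
    peaks = 0
    for i, h in enumerate(heights):
        # Treat the space beyond either end as lower than any bar.
        left = heights[i - 1] if i > 0 else -1
        right = heights[i + 1] if i < len(heights) - 1 else -1
        if h > left and h > right:
            peaks += 1
    return peaks

print(count_peaks([3, 15, 17, 4, 1]))  # frequencies from Example B3 on Day 1
print(count_peaks([2, 8, 3, 9, 1]))    # hypothetical frequencies with two humps
```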
Example A1: Look at Distributions – Textbook pages 50-51
Example A2: Name that Distribution! PowerPoint
B: Dot Plots (Similar to a histogram)
horizontal axis = shows appropriate scale; indicates quantitative data results
vertically = one dot per occurrence of a particular value
Example B1: A handful of pennies were examined and the year of minting was recorded. The
information is recorded on the following dotplot.
In which year were the most pennies minted?
How many pennies were minted after 1996?
How many pennies were there total in this handful?
Describe (using a year range) when you think the majority of the
pennies in this handful were minted.
A dotplot can be created like this too:
A dotplot can be used to tell a story, much like a
histogram
Example B2: Create a dotplot for the following data set.
12, 15, 16, 16, 14, 12, 14, 18, 19, 14, 15, 18, 16, 13, 15, 16, 13, 10, 18, 16
How many numbers? ______
Lowest? _______ Start with: _______
Highest? _______ End with: _______
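After you draw your dotplot by hand, a rough text version can be used to compare against it. The sketch below assumes the Example B2 data and stacks one `*` per occurrence above each value; the scale runs from the lowest to the highest value with no gaps. The name `text_dotplot` is made up for illustration.

```python
from collections import Counter

# Data set from Example B2.
data = [12, 15, 16, 16, 14, 12, 14, 18, 19, 14,
        15, 18, 16, 13, 15, 16, 13, 10, 18, 16]

def text_dotplot(values):
    """Stack one '*' per occurrence above each value on the axis."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))  # no gaps in the scale
    rows = []
    # Build the plot from the tallest level down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(" * " if counts[v] >= level else "   " for v in axis)
        rows.append(row.rstrip())
    rows.append("".join(f"{v:^3}" for v in axis).rstrip())
    return "\n".join(rows)

print(text_dotplot(data))
```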
Example B3: Create a dotplot for the data.
C: Frequency Polygon
Sometimes, we are interested in a frequency polygon.
Start with your histogram data.
Instead of a bar, use a line graph with a dot at the midpoint of the class.
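The dots for a frequency polygon can be listed with a short Python sketch. This assumes the class limits and frequencies from the Day 1 hold-time table (Example B3). For the cumulative version, the sketch places each running total at the upper class boundary; that is a common textbook convention, but it is an assumption here since these notes do not spell it out. Both function names are made up for illustration.

```python
# Class limits and frequencies from the Example B3 frequency table.
classes = [((1, 3), 3), ((4, 6), 15), ((7, 9), 17), ((10, 12), 4), ((13, 15), 1)]

def polygon_points(classes):
    """(midpoint, frequency) pairs: the dots joined by line segments."""
    return [((low + high) / 2, freq) for (low, high), freq in classes]

def cumulative_points(classes):
    """(upper class boundary, running total) pairs for a cumulative graph."""
    points, running = [], 0
    for (low, high), freq in classes:
        running += freq
        points.append((high + 0.5, running))  # assumed plotting position
    return points

print(polygon_points(classes))
print(cumulative_points(classes))
```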
Example C1: Construct a frequency polygon for the following data.
Example C2: Construct a cumulative frequency polygon for the following data.
Day 3:
Stem-and-Leaf Displays
A. Exploratory Data Analysis (EDA)
Exploratory Data Analysis techniques are used to explore a data set, to detect patterns and
extreme data values, to raise new questions, or to pursue leads in many directions.
Useful when data has been gathered for ______________________________.
For example: Ages of Applicants of Graduate Programs
B. Stem-and-Leaf Display
Key: 1 | 2 = 12
Used for _________________________ data.
Best with small to medium size sets.
A stem-and-leaf display is used to ___________ order and arrange data
into groups.
The _____________ are aligned vertically from smallest to largest.
A vertical line is drawn to the right of the stems.
The ____________ with the same stem are placed in the same row as the stem, arranged in
___________________ order.
A label (Key) is used to indicate the magnitude of the numbers in the display.
Example B1:
A study on peanut butter reported the following optimal consumption temperatures for various brands:
56 44 62 36 39 53 50 65 45 40
Make a stem-and-leaf display for this data.
Step 1: Identify appropriate stem values.
List smallest to largest. No omissions!
Step 2: List leaves with corresponding stems
In numeric order smallest to largest!
Step 3: Include Key and Title
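The three steps can be sketched in Python to check a hand-drawn display. This assumes the ten temperatures from Example B1 and non-negative whole numbers, with the tens digit as the stem and the ones digit as the leaf. The name `stem_and_leaf` is made up for illustration.

```python
# Temperatures from Example B1.
temps = [56, 44, 62, 36, 39, 53, 50, 65, 45, 40]

def stem_and_leaf(values):
    """Tens digit(s) as stem, ones digit as leaf; every stem is listed."""
    stems = range(min(values) // 10, max(values) // 10 + 1)  # Step 1: no omissions
    lines = []
    for stem in stems:
        # Step 2: leaves for this stem, smallest to largest.
        leaves = sorted(v % 10 for v in values if v // 10 == stem)
        lines.append(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}".rstrip())
    return lines

print("Key: 3 | 6 = 36")  # Step 3: include a key
print("\n".join(stem_and_leaf(temps)))
```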
Example B2:
For the following data, use the first two digits as the stem to make a stem-and-leaf display.
106 94 112 96 89 113 90 85 85 100
Example B3: Describe the distributions of the following stem/leaf displays.
C: Stem & Leaf Special Cases
Splitting the Stems
idea: for lots of data, use TWO (or more) intervals
instead of one for the stem.
Consider:
0 | 0 1 2 3 3 4 5 5 7 7 8 9 9 9
Using only one stem “0” would give us an
overcrowded graph. Instead of using an interval
of 0-9, maybe we could use TWO intervals.
source: http://learnalgebrafaster.com/split-stem-and-leaf-plot/
Example C1: Make a stem-and-leaf display using
a) Two intervals: 0-4, 5-9
b) Five intervals: 0-1, 2-3, 4-5, 6-7, 8-9
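Splitting a stem can be sketched in Python as well. The data below is an assumption: it reads the "Consider" row as a single stem 0 with leaves 0 1 2 3 3 4 5 5 7 7 8 9 9 9, which may not match the original picture exactly. The name `split_stem_and_leaf` and the "(0-4)" style row labels are made up for this sketch.

```python
# Assumed reading of the "Consider" row: stem 0 with these leaves.
values = [0, 1, 2, 3, 3, 4, 5, 5, 7, 7, 8, 9, 9, 9]

def split_stem_and_leaf(values, parts):
    """Split every stem into `parts` equal leaf intervals (use 2 or 5)."""
    span = 10 // parts                      # how many leaf digits each row covers
    lines = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for part in range(parts):
            low, high = part * span, part * span + span - 1
            leaves = sorted(v % 10 for v in values
                            if v // 10 == stem and low <= v % 10 <= high)
            row = f"{stem} ({low}-{high}) | {' '.join(map(str, leaves))}"
            lines.append(row.rstrip())
    return lines

print("\n".join(split_stem_and_leaf(values, 2)))  # two intervals: 0-4, 5-9
print()
print("\n".join(split_stem_and_leaf(values, 5)))  # five intervals: 0-1, ..., 8-9
```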
Example C2. Britney is a swimmer training for a competition. The number of 50 meter laps she swam
each day for 30 days are as follows:
a) Prepare a stem-and-leaf plot.
b) Redraw the stem-and-leaf plot using two unit intervals.
c) Make a comment on what these plots show.
Back-To-Back Stem-and-Leaf Plots
If you are comparing two sets of data,
you can use a back-to-back
stem-and-leaf plot.
Example D1: The following class sizes were reported in Economics 101 and Math 151:
Econ 101: 20, 34, 27, 15, 24, 35, 38, 28
Math 151: 14, 18, 21, 34, 29, 13, 32, 23
Make a back-to-back stem-and-leaf plot for the data.
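A back-to-back plot can be sketched in Python to check your work. This assumes the first row of numbers in Example D1 belongs to Econ 101 and the second row to Math 151. The stems go down the middle, and the leaves on the left side get larger as they move away from the stem. The name `back_to_back` is made up for illustration.

```python
# Class sizes from Example D1 (first row read as Econ 101, second as Math 151).
econ = [20, 34, 27, 15, 24, 35, 38, 28]
math_151 = [14, 18, 21, 34, 29, 13, 32, 23]

def back_to_back(left, right):
    """Shared stems in the middle; left leaves grow outward from the stem."""
    both = left + right
    rows = []
    for stem in range(min(both) // 10, max(both) // 10 + 1):
        left_leaves = sorted((v % 10 for v in left if v // 10 == stem), reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == stem)
        rows.append((" ".join(map(str, left_leaves)), stem,
                     " ".join(map(str, right_leaves))))
    pad = max(len(row[0]) for row in rows)  # right-align the left-hand leaves
    return [f"{lt:>{pad}} | {stem} | {rt}".rstrip() for lt, stem, rt in rows]

print("Econ 101 | stem | Math 151")
print("\n".join(back_to_back(econ, math_151)))
```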
Day 4:
Qualitative/Categorical Data
A: Bar Graph
Features of a Bar Graph
• Bars can be vertical or horizontal.
• Bars are of uniform width and uniformly spaced.
• Lengths represent values of variables being displayed, the frequency of occurrence, or the
percentage of occurrence. The same measurement scale is used for the length of each bar.
• The graph is well annotated with title, labels for each bar, and vertical scale or actual value
for the length of each bar.
Clustered Bar Graph: two or more bars for each value on the horizontal axis, clusters are uniformly spaced
Example A1: Make a clustered bar graph for the following data.
Month        Ave. High Temp Ft. Myers, FL    Ave. High Temp Indianapolis, IN
January      75                              34
March        80                              49
May          89                              72
July         92                              84
September    91                              76
November     81                              51
Another type of bar graph is a SEGMENTED BAR GRAPH:
In this graph each bar is a whole and is divided proportionally based on the conditional distributions for
each variable.
Example A2:
Use the contingency table to construct a segmented bar graph.
B: Pareto Chart
bars arranged by frequency, highest to lowest
Example B1: A sandwich shop records the number of each kind of sandwich sold last
Friday. The numbers are recorded in the chart:
Design a Pareto Chart below for the types of sandwiches sold last Friday.
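The key move in a Pareto chart is sorting the categories by frequency, highest to lowest, before drawing the bars. Here is a small Python sketch of that ordering. The sandwich counts from the chart are not reproduced in these notes, so the numbers below are made up purely for illustration, and the name `pareto_order` is hypothetical.

```python
# Hypothetical counts (NOT the numbers from the Example B1 chart).
sales = {"Turkey": 18, "Ham": 25, "Veggie": 9, "Roast Beef": 14}

def pareto_order(counts):
    """Categories sorted by frequency, highest to lowest."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Sideways text bars, tallest category first.
for name, count in pareto_order(sales):
    print(f"{name:<10} | {'#' * count} ({count})")
```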
C: Pie Charts/Circle Graphs
Wedges visually display proportional parts of
the total population as a percentage or as a
portion of 360°
Good for qualitative/categorical data
The graph should have a title and wedges
should be well labeled or have a key/legend.
Josh Sundquist’s Pie Charts for Math Nerds:
http://youtu.be/LhfGPqW2xkM
How do you make a circle graph by hand?
Step 1: Determine your grand total (if it’s not given)
Step 2: Determine the PERCENTAGE represented by each category
Percentage in each category = # in category/ Total
Step 3: Determine the number of DEGREES represented by each category
Degrees of category = Percentage of category x 360°
Step 4: Use a PROTRACTOR to mark off the correct number of degrees, one wedge at a time
Example C1: Let’s start easy
Make a circle graph for the following data
Elementary Teachers of Local Schools
Year      1995
Male      15
Female    40
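Steps 1 through 3 for a circle graph can be sketched in Python, so you can check your percentages and degrees before picking up the protractor. This assumes the Example C1 counts (15 male and 40 female teachers); the name `circle_graph_table` is made up for illustration.

```python
# Counts from Example C1.
teachers = {"Male": 15, "Female": 40}

def circle_graph_table(counts):
    """Percentage and degrees for each category (Steps 1-3)."""
    total = sum(counts.values())                             # Step 1: grand total
    table = {}
    for category, count in counts.items():
        fraction = count / total                             # Step 2: share of total
        table[category] = (fraction * 100, fraction * 360)   # Step 3: degrees
    return table

for category, (percent, degrees) in circle_graph_table(teachers).items():
    print(f"{category}: {percent:.1f}% -> {degrees:.1f} degrees")
```

The degrees for all wedges should add up to 360, apart from small rounding differences.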
Example C2: Make a circle graph for the following data
D: Time Series Graph (Line Graph)
Data are plotted in order of occurrence at
regular intervals over time. Dots are connected
using line segments.
Example D1: Make a time series graph
for the following data
Year          1990    1995    2000    2005    2010
Enrollment    30      34      32      40      52
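The idea behind a time series graph (points in time order, evenly spaced, joined by line segments) can be sketched in Python. This assumes the Example D1 enrollment table; the name `time_series_segments` is made up for illustration, and the check for regular intervals is just one simple way to do it.

```python
# Data from Example D1.
years = [1990, 1995, 2000, 2005, 2010]
enrollment = [30, 34, 32, 40, 52]

def time_series_segments(times, values):
    """Order points by time, check the spacing is regular, list the segments."""
    points = sorted(zip(times, values))
    gaps = {b[0] - a[0] for a, b in zip(points, points[1:])}
    if len(gaps) > 1:
        raise ValueError("time intervals are not regular")
    return list(zip(points, points[1:]))  # each pair is one line segment

for (t0, v0), (t1, v1) in time_series_segments(years, enrollment):
    print(f"{t0}-{t1}: {v0} -> {v1} (change {v1 - v0:+d})")
```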
E: Displaying Data
Determine whether the statement is true or false.
A. In a bar graph, the bars do not have to be of uniform width.
B. The bars in a bar graph can be vertical or horizontal.
C. The lengths of the bars in a bar graph stand for certain values of the variable being
displayed.
D. When two or more variables are displayed together, the bar graph is called a clustered
bar graph (or a comparative bar graph).
E. In a Pareto chart, the bars are arranged from left to right according to increasing height.
F. A circle graph is also called a pie chart.
G. Circle graphs are usually used to display percentages.
H. Time series data contain the values of a variable taken at regular intervals over a
certain time period.
THINK ABOUT IT.
Best for what kind of data?
What can you “see” from the display?
Bar Graph
Pareto Chart
Circle Graphs
Time-Series Graph
Histogram
Dotplot
Stem & Leaf Plot
ALL GRAPHS:
Provide a title, label the axes, and identify units of measure.
Day 1/Day 2
Day 3
Day 4