Collecting & Organizing Data - Avon Community School Corporation

Course Notes: Probability and Statistics – Mrs. Leahy
Unit 1: Organizing Data
Day 1:
Collecting & Organizing Data; Frequency Distributions;
Histograms & Relative Frequency Histograms
A. Collecting & Organizing Data
______________________ is the study of how to collect, organize, analyze, and interpret
numerical information from data.
This data can represent information that is “Qualitative”/ __________________________
or “Quantitative”/ ______________________________.
Example A1:
A survey is conducted at a local library collecting the following data from patrons:
1. Age
2. Gender
3. Distance you live from library
4. Marital Status
5. Number of Children in Household
6. Favorite Book Genre
BOX in data that is “Qualitative”
STAR beside data that is “Quantitative”
In this unit, we will be dealing with a lot of “Quantitative” data, how to organize it, and how to
display the data in a useful manner.
B. Frequency Tables
Have a large set of quantitative data? Organize into smaller intervals called ________________.
How many should we use?
Example B1: Characteristics and Important Values in a Frequency Distribution
An irate customer called the Dollar Day Mail Order
Company 40 times during the last two weeks to see why his
order had not arrived. Each time he called, he recorded the
length of time he was put “on hold” before being allowed
to talk to a customer service representative.
Time on Hold, in minutes
1 5 5 6 7 4 8
5 6 7 6 6 5 8
7 8 11 2 4 6 5
3 7 8 8 9 9 10
7 9 12 8 6 9
13 9 5 10 6 9
We are going to use five classes to organize our data. (The number of classes will be given to you for
homework.) We need to determine how big each interval should be. This is called the “Class width.”
Step 1: Determine Class Width
1. Compute:
$$\frac{\text{Largest data value} - \text{smallest data value}}{\text{Number of Classes}}$$
In our example:
2. ROUND UP to next whole number.
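A minimal Python sketch of Step 1 for the hold-time data, offered as an illustration rather than the required method. The names `hold_times` and `class_width` are ours. We also assume that "round up" means moving to the next whole number even when the quotient is already whole, so that the largest value still fits in the last class.

```python
import math

# Hold times (minutes) from Example B1, 40 values
hold_times = [
    1, 5, 5, 6, 7, 4, 8, 5, 6, 7, 6, 6, 5, 8,
    7, 8, 11, 2, 4, 6, 5, 3, 7, 8, 8, 9, 9, 10,
    7, 9, 12, 8, 6, 9, 13, 9, 5, 10, 6, 9,
]


def class_width(data, num_classes):
    """(largest - smallest) / num_classes, moved up to the next whole number."""
    raw = (max(data) - min(data)) / num_classes
    # Assumption: floor + 1 also bumps a quotient that is already whole.
    return math.floor(raw) + 1


width = class_width(hold_times, 5)
print(width)  # (13 - 1) / 5 = 2.4, which moves up to 3
```

The same function gives a width of 2 for ten values running from 1 to 8 split into four classes, since 7 / 4 = 1.75.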
Step 2: Determine the Data Range for each class: The Class Limits
Start with Lower Class Limits (LL) (the lowest value in the data class).
Lowest data value = Lowest Class Limit.
Add Class Width to get next lowest limit, etc.
Fill in Upper Class Limits (UL) (the highest value that fits in the class).
Step 3: Determine the Class Boundaries
Upper Class Boundary = Upper limit + 0.5
Lower Class Boundary = Lower limit – 0.5
Step 4: Determine the Frequency of each class
Class Frequency = # of data values in class (count)
Step 5: Find the Class Midpoint
Class Midpoint = Average of Lower and Upper Limits
Table columns to fill in: Class Limits | Class Boundaries | Tally (optional) | Frequency | Midpoints
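Steps 2 through 5 can be sketched in Python as follows. This is only one way to organize the work, assuming whole-number data, five classes, and a class width of 3; the variable names are ours.

```python
# Hold times (minutes) from Example B1, 40 values
hold_times = [
    1, 5, 5, 6, 7, 4, 8, 5, 6, 7, 6, 6, 5, 8,
    7, 8, 11, 2, 4, 6, 5, 3, 7, 8, 8, 9, 9, 10,
    7, 9, 12, 8, 6, 9, 13, 9, 5, 10, 6, 9,
]
num_classes = 5
width = 3  # class width from Step 1

lowest = min(hold_times)
rows = []
for k in range(num_classes):
    lower = lowest + k * width               # Step 2: lower class limit
    upper = lower + width - 1                # Step 2: upper class limit (whole-number data)
    boundaries = (lower - 0.5, upper + 0.5)  # Step 3: class boundaries
    frequency = sum(lower <= x <= upper for x in hold_times)  # Step 4: count
    midpoint = (lower + upper) / 2           # Step 5: average of the limits
    rows.append((lower, upper, boundaries, frequency, midpoint))

for lower, upper, (lb, ub), freq, mid in rows:
    print(f"{lower:>2}-{upper:<2}  {lb:>4}-{ub:<4}  f={freq:<2}  midpoint={mid}")
```

The frequencies should add up to 40, the number of calls, which is a quick check that no value was skipped or counted twice.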
A table (like the one we just made) that shows the classes and corresponding frequencies is called a
______________________________________ or _____________________________________
Example B2: Relative Frequency Table
The relative frequency of a class is the proportion of all data values in that class. It helps us compare
the amount of data in each class.
Step 1: Fill in your class limits and frequencies (from our last example)
Step 2: Compute the Relative Frequency
1. Find the total frequency (sum)   ***NOTATION*** ∑ =
2. Rel. Frequency = $\frac{f}{n} = \frac{\text{class freq.}}{\text{total freq.}}$

Class Limits   Frequency   Relative Frequency
1 – 3          3
4 – 6          15
7 – 9          17
10 – 12        4
13 – 15        1
Total:         ∑ freq =    ∑ rf =
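As a check on the hand computation, here is a short Python sketch of f / n for each class, using the class limits and frequencies listed in this example. The variable names are ours.

```python
class_limits = ["1-3", "4-6", "7-9", "10-12", "13-15"]
frequencies = [3, 15, 17, 4, 1]

n = sum(frequencies)                     # total frequency
relative = [f / n for f in frequencies]  # f / n for each class

for limits, f, rf in zip(class_limits, frequencies, relative):
    print(f"{limits:<7} {f:>3}  {rf:.3f}")
print(f"{'Total':<7} {n:>3}  {sum(relative):.3f}")
```

The relative frequencies should always add up to 1 (allowing for rounding), which makes the total row a useful self-check.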
Example B3: Given a data set of 10 numbers {1, 7, 8, 4, 4, 5, 6, 3, 8, 7} and using four classes
a) Find the class width
b) Make a frequency table showing class limits, class boundaries, midpoints, frequencies, and
relative frequencies
C: Histograms and Relative Frequency Histograms
A histogram uses ______________ to show ________________________________.
A histogram where the ________________ of the bar = class frequency is a frequency histogram.
If we use relative frequency for the height, then we have a relative frequency histogram.
Properties of Histograms & Frequency Distributions
1. Bars touch.
2. Sometimes class limits or midpoints may be used instead of boundaries.
3. Class widths must be equal.
4. Class limits cannot overlap.
5. Classes cannot be open-ended.
6. Use 4 to 15 classes. (Some sources recommend 5 to 15.)
Example C1: Construct a histogram and a relative frequency histogram for the data in example B3. (You may want a ruler.)
Basic Construction:
[Blank histogram template: TITLE at the top, vertical axis labeled "Freq/Rel Freq./Etc.", horizontal axis labeled "Class Boundaries"]
Class Boundaries   Frequency   Relative Frequency
0.5 – 2.5          1           0.1
2.5 – 4.5          3           0.3
4.5 – 6.5          2           0.2
6.5 – 8.5          4           0.4
Don’t forget to use a SCALE on your axis. Ticks should be equally spaced!
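A rough text version of the histogram can be produced in Python before drawing it by hand. This sketch assumes the Example B3 data and the class boundaries listed above; it draws the bars sideways with one `#` per data value, and the names are ours.

```python
data = [1, 7, 8, 4, 4, 5, 6, 3, 8, 7]   # Example B3
boundaries = [0.5, 2.5, 4.5, 6.5, 8.5]  # class boundaries for four classes

n = len(data)
bars = []
for low, high in zip(boundaries, boundaries[1:]):
    freq = sum(low < x < high for x in data)  # values between the two boundaries
    bars.append((low, high, freq, freq / n))

for low, high, freq, rel in bars:
    print(f"{low:>3}-{high:<3} | {'#' * freq:<4} f={freq}  rf={rel:.1f}")
```

Both histograms have the same shape; only the scale on the frequency axis changes (counts versus proportions).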
Day 2:
Distribution Shapes; Dot Plots; Frequency Polygons & Ogives
Recall a __________________________ can be used to represent a Frequency Distribution.
A: Distribution Shapes
[Figure: example histograms labeled Symmetric, Uniform, Skewed Left, Skewed Right, and Bimodal. Copyright © Cengage Learning. All rights reserved.]
Mound/Bell Shaped (Symmetric): two sides are symmetrical with respect to a vertical line that goes through the middle of the graph
Uniform: every class has the same frequency
Skewed Left: “tail” on the left is longer than right
Skewed Right: “tail” on right is longer than left
Bimodal: histogram shows _______ peaks
Unimodal: histogram shows ______ peak
Often a ______________________ distribution is caused by collecting data from a group of individuals
that could have been classified better into two separate groups for that particular data.
Example: heights from a mixed group of men and women
Gaps between bars at the left or right can be caused by _______________________.
These are values that are significantly higher or lower than the rest of your data.
Example: salaries of employees at a major corporation where the CEO makes three times as
much as the rest of the workers.
B: Dot Plots (Similar to a histogram)
Example B1: Consider the following data set
12, 15, 16, 16, 14, 12, 14, 18, 19, 14, 15, 18, 16, 13, 15, 16, 13, 10, 18, 16
How many numbers? ______
Lowest? _______ Start with: _______
Highest? _______ End with: _______
Mode? (most frequent number) ______ How many times? ______
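To check a hand-drawn dot plot, the following Python sketch stacks one `*` per data value above a number line that runs from the lowest to the highest value. It is only an illustration; the layout and names are ours.

```python
from collections import Counter

data = [12, 15, 16, 16, 14, 12, 14, 18, 19, 14,
        15, 18, 16, 13, 15, 16, 13, 10, 18, 16]

counts = Counter(data)  # a missing value such as 11 counts as 0
axis = range(min(data), max(data) + 1)
tallest = max(counts.values())

# Print the stacks of dots from the top row down to the axis
for level in range(tallest, 0, -1):
    print(" ".join(" * " if counts[v] >= level else "   " for v in axis))
print(" ".join(f"{v:^3}" for v in axis))

mode, times = counts.most_common(1)[0]
print(f"n = {len(data)}, mode = {mode} ({times} times)")
```

Values with no dots (11 and 17 here) still get a place on the axis, so gaps in the data stay visible.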
C: Frequency Polygons and Ogives
A Frequency Distribution can be used to produce a histogram.
Class Boundaries   Frequency
11.5-13.5          7
13.5-15.5          9
15.5-17.5          10
17.5-19.5          4
Another type of graph is a Frequency Polygon and is used to show trends in data.
Frequency Polygon
1. Horizontal axis = midpoints of each class
2. Vertical axis = frequency
3. Use a dot at the frequency, then use line segments to connect dots.
Class Boundaries   Midpoints   Frequency
11.5-13.5          12.5        7
13.5-15.5          14.5        9
15.5-17.5          16.5        10
17.5-19.5          18.5        4
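The points of the frequency polygon can be listed with a few lines of Python. This sketch assumes the class boundaries and frequencies from the table above and only computes the dots and the segments between them; the names are ours, and no plotting library is used.

```python
boundaries = [(11.5, 13.5), (13.5, 15.5), (15.5, 17.5), (17.5, 19.5)]
frequencies = [7, 9, 10, 4]

# Horizontal coordinate: midpoint of each class
midpoints = [(low + high) / 2 for low, high in boundaries]

points = list(zip(midpoints, frequencies))  # one dot per class
segments = list(zip(points, points[1:]))    # line segments joining neighboring dots

for (x1, y1), (x2, y2) in segments:
    print(f"({x1}, {y1}) -> ({x2}, {y2})")
```

Four classes give four dots and three connecting segments.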
Ogive
A third type of graph is an Ogive and is used to show the cumulative frequencies for classes in a
frequency distribution. Cumulative frequencies help us determine how many data values are above or
below a certain upper boundary.
Class Boundaries   Frequency   Cumulative Frequency   Cumulative Relative Frequency
11.5-13.5          7           7                      7/30 = 0.23
13.5-15.5          9           16                     16/30 = 0.53
15.5-17.5          10          26                     26/30 = 0.87
17.5-19.5          4           30                     30/30 = 1.00
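The cumulative columns are running totals, which Python's `itertools.accumulate` computes directly. The sketch below assumes the four classes and frequencies in this table; the variable names are ours.

```python
from itertools import accumulate

upper_boundaries = [13.5, 15.5, 17.5, 19.5]
frequencies = [7, 9, 10, 4]

cumulative = list(accumulate(frequencies))  # running total of the frequencies
n = cumulative[-1]                          # last running total = total frequency
cumulative_relative = [c / n for c in cumulative]

for ub, c, cr in zip(upper_boundaries, cumulative, cumulative_relative):
    print(f"below {ub}: {c:>2} values, {c}/{n} = {cr:.2f}")
```

Note that 26/30 rounds to 0.87, and the last cumulative relative frequency is always 1.00.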
Example C1: What percent of data values are below 17.5?
Day 3:
Bar Graphs, Pareto Charts, Pie Charts, Time Series
A: Bar Graph
Features of a Bar Graph
• Bars can be vertical or horizontal.
• Bars are of uniform width and uniformly spaced.
• Lengths represent values of variables being displayed, the frequency of occurrence, or the percentage of occurrence. The same measurement scale is used for the length of each bar.
• The graph is well annotated with title, labels for each bar, and vertical scale or actual value for the length of each bar.
Clustered Bar Graph:
two or more bars for each value on the
horizontal axis, clusters are uniformly spaced
Pareto Chart:
bars arranged by frequency, highest to lowest
B: Pie Charts/Circle Graphs
Wedges visually display proportional parts of
the total population as a percentage or as a
portion of 360°
Good for qualitative/categorical data
The graph should have a title and wedges
should be well labeled or have a key/legend.
C: Time Series
Data are plotted in order of occurrence at
regular intervals over time.
Dots are connected using line segments.
D: EXAMPLES
The chart shows the data (in hundreds) of the
enrollment of male and female students in a college
for the years of 1995, 2000, and 2005.
Year     1995   2000   2005
Male     30     34     32
Female   28     35     33
Example D1. Make a clustered Bar Graph for this data.
Keep in mind:
1. Bars = same width
2. Clusters = equally spaced
3. Bar length = % based on category or actual frequency
4. Include Title, Key, Labels, and Scale
Example D2. Make a Pareto chart for male student enrollment during the three years.
Keep in mind:
1. Arrange data highest to lowest
2. Bars = same width and equally spaced
3. Bar length = frequency
4. Include Title, Bar labels
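The key step in a Pareto chart is the ordering of the bars. A small Python sketch, assuming the male enrollment figures from the table (in hundreds) and using names of our own, sorts the years from highest to lowest enrollment and prints a rough text bar for each:

```python
male_enrollment = {1995: 30, 2000: 34, 2005: 32}  # in hundreds

# Pareto order: highest frequency first
pareto_order = sorted(male_enrollment.items(), key=lambda item: item[1], reverse=True)

for year, count in pareto_order:
    print(f"{year} | {'#' * count} {count}")
```

Notice that the years are no longer in time order, which is why a Pareto chart suits categories better than trends over time.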
Example D3: Use the data to make a circle graph to display the distribution between male and female students in 1995.
$$\% = \frac{f}{n} = \frac{\text{category frequency}}{\text{total frequency}}$$
Degrees = % x 360°
Year     1995   Fractional Part   Percentage   Degrees
Male     30
Female   28
Total
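The fractional part, percentage, and degrees for each wedge follow from f / n. The Python sketch below assumes the 1995 enrollment figures (in hundreds) and treats the percentage as a proportion when multiplying by 360°; the names are ours.

```python
enrollment_1995 = {"Male": 30, "Female": 28}  # in hundreds

total = sum(enrollment_1995.values())  # n = total frequency
wedges = {}
for category, f in enrollment_1995.items():
    fraction = f / total               # fractional part f / n
    wedges[category] = (fraction, 100 * fraction, 360 * fraction)

for category, (fraction, percent, degrees) in wedges.items():
    print(f"{category:<6} {fraction:.3f}  {percent:.1f}%  {degrees:.1f} degrees")
```

As a check, the percentages should total 100% and the degrees should total 360°, apart from rounding.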
Example D4: Use the data to make a time series graph for the female student enrollment in those three
years.
Keep in mind:
1. Vertical Axis = frequency
2. Horizontal Axis = Time
3. Data points connected by line segments
4. Include Title, Labels, Scale
Year     1995   2000   2005
Female   28     35     33
Example D5:
Determine whether the statement is true or false.
A. In a bar graph, the bars do not have to be of uniform width.
B. The bars in a bar graph can be vertical or horizontal.
C. The lengths of the bars in a bar graph stand for certain values of the variable being
displayed.
D. When two or more variables are displayed together, the bar graph is called a clustered
bar graph (or a comparative bar graph).
E. In a Pareto chart, the bars are arranged from left to right according to increasing height.
F. A circle graph is also called a pie chart.
G. Circle graphs are usually used to display percentages.
H. Time series data contain the values of a variable taken at regular intervals over a
certain time period.
THINK ABOUT IT.
Bar Graph
Good for comparing Qualitative/Categorical Data (frequency) or Quantitative/Numerical Data (actual measurements)
Pareto Chart
Good for comparing Qualitative/Categorical Data in decreasing order
Circle Graphs
Good for comparing how a total is dispersed among several categories (compare percentages)
Time-Series Graph
Good for displaying how data changes over time
Histogram
Good for displaying the “shape” of the distribution of data
ALL GRAPHS:
Provide a title, label the axes, and identify units of measure.
Day 4:
Stem-and-Leaf Displays
A. Exploratory Data Analysis (EDA)
Exploratory Data Analysis techniques are used to explore a data set, to detect patterns and
extreme data values, to raise new questions, or to pursue leads in many directions.
Useful when data has been gathered for ______________________________.
For example: Ages of Applicants to Graduate Programs
B. Stem-and-Leaf Display
Key: 1 | 2 = 12
Used for _________________________ data.
Best with small to medium size sets.
A stem-and-leaf display is used to ___________ order and arrange data
into groups.
The _____________ are aligned vertically from smallest to largest.
A vertical line is drawn to the right of the stems.
The ____________ with the same stem are placed in the same row as the stem, arranged in
___________________ order.
A label (Key) is used to indicate the magnitude of the numbers in the display.
Example B1:
A study on peanut butter reported the following optimal consumption temperatures for various brands:
56 44 62 36 39 53 50 65 45 40
Make a stem-and-leaf display for this data.
Step 1: Identify appropriate stem values.
List smallest to largest. No omissions!
Step 2: List leaves with corresponding stems
In numeric order smallest to largest!
Step 3: Include Key and Title
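The three steps can be sketched in Python as a way to check a hand-made display. This assumes two-digit data with the tens digit as the stem and the ones digit as the leaf; the title text and variable names are ours.

```python
from collections import defaultdict

temps = [56, 44, 62, 36, 39, 53, 50, 65, 45, 40]  # Example B1

leaves = defaultdict(list)
for value in sorted(temps):          # sorting first keeps each row's leaves in order
    stem, leaf = divmod(value, 10)   # tens digit = stem, ones digit = leaf
    leaves[stem].append(leaf)

# Step 1: list every stem from smallest to largest, with no omissions
display = {stem: leaves.get(stem, []) for stem in range(min(leaves), max(leaves) + 1)}

# Step 3: title and key
print("Optimal Consumption Temperatures")
print("Key: 3 | 6 = 36")
# Step 2: leaves listed beside their stems
for stem, row in display.items():
    print(f"{stem} | {' '.join(map(str, row))}")
```

A stem with no data would still be printed with an empty row, which keeps gaps in the distribution visible.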
Example B2:
For the following data, use the first two digits as the stem to make a stem-and-leaf display.
106 94 112 96 89 113 90 85 85 100
Example B3: Look at the distributions in Examples B1 and B2. Are they unimodal or bimodal? Are they
symmetrical or skewed?
C: Splitting the Stems
Consider:
0 | 0 1 2 3 3 4 5 5 7 7 8 9 9 9
Using only one stem “0” would give us an overcrowded graph. Instead of using an interval of 0-9, maybe
we could use TWO intervals.
Example C1: Make a stem-and-leaf display using
a) Two intervals: 0-4, 5-9
b) Five intervals: 0-1, 2-3, 4-5, 6-7, 8-9
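Splitting a stem amounts to grouping the leaves by range. The Python sketch below assumes the display above is a single stem 0 with leaves 0 1 2 3 3 4 5 5 7 7 8 9 9 9, and that the data are non-negative whole numbers; the function name `split_stem` is ours.

```python
values = [0, 1, 2, 3, 3, 4, 5, 5, 7, 7, 8, 9, 9, 9]  # stem 0 (assumed reading of the display)


def split_stem(values, intervals):
    """Spread the leaves of each stem over `intervals` equal rows (2 or 5)."""
    span = 10 // intervals  # how many leaf digits each row covers
    stems = range(min(values) // 10, max(values) // 10 + 1)
    # Create every row up front so rows without leaves are not omitted
    rows = {(stem, part): [] for stem in stems for part in range(intervals)}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[(stem, leaf // span)].append(leaf)
    return rows


for intervals in (2, 5):
    print(f"{intervals} intervals per stem:")
    for (stem, part), row in split_stem(values, intervals).items():
        print(f"  {stem} | {' '.join(map(str, row))}")
```

With two intervals each row covers leaves 0-4 or 5-9; with five intervals each row covers two leaf digits.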
Example C2. Britney is a swimmer training for a competition. The number of 50-meter laps she swam
each day for 30 days is as follows:
a) Prepare a stem-and-leaf plot.
b) Redraw the stem-and-leaf plot using two unit intervals.
c) Make a comment on what these plots show.
D: Back-To-Back Stem-and-Leaf Plots
If you are comparing two sets of data, you can use a back-to-back stem-and-leaf plot.
Example D1: The following class sizes were reported in Economics 101 and Math 151:
Econ 101: 20, 34, 27, 15, 24, 35, 38, 28
Math 151: 14, 18, 21, 34, 29, 13, 32, 23
Make a back-to-back stem-and-leaf plot for the data.
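One possible layout for the back-to-back plot is sketched below in Python. It assumes the first list of class sizes belongs to Econ 101 and the second to Math 151, shares one column of stems, and writes the left-hand leaves outward from the stem. The names are ours.

```python
econ = [20, 34, 27, 15, 24, 35, 38, 28]      # Econ 101 class sizes
math_151 = [14, 18, 21, 34, 29, 13, 32, 23]  # Math 151 class sizes


def leaves_by_stem(data):
    """Map each tens digit (stem) to its sorted ones digits (leaves)."""
    rows = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        rows.setdefault(stem, []).append(leaf)
    return rows


left, right = leaves_by_stem(econ), leaves_by_stem(math_151)
stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)

print("Econ 101 | stem | Math 151")
for stem in stems:
    # Left-hand leaves increase away from the stem, so reverse them for printing
    left_leaves = " ".join(str(d) for d in reversed(left.get(stem, [])))
    right_leaves = " ".join(str(d) for d in right.get(stem, []))
    print(f"{left_leaves:>8} |  {stem}   | {right_leaves}")
print("Key: 5 | 1 | 3 means 15 in Econ 101 and 13 in Math 151")
```

Because both classes share the same stems, the two distributions can be compared row by row.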