Organizing data and frequency distribution

12/8/2014
AGSC 320
Statistical Methods
Organizing & Graphing Data
DATA
• Numerical representation of reality
• Raw data: data recorded in the sequence in which they are collected, before any processing
• Data can be qualitative or quantitative
• Depending on the measurement type, different mathematical operations can be done
Ex.
Data representation
• Graphical presentation of data
• Most common graph describing data → histogram
• A histogram relates the values taken by a variable to the frequency with which each value occurs
• Frequency: # times a value is recorded / observed
• Frequency distribution: organization of raw data using categories/classes and their frequencies
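Counting frequencies is mechanical once the raw data are recorded. A minimal sketch using Python's standard `collections.Counter`; the observations below are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

# Hypothetical raw data: species recorded in the order they were observed
raw = ["deer", "turkey", "deer", "bass", "dove", "deer", "turkey"]

# Frequency = number of times each value is recorded
freq = Counter(raw)

# Print values from most to least frequent
for value, count in freq.most_common():
    print(value, count)
```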
Frequency distribution
• Class: a category grouping similar values
– e.g. 1.2, 1.3, 1.7 and 2.2, 2.5, 2.9
– e.g. deer & bear and turkey & dove and bass & salmon
• Properties of a frequency distribution:
– There should be at least 5 classes but fewer than 20
– Classes are mutually exclusive
– Classes are continuous
• There is no gap between two adjacent classes
– Classes are exhaustive
• every value must fall in one of the classes
– Classes have the same width
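The properties above can be checked programmatically. A minimal sketch, assuming classes are given as half-open intervals `[lower, upper)` sorted by lower boundary; the function name `check_classes` and the sample classes are hypothetical.

```python
def check_classes(boundaries, data, tol=1e-9):
    """Check the frequency-distribution properties for [lower, upper) classes.

    boundaries: list of (lower, upper) pairs sorted by lower boundary.
    Returns a dict mapping each property name to True/False.
    """
    widths = [upper - lower for lower, upper in boundaries]
    return {
        # at least 5 classes but fewer than 20
        "class_count": 5 <= len(boundaries) < 20,
        # half-open classes that meet end to end neither overlap nor leave gaps
        "exclusive_and_continuous": all(
            abs(boundaries[i][1] - boundaries[i + 1][0]) <= tol
            for i in range(len(boundaries) - 1)
        ),
        # every value falls in some class
        "exhaustive": all(
            any(lower <= x < upper for lower, upper in boundaries) for x in data
        ),
        # all classes have the same width
        "equal_width": all(abs(w - widths[0]) <= tol for w in widths),
    }


good = check_classes([(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)],
                     [3, 17, 25, 49, 12])
print(good)
```

Shifting one class so that a gap appears (for example `(11, 21)` after `(0, 10)`) makes `exclusive_and_continuous` come out `False`.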
Relative frequency distributions
• Relative frequency of a category: the frequency of A PARTICULAR category divided by the combined frequency of ALL categories
$$\text{Relative frequency} = \frac{\text{frequency of that category}}{\text{sum of all frequencies}}$$
• Percentage of a category: the relative frequency expressed as a percentage
$$\text{Percentage} = \text{Relative frequency} \times 100$$
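Applying the two formulas to a set of category counts is a one-line computation each. A minimal sketch; the category counts are hypothetical, not taken from the course data.

```python
# Hypothetical frequencies of three categories
counts = {"deer": 12, "turkey": 6, "bass": 2}
total = sum(counts.values())  # sum of all frequencies

# Relative frequency = frequency of that category / sum of all frequencies
rel_freq = {c: f / total for c, f in counts.items()}
# Percentage = relative frequency x 100
percent = {c: r * 100 for c, r in rel_freq.items()}

for c in counts:
    print(f"{c}: relative frequency = {rel_freq[c]:.2f}, percentage = {percent[c]:.0f}%")
```

The relative frequencies always add up to 1 (and the percentages to 100), which is a quick check on the arithmetic.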
Frequency distribution
Creating a frequency distribution:
1. Find the highest and lowest values
2. Find the range
3. Select the number of classes
– Rule of thumb: between 5 and 20 classes
– Sturges' rule: $c = 1 + 3.3\log_{10} n$, where $n$ is the number of values (usually rounded up); this is approximately $1 + \log_2 n$
4. Determine the class width
5. Select a starting point and the lower class limits
6. Find the upper limit of each class
7. Find the class boundaries and class midpoints
8. Tally the raw data → find the frequencies
9. Plot the data → graph
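Sturges' rule from step 3 can be written as a small function. A minimal sketch, assuming the result is always rounded up to the next whole number of classes (the slide notes this is the usual choice); the name `sturges_classes` is hypothetical.

```python
import math


def sturges_classes(n):
    """Number of classes by Sturges' rule: c = 1 + 3.3 * log10(n), rounded up."""
    return math.ceil(1 + 3.3 * math.log10(n))


for n in (20, 100, 1000):
    print(n, sturges_classes(n))
```

The suggested count grows only logarithmically with the sample size, so it stays inside the 5 to 20 rule of thumb for most practical data sets.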
Organizing Data
Example (adjustment of Ex.2-3)
Create a frequency distribution from the following data, which give the heights of 20 trees in a stand:
50, 45, 32, 48, 56, 38, 42, 48, 55, 36,
41, 51, 30, 59, 53, 47, 57, 51, 46, 44
Organizing Data
• Step 1: extreme values
• Step 2: compute range
Range = largest value – smallest value
• Step 3: determine the # classes
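Steps 1 to 3 applied to the 20 tree heights can be sketched as follows, assuming Sturges' rule (rounded up) is the method chosen for the number of classes; another choice within 5 to 20 classes would also be acceptable.

```python
import math

heights = [50, 45, 32, 48, 56, 38, 42, 48, 55, 36,
           41, 51, 30, 59, 53, 47, 57, 51, 46, 44]

# Step 1: extreme values
lowest, highest = min(heights), max(heights)
# Step 2: range = largest value - smallest value
data_range = highest - lowest
# Step 3: number of classes by Sturges' rule, rounded up (assumed choice)
n_classes = math.ceil(1 + 3.3 * math.log10(len(heights)))

print(lowest, highest, data_range, n_classes)  # 30 59 29 6
```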
Organizing Data
• Step 4: class width
Width = range / # classes
The width is commonly rounded up to a convenient value, so that the classes still cover the whole range
• Step 5: select the starting point:
– Usually the minimum value
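Steps 4 and 5 for the tree heights, as a minimal sketch. It assumes 6 classes (Sturges' rule for 20 values) and that "convenient value" means rounding the width up to the next whole number.

```python
import math

heights = [50, 45, 32, 48, 56, 38, 42, 48, 55, 36,
           41, 51, 30, 59, 53, 47, 57, 51, 46, 44]
data_range = max(heights) - min(heights)  # 29
n_classes = 6                             # assumed result of Step 3

# Step 4: width = range / # classes, rounded up to a whole number (assumption)
raw_width = data_range / n_classes        # about 4.83
width = math.ceil(raw_width)              # 5
# Step 5: start at the minimum value
start = min(heights)                      # 30

# Rounding up guarantees the classes reach past the largest value
print(width, start, start + n_classes * width > max(heights))
```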
Organizing Data
• Step 6: determine the upper class limits
• Step 7: find class boundaries and midpoints
• Step 8: tally raw data
Table columns: Class limits | Class boundaries | Midpoints | Tally
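Steps 6 to 8 can be sketched for the tree heights, assuming a starting point of 30, a width of 5 and 6 classes, and that heights are recorded as whole numbers (so class boundaries lie half a unit beyond the class limits).

```python
heights = [50, 45, 32, 48, 56, 38, 42, 48, 55, 36,
           41, 51, 30, 59, 53, 47, 57, 51, 46, 44]
start, width, n_classes = 30, 5, 6  # assumed results of Steps 3 to 5

rows = []
for i in range(n_classes):
    lower = start + i * width                # lower class limit
    upper = lower + width - 1                # Step 6: upper class limit
    bounds = (lower - 0.5, upper + 0.5)      # Step 7: class boundaries
    midpoint = (lower + upper) / 2           # Step 7: class midpoint
    freq = sum(lower <= h <= upper for h in heights)  # Step 8: tally
    rows.append((lower, upper, bounds, midpoint, freq))

for lower, upper, (lb, ub), mid, f in rows:
    print(f"{lower}-{upper}  {lb}-{ub}  {mid}  {'|' * f} ({f})")
```

Every one of the 20 heights lands in exactly one class, which confirms the classes are mutually exclusive and exhaustive.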
Organizing Data
• Histogram: graph displaying the data as adjacent (continuous) bars whose heights are the class frequencies, with the classes on the horizontal axis (abscissa)
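A plotting library is the usual tool, but the structure of a histogram can be shown with plain text. A minimal sketch with a hypothetical `text_histogram` helper; the class labels and frequencies assume the tree-height classes 30-34 through 55-59.

```python
def text_histogram(labels, freqs, width=7):
    """Vertical text histogram: bar heights are class frequencies,
    classes run along the horizontal axis, and adjacent bars touch."""
    lines = []
    for level in range(max(freqs), 0, -1):
        # a column is filled at this level if the class frequency reaches it
        row = "".join("#" * width if f >= level else " " * width for f in freqs)
        lines.append(f"{level:>2} |{row.rstrip()}")
    lines.append("   +" + "-" * (width * len(freqs)))  # horizontal axis
    lines.append("    " + "".join(label.center(width) for label in labels))
    return "\n".join(lines)


labels = ["30-34", "35-39", "40-44", "45-49", "50-54", "55-59"]
freqs = [2, 2, 3, 5, 4, 4]
print(text_histogram(labels, freqs))
```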
Frequency polygon
• Freq. polygon: line connecting the points representing
the class midpoint and class frequency
• Cumulative frequency: total # of values below the upper boundary of each class
• Ogive: curve representing the cumulative frequencies for
the classes
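The points plotted by a frequency polygon and an ogive can be computed directly. A minimal sketch using the tree-height classes (midpoints 32 to 57, width 5); closing the polygon at zero frequency one class width beyond each end is a common convention assumed here, not something stated on the slide.

```python
from itertools import accumulate

midpoints = [32, 37, 42, 47, 52, 57]
freqs = [2, 2, 3, 5, 4, 4]
width = 5

# Frequency polygon: points (class midpoint, class frequency),
# anchored at zero one class width beyond each end (assumed convention)
polygon = ([(midpoints[0] - width, 0)]
           + list(zip(midpoints, freqs))
           + [(midpoints[-1] + width, 0)])

# Ogive: points (upper class boundary, cumulative frequency)
upper_boundaries = [m + width / 2 for m in midpoints]
ogive = list(zip(upper_boundaries, accumulate(freqs)))

print(polygon)
print(ogive)
```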
Class | Frequency | Ogive (cumulative frequency)
1 | 2 | 2
2 | 5 | 7
3 | 7 | 14
4 | 10 | 24
5 | 9 | 33
6 | 4 | 37
7 | 1 | 38
[Figure: frequency polygon and ogive for classes 1 to 7; vertical axis 0 to 40]
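The ogive column of the Class / Frequency / Ogive table on this slide is a running total of the frequency column. A minimal sketch using the standard `itertools.accumulate`:

```python
from itertools import accumulate

# Class frequencies for classes 1 to 7, as listed on the slide
freqs = [2, 5, 7, 10, 9, 4, 1]

# Cumulative frequency: running total of the class frequencies
cumulative = list(accumulate(freqs))
print(cumulative)  # [2, 7, 14, 24, 33, 37, 38]
```

The last cumulative frequency always equals the total number of values, here 38.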
Relative frequency
• Expresses the frequency distribution relative to the total number of values
• Ratio between class frequency and total # of values
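$$\text{Relative frequency of a class} = \frac{\text{class frequency}}{\text{total \# values}} = \frac{\text{\# values in class}}{\text{total \# values}}$$
Class | Frequency | Relative frequency
1 | 2 | 0.05
2 | 5 | 0.13
3 | 7 | 0.18
4 | 10 | 0.26
5 | 9 | 0.24
6 | 4 | 0.11
7 | 1 | 0.03
Total | 38 | 1.00
[Figure: frequency histogram and relative frequency histogram of the same classes]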
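The relative frequency column follows from dividing each class frequency by the total of 38. A minimal sketch using the slide's frequencies, rounded to two decimals as in the table:

```python
# Class frequencies for classes 1 to 7, as listed on the slide
freqs = [2, 5, 7, 10, 9, 4, 1]
total = sum(freqs)  # 38

# Relative frequency of a class = class frequency / total # values
rel = [f / total for f in freqs]
print([round(r, 2) for r in rel])  # [0.05, 0.13, 0.18, 0.26, 0.24, 0.11, 0.03]
```

Because every bar is divided by the same total, the relative frequency histogram has exactly the same shape as the frequency histogram; only the vertical scale changes.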
Shape of histograms
• Frequency polygon – empirical representation of the
distribution describing the investigated process
• The shape of the histogram is important in choosing the appropriate statistical methods for analyzing the data
[Figure: example histogram shapes: bell-shaped, unimodal, bimodal, uniform distribution, reverse-J distribution (Liocourt), symmetric, left-skewed, right-skewed]
Stem-and-leaf display
• Another method of displaying data
– Histograms and freq. polygons are other methods
• The stem-and-leaf technique does not lose the information carried by individual observations
• Break each value into two parts:
1. Stem: the leading (significant) digits, listed in order
2. Leaf: the remaining digits of each value sharing that stem, listed in order
Stem-and-leaf display
Ex.2-49: yards gained by 14 running backs
745, 921, 1133, 1024, 848, 775, 800
1009, 1275, 857, 933, 1145, 967, 995
1. Split each value into two parts:
• first part → stem: the significant digits: 7, 9, 11, etc.
• second part → leaf: the rest of the value: 45, 21, etc.
2. List the stem values in ascending order
• 7, 8, 9, 10, etc.
3. Next to each stem value, write the leaves of the corresponding observations
• For stem 7 there are 45 and 75 (corresponding to 745 and 775)
Stem-and-leaf display
• Unsorted
7 | 45 75
8 | 48 00 57
9 | 21 33 67 95
10 | 24 09
11 | 33 45
12 | 75
• Sorted
7 | 45 75
8 | 00 48 57
9 | 21 33 67 95
10 | 09 24
11 | 33 45
12 | 75
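The stem-and-leaf procedure can be sketched in a few lines, assuming two-digit leaves (stem = hundreds, leaf = last two digits) and reading the yard data row by row; the helper names `stem_and_leaf` and `show` are hypothetical.

```python
from collections import defaultdict

# Yards gained by 14 running backs (Ex.2-49), read row by row
yards = [745, 921, 1133, 1024, 848, 775, 800,
         1009, 1275, 857, 933, 1145, 967, 995]


def stem_and_leaf(values, leaf_digits=2, sort_leaves=True):
    """Return {stem: leaves} with stems in ascending order.
    Stem = value // 10**leaf_digits; leaf = the remaining digits."""
    base = 10 ** leaf_digits
    table = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, base)
        table[stem].append(leaf)  # leaves kept in order of appearance
    # list every stem from smallest to largest, even one with no leaves
    stems = range(min(table), max(table) + 1)
    return {s: sorted(table[s]) if sort_leaves else table[s] for s in stems}


def show(display, leaf_digits=2):
    """Print one line per stem, padding leaves with leading zeros (9 -> 09)."""
    width = max(len(str(s)) for s in display)
    for stem, leaves in display.items():
        print(f"{stem:>{width}} | " + " ".join(f"{leaf:0{leaf_digits}d}" for leaf in leaves))


print("Unsorted")
show(stem_and_leaf(yards, sort_leaves=False))
print("Sorted")
show(stem_and_leaf(yards))
```

Each original value can be read back from its stem and leaf (for example 10 | 09 is 1009), which is why the display keeps every individual observation.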