12/8/2014
AGSC 320 Statistical Methods
Organizing & Graphing Data

DATA
• Numerical representation of reality
• Raw data: data recorded in the sequence in which they are collected, before any processing
• Qualitative or quantitative
• Depending on the measurement type, different mathematical operations can be done
Ex.

Data representation
• Graphical presentation of data
• Most common graph describing data → histogram
• A histogram relates the values taken by a variable to the frequency of occurrence of those values
• Frequency: # times a value is recorded / observed
• Frequency distribution: organization of raw data using categories/classes and frequencies

Frequency distribution
• Class: a category grouping similar values
  – e.g. 1.2, 1.3, 1.7 and 2.2, 2.5, 2.9
  – e.g. deer & bear and turkey & dove and bass & salmon
• Properties of a frequency distribution:
  – There should be at least 5 classes but fewer than 20
  – Classes are mutually exclusive
  – Classes are continuous
    • There is no gap between two adjacent classes
  – Classes are exhaustive
    • Any value should be found in one class
  – Classes have the same width

Relative frequency distributions
• Relative frequency of a category
  – relates the frequency of A PARTICULAR category to the frequency of ALL categories
  Relative Freq. = Freq. of that category / sum of all frequencies
• Percentage of a category is its relative frequency expressed as a percentage
  Percentage = Rel. Freq. x 100

Frequency distribution
Creating a frequency distribution:
1. find the highest and lowest value
2. find the range
3. select the number of classes
  – rule of thumb: between 5 and 20 classes
  – Sturges’ Rule: c = 3.3 x log10 n + 1, equivalently c = log2 n + 1 (usually rounded up)
4. determine the class width
5. select a starting point and lower class limits
6. find upper limits for each class
7. find class boundaries and class midpoints
8. tally raw data → find frequencies
9. plot the data → Graph

Organizing Data
Example (adjustment of Ex.2-3)
Create a frequency distribution from the following data, which are the heights of 20 trees from a stand:
50, 45, 32, 48, 56, 38, 42, 48, 55, 36, 41, 51, 30, 59, 53, 47, 57, 51, 46, 44
• Step 1: extreme values
• Step 2: compute range
  Range = largest value – smallest value
• Step 3: determine the # classes
• Step 4: class width
  Width = range / # classes
  Commonly round width to meaningful values
• Step 5: select starting point
  – Usually the minimum value
• Step 6: determine the upper class limit
• Step 7: find class boundaries and midpoints
• Step 8: tally raw data
  Table columns: Class limits, Class boundaries, Midpoints, Tally
• Histogram: graph displaying data as contiguous bars whose heights are the class frequencies, with the classes on the abscissa (horizontal axis)

Frequency polygon
• Freq. polygon: line connecting the points defined by each class midpoint and class frequency
• Cumulative frequency: total # values below the upper boundary of each class
• Ogive: curve representing the cumulative frequencies for the classes

  Class   Frequency   Ogive
  1       2           2
  2       5           7
  3       7           14
  4       10          24
  5       9           33
  6       4           37
  7       1           38

[Figure: ogive, cumulative frequency (0 to 40) against class 1 to class 7]

Relative frequency
• Represents the frequency distribution with respect to the total number of values
• Ratio between class frequency and total # of values
  Relative frequency of a class = class frequency / total # values = # values in class / total # values

  Class   Frequency   Relative frequency
  1       2           0.05
  2       5           0.13
  3       7
  4       10
  5       9
  6       4
  7       1
  Total   38

[Figures: frequency (0 to 15) and relative frequency (0 to 0.3) against class (0 to 8)]

Shape of histograms
• Frequency polygon – empirical representation of the distribution describing the investigated process
• The shape of the histogram is important in determining the appropriate statistical methods for analyzing the data
[Figure panels: Bell-shaped; Unimodal; Bimodal; Uniform distribution; Reverse-J distribution (Liocourt); Symmetric; Left-skewed; Right-skewed]

Stem-and-leaf display
• Another method of displaying data
  – Histograms and freq. polygons are other methods
• The stem-and-leaf technique does not lose the information of any individual observation
• Break the information into two parts:
  1. Stem: ordered list of leading (most significant) digits
  2. Leaf: ordered list of values having the same leading digits

Ex.2-49: yards gained by 14 running backs
745 1009 921 1133 1024 848 775 800 1275 857 933 1145 967 995
1. Split each value into two parts:
  • first part → stem: leading digits: 7, 9, 11, etc.
  • second part → leaf: rest of the actual value: 45, 21, etc.
2. List the stem values in ascending order
  • 7, 8, 9, 10, etc.
3. Write next to each stem value the corresponding observations
  • For 7 there are 45 and 75 (corresponding to 745 and 775)

• Unsorted
   7   45 75
   8   48 00 57
   9   21 33 67 95
  10   24 09
  11   33 45
  12   75
• Sorted
   7   45 75
   8   00 48 57
   9   21 33 67 95
  10   09 24
  11   33 45
  12   75
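The three stem-and-leaf steps for Ex.2-49 can be sketched in Python as below. This is a minimal sketch, not part of the original slides: the function name stem_and_leaf and the leaf_digits parameter are hypothetical, and it assumes non-negative integer data whose last two digits form the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_digits=2):
    """Return {stem: sorted leaves}, with stems in ascending order.

    Assumption: values are non-negative integers and the last
    `leaf_digits` digits of each value form the leaf.
    """
    base = 10 ** leaf_digits
    stems = defaultdict(list)
    for value in values:
        # Step 1: split each value into stem (leading digits) and leaf (rest).
        stem, leaf = divmod(value, base)
        stems[stem].append(leaf)
    # Steps 2 and 3: order the stems, then order the leaves within each stem.
    return {stem: sorted(stems[stem]) for stem in sorted(stems)}

# Yards gained by the 14 running backs (Ex.2-49).
yards = [745, 1009, 921, 1133, 1024, 848, 775, 800,
         1275, 857, 933, 1145, 967, 995]

display = stem_and_leaf(yards)
for stem, leaves in display.items():
    # Leaves are zero-padded so that 1009 shows as leaf "09".
    print(f"{stem:>2}   " + " ".join(f"{leaf:02d}" for leaf in leaves))
```

Dropping the sorted() call around the leaves would keep them in the order they were recorded, which corresponds to the unsorted display.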
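For the tree-height example under Organizing Data, steps 1 to 8 can be sketched in Python as follows. This is one possible worked version under stated assumptions, not the solution given in class: it uses Sturges' rule in its base-2 form (c = log2 n + 1, rounded up), rounds the class width up to a whole number, starts at the minimum value, and treats the heights as integers so that adjacent class limits are one unit apart. All variable names are illustrative.

```python
import math

# Heights of 20 trees from a stand (data from the example).
heights = [50, 45, 32, 48, 56, 38, 42, 48, 55, 36,
           41, 51, 30, 59, 53, 47, 57, 51, 46, 44]

n = len(heights)
low, high = min(heights), max(heights)            # Step 1: extreme values
data_range = high - low                           # Step 2: range
num_classes = math.ceil(1 + math.log2(n))         # Step 3: Sturges' rule, rounded up
width = math.ceil(data_range / num_classes)       # Step 4: width, rounded up (assumption)

# Sanity check: the classes must reach the largest value (exhaustive classes).
assert low + num_classes * width > high

table = []
cumulative = 0
for i in range(num_classes):
    lower = low + i * width                       # Step 5: start at the minimum
    upper = lower + width - 1                     # Step 6: upper class limit (integer data)
    boundaries = (lower - 0.5, upper + 0.5)       # Step 7: class boundaries
    midpoint = (lower + upper) / 2                #         and class midpoint
    freq = sum(lower <= h <= upper for h in heights)  # Step 8: tally
    cumulative += freq
    table.append({
        "limits": (lower, upper),
        "boundaries": boundaries,
        "midpoint": midpoint,
        "freq": freq,
        "rel_freq": freq / n,                     # relative frequency of the class
        "cum_freq": cumulative,                   # value plotted by the ogive
    })

for row in table:
    lo_lim, up_lim = row["limits"]
    print(f"{lo_lim}-{up_lim}  mid={row['midpoint']:.1f}  "
          f"f={row['freq']}  rel={row['rel_freq']:.2f}  cum={row['cum_freq']}")
```

With these choices the data fall into 6 classes of width 5 (30-34 through 55-59). A different rounding of the width or a different starting point would give a different, equally valid table.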